<a href="https://colab.research.google.com/github/AksharaVaidee/2022305027/blob/main/wikichatbot%20and%20resume.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# === Install all dependencies and set up ===
!apt-get install -y libmagic1
!pip install google-generativeai pyresparser nltk spacy pdfplumber python-magic pdfminer.six
!python -m nltk.downloader punkt
!python -m nltk.downloader stopwords
!python -m spacy download en_core_web_sm

import pdfplumber
from pyresparser import ResumeParser
import google.generativeai as genai
from google.colab import files

# === Upload Resume ===
uploaded = files.upload()
file_path = list(uploaded.keys())[0]
print("Uploaded:", file_path)

# === Extract Text from PDF using pdfplumber ===
def extract_text_from_pdf(path):
    text = ""
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Separate pages with a newline so words at page boundaries do not run together
            text += (page.extract_text() or "") + "\n"
    return text

resume_text = extract_text_from_pdf(file_path)
print("\nText extracted from resume.")

# === Structured Parsing using pyresparser ===
try:
    parsed_data = ResumeParser(file_path).get_extracted_data()
    print("\nStructured Resume Data:")
    for key, value in parsed_data.items():
        print(f"{key.capitalize()}: {value}")
except Exception as e:
    print("\nResume parsing failed:", e)
    parsed_data = {}

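# === Gemini Configuration and Resume Analysis ===
# Never hard-code the API key in the notebook. Read it from the environment
# (the variable name GEMINI_API_KEY is an assumption) or prompt for it at run time.
import os
from getpass import getpass

api_key = os.environ.get("GEMINI_API_KEY") or getpass("Gemini API key: ")
genai.configure(api_key=api_key)
model = genai.GenerativeModel("gemini-1.5-pro")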

question = "List the candidate's key technical skills based on the resume."
response = model.generate_content(f"""
You are a professional HR assistant. Analyze the following resume text and answer the question.

Resume:
{resume_text}

Question:
{question}
""")

print("\nGemini Analysis Result:")
print(response.text)

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libmagic1 is already the newest version (1:5.41-3ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 30 not upgraded.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 58.9 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to

Saving chandhini cv (5th sem).pdf to chandhini cv (5th sem) (2).pdf
Uploaded: chandhini cv (5th sem) (2).pdf

Text extracted from resume.

Resume parsing failed: [E053] Could not read config file from /usr/local/lib/python3.11/dist-packages/pyresparser/config.cfg

Gemini Analysis Result:
The candidate's key technical skills, as demonstrated on the resume, include:

* **Microbiology Techniques:** Quantification of microbes (sampling and serial dilution), bacterial counts, antibiotic sensitivity assays, staining techniques (Gram's, simple, differential), and biochemical tests for bacterial identification.
* **Cell Biology Techniques:** Staining (Hematoxylin and Eosin, Leishman, Giemsa), cell/tissue lysis and fractionation (homogenization, osmosis, differential centrifugation), and cell viability studies (Trypan blue, tetrazolium salts).
* **Biochemistry Techniques:** Protein estimation (Biuret, Bradford, Lowry methods), enzymatic assays (phosphatase, glucose estimation via GOD-POD), and 
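
The pyresparser step above failed with a spaCy config error (`[E053]`), so `parsed_data` ended up empty. As a rough fallback, assuming only that `resume_text` is plain text, basic contact fields can be pulled out with regular expressions. The helper name `extract_contact_fields` and both patterns are hypothetical and are not part of pyresparser; treat this as a sketch, not a replacement for a real parser.

```python
import re

# Hypothetical fallback patterns: deliberately loose, not pyresparser's own logic.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")

def extract_contact_fields(text):
    """Return de-duplicated emails and phone-like strings found in resume text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones = sorted({match.strip() for match in PHONE_RE.findall(text)})
    return {"email": emails, "mobile_number": phones}

# Made-up sample text, used only to exercise the helper.
sample = "Jane Doe\njane.doe@example.com | +91 98765 43210\nSkills: PCR, ELISA"
fields = extract_contact_fields(sample)
print(fields)
```

In the notebook, the same call could be made on `resume_text` inside the `except` branch so that `parsed_data` is not left empty.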

In [None]:
# === Install Required Packages ===
!pip install -q langchain langchain-community wikipedia ctransformers faiss-cpu sentence-transformers

# === Imports ===
from langchain_community.llms import CTransformers
from langchain.chains import RetrievalQA
from langchain_community.utilities.wikipedia import WikipediaAPIWrapper
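# Vector store, embeddings, and Document now live in langchain_community / langchain_core;
# the old langchain.* paths are deprecated.
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document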

# === Step 1: Load LLM (Open Source, Local) ===
llm = CTransformers(
    model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
    config={"max_new_tokens": 512, "temperature": 0.5}
)

# === Step 2: Wikipedia Search and Document Creation ===
def fetch_wikipedia_docs(query):
    wiki = WikipediaAPIWrapper(top_k_results=2, doc_content_chars_max=3000)
    content = wiki.run(query)
    return [Document(page_content=content, metadata={"source": "Wikipedia"})]

# === Step 3: Chunking and Embedding ===
def create_retriever(docs):
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(docs)
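    # Name the embedding model explicitly (this is the wrapper's default) so results are reproducible
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")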
    vectordb = FAISS.from_documents(chunks, embeddings)
    return vectordb.as_retriever()

# === Step 4: Ask Question to Wikipedia Chatbot ===
def ask_bot(question):
    docs = fetch_wikipedia_docs(question)
    retriever = create_retriever(docs)
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    answer = qa.invoke(question)
    return answer

# === Interactive Loop to Ask Questions ===
def start_chatbot():
    print("Welcome to the Wikipedia Chatbot! Ask anything about a topic.")
    print("Type 'exit' to end the chat.")

    while True:
        # Get user input (question)
        question = input("\nAsk a question: ")

        if question.strip().lower() == 'exit':
            print("Goodbye! Exiting the chatbot.")
            break

        # Ask the bot and get the answer
        answer = ask_bot(question)

        # Display the bot's answer
        print(f"\n[Bot Answer]: {answer}")

# === Start the chatbot ===
start_chatbot()


Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

Welcome to the Wikipedia Chatbot! Ask anything about a topic.
Type 'exit' to end the chat.

Ask a question: tell me about tuberculosis



[Bot Answer]: {'query': 'tell me about tuberculosis', 'result': " Tuberculosis is a contagious disease usually caused by Mycobacterium tuberculosis (MTB) bacteria. It affects the lungs but can also affect other parts of the body. Most infections show no symptoms, known as inactive or latent tuberculosis. A small proportion of latent infections progress to active disease that, if left untreated, can be fatal. Tuberculosis has been present in humans since ancient times and was responsible for an estimated quarter of all deaths in 1/estimated quarter of an estimated quarter of the majority of deaths in Europe'an estimated quarter of all deaths in estimated quarter of quarter of deaths in estimated quarter of deaths in 25"}
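
`create_retriever` splits the Wikipedia text into 500-character chunks with a 50-character overlap before embedding. The sketch below illustrates only the overlap idea, using a fixed-width sliding window. It is a simplification and not how `RecursiveCharacterTextSplitter` chooses boundaries (that class tries to break on separators such as paragraphs and sentences first). The name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

# Synthetic 1200-character string, used only to show the window sizes.
demo_text = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(demo_text)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 300]
```

The last 50 characters of each chunk repeat at the start of the next one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is short enough.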

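The output above shows that `qa.invoke` returns a dict with `query` and `result` keys, and that the generated text can trail off into a repeated, unfinished fragment. A minimal post-processing sketch, assuming that dict shape: take the `result` string and drop any trailing text after the last sentence-ending punctuation mark. `format_answer` is a hypothetical helper, and the trimming rule is a heuristic that can also cut a legitimate unfinished sentence.

```python
def format_answer(response):
    """Return cleaned answer text from a RetrievalQA-style response."""
    # Accept either the dict shape shown in the output above or a bare string
    if isinstance(response, dict):
        text = response.get("result", "")
    else:
        text = str(response)
    text = text.strip()
    # Heuristic: keep everything up to the last '.', '!' or '?'
    last_stop = max(text.rfind(ch) for ch in ".!?")
    if last_stop != -1:
        text = text[: last_stop + 1]
    return text

# Shortened, made-up response used only to exercise the helper.
demo_response = {
    "query": "tell me about tuberculosis",
    "result": " TB is contagious. It can be fatal. It was responsible for an estimated quarter of",
}
print(format_answer(demo_response))  # TB is contagious. It can be fatal.
```

In the chat loop, printing `format_answer(answer)` instead of `answer` would show only the answer text rather than the whole dict.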