<a href="https://colab.research.google.com/github/Gowthamabinav-VP/SDC_GENAI/blob/main/Resume_screening_asst.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# ✅ Install dependencies
!pip install faiss-cpu sentence-transformers PyPDF2

# ✅ Imports
import PyPDF2
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from google.colab import files

# ✅ Load the pre-trained sentence-transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# ✅ Upload resumes (files.upload() saves each file to the working directory)
print("📁 Upload resumes (PDF or TXT)")
uploaded_files = files.upload()

docs = []
filenames = []

# ✅ Extract text from the uploaded files
for fname in uploaded_files:
    text = ""
    name_lower = fname.lower()  # accept .PDF / .TXT as well
    if name_lower.endswith('.pdf'):
        with open(fname, 'rb') as f:
            reader = PyPDF2.PdfReader(f)
            for page in reader.pages:
                # extract_text() may return None or "" (e.g. scanned pages);
                # the newline keeps words on adjacent pages from merging
                text += (page.extract_text() or "") + "\n"
    elif name_lower.endswith('.txt'):
        with open(fname, 'r', encoding='utf-8') as f:
            text = f.read()
    else:
        print(f"⚠️ Skipping unsupported file: {fname}")
        continue
    docs.append(text)
    filenames.append(fname)

if not docs:
    raise ValueError("No PDF or TXT resumes were uploaded.")

# ✅ Create embeddings (FAISS expects a float32 NumPy array)
print("🔍 Creating embeddings...")
embeddings_np = np.asarray(model.encode(docs), dtype="float32")

# ✅ Build an exact (brute-force) FAISS index over squared L2 distance
index = faiss.IndexFlatL2(embeddings_np.shape[1])
index.add(embeddings_np)

# ✅ Input query
query = input("📝 Enter your question (e.g., 'Who has experience in Python and AWS?'):\n")

# ✅ Encode the query and search; never ask for more hits than there are resumes
query_embedding = np.asarray(model.encode([query]), dtype="float32")
k = min(3, index.ntotal)
D, I = index.search(query_embedding, k=k)

# ✅ Show matching resumes, closest (smallest distance) first
print("\n✅ Top Matching Resumes:")
for idx in I[0]:
    if idx == -1:  # FAISS marks empty result slots with -1
        continue
    print(f"\n📄 {filenames[idx]}")
    print("-" * 40)
    print(docs[idx][:1000])  # preview the first 1000 characters
    print("...")
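`faiss.IndexFlatL2` performs an exact, exhaustive search: it compares the query with every stored resume vector and returns the `k` smallest squared Euclidean distances (`D`) together with the matching row indices (`I`). The NumPy sketch below re-implements that search without FAISS to make two details visible: the distances are squared, and when `k` exceeds the number of indexed vectors the surplus slots come back with index `-1`. The function name `flat_l2_search` is hypothetical, and padding the surplus distances with `inf` is this sketch's own choice, not a claim about FAISS internals.

```python
import numpy as np


def flat_l2_search(database, queries, k):
    """Hypothetical brute-force k-NN by squared L2 distance.

    Returns (D, I), each shaped (n_queries, k). Slots beyond the number
    of stored vectors get index -1 and distance inf (sketch convention).
    """
    database = np.asarray(database, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    n = database.shape[0]
    # Pairwise squared distances, shape (n_queries, n)
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.sum(diffs * diffs, axis=-1)
    # Ascending order: a smaller distance means a closer match
    order = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    m = min(k, n)
    D = np.full((queries.shape[0], k), np.inf, dtype=np.float32)
    I = np.full((queries.shape[0], k), -1, dtype=np.int64)
    I[:, :m] = order[:, :m]
    D[:, :m] = np.take_along_axis(sq_dists, order[:, :m], axis=1)
    return D, I


# One stored "resume" vector but k=3: two slots come back empty
db = np.array([[1.0, 0.0]], dtype=np.float32)
q = np.array([[0.0, 1.0]], dtype=np.float32)
D, I = flat_l2_search(db, q, k=3)
print(I)        # [[ 0 -1 -1]]
print(D[0, 0])  # 2.0, the squared distance rather than sqrt(2)
```

The `-1` padding is why `k` should not exceed the number of uploaded resumes: Python reads `filenames[-1]` as "the last element", so an unguarded loop would silently print the same resume several times instead of raising an error.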


📁 Upload resumes (PDF or TXT)


Saving John_Doe_Resume.pdf to John_Doe_Resume.pdf
🔍 Creating embeddings...
📝 Enter your question (e.g., 'Who has experience in Python and AWS?'):
Who has experience in Python and AWS?

✅ Top Matching Resumes:

📄 John_Doe_Resume.pdf
----------------------------------------
John Doe - Cloud Engineer
Contact Information:
Email: john.doe@example.com
Phone: (123) 456-7890
Location: New York, NY
Summary:
Experienced Cloud Engineer with over 3 years of experience in building and maintaining scalable
cloud infrastructure.
Proficient in Python and AWS services including EC2, S3, Lambda, and CloudFormation.
Experienced in using Docker, Jenkins, and Terraform.
Skills:
- Programming: Python, Bash
- Cloud: AWS (EC2, S3, Lambda, CloudFormation)
- DevOps: Docker, Jenkins, Terraform, Git
- Monitoring: CloudWatch, Datadog
Experience:
Cloud Engineer, TechCorp Inc. - Jan 2021 to Present
- Designed and deployed serverless applications using AWS Lambda and API Gateway.
- Automated infrastructure using Terr
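The `all-MiniLM-L6-v2` model card states that input longer than 256 word pieces is truncated by default, so a full resume like the one previewed above is embedded from its opening portion only, and later sections (here, most of the Experience section) may not influence its vector at all. One common workaround, not used in this notebook, is to split each resume into overlapping word windows, embed every window, and rank each resume by its best-matching window. The helpers below are a hypothetical sketch of the two pure-Python pieces of that idea; the window size of 150 words and overlap of 30 are assumptions, and word counts are only a rough proxy for the model's word-piece limit.

```python
from typing import Dict, List, Sequence, Tuple


def chunk_words(text: str, size: int = 150, overlap: int = 30) -> List[str]:
    """Hypothetical helper: split text into overlapping windows of `size` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window reached the last word
            break
    return chunks


def best_chunk_per_doc(owners: Sequence[int],
                       distances: Sequence[float]) -> List[Tuple[int, float]]:
    """Hypothetical helper: keep each document's smallest chunk distance.

    owners[i] is the document index chunk i came from; distances[i] is that
    chunk's distance to the query (smaller = closer). Sorted best first.
    """
    best: Dict[int, float] = {}
    for doc, dist in zip(owners, distances):
        if doc not in best or dist < best[doc]:
            best[doc] = dist
    return sorted(best.items(), key=lambda item: item[1])


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
print(best_chunk_per_doc([0, 0, 1], [0.9, 0.4, 0.6]))
# [(0, 0.4), (1, 0.6)]
```

In the notebook, this would mean encoding every chunk instead of every whole document, recording which resume each chunk belongs to, searching the chunk index with a larger `k`, dropping `-1` slots, and passing the returned indices (mapped to their owners) and distances to `best_chunk_per_doc`.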