### Setup

In [7]:
import os
from dotenv import load_dotenv
load_dotenv()

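# load_dotenv() has already copied the .env values into os.environ.
# Fail early with a clear message if a key is missing: assigning None
# to os.environ would otherwise raise a TypeError.
for key in ("GROQ_API_KEY", "HF_TOKEN"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set; add it to your .env file")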

### 1. Load Your Data

In [12]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("D:/Soham_Mistry.pdf")
docs = loader.load()
print(docs)

[Document(metadata={'producer': 'Microsoft® Word 2024', 'creator': 'Microsoft® Word 2024', 'creationdate': '2025-07-24T20:41:26+05:30', 'author': 'Soham', 'moddate': '2025-07-24T20:41:26+05:30', 'source': 'D:/Soham_Mistry.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='SOHAM MISTRY \nPune, Maharashtra   \nP: +91 89562 85029  \nsdmistry1001@gmail.com  \nGitHub - SohamMistry01   \nLinkedIn - soham-mistry \n \n \nPROFILE  \n \n \nEngineering mind with a machine learning core — \nturning ideas into AI-powered realities. Experienced \nin developing robust ML models, optimizing \nperformance, and integrating them into real-world \napplications across domains like healthcare, NLP, and \nautomation. Skilled in leveraging cutting-edge AI \nframeworks like LangChain and LangGraph to \narchitect intelligent, multi-agent systems. Open to \ndynamic opportunities in AI, ML, or data-driven \nengineering roles to create meaningful impact. \n \nEDUCATION  \n \n \nAISSMS Institute o

### 2. Split the Document into Chunks

In [13]:
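# The splitters live in the langchain-text-splitters package;
# the older langchain.text_splitter import path is deprecated.
from langchain_text_splitters import RecursiveCharacterTextSplitter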
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(docs)
print(f"Split into {len(splits)} chunks.")

Split into 4 chunks.
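
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window chunking in plain Python. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally: the real splitter prefers to break on separators such as paragraph and line breaks, so its chunk boundaries and chunk count can differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 3400-character sample, used only for illustration.
sample = "".join(chr(97 + i % 26) for i in range(3400))
chunks = chunk_text(sample)
print(len(chunks))
```

Each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.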


### 3. Create Vector Embeddings

In [14]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
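
The `all-MiniLM-L6-v2` model maps each chunk to a 384-dimensional vector, and chunks with similar meaning end up with vectors that point in similar directions. The sketch below shows one common way to compare two such vectors, cosine similarity, using toy low-dimensional vectors in place of real embeddings. The helper name `cosine_similarity` is hypothetical and is not part of the libraries used in this notebook.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (real ones have 384 dimensions).
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```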

### 4. Store Embeddings in a Vector Store

In [15]:
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)

### 5. Set Up the Retriever

In [16]:
retriever = vectorstore.as_retriever()
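
Conceptually, the retriever embeds the question and returns the stored chunks whose vectors lie closest to it. The following is a brute-force sketch of that lookup in plain Python, assuming squared Euclidean (L2) distance, which is, as far as I know, the default for LangChain's FAISS wrapper. The names `squared_l2` and `top_k_indices` are hypothetical, and a real FAISS index is far more efficient than this loop.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_indices(query_vec: list[float], stored_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k stored vectors nearest to query_vec."""
    ranked = sorted(
        range(len(stored_vecs)),
        key=lambda i: squared_l2(query_vec, stored_vecs[i]),
    )
    return ranked[:k]


# Toy 2-dimensional vectors standing in for chunk embeddings.
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0]]
print(top_k_indices([0.0, 0.0], stored, k=2))  # [0, 3]
```

With the default settings the retriever should return up to 4 chunks per query. Since this PDF was split into only 4 chunks, every chunk is likely passed to the model on each question. For a larger corpus, `vectorstore.as_retriever(search_kwargs={"k": 2})` would narrow the result set.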

### 6. Create the RAG Chain

In [17]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_groq import ChatGroq

template = """ 
    Answer the question based only on the following context:
    {context}
    Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatGroq(model_name="llama-3.1-8b-instant", temperature=0)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
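
In the chain above, the dict sends the same question to two places: the retriever (which fills `{context}`) and `RunnablePassthrough()` (which fills `{question}` unchanged). Because the retriever returns a list of `Document` objects, the prompt appears to receive the string form of that list, metadata included. A small helper that joins only the page text gives the model a cleaner context. The sketch below uses a stand-in `Doc` class so it runs without LangChain; `Doc`, `format_docs`, and `build_prompt` are hypothetical names.

```python
from dataclasses import dataclass

TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n"
    "Question: {question}\n"
)


@dataclass
class Doc:
    """Stand-in for a retrieved document; only the text field is modeled."""
    page_content: str


def format_docs(docs: list[Doc]) -> str:
    """Join the text of each retrieved chunk, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(question: str, docs: list[Doc]) -> str:
    """Fill the template the way the chain's first two steps would."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


print(build_prompt("What skills are listed?", [Doc("Python, SQL"), Doc("LangChain")]))
```

In the notebook's own chain, a helper like this could be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, since LCEL accepts a plain function as a pipeline step.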

### 7. Ask a Question

In [18]:
question = "What is the main topic of the document?"
response = rag_chain.invoke(question)
print(response)

The main topic of the document appears to be the resume or CV of a person named Soham Mistry, highlighting his educational background, technical skills, certifications, work experience, and projects related to Artificial Intelligence (AI), Machine Learning (ML), and Data Science.


In [19]:
question2 = "What skills does this person have?"
response2 = rag_chain.invoke(question2)
print(response2)

Based on the provided context, the person, Soham Mistry, has the following skills:

**Technical Skills:**

1. Python
2. Data Science
3. Machine Learning
4. Deep Learning
5. Natural Language Processing (NLP)
6. Generative AI
7. Large Language Models (LLMs)
8. Reinforcement Agents (RAG)
9. AI Agents
10. Prompt Engineering
11. Agentic AI
12. Data Analysis
13. Data Visualization
14. SQL
15. Power BI
16. Tableau

**Frameworks:**

1. Scikit-learn
2. XGBoost
3. Tensorflow
4. LangChain
5. LangGraph
6. Flask
7. Django
8. FastAPI
9. Asyncio
10. Pydantic

**Non-Technical Skills:**

1. Leadership
2. Team Management
3. Event Planning
4. Event Management

**Languages:**

1. English
2. Hindi
3. Marathi
4. Sanskrit

Note that this list may not be exhaustive, as the provided context only includes a few documents that mention Soham's skills.
