# NLP, RAG, and Agent Challenge

**Test Summary:**

- Overview

  - This three-part exercise evaluates your ability to integrate natural language processing, retrieval-augmented generation (RAG), and custom agent design.
  - You will demonstrate end-to-end proficiency: from data ingestion and normalization, through building a RAG system, to orchestrating an intelligent “recruiting agent” that combines tools and resources.

  - This test is designed to assess your problem-solving approach, coding skills, and thought process. There are no right or wrong answers; follow your instincts and present your solutions.

**Important Guidelines:**

- Work Independently:

  - Do not rely on external generative AI assistants (e.g., ChatGPT, LLaMA) for code generation or design decisions.
  - Implement all solutions yourself, drawing on your own understanding and research.

- Maintain Data Privacy:

  - Keep all sample data and code within your local environment.
  - Do not share any files or information on public platforms (GitHub, Google Drive, etc.).

- Document Your Thought Process:

  - For each step, submit both your code and a concise write-up explaining your approach, design trade-offs, and any assumptions made.
  - Focus on clarity, modularity, and maintainability.

- Adhere to Best Practices:

  - Write clean, well-structured code with meaningful variable names and comments where needed.

## Step 1: **Load and Normalize the Resume Data**
- **Task**: Load the provided CSV file containing the resume data. Perform basic exploratory data analysis (EDA) to understand the structure and content of the data.
- **Normalization**: Use NLP techniques to clean and normalize the text data.
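
As one possible starting point for the normalization step, the sketch below cleans a single resume string using only the Python standard library. The function name `normalize_resume_text` and the specific cleaning rules (lowercasing, dropping URLs and e-mail addresses, keeping `+` and `#` so that tokens such as `c++` and `c#` survive) are illustrative assumptions, not requirements of the exercise.

```python
import html
import re
import unicodedata

# Patterns are compiled once and applied to already-lowercased text.
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
_DISALLOWED_RE = re.compile(r"[^a-z0-9+#\s]")  # assumption: keep + and # for skills like c++ / c#
_WHITESPACE_RE = re.compile(r"\s+")


def normalize_resume_text(text: str) -> str:
    """Return a cleaned, lowercased, single-spaced version of a resume string."""
    text = html.unescape(text)                  # decode entities such as &amp; or &nbsp;
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters (e.g. non-breaking space)
    text = text.lower()
    text = _URL_RE.sub(" ", text)               # drop links
    text = _EMAIL_RE.sub(" ", text)             # drop e-mail addresses
    text = _DISALLOWED_RE.sub(" ", text)        # drop remaining punctuation and symbols
    return _WHITESPACE_RE.sub(" ", text).strip()
```

With the CSV loaded into a pandas DataFrame, the function could be applied column-wise, for example `df["resume_text"].map(normalize_resume_text)`, where `resume_text` is a hypothetical column name; check the actual header during EDA.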

### Solution

### Explain the rationale behind your solution

## Step 2: **Build a RAG-based Candidate Recommendation System**
 
- **Task**: Generate embeddings for the resumes using a pre-trained model (e.g., BERT, Sentence Transformers, or any other embedding model).
- Build a RAG system that returns relevant candidates.
  - Implement search functionality: a user enters keywords, a new resume, or a job description, and the system retrieves the most similar resumes from the dataset based on the embeddings.
  - Build a `recommend(query: str, k: int) -> List[ResumeID]` function that:
    - Encodes the query into the same embedding space as the resumes.
    - Retrieves the top-k most similar resumes.
    - Returns their IDs, ranked from most to least relevant.
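
The retrieval steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: `make_bow_embedder` is a toy bag-of-words stand-in for a pre-trained embedding model, and `ResumeIndex`, `tokenize`, and `ResumeID = str` are hypothetical names. In a real solution the `embed` callable would wrap the chosen model, and the rest of the logic (encode the query, score by cosine similarity, keep the top k) would stay the same.

```python
import math
import re
from typing import Callable, Dict, Iterable, List

ResumeID = str          # assumption: resume IDs are strings
Vector = List[float]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9+#]+", text.lower())


def make_bow_embedder(corpus: Iterable[str]) -> Callable[[str], Vector]:
    """Toy embedder: term counts over the corpus vocabulary (stand-in for a real model)."""
    vocab: Dict[str, int] = {}
    for text in corpus:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))

    def embed(text: str) -> Vector:
        vec = [0.0] * len(vocab)
        for token in tokenize(text):
            index = vocab.get(token)
            if index is not None:  # tokens unseen in the corpus are ignored
                vec[index] += 1.0
        return vec

    return embed


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ResumeIndex:
    """Holds one precomputed embedding per resume and ranks them against a query."""

    def __init__(self, resumes: Dict[ResumeID, str], embed: Callable[[str], Vector]):
        self.embed = embed
        self.ids = list(resumes)
        self.vectors = [embed(resumes[rid]) for rid in self.ids]

    def recommend(self, query: str, k: int = 5) -> List[ResumeID]:
        query_vec = self.embed(query)  # same embedding space as the resumes
        scored = [(rid, cosine(query_vec, vec)) for rid, vec in zip(self.ids, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [rid for rid, _ in scored[: max(k, 0)]]


# Made-up sample data, for illustration only.
resumes = {
    "r1": "Data scientist with AWS and Python experience",
    "r2": "Pastry chef specializing in French desserts",
    "r3": "Frontend developer React TypeScript",
}
index = ResumeIndex(resumes, make_bow_embedder(resumes.values()))
top_two = index.recommend("python data scientist aws", k=2)
```

A linear scan is adequate for a small dataset; for larger collections, a vector index or vector database would replace the loop inside `recommend` without changing its signature.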

### Solution

### Explain the rationale behind your solution

## Step 3: **Create a “Recruiting Agent” with the RAG Tool**
- **Task**: Create a recruiting agent that uses the RAG system you implemented in Step 2.
- Use an agent orchestration framework (LangChain, LangGraph, SmolAgents, or a custom loop) and define a tool named `CandidateRecommender` that calls your RAG-based `recommend()` under the hood.

- The conversational agent can take user questions like:
  - “Which candidates best match this role: [JD text]?”
  - “Who are top data-scientist candidates with AWS experience?”
  - “Summarize [Person name]’s background.”

- **Additional information**
  - Feel free to add any other tools or resources to the agent that you judge necessary.
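
For the custom-loop option, the tool wiring might look like the sketch below. It is framework-free and makes several assumptions: `Tool`, `RecruitingAgent`, `make_candidate_recommender`, and `keyword_router` are hypothetical names, and `keyword_router` is a rule-based placeholder for the step where an LLM would normally decide which tool to call. Only the tool name `CandidateRecommender` and the `recommend(query, k)` call come from the task description.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def make_candidate_recommender(recommend: Callable[[str, int], List[str]], k: int = 3) -> Tool:
    """Wrap a recommend(query, k) function as the CandidateRecommender tool."""

    def run(query: str) -> str:
        ids = recommend(query, k)
        return ", ".join(ids) if ids else "no matching candidates"

    return Tool(
        name="CandidateRecommender",
        description="Return the top-k resume IDs most similar to a role description or keyword query.",
        run=run,
    )


def keyword_router(question: str, tools: Dict[str, Tool]) -> str:
    """Placeholder for LLM tool selection: route candidate-style questions to the recommender."""
    if re.search(r"\b(candidates?|match(es)?|role)\b", question, re.IGNORECASE):
        return "CandidateRecommender"
    return ""


class RecruitingAgent:
    """Single-step loop: choose a tool, run it, and return its observation."""

    def __init__(self, tools: List[Tool], choose_tool: Callable[[str, Dict[str, Tool]], str]):
        self.tools = {tool.name: tool for tool in tools}
        self.choose_tool = choose_tool

    def ask(self, question: str) -> str:
        tool = self.tools.get(self.choose_tool(question, self.tools))
        if tool is None:
            return "I don't have a tool for that request."
        return f"[{tool.name}] {tool.run(question)}"
```

An agent built this way is created with `RecruitingAgent([make_candidate_recommender(recommend)], keyword_router)`, where `recommend` is the function from Step 2. Requests such as summarizing a named person's background fall through to the fallback message until a suitable extra tool (for example, a resume lookup) is registered.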

### Solution

### Explain the rationale behind your solution