<div style="overflow: hidden;">
  <h1 style="float: left;">Envint Assessment</h1>
  <img src="/Users/dhruvyadav/Desktop/Assesments/Envint-Assignment/image.png" alt="image.png" height="50" width="50" style="float: right;" />
</div>

## **Introduction**

This project implements a web application that uses a Large Language Model (LLM) to extract key information from PDFs. It enables users to manage multiple projects, each with its own set of PDFs, and ensures that questions are answered solely based on the PDFs uploaded for the active project. The application supports creating, listing, and deleting projects, switching between projects without re-uploading PDFs, and providing accurate, context-aware responses. Built with Streamlit, FAISS, and Groq AI, it is optimized with caching mechanisms for performance and features a simple interface for usability.<br><br>
The complete code resides in `app.py`, which is organized into separate sections, one per component.

---

## **Libraries used**
- **Streamlit**: For UI
- **LangChain**: For developing the application
- **Groq**: For model inference (it provides very fast inference)
- **OpenAI**: For generating embeddings <i>(open-source embeddings from Hugging Face were an option, but my laptop does not have enough storage for them, and OpenAI embeddings are a safe default)</i>
- **OS**: For managing project directories
- **FAISS**: Vectorstore

---

## **Core Features**

1. **Environment Setup**
   - Loads API keys for GROQ (`GROQ_API_KEY`) and OpenAI (`OPENAI_API_KEY`) from environment variables.
   - Initializes the LLM in the same step.

2. **Project Management**
   - Allows creation, deletion, and selection of projects.
   - Projects have isolated directories (`projects/<project_name>`) to store PDFs and vector stores.
   - Session state (`st.session_state`) maintains project-specific data.

3. **PDF Handling**
   - Upload PDFs into the selected project's directory.
   - Process and load PDFs using `PyPDFDirectoryLoader`.
   - Split documents into manageable chunks with `RecursiveCharacterTextSplitter`.

4. **Vector Store**
   - Uses `FAISS` for vector storage and retrieval, backed by OpenAI embeddings.
   - Supports reloading previously saved vector stores for faster access.
   - Embeds split documents into vector space for retrieval.

5. **Retrieval Augmented Generation (RAG)**
   - Combines FAISS-based retrieval with a chat-based LLM model (`ChatGroq`) for answering user questions.
   - Uses a custom `ChatPromptTemplate` to ensure responses are contextually accurate.
   - Creates a `retrieval_chain` for answering user questions.

6. **Streamlit UI**
   - Sidebar for project management (create, delete, and switch projects).
   - Main area for uploading PDFs, initializing the vector store, and asking questions.
   - Dynamic updates to session state for seamless interactivity.


---

## **Structure**

1. **Environment Initialization**
- **LLM Setup**: Initializes the Groq-hosted Large Language Model (LLM) through LangChain's `ChatGroq` class.
- **Directory Structure**: Creates a `projects` directory to store project-specific PDFs and vector stores.
- **Session State**: Utilizes `st.session_state` to manage project-specific data and ensure seamless transitions between projects.
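
The key-loading step can be sketched as follows. This is a minimal illustration, not the code from `app.py`: the helper name `load_api_keys` and its fail-fast behaviour are assumptions, and only the two variable names (`GROQ_API_KEY`, `OPENAI_API_KEY`) come from this README.

```python
import os
from typing import Mapping

# The two variable names come from the README; everything else is illustrative.
REQUIRED_KEYS = ("GROQ_API_KEY", "OPENAI_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required API keys, raising a clear error if any is missing."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_KEYS}
```

Checking for the keys at startup surfaces a configuration problem immediately, rather than on the first embedding or LLM call.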



2. **Project Management**
- **Project Creation**: Allows users to create new projects, each with its own isolated directory (`projects/<project_name>`).
- **Project Deletion**: Enables deletion of projects, including all associated files and cached states.
- **Project Switching**: Supports dynamic switching between projects without requiring re-upload of PDFs or reinitialization of vector stores.

Key functions:
- `create_project`: Initializes a new project with the required directory structure.
- `delete_project`: Cleans up project files and session state for deleted projects.
- `set_current_project`: Updates session state to reflect the active project.
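
A rough sketch of how these three functions might look is shown below. The function names match the list above, but the signatures, the `list_projects` helper, the `project_data` and `current_project` keys, and the use of a plain `dict` in place of `st.session_state` are all assumptions made for illustration.

```python
import shutil
from pathlib import Path

PROJECTS_ROOT = Path("projects")


def create_project(name: str, root: Path = PROJECTS_ROOT) -> Path:
    """Create projects/<name>/pdfs and return the project directory."""
    project_dir = root / name
    (project_dir / "pdfs").mkdir(parents=True, exist_ok=True)
    return project_dir


def list_projects(root: Path = PROJECTS_ROOT) -> list[str]:
    """Return the names of all existing projects, sorted alphabetically."""
    if not root.exists():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def delete_project(name: str, state: dict, root: Path = PROJECTS_ROOT) -> None:
    """Remove the project's files and anything cached for it in `state`."""
    shutil.rmtree(root / name, ignore_errors=True)
    # `project_data` is a hypothetical key holding per-project cached objects.
    state.get("project_data", {}).pop(name, None)
    if state.get("current_project") == name:
        state["current_project"] = None


def set_current_project(name: str, state: dict) -> None:
    """Record the active project in the session-like state mapping."""
    state["current_project"] = name
```

In the Streamlit app, `st.session_state` would play the role of `state`, since it supports the same key-based access.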



3. **Retrieval-Augmented Generation (RAG)**
- **Document Handling**: 
  - PDF files are uploaded and processed using `PyPDFDirectoryLoader` to extract text content.
  - Text is split into manageable chunks with `RecursiveCharacterTextSplitter` to optimize embedding creation.
- **Vector Store**: 
  - Embeddings are generated using OpenAI's embedding models and stored in a FAISS vector store for efficient retrieval.
  - Existing vector stores are reused if available, avoiding redundant computations.
- **Retrieval Chain**:
  - Combines FAISS-based chunk retrieval with Groq AI's LLM for context-aware question answering.
  - Uses a custom `ChatPromptTemplate` to ensure answers are strictly based on the uploaded PDFs.
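
To make the retrieval step concrete, here is a brute-force stand-in for what the vector store does conceptually: rank chunk embeddings by similarity to the query embedding and keep the best `k`. It is not FAISS and not the app's code. The function names, the choice of cosine similarity, and the toy two-dimensional vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]],
          chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


# Toy example: real embeddings have hundreds or thousands of dimensions.
chunks = ["emissions data", "board members", "emissions targets"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
best = top_k([1.0, 0.0], vectors, chunks, k=2)
```

A vector index such as FAISS serves the same purpose but avoids scanning every vector by hand, which matters once a project holds many chunks.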



4. **User Interface (UI)**
- **Framework**: Built with **Streamlit** for an interactive and intuitive experience.
- **Features**:
  - **Sidebar**: Includes options for creating, listing, deleting, and switching projects.
  - **Main Area**: Allows users to upload PDFs, initialize vector stores, and ask questions.
  - **Caching**: Uses `@st.cache_resource` so that the LLM is not reinitialized and the vector store is not reloaded on every rerun.
- **Seamless Workflow**:
  - Users can easily upload PDFs, initialize the retrieval system, and query project-specific information.
  - All operations update dynamically based on session state, ensuring smooth navigation and interaction.
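
The idea behind the caching can be illustrated without Streamlit using the standard library's `functools.lru_cache`. This is only an analogy: `@st.cache_resource` keeps objects alive across Streamlit reruns, which `lru_cache` does not do on its own, and the `get_llm` function and its `model_name` parameter are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_llm(model_name: str) -> dict:
    """Stand-in for an expensive constructor; runs once per distinct argument."""
    # A real app would build and return the LLM client here.
    return {"model": model_name}


first = get_llm("demo-model")
second = get_llm("demo-model")  # served from the cache, no second construction
```

Both calls return the same object, so the expensive setup happens once; `get_llm.cache_info()` reports the hit and miss counts.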





---

## **Workflow**

1. **Project Management**
- User creates a new project from the sidebar.
- The project directory is initialized, and state is updated in `st.session_state`.

2. **PDF Upload**
- PDFs are uploaded via the file uploader and saved in the project's `pdfs` directory.
- Uploaded documents are parsed, split into chunks, and embedded into a vector store.
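
The chunking step can be pictured with a much simpler splitter than `RecursiveCharacterTextSplitter`. The sketch below cuts text into fixed-size windows that overlap, so a sentence falling on a boundary still appears whole in at least one chunk. It does not reproduce the recursive, separator-aware behaviour of the real splitter, and the default `chunk_size` and `chunk_overlap` values are assumptions, not the settings used in `app.py`.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping fixed-size character windows."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


pieces = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With these toy settings each chunk shares its last two characters with the start of the next one.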

3. **Question Answering**
- When a user asks a question, the system retrieves the most relevant chunks from the vector store.
- The LLM processes the retrieved context to generate a project-specific answer.
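
How the retrieved chunks reach the model can be sketched with plain string formatting. The prompt wording, the `PROMPT_TEMPLATE` constant, and the `build_prompt` helper below are illustrative assumptions. The app itself uses a `ChatPromptTemplate`, whose exact text is not reproduced in this README.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say that you don't know.

<context>
{context}
</context>

Question: {question}"""


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Insert the retrieved chunks and the user's question into the template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    "What is the reporting year?",
    ["The report covers fiscal year 2023.", "Figures are in metric tonnes."],
)
```

Because the context consists only of chunks from the active project's vector store, the instruction to answer from the context alone is what keeps answers project-specific.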

4. **Project Switching**
- Switching between projects dynamically loads the relevant vector store and documents without requiring re-upload.
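
The "load if saved, otherwise build and save" pattern behind project switching can be sketched as below. JSON on disk stands in for a persisted FAISS index, and the `load_or_build_index` helper, the `index.json` filename, and the `build` callback are hypothetical names used only for this illustration.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build_index(project_dir: Path, build: Callable[[], list]) -> list:
    """Reuse the saved index for a project, building it only on first use."""
    index_path = project_dir / "index.json"
    if index_path.exists():
        # Switching back to a project: no re-upload or re-embedding needed.
        return json.loads(index_path.read_text())
    index = build()  # expensive step: parse, split, and embed the PDFs
    project_dir.mkdir(parents=True, exist_ok=True)
    index_path.write_text(json.dumps(index))
    return index
```

Calling the function a second time for the same project returns the saved index without invoking `build` again, which is what makes switching cheap.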


---

## **Demo**

Watch the demo of the app here: <a href="https://www.youtube.com/watch?v=mOlA6L104-M">Link</a>

---