# **Gmail Responder**

## 📧 ➡️ 🤖 ➡️ 📜 ➡️ 🤖 ➡️ 📧 🔥

---

## Project Description:

Imagine you are a professor who has published many self-explanatory papers. You still receive frequent emails asking questions that the papers already answer, because readers overlook or misunderstand the relevant passages. Answering these emails by hand is time-consuming, so you decide to build a tool that does it for you.

### Objective:
Develop a system that automatically replies to emails asking questions about your papers, drawing its answers from the papers themselves.

### Approach:
1. **Retrieval-Augmented Generation (RAG)**: Implement RAG to fetch relevant information from your papers when prompted.
2. **Large Language Model (LLM) with an Agent**: Use an LLM to generate responses to emails that ask questions about your papers.
3. **Email Filtering**: Ignore emails that are not related to your papers.
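
The retrieval and filtering ideas above can be sketched with the standard library alone. This is a toy illustration, not the notebook's implementation: word-count vectors stand in for real embeddings, the helper names (`bag_of_words`, `cosine`, `retrieve`) are hypothetical, and the `0.2` threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], threshold: float = 0.2):
    """Return (best_passage, score), or (None, score) if nothing is relevant enough."""
    q = bag_of_words(question)
    scored = [(cosine(q, bag_of_words(p)), p) for p in passages]
    score, best = max(scored, default=(0.0, None))
    return (best, score) if score >= threshold else (None, score)


passages = [
    "KANs have learnable activation functions on edges instead of fixed activations on nodes.",
    "Every weight parameter is replaced by a univariate function parametrized as a spline.",
]

# A question about the paper finds a passage; an unrelated email does not.
on_topic = retrieve("Are the activation functions in KANs learnable?", passages)
off_topic = retrieve("Can we reschedule lunch to Friday?", passages)
print(on_topic[0])
print(off_topic[0])
```

An email whose best score falls below the threshold yields `None`, which is the signal to ignore it. Real embeddings capture meaning rather than exact word overlap, so the threshold would need tuning against actual emails.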

### Tools and Technologies:
- **LangChain**: For building the RAG system and integrating various components.
- **LlamaIndex's LlamaParse**: To read and parse the data from your papers.
- **OpenAI Embeddings**: For embedding the parsed data into a format suitable for retrieval.
- **LlamaIndex Ingestion Pipeline**: To ingest and process the data efficiently.
- **Pinecone Vector Store**: For storing and retrieving vectorized data.
- **LangChain Gmail Toolkit**: To interact with Gmail and manage email communications.
- **LangChain Agents**: To orchestrate the workflow and ensure smooth operation of the system.

### Workflow:
1. **Data Ingestion**: Use LlamaIndex's LlamaParse to read and parse the data from your papers.
2. **Data Embedding**: Utilize OpenAI embeddings to convert the parsed data into vectors.
3. **Data Storage**: Store the vectorized data in a Pinecone index for efficient retrieval.
4. **Email Handling**: Use LangChain Gmail Toolkit to fetch emails and filter out irrelevant ones.
5. **Information Retrieval**: Implement RAG to fetch relevant information from the stored data.
6. **Response Generation**: Use an LLM with an agent to generate appropriate responses to the emails.
7. **Email Response**: Send the generated responses back to the email sender using LangChain Gmail Toolkit.
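
Steps 4 to 7 of the workflow above can be sketched as one loop. This is only a skeleton under stated assumptions: `Email` and `answer_inbox` are hypothetical names, and the `retrieve`, `generate`, and `send` callables are placeholders for the RAG retriever, the LLM agent, and the Gmail toolkit respectively.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Email:
    sender: str
    subject: str
    body: str


def answer_inbox(
    emails: Iterable[Email],
    retrieve: Callable[[str], Optional[str]],
    generate: Callable[[str, str], str],
    send: Callable[[str, str, str], None],
) -> int:
    """Reply to each email for which retrieval finds context; return the reply count."""
    replied = 0
    for email in emails:
        context = retrieve(email.body)  # step 5: information retrieval
        if context is None:  # step 4: skip emails unrelated to the papers
            continue
        reply = generate(email.body, context)  # step 6: response generation
        send(email.sender, f"Re: {email.subject}", reply)  # step 7: email response
        replied += 1
    return replied


# Usage with in-memory fakes in place of the real services.
outbox = []
count = answer_inbox(
    [
        Email("a@example.com", "KAN question", "How do KAN activations work?"),
        Email("b@example.com", "Lunch", "Are you free on Friday?"),
    ],
    retrieve=lambda q: "KANs place learnable activations on edges." if "KAN" in q else None,
    generate=lambda q, ctx: f"According to the paper: {ctx}",
    send=lambda to, subject, body: outbox.append((to, subject, body)),
)
print(count, outbox)
```

Passing the three stages in as callables keeps the loop testable without network access; the real retriever, agent, and Gmail sender can be swapped in later without changing `answer_inbox`.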

---


In [None]:
# %pip install llama-index
# %pip install llama_parse
# %pip install -qU langchain-pinecone pinecone-notebooks
# %pip install llama-index-vector-stores-pinecone

### Import Libraries

In [2]:
# Load environment secrets
import os
from dotenv import load_dotenv
load_dotenv()

# OpenAI LLM wrapper (used here with GPT-3.5)
from llama_index.llms.openai import OpenAI

# To set LLM and embedding model
from llama_index.core import Settings

# LlamaParse to parse the data
from llama_parse import LlamaParse

# Allow nested asyncio event loops (needed inside Jupyter)
import nest_asyncio

# vector database
from pinecone import Pinecone, ServerlessSpec
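
Each service used in this notebook needs an API key, so it can help to fail fast when one is missing. The check below is a minimal sketch: `missing_keys` is a hypothetical helper, and the variable names are assumptions about how the `.env` file is laid out, so adjust them to match yours.

```python
import os

# Assumed variable names; change these to whatever your .env file actually uses.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY", "PINECONE_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```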

### Setting the LLM

In [3]:
# "gpt-3.5" is not a valid OpenAI model name; the chat model is "gpt-3.5-turbo"
Settings.llm = OpenAI(temperature=0.2, model="gpt-3.5-turbo")

### Using `LlamaParse`

In [8]:
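# Let LlamaParse's async calls run inside the notebook's already-running event loop
nest_asyncio.apply()

# Parse the PDF into markdown; LlamaParse reads LLAMA_CLOUD_API_KEY from the environment
documents = LlamaParse(result_type="markdown").load_data("./data/2404.19756v2.pdf")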

Started parsing the file under job_id ba256611-b5ef-4764-bc18-550ed07ce5dd
....

In [9]:
documents

[Document(id_='c3817680-07be-4554-b3de-cabd12e60470', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='# arXiv:2404.19756v2 [cs.LG] 2 May 2024\n\nKAN: Kolmogorov–Arnold Networks\n\nJames Halverson3,41 Massachusetts Institute of Technology\nZiming Liu1,4* Yixuan Wang2ci´1Sachin Vaidya1Marin Soljaˇ c ,4 Thomas Y. Hou2Fabian Ruehle3,4Max Tegmark1,4\n2 California Institute of Technology3 Northeastern University\n4 The NSF Institute for Artificial Intelligence and Fundamental Interactions\n\nAbstract\n\nInspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes (“neurons”), KANs have learnable activation functions on edges (“weights”). KANs have no linear weights at all – every weight parameter is replaced by a univariate function parametrized as a spline. We show that 

In [None]:
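# Assumes the key is stored as PINECONE_API_KEY in the .env file loaded earlier
pinecone_api_key = os.environ["PINECONE_API_KEY"]
pc = Pinecone(api_key=pinecone_api_key)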

pc.create_index(
    name="papers",
    dimension=1536,  # text-embedding-ada-002 returns 1536-dimensional vectors
    metric="euclidean",
    spec=ServerlessSpec(cloud="aws", region="us-west-2"),
)
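
The index above is created with `metric="euclidean"`. If the stored embeddings are unit-length (an assumption here; OpenAI describes its embeddings as normalized to length 1), the squared Euclidean distance equals `2 - 2 * cosine`, so Euclidean and cosine metrics rank neighbors in the same order. The toy vectors and helper names below are hypothetical and only illustrate that identity.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Dot product, which equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_euclidean(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = normalize([1.0, 2.0, 3.0])
docs = [normalize(v) for v in ([1.0, 2.0, 2.5], [3.0, 0.0, 0.0], [-1.0, 0.0, 1.0])]

# Rank documents from most to least similar under each metric.
by_cosine = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))
by_euclid = sorted(range(len(docs)), key=lambda i: sq_euclidean(query, docs[i]))
print(by_cosine, by_euclid)
```

If the vectors were not normalized, the two metrics could disagree, and the choice of metric would then matter for retrieval quality.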


In [None]:
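# Build the vector index in Pinecone (no metadata filter)
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Handle to the "papers" index created above
pinecone_index = pc.Index("papers")

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Embeds the parsed documents with Settings.embed_model and writes the vectors to Pinecone
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)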