### Setting Environment Variables

In [13]:
import os
from dotenv import load_dotenv
load_dotenv()
# os.environ['GROQ_API_KEY'] = os.getenv('GROQ_API_KEY')
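# load_dotenv() has already copied the keys from .env into os.environ, so this
# re-assignment is redundant; it raises TypeError if the key is missing.
os.environ['HUGGINGFACE_API_KEY'] = os.getenv('HUGGINGFACE_API_KEY')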
# os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
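
If a key is absent from the .env file, os.getenv returns None and the failure only surfaces later, often as an unclear error. A minimal sketch of a stricter lookup is shown below; the helper name require_env is an assumption made for illustration and is not part of python-dotenv.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message.

    `require_env` is a hypothetical helper, not a python-dotenv function.
    """
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable and an empty string.
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value


# Example usage (after load_dotenv() has run):
# hf_token = require_env("HUGGINGFACE_API_KEY")
```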

### Data Ingestion

In [22]:
#Data Ingestion
from langchain_community.document_loaders import TextLoader
loader = TextLoader("speech.txt")
text_data = loader.load()
text_data[0].page_content[:500]

'The world must be made safe for democracy. Its peace must be planted upon the tested foundations of political liberty. We have no selfish ends to serve. We desire no conquest, no dominion. We seek no indemnities for ourselves, no material compensation for the sacrifices we shall freely make. We are but one of the champions of the rights of mankind. We shall be satisfied when those rights have been made as secure as the faith and the freedom of nations can make them.\n\nJust because we fight withou'

In [24]:
from langchain_community.document_loaders import WebBaseLoader
import bs4

loader = WebBaseLoader(web_path="https://lilianweng.github.io/posts/2023-06-23-agent/",
                       bs_kwargs=dict(parse_only=bs4.SoupStrainer(
                           class_ =("post-title","post-content","post-header")
                                    )),
                       )
web_data = loader.load()
web_data[0].page_content[:500]

'\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn'

In [25]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("attention.pdf")
pdf_data = loader.load()
pdf_data[0]

Document(metadata={'source': 'attention.pdf', 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network arch

## Transform

In [26]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=200)
pdf_docs = text_splitter.split_documents(pdf_data)
pdf_docs[0]

Document(metadata={'source': 'attention.pdf', 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network arch
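
To build intuition for chunk_size and chunk_overlap, here is a simplified sliding-window sketch. It is not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and newlines); the function name sliding_window_chunks is an assumption used only for illustration.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: the real splitter prefers natural
    boundaries (paragraphs, lines, words) before falling back to characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is the role the 200-character overlap plays for the 1000-character chunks above: context that straddles a boundary appears in both neighbours.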

## Embedding using Ollama

In [27]:
import ollama
embeddings = []
for doc in pdf_docs:
    embedding = ollama.embeddings(model='nomic-embed-text', prompt=doc.page_content)
    embeddings.append(embedding)
# Subscript access matches the NomicEmbedText class used later in this notebook.
print(embeddings[0]['embedding'][:50])

[-0.21340374648571014, 0.7963298559188843, -3.3215370178222656, -0.9042803049087524, 1.1767783164978027, -1.0729793310165405, 1.2346901893615723, 0.7699195742607117, -0.46011823415756226, 0.6620697379112244, -1.078200340270996, 0.516563355922699, 1.6366974115371704, 0.7122877240180969, 0.5576314926147461, -0.028381891548633575, -0.6622126698493958, 0.8595151901245117, -0.6544469594955444, -0.4371986985206604, -1.1408793926239014, -0.35976892709732056, 0.3077172636985779, -0.11545611917972565, 0.6937441825866699, 0.5661580562591553, -0.7673529386520386, -0.5526708364486694, -0.9140962362289429, 0.46238458156585693, 1.0609550476074219, -1.0529851913452148, -0.7114295959472656, -0.43026867508888245, -1.1668598651885986, -0.3960408866405487, 0.6229568123817444, -0.5989586710929871, 0.05249446630477905, 1.6050783395767212, 0.6491849422454834, 0.45040571689605713, -0.5054923892021179, -0.817855954170227, 0.5923900604248047, -0.047030188143253326, 0.4535031318664551, 0.12213018536567688, 0.75

### Embedding using Hugging Face

In [28]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
huggingface_embeddings=HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",      #sentence-transformers/all-MiniLM-l6-v2
    model_kwargs={'device':'cpu'},
    encode_kwargs={'normalize_embeddings':True}
)
texts = [doc.page_content for doc in pdf_docs]

document_embeddings = huggingface_embeddings.embed_documents(texts)

print("Document Embeddings:", document_embeddings[0][0:50])
# query_embedding = huggingface_embeddings.embed_query(query)
# print("Query Embedding:", query_embedding)

Document Embeddings: [-0.047576695680618286, 0.007524257991462946, -0.0424019955098629, -0.034610260277986526, -0.01708778738975525, 0.030573129653930664, 0.003201359184458852, 0.015050658024847507, 0.06337112188339233, -0.00830100942403078, 0.006791823543608189, -0.023811908438801765, 0.04445257410407066, 0.08122535049915314, 0.03007696010172367, 0.01822604611515999, -0.039639756083488464, 0.037486735731363297, 0.04610450565814972, -0.053401101380586624, 0.052758313715457916, -0.045865725725889206, 0.002279018284752965, 0.010212023742496967, -0.023033063858747482, -0.024285677820444107, -0.015006880275905132, -0.033458247780799866, -0.046329524368047714, -0.23741863667964935, 0.02165311761200428, -0.03722285106778145, 0.07150194048881531, 0.010493944399058819, -0.033909671008586884, -0.03073902614414692, -0.05888020247220993, -0.03791927546262741, -0.06669481843709946, -0.005461020395159721, 0.00712302653118968, 0.02402227371931076, -0.008979959413409233, -0.032758165150880814, -0.032
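
Because the cell above sets normalize_embeddings to True, the returned vectors should have unit length, and for unit-length vectors the dot product equals the cosine similarity. The sketch below checks this on toy 2-D vectors; the helper names l2_normalize, dot, and cosine_similarity are assumptions for illustration, not library functions.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
# On normalized vectors the cheaper dot product gives the same score.
same = math.isclose(cosine_similarity(a, b), dot(l2_normalize(a), l2_normalize(b)))
print(same)
# → True
```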

## Storing Ollama Embeddings in VectorDB
FAISS.from_documents() and Chroma.from_documents() expect a list of document chunks together with an embeddings object. For that reason we implement a class with two methods: embed_documents embeds the chunks that we want to store and later retrieve for the LLM, and embed_query embeds each query we make.

In [8]:
from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores import Chroma
from langchain.embeddings.base import Embeddings
import ollama

class NomicEmbedText(Embeddings):
    def embed_documents(self, texts):
        embeddings = []
        for text in texts:
            embedding = ollama.embeddings(model='nomic-embed-text', prompt=text)
            embeddings.append(embedding['embedding'])  
        return embeddings

    def embed_query(self, text):
        embedding = ollama.embeddings(model='nomic-embed-text', prompt=text)
        return embedding['embedding']

nomic_embedder = NomicEmbedText()

# db = FAISS.from_documents(pdf_docs[:15], nomic_embedder)
db1 = Chroma.from_documents(pdf_docs[:15], nomic_embedder)
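
The contract that the vector store relies on is small: embed_documents takes a list of strings and returns one vector per string, and embed_query takes one string and returns one vector. The sketch below is a toy stand-in that follows the same two-method shape without needing Ollama; ToyHashEmbedder and its hashing scheme are assumptions for illustration only, it does not inherit from LangChain's Embeddings base class, and its vectors carry no real semantic meaning.

```python
import hashlib


class ToyHashEmbedder:
    """Hypothetical offline embedder with the same two-method interface.

    Each token is hashed into one of `dim` buckets and counted, so the
    output is a crude bag-of-words vector, not a learned embedding.
    """

    def __init__(self, dim=16):
        self.dim = dim

    def _embed(self, text):
        vector = [0.0] * self.dim
        for token in text.lower().split():
            # sha256 is used because Python's built-in hash() of strings
            # changes between interpreter runs.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.dim
            vector[index] += 1.0
        return vector

    def embed_documents(self, texts):
        # One vector per input chunk, in the same order.
        return [self._embed(text) for text in texts]

    def embed_query(self, text):
        # A single vector for a single query string.
        return self._embed(text)


toy = ToyHashEmbedder(dim=8)
vectors = toy.embed_documents(["attention is all you need", "recurrent networks"])
print(len(vectors), len(vectors[0]))
# → 2 8
```

Embedding the query with the same model as the documents matters: vectors from different models live in different spaces and cannot be compared meaningfully.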

### Querying the VectorDB

In [9]:
query = "what are Recurrent Neural Networks?"
retrieved_results = db1.similarity_search(query)
print(retrieved_results[0].page_content)

1 Introduction
Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks
in particular, have been firmly established as state of the art approaches in sequence modeling and
transduction problems such as language modeling and machine translation [ 35, 2, 5]. Numerous
efforts have since continued to push the boundaries of recurrent language models and encoder-decoder
architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden
states ht, as a function of the previous hidden state ht−1 and the input for position t. This inherently
sequential nature precludes parallelization within training examples, which becomes critical at longer
sequence lengths, as memory constraints limit batching across examples. Recent work has achieved
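
Conceptually, a similarity search embeds the query, scores it against every stored vector, and returns the best matches. The brute-force sketch below illustrates that idea on toy vectors; Chroma and FAISS use their own index structures and distance metrics, so this is an illustration rather than their implementation, and the names cosine_similarity and top_k are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, doc_vectors, k=4):
    """Return (index, score) pairs for the k most similar stored vectors."""
    scored = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(doc_vectors)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k([1.0, 0.1], doc_vectors, k=2)
print([index for index, _ in ranked])
# → [0, 2]
```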


## Storing HuggingFace Embeddings in VectorDB

In [10]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
huggingface_embeddings=HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",      #sentence-transformers/all-MiniLM-l6-v2
    model_kwargs={'device':'cpu'},
    encode_kwargs={'normalize_embeddings':True}
)
texts = [doc.page_content for doc in pdf_docs]

db2 = FAISS.from_documents(pdf_docs[:15], huggingface_embeddings)
# db = Chroma.from_documents(pdf_docs[:15], huggingface_embeddings)

query = "what are Recurrent Neural Networks?"
retrieved_results = db2.similarity_search(query)
print(retrieved_results[0].page_content)

1 Introduction
Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks
in particular, have been firmly established as state of the art approaches in sequence modeling and
transduction problems such as language modeling and machine translation [ 35, 2, 5]. Numerous
efforts have since continued to push the boundaries of recurrent language models and encoder-decoder
architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden
states ht, as a function of the previous hidden state ht−1 and the input for position t. This inherently
sequential nature precludes parallelization within training examples, which becomes critical at longer
sequence lengths, as memory constraints limit batching across examples. Recent work has achieved


### Using Remote Inference models from Hugging Face
Here we run an LLM (inference model) on a remote server instead of pulling it and running it on our local system.

In [17]:
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.getenv('HUGGINGFACE_API_KEY'))

messages = [
    {"role": "user", "content": "hi there"}
]

stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=messages,
    temperature=0.5,
    max_tokens=2048,
    top_p=0.7,
    stream=True
)

for chunk in stream:
    # delta.content can be None on some chunks; fall back to "" so "None" is never printed.
    print(chunk.choices[0].delta.content or "", end="")

 Hello! How can I help you today? If you have any questions or need assistance with something, feel free to ask. I'm here to help. If you just want to chat, we can do that too. What's on your mind?

### Using Remote Inference models from GroqCloud

In [18]:
from groq import Groq

messages = [
    {"role": "user", "content": "hi there"}
]
client = Groq()
completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    temperature=1,
    max_tokens=1024,
    top_p=1,
    stream=True,
    stop=None,
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")

It's nice to meet you. Is there something I can help you with or would you like to chat?
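
Both streaming loops above print each piece as it arrives. When the full reply is needed as a string, the pieces can be collected instead. The sketch below simulates a stream with stand-in objects so that it runs without an API key; make_chunk and collect_stream are hypothetical helpers, and the assumption that some chunks may carry no content (None) or no choices is handled defensively rather than guaranteed by either client.

```python
from types import SimpleNamespace


def make_chunk(content):
    """Build a stand-in object shaped like chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(stream):
    """Join the text of all streamed chunks into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        # Treat a missing piece of text as an empty string.
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


# Simulated stream; a real one would come from client.chat.completions.create(..., stream=True).
fake_stream = [make_chunk("Hello"), make_chunk(", world"), make_chunk(None)]
reply = collect_stream(fake_stream)
print(reply)
# → Hello, world
```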