In [2]:
import os
from dotenv import load_dotenv

load_dotenv() 

GROQ_API_KEY = os.getenv("GROQ_API_KEY")

if not GROQ_API_KEY:
    raise RuntimeError("""
GROQ_API_KEY not found.

Create a .env file with:
GROQ_API_KEY=your_key_here
""")
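The cell above stops at the first missing key. The sketch below generalises the same check to several variables, so that a single error lists everything absent from `.env`. The helper name `require_env` is hypothetical and not part of python-dotenv. Like `if not GROQ_API_KEY`, it treats empty strings as missing.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or fail listing every missing one.

    Hypothetical helper. Empty strings count as missing, mirroring `if not GROQ_API_KEY`.
    """
    values = {name: os.getenv(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + "\nAdd them to your .env file as NAME=value"
        )
    return values


# Example (commented out so it does not raise when the variables are absent):
# config = require_env("GROQ_API_KEY", "CHUNK_SIZE", "CHUNK_OVERLAP")
```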


In [3]:
from groq import Groq

client = Groq(api_key=GROQ_API_KEY)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[
        {
            "role": "user",
            "content": "main subjects in GATE DA syllabus ?"
        }
    ],
    temperature=0.1
)

print(response.choices[0].message.content)



The GATE (Graduate Aptitude Test in Engineering) DA (Defense Aerospace) syllabus is a broad and comprehensive one, covering various subjects related to aerospace engineering. Here are the main subjects typically covered in the GATE DA syllabus:

1. **Aerospace Engineering**:
	* Aerodynamics: subsonic, supersonic, and hypersonic flows, boundary layers, drag, lift, and thrust.
	* Aerothermodynamics: heat transfer, thermal protection systems, and plasma dynamics.
	* Propulsion: rocket propulsion, jet propulsion, ramjet, scramjet, and air-breathing propulsion.
	* Spacecraft Design: structural, thermal, and propulsion systems.
2. **Thermodynamics**:
	* Laws of thermodynamics, thermodynamic properties, and thermodynamic cycles.
	* Heat transfer: conduction, convection, and radiation.
	* Refrigeration and air conditioning systems.
3. **Fluid Mechanics**:
	* Fluid properties, fluid statics, and fluid kinematics.
	* Fluid dynamics: Bernoulli's equation, Euler's equation, and Navier-Stokes equat

In [15]:
from langchain_core.runnables import RunnableLambda

def groq_llm(prompt: str):
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1
    )
    return response.choices[0].message.content

llm = RunnableLambda(groq_llm)
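`RunnableLambda` returns whatever the wrapped function returns. Here, `llm.invoke(...)` therefore yields a plain `str`, not the message object with a `.content` attribute that LangChain chat models such as `ChatGroq` return. If a chat model might be swapped in later, a small normalizer keeps downstream code working with either kind of result. `to_text` below is a hypothetical sketch, not a LangChain API. `SimpleNamespace` stands in for a real message object.

```python
from types import SimpleNamespace


def to_text(result) -> str:
    """Return the text of an LLM result, whether it is a plain str or a message object."""
    if isinstance(result, str):
        # RunnableLambda(groq_llm) passes through groq_llm's return value: a str
        return result
    content = getattr(result, "content", None)
    if isinstance(content, str):
        # chat-model messages (e.g. AIMessage) carry their text in .content
        return content
    raise TypeError(f"Cannot extract text from {type(result).__name__}")


# SimpleNamespace mimics a chat-model message; only .content is read
print(to_text("plain answer"))                         # prints: plain answer
print(to_text(SimpleNamespace(content="msg answer")))  # prints: msg answer
```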


In [4]:
from dotenv import load_dotenv
import os

load_dotenv()

MODEL_NAME = os.getenv("OLLAMA_MODEL")
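# int(None) would raise TypeError if a variable is missing from .env, so fall back to defaults
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "500"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "50"))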

MODEL_NAME, CHUNK_SIZE, CHUNK_OVERLAP


(None, 500, 50)

In [5]:
from langchain_community.document_loaders import TextLoader
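# Read explicitly as UTF-8: the file contains non-ASCII characters such as ∗ and Ł
loader = TextLoader("data/sample.txt", encoding="utf-8")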
documents = loader.load()
print(documents[0].page_content)


Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Llion Jones∗
Google Research
llion@google.com
NoamShazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Aidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.edu
Jakob Uszkoreit∗
Google Research
usz@google.com
Łukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗ ‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to
be superior in quality while being more parallelizable and requiring significantly
less time to train. Our model

In [6]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

chunks = text_splitter.split_documents(documents)

len(chunks), chunks[0].page_content


(72,
 'Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nNoamShazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract')
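Here `chunk_size` and `chunk_overlap` are measured in characters, because the splitter's default length function is `len`. The overlap explains why Chunk 2 in the next cell's output begins by repeating the last lines of Chunk 1. `RecursiveCharacterTextSplitter` itself first tries to split on paragraph breaks, then newlines, then spaces. The sketch below deliberately ignores separators and shows only how size and overlap interact in a plain sliding window. It is a simplified stand-in (hypothetical helper `sliding_window_chunks`), not the LangChain algorithm.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: it ignores separators,
    so chunks may cut words in half.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


example = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(example)  # ['abcd', 'defg', 'ghij']: each chunk repeats the last character of the previous one
```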

In [7]:
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n", chunk.page_content, "\n")


Chunk 1:
 Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Llion Jones∗
Google Research
llion@google.com
NoamShazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Aidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.edu
Jakob Uszkoreit∗
Google Research
usz@google.com
Łukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗ ‡
illia.polosukhin@gmail.com
Abstract 

Chunk 2:
 illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to 

Chunk 3:
 be superior in quality while being more pa

In [8]:
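# The langchain_community class is deprecated; the maintained one ships in the
# langchain-huggingface package (pip install langchain-huggingface)
from langchain_huggingface import HuggingFaceEmbeddings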

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

embeddings.embed_query("Test embedding")




[0.0035622562281787395,
 -0.040075987577438354,
 0.01275063119828701,
 -0.003927467856556177,
 0.05375683680176735,
 0.044486720114946365,
 -0.024579839780926704,
 -0.01569996029138565,
 -0.052196353673934937,
 -0.03485812619328499,
 0.03438963368535042,
 -0.028950180858373642,
 0.0553855374455452,
 0.04806943237781525,
 -0.05770893767476082,
 -0.012370922602713108,
 0.07081103324890137,
 0.07332155853509903,
 -0.048683639615774155,
 -0.010235898196697235,
 -0.06322772800922394,
 -0.02790844440460205,
 0.03950728848576546,
 -0.045231208205223083,
 0.015962865203619003,
 -0.025239691138267517,
 -0.016029199585318565,
 0.04795804247260094,
 0.0657644122838974,
 -0.06304241716861725,
 0.13697513937950134,
 -0.03624489903450012,
 -0.04366994649171829,
 0.07229743897914886,
 0.05494534596800804,
 0.02193175069987774,
 0.0012016239343211055,
 -0.009898477233946323,
 -0.0523470863699913,
 0.04692899063229561,
 0.015536223538219929,
 -0.008105807937681675,
 0.04297666996717453,
 0.028818568214
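`embed_query` returns a plain list of floats (384 of them for all-MiniLM-L6-v2). Retrieval then compares such vectors with a similarity or distance measure. Cosine similarity, sketched below in pure Python, is a common choice for sentence embeddings. This is illustrative only: the vector store computes distances itself, and its metric is a configuration setting of the store.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

With the notebook's objects, `cosine_similarity(embeddings.embed_query("self-attention"), embeddings.embed_query("attention mechanism"))` would score two phrases against each other. The exact value depends on the model.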

In [9]:
from langchain_community.vectorstores import Chroma
import shutil
import os

if os.path.exists("./chroma_db"):
    shutil.rmtree("./chroma_db")

vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)


In [10]:
retriever = vectordb.as_retriever(
    search_kwargs={"k": 9}
)
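With `search_kwargs={"k": 9}`, the retriever returns the nine chunks (out of 72) whose embeddings are closest to the query embedding. Conceptually, this is a nearest-neighbour search. The brute-force version below scores every stored vector with cosine similarity and keeps the best `k`; the helper `top_k` is hypothetical. Chroma uses an approximate HNSW index rather than scanning every vector, so this shows the idea, not Chroma's implementation.

```python
import heapq
import math


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Return (index, cosine similarity) for the k stored vectors closest to query_vec.

    Brute force: scores every vector. Assumes no vector is all zeros.
    """
    q_norm = math.sqrt(sum(x * x for x in query_vec))

    def score(vec: list[float]) -> float:
        dot = sum(x * y for x, y in zip(query_vec, vec))
        return dot / (q_norm * math.sqrt(sum(y * y for y in vec)))

    scored = ((i, score(vec)) for i, vec in enumerate(doc_vecs))
    # nlargest keeps only k candidates instead of sorting every score
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 2-D "embeddings"; real sentence embeddings have many more dimensions
docs_2d = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], docs_2d, k=2))  # index 1 (identical direction) first, then index 2
```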


In [11]:
query = "Why Self-Attention?"
docs = retriever.invoke(query)

for i, doc in enumerate(docs):
    print(f"Result {i+1}:\n", doc.page_content, "\n")



Result 1:
 reduced to a constant number of operations, albeit at the cost of reduced effective resolution due
to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as
described in section 3.2.
Self-attention, sometimes called intra-attention is an attention mechanism relating different positions
of a single sequence in order to compute a representation of the sequence. Self-attention has been 

Result 2:
 PEpos.
Wealso experimented with using learned positional embeddings [8] instead, and found that the two
versions produced nearly identical results (see Table 3 row (E)). We chose the sinusoidal version
because it may allow the model to extrapolate to sequence lengths longer than the ones encountered
during training.
4 WhySelf-Attention
In this section we compare various aspects of self-attention layers to the recurrent and convolu 

Result 3:
 the approach we take in our model.
Asside benefit, self-attention could yield more interpretable models.

In [12]:
from langchain_core.prompts import PromptTemplate


prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
You are an assistant answering questions from the GATE DA syllabus.
Ignore repeated passages in the context.
Use ONLY the information provided below.
If the answer is not present, say "Not found in syllabus."

Context:
{context}

Question:
{question}

Answer:
"""
)


In [13]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
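The prompt asks the model to ignore repetitions, but exact duplicates can also be removed before they reach the prompt. The variant of `format_docs` below keeps the first copy of each passage and preserves retrieval order. The name `format_docs_dedup` is hypothetical. Unlike `format_docs`, it also strips surrounding whitespace and skips empty passages. It only catches exact duplicates, not the partial overlap that `chunk_overlap` introduces between neighbouring chunks.

```python
from types import SimpleNamespace


def format_docs_dedup(docs) -> str:
    """Join page_content of docs with blank lines, skipping exact duplicate passages."""
    seen = set()
    parts = []
    for doc in docs:
        text = doc.page_content.strip()
        if text and text not in seen:  # keep only the first occurrence
            seen.add(text)
            parts.append(text)
    return "\n\n".join(parts)


# SimpleNamespace stands in for LangChain's Document; only .page_content is used
sample = [SimpleNamespace(page_content=t) for t in ["A", "B", "A", "  B  ", "C"]]
print(format_docs_dedup(sample))  # A, B and C, separated by blank lines
```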


In [16]:
query = "Why Self-Attention?"
docs = retriever.invoke(query)  
context = format_docs(docs)

formatted_prompt = prompt.format(
    context=context,
    question=query
)
response = llm.invoke(formatted_prompt)
print(response)

Self-attention could yield more interpretable models. We inspect attention distributions from our models and present and discuss examples in the appendix. Not only do individual attention heads clearly learn to perform different tasks, many appear to exhibit behavior related to the syntactic and semantic structure of the sentences.


In [17]:
def ask_syllabus(query):
    docs = retriever.invoke(query)
    context = format_docs(docs)
    prompt_text = prompt.format(context=context, question=query)
    # llm wraps groq_llm in a RunnableLambda, so invoke() already returns a plain str
    # (a str has no .content attribute)
    return llm.invoke(prompt_text)


In [18]:
ask_syllabus("Why Self-Attention?")