# GenAI, RAG, Agents, Deployment

### 1: What is an LLM :

* An LLM (large language model) is a model trained on a very large amount of text. It can interpret natural language, generate it, and follow instructions.

* Think of an LLM as a text-in, text-out machine: it reads your question together with its context and produces a human-like answer.

* Text is one of the most common kinds of data in the real world, which is why LLMs have become such powerful tools.

* LLMs can be used for chatbots, summarization, code generation, translation, Q&A, and automation.

* Their strength is that they can handle many tasks without any task-specific training.

* LLMs are now among the strongest models for reasoning, creative writing, and context handling.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch  # PyTorch backend, needed for return_tensors="pt"

# Name of the model to load
model_name = "gpt2"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prompt
prompt = "Explain what a Large Language Model is."

# Convert the prompt into tokens
inputs = tokenizer(prompt, return_tensors="pt")

# Generate output
output_tokens = model.generate(
    **inputs,
    max_length=50,    # total length (prompt + output) capped at 50 tokens
    do_sample=True,   # temperature only takes effect when sampling is enabled
    temperature=1.0,  # controls randomness
    pad_token_id=tokenizer.eos_token_id  # GPT-2 has no dedicated pad token
)

# Decode the tokens back into text
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))


* `model_name` is the name of the model to load (here, `gpt2`).

* The tokenizer is loaded; it converts text into numbers (token IDs).

* The model is loaded; it generates the output sequence.

* The prompt is the input that guides the model.

* Tokenization puts the text into a form the model can process.

* The model generates the next tokens based on the input.

* Decoding converts the numbers back into text.
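
The encode → generate → decode steps listed above can be sketched without any library. This is a toy illustration only: a hand-written lookup table stands in for the model, and every name in it (`VOCAB`, `NEXT_TOKEN`, `encode`, `generate`, `decode`) is hypothetical. A real LLM computes a probability for every token in its vocabulary instead of looking up a single fixed successor.

```python
# Toy illustration only: a lookup table stands in for the model.
# All names here are hypothetical and not part of any library.

VOCAB = ["<eos>", "a", "language", "model", "predicts", "the", "next", "token"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

# "Model": for each token ID, the ID it treats as the most likely successor.
NEXT_TOKEN = {
    TOKEN_TO_ID["a"]: TOKEN_TO_ID["language"],
    TOKEN_TO_ID["language"]: TOKEN_TO_ID["model"],
    TOKEN_TO_ID["model"]: TOKEN_TO_ID["predicts"],
    TOKEN_TO_ID["predicts"]: TOKEN_TO_ID["the"],
    TOKEN_TO_ID["the"]: TOKEN_TO_ID["next"],
    TOKEN_TO_ID["next"]: TOKEN_TO_ID["token"],
    TOKEN_TO_ID["token"]: TOKEN_TO_ID["<eos>"],
}


def encode(text):
    """Text -> list of token IDs (whitespace split, lowercase)."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]


def generate(input_ids, max_new_tokens):
    """Append one predicted token at a time until <eos> or the limit."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        next_id = NEXT_TOKEN.get(ids[-1], TOKEN_TO_ID["<eos>"])
        if next_id == TOKEN_TO_ID["<eos>"]:
            break
        ids.append(next_id)
    return ids


def decode(ids):
    """List of token IDs -> text."""
    return " ".join(VOCAB[i] for i in ids)


prompt_ids = encode("a language")
output_ids = generate(prompt_ids, max_new_tokens=10)
print(prompt_ids)          # [1, 2]
print(decode(output_ids))  # a language model predicts the next token
```

The loop in `generate` is the part that carries over to real models: each new token is predicted from what came before, appended, and fed back in.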

### 2: Prompting :

* Prompting means giving the model the right instruction, the same way you would give instructions to a person.

* It is a technique for telling the model what to do. The clearer the prompt, the better the output tends to be.

* An LLM's behavior is driven largely by its prompt: a well-formed prompt gives a meaningful result.

* Prompt engineering has become a skill in its own right for improving output quality.

* This is how a single model can be made to perform multiple tasks.

* Many tasks work without fine-tuning; the prompt alone controls the behavior.
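
To make the "one model, many tasks" point concrete, here is a minimal sketch of a prompt builder. The helper `build_prompt` and the `TASK_INSTRUCTIONS` table are hypothetical names introduced for illustration; nothing here calls a model, it only assembles the instruction text that would be sent to one.

```python
# Hypothetical helper: one template, several tasks.
# Only the instruction line changes; the model would stay the same.

TASK_INSTRUCTIONS = {
    "summarize": "Summarize the text below in one sentence.",
    "translate": "Translate the text below into French.",
    "classify": "Classify the sentiment of the text below as positive or negative.",
}


def build_prompt(task, text):
    """Return a full prompt string for the given task and input text."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"Unknown task: {task!r}")
    return (
        "You are a helpful assistant.\n"
        f"{TASK_INSTRUCTIONS[task]}\n"
        "\n"
        f"Text: {text}"
    )


review = "The battery lasts all day and the screen is sharp."
for task in TASK_INSTRUCTIONS:
    print(build_prompt(task, review))
    print("---")
```

Each resulting string could be passed to the same model in place of a hand-written prompt.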

In [1]:
prompt = """
You are a helpful assistant.
Explain tokenization in simple terms.
Give a short example.
"""

print(prompt)



You are a helpful assistant.
Explain tokenization in simple terms.
Give a short example.



### 3: Temperature :

* Temperature controls the randomness of the model's output.

* You can think of it as a creativity level.

* At low temperature the model sticks to its most likely tokens, so replies are more predictable and consistent.

* At high temperature the model's output is more varied and creative.

* Use a low temperature when you need controlled generation.

* Use a high temperature when you need creativity.

* Temperature also shapes how a dialogue feels: low values give consistent replies, high values give more varied ones.

* Different tasks work best with different temperature values. As a rough guide:

* 0 to 0.3 → predictable output

* 0.7 → balanced

* 1.0+ → creative output

In [None]:
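output_tokens = model.generate(
    **inputs,
    max_length=60,
    do_sample=True,   # temperature only takes effect when sampling is enabled
    temperature=0.2   # low value → more predictable output
)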
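
To see why a low temperature gives predictable output, it helps to look at the usual definition: the model's raw scores (logits) are divided by the temperature before the softmax turns them into probabilities. The sketch below uses made-up logits for three candidate tokens, and `softmax_with_temperature` is an illustrative helper, not a library function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.2 almost all probability lands on the top-scoring token, while at 1.5 the three candidates move closer together, so sampling picks the less likely tokens more often.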

### 4: Tokens :

* A token is a small piece of text. The model does not read raw text directly; it first converts the text into tokens.

* A token is often a whole word or a part of a word. Each token is mapped to a number, and the model predicts the next token from those numbers.

* API usage is typically billed per token.

* The token limit (context window) determines how long a prompt you can give.

* Tokenization is the first stage of the model's input pipeline.

* Subword tokenization works somewhat like compression: frequent character sequences become single tokens, which keeps sequences short and the model efficient.

* Example :

    - "Hello world" might be → ["Hello", " world"]
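
The points about token IDs, token limits, and per-token cost can be sketched with a toy tokenizer. This is not how real tokenizers such as BPE work; it only splits on spaces so that it reproduces the `"Hello world"` example above. The context limit and the price are made-up values, and all function names are hypothetical.

```python
import re


def toy_tokenize(text):
    """Split text into words, each keeping its leading space."""
    return re.findall(r" ?\S+", text)


def assign_ids(tokens):
    """Map each distinct token to an integer ID, in order of first use."""
    vocab = {}
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
        ids.append(vocab[tok])
    return ids, vocab


def fits_context(num_tokens, context_limit):
    """True if the prompt is within the model's token limit."""
    return num_tokens <= context_limit


def estimate_cost(num_tokens, price_per_million_tokens):
    """Cost of a request when billing is per token."""
    return num_tokens / 1_000_000 * price_per_million_tokens


tokens = toy_tokenize("Hello world")
ids, vocab = assign_ids(tokens)
print(tokens)  # ['Hello', ' world']
print(ids)     # [0, 1]

# Hypothetical limit and price, for illustration only
print(fits_context(len(tokens), context_limit=1024))
print(estimate_cost(len(tokens), price_per_million_tokens=0.50))
```

With a real tokenizer the token count is usually higher than the word count, because rare words are split into several pieces.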

### 5: Use models via OpenRouter :

* OpenRouter is an API service that offers many models in one place, such as GPT, Claude, Llama, and Mixtral.

* It works like a marketplace where a single API gives access to multiple LLMs.


In [None]:
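import requests

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    # Replace YOUR_API_KEY with your own OpenRouter key
    "Authorization": "Bearer YOUR_API_KEY"
}

data = {
    # Model ID as listed on OpenRouter; check the current model list
    "model": "meta-llama/llama-3-8b-instruct",
    "messages": [
        {"role": "user", "content": "Explain LLM basics"}
    ]
}

response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()  # fail early on HTTP errors such as 401
print(response.json())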
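
Printing the whole JSON is fine for a first look, but usually only the reply text is needed. The sketch below assumes the response follows the OpenAI-style chat completion shape (`choices[0].message.content`), which OpenRouter is compatible with. `extract_reply` is a hypothetical helper, and `sample_response` is a hand-written example, not real API output.

```python
def extract_reply(payload):
    """Pull the assistant's text out of a chat completion response dict."""
    if "error" in payload:
        message = payload["error"].get("message", "unknown error")
        raise RuntimeError(f"API error: {message}")
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the assumed shape; not real API output.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "An LLM predicts the next token.",
            }
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12},
}

print(extract_reply(sample_response))
```

With a live request, the same helper would be called as `extract_reply(response.json())`.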
