## **Gen-AI and Agentic AI - Dev Bhushan Sir**

# Personal AI Assistant in Google Colab (No OpenAI Key Needed)
This notebook will help you and your students build a simple **AI Assistant** using open-source models (no API key required).

*Notes*
1. GenAI: Focuses on creating new content, such as text, images, or code.
2. Agentic AI: Designed to make decisions and take actions autonomously to achieve specific goals.
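The two notes can be contrasted with a toy sketch. Nothing here calls a real model or agent framework: `generate` is a hypothetical stand-in for a generative model (prompt in, content out), and `agent` is a hypothetical loop that checks its goal, picks a tool, acts, and repeats on its own until the goal is met.

```python
from typing import List


def generate(prompt: str) -> str:
    """GenAI-style step (stubbed): turn a prompt into new content."""
    return f"Draft text about: {prompt}"


def agent(goal: int, start: int = 0, max_steps: int = 10) -> List[str]:
    """Agentic-style loop (toy): observe, decide, act, repeat until the goal is reached."""
    tools = {
        "add_5": lambda x: x + 5,
        "add_1": lambda x: x + 1,
    }
    state, log = start, []
    for _ in range(max_steps):
        if state == goal:  # goal check: the agent decides by itself when to stop
            log.append("done")
            break
        # Decision rule: take the big step while it does not overshoot the goal
        action = "add_5" if goal - state >= 5 else "add_1"
        state = tools[action](state)  # act: change the state through a tool
        log.append(action)
    return log


print(generate("photosynthesis"))  # one-shot content creation
print(agent(goal=7))               # ['add_5', 'add_1', 'add_1', 'done']
```

The generative step produces output once and stops; the agent keeps choosing actions based on its current state, which is the "autonomous decisions and actions" part of the second note.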


**Step 1: Install Transformers**
We will use the Hugging Face Transformers library (together with PyTorch and Accelerate) to run free, open-source AI models.

In [1]:
!pip install transformers torch accelerate -q
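After installing, one quick way to confirm the packages are present is to read their versions with the standard-library `importlib.metadata` module. This is a small sketch; `installed_version` is a hypothetical helper name, not part of any of the installed libraries.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for pkg in ("transformers", "torch", "accelerate"):
    print(f"{pkg:<13}", installed_version(pkg) or "NOT INSTALLED")
```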

**Step 2: Load a Model**
We will use Google's FLAN-T5 Small model (`google/flan-t5-small`) for Q&A and chat.

In [2]:
from transformers import pipeline

# Load a small open-source model
assistant = pipeline('text2text-generation', model='google/flan-t5-small')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu


**Step 3: Build a Chat Function**
This function stores the conversation in a global `chat_history` list and sends the most recent lines to the model as context.

In [3]:
chat_history = []

def ask_ai(user_input):
    global chat_history
    # Add user input to history
    chat_history.append(f'User: {user_input}')

    # Use the last 5 history lines as context (user and assistant lines count separately)
    context = '\n'.join(chat_history[-5:])

    # Generate response
    response = assistant(context)[0]['generated_text']

    # Save response
    chat_history.append(f'Assistant: {response}')
    return response
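As a hypothetical alternative to the global list and slicing above, a `collections.deque` with `maxlen` drops old lines automatically. This sketch also ends the prompt with an `Assistant:` cue, which the original `ask_ai` does not do; that the cue helps the model answer in role is an assumption. `make_chat` and `generate` are illustrative names: `generate` is any prompt-to-text function, and a stub is used so the sketch runs without downloading a model.

```python
from collections import deque
from typing import Callable, Deque


def make_chat(generate: Callable[[str], str], max_lines: int = 5):
    """Return an ask() function plus the history deque it updates.

    The deque keeps only the last `max_lines` lines, matching chat_history[-5:] above.
    """
    history: Deque[str] = deque(maxlen=max_lines)

    def ask(user_input: str) -> str:
        history.append(f"User: {user_input}")
        # Assumption: an explicit "Assistant:" cue helps the model reply in role
        prompt = "\n".join(history) + "\nAssistant:"
        reply = generate(prompt)
        history.append(f"Assistant: {reply}")
        return reply

    return ask, history


# Stub generator: reports how many "User:" lines are still inside the window
ask, history = make_chat(lambda prompt: f"echo {prompt.count('User:')}")
for msg in ["hi", "how are you?", "bye"]:
    print(ask(msg))
print(list(history))  # the oldest line ("User: hi") has already been dropped
```

To use the real model from Step 2 instead of the stub, pass something like `lambda p: assistant(p)[0]["generated_text"]` as `generate`.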

**Step 4: Chat with Your Assistant**
Now you can chat with your AI assistant directly in Colab. Type 'exit', 'quit', or 'bye' to stop.

In [4]:
while True:
    user_input = input('You: ')
    if user_input.lower() in ['exit', 'quit', 'bye']:
        print('👋 Assistant: Goodbye!')
        break
    reply = ask_ai(user_input)
    print('🤖 Assistant:', reply)

You: who am I?
🤖 Assistant: I am a professional tennis player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I am a professional ice hockey player. I
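The reply above loops most likely because several things combine: `flan-t5-small` is a very small model, `ask_ai` calls the pipeline without any length or repetition settings, and the prompt never ends with an `Assistant:` cue. The sketch below shows two possible remedies under stated assumptions: generation arguments that the pipeline forwards to `generate()` (`max_new_tokens`, `no_repeat_ngram_size` and `repetition_penalty` are real options, but the values here are untuned guesses), and a hypothetical `drop_repeated_sentences` helper that cleans up the text afterwards.

```python
import re

# Generation settings that discourage loops (values are untuned starting points).
GEN_KWARGS = {
    "max_new_tokens": 64,       # hard cap on the reply length
    "no_repeat_ngram_size": 3,  # never repeat any 3-token sequence
    "repetition_penalty": 1.3,  # down-weight tokens that were already generated
}
# e.g. inside ask_ai: response = assistant(context, **GEN_KWARGS)[0]["generated_text"]


def drop_repeated_sentences(text: str) -> str:
    """Post-processing fallback: keep each sentence only the first time it appears."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    seen, kept = set(), []
    for sentence in sentences:
        key = sentence.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence.strip())
    return " ".join(kept)


looping = ("I am a professional tennis player. I am a professional ice hockey player. "
           "I am a professional ice hockey player. I")
print(drop_repeated_sentences(looping))
# → I am a professional tennis player. I am a professional ice hockey player. I
```

Post-processing only hides the symptom; the generation settings (or a larger model, as in Part 2) address the cause.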

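## PART-2 (Kinda Better Version)
This version upgrades to `google/flan-t5-large`, runs on a GPU when one is available, uses a short few-shot prompt, and defaults to beam search for more direct answers.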

In [12]:
# !pip install transformers accelerate -q

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch

model_name = "google/flan-t5-large"  # upgraded from -small

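# Choose where to run (pipeline convention: 0 = first GPU, -1 = CPU).
# `device` is only used for the printout below; see the pipeline comment.
device = 0 if torch.cuda.is_available() else -1
print("Using device:", "cuda:0" if device == 0 else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name)

# On a GPU, load the weights in float16 and let Accelerate place them (device_map="auto").
# On a CPU, load the model with the default settings.
model_kwargs = {}
if torch.cuda.is_available():
    model_kwargs = {
        "torch_dtype": torch.float16,
        "device_map": "auto"
    }

model = AutoModelForSeq2SeqLM.from_pretrained(model_name, **model_kwargs)

# Create the text2text pipeline. No device argument is needed: on a GPU the model
# was already placed by device_map="auto", and on a CPU it stays on the CPU.
assistant = pipeline("text2text-generation", model=model, tokenizer=tokenizer)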

def build_prompt(user_input, chat_history=None):
    """
    Few-shot + context prompt that steers the model to answer directly.
    We avoid forcing a hard "I don't know" instruction which FLAN sometimes echoes verbatim.
    """
    # Short few-shot examples to encourage direct answering
    few_shot = (
        "You are a helpful assistant. Answer concisely and truthfully.\n\n"
        "User: What is 2+2?\nAssistant: 4\n\n"
        "User: Who wrote 'Pride and Prejudice'?\nAssistant: Jane Austen\n\n"
    )

    ctx = ""
    if chat_history:
        ctx = "\n".join(chat_history[-6:]) + "\n"

    prompt = f"{few_shot}{ctx}User: {user_input}\nAssistant:"
    return prompt

# main ask function
def ask_ai(user_input, chat_history=None, max_new_tokens=128, use_beam=True):
    """
    Ask the model a question.
      - use_beam=True => deterministic beam search (num_beams=4)
      - use_beam=False => sampling (do_sample=True) with temperature; useful for creative tasks
    """
    prompt = build_prompt(user_input, chat_history)

    if use_beam:
        out = assistant(
            prompt,
            max_new_tokens=max_new_tokens,
            num_beams=4,
            do_sample=False,   # deterministic with beams
        )[0]["generated_text"]
    else:
        out = assistant(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            num_return_sequences=1,
        )[0]["generated_text"]

    return out.strip()

# quick interactive demo
if __name__ == "__main__":
    chat_history = []
    print("Q: What is the capital of France?")
    r = ask_ai("What is the capital of France?", chat_history, use_beam=True)
    print("A:", r)
    chat_history.append("User: What is the capital of France?")
    chat_history.append("Assistant: " + r)


    # Note: only the first exchange above is appended to chat_history;
    # the answers below are not recorded, so every later question sees the same context.
    print("\nQ: Who wrote 'Pride and Prejudice'?")
    r2 = ask_ai("Who wrote 'Pride and Prejudice'?", chat_history, use_beam=True)
    print("A:", r2)

    q3 = "Who is the best racer?"
    print("\nQ:", q3)
    r3 = ask_ai(q3, chat_history, use_beam=True)
    print("A:", r3)

    q4 = "Who wrote Harry Potter?"
    print("\nQ:", q4)
    r4 = ask_ai(q4, chat_history, use_beam=True)
    print("A:", r4)


Using device: cuda:0


Device set to use cuda:0


Q: What is the capital of France?
A: Paris

Q: Who wrote 'Pride and Prejudice'?
A: Jane Austen

Q: Who is the best racer?
A: Michael Schumacher

Q: Who wrote Harry Potter?
A: J. K. Rowling
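Part 2's `ask_ai` offers two decoding modes: beam search (`num_beams=4, do_sample=False`) and sampling (`do_sample=True, top_p=0.9, temperature=0.7`). The toy sketch below shows, for a single next-token step with made-up scores, what temperature and top-p (nucleus) filtering do to the probabilities, and what a greedy pick looks like. It illustrates the idea only and is not the transformers implementation; it also does not model beam search, which instead keeps the `num_beams` best partial sequences at every step.

```python
import math
import random
from typing import Dict


def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Scores -> probabilities. temperature < 1 sharpens the distribution, > 1 flattens it."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Nucleus filtering: keep the most likely tokens until their total
    probability reaches top_p, then renormalise the survivors."""
    kept, mass = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}


# Made-up next-token scores after "The capital of France is"
logits = {"Paris": 4.0, "Lyon": 2.0, "London": 1.0, "banana": -2.0}

greedy = max(logits, key=logits.get)  # greedy pick: always the highest score
print("greedy:", greedy)

for t in (0.7, 1.0):
    nucleus = top_p_filter(softmax(logits, temperature=t), top_p=0.9)
    print(f"temperature={t}: candidates after top_p=0.9 ->", sorted(nucleus))
# → ['Paris'] at 0.7, ['Lyon', 'Paris'] at 1.0

# Sampling draws from the surviving candidates (seeded here for repeatability)
probs = top_p_filter(softmax(logits, temperature=0.7), top_p=0.9)
rng = random.Random(0)
print("sampled:", rng.choices(list(probs), weights=list(probs.values()))[0])
```

With these scores, the lower temperature leaves only one candidate inside the nucleus, so sampling behaves deterministically for this confident step; a higher temperature lets a second candidate through.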


P.S. This version does not accept queries continuously; each query has to be typed into the code manually.
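To make Part 2 interactive like Step 4, one option is a loop around `ask_ai` that also records every exchange (the demo above only records the first one). In this sketch `chat_loop` and `EXIT_WORDS` are hypothetical names; `ask_fn` is any function that takes a question and the history list, and `inputs` lets you pass questions from a list instead of typing them, which also makes the loop easy to test.

```python
from typing import Callable, Iterable, List, Optional

EXIT_WORDS = {"exit", "quit", "bye"}


def chat_loop(
    ask_fn: Callable[[str, List[str]], str],
    inputs: Optional[Iterable[str]] = None,
    max_history: int = 6,
) -> List[str]:
    """Keep asking until the user types an exit word; return the final history.

    Questions come from input() when `inputs` is None, otherwise from `inputs`.
    """
    history: List[str] = []
    it = iter(inputs) if inputs is not None else None
    while True:
        try:
            question = (next(it) if it is not None else input("You: ")).strip()
        except (StopIteration, EOFError):  # list exhausted or stdin closed
            break
        if question.lower() in EXIT_WORDS:
            print("👋 Assistant: Goodbye!")
            break
        answer = ask_fn(question, history)
        print("🤖 Assistant:", answer)
        # Record BOTH sides of every exchange, then keep only the newest lines
        history += [f"User: {question}", f"Assistant: {answer}"]
        del history[:-max_history]
    return history
```

In the notebook, after running the Part 2 cell, this could be started with `chat_loop(lambda q, h: ask_ai(q, h, use_beam=True))`. `max_history=6` mirrors the `chat_history[-6:]` window that `build_prompt` already uses.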