<a href="https://colab.research.google.com/github/anuradha2504/LLaMa_Chatbot/blob/main/Chatbot_LLaMa_Anu.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**1. Install Dependencies**

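Essential libraries to install:
- Transformers: Provides the model, tokenizer, and pipeline APIs used in this notebook.
- PyTorch: Serves as the backbone for deep learning operations.
- Accelerate: Manages device placement and optimizes PyTorch operations, especially on GPU.
- Gradio: Builds the chatbot UI in the final section.

In [1]:
!pip install transformers torch accelerate gradio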
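
After installation, it can help to confirm that the packages are actually visible to the running kernel before moving on. The following is a minimal sketch that is not part of the original notebook; the helper name `check_packages` is hypothetical, and it relies only on the standard library.

```python
from importlib import metadata


def check_packages(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the status of the libraries this notebook installs.
for name, version in check_packages(["transformers", "torch", "accelerate"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any entry prints as not installed, rerunning the install cell (and restarting the runtime if needed) is usually the first thing to try.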



**2. Log in to Hugging Face**

Llama 2 is a gated model, so first request access to meta-llama/Llama-2-7b-chat-hf. This ensures we have the correct permissions to fetch the model.

Request access on the model's Hugging Face page:
https://huggingface.co/meta-llama/Llama-2-7b-chat-

Use the Hugging Face CLI to log in and verify your authentication status.

In [None]:
!huggingface-cli login
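
In non-interactive environments the CLI prompt can be inconvenient. One alternative is to read the access token from an environment variable. The sketch below assumes the token is stored in `HF_TOKEN` (the variable name that the `huggingface_hub` library reads); the helpers `get_hf_token` and `mask_token` are hypothetical and only check that a token is present.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the Hugging Face token from the environment, or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"No token found in ${env_var}. Run `huggingface-cli login` "
            "or set the variable before loading the gated model."
        )
    return token


def mask_token(token):
    """Keep only the first four characters so the token is safe to print."""
    return token[:4] + "*" * max(len(token) - 4, 0)
```

The returned value could then be passed as the `token` argument of `from_pretrained` instead of relying on the cached login.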

**3. Load Model & Tokenizer**

Here we prepare the session by choosing the Llama checkpoint and loading its associated tokenizer. The model weights themselves are loaded in the next step, when the pipeline is created.

The tokenizer will help in converting our text prompts into a format that the model can understand and process.

In [None]:
from transformers import AutoTokenizer
import transformers
import torch

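model = "meta-llama/Llama-2-7b-chat-hf"  # base (non-chat) variant: meta-llama/Llama-2-7b-hf

# token=True reuses the credentials saved by `huggingface-cli login`
# (it replaces the deprecated use_auth_token argument).
tokenizer = AutoTokenizer.from_pretrained(model, token=True)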
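
To illustrate what "a format that the model can understand" means, here is a toy sketch of the encode/decode round trip. It is not the Llama tokenizer, which uses a learned subword vocabulary; the `ToyTokenizer` class and its whitespace splitting are assumptions made purely for illustration.

```python
class ToyTokenizer:
    """Whitespace tokenizer that maps each distinct word to an integer ID."""

    def __init__(self, corpus):
        # Reserve ID 0 for words never seen in the corpus.
        self.word_to_id = {"<unk>": 0}
        for word in sorted(set(corpus.split())):
            self.word_to_id[word] = len(self.word_to_id)
        self.id_to_word = {i: w for w, i in self.word_to_id.items()}

    def encode(self, text):
        """Text -> list of integer IDs (unknown words map to 0)."""
        return [self.word_to_id.get(w, 0) for w in text.split()]

    def decode(self, ids):
        """List of integer IDs -> text."""
        return " ".join(self.id_to_word[i] for i in ids)


toy = ToyTokenizer("how to learn fast")
ids = toy.encode("learn fast")
print(ids, "->", toy.decode(ids))
```

The real tokenizer follows the same idea, text in and integer IDs out, but splits words into subword pieces so that unseen words rarely fall back to an unknown token.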

**4. Create the Llama Pipeline**

Next, we set up a pipeline for text generation.

This pipeline simplifies the process of feeding prompts to our model and receiving generated text as output.

In [None]:
from transformers import pipeline

llama_pipeline = pipeline(
    "text-generation",  # LLM task
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

**5. Get Responses**

With everything set up, let's see how Llama responds.

Create a chat helper function:

In [5]:
def get_llama_response(prompt: str) -> None:
    """
    Generate a response from the Llama model.

    Parameters:
        prompt (str): The user's input/question for the model.

    Returns:
        None: Prints the model's response.
    """
    sequences = llama_pipeline(
        prompt,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=256,
    )
    print("Chatbot:", sequences[0]['generated_text'])



prompt = 'I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?\n'
get_llama_response(prompt)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Chatbot: I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?

I have a bit of a thing for gritty, intense dramas with complex characters and thought-provoking themes.

Do you have any recommendations?
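
As the output above shows, the pipeline returns the prompt followed by the continuation. If only the model's reply is wanted, the echoed prompt can be removed. A minimal sketch, assuming the generated text begins with the prompt verbatim; `strip_prompt` is a hypothetical helper and not part of the notebook:

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the continuation when the text starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fall back to the full text if the prompt was altered during decoding.
    return generated_text.strip()


example_prompt = "How to learn fast?\n"
example_output = example_prompt + "\nThere are many ways to learn quickly."
print("Chatbot:", strip_prompt(example_output, example_prompt))
```

The text-generation pipeline also accepts `return_full_text=False`, which may be a simpler way to get the same effect.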


**6. Sample Queries**

In [6]:
prompt = """I'm a programmer and Python is my favorite language because of it's simple syntax and variety of applications I can build with it.\
Based on that, what language should I learn next?\
Give me 5 recommendations"""
get_llama_response(prompt)

Chatbot: I'm a programmer and Python is my favorite language because of it's simple syntax and variety of applications I can build with it.Based on that, what language should I learn next?Give me 5 recommendations and explain why you think they're good choices.

Here are five programming languages you might consider learning next, based on your interest in Python:

1. JavaScript: JavaScript is a popular language used for web development, and it's closely related to Python in terms of syntax and structure. It's also used in a wide range of applications, including mobile app development, game development, and server-side programming.

Why it's a good choice: JavaScript is a versatile language that can be used for a variety of applications, and it's well-suited for beginners who are already familiar with Python's syntax.

2. Java: Java is an object-oriented language that's widely used in enterprise software development, Android app development, and web development. It's known for its plat
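
The reply above stops mid-word because `max_length=256` caps the total sequence, prompt tokens included, so a longer prompt leaves less room for the answer. The arithmetic can be sketched as follows; the token counts are made-up illustrative numbers, and `new_token_budget` is a hypothetical helper:

```python
def new_token_budget(prompt_tokens: int, max_length: int) -> int:
    """Tokens left for generation when max_length counts prompt + new tokens."""
    return max(max_length - prompt_tokens, 0)


# Illustrative counts only; real counts come from the tokenizer.
for prompt_tokens in (20, 60, 300):
    budget = new_token_budget(prompt_tokens, 256)
    print(f"{prompt_tokens} prompt tokens -> {budget} new tokens")
```

Passing `max_new_tokens` instead of `max_length`, as the chat cell in section 8 does, bounds only the generated part and keeps the reply length independent of the prompt length.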

**7. More Queries**

In [7]:
prompt = 'How to learn fast?\n'
get_llama_response(prompt)

Chatbot: How to learn fast?

There are many ways to learn quickly, but here are some strategies that can help:

1. Set clear goals: Identify what you want to learn and set specific goals for yourself. This will help you stay motivated and focused.
2. Break it down: Break down the task or subject you want to learn into smaller, manageable chunks. This will make it easier to digest and retain the information.
3. Use visual aids: Visual aids such as diagrams, flowcharts, and pictures can help you understand complex concepts and retain information better.
4. Practice consistently: Consistency is key to learning quickly. Set aside a specific time each day or week to practice and review what you have learned.
5. Get enough sleep: Sleep plays an important role in memory consolidation and learning. Aim for 7-9 hours of sleep each night to help your brain process and retain information.
6. Teach someone else: Teaching someone else what you have learned is a great way to reinforce your own knowl

In [8]:
prompt = 'I love basketball. Do you have any recommendations of team sports I might like?\n'
get_llama_response(prompt)

Chatbot: I love basketball. Do you have any recommendations of team sports I might like?

Basketball is a great sport, but there are many other team sports that you might enjoy as well. Here are a few options to consider:

1. Volleyball: Volleyball is a fun and fast-paced sport that involves hitting a ball over a net with your hands. It's a great workout and can be played both indoors and outdoors.
2. Soccer: Soccer, or football as it's called in many parts of the world, is a popular team sport that involves kicking a ball into a goal. It's a great way to stay active and can be played on a variety of surfaces, including grass, turf, and indoor courts.
3. Lacrosse: Lacrosse is a fast-paced sport that involves using a stick to catch, carry, and throw a ball into a goal. It's a great workout and can be played both indoors and outdoors.
4. Field Hockey: Field hockey is a team sport that involves using a stick to hit a ball into a goal. It's a great workout


In [9]:
prompt = 'How to get rich?\n'
get_llama_response(prompt)

Chatbot: How to get rich?

There are many ways to become rich, but here are some general tips that can help you on your journey to wealth:

1. Start by setting clear financial goals: What do you want to achieve? When do you want to achieve it? How much money do you need to make it happen? Write down your goals and make them specific, measurable, achievable, relevant, and time-bound (SMART).
2. Live below your means: Spend less than you earn. Create a budget that accounts for all your expenses, and make sure you're not overspending. Cut back on unnecessary expenses like dining out, subscription services, and other luxuries.
3. Invest wisely: Invest your money in assets that have a high potential for growth, such as stocks, real estate, or a small business. Do your research, diversify your portfolio, and avoid get-rich-quick schemes.
4. Build multiple income streams: Don't rely on just one source of income. Create additional streams of income through freelancing, consulting, or starting 

**8. Make it conversational**

Let's create an interactive chat loop, where you can converse with the Llama model.

Type your questions or comments, and see how the model responds!

In [10]:
while True:
    user_input = input("You: ")
    if user_input.lower() in ["bye", "quit", "exit"]:
        print("Chatbot: Goodbye!")
        break
    get_llama_response(user_input)

You: quit
Chatbot: Goodbye!
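
The loop above is tied to `input()` and to the loaded model, which makes it awkward to try without a GPU. One way to sketch the same control flow in isolation is to inject the reader and the responder. `run_chat` and `echo_bot` are hypothetical names, and the echo function merely stands in for the model; unlike `get_llama_response`, it returns its reply instead of printing it.

```python
EXIT_COMMANDS = {"bye", "quit", "exit"}


def run_chat(respond, read=input, write=print):
    """Read messages until an exit command; return the number of turns handled."""
    turns = 0
    while True:
        user_input = read("You: ").strip()
        if user_input.lower() in EXIT_COMMANDS:
            write("Chatbot: Goodbye!")
            return turns
        if not user_input:
            continue  # ignore empty lines instead of querying the model
        write("Chatbot: " + respond(user_input))
        turns += 1


def echo_bot(message):
    """Stand-in for the model so the loop can run anywhere."""
    return f"You said: {message}"


# Scripted session: feed canned inputs instead of typing them.
scripted = iter(["hello", "", "QUIT"])
run_chat(echo_bot, read=lambda _prompt: next(scripted))
```

Injecting `read` and `write` keeps the exit handling identical to the interactive loop while allowing it to be exercised with scripted input.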


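The chat loop in this section is stateless: each message is sent to the model without any earlier turns. The next cell adds a system prompt and formats the conversation history in the Llama 2 chat format, so the model can see previous exchanges.

In [11]: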
SYSTEM_PROMPT = "You are a helpful, concise assistant."
MAX_TOKENS = 512
TEMPERATURE = 0.7
TOP_P = 0.95

def format_prompt(history, user_message):
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for u, a in history:
        if u:
            messages.append({"role": "user", "content": u})
        if a:
            messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": user_message})

    # Build chat prompt in LLaMA-2 format
    prompt = ""
    for m in messages:
        if m["role"] == "system":
            prompt += f"<s>[INST] <<SYS>>\n{m['content']}\n<</SYS>>\n\n"
        elif m["role"] == "user":
            prompt += f"{m['content']} [/INST] "
        elif m["role"] == "assistant":
            prompt += f"{m['content']} </s><s>[INST] "
    return prompt

def generate_chat_response(history, user_message):
    prompt = format_prompt(history, user_message)
    out = llama_pipeline(
        prompt,
        max_new_tokens=MAX_TOKENS,
        do_sample=True,
        temperature=TEMPERATURE,
        top_p=TOP_P,
        eos_token_id=tokenizer.eos_token_id
    )
    reply = out[0]["generated_text"][len(prompt):]
    return reply.strip()
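
To see what the formatted prompt looks like, the sketch below builds a two-turn prompt without touching the model. It follows the Llama 2 chat layout used in the cell above (`[INST]`, `<<SYS>>`, `</s><s>`), with a blank line after the system block as in Meta's reference format. `build_llama2_prompt` is a hypothetical standalone function written for illustration.

```python
def build_llama2_prompt(system_prompt, history, user_message):
    """Assemble a Llama 2 chat prompt from (user, assistant) history pairs."""
    prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    for user_turn, assistant_turn in history:
        # Each completed exchange is closed with </s> and a new [INST] opened.
        prompt += f"{user_turn} [/INST] {assistant_turn} </s><s>[INST] "
    # The final user message is left open for the model to answer.
    prompt += f"{user_message} [/INST] "
    return prompt


example = build_llama2_prompt(
    "You are a helpful, concise assistant.",
    [("Hi", "Hello! How can I help?")],
    "Name one team sport.",
)
print(example)
```

Printing the prompt before sending it to the pipeline is a cheap way to catch a missing `[/INST]` or a misplaced system block.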

**9. Create a Chatbot UI with Gradio**

In [12]:
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("# 🦙 LLaMA-2 7B Chatbot")
    chatbot = gr.Chatbot(height=400)
    with gr.Row():
        msg = gr.Textbox(placeholder="Type a message...", scale=8)
        clear = gr.Button("Clear", scale=1)
    temp = gr.Slider(0.0, 1.5, value=TEMPERATURE, step=0.05, label="Temperature")
    top_p = gr.Slider(0.1, 1.0, value=TOP_P, step=0.05, label="Top-p")
    max_new = gr.Slider(64, 1024, value=MAX_TOKENS, step=32, label="Max new tokens")

    def respond(user_message, chat_history, temperature, top_p, max_new_tokens):
        global TEMPERATURE, TOP_P, MAX_TOKENS
        TEMPERATURE = float(temperature)
        TOP_P = float(top_p)
        MAX_TOKENS = int(max_new_tokens)
        bot_message = generate_chat_response(chat_history, user_message)
        chat_history = chat_history + [(user_message, bot_message)]
        return "", chat_history

    msg.submit(respond, [msg, chatbot, temp, top_p, max_new], [msg, chatbot])
    clear.click(lambda: None, None, chatbot, queue=False)



demo.launch()



It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://d3f5b2ff810d67be70.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
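
The `respond` callback in this section hands the slider values to the generator by mutating module-level globals. An alternative is to pass them explicitly, which also makes the callback testable without Gradio or the model. This is a sketch under assumptions: `make_respond` and `fake_generate` are hypothetical, and the generator is assumed to accept the sampling settings as keyword arguments, which the notebook's `generate_chat_response` does not.

```python
def make_respond(generate_fn):
    """Build a Gradio-style callback that forwards sampling settings explicitly."""

    def respond(user_message, chat_history, temperature, top_p, max_new_tokens):
        bot_message = generate_fn(
            chat_history,
            user_message,
            temperature=float(temperature),
            top_p=float(top_p),
            max_new_tokens=int(max_new_tokens),
        )
        # Return an empty string to clear the textbox, plus the updated history.
        return "", chat_history + [(user_message, bot_message)]

    return respond


def fake_generate(history, message, **settings):
    """Stand-in for the model: reports the settings it received."""
    return f"({len(history)} earlier turns, T={settings['temperature']}) {message}"


respond_explicit = make_respond(fake_generate)
print(respond_explicit("hello", [], 0.7, 0.95, 512))
```

Because the settings travel as arguments, two browser sessions with different slider positions would not overwrite each other's values.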




**10. Access the App at the URL Below**

Running on public URL: https://d3f5b2ff810d67be70.gradio.live