
<div style="width: 100%; text-align: center; padding: 20px;">
  <img align="middle" src="https://eu-images.contentstack.com/v3/assets/blt6b0f74e5591baa03/blt298706ad7dc6401a/6622200b5b7e30e5ebbea827/News_Image_(19).jpg?width=850&auto=webp&quality=95&format=jpg&disable=upscale" alt="Llama 3 demonstration" width="700px">
</div>


<a id = "testing"></a>
# Using Llama 3 8B for Text Generation: Examples and Implementation

Model card: https://huggingface.co/meta-llama/Meta-Llama-3-8B

```
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

# Load the weights in bfloat16 and let accelerate place them on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline("Hey how are you doing today?")
```

In [1]:
import os
os.environ['HF_TOKEN'] ="your_hugging_face_token_here"
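
Access to the `meta-llama/Meta-Llama-3-8B` repository is gated, so the download fails if the token is missing or still set to the placeholder above. A minimal sketch of a guard that could run before loading the model. `require_hf_token` and `PLACEHOLDER` are hypothetical names introduced here for illustration, not part of `huggingface_hub` or `transformers`:

```python
import os

# Assumption: this is the placeholder string used in the cell above.
PLACEHOLDER = "your_hugging_face_token_here"


def require_hf_token(env=os.environ):
    """Return the HF_TOKEN value, or raise if it is missing or still the placeholder."""
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError("Set HF_TOKEN to a real Hugging Face access token first.")
    return token
```

Calling `require_hf_token()` at the top of the notebook surfaces a configuration mistake before the multi-gigabyte download starts. Passing a plain `dict` as `env` makes the helper easy to check without touching the real environment.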

In [2]:
import sys
from IPython.display import clear_output

def clean_notebook():
    """Clear the pip install logs from the cell output."""
    clear_output(wait=True)
    print("Notebook cleaned.")

# Run the installation commands
if 'google.colab' in sys.modules:
    print("Running in Google Colab")
    !pip install bitsandbytes accelerate
    !pip install gradio
else:
    print("Not running in Google Colab")
    !pip install transformers accelerate datasets bitsandbytes
    !pip install gradio

# Clean up the notebook
clean_notebook()



Notebook cleaned.


**Quantisation Configuration**

In [4]:

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization; computation runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)


model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reuse the end-of-sequence token for padding.
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=bnb_config,
)

model

Unused kwargs: ['bnb_8bit_use_double_quant', 'bnb_8bit_quant_type', 'bnb_8bit_compute_dtype']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear8bitLt(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear8bitLt(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear8bitLt(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear8bitLt(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear8bitLt(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
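
The layer shapes in the printout are enough for a rough estimate of what quantization saves. The sketch below counts only the seven bias-free linear projections in each decoder layer, using the `in_features` and `out_features` shown above, and compares their storage at 16 and 4 bits per weight. It is a back-of-the-envelope figure under stated assumptions: it ignores the quantization constants that bitsandbytes stores alongside the weights, the norm layers, anything cut off from the truncated printout, and all activation and KV-cache memory. The helper names are hypothetical.

```python
# Shapes copied from the module printout above.
VOCAB, HIDDEN, KV_DIM, MLP_DIM, N_LAYERS = 128256, 4096, 1024, 14336, 32


def linear_params_per_layer():
    """Weights in the seven bias-free linear modules of one decoder layer."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q/o and k/v projections
    mlp = 3 * HIDDEN * MLP_DIM                             # gate, up and down projections
    return attention + mlp


def weight_gib(n_params, bits_per_param):
    """Storage for n_params weights at the given precision, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


linear_total = N_LAYERS * linear_params_per_layer()
embedding = VOCAB * HIDDEN

print(f"decoder linear weights: {linear_total:,}")
print(f"  at 16 bits: {weight_gib(linear_total, 16):.2f} GiB")
print(f"  at  4 bits: {weight_gib(linear_total, 4):.2f} GiB")
print(f"embedding weights (not quantized in this estimate): {embedding:,}")
```

The ratio between the two figures is exactly four, since only the bits per weight change. The real footprint reported by the loaded model will be somewhat higher for the reasons listed above.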


In [5]:


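# The model is already loaded, quantized and placed on devices,
# so no dtype or device_map is passed here.
# Note: max_length counts the prompt tokens as well as the generated ones.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=256,
)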


def generate_answer(question, temperature):
    """Generate a blog-style answer for `question`; return None if generation fails."""
    try:
        prompt = f'''
        You are an AI content writer.
        Generate a well structured 256 words blog for the given topics below:
        Topics: {question}
        Answer:
        '''

        # Sample from the 10 most likely tokens at each step.
        sequences = pipeline(
            prompt,
            do_sample=True,
            top_k=10,
            temperature=temperature,
            num_return_sequences=1,
            eos_token_id=pipeline.tokenizer.eos_token_id,
        )
        # generated_text starts with the prompt, so slice it off.
        answer = sequences[0]['generated_text'][len(prompt):].strip()

        return answer

    except Exception as e:
        print("Error generating answer:", e)
        return None
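
The slice `[len(prompt):]` in `generate_answer` relies on the pipeline returning the prompt verbatim at the start of `generated_text`. A minimal sketch of the same idea in isolation, which runs without the model. `build_prompt` and `strip_prompt` are hypothetical helper names, and `textwrap.dedent` is an addition here: it only removes the leading indentation that the triple-quoted string would otherwise send to the model.

```python
import textwrap


def build_prompt(topic):
    """Build the blog-writing prompt without the source-code indentation."""
    template = textwrap.dedent("""\
        You are an AI content writer.
        Generate a well structured 256 words blog for the given topics below:
        Topics: {topic}
        Answer:
        """)
    return template.format(topic=topic)


def strip_prompt(generated_text, prompt):
    """Remove the echoed prompt if it is present; otherwise return the text unchanged."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()
```

Checking `startswith` first avoids cutting into the answer if the echoed text ever differs from the prompt. The text-generation pipeline also accepts `return_full_text=False`, which returns only the continuation and makes the manual slice unnecessary.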

In [6]:
%%time

question = "วิธีทำข้าวมันไก่"  # Thai: "How to make Hainanese chicken rice"

answer = generate_answer(question, temperature=0.9)
if answer:
    print("Generated answer:", answer)
else:
    print("Failed to generate an answer.")

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Generated answer: - Step 1: Prepare the ingredients: Chicken, Rice, Eggs, Fish Sauce, and Oyster Sauce
         - Step 2: Cook the chicken: Preheat the oven to 400°F (204°C) and place the chicken on a baking sheet. Cook for 30 minutes.
         - Step 3: Make the rice: In a pot, bring 1 cup of water to a boil and add the rice. Cook for 10 minutes.
         - Step 4: Make the egg: Crack an egg into a bowl and beat it until it is foamy. Pour it into a pan and cook it over medium heat.
         - Step 5: Make the fish sauce: In a bowl, mix together 1 tablespoon of fish sauce, 1 tablespoon of sugar, and 1 tablespoon of water. Set aside.
         - Step 6: Assemble the dish: Place the chicken, rice, and egg on a plate. Drizzle the fish sauce over the chicken and serve.
    """

    def __init__(self


In [None]:
question = "plastic pollution in our oceans"

answer = generate_answer(question, temperature=0.85)
if answer:
    print("Generated answer:", answer)
else:
    print("Failed to generate an answer.")

In [None]:
question = "importance of technology in education."
answer = generate_answer(question, temperature=0.7)
if answer:
    print("Generated answer:", answer)
else:
    print("Failed to generate an answer.")
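
The three cells above differ only in `temperature`, while `generate_answer` fixes `top_k=10`. A toy sketch of what those two settings do to a next-token distribution, written in plain Python with made-up logits. It illustrates the idea only and is not how `transformers` implements sampling internally; `top_k_temperature_probs` is a hypothetical name.

```python
import math


def top_k_temperature_probs(logits, k, temperature):
    """Softmax over the k largest logits after dividing by temperature; other tokens get 0."""
    if k < 1 or temperature <= 0:
        raise ValueError("k must be >= 1 and temperature must be > 0")
    # Indices of the k highest-scoring tokens.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = {i: logits[i] / temperature for i in keep}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(weights.values())
    return [weights.get(i, 0.0) / total for i in range(len(logits))]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens
for t in (0.7, 0.85, 0.9):
    probs = top_k_temperature_probs(toy_logits, k=3, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, which tends to give more predictable text. Higher temperatures flatten the distribution across the tokens that survive the top-k cut. Dividing by a positive temperature preserves the ranking of the logits, so applying the cut before or after the scaling gives the same result.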

# Run Gradio

In [7]:
import gradio as gr

iface = gr.Interface(
    fn=generate_answer,
    inputs=[
        gr.Textbox(label="Question", placeholder="Enter your question here"),
        gr.Slider(minimum=0.25, maximum=1.0, step=0.05, value=0.85, label="Temperature"),
    ],
    outputs="text",
    title="AI Content Writer",
    description="Generate a well-structured 256-word blog post for the given topic.",
    examples=[
        ["How to change the world"],
        ["วิธีทำข้าวมันไก่ให้อร่อย"],  # Thai for the English example on the next line
        ["How to Make Delicious Hainanese Chicken Rice"],
        ["The future of artificial intelligence"],
    ],
)

iface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://22c639122f1d144cc6.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
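
`generate_answer` returns `None` when generation fails, which the interface would show as an empty output box. A minimal sketch of a wrapper that could be passed as `fn` instead. `make_gradio_fn` is a hypothetical helper, the fallback messages are placeholders, and the default temperature bounds simply mirror the slider range above.

```python
def make_gradio_fn(generate, min_temp=0.25, max_temp=1.0,
                   fallback="Generation failed. Please try again."):
    """Wrap a (question, temperature) -> str | None function for use in a UI."""
    def wrapped(question, temperature):
        question = (question or "").strip()
        if not question:
            return "Please enter a topic."
        # Keep the temperature inside the range offered by the slider.
        temperature = min(max(float(temperature), min_temp), max_temp)
        answer = generate(question, temperature)
        return answer if answer else fallback
    return wrapped
```

With this in place the interface would be built with `fn=make_gradio_fn(generate_answer)`; the rest of the `gr.Interface` arguments stay the same.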


