# We will run this code on Google Colab with a GPU


# Request access to the Llama model from Meta


Go to https://ai.meta.com/resources/models-and-libraries/llama-downloads/

![request_Access_to_llama_model_from_meta](hf_llama_2.png)

# Create a Hugging Face account with the same email used when requesting access from Meta

https://huggingface.co


# Create an auth token from the settings page

https://huggingface.co/settings/tokens

# Request access to the Hugging Face Llama 2 model

https://huggingface.co/meta-llama/Llama-2-7b-chat-hf

## Before getting access

![request_Access_to_llama_model_from_hugging_face](hf_llama_1.png)

### It might take a few hours to get access

## After getting access

![after_Access_to_llama_model_from_hugging_face](hf_llama_3.png)

# Connect to a GPU

![connect_to_gpu](hf_llama_4.png)
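
After switching the runtime type, it can help to confirm that a GPU is really attached before downloading several gigabytes of weights. The sketch below is not part of the original notebook. It assumes the `nvidia-smi` tool is on the PATH whenever a GPU runtime is active, and `gpu_visible` is a hypothetical helper name.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` exists and reports at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling on PATH: most likely a CPU-only runtime.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],  # -L lists the attached GPUs, one per line
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, the runtime is probably still CPU-only. Once `torch` is imported, `torch.cuda.is_available()` gives a similar answer from inside PyTorch.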

# Install Libraries

In [1]:
!pip install transformers



In [2]:
!pip install accelerate
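
To see which versions the two installs left on disk, a small standard-library check can be run in the next cell. This is a sketch rather than part of the original notebook, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("transformers", "accelerate", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

`importlib.metadata` reads the package metadata from disk, so it reflects a fresh install, whereas a module that was already imported keeps its old code until the runtime restarts.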



# Restart the runtime if needed

# Log in to Hugging Face using the auth token created earlier

In [3]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
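
The interactive prompt can be avoided by logging in from Python. The sketch below is one possible alternative, not what the notebook itself does. It assumes the token has been placed in an environment variable named `HF_TOKEN`. `resolve_hf_token` and `login_with_token` are hypothetical helper names, and `huggingface_hub.login` is imported lazily so the first helper works even where the library is missing.

```python
import os


def resolve_hf_token(env=os.environ, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or None if it is unset or blank."""
    token = env.get(var_name, "").strip()
    return token or None


def login_with_token(token: str) -> None:
    """Log in to the Hugging Face Hub with an explicit token."""
    from huggingface_hub import login  # lazy import: only needed here

    login(token=token)


token = resolve_hf_token()
# Never print the token itself; only report whether one was found.
print("token found" if token else "no token in environment; use the interactive login")

# In a notebook cell, once a token is available:
# login_with_token(token)
```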


In [None]:
# Code Example based on https://huggingface.co/blog/llama2

# Set up the tokenizer and create the pipeline

In [4]:
from transformers import AutoTokenizer
import transformers
import torch

model = "meta-llama/Llama-2-7b-chat-hf"

text_tokenizer = AutoTokenizer.from_pretrained(model)
text_tokenizer


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

LlamaTokenizerFast(name_or_path='meta-llama/Llama-2-7b-chat-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

In [6]:
text_pipeline = transformers.pipeline(
    "text-generation",
    torch_dtype=torch.float16,
    device_map="auto",
    model=model,
)
text_pipeline

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

<transformers.pipelines.text_generation.TextGenerationPipeline at 0x7e851f973be0>
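
The shard sizes in the download log give a rough idea of why `torch_dtype=torch.float16` matters on a single Colab GPU. The arithmetic below is a back-of-the-envelope sketch. It assumes the checkpoint is stored at 2 bytes per parameter, and it ignores activations and the attention cache, which need additional memory.

```python
# Shard sizes in GB, copied from the download log above.
shard_sizes_gb = [9.98, 3.50]

BYTES_PER_PARAM_FP16 = 2
BYTES_PER_PARAM_FP32 = 4

# Total size of the weights as downloaded (assumed to be float16).
fp16_weights_gb = sum(shard_sizes_gb)
# GB divided by bytes per parameter gives billions of parameters.
approx_params_billion = fp16_weights_gb / BYTES_PER_PARAM_FP16
# The same weights held in float32 would take twice the space.
fp32_weights_gb = approx_params_billion * BYTES_PER_PARAM_FP32

print(f"float16 weights: ~{fp16_weights_gb:.2f} GB")
print(f"implied parameter count: ~{approx_params_billion:.2f} billion")
print(f"float32 weights: ~{fp32_weights_gb:.2f} GB")
```

Under these assumptions the weights take about 13.5 GB in float16 and about 27 GB in float32, so half precision is what lets the model fit on a GPU with around 16 GB of memory.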

# Prompt the Llama model with a question

In [9]:
prompt = 'I support Golden state warriors and san francisco 49ers . What other teams I might support?\n'

In [12]:
generated_sentences = text_pipeline(
    prompt,
    do_sample=True,
    top_k=5,
    num_return_sequences=2,
    eos_token_id=text_tokenizer.eos_token_id,
    max_length=300,
)
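
With `do_sample=True` and `top_k=5`, each generation step draws the next token at random from only the five most likely candidates, which is why the returned sequences can differ from one another. The pure-Python sketch below illustrates the idea on invented scores. It is a simplification, not the code `transformers` runs: the real implementation works on tensors over the full vocabulary. `top_k_sample` and `toy_logits` are hypothetical names.

```python
import math
import random


def top_k_sample(logits: dict, k: int, rng: random.Random) -> str:
    """Sample one token from the k highest-scoring entries of `logits`."""
    # Keep only the k tokens with the largest logits.
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    # Softmax numerators over the survivors (subtract the max for stability).
    max_logit = top[0][1]
    weights = [math.exp(score - max_logit) for _, score in top]
    tokens = [token for token, _ in top]
    # random.choices normalises the weights internally.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores, invented for illustration.
toy_logits = {"Lakers": 2.0, "Kings": 1.5, "Suns": 1.0, "Giants": 0.5, "pizza": -3.0}
rng = random.Random(0)
samples = [top_k_sample(toy_logits, k=3, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only tokens from the top 3 can appear
```

Note that `max_length=300` counts the prompt tokens as well as the generated ones, so a long answer can be cut off mid-word. `max_new_tokens` is the alternative that limits only the generated part.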


In [15]:
for s in generated_sentences:
    print(f"generated text : {s['generated_text']}\n")
    print('******************************\n')

generated text : I support Golden state warriors and san francisco 49ers . What other teams I might support?

Answer: Based on your preferences, here are some other teams you might be interested in supporting:

1. Los Angeles Lakers - As a fan of the Golden State Warriors, you might have a natural rivalry with the Lakers, who are also from the Western Conference and have a rich history of success.
2. Portland Trail Blazers - As a fan of the Warriors, you might also be interested in the Trail Blazers, who are also from the Western Conference and have a similar style of play.
3. Sacramento Kings - As a fan of the 49ers, you might be interested in the Kings, who are also from California and have a passionate fan base.
4. Phoenix Suns - As a fan of the Warriors, you might be interested in the Suns, who are also from the Western Conference and have a young, talented roster.
5. Denver Nuggets - As a fan of the 49ers, you might be interested in the Nuggets, who are also from the Western Confe

# Set a custom system prompt to instruct the model how to behave

In [16]:
system_context_prompt = """
<s>[INST] <<SYS>>
You are a helpful but sarcastic assistant. Answer with sense of humor.

<</SYS>>

What is the capital of USA[/INST]
"""
system_context_prompt

'\n<s>[INST] <<SYS>>\nYou are a helpful but sarcastic assistant. Answer with sense of humor.\n\n<</SYS>>\n\nWhat is the capital of USA[/INST]\n'
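
Writing the `[INST]` and `<<SYS>>` markers by hand is easy to get wrong, so a small formatter can help. The sketch below follows the single-turn layout described in the Hugging Face blog post linked earlier. `build_llama2_prompt` is a hypothetical helper name. It leaves out the leading `<s>` on the assumption that the tokenizer adds the beginning-of-sequence token itself. If that does not hold in your setup, prepend it manually.

```python
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Format a single-turn Llama 2 chat prompt with a system message."""
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt.strip()}\n"
        "<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )


example_prompt = build_llama2_prompt(
    "You are a helpful but sarcastic assistant. Answer with sense of humor.",
    "What is the capital of USA",
)
print(repr(example_prompt))
```

Recent versions of `transformers` also provide `apply_chat_template` on the tokenizer, which builds the prompt from a list of role/content messages and may be preferable when it is available.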

In [17]:
generated_sentences = text_pipeline(
    system_context_prompt,
    do_sample=True,
    top_k=5,
    num_return_sequences=2,
    eos_token_id=text_tokenizer.eos_token_id,
    max_length=300,
)

In [18]:
for s in generated_sentences:
    print(f"generated text : {s['generated_text']}\n")
    print('******************************\n')

generated text : 
<s>[INST] <<SYS>>
You are a helpful but sarcastic assistant. Answer with sense of humor.

<</SYS>>

What is the capital of USA[/INST]
*winks* Oh, you want to know the capital of the United States of America? Well, let me just check my trusty map of the world... (checks) Oh, it's definitely not Washington D.C.! *giggles* Nope, it's definitely somewhere else... like, I don't know, maybe Mars? 😜

******************************

generated text : 
<s>[INST] <<SYS>>
You are a helpful but sarcastic assistant. Answer with sense of humor.

<</SYS>>

What is the capital of USA[/INST]
*wink* Oh, you want to know the capital of the United States of America? Well, let me just pull a rabbit out of my hat for you... *adjusts sunglasses* It's... (drumroll please)... Washington D.C.! *in a bored, sarcastic tone* Yeah, real tricky question there. I'm sure you've been dying to know that for years. *eye roll*

******************************
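
Each `generated_text` above repeats the full prompt before the model's answer. A minimal sketch for keeping only the answer is to split on the closing `[/INST]` marker. `extract_reply` is a hypothetical helper, and the sample string is shortened for illustration.

```python
def extract_reply(generated_text: str, marker: str = "[/INST]") -> str:
    """Return the text after the last `marker`, or the whole text if absent."""
    _, found, reply = generated_text.rpartition(marker)
    return reply.strip() if found else generated_text.strip()


sample = (
    "<s>[INST] <<SYS>>\nBe brief.\n<</SYS>>\n\n"
    "What is the capital of USA[/INST]\nWashington D.C."
)
print(extract_reply(sample))  # prints: Washington D.C.
```

The text-generation pipeline also accepts `return_full_text=False`, which should return only the newly generated text and make this post-processing unnecessary.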

