## **Setting Up the Environment**

In [None]:
!pip install accelerate protobuf sentencepiece torch git+https://github.com/huggingface/transformers huggingface_hub

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-cgddgu5c
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-cgddgu5c
  Resolved https://github.com/huggingface/transformers to commit 8bd2b1e8c23234cd607ca8d63f53c1edfea27462
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.45.0.dev0-py3-none-any.whl size=9732511 sha256=84755812bcc3917cb595fc96374d96f17c15a6c13f836faa7396799048eaf02d
  Stored in directory: /tmp/pip-ephem-wheel-cache-6kgxc7o5/wheels/c0/14/d6/6c9a5582d2ac191ec0a483be151a4495fe1eb2a6706ca49f1b
Successfully built transformers

## **Code Summary:**

**accelerate**: Handles device placement and the loading of large models; the device_map="auto" option used when loading the model depends on it.

**protobuf**: A serialization library used for data exchange.

**sentencepiece**: A tokenizer and detokenizer library.

**torch**: PyTorch, a popular machine learning framework.

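**transformers**: The Hugging Face Transformers library, installed here directly from the GitHub repository (a development build, 4.45.0.dev0 in this run).

**huggingface_hub**: The client library for the Hugging Face Hub, used in the next step to log in.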
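After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and the package list is simply copied from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above.
PACKAGES = ["accelerate", "protobuf", "sentencepiece", "torch", "transformers", "huggingface_hub"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this right after the install cell makes it easier to spot a package that failed to install before the later cells fail with an import error.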

## **Loading the Pre-trained Language Model (Llama 2)**

In [None]:
from huggingface_hub import login
# Replace 'your-access-token' with your actual Hugging Face access token.
# Never publish a real token in a shared notebook.
login(token="your-access-token")

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
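A common way to keep the token out of the notebook is to read it from an environment variable. The sketch below assumes the token is stored in a variable named `HF_TOKEN` (the name is borrowed from the Colab secret mentioned in the loading output and is an assumption here), and the helper names `read_token` and `login_from_env` are hypothetical.

```python
import os


def read_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable, or None if unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var="HF_TOKEN"):
    """Log in to the Hugging Face Hub only when a token is present.

    Returns True if a login was attempted, False if no token was found.
    """
    token = read_token(env_var)
    if token is None:
        print(f"{env_var} is not set; skipping login.")
        return False
    # Imported lazily so the check above works even without the package installed.
    from huggingface_hub import login
    login(token=token)
    return True
```

Calling `login_from_env()` in place of the hard-coded `login(token=...)` keeps the secret out of the saved notebook and its outputs.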


Load the pre-trained model and tokenizer

In [None]:
model_id = "NousResearch/Llama-2-7b-chat-hf"
# Import the model, tokenizer, and pipeline helpers from transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, GenerationConfig
import torch

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.use_default_system_prompt = False

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


## **Code Summary:**

**AutoModelForCausalLM:** Loads a pre-trained causal language model.

**AutoTokenizer:** Loads the tokenizer associated with the pre-trained model.

**model_id:** Specifies the identifier of the pre-trained model.

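**torch_dtype=torch.float16**: Loads the weights in 16-bit floating-point precision, which halves memory use compared with 32-bit floats and usually speeds up inference on GPUs.

**device_map="auto":** Lets accelerate place the model on the available hardware automatically, filling the GPU first and falling back to CPU memory if needed.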
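To see why the choice of precision matters, here is a rough back-of-the-envelope sketch of how much memory the weights alone need. The parameter count of roughly 7 billion is an assumption read off the "7b" in the model name, and the function name `weight_memory_gb` is hypothetical; activations and the generation cache add to these figures.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate memory needed for the weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: roughly 7 billion parameters, as the "7b" in the model name suggests.
approx_params = 7_000_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(approx_params, dtype):.0f} GB")
```

Under these assumptions, float16 needs about half the memory of float32, which is often the difference between fitting on a single GPU and not fitting at all.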

## **Exploring Model Details**

Get the model configuration

In [None]:
config = model.config
#Retrieves the configuration of the loaded model, which includes details such as the model architecture, number of layers, hidden size, etc.

print(config)


LlamaConfig {
  "_name_or_path": "NousResearch/Llama-2-7b-chat-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.45.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
}



# **Model Name and Path:**

**_name_or_path:** “NousResearch/Llama-2-7b-chat-hf”

This tells you the specific model you are using. It’s like knowing the name of the book you’re reading. In this case, it’s the “NousResearch/Llama-2-7b-chat-hf” model from the Hugging Face Hub.

# **Hidden Size:**

**hidden_size:** 4096

This is the length of the vector the model uses to represent each token inside every layer. Think of it as the brainpower of the model: a larger hidden size lets the model capture more complex patterns in the data, at the cost of more memory and compute.

# **Number of Hidden Layers:**

**num_hidden_layers:** 32

This tells you how many hidden layers the model has. More layers can enable the model to learn more complex features from the data. In this case, there are 32 layers.
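These configuration values are enough to estimate the total number of parameters. The sketch below is an approximation under stated assumptions: it takes the layer layout (four attention projections, three MLP projections, two normalization vectors per layer, no biases) from the standard Llama architecture, and it assumes one final normalization layer and an output projection that is separate from the embedding table because tie_word_embeddings is false.

```python
# Values copied from the LlamaConfig printed above.
vocab_size = 32000
hidden_size = 4096
intermediate_size = 11008
num_hidden_layers = 32

# Token embedding table, plus a separate output projection of the same shape
# (tie_word_embeddings is false in the config, so the two are not shared).
embedding = vocab_size * hidden_size
lm_head = vocab_size * hidden_size

# Per decoder layer: four attention projections (q, k, v, o), three MLP projections
# (gate, up, down), and two RMSNorm weight vectors. No bias terms are counted.
attention = 4 * hidden_size * hidden_size
mlp = 3 * hidden_size * intermediate_size
norms = 2 * hidden_size
per_layer = attention + mlp + norms

# Assumption: one final RMSNorm of size hidden_size after the last layer.
final_norm = hidden_size

total = embedding + lm_head + num_hidden_layers * per_layer + final_norm
print(f"Parameters per layer: {per_layer:,}")
print(f"Estimated total parameters: {total:,}")
```

The estimate comes out at a little under 7 billion, which is consistent with the "7b" in the model name.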

In [None]:
#Outputs a summary of the model architecture showing the various layers and their configurations.
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      

The print(model) call provides a detailed summary of the model’s architecture. To understand the components, imagine the model as a device that has this simple architecture: Input, Process, Output.

**Input (Embedding)**: Imagine you have a huge dictionary with 32,000 entries, called tokens, which are words or pieces of words. Each token is represented by a vector of 4096 numbers. This is like converting each token into a unique barcode.

**Process (Layers)**: The model processes these barcodes through 32 different steps (layers). At each step, it looks at the tokens and their context, refining its understanding of the input.

**Output (Generating Text)**: After processing, the model uses the refined information to decide which words to generate next, converting the barcodes back into readable words.
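The Input, Process, Output picture can be sketched with a toy example in plain Python. Everything here is made up for illustration: the five-token vocabulary, the three-number vectors, and the averaging "layer" are stand-ins, whereas the real model uses learned attention and MLP blocks and a separate output projection.

```python
# Toy vocabulary and embedding table (one row of 3 numbers per token).
vocab = ["hello", "how", "are", "you", "?"]
embedding_table = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]


def embed(token_ids):
    """Input: look up one vector per token ID."""
    return [embedding_table[i] for i in token_ids]


def toy_layer(vectors):
    """Process: replace each vector with the mean of itself and all earlier vectors,
    so every position only sees what came before it."""
    mixed = []
    for pos in range(len(vectors)):
        window = vectors[: pos + 1]
        mixed.append([sum(column) / len(window) for column in zip(*window)])
    return mixed


def next_token(token_ids, num_layers=2):
    """Output: score every vocabulary entry against the last position and pick the best."""
    hidden = embed(token_ids)
    for _ in range(num_layers):
        hidden = toy_layer(hidden)
    last = hidden[-1]
    scores = [sum(a * b for a, b in zip(last, row)) for row in embedding_table]
    return vocab[scores.index(max(scores))]


# Token IDs 0 and 1 stand for "hello how".
print(next_token([0, 1]))
```

The shape of the computation is the point: IDs become vectors, the vectors are refined layer by layer, and the final vector is turned into a score for every token in the vocabulary.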

## **Generating Text**

In [None]:
# This is the text input you provide to the model. It’s like asking the model a question or giving it a starting sentence.
sample_prompt = "Hello, how are you?"

# The tokenizer converts your text into tokens (numbers that represent words or sub-words). This is necessary because the model works with numbers,
# not raw text. The return_tensors="pt" part tells the tokenizer to return the tokens as a PyTorch tensor,
# which is a data structure used in machine learning.

input_ids = tokenizer.encode(sample_prompt, return_tensors="pt")

# Move the input IDs to the GPU if one is available; otherwise keep them on the CPU.
input_ids = input_ids.to('cuda' if torch.cuda.is_available() else 'cpu')

# The model generates a response based on your input tokens:
output = model.generate(input_ids, max_length=50, num_beams=5, no_repeat_ngram_size=2)

# max_length=50: Caps the total sequence (prompt plus generated tokens) at 50 tokens. You can increase this to get longer responses.
# num_beams=5: Uses beam search, which keeps the 5 most promising candidate sequences at each step, to generate better quality responses.
# no_repeat_ngram_size=2: Prevents any two-token sequence from appearing more than once in the output.

# The tokenizer converts the generated tokens back into human-readable text.
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Finally, output the response
print(f"Generated Response: {response}")

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


Generated Response: Hello, how are you? I'm doing well, thanks for asking! It's great to be here with you today.

I hope you're having a wonderful day, filled with joy and happiness. Is there anything you


## **Code Summary:**

**tokenizer.encode(sample_prompt, return_tensors="pt")**: Encodes the sample prompt into token IDs.

**input_ids.to('cuda' if torch.cuda.is_available() else 'cpu'):** Moves the input IDs to GPU if available.

**model.generate(input_ids, max_length=50, num_beams=5, no_repeat_ngram_size=2):** Generates a response using the model.

**tokenizer.decode(output[0], skip_special_tokens=True):** Decodes the model output into human-readable text.
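The effect of no_repeat_ngram_size can be illustrated without the model. The sketch below is a simplified illustration of the idea, not the library's implementation: given the tokens generated so far, it works out which next tokens would recreate an n-gram that has already appeared. The function name `banned_next_tokens` is hypothetical.

```python
def banned_next_tokens(token_ids, ngram_size):
    """Return the tokens that would complete an n-gram already present in token_ids."""
    if ngram_size < 1 or len(token_ids) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(token_ids[len(token_ids) - (ngram_size - 1):])
    banned = set()
    for start in range(len(token_ids) - ngram_size + 1):
        ngram = tuple(token_ids[start:start + ngram_size])
        # If an earlier n-gram starts with the same prefix, its last token is banned.
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# The sequence ends in 7, and the pair (7, 9) has already appeared, so 9 is banned next.
print(banned_next_tokens([5, 7, 9, 5, 7], ngram_size=2))
```

During generation, tokens in the banned set would be given no chance of being selected at that step, which is what stops the output from looping over the same phrase.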

## **Named Entity Recognition (NER)**

In [None]:
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load a pre-trained NER model
model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Create an NER pipeline
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Test the NER pipeline with a sample sentence
test_sentence = "Apple is planning to build a new campus in Austin."
result = ner_pipeline(test_sentence)
print(result)

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity': 'I-ORG', 'score': 0.99854255, 'index': 1, 'word': 'Apple', 'start': 0, 'end': 5}, {'entity': 'I-LOC', 'score': 0.99840325, 'index': 10, 'word': 'Austin', 'start': 43, 'end': 49}]


This code initializes the NER pipeline with a pre-trained BERT model fine-tuned on the CoNLL-2003 dataset for NER tasks. It then runs a test input to extract named entities from the sentence.

The last part of the output is a list of the entities detected in the text, along with their labels, confidence scores, and character positions.

“Apple” is detected as “I-ORG” (an organization) with a confidence score of about 0.9985, and “Austin” as “I-LOC” (a location) with a score of about 0.9984.
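The raw list is easier to read once it is filtered and the labels are spelled out. The following sketch works on a copy of the output shown above; the names `sample_result`, `LABEL_NAMES`, and `describe_entities`, and the 0.9 score threshold, are illustrative choices rather than part of the pipeline.

```python
# NER output copied from the run above.
sample_result = [
    {"entity": "I-ORG", "score": 0.99854255, "index": 1, "word": "Apple", "start": 0, "end": 5},
    {"entity": "I-LOC", "score": 0.99840325, "index": 10, "word": "Austin", "start": 43, "end": 49},
]

# Readable names for the four CoNLL-2003 entity types.
LABEL_NAMES = {"ORG": "organization", "LOC": "location", "PER": "person", "MISC": "miscellaneous"}


def describe_entities(ner_results, min_score=0.9):
    """Return (word, readable label, rounded score) for entities at or above min_score."""
    rows = []
    for entity in ner_results:
        if entity["score"] < min_score:
            continue
        # Drop the "I-" or "B-" prefix to get the bare entity type.
        entity_type = entity["entity"].split("-")[-1]
        label = LABEL_NAMES.get(entity_type, entity_type)
        rows.append((entity["word"], label, round(entity["score"], 4)))
    return rows


for word, label, score in describe_entities(sample_result):
    print(f"{word}: {label} (score {score})")
```

Raising `min_score` is a simple way to drop low-confidence detections when the pipeline is run on noisier text.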

In [None]:
def display_masked_sentence(sentence, ner_results):
    masked_sentence = sentence
    for entity in ner_results:
        entity_word = entity['word']
        entity_label = entity['entity']
        masked_sentence = masked_sentence.replace(entity_word, f"[{entity_label}]")
    return masked_sentence

# Test the function with the NER results
masked_sentence = display_masked_sentence(test_sentence, result)
print(masked_sentence)

[I-ORG] is planning to build a new campus in [I-LOC].


This function takes the original sentence and the NER results as input. It loops through the detected entities and replaces each entity in the sentence with its corresponding label (e.g., [I-ORG], [I-LOC]).
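One limitation of replacing by text is that str.replace changes every occurrence of the entity word, including ones the model did not label. A possible alternative, sketched below, uses the 'start' and 'end' character offsets that the pipeline already returns. The function name `mask_by_offsets` and the sample variables are hypothetical, and the sample data is copied from the output above.

```python
def mask_by_offsets(sentence, ner_results):
    """Replace each entity span with its label, using the 'start' and 'end' character offsets."""
    masked = sentence
    # Work from right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(ner_results, key=lambda e: e["start"], reverse=True):
        masked = masked[:entity["start"]] + f"[{entity['entity']}]" + masked[entity["end"]:]
    return masked


sample_sentence = "Apple is planning to build a new campus in Austin."
sample_result = [
    {"entity": "I-ORG", "score": 0.99854255, "index": 1, "word": "Apple", "start": 0, "end": 5},
    {"entity": "I-LOC", "score": 0.99840325, "index": 10, "word": "Austin", "start": 43, "end": 49},
]

print(mask_by_offsets(sample_sentence, sample_result))
```

For this sentence the result is the same as with the text-based version; the difference only shows up when the same word appears more than once or when a word is split into several sub-word tokens.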

In [None]:
# Load the summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Test the summarizer with a longer input text
long_text = (
    "The quick brown fox jumps over the lazy dog. The lazy dog, however, was not really lazy. "
    "It was simply tired from chasing after the quick brown fox all day. The two animals had a "
    "long history of playful rivalry, with the fox always outwitting the dog. Despite their differences, "
    "they shared a bond of mutual respect and friendship."
)

summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)

print(summary[0]['summary_text'])

The two animals had a long history of playful rivalry, with the fox always outwitting the dog. Despite their differences, they shared a bond of mutual respect.


In [None]:
# Load the QA pipeline
qa_pipeline = pipeline("question-answering", model="deepset/roberta-base-squad2")

# Define the context and the question
context = (
    "Apple Inc. is an American multinational technology company headquartered in Cupertino, California, "
    "that designs, develops, and sells consumer electronics, computer software, and online services. "
    "It is considered one of the Big Five companies in the U.S. information technology industry, along with "
    "Amazon, Google, Microsoft, and Facebook."
)
question = "Where is Apple Inc. headquartered?"

# Get the answer
answer = qa_pipeline(question=question, context=context)

print(f"Question: {question}")
print(f"Answer: {answer['answer']}")

Question: Where is Apple Inc. headquartered?
Answer: Cupertino, California
