
## Log in to Hugging Face

This section installs the `huggingface_hub` package, which you need in order to log in to Hugging Face.

In [1]:
!pip install --upgrade huggingface_hub
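Installing the package does not log you in by itself. The following is a minimal sketch of one way to log in, assuming the access token is stored in an environment variable named `HF_TOKEN`. The helper name `login_if_token_available` is hypothetical; `login` is the function provided by `huggingface_hub`.

```python
import os


def login_if_token_available(env_var: str = "HF_TOKEN") -> bool:
    """Log in to the Hugging Face Hub if a token is set in the environment.

    Returns True if a login was attempted, False if no token was found.
    The environment variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        print(f"{env_var} is not set; continuing without authentication.")
        return False

    # Imported lazily so the helper can be defined before the package is installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_if_token_available()` before loading a model is enough for public models, where authentication is optional. In an interactive notebook, `huggingface_hub.notebook_login()` is an alternative that prompts for the token.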



## Model Inference (tiny-llama)
This section runs text generation with TinyLlama.

The same code should also be tested against the other models listed below (currently commented out):

```
#model_name = "mistralai/Mistral-7B-v0.1"
#model_name = "meta-llama/Llama-2-7b-chat-hf"
#model_name = "EleutherAI/pythia-1B"
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```

In [1]:
# imports
from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM

# set the model as tiny llama
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# get the tokenizer from the model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_name)

# Create a pipeline for text generation
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

# Generate text based on a prompt
prompt = "What is api?"
generated_text = generator(prompt, max_length=50)

# print the result
print(generated_text[0]['generated_text'])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


What is api?


## Get the model config (tiny-llama)
Load the model config to inspect settings such as the vocabulary size and the hidden (embedding) size.

In [4]:
from transformers import AutoConfig

# get the config
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
config = AutoConfig.from_pretrained(model_name)

# print the config
print(config)


LlamaConfig {
  "_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.35.2",
  "use_cache": true,
  "vocab_size": 32000
}
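A few derived quantities help when reading this config. The sketch below copies values from the printed `LlamaConfig` and works out the per-head dimension and the width of the key/value projections. The variable names are illustrative and not part of the library.

```python
# Values copied from the printed LlamaConfig above.
config = {
    "hidden_size": 2048,
    "num_attention_heads": 32,
    "num_key_value_heads": 4,
}

# Each attention head works on an equal slice of the hidden size.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# With fewer key/value heads than query heads, the key and value
# projections are narrower than the query projection.
kv_width = config["num_key_value_heads"] * head_dim

# Number of query heads that share one key/value head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

print(f"head_dim: {head_dim}")                      # 64
print(f"key/value projection width: {kv_width}")    # 256
print(f"query heads per key/value head: {queries_per_kv_head}")  # 8
```

The width of 256 matches the `out_features=256` of `k_proj` and `v_proj` in the model printout in the next section.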



## Get the model
This section shows how to load the model in two ways: by naming the architecture class manually, and by letting the library choose it automatically.

### Get the model (llama)
Now that we know the architecture from the config (`LlamaForCausalLM`), we can load the model with that class directly.

In [None]:
from transformers import LlamaForCausalLM

# get the model
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = LlamaForCausalLM.from_pretrained(model_name)

# print details of the model
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 2048)
    (layers): ModuleList(
      (0-21): 22 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (up_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (down_proj): Linear(in_features=5632, out_features=2048, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head)
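The layer shapes in the printout are enough to estimate the parameter count by hand. The sketch below is a rough calculation with three assumptions: the printout is cut off at `lm_head`, so it is taken to be a bias-free linear layer from 2048 to 32000 that is not shared with the embedding (the config has `tie_word_embeddings: false`); each `LlamaRMSNorm` is taken to hold one weight vector of the hidden size; and the rotary embedding is taken to have no trainable weights.

```python
# Shapes read from the model printout above.
vocab_size = 32000
hidden = 2048
intermediate = 5632
kv_width = 256
num_layers = 22

# Token embedding table.
embed_tokens = vocab_size * hidden

# Attention: q_proj and o_proj are hidden x hidden, k_proj and v_proj are hidden x kv_width.
attention = 2 * hidden * hidden + 2 * hidden * kv_width

# MLP: gate_proj, up_proj and down_proj all connect hidden and intermediate sizes.
mlp = 3 * hidden * intermediate

# Assumption: each of the two RMSNorm layers per block has `hidden` weights.
norms = 2 * hidden

per_layer = attention + mlp + norms

# Assumption: final norm has `hidden` weights; lm_head is hidden x vocab_size, no bias.
final_norm = hidden
lm_head = hidden * vocab_size

total = embed_tokens + num_layers * per_layer + final_norm + lm_head
print(f"parameters per decoder layer: {per_layer:,}")  # 44,044,288
print(f"estimated total parameters:   {total:,}")      # 1,100,048,384
```

The estimate of roughly 1.1 billion parameters is consistent with the "1.1B" in the model name.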

### Get the model (automatically)
A nice feature of the transformers library is that you don't need to know the architecture in advance. The library reads the config (just as we did above) and picks the matching architecture automatically.

In [None]:
from transformers import AutoModelForCausalLM

# get the model
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_name)

# print details of the model
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 2048)
    (layers): ModuleList(
      (0-21): 22 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (up_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (down_proj): Linear(in_features=5632, out_features=2048, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head)
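The automatic lookup can be pictured as a table keyed by the `model_type` field of the config. The sketch below is a simplified illustration of that idea, not the library's actual implementation. The registry `ARCHITECTURE_BY_MODEL_TYPE` and the function `resolve_causal_lm_class_name` are hypothetical names, and the table holds only a few entries.

```python
import json

# Hypothetical, heavily simplified registry for illustration only.
ARCHITECTURE_BY_MODEL_TYPE = {
    "llama": "LlamaForCausalLM",
    "mistral": "MistralForCausalLM",
    "gpt_neox": "GPTNeoXForCausalLM",
}


def resolve_causal_lm_class_name(config_json: str) -> str:
    """Return the causal-LM class name for the model_type found in a config."""
    config = json.loads(config_json)
    model_type = config["model_type"]
    try:
        return ARCHITECTURE_BY_MODEL_TYPE[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model_type: {model_type!r}") from None


# A trimmed-down version of the TinyLlama config printed earlier.
tiny_llama_config = '{"model_type": "llama", "architectures": ["LlamaForCausalLM"]}'
print(resolve_causal_lm_class_name(tiny_llama_config))  # LlamaForCausalLM
```

This is why the `AutoModelForCausalLM` call above prints the same `LlamaForCausalLM` structure as the manual load.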

## Get the tokenizer
Load the model's tokenizer and print its details.

In [None]:
from transformers import AutoTokenizer

# get the tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# print the tokenizer
print(tokenizer)

LlamaTokenizerFast(name_or_path='TinyLlama/TinyLlama-1.1B-Chat-v1.0', vocab_size=32000, model_max_length=2048, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}


### Manual Tokenizer
You can also look up the tokenizer config on Hugging Face and select the tokenizer class manually.

In [None]:
from transformers import LlamaTokenizer

# get the tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = LlamaTokenizer.from_pretrained(model_name)

# print the tokenizer
print(tokenizer)


LlamaTokenizer(name_or_path='TinyLlama/TinyLlama-1.1B-Chat-v1.0', vocab_size=32000, model_max_length=2048, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}


The manually selected tokenizer also works for inference:

In [10]:
# imports
from transformers import LlamaTokenizer, pipeline, AutoModelForCausalLM

# set the model as tiny llama
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# get the tokenizer from the model
tokenizer = LlamaTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_name)

# Create a pipeline for text generation
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

# Generate text based on a prompt
prompt = "Explain LLms"
generated_text = generator(prompt, max_length=500)

# print the result
print(generated_text[0]['generated_text'])

Explain LLms>

LLMS is a learning management system (LMS) that provides a comprehensive platform for online courses, e-learning, and online training. It offers features such as course creation, content management, assessment, and reporting.

1. Course Creation: LLMS allows you to create courses with different modules, sections, and quizzes. You can also customize the course layout, add multimedia content, and assign grading criteria.

2. Content Management: LLMS allows you to manage course content, including text, images, videos, and audio files. You can also create custom content types, such as quizzes, assessments, and surveys.

3. Assessment: LLMS provides a range of assessment options, including multiple-choice questions, true/false questions, essay questions, and quizzes. You can also customize the assessment type and scoring criteria.

4. Reporting: LLMS provides detailed reporting on course performance, such as enrollment, completion rates, and student feedback. You can also gen

### Llama Tokenizer and Mistral Architecture
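Mistral and Llama use different tokenizers, but their vocabularies have the same size and their architectures are closely related. As a result, you can plug the Llama tokenizer into Mistral without any error. Because the same token IDs stand for different text in each vocabulary, the model returns gibberish.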

In [11]:
# imports
from transformers import AutoTokenizer

# choose the model whose tokenizer to inspect (alternatives are commented out)
#model_name = "mistralai/Mistral-7B-v0.1"
#model_name = "meta-llama/Llama-2-7b-chat-hf"
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# get the tokenizer from the model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# set the prompt
prompt = "What is Api?"
encoded_prompt = tokenizer.encode(prompt)

# print the prompt
print(f"prompt: {prompt}")
print(f"encoded: {encoded_prompt}")

# print each token ID next to its decoded text
for item in encoded_prompt:
    decoded = tokenizer.decode(item)
    print(f"|{item}|:|{decoded}|")

# print out the length
print()
print(f"length: {len(encoded_prompt)}")

prompt: What is Api?
encoded: [1, 1724, 338, 29749, 29973]
|1|:|<s>|
|1724|:|What|
|338|:|is|
|29749|:|Api|
|29973|:|?|

length: 5
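The token IDs above only mean something relative to the vocabulary that produced them. The toy sketch below shows why mixing a tokenizer from one model with another model goes wrong: two vocabularies of the same size assign different text to the same IDs. Both vocabularies and the `encode`/`decode` helpers are hypothetical stand-ins and far smaller than a real tokenizer.

```python
# Two hypothetical toy vocabularies of the same size.
# The list index plays the role of the token ID.
vocab_a = ["<s>", "What", "is", "Api", "?"]
vocab_b = ["<s>", "?", "Api", "What", "is"]


def encode(words, vocab):
    """Map each word to its ID (its position in the vocabulary)."""
    return [vocab.index(word) for word in words]


def decode(ids, vocab):
    """Map each ID back to the text stored at that position."""
    return " ".join(vocab[token_id] for token_id in ids)


ids = encode(["<s>", "What", "is", "Api", "?"], vocab_a)
print(ids)                    # [0, 1, 2, 3, 4]
print(decode(ids, vocab_a))   # <s> What is Api ?
print(decode(ids, vocab_b))   # <s> ? Api What is
```

Every ID is valid in both vocabularies, so nothing fails. The text simply no longer matches what was encoded.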


In [None]:
# imports
from transformers import LlamaTokenizer, pipeline, AutoModelForCausalLM

# deliberately mismatch: TinyLlama (Llama) tokenizer with the Mistral model
tokenizer_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model_name = "mistralai/Mistral-7B-v0.1"

# get the tokenizer from the TinyLlama checkpoint
tokenizer = LlamaTokenizer.from_pretrained(tokenizer_model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_name)

# Create a pipeline for text generation
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

# Generate text based on a prompt
prompt = "What is api?"
generated_text = generator(prompt, max_length=50)

# print the result
print(generated_text[0]['generated_text'])
