# LLM INTRODUCTORY IMPLEMENTATION

transformers: This library provides pre-trained models and utilities for natural language processing (NLP) tasks such as text classification, question answering, and text generation. It supports popular transformer-based models like BERT, GPT-2, and T5.


In [None]:
!pip install transformers datasets

Collecting datasets
  Downloading datasets-2.19.1-py3-none-any.whl (542 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
Installing collected packages: xxhash, dill, multiprocess, datasets
Successfully installed datasets

In [None]:
!pip install accelerate bitsandbytes -q

In [None]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM


We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find the resulting models capable of crosslingual generalization to unseen tasks & languages.

In [None]:
checkpoint = "bigscience/bloomz-1b1"

1. Transformers:

What they are: Transformers are a neural network architecture originally designed for natural language processing (NLP) tasks. They excel at capturing long-range dependencies between words in a sequence, which is critical for understanding complex language.

Key techniques:

- Encoder-decoder structure: The original Transformer uses an encoder-decoder structure. The encoder processes the input sequence (e.g., a sentence in the source language), and the decoder generates the output sequence (e.g., the translated sentence in the target language). Many language models, including GPT-2 and BLOOM, use only the decoder stack.
- Self-attention mechanism: This is the heart of a transformer. Instead of processing words one after another, self-attention lets the model look at the whole sequence at once and weigh how relevant every other token is to the token currently being processed.
- Multi-head attention: An extension of self-attention that runs several attention operations in parallel, so the model can attend to different aspects of the input simultaneously.
- Positional encoding: Self-attention on its own ignores word order, so positional information is injected into the input to tell the model where each token sits in the sequence.
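
The self-attention and positional-encoding ideas above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: a single attention head, no learned projection matrices (queries, keys, and values are all the input vectors), made-up embedding values, and the sinusoidal position scheme from the original Transformer. Real models, including the checkpoint used in this notebook, may differ in all of these details.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors, causal=False):
    """Scaled dot-product self-attention with Q = K = V = vectors.

    Assumption: no learned projection matrices and a single head.
    With causal=True, position i may only attend to positions <= i.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for i, query in enumerate(vectors):
        scores = []
        for j, key in enumerate(vectors):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out future positions
            else:
                dot = sum(q * k for q, k in zip(query, key))
                scores.append(dot / math.sqrt(dim))
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output is a weighted average of the visible input vectors.
        outputs.append([
            sum(w * vec[c] for w, vec in zip(weights, vectors))
            for c in range(dim)
        ])
    return outputs, all_weights

def positional_encoding(num_positions, dim):
    """Sinusoidal position vectors: sine on even indices, cosine on odd."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# Three toy 4-dimensional "token embeddings" (made-up numbers).
embeddings = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
positions = positional_encoding(len(embeddings), 4)
inputs = [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, positions)]
outputs, weights = self_attention(inputs, causal=True)
print(weights[0])  # the first token can only attend to itself
```

Setting `causal=True` mirrors how a decoder-only language model hides future tokens; with `causal=False` every position can attend to every other position.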
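2. AutoTokenizer:

What it is: AutoTokenizer is a class from the transformers library that loads the tokenizer matching a given checkpoint. A tokenizer converts raw text into the numeric input a model expects by:

- Splitting text into words or subwords (depending on the model's requirements).
- Mapping each token to a unique integer ID based on a vocabulary.
- Adding special tokens (e.g., start/end of sentence markers).
- Padding sequences to a fixed length (if required by the model).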
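
To make these four steps concrete, here is a toy tokenizer in plain Python. It is a hypothetical illustration, not how AutoTokenizer is implemented: the class name `ToyTokenizer`, the special-token strings, and the whitespace splitting are all assumptions chosen for brevity, whereas real tokenizers usually split text into subwords.

```python
class ToyTokenizer:
    """Hypothetical word-level tokenizer illustrating the four steps."""

    SPECIALS = ["<pad>", "<s>", "</s>", "<unk>"]

    def __init__(self, corpus):
        # Build the vocabulary: special tokens first, then sorted words.
        words = sorted({w for text in corpus for w in text.lower().split()})
        self.vocab = {tok: i for i, tok in enumerate(self.SPECIALS + words)}
        self.inverse = {i: tok for tok, i in self.vocab.items()}

    def encode(self, text, max_length=None):
        # 1. Split text into tokens (whitespace split, an assumption).
        tokens = text.lower().split()
        # 2. Map tokens to integer IDs; unknown words become <unk>.
        ids = [self.vocab.get(t, self.vocab["<unk>"]) for t in tokens]
        # 3. Add start and end markers.
        ids = [self.vocab["<s>"]] + ids + [self.vocab["</s>"]]
        # 4. Pad up to max_length if the sequence is shorter.
        if max_length is not None and len(ids) < max_length:
            ids += [self.vocab["<pad>"]] * (max_length - len(ids))
        return ids

    def decode(self, ids):
        # Drop special tokens and join the remaining words.
        words = [self.inverse[i] for i in ids]
        return " ".join(w for w in words if w not in self.SPECIALS)


toy = ToyTokenizer(["i love you", "you love cake"])
ids = toy.encode("I love you", max_length=7)
print(ids)
print(toy.decode(ids))
```

The `encode` and `decode` method names echo the calls used on the real tokenizer later in this notebook, but the internals here are deliberately simplified.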
3. AutoModelForCausalLM (Causal Language Model):

What it is: AutoModelForCausalLM is a class from the transformers library that loads a pre-trained causal language model (CLM) for a given checkpoint. A CLM predicts the next token in a sequence from the preceding tokens only. Repeating this prediction step produces text, which makes CLMs a natural fit for text generation. Tasks such as translation and question answering can also be phrased as prompts that the model continues with the answer.
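
The "predict the next token, append it, repeat" loop can be sketched with a tiny stand-in model. The example below assumes a bigram counter in place of a neural network, so it conditions only on the single previous word, while a real causal language model conditions on the whole prefix. The function names and the three-sentence corpus are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows which; "</s>" marks the end of a sentence."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens=10):
    """Greedy decoding: always append the most frequent next word."""
    tokens = prompt.split()
    if not tokens:
        return ""
    for _ in range(max_new_tokens):
        candidates = counts.get(tokens[-1])
        if not candidates:
            break  # the model has never seen this word
        next_token = candidates.most_common(1)[0][0]
        if next_token == "</s>":
            break  # the model predicts the end of the sentence
        tokens.append(next_token)
    return " ".join(tokens)

corpus = ["the cat sat down", "the cat sat down", "the cat ran"]
counts = train_bigram(corpus)
print(generate(counts, "the"))
print(generate(counts, "the", max_new_tokens=1))
```

Generation stops either when the end marker is predicted or when the budget of new tokens runs out, which is the same pair of stopping conditions that applies to the model used below.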

In [None]:
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=False)


In [None]:
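# Move the input IDs to the model's device, since device_map="auto" may place the model on a GPU.
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to(model.device)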
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

Translate to English: Je t’aime. I love you.</s>


In [None]:
inputs = tokenizer.encode("Write a story of student in 1000 words...", return_tensors="pt").to(model.device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

Write a story of student in 1000 words... I am a student in the first year of my college


In [None]:
inputs = tokenizer.encode("Once upon a time,there was a", return_tensors="pt").to(model.device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

Once upon a time,there was a man who was a very good cook. He was a very
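
The last two outputs stop mid-sentence. A likely reason is that `generate` was called without any length setting, so the library's default limit applies (a total of 20 tokens in many transformers releases, though this can vary by version and model configuration). The helper below is a minimal sketch of one way to ask for more text. The name `generate_text` and the default of 50 new tokens are assumptions for illustration; it expects the `model` and `tokenizer` objects loaded earlier in this notebook.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Generate a continuation with an explicit budget of new tokens.

    `generate_text` is a hypothetical helper, not part of transformers.
    """
    # Tokenize and move the tensors to the device the model lives on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # max_new_tokens counts only generated tokens, not the prompt.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop markers such as </s> from the decoded text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `print(generate_text(model, tokenizer, "Once upon a time, there was a", max_new_tokens=100))` should produce a longer continuation than the default call, although a model of this size may still repeat itself or drift off topic.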
