# Google colab example

In this notebook we will play with [Flacon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model form HuggingFace.

Please run the following command in your virtual environment to install dependencies:
```sh
pip install -q --no-cache-dir transformers torch einops -U accelerate -q
```

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

## Load the model

In [4]:
model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # Half precision ~14 GB
    trust_remote_code=True,
    device_map="auto",
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/287 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.73M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/281 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

configuration_falcon.py:   0%|          | 0.00/7.16k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- configuration_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



modeling_falcon.py:   0%|          | 0.00/56.9k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- modeling_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


pytorch_model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/4.48G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/117 [00:00<?, ?B/s]

## Interactive text generation

In [5]:
while True:
    sequences = pipeline(
    input("Ask something, here"),
        max_length=200,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
    )
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

Ask something, hereWhat is the meaning of life?


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


Result: What is the meaning of life?
The meaning of life is a highly debated and subjective idea. Some people may find it in spirituality, personal fulfillment, relationships, or a combination of these. It is a topic that can evoke deep emotions and beliefs, and there is no one answer, but it is something to ponder.


KeyboardInterrupt: Interrupted by user

## Or function based

In [6]:
def query(q:str) -> None:
    template = (
        "You are an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite "
        "answers to the user's questions. Question: "
    )
    query = template + q
    sequences = pipeline(
        q,
        max_length=200,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
    )
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

In [7]:
query("What is the meaning of life?")

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


Result: What is the meaning of life?
The meaning of life can be subjective and may vary according to personal values and beliefs. It can be seen as a purpose or a journey to achieve a particular goal.
