<a href="https://colab.research.google.com/github/amankiitg/LLM_Prod/blob/main/LLM_Prod_Lecture_1_Introduction_to_Language_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Chapter 1 - Introduction to Language Models</h1>
<i>Exploring the exciting field of Language AI</i>




### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or another cloud provider), run the following code block to install the dependencies for this chapter. Uncomment `%%capture` first if you want to hide the installation output:

---

💡 **NOTE**: The examples in this notebook require a GPU. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---
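
Before loading the model, it can help to confirm that the runtime actually exposes a GPU. The following is a minimal sketch, assuming PyTorch is the backend (it is typically preinstalled on Colab); the helper name `cuda_available` is hypothetical and not part of any library.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and can see a CUDA GPU."""
    # Hypothetical helper for illustration; not part of transformers or torch.
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment
    import torch

    return torch.cuda.is_available()


if cuda_available():
    print("GPU detected; loading the model on cuda should work.")
else:
    print("No GPU detected; switch the runtime type before loading the model.")
```

If the second message is printed on Colab, change the runtime type as described in the note and rerun the cell.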

In [1]:
# %%capture
# Quote the version specifiers so the shell does not treat ">" as a redirect
!pip install "transformers>=4.40.1" "accelerate>=0.27.2"

# Phi-3

The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately, although that isn't always necessary: a `pipeline` can also load both directly from a model name.

In [7]:
from transformers import AutoModelForCausalLM, AutoTokenizer

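# Load the model onto the GPU
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",       # place all weights on the GPU
    torch_dtype="auto",      # use the dtype recorded in the model's config
    trust_remote_code=True,  # allow custom model code from the Hub repository
)
# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")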


Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:

In [3]:
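# Create a pipeline that bundles the model and tokenizer
from transformers import pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, not the prompt
    max_new_tokens=500,      # upper bound on the number of generated tokens
    do_sample=False          # greedy decoding: always pick the most likely token
)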


Device set to use cuda
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
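
Setting `do_sample=False` asks for greedy decoding: at each step the most likely next token is chosen, so the output is deterministic and sampling parameters such as `temperature` have no effect, which is likely why the warning above lists it as ignored. The toy sketch below illustrates the difference on a made-up next-token distribution. The vocabulary, the logits, and the function names are all hypothetical, and this is not how `transformers` implements generation internally.

```python
import math
import random

# Made-up scores for three candidate next tokens (illustrative only).
logits = {"film": 2.0, "movie": 1.5, "story": 0.1}


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a temperature."""
    exps = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy_pick(scores):
    """do_sample=False analogue: always take the highest-scoring token."""
    return max(scores, key=scores.get)


def sample_pick(scores, temperature=1.0, rng=random):
    """do_sample=True analogue: draw a token according to its probability."""
    probs = softmax(scores, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


print(greedy_pick(logits))  # always "film"
print(sample_pick(logits))  # varies from run to run
```

Greedy decoding never consults the temperature, whereas sampling uses it to sharpen or flatten the distribution before drawing a token.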


Finally, we write our prompt as a user message and pass it to the model:

In [4]:
# Phi-3 expects chat prompts formatted in a specific template
messages = [
    {"role": "user", "content": "Tell me a short summary of the 2013 movie HER"}
]

In [5]:
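# Render the messages into the prompt string format the model expects;
# add_generation_prompt=True appends the marker that opens the assistant's turn
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True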
)
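
To see roughly what `apply_chat_template` returns, the sketch below rebuilds the prompt string by hand. It assumes the Phi-3 layout of `<|user|>`, `<|end|>`, and `<|assistant|>` markers separated by newlines. The function `format_phi3_prompt` is hypothetical; the template stored with the tokenizer is authoritative and may differ in details such as a leading beginning-of-sequence token, so printing `prompt` in the notebook is the way to see the exact string.

```python
def format_phi3_prompt(messages, add_generation_prompt=True):
    """Hand-rolled approximation of the Phi-3 chat format (illustration only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|role|>\n{content}<|end|>\n
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Tell me a short summary of the 2013 movie HER"}
]
print(format_phi3_prompt(messages))
```

Using the tokenizer's own template rather than a hand-written one keeps the prompt in sync with the format the model was trained on.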

In [6]:
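# Generate a response. The first three keyword arguments repeat the settings
# already given when the pipeline was created; use_cache=False turns off the
# key/value cache, which slows generation and is usually only needed as a
# compatibility workaround.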
output = generator(
    prompt,
    max_new_tokens=500,
    do_sample=False,
    return_full_text=False,
    use_cache=False,
)
print(output[0]["generated_text"])




 "Her" is a romantic science fiction film directed by Spike Jonze, released in 2013. The story follows Theodore Twombly, a lonely and unfulfilled man who develops a relationship with an operating system named Samantha. Samantha is an artificial intelligence designed to be a personal companion, and she quickly becomes Theodore's best friend and confidante. As their relationship deepens, Theodore begins to fall in love with Samantha, who is more than just a program. The film explores themes of love, technology, and the human condition, ultimately questioning the nature of relationships and what it means to be human.
