# Tokenizers and models

Let's begin by testing how to use tokenizers and models from HuggingFace.

In [1]:
%pip install transformers
%pip install datasets
%pip install openai
%pip install scikit-learn
%pip install numpy
%pip install sentence_transformers

In [54]:

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
    pipeline
)
from typing import List
from datasets import load_dataset
from openai import AzureOpenAI
from sklearn.metrics import accuracy_score
import os
from sklearn.neighbors import NearestNeighbors
import numpy as np
from sentence_transformers import SentenceTransformer



### Load GPT-2 model and tokenizer from Huggingface

In [3]:
# Load the gpt-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Load the gpt-2 model with the text generation head
gpt2_model = AutoModelForCausalLM.from_pretrained("gpt2")



### Try out the loaded tokenizer

In [4]:
# Encoding can be done with encode method or via calling the tokenizer callable
input_text = "The most important thing in life is"
encoded_input = tokenizer.encode(input_text)

print("Encoded input:")
print(encoded_input)
print("\nEncoded input with tokenizer callable:")
print(tokenizer(input_text))

# Decoding can be done with the decode method
# When decoding the encoded input, the tokenizer should return the original text.
print("\nDecoded input:")
print(tokenizer.decode(encoded_input))

Encoded input:
[464, 749, 1593, 1517, 287, 1204, 318]

Encoded input with tokenizer callable:
{'input_ids': [464, 749, 1593, 1517, 287, 1204, 318], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}

Decoded input:
The most important thing in life is


### Try out the loaded model

In [5]:
# Inference can be done by calling .generate method of the model
model_output = gpt2_model.generate(**tokenizer(input_text, return_tensors="pt"), max_new_tokens=10)
print("Model output tokens:")
print(model_output[0])
print("\nModel output decoded:")
print(tokenizer.decode(model_output[0]))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Model output tokens:
tensor([ 464,  749, 1593, 1517,  287, 1204,  318,  284,  307, 1498,  284,  466,
        1223,  326,  345, 1842,   13])

Model output decoded:
The most important thing in life is to be able to do something that you love.


### TODO
The above output was fairly reasonable for the GPT-2 model. What happens if you increase `max_new_tokens`?

Try it out!

### Try out a model trained for classification

The previous GPT-2 model was trained for the causal language modelling task, i.e. to predict the text continuation. Let's try out a model trained for a classification task.

#### ProsusAI/finbert:

"FinBERT is a pre-trained NLP model to analyze sentiment of financial text. It is built by further training the BERT language model in the finance domain, using a large financial corpus and thereby fine-tuning it for financial sentiment classification."


In [6]:
# Load the finbert tokenizer
finbert_tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")

# Load the finbert model with the sequence classification head
finbert_model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")



### Try out the classification model

Notice that the model is now invoked by calling it directly, not through the `.generate` method, and that there is no `max_new_tokens` input parameter.

In [7]:
input_text = "Top private equity firms put brakes on China dealmaking"
model_output = finbert_model(**finbert_tokenizer(input_text, return_tensors="pt"))
print("Model output (logits for positive, negative, neutral):")
print(model_output[0])


Model output (logits for positive, negative, neutral):
tensor([[-1.7899,  2.5756,  0.2115]], grad_fn=<AddmmBackward0>)
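
The three values above are raw logits rather than probabilities: they do not sum to 1 and one of them is negative. A softmax turns them into class probabilities. The sketch below uses only the standard library and the logits copied from the output above; the label order (positive, negative, neutral) is taken from the print statement in the previous cell. In the notebook itself, `torch.softmax(model_output.logits, dim=-1)` should give the same numbers.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability; the result is unchanged.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the FinBERT output above.
logits = [-1.7899, 2.5756, 0.2115]
# Label order as stated in the print statement of the previous cell.
labels = ["positive", "negative", "neutral"]

probabilities = dict(zip(labels, softmax(logits)))
predicted_label = max(probabilities, key=probabilities.get)

for label, p in probabilities.items():
    print(f"{label}: {p:.4f}")
print("Predicted:", predicted_label)
```

The probability for `negative` comes out at roughly 0.90, so the headline is classified as negative.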


### TODO

1. Make sure you understand the model output.
2. Try out the finbert model some more and test it with some other inputs. Can you find examples for which it outputs a faulty classification (sentiment)?
3. If you have time, search HuggingFace for a model that looks interesting and try it out. You can also use the HuggingFace portal "Inference API" directly if you want.

### HuggingFace pipeline

HuggingFace also has a convenient `pipeline` abstraction for model inference. It offers a simple API for running models without the need to load, for instance, tokenizers separately.


In [147]:
pipe = pipeline("text-classification", model="ProsusAI/finbert")

input_text = "Top private equity firms put brakes on China dealmaking"
pipe(input_text)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'label': 'negative', 'score': 0.9035527110099792}]

# LLM APIs

It's easy to deploy models to the cloud by using any of the LLM API providers. Let's test how to run models deployed on Azure AI Studio.

### Test GPT-3.5 Turbo

In [55]:
#TODO: Replace this with the given API key
api_key = os.getenv("AZURE_API_KEY")
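
If the environment variable is not set, `os.getenv` silently returns `None` and the error only shows up later when the client is used. A small guard can make that failure easier to spot. This is a minimal sketch; the helper name `read_api_key` is hypothetical, and it assumes the key is stored in `AZURE_API_KEY` as in the cell above.

```python
import os

def read_api_key(var_name: str = "AZURE_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "set it before creating the client."
        )
    return key

# Usage in the notebook (raises RuntimeError if the variable is missing):
#   api_key = read_api_key()
```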

The GPT-3.5 model is trained for causal language modelling (text continuation), and the deployed model has a "completions" endpoint for that purpose.

In [59]:
deployment_name="gpt-35-turbo-instruct"
api_version="2023-05-15"
task = "completions"

endpoint = f"https://learning-sprint-openai.openai.azure.com/openai/deployments/{deployment_name}/{task}?api-version={api_version}"

client = AzureOpenAI(
    api_key=api_key,  
    api_version=api_version,
    azure_endpoint = endpoint
    )

input = "The best way to learn how to build RAG applications is to "
response = client.completions.create(model=deployment_name, prompt=input)

print(f"Input: {input}")
print(f"Response: {response.choices[0].text}")

Input: The best way to learn how to build RAG applications is to 
Response:  work through a detailed tutorial or take a course specifically on RAG development. Here


### Test GPT-4o-mini

GPT-4o mini is specifically built for chat, so the deployed model has a "chat/completions" endpoint. Notice that the input also has a pre-defined structure: a list of messages, each of which has "role" and "content" fields.

In [65]:
deployment_name="gpt-4o-mini"
api_version="2023-05-15"
task = "chat/completions"

endpoint = f"https://learning-sprint-openai.openai.azure.com/openai/deployments/{deployment_name}/{task}?api-version={api_version}"

client = AzureOpenAI(
    api_key=api_key,  
    api_version=api_version,
    azure_endpoint = endpoint
    )

messages = [{"role":"system", "content":"You are a helpful assistant giving short answers that are to the point."},
            {"role": "user", "content": "What is the best way to learn how to build RAG applications?"}]
response = client.chat.completions.create(model=deployment_name, messages=messages, max_tokens=100)

print(f"Response: {response.choices[0]}")

Input: {'messages': [{'role': 'user', 'content': 'The best way to learn how to build RAG applications is to '}]}
Response: Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='The best way to learn to build RAG (Retrieval-Augmented Generation) applications is to:\n\n1. **Understand the Basics**: Familiarize yourself with NLP, machine learning, and the components of RAG, such as retrieval systems and generative models.\n\n2. **Study Frameworks and Tools**: Learn to use libraries like Hugging Face Transformers, PyTorch, or TensorFlow.\n\n3. **Follow Tutorials**: Use online resources and tutorials specifically designed for RAG applications.\n\n4', refusal=None, role='assistant', function_call=None, tool_calls=None))
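
The message structure used above can be wrapped in two small helpers, which also shows how a conversation grows turn by turn. This is a sketch only: `build_messages` and `append_turn` are hypothetical names, and the assistant reply below is a made-up placeholder rather than real model output.

```python
from typing import Dict, List

def build_messages(system_prompt: str, user_prompt: str) -> List[Dict[str, str]]:
    """Build the message list expected by a chat completions endpoint."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def append_turn(messages: List[Dict[str, str]], role: str, content: str) -> List[Dict[str, str]]:
    """Return a new list with one more message; the input list is not modified."""
    return messages + [{"role": role, "content": content}]

messages = build_messages(
    "You are a helpful assistant giving short answers that are to the point.",
    "What is the best way to learn how to build RAG applications?",
)

# To continue the conversation, add the assistant's reply and the next question.
# The assistant text here is a placeholder, not an actual response.
messages = append_turn(messages, "assistant", "Start with a small tutorial project.")
messages = append_turn(messages, "user", "Which libraries should I look at first?")

for message in messages:
    print(f"{message['role']}: {message['content']}")
```

In the recorded output above, `finish_reason='length'` indicates that the answer was cut off by `max_tokens=100`. To print only the generated text, use `response.choices[0].message.content`.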


In [66]:
deployment_name="text-embedding-3-large"
api_version="2023-05-15"
task = "embeddings"

endpoint = f"https://learning-sprint-openai.openai.azure.com/openai/deployments/{deployment_name}/{task}?api-version={api_version}"

client = AzureOpenAI(
    api_key=api_key,  
    api_version=api_version,
    azure_endpoint = endpoint
    )
    
input = "Some text to generate embeddings for."
response = client.embeddings.create(model=deployment_name, input=input)

print(f"Input: {input}")
print(f"Response: {response.data[0].embedding}")

Input: Some text to generate embeddings for.
Response: [-0.00311694061383605, 0.0067453240044415, -0.01633756048977375, -0.016022827476263046, 0.02878386527299881, 0.0049248733557760715, -0.015965603291988373, 0.040142904967069626, -0.0030078566633164883, -0.017782477661967278, 0.015922684222459793, 0.02266085520386696, 0.003998553846031427, 0.002616227138787508, 0.016537846997380257, -0.011258897371590137, -0.010457755997776985, 0.019399065524339676, 0.009456329047679901, -0.03078671731054783, -0.006530732847750187, 0.0011659468291327357, -0.055364590138196945, -0.005983524490147829, 0.020543552935123444, -0.012288936413824558, -0.01659507118165493, 0.011158755049109459, 0.0016371537931263447, 0.05888389050960541, -0.005740320775657892, -0.01766802743077278, 0.01195274293422699, -0.03957065939903259, 0.017854006960988045, 0.011401957832276821, 0.02831176295876503, 0.03605136275291443, 0.0016022827476263046, -0.006688099820166826, 0.05167361721396446, -0.0014422332169488072, -0.0427180
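
An embedding is just a list of floats, and two texts are typically compared by the cosine similarity of their vectors. The sketch below illustrates this with toy 3-dimensional vectors that stand in for real embeddings; the vectors and variable names are invented for illustration and are not output from the model.

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings of three texts (hypothetical values).
v_cocoa = [0.9, 0.1, 0.0]
v_coffee = [0.8, 0.2, 0.1]
v_phones = [0.0, 0.1, 0.9]

similar = cosine_similarity(v_cocoa, v_coffee)
dissimilar = cosine_similarity(v_cocoa, v_phones)

print(f"cocoa vs coffee: {similar:.3f}")
print(f"cocoa vs phones: {dissimilar:.3f}")
```

With real embeddings, the same function could be applied to `response.data[0].embedding` from two separate requests.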

# Embeddings and RAG

Next, let's build a very simple RAG application. The application uses financial news articles as a database. It can find articles similar to a given one and generate some additional information about the retrieved articles.

### Load a dataset from HuggingFace

In [140]:
fina_news = load_dataset("Aappo/fina_news_1000")

Downloading readme: 100%|██████████| 430/430 [00:00<00:00, 2.69kB/s]
Downloading data: 100%|██████████| 1.33M/1.33M [00:00<00:00, 1.61MB/s]
Generating train split: 100%|██████████| 1000/1000 [00:00<00:00, 59926.33 examples/s]


The loaded dataset contains financial news data (news headline, journalists, date, link to the article and the article text).

In [141]:
fina_news['train'][0]

{'Headline': 'Ivory Coast Keeps Cocoa Export Tax Below 22%, Document Shows',
 'Journalists': ['Baudelaire Mieu'],
 'Date': Timestamp('2011-10-06 15:14:20'),
 'Link': 'http://www.bloomberg.com/news/2011-10-06/ivory-coast-keeps-cocoa-export-tax-below-22-document-shows.html',
 'Article': 'Export taxes on cocoa beans from Ivory Coast , the world’s biggest producer of the chocolate ingredient, won’t exceed 22 percent of the international price this season, meeting a commitment to the International Monetary Fund , according to a finance ministry document. In the 2008-9 season taxes averaged 25.3 percent of international prices, the IMF said in a document posted on its website in November last year. While the country met the commitment in the season just ended, it had a change in government earlier this year. The rate meets a demand by the International Monetary Fund and the World Bank to reform the Ivorian cocoa and coffee industries in order to comply with the terms of its Heavily Indebted 

We will use an embedding model from HuggingFace. Embedding models can be loaded by using the SentenceTransformer class.

In [101]:
embedder = SentenceTransformer("msmarco-distilbert-base-v4")



### Some helper functions

Let's define some helper functions for generating a vector index and for searching the index. In this example case the vector index is a scikit-learn nearest neighbour model.

In [102]:
def index_documents_huggingface(articles:List[str]):
    embeddings = embedder.encode(articles)
    nbrs = NearestNeighbors(n_neighbors=5, algorithm='kd_tree').fit(embeddings)
    return nbrs

In [88]:
def get_nearest_neighbours_huggingface(nbrs, article:str, all_articles: List[str], n_neighbors:int=2):
    embedding = embedder.encode(article)
    # kneighbors returns a tuple of (distances, indices), one row per query
    distances, indices = nbrs.kneighbors([embedding], n_neighbors=n_neighbors)
    neighbour_articles = np.array(all_articles)[indices[0]]
    return neighbour_articles, distances
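
Note that `kneighbors` returns a tuple of `(distances, indices)`, in that order, so the second element is the one to use for looking up articles. The following brute-force sketch in NumPy mimics that return order with toy 2-D vectors. The function name `brute_force_kneighbors` and the data are hypothetical; it is meant to illustrate what the index computes, not to replace scikit-learn.

```python
import numpy as np

def brute_force_kneighbors(index_vectors, query, n_neighbors=2):
    """Return (distances, indices) of the closest rows, nearest first.

    Unlike scikit-learn, which returns one row per query, this sketch
    handles a single query and returns 1-D arrays.
    """
    index_vectors = np.asarray(index_vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every indexed vector.
    distances = np.linalg.norm(index_vectors - query, axis=1)
    # Indices of the n_neighbors smallest distances, in ascending order.
    order = np.argsort(distances)[:n_neighbors]
    return distances[order], order

# Toy "documents" with made-up 2-D embeddings.
documents = ["doc A", "doc B", "doc C", "doc D"]
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]

distances, indices = brute_force_kneighbors(vectors, query=[0.9, 0.1], n_neighbors=2)
nearest_documents = [documents[i] for i in indices]

print(nearest_documents)
print(distances)
```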

### Let's index the articles

This can take a short while on Colab, so we are only using the first 100 articles.

In [144]:
nbrs_huggingface = index_documents_huggingface(fina_news["train"]["Article"][:100])

### Find the similar articles of a given one

In [145]:
input = fina_news["train"]["Article"][10]

nearest_articles = get_nearest_neighbours_huggingface(nbrs=nbrs_huggingface, all_articles=fina_news["train"]["Article"][:100], article=input, n_neighbors=5)
display(input)
display(nearest_articles)

'Apple Inc. (AAPL) fans worldwide mourned the death of co-founder Steve Jobs , paying tribute to the man who changed the way they listen to music, use their mobile phones and play on their computers. At Apple’s headquarters -- located at 1 Infinite Loop, Cupertino, California -- flags flew at half-staff and bagpipes sounded to the tune of “Amazing Grace” as people placed flowers around a white iPad with a picture of Jobs, who died yesterday at 56, after a battle with cancer. Mourners flocked to Apple stores from New York to Hong Kong , while a crowd gathered in San Francisco ’s Mission Dolores Park for an iPhone-lit vigil. “Part of the narrative that made Apple what it is today goes out with Steve Jobs,” said Christopher Smith, 40, a former business development manager in San Francisco who joined the vigil. “I came out to honor the fact that one man with vision, courage and unwavering dedication can still change the world. The way that I communicate and the way that I interact with the

(array(['Apple Inc. (AAPL) fans worldwide mourned the death of co-founder Steve Jobs , paying tribute to the man who changed the way they listen to music, use their mobile phones and play on their computers. At Apple’s headquarters -- located at 1 Infinite Loop, Cupertino, California -- flags flew at half-staff and bagpipes sounded to the tune of “Amazing Grace” as people placed flowers around a white iPad with a picture of Jobs, who died yesterday at 56, after a battle with cancer. Mourners flocked to Apple stores from New York to Hong Kong , while a crowd gathered in San Francisco ’s Mission Dolores Park for an iPhone-lit vigil. “Part of the narrative that made Apple what it is today goes out with Steve Jobs,” said Christopher Smith, 40, a former business development manager in San Francisco who joined the vigil. “I came out to honor the fact that one man with vision, courage and unwavering dedication can still change the world. The way that I communicate and the way that I interact 

### Generate some additional information about the retrieved articles

Let's start by generating short summaries of the retrieved articles. There are specialized summarization models as well, but in this case we'll use prompting and the GPT-4o mini model.

In [150]:
deployment_name="gpt-4o-mini"
api_version="2023-05-15"
task = "chat/completions"

endpoint = f"https://learning-sprint-openai.openai.azure.com/openai/deployments/{deployment_name}/{task}?api-version={api_version}"

client = AzureOpenAI(
    api_key=api_key,  
    api_version=api_version,
    azure_endpoint = endpoint
    )

for article in nearest_articles[0]:
    messages = [{"role":"system", "content": "You are a helpful assistant giving short one sentence summary of the given text."},
                {"role": "user", "content": article}]
    response = client.chat.completions.create(model=deployment_name, messages=messages, max_tokens=100)
    print(f"Response: {response.choices[0].message.content}")


Response: The world mourned the death of Apple co-founder Steve Jobs, with tributes pouring in from fans, leaders, and mourners globally, honoring his transformative impact on technology and communication.
Response: Following the death of Apple co-founder Steve Jobs, numerous leaders and influential figures, including Bill Gates and President Obama, expressed profound sadness and acknowledged his transformative impact on technology, innovation, and culture.
Response: Apple security officials informed Palo Alto police of Steve Jobs's impending death, prompting plans for increased patrols around his home, where only a small crowd gathered after his passing, highlighting Jobs's preference for modest living despite his wealth.
Response: Apple Inc. fans globally commemorated the legacy of co-founder Steve Jobs, acknowledging his transformative impact on music, mobile phones, and computing.
Response: Steve Wozniak reflects on his strong friendship with Steve Jobs, highlighting their shared i

### TODO

You can continue to develop this application further:

1. How could you use the GPT-4o mini model to classify the articles based on, for instance, their topic or sentiment?
2. How could you change the prompt so that GPT-4o mini explains why the articles are similar to each other?
3. What if you use the above `ProsusAI/finbert` model for classification? If there are errors, how could you prevent them?
4. In what type of real-life scenario could you use this type of retrieval setup?
5. Modify the code so that you use the model `text-embedding-3-large` for generating the embeddings.
6. Try deploying your own LLM on some API provider's infrastructure and use it to (1) generate the embeddings and (2) generate the additional information.
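
As a starting point for the first item, classification by prompting can be sketched as building a constrained prompt and then mapping the model's free-text answer back onto an allowed label. The helper names `build_classification_messages` and `parse_label` and the label set are assumptions made for illustration; the actual API call is left as a comment because it needs the `client` and `deployment_name` from the cells above.

```python
from typing import Dict, List, Optional

# Hypothetical label set; topics would work the same way.
SENTIMENT_LABELS = ["positive", "negative", "neutral"]

def build_classification_messages(article: str, labels: List[str]) -> List[Dict[str, str]]:
    """Build chat messages that ask for exactly one of the given labels."""
    system_prompt = (
        "You are a classifier. Answer with exactly one of the following labels "
        f"and nothing else: {', '.join(labels)}."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": article},
    ]

def parse_label(response_text: str, labels: List[str]) -> Optional[str]:
    """Map the model's answer onto an allowed label, or None if it does not match."""
    cleaned = response_text.strip().strip(".!\"'").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None

messages = build_classification_messages(
    "Top private equity firms put brakes on China dealmaking", SENTIMENT_LABELS
)
# In the notebook, the messages would be sent like this:
#   response = client.chat.completions.create(
#       model=deployment_name, messages=messages, max_tokens=5)
#   label = parse_label(response.choices[0].message.content, SENTIMENT_LABELS)

# Parsing a made-up answer to show the normalisation step.
example_label = parse_label(" Negative.\n", SENTIMENT_LABELS)
print(example_label)
```

Returning `None` for unexpected answers makes it possible to detect and retry cases where the model does not follow the instruction.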