<a href="https://colab.research.google.com/github/blurred421/LFD473-code/blob/main/notebooks/Chapter18%20(GPU).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 18: Q&A, Summarization, and LLMs

In [1]:
!pip install transformers sentence-transformers rouge-score

Collecting rouge-score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-many

In [2]:
!wget https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/parse_10K.py

--2025-03-13 19:59:59--  https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/parse_10K.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2752 (2.7K) [text/plain]
Saving to: ‘parse_10K.py’


2025-03-13 19:59:59 (29.5 MB/s) - ‘parse_10K.py’ saved [2752/2752]



In [3]:
from parse_10K import parse_10k_filing

## 18.2 Learning Objectives

By the end of this chapter, you should be able to:
- preprocess your documents into manageable chunks of text
- summarize text using HuggingFace pipelines
- embed and search your documents to create a reference context for your questions
- use decoder-based generative models such as GPT and Falcon to generate answers
- leverage third-party APIs to have access to models too large to run locally

## 18.3 EDGAR Dataset

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step1.png)

In [4]:
!wget https://github.com/dvgodoy/assets/releases/download/dataset/edgar10k_sp500_top25.tar.gz
!tar -xvzf edgar10k_sp500_top25.tar.gz
!mv filings edgar10k_sp500_top25

--2025-03-13 20:00:14--  https://github.com/dvgodoy/assets/releases/download/dataset/edgar10k_sp500_top25.tar.gz
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/662216076/7c32f375-5aad-4bfd-9868-26e70a386522?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20250313%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250313T200015Z&X-Amz-Expires=300&X-Amz-Signature=94c1b680beb417d3ebc10a9517c1bc7b9be5b235db9bb279911fcfe851412198&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dedgar10k_sp500_top25.tar.gz&response-content-type=application%2Foctet-stream [following]
--2025-03-13 20:00:15--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/662216076/7c32f375-5aad-4bfd-9868-26e70a386522?X-Amz-Algorithm=AWS4-

In [5]:
with open('./edgar10k_sp500_top25/sec-edgar-filings/MSFT/10-K/0001564590-22-026876/filing-details.html', 'r',
          encoding='utf-8') as f:
    html = f.read()

res = parse_10k_filing(html, 2)[0]
paragraphs = list(map(lambda s: s.strip(), filter(lambda s: len(s) > 10, res.split('  '))))

## 18.4 Hallucinations

In [6]:
hallucinated_answer = """
    As an S&P 500 company, uncertainies such as market fluctuations, changes in customer demand,
    and new competitors entering the market can be challenging to handle. Additionally, companies
    must also be prepared to adapt to changing trends and technological innovations, while ensuring
    that their products and services remain competitive and high-quality. Furthermore, companies
    must carefully evaluate their target market to remain relevant and generate revenue.
"""

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step3.png)

In [7]:
import torch
from sentence_transformers import SentenceTransformer

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2', device=device)
embeddings = model.encode(paragraphs)
content = torch.as_tensor(embeddings)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/615 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/352 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [8]:
import torch.nn as nn

similarity = nn.CosineSimilarity(dim=1)

q = torch.as_tensor(model.encode([hallucinated_answer]))
similarities = similarity(q, content)
most = torch.topk(similarities, 5)

In [9]:
hallucination_context = '\n'.join([paragraphs[i] for i in sorted(most.indices)])
hallucination_context

'Competition in the technology sector Our competitors range in size from diversified global companies with significant research and development resources to small, specialized firms whose narrower product lines may let them be more effective in deploying technical, marketing, and financial resources. Barriers to entry in many of our businesses are low and many of the areas in which we compete evolve rapidly with changing and disruptive technologies, shifting user needs, and frequent introductions of new products and services. Our ability to remain competitive depends on our success in making innovative products, devices, and services that appeal to businesses and consumers.\nIt is uncertain whether our strategies will attract the users or generate the revenue required to succeed. If we are not effective in executing organizational and technical changes to increase efficiency and accelerate innovation, or if we fail to generate sufficient usage of our new products and services, we may n

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step5.png)

In [10]:
from transformers import pipeline

summarizer = pipeline("summarization", model='sshleifer/distilbart-cnn-12-6', device=device)
hallucination_summary = summarizer(hallucination_context, min_length=100, max_length=250)
hallucination_summary

config.json:   0%|          | 0.00/1.80k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Device set to use cpu


[{'summary_text': ' It is uncertain whether our strategies will attract the users or generate the revenue required to succeed . Barriers to entry in many of our businesses are low and many of the areas in which we compete evolve rapidly with changing and disruptive technologies, shifting user needs, and frequent introductions of new products and services . We make significant investments in products and . services that may not achieve expected returns . We also invest in the development and acquisition of a variety of hardware for productivity, communication, and entertainment including PCs, tablets, gaming devices, and . HoloLens .'}]

## 18.5 Asymmetric Semantic Search

In the `sentence_transformers` package, pretrained models for asymmetric search are based on [MSMARCO](https://microsoft.github.io/msmarco/), "*a large scale information retrieval corpus that was created based on real user search queries using Bing search engine.*" Let's create an instance of one of these models.

We must choose a model from the list of [asymmetric](https://www.sbert.net/docs/pretrained-models/msmarco-v3.html) models, and make sure we pick one that's fine-tuned for cosine similarity, since it is the similarity metric we're using:

In [11]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

asymmetric_model = SentenceTransformer('sentence-transformers/msmarco-MiniLM-L-12-v3', device=device)

modules.json:   0%|          | 0.00/229 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/122 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/3.72k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/432 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [12]:
asymmetric_embeddings = asymmetric_model.encode(paragraphs)

So far, we've been using PyTorch's cosine similarity layer and its `topk()` function to retrieve the most similar documents. Now, we will use Sentence Transformers' own [`semantic_search()`](https://www.sbert.net/examples/applications/semantic-search/README.html#util-semantic-search) helper function. It takes the following arguments:
- `query_embeddings`: your embedded question
- `corpus embeddings`: your embedded context
- `top_k`: how many results it should return

In [44]:
from sentence_transformers.util import semantic_search

query = "what are the sources of uncertainties?"
asymmetric_query = asymmetric_model.encode(query)

similarities = semantic_search(asymmetric_query, asymmetric_embeddings, top_k=5)[0]
similarities

[{'corpus_id': 36, 'score': 0.43581390380859375},
 {'corpus_id': 26, 'score': 0.36639463901519775},
 {'corpus_id': 0, 'score': 0.36041459441185},
 {'corpus_id': 31, 'score': 0.31883764266967773},
 {'corpus_id': 25, 'score': 0.3184739649295807}]

In [45]:
most = [s['corpus_id'] for s in similarities]
most

[36, 26, 0, 31, 25]

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step3.png)

In [46]:
asymmetric_context = '\n'.join([paragraphs[i] for i in sorted(most)])
asymmetric_context

'ITEM 1A. RISK FACTORS Our operations and financial results are subject to various risks and uncertainties, including those described below, that could adversely affect our business, financial condition, results of operations, cash flows, and the trading price of our common stock. STRATEGIC AND COMPETITIVE RISKS We face intense competition across all markets for our products and services, which may lead to lower revenue or operating margins.\nCYBERSECURITY, DATA PRIVACY, AND PLATFORM ABUSE RISKS Cyberattacks and security vulnerabilities could lead to reduced revenue, increased costs, liability claims, or harm to our reputation or competitive position.\nSecurity of our information technology Threats to IT security can take a variety of forms. Individual and groups of hackers and sophisticated organizations, including state-sponsored organizations or nation-states, continuously undertake attacks that pose threats to our customers and our IT. These actors may use a wide variety of methods

In [56]:
device = summarizer.device
summarizer.model.to('cpu')
summarizer.device = torch.device('cpu')

asymmetric_summary = summarizer(asymmetric_context, min_length=100, max_length=250)
asymmetric_summary

IndexError: index out of range in self

In [47]:
max_len = summarizer.model.config.max_position_embeddings
max_len

1024

In [48]:
token_list = summarizer.tokenizer.encode(asymmetric_context, add_special_tokens=False)
len(token_list)

1589

In [49]:
len(asymmetric_context.split())/len(token_list)

0.8477029578351164

### 18.5.1 Trimmed Context

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step3.png)

In [50]:
trimmed_context = summarizer.tokenizer.decode(token_list[:(max_len-2)])
trimmed_context

'ITEM 1A. RISK FACTORS Our operations and financial results are subject to various risks and uncertainties, including those described below, that could adversely affect our business, financial condition, results of operations, cash flows, and the trading price of our common stock. STRATEGIC AND COMPETITIVE RISKS We face intense competition across all markets for our products and services, which may lead to lower revenue or operating margins.\nCYBERSECURITY, DATA PRIVACY, AND PLATFORM ABUSE RISKS Cyberattacks and security vulnerabilities could lead to reduced revenue, increased costs, liability claims, or harm to our reputation or competitive position.\nSecurity of our information technology Threats to IT security can take a variety of forms. Individual and groups of hackers and sophisticated organizations, including state-sponsored organizations or nation-states, continuously undertake attacks that pose threats to our customers and our IT. These actors may use a wide variety of methods

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step5.png)

In [52]:
#summarizer.model.to('cuda')
#summarizer.device = torch.device('cuda:0')

trimmed_summary = summarizer(trimmed_context, min_length=100, max_length=250)
trimmed_summary

[{'summary_text': ' Our operations and financial results are subject to various risks and uncertainties, including those described below . Cyberattacks and security vulnerabilities could lead to reduced revenue, increased costs, liability claims, or harm to our reputation or competitive position . We face intense competition across all markets for our products and services, which may lead to lower revenue or operating margins . The cyberattacks uncovered in late 2020 known as Solorigate or Nobelium are an example of a supply chain attack where malware was introduced to a software providers customers, including us, through software updates .'}]

### 18.5.2 Unsorted Context

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step3.png)

In [57]:
unsorted_context = '\n'.join([paragraphs[i] for i in most])
unsorted_context

'Other digital safety abuses Our hosted consumer services as well as our enterprise services may be used to disseminate harmful or illegal content in violation of our terms or applicable law. We may not proactively discover such content due to scale, the limitations of existing technologies, and conflicting legal frameworks. When discovered by users, such content may negatively affect our reputation, our brands, and user engagement. Regulations and other initiatives to make platforms responsible for preventing or eliminating harmful content online have been enacted, and we expect this to continue. We may be subject to enhanced regulatory oversight, civil or criminal liability, or reputational damage if we fail to comply with content moderation regulations, adversely affecting our business and consolidated financial statements. The development of the IoT presents security, privacy, and execution risks. To support the growth of the intelligent cloud and the intelligent edge, we are devel

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step5.png)

In [58]:
token_list = summarizer.tokenizer.encode(unsorted_context, add_special_tokens=False)
trimmed_unsorted_context = summarizer.tokenizer.decode(token_list[:(max_len-2)])
trimmed_unsorted_summary = summarizer(trimmed_unsorted_context, min_length=100, max_length=250)
trimmed_unsorted_summary

[{'summary_text': ' Our hosted consumer services as well as our enterprise services may be used to disseminate harmful or illegal content in violation of our terms or applicable law . We may not proactively discover such content due to scale, the limitations of existing technologies, and conflicting legal frameworks . The development of the IoT presents security, privacy, and execution risks . We are developing products, services, and technologies to power the IoT, a network of distributed and interconnected devices employing sensors, data, and computing capabilities including AI .'}]

## 18.6 ROUGE Score

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step5.png)

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch16/ngrams.png)

In [59]:
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

In [34]:
scorer.score('nice to meet you here', 'nice to meet you now')

{'rouge1': Score(precision=0.8, recall=0.8, fmeasure=0.8000000000000002),
 'rouge2': Score(precision=0.75, recall=0.75, fmeasure=0.75),
 'rougeL': Score(precision=0.8, recall=0.8, fmeasure=0.8000000000000002)}

In [60]:
scores = scorer.score(hallucination_context,
                      hallucination_summary[0]['summary_text'].strip())
scores

{'rouge1': Score(precision=1.0, recall=0.12062256809338522, fmeasure=0.2152777777777778),
 'rouge2': Score(precision=0.9891304347826086, recall=0.11818181818181818, fmeasure=0.2111368909512761),
 'rougeL': Score(precision=0.8387096774193549, recall=0.10116731517509728, fmeasure=0.18055555555555555)}

In [61]:
scores = scorer.score(trimmed_context,
                      trimmed_summary[0]['summary_text'].strip())
scores

{'rouge1': Score(precision=1.0, recall=0.10411899313501144, fmeasure=0.18860103626943003),
 'rouge2': Score(precision=0.9666666666666667, recall=0.09965635738831616, fmeasure=0.18068535825545173),
 'rougeL': Score(precision=0.8571428571428571, recall=0.08924485125858124, fmeasure=0.1616580310880829)}

In [62]:
scores = scorer.score(trimmed_unsorted_context,
                      trimmed_unsorted_summary[0]['summary_text'].strip())
scores

{'rouge1': Score(precision=1.0, recall=0.09410430839002268, fmeasure=0.17202072538860105),
 'rouge2': Score(precision=0.975609756097561, recall=0.09080590238365494, fmeasure=0.16614745586708202),
 'rougeL': Score(precision=1.0, recall=0.09410430839002268, fmeasure=0.17202072538860105)}

## 18.7 Decoder-Based Models

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch16/blank1.png)

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch16/blank2.png)

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch16/fill2.png)

### 18.7.1 Generative Pretrained Transformer (GPT)

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step5.png)

In [63]:
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1

generator = pipeline('text-generation', model = 'gpt2', device=device)

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cpu


***
**ASIDE: GPT-2**

The Generative Pretrained Transformer is a model based on a Transformer decoder introduced in 2018. It was trained on more than 40 GB of Internet text spread over 8 million web pages.

Its largest version has 48 "layers" (the original Transformer had only six), twelve attention heads, and 1,600 hidden dimensions, totaling 1.5 billion parameters, and it was [released](https://openai.com/research/gpt-2-1-5b-release) in November 2019. On the other end of the scale, the smallest version has only twelve "layers," twelve
attention heads, and 768 hidden dimensions, totaling 117 million parameters (the smallest GPT-2 is still a bit larger than the original BERT!).

You can find the models and the corresponding code in Open AI’s [GPT-2 repository](https://github.com/openai/gpt-2). For a demo of GPT-2’s capabilities, please check [AllenNLP’s Language Modeling Demo](https://github.com/openai/gpt-2), which uses GPT-2’s medium model (345 million parameters).

For a general overview of GPT-2, see this great post by Jay Alammar: ["The Illustrated GPT-2 (Visualizing Transformer Language Models)."](http://jalammar.github.io/illustrated-gpt2/) To learn more details about GPT-2’s architecture, please check ["The Annotated GPT-2"](https://amaarora.github.io/posts/2020-02-18-annotatedGPT2.html) by Aman Arora. There is also Andrej Karpathy’s minimalistic implementation of GPT, [minGPT](https://github.com/karpathy/minGPT), if you feel like trying to train a GPT model from scratch.
***

**IMPORTANT**: Apart from greedy search, the following outputs from different generative models are likely to be different from those shown here. This is expected, as it is the nature of generative models to produce unique outputs unless configured otherwise.

#### 18.7.1.1 Greedy Search

In [64]:
prompt = "what uncertainties are likely faced by an S&P500 company?"

generator(prompt, max_length=100, do_sample=False)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'what uncertainties are likely faced by an S&P500 company?\n\nThe S&P 500 is a very volatile market. It is not a safe place to be. It is not a safe place to be for investors. It is not a safe place to be for investors for the long term.\n\nThe S&P 500 is a very volatile market. It is not a safe place to be for investors. It is not a safe place to be for investors for the long term.'}]

#### 18.7.1.2 Beam Search

In [65]:
prompt = "what uncertainties are likely faced by an S&P500 company?"

generator(prompt, max_length=100, num_return_sequences=3, return_full_text=False, do_sample=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': " What makes the S&P500, for instance, so risky?\n\nA: The valuation of the stock is the result of factors including, but not limited to, fair value of the company's assets, current needs, market prices, the economic situation of the business and business performance.\n\nA number of factors influence the valuation, such as the value of the company's assets and the current requirements of the stock markets in"},
 {'generated_text': ' It might not matter what your S&P500 would do," says Rene Fotts, CEO and chief business officer and COO of Morgan Stanley.\n\nAs is likely, the S&P500 company had its hands full with an IPO this year. While it doesn\'t exactly follow its current path, it will look differently tomorrow if it is to make the cut for the S&P500 stock market.\n\n'},
 {'generated_text': '\n\nThe answer may depend on what your goals are in the S&P 500:\n\nIs the value of your investment right now in terms of market cap or leverage?\n\nWhy did you choose the new

#### 18.7.1.3 Temperature

In [66]:
prompt = "what uncertainties are likely faced by an S&P500 company?"

generator(prompt, max_length=100, num_return_sequences=3, return_full_text=False, do_sample=True, temperature=30.0)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': ' Does anybody recognize at work its core concept as, that \'when everything runs perfectly together, something should look a winner (eisenmolenika)." http0049792534132767\n\n\u200b'},
 {'generated_text': ' Why then is that it remains on sale or sold through these "traders\' websites as stock". Instead what have markets already noticed with other ETF, or alternative insurance for their equity and retirement accounts but for various ETF funds? Can stock have good future price based strategy such this with stock traders? When there an unusual stock event (this article from an independent market broker may do an obvious good impression!). As if ETF companies and their advisors'},
 {'generated_text': " The report suggests three different sets as most shareholders, rather predictably, feel inclined they hold companies similar - although less confident -- - across major industry lines. The big one? Alphabet Sells for 3%. (A bit later... it isn\\'rt being made easier of so

In [67]:
prompt = "what uncertainties are likely faced by an S&P500 company?"

generator(prompt, max_length=100, num_return_sequences=3, return_full_text=False, do_sample=True, temperature=100.0)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': ' Does "disharma pay for Daim," which might imply significant leverage and possible risk related for equity issuans but would therefore give only marginally of this to banks (in many differentiating states and other "unaffection")? No: There wouldn--they never was in it --in one embodiment for an even deeper commitment that the market\'s price for capital "might well increase appreciaterhetically if this new company" gets the'},
 {'generated_text': " Do other shareholders share you enthusiasm that, well then they need more answers than are presented for both the big seven's investment questioners for our annual financial reports.? Well maybe all and be good! What are that...oh right; is our financial statement now closed/reactive or was once that which once said. Can shareholders come aboard now for financial briefings, press releases? Are these going up? Please! Let in shareholders : if"},
 {'generated_text': ' If true – let only give their results in short! We\'ll

In [68]:
prompt = "what uncertainties are likely faced by an S&P500 company?"

generator(prompt, max_length=100, num_return_sequences=3, return_full_text=False, do_sample=True, temperature=1.0)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "\n\nHere's a bit of history: in 2004, the NASDAQ's stock market experienced the second lowest level of systemic risk for the year. The NASDAQ's return to historical profitability had been so low that in 2008 its stock market index plummeted to its lowest level in more than a decade. At that rate, about 500 companies could exit the market within the next two years. The stock market plummeted after the price of the S"},
 {'generated_text': "\n\nThe financial results of the recent S&P500 companies are uncertain and will remain in good shape until further notice. In addition, the data presented herein would not provide sufficient information to enable the investors to evaluate these companies.\n\nFACTORS WHO MAY BE INTERESTED\n\nMany companies make an investment in an S&P500 company, such as a NASDAQ or TSX, to gain an investor's confidence"},
 {'generated_text': '\n\nI have no doubt that if the future of the world is over, S&P 500 companies will have to face uncertain

### 18.7.2 Inference API

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step5.png)

#### 18.7.2.1 POST Request

In [69]:
import json
import requests
import os

os.environ['HFAPI_TOKEN'] = 'hf_...'

API_TOKEN = os.environ['HFAPI_TOKEN']  # assuming your API token is set

headers = {"Authorization": f"Bearer {API_TOKEN}"}
headers.update({'Content-Type': 'application/json'})

INFERENCE_ENDPOINT = 'https://api-inference.huggingface.co'

def query_inference_api(prompt, model='gpt2', **kwargs):
    url = f"{INFERENCE_ENDPOINT}/models/{model}"
    payload = {'inputs': prompt, 'parameters': {**kwargs}}
    data = json.dumps(payload)
    response = requests.request("POST", url, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

In [70]:
prompt = "what uncertainties are likely faced by an S&P500 company?"
generated_text = query_inference_api(prompt,
                                     max_length=100,
                                     num_return_sequences=3,
                                     return_full_text=False,
                                     do_sample=True,
                                     temperature=0.8)
generated_text

{'error': 'Invalid credentials in Authorization header'}

#### 18.7.2.2 Inference Client

In [None]:
from huggingface_hub import InferenceClient
import json

client = InferenceClient(token=API_TOKEN)

prompt = "what uncertainties are likely faced by an S&P500 company?"
params = {'max_new_tokens': 100,
          'return_full_text': False,
          'do_sample': True,
          'temperature': 0.8}

#response = client.post(json={"inputs": prompt, "parameters": params}, model="gpt2")
#json.loads(response)
response = client.text_generation(prompt=prompt, model='gpt2', **params)
response

'Let me tell you something about our shareholders. In the past we were known as the three most important investors in our business. We have been a leading shareholder in the largest number of companies in the world — in fact, we have been a top shareholder at some of the biggest companies in the world — in fact, we hold the position of biggest shareholder in four of the top five largest corporations in the world. We have seen'

## 18.8 Large Language Models (LLMs)

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step5.png)

### 18.8.1 Instruction Models

#### 18.8.1.1 Falcon 7B Instruct

In [71]:
INFERENCE_ENDPOINT = 'https://api-inference.huggingface.co'

repo_id = "tiiuae/falcon-7b-instruct"

url = f"{INFERENCE_ENDPOINT}/models/{repo_id}"
url

'https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct'

In [72]:
import os

os.environ['HFAPI_TOKEN'] = 'hf_...'

API_TOKEN = os.environ['HFAPI_TOKEN']  # assuming your API token is set

client = InferenceClient(url, token=API_TOKEN)

NameError: name 'InferenceClient' is not defined

In [None]:
question = "what uncertainties are likely faced by an S&P500 company?"

prompt = f"""
You are an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

{question}
"""

params = {"temperature": 0.8, "max_new_tokens": 500}

In [None]:
# response = client.post(json={"inputs": prompt, "parameters": params})
# json.loads(response)
response = client.text_generation(prompt=prompt, **params)
response

'Uncertainties that an S&P500 company is likely to face include:\n\n- Changes in market trends and customer preferences\n- Increased competition from new companies entering the market\n- Rising costs of production and technology\n- Legal and regulatory risks\n- Talent acquisition and retention challenges\n- Cybersecurity threats and data privacy issues\n\nIn order to stay competitive and relevant in the market, S&P500 companies must anticipate and adapt to these uncertainties. This may involve research and development, implementing new strategies, and keeping up with the changing market dynamics'

In [None]:
# answer = json.loads(response)[0]['generated_text'].split('\n')[4:]
# answer
print(response.split('\n'))

['Uncertainties that an S&P500 company is likely to face include:',
 '',
 '- Changes in market trends and customer preferences',
 '- Increased competition from new companies entering the market',
 '- Rising costs of production and technology',
 '- Legal and regulatory risks',
 '- Talent acquisition and retention challenges',
 '- Cybersecurity threats and data privacy issues',
 '',
 'In order to stay competitive and relevant in the market, S&P500 companies must anticipate and adapt to these uncertainties. This may involve research and development, implementing new strategies, and keeping up with the changing market dynamics.']

### 18.8.2 Real Answers

#### 18.8.2.1 The Prompt
   
![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch16/query_db.png)
   
![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch16/llm_answer.png)

In [None]:
question = "what are the sources of uncertainties?"

context = """ITEM 1A. RISK FACTORS Our operations and financial results are subject to various risks and uncertainties, including those described below, that could adversely affect our business, financial condition, results of operations, cash flows, and the trading price of our common stock. STRATEGIC AND COMPETITIVE RISKS We face intense competition across all markets for our products and services, which may lead to lower revenue or operating margins.
It is uncertain whether our strategies will attract the users or generate the revenue required to succeed. If we are not effective in executing organizational and technical changes to increase efficiency and accelerate innovation, or if we fail to generate sufficient usage of our new products and services, we may not grow revenue in line with the infrastructure and development investments described above. This may negatively impact gross margins and operating income.
Public scrutiny of our decisions regarding user privacy, data practices, or content.
Challenging economic conditions also may impair the ability of our customers to pay for products and services they have purchased. As a result, allowances for doubtful accounts and write-offs of accounts receivable may increase.
Measures to contain a global pandemic may intensify other risks described in these Risk Factors. Any of these measures may adversely impact our ability to:
"""

prompt = f"""Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know".

Context:
{context}

{question}""".strip()

In [None]:
print(prompt)

Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know".

Context:
ITEM 1A. RISK FACTORS Our operations and financial results are subject to various risks and uncertainties, including those described below, that could adversely affect our business, financial condition, results of operations, cash flows, and the trading price of our common stock. STRATEGIC AND COMPETITIVE RISKS We face intense competition across all markets for our products and services, which may lead to lower revenue or operating margins.
It is uncertain whether our strategies will attract the users or generate the revenue required to succeed. If we are not effective in executing organizational and technical changes to increase efficiency and accelerate innovation, or if we fail to generate sufficient usage of our new products and services, we may not grow revenue in line with the infrastructure and development investments desc

#### 18.8.2.2 Falcon

In [None]:
params = {'temperature': 0.01, 'max_new_tokens': 500, 'return_full_text': False}

# response = client.post(json={"inputs": prompt, "parameters": params})
response = client.text_generation(prompt=prompt, **params)

In [None]:
# answer = json.loads(response)[0]['generated_text']
# print(answer)
print(response)


The sources of uncertainties are the global pandemic, the competitive landscape, and the company's ability to execute its strategies and maintain its infrastructure and development investments.


#### 18.8.2.3 ChatGPT

We submitted the very same prompt to ChatGPT using its UI, and this is its answer:

"_Here are the sources of uncertainties mentioned in the provided context:_

- _Intense competition across all markets for products and services, which may lead to lower revenue or operating margins._
- _Uncertainty regarding the effectiveness of strategies in attracting users and generating the required revenue for success._
- _Uncertainty in executing organizational and technical changes to increase efficiency and accelerate innovation, which may negatively impact revenue growth, gross margins, and operating income._
- _Public scrutiny of decisions regarding user privacy, data practices, or content._
- _Challenging economic conditions that may impair customers' ability to pay for products and services, potentially leading to increased allowances for doubtful accounts and write-offs of accounts receivable._
- _Measures to contain a global pandemic that may intensify other risks described in the risk factors, adversely impacting the company's ability to operate effectively._"

We didn't ask for bullet points but we got them anyway :-) Also, ChatGPT is much more verbose. Using the UI, we made a follow-up request on the same chat: "_make it shorter than 500 characters._"

Here is the reformulated answer:

"_Sources of uncertainties:_

- _Intense competition, potentially impacting revenue and margins._
- _Uncertainty in attracting users and generating revenue._
- _Challenges in implementing efficiency and innovation._
- _Public scrutiny of privacy, data practices, and content._
- _Economic conditions affecting customer payments._
- _Pandemic measures impacting operations._"

Finally, we made yet another request, "_make it a single sentence_", and this is what we got:

"*Sources of uncertainties include intense competition impacting revenue and margins, uncertainty in attracting users and generating revenue, challenges in implementing efficiency and innovation, public scrutiny of privacy and content practices, economic conditions affecting customer payments, and pandemic measures impacting operations.*"

Please keep in mind that the LLM landscape is changing at breakneck speed, and new models are being constantly released. The purpose of examples listed above from Falcon and ChatGPT is to give you a broad idea of the current stage of development in the field, and the model's capabilities.

### 14.6.3 Challenges and Risks

Moreover, we'd like to raise awareness to the fact that LLMs can, and often will, "hallucinate" information that isn't factual and provide it as if it were. While guard-rails are put in place to try to prevent malicious actors from purposefully generating misinformation, new "jailbreak" (that is, to liberate the model from its guard-rails) techniques are developed through clever prompting. Initially, it sufficed to say "ignore previous instructions" to allow the model to spew all sorts of improper speech. It is not that easy anymore, but it doesn't mean it is impossible to make the model misbehave. Auxiliary models may be deployed to identify and filter out both user inputs and generated outputs that are classified as toxic, thus providing an additional safety measure against misuse.