# Summarizing, re-wording, and translating (oh my)

This is something that generative models are _really_ good at: generating output that looks like a summary of the input, or like a re-wording of the input.

As before, though, be careful: there is _no guarantee_ that the model is producing an actual summary of the text you input.  It could be hallucinating, so treat any summary it produces with caution.

In [1]:
# Copying a lot of code from the last notebook

import re
from textwrap import wrap
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# recommended models--try both!
# model_name = "HuggingFaceH4/zephyr-7b-beta"
model_name = "HuggingFaceH4/mistral-7b-sft-beta"

# model_name = "mistralai/Mistral-7B-v0.3"
# model_name = "mistralai/Mistral-7B-Instruct-v0.3"

# model_name = "mistralai/Mixtral-8x7B-v0.1"
# model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# model_name = "meta-llama/Meta-Llama-3-8B"
# model_name = "meta-llama/Meta-Llama-3-8B-instruct"

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
    low_cpu_mem_usage=True,
)
model.eval()

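import sys

# torch.compile needs either a Python version before 3.12 or PyTorch 2.4+.
# Compare versions as numbers; comparing strings would rank "10" below "4".
torch_version = tuple(int(part) for part in torch.__version__.split(".")[:2])
if sys.version_info < (3, 12) or torch_version >= (2, 4):
    model = torch.compile(model)
else:
    print(
        f"Cannot compile the model.  Need a Python version *prior to* 3.12 (you have: {sys.version_info}), or "
        f"a PyTorch version 2.4.0 or later (you have: {torch.__version__})"
    )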

model.generation_config.pad_token_id = tok.eos_token_id

# unlike last time, we'll use a fixed carrier phrase here.  Otherwise it'll all be the same.
def generate(text, model=model, tok=tok):
    messages = [
        {"role": "user", "content": text}
    ]
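    model_inputs = tok.apply_chat_template(
        messages,
        # end the prompt with the assistant marker (for templates that support it),
        # so the model starts its reply instead of having to emit the marker itself
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    )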
    # get the model's responses
    generated_ids = model.generate(
        **{k:v.to("cuda") for k,v in model_inputs.items()},
        max_new_tokens=256,
        do_sample=True,
    )
    output = tok.batch_decode(generated_ids)[0]
    
    # split the text into system, user, and assistant chunks.
    if model_name in ("HuggingFaceH4/zephyr-7b-beta", "HuggingFaceH4/mistral-7b-sft-beta"):
        output = re.split(r"(<\|user\|>|</s>|<\|system\|>|<\|assistant\|>)", output)
        user = output.index("<|user|>")
        assistant = output.index("<|assistant|>")
    elif model_name in (
        "mistralai/Mistral-7B-v0.3", "mistralai/Mistral-7B-Instruct-v0.3",
        "mistralai/Mixtral-8x7B-v0.1", "mistralai/Mixtral-8x7B-Instruct-v0.1",
    ):
        output = re.split(r"(\[INST\]|\[/INST\]|</s>)", output)
        user = output.index("[INST]")
        assistant = output.index("[/INST]")
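    elif model_name in ("meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-8B-instruct"):
        # Assumes the Llama 3 chat format, where each turn starts with a role header.
        user_tag = "<|start_header_id|>user<|end_header_id|>"
        assistant_tag = "<|start_header_id|>assistant<|end_header_id|>"
        output = re.split(f"({re.escape(user_tag)}|{re.escape(assistant_tag)}|<\\|eot_id\\|>)", output)
        user = output.index(user_tag)
        assistant = output.index(assistant_tag)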
    else:
        raise ValueError(f"Oops, you need to add logic for the outputs from {model_name}")

    user_input = output[user + 1].strip()
    model_response = output[assistant + 1].strip()
    return user_input, model_response
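
To see what the splitting step at the end of `generate` does, here is a small standalone sketch that applies the same Zephyr-style regular expression to a hand-written string. The sample string is made up for illustration; real decoded output will contain whatever special tokens the model's chat template uses.

```python
import re

# A hand-written stand-in for decoded model output (hypothetical example text).
decoded = "<|user|>\nPlease summarize: cats are great.</s>\n<|assistant|>\nCats: great.</s>"

# The capturing group keeps the delimiters in the result, so each role
# marker is immediately followed by the text that belongs to that role.
chunks = re.split(r"(<\|user\|>|</s>|<\|system\|>|<\|assistant\|>)", decoded)

user_text = chunks[chunks.index("<|user|>") + 1].strip()
assistant_text = chunks[chunks.index("<|assistant|>") + 1].strip()

print(user_text)
print(assistant_text)
```

Because `list.index` raises `ValueError` when a marker is missing, this approach fails loudly if the model never emits the assistant marker, which is usually preferable to silently returning the wrong chunk.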


In [2]:
# Text we'll re-use for a few demos

llm_wiki = """
    A large language model (LLM) is a computational model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification. Based on language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a computationally intensive self-supervised and semi-supervised training process.[1] LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.[2]

LLMs are artificial neural networks that utilize the transformer architecture, invented in 2017. The largest and most capable LLMs, as of June 2024, are built with a decoder-only transformer-based architecture, which enables efficient processing and generation of large-scale text data.

Historically, up to 2020, fine-tuning was the primary method used to adapt a model for specific tasks. However, larger models such as GPT-3 have demonstrated the ability to achieve similar results through prompt engineering, which involves crafting specific input prompts to guide the model's responses.[3] These models acquire knowledge about syntax, semantics, and ontologies[4] inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on.[5]

Some notable LLMs are OpenAI's GPT series of models (e.g., GPT-3.5 and GPT-4, used in ChatGPT and Microsoft Copilot), Google's Gemini (the latter of which is currently used in the chatbot of the same name), Meta's LLaMA family of models, Anthropic's Claude models, and Mistral AI's models. 

Before 2017, there were a few language models that were large compared to capacities then available. In the 1990s, the IBM alignment models pioneered statistical language modelling. In the 2000s, as Internet use became prevalent, some researchers constructed Internet-scale language datasets ("web as corpus"[6]), upon which they trained statistical language models.[7][8] In 2009, in most language processing tasks, statistical language models dominated over symbolic language models, as they can usefully ingest large datasets.[9]

After neural networks became dominant in image processing around 2012, they were applied to language modelling as well. Google converted its translation service to Neural Machine Translation in 2016. As it was before Transformers, it was done by seq2seq deep LSTM networks.

At the 2017 NeurIPS conference, Google researchers introduced the transformer architecture in their landmark paper "Attention Is All You Need". This paper's goal was to improve upon 2014 Seq2seq technology,[10] and was based mainly on the attention mechanism developed by Bahdanau et al. in 2014.[11] The following year in 2018, BERT was introduced and quickly became "ubiquitous".[12] Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model.

Although decoder-only GPT-1 was introduced in 2018, it was GPT-2 in 2019 that caught widespread attention because OpenAI at first deemed it too powerful to release publicly, out of fear of malicious use.[13] GPT-3 in 2020 went a step further and as of 2024 is available only via API with no offering of downloading the model to execute locally. But it was the 2022 consumer-facing browser-based ChatGPT that captured the imaginations of the general population and caused some media hype and online buzz.[14] The 2023 GPT-4 was praised for its increased accuracy and as a "holy grail" for its multimodal capabilities.[15] OpenAI did not reveal high-level architecture and the number of parameters of GPT-4.

Competing language models have for the most part been attempting to equal the GPT series, at least in terms of number of parameters.[16]

Since 2022, source-available models have been gaining popularity, especially at first with BLOOM and LLaMA, though both have restrictions on the field of use. Mistral AI's models Mistral 7B and Mixtral 8x7b have the more permissive Apache License. As of January 2024, Mixtral 8x7b is the most powerful open LLM according to the LMSYS Chatbot Arena Leaderboard, being more powerful than GPT-3.5 but not as powerful as GPT-4.[17]

As of 2024, the largest and most capable models are all based on the Transformer architecture. Some recent implementations are based on other architectures, such as recurrent neural network variants and Mamba (a state space model).[18][19][20] 
    """

In [3]:
original_text, summary = generate(f"Please summarize the following text: {llm_wiki}")
print(summary)



Large Language Models (LLMs) are computational models that possess the ability to carry out language generation and numerous natural language processing tasks such as classification. They are developed by subjecting them to a rigorous and computationally intensive self and semi-supervised training process. Based on extensive training data, LLMs acquire the ability to understand and comprehend syntax, semantics and ontologies inherent in human language corpora. One of the most influential models in this category is OpenAI's GPT-3.5 and GTP-4 which are used to fine-tune specific tasks via prompt engineering. LLMs, therefore, inherit the accuracy, biases and inaccuracies of the dataset they are trained on. 

The largest and most potent LLMs today use a decoder-only transformer-based architecture that processes and produces large-scale text data efficiently. Anthropic's Claude models and Mistral AI's models have also emerged as effective alternatives. While decoder-only GPT-1 was introduce
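
One cheap sanity check on a summary like the one above is to look for names, acronyms, and numbers that never occur in the source text. The summary above, for example, mentions "GTP-4", which the source does not. The sketch below is a rough heuristic under stated assumptions: `unsupported_terms` is a hypothetical helper, it only does case-insensitive substring matching, and an empty result does not mean the summary is faithful.

```python
import re

def unsupported_terms(source, summary):
    """Return capitalized or numeric tokens in `summary` that never occur in `source`."""
    source_lower = source.lower()
    flagged = set()
    for token in re.findall(r"[A-Za-z0-9][\w-]*", summary):
        # Only check tokens that look like names, acronyms, or numbers.
        looks_specific = any(ch.isupper() or ch.isdigit() for ch in token)
        if looks_specific and token.lower() not in source_lower:
            flagged.add(token)
    return sorted(flagged)

# Toy strings for illustration; in the notebook you would pass llm_wiki and summary.
source = "OpenAI released GPT-4 in 2023."
candidate = "In 2023, OpenAI released GTP-4."
print(unsupported_terms(source, candidate))
```

Anything this flags is worth reading by hand: it may be a typo, a legitimate rephrasing, or a fabricated detail.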

In [4]:
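# Note: llm_wiki starts with a newline, so splitlines()[0] is an empty string and the
# model gets no text to re-write, which is why the poem below has nothing to do with LLMs.
# Use llm_wiki.strip().splitlines()[0] to pass the first paragraph instead.
original_text, poem = generate(f"Please re-write the following text in the form of a poem: {llm_wiki.splitlines()[0]}")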
print(poem)

The world is full of wonders endless,
With life’s journey a rich tapestry of stress.
The day is long, the night is endless,
And of our troubles, we are left to discuss.

Our worries often consume our minds,
We look to others for a listening ear.
Yet, there’s comfort in the simple of life,
When we open our hearts to people who care.

So, when the hustle and bustle overwhelm,
Let us seek the peace of a listening friend.
For in moments shared, we find our strength,
And through these quiet hours, we make our way ahead.

So here’s to those who hear when we cry,
And hold us up when we fall.
For we are not alone, nor without hope,
With you we find the courage to crawl.


In [5]:
original_text, simplified = generate(f"Please re-word the following text to be at a 5th grade reading level: {llm_wiki}")
print(simplified)

A large language model (LLM) is a computer that can understand and create language. LLM's use vast amounts of text to learn how to answer questions, make predictions and talk to machines. They use a type of artificial intelligence called reinforcement learning to improve their abilities. LLMs can answer fact-based questions or give you useful feedback on how to better use words. To make these models work, data is fed into them to learn what words mean. This kind of model is called a computational model. LLMs are becoming more important as artificial intelligence is getting more powerful. LLMs are used by companies like Google, Microsoft and OpenAI and they can be taught to do just about anything. They are even able to predict what someone might say in a conversation by reading their past conversations. LLMs are getting better by learning from past mistakes. This means that they become more accurate over time.
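
The prompt above asks for a 5th grade reading level, but nothing checks that the response meets it. One rough way to estimate reading level is the Flesch-Kincaid grade formula, which combines average sentence length with average syllables per word. The sketch below is an approximation: `count_syllables` is a hypothetical helper that counts runs of vowels rather than consulting a pronunciation dictionary, so scores are only indicative. A low grade also says nothing about whether the simplified text is faithful to the source.

```python
import re

def count_syllables(word):
    # Rough heuristic (an assumption, not a dictionary lookup): count runs of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Estimate the U.S. school grade level needed to read `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# Toy strings for illustration; in the notebook you would pass llm_wiki and simplified.
simple = "The cat sat. The dog ran."
dense = "Computationally intensive self-supervised training acquires statistical relationships."
print(round(flesch_kincaid_grade(simple), 1))
print(round(flesch_kincaid_grade(dense), 1))
```

Comparing the score of the original text with the score of the model's re-wording gives a quick signal of whether the text actually got simpler.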


In [6]:
original_text, pirate = generate(f"Please re-word the following in the style of a stereotypical, over-the-top pirate: {llm_wiki}")
print(pirate)

Ahoy, me hearties! Come listen to me tale o' the mighty LLM, a computational model with the power to generate natural language processing tasks like classification with ease. This powerful beast is built through a self-supervised and semi-supervised training process that learns statistical relationships from vast amounts of text to achieve general-purpose language generation.

Historically, up until 2020, fine-tuning was the preferred method of adapting a model for specific tasks. But with the emergence of larger models like GPT-3, prompt engineering has been found to be a more effective way of achieving similar results. These models can acquire knowledge about syntax, semantics and ontologies inherent in human language corpora, but they also inherit any inaccuracies and biases present in the data they are trained on.

From 1990s alignment models pioneered statistical language modelling to statistical language models dominating over symbolic language models in most language processing 
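
The response above stops mid-sentence. A likely cause is the `max_new_tokens=256` cap in `generate`, which ends generation regardless of where the sentence is. If a dangling tail is unwanted, one option is to drop everything after the last sentence-ending punctuation mark. This is a minimal sketch; `trim_to_last_sentence` is a hypothetical helper, not part of the notebook's code.

```python
import re

def trim_to_last_sentence(text):
    """Drop a trailing partial sentence; return `text` unchanged if no sentence end is found."""
    # A sentence end is ., !, or ? followed by whitespace or the end of the text,
    # so the dot inside something like "GPT-3.5" is not treated as one.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    return text[:ends[-1]] if ends else text

truncated = "Prompt engineering works well. From 1990s alignment models pioneered statistical language"
print(trim_to_last_sentence(truncated))
```

Raising `max_new_tokens` is the alternative, at the cost of longer generation times; trimming only hides the cut, it does not recover the missing text.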

In [7]:
# A shorter text for translation.  Article 1 of the Universal Declaration of Human Rights.
text = "All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood. "

for lang in ("German", "Latin", "Klingon", "Toki Pona"):
    original_text, translated = generate(f"Please translate the following text into {lang}: {text}")
    print(f"In {lang}:\n{translated}\n\n")

In German:
Alle menschlichen Wesen sind von Geburt an frei und gleich vor Dignität und Rechten. Sie sind mit Vernunft und Gewissen ausgestattet und sollten sich gegenüber einander in einer Geist der Brüderschaft verhalten.


In Latin:
Omnes homines nati sunt liberi et aequalis dignitate et jure. Eos infuserunt rationem et conscientiam, neque animam eos aequis viribus esse dedecando. (All human beings are born free, and are of equal dignity and right. They are endowed with reason and conscience lest they incur shame for the unequal distribution of power.)

Note that this is a paraphrase of Article 1 of the Universal Declaration of Human Rights in Latin. The original text in Latin is available on the official United Nations website.


In Klingon:
QIQHlaDIH DaSqaSbI'aQlaH!
HopuHlaDIH DaSqaSbI'aQlaH! 
lEyDaH IQu'aDIvHlaDIH!
IHqu'aDvIaDaHH! 
IQupI'laHH!


In Toki Pona:
Taso li lon lon e jan pali mute kasi o sike e tomo. E ken mo kepe e mute kasi sina, ino e ala moku.




Just for comparison, here are the actual translations.

German: _Alle Menschen sind frei und gleich an Würde und Rechten geboren. Sie sind mit Vernunft und Gewissen begabt und sollen einander im Geist der Brüderlichkeit begegnen._

Latin: _omnes homines dignitate et iure liberi et pares nascuntur, rationis et conscientiae participes sunt, quibus inter se concordiae studio est agendum._

Klingon: _boghDI' Hoch Humanpu', tlhab, rap DIbmeychaj 'ej rap nurchaj. meqlaHghach, ghob chovlaHghach je luHev. vangtaHvIS, pal'arpu' rur net tlheb._  (source: [The Klingon Language Institute](https://www.kli.org/resources/udhr/).  This translation was officially accepted by the United Nations!)

Toki Pona: _jan ale/ali li kama lon nasin ni: ona li ken pali e wile ona. ona li jo e suli jan sama e ken sama. ona li jo e sona pona e lawa insa pi pali pona. ni la, ona li wile pali tawa jan ante ale/ali kepeken nasin olin._
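
To put a rough number on how far each model output is from its reference translation, one option is word overlap: the Jaccard similarity between the sets of words in the two strings. The sketch below is a crude measure under stated assumptions. `word_overlap` is a hypothetical helper, it ignores word order, and it penalizes legitimate paraphrases, so it is no substitute for a reader who knows the language.

```python
import re

def word_overlap(candidate, reference):
    """Jaccard similarity between the sets of lowercased words in two strings."""
    cand = set(re.findall(r"\w+", candidate.lower()))
    ref = set(re.findall(r"\w+", reference.lower()))
    if not cand and not ref:
        return 1.0
    return len(cand & ref) / len(cand | ref)

# Model outputs and reference translations copied from the cells above.
german_model = "Alle menschlichen Wesen sind von Geburt an frei und gleich vor Dignität und Rechten. Sie sind mit Vernunft und Gewissen ausgestattet und sollten sich gegenüber einander in einer Geist der Brüderschaft verhalten."
german_ref = "Alle Menschen sind frei und gleich an Würde und Rechten geboren. Sie sind mit Vernunft und Gewissen begabt und sollen einander im Geist der Brüderlichkeit begegnen."
klingon_model = "QIQHlaDIH DaSqaSbI'aQlaH! HopuHlaDIH DaSqaSbI'aQlaH! lEyDaH IQu'aDIvHlaDIH! IHqu'aDvIaDaHH! IQupI'laHH!"
klingon_ref = "boghDI' Hoch Humanpu', tlhab, rap DIbmeychaj 'ej rap nurchaj. meqlaHghach, ghob chovlaHghach je luHev. vangtaHvIS, pal'arpu' rur net tlheb."

print(f"German:  {word_overlap(german_model, german_ref):.2f}")
print(f"Klingon: {word_overlap(klingon_model, klingon_ref):.2f}")
```

The German output shares many words with its reference, while the Klingon output shares none, which matches the impression that the model knows German far better than Klingon.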