# Large Language Models for Education

In this notebook, we demonstrate how to use large language models (LLMs) for use cases in education.  LLMs can be used for tasks such as summarization, question-answering, or the generation of question & answer pairs.

The first part of this notebook sets up a SageMaker endpoint, using a **FLAN T5** model, on a single `ml.p3.2xlarge` instance.  Next we show how the endpoint can be queried, and some subroutines are added to demonstrate prompts for summarization, question-answering and the generation of question & answer pairs.

The final section is split into four demos that query the Wikipedia article on quantum computing, an ebook text from [Project Gutenberg](https://www.gutenberg.org/), a scientific PDF article from arxiv.org, and the Australian Budget 2023-24 overview PDF.

## 1. Setting up the SageMaker Endpoint

### 1.1 Install Python Dependencies

In [3]:
%%capture
!pip install --upgrade pip
!pip install -U sagemaker
!pip install -U langchain
!pip install -U PyPDF2

### 1.2 Deploying a SageMaker Endpoint

In [4]:
import sagemaker, boto3, json
from sagemaker import image_uris, instance_types, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.session import Session
from sagemaker.utils import name_from_base
from IPython.display import display, HTML

In [5]:
# Create the folders for the model artifacts and for the downloaded source documents
!mkdir -p download_dir
!mkdir -p source_documents_dir

In [6]:
def get_sagemaker_session(local_download_dir) -> sagemaker.Session:
    """Return the SageMaker session."""

    sagemaker_client = boto3.client(
        service_name="sagemaker", region_name=boto3.Session().region_name
    )

    session_settings = sagemaker.session_settings.SessionSettings(
        local_download_dir=local_download_dir
    )

    # the unit test will ensure you do not commit this change
    session = sagemaker.session.Session(
        sagemaker_client=sagemaker_client, settings=session_settings
    )

    return session

In [7]:
sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

model_id, model_version = "huggingface-text2text-flan-t5-xl", "*"
_model_env_variable_map = {
    "huggingface-text2text-flan-t5-xl": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
}

endpoint_name = name_from_base(f"jumpstart-example-{model_id}")
instance_type = 'ml.p3.2xlarge'
print(f'Using role {aws_role} in region {aws_region}')

Using role arn:aws:iam::469392957479:role/service-role/AmazonSageMaker-ExecutionRole-20221110T113445 in region us-east-1


In [8]:
# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
if model_id in _model_env_variable_map:
    # For those large models, we already repack the inference script and model
    # artifacts for you, so the `source_dir` argument to Model is not required.
    model = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=_model_env_variable_map[model_id],
    )
else:
    model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        model_data=model_uri,
        entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        sagemaker_session=get_sagemaker_session("download_dir"),
    )

In [9]:
%%time
# Deploy the model. When deploying through the Model class, pass the Predictor class
# so that inference can be run through the SageMaker API.
print(f'Deploying endpoint {endpoint_name} on 1 x {instance_type} (this will take approximately 6-8 minutes)')
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

Deploying endpoint jumpstart-example-huggingface-text2text-2023-06-08-05-24-48-672 on 1 x ml.p3.2xlarge (this will take approximately 6-8 minutes)
-------------!CPU times: user 147 ms, sys: 13.1 ms, total: 160 ms
Wall time: 7min 3s


In [10]:
print(f'Successfully deployed endpoint {endpoint_name} on 1 x {instance_type}')

Successfully deployed endpoint jumpstart-example-huggingface-text2text-2023-06-08-05-24-48-672 on 1 x ml.p3.2xlarge


## 2. Subroutines for text extraction and for querying the endpoint

In [11]:
import pandas as pd
import PyPDF2
import re
import requests

from bs4 import BeautifulSoup

pd.set_option('max_colwidth', 80)  # Set max column width for displaying Pandas Dataframes
QNA_OUTPUT_STYLE = 'HTML'

### 2.1 Extract text from a PDF file

Using the PyPDF2 library, we can extract text from a PDF file.

In [12]:
def extract_pages(pdf_file, max_pages=100):
    pages = []
    with open(pdf_file, 'rb') as f:
        for i, page in enumerate(PyPDF2.PdfReader(f).pages):
            if i == max_pages:
                break
            pages.append(page.extract_text())
    return pages

### 2.2 Download a text-based ebook from a URL

In [13]:
def download_url_text(url):
    r = requests.get(url)
    if r.status_code == 200:
        return r.content.decode('utf-8')
    else:
        print(f'Failed to download {url}. Status code = {r.status_code}')
        return None

### 2.3 Extract paragraphs from a web page

Using the Beautiful Soup library, we can parse the HTML content of a web page.
Here, the text from the `<p>` tags in the `<body>` section of the page is extracted.

In [14]:
def extract_paragraphs_from_html(text):
    html = BeautifulSoup(text, 'html.parser')
    return [ p.text for p in html.body.select('p') ]

### 2.4 Generate texts, or query an endpoint with prompts

In [15]:
newline, bold, unbold = '\n', '\033[1m', '\033[0m'
lightred, lightgreen, lightyellow, lightblue = '\033[91m', '\033[92m', '\033[93m', '\033[94m'
lightmagenta, lightcyan, reset = '\033[95m', '\033[96m', '\033[39m'

def query_endpoint_with_json_payload(encoded_json):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=encoded_json)
    return response

def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response['Body'].read())
    generated_text = model_predictions['generated_texts']
    return generated_text

def generate_text_from_prompt(prompt, max_length=300, max_time=50, temperature=0.5,
                              top_k=None, top_p=None, do_sample=True, seed=None):
    payload = {
        "text_inputs": prompt,
        "max_length": max_length,
        "max_time": max_time,
        "temperature": temperature,
        "do_sample": do_sample
    }
    if top_k is not None:
        payload['top_k'] = top_k
    if top_p is not None:
        payload['top_p'] = top_p
    if seed is not None:
        payload['seed'] = seed

    query_response = query_endpoint_with_json_payload(json.dumps(payload).encode('utf-8'))
    return parse_response_multiple_texts(query_response)[0]
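
The parsing helper above reads a JSON object with a `generated_texts` list from the response body. The following is a minimal offline sketch of that step, using an in-memory stand-in for the streaming body. The `parse_generated_texts` name and the `fake_response` dictionary are made up for illustration, and no endpoint is called.

```python
import io
import json


def parse_generated_texts(query_response):
    """Read the response body once and return the list of generated texts."""
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions["generated_texts"]


# Hypothetical response, shaped like the dictionary the notebook's parser expects.
fake_body = json.dumps(
    {"generated_texts": ["A qubit is the basic unit of quantum information."]}
)
fake_response = {"Body": io.BytesIO(fake_body.encode("utf-8"))}

texts = parse_generated_texts(fake_response)
print(texts[0])
```

Note that the body is a stream, so it can only be read once per response.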

To further customize the outputs of the large language model, the following parameters are available:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **max_time:** The maximum amount of time you allow the computation to run for in seconds. Generation will still finish the current pass after allocated time has been passed. This setting can help to generate a response prior to endpoint invocation response time out errors.
* **num_return_sequences:** Number of output sequences returned. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **seed:** Fix the randomized state for reproducibility. If specified, it must be an integer.
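
A few of the constraints listed above can be checked on the client before a request is sent. The sketch below is one possible way to do that. The `build_payload` helper is hypothetical (it is not part of the SageMaker SDK), it covers only some of the listed parameters, and it assumes that unset parameters can simply be left out of the request.

```python
import json


def build_payload(prompt, **params):
    """Build a request body, checking a few of the constraints listed above."""
    for name in ("max_length", "num_return_sequences", "num_beams", "top_k"):
        value = params.get(name)
        if value is None:
            continue
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    temperature = params.get("temperature")
    if temperature is not None and temperature <= 0:
        raise ValueError("temperature must be a positive float")

    top_p = params.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    beams = params.get("num_beams")
    sequences = params.get("num_return_sequences")
    if beams is not None and sequences is not None and beams < sequences:
        raise ValueError("num_beams must be >= num_return_sequences")

    # Assumption: parameters left as None are omitted from the request.
    payload = {"text_inputs": prompt}
    payload.update({k: v for k, v in params.items() if v is not None})
    return payload


payload = build_payload(
    "Summarize: ...", max_length=100, temperature=0.2, top_p=0.9, top_k=None, seed=1
)
encoded = json.dumps(payload).encode("utf-8")  # bytes ready to send as the request body
print(payload)
```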


In [16]:
def summarize(text, seed=None):
    return generate_text_from_prompt(
        f"""Summarize the following text in 100 words:\n\n{text}\n\nSummary:""",
        temperature=0.2,  # Low temperature for summarization
        seed=seed
    )

def ask(context, question, seed=None):
    return generate_text_from_prompt(
        f"""CONTEXT:\n{context}\n{question}""",
        temperature=0.01,  # Lowest temperature for accuracy
        seed=seed
    )

def extract_question(text, seed=None):
    return generate_text_from_prompt(
        f"""EXTRACT QUESTIONS\nContext:\n{text}\nQuestion:""",
        temperature=1.0,  # High temperature for more varied questions
        seed=seed
    )

def create_qna_pairs(text, n, output_style='HTML', seed=None):
    questions = []
    answers = []

    for i in range(n):
        qn = extract_question(text, seed) if i == 0 else extract_question(text)
        questions.append(qn)
        answers.append(ask(text, qn))
        if output_style == 'HTML':
            output = \
            f"""<b>{i+1}</b>. <b><font color=#FF7F50>Question</font></b>: {questions[i]}
            <b><font color=#FA8072>Answer</font></b>: {answers[i]}"""
            display(HTML(output))
        elif output_style == 'text':
            print(f"""{i+1}. {lightblue}{bold}Question{unbold}{reset}: {questions[i]} {lightcyan}{bold}Answer{unbold}{reset}: {answers[i]}""")
    if output_style == 'table':
        return pd.DataFrame({
            'Question': questions,
            'Answer': answers
        }).drop_duplicates()
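
Because questions are sampled at a high temperature, the same question can come back more than once. The `table` style above handles this with `drop_duplicates()` after the fact. The sketch below shows an alternative that skips repeats while generating. The `collect_unique_qna` helper, its `max_attempts` parameter, and the stub callables are hypothetical, and the stubs stand in for the endpoint-backed `extract_question` and `ask` so that the example runs offline.

```python
from itertools import cycle


def collect_unique_qna(text, n, extract_question, ask, max_attempts=None):
    """Collect up to n distinct questions (case-insensitive) with their answers."""
    if max_attempts is None:
        max_attempts = 3 * n  # arbitrary cap so the loop always terminates
    seen, pairs = set(), []
    for _ in range(max_attempts):
        if len(pairs) == n:
            break
        question = extract_question(text).strip()
        key = question.lower()
        if not question or key in seen:
            continue  # skip empty or repeated questions
        seen.add(key)
        pairs.append((question, ask(text, question)))
    return pairs


# Stub callables standing in for the endpoint-backed helpers.
_questions = cycle(["What is a qubit?", "what is a qubit?", "What is superposition?"])


def stub_extract(text):
    return next(_questions)


def stub_ask(text, question):
    return f"(answer to: {question})"


pairs = collect_unique_qna("some context", 2, stub_extract, stub_ask)
for question, answer in pairs:
    print(question, "->", answer)
```

Skipping repeats before calling `ask` also avoids one endpoint invocation per duplicate question.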

## 3. LLM Demos for Education

In this notebook, we use the following texts to demonstrate summarization tasks and the generation of question & answer pairs.

1. Quantum Computing from Wikipedia: https://en.wikipedia.org/wiki/Quantum_computing
<!-- 1. Quantum Computing and Quantum Information (by Nielsen & Chuang): https://michaelnielsen.org/qcqi/QINFO-book-nielsen-and-chuang-toc-and-chapter1-nov00.pdf (this is a sample chapter from [this website](https://michaelnielsen.org/qcqi/)) -->
2. Winnie the Pooh (by Alan Alexander Milne): https://www.gutenberg.org/ebooks/67098.txt.utf-8
3. Attention is all you need (by Vaswani et al): https://arxiv.org/pdf/1706.03762.pdf
4. Australian Budget 2023-24 Overview: https://budget.gov.au/content/overview/download/budget_overview-20230511.pdf

Note that for this notebook, we are using the Flan T5 XL model for simplicity and ease of deployment. Additional fine-tuning or a more capable model would likely be required to get better results.

In [17]:
# Download pdfs and texts with the `curl` command. Flags used here are `-L` (allow redirects),
# `-s` (for silent mode) and `-o` (to specify the output file name).

# Attention is all you need (by Vaswani et al)
!curl -Ls https://arxiv.org/pdf/1706.03762.pdf -o source_documents_dir/attention.pdf
# Australian Budget 2023-24 Overview
!curl -Ls https://budget.gov.au/content/overview/download/budget_overview-20230511.pdf -o source_documents_dir/aus_budget_overview-2023-24.pdf

### 3.1 Wikipedia Page on Quantum Computing

In this example, a Wikipedia page on Quantum Computing is used for context. The LLM is used for keyword generation, a point by point summary, and a set of question and answer pairs. You may also wish to replace the Wikipedia URL with a website, blog, or news article of your own preference.

In [18]:
NCHARS = 400     # We will show just the first and last 400 characters of each extracted text. Increase this number for more context.
NQUESTIONS = 10  # The number of Q&A pairs that we will generate.

In [19]:
txt1_paragraphs = extract_paragraphs_from_html(
    download_url_text('https://en.wikipedia.org/wiki/Quantum_computing')
)[2:13]  # We will skip the first 2 paragraphs
txt1 = '\n\n'.join(txt1_paragraphs)
print(f'{txt1[:NCHARS]}...\n\n...{txt1[-NCHARS:]}')

The basic unit of information in quantum computing is the qubit, similar to the bit in traditional digital electronics. Unlike a classical bit, a qubit can exist in a superposition of its two "basis" states, which loosely means that it is in both states simultaneously. When measuring a qubit, the result is a probabilistic output of a classical bit. If a quantum computer manipulates the qubit in a ...

...omponents (such as semiconductors and random number generators) may rely on quantum behavior, but these components are not isolated from their environment, so any quantum information quickly decoheres.
While programmers may depend on probability theory when designing a randomized algorithm, quantum mechanical notions like superposition and interference are largely irrelevant for program analysis.



#### Keyword Generation

In [20]:
KEY_WORDS = generate_text_from_prompt(
    f'FIND KEY WORDS\n\nContext:\n{txt1}\nKey Words:',
    seed=12345
)
key_word_list = KEY_WORDS.split(', ')
print(KEY_WORDS)

quantum, qubit, superposition, quantum theory
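
Splitting on `', '` works for the output shown here, but generated text does not always use a consistent separator. The following is a small sketch of a more forgiving split. The `normalize_keywords` helper is hypothetical, and the input string is made up to show the edge cases.

```python
import re


def normalize_keywords(raw):
    """Split on commas or semicolons, trim whitespace, drop case-insensitive repeats."""
    seen, keywords = set(), []
    for part in re.split(r"[,;]", raw):
        word = part.strip()
        if word and word.lower() not in seen:
            seen.add(word.lower())
            keywords.append(word)
    return keywords


cleaned = normalize_keywords("quantum, qubit,superposition , Quantum; quantum theory,")
print(cleaned)  # ['quantum', 'qubit', 'superposition', 'quantum theory']
```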


#### Summary of key points

For each of the paragraphs, let's create a short summary.

In [21]:
summary = []
for i, x in enumerate(txt1_paragraphs):
    summary.append(f'{i+1}. {summarize(x[:1500])}')

In [22]:
HTML(
    '<h4>Key Points</h4>' + 
    '\n'.join([ f'<li>{x}</li>' for x in summary ])
)

In [23]:
# The key points above can be used to create an even shorter summary.
summarize('\n'.join(summary))

'Quantum computing is a new field of computer science that is attempting to make quantum computations accessible to computer programmers.'

#### Checking for correct answers

In this example, we generate a reference "correct answer" based on the text. Two hypothetical student
answers are then supplied: one incorrect, and one correct but phrased differently from the reference answer.
The LLM is used to check whether each student answer is correct.

In [24]:
prompt=f"""Context:{txt1}
What is quantum computing?"""
answer = generate_text_from_prompt(prompt, temperature=0.01)
print(answer)

The basic unit of information in quantum computing is the qubit, similar to the bit in traditional digital electronics


In [25]:
prompt=f"""Context:{txt1}
Question: What is quantum computing?
Answer: {answer}
Student: Quantum computing is using computers with quantum dots
Is this answer correct?"""
print(generate_text_from_prompt(prompt, temperature=0.01))

no


In [26]:
prompt=f"""Context:{txt1}
Question: What is quantum computing?
Answer: {answer}
Student: Quantum computing involves using computers that make use of quantum mechanics
Is this answer correct?"""
print(generate_text_from_prompt(prompt, temperature=0.01))

yes
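
The two grading prompts above share the same layout, so they can be produced from a template. The sketch below keeps the same fields in the same order. Both helper names are hypothetical, and the rule of treating only replies that start with "yes" as correct is an assumption based on the outputs seen here.

```python
def build_grading_prompt(context, question, reference_answer, student_answer):
    """Assemble the same fields, in the same order, as the prompts above."""
    return (
        f"Context:{context}\n"
        f"Question: {question}\n"
        f"Answer: {reference_answer}\n"
        f"Student: {student_answer}\n"
        "Is this answer correct?"
    )


def is_marked_correct(model_output):
    """Treat a reply as 'correct' only if it starts with 'yes' (case-insensitive)."""
    return model_output.strip().lower().startswith("yes")


prompt = build_grading_prompt(
    context="(article text)",
    question="What is quantum computing?",
    reference_answer="(reference answer)",
    student_answer="Quantum computing is using computers with quantum dots",
)
print(prompt)
print(is_marked_correct("no"), is_marked_correct("Yes"))  # False True
```

The resulting string can be passed to `generate_text_from_prompt` with a low temperature, as in the cells above.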


#### Generation of Question & Answer Pairs

In [27]:
create_qna_pairs(txt1, NQUESTIONS, output_style=QNA_OUTPUT_STYLE, seed=12345)

### 3.2 Winnie the Pooh (by Alan Alexander Milne)

In [28]:
winnie_the_pooh = download_url_text('https://www.gutenberg.org/ebooks/67098.txt.utf-8')

In [29]:
x = winnie_the_pooh.find('CHAPTER III')
txt4 = winnie_the_pooh[x:x+4000]  # Extract the first 4000 characters of chapter 3
print(txt4)

CHAPTER III

                   IN WHICH POOH AND PIGLET GO HUNTING
                        AND NEARLY CATCH A WOOZLE


The Piglet lived in a very grand house in the middle of a beech-tree,
and the beech-tree was in the middle of the forest, and the Piglet lived
in the middle of the house. Next to his house was a piece of broken
board which had: "TRESPASSERS W" on it. When Christopher Robin asked the
Piglet what it meant, he said it was his grandfather's name, and had
been in the family for a long time, Christopher Robin said you
_couldn't_ be called Trespassers W, and Piglet said yes, you could,
because his grandfather was, and it was short for Trespassers Will,
which was short for Trespassers William. And his grandfather had had two
names in case he lost one--Trespassers after an uncle, and William after
Trespassers.

"I've got two names," said Christopher Robin carelessly.

"Well, there you are, that proves it," said Piglet.

One fine winter's day when Piglet was brushing away the s

In [30]:
ask(txt4, "What is the storyline here?")

'The Piglet lived in a very grand house in the middle of a beech-tree, and the beech-tree was in the middle of the forest, and the Piglet lived in the middle of the house. Next to his house was a piece of broken board which had: "TRESPASSERS W" on it. When Christopher Robin asked the Piglet what it meant, he said it was his grandfather\'s name, and had been in the family for a long time. Christopher Robin said you _could_ be called Trespassers W, and Piglet said yes, you could, because his grandfather was, and it was short for Trespassers Will, which was short for Trespassers William. And his grandfather had had two names in case he lost one--Trespassers after an uncle, and William after Trespassers.'

In [31]:
ask(txt4, "Who is the main character?")

'Winnie-the-Pooh'

In [32]:
ask(txt4, "What happens at the end?")

'Pooh and Piglet nearly catch a Woozle.'

In [33]:
create_qna_pairs(txt4, NQUESTIONS, output_style=QNA_OUTPUT_STYLE)

### 3.3 Attention is all you need (by Vaswani et al)

In [34]:
attention = extract_pages('source_documents_dir/attention.pdf')

In [35]:
txt5 = '\n\n'.join(attention[1:3] + attention[9:10])  # Zero-based indices 1, 2 (intro) and 9 (conclusion), i.e. PDF pages 2, 3 and 10
print(f'{txt5[:NCHARS]}...\n\n\n...{txt5[-NCHARS:]}')

transduction problems such as language modeling and machine translation [ 35,2,5]. Numerous
efforts have since continued to push the boundaries of recurrent language models and encoder-decoder
architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions to steps in computation time, they generate a se...


...g. arXiv preprint arXiv:1601.06733 , 2016.
[5]Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk,
and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical
machine translation. CoRR , abs/1406.1078, 2014.
[6]Francois Chollet. Xception: Deep learning with depthwise separable convolutions. arXiv
preprint arXiv:1610.02357 , 2016.
10
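
`extract_pages` returns a zero-based list, so index 1 holds the second page of the PDF. The sketch below takes 1-based page numbers instead, which can make such selections easier to read. The `select_pages` helper is hypothetical, and the `pages` list is a stand-in for the extracted text.

```python
def select_pages(pages, page_numbers):
    """Return the pages at the given 1-based page numbers, in the order requested."""
    selected = []
    for number in page_numbers:
        if not 1 <= number <= len(pages):
            raise IndexError(f"page {number} is outside 1..{len(pages)}")
        selected.append(pages[number - 1])
    return selected


# Stand-in for the list returned by extract_pages().
pages = [f"text of page {n}" for n in range(1, 16)]

# Same selection as pages[1:3] + pages[9:10].
subset = select_pages(pages, [2, 3, 10])
print(subset)
```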


#### Question Answering

In [36]:
ask(txt5, "What is the main gist of the paper?")

'We propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.'

In [37]:
ask(txt5, "What is the problem being solved?")

'The Transformer is the first sequence transduction model relying entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.'

In [38]:
ask(txt5, "What is the conclusion of the paper?")

'Transformer is the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.'

In [39]:
chunk_size = len(txt5)//8
print(f'The text will be split up into chunks of {chunk_size} characters and summarized')

The text will be split up into chunks of 1202 characters and summarized


In [40]:
display(HTML('<h4>Key Points</h4>'))
summary = []
for i in range(8):
    x0 = i*chunk_size
    x1 = (i+1)*chunk_size
    line_summary = f'{i+1}. {summarize(txt5[x0:x1])}'
    display(HTML(line_summary))
    summary.append(line_summary)
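
With `chunk_size = len(txt5)//8`, up to 7 trailing characters can fall outside the last chunk. The sketch below shows a splitter that covers the whole text by spreading the remainder over the first chunks. The `split_into_chunks` helper is hypothetical. Like the loop above, it splits on character counts, so a chunk may start or end in the middle of a word.

```python
def split_into_chunks(text, n_chunks):
    """Split text into n_chunks contiguous pieces whose lengths differ by at most one."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(len(text), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional character.
        end = start + base + (1 if i < extra else 0)
        chunks.append(text[start:end])
        start = end
    return chunks


sample = "abcdefghij"  # 10 characters
chunks = split_into_chunks(sample, 4)
print(chunks)  # ['abc', 'def', 'gh', 'ij']
```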

In [41]:
create_qna_pairs(txt5, NQUESTIONS, output_style=QNA_OUTPUT_STYLE)

### 3.4 Australian Budget 2023-24 Overview (Medicare)

In this example, we look at the Australian Budget 2023-24 and we focus on the Medicare improvements.

In [42]:
# Extract the pages from the Budget overview and keep zero-based indices 24 to 26 (Medicare related)
aus_budget_overview = extract_pages('source_documents_dir/aus_budget_overview-2023-24.pdf')
txt_aus_budget_overview_medicare = '\n\n'.join(aus_budget_overview[24:27])  # Zero-based indices 24 to 26, which cover the Medicare budget.
print(f'{txt_aus_budget_overview_medicare[:NCHARS]}...\n\n\n...{txt_aus_budget_overview_medicare[-NCHARS:]}')

Historic investment in Medicare 
Strengthening Medicare
Medicare is the foundation of Australia’s primary health care system. In this 
Budget, the Government is investing $5.7 billion over 5 years from 2022—23 to 
strengthen Medicare and make it cheaper and easier to see a doctor.
The Strengthening Medicare package includes the largest investment in bulk 
billing incentives ever. The Government is...


...llion over 4 years to establish the Primary Care and Midwifery 
Scholarships program, supporting registered nurses and midwives in 
post-graduate study to improve their skills 
• $31.6 million over 2 years for improved training arrangements for 
international medical students working rur al and remote locations.
26 Strengthening MedicareStronger foundations for a better future   |   Budget 2023–24


#### Question Answering

In [43]:
summarize(txt_aus_budget_overview_medicare)

'The Government is investing $5.7 billion over 5 years from 2022-23 to strengthen Medicare and make it cheaper and easier to see a doctor.'

In [44]:
create_qna_pairs(txt_aus_budget_overview_medicare, 5, output_style=QNA_OUTPUT_STYLE)

In [45]:
ask(txt_aus_budget_overview_medicare,"What is a Level B consultation?")

'telehealth general practice services which are between 6 and 20 minutes in length'

In [46]:
ask(txt_aus_budget_overview_medicare, "How much is the government investing?")

'$5.7 billion'

In [47]:
ask(txt_aus_budget_overview_medicare, "Is the government helping the homeless people?")

'The Government will also invest in new services to help homeless people and culturally and linguistically diverse communities to access primary care.'

## Cleanup

Delete the SageMaker model and endpoint.

In [48]:
model_predictor.delete_model()
model_predictor.delete_endpoint()

To completely shut down SageMaker, go to File > Shut Down > Shutdown All.