# 🧑‍🏫 Large Language Models for Education 🧑‍🏫


---

### 🧑‍🎓 A note on Generative AI in Education

By harnessing generative AI, educators can unlock new and captivating products, enabling them to craft engaging and interactive learning experiences that promote student growth. Experts envision a future where generative AI empowers educators to revolutionize the way knowledge is imparted, paving the way for transformative educational practices.

---

In this notebook, we demonstrate how to use large language models (LLMs) for use cases in education.  LLMs can be used for tasks such as summarization, question-answering, or the generation of question & answer pairs.

Text summarization is the task of shortening a text into a summary that preserves its most important information. Here, we show how to use the state-of-the-art pre-trained model **FLAN T5** for text summarization, as well as for all the other tasks.

In the first part of the notebook, we select and deploy the **FLAN T5** model as a SageMaker Real-time endpoint on a single `ml.g4dn.2xlarge` instance. SageMaker Real-time endpoints are ideal for inference workloads with real-time, interactive, low-latency requirements. These endpoints are fully managed, automatically serve your models through HTTP, and support auto-scaling.

Once the model is deployed and ready to use, we demonstrate how it can be queried, how to prompt the model for summarization, question-answering and the generation of question & answer pairs.

The final section is split into four demos that query the Wikipedia article on quantum computing, an ebook text from [Project Gutenberg](https://www.gutenberg.org/), a scientific PDF article from arxiv.org, and the Australian Budget 2023-24 Medicare Overview.

## 1. Setting up the SageMaker Endpoint

### 1.1 Install Python Dependencies and SageMaker setup

Before executing the notebook, some initial setup steps are required. This notebook requires the latest versions of `sagemaker`, `langchain`, and `PyPDF2`.

In [1]:
!aws codeartifact login --tool pip --repository ag-dev-codeartifact-isolated-repo --domain ag-dev-sagemaker --domain-owner 820788409827 --region ap-southeast-1

Successfully configured pip to use AWS CodeArtifact repository https://ag-dev-sagemaker-820788409827.d.codeartifact.ap-southeast-1.amazonaws.com/pypi/ag-dev-codeartifact-isolated-repo/ 
Login expires in 12 hours at 2023-07-05 03:37:42+00:00


In [2]:
%%capture
!pip install --upgrade pip
!pip install -U sagemaker
!pip install -U langchain
!pip install -U PyPDF2

Next, we load the SDK and helper scripts. We import the required packages, configure logging, and define a helper that creates a SageMaker session, as shown below.

In [3]:
import sagemaker, boto3, json, logging
from sagemaker import image_uris, instance_types, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.session import Session
from sagemaker.utils import name_from_base
from IPython.display import display, HTML, IFrame

In [4]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

In [5]:
logger.info(f'Using sagemaker=={sagemaker.__version__}')
logger.info(f'Using boto3=={boto3.__version__}')

Using sagemaker==2.169.0
Using boto3==1.26.154


In [6]:
def get_sagemaker_session(local_download_dir) -> sagemaker.Session:
    """Return the SageMaker session."""

    sagemaker_client = boto3.client(
        service_name="sagemaker", region_name=boto3.Session().region_name
    )

    session_settings = sagemaker.session_settings.SessionSettings(
        local_download_dir=local_download_dir
    )

    # Build the session from the explicit client and the custom download directory.
    session = sagemaker.session.Session(
        sagemaker_client=sagemaker_client, settings=session_settings
    )

    return session

### 1.2 Deploying a SageMaker Endpoint

Using SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri`, `deploy_source_uri`, and `model_uri` for the pre-trained model. To host the pre-trained model, we create an instance of [`sagemaker.model.Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. This may take a few minutes.

In [7]:
sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

# We select the Flan-T5 XL model available in the Hugging Face container.
model_id, model_version = "huggingface-text2text-flan-t5-xl", "*"
_model_env_variable_map = {
    "huggingface-text2text-flan-t5-xl": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
}

endpoint_name = name_from_base(f"jumpstart-example-{model_id}")
instance_type = 'ml.g4dn.2xlarge'
logger.info(f'Using role {aws_role} in region {aws_region}')

Using role arn:aws:iam::820788409827:role/ag-dev-studio-workshop-gcc-james-teo-ExecutionRole in region ap-southeast-1


In [8]:
# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
if model_id in _model_env_variable_map:
    # For these large models, the inference script and model artifacts are
    # already repacked, so the `source_dir` argument to Model is not required.
    model = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=_model_env_variable_map[model_id],
    )
else:
    model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        model_data=model_uri,
        entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        sagemaker_session=get_sagemaker_session("download_dir"),
    )

In [9]:
%%time
# Deploy the model. Note that we need to pass the Predictor class when deploying through the Model class
# so that we can run inference through the SageMaker API.
print(f'Deploying endpoint {endpoint_name} on 1 x {instance_type} (this will take approximately 6-8 minutes)')
try:
    model_predictor = model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        predictor_cls=Predictor,
        endpoint_name=endpoint_name,
    )
except Exception as e:
    print(f'Error: {e}')
    print('Two common reasons for this error:')
    print('1. You are in an AWS region that does not have the ml.g4dn.2xlarge instance type')
    print('2. You have exceeded the service quota of this AWS account')

Creating model with name: jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197
CreateModel request: {
    "ModelName": "jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197",
    "ExecutionRoleArn": "arn:aws:iam::820788409827:role/ag-dev-studio-workshop-gcc-james-teo-ExecutionRole",
    "PrimaryContainer": {
        "Image": "763104351884.dkr.ecr.ap-southeast-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04",
        "Environment": {
            "MMS_DEFAULT_WORKERS_PER_MODEL": "1"
        },
        "ModelDataUrl": "s3://jumpstart-cache-prod-ap-southeast-1/huggingface-infer/prepack/v1.1.2/infer-prepack-huggingface-text2text-flan-t5-xl.tar.gz"
    },
    "Tags": [
        {
            "Key": "aws-jumpstart-inference-model-uri",
            "Value": "s3://jumpstart-cache-prod-ap-southeast-1/huggingface-infer/prepack/v1.1.2/infer-prepack-huggingface-text2text-flan-t5-xl.tar.gz"
        }
    ]
}


Deploying endpoint jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197 on 1 x ml.g4dn.2xlarge (this will take approximately 6-8 minutes)


Creating endpoint-config with name jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197
Creating endpoint with name jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197


--------!CPU times: user 137 ms, sys: 23.8 ms, total: 161 ms
Wall time: 4min 32s


In [10]:
print(f'Successfully deployed endpoint {endpoint_name} on 1 x {instance_type}')

Successfully deployed endpoint jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197 on 1 x ml.g4dn.2xlarge
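
With the endpoint in service, it can also be queried directly over the SageMaker runtime API, without going through the helper module used in the demos. The following is a minimal sketch, not the notebook's own helper. It assumes the JumpStart text2text container accepts a JSON body with a `text_inputs` field and returns either `generated_texts` or `generated_text`; the function names `build_payload`, `parse_response`, and `query_endpoint` are hypothetical.

```python
import json


def build_payload(prompt, max_length=100, temperature=0.01):
    """Serialize a prompt into the JSON body assumed for JumpStart text2text models."""
    body = {
        "text_inputs": prompt,
        "max_length": max_length,
        "temperature": temperature,
        "do_sample": True,
    }
    return json.dumps(body).encode("utf-8")


def parse_response(raw_body):
    """Extract the first generated string from the JSON response body."""
    result = json.loads(raw_body)
    # Assumption: the container returns {"generated_texts": [...]}; some
    # versions may return {"generated_text": "..."} instead.
    if "generated_texts" in result:
        return result["generated_texts"][0]
    return result["generated_text"]


def query_endpoint(endpoint_name, prompt, **kwargs):
    """Send one prompt to a deployed endpoint and return the generated text."""
    import boto3  # imported here so the helpers above work without AWS access

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(prompt, **kwargs),
    )
    return parse_response(response["Body"].read())
```

For example, `query_endpoint(endpoint_name, "Summarize: ...")` would send a single prompt to the endpoint deployed above. If the container rejects the payload, check the accepted fields for the model version in use.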


## 3. LLM Demos for Education

In [11]:
import nlp_helper
nlp_helper.endpoint_name = endpoint_name

In this notebook, we use the following texts to demonstrate summarization tasks and the generation of question & answer pairs.

1. Quantum Computing from Wikipedia: https://en.wikipedia.org/wiki/Quantum_computing
<!-- 1. Quantum Computing and Quantum Information (by Nielsen & Chuang): https://michaelnielsen.org/qcqi/QINFO-book-nielsen-and-chuang-toc-and-chapter1-nov00.pdf (this is a sample chapter from [this website](https://michaelnielsen.org/qcqi/)) -->
2. Winnie the Pooh (by Alan Alexander Milne): https://www.gutenberg.org/ebooks/67098.txt.utf-8
3. Attention is all you need (by Vaswani et al): https://arxiv.org/pdf/1706.03762.pdf
4. Australian Budget 2023-24 Overview: https://budget.gov.au/content/overview/download/budget_overview-20230511.pdf

Note that this notebook uses the Flan T5 XL model for simplicity and ease of deployment; additional fine-tuning or a more capable model would be required to get better results.

### 3.1 Wikipedia Page on Quantum Computing

In this example, a Wikipedia page on Quantum Computing is used for context. The LLM is used for keyword generation, a point by point summary, and a set of question and answer pairs. You may also wish to replace the Wikipedia URL with a website, blog, or news article of your own preference.

In [12]:
NCHARS = 400     # We will show just the first and last 400 characters of each extracted text. Increase this number for more context.
NQUESTIONS = 10  # The number of Q&A pairs that we will generate.

In [13]:
from documents import wiki_paragraphs, wiki_txt

#### Keyword Generation

In [14]:
KEY_WORDS = nlp_helper.generate_text_from_prompt(
    f'FIND KEY WORDS\n\nContext:\n{wiki_txt}\nKey Words:',
    seed=12345
)
key_word_list = KEY_WORDS.split(', ')
print(KEY_WORDS)

quantum, computing, superposition, qubit


#### Summary of key points

For each of the paragraphs, let's create a short summary.

In [15]:
summary = []
for i, x in enumerate(wiki_paragraphs):
    summary.append(f'{i+1}. {nlp_helper.summarize(x[:1500])}')

In [16]:
HTML(
    '<h4>Key Points</h4><ul>' +
    '\n'.join([ f'<li>{x}</li>' for x in summary ]) +
    '</ul>'
)

In [17]:
# The 10 points above can be used to create an even shorter summary.
nlp_helper.summarize('\n'.join(summary))

'Quantum computing is the development of a computer that uses quantum physics to perform computations that are impossible for classical computers.'

#### Checking for correct answers

In this example, we first generate a reference "correct answer" from the text. We then
supply two student answers: one incorrect, and one correct but phrased slightly differently
from the reference answer. The LLM is used to check whether each student answer is correct.

In [18]:
prompt=f"""Context:{wiki_txt}
What is quantum computing?"""
answer = nlp_helper.generate_text_from_prompt(prompt, temperature=0.01)
print(answer)

A quantum computer is a computer that exploits quantum mechanical phenomena.


In [19]:
prompt=f"""Context:{wiki_txt}
Question: What is quantum computing?
Answer: {answer}
Student: Quantum computing is using computers with quantum dots
Is this answer correct?"""
print(nlp_helper.generate_text_from_prompt(prompt, temperature=0.01))

no


In [20]:
prompt=f"""Context:{wiki_txt}
Question: What is quantum computing?
Answer: {answer}
Student: Quantum computing involves using computers that make use of quantum mechanics
Is this answer correct?"""
print(nlp_helper.generate_text_from_prompt(prompt, temperature=0.01))

yes
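
The three cells above repeat the same prompt layout by hand. A minimal sketch of how the grading step could be wrapped in reusable helpers is shown below. The names `build_grading_prompt` and `is_marked_correct` are hypothetical and not part of `nlp_helper`; the prompt layout copies the one used above, and the set of accepted verdict words is an assumption.

```python
def build_grading_prompt(context, question, reference_answer, student_answer):
    """Assemble the grading prompt in the same layout as the cells above."""
    return (
        f"Context:{context}\n"
        f"Question: {question}\n"
        f"Answer: {reference_answer}\n"
        f"Student: {student_answer}\n"
        "Is this answer correct?"
    )


def is_marked_correct(model_output):
    """Map the model's free-text verdict to True, False, or None if unclear."""
    verdict = model_output.strip().lower().rstrip(".!")
    # Assumption: the model replies with a single word such as "yes" or "no".
    if verdict in ("yes", "correct", "true"):
        return True
    if verdict in ("no", "incorrect", "false"):
        return False
    return None
```

Returning `None` for anything other than a clear verdict lets the caller route ambiguous outputs to a human reviewer instead of silently treating them as wrong.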


#### Generation of Question & Answer Pairs

In [21]:
nlp_helper.create_qna_pairs(wiki_txt, NQUESTIONS, output_style=nlp_helper.QNA_OUTPUT_STYLE, seed=1234)
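
The implementation of `nlp_helper.create_qna_pairs` is not shown in this notebook. As an illustration only, one possible approach is to prompt the model for a question, then prompt it again to answer that question from the same context, skipping duplicates. The sketch below assumes a caller-supplied `generate(prompt, seed)` callable (for example, a thin wrapper around the endpoint); the function name `sketch_qna_pairs` and both prompt templates are hypothetical.

```python
def sketch_qna_pairs(context, n_questions, generate):
    """Return up to n_questions unique (question, answer) tuples.

    `generate` is any callable mapping (prompt, seed) to a string.
    """
    pairs = []
    seen = set()
    # Allow a few extra attempts because sampling can repeat questions.
    for seed in range(n_questions * 3):
        if len(pairs) == n_questions:
            break
        question = generate(
            f"Context:{context}\nGenerate a question about the context:", seed
        ).strip()
        if not question or question.lower() in seen:
            continue
        seen.add(question.lower())
        answer = generate(
            f"Context:{context}\nQuestion: {question}\nAnswer:", seed
        ).strip()
        pairs.append((question, answer))
    return pairs
```

Varying the seed per attempt is what gives the sampler a chance to produce different questions; with a fixed seed and low temperature the same question would likely come back every time.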

### 3.2 Winnie the Pooh (by Alan Alexander Milne)

In [22]:
from documents import winnie_the_pooh

In [23]:
x = winnie_the_pooh.find('CHAPTER III')
pooh_txt = winnie_the_pooh[x:x+5000]  # Extract the first 5000 characters of chapter 3
print(pooh_txt)

CHAPTER III

                   IN WHICH POOH AND PIGLET GO HUNTING
                        AND NEARLY CATCH A WOOZLE


The Piglet lived in a very grand house in the middle of a beech-tree,
and the beech-tree was in the middle of the forest, and the Piglet lived
in the middle of the house. Next to his house was a piece of broken
board which had: "TRESPASSERS W" on it. When Christopher Robin asked the
Piglet what it meant, he said it was his grandfather's name, and had
been in the family for a long time, Christopher Robin said you
_couldn't_ be called Trespassers W, and Piglet said yes, you could,
because his grandfather was, and it was short for Trespassers Will,
which was short for Trespassers William. And his grandfather had had two
names in case he lost one--Trespassers after an uncle, and William after
Trespassers.

"I've got two names," said Christopher Robin carelessly.

"Well, there you are, that proves it," said Piglet.

One fine winter's day when Piglet was brushing away the s

In [24]:
nlp_helper.ask(pooh_txt, "What is the storyline here?")

'The Piglet lived in a very grand house in the middle of a beech-tree, and the beech-tree was in the middle of the forest, and the Piglet lived in the middle of the house. Next to his house was a piece of broken board which had: "TRESPASSERS W" on it. When Christopher Robin asked the Piglet what it meant, he said it was his grandfather\'s name, and had been in the family for a long time. Christopher Robin said you _could_ be called Trespassers W, and Piglet said yes, you could, because his grandfather was, and it was short for Trespassers Will'

In [25]:
nlp_helper.ask(pooh_txt, "Who is the main character?")

'Winnie-the-Pooh'

In [26]:
nlp_helper.ask(pooh_txt, "What happens at the end?")

'Winnie-the-Pooh and Piglet nearly catch a Woozle.'

In [27]:
nlp_helper.create_qna_pairs(pooh_txt, NQUESTIONS, output_style=nlp_helper.QNA_OUTPUT_STYLE, seed=12345)

### 3.3 Attention is all you need (by Vaswani et al)

In [28]:
attention = nlp_helper.extract_pages('source_documents_dir/attention.pdf')

In [29]:
attention_txt = '\n\n'.join(attention[1:3] + attention[9:10])  # List items 1 and 2 (the intro) and 9 (the conclusion); indices are zero-based
# print(f'{attention_txt[:NCHARS]}...\n\n\n...{attention_txt[-NCHARS:]}')
IFrame('source_documents_dir/attention.pdf', width=800, height=400)

#### Question Answering

In [30]:
nlp_helper.ask(attention_txt, "What is the main gist of the paper?")

'We propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.'

In [31]:
nlp_helper.ask(attention_txt, "What is the problem being solved?")

'The Transformer is the first sequence transduction model relying entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.'

In [32]:
nlp_helper.ask(attention_txt, "What is the conclusion of the paper?")

'Transformer is the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.'

In [33]:
chunk_size = len(attention_txt)//8
print(f'The text will be split up into chunks of {chunk_size} characters and summarized')

The text will be split up into chunks of 1202 characters and summarized


In [34]:
display(HTML('<h4>Key Points</h4>'))
summary = []
for i in range(8):
    x0 = i*chunk_size
    x1 = (i+1)*chunk_size
    line_summary = f'{i+1}. {nlp_helper.summarize(attention_txt[x0:x1])}'
    display(HTML(line_summary))
    summary.append(line_summary)
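
The loop above cuts the text at fixed character offsets, so a chunk can start or end in the middle of a word, and integer division can leave a few trailing characters unsummarized. A minimal alternative sketch that breaks on whitespace instead is shown below; `chunk_text` is a hypothetical helper, not part of `nlp_helper`. Note that it collapses runs of whitespace (including newlines) into single spaces.

```python
def chunk_text(text, max_chars):
    """Split text into chunks of at most max_chars, breaking on whitespace.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding the word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

With this helper, the loop could iterate over `chunk_text(attention_txt, chunk_size)` instead of computing offsets, at the cost of a chunk count that is no longer fixed at 8.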

In [35]:
nlp_helper.create_qna_pairs(attention_txt, NQUESTIONS, output_style=nlp_helper.QNA_OUTPUT_STYLE)

### 3.4 Australian Budget 2023-24 Overview (Medicare)

In this example, we look at the Australian Budget 2023-24 and we focus on the Medicare improvements.

In [36]:
# Extract the pages from the Budget overview and keep the Medicare-related ones.
aus_budget_overview = nlp_helper.extract_pages('source_documents_dir/aus_budget_overview-2023-24.pdf')
txt_aus_budget_overview_medicare = '\n\n'.join(aus_budget_overview[24:27])  # List items 24, 25 and 26 (the slice end is exclusive). Those pages cover the Medicare budget.
print(f'{txt_aus_budget_overview_medicare[:NCHARS]}...\n\n\n...{txt_aus_budget_overview_medicare[-NCHARS:]}')

Historic investment in Medicare 
Strengthening Medicare
Medicare is the foundation of Australia’s primary health care system. In this 
Budget, the Government is investing $5.7 billion over 5 years from 2022—23 to 
strengthen Medicare and make it cheaper and easier to see a doctor.
The Strengthening Medicare package includes the largest investment in bulk 
billing incentives ever. The Government is...


...llion over 4 years to establish the Primary Care and Midwifery 
Scholarships program, supporting registered nurses and midwives in 
post-graduate study to improve their skills 
• $31.6 million over 2 years for improved training arrangements for 
international medical students working rur al and remote locations.
26 Strengthening MedicareStronger foundations for a better future   |   Budget 2023–24
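
Python slices are zero-based with an exclusive end, which makes it easy to be off by one when picking pages by the numbers shown in a PDF viewer. A minimal sketch of a helper that takes 1-based, inclusive page numbers is shown below. It assumes the page list is ordered from the first page of the document; `select_pages` is a hypothetical name, not part of `nlp_helper`.

```python
def select_pages(pages, first_page, last_page):
    """Join pages first_page..last_page (1-based, inclusive) with blank lines."""
    if not 1 <= first_page <= last_page <= len(pages):
        raise ValueError(
            f"page range {first_page}-{last_page} is outside 1-{len(pages)}"
        )
    # Convert the 1-based inclusive range to a zero-based, end-exclusive slice.
    return "\n\n".join(pages[first_page - 1:last_page])
```

Under that assumption, `select_pages(aus_budget_overview, 25, 27)` selects the same items as the slice `aus_budget_overview[24:27]` used above.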


#### Question Answering

In [37]:
nlp_helper.summarize(txt_aus_budget_overview_medicare)

'The Government is investing $5.7 billion over 5 years from 2022-23 to strengthen Medicare and make it cheaper and easier to see a doctor.'

In [38]:
nlp_helper.create_qna_pairs(txt_aus_budget_overview_medicare, 5, output_style=nlp_helper.QNA_OUTPUT_STYLE)

In [39]:
nlp_helper.ask(txt_aus_budget_overview_medicare, "What is a Level B consultation?")

'telehealth general practice services which are between 6 and 20 minutes in length'

In [40]:
nlp_helper.ask(txt_aus_budget_overview_medicare, "How much is the government investing?")

'$5.7 billion'

In [41]:
nlp_helper.ask(txt_aus_budget_overview_medicare, "Is the government helping homeless people?")

'The Government will also invest in new services to help homeless people and culturally and linguistically diverse communities to access primary care.'

## 4. Cleanup

Delete the SageMaker model and endpoint to stop incurring charges.

In [44]:
model_predictor.delete_model()
model_predictor.delete_endpoint()

Deleting model with name: jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197
Deleting endpoint configuration with name: jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197
Deleting endpoint with name: jumpstart-example-huggingface-text2text-2023-07-04-15-38-05-197


To completely shut down SageMaker, go to File > Shut Down > Shutdown All.