# LAB | Abstractive Question Answering

Abstractive question-answering focuses on the generation of multi-sentence answers to open-ended questions. It usually works by searching massive document stores for relevant information and then using this information to synthetically generate answers. This notebook demonstrates how Pinecone helps you build an abstractive question-answering system. We need three main components:

- A vector index to store and run semantic search
- A retriever model for embedding context passages
- A generator model to generate answers

# Install Dependencies

In [1]:
!pip -q install -U --force-reinstall numpy
!pip -q install -U datasets

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-pinecone 0.2.13 requires pinecone[asyncio]<8.0.0,>=6.0.0, but you have pinecone 8.1.0 which is incompatible.
google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 3.0.1 which is incompatible.
cudf-cu12 25.10.0 requires pandas<2.4.0dev0,>=2.0, but you have pandas 3.0.1 which is incompatible.
db-dtypes 1.5.0 requires pandas<3.0.0,>=1.5.3, but you have pandas 3.0.1 which is incompatible.
bqplot 0.12.45 requires pandas<3.0.0,>=1.0.0, but you have pandas 3.0.1 which is incompatible.
gradio 5.50.0 requires pandas<3.0,>=1.0, but you have pandas 3.0.1 which is incompatible.
tensorflow 2.19.0 requires numpy<2.2.0,>=1.26.0, but you have numpy 2.4.2 which is incompatible.
tensorflow 2.19.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.3, but you have protobuf 

In [2]:
import numpy as np
print(np.__version__)

2.4.2


In [3]:
!pip -q install --upgrade --force-reinstall pyarrow pandas xxhash
!pip -q install --upgrade --force-reinstall datasets

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/193.9 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m193.9/193.9 kB[0m [31m7.6 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/229.9 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m229.9/229.9 kB[0m [31m14.5 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-pinecone 0.2.13 requires pinecone[asyncio]<8.0.0,>=6.0.0, but you have pinecone 8.1.0 which is incompatible.
google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 3.0.1 which is incompatible.
cudf-cu12 25.10.0 requires pandas<2.4.0dev0,>=2.0, but you have pandas 3.0.1 which is incompatible.
db-dtypes 1.5.0 requires panda

In [1]:
import numpy as np
print("numpy", np.__version__)

import pyarrow as pa
print("pyarrow", pa.__version__)

import pandas as pd
print("pandas", pd.__version__)

from datasets import load_dataset
print("datasets import OK")

numpy 2.4.2
pyarrow 23.0.1
pandas 3.0.1
datasets import OK


In [2]:
import numpy as np, numpy
print("numpy file:", numpy.__file__)

import pkgutil
mods = ["pyarrow", "pandas", "datasets"]
for m in mods:
    spec = pkgutil.find_loader(m)
    print(m, "->", spec.get_filename() if spec else None)

numpy file: /usr/local/lib/python3.12/dist-packages/numpy/__init__.py
pyarrow -> /usr/local/lib/python3.12/dist-packages/pyarrow/__init__.py
pandas -> /usr/local/lib/python3.12/dist-packages/pandas/__init__.py
datasets -> /usr/local/lib/python3.12/dist-packages/datasets/__init__.py


  spec = pkgutil.find_loader(m)




---



In [None]:
#!pip install -U langchain langchain-core langchain-classic langchain-pinecone langchain-huggingface datasets pinecone-client sentence-transformers torch

In [None]:
#!pip uninstall -y pyarrow datasets numpy pandas transformers sentence-transformers huggingface_hub -q
#!pip install numpy==1.26.4 "pyarrow==14.0.2" "datasets==2.18.0" transformers sentence-transformers huggingface_hub -U -q

In [None]:
# !pip -q install -U datasets transformers sentence-transformers huggingface_hub



---



In [3]:
import requests

d = "vblagoje/wikipedia_snippets_streamed"
print(requests.get(f"https://datasets-server.huggingface.co/parquet?dataset={d}").json())

{'error': "The dataset viewer doesn't support this dataset because it runs arbitrary python code. Please open a discussion in the discussion tab if you think this is an error and tag @lhoestq and @severo."}


In [4]:
!pip -q install "datasets==2.18.0"

# Load and Prepare Dataset

Our source data will be taken from the Wiki Snippets dataset, which contains over 17 million passages from Wikipedia. But, since indexing the entire dataset may take some time, we will only utilize 50,000 passages in this demo that include "History" in the "section title" column. If you want, you may utilize the complete dataset. Pinecone vector database can effortlessly manage millions of documents for you.

In [5]:
# load the dataset from huggingface in streaming mode and shuffle it
from datasets import load_dataset
wiki_data = load_dataset(
    'vblagoje/wikipedia_snippets_streamed',
    split='train',
    streaming=True
).shuffle(seed=960)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


We are loading the dataset in the streaming mode so that we don't have to wait for the whole dataset to download (which is over 9GB). Instead, we iteratively download records one at a time.

In [6]:
# show the contents of a single document in the dataset
next(iter(wiki_data))

{'wiki_id': 'Q7649565',
 'start_paragraph': 20,
 'start_character': 272,
 'end_paragraph': 24,
 'end_character': 380,
 'article_title': 'Sustainable Agriculture Research and Education',
 'section_title': "2000s & Evaluation of the program's effectiveness",
 'passage_text': "preserving the surrounding prairies. It ran until March 31, 2001.\nIn 2008, SARE celebrated its 20th anniversary. To that date, the program had funded 3,700 projects and was operating with an annual budget of approximately $19 million. Evaluation of the program's effectiveness As of 2008, 64% of farmers who had received SARE grants stated that they had been able to earn increased profits as a result of the funding they received and utilization of sustainable agriculture methods. Additionally, 79% of grantees said that they had experienced a significant improvement in soil quality though the environmentally friendly, sustainable methods that they were"}

In [7]:
# The 'wiki_snippets' dataset does not have 'section_title', so we will proceed without this specific filter
history = wiki_data

Let's iterate through the dataset and apply our filter to select the 50,000 historical passages. We will extract `article_title`, `section_title` and `passage_text` from each document.

In [8]:
from tqdm.auto import tqdm

total_doc_count = 50_000
docs = []

for d in tqdm(history):
    docs.append(
        {
            "wiki_id": d["wiki_id"],
            "article": d["article_title"],
            "section": d["section_title"],
            "passage": d["passage_text"],
        }
    )

    if len(docs) >= total_doc_count:
        break

0it [00:00, ?it/s]

In [9]:
import pandas as pd

# create a pandas dataframe with the documents we extracted
df = pd.DataFrame(docs)
df.head()

Unnamed: 0,wiki_id,article,section,passage
0,Q7649565,Sustainable Agriculture Research and Education,2000s & Evaluation of the program's effectiveness,preserving the surrounding prairies. It ran un...
1,Q1739333,State Street Bank & Trust Co. v. Signature Fin...,Bilski,"a claim is patent-eligible under § 101,"" and ""..."
2,Q7709900,Thames Punting Club,Punt racing,boat that your opponent can. The narrowest of ...
3,Q7721521,The Case of the Late Pig,Plot summary,The Case of the Late Pig Plot summary As Lugg ...
4,Q56121171,Stradivarius (horse),2018: four-year-old season,"place in the straight, caught Torcedor inside ..."


In [10]:
len(docs), df.shape

(50000, (50000, 4))

# Initialize Pinecone Index

The Pinecone index stores vector representations of our historical passages which we can retrieve later using another vector (query vector). To build our vector index, we must first establish a connection with Pinecone. For this, we need an API from Pinecone. You can get one for free from [here](https://app.pinecone.io/), and after that, we initialize the connection as follows:

In [11]:
!pip uninstall -y pinecone-client pinecone -q
!pip install "pinecone[grpc]" -U -q  # Latest v5 with gRPC

[0m[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-pinecone 0.2.13 requires pinecone[asyncio]<8.0.0,>=6.0.0, but you have pinecone 8.1.0 which is incompatible.[0m[31m
[0m

In [12]:
from google.colab import userdata
import os
from pinecone import Pinecone
from pinecone import ServerlessSpec

# initialize connection to pinecone (get API key at app.pinecone.io)
pinecone_api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'


Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

In [13]:
spec = ServerlessSpec(
    cloud="aws", region="us-east-1"
)

# connect to pinecone environment
pc = Pinecone(api_key=pinecone_api_key, environment=spec.region)

Now we create a new index. We will name it "abstractive-question-answering" — you can name it anything we want. We specify the metric type as "cosine" and dimension as 768 because the retriever we use to generate context embeddings is optimized for cosine similarity and outputs 768-dimension vectors.

In [14]:
from pinecone import Pinecone, ServerlessSpec
from google.colab import userdata
import time

index_name = "abs-qa-ben-v1"
DIMENSION = 384
METRIC = "cosine"

PINECONE_API_KEY = userdata.get("PINECONE_API_KEY")
pc = Pinecone(api_key=PINECONE_API_KEY)

existing = {idx["name"] for idx in pc.list_indexes()}
if index_name not in existing:
    pc.create_index(
        name=index_name,
        dimension=DIMENSION,
        metric=METRIC,
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)

index = pc.Index(index_name)

# verify empty (no delete needed)
stats = index.describe_index_stats()
print(stats)

assert stats.get("total_vector_count", 0) == 0, "Index is not empty (total_vector_count != 0)"

{'_response_info': {'raw_headers': {'connection': 'keep-alive',
                                    'content-length': '150',
                                    'content-type': 'application/json',
                                    'date': 'Wed, 25 Feb 2026 15:16:58 GMT',
                                    'grpc-status': '0',
                                    'server': 'envoy',
                                    'x-envoy-upstream-service-time': '44',
                                    'x-pinecone-request-latency-ms': '43',
                                    'x-pinecone-response-duration-ms': '45'}},
 'dimension': 384,
 'index_fullness': 0.0,
 'memoryFullness': 0.0,
 'metric': 'cosine',
 'namespaces': {},
 'storageFullness': 0.0,
 'total_vector_count': 0,
 'vector_type': 'dense'}


# Initialize Retriever

Next, we need to initialize our retriever. The retriever will mainly do two things:

- Generate embeddings for all historical passages (context vectors/embeddings)
- Generate embeddings for our questions (query vector/embedding)

The retriever will create embeddings such that the questions and passages that hold the answers to our queries are close to one another in the vector space. We will use a SentenceTransformer model based on Microsoft's MPNet as our retriever. This model performs quite well for comparing the similarity between queries and documents. We can use Cosine Similarity to compute the similarity between query and context vectors generated by this model (Pinecone automatically does this for us).

In [15]:
# --- Fill the None placeholders only (keep the rest of your lab code as-is) ---

from sentence_transformers import SentenceTransformer

# the lab usually has something like:
# embedder = None
# retriever = None

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim, matches your Pinecone index

def retriever(query: str, top_k: int = 3):
    # This is the missing "retrieval" piece:
    # 1) embed the query
    q_vec = embedder.encode(query, normalize_embeddings=True).tolist()

    # 2) query Pinecone
    res = index.query(
        vector=q_vec,
        top_k=top_k,
        include_metadata=True,
    )

    # 3) return what the rest of the lab expects: the retrieved passages
    # If your later cells expect strings, return strings.
    return [m["metadata"]["passage"] for m in res.get("matches", []) if m.get("metadata")]

Loading weights:   0%|          | 0/103 [00:00<?, ?it/s]

[1mBertModel LOAD REPORT[0m from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

[3mNotes:
- UNEXPECTED[3m	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.[0m


In [16]:
retriever("What is the capital of Australia?", top_k=1)[:1] # alignment check

[]

# Generate Embeddings and Upsert

Next, we need to generate embeddings for the context passages. We will do this in batches to help us more quickly generate embeddings and upload them to the Pinecone index. When passing the documents to Pinecone, we need an id (a unique value), context embedding, and metadata for each document representing context passages in the dataset. The metadata is a dictionary containing data relevant to our embeddings, such as the article title, section title, passage text, etc.

In [18]:
from tqdm.auto import tqdm

batch_size = 64  # None -> 64 (good default for Colab; raise/lower if needed)

for i in tqdm(range(0, len(docs), batch_size)):
    batch = docs[i : i + batch_size]

    # None -> the text we embed
    texts = [x["passage"] for x in batch]

    # None -> embeddings for the batch
    vectors = embedder.encode(
        texts,
        batch_size=batch_size,
        show_progress_bar=False,
        normalize_embeddings=True,
    ).tolist()

    # None -> format records for Pinecone upsert
    to_upsert = []
    for j, x in enumerate(batch):
        vec_id = str(x["wiki_id"])  # must be a string
        metadata = {
            "wiki_id": x["wiki_id"],
            "article": x["article"],
            "section": x["section"],
            "passage": x["passage"],
        }
        to_upsert.append((vec_id, vectors[j], metadata))

    # None -> upsert into the index
    index.upsert(vectors=to_upsert)

  0%|          | 0/782 [00:00<?, ?it/s]

# Initialize Generator

We will use ELI5 BART for the generator which is a Sequence-To-Sequence model trained using the ‘Explain Like I’m 5’ (ELI5) dataset. Sequence-To-Sequence models can take a text sequence as input and produce a different text sequence as output.

The input to the ELI5 BART model is a single string which is a concatenation of the query and the relevant documents providing the context for the answer. The documents are separated by a special token &lt;P>, so the input string will look as follows:

>question: What is a sonic boom? context: &lt;P> A sonic boom is a sound associated with shock waves created when an object travels through the air faster than the speed of sound. &lt;P> Sonic booms generate enormous amounts of sound energy, sounding similar to an explosion or a thunderclap to the human ear. &lt;P> Sonic booms due to large supersonic aircraft can be particularly loud and startling, tend to awaken people, and may cause minor damage to some structures. This led to prohibition of routine supersonic flight overland.

More detail on how the ELI5 dataset was built is available [here](https://arxiv.org/abs/1907.09190) and how ELI5 BART model was trained is available [here](https://yjernite.github.io/lfqa.html).

Let's initialize the BART model using transformers.

In [20]:
from transformers import BartTokenizer, BartForConditionalGeneration

import torch
device = "cuda" if torch.cuda.is_available() else "cpu"

# load bart tokenizer and model from huggingface
tokenizer = BartTokenizer.from_pretrained('vblagoje/bart_lfqa')
generator = BartForConditionalGeneration.from_pretrained('vblagoje/bart_lfqa').to(device)

Loading weights:   0%|          | 0/512 [00:00<?, ?it/s]

All the components of our abstract QA system are complete and ready to be queried. But first, let's write some helper functions to retrieve context passages from Pinecone index and to format the query in the way the generator expects the input.

In [27]:
def query_pinecone(query, top_k=3):
    # 1) embed the query (must match your index dimension)
    xq = embedder.encode(query, normalize_embeddings=True).tolist()

    result = index.query(vector=xq, top_k=top_k, include_metadata=True)
    return result

    # 2) query pinecone for the most similar vectors
    res = index.query(
        vector=xq,
        top_k=top_k,
        include_metadata=True
    )

    # 3) return a clean list the rest of the lab can use
    matches = []
    for m in res.get("matches", []):
        md = m.get("metadata", {}) or {}
        matches.append({
            "score": m.get("score"),
            "wiki_id": md.get("wiki_id"),
            "article": md.get("article"),
            "section": md.get("section"),
            "passage": md.get("passage"),
        })

    return matches

In [28]:
def format_query(query, context):
    # context: list of retrieved docs (dicts)
    content = " ".join([c["passage"] for c in context if c.get("passage")])
    return f"question: {query} context: {content}"

Let's test the helper functions. We will query the Pinecone index function we created earlier with the `query_pinecone` to get context passages and pass them to the `format_query` function.

In [29]:
query = "when was the first electric power system built?"
result = query_pinecone(query, top_k=1)
result

QueryResponse(matches=[{'id': 'Q2388981',
 'metadata': {'article': 'Electric power system',
              'passage': 'Electric power system History In 1881, two '
                         "electricians built the world's first power system at "
                         'Godalming in England. It was powered by two '
                         'waterwheels and produced an alternating current that '
                         'in turn supplied seven Siemens arc lamps at 250 '
                         'volts and 34 incandescent lamps at 40 volts. '
                         'However, supply to the lamps was intermittent and in '
                         '1882 Thomas Edison and his company, The Edison '
                         'Electric Light Company, developed the first '
                         'steam-powered electric power station on Pearl Street '
                         'in New York City. The Pearl Street Station initially '
                         'powered around 3,000 lamps for 59 cust

In [30]:
from pprint import pprint

In [31]:
# format the query in the form generator expects the input
query = format_query(query, result['matches'])
pprint(query)

'question: when was the first electric power system built? context: '


The output looks great. Now let's write a function to generate answers.

In [32]:
def generate_answer(query):
    # tokenize the query to get input_ids
    inputs = tokenizer([query], max_length=1024, return_tensors="pt").to(device)
    # use generator to predict output ids
    ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
    # use tokenizer to decode the output ids
    answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    return pprint(answer)

In [33]:
generate_answer(query)

('Electricity was first used in the 19th century. The first electric power '
 'system built was a steam engine powered by steam.')


As we can see, the generator used the provided context to answer our question. Let's run some more queries.

In [34]:
query = "How was the first wireless message sent?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

('The first wireless message was sent by the first radio transmitter. The '
 'first radio transmitter was the first radio receiver.')


To confirm that this answer is correct, we can check the contexts used to generate the answer.

In [36]:
for doc in context["matches"]:
    print(doc["metadata"]["passage"], end="\n---\n")

microbrowser and internet. It decodes the encoded WAP requests from the microbrowser and send the HTTP requests to the internet or to a local application server. It also encodes the WML and HDML data returning from the web for transmission to the microbrowser in the handset.
---
transmitter was moved from El Paso Peak to a new location adjacent to that of KTPI-FM. The format changed to adult hits as "Bob FM" shortly thereafter, and the call letters were changed to KGBB.
---
WJTI History The 1460 kHz frequency signed on the air in 1950 with the WRAC call sign. The owner of the station purchased another Racine station, WRJN-FM in 1969, changing it to WRAC-FM. A year later, the FM station flipped to a rock-leaning top 40 format as WRKR, and WRAC later adopted that call sign, simulcasting their FM sister station.
It was also for a brief time WWEG ("The Country Egg") before returning to WRKR and again simulcasting the FM signal. Later, there was a short lived Spanish format.
The station swi

In this case, the answer looks correct. If we ask a question and no relevant contexts are retrieved, the generator will typically return nonsensical or false answers, like with this question about COVID-19:

In [37]:
query = "where did COVID-19 originate?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

('COVID-19 is a name for COVID-19. COVID-19 is the acronym for COVID-19. '
 'COVID-19 is the acronym for COVID-19.')


In [39]:
for doc in context["matches"]:
    print(doc["metadata"]["passage"], end='\n---\n')

found in Taiwan, Japan, Malaysia, India, and China.
---
Mokola lyssavirus Classification Mokola virus (MOKV) is a member of the genus Lyssavirus, which belongs to the family Rhabdoviridae. MOKV is one of four lyssaviruses found in Africa. The other three viruses are the classic rabies virus, Duvenhage virus and Lagos bat virus. Emergence MOKV was first isolated in Nigeria, in 1968, from three shrews (Crocidura species) found in the Mokola forest, Ibadan, Oyo State, Nigeria. The virus was shown to be morphologically and serologically related to rabies virus. Since the initial isolation of MOKV, the virus has been mainly isolated in domestic cats and small mammals in sub-Saharan Africa.
There
---
which blood was not routinely screened for this virus. He became an active worker for the Canadian Hemophilia Society and campaigned for transfusion safety ever since getting infected, but developed AIDS, of which he died in 1993.
---
Sangassou orthohantavirus Genome The virus genome consists of

Let’s finish with a final few questions.

In [40]:
query = "what was the war of currents?"
context = query_pinecone(query, top_k=5)
query = format_query(query, context["matches"])
generate_answer(query)

"The war of currents is a bit of a stretch, but it's a good one to start with."


In [41]:
query = "who was the first person on the moon?"
context = query_pinecone(query, top_k=3)
query = format_query(query, context["matches"])
generate_answer(query)

('The first man to walk on the moon was Neil Armstrong. He was the first man '
 'to walk on the moon.')


In [42]:
query = "what was NASAs most expensive project?"
context = query_pinecone(query, top_k=3)
query = format_query(query, context["matches"])
generate_answer(query)

("I don't know if this counts, but I'm curious about what the most expensive "
 'project was.')


As we can see, the model can generate some decent answers.

#### Add a few more questions

In [44]:
query = "How does a microwave oven heat food differently than a normal oven?"
context = query_pinecone(query, top_k=2)
query = format_query(query, context["matches"])
generate_answer(query)

('Microwave ovens heat food differently than a normal oven. Microwave ovens '
 'heat food differently than a normal oven. Microwave ovens heat food '
 'differently than a normal oven.')


In [45]:
query = "Why do some animals use camouflage while others use bright warning colors?"
context = query_pinecone(query, top_k=2)
query = format_query(query, context["matches"])
generate_answer(query)

("Some animals use camouflage because it's easier to hide in the dark. Some "
 "animals use camouflage because it's easier to hide in the dark. Some animals "
 "use camouflage because it's easier to hide in")


In [46]:
query = "Who was it that invented the story of santa claws"
context = query_pinecone(query, top_k=2)
query = format_query(query, context["matches"])
generate_answer(query)

('The story of Santa claws is a fairly recent invention. It was first recorded '
 'in the 18th century.')


In [47]:
query = "How many breeds of dag are there in europe?"
context = query_pinecone(query, top_k=2)
query = format_query(query, context["matches"])
generate_answer(query)

('There are a few breeds of dag in Europe. Here is a list of the breeds in the '
 'UK.')


In [48]:
query = "Which nations in the world has the highest standards of education"
context = query_pinecone(query, top_k=2)
query = format_query(query, context["matches"])
generate_answer(query)

("I'm not sure about the US, but I'm pretty sure the UK has the highest "
 'standards of education in the world.')
