## Purpose

NOTE: This notebook needs to be run on a Google Colaboratory A100 GPU or similar spec, as DeepSeek 14B Distilled in half precision takes significant memory.

The purpose of this notebook is to generate synthetic question-answer (QA) pairs from the existing scraped Citizen Information knowledge base, producing a dataset for fine-tuning the Mistral 7B Instruct model.

The DeepSeek 14B Distilled model is used to generate the QA pairs.

The notebook also explores validation criteria for determining whether a valid QA pair has been generated from the input prompt.

Random chunks of Citizen Information documents were provided as context to the model with the task of generating a QA pair from the provided context.

A mix of prompts was used to add diversity to the types of question-answer pairs generated:

- 60% basic QA pairs
- 30% intermediate QA pairs
- 10% advanced QA pairs

## Install Additional Packages

In [None]:
!pip install -q transformers accelerate langchain_huggingface
!pip install -q langchain_community
!pip install -q "unstructured[all-docs]"
!pip install -q nltk

## Imports

In [None]:
from tqdm import tqdm
from langchain.docstore.document import Document as LangchainDocument
from typing import Optional, List, Tuple
from google.colab import drive, runtime
from datetime import datetime
import json
import os
from langchain_community.document_loaders import (
    DirectoryLoader,
    UnstructuredMarkdownLoader,
)
from langchain.docstore.document import Document
from langchain import PromptTemplate, LLMChain
from langchain_huggingface.llms import HuggingFacePipeline

from langchain.text_splitter import RecursiveCharacterTextSplitter
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from typing import Optional, List, Tuple, Any, Union, Pattern, Dict, Callable
from sentence_transformers import SentenceTransformer, util
from collections import Counter
import nltk
from nltk.corpus import stopwords
import torch
import random
import re

In [None]:
nltk.download("stopwords")

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
# Clone down my git repo where Citizen Information scraped documents are stored
!git clone https://github.com/JohnDennehy101/webScraperCitizensInformation.git

Cloning into 'webScraperCitizensInformation'...
remote: Enumerating objects: 41164, done.
remote: Counting objects: 100% (53/53), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 41164 (delta 27), reused 37 (delta 21), pack-reused 41111 (from 2)
Receiving objects: 100% (41164/41164), 59.75 MiB | 18.52 MiB/s, done.
Resolving deltas: 100% (6051/6051), done.
Updating files: 100% (43422/43422), done.


In [None]:
MARKDOWN_SEPARATORS = [
    "\n#{1,6} ",
    "```\n",
    "\n\\*\\*\\*+\n",
    "\n---+\n",
    "\n___+\n",
    "\n\n",
    "\n",
    " ",
    "",
]

def split_documents(
    chunk_size: int,
    knowledge_base: List[LangchainDocument],
    tokenizer_name: Optional[str] = "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
) -> List[LangchainDocument]:
    """
    Split documents into chunks of maximum size `chunk_size` tokens and return a list of documents.

    Args:
        chunk_size (int): determines what size each chunk should be
        knowledge_base (List[LangchainDocument]): list of langchain documents that are to be split into chunks
        tokenizer_name (Optional[str]): What tokenizer model should be used
    
    Returns:
        List[LangchainDocument]: Returns chunked documents
    """

    # Use the RecursiveCharacterTextSplitter to split the data into chunks based on parameter values
    # Note the overlap is set to 10% to minimise chance of context loss between chunks
    text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
        AutoTokenizer.from_pretrained(tokenizer_name),
        chunk_size=chunk_size,
        chunk_overlap=int(chunk_size / 10),
        add_start_index=True,
        strip_whitespace=True,
        separators=MARKDOWN_SEPARATORS,
    )

    docs_processed = []
    for doc in knowledge_base:
        # Loop over documents and append chunk to docs_processed variable
        docs_processed += text_splitter.split_documents([doc])

    # Remove duplicates
    unique_texts = {}
    docs_processed_unique = []

    # Check to ensure that no duplicates are included
    for doc in docs_processed:
        if doc.page_content not in unique_texts:
            unique_texts[doc.page_content] = True
            docs_processed_unique.append(doc)

    return docs_processed_unique
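
The de-duplication step at the end of `split_documents` keeps the first occurrence of each chunk text and preserves the original order. A minimal, self-contained sketch of the same idea, using plain strings in place of LangChain documents (the `dedupe_preserving_order` helper and the sample strings are hypothetical, for illustration only):

```python
from typing import Iterable, List


def dedupe_preserving_order(texts: Iterable[str]) -> List[str]:
    """Drop exact duplicates, keeping first occurrences in their original order."""
    seen = set()
    unique = []
    for text in texts:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique


# Made-up chunk texts; the first and third are identical
chunks = ["Apply online.", "Contact your local office.", "Apply online."]
unique_chunks = dedupe_preserving_order(chunks)
print(unique_chunks)
```

Only exact matches are removed here, so two chunks that differ by a single character would both be kept.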

In [None]:
def check_file_exists(file_path: str) -> bool:
  """
  This function checks if provided file path exists

  Args:
    file_path (str): file path which should be checked for file existence
  
  Returns:
    bool: whether the file exists or not
  """

  # Uses os package isfile method to determine if a file exists at the provided file path
  # If so, return True, else return False
  if os.path.isfile(file_path):
    return True

  return False

In [None]:
def write_json_file(output_file_path: str, content: Any) -> bool:
  """
  Writes content to json file at provided output path

  Args:
    output_file_path (str): File path at which file should be generated
    content (Any): The content to be written within the file
  
  Returns:
    bool: indicates if writing json was successful or not
  """

  # Get directory name from output file path parameter value
  output_directory = os.path.dirname(output_file_path)

  # If the directory does not exist, create it
  if output_directory and not os.path.exists(output_directory):
    os.makedirs(output_directory)

  # Try writing content to the file - return True if all successful, otherwise catch the error, log and return False
  try:
    with open(output_file_path, "w") as output_file:
      json.dump(content, output_file, indent=4)

    print("Successfully saved content to {}".format(output_file_path))

    return True
  except (OSError, IOError) as e:
    print("Error saving content to {}: {}".format(output_file_path, e))
    return False

In [None]:
def read_json_file(input_file_path: str, default: Any = []) -> Any:
  """
  Reads content from json file at provided input file path

  Args:
    input_file_path (str): the file path where the target file resides
    default (Any): the default structure of the expected file (to be returned in case of error to avoid consuming errors)
  
  Returns:
    Any: the content read from the file
  """

  # Call utility function to check if a file actually exists at the provided file path parameter
  # If it does not exist, return the default data structure
  if not check_file_exists(input_file_path):
    return default

  # Try open the file and read the contents
  # If successful, return the contents. If not successful, return the default data structure
  try:
    with open(input_file_path, "r", encoding="utf-8") as input_file:
      content = json.load(input_file)

    print("Successfully loaded content from {} file".format(input_file_path))
    return content

  except (OSError, IOError, json.JSONDecodeError) as e:
    print("Error reading from file path: {} ({})".format(input_file_path, e))
    return default

In [None]:
def extract_file_name_from_metadata(metadata) -> str:
  """
  Function that extracts file name from metadata
  """

  # Try extract source property from metadata, otherwise default to empty string
  source_path = metadata.get("source", "")

  # If markdown in the path, extract file name, and replace markdown with html ending
  if "/markdown/" in source_path:
    file_name = source_path.split("/markdown/")[-1]

    file_name = file_name.replace(".md", ".html")

    # Return modified file name
    return file_name

  # If no match, return empty string
  return ""
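
As a quick illustration of the path mapping above, the sketch below re-implements the same logic in a standalone form and runs it on a made-up source path. The `extract_file_name` name and both example paths are hypothetical.

```python
def extract_file_name(metadata: dict) -> str:
    """Map a markdown source path to the matching .html file name, or "" if no match."""
    source_path = metadata.get("source", "")
    if "/markdown/" in source_path:
        # Keep everything after the last "/markdown/" segment and swap the extension
        return source_path.split("/markdown/")[-1].replace(".md", ".html")
    return ""


# Hypothetical paths: one inside a markdown directory, one outside it
html_name = extract_file_name({"source": "/content/repo/src/data/markdown/en_example_page.md"})
missing = extract_file_name({"source": "/tmp/other/en_example_page.md"})
print(repr(html_name), repr(missing))
```

Note that `str.replace` rewrites every occurrence of `.md`, so a file name containing `.md` somewhere other than the extension would also be altered.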

In [None]:
def find_metadata_by_file_name(metadata_list, file_name):
  """
  Search for additional metadata by file name
  """

  # Loop over metadata list if match found between value in "fileName" property and passed file_name parameter value, return metadata dict
  for entry in metadata_list:
    if entry.get("fileName") == file_name:
      return entry

  # If no match found, return None
  return None

In [None]:
# Initialise target paths within git repo
directory_path = "/content/webScraperCitizensInformation/src/data/markdown"
metadata_path = "/content/webScraperCitizensInformation/src/data/metadata/file_metadata.json"

# Read provided metadata file which was generated during web-scraping
metadata_file_contents = read_json_file(metadata_path)

# Load markdown files which store the scraped content
loader = DirectoryLoader(
    path=directory_path,
    glob="*.md",
    loader_cls=UnstructuredMarkdownLoader,
    recursive=True
)

# Load documents
documents = loader.load()

# Filter documents to only English language ones (as core LLMs don't currently support Irish)
filtered_documents = [
    doc for doc in documents if not os.path.basename(doc.metadata['source']).startswith('ga_')
]

if filtered_documents:
    print("{} documents found".format(len(filtered_documents)))
else:
    print("No documents found.")

Successfully loaded content from /content/webScraperCitizensInformation/src/data/metadata/file_metadata.json file
1924 documents found


In [None]:
RAW_KNOWLEDGE_BASE = []

# Loop over documents, append metadata for file if relevant
for doc in tqdm(filtered_documents):
  doc_dict = dict(doc)
  file_name = extract_file_name_from_metadata(doc_dict.get("metadata"))
  file_manual_metadata = find_metadata_by_file_name(metadata_file_contents, file_name)

  combined_metadata = {**doc_dict.get("metadata", {})}

  if file_manual_metadata:
    combined_metadata.update(file_manual_metadata)

  enriched_document = Document(page_content=doc_dict.get("page_content", ""), metadata=combined_metadata)

  RAW_KNOWLEDGE_BASE.append(enriched_document)

100%|██████████| 1924/1924 [00:00<00:00, 11037.31it/s]


In [None]:
# Split documents into chunks
docs_processed = split_documents(
    512,
    RAW_KNOWLEDGE_BASE
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
print(docs_processed[0].metadata)

print(len(docs_processed))

{'source': '/content/webScraperCitizensInformation/src/data/markdown/en_money_and_tax_tax_income_tax_taxation_of_benefits_from_employment.md', 'url': 'https://www.citizensinformation.ie/en/money-and-tax/tax/income-tax/taxation-of-benefits-from-employment/', 'fileName': 'en_money_and_tax_tax_income_tax_taxation_of_benefits_from_employment.html', 'path': 'data/html/en_money_and_tax_tax_income_tax_taxation_of_benefits_from_employment.html', 'created': '2024-09-15T17:57:16.931100', 'lastUpdated': '2024-09-28T14:11:53.332077', 'linkPageSources': ['en_employment_employment_rights_and_conditions_contracts_of_employment_contract_of_employment.html', 'en_money_and_tax_tax_income_tax_taxation_of_benefits_from_employment.html'], 'start_index': 0}
6302


In [None]:
# DeepSeek's distilled model was used for this synthetic dataset generation
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"

In [None]:
# Initialise tokeniser
tokeniser = AutoTokenizer.from_pretrained(model_id)


In [None]:
# Initialise model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")


In [None]:
# 3 prompts used for synthetic dataset generation based on scraped documents information
# Basic prompt: simple question answer pairs
# Intermediate prompt: More thought, critical reasoning in the question-answer pair
# Advanced prompt: Longer, analytical question answer pair

basic_prompt = PromptTemplate(input_variables=["chunk_text"],
                              template = """You are an expert educator. Carefully read the text and do the following:
                                1. Write **one simple, clear question** about the text that helps understanding.
                                2. Provide a **clear and straightforward** answer to that question.

                                Make sure your response includes both the question and the answer, clearly labelled.

                                Text:
                                \"\"\"\
                                {chunk_text}
                                \"\"\"\

                                Respond in this exact format:

                                **Question**:
                                <your question here>

                                **Answer**:
                                <your answer here>""")

intermediate_prompt = PromptTemplate(input_variables=["chunk_text"],
                              template = """You are an expert educator. Carefully read the text and do the following:
                                1. Write **one thoughtful, critical thinking question** about the text that helps understanding.
                                2. Provide a **detailed, well-reasoned** answer to that question.

                                Make sure your response includes both the question and the answer, clearly labelled.

                                Text:
                                \"\"\"\
                                {chunk_text}
                                \"\"\"\

                                Respond in this exact format:

                                **Question**:
                                <your question here>

                                **Answer**:
                                <your answer here>""")

advanced_prompt = PromptTemplate(input_variables=["chunk_text"],
                              template = """You are an expert educator. Carefully read the text and do the following:
                                1. Write **one deep, analytical, challenging question** about the text that helps understanding.
                                2. Provide a **comprehensive, nuanced** answer to that question.

                                Make sure your response includes both the question and the answer, clearly labelled.

                                Text:
                                \"\"\"\
                                {chunk_text}
                                \"\"\"\

                                Respond in this exact format:

                                **Question**:
                                <your question here>

                                **Answer**:
                                <your answer here>""")

In [None]:
# Set prompts to list so that weighted prompting can be used
prompts = [basic_prompt, intermediate_prompt, advanced_prompt]

In [None]:
# Initialise corresponding weights
# Note: 60% on average should be basic prompts, 30% intermediate, 10% advanced
weights = [0.6, 0.3, 0.1]

In [None]:
def get_weighted_random_prompt():
  """
  Simple function to return a random prompt (using the weights to guide)
  """
  
  return random.choices(prompts, weights=weights, k=1)[0]
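
To see that `random.choices` with these weights yields roughly the intended 60/30/10 split, the sketch below draws many samples and tallies the shares. The string labels stand in for the three `PromptTemplate` objects, and the seed is an assumption added only so the illustration is repeatable.

```python
import random
from collections import Counter

labels = ["basic", "intermediate", "advanced"]  # stand-ins for the three prompt templates
weights = [0.6, 0.3, 0.1]

rng = random.Random(42)  # seeded for repeatability; the notebook itself is unseeded
draws = rng.choices(labels, weights=weights, k=10_000)

# Share of each label among all draws
shares = {label: count / len(draws) for label, count in Counter(draws).items()}
print(shares)
```

The split is only approximate for any finite run, so a dataset of 1600 pairs will not contain exactly 960, 480 and 160 of each type.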

In [None]:
# Initialise hugging face pipeline with DeepSeek model and tokeniser
# Note hyperparameter configuration: low temp, high top P for highly deterministic, factual responses
pipe = pipeline("text-generation", model=model, tokenizer=tokeniser, max_new_tokens=1024, temperature=0.1, top_p=0.9)

Device set to use cuda:0


In [None]:
# Wrap hugging face pipeline with LangChain wrapper
llm = HuggingFacePipeline(pipeline=pipe)

In [None]:
def generate_question_answer_from_chunk(chunk_text, metadata):
  """
  Based on provided chunk text, use DeepSeek model to generate question-answer pair
  """

  # Extract prompt
  prompt = get_weighted_random_prompt()

  # Invoke model inference, pass model and prompt
  qa_chain = LLMChain(llm=llm, prompt=prompt)

  # Extract response from inference
  response = qa_chain.run({"chunk_text": chunk_text})

  # Regex patterns used to extract the labelled question and answer from the response
  pattern_answer = r"\*\*Answer\*\*:\s*(.+?)(?=(?:\*\*Question\*\*|$))"
  pattern_question = r"\*\*Question\*\*:\s*(.+?)(?=(?:\*\*Answer\*\*|$))"

  # If question and answer words present in response, assumption is that output is as expected
  if "Question" in response and "Answer" in response:

    # Regex searches using earlier defined regex patterns for target question and answer properties
    question_matches = re.findall(pattern_question, response, flags=re.DOTALL)
    answer_matches = re.findall(pattern_answer, response, flags=re.DOTALL)

    # If match for question regex, extract last instance (otherwise None, so the check below is safe)
    last_question = question_matches[-1].strip() if question_matches else None

    # If match for answer regex, extract last instance (otherwise None, so the check below is safe)
    last_answer = answer_matches[-1].strip() if answer_matches else None

    # If both question and answer extracted, return valid response
    if last_question and last_answer:
      return {"chunk_text": chunk_text, "raw_answer": response, "question": last_question, "answer": last_answer, "prompt": prompt.template, "metadata": metadata}

    # Otherwise, return None to indicate unsuccessful extraction of values from generated output
    return None
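
The extraction step keeps the last labelled question and answer in the response. A plausible reason, stated here as an assumption, is that the raw text can contain more than one labelled block, for example when the format instructions are echoed before the final pair. The sketch below runs the same two patterns on a made-up response to show the behaviour.

```python
import re

PATTERN_QUESTION = r"\*\*Question\*\*:\s*(.+?)(?=(?:\*\*Answer\*\*|$))"
PATTERN_ANSWER = r"\*\*Answer\*\*:\s*(.+?)(?=(?:\*\*Question\*\*|$))"

# Hypothetical model output: echoed format instructions followed by the real pair
response = (
    "Respond in this exact format:\n\n"
    "**Question**:\n<your question here>\n\n"
    "**Answer**:\n<your answer here>\n\n"
    "**Question**:\nWhat colour is the example form?\n\n"
    "**Answer**:\nThe example form is green.\n"
)

questions = re.findall(PATTERN_QUESTION, response, flags=re.DOTALL)
answers = re.findall(PATTERN_ANSWER, response, flags=re.DOTALL)

# Keep the last match of each, mirroring the notebook's choice
last_question = questions[-1].strip() if questions else None
last_answer = answers[-1].strip() if answers else None
print(last_question, "|", last_answer)
```

Taking the first match instead would return the placeholder text `<your question here>` in this example.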

In [None]:
# Initialise sentence transformer model
sentence_transformer_model = SentenceTransformer("all-MiniLM-L6-v2")


In [None]:
# Download stop words list which are used in validation
stop_words = set(stopwords.words("english"))

In [None]:
def token_overlap_ratio(answer, chunk):
  """
  Measure the overlap between the generated answer and the original chunk text
  """

  # Use set to avoid benefitting repetitive responses
  answer_tokens = set(answer.lower().split())
  chunk_tokens = set(chunk.lower().split())

  # Calculate overlap value
  overlap = answer_tokens.intersection(chunk_tokens)

  # Overlap formula calculation as percentage of overall output
  return len(overlap) / len(answer_tokens) if answer_tokens else 0
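
A small worked example of the overlap ratio, using a standalone copy of the function and made-up strings. It also shows one consequence of whitespace tokenisation: a token with attached punctuation such as `online,` does not match `online`.

```python
def token_overlap_ratio(answer: str, chunk: str) -> float:
    """Fraction of unique answer tokens that also appear in the chunk."""
    answer_tokens = set(answer.lower().split())
    chunk_tokens = set(chunk.lower().split())
    overlap = answer_tokens & chunk_tokens
    return len(overlap) / len(answer_tokens) if answer_tokens else 0


chunk = "You can apply online or by post"  # hypothetical chunk text

# Both answer tokens appear in the chunk
full = token_overlap_ratio("apply online", chunk)

# Tokens are {"apply", "online,", "today"}; only "apply" matches
partial = token_overlap_ratio("Apply online, today", chunk)
print(full, partial)
```

Stripping punctuation before splitting would be one way to make the metric less sensitive to this, although the notebook does not do so.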

In [None]:
def semantic_similarity(answer, chunk):
  """
  Measure semantic similarity between the generated answer and the original chunk text
  """

  # Use sentence transformers model to generate embeddings for both generated answer and original chunk text
  embedding_answer = sentence_transformer_model.encode(answer, convert_to_tensor=True)
  embedding_chunk = sentence_transformer_model.encode(chunk, convert_to_tensor=True)

  # Return cosine similarity of the two embeddings
  return util.pytorch_cos_sim(embedding_answer, embedding_chunk).item()
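
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it in plain Python on toy two-dimensional vectors; it illustrates the formula only and is not how `sentence_transformers` implements it. The `cosine_similarity` helper is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score (approximately) 1, perpendicular vectors score 0
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because the score depends on direction rather than length, a short answer and a long chunk can still score highly if their embeddings point the same way.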

In [None]:
def keyword_presence(answer, chunk, top_k = 10):
  """
  Measure presence of most frequent words in chunk in generated output
  Top k is top number of words with which to conduct search
  """

  # Filter chunk tokens - removal of stop words and special characters
  chunk_tokens = [token for token in chunk.lower().split() if token.isalpha() and token not in stop_words]

  # Initialise counter to enable easy extraction of top frequency words
  frequency = Counter(chunk_tokens)

  # Extract key words from counter using frequency
  key_words = [token for token, count in frequency.most_common(top_k)]
  
  # Filter answer tokens - exclude special characters
  answer_tokens = set(token for token in answer.lower().split() if token.isalpha())

  # Check if most frequent words in chunk present in generated answer text
  present = [word in answer_tokens for word in key_words]

  # Return percentage of key words present
  return sum(present) / len(key_words) if key_words else 0
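
A worked example of the keyword check, using a standalone copy of the function. The tiny `STOP_WORDS` set is an assumption standing in for the NLTK English list, and the strings are made up.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "to", "is", "for", "and"}  # small stand-in for NLTK's list


def keyword_presence(answer: str, chunk: str, top_k: int = 10) -> float:
    """Share of the chunk's top_k most frequent non-stop words found in the answer."""
    chunk_tokens = [t for t in chunk.lower().split() if t.isalpha() and t not in STOP_WORDS]
    key_words = [t for t, _ in Counter(chunk_tokens).most_common(top_k)]
    answer_tokens = {t for t in answer.lower().split() if t.isalpha()}
    present = [w in answer_tokens for w in key_words]
    return sum(present) / len(key_words) if key_words else 0


chunk = "the payment is made weekly and the payment is taxable"

# Top 2 key words are "payment" (count 2) and "made" (first of the count-1 ties);
# the answer contains "payment" but not "made"
score = keyword_presence("the payment is taxable", chunk, top_k=2)
print(score)
```

Since `isalpha()` is false for tokens such as `payment.` or `2024`, words followed by punctuation and all numbers are excluded from both sides of the comparison.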

In [None]:
def repetition_in_answer(answer, n_gram_size = 3, threshold = 2):
  """
  Determine how repetitive generated answer is
  """

  # Lowercase all answer output tokens
  tokens = answer.lower().split()

  if len(tokens) < n_gram_size:
    return 0

  # Generate n grams for generated output
  n_grams = [" ".join(tokens[i:i+n_gram_size]) for i in range(len(tokens) - n_gram_size + 1)]

  # Utilise counter to enable easy frequency analysis
  counts = Counter(n_grams)

  # Get most common n-gram
  max_count = max(counts.values())

  # Calculate penalty score based on how many times this most frequent item appears, divide by length of n-grams to normalise
  penalty_score = (max_count - 1) / max(len(n_grams), 1)

  # Return penalty score
  return penalty_score
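
To get a feel for the scale of this penalty, the sketch below applies the same formula to a looping string and a varied one. The `repetition_penalty` name is hypothetical and the unused `threshold` parameter is omitted.

```python
from collections import Counter


def repetition_penalty(answer: str, n_gram_size: int = 3) -> float:
    """(count of most frequent n-gram - 1) divided by the total number of n-grams."""
    tokens = answer.lower().split()
    if len(tokens) < n_gram_size:
        return 0.0
    n_grams = [
        " ".join(tokens[i:i + n_gram_size])
        for i in range(len(tokens) - n_gram_size + 1)
    ]
    max_count = max(Counter(n_grams).values())
    return (max_count - 1) / len(n_grams)


# 9 tokens give 7 trigrams; "a b c" occurs 3 times, so the penalty is 2 / 7
looping = repetition_penalty("a b c a b c a b c")

# 5 tokens give 3 trigrams, all different, so the penalty is 0
varied = repetition_penalty("one two three four five")
print(looping, varied)
```

The looping example scores about 0.29, which is still under the default `repetition_threshold` of 0.3 used in `is_answer_valid`, so that threshold only rejects very heavily repeated text.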

In [None]:
def is_answer_valid(answer, chunk, overlap_threshold=0.2, semantic_threshold = 0.4, keyword_threshold = 0.1, repetition_threshold = 0.3, combine_method="weighted") -> bool:
  """
  Function to determine if DeepSeek generated answer output for given Citizen Information data chunk is valid
  By using a mix of overlap threshold, semantic similarity, key word presence, repetition check
  """

  # Calculate scores for overlap, semantic similarity, keyword presence, amount of repetition
  overlap = token_overlap_ratio(answer, chunk)
  semantic = semantic_similarity(answer, chunk)
  keywords = keyword_presence(answer, chunk)
  repetitive = repetition_in_answer(answer)

  # If overlap, semantic similarity or keyword presence falls below its threshold, or repetition exceeds its threshold, validation fails
  if overlap < overlap_threshold or semantic < semantic_threshold or keywords < keyword_threshold or repetitive > repetition_threshold:
    print("Overlap {}".format(overlap))
    print("Semantic {}".format(semantic))
    print("Key words {}".format(keywords))
    print("Repetition {}".format(repetitive))
    return False

  # If score method is weighted, use weighted score to determine final composite score
  if combine_method == "weighted":
    score = (0.5 * semantic + 0.3 * overlap + 0.2 * keywords) - repetitive * 0.2
  
  # If score method is min, return min score from all metric results
  elif combine_method == "min":
    score = min(semantic, overlap, keywords)
  
  # Otherwise, use average of metrics
  else:
    score = (semantic + overlap + keywords) / 3

  # Return true if score is greater or equal to 0.5, False otherwise
  return score >= 0.5
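
The weighted score can be checked by hand. The sketch below isolates the formula from the `weighted` branch and evaluates it for two sets of made-up metric values; the `weighted_score` helper is hypothetical.

```python
def weighted_score(semantic: float, overlap: float, keywords: float, repetitive: float) -> float:
    """Composite score from the 'weighted' branch of is_answer_valid."""
    return (0.5 * semantic + 0.3 * overlap + 0.2 * keywords) - repetitive * 0.2


# Metrics sitting exactly on the default thresholds, with no repetition:
# 0.5 * 0.4 + 0.3 * 0.2 + 0.2 * 0.1 = 0.28
at_thresholds = weighted_score(0.4, 0.2, 0.1, 0.0)

# Stronger metrics: 0.35 + 0.18 + 0.10 - 0.01 = 0.62
strong = weighted_score(0.7, 0.6, 0.5, 0.05)
print(at_thresholds, strong)
```

An answer that only just clears every individual threshold scores 0.28, well short of the 0.5 cut-off, so the composite score acts as a second, stricter filter rather than a formality.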

In [None]:
def is_question_valid(question):
  """
  Function to determine if the question generated by the DeepSeek model from the provided Citizen Information chunk text is valid
  """

  # Sanitise provided question
  question = question.strip()

  # Simple check, if generated output does not finish with ?, assume not a question
  if not question.endswith("?"):
    return False

  # Search for invalid characters, if any present also fail validation
  forbidden_characters = r"[<>{}\[\]\\|^~`$%#@]"

  if re.search(forbidden_characters, question):
    return False

  # If both simple checks pass, return True as assumption is that it is a valid question
  return True
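
A few made-up inputs show what this check accepts and rejects. The sketch uses a standalone copy of the function with the same character class.

```python
import re

FORBIDDEN_CHARACTERS = r"[<>{}\[\]\\|^~`$%#@]"


def is_question_valid(question: str) -> bool:
    """True if the text ends with '?' and contains none of the forbidden characters."""
    question = question.strip()
    if not question.endswith("?"):
        return False
    return re.search(FORBIDDEN_CHARACTERS, question) is None


examples = [
    "How do I apply for the payment?",  # plain question
    "<your question here>",             # unfilled template placeholder
    "What does the 10% rate cover?",    # contains '%'
    "Explain the scheme.",              # no question mark
]
results = {text: is_question_valid(text) for text in examples}
print(results)
```

The third example suggests a trade-off: the filter catches template residue and markup, but it would also reject an otherwise reasonable question that mentions a percentage or a dollar amount.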

In [None]:
# Google Drive was mounted as this notebook was run overnight due to long inference times
drive.mount("/content/drive")

Mounted at /content/drive


In [None]:
# Output directory path in Google Drive defined
SAVE_PATH = "/content/drive/MyDrive/qa_generation_outputs"

In [None]:
# Make the directories in Google drive if they do not already exist
os.makedirs(SAVE_PATH, exist_ok=True)

In [None]:
# 1600 QA pairs to be generated with checkpoint save intervals at 100 QA pairs to ensure minimal chance of data loss
NUMBER_OF_QA_PAIRS_TO_GENERATE = 1600
CHECKPOINT_INTERVAL = 100

In [None]:
results = []
generated_answers = 0

# Keep looping until 1600 valid QA pairs generated
while generated_answers < NUMBER_OF_QA_PAIRS_TO_GENERATE:
  print("*" * 100)
  print("Generating answer {} / {}".format(generated_answers + 1, NUMBER_OF_QA_PAIRS_TO_GENERATE))

  # Choose random Citizen Information data chunk from scraped documents knowledge base
  random_chunk_document = random.choice(docs_processed)

  # Generate QA pair output using the DeepSeek model
  output = generate_question_answer_from_chunk(random_chunk_document.page_content, random_chunk_document.metadata)

  # If output returned, validate it
  if output:
    # Do validation checks on both question and answers generated by DeepSeek model
    question_valid = is_question_valid(output["question"])
    answer_valid = is_answer_valid(output["answer"], random_chunk_document.page_content)

    # If both valid, add to results, increment counter
    if question_valid and answer_valid:
      print("*" * 100)
      print("Passed")
      results.append(output)
      generated_answers += 1

      # If checkpoint interval hit, write current results to json file and save to Google Drive
      # Then reset the results list, so each checkpoint file holds only the latest batch
      if generated_answers % CHECKPOINT_INTERVAL == 0:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        write_json_file("{}/{}_qa.json".format(SAVE_PATH, timestamp), results)
        results = []

    # If failed validation checks, do not add to synthetic dataset, log and continue to next iteration
    elif not answer_valid:
      print("*" * 100)
      print("Failed based on answer validation")
    else:
      print("*" * 100)
      print("Failed based on question validation")
  
  # Otherwise, log that extraction failed and move to next iteration
  else:
    print("Question answer generation failed")

Streaming output truncated to the last 5000 lines.
****************************************************************************************************
Generating answer 935 / 1600
Overlap 0.35135135135135137
Semantic 0.30633828043937683
Key words 0.0
Repetition 0.05747126436781609
****************************************************************************************************
Failed based on answer validation
****************************************************************************************************
Generating answer 935 / 1600
****************************************************************************************************
Failed based on answer validation
****************************************************************************************************
Generating answer 935 / 1600
Overlap 0.25
Semantic 0.10329043865203857
Key words 0.1
Repetition 0.13056379821958458
**************************************************************************************
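
Because `results` is reset after every checkpoint, each `*_qa.json` file holds only one batch, and the full dataset has to be reassembled afterwards. The sketch below shows one way this might be done; the `load_all_checkpoints` helper, the file names and the records are all hypothetical, and a temporary directory stands in for the Google Drive folder.

```python
import glob
import json
import os
import tempfile


def load_all_checkpoints(directory: str) -> list:
    """Concatenate every *_qa.json checkpoint in `directory`, in file-name order."""
    merged = []
    for path in sorted(glob.glob(os.path.join(directory, "*_qa.json"))):
        with open(path, "r", encoding="utf-8") as checkpoint_file:
            merged.extend(json.load(checkpoint_file))
    return merged


# Demonstration with two tiny, made-up checkpoints
with tempfile.TemporaryDirectory() as tmp_dir:
    batches = {
        "20250101_010000_qa.json": [{"question": "Q1?", "answer": "A1"}],
        "20250101_020000_qa.json": [{"question": "Q2?", "answer": "A2"}],
    }
    for name, batch in batches.items():
        with open(os.path.join(tmp_dir, name), "w", encoding="utf-8") as output_file:
            json.dump(batch, output_file)
    dataset = load_all_checkpoints(tmp_dir)

print(len(dataset))
```

Sorting by file name works here because the `%Y%m%d_%H%M%S` timestamp prefix orders lexicographically in the same way as chronologically.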

In [None]:
# Note this was added to ensure that Google Colab credit use would be minimised (i.e. it would disconnect on final execution)
print("QA generation completed successfully")
runtime.unassign()

QA generation completed successfully


NameError: name 'runtime' is not defined