# Introduction
This notebook experiments with different prompts for the question generation model, which uses the ChatGPT API from OpenAI.
First, a helper function is defined that calls the API with a given prompt. Several prompting techniques are then tried out and evaluated to find the best-performing prompt template.

In [1]:
import os
from dotenv import load_dotenv
import openai
from src.datageneration.extractor import extract_text_without_image
from pypdfium2 import PdfDocument
import pandas as pd
from sklearn.model_selection import train_test_split
from src.evaluation.eval_main import Metrics
import nltk
import time

nltk.download('wordnet')

load_dotenv()
openai.api_key = os.getenv("OPENAI-API-KEY")

def chat_gpt(prompt, temperature=0):
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=temperature
    )
    return completion.choices[0].message.content

[nltk_data] Downloading package wordnet to /Users/I516258/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
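
The generation loops further down contain resume thresholds such as `if index >= 78`, which suggests that long runs were sometimes interrupted. A minimal sketch of a retry wrapper with exponential backoff could reduce that. `call_with_retries` and its parameters are hypothetical names and not part of the project code. Catching every `Exception` is a simplification; a real run would probably narrow this to the API client's transient error classes.

```python
import time


def call_with_retries(fn, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs) and retry with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # wait base_delay, then 2x, then 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a stand-in function that fails twice before succeeding
attempts = []


def flaky_call(prompt):
    attempts.append(prompt)
    if len(attempts) < 3:
        raise RuntimeError("temporary failure")
    return f"Question for: {prompt}"


waits = []  # the sleep stub records the delays instead of actually waiting
result = call_with_retries(flaky_call, "slide text", sleep=waits.append)
print(result, waits)
```

With the helper above, a call inside a loop would become `call_with_retries(chat_gpt, prompt)`.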


First we prepare the data for the evaluation.

In [None]:
# Initially retrieve the extracted text for each slide (only execute once)
slide_path = "../../../datasets/IT-Security_all_slides_no_duplicates.pdf"
pdf = PdfDocument(slide_path)
text = extract_text_without_image(pdf.raw)
# Collect the rows first; DataFrame.append was removed in pandas 2.0
rows = [{'Pagenumber': page[0], 'Page-Text': page[1], 'OCR-text': page[2]} for page in text]
extracted_content = pd.DataFrame(rows, columns=['Pagenumber', 'Page-Text', 'OCR-text'])

# Define the file path and name
file_path = "../../../datasets/extracted_text_content.csv"

# Save the DataFrame to the specified folder
extracted_content.to_csv(file_path, index=False)

In [2]:
# reload extracted content from file
file_path = "../../../datasets/extracted_text_content.csv"
extracted_content = pd.read_csv(file_path)

print(extracted_content)

     Pagenumber                                          Page-Text  \
0             0  Selected Topics in IT-Security\r\nProf. Dr. Fr...   
1             1  Simple Model\r\n1. User(s)\r\n• Access the sys...   
2             2  IT-Security\r\n• Security (german: Sicherheit)...   
3             3  Attacker - Examples\r\n• National agency\r\n• ...   
4             4  Attacker Model\r\n• Usually specifies what the...   
..          ...                                                ...   
591         591  Differential Privacy\r\nIntuition\r\n• Assume ...   
592         592  Differential Privacy\r\nDefinition (Simplified...   
593         593  On the Parameter \r\nPr  ଵ =  ≤ ఢ ⋅ Pr  ଶ = \r...   
594         594  Privacy Budget\r\n• Defines an upper bound on ...   
595         595  Making Algorithms Differentially Private\r\n• ...   

                                              OCR-text  
0    Fealitet\nSelected Topics in IT-Security BB OF...  
1    Simple Model\n\n1. User(s)\n\n* Access t

In [3]:
file_path = '../../../datasets/Goldstandard.csv'

goldstandard = pd.read_csv(file_path, delimiter=";")

# Remove unnecessary columns
goldstandard.drop(['PDF-Name', 'Comment','Page Number'], axis=1, inplace=True)

# Join two DataFrames based on index
goldstandard = extracted_content.join(goldstandard, lsuffix='_left', rsuffix='_right')

# Drop records whose "Marked for processing" value is "No"
goldstandard = goldstandard[(goldstandard['Marked for processing'] != 'No')]

# Remove unnecessary columns
goldstandard.drop(['Marked for processing', 'Includes Image Data'], axis=1, inplace=True)

# Split the DataFrame into a combined train/validation set and a test set
goldstandard_train_val, goldstandard_test = train_test_split(goldstandard, test_size=0.2, random_state=42)

print("Length of test set: ", len(goldstandard_test))
print(goldstandard_test)

Length of test set:  91
     Pagenumber                                          Page-Text  \
452         452  Cookies\r\nAdvantages and Disadvantages\r\nAdv...   
46           46  Access Control\r\n• Controls which authenticat...   
475         475  XSS\r\n• XSS = Cross Site Scripting\r\n• One o...   
471         471  Javascript\r\nAbilities\r\n• Runs on the clien...   
200         200  Technique\r\n• Recall that e-mail communicatio...   
..          ...                                                ...   
591         591  Differential Privacy\r\nIntuition\r\n• Assume ...   
177         177  Reference Models for Computer \r\nNetworks\r\n...   
108         108  Passphrases\r\n• Good method: choose a (silly)...   
66           66  Some Comments\r\n• RBAC can be based on access...   
199         199  E-Mail Spoofing\r\n• Creation of email message...   

                                              OCR-text  \
452  Cookies\nAdvantages and Disadvantages\n\nAdvan...   
46   te\nAccess Con

In [4]:
# Reset the index of the DataFrame
goldstandard_test = goldstandard_test.reset_index(drop=True)
goldstandard_train_val = goldstandard_train_val.reset_index(drop=True)

# Possible inputs for the ChatGPT model
content = goldstandard_test[["Page-Text", "OCR-text"]]

# Reference questions from the gold standard
references = goldstandard_test[["Question"]]

references.to_csv("./refs.csv")

In [5]:
len(references)

91

In [6]:
content

Unnamed: 0,Page-Text,OCR-text
0,Cookies\r\nAdvantages and Disadvantages\r\nAdv...,Cookies\nAdvantages and Disadvantages\n\nAdvan...
1,Access Control\r\n• Controls which authenticat...,te\nAccess Control Be OP MANNHEIM\n\n—— School...
2,XSS\r\n• XSS = Cross Site Scripting\r\n• One o...,XSS\n\n¢ XSS = Cross Site Scripting\n* One of ...
3,Javascript\r\nAbilities\r\n• Runs on the clien...,ol\nH UNIVERSITY\nJavascript BOR MANNHEIM\n———...
4,Technique\r\n• Recall that e-mail communicatio...,te\nae UNIVERSITY\n2) OF MANNHEIM\n\n— School ...
...,...,...
86,Differential Privacy\r\nIntuition\r\n• Assume ...,te\nGee 5 UNIVERSITY\n\nDifferential Privacy 8...
87,Reference Models for Computer \r\nNetworks\r\n...,alt\nReference Models for Computer BB Oh MANNG...
88,Passphrases\r\n• Good method: choose a (silly)...,ol\nPassphrases Be) OF MANNHEIM\n\n—— School o...
89,Some Comments\r\n• RBAC can be based on access...,te\nSome Comments Be OP MANNHEIM\n\n—— School ...


# Prompt Engineering
With the data prepared, prompt engineering can begin. We start with simple prompts and move on to more complex ones.

| **#** | **Prompt**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | **Techniques** |
|-------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|
| 1     | Generate a question in a flashcard style for the content delimited by triple backticks. ```{row['Page-Text']}```                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |                |
| 2     | Generate a question in a flashcard style for the content delimited by triple backticks. Take into account how exam questions are normally formulated and formulate the question accordingly. ```{row['Page-Text']}```                                                                                                                                                                                                                                                                                                                                                                                                  |                |
| 3     | Generate a question in a flashcard style for the content delimited by triple backticks. When there are examples do not focus on their specifics but try to cover the overarching concept or idea. ```{row['Page-Text']}```                                                                                                                                                                                                                                                                                                                                                                                             |                |
| 4     | Generate a question in a flashcard style for the content delimited by triple backticks. Focus on concepts, definitions and key-words. Take into account how exam questions are normally formulated and formulate the question accordingly. When there are examples do not focus on their specifics but try to cover the overarching concept or idea. ```{row['Page-Text']}```                                                                                                                                                                                                                                          |                |
| 5     | You are a bot to support in the generation of flashcards from lecture slides. You are provided with two inputs. The first input delimited by triple backticks is the text that is copied from the slides. The second input delimited by triple quotation marks is retrieved with an OCR tool to extract all text from a slide. Follow the below process: 1. Step: Compare the first input with the second input to retrieve the relevant information 2. Step: Generate a question for this information in a flashcard style Only return the generated question. ```{row['Page-Text']}``` \"\"\"{row['OCR-text']}\"\"\" |                |
| 6     | Generate a question in a flashcard style for the content delimited by triple backticks. ```{row['Page-Text']}``` Follow a similar style for generating the question as in this two examples: 1) Input: {goldstandard_train_val.loc[0, 'Page-Text']}, question: {goldstandard_train_val.loc[0, 'Question']} 2) Input: {goldstandard_train_val.loc[1, 'Page-Text']}, question: {goldstandard_train_val.loc[1, 'Question']}                                                                                                                                                                                               |                |
| 7     | Generate a question in a flashcard style for the content delimited by triple backticks. Take into account how exam questions are normally formulated and formulate the question accordingly. ```{row['Page-Text']}``` Follow a similar style for generating the question as in this two examples: 1) Input: {goldstandard_train_val.loc[0, 'Page-Text']}, question: {goldstandard_train_val.loc[0, 'Question']} 2) Input: {goldstandard_train_val.loc[1, 'Page-Text']}, question: {goldstandard_train_val.loc[1, 'Question']}                                                                                          |                |
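
The prompts in the table share one base instruction and differ only in the extra instructions and the few-shot examples appended to it. A minimal sketch of a builder that makes this structure explicit is shown here. `build_prompt`, `BASE_INSTRUCTION` and `FENCE` are hypothetical names, the example pair is a placeholder, and the wording of the few-shot lead-in line is slightly generalised compared with prompts 6 and 7, so the output is not character-for-character identical to the prompts used in the runs. Prompt 5 has a different structure and is not covered.

```python
FENCE = "`" * 3  # triple backticks, the delimiter used around the slide text

BASE_INSTRUCTION = (
    "Generate a question in a flashcard style for the content "
    "delimited by triple backticks."
)


def build_prompt(page_text, extra_instructions=(), examples=()):
    """Assemble a prompt from the base instruction, optional extra
    instructions, the delimited slide text, and optional few-shot examples.

    `examples` is a sequence of (input_text, question) pairs.
    """
    lines = [BASE_INSTRUCTION, *extra_instructions, f"{FENCE}{page_text}{FENCE}"]
    if examples:
        lines.append("Follow a similar style for generating the question as in these examples:")
        for number, (example_input, example_question) in enumerate(examples, start=1):
            lines.append(f"{number}) Input: {example_input}, question: {example_question}")
    return "\n".join(lines)


# Zero-shot variant (structure of prompt 1)
zero_shot = build_prompt("XSS = Cross Site Scripting")

# Few-shot variant (structure of prompt 6) with a placeholder example pair
few_shot = build_prompt(
    "XSS = Cross Site Scripting",
    examples=[("Example slide text", "Example question?")],
)
print(few_shot)
```

Keeping the variants in one function makes it easier to see which technique changed between two runs.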

## Zero-Shot Prompting

In [7]:
refs = []
for i, q in references.iterrows():
    refs.append((i, [q.item()]))
refs

[(0, ['What are Advantages and Disadvantages of Cookies?']),
 (1, ['How is Access Control defined?']),
 (2, ['What is XSS?']),
 (3, ['What are the Abilities of JavaScript?']),
 (4, ['What Spoofing Technique do exist?']),
 (5, ['How do attackers evade signature-based scanners?']),
 (6, ['What steps does the TLS protocol comprise?']),
 (7, ['How do rainbow tables work?']),
 (8, ['How is the Procedure of Proof of Work in Bitcoin?']),
 (9, ['How does the RSA Signature Scheme work?']),
 (10, ['How is the Write Rule formally defined (Chinese Wall Model)?']),
 (11, ['What is the block reward for successful mining?']),
 (12, ['How does the TCP/IP model look graphically?']),
 (13, ['What is the workflow of cookie creation/exchange?']),
 (14,
  ['What is the shortcoming of k-anonymity and how does ℓ-Diversity and 𝒕-Closeness address it?']),
 (15, ['How can Role-Based-Access-Control be graphically represented?']),
 (16, ['What is the basic principle of Cache (storage)?']),
 (17, ['What are the re

In [13]:
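# model_results = []  # uncomment to start a fresh run
# Call the ChatGPT API and store the results.
# The index threshold resumes an interrupted run; set it to 0 for a fresh run.
for index, row in content.iterrows():
    if index >= 78: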
        prompt = f"""
        Generate a question in a flashcard style for the content delimited by triple backticks.
        ```{row['Page-Text']}```
        """
        question = chat_gpt(prompt)
        model_results.append((index, [question]))
        print("Generated question for index ", index, ": ", question)
        time.sleep(1)


print(model_results)

Generated question for index  78 :  What are some possible approaches to achieving anonymity in IT-Security?
Generated question for index  79 :  What are the stages of a social engineering attack?
Generated question for index  80 :  What are examples of critical data that should be protected against attacks?
Generated question for index  81 :  What is the basic principle of the Heartbleed Bug?
Generated question for index  82 :  What is the encryption method used for sending messages in IT-Security?
Generated question for index  83 :  What is the term for the process of converting a message into an unreadable form using a specific algorithm and a secret key?
Generated question for index  84 :  What is the formal definition of an access matrix in IT-Security?
Generated question for index  85 :  What are the three steps involved in generating synthetic data?
Generated question for index  86 :  What is the intuition behind differential privacy and how does it prevent re-identification of 

In [16]:
# Save model_results to disk
df_model_results = pd.DataFrame(model_results, columns=["Index", "Question"])
df_model_results.to_csv("./model_results/prompt1.csv", index=False)

# restore model_results
df_model_results = pd.read_csv("./model_results/prompt1.csv")
model_results = [(row['Index'], [row['Question']]) for _, row in df_model_results.iterrows()]


In [18]:
# Performance is evaluated
metrics = Metrics(save_to_file=True)
result = pd.DataFrame(
    metrics.evaluate(model_output=model_results, references=refs),
    index=["ChatGPT-Prompt1"]
)
print(result)

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/I516258/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


                 PUNCTUATION  MISC  GRAMMAR  rouge1_fmeasure  \
ChatGPT-Prompt1           15     1        1         0.445543   

                 rouge1_precision  rouge1_recall  rouge2_fmeasure  \
ChatGPT-Prompt1           0.38389        0.60203         0.245597   

                 rouge2_precision  rouge2_recall  rougeL_fmeasure  ...  min_r  \
ChatGPT-Prompt1          0.208953       0.348498         0.421388  ...    0.0   

                  avg_f1  max_f1  min_f1  avg_cos_sim  max_cos_sim  \
ChatGPT-Prompt1  0.45029     1.0     0.0     0.681447     0.999843   

                 min_cos_sim  avg_sem_meteor  max_sem_meteor  min_sem_meteor  
ChatGPT-Prompt1     0.200163        0.501238        0.952193        0.073529  

[1 rows x 30 columns]


In [30]:
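# model_results = []  # uncomment to start a fresh run
# Call the ChatGPT API and store the results.
# The index threshold resumes an interrupted run; set it to 0 for a fresh run.
for index, row in content.iterrows():
    if index >= 89: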
        prompt = f"""
        Generate a question in a flashcard style for the content delimited by triple backticks.
        Take into account how exam questions are normally formulated and formulate the question accordingly.
        ```{row['Page-Text']}```
        """
        question = chat_gpt(prompt)
        model_results.append((index, [question]))
        print("Generated question for index ", index, ": ", question)
        time.sleep(1)

print(model_results)

Generated question for index  89 :  What are some advantages of using Role-Based Access Control (RBAC) based on an access matrix?
Generated question for index  90 :  What is the motivation behind e-mail spoofing and what are some applications of this technique in IT security?
[(0, ['What are the advantages and disadvantages of using cookies in web applications?']), (1, ['What is access control and on what levels can it exist?']), (2, ['What is XSS and how does it compromise the interaction between a user and a vulnerable web application?']), (3, ['What are some common programming features provided by client-side JavaScript?']), (4, ['What protocol is commonly used for e-mail communication?']), (5, ['What are some techniques that viruses use to evade signature-based virus scanners?']), (6, ['What is the purpose of the TLS protocol?']), (7, ['How can an attacker check if a given hash value is part of a chain in rainbow tables?']), (8, ['What is the procedure for Proof of Work in mining a

In [31]:
# Save model_results to disk
df_model_results = pd.DataFrame(model_results, columns=["Index", "Question"])
df_model_results.to_csv("./model_results/prompt2.csv", index=False)

# # restore model_results
# df_model_results = pd.read_csv("./model_results/prompt2.csv")
# model_results = [(row["Index"], [row["Question"]]) for _, row in df_model_results.iterrows()]

In [32]:
model_results

[(0,
  ['What are the advantages and disadvantages of using cookies in web applications?']),
 (1, ['What is access control and on what levels can it exist?']),
 (2,
  ['What is XSS and how does it compromise the interaction between a user and a vulnerable web application?']),
 (3,
  ['What are some common programming features provided by client-side JavaScript?']),
 (4, ['What protocol is commonly used for e-mail communication?']),
 (5,
  ['What are some techniques that viruses use to evade signature-based virus scanners?']),
 (6, ['What is the purpose of the TLS protocol?']),
 (7,
  ['How can an attacker check if a given hash value is part of a chain in rainbow tables?']),
 (8, ['What is the procedure for Proof of Work in mining a block?']),
 (9, ['What are the steps involved in the RSA Signature Scheme?']),
 (10,
  ['What are the conditions that must be met for a subject to write to an object at a specific point in time?']),
 (11,
  ['What is the initial block reward for successful m

In [33]:
# Performance is evaluated
metrics = Metrics(save_to_file=True)
result = pd.DataFrame(
    metrics.evaluate(model_output=model_results, references=refs),
    index=["ChatGPT-Prompt2"]
)
print(result)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/I516258/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


                 PUNCTUATION  MISC  GRAMMAR  rouge1_fmeasure  \
ChatGPT-Prompt2           14     1        1         0.424193   

                 rouge1_precision  rouge1_recall  rouge2_fmeasure  \
ChatGPT-Prompt2          0.357888       0.618192         0.237488   

                 rouge2_precision  rouge2_recall  rougeL_fmeasure  ...  min_r  \
ChatGPT-Prompt2          0.198803       0.360979         0.399191  ...    0.0   

                  avg_f1  max_f1  min_f1  avg_cos_sim  max_cos_sim  \
ChatGPT-Prompt2  0.41543     1.0     0.0     0.667354     0.999843   

                 min_cos_sim  avg_sem_meteor  max_sem_meteor  min_sem_meteor  
ChatGPT-Prompt2     0.121115        0.498779          0.9995        0.073529  

[1 rows x 30 columns]


In [None]:
model_results = []
# Call the ChatGPT API and store the results (trial run on the first three slides only)
for index, row in content.iterrows():
    if index == 3:
        break
    prompt = f"""
    Generate a question in a flashcard style for the content delimited by triple backticks.
    When there are examples do not focus on their specifics but try to cover the overarching concept or idea.
    ```{row['Page-Text']}```
    """
    model_results.append((index, [chat_gpt(prompt)]))

print(model_results)
# Performance is evaluated
metrics = Metrics(save_to_file=True)
result = pd.DataFrame(
    metrics.evaluate(model_output=model_results, references=refs[:3]),
    index=["ChatGPT-Prompt3"]
)
result

In [None]:
model_results = []
# Call the ChatGPT API and store the results (trial run on the first three slides only)
for index, row in content.iterrows():
    if index == 3:
        break
    prompt = f"""
    Generate a question in a flashcard style for the content delimited by triple backticks.
    Focus on concepts, definitions and key-words.
    Take into account how exam questions are normally formulated and formulate the question accordingly.
    When there are examples do not focus on their specifics but try to cover the overarching concept or idea.
    ```{row['Page-Text']}```
    """
    model_results.append((index, [chat_gpt(prompt)]))

print(model_results)
# Performance is evaluated
metrics = Metrics(save_to_file=True)
result = pd.DataFrame(
    metrics.evaluate(model_output=model_results, references=refs[:3]),
    index=["ChatGPT-Prompt4"]
)
result

In [None]:
model_results = []
# Call the ChatGPT API and store the results (trial run on the first three slides only)
for index, row in content.iterrows():
    if index == 3:
        break
    prompt = f"""
    You are a bot to support in the generation of flashcards from lecture slides.
    You are provided with two inputs. The first input delimited by triple backticks is the text that is copied from the slides.
    The second input delimited by triple quotation marks is retrieved with an OCR tool to extract all text from a slide.
    Follow the below process:
    1. Step: Compare the first input with the second input to retrieve the relevant information
    2. Step: Generate a question for this information in a flashcard style
    Only return the generated question.
    ```{row['Page-Text']}```
    \"\"\"{row['OCR-text']}\"\"\"
    """
    model_results.append((index, [chat_gpt(prompt)]))


print(model_results)
# Performance is evaluated
metrics = Metrics(save_to_file=True)
result = pd.DataFrame(
    metrics.evaluate(model_output=model_results, references=refs[:3]),
    index=["ChatGPT-Prompt5"]
)
result

## Few-Shot Prompting

In [34]:
model_results = []
# Call the ChatGPT API and store the results (threshold 0 = full run)
for index, row in content.iterrows():
    if index >= 0:
        prompt = f"""
        Generate a question in a flashcard style for the content delimited by triple backticks.
        ```{row['Page-Text']}```
        Follow a similar style for generating the question as in this two examples:
        1) Input: {goldstandard_train_val.loc[0, 'Page-Text']}, question: {goldstandard_train_val.loc[0, 'Question']}
        2) Input: {goldstandard_train_val.loc[1, 'Page-Text']}, question: {goldstandard_train_val.loc[1, 'Question']}
        """
        question = chat_gpt(prompt)
        model_results.append((index, [question]))
        print("Generated question for index ", index, ": ", question)
        time.sleep(1)

print(model_results)

Generated question for index  0 :  What are the advantages and disadvantages of cookies?
Generated question for index  1 :  What is access control and on what levels can it exist?
Generated question for index  2 :  What is XSS and what does it allow an attacker to do?
Generated question for index  3 :  What are the abilities provided by the API in client-side programming?
Generated question for index  4 :  What is the technique that allows one to spoof the sender's address in e-mail communication?
Generated question for index  5 :  What are some techniques used by viruses to evade signature-based virus scanners?
Generated question for index  6 :  TLS Protocol, question: What are the steps involved in establishing encrypted communication using the TLS protocol?
Generated question for index  7 :  What is the idea behind Rainbow Tables?
Generated question for index  8 :  What is the procedure for the Proof of Work algorithm?
Generated question for index  9 :  What is the workflow of an RS

In [35]:
# Save model_results to disk
df_model_results = pd.DataFrame(model_results, columns=["Index", "Question"])
df_model_results.to_csv("./model_results/prompt6.csv", index=False)

# # restore model_results
# df_model_results = pd.read_csv("./model_results/prompt6.csv")
# model_results = [(row["Index"], [row["Question"]]) for _, row in df_model_results.iterrows()]

In [36]:
# Performance is evaluated
metrics = Metrics(save_to_file=True)
result = pd.DataFrame(
    metrics.evaluate(model_output=model_results, references=refs),
    index=["ChatGPT-Prompt6"]
)
print(result)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/I516258/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


                 PUNCTUATION  CASING  MISC  TYPOGRAPHY  rouge1_fmeasure  \
ChatGPT-Prompt6           10       4     1           1         0.440566   

                 rouge1_precision  rouge1_recall  rouge2_fmeasure  \
ChatGPT-Prompt6          0.386222       0.598951         0.249332   

                 rouge2_precision  rouge2_recall  ...  min_r   avg_f1  max_f1  \
ChatGPT-Prompt6          0.214606       0.356362  ...    0.0  0.45615     1.0   

                 min_f1  avg_cos_sim  max_cos_sim  min_cos_sim  \
ChatGPT-Prompt6     0.0     0.698805     0.996389     0.096924   

                 avg_sem_meteor  max_sem_meteor  min_sem_meteor  
ChatGPT-Prompt6        0.503604        0.979938        0.073529  

[1 rows x 31 columns]


In [43]:
model_results = []
# Call the ChatGPT API and store the results (threshold 0 = full run)
for index, row in content.iterrows():
    if index >= 0:
        prompt = f"""
        Generate a question in a flashcard style for the content delimited by triple backticks.
        Take into account how exam questions are normally formulated and formulate the question accordingly.
        ```{row['Page-Text']}```
        Follow a similar style for generating the question as in this two examples:
        1) Input: {goldstandard_train_val.loc[0, 'Page-Text']}, question: {goldstandard_train_val.loc[0, 'Question']}
        2) Input: {goldstandard_train_val.loc[1, 'Page-Text']}, question: {goldstandard_train_val.loc[1, 'Question']}
        """
        question = chat_gpt(prompt)
        model_results.append((index, [question]))
        print("Generated question for index ", index, ": ", question)
        time.sleep(1)

print(model_results)

Generated question for index  0 :  What are the advantages and disadvantages of using cookies?
Generated question for index  1 :  What is access control and on which levels can it exist?
Generated question for index  2 :  What is XSS and how does it compromise the interaction between a user and a vulnerable web application?
Generated question for index  3 :  Question: What abilities does JavaScript provide when running on the client's side?
Generated question for index  4 :  What is the technique that allows one to spoof the sender's address in e-mail communication?
Generated question for index  5 :  What are some techniques used by viruses to evade signature-based scanners?
Generated question for index  6 :  What is the content of the flashcard?
Generated question for index  7 :  What is the idea behind Rainbow Tables?
Generated question for index  8 :  What is the procedure for the Proof of Work algorithm in mining a block?
Generated question for index  9 :  What is the RSA Signature
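
Some few-shot outputs above echo the example format instead of returning only the question: index 6 of prompt 6 starts with `TLS Protocol, question:` and index 3 of prompt 7 starts with `Question:`. A minimal sketch of a clean-up step that could run before evaluation is given here. `strip_echoed_label` is a hypothetical helper and not part of the project code. It assumes that a genuine question never contains the literal text `question:`, which would be cut off as well.

```python
import re

# Optional echoed input followed by a "question:" label at the start of the text
_ECHOED_LABEL = re.compile(r"^.*?\bquestion:\s*", flags=re.IGNORECASE)


def strip_echoed_label(text):
    """Remove a leading '<input>, question:' or 'Question:' label, if present."""
    return _ECHOED_LABEL.sub("", text.strip(), count=1)


samples = [
    "TLS Protocol, question: What are the steps involved in establishing "
    "encrypted communication using the TLS protocol?",
    "Question: What abilities does JavaScript provide when running on the client's side?",
    "What is the idea behind Rainbow Tables?",
]
cleaned = [strip_echoed_label(sample) for sample in samples]
for question in cleaned:
    print(question)
```

Outputs without such a label pass through unchanged, so the step could be applied to every generated question.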

In [44]:
# Save model_results to disk
df_model_results = pd.DataFrame(model_results, columns=["Index", "Question"])
df_model_results.to_csv("./model_results/prompt7.csv", index=False)

# # restore model_results
# df_model_results = pd.read_csv("./model_results/prompt7.csv")
# model_results = [(row["Index"], [row["Question"]]) for _, row in df_model_results.iterrows()]

In [48]:
len(model_results)

91

In [50]:
# Performance is evaluated
metrics = Metrics(save_to_file=True)
result = pd.DataFrame(
    metrics.evaluate(model_output=model_results, references=refs),
    index=["ChatGPT-Prompt7"]
)
print(result)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/I516258/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


                 PUNCTUATION  GRAMMAR  CASING  MISC  rouge1_fmeasure  \
ChatGPT-Prompt7           17        1       1     1         0.435454   

                 rouge1_precision  rouge1_recall  rouge2_fmeasure  \
ChatGPT-Prompt7          0.374773       0.610946          0.23359   

                 rouge2_precision  rouge2_recall  ...  min_r    avg_f1  \
ChatGPT-Prompt7          0.198278       0.350348  ...    0.0  0.436784   

                 max_f1  min_f1  avg_cos_sim  max_cos_sim  min_cos_sim  \
ChatGPT-Prompt7     1.0     0.0     0.689269     0.991066     0.060811   

                 avg_sem_meteor  max_sem_meteor  min_sem_meteor  
ChatGPT-Prompt7        0.496503         0.96699        0.073529  

[1 rows x 31 columns]
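
The four full-test-set evaluations (prompts 1, 2, 6 and 7) are printed in separate cells, which makes them hard to compare. A minimal sketch of a side-by-side comparison follows. The numbers are copied by hand from the printed results of those cells, only four of the metric columns are included, and `scores` and `best` are illustrative names that are not part of the project code.

```python
# Values copied from the evaluation outputs of prompts 1, 2, 6 and 7 (full test set)
scores = {
    "ChatGPT-Prompt1": {"rouge1_fmeasure": 0.445543, "rouge2_fmeasure": 0.245597,
                        "avg_cos_sim": 0.681447, "avg_sem_meteor": 0.501238},
    "ChatGPT-Prompt2": {"rouge1_fmeasure": 0.424193, "rouge2_fmeasure": 0.237488,
                        "avg_cos_sim": 0.667354, "avg_sem_meteor": 0.498779},
    "ChatGPT-Prompt6": {"rouge1_fmeasure": 0.440566, "rouge2_fmeasure": 0.249332,
                        "avg_cos_sim": 0.698805, "avg_sem_meteor": 0.503604},
    "ChatGPT-Prompt7": {"rouge1_fmeasure": 0.435454, "rouge2_fmeasure": 0.233590,
                        "avg_cos_sim": 0.689269, "avg_sem_meteor": 0.496503},
}
metric_names = ["rouge1_fmeasure", "rouge2_fmeasure", "avg_cos_sim", "avg_sem_meteor"]

# For every metric, pick the prompt with the highest score
best = {
    metric: max(scores, key=lambda name: scores[name][metric])
    for metric in metric_names
}

for metric in metric_names:
    winner = best[metric]
    print(f"{metric:16} best: {winner} ({scores[winner][metric]:.3f})")
```

On these four metrics, prompt 6 scores highest on three and prompt 1 on ROUGE-1, but the margins are at most a few hundredths, so the ranking should be read with caution.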


In [None]:
# Experiment with a higher temperature (trial run on the first three slides only)
model_results = []
# Call the ChatGPT API and store the results
for index, row in content.iterrows():
    if index == 3:
        break
    prompt = f"""
    Generate a question in a flashcard style for the content delimited by triple backticks.
    ```{row['Page-Text']}```
    """
    model_results.append((index, [chat_gpt(prompt, temperature=0.2)]))

print(model_results)
# Performance is evaluated
metrics = Metrics(save_to_file=True)
result = pd.DataFrame(
    metrics.evaluate(model_output=model_results, references=refs[:3]),
    index=["ChatGPT_Temp_Adjusted"]
)
print(result)