In [1]:
import pickle
from dataset import Lecture

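# Load the pickled lectures; the context manager closes the file afterwards.
with open("dataset.pkl", "rb") as dataset_file:
    lectures: dict[int, Lecture] = pickle.load(dataset_file)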

for lecture_index in lectures:
    print(lectures[lecture_index].topic)
    with open(f"questions/{lectures[lecture_index].topic}.txt", "r") as question_file:
        print(question_file.readlines())


Introduction_into_nlp
['[Question Start]What are the common challenges in NLP, and how do researchers attempt to overcome these issues?[Question End]\n', '\n', '[Question Start]Can you give some real-world examples of misapplications or unintended consequences of question-answering systems or chatbots in the wild, similar to the Air Canada incident? [Question End]\n', '\n', '[Question Start]How have neural network-based approaches revolutionized NLP tasks such as text generation and machine translation in recent years?[Question End]\n', '\n', '[Question Start]Based on the lecture content about NLP and its many applications, which of the following is NOT mentioned as a primary task of the field?\n', '\n', 'A) Text classification for spam filtering\n', 'B) Generating coherent text based on a given context\n', 'C) Turning sound into written text (Speech-to-text)\n', 'D) The automatic creation of visual art from textual prompts\n', '(Note: While modern AI can generate impressive images, th

In [2]:
import regex as re

def is_question(text: str):
    pattern = r'\[Question Start\](.*?)\[Question End\]'
    regex = re.compile(pattern, re.DOTALL)

    matches = regex.findall(text.strip())

    return matches

def is_valid(text: str):
    pattern = r'^[a-zA-Z0-9].*'

    # Compile the regex pattern
    regex = re.compile(pattern, re.MULTILINE)

    # Find all matching sentences
    matches = regex.findall(text)

    return matches
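
A quick check on a made-up string can help confirm what the question pattern returns, including for multi-line questions. This is only a sketch under assumptions: it uses the standard-library `re` module rather than `regex`, the sample text is invented, and `extract_questions` is a hypothetical name rather than a function from this notebook.

```python
import re

# Same pattern as in is_question; DOTALL lets a question span several lines.
QUESTION_PATTERN = re.compile(r'\[Question Start\](.*?)\[Question End\]', re.DOTALL)

def extract_questions(text: str) -> list[str]:
    """Return the text found between each pair of question markers."""
    return QUESTION_PATTERN.findall(text.strip())

# Invented sample: one single-line question, one multi-line question,
# and one line without markers that should be ignored.
sample = (
    "[Question Start]What is NLP?[Question End]\n"
    "\n"
    "[Question Start]Pick one:\nA) yes\nB) no[Question End]\n"
    "stray text without markers\n"
)

extracted = extract_questions(sample)
print(extracted)  # ['What is NLP?', 'Pick one:\nA) yes\nB) no']
```

Text outside the markers is dropped silently, so a question file with a missing `[Question End]` would lose that question without any error.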

In [3]:
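from typing import Optional

class LectureQuestions:
    def __init__(self,
                 topic: str,
                 questions: Optional[list] = None,
                 evaluations: Optional[list] = None,
                 overall_evaluation: Optional[list] = None,
                 ) -> None:
        # Default to None: a mutable default such as [] is shared by every instance.
        self.topic: str = topic
        self.questions: list = questions if questions is not None else []
        self.evaluations: list = evaluations if evaluations is not None else []
        self.overall_evaluation: list = overall_evaluation if overall_evaluation is not None else []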

In [4]:
from collections import defaultdict

# A plain dict is enough here; defaultdict() without a factory behaves the same.
questions: dict[int, LectureQuestions] = {}

for lecture_index in lectures:
    print(f"{lectures[lecture_index].topic}")

    lecture_questions = LectureQuestions(lectures[lecture_index].topic)

    with open(f"questions/{lectures[lecture_index].topic}.txt", "r") as question_file:
        text = "\n".join(question_file.readlines())
        question_text = is_question(text)

        print(question_text)

    lecture_questions.questions = question_text
    questions[lecture_index] = lecture_questions

    print(f" ")

print(questions)

Introduction_into_nlp
['What are the common challenges in NLP, and how do researchers attempt to overcome these issues?', 'Can you give some real-world examples of misapplications or unintended consequences of question-answering systems or chatbots in the wild, similar to the Air Canada incident? ', 'How have neural network-based approaches revolutionized NLP tasks such as text generation and machine translation in recent years?', 'Based on the lecture content about NLP and its many applications, which of the following is NOT mentioned as a primary task of the field?\n\n\n\nA) Text classification for spam filtering\n\nB) Generating coherent text based on a given context\n\nC) Turning sound into written text (Speech-to-text)\n\nD) The automatic creation of visual art from textual prompts\n\n(Note: While modern AI can generate impressive images, the idea that it could yet accurately translate text to visual content is overstated and remains in the realm of science fiction for now.)', "Gi

In [5]:
for lecture_index in questions:
    print(f"lecture_content: {lectures[lecture_index].content}")
    print(f"lecture_questions:")
    for question in questions[lecture_index].questions:
        print(question)

lecture_content: What is NLP?


What is NLP?
Natural language processing (NLP) is an interdisciplinary subfield of 
computer science and information retrieval. It is primarily concerned with 
giving computers the ability to support and manipulate human language. It 
involves processing natural language datasets, such as text corpora or 
speech corpora, using either rule-based or probabilistic (i.e. statistical and, 
most recently, neural network-based) machine learning approaches. The 
goal is a computer capable of "understanding" the contents of documents, 
including the contextual nuances of the language within them. To this end, 
natural language processing often borrows ideas from theoretical 
linguistics. The technology can then accurately extract information and 
insights contained in the documents as well as categorize and organize the 
documents themselves.
“
”
Source: https://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=1215529997
focus in this lecture


In [6]:
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
    model="bartowski/Mistral-7B-Instruct-v0.3-GGUF",
    messages=[
        {"role": "system", "content": "Always answer in rhymes."},
        {"role": "user", "content": "Introduce yourself."}
    ],
    temperature=0.7,
)

print(completion.choices[0].message)

ChatCompletionMessage(content=" In a world where thoughts take flight, under the moon's soft, gentle light,\n\nA wanderer I am, seeking knowledge and insight.\n\nIn the realm of words, I find my delight,\n\nPenning tales and verses with all my might.\n\nGreetings, I bring you from this poet's sight,\n\nEager to share stories and take flight!", role='assistant', function_call=None, tool_calls=None)


In [None]:
### Does not work
import json

MODEL = "bartowski/Mistral-7B-Instruct-v0.3-GGUF"

evaluations = defaultdict(list)

for lecture_index in questions:
    # print(f"lecture_content: {lectures[lecture_index].content}")
    print(f"lecture_questions:")
    for question in questions[lecture_index].questions:
        print(question)

        messages = [
            {'role': 'system', 'content': 'You are given the task of evaluating examination questions given the lecture content and question within JSON as {"lecture_content": <lecture content>, "question": <question to evaluate>}. Provide a response ONLY in the following JSON format, adhering to correct syntax and using delimiters and JSON separators and commas appropriately. Here is the JSON format: {"reason": <explain your evaluation in detail here, including the section of the lecture that the question covers and your reasoning for the evaluation in a single line, using only plaintext>, "difficulty": <0-10, where 10 is very very hard, 5 is average, and 0 is a silly question>, "relevance": <0-10, where 0 means irrelevant, 5 is still slightly bad, and 10 means a very important and relevant question, in context of the given lecture content>, "answer": <answer the given question in length. If the question has choices, instead reply with only the correct choice, otherwise, reply in length in textual form, explaining your reasoning for the answer in a single line, using only plaintext. Write only in plaintext in a single line.>}'},
            # json.dumps quotes and escapes both strings, so the payload is valid JSON
            {'role': 'user', 'content': json.dumps({"lecture_content": lectures[lecture_index].content, "question": question})}
        ]

        completion = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            response_format={"type": "json_object"},
            temperature=0.3,
        )

        print(completion.choices[0].message)
        
        evaluation = json.loads(completion.choices[0].message.content)
        print(evaluation)
        
        questions[lecture_index].evaluations.append(evaluation)
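
The cell above is marked as not working, and the notebook does not record why. One possibility (an assumption, not a diagnosis) is that the model wraps its JSON in extra text, which makes a plain `json.loads` fail. The sketch below shows one tolerant way to pull the first JSON object out of a reply using only the standard library; `parse_first_json_object` is a hypothetical helper and the reply string is invented.

```python
import json

def parse_first_json_object(text: str):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None

# Invented reply with chatter before and after the JSON payload.
reply = 'Sure! {"difficulty": 5, "relevance": 9, "reason": "covers challenges"} Hope that helps.'
parsed = parse_first_json_object(reply)
print(parsed)
print(parse_first_json_object("no json here"))  # None
```

Returning `None` instead of raising lets the calling loop decide whether to retry or skip the question.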

In [None]:
### Does not work
import instructor
from pydantic import BaseModel, Field
from typing import List, Dict, Any
from openai import OpenAI

# Define the desired output structure
class Evaluation(BaseModel):
    # reasoning: List[str] = Field(description="reasoning in <reasoning> </reasoning> tags")
    difficulty: int = Field(description="difficulty of the question between 0 and 10")
    relevance: int = Field(description="relevance of the question between 0 and 10")
    # answer: List[str] = Field(description="answer in <answer> </answer> tags")

# Patching the OpenAI client
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio"),
    mode=instructor.Mode.TOOLS)

for lecture_index in questions:
    # print(f"lecture_content: {lectures[lecture_index].content}")
    print(f"lecture_questions:")
    for question in questions[lecture_index].questions:
        print(question)

        messages = [
            {'role': 'system', 'content': 'You are given the task of evaluating examination questions given the lecture content and question.'},
            {'role': 'user', 'content': f'Lecture content: {lectures[lecture_index].content}\nQuestion: {question}'}
        ]

        evaluation = client.chat.completions.create(
            model=MODEL,
            response_model=Evaluation,
            messages=messages,
            temperature=0.3,
        )

        print(evaluation)
    

In [7]:
def extract_tag_content(text):
    # Define the regex pattern
    pattern = r'<(?P<tag>\w+)>(?P<content>.*?)</(?P=tag)>'
    
    # Find all matches in the text
    matches = re.finditer(pattern, text, re.DOTALL)
    
    # Collect the content of each match, grouped by tag name
    result = {}
    for match in matches:
        tag = match.group('tag')
        content = match.group('content')
        result.setdefault(tag, []).append(content)
    
    return result
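
The tag extractor returns every value as a list of strings, so scores such as `'9'` still need converting before they can be averaged or compared. The following is a minimal sketch under assumptions: it re-implements the extraction with the standard-library `re` module so that it runs on its own, and `extract_tags` and `first_score` are hypothetical names, as is the sample reply.

```python
import re

# Same pattern as extract_tag_content: a tag, its content, and the matching close tag.
TAG_PATTERN = re.compile(r'<(?P<tag>\w+)>(?P<content>.*?)</(?P=tag)>', re.DOTALL)

def extract_tags(text: str) -> dict[str, list[str]]:
    """Group the content of every <tag>...</tag> pair by tag name."""
    result: dict[str, list[str]] = {}
    for match in TAG_PATTERN.finditer(text):
        result.setdefault(match.group('tag'), []).append(match.group('content'))
    return result

def first_score(evaluation: dict, tag: str):
    """Return the first 0-10 integer found under `tag`, or None if absent or out of range."""
    values = evaluation.get(tag, [])
    if not values:
        return None
    number = re.search(r'\d+', values[0])
    if number is None:
        return None
    score = int(number.group())
    return score if 0 <= score <= 10 else None

# Invented model reply in the tagged format.
reply = (
    "<reasoning>Covers the challenges section.</reasoning>\n"
    "<relevance>9</relevance>\n"
    "<difficulty> 6 </difficulty>"
)
tags = extract_tags(reply)
print(first_score(tags, "relevance"))            # 9
print(first_score(tags, "difficulty"))           # 6
print(first_score(tags, "coverage"))             # None
print(extract_tags("plain text without tags"))   # {}
```

A reply that ignores the tag format yields an empty dict, so callers should treat `None` scores as "no evaluation" rather than as zero.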

In [10]:
import tqdm 

MODEL = "bartowski/Mistral-7B-Instruct-v0.3-GGUF"
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")


for lecture_index in tqdm.tqdm(questions):
    # print(f"lecture_content: {lectures[lecture_index].content}")
    print(f"lecture_questions:")

    questions[lecture_index].evaluations = []

    for question in tqdm.tqdm(questions[lecture_index].questions):
        print(question)

        messages = [
            {'role': 'system', 'content': 'You are given the task of evaluating an examination question given the lecture content within <lecture> </lecture> and question within <question> </question> tags. Provide a response in the following format: <reasoning>explain your evaluation in detail, including the section of the lecture that the question covers and your reasoning for the evaluation in markdown text</reasoning>\n    <relevance>an integer from 0 to 10, where 0 means irrelevant, 5 is still slightly bad, and 10 means a very important and relevant question; only respond with a single number</relevance>\n    <difficulty>an integer from 0 to 10, where 10 is very very hard, 5 is average, and 0 is a silly question, in context of the student having taken the lecture already; only respond with a single number</difficulty>\n    <answer>answer the given question in detail. If the question has choices, instead reply with only the correct choice, otherwise, reply in length in textual form, explaining your reasoning for the answer in markdown text</answer>'},
            {'role': 'user', 'content': f'<lecture>{lectures[lecture_index].content}</lecture>; <question>{question}</question>'}
        ]

        completion = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            temperature=0.3,
        )

        # print(completion.choices[0].message.content)

        evaluation = extract_tag_content(completion.choices[0].message.content)
        print(f"evaluation: {evaluation}\naxes: {len(evaluation)}\n-----")

        questions[lecture_index].evaluations.append(evaluation)

    # Querying for overall coverage
    all_questions = "\n\n".join(questions[lecture_index].questions)
    messages = [
            {'role': 'system', 'content': 'You are given the task of evaluating a series of examination questions given the lecture content within <lecture> </lecture> and questions within <questions> </questions> tags. Provide a response in the following format: <reasoning>explain your evaluation in detail, including the sections of the lecture that the questions cover and your reasoning for the evaluation in markdown text</reasoning>\n    <relevance>an integer from 0 to 10, where 0 means irrelevant, 5 is still slightly bad, and 10 means very important and relevant questions; only respond with a single number encompassing the overall relevance</relevance>\n    <difficulty>an integer from 0 to 10, where 10 is very very hard, 5 is average, and 0 is for silly questions, in context of the student having taken the lecture already; only respond with a single number encompassing the overall difficulty</difficulty>\n    <coverage>an integer from 0 to 10, which describes the coverage of the set of questions of the given lecture, where 0 means the lecture is not covered at all, and 10 means the lecture is fully covered</coverage>'},
            {'role': 'user', 'content': f'<lecture>{lectures[lecture_index].content}</lecture>; <questions>{all_questions}</questions>'}
        ]
    completion = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            temperature=0.3,
        )
    overall_evaluation = extract_tag_content(completion.choices[0].message.content)
    print(f"overall_evaluation: {overall_evaluation}\naxes: {len(overall_evaluation)}\n-----")
    questions[lecture_index].overall_evaluation = overall_evaluation

  0%|          | 0/13 [00:00<?, ?it/s]

lecture_questions:




What are the common challenges in NLP, and how do researchers attempt to overcome these issues?




evaluation: {'relevance': ['9'], 'difficulty': ['6'], 'answer': ['Common challenges in NLP include ambiguities/homonyms, computation, speech-to-text losses, vectorization/representation, typos, dataset sizes, languages, character sets, writing styles, accents, hallucinations, explainability, biases in datasets, and the need for diverse datasets. Researchers attempt to overcome these issues by employing various techniques such as rule-based approaches, probabilistic (statistical) methods, and neural network-based methods. They also focus on improving dataset sizes, addressing biases, and creating diverse datasets to enhance the performance of NLP systems.']}
axes: 3
-----
Can you give some real-world examples of misapplications or unintended consequences of question-answering systems or chatbots in the wild, similar to the Air Canada incident? 




evaluation: {'relevance': ['9'], 'difficulty': ['5'], 'answer': ['1. Misinformation Spread: In 2018, a Delta Air Lines customer service chatbot provided incorrect information about flight statuses, leading to confusion and frustration among passengers. (Source: https://www.cnbc.com/2018/03/26/delta-air-lines-chatbot-misinformation-spreads-among-passengers.html)\n\n2. Inappropriate Responses: A customer service chatbot for a major bank in the UK was found to be providing sexually explicit responses when asked about account balances or other financial matters. (Source: https://www.bbc.com/news/technology-45698730)\n\n3. Lack of Empathy: Chatbots are often criticized for their inability to understand the emotional context of a conversation, leading to insensitive or inappropriate responses. For example, a chatbot used by a major retailer responded with "I\'m sorry for your loss" when a customer asked about a sale on flowers, not realizing that the customer had just lost a loved one. (Sour



evaluation: {}
axes: 0
-----
Based on the lecture content about NLP and its many applications, which of the following is NOT mentioned as a primary task of the field?



A) Text classification for spam filtering

B) Generating coherent text based on a given context

C) Turning sound into written text (Speech-to-text)

D) The automatic creation of visual art from textual prompts

(Note: While modern AI can generate impressive images, the idea that it could yet accurately translate text to visual content is overstated and remains in the realm of science fiction for now.)




evaluation: {'reasoning': ['The primary tasks of NLP as covered in the lecture include text classification (for spam filtering), sentiment analysis, text summarization, machine translation, keyword extraction, text generation, image captioning, question answering, and chat bots. However, the automatic creation of visual art from textual prompts is not mentioned as a primary task of NLP. This task falls more under the intersection of NLP and computer vision, as it involves generating images based on text inputs.\n\nAlthough modern AI can generate impressive images, the idea that it could accurately translate text to visual content remains in the realm of science fiction for now. The question asks about the automatic creation of visual art from textual prompts, which is not a primary task of NLP as presented in the lecture.\n'], 'relevance': ['8'], 'difficulty': ['5'], 'answer': ['D) The automatic creation of visual art from textual prompts']}
axes: 4
-----
Given the following code snipp

100%|██████████| 5/5 [05:28<00:00, 65.64s/it]

evaluation: {'reasoning': ['The given code snippet is from a Python class called StopWordKeywordExtractor. The is_proper_word function checks if a given token (a string) is a proper word according to its definition. In this case, the regular expression `r\'\\b(\\w{2,})\\b\'` is used to match words that consist of at least two alphabetic characters and have word boundaries on both sides. This ensures that only whole words are matched and not parts of words or punctuation. The purpose of this regular expression is to filter out stop words (common words like "the", "and", "a", etc.) from the text by excluding those that do not meet the criteria of being a proper word according to the definition in the function.\n'], 'relevance': ['10'], 'difficulty': ['5'], 'answer': ['The purpose of the regular expression `r\'\\b(\\w{2,})\\b\'` in the is_proper_word function is to match words that consist of at least two alphabetic characters and have word boundaries on both sides. This ensures that only


  8%|▊         | 1/13 [08:23<1:40:46, 503.89s/it]

overall_evaluation: {'reasoning': ["\nThe lecture provides an overview of Natural Language Processing (NLP), focusing on common tasks and challenges. The questions are related to the content covered in the lecture.\n\n1. Common challenges in NLP, and how do researchers attempt to overcome these issues?\nIn the lecture, several challenges in NLP are discussed, including ambiguities/homonyms, computation, speech-to-text losses, vectorization/representation, typos, dataset sizes, languages, character sets, writing styles, accents, hallucinations, explainability, biases in datasets, and the need for diverse datasets. Researchers attempt to overcome these issues through various methods such as using machine learning algorithms, improving dataset quality, developing new techniques for handling ambiguities, and addressing biases in data collection and analysis.\n\n2. Can you give some real-world examples of misapplications or unintended consequences of question-answering systems or chatbots i



In the context of BPE tokenization, what is the significance of merging less frequent pairs before more frequent ones during the training phase?




evaluation: {'reasoning': ["The question asks about the significance of merging less frequent pairs before more frequent ones during the training phase in Byte Pair Encoding (BPE) tokenization. In BPE, the algorithm first counts all pairs of adjacent symbols and chooses the most frequent pair to merge. This process is repeated until a certain number of merges have been performed. Merging less frequent pairs before more frequent ones ensures that the algorithm focuses on learning common patterns in the data early on, which can help improve the model's performance by reducing the complexity of the vocabulary and improving generalization. By starting with less frequent pairs, BPE is able to capture more complex and less common patterns later in the training process, as it has already learned the most common building blocks of the language. This approach helps create a more efficient and effective representation of the text data for various NLP tasks like translation, summarization, and ot



evaluation: {'reasoning': ['The given question covers the topic of subword tokenization methods, specifically Byte Pair Encoding (BPE), within the context of Natural Language Processing (NLP). The question asks about the role of morphemes in these methods and why they are often represented as part of a token.\n\nIn BPE, the algorithm learns a vocabulary by iteratively merging frequent pairs of adjacent symbols from the training data until a certain number of merges have been performed. This process results in tokens that often include frequent words and subwords, which are usually morphemes. A morpheme is the smallest meaning-bearing unit of a language, and these units can be found within words. For example, the word "unlikeliest" has three morphemes: un-, likely, and -est.\n\nThe reason morphemes are often represented as part of a token in text processed by BPE is that these methods aim to create better word representations for NLP tasks like language translation and text summarizatio



evaluation: {'reasoning': ["The question pertains to the interaction between sentence segmentation and Byte Pair Encoding (BPE) token learning in Natural Language Processing (NLP). The BPE algorithm learns a vocabulary from a corpus by iteratively merging adjacent symbols that appear frequently together. Sentence segmentation, on the other hand, involves dividing text into individual sentences.\n\nIf sentence segmentation is not accurate, it could lead to incorrect tokenization and learning of tokens in BPE. For instance, if a period (.) is misinterpreted as a sentence boundary, words following that period might be treated as separate tokens, which can negatively impact the learning process. To avoid this issue, precautions such as using an abbreviation dictionary or rules based on tokenization can help in determining whether a period is part of a word or a sentence boundary.\n\nIn addition, adding a special end-of-word symbol before spaces in the training corpus and then separating it
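
The run above contains incomplete results: one evaluation has `axes: 3` (no `reasoning`) and another is an empty dict with `axes: 0`. A small completeness check with a bounded retry is one way to handle this. The sketch below is an assumption-laden illustration: `missing_tags` and `evaluate_with_retry` are hypothetical helpers, and the model call is replaced by stubbed replies so the example runs without a server.

```python
from typing import Callable

# Tags the per-question prompt asks the model to return.
REQUIRED_TAGS = ("reasoning", "relevance", "difficulty", "answer")

def missing_tags(evaluation: dict, required=REQUIRED_TAGS) -> list[str]:
    """List the required tags that are absent or empty in an evaluation."""
    return [tag for tag in required if not evaluation.get(tag)]

def evaluate_with_retry(query: Callable[[], dict], max_attempts: int = 3) -> dict:
    """Call `query` until the evaluation is complete or the attempts run out."""
    evaluation: dict = {}
    for _ in range(max_attempts):
        evaluation = query()
        if not missing_tags(evaluation):
            break
    return evaluation  # may still be incomplete after max_attempts

# Stubbed replies standing in for three successive model calls.
stub_replies = iter([
    {},                                                           # nothing parsed
    {"relevance": ["9"], "difficulty": ["6"], "answer": ["..."]},  # no reasoning
    {"reasoning": ["ok"], "relevance": ["9"], "difficulty": ["6"], "answer": ["D"]},
])

final = evaluate_with_retry(lambda: next(stub_replies))
print(missing_tags({}))  # ['reasoning', 'relevance', 'difficulty', 'answer']
print(final)
```

In the notebook loop, `query` would wrap the chat completion call and the tag extraction. Given the recorded rate of roughly 65 seconds per question, a small `max_attempts` keeps the total runtime bounded.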

In [None]:
for lecture_index in questions:
    for index, question in enumerate(questions[lecture_index].questions):
        print(questions[lecture_index].questions[index])
        print(questions[lecture_index].evaluations[index])
        print("-----")

What are the common challenges in NLP, and how do researchers attempt to overcome these issues?
{'reasoning': ['The question asks for an explanation of the common challenges in NLP and the ways researchers tackle these problems. The lecture content covers various aspects of NLP, including its definition, tasks, and applications. However, it also discusses some of the challenges that arise when working with natural language data.\n\nIn the section titled "What are common challenges in NLP?", the lecture mentions several issues such as ambiguities/homonyms, computation, speech-to-text losses, vectorization/representation, typos, dataset sizes, languages, character sets, writing styles, accents, hallucinations, explainability, biases in datasets, and the need for diverse datasets.\n\nTo address these challenges, researchers employ various strategies. For instance, they use machine learning approaches to improve NLP models\' ability to understand context and handle ambiguities. They also w