# KateiKyoushi


## TODOs for a Future Version:

1. Use the study plan and course outline to generate questions.
2. The questions will be multiple choice questions that check the understanding of a concept/topic.
3. If answered correctly, continue to next question (next topic); if answered incorrectly, explain why the choice selected was wrong.
4. Perform error analysis on the answered questions and devise a review plan.
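
The question flow in items 2 to 4 could be sketched roughly as follows. This is only an illustration of the planned behavior, not part of the current notebook: the `Question` dataclass, `run_quiz`, and the `get_choice` callback are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical container for one multiple-choice question."""
    topic: str
    prompt: str
    choices: list
    answer: int          # index of the correct choice
    explanations: dict   # wrong choice index -> why it is wrong


def run_quiz(questions, get_choice):
    """Ask each question once and collect the topics answered incorrectly."""
    missed = []
    for q in questions:
        picked = get_choice(q)
        if picked == q.answer:
            continue  # correct: move on to the next question (next topic)
        # incorrect: explain why the selected choice was wrong
        print(q.explanations.get(picked, "No explanation available."))
        missed.append(q.topic)
    return missed  # candidate input for a later review plan
```

The list of missed topics is the simplest possible form of error analysis; a review plan could then be requested for just those topics.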


# Imports


In [1]:
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

import nltk
import json
from collections import Counter
from pathlib import Path
from PyPDF2 import PdfReader

In [2]:
from openai import OpenAI

client = OpenAI()

In [3]:
# On first run, uncomment the lines below to download the required NLTK data

# nltk.download("punkt")
# nltk.download("wordnet")
# nltk.download("stopwords")

# Utils


In [4]:
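def create_file(file_name, text):
    """Write text to a local file, overwriting any existing content"""
    # Explicit UTF-8 avoids encoding errors on non-ASCII model output
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(text)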

In [5]:
class PDFParser:
    """Class to handle single PDF file"""

    def __init__(self, path) -> None:
        self.path = path
        self.text = self.parse_text()

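    def parse_text(self):
        """Return the text of every page in the PDF file"""
        pdf_reader = PdfReader(self.path)
        # Join pages with a newline so words at page boundaries do not merge;
        # fall back to "" if a page yields no extractable text
        return "\n".join(page.extract_text() or "" for page in pdf_reader.pages)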

    def get_keywords(self, num):
        """Get top-K frequent words as keywords"""
        words = word_tokenize(self.text)

        # Keep alphanumeric tokens only (drops punctuation)
        words = [word for word in words if word.isalnum()]

        # Remove stopwords (case-insensitive match)
        stop_words = set(stopwords.words("english"))
        words = [word for word in words if word.lower() not in stop_words]

        # Lemmatization
        lemmatizer = WordNetLemmatizer()
        words = [lemmatizer.lemmatize(word) for word in words]

        # Get top-K most common words
        word_freq = Counter(words)
        keywords = word_freq.most_common(num)

        str_keywords = ", ".join([w[0] for w in keywords])

        return f"Top-{num} keywords: {str_keywords}"
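
For readers without the NLTK data installed, the same top-K idea can be tried with the standard library alone. The sketch below is a simplified stand-in, not the notebook's method: it uses a regex tokenizer, a tiny hand-written stopword set, lowercases every token, and skips lemmatization. `STOP_WORDS` and `top_keywords` are names assumed here for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; the notebook uses NLTK's English list instead
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for"}


def top_keywords(text, num):
    """Return the `num` most frequent non-stopword tokens, lowercased."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(num)]


sample = "The gradient of the loss guides the update. Gradient descent minimizes the loss."
print(top_keywords(sample, 2))
```

Lowercasing before counting merges variants such as "Gradient" and "gradient", which `get_keywords` above counts separately.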

# Prompt Engineering


In [7]:
def get_response(prompt, model="gpt-3.5-turbo"):
    """Send a single-turn prompt to the chat model and return the reply text"""
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a highly skilled instructor who excels at "
                    "creating course outlines and study plans from "
                    "summarized materials."
                ),
            },
            {
                "role": "user",
                "content": prompt,
            },
        ],
    )
    return completion.choices[0].message.content

In [8]:
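def prompt_course_outline(content, keywords):
    """Build the prompt that asks for a Markdown course outline"""
    context = (
        "You are tasked with creating a structured course outline from "
        "given materials based on provided keywords.\n\n"
    )

    # Separate the sections so they do not run together in the prompt
    context += f"Keywords: {keywords}\n\n"
    context += f"Materials: {content}\n"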

    context += f"""
    Output Requirement: Generate a course outline formatted in Markdown, strictly adhering to the example format provided below.
    Each lesson should include a lesson name (lesson_name) and a concise description (lesson_abstract).
    Use different levels of headings, bold and italic format to highlight important topics.

    Example Format:
    # lesson_name
    lesson_abstract
    
    ## important topic 1
    - important topic 1 breakdown
    explanation of the topic
    
    ### key concept explanation
    explanation of the concept
    
    ## important topic 2
    explanation of the topic
    
    ### important topic 2 breakdown
    - important topic 2 breakdown
    explanation of the topic
    ...

    Begin designing the course now."""

    return context

In [9]:
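def generate_questions(content):
    """Build a prompt that elaborates on each topic of a course outline.

    Despite its name, this does not generate questions yet (see the TODOs).
    """
    context = """
    Using the provided Course Outline, refine and elaborate on each topic.
    Output your response combining the given Course Outline and your generated content.
    """

    context += f"""Course Outline: {content}"""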

    return context

In [10]:
def prompt_study_plan(outline, time):
    """Build the prompt that turns a course outline into a study plan"""
    context = """
    Using the provided course outline, create a study plan within the specified time frame.
    Detail the key topics and suggest study sessions, in hours, based on the material's structure and complexity.
    Ensure the plan is tailored to the student's learning preferences and available study hours per week.
    """

    context += f"""Course Outline: {outline}\n"""
    context += f"""Time Frame: {time}"""

    return context

In [11]:
def prompt_elaborate(schedule):
    """Build the prompt that expands on each concept in a study plan"""
    context = """
    Using the provided study plan, elaborate on each concept mentioned. 
    Include detailed explanations, relevant examples, and practical applications to ensure a comprehensive understanding of each topic. 
    This expanded content should assist in preparing for in-depth discussions, exams, or practical implementations related to the course material.
    """

    context += f"""Study Plan: {schedule}"""

    return context

In [12]:
def prompt_feedback(schedule, feedback):
    """Build the prompt that revises a study plan based on feedback"""
    context = """
    Revise the study plan below based on the feedback provided.
    Consider incorporating suggestions for improved time management, resource allocation, and methodologies that better align with student needs.
    Ensure the revised plan includes clear objectives, scheduled review times, and diverse learning materials to address various learning styles.
    Include checkpoints for assessing progress and adjust the schedule to accommodate areas needing more focus.
    """

    context += f"""Study Plan: {schedule}\n"""
    context += f"""Feedback: {feedback}"""

    return context
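
The prompt builders above all follow the same pattern: a block of instructions followed by labeled sections. One possible way to factor that out is sketched below. `build_prompt` is a hypothetical helper, not part of the notebook; it uses `textwrap.dedent` to drop the source indentation from the instructions and separates sections with blank lines.

```python
from textwrap import dedent


def build_prompt(instructions, **sections):
    """Join dedented instructions with labeled sections, one per paragraph."""
    parts = [dedent(instructions).strip()]
    for label, value in sections.items():
        # Turn a keyword such as "course_outline" into "Course Outline"
        title = label.replace("_", " ").title()
        parts.append(f"{title}:\n{value}")
    return "\n\n".join(parts)


prompt = build_prompt(
    """
    Using the provided course outline, create a study plan
    within the specified time frame.
    """,
    course_outline="# Lesson 1\n...",
    time_frame="1 week",
)
print(prompt)
```

Keeping the section labels on their own lines makes it easier for the model, and for a human reader of the logs, to tell where the instructions end and the data begins.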

# Test


1. Read the file(s)
2. Get K keywords to help summarize (default = 15)

In [13]:
test = PDFParser("./slides/06-01.pdf")
keywords = test.get_keywords(15)

## Functions

Define parameters:

    ``lm`` - candidate language models; pass one of them as ``model`` to ``get_response``, default: "gpt-3.5-turbo"

    ``time_frame`` - remaining time to carry out the study plan, default: "1 week"

In [18]:
lm = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
time_frame = "1 week"

Begin by generating the course outline from the parsed material.


In [None]:
response = get_response(prompt_course_outline(test.text, keywords))
create_file("1-outline.md", response)

Devise a study plan within a given time frame.


In [None]:
schedule = get_response(prompt_study_plan(response, time_frame))
create_file("2-schedule.md", schedule)

Option 1: Elaborate the current study plan to make it more comprehensive


In [None]:
response = get_response(prompt_elaborate(schedule))
create_file("3-detailed.md", response)

Option 2: Give feedback on the study plan to have it revised


In [1]:
# Give any feedback on the generated study plan,
# e.g. "I've already learned topics A and B; skip those and elaborate on topic C."
# A more detailed prompt improves the quality of the revised study plan
feedback = """
[Your feedback]
"""

In [None]:
response = get_response(prompt_feedback(schedule, feedback))
create_file("4-revised.md", response)
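
The steps in this section can also be chained in one function, which makes the flow easy to try without an API key. The sketch below is an assumption-laden illustration: `run_pipeline` and `fake_ask` are hypothetical names, `ask` stands in for `get_response`, and the short prompts are simplified placeholders for the prompt builders defined earlier.

```python
def run_pipeline(material, keywords, time_frame, ask, feedback=""):
    """Chain outline -> study plan -> optional revision and keep every result.

    `ask` is a hypothetical callable (prompt -> reply text) standing in for
    the notebook's get_response.
    """
    results = {}
    # Step 1: outline from the raw material and its keywords
    results["outline"] = ask(
        f"Create a course outline.\nKeywords: {keywords}\nMaterials: {material}"
    )
    # Step 2: study plan from the outline
    results["schedule"] = ask(
        f"Create a study plan.\nCourse Outline: {results['outline']}\n"
        f"Time Frame: {time_frame}"
    )
    # Step 3 (optional): revise only when real feedback was given
    if feedback.strip():
        results["revised"] = ask(
            f"Revise the study plan.\nStudy Plan: {results['schedule']}\n"
            f"Feedback: {feedback}"
        )
    return results


def fake_ask(prompt):
    """Offline stand-in that echoes the first line of the prompt."""
    return f"[reply to: {prompt.splitlines()[0]}]"


out = run_pipeline("slide text", "gradient, loss", "1 week", fake_ask)
print(list(out))
```

Storing each stage under its own key avoids reusing one variable for several outputs, so any step can be rerun without losing the outline or the schedule.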