# Generating the JetBrains Academy Course

This notebook showcases the second part of the pipeline, which generates a JetBrains Academy course from a closed and solved OSS GitHub issue. For simplicity and completeness, this part creates the lessons and tasks sequentially. However, because the tasks do not depend on each other during file creation, several steps could be parallelized to improve performance.
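
As a rough illustration of the parallelization idea, the sketch below writes several independent task folders from a thread pool. The helper `write_task` is a hypothetical stand-in for one task-creation step, not the notebook's own function, and the pool size of 4 is an arbitrary assumption.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_task(lesson_path: str, title: str, description: str) -> str:
    """Hypothetical stand-in for one independent task-creation step."""
    task_path = os.path.join(lesson_path, title)
    os.makedirs(task_path, exist_ok=True)
    with open(os.path.join(task_path, "task.md"), "w", encoding="utf-8") as f:
        f.write(description)
    return task_path


def write_tasks_in_parallel(lesson_path: str, tasks: list[tuple[str, str]]) -> list[str]:
    """Write every (title, description) pair from its own worker thread."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(write_task, lesson_path, t, d) for t, d in tasks]
        # Collect results in submission order so the output stays deterministic.
        return [future.result() for future in futures]


# Made-up tasks, written into a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as lesson_dir:
    created = write_tasks_in_parallel(
        lesson_dir,
        [("Task A", "# First task"), ("Task B", "# Second task")],
    )
    created_names = [os.path.basename(path) for path in created]
```

Threads are used here because the work is dominated by file I/O; whether the speed-up matters in practice would depend on how many tasks a course contains.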

## Processing the Input Data

The first step of this part of the pipeline is to properly read the input given and pass it along to the appropriate functions. The given input is of the following form:

```python
class GuidedExercise(BaseModel):
    id : int
    title : str
    gitInfo : GitInfo
    exercise : str
    steps : list[Step]
    tags : list[str]
```
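
Before the input is handed to the rest of the pipeline, it can help to check that a raw JSON payload carries all top-level fields listed above. The following is a minimal sketch using only the standard library; `missing_fields` and the sample payload are made up for illustration, and pydantic performs fuller validation once the model itself is constructed.

```python
import json

# Top-level fields of GuidedExercise, as listed above.
REQUIRED_FIELDS = ("id", "title", "gitInfo", "exercise", "steps", "tags")


def missing_fields(payload: dict) -> list[str]:
    """Return the required top-level fields that are absent from payload."""
    return [field for field in REQUIRED_FIELDS if field not in payload]


raw = '{"id": 0, "title": "Example"}'  # made-up payload for illustration
missing = missing_fields(json.loads(raw))
if missing:
    print("Input is incomplete, missing:", ", ".join(missing))
```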

All the necessary types are defined in the code cell below.

In [67]:
# Dependencies of the pipeline
import os
import yaml
from openai import OpenAI
from github import Github
from github import Auth
from dotenv import load_dotenv
from pydantic import BaseModel

# Load environment variables (e.g. GITHUB_TOKEN) from a local .env file, if present
load_dotenv()

github_token = os.getenv('GITHUB_TOKEN')
github_obj = Github(github_token)
client = OpenAI()

# Important types of the pipeline
class GitInfo(BaseModel):
    repo : str
    issue : str
    pr : str

class CodeRange(BaseModel):
    start: int
    end: int

class CodeStep(BaseModel):
    id : int
    summary : str
    code : str
    path : str
    range : CodeRange

class Step(BaseModel):
    id : str
    summary : str
    codeSteps : list[CodeStep]
    
class GuidedExercise(BaseModel):
    id : int
    title : str
    gitInfo : GitInfo
    exercise : str
    steps : list[Step]
    tags : list[str]

class TaskContent(BaseModel):
    pass  # TODO: Fields not defined yet

In [76]:
# Quick smoke test of generateTaskFolder and the Lesson model

content = {
    "task.js": "console.log(\"Hello World!\")",
    "test": "",
    "config": None
}
generateTaskFolder("Testing Task", "# Testing Is Wonderful", "theory", content, "test_data")
test = {"title": ["Hello World!"], "body": "Test 1"}
lesson = Lesson(**test)
type(lesson)

__main__.Lesson

### Task Content

To create the task folder, I expect the input to be passed in the following format:

```python
class Task(BaseModel):
    title: str          # Title of the task
    description: str    # Text which will be displayed in the task description, already formatted
    category: str       # Category of the task that will be specified in the config file
    content: dict       # Dictionary mapping each file name (key) to the content of that file (value)
    lesson_path: str    # Path of the lesson directory

```
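
To make the role of the `content` dictionary concrete, here is a small sketch of how its entries translate into files. It mirrors the way `generateTaskFolder` in this notebook skips empty entries and places the `test` entry in a `test` subfolder; the helper name `plan_task_files` is hypothetical and only computes relative paths instead of writing anything.

```python
import os


def plan_task_files(content: dict) -> dict:
    """Map each non-empty entry of content to its path relative to the task folder."""
    planned = {}
    for name, text in content.items():
        if not text:  # skip None and empty strings
            continue
        # The entry named "test" is placed inside a "test" subfolder.
        planned[name] = os.path.join("test", name) if name == "test" else name
    return planned


content = {
    "task.js": 'console.log("Hello World!")',
    "test": "",      # empty, so it is skipped
    "config": None,  # missing, so it is skipped
}
planned = plan_task_files(content)
```

With this input only `task.js` is planned, since the other two entries carry no content.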

In [73]:
def generateTaskFolder(title, description, category, content, lesson_path):
    """
    This function generates a task folder which is the content of a lesson folder. 
    The inputs of the function are its title, description and type as well as its content. 
    By content we mean what the master solution of the task is.
    """

    task_path = os.path.join(lesson_path, title)
    os.makedirs(task_path, exist_ok=True)
    file_paths = {}

    # TODO: Files to create: -task.md -task.js -task-info.yaml
    taskMD_path = os.path.join(task_path, "task.md")
    with open(taskMD_path, "w", encoding="utf-8") as f:
        f.write(description)
    
    for file in content:
        if content[file] is None or content[file] == "":
            continue
        file_path = os.path.join(task_path, file)
        if file == "test":
            test_folder = os.path.join(task_path, "test")
            os.makedirs(test_folder, exist_ok=True)
            file_path = os.path.join(test_folder, file)
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(content[file])
        file_paths[file] = file_path

    # Write the task config once, listing every file that was actually created
    file_content = {
        "type": category,
        "custom_name": title,
        "files": [{"name": file, "visible": True} for file in file_paths],
    }
    taskYML_path = os.path.join(task_path, "task-info.yaml")
    with open(taskYML_path, "w", encoding="utf-8") as f:
        yaml.dump(file_content, f, default_flow_style=False)

    return task_path


### Lesson Content

Each lesson will be generated using GenAI to create its content. Depending on the type of the lesson, a different prompt will be used:
- `introduction` lesson -> `generateIntroductionContent()` will be called, where the repository as well as the issue will be briefly introduced
- `theory` lesson -> `generateTheoryContent()` will be called. The content will be short summaries about the necessary theoretical background to solve the issue
- `coding` lesson -> No function will be called, as all the content will already have been generated by a previous part of the pipeline
- `debrief` lesson -> It is still unclear whether a prompt is needed to generate content.
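
One way to express this mapping is a lookup table from lesson type to generator, sketched below. The two stub functions are placeholders standing in for `generateIntroductionContent()` and `generateTheoryContent()`, the URLs are made up, and the key `coding` follows the category names used by `generateLessonFolder`.

```python
from typing import Callable, Optional


def generate_introduction_stub(issue_url: str, pr_url: str) -> str:
    """Placeholder standing in for generateIntroductionContent()."""
    return f"introduction for {issue_url}"


def generate_theory_stub(issue_url: str, pr_url: str) -> str:
    """Placeholder standing in for generateTheoryContent()."""
    return f"theory for {issue_url}"


# None marks lesson types for which no generator is called.
GENERATORS: dict[str, Optional[Callable[[str, str], str]]] = {
    "introduction": generate_introduction_stub,
    "theory": generate_theory_stub,
    "coding": None,   # content already produced earlier in the pipeline
    "debrief": None,  # still undecided
}


def lesson_content(category: str, issue_url: str, pr_url: str) -> Optional[str]:
    """Return generated content for the lesson type, or None if nothing is generated."""
    if category not in GENERATORS:
        raise ValueError(f"Unknown lesson type: {category!r}")
    generator = GENERATORS[category]
    return None if generator is None else generator(issue_url, pr_url)


# Made-up URLs for illustration only.
theory = lesson_content("theory", "https://example.com/issue", "https://example.com/pr")
```

A table like this keeps the set of valid lesson types in one place, so an unknown type fails early instead of silently producing an empty lesson.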

In [69]:
def generateIntroductionContent(issue_url, pr_url):
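    """
    This function takes the issue and PR URLs as input and generates the content of the
    Introduction lesson, so that the introduction to the course is tailored to the
    issue.
    """
    # TODO: Supply the prompt through the required `messages` argument.
    # TODO: `Exercise` is not defined in this notebook; define the response model before calling.
    generated_content = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        response_format=Exercise,
    )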

In [70]:
def generateTheoryContent(issue_url, pr_url):
    """
    This function takes as input the issue and PR of the exercise. Its output is the content used
    to create the Theory Lesson for the course, which presents the necessary theoretical knowledge to
    the student taking the course.
    """

    # TODO: The output of this function needs to be of the form
    # {
    #     "title": "",
    #     "tasks": {
    #         "title": "",
    #         "description": ""
    #     }
    # }
    # where the description is already formatted as Markdown.
    # TODO: Supply the prompt through the required `messages` argument.
    # TODO: `Exercise` is not defined in this notebook; define the response model before calling.
    generated_content = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        response_format=Exercise,
    )



To create a lesson folder, I expect the function to receive the following input:

```python
class Lesson(BaseModel):
    title: str          # Title of the lesson
    description: str    # General Description of the lesson
    git_info: GitInfo   # All the necessary GitHub information to generate the config files.
    category: str       # The type of the lesson, either introduction, theory, coding, or debrief
    steps: list[Step]   # A list of the steps, i.e. tasks to be created inside the lesson
    course_path: str    # Absolute path of the course folder in which the lesson folder is created

class Step(BaseModel):
    id : str                    # ID of the step
    summary : str               # Short summary describing what needs to be done, i.e. the task description
    codeSteps : list[CodeStep]  # The list of all the hints for that specific step/task
```

For theory tasks, the `codeSteps` field of the `Step` class is left empty, as no hints are needed; the task description is stored in the `summary` field. The `steps` input is also empty for all lesson types except *coding* lessons, whose steps are not generated here but supplied by the previous part of the pipeline.
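
The difference between the two kinds of steps can be sketched as follows. Plain dataclasses stand in for the pydantic models so that the snippet runs without third-party packages, `CodeStep` is simplified (its `range` field is omitted), and all field values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class CodeStep:
    """Simplified stand-in for the CodeStep model (no `range` field)."""
    id: int
    summary: str
    code: str
    path: str


@dataclass
class Step:
    """Stand-in for the Step model; codeSteps defaults to an empty list."""
    id: str
    summary: str
    codeSteps: list = field(default_factory=list)


# Theory task: only the description is set, there are no hints.
theory_step = Step(id="1", summary="## Background\nShort Markdown description.")

# Coding task: the step arrives with its hints from the previous pipeline part.
coding_step = Step(
    id="2",
    summary="Apply the fix described in the issue.",
    codeSteps=[CodeStep(id=1, summary="Hint text", code="// example", path="src/example.js")],
)
```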

In [72]:
def generateLessonFolder(title, description, git_info, category, steps, course_path):
    """
    This function generates a lesson folder necessary for a JetBrains Academy course. Its inputs
    are the content of each task it has as well as its title and description.
    The lesson types are: "introduction", "theory", "coding", and "debrief"
    """

    lesson_path = os.path.join(course_path, category)
    os.makedirs(lesson_path, exist_ok=True)
    task_paths = {}

    if category == "introduction":
        # TODO: Don't forget to generate the config.js file to properly set-up the course for the plugin
        # TODO: Generate the task descriptions properly according to the github issue
        pass

    elif category == "theory":
        tasks = generateTheoryContent(git_info.issue, git_info.pr)
        for task in tasks:
            # TODO: Handle each generated theory task
            pass

    elif category == "coding":
        pass  # TODO: Not implemented yet

    elif category == "debrief":
        # TODO: Either generate content using AI or use hardcoded text
        pass

    else:
        raise ValueError(f"Wrong lesson type given to the function: {category!r}")


In [68]:
def generateCourseFolder(input: GuidedExercise):
    """
    This function takes as input a specific JSON format generated by the previous part of the pipeline.
    The format is equivalent to the GuidedExercise class described in the beginning.
    It outputs an entire folder which possesses the structure needed to be considered a course by
    the JetBrains Academy plugin.
    """
    # TODO: Check for the proper format of the input
    try:
        theory_content = generateTheoryContent(input.gitInfo.issue, input.gitInfo.pr)
    except Exception as err:
        # Report the underlying error instead of swallowing it silently
        print(f"Wrong input type has been given: {err}")

    return None

## Running the Pipeline

Now we are ready to run the entire pipeline. Feel free to run the last cell to see how a course is generated!

In [None]:
placeholder_input = {
    "id": 0
}
generateCourseFolder(placeholder_input)