# Steps Needed:

## Develop the needed functions
- Search Function: Get recent results from Google via SerpAPI
- Scrape Function: Read online data from selected Google search results
- Summarize Function: Extract only the relevant information from a given online source
- File Retrieval (RAG): Let users add other course outlines as context for formatting and content
- Document Functions: Read different types of documents (.txt, .docx, .pdf, etc.) as context, and create documents in a desired format

## Build Pipeline
The process needs trial and error in terms of the right prompts and processes, but the overall simplified outline is as follows:
1) Educator inputs course details and (optionally) a sample course outline for reference
2) GPT generates queries to look for the references needed for the given course details
3) For each query, run an online search for resources and references
    - ideally, these references should be from academic resources
4) Add the original request + the gathered resources as context for creating the final course outline/lesson plan
    - Should have the following:
        - Course Learning Outcomes (CLOs)
        - Weekly Activities
            - Each activity should have a week number, topic, activity description, expected output or assessment, and assessment tools
        - References
5) Format the output with Document Functions to fit a pre-defined format
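
The flow of steps 2 to 4 can be sketched as a single function that takes each stage as a callable. This is only an illustrative skeleton: `build_course_outline` and the three stand-in functions are hypothetical names, and the real stages would call GPT and the search API instead of returning canned strings.

```python
from typing import Callable, Dict, List


def build_course_outline(
    course_details: str,
    generate_queries: Callable[[str], List[Dict[str, str]]],
    search: Callable[[str], str],
    write_outline: Callable[[str, str], str],
) -> str:
    """Run steps 2-4: generate queries, search per query, write the outline."""
    gathered = []
    for item in generate_queries(course_details):
        # Keep each topic next to its results so the final prompt stays organized
        gathered.append(f"Topic: {item['topic']}\n{search(item['query'])}")
    context = "\n\n".join(gathered)
    return write_outline(course_details, context)


# Stand-in stages for illustration only (hypothetical, no network calls)
def fake_queries(details: str) -> List[Dict[str, str]]:
    return [{"topic": "Overview", "query": "history of IT"}]


def fake_search(query: str) -> str:
    return f"(results for: {query})"


def fake_writer(details: str, context: str) -> str:
    return f"OUTLINE\n{details}\n{context}"


outline = build_course_outline("Course Name: IT031", fake_queries, fake_search, fake_writer)
print(outline)
```

Passing the stages in as arguments makes it easy to swap prompts or search back ends during the trial-and-error phase without touching the overall flow.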

In [4]:
import json

# Project-local helper modules; gpt_response(), used below, comes from one of them
from search_functions import *
from gpt_functions import *

In [94]:
# Input sample course details
course_title = "IT031 LIVING IN THE IT ERA (Digital Technology)"
course_description = "This course covers the science, culture and ethics in information technology, its influence on modern living and human relationships, and uses for personal, professional, and social advancement. The course is designed to enable students to appreciate, in broad terms, the societal impact of development in information technology at the global and national level. This includes a review of the history of Information Technology globally, - from pre-historic era all the way to today’s advances on the field of IT – and similarly in the Philippines. The historical survey, which is grounded on basic IT concepts, will examine how these developments have affected the course in human society: politically, economically, and socially (including culturally). The second part of the course focuses on current issues arising from the application of information technology, how such applications relate to ethical and political decisions in both public and private sectors, and their effects (positive and negative) on society and life in general. "
target_students = "Computer Science/IT college freshman students"
total_hours = 54
weekly_hours = 3
instructor_name = "Jun Albert Pardillo"

# convert course details into singular string
course_details = f"""
Course Name: {course_title}
Course Description: {course_description}
Target Students: {target_students}
Total Hours: {total_hours}
Class Hours per Week: {weekly_hours}
Instructor: {instructor_name}
"""

citation_style = "APA"
number_of_topics = 5

In [71]:
# Generate topics and search queries for each topic
def generate_queries(course_details, number_of_topics=5):
    prompt = f""" You are a capable and experienced researcher and professional educator. Provided are details regarding a course outline for which we need to look for references. 
Figure out the topics that are relevant to this course and generate queries to search for academic resources on Google Scholar.
    - For the arrangement of the topics, make sure the flow is appropriate for the course i.e. beginning with a general topic or overview and then moving on to more specific topics.
For each topic, you'll need to generate a separate query to look for potential academic resources to be used as reference materials for this course.
Generate a total of {number_of_topics} queries for the same number of topics to search in Google Scholar.
Output only the queries in the following json format:
{{
    "queries": [
        {{
            "topic": "topic 1",
            "query": "query 1"
        }},
        {{
            "topic": "topic 2",
            "query": "query 2"
        }}...
    ]
}}

Course Details:
{course_details}
"""
    
    queries = gpt_response(prompt,'gpt-4-1106-preview',response_format='json_object')
    return queries


In [72]:
queries = generate_queries(course_details,5)
print(queries)
queries_json = json.loads(queries)

{
    "queries": [
        {
            "topic": "Overview of Information Technology",
            "query": "historical development of information technology"
        },
        {
            "topic": "Information Technology and Society",
            "query": "impact of information technology on society"
        },
        {
            "topic": "Information Technology in the Philippines",
            "query": "development of information technology in the Philippines"
        },
        {
            "topic": "Ethical Issues in Information Technology",
            "query": "ethical considerations in information technology"
        },
        {
            "topic": "Information Technology and Human Relationships",
            "query": "effects of information technology on human relationships"
        }
    ]
}
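
Because the reply comes from a language model, it may be worth checking its shape before using it. The sketch below is one possible check, not part of the original notebook: `parse_queries` is a hypothetical helper that verifies the reply has a `queries` list with the requested number of `topic`/`query` pairs.

```python
import json


def parse_queries(raw: str, expected: int) -> list:
    """Parse the model's JSON reply and check it has the shape the prompt asked for."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    items = data.get("queries") if isinstance(data, dict) else None
    if not isinstance(items, list):
        raise ValueError('reply has no "queries" list')
    for item in items:
        if not isinstance(item, dict) or not {"topic", "query"} <= item.keys():
            raise ValueError(f"entry is missing 'topic' or 'query': {item!r}")
    if len(items) != expected:
        raise ValueError(f"expected {expected} queries, got {len(items)}")
    return items


# Shortened sample reply, taken from the output above
sample = (
    '{"queries": [{"topic": "Overview of Information Technology", '
    '"query": "historical development of information technology"}]}'
)
parsed = parse_queries(sample, expected=1)
print(parsed[0]["query"])
```

In the notebook, `parse_queries(queries, number_of_topics)` could stand in for the bare `json.loads(queries)` call if stricter checking is wanted.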


In [73]:
import os

import requests

# Search Google Scholar through SerpAPI for a single query
def search_google_scholar(query, num_results=3, language='en', as_ylo=2020):
    """
    Params:
    query (str): The search query
    num_results (int): Number of results to return
    language (str): Two-letter code for the desired language (e.g. "en" for English, "tl" for Tagalog/Filipino)
    as_ylo (int): Earliest publication year to include in the results
    """

    # Read the SerpAPI key from the environment instead of hard-coding a secret in the notebook.
    # The variable name SERPAPI_API_KEY is our own choice; set it before running this cell.
    api_key = os.environ["SERPAPI_API_KEY"]

    # Set up the search parameters
    params = {
        "engine": "google_scholar",
        "q": query,  # Your search query
        "api_key": api_key,
        "as_ylo": as_ylo,
        "hl":language,
        'num': num_results
    }

    # Make the API request
    response = requests.get("https://serpapi.com/search", params=params, timeout=30)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the JSON response
        results = response.json()
        return results
    else:
        print("Failed to retrieve data:", response.status_code)
        return None


In [74]:
# For each query, get the top 3 results from Google Scholar and combine them into a single string
total_search_results = ""
for query in queries_json['queries']:
    results = search_google_scholar(query['query'], 3)
    if results is None:
        # The request failed; skip this topic instead of crashing on None
        continue
    temp_result = f"Topic: {query['topic']}\nResults:\n"
    for search_results in results.get('organic_results', []):
        temp_result += f"""
        Title: {search_results['title']}
        Link: {search_results['link']}
        Snippet: {search_results['snippet']}
        Publication Summary: {search_results['publication_info']['summary']}

"""
    total_search_results += temp_result
print(total_search_results)

Topic: Overview of Information Technology
Results:

        Title: Computer: A history of the information machine
        Link: https://books.google.com/books?hl=en&lr=&id=G868EAAAQBAJ&oi=fnd&pg=PT7&dq=historical+development+of+information+technology&ots=h7UG5FwuIr&sig=raNtd8bxnE7g032zwXPYV9tb1kk
        Snippet: … history and globalization of information technology have been published in Technology and … take a broader view of the history of the computer as the history of the information machine. …
        Publication Summary: M Campbell-Kelly, WF Aspray, JR Yost, H Tinn… - 2023 - books.google.com


        Title: Application of modular teaching technology in technology
        Link: https://cyberleninka.ru/article/n/application-of-modular-teaching-technology-in-technology
        Snippet: … history of human history, choosing the necessary information … information and teaching the necessary information and to … technology, so the need to use modal education technology in …
        Pu
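
The loop above indexes each result with `['snippet']` and `['publication_info']['summary']`, which raises `KeyError` if a result lacks one of those fields. A more tolerant formatter could look like the sketch below; `format_results` is a hypothetical helper, and the sample response is made up for illustration.

```python
def format_results(topic, results):
    """Build the text block for one topic, tolerating missing fields."""
    lines = [f"Topic: {topic}", "Results:"]
    # results is None when the request failed; treat that as "no results"
    for item in (results or {}).get("organic_results", []):
        summary = item.get("publication_info", {}).get("summary", "N/A")
        lines.append(f"Title: {item.get('title', 'N/A')}")
        lines.append(f"Link: {item.get('link', 'N/A')}")
        lines.append(f"Snippet: {item.get('snippet', 'N/A')}")
        lines.append(f"Publication Summary: {summary}")
        lines.append("")  # blank line between results
    return "\n".join(lines)


# Made-up response; the second entry has no snippet or publication_info
sample = {
    "organic_results": [
        {
            "title": "A",
            "link": "https://example.com/a",
            "snippet": "s",
            "publication_info": {"summary": "X - 2023"},
        },
        {"title": "B", "link": "https://example.com/b"},
    ]
}
text = format_results("Overview of Information Technology", sample)
print(text)
```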

In [89]:
# Generate Lesson Plan from aggregated search results

# First step: generate the course CLOs and ILOs
learning_outcomes_prompt = f'''You are a highly-capable researcher and professional educator. Provided below are course details for a course outline you will need to generate.

Course Details:
    {course_details}

You are tasked to generate the following for a course outline for this course:
    1 Course Learning Outcomes (CLOs)
    2 Topics/ Modules and Intended Learning Outcomes (ILOs)
        - each topic requires at least 2 ILOs
        - each topic should be marked by starting with "Topic #", and each ILO should be marked by starting with "ILO #"
        - add the source/s for each topic
    3 References (at least 1 reference per topic and in {citation_style} format)
        
Provided below are the search results for reference material from Google Scholar for suggested topics to cover in this course. 
You may decide which topics and search results to include in the final result. 
Do not include all search results in the final course outline, and do not add sources not included below.

{total_search_results}
'''

learning_outcomes = gpt_response(learning_outcomes_prompt,'gpt-4-1106-preview')
print(learning_outcomes)


### Course Learning Outcomes (CLOs)

1. Understand the historical development of information technology and its global and national implications.
2. Analyze the impact of information technology on society, culture, and human relationships.
3. Evaluate the ethical considerations and challenges presented by the advancements in information technology.
4. Assess the influence of information technology on professional, personal, and social advancement.
5. Discuss the current trends and issues in information technology, particularly in the context of the Philippines.

### Topics/Modules and Intended Learning Outcomes (ILOs)

**Topic 1: Overview of Information Technology**
- **ILO 1**: Describe the evolution and historical milestones of information technology from prehistoric times to the present.
- **ILO 2**: Explain the role of information technology in shaping global history and culture.
- *Source*: Campbell-Kelly, M., Aspray, W. F., Yost, J. R., & Tinn, H. (2023). Computer: A history of t

In [95]:
# Check whether JSON output also works for generating the lesson plan activities
activities_prompt_json = f'''You are a highly-capable researcher and professional educator. Provided below are course details for a course outline you will need to generate.

Course Details:
{course_details}
{learning_outcomes}

-------------------

From the provided learning outcomes, create weekly activities for the course. 
This course is divided into {number_of_topics} topics with a total of {total_hours} hours for the whole semester, at {weekly_hours} hours per week ({total_hours//weekly_hours} weeks in total).
You may stretch one topic over multiple weeks or add topics not included in the learning outcomes.
Each activity should have a week number, topic, activity description, expected output or assessment, and assessment tools.
Follow the json formatting below:

{{
    "course_title": "Course Title",
    "course_description": "Course Description",
    "instructor_name": "Instructor Name",
    "credit_units": "Credit Units",
    "total_hours": "Total Hours",
    "weekly_hours": "Weekly Hours",
    "clos": [
        "CLO 1",
        "CLO 2",
        "CLO 3"
    ],
    "topics": [
        {{
            "topic": "Topic 1",
            "ilos": [
                "ILO 1",
                "ILO 2"
            ]
        }},
        {{
            "topic": "Topic 2",
            "ilos": [
                "ILO 1",
                "ILO 2"
            ]
        }}...
    ],
    "references": [
        {{
            "reference": "Reference 1",
            "link": "Link 1"
        }}
    ],
    "activities": [
        {{
            "week": "Week 1",
            "topic": "Topic 1",
            "activity_description": "activity description 1",
            "expected_output": "expected output 1",
            "assessment_tools": "assessment tools 1"
        }},
        {{
            "week": "Week 2",
            "topic": "Topic 2",
            "activity_description": "activity description 2",
            "expected_output": "expected output 2",
            "assessment_tools": "assessment tools 2"
        }}...
    ]
}}
'''
activities_json = gpt_response(activities_prompt_json,'gpt-4-1106-preview', response_format='json_object')
print(activities_json)


{
    "course_title": "IT031 LIVING IN THE IT ERA (Digital Technology)",
    "course_description": "This course covers the science, culture and ethics in information technology, its influence on modern living and human relationships, and uses for personal, professional, and social advancement.",
    "instructor_name": "Jun Albert Pardillo",
    "credit_units": "3",
    "total_hours": "54",
    "weekly_hours": "3",
    "clos": [
        "Understand the historical development of information technology and its global and national implications.",
        "Analyze the impact of information technology on society, culture, and human relationships.",
        "Evaluate the ethical considerations and challenges presented by the advancements in information technology.",
        "Assess the influence of information technology on professional, personal, and social advancement.",
        "Discuss the current trends and issues in information technology, particularly in the context of the Philippines.

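Before the JSON is turned into a document, it can be checked against the requirements stated in the prompt. The sketch below is one way to do that; `check_activities` is a hypothetical helper, and the week-count check assumes the model emits exactly one activity entry per week, which may not hold if it merges weeks into a single entry.

```python
REQUIRED_ACTIVITY_KEYS = {
    "week",
    "topic",
    "activity_description",
    "expected_output",
    "assessment_tools",
}


def check_activities(outline: dict, total_hours: int, weekly_hours: int) -> list:
    """Return a list of problems found; an empty list means the outline passed."""
    problems = []
    expected_weeks = total_hours // weekly_hours
    activities = outline.get("activities", [])
    # Assumption: one activity entry per week
    if len(activities) != expected_weeks:
        problems.append(
            f"expected {expected_weeks} weekly activities, found {len(activities)}"
        )
    for i, activity in enumerate(activities, 1):
        missing = REQUIRED_ACTIVITY_KEYS - activity.keys()
        if missing:
            problems.append(f"activity {i} is missing: {', '.join(sorted(missing))}")
    return problems


# Made-up two-week outline; the second activity lacks assessment_tools
sample_outline = {
    "activities": [
        {
            "week": "Week 1",
            "topic": "Topic 1",
            "activity_description": "Lecture",
            "expected_output": "Quiz",
            "assessment_tools": "Rubric",
        },
        {
            "week": "Week 2",
            "topic": "Topic 2",
            "activity_description": "Discussion",
            "expected_output": "Essay",
        },
    ]
}
problems = check_activities(sample_outline, total_hours=6, weekly_hours=3)
print(problems)
```

For the sample course, `total_hours=54` and `weekly_hours=3` give 18 expected weeks.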
In [122]:
from docx import Document
from docx.shared import Inches
import json
from docx.shared import Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.enum.table import WD_ALIGN_VERTICAL
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

def add_hyperlink(paragraph, url, text, color, underline):
    # Register the URL as an external relationship, then build a w:hyperlink element around a new run
    part = paragraph.part
    r_id = part.relate_to(url, 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink', is_external=True)

    hyperlink = OxmlElement('w:hyperlink')
    hyperlink.set(qn('r:id'), r_id)

    new_run = OxmlElement('w:r')
    rPr = OxmlElement('w:rPr')

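    # Add color and underline to the hyperlink (w:color precedes w:u in the run-properties schema).
    # Attributes must be namespaced with qn(); a bare 'val' is not the w:val attribute Word reads.
    color_elem = OxmlElement('w:color')
    color_elem.set(qn('w:val'), color)
    rPr.append(color_elem)

    if underline:
        u = OxmlElement('w:u')
        u.set(qn('w:val'), 'single')
        rPr.append(u)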

    new_run.append(rPr)
    new_run.text = text
    hyperlink.append(new_run)

    paragraph._p.append(hyperlink)

    return hyperlink

def set_cell_border(cell, **kwargs):
    """
    Set cell's border
    Usage: set_cell_border(
        cell,
        top={"sz": 0, "val": "single", "color": "#FF0000", "space": "0"},
        bottom={"sz": 0, "color": "#00FF00", "val": "single"},
        start={"sz": 24, "val": "dashed", "shadow": "true"},
        end={"sz": 0, "val": "dashed"},
    )
    """
    tc = cell._tc
    tcPr = tc.get_or_add_tcPr()

    # Find the existing <w:tcBorders> child, or create one and attach it to the cell properties
    tcBorders = tcPr.find(qn('w:tcBorders'))
    if tcBorders is None:
        tcBorders = OxmlElement('w:tcBorders')
        tcPr.append(tcBorders)

    # Loop over all supported edge tags
    for edge in ('start', 'top', 'end', 'bottom', 'insideH', 'insideV'):
        edge_data = kwargs.get(edge)
        if edge_data:
            tag = 'w:{}'.format(edge)

            # Reuse the edge element if it already exists, otherwise create it
            el = tcBorders.find(qn(tag))
            if el is None:
                el = OxmlElement(tag)
                tcBorders.append(el)

            # Attributes must be namespaced (w:sz, w:val, ...) for Word to read them
            for key in ["sz", "val", "color", "space", "shadow"]:
                if key in edge_data:
                    el.set(qn('w:{}'.format(key)), str(edge_data[key]))

def create_word_document_from_json(json_data, title="Course_Outline.docx"):

    doc = Document()
    doc.add_heading('Course Outline', level=1)

    # Create a table for course details
    details_table = doc.add_table(rows=2, cols=2)

    # Set table style to enable borders
    details_table.style = 'Table Grid'

    # Add course details in two columns
    detail_keys = ["course_title", "instructor_name", "credit_units", "total_hours"]
    for i, key in enumerate(detail_keys):
        cell = details_table.cell(i // 2, i % 2)
        cell.vertical_alignment = WD_ALIGN_VERTICAL.TOP
        # Make only the detail name bold; the value stays in regular weight
        paragraph = cell.paragraphs[0]
        paragraph.add_run(f"{key.replace('_', ' ').capitalize()}: ").bold = True
        paragraph.add_run(str(json_data.get(key, "")))

    # Add Course Description
    doc.add_heading("Course Description:", level=2)
    doc.add_paragraph(json_data["course_description"], style='Body Text')

    # Add Course Learning Outcomes (CLOs)
    # Styles are referenced by name ('List Bullet'); lookup by style id ('ListBullet') is deprecated in python-docx
    doc.add_heading('Course Learning Outcomes (CLOs)', level=2)
    for i, clo in enumerate(json_data['clos'], 1):
        doc.add_paragraph(f"CLO {i}: {clo}", style='List Bullet')

    # Add Topics / Modules and Intended Learning Outcomes (ILOs)
    doc.add_heading('Topics / Modules and Intended Learning Outcomes', level=2)
    for topic in json_data['topics']:
        doc.add_paragraph(topic['topic'], style='List Number')
        for ilo in topic['ilos']:
            doc.add_paragraph(ilo, style='List Bullet 2')

    
    # Add a table to the document
    doc.add_heading('Weekly Activities', level=2)
    table = doc.add_table(rows=1, cols=5)

    # Set table style to enable borders
    table.style = 'Table Grid'

    # Add headers to the table
    hdr_cells = table.rows[0].cells
    headers = ["Week No.", "Topic", "Activity Description", "Expected Output", "Assessment Tools"]
    for i, header in enumerate(headers):
        hdr_cells[i].text = header
        paragraph = hdr_cells[i].paragraphs[0]
        run = paragraph.runs
        run[0].font.bold = True
        run[0].font.size = Pt(12)
        paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER

        # Set column widths
        if header == "Week No.":
            hdr_cells[i].width = Inches(0.75)
        elif header == "Activity Description":
            hdr_cells[i].width = Inches(3)
        else:
            hdr_cells[i].width = Inches(1.5)

    # Add rows to the table from the JSON data
    for activity in json_data['activities']:
        row_cells = table.add_row().cells
        row_cells[0].text = str(activity['week'])  # the model may return the week as a number
        row_cells[1].text = activity['topic']
        row_cells[2].text = activity['activity_description']
        row_cells[3].text = activity['expected_output']
        row_cells[4].text = activity['assessment_tools']

        for cell in row_cells:
            set_cell_border(cell, top={"sz": 12, "val": "single"},
                            bottom={"sz": 12, "val": "single"},
                            start={"sz": 12, "val": "single"},
                            end={"sz": 12, "val": "single"})

            # Set font size for cells
            for paragraph in cell.paragraphs:
                for run in paragraph.runs:
                    run.font.size = Pt(10)
            
            # Set topic bold
            if cell == row_cells[1]:
                for paragraph in cell.paragraphs:
                    for run in paragraph.runs:
                        run.font.bold = True

    # Add References section
    if "references" in json_data:
        doc.add_heading('References', level=2)
        for reference in json_data['references']:
            # Add each reference with its link as plain text
            ref_paragraph = doc.add_paragraph(style='Body Text')
            ref_paragraph.add_run(reference['reference']).italic = True
            if 'link' in reference:
                ref_paragraph.add_run("\nLink: ")
                ref_paragraph.add_run(reference['link'])

    # Make sure the file name ends in .docx exactly once (no .docx.docx)
    if not title.endswith(".docx"):
        title += ".docx"
    doc.save(title)



In [123]:
create_word_document_from_json(json.loads(activities_json))

