# Summarization

Summarization is a high-level task: it condenses the major themes of a set of feedback comments into a short overview.

## Imports and setup

In [10]:
import textwrap  # used by print_wrap below
import pandas as pd
from IPython.display import Markdown, display
from pathlib import Path
from dotenv import load_dotenv, find_dotenv
from survey_analysis.summarization import summarize_comments

In [2]:
# this makes it more robust to run async tasks inside an already async environment (jupyter notebooks)
import nest_asyncio
nest_asyncio.apply()

Either set `OPENAI_API_KEY` as an environment variable, or put it in a `.env` file and run the following cell to load it. The format of the `.env` file is:
```
OPENAI_API_KEY=yourKeyGoesHere
```

In [3]:
load_dotenv(find_dotenv())

True
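If the later API call fails with an authentication error, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch using only the standard library; `has_api_key` is a hypothetical helper written for this illustration and is not part of `survey_analysis` or `python-dotenv`.

```python
import os


def has_api_key(name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the environment variable exists and is not blank.

    Hypothetical helper for illustration only.
    """
    return bool(os.environ.get(name, "").strip())


if not has_api_key():
    print("OPENAI_API_KEY is not set; check your environment or your .env file.")
```

Note that `load_dotenv` returning `True` only indicates that a `.env` file was found and at least one variable was set, so a separate check like this one is what confirms that this specific key is present.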

In [4]:
%load_ext autoreload
%autoreload 2

This is a convenience function that makes it easier to see full pandas DataFrame values, especially when they contain long strings like the student comments we will be using.

In [5]:
def full_show(df):
    with pd.option_context('display.max_columns', None, 'display.max_rows', None, 'display.max_colwidth', None):
        display(df)

This is a convenience function for pretty-printing long student comments.

In [6]:
def print_wrap(text: str, width: int = 72) -> None:
    print(textwrap.fill(text, width=width))

## Load the example data

In [7]:
data_path = Path('../data/example_data')

Let's load up some fake data. 

All of these comments are synthetic to avoid sharing any sensitive or personally identifiable information (PII), but they work well for illustration purposes. There are 100 rows, with just a few null/NaN values here and there for realism. In most surveys I've seen, quite a few values are null, None, or blank, and the functions are written to handle those.
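The package's functions are described as handling blank values themselves, so no pre-filtering should be needed here. For readers who want to see what such handling might look like, the following is a minimal sketch in plain Python. The helpers `is_blank` and `drop_blank_comments` are hypothetical names for this illustration and are an assumption, not the implementation used by `survey_analysis`.

```python
import math


def is_blank(comment) -> bool:
    """Return True for None, float NaN, or empty/whitespace-only strings."""
    if comment is None:
        return True
    if isinstance(comment, float) and math.isnan(comment):
        return True
    return isinstance(comment, str) and not comment.strip()


def drop_blank_comments(comments):
    """Keep only the comments that carry some text (hypothetical helper)."""
    return [c for c in comments if not is_blank(c)]


# Mixed input resembling what a pandas column's .tolist() can produce
sample = ["Shorten the videos.", None, float("nan"), "   ", "More case studies."]
print(drop_blank_comments(sample))
```

The NaN check matters because pandas represents missing values in a text column as float `nan`, which is not equal to itself and so cannot be caught with a simple equality test.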

In [8]:
example_survey = pd.read_csv(data_path / 'example_survey_data_synthetic.csv')
full_show(example_survey.head())

Unnamed: 0,best_parts,enhanced_learning,improve_course
0,I valued the practical clinical aspects related to immune-related disorders and their management.,The illustrative visuals and straightforward explanatory clips.,Consider reducing the duration of certain videos. A few appeared to be slightly prolonged.
1,The flexibility to learn at a self-determined speed,The opportunity to review the lecture content,"The pace of some lectures could be slowed down. At times, it's challenging to follow the lecturer's speech or decipher their handwriting."
2,The educational content was extremely enriching and stimulating! The section on oncology was the highlight.,the self-assessment activities.,Nothing specific comes to mind.
3,Professional growth within the medical sector,"The practical integration workshops were highly beneficial, they significantly contributed to a deeper comprehension of the theories and their implementation in a healthcare environment.",Incorporating a few advanced projects as optional tasks could benefit learners who wish to delve deeper into the subject matter. These projects wouldn't need to influence exam scores.
4,The highlights of the class included the practical demonstration clips that made the complex biological principles more understandable by connecting them to daily well-being and actions. This connection was incredibly beneficial as I navigated the course content.,"The aspect of the course that most facilitated my learning was the regular assessments provided at each segment, which helped confirm my grasp of the material presented. These checkpoints effectively guided me in the correct learning direction. It's evident that considerable effort was invested in designing these educational modules to enable students to gain a deep comprehension rather than just a superficial understanding of the subject matter.","Extend the duration of the concept videos for the more challenging topics, as they require a deeper dive to fully grasp the intricacies involved. Additionally, consider introducing an additional educator to the mix. The dynamic of having multiple voices in another subject area is quite engaging, and it would be beneficial to replicate that experience in this subject to prevent monotony from setting in with just one instructor."


## Summarization

In [11]:
improve_course_question = "What could be improved about the course?"
comments = example_survey['improve_course'].tolist()
summarization_result = await summarize_comments(comments=comments, question=improve_course_question)

display(Markdown(summarization_result.summary))

The feedback highlights several areas for improvement across different aspects of the course:

1. **Pacing and Content Delivery**: There's a consensus on adjusting the pace of lectures and videos. Some students find certain videos too long or too fast-paced, making it hard to follow along. Suggestions include slowing down the pace of some lectures, shortening overly lengthy videos, and providing more comprehensive notes or outlines.

2. **Depth and Variety of Content**: Students expressed a desire for more in-depth exploration of topics, including advanced projects, more case studies, and expanding on subjects like cancer treatments, immunotherapy, and practical clinical demonstrations. Incorporating more advanced and varied content could enrich the learning experience.

3. **Engagement and Interactivity**: Enhancing the course with more interactive elements, such as simulations, practical exercises, and engaging activities, was frequently mentioned. Suggestions also include incorporating more visual aids and multimedia presentations to better illustrate concepts.

4. **Assessment and Feedback Mechanisms**: There's feedback on aligning assessments more closely with course content, increasing the number of tries for final assessments, and improving the clarity and relevance of quiz questions. Incorporating a mid-term assessment and providing more mock tests were also suggested.

5. **Accessibility and Clarity**: Improving the clarity of video lectures, ensuring consistency in language across presentations, and making subtitles selectable for copying were mentioned. Adding subtitles in other languages could also help non-English speakers.

6. **Resource Availability**: Students would appreciate more comprehensive learning materials, including downloadable PDFs, additional study resources, and ongoing access to course content for future reference.

7. **Curriculum Expansion**: Expanding the curriculum to cover more topics and incorporating additional content on current and innovative treatments were suggested to broaden the learning experience.

Overall, while many students found the course satisfactory or even outstanding, these areas of improvement could enhance the educational quality and student engagement.

When I ran this for the example, the model happened to format its output as Markdown, so that is how I'm displaying it above. The summary gives a reasonable high-level sense of what students said in response to this survey question. If you're looking for more structured output (something you could use as categories in a bar graph, for example), check out the example notebook on theme derivation, or the (much longer) notebook with the end-to-end example.