# Multilabel Classification

Here the goal is to use a set of tags (also called labels or classes) to classify student feedback comments. This lets us get a quantitative sense of the volume of feedback in each major category and also focus in on any particular category of interest by filtering down to just those comments and seeing what students said. 

We've come up with a set of predefined labels that can be used for any type of course, but you can also pass your own labels if you have specific categories in mind. _Multilabel classification_ means that each comment can have any number of labels assigned, which is important given that students often touch on multiple topics in a single comment.

## Imports and setup

In [30]:
import pandas as pd
import json
from functools import partial
from pprint import pprint
from pathlib import Path
from dotenv import load_dotenv, find_dotenv
from survey_analysis.multilabel_classification import MultiLabelClassification, multilabel_classify, default_tags_list
from survey_analysis.models_common import CommentModel, CommentBatch
from survey_analysis.single_input_task import apply_task
from survey_analysis.batch_runner import process_tasks

In [6]:
# this makes it more robust to run async tasks inside an already async environment (jupyter notebooks)
import nest_asyncio
nest_asyncio.apply()

Make sure to either set `OPENAI_API_KEY` as an environment variable or put it in a .env file and use the following cell to load the env var. The format in the .env file is:
```
OPENAI_API_KEY=yourKeyGoesHere
```

In [7]:
load_dotenv(find_dotenv())

True
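
If you want to confirm that the key was actually picked up without printing it, a check along these lines may help. This is a minimal sketch: `has_api_key` is a hypothetical helper written for this example and is not part of `survey_analysis` or `python-dotenv`.

```python
import os


def has_api_key(env=None, name="OPENAI_API_KEY"):
    """Hypothetical helper: True if `name` is set to a non-empty value in `env`."""
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())


# Report a missing key without ever echoing the key itself.
if not has_api_key():
    print("OPENAI_API_KEY is not set; check your environment or .env file")
```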

In [8]:
%load_ext autoreload
%autoreload 2

This is a convenience function to make seeing Pandas dataframe values easier, especially when there are long strings like the student comments we will be using.

In [9]:
def full_show(df):
    with pd.option_context('display.max_columns', None, 'display.max_rows', None, 'display.max_colwidth', None):
        display(df)

## Load the example data

In [10]:
data_path = Path('../data/example_data')

Let's load up some fake data. 

All of these comments are synthetic to avoid sharing any sensitive or personally identifiable information (PII), but they work well for illustration purposes. There are 100 rows, with just a few null/NaN values here and there for realism. In most surveys I've seen, there are quite a few null, None, or blank values, and the functions are written to handle those.

In [11]:
example_survey = pd.read_csv(data_path / 'example_survey_data_synthetic.csv')
full_show(example_survey.head())

Unnamed: 0,best_parts,enhanced_learning,improve_course
0,I valued the practical clinical aspects related to immune-related disorders and their management.,The illustrative visuals and straightforward explanatory clips.,Consider reducing the duration of certain videos. A few appeared to be slightly prolonged.
1,The flexibility to learn at a self-determined speed,The opportunity to review the lecture content,"The pace of some lectures could be slowed down. At times, it's challenging to follow the lecturer's speech or decipher their handwriting."
2,The educational content was extremely enriching and stimulating! The section on oncology was the highlight.,the self-assessment activities.,Nothing specific comes to mind.
3,Professional growth within the medical sector,"The practical integration workshops were highly beneficial, they significantly contributed to a deeper comprehension of the theories and their implementation in a healthcare environment.",Incorporating a few advanced projects as optional tasks could benefit learners who wish to delve deeper into the subject matter. These projects wouldn't need to influence exam scores.
4,The highlights of the class included the practical demonstration clips that made the complex biological principles more understandable by connecting them to daily well-being and actions. This connection was incredibly beneficial as I navigated the course content.,"The aspect of the course that most facilitated my learning was the regular assessments provided at each segment, which helped confirm my grasp of the material presented. These checkpoints effectively guided me in the correct learning direction. It's evident that considerable effort was invested in designing these educational modules to enable students to gain a deep comprehension rather than just a superficial understanding of the subject matter.","Extend the duration of the concept videos for the more challenging topics, as they require a deeper dive to fully grasp the intricacies involved. Additionally, consider introducing an additional educator to the mix. The dynamic of having multiple voices in another subject area is quite engaging, and it would be beneficial to replicate that experience in this subject to prevent monotony from setting in with just one instructor."


We'll also load up some Coursera comments (the source is [this Kaggle dataset](https://www.kaggle.com/datasets/imuhammad/course-reviews-on-coursera)), using just the first 100 as an example. The included example dataset is just the first 200 rows of the full 1.45 million rows. I didn't include the full set so as not to bloat the size of this repo.

In [13]:
coursera_survey = pd.read_csv(data_path / 'coursera_survey_200rows.csv', nrows=100)
full_show(coursera_survey.head())

Unnamed: 0,reviews,reviewers,date_reviews,rating,course_id
0,"Pretty dry, but I was able to pass with just two complete watches so I'm happy about that. As usual there were some questions on the final exam that were NO WHERE in the course, which is annoying but far better than many microsoft tests I have taken. Never found the suplimental material that the course references... but who cares... i passed!",By Robert S,"Feb 12, 2020",4,google-cbrs-cpi-training
1,would be a better experience if the video and screen shots would sho on the side of the text that the instructor is going thru so that user does not have to go all the way to beginning of text to be able to view any slides instructor is showing.,By Gabriel E R,"Sep 28, 2020",4,google-cbrs-cpi-training
2,Information was perfect! The program itself was a little annoying. I had to wait 30 to 45 minutes after watching the videos to to take the quiz. Other than that the information was perfect and passed the test with no issues!,By Jacob D,"Apr 08, 2020",4,google-cbrs-cpi-training
3,A few grammatical mistakes on test made me do a double take but all in all not bad.,By Dale B,"Feb 24, 2020",4,google-cbrs-cpi-training
4,Excellent course and the training provided was very detailed and easy to follow.,By Sean G,"Jun 18, 2020",4,google-cbrs-cpi-training


## Multilabel classification using predefined tags

Let's first take a look at the predefined tags and their descriptions. You can change these in any way you want or use a completely different set. The 'topic' is the tag itself (the program will replace spaces with underscores if needed, so no need to worry about spaces). The 'description' gives you a bit of information about the types of things that would fall under this label. We've included a catch-all 'other' tag, but if you're using custom tags, you could also let the model choose no label at all when none apply.

In [19]:
# this was imported from the multilabel_classification module
pprint(default_tags_list)

[{'description': 'course delivery (policy, support), cost, difficulty, time '
                 'commitment, grading, credit, schedule, user fit, access, '
                 'background (e.g. prereqs and appropriateness of course '
                 'level).',
  'topic': 'course logistics and fit'},
 {'description': 'course content, curriculum, specific topics, course '
                 'structure.  This focuses on the content and the pedagogical '
                 'structure of the content, including flow and organization.  '
                 'This also includes applied material such as clinical cases '
                 'and case studies. Includes references to pre-recorded '
                 'discussions between experts or between a doctor and a '
                 'patient. Includes specific suggestions for additional '
                 'courses or content.',
  'topic': 'curriculum'},
 {'description': 'video, visual, interactive, animation, step-by-step, deep '
                 'dive, b
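
To make the tag format concrete, here is a small sketch of what a tags list looks like and how topic names with spaces map to underscore names. The helpers `normalize_tag` and `check_tags` are hypothetical and written only for this illustration; they are not the library's own code, and the descriptions are abbreviated stand-ins for the real ones.

```python
def normalize_tag(topic):
    """Hypothetical helper: replace spaces with underscores, keeping case."""
    return topic.strip().replace(" ", "_")


def check_tags(tags_list):
    """Hypothetical check: every tag needs a non-empty 'topic' and 'description',
    and no two topics may collapse to the same normalized name."""
    for i, tag in enumerate(tags_list):
        for key in ("topic", "description"):
            if not str(tag.get(key, "")).strip():
                raise ValueError(f"tag {i} is missing a non-empty {key!r}")
    names = [normalize_tag(tag["topic"]) for tag in tags_list]
    if len(set(names)) != len(names):
        raise ValueError("two tags normalize to the same name")
    return names


# Abbreviated descriptions, for illustration only.
example_tags = [
    {"topic": "course logistics and fit", "description": "cost, difficulty, schedule"},
    {"topic": "curriculum", "description": "course content and structure"},
]
print(check_tags(example_tags))  # ['course_logistics_and_fit', 'curriculum']
```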

### Test the multilabel classification on a single comment

Let's dip our toe in the water by seeing what happens for a single comment.

In [28]:
survey_task = MultiLabelClassification(tags_list=default_tags_list)
task_input = CommentModel(comment=example_survey.iloc[0]['best_parts'])
sample_classification = await apply_task(task_input=task_input,
                                         get_prompt=survey_task.prompt_messages,
                                         result_class=survey_task.result_class)

pprint(task_input.model_dump())
# dumping by alias here allows showing output with original tag names, even if they had spaces in them
pprint(json.loads(sample_classification.model_dump_json(by_alias=True)))

{'comment': 'I valued the practical clinical aspects related to immune-related '
            'disorders and their management.'}
{'categories': {'assessment': 0,
                'course logistics and fit': 0,
                'curriculum': 1,
                'other': 0,
                'peer and teacher interaction': 0,
                'resources': 0,
                'teaching': 0,
                'teaching modality': 0},
 'reasoning': 'The comment specifically mentions appreciation for the '
              'practical clinical aspects of the course, which relates '
              'directly to the curriculum content. There is no mention of '
              'teaching methods, course logistics, assessment, resources, or '
              'peer interaction, so those categories do not apply.'}


Looks good. We get back the classification (in this case, there is only one tag, 'curriculum', but other comments might have multiple tags). It makes sense given the comment, and we can inspect the reasoning if we want. In this case, it looks pretty logical.

### Test multilabel classification on the first 10 comments

The models for task results (`result_class` in the code below) have defaults to account for comments with no content. The individual task processing routine uses the default if a comment has no content, which avoids incurring any model costs and reduces latency.

In this case, we are creating a partially applied function (`apply_task` with some arguments filled in) that we then pass to the `process_tasks` code that will run the comment classifications asynchronously. If we didn't run them this way, it would take **much** longer.
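
The general pattern (a partially applied coroutine function run concurrently over batches of inputs) can be sketched in isolation with the standard library. Everything here is an assumption for illustration: `fake_task` stands in for a model call, and `run_in_batches` is a hypothetical runner, not the implementation of `process_tasks`.

```python
import asyncio
from functools import partial


async def fake_task(comment, *, label, delay=0.01):
    """Stand-in for a model call. Empty comments get a default result
    immediately, so no simulated 'model call' is made for them."""
    if comment is None or not str(comment).strip():
        return {"comment": comment, "label": None}
    await asyncio.sleep(delay)  # simulates network latency
    return {"comment": comment, "label": label}


async def run_in_batches(inputs, task, batch_size=100, pause=0.0):
    """Hypothetical runner: await task(x) concurrently within each batch,
    pausing between batches, and return results in input order."""
    results = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        results.extend(await asyncio.gather(*(task(x) for x in batch)))
        if start + batch_size < len(inputs):
            await asyncio.sleep(pause)
    return results


# Fill in the keyword argument up front, leaving only the comment to supply.
task = partial(fake_task, label="curriculum")

# In a script, asyncio.run drives the event loop; in a notebook cell you
# would typically write `await run_in_batches(...)` instead.
demo_results = asyncio.run(
    run_in_batches(["great labs", "", "clear videos"], task, batch_size=2)
)
```

Because `asyncio.gather` returns results in the order the awaitables were passed, the output lines up with the input list, which is what makes zipping comments with their classifications safe.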

In [32]:
# we could re-use the survey_task object from cells above, but we'll create a new one for clarity
survey_task = MultiLabelClassification(tags_list=default_tags_list)
comments_to_test = [CommentModel(comment=comment) for comment in example_survey['best_parts'].tolist()[:10]]
mlc_task = partial(apply_task, 
                   get_prompt=survey_task.prompt_messages, 
                   result_class=survey_task.result_class)
classifications = await process_tasks(comments_to_test, mlc_task)

for comment, classification in zip(comments_to_test, classifications):
    pprint(comment.model_dump())
    pprint(json.loads(classification.model_dump_json()))
    print('\n')

processing 10 inputs in batches of 100
sleeping for 30 seconds between batches
starting 0 to 100
completed 0 to 100
elapsed time: 6.149885892868042
{'comment': 'I valued the practical clinical aspects related to immune-related '
            'disorders and their management.'}
{'categories': {'assessment': 0,
                'course_logistics_and_fit': 0,
                'curriculum': 1,
                'other': 0,
                'peer_and_teacher_interaction': 0,
                'resources': 0,
                'teaching': 0,
                'teaching_modality': 0},
 'reasoning': 'The comment specifically mentions appreciation for the '
              'practical clinical aspects of the course, which relates '
              'directly to the curriculum content. There is no mention of '
              'logistics, teaching methods, assessment methods, resources, or '
              'interactions, so those categories do not apply.'}


{'comment': 'The flexibility to learn at a self-determined s

This is a simpler way of doing the same thing, using the provided `multilabel_classify` convenience function. In this case, you don't really need to know anything about the models that wrap comments or the survey task...you just give it a list of comments as strings and off it goes.

In [36]:
comments = example_survey['improve_course'].tolist()[:10]
# this uses the default tags list unless you supply your own (see the tags_8.yaml file for an example)
results = await multilabel_classify(comments=comments) 

for comment, classification in zip(comments, results):
    pprint(comment)
    pprint(json.loads(classification.model_dump_json()))
    print('\n')

processing 10 inputs in batches of 100
sleeping for 30 seconds between batches
starting 0 to 100
completed 0 to 100
elapsed time: 6.633772134780884
('Consider reducing the duration of certain videos. A few appeared to be '
 'slightly prolonged.')
{'categories': {'assessment': 0,
                'course_logistics_and_fit': 0,
                'curriculum': 0,
                'other': 0,
                'peer_and_teacher_interaction': 0,
                'resources': 1,
                'teaching': 0,
                'teaching_modality': 0},
 'reasoning': 'The comment suggests a modification in the teaching materials, '
              'specifically regarding the length of video content. This falls '
              "under the category of 'resources' as it pertains to the "
              'materials provided for the course. There is no mention of '
              'course logistics, curriculum structure, teaching modality, '
              'teaching methods, assessment methods, or peer and teacher 

We can also turn this into a dataframe for downloading or easier viewing.

In [37]:
# make a dataframe with comments in one column and the pivoted tag categories as the other columns
results_df = pd.DataFrame({'comment': comments})
values_df = pd.json_normalize([classification.categories.model_dump() for classification in results]).applymap(lambda x: x.value)
reasoning_df = pd.DataFrame({"reasoning": [classification.reasoning for classification in results]})
results_df = pd.concat([results_df, reasoning_df, values_df], axis=1)

full_show(results_df)

Unnamed: 0,comment,reasoning,course_logistics_and_fit,curriculum,teaching_modality,teaching,assessment,resources,peer_and_teacher_interaction,other
0,Consider reducing the duration of certain videos. A few appeared to be slightly prolonged.,"The comment suggests a modification in the teaching materials, specifically regarding the length of video content. This falls under the category of 'resources' as it pertains to the materials provided for the course. There is no mention of course logistics, curriculum structure, teaching modality, teaching methods, assessment methods, or peer and teacher interaction, so those categories do not apply.",0,0,0,0,0,1,0,0
1,"The pace of some lectures could be slowed down. At times, it's challenging to follow the lecturer's speech or decipher their handwriting.","The comment specifically addresses the pace of lectures and the difficulty in following the lecturer's speech or handwriting, which falls under the category of 'teaching'. There is no mention of course logistics, curriculum content, teaching modality, assessment methods, resources, or peer and teacher interaction. Therefore, the comment is categorized solely under 'teaching'.",0,0,0,1,0,0,0,0
2,Nothing specific comes to mind.,"The comment explicitly states that the student does not have any specific feedback to provide, indicating that none of the categories are applicable.",0,0,0,0,0,0,0,0
3,Incorporating a few advanced projects as optional tasks could benefit learners who wish to delve deeper into the subject matter. These projects wouldn't need to influence exam scores.,"The comment suggests adding advanced projects to the curriculum for learners who want to explore the subject matter more deeply, indicating it relates to the curriculum. It also mentions that these projects should not affect exam scores, which pertains to assessment.",0,1,0,0,1,0,0,0
4,"Extend the duration of the concept videos for the more challenging topics, as they require a deeper dive to fully grasp the intricacies involved. Additionally, consider introducing an additional educator to the mix. The dynamic of having multiple voices in another subject area is quite engaging, and it would be beneficial to replicate that experience in this subject to prevent monotony from setting in with just one instructor.","The comment suggests extending the duration of concept videos for challenging topics, indicating a need for more in-depth coverage in the curriculum. It also proposes introducing an additional educator to enhance the teaching dynamic, suggesting improvements in both the curriculum and teaching aspects. The mention of preventing monotony with just one instructor touches on teaching methods and the engagement aspect of the course, which falls under the teaching category.",0,1,0,1,0,0,0,0
5,"Educationally, I found the course to be of exceptional quality; the resources provided were excellent and the course was well organized. It would be beneficial to include more comprehensive discussions on a wider variety of treatments for cancer. Topics like circulating tumor DNA and the progression of tumors were addressed somewhat superficially. Expanding on the practical consequences and real-life instances of therapeutic approaches and patient scenarios would be a valuable enhancement.","The comment praises the course's educational quality, organization, and resources, which falls under the 'curriculum' and 'resources' categories. The suggestion to include more comprehensive discussions on a wider variety of treatments for cancer and to expand on practical consequences and real-life instances of therapeutic approaches and patient scenarios suggests improvements to the 'curriculum'.",0,1,0,0,0,1,0,0
6,Everything is ideal as it stands.,"The comment is a general positive feedback without specific details about any of the categories. It does not mention course logistics, curriculum, teaching modality, teaching quality, assessment, resources, peer and teacher interaction, or any other specific aspect. Therefore, it does not fit into any of the predefined categories.",0,0,0,0,0,0,0,0
7,OUTSTANDING,"The comment 'OUTSTANDING' does not provide specific information about any of the categories such as course logistics and fit, curriculum, teaching modality, teaching, assessment, resources, peer and teacher interaction, or other aspects of the course. It appears to be a general positive feedback without details.",0,0,0,0,0,0,0,1
8,Extend the duration! The course felt too brief with just a three-week timeframe; I was eager to delve deeper into the subject matter.,"The comment specifically addresses the duration of the course, suggesting it was too short for the student's liking and expressing a desire for more in-depth exploration of the subject matter. This feedback directly relates to the course's logistics and fit, as it comments on the course's structure and scheduling. There is no mention of teaching methods, curriculum content, assessment methods, resources, or interactions, so those categories do not apply.",1,0,0,0,0,0,0,0
9,It's excellent.,"The comment is positive but does not provide specific details about any of the categories. It's a general positive feedback without mentioning aspects like curriculum, teaching methods, resources, etc.",0,0,0,0,0,0,0,1
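
Once the results are in this 0/1 form, getting the volume per category and filtering to one category is straightforward. The sketch below works on plain dictionaries shaped like the dumped classifications shown earlier (a `categories` mapping plus `reasoning`); the three records are made up for illustration, and `tag_counts` and `indices_with_tag` are hypothetical helpers.

```python
from collections import Counter

# Made-up records mirroring the structure of the dumped classification output.
dumped = [
    {"categories": {"curriculum": 1, "assessment": 1, "teaching": 0}, "reasoning": "..."},
    {"categories": {"curriculum": 1, "assessment": 0, "teaching": 1}, "reasoning": "..."},
    {"categories": {"curriculum": 0, "assessment": 0, "teaching": 0}, "reasoning": "..."},
]


def tag_counts(dumped_results):
    """Count how many comments were assigned each tag."""
    counts = Counter()
    for result in dumped_results:
        for tag, flag in result["categories"].items():
            counts[tag] += int(flag)
    return counts


def indices_with_tag(dumped_results, tag):
    """Positions of the comments that were assigned `tag`."""
    return [i for i, r in enumerate(dumped_results) if r["categories"].get(tag) == 1]


counts = tag_counts(dumped)
curriculum_rows = indices_with_tag(dumped, "curriculum")
print(counts.most_common())
print(curriculum_rows)
```

Note that the per-tag counts can add up to more than the number of comments, since a single comment may carry several labels, and a comment with no labels contributes to none of them.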


## Multilabel classification using your own custom tags

Here we'll use some tags that were generated by theme derivation (another task with a separate example notebook). These might not be what you would classically think of as tags/labels, but they nicely show the diversity of what can be used. Also note that these themes were derived from comments in the 'what were the best parts of the course?' column but can still be applied to other columns, as we do here with the 'what could we improve about the course?' column.

In [34]:
custom_tags = [
    {
        'topic': 'Practical Learning',
        'description': 'This theme encompasses the appreciation for the practical application of theoretical knowledge through clinical scenarios, laboratory exercises, and real-world case studies, highlighting how these applications deepen understanding of complex concepts and demonstrate the relevance of the material.'
    },
    {
        'topic': 'Visual Learning',
        'description': 'This theme combines the effectiveness of visual aids such as videos, animations, and detailed illustrations in simplifying complex topics, making the course content more accessible and engaging, and enhancing the integration of theoretical concepts with practical applications.'
    },
    {
        'topic': 'Comprehensive Course Content',
        'description': "This theme highlights the course's wide range of topics, including immunological principles, oncological pathways, and the latest treatments in immune-based cancer therapies, providing a comprehensive educational experience and a solid foundation in the subject matter."
    },
    {
        'topic': 'Flexible Learning Experience',
        'description': "This theme focuses on the course's flexibility and digital format, accommodating different schedules and allowing learning at one's own pace, particularly beneficial for those balancing the course with full-time employment or other commitments."
    },
    {
        'topic': 'Interactive and Engaging',
        'description': "This theme emphasizes the course's engaging nature, with interactive content, online activities, and participatory elements that make learning more dynamic and enjoyable."
    },
    {
        'topic': 'Assessment and Quiz Questions',
        'description': 'Practical exercises, including laboratory work and case studies, were recognized for their effectiveness in conveying fundamental principles and providing a detailed understanding of scientific concepts.'
    },
    {
        'topic': 'Clinical Relevance',
        'description': "The course's focus on clinical relevance, including discussions about clinical settings, real-world clinical scenarios, and clinical procedures, was highly valued. This aspect helped students understand the practical implications of their learning."
    },
    {
        'topic': 'Instructor Clarity',
        'description': 'The clarity and effectiveness of the instructors in presenting the material were frequently mentioned. Students appreciated the concise and lucid explanations, which facilitated understanding and retention of complex concepts.'
    }
]


In [35]:
comments = example_survey['improve_course'].tolist()[:10]

# the only difference here is that we're supplying custom tags
results = await multilabel_classify(comments=comments, tags_list=custom_tags) 

for comment, classification in zip(comments, results):
    pprint(comment)
    pprint(json.loads(classification.model_dump_json()))
    print('\n')

processing 10 inputs in batches of 100
sleeping for 30 seconds between batches
starting 0 to 100
completed 0 to 100
elapsed time: 13.570764064788818
('Consider reducing the duration of certain videos. A few appeared to be '
 'slightly prolonged.')
{'categories': {'Assessment_and_Quiz_Questions': 0,
                'Clinical_Relevance': 0,
                'Comprehensive_Course_Content': 0,
                'Flexible_Learning_Experience': 0,
                'Instructor_Clarity': 0,
                'Interactive_and_Engaging': 0,
                'Practical_Learning': 0,
                'Visual_Learning': 0},
 'reasoning': 'The comment suggests a modification in the course content, '
              'specifically in the duration of certain videos, indicating that '
              'they are longer than necessary. This feedback does not directly '
              'relate to the practicality, visual aspects, comprehensiveness, '
              'flexibility, interactivity, assessment quality, clinical

## Try it yourself

Your turn...as an exercise for the reader, use the `coursera_survey` comments that we loaded and run `multilabel_classify` on those.