# Narratives Classification using a Hierarchical Approach

### Pipeline:

1. Reduce the current dataset to category-level labels (the category each hassle belongs to).
2. Train RoBERTa on the category labels.
3. Make predictions.
4. Compare performance against the old model.
5. If performance improves, modify the dataset again to include the specific hassles within each category (this requires training one model per category).
6. Make a second round of predictions.
7. Compare performance against the old model.
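
The two-stage idea in the pipeline can be sketched as follows. This is a minimal illustration only, not the notebook's implementation: the function name `hierarchical_predict`, the 0.5 threshold, and the choice to run a category's hassle model only when the category itself is predicted are all assumptions. The probabilities are made-up toy values standing in for model outputs.

```python
import numpy as np

def hierarchical_predict(category_probs, hassle_probs_by_category, categories, threshold=0.5):
    """Combine stage-1 (category) and stage-2 (hassle) probabilities for one narrative.

    Hypothetical helper: a category's hassles are only considered when the
    stage-1 probability for that category reaches the threshold.
    """
    predicted = {}
    for category, p_cat in category_probs.items():
        if p_cat < threshold:
            continue  # stage 1 rejected this category, so skip its stage-2 model
        hassle_probs = hassle_probs_by_category[category]
        predicted[category] = [
            hassle
            for hassle, p in zip(categories[category], hassle_probs)
            if p >= threshold
        ]
    return predicted

# Toy inputs (assumed values, not real model outputs)
toy_categories = {
    'General hassles': ['Misplacing or losing things', 'Trouble with pets'],
    'Health Hassles': ['Physical illness', 'Side effects of medication'],
}
category_probs = {'General hassles': 0.9, 'Health Hassles': 0.2}
hassle_probs = {
    'General hassles': np.array([0.8, 0.1]),
    'Health Hassles': np.array([0.7, 0.6]),
}

result = hierarchical_predict(category_probs, hassle_probs, toy_categories)
print(result)
```

In this toy case the health hassles are dropped even though their stage-2 scores are high, because stage 1 did not select the category. That gating is the main design choice of a hierarchical setup, and it also means stage-1 errors propagate to stage 2.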


# Section 1: Importing libraries

In [58]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch

from transformers import AutoTokenizer
from sklearn.metrics import classification_report, accuracy_score

from NarrativesDataset import NarrativesDataset
from DataModule import NarrativesDataModule
from Model import NarrativesClassifier

# Section 2: Cleaning the dataset

In [59]:
train_path = 'data/train-1.csv'
val_path = 'data/val-1.csv'
num_workers = 4

# Replace the typographic apostrophe in this column name with a plain-ASCII
# version so it matches the key used in the `categories` mapping.
column_fix = {"Troubling thoughts about one’s future": "Troubling thoughts about ones future"}

temp_df = pd.read_csv(train_path)
temp_df.rename(columns=column_fix, inplace=True)
temp_val_df = pd.read_csv(val_path)
temp_val_df.rename(columns=column_fix, inplace=True)



In [60]:
categories = {
    'General hassles': ['Misplacing or losing things', 'Silly practical mistakes', 'Trouble with pets', 'Difficulties with friends'],
    'Inner concerns': ['Regrets over past decision/s', 'Concerned about the meaning of life', 'Being lonely', 'Inability to express oneself', 'Fear of rejection', 'Trouble making decisions', 'Physical appearance', 'Not seeing people', "Troubling thoughts about ones future", 'Not enough personal energy', 'Concerns about getting ahead', 'Fear of confrontation', 'Wasting time'],
    'Financial concerns': ['Not enough money for basic necessities (food, clothing, transportation, housing, healthcare etc.)', 'Not enough money for wants (entertainment and recreation)', 'Concerns about owing money', 'Concerns about money for emergencies', 'Financial security'],
    'Time Pressures': ['Not enough time to do things one needs to', 'Too many responsibilities', 'Not getting enough rest', 'Too many interruptions', 'Not enough time for entertainment and recreation', 'Too many meetings', 'Social obligations', 'Concerns about meeting high standards', 'Noise'],
    'Environmental Hassles': ['Pollution', 'Crime', 'Traffic', 'Concerns about news events', 'Rising prices of common goods', 'Concerns about accidents'],
    'Family Hassles': ['Yardwork or outside home maintenance', 'Overloaded with family responsibilities', 'Home maintenance (inside)'],
    'Health Hassles': ['Concerns about medical treatment', 'Physical illness', 'Side effects of medication', 'Concerns about health in general', 'Concerns about bodily functions'],
    'Academic Hassles': ['Dissatisfaction with academic performance', 'Challenges with instructors', 'Discontent with current academic responsibilities', 'Concerns regarding academic transitions', 'Difficulties with peers or classmates', 'Challenges in managing group projects', 'Getting late to class']
}
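
Because the category labels are built by indexing the dataframe with these hassle names, a misspelled or duplicated name would either raise a `KeyError` or silently count a hassle under two categories. The check below is a small sketch of how the mapping could be validated before use; `check_category_mapping` is a hypothetical helper, and it is shown on toy data rather than the real `categories` and `temp_df`.

```python
from collections import Counter

def check_category_mapping(categories, columns):
    """Return (missing, duplicated) hassle names for a category -> hassles mapping.

    missing: hassles listed in the mapping but absent from the dataframe columns.
    duplicated: hassles that appear under more than one category.
    """
    all_hassles = [h for hassles in categories.values() for h in hassles]
    missing = sorted(set(all_hassles) - set(columns))
    duplicated = sorted(h for h, n in Counter(all_hassles).items() if n > 1)
    return missing, duplicated

# Toy example: 'z' is not a column, and 'y' is listed under two categories
toy_categories = {'A': ['x', 'y'], 'B': ['y', 'z']}
toy_columns = ['Narrative', 'x', 'y']

missing, duplicated = check_category_mapping(toy_categories, toy_columns)
print(missing, duplicated)
```

In the notebook, the same call would be `check_category_mapping(categories, temp_df.columns)`, with both returned lists expected to be empty.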

In [61]:
train_df = pd.DataFrame()
val_df = pd.DataFrame()
train_df['Narrative'] = temp_df['Narrative']
val_df['Narrative'] = temp_val_df['Narrative']

In [62]:
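# A narrative gets a category label if any hassle in that category is set
# (row-wise max over the hassle columns, assuming 0/1 indicator values).
for category, hassles in categories.items():
    train_df[category] = temp_df[hassles].max(axis=1)
    val_df[category] = temp_val_df[hassles].max(axis=1)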
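
To make the aggregation concrete, here is a toy example of the row-wise `max`. It assumes the hassle columns hold binary 0/1 indicators, which the source does not state explicitly; the values below are invented for illustration.

```python
import pandas as pd

# Three toy narratives with binary indicators for three hassles
toy = pd.DataFrame({
    'Pollution': [0, 1, 0],
    'Crime': [0, 0, 0],
    'Traffic': [1, 1, 0],
})

# Row-wise max acts as a logical OR over the hassle columns:
# the category is 1 if at least one of its hassles is 1.
toy_label = toy[['Pollution', 'Crime', 'Traffic']].max(axis=1)
print(toy_label.tolist())  # [1, 1, 0]
```

The first two narratives receive the category label because at least one hassle is present; the third has none and stays 0.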

In [63]:
directory = 'hierarchical_data/'
train_filename = 'hierarchical_train-1.csv'
val_filename = 'hierarchical_val-1.csv'

# Create the output directory if it does not exist; to_csv does not create it.
os.makedirs(directory, exist_ok=True)

train_filepath = os.path.join(directory, train_filename)
val_filepath = os.path.join(directory, val_filename)

train_df.to_csv(train_filepath, index=False)
val_df.to_csv(val_filepath, index=False)

# Section 3: Model

# Section 4: Model Training

# Section 5: Making Predictions