# Single Annotator (Me)

In [1]:
import pandas as pd
import numpy as np

df = pd.read_excel('Data/SingleAnnotation.xlsx')

df

Unnamed: 0,Text,Annotator 1
0,I believe there is,Other
1,HI! I texted the number on the site and am wai...,Buyer
2,"Hello, We have been searching homes and finall...",Buyer
3,I will be in Florida in mid October and we wil...,Buyer
4,Hello. Saw Ken's YouTube on Airbnb in reunion ...,Buyer
...,...,...
122,"Not right now. Wendy Feel free to browse, I'm ...",Other
123,"Hello, Alicia",Other
124,who are you? Faith I'm a customer service repr...,Other
125,"Hello, my name is Jeady. I am looking for a va...",Buyer


In [2]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

texts = df["Text"]
labels = df["Annotator 1"]

vectorizer = CountVectorizer()

X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

classifier = RandomForestClassifier(n_estimators=100)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

In [3]:
from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_test, y_pred, average='weighted')  # weighted average, since the labels are dominated by Other and Buyer
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print("Precision:", precision)
print("Recall:", recall)
print("F1-Score:", f1)

Precision: 0.5998168498168499
Recall: 0.7307692307692307
F1-Score: 0.6563545150501672




## Classifier Decision

The machine learning classifier I chose was the Random Forest classifier. It combines many decision trees, which makes for a more robust model. It is also less likely to overfit than a single Decision Tree classifier, which I felt was necessary because we are working with a small dataset that is annotated mostly as Other and Buyer.

## Metrics for Just My Annotations

Overall, this classifier performs reasonably but far from perfectly: weighted precision is about 0.60, recall about 0.73, and F1 about 0.66. There are likely a few reasons for this. I think the biggest is class imbalance in the annotations: there are far more Buyer and Other labels than any other category. This most likely pushes the model to predict those two labels more often than the rest.
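
To make the effect of the imbalance on these metrics concrete, here is a minimal pure-Python sketch of support-weighted precision and recall. The toy labels (including `seller`) and the helper name `weighted_precision_recall` are hypothetical and not taken from the dataset. In this sketch a class that is never predicted is scored with a precision of 0, which is one way weighted precision can end up below weighted recall.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Support-weighted precision and recall, computed per class."""
    support = Counter(y_true)      # true items per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = 0.0
    for label, count in support.items():
        # A class that is never predicted gets precision 0 in this sketch.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        precision += p * count / n
        recall += r * count / n
    return precision, recall


# Toy data: the single "seller" item is swallowed by the larger "other" class.
y_true = ["buyer"] * 5 + ["other"] * 4 + ["seller"]
y_pred = ["buyer"] * 5 + ["other"] * 5

precision, recall = weighted_precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that support-weighted recall is the same number as plain accuracy, so on its own it says little about how the rare categories are handled.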

# IAA

In [13]:
from statsmodels.stats.inter_rater import fleiss_kappa

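df = pd.read_excel('Data/Annotations.xlsx')
df.drop(["Text"], axis=1, inplace=True)  # the text itself is not a rating
df["Annotator 1"] = df["Annotator 1"].str.lower()  # match the lowercase labels used by the other annotators
annotations = df

# Build an items x categories table: how many annotators chose each label for each text
categories = annotations.apply(pd.Series.value_counts, axis=1).fillna(0)
categories = categories.astype(int)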

kappa_score = fleiss_kappa(categories.values)

print("Fleiss' Kappa:", kappa_score)

Fleiss' Kappa: 0.6868586236912926


## IAA Metric

For this assignment I used Fleiss' Kappa, for two main reasons. The first is that I had multiple annotators, 5 to be specific (me included). I couldn't use Cohen's Kappa or Scott's Pi, as both are designed for 2 annotators rather than 5, so Fleiss' Kappa was the natural choice. The other major reason is the adjustment for chance agreement. Since this was given to 4 of my friends to annotate, I worried some might go through it quickly without looking closely at every item, so I wanted to ensure that agreement by chance was accounted for. This seems to have worked, as I am getting ~0.69 agreement.
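
To show what the chance adjustment does, here is a minimal pure-Python sketch of the Fleiss' Kappa calculation. The function name `fleiss_kappa_from_labels` and the toy ratings are hypothetical, and it assumes every text was labeled by the same number of annotators. Observed agreement is the share of annotator pairs that agree on each text, and chance agreement comes from the overall label proportions.

```python
from collections import Counter


def fleiss_kappa_from_labels(rows):
    """Fleiss' kappa for rows of labels, one row per item, same rater count per row."""
    n_items = len(rows)
    n_raters = len(rows[0])
    counts = [Counter(row) for row in rows]  # label -> votes, per item

    # Observed agreement: share of rater pairs that agree, averaged over items.
    p_bar = sum(
        (sum(c * c for c in item.values()) - n_raters) / (n_raters * (n_raters - 1))
        for item in counts
    ) / n_items

    # Chance agreement: sum of squared overall label proportions.
    totals = Counter()
    for item in counts:
        totals.update(item)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical ratings: three texts, five annotators each.
toy_rows = [
    ["buyer"] * 5,
    ["other"] * 5,
    ["buyer", "buyer", "other", "other", "other"],
]
kappa = fleiss_kappa_from_labels(toy_rows)
print(round(kappa, 4))
```

The sketch divides by zero when every annotator uses a single label for every text, because chance agreement is then 1; a real implementation would need to handle that case.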

## Ways to Get Better Annotations
The first way I could think of was to clarify and define what each category means. While looking through the data, I noticed that some of the disagreement over which category was chosen was most likely because annotators didn't actually understand the categories. As a result, some of my friends chose categories that I personally wouldn't have, although that is somewhat expected with different annotators. Before handing them the data, I should have clarified each category and possibly given an example of each, so that everyone thought of the categories in a similar way.

The second way I can think of is something we did in assignment 8: have the annotators annotate several times. I could then create a new column holding each annotator's most frequently selected category for each visitor text. This would help annotators in a few ways. When I had to annotate something twice, I was much more confident throughout the second pass, since I already had a sense of what I would be seeing and could compare against what I chose before. This helped me make not only more annotations but, I feel, better ones as well. It could have pushed the agreement up even further.

I think the main idea I am getting at is that each annotator should have a good grasp of what they are annotating, whether that's by telling them up front what each category means or by having them annotate multiple times so they build a better understanding. While you could try other methods, such as setting a time limit so they have to choose whatever comes to mind first, I think it is a better use of time to ensure that the annotators grasp what they are annotating.
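
The repeated-annotation idea could be sketched as follows. The pass data and the helper `most_selected` are hypothetical, and breaking ties alphabetically is an arbitrary choice made only to keep the result deterministic.

```python
from collections import Counter


def most_selected(labels):
    """Return the most frequent label; ties go to the alphabetically first label."""
    counts = Counter(label.strip().lower() for label in labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


# Hypothetical: one annotator labels the same three texts in three passes.
passes = [
    ["buyer", "other", "other"],   # pass 1
    ["buyer", "buyer", "other"],   # pass 2
    ["Buyer", "buyer", "seller"],  # pass 3
]

# zip(*passes) regroups the labels by text instead of by pass.
consolidated = [most_selected(per_text) for per_text in zip(*passes)]
print(consolidated)
```

Labels are lowercased before counting so that "Buyer" and "buyer" are treated as the same category.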

In [5]:
df = pd.read_excel('Data/Annotations.xlsx')

df

Unnamed: 0,Text,Annotator 1,Annotator 2,Annotator 3,Annotator 4,Annotator 5
0,I believe there is,Other,other,other,other,other
1,HI! I texted the number on the site and am wai...,Buyer,buyer,buyer,buyer,buyer
2,"Hello, We have been searching homes and finall...",Buyer,buyer,buyer,buyer,buyer
3,I will be in Florida in mid October and we wil...,Buyer,buyer,buyer,buyer,buyer
4,Hello. Saw Ken's YouTube on Airbnb in reunion ...,Buyer,other,other,other,buyer
...,...,...,...,...,...,...
122,"Not right now. Wendy Feel free to browse, I'm ...",Other,other,other,other,other
123,"Hello, Alicia",Other,other,other,other,other
124,who are you? Faith I'm a customer service repr...,Other,realtor network,realtor network,realtor network,realtor network
125,"Hello, my name is Jeady. I am looking for a va...",Buyer,other,other,other,buyer


In [6]:
annotator_columns = ["Annotator 1", "Annotator 2", "Annotator 3", "Annotator 4", "Annotator 5"]
df["Annotator 1"] = df["Annotator 1"].str.lower()
df

Unnamed: 0,Text,Annotator 1,Annotator 2,Annotator 3,Annotator 4,Annotator 5
0,I believe there is,other,other,other,other,other
1,HI! I texted the number on the site and am wai...,buyer,buyer,buyer,buyer,buyer
2,"Hello, We have been searching homes and finall...",buyer,buyer,buyer,buyer,buyer
3,I will be in Florida in mid October and we wil...,buyer,buyer,buyer,buyer,buyer
4,Hello. Saw Ken's YouTube on Airbnb in reunion ...,buyer,other,other,other,buyer
...,...,...,...,...,...,...
122,"Not right now. Wendy Feel free to browse, I'm ...",other,other,other,other,other
123,"Hello, Alicia",other,other,other,other,other
124,who are you? Faith I'm a customer service repr...,other,realtor network,realtor network,realtor network,realtor network
125,"Hello, my name is Jeady. I am looking for a va...",buyer,other,other,other,buyer


In [7]:
texts = df["Text"]
annotations = df[["Annotator 1", "Annotator 2", "Annotator 3", "Annotator 4", "Annotator 5"]] 

# Smooth the data
smoothed_labels = []
for _, row in annotations.iterrows():
    label_counts = row.value_counts(normalize=True) 
    majority_class = label_counts.idxmax()  
    
    majority_weight = 0.9
    num_other_classes = len(label_counts) - 1

    smoothed_label = {}
    if num_other_classes > 0:
        other_weight = (1 - majority_weight) / num_other_classes
        smoothed_label = {label: other_weight for label in label_counts.index}
        smoothed_label[majority_class] = majority_weight
    else:
        smoothed_label[majority_class] = 1.0
    
    smoothed_labels.append(smoothed_label)

synthetic_texts, synthetic_labels = [], []
for i, text in enumerate(texts):
    for label, weight in smoothed_labels[i].items():
        samples = round(weight * 100)  # round(), since int() would truncate e.g. 9.999... to 9
        synthetic_texts.extend([text] * samples)
        synthetic_labels.extend([label] * samples)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(synthetic_texts)
y = np.array(synthetic_labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

classifier = RandomForestClassifier(n_estimators=100)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

In [8]:
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print("Precision:", precision)
print("Recall:", recall)
print("F1-Score:", f1)

Precision: 0.9627017815598478
Recall: 0.9659944642151048
F1-Score: 0.963971326407356




## Improvements

Based on the scores that were calculated, we see a major improvement over the first model (weighted F1 of about 0.96 versus about 0.66). I attribute most of this to smoothing. Smoothing let me use every annotator's labels, giving the majority label a weight of 0.9 and splitting the remaining 0.1 among the other labels chosen for that text, to account for variability in how different annotators interpret the same data. One caveat: each text is duplicated before the train/test split, so copies of the same text can appear in both sets, and these scores are likely optimistic. I tried this method after finding that taking the majority label for each datapoint didn't improve results (at least in initial testing; it seems to now). I have provided that code below as well, in case smoothing changed the model code too much. Either way, results seem to have improved with more annotators, again most likely due to better-classified data.
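
A variant worth trying is to split the original messages first and duplicate them afterwards, so that no message can land in both the training and test sets. The sketch below is a minimal illustration under assumptions: the helper names `smooth` and `expand`, the toy messages, and the 80/20 cut are all hypothetical, and vectorizing and fitting the classifier would follow as in the cells above.

```python
import random
from collections import Counter


def smooth(row, majority_weight=0.9):
    """Give the majority label majority_weight; share the rest among the other labels chosen."""
    counts = Counter(row)
    majority = counts.most_common(1)[0][0]
    others = [label for label in counts if label != majority]
    if not others:
        return {majority: 1.0}
    weights = {label: (1 - majority_weight) / len(others) for label in others}
    weights[majority] = majority_weight
    return weights


def expand(texts, rows, copies=100):
    """Repeat each text in proportion to its smoothed label weights."""
    out_texts, out_labels = [], []
    for text, row in zip(texts, rows):
        for label, weight in smooth(row).items():
            n = round(weight * copies)  # round(), since int() would truncate 9.999... to 9
            out_texts.extend([text] * n)
            out_labels.extend([label] * n)
    return out_texts, out_labels


# Hypothetical messages and annotator labels.
texts = [f"message {i}" for i in range(10)]
rows = [["buyer"] * 5 if i % 2 else ["buyer", "other", "other", "other", "buyer"]
        for i in range(10)]

# Split the ORIGINAL messages first, then duplicate within each side.
indices = list(range(len(texts)))
random.Random(0).shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

train_texts, train_labels = expand([texts[i] for i in train_idx], [rows[i] for i in train_idx])
test_texts, test_labels = expand([texts[i] for i in test_idx], [rows[i] for i in test_idx])

print(len(set(train_texts) & set(test_texts)))  # shared messages between the two sets
```

Scores measured this way reflect performance on messages the model has not seen in any copy, so they are more comparable to the single-annotator and majority-label results.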

In [9]:
df = pd.read_excel('Data/Annotations.xlsx')

df

Unnamed: 0,Text,Annotator 1,Annotator 2,Annotator 3,Annotator 4,Annotator 5
0,I believe there is,Other,other,other,other,other
1,HI! I texted the number on the site and am wai...,Buyer,buyer,buyer,buyer,buyer
2,"Hello, We have been searching homes and finall...",Buyer,buyer,buyer,buyer,buyer
3,I will be in Florida in mid October and we wil...,Buyer,buyer,buyer,buyer,buyer
4,Hello. Saw Ken's YouTube on Airbnb in reunion ...,Buyer,other,other,other,buyer
...,...,...,...,...,...,...
122,"Not right now. Wendy Feel free to browse, I'm ...",Other,other,other,other,other
123,"Hello, Alicia",Other,other,other,other,other
124,who are you? Faith I'm a customer service repr...,Other,realtor network,realtor network,realtor network,realtor network
125,"Hello, my name is Jeady. I am looking for a va...",Buyer,other,other,other,buyer


In [10]:
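annotator_columns = ["Annotator 1", "Annotator 2", "Annotator 3", "Annotator 4", "Annotator 5"]
# Lowercase before voting so "Buyer" and "buyer" count as the same label
df['majority_label'] = df[annotator_columns].apply(lambda col: col.str.lower()).mode(axis=1)[0]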
df

Unnamed: 0,Text,Annotator 1,Annotator 2,Annotator 3,Annotator 4,Annotator 5,majority_label
0,I believe there is,Other,other,other,other,other,other
1,HI! I texted the number on the site and am wai...,Buyer,buyer,buyer,buyer,buyer,buyer
2,"Hello, We have been searching homes and finall...",Buyer,buyer,buyer,buyer,buyer,buyer
3,I will be in Florida in mid October and we wil...,Buyer,buyer,buyer,buyer,buyer,buyer
4,Hello. Saw Ken's YouTube on Airbnb in reunion ...,Buyer,other,other,other,buyer,other
...,...,...,...,...,...,...,...
122,"Not right now. Wendy Feel free to browse, I'm ...",Other,other,other,other,other,other
123,"Hello, Alicia",Other,other,other,other,other,other
124,who are you? Faith I'm a customer service repr...,Other,realtor network,realtor network,realtor network,realtor network,realtor network
125,"Hello, my name is Jeady. I am looking for a va...",Buyer,other,other,other,buyer,other


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = df['Text']
labels = df['majority_label']

vectorizer = CountVectorizer()

X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

In [12]:
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print("Precision:", precision)
print("Recall:", recall)
print("F1-Score:", f1)

Precision: 0.7457983193277311
Recall: 0.7692307692307693
F1-Score: 0.752913752913753


