# Manual Annotation

To keep the manual annotation tool consistent with the automatic annotator (DeepFace), we use the same categories that DeepFace returns for gender, race, and age.

Automatic Annotation Categories:

**Gender:** DeepFace typically returns "Man" or "Woman" for gender.

**Race/Ethnicity:** DeepFace returns a dominant race from the following categories:

asian
indian
black
white
middle eastern
latino hispanic
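
DeepFace spells these labels in lower case without punctuation, while the manual tool offers display labels such as "Latino/Hispanic". Before comparing the two sources, both spellings need to be normalized to one form. A minimal sketch of one way to do that; the helper name `canonical_race` is hypothetical and not part of DeepFace:

```python
def canonical_race(label):
    """Map a manual or DeepFace race label to a lower-case canonical form.

    'Latino/Hispanic' (manual tool) and 'latino hispanic' (DeepFace) both
    become 'latino hispanic'; extra whitespace is collapsed.
    """
    return " ".join(label.replace("/", " ").lower().split())


# Display labels offered by the manual annotation tool
MANUAL_RACES = ["Asian", "Indian", "Black", "White", "Middle Eastern", "Latino/Hispanic"]
# DeepFace race categories as listed above
DEEPFACE_RACES = ["asian", "indian", "black", "white", "middle eastern", "latino hispanic"]

# Each manual label lines up with exactly one DeepFace category
print([canonical_race(r) for r in MANUAL_RACES] == DEEPFACE_RACES)
```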

**Age:** DeepFace returns an estimated age as an integer. For comparison with the manual annotations, both the automatic and the manual ages are converted into age groups.
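
One way to compare an integer estimate with the manual labels is to bin it into the same age groups that annotators choose from. A minimal sketch, assuming ages 50 to 64 count as "Older Adult" and 65 and above as "Elderly"; the function name `age_to_manual_group` is hypothetical:

```python
# Exclusive upper bound and name of each manual age group, in ascending order.
# Assumption: 50-64 is "Older Adult" and 65+ is "Elderly".
AGE_GROUPS = [
    (2, "Baby"),                # 0-1
    (10, "Child"),              # 2-9
    (20, "Adolescent"),         # 10-19
    (30, "Young Adult"),        # 20-29
    (50, "Middle-aged Adult"),  # 30-49
    (65, "Older Adult"),        # 50-64
]


def age_to_manual_group(age):
    """Return the manual age-group name for a numeric age estimate."""
    for upper, label in AGE_GROUPS:
        if age < upper:
            return label
    return "Elderly"


print(age_to_manual_group(27))  # prints Young Adult
```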

# Annotation Guide

You will annotate 12 images, choosing one answer in each of three categories: age, gender, and ethnicity.

### Instructions
1. **Enter a Nickname**: Please start by entering a nickname that cannot be linked to your identity. This ensures anonymity in the annotation process.
  
2. **Annotate Each Image**: You will then see the first image. For each image, select one option for each category: **age**, **gender**, and **ethnicity**. Your choices should be based on your best judgment using the definitions provided below:
   - **Age Groups**: Choose from the following:
      - **Baby** (0–1 year)
      - **Child** (2–9 years)
      - **Adolescent** (10–19 years)
      - **Young Adult** (20–29 years)
      - **Middle-aged Adult** (30–49 years)
      - **Older Adult** (50–64 years)
      - **Elderly** (65+ years)
      
   - **Gender**: Choose from **Man** or **Woman**. Note that this category refers to perceived gender based on appearance.
   
   - **Ethnicity**: Choose from the following categories:
      - **Asian**
      - **Indian**
      - **Black**
      - **White**
      - **Middle Eastern**
      - **Latino/Hispanic**
      
   Please base your judgments on visible traits. Keep in mind that these labels are intended for research into automated systems and may not capture the complexity of individual identities.

3. **Save Your Selections**: After selecting an option for all three categories, click on "Save."

4. **Proceed to the Next Image**: Click "Next" to move to the next image.

5. **Repeat the Process**: Continue this process for all images.

6. **Review Completed Annotations**: After completing all annotations, click "Next" to cycle through the images once more and check that each one shows your saved selections. An image with empty fields has not been saved yet.

### Important Notes
- **Accuracy and Judgment**: Please annotate each image as accurately as possible, based on your best judgment. If you are uncertain, make your best guess. This helps us assess which images are challenging for annotators, an important part of the analysis.
  
- **Ground Truth and Consistency**: Since this dataset consists of unknown individuals, we do not have exact demographic information. We aim to use your annotations to establish a "ground truth" based on consensus among annotators, which will help us evaluate automated system performance.

- **Ethical Considerations**: Recognize that categories like age, gender, and ethnicity involve subjective assessment and can be sensitive. The purpose of this annotation is to evaluate biases in automated systems and improve their accuracy, not to define individuals. Please approach the task respectfully, keeping in mind the limitations of these categories.

- **Why Your Input Matters**: Your responses are crucial for understanding annotation consistency, identifying challenging cases, and evaluating automated system performance. Your work helps build more accurate and fair models in the future.

In [None]:
import tkinter as tk
from tkinter import ttk, messagebox, simpledialog
from PIL import Image, ImageTk
import pandas as pd
import os
from functools import reduce
import matplotlib.pyplot as plt

In [None]:
class AnnotationTool:
    def __init__(self, root, image_dir, annotations_file):
        self.root = root
        self.image_dir = image_dir
        self.annotations_file = annotations_file

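        # Load images from the directory in a stable, sorted order (extension match is case-insensitive)
        self.image_files = sorted(
            f for f in os.listdir(image_dir) if f.lower().endswith(('.jpg', '.jpeg', '.png'))
        )
        if not self.image_files:
            raise ValueError(f"No .jpg, .jpeg or .png images found in {image_dir}")
        self.current_index = 0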

        # Load or create annotation DataFrame
        self.annotations_df = self.load_or_create_annotations()

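        # Store the current image reference to avoid garbage collection
        self.image_ref = None

        # Default annotator name, replaced once the nickname prompt is answered
        self.annotator = "Anonymous"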

        # Build GUI components
        self.build_gui()

        # Defer the nickname prompt until the main window is ready
        self.root.after(100, self.prompt_nickname)

        # Display the first image
        self.display_image()

    def prompt_nickname(self):
        """Prompt the user to enter their nickname for annotations."""
        nickname = simpledialog.askstring("Annotator Name", "Please enter your nickname:")
        if not nickname:
            nickname = "Anonymous"  # Default to Anonymous if no name is provided
        self.annotator = nickname  # Store the nickname for later use

    def load_or_create_annotations(self):
        """Load existing annotations or create a new DataFrame."""
        if os.path.exists(self.annotations_file):
            return pd.read_csv(self.annotations_file)
        else:
            return pd.DataFrame(columns=["Image", "Age", "Gender", "Ethnicity", "Annotation_by"])

    def build_gui(self):
        """Setup the layout of the GUI components."""
        # Image display label
        self.image_label = tk.Label(self.root)
        self.image_label.grid(row=0, column=0, columnspan=2)

        # Age dropdown (Combobox); readonly so only the listed options can be chosen
        tk.Label(self.root, text="Age:").grid(row=1, column=0)
        self.age_combobox = ttk.Combobox(
            self.root,
            state="readonly",
            values=[
                "Baby (0-1)", "Child (2-9)", "Adolescent (10-19)",
                "Young Adult (20-29)", "Middle-aged Adult (30-49)",
                "Older Adult (50-64)", "Elderly (65+)"
            ]
        )
        self.age_combobox.grid(row=1, column=1)

        # Gender dropdown (Combobox)
        tk.Label(self.root, text="Gender:").grid(row=2, column=0)
        self.gender_combobox = ttk.Combobox(self.root, state="readonly", values=["Man", "Woman"])
        self.gender_combobox.grid(row=2, column=1)

        # Ethnicity dropdown (Combobox)
        tk.Label(self.root, text="Ethnicity:").grid(row=3, column=0)
        self.ethnicity_combobox = ttk.Combobox(
            self.root,
            state="readonly",
            values=["Asian", "Indian", "Black", "White", "Middle Eastern", "Latino/Hispanic"]
        )
        self.ethnicity_combobox.grid(row=3, column=1)

        # Save button
        self.save_button = tk.Button(self.root, text="Save", command=self.save_annotation)
        self.save_button.grid(row=4, column=0)

        # Next image button
        self.next_button = tk.Button(self.root, text="Next", command=self.next_image)
        self.next_button.grid(row=4, column=1)

        # Reminder label
        self.reminder_label = tk.Label(self.root, text="Please click on Save after annotating the image.", fg="blue")
        self.reminder_label.grid(row=5, column=0, columnspan=2)

    def display_image(self):
        """Load and display the current image, and populate any existing annotations."""
        image_path = os.path.join(self.image_dir, self.image_files[self.current_index])
        img = Image.open(image_path)
        img.thumbnail((400, 400))  # Resize image for display
        
        # Store the image in a class attribute to prevent garbage collection
        self.image_ref = ImageTk.PhotoImage(img)

        # Display image in the label
        self.image_label.config(image=self.image_ref)

        # Load existing annotations if any
        current_image = self.image_files[self.current_index]
        self.load_existing_annotations(current_image)

    def load_existing_annotations(self, current_image):
        """Load existing annotations for the current image if available."""
        if current_image in self.annotations_df['Image'].values:
            annotation = self.annotations_df[self.annotations_df['Image'] == current_image].iloc[0]
            self.age_combobox.set(annotation['Age'])
            self.gender_combobox.set(annotation['Gender'])
            self.ethnicity_combobox.set(annotation['Ethnicity'])
        else:
            # Clear entries if no annotation is found
            self.age_combobox.set('')
            self.gender_combobox.set('')
            self.ethnicity_combobox.set('')

    def save_annotation(self):
        """Save the current annotation to the DataFrame."""
        current_image = self.image_files[self.current_index]
        age = self.age_combobox.get()
        gender = self.gender_combobox.get()
        ethnicity = self.ethnicity_combobox.get()

        # Check if all fields are filled in
        if not age or not gender or not ethnicity:
            messagebox.showwarning("Incomplete Data", "Please fill in all fields before saving.")
            return

        # Update the existing row for this image, or add a new one
        if current_image in self.annotations_df['Image'].values:
            self.annotations_df.loc[self.annotations_df['Image'] == current_image, ['Age', 'Gender', 'Ethnicity', 'Annotation_by']] = [age, gender, ethnicity, self.annotator]
        else:
            # DataFrame.append was removed in pandas 2.0; concatenate a one-row DataFrame instead
            new_row = pd.DataFrame([{"Image": current_image, "Age": age, "Gender": gender, "Ethnicity": ethnicity, "Annotation_by": self.annotator}])
            self.annotations_df = pd.concat([self.annotations_df, new_row], ignore_index=True)

        # Save annotations to file
        self.annotations_df.to_csv(self.annotations_file, index=False)

        # Show a confirmation popup
        messagebox.showinfo("Saved", f"Annotation for {current_image} saved successfully!")

    def next_image(self):
        """Navigate to the next image."""
        if self.current_index < len(self.image_files) - 1:
            self.current_index += 1
        else:
            self.current_index = 0  # Loop back to the first image
        self.display_image()

# Main section to run the GUI
if __name__ == '__main__':
    # Create the root window
    root = tk.Tk()
    root.title("Image Annotation Tool")

    # Specify the directory where images are located and the CSV file to save annotations
    image_directory = '../../datasets/manual'  # Replace with your directory
    annotations_file = 'annotations_K.csv'  # CSV file for saving annotations

    # Initialize the annotation tool
    app = AnnotationTool(root, image_directory, annotations_file)

    # Start the GUI loop
    root.mainloop()
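
The save step either updates the row for the current image or adds a new one. Because `DataFrame.append` no longer exists in pandas 2.x, it helps to be able to check the update-or-insert logic without opening the GUI. A minimal, GUI-free sketch using `pd.concat`; the function name `upsert_annotation` is hypothetical, and it assumes the column layout created by `load_or_create_annotations`:

```python
import pandas as pd

COLUMNS = ["Image", "Age", "Gender", "Ethnicity", "Annotation_by"]


def upsert_annotation(df, image, age, gender, ethnicity, annotator):
    """Return a copy of df with the annotation for `image` updated or appended."""
    values = [image, age, gender, ethnicity, annotator]
    df = df.copy()
    mask = df["Image"] == image
    if mask.any():
        # Overwrite the existing row for this image
        df.loc[mask, COLUMNS] = values
        return df
    # Append a one-row DataFrame (skip concat when df is still empty)
    new_row = pd.DataFrame([values], columns=COLUMNS)
    return new_row if df.empty else pd.concat([df, new_row], ignore_index=True)


# Usage: add one image, then correct its age label
annotations = pd.DataFrame(columns=COLUMNS)
annotations = upsert_annotation(annotations, "img_01.jpg", "Child (2-9)", "Woman", "Asian", "K")
annotations = upsert_annotation(annotations, "img_01.jpg", "Adolescent (10-19)", "Woman", "Asian", "K")
print(annotations)
```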


In [None]:
annot1_data = pd.read_csv('annotations_K.csv')
annot2_data = pd.read_csv('annotations_G.csv')  # second annotator, required by the merge below
#annot3_data = pd.read_csv('annotations_M.csv')
#annot4_data = pd.read_csv('annotations_I.csv')

In [None]:
annot1_data

In [None]:
Test_data = pd.read_csv('face_analysis_results.csv')
Test_data

In [None]:
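# List of DataFrames to merge: two manual annotation files and the DeepFace results
dfs = [annot1_data, annot2_data, Test_data]

# Inner-join all DataFrames on 'Image' (images missing from any file are dropped)
merged_data = reduce(lambda left, right: pd.merge(left, right, on='Image'), dfs)

# Rename columns for clarity. The annotation tool writes an 'Ethnicity' column;
# older annotation files may use 'Race'. Keys that are absent are ignored.
merged_data = merged_data.rename(columns={
    'Age_x': 'Age_manual_1',
    'Gender_x': 'Gender_manual_1',
    'Ethnicity_x': 'Race_manual_1',
    'Race_x': 'Race_manual_1',
    'Annotation_by_x': 'Annotation_by_manual_1',
    'Age_y': 'Age_manual_2',
    'Gender_y': 'Gender_manual_2',
    'Ethnicity_y': 'Race_manual_2',
    'Race_y': 'Race_manual_2',
    'Annotation_by_y': 'Annotation_by_manual_2',
    'Age': 'Age_automatic',
    'Gender': 'Gender_automatic',
    'Race': 'Race_automatic'
})

# Normalize race labels so that manual ('Latino/Hispanic') and DeepFace
# ('latino hispanic') spellings compare equal
for col in ['Race_manual_1', 'Race_manual_2', 'Race_automatic']:
    merged_data[col] = merged_data[col].str.lower().str.replace('/', ' ', regex=False)

# Manual age labels mapped to age groups that both sources can be binned into
MANUAL_AGE_TO_GROUP = {
    'Baby': '0-19', 'Child': '0-19', 'Adolescent': '0-19',
    'Young Adult': '20-29',
    'Middle-aged Adult': '30-49',
    'Older Adult': '50-64',
    'Elderly': '65+',
}

def create_age_group(age):
    """Map a manual label such as 'Young Adult (20-29)' or a numeric DeepFace age to a common age group."""
    if pd.isna(age):
        return None
    if isinstance(age, str):
        # Strip the ' (20-29)' range suffix and look up the label name
        return MANUAL_AGE_TO_GROUP.get(age.split(' (')[0].strip())
    if age < 20:
        return '0-19'
    elif age < 30:
        return '20-29'
    elif age < 50:
        return '30-49'
    elif age < 65:
        return '50-64'
    else:
        return '65+'

# Apply the age group function to all age columns (manual and automatic)
merged_data['Age_group_manual_1'] = merged_data['Age_manual_1'].apply(create_age_group)
merged_data['Age_group_manual_2'] = merged_data['Age_manual_2'].apply(create_age_group)
merged_data['Age_group_automatic'] = merged_data['Age_automatic'].apply(create_age_group)

# Optionally, save the merged and relabelled data to a new CSV file
merged_data.to_csv('merged_annotations_and_analysis.csv', index=False)

merged_data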
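
Because `pd.merge` defaults to an inner join, any image missing from one of the files is silently dropped from `merged_data`. A minimal sketch of checking for such images with the `indicator` option of `pd.merge`; the toy DataFrames stand in for two of the CSV files, and `find_unmatched_images` is a hypothetical helper:

```python
import pandas as pd


def find_unmatched_images(left, right, key="Image"):
    """Return the images that appear in only one of the two DataFrames."""
    # An outer join with indicator=True adds a '_merge' column whose values
    # are 'left_only', 'right_only' or 'both'
    check = pd.merge(left[[key]], right[[key]], on=key, how="outer", indicator=True)
    unmatched = check[check["_merge"] != "both"]
    return sorted(unmatched[key].tolist())


# Toy stand-ins for a manual annotation file and the DeepFace results
manual = pd.DataFrame({"Image": ["a.jpg", "b.jpg", "c.jpg"]})
automatic = pd.DataFrame({"Image": ["a.jpg", "c.jpg", "d.jpg"]})
print(find_unmatched_images(manual, automatic))  # prints ['b.jpg', 'd.jpg']
```

Running this for each pair of input files before the merge shows how many images the inner join discards.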

# 1. (Dis)agreement Metrics:
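We calculate the percentage agreement between each manual annotation and the automatic annotations for Gender, Race, and Age group, and also between the two manual annotations for Age group.

Gender Agreement:
- Manual 1 vs Automatic
- Manual 2 vs Automatic

Race Agreement:
- Manual 1 vs Automatic
- Manual 2 vs Automatic

Age Group Agreement:
- Manual 1 vs Automatic
- Manual 2 vs Automatic
- Manual 1 vs Manual 2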
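
These figures are simple percentage agreement: the share of images on which two label columns are equal. A missing label (NaN) never compares equal, so with a plain `==` comparison it counts as a disagreement. A minimal sketch of a helper that instead ignores images where either label is missing; the name `percent_agreement` is hypothetical:

```python
import pandas as pd


def percent_agreement(a, b):
    """Percentage of rows where a == b, ignoring rows where either value is missing."""
    both = pd.DataFrame({"a": a, "b": b}).dropna()
    if both.empty:
        return float("nan")
    return (both["a"] == both["b"]).mean() * 100


manual = pd.Series(["Man", "Woman", "Woman", None])
automatic = pd.Series(["Man", "Man", "Woman", "Woman"])

# The missing manual label is skipped: 2 matches out of 3 comparable rows
print(percent_agreement(manual, automatic))
# A plain comparison counts the missing label as a mismatch: 2 out of 4
print((manual == automatic).mean() * 100)
```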

In [None]:
# Gender agreement between Manual 1 and Automatic
gender_agreement_manual1_automatic = (merged_data['Gender_manual_1'] == merged_data['Gender_automatic']).mean() * 100

# Gender agreement between Manual 2 and Automatic
gender_agreement_manual2_automatic = (merged_data['Gender_manual_2'] == merged_data['Gender_automatic']).mean() * 100

print(f"Gender Agreement between Manual 1 and Automatic: {gender_agreement_manual1_automatic}%")
print(f"Gender Agreement between Manual 2 and Automatic: {gender_agreement_manual2_automatic}%")

# Race agreement between Manual 1 and Automatic
race_agreement_manual1_automatic = (merged_data['Race_manual_1'] == merged_data['Race_automatic']).mean() * 100

# Race agreement between Manual 2 and Automatic
race_agreement_manual2_automatic = (merged_data['Race_manual_2'] == merged_data['Race_automatic']).mean() * 100

print(f"Race Agreement between Manual 1 and Automatic: {race_agreement_manual1_automatic}%")
print(f"Race Agreement between Manual 2 and Automatic: {race_agreement_manual2_automatic}%")

# Calculate agreement between Manual 1 and Automatic for Age Groups
age_group_agreement_manual1_automatic = (merged_data['Age_group_manual_1'] == merged_data['Age_group_automatic']).mean() * 100

# Calculate agreement between Manual 2 and Automatic for Age Groups
age_group_agreement_manual2_automatic = (merged_data['Age_group_manual_2'] == merged_data['Age_group_automatic']).mean() * 100

# Calculate agreement between Manual 1 and Manual 2 for Age Groups
age_group_agreement_manual1_manual2 = (merged_data['Age_group_manual_1'] == merged_data['Age_group_manual_2']).mean() * 100

# Display the results
print(f"Age Group Agreement between Manual 1 and Automatic: {age_group_agreement_manual1_automatic}%")
print(f"Age Group Agreement between Manual 2 and Automatic: {age_group_agreement_manual2_automatic}%")
print(f"Age Group Agreement between Manual 1 and Manual 2: {age_group_agreement_manual1_manual2}%")


# 2. Bias Detection:
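To investigate bias, we cross-tabulate the manual race labels (rows) against the automatic predictions (columns). Diagonal cells count agreements; large off-diagonal counts show races that the automatic system consistently predicts in place of the manual label, i.e. over- or underpredicts relative to the manual annotations.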
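
Raw counts are hard to compare when races appear with different frequencies. A minimal sketch that normalizes each row with `pd.crosstab(..., normalize="index")`, so each row shows how the automatic predictions are distributed for one manually assigned race; the toy labels are illustrative only:

```python
import pandas as pd

manual = pd.Series(["asian", "asian", "black", "white", "white", "white"], name="manual")
automatic = pd.Series(["asian", "white", "black", "white", "white", "asian"], name="automatic")

# Row-normalized confusion table: each row sums to 1
table = pd.crosstab(manual, automatic, normalize="index")
print(table.round(2))

# Share of each manual class that the automatic system labels as a different race
mismatch_rate = 1 - pd.Series(
    {race: table.loc[race, race] if race in table.columns else 0.0 for race in table.index}
)
print(mismatch_rate.round(2))
```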

In [None]:
# Bias in Race predictions (Manual 1 vs Automatic)
race_bias_manual1_automatic = merged_data.groupby(['Race_manual_1', 'Race_automatic']).size().unstack(fill_value=0)
print("Race Bias (Manual 1 vs Automatic):")
print(race_bias_manual1_automatic)

# Bias in Race predictions (Manual 2 vs Automatic)
race_bias_manual2_automatic = merged_data.groupby(['Race_manual_2', 'Race_automatic']).size().unstack(fill_value=0)
print("Race Bias (Manual 2 vs Automatic):")
print(race_bias_manual2_automatic)

# 3. Agreement Metrics by Category:
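We also calculate agreement within each category (e.g., each gender or race, as labelled by the annotator) to see whether the automatic system agrees less with the manual annotations for specific subgroups. If the manual labels are treated as ground truth, this per-group agreement is the automatic system's per-class recall.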

Agreement by Gender:

In [None]:
# Agreement by Gender (Manual 1 vs Automatic): share of matches within each manual gender
matches_gender_1 = merged_data['Gender_manual_1'] == merged_data['Gender_automatic']
gender_group_agreement_manual1 = matches_gender_1.groupby(merged_data['Gender_manual_1']).mean() * 100

# Agreement by Gender (Manual 2 vs Automatic)
matches_gender_2 = merged_data['Gender_manual_2'] == merged_data['Gender_automatic']
gender_group_agreement_manual2 = matches_gender_2.groupby(merged_data['Gender_manual_2']).mean() * 100

print("Gender Agreement by Gender (Manual 1 vs Automatic):")
print(gender_group_agreement_manual1)

print("Gender Agreement by Gender (Manual 2 vs Automatic):")
print(gender_group_agreement_manual2)

Agreement by Race:

In [None]:
# Agreement by Race (Manual 1 vs Automatic): share of matches within each manual race
matches_race_1 = merged_data['Race_manual_1'] == merged_data['Race_automatic']
race_group_agreement_manual1 = matches_race_1.groupby(merged_data['Race_manual_1']).mean() * 100

# Agreement by Race (Manual 2 vs Automatic)
matches_race_2 = merged_data['Race_manual_2'] == merged_data['Race_automatic']
race_group_agreement_manual2 = matches_race_2.groupby(merged_data['Race_manual_2']).mean() * 100

print("Race Agreement by Race (Manual 1 vs Automatic):")
print(race_group_agreement_manual1)

print("Race Agreement by Race (Manual 2 vs Automatic):")
print(race_group_agreement_manual2)


# 4. Overall Metrics:
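As an overall measure, we compute the percentage of images on which the automatic system matches a manual annotator on both gender and race at the same time. Because the manual labels are not verified ground truth, this is a joint agreement rate rather than accuracy in the strict sense.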

In [None]:
# Overall accuracy for Gender and Race (Manual 1 vs Automatic)
overall_accuracy_manual1 = ((merged_data['Gender_manual_1'] == merged_data['Gender_automatic']) & 
                            (merged_data['Race_manual_1'] == merged_data['Race_automatic'])).mean() * 100

# Overall accuracy for Gender and Race (Manual 2 vs Automatic)
overall_accuracy_manual2 = ((merged_data['Gender_manual_2'] == merged_data['Gender_automatic']) & 
                            (merged_data['Race_manual_2'] == merged_data['Race_automatic'])).mean() * 100

print(f"Overall Accuracy (Manual 1 vs Automatic): {overall_accuracy_manual1}%")
print(f"Overall Accuracy (Manual 2 vs Automatic): {overall_accuracy_manual2}%")
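
The annotation guide proposes using consensus among annotators as a working ground truth. A minimal sketch of that idea for two annotators: keep only images where both manual labels agree, then measure the automatic system against that consensus. It assumes the column names used in `merged_data` above; the toy frame and the function name `consensus_agreement` are illustrative:

```python
import pandas as pd


def consensus_agreement(df, manual_1, manual_2, automatic):
    """Agreement (%) with the automatic label on images where both annotators agree.

    Returns (agreement_percent, number_of_consensus_images).
    """
    consensus = df[df[manual_1] == df[manual_2]]
    if consensus.empty:
        return float("nan"), 0
    agreement = (consensus[manual_1] == consensus[automatic]).mean() * 100
    return agreement, len(consensus)


toy = pd.DataFrame({
    "Gender_manual_1": ["Man", "Woman", "Woman", "Man"],
    "Gender_manual_2": ["Man", "Woman", "Man", "Man"],
    "Gender_automatic": ["Man", "Man", "Woman", "Man"],
})
print(consensus_agreement(toy, "Gender_manual_1", "Gender_manual_2", "Gender_automatic"))
```

Reporting the number of consensus images alongside the percentage matters: with few images, a high agreement rate on a small consensus subset can be misleading.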


# 5. Inter-Annotator Agreement (IAA):
**Inter-annotator agreement (IAA)** measures how consistently multiple annotators label the same items. **Cohen's Kappa** measures agreement between exactly two annotators (or between an annotator and a classifier); **Fleiss' Kappa** and **Krippendorff's alpha** extend the idea to more than two annotators.
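
The notebook keeps slots for up to four annotator files, so Fleiss' kappa is a natural extension once more than two annotators label the same images. A minimal pure-Python sketch of Fleiss' kappa, assuming every image is labelled by the same number of annotators; `fleiss_kappa` is a hypothetical helper, not a function used elsewhere in this notebook:

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels from n raters.

    Every item must have the same number of ratings (n >= 2). The statistic is
    undefined (division by zero) if all ratings fall into a single category.
    """
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    num_items = len(ratings)

    category_totals = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item
        per_item_agreement.append((sum(c * c for c in counts.values()) - n) / (n * (n - 1)))

    p_bar = sum(per_item_agreement) / num_items
    # Chance agreement from the overall category proportions
    p_e = sum((total / (num_items * n)) ** 2 for total in category_totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Three images, each labelled by three annotators
ratings = [["Man", "Man", "Man"], ["Man", "Woman", "Woman"], ["Woman", "Woman", "Woman"]]
print(fleiss_kappa(ratings))
```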




## 5.1 Cohen's Kappa:


The **Kappa score (Cohen's Kappa)** is a statistical measure used to evaluate the level of agreement between two annotators or classifiers, taking into account the possibility of agreement occurring by chance. It’s often used to assess the reliability of categorical data annotations.

### Cohen's Kappa Formula:
$$
\kappa = \frac{P_o - P_e}{1 - P_e}
$$

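- $P_o$: The observed agreement, i.e. the proportion of items on which the two annotators assign the same label.
- $P_e$: The agreement expected by chance, computed from each annotator's label frequencies as $P_e = \sum_k p_{1k}\, p_{2k}$, where $p_{ak}$ is the proportion of items that annotator $a$ assigned to category $k$.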
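
To make the formula concrete, a minimal pure-Python sketch that computes $P_o$, $P_e$ and $\kappa$ directly from two label lists. `cohens_kappa` is a hypothetical helper; `sklearn.metrics.cohen_kappa_score`, used below, computes the same unweighted statistic:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long lists of labels.

    Undefined (division by zero) when P_e == 1, e.g. both annotators use one label.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of items with identical labels
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# P_o = 0.75 and P_e = 0.5 * 0.25 + 0.5 * 0.75 = 0.5, so kappa = 0.25 / 0.5
print(cohens_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # prints 0.5
```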

### Cohen's Kappa Score Interpretation:
- **1.0**: Perfect agreement.
- **0.8 to 1.0**: Almost perfect agreement.
- **0.6 to 0.8**: Substantial agreement.
- **0.4 to 0.6**: Moderate agreement.
- **0.2 to 0.4**: Fair agreement.
- **0.0 to 0.2**: Slight agreement.
- **<0.0**: Less than chance agreement (negative kappa).
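
Translating kappa values into the verbal labels above can make printed results easier to read. A minimal sketch; because the listed ranges share their endpoints, treating each boundary value as belonging to the higher band is an assumption, as is the "Undefined" label for NaN:

```python
import math


def interpret_kappa(kappa):
    """Map a kappa value to the agreement labels listed above."""
    if math.isnan(kappa):
        return "Undefined"
    if kappa >= 1.0:
        return "Perfect agreement"
    if kappa >= 0.8:
        return "Almost perfect agreement"
    if kappa >= 0.6:
        return "Substantial agreement"
    if kappa >= 0.4:
        return "Moderate agreement"
    if kappa >= 0.2:
        return "Fair agreement"
    if kappa >= 0.0:
        return "Slight agreement"
    return "Less than chance agreement"


print(interpret_kappa(0.5))  # prints Moderate agreement
```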


In [None]:
from sklearn.metrics import cohen_kappa_score

# Calculate Cohen's Kappa for Age Group between Manual 1 and Automatic
kappa_manual1_automatic = cohen_kappa_score(merged_data['Age_group_manual_1'], merged_data['Age_group_automatic'])

# Calculate Cohen's Kappa for Age Group between Manual 2 and Automatic
kappa_manual2_automatic = cohen_kappa_score(merged_data['Age_group_manual_2'], merged_data['Age_group_automatic'])

# Calculate Cohen's Kappa for Age Group between Manual 1 and Manual 2
kappa_manual1_manual2 = cohen_kappa_score(merged_data['Age_group_manual_1'], merged_data['Age_group_manual_2'])

# Display the results
print(f"Cohen's Kappa (Manual 1 vs Automatic): {kappa_manual1_automatic}")
print(f"Cohen's Kappa (Manual 2 vs Automatic): {kappa_manual2_automatic}")
print(f"Cohen's Kappa (Manual 1 vs Manual 2): {kappa_manual1_manual2}")
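
Unweighted kappa treats every age-group mismatch equally, although confusing adjacent groups is a smaller error than confusing the youngest group with the oldest. `cohen_kappa_score` accepts `weights="linear"` or `weights="quadratic"` for ordinal labels, with `labels` giving the category order. A minimal pure-Python sketch of the linearly weighted statistic, so the computation is visible; `weighted_kappa` is a hypothetical helper, and the category order must be passed explicitly and match the age-group strings actually present in the data:

```python
def weighted_kappa(labels_a, labels_b, order):
    """Linearly weighted Cohen's kappa for ordinal labels.

    `order` lists the categories from lowest to highest; a disagreement between
    positions i and j is penalized in proportion to |i - j|. Undefined (division
    by zero) when the expected weighted disagreement is 0.
    """
    index = {label: i for i, label in enumerate(order)}
    k, n = len(order), len(labels_a)
    # Observed confusion counts: rows are annotator A, columns annotator B
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j)
            weighted_observed += w * observed[i][j]
            # Expected count under independence of the two annotators
            weighted_expected += w * row_totals[i] * col_totals[j] / n
    return 1 - weighted_observed / weighted_expected


order = ["0-19", "20-29", "30-49"]
print(weighted_kappa(["0-19", "20-29", "30-49"], ["0-19", "30-49", "30-49"], order))
```

For the same inputs, `cohen_kappa_score(a, b, labels=order, weights="linear")` should give the same value.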