## Motivation and Context

The performance analysis of the baseline classification model, particularly through the confusion matrix, revealed a systematic difficulty in distinguishing between specific categories of toxic relationships. The most significant confusions occurred between:

- Narcisista e Succube (Narcissist and Submissive Partner)

- Psicopatico e Adulatrice (Psychopath and Flatterer)

This overlap is not surprising, as both categories describe relational dynamics characterized by antagonistic personalities (lack of empathy, manipulation, grandiosity) and partners who adopt compliant or submissive adaptation strategies. The subtle distinction between these labels, based solely on the text of the dialogues, represents a significant challenge for any classification model.
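
As a rough illustration of how such confused pairs can be read off a model's predictions, the sketch below counts misclassifications per unordered label pair. The function name `top_confusions` and the toy labels are hypothetical; they are not the baseline model's actual outputs.

```python
from collections import Counter

def top_confusions(y_true, y_pred, k=2):
    """Return the k most frequent unordered pairs of confused labels."""
    pairs = Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label != pred_label:
            # Treat A->B and B->A as the same confusion.
            pairs[tuple(sorted((true_label, pred_label)))] += 1
    return pairs.most_common(k)

# Toy labels for illustration only; not results from the baseline model.
y_true = ["A", "A", "B", "B", "C", "C", "C"]
y_pred = ["B", "A", "A", "A", "C", "A", "C"]
print(top_confusions(y_true, y_pred))
```

Pairs that dominate this ranking are natural candidates for merging.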

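Hypothesis: merging these classes into a single "super-category", Antagonistic and Adaptive Dynamic (`Dinamica Antagonista e Adattiva` in the code), should:

- Create a more robust class with clearer boundaries relative to the remaining classes.

- Simplify the classifier's task by reducing the total number of classes. Merging these two classes alone would take the count from 10 to 9; the `CLASS_MAPPING` applied below goes further, adding `Sadico-Crudele e Masochista` to this group and merging four control-based classes into a second super-category, which leaves 5 classes.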

In [1]:
import json
import os
import pandas as pd
from collections import Counter


# Original input file paths
INPUT_DIR = '../data/processed'
TRAIN_FILE = os.path.join(INPUT_DIR, 'train.json')
TEST_FILE = os.path.join(INPUT_DIR, 'test.json')
VAL_FILE = os.path.join(INPUT_DIR, 'val.json')

# Paths for the new files with merged classes
OUTPUT_DIR = '.'
TRAIN_MERGED_FILE = os.path.join(OUTPUT_DIR, 'train_merged.json')
TEST_MERGED_FILE = os.path.join(OUTPUT_DIR, 'test_merged.json')
VAL_MERGED_FILE = os.path.join(OUTPUT_DIR, 'val_merged.json')

# Mapping dictionary for merging classes.
CLASS_MAPPING = {
    # Group 1: dynamics based on control, dominance, and manipulation
    "Controllore e Isolata": "Dinamica di Controllo e Sottomissione",
    "Manipolatore e Dipendente emotiva": "Dinamica di Controllo e Sottomissione",
    "Dominante e Schiavo emotivo": "Dinamica di Controllo e Sottomissione",
    "Persona violenta e Succube": "Dinamica di Controllo e Sottomissione",

    # Group 2: dynamics based on antagonistic personalities and cruelty
    "Narcisista e Succube": "Dinamica Antagonista e Adattiva",
    "Psicopatico e Adulatrice": "Dinamica Antagonista e Adattiva",
    "Sadico-Crudele e Masochista": "Dinamica Antagonista e Adattiva"
}
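
Because the merge relies on `mapping.get(label, label)`, a misspelled key in the mapping would silently merge nothing. A minimal sanity check, assuming the set of labels present in the data is available, might look like the sketch below. The helper `check_mapping` and the toy labels are hypothetical and not part of the original notebook.

```python
def check_mapping(mapping, known_labels):
    """Return (unknown_keys, n_classes_after) for a label mapping."""
    known = set(known_labels)
    # Keys that never occur in the data would silently merge nothing.
    unknown_keys = sorted(key for key in mapping if key not in known)
    # Labels absent from the mapping are kept unchanged.
    merged = {mapping.get(label, label) for label in known}
    return unknown_keys, len(merged)

# Toy example for illustration only.
toy_mapping = {"a": "X", "b": "X", "typo": "X"}
unknown, n_after = check_mapping(toy_mapping, ["a", "b", "c"])
print(unknown, n_after)
```

Run against the real training labels and `CLASS_MAPPING`, an empty list of unknown keys would indicate that every mapped class actually occurs in the data.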


In [2]:
def merge_classes_in_file(input_path: str, output_path: str, mapping: dict):
    """
    Loads a dataset JSON file, merges the classes specified in the mapping,
    and saves the result to a new JSON file.

    Args:
        input_path (str): Path to the input JSON file.
        output_path (str): Path to the output JSON file.
        mapping (dict): Dictionary defining how to merge the classes.
    """
    try:
        # Load the original dataset
        with open(input_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        new_data = []
        # Iterate over each conversation to update the label
        for conversation in data:
            original_label = conversation['person_couple']
            
            # Apply the mapping: if the label is a key in the dictionary,
            # it is replaced with the corresponding value; otherwise, it remains unchanged.
            new_label = mapping.get(original_label, original_label)
            
            conversation['person_couple'] = new_label
            new_data.append(conversation)

        # Save the new dataset with merged classes
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(new_data, f, ensure_ascii=False, indent=4)
        
        print(f"Processing complete for {input_path}. File saved to {output_path}")
        
    except FileNotFoundError:
        print(f"ERROR: Input file {input_path} not found.")
    except Exception as e:
        print(f"An error occurred while processing {input_path}: {e}")

In [3]:
# List of files to process
files_to_process = [
    (TRAIN_FILE, TRAIN_MERGED_FILE),
    (TEST_FILE, TEST_MERGED_FILE),
    (VAL_FILE, VAL_MERGED_FILE)
]

# Run the function for each file
for input_f, output_f in files_to_process:
    merge_classes_in_file(input_f, output_f, CLASS_MAPPING)

Processing complete for ../data/processed\train.json. File saved to .\train_merged.json
Processing complete for ../data/processed\test.json. File saved to .\test_merged.json
Processing complete for ../data/processed\val.json. File saved to .\val_merged.json


In [4]:
try:
    # Load data into pandas DataFrames for easy analysis
    df_original = pd.read_json(TRAIN_FILE)
    df_merged = pd.read_json(TRAIN_MERGED_FILE)

    print("--- Class Distribution in Original Dataset ---")
    print(df_original['person_couple'].value_counts())
    print("\n" + "="*50 + "\n")
    print("--- Class Distribution in Merged Dataset ---")
    print(df_merged['person_couple'].value_counts())

except FileNotFoundError:
    print(f"Verification not possible. Ensure that files '{TRAIN_FILE}' and '{TRAIN_MERGED_FILE}' exist.")
except Exception as e:
    print(f"An error occurred during verification: {e}")


--- Class Distribution in Original Dataset ---
person_couple
Dominante e Schiavo emotivo                 79
Geloso-Ossessivo e Sottomessa               77
Sadico-Crudele e Masochista                 71
Manipolatore e Dipendente emotiva           71
Narcisista e Succube                        71
Psicopatico e Adulatrice                    70
Vittimista e Croccerossina                  69
Controllore e Isolata                       68
Perfezionista Critico e Insicura Cronica    63
Persona violenta e Succube                  61
Name: count, dtype: int64


--- Class Distribution in Merged Dataset ---
person_couple
Dinamica di Controllo e Sottomissione       279
Dinamica Antagonista e Adattiva             212
Geloso-Ossessivo e Sottomessa                77
Vittimista e Croccerossina                   69
Perfezionista Critico e Insicura Cronica     63
Name: count, dtype: int64
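
The merged distribution can be cross-checked arithmetically by applying the same mapping to the original counts. The sketch below copies the counts and the mapping from this notebook; the names `original_counts`, `mapping`, and `expected` are illustrative.

```python
from collections import Counter

# Training-set counts copied from the original distribution printed above.
original_counts = {
    "Dominante e Schiavo emotivo": 79,
    "Geloso-Ossessivo e Sottomessa": 77,
    "Sadico-Crudele e Masochista": 71,
    "Manipolatore e Dipendente emotiva": 71,
    "Narcisista e Succube": 71,
    "Psicopatico e Adulatrice": 70,
    "Vittimista e Croccerossina": 69,
    "Controllore e Isolata": 68,
    "Perfezionista Critico e Insicura Cronica": 63,
    "Persona violenta e Succube": 61,
}

CONTROL = "Dinamica di Controllo e Sottomissione"
ANTAGONIST = "Dinamica Antagonista e Adattiva"

# Same grouping as CLASS_MAPPING in the notebook.
mapping = {
    "Controllore e Isolata": CONTROL,
    "Manipolatore e Dipendente emotiva": CONTROL,
    "Dominante e Schiavo emotivo": CONTROL,
    "Persona violenta e Succube": CONTROL,
    "Narcisista e Succube": ANTAGONIST,
    "Psicopatico e Adulatrice": ANTAGONIST,
    "Sadico-Crudele e Masochista": ANTAGONIST,
}

# Sum the original counts under their merged labels.
expected = Counter()
for label, count in original_counts.items():
    expected[mapping.get(label, label)] += count

for label, count in expected.most_common():
    print(f"{label}: {count}")
```

The two merged totals, 279 and 212, match the printed output, and the overall number of training conversations (700) is unchanged. The merged classes are now roughly three to four times larger than the untouched ones, an imbalance worth keeping in mind when evaluating the retrained classifier.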
