## Evaluating Model Performance by Category

This cell evaluates the model one category at a time: it splits the generated and target details by field position, then computes accuracy, precision, recall, and F1 for each category.

### Data Preparation

- **`generated_details`** and **`target_details`**: The predictions generated by the model and the actual target values. Each item is indexed by position 0 through 5, one position per category.

- **`generated_dict`** and **`target_dict`**: Dictionaries keyed by category index (0 through 5). The `generated_details` and `target_details` lists are split into these dictionaries so that key `i` holds position `i` of every item.
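
The split can be illustrated with a minimal sketch. The sample records below are hypothetical and not taken from the dataset; they only assume that each item holds six fields in the order brand, L0, L1, L2, L3, L4.

```python
# Hypothetical sample data: each item holds six fields in the order
# brand, L0, L1, L2, L3, L4. These values are illustrative only.
sample_generated = [
    ["Acme", "Home", "Kitchen", "Cookware", "Pans", "Frying Pans"],
    ["Zenith", "Electronics", "Audio", "Headphones", "Wireless", "Earbuds"],
]

# Collect field i of every item into list i.
sample_dict = {i: [] for i in range(6)}
for item in sample_generated:
    for i in range(6):
        sample_dict[i].append(item[i])

print(sample_dict[0])  # ['Acme', 'Zenith']
print(sample_dict[5])  # ['Frying Pans', 'Earbuds']
```

After the split, each list lines up index by index with the matching list built from the targets, which is the shape the `sklearn.metrics` functions expect.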

### Metrics Calculation

- **`categories`**: List of categories for which metrics will be computed: `details_Brand`, `L0_category`, `L1_category`, `L2_category`, `L3_category`, and `L4_category`.

- **`metrics`**: List of metrics to be calculated: `accuracy`, `precision`, `recall`, and `f1`.

For each category:
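1. **Compute Metrics**: Accuracy, precision, recall, and F1 score are calculated using `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics`. Accuracy is the fraction of exact matches. Precision, recall, and F1 are computed with macro averaging (`average='macro'`), which weights every class equally regardless of how often it occurs, and with `zero_division=0`, which scores a class as 0 when its denominator is zero.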

2. **Print Results**: After all categories have been processed, the metrics for each category are printed to four decimal places.

Comparing the printed results across categories shows which fields the model predicts reliably and which it struggles with.
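
To show why macro-averaged scores can differ sharply from accuracy, here is a hand-rolled sketch of macro precision on made-up labels. `macro_precision` is a hypothetical helper written for illustration, not a function from `sklearn`. It is meant to mirror `precision_score(..., average='macro', zero_division=0)` for this simple case.

```python
def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision; a class that is never predicted scores 0."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
        per_class.append(correct / predicted if predicted else 0.0)
    return sum(per_class) / len(per_class)

# Hypothetical imbalanced labels: the rare class "B" is never predicted.
y_true = ["A", "A", "A", "A", "B"]
y_pred = ["A", "A", "A", "A", "A"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.4f}")                                # accuracy: 0.8000
print(f"macro precision: {macro_precision(y_true, y_pred):.4f}")  # macro precision: 0.4000
```

Because every class counts equally, the one missed rare class halves the macro score. In categories with many infrequent labels, macro precision, recall, and F1 can therefore sit well below accuracy.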


In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

generated_dict = {i: [] for i in range(6)}
target_dict = {i: [] for i in range(6)}

for gen, tar in zip(generated_details, target_details):
    for i in range(6):
        generated_dict[i].append(gen[i])
        target_dict[i].append(tar[i])

print('Split into categories.............\n')

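# NOTE: intended to clean repeated patterns in L4_category; as written,
# this comprehension copies the list unchanged and applies no cleaning.
generated_dict[5] = [text for text in generated_dict[5]]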

categories = ['details_Brand', 'L0_category', 'L1_category', 'L2_category', 'L3_category', 'L4_category']
metrics = ['accuracy', 'precision', 'recall', 'f1']

results = {category: {metric: 0 for metric in metrics} for category in categories}

for i, category in enumerate(categories):
    print('Current Category: ', category)
    y_true = target_dict[i]
    y_pred = generated_dict[i]

    results[category]['accuracy'] = accuracy_score(y_true, y_pred)
    results[category]['precision'] = precision_score(y_true, y_pred, average='macro', zero_division=0)
    results[category]['recall'] = recall_score(y_true, y_pred, average='macro', zero_division=0)
    results[category]['f1'] = f1_score(y_true, y_pred, average='macro', zero_division=0)

print()

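# Use a distinct loop variable so the `metrics` list defined above is not overwritten.
for category, category_metrics in results.items():
    print(f"{category}:")
    for metric, value in category_metrics.items():
        print(f"  {metric}: {value:.4f}")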
    print()

## Computing Item-Level Accuracy

This cell defines a function that computes item-level accuracy: the fraction of items for which every predicted field matches its target.

### Function: `compute_item_accuracy`

- **Inputs**:
  - `generated_details`: List of predicted details for each item.
  - `target_details`: List of true details for each item.

- **Process**:
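  - **Count Correct Items**: Iterates through pairs of generated and target details. If all elements in a generated detail match the corresponding elements in the target detail, it counts as a correct item. Because `zip` stops at the shorter sequence, trailing elements without a counterpart are not compared.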
  - **Compute Accuracy**: Divides the count of correct items by the total number of items to get the accuracy. Returns `0` if there are no items.

### Execution

- **`item_accuracy`**: Calls `compute_item_accuracy` with the `generated_details` and `target_details` to calculate the accuracy.
- **Print Accuracy**: Prints the item-level accuracy with four decimal places.

Item-level accuracy is stricter than the per-category metrics: a product counts as correct only if every one of its categories is predicted correctly.
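
The following sketch contrasts per-field accuracy with item-level accuracy. The data is hypothetical, with three fields per item instead of six to keep it short.

```python
# Hypothetical predictions and targets with three fields per item.
preds = [("a", "x", "1"), ("b", "y", "2"), ("c", "z", "3"), ("d", "w", "4")]
targets = [("a", "x", "1"), ("b", "y", "9"), ("c", "q", "3"), ("d", "w", "4")]

n_fields = 3

# Accuracy of each field taken on its own.
field_acc = [
    sum(p[i] == t[i] for p, t in zip(preds, targets)) / len(targets)
    for i in range(n_fields)
]

# An item is correct only when all of its fields match.
item_acc = sum(p == t for p, t in zip(preds, targets)) / len(targets)

print(field_acc)  # [1.0, 0.75, 0.75]
print(item_acc)   # 0.5
```

Each field is at least 75% accurate on its own, yet only half of the items are fully correct, because the errors fall on different items. Item-level accuracy can never exceed the lowest per-field accuracy.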


In [None]:
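def compute_item_accuracy(generated_details, target_details):
    """Return the fraction of items whose generated fields all match the target fields."""
    correct_items = 0
    total_items = len(generated_details)

    for gen, tar in zip(generated_details, target_details):
        # zip pairs fields by position and stops at the shorter sequence
        if all(g == t for g, t in zip(gen, tar)):
            correct_items += 1

    # Avoid division by zero when there are no items
    return correct_items / total_items if total_items > 0 else 0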

item_accuracy = compute_item_accuracy(generated_details, target_details)
print(f"Item-level accuracy: {item_accuracy:.4f}")
