# Heatmap

This notebook computes a heatmap of the consensus on the direction (sign) of the correlation between policies and outcomes.

It takes the outputs of the previous phases and builds the heatmap matrix from the correlations weighted by the similarity score, aggregated per policy cluster and factor cluster.

In this Jupyter Notebook we will:
1. Import the data with similarity score;
2. Import the relevant packages;
3. Prepare data for computing;
4. Compute the heatmap;
5. Export the heatmap data.

To complete these tasks you will need:
- The dataset of papers with extracted policies, produced by the 4_similarity_score code.

At the end of this notebook you will obtain:
- The heatmap_df dataset of correlations weighted by the similarity score.

## 1. Import the data with similarity score

In [None]:
# 1 input
## Output dataset of 4_similarity_score (policy_and_factors_clustered_similarity_normalized)
input_similarity = ""

# 1 output
## Heatmap dataset with clusters
output_path = ""

## 2. Import the relevant packages

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib

## 3. Prepare data for computing

In [None]:
df = pd.read_csv(input_similarity)

# Weight each signed correlation by its normalized policy similarity score
df['correlation_cluster_normalized_global'] = df['Corr Sign']*df['policy_similarity_normalized_global']*df['CORRELATION_num']
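
As a small illustration of the weighting step above, the sketch below applies the same product to a toy table. The column names are the ones used in this notebook, but the values are invented. The reading of each column (a sign, a normalized similarity, a numeric correlation) is an assumption based on the names.

```python
import pandas as pd

# Toy rows with made-up values; only the column names come from the notebook.
toy = pd.DataFrame({
    "Corr Sign": [1, -1, 1],
    "policy_similarity_normalized_global": [0.5, 0.25, 1.0],
    "CORRELATION_num": [1, 1, 0],
})

# Same product as in the cell above: sign * similarity * numeric correlation.
toy["correlation_cluster_normalized_global"] = (
    toy["Corr Sign"]
    * toy["policy_similarity_normalized_global"]
    * toy["CORRELATION_num"]
)

print(toy["correlation_cluster_normalized_global"].tolist())
```

A row whose `CORRELATION_num` is 0 contributes 0 to the weighted score, whatever its similarity.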

In [None]:
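# Custom aggregation: percentage of values that share the sign of the group mean
def percentage_of_values(series):
    """Return the share (in %) of values with the same sign as the mean."""
    if series.mean() > 0:
        return (series > 0).mean()*100
    elif series.mean() < 0:
        return (series < 0).mean()*100
    else:
        # A zero mean has no dominant sign
        return 0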
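
To make the behaviour of `percentage_of_values` concrete, here is a minimal sketch that restates the function and calls it on three invented series. The series names and values are illustrative only.

```python
import pandas as pd

def percentage_of_values(series):
    # Share (in %) of values whose sign matches the sign of the mean.
    if series.mean() > 0:
        return (series > 0).mean() * 100
    elif series.mean() < 0:
        return (series < 0).mean() * 100
    else:
        return 0

# Hypothetical inputs, chosen so each branch of the function is exercised.
mostly_positive = pd.Series([0.4, 0.2, -0.1, 0.3])  # mean > 0, 3 of 4 positive
mostly_negative = pd.Series([-0.5, -0.2, 0.1])      # mean < 0, 2 of 3 negative
balanced = pd.Series([0.5, -0.5])                   # mean == 0

print(percentage_of_values(mostly_positive))
print(percentage_of_values(mostly_negative))
print(percentage_of_values(balanced))
```

The result measures how strongly the observations in a group agree with the group's overall direction. A value near 100 means almost all of them point the same way as the mean.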

In [None]:
# Aggregate per cluster combination: mean, row count, and percentage sharing the sign of the mean
aggregated_result = df.groupby(['matched_cluster', 'Agg Cluster_factor', 'Agg ClusterFalse']).agg(
    correlation_mean_cluster_factor=('correlation_cluster_normalized_global', 'mean'),
    row_count=('correlation_cluster_normalized_global', 'count'),
    percentage_of_positive_negative=('correlation_cluster_normalized_global', percentage_of_values)
).reset_index()
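
The named aggregation above can be tried on a toy table first. This sketch groups by only two of the notebook's key columns to stay short, uses invented values, and calls the score column `score` (a hypothetical name).

```python
import pandas as pd

# Toy data: two policy clusters and two factor clusters, made-up scores.
toy = pd.DataFrame({
    "matched_cluster": ["P1", "P1", "P1", "P2"],
    "Agg Cluster_factor": ["F1", "F1", "F2", "F1"],
    "score": [0.5, 0.3, -0.2, -0.4],
})

# Named aggregation: each keyword becomes an output column.
toy_agg = toy.groupby(["matched_cluster", "Agg Cluster_factor"]).agg(
    correlation_mean_cluster_factor=("score", "mean"),
    row_count=("score", "count"),
).reset_index()

# Reshape to a policy x factor matrix, as done later for the heatmap.
toy_matrix = toy_agg.pivot(
    index="matched_cluster",
    columns="Agg Cluster_factor",
    values="correlation_mean_cluster_factor",
)
print(toy_agg)
print(toy_matrix)
```

Combinations that never occur in the data (here `P2` with `F2`) come out as `NaN` in the pivoted matrix.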

## 4. Compute the heatmap

In [None]:
matplotlib.rcParams['font.family'] = 'Times New Roman'

# Define the threshold
row_count_threshold = 2

# Create pivot tables for mean, percentage, and row counts
heatmap_data = aggregated_result.pivot(
    index='matched_cluster', 
    columns='Agg Cluster_factor', 
    values='correlation_mean_cluster_factor'
)
percentage_data = aggregated_result.pivot(
    index='matched_cluster', 
    columns='Agg Cluster_factor', 
    values='percentage_of_positive_negative'
)
row_counts_data = aggregated_result.pivot(
    index='matched_cluster', 
    columns='Agg Cluster_factor', 
    values='row_count'
)

# Mask values below the threshold
mask = row_counts_data < row_count_threshold
masked_heatmap_data = heatmap_data.mask(mask)


# Prepare annotations (mean and percentage) for valid cells only
valid_indices = ~mask & heatmap_data.notna()  # Enough rows and a computed mean

# Build the "mean\n(percentage%)" labels, leaving invalid cells empty
annotations = (
    heatmap_data.round(2).astype(str) + "\n("
    + percentage_data.round(1).astype(str) + "%)"
).where(valid_indices, "")

# Generate the heatmap
plt.figure(figsize=(14, 20))

heatmap = sns.heatmap(
    masked_heatmap_data,
    annot=annotations,
    fmt="",
    cmap="coolwarm",
    cbar_kws={'label': 'Correlation Mean'},
    annot_kws={"fontsize": 7, "color": "black"},
    linewidths=0,
)
plt.grid(False)  # Ensure no additional gridlines are added

# Adjust x-axis and y-axis labels
plt.xlabel("Factors Impacted", fontsize=12, labelpad=10)
plt.ylabel("Policies", fontsize=12, labelpad=10)

# Rotate x-axis labels
plt.xticks(rotation=45, ha='right', fontsize=10)  # Tilt column names
plt.yticks(fontsize=10)  # Set y-axis label font size

# Rotate the color bar legend
colorbar = heatmap.collections[0].colorbar
colorbar.ax.set_ylabel("Correlation Mean", fontsize=10, rotation=-90, labelpad=10)

# Adjust layout
plt.tight_layout()
plt.show()
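
The masking and annotation logic can be checked without rendering a figure. The sketch below is a minimal version on invented 2x2 matrices, and it assumes that cells with no data should be treated like cells below the threshold. The names `means`, `percentages`, `counts` and `too_few` are hypothetical stand-ins for the pivot tables and mask built above.

```python
import pandas as pd

nan = float("nan")
idx = ["P1", "P2"]

# Toy pivot tables (made-up values); NaN marks a combination with no data.
means = pd.DataFrame({"F1": [0.4, -0.4], "F2": [-0.2, nan]}, index=idx)
percentages = pd.DataFrame({"F1": [100.0, 66.7], "F2": [100.0, nan]}, index=idx)
counts = pd.DataFrame({"F1": [2, 3], "F2": [1, nan]}, index=idx)

threshold = 2

# Hide cells that have no data or fewer rows than the threshold.
too_few = counts.isna() | (counts < threshold)
masked_means = means.mask(too_few)

# Build "mean\n(percentage%)" labels and blank out the hidden cells.
labels = (
    means.round(2).astype(str) + "\n(" + percentages.round(1).astype(str) + "%)"
).where(~too_few, "")

print(masked_means)
print(labels)
```

Only the two `F1` cells keep a value and a label here. The `P1`/`F2` cell is dropped because it has a single row, and `P2`/`F2` because it has no data.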

## 5. Export heatmap data 

In [None]:
# Write the heatmap matrix to the output_path defined in section 1
heatmap_data.reset_index().to_csv(output_path, index=False)