## Contextualization
Raw transactional data is loaded and columns of interest are identified for contextualization (layering).

## Clustering
Each layer is clustered independently over all time windows.

## Temporal Community Segmentation
Clusters are split by timestamp into multiple time windows.
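
The split can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes each node is a dict with a Unix `timestamp` field and that one time window corresponds to one ISO calendar week. The helper names `time_window_key` and `split_cluster_by_window` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

WindowKey = Tuple[int, int]  # (ISO year, ISO week), an assumed window granularity


def time_window_key(timestamp: float) -> WindowKey:
    """Map a Unix timestamp (seconds, UTC) to its ISO year and week."""
    iso = datetime.fromtimestamp(timestamp, tz=timezone.utc).isocalendar()
    return iso[0], iso[1]


def split_cluster_by_window(nodes: List[dict]) -> Dict[WindowKey, List[dict]]:
    """Group the nodes of one cluster by the time window they fall into."""
    windows: Dict[WindowKey, List[dict]] = defaultdict(list)
    for node in nodes:
        windows[time_window_key(node["timestamp"])].append(node)
    return dict(windows)


# Hypothetical nodes: two in the first ISO week of 1970, one in the second.
DAY = 86400
example_nodes = [
    {"id": "a", "timestamp": 0},
    {"id": "b", "timestamp": 1 * DAY},
    {"id": "c", "timestamp": 4 * DAY},
]
example_windows = split_cluster_by_window(example_nodes)
```

Each cluster then appears once per window in which it has at least one node, so its features can be tracked over time.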


## Feature Engineering
_Features are extracted for each cluster:_
- cluster size
- cluster standard deviation
- cluster scarcity
- cluster popularity (importance I)
- cluster diversity (importance II)

new:
- ??? (range/needed space)
- center
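
A few of the cluster features listed above can be sketched as follows. The sketch assumes a cluster is described by a single numeric feature per node, and it uses the population standard deviation. The function name `basic_cluster_features` and the key `value_range` are hypothetical. Scarcity, popularity, and diversity are left out because their definitions are not given here.

```python
from statistics import fmean, pstdev
from typing import Dict, List


def basic_cluster_features(values: List[float]) -> Dict[str, float]:
    """Compute size, standard deviation, center, and value range of one cluster.

    `values` holds one numeric feature value per node (an assumption).
    """
    if not values:
        # An empty cluster has no meaningful spread or center.
        return {"size": 0, "std_dev": 0.0, "center": 0.0, "value_range": 0.0}
    return {
        "size": len(values),
        "std_dev": pstdev(values),                 # population standard deviation
        "center": fmean(values),                   # arithmetic mean as the center
        "value_range": max(values) - min(values),  # extent covered by the cluster
    }


example_features = basic_cluster_features([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```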


_Features are extracted for each layer:_
- relative cluster sizes
- layer entropy
- distance from global centers

new:
- number of nodes
- number of clusters
- center of clusters
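
Several of the layer features above follow directly from the cluster sizes. The sketch below assumes that layer entropy means the Shannon entropy (base 2) of the relative cluster sizes; both the log base and the function name `layer_features` are assumptions. Distance from global centers is omitted because its definition is not given here.

```python
import math
from typing import Dict, List, Union


def layer_features(cluster_sizes: List[int]) -> Dict[str, Union[int, float, List[float]]]:
    """Derive layer-level features from the sizes of the clusters in one layer."""
    n_nodes = sum(cluster_sizes)
    if n_nodes == 0:
        return {"n_nodes": 0, "n_clusters": len(cluster_sizes),
                "relative_sizes": [], "entropy": 0.0}

    # Share of the layer's nodes that belongs to each cluster.
    relative_sizes = [size / n_nodes for size in cluster_sizes]

    # Shannon entropy in bits; empty clusters contribute nothing.
    entropy = -sum(p * math.log2(p) for p in relative_sizes if p > 0)

    return {
        "n_nodes": n_nodes,
        "n_clusters": len(cluster_sizes),
        "relative_sizes": relative_sizes,
        "entropy": entropy,
    }


example_layer = layer_features([2, 2, 4])
```

Under this definition, entropy is 0 when all nodes fall into one cluster and is largest when all clusters have the same size.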

## Cluster Features

```python
from typing import List
import json
import os
from entities import TimeWindow, Cluster

def calculate_metrics_for_clusters(layer_name: str, feature_names: List[str]):
    '''
    Computes the metrics of all clusters in all time windows of one layer
    and stores them as a single JSON file.

    :param layer_name: Name of the layer for which multiple time windows exist
    :param feature_names: Features of the layer
    '''
    print(f"Working on {layer_name}")

    path_in = f'input/timeslices/{layer_name}'
    path_out = f'input/metrics/{layer_name}.json'

    complete_clusters: List[Cluster] = []

    for root, _, files in os.walk(path_in):
        for f in files:
            with open(os.path.join(root, f), 'r') as file:
                # each file holds one serialized time window
                json_slice = json.load(file)
                time_window = TimeWindow.create_from_serializable_dict(json_slice)

                # create all clusters + metrics for one time window
                clusters = Cluster.create_multiple_from_time_window(time_window, feature_names)
                complete_clusters.extend(clusters)

    # make sure the output directory exists, then store the cluster metrics
    os.makedirs(os.path.dirname(path_out), exist_ok=True)
    with open(path_out, 'w') as file:
        json.dump([cl.__dict__ for cl in complete_clusters], file)
```