# Deep Horizon Plots

@author: jhuthmacher

This notebook shows how to use the high-level plotting API.

It is structured as follows:

1. Import necessary modules
2. Load raw data (features + proton intensities)
3. Load model data (predictions + observations)
4. Plots

The functions used here belong to the high-level API of the Deep Horizon plotting mechanism and therefore offer no customization options. In general, every function only requires the **data** (in the correct format!), and you can also hand over a **path** where the plot should be saved. The **path** is optional: if it is not provided, a default path is used (e.g. `figures/proton_intensities_relations.pdf`). Important: the path must end with the **file name**!
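
The path rule above can be checked before a generator function is called. The following is a minimal sketch: `ensure_plot_path` is a hypothetical helper and not part of the plotting API, and it assumes that a valid target ends in a file name with an extension, like the default path quoted above.

```python
from pathlib import Path


def ensure_plot_path(path):
    """Hypothetical helper: validate a plot path and create its parent folder."""
    p = Path(path)
    # Assumption: a usable target ends in a file name with an extension (e.g. ".pdf").
    if p.suffix == "":
        raise ValueError(f"Path {path!r} has no file name at the end.")
    # Create the parent directory so that saving the figure does not fail.
    p.parent.mkdir(parents=True, exist_ok=True)
    return p
```

A call such as `ensure_plot_path("figures/my_plot.pdf")` returns the path unchanged, whereas `ensure_plot_path("figures/")` raises a `ValueError`.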

## Plotting API structure

Python file `src.visualization.plot_results.py` (located in `src.visualization`), plotting functions:
* This file contains all underlying functions for creating the plots.
* If you need to adapt something, this is the place to look. Be careful not to break the general usage.

## About the plotting mechanism
The plotting API is divided into different parts that you can use depending on your need for customization.

1. **Generator Functions (high level API)**

For each available plot there is a so-called *generator function* that creates the plot with default settings and without the possibility to adapt anything. This notebook only contains generator functions! They are located in `visualization.plot_results`.

A list of all generator functions:
* `generate_feature_imp_plot`: Creates feature importance plot
* `generate_correlation_matrix`: Creates correlation matrix
* `generate_pos_distr`: Creates the positional distribution plots for X, Y, Z
* `generate_pos_intensity_heatmaps`: Creates the intensity distribution for X, Y, Z (mean as aggregation)
* `generate_pred_heatmap`: Creates the prediction vs. observation heatmaps
* `generate_proton_relation_plot`: Creates proton/feature relation plot (Figure 3)
* `generate_feature_distr`: Creates distribution plots for features

## Data Structure for Plotting
To use the plot functions, you have to hand over the data in the expected format. Below is a brief description of how the data files that are not downloaded automatically should be organized on your machine. This mostly concerns model data such as predictions or feature importances.

```
model_path = ./model/
test_path = ./model/obs_vs_pred_csv/test/
train_path = ./model/obs_vs_pred_csv/train/
fi_path = ./model/feature_imp_csv/test/

+-- model
    +-- feature_imp_csv
    |   +-- test
    |       +-- fi_ch1_test.csv
    |       +-- fi_ch2_test.csv
    |       +-- fi_ch3_test.csv
    |       +-- fi_ch4_test.csv
    |       +-- fi_ch5_test.csv
    +-- obs_vs_pred_csv
        +-- test
        |   +-- p1_obs_vs_predict.csv
        |   +-- p2_obs_vs_predict.csv
        |   +-- p3_obs_vs_predict.csv
        |   +-- p4_obs_vs_predict.csv
        |   +-- p5_obs_vs_predict.csv
        +-- train
            +-- p1_obs_vs_predict.csv
            +-- p2_obs_vs_predict.csv
            +-- p3_obs_vs_predict.csv
            +-- p4_obs_vs_predict.csv
            +-- p5_obs_vs_predict.csv
```
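
Before loading the model data, it can help to verify that the layout above is in place. The sketch below uses only the standard library; `missing_model_files` is a hypothetical helper (not part of `utils.data_utils`), and the file names are taken from the tree above.

```python
from pathlib import Path


def missing_model_files(model_path, channels=(1, 2, 3, 4, 5)):
    """Hypothetical helper: list expected CSV files that are absent below model_path."""
    root = Path(model_path)
    expected = []
    for ch in channels:
        # Feature importances exist for the test split only.
        expected.append(root / "feature_imp_csv" / "test" / f"fi_ch{ch}_test.csv")
        # Observation vs. prediction files exist for both splits.
        for split in ("test", "train"):
            expected.append(root / "obs_vs_pred_csv" / split / f"p{ch}_obs_vs_predict.csv")
    return [p for p in expected if not p.is_file()]
```

An empty list from `missing_model_files("./model/")` means that all 15 files from the tree were found.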

In [None]:
###########
# Imports #
###########

# Load notebook magic for reloading packages automatically
%load_ext autoreload
%autoreload 2

import sys
sys.path.insert(0, "../")

import pandas as pd 
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
from datetime import datetime

############################
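# Styling for presentation #
############################
# The 'seaborn' style was renamed to 'seaborn-v0_8' in matplotlib 3.6
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
plt.style.use(style)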
sns.set_context("paper")

In [None]:
################
# Data Loading #
################
from utils.data_utils import load_data

dataLog10, data, _, _ = load_data()

In [None]:
####################
# Load Model Data  #
####################
from utils.data_utils import load_model_data

# For details have a look at the initial description
test_path = "../data/model/obs_vs_pred_csv/test/"  # SET_PATH
train_path = "../data/model/obs_vs_pred_csv/train/"  # SET_PATH

dfs1 = load_model_data(test_path, channels=[1, 2, 3, 4, 5])
dfs1_train = load_model_data(train_path, channels=[1, 2, 3, 4, 5])

In [None]:
############################
# Load Feature Importances #
############################
from utils.data_utils import load_feature_importances

# For details have a look at the initial description
fi_path = "../data/model/feature_imp_csv/test/"  # SET_PATH

df_feature_imp2 = load_feature_importances(fi_path, mode="test", channels=[1,2,3,4,5])

In [None]:
######################
# Feature Importance #
######################
from visualization.plot_results import generate_feature_imp_plot

# Merge combined features: remove the literal "_combined" from the feature names
df_feature_imp2["feature"] = df_feature_imp2["feature"].str.replace("_combined", "", regex=False)

generate_feature_imp_plot(df_feature_imp2, pivot="feature", val="perm_imp", fmt="1.3f")
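
To illustrate the renaming step in the cell above, here is a small sketch on made-up data. The feature names and values are invented for illustration; only the column names `feature` and `perm_imp` come from the call above.

```python
import pandas as pd

# Toy stand-in for the loaded feature importances (names and values are invented).
toy = pd.DataFrame({
    "feature": ["x_combined", "y", "z_combined"],
    "perm_imp": [0.12, 0.05, 0.30],
})

# Remove the literal substring "_combined" so that such rows carry the base feature name.
toy["feature"] = toy["feature"].str.replace("_combined", "", regex=False)

print(toy["feature"].tolist())  # ['x', 'y', 'z']
```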

In [None]:
######################
# Correlation Matrix #
######################
from visualization.plot_results import generate_correlation_matrix
generate_correlation_matrix(dataLog10[dataLog10.columns[:-7]].dropna())
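
The cell above passes all but the last seven columns, with incomplete rows dropped. The sketch below shows the same selection on toy data. The column names and values are invented, and `DataFrame.corr` is used here only as an assumption about the kind of matrix being plotted; `generate_correlation_matrix` may compute it differently.

```python
import numpy as np
import pandas as pd

# Toy data: two feature columns and one trailing column to exclude (values are invented).
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan],
    "b": [2.0, 4.0, 6.0, 8.0],
    "target": [0.1, 0.2, 0.3, 0.4],
})

# Keep everything except the trailing column(s), then drop rows containing NaN.
n_trailing = 1  # the notebook cell uses 7
features = toy[toy.columns[:-n_trailing]].dropna()

# Pairwise Pearson correlation of the remaining columns.
corr = features.corr()
```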

In [None]:
################################
# Positional Distribution Plot #
################################
from visualization.plot_results import generate_pos_distr
generate_pos_distr(data)

In [None]:
#################################
# Positional Intensity Heatmaps #
#################################
from visualization.plot_results import generate_pos_intensity_heatmaps
generate_pos_intensity_heatmaps(dataLog10)

In [None]:
#######################
# Prediction Heatmaps #
#######################
from visualization.plot_results import generate_pred_heatmap
import matplotlib

# Larger fonts for labels, titles, ticks, and legends
params = {'legend.fontsize': 'xx-large',
          'axes.labelsize': 'xx-large',
          'axes.titlesize': 'xx-large',
          'xtick.labelsize': 'x-large',
          'ytick.labelsize': 'x-large'}
matplotlib.rcParams.update(params)

# Zero-based indices into the loaded data, one per channel (labelled 1-5 below)
channels = [0, 1, 2, 3, 4]

for ch in channels:
    train = dfs1_train[ch]
    test = dfs1[ch]
    generate_pred_heatmap(train, test, annotated_text=f"Channel {ch + 1}",
                          path=f"./figures/prediction_heatmap_ch{ch + 1}.pdf")

In [None]:
#############
# Time Plot #
#############
from visualization.plot_results import plot_pred_obs_time

sns.set_context("talk")

start_date = datetime(2015, 9, 19, 1, 15)
end_date = datetime(2015, 9, 19, 20, 30)

fig = plot_pred_obs_time(dfs1, save_plot=True, idx_range=(start_date, end_date))
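
The `idx_range` argument above is given as a pair of timestamps. As a rough sketch of what such a time window selects, the example below slices a toy DataFrame by the same two dates. It assumes the data is indexed by a `DatetimeIndex`, which may not match how `plot_pred_obs_time` handles the range internally; the values are invented.

```python
from datetime import datetime

import pandas as pd

# Toy hourly series covering one day (values are invented).
idx = pd.date_range("2015-09-19 00:00", periods=24, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"obs": list(range(24))}, index=idx)

start_date = datetime(2015, 9, 19, 1, 15)
end_date = datetime(2015, 9, 19, 20, 30)

# Label-based slicing on a DatetimeIndex keeps all rows between both bounds (inclusive).
window = toy.loc[start_date:end_date]
```

With hourly data, the window contains the rows from 02:00 to 20:00.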

In [None]:
###########################
# Proton Feature Relation #
###########################
from visualization.plot_results import generate_proton_relation_plot

generate_proton_relation_plot(dataLog10, channel="p1")

In [None]:
######################
# Distribution Plots #
######################
from visualization.plot_results import generate_feature_distr
generate_feature_distr(dataLog10)