<h1>Visualisation of the 1-d correlation script output</h1>

This notebook makes plots of the output yaml files created using:

<code>%>photoz-wg/systematic_tests/one_d_correlate.py SomeFile.Fits</code>

In [41]:
import yaml
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import glob
import seaborn as sns
sns.set(style="white")

path_to_yaml_output_files = '/Users/hoyleb/Documents/python/modules/photoz-wg/systematic_tests/'

<h2>Plot settings</h2>

In [39]:
almost_black = '#262626'
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams.update({'font.size': 32, 
                     'axes.linewidth': 5,
                    'text.color': almost_black,
                    'xtick.major.size': 4,
                    'ytick.major.size': 4,
                    'legend.fancybox': True,
                    'figure.dpi': 300,
                    'legend.fontsize': 16,
                    'legend.framealpha': 0.8,
                    'legend.shadow': True,
                    'xtick.labelsize': 22,
                    'ytick.labelsize': 22})

In [40]:
files = glob.glob(path_to_yaml_output_files + 'CorrelationResults_1d*.yaml')
assert len(files) > 0,'Check the file path'

AssertionError: Check the file path
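
The assertion above fails without saying which directory was searched. A minimal sketch of a more informative check, using only the standard library, might look like this; `find_result_files` is a hypothetical helper, and the file pattern is copied from the cell above.

```python
from pathlib import Path

def find_result_files(directory, pattern='CorrelationResults_1d*.yaml'):
    """Return the sorted result files in `directory`, or raise if none match.

    Hypothetical helper, not part of the photoz-wg code.
    """
    files = sorted(Path(directory).glob(pattern))
    if not files:
        # Report both the pattern and the directory that was searched
        raise FileNotFoundError(
            'No files matching {!r} in {}'.format(pattern, directory))
    return [str(f) for f in files]
```

Calling `find_result_files(path_to_yaml_output_files)` would then either return the list of files or report the directory in which nothing was found.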

<h3>Loading in the data, and visualising it</h3>

The output of the correlation script is a nested dictionary, so we first extract the quantities we need (MI, KS and CC for each pair of columns) into NxN arrays, where N is the number of columns. We then pass those arrays to seaborn for plotting.
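
As a minimal sketch of that extraction step, the helper below pulls one statistic out of the nested results dictionary into an NxN array. The name `results_to_matrix` and the toy `toy_res` dictionary are hypothetical; only the key layout (`columns`, `correlation_results` and the `MI` entry) is taken from the loading cell in this notebook, so treat it as an illustration rather than the script's own code.

```python
import numpy as np

def results_to_matrix(res, key):
    """Collect res['correlation_results'][c1][c2][key] into an NxN array.

    Pairs that are missing from the results are left as NaN.
    (Hypothetical helper, not part of one_d_correlate.py.)
    """
    cols = res['columns']
    out = np.full((len(cols), len(cols)), np.nan)
    for i, c1 in enumerate(cols):
        for j, c2 in enumerate(cols):
            entry = res['correlation_results'].get(c1, {}).get(c2)
            if entry is not None and key in entry:
                out[i, j] = entry[key]
    return out

# Toy input that mimics the layout of the yaml output (values are made up).
toy_res = {
    'columns': ['a', 'b'],
    'correlation_results': {
        'a': {'a': {'MI': 1.0}, 'b': {'MI': 0.25}},
        'b': {'b': {'MI': 1.0}},
    },
}
toy_MI = results_to_matrix(toy_res, 'MI')
```

Leaving the missing pairs as NaN rather than zero makes it easy to tell "not computed" apart from "computed and equal to zero".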

To save computation time, the 1-d correlation script only calculates feature correlations if i>=j, so only one triangle of each matrix is filled.
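
Because only one triangle of each matrix is computed, the other half has to be hidden (or mirrored) before plotting. A minimal sketch, assuming the filled entries are those with j >= i as in the loading cell below, and assuming the statistic is symmetric in its two columns; `upper` is a hypothetical toy array.

```python
import numpy as np

# Toy 3x3 matrix with only the j >= i entries filled (hypothetical values).
upper = np.array([[1.0, 0.2, 0.5],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 1.0]])

# seaborn.heatmap hides the cells where the mask is True,
# so flag the strictly lower triangle, which was never computed.
hide = np.zeros_like(upper, dtype=bool)
hide[np.tril_indices_from(hide, k=-1)] = True

# Alternatively, mirror the computed half to obtain a full symmetric matrix.
full = np.triu(upper) + np.triu(upper, k=1).T
```

Either `hide` can be passed as `mask=` to `sns.heatmap`, or `full` can be plotted without a mask.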

In [33]:
#load in the yaml files, and extract some items of interest
for k, f1 in enumerate(files):
    #yaml.Loader matches the old default loader; only use it on trusted files
    with open(f1, 'r') as fh:
        res = yaml.load(fh, Loader=yaml.Loader)
    #{'filename': dataFile, 'correlation_results': test_results, 'columns': cols, 'number_rows': Nrows}
    
    cols = res['columns']
    
    MIarr = np.zeros((len(cols), len(cols)))
    KSarr = np.zeros((len(cols), len(cols)))
    Parr = np.zeros((len(cols), len(cols)))
    #seaborn hides the cells where the mask is True, so start with everything hidden
    mask = np.ones_like(MIarr, dtype=bool)
    
    for i, c1 in enumerate(cols):
        #skip the 'random' columns, but keep i aligned with the axis labels
        if 'random' in c1:
            continue
        for j, c2 in enumerate(cols):
            if j >= i:
                MIarr[i,j] = res['correlation_results'][c1][c2]['MI']
                KSarr[i,j] = res['correlation_results'][c1][c2]['KS']
                Parr[i,j] = res['correlation_results'][c1][c2]['CC']
                #un-hide the cells that hold a computed value
                mask[i,j] = False
                
    #now plot the data
    
    #start with the Mutual Information
    f, ax = plt.subplots(figsize=(11, 9))
    cmap = sns.diverging_palette(220, 10, as_cmap=True)

    sns.heatmap(MIarr, mask=mask, cmap=cmap, vmax=1.0,
                square=True,
                linewidths=.5, cbar_kws={"shrink": .5}, ax=ax, yticklabels=cols, xticklabels=cols)

    # This sets the yticks "upright" with 0, as opposed to sideways with 90.
    plt.yticks(rotation=0) 
    plt.title('MI: ' + f1.split('/')[-1])
    
    #now the KS statistic: 0 = perfect agreement, >0.1 = not consistent with being drawn from the same parent population
    f, ax = plt.subplots(figsize=(11, 9))
    cmap = sns.diverging_palette(220, 10, as_cmap=True)

    sns.heatmap(KSarr,mask=mask, cmap=cmap, vmax=np.amax(KSarr),
                square=True,
                linewidths=.5, cbar_kws={"shrink": .5}, ax=ax, yticklabels=cols, xticklabels=cols)

    # This sets the yticks "upright" with 0, as opposed to sideways with 90.
    plt.yticks(rotation=0) 
    plt.title('KS test: ' + f1.split('/')[-1])

    
    #Show the Pearson correlation coefficient.
    f, ax = plt.subplots(figsize=(11, 9))
    cmap = sns.diverging_palette(220, 10, as_cmap=True)

    sns.heatmap(Parr, mask=mask, cmap=cmap, vmax=np.amax(Parr),
                square=True,
                linewidths=.5, cbar_kws={"shrink": .5}, ax=ax, yticklabels=cols, xticklabels=cols)

    # This sets the yticks "upright" with 0, as opposed to sideways with 90.
    plt.yticks(rotation=0) 
    plt.title('CC: ' + f1.split('/')[-1])
