
**Easiest option:** Implement the bar coloring as described above: a color scale with at least three colors (e.g. blue, white, and red). Assume the user provides the y-axis value of interest as a parameter or variable.
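One possible reading of the easiest option, sketched below, colors each bar by how far its mean lies from the value of interest. The function name `three_color_scale` and the symmetric scaling around `y_value` are assumptions made for illustration, not part of the assignment text.

```python
import numpy as np
import matplotlib.colors as mcolors

def three_color_scale(means, y_value, colors=("blue", "white", "red")):
    """Map each bar mean to an RGBA color on a scale centered on y_value.

    Hypothetical helper: means below y_value lean toward the first color,
    means above it lean toward the last color.
    """
    means = np.asarray(means, dtype=float)
    # The largest distance from y_value defines both ends of the scale,
    # so y_value itself always maps to the middle color.
    span = np.max(np.abs(means - y_value))
    if span == 0:
        span = 1.0  # all means equal y_value; avoid a zero-width range
    norm = mcolors.Normalize(vmin=y_value - span, vmax=y_value + span)
    cmap = mcolors.LinearSegmentedColormap.from_list("three_color", list(colors))
    return cmap(norm(means))  # array of shape (len(means), 4)
```

The returned array can be passed directly as the `color` argument of `ax.bar`.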

**Harder option:** Implement the bar coloring as described in the paper, where the color of the bar is based on the amount of data covered (e.g. a gradient ranging from dark blue if the distribution is certainly below the y-axis value, to white if the value is certainly contained, to dark red if the value is certainly not contained because the distribution lies above it).
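One way to quantify "certainly below" and "certainly above", assuming the sample mean of each year is approximately normally distributed, is the probability that the true mean exceeds the chosen value $y$. Here $\bar{x}_i$ is the sample mean for year $i$, $s_i$ its sample standard deviation, $n$ the number of samples, and $\Phi$ the standard normal CDF:

$$
p_i = P(\mu_i > y) \approx 1 - \Phi\!\left(\frac{y - \bar{x}_i}{s_i / \sqrt{n}}\right)
$$

A value of $p_i$ near 0 maps to dark blue, a value near 0.5 to white, and a value near 1 to dark red.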

**Even Harder option:** Add interactivity to the above, which allows the user to click on the y axis to set the value of interest. The bar colors should change with respect to what value the user has selected.
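A minimal sketch of click-based interactivity, assuming matplotlib's event API (`mpl_connect` with `button_press_event`) and a normal approximation for the mean. The helper name `enable_click_recolor` and its parameters are hypothetical. In Jupyter or Colab, click events are typically delivered only with an interactive backend (for example `%matplotlib widget` from ipympl), not with the default inline backend.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def enable_click_recolor(fig, ax, bars, line, means, sems, cmap):
    """Recolor `bars` whenever the user clicks inside `ax`.

    Hypothetical helper: `line` is the horizontal marker returned by
    ax.axhline, `means` and `sems` are per-bar means and standard errors.
    Returns the callback so it can also be invoked manually.
    """
    means = np.asarray(means, dtype=float)
    sems = np.asarray(sems, dtype=float)

    def on_click(event):
        # Clicks outside the axes carry no data coordinates; ignore them.
        if event.inaxes is not ax or event.ydata is None:
            return
        y = event.ydata
        # P(mean > y) for each bar under the normal approximation
        probs = stats.norm.sf((y - means) / sems)
        for bar, p in zip(bars, probs):
            bar.set_facecolor(cmap(p))
        line.set_ydata([y, y])  # move the marker line to the clicked value
        fig.canvas.draw_idle()

    fig.canvas.mpl_connect("button_press_event", on_click)
    return on_click
```

After drawing the bars with `bars = ax.bar(...)` and the marker with `line = ax.axhline(...)`, a single call to `enable_click_recolor(fig, ax, bars, line, means, sems, cmap)` wires up the callback.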

**Hardest option:** Allow the user to interactively set a range of y values they are interested in, and recolor based on this (e.g. a y-axis band, see the paper for more details).
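For a y-axis band, one possible mapping (an assumption for illustration, not necessarily the paper's exact scheme) combines the probability that the mean lies below the band with the probability that it lies above it. The function name `band_color_value` is hypothetical. The result is 0 when the mean is certainly below the band, 0.5 when it is certainly inside, and 1 when it is certainly above, so it can feed a blue-white-red colormap directly.

```python
from scipy import stats

def band_color_value(mean, sem, y_low, y_high):
    """Return a value in [0, 1] for a blue-white-red colormap given a y band.

    Hypothetical helper using a normal approximation for the sample mean.
    """
    if y_low > y_high:
        y_low, y_high = y_high, y_low  # accept the band limits in either order
    p_below = stats.norm.cdf((y_low - mean) / sem)  # P(true mean < y_low)
    p_above = stats.norm.sf((y_high - mean) / sem)  # P(true mean > y_high)
    # Shift toward 1 when the mean is above the band, toward 0 when below.
    return 0.5 + 0.5 * (p_above - p_below)
```

Calling `cmap(band_color_value(mean, sem, y_low, y_high))` for each bar then yields its face color.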


In [None]:
from ipywidgets import interact
import ipywidgets as widgets
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import pandas as pd
import numpy as np
from scipy import stats

@interact(y_axis_value=widgets.IntText(
    value=40000,
    description='y_axis value of interest',
    step=500,
    disabled=False
))
def plot_bar(y_axis_value):
  np.random.seed(12345)

  df = pd.DataFrame([np.random.normal(32000,200000,3650),
                   np.random.normal(43000,100000,3650),
                   np.random.normal(43500,140000,3650),
                   np.random.normal(48000,70000,3650)],
                  index=[1992,1993,1994,1995])

  # Calculate the mean and standard deviation for each year
  means = df.mean(axis=1)
  stds = df.std(axis=1)

  # Calculate the half-width of the 95% confidence interval for each year's mean
  t_crit = stats.t.ppf(0.975,df=df.shape[1]-1)
  sems = stds / np.sqrt(df.shape[1])
  ci95 = t_crit * sems

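  # Compute the probability that each year's true mean exceeds the threshold,
  # using a normal approximation for the sampling distribution of the mean
  probs = []
  for mean, sem in zip(means, sems):
    z = (y_axis_value - mean) / sem
    p = stats.norm.sf(z)  # P(mean > threshold); sf(z) equals 1 - cdf(z)
    probs.append(p)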

  # Getting the color map through matplotlib
  cmap = mcolors.LinearSegmentedColormap.from_list('blue_white_red',
                                                  ['darkblue','white','darkred'])

  # Draw the bars with confidence intervals, colored by probability
  fig, ax = plt.subplots(figsize=(12,10))
  ax.bar(range(len(df)),means,
        yerr=ci95, capsize=12,
        color = [cmap(v) for v in probs],
        edgecolor='black')

  # Renaming xticks to the years
  ax.set_xticks(range(len(df)),df.index)

  # Draw a grey horizontal line at the selected y value
  ax.axhline(y=y_axis_value, color='grey')

  # Setting gradient color legend on the bottom of the figure
  sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin=0, vmax=1))
  sm.set_array([])
  # Use 11 ticks so that the 1.0 end of the scale is labeled as well
  cbar = plt.colorbar(sm, ax=ax, ticks=np.linspace(0, 1, 11), location='bottom')
  cbar.set_label('Probability that the mean is above the selected y value', fontsize=13, alpha=1)

  # Draw vertical lines on the colorbar to separate the 0.1-wide intervals
  for x in np.arange(0,1.1,0.1):
    cbar.ax.axvline(x,ymin=0,ymax=1,color='black',alpha=0.7)
  fig.savefig('bar_plot.png')
