# Distortion summary - demo of N-dimensional interactive visualization and ML regression
(Bokeh + ipywidgets based)
  

* 0) Import libraries
* 1) Load the csv file with distortions and split it per sector to enable correlation studies
* 2) Draw distortion as a function of other distortions
* 3) Draw distortion as a function of the TRD flux
* 4) Load the fitter and fit distortion in regions as a function of other distortions
  * fit mean and median properties + model "error" estimates from bootstrapping
  * quantiles to probe the PDF - not yet in the demo
  * inspect fit results - correlation and time series - local data, data-fit, pulls (data-fit)/error
  
## For the time series of distortion and fit, see the end of the notebook
 * distortion vs time
 * distortion-fit vs time
 * (distortion-fit)/"predicted" error vs time

# 0) Import libraries
* MLpipeline.NDFunctionInterface
  * wrapper on top of Keras and sklearn algorithms for the moment (RandomForest, KNN)
  * main purpose - provide error estimates + combined estimators (e.g. weighted mean)
  * simultaneous regression and comparison
* TTreeHnInteractive.bokehTools
  * interactive visualization of multidimensional data
  * graphics using pandas + Bokeh
  * interface like our old TTree-based interface, but interactive
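
The "combined estimators (e.g. weighted mean)" idea can be sketched in plain NumPy. This is only an illustration of an inverse-variance weighted mean, not the NDFunctionInterface implementation; the function name `weighted_mean` and all numbers are made up for the example.

```python
import numpy as np

def weighted_mean(predictions, errors):
    """Inverse-variance weighted mean of several estimators, per data point.

    predictions, errors: arrays of shape (n_estimators, n_points).
    Returns (combined prediction, combined error), each of shape (n_points,).
    """
    predictions = np.asarray(predictions, dtype=float)
    errors = np.asarray(errors, dtype=float)
    weights = 1.0 / errors**2                      # w_i = 1 / sigma_i^2
    combined = (weights * predictions).sum(axis=0) / weights.sum(axis=0)
    combined_error = np.sqrt(1.0 / weights.sum(axis=0))
    return combined, combined_error

# Hypothetical predictions and errors of two estimators for three points
pred = [[1.0, 2.0, 3.0],
        [3.0, 2.0, 1.0]]
err = [[1.0, 1.0, 1.0],
       [1.0, 2.0, 0.5]]
combined, combined_error = weighted_mean(pred, err)
print(combined, combined_error)
```

Estimators with a smaller error dominate the combination, and the combined error is never larger than the smallest individual error.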

In [None]:
from RootInteractive.tutorial.distortionCase.distortionStudy import *
from RootInteractive.TTreeHnInteractive.TTreeHnBrowser import *
from RootInteractive.InteractiveDrawing.bokeh.bokehTools import *
from RootInteractive.MLpipeline.NDFunctionInterface import  DataContainer, Fitter
#from RootInteractive.InteractiveDrawing.bokeh.bokehDrawPanda import *
from RootInteractive.InteractiveDrawing.bokeh.bokehDraw import *
# explicit imports of names used below, so the notebook does not rely on the star imports for them
import os
import pandas as pd
import matplotlib.pyplot as plt
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
output_notebook()
# template figure reused by the drawing calls below
p3 = figure(plot_width=400, plot_height=250, title="template")

# 1) Load csv file with distortion and split them per sector to enable correlation
* csv file extracted beforehand from the combined ROOT trees
  * one-liner: `AliTreePlayer::selectWhatWhereOrderBy(tree,vars,selection,"",0,10000,"csvroot","data.csv");`
* additional derived variables are defined similarly to formula "aliases" in a tree
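
As a rough illustration of alias-like derived variables, the sketch below uses plain pandas `DataFrame.eval` on a toy frame. It does not show how `SetAlias` works internally; the column names follow the notebook, the values are made up.

```python
import pandas as pd

# Toy frame; column names follow the notebook, values are made up
df_demo = pd.DataFrame({"gascompH2O": [50.0, 120.0, 200.0],
                        "meanTRDCurrent": [0.10, 0.20, 0.25]})

# Derived column from a formula string, similar in spirit to a TTree alias
df_demo = df_demo.eval("H2O = gascompH2O / 100.")

# Derived columns can be used in selections like any other column
selected = df_demo.query("H2O > 1 & meanTRDCurrent < 0.3")
print(selected)
```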

In [None]:
# csv file produced with:
#AliTreePlayer::selectWhatWhereOrderBy(tree,vars,selection,"",0,10000,"csvroot","distortionAll.csv");
# inputPath instead of input, to avoid shadowing the Python builtin
inputPath=os.path.expandvars("$NOTESData/JIRA/ATO-336/DistortionsTimeSeries/distortionAll.csv")
df=readDataFrame(inputPath)
dfsplit=splitDistortionFrame(df)
print("load csv file", inputPath, df.shape, dfsplit.shape)
# selection applied to all following plots and fits
dfsplit=dfsplit.query("iz2x>2 & meanTRDCurrent<0.3  & abs(invTRDCurrentNorm)<0.05 & gascompH2O>0")
# derived variable (alias) and datetime column for the time series plots
dfsplit=SetAlias(dfsplit,"H2O","gascompH2O/100.")
dfsplit['date']=pd.to_datetime(dfsplit['time'], unit='s')
# hover tooltips shared by the Bokeh plots
tooltips=[('Gas  composition (Ar,CO2, H2O)','(@gascompAr, @gascompCO2, @gascompH2O)'), ('current','@meanTRDCurrent'), ('Delta current','@deltaTRDCurrentNorm %'), ('date','@date')]
dfsplit.head(3)
#list(dfsplit)

# 2) Inspect distortion as a function of the mean distortion
* interactive wrapper using ipywidgets and Bokeh (similar to the old ROOT interface, but with interactivity)
* distortion in sectors 2, 4, 6, 7, 9, 16, 20, 30 as a function of the mean distortion (40-minute sampling)
* secondary parameters can be controlled by user-defined sliders
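
The sliders are passed to `bokehDraw` as one colon-separated string. The sketch below only shows how such a string can be taken apart into names and numbers; `parse_sliders` is a hypothetical helper, not part of RootInteractive, and the meaning of the individual numbers is defined by `bokehDraw`, so it is deliberately left uninterpreted here.

```python
import re

def parse_sliders(spec):
    """Split 'slider.name(a,b,...):slider.other(...)' into {name: [floats]}."""
    result = {}
    for item in spec.split(":"):
        match = re.fullmatch(r"slider\.(\w+)\(([^)]*)\)", item.strip())
        if match is None:
            raise ValueError(f"not a slider description: {item!r}")
        name, numbers = match.groups()
        result[name] = [float(x) for x in numbers.split(",")]
    return result

slider_spec = "slider.meanTRDCurrent(0,0.5,0.05,0,1):slider.H2O(0,5,0.2,0,5)"
parsed = parse_sliders(slider_spec)
print(parsed)
```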

In [None]:
bokehDraw?

In [None]:
vars="drphiSector2:drphiSector4:drphiSector6:drphiSector7:drphiSector9:drphiSector16:drphiSector20:drphiSector30"
sliders="slider.meanTRDCurrent(0,0.5,0.05,0,1):slider.H2O(0,5,0.2,0,5):slider.deltaTRDCurrentNorm(0.0,10.,0.1,0,10)"
plot2=bokehDraw(dfsplit.sample(200),"time>0","drphiMean",vars,"H2O",sliders,p3,ncols=3, commonX=1, commonY=1,tooltip=tooltips,size=5)

# 3) Inspect distortion as function of the TRD flux (2015-2016 data)
* demonstration of interactive graphics using Bokeh + ipywidgets
* distortion increases with flux (TRD current estimators)
  * background characterization - estimated using the radial profile of TRD currents - does not explain the data
  * different bands are visible - hypothesis: gas mixture?

In [None]:
vars="drphiSector2:drphiSector4:drphiSector6:drphiSector7:drphiSector9:drphiSector16:drphiSector20:drphiSector30"
sliders="slider.meanTRDCurrent(0,0.5,0.05,0,1):slider.H2O(0,5,0.2,0,5):slider.deltaTRDCurrentNorm(0.0,10.,0.1,0,10)"
plot2=bokehDraw(dfsplit.sample(200),"time>0","meanTRDCurrent",vars,"H2O",sliders,p3,ncols=3, commonX=1,tooltip=tooltips,size=5)

# 4) Load fitter and fit distortion as a function of other distortions
* prepare data
* register regression methods (to be done by an expert, as in our TMVA interface)
* evaluate and register regression
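
The three steps above can be sketched without the `Fitter` class. The example is a stand-in under stated assumptions: two simple NumPy regressors (least-squares linear and brute-force k-nearest-neighbours) replace the sklearn and Keras models, and all names (`fit_linear`, `fit_knn`, `methods`, `test_demo`) and data are hypothetical.

```python
import numpy as np
import pandas as pd

def fit_linear(X, y):
    """Least-squares linear model; returns a predict function."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([Xn, np.ones(len(Xn))]) @ coef

def fit_knn(X, y, k=3):
    """Brute-force k-nearest-neighbour regressor; returns a predict function."""
    def predict(Xn):
        dist = np.linalg.norm(Xn[:, None, :] - X[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

rng = np.random.default_rng(0)
frame = pd.DataFrame({"x0": rng.uniform(0, 1, 200), "x1": rng.uniform(0, 1, 200)})
frame["target"] = 2.0 * frame["x0"] - frame["x1"] + rng.normal(0, 0.01, 200)

# 1) prepare data: split into training and test samples
train, test_demo = frame.iloc[:150], frame.iloc[150:].copy()
features = ["x0", "x1"]

# 2) register methods by name
methods = {"LIN": fit_linear, "KNN": fit_knn}

# 3) fit each method and append its prediction as a new column
for name, fit in methods.items():
    predict = fit(train[features].to_numpy(), train["target"].to_numpy())
    test_demo[name] = predict(test_demo[features].to_numpy())

print(test_demo[["target", "LIN", "KNN"]].head())
```

Keeping every prediction as a column of the same frame is what makes the later side-by-side comparison of methods a plain plotting task.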

In [None]:
varFit='drphiSector2'
variableX= ['drphiMean',"H2O", "iz2x",'bz',"deltaTRDCurrentNorm"]
dataContainer = DataContainer(dfsplit, variableX, varFit, [500,500])
fitter = Fitter(dataContainer)

#fitter.Register_Method('KM200','KerasModel', 'Regressor', layout = [200, 10, 10],  epochs=100, dropout=0.1, l1=0.1)
fitter.Register_Method('KNN','KNeighbors', 'Regressor')
fitter.Register_Method('RF','RandomForest', 'Regressor', n_estimators=100, max_depth=10)
fitter.Register_Method('RF200','RandomForest', 'Regressor', n_estimators=200, max_depth=10)
fitter.Register_Method('KM','KerasModel', 'Regressor', layout = [50, 50, 50], epochs=300, dropout=0.4)
fitter.Fit()
test=dataContainer.Test_sample
#fitter.Compress('KM')
for method in ['RF', 'KNN', 'RF200', 'KM']: 
    test = fitter.AppendOtherPandas(method,test)

In [None]:
fitter.printImportance()

## 4.1)  Bokeh visualization of regression results

In [None]:
p3 = figure(plot_width=400, plot_height=250, title="drphiSector")
plot,data,dummy=drawColzArray(test," abs (invTRDCurrentNorm)<0.05&year<2017", varFit,"RF:RF200:KNN:KM","H2O",p3,ncols=2,tooltip=tooltips,size=5)
show(plot)
plot,data,dummy=drawColzArray(test," abs (invTRDCurrentNorm)<0.05&year<2017", "RF","RF:RF200:KNN:KM","H2O",p3,ncols=2,tooltip=tooltips,size=5)
show(plot)

In [None]:
fitter.AppendStatPandas("RF",test)
fitter.AppendStatPandas("RF200",test)
test=SetAlias(test,"pullRF_2","(drphiSector2-RFMedian)/RFRMS")
test=SetAlias(test,"pullRF200_2","(drphiSector2-RF200Median)/RF200RMS")
plot,data,dummy=drawColzArray(test,"abs(invTRDCurrentNorm)<0.05&year<2017", "RF","RFMedian","H2O",p3,ncols=3)
show(plot)
test['pullRF_2'].plot.hist()

# TRD current model
* make regression of the distortion as a function of
  * flux (TRD current estimator)
  * background estimators
  * gas composition
* export fit errors
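
A model "error" of the kind referred to in this notebook can be estimated by bootstrapping: refit the model on resampled data and look at the spread of the predictions. The sketch below does this for a straight-line fit with NumPy; it is an assumption-laden illustration, not the way `Fitter` computes columns such as `RFMedian` and `RFRMS`, and the data are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: distortion rising linearly with current (made-up numbers)
current = rng.uniform(0.0, 0.3, 400)
distortion = 5.0 * current + rng.normal(0.0, 0.1, 400)

x_eval = np.linspace(0.0, 0.3, 7)      # points where the model is evaluated
n_boot = 200
replicas = np.empty((n_boot, x_eval.size))
for i in range(n_boot):
    # resample the data with replacement and refit
    idx = rng.integers(0, current.size, current.size)
    slope, offset = np.polyfit(current[idx], distortion[idx], 1)
    replicas[i] = slope * x_eval + offset

model_median = np.median(replicas, axis=0)   # central prediction
model_rms = replicas.std(axis=0)             # spread across replicas = model "error"
print(model_median, model_rms)
```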

In [None]:
#variableX= ['meanTRDCurrent','deltaTRDCurrent','bz','bckg0Mean', 'bckg1Mean', 'bckg2Mean',"gascompH2O"]
variableX= ['meanTRDCurrent','deltaTRDCurrentNorm','bz',"H2O","iz2x", "bsign","gascompCO2"]
#variableX= ['meanTRDCurrent','deltaTRDCurrentNorm','bz',"iz2x", "bsign"]
x = DataContainer(dfsplit, variableX, ['drphiSector4'], [500,500])
fitter = Fitter(x)
fitter.Register_Method('KNN','KNeighbors', 'Regressor')
fitter.Register_Method('RF','RandomForest', 'Regressor', n_estimators=100, max_depth=10)
fitter.Register_Method('RF200','RandomForest', 'Regressor', n_estimators=200, max_depth=10)
#list(variableX)
fitter.Fit()
for method in ['RF', 'KNN', 'RF200']: 
    dfsplit = fitter.AppendOtherPandas(method,dfsplit)
fitter.printImportance()    

In [None]:
fitter.AppendStatPandas("RF",dfsplit)
fitter.AppendStatPandas("RF200",dfsplit)
# the model above was fitted to drphiSector4, so residual and pull use the same target
dfsplit=SetAlias(dfsplit,"pullRF","(drphiSector4-RFMedian)/RFRMS")
dfsplit=SetAlias(dfsplit,"deltaRF","(drphiSector4-RFMedian)")
p = figure(plot_width=500, plot_height=300, title="drphiSector4")
plot,data,dummy=drawColzArray(dfsplit.sample(300)," trdMeanMedianL0<1 & abs (invTRDCurrentNorm)<0.05&year<2017", "meanTRDCurrent","drphiSector4:RF200","H2O",p,commonX=1,commonY=1,size=5,
                  tooltip=tooltips)
show(plot)

## Time series of distortion and fit
 * distortion vs time
 * distortion-fit vs time
 * (distortion-fit)/"predicted" error vs time
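
The three quantities in this list can be sketched on a toy time series with pandas. All column names (`data`, `fit`, `fitError`, `delta`, `pull`) and values are made up for the illustration; only the 40-minute sampling follows the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 500
ts = pd.DataFrame({"time": 1.45e9 + 2400 * np.arange(n)})   # one point per 40 minutes
ts["date"] = pd.to_datetime(ts["time"], unit="s")
ts["fit"] = 1.0 + 0.5 * np.sin(np.arange(n) / 50.0)          # made-up model prediction
ts["fitError"] = 0.05                                         # made-up predicted error
ts["data"] = ts["fit"] + rng.normal(0.0, 0.05, n)            # data scattered by that error

ts["delta"] = ts["data"] - ts["fit"]            # distortion - fit
ts["pull"] = ts["delta"] / ts["fitError"]       # (distortion - fit) / predicted error

# If the error estimate is realistic, pulls have mean ~0 and RMS ~1
print(ts["pull"].mean(), ts["pull"].std())

# Daily mean and spread of the pull, to spot periods where the model fails
daily = ts.set_index("date")["pull"].resample("1D").agg(["mean", "std"])
print(daily.head())
```

A pull RMS well above 1 in some period would suggest that the predicted error is underestimated there, or that the model is missing a relevant input.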

In [None]:
ptime = figure(plot_width=1000, plot_height=150, title="template")
sliders="slider.meanTRDCurrent(0,0.3,0.03,0,1):slider.H2O(0,5,0.2,0,5):slider.deltaTRDCurrentNorm(-0.0,0.1,0.01,-0.2,0.2)"
plot0=bokehDraw(dfsplit,"trdMeanMedianL0<1 & abs (invTRDCurrentNorm)<0.2&year<2017& iz2x==4","date","meanTRDCurrent:drphiSector4:deltaRF:pullRF:H2O:gascompCO2","H2O",sliders,ptime,ncols=1, commonX=1,size=5,
                  tooltip=tooltips,x_axis_type='datetime')


In [None]:
dfsplit['date']