# CR ML Road Map


- Step 1:
    - Transform Mock Data from Text File to NumPy Array
- Step 2:
    - Recalculate the Normalization Factor for Mock Data
- Step 3:
    - Machine Learning
        - train on the whole data set
        - retrain the model on the data in the 6 $\sigma$ CL region

## Import Packages

In [2]:
from __future__ import absolute_import, division, print_function, unicode_literals
# basic python package
import importlib
import numpy as np
import time
import logging
importlib.reload(logging)
logging.basicConfig(level = logging.INFO)

# Python plotting packages
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from matplotlib.colors import ListedColormap, LinearSegmentedColormap, BoundaryNorm
from matplotlib.collections import LineCollection
from matplotlib import cm


# self-defined classes
from script import CR_ML_Class as CR
from script import load_mock_data as LD


# tensorflow
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
print("Tensorflow Version is {}".format(tf.__version__))
print("Keras Version is {}".format(tf.keras.__version__))
!nvidia-smi

Tensorflow Version is 2.4.1
Keras Version is 2.4.0


## Transform Mock Data from Text File to NumPy Array

- Data Description:


    - parameter: propagation and source parameters
        numpy shape: (# of mock data, 14)
            #parameter: original parameters
            #new_parameter: parameters with the recalculated normalization factors and A_p

            col1=D_0, 
            col2=\delta, 
            col3=z_h, 
            col4=v_A, 
            col5=\eta, 
            col6=A_p, 
            col7=\nu_1, 
            col8=\nu_2, 
            col9=log10(R_{br,1}), 
            col10=\nu_3, 
            col11=log10(R_{br,2}), 
            col12=N_{Li}, 
            col13=N_{Be}, 
            col14=N_{O}


    - data: Mock data
        numpy shape: (# of mock data, 84, 6)
            84: the number of energy bins, spanning 1.000e-03 to 1.105e+05
            6: the energy E followed by the spectra of Li, Be, B, C, O
            #data_0: original mock data
            #modify_data_0: data reshaped according to the recalculated parameters


    - chi: chi-square
        numpy shape: (# of mock data)
            #new_chi: chi-square from "modify_data"

In [130]:
%%time
text_Data_path = "../Data/Text_Mock_data/"

# mockdata_1 = CR.Mock_Data_to_NumpyArray(text_Data_path + "res_3x_1.txt")
mockdata_1 = CR.Mock_Data_to_NumpyArray(text_Data_path + "return_4.txt")
origin_parameter, data, chisq = mockdata_1.parameter, mockdata_1.spectrum, mockdata_1.chisq

logging.info("Data Shape for 'parameter': {}".format(origin_parameter.shape))
logging.info("Data Shape for 'data': {}".format(data.shape))
logging.info("Data Shape for 'chisq': {}".format(chisq.shape))

INFO:root:Fri Aug 06 01:57:17 2021
INFO:root:Now loading...
INFO:root:Total data: 30000
INFO:root:Time consumption : 0.4208 min
INFO:root:Data Shape for 'parameter': (30000, 15)
INFO:root:Data Shape for 'data': (2520000, 11)
INFO:root:Data Shape for 'chisq': (30000,)


CPU times: user 25.1 s, sys: 536 ms, total: 25.7 s
Wall time: 25.7 s
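
The log above reports `data` with shape (2520000, 11), which is 30000 × 84 rows, while the data description lists a per-sample shape of (# of mock data, 84, 6). A minimal sketch of how such a flat table could be regrouped per mock sample with NumPy follows. It assumes the rows of each sample are stored contiguously, and the `columns` argument is a hypothetical choice, since the layout of the 11 raw columns is not stated here. This is not necessarily how `CR.Mock_Data_Rescale` does it.

```python
import numpy as np

N_BINS = 84  # energy bins per mock sample (from the data description)

def regroup_flat_table(flat, n_bins=N_BINS, columns=(0, 1, 2, 3, 4, 5)):
    """Reshape (n_samples * n_bins, n_cols) into (n_samples, n_bins, len(columns)).

    `columns` is a hypothetical selection of the flat columns to keep.
    """
    flat = np.asarray(flat)
    if flat.shape[0] % n_bins != 0:
        raise ValueError("row count is not a multiple of the number of energy bins")
    n_samples = flat.shape[0] // n_bins
    # Rows belonging to one sample are assumed to be contiguous.
    cube = flat.reshape(n_samples, n_bins, flat.shape[1])
    return cube[:, :, list(columns)]

# Small synthetic stand-in for the real table (3 samples instead of 30000).
flat_demo = np.arange(3 * N_BINS * 11, dtype=float).reshape(3 * N_BINS, 11)
spectra_demo = regroup_flat_table(flat_demo)
print(spectra_demo.shape)
```

With the real table, the same call would turn 2520000 rows into 30000 samples of 84 bins each.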


## Recalculate the Normalization Factor for Mock Data

- 1: Using   
    `CR.Mock_Data_Rescale(parameter, parameter, data)`   
    to split 'data' into the Li, Be, B, C and O `spectra`.  
    Note that the same 'parameter' is passed twice here because the new normalization factors have not been calculated yet.
    
    - `spectra`: Mock data
        numpy shape: (# of mock data, 84, 6)
            84: the number of energy bins, spanning 1.000e-03 to 1.105e+05
            6: the energy E followed by the spectra of Li, Be, B, C, O
            #data_0: original mock data
            #modify_data_0: data reshaped according to the recalculated parameters

In [131]:
%%time
spectra_data = CR.Mock_Data_Rescale(origin_parameter=origin_parameter, new_parameter=origin_parameter, spectrum=data, usedata=False)
logging.info("Data Shape for 'spectra_data': {}".format(spectra_data.data.shape))
logging.info("There are {} mock data.".format(spectra_data.data.shape[0]))
logging.info("For each mock data, there are {} energy bins.".format(spectra_data.data.shape[1]))
logging.info("{} corresponding to E, Li, Be, B, C and O.".format(spectra_data.data.shape[2]))

INFO:root:Data Shape for 'spectra_data': (30000, 84, 6)
INFO:root:There are 30000 mock data.
INFO:root:For each mock data, there are 84 energy bins.
INFO:root:6 corresponding to E, Li, Be, B, C and O.


CPU times: user 172 ms, sys: 24 ms, total: 196 ms
Wall time: 195 ms


- 2: Using  
    `CR.ReCalculateAp(spectra_data.data)`   
    to recalculate $A_p$.

In [132]:
%%time
importlib.reload(CR)
new_Ap = CR.ReCalculateAp(spectra_data.data[:100]) 
new_Ap.GetBestAp()

INFO:root:Fri Aug 06 01:57:43 2021
INFO:root:Finding best Ap
INFO:root:=====START=====
100%|██████████| 100/100 [00:00<00:00, 111.24it/s]
INFO:root:Time Cost for this Step : 0.0150 min
INFO:root:=====Finish=====
INFO:root:Time Cost : 0.0152 min


CPU times: user 908 ms, sys: 7.98 ms, total: 916 ms
Wall time: 919 ms


- 3: Using   
    `CR.ReCalculateN(spectra_data.new_parameter, spectra_data.data, ap=new_Ap.ap_5)`  
    to recalculate the normalization factors ($N_{Li}$, $N_{Be}$ and $N_{O}$).

In [138]:
%%time
importlib.reload(CR)
new_normalized_factor = CR.ReCalculateN(spectra_data.new_parameter[:100],spectra_data.data[:100],ap=new_Ap.ap_5)
new_normalized_factor.GetBestN()

INFO:root:Fri Aug 06 01:59:41 2021
INFO:root:Finding New Normalized Factor
INFO:root:=====START=====
100%|██████████| 100/100 [00:01<00:00, 80.12it/s]
INFO:root:Time Cost for this Step : 0.0208 min
INFO:root:=====Finish=====
INFO:root:Time Cost : 0.0211 min


CPU times: user 1.27 s, sys: 5.93 ms, total: 1.27 s
Wall time: 1.27 s
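
Steps 2 and 3 refit amplitude-like quantities ($A_p$ and the normalization factors). What the `CR` classes do internally is not shown in this notebook, so the following is an illustration only. It assumes a normalization factor acts as a pure multiplicative scale $N$ on a model spectrum $m_i$ that is compared with measurements $d_i \pm \sigma_i$. Minimizing $\chi^2(N) = \sum_i (d_i - N m_i)^2 / \sigma_i^2$ then has the closed-form solution $N = \sum_i (d_i m_i / \sigma_i^2) / \sum_i (m_i^2 / \sigma_i^2)$. The function and variable names are hypothetical.

```python
import numpy as np

def best_scale(model, observed, sigma):
    """Return the scale N that minimizes sum(((observed - N * model) / sigma) ** 2)."""
    model, observed, sigma = map(np.asarray, (model, observed, sigma))
    weight = 1.0 / sigma**2
    return np.sum(weight * observed * model) / np.sum(weight * model**2)

# Synthetic check: the "measurement" is exactly 1.5 times the model.
model = np.array([1.0, 2.0, 4.0])
observed = 1.5 * model
sigma = np.array([0.1, 0.2, 0.4])
print(best_scale(model, observed, sigma))
```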


- 4: Using   
    `CR.New_Parameter(spectra_data.new_parameter,new_normalized_factor.new_factor,ap_5=new_Ap.ap_5).new_parameter`
    to get the new parameter array.

In [143]:
%%time
new_parameter = CR.New_Parameter(spectra_data.new_parameter[:100],new_normalized_factor.new_factor,ap_5=new_Ap.ap_5).new_parameter

INFO:root:Fri Aug 06 02:00:36 2021
INFO:root:Time Cost : 0.0000 min


CPU times: user 2.67 ms, sys: 6 µs, total: 2.68 ms
Wall time: 1.98 ms


- 5: Put `origin_parameter` and `new_parameter` back into  
    `CR.Mock_Data_Rescale(origin_parameter, new_parameter, spectra_data.data, usedata = True)`  
    to get the new parameter array, the new data array and the new $\chi^2$.

In [144]:
%%time
new_spectra_data = CR.Mock_Data_Rescale(origin_parameter[:100], new_parameter, spectra_data.data[:100], usedata = True)

parameter = new_spectra_data.new_parameter
data = new_spectra_data.data

chisq = CR.Calculate_Chi_Square(data=data,usedata=True) 
chi = chisq.chi_square()

logging.info("Data Shape for 'parameter': {}".format(parameter.shape))
logging.info("Data Shape for 'data': {}".format(data.shape))
logging.info("Data Shape for 'chi': {}".format(chi.shape))

INFO:root:Fri Aug 06 02:00:37 2021
INFO:root:Fit the Spectrum.
INFO:root:=====START=====
INFO:root:Time Cost for this Step : 0.0001 min
INFO:root:=====Finish=====
INFO:root:Calculate Chi-Square.
INFO:root:=====START=====
INFO:root:Time Cost for this Step : 0.0001 min
INFO:root:=====Finish=====
INFO:root:

INFO:root:Total Time Consumption : 0.0004 min
INFO:root:Data Shape for 'parameter': (100, 14)
INFO:root:Data Shape for 'data': (100, 84, 6)
INFO:root:Data Shape for 'chi': (100,)


CPU times: user 28.1 ms, sys: 3.04 ms, total: 31.1 ms
Wall time: 27.4 ms
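
The chi-square step returns one value per mock sample, as the logged shape (100,) shows. A minimal sketch of a per-sample $\chi^2 = \sum_i ((m_i - d_i)/\sigma_i)^2$ over energy bins is given below. It assumes one measured value and one uncertainty per bin. The arrays are synthetic and the function is hypothetical, not the implementation behind `CR.Calculate_Chi_Square`.

```python
import numpy as np

def chi_square(model, observed, sigma):
    """Per-sample chi-square.

    model:    (n_samples, n_bins) predicted spectra
    observed: (n_bins,) measured values
    sigma:    (n_bins,) measurement uncertainties
    """
    residual = (np.asarray(model) - np.asarray(observed)) / np.asarray(sigma)
    return np.sum(residual**2, axis=1)

model = np.array([[1.0, 2.0, 3.0],
                  [1.1, 2.0, 2.7]])
observed = np.array([1.0, 2.0, 3.0])
sigma = np.array([0.1, 0.1, 0.3])
chi = chi_square(model, observed, sigma)
print(chi.shape)
```

The first synthetic sample matches the measurement exactly, and the second deviates by one $\sigma$ in two bins.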


- 6: Using  
    `CR.Select_Sample(chi_para, chi_data, chi_sele,1).Sample()`  
    to separate the data into different CL regions.

In [147]:
%%time
importlib.reload(CR)
chi_para, chi_data, chi_sele = parameter, data, chi

para_1_sigma, data_1_sigma, _ =  CR.Select_Sample(chi_para, chi_data, chi_sele,1).Sample()
para_2_sigma, data_2_sigma, _ =  CR.Select_Sample(chi_para, chi_data, chi_sele,2).Sample()
para_3_sigma, data_3_sigma, _ =  CR.Select_Sample(chi_para, chi_data, chi_sele,3).Sample()
para_4_sigma, data_4_sigma, _ =  CR.Select_Sample(chi_para, chi_data, chi_sele,4).Sample()
para_5_sigma, data_5_sigma, _ =  CR.Select_Sample(chi_para, chi_data, chi_sele,5).Sample()
para_6_sigma, data_6_sigma, _ =  CR.Select_Sample(chi_para, chi_data, chi_sele,6).Sample()

INFO:root:There are 5 data in the 1 σ region.
INFO:root:Time consumption : 0.0000 min
INFO:root:There are 9 data in the 2 σ region.
INFO:root:Time consumption : 0.0000 min
INFO:root:There are 20 data in the 3 σ region.
INFO:root:Time consumption : 0.0000 min
INFO:root:There are 35 data in the 4 σ region.
INFO:root:Time consumption : 0.0000 min
INFO:root:There are 46 data in the 5 σ region.
INFO:root:Time consumption : 0.0000 min
INFO:root:There are 65 data in the 6 σ region.
INFO:root:Time consumption : 0.0000 min


CPU times: user 17.8 ms, sys: 3.97 ms, total: 21.8 ms
Wall time: 18.1 ms
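
The counts in the log grow with the CL level (5, 9, 20, 35, 46, 65), which is what nested regions produce. A minimal sketch of such a selection keeps every sample whose $\Delta\chi^2 = \chi^2 - \chi^2_{\min}$ lies below a threshold. The threshold $n^2$ used here is the one-parameter convention and is an assumption: the threshold that `CR.Select_Sample` applies is not shown in this notebook. The data are synthetic.

```python
import numpy as np

def select_within(parameter, data, chi, n_sigma):
    """Keep samples with chi - chi.min() <= n_sigma**2 (assumed 1-d.o.f. threshold)."""
    chi = np.asarray(chi)
    mask = (chi - chi.min()) <= n_sigma**2
    return np.asarray(parameter)[mask], np.asarray(data)[mask], chi[mask]

rng = np.random.default_rng(0)
parameter = rng.normal(size=(100, 14))
data = rng.normal(size=(100, 84, 6))
chi = np.linspace(50.0, 149.0, 100)  # synthetic chi-square values: 50, 51, ..., 149

para_2, data_2, chi_2 = select_within(parameter, data, chi, 2)
print(para_2.shape, data_2.shape, chi_2.shape)
```

Because the mask is a single boolean array applied to all three inputs, the selected parameters, spectra and chi-square values stay aligned sample by sample.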


## Machine Learning

- 1: Using  
     `CR.Mock_Data_Processing(parameter=parameter, data=data, usedata = True)`
     and
     `.Train_Test_split(splitrate = 0.1, split = True)`   
     to whiten the data and split it into training and test sets at a 9:1 ratio.

In [149]:
%%time
importlib.reload(CR)
data_processing = CR.Mock_Data_Processing(parameter=parameter, data=data, usedata = True)
data_processing.Train_Test_split(splitrate = 0.1, split = True)

input_train, input_test = data_processing.input_train, data_processing.input_test
source_train, source_test = data_processing.source_train, data_processing.source_test



INFO:root:Fri Aug 06 02:05:53 2021
INFO:root:Prepare Ratio
INFO:root:Fri Aug 06 02:05:53 2021
INFO:root:Whitening
INFO:root:=====START=====
INFO:root:Time Cost for this Step : 0.0000 min
INFO:root:=====Finish=====
INFO:root:Time Cost : 0.0001 min
INFO:root:random split traning sample and test sample, 10% for test
INFO:root:=====START=====
INFO:root:Time Cost for this Step : 0.0000 min
INFO:root:=====Finish=====
INFO:root:Shape for training Input: (90, 8, 84)
INFO:root:Shape for  testing Input: (10, 8, 84)
INFO:root:Shape for training Target: (90, 10)
INFO:root:Shape for  testing Target: (10, 10)
INFO:root:Time Cost : 0.0005 min


CPU times: user 31.4 ms, sys: 6.94 ms, total: 38.4 ms
Wall time: 35 ms
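
The preprocessing step whitens the inputs and holds out 10% of the samples for testing. A minimal NumPy sketch of the same idea is shown below. It assumes "whitening" means per-feature standardization (zero mean, unit variance across samples). Strict whitening would also decorrelate the features, and the actual transform inside `CR.Mock_Data_Processing` is not shown here. All names and the synthetic arrays are hypothetical.

```python
import numpy as np

def standardize(x, eps=1e-12):
    """Shift each feature to zero mean and scale it to unit variance over the samples."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps)

def train_test_split(inputs, targets, test_fraction=0.1, seed=0):
    """Shuffle once, then hold out `test_fraction` of the samples for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(inputs))
    n_test = int(round(test_fraction * len(inputs)))
    test, train = order[:n_test], order[n_test:]
    return inputs[train], inputs[test], targets[train], targets[test]

rng = np.random.default_rng(1)
inputs = standardize(rng.normal(loc=3.0, scale=2.0, size=(100, 8, 84)))
targets = rng.uniform(size=(100, 10))
x_train, x_test, y_train, y_test = train_test_split(inputs, targets)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

Using one shuffled index array for both inputs and targets keeps each spectrum paired with its own parameter vector.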


- 2: Using   
    `ML.ML_Training(input_train,input_test,source_train,source_test,EPOCH=10, save_path="./")`
    to train a model on the data from the whole CL region.

In [161]:
%%time
from script import ML_Training as ML
importlib.reload(ML)

ML.ML_Training(input_train,input_test,source_train,source_test,EPOCH=10, save_path="./")

INFO:root:
-----------------------------------------------------------
""python3""
 ML_Training.py 
 Usage: ML_Training(input_train,input_test,source_train,source_test,EPOCH=100,save_path="save_path") 
        Trained Model will be stroed in "Model" directory 
 Usage: Load_ML_ML_Training(input_train,input_test,source_train,source_test,model_path,EPOCH=250,BATCH=256,save_path="save_path")
        Load Model for snd training 
        Trained Model will be stroed in "Model" directory 


-----------------------------------------------------------

INFO:root:Fri Aug 06 02:23:59 2021


Model: "Sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Conv1D_input (Conv1D)        (None, 8, 512)            43520     
_________________________________________________________________
Conv1D_1 (Conv1D)            (None, 8, 512)            262656    
_________________________________________________________________
Conv1D_2 (Conv1D)            (None, 8, 256)            131328    
_________________________________________________________________
Conv1D_3 (Conv1D)            (None, 8, 256)            65792     
_________________________________________________________________
Conv1D_4 (Conv1D)            (None, 8, 128)            32896     
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 4, 128)            0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 512)               0

INFO:root:accuracy: 0.90000
INFO:root:mse: 0.00616
INFO:root:mae: 0.06280
INFO:root:mape: 27.44915
INFO:root:Time consumption : 0.0622 min


CPU times: user 3.54 s, sys: 310 ms, total: 3.85 s
Wall time: 3.75 s
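
The `Param #` column of the model summary can be checked by hand. With inputs of shape (8, 84), Keras' default channels-last layout treats the 84 energy bins as channels, and a `Conv1D` layer with bias has `(kernel_size * in_channels + 1) * filters` trainable parameters. The sketch below checks the reported counts against a kernel size of 1. That kernel size is inferred from the printed numbers, not stated by the training script.

```python
def conv1d_params(in_channels, filters, kernel_size=1, use_bias=True):
    """Trainable parameters of a 1D convolution layer."""
    return (kernel_size * in_channels + int(use_bias)) * filters

# (layer name, input channels, filters, Param # reported in the summary)
layers = [
    ("Conv1D_input", 84, 512, 43520),
    ("Conv1D_1", 512, 512, 262656),
    ("Conv1D_2", 512, 256, 131328),
    ("Conv1D_3", 256, 256, 65792),
    ("Conv1D_4", 256, 128, 32896),
]

for name, in_channels, filters, reported in layers:
    computed = conv1d_params(in_channels, filters)
    print(name, computed, computed == reported)
```

All five layers match under this assumption. A kernel size of 1 also explains why the length dimension stays at 8 until the pooling layer halves it.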


- 3: Using   
    `ML.Load_ML_Training(input_train,input_test,source_train,source_test,model_path="./Model/CR_ML.h5", EPOCH = 250, BATCH = 256, save_path="./")`  
    to retrain the model on the data in the 6 $\sigma$ CL region.

In [163]:

data_processing = CR.Mock_Data_Processing(parameter=para_6_sigma, data=data_6_sigma, usedata = True)
data_processing.Train_Test_split(splitrate = 0.1, split = True)

input_train, input_test = data_processing.input_train, data_processing.input_test
source_train, source_test = data_processing.source_train, data_processing.source_test


ML.Load_ML_Training(input_train,input_test,source_train,source_test,model_path="./Model/CR_ML.h5", EPOCH = 10, BATCH = 256, save_path="./")

INFO:root:Fri Aug 06 02:25:08 2021
INFO:root:Prepare Ratio
INFO:root:Fri Aug 06 02:25:08 2021
INFO:root:Whitening
INFO:root:=====START=====
INFO:root:Time Cost for this Step : 0.0000 min
INFO:root:=====Finish=====
INFO:root:Time Cost : 0.0001 min
INFO:root:random split traning sample and test sample, 10% for test
INFO:root:=====START=====
INFO:root:Time Cost for this Step : 0.0000 min
INFO:root:=====Finish=====
INFO:root:Shape for training Input: (58, 8, 84)
INFO:root:Shape for  testing Input: (7, 8, 84)
INFO:root:Shape for training Target: (58, 10)
INFO:root:Shape for  testing Target: (7, 10)
INFO:root:Time Cost : 0.0004 min
INFO:root:Fri Aug 06 02:25:08 2021


Model: "Sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Conv1D_input (Conv1D)        (None, 8, 512)            43520     
_________________________________________________________________
Conv1D_1 (Conv1D)            (None, 8, 512)            262656    
_________________________________________________________________
Conv1D_2 (Conv1D)            (None, 8, 256)            131328    
_________________________________________________________________
Conv1D_3 (Conv1D)            (None, 8, 256)            65792     
_________________________________________________________________
Conv1D_4 (Conv1D)            (None, 8, 128)            32896     
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 4, 128)            0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 512)               0

INFO:root:accuracy: 0.85714
INFO:root:mse: 0.00610
INFO:root:mae: 0.05680
INFO:root:mape: 19.20176
INFO:root:Time consumption : 0.0543 min
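
The training logs report `mse`, `mae` and `mape`. A minimal sketch of the standard definitions of these three regression metrics follows, with a small guard against division by zero in the percentage error. It is an assumption that the training script computes them in exactly this way, and the arrays are synthetic.

```python
import numpy as np

def regression_metrics(y_true, y_pred, eps=1e-7):
    """Mean squared, mean absolute and mean absolute percentage error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "mse": float(np.mean(err**2)),
        "mae": float(np.mean(np.abs(err))),
        "mape": float(100.0 * np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps))),
    }

y_true = np.array([[0.5, 1.0], [2.0, 4.0]])
y_pred = np.array([[0.4, 1.1], [2.2, 3.6]])
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because the percentage error divides by the true value, targets close to zero can push `mape` to tens of percent even when `mae` is small.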


In [148]:
importlib.reload(CR)

help(CR.Select_Sample)

Help on class Select_Sample in module script.CR_ML_Class:

class Select_Sample(builtins.object)
 |  Methods defined here:
 |  
 |  Sample(self)
 |      Usage: 
 |          Select_Sample(parameter, data, total_chisq_list, sigma).Sample()
 |      Return:
 |          para_sigma, data_sigma, chi_sigma
 |      Item:
 |          Null
 |  
 |  __init__(self, parameter=[], data=[], total_chisq_list=[], sigma=[])
 |      Usage: 
 |          Select_Sample(parameter, data, total_chisq_list, sigma)
 |      Return:
 |          Null
 |      Item:
 |          Null
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)

