# Project HARDy: Quickstart Guide
Updated 2020-06-09

### INTRO:
Hail and Well-Met! 
 * If you've just downloaded Project HARDy, this guide walks you through the initial setup you need to run our package and test it out on your data!
 * Or, if you're ready to go, simply click "Kernel" > "Restart & Run All" in the toolbar and watch it go!

 * **Note: As of recent testing, the default configuration may take more than 4 hours to run on a typical personal laptop. We will discuss speed and future work at the end. For your first run, we suggest starting with a subset of your data and simpler configuration files!**
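One way to choose a small, reproducible subset of file names for a first run is sketched below. The helper name `sample_files`, the 10% fraction, and the example file names are illustrative assumptions and are not part of the HARDy package.

```python
import random

def sample_files(file_names, fraction=0.1, seed=0):
    """Return a reproducible random subset of file_names (at least one file)."""
    if not file_names:
        return []
    count = max(1, int(len(file_names) * fraction))
    rng = random.Random(seed)  # fixed seed -> the same subset on every run
    return sorted(rng.sample(list(file_names), count))

# Hypothetical file names, for illustration only.
example_files = ["run{:03d}_noise.csv".format(i) for i in range(50)]
subset = sample_files(example_files, fraction=0.1)
print(len(subset), "of", len(example_files), "files selected")
```

The selected files would then need to be copied into their own directory, since the functions in this notebook take a directory path rather than a list of files.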

To run this Quick-Start notebook, there are only **4 things** we need to talk about, and then you'll be set to go!
 1. Your Raw Data, in CSV form
 2. A Transformation Configuration File (that works with your data!)
 3. The Tuner and/or Model Configuration Files
 4. Understanding your computer's speed limits

For items 1 through 3, let's start by giving a path for each one and checking that it points to something!

In [8]:
import os.path, numpy as np, pandas as pd, pickle, random
# ^ Putting several imports on one line goes against PEP 8, but it's fine to save space here...

raw_data_path = './hardy/local_data/200504_csv_EIS_simulaiton/'        # 1) Raw Data Path (a directory of CSV files)
tform_config_file = './hardy/arbitrage/tform_config.yaml'  # 2) Transformation Config File. (File, not Path!!)
classifier_config_path = './hardy/recognition/'            # 3) Classifier Configuration Path (Path, not File!!)

if os.path.exists(raw_data_path):
    print("1) Raw Data Path is Ok. Found {} Files".format(len(os.listdir(raw_data_path))))
if os.path.exists(tform_config_file):
    print("2) Transform File Path is Ok.")
if os.path.exists(classifier_config_path):
    print("3) Classifier Config Path is Ok")
print("Note: We aren't checking that the files are good, just that the paths exist!")
    

1) Raw Data Path is Ok. Found 9003 Files
2) Transform File Path is Ok.
3) Classifier Config Path is Ok
Note: We aren't checking that the files are good, just that the paths exist!
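The checks above print nothing when a path is missing, so a typo can go unnoticed. A minimal sketch of a check that reports missing paths explicitly follows; `find_missing_paths` is a hypothetical helper and the example paths are made up.

```python
import os

def find_missing_paths(paths):
    """Return the labels of every entry in `paths` whose location does not exist."""
    return [label for label, path in paths.items() if not os.path.exists(path)]

# "." always exists; the second path is a made-up example that should not.
missing = find_missing_paths({
    "current directory": ".",
    "made-up path": "./this_path_should_not_exist_12345/",
})
print("Missing:", missing)
```

In this notebook you would pass `raw_data_path`, `tform_config_file`, and `classifier_config_path` instead of the example entries.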


## Section 0: Demo On/Off Switch!

In [26]:
# If you are walking through this notebook, leave this ON (True).
DEMO = True
# If you just want to run the classifier, turn all the DEMO blocks below OFF by setting this to False.

## Section 1. Raw Data *(and File-Structure)*
<img src="./doc/Images/quickstart_DataImport.PNG" width=600 p align="right" />

First things first, let's talk about your data to make sure we know what we're working with. We read and open your data with a creative take on the pandas "read_csv" importer, which has the following features:
 * It allows your file to have a header (by default up to 100 lines) before the actual data. *(It loops over the rows until it finds a "skiprows" value that imports a good numerical DataFrame.)*
 * We keep only the columns that contain purely numerical data, so any text columns are simply ignored, as shown.
 * The column names are kept for the final report, but we refer to columns by their Column Number, as shown.
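The header-skipping idea can be sketched with the standard library alone. This is not the package's `_smart_read_csv` (which builds on pandas); `read_numeric_csv` and the example text are assumptions made for illustration. Blank lines are dropped before counting, so the returned skip count refers to non-blank rows.

```python
import csv
import io

def _is_number(text):
    """True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def read_numeric_csv(text, max_skiprows=100):
    """Skip leading rows until the rest parses as a table with numeric columns.

    Returns (skiprows, column_names, columns); columns maps each kept column
    name to a list of floats. Raises ValueError if no such table is found.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    for skip in range(min(max_skiprows, len(rows) - 2) + 1):
        header, body = rows[skip], rows[skip + 1:]
        # Every data row must be as wide as the candidate header row.
        if any(len(row) != len(header) for row in body):
            continue
        # Keep only the columns in which every value is numeric.
        numeric_idx = [i for i in range(len(header))
                       if all(_is_number(row[i]) for row in body)]
        if numeric_idx:
            names = [header[i] for i in numeric_idx]
            columns = {header[i]: [float(row[i]) for row in body]
                       for i in numeric_idx}
            return skip, names, columns
    raise ValueError("no numeric table found")

# Made-up file contents: two free-text header lines, then the table.
example = """Instrument: simulated
Operator: nobody
freq [Hz],Re_Z [ohm],tag
1000.0,0.5,a
100.0,0.7,b
"""
skip, names, columns = read_numeric_csv(example)
print(skip, names)
```

The text column `tag` is dropped because its values do not parse as numbers, which mirrors the second bullet above.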

### 1b) File Structure and File Names:
We consider two perfectly good ways to pre-sort your data so the computer can understand it:
<img src="./doc/Images/quickstart_FileNames.PNG" width=350 p align="right" />

 * By using a "label" at the end of the file name
 * By sorting the files into labeled directories beforehand. 
 
For now, we will focus on the first method, so please put a "tag" at the end of each file name. **A tag is any string that comes between the final underscore and the file extension.** We hope to have the other method working soon, but it hasn't been fully tested yet. 
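The tag rule can be written in a few lines. This is a sketch of the rule as stated, not the package's own parser; `label_from_filename` is a hypothetical name, and the `.csv` extension on the example file name is assumed.

```python
import os

def label_from_filename(file_name):
    """Return the text between the final underscore and the file extension."""
    stem, _extension = os.path.splitext(os.path.basename(file_name))
    if "_" not in stem:
        raise ValueError("no underscore-delimited tag in {!r}".format(file_name))
    return stem.rsplit("_", 1)[1]

print(label_from_filename("200504-0001_sim_one.csv"))
```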

### 1c) A few other options:
In fact, as you'll see later (and we still have some debugging to do...), you can either pass a predetermined list of categories to expect, or let the file name parser "figure out" the categories to use!

By default, we look at the data and expect to find **2 Categories**. If we find more, the most popular category becomes category 0, and all of the others are grouped into a second category called "not_" + the first category. 

 * In our example dataset, the labels are actually given as "**noise**", "**one**", and "**spread**"
 * Instead, we report the labels as "**noise**" and "**not_noise**"
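The grouping rule can be sketched as follows, using the label counts reported for the example dataset. `collapse_to_two_classes` is a hypothetical helper that illustrates the rule and is not taken from the package.

```python
from collections import Counter

def collapse_to_two_classes(labels):
    """Keep the most common label; rename every other label to 'not_<most common>'."""
    top_label = Counter(labels).most_common(1)[0][0]
    other = "not_" + top_label
    return [label if label == top_label else other for label in labels]

# Label counts as reported for the example dataset.
labels = ["noise"] * 4500 + ["one"] * 3500 + ["spread"] * 1000
collapsed = collapse_to_two_classes(labels)
print(Counter(collapsed))
```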

In [30]:
if DEMO:
    from hardy.handling import handling as handling
    files = os.listdir(raw_data_path)
    fload = os.path.join(raw_data_path,files[0])
    fdata, rows = handling._smart_read_csv(fload, try_skiprows=5)
    print("Simple File Import:")
else:
    fdata = pd.DataFrame()
fdata.head()

Simple File Import:


| Unnamed: 0.1 | Unnamed: 0 | freq [Hz] | angular_freq [1/s] | complex_Z [ohm] | Re_Z [ohm] | Im_Z [ohm] | \|Z\| [ohm] | phase_angle [rad] | Re_Z_noise [ohm] | Im_Z_noise [ohm] |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1000000.0 | 6283185.0 | (4.052683096634199e-05-0.006365939721856767j) | 4.1e-05 | -0.006366 | 0.006366 | -1.56443 | 4.1e-05 | -0.006366 |
| 1 | 1 | 792016.405019 | 4976386.0 | (6.460465820414498e-05-0.008037442655613992j) | 6.5e-05 | -0.008037 | 0.008038 | -1.562759 | 6.5e-05 | -0.008037 |
| 2 | 2 | 627289.98582 | 3941379.0 | (0.00010298614656918597-0.010147686456665913j) | 0.000103 | -0.010148 | 0.010148 | -1.560648 | 0.000103 | -0.010148 |
| 3 | 3 | 496823.959473 | 3121637.0 | (0.0001641662509285629-0.012811686086172225j) | 0.000164 | -0.012812 | 0.012813 | -1.557983 | 0.000164 | -0.012812 |
| 4 | 4 | 393492.72631 | 2472388.0 | (0.0002616815879154793-0.01617445858945595j) | 0.000262 | -0.016174 | 0.016177 | -1.554619 | 0.000262 | -0.016174 |


In [32]:
if DEMO:
    # The single-file reader above is wrapped by this function to load ALL the files,
    # and the wrapper also pairs each DataFrame with its file name and label.
    from hardy.handling import to_catalogue as catalogue
    data_tuples_list = catalogue._data_tuples_from_fnames(raw_data_path)
    print("\nActual Files Imported. Let's look at the first 'tuple' in the List:")
    print("Format Will Be :\t(FILE NAME, DataFrame, LABEL)\n")
    print(data_tuples_list[0])

From 9003 Files:
Found 9000 CSVs...
Found 3 Labels, Only Expected 2...
	3500 Files of Label : one
	4500 Files of Label : noise
	1000 Files of Label : spread
Loaded	9000 of 9003	Files	 at rate of 304 Files per Second
	 Success!	 About 0.49 Minutes...

Actual Files Imported. Let's look at the first 'tuple' in the List:
Format Will Be :	(FILE NAME, DataFrame, LABEL)

('200504-0001_sim_one',          freq [Hz]  angular_freq [1/s]  Re_Z [ohm]  Im_Z [ohm]  |Z| [ohm]  \
0   1000000.000000        6.283185e+06    0.000041   -0.006366   0.006366   
1    792016.405019        4.976386e+06    0.000065   -0.008037   0.008038   
2    627289.985820        3.941379e+06    0.000103   -0.010148   0.010148   
3    496823.959473        3.121637e+06    0.000164   -0.012812   0.012813   
4    393492.726310        2.472388e+06    0.000262   -0.016174   0.016177   
..             ...                 ...         ...         ...        ...   
75        0.025413        1.596773e-01    1.000000   -0.000004   1.000

## Section 2. Transformation Configuration Setup

The underlying hypothesis of our project is that, in addition to using data-rich plots to train a CNN model, the data will almost certainly be even more informative (and thus lead to better model performance) after some number of transformations, such as Log, Reciprocal, and plenty of quadratic transformations as options as well! 
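To make the idea concrete, here is a minimal sketch of a few one-dimensional transformations applied to a column of values. The keys `raw`, `log10`, and `reciprocal` are hypothetical names chosen for this example; the keywords the package actually accepts may differ.

```python
import math

# Hypothetical transform names; the package's real keywords may differ.
TRANSFORMS = {
    "raw": lambda values: list(values),
    "log10": lambda values: [math.log10(v) for v in values],
    "reciprocal": lambda values: [1.0 / v for v in values],
}

frequencies = [1000.0, 100.0, 10.0]
for name, transform in TRANSFORMS.items():
    print(name, transform(frequencies))
```

Each transformed column can then be plotted on its own axis, which is what makes one raw dataset yield many different images.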

This is a potentially infinitely expandable part of the project, and we eventually want to train an algorithm to create 'best guesses' of possible data-rich transformations to try! **For now, though, users will create their own Configuration File, with the list of transformations to try on their own data.** We have made this quite easy to set up, as shown in the image below:

Fundamentally, what you are creating is a simple **Dictionary of Commands**, and then passing (at the top) an **Ordered List** of those commands *(because the dictionary on its own does not tell us which order you want them run in!)* The **Instructions** in the config.yaml file give a bit more explanation than we will here. Feel free to reach out if further explanation would help.

<img src="./doc/Images/quickstart_transformconfig.PNG" width=500 p align="right" />

As mentioned, the **Ordered List, called "tform_command_list"**, is simply the commands below that you want to run, in the order you want to run them. If you only want to run a few, just leave the rest off this list! You don't need to remove them from the dictionary!

Finally, the structure of the **"tform_command_dict"** is explained a bit more in the image: 
   * Be sure that the [0] entry is in the range $0\Rightarrow5$, since that is like saying "RGBrgb" for the X and then Y axis!
   * Be sure that the [1] entry is on the **list of Transformations**! That list works a lot like this dictionary: each keyword calls a function in the package that performs that transformation. As we grow the package, this list will get ever more creative and may include 2D Transformations as well!
   * Finally, be sure the [2] entry corresponds to the **Column Numbers** of your raw data (as it will be imported; see Section 1). 
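The three checks above can be sketched as a small validator. The command shown matches the structure this notebook prints for `Nyquist_like_Z_Z`; `check_command` is a hypothetical helper, the set of known transforms contains only the one keyword confirmed in this notebook, and zero-based column numbering with 8 numeric columns is an assumption.

```python
# Each command is a list of [RGBrgb index, transform keyword, column number].
tform_command_dict = {
    "Nyquist_like_Z_Z": [[0, "1d_raw", 6], [5, "1d_raw", 7]],
}
tform_command_list = ["Nyquist_like_Z_Z"]  # run order; a subset of the dict is fine

# Assumption: only '1d_raw' is confirmed here; the full list lives in the package.
KNOWN_TRANSFORMS = {"1d_raw"}

def check_command(entries, n_columns, known_transforms=KNOWN_TRANSFORMS):
    """Return a list of problems found in one command (empty if none)."""
    problems = []
    for position, (channel, transform, column) in enumerate(entries):
        if channel not in range(6):
            problems.append("entry {}: [0] must be 0..5, got {!r}".format(position, channel))
        if transform not in known_transforms:
            problems.append("entry {}: unknown transform {!r}".format(position, transform))
        # Assumes zero-based column numbers.
        if not 0 <= column < n_columns:
            problems.append("entry {}: column {!r} is out of range".format(position, column))
    return problems

for name in tform_command_list:
    print(name, check_command(tform_command_dict[name], n_columns=8) or "ok")
```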

In [37]:
if DEMO: 
    from hardy.arbitrage import arbitrage as arbitrage
    command_list, command_dict = arbitrage.import_tform_config(tform_config_file, raw_df=data_tuples_list[0][1])
    print("Successfully Found and Tested the Transform Config!")
    print(command_list)
    print("\nFirst Transform to run is:\n\n{}".format(command_list[0]))
    print(command_dict[command_list[0]])
    

Successfully Loaded 11 Transforms to Try!
Successfully Found and Tested the Transform Config!
['Nyquist_like_Z_Z', 'Bode_like_LogF_ZPhase', 'NyBode_like_LogF_ZZ', 'Raw_FZ_Z', 'LogF_RawZZ', 'Recip_FZZ', 'Raw_FZPhase', 'LogF_RawZPhase', 'Recip_FZPhase', 'Raw_FZZ_ZPhase', 'LogF_ZZ_ZPhase']

First Transform to run is:

Nyquist_like_Z_Z
[[0, '1d_raw', 6], [5, '1d_raw', 7]]


## Section 3. Configuring the Model and Tuner



For the classifier section of the package, there are two separate configuration files:

* cnn_config.yaml


_A configuration file that contains the hyperparameters to use in the single convolutional neural network.
The configuration file is easy to fill out and interact with._ 


__Note__: Make sure that the hyperparameters found in the config file are also used in the CNN model.


<img src="./doc/Images/quickstart_cnn_config.PNG" width=500 p align="center" />


* tuner_config.yaml
    A configuration file containing the hyperparameter search space for the tuning step. Use it in place of the single CNN model configuration. 
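The difference between the two files can be illustrated with plain dictionaries: a single configuration fixes one value per hyperparameter, while a search space lists several values to try. The hyperparameter names and values below are hypothetical, and the exhaustive grid is used only for illustration; the package's tuner may explore the space differently.

```python
import itertools

# Hypothetical hyperparameter names and values, for illustration only.
single_config = {"learning_rate": 0.001, "filters": 32, "kernel_size": 3}
search_space = {
    "learning_rate": [0.01, 0.001],
    "filters": [16, 32, 64],
    "kernel_size": [3, 5],
}

def expand_grid(space):
    """Yield every combination of values in the search space as a config dict."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))

candidates = list(expand_grid(search_space))
print(len(candidates), "candidate configurations")
```

Even this small space produces many candidate models, which is one reason a tuning run takes far longer than training a single CNN.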

In [5]:
import hardy.run_hardy as run
# Uses the transform config file defined at the top of this notebook (tform_config_file).
run.hardy_multi_transform(raw_data_path, tform_config_file, classifier_config_path,
                          iterator_mode='arrays', classes=['noise', ''], project_name='multi_transform')

Successfully Loaded 11 Transforms to Try!
Processing Data...	From 9002 Files:
Found 9000 CSVs...
Found 3 Labels, Only Expected 2...
	3500 Files of Label : one
	4500 Files of Label : noise
	1000 Files of Label : spread
Loaded	9000 of 9002	Files	 at rate of 201 Files per Second
	 Success!	 About 0.75 Minutes...
Making rgb Images from Data...	

  normalized_image[:, :, i] = img / (np.amax(img, axis=0))


Success in 24.72seconds!
That Took 1.32 Min !
  ...
    to  
  ['...']
  ...
    to  
  ['...']
Train for 237 steps, validate for 27 steps
Epoch 1/3


NotFoundError: Failed to create a directory: .\../eisy/examples/simulation_data/multi_transform/Nyquist_like_Z_Z/trial_2df434fb6034827fd605ee32096204f1\checkpoints\epoch_0; No such file or directory