# Project HARDy: Quickstart Guide
Updated 2020-06-09

### INTRO:
Hail and Well-Met! 
 * If you've just downloaded Project HARDy, this guide walks you through the initial setup you need to run our package and test it out on your data!
 * Or, if you're ready to go, simply click "Kernel" > "Restart & Run All" in the toolbar and watch it go!

 * **Note: As of recent testing, the default configuration may take more than 4 hours to run on a typical personal laptop. We will discuss speed and future work at the end. For your first run, we suggest starting with a subset of your data and simpler configuration files!**

To run this Quick-Start notebook, there are only **~4 Things** we need to talk about and you'll be set to go!
 1. Your Raw Data, in CSV form
 2. A Transformation Configuration File (That works with your data!)
 3. The Tuner and/or Model Configuration Files
 4. Understanding your Computer's Speed limits

For $1 \Rightarrow 3$, Let's start out by giving a Path for each one, and checking that it's pointing to something!

In [5]:
import os.path, numpy as np, pandas as pd, pickle, random
# ^ Multiple imports is a python No-No, but it's fine to save space here...

raw_data_path = '../eisy/examples/simulation_data/'        # 1) Raw Data File Path
tform_config_file = './hardy/arbitrage/tform_config.yaml'  # 2) Transformation Config File. (File, not Path!!)
classifier_config_path = './hardy/recognition/'            # 3) Classifier Configuration Path (Path, not File!!)

if os.path.exists(raw_data_path):
    print("1) Raw Data Path is Ok. Found {} Files".format(len(os.listdir(raw_data_path))))
if os.path.exists(tform_config_file):
    print("2) Transform File Path is Ok.")
if os.path.exists(classifier_config_path):
    print("3) Classifier Config Path is Ok")
print("Note: We aren't checking that the files are good, just that the paths exist!")
    

2) Transform File Path is Ok.
3) Classifier Config Path is Ok
Note: We aren't checking that the files are good, just that the paths exist!


## Section 1. Raw Data *(and File-Structure)*
<img src="./doc/Images/quickstart_DataImport.PNG" width=600 p align="right" />

First things first, let's talk about your data to make sure we know what we're working with. We read your data with a custom wrapper around the Pandas "read_csv" importer, which has the following features:
 * It allows your data to have a header (by default up to 100 lines) before the actual data. *(The importer increases "skiprows" one row at a time until it reads a clean numerical DataFrame.)*
 * We keep only the columns that contain purely numerical data, so any text columns are simply ignored, as shown.
 * The column names are kept for the final report, but we refer to columns by their Column Number, as shown.
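
The header-skipping behaviour described above can be sketched roughly as follows. This is a minimal illustration and not HARDy's actual importer: the helper name `read_numeric_csv` and the sample text are hypothetical, and only standard pandas calls are used.

```python
import io

import pandas as pd


def read_numeric_csv(csv_text, max_skiprows=100):
    """Try increasing skiprows until pandas yields numerical columns.

    Hypothetical helper for illustration only; not part of HARDy.
    Returns (numeric_frame, skiprows_used).
    """
    for skiprows in range(max_skiprows + 1):
        try:
            frame = pd.read_csv(io.StringIO(csv_text), skiprows=skiprows)
        except (pd.errors.ParserError, pd.errors.EmptyDataError):
            continue  # this many skipped rows did not parse; try one more
        # Keep only the columns whose values are all numerical
        numeric = frame.select_dtypes(include="number")
        if not numeric.empty:
            return numeric, skiprows
    raise ValueError("No numerical table found within the header limit")


# Made-up file contents: two header lines, then a table with one text column
sample = (
    "Instrument: demo\n"
    "Operator: someone\n"
    "freq,Z,comment\n"
    "1.0,2.5,ok\n"
    "2.0,3.5,ok\n"
)
numeric_frame, skipped = read_numeric_csv(sample)
print(skipped, list(numeric_frame.columns))
```

Stopping at the first `skiprows` value that produces numerical columns matters: skipping further would still parse, but it would silently promote a data row to the header.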

### 1b) File Structure and File Names:
There are two ways to pre-sort your data so that the computer can understand it:
<img src="./doc/Images/quickstart_FileNames.PNG" width=350 p align="right" />

 * By using a "label" at the end of the file name
 * By sorting the files into labeled directories beforehand. 
 
For now, we will focus on the first method, so please put a "tag" at the end of each file name: **defined as any string that comes between the final underscore and the file extension**. We hope to have the other method working soon, but it hasn't been fully tested yet.
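
As a rough sketch of that naming rule (the helper `label_from_filename` and the example file names are hypothetical, not taken from the package), the tag can be pulled out with the standard library alone:

```python
import os


def label_from_filename(file_name):
    """Return the text between the final underscore and the file extension."""
    stem, _extension = os.path.splitext(os.path.basename(file_name))
    if "_" not in stem:
        raise ValueError("No underscore-delimited tag in {!r}".format(file_name))
    # rsplit with maxsplit=1 splits only at the LAST underscore
    return stem.rsplit("_", 1)[1]


# Made-up file names for illustration
example_files = ["run_001_noise.csv", "run_002_one.csv", "sweep_a_spread.csv"]
labels = [label_from_filename(name) for name in example_files]
print(labels)
```

Running this over your own directory listing before starting a long job is a quick way to confirm that you get exactly the labels you expect.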


In [None]:
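from hardy.handling import handling  # binds the name "handling" used below

files = os.listdir(raw_data_path)
# os.listdir returns bare file names, so join each one with the data directory
fdata = handling._smart_read_csv(os.path.join(raw_data_path, files[0]), try_skiprows=5)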

## Section 2. Transformation Configuration Setup

The underlying hypothesis of our project is that the data-rich plots used to train a CNN model will almost certainly become even more informative (and thus lead to better model performance) after some number of transformations, such as Log, Reciprocal, and a range of quadratic transformations as well!

This is a potentially infinitely expandable part of the project; we eventually want to train an algorithm to create 'best guesses' of possible data-rich transformations to try! **For now, though, users create their own Configuration File, with the list of transformations to try on their own data.** We have made this quite easy to set up, as shown in the image below:

Fundamentally, what you are creating is a simple **Dictionary of Commands**, and then passing (at the top) an **Ordered List** of those commands *(because otherwise python will simply do them in alphabetical order!)*. The **Instructions** in the config.yaml file give a bit more explanation than we will here. Feel free to reach out if further explanation would help.

<img src="./doc/Images/quickstart_transformconfig.PNG" width=500 p align="right" />

As mentioned, the **Ordered List, called "tform_command_list"**, is simply the commands below in whatever order you want them to run. Also, if you only want to run a few, just leave the rest off of this list! You don't need to remove them from the dictionary!
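
A minimal sketch of how the list and the dictionary interact is shown here. The command names and entries are invented for illustration, and `commands_to_run` is a hypothetical helper, not a HARDy function.

```python
# Hypothetical command names and entries, for illustration only
tform_command_dict = {
    "log_plot": [[0, "log", 0], [3, "log", 1]],
    "raw_plot": [[0, "raw", 0], [3, "raw", 1]],
    "reciprocal_plot": [[0, "reciprocal", 0], [3, "reciprocal", 1]],
}
# Only the listed commands run, in exactly this order
tform_command_list = ["reciprocal_plot", "log_plot"]


def commands_to_run(command_list, command_dict):
    """Pair each listed command with its definition, keeping the list order."""
    missing = [name for name in command_list if name not in command_dict]
    if missing:
        raise KeyError("Commands not in the dictionary: {}".format(missing))
    return [(name, command_dict[name]) for name in command_list]


selected = commands_to_run(tform_command_list, tform_command_dict)
print([name for name, _definition in selected])
```

Here "raw_plot" stays in the dictionary but is skipped because it is not on the list.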

Finally, the structure of the **"tform_command_dict"** is explained a bit more in the image: 
   * Be sure that the [0] entry is in the range of $0\Rightarrow5$, since that is like saying "RGBrgb" for the X and then Y axis!
   * Be sure that the [1] entry is on the **list of Transformations**! That list works a lot like this dictionary in the package, where each keyword will call a function to perform that transformation. As we grow the package, this list will get ever more creative and may include 2D Transformations as well!
   * Finally, be sure the [2] entry corresponds to the **Column Numbers** of your raw data (as it will be imported; see Section 1). 
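
The three checks above can be sketched as a small validator. It assumes each entry is a three-item list of plot code, transformation keyword, and column number. The keyword set below is a placeholder, not the package's real list, and `check_entry` is a hypothetical helper.

```python
# Placeholder keywords; consult the package for the real list of transformations
KNOWN_TRANSFORMS = {"raw", "log", "reciprocal"}
N_PLOT_CODES = 6  # valid plot codes are 0 through 5


def check_entry(entry, n_columns):
    """Return a list of problems found in one [plot_code, transform, column] entry."""
    if len(entry) != 3:
        return ["expected 3 items, got {}".format(len(entry))]
    plot_code, transform, column = entry
    problems = []
    if plot_code not in range(N_PLOT_CODES):
        problems.append("plot code {!r} is outside 0..5".format(plot_code))
    if transform not in KNOWN_TRANSFORMS:
        problems.append("unknown transformation {!r}".format(transform))
    if column not in range(n_columns):
        problems.append("column {!r} is not in the imported data".format(column))
    return problems


print(check_entry([0, "log", 1], n_columns=3))    # a well-formed entry
print(check_entry([6, "cube", 7], n_columns=3))   # fails all three checks
```

Checking a configuration this way before a run is much cheaper than discovering a bad entry partway through a multi-hour job.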

In [5]:
import hardy.run_hardy as run
# The transform config variable defined earlier is named tform_config_file
run.hardy_multi_transform(raw_data_path, tform_config_file, classifier_config_path,
                          iterator_mode='arrays', classes=['noise', ''], project_name='multi_transform')

Successfully Loaded 11 Transforms to Try!
Processing Data...	From 9002 Files:
Found 9000 CSVs...
Found 3 Labels, Only Expected 2...
	3500 Files of Label : one
	4500 Files of Label : noise
	1000 Files of Label : spread
Loaded	9000 of 9002	Files	 at rate of 201 Files per Second
	 Success!	 About 0.75 Minutes...
Making rgb Images from Data...	

  normalized_image[:, :, i] = img / (np.amax(img, axis=0))


Success in 24.72seconds!
That Took 1.32 Min !
  ...
    to  
  ['...']
  ...
    to  
  ['...']
Train for 237 steps, validate for 27 steps
Epoch 1/3


NotFoundError: Failed to create a directory: .\../eisy/examples/simulation_data/multi_transform/Nyquist_like_Z_Z/trial_2df434fb6034827fd605ee32096204f1\checkpoints\epoch_0; No such file or directory

In [12]:
import numpy as np
image_labels = [ 'noisy', 'not_noisy', 'noisy', 'not_noisy' ,'noisy', 'extra']
# image_data = [( 'file_1', [1,2,3,4], 'not_noisy'), ( 'file_2', [1,2,3,4], 'noisy'), ( 'file_3', [1,2,3,4], 'not_noisy'), ( 'file_4', [1,2,3,4], 'noisy')]

In [13]:
np.unique(image_labels)

array(['extra', 'noisy', 'not_noisy'], dtype='<U9')

In [14]:
for i, label in enumerate(np.unique(image_labels)):
    for j in range(len(image_labels)):
        if image_labels[j]== label:
            image_labels[j] = i
        
print(image_labels)

[1, 2, 1, 2, 1, 0]


In [1]:
raw_data_path = '../eisy/examples/simulation_data/'