# DeepSkies Toolbox 

DeepUtils is a suite of tools that cuts out the boilerplate of machine learning. 
It automates initial data analysis, processing, loading, and testing/evaluation metrics. 
It is intended to be distributed via PyPI so that it can be installed with pip; for now, it is installed directly from GitHub.
Once it is installed, restart your kernel and load it with an import statement.

This tutorial walks you through the main workflow and ends with a small challenge problem that shows a further application.

In [None]:
"""
"Beta" additions: 
  - Add command line help function for detailing information about the configuration options.
  - Make the task types more fine-grained. Ex, 'object-detection', 'mask', 'anomaly-detection'
  - Module for more easily interacting with matplotlib.
  - Separating diagnostics out as their own module.
"""

In [None]:
"""
TODO:
  - Add the pythonic definitions as well, time permitting.
"""

In [14]:
!pip install git+https://<YOUR_GITHUB_TOKEN>@github.com/AeRabelais/DeepUtilities-demo.git
# Replace <YOUR_GITHUB_TOKEN> with the organization token for the GitHub download.

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://****@github.com/AeRabelais/DeepUtilities-demo.git
  Cloning https://****@github.com/AeRabelais/DeepUtilities-demo.git to /tmp/pip-req-build-il4w49t1
  Running command git clone --filter=blob:none --quiet 'https://****@github.com/AeRabelais/DeepUtilities-demo.git' /tmp/pip-req-build-il4w49t1
  Resolved https://****@github.com/AeRabelais/DeepUtilities-demo.git to commit bd1d84ea7d722b309041622460367e84809c7ac8
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pytest-cov<4.0.0,>=3.0.0
  Downloading pytest_cov-3.0.0-py3-none-any.whl (20 kB)
Collecting coverage[toml]>=5.2.1
  Downloading coverage-7.2.2-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (227 kB)

In [None]:
# Uncomment below upon installation!
# import deeputils 

## DeepBench for Data Generation 

### Generating your configuration YAML

In [None]:
# Generate a default configuration yaml using the following command:

#### Understanding your Configuration File's Content

...

### Generating Random Data Automatically

In [None]:
# Run the following command to generate data on the fly, using the default config:


### Updating your Configuration YAML

In [None]:
# Use the following command for updating the yaml.


## Understanding the YAML Configuration File

Little explainer about the yaml configuration file.


### What's in the Configuration File?

An example yaml configuration file would look as follows:

```
DATA:
  CONFIGURATION: variable that gives info on how your dataset is configured.
  DATA_PATH: path of the training data
  LABEL_PATH: path of the labels for the training data
MODEL_DATA:
  MODEL: model type to use for training, if the user already has a preference.
  PRE_MODEL: saved model files from another task, if applicable.
  TASK_TYPE: The type of ML task being undertaken.
MODEL_PARAMS:
  BATCH_SIZE: The preferred initial batch size.
  LEARNING_RATE: The preferred initial learning rate.
  OPTIMIZER: The preferred initial optimizer.
OUTPUT:
  OUTPUT_PATH: The path where all processing, training, testing, and diagnostic data will be output.
```
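Before a run, it can help to confirm that a loaded configuration has every section and key shown above. The following is a minimal sketch, not part of DeepUtils: the helper name `missing_keys` and the example values are hypothetical, and it assumes the YAML file has already been parsed into a plain Python dictionary and that every key is expected to be present (even if left empty).

```python
# Sections and keys taken from the example configuration above.
REQUIRED_KEYS = {
    "DATA": {"CONFIGURATION", "DATA_PATH", "LABEL_PATH"},
    "MODEL_DATA": {"MODEL", "PRE_MODEL", "TASK_TYPE"},
    "MODEL_PARAMS": {"BATCH_SIZE", "LEARNING_RATE", "OPTIMIZER"},
    "OUTPUT": {"OUTPUT_PATH"},
}


def missing_keys(config):
    """Return the names of missing sections and missing 'SECTION.KEY' entries."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in config:
            problems.append(section)
            continue
        # Sort so the report is stable from run to run.
        for key in sorted(keys - set(config[section])):
            problems.append(f"{section}.{key}")
    return problems


# Hypothetical, partially filled configuration (values are placeholders).
example_config = {
    "DATA": {"CONFIGURATION": "MNIST", "DATA_PATH": "", "LABEL_PATH": ""},
    "MODEL_DATA": {"MODEL": "", "PRE_MODEL": "", "TASK_TYPE": "CLASSIFICATION"},
    "MODEL_PARAMS": {"BATCH_SIZE": 32, "OPTIMIZER": "adam"},
}

report = missing_keys(example_config)
print(report)
```

Here the report flags the missing learning rate and the missing `OUTPUT` section, which is usually easier to act on than a failure partway through a run.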

Some of the YAML variables come with a few preset options. 



```
CONFIGURATION: {
    "LABELLED_IMAGE_CSV": images and accompanying csv of label information
    "LABELLED_IMAGE_FOLDERS": images broken into folders containing their labels
    "LABELLED_NUMPY": images held in training data and label numpy arrays
    "UNLABELLED": no labels, just images
    "DEEPBENCH": want to generate some random data from DeepBench
    "DEEPGOTDATA": want to download data from DeepGotData
    "MNIST": Load MNIST data
    "CIFAR": Load CIFAR data
    "FMNIST": Load FMNIST data
}

MODEL: Any of the primary models listed via PyTorch's Model Zoo: https://pytorch.org/vision/stable/models.html

TASK_TYPE: ["CLASSIFICATION", "REGRESSION", "FULL_NEURAL_NETWORK"]
```
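The preset values listed above can be checked in the same spirit. This is a small sketch under stated assumptions: the function name `preset_errors` is hypothetical, and the check treats the presets as case-sensitive, which may not match how DeepUtils itself parses them.

```python
# Preset values copied from the lists above.
CONFIGURATION_OPTIONS = {
    "LABELLED_IMAGE_CSV", "LABELLED_IMAGE_FOLDERS", "LABELLED_NUMPY",
    "UNLABELLED", "DEEPBENCH", "DEEPGOTDATA", "MNIST", "CIFAR", "FMNIST",
}
TASK_TYPES = {"CLASSIFICATION", "REGRESSION", "FULL_NEURAL_NETWORK"}


def preset_errors(configuration, task_type):
    """Return a message for each value that is not one of the presets."""
    errors = []
    if configuration not in CONFIGURATION_OPTIONS:
        errors.append(f"unknown CONFIGURATION: {configuration!r}")
    if task_type not in TASK_TYPES:
        errors.append(f"unknown TASK_TYPE: {task_type!r}")
    return errors


ok = preset_errors("MNIST", "CLASSIFICATION")   # both values are presets
bad = preset_errors("mnist", "CLUSTERING")      # wrong case, unlisted task
print(ok, bad)
```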



### Create or update config.yaml using the CLI

In [None]:
# Create a default config yaml using the CLI
!deeputils newyaml --config-file-path XX

# new_yaml(config_file_path)

In [None]:
# Update the config yaml using the CLI
!deeputils yaml --path XX --config_dict XX

## Full Run Command


Assuming your files and data are configured correctly, you can perform a full run of DeepUtils with the following command(s):

In [None]:
# For this demo, run the full pipeline with the demo command below.
# Outside the demo, a full run uses the command:
# !deeputils full-run config-file-path XX

!deeputils demo-run config-file-path XX 

## Start: Ingesting Data

DeepUtils can ingest data in a few ways (this list will likely grow to include more astrophysics-centered download options): you can use example data from common benchmark datasets, randomly generated astrophysical data, or your own custom dataset.

In [None]:
# Let's use a demo version of the ingestion command for the "custom" dataset input types.
# This version downloads some small example custom datasets.
# It also automatically updates your yaml file with the data paths and data configuration type. ONLY FOR DEMO.

!deeputils demo-ingest config-file-path "./config/"  

# Usually, the command would be:
# !deeputils ingest config-file-path "./config/"  

In [None]:
# If we wanted to use generated DeepBench data instead, we'd do the following:
# !deeputils create-bench-config file-path "./config/" --config_dict {dictionary/values/of/bench/data}

!deeputils create-bench-config config-file-path "./config/" --config_dict "{num_objects: 50}"

In [None]:
# If we wanted to download data from DeepGotData, we'd use the command below.
# data_path is optional; it defaults to the current working directory.

#!deeputils download-got-data --project_id XX --data_path XX

In [None]:
# We'll begin the pipeline by running the command for a demo of DeepUtils.

!deeputils demo-ingest config-file-path "./config/"

## Preprocessing Data

(If you have not added any data yet, run the previous step first!) Data preprocessing via DeepUtils automates basic feature analysis using pandas-profiling, erasing missing data, replacing missing data (when applicable), interpolating data (when applicable), removing outliers, and calculating feature importance.
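To make two of these steps concrete, here is a conceptual sketch of erasing missing data and removing outliers. It is not DeepUtils's implementation: the function names are hypothetical, missing values are assumed to be `None`, and the outlier rule (1.5 times the interquartile range) is one common choice swapped in for illustration.

```python
from statistics import quantiles


def erase_missing(rows):
    """Drop every row that contains a missing (None) value."""
    return [row for row in rows if all(value is not None for value in row)]


def remove_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if low <= value <= high]


# Toy data: the second row has a missing entry.
rows = [(1.0, 2.0), (None, 3.0), (4.0, 5.0)]
clean_rows = erase_missing(rows)

# Toy data: 95 sits far outside the bulk of the values.
values = [10, 12, 11, 13, 12, 11, 95]
filtered = remove_outliers(values)
print(clean_rows, filtered)
```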

---



In [None]:
# Running basic preprocessing is as simple as follows:
!deeputils demo-preprocess config-file-path "./config/" --option/featuring/desired/preprocessing/method --erase_missing

## Model Selection, Training, and Diagnosis

Selecting an initial model and training configuration is done primarily in the config file, so running the command line script is as easy as running a short, one-line command.

Diagnostics are not a separate command; they run automatically when training and testing conclude.

In [None]:
# Use --architecture-search if you want to run an architecture search for your model

!deeputils demo-modelling config-file-path "./config/" --architecture-search {TRUE/FALSE} --saved-model {/path/to/config/model}

## Assignment: Do it again



Now it's time to put this into action, with one critical difference. 
This time, use an image dataset (like MNIST or a truncated CIFAR-10). 

* Load in your image dataset 
* Run an EDA 
* Preprocess, if you feel it necessary 
* Write a small convolutional net, and train it 
* Test how good your results are! 
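
When planning the small convolutional net, it helps to know how large the feature map is after each layer so that the final fully connected layer has the right input size. The sketch below uses the standard output-size formula for convolution and pooling layers without dilation. The layer choices and the channel count are hypothetical, picked only to illustrate the arithmetic for 28x28 MNIST images.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # MNIST images are 28x28 pixels
size = conv_output_size(size, kernel=3, padding=1)  # 3x3 conv, padding 1
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max-pool
size = conv_output_size(size, kernel=3, padding=1)  # 3x3 conv, padding 1
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max-pool

channels = 32  # hypothetical number of output channels of the last conv layer
flat_features = channels * size * size  # input size of the first linear layer
print(size, flat_features)
```

The same helper works for CIFAR-10 by starting from `size = 32`.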