# Step 1. Installing Python Dependencies
## Install the following packages
* [filelock](https://github.com/tox-dev/filelock)
* [flask](https://flask.palletsprojects.com/en)
* [gunicorn](https://flask.palletsprojects.com/en/3.0.x/deploying/gunicorn/)
* [h5py](https://docs.h5py.org/en/stable/build.html)
* [hdf5plugin](https://github.com/silx-kit/hdf5plugin)
* [msgpack](https://pypi.org/project/msgpack/)
* [numpy](https://numpy.org/install/)
* [pandas](https://pandas.pydata.org/docs/getting_started/index.html#getting-started)
* [papermill](https://papermill.readthedocs.io/en/latest/index.html)
* [pyyaml](https://pyyaml.org/wiki/PyYAMLDocumentation)
* [rapidfuzz](https://github.com/maxbachmann/RapidFuzz)
* [ratelimit](https://pypi.org/project/ratelimit/)
* [scipy](https://scipy.org/install/)
* [tables](https://pypi.org/project/tables/)
* [torch](https://pytorch.org/get-started/locally/)
* [tqdm](https://github.com/tqdm/tqdm)
* [waitress](https://pypi.org/project/waitress/)
* [zstandard](https://pypi.org/project/zstandard/)
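
Before moving on, you can confirm that every package above is importable from the active environment. This sketch is not part of the repository. The `REQUIRED` mapping from PyPI names to import names is an assumption (for example, `pyyaml` is imported as `yaml`), and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Hypothetical mapping from the PyPI names listed above to the module names
# they are imported under (the two differ for pyyaml).
REQUIRED = {
    "filelock": "filelock",
    "flask": "flask",
    "gunicorn": "gunicorn",
    "h5py": "h5py",
    "hdf5plugin": "hdf5plugin",
    "msgpack": "msgpack",
    "numpy": "numpy",
    "pandas": "pandas",
    "papermill": "papermill",
    "pyyaml": "yaml",
    "rapidfuzz": "rapidfuzz",
    "ratelimit": "ratelimit",
    "scipy": "scipy",
    "tables": "tables",
    "torch": "torch",
    "tqdm": "tqdm",
    "waitress": "waitress",
    "zstandard": "zstandard",
}


def missing_packages(required=REQUIRED):
    """Return the PyPI names whose top-level module cannot be found.

    find_spec locates a module without importing it, so the check stays fast
    even for heavy packages such as torch.
    """
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing; install with: pip install " + " ".join(missing))
    else:
        print("All Python dependencies are importable.")
```

Any names it reports can be passed straight to `pip install`, because the keys of `REQUIRED` are the PyPI names.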

## Install the following frameworks
* [cuda](https://developer.nvidia.com/cuda-downloads)
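
`torch` is the listed package that uses the GPU, so one quick check is to ask PyTorch whether it can see a CUDA device. This is a minimal sketch, and `cuda_status` is a hypothetical helper. PyTorch wheels may bundle their own CUDA runtime, so the result reflects the GPU as PyTorch sees it rather than the toolkit installation itself.

```python
import importlib.util


def cuda_status() -> str:
    """Describe whether the installed PyTorch build can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also runs without torch

    if not torch.cuda.is_available():
        # Either no GPU/driver is visible, or torch is a CPU-only build
        # (in which case torch.version.cuda is None).
        return f"CUDA unavailable (torch built with CUDA {torch.version.cuda})"
    return (f"CUDA available: {torch.cuda.device_count()} device(s), "
            f"torch built with CUDA {torch.version.cuda}")


if __name__ == "__main__":
    print(cuda_status())
```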

# Step 2. Installing Julia and Dependencies
* Install Julia (version >= 1.9) from https://julialang.org/downloads/
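
To check the version requirement from a script, you can parse the output of `julia --version`, which prints a line such as `julia version 1.10.2`. The helpers below (`parse_julia_version`, `julia_meets_minimum`) are hypothetical, and the sketch assumes the `julia` executable is on your `PATH`.

```python
import re
import shutil
import subprocess


def parse_julia_version(text: str) -> tuple:
    """Extract (major, minor, patch) from output like 'julia version 1.10.2'."""
    match = re.search(r"julia version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def julia_meets_minimum(minimum=(1, 9)) -> bool:
    """Return True if the `julia` found on PATH reports a version >= minimum."""
    exe = shutil.which("julia")
    if exe is None:
        return False  # Julia is not installed or not on PATH
    out = subprocess.run([exe, "--version"], capture_output=True,
                         text=True, check=True).stdout
    # Compare only (major, minor) against the required minimum.
    return parse_julia_version(out)[:2] >= minimum


if __name__ == "__main__":
    print("Julia >= 1.9 found" if julia_meets_minimum() else "Julia >= 1.9 not found")
```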

## Install the following packages
* [CSV](https://github.com/JuliaData/CSV.jl)
* [CodecZstd](https://github.com/JuliaIO/CodecZstd.jl)
* [DataFrames](https://dataframes.juliadata.org/stable/man/getting_started/)
* [H5Zblosc](https://github.com/JuliaIO/HDF5Plugins.jl)
* [Glob](https://github.com/vtjnash/Glob.jl)
* [HDF5](https://github.com/JuliaIO/HDF5.jl)
* [HTTP](https://github.com/JuliaWeb/HTTP.jl)
* [IJulia](https://github.com/JuliaLang/IJulia.jl)
* [JLD2](https://github.com/JuliaIO/JLD2.jl)
* [JSON](https://github.com/JuliaIO/JSON.jl)
* [JuMP](https://jump.dev)
* [JupyterFormatter](https://juliahub.com/ui/Packages/JupyterFormatter/Qolop/0.1.0)
* [LoggingExtras](https://github.com/JuliaLogging/LoggingExtras.jl)
* [Memoize](https://github.com/JuliaCollections/Memoize.jl)
* [MLUtils](https://github.com/JuliaML/MLUtils.jl)
* [MsgPack](https://github.com/JuliaIO/MsgPack.jl)
* [NBInclude](https://github.com/stevengj/NBInclude.jl)
* [NNlib](https://github.com/FluxML/NNlib.jl)
* [Optim](https://julianlsolvers.github.io/Optim.jl/stable/#)
* [Oxygen](https://github.com/OxygenFramework/Oxygen.jl)
* [ProgressMeter](https://github.com/timholy/ProgressMeter.jl)
* [SCIP](https://github.com/scipopt/SCIP.jl)
* [Setfield](https://github.com/jw3126/Setfield.jl)
* [StatsBase](https://github.com/JuliaStats/StatsBase.jl)
* [YAML](https://github.com/JuliaData/YAML.jl)
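
All of these can be installed in one call to Julia's `Pkg.add`, which accepts a vector of package names. The sketch below only builds and prints the shell command; `JULIA_PACKAGES` and `julia_install_command` are hypothetical names. Packages are added to whichever Julia environment is active, which is the default environment unless you pass `--project`. `IJulia` is the package that provides the Jupyter kernel used to run the Julia notebooks in later steps.

```python
import shlex

# The Julia packages listed above, in the same order.
JULIA_PACKAGES = [
    "CSV", "CodecZstd", "DataFrames", "H5Zblosc", "Glob", "HDF5", "HTTP",
    "IJulia", "JLD2", "JSON", "JuMP", "JupyterFormatter", "LoggingExtras",
    "Memoize", "MLUtils", "MsgPack", "NBInclude", "NNlib", "Optim", "Oxygen",
    "ProgressMeter", "SCIP", "Setfield", "StatsBase", "YAML",
]


def julia_install_command(packages):
    """Build an argv that installs `packages` with a single Pkg.add call."""
    names = ", ".join(f'"{name}"' for name in packages)
    return ["julia", "-e", f"using Pkg; Pkg.add([{names}])"]


if __name__ == "__main__":
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(julia_install_command(JULIA_PACKAGES)))
```

To run the command directly instead of printing it, pass the list to `subprocess.run(..., check=True)`.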

# Step 3. Curating Datasets
* Run `notebooks/API/API/GenerateMalToken.ipynb`
* Run `notebooks/API/API/GenerateKitsuToken.ipynb`
* Run `notebooks/API/WebEndpoints/WebScripts.ipynb`
* Run `notebooks/API/ApiEndPoints/Executors/ContinuousScripts.ipynb`
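
If you prefer to run these notebooks headlessly rather than opening them in Jupyter, `papermill` (listed in Step 1) can execute them in order. This is a sketch under assumptions. It assumes the notebooks run to completion with their default parameters, and that the kernels named in their metadata (including the IJulia kernel for Julia notebooks) are installed. `run_notebooks` and `CURATION_NOTEBOOKS` are hypothetical names. If a notebook is designed to keep running, as the name `ContinuousScripts` suggests it may be, papermill will block until it exits.

```python
from pathlib import Path

# Step 3 notebooks, in the order listed above (paths relative to the repo root).
CURATION_NOTEBOOKS = [
    "notebooks/API/API/GenerateMalToken.ipynb",
    "notebooks/API/API/GenerateKitsuToken.ipynb",
    "notebooks/API/WebEndpoints/WebScripts.ipynb",
    "notebooks/API/ApiEndPoints/Executors/ContinuousScripts.ipynb",
]


def run_notebooks(paths, output_dir="executed", executor=None):
    """Execute notebooks one at a time, in order.

    Executed copies go to output_dir so the source notebooks are left
    untouched. If a notebook fails, the executor's exception propagates and
    the remaining notebooks are not run. `executor` defaults to
    papermill.execute_notebook; any callable with the same arguments works.
    """
    if executor is None:
        import papermill  # imported lazily; papermill is listed in Step 1

        executor = papermill.execute_notebook
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in map(Path, paths):
        # Several notebooks share a base name (e.g. RunAllScripts.ipynb), so
        # flatten the relative path into a unique output file name.
        target = out / "__".join(path.parts)
        # Run each notebook from its own folder so relative paths inside it
        # resolve the same way as when it is opened in Jupyter.
        executor(str(path.resolve()), str(target.resolve()),
                 cwd=str(path.parent.resolve()))
        written.append(target)
    return written
```

From the repository root, `run_notebooks(CURATION_NOTEBOOKS)` would execute the four notebooks in sequence and leave executed copies under `executed/`.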

# Step 4. Setting Up the Environment
* The codebase can be run in one of the following modes:
  * Research: reserves the last N days of data for out-of-sample testing
  * Production: trains new models on all available data
  * Streaming: fine-tunes the pretrained production models on new data
* The mode is configured by populating artifacts in the `environment/` folder. By default, the codebase runs in research mode.
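
To make research mode concrete, here is a minimal sketch of a last-N-days holdout. It is an illustration only. The `(day, payload)` record format, the `research_split` name, and measuring the window back from a `today` date are all assumptions, not a description of how the `environment/` artifacts actually configure the split. In production mode the equivalent would be to skip the split and train on every record.

```python
from datetime import date, timedelta


def research_split(records, n_days, today):
    """Hold out records from the last n_days up to `today` (research mode).

    records: iterable of (day, payload) pairs, where day is a datetime.date.
    Returns (train, test): records on or before the cutoff go to train, and
    records from the last n_days form the out-of-sample test set.
    """
    cutoff = today - timedelta(days=n_days)
    train, test = [], []
    for day, payload in records:
        (train if day <= cutoff else test).append((day, payload))
    return train, test


if __name__ == "__main__":
    history = [(date(2024, 1, d), f"event-{d}") for d in range(1, 11)]
    train, test = research_split(history, n_days=3, today=date(2024, 1, 10))
    print(len(train), len(test))  # 7 3
```

With `n_days=3` and `today` set to January 10, January 8 through 10 are held out, so no model trained in research mode sees those days.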

# Step 5. Preprocessing Data
* Run `notebooks/ImportDatasets/RunAllScripts.ipynb`
* Run `notebooks/ProcessData/RunAllScripts.ipynb`

# Step 6. Training Models
* Run `notebooks/TrainingAlphas/Transformer/RunAllScripts.ipynb`
* Run `notebooks/TrainingAlphas/Baseline/RunAllScripts.ipynb`
* Run `notebooks/TrainingAlphas/BagOfWords/RunAllScripts.ipynb`
* Run `notebooks/TrainingAlphas/Nondirectional/RunAllScripts.ipynb`
* Run `notebooks/TrainingAlphas/Ensemble/RunAllScripts.ipynb`
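
Training can take a long time, so it may be convenient to make the sequence resumable. The sketch below runs the training notebooks in the order listed, presumably with the Ensemble last because it combines the models trained before it. It writes a marker file after each notebook succeeds, so a rerun after a crash starts at the first unfinished notebook. It assumes `papermill` can execute these notebooks headlessly and that rerunning a failed notebook from scratch is safe. `run_with_checkpoints`, `TRAINING_NOTEBOOKS`, and the `.train_state` folder are hypothetical. Delete the state folder whenever upstream data changes (for example, after rerunning Step 5), otherwise stale markers will skip retraining.

```python
from pathlib import Path

# Step 6 notebooks, in the order listed above (paths relative to the repo root).
TRAINING_NOTEBOOKS = [
    "notebooks/TrainingAlphas/Transformer/RunAllScripts.ipynb",
    "notebooks/TrainingAlphas/Baseline/RunAllScripts.ipynb",
    "notebooks/TrainingAlphas/BagOfWords/RunAllScripts.ipynb",
    "notebooks/TrainingAlphas/Nondirectional/RunAllScripts.ipynb",
    "notebooks/TrainingAlphas/Ensemble/RunAllScripts.ipynb",
]


def run_with_checkpoints(paths, state_dir=".train_state", executor=None):
    """Run notebooks in order, skipping any that completed on an earlier call.

    Returns the notebooks actually executed on this call. `executor`
    defaults to papermill.execute_notebook.
    """
    if executor is None:
        import papermill  # imported lazily; papermill is listed in Step 1

        executor = papermill.execute_notebook
    state = Path(state_dir)
    state.mkdir(parents=True, exist_ok=True)
    ran = []
    for path in map(Path, paths):
        key = "__".join(path.parts)  # unique name per notebook path
        marker = state / (key + ".done")
        if marker.exists():
            continue  # finished on a previous run
        # The executed copy is saved next to the markers for inspection.
        executor(str(path.resolve()), str((state / key).resolve()),
                 cwd=str(path.parent.resolve()))
        marker.touch()  # only reached if the executor did not raise
        ran.append(str(path))
    return ran
```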

# Step 7. Starting the Web Server
* Run `notebooks/Microservices/Package.ipynb`
* Run `notebooks/Microservices/Deploy.ipynb`
* Open `localhost:3000` in a browser to connect to the web app
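
The services can take a moment to come up after deployment. This standard-library sketch polls `http://localhost:3000` until something answers before you open the browser. `wait_for_server` and its default timeout and interval are assumptions. The check only confirms that an HTTP server is listening on the port, not that the app is fully healthy.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:3000", timeout=60.0, interval=1.0):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Any HTTP response, including an error status, counts as "up", because it
    means a server is listening and speaking HTTP on that port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server responded, just not with a 2xx status
        except (OSError, http.client.HTTPException):
            pass  # connection refused, timed out, or garbled response
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_server()` returns `True` once the web app is reachable, or `False` after about a minute.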