# Instructions for running the solution
## Initial setup
The solution depends on the following Python packages:
  - numpy
  - pandas
  - scikit-learn
  - pydot
  - openpyxl
  - xlsxwriter
  - joblib
  - seaborn

You can either install these packages manually or create a new Anaconda environment that contains all of them.
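
If you install the packages manually, a small check like the one below may help confirm that they are importable before you run any script. This is only a sketch: the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names and are not part of the solution. It relies on the fact that scikit-learn is imported as `sklearn`, while the other packages import under their own names.

```python
import importlib.util

# Package name from the list above -> module name used in `import`.
# REQUIRED and missing_packages are illustrative names, not part of the solution.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "pydot": "pydot",
    "openpyxl": "openpyxl",
    "xlsxwriter": "xlsxwriter",
    "joblib": "joblib",
    "seaborn": "seaborn",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks the modules up without importing them, so it does not verify package versions.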
### Anaconda setup
1. Open an Anaconda prompt in the solution directory.
2. Run the following command, replacing `envName` with an environment name of your choice:
```shell
conda env update -n envName -f deps.yaml
```
3. Activate the new environment:
```shell
conda activate envName
```
4. Run the scripts, replacing `script.py` with the script you want to run:
```shell
python script.py
```

## Submission files
- `raw_data.xlsx` - unparsed training data
- `instructions.ipynb` - instructions for using the solution, as a Jupyter notebook
- `instructions.html` - instructions for using the solution, as an HTML file
- `documentation.ipynb` - documentation of the work, as a Jupyter notebook
- `documentation.html` - documentation of the work, as an HTML file
- `constants.py` - general constants used by the code
- `utils.py` - general-purpose utility functions used by the code
- `train.py` - a script that trains the AI model using logistic regression and saves it to the file `model.sav`
- `prepare.py` - a script that parses `raw_data.xlsx` and creates the file `training_data.xlsx` with derived features. Not needed if only `train.py` is used.
- `train-logistic-regression.py` - trains a logistic regression model. Requires `prepare.py` to have been run in advance. Saves the model to `model.sav`.
- `train-naive-bayes.py` - trains a naive Bayes model. Requires `prepare.py` to have been run in advance. Saves the model to `model.sav`.
- `train-random-forest.py` - trains a random forest model. Requires `prepare.py` to have been run in advance. Saves the model to `model.sav`.
- `deploy.py` - a script that uses a trained AI model to classify addresses and saves the results to `classified.xlsx`

## Running the solution
Using the default model:
1. Run `train.py` to train a logistic regression model.
2. Run `deploy.py` to classify new addresses with the trained model.

Using a custom model:
1. Run `prepare.py` to generate the derived training data.
2. Run one of `train-naive-bayes.py`, `train-random-forest.py` or `train-logistic-regression.py` to train the respective model.
3. In `deploy.py`, find the lines shown below. Uncomment the `drop` line if you trained a logistic regression model, and leave it commented out otherwise.
```python
### Uncomment the next line if using logistic regression ### 
unclassified_dataset.drop(....
```
4. Run `deploy.py`.
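
The custom-model steps above could also be chained from a small driver script. The following is a minimal sketch, not part of the submission: `TRAIN_SCRIPTS`, `pipeline_steps` and `run_pipeline` are hypothetical names, and the short model labels are assumptions chosen for illustration. It assumes it is run from the solution directory and that `deploy.py` has already been adjusted as described in step 3.

```python
import subprocess
import sys

# Training scripts named in this README, keyed by a short illustrative label.
TRAIN_SCRIPTS = {
    "logistic-regression": "train-logistic-regression.py",
    "naive-bayes": "train-naive-bayes.py",
    "random-forest": "train-random-forest.py",
}


def pipeline_steps(model):
    """Return the scripts to run, in order, for the custom-model workflow."""
    if model not in TRAIN_SCRIPTS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(TRAIN_SCRIPTS)}"
        )
    return ["prepare.py", TRAIN_SCRIPTS[model], "deploy.py"]


def run_pipeline(model):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in pipeline_steps(model):
        print(f"Running {script} ...")
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run([sys.executable, script], check=True)
```

For example, calling `run_pipeline("random-forest")` would run `prepare.py`, `train-random-forest.py` and `deploy.py` in that order. Using `sys.executable` keeps the scripts in the same environment as the driver, which matters when the Anaconda environment is active.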