# Hackathon
<img src='../images/xebia-logo.png' width='300px' align='right' style="padding: 15px">

This notebook provides some scaffolding for the hackathon that marks the end of the *Production-Ready Machine Learning* training.

If you've followed all the steps of the training, you should have a working ML application that can load data, train ML models on the data, and generate predictions for novel data.

In this notebook you will find some suggestions for extending that application.

### Expose the model as an API and/or CLI.
- There should be a `train` endpoint/subcommand that accepts data, calls some internal functions, and saves a new copy of the model.
- There should be a `predict` endpoint/subcommand that accepts data and a model name, and returns some predictions.
    - Think about the format of the response. There are multiple options; some examples, in order of complexity, include returning the data as JSON, streaming `.csv` files, and streaming Avro chunks.
- You should include loggers and tests for your endpoints/subcommands.
- You can also expose some metrics calculations and reporting. You might need to change the modeling strategy to include a cross-validation step.
- If you are familiar with using containers, you can try containerizing the application and exposing the API on a container port.
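
To make the response-format options above more concrete, here is a minimal sketch of the two simplest ones: a JSON payload and CSV text produced in chunks that could be streamed. The function names, the `id`/`prediction` field names, and the example labels are assumptions made for illustration; they are not part of `animal_shelter`.

```python
import csv
import io
import json


def predictions_to_json(ids, predictions):
    """Serialize predictions as a JSON array of {id, prediction} objects."""
    records = [{"id": i, "prediction": p} for i, p in zip(ids, predictions)]
    return json.dumps(records)


def predictions_to_csv_chunks(ids, predictions, chunk_size=2):
    """Yield CSV text in chunks so a large response can be streamed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "prediction"])  # header goes into the first chunk
    for n, row in enumerate(zip(ids, predictions), start=1):
        writer.writerow(row)
        if n % chunk_size == 0:
            yield buffer.getvalue()
            # Empty the buffer before writing the next chunk
            buffer.seek(0)
            buffer.truncate(0)
    if buffer.tell():
        # Emit whatever is left after the last full chunk
        yield buffer.getvalue()


# Hypothetical example data
ids = ["A1", "A2", "A3"]
labels = ["Adoption", "Transfer", "Adoption"]
json_body = predictions_to_json(ids, labels)
csv_body = "".join(predictions_to_csv_chunks(ids, labels))
```

A generator like `predictions_to_csv_chunks` is the kind of object a web framework's streaming response would typically consume, while the JSON variant is easier to test and to use from other services.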

The following code can act as scaffolding. Ideally, the code that serves your application (i.e. the API or the CLI) should live in a different package that imports `animal_shelter`. You can create an `app` or `cli` directory for this at the root of your project and put a separate script in it, or ideally even a fully-fledged package.

In [None]:
# Skeleton of an API definition
import logging

import joblib
import pandas as pd
from fastapi import FastAPI

from animal_shelter.data import load_data

app = FastAPI()

model = joblib.load(...)

@app.post("/train/")
def train(data):
    # Load/process data
    # Pass data to functions from animal_shelter
    # Check that the model is saved correctly
    # Generate a response with useful information
    return response

@app.post("/predict/")
def predict(data):
    # Load/process data
    # Generate predictions by calling functions from animal_shelter
    # Format predictions
    # Generate a response with the predictions
    return response
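
Since `predict` is meant to accept a model name, the API needs some way to turn that name into a saved model file. The sketch below shows one possible approach, assuming models are stored as `<name>.joblib` files in a single directory. The `models` directory, the `.joblib` suffix, and the helper name `resolve_model_path` are all hypothetical.

```python
from pathlib import Path

MODEL_DIR = Path("models")  # hypothetical location of saved models


def resolve_model_path(model_name: str, model_dir: Path = MODEL_DIR) -> Path:
    """Map a model name to a file inside model_dir, rejecting unsafe names."""
    # Only allow simple names so a request cannot point outside model_dir
    allowed = all(c.isalnum() or c in "-_" for c in model_name)
    if not model_name or not allowed:
        raise ValueError(f"Invalid model name: {model_name!r}")
    return model_dir / f"{model_name}.joblib"
```

The endpoint could then pass the resulting path to `joblib.load` and return an error response when the name is invalid or the file does not exist.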

In [None]:
# Skeleton for a CLI

import logging
from pathlib import Path

import typer

from animal_shelter.data import load_data

app = typer.Typer()

# Always gets called before all subcommands
@app.callback()
def main() -> None:
    logging.basicConfig(
        level=logging.INFO,
        format="[%(asctime)-15s] %(name)s - %(levelname)s - %(message)s",
    )


@app.command()
def train(input_path: Path, model_path: Path) -> None:
    """Trains a model on the given dataset."""
    typer.echo(f"Loading {input_path}")
    logger = logging.getLogger(__name__)
    logger.info("Loading input dataset from %s", input_path)
    
    # Load/process data
    # Pass data to functions from animal_shelter
    # Check that the model is saved correctly
    # Output some useful feedback for the user
    
@app.command()
def predict(input_path: Path, model_path: Path, output_path: Path) -> None:
    """Applies a model to the given dataset."""
    typer.echo(f"Loading {input_path}")

    logger = logging.getLogger(__name__)
    # Load/process data
    # Generate predictions by calling functions from animal_shelter
    # Format predictions
    # Save/return predictions and some useful feedback for the user
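
The subcommands above log what they are doing, and those log messages can be tested as well. The following is a minimal sketch using only the standard library: a handler that collects messages in a list, attached temporarily to the logger under test. The logger name `animal_shelter_cli` and the `load_step` function are hypothetical stand-ins for your own code.

```python
import logging

LOGGER_NAME = "animal_shelter_cli"  # hypothetical logger name


class ListHandler(logging.Handler):
    """Collects log messages in a list so a test can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def load_step(input_path):
    """Hypothetical stand-in for the loading step of a subcommand."""
    logger = logging.getLogger(LOGGER_NAME)
    logger.info("Loading input dataset from %s", input_path)


def capture_logs(func, *args):
    """Run func(*args) and return the messages it logged at INFO or above."""
    logger = logging.getLogger(LOGGER_NAME)
    handler = ListHandler()
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        func(*args)
    finally:
        # Restore the logger so other tests are not affected
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
    return handler.messages


def test_load_step_logs_input_path():
    messages = capture_logs(load_step, "data/train.csv")
    assert messages == ["Loading input dataset from data/train.csv"]
```

If you use pytest, its built-in `caplog` fixture covers the same need with less code.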


### More tests!
- Implement some tests that call a function multiple times, each time with a randomly generated input.
- Write some end-to-end tests that exercise the functionality of your full pipeline, including loading data, training a model, generating predictions, and serving them.
  - With mocked data.
  - With a subset of the training data.
- Write some tests that check that the performance of a model doesn't fall below a pre-specified threshold.
  - Check some model performance metrics (e.g. accuracy, recall, precision), but also some computational metrics (e.g. time to run).
- Write some tests using Pydantic to check the output of the different steps of your modeling pipelines.
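
As an illustration of the first idea in this list, here is a minimal sketch of a test that feeds randomly generated inputs to a function and checks properties that should hold for every input. The function `clean_name` is a hypothetical stand-in for one of your own preprocessing functions, and the alphabet and number of cases are arbitrary choices.

```python
import random


def clean_name(name: str) -> str:
    """Hypothetical stand-in for a preprocessing function under test."""
    return name.strip().lower()


def test_clean_name_on_random_inputs(n_cases: int = 200, seed: int = 42) -> None:
    rng = random.Random(seed)  # seeded so that a failure can be reproduced
    alphabet = "abcXYZ *"
    for _ in range(n_cases):
        length = rng.randint(0, 12)
        raw = "".join(rng.choice(alphabet) for _ in range(length))
        cleaned = clean_name(raw)
        # Properties that should hold for any input
        assert cleaned == cleaned.strip()
        assert cleaned == cleaned.lower()
        assert clean_name(cleaned) == cleaned  # cleaning twice changes nothing
```

Libraries such as Hypothesis automate this style of testing, including shrinking a failing input to a small example, but a seeded loop like this is often enough to get started.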

**Some general tips:**
- Don't forget to set random seeds when your functions run non-deterministic code.
- Don't be afraid to break up your functions into smaller ones if it makes writing tests easier.
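
The sketch below combines the tip about random seeds with the idea of a threshold test: the synthetic data is generated from fixed seeds, so the test gives the same result on every run. Everything here is hypothetical: the labels, the thresholds, and the majority-class "model", which only stands in for your real training code.

```python
import random
import time
from collections import Counter

ACCURACY_THRESHOLD = 0.6  # hypothetical minimum acceptable accuracy
MAX_SECONDS = 5.0  # hypothetical time budget for fitting and predicting


def make_labels(n, seed):
    """Generate n synthetic labels; the seed makes the output repeatable."""
    rng = random.Random(seed)
    return [rng.choices(["Adoption", "Transfer"], weights=[0.7, 0.3])[0] for _ in range(n)]


def fit_majority_class(labels):
    """Toy model: remember the most frequent label in the training data."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def test_model_meets_thresholds():
    train_labels = make_labels(1000, seed=0)
    test_labels = make_labels(500, seed=1)

    start = time.perf_counter()
    majority = fit_majority_class(train_labels)
    predictions = [majority] * len(test_labels)
    elapsed = time.perf_counter() - start

    assert accuracy(test_labels, predictions) >= ACCURACY_THRESHOLD
    assert elapsed < MAX_SECONDS
```

Splitting data generation, fitting, and scoring into separate small functions also makes each piece easy to test on its own.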