<a href="https://colab.research.google.com/github/Waltberry/Machine_Learning_Training_Repo/blob/main/Custom_Python_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Custom Model For Concrete Strength

This is a sample of the kind of notebook we might produce when building a custom model for a specific dataset. For this example we use a dataset that describes concrete mixtures (their ingredients and age) together with their compressive strength. We build on the most popular Python data science and modeling libraries.

#### Extra libraries required in Colab

In [None]:
!pip install s3fs

#### Standard Libraries

In [None]:
import pandas as pd

### Loading and inspecting the data

We start by loading the dataset we are going to work with.

In [None]:
concrete_df = pd.read_csv('s3://abacusai.exampledatasets/predicting/concrete_measurements.csv')
concrete_df.describe()

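| | cement | slag | flyash | water | superplasticizer | coarseaggregate | fineaggregate | age | csMPa |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |

The mixture components are measured in kg per m³ of mixture, `age` is in days, and `csMPa` is the compressive strength in MPa.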


#### Custom Data Transform

Building an effective model usually requires some form of data transformation. This could be because:
- only a subset of the data is relevant to the business problem
- the prediction request is organized differently from the training data
- engineered features can improve the model

The transformation may apply to each individual row or could involve an aggregation over some portion of the dataset.
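To make the distinction concrete, here is a small sketch on a few made-up rows that reuse the dataset's column names (the values are illustrative, not taken from the dataset). A row-level feature depends only on its own row, while an aggregation feature depends on a group of rows.

```python
import pandas as pd

# A few made-up rows using the dataset's column names (illustrative values only).
toy = pd.DataFrame({
    "cement": [300.0, 400.0, 250.0, 350.0],
    "water": [180.0, 160.0, 200.0, 175.0],
    "age": [7, 28, 7, 28],
    "csMPa": [20.0, 45.0, 15.0, 40.0],
})

# Row-level transform: each value depends only on its own row.
toy["water_cement_ratio"] = toy["water"] / toy["cement"]

# Aggregation transform: each value depends on a group of rows,
# here the mean strength of all rows with the same age.
toy["mean_csMPa_for_age"] = toy.groupby("age")["csMPa"].transform("mean")

print(toy)
```

Note that an aggregate of the target, like the last column, is not available for a new prediction request. If it is used as a feature, it has to be computed from the training data and looked up at prediction time; otherwise it leaks the answer into the model.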

In [None]:
def transform_concrete(concrete_dataset):
  # Define your own custom transformation of the concrete dataset.
  raise NotImplementedError

transformed_concrete_df = transform_concrete(concrete_df)
transformed_concrete_df.describe()
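One possible way to fill in `transform_concrete` is sketched below. It is an illustration, not the implementation from the cheat sheet linked later in this notebook. It assumes only row-level features are wanted, so the same function can also be applied to a single prediction query, and the feature names `binder`, `water_cement_ratio` and `water_binder_ratio` are hypothetical choices.

```python
import pandas as pd

def transform_concrete(concrete_dataset: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical row-level feature engineering for the concrete data."""
    df = concrete_dataset.copy()  # avoid mutating the caller's DataFrame
    # Total cementitious material: cement plus the supplementary binders.
    df["binder"] = df["cement"] + df["slag"] + df["flyash"]
    # Water-to-cement and water-to-binder ratios are widely used strength indicators.
    df["water_cement_ratio"] = df["water"] / df["cement"]
    df["water_binder_ratio"] = df["water"] / df["binder"]
    return df

# Usage on a single made-up query row.
query = pd.DataFrame([{
    "cement": 300.0, "slag": 100.0, "flyash": 0.0, "water": 200.0,
    "superplasticizer": 0.0, "coarseaggregate": 1000.0,
    "fineaggregate": 800.0, "age": 28,
}])
print(transform_concrete(query)[["binder", "water_cement_ratio", "water_binder_ratio"]])
```

Because the function only adds columns, `age` and `csMPa` stay available for the comparison loop in the prediction section. Recomputing the features from the raw columns also makes it safe to apply twice.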

### Custom Model

Now we are going to build a model to predict an attribute of the concrete given its other attributes. The possibilities are, of course, endless. The model could be a regression that predicts a numerical attribute from the other attributes, or a classification model that predicts a discrete attribute. It could even be a `k-means` model that finds clusters in the data and uses the cluster closest to each row as the prediction. Generally, we would build it with popular Python modeling packages.
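As an illustration of the k-means idea, here is a NumPy-only sketch, assuming the target is numeric (such as `csMPa`). It clusters the standardized feature rows with a plain Lloyd's algorithm, stores each cluster's mean target, and predicts for a new row with the value of its nearest cluster. The function names are hypothetical; this is not a library API.

```python
import numpy as np

def nearest(Z, centroids):
    # Index of the closest centroid for each row (squared Euclidean distance).
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def fit_kmeans_regressor(X, y, k, n_iter=100, seed=0):
    """Cluster standardized features, then attach each cluster's mean target."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    Z = (X - mu) / sigma
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(Z, centroids)
        # Move each centroid to the mean of its rows (keep it if the cluster is empty).
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = nearest(Z, centroids)
    # The prediction for a cluster is the average target of its training rows.
    cluster_value = np.array([y[labels == j].mean() if np.any(labels == j) else y.mean()
                              for j in range(k)])
    return mu, sigma, centroids, cluster_value

def predict_kmeans_regressor(model, X):
    mu, sigma, centroids, cluster_value = model
    return cluster_value[nearest((X - mu) / sigma, centroids)]

# Usage on two well-separated made-up groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
y = np.array([1.0, 3.0, 10.0, 12.0])
model = fit_kmeans_regressor(X, y, k=2)
print(predict_kmeans_regressor(model, np.array([[0.05, 0.0], [10.05, 10.0]])))
```

Standardizing first matters for this dataset because the columns have very different scales (for example `coarseaggregate` in the hundreds versus `superplasticizer` in single digits).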

In [None]:
!pip install catboost

In [None]:
def train(training_dataset):
  # Define your custom model here. Be creative: for example, it could ensemble a couple of models.
  raise NotImplementedError

local_model = train(transformed_concrete_df)
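Following the hint to ensemble a couple of models, here is one possible `train`, sketched with NumPy only. It assumes the target column is `csMPa` and that every other column is a numeric feature. It averages an ordinary least-squares fit with a k-nearest-neighbours regressor and returns a hypothetical `(feature_columns, estimator)` pair. The `AveragingEnsemble` class is illustrative, not the cheat sheet's model.

```python
import numpy as np
import pandas as pd

TARGET = "csMPa"

class AveragingEnsemble:
    """Averages a linear least-squares fit and a k-nearest-neighbours regressor."""

    def __init__(self, k=5):
        self.k = k

    def _design(self, X):
        # Standardize with training statistics and add an intercept column.
        Z = (np.asarray(X, dtype=float) - self.mu) / self.sigma
        return Z, np.column_stack([np.ones(len(Z)), Z])

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.mu, self.sigma = X.mean(axis=0), X.std(axis=0)
        self.sigma[self.sigma == 0] = 1.0  # guard against constant columns
        self.Z_train, A = self._design(X)
        self.y_train = np.asarray(y, dtype=float)
        # Ordinary least squares on the standardized features.
        self.coef, *_ = np.linalg.lstsq(A, self.y_train, rcond=None)
        return self

    def predict(self, X):
        Z, A = self._design(X)
        linear = A @ self.coef
        # k-NN: average target of the k closest training rows.
        d = ((Z[:, None, :] - self.Z_train[None, :, :]) ** 2).sum(axis=2)
        k = min(self.k, len(self.Z_train))
        idx = np.argsort(d, axis=1)[:, :k]
        knn = self.y_train[idx].mean(axis=1)
        return (linear + knn) / 2.0

def train(training_dataset):
    feature_columns = [c for c in training_dataset.columns if c != TARGET]
    estimator = AveragingEnsemble(k=5)
    estimator.fit(training_dataset[feature_columns], training_dataset[TARGET])
    return feature_columns, estimator
```

The `catboost` package installed above provides `CatBoostRegressor`, which follows the same `fit`/`predict` convention and could replace or join the two members of this ensemble.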

### Prediction Function

It is not uncommon for data science projects to stop at this point. You have a training algorithm: you can train it on one subset of the data and evaluate it on a different subset to get metrics on how well the model performs, which tells you whether the algorithm is effective.
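A hold-out evaluation of this kind could look like the following sketch. The helper names are hypothetical, and scikit-learn's `train_test_split` offers an equivalent random split. Here the split is random by row, and the metrics are mean absolute error and root mean squared error.

```python
import numpy as np
import pandas as pd

def holdout_split(df, test_fraction=0.2, seed=0):
    # Randomly pick rows for testing; the remaining rows are for training.
    test_part = df.sample(frac=test_fraction, random_state=seed)
    train_part = df.drop(test_part.index)
    return train_part, test_part

def regression_metrics(y_true, y_pred):
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return {"mae": float(np.abs(err).mean()), "rmse": float(np.sqrt((err ** 2).mean()))}

# Usage with a trivial baseline that always predicts the training mean.
data = pd.DataFrame({"csMPa": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]})
train_part, test_part = holdout_split(data)
baseline = train_part["csMPa"].mean()
print(regression_metrics(test_part["csMPa"], [baseline] * len(test_part)))
```

In this notebook one would split `transformed_concrete_df`, call `train` on the training part, and compare any model's metrics against such a baseline.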

However, real-world applications require evaluating the model on new data. In the prediction context, that means first applying the same feature transforms used during training to the incoming query and then evaluating the underlying model on the result.
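A minimal sketch of `predict` follows, under these assumptions: the model is a hypothetical `(feature_columns, estimator)` pair whose estimator has a scikit-learn style `predict` method, and the query is a raw dict like the `r.to_dict()` rows passed in the cell below. The `add_features` helper stands in for whatever transform was used in training, and `RatioBaseline` is a toy stand-in with made-up coefficients, not a fitted model.

```python
import pandas as pd

TARGET = "csMPa"

def add_features(df):
    # Hypothetical row-level features; recomputing them is idempotent,
    # so this is safe even if the query already contains these columns.
    df = df.copy()
    df["binder"] = df["cement"] + df["slag"] + df["flyash"]
    df["water_cement_ratio"] = df["water"] / df["cement"]
    return df

def predict(model, query):
    """Apply the training-time transform to one raw query, then the model."""
    feature_columns, estimator = model
    # Drop the target if present so it can never leak into the features.
    row = pd.DataFrame([query]).drop(columns=[TARGET], errors="ignore")
    row = add_features(row)
    return float(estimator.predict(row[feature_columns])[0])

class RatioBaseline:
    # Toy stand-in estimator with made-up coefficients, for illustration only.
    def predict(self, X):
        return 80.0 - 60.0 * X["water_cement_ratio"].to_numpy()

query = {"cement": 300.0, "slag": 0.0, "flyash": 0.0, "water": 150.0, "age": 28, "csMPa": 40.0}
print(predict((["water_cement_ratio"], RatioBaseline()), query))
```

Keeping the transform inside `predict` means callers only ever send raw measurements, which is exactly the shape a production request is likely to have.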

In [None]:
def predict(model, query):
  raise NotImplementedError

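# Compare predicted and actual strength (csMPa) for five early-age samples (age < 10 days).
for _, r in transformed_concrete_df[transformed_concrete_df.age < 10].head(5).iterrows():
  print(predict(local_model, r.to_dict()), r['csMPa'])

# ...and for five older samples (age > 10 days).
for _, r in transformed_concrete_df[transformed_concrete_df.age > 10].head(5).iterrows():
  print(predict(local_model, r.to_dict()), r['csMPa'])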

## Need Help? 

Here's a cheat sheet with these functions implemented: https://colab.research.google.com/drive/1fvJeBOLKe3wXBWrcOTTGckjCHdexl5FW
___

## Lots More To Do

Many data science projects stop at this point; most do not even clearly define the prediction operation that production applications will use. Actually leveraging the model in production generally requires quite a bit more work:
- Storing the model so that it is available in various production workflows
- Hosting the model in a scalable manner so that it can be used for online predictions
- Evaluating the model against large batches of new data
- Monitoring the model to ensure its inputs and predictions have not shifted significantly
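For the monitoring item, a very simple input-drift check is sketched below. It compares each numeric column's mean in a new batch with the training data, measured in training standard deviations. This standardized mean shift and the `threshold=0.5` default are assumptions chosen for brevity; production monitoring often uses richer statistics such as the population stability index or a Kolmogorov-Smirnov test.

```python
import pandas as pd

def mean_shift_report(reference: pd.DataFrame, current: pd.DataFrame,
                      threshold: float = 0.5) -> pd.DataFrame:
    """Flag numeric columns whose mean moved more than `threshold`
    reference standard deviations between `reference` and `current`."""
    cols = [c for c in reference.select_dtypes(include="number").columns
            if c in current.columns]
    ref_mean = reference[cols].mean()
    ref_std = reference[cols].std()
    ref_std = ref_std.where(ref_std > 0, 1.0)  # avoid division by zero
    shift = (current[cols].mean() - ref_mean) / ref_std
    return pd.DataFrame({"shift_in_std": shift, "drifted": shift.abs() > threshold})

# Usage with made-up batches: ages have shifted, water has not.
reference = pd.DataFrame({"age": [1, 3], "water": [180.0, 180.0]})
current = pd.DataFrame({"age": [10, 10], "water": [180.0, 180.0]})
print(mean_shift_report(reference, current))
```

The same report can be run on the model's predictions (as a one-column frame) to watch for output drift as well as input drift.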

Beyond these specific features supporting model usage, there is the significant task of reliably keeping the model up to date as new data arrives. This involves a workflow of operations, from refreshing the input datasets to pushing the updated models to the serving infrastructure.

Real world machine learning applications require performing all these operations reliably.

### Use Abacus.AI for all this and more

- [Sign up](https://abacus.ai/app/signup?signupToken=python_models) for an Abacus.AI Account
- Once your account is created, navigate to the [API Keys Dashboard](https://abacus.ai/app/profile/apikey) and generate an API key to authenticate your ApiClient

# Abacus.AI Integration Notebook

https://colab.research.google.com/drive/1AVvPE5Ue89l5n8Ed9eqdjAV5NQHMEMyl
