In this notebook, implement a first working version of a machine learning model that predicts the age of an abalone. In the dataset, age is represented by the number of rings (the `Rings` column).

A few guidelines:
- The model does not have to be complex. A simple linear regression model is enough.
- You should use MLflow to track your experiments. You can use the MLflow UI to compare your experiments.
- Do not push any MLflow data to the repository. Push only the code needed to run the experiments.
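
As a rough illustration of the first guideline, a linear regression baseline could look like the sketch below. It is a minimal example rather than a reference solution: it runs on a small synthetic table so that it is self-contained, and the column names (`Sex`, `Length`, `Weight`) and the formula generating `Rings` are assumptions made only for the demo. With the real data, `pd.read_csv("abalone.csv")` would presumably take the place of the synthetic frame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for abalone.csv. The column names and the formula
# behind "Rings" are hypothetical, chosen only so the snippet runs alone.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "Sex": rng.choice(["M", "F", "I"], size=n),
    "Length": rng.uniform(0.1, 0.8, size=n),
    "Weight": rng.uniform(0.1, 2.5, size=n),
})
noise = rng.normal(0, 1, size=n)
demo["Rings"] = (5 + 8 * demo["Length"] + 2 * demo["Weight"] + noise).round()

# One-hot encode the categorical column, then hold out 10% for evaluation
X = pd.get_dummies(demo.drop("Rings", axis=1), dtype=float)
y = demo["Rings"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=101
)

# Fit the baseline and report RMSE on the held-out rows
baseline = LinearRegression().fit(X_train, y_train)
rmse = float(np.sqrt(mean_squared_error(y_test, baseline.predict(X_test))))
print(f"baseline RMSE: {rmse:.3f}")
```

A baseline like this gives a reference RMSE that more complex models can be compared against in the MLflow UI.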

In [4]:
%load_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    mean_squared_error,
)
from sklearn.model_selection import KFold, StratifiedKFold, train_test_split
from sklearn.preprocessing import StandardScaler
import mlflow
import mlflow.sklearn

pd.set_option('display.max_columns', 500)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [13]:
mlflow.set_experiment('Abalone')

with mlflow.start_run() as run:
    run_id = run.info.run_id

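    # Load the data and one-hot encode the categorical columns
    df = pd.read_csv("abalone.csv")
    X = pd.get_dummies(df.drop('Rings', axis=1))
    y = df['Rings']

    # Hold out 10% of the rows for evaluation
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)
    model = RandomForestRegressor(n_estimators=50)
    model.fit(X_train, y_train)

    # Compute RMSE as sqrt(MSE), which works across scikit-learn versions
    # (the `squared=False` argument is deprecated in newer releases)
    preds = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, preds)))

    # Log the hyperparameters, the metric, and the model, then register a new model version
    mlflow.log_params(model.get_params())
    mlflow.log_metric("rmse", rmse)
    mlflow.sklearn.log_model(model, "model")
    mlflow.register_model(f"runs:/{run_id}/model", "Abalone Model")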

Registered model 'Abalone Model' already exists. Creating a new version of this model...
Created version '2' of model 'Abalone Model'.
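
To help interpret the `rmse` value logged in the run above, here is a small worked example of the metric on hand-picked numbers. The ring counts are made up for illustration and do not come from the dataset; the comparison against a predictor that always outputs the mean is likewise only a suggested sanity check, not part of the original run.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical ring counts and predictions, chosen for easy arithmetic
y_true = np.array([9, 10, 7, 12])
y_pred = np.array([10, 10, 9, 10])

# Errors are -1, 0, -2, 2 -> squared 1, 0, 4, 4 -> mean 2.25 -> sqrt 1.5
rmse_manual = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
rmse_sklearn = float(np.sqrt(mean_squared_error(y_true, y_pred)))
print(rmse_manual, rmse_sklearn)  # both 1.5

# Reference point: RMSE of always predicting the mean of y_true (9.5)
mean_pred = np.full(y_true.shape, y_true.mean())
rmse_mean = float(np.sqrt(mean_squared_error(y_true, mean_pred)))
print(f"mean-predictor RMSE: {rmse_mean:.3f}")
```

RMSE is expressed in the same unit as the target, so a value of 1.5 here means the predictions are off by roughly one and a half rings on typical samples. A model is only useful if its RMSE is clearly below that of the mean predictor.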
