# Crab Age Regression

**Context:** For a commercial crab farmer, knowing the right age of a crab helps decide whether and when to harvest it. Beyond a certain age, a crab's physical characteristics grow only negligibly, so timing the harvest well reduces costs and increases profit.

<br>

**Goal:** Use the dataset to estimate the age of a crab from its physical attributes.
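Because this is a regression task, it helps to have a reference point before fitting real models. Below is a minimal sketch comparing a constant-mean baseline with a linear model. The data is randomly generated for illustration (it is not the crab dataset), and scoring with mean absolute error is an assumption; MAE is simply one of the metrics the notebook imports.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the crab dataset): "age" grows roughly
# linearly with "shell weight", plus Gaussian noise.
rng = np.random.default_rng(seed=0)
shell_weight = rng.uniform(0.5, 10.0, size=(500, 1))             # illustrative ounces
age = 4.0 + 1.5 * shell_weight[:, 0] + rng.normal(0, 1.0, 500)   # illustrative months

X_train, X_test, y_train, y_test = train_test_split(
    shell_weight, age, test_size=0.2, random_state=0
)

# Baseline: always predict the mean training age.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
# Simple model: ordinary least squares on the single feature.
model = LinearRegression().fit(X_train, y_train)

baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
model_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Baseline MAE: {baseline_mae:.2f} months")
print(f"Linear MAE:   {model_mae:.2f} months")
```

Any model trained on the real physical attributes should at least beat the mean baseline; if it does not, the features or the pipeline deserve a closer look.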

<br>

**Feature Description:**
- Sex - Sex of the crab (Male, Female, or Indeterminate)
- Length - Length of the crab (Feet)
- Diameter - Diameter of the crab (Feet)
- Height - Height of the crab (Feet)
- Weight - Total weight of the crab (Ounces)
- Shucked Weight - Weight without the shell (Ounces)
- Viscera Weight - Weight of the internal organs (viscera) (Ounces)
- Shell Weight - Weight of the shell (Ounces)
- Age - Age of the crab (Months)
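The feature list mixes one categorical column (`Sex`) with seven numeric measurements, which suggests separate preprocessing for each group. A minimal sketch with `ColumnTransformer`, assuming the column names match the list above and that sex is coded as `"M"`, `"F"` and `"I"`; the four rows are made-up values for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names assumed to match the feature list above.
categorical_features = ["Sex"]
numeric_features = ["Length", "Diameter", "Height", "Weight",
                    "Shucked Weight", "Viscera Weight", "Shell Weight"]

# Tiny hand-made frame for illustration only (values are made up).
# The sex codes "M", "F", "I" are an assumption about the encoding.
toy = pd.DataFrame({
    "Sex": ["M", "F", "I", "M"],
    "Length": [1.4, 1.2, 0.8, 1.5],
    "Diameter": [1.1, 0.9, 0.6, 1.2],
    "Height": [0.4, 0.3, 0.2, 0.4],
    "Weight": [24.6, 18.2, 6.1, 28.3],
    "Shucked Weight": [10.1, 7.9, 2.5, 12.0],
    "Viscera Weight": [5.2, 3.8, 1.3, 6.0],
    "Shell Weight": [7.0, 5.1, 1.9, 8.2],
})

preprocessor = ColumnTransformer(
    transformers=[
        # One column per sex category; unseen categories become all zeros.
        ("sex", OneHotEncoder(handle_unknown="ignore"), categorical_features),
        # Zero mean, unit variance for the physical measurements.
        ("num", StandardScaler(), numeric_features),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X = preprocessor.fit_transform(toy)
print(X.shape)  # (4, 10): 3 one-hot columns + 7 scaled numeric columns
```

Keeping the encoder and scaler inside one transformer means the same fitted statistics are reused on validation and test data, which avoids leaking information from those splits.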

<br>

**Resources:**
- [Kaggle Challenge](https://www.kaggle.com/competitions/playground-series-s3e16/data?select=train.csv)
- [Original Dataset](https://www.kaggle.com/datasets/sidhus/crab-age-prediction)

In [1]:
# Import libraries
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import mlflow

from pathlib import Path
from colorama import Style, Fore

from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, FunctionTransformer, StandardScaler, MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import (train_test_split, LearningCurveDisplay,
                                     learning_curve, ShuffleSplit, GridSearchCV,
                                     KFold)

from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error, r2_score)
from sklearn.ensemble import RandomForestRegressor

from scipy import stats

from xgboost import XGBRegressor
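The imports above include `Pipeline`, `GridSearchCV`, `KFold` and `RandomForestRegressor`. Below is a minimal sketch of how these pieces are commonly combined, on synthetic data. The parameter grid and the MAE scoring choice are assumptions for illustration, not the notebook's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data for illustration only.
rng = np.random.default_rng(seed=42)
X = rng.uniform(0, 1, size=(200, 3))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.5, 200)

# Scaling does not change tree splits; it is here only to show a two-step pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(random_state=0)),
])

# Parameter names use Pipeline's "<step>__<param>" convention.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [3, None],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_error",  # negated MAE: closer to 0 is better
)
search.fit(X, y)
print(search.best_params_)
print(-search.best_score_)  # mean cross-validated MAE of the best setting
```

Passing a shuffled `KFold` object instead of an integer makes the fold assignment explicit and reproducible.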

In [2]:
# Define Seaborn theme parameters
theme_parameters = {
    'axes.spines.right': False,
    'axes.spines.top': False,
    'grid.alpha': 0.3,
    'figure.figsize': (16, 6),
    'font.family': 'Andale Mono',
    'axes.titlesize': 24,
    'figure.facecolor': '#E5E8E8',
    'axes.facecolor': '#E5E8E8'
}

# Set the theme
sns.set_theme(style='whitegrid',
              palette=sns.color_palette('deep'), 
              rc=theme_parameters)
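The `'font.family': 'Andale Mono'` setting only takes effect if that font is installed; otherwise Matplotlib logs a `findfont` warning when rendering and falls back to its default font. A small sketch that checks for the font first and falls back to the generic `monospace` family. The helper `pick_font` is hypothetical (not part of Matplotlib or seaborn) and would run after `sns.set_theme(...)` so that it overrides the value set there.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager


def pick_font(preferred: str, fallback: str = "monospace") -> str:
    """Return `preferred` if Matplotlib has it registered, else a generic fallback."""
    # Names of all TrueType fonts Matplotlib discovered on this system.
    installed = {entry.name for entry in font_manager.fontManager.ttflist}
    return preferred if preferred in installed else fallback


font_family = pick_font("Andale Mono")
plt.rcParams["font.family"] = font_family
print(f"Using font family: {font_family}")
```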

In [3]:
# Define Colors
black = Style.BRIGHT + Fore.BLACK
magenta = Style.BRIGHT + Fore.MAGENTA
red = Style.BRIGHT + Fore.RED
blue = Style.BRIGHT + Fore.BLUE
reset_colors = Style.RESET_ALL

# Read Data