## Exploratory ML Notebook

We have a cross-sectional dataset of the physical characteristics of abalone molluscs. The dataset is taken from the University of California, Irvine's [online machine learning data repository](https://archive.ics.uci.edu/), and its documentation can be found [here](https://archive.ics.uci.edu/dataset/1/abalone). The machine learning task is to create a regression model that predicts the number of internal shell rings of an abalone mollusc from its external characteristics.

To complete this task we will be using the [TensorFlow](https://www.tensorflow.org/) machine learning library. Tutorials can be accessed [here](https://www.tensorflow.org/tutorials).

### Introduction

Package imports

In [1]:
from ucimlrepo import fetch_ucirepo
import tensorflow as tf
import pandas as pd
import numpy as np

Read in and preview the dataset. Each row describes a single abalone: its sex, its physical measurements, and its ring count.

In [2]:
abalones = fetch_ucirepo(id=1)

abalones.data.original.head(7)

| | Sex | Length | Diameter | Height | Whole_weight | Shucked_weight | Viscera_weight | Shell_weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |
| 5 | I | 0.425 | 0.3 | 0.095 | 0.3515 | 0.141 | 0.0775 | 0.12 | 8 |
| 6 | F | 0.53 | 0.415 | 0.15 | 0.7775 | 0.237 | 0.1415 | 0.33 | 20 |


Let's examine the target variable for this dataset, `Rings`.

In [3]:
abalones.data.targets.describe()

| | Rings |
|---|---|
| count | 4177.0 |
| mean | 9.933684 |
| std | 3.224169 |
| min | 1.0 |
| 25% | 8.0 |
| 50% | 9.0 |
| 75% | 11.0 |
| max | 29.0 |


Let's now examine the features in this dataset. Most of the features are numeric, such as the height and weight of a mollusc, but there is also one categorical feature, `Sex`, which takes three distinct values (`M`, `F` and `I` in the preview above).

In [4]:
abalones.data.features.describe(include='all')

| | Sex | Length | Diameter | Height | Whole_weight | Shucked_weight | Viscera_weight | Shell_weight |
|---|---|---|---|---|---|---|---|---|
| count | 4177 | 4177.0 | 4177.0 | 4177.0 | 4177.0 | 4177.0 | 4177.0 | 4177.0 |
| unique | 3 | | | | | | | |
| top | M | | | | | | | |
| freq | 1528 | | | | | | | |
| mean | | 0.523992 | 0.407881 | 0.139516 | 0.828742 | 0.359367 | 0.180594 | 0.238831 |
| std | | 0.120093 | 0.09924 | 0.041827 | 0.490389 | 0.221963 | 0.109614 | 0.139203 |
| min | | 0.075 | 0.055 | 0.0 | 0.002 | 0.001 | 0.0005 | 0.0015 |
| 25% | | 0.45 | 0.35 | 0.115 | 0.4415 | 0.186 | 0.0935 | 0.13 |
| 50% | | 0.545 | 0.425 | 0.14 | 0.7995 | 0.336 | 0.171 | 0.234 |
| 75% | | 0.615 | 0.48 | 0.165 | 1.153 | 0.502 | 0.253 | 0.329 |
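
Because `Sex` is categorical, it has to be converted to numbers before a regression model can use it. One common option is one-hot encoding, sketched below with `pandas.get_dummies`. This is only an illustration under stated assumptions: the `sample` frame is a hand-copied subset of the preview rows shown earlier (so the snippet runs without downloading the dataset), and `one_hot_encode_sex` is a hypothetical helper name, not something provided by the notebook or by `ucimlrepo`.

```python
import pandas as pd

# Hand-copied subset of the preview rows above (assumption: used here only
# so the sketch runs offline; the notebook would use abalones.data.features).
sample = pd.DataFrame(
    {
        "Sex": ["M", "M", "F", "M", "I"],
        "Length": [0.455, 0.35, 0.53, 0.44, 0.33],
        "Height": [0.095, 0.09, 0.135, 0.125, 0.08],
    }
)


def one_hot_encode_sex(features: pd.DataFrame) -> pd.DataFrame:
    """Replace the categorical Sex column with one indicator column per category.

    Hypothetical helper for illustration; the numeric columns are left untouched.
    """
    return pd.get_dummies(features, columns=["Sex"], dtype="float32")


encoded = one_hot_encode_sex(sample)
print(encoded.columns.tolist())
# → ['Length', 'Height', 'Sex_F', 'Sex_I', 'Sex_M']
```

On the full dataset the same call would be applied to `abalones.data.features`. Since a linear model with an intercept does not need all three indicator columns, passing `drop_first=True` to `get_dummies` is a reasonable alternative that avoids the redundant column.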


We can use a linear regression model to predict the number of internal rings of an abalone mollusc from a weighted combination of the features provided.

### ML analysis

#### Tasks for today:
- Preprocess the data
- Build and train a linear regression model using the abalone DataFrame
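
The two tasks above could be sketched roughly as follows. This is a minimal illustration, not the notebook's final implementation: it assumes one-hot encoding for `Sex`, a Keras `Normalization` layer for feature scaling, and a single `Dense(1)` unit as the linear model. The `preview` frame is a hand-copied subset of the rows and columns shown earlier so that the snippet runs offline, and `build_linear_model`, the learning rate, and the epoch count are illustrative choices rather than anything prescribed by the source.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# The seven preview rows shown earlier (subset of columns), copied by hand so
# the sketch runs offline. In the notebook, abalones.data.features and
# abalones.data.targets would take their place.
preview = pd.DataFrame(
    {
        "Sex": ["M", "M", "F", "M", "I", "I", "F"],
        "Length": [0.455, 0.35, 0.53, 0.44, 0.33, 0.425, 0.53],
        "Diameter": [0.365, 0.265, 0.42, 0.365, 0.255, 0.3, 0.415],
        "Height": [0.095, 0.09, 0.135, 0.125, 0.08, 0.095, 0.15],
        "Whole_weight": [0.514, 0.2255, 0.677, 0.516, 0.205, 0.3515, 0.7775],
        "Rings": [15, 7, 9, 10, 7, 8, 20],
    }
)


def build_linear_model(x: np.ndarray) -> tf.keras.Model:
    """Hypothetical helper: feature normalization followed by one linear unit."""
    normalizer = tf.keras.layers.Normalization(axis=-1)
    normalizer.adapt(x)  # learn per-feature mean and variance from the data
    model = tf.keras.Sequential([normalizer, tf.keras.layers.Dense(units=1)])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),  # illustrative value
        loss="mean_squared_error",
    )
    return model


tf.keras.utils.set_random_seed(0)  # make the sketch repeatable

# Preprocess: one-hot encode Sex, then convert everything to float32 arrays.
features = pd.get_dummies(preview.drop(columns="Rings"), columns=["Sex"], dtype="float32")
x = features.to_numpy(dtype="float32")
y = preview["Rings"].to_numpy(dtype="float32")

# Build and train: 50 full-batch epochs is an arbitrary choice for the sketch.
model = build_linear_model(x)
history = model.fit(x, y, epochs=50, verbose=0)

predictions = model.predict(x, verbose=0)
print(predictions.shape)  # (7, 1): one predicted ring count per row
```

Seven rows are far too few to learn anything meaningful; the point is only to show the shape of the pipeline. With the full 4,177-row dataset, a held-out test split would also be needed to judge how well the model generalizes.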