# Fish measurements

The dataset `../data/Fish.csv` contains measurements of 159 fish from several species, with the following columns:

- **Species**: species name of fish
- **Weight**: weight of fish in grams (g)
- **Length1**: vertical length in cm
- **Length2**: diagonal length in cm
- **Length3**: cross length in cm
- **Height**: height in cm
- **Width**: diagonal width in cm

# Imports

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
```

# Exercise 1.1 - read the data

```python
# Match the file name's capitalization: paths are case-sensitive on Linux/macOS
df = pd.read_csv('../data/Fish.csv')
```
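A minimal sketch of a loader that reads the CSV and fails early if any column listed in the dataset description is missing. The helper `load_fish` and the constant `EXPECTED_COLUMNS` are hypothetical names introduced here for illustration, not part of the original exercise.

```python
import io

import pandas as pd

# Columns listed in the dataset description at the top of this notebook
EXPECTED_COLUMNS = ["Species", "Weight", "Length1", "Length2", "Length3", "Height", "Width"]


def load_fish(path_or_buffer):
    """Read the fish CSV and raise if a described column is missing.

    Hypothetical helper, not part of the original exercise.
    """
    df = pd.read_csv(path_or_buffer)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# In-memory sample built from the first two rows displayed in Exercise 1.2;
# with the real file you would call load_fish('../data/Fish.csv')
sample = io.StringIO(
    "Species,Weight,Length1,Length2,Length3,Height,Width\n"
    "Bream,242.0,23.2,25.4,30.0,11.52,4.02\n"
    "Bream,290.0,24.0,26.3,31.2,12.48,4.3056\n"
)
df_sample = load_fish(sample)
print(df_sample.shape)  # (2, 7)
```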

# Exercise 1.2 - display the first 5 rows of the dataset

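```python
df.head(5)
```

|   | Species | Weight | Length1 | Length2 | Length3 | Height | Width |
|---|---------|--------|---------|---------|---------|--------|-------|
| 0 | Bream | 242.0 | 23.2 | 25.4 | 30.0 | 11.52 | 4.02 |
| 1 | Bream | 290.0 | 24.0 | 26.3 | 31.2 | 12.48 | 4.3056 |
| 2 | Bream | 340.0 | 23.9 | 26.5 | 31.1 | 12.3778 | 4.6961 |
| 3 | Bream | 363.0 | 26.3 | 29.0 | 33.5 | 12.73 | 4.4555 |
| 4 | Bream | 430.0 | 26.5 | 29.0 | 34.0 | 12.444 | 5.134 |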


# Exercise 1.3 - display the last 3 rows of the dataset

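```python
df.tail(3)
```

|     | Species | Weight | Length1 | Length2 | Length3 | Height | Width |
|-----|---------|--------|---------|---------|---------|--------|-------|
| 156 | Smelt | 12.2 | 12.1 | 13.0 | 13.8 | 2.277 | 1.2558 |
| 157 | Smelt | 19.7 | 13.2 | 14.3 | 15.2 | 2.8728 | 2.0672 |
| 158 | Smelt | 19.9 | 13.8 | 15.0 | 16.2 | 2.9322 | 1.8792 |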


# Exercise 1.4 - How many rows and columns does the dataset have?

```python
df.shape
```

```
(159, 7)
```

The dataset has 159 rows (one per fish) and 7 columns.

# Exercise 1.5 - define the features and target

As features we use **Length1**, **Length2**, **Length3**, **Height** and **Width**, and as target **Weight**.

```python
X = df[['Length1', 'Length2', 'Length3', 'Height', 'Width']]
y = df[['Weight']]
```
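The double brackets in `df[['Weight']]` select a one-column DataFrame, so `y` is 2-D and `model.predict` returns an array of shape `(n, 1)`, which is why the predictions at the end of this notebook print as a column. A small sketch contrasting this with single-bracket selection, using three rows from the `df.head(5)` output and a single feature to keep it short:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Three rows taken from the df.head(5) output above
df = pd.DataFrame(
    {
        "Length1": [23.2, 24.0, 23.9],
        "Weight": [242.0, 290.0, 340.0],
    }
)

y_frame = df[["Weight"]]  # DataFrame, shape (3, 1)
y_series = df["Weight"]   # Series, shape (3,)

X = df[["Length1"]]
pred_2d = LinearRegression().fit(X, y_frame).predict(X)
pred_1d = LinearRegression().fit(X, y_series).predict(X)

print(y_frame.shape, y_series.shape)  # (3, 1) (3,)
print(pred_2d.shape, pred_1d.shape)   # (3, 1) (3,)
```

Either form works for a single target; `df['Weight']` yields a flat prediction array, which is often easier to attach to a table.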

# Exercise 1.6 - Train a linear regression model on this data

```python
# initialize the model
model = LinearRegression()

# fit/train the model
model.fit(X, y)
```
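Linear regression predicts the target as an intercept plus a weighted sum of the features: `Weight ≈ intercept + w1·Length1 + w2·Length2 + w3·Length3 + w4·Height + w5·Width`. After fitting, scikit-learn stores the intercept in `intercept_` and the weights in `coef_`. A sketch that fits on the eight rows displayed in Exercises 1.2 and 1.3 (a stand-in for the full 159-row dataset, so the fitted numbers will differ) and checks that `predict` is exactly this formula:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["Length1", "Length2", "Length3", "Height", "Width"]

# The 8 rows displayed by df.head(5) and df.tail(3); a stand-in for the full dataset
rows = [
    [242.0, 23.2, 25.4, 30.0, 11.52, 4.02],
    [290.0, 24.0, 26.3, 31.2, 12.48, 4.3056],
    [340.0, 23.9, 26.5, 31.1, 12.3778, 4.6961],
    [363.0, 26.3, 29.0, 33.5, 12.73, 4.4555],
    [430.0, 26.5, 29.0, 34.0, 12.444, 5.134],
    [12.2, 12.1, 13.0, 13.8, 2.277, 1.2558],
    [19.7, 13.2, 14.3, 15.2, 2.8728, 2.0672],
    [19.9, 13.8, 15.0, 16.2, 2.9322, 1.8792],
]
df = pd.DataFrame(rows, columns=["Weight"] + features)
X, y = df[features], df[["Weight"]]

model = LinearRegression().fit(X, y)

# With a one-column DataFrame target, intercept_ has shape (1,) and coef_ has shape (1, 5)
print(model.intercept_.shape, model.coef_.shape)  # (1,) (1, 5)

# predict() is just intercept + X @ weights
manual = X.to_numpy() @ model.coef_.T + model.intercept_
print(np.allclose(manual, model.predict(X)))  # True
```

Each coefficient is the change in predicted weight (g) per centimetre of that measurement with the others held fixed. Because the three length measurements are strongly correlated, individual coefficients can be unstable even when the predictions themselves are good.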

# Exercise 1.7 - Optimizing the business

You realize that 20% of the time spent processing fish at the market goes into just weighing them!

You want to present your boss with a novel idea for saving workers' time: automating the weight measurement of fish by using the other measurements as a proxy.
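Before proposing to replace the scale, it helps to estimate how far off the predicted weights are on fish the model has not seen during training. A minimal sketch, assuming a random 80/20 train/test split and mean absolute error in grams as the yardstick; both the split ratio and the metric are choices made here, not part of the original exercise, and `holdout_mae` is a hypothetical helper name.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def holdout_mae(df, features, target="Weight", test_size=0.2, seed=0):
    """Fit on a random training split; return mean absolute error on the held-out rows.

    Hypothetical helper for illustration. The error is in the target's units (grams here).
    """
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return mean_absolute_error(y_test, model.predict(X_test))


# Toy check on synthetic data where weight is an exact linear function of length,
# so the held-out error should be (numerically) zero
toy = pd.DataFrame({"Length1": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0]})
toy["Weight"] = 30.0 * toy["Length1"] - 100.0
print(round(holdout_mae(toy, ["Length1"]), 6))  # 0.0

# On the real data (requires the CSV):
# df = pd.read_csv('../data/Fish.csv')
# print(holdout_mae(df, ['Length1', 'Length2', 'Length3', 'Height', 'Width']))
```

An error reported in grams is easy to compare against whatever tolerance the market actually needs when pricing fish.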

So you have prepared some examples to show your boss:

```python
# here's the data of the new batch of fish
new_batch_of_fish = pd.DataFrame(
    [
        [22.1, 24.9, 29.0, 10.42, 3.98],
        [23.5, 23.5, 31.0, 10.32, 4.01],
        [24.1, 22.8, 30.5, 10.58, 3.60],
    ],
    columns=X.columns,  # same feature names and order as used for training
)
new_batch_of_fish
```

|   | Length1 | Length2 | Length3 | Height | Width |
|---|---------|---------|---------|--------|-------|
| 0 | 22.1 | 24.9 | 29.0 | 10.42 | 3.98 |
| 1 | 23.5 | 23.5 | 31.0 | 10.32 | 4.01 |
| 2 | 24.1 | 22.8 | 30.5 | 10.58 | 3.6 |


Now use your model to automatically assign a weight to each fish!

```python
preds = model.predict(new_batch_of_fish)
preds
```

```
array([[258.48896803],
       [294.71574857],
       [349.3539689 ]])
```
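To hand each fish its weight alongside its measurements, the predictions can be attached as a new column. Because `y` was selected with double brackets, `preds` has shape `(3, 1)`, so it is flattened with `.ravel()` first. A sketch that reuses the prediction values printed above (hard-coded so the snippet runs on its own); the column name `predicted_weight_g` is an arbitrary choice:

```python
import numpy as np
import pandas as pd

new_batch_of_fish = pd.DataFrame(
    [
        [22.1, 24.9, 29.0, 10.42, 3.98],
        [23.5, 23.5, 31.0, 10.32, 4.01],
        [24.1, 22.8, 30.5, 10.58, 3.60],
    ],
    columns=["Length1", "Length2", "Length3", "Height", "Width"],
)
# Values copied from the model.predict output above
preds = np.array([[258.48896803], [294.71574857], [349.3539689]])

# ravel() turns the (3, 1) column into a flat (3,) array; round to 0.1 g for display
labelled = new_batch_of_fish.assign(predicted_weight_g=preds.ravel().round(1))
print(labelled)
```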