# Title: Fish Weight Prediction

# Objective
'''The objective of this project is to develop a linear regression model to accurately predict the weight of fish based on their physical characteristics (Height, Width, and Length measurements) and category. This involves:

Checking the dataset for missing values and selecting the relevant features for modeling.
Splitting the dataset into training and testing sets.
Training a linear regression model on the training set.
Evaluating the model's performance on the test set using metrics such as mean absolute error and R² score.
Demonstrating the effectiveness of linear regression for predicting fish weight based on the model's performance metrics.'''

In [1]:
# step 1: import library
import pandas as pd

In [4]:
# step 2: import data
fish = pd.read_csv('https://github.com/ybifoundation/Dataset/raw/main/Fish.csv')

In [5]:
fish.head()

   Category  Species  Weight   Height   Width  Length1  Length2  Length3
0         1    Bream   242.0    11.52    4.02     23.2     25.4     30.0
1         1    Bream   290.0    12.48  4.3056     24.0     26.3     31.2
2         1    Bream   340.0  12.3778  4.6961     23.9     26.5     31.1
3         1    Bream   363.0    12.73  4.4555     26.3     29.0     33.5
4         1    Bream   430.0   12.444   5.134     26.5     29.0     34.0


In [6]:
fish.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159 entries, 0 to 158
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Category  159 non-null    int64  
 1   Species   159 non-null    object 
 2   Weight    159 non-null    float64
 3   Height    159 non-null    float64
 4   Width     159 non-null    float64
 5   Length1   159 non-null    float64
 6   Length2   159 non-null    float64
 7   Length3   159 non-null    float64
dtypes: float64(6), int64(1), object(1)
memory usage: 10.1+ KB
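
The Non-Null Count column above reports 159 for every column, which is how this notebook confirms there are no missing values. A more direct check can be sketched as follows. To keep the example self-contained, it uses only the five rows printed by fish.head() as a stand-in; the names `sample`, `missing_per_column`, and `total_missing` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Stand-in data: the five rows printed by fish.head() above.
sample = pd.DataFrame({
    "Category": [1, 1, 1, 1, 1],
    "Species": ["Bream"] * 5,
    "Weight": [242.0, 290.0, 340.0, 363.0, 430.0],
    "Height": [11.52, 12.48, 12.3778, 12.73, 12.444],
    "Width": [4.02, 4.3056, 4.6961, 4.4555, 5.134],
    "Length1": [23.2, 24.0, 23.9, 26.3, 26.5],
    "Length2": [25.4, 26.3, 26.5, 29.0, 29.0],
    "Length3": [30.0, 31.2, 31.1, 33.5, 34.0],
})

# Count missing values per column, then in total.
missing_per_column = sample.isna().sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print("total missing:", total_missing)
```

On the full dataset the same two lines would be applied to `fish` instead of `sample`.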


In [7]:
fish.describe()

       Category      Weight    Height     Width   Length1    Length2    Length3
count     159.0       159.0     159.0     159.0     159.0      159.0      159.0
mean   3.264151  398.326415  8.970994  4.417486  26.24717  28.415723  31.227044
std    1.704249  357.978317  4.286208  1.685804  9.996441  10.716328  11.610246
min         1.0         0.0    1.7284    1.0476       7.5        8.4        8.8
25%         2.0       120.0    5.9448   3.38565     19.05       21.0      23.15
50%         3.0       273.0     7.786    4.2485      25.2       27.3       29.4
75%         4.5       650.0   12.3659    5.5845      32.7       35.5      39.65
max         7.0      1650.0    18.957     8.142      59.0       63.4       68.0


In [8]:
#step 3 : define target (y) and features (x)
fish.columns

Index(['Category', 'Species', 'Weight', 'Height', 'Width', 'Length1',
       'Length2', 'Length3'],
      dtype='object')

In [10]:
y = fish['Weight']

In [11]:
x = fish[['Category','Height', 'Width', 'Length1','Length2', 'Length3']]

In [13]:
#step 4 : train test split
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=2529)

In [14]:
#check shape of train and test sample
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((111, 6), (48, 6), (111,), (48,))
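
The shapes above follow from applying train_size = 0.7 to 159 rows. The arithmetic can be sketched as below, assuming the training count is rounded down and the test set receives the remainder, which matches the 111/48 split shown. The variable names are illustrative.

```python
import math

n_samples = 159   # rows in the dataset, from fish.info()
train_size = 0.7  # fraction passed to train_test_split

# Assumption: the training count is rounded down and the
# test set gets whatever rows remain.
n_train = math.floor(train_size * n_samples)
n_test = n_samples - n_train
print(n_train, n_test)  # 111 48
```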

In [18]:
#step 5: select model
from sklearn.linear_model import LinearRegression
model = LinearRegression()

In [19]:
#step 6: train or fit model
model.fit(x_train,y_train)

In [20]:
model.intercept_

-684.4235918478525

In [21]:
model.coef_

array([ 35.19634977,  52.19372157, -37.13869125,  11.2218449 ,
        78.11233002, -59.11783139])
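
The intercept and coefficients fully define the fitted model: a prediction is the intercept plus the sum of each coefficient times its feature value, in the column order used for x. As a rough illustration, the sketch below applies them by hand to row 0 of fish.head(). That row may have been part of the training split, so this shows the mechanics rather than test performance. The variable names are illustrative.

```python
# Values copied from model.intercept_ and model.coef_ above.
intercept = -684.4235918478525
coefficients = [35.19634977, 52.19372157, -37.13869125,
                11.2218449, 78.11233002, -59.11783139]

# Row 0 of fish.head(), in the order of the columns of x:
# Category, Height, Width, Length1, Length2, Length3.
features = [1, 11.52, 4.02, 23.2, 25.4, 30.0]
actual_weight = 242.0

# Linear model: intercept + sum(coefficient * feature value).
predicted_weight = intercept + sum(
    c * v for c, v in zip(coefficients, features)
)
print(round(predicted_weight, 1), "predicted vs", actual_weight, "actual")
```

The hand-computed value comes out near 274 grams, against an actual weight of 242 grams.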

In [22]:
# step 7: predict on the test set
y_pred = model.predict(x_test)

In [23]:
y_pred

array([ 475.93351307,  525.81910195,   77.63275849,  881.10235121,
        160.9685664 ,  255.94371856,  361.87029932,  358.87068094,
        499.83411068, -150.07834151, -115.91810869,  428.65470115,
        114.67533404,  812.51385122,  586.5071178 ,  273.38510858,
        579.63900729,  225.18126845,  639.26068037,   85.00820599,
        136.92159041,  -87.7778087 ,  629.97231046,  732.63097812,
        859.8720695 , -166.76928607,  342.04209934,  722.92198147,
        321.44827179,  787.98248357,  486.93194673,  541.89982795,
        376.74813045,  624.81211202, -170.11945033,  917.76513801,
        792.26439518,  -21.15655005,  300.24921659,  914.07325473,
        621.05636286,  934.17373986,  676.85479574,  653.92304403,
        615.51226767,  336.61090622,  505.75519147,  -33.53283763])
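
Several of the predictions above are negative, which cannot happen for a real weight; an unconstrained linear model has nothing stopping it from extrapolating below zero. The sketch below counts them, using the y_pred values copied from the output above and rounded to two decimals. The clipping step at the end is one possible post-processing choice and an assumption of this example, not something the notebook does.

```python
# Predictions copied from the y_pred output above (rounded to 2 decimals).
y_pred = [
    475.93, 525.82, 77.63, 881.10, 160.97, 255.94, 361.87, 358.87,
    499.83, -150.08, -115.92, 428.65, 114.68, 812.51, 586.51, 273.39,
    579.64, 225.18, 639.26, 85.01, 136.92, -87.78, 629.97, 732.63,
    859.87, -166.77, 342.04, 722.92, 321.45, 787.98, 486.93, 541.90,
    376.75, 624.81, -170.12, 917.77, 792.26, -21.16, 300.25, 914.07,
    621.06, 934.17, 676.85, 653.92, 615.51, 336.61, 505.76, -33.53,
]

# Count how many predicted weights fall below zero.
negative = [p for p in y_pred if p < 0]
print(len(negative), "of", len(y_pred), "predictions are negative")

# Hypothetical post-processing (not part of the notebook):
# clip predictions at zero so no fish gets a negative weight.
y_pred_clipped = [max(p, 0.0) for p in y_pred]
```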

In [24]:
# step 8: model evaluation (MAE and R²)
from sklearn.metrics import mean_absolute_error,r2_score

In [25]:
mean_absolute_error(y_test,y_pred)

99.58910366731824
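
The value above is the average absolute gap between actual and predicted weight across the test rows. A minimal hand-rolled version of that calculation might look like this. The function name `mean_absolute_error_manual` and the demo values are illustrative, and the demo values are not rows from the test set.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)

# Illustrative values only, not rows from the test set.
y_true_demo = [242.0, 290.0, 340.0]
y_pred_demo = [274.0, 300.0, 310.0]

mae_demo = mean_absolute_error_manual(y_true_demo, y_pred_demo)
print(mae_demo)  # (32 + 10 + 30) / 3 = 24.0
```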

In [26]:
r2_score(y_test,y_pred)

0.8398246159944501
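
R² compares the model's squared errors with those of a baseline that always predicts the mean of the actual values: it is 1 minus the ratio of the residual sum of squares to the total sum of squares. A small sketch of that formula follows. The function name `r2_score_manual` and the demo values are illustrative and are not taken from the test set.

```python
def r2_score_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Illustrative values only, not rows from the test set.
y_true_demo = [100.0, 200.0, 300.0, 400.0]
y_pred_demo = [110.0, 190.0, 310.0, 390.0]

r2_demo = r2_score_manual(y_true_demo, y_pred_demo)
print(round(r2_demo, 3))  # 1 - 400 / 50000 = 0.992
```

Read this way, the 0.84 above is the share of variance in test-set weight that the model explains, not the share of predictions that are correct.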

# Conclusion
Data Preparation: The dataset has 159 entries with no missing values. The features used were Category, Height, Width, Length1, Length2, and Length3; the target was Weight.

Model Training: We fitted a linear regression model on a 70/30 train/test split (111 training rows, 48 test rows).

Model Performance:

Mean Absolute Error (MAE): 99.59 grams
R² Score: 0.84
Interpretation: R² is not a classification accuracy. A score of 0.84 means the model explains about 84% of the variance in weight on the test set.
Prediction: The model tracks fish weight reasonably well, but an MAE of about 100 grams is roughly a quarter of the dataset's mean weight (398.33 grams), and 7 of the 48 test predictions are negative, which is physically impossible.

In summary, the linear regression model explains most of the variation in fish weight (R² = 0.84) and serves as a reasonable baseline, although the size of the MAE and the negative predictions show its limits.