### I) Load dataset

In [1]:
import pandas as pd
df = pd.read_csv("../data/abalone.csv")
df.head()

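|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |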


### II) Look for missing values

In [2]:

# For each column: does it contain any missing value, and what is its dtype?
pd.concat([df.isna().any(), df.dtypes], axis=1, keys=["NULL", "TYPE"])


| Column | NULL | TYPE |
|--------|------|------|
| Sex | False | object |
| Length | False | float64 |
| Diameter | False | float64 |
| Height | False | float64 |
| Whole weight | False | float64 |
| Shucked weight | False | float64 |
| Viscera weight | False | float64 |
| Shell weight | False | float64 |
| Rings | False | int64 |


### III) Numerical value normalization (Z-score normalization)

1. StandardScaler (Z-score normalization):

+ When to use: A reasonable default, especially for algorithms that are sensitive to feature scale (e.g., linear models, SVMs, neural networks, k-NN). It is particularly useful when the data is approximately normally distributed, or when you have no strong reason to believe otherwise.
+ How it works: For each feature, subtracts the mean and divides by the standard deviation, so the scaled feature has a mean of 0 and a standard deviation of 1.
+ Pros: Widely applicable; often improves the performance of scale-sensitive algorithms.
+ Cons: Sensitive to outliers. An outlier shifts the mean and inflates the standard deviation, which distorts the scaled values of the other data points.
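
The points above can be checked on a small example. This is only a sketch: the `toy` and `with_outlier` frames hold made-up values (they are not taken from the abalone file), and the variable names are chosen for illustration. It compares a hand-computed z-score with `StandardScaler`, then shows how a single extreme value compresses the scaled values of the other points.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical values, not taken from the abalone file).
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Manual z-score: subtract the mean, divide by the population
# standard deviation (ddof=0), which is what StandardScaler uses.
manual = (toy - toy.mean()) / toy.std(ddof=0)

# StandardScaler should reproduce the manual result.
scaled = StandardScaler().fit_transform(toy)
same = np.allclose(manual.to_numpy(), scaled)

# Outlier sensitivity: replace the last value with an extreme one
# and compare the spread of the first four scaled points.
with_outlier = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 500.0]})
scaled_outlier = StandardScaler().fit_transform(with_outlier)
spread_clean = scaled[:4, 0].max() - scaled[:4, 0].min()
spread_outlier = scaled_outlier[:4, 0].max() - scaled_outlier[:4, 0].min()

print("manual == StandardScaler:", same)
print("spread without outlier:", spread_clean)
print("spread with outlier:", spread_outlier)
```

Note that pandas' `std()` defaults to the sample standard deviation (`ddof=1`), so `ddof=0` is needed to match `StandardScaler`. With the outlier present, the first four points end up much closer together after scaling, which is the distortion described under "Cons".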

In [6]:
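from sklearn.preprocessing import StandardScaler
# Keep numeric columns only: this drops the categorical Sex column
# but keeps Rings, so Rings is scaled as well.
num_feature = df.select_dtypes(include="number")
# Fit the scaler and transform in one step; the result is a NumPy array.
std_scale = StandardScaler().fit_transform(num_feature)
# Wrap the array in a DataFrame to restore the column names.
scaled_frame = pd.DataFrame(std_scale, columns=num_feature.columns)
scaled_frame.head()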

|   | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | -0.574558 | -0.432149 | -1.064424 | -0.641898 | -0.607685 | -0.726212 | -0.638217 | 1.571544 |
| 1 | -1.448986 | -1.439929 | -1.183978 | -1.230277 | -1.17091 | -1.205221 | -1.212987 | -0.910013 |
| 2 | 0.050033 | 0.12213 | -0.107991 | -0.309469 | -0.4635 | -0.35669 | -0.207139 | -0.289624 |
| 3 | -0.699476 | -0.432149 | -0.347099 | -0.637819 | -0.648238 | -0.6076 | -0.602294 | 0.020571 |
| 4 | -1.615544 | -1.540707 | -1.423087 | -1.272086 | -1.215968 | -1.287337 | -1.320757 | -0.910013 |
