# Feature Scaling and Data Splitting in Scikit-learn

### Feature Scaling Code


In [6]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

data = {
    'studyhours': [2, 3, 1, 4, 5, 2, 3, 1, 4, 5],
    'testscore': [70, 80, 60, 90, 100, 70, 80, 60, 90, 100]
}

df = pd.DataFrame(data)

# Standardize each column to zero mean and unit variance
standard_scaler = StandardScaler()
standard_scaled = standard_scaler.fit_transform(df)

print("Standard Scaler Output:")
print(pd.DataFrame(standard_scaled, columns=['studyhours', 'testscore']))

# Rescale each column to the [0, 1] range
minmax_scaler = MinMaxScaler()
minmax_scaled = minmax_scaler.fit_transform(df)

print("\nMinMax Scaler Output:")
print(pd.DataFrame(minmax_scaled, columns=['studyhours', 'testscore']))

Standard Scaler Output:
   studyhours  testscore
0   -0.707107  -0.707107
1    0.000000   0.000000
2   -1.414214  -1.414214
3    0.707107   0.707107
4    1.414214   1.414214
5   -0.707107  -0.707107
6    0.000000   0.000000
7   -1.414214  -1.414214
8    0.707107   0.707107
9    1.414214   1.414214

MinMax Scaler Output:
   studyhours  testscore
0        0.25       0.25
1        0.50       0.50
2        0.00       0.00
3        0.75       0.75
4        1.00       1.00
5        0.25       0.25
6        0.50       0.50
7        0.00       0.00
8        0.75       0.75
9        1.00       1.00
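
The scaled values above can be checked by hand. The sketch below is not part of the original notebook; it recomputes the `studyhours` column with plain NumPy, assuming the standard definitions: standardization subtracts the mean and divides by the population standard deviation (`ddof=0`), and min-max scaling maps the minimum to 0 and the maximum to 1. The variable names are illustrative. Both columns in the output are identical because, in this toy data, `testscore` is an exact linear function of `studyhours` (10 × hours + 50), and both scalers remove any such shift and positive rescaling.

```python
import numpy as np

# Same values as the 'studyhours' column in the notebook
study_hours = np.array([2, 3, 1, 4, 5, 2, 3, 1, 4, 5], dtype=float)

# Standardization: (x - mean) / population standard deviation
z_scores = (study_hours - study_hours.mean()) / study_hours.std(ddof=0)

# Min-max scaling: (x - min) / (max - min)
min_max = (study_hours - study_hours.min()) / (study_hours.max() - study_hours.min())

print(np.round(z_scores, 6))
print(min_max)
```

The first five entries of each array should match rows 0 to 4 of the corresponding table above.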


### Splitting data for training and testing
-   80% of the data is used for training
-   20% of the data is used for testing

In [7]:
X = df[['studyhours']]
y = df[['testscore']]  # Assuming 'testscore' is the target variable

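# Hold out 20% of the rows for testing; random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)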
print("Training Data:")
print("X_train:")
print(X_train)
print("y_train:")
print(y_train)

print("Testing Data:")
print("X_test:")
print(X_test)
print("y_test:")
print(y_test)

Training Data:
X_train:
   studyhours
5           2
0           2
7           1
2           1
9           5
4           5
3           4
6           3
y_train:
   testscore
5         70
0         70
7         60
2         60
9        100
4        100
3         90
6         80
Testing Data:
X_test:
   studyhours
8           4
1           3
y_test:
   testscore
8         90
1         80
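
The scaling cell earlier fits the scalers on the full dataset, while the split is done afterwards on the unscaled data. When both steps feed a model, a common practice is to split first and fit the scaler on the training rows only, so that statistics from the test rows do not leak into preprocessing. The following is a minimal sketch of that ordering, not something the original notebook does; it reuses the same toy data and split parameters, and the names `scaler`, `X_train_scaled`, and `X_test_scaled` are illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'studyhours': [2, 3, 1, 4, 5, 2, 3, 1, 4, 5],
    'testscore': [70, 80, 60, 90, 100, 70, 80, 60, 90, 100],
})

X = df[['studyhours']]
y = df[['testscore']]

# Split first, using the same parameters as the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training rows only
X_test_scaled = scaler.transform(X_test)        # apply those same statistics to the test rows

print(X_train_scaled.shape, X_test_scaled.shape)
```

With this ordering the scaled test values are generally not centered on zero, since they are expressed relative to the training mean and standard deviation.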
