In [1]:
# Import train_test_split from model_selection for splitting your dataset
from sklearn.model_selection import train_test_split
import pandas as pd

### 📊 Sample Dataset
We’ll use a small dataset with `Age`, `Salary`, and `Purchased` columns to demonstrate how to split data for training and testing ML models.

In [2]:
# Creating a simple DataFrame
data = {
    'Age': [22, 25, 47, 52, 46],
    'Salary': [18000, 24000, 52000, 58000, 60000],
    'Purchased': [0, 0, 1, 1, 1]
}
df = pd.DataFrame(data)
print("Original Data:")
print(df)

Original Data:
   Age  Salary  Purchased
0   22   18000          0
1   25   24000          0
2   47   52000          1
3   52   58000          1
4   46   60000          1


### 🎯 Define Features and Target
We separate input features (`Age`, `Salary`) from the target variable (`Purchased`).

In [3]:
X = df[['Age', 'Salary']]  # Features
y = df['Purchased']  # Target

### ✂️ Split the Dataset
We’ll use `train_test_split` to randomly split the dataset into **training** and **testing** sets.
With `test_size=0.2`, 80% of the rows go into training and 20% into testing; for our 5-row dataset, that means 4 training rows and 1 test row.

### 🧪 train_test_split Parameters
- `test_size=0.2`: Allocates 20% of the data to the test set.
- `random_state=42`: Seeds the shuffle so the split is identical every time you run the code.
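
To see what these two parameters do in practice, here is a minimal sketch that rebuilds the same five-row DataFrame and splits it twice with the same seed. The names `split_a` and `split_b` are illustrative and not part of the notebook itself.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same sample data as above
df = pd.DataFrame({
    'Age': [22, 25, 47, 52, 46],
    'Salary': [18000, 24000, 52000, 58000, 60000],
    'Purchased': [0, 0, 1, 1, 1]
})
X = df[['Age', 'Salary']]
y = df['Purchased']

# Two independent calls with the same seed (split_a / split_b are illustrative names).
# Each call returns [X_train, X_test, y_train, y_test].
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)

# test_size=0.2 on 5 rows leaves 4 rows for training and 1 for testing
print(len(split_a[0]), len(split_a[1]))

# The same seed selects the same rows both times
print(split_a[1].index.equals(split_b[1].index))
```

If you drop `random_state`, the row counts stay the same, but which rows land in each set can change from run to run.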

In [4]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

### 🔍 View the Results
Let’s print the split data to verify how it was divided.

In [5]:
print("X_train:")
print(X_train)

print("\nX_test:")
print(X_test)

print("\ny_train:")
print(y_train)

print("\ny_test:")
print(y_test)

X_train:
   Age  Salary
4   46   60000
2   47   52000
0   22   18000
3   52   58000

X_test:
   Age  Salary
1   25   24000

y_train:
4    1
2    1
0    0
3    1
Name: Purchased, dtype: int64

y_test:
1    0
Name: Purchased, dtype: int64
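
Beyond reading the printed output, you may want to check the split programmatically. The sketch below is one possible sanity check, assuming the same data and parameters as above; the flag names (`train_aligned`, `no_overlap`, and so on) are made up for this example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same sample data and split as above
df = pd.DataFrame({
    'Age': [22, 25, 47, 52, 46],
    'Salary': [18000, 24000, 52000, 58000, 60000],
    'Purchased': [0, 0, 1, 1, 1]
})
X = df[['Age', 'Salary']]
y = df['Purchased']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Features and labels should stay paired row by row
train_aligned = X_train.index.equals(y_train.index)
test_aligned = X_test.index.equals(y_test.index)

# No row should appear in both sets, and no row should be lost
no_overlap = set(X_train.index).isdisjoint(X_test.index)
all_rows_used = len(X_train) + len(X_test) == len(df)

print(train_aligned, test_aligned, no_overlap, all_rows_used)
```

Comparing the row indices works here because `train_test_split` preserves the original DataFrame index, as the printed output above shows.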
