<img src="images/img07.png" width="800">

#### ***0. Arrange the Data***
***Make sure your data is arranged in a format suitable for a train/test split.***
***In scikit-learn, this consists of separating your full data set into “Features” and “Target.”***
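
As a minimal sketch of this step, the snippet below separates a small pandas DataFrame into features and target. The DataFrame `df`, its column names, and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical student records; the values are made up for illustration.
df = pd.DataFrame({
    "height": [150, 160, 165, 170, 172, 180],
    "weight": [45, 52, 58, 63, 66, 75],
    "age": [14, 15, 15, 16, 16, 17],
    "score": [62, 70, 75, 81, 78, 90],
})

X = df.drop(columns="score")  # features: every column except the target
y = df["score"]               # target: the column we want to predict

print(X.shape)  # (6, 3)
print(y.shape)  # (6,)
```

Keeping `X` two-dimensional and `y` one-dimensional matches what scikit-learn estimators expect for a single target.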

#### ***1. Split the Data***           
***Split the data set into two pieces — a training set and a testing set.***

***You train the model using the training set.***                
***You test the model using the testing set.***               
            
***Training the model means fitting it to the training data.***  
***Testing the model means measuring its accuracy on data it did not see during training.***  
 
***A common choice is 70-80% of the data for training and 20-30% for testing.***  

#### ***2. Train the Model***             
***Train the model on the training set. This is “X_train” and “y_train” in the image.***

#### ***3. Test the Model***       
***Test the model on the testing set (“X_test” and “y_test” in the image) and evaluate the performance.***
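
Steps 1 to 3 can be sketched end to end as follows. The choice of `LogisticRegression` is an assumption made for illustration (the text does not name a model), and the data is synthetic, generated with `make_classification`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Step 0: features X and target y (synthetic data, for illustration only).
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Step 1: split into a training set (70%) and a testing set (30%).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Step 2: train (fit) the model on the training set only.
model = LogisticRegression(max_iter=1000)  # assumed model, not from the text
model.fit(X_train, y_train)

# Step 3: evaluate on the testing set, which the model has never seen.
accuracy = model.score(X_test, y_test)  # mean accuracy for classifiers
print(f"Test accuracy: {accuracy:.3f}")
```

Scoring on `X_test` rather than `X_train` is the point of the split: it estimates how the model behaves on unseen data.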

**Suppose we have data for 6 students:**  
**Feature (X): Height, Weight, Age → 3 features**  
**Target (y): Exam Score**  

**X_train → Height, Weight, Age of 4 students → shape (4, 3)**  
**X_test → Height, Weight, Age of 2 students → shape (2, 3)**  
**y_train → Scores of 4 students → shape (4,)**  
**y_test → Scores of 2 students → shape (2,)**  

**X_... → always 2D (samples × features)**  
**y_... → 1D (samples) for a single target**  
**Train + Test sample count = total sample count**  
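
The shapes in the 6-student example can be checked with a short sketch. The numeric values are made up, and `test_size=2` (an integer) is used here to hold out exactly 2 students.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up values for 6 students: height (cm), weight (kg), age (years).
X = np.array([
    [150, 45, 14],
    [160, 52, 15],
    [165, 58, 15],
    [170, 63, 16],
    [172, 66, 16],
    [180, 75, 17],
])
y = np.array([62, 70, 75, 81, 78, 90])  # exam scores, one per student

# An integer test_size means "this many samples", not a fraction.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2, random_state=42
)

print(X_train.shape, X_test.shape)  # (4, 3) (2, 3)
print(y_train.shape, y_test.shape)  # (4,) (2,)
print(len(X_train) + len(X_test) == len(X))  # True
```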

## ***Implementation***

In [1]:
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

#### ***Create a synthetic dataset***              

***X, y = make_classification(...)***              
***Generates a synthetic classification dataset.***             
***Returns two variables:***                
***X: The feature matrix (input variables)***               
***y: The target vector (class labels)***            

***n_samples - Create a total of 1000 samples (data points).***             
***n_features - Each sample has 10 features (input variables).***             
***n_classes - The number of classes (labels); 2 here, i.e. binary classification.***            
***random_state - Sets the random seed for reproducibility — ensures you get the same dataset every time.***             
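
A small sketch of what these parameters produce, and of what "reproducibility" means in practice: two calls with the same `random_state` return identical arrays. The names `X1`, `y1`, `X2`, `y2` are used only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Two independent calls with identical arguments and the same seed.
X1, y1 = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X2, y2 = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

print(X1.shape, y1.shape)      # (1000, 10) (1000,)
print(np.unique(y1))           # [0 1]
print(np.array_equal(X1, X2))  # True
print(np.array_equal(y1, y2))  # True
```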

In [2]:
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

In [6]:
X

array([[ 0.96479937, -0.06644898,  0.98676805, ..., -1.2101605 ,
        -0.62807677,  1.22727382],
       [-0.91651053, -0.56639459, -1.00861409, ..., -0.98453405,
         0.36389642,  0.20947008],
       [-0.10948373, -0.43277388, -0.4576493 , ..., -0.2463834 ,
        -1.05814521, -0.29737608],
       ...,
       [ 1.67463306,  1.75493307,  1.58615382, ...,  0.69272276,
        -1.50384972,  0.22526412],
       [-0.77860873, -0.83568901, -0.19484228, ..., -0.49735437,
         2.47213818,  0.86718741],
       [ 0.24845351, -1.0034389 ,  0.36046013, ...,  0.77323999,
         0.1857344 ,  1.41641179]], shape=(1000, 10))

In [None]:
y

array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
       1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,
       0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1,
       0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1,
       0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0,
       0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1,
       1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,

#### ***Split the dataset into training and testing sets (70-30 split)***

In [8]:
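# Alternative for a single DataFrame df (not defined in this notebook):
# train, test = train_test_split(df, test_size=0.30, random_state=42)
# Hold out 30% of the samples for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)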

In [9]:
X_train.shape

(700, 10)

In [10]:
X_test.shape

(300, 10)

In [11]:
y_train.shape

(700,)

In [13]:
y_test.shape

(300,)
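
As a final check, the sketch below confirms the 70-30 arithmetic and shows that repeating the split with the same `random_state` yields the same partition. The names `split_a`, `split_b`, and `same` are introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Run the same split twice with the same seed.
split_a = train_test_split(X, y, test_size=0.30, random_state=42)
split_b = train_test_split(X, y, test_size=0.30, random_state=42)
X_train, X_test, y_train, y_test = split_a

print(len(X_train) + len(X_test) == len(X))  # True
print(len(X_test) / len(X))                  # 0.3

# Every corresponding piece (X_train, X_test, y_train, y_test) is identical.
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)  # True
```

Without a fixed `random_state`, each run shuffles differently, so the rows that land in the test set would change between runs.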