# 1. Introduction
## 1.1 Definition
In machine learning, the **train-test split** is a technique used to evaluate the performance of a model. It involves dividing the available dataset into two or more subsets:
* a training set and
* a test set (and sometimes a validation set).
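
When a validation set is also wanted, one common approach is to call `train_test_split` twice. The sketch below is only an illustration on a made-up array; the names `data`, `train_val`, `train_part`, `val_part`, and `test_part`, and the 60/20/20 proportions, are assumptions and do not come from the car-price data used later.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 row indices standing in for a real dataset.
data = np.arange(100)

# First split: hold out 20% of all rows as the test set.
train_val, test_part = train_test_split(data, test_size=0.2, random_state=0)

# Second split: take 25% of the remaining 80 rows as the validation set,
# which leaves 60 rows for training (60/20/20 overall).
train_part, val_part = train_test_split(train_val, test_size=0.25, random_state=0)

print(len(train_part), len(val_part), len(test_part))
```

The validation set is typically used for choosing between models or hyperparameters, so that the test set is touched only once at the end.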

#### The train-test split process typically follows these steps:
1. **Training Set:** The training set is used to *train the model.* The model learns the underlying patterns and relationships in the data during the training process.
2. **Test Set:** The test set is used to *evaluate the performance of the trained model.* The model is applied to the test set, and its performance metrics (such as **accuracy, precision, recall, F1-score,** etc.) are measured. This gives an estimate of how the model will perform on unseen data.
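
The two steps above can be sketched end to end as follows. This is a minimal illustration, assuming a synthetic classification dataset from `make_classification` and a `LogisticRegression` classifier; neither is part of this notebook's car-price example, and the variable names (`X_demo`, `clf`, `metrics`, and so on) are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (an assumption for illustration only).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows as the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Step 1: fit the model on the training set only.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)

# Step 2: evaluate on rows the model has not seen during training.
y_pred = clf.predict(X_te)
metrics = {
    "accuracy": accuracy_score(y_te, y_pred),
    "precision": precision_score(y_te, y_pred),
    "recall": recall_score(y_te, y_pred),
    "f1": f1_score(y_te, y_pred),
}

print(len(X_tr), len(X_te))
print(metrics)
```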

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
url = "https://raw.githubusercontent.com/akdubey2k/ML/main/6_train_test_split/carprices.csv"
df = pd.read_csv(url)
df.head()

HTTPError: HTTP Error 404: Not Found

## Pandas DataFrame shape as (rows, columns)

In [None]:
df.shape

## Pandas DataFrame summary: number of rows, column names, non-null counts, dtypes, and memory usage

In [None]:
df.info()

## Plot graph between age and sell price

In [None]:
%matplotlib inline
plt.scatter(df['Age(yrs)'], df['Sell Price($)'], color='brown', marker='+')
plt.xlabel('Age(yrs)', fontsize=14)
plt.ylabel('Sell Price($)', fontsize=14)

## Plot graph between mileage and sell price

In [None]:
plt.scatter(df['Mileage'], df['Sell Price($)'], color='green', marker='*')
plt.xlabel('Mileage', fontsize=14)
plt.ylabel('Sell Price($)', fontsize=14)

## Splitting the dataset into 'training' and 'testing' sets

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X = df[['Age(yrs)', 'Mileage']]  # features (independent variables)
y = df['Sell Price($)']          # target (dependent variable)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
X_train.info()

In [None]:
X_train

In [None]:
X_test

In [None]:
y_train

In [None]:
y_test

## Train a 'linear regression' model on the training set

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)

In [None]:
model.predict(X_test)
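
For a `LinearRegression` model, `score` returns the coefficient of determination (R²) on the data passed to it. The sketch below checks this on synthetic data, since the numbers for the car-price data are not reproduced here; the coefficients `3.0` and `-2.0`, the noise level, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: two features and a noisy linear target (assumed).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(0, 1, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
reg = LinearRegression().fit(X_tr, y_tr)

# R^2 as reported by the model.
score = reg.score(X_te, y_te)

# R^2 computed by hand: 1 - (residual sum of squares / total sum of squares).
y_pred = reg.predict(X_te)
ss_res = np.sum((y_te - y_pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(score, r2_score(y_te, y_pred), manual_r2)
```

A value close to 1 means the predictions explain most of the variance in the test targets; the value can be negative when the model does worse than always predicting the mean.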

## Let's try with the random_state argument

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)
X_train.info()

In [None]:
X_train

In [None]:
X_test

In [None]:
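model.fit(X_train, y_train)  # refit on the new split; the earlier fit may have seen these test rows
model.score(X_test, y_test)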
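
The effect of `random_state` can be seen in isolation with a small sketch. It assumes a made-up array of 20 integers rather than the car-price data, and the names `data`, `train_a`, `test_a`, `train_b`, and `test_b` are placeholders: with the same seed, repeated calls to `train_test_split` return the same rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 row indices.
data = np.arange(20)

# Two calls with the same seed produce identical splits.
train_a, test_a = train_test_split(data, test_size=0.3, random_state=10)
train_b, test_b = train_test_split(data, test_size=0.3, random_state=10)
same_seed_equal = np.array_equal(train_a, train_b) and np.array_equal(test_a, test_b)

print(same_seed_equal)
print(sorted(test_a))
```

Without `random_state`, each call shuffles the rows differently, so the reported score usually changes from run to run.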