
# Bias-Variance Trade-Off
You should build a machine learning pipeline to examine the effect of regularization on the bias-variance trade-off. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). 
- Choose a model that is vulnerable to overfitting, such as [decision trees](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html). 
- Choose a regularization hyperparameter of that model, such as the [max_depth](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) of decision trees.
- Define a range for the regularization hyperparameter and repeat the following experiment for each value in that range:
    - Set the regularization hyperparameter to the next value in its range.
    - Train your model with the current value of the regularization hyperparameter on your training set.
    - Test your trained model on the test set.
    - Record the train and test errors in separate columns of a [data frame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
- Draw a line chart to show how the train and test errors of your model change as the regularization hyperparameter increases. You can use [Plotly](https://plotly.com/python/line-charts/) for visualization.
- Analyze the chart and explain the role of the regularization hyperparameter in the bias-variance trade-off.
- Check the documentation of different machine learning models to identify their most important regularization hyperparameters.
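
For the last item, one way to start is to ask each estimator for its parameters and look up the ones that act as regularizers. The sketch below is a minimal illustration, assuming a hand-picked and non-exhaustive mapping (`REGULARIZERS` and `regularizer_defaults` are hypothetical names introduced here, not part of scikit-learn); it only reads the default values through `get_params()`.

```python
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hand-picked, non-exhaustive list of regularization hyperparameters per model.
# This mapping is an assumption made for illustration; check each model's docs.
REGULARIZERS = {
    DecisionTreeClassifier: ["max_depth", "min_samples_leaf", "ccp_alpha"],
    LogisticRegression: ["C"],
    Ridge: ["alpha"],
    SVC: ["C", "gamma"],
}


def regularizer_defaults(model_cls, names):
    """Return the default value of each named hyperparameter of model_cls."""
    params = model_cls().get_params()
    return {name: params[name] for name in names}


defaults = {
    cls.__name__: regularizer_defaults(cls, names)
    for cls, names in REGULARIZERS.items()
}

for model_name, values in defaults.items():
    print(model_name, values)
```

Note that the direction differs between models: a larger `alpha` in `Ridge` means stronger regularization, whereas a larger `C` in `LogisticRegression` or `SVC` means weaker regularization, and a larger `max_depth` lets a tree fit the training data more closely.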

# Importing the Libraries

In [34]:
import pandas as pd
import sklearn.model_selection
import sklearn.metrics
import sklearn.tree
import plotly.express as px


# Loading the Dataset

In [35]:
df = pd.read_csv("/content/mnist.csv")
df = df.set_index("id")
df.head()    

Unnamed: 0_level_0,class,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
31953,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
34452,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
60897,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
36953,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1981,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


# Splitting the Dataset into Training and Test Sets

In [36]:
x = df.drop(["class"], axis=1)
y = df["class"]
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y)
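
The split above uses the defaults, so it changes on every run and does not balance the digit classes between the two sets. The following is a minimal sketch of a reproducible, stratified split. The toy frame, the `test_size` of 0.25, and the seed 42 are assumptions chosen for illustration, not values from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for mnist.csv: 200 rows, 4 pixel columns, 10 balanced classes.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.integers(0, 256, size=(200, 4)),
    columns=[f"pixel{i}" for i in range(1, 5)],
)
toy["class"] = np.repeat(np.arange(10), 20)

x = toy.drop(columns=["class"])
y = toy["class"]

# random_state fixes the shuffle; stratify keeps class proportions equal
# in the training and test sets.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42, stratify=y
)

print(len(x_train), len(x_test))
print(y_test.value_counts().sort_index().tolist())
```

Fixing the seed makes the error curves comparable between runs, which matters when small differences in test error are being interpreted.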

# Training and Testing the Models

In [38]:
rows = []

# Sweep max_depth: shallow trees are strongly regularized, deep trees barely at all.
for max_depth in range(1, 30):
    model = sklearn.tree.DecisionTreeClassifier(max_depth=max_depth)
    model.fit(x_train, y_train)

    # Training error: fraction of training samples the tree misclassifies.
    y_predicted = model.predict(x_train)
    train_error = 1 - sklearn.metrics.accuracy_score(y_train, y_predicted)

    # Test error: the same measure on the held-out test set.
    y_predicted = model.predict(x_test)
    test_error = 1 - sklearn.metrics.accuracy_score(y_test, y_predicted)

    rows.append({"Max Depth": max_depth, "Train Error": train_error, "Test Error": test_error})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows.
result_df = pd.DataFrame(rows, columns=["Max Depth", "Train Error", "Test Error"])

pd.options.plotting.backend = "plotly"
# With several y columns, Plotly labels the y-axis "value" and the legend "variable".
result_df.plot(x="Max Depth", y=["Train Error", "Test Error"], labels={"value": "Error", "variable": "Split"})
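
To support the analysis step, the chart can be complemented with two numbers per depth: the test error and the gap between test and train error. In general, very shallow trees underfit (high bias, both errors high), while very deep trees drive the training error toward zero and widen the gap (high variance). The sketch below is one possible way to compute this. `summarize` is a hypothetical helper, and it runs on scikit-learn's bundled 8x8 digits as a stand-in for `mnist.csv`, so its numbers will not match this notebook's.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def summarize(result_df):
    """Return (depth with the lowest test error, frame with an added Gap column)."""
    summary = result_df.copy()
    # Gap = test error minus train error; a growing gap signals overfitting.
    summary["Gap"] = summary["Test Error"] - summary["Train Error"]
    best_depth = int(summary.loc[summary["Test Error"].idxmin(), "Max Depth"])
    return best_depth, summary


# Stand-in data (assumption): the small digits set shipped with scikit-learn.
x, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

rows = []
for max_depth in range(1, 16):
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    model.fit(x_train, y_train)
    rows.append({
        "Max Depth": max_depth,
        "Train Error": 1 - accuracy_score(y_train, model.predict(x_train)),
        "Test Error": 1 - accuracy_score(y_test, model.predict(x_test)),
    })

best_depth, summary = summarize(pd.DataFrame(rows))
print("Depth with lowest test error:", best_depth)
print(summary.round(3))
```

The same helper can be called on the `result_df` built in the cell above, since it only relies on the three column names used there.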

   

