In [4]:
from sklearn.neural_network import MLPClassifier 
from sklearn.datasets import make_classification 
from sklearn.model_selection import train_test_split  
import pandas as pd
 
X, y = make_classification(n_samples=100) 
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) 
 
clf = MLPClassifier(max_iter=300).fit(X_train, y_train) 
clf.predict_proba(X_test[:1]) 
 
clf.predict(X_test[:5, :]) 
 
clf.score(X_test, y_test) 



0.92

# Q1
**Repeat the example above, with max_iter = 10. Run the same script again for five times, does the score changes? Explain why**

In [12]:
for i in range(5): 
    X, y = make_classification(n_samples=100) 
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) 
 
    clf = MLPClassifier(max_iter=10).fit(X_train, y_train) 
    clf.predict_proba(X_test[:1]) 
 
    clf.predict(X_test[:5, :]) 
 
    print(clf.score(X_test, y_test))

0.52
0.72
0.52
0.64
0.56




In [8]:
for i in range(5): 
    X, y = make_classification(n_samples=100) 
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) 
 
    clf = MLPClassifier(max_iter=10).fit(X_train, y_train) 
    clf.predict_proba(X_test[:1]) 
 
    clf.predict(X_test[:5, :]) 
 
    print(clf.score(X_test, y_test))

0.64
0.68
0.72
0.48
0.8




**<font color='#4682B4'>Yeah, the score varies between runs. Because the MLPClassifier uses stochastic optimization approaches, the model's weights are randomly changed depending on a part of the training data. As a result, depending on the randomly chosen data subsets and initial weights, the optimization process may converge to different solutions. Nonetheless, if the number of iterations is big enough, the optimization process should eventually arrive at a stable solution that is less susceptible to the random initialization and data selection.</font>**

# Q2
**Set the random state for the following functions: make_classification, train_test_split, MLPClassifier. Check the documentation for each of these functions to know how to set the random state. Next, repeat the example above, with max_iter = 10. Run the same script again for five times, does the score changes? Explain why.**

In [35]:
for i in range(5): 
    X, y = make_classification(n_samples=100,random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,random_state=42)
    clf = MLPClassifier(max_iter=10,random_state=42).fit(X_train, y_train) 
    clf.predict_proba(X_test[:1]) 
    clf.predict(X_test[:5, :])
    print(clf.score(X_test, y_test))

0.56
0.56
0.56
0.56
0.56




In [40]:
for i in range(5): 
    X, y = make_classification(n_samples=100,random_state=82)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,random_state=82)
    clf = MLPClassifier(max_iter=10,random_state=82).fit(X_train, y_train) 
    clf.predict_proba(X_test[:1]) 
    clf.predict(X_test[:5, :])
    print(clf.score(X_test, y_test))

0.48
0.48
0.48
0.48
0.48




**<font color='#4682B4'>No, the score is constant, because the random_state's parameter value can be any integer, and it will ensure that the function produces the same results every time it is called with the same input and the same value of random_state.</font>**

# Q3
**Repeat the example above, with max_iter = 50, and report the score in the table below. Run the same script again for max_iter = 100, 200, and 300. does the score changes? Explain why.**

> max_iter = 50

In [18]:
X, y = make_classification(n_samples=100) 
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) 
 
clf = MLPClassifier(max_iter=50).fit(X_train, y_train) 
clf.predict_proba(X_test[:1]) 
 
clf.predict(X_test[:5, :]) 
 
clf.score(X_test, y_test) 



0.88

> max_iter = 100

In [13]:
X, y = make_classification(n_samples=100) 
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) 
 
clf = MLPClassifier(max_iter=100).fit(X_train, y_train) 
clf.predict_proba(X_test[:1]) 
 
clf.predict(X_test[:5, :]) 
 
clf.score(X_test, y_test) 



0.84

> max_iter = 200

In [14]:
X, y = make_classification(n_samples=100) 
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) 
 
clf = MLPClassifier(max_iter=200).fit(X_train, y_train) 
clf.predict_proba(X_test[:1]) 
 
clf.predict(X_test[:5, :]) 
 
clf.score(X_test, y_test) 



0.68

> max_iter = 300

In [19]:
X, y = make_classification(n_samples=100) 
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) 
 
clf = MLPClassifier(max_iter=300).fit(X_train, y_train) 
clf.predict_proba(X_test[:1]) 
 
clf.predict(X_test[:5, :]) 
 
clf.score(X_test, y_test) 

0.84

In [64]:
max_iter_list = [50,100,200,300]

for i in max_iter_list: 
    X, y = make_classification(n_samples=100) 
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y) 
 
    clf = MLPClassifier(max_iter=i).fit(X_train, y_train) 
    clf.predict_proba(X_test[:1]) 
 
    clf.predict(X_test[:5, :]) 
 
    print(clf.score(X_test, y_test))



0.88
0.76
0.88
0.84




**<font color='#4682B4'>The score changes slightly when the same script is run multiple times. because the optimization algorithm of the MLPClassifier is stochastic. However, the overall trend of the scores should remain the same, with higher values of max_iter generally leading to better performance, up to a point where the performance saturates or starts to degrade due to overfitting.</font>**

# Q4
**You are given the dataset “dataset_spine” in csv format. Read the dataset using read_csv and answer the questions below.   (link:https://www.kaggle.com/datasets/sammy123/lower-back-pain-symptoms-dataset)**

In [5]:
dataset=pd.read_csv("/kaggle/input/lower-back-pain-symptoms-dataset/Dataset_spine.csv")
dataset

Unnamed: 0,Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11,Col12,Class_att,Unnamed: 13
0,63.027817,22.552586,39.609117,40.475232,98.672917,-0.254400,0.744503,12.5661,14.5386,15.30468,-28.658501,43.5123,Abnormal,
1,39.056951,10.060991,25.015378,28.995960,114.405425,4.564259,0.415186,12.8874,17.5323,16.78486,-25.530607,16.1102,Abnormal,
2,68.832021,22.218482,50.092194,46.613539,105.985135,-3.530317,0.474889,26.8343,17.4861,16.65897,-29.031888,19.2221,Abnormal,Prediction is done by using binary classificat...
3,69.297008,24.652878,44.311238,44.644130,101.868495,11.211523,0.369345,23.5603,12.7074,11.42447,-30.470246,18.8329,Abnormal,
4,49.712859,9.652075,28.317406,40.060784,108.168725,7.918501,0.543360,35.4940,15.9546,8.87237,-16.378376,24.9171,Abnormal,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
305,47.903565,13.616688,36.000000,34.286877,117.449062,-4.245395,0.129744,7.8433,14.7484,8.51707,-15.728927,11.5472,Normal,
306,53.936748,20.721496,29.220534,33.215251,114.365845,-0.421010,0.047913,19.1986,18.1972,7.08745,6.013843,43.8693,Normal,
307,61.446597,22.694968,46.170347,38.751628,125.670725,-2.707880,0.081070,16.2059,13.5565,8.89572,3.564463,18.4151,Normal,
308,45.252792,8.693157,41.583126,36.559635,118.545842,0.214750,0.159251,14.7334,16.0928,9.75922,5.767308,33.7192,Normal,


**a. How many columns (features / attributes) does the dataset has? List the column names**

**<font color='#4682B4'>There are 14 columns; their names are as follows: Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Class_att, and unnamed. I prefer to drop the unnamed column because it is useless.</font>**


**b. How many rows (samples / records) does the dataset has?**

**<font color='#4682B4'>There are 310 rows (samples / records).</font>**


**c. What is the data type for each feature? Numerical, categorical, …**

**<font color='#4682B4'>Most of the features in the dataset are numerical, while the last column class_att is categorical.</font>**

**<font color='#4682B4'>Col1→ numerical,   Col2→ numerical,   Col3→ numerical,   Col4→ numerical,</font>**

**<font color='#4682B4'>Col5→ numerical,   Col6→ numerical,   Col7→ numerical,   Col8→ numerical,</font>**

**<font color='#4682B4'>Col9→ numerical,   Col10→ numerical,   Col11→ numerical,   Col12→ numerical,</font>**

**<font color='#4682B4'>Class_att→ categorical</font>**


**d. “class_att” is the target value. Why is the datatype of the target value?** 

**<font color='#4682B4'>Yes, it is. The datatype of class_att (the target value) is categorical.</font>**


**e. Is this a regression or a classification problem?**

**<font color='#4682B4'>It's a classification problem, specifically a binary classification problem, because the task is to predict whether a patient has a normal or abnormal spinal condition based on the given features.</font>**
        

## **Train an MLPClassifier**

In [24]:
X=dataset.iloc[:,:-2]
y=dataset['Class_att']

#1. Split the dataset into 70% training and 30% testing 
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,train_size=0.7,test_size=0.3) 

#2. Initialize an MLPClassifier with 1 hidden layer and 7 neurons
clf=MLPClassifier(hidden_layer_sizes=7)

#3. Initialize the learning rate to 0.1 and the max_iter to 1000
clf=MLPClassifier(hidden_layer_sizes=7,learning_rate_init=0.1,max_iter=1000)

#4. Fit the model given the training set
clf.fit(X_train,y_train)

#5. Predict the output of the testing set 
clf.predict(X_test)

#6. Compute and report the score 
clf.score(X_test,y_test)

0.8172043010752689

**What is the effect of the number of neurons on the result? To check its effect, repeat the steps from 2 to 6, but change the number of neurons in the hidden layer. How does this affect the score?**

In [32]:
neurons = [1, 3, 5, 7, 10, 15, 20] 
for n in neurons: 
    clf = MLPClassifier(hidden_layer_sizes=(n,), max_iter=1000, learning_rate_init=0.1) 
    clf.fit(X_train, y_train) 
    clf.predict(X_test) 
    score = clf.score(X_test ,y_test) 
    print("Neurons:", n, "Accuracy score:", score) 

Neurons: 1 Accuracy score: 0.6774193548387096
Neurons: 3 Accuracy score: 0.7956989247311828
Neurons: 5 Accuracy score: 0.6774193548387096
Neurons: 7 Accuracy score: 0.6774193548387096
Neurons: 10 Accuracy score: 0.6666666666666666
Neurons: 15 Accuracy score: 0.6666666666666666
Neurons: 20 Accuracy score: 0.8602150537634409


**<font color='#4682B4'>The output shows the effect of the number of neurons on the accuracy score. Generally, increasing the number of neurons in the hidden layer can improve the accuracy score up to a certain point. However, adding too many neurons may cause overfitting.</font>**

**What is the effect of the number of hidden layers on the result? To check its effect, repeat the steps from 2 to 6, but change the number of hidden layers. How does this affect the score?**

In [48]:
layers = [(7,), (7, 7), (7, 7, 7)] 
for l in layers: 
    clf = MLPClassifier(hidden_layer_sizes=l, max_iter=1000, learning_rate_init=0.1) 
    clf.fit(X_train, y_train) 
    clf.predict(X_test) 
    score = clf.score(X_test, y_pred)
    print("Layers:", len(l),l, ", \tAccuracy score:", score) 

Layers: 1 (7,) , 	Accuracy score: 0.6559139784946236
Layers: 2 (7, 7) , 	Accuracy score: 0.4838709677419355
Layers: 3 (7, 7, 7) , 	Accuracy score: 0.4838709677419355


**<font color='#4682B4'>The output shows the effect of the number of hidden layers on the accuracy score. Generally, increasing the number of hidden layers can improve the accuracy score up to a certain point. However, adding too many hidden layers may cause overfitting.</font>**


**Apply gridsearch (GridSearchCV) to find the best hyperparamters for this problem. Use these Hyperparameters: number of hidden layers, number of neurons in each layer, learning rate, max iterations.**

In [55]:
from sklearn.model_selection import GridSearchCV
mlp_c = MLPClassifier(random_state=1,max_iter=50)

parameter={'hidden_layer_sizes':[(10,15,30),10,(10,15,5),15,30,20],'learning_rate_init': [0.5,0.1,0.3,0.9],'max_iter':[10,50,70,100,20,55]}

clf=GridSearchCV(mlp_c,parameter,n_jobs=-1,cv=3)
answer=clf.fit(X_train,y_train)
clf.predict(X_test)

print('Best Hyperparameters : %s \t with a score of: ' % answer.best_params_, answer.best_score_)



Best Hyperparameters : {'hidden_layer_sizes': 30, 'learning_rate_init': 0.1, 'max_iter': 100} 	 with a score of:  0.8252156265854896


