In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

In [2]:
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

In [3]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

In [4]:
# Feature scaling (standardization)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [5]:
# Initialize and train the MLP classifier
# Single hidden layer with 10 neurons; max_iter=1000 caps training at 1000 epochs.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)



**hidden_layer_sizes=(10,)**: This representation uses a tuple with one element, which is 10. It indicates that there is a single hidden layer with 10 nodes (neurons).


So, (10,) means a **single hidden layer with 10 neurons** in the MLP model.


If there were more hidden layers, you would specify the number of neurons for each layer as separate elements in the tuple.
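
As a minimal sketch of the multi-layer case, the snippet below fits a network with two hidden layers and inspects the shapes of its weight matrices. The layer sizes (10, 5), the short max_iter, and the names X_demo, y_demo, and mlp_two are illustrative assumptions, not part of the notebook above; the snippet reloads Iris so that it runs on its own.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# Two hidden layers: 10 neurons, then 5 neurons (illustrative sizes).
mlp_two = MLPClassifier(hidden_layer_sizes=(10, 5), max_iter=50, random_state=42)
with warnings.catch_warnings():
    # A short run may stop before converging; only the shapes matter here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    mlp_two.fit(X_demo, y_demo)

# One weight matrix per pair of adjacent layers: 4 -> 10 -> 5 -> 3.
weight_shapes = [w.shape for w in mlp_two.coefs_]
print(weight_shapes)  # [(4, 10), (10, 5), (5, 3)]
```

Each extra element in the tuple adds one hidden layer and therefore one more weight matrix.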







In [6]:
# Make predictions on the test set
y_pred = mlp.predict(X_test)

In [7]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9333333333333333


In [8]:
# Get information about the model
print("Number of hidden layers:", len(mlp.coefs_) - 1)
print("Number of neurons in each hidden layer:", [layer.shape[1] for layer in mlp.coefs_[:-1]])
print("Number of output neurons (classes):", mlp.coefs_[-1].shape[1])
print("Training loss:", mlp.loss_)
print("Number of iterations to converge:", mlp.n_iter_)

Number of hidden layers: 1
Number of neurons in each hidden layer: [10]
Number of output neurons (classes): 3
Training loss: 0.09253452971808651
Number of iterations to converge: 1000


In the code above, we use the coefs_ attribute of the trained MLPClassifier to inspect the network's architecture. coefs_ is a list of weight matrices, one for each pair of adjacent layers; the bias vectors are stored separately in intercepts_. A network with one input layer, k hidden layers, and one output layer therefore has k + 1 weight matrices, which is why the number of hidden layers is len(mlp.coefs_) - 1. The matrix at index i has shape (units in layer i, units in layer i + 1), so the second dimension of every matrix except the last gives the size of a hidden layer, and the second dimension of the last matrix gives the number of output neurons (classes).

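The loss_ attribute holds the loss value at the end of training, and n_iter_ holds the number of iterations (epochs) the solver actually ran. Here n_iter_ equals max_iter (1000), which most likely means training stopped at the iteration limit rather than because the convergence criterion was met; in that case scikit-learn emits a ConvergenceWarning, and raising max_iter may lower the loss further.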

While this doesn't give a comprehensive summary like some deep learning frameworks, it provides some useful information about the architecture and training process of the MLP.
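
To make the layout of coefs_ and intercepts_ concrete, here is a small self-contained sketch. It refits the same single-hidden-layer configuration on unscaled Iris data with a deliberately small max_iter, then reports the shapes and whether training stopped at the iteration limit. The names X_demo, y_demo, and mlp_demo and the value max_iter=20 are assumptions chosen for illustration.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = load_iris(return_X_y=True)
mlp_demo = MLPClassifier(hidden_layer_sizes=(10,), max_iter=20, random_state=42)

# Record warnings so we can see whether a ConvergenceWarning was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mlp_demo.fit(X_demo, y_demo)

# Weights: one matrix per pair of adjacent layers (4 -> 10 -> 3).
coef_shapes = [w.shape for w in mlp_demo.coefs_]
# Biases: one vector per non-input layer.
intercept_shapes = [b.shape for b in mlp_demo.intercepts_]
print("Weight matrices:", coef_shapes)
print("Bias vectors:", intercept_shapes)

hit_limit = mlp_demo.n_iter_ >= mlp_demo.max_iter
warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("Stopped at max_iter:", hit_limit)
print("ConvergenceWarning emitted:", warned)
```

Comparing n_iter_ with max_iter in this way is a quick check on whether the reported iteration count reflects convergence or just the cap.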

In [9]:
# Count every weight entry plus every bias entry (one bias per hidden and output neuron)
total_params = sum(np.prod(layer.shape) for layer in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)
print("Total learnable parameters:", total_params)

Total learnable parameters: 83


In the code above, after initializing and fitting the MLPClassifier model with X_train and y_train, we calculate the total number of learnable parameters.

We use a generator expression to compute np.prod(layer.shape), the number of elements in each weight matrix, and sum these over all layers: 4 * 10 + 10 * 3 = 70 weights. We then add the number of bias terms, which is the total number of elements across the vectors in intercepts_: 10 + 3 = 13. Note that len(mlp.intercepts_) would only count the bias vectors (2), not the bias terms themselves. The total is 70 + 13 = 83 learnable parameters.

Keep in mind that this calculation assumes that X_train is a 2D array of shape (n_samples, n_features) and y_train is a 1D array of target labels. The number of rows (samples) in X_train must match the number of elements in y_train. Also, the hidden_layer_sizes argument in the MLPClassifier specifies the number of neurons in each hidden layer; in this example it is set to (10,), a single hidden layer of 10 neurons. You can adjust this parameter based on your specific problem and dataset.
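
The sketch below checks the shape assumptions and prints a breakdown of the parameter count that keeps weights and bias terms separate. It is self-contained and refits the same architecture on unscaled Iris data; the names X_demo, y_demo, and clf and the short max_iter are assumptions made for illustration (the parameter count depends only on the architecture, not on how long training runs).

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# X must be 2D (samples x features); y must have one label per sample.
assert X_demo.ndim == 2 and y_demo.ndim == 1
assert X_demo.shape[0] == y_demo.shape[0]

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=20, random_state=42)
with warnings.catch_warnings():
    # Ignore the convergence warning from the deliberately short run.
    warnings.simplefilter("ignore")
    clf.fit(X_demo, y_demo)

# .size is the number of elements in an array, for matrices and vectors alike.
n_weights = sum(w.size for w in clf.coefs_)       # 4*10 + 10*3
n_biases = sum(b.size for b in clf.intercepts_)   # 10 + 3
print("Weights:", n_weights)             # 70
print("Biases:", n_biases)               # 13
print("Total:", n_weights + n_biases)    # 83
```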


# The learnable parameters in a neural network

For a multi-layer perceptron:

The multi-layer perceptron can have one or more hidden layers, each with its own set of weights and biases. Suppose the input has n features, there are m hidden neurons in each hidden layer, and the output layer has p neurons. Then, the total number of learnable parameters in a multi-layer perceptron is given by:



**Total parameters = Weights into the first hidden layer + Weights between hidden layers + Weights into the output layer + Biases**

**Weights into the first hidden layer = Number of input features (n) * Number of hidden neurons in the first hidden layer (m)**

**Weights between hidden layers = (Number of hidden neurons (m) * Number of hidden neurons (m)) * (Number of hidden layers - 1)**

**Weights into the output layer = Number of hidden neurons in the last hidden layer (m) * Number of output neurons (p)**

**Biases = (Number of hidden neurons (m) * Number of hidden layers) + Number of output neurons (p)**

The input layer itself has no parameters; every hidden and output neuron has one bias.


For example, let's consider a multi-layer perceptron with 3 input features, 2 hidden layers with 4 neurons each, and 1 output neuron:

Weights into the first hidden layer = 3 (input features) * 4 (hidden neurons in the first hidden layer) = 12

Weights between hidden layers = (4 (hidden neurons) * 4 (hidden neurons)) * (2 - 1) = 16

Weights into the output layer = 4 (hidden neurons in the last hidden layer) * 1 (output neuron) = 4

Biases = (4 (hidden neurons) * 2 (hidden layers)) + 1 (output neuron) = 9

Total parameters = 12 + 16 + 4 (weights) + 9 (biases) = 41

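It's important to note that the above equations assume fully connected layers with the same number of neurons (m) in every hidden layer. Standard activation functions such as ReLU, tanh, or the logistic function add no learnable parameters; parametric activation functions or special architectures may introduce additional ones.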
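
The counting rules can also be written as a short function. This is a sketch for fully connected networks only; count_mlp_parameters is a hypothetical helper introduced here, not a scikit-learn function. It takes the number of units in every layer, so hidden layers do not need to share the same size, and it returns weights and biases separately.

```python
def count_mlp_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    layer_sizes lists the units per layer, input first and output last,
    for example [3, 4, 4, 1].
    """
    # One weight per connection between each pair of adjacent layers.
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases


# The worked example: 3 inputs, two hidden layers of 4 neurons, 1 output.
weights, biases = count_mlp_parameters([3, 4, 4, 1])
print("Weights:", weights)          # 32
print("Biases:", biases)            # 9
print("Total:", weights + biases)   # 41

# The Iris network from earlier: 4 inputs, 10 hidden neurons, 3 outputs.
iris_weights, iris_biases = count_mlp_parameters([4, 10, 3])
print("Iris total:", iris_weights + iris_biases)  # 83
```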