<a href="https://colab.research.google.com/github/SKumarAshutosh/Deep_learning/blob/main/FeedForwardNetwork.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A simple analogy to explain the process of training a neural network and making predictions:

**Imagine a Teacher and a Student:**

1. **Training Data**: The teacher has a set of questions with their respective answers. The student tries to learn the pattern or logic behind those questions and answers.

2. **Training the Model (Student Learning)**: The teacher gives the student many of these questions and checks the answers. If the student is wrong, the teacher corrects them. Over time, the student gets better and starts answering most questions correctly.

3. **Testing the Model (Exam Time)**: Once the learning is done, the teacher gives the student new questions (ones they haven't seen before) to see if they truly understand. If the student performs well, it means they have not just memorized answers but have learned the underlying pattern.

4. **Making Predictions (Real-world Questions)**: Now, imagine someone else comes with a new question for the student. Based on what the student learned, they will try to answer this new question.

**Applying this to the Iris Dataset and FNN**:

1. **Training Data**: We have measurements from many flowers and we know their species. This is like our set of questions with their answers.

2. **Training the Model**: Our computer (like the student) tries to learn the relationship between the measurements and the species. We keep adjusting it until it gets most of them right.

3. **Testing the Model**: We give the computer new flower measurements (ones it hasn't seen while learning) to see if it correctly identifies the species. This is like the exam to check if our computer really understands.

4. **Making Predictions**: Now, if we find a new flower and measure it, we can ask the computer, "Based on what you've learned, what species do you think this flower is?" The computer will then give its best guess.

In essence, we're teaching the computer to recognize patterns in data, much like how a student learns patterns in questions and answers. After "learning," the computer can make educated guesses about new data it hasn't seen before.


## **1. Load the dataset**

**Variables Introduced:**


* iris: The Iris dataset object (a scikit-learn Bunch) holding the flower measurements (features) and species (labels).
* X: The feature matrix containing measurements of iris flowers.
* y: The target labels indicating the species of the iris flowers.

In [4]:
#Imports required modules.
from sklearn import datasets
from sklearn.preprocessing import OneHotEncoder

In [22]:
#Loads the iris dataset
#iris: Contains the dataset with features and labels of iris flowers.
iris = datasets.load_iris()

In [23]:
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
  

In [6]:
#Extracts the feature data from the iris dataset.
#X: The feature matrix containing measurements of iris flowers.
X = iris.data


In [7]:
#Extracts target labels and reshapes them into a column vector.
#y: The target labels indicating the iris flower species.
y = iris.target.reshape(-1, 1)


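**Q. Why reshape with (-1, 1)?**

`reshape(-1, 1)` reshapes the array so that it has exactly one column and as many rows as needed to hold all the elements (the `-1` tells NumPy to infer that dimension). This turns the flat array of 150 target labels, shape `(150,)`, into a column vector of shape `(150, 1)`, which is the 2D `(n_samples, n_features)` layout that `OneHotEncoder` expects.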
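A quick NumPy check of what `reshape(-1, 1)` does, using six made-up labels in place of the 150 Iris labels:

```python
import numpy as np

# A flat label array like iris.target (six labels here, for illustration)
labels = np.array([0, 0, 1, 1, 2, 2])
print(labels.shape)   # (6,)

# -1 asks NumPy to infer the number of rows; 1 fixes a single column
column = labels.reshape(-1, 1)
print(column.shape)   # (6, 1)
```

The data itself is unchanged; only the shape goes from one dimension to two.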

In [8]:
#One-hot encodes the target labels.
# encoder: Instance of OneHotEncoder.
# y_onehot: One-hot encoded target labels.
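# sparse_output=False returns a dense NumPy array (this argument was called `sparse` before scikit-learn 1.2)
encoder = OneHotEncoder(sparse_output=False)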
y_onehot = encoder.fit_transform(y)




**Q. Why the need for one-hot encoding?**

The iris dataset contains three classes: Setosa, Versicolour, and Virginica, stored as 0, 1, and 2 in the target array. For multi-class classification with a neural network, class labels are commonly one-hot encoded: each class becomes a binary vector with a single 1, so class 2 becomes [0, 0, 1]. This format lines up with the network's three-unit softmax output (one probability per class) and is what the categorical_crossentropy loss used later expects. (Keras also offers sparse_categorical_crossentropy, which accepts the integer labels directly.)
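A minimal NumPy sketch of one-hot encoding, using identity-matrix indexing rather than `OneHotEncoder` itself. For integer labels 0, 1, 2 the columns come out in the same order as `OneHotEncoder`'s, since it sorts the categories it finds:

```python
import numpy as np

# Integer class labels as in iris.target: 0 = setosa, 1 = versicolor, 2 = virginica
labels = np.array([0, 2, 1, 2])

# Row i of the 3x3 identity matrix is the one-hot vector for class i
one_hot = np.eye(3)[labels]
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# argmax reverses the encoding and recovers the integer labels
print(one_hot.argmax(axis=1))  # [0 2 1 2]
```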

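**Q. What is sparse_output and why is it False?**

In `OneHotEncoder`, the `sparse_output` argument (called `sparse` in scikit-learn versions before 1.2, and removed under that name in 1.4) determines whether the result is a SciPy sparse matrix or a dense NumPy array. `sparse_output=False` gives a dense array, which is easier to inspect and to pass to Keras. With only three classes, the memory a sparse matrix would save is negligible.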

## **2. Split the dataset into training and testing sets**





In [9]:
#Imports the module to split datasets.
from sklearn.model_selection import train_test_split


In [10]:
#Splits the dataset into training and testing subsets.
# X_train, X_test: Training and testing feature matrices.
# y_train, y_test: Training and testing target matrices
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.2, random_state=42)


**Q. Why random_state=42?**

`random_state` seeds the random number generator that shuffles the data before splitting. Fixing it to any constant (42 is a popular but arbitrary choice) makes the train/test split identical on every run, so results are reproducible. With `test_size=0.2`, the 150 Iris samples are split into 120 training and 30 test samples.
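A small check of that reproducibility, using the same `train_test_split` function as above on a toy array of 10 samples (the toy data is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features each
y_demo = np.arange(10)

# Two splits with the same seed produce exactly the same four arrays
a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
same = all(np.array_equal(p, q) for p, q in zip(a, b))
print(same)  # True

# test_size=0.2 puts 20% of the 10 samples in the test set
X_tr, X_te, y_tr, y_te = a
print(len(X_tr), len(X_te))  # 8 2
```

Changing or omitting `random_state` would generally give a different shuffle on each run.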

## **3. Build the feed-forward neural network:**









In [12]:
#Imports required TensorFlow and Keras modules.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense


In [13]:
#Initializes a linear stack of layers for the neural network.
#model: The neural network model.
model = Sequential()


**Sequential():** Initializes a linear stack of network layers, so you can build the model one layer at a time.

**Dense:** This is the layer type. In a dense (fully connected) layer, every neuron in the previous layer is connected to every neuron in the current layer; each neuron computes a weighted sum of its inputs plus a bias and then applies the activation function.

The first argument, 10, is the number of neurons in the layer. For the first layer you also specify the input_dim parameter, the number of input features (4 for the Iris dataset).

**activation:** The activation function applied to each neuron's weighted sum. Here we use relu (Rectified Linear Unit) in the hidden layer and softmax in the output layer, because this is a multi-class classification problem.
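To make "weighted sum plus bias" concrete, here is a NumPy sketch of what the two Dense layers compute for one flower. It assumes randomly initialized weights; `W1`, `b1`, `W2`, `b2` are illustrative names, not Keras attributes. The parameter count of 83 should match the total that `model.summary()` reports for this architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights stand in for trained ones; the values are illustrative only
W1, b1 = rng.normal(size=(4, 10)), np.zeros(10)  # like Dense(10, input_dim=4)
W2, b2 = rng.normal(size=(10, 3)), np.zeros(3)   # like Dense(3)

x = np.array([[5.1, 3.5, 1.4, 0.2]])  # one flower, 4 features

# Each Dense layer: weighted sum of all inputs plus a bias, then the activation
hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU, shape (1, 10)
logits = hidden @ W2 + b2              # shape (1, 3); softmax would be applied next

# Every input connects to every neuron: inputs * units weights + units biases per layer
n_params = (4 * 10 + 10) + (10 * 3 + 3)
print(hidden.shape, logits.shape, n_params)  # (1, 10) (1, 3) 83
```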

In [14]:
#Adds an input and hidden layer with 10 nodes and a ReLU activation function.
model.add(Dense(10, input_dim=4, activation='relu'))


**Q. Why is the input dimension 4?**

The iris dataset has 4 features (sepal length, sepal width, petal length, petal width) for each data point. Hence, the input dimension is set to 4.

In [15]:
#Adds an output layer with 3 nodes (one for each iris species) and a softmax activation function.
model.add(Dense(3, activation='softmax'))


**Q. Why use ReLU for the first activation and softmax for the second?**

ReLU (Rectified Linear Unit), defined as ReLU(x) = max(0, x), is a common choice for hidden layers: it introduces non-linearity while being cheap to compute.

Softmax: It's used in the output layer of multi-class classification tasks. It converts the network's raw output values (logits) into a probability distribution over the classes: every output lies between 0 and 1, and the outputs sum to 1.
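Both functions are short enough to write out in NumPy. This is a sketch of the math, not Keras' internal code; the logits are hypothetical values:

```python
import numpy as np

def relu(z):
    # Negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0.0, z)

def softmax(z):
    # Subtracting the max avoids overflow in exp() and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs for the 3 species
probs = softmax(logits)
print(probs.round(3))  # [0.659 0.242 0.099]
print(probs.sum())     # approximately 1.0
```

The largest logit gets the largest probability, so the predicted class (the argmax) is the same before and after softmax.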

## **4. Compile the model:**

In [16]:
#Configures the model for training by setting the optimizer, loss function, and evaluation metric.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


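**Q. Why choose the Adam optimizer and categorical_crossentropy loss?**

**Adam optimizer:** Combines ideas from the AdaGrad and RMSprop algorithms: it keeps running averages of the gradients (momentum) and of the squared gradients, and uses them to adapt the step size of each parameter. This lets it handle sparse gradients and noisy problems; it is computationally efficient, needs little memory, and its default settings typically require little tuning.

**categorical_crossentropy loss:** The standard loss for multi-class classification with one-hot targets. It measures the difference between the true labels and the predicted probabilities: for each sample, it is the negative log of the probability the model assigns to the correct class.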
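A NumPy sketch of both ideas, under stated assumptions: the loss follows the usual definition with clipping to avoid log(0), and `adam_step` is a hypothetical helper implementing the textbook update from the Adam paper with Keras' default hyperparameters (lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-7); Keras' own code arranges the epsilon term slightly differently.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); average -sum(y * log(p)) over the samples
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 0, 1]])        # one-hot: the flower is class 2
y_pred = np.array([[0.1, 0.2, 0.7]])  # model gives class 2 probability 0.7
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    # Running averages of the gradient (momentum) and of the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: both averages start at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: dividing by sqrt(v_hat) normalizes the gradient's scale
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
grad = np.array([100.0, -0.01])  # gradients on very different scales
theta_new, m, v = adam_step(theta, grad, np.zeros(2), np.zeros(2), t=1)
step = theta - theta_new
print(step)  # roughly [0.001, -0.001]: both parameters move by about lr
```

The second example shows the "adaptive" part: despite a 10,000x difference in gradient size, both parameters take a first step of about the learning rate.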

## **5. Train the model:**

In [17]:
#Trains the model for 50 epochs using the training data and validates using the testing data.
#history: Contains the training history, like loss and accuracy values at each epoch.
history = model.fit(X_train, y_train, epochs=50, batch_size=10, validation_data=(X_test, y_test))


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


**Q. Why is epochs 50 and batch_size 10?**

These are hyperparameters. epochs is the number of complete passes over the training data; in each pass, every training example goes forward through the network and contributes to the backward (gradient) computation.

batch_size is the number of training examples used for each gradient update. The chosen values are just starting points and might not be optimal; in practice, they should be tuned, for example by watching the validation loss.
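The two settings together determine how many weight updates training performs. A quick calculation for this notebook, assuming the 120-sample training set produced by the 80/20 split:

```python
import math

n_train = 120      # 80% of the 150 Iris samples (test_size=0.2)
batch_size = 10
epochs = 50

# Keras makes one gradient update per batch; a final partial batch still counts as a step
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 12 600
```

By the same arithmetic, the salary model later in this notebook (24 training rows, batch_size=1, 20 epochs) performs 480 updates.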

## **6. Evaluate the model:**

In [18]:
#Evaluates the model's performance on the test data.
# loss: The loss value of the model on the test data.
# accuracy: The accuracy of the model on the test data.
loss, accuracy = model.evaluate(X_test, y_test)




In [19]:
#Prints the loss and accuracy of the model on the test dataset.
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")


Test Loss: 0.47258517146110535
Test Accuracy: 0.8333333134651184


**Q. Why consider only accuracy in metrics?**

Accuracy is a straightforward metric for classification problems. It gives a general idea of how well the model is performing. Depending on the problem, other metrics (like precision, recall, F1-score) might also be relevant. In this simple example, and because the three Iris classes are balanced (50 samples each), accuracy suffices to demonstrate model performance.

**Q. How are loss and accuracy calculated?**

Loss: Measures how far the model's predictions are from the true values. Here it is the categorical_crossentropy loss: for each test sample, the negative log of the probability the model assigns to the true class, averaged over all test samples.

Accuracy: The proportion of samples that are classified correctly. A prediction counts as correct if the index of the largest value in the predicted probability vector (its argmax) matches the index of the 1 in the one-hot true vector.
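A NumPy sketch of that accuracy computation on three hypothetical predictions. With `categorical_crossentropy` and one-hot targets, Keras should resolve `metrics=['accuracy']` to this argmax comparison (categorical accuracy). The reported test accuracy of 0.8333 corresponds to 25 of the 30 test flowers.

```python
import numpy as np

# One-hot true labels for 3 hypothetical test flowers
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
# Hypothetical predicted probabilities (each row sums to 1)
y_pred = np.array([[0.8, 0.1, 0.1],   # argmax 0: correct
                   [0.3, 0.3, 0.4],   # argmax 2: wrong, true class is 1
                   [0.1, 0.2, 0.7]])  # argmax 2: correct

correct = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
accuracy = correct.mean()
print(correct)            # [ True False  True]
print(f"{accuracy:.3f}")  # 0.667
```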

The ultimate goal of training a neural network, or any machine learning model for that matter, is to make predictions on new, unseen data. Let's break it down:

**Purpose of Training a Neural Network:**

1. **Modeling Relationships**: Neural networks, including FNNs, model the relationship between input features and target outputs. This modeling is learned from the provided training data.

2. **Generalization**: While the immediate goal is to perform well on the training data, the real value of a trained model lies in its ability to generalize to new, unseen data. By splitting the dataset into training and testing sets, we can evaluate how well our model is likely to perform on data it hasn't seen before.

3. **Decision Making**: Once trained, neural networks can help in decision-making processes by predicting outcomes based on new input data.

Now, after training an FNN on the Iris dataset, you would typically want to use it to make predictions:

**Making Predictions:**

1. **Prepare New Data**: Let's assume you have measurements from a new flower and you want to predict its species using your trained model:

```python
import numpy as np

# Example: new flower measurements (sepal length, sepal width, petal length, petal width)
new_data = np.array([[5.1, 3.5, 1.4, 0.2]])
```

2. **Predict the Species**:

```python
predictions = model.predict(new_data)             # shape (1, 3): one probability per species
predicted_class = np.argmax(predictions, axis=1)  # index of the most likely species
```

`model.predict()` will provide you with the probabilities for each class (species, in this case). Using `np.argmax()`, you can find out which class has the highest probability.

3. **Interpret the Prediction**:

```python
iris_species = ['setosa', 'versicolor', 'virginica']
print(f"The predicted species for the new flower is: {iris_species[predicted_class[0]]}")
```

To summarize, the purpose of training the FNN on the Iris dataset is to create a model that can predict the species of iris flowers based on their measurements. After training, you can use this model to predict the species of new flowers, assisting in classification tasks. This is a simplified example, but the principles apply to more complex datasets and problems: you train on known data to make predictions on unknown data.



---



Feed-Forward Neural Networks (FNN) or Multi-layer Perceptrons (MLP) can be applied to a wide range of datasets, but there are some general characteristics and considerations:

1. **Numerical Data**: FNNs work with numerical data. If your dataset has categorical data, you'll need to encode it into a numerical form, e.g., using one-hot encoding, ordinal encoding, etc.

2. **Size of the Dataset**: FNNs have many trainable parameters, and to train them effectively you typically need a relatively large dataset. With a small dataset the model is more likely to overfit: it performs well on the training data but poorly on unseen data.

3. **Features and Complexity**:
    - FNNs can handle datasets with many features or dimensions. However, the more features you have, the more complex (i.e., having more neurons or layers) your network might need to be.
    - It's essential to scale or normalize features, especially if they have different units or scales. Common methods include Min-Max scaling and Standard (Z-score) normalization.

4. **Task Type**:
    - **Classification**: If you're categorizing data into classes, you'll need labeled data where each input has a corresponding class label. The output layer would typically use a softmax activation function for multi-class classification, and a sigmoid for binary classification.
    - **Regression**: If you're predicting a continuous value, your dataset should have corresponding numerical targets. The output layer typically uses a linear activation, which in Keras is the same as specifying no activation.

5. **Consistency and Quality**:
    - The dataset should be consistent. Anomalies, outliers, or noise can affect the performance of the FNN.
    - Missing values need to be addressed before training, either by imputation or by removing data points with missing values.

6. **Balanced Classes (for Classification)**: If one class has significantly more examples than another, the model might become biased towards the majority class. Techniques like oversampling, undersampling, or using synthetic data can help balance the classes.

7. **Time-Series or Sequential Data Limitation**: While FNNs can handle time-series data, they don't inherently understand the sequence or time aspect of the data. Recurrent Neural Networks (RNNs) or Long Short-Term Memory networks (LSTMs) are more suited for such data.
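The two scaling methods named under Features and Complexity can be written out in a few lines of NumPy. This is a minimal sketch on three (YearsExperience, Salary) rows taken from the salary dataset used later; scikit-learn's MinMaxScaler and StandardScaler compute the same transformations (StandardScaler uses the population standard deviation, like `np.std`'s default).

```python
import numpy as np

# Three (YearsExperience, Salary) rows: the columns differ in scale by several orders of magnitude
X = np.array([[1.2, 39344.0],
              [2.1, 43526.0],
              [3.8, 57190.0]])

# Min-Max scaling: (x - min) / (max - min) maps each column onto [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Z-score standardization: (x - mean) / std gives each column mean 0 and std 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0. 0.] [1. 1.]
print(X_std.round(2))
```

In practice, compute the min/max or mean/std on the training set only and reuse those statistics for the test set and for new inputs.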

In summary, while FNNs are versatile and can handle a wide variety of datasets, the data needs to be preprocessed and structured correctly. The nature and quality of your dataset, along with the problem at hand (classification vs. regression), will dictate the architecture and complexity of your FNN.

# Test your knowledge

---



In [24]:
!pip install kaggle




In [26]:
from google.colab import files
files.upload()


Saving kaggle.json to kaggle.json



In [27]:
!pip install -q kaggle


In [28]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json


In [29]:
!kaggle datasets download -d abhishek14398/salary-dataset-simple-linear-regression

Downloading salary-dataset-simple-linear-regression.zip to /content
  0% 0.00/457 [00:00<?, ?B/s]
100% 457/457 [00:00<00:00, 1.21MB/s]


In [32]:
!unzip salary-dataset-simple-linear-regression.zip


Archive:  salary-dataset-simple-linear-regression.zip
  inflating: Salary_dataset.csv      


In [34]:
# Importing libraries
import pandas as pd
import numpy as np



In [35]:
salary_dataset = pd.read_csv("./Salary_dataset.csv")

In [37]:
salary_dataset.head()

| | Unnamed: 0 | YearsExperience | Salary |
|---|---|---|---|
| 0 | 0 | 1.2 | 39344.0 |
| 1 | 1 | 1.4 | 46206.0 |
| 2 | 2 | 1.6 | 37732.0 |
| 3 | 3 | 2.1 | 43526.0 |
| 4 | 4 | 2.3 | 39892.0 |


In [39]:
salary_dataset.columns

Index(['Unnamed: 0', 'YearsExperience', 'Salary'], dtype='object')

In [40]:
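# drop() returns a new DataFrame, so assign it back to actually remove the leftover index column
salary_dataset = salary_dataset.drop(columns=['Unnamed: 0'])
salary_dataset

| | YearsExperience | Salary |
|---|---|---|
| 0 | 1.2 | 39344.0 |
| 1 | 1.4 | 46206.0 |
| 2 | 1.6 | 37732.0 |
| 3 | 2.1 | 43526.0 |
| 4 | 2.3 | 39892.0 |
| 5 | 3.0 | 56643.0 |
| 6 | 3.1 | 60151.0 |
| 7 | 3.3 | 54446.0 |
| 8 | 3.3 | 64446.0 |
| 9 | 3.8 | 57190.0 |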


In [49]:
len(salary_dataset)

30

In [43]:
X = salary_dataset.YearsExperience

In [46]:
y = salary_dataset.Salary

In [47]:
#Imports the module to split datasets.
from sklearn.model_selection import train_test_split

In [70]:
X = salary_dataset[['YearsExperience']].values  # Double brackets select a DataFrame, so .values is a 2D array of shape (30, 1)
y = salary_dataset['Salary'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [69]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)


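(24,)
(24,)
(6,)
(6,)

Note: this output comes from an earlier run (cell In [69] was executed before In [70]), when X was still the one-dimensional YearsExperience Series. With the two-dimensional X defined above, X_train and X_test have shapes (24, 1) and (6, 1), while y_train and y_test remain (24,) and (6,).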


In [51]:
#Building a model
#Imports required TensorFlow and Keras modules.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [78]:
#model: The neural network model.
model1 = Sequential()

In [93]:
#Adds a hidden layer with 3 nodes and a ReLU activation function; input_dim=1 because there is a single feature.
model1.add(Dense(3, input_dim=1, activation='relu'))  # only 1 input: YearsExperience


In [94]:
#Adds an output layer with a single node for the predicted salary (linear output, no softmax).
model1.add(Dense(1))  # No activation for regression


In [95]:
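#Configures the model for training by setting the optimizer, loss function, and evaluation metric.
#Note: 'accuracy' is a classification metric and is not meaningful for regression (it reports 0.0 below);
#a regression metric such as 'mae' (mean absolute error) would be more informative.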
model1.compile(optimizer='adam', loss='mean_squared_error' , metrics=['accuracy'])


In [97]:
#Trains the model for 20 epochs (batch_size=1) on the unscaled training data and validates using the testing data.
#history: Contains the training history, like loss and accuracy values at each epoch.
history = model1.fit(X_train, y_train, epochs=20, batch_size=1, validation_data=(X_test, y_test))


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [92]:
loss, accuracy = model1.evaluate(X_test, y_test)




In [85]:
#Prints the loss and accuracy of the model on the test dataset.
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")


Test Loss: 7430209024.0
Test Accuracy: 0.0


In [86]:
new_age = [['31']]

In [90]:
new_data = [[5]]  # e.g., 5 years of experience
predicted_salary = model1.predict(new_data)
print(f"Predicted salary for 5 years of experience: {predicted_salary[0][0]}")


Predicted salary for 5 years of experience: 2.0694949626922607


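The model above does not perform well: its test MSE is about 7.4 billion (an RMSE of roughly 86,000 in salary units), and it predicts a salary of about 2 for 5 years of experience. The test accuracy of 0.0 is expected, because accuracy is a classification metric and is not meaningful for regression. A likely cause is that the targets are unscaled: salaries are in the tens of thousands, while a small, freshly initialized network produces outputs near zero and cannot move that far in 20 epochs. The performance of a neural network on regression tasks can be influenced by various factors, including the architecture, data preprocessing, and training configuration. Here's a step-by-step guide to refining the approach: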

### 1. Feature Scaling:
Neural networks often perform better when input features are scaled. A common practice is to use the `StandardScaler` to scale features:

```python
from sklearn.preprocessing import StandardScaler

# Scaling features
scaler_X = StandardScaler()
X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)

# Optionally scale the target variable if the values are large. If you do this, remember to inverse-transform predictions.
scaler_y = StandardScaler()
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1)).flatten()
```

### 2. Modify Model Architecture:

Sometimes, deeper networks or different architectures can capture the relationship better. Consider adding more neurons or layers.

```python
model = Sequential()
model.add(Dense(50, input_dim=1, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')
```

### 3. Train with Early Stopping:

To prevent overfitting and to stop training once the validation loss stops improving, you can use Early Stopping:

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10)  # stop if validation loss doesn't improve for 10 epochs

history = model.fit(X_train_scaled, y_train_scaled, epochs=200, batch_size=2, validation_data=(X_test_scaled, y_test_scaled), callbacks=[early_stop])
```
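To see what `patience` does, here is a plain-Python sketch of the stopping rule as described in the Keras `EarlyStopping` documentation (with `min_delta` and `mode='min'`); `simulate_early_stopping` is a hypothetical helper, not part of Keras. Under this rule, `model2`'s run below, which ended at epoch 19 with `patience=10`, suggests the lowest `val_loss` was reached around epoch 9. Note also that `restore_best_weights` defaults to `False`, so the model keeps the weights from the final epoch unless you pass `restore_best_weights=True`.

```python
def simulate_early_stopping(val_losses, patience, min_delta=0.0):
    """Return (stopped_epoch, best_epoch), both 1-based, for a list of per-epoch val losses."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # training stops after this epoch
    return len(val_losses), best_epoch  # rule never triggered: all epochs ran

# Hypothetical validation losses: best at epoch 3, then 3 epochs without improvement
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.5]
print(simulate_early_stopping(losses, patience=3))  # (6, 3)
```

The 0.5 at epoch 7 is never seen, which is the trade-off of a small `patience`: it saves time but can stop before a late improvement.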

### 4. Evaluate Model:

After training, evaluate the model's performance:

```python
loss = model.evaluate(X_test_scaled, y_test_scaled)
print(f"Mean Squared Error on Test Data: {loss}")
```

### 5. Make Predictions:

When making predictions, remember to inverse-transform the predicted values if you've scaled the target variable.

```python
years_experience = [[5]]
scaled_input = scaler_X.transform(years_experience)
predicted_salary_scaled = model.predict(scaled_input)
predicted_salary = scaler_y.inverse_transform(predicted_salary_scaled)

print(f"Predicted salary for 5 years of experience: {predicted_salary[0][0]}")
```

### Notes:

- **Hyperparameter Tuning**: You might also consider trying out different learning rates, optimizers, or even regularizers if overfitting is a concern.
  
- **More Data**: If your dataset is small, gathering more data (if possible) can often help improve model performance.
  
- **Alternative Models**: Depending on the dataset, sometimes simpler regression models (like linear regression) or other machine learning models might perform better. Consider trying other regression models as a benchmark.
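As an illustration of the benchmark idea, here is an ordinary least-squares line fitted with NumPy's `polyfit` to the 10 rows of Salary_dataset.csv displayed earlier. This is a self-contained sketch only: in practice you would fit on `X_train`/`y_train` (for example with scikit-learn's `LinearRegression`) and compare its test MSE with the network's, and a fit on all 30 rows will give different numbers.

```python
import numpy as np

# The first 10 rows of Salary_dataset.csv shown earlier (YearsExperience, Salary)
years = np.array([1.2, 1.4, 1.6, 2.1, 2.3, 3.0, 3.1, 3.3, 3.3, 3.8])
salary = np.array([39344.0, 46206.0, 37732.0, 43526.0, 39892.0,
                   56643.0, 60151.0, 54446.0, 64446.0, 57190.0])

# Least-squares fit of salary = slope * years + intercept
slope, intercept = np.polyfit(years, salary, deg=1)
print(f"salary ~ {slope:.0f} * years + {intercept:.0f}")

# Prediction for 5 years of experience (an extrapolation: these rows stop at 3.8 years)
predicted = slope * 5 + intercept
print(f"Predicted salary for 5 years: {predicted:.0f}")
```

Two parameters and no training loop make this a cheap baseline; if the neural network cannot beat it, the extra complexity is not paying off.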

Remember, iterating on the architecture, preprocessing, and hyperparameters is a key aspect of deep learning. Each dataset has its unique characteristics, and there's often no one-size-fits-all solution.

In [98]:
from sklearn.preprocessing import StandardScaler

# Scaling features
scaler_X = StandardScaler()
X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)

# Optionally scale the target variable if the values are large. If you do this, remember to inverse-transform predictions.
scaler_y = StandardScaler()
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1)).flatten()

In [110]:
model2 = Sequential()
model2.add(Dense(50, input_dim=1, activation='relu'))
model2.add(Dense(50, activation='relu'))
model2.add(Dense(1, activation='linear'))

model2.compile(optimizer='adam', loss='mean_squared_error' )

In [111]:
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10)  # stop if validation loss doesn't improve for 10 epochs

history = model2.fit(X_train_scaled, y_train_scaled, epochs=500, batch_size=2, validation_data=(X_test_scaled, y_test_scaled), callbacks=[early_stop])

Epoch 1/500
Epoch 2/500
Epoch 3/500
Epoch 4/500
Epoch 5/500
Epoch 6/500
Epoch 7/500
Epoch 8/500
Epoch 9/500
Epoch 10/500
Epoch 11/500
Epoch 12/500
Epoch 13/500
Epoch 14/500
Epoch 15/500
Epoch 16/500
Epoch 17/500
Epoch 18/500
Epoch 19/500


In [112]:
loss = model2.evaluate(X_test_scaled, y_test_scaled)
print(f"Mean Squared Error on Test Data: {loss}")

Mean Squared Error on Test Data: 0.06313470751047134
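This 0.063 is in standardized units, because `model2` was trained and evaluated on targets scaled by `scaler_y`. Standardizing divides every error by the training standard deviation of salary, so the MSE in salary units is the scaled MSE multiplied by `scaler_y.scale_[0] ** 2` (StandardScaler stores the standard deviation in `scale_`); equivalently, the RMSE is about sqrt(0.063) ≈ 0.25 standard deviations. The sketch below checks that identity on made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical true and predicted salaries (illustrative only, not the notebook's data)
y_true = rng.uniform(40_000, 120_000, size=6)
y_pred = y_true + rng.normal(0, 5_000, size=6)

# Mean and standard deviation a StandardScaler would learn (from training targets in practice)
mu, sigma = y_true.mean(), y_true.std()

def standardize(y):
    return (y - mu) / sigma

mse_scaled = np.mean((standardize(y_true) - standardize(y_pred)) ** 2)
mse_original = np.mean((y_true - y_pred) ** 2)

# The mean cancels out of each error; dividing by sigma shrinks the MSE by sigma**2
print(np.isclose(mse_original, sigma ** 2 * mse_scaled))  # True
print(f"RMSE: {np.sqrt(mse_scaled):.3f} std units = {np.sqrt(mse_original):.0f} salary units")
```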


In [113]:
years_experience = [[5]]
scaled_input = scaler_X.transform(years_experience)
predicted_salary_scaled = model2.predict(scaled_input)
predicted_salary = scaler_y.inverse_transform(predicted_salary_scaled)

print(f"Predicted salary for 5 years of experience: {predicted_salary[0][0]}")

Predicted salary for 5 years of experience: 66038.9375
