
**Feedforward Neural Network (FNN)** implemented with TensorFlow/Keras.

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# 1. Load the Breast Cancer Wisconsin Dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

# Convert to a DataFrame
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)  # Target: 0 = Malignant, 1 = Benign

# 2. Data Preprocessing
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 3. Build the Neural Network Model
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.3),  # Regularization
    Dense(32, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')  # Binary classification (output: probability of class 1)
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 4. Train the Model
history = model.fit(X_train, y_train, validation_split=0.1, epochs=50, batch_size=32, verbose=1)

# 5. Evaluate the Model
# Predictions on the test set
y_pred_probs = model.predict(X_test)
y_pred = (y_pred_probs > 0.5).astype(int).flatten()  # Convert probabilities to binary classes (0 or 1)

# Calculate Metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")


Epoch 1/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step - accuracy: 0.3851 - loss: 0.7892 - val_accuracy: 0.8913 - val_loss: 0.5393
Epoch 2/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8211 - loss: 0.5331 - val_accuracy: 0.9130 - val_loss: 0.3597
Epoch 3/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8841 - loss: 0.3701 - val_accuracy: 0.9348 - val_loss: 0.2479
Epoch 4/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9304 - loss: 0.2747 - val_accuracy: 0.9348 - val_loss: 0.1826
Epoch 5/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9404 - loss: 0.2092 - val_accuracy: 0.9348 - val_loss: 0.1486
Epoch 6/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9469 - loss: 0.1709 - val_accuracy: 0.9348 - val_loss: 0.1263
Epoch 7/50
13/13 ━━━━━━━━━━━━━━━━━━━━

The code uses a **Feedforward Neural Network (FNN)** implemented with TensorFlow/Keras. The architecture is specified using the `Sequential` API. Here's a breakdown of the model and its key features:

### Model Architecture:
1. **Input Layer**:  
   - Input features are the standardized features from the Breast Cancer dataset (`X_train.shape[1]` = 30 features).  

2. **Hidden Layers**:  
   - **First Hidden Layer**:  
     - 64 neurons with ReLU activation (`Dense(64, activation='relu')`).  
     - **Dropout Layer**: Randomly zeroes 30% of this layer's outputs during training to reduce overfitting (`Dropout(0.3)`); it is inactive at prediction time.  
   - **Second Hidden Layer**:  
     - 32 neurons with ReLU activation.  
     - **Dropout Layer**: A second dropout layer with the same 30% rate.

3. **Output Layer**:  
   - A single neuron with a sigmoid activation function (`Dense(1, activation='sigmoid')`).  
   - This outputs a probability score (between 0 and 1), indicating the likelihood of the input belonging to the positive class (Benign in this case).
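
To make the size of this architecture concrete, the sketch below counts the trainable parameters of the three `Dense` layers by hand. The helper name `dense_params` is made up for this illustration; the layer widths are the ones used in the model above, and dropout layers are skipped because they have no weights.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Trainable parameters of one fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


# Layer widths from the model above: 30 inputs -> 64 -> 32 -> 1.
layer_sizes = [30, 64, 32, 1]

# Pair each layer width with the next one to get (inputs, units) per Dense layer.
per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = sum(per_layer)

print(per_layer)  # [1984, 2080, 33]
print(total)      # 4097
```

Calling `model.summary()` on the Keras model should show the same per-layer counts.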

### Model Purpose:
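This model performs **binary classification**. The output layer, a single `sigmoid` unit, is equivalent to **logistic regression** applied to the activations of the last hidden layer. The `binary_crossentropy` loss fits this setup because it is the negative log-likelihood of a 0/1 target under the predicted probability.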
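
As a rough illustration of what the output unit and the loss compute, here is a plain-Python sketch of the sigmoid function and the mean binary cross-entropy. The logits and labels are made-up values, and `eps` is an illustrative clipping constant that keeps `log` away from zero; this is not the Keras implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical logits for three samples and their true labels.
logits = [2.0, -1.0, 0.0]
labels = [1, 0, 1]

probs = [sigmoid(z) for z in logits]
loss = binary_crossentropy(labels, probs)
print([round(p, 4) for p in probs], round(loss, 4))
```

A confident correct prediction contributes a loss near zero, while a confident wrong one is penalized heavily, which is what pushes the network toward calibrated probabilities.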

### Optimization:
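The model is trained with the **Adam optimizer**, a stochastic gradient descent variant that keeps running estimates of each gradient's mean and uncentered variance and uses them to scale the step size separately for every parameter.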
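
The update rule can be sketched for a single scalar parameter as follows. This is a minimal illustration, not the Keras implementation: the function name `adam_step`, the toy objective `(w - 3)^2`, and the learning rate of 0.05 are assumptions chosen for the example.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (an assumption for illustration): minimize f(w) = (w - 3)^2.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(round(w, 3))
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.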

### Metrics:
The key performance metrics, computed with class 1 (Benign) as the positive label, are:
- **Accuracy**: Overall correctness of predictions.
- **Precision**: Fraction of predicted positives that are true positives.
- **Recall**: Fraction of actual positives that are correctly identified.
- **F1-Score**: Harmonic mean of precision and recall (balances both).
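
The four definitions above can be checked by hand from the confusion-matrix counts. The sketch below uses a small made-up set of labels and predictions; `classification_metrics` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels and predictions: 3 TP, 1 FP, 1 FN, 1 TN.
y_true_demo = [1, 0, 1, 1, 0, 1]
y_pred_demo = [1, 0, 0, 1, 1, 1]

metrics = classification_metrics(y_true_demo, y_pred_demo)
print(metrics)
```

For this toy input, precision, recall and F1 all come out to 0.75, and accuracy is 4/6.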

This architecture is a basic multilayer perceptron, a common baseline for tabular data with a binary classification target.


**Convolutional Neural Network (CNN)** implemented with TensorFlow/Keras.

In [5]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical

# 1. Load the Breast Cancer Wisconsin Dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

# Convert to a DataFrame
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)  # Target: 0 = Malignant, 1 = Benign

# 2. Data Preprocessing
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Reshape data for CNN
# Assuming each "image" is a 6x5 grid (30 features reshaped into 6 rows and 5 columns)
X_train = X_train.reshape(X_train.shape[0], 6, 5, 1)  # Add a channel dimension
X_test = X_test.reshape(X_test.shape[0], 6, 5, 1)

# 3. Build the CNN Model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(6, 5, 1)),  # Convolutional layer
    MaxPooling2D((2, 2)),  # Pooling layer
    Flatten(),  # Flatten 2D feature maps into a 1D vector
    Dense(64, activation='relu'),  # Fully connected layer
    Dropout(0.3),  # Regularization
    Dense(1, activation='sigmoid')  # Output layer for binary classification
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 4. Train the Model
history = model.fit(X_train, y_train, validation_split=0.1, epochs=50, batch_size=32, verbose=1)

# 5. Evaluate the Model
# Predictions on the test set
y_pred_probs = model.predict(X_test)
y_pred = (y_pred_probs > 0.5).astype(int).flatten()  # Convert probabilities to binary classes (0 or 1)

# Calculate Metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")


Epoch 1/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 20ms/step - accuracy: 0.7347 - loss: 0.6301 - val_accuracy: 0.9348 - val_loss: 0.4479
Epoch 2/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9027 - loss: 0.4515 - val_accuracy: 0.9348 - val_loss: 0.3085
Epoch 3/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9016 - loss: 0.3464 - val_accuracy: 0.9565 - val_loss: 0.2207
Epoch 4/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9137 - loss: 0.2773 - val_accuracy: 0.9348 - val_loss: 0.1744
Epoch 5/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9222 - loss: 0.2232 - val_accuracy: 0.9348 - val_loss: 0.1523
Epoch 6/50
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9294 - loss: 0.2087 - val_accuracy: 0.9565 - val_loss: 0.1371
Epoch 7/50
13/13 ━━━━━━━━━


---

### Key Points

1. **Reshaping**  
   - The features (30 in total) are reshaped into a `(6, 5, 1)` grid for CNN processing.

2. **Convolution and Pooling**  
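   - The `Conv2D` layer applies 32 filters of size 3x3 to the 6x5 grid to extract **local patterns**. Because the 30 features have no inherent spatial layout, these patterns depend on the arbitrary order of the columns.  
   - The `MaxPooling2D` layer reduces the **spatial dimensions** by keeping the maximum of each 2x2 window, which shrinks the input to the dense layers and can help limit **overfitting**.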

3. **Flattening**  
   - Converts the **2D feature maps** into a **1D vector** for input to fully connected layers.

4. **Output**  
   - A **sigmoid activation function** predicts the probability of the **positive class**.
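
The effect of each layer on the tensor shape can be traced with simple arithmetic. The sketch below assumes the Keras defaults used in the cell above (no padding and stride 1 for the convolution, pooling stride equal to the pool size); the helper names are made up for this illustration.

```python
def conv2d_valid(shape, kernel, filters):
    """Output shape of a stride-1 convolution without padding."""
    height, width, _channels = shape
    k_h, k_w = kernel
    return (height - k_h + 1, width - k_w + 1, filters)


def max_pool(shape, pool):
    """Output shape of max pooling whose stride equals the pool size."""
    height, width, channels = shape
    p_h, p_w = pool
    return (height // p_h, width // p_w, channels)


input_shape = (6, 5, 1)                             # 30 features as a 6x5 grid
after_conv = conv2d_valid(input_shape, (3, 3), 32)  # Conv2D(32, (3, 3))
after_pool = max_pool(after_conv, (2, 2))           # MaxPooling2D((2, 2))
flattened = after_pool[0] * after_pool[1] * after_pool[2]

print(after_conv)  # (4, 3, 32)
print(after_pool)  # (2, 1, 32)
print(flattened)   # 64
```

With a grid this small, a single convolution and pooling step already reduces each feature map to two values, which is why the model uses only one convolutional block.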

---

**Recurrent Neural Network (RNN)** implemented with TensorFlow/Keras.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# 1. Load the Breast Cancer Wisconsin Dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

# Convert to a DataFrame
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)  # Target: 0 = Malignant, 1 = Benign

# 2. Data Preprocessing
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Reshape data for RNN
# Treat each feature as a time step (sequence length = 30, features per step = 1)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# 3. Build the RNN Model
model = Sequential([
    LSTM(64, activation='tanh', input_shape=(X_train.shape[1], X_train.shape[2])),  # LSTM layer
    Dropout(0.3),  # Regularization
    Dense(32, activation='relu'),  # Fully connected layer
    Dropout(0.3),  # Regularization
    Dense(1, activation='sigmoid')  # Output layer for binary classification
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 4. Train the Model
history = model.fit(X_train, y_train, validation_split=0.1, epochs=50, batch_size=32, verbose=1)

# 5. Evaluate the Model
# Predictions on the test set
y_pred_probs = model.predict(X_test)
y_pred = (y_pred_probs > 0.5).astype(int).flatten()  # Convert probabilities to binary classes (0 or 1)

# Calculate Metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")



---

### Key Details

1. **Reshaping**  
   - The input is reshaped to have a sequence length of 30 (one time step per feature) and one feature per step (`(n_samples, 30, 1)`).

2. **RNN Layer**  
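   - The `LSTM` layer reads the 30 values one step at a time and summarizes them in a 64-dimensional final state. The feature order is not a real time axis, so treating it as a sequence is a modeling convenience.  
   - Other recurrent layers such as `GRU` or `SimpleRNN` can be substituted; gated layers like `LSTM` and `GRU` usually handle long-range dependencies better than `SimpleRNN`.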

3. **Dropout Regularization**  
   - Dropout layers follow the LSTM layer and the first dense layer to reduce overfitting.

4. **Output**  
   - A **sigmoid activation function** predicts the probability of the **positive class**.
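
To give a sense of how much larger the recurrent layer is than the dense layers, the sketch below counts the parameters of a standard LSTM layer by hand. It assumes the usual formulation with four gate blocks and one bias vector per block; `lstm_params` is a hypothetical helper written for this illustration.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of a standard LSTM layer.

    Each of the four blocks (input, forget, cell candidate, output) has
    input weights, recurrent weights and one bias vector.
    """
    per_block = units * input_dim + units * units + units
    return 4 * per_block


# Shape of one sample after the reshape above: 30 time steps, 1 feature per step.
timesteps, features_per_step = 30, 1

lstm_count = lstm_params(features_per_step, 64)  # LSTM(64)
dense_count = 64 * 32 + 32                       # Dense(32) on the LSTM output
output_count = 32 * 1 + 1                        # Dense(1)

print(lstm_count)                               # 16896
print(lstm_count + dense_count + output_count)  # 19009
```

The parameter count does not depend on the sequence length, because the same weights are reused at each of the 30 steps.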

---