# HW9 AmirHossein Naghdi - 400102169

# 15 Points on the notebook running correctly.

# 15 Points on having sufficient explanations and overall readability of the notebook

# 10 Points: Multilayer Perceptron with Scikit-Learn
* 5 Points: binary classification with F1-score above 0.75
* 5 Points: regression with R2-score above 0.8

In [8]:
from sklearn.datasets import load_breast_cancer
X_classification, y_classification = load_breast_cancer(return_X_y=True)

In [9]:
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing()
X, y = data.data, data.target

# classification
# Multi-Layer Perceptron (MLP) for Classification
The MLP is a type of feedforward neural network used for supervised learning tasks like classification. It consists of an input layer, one or more hidden layers with non-linear activation functions, and an output layer.

It works by:

Passing input data through the network (forward propagation),

Computing a loss (e.g., cross-entropy for classification),

Adjusting weights using backpropagation and an optimizer (like stochastic gradient descent),

Iterating this process to minimize the loss and improve prediction accuracy.

In essence, MLP learns to map inputs to outputs by adjusting internal parameters based on labeled training data.

In [21]:
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score

X_train, X_test, y_train, y_test = train_test_split(X_classification, y_classification, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42)
clf.fit(X_train_scaled, y_train)
y_pred = clf.predict(X_test_scaled)

f1 = f1_score(y_test, y_pred)
print(f"F1 Score (Classification): {f1:.2f}")


F1 Score (Classification): 0.98


#  Regression
# Multi-Layer Perceptron (MLP) for Regression
The MLPRegressor is a type of feedforward neural network used for supervised regression tasks, where the goal is to predict continuous values.

It works by:

Passing input features through multiple hidden layers with non-linear activation functions,

Computing a loss (typically mean squared error) between predicted and true values,

Updating weights using backpropagation and an optimization algorithm (e.g., Adam or SGD),

Iteratively improving predictions over training epochs.

The model learns a complex, non-linear mapping from input features to a continuous output. The depth and size of the hidden layers (e.g., (64, 32, 16)) determine the model's capacity to capture patterns in the data.

The performance is commonly evaluated using metrics like R² score, which measures how well the predictions explain the variance in the target variable.

In [20]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

mlp = MLPRegressor(hidden_layer_sizes=(64, 32, 16), max_iter=500, random_state=42)
mlp.fit(X_train, y_train)

y_pred = mlp.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R2 Score (Scikit-Learn MLP): {r2:.2f}")


R2 Score (Scikit-Learn MLP): 0.80


# 15 Points: 4-layer feedforward network with Keras
* 10 Points: binary classification with F1-score above 0.75
* 5 Points: regression with R2-score above 0.8

# classification
# Binary Classification with Keras Feedforward Neural Network
This model is a feedforward neural network (a type of Multi-Layer Perceptron) built using Keras for a binary classification task—predicting whether a tumor is malignant or benign using the Breast Cancer dataset.

It works by:

Constructing a network with an input layer, three hidden layers (using ReLU activation), and a sigmoid-activated output layer that outputs a probability for the positive class.

Using binary cross-entropy as the loss function, which is appropriate for binary classification tasks.

Optimizing the model via Adam, a popular gradient-based optimizer, through backpropagation.

After training for multiple epochs, it outputs a probability for each input, which is then thresholded at 0.5 to assign class labels.

The model learns non-linear patterns in the input features to distinguish between the two classes, and its performance is evaluated using the F1-score, which balances precision and recall.

In [22]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = Sequential([
    Dense(64, activation='relu', input_shape=(X.shape[1],)),
    Dense(32, activation='relu'),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0)

y_pred = (model.predict(X_test) > 0.5).astype("int32").flatten()
f1 = f1_score(y_test, y_pred)
print(f"F1 Score (Binary Classification): {f1:.2f}")


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step
F1 Score (Binary Classification): 0.98


# regression
# Regression with Keras Feedforward Neural Network
This model is a Keras-based feedforward neural network used for a regression task on the California Housing dataset, where the goal is to predict continuous housing prices.

It works by:

Building a multi-layer perceptron with an input layer, three hidden layers (using ReLU activation), and an output layer with no activation (linear) to produce continuous predictions.

Using mean squared error (MSE) as the loss function, which is standard for regression problems.

Optimizing the network using the Adam optimizer, which adapts learning rates during training based on gradient estimates.

Learning to approximate the underlying relationship between input features and target values through forward propagation and backpropagation over multiple epochs.

The model is evaluated using the R² score, which indicates how well the predictions explain the variance in the actual housing prices.

In [23]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    Dense(64, activation='relu', input_shape=(X.shape[1],)),
    Dense(32, activation='relu'),
    Dense(16, activation='relu'),
    Dense(1)
])

model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)

y_pred = model.predict(X_test).flatten()
r2 = r2_score(y_test, y_pred)
print(f"R2 Score (Keras NN): {r2:.2f}")


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step
R2 Score (Keras NN): 0.86


#20 Points: 4-layer feedforward network with PyTorch
* 10 Points: binary classification with F1-score above 0.75
* 10 Points: regression with R2-score above 0.8

# classification
# Binary Classification with PyTorch Feedforward Neural Network
This is a binary classification model built using PyTorch, applied to the Breast Cancer dataset.

It works by:

Defining a feedforward neural network with an input layer, multiple hidden layers using ReLU activation, and a sigmoid-activated output layer that produces probabilities between 0 and 1.

Using binary cross-entropy loss (BCELoss) to measure the difference between predicted probabilities and true binary labels.

Training the model using Adam optimizer through backpropagation over several epochs.

Making predictions on test data by thresholding the sigmoid output at 0.5 to assign binary class labels.

The network learns to distinguish between the two classes based on patterns in the input features, and its performance is assessed using the F1-score, which balances precision and recall.

In [24]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score

data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(X.shape[1], 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.net(x)

model = Classifier()
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    y_pred = model(X_test_tensor).numpy()
    y_pred_labels = (y_pred > 0.5).astype(int)
    f1 = f1_score(y_test, y_pred_labels)
    print(f"F1 Score (PyTorch Classification): {f1:.2f}")


F1 Score (PyTorch Classification): 0.99


# regression
# Regression with PyTorch Feedforward Neural Network
This is a regression model implemented in PyTorch, applied to the California Housing dataset to predict continuous housing prices.

It works by:

Building a deep feedforward neural network with several hidden layers using ReLU activation and a final linear output layer to produce continuous predictions.

Using mean squared error (MSE) as the loss function to quantify the difference between predicted and actual values.

Training the model using the Adam optimizer and backpropagation, which updates the model's weights to minimize the MSE over training epochs.

Evaluating model performance using the R² score, which reflects how well the model explains the variance in housing prices.

The network learns complex, non-linear relationships in the input features to produce accurate real-valued predictions.

In [25]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).view(-1, 1)

class BetterNet(nn.Module):
    def __init__(self):
        super(BetterNet, self).__init__()
        self.fc1 = nn.Linear(8, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        return self.fc4(x)

model = BetterNet()
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    output = model(X_train_tensor)
    loss = loss_fn(output, y_train_tensor)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    y_pred = model(X_test_tensor).numpy().flatten()

r2 = r2_score(y_test, y_pred)
print(f"Improved R² Score (PyTorch): {r2:.2f}")


Improved R² Score (PyTorch): 0.71


# 15 Points: 4-layer non-sequential feedforward network with Keras
* 5 Points: binary classification with F1-score above 0.75
* 5 Points: regression with R2-score above 0.8

# classification
# Binary Classification using Keras Functional API
This model is a binary classification network implemented using the Keras Functional API, applied to the Breast Cancer dataset.

It works by:

Defining the model structure using the Functional API:

The input layer accepts data with a shape matching the number of features (X.shape[1]).

Multiple hidden layers with ReLU activation help learn complex features.

A final sigmoid-activated output layer produces probabilities for binary classification.

Using binary cross-entropy loss to measure the difference between predicted probabilities and true binary labels.

Optimizing the model with the Adam optimizer, which adjusts learning rates during training.

Incorporating precision and recall as metrics to evaluate the classification performance, in addition to using the F1 score for final evaluation.

The model is trained on the training data and evaluated on the test set by thresholding the predicted probabilities at 0.5 to generate class labels, and its performance is assessed using the F1 score, balancing both precision and recall.

In [26]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import Precision, Recall
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
import numpy as np

data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

inputs = Input(shape=(X.shape[1],))
x = Dense(64, activation='relu')(inputs)
x = Dense(32, activation='relu')(x)
x = Dense(16, activation='relu')(x)
outputs = Dense(1, activation='sigmoid')(x)

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=Adam(0.001), loss='binary_crossentropy', metrics=[Precision(), Recall()])

model.fit(X_train, y_train, epochs=100, verbose=0, batch_size=32)

y_pred = (model.predict(X_test) > 0.5).astype(int)
f1 = f1_score(y_test, y_pred)
print(f"F1 Score (Keras Non-Sequential Classification): {f1:.2f}")


[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step
F1 Score (Keras Non-Sequential Classification): 0.98


# regression
# Regression using Keras Functional API
This model is a regression network built using the Keras Functional API, applied to the California Housing dataset to predict continuous housing prices.

It works by:

Defining the model structure using the Functional API:

The input layer accepts data with a shape corresponding to the number of features (X.shape[1]).

Several hidden layers with ReLU activation enable the model to learn complex patterns in the data.

The final output layer is a linear activation (default), which allows the model to output continuous predictions.

Using mean squared error (MSE) as the loss function to measure the difference between the predicted and actual values, which is common for regression tasks.

Optimizing the model using the Adam optimizer, which adjusts the learning rate during training.

The model is trained on the training data and evaluated on the test set using the R² score, which indicates how well the model’s predictions explain the variance in the target values (housing prices).

In [27]:
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import r2_score

data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

inputs = Input(shape=(X.shape[1],))
x = Dense(64, activation='relu')(inputs)
x = Dense(32, activation='relu')(x)
x = Dense(16, activation='relu')(x)
outputs = Dense(1)(x)

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=Adam(0.001), loss='mse')

model.fit(X_train, y_train, epochs=100, verbose=0, batch_size=32)

y_pred = model.predict(X_test).flatten()
r2 = r2_score(y_test, y_pred)
print(f"R² Score (Keras Non-Sequential Regression): {r2:.2f}")


[1m129/129[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
R² Score (Keras Non-Sequential Regression): 0.80


# Bonus 15 Points (if dataset has time-series like features) 3-layer Recurrent Neural Network with Keras
* 10 Points: binary classification with F1-score above 0.75
* 5 Points: regression with R2-score above 0.8

# classification
# Binary Classification with RNN on Breast Cancer Dataset
This model uses a Recurrent Neural Network (RNN) to perform binary classification on the Breast Cancer dataset, predicting whether tumors are malignant or benign.

It works by:

Preprocessing:

The features are normalized using MinMaxScaler to scale values between 0 and 1.

The data is reshaped to fit the requirements of an RNN. Here, the data is divided into timesteps (10) and features per step (3), with padding if necessary.

RNN Architecture:

The model includes three SimpleRNN layers, each with 64 units and tanh activation.

Dropout layers are added to prevent overfitting.

The output layer uses a sigmoid activation to output probabilities for binary classification.

Training:

The model is trained using the Adam optimizer and binary cross-entropy loss for binary classification tasks.

The model is trained for 40 epochs with a validation split of 0.2, which helps monitor performance on unseen data during training.

Evaluation:

After training, the model predicts probabilities on the test set, and the predicted probabilities are thresholded at 0.5 to assign class labels.

The performance is evaluated using the F1-score and the classification report, which provides detailed metrics such as precision, recall, and F1-score.

The RNN model leverages temporal relationships in the input features and performs well on the classification task, with the F1-score serving as the primary evaluation metric.

In [28]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Dropout
from tensorflow.keras.optimizers import Adam

data = load_breast_cancer()
X, y = data.data, data.target

scaler = MinMaxScaler()
X = scaler.fit_transform(X)

X_padded = np.zeros((X.shape[0], 10 * 3))
X_padded[:, :X.shape[1]] = X
X_rnn = X_padded.reshape(-1, 10, 3)

X_train, X_test, y_train, y_test = train_test_split(X_rnn, y, test_size=0.2, random_state=42)

model = Sequential([
    SimpleRNN(64, activation='tanh', return_sequences=True, input_shape=(10, 3)),
    Dropout(0.3),
    SimpleRNN(64, activation='tanh', return_sequences=True),
    Dropout(0.3),
    SimpleRNN(64, activation='tanh'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer=Adam(0.001), loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=40, batch_size=32, validation_split=0.2, verbose=0)

y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int)

f1 = f1_score(y_test, y_pred)
print(f"F1-score: {f1:.2f}")
print(classification_report(y_test, y_pred))


  super().__init__(**kwargs)


[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 119ms/step
F1-score: 0.96
              precision    recall  f1-score   support

           0       0.93      0.95      0.94        43
           1       0.97      0.96      0.96        71

    accuracy                           0.96       114
   macro avg       0.95      0.96      0.95       114
weighted avg       0.96      0.96      0.96       114



# regression
# Regression with RNN on California Housing Dataset
This model applies a Recurrent Neural Network (RNN) to perform regression on the California Housing dataset, aiming to predict continuous housing prices.

It works by:

Preprocessing:

The features are normalized using MinMaxScaler, ensuring all input values are scaled between 0 and 1.

The data is reshaped to fit the RNN input format, with 4 timesteps and 2 features per timestep, filling extra space with zeros to ensure uniform input size.

RNN Architecture:

The model consists of three SimpleRNN layers, each with 64 units and ReLU activation.

Dropout layers are added to prevent overfitting by randomly setting a fraction of input units to zero during training.

The final Dense layer provides the output with no activation function, as it is a regression task predicting continuous values.

Training:

The model is compiled with the Adam optimizer and mean squared error (MSE) loss function, as MSE is a standard loss function for regression tasks.

The model is trained for 50 epochs with a validation split of 0.2 to assess its performance on unseen data.

Evaluation:

After training, predictions are made on the test set, and the performance is evaluated using R² score and mean squared error (MSE).

The R² score indicates how well the model's predictions explain the variance in the target values (housing prices), while MSE measures the average squared difference between predicted and actual values.

This RNN model can capture temporal dependencies in the data, although its primary advantage in this context might be in handling sequential patterns that could emerge in time-based data, even though the California Housing dataset is typically non-sequential.

In [29]:
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import r2_score, mean_squared_error

housing = fetch_california_housing()
X, y = housing.data, housing.target

X = MinMaxScaler().fit_transform(X)

X_padded = np.zeros((X.shape[0], 8))
X_padded[:, :X.shape[1]] = X
X_rnn = X_padded.reshape(-1, 4, 2)

X_train, X_test, y_train, y_test = train_test_split(X_rnn, y, test_size=0.2, random_state=42)

reg_model = Sequential([
    SimpleRNN(64, activation='relu', return_sequences=True, input_shape=(4, 2)),
    Dropout(0.3),
    SimpleRNN(64, activation='relu', return_sequences=True),
    Dropout(0.3),
    SimpleRNN(64, activation='relu'),
    Dropout(0.3),
    Dense(1)
])

reg_model.compile(optimizer=Adam(0.001), loss='mse', metrics=['mae'])

reg_model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=0)

y_pred = reg_model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"R2-score: {r2:.2f}")
print(f"MSE: {mse:.2f}")

  super().__init__(**kwargs)


[1m129/129[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step
R2-score: 0.75
MSE: 0.33


# 10 Points: Explain why neural networks are so powerful and what the diffcult part is in designing neural networks.

## Why Neural Networks Are Powerful

1. **Universal Function Approximation**  
   Neural networks can approximate any continuous function given sufficient neurons and layers (Universal Approximation Theorem).

2. **Representation Learning**  
   They automatically learn useful features from raw data, reducing the need for manual feature engineering. This is essential in tasks like image classification, natural language processing, and speech recognition.

3. **Scalability with Data and Compute**  
   Neural networks generally perform better with more data and compute resources, which allows them to scale well with modern hardware and large datasets.

4. **Flexibility Across Domains**  
   Neural networks are adaptable to a wide range of tasks and data types, such as images, text, audio, and structured data. Architectures like CNNs, RNNs, and transformers are applicable across many domains.

5. **End-to-End Learning**  
   They enable end-to-end training, mapping raw inputs directly to outputs without requiring intermediate handcrafted processing steps.

---

## Difficulties in Designing Neural Networks

1. **Architecture Design**  
   Choosing the right model architecture—number of layers, types of layers (convolutional, recurrent, attention), and connections—is a complex and often domain-specific task.

2. **Training Stability and Optimization**  
   Neural networks involve non-convex optimization, which can be challenging due to issues like vanishing/exploding gradients, saddle points, and poor local minima.

3. **Hyperparameter Tuning**  
   Parameters such as learning rate, batch size, dropout rate, and weight decay have a significant impact on performance and require careful tuning, often through trial and error or automated search.

4. **Overfitting and Generalization**  
   Neural networks can easily overfit the training data due to their high capacity. Regularization techniques like dropout, weight decay, and data augmentation are necessary to improve generalization.

5. **Data Requirements**  
   Effective training usually requires large amounts of labeled data. In domains with limited labeled data, performance may suffer unless techniques like transfer learning or data augmentation are used.

6. **Interpretability**  
   Neural networks are often considered "black boxes" because understanding how and why they make certain decisions is difficult, which poses challenges in critical applications like healthcare, law, and finance.
