# Deep Learning Comparative Experiments Report (Iris Classification)

## 1. Objective
This report summarizes five experiments on the Iris dataset, organized into four comparisons, to understand how the following factors affect model performance:

1) **Number of layers (depth)**: deeper vs shallower models  
2) **Number of nodes (width)**: more units vs fewer units  
3) **Activation function**: ReLU vs Sigmoid  
4) **Number of epochs**: fewer vs more training epochs  

Performance is evaluated using **test loss** and **test accuracy**.


## 2. Experimental Setup

### 2.1 Dataset & Split
- Dataset: **Iris (4 input features, 3 classes)**
- Split: `train_test_split(test_size=0.4, random_state=1)`
- Labels: integer class labels → loss uses **sparse categorical cross-entropy**

### 2.2 Training & Evaluation
- Optimizer: **Adam**
- Loss: **sparse_categorical_crossentropy**
- Metric: **accuracy**
- Evaluation: `model.evaluate(X_test, y_test)` → report `test_loss`, `test_acc`
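
To make the two reported metrics concrete, the following minimal sketch computes sparse categorical cross-entropy and accuracy by hand. The probability values and the helper names `sparse_cce` and `accuracy` are illustrative assumptions, not outputs of the models in this report. The sketch shows that two predictors can reach the same accuracy with different loss, because the loss depends on how much probability is assigned to the true class.

```python
import math

def sparse_cce(probs, labels):
    """Mean of -log(p[true class]) over samples; labels are integer class ids."""
    return sum(-math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    hits = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return hits / len(labels)

labels = [0, 1, 2]  # hypothetical integer class labels

# Two made-up sets of softmax outputs: both pick the right class every time,
# but the second one is far less confident about it.
confident = [[0.95, 0.03, 0.02], [0.02, 0.96, 0.02], [0.01, 0.04, 0.95]]
hesitant = [[0.50, 0.30, 0.20], [0.25, 0.55, 0.20], [0.30, 0.25, 0.45]]

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(f"{name}: loss={sparse_cce(probs, labels):.4f}, acc={accuracy(probs, labels):.4f}")
```

Both predictors score an accuracy of 1.0, yet the hesitant one has the higher loss. This is the pattern to keep in mind when two experiments tie on `test_acc` but differ on `test_loss`.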

### 2.3 Why We Re-Structured the Models as Serial (Stacked) Networks
- To make the comparisons **interpretable and fair**, we converted the earlier parallel/branched design into a **serial (stacked)** architecture.
- In a branched model, changing the number of Dense layers often changes **both** depth and width (e.g., branch count and concatenated feature size), making “layer-count” comparisons ambiguous.
- With a serial design, we can vary **one factor at a time**—**depth (number of layers)**, **width (nodes)**, **activation**, or **epochs**—and attribute performance differences to that factor more confidently.
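
One way to check the one-factor-at-a-time claim is to write each experiment down as a small configuration and list which factors differ between a pair. The sketch below does this in plain Python. The settings are copied from the code cells `In [2]` to `In [6]`, while the dictionary layout and the helper names `describe` and `differing_factors` are assumptions made for illustration.

```python
# Hidden-layer sizes, activation, and epochs for each experiment,
# as they appear in the notebook cells.
EXPERIMENTS = {
    1: {"hidden": (50, 80, 30), "activation": "relu", "epochs": 200},
    2: {"hidden": (160,), "activation": "relu", "epochs": 200},
    3: {"hidden": (5, 8, 3), "activation": "relu", "epochs": 200},
    4: {"hidden": (50, 80, 30), "activation": "relu", "epochs": 20},
    5: {"hidden": (50, 80, 30), "activation": "sigmoid", "epochs": 200},
}

def describe(cfg):
    """Expand a config into the four factors the report compares."""
    return {
        "depth": len(cfg["hidden"]),   # number of hidden layers
        "width": cfg["hidden"],        # units per hidden layer
        "activation": cfg["activation"],
        "epochs": cfg["epochs"],
    }

def differing_factors(a, b):
    """Return the sorted names of the factors that differ between two experiments."""
    da, db = describe(EXPERIMENTS[a]), describe(EXPERIMENTS[b])
    return sorted(k for k in da if da[k] != db[k])

for other in (2, 3, 4, 5):
    print(f"Exp1 vs Exp{other}: {differing_factors(1, other)}")
```

Against Exp1, Experiments 3, 4, and 5 each change exactly one factor. Exp2 changes depth and, unavoidably, per-layer width, because collapsing three layers into one requires choosing a new layer size.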

In [1]:
from keras import Model
from keras.layers import Input, Dense
from sklearn import datasets
from sklearn.model_selection import train_test_split


#### Exp 1. Large layers, nodes and epochs with ReLU (LLLNLER)

In [2]:
inputs = Input(shape=(4,))  # 4 Iris features; avoids shadowing the built-in input()

x = Dense(50, activation='relu')(inputs)
x = Dense(80, activation='relu')(x)
x = Dense(30, activation='relu')(x)
outputs = Dense(3, activation='softmax')(x)  # one probability per class
model = Model(inputs=inputs, outputs=outputs)
model.summary()

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
model.fit(X_train, y_train, epochs=200)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f'test_loss: {loss}\ntest_acc: {acc}')

Epoch 1/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.3667 - loss: 1.0399
Epoch 2/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6778 - loss: 0.9177
Epoch 3/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6778 - loss: 0.8386
Epoch 4/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6778 - loss: 0.7889
Epoch 5/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6778 - loss: 0.7435
Epoch 6/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7111 - loss: 0.6943
Epoch 7/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9000 - loss: 0.6565
Epoch 8/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9111 - loss: 0.6186
Epoch 9/200
...

#### Exp 2. Small layers, large nodes and epochs with ReLU (SLLNLER)

In [3]:
inputs = Input(shape=(4,))  # 4 Iris features; avoids shadowing the built-in input()

x = Dense(160, activation='relu')(inputs)  # single hidden layer, 160 = 50 + 80 + 30 units
outputs = Dense(3, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.summary()

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
model.fit(X_train, y_train, epochs=200)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f'test_loss: {loss}\ntest_acc: {acc}')

Epoch 1/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3556 - loss: 1.0151
Epoch 2/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6222 - loss: 0.9355
Epoch 3/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 0s/step - accuracy: 0.6778 - loss: 0.8891
Epoch 4/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6778 - loss: 0.8388
Epoch 5/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6778 - loss: 0.7879
Epoch 6/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6889 - loss: 0.7515
Epoch 7/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.8000 - loss: 0.7137
Epoch 8/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7556 - loss: 0.6837
Epoch 9/200
...

#### Exp 3. Large layers, small nodes and large epochs with ReLU (LLSNLER)

In [4]:
inputs = Input(shape=(4,))  # 4 Iris features; avoids shadowing the built-in input()

x = Dense(5, activation='relu')(inputs)  # same depth as Exp 1, one tenth of the units
x = Dense(8, activation='relu')(x)
x = Dense(3, activation='relu')(x)
outputs = Dense(3, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
model.fit(X_train, y_train, epochs=200)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f'test_loss: {loss}\ntest_acc: {acc}')

Epoch 1/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.3444 - loss: 1.1436
Epoch 2/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.5222 - loss: 1.0620
Epoch 3/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.5667 - loss: 1.0139
Epoch 4/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5667 - loss: 0.9910
Epoch 5/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5444 - loss: 0.9830
Epoch 6/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.4000 - loss: 0.9790
Epoch 7/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.3889 - loss: 0.9757
Epoch 8/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 0s/step - accuracy: 0.3889 - loss: 0.9693
Epoch 9/200
...

#### Exp 4. Large layers, nodes and small epochs with ReLU (LLLNSER)

In [5]:
inputs = Input(shape=(4,))  # 4 Iris features; avoids shadowing the built-in input()

x = Dense(50, activation='relu')(inputs)
x = Dense(80, activation='relu')(x)
x = Dense(30, activation='relu')(x)
outputs = Dense(3, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.summary()

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
model.fit(X_train, y_train, epochs=20)  # same architecture as Exp 1, 20 epochs instead of 200
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f'test_loss: {loss}\ntest_acc: {acc}')

Epoch 1/20
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.3333 - loss: 1.1447
Epoch 2/20
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 995us/step - accuracy: 0.3444 - loss: 1.0743
Epoch 3/20
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.3333 - loss: 1.0309
Epoch 4/20
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.3333 - loss: 0.9775
Epoch 5/20
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5556 - loss: 0.9277
Epoch 6/20
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8889 - loss: 0.8843
Epoch 7/20
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9222 - loss: 0.8419
Epoch 8/20
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9000 - loss: 0.8018
Epoch 9/20
...

#### Exp 5. Large layers, nodes and epochs with Sigmoid (LLLNLES)

In [6]:
inputs = Input(shape=(4,))  # 4 Iris features; avoids shadowing the built-in input()

x = Dense(50, activation='sigmoid')(inputs)  # same architecture as Exp 1, Sigmoid instead of ReLU
x = Dense(80, activation='sigmoid')(x)
x = Dense(30, activation='sigmoid')(x)
outputs = Dense(3, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.summary()

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
model.fit(X_train, y_train, epochs=200)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f'test_loss: {loss}\ntest_acc: {acc}')

Epoch 1/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.3444 - loss: 1.1646
Epoch 2/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3444 - loss: 1.1301
Epoch 3/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.3444 - loss: 1.1069
Epoch 4/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.3444 - loss: 1.0952
Epoch 5/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.4889 - loss: 1.0914
Epoch 6/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.3333 - loss: 1.0910
Epoch 7/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.3333 - loss: 1.0903
Epoch 8/200
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.3333 - loss: 1.0880
Epoch 9/200
...

## 3. Defining “Nodes” for Experiment 2 (Key Design Choice)

### 3.1 Problem
Experiment 1 uses a **deep, serial** architecture with three hidden layers:
- Hidden units: **50 → 80 → 30**

This makes “node count” ambiguous when comparing to a **single hidden-layer** model. Should we match:
- per-layer units?
- total hidden units?
- total trainable parameters?

### 3.2 Decision
For **Experiment 2**, we defined “nodes” as the **total number of hidden units across all hidden layers** in Experiment 1:

- Exp1 total hidden units = `50 + 80 + 30 = 160`

Therefore, Experiment 2 uses a **single hidden layer with 160 units**.

### 3.3 Rationale
- The goal of Exp1 vs Exp2 is primarily a **depth comparison** (more layers vs fewer layers).
- Matching the **sum of hidden units** is a simple, interpretable way to hold the number of hidden neurons fixed without exploding the model size. It does not hold the parameter count fixed: Exp1 has 6,853 trainable parameters, while the 160-unit model in Exp2 has 1,283.
- Matching **trainable parameters** exactly would require a very large single layer (≈ 856 units, from solving `8h + 3 = 6853`), which is disproportionately large for Iris and could distort the comparison.
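
The arithmetic behind the ≈ 856 figure can be reproduced with a short sketch. It counts weights and biases for a stack of fully connected layers with 4 inputs and 3 outputs, as in the cells above. The helper name `dense_params` is an assumption introduced here for illustration.

```python
def dense_params(n_inputs, hidden, n_outputs):
    """Trainable parameters (weights + biases) of a stack of Dense layers."""
    sizes = [n_inputs, *hidden, n_outputs]
    # Each layer contributes fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))

exp1 = dense_params(4, (50, 80, 30), 3)
exp2 = dense_params(4, (160,), 3)
print("Exp1 parameters:", exp1)
print("Exp2 parameters:", exp2)

# A single hidden layer with h units has 4*h + h + 3*h + 3 = 8*h + 3 parameters,
# so matching Exp1 means solving 8*h + 3 = exp1 for h.
h_match = (exp1 - 3) / 8
print("Units needed to match Exp1:", h_match)
```

These counts should agree with the totals printed by `model.summary()` for the corresponding cells.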

## 4. Experiments and Results

### 4.1 Reported Test Results
| Exp | Comparison focus | Model summary | test_loss | test_acc |
|---:|---|---|---:|---:|
| 1 | Baseline (deep, wide) | Serial hidden: 50→80→30, **ReLU**, epochs=200 | 0.0445 | 1.0000 |
| 2 | Depth (fewer layers) | Serial hidden: **160**, ReLU, epochs=200 | 0.1113 | 0.9833 |
| 3 | Width (fewer nodes) | Serial hidden: **5→8→3**, ReLU, epochs=200 | 0.4218 | 0.9000 |
| 4 | Epochs (fewer epochs) | Serial hidden: 50→80→30, ReLU, **epochs=20** | 0.3310 | 0.9833 |
| 5 | Activation (Sigmoid) | Serial hidden: 50→80→30, **Sigmoid**, epochs=200 | 0.0566 | 1.0000 |
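
Because the test set is small, each accuracy value corresponds to a handful of samples. The sketch below converts the reported `test_acc` values into misclassification counts. It assumes the standard 150-sample Iris dataset, so that `test_size=0.4` yields 60 test samples, and the helper name `misclassified` is hypothetical.

```python
N_SAMPLES = 150                    # size of the standard Iris dataset (assumed)
n_test = N_SAMPLES * 40 // 100     # test_size=0.4 -> 60 test samples
n_train = N_SAMPLES - n_test       # remaining 90 samples are used for training

# test_acc values copied from the results table.
reported_acc = {1: 1.0000, 2: 0.9833, 3: 0.9000, 4: 0.9833, 5: 1.0000}

def misclassified(acc, n):
    """Number of wrong predictions implied by an accuracy measured over n samples."""
    return n - round(acc * n)

errors = {exp: misclassified(acc, n_test) for exp, acc in reported_acc.items()}
print(f"train={n_train}, test={n_test}")
print(errors)
```

Read this way, an accuracy of 0.9833 means a single misclassified test sample, so accuracy gaps of that size are one-sample differences.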

## 5. Analysis by Research Question

### 5.1 More layers vs fewer layers (Depth): Exp1 vs Exp2
- **Exp1 (deep)**: loss **0.0445**, acc **1.0000**
- **Exp2 (shallow)**: loss **0.1113**, acc **0.9833**

**Interpretation**
- The deeper model achieved **higher test accuracy** and a notably **lower test loss**.
- The lower cross-entropy in Exp1 means it assigned, on average, **higher probability to the correct class**. Loss alone does not show that its probabilities are better calibrated.
- Exp1 also has more trainable parameters than Exp2 (see Section 3.3), so depth is not the only difference between the two models.

**Conclusion**
- In this setup, **the deeper model performed better on the test set**, most visibly in test loss.

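### 5.2 More nodes vs fewer nodes (Width): Exp1 vs Exp3
- **Exp1 (more nodes, 50→80→30)**: loss **0.0445**, acc **1.0000**
- **Exp3 (fewer nodes, 5→8→3)**: loss **0.4218**, acc **0.9000**

Both models have three hidden layers, use ReLU, and train for 200 epochs, so width is the only factor that differs. (Exp2 has a single hidden layer, so comparing it with Exp3 would change depth and width at once.)

**Interpretation**
- The narrower model likely suffered from **underfitting** (insufficient capacity), reflected in both lower accuracy and a loss nearly ten times higher.
- Its training log also shows slow early progress: training accuracy is still 0.3889 at epoch 8, against 0.9111 for Exp1.

**Conclusion**
- With Iris, reducing units this far noticeably harms performance; **more nodes performed better** in this comparison.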

### 5.3 ReLU vs Sigmoid (Activation): Exp1 vs Exp5
- **Exp1 (ReLU)**: loss **0.0445**, acc **1.0000**
- **Exp5 (Sigmoid)**: loss **0.0566**, acc **1.0000**

**Interpretation**
- Accuracy is identical (perfect). ReLU yields a slightly **lower loss**, meaning it assigns slightly higher probability to the correct classes on average.
- Sigmoid saturates more easily (small gradients), which is consistent with the training logs: the Sigmoid model is still at 0.3333 training accuracy at epoch 8, while the ReLU model has reached 0.9111. Iris is simple enough that both activations reach perfect test accuracy after 200 epochs.

**Conclusion**
- **Accuracy tie**, with **ReLU slightly better in loss**.
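
The saturation argument can be illustrated numerically. The sketch below compares the derivative of the sigmoid with the derivative of ReLU at a few pre-activation values. The chosen inputs are arbitrary, and the three-layer product is only an upper bound on the activation-derivative factor: real gradients also depend on the weights, which this sketch ignores.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), at most 0.25 (reached at z = 0)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

for z in (0.0, 2.0, 5.0):
    print(f"z={z}: sigmoid'={sigmoid_grad(z):.4f}, relu'={relu_grad(z):.1f}")

# Largest possible product of sigmoid derivatives across three hidden layers.
print("three-layer sigmoid bound:", 0.25 ** 3)
```

For active units the ReLU derivative stays at 1, while the sigmoid derivative shrinks as the input moves away from zero. This is one plausible reason for the slower start of the Sigmoid model.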

### 5.4 Fewer epochs vs more epochs: Exp4 vs Exp1
- **Exp4 (20 epochs)**: loss **0.3310**, acc **0.9833**
- **Exp1 (200 epochs)**: loss **0.0445**, acc **1.0000**

Both models use the 50→80→30 ReLU architecture; only the number of epochs differs.

**Interpretation**
- Accuracy changed only slightly (0.9833 vs 1.0000), but loss fell by a large margin with more epochs.
- After 20 epochs the model already predicts almost every test sample correctly; further training mainly increases the probability assigned to the correct class.

**Conclusion**
- Increasing epochs from 20 to 200 substantially reduced **test loss**, while accuracy improved only marginally.


## 6. Overall Conclusions
1. **Depth matters**: the deeper serial model (Exp1) achieved the best overall test loss and perfect accuracy.
2. **Width matters (to a point)**: too few nodes (Exp3) degraded performance strongly.
3. **Activation function**: ReLU and Sigmoid both reached perfect accuracy in the deep model, but **ReLU had better test loss**.
4. **Epochs**: more epochs reduced loss substantially, while accuracy changed only slightly.
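
The cells above do not fix a seed for weight initialization, and each result comes from a single training run. A possible way to check how stable these conclusions are is to repeat each experiment over several seeds and report the mean and spread. The following is a minimal sketch of that aggregation, not part of the original experiments. `run_once` is a hypothetical callable that would build, train, and evaluate one model for a given seed (for example after calling `keras.utils.set_random_seed(seed)`), and `fake_run` is a stand-in that returns made-up numbers so the sketch runs without training anything.

```python
import random
import statistics

def summarize(values):
    """Mean and sample standard deviation of a metric across repeated runs."""
    return statistics.mean(values), statistics.stdev(values)

def repeat_experiment(run_once, seeds):
    """Call run_once(seed) -> (test_loss, test_acc) for each seed and aggregate."""
    results = [run_once(seed) for seed in seeds]
    losses = [loss for loss, _ in results]
    accs = [acc for _, acc in results]
    return {"loss": summarize(losses), "acc": summarize(accs)}

def fake_run(seed):
    """Stand-in for a real training run: returns made-up, seed-dependent metrics."""
    rng = random.Random(seed)
    return 0.05 + rng.uniform(0.0, 0.05), 1.0 - rng.choice([0, 1]) / 60

summary = repeat_experiment(fake_run, seeds=range(5))
print(summary)
```

If the gap between two experiments is small compared with the standard deviation across seeds, the difference is hard to attribute to the factor being compared.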