# Exercise 1 - PointMLP (ModelNet40_PLY)

## 1) Architecture check against the PDF instructions

| Requirement from Exercise 1 | In `Code/pointnet.py` | Status |
|---|---|---|
| Flatten point cloud `1024 x 3 -> 3072` | `x = input.reshape(input.size(0), -1)` in `PointMLP.forward` | OK |
| First layer `MLP(3072, 512)` | `self.fc1 = nn.Linear(3072, 512)` | OK |
| Second layer `MLP(512, 256)` | `self.fc2 = nn.Linear(512, 256)` | OK |
| Dropout `p = 0.3` on second stage | `self.dropout = nn.Dropout(0.3)` | OK |
| Last layer `MLP(256, N_classes)` | `self.fc3 = nn.Linear(256, classes)` | OK |
| BatchNorm + ReLU on hidden layers | `bn1/bn2` + `F.relu` after `fc1/fc2` | OK |
| `LogSoftmax` output for class scores | `self.log_softmax = nn.LogSoftmax(dim=1)` | OK |

Note: there is no BatchNorm/ReLU after the final classification layer, which is the standard design before `LogSoftmax`.
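
To get a feel for where the capacity of this architecture sits, the layer sizes from the table can be turned into a parameter count. This is only a back-of-the-envelope sketch: it assumes each linear layer has a bias and each BatchNorm layer has a learnable scale and shift (the PyTorch defaults), and the helper names are illustrative, not taken from `Code/pointnet.py`.

```python
# Layer sizes taken from the table above; 40 classes for ModelNet40.
N_POINTS, N_COORDS, N_CLASSES = 1024, 3, 40

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer (bias assumed present)."""
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    """Learnable scale and shift of a BatchNorm layer (running statistics excluded)."""
    return 2 * n_features

flat = N_POINTS * N_COORDS  # 3072 inputs after flattening
counts = {
    "fc1": linear_params(flat, 512),
    "bn1": batchnorm_params(512),
    "fc2": linear_params(512, 256),
    "bn2": batchnorm_params(256),
    "fc3": linear_params(256, N_CLASSES),
}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n:>9,d} ({100 * n / total:.1f}%)")
print(f"total: {total:,d}")
```

Under these assumptions, more than 90% of the parameters sit in `fc1`, whose weights are tied to specific positions in the flattened input.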

## 2) PointMLP architecture used

![architecture MLP](figures/architecture_mlp.png)

## 3) Training curves analysis (250 epochs)

![training curves](figures/training_curves.png)

- Loss decreases quickly in the first 20-40 epochs, then reaches a plateau.
- End of training from the curves: train loss ~2.45, test loss ~2.55.
- Accuracy also saturates early (around epoch 60-80).
- End of training from the curves: train accuracy ~27-28%, test accuracy ~22-23%.
- The train/test gap is about 5 points, but both accuracies stay low, so the main problem is underfitting (this architecture has limited ability to represent the task) rather than overfitting.

## 4) Final metrics measured on the saved model (`pointmlp_modelnet40.pth`)

Evaluation done with deterministic preprocessing (no random rotation/noise/shuffle during evaluation):

| Split | N samples | NLL loss | Accuracy |
|---|---:|---:|---:|
| Train | 9843 | 2.3570 | 29.27% |
| Test | 2468 | 2.5317 | 22.97% |

For reference, random guessing on ModelNet40 gives `1/40 = 2.5%`, so at 22.97% the model is roughly nine times better than chance but still far from good classification performance.
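
The same chance-level reference can be computed for the NLL column. The sketch below is a plain-Python illustration (the function name `nll_and_accuracy` is hypothetical, not from the project code): it evaluates mean NLL and accuracy from per-sample log-probabilities and applies them to a predictor that outputs a uniform distribution over 40 classes.

```python
import math

def nll_and_accuracy(log_probs, labels):
    """Mean negative log-likelihood and accuracy from per-sample log-probabilities."""
    n = len(labels)
    nll = -sum(lp[y] for lp, y in zip(log_probs, labels)) / n
    correct = sum(
        max(range(len(lp)), key=lp.__getitem__) == y  # argmax over classes
        for lp, y in zip(log_probs, labels)
    )
    return nll, correct / n

n_classes = 40
uniform = [math.log(1.0 / n_classes)] * n_classes  # a predictor that ignores its input
labels = list(range(n_classes))                    # one sample per class
nll, acc = nll_and_accuracy([uniform] * n_classes, labels)
print(f"chance-level NLL = {nll:.4f}, accuracy = {acc:.3f}")
```

A uniform predictor has NLL equal to ln 40 (about 3.69), so the measured losses in the table above are below the chance level, in line with the accuracy being above 2.5%.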

## 4bis) Evaluation protocol note

In the current training script (`Code/pointnet.py`), `test_ds` is created with the default transforms, which include random rotation, noise, and shuffling, so the test curves in section 3 are computed on randomly perturbed inputs. For strict benchmarking, test preprocessing should be deterministic (`ToTensor` only).
This is why we report both (1) the training-curve trend and (2) deterministic final metrics from the saved model.

## 5) Answer to Exercise 1 questions

1. **Test accuracy (ModelNet40_PLY, PointMLP):** **22.97%**.
2. **Comment:** the network underfits. A fully connected MLP on flattened coordinates does not model geometric structure well (local neighborhoods, shape composition, permutation invariance), so performance saturates quickly at a low level.

## 6) Why results are limited

- Flattening destroys explicit 3D locality.
- The model is sensitive to point ordering, while point clouds are unordered sets.
- With random point shuffling, the same shape can appear in different flattened orders, which is especially difficult for a plain MLP.
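
The ordering issue can be illustrated without any deep-learning library. In the minimal sketch below (toy sizes, arbitrary random weights, hypothetical helper names), one output unit of a fully connected layer is evaluated on a flattened cloud and on the same cloud with its points listed in reverse order.

```python
import random

def flatten(points):
    """Concatenate (x, y, z) triples into one vector, as done before the first linear layer."""
    return [c for p in points for c in p]

def linear_unit(weights, x):
    """One output unit of a fully connected layer (bias omitted for brevity)."""
    return sum(w * v for w, v in zip(weights, x))

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]
weights = [rng.uniform(-1, 1) for _ in range(3 * len(points))]  # arbitrary fixed weights

reordered = points[::-1]  # same set of points, different order

out_original = linear_unit(weights, flatten(points))
out_reordered = linear_unit(weights, flatten(reordered))
print(out_original, out_reordered)
```

The two clouds describe the same shape, yet the unit responds differently because each weight is attached to a position in the flattened vector rather than to a point.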


# Exercise 2 - PointNetBasic (ModelNet40_PLY)

## 1) Implemented network (basic PointNet, no T-Net)

Shared MLP with `Conv1d(kernel_size=1)` layers, global max pooling, and final MLP classifier with dropout `p=0.3`:

![PointNetBasic architecture](figures/architecture_pointnetbasic.png)
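
The key property of this design is that a `Conv1d` layer with `kernel_size=1` applies the same weights to every point, and the max pooling that follows does not depend on point order. The sketch below reproduces that mechanism in plain Python on a toy scale (one shared layer with 4 channels, arbitrary weights, no bias); it is an illustration of the principle, not the network in `Code/pointnet.py`.

```python
import random

def shared_layer(point, weights):
    """Apply the same linear map + ReLU to one point (what a kernel_size=1 conv does per point)."""
    return [max(0.0, sum(w * c for w, c in zip(row, point))) for row in weights]

def global_max_pool(features):
    """Channel-wise maximum over all points."""
    return [max(channel) for channel in zip(*features)]

def encode(points, weights):
    """Per-point features followed by global max pooling."""
    return global_max_pool([shared_layer(p, weights) for p in points])

rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(16)]
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 -> 4 channels

reordered = points[::-1]  # same cloud, points listed in reverse order
print(encode(points, weights) == encode(reordered, weights))  # prints True
```

Because each point is processed independently and the maximum is taken over the whole set, any permutation of the input yields the same global feature vector.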

## 2) Training setup and early stopping

- Device: CUDA (`cuda:0`)
- Requested max epochs: 250
- Early stopping: patience = 30 (on validation/test loss)
- Stop epoch: 70
- Best epoch restored: 40 (best validation/test loss = 0.538)
- Train split: 9843 samples
- Test split: 2468 samples
- Validation/test preprocessing: deterministic (`ToTensor` only)

## 2bis) Early stopping design

Early stopping was added to control overfitting and avoid unnecessary epochs. The design is:

- Monitored metric: validation/test NLL loss (`avg_test_loss`) at each epoch.
- Improvement rule: an epoch is considered better only if `avg_test_loss < best_test_loss - min_delta` (here `min_delta = 0.0`).
- Patience: if no improvement is observed for 30 consecutive epochs, training stops.
- Best checkpoint handling: whenever validation loss improves, model weights are copied; at the end, these best weights are restored before saving.
- Practical effect in this run: best epoch was 40 (loss 0.538), training stopped at epoch 70, and the final saved model corresponds to epoch 40 rather than epoch 70.

This keeps the training objective simple (same optimizer/scheduler) while ensuring the exported model is the one with best validation generalization.
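
The rules above can be sketched as a small helper. This is a minimal illustration under stated assumptions: the class name `EarlyStopper`, the synthetic loss curve, and the dictionary standing in for the model weights are all hypothetical, and the real script may organize the logic differently.

```python
import copy

class EarlyStopper:
    """Tracks the best validation loss and signals when patience is exhausted."""

    def __init__(self, patience=30, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None
        self.bad_epochs = 0

    def step(self, epoch, loss, state):
        """Record one epoch; return True when training should stop."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch = loss, epoch
            self.best_state = copy.deepcopy(state)  # snapshot of the best weights
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def fake_loss(epoch):
    """Synthetic loss curve with its minimum at epoch 40 (illustrative values only)."""
    return 0.538 + 0.001 * abs(epoch - 40)

stopper = EarlyStopper(patience=30, min_delta=0.0)
for epoch in range(1, 251):
    state = {"epoch": epoch}  # stand-in for the model weights
    if stopper.step(epoch, fake_loss(epoch), state):
        break
print(stopper.best_epoch, epoch)
```

With a loss minimum at epoch 40 and patience 30, the loop stops at epoch 70 and `best_state` holds the epoch-40 snapshot, which mirrors the behavior reported for this run.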

Training curves:

![PointNetBasic training curves](figures/pointnetbasic_training_curves.png)

## 3) Final quantitative comparison (deterministic evaluation)

| Model | Train Acc | Test Acc | Train NLL | Test NLL |
|---|---:|---:|---:|---:|
| PointMLP (saved model) | 29.27% | 22.97% | 2.3570 | 2.5317 |
| PointNetBasic (best early-stop model) | 93.15% | 85.17% | 0.1937 | 0.5264 |

Global metric comparison:

![MLP vs PointNetBasic](figures/comparison_mlp_vs_pointnetbasic.png)

## 4) Confusion matrices and class-wise differences

PointNetBasic confusion matrix (test):

![PointNetBasic confusion](figures/confusion_pointnetbasic_test.png)

PointMLP confusion matrix (test):

![PointMLP confusion](figures/confusion_pointmlp_test.png)

Largest per-class accuracy deltas (PointNetBasic - PointMLP):

![Per-class delta](figures/per_class_delta_pointnetbasic_minus_mlp.png)
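
Per-class deltas of this kind can be derived directly from two confusion matrices. The sketch below assumes that rows index the true class and columns the predicted class; the function names and the 3-class toy counts are made up for illustration and are not the ModelNet40 results.

```python
def per_class_accuracy(confusion):
    """Diagonal entry over row sum; assumes rows index the true class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(confusion)]

def per_class_delta(conf_a, conf_b):
    """Accuracy of model A minus accuracy of model B, class by class."""
    acc_a, acc_b = per_class_accuracy(conf_a), per_class_accuracy(conf_b)
    return [a - b for a, b in zip(acc_a, acc_b)]

# Toy 3-class matrices (made-up counts).
conf_pointnet = [[9, 1, 0], [1, 8, 1], [0, 2, 8]]
conf_mlp = [[5, 3, 2], [4, 4, 2], [3, 3, 4]]
print(per_class_delta(conf_pointnet, conf_mlp))
```

Sorting the resulting list by absolute value gives the classes where the two models differ most.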

## 5) Comment (Exercise 2 questions)

1. **Test accuracy with basic PointNet:** **85.17%**.
2. **Comparison to PointMLP:** PointNetBasic is much better because it extracts shared point-wise features and aggregates them with a permutation-invariant max pooling, which is far better suited to point clouds than flattening all points into one vector.

## 6) PointNet with input T-Net (Exercise 2 - second part)

For the next step, we trained `PointNetFull` with the first T-Net (`3x3` alignment matrix) and the same training policy (max 250 epochs + early stopping).

- Device: CUDA (`cuda:0`)
- Early stopping: triggered at epoch 91
- Best epoch restored: 61 (best validation/test loss = 0.512)

PointNetFull architecture:

![PointNetFull architecture](figures/architecture_pointnetfull.png)

PointNetFull training curves:

![PointNetFull training curves](figures/pointnetfull_training_curves.png)

### Quantitative results (deterministic evaluation)

| Model | Train Acc | Test Acc | Train NLL | Test NLL |
|---|---:|---:|---:|---:|
| PointMLP | 29.27% | 22.97% | 2.3570 | 2.5317 |
| PointNetBasic | 93.15% | 85.17% | 0.1937 | 0.5264 |
| PointNetFull (with T-Net) | 95.36% | 86.87% | 0.1289 | 0.4889 |

Comparison plot (MLP vs Basic vs Full):

![All model comparison](figures/comparison_mlp_basic_full.png)

PointNetFull confusion matrix (test):

![PointNetFull confusion](figures/confusion_pointnetfull_test.png)

Largest per-class deltas (PointNetFull - PointNetBasic):

![Per-class delta full-basic](figures/per_class_delta_pointnetfull_minus_basic.png)

### Comment

Adding the input T-Net improves the basic PointNet result from **85.17%** to **86.87%** on ModelNet40 test (+**1.70** points). This is consistent with the role of T-Net: learning a canonical alignment of input point clouds before feature extraction.
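
What "canonical alignment" means can be shown with a fixed matrix in place of the learned one. In the sketch below, a cloud is rotated about the z axis and then multiplied by the inverse rotation, which is what an ideal input T-Net would output for that pose. The row-vector convention and all function names are assumptions made for the illustration; in `PointNetFull` the `3x3` matrix is predicted by the network, not given.

```python
import math

def apply_transform(points, matrix):
    """Right-multiply each point (as a row vector) by a 3x3 matrix."""
    return [[sum(p[k] * matrix[k][j] for k in range(3)) for j in range(3)] for p in points]

def rotation_z(theta):
    """Rotation matrix about the z axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(m):
    return [list(col) for col in zip(*m)]

canonical = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.5], [-1.0, 1.0, 1.0]]
R = rotation_z(math.radians(35))
observed = apply_transform(canonical, R)           # cloud seen in an arbitrary pose
aligned = apply_transform(observed, transpose(R))  # ideal alignment: the inverse rotation

max_error = max(abs(a - c) for pa, pc in zip(aligned, canonical) for a, c in zip(pa, pc))
print(max_error)
```

The residual is at floating-point level, so the downstream feature extractor would see the same cloud regardless of the initial rotation.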


# Exercise 3 - Detailed explanation of the 3 added data augmentations

We denote one point cloud by:

$$\mathcal{P}=\{\mathbf{p}_i\}_{i=1}^{N},\quad \mathbf{p}_i\in\mathbb{R}^3$$

with fixed point count $N=1024$ in our setup.

The 3 methods added in code are:
- `RandomScaleShift`
- `RandomPointDropout`
- `RandomLocalPatchDropout`

They are defined in `Code/pointnet.py` and were copied from the validated protocol in `05_workspaces/student_workspace/scripts/pointnet_train.py`.

## 1) RandomScaleShift (course/classical geometric augmentation)

### Formula
A global isotropic scale and a global translation are sampled:

$$s\sim\mathcal{U}(s_{\min}, s_{\max}),\qquad \mathbf{t}\sim\mathcal{U}([-r,r]^3)$$

and applied to each point:

$$\mathbf{p}_i' = s\,\mathbf{p}_i + \mathbf{t},\quad i=1,\dots,N$$

In our implementation:
- $s_{\min}=0.8$, $s_{\max}=1.25$
- $r=0.1$
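
The formula above can be sketched in a few lines of plain Python. This is a dependency-free illustration operating on lists rather than tensors; the function name and signature are hypothetical and do not reproduce the `RandomScaleShift` class in `Code/pointnet.py`.

```python
import random

def random_scale_shift(points, s_min=0.8, s_max=1.25, r=0.1, rng=random):
    """Return (new_points, s, t) using one scale s and one shift t for the whole cloud."""
    s = rng.uniform(s_min, s_max)                # isotropic scale
    t = [rng.uniform(-r, r) for _ in range(3)]   # one offset per axis
    moved = [[s * c + tc for c, tc in zip(p, t)] for p in points]
    return moved, s, t

rng = random.Random(0)
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(1024)]
augmented, s, t = random_scale_shift(cloud, rng=rng)
print(s, t)
```

Since the same `s` and `t` are applied to every point, relative geometry is preserved up to a uniform scale.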

### Intuition (simple words)
The same object can be seen slightly bigger/smaller or shifted in sensor coordinates. This augmentation teaches the model to focus on shape rather than absolute size/position.

### Source and justification
- Primary source used in workspace notes: PointNet++ official provider (`random_scale_point_cloud`, `shift_point_cloud`):
  https://raw.githubusercontent.com/charlesq34/pointnet2/master/utils/provider.py
- Also aligned with course ideas about geometric nuisance robustness (L03 point cloud processing).

## 2) RandomPointDropout (custom, course-motivated)

### Formula
A dropout ratio is sampled:

$$\rho\sim\mathcal{U}(0,\rho_{\max})$$

Then each point is dropped independently with probability $\rho$.
With a Bernoulli mask $m_i\sim\mathrm{Bernoulli}(\rho)$ ($m_i=1$ means point $i$ is dropped), the implementation computes:

$$\mathbf{p}_i'=(1-m_i)\mathbf{p}_i + m_i\mathbf{p}_1$$

So dropped points are replaced by the first point to keep the tensor size fixed.
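
A minimal plain-Python sketch of this mask-and-replace rule follows. The function name is hypothetical, and the value `rho_max=0.5` in the example is purely illustrative since the report does not state the value used.

```python
import random

def random_point_dropout(points, rho_max, rng=random):
    """Replace each point by the first one with probability rho ~ U(0, rho_max)."""
    rho = rng.uniform(0.0, rho_max)
    first = points[0]
    mask = [rng.random() < rho for _ in points]  # True means "dropped"
    out = [list(first) if m else list(p) for p, m in zip(points, mask)]
    return out, mask

rng = random.Random(0)
cloud = [[float(i), 0.0, 0.0] for i in range(1024)]
augmented, mask = random_point_dropout(cloud, rho_max=0.5, rng=rng)  # 0.5 is illustrative
print(sum(mask), "points replaced out of", len(cloud))
```

The output keeps exactly $N$ points, so batching is unaffected; under max pooling, duplicates of an existing point do not introduce new feature values.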

### Intuition (simple words)
Real sensors often miss returns (sparse capture, self-occlusion, reflective surfaces). This simulates that by removing random measurements.

### Source and justification
- Documented in: `05_workspaces/student_workspace/scripts/AUGMENTATION_REFERENCES.md`
- Framed as a custom augmentation in the project protocol, motivated by course L01 (sensor limitations / missing measurements).
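- This style is also common in PointNet-family training practice: the PointNet++ provider referenced above includes a `random_point_dropout` function that likewise replaces dropped points with the first point.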

## 3) RandomLocalPatchDropout (custom, paper-inspired)

### Formula
Let $\alpha\in(0,1)$ be the local drop ratio (`drop_ratio`).

1. Choose a random pivot index $c\in\{1,\dots,N\}$.
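2. Compute squared Euclidean distances to the pivot (squaring does not change the nearest-neighbor ranking):

$$d_i = \lVert \mathbf{p}_i-\mathbf{p}_c\rVert_2^2$$

3. Drop the $K=\lfloor\alpha N\rfloor$ nearest points (local patch), i.e. the indices of the $K$ smallest $d_i$; the pivot itself is among them whenever $K\ge 1$ because $d_c=0$:

$$\mathcal{D}=\operatorname*{arg\,min}_{S\subseteq\{1,\dots,N\},\ |S|=K}\ \sum_{i\in S} d_i$$

4. The keep set is $\mathcal{K}=\{1,\dots,N\}\setminus\mathcal{D}$; the dropped slots are then refilled by sampling from the kept points to preserve size $N$.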
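
The four steps above can be sketched as follows. This is a plain-Python illustration with a hypothetical function name; refilling by sampling with replacement is an assumption, and `drop_ratio=0.2` in the example is an illustrative value, not the one used in the project.

```python
import random

def random_local_patch_dropout(points, drop_ratio, rng=random):
    """Drop the K nearest neighbours of a random pivot and refill from the kept points."""
    n = len(points)
    k = int(drop_ratio * n)                       # K = floor(alpha * N)
    pivot = points[rng.randrange(n)]
    sq_dist = [sum((a - b) ** 2 for a, b in zip(p, pivot)) for p in points]
    order = sorted(range(n), key=sq_dist.__getitem__)
    dropped, kept = set(order[:k]), order[k:]     # requires drop_ratio < 1 so kept is non-empty
    out = [list(p) for p in points]
    for i in dropped:
        out[i] = list(points[rng.choice(kept)])   # refill with replacement (assumption)
    return out, dropped

rng = random.Random(0)
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(1024)]
augmented, dropped = random_local_patch_dropout(cloud, drop_ratio=0.2, rng=rng)
print(len(dropped), "points removed around the pivot")
```

Unlike independent dropout, all removed indices come from one neighbourhood, so the augmented cloud has a contiguous hole while still containing $N$ points.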

### Intuition (simple words)
Instead of random isolated missing points, this removes a coherent local region (like a part hidden by occlusion), which is often more realistic in 3D scenes.

### Source and justification
- According to the project notes, this augmentation is inspired by **PointCutMix**:
  https://arxiv.org/abs/2101.01461
- The implementation is a simplified local-region removal strategy, consistent with the paper spirit (structured local perturbation).
- Also coherent with L01 (occlusion/sensing effects).

## Why these 3 are complementary

- `RandomScaleShift`: global geometric perturbation (pose/scale nuisance).
- `RandomPointDropout`: unstructured sparsity/noise in acquisition.
- `RandomLocalPatchDropout`: structured local occlusion.

So they stress the model at different levels (global transform, random missing samples, local missing region), which generally improves robustness and generalization.

## Reference mapping used in this report

From `05_workspaces`:
- `student_workspace/scripts/AUGMENTATION_REFERENCES.md`
- `student_workspace/scripts/pointnet_train.py` (augmentation class comments)
- `agent_workspace/docs/AGENT_PROJECT_BRIEF.md` (L01/L02/L03 course alignment guidance)
