[TOC]
- Approximation error
    - decreases when $\mathcal{F}$ increases (but typically bounded by computational constraints)
- Estimation error
    - decreases when $n$ increases
    - can increase when $\mathcal{F}$ increases (according to VC theory)
- Optimization error
    - can increase when the tolerance $\rho$ increases
    - can increase when $\mathcal{F}$ gets more complex (non-convex objective)
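One common way to make these three terms precise (this follows the Bottou–Bousquet style of analysis; the symbols $E$, $\hat{E}$, $f^*$, $f^*_{\mathcal{F}}$, $f_n$, and $\tilde{f}_n$ are introduced here for illustration and are not defined elsewhere in these notes):

$$
\underbrace{E(\tilde{f}_n) - E(f^*)}_{\text{excess error}}
= \underbrace{E(f^*_{\mathcal{F}}) - E(f^*)}_{\text{approximation}}
+ \underbrace{E(f_n) - E(f^*_{\mathcal{F}})}_{\text{estimation}}
+ \underbrace{E(\tilde{f}_n) - E(f_n)}_{\text{optimization}}
$$

where $E$ is the expected risk, $f^*$ the best possible predictor, $f^*_{\mathcal{F}}$ the best predictor in $\mathcal{F}$, $f_n$ the empirical risk minimizer on $n$ samples, and $\tilde{f}_n$ the solution actually returned by the optimizer when stopped at tolerance $\rho$ (i.e. $\hat{E}(\tilde{f}_n) \le \hat{E}(f_n) + \rho$, with $\hat{E}$ the empirical risk).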
## Overfitting
Adding hidden units (see the sketch after the next list):
- does not lead to more overfitting
- makes optimization easier (fewer steps)
- makes computation slower
Adding layers:
- does not cause much more overfitting either
- can cause optimization issues (underfitting)
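A minimal sketch of how one could probe these claims on a toy problem, assuming scikit-learn is available (the dataset, widths, and depths below are arbitrary choices for illustration, not taken from these notes): widen a one-hidden-layer MLP and compare the train/test gap, then stack layers of fixed width and check whether training accuracy itself starts to drop.

```python
# Rough width/depth comparison on a toy dataset (all sizes are illustrative).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Widening: one hidden layer with more and more units; watch the train/test gap.
for width in (8, 64, 512):
    clf = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"width={width:4d}  train={clf.score(X_tr, y_tr):.3f}  test={clf.score(X_te, y_te):.3f}")

# Deepening: stack more hidden layers of fixed width; watch the training accuracy.
for depth in (1, 4, 16):
    clf = MLPClassifier(hidden_layer_sizes=(64,) * depth, max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"depth={depth:3d}  train={clf.score(X_tr, y_tr):.3f}  test={clf.score(X_te, y_te):.3f}")
```

On a run like this, the wider models usually do not show a larger train/test gap, while the deepest plain MLP may struggle to fit even the training set — the underfitting-from-optimization issue noted above.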
- Deep learning optimization is non-convex, but bad local minima and saddle points are rarely a problem in practice
- A stronger optimizer is not necessarily a stronger learner
- Neural networks are over-parameterized but can still generalize (see the parameter-count sketch below)
- We need more theory to guide the design of architectures and optimizers that make learning faster with fewer labels
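To make "over-parameterized" concrete, a quick back-of-the-envelope count (the layer sizes and training-set size are hypothetical, chosen only for illustration):

```python
# Count weights and biases in a fully connected network (hypothetical sizes).
def mlp_param_count(layer_sizes):
    """Total number of weights + biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

n_train = 50_000                                    # e.g. a CIFAR-10-sized training set
n_params = mlp_param_count([3072, 1024, 1024, 10])  # 32x32x3 inputs, two hidden layers, 10 classes
print(n_params, n_params / n_train)                 # ~4.2M parameters, ~84 per training example
```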