# 8. Expressivity, Optimization and Generalization

[TOC]

## Decomposition of the Error

$\mathcal{F}$: the hypothesis class, and $f_{\mathcal{F}}^{\star} \in \mathcal{F}$ is the best hypothesis in the class. The error decomposes into three terms (written out in symbols after the list):

- Approximation error
  - decreases as $\mathcal{F}$ grows
  - but the size of $\mathcal{F}$ is typically bounded by computational constraints
- Estimation error
  - decreases as the sample size $n$ increases
  - can increase as $\mathcal{F}$ grows (according to VC theory)
- Optimization error
  - can increase as the tolerance $\rho$ increases
  - can increase as $\mathcal{F}$ gets more complex (non-convex objective)
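
In symbols (a standard formulation due to Bottou and Bousquet; the notation is an assumption on my part: $R$ is the risk, $f^{\star}$ the overall best predictor, $f_n$ the empirical risk minimizer in $\mathcal{F}$, and $\tilde{f}_n$ the approximate minimizer returned at tolerance $\rho$):

$$
\mathbb{E}\big[ R(\tilde{f}_n) - R(f^{\star}) \big]
= \underbrace{\mathbb{E}\big[ R(f_{\mathcal{F}}^{\star}) - R(f^{\star}) \big]}_{\text{approximation}}
+ \underbrace{\mathbb{E}\big[ R(f_n) - R(f_{\mathcal{F}}^{\star}) \big]}_{\text{estimation}}
+ \underbrace{\mathbb{E}\big[ R(\tilde{f}_n) - R(f_n) \big]}_{\text{optimization}}
$$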

## Expressivity and Universal Function Approximation
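
The universal approximation theorem says a single hidden layer with enough units can approximate any continuous function on a compact set arbitrarily well. As a minimal numerical sketch of this (my own illustration, not from the notes: random tanh features with an output layer fit by least squares, on an assumed toy target $\sin(3x) + 0.5x$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on a compact interval.
x = np.linspace(-np.pi, np.pi, 500)[:, None]
y = np.sin(3 * x) + 0.5 * x

def one_hidden_layer_fit(width):
    """Fix random hidden weights, solve the output layer by least squares."""
    W = rng.normal(scale=2.0, size=(1, width))
    b = rng.uniform(-np.pi, np.pi, size=width)
    H = np.tanh(x @ W + b)                       # hidden activations, (500, width)
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.sqrt(np.mean((H @ coef - y) ** 2))  # RMSE of the fit

for width in [2, 8, 32, 128, 512]:
    print(f"width={width:4d}  RMSE={one_hidden_layer_fit(width):.4f}")
```

The fit error shrinks as the width grows, matching the approximation-error bullet above: expressivity increases with $\mathcal{F}$.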

## Optimization for Deep Networks
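
A toy sketch of one point from the conclusions below, that saddle structures are rarely a problem for stochastic gradient methods (my own hypothetical example, not from the notes): on $f(w) = w_1^2 - w_2^2$, gradient noise kicks the iterate off the saddle at the origin, and the negative curvature then amplifies the escape.

```python
import numpy as np

rng = np.random.default_rng(1)

# f(w) = w[0]**2 - w[1]**2 has a strict saddle at the origin:
# the gradient vanishes there, yet w[1] is an escape direction.
def grad(w):
    return np.array([2 * w[0], -2 * w[1]])

w = np.array([1.0, 0.0])                    # start exactly on the saddle manifold
for step in range(1, 101):
    noise = rng.normal(scale=1e-3, size=2)  # stand-in for minibatch gradient noise
    w = w - 0.05 * (grad(w) + noise)
    if step % 25 == 0:
        print(f"step {step:3d}: w = {w}")
# w[0] converges to 0 (the stable direction), while the noise kicks
# w[1] off the saddle and negative curvature amplifies it exponentially.
```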

## Generalization

### Overfitting

Adding hidden units:

- does not lead to more overfitting
- makes optimization easier (fewer steps to converge)
- makes computation slower

Adding layers:

- does not cause much more overfitting either
- can cause optimization issues (underfitting); both effects are illustrated in the sketch after this list
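
Both claims can be probed with a small sweep (a sketch under assumptions of my own: PyTorch, a noisy 1-D regression task $y = \sin(2x) + \varepsilon$, 100 training points, full-batch Adam):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small noisy regression task: compare train/test error as the network grows.
def make_data(n):
    x = torch.rand(n, 1) * 6 - 3
    y = torch.sin(2 * x) + 0.3 * torch.randn(n, 1)
    return x, y

x_train, y_train = make_data(100)
x_test, y_test = make_data(1000)

def mlp(width, depth):
    layers, d_in = [], 1
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.Tanh()]
        d_in = width
    layers.append(nn.Linear(d_in, 1))
    return nn.Sequential(*layers)

def train_and_eval(width, depth, steps=2000):
    model = mlp(width, depth)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x_train), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        test = nn.functional.mse_loss(model(x_test), y_test).item()
    return loss.item(), test

for width in [4, 32, 256]:      # width sweep: test error should not blow up
    print("width", width, train_and_eval(width, depth=1))
for depth in [1, 4, 16]:        # depth sweep: very deep plain nets may underfit
    print("depth", depth, train_and_eval(width=32, depth=depth))
```

With settings like these, widening at fixed depth tends to leave the test error stable or better, while very deep plain tanh stacks often train poorly, i.e. they underfit rather than overfit.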

## Conclusions

- DL optimization is non-convex, but bad local minima and saddle structures are rarely a problem in practice
- A stronger optimizer is not necessarily a stronger learner
- Neural networks are over-parameterized but can still generalize
- We need more theory to guide the design of architectures and optimizers that make learning faster with fewer labels