
Updated markdown in notebooks

alabatie committed May 13, 2019
1 parent 39b927b commit 2c42f0d290c2d70ef3b7ae1ff8fe7af520b52263
@@ -4,18 +4,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fully-connected networks\n",
"# Fully-Connected Networks\n",
"\n",
"This notebook provides results equivalent to Fig. $2$, $3$, $4$, $5$ for fully-connected networks. All parameters are left unchanged apart from the parameters `batch_size` and `kernel_size`. The fully-connected behaviour is simply enforced by setting `kernel_size = 1` with the effect of flattening the original images and reducing the spatial extent to $n=1$.\n",
"This notebook considers experiments equivalent to Fig. $2$, $3$, $4$, $5$ for fully-connected networks. All parameters are left unchanged apart from the parameters `batch_size` and `kernel_size`. The fully-connected behaviour is simply enforced by setting `kernel_size = 1` with the effect of flattening the original images and reducing the spatial extent to $n=1$.\n",
"\n",
"For the experiments of Fig. $3$, $4$, $5$, the number of realizations is reduced to `num_realizations = 200` for reasons of computing time. This has the effect of making the curves slightly more noisy, but it is already enough to gain insights."
"For the experiments of Fig. $3$, $4$, $5$, the number of realizations is reduced to `num_realizations = 200`. This has the effect of making the curves slightly more noisy, but it is already enough to gain insights."
]
},
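As a sanity check of this trick, here is a minimal numpy sketch (not part of the repository) showing that a convolution with `kernel_size = 1` acting on a signal of spatial extent $n = 1$ is exactly a fully-connected layer:

```python
import numpy as np

# A 1x1 convolution acts independently on each spatial position, so once the
# images are flattened to spatial extent n = 1 it reduces to a dense layer.
batch, channels_in, channels_out = 4, 8, 16
x = np.random.randn(batch, 1, 1, channels_in)   # flattened images, n = 1
w = np.random.randn(channels_in, channels_out)  # the 1x1 kernel is just a weight matrix
conv_out = np.einsum('bhwc,cd->bhwd', x, w)     # "convolution" with kernel_size = 1
dense_out = x.reshape(batch, channels_in) @ w   # plain fully-connected layer
assert np.allclose(conv_out.reshape(batch, channels_out), dense_out)
```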
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Moments of vanilla nets\n",
"## Moments of Vanilla Nets\n",
"\n",
"The results here are equivalent to Fig. $2$ with the only difference that $\\log \\mu_2(\\mathrm{d}\\mathbf{x}^l)$ diffuses faster. "
]
@@ -29,7 +29,7 @@
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# this computation ran in the cloud, but it is left here to show the function call\n",
"# this experiment ran in the cloud, but it is left here to show the function call\n",
"run_experiment(architecture='vanilla', total_depth=200, kernel_size=1, num_channels=128, \n",
" dataset='cifar10', # boundary conditions are not relevant for fully-connected networks\n",
" batch_size=1024, num_realizations=10000, name_experiment='vanilla_histo_FC', \n",
@@ -71,9 +71,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evolution of vanilla nets\n",
"## Evolution of Vanilla Nets\n",
"\n",
"The evolution of fully-connected vanilla nets is equivalent to the evolution of Fig. $3$, with the convergence to $\\delta \\chi^l \\to 1$ and the convergence to the pathology of one-dimensional signal: $r_\\text{eff}(\\mathbf{x}^l) \\to 1$. "
"The evolution of fully-connected vanilla nets is equivalent to the evolution of Fig. $3$ with the convergence to $\\delta \\chi^l \\to 1$ and the convergence to the pathology of one-dimensional signal: $r_\\text{eff}(\\mathbf{x}^l) \\to 1$. "
]
},
{
@@ -85,7 +85,7 @@
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# this computation ran in the cloud, but it is left here to show the function call\n",
"# this experiment ran in the cloud, but it is left here to show the function call\n",
"run_experiment(architecture='vanilla', total_depth=200, kernel_size=1, num_channels=512, \n",
" dataset='cifar10', # boundary conditions are not relevant for fully-connected networks\n",
" batch_size=64, num_realizations=200, name_experiment='vanilla_fc', \n",
@@ -124,13 +124,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evolution of batch-normalized feedforward nets\n",
"## Evolution of Batch-Normalized Feedforward Nets\n",
"\n",
"*As in Fig. $4$, fully-connected batch-normalized feedforward nets are subject to the pathology of exploding sensitivity. However, there is an earlier saturation of the pathologies in the signal, with relatively large $r_\\text{eff}(\\mathbf{x}^l)$ and low $\\mu_4(\\mathbf{z}^l)$.* \n",
"\n",
"The reason is that rare events cannot be arbitrarily rare for a distribution comprised of a finite number of point masses, $\\frac{1}{M} \\sum^M_{i=1} \\delta_{p_i}$. Indeed it is shown in the paper that the kurtosis of $M$ batch-normalized point masses is roughly bounded by $M$. Since different spatial positions $\\alpha$ count as different samples for convolutional networks, a batch of 64 images of size $16 \\times 16$ for convolutional networks is equivalent to a batch of $16,384$ samples for fully-connected networks.\n",
"The reason is that rare events cannot be arbitrarily rare for a distribution comprised of a finite number of point masses, $\\frac{1}{M} \\sum^M_{i=1} \\delta_{p_i}$. Indeed it is shown in the paper that the kurtosis of $M$ point masses is roughly bounded by $M$. Since different spatial positions $\\alpha$ count as different samples for convolutional networks, a batch of 64 images of size $16 \\times 16$ for convolutional networks is equivalent to a batch of $16,384$ samples for fully-connected networks.\n",
"\n",
"Smaller batch sizes leads to larger effect of the nonlinearity: $\\delta^{}_\\phi \\chi^l$, and smaller effect of batch normalization: $\\delta^{}_\\text{BN} \\chi^l$. Interestingly, this also leads to the noise geting ill-conditioned with lower $r_\\text{eff}(\\mathbf{x}^l)$."
"Smaller batch size leads to larger $\\delta^{}_\\phi \\chi^l$ and smaller $\\delta^{}_\\text{BN} \\chi^l$. Interestingly, this also leads to the noise geting ill-conditioned with lower $r_\\text{eff}(\\mathrm{d}\\mathbf{x}^l)$."
]
},
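To make the bound concrete, here is a small numpy sketch (illustrative, not part of the notebooks) constructing the worst case of $M$ batch-normalized point masses, a single outlier, whose kurtosis indeed comes out just below $M$:

```python
import numpy as np

# Worst case for heavy tails: one outlier among M batch-normalized point
# masses. Its kurtosis evaluates to (M-1)^2/M + 1/(M(M-1)) ~ M - 2, i.e.
# roughly bounded by M.
M = 64 * 16 * 16              # batch of 64 images of size 16x16 -> M = 16,384 samples
x = np.zeros(M)
x[0] = 1.0                    # a single point mass carrying all the variance
z = (x - x.mean()) / x.std()  # batch normalization of the M point masses
print(np.mean(z**4), M)       # kurtosis ~ 16,382 vs M = 16,384
```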
{
@@ -142,7 +142,7 @@
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# these computations ran in the cloud, but they are left here to show the function calls\n",
"# these experiments ran in the cloud, but they are left here to show the function calls\n",
"run_experiment(architecture='bn_ff', total_depth=200, kernel_size=1, num_channels=512, \n",
" dataset='cifar10', # boundary conditions are not relevant for fully-connected networks\n",
" batch_size=64, num_realizations=200, name_experiment='bn_ff_fc_64', \n",
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evolution of batch-normalized resnets\n",
"## Evolution of Batch-Normalized Resnets\n",
"\n",
"*The evolution of batch-normalized resnets is the slowed down version of the evolution of batch-normalized feedforward nets*, with perfect power-law fit of $\\chi^l$ and the subplots (a), (c), (d) very similar to the subplots (a), (c), (d) for batch-normalized feedforward nets."
]
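As an illustration of such a power-law fit, the sketch below (using synthetic data, not the experiment's actual outputs) fits $\chi^l \sim l^\alpha$ by linear regression in log-log space:

```python
import numpy as np

# Synthetic illustration of a power-law fit chi^l ~ l**alpha; in practice
# the chi values would come from the experiment's stored outputs.
l = np.arange(1, 501)
chis = 0.5 * l**0.4 * np.exp(0.01 * np.random.randn(l.size))  # synthetic data
alpha, log_c = np.polyfit(np.log(l), np.log(chis), 1)         # fit in log-log space
print(f"fitted exponent alpha = {alpha:.3f}")                 # ~0.4 here by construction
```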
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# this computation ran in the cloud, but it is left here to show the function call\n",
"# this experiment ran in the cloud, but it is left here to show the function call\n",
"run_experiment(architecture='bn_res', total_depth=500, kernel_size=1, num_channels=512, \n",
" dataset='cifar10', res_depth=2, # boundary conditions are not relevant for fully-connected networks\n",
" batch_size=64, num_realizations=200, name_experiment='bn_res_fc', \n",
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Complements on the effect of $N_l$, boundary conditions, dataset, $\\epsilon$\n",
"# Complements on the Effect of $N_l$, Boundary Conditions, Dataset, $\\epsilon$\n",
"\n",
"This notebook looks at the influence of four factors:\n",
"\n",
@@ -13,24 +13,24 @@
"* the input dataset\n",
"* the fuzz factor $\\epsilon$ of batch normalization\n",
"\n",
"At times, the number of realizations will be reduced to `num_realizations = 200`. This has the effect of making the curves slightly more noisy, but it is already enough to gain insights."
"At times, the number of realizations is reduced to `num_realizations = 200`. This has the effect of making the curves slightly more noisy, but it is already enough to gain insights."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Influence of the width $N_l$\n",
"## Influence of the Width $N_l$\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This experiment share the same parameters as Fig. $2$ except for `num_channels = 512`. \n",
"In this experiment, parameters are set as in Fig. $2$ except for `num_channels = 512`. \n",
"\n",
"The observed behaviour is absolutely equivalent to Fig. $2$, apart from the diffusion of $\\log \\nu_2(\\mathbf{x}^l)$, $\\log \\mu_2(\\mathrm{d}\\mathbf{x}^l)$ being slowed down by a factor ≈ 4."
"The behaviour is absolutely equivalent to Fig. $2$ apart from the diffusion of $\\log \\nu_2(\\mathbf{x}^l)$, $\\log \\mu_2(\\mathrm{d}\\mathbf{x}^l)$ being slowed down by a factor ≈ 4."
]
},
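The factor ≈ 4 matches the width ratio, consistent with the diffusion variance scaling like $1/N_l$. A rough standalone simulation of this scaling (not part of the repository) could look like:

```python
import numpy as np

# Rough simulation: in finite-width vanilla ReLU nets with He initialization,
# log nu_2(x^l) performs a random walk whose variance grows like depth / N_l,
# so multiplying the width by 4 slows the diffusion by a factor ~4.
def final_log_nu2(width, depth=20, num_realizations=50, seed=0):
    rng = np.random.default_rng(seed)
    finals = []
    for _ in range(num_realizations):
        x = rng.standard_normal(width)
        for _ in range(depth):
            w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)  # He init
            x = np.maximum(w @ x, 0.0)  # ReLU
        finals.append(np.log(np.mean(x**2)))
    return np.array(finals)

for width in (128, 512):
    print(width, final_log_nu2(width).var())  # variance ratio across widths ~ 4
```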
{
@@ -42,10 +42,7 @@
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# this computation ran in the cloud, but it is left here to show the function call\n",
"# if computation time is an issue, it is possible to reduce:\n",
"# - the number of channels\n",
"# - the number of realizations (in which case, the histograms will be more noisy)\n",
"# this experiment ran in the cloud, but it is left here to show the function call\n",
"run_experiment(architecture='vanilla', total_depth=200, kernel_size=3, num_channels=512, \n",
" boundary='periodic', dataset='cifar10',\n",
" batch_size=1024, num_realizations=10000, name_experiment='vanilla_histo_512', \n",
@@ -85,20 +82,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the experiments of Fig. $3$, $4$, $5$, an equivalent behaviour is observed apart from changes in $1\\sigma$ intervals as long as $N_l \\gg 1$."
"The evolution of vanilla nets, batch-normalized feedforward nets and batch-normalized resnets is always equivalent to Fig. $3$, $4$, $5$, with only changes in $1\\sigma$ intervals, when changing the width $N_l$ while keeping $N_l \\gg 1$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Influence of boundary conditions for vanilla nets\n",
"## Influence of Boundary Conditions for Vanilla Nets\n",
"\n",
"In these two experiments, parameters are set as in Fig. $3$ except for `boundary = 'symmetric'` and `boundary = 'zero_padding'`:\n",
"In these experiments, parameters are set as in Fig. $3$ except for `boundary = 'symmetric'` and `boundary = 'zero_padding'`:\n",
"\n",
"* In the case of symmetric boundary conditions, there is a fully equivalent behaviour to Fig. $3$, with $\\delta \\chi^l \\to 1$ and the convergence to the pathology of one-dimension signal: $r_\\text{eff}(\\mathbf{x}^l) \\to 1$;\n",
"* In the case of symmetric boundary conditions, there is a fully equivalent behaviour to Fig. $3$ with $\\delta \\chi^l \\to 1$ and the convergence to the pathology of one-dimension signal: $r_\\text{eff}(\\mathbf{x}^l) \\to 1$;\n",
"\n",
"* In the case of zero-padding boundary conditions, there is an equivalent behaviour of $\\chi^l$ with $\\delta \\chi^l \\to 1$. However, the effective rank $r_\\text{eff}(\\mathbf{x}^l)$ does not converge to $1$, but rather to a value close to $2$. Indeed with periodic or symmetric boundary conditions, the signal becomes homogeneous with respect to $\\alpha$ so that receptive fields remain one-dimensional. But this mechanism is hindered by zero-padding with semi-padded receptive fields creating new directions of variance in $r_\\text{eff}(\\mathbf{x}^l)$."
"* In the case of zero-padding boundary conditions, there is an equivalent behaviour of $\\chi^l$ with $\\delta \\chi^l \\to 1$. However, the effective rank $r_\\text{eff}(\\mathbf{x}^l)$ does not converge to $1$, but rather to ≈ $2$. Indeed with periodic or symmetric boundary conditions, the signal becomes homogeneous with respect to $\\alpha$ such that receptive fields become one-dimensional. This mechanism however is hindered by zero-padding with semi-padded receptive fields creating new directions of variance in $r_\\text{eff}(\\mathbf{x}^l)$."
]
},
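For reference, the sketch below computes an effective rank from the eigenvalues of the signal's covariance, assuming for illustration the definition $r_\text{eff}(C) = \mathrm{tr}(C) / \lambda_\text{max}(C)$ (an assumption here; the paper's exact definition may differ):

```python
import numpy as np

# Assumed definition (for illustration only): r_eff = tr(C) / lambda_max(C),
# equal to 1 for a one-dimensional signal. The paper's definition may differ.
def effective_rank(x):
    """x: array of shape (num_samples, num_features)."""
    eigvals = np.linalg.eigvalsh(np.cov(x, rowvar=False))
    return eigvals.sum() / eigvals.max()

one_dim = np.outer(np.random.randn(1000), np.random.randn(16))  # rank-one signal
print(effective_rank(one_dim))                    # ~1, the pathological case
print(effective_rank(np.random.randn(1000, 16)))  # >> 1 for isotropic noise
```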
{
@@ -110,7 +107,7 @@
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# these computations ran in the cloud, but they are left here to show the function calls\n",
"# these experiments ran in the cloud, but they are left here to show the function calls\n",
"run_experiment(architecture='vanilla', total_depth=200, kernel_size=3, num_channels=512, \n",
" boundary='symmetric', dataset='cifar10',\n",
" batch_size=64, num_realizations=200, name_experiment='vanilla_symmetric', \n",
@@ -128,7 +125,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -153,7 +150,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@@ -187,7 +184,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Influence of boundary conditions for batch-normalized feedforward nets\n",
"## Influence of Boundary Conditions for Batch-Normalized Feedforward Nets\n",
"\n",
"In this experiment, parameters are set as in Fig. $4$ except for `boundary = 'zero_padding'`. In this case, zero-padding conditions lead to a fully equivalent behaviour to Fig. $4$."
]
@@ -201,7 +198,7 @@
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# this computation ran in the cloud, but it is left here to show the function call\n",
"# this experiment ran in the cloud, but it is left here to show the function call\n",
"run_experiment(architecture='bn_ff', total_depth=200, kernel_size=3, num_channels=512, \n",
" boundary='zero_padding', dataset='cifar10',\n",
" batch_size=64, num_realizations=200, name_experiment='bn_ff_zero_padding', \n",
@@ -213,7 +210,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -240,11 +237,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Influence of input dataset\n",
"## Influence of Input Dataset\n",
"\n",
"In this experiment, parameters are set as in Fig. $4$ except for `dataset = 'mnist'`. \n",
"\n",
"There is a slightly different evolution at very low depth, with higher $\\mu_4(\\mathbf{z}^l)$ and lower $\\delta_\\phi \\chi^l$. This is presumably due to the fact that `mnist` is more fat-tailed than `cifar-10`.\n",
"There is a slightly different behaviour at very low depth, with higher $\\mu_4(\\mathbf{z}^l)$ and lower $\\delta_\\phi \\chi^l$. This is presumably due to the fact that `mnist` is more fat-tailed than `cifar-10`.\n",
"\n",
"The evolution becomes equivalent to Fig. $4$ at higher depth."
]
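A quick way to probe this fat-tailedness is to compare the kurtosis of standardized pixel values in the two datasets. The sketch below assumes TensorFlow's bundled keras datasets (it is not part of the notebooks):

```python
import numpy as np
from tensorflow.keras.datasets import cifar10, mnist  # assumes TensorFlow is installed

# Kurtosis of standardized pixel values (3 for a Gaussian). mnist, being mostly
# black background with bright strokes, should come out noticeably more
# fat-tailed than cifar10.
for name, module in [('mnist', mnist), ('cifar10', cifar10)]:
    (x_train, _), _ = module.load_data()
    z = (x_train.astype(np.float32) - x_train.mean()) / x_train.std()
    print(name, np.mean(z**4))
```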
@@ -258,7 +255,7 @@
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# this computation ran in the cloud, but it is left here to show the function call\n",
"# this experiment ran in the cloud, but it is left here to show the function call\n",
"run_experiment(architecture='bn_ff', total_depth=200, kernel_size=3, num_channels=512, \n",
" boundary='periodic', dataset='mnist',\n",
" batch_size=64, num_realizations=200, name_experiment='bn_ff_mnist', \n",
@@ -311,7 +308,7 @@
"from run_experiment import run_experiment\n",
"from manage_experiments import prune_experiment\n",
"\n",
"# this computation ran in the cloud, but it is left here to show the function call\n",
"# this experiment ran in the cloud, but it is left here to show the function call\n",
"run_experiment(architecture='bn_ff', total_depth=200, kernel_size=3, num_channels=512, \n",
" boundary='periodic', dataset='cifar10', epsilon=0.,\n",
" batch_size=64, num_realizations=200, name_experiment='bn_ff_epsilon', \n",
