<a href="https://colab.research.google.com/github/CShorten/AIWeeklyUpdates/blob/main/Investigating_Generalization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1> Utilizing AugmentationZoo, TensorFlow Addons, and Imgaug </h1>

In [None]:
!git clone https://github.com/CShorten/AugmentationZoo.git
import os
os.chdir("AugmentationZoo")

In [None]:
!pip install tensorflow-addons

In [None]:
!pip install --upgrade imgaug

<h1> Testing Inductive Bias in Computer Vision </h1>
<p>
In Computer Vision, <b>Global</b> and <b>Local</b> priors bias how a model processes an image.</p>
<ul>
  <li> <b>Local</b> priors focus on smaller windows of the image, such as contiguous 7 x 7 pixel squares. </li>
  <li> <b>Global</b> priors attend to the entire 256 x 256 image at once. </li>
</ul>

<h1> Architecture Bias </h1>
  <ul>
    <li>The Convolutional Network has a strong <b>Local</b> bias.</li>
    <li>The Vision Transformer has a strong <b>Global</b> bias.</li>
  </ul>
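
<p>The difference between the two biases can be illustrated with a small NumPy sketch. It is not the code used in this notebook: the helper names (<code>conv2d_valid</code>, <code>self_attention</code>, <code>to_patches</code>), the 32 x 32 image, the 7 x 7 kernel, and the 8 x 8 patches are all assumptions chosen for illustration, and the attention uses identity projections rather than learned ones. A single pixel in the bottom-right corner is changed, and the sketch checks whether an output in the top-left corner notices.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with 'valid' padding (hypothetical helper)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only sees a kh x kw window of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def to_patches(image, patch):
    """Split a square image into non-overlapping, flattened patches (tokens)."""
    n = image.shape[0] // patch
    return (image.reshape(n, patch, n, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(n * n, patch * patch))

def self_attention(tokens):
    """Single-head self-attention with identity projections (illustrative only)."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each output token is a weighted sum over ALL tokens in the image.
    return weights @ tokens

image = rng.random((32, 32))
perturbed = image.copy()
perturbed[31, 31] += 1.0  # change one pixel in the bottom-right corner

kernel = rng.random((7, 7))
conv_changed = not np.isclose(
    conv2d_valid(image, kernel)[0, 0],
    conv2d_valid(perturbed, kernel)[0, 0],
)

attn_changed = not np.allclose(
    self_attention(to_patches(image, 8))[0],
    self_attention(to_patches(perturbed, 8))[0],
)

print("conv top-left output changed:", conv_changed)
print("attention first-token output changed:", attn_changed)
```

<p>In a single convolutional layer the top-left output is untouched by the distant pixel, while the first attention token changes. Stacking convolutional layers enlarges the receptive field, so this sketch describes the bias of one layer, not of a whole ResNet50.</p>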


In [3]:
from Datasets import get_cifar_10
x_train, y_train, x_test, y_test = get_cifar_10()

from Models import ResNet50_with_upsampling, create_vit_classifier, compile_model
ResNet50 = ResNet50_with_upsampling(x_train)
ViT = create_vit_classifier(x_train)
compile_model(ResNet50)
compile_model(ViT)  # TODO: merge the two compile_model calls into one line

<h1> Data Augmentation Bias </h1>

Similarly, data augmentations contain <b>Global</b> and <b>Local</b> priors.
<ul>
  <li> The crop augmentation </li>
  <li> The rotate augmentation </li>
</ul>

In [None]:
from imgaug import augmenters as iaa
rotate = iaa.Affine(rotate=(-45,45))
crop = iaa.Crop(percent=(0, 0.2))

<h1> There are no Inductive Biases in the Models with Random Weights </h1>

In [11]:
from AugEval import model_list_aug_results
train_results, test_results = model_list_aug_results(
    [ResNet50, ViT], ["ResNet50", "ViT"],
    [rotate, crop], ["Rotate", "Crop"],
    x_train, y_train, x_test, y_test
)

from Checkpoints import save_file
save_file(train_results, "train_results.csv")
import pandas as pd
df = pd.read_csv("train_results.csv")
df.head()

Unnamed: 0,Model Name,Original,Rotate,Crop
0,ResNet50,0.09988,0.1,0.09996
1,ViT,0.08654,0.0884,0.0851
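
<p>Accuracies near 0.1 are what chance looks like on a balanced 10-class dataset such as CIFAR-10. The sketch below shows one way an augmented evaluation of this kind could be computed. It is an assumption about the general shape of such a routine, not the implementation of <code>model_list_aug_results</code>: the helpers <code>accuracy</code> and <code>aug_results</code>, the constant "model", the random images, and the NumPy stand-ins for rotate and crop are all hypothetical.</p>

```python
import numpy as np

def accuracy(predict_fn, x, y):
    """Fraction of samples whose predicted class matches the label."""
    return float(np.mean(predict_fn(x) == y))

def aug_results(predict_fn, augmentations, x, y):
    """Accuracy on the original data and on each augmented copy of it."""
    results = {"Original": accuracy(predict_fn, x, y)}
    for name, augment_fn in augmentations.items():
        # Labels are unchanged: the augmentations are assumed label-preserving.
        results[name] = accuracy(predict_fn, augment_fn(x), y)
    return results

# Toy stand-ins (assumptions): random images and 10 perfectly balanced classes.
rng = np.random.default_rng(0)
x = rng.random((1000, 32, 32, 3))
y = np.arange(1000) % 10

def constant_model(images):
    """An uninformative 'model' that always predicts class 0."""
    return np.zeros(len(images), dtype=int)

augmentations = {
    # Simple NumPy stand-ins, not the imgaug augmenters used in the notebook.
    "Rotate": lambda imgs: np.rot90(imgs, k=1, axes=(1, 2)),
    "Crop": lambda imgs: imgs[:, 4:28, 4:28, :],
}

results = aug_results(constant_model, augmentations, x, y)
print(results)
```

<p>A model that carries no information about the input scores 0.1 on every column here, which is the same pattern as the untrained ResNet50 and ViT rows above.</p>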


<h1> Training the Models </h1>
<ul>
  <li> Each model takes about 70 seconds per epoch. </li>
  <li> Each model is trained for 100 epochs. </li>
  <li> Training each model therefore takes roughly 117 minutes (about 2 hours). </li>
</ul>
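
<p>The estimate follows directly from the two figures in the list. A quick check of the arithmetic, using only those numbers:</p>

```python
seconds_per_epoch = 70   # approximate, from the list above
epochs = 100

total_seconds = seconds_per_epoch * epochs
total_minutes = total_seconds / 60
total_hours = total_minutes / 60

print(f"{total_seconds} s = {total_minutes:.1f} min = {total_hours:.2f} h per model")
# 7000 s = 116.7 min = 1.94 h per model
```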

Epoch count is a potential confounder that will be isolated in future work.

In [13]:
ResNet50.fit(x_train, y_train, batch_size=256, epochs=100)

Epoch 1/100
...
Epoch 78

<keras.callbacks.History at 0x7f4aec12fd50>

In [12]:
ViT.fit(x_train, y_train, batch_size=256, epochs=100)

Epoch 1/100
...
Epoch 78

<keras.callbacks.History at 0x7f4b01e34750>

In [None]:
from Checkpoints import save_file
train_results, test_results = model_list_aug_results(
    [ResNet50, ViT], ["ResNet50-NoAug", "ViT-NoAug"],
    [rotate, crop], ["Rotate", "Crop"],
    x_train, y_train, x_test, y_test
)
save_file(train_results, "train_results.csv")
save_file(test_results, "test_results.csv")

<h1> Commentary on Results </h1>

<p> Both the ResNet50 and the 20M-parameter ViT overfit: compare the original train accuracy with the original test accuracy in the results tables. The ViT was quicker to overfit the training data, which illustrates the capacity of transformers to fit their training data. This capacity is part of their appeal in self-supervised learning, where the datasets are very large and the model usually never repeats data during training. In that setting the capacity of transformers is useful, but in smaller settings the strong inductive bias of convolutions makes them less prone to overfitting.</p>

<h1> References </h1>
<h6> Language Models are Few-Shot Learners. Brown et al. 2020.</h6>
<h6> A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning. Dar et al. 2021.</h6>

In [17]:
import pandas as pd
df = pd.read_csv("train_results.csv")
df.head()

Unnamed: 0,Model Name,Original,Rotate,Crop
0,ResNet50-NoAug,0.95392,0.47966,0.6164
1,ViT-NoAug,1.0,0.57608,0.72864


In [19]:
df = pd.read_csv("test_results.csv")
df.head()

Unnamed: 0,Model Name,Original,Rotate,Crop
0,ResNet50-NoAug,0.7221,0.429,0.5545
1,ViT-NoAug,0.7578,0.5072,0.6508
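
<p>The two tables can be combined to put numbers on the overfitting and on the accuracy lost under each augmentation. The sketch below copies the values from the tables above into plain dictionaries; the variable names (<code>generalization_gap</code>, <code>test_drop</code>) are illustrative and not part of AugmentationZoo.</p>

```python
# Values copied from train_results.csv and test_results.csv as shown above.
train = {
    "ResNet50-NoAug": {"Original": 0.95392, "Rotate": 0.47966, "Crop": 0.6164},
    "ViT-NoAug": {"Original": 1.0, "Rotate": 0.57608, "Crop": 0.72864},
}
test = {
    "ResNet50-NoAug": {"Original": 0.7221, "Rotate": 0.429, "Crop": 0.5545},
    "ViT-NoAug": {"Original": 0.7578, "Rotate": 0.5072, "Crop": 0.6508},
}

generalization_gap = {}  # train accuracy minus test accuracy, unaugmented data
test_drop = {}           # test accuracy lost when the test set is augmented

for model in train:
    generalization_gap[model] = round(
        train[model]["Original"] - test[model]["Original"], 4
    )
    test_drop[model] = {
        aug: round(test[model]["Original"] - test[model][aug], 4)
        for aug in ("Rotate", "Crop")
    }

for model in train:
    print(model, "gap:", generalization_gap[model], "test drop:", test_drop[model])
```

<p>On these numbers both models show a train/test gap of roughly 0.23 to 0.24, and both lose more test accuracy under rotation than under cropping.</p>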


<h1> Please Cite </h1>
<h2> Investigating the Generalization of Image Classifiers with Augmented Test Sets. Shorten and Khoshgoftaar, In ICTAI 2021. </h2>