# Assignment 4: CNN

## Description

Implement a Convolutional Neural Network (CNN) classifier to predict whether a given icon image is the real / fake. Where the fake images were generated by TAs with a neural network.

- You are not required to use Colab in this assignment, but you have to **submit your source code**.

## Dataset

- https://lab.djosix.com/icons.zip
- 64x64 RGB jpg images


```
real/           (10000 images)
    0000.jpg
    0001.jpg
    ...
    9999.jpg
fake/           (10000 images)
    0000.jpg
    0001.jpg
    ...
    9999.jpg
unknown/        (5350 images, testing set)
    0000.jpg
    0001.jpg
    ...
    5349.jpg
```

- Training set
  - 20000 icons in `real/` and `fake/`
  - You should predict 1 for icons in `real/` and 0 for icons in `fake/`
- Testing set:
  - 5350 icons in `unknown/`
  - Your score depends on the **accuracy** on this testing set,  
    so the prediction of each icon in `unknown/` should be submitted (totally 5350 predictions, see below).


## Submission

Please upload **2 files** to E3. (`XXXXXXX` is your student ID)

1. **`XXXXXXX_4_result.json`**  
  This file contains your model prediction for the testing set.  
  You must generate this file with the function called `save_predictions()`.
2. **`XXXXXXX_4_source.zip`**  
  Zip your source code into this archive.


## Hints

- **Deep Learning Libraries**: You can use any deep learning frameworks (PyTorch, TensorFlow, ...).
- **How to implement**: There are many CNN examples for beginners on the internet, e.g. official websites of the above libraries, play with them and their model architectures to abtain high accuracy on testing set.
- **GPU/TPU**: Colab provides free TPU/GPU for training speedup, please refer to [this page in `pytut.pdf` on E3](https://i.imgur.com/VsrUh7I.png).


### Include this in your code to generate result file

In [1]:
import json

def save_predictions(student_id, predictions):
  # Please use this function to generate 'XXXXXXX_4_result.json'
  # `predictions` is a list of int (0 or 1; fake=0 and real=1)
  # For example, `predictions[0]` is the prediction given "unknown/0000.jpg".
  # it will be 1 if your model think it is real, else 0 (fake).

  assert isinstance(student_id, str)
  assert isinstance(predictions, list)
  assert len(predictions) == 5350

  for y in predictions:
    assert y in (0, 1)

  with open('{}_4_result.json'.format(student_id), 'w') as f:
    json.dump(predictions, f)


In [2]:
import requests 
import zipfile
import os

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
    
    with zipfile.ZipFile("icons.zip","r") as zip_ref:
      zip_ref.extractall("icons")

download_url("https://lab.djosix.com/icons.zip", "icons.zip")

In [3]:
import numpy as np
import pandas as pd
import keras
from keras.utils import np_utils

In [4]:
import numpy as np
import PIL
import glob, os

os.chdir("/content/icons/fake")
X_tr = []
Y_tr = []
for file in glob.glob("*.jpg"):
#  print(file)
  X_train_jpg = PIL.Image.open(file)
  X_train_seq = X_train_jpg.getdata()
  X_tr.append(np.reshape(np.array(X_train_seq),(64,64,3)))
  Y_tr.append(0)
# print(X_tr)
os.chdir("/content/icons/real")
for file in glob.glob("*.jpg"):
#  print(file)
  X_train_jpg = PIL.Image.open(file)
  X_train_seq = X_train_jpg.getdata()
  X_tr.append(np.reshape(np.array(X_train_seq),(64,64,3)))
  Y_tr.append(1)

In [5]:
X_data = np.array(X_tr)
Y_data = np.array(Y_tr)
X_data = X_data / 255

In [6]:
# print(X_data.shape)
# print(Y_data.shape)
# print(X_data[50])


In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [8]:
from keras.models import Sequential
from keras.layers import Dense,Dropout,Flatten,Conv2D,MaxPooling2D
from keras import layers
import numpy as np
from keras.layers.normalization import BatchNormalization
from keras.layers import GlobalAveragePooling2D,AveragePooling2D

model = Sequential()

model.add(Conv2D(filters=32,
        kernel_size=(5,5),
        padding='same',
        input_shape=(64,64,3),
        activation='relu'))

model.add(BatchNormalization())

model.add(Conv2D(filters=64,
        kernel_size=(5,5),
        padding='same',
        activation='relu'))
model.add(BatchNormalization())

# model.add(Dropout(0.25))
model.add(GlobalAveragePooling2D())
model.add(Flatten())
model.add(Dense(256, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

train_history=model.fit(x=X_data,
            y=Y_data,validation_split=0.2,
            epochs=20, batch_size=300, verbose=2)

Epoch 1/20
54/54 - 54s - loss: 0.3316 - accuracy: 0.8622 - val_loss: 0.0689 - val_accuracy: 0.9955
Epoch 2/20
54/54 - 6s - loss: 0.0122 - accuracy: 0.9980 - val_loss: 0.9908 - val_accuracy: 0.4027
Epoch 3/20
54/54 - 6s - loss: 0.0015 - accuracy: 0.9998 - val_loss: 2.4736 - val_accuracy: 0.0018
Epoch 4/20
54/54 - 6s - loss: 0.0012 - accuracy: 0.9999 - val_loss: 4.4152 - val_accuracy: 0.0000e+00
Epoch 5/20
54/54 - 6s - loss: 7.7866e-04 - accuracy: 0.9999 - val_loss: 7.2099 - val_accuracy: 0.0000e+00
Epoch 6/20
54/54 - 6s - loss: 0.0040 - accuracy: 0.9992 - val_loss: 0.0171 - val_accuracy: 0.9927
Epoch 7/20
54/54 - 6s - loss: 2.8875e-04 - accuracy: 1.0000 - val_loss: 0.0523 - val_accuracy: 0.9772
Epoch 8/20
54/54 - 6s - loss: 1.5596e-04 - accuracy: 1.0000 - val_loss: 0.0114 - val_accuracy: 0.9952
Epoch 9/20
54/54 - 6s - loss: 1.0218e-04 - accuracy: 1.0000 - val_loss: 8.4673e-06 - val_accuracy: 1.0000
Epoch 10/20
54/54 - 6s - loss: 3.0650e-04 - accuracy: 1.0000 - val_loss: 0.0082 - val_acc

In [9]:
os.chdir("/content/icons/unknown")
unknown_X = []
for file in glob.glob("*.jpg"):
#  print(file)
  X_train_jpg = PIL.Image.open(file)
  X_train_seq = X_train_jpg.getdata()
  unknown_X.append(np.reshape(np.array(X_train_seq),(64,64,3)))
unknown_X_array = np.array(unknown_X)
X_normalize = unknown_X_array / 255

In [10]:
os.chdir("/content")

prediction = model.predict_classes(X_normalize)
predictions = prediction.tolist()
# print(len(predictions), predictions)

# with open('{}_4_result.json'.format("0816050"), 'w') as f:
#   json.dump(predictions, f)



In [11]:
save_predictions("0816050", predictions)
