# **CIFAR-10 notebook**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Deyht/AI_astro_ED_AAIF/blob/main/practical_works/CNN/classification/CIFAR-10/CIFAR-10_starter.ipynb)

---


**Link to the CIANNA GitHub repository**
https://github.com/Deyht/CIANNA

### **CIANNA installation**

#### Query GPU allocation and properties

If `nvidia-smi` fails, you probably launched the Colab session without a GPU reservation.  
To change the reservation type, go to "Runtime" -> "Change runtime type" and select "GPU" as the hardware accelerator.

In [None]:
%%shell

nvidia-smi

cd /content/

git clone https://github.com/NVIDIA/cuda-samples/

cd /content/cuda-samples/Samples/1_Utilities/deviceQuery/

cmake CMakeLists.txt

make SMS="50 60 70 80"

./deviceQuery | grep Capability | cut -c50- > ~/cuda_infos.txt
./deviceQuery | grep "CUDA Driver Version / Runtime Version" | cut -c57- >> ~/cuda_infos.txt

cd ~/

If you are granted a GPU that supports high FP16 compute scaling (e.g. the Tesla T4), it is advised to change the mixed_precision parameter in the prediction to "FP16C_FP32A".  
See the detailed description of mixed precision support in CIANNA on the [System Requirements](https://github.com/Deyht/CIANNA/wiki/1\)-System-Requirements) wiki page.

#### Clone CIANNA git repository

In [None]:
%%shell

cd /content/

git clone https://github.com/Deyht/CIANNA

cd CIANNA

#### Compiling CIANNA for the allocated GPU generation

There is no guaranteed forward or backward compatibility between Nvidia GPU generations, and some capabilities are generation specific. For these reasons, CIANNA must be given the GPU generation of the platform at compile time.
The following cell automatically updates all the necessary files based on the detected GPU and compiles CIANNA.
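
As a rough illustration of what the next cell computes, the sketch below mirrors its `awk` logic in Python: it turns a compute capability string (e.g. `7.5`) into the `sm_XX` architecture value, picks the generation define, and flags CUDA versions older than 11.1. The function name `cuda_build_flags` is hypothetical; only the thresholds and flag names are taken from the cell itself.

```python
def cuda_build_flags(comp_cap: str, cuda_vers: str, lim: float = 11.1):
    """Mirror the awk logic of the compile cell (illustrative only)."""
    # "7.5" -> 75, "8.0" -> 80; round() guards against float noise
    sm_val = round(10 * float(comp_cap))

    # Generation-specific define; nothing is added below compute capability 7.0
    if sm_val >= 80:
        gen_arg = "-D GEN_AMPERE"
    elif sm_val >= 70:
        gen_arg = "-D GEN_VOLTA"
    else:
        gen_arg = ""

    # Same numeric comparison as the awk one-liner
    old_arg = "-D CUDA_OLD" if float(cuda_vers) < lim else ""

    return f"sm_{sm_val}", gen_arg, old_arg


# A Tesla T4 reports compute capability 7.5
print(cuda_build_flags("7.5", "12.2"))
```

For a compute capability of 7.5 this selects `sm_75` with the Volta define, which is what the `sed` commands then write into the compile script.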

In [None]:
%%shell

cd /content/CIANNA

mult="10"
cat ~/cuda_infos.txt
comp_cap="$(sed '1!d' ~/cuda_infos.txt)"
cuda_vers="$(sed '2!d' ~/cuda_infos.txt)"

lim="11.1"
old_arg=$(awk '{if ($1 < $2) print "-D CUDA_OLD";}' <<<"${cuda_vers} ${lim}")

sm_val=$(awk '{print $1*$2}' <<<"${mult} ${comp_cap}")

gen_val=$(awk '{if ($1 >= 80) print "-D GEN_AMPERE"; else if($1 >= 70) print "-D GEN_VOLTA";}' <<<"${sm_val}")

sed -i "s/.*arch=sm.*/\\t\tcuda_arg=\"\$cuda_arg -D CUDA -D comp_CUDA -lcublas -lcudart -arch=sm_$sm_val $old_arg $gen_val\"/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" src/python_module_setup.py

./compile.cp CUDA PY_INTERF

mv src/build/lib.linux-x86_64-* src/build/lib.linux-x86_64

#### Testing CIANNA installation

**IMPORTANT NOTE**   
CIANNA is mainly used in a script fashion and was not designed to run in notebooks. Every code cell that directly invokes CIANNA functions must be run as a script to avoid possible errors.  
To do so, the cell must have the following structure.

```
%%shell

cd /content/CIANNA

python3 - <<EOF

[... your python code ...]

EOF
```

This syntax makes it easy to edit Python code in the notebook while running the cell as a script. Note that notebook variables cannot be accessed from the cell in this context.
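
Since script cells cannot see notebook variables, one possible workaround is to pass settings through a small file. The sketch below is an assumption and not part of CIANNA: the file name `run_config.json`, the keys, and the two function names are hypothetical. A regular notebook cell writes the file and the script cell reads it back.

```python
import json
from pathlib import Path

# Hypothetical file name, chosen for this example only
CONFIG_PATH = Path("run_config.json")

def save_config(config: dict, path: Path = CONFIG_PATH) -> None:
    """Call from a regular notebook cell to export settings."""
    path.write_text(json.dumps(config, indent=2))

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Call at the top of the Python code inside a script cell."""
    return json.loads(path.read_text())

# Notebook side
save_config({"image_size": 32, "nb_images_per_iter": 4096})

# Script side (inside the heredoc)
config = load_config()
print(config["image_size"], config["nb_images_per_iter"])
```

Both sides must resolve the path from the same directory, so an absolute path is safer when the script cell starts with a `cd`.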


### **CIFAR-10**

CIFAR-10 is a lightweight dataset, which comprises 60000 images of 32x32 pixels labeled into 10 classes. 50000 images are used to train supervised learning models, with 5000 examples for each class, and 10000 images are used for testing trained models, with 1000 examples for each class.
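
In the Python version of the dataset, each image is stored as a flat row of 3072 `uint8` values: the 1024 red values first, then green, then blue, each channel in row-major order. The sketch below uses a synthetic row (not real data) to show that a single reshape and transpose gives the same height x width x channel image as the per-channel loop used in the cells of this notebook.

```python
import numpy as np

image_size = 32

# Synthetic stand-in for one row of dict_batch[b"data"]
rng = np.random.default_rng(0)
flat = rng.integers(0, 256, size=3 * image_size**2, dtype=np.uint8)

# Per-channel loop: copy each 1024-value block into one colour plane
patch_loop = np.zeros((image_size, image_size, 3), dtype=np.uint8)
for depth in range(3):
    block = flat[depth * image_size**2:(depth + 1) * image_size**2]
    patch_loop[:, :, depth] = block.reshape(image_size, image_size)

# Vectorized equivalent: (channel, row, col) -> (row, col, channel)
patch_vec = flat.reshape(3, image_size, image_size).transpose(1, 2, 0)

print(np.array_equal(patch_loop, patch_vec))
```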

#### Downloading and visualizing the data


We start by downloading and visualizing the raw data.

In [None]:
%%shell

cd /content

wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar -xzf cifar-10-python.tar.gz

In [None]:
%cd /content/

import os, glob
import matplotlib.pyplot as plt
import numpy as np
import pickle

def unpickle(file):
    #Load one pickled CIFAR-10 file; keys are returned as bytes
    with open(file, 'rb') as fo:
        data_dict = pickle.load(fo, encoding='bytes')
    return data_dict

image_size = 32
nb_class = 10

v_width = 8; v_height = 5
nb_images = v_width*v_height

f_im_s = image_size*image_size*3

batches_meta = unpickle("cifar-10-batches-py/batches.meta")
class_list = batches_meta[b"label_names"]

dict_batch = unpickle(glob.glob("cifar-10-batches-py/data_batch_*")[0])
data = dict_batch[b"data"]
labels = dict_batch[b"labels"]

fig, ax = plt.subplots(v_height, v_width, figsize=(v_width*1.5,v_height*1.5), dpi=200, constrained_layout=True)

for i in range(0, v_width*v_height):
  c_x = i // v_width; c_y = i % v_width
  img_flat = data[i,:]
  patch = np.zeros((image_size, image_size, 3), dtype="uint8")
  for depth in range(0,3):
    patch[:,:,depth] = np.reshape(img_flat[depth*image_size*image_size:(depth+1)*image_size*image_size], (image_size, image_size))

  ax[c_x,c_y].imshow(patch, interpolation="lanczos")
  ax[c_x,c_y].text(0,0, class_list[labels[i]].decode("utf-8"), c="red", va="top") #Labels are stored as bytes
  ax[c_x,c_y].axis('off')

plt.show()

#### Data handling and augmentation

To ease data manipulation and hyperparameter exploration, we first provide a set of helper functions. To make them accessible within the CIANNA script cells, we need to export them to a Python file. Every time you want to change the content of these functions, you will need to rerun the cell to generate a new .py file. If loaded in an interactive cell, you will need to restart the kernel after changing this file to re-import it properly.

In [None]:
%%writefile helper.py

import numpy as np
import matplotlib.pyplot as plt
import os, sys, gc, glob, time, cv2
from threading import Thread
from PIL import Image
import albumentations as A
import pickle

def unpickle(file):
    #Load one pickled CIFAR-10 file; keys are returned as bytes
    with open(file, 'rb') as fo:
        data_dict = pickle.load(fo, encoding='bytes')
    return data_dict

def data_prep(nb_images_per_iter, image_size, test_mode=0):
  #Data arrays are declared as global so we can work in place to reduce the RAM footprint
  global raw_train_images, train_classes, g_image_size, raw_test_images, test_classes, input_data, targets, input_val, targets_val

  raw_train_images = np.zeros((50000,image_size*image_size*3), dtype="uint8")
  train_classes = np.zeros((50000))

  raw_test_images = np.zeros((10000,image_size*image_size*3), dtype="uint8")
  test_classes = np.zeros((10000))

  i = 0
  for batch in glob.glob("cifar-10-batches-py/data_batch_*"):
    dict_batch = unpickle(batch)
    raw_train_images[i*10000:(i+1)*10000,:] = dict_batch[b"data"]
    train_classes[i*10000:(i+1)*10000] = dict_batch[b"labels"]
    i += 1

  dict_batch = unpickle("cifar-10-batches-py/test_batch")
  raw_test_images[:,:] = dict_batch[b"data"]
  test_classes[:] = dict_batch[b"labels"]

  if(test_mode == 0):
    input_data = np.zeros((nb_images_per_iter,3*image_size**2), dtype="float32") #CIANNA expects "float32" arrays
    targets = np.zeros((nb_images_per_iter,10), dtype="float32")

  input_val = np.zeros((10000,3*image_size**2), dtype="float32")
  targets_val = np.zeros((10000,10), dtype="float32")


def create_augmented_batch(A_transform):

  nb_images = np.shape(input_data)[0]
  image_size = int(np.sqrt(np.shape(input_data)[1]/3))
  patch = np.zeros((image_size, image_size, 3), dtype="uint8")

  for i in range(0,nb_images):

    l_id = np.random.randint(0,50000)
    flat_patch = raw_train_images[l_id]
    for depth in range(0,3):
      patch[:,:,depth] = np.reshape(flat_patch[depth*image_size*image_size:(depth+1)*image_size*image_size], (image_size, image_size))

    transformed = A_transform(image=patch)
    patch_aug = transformed['image']

    #CIANNA expects data formatted as 2D numpy arrays representing a list of flattened images (with every channel flattened after the others)
    for depth in range(0,3): #We normalize based on mean pixel value
      input_data[i,depth*image_size**2:(depth+1)*image_size**2] = (patch_aug[:,:,depth].flatten("C") - 100.0)/155.0

    targets[i,:] = 0.0
    targets[i,int(train_classes[l_id])] = 1.0

  return input_data, targets


def create_validation_set(A_transform):

  for i in range(0,10000):

    input_val[i,:] = (raw_test_images[i] - 100.0)/155.0

    targets_val[i,:] = 0.0
    targets_val[i,int(test_classes[i])] = 1.0

  return input_val, targets_val


def free_data_helper():
  global raw_train_images, train_classes, raw_test_images, test_classes, input_data, targets, input_val, targets_val
  del (raw_train_images, train_classes, raw_test_images, test_classes, input_data, targets, input_val, targets_val)
  return

We can test the helper functions with a simple example and display the produced images. The next cell illustrates how we can create an augmented batch of images for training from the raw image dataset.  

Use this example to test the effect of combining different transform operations for data augmentation. You can also test the impact of the image resolution and of the position of the resize transformation in the augmentation list.
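
For reference, the helper functions store each image as a flat `float32` row, one channel after another, normalized as `(x - 100) / 155`, together with a one-hot target vector. The sketch below reproduces that encoding and its inverse on a synthetic image. The names `to_cianna_row`, `from_cianna_row` and `one_hot` are hypothetical and not part of helper.py, and rounding before the cast back to `uint8` is an addition made here to avoid off-by-one values from float truncation.

```python
import numpy as np

def to_cianna_row(img: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 image -> flat float32 row, channels one after another."""
    chw = img.transpose(2, 0, 1).astype(np.float32)
    return ((chw - 100.0) / 155.0).reshape(-1)

def from_cianna_row(row: np.ndarray, image_size: int) -> np.ndarray:
    """Inverse of to_cianna_row, for display purposes."""
    chw = row.reshape(3, image_size, image_size) * 155.0 + 100.0
    return np.rint(chw).clip(0, 255).astype(np.uint8).transpose(1, 2, 0)

def one_hot(label: int, nb_class: int = 10) -> np.ndarray:
    """Target vector with a single 1.0 at the class index."""
    target = np.zeros(nb_class, dtype=np.float32)
    target[label] = 1.0
    return target

# Synthetic image, not real CIFAR-10 data
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

row = to_cianna_row(img)
print(row.shape, row.dtype)
print(np.array_equal(from_cianna_row(row, 32), img))
print(one_hot(3))
```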

In [None]:
%cd /content/

from helper import *

batches_meta = unpickle("cifar-10-batches-py/batches.meta")
class_list = batches_meta[b"label_names"]

image_size = 32

v_width = 8; v_height = 5
nb_images_per_iter = v_width*v_height

#See the Albumentations documentation for a list of existing augmentations
train_transform = A.Compose([
  #Image resize can be done after all other transforms to preserve as much detail as possible
  #or as the first operation so other transforms are faster
  #A.Resize(image_size,image_size, interpolation=2, p=1.0),
  A.HorizontalFlip(p=0.5),
  #Affine here acts more as an aspect ratio transform than a scaling variation
  #A.Affine(..., p=1.0),
  #A.ToGray(p=0.02),
  #A.ColorJitter(..., p=1.0),
  ])

val_transform = A.Compose([ #Here only a resize, but val transform could be more complex (center crop, padding, etc)
  A.Resize(image_size, image_size, interpolation=2, p=1.0)])

data_prep(nb_images_per_iter, image_size)
input_data, targets = create_augmented_batch(train_transform)

fig, ax = plt.subplots(v_height, v_width, figsize=(v_width*1.5,v_height*1.5), dpi=200, constrained_layout=True)
patch = np.zeros((image_size,image_size,3), dtype="uint8")

for i in range(0, v_width*v_height):
  c_x = i // v_width; c_y = i % v_width
  #Images in the augmented input_data array are directly in the CIANNA format.
  #We need to convert them back to classical RGB for display.
  for depth in range(0,3):
    patch[:,:,depth] = np.reshape(input_data[i,depth*image_size**2:(depth+1)*image_size**2]*155 + 100,(image_size,image_size))

  ax[c_x,c_y].imshow(patch)
  ax[c_x,c_y].text(0, 0, class_list[(np.argmax(targets[i]))].decode("utf-8"), c="red", fontsize=10, clip_on=True, va="top") #Labels are stored as bytes
  ax[c_x,c_y].axis('off')

plt.show()

free_data_helper()

#### Training a network

The following cell trains a neural network architecture with CIANNA using dynamic augmentation through Albumentations. The architecture has been left empty as an exercise. Try to implement a working architecture on this dataset, then try to achieve the highest possible accuracy.

*Link to the [CIANNA](https://github.com/Deyht/CIANNA) repository. You can refer to CIANNA's [wiki page](https://github.com/Deyht/CIANNA/wiki) for a complete framework description. You can also look at the full [API documentation](https://github.com/Deyht/CIANNA/wiki/4\)-Interface-API-documentation) to add layer types that are absent from the LeNet-5 example.
The saved models are available in the "net_save" directory, which is automatically created when network training starts. The default naming scheme only refers to the training iteration, so rename your saved files with descriptive information about your model to keep track of your progress. A saved model can be uploaded to a new Colab session for inference or further training.*
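
The training cell overlaps data augmentation with GPU training: a worker thread fills a spare buffer with the next augmented batch while the current batch is used for training, and the two are swapped once both are done. The sketch below shows that pattern in plain Python, without CIANNA. `make_batch` and `train_on` are hypothetical stand-ins for `create_augmented_batch` and `cnn.train`, and the dictionary swap plays the role of `cnn.swap_data_buffers`.

```python
from threading import Thread

def make_batch(step: int) -> list:
    """Stand-in for the augmentation step: returns a fake batch."""
    return [step] * 4

def train_on(batch: list) -> int:
    """Stand-in for one training call: returns a fake 'loss'."""
    return sum(batch)

# The first batch is built up front, before any parallel work starts
buffers = {"TRAIN": make_batch(0), "TRAIN_buf": None}
history = []

def fill_buffer(step: int) -> None:
    # Runs on the worker thread and only touches the spare buffer
    buffers["TRAIN_buf"] = make_batch(step)

for step in range(1, 4):
    worker = Thread(target=fill_buffer, args=(step,))
    worker.start()                               # prepare the next batch in the background
    history.append(train_on(buffers["TRAIN"]))   # train on the current batch meanwhile
    worker.join()                                # wait until the next batch is ready
    # Swap roles: the freshly filled buffer becomes the training set
    buffers["TRAIN"], buffers["TRAIN_buf"] = buffers["TRAIN_buf"], buffers["TRAIN"]

print(history)
```

Joining the thread before the swap ensures the spare buffer is complete before it is used for training.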

In [None]:
%%shell
cd /content/

python3 - <<EOF

from helper import *

sys.path.insert(0,glob.glob('/content/CIANNA/src/build/lib.*/')[-1])
import CIANNA as cnn

def i_ar(int_list):
  return np.array(int_list, dtype="int")

image_size = 32
nb_images_per_iter = 4096 #Most likely must be reduced if the image size is increased so examples can fit in RAM


#See the Albumentations documentation for a list of existing augmentations
train_transform = A.Compose([
  #Image resize can be done after all other transforms to preserve as much detail as possible
  #or as the first operation so other transforms are faster
  #A.Resize(image_size,image_size, interpolation=2, p=1.0),
  A.HorizontalFlip(p=0.5),
  #Affine here acts more as an aspect ratio transform than a scaling variation
  #A.Affine(..., p=1.0),
  #A.ToGray(p=0.02),
  #A.ColorJitter(..., p=1.0),
  ])

val_transform = A.Compose([]) #Images are already in the proper format, but create_validation_set expects a transform argument


#This function launches data augmentation on a separate thread.
#This way we can train on the GPU and generate new augmented examples in parallel.
def data_augm():
  input_data, targets = create_augmented_batch(train_transform)
  cnn.delete_dataset("TRAIN_buf", silent=1)
  cnn.create_dataset("TRAIN_buf", nb_images_per_iter, input_data[:,:], targets[:,:], silent=1)
  return

#If creating new augmented data takes too long compared to training, you can
#increase the number of training iterations run on a single augmentation
nb_iter_per_augm = 1
if(nb_iter_per_augm > 1):
  shuffle_frequency = 1
else:
  shuffle_frequency = 0


total_iter = 400 #Should be increased with the complexity of the network and task
load_iter = 0 #Used to reload a model at a given iteration

if (len(sys.argv) > 1):
  load_iter = int(sys.argv[1])

start_iter = int(load_iter / nb_iter_per_augm)

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=10,
  bias=0.1, b_size=16, comp_meth='C_CUDA', dynamic_load=1,
  mixed_precision="FP32C_FP32A", adv_size=30)

data_prep(nb_images_per_iter, image_size)

input_val, targets_val = create_validation_set(val_transform)
cnn.create_dataset("VALID", 10000, input_val[:,:], targets_val[:,:])
cnn.create_dataset("TEST", 10000, input_val[:,:], targets_val[:,:])
del (input_val, targets_val) #The python arrays are no longer required after import in CIANNA
gc.collect()

#Create the first augmentation before parallelization
input_data, targets = create_augmented_batch(train_transform)
cnn.create_dataset("TRAIN", nb_images_per_iter, input_data[:,:], targets[:,:])

if(load_iter > 0):
  cnn.load("net_save/net0_s%04d.dat"%load_iter, load_iter, bin=1)
else:

  ##### -> Declare your network backbone architecture here


cnn.print_arch_tex("./arch/", "arch", activation=1, dropout=1)

for run_iter in range(start_iter,int(total_iter/nb_iter_per_augm)):

  t = Thread(target=data_augm)
  t.start()

  cnn.train(nb_iter=nb_iter_per_augm, learning_rate=0.002, shuffle_every=shuffle_frequency ,\
        control_interv=20, confmat=1, save_every=20, silent=0, save_bin=1, TC_scale_factor=128.0)

  if(run_iter == start_iter):
    cnn.perf_eval()

  t.join()
  cnn.swap_data_buffers("TRAIN")

EOF


#### Evaluate your model

The following cell evaluates the accuracy and inference time over the test set.
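
Accuracy here is the percentage of test images whose highest-scoring class matches the one-hot target. A small sketch on made-up scores (not real model output) shows the computation. Taking `np.mean` of the boolean match array is an equivalent shortcut to counting matches with `np.where`.

```python
import numpy as np

# Made-up scores for 4 examples and 3 classes
pred = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4],
                 [0.6, 0.3, 0.1]], dtype=np.float32)

# One-hot targets: the true classes are 0, 1, 2, 1
targets = np.eye(3, dtype=np.float32)[[0, 1, 2, 1]]

# An example is correct when predicted and true class indices agree
match = np.argmax(pred, axis=1) == np.argmax(targets, axis=1)
accuracy = 100.0 * np.mean(match)

print("Accuracy: %f, Error rate: %f" % (accuracy, 100.0 - accuracy))
```

Three of the four made-up examples are classified correctly, so the sketch reports an accuracy of 75%.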

Colab usually puts the GPU into sleep mode after idling for a few seconds. Always run this cell a few times in a row to get the real execution time.
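
A generic way to get a stable timing is to discard a warm-up run and average over several timed runs. The sketch below uses a dummy CPU workload in place of `cnn.forward`; `fake_forward` and `time_inference` are hypothetical names. It also makes the unit conversion explicit: images per second is the image count divided by the elapsed time in seconds, not in milliseconds.

```python
import time

def fake_forward(nb_images: int) -> int:
    """Dummy workload standing in for a forward pass."""
    return sum(i * i for i in range(nb_images))

def time_inference(fn, nb_images: int, repeats: int = 5):
    """Return (milliseconds per run, images per second)."""
    fn(nb_images)                                      # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn(nb_images)
    elapsed = (time.perf_counter() - start) / repeats  # seconds per run
    return elapsed * 1000.0, nb_images / elapsed

ms, ips = time_inference(fake_forward, 10000)
print("Inference time: %f ms (%d ips)" % (ms, int(ips)))
```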

In [None]:

%%shell

cd /content/

python3 - <<EOF 2>&1 | tee out.txt

import numpy as np
from threading import Thread
from helper import *
import gc, time, sys, glob

#Comment to access system wide install
sys.path.insert(0,glob.glob('/content/CIANNA/src/build/lib.*/')[-1])
import CIANNA as cnn

def i_ar(int_list):
  return np.array(int_list, dtype="int")

image_size = 32

val_transform = A.Compose([]) #Images are already in the proper format, but create_validation_set expects a transform argument

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=10,
	bias=0.1, b_size=16, comp_meth='C_CUDA', dynamic_load=1,
	mixed_precision="FP32C_FP32A", adv_size=30)

data_prep(0, image_size, test_mode=1)

input_test, targets_test = create_validation_set(val_transform)
cnn.create_dataset("TEST", 10000, input_test[:,:], targets_test[:,:])
gc.collect()

load_epoch = 400
cnn.load("net_save/net0_s%04d.dat"%load_epoch, load_epoch, bin=1)


#Run forward a first time to wake up the GPU and save the result
cnn.forward(repeat=1, no_error=1, saving=2, drop_mode="AVG_MODEL")

start = time.perf_counter()
#Run forward to evaluate raw model performance
cnn.forward(no_error=1, saving=0, drop_mode="AVG_MODEL")
end = time.perf_counter()

cnn.perf_eval()
compute_time = (end-start)*1000 #in milliseconds
print ("Inference time: %f ms (%d ips)"%(compute_time, int(10000*1000/compute_time))) #ips = images per second


pred_results = np.fromfile("fwd_res/net0_%04d.dat"%(load_epoch), dtype="float32")
pred_result = np.reshape(pred_results, (10000,-1))

pred_correct = np.shape(np.where(np.argmax(pred_result[:,:10], axis=1) == np.argmax(targets_test[:,:], axis=1)))[1]
pred_accuracy = (pred_correct/10000)*100.0
print ("Accuracy: %f, Error rate: %f"%(pred_accuracy, 100.0-pred_accuracy))

EOF