# **Green AI - ESPCI - 2024**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Deyht/green_ai_espci/blob/main/opt_cnn/Green_AI_ESPCI_base_notebook.ipynb)

---


**Link to the CIANNA GitHub repository**
https://github.com/Deyht/CIANNA

### **CIANNA installation**

#### Query GPU allocation and properties

If nvidia-smi fails, you probably launched the Colab session without a GPU reservation.  
To change the reservation type, go to "Runtime"->"Change runtime type" and select "GPU" as your hardware accelerator.

In [None]:
%%shell

nvidia-smi

cd /content/

git clone https://github.com/NVIDIA/cuda-samples/

cd /content/cuda-samples/Samples/1_Utilities/deviceQuery/

make SMS="50 60 70 80"

./deviceQuery | grep Capability | cut -c50- > ~/cuda_infos.txt
./deviceQuery | grep "CUDA Driver Version / Runtime Version" | cut -c57- >> ~/cuda_infos.txt

cd ~/

If you are granted a GPU that supports high FP16 compute scaling (e.g. the Tesla T4), it is advised to change the mixed_precision parameter in the prediction to "FP16C_FP32A".  
See the detailed description of mixed precision support in CIANNA on the [System Requirements](https://github.com/Deyht/CIANNA/wiki/1\)-System-Requirements) wiki page.

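As a rough illustration of the advice above, the choice can be written as a small helper. This is only a sketch: `suggest_mixed_precision` is a hypothetical function (not part of CIANNA), and its list of FP16-friendly GPUs contains only the Tesla T4 because it is the only model named here.

```python
def suggest_mixed_precision(gpu_name):
    """Return a mixed_precision string for cnn.init() based on the GPU name."""
    # Assumption: only the T4 is listed, since it is the only GPU named in the text.
    fp16_friendly = ("T4",)
    if any(tag in gpu_name for tag in fp16_friendly):
        return "FP16C_FP32A"
    # Default used by the cells of this notebook.
    return "FP32C_FP32A"

print(suggest_mixed_precision("Tesla T4"))
print(suggest_mixed_precision("Tesla K80"))
```
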
#### Clone CIANNA git repository

In [None]:
%%shell

cd /content/

git clone https://github.com/Deyht/CIANNA

cd CIANNA

#### Compiling CIANNA for the allocated GPU generation

There is no guaranteed forward or backward compatibility between Nvidia GPU generations, and some capabilities are generation-specific. For these reasons, CIANNA must be given the platform's GPU generation at compile time.
The following cell automatically updates all the necessary files based on the detected GPU and compiles CIANNA.

In [None]:
%%shell

cd /content/CIANNA

mult="10"
cat ~/cuda_infos.txt
comp_cap="$(sed '1!d' ~/cuda_infos.txt)"
cuda_vers="$(sed '2!d' ~/cuda_infos.txt)"

lim="11.1"
old_arg=$(awk '{if ($1 < $2) print "-D CUDA_OLD";}' <<<"${cuda_vers} ${lim}")

sm_val=$(awk '{print $1*$2}' <<<"${mult} ${comp_cap}")

gen_val=$(awk '{if ($1 >= 80) print "-D GEN_AMPERE"; else if($1 >= 70) print "-D GEN_VOLTA";}' <<<"${sm_val}")

sed -i "s/.*arch=sm.*/\\t\tcuda_arg=\"\$cuda_arg -D CUDA -D comp_CUDA -lcublas -lcudart -arch=sm_$sm_val $old_arg $gen_val\"/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" src/python_module_setup.py

./compile.cp CUDA PY_INTERF

mv src/build/lib.linux-x86_64-* src/build/lib.linux-x86_64

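For readers less familiar with `awk`, the flag selection done by the cell above can be restated in Python. This is a sketch for illustration only: `cianna_compile_flags` is a hypothetical helper, and it assumes the two detected values are clean strings such as "7.5" and "12.2".

```python
def cianna_compile_flags(comp_cap, cuda_vers, lim="11.1"):
    """Derive the -arch and -D flags from the compute capability and CUDA version."""
    major, minor = comp_cap.strip().split(".")
    sm_val = int(major) * 10 + int(minor)  # "7.5" -> 75, same as comp_cap * 10

    if sm_val >= 80:
        gen_val = "-D GEN_AMPERE"
    elif sm_val >= 70:
        gen_val = "-D GEN_VOLTA"
    else:
        gen_val = ""  # older generations get no generation flag

    # The shell cell compares the versions as plain numbers; this does the same.
    old_arg = "-D CUDA_OLD" if float(cuda_vers) < float(lim) else ""
    return {"arch": "sm_%d" % sm_val, "gen": gen_val, "old": old_arg}

# A Tesla T4 has compute capability 7.5.
print(cianna_compile_flags("7.5", "12.2"))
# {'arch': 'sm_75', 'gen': '-D GEN_VOLTA', 'old': ''}
```
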
#### Testing CIANNA installation

**IMPORTANT NOTE**   
CIANNA is mainly used in a script fashion and was not designed to run in notebooks. Every code cell that directly invokes CIANNA functions must be run as a script to avoid possible errors.  
To do so, the cell must have the following structure:

```
%%shell

cd /content/CIANNA

python3 - <<EOF

[... your python code ...]

EOF
```

This syntax lets you easily edit Python code in the notebook while running the cell as a script. Note that notebook variables cannot be accessed by the cell in this context.

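The isolation mentioned above can be checked with plain Python. The sketch below uses `subprocess` as a stand-in for a `python3 - <<EOF` cell: the child interpreter starts with a fresh namespace, so values have to be passed explicitly, here through a JSON file. The names `notebook_value` and `params.json` are purely illustrative.

```python
import json
import os
import subprocess
import sys
import tempfile

notebook_value = {"load_iter": 0}  # exists only in the current interpreter

# A separate interpreter does not see variables defined here.
probe = subprocess.run(
    [sys.executable, "-c", "print('notebook_value' in globals())"],
    capture_output=True, text=True, check=True)
print(probe.stdout.strip())  # False

# Pass the value through a file instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.json")
    with open(path, "w") as f:
        json.dump(notebook_value, f)
    child_code = "import json, sys; print(json.load(open(sys.argv[1]))['load_iter'])"
    out = subprocess.run([sys.executable, "-c", child_code, path],
                         capture_output=True, text=True, check=True)
    shared = out.stdout.strip()

print(shared)  # 0
```
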

### **CIFAR-10**

CIFAR-10 is a lightweight dataset comprising 60000 images of 32x32 pixels labeled into 10 classes. 50000 images are used to train supervised learning models, with 5000 examples per class, and 10000 images are used to test trained models, with 1000 examples per class.

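The figures above can be cross-checked with a few lines of arithmetic. The sketch also estimates the raw size of the dataset, assuming 3 color channels stored as one byte each (the notebook uses `in_nb_ch=3`; the one-byte storage is an assumption).

```python
n_classes = 10
train_per_class, test_per_class = 5000, 1000

n_train = n_classes * train_per_class
n_test = n_classes * test_per_class

# Assumption: RGB images stored as uint8, i.e. one byte per channel value.
bytes_per_image = 32 * 32 * 3
total_mib = (n_train + n_test) * bytes_per_image / 2**20

print(n_train, n_test, bytes_per_image, "%.1f MiB" % total_mib)
# 50000 10000 3072 175.8 MiB
```
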
#### Downloading and visualizing the data


Create a work directory and download the default aux_fct.py file that contains the image augmentation policy.
You can edit this file by opening the left panel, navigating to /content/green_ai_opt_cnn/ and double-clicking on aux_fct.py.

Note that modifications to this file are only saved for the current session. You can download your modified file to keep it for a later session. In that case, upload your modified aux_fct.py file instead of loading the default one from GitHub.

If you get an indentation error after modifying the file, it is most likely caused by the different "tabulation" character used by Colab. By selecting multiple lines of code around the error, you should be able to see which line has the wrong tabulation character.

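If selecting lines by hand is tedious, a short script can point at the suspicious lines. This is a minimal sketch that assumes aux_fct.py is indented with tabs, so any line whose indentation contains a space is reported; `find_space_indented_lines` is a hypothetical helper, not part of the course material.

```python
def find_space_indented_lines(source):
    """Return the 1-based numbers of lines whose leading whitespace contains a space."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent:
            flagged.append(number)
    return flagged

# Line 3 is indented with spaces while the others use tabs.
sample = "def f():\n\tx = 1\n    y = 2\n\treturn x\n"
print(find_space_indented_lines(sample))  # [3]
```

To check the real file, pass it the content of /content/green_ai_opt_cnn/aux_fct.py, for example with `open(path).read()`.
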
In [None]:
%%shell

mkdir -p /content/green_ai_opt_cnn/
cd /content/green_ai_opt_cnn/

#Get the default aux_fct.py file from the GitHub.
wget https://github.com/Deyht/green_ai_espci/raw/main/opt_cnn/aux_fct.py

In [None]:
%%shell

cd /content/green_ai_opt_cnn/

python3 - <<EOF

from aux_fct import *

init_data_gen(0)

print("\nOrdered validation examples")
create_val_batch()

print("Create visualization of the validation dataset")
visual_val(8,4)

EOF

The next cell displays the saved .jpg representation of the validation dataset.

In [None]:
%cd /content/green_ai_opt_cnn/
from PIL import Image
import matplotlib.pyplot as plt

im = Image.open("val_mosaic.jpg")
plt.figure(figsize=(8,4), dpi=200)
plt.imshow(im)
plt.gca().axis('off')
plt.show()

#### Training a network

You can modify whatever you find necessary to optimize the mixed accuracy/efficiency/size metric. The only rule is not to use external data or externally pretrained models (you can still reload a model you trained here and modify part of its architecture to avoid restarting from scratch every time you change something).

In [None]:

%%shell

cd /content/green_ai_opt_cnn/

python3 - <<EOF

import numpy as np
from threading import Thread
from aux_fct import *
import gc, sys, glob

#Comment to access system wide install
sys.path.insert(0,glob.glob('/content/CIANNA/src/build/lib.*/')[-1])
import CIANNA as cnn


def data_augm():
	input_data, targets = create_train_batch()
	cnn.delete_dataset("TRAIN_buf", silent=1)
	cnn.create_dataset("TRAIN_buf", nb_images_per_iter, input_data[:,:], targets[:,:], silent=1)
	return


total_iter = 900 #Larger architectures are likely to require more iterations
nb_iter_per_augm = 1 #Increase if augmentation is slower than training on one iteration
if(nb_iter_per_augm > 1):
	shuffle_frequency = 1
else:
	shuffle_frequency = 0

load_iter = 0 #Used to select the net_save file from which the training must be restarted

start_iter = int(load_iter / nb_iter_per_augm)

#The batch size can be adapted to reduce the RAM footprint, but it is likely to affect the reachable accuracy
#The mixed precision can be switched to FP16C_FP32A to speed up training on T4 GPUs,
#but it can induce vanishing or exploding gradients
cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=nb_class,
		bias=0.1, b_size=16, comp_meth='C_CUDA', dynamic_load=1,
		mixed_precision="FP32C_FP32A", adv_size=30)

init_data_gen()

input_val, targets_val = create_val_batch()
cnn.create_dataset("VALID", nb_val, input_val[:,:], targets_val[:,:])
cnn.create_dataset("TEST", nb_val, input_val[:,:], targets_val[:,:])
del (input_val, targets_val)
gc.collect()

input_data, targets = create_train_batch()
cnn.create_dataset("TRAIN", nb_images_per_iter, input_data[:,:], targets[:,:])

if(load_iter > 0):
	#Load a model save file based on the default naming scheme
	cnn.load("net_save/net0_s%04d.dat"%load_iter, load_iter, bin=1)
	#You can change the file path to load a renamed saved model; e.g.:
	#cnn.load("my_model.dat", load_iter, bin=1)
else:
	#Create a new network structure to train from scratch
	#See the CIANNA API description for a list of available layers and their parameters
	cnn.conv(f_size=i_ar([5,5]), nb_filters=8 , padding=i_ar([2,2]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([5,5]), nb_filters=16, padding=i_ar([2,2]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.dense(nb_neurons=256, activation="RELU", drop_rate=0.5)
	cnn.dense(nb_neurons=128, activation="RELU", drop_rate=0.2)
	cnn.dense(nb_neurons=nb_class, strict_size=1, activation="SMAX")

for run_iter in range(start_iter,int(total_iter/nb_iter_per_augm)):

	t = Thread(target=data_augm)
	t.start()

	#See the CIANNA API description for a list and description of available keywords
	cnn.train(nb_iter=nb_iter_per_augm, learning_rate=0.002, end_learning_rate=0.00003, shuffle_every=shuffle_frequency ,\
			 control_interv=20, confmat=1, momentum=0.9, lr_decay=0.0015, weight_decay=0.0005, save_every=20,\
			 silent=0, save_bin=1, TC_scale_factor=256.0)

	if(run_iter == start_iter):
		cnn.perf_eval()

	t.join()
	cnn.swap_data_buffers("TRAIN")

EOF

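Since model size is part of the metric, it can help to estimate the size of the default architecture defined in the cell above. The following back-of-the-envelope sketch assumes a 32x32 input (`image_size` is set in aux_fct.py and may differ), stride-1 convolutions, 2x2 pooling that halves each dimension, and one bias per filter or neuron. CIANNA's exact bookkeeping may differ slightly.

```python
def conv_out(size, f_size, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - f_size) // stride + 1

size, channels = 32, 3  # assumed input: 32x32 pixels, 3 channels
total_params = 0

# Two conv + max-pool stages: (filter size, number of filters)
for f_size, nb_filters in [(5, 8), (5, 16)]:
    total_params += f_size * f_size * channels * nb_filters + nb_filters
    size = conv_out(size, f_size, padding=2)
    size //= 2  # 2x2 pooling
    channels = nb_filters

flat_size = size * size * channels  # input size of the first dense layer

# Dense layers: 256, 128, then 10 output classes
nb_inputs = flat_size
for nb_neurons in [256, 128, 10]:
    total_params += nb_inputs * nb_neurons + nb_neurons
    nb_inputs = nb_neurons

print(flat_size, total_params)  # 1024 300410
```

Under these assumptions, the first dense layer alone holds 262400 of the roughly 300000 weights, so it is a natural first target when trying to shrink the model.
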

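The training cell above overlaps augmentation with training: a thread prepares the next batch while the network trains on the current one, and the two are swapped afterwards. A minimal, CIANNA-free sketch of that pattern follows, under the assumption that `cnn.swap_data_buffers("TRAIN")` exchanges the `TRAIN` and `TRAIN_buf` datasets. The names `augment`, `train_on` and `buffers` are hypothetical stand-ins.

```python
from threading import Thread

# Stand-ins for the "TRAIN" (front) and "TRAIN_buf" (back) datasets.
buffers = {"front": [0, 1, 2], "back": None}

def augment(step):
    # Stand-in for create_train_batch(): fill the back buffer with a new batch.
    buffers["back"] = [step * 10 + i for i in range(3)]

def train_on(batch):
    # Stand-in for one training iteration: here, just sum the batch.
    return sum(batch)

history = []
for step in range(1, 4):
    worker = Thread(target=augment, args=(step,))
    worker.start()                               # build the next batch in the background
    history.append(train_on(buffers["front"]))   # train on the current batch meanwhile
    worker.join()                                # make sure the next batch is complete
    # Swap front and back buffers before the next iteration.
    buffers["front"], buffers["back"] = buffers["back"], buffers["front"]

print(history)  # [3, 33, 63]
```
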
#### Evaluate your model

The following cell evaluates the accuracy and compute performance of your model in inference mode. Always run the inference at least twice in a row to remove GPU startup effects from the compute time.

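The warm-up idea is independent of CIANNA and can be sketched with a generic timing helper. `timed_after_warmup` is a hypothetical function, and the workload is a placeholder for the inference call.

```python
import time

def timed_after_warmup(fn, warmup=1):
    """Run fn `warmup` times untimed, then return the duration of one more call in ms."""
    for _ in range(warmup):
        fn()  # absorbs one-off startup costs
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0

calls = []
elapsed_ms = timed_after_warmup(lambda: calls.append(sum(range(100000))))
print("%.3f ms after %d warm-up call(s)" % (elapsed_ms, len(calls) - 1))
```
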
The last cell computes the score from the inference result, relative to the reference model's performance. Add your model's result to the corresponding Google Sheet:  
https://docs.google.com/spreadsheets/d/1kfTaaakPPD8Oa4YuzvYuMpFHZTKxBctc899s_E_U6iw/edit?usp=sharing

In [None]:

%%shell

cd /content/green_ai_opt_cnn/

python3 - <<EOF 2>&1 | tee out.txt

import numpy as np
from threading import Thread
from aux_fct import *
import gc, os, time, sys, glob

#Comment to access system wide install
sys.path.insert(0,glob.glob('/content/CIANNA/src/build/lib.*/')[-1])
import CIANNA as cnn

load_epoch = 900

#Change image test mode in aux_fct to change network resolution in all functions
init_data_gen(test_mode=1)

#Batch size does not affect the inference result, but larger batch sizes process faster.
#Using FP16C_FP32A can slightly reduce the accuracy, but it strongly accelerates computation.
cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=nb_class,
	bias=0.1, b_size=16, comp_meth='C_CUDA', dynamic_load=1,
	mixed_precision="FP32C_FP32A", adv_size=30, inference_only=1)

#Compute on only half the validation set to reduce memory footprint
input_test, targets_test = create_val_batch()
cnn.create_dataset("TEST", nb_val, input_test[:,:], targets_test[:,:])

del (input_test, targets_test)
gc.collect()

if(load_epoch == 0):
	#If load epoch is 0, load the reference model instead
	if(not os.path.isfile("arch_ref_res32_err24.18_ms330.dat")):
		os.system("wget https://share.obspm.fr/s/TnLePm62SjCg4s4/download/arch_ref_res32_err24.18_ms330.dat")
	cnn.load("arch_ref_res32_err24.18_ms330.dat", load_epoch, bin=1)
else:
	cnn.load("net_save/net0_s%04d.dat"%load_epoch, load_epoch, bin=1)


cnn.forward(repeat=1, no_error=1, saving=2, drop_mode="AVG_MODEL")

start = time.perf_counter()
cnn.forward(no_error=1, saving=2, drop_mode="AVG_MODEL")
end = time.perf_counter()

cnn.perf_eval()

compute_time = (end-start)*1000 #in milliseconds
np.savetxt("compute_time.txt", [compute_time])

EOF

In [None]:
%%shell

cd /content/green_ai_opt_cnn/

python3 - <<EOF

from aux_fct import *

load_epoch = 900

init_data_gen(test_mode=1)
input_test, targets_test = create_val_batch()
score_eval(load_epoch)

EOF