# End-to-End Pipeline

In [None]:
import json
from src import getSystemInfo
import os
from constants import DATASETS
import sys
from dataset import BoltNutDataset
from torch.utils.data import DataLoader
from torchvision import transforms
from src import video_to_img
from dataset_analysis import  DatasetNumericalAnalysis
import matplotlib.pyplot as plt
import cv2
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import tqdm
import wandb
import torch
from config import Config
from model import *



## Hardware specs

In [None]:
json.loads(getSystemInfo())

In [None]:
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))

## Preprocessing  
Extract the frames of each video and save them as images for training.

In [None]:
# for dt in DATASETS:
#     video_to_img('/home/cagnur/stroma/dataset/images/'+dt+'/'+dt+'.mp4', '/home/cagnur/stroma/dataset/images/'+dt)

## Data

### Data Analysis: Numerical

In [None]:
annotation_path = '/home/cagnur/stroma/dataset/annotations'
analysis = DatasetNumericalAnalysis(annotation_path)
analysis.vis_compare_categories()

**Comment**: This dataset distribution is typical of the classical ML era. In the deep learning era, this distribution has changed drastically.  
**Conclusion:** We have far less data than deep learning approaches usually rely on (data size ~1M). Transfer learning lets us still benefit from deep learning.

In [None]:
analysis.vis_compare_subcategories()

**Comment 1:** The validation and test sets should have the same distribution. Otherwise, results on the validation set may not carry over to the test set.  
**Comment 2:** Because #nut is significantly smaller than #bolt, we need to apply data augmentation to balance the classes.

**Conclusion:** Good to go :) The dataset contains no unknown categories and no inconsistencies.
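
To put a number on the imbalance from Comment 2, a minimal sketch is shown below. It assumes COCO-style annotations (suggested by the `instances_*.json` file names); the category ids, the names `bolt` and `nut`, and the counts are made up for illustration.

```python
from collections import Counter

def class_weights(annotations, categories):
    """Return per-class counts and inverse-frequency weights.

    Assumes COCO-style dicts: each annotation has "category_id",
    each category has "id" and "name".
    """
    id_to_name = {c["id"]: c["name"] for c in categories}
    counts = Counter(id_to_name[a["category_id"]] for a in annotations)
    total = sum(counts.values())
    # total / (num_classes * count): a perfectly balanced dataset
    # would give every class a weight of 1.0.
    weights = {name: total / (len(counts) * n) for name, n in counts.items()}
    return dict(counts), weights

# Toy data standing in for the annotation file (hypothetical numbers).
categories = [{"id": 1, "name": "bolt"}, {"id": 2, "name": "nut"}]
annotations = [{"category_id": 1}] * 6 + [{"category_id": 2}] * 2

counts, weights = class_weights(annotations, categories)
print(counts, weights)
```

Weights like these could drive a weighted sampler or decide how many augmented copies of the rarer class to generate.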

#### Challenge for Data Augmentation:  
To increase the number of nuts, we could extract nut pixels, apply rotation or similar transformations, and paste them at random locations in the training images. With segmentation masks, this would be much easier. However, we only have bounding boxes. Cropping a bounding box, augmenting it, and pasting it at a random location might hurt training, because the background pixels inside the box will not match the new surroundings.
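
The "random places" step can be sketched as follows. This is only an illustration under assumptions: boxes are `(x1, y1, x2, y2)` tuples, the helper names are hypothetical, and the image size and existing box are made up. It picks a location that stays inside the image and avoids existing objects; it does nothing about the background inconsistency described above.

```python
import random

def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def sample_paste_box(img_w, img_h, box_w, box_h, existing, rng, max_tries=100):
    """Pick a random spot for a box_w x box_h patch that stays inside the
    image and does not overlap any existing box. Returns None on failure."""
    for _ in range(max_tries):
        x1 = rng.randint(0, img_w - box_w)
        y1 = rng.randint(0, img_h - box_h)
        candidate = (x1, y1, x1 + box_w, y1 + box_h)
        if not any(boxes_overlap(candidate, b) for b in existing):
            return candidate
    return None

rng = random.Random(0)                 # fixed seed for reproducibility
existing = [(100, 100, 200, 180)]      # hypothetical box already in the image
new_box = sample_paste_box(640, 640, 40, 40, existing, rng)
print(new_box)
```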

### Data Analysis: Computer Vision Point of View

Because lighting conditions are part of the problem definition, converting the images to the HLS color space might help. HLS represents an image by its hue, lightness, and saturation components instead of RGB. Why? Under different lighting conditions, most of the brightness change ends up in the L channel, so the H and S channels stay comparatively stable.

In [None]:
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(analysis.train_img_num, size=(1,)).item()
    path = os.path.join('/home/cagnur/stroma/dataset/images/train/imgs',str(sample_idx).zfill(4)+'.jpg')
    bgr_img = cv2.imread(path)
    hls_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HLS)
    figure.add_subplot(rows, cols, i)
    plt.axis("off")
    plt.imshow(hls_img.reshape((640,640,3)))
plt.show()
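
The intuition behind the H and S channels can be checked on a single pixel. The sketch below uses the standard library's `colorsys` module instead of OpenCV and models a lighting change as a uniform scaling of the RGB values, which is an idealized assumption; the pixel values are made up.

```python
import colorsys

def to_hls(rgb):
    """Convert an (r, g, b) tuple with values in [0, 1] to (h, l, s)."""
    return colorsys.rgb_to_hls(*rgb)

bright = (0.6, 0.3, 0.15)               # hypothetical pixel under strong light
dim = tuple(0.5 * c for c in bright)    # same surface with half the light

h1, l1, s1 = to_hls(bright)
h2, l2, s2 = to_hls(dim)
print(f"bright: h={h1:.3f} l={l1:.3f} s={s1:.3f}")
print(f"dim:    h={h2:.3f} l={l2:.3f} s={s2:.3f}")
```

For this pixel, only the lightness changes while hue and saturation stay the same. Real lighting changes (shadows, color casts, highlights) are not a pure scaling, so in practice H and S are only approximately stable.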

I added the HLS conversion inside the MyDataset class, because there is no built-in torch transformation for it.

Data augmentation increases the amount of data in a dataset, which is especially useful for imbalanced data. However, that is not its only purpose. Sometimes we apply augmentation techniques to decrease training time or to boost the model's performance.

### Data Transformation

We can add as many transformations as we like. The important point is that not every transformation should be applied to the validation set: random augmentations belong in training only.

In [None]:
rotate = transforms.RandomRotation(degrees=15)
hFlip = transforms.RandomHorizontalFlip(p=0.25)
vFlip = transforms.RandomVerticalFlip(p=0.25)
trainTransforms = transforms.Compose([hFlip, vFlip, rotate,
        transforms.ToTensor()])
valTransforms = transforms.Compose([transforms.ToTensor()])
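
One caveat for object detection: geometric transforms such as flips move the objects, so the bounding boxes have to be transformed together with the image. Whether `BoltNutDataset` already handles this is not shown here, so the helpers below are a hypothetical sketch, assuming boxes in `(x1, y1, x2, y2)` pixel coordinates.

```python
def hflip_box(box, img_w):
    """Mirror an (x1, y1, x2, y2) box horizontally in an image of width img_w."""
    x1, y1, x2, y2 = box
    return (img_w - x2, y1, img_w - x1, y2)

def vflip_box(box, img_h):
    """Mirror an (x1, y1, x2, y2) box vertically in an image of height img_h."""
    x1, y1, x2, y2 = box
    return (x1, img_h - y2, x2, img_h - y1)

box = (100, 50, 180, 120)
flipped = hflip_box(box, img_w=640)
print(flipped)  # (460, 50, 540, 120)
```

Applying the same flip twice returns the original box, which is a quick sanity check for this kind of helper.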

### Dataloader

In [None]:
data_path = '/home/cagnur/stroma/dataset/'
video_path = os.path.join(data_path, 'images')
ann_paths = os.path.join(data_path, 'annotations')

for dt in DATASETS:
    img_path = os.path.join(video_path, dt+'/imgs')
    ann_path = os.path.join(ann_paths, 'instances_'+dt+'.json')
    if dt == 'train':        
        train_set = BoltNutDataset(img_path,ann_path, trainTransforms)
        train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
    elif dt == 'val':
        val_set = BoltNutDataset(img_path,ann_path, valTransforms)
        # No shuffling for evaluation, so results are reproducible
        val_loader = DataLoader(val_set, batch_size=4, shuffle=False)
    elif dt == 'test':
        test_set = BoltNutDataset(img_path,ann_path, valTransforms)
        test_loader = DataLoader(test_set, batch_size=4, shuffle=False)
    else:
        print("Unknown!")
        sys.exit()
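
In detection, each image can contain a different number of boxes, so the default batching (which stacks samples into one tensor) often does not fit. A common idiom is a custom `collate_fn` that keeps the samples as tuples. The sketch below assumes the dataset returns `(image, target)` pairs, which is not confirmed for `BoltNutDataset`; the function name is hypothetical and strings stand in for image tensors.

```python
def detection_collate(batch):
    """Turn a list of (image, target) pairs into (images, targets) tuples
    without stacking, so every image can carry its own number of boxes."""
    return tuple(zip(*batch))

# Toy batch: strings stand in for image tensors.
batch = [
    ("img0", {"boxes": [[0, 0, 10, 10]]}),
    ("img1", {"boxes": [[5, 5, 20, 20], [30, 30, 40, 40]]}),
]
images, targets = detection_collate(batch)
print(images, [len(t["boxes"]) for t in targets])
```

It would be passed to the loader as `DataLoader(train_set, batch_size=4, shuffle=True, collate_fn=detection_collate)`.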

## Model

Although I added model.py, which contains several models, I will not use them for this challenge. I included the file to show the clarity and efficiency that OOP brings to the code. It also makes debugging easier (clean code principles).

I started with Faster R-CNN because of this paper: https://www.nature.com/articles/s41598-021-02805-y.pdf
I could not write this model in the same style as the models in model.py. I got an error and skipped it because of the time limit.

In [None]:
def modified_faster_rcnn():
    # load a model pre-trained on COCO
    FasterRCNN = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    # replace the classifier with a new one, that has
    # num_classes which is user-defined
    num_classes = 3  # 2 classes (bolt, nut) + background
    # get number of input features for the classifier
    in_features = FasterRCNN.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    FasterRCNN.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return FasterRCNN

In [None]:
# print(model)

The following functions come from one of my homework assignments, where I used them to prune a model at different proportions. I then reported the effects of the two pruning methods by comparing, for example, pruning proportion against accuracy and CPU inference time.

In [None]:
from torch.nn.utils import prune

def prune_model_l1_unstructured(model, layer_type, proportion):
    # Zero the smallest-magnitude weights in every layer of the given type
    for module_name, module in model.named_modules():
        if isinstance(module, layer_type):
            prune.l1_unstructured(module, 'weight', proportion)
            # Make the pruning permanent (drops the mask and weight_orig)
            prune.remove(module, 'weight')
    return model

def prune_model_l1_structured(model, layer_type, proportion):
    # Zero whole slices along dim=1 (input channels for Conv2d), ranked by L1 norm
    for module_name, module in model.named_modules():
        if isinstance(module, layer_type):
            prune.ln_structured(module, 'weight', proportion, n=1, dim=1)
            prune.remove(module, 'weight')
    return model

# Example (requires `import copy`):
# compressed_0p7_model = prune_model_l1_unstructured(copy.deepcopy(uncompressed_model), torch.nn.Conv2d, 0.7)
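
To make the idea behind L1 unstructured pruning concrete, here is a minimal pure-Python sketch on a flat list of weights. It illustrates the principle (zero the given proportion of entries with the smallest absolute value) and is not the `torch.nn.utils.prune` implementation; the function name and the weights are made up.

```python
def l1_unstructured_mask(weights, proportion):
    """Return a 0/1 mask that zeroes the `proportion` of entries
    with the smallest absolute value."""
    k = round(proportion * len(weights))
    # Indices sorted by magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0 if i in pruned else 1 for i in range(len(weights))]

weights = [0.9, -0.05, 0.4, -0.7, 0.01, 0.2]
mask = l1_unstructured_mask(weights, proportion=0.5)
pruned_weights = [w * m for w, m in zip(weights, mask)]
print(mask)  # [1, 0, 1, 1, 0, 0]
```

Structured pruning follows the same ranking idea, but ranks and removes whole channels by their norm instead of individual weights.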


## Configuration

For this part, I use wandb :). With a student license, I can use it for free. The number of epochs, the optimizer, and similar settings are defined in the Config class. I can also define the candidate values for hyperparameter search there. For example, the Config class sweeps over momentum and learning rate.

In [None]:
config = Config('faster')
config.train_dataloader = train_loader
config.valid_dataloader = val_loader
config.test_dataloader = test_loader
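
For reference, a wandb grid sweep is described by a plain dictionary. The sketch below shows what such a definition could look like and which runs a grid search would visit. The actual definition lives in `Config.sweep` and is not shown in this notebook, so the values here are hypothetical, and `grid_points` is a helper written only for this illustration.

```python
from itertools import product

# Hypothetical sweep definition; the real one lives in Config.sweep.
sweep = {
    "method": "grid",
    "metric": {"name": "Accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [0.01, 0.005, 0.001]},
        "momentum": {"values": [0.8, 0.9]},
    },
}

def grid_points(sweep_config):
    """Enumerate every hyperparameter combination a grid sweep would visit."""
    names = list(sweep_config["parameters"])
    value_lists = [sweep_config["parameters"][n]["values"] for n in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

runs = grid_points(sweep)
print(len(runs), runs[0])
```

The number of runs grows multiplicatively with each added parameter, which is worth keeping in mind when every run trains a detector.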

## Experiment

In [None]:
def grid_search(config, model):
    sweep_id = wandb.sweep(config.sweep, entity = "cagnur", project="Stroma")
    
    def train():
        wandb.init()
        # Training
        if model == 'cnn':
            config.model = Net(wandb.config.hidden_dim).cuda()
        elif model == 'resnet':
            config.model = ResNet(wandb.config.hidden_dim).cuda()
        elif model == 'mlp':
            config.model = MLP(wandb.config.hidden_dim).cuda()
        elif model == 'efficient':
            config.model = Efficient().cuda()
        elif model == 'faster':
            config.model = modified_faster_rcnn().cuda()
        config.optimizer = torch.optim.SGD(config.model.parameters(), lr=wandb.config.learning_rate, momentum=wandb.config.momentum)
        wandb.watch(config.model, config.criterion, log = 'all', log_freq = config.log_freq)
        config.model.train()
        counter = 0
        for epoch in range(config.epoch):            
            for imgs, labels in tqdm.tqdm(config.train_dataloader):
                imgs, labels = imgs.cuda(), labels.cuda()
                # imgs, labels = imgs, labels

                out = config.model(imgs)
                loss = config.criterion(out, labels)
                config.optimizer.zero_grad()
                loss.backward()
                config.optimizer.step()
                counter += 1
                if counter % 5 == 0:
                    wandb.log({'Loss': loss.item()}, step = counter)
        # Training is done
        # Validation
        config.model.eval()
        correct = 0
        with torch.no_grad():
            for imgs, labels in tqdm.tqdm(config.valid_dataloader):
                imgs, labels = imgs.cuda(), labels.cuda()
                # imgs, labels = imgs, labels
                out = config.model(imgs)
                predictions = out.argmax(dim=1, keepdim=True)  
                correct += predictions.eq(labels.view_as(predictions)).sum().item()
        accuracy = correct/len(config.valid_dataloader.dataset)
        wandb.log({"Accuracy":accuracy} )
        # Validation is done
        # Export the model   
        # torch.onnx.export(config.model,         # model being run 
        #                  imgs,     # model input (or a tuple for multiple inputs) 
        #                  "model.onnx",     # where to save the model  
        #                  export_params=True # store the trained parameter weights inside the model file 
        #                  )
        # wandb.save("model.onnx")
    wandb.agent(sweep_id, function=train)
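
The loop above follows the classification pattern: a batched tensor goes in, and a criterion compares the output with the labels. torchvision detection models such as Faster R-CNN behave differently in training mode: they are called with a list of image tensors and a list of target dicts (with `boxes` and `labels` keys) and return a dict of losses. This may or may not be the cause of the error mentioned earlier. A minimal sketch of one training step under that interface (the function name is hypothetical):

```python
def detection_train_step(model, optimizer, images, targets):
    """Run one optimisation step for a torchvision-style detection model.

    In training mode such a model is called as model(images, targets)
    and returns a dict of named losses instead of predictions.
    """
    loss_dict = model(images, targets)
    loss = sum(loss_dict.values())  # total loss = sum of the individual terms
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss

# Intended usage inside the epoch loop (assumes batches of image/target tuples):
#   images = [img.cuda() for img in imgs]
#   targets = [{k: v.cuda() for k, v in t.items()} for t in labels]
#   loss = detection_train_step(config.model, config.optimizer, images, targets)
```

In evaluation mode the same models return a list of dicts with `boxes`, `labels`, and `scores`, so the argmax-based accuracy above would need to be replaced by a detection metric such as mAP.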

In [None]:
torch.cuda.empty_cache()
wandb.login()


In [None]:
grid_search(config, 'faster')

Again, because of the time limit, I could not run the training and hyperparameter search, since I hit an error in the model part. Still, I believe the approach I am aiming for is clear and straightforward to implement.