<a 
href="https://colab.research.google.com/github/5af1/Pioneer-Alpha/blob/master/task3/catr.ipynb"
target="_parent">
<!--- "https://colab.research.google.com/github/saahiluppal/catr/blob/master/catr_demo.ipynb" -->
<img 
src="https://colab.research.google.com/assets/colab-badge.svg" 
alt="Open In Colab"/></a>


# Task3

Run image captioning codebase and update the code for the Bangla dataset.

# Drive mount

First we mount Google Drive. The main data is stored in the public Drive folders linked below. If the mounted paths used later in this notebook are not available, open the links directly.

The Bangla Dataset: [BanglaLekhaImageCaptions](https://drive.google.com/drive/u/2/folders/10BCfOwDyroU69Nn61LeZic0c9lQVF74a).

Trained model: [Saved Weight](https://drive.google.com/drive/u/2/folders/1AA2e7AH-lZzy9zGoztZSSwnjoZDffmGF).

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Clone github repo

All three tasks are stored in this repo. The main code for this task lives in a submodule (`task3/catr`) inside it, hence the `--recursive` flag in the clone command. Minor changes were made to load the dataset properly and to save the weights.

Major changes include:
- Updating the coco.py file to read .png files and to parse the image file names correctly.
- Changing the config.py file to save checkpoints in Google Drive.

In [None]:
! git clone --recursive https://github.com/5AF1/Pioneer-Alpha.git

Cloning into 'Pioneer-Alpha'...
remote: Enumerating objects: 101, done.
remote: Counting objects: 100% (36/36), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 101 (delta 23), reused 22 (delta 18), pack-reused 65
Receiving objects: 100% (101/101), 37.45 MiB | 13.88 MiB/s, done.
Resolving deltas: 100% (31/31), done.
Submodule 'task3/catr' (https://github.com/5AF1/catr) registered for path 'task3/catr'
Cloning into '/content/Pioneer-Alpha/task3/catr'...
remote: Enumerating objects: 143, done.        
remote: Counting objects: 100% (86/86), done.        
remote: Compressing objects: 100% (50/50), done.        
remote: Total 143 (delta 60), reused 48 (delta 36), pack-reused 57        
Receiving objects: 100% (143/143), 3.05 MiB | 12.02 MiB/s, done.
Resolving deltas: 100% (65/65), done.
Submodule path 'task3/catr': checked out 'f096715745a4c6a21bccaac82b5085511d900e38'


# Basic imports

In [None]:
from pathlib import Path
from zipfile import ZipFile
import requests, shutil, os
from PIL import Image
from tqdm import tqdm
import numpy as np
np.random.seed(42)

# Unzip Function

In [None]:
def my_unzip(archive,output,extract_here = False):
    output = output/archive.stem if not extract_here else output
    
    os.makedirs(output,exist_ok=True)

    with ZipFile(archive, 'r') as zip_ref:
        zip_ref.extractall(output)
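
As a rough illustration of the two modes of `my_unzip`, the sketch below builds a throwaway archive in a temporary directory and extracts it both ways. The names `demo.zip`, `a.txt`, `nested`, and `flat` are made up for this example; the function body is repeated only so the snippet runs on its own.

```python
import os
import tempfile
from pathlib import Path
from zipfile import ZipFile

def my_unzip(archive, output, extract_here=False):
    # Same logic as the notebook's function, repeated to keep the sketch self-contained
    output = output / archive.stem if not extract_here else output
    os.makedirs(output, exist_ok=True)
    with ZipFile(archive, 'r') as zip_ref:
        zip_ref.extractall(output)

tmp = Path(tempfile.mkdtemp())
archive = tmp / 'demo.zip'               # hypothetical archive name
with ZipFile(archive, 'w') as zf:
    zf.writestr('a.txt', 'hello')        # hypothetical member file

# Default: a sub-folder named after the archive is created under the output folder
my_unzip(archive, tmp / 'nested')
# extract_here=True: members land directly in the output folder
my_unzip(archive, tmp / 'flat', extract_here=True)

nested_ok = (tmp / 'nested' / 'demo' / 'a.txt').exists()
flat_ok = (tmp / 'flat' / 'a.txt').exists()
print(nested_ok, flat_ok)
```

With `extract_here=True` the archive's own top-level folders are the only nesting that remains.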


# Unzipping the dataset.

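The image archive `images.zip` from the shared Drive folder is extracted directly into the `catr` directory of the cloned repo.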

In [None]:
content = Path("/content")
drive_shortcut = Path('drive/.shortcut-targets-by-id/10BCfOwDyroU69Nn61LeZic0c9lQVF74a/paper dataset/')
image_zip_file = content/drive_shortcut/'images.zip'

In [None]:
my_unzip(image_zip_file,content/'Pioneer-Alpha/task3/catr',extract_here=True)

# Loading and saving functions for json files

In [None]:
import json
def load_data(file_):
    with open(file_, 'r', encoding='utf-8') as f:
        data = json.load(f)
    
    return data

def save_data(file_,data):
    os.makedirs(file_.parent,exist_ok=True)
    with open(file_,'w', encoding='utf-8') as f:
        json.dump(data, f, indent=4)
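
One detail worth knowing: `json.dump` defaults to `ensure_ascii=True`, so Bangla captions are written as `\uXXXX` escapes. The file still loads back correctly, but it is hard to read by eye. The sketch below compares the two settings on a single word from the dataset; passing `ensure_ascii=False` is an optional variation, not something the notebook's `save_data` does.

```python
import json

caption = 'রাস্তা'  # a Bangla word taken from one of the dataset captions

# Default behaviour: non-ASCII characters are escaped as \uXXXX
escaped = json.dumps({'caption': caption})
# Optional variation: keep the Bangla text readable in the file
readable = json.dumps({'caption': caption}, ensure_ascii=False)

# Both forms decode to the same Python object
same = json.loads(escaped) == json.loads(readable) == {'caption': caption}

print(escaped)
print(readable)
print(same)
```

Either setting is safe for training because the decoded strings are identical; the choice only affects how the JSON file looks in an editor.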

# json file structure

For training, a proper COCO-style annotation JSON file has the following structure:
```python
captions_2017 = {
                "info":INFO_DICT,
                "licenses":LICENSES_LIST,
                "images":images_list,
                "annotations":annotations_list,
    }
```
`images_list` and `annotations_list` are lists of dictionaries with the following structure:
```python
image_dict = {
            "file_name": "000000000042.jpg",
            "height": 478,
            "width": 640,
            "id": 42
}

annotations_dict = {
            "image_id": 74,
            "id": 145996,
            "caption": "A picture of a dog laying on the ground."
}
```
`INFO_DICT` has the following structure:
```python
INFO_DICT = {
        "description": "BanglaLekhaImageCaptions Dataset",
        "url": "https://data.mendeley.com/datasets/rxxch9vw59/2",
        "version": "2.0",
        "year": 2019,
        "contributor": "Nafees Mansoor, Abrar Hasin Kamal, Nabeel Mohammed, Sifat Momen, Md Matiur Rahman",
        "date_created": "2019/07/28"
    }
```

The JSON file that ships with our dataset, by contrast, has the following structure:
```python
caption = [
    {
        'caption': ['রাস্তা দিয়ে কতকগুলো শিশু সারিবদ্ধ ভাবে হেঁটে যাচ্ছে।','কিছু বাচ্চা গ্রামের রাস্তা দিয়ে হাতে যাচ্ছে এবং সাথে একজন পুরুষ মানুষ।'],
        'filename': '981.png'
    },
    {},{},{},...
]
```

So, given the captions.json file, we create three other JSON files:
1. captions_train2017.json
1. captions_val2017.json
1. captions_test2017.json
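
To make the conversion concrete, here is a small sketch of how a single entry of captions.json maps onto the COCO-style dictionaries, following the same id and file-name rules as the `create_annotation_json` function in this notebook. The caption strings and the `height`/`width` values are placeholders; the real function reads the image size from the file.

```python
entry = {
    'caption': ['first caption', 'second caption'],  # placeholder captions
    'filename': '981.png',
}

# The numeric part of the file name becomes the image id
id_no = int(entry['filename'].split('.')[0])

image_dict = {
    'file_name': entry['filename'].zfill(9),  # zero-padded to 9 characters
    'height': 0,                              # placeholder, read from the image in practice
    'width': 0,                               # placeholder, read from the image in practice
    'id': id_no,
}

# One annotation per caption, with ids derived from the image id
annotations = [
    {'image_id': id_no, 'id': id_no * 1000 + i + 1, 'caption': text}
    for i, text in enumerate(entry['caption'])
]

print(image_dict['file_name'], [a['id'] for a in annotations])
```

The `id_no * 1000 + i + 1` scheme keeps annotation ids unique as long as each image has fewer than 1000 captions, and it assumes every file name starts with an integer.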

---
Two constants are used when creating the JSON files.

In [None]:
INFO_DICT = {
        "description": "BanglaLekhaImageCaptions Dataset",
        "url": "https://data.mendeley.com/datasets/rxxch9vw59/2",
        "version": "2.0",
        "year": 2019,
        "contributor": "Nafees Mansoor, Abrar Hasin Kamal, Nabeel Mohammed, Sifat Momen, Md Matiur Rahman",
        "date_created": "2019/07/28"
    }

LICENSES_LIST = []

## First we load the data from captions.json
---

In [None]:
caption_path = Path('/content/Pioneer-Alpha/task3/bl_cap/captions.json')
caption_data = load_data(caption_path)

## Shuffle and split the indices of the caption list into train, validation, and test sets
---

In [None]:
np.random.seed(42)
indices = np.arange(len(caption_data))
np.random.shuffle(indices)

split_frac_size = [.7,.2,.1]
split_frac_points = np.cumsum(split_frac_size)
split_frac_points = split_frac_points[:-1]

train_indices, val_indices, test_indices = np.split(indices,[round( len(indices)*frac ) for frac in split_frac_points])
len(train_indices), len(val_indices), len(test_indices)

(6408, 1831, 915)
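
The split arithmetic can be checked in isolation. The sketch below assumes 9154 entries (the sum of the three sizes reported above) and repeats the cumulative-fraction logic, then confirms that the three parts are disjoint and cover every index. The variable names are illustrative.

```python
import numpy as np

n_items = 9154  # 6408 + 1831 + 915, the sizes reported above

np.random.seed(42)
indices = np.arange(n_items)
np.random.shuffle(indices)

fractions = [.7, .2, .1]
cut_fracs = np.cumsum(fractions)[:-1]   # roughly [0.7, 0.9]; the final 1.0 is dropped
cut_points = [int(round(n_items * frac)) for frac in cut_fracs]

train, val, test = np.split(indices, cut_points)
sizes = (len(train), len(val), len(test))

# Together the three parts should contain each index exactly once
covered = np.array_equal(np.sort(np.concatenate([train, val, test])), np.arange(n_items))

print(cut_points, sizes, covered)
```

Because the cut points are rounded, the actual proportions differ slightly from 70/20/10, but no entry is lost or duplicated.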

In [None]:
train_indices

array([8874, 2214, 4112, ..., 4082, 1486, 3678])

## Split the data
---

In [None]:
train_dict_list = [caption_data[i] for i in train_indices]
val_dict_list = [caption_data[i] for i in val_indices]
test_dict_list = [caption_data[i] for i in test_indices]

# Create a function to generate the json files and copy the image files.

A `coco` folder is created inside `task3`, next to the `catr` code, and it needs three subfolders for training to work:
1. annotations (to store the json files captions_train2017.json and captions_val2017.json)
1. train2017 (to store the images for training)
1. val2017 (to store the images for validation)

In [None]:
def create_annotation_json(data_dict_list,images_path,coco_path,tvt='train'):
    images_list = []
    annotations_list = []


    # Traverse each data_dict in the given list
    for data_dict in tqdm(data_dict_list,desc = tvt):
        img_file_name = data_dict['filename']
        caption_list = data_dict['caption']

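        id_no = int(img_file_name.split('.')[0])
        # Open the image only to read its size; the context manager closes the file handle
        with Image.open(images_path/img_file_name) as img_src:
            height, width = img_src.height, img_src.width

        image_dict = {
            # Zero-pad the file name to 9 characters, e.g. '981.png' -> '00981.png'
            "file_name": img_file_name.zfill(5+4),
            "height": height,
            "width": width,
            "id": id_no
        }
        # Iterate over each caption to create separate annotations_list elements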
        for i,caption in enumerate(caption_list):
            annotations_dict = {
                "image_id": id_no,
                "id": id_no*1000+i+1,
                "caption": caption
            }
            annotations_list.append(annotations_dict)
        
        images_list.append(image_dict)

        #Copying the image to the proper directory
        src_path = images_path/img_file_name
        dst_path = coco_path/f'{tvt}2017'/image_dict['file_name']

        os.makedirs(dst_path.parent,exist_ok=True)
        shutil.copy(src_path,dst_path)

    # Saving the json file
    captions_2017 = {
                "info":INFO_DICT,
                "licenses":LICENSES_LIST,
                "images":images_list,
                "annotations":annotations_list,
    }
    captions_2017_path = coco_path/'annotations'/f'captions_{tvt}2017.json'
    save_data(captions_2017_path,captions_2017)



In [None]:
images_path = Path('/content/Pioneer-Alpha/task3/catr/images')
coco_path = Path('/content/Pioneer-Alpha/task3/coco/')

create_annotation_json(train_dict_list,images_path,coco_path,tvt='train')
create_annotation_json(val_dict_list,images_path,coco_path,tvt='val')
create_annotation_json(test_dict_list,images_path,coco_path,tvt='test')

train: 100%|██████████| 6408/6408 [00:56<00:00, 114.34it/s]
val: 100%|██████████| 1831/1831 [00:16<00:00, 108.31it/s]
test: 100%|██████████| 915/915 [00:08<00:00, 109.89it/s]
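
Before training, it can be worth running a quick consistency check on the generated files. The sketch below is one possible check, not part of the original pipeline: the helper name `check_annotations` is hypothetical, and it runs here on a tiny synthetic file so that it works on its own.

```python
import json
import tempfile
from pathlib import Path

def check_annotations(json_path):
    """Return (n_images, n_annotations) after basic consistency checks."""
    with open(json_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    image_ids = [img['id'] for img in data['images']]
    ann_ids = [ann['id'] for ann in data['annotations']]

    # Image ids and annotation ids must each be unique
    assert len(image_ids) == len(set(image_ids)), 'duplicate image id'
    assert len(ann_ids) == len(set(ann_ids)), 'duplicate annotation id'

    # Every annotation must point at an image listed in the same file
    known = set(image_ids)
    assert all(ann['image_id'] in known for ann in data['annotations']), 'dangling image_id'

    return len(image_ids), len(ann_ids)

# Tiny synthetic file in the same layout as the generated JSON
demo = {
    'info': {},
    'licenses': [],
    'images': [{'file_name': '00981.png', 'height': 1, 'width': 1, 'id': 981}],
    'annotations': [
        {'image_id': 981, 'id': 981001, 'caption': 'a'},
        {'image_id': 981, 'id': 981002, 'caption': 'b'},
    ],
}
demo_path = Path(tempfile.mkdtemp()) / 'captions_demo2017.json'
demo_path.write_text(json.dumps(demo), encoding='utf-8')

counts = check_annotations(demo_path)
print(counts)
```

On the real output, the call would be `check_annotations(coco_path/'annotations'/'captions_train2017.json')`. The training log further down reports `Train: 12815` and `Valid: 3662`, about two captions per image for 6408 and 1831 images, so the annotation count is presumably the number to compare against.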


# Change directory
Go to the catr directory for training and prediction

In [None]:
%cd /content/Pioneer-Alpha/task3/catr

/content/Pioneer-Alpha/task3/catr


In [None]:
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Installing collected packages: pyyaml, tokenizers, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninstall

# Prediction for the model I trained.

In [None]:
!python predict.py --path images/1896.png --v v25 --checkpoint ../../../drive/.shortcut-targets-by-id/1AA2e7AH-lZzy9zGoztZSSwnjoZDffmGF/catr/checkpoint.pth
#12 1105

Checking for checkpoint.
Found checkpoint! Loading!
Loading Checkpoint...


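## I trained the model for 30 epochs over 1.5 days in Colab

The weights are stored in Google Drive. Training was spread over several Colab sessions, each resuming from the saved checkpoint, which is why the logs below restart at epochs 17 and 20.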

In [None]:
!python -W ignore main.py

Initializing Device: cuda
Number of params: 83959866
Train: 12815
Valid: 3662
Start Training..
Epoch: 0
100% 400/400 [11:52<00:00,  1.78s/it]
Training Loss: 1.4886506993323565
100% 115/115 [01:42<00:00,  1.13it/s]
Validation Loss: 0.595809015761251

Epoch: 1
100% 400/400 [11:57<00:00,  1.79s/it]
Training Loss: 0.5492002014070749
100% 115/115 [01:42<00:00,  1.13it/s]
Validation Loss: 0.46083275598028434

Epoch: 2
100% 400/400 [11:59<00:00,  1.80s/it]
Training Loss: 0.4535487465560436
100% 115/115 [01:42<00:00,  1.12it/s]
Validation Loss: 0.3982638268367104

Epoch: 3
100% 400/400 [11:58<00:00,  1.80s/it]
Training Loss: 0.4036784271150827
100% 115/115 [01:42<00:00,  1.12it/s]
Validation Loss: 0.367786269084267

Epoch: 4
100% 400/400 [11:58<00:00,  1.80s/it]
Training Loss: 0.37383348800241945
100% 115/115 [01:42<00:00,  1.12it/s]
Validation Loss: 0.3467674408269965

Epoch: 5
100% 400/400 [11:56<00:00,  1.79s/it]
Training Loss: 0.35282120361924174
100% 115/115 [01:45<00:00,  1.09it/s]
Valid

In [None]:
!python -W ignore main.py

Initializing Device: cuda
Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /root/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100% 171M/171M [00:01<00:00, 143MB/s]
Number of params: 83959866
Downloading: 100% 226k/226k [00:00<00:00, 929kB/s]
Downloading: 100% 28.0/28.0 [00:00<00:00, 26.0kB/s]
Downloading: 100% 570/570 [00:00<00:00, 562kB/s]
Train: 12815
Valid: 3662
Loading Checkpoint...
Start Training..
Epoch: 17
100% 400/400 [11:42<00:00,  1.76s/it]
Training Loss: 0.2401186903938651
100% 115/115 [01:40<00:00,  1.15it/s]
Validation Loss: 0.2744591354028038

Epoch: 18
100% 400/400 [11:46<00:00,  1.77s/it]
Training Loss: 0.2336150925606489
100% 115/115 [01:42<00:00,  1.13it/s]
Validation Loss: 0.2736491858959198

Epoch: 19
 28% 110/400 [03:22<08:25,  1.74s/it]

In [None]:
!python -W ignore main.py

Initializing Device: cuda
Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /root/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100% 171M/171M [00:00<00:00, 253MB/s]
Number of params: 83959866
Downloading: 100% 226k/226k [00:00<00:00, 266kB/s]
Downloading: 100% 28.0/28.0 [00:00<00:00, 24.8kB/s]
Downloading: 100% 570/570 [00:00<00:00, 489kB/s]
Train: 12815
Valid: 3662
Loading Checkpoint...
Start Training..
Epoch: 20
100% 400/400 [11:27<00:00,  1.72s/it]
Training Loss: 0.1950124372728169
100% 115/115 [01:42<00:00,  1.13it/s]
Validation Loss: 0.2765206084303234

Epoch: 21
100% 400/400 [11:30<00:00,  1.73s/it]
Training Loss: 0.1873413324728608
100% 115/115 [01:40<00:00,  1.15it/s]
Validation Loss: 0.2793176759844241

Epoch: 22
100% 400/400 [11:32<00:00,  1.73s/it]
Training Loss: 0.18051403019577264
100% 115/115 [01:40<00:00,  1.14it/s]
Validation Loss: 0.2808492205713106

Epoch: 23
100% 400/400 [11:31<00:00,  1.73s/it]
Training Loss: 0.20310788014903663
100
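
In the excerpts above, training loss keeps falling after epoch 18 while validation loss creeps up (0.2736 at epoch 18, 0.2808 at epoch 22). A quick way to spot this is to pull the per-epoch numbers out of the log text. The sketch below does that with a regular expression over a few lines copied from the output above, with the progress-bar lines removed. The pattern and variable names are illustrative and not part of the `catr` code, and the result only covers the epochs visible in this notebook.

```python
import re

# Lines copied from the training output above (progress bars omitted)
log_text = """
Epoch: 17
Training Loss: 0.2401186903938651
Validation Loss: 0.2744591354028038
Epoch: 18
Training Loss: 0.2336150925606489
Validation Loss: 0.2736491858959198
Epoch: 20
Training Loss: 0.1950124372728169
Validation Loss: 0.2765206084303234
Epoch: 21
Training Loss: 0.1873413324728608
Validation Loss: 0.2793176759844241
Epoch: 22
Training Loss: 0.18051403019577264
Validation Loss: 0.2808492205713106
"""

# One match per complete epoch: epoch number, training loss, validation loss
pattern = re.compile(
    r'Epoch: (\d+)\s+Training Loss: ([\d.]+)\s+Validation Loss: ([\d.]+)'
)
records = [(int(e), float(t), float(v)) for e, t, v in pattern.findall(log_text)]

# Pick the epoch with the lowest validation loss among those parsed
best_epoch, _, best_val = min(records, key=lambda r: r[2])
print(best_epoch, best_val)
```

A gap like this between training and validation loss is a common sign of overfitting, so the checkpoint from the final epoch is not necessarily the best one for prediction.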

# Thank you.