# ***0. CONNECT GOOGLE DRIVE*** - for data processing

# ***1. Load Dataset - EDA - Processing Yolov5***

## 1.3 Processing: Train - Val - Test for Yolov5s

### 1.3.1 Reference structure of Yolov5s:



- https://github.com/deepakat002/yolov5_facemask
- https://www.pandaml.com/train-yolov5/
- https://www.kaggle.com/datasets/deepakat002/face-mask-detection-yolov5


```
yolov5
custom_dataset
├── custom_dataset.yaml
├── custom_model.yaml
└── images_and_labels
```

```
images_and_labels
├── images
│   ├── train
│   │   ├── train_001.jpg
│   │   ├── train_002.jpg
│   │   └── ...
│   ├── valid
│   │   ├── valid_001.jpg
│   │   ├── valid_002.jpg
│   │   └── ...
│   └── test
│       ├── test_001.jpg
│       ├── test_002.jpg
│       └── ...
└── labels
    ├── train
    │   ├── train_001.txt
    │   ├── train_002.txt
    │   └── ...
    └── valid
        ├── valid_001.txt
        ├── valid_002.txt
        └── ...
```
or (tested; this layout did not work)
```
images_and_labels
├── train
│   ├── images
│   │   ├── train_001.jpg
│   │   ├── train_002.jpg
│   │   └── ...
│   └── labels
│       ├── train_001.txt
│       ├── train_002.txt
│       └── ...
└── val
    ├── images
    │   ├── valid_001.jpg
    │   ├── valid_002.jpg
    │   └── ...
    └── labels
        ├── valid_001.txt
        ├── valid_002.txt
        └── ...
```
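Before training, a quick check of the first layout above can catch path or naming mistakes. The sketch below assumes the `images/<split>` and `labels/<split>` folders shown in the first tree; the function name `find_unlabeled_images` and the example root path are hypothetical. YOLOv5 treats an image without a label file as background, so an unexpectedly long list usually points to a naming mismatch rather than a hard error.

```python
from pathlib import Path

# Image extensions to consider; extend this set if the dataset uses other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(root, split):
    """Return names of images in images/<split> with no labels/<split>/<stem>.txt file."""
    image_dir = Path(root) / "images" / split
    label_dir = Path(root) / "labels" / split
    missing = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        if not (label_dir / f"{image_path.stem}.txt").exists():
            missing.append(image_path.name)
    return missing


# Example call (hypothetical path):
# print(find_unlabeled_images("custom_dataset/images_and_labels", "train"))
```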

# ***1'. Download the Kaggle dataset THAT I PROCESSED IN THE PREVIOUS STEP AND UPLOADED TO KAGGLE***
- https://www.kaggle.com/datasets/cngonngc/facedetection-widerfacedataset-yolov5-zip
- Uploading the dataset to Drive is also convenient, but synchronization is slow when switching between accounts to use free GPUs. Zipping the dataset and uploading it as a Kaggle dataset makes it easier to download when needed.


# ***2. Model: Yolov5s***

## ***2.1 SETUP YOLOV5S MODEL AND PREPROCESSING PRETRAINING***

## ***2.2 TRAINING YOLOV5 (same approach as PaddleOCR)*** with command-line training - shell script training

- The training command takes parameters for the training configuration: image size, number of epochs, paths to the data and model configuration files, initial weights, run name, and other options.
    - Using a pretrained weight file (`--weights`):
    The pretrained weights are adjusted during training on your custom dataset. This approach is used for transfer learning, where the pretrained model has already learned features from a large and diverse dataset.
    - Using a configuration file without pretrained weights (`--cfg`, empty `--weights`):
    The configuration file defines the model architecture, but no pretrained weights are loaded. The model is trained from scratch on your custom dataset.

    - `--cache`: caches the data (train.cache) in RAM to speed up training.
    - `--project` (logged locally/to wandb): the project name, here FaceDetection_Yolov5 in wandb.
    - `--name`: the name of each run (saved locally/in wandb).
    - `--entity`: doanngoccuong_nh (the name displayed in wandb), or the author name when running locally.
    - `--save-period`: how often (in epochs) model checkpoints are saved. If less than 1, this feature is disabled.
    
```python
# train.py
    # Logger arguments
    parser.add_argument('--entity', default=None, help='Entity')
    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='Upload data, "val" option')
    parser.add_argument('--bbox_interval', type=int, default=-1, help='Set bounding-box image logging interval')
    parser.add_argument('--artifact_alias', type=str, default='latest', help='Version of dataset artifact to use')
    
```
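The `nargs='?'` with `const` pattern in the excerpt above is what lets a flag such as `--upload_dataset` be passed either bare or with a value. A minimal standalone sketch using only `argparse` follows; the helper `build_logger_parser` is hypothetical and reproduces just two of the arguments quoted above, not the full `train.py` parser.

```python
import argparse


def build_logger_parser():
    """Build a tiny parser that mirrors two of the logger arguments quoted above."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--entity', default=None, help='Entity')
    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False,
                        help='Upload data, "val" option')
    return parser


parser = build_logger_parser()
print(parser.parse_args([]).upload_dataset)                           # flag absent: default, False
print(parser.parse_args(['--upload_dataset']).upload_dataset)         # bare flag: const, True
print(parser.parse_args(['--upload_dataset', 'val']).upload_dataset)  # flag with value: 'val'
```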

In [None]:
# git clone - prepare training
!git clone https://github.com/ultralytics/yolov5.git
!pip install -r /kaggle/working/yolov5/requirements.txt

Collecting thop>=0.1.1 (from -r /kaggle/working/yolov5/requirements.txt (line 14))
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Collecting ultralytics>=8.0.232 (from -r /kaggle/working/yolov5/requirements.txt (line 18))
  Obtaining dependency information for ultralytics>=8.0.232 from https://files.pythonhosted.org/packages/41/8f/37e7c14912a504df76212ff93eb8b78047fa2b3318dece8c2da6192231be/ultralytics-8.0.235-py3-none-any.whl.metadata
  Downloading ultralytics-8.0.235-py3-none-any.whl.metadata (35 kB)
Downloading ultralytics-8.0.235-py3-none-any.whl (677 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 677.8/677.8 kB 5.8 MB/s eta 0:00:00
Installing collected packages: thop, ultralytics
Successfully installed thop-0.1.1.post2209072238 ultralytics-8.0.235


In [None]:
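# Download the model previously trained for 10 epochs from the wandb artifact store
!pip install wandb
import wandb

# Do not hard-code the API key in the notebook: wandb.login() reads the
# WANDB_API_KEY environment variable if it is set, otherwise it prompts for the key.
wandb.login()
run = wandb.init()
artifact = run.use_artifact('doanngoccuong_nh/FaceDetection_Yolov5/facedet_widerface_cfgyolov5_colab:v0', type='model')
artifact_dir = artifact.download()  # local folder that holds the downloaded files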



[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mdoanngoccuong[0m ([33mdoanngoccuong_nh[0m). Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m:   1 of 1 files downloaded.  


In [None]:
# Rewrite custom_dataset.yaml so that its paths point to the Kaggle dataset.

# Contents of the .yaml file
yaml_content = """

# train: /content/drive/MyDrive/colab/custom_dataset/images_and_labels/images/train
# val: /content/drive/MyDrive/colab/custom_dataset/images_and_labels/images/val
train: /kaggle/input/facedetection-widerfacedataset-yolov5-zip/custom_dataset/images_and_labels/images/train
val: /kaggle/input/facedetection-widerfacedataset-yolov5-zip/custom_dataset/images_and_labels/images/val

# number of classes
nc: 1
# class names
names: ['face']
"""
# Create the file and write to it ('w' overwrites, 'a' appends)
with open('/kaggle/working/custom_dataset.yaml', 'w') as file:
    file.write(yaml_content)
print("custom_dataset.yaml file created successfully!")
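
Since only the image root differs between Colab and Kaggle, the YAML text can also be generated from a single variable. The sketch below is an illustration rather than part of the original notebook: `build_dataset_yaml` is a hypothetical helper, and the `images/train` and `images/val` subfolders are taken from the paths above.

```python
def build_dataset_yaml(image_root, class_names):
    """Return YOLOv5 dataset YAML text for the train/val folders under image_root."""
    root = image_root.rstrip("/")
    lines = [
        f"train: {root}/images/train",
        f"val: {root}/images/val",
        "",
        "# number of classes",
        f"nc: {len(class_names)}",
        "# class names",
        f"names: {class_names!r}",
    ]
    return "\n".join(lines) + "\n"


kaggle_root = "/kaggle/input/facedetection-widerfacedataset-yolov5-zip/custom_dataset/images_and_labels"
print(build_dataset_yaml(kaggle_root, ["face"]))
```

Switching environments then means changing `kaggle_root` only, which avoids keeping commented-out path variants in the file.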

In [None]:
# The --resume flag continues a training run that was interrupted; do not change any settings from the initial training command, such as --epochs, --batch, or --data.
# The --weights flag starts a new training run from a pre-trained model.

# Kaggle's GPU T4 x2 offers twice the capacity of a single GPU, so the batch size is doubled from 16 to 32.
# --cache caches the data for the train and test loaders; --name sets the run name in wandb.
# SUCCESSFUL - 18722.6s - GPU T4x2
%cd /kaggle/working/yolov5
!python train.py --img 640 --batch 32 --epochs 100 \
  --data /kaggle/working/custom_dataset.yaml \
  --weights '/kaggle/working/artifacts/facedet_widerface_cfgyolov5_colab:v0/best.pt' \
  --project FaceDetection_Yolov5   --name facedet_widerds_yolov5s_kaggle_ep10to100 \
  --entity doanngoccuong_nh \
  --cache
wandb.finish()  # End the run once training is done (turns off the green "running" dot in wandb)

# https://wandb.ai/doanngoccuong_nh/FaceDetection_Yolov5/artifacts/model/facedet_widerface_cfgyolov5_colab/v0
# --weights 'wandb-artifact://<entity>/<project>/model:v0' --name custom_model --cache \
# UNSUCCESSFUL
# THIS CODE DOES NOT WORK WHEN LOADING THE WEIGHTS DIRECTLY FROM WANDB
# !python train.py --img 640 --batch 32 --epochs 100 \
#   --data /kaggle/working/custom_dataset.yaml \
#   --weights 'wandb-artifact://doanngoccuong_nh/FaceDetection_Yolov5/facedet_widerface_cfgyolov5_colab:v0' \
#   --project FaceDetection_Yolov5   --name facedet_widerds_yolov5s_kaggle_ep10to100 \
#   --entity doanngoccuong_nh \
#   --cache


The wandb charts show that after 20 epochs the model still improves, but only slightly compared with the rate of improvement during the first 10 epochs (which also shows that the first 10 epochs went very well).
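
The comparison above is about rate of improvement, which can be made concrete as the average metric gain per epoch over two windows. The sketch below is only an illustration: `mean_gain_per_epoch` is a hypothetical helper, and the `toy_map` values are synthetic placeholders, not results from this run.

```python
def mean_gain_per_epoch(values, start, end):
    """Average change per epoch in `values` between index `start` and index `end`."""
    if not 0 <= start < end < len(values):
        raise ValueError("need 0 <= start < end < len(values)")
    return (values[end] - values[start]) / (end - start)


# Synthetic placeholder curve, NOT measured values from this training run.
toy_map = [0.10, 0.30, 0.45, 0.55, 0.60, 0.62, 0.63]
early = mean_gain_per_epoch(toy_map, 0, 3)  # slope over the first window
late = mean_gain_per_epoch(toy_map, 3, 6)   # slope over the later window
print(f"early gain/epoch: {early:.3f}, late gain/epoch: {late:.3f}")
```

With the real per-epoch metric exported from wandb, a late slope much smaller than the early slope would confirm the diminishing returns described above.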