# Main Training Script

> The main entry point for training; it orchestrates the project's modules to train a model.

## Description:
The main module is the project's primary training entry point. It combines the task definitions from the core module with the data loading from the data module, and trains the model through PyTorch Lightning's Trainer. By editing the configuration classes, users can switch between datasets, models, and training strategies to run different experiments.
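The configuration-driven switching described above can be sketched with nested dataclasses. This only illustrates the pattern: `TaskConfig`, `ModelConfig`, and `DatasetConfig` are hypothetical stand-ins, not the project's actual `ClassificationTaskConfig`, and the field names and defaults are assumptions loosely modelled on the attributes used later in this notebook.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ModelConfig:
    # Hypothetical field: name of the pretrained checkpoint to load.
    checkpoint: str = "WinKawaks/vit-tiny-patch16-224"


@dataclass
class DatasetConfig:
    # Hypothetical fields: dataset name and loader batch size.
    name: str = "cifar100"
    batch_size: int = 64


@dataclass
class TaskConfig:
    # default_factory gives every TaskConfig its own nested config objects.
    cls_model_config: ModelConfig = field(default_factory=ModelConfig)
    dataset_config: DatasetConfig = field(default_factory=DatasetConfig)
    learning_rate: float = 1e-3


base = TaskConfig()

# Switch one aspect of the experiment and leave the rest untouched.
variant = replace(base, dataset_config=DatasetConfig(batch_size=128))
```

Keeping each concern in its own small config class means an experiment variant differs from the baseline in exactly the fields that were replaced, which makes runs easier to compare.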

## Main symbols:

- Trainer: The PyTorch Lightning controller that manages the training process.

- ClassificationTask: Imported from core; the primary task class for model training.

- CIFAR100DataModule: The data loading module, imported from data.

In [7]:
#| default_exp __main__

In [1]:
#| hide
%load_ext autoreload
%autoreload 2
from nbdev.showdoc import *
import treescope
treescope.basic_interactive_setup(autovisualize_arrays=True)

In [9]:
#| export
from namable_classify.auto.experiment.infra import run_with_config
from boguan_yuequ.benchmarking import pe_list_tiny_for_all_size, backbone_names
from namable_classify.nucleus import ClassificationTask, ClassificationTaskConfig

In [10]:
backbone_names

In [11]:
#| export
config = ClassificationTaskConfig()
config.yuequ = "wave_high_shoulder"
# config.yuequ = "wave_high_ladder"
config.cls_model_config.checkpoint = backbone_names[2]
config.dataset_config.batch_size = 128

In [12]:
# Quick development run to check that everything works
run_with_config(
    config,
    # fast_dev_run=True,
    batch_size=-1,
    profile=True,
    limit_train_batches=10,
    limit_val_batches=100,
    limit_test_batches=100,
    max_epochs=2,
    search_learning_rate=True,
)

INFO: Seed set to 0


Some weights of ViTModel were not initialized from the model checkpoint at WinKawaks/vit-tiny-patch16-224 and are newly initialized: ['vit.pooler.dense.bias', 'vit.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


INFO: Trainer already configured with model summary callbacks: [<class 'lightning.pytorch.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.


INFO: GPU available: True (cuda), used: True


INFO: TPU available: False, using: 0 TPU cores


INFO: HPU available: False, using: 0 HPUs


INFO: Trainer will use only 1 of 8 GPUs because it is running inside an interactive / notebook environment. You may try to set `Trainer(devices=8)` but please note that multi-GPU inside interactive / notebook environments is considered experimental and unstable. Your mileage may vary.


INFO: GPU available: True (cuda), used: True


INFO: TPU available: False, using: 0 TPU cores


INFO: HPU available: False, using: 0 HPUs


Files already downloaded and verified
Files already downloaded and verified


INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]


INFO: `Trainer.fit` stopped: `max_steps=3` reached.


INFO: Batch size 128 succeeded, trying batch size 256


INFO: Batch size 256 failed, trying batch size 128


INFO: Finished batch size finder, will continue with full run using batch size 128


INFO: Restoring states from the checkpoint path at /home/ycm/repos/novelties/cv/cls/NamableClassify/src/notebooks/.scale_batch_size_47fa75e0-30d9-4a10-b435-bc9d61b11a3f.ckpt


INFO: Restored all states from the checkpoint at /home/ycm/repos/novelties/cv/cls/NamableClassify/src/notebooks/.scale_batch_size_47fa75e0-30d9-4a10-b435-bc9d61b11a3f.ckpt


Files already downloaded and verified
Files already downloaded and verified


INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]


Validation loader has been setup before. 


AttributeError: 'LearningRateFinder' object has no attribute 'optimal_lr'
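The batch-size messages in the log above (128 succeeded, 256 failed, back to 128) are consistent with a doubling search. The following is a simplified model of that idea, not Lightning's implementation: `fits` is a hypothetical callback standing in for "run a few training steps and report whether they completed without running out of memory", and the memory limit in the usage lines is made up.

```python
from typing import Callable, Optional


def find_batch_size(fits: Callable[[int], bool], init: int = 128, max_trials: int = 25) -> int:
    """Double the batch size until a trial fails, then return the last size that worked."""
    size = init
    last_good: Optional[int] = None
    for _ in range(max_trials):
        if fits(size):
            last_good = size
            size *= 2
        elif last_good is not None:
            # A smaller size already worked: fall back to it and stop.
            return last_good
        elif size == 1:
            raise RuntimeError("even a batch size of 1 does not fit")
        else:
            # Nothing has worked yet: shrink and try again.
            size //= 2
    if last_good is None:
        raise RuntimeError("no working batch size found within max_trials")
    return last_good


# Toy stand-in for a GPU that can hold at most 200 samples per batch.
chosen = find_batch_size(lambda n: n <= 200, init=128)
```

With the toy limit, the search tries 128, then 256, and settles on 128, mirroring the sequence in the log.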

In [13]:
from namable_classify import HuggingfaceModel

# model = HuggingfaceModel(config.cls_model_config)

# TODO: wave_high_shoulder: how are these two added together?

In [1]:
from transformers import AutoModel
# AutoModel.from_pretrained("WinKawaks/vit-tiny-patch16-224")
# AutoModel.from_pretrained("timm/tiny_vit_5m_224.dist_in22k_ft_in1k")
import timm
# model = timm.create_model('vit_tiny_patch16_224', pretrained=True)
model = timm.create_model('vit_tiny_patch16_224.augreg_in21k', pretrained=True)
model = timm.create_model('vit_small_patch16_224.augreg_in21k', pretrained=True)
# Each assignment replaces the previous model, so only this last one is kept.
model = timm.create_model('vit_giant_patch14_clip_224.laion2b', pretrained=True)
# model = timm.create_model('tiny_vit_5m_224.dist_in22k_ft_in1k', pretrained=True)
# model = timm.create_model('vit_xsmall_patch16_clip_224.tinyclip_yfcc15m', pretrained=True)

In [2]:
from torchsummary import summary
summary(model.cuda(), input_size=(3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 192, 14, 14]         147,648
          Identity-2             [-1, 196, 192]               0
        PatchEmbed-3             [-1, 196, 192]               0
           Dropout-4             [-1, 197, 192]               0
          Identity-5             [-1, 197, 192]               0
          Identity-6             [-1, 197, 192]               0
         LayerNorm-7             [-1, 197, 192]             384
            Linear-8             [-1, 197, 576]         111,168
          Identity-9           [-1, 3, 197, 64]               0
         Identity-10           [-1, 3, 197, 64]               0
           Linear-11             [-1, 197, 192]          37,056
          Dropout-12             [-1, 197, 192]               0
        Attention-13             [-1, 197, 192]               0
         Identity-14             [-1, 1
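The parameter counts in the summary can be checked by hand. The sketch below assumes the layout that the output shapes suggest (16×16 patches on a 224×224 RGB input, embedding width 192); these values are inferred from the table rather than read from the model object.

```python
# Assumed hyperparameters, inferred from the output shapes in the summary.
image_size, patch_size, in_channels, embed_dim = 224, 16, 3, 192


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return in_features * out_features + out_features


# Patch embedding (Conv2d-1): one 16x16x3 kernel per output channel, plus bias.
patch_embed = patch_size * patch_size * in_channels * embed_dim + embed_dim
# Sequence length: one token per patch plus the class token.
num_tokens = (image_size // patch_size) ** 2 + 1
# LayerNorm-7: a scale and a shift per channel.
layer_norm = 2 * embed_dim
# Linear-8: fused query/key/value projection; Linear-11: attention output projection.
qkv = linear_params(embed_dim, 3 * embed_dim)
proj = linear_params(embed_dim, embed_dim)

print(patch_embed, num_tokens, layer_norm, qkv, proj)
# 147648 197 384 111168 37056
```

These match the `Param #` column for `Conv2d-1`, `LayerNorm-7`, `Linear-8`, and `Linear-11`, and the 197-token sequence length in the output shapes.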

In [11]:
timm?

Type:        module
String form: <module 'timm' from '/home/ycm/program_files/miniconda3/envs/fastai/lib/python3.10/site-packages/timm/__init__.py'>
File:        ~/program_files/miniconda3/envs/fastai/lib/python3.10/site-packages/timm/__init__.py
Docstring:   <no docstring>

In [None]:
# https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer.py
# patch-14 family
# DINOv2, which lacks a tiny variant
# patch-16 family

# vit_giant_patch16_gap_224.in22k_ijepa
# vit_so400m_patch16_siglip_256.webli_i18n
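To sort candidates like the ones above into patch-size families, the model names can be parsed directly. This is a heuristic sketch that assumes the `architecture.pretrained_tag` naming seen in this notebook; `parse_model_name` is a hypothetical helper, not part of timm, and the regular expressions will miss names that do not follow the same pattern.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class ModelName(NamedTuple):
    architecture: str
    pretrained_tag: Optional[str]
    patch_size: Optional[int]
    image_size: Optional[int]


def parse_model_name(name: str) -> ModelName:
    """Split 'architecture.pretrained_tag' and read the sizes out of the architecture part."""
    architecture, _, tag = name.partition(".")
    patch = re.search(r"patch(\d+)", architecture)
    size = re.search(r"_(\d+)$", architecture)
    return ModelName(
        architecture,
        tag or None,
        int(patch.group(1)) if patch else None,
        int(size.group(1)) if size else None,
    )


names = [
    "vit_tiny_patch16_224.augreg_in21k",
    "vit_giant_patch14_clip_224.laion2b",
    "vit_so400m_patch16_siglip_256.webli_i18n",
]

# Group the candidate names by patch size.
families: Dict[Optional[int], List[str]] = {}
for n in names:
    families.setdefault(parse_model_name(n).patch_size, []).append(n)
```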

In [8]:
import torch
model(torch.randn(1, 3, 224, 224)).shape

In [None]:
#| export
#| eval: false
config.experiment_project = "Homogeneous dwarf model is all you need for tuning pretrained giant model." 
config.experiment_task = "Auto LR Range Test for wave_high_shoulder"
run_with_config(
    config,
    need_reproduce=True,
    profile=True,
    batch_size=-1,
    search_learning_rate=True,
    max_epochs=100,
)
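For readers unfamiliar with the learning-rate range test named in `experiment_task`, the idea can be sketched in plain Python: sweep the learning rate exponentially, record the loss at each step, and suggest the rate at which the loss falls fastest. This is a toy illustration, not what `run_with_config` or Lightning does internally. `loss_at` is a hypothetical function mapping a learning rate to a loss, whereas a real range test takes the loss from consecutive training steps of the same model, and `toy_loss` is a made-up curve.

```python
import math
from typing import Callable, List, Tuple


def lr_range_test(
    loss_at: Callable[[float], float],
    min_lr: float = 1e-6,
    max_lr: float = 1.0,
    num_steps: int = 50,
) -> Tuple[List[float], List[float]]:
    """Evaluate the loss on learning rates spaced evenly on a log scale."""
    ratio = (max_lr / min_lr) ** (1 / (num_steps - 1))
    lrs = [min_lr * ratio**i for i in range(num_steps)]
    losses = [loss_at(lr) for lr in lrs]
    return lrs, losses


def suggest_lr(lrs: List[float], losses: List[float]) -> float:
    """Return the learning rate at the start of the steepest loss decrease."""
    drops = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    steepest = min(range(len(drops)), key=drops.__getitem__)
    return lrs[steepest]


def toy_loss(lr: float) -> float:
    # Made-up curve: the loss drops around lr = 1e-3 and grows again for large rates.
    return 1.0 / (1.0 + math.exp(4 * (math.log10(lr) + 3))) + 10 * lr**2


lrs, losses = lr_range_test(toy_loss)
suggested = suggest_lr(lrs, losses)
```

On the toy curve the suggestion lands close to `1e-3`, the point where the loss is falling most steeply rather than where it is lowest.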

In [None]:
#| hide
import nbdev; nbdev.nbdev_export()