
Download the competition dataset from AI Gallery to your own OBS bucket; [click here](https://developer.huaweicloud.com/develop/aigallery/dataset/detail?id=83c5a97d-05ec-4464-93ca-a621f3e03e82) to open the dataset page.

To load the data files from your OBS bucket into this notebook, change "obs://***/data/" in the code below to your own OBS bucket name and data storage path.

Note: this baseline notebook must be run in the "CN North-Beijing4" (华北-北京四) region.

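The cell below imports the library used to copy the model and data. Press Ctrl+Enter to run the code in a cell; when it finishes, a sequence number appears to the left of the cell. All later code cells are run the same way.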

In [2]:
import moxing as mox

INFO:root:Using MoXing-v2.1.0.5d9c87c8-5d9c87c8

INFO:root:Using OBS-Python-SDK-3.20.9.1


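The code below copies data from OBS into the development environment. Edit the dataset copy command as described in the comments (the lines starting with #).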

In [1]:
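# Dataset copy command: change the source path to your own bucket name and path.
# Replace "swss" with your bucket name and the "data" after "swss/" with your own path; the second './data/' stays unchanged.
mox.file.copy_parallel('obs://swss/data/', './data/')

# Model copy command; no changes needed.
mox.file.copy_parallel('obs://ma-competitions-bj4/keyi/model/', './model/')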

NameError: name 'mox' is not defined
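
As a minimal sketch of the path change described above, the helper below assembles an `obs://` URL from a bucket name and a folder path. `build_obs_url` and the bucket name `my-bucket` are hypothetical and not part of MoXing; the only MoXing call involved is the `mox.file.copy_parallel` already used in this notebook, shown here as a comment. The `NameError` in the recorded output above is what Python raises when the name `mox` has not been imported yet, so run the `import moxing as mox` cell first.

```python
def build_obs_url(bucket: str, path: str) -> str:
    """Join a bucket name and a folder path into an obs:// URL ending in '/'."""
    bucket = bucket.strip().strip("/")
    path = path.strip().strip("/")
    if not bucket:
        raise ValueError("bucket name must not be empty")
    url = f"obs://{bucket}/"
    if path:
        url += f"{path}/"
    return url


# "my-bucket" and "data" are placeholders for your own bucket and folder
src = build_obs_url("my-bucket", "data")
print(src)

# In the notebook, after `import moxing as mox`, the copy would then be:
# mox.file.copy_parallel(src, './data/')
```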

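The code below prints the number of images in the dataset. The dataset used in this competition contains 5302 images in total; if the code below prints the correct number, the dataset was downloaded successfully.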

In [1]:
import glob
images = glob.glob("./data/train/*/*")
#print(images)
print(f"Number of images in the dataset: {len(images)}")

Number of images in the dataset: 20012
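
When the total does not match what you expect, a per-class breakdown can help locate the problem. The sketch below assumes the `./data/train/<class>/<file>` layout implied by the glob pattern above; `count_images_per_class` is a hypothetical helper, not part of the baseline.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(root):
    """Return {class_name: number_of_files} for a root/<class>/<file> layout."""
    counts = Counter()
    for path in Path(root).glob("*/*"):
        # Count regular files only, and skip hidden folders such as .ipynb_checkpoints
        if path.is_file() and not path.parent.name.startswith("."):
            counts[path.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    for name, n in sorted(count_images_per_class("./data/train").items()):
        print(f"{name}: {n}")
```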


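The code below splits the images under data into a training set (train) and a test set (val). The training fraction is set by train_percent, which is 0.9 in this cell. If these two folders appear in your working directory after the run, the split succeeded.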

In [2]:
import os
import shutil
import random

# Fraction of images assigned to the training set
train_percent = 0.9
# Dataset directory
dataset_dir = "./data/train"
# Class names (one sub-directory per class)
dir_names = sorted(os.listdir(dataset_dir))
splits = ["train", "val"]
for split in splits:
    for dir_name in dir_names:
        os.makedirs(os.path.join(split, dir_name), exist_ok=True)
for dir_name in dir_names:
    images = os.listdir(os.path.join(dataset_dir, dir_name))
    for index, image in enumerate(images):
        # Skip Jupyter's checkpoint folder before choosing a split
        if image == ".ipynb_checkpoints":
            continue
        if random.random() < train_percent:
            split = "train"
        else:
            split = "val"
        source_path = os.path.join(dataset_dir, dir_name, image)
        dist_path = os.path.join(split, dir_name, image)
        shutil.copyfile(source_path, dist_path)
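
The cell above assigns each image independently with probability `train_percent`, so the realized train/val sizes vary from run to run. If a fixed, repeatable split is preferred, one alternative (not what the baseline does) is a seeded shuffle followed by a cut. `split_names` and its parameters are hypothetical names for this sketch.

```python
import random


def split_names(names, train_fraction=0.9, seed=1):
    """Shuffle names with a fixed seed and cut them into (train, val) lists."""
    shuffled = sorted(names)  # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Illustrative file names only
train, val = split_names([f"img_{i}.jpg" for i in range(10)])
print(len(train), len(val))
```

Applied per class folder, this keeps the same fraction in every class and gives the same split on every run with the same seed.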

The code below defines the training routine.

In [6]:
import argparse
import os

from mindspore import Tensor, context, set_seed
from mindspore.common import dtype as mstype
from mindspore.communication.management import get_group_size, get_rank, init
from mindspore.context import ParallelMode
from mindspore.nn.optim.momentum import Momentum
from mindspore.train.callback import TimeMonitor, LossMonitor, ModelCheckpoint, CheckpointConfig
from mindspore.train.loss_scale_manager import DynamicLossScaleManager, FixedLossScaleManager
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net

from model.src.callback import EvaluateCallBack
from model.src.config import data_config
from model.src.dataset import create_dataset
from model.src.loss import CrossEntropySmooth
#from model.src.model import ResNet
from model.src.image_classification import get_network

set_seed(1)



def get_param_groups(network):
    """ get param groups """
    decay_params = []
    no_decay_params = []
    for x in network.trainable_params():
        parameter_name = x.name
        if parameter_name.endswith('.bias'):
            # biases do not use weight decay
            no_decay_params.append(x)
        elif parameter_name.endswith('.gamma'):
            # BatchNorm scale (gamma) does not use weight decay
            no_decay_params.append(x)
        elif parameter_name.endswith('.beta'):
            # BatchNorm shift (beta) does not use weight decay
            no_decay_params.append(x)
        else:
            decay_params.append(x)

    return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params}]

def train_model():
    config={"device_target":"CPU","device_id":0,"device_num":1,"is_distributed":0}

    cfg = data_config
    print(cfg.val_data_path)
    # set context
    context.set_context(mode=context.GRAPH_MODE, device_target=config["device_target"],)
    if config["device_target"] == 'Ascend':
        context.set_context(enable_graph_kernel=True)

        device_num = int(os.getenv('DEVICE_NUM', '1'))
        device_id = int(os.getenv('DEVICE_ID', '0'))

        # Bind this process to the configured device
        context.set_context(device_id=config["device_id"])

        if device_num > 1:
            context.reset_auto_parallel_context()
            context.set_auto_parallel_context(device_num=device_num,
                                              parallel_mode=ParallelMode.DATA_PARALLEL,
                                              gradients_mean=True)
            init()
    else:
        config["device_num"] = 1
        config["device_id"] = 0
        if config["is_distributed"]:
            init()
            device_num = get_group_size()
            device_id = get_rank()
            context.reset_auto_parallel_context()
            context.set_auto_parallel_context(device_num=config["device_num"],
                                              parallel_mode=ParallelMode.DATA_PARALLEL,
                                              gradients_mean=True)

    dataset = create_dataset(cfg.data_path, 1)

    batch_num = dataset.get_dataset_size()

    #net = ResNet(num_classes=cfg.num_classes)
    net = get_network('resnext101',num_classes=cfg.num_classes, platform=cfg.device_target)
    # Continue training if set pre_trained to be True
    if cfg.pre_trained:
        param_dict = load_checkpoint(cfg.checkpoint_path)
        load_param_into_net(net, param_dict)

    loss_scale_manager = None

    if cfg.is_dynamic_loss_scale:
        cfg.loss_scale = 1

    opt = Momentum(params=get_param_groups(net),
                   learning_rate=Tensor(cfg.lr_init, dtype=mstype.float32),
                   momentum=cfg.momentum,
                   loss_scale=cfg.loss_scale)

    loss = CrossEntropySmooth(sparse=True, reduction="mean", num_classes=cfg.num_classes)

    if config["device_target"] == 'Ascend':
        if cfg.is_dynamic_loss_scale == 1:
            loss_scale_manager = DynamicLossScaleManager(init_loss_scale=65536, scale_factor=2, scale_window=2000)
        else:
            loss_scale_manager = FixedLossScaleManager(cfg.loss_scale, drop_overflow_update=False)
    else:
        loss_scale_manager = FixedLossScaleManager(cfg.loss_scale, drop_overflow_update=False)

    model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'},
                  amp_level="O2", keep_batchnorm_fp32=True,
                  loss_scale_manager=loss_scale_manager)

    config_ck = CheckpointConfig(save_checkpoint_steps=batch_num, keep_checkpoint_max=cfg.keep_checkpoint_max)
    time_cb = TimeMonitor(data_size=batch_num)
    ckpt_save_dir = "./ckpt/"
    ckpoint_cb = ModelCheckpoint(prefix="ResNet", directory=ckpt_save_dir,
                                 config=config_ck)
    loss_cb = LossMonitor()
    val_dataset = create_dataset(cfg.val_data_path, training=False)
    eval_cb = EvaluateCallBack(model=model, eval_dataset=val_dataset)
    cbs = [time_cb, ckpoint_cb, loss_cb, eval_cb]
    model.train(cfg.epoch_size, dataset, callbacks=cbs, dataset_sink_mode=cfg.use_dataset_sink)
    print("train success")
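
The suffix rule used by `get_param_groups` in the cell above can be tried out without MindSpore. The sketch below applies the same `.bias` / `.gamma` / `.beta` test to plain strings; `group_by_weight_decay` and the parameter names are hypothetical and only serve as an illustration.

```python
def group_by_weight_decay(param_names):
    """Split parameter names into (no_decay, decay) using the same suffix rule."""
    no_decay_suffixes = ('.bias', '.gamma', '.beta')
    no_decay = [n for n in param_names if n.endswith(no_decay_suffixes)]
    decay = [n for n in param_names if not n.endswith(no_decay_suffixes)]
    return no_decay, decay


# Hypothetical parameter names, for illustration only
names = ['conv1.weight', 'bn1.gamma', 'bn1.beta', 'fc.weight', 'fc.bias']
no_decay, decay = group_by_weight_decay(names)
print("no weight decay:", no_decay)
print("weight decay:   ", decay)
```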


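Model training
Train the model with a convolutional neural network. Training takes some time; wait for this cell to finish before running the cells below. During the run, a ckpt folder is created and checkpoint (.ckpt) files for the different epochs are saved into it. There are 50 epochs in total, and all of them must finish before you run the code that follows.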

In [7]:
train_model()

./val/


AttributeError: 'dict' object has no attribute 'num_classes'

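The cell below defines the model test function, which measures the accuracy of the model just trained. If you ran the training script more than once, the checkpoint file name may differ, so check that checkpoint_path matches the name of your own .ckpt file.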

In [4]:
"""
Process the test set with the .ckpt model in turn.
"""
import argparse
import os

import numpy as np
from PIL import Image
from mindspore import context, Tensor
from mindspore import dtype as mstype
from mindspore.common import set_seed
from mindspore.train.serialization import load_checkpoint, load_param_into_net

from model.src.config import data_config
from model.src.model import ResNet

set_seed(1)

device_target="CPU"   # device used to run the evaluation
checkpoint_path='./ckpt/ResNet-50_125.ckpt'  # change this to the name of the .ckpt file in your own ckpt folder

def eval():
    cfg = data_config
    class_indexing = {name: index for index, name in enumerate(sorted(os.listdir(cfg.val_data_path)))}
    indexing_class = {index: name for index, name in enumerate(sorted(os.listdir(cfg.val_data_path)))}
    images = []
    # Collect [image path, class name] pairs
    for dir_name in os.listdir(cfg.val_data_path):
        for class_name in os.listdir(os.path.join(cfg.val_data_path, dir_name)):
            images.append([os.path.join(cfg.val_data_path, dir_name, class_name), dir_name])
    net = ResNet(cfg.num_classes)
    net.set_train(False)

    context.set_context(mode=context.GRAPH_MODE, device_target=device_target)
   # context.set_context()#device_id=device_id)
    param_dict = load_checkpoint(checkpoint_path)
    load_param_into_net(net, param_dict)
    num_images = len(images)
    correct = 0
    mean = np.array([0.5, 0.5, 0.5]).reshape(1, 1, 3)
    std = np.array([0.5, 0.5, 0.5]).reshape(1, 1, 3)
    for image_path, label in images:
        images = Image.open(image_path)
        width, height = images.size
        left, upper = (width - 224) // 2, (height - 224) // 2
        images = images.crop((left, upper, left + 224, upper + 224))
        images = np.array(images) / 255  # (224, 224, 3)
        images = (images - mean) / std
        images = np.transpose(images, (2, 0, 1))
        images = Tensor(np.expand_dims(images, 0), dtype=mstype.float32)
        result = net(images).asnumpy()
        # print(image_path, indexing_class[int(np.argmax(result).reshape(-1))], label)
        predict = int(np.argmax(result).reshape(-1))
        #print(f"image_path: {image_path} predict: {indexing_class[predict]} ground_true: {label}")
        if predict == class_indexing[label]:
            correct += 1
    print(correct)
    print(f"Acc: {correct / num_images * 100}%")
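
The preprocessing inside `eval()` is a center crop to 224x224 followed by per-pixel normalization with mean 0.5 and standard deviation 0.5. The arithmetic can be checked in isolation with the sketch below; `center_crop_box` and `normalize` are hypothetical helper names that mirror the expressions in the cell above.

```python
def center_crop_box(width, height, size=224):
    """Return the (left, upper, right, lower) box of a centered size x size crop."""
    left, upper = (width - size) // 2, (height - size) // 2
    return left, upper, left + size, upper + size


def normalize(value, mean=0.5, std=0.5):
    """Scale an 8-bit pixel value (0..255) to 0..1, then apply (x - mean) / std."""
    return (value / 255 - mean) / std


print(center_crop_box(400, 300))     # crop box for a 400x300 image
print(normalize(0), normalize(255))  # darkest and brightest pixel values
```

Note that for an image smaller than 224 pixels on a side, `left` or `upper` becomes negative, so the crop box extends beyond the image border.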
    

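Run the test code below. It prints the number of correctly classified images and the accuracy of the trained model on the test set. The accuracy should be around 0.6; if a result is printed, the test succeeded.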

In [5]:
eval()

1275
Acc: 71.79054054054053%


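Copy the .ckpt file produced by the last training epoch out of the ckpt folder, ready to upload for scoring. Again, check that the file name below matches the one in your ckpt folder.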

In [5]:
import moxing as mox
mox.file.copy_parallel('./ckpt/train_simple_cnn-50_33.ckpt', './model/train_simple_cnn-50_33.ckpt')
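
Because the checkpoint name has to be checked by hand in several cells, it can help to look it up programmatically. The sketch below assumes the files follow the `<prefix>-<epoch>_<step>.ckpt` pattern seen in `ResNet-50_125.ckpt` above; `latest_checkpoint` is a hypothetical helper, and files that do not match the assumed pattern are ignored.

```python
import re
from pathlib import Path

# Assumed file-name pattern: <prefix>-<epoch>_<step>.ckpt, e.g. ResNet-50_125.ckpt
CKPT_PATTERN = re.compile(r"^(?P<prefix>.+)-(?P<epoch>\d+)_(?P<step>\d+)\.ckpt$")


def latest_checkpoint(ckpt_dir):
    """Return the path of the checkpoint with the highest (epoch, step), or None."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    best_key, best_path = None, None
    for path in directory.glob("*.ckpt"):
        match = CKPT_PATTERN.match(path.name)
        if match is None:
            continue  # ignore files that do not follow the assumed pattern
        key = (int(match["epoch"]), int(match["step"]))
        if best_key is None or key > best_key:
            best_key, best_path = key, path
    return best_path


print(latest_checkpoint("./ckpt"))
```

The returned path can then be used in place of the hard-coded file name, both for `checkpoint_path` and for the copy command.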

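Import the trained model into ModelArts
Import the model into ModelArts in preparation for inference testing and model submission. The run succeeded if the final message is “所有模型导入完成” ("all models imported").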

In [3]:
from modelarts.session import Session
from modelarts.model import Model
from modelarts.config.model_config import TransformerConfig,Params
!pip install json5
import json5
import re
import traceback
import random

try:
    session = Session()
    config_path = 'model/config.json' 
    if mox.file.exists(config_path):  # the model can only be imported if the config file exists
        model_location =  './model'
        model_name = "simple_cnn"
        load_dict = json5.loads(mox.file.read(config_path))
        model_type = load_dict['model_type']
        re_name = '_'+str(random.randint(0,1000))
        model_name += re_name
        runtime=load_dict['runtime']
        print("正在导入模型,模型名称：", model_name)  # "Importing model, model name:"
        model_instance = Model(
                     session, 
                     model_name=model_name,               # model name
                     model_version="1.0.0",               # model version
                     source_location_type='LOCAL_SOURCE',
                     source_location=model_location,      # path to the model files
                     model_type=model_type,               # model type
                     runtime=runtime
                     )

    print("所有模型导入完成")  # "All models imported"
except Exception as e:
    print("发生了一些问题，请看下面的报错信息：")  # "Something went wrong; see the error below:"
    traceback.print_exc()
    print("模型导入失败")  # "Model import failed"

Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple


正在导入模型,模型名称： simple_cnn_476

modelarts-cn-north-4-d714a5cc is existed


INFO:obs:Successfully upload file /home/ma-user/work/ma_share/competition/model to OBS modelarts-cn-north-4-d714a5cc/model-0708-193313

INFO:obs:Successfully upload file /home/ma-user/work/ma_share/competition/model/src to OBS modelarts-cn-north-4-d714a5cc/model-0708-193313/model

INFO:obs:Successfu

Successfully upload file /home/ma-user/work/ma_share/competition/model to OBS modelarts-cn-north-4-d714a5cc/model-0708-193313

Successfully upload model files from /home/ma-user/work/ma_share/competition/model to obs path /modelarts-cn-north-4-d714a5cc/model-0708-193313.

The model source location is https://modelarts-cn-north-4-d714a5cc.obs.cn-north-4.myhuaweicloud.com/model-0708-193313/model

publishing

published

所有模型导入完成
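
The import cell reads two keys, `model_type` and `runtime`, from `model/config.json`. A quick local check before importing can catch a missing key early. The sketch below is an assumption-laden illustration: it uses the standard-library `json` module rather than `json5`, so it only works when the config file is plain JSON, and `missing_config_keys` is a hypothetical helper.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("model_type", "runtime")  # the two keys the import cell reads


def missing_config_keys(config_path):
    """Return the required keys that are absent from the JSON config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [key for key in REQUIRED_KEYS if key not in config]


if __name__ == "__main__":
    path = Path("model/config.json")
    if path.is_file():
        print("missing keys:", missing_config_keys(path))
    else:
        print("config file not found:", path)
```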
