# Model training

Train the model in the GPU trainer and save the output the model prints to the screen.

But first, read a few recommendations:

- The loss function does not have to be MAE. Neural networks with an MSE loss often train faster.
- Is the validation score improving while the model overfits more and more? Don't rush to replace the model: networks with many layers usually overfit heavily.
- Check that the `load_train(path)` and `load_test(path)` functions handle the data correctly. The `path` folder contains the CSV file `labels.csv` with two columns, `file_name` and `real_age`, and the image folder `/final_files`. Read `labels.csv` into a dataframe and pass it as one of the parameters of the [ImageDataGenerator](https://keras.io/preprocessing/image/) method `flow_from_dataframe(dataframe, directory, ...)`.
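To see why the first recommendation often holds, compare the gradients of the two losses with respect to a single prediction. Below is a minimal plain-Python sketch; the helper names `mae_grad` and `mse_grad` are hypothetical. The MAE gradient has the same magnitude whatever the error, while the MSE gradient grows with the error, so badly wrong early predictions receive larger corrections. This is one plausible explanation, not a guarantee. MSE is measured in squared years, so MAE remains the more readable metric to report.

```python
def mae_grad(y_true: float, y_pred: float) -> float:
    """Derivative of |y_pred - y_true| with respect to y_pred (0 chosen at the kink)."""
    diff = y_pred - y_true
    if diff > 0:
        return 1.0
    if diff < 0:
        return -1.0
    return 0.0


def mse_grad(y_true: float, y_pred: float) -> float:
    """Derivative of (y_pred - y_true) ** 2 with respect to y_pred."""
    return 2.0 * (y_pred - y_true)


# Hypothetical example: the true age is 30, the network predicts several values
for pred in (31.0, 40.0, 70.0):
    # MAE gradient stays at 1.0; MSE gradient grows: 2.0, 20.0, 80.0
    print(pred, mae_grad(30.0, pred), mse_grad(30.0, pred))
```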

Your code first goes through a preliminary check and is then queued for training. After 2 to 3 hours, come back to this lesson and check whether training has finished. Once the model is trained, you can **continue working on the project**. The final part is analyzing the model.
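Before the code is queued, you can quickly verify the data layout described in the third recommendation. Below is a minimal sketch that uses only the standard library. `check_labels` is a hypothetical helper, not part of the project template, and it assumes `labels.csv` is UTF-8 with a header row. It confirms that the `file_name` and `real_age` columns exist, that every age is a non-negative number, and that every listed file is present in `final_files`. This check is useful because `flow_from_dataframe` skips entries whose files are missing or invalid (hence the "validated image filenames" count in the log), so a mismatch may only show up as a smaller count.

```python
import csv
import os


def check_labels(path: str) -> int:
    """Hypothetical helper: verify labels.csv and the image folder under `path`.

    Returns the number of rows; raises ValueError on the first problem found.
    """
    csv_path = os.path.join(path, 'labels.csv')
    image_dir = os.path.join(path, 'final_files')
    with open(csv_path, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        columns = set(reader.fieldnames or [])
        if not {'file_name', 'real_age'} <= columns:
            raise ValueError(f'unexpected columns: {reader.fieldnames}')
        rows = 0
        for row in reader:
            rows += 1
            # class_mode='raw' needs a numeric target
            age = float(row['real_age'])
            if age < 0:
                raise ValueError(f'negative age in row {rows}: {age}')
            if not os.path.isfile(os.path.join(image_dir, row['file_name'])):
                raise ValueError(f'missing image: {row["file_name"]}')
    return rows
```

Call it with the same `path` string you pass to `load_train`. `os.path.join` works whether or not that string ends with a slash.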

In [1]:
# Import libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet import ResNet50

import matplotlib.pyplot as plt
import numpy as np
# pandas is needed by load_train/load_test to read labels.csv
import pandas as pd

ImportError: cannot import name 'type_spec_registry' from 'tensorflow.python.framework' (D:\Program File\Anaconda\lib\site-packages\tensorflow\python\framework\__init__.py)

In [None]:
# Load the training set
def load_train(path):
    datagen = ImageDataGenerator(
        # Hold out 25% of the data for validation
        validation_split=0.25,
        # Randomly mirror images horizontally to improve training quality
        horizontal_flip=True,
        # Scale pixel values from [0, 255] to [0, 1]
        rescale=1. / 255
    )
    
    train_gen_flow = datagen.flow_from_dataframe(
        dataframe=pd.read_csv(path + 'labels.csv'),
        directory=path + 'final_files/',
        x_col='file_name',
        y_col='real_age',
        target_size=(224, 224),
        batch_size=16,
        class_mode='raw',
        subset='training',
        seed=12345)

    return train_gen_flow
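The two augmentation options in `load_train` work on each image separately. Below is a plain-Python sketch on a hypothetical 2x3 grayscale image. It uses nested lists rather than the NumPy arrays Keras actually uses, and the function names are made up for illustration. `rescale=1. / 255` multiplies every pixel value, and `horizontal_flip=True` mirrors an image left to right with probability 0.5. The two operations commute, so their order does not matter here. `load_test` keeps the rescaling but not the flip, so validation images are only scaled.

```python
import random


def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by `factor`, mapping 0..255 to 0.0..1.0."""
    return [[pixel * factor for pixel in row] for row in image]


def random_horizontal_flip(image, rng):
    """Mirror each row left to right with probability 0.5."""
    if rng.random() < 0.5:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


# A hypothetical 2x3 grayscale image
img = [[0, 51, 255],
       [102, 153, 204]]
print(rescale(img))
print(random_horizontal_flip(img, random.Random(12345)))
```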

In [None]:
# Load the test (validation) set
def load_test(path):
    datagen = ImageDataGenerator(
        # Use the same 25% validation split as load_train
        validation_split=0.25,
        # Scale pixel values from [0, 255] to [0, 1]
        rescale=1. / 255
    )
    
    test_gen_flow = datagen.flow_from_dataframe(
        dataframe=pd.read_csv(path + 'labels.csv'),
        directory=path + 'final_files/',
        x_col='file_name',
        y_col='real_age',
        target_size=(224, 224),
        batch_size=16,
        class_mode='raw',
        subset='validation',
        seed=12345)

    return test_gen_flow
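Both loaders read the same `labels.csv` with the same `validation_split=0.25`. As far as I can tell from the keras_preprocessing source, `subset='validation'` takes the first `int(validation_split * len(df))` rows of the dataframe and `subset='training'` takes the remaining rows. This is a contiguous split, not a random one; `seed` only affects shuffling. The plain-Python sketch below reproduces that arithmetic with hypothetical helper names. Assuming every listed file is valid, the total of 7591 rows is the sum of the two "Found ... validated image filenames" counts in the log. The resulting step counts match the log's "Train for 356 steps, validate for 119 steps".

```python
import math


def split_sizes(n_rows: int, validation_split: float) -> tuple:
    """Assumed split rule: the first int(validation_split * n_rows) rows are
    'validation', the rest are 'training'. Returns (n_train, n_val)."""
    n_val = int(validation_split * n_rows)
    return n_rows - n_val, n_val


def steps(n_samples: int, batch_size: int) -> int:
    """Batches per full pass; the last batch may be smaller than batch_size."""
    return math.ceil(n_samples / batch_size)


n_train, n_val = split_sizes(5694 + 1897, 0.25)
print(n_train, n_val, steps(n_train, 16), steps(n_val, 16))  # 5694 1897 356 119
```

Because the split is contiguous, it is worth checking whether `labels.csv` is ordered in some way, for example by age. An ordered file would give a biased validation set.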

In [None]:
# Create the model
def create_model(input_shape):
    # ResNet50 without its classification head, with pretrained weights from a local file
    backbone = ResNet50(input_shape=input_shape,
                        weights='/datasets/keras_models/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5',
                        include_top=False)

    # Initialize a sequential model
    model = Sequential()
    # Add the ResNet50 backbone
    model.add(backbone)
    # Collapse the spatial feature map into one vector per image
    model.add(GlobalAveragePooling2D())
    # One output neuron with ReLU, so the predicted age cannot be negative
    model.add(Dense(1, activation='relu'))

    # Hyperparameter tuning: the main tunable hyperparameter of Adam is the learning rate.
    # It scales the size of each gradient step, which Adam then adapts per parameter.
    # The Keras default is 0.001; here it is set to 0.01. A smaller step can slow training
    # down but sometimes improves the final quality of the model.
    optimizer_adam = Adam(learning_rate=0.01)

    # Compile: MSE is the loss being minimized, MAE is the reported metric
    model.compile(optimizer=optimizer_adam, loss='mean_squared_error', metrics=['mae'])
    
    return model
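The head on top of the backbone is small. For a 224x224 input, ResNet50 without its top layers produces a 7x7x2048 feature map. `GlobalAveragePooling2D` averages each channel over the 7x7 positions, and the single `Dense` neuron with ReLU turns the resulting 2048 numbers into one non-negative age. Below is a plain-Python sketch on a hypothetical 2x2x3 feature map; the function names are illustrative, not Keras internals.

```python
def global_average_pooling(feature_map):
    """Average an H x W x C feature map over H and W, giving C values."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]


def dense_relu(inputs, weights, bias):
    """One output neuron: relu(w . x + b); negative outputs are clipped to 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, z)


# Hypothetical 2x2 map with 3 channels
fmap = [[[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
        [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]]]
pooled = global_average_pooling(fmap)
print(pooled)                                     # [4.0, 1.0, 2.0]
print(dense_relu(pooled, [1.0, 1.0, 1.0], 0.0))  # 7.0
```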

In [None]:
# Train the model
def train_model(model, train_data, test_data, batch_size=None, epochs=10,
                steps_per_epoch=None, validation_steps=None):
    # By default, one epoch is one full pass over each generator
    if steps_per_epoch is None:
        steps_per_epoch = len(train_data)
    if validation_steps is None:
        validation_steps = len(test_data)

    # batch_size should stay None: the generators already yield batches of 16.
    # shuffle is ignored for generators; flow_from_dataframe shuffles by default.
    model.fit(train_data,
              validation_data=test_data,
              batch_size=batch_size, epochs=epochs,
              steps_per_epoch=steps_per_epoch,
              validation_steps=validation_steps,
              verbose=2, shuffle=True)

    return model
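Each of the `steps_per_epoch` steps that `train_model` runs ends with an optimizer update. The plain-Python sketch below shows the textbook Adam update rule (Kingma & Ba) for a single scalar parameter, to make the learning-rate comment in `create_model` concrete. It is an illustration, not the Keras source. The function name `adam_minimize` and the toy loss are hypothetical, and Keras may differ in small details such as exactly where `epsilon` is added.

```python
import math


def adam_minimize(grad, x0, learning_rate=0.01, beta_1=0.9, beta_2=0.999,
                  epsilon=1e-7, steps=5000):
    """Textbook Adam for one scalar parameter; returns the final value."""
    x = x0
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g * g
        m_hat = m / (1 - beta_1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta_2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


# Minimize f(x) = (x - 3) ** 2, whose gradient is 2 * (x - 3)
grad_f = lambda x: 2 * (x - 3)
print(adam_minimize(grad_f, x0=0.0, learning_rate=0.01, steps=1))
print(adam_minimize(grad_f, x0=0.0, learning_rate=0.01))
```

After bias correction, the first step has a magnitude of almost exactly `learning_rate`, whatever the scale of the gradient. This is why the learning rate acts as the step size here.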

# Console output

```
2023-05-15 13:26:40.149946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2023-05-15 13:26:40.148294: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2023-05-15 13:26:40.977983: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2023-05-15 13:26:40.987124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
2023-05-15 13:26:40.987214: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-05-15 13:26:40.987183: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
pciBusID: 0000:8b:00.0 name: Tesla V100-SXM2-32GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 31.75GiB deviceMemoryBandwidth: 836.37GiB/s
2023-05-15 13:26:40.991350: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2023-05-15 13:26:40.989036: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-05-15 13:26:40.992458: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2023-05-15 13:26:40.989393: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2023-05-15 13:26:40.992513: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-05-15 13:26:40.996180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
Using TensorFlow backend.
Found 5694 validated image filenames.
Found 1897 validated image filenames.
2023-05-15 13:26:41.161915: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2023-05-15 13:26:41.168172: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2099995000 Hz
2023-05-15 13:26:41.168779: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5bafa70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-05-15 13:26:41.168801: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2023-05-15 13:26:41.349943: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5a8cfd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-05-15 13:26:41.349983: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2023-05-15 13:26:41.352045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:8b:00.0 name: Tesla V100-SXM2-32GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 31.75GiB deviceMemoryBandwidth: 836.37GiB/s
2023-05-15 13:26:41.352103: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2023-05-15 13:26:41.352113: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-05-15 13:26:41.352136: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-05-15 13:26:41.352146: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2023-05-15 13:26:41.352155: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2023-05-15 13:26:41.352163: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2023-05-15 13:26:41.352171: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-05-15 13:26:41.356410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2023-05-15 13:26:41.356475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2023-05-15 13:26:41.753961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-05-15 13:26:41.754008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2023-05-15 13:26:41.754015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2023-05-15 13:26:41.757853: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2023-05-15 13:26:41.757901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10240 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8b:00.0, compute capability: 7.0)
<class 'tensorflow.python.keras.engine.sequential.Sequential'>
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to  
  ['...']
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to  
  ['...']
Train for 356 steps, validate for 119 steps
Epoch 1/10
2023-05-15 13:26:52.221544: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-05-15 13:26:52.539556: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
356/356 - 64s - loss: 215.6088 - mae: 10.6122 - val_loss: 705.9674 - val_mae: 21.5358
Epoch 2/10
356/356 - 67s - loss: 84.5707 - mae: 6.9750 - val_loss: 164.0117 - val_mae: 9.6336
Epoch 3/10
356/356 - 69s - loss: 54.7848 - mae: 5.6227 - val_loss: 80.7079 - val_mae: 6.6989
Epoch 4/10
356/356 - 68s - loss: 42.7224 - mae: 4.9490 - val_loss: 92.2831 - val_mae: 7.5083
Epoch 5/10
356/356 - 68s - loss: 33.5699 - mae: 4.4030 - val_loss: 83.9321 - val_mae: 7.0734
Epoch 6/10
356/356 - 43s - loss: 26.1638 - mae: 3.8782 - val_loss: 71.7095 - val_mae: 6.4015
Epoch 7/10
356/356 - 42s - loss: 21.6567 - mae: 3.5772 - val_loss: 74.1264 - val_mae: 6.4298
Epoch 8/10
356/356 - 43s - loss: 18.9548 - mae: 3.3102 - val_loss: 69.2546 - val_mae: 6.2176
Epoch 9/10
356/356 - 43s - loss: 16.3927 - mae: 3.0635 - val_loss: 63.2988 - val_mae: 5.9638
Epoch 10/10
356/356 - 42s - loss: 13.6584 - mae: 2.8123 - val_loss: 65.7167 - val_mae: 6.0708
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to  
  ['...']
119/119 - 10s - loss: 65.7167 - mae: 6.0708
Test MAE: 6.0708
```
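The log shows the pattern from the second recommendation: training MAE keeps falling while validation MAE levels off around 6. Below is a small plain-Python sketch that picks the best epoch from the logged values, copied by hand here. In Keras, `model.fit` returns a `History` object whose `history` dict holds these per-epoch lists under keys matching the log, such as `val_mae`, but the `train_model` above does not return it. The helper name `best_epoch` is hypothetical.

```python
# Per-epoch MAE values copied from the training log above
train_mae = [10.6122, 6.9750, 5.6227, 4.9490, 4.4030,
             3.8782, 3.5772, 3.3102, 3.0635, 2.8123]
val_mae = [21.5358, 9.6336, 6.6989, 7.5083, 7.0734,
           6.4015, 6.4298, 6.2176, 5.9638, 6.0708]


def best_epoch(values):
    """Return the 1-based epoch with the lowest value."""
    return min(range(len(values)), key=lambda i: values[i]) + 1


epoch = best_epoch(val_mae)
print(epoch, val_mae[epoch - 1])  # 9 5.9638

# Gap between validation and training MAE; a growing gap signals overfitting
gaps = [round(v - t, 4) for t, v in zip(train_mae, val_mae)]
print(gaps)
```

If the best epoch matters, Keras's `EarlyStopping` callback (`monitor='val_mae'`, `restore_best_weights=True`) is one way to keep those weights automatically. Here, epochs 9 and 10 differ by only about 0.1 years of MAE.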