# Starter code

# Integrated project 4
        
You have been asked to build a demo version of image search by text query.

For the demo, you need to train a model that takes a vector representation of an image and a vector representation of a text and outputs a number from 0 to 1 showing how well the text and the image match.

### Data description

The data are in the folder `/datasets/image_search/` or available via this [link](https://code.s3.yandex.net/datasets/dsplus_integrated_project_4.zip).

The file `train_dataset.csv` contains the information needed for training: the image file name, the description ID, and the description text. Up to 5 descriptions may be available for one image. The description ID has the format `<image file name>#<description number>`.

The folder `train_images` contains the images for training the model.

The file `CrowdAnnotations.tsv` contains crowdsourced data on how well an image matches a description. Columns, in order:

1. Image file name.
2. Description ID.
3. Share of people who confirmed that the description matches the image.
4. Number of people who confirmed that the description matches the image.
5. Number of people who confirmed that the description does not match the image.

The file `ExpertAnnotations.tsv` contains expert assessments of how well an image matches a description. Columns, in order:

1. Image file name.
2. Description ID.
3. Score from the first expert.
4. Score from the second expert.
5. Score from the third expert.

Experts rate each pair on a scale from 1 to 4:

- 1: the image and the query do not match at all;
- 2: the query contains elements of the image description, but overall the query does not match the image;
- 3: the query and the image match except for some details;
- 4: the query and the image match completely.

The file `test_queries.csv` contains the information needed for testing: the query ID, the query text, and the relevant image. Up to 5 descriptions may be available for one image. The query ID has the format `<image file name>#<description number>`.

The folder `test_images` contains the images for testing the model.

In [2]:
pip install torchvision transformers optuna ipywidgets  --quiet

Note: you may need to restart the kernel to use updated packages.


In [221]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torchvision.models as models
import os
import torch.optim as optim
import optuna
import random as rd

from math import ceil
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import f1_score, mean_squared_error
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit
from tqdm import notebook 
from PIL import Image
from transformers import BertTokenizer, BertConfig, BertModel
from scipy.stats import mode
from torchvision import transforms

## 1. Exploratory data analysis

Our dataset contains expert and crowdsourced assessments of how well a text matches an image.

In the expert file, each image-text pair has scores from three specialists. To solve the task you need to aggregate these scores into a single value. There are several ways to do this. The simplest is majority voting: the score chosen by most of the experts (here, 2 or 3 of them) becomes the final score. Because there are fewer experts than classes, all three experts may give different scores, for example 1, 4, 2. In that case the image-text pair can be excluded from the dataset.

You may use another aggregation method or devise your own.
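The majority-vote rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `aggregate_majority` is hypothetical, and returning `None` for pairs without a majority is just one possible way to mark them for exclusion.

```python
from collections import Counter

def aggregate_majority(scores):
    """Return the score chosen by at least two of three experts, or None."""
    # most_common(1) yields the most frequent score together with its count
    score, count = Counter(scores).most_common(1)[0]
    # With three experts, a top count of 1 means all three scores differ
    return score if count >= 2 else None

print(aggregate_majority([1, 1, 2]))  # experts 1 and 2 agree
print(aggregate_majority([1, 2, 1]))  # experts 1 and 3 agree
print(aggregate_majority([1, 4, 2]))  # no majority, pair would be excluded
```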

In the crowdsourced file, the assessment columns are arranged in this order:

1. Share of workers who confirmed that the text **matches** the image.
2. Number of workers who confirmed that the text **matches** the image.
3. Number of workers who confirmed that the text **does not match** the image.

After analysing the expert and crowdsourced assessments, either pick one of them or merge them into a single score by some criterion: for example, the expert score is taken with a weight of 0.6 and the crowd score with a weight of 0.4.

Your model must output the probability that an image matches a text, so the target variable must take values from 0 to 1.


In [5]:
absolute_path = r'C:\Users\miks9\PY\Assembly project 3\datasets' 

***Absolute path to the data folder***

In [7]:
expert_annotations_col_names = ['image', 'query_id', 'expert_1', 'expert_2', 'expert_3']

In [8]:
expert_annotations_data = pd.read_csv(absolute_path+'/ExpertAnnotations.tsv', header=None, sep='\t', names=expert_annotations_col_names)

In [9]:
expert_annotations_data.head(5)

Unnamed: 0,image,query_id,expert_1,expert_2,expert_3
0,1056338697_4f7d7ce270.jpg,2549968784_39bfbe44f9.jpg#2,1,1,1
1,1056338697_4f7d7ce270.jpg,2718495608_d8533e3ac5.jpg#2,1,1,2
2,1056338697_4f7d7ce270.jpg,3181701312_70a379ab6e.jpg#2,1,1,2
3,1056338697_4f7d7ce270.jpg,3207358897_bfa61fa3c6.jpg#2,1,2,2
4,1056338697_4f7d7ce270.jpg,3286822339_5535af6b93.jpg#2,1,1,2


In [10]:
expert_annotations_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5822 entries, 0 to 5821
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   image     5822 non-null   object
 1   query_id  5822 non-null   object
 2   expert_1  5822 non-null   int64 
 3   expert_2  5822 non-null   int64 
 4   expert_3  5822 non-null   int64 
dtypes: int64(3), object(2)
memory usage: 227.6+ KB


In [11]:
train_dataset = pd.read_csv(absolute_path+'/train_dataset.csv')

In [12]:
train_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5822 entries, 0 to 5821
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   image       5822 non-null   object
 1   query_id    5822 non-null   object
 2   query_text  5822 non-null   object
dtypes: object(3)
memory usage: 136.6+ KB


In [13]:
train_dataset.head(5)

Unnamed: 0,image,query_id,query_text
0,1056338697_4f7d7ce270.jpg,2549968784_39bfbe44f9.jpg#2,A young child is wearing blue goggles and sitt...
1,1262583859_653f1469a9.jpg,2549968784_39bfbe44f9.jpg#2,A young child is wearing blue goggles and sitt...
2,2447284966_d6bbdb4b6e.jpg,2549968784_39bfbe44f9.jpg#2,A young child is wearing blue goggles and sitt...
3,2549968784_39bfbe44f9.jpg,2549968784_39bfbe44f9.jpg#2,A young child is wearing blue goggles and sitt...
4,2621415349_ef1a7e73be.jpg,2549968784_39bfbe44f9.jpg#2,A young child is wearing blue goggles and sitt...


In [14]:
crowd_annotations_col_names = ['image', 'query_id', 'percent_of_agree', 'agree_count', 'disagree_count']

In [15]:
crowd_annotations_data = pd.read_csv(absolute_path+'/CrowdAnnotations.tsv', sep='\t', header=None, names=crowd_annotations_col_names)

In [16]:
crowd_annotations_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47830 entries, 0 to 47829
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   image             47830 non-null  object 
 1   query_id          47830 non-null  object 
 2   percent_of_agree  47830 non-null  float64
 3   agree_count       47830 non-null  int64  
 4   disagree_count    47830 non-null  int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 1.8+ MB


In [17]:
crowd_annotations_data.head(5)

Unnamed: 0,image,query_id,percent_of_agree,agree_count,disagree_count
0,1056338697_4f7d7ce270.jpg,1056338697_4f7d7ce270.jpg#2,1.0,3,0
1,1056338697_4f7d7ce270.jpg,114051287_dd85625a04.jpg#2,0.0,0,3
2,1056338697_4f7d7ce270.jpg,1427391496_ea512cbe7f.jpg#2,0.0,0,3
3,1056338697_4f7d7ce270.jpg,2073964624_52da3a0fc4.jpg#2,0.0,0,3
4,1056338697_4f7d7ce270.jpg,2083434441_a93bc6306b.jpg#2,0.0,0,3


In [18]:
test_queries_data = pd.read_csv(absolute_path+'/test_queries.csv', sep='|')

In [19]:
test_queries_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  500 non-null    int64 
 1   query_id    500 non-null    object
 2   query_text  500 non-null    object
 3   image       500 non-null    object
dtypes: int64(1), object(3)
memory usage: 15.8+ KB


In [20]:
test_queries_data.head(25)

Unnamed: 0.1,Unnamed: 0,query_id,query_text,image
0,0,1177994172_10d143cb8d.jpg#0,"Two blonde boys , one in a camouflage shirt an...",1177994172_10d143cb8d.jpg
1,1,1177994172_10d143cb8d.jpg#1,Two boys are squirting water guns at each other .,1177994172_10d143cb8d.jpg
2,2,1177994172_10d143cb8d.jpg#2,Two boys spraying each other with water,1177994172_10d143cb8d.jpg
3,3,1177994172_10d143cb8d.jpg#3,Two children wearing jeans squirt water at eac...,1177994172_10d143cb8d.jpg
4,4,1177994172_10d143cb8d.jpg#4,Two young boys are squirting water at each oth...,1177994172_10d143cb8d.jpg
5,5,1232148178_4f45cc3284.jpg#0,A baby girl playing at a park .,1232148178_4f45cc3284.jpg
6,6,1232148178_4f45cc3284.jpg#1,A closeup of a child on a playground with adul...,1232148178_4f45cc3284.jpg
7,7,1232148178_4f45cc3284.jpg#2,A young boy poses for a picture in front of a ...,1232148178_4f45cc3284.jpg
8,8,1232148178_4f45cc3284.jpg#3,A young girl is smiling in front of the camera...,1232148178_4f45cc3284.jpg
9,9,1232148178_4f45cc3284.jpg#4,There is a little blond hair girl with a green...,1232148178_4f45cc3284.jpg


All datasets are loaded.

In [22]:
selected_columns = ['expert_1', 'expert_2', 'expert_3']

In [23]:
expert_annotations_data['aggregated'] = expert_annotations_data[selected_columns].mode(axis=1)[0]

The expert scores are aggregated by majority vote.

In [24]:
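# Mark rows for removal when expert_1 != expert_2 and expert_2 != expert_3.
# The condition does not compare expert_1 with expert_3, so a pattern such as
# (1, 2, 1) is also marked even though two experts agree.
expert_annotations_data.loc[(expert_annotations_data['expert_1'] != expert_annotations_data['expert_2']) & 
                             (expert_annotations_data['expert_2'] != expert_annotations_data['expert_3']), 'aggregated' ] = 'unknown'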



In [25]:
filterd_expert_annotations_data = expert_annotations_data.drop(expert_annotations_data.loc[expert_annotations_data['aggregated'] == 'unknown'].index)

Rows where the experts could not agree are removed.
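As a cross-check for this filtering step, rows without a majority can also be found by counting the distinct scores in each row. The sketch below uses a small hand-made frame (not the project data) with the notebook's column names; it is one possible approach, not the code that produced the outputs recorded here.

```python
import pandas as pd

toy = pd.DataFrame({
    'expert_1': [1, 1, 1, 2],
    'expert_2': [1, 2, 4, 3],
    'expert_3': [2, 1, 2, 3],
})
expert_cols = ['expert_1', 'expert_2', 'expert_3']

# A row has no majority only when all three scores are distinct
no_majority = toy[expert_cols].nunique(axis=1) == 3

# For the remaining rows the row-wise mode is the majority score
toy['aggregated'] = toy[expert_cols].mode(axis=1)[0]
toy_filtered = toy[~no_majority]
print(toy_filtered)
```

In this toy frame the second row, (1, 2, 1), is kept because two experts agree, while (1, 4, 2) is dropped.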

In [26]:
filterd_expert_annotations_data.head(25)

Unnamed: 0,image,query_id,expert_1,expert_2,expert_3,aggregated
0,1056338697_4f7d7ce270.jpg,2549968784_39bfbe44f9.jpg#2,1,1,1,1.0
1,1056338697_4f7d7ce270.jpg,2718495608_d8533e3ac5.jpg#2,1,1,2,1.0
2,1056338697_4f7d7ce270.jpg,3181701312_70a379ab6e.jpg#2,1,1,2,1.0
3,1056338697_4f7d7ce270.jpg,3207358897_bfa61fa3c6.jpg#2,1,2,2,2.0
4,1056338697_4f7d7ce270.jpg,3286822339_5535af6b93.jpg#2,1,1,2,1.0
5,1056338697_4f7d7ce270.jpg,3360930596_1e75164ce6.jpg#2,1,1,1,1.0
6,1056338697_4f7d7ce270.jpg,3545652636_0746537307.jpg#2,1,1,1,1.0
7,1056338697_4f7d7ce270.jpg,434792818_56375e203f.jpg#2,1,1,2,1.0
8,106490881_5a2dd9b7bd.jpg,1425069308_488e5fcf9d.jpg#2,1,1,1,1.0
9,106490881_5a2dd9b7bd.jpg,1714316707_8bbaa2a2ba.jpg#2,2,2,2,2.0


In [27]:
filterd_expert_annotations_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5696 entries, 0 to 5821
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   image       5696 non-null   object
 1   query_id    5696 non-null   object
 2   expert_1    5696 non-null   int64 
 3   expert_2    5696 non-null   int64 
 4   expert_3    5696 non-null   int64 
 5   aggregated  5696 non-null   object
dtypes: int64(3), object(3)
memory usage: 311.5+ KB


In [28]:
expert_annotations_residual = expert_annotations_data.shape[0] - filterd_expert_annotations_data.shape[0]
print(expert_annotations_residual)

126


126 rows were removed. These are the rows marked as lacking expert agreement.

In [30]:
merge_opinion_data = filterd_expert_annotations_data.merge(crowd_annotations_data, on=['image', 'query_id'], how='inner')

In [31]:
merge_opinion_data['aggregated_normalized'] = merge_opinion_data['aggregated'] / 4

In [32]:
expert_weight = 0.6
crowd_weight = 0.4

In [33]:
merge_opinion_data['combined_score'] = merge_opinion_data['aggregated_normalized']*expert_weight + merge_opinion_data['percent_of_agree']*crowd_weight

The expert and crowdsourced opinions are combined with weights 0.6 and 0.4 respectively, producing a single combined score.
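As a worked check of the weighting, take a pair whose aggregated expert score is 3 and whose crowd agreement share is 1/3. The function name `combined_score` is hypothetical; the weights and the division by 4 follow the cells above.

```python
def combined_score(expert_score, crowd_share, expert_weight=0.6, crowd_weight=0.4):
    """Weighted mix of a 1-4 expert score (divided by 4) and a crowd share in [0, 1]."""
    return (expert_score / 4) * expert_weight + crowd_share * crowd_weight

# 0.75 * 0.6 + 0.3333 * 0.4 = 0.45 + 0.1333
print(round(combined_score(3, 1 / 3), 6))
```

Because the expert score is divided by 4, its normalized value ranges from 0.25 to 1, so the combined score cannot fall below 0.15.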

In [34]:
merge_opinion_data.head(25)

Unnamed: 0,image,query_id,expert_1,expert_2,expert_3,aggregated,percent_of_agree,agree_count,disagree_count,aggregated_normalized,combined_score
0,1056338697_4f7d7ce270.jpg,2549968784_39bfbe44f9.jpg#2,1,1,1,1.0,0.0,0,3,0.25,0.15
1,1056338697_4f7d7ce270.jpg,2718495608_d8533e3ac5.jpg#2,1,1,2,1.0,0.0,0,3,0.25,0.15
2,1056338697_4f7d7ce270.jpg,434792818_56375e203f.jpg#2,1,1,2,1.0,0.0,0,3,0.25,0.15
3,1084040636_97d9633581.jpg,256085101_2c2617c5d0.jpg#2,2,3,3,3.0,0.333333,1,2,0.75,0.583333
4,1084040636_97d9633581.jpg,3396157719_6807d52a81.jpg#2,1,2,2,2.0,0.0,0,3,0.5,0.3
5,1096395242_fc69f0ae5a.jpg,1425069308_488e5fcf9d.jpg#2,2,2,2,2.0,0.0,0,3,0.5,0.3
6,1096395242_fc69f0ae5a.jpg,2370481277_a3085614c9.jpg#2,1,2,2,2.0,0.0,0,3,0.5,0.3
7,1107246521_d16a476380.jpg,2410320522_d967f0b75c.jpg#2,2,3,3,3.0,0.0,0,3,0.75,0.45
8,1107246521_d16a476380.jpg,293327462_20dee0de56.jpg#2,2,2,3,2.0,0.333333,1,2,0.5,0.433333
9,1107246521_d16a476380.jpg,3582742297_1daa29968e.jpg#2,3,3,3,3.0,0.0,0,3,0.75,0.45


In [35]:
merge_opinion_data.duplicated().sum()

0

In [36]:
merge_opinion_data.loc[merge_opinion_data['combined_score'] >= 0.6, 'target'] = 1
merge_opinion_data.loc[merge_opinion_data['combined_score'] < 0.6, 'target'] = 0

In [37]:
main_opinion_data = merge_opinion_data[['image','query_id','target']].copy()

The new table contains the target variable. If the combined score is at least 0.6, the image is considered to match the description; otherwise it is considered not to match.

In [38]:
main_opinion_data.head(5)

Unnamed: 0,image,query_id,target
0,1056338697_4f7d7ce270.jpg,2549968784_39bfbe44f9.jpg#2,0.0
1,1056338697_4f7d7ce270.jpg,2718495608_d8533e3ac5.jpg#2,0.0
2,1056338697_4f7d7ce270.jpg,434792818_56375e203f.jpg#2,0.0
3,1084040636_97d9633581.jpg,256085101_2c2617c5d0.jpg#2,0.0
4,1084040636_97d9633581.jpg,3396157719_6807d52a81.jpg#2,0.0


## 2. Data check

In some countries where your company operates, there are restrictions on image processing: search services and services that provide search functionality are prohibited from providing, without the permission of parents or legal guardians, any information (including but not limited to texts, images, video, and audio) that contains a description, image, or voice recording of children. A child is any person under 16 years of age.

Your service strictly follows the laws of the countries in which it operates. Therefore, when a user tries to view images prohibited by law, a disclaimer is shown instead of the pictures:

> This image is unavailable in your country in compliance with local laws
>

However, this functionality is not available in your PoC. Therefore, all images that violate this law must be removed from the training set.

In [40]:
ban_words = ['boy', 'girl', 'child', 'children', 'baby','youth', 'teenager']

List of banned words.

In [41]:
mask_train = train_dataset['query_text'].str.contains('|'.join(ban_words), case=False, na=False)

In [42]:
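train_dataset_filtered_data = train_dataset[~mask_train]
# Trim the tail so the row count is a multiple of 100 (the batch size used later)
rows_to_delete_train = train_dataset_filtered_data.shape[0] % 100
if rows_to_delete_train:  # index[-0:] would select every row, so skip when nothing needs trimming
    train_dataset_filtered_data = train_dataset_filtered_data.drop(train_dataset_filtered_data.index[-rows_to_delete_train:])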

Rows containing banned words are removed, and the training set is trimmed so that its size is a multiple of 100.
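The mask above uses plain substring matching, so a word such as `cowboy` also triggers the filter. If whole-word matching is preferred, a pattern with word boundaries is one option. The sketch below is an assumption about how that could look, not what the notebook ran: the helper `mentions_child` is hypothetical, the pattern additionally accepts a trailing plural `s`, and the same pattern string could be passed to `Series.str.contains`.

```python
import re

ban_words = ['boy', 'girl', 'child', 'children', 'baby', 'youth', 'teenager']

# \b anchors the match at word boundaries; (?:...) is a non-capturing group
pattern = r'\b(?:' + '|'.join(map(re.escape, ban_words)) + r')s?\b'
banned_re = re.compile(pattern, flags=re.IGNORECASE)

def mentions_child(text):
    """True when the text contains a banned word as a whole word."""
    return banned_re.search(text) is not None

print(mentions_child('Two boys spraying each other with water'))
print(mentions_child('A cowboy rides a horse .'))
```

Irregular plurals such as "babies" are not covered by this pattern and would need to be added to the list explicitly.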

In [43]:
train_dataset_filtered_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4300 entries, 22 to 5755
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   image       4300 non-null   object
 1   query_id    4300 non-null   object
 2   query_text  4300 non-null   object
dtypes: object(3)
memory usage: 134.4+ KB


In [44]:
train_dataset_filtered_data.head(5)

Unnamed: 0,image,query_id,query_text
22,1056338697_4f7d7ce270.jpg,3181701312_70a379ab6e.jpg#2,A man sleeps under a blanket on a city street .
23,3187395715_f2940c2b72.jpg,3181701312_70a379ab6e.jpg#2,A man sleeps under a blanket on a city street .
24,463978865_c87c6ca84c.jpg,3181701312_70a379ab6e.jpg#2,A man sleeps under a blanket on a city street .
25,488590040_35a3e96c89.jpg,3181701312_70a379ab6e.jpg#2,A man sleeps under a blanket on a city street .
26,534875358_6ea30d3091.jpg,3181701312_70a379ab6e.jpg#2,A man sleeps under a blanket on a city street .


In [45]:
mask_test = test_queries_data['query_text'].str.contains('|'.join(ban_words), case=False, na=False)

In [46]:
mask_test.sum()

160

In [47]:
test_queries_filtered_data = test_queries_data[~mask_test] 

Rows containing banned words are removed from the test set.

In [48]:
test_queries_filtered_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 340 entries, 11 to 499
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  340 non-null    int64 
 1   query_id    340 non-null    object
 2   query_text  340 non-null    object
 3   image       340 non-null    object
dtypes: int64(1), object(3)
memory usage: 13.3+ KB


In [49]:
test_queries_filtered_data.head(10)

Unnamed: 0.1,Unnamed: 0,query_id,query_text,image
11,11,123997871_6a9ca987b1.jpg#1,Several female lacrosse players are going afte...,123997871_6a9ca987b1.jpg
13,13,123997871_6a9ca987b1.jpg#3,The woman lacrosse player in blue is about to ...,123997871_6a9ca987b1.jpg
14,14,123997871_6a9ca987b1.jpg#4,Women play lacrosse .,123997871_6a9ca987b1.jpg
15,15,1319634306_816f21677f.jpg#0,A brown dog is sitting in some long grass .,1319634306_816f21677f.jpg
16,16,1319634306_816f21677f.jpg#1,A brown dog sits still on a hillside .,1319634306_816f21677f.jpg
17,17,1319634306_816f21677f.jpg#2,A large tan dog sits on a grassy hill .,1319634306_816f21677f.jpg
18,18,1319634306_816f21677f.jpg#3,A large yellow dog is sitting on a hill .,1319634306_816f21677f.jpg
19,19,1319634306_816f21677f.jpg#4,The dog is sitting on the side of the hill .,1319634306_816f21677f.jpg
20,20,1429546659_44cb09cbe2.jpg#0,A white dog and a black dog in a field .,1429546659_44cb09cbe2.jpg
21,21,1429546659_44cb09cbe2.jpg#1,A white dog with a branch in his mouth and a b...,1429546659_44cb09cbe2.jpg


## 3. Image vectorization

Let's move on to vectorizing the images.

The most primitive approach is to read an image and flatten the resulting matrix into a vector. That does not work here: the images differ in size, so the vectors would differ greatly in length. Convolutional networks are a better fit because they can extract the main features of an image. To do this, pick an architecture, for example ResNet-18, inspect its layers, and drop the fully connected layers responsible for the final prediction. You can load a model of this architecture pretrained on the ImageNet dataset.

In [51]:
resnet = models.resnet18(pretrained=True)



In [52]:
for param in resnet.parameters():
    param.requires_grad_(False) 

In [53]:
print(list(resnet.children())) 

[Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False), BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), ReLU(inplace=True), MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False), Sequential(
  (0): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (1): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bia

In [54]:
# [:-2] drops the last two children: the average-pooling layer and the fully connected layer
modules = list(resnet.children())[:-2]
resnet = nn.Sequential(*modules) 

In [55]:
resnet.eval()

Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
  (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Con

In [56]:
norm = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    norm,
])

In [57]:
file_list_train = os.listdir(absolute_path+'/train_images')

In [58]:
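for image_file in file_list_train:
    try:
        # Load the image, apply the ImageNet preprocessing, and add a batch dimension
        img = Image.open(absolute_path+"/train_images/"+image_file).convert('RGB')
        image_tensor = preprocess(img)
        image_tensor = image_tensor.unsqueeze(0)
    
        with torch.no_grad():
            # NOTE: this variable is reassigned on every iteration, so after the loop
            # it holds only the flattened features of the last image processed
            output_tensor_train = resnet(image_tensor).flatten()
    except Exception as e:
        print(f"Error processing {image_file}: {e}")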

In [59]:
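embeddings_img_train = []
# NOTE: output_tensor_train is a single flat feature vector, so zip() pairs each
# file name with one scalar component of it rather than with a per-image vector
for image_file, embedding in zip(file_list_train, output_tensor_train):
    embeddings_img_train.append({
        'image': image_file,
        'embedding': embedding.tolist()
    })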

In [60]:
embeddings_img_train_df = pd.DataFrame(embeddings_img_train)

In [61]:
embeddings_img_train_df.head(5)

Unnamed: 0,image,embedding
0,1056338697_4f7d7ce270.jpg,0.029657
1,106490881_5a2dd9b7bd.jpg,0.049338
2,1082379191_ec1e53f996.jpg,0.0
3,1084040636_97d9633581.jpg,0.162801
4,1096395242_fc69f0ae5a.jpg,0.83736


Embeddings are computed for the training images.
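In the loop above the result variable is reassigned for every file, so only the last image's features survive. One way to keep a vector per image is to store each result under its file name. The sketch below is a minimal illustration under assumptions: `embed_images`, `load_image`, and `encoder` are hypothetical names, and the toy usage replaces the image loader and ResNet with simple stand-ins so that the snippet runs without the dataset.

```python
import numpy as np

def embed_images(file_names, load_image, encoder):
    """Return {file name: 1-D feature vector}, one entry per image."""
    embeddings = {}
    for name in file_names:
        try:
            features = encoder(load_image(name))
        except Exception as exc:
            # Skip unreadable files but keep processing the rest
            print(f"Error processing {name}: {exc}")
            continue
        # Store the flattened features under the file name instead of overwriting
        embeddings[name] = np.asarray(features).ravel()
    return embeddings

# Toy stand-ins: "images" are random arrays, the "encoder" averages over pixels
rng = np.random.default_rng(0)
fake_images = {'a.jpg': rng.random((3, 4, 4)), 'b.jpg': rng.random((3, 4, 4))}
vectors = embed_images(fake_images, fake_images.__getitem__, lambda x: x.mean(axis=(1, 2)))
print({name: vec.shape for name, vec in vectors.items()})
```

With the notebook's objects, `load_image` would open and preprocess a file and `encoder` would call `resnet` inside `torch.no_grad()`; the resulting dictionary can then be converted to a table with one row per image.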

In [62]:
output_tensor_train

tensor([0.0297, 0.0493, 0.0000,  ..., 0.0000, 0.0000, 0.0000])

In [63]:
file_list_test = os.listdir(absolute_path+'/test_images')

In [64]:
for image_file in file_list_test:
    try:
        # Load the image, apply the ImageNet preprocessing, and add a batch dimension
        img = Image.open(absolute_path+"/test_images/"+image_file).convert('RGB')
        image_tensor = preprocess(img)
        image_tensor = image_tensor.unsqueeze(0)
    
        with torch.no_grad():
            # NOTE: reassigned on every iteration, so only the last image's features remain
            output_tensor_test = resnet(image_tensor).flatten()
    except Exception as e:
        print(f"Error processing {image_file}: {e}")

In [65]:
embeddings_img_test = []
# NOTE: zip() pairs each file name with one scalar component of output_tensor_test
for image_file, embedding in zip(file_list_test, output_tensor_test):
    embeddings_img_test.append({
        'image': image_file,
        'embedding': embedding.tolist() 
    })

In [66]:
embeddings_img_test_df = pd.DataFrame(embeddings_img_test)

In [67]:
embeddings_img_test_df.head(5)

Unnamed: 0,image,embedding
0,1177994172_10d143cb8d.jpg,0.0
1,1232148178_4f45cc3284.jpg,0.0
2,123997871_6a9ca987b1.jpg,0.0
3,1319634306_816f21677f.jpg,0.508549
4,1429546659_44cb09cbe2.jpg,0.740343


Embeddings are computed for the test images.

In [68]:
output_tensor_test

tensor([0.0000, 0.0000, 0.0000,  ..., 0.0000, 1.1078, 0.7653])

## 4. Text vectorization

The next step is text vectorization. You can experiment with several ways of vectorizing text:

- tf-idf
- word2vec
- \*transformers (for example, BERT)

\* if you studied transformers in the sprint "Machine Learning for Texts".


In [70]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [71]:
config = BertConfig.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', config=config)

In [72]:
n = 0
tokenized_texts = []

In [73]:
for promt in train_dataset_filtered_data['query_text']:
    vector = tokenizer.encode(promt, add_special_tokens=True)
    tokenized_texts.append(vector)
    n = max(n, len(vector))

In [74]:
padded = [vector + [0]*(n - len(vector)) for vector in tokenized_texts]

In [75]:
attention_mask = np.where(np.array(padded) != 0, 1, 0)

In [76]:
batch_size = 100 

In [77]:
embeddings = []

In [78]:
for i in notebook.tqdm(range((train_dataset_filtered_data.shape[0]) // batch_size)):
    batch = train_dataset_filtered_data['query_text'][batch_size * i:batch_size * (i + 1)]
    batch_attention_mask = torch.LongTensor(attention_mask[batch_size*i:batch_size*(i+1)]) 
    
    with torch.no_grad():
        outputs = model(input_ids=torch.tensor(padded[batch_size * i:batch_size * (i + 1)]), attention_mask=batch_attention_mask)
        
    batch_embeddings = outputs.last_hidden_state[:, 0, :] 
    embeddings.append(batch_embeddings.numpy())

  0%|          | 0/43 [00:00<?, ?it/s]

In [79]:
embeddings = np.concatenate(embeddings, axis=0)

In [80]:
embeddings_df = pd.DataFrame(embeddings)
embeddings_df.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,758,759,760,761,762,763,764,765,766,767
0,0.225945,-0.14777,-0.410427,0.009857,0.024527,-0.01276,-0.160295,1.20093,-0.40779,-0.553016,...,-0.113069,-0.112948,-0.309222,-0.056991,0.130479,0.988333,-0.092634,-0.319207,0.430009,0.243055
1,0.225945,-0.14777,-0.410427,0.009857,0.024527,-0.01276,-0.160295,1.20093,-0.40779,-0.553016,...,-0.113069,-0.112948,-0.309222,-0.056991,0.130479,0.988333,-0.092634,-0.319207,0.430009,0.243055
2,0.225945,-0.14777,-0.410427,0.009857,0.024527,-0.01276,-0.160295,1.20093,-0.40779,-0.553016,...,-0.113069,-0.112948,-0.309222,-0.056991,0.130479,0.988333,-0.092634,-0.319207,0.430009,0.243055
3,0.225945,-0.14777,-0.410427,0.009857,0.024527,-0.01276,-0.160295,1.20093,-0.40779,-0.553016,...,-0.113069,-0.112948,-0.309222,-0.056991,0.130479,0.988333,-0.092634,-0.319207,0.430009,0.243055
4,0.225945,-0.14777,-0.410427,0.009857,0.024527,-0.01276,-0.160295,1.20093,-0.40779,-0.553016,...,-0.113069,-0.112948,-0.309222,-0.056991,0.130479,0.988333,-0.092634,-0.319207,0.430009,0.243055
5,0.363719,0.163009,-0.490554,0.019927,0.072832,-0.12519,-0.251206,0.910355,-0.63008,-0.298645,...,-0.033179,0.282282,-0.214147,0.347063,0.388013,0.89665,0.471118,-0.301245,0.148655,0.346313
6,0.363719,0.163009,-0.490554,0.019927,0.072832,-0.12519,-0.251206,0.910355,-0.63008,-0.298645,...,-0.033179,0.282282,-0.214147,0.347063,0.388013,0.89665,0.471118,-0.301245,0.148655,0.346313
7,0.363719,0.163009,-0.490554,0.019927,0.072832,-0.12519,-0.251206,0.910355,-0.63008,-0.298645,...,-0.033179,0.282282,-0.214147,0.347063,0.388013,0.89665,0.471118,-0.301245,0.148655,0.346313
8,0.363719,0.163009,-0.490554,0.019927,0.072832,-0.12519,-0.251206,0.910355,-0.63008,-0.298645,...,-0.033179,0.282282,-0.214147,0.347063,0.388013,0.89665,0.471118,-0.301245,0.148655,0.346313
9,-0.368424,-0.235774,-0.305801,0.17535,0.182789,-0.072998,0.053588,1.238182,-0.730164,-0.333559,...,0.186065,-0.355032,-0.142532,0.475379,0.240221,-0.005223,-0.248849,-0.535116,0.498605,0.076596


Embeddings are computed for the training queries.
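The padding and attention-mask steps used before calling BERT can be illustrated on toy token ids. The ids below are made up for the example; the only assumption carried over from the notebook is that 0 is the padding id.

```python
import numpy as np

token_ids = [[101, 7592, 102], [101, 1037, 2158, 5176, 102]]  # made-up ids

# Pad every sequence with zeros up to the longest length
max_len = max(len(ids) for ids in token_ids)
padded = np.array([ids + [0] * (max_len - len(ids)) for ids in token_ids])

# 1 marks real tokens, 0 marks padding that the model should ignore
attention_mask = np.where(padded != 0, 1, 0)
print(padded.shape)
print(attention_mask)
```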

In [81]:
embeddings_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4300 entries, 0 to 4299
Columns: 768 entries, 0 to 767
dtypes: float32(768)
memory usage: 12.6 MB


## 5. Combining the vectors

Prepare the data for training: combine the image vectors and the text vectors with the target variable.

In [83]:
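# NOTE: pd.concat(axis=1) aligns rows by index label. train_dataset_filtered_data keeps
# its original labels (22..5755) while embeddings_df has a fresh RangeIndex (0..4299)
train_dataset_filtered_embeddings = pd.concat([train_dataset_filtered_data, embeddings_df], axis=1)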

In [84]:
# Drops the rows whose index label is present in only one of the two frames
train_dataset_filtered_embeddings = train_dataset_filtered_embeddings.dropna()

In [85]:
train_dataset_filtered_embeddings.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3262 entries, 22 to 4299
Columns: 771 entries, image to 767
dtypes: float32(768), object(3)
memory usage: 9.7+ MB


In [86]:
train_dataset_filtered_embeddings = pd.merge(train_dataset_filtered_embeddings, embeddings_img_train_df, on='image', how='inner')

In [87]:
train_dataset_filtered_embeddings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3262 entries, 0 to 3261
Columns: 772 entries, image to embedding
dtypes: float32(768), float64(1), object(3)
memory usage: 9.7+ MB


In [88]:
train_dataset_filtered_embeddings = train_dataset_filtered_embeddings.merge(main_opinion_data, on=['image', 'query_id'], how='inner')

In [89]:
train_dataset_filtered_embeddings.columns = train_dataset_filtered_embeddings.columns.astype(str)

In [90]:
train_dataset_filtered_embeddings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1382 entries, 0 to 1381
Columns: 773 entries, image to target
dtypes: float32(768), float64(2), object(3)
memory usage: 4.1+ MB


In [91]:
train_dataset_filtered_embeddings.head(5)

Unnamed: 0,image,query_id,query_text,0,1,2,3,4,5,6,...,760,761,762,763,764,765,766,767,embedding,target
0,1056338697_4f7d7ce270.jpg,434792818_56375e203f.jpg#2,A man and woman look back at the camera while ...,0.130422,-0.037170,-0.338869,-0.016837,0.108761,-0.292961,0.043540,...,-0.067451,0.171396,0.334391,0.243756,-0.053264,-0.275054,0.692342,0.353757,0.029657,0.0
1,3187395715_f2940c2b72.jpg,300577375_26cc2773a1.jpg#2,An officer stands next to a car on a city stre...,0.083606,-0.146299,-0.731420,0.083850,0.446841,-0.611448,-0.331658,...,-0.028371,0.243266,0.031148,0.461950,0.229092,-0.198117,0.571113,0.446735,2.036200,0.0
2,463978865_c87c6ca84c.jpg,3181701312_70a379ab6e.jpg#2,A man sleeps under a blanket on a city street .,0.208375,0.198959,0.058023,-0.024403,0.156402,-0.376268,0.228204,...,-0.188570,-0.133327,0.049694,0.756743,0.089408,-0.387838,0.098020,0.058181,1.662843,0.0
3,463978865_c87c6ca84c.jpg,260520547_944f9f4c91.jpg#2,People walking down a sidewalk on a beach .,0.043938,0.581405,0.015249,-0.030840,0.260595,-0.543070,0.019188,...,-0.296617,-0.207228,0.157452,0.303370,0.228893,0.082603,0.202067,0.588910,1.662843,0.0
4,3208074567_ac44aeb3f3.jpg,3655964639_21e76383d0.jpg#2,The woman wearing a red bow walks past a bicyc...,-0.077238,-0.019634,-0.565421,0.345388,0.299098,-0.613959,0.047469,...,-0.365767,0.303600,0.263684,0.246732,0.125311,-0.239420,0.333488,-0.169922,1.784997,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1377,3027397797_4f1d305ced.jpg,3270691950_88583c3524.jpg#2,A man is diving into water near a shore .,-0.449384,-0.263797,-0.681446,0.132174,0.068195,-0.065863,-0.454374,...,-0.158239,-0.214286,0.307321,0.095848,-0.453346,-0.469998,0.335383,-0.089274,0.000000,0.0
1378,2759860913_f75b39d783.jpg,2759860913_f75b39d783.jpg#2,A man on top of a high mountain,-0.002501,0.163937,-0.442110,0.066301,-0.017070,-0.089583,-0.309687,...,0.061090,0.243191,0.118329,0.267516,-0.134084,-0.297520,0.124231,-0.077339,0.856501,1.0
1379,317488612_70ac35493b.jpg,3064383768_f6838f57da.jpg#2,a surfer is riding his board over a wave .,0.374781,0.173804,-0.094735,-0.389569,-0.079404,-0.392917,0.077303,...,0.009453,-0.253995,0.378418,0.391211,-0.195037,-0.261474,0.363062,0.290213,0.050992,0.0
1380,2384353160_f395e9a54b.jpg,3503689049_63212220be.jpg#2,a young man wearing dark sunglasses smiling,0.394784,0.295883,-0.065090,-0.290850,0.017728,-0.644718,0.373176,...,0.201370,-0.033256,0.090386,0.967703,-0.154269,-0.397845,0.013589,0.215697,0.000000,0.0


I merge the tables. The final table contains the embeddings of the text queries and of the images, as well as the query text itself and the IDs of the images and queries.

## 6. Training the relevance prediction model

To train the model, split the dataset into training and test sets. A plain random split is not suitable: the same image must not end up in both the training and the test set.
To take images into account when splitting, you can use the [GroupShuffleSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupShuffleSplit.html) class from `sklearn.model_selection`.

The code below splits the dataset into training and test sets in a 7:3 ratio so that rows with the same value of 'group_column' end up either in the test set or in the training set, never in both.

```
from sklearn.model_selection import GroupShuffleSplit
gss = GroupShuffleSplit(n_splits=1, train_size=.7, random_state=42)
train_indices, test_indices = next(gss.split(X=df.drop(columns=['target']), y=df['target'], groups=df['group_column']))
train_df, test_df = df.loc[train_indices], df.loc[test_indices]

```

Choose the model yourself. You will also need to choose a quality metric or implement your own.

In [93]:
main_data = train_dataset_filtered_embeddings.drop(['image', 'query_id', 'query_text'], axis=1).copy()

In [94]:
main_data.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,760,761,762,763,764,765,766,767,embedding,target
0,0.130422,-0.03717,-0.338869,-0.016837,0.108761,-0.292961,0.04354,0.561109,-0.314614,-0.0697,...,-0.067451,0.171396,0.334391,0.243756,-0.053264,-0.275054,0.692342,0.353757,0.029657,0.0
1,0.083606,-0.146299,-0.73142,0.08385,0.446841,-0.611448,-0.331658,0.922622,-0.543876,-0.082709,...,-0.028371,0.243266,0.031148,0.46195,0.229092,-0.198117,0.571113,0.446735,2.0362,0.0
2,0.208375,0.198959,0.058023,-0.024403,0.156402,-0.376268,0.228204,0.597776,-0.190187,-0.52586,...,-0.18857,-0.133327,0.049694,0.756743,0.089408,-0.387838,0.09802,0.058181,1.662843,0.0
3,0.043938,0.581405,0.015249,-0.03084,0.260595,-0.54307,0.019188,0.479385,-0.02363,-0.78208,...,-0.296617,-0.207228,0.157452,0.30337,0.228893,0.082603,0.202067,0.58891,1.662843,0.0
4,-0.077238,-0.019634,-0.565421,0.345388,0.299098,-0.613959,0.047469,0.901103,-0.513558,-0.644246,...,-0.365767,0.3036,0.263684,0.246732,0.125311,-0.23942,0.333488,-0.169922,1.784997,0.0


In [95]:
main_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1382 entries, 0 to 1381
Columns: 770 entries, 0 to target
dtypes: float32(768), float64(2)
memory usage: 4.1 MB


Data for the model

In [96]:
gss = GroupShuffleSplit(n_splits=1, train_size=.7, random_state=42)
train_indices, test_indices = next(gss.split(X=main_data.drop(columns=['target']),
                                            y=main_data['target'], groups=train_dataset_filtered_embeddings['image']))

train_df, test_df = main_data.loc[train_indices], main_data.loc[test_indices]

In [97]:
X_train = train_df.drop(columns=['target'])
y_train = train_df['target']
X_test = test_df.drop(columns=['target'])
y_test = test_df['target']

Splitting the data into training and test sets
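
As a sanity check on the split, one can confirm that no image appears in both parts. The sketch below runs on a small synthetic array; the `groups`, `X` and `y` values are hypothetical stand-ins for the `image` column and `main_data`, not the project's data. Note that `GroupShuffleSplit.split` yields positional indices, so `.iloc` is the safer accessor whenever the frame's index is not a plain `RangeIndex`.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for the real data: 10 images, 3 rows (queries) per image.
rng = np.random.default_rng(42)
groups = np.repeat([f"img_{i}.jpg" for i in range(10)], 3)
X = rng.normal(size=(30, 4))
y = rng.integers(0, 2, size=30)

gss = GroupShuffleSplit(n_splits=1, train_size=0.7, random_state=42)
train_idx, test_idx = next(gss.split(X, y, groups=groups))

# Every image must belong to exactly one side of the split.
train_groups = set(groups[train_idx])
test_groups = set(groups[test_idx])
overlap = train_groups & test_groups
print("images shared by train and test:", overlap)
```

An empty set here means the split is leak-free with respect to images; the same two `set(...)` lines could be applied to `train_dataset_filtered_embeddings['image']` with the notebook's own indices.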

In [98]:
model_linear = LogisticRegression(random_state=42, max_iter=1000)
model_linear.fit(X_train, y_train)

In [99]:
scaler = MinMaxScaler()

In [100]:
print("Test RMSE:",
      mean_squared_error(y_test, scaler.fit_transform(model_linear.predict(X_test).reshape(-1, 1)), squared=False))

Test RMSE: 0.47871355387816905


The linear model gives the RMSE value shown above.
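
An RMSE value is easier to interpret next to a trivial reference point. A minimal sketch, assuming a constant predictor that always outputs the mean of the training targets; the arrays `y_train_toy` and `y_test_toy` are hypothetical and only illustrate the calculation.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical targets, not the project's data.
y_train_toy = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0])
y_test_toy = np.array([0.0, 1.0, 0.0, 0.0])

# Constant baseline: always predict the mean of the training targets.
baseline_pred = np.full_like(y_test_toy, y_train_toy.mean())
baseline_rmse = rmse(y_test_toy, baseline_pred)
print("constant-baseline RMSE:", round(baseline_rmse, 4))
```

Computing the same baseline on `y_train` and `y_test` would show how much of the model's RMSE is explained by the class balance alone.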

In [101]:
X_train_neuron = torch.FloatTensor(np.array(X_train))
y_train_neuron = torch.FloatTensor(np.array(y_train))
X_test_neuron = torch.FloatTensor(np.array(X_test))
y_test_neuron = torch.FloatTensor(np.array(y_test))

In [362]:
X_train.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,759,760,761,762,763,764,765,766,767,embedding
0,0.130422,-0.03717,-0.338869,-0.016837,0.108761,-0.292961,0.04354,0.561109,-0.314614,-0.0697,...,0.110325,-0.067451,0.171396,0.334391,0.243756,-0.053264,-0.275054,0.692342,0.353757,0.029657
2,0.208375,0.198959,0.058023,-0.024403,0.156402,-0.376268,0.228204,0.597776,-0.190187,-0.52586,...,-0.298644,-0.18857,-0.133327,0.049694,0.756743,0.089408,-0.387838,0.09802,0.058181,1.662843
3,0.043938,0.581405,0.015249,-0.03084,0.260595,-0.54307,0.019188,0.479385,-0.02363,-0.78208,...,0.175533,-0.296617,-0.207228,0.157452,0.30337,0.228893,0.082603,0.202067,0.58891,1.662843
4,-0.077238,-0.019634,-0.565421,0.345388,0.299098,-0.613959,0.047469,0.901103,-0.513558,-0.644246,...,0.127619,-0.365767,0.3036,0.263684,0.246732,0.125311,-0.23942,0.333488,-0.169922,1.784997
5,-0.225,0.120534,-0.425053,0.268838,-0.167923,-0.093963,-0.037915,0.820136,-0.38105,-0.529588,...,-0.457995,-0.405307,-0.363889,0.33032,0.251376,-0.274512,-0.145131,0.316766,0.018943,1.784997


In [102]:
X_train_neuron.shape

torch.Size([950, 769])

In [103]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_sizes):
        super(Net, self).__init__()
        
        self.fc1 = nn.Linear(input_size, hidden_sizes[0])
        self.act1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_sizes[0], hidden_sizes[1])
        self.act2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_sizes[1], hidden_sizes[2])
        self.act3 = nn.ReLU()
        self.fc4 = nn.Linear(hidden_sizes[2], hidden_sizes[3])
        self.act4 = nn.ReLU()
        self.fc5 = nn.Linear(hidden_sizes[3], 1)
        self.act5 = nn.Sigmoid()
        
        nn.init.kaiming_uniform_(self.fc1.weight, mode='fan_in', nonlinearity='relu')
        nn.init.kaiming_uniform_(self.fc2.weight, mode='fan_in', nonlinearity='relu')
        nn.init.kaiming_uniform_(self.fc3.weight, mode='fan_in', nonlinearity='relu')
        nn.init.kaiming_uniform_(self.fc4.weight, mode='fan_in', nonlinearity='relu')
        nn.init.kaiming_uniform_(self.fc5.weight, mode='fan_in', nonlinearity='sigmoid')
        
    def forward(self, x):
            x = self.fc1(x)
            x = self.act1(x)
            x = self.fc2(x)
            x = self.act2(x)
            x = self.fc3(x)
            x = self.act3(x)
            x = self.fc4(x)
            x = self.act4(x)
            x = self.fc5(x)
            x = self.act5(x)
        
            return x

In [104]:
def train_model(trial):
    hidden_sizes =[
        trial.suggest_int('n_units_l1', 100, 1000),
        trial.suggest_int('n_units_l2', 100, 1000),
        trial.suggest_int('n_units_l3', 10, 500),
        trial.suggest_int('n_units_l4', 1, 100)
    ]
    
    learning_rate = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical('batch_size', [64, 128, 256])
    optimizer_name = trial.suggest_categorical('optimizer', ['SGD', 'Adam'])
    momentum = trial.suggest_float('momentum', 0.0, 0.9) if optimizer_name == 'SGD' else None

    input_size = X_train_neuron.shape[1]
    net = Net(input_size, hidden_sizes)

    if optimizer_name == 'SGD':
        optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=momentum)
    else:
        optimizer = optim.Adam(net.parameters(), lr=learning_rate)

    criterion = nn.MSELoss()

    num_epochs = 900
    num_batches = ceil(len(X_train_neuron) / batch_size)

    for epoch in range(num_epochs):
        net.train()
        order = np.random.permutation(len(X_train_neuron))
        
        for batch_idx in range(num_batches):
            start_index = batch_idx * batch_size
            optimizer.zero_grad()

            batch_indexes = order[start_index : start_index + batch_size]
            X_batch = X_train_neuron[batch_indexes]
            y_batch = y_train_neuron[batch_indexes]

            preds = net(X_batch).flatten()
            loss_value = criterion(preds, y_batch)

            loss_value.backward()
            optimizer.step()

        if epoch % 10 == 0 or epoch == num_epochs - 1:
            with torch.no_grad():
                net.eval()
                test_preds = net(X_test_neuron).flatten()
                rmse = torch.sqrt(criterion(test_preds, y_test_neuron))
                trial.report(rmse.item(), epoch)
                if trial.should_prune():
                    raise optuna.exceptions.TrialPruned()

    return rmse.item()

In [105]:
study = optuna.create_study(direction='minimize')
study.optimize(train_model, n_trials=100, timeout=6000)

[I 2024-07-11 11:46:46,172] A new study created in memory with name: no-name-291a2984-b090-4037-a667-915e4cc17958
[I 2024-07-11 11:48:15,372] Trial 0 finished with value: 0.4639803469181061 and parameters: {'n_units_l1': 315, 'n_units_l2': 549, 'n_units_l3': 333, 'n_units_l4': 79, 'lr': 0.035036559398821644, 'batch_size': 256, 'optimizer': 'Adam'}. Best is trial 0 with value: 0.4639803469181061.
[I 2024-07-11 11:51:04,737] Trial 1 finished with value: 0.5019779205322266 and parameters: {'n_units_l1': 162, 'n_units_l2': 852, 'n_units_l3': 248, 'n_units_l4': 63, 'lr': 0.0007879138004612055, 'batch_size': 64, 'optimizer': 'Adam'}. Best is trial 0 with value: 0.4639803469181061.
[I 2024-07-11 11:52:00,037] Trial 2 finished with value: 0.4431387186050415 and parameters: {'n_units_l1': 930, 'n_units_l2': 488, 'n_units_l3': 290, 'n_units_l4': 33, 'lr': 0.01236147854022546, 'batch_size': 256, 'optimizer': 'SGD', 'momentum': 0.33018880730797473}. Best is trial 2 with value: 0.4431387186050415.


Training the neural networks and searching for the best hyperparameters with Optuna
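
In `train_model` each trial is scored and pruned on `X_test_neuron`, so the best RMSE is selected against the test split and may look better than it would on unseen data. One option is to carve a validation part out of the data, again grouped by image, and tune on that. A minimal sketch in plain NumPy; `split_by_group` and the toy image names are hypothetical, not part of the notebook.

```python
import numpy as np

def split_by_group(groups, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return train/val/test row indices so that each group lands in exactly one part."""
    groups = np.asarray(groups)
    unique = np.unique(groups)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)  # shuffle the group labels, not the rows
    n_train = int(round(fractions[0] * len(unique)))
    n_val = int(round(fractions[1] * len(unique)))
    parts = (
        unique[:n_train],
        unique[n_train:n_train + n_val],
        unique[n_train + n_val:],
    )
    # Map each set of group labels back to row positions.
    return [np.flatnonzero(np.isin(groups, part)) for part in parts]

# Hypothetical image names: 10 images with 3 queries each.
toy_groups = np.repeat([f"img_{i}.jpg" for i in range(10)], 3)
train_idx, val_idx, test_idx = split_by_group(toy_groups)
print(len(train_idx), len(val_idx), len(test_idx))
```

With such a split, the objective would report and prune on the validation rows, and the test rows would be touched only once, after the study has finished.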

In [106]:
print('Best hyperparameters: ', study.best_params)
print('Best RMSE: ', study.best_value)

Best hyperparameters:  {'n_units_l1': 930, 'n_units_l2': 488, 'n_units_l3': 290, 'n_units_l4': 33, 'lr': 0.01236147854022546, 'batch_size': 256, 'optimizer': 'SGD', 'momentum': 0.33018880730797473}
Best RMSE:  0.4431387186050415


In [107]:
num_epochs = 900

best_params = study.best_params
hidden_sizes = [
    best_params['n_units_l1'],
    best_params['n_units_l2'],
    best_params['n_units_l3'],
    best_params['n_units_l4']
]
learning_rate = best_params['lr']
batch_size = best_params['batch_size']
optimizer_name = best_params['optimizer']
momentum = best_params.get('momentum', 0.0)

# batch_size has to be read from best_params before it is used here
num_batches = ceil(len(X_train_neuron) / batch_size)

net = Net(X_train_neuron.shape[1], hidden_sizes)

if optimizer_name == 'SGD':
    optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=momentum)
else:
    optimizer = optim.Adam(net.parameters(), lr=learning_rate)

criterion = nn.MSELoss()

for epoch in range(num_epochs):
    net.train()
    order = np.random.permutation(len(X_train))
    
    for batch_idx in range(num_batches):
        start_index = batch_idx * batch_size
        optimizer.zero_grad()

        batch_indexes = order[start_index : start_index + batch_size]
        X_batch = X_train.iloc[batch_indexes].values
        y_batch = y_train.iloc[batch_indexes].values

        preds = net(torch.tensor(X_batch, dtype=torch.float32)).flatten()
        loss_value = criterion(preds, torch.tensor(y_batch, dtype=torch.float32))

        loss_value.backward()
        optimizer.step()

    if epoch % 10 == 0 or epoch == num_epochs - 1:
        with torch.no_grad():
            net.eval()
            test_preds = net(torch.tensor(X_test.values, dtype=torch.float32)).flatten()
            rmse = torch.sqrt(criterion(test_preds, torch.tensor(y_test.values, dtype=torch.float32)))
            print(f'RMSE at epoch {epoch} =', rmse.item())

RMSE at epoch 0 = 0.4178735613822937
RMSE at epoch 10 = 0.4142720699310303
RMSE at epoch 20 = 0.4130581021308899
RMSE at epoch 30 = 0.4121643900871277
RMSE at epoch 40 = 0.4117955267429352
RMSE at epoch 50 = 0.41139763593673706
RMSE at epoch 60 = 0.4111558496952057
RMSE at epoch 70 = 0.4109346866607666
RMSE at epoch 80 = 0.4108116626739502
RMSE at epoch 90 = 0.41079193353652954
RMSE at epoch 100 = 0.4118150472640991
RMSE at epoch 110 = 0.4109261631965637
RMSE at epoch 120 = 0.41094255447387695
RMSE at epoch 130 = 0.41163089871406555
RMSE at epoch 140 = 0.4113667607307434
RMSE at epoch 150 = 0.41176873445510864
RMSE at epoch 160 = 0.41186630725860596
RMSE at epoch 170 = 0.4118007719516754
RMSE at epoch 180 = 0.41183358430862427
RMSE at epoch 190 = 0.41255059838294983
...

Training the best model

In [354]:
net

Net(
  (fc1): Linear(in_features=769, out_features=930, bias=True)
  (act1): ReLU()
  (fc2): Linear(in_features=930, out_features=488, bias=True)
  (act2): ReLU()
  (fc3): Linear(in_features=488, out_features=290, bias=True)
  (act3): ReLU()
  (fc4): Linear(in_features=290, out_features=33, bias=True)
  (act4): ReLU()
  (fc5): Linear(in_features=33, out_features=1, bias=True)
  (act5): Sigmoid()
)

## 7. Testing the model

It is time to test the model. To do this, obtain embeddings for all test images from the `test_images` folder, pick 10 random queries from `test_queries.csv`, and display the most relevant image for each query. Compare the search quality visually.

In [109]:
embeddings_img_test_df.head(5)

Unnamed: 0,image,embedding
0,1177994172_10d143cb8d.jpg,0.0
1,1232148178_4f45cc3284.jpg,0.0
2,123997871_6a9ca987b1.jpg,0.0
3,1319634306_816f21677f.jpg,0.508549
4,1429546659_44cb09cbe2.jpg,0.740343


In [213]:
embeddings_img_test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   image      100 non-null    object 
 1   embedding  100 non-null    float64
dtypes: float64(1), object(1)
memory usage: 1.7+ KB


In [450]:
test_queries_filtered_data.head(5)

Unnamed: 0.1,Unnamed: 0,query_id,query_text,image
11,11,123997871_6a9ca987b1.jpg#1,Several female lacrosse players are going afte...,123997871_6a9ca987b1.jpg
13,13,123997871_6a9ca987b1.jpg#3,The woman lacrosse player in blue is about to ...,123997871_6a9ca987b1.jpg
14,14,123997871_6a9ca987b1.jpg#4,Women play lacrosse .,123997871_6a9ca987b1.jpg
15,15,1319634306_816f21677f.jpg#0,A brown dog is sitting in some long grass .,1319634306_816f21677f.jpg
16,16,1319634306_816f21677f.jpg#1,A brown dog sits still on a hillside .,1319634306_816f21677f.jpg


In [452]:
embeddings_img_test_df.loc[embeddings_img_test_df['image'] == '1319634306_816f21677f.jpg']

Unnamed: 0,image,embedding
3,1319634306_816f21677f.jpg,0.508549


In [454]:
test_queries_filtered_data.loc[test_queries_filtered_data['image'] == '1319634306_816f21677f.jpg']

Unnamed: 0.1,Unnamed: 0,query_id,query_text,image
15,15,1319634306_816f21677f.jpg#0,A brown dog is sitting in some long grass .,1319634306_816f21677f.jpg
16,16,1319634306_816f21677f.jpg#1,A brown dog sits still on a hillside .,1319634306_816f21677f.jpg
17,17,1319634306_816f21677f.jpg#2,A large tan dog sits on a grassy hill .,1319634306_816f21677f.jpg
18,18,1319634306_816f21677f.jpg#3,A large yellow dog is sitting on a hill .,1319634306_816f21677f.jpg
19,19,1319634306_816f21677f.jpg#4,The dog is sitting on the side of the hill .,1319634306_816f21677f.jpg


In [460]:
test_data = test_queries_filtered_data.copy()

In [468]:
test_data = test_queries_filtered_data.merge(embeddings_img_test_df, on='image', how='inner')

In [470]:
test_data.head(5)

Unnamed: 0.1,Unnamed: 0,query_id,query_text,image,embedding
0,11,123997871_6a9ca987b1.jpg#1,Several female lacrosse players are going afte...,123997871_6a9ca987b1.jpg,0.0
1,13,123997871_6a9ca987b1.jpg#3,The woman lacrosse player in blue is about to ...,123997871_6a9ca987b1.jpg,0.0
2,14,123997871_6a9ca987b1.jpg#4,Women play lacrosse .,123997871_6a9ca987b1.jpg,0.0
3,15,1319634306_816f21677f.jpg#0,A brown dog is sitting in some long grass .,1319634306_816f21677f.jpg,0.508549
4,16,1319634306_816f21677f.jpg#1,A brown dog sits still on a hillside .,1319634306_816f21677f.jpg,0.508549


In [472]:
test_data.shape[0]

340

In [474]:
def ten_texts():
    texts = []
    for i in range(0,10):
        # assuming rd is the standard random module: randint includes the upper bound, so subtract 1
        texts.append(test_queries_filtered_data.iloc[rd.randint(0, test_queries_filtered_data.shape[0] - 1)]['query_text'])
    return texts

In [476]:
texts = ten_texts()

In [478]:
texts

['A dog jumps to catch a ball in the surf .',
 'Two women , one with a head bandanna , are standing next to each other while one holds a bottle .',
 'a man leans against a large robot .',
 'A woman sitting at a sewing machine looks up .',
 'Woman in white bikini top and blue shorts with body of water in the background .',
 'The lady in the multi-colored shirt has a necklace on a white object in her hand .',
 'A brown dog and a black and white dog stand beside a hole in the dirt .',
 'Camera man in populated building taping an event .',
 'An elderly woman sits on a tree stump with a white dog .',
 'A man in a flowered bathing suit waterskies on one ski while being pulled by a rope .']

In [480]:
tokenized_texts_new = [tokenizer.encode(text, add_special_tokens=True) for text in texts]
max_length_new = max(len(text) for text in tokenized_texts_new)
padded_new = [vector + [0]*(max_length_new - len(vector)) for vector in tokenized_texts_new]
attention_mask_new = np.where(np.array(padded_new) != 0, 1, 0)

In [482]:
len(texts)

10

In [484]:
model.eval()
embeddings_new = []

In [486]:
for i in range(len(texts)):
    input_ids_new = torch.LongTensor([padded_new[i]])
    attention_mask_new_tensor = torch.LongTensor([attention_mask_new[i]])
    
    with torch.no_grad():
        outputs_new = model(input_ids=input_ids_new, attention_mask=attention_mask_new_tensor)
    
    embedding_new = outputs_new.last_hidden_state[:, 0, :]
    embeddings_new.append(embedding_new.numpy())

In [493]:
embeddings_new = np.vstack(embeddings_new)

In [495]:
embeddings_df_new = pd.DataFrame(embeddings_new)

In [497]:
embeddings_df_new.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,758,759,760,761,762,763,764,765,766,767
0,-0.293052,-0.102268,-0.75638,0.111458,0.24485,-0.099602,0.073888,1.059385,-0.64436,-0.197928,...,0.170011,0.169519,0.097043,0.224412,0.453495,0.686391,0.605305,-0.165152,0.259473,0.327376
1,0.127684,0.54172,-0.42616,-0.22086,-0.022864,-0.35482,0.237999,0.332168,-0.272202,-0.394,...,-0.527836,-0.073376,-0.032632,-0.071418,-0.136802,0.496716,-0.01961,0.061686,0.497307,0.327462
2,0.089423,0.278459,-0.444158,-0.119649,0.107583,-0.262254,-0.163501,0.648039,-0.328425,-0.56615,...,-0.270418,0.3953,-0.244473,0.19668,0.18715,0.770325,0.214962,-0.301492,0.198887,0.187087
3,0.513452,0.185382,-0.306839,-0.205825,0.395236,-0.081755,0.132239,0.446152,-0.561193,-0.608231,...,-0.437483,0.311197,-0.34565,-0.03364,-0.089878,0.370564,-0.089534,-0.159521,0.001081,0.734053
4,-0.344435,-0.520848,-0.578598,0.089004,-0.109726,-0.094075,0.08172,0.951913,-0.670697,-0.292546,...,0.206207,-0.24975,-0.256411,-0.170797,-0.07748,0.352324,-0.005063,-0.485721,0.44175,-0.143674


In [499]:
test_embending = embeddings_df_new.join(test_data['embedding'], how='inner')

In [531]:
test_embending.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,759,760,761,762,763,764,765,766,767,embedding
0,-0.293052,-0.102268,-0.75638,0.111458,0.24485,-0.099602,0.073888,1.059385,-0.64436,-0.197928,...,0.169519,0.097043,0.224412,0.453495,0.686391,0.605305,-0.165152,0.259473,0.327376,0.0
1,0.127684,0.54172,-0.42616,-0.22086,-0.022864,-0.35482,0.237999,0.332168,-0.272202,-0.394,...,-0.073376,-0.032632,-0.071418,-0.136802,0.496716,-0.01961,0.061686,0.497307,0.327462,0.0
2,0.089423,0.278459,-0.444158,-0.119649,0.107583,-0.262254,-0.163501,0.648039,-0.328425,-0.56615,...,0.3953,-0.244473,0.19668,0.18715,0.770325,0.214962,-0.301492,0.198887,0.187087,0.0
3,0.513452,0.185382,-0.306839,-0.205825,0.395236,-0.081755,0.132239,0.446152,-0.561193,-0.608231,...,0.311197,-0.34565,-0.03364,-0.089878,0.370564,-0.089534,-0.159521,0.001081,0.734053,0.508549
4,-0.344435,-0.520848,-0.578598,0.089004,-0.109726,-0.094075,0.08172,0.951913,-0.670697,-0.292546,...,-0.24975,-0.256411,-0.170797,-0.07748,0.352324,-0.005063,-0.485721,0.44175,-0.143674,0.508549


In [527]:
test_embending.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, 0 to 9
Columns: 769 entries, 0 to embedding
dtypes: float32(768), float64(1)
memory usage: 30.4 KB


In [529]:
net

Net(
  (fc1): Linear(in_features=769, out_features=930, bias=True)
  (act1): ReLU()
  (fc2): Linear(in_features=930, out_features=488, bias=True)
  (act2): ReLU()
  (fc3): Linear(in_features=488, out_features=290, bias=True)
  (act3): ReLU()
  (fc4): Linear(in_features=290, out_features=33, bias=True)
  (act4): ReLU()
  (fc5): Linear(in_features=33, out_features=1, bias=True)
  (act5): Sigmoid()
)

In [533]:
test_embending.shape

(10, 769)

In [543]:
for item in texts:
    print(item)
    res = net.forward(torch.FloatTensor(test_embending)).flatten().detach().numpy()
    print(res.argmax())

A dog jumps to catch a ball in the surf .


ValueError: expected sequence of length 10 at dim 0 (got 769)
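
The `ValueError` most likely comes from passing a pandas `DataFrame` straight to `torch.FloatTensor`; converting it to an array first (for example `test_embending.values`) should avoid it. Two further points stand out in the cells above: the loop evaluates the same tensor for every `item`, and the `join` pairs query `i` with the image feature from row `i` of `test_data`, which is unrelated to that query. For retrieval, each query needs to be scored against every test image, and the best-scoring image returned. A minimal sketch of that step in NumPy; `rank_images`, `score_fn` and the toy arrays are hypothetical, and `score_fn` stands in for the trained `net`.

```python
import numpy as np

def rank_images(query_emb, image_feats, image_names, score_fn, top_k=1):
    """Score one query against every image and return the top_k image names and scores."""
    n_images = image_feats.shape[0]
    # Repeat the query embedding once per image and append the image features.
    pairs = np.hstack([np.tile(query_emb, (n_images, 1)), image_feats])
    scores = np.asarray(score_fn(pairs)).ravel()
    best = np.argsort(scores)[::-1][:top_k]  # indices of the highest scores first
    return [image_names[i] for i in best], scores[best]

# Toy data (hypothetical): a 4-dim query embedding and one feature per image.
query = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
feats = np.array([[0.0], [0.9], [0.5]], dtype=np.float32)
names = ["a.jpg", "b.jpg", "c.jpg"]

# Stand-in scorer that only looks at the last column; the real model would be called here.
def toy_score(pairs):
    return pairs[:, -1]

top_names, top_scores = rank_images(query, feats, names, toy_score)
print(top_names)
```

With the notebook's objects, `score_fn` could be a small wrapper such as `lambda p: net(torch.tensor(p, dtype=torch.float32)).detach().numpy()`, called once per row of `embeddings_new` with `embeddings_img_test_df['embedding']` reshaped to a column; this is an assumption about how the pieces fit together, not a tested solution.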

<div class="alert" style="background-color:#bfbf7f;color:#000000">
    <font size="3"><b>Student comment</b></font> Hi. I'm stuck and don't know what to do next.

</div>

## 8. Conclusions

- Jupyter Notebook is open
- All code runs without errors
- Code cells are arranged in execution order
- Exploratory data analysis is done
- Expert ratings and crowdsourced ratings are checked
- Objects that fall outside the legal restrictions are excluded from the dataset
- Images are vectorized
- Text queries are vectorized
- Data is correctly split into training and test sets
- A quality metric for the model is proposed
- A model for the similarity of images and text queries is proposed
- The model is trained
- Conclusions are drawn from the model training
- The model is tested
- Search quality is compared visually based on the test results