# Building a collaborative-filtering model with Torch CUDA

In this part of the task we prepare the data for a recommendation model and train it. The model is a fully connected neural network built with the `torch` library and run on the GPU with `CUDA`.

**Task**

The Tinkoff group of companies has an edTech team that builds a platform for training courses.
The edTech team wants to know which courses have the strongest effect on the work performance of call-center employees.
Help prepare recommendations: which training courses employees should take, and which courses should be removed from the edTech platform.
The solution can be a recommendation model for each employee, or it can rely on business rules and statistical analysis (for example, identifying useful courses for each department).

# Table descriptions

**employees**

Information about call-center employees
Fields:
- employee_id – employee identifier
- sex – sex
- region – federal district identifier
- age – age
- head_employee_id – manager's identifier
- exp_days – work experience in days
- edu_degree – education level
- department_id – identifier of the department the employee works in
- work_online_flg – remote work flag

**communications**

Information about employees' work performance, based on the work communications of call-center operators
Fields:
- communication_id – communication identifier
- communication_dt – communication date
- employee_id – employee identifier
- communication_score – communication quality score
- util_flg – flag indicating that the client used a bank product within 2 weeks

**courses_passing**

Statistics on employees' progress through training courses
- course_id – course identifier
- employee_id – employee identifier
- pass_frac – fraction of the course completed
- start_dt – date the employee started the course
- last_activity_dt – employee's last activity in the course
- end_dt – completion date; NaN if the course was not fully completed
- educ_duration_days – time taken to complete the full course, in days; NaN if the course was not fully completed

**courses_info**

Course information
- course_id – course identifier
- course_nm – course name

**course_employee_sms**

Pivot table of notifications inviting employees to take training. The notifications were sent at random
Fields:
- employee_id – employee identifier
- course_i – flag indicating that a notification about course i was sent

In [1]:
# import the required libraries
import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler, LabelEncoder

from sklearn.model_selection import train_test_split
from sklearn.utils import resample

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

import warnings

In [2]:
# Display all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [3]:
warnings.filterwarnings('ignore')

In [4]:
# Display floats in plain (non-scientific) notation with 6 decimal places
pd.set_option('display.float_format', lambda x: '%.6f' % x)

## Merging the tables into one dataset

We decided to keep the code that builds the final dataset instead of loading a saved copy, because loading and saving it takes a considerable amount of time.
All the steps that produce `full_data.csv` are kept below.

Load all the datasets

In [5]:
# Load the datasets
communications = pd.read_csv('../data/src/communications.csv', sep=';', dtype={'employee_id': 'category'})
courses_passing = pd.read_csv('../data/src/courses_passing.csv', sep=';', dtype={'employee_id': 'category'})
employees = pd.read_csv('../data/src/employees.csv', sep=';', dtype={'employee_id': 'category', 'head_employee_id': 'category', 'sex': 'category'})
courses_info = pd.read_csv('../data/src/courses_info.csv', sep=';')

Convert the data types and sort the data before merging the `communications` and `courses_passing` tables

In [6]:
# Convert dates to datetime
communications['communication_dt'] = pd.to_datetime(communications['communication_dt'])
courses_passing['end_dt'] = pd.to_datetime(courses_passing['end_dt'])

# Cast employee_id to string in both tables
communications['employee_id'] = communications['employee_id'].astype(str)
courses_passing['employee_id'] = courses_passing['employee_id'].astype(str)

# Sort by the merge keys (merge_asof requires both sides to be sorted); keep only completed courses
communications_sorted = communications.sort_values(by='communication_dt')
courses_passing_sorted = courses_passing[courses_passing['end_dt'].notna()].sort_values(by='end_dt')

Merge the resulting tables

In [7]:
# merge_asof: attach to each communication the most recent course the employee completed by that date
merged_data = pd.merge_asof(
    communications_sorted,
    courses_passing_sorted,
    by='employee_id',
    left_on='communication_dt',
    right_on='end_dt',
    direction='backward'  # take the closest end_dt that is not later than communication_dt
)
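A toy illustration of what `merge_asof` does here, on hypothetical employees `a` and `b`: with `by='employee_id'` and `direction='backward'`, each communication receives at most one course, namely the one with the latest `end_dt` not later than `communication_dt`, and a communication with no earlier completion gets `NaN`.

```python
import pandas as pd

# Hypothetical toy data: communications of employees "a" and "b", sorted by date
comms = pd.DataFrame({
    "communication_dt": pd.to_datetime(["2023-01-05", "2023-01-07", "2023-01-10", "2023-01-20"]),
    "employee_id": ["a", "b", "a", "a"],
})
# Completed courses, sorted by completion date as merge_asof requires
courses = pd.DataFrame({
    "end_dt": pd.to_datetime(["2023-01-03", "2023-01-08", "2023-01-09"]),
    "employee_id": ["a", "a", "b"],
    "course_id": [1, 2, 3],
})

merged = pd.merge_asof(
    comms, courses,
    by="employee_id",
    left_on="communication_dt", right_on="end_dt",
    direction="backward",  # latest end_dt <= communication_dt for the same employee
)
print(merged[["employee_id", "communication_dt", "course_id"]])
```

Course 1 is attached only to the first communication of `a`; later rows of `a` carry course 2, and employee `b` gets `NaN` because course 3 ends after that communication.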

In [8]:
del communications_sorted
del courses_passing_sorted

For each course, create a separate column that shows whether the employee had completed that course as of the communication date

In [9]:
# Create a flag column for each course
for i in range(92):  # assumes course IDs run from 0 to 91
    merged_data[f'course_{i}'] = np.where(merged_data['course_id'] == i, 1, np.nan)

In [10]:
# Forward-fill each course flag within each employee, so a completed course stays flagged afterwards
for i in range(92):
    merged_data[f'course_{i}'] = merged_data.groupby('employee_id')[f'course_{i}'].ffill()
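Because `merge_asof` attaches only the latest completed course to each communication, a course gets flagged only if it was the most recent completion at some communication date. If an employee finishes two courses between two consecutive communications, the earlier one never appears in `course_id`, and forward-filling cannot recover it. A minimal alternative sketch, on hypothetical toy data, compares each communication date directly with the earliest completion date of every course:

```python
import pandas as pd

# Hypothetical toy data: employee "a" finishes courses 0 and 1 between two communications
comms = pd.DataFrame({
    "employee_id": ["a", "a"],
    "communication_dt": pd.to_datetime(["2023-01-01", "2023-01-10"]),
})
courses = pd.DataFrame({
    "employee_id": ["a", "a"],
    "course_id": [0, 1],
    "end_dt": pd.to_datetime(["2023-01-03", "2023-01-05"]),
})
num_courses = 3  # the notebook uses 92

# The merge_asof approach sees only the latest completion (course 1) on 2023-01-10
asof = pd.merge_asof(comms, courses.sort_values("end_dt"), by="employee_id",
                     left_on="communication_dt", right_on="end_dt", direction="backward")

# Alternative: earliest completion date of every (employee, course) pair, one column per course
first_end = courses.groupby(["employee_id", "course_id"])["end_dt"].min().unstack("course_id")
dates = comms[["employee_id"]].join(first_end, on="employee_id")

flags = comms.copy()
for k in range(num_courses):
    if k in dates.columns:
        # 1 if the course was completed on or before the communication date (NaT compares as False)
        flags[f"course_{k}"] = (dates[k] <= comms["communication_dt"]).astype(int)
    else:
        flags[f"course_{k}"] = 0  # no employee completed this course
print(flags)
```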

Drop the columns we no longer need

In [11]:
# Drop temporary and unneeded columns
final_data = merged_data.drop(columns=['course_id', 'pass_frac', 'start_dt', 'end_dt', 'last_activity_dt', 'educ_duration_days'])

In [12]:
# Sort by index; merge_asof already returns rows in communication_dt order with a fresh 0..n-1 index, so the order stays chronological
final_data = final_data.sort_index()

In [None]:
# Look at the first rows
final_data.head()

Unnamed: 0,communication_id,communication_dt,employee_id,communication_score,util_flg,course_0,course_1,course_2,course_3,course_4,course_5,course_6,course_7,course_8,course_9,course_10,course_11,course_12,course_13,course_14,course_15,course_16,course_17,course_18,course_19,course_20,course_21,course_22,course_23,course_24,course_25,course_26,course_27,course_28,course_29,course_30,course_31,course_32,course_33,course_34,course_35,course_36,course_37,course_38,course_39,course_40,course_41,course_42,course_43,course_44,course_45,course_46,course_47,course_48,course_49,course_50,course_51,course_52,course_53,course_54,course_55,course_56,course_57,course_58,course_59,course_60,course_61,course_62,course_63,course_64,course_65,course_66,course_67,course_68,course_69,course_70,course_71,course_72,course_73,course_74,course_75,course_76,course_77,course_78,course_79,course_80,course_81,course_82,course_83,course_84,course_85,course_86,course_87,course_88,course_89,course_90,course_91
0,265773861079506507,2023-01-01,cf2226dd-d41b-1a2d-0ae5-1dab54d32c36,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,278568857626326381,2023-01-01,7f5d04d1-89df-b634-e6a8-5bb9d9adf21e,68,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,466811215985540640,2023-01-01,04ecb1fa-2850-6ccb-6f72-b12c0245ddbc,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,187483347234781892,2023-01-01,af3303f8-52ab-eccd-7930-68486a391626,100,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,47065300189886434,2023-01-01,16026d60-ff9b-5441-0b34-35b403afd226,0,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


The per-course columns have been created. There are missing values, which we will handle a bit later

In [13]:
del merged_data

In [14]:
# Inspect the DataFrame info
final_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5345246 entries, 0 to 5345245
Data columns (total 97 columns):
 #   Column               Dtype         
---  ------               -----         
 0   communication_id     int64         
 1   communication_dt     datetime64[ns]
 2   employee_id          object        
 3   communication_score  int64         
 4   util_flg             int64         
 5   course_0             float64       
 6   course_1             float64       
 7   course_2             float64       
 8   course_3             float64       
 9   course_4             float64       
 10  course_5             float64       
 11  course_6             float64       
 12  course_7             float64       
 13  course_8             float64       
 14  course_9             float64       
 15  course_10            float64       
 16  course_11            float64       
 17  course_12            float64       
 18  course_13            float64       
 19  course_14            

Start by merging the `employees` and `final_data` tables, using `employee_id` as the key

In [15]:
# Inner join: keep only communications of employees present in both tables
full_data = pd.merge(employees, final_data, on='employee_id', how='inner')

## Processing the merged dataset

Handle the missing values in `full_data`

In [16]:
# Fill NaN with 0 in all course columns (0 = course not completed yet)
for i in range(92):  # course IDs run from 0 to 91
    column_name = f'course_{i}'
    full_data[column_name] = full_data[column_name].fillna(0)

Next, create an additional feature `communication_score_change`: the difference between the mean `communication_score` over the 30 days after a communication and the mean over the 30 days before it, computed per employee with rolling windows.

In [17]:
# Use the communication date as a time index (required for '30D' rolling windows)
full_data.set_index('communication_dt', inplace=True)

In [18]:
# Sort by employee, then by communication date
full_data.sort_values(by=['employee_id', 'communication_dt'], inplace=True)

In [19]:
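# 30-day rolling means of communication_score for each employee
# "before": window [t - 30D, t) without the current row, then moved down one row by shift(1);
# shift() runs on the whole grouped result, so it can cross employee boundaries
full_data['communication_score_before'] = full_data.groupby('employee_id')['communication_score']\
    .rolling(window='30D', closed='left').mean().shift(1).reset_index(level=0, drop=True)

# "after": trailing window (t - 30D, t] read from the next row via shift(-1), so it covers
# the current and earlier communications and reaches forward only to the next communication
full_data['communication_score_after'] = full_data.groupby('employee_id')['communication_score']\
    .rolling(window='30D', closed='right').mean().shift(-1).reset_index(level=0, drop=True)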
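The cell above does not produce a forward-looking window: `rolling` is always trailing, so `shift(-1)` only moves a trailing mean by one row, and both `shift` calls operate on the whole grouped result rather than within each employee. A minimal sketch of the intended feature, assuming "before" means communications in `[t - 30 days, t)` and "after" means `(t, t + 30 days]`, with same-timestamp communications in neither window, uses cumulative sums and `np.searchsorted` per employee on hypothetical toy data:

```python
import numpy as np
import pandas as pd

WINDOW = np.timedelta64(30, "D")

def window_means(times: np.ndarray, scores: np.ndarray):
    """Per-row mean score over [t - 30D, t) and over (t, t + 30D]; `times` must be sorted ascending."""
    csum = np.concatenate([[0.0], np.cumsum(scores, dtype=float)])  # csum[j] = sum of scores[:j]
    b_lo = np.searchsorted(times, times - WINDOW, side="left")   # first row with time >= t - 30D
    b_hi = np.searchsorted(times, times, side="left")            # first row with time >= t
    a_lo = np.searchsorted(times, times, side="right")           # first row with time > t
    a_hi = np.searchsorted(times, times + WINDOW, side="right")  # first row with time > t + 30D
    with np.errstate(invalid="ignore", divide="ignore"):
        before = (csum[b_hi] - csum[b_lo]) / (b_hi - b_lo)  # NaN when the window is empty
        after = (csum[a_hi] - csum[a_lo]) / (a_hi - a_lo)
    return before, after

# Hypothetical toy data
df = pd.DataFrame({
    "employee_id": ["a", "a", "a", "a", "b"],
    "communication_dt": pd.to_datetime(["2023-01-01", "2023-01-10", "2023-01-20", "2023-03-01", "2023-01-05"]),
    "communication_score": [10, 20, 30, 40, 50],
})
df = df.sort_values(["employee_id", "communication_dt"]).reset_index(drop=True)

times_all = df["communication_dt"].to_numpy()
scores_all = df["communication_score"].to_numpy()
before = np.full(len(df), np.nan)
after = np.full(len(df), np.nan)
for idx in df.groupby("employee_id").indices.values():  # positional row indices of each employee
    before[idx], after[idx] = window_means(times_all[idx], scores_all[idx])

df["communication_score_before"] = before
df["communication_score_after"] = after
df["communication_score_change"] = df["communication_score_after"] - df["communication_score_before"]
print(df)
```

The sketch sorts and resets the index itself, so it could be applied to `full_data` with the same column names.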

In [20]:
# Reset the index to turn communication_dt back into a column
full_data.reset_index(inplace=True)

In [21]:
# Sort by communication date
full_data.sort_values(by=['communication_dt'], inplace=True)

In [22]:
# Compute the change and store it in a new column
full_data['communication_score_change'] = full_data['communication_score_after'] - full_data['communication_score_before']

Add extra time features derived from `communication_dt`

In [23]:
# Add year, month and day columns
full_data['year'] = full_data['communication_dt'].dt.year
full_data['month'] = full_data['communication_dt'].dt.month
full_data['day'] = full_data['communication_dt'].dt.day

In [24]:
full_data.head()

Unnamed: 0,communication_dt,employee_id,sex,region,age,head_employee_id,exp_days,edu_degree,department_id,work_online_flg,communication_id,communication_score,util_flg,course_0,course_1,course_2,course_3,course_4,course_5,course_6,course_7,course_8,course_9,course_10,course_11,course_12,course_13,course_14,course_15,course_16,course_17,course_18,course_19,course_20,course_21,course_22,course_23,course_24,course_25,course_26,course_27,course_28,course_29,course_30,course_31,course_32,course_33,course_34,course_35,course_36,course_37,course_38,course_39,course_40,course_41,course_42,course_43,course_44,course_45,course_46,course_47,course_48,course_49,course_50,course_51,course_52,course_53,course_54,course_55,course_56,course_57,course_58,course_59,course_60,course_61,course_62,course_63,course_64,course_65,course_66,course_67,course_68,course_69,course_70,course_71,course_72,course_73,course_74,course_75,course_76,course_77,course_78,course_79,course_80,course_81,course_82,course_83,course_84,course_85,course_86,course_87,course_88,course_89,course_90,course_91,communication_score_before,communication_score_after,communication_score_change,year,month,day
1296160,2023-01-01,3a077244-3a07-3914-1292-a5429b952fe6,F,4,47,d9d4f495-e875-a2e0-75a1-a4a6e1b9770f,354,2,1,0,757195518054963759,61,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46.23301,80.5,34.26699,2023,1,1
1103250,2023-01-01,31b3b31a-1c2f-8a37-0206-f111127c0dbd,F,3,41,d1f491a4-04d6-8548-8094-3e5c3cd9ca25,665,1,2,0,962669936512349950,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,28.0,,2023,1,1
1103251,2023-01-01,31b3b31a-1c2f-8a37-0206-f111127c0dbd,F,3,41,d1f491a4-04d6-8548-8094-3e5c3cd9ca25,665,1,2,0,30857629143646893,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,21.0,,2023,1,1
1103252,2023-01-01,31b3b31a-1c2f-8a37-0206-f111127c0dbd,F,3,41,d1f491a4-04d6-8548-8094-3e5c3cd9ca25,665,1,2,0,595769366001190172,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,16.8,,2023,1,1
4167623,2023-01-01,cb79f8fa-58b9-1d3a-f6c9-c991f63962d3,M,4,28,2723d092-b638-85e0-d7c2-60cc007e8b9d,1306,2,0,1,863787866877216311,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,26.5,,2023,1,1


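Some values of `communication_score_change` are missing: they appear where the "before" window has no data, as in the first rows above, dated 2023-01-01, the earliest date in the data. There are very few of them, so we drop them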

In [25]:
# Reset the index: rows get a fresh 0..n-1 index in date order, and the old index is kept as the `index` column
full_data.reset_index(inplace=True)

In [26]:
full_data.head()

Unnamed: 0,index,communication_dt,employee_id,sex,region,age,head_employee_id,exp_days,edu_degree,department_id,work_online_flg,communication_id,communication_score,util_flg,course_0,course_1,course_2,course_3,course_4,course_5,course_6,course_7,course_8,course_9,course_10,course_11,course_12,course_13,course_14,course_15,course_16,course_17,course_18,course_19,course_20,course_21,course_22,course_23,course_24,course_25,course_26,course_27,course_28,course_29,course_30,course_31,course_32,course_33,course_34,course_35,course_36,course_37,course_38,course_39,course_40,course_41,course_42,course_43,course_44,course_45,course_46,course_47,course_48,course_49,course_50,course_51,course_52,course_53,course_54,course_55,course_56,course_57,course_58,course_59,course_60,course_61,course_62,course_63,course_64,course_65,course_66,course_67,course_68,course_69,course_70,course_71,course_72,course_73,course_74,course_75,course_76,course_77,course_78,course_79,course_80,course_81,course_82,course_83,course_84,course_85,course_86,course_87,course_88,course_89,course_90,course_91,communication_score_before,communication_score_after,communication_score_change,year,month,day
0,1296160,2023-01-01,3a077244-3a07-3914-1292-a5429b952fe6,F,4,47,d9d4f495-e875-a2e0-75a1-a4a6e1b9770f,354,2,1,0,757195518054963759,61,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46.23301,80.5,34.26699,2023,1,1
1,1103250,2023-01-01,31b3b31a-1c2f-8a37-0206-f111127c0dbd,F,3,41,d1f491a4-04d6-8548-8094-3e5c3cd9ca25,665,1,2,0,962669936512349950,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,28.0,,2023,1,1
2,1103251,2023-01-01,31b3b31a-1c2f-8a37-0206-f111127c0dbd,F,3,41,d1f491a4-04d6-8548-8094-3e5c3cd9ca25,665,1,2,0,30857629143646893,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,21.0,,2023,1,1
3,1103252,2023-01-01,31b3b31a-1c2f-8a37-0206-f111127c0dbd,F,3,41,d1f491a4-04d6-8548-8094-3e5c3cd9ca25,665,1,2,0,595769366001190172,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,16.8,,2023,1,1
4,4167623,2023-01-01,cb79f8fa-58b9-1d3a-f6c9-c991f63962d3,M,4,28,2723d092-b638-85e0-d7c2-60cc007e8b9d,1306,2,0,1,863787866877216311,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,26.5,,2023,1,1


In [27]:
# Drop identifiers, raw dates and intermediate columns that are not used further
full_data_cleaned = full_data.drop(columns=['index', 'head_employee_id', 'communication_id', 'communication_dt',
                                    'communication_score_before', 'communication_score_after', 'communication_score', 'util_flg'])

In [28]:
full_data_cleaned.isna().sum()

employee_id                      0
sex                              0
region                           0
age                              0
exp_days                         0
edu_degree                       0
department_id                    0
work_online_flg                  0
course_0                         0
course_1                         0
course_2                         0
course_3                         0
course_4                         0
course_5                         0
course_6                         0
course_7                         0
course_8                         0
course_9                         0
course_10                        0
course_11                        0
course_12                        0
course_13                        0
course_14                        0
course_15                        0
course_16                        0
course_17                        0
course_18                        0
course_19                        0
course_20           

In [29]:
# Drop rows with a missing communication_score_change
full_data_cleaned = full_data_cleaned.dropna(subset=['communication_score_change'])

In [30]:
full_data_cleaned.isna().sum()

employee_id                   0
sex                           0
region                        0
age                           0
exp_days                      0
edu_degree                    0
department_id                 0
work_online_flg               0
course_0                      0
course_1                      0
course_2                      0
course_3                      0
course_4                      0
course_5                      0
course_6                      0
course_7                      0
course_8                      0
course_9                      0
course_10                     0
course_11                     0
course_12                     0
course_13                     0
course_14                     0
course_15                     0
course_16                     0
course_17                     0
course_18                     0
course_19                     0
course_20                     0
course_21                     0
course_22                     0
course_2

There are no missing values left.

Next, check the row order (it defines the chronological split into two periods), encode the `employee_id` and `sex` columns, and scale `age` and `exp_days`

In [31]:
# Check the index order (the index was reset after sorting by date, so an increasing index means chronological order)
if full_data_cleaned.index.is_monotonic_increasing:
    print("The time series is in ascending order.")
elif full_data_cleaned.index.is_monotonic_decreasing:
    print("The time series is in descending order.")
else:
    print("The time series is not sorted.")

The time series is in ascending order.


In [32]:
# Initialize the encoders and the scaler
le_employee = LabelEncoder()
le_sex = LabelEncoder()
scaler = StandardScaler()

In [33]:
# Check the column dtypes
print(full_data_cleaned.dtypes)

employee_id                     object
sex                           category
region                           int64
age                              int64
exp_days                         int64
edu_degree                       int64
department_id                    int64
work_online_flg                  int64
course_0                       float64
course_1                       float64
course_2                       float64
course_3                       float64
course_4                       float64
course_5                       float64
course_6                       float64
course_7                       float64
course_8                       float64
course_9                       float64
course_10                      float64
course_11                      float64
course_12                      float64
course_13                      float64
course_14                      float64
course_15                      float64
course_16                      float64
course_17                

Correct the dtypes: the course-completion columns become integers, and the region, education, department and remote-work columns become categorical

In [34]:
# List of columns with the "course_" prefix
course_columns = [col for col in full_data_cleaned.columns if col.startswith('course_')]

# Cast the course columns to int
full_data_cleaned[course_columns] = full_data_cleaned[course_columns].astype(int)

# Columns to convert to the category dtype
category_columns = ['region', 'edu_degree', 'department_id', 'work_online_flg']

# Convert them to category
full_data_cleaned[category_columns] = full_data_cleaned[category_columns].astype('category')
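`astype(int)` maps to the platform's default integer, which is most likely why the output below shows `int32` (the default on Windows with NumPy 1.x; Linux gives `int64`). The course columns only hold 0/1 flags, so `int8` would be enough. A quick check of the memory footprint on a hypothetical 1,000-row frame:

```python
import numpy as np
import pandas as pd

course_columns = [f"course_{i}" for i in range(92)]
# Hypothetical 1,000-row frame of 0/1 course flags
flags = pd.DataFrame(np.zeros((1000, 92)), columns=course_columns)

bytes_int32 = flags.astype("int32").memory_usage(index=False).sum()
bytes_int8 = flags.astype("int8").memory_usage(index=False).sum()
print(bytes_int32, bytes_int8)  # int8 needs a quarter of the memory of int32
```

With roughly 5.3 million communications (5,345,246 rows before the employee merge) and 92 course columns, that is on the order of 2 GB for `int32` versus about 0.5 GB for `int8`.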

In [35]:
# Check the column dtypes
print(full_data_cleaned.dtypes)

employee_id                     object
sex                           category
region                        category
age                              int64
exp_days                         int64
edu_degree                    category
department_id                 category
work_online_flg               category
course_0                         int32
course_1                         int32
course_2                         int32
course_3                         int32
course_4                         int32
course_5                         int32
course_6                         int32
course_7                         int32
course_8                         int32
course_9                         int32
course_10                        int32
course_11                        int32
course_12                        int32
course_13                        int32
course_14                        int32
course_15                        int32
course_16                        int32
course_17                

In [36]:
# Encode categorical variables
full_data_cleaned['employee_id'] = le_employee.fit_transform(full_data_cleaned['employee_id'].astype(str))
full_data_cleaned['sex'] = le_sex.fit_transform(full_data_cleaned['sex'].astype(str))

# Scale numeric variables (the scaler is fitted on all rows, before the train/test split)
full_data_cleaned['age'] = scaler.fit_transform(full_data_cleaned[['age']])
full_data_cleaned['exp_days'] = scaler.fit_transform(full_data_cleaned[['exp_days']])
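In the cell above, `scaler` is fitted on all rows before the 80/20 split, so statistics from the test period leak into the training features, and calling `fit_transform` twice on the same object leaves `scaler` holding only the `exp_days` statistics. A minimal sketch, on a hypothetical toy frame, that fits one scaler on both columns using only the training rows:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame, already in chronological order
df = pd.DataFrame({"age": [20.0, 30.0, 40.0, 50.0, 60.0],
                   "exp_days": [100.0, 200.0, 300.0, 400.0, 5000.0]})
split_index = int(len(df) * 0.8)  # same 80/20 rule as in the notebook
train, test = df.iloc[:split_index].copy(), df.iloc[split_index:].copy()

num_cols = ["age", "exp_days"]
scaler = StandardScaler().fit(train[num_cols])  # statistics from training rows only
train[num_cols] = scaler.transform(train[num_cols])
test[num_cols] = scaler.transform(test[num_cols])  # test rows reuse the training statistics
print(scaler.mean_)
print(test)
```

In the notebook this would happen after `split_index` is defined, applied to `X_train` and `X_test` rather than to `full_data_cleaned`.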

In [37]:
full_data_cleaned['employee_id'].unique()

array([ 577, 1860,  494, ...,  885, 2109,  176])

For the recommendation model we only count positive changes in `communication_score_change` as a success: we add a label that marks communications whose score improved

In [38]:
# Mark communications with a positive score change
full_data_cleaned['positive_change'] = (full_data_cleaned['communication_score_change'] > 0).astype(int)

`positive_change` is a binary label for each communication: it equals 1 if `communication_score_change` is positive and 0 otherwise. Combined with the course flags below, it yields one label per course.

Preparing the data for training:
1. **Ordering**: the rows are already sorted by date, so the first 80% become the training set and the last 20% the test set, which preserves temporal order.
2. **Labels and features**:
    - **Labels (Y)**: each course's completion flag multiplied by `positive_change`, so a course gets label 1 only if it had been completed and the score improved. This is how course usefulness is measured.
    - **Features (X)**: all remaining columns except `communication_score_change` and the course flags. Note that `positive_change` itself is not dropped, so X contains the value that decides whether a row's labels are all zeros.

In [39]:
# Index at which to split (first 80% of rows for training)
split_index = int(len(full_data_cleaned) * 0.8)

# Labels: course completion flag multiplied by the positive-change flag
labels = full_data_cleaned[[f'course_{i}' for i in range(92)]] * full_data_cleaned['positive_change'].values[:, None]

# Model features (positive_change is not dropped and stays among them)
features = full_data_cleaned.drop(columns=['communication_score_change'] + [f'course_{i}' for i in range(92)])

# Chronological split into training and test sets
X_train = features.iloc[:split_index]
Y_train = labels.iloc[:split_index]
X_test = features.iloc[split_index:]
Y_test = labels.iloc[split_index:]
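A minimal sketch of the same label construction with `positive_change` also excluded from the features, on hypothetical toy rows; `.mul(..., axis=0)` gives the same result as the notebook's `values[:, None]` broadcasting. On the real data this change would leave 11 input features instead of 12.

```python
import pandas as pd

course_cols = [f"course_{i}" for i in range(3)]  # 92 columns in the notebook
# Hypothetical toy rows, already in chronological order
df = pd.DataFrame({
    "age": [0.1, -0.2, 0.3, 0.0, 1.2],
    "communication_score_change": [5.0, -3.0, 2.0, 1.0, -1.0],
    "course_0": [1, 0, 1, 1, 0],
    "course_1": [0, 0, 1, 0, 1],
    "course_2": [0, 1, 0, 0, 0],
})
df["positive_change"] = (df["communication_score_change"] > 0).astype(int)

# Label = course completed AND score improved (row-wise multiplication)
labels = df[course_cols].mul(df["positive_change"], axis=0)
# Exclude every column the labels are derived from, including positive_change
features = df.drop(columns=["communication_score_change", "positive_change"] + course_cols)

split_index = int(len(df) * 0.8)
X_train, X_test = features.iloc[:split_index], features.iloc[split_index:]
Y_train, Y_test = labels.iloc[:split_index], labels.iloc[split_index:]
print(labels)
```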

## Building the collaborative-filtering model

The model is a fully connected neural network with two hidden layers of 128 neurons each and ReLU activations. The output layer applies a sigmoid and predicts the probability that each course is useful.

In [40]:
# Model definition
class NeuralNet(nn.Module):
    def __init__(self, input_features, num_courses=92):
        """
        Initialize a neural network with two hidden layers and an output layer.

        Parameters:
            input_features (int): number of input features.
            num_courses (int): number of courses whose usefulness probability is predicted.
        """
        super(NeuralNet, self).__init__()

        # First fully connected layer (followed by ReLU)
        self.fc1 = nn.Linear(input_features, 128)

        # Second fully connected layer (followed by ReLU)
        self.fc2 = nn.Linear(128, 128)

        # Output layer mapping the hidden features to one score per course
        self.output_layer = nn.Linear(128, num_courses)

        # ReLU activation
        self.relu = nn.ReLU()

        # Sigmoid activation applied to the outputs
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        """
        Forward pass: maps input features to per-course usefulness probabilities.

        Parameters:
            x (Tensor): input tensor of shape (batch_size, input_features)

        Returns:
            Tensor: tensor of shape (batch_size, num_courses) with the usefulness probability of each course.
        """
        x = self.relu(self.fc1(x))  # first layer + ReLU
        x = self.relu(self.fc2(x))  # second layer + ReLU
        x = self.sigmoid(self.output_layer(x))  # output probabilities via sigmoid
        return x
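`NeuralNet` receives the label-encoded `employee_id` as one numeric input among the others, so it treats IDs as ordered numbers and learns no per-employee or per-course latent factors. A collaborative-filtering variant closer to the section title learns an embedding vector for each employee and each course and scores every pair with a dot product. The class below is a hypothetical sketch, not the notebook's model; `num_employees` would be `len(le_employee.classes_)`, and it returns logits intended for `BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn

class EmbeddingCF(nn.Module):
    """Hypothetical CF variant: employee embedding + other features -> per-course logits."""

    def __init__(self, num_employees, num_other_features, num_courses=92, emb_dim=32, hidden=128):
        super().__init__()
        self.employee_emb = nn.Embedding(num_employees, emb_dim)  # one learned vector per employee
        self.course_emb = nn.Embedding(num_courses, hidden)       # one learned vector per course
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + num_other_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.course_bias = nn.Parameter(torch.zeros(num_courses))

    def forward(self, employee_idx, other_features):
        # Employee representation from the embedding and the remaining features: (B, hidden)
        user = self.mlp(torch.cat([self.employee_emb(employee_idx), other_features], dim=1))
        # Dot product with every course vector gives (B, num_courses) logits
        return user @ self.course_emb.weight.T + self.course_bias

torch.manual_seed(0)
model = EmbeddingCF(num_employees=10, num_other_features=4)
employee_idx = torch.tensor([0, 3, 9])  # label-encoded IDs, as produced by LabelEncoder
logits = model(employee_idx, torch.randn(3, 4))
probs = torch.sigmoid(logits)  # per-course probabilities
print(probs.shape)
```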

The model is trained with the `BCELoss` loss function and the `AdamW` optimizer for 100 epochs, printing the loss every 5 epochs to monitor training.

In [41]:
# Initialize the model
model = NeuralNet(input_features=X_train.shape[1]).cuda()
# Notes:
# model - a NeuralNet instance whose number of input features equals the number of columns in X_train.
# .cuda() - moves the model to the GPU to speed up computation.

# Loss function and optimizer for training the model
criterion = nn.BCELoss()  # Binary Cross-Entropy loss, applied independently to each course output
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
# Notes:
# criterion - measures the error between predicted and true labels.
# optimizer - AdamW, a variant of Adam with decoupled weight decay, widely used in deep learning.
# model.parameters() - passes the model parameters to the optimizer.
# lr=0.001 - learning rate, i.e. the step size of weight updates during training.

In [42]:
# Create GPU tensors from the cleaned, type-converted data
train_features = torch.tensor(X_train.values, dtype=torch.float32).cuda()
train_targets = torch.tensor(Y_train.values, dtype=torch.float32).cuda()
test_features = torch.tensor(X_test.values, dtype=torch.float32).cuda()
test_targets = torch.tensor(Y_test.values, dtype=torch.float32).cuda()

In [43]:
# Train the model
for epoch in range(100):
    model.train()  # training mode (matters for layers such as dropout or batchnorm; none are used here)
    optimizer.zero_grad()  # reset gradients before each step so they do not accumulate

    outputs = model(train_features)  # forward pass over the whole training set (one full batch)
    loss = criterion(outputs, train_targets)  # loss between predicted and true labels
    loss.backward()  # backpropagation: compute gradients

    optimizer.step()  # update the weights using the computed gradients

    # Log training progress every 5 epochs
    if epoch % 5 == 0:
        print(f'Epoch {epoch+1}/100, Loss: {loss.item()}')
        # print the current epoch and loss to monitor training
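Two things stand out in the log above. First, each epoch is a single full-batch gradient step, so 100 epochs mean only 100 weight updates, even though `DataLoader` is imported. Second, the loss stays far above ln 2 ≈ 0.693, the binary cross-entropy of a constant 0.5 prediction, and the predictions further below contain many values displayed as 0.0 and 1.0; this is consistent with saturated sigmoid outputs, for which `BCELoss` clamps each log term at -100. Inputs such as `year` and the label-encoded `employee_id` enter the network unscaled, which makes saturation more likely. A minimal sketch of mini-batch training with `BCEWithLogitsLoss` on logits and a CPU fallback, using random stand-in tensors whose shapes are assumptions:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in data: 1,024 rows, 11 features, 92 sparse binary labels
torch.manual_seed(0)
X = torch.randn(1024, 11)
Y = (torch.rand(1024, 92) < 0.05).float()
loader = DataLoader(TensorDataset(X, Y), batch_size=256, shuffle=True)

# Same layer sizes as NeuralNet, but without the final Sigmoid: the loss takes logits
model = nn.Sequential(
    nn.Linear(X.shape[1], 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, Y.shape[1]),
).to(device)
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one numerically stable op
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

history = []
for epoch in range(5):
    model.train()
    total = 0.0
    for xb, yb in loader:  # several weight updates per epoch
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(xb)
    history.append(total / len(loader.dataset))
    print(f"Epoch {epoch + 1}/5, mean loss: {history[-1]:.4f}")

# At prediction time, probabilities come from torch.sigmoid(model(x))
```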

Epoch 1/100, Loss: 38.38179016113281
Epoch 6/100, Loss: 16.708026885986328
Epoch 11/100, Loss: 13.931645393371582
Epoch 16/100, Loss: 12.309261322021484
Epoch 21/100, Loss: 11.498690605163574
Epoch 26/100, Loss: 11.264776229858398
Epoch 31/100, Loss: 11.256667137145996
Epoch 36/100, Loss: 11.246185302734375
Epoch 41/100, Loss: 11.24840259552002
Epoch 46/100, Loss: 11.24002456665039
Epoch 51/100, Loss: 11.23602294921875
Epoch 56/100, Loss: 11.231306076049805
Epoch 61/100, Loss: 11.228625297546387
Epoch 66/100, Loss: 11.227848052978516
Epoch 71/100, Loss: 11.225282669067383
Epoch 76/100, Loss: 11.225509643554688
Epoch 81/100, Loss: 11.222457885742188
Epoch 86/100, Loss: 11.221357345581055
Epoch 91/100, Loss: 11.219695091247559
Epoch 96/100, Loss: 11.217514991760254


In [44]:
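# Save the model weights (forward slashes avoid invalid escape sequences such as '\m' in the path)
torch.save(model.state_dict(), '../models/model_v2.pth')

In [48]:
# Load the model: rebuild the architecture and restore the saved weights
model = NeuralNet(input_features=X_train.shape[1]).cuda()
model.load_state_dict(torch.load('../models/model_v2.pth'))
model.eval()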

NeuralNet(
  (fc1): Linear(in_features=12, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=128, bias=True)
  (output_layer): Linear(in_features=128, out_features=92, bias=True)
  (relu): ReLU()
  (sigmoid): Sigmoid()
)

## Testing the model on the test data

Now let's apply the trained model to the held-out 20% of the communications data and build recommendations for each employee.

In [45]:
# Switch the model to evaluation mode and run inference without gradient tracking
model.eval()
with torch.no_grad():
    test_predictions = model(test_features)

In [46]:
# Convert the probability tensor to a DataFrame with one column per course
predictions_df = pd.DataFrame(test_predictions.cpu().numpy(), columns=[f'course_{i}' for i in range(92)])

In [47]:
# Add the (still label-encoded) employee identifiers to the predictions DataFrame
predictions_df['employee_id'] = X_test['employee_id'].values

Let's look at the predictions.

In [48]:
# Inspect the resulting predictions dataset
predictions_df.head()

Unnamed: 0,course_0,course_1,course_2,course_3,course_4,course_5,course_6,course_7,course_8,course_9,course_10,course_11,course_12,course_13,course_14,course_15,course_16,course_17,course_18,course_19,course_20,course_21,course_22,course_23,course_24,course_25,course_26,course_27,course_28,course_29,course_30,course_31,course_32,course_33,course_34,course_35,course_36,course_37,course_38,course_39,course_40,course_41,course_42,course_43,course_44,course_45,course_46,course_47,course_48,course_49,course_50,course_51,course_52,course_53,course_54,course_55,course_56,course_57,course_58,course_59,course_60,course_61,course_62,course_63,course_64,course_65,course_66,course_67,course_68,course_69,course_70,course_71,course_72,course_73,course_74,course_75,course_76,course_77,course_78,course_79,course_80,course_81,course_82,course_83,course_84,course_85,course_86,course_87,course_88,course_89,course_90,course_91,employee_id
0,0.0,0.023487,0.003149,0.0,0.01277,1.0,0.0,0.0,0.029444,0.0,0.002887,0.0,0.0,0.0,0.0,1.0,1.0,0.016457,0.0,0.0,0.001435,4.2e-05,0.0,0.009186,0.0,0.0,0.0,0.0,0.0,0.006608,0.0,0.0,0.001556,0.0,0.0,0.002882,0.0,1.0,0.0,0.006523,0.003887,0.0,1.0,0.04209,0.0,0.014042,0.0,0.004852,0.0,0.0,0.0,0.0,1.0,0.007359,0.0,0.0,0.057756,0.0,0.0,0.002854,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.005833,0.0,1.0,0.0,0.013129,0.0,0.0,1.0,0.0,0.0,0.0,0.008407,1.0,0.0,0.0,0.08069,0.0,0.0,0.0,0.0,0.031211,0.0,0.0,0.0,2218
1,0.0,0.023487,0.003149,0.0,0.01277,1.0,0.0,0.0,0.029444,0.0,0.002887,0.0,0.0,0.0,0.0,1.0,1.0,0.016457,0.0,0.0,0.001435,4.2e-05,0.0,0.009186,0.0,0.0,0.0,0.0,0.0,0.006608,0.0,0.0,0.001556,0.0,0.0,0.002882,0.0,1.0,0.0,0.006523,0.003887,0.0,1.0,0.04209,0.0,0.014042,0.0,0.004852,0.0,0.0,0.0,0.0,1.0,0.007359,0.0,0.0,0.057756,0.0,0.0,0.002854,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.005833,0.0,1.0,0.0,0.013129,0.0,0.0,1.0,0.0,0.0,0.0,0.008407,1.0,0.0,0.0,0.08069,0.0,0.0,0.0,0.0,0.031211,0.0,0.0,0.0,2218
2,0.0,0.023952,0.003346,0.0,0.004524,1.0,0.0,0.0,0.024952,0.0,0.005144,0.0,0.0,0.0,0.0,1.0,1.0,0.006814,0.0,0.0,0.002098,0.000139,0.0,0.011001,0.0,0.0,0.0,0.0,0.0,0.008562,0.0,0.0,0.001484,0.0,0.0,0.002924,0.0,1.0,0.0,0.001395,0.003466,0.0,1.0,0.041003,0.0,0.020618,0.0,0.005823,0.0,0.0,0.0,0.0,1.0,0.003616,0.0,0.0,0.034425,0.0,0.0,0.005138,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.008658,0.0,1.0,0.0,0.019496,0.0,0.0,1.0,0.0,0.0,0.0,0.004224,1.0,0.0,0.0,0.062011,0.0,0.0,0.0,0.0,0.028744,0.0,0.0,0.0,2102
3,0.0,0.023952,0.003346,0.0,0.004524,1.0,0.0,0.0,0.024952,0.0,0.005144,0.0,0.0,0.0,0.0,1.0,1.0,0.006814,0.0,0.0,0.002098,0.000139,0.0,0.011001,0.0,0.0,0.0,0.0,0.0,0.008562,0.0,0.0,0.001484,0.0,0.0,0.002924,0.0,1.0,0.0,0.001395,0.003466,0.0,1.0,0.041003,0.0,0.020618,0.0,0.005823,0.0,0.0,0.0,0.0,1.0,0.003616,0.0,0.0,0.034425,0.0,0.0,0.005138,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.008658,0.0,1.0,0.0,0.019496,0.0,0.0,1.0,0.0,0.0,0.0,0.004224,1.0,0.0,0.0,0.062011,0.0,0.0,0.0,0.0,0.028744,0.0,0.0,0.0,2102
4,0.0,0.010504,0.000265,0.0,0.001132,1.0,0.0,0.0,0.072806,0.0,0.007264,0.0,0.0,0.0,0.0,1.0,1.0,0.000558,0.0,0.0,0.077653,0.001833,0.0,0.011698,0.0,0.0,0.0,0.0,0.0,0.005148,0.0,0.0,0.01806,0.0,0.0,0.00997,0.0,1.0,0.0,4.4e-05,0.007996,0.0,1.0,0.036018,0.0,0.043635,0.0,0.001389,0.0,0.0,0.0,0.0,1.0,0.001313,0.0,0.0,0.006419,0.026601,0.0,0.018783,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.019197,0.0,1.0,0.0,0.002904,0.0,0.0,1.0,0.0,0.0,0.0,0.005545,1.0,0.0,0.0,0.020655,0.0,0.0,0.0,0.0,0.019444,0.0,0.0,0.0,587


Map the encoded IDs back to the original employee IDs.

In [49]:
# Use inverse_transform to recover the original employee_id values
original_employee_ids = le_employee.inverse_transform(full_data_cleaned['employee_id'])

In [50]:
# Corrected assignment: decode the test-set codes back to the original identifiers
predictions_df['employee_id'] = le_employee.inverse_transform(X_test['employee_id'].values)
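The model was trained on integer-coded `employee_id` values, so the codes in `X_test` must be decoded with the same fitted encoder. Assuming `le_employee` is scikit-learn's `LabelEncoder` (its fitting is not shown in this section), each original ID maps to its position in the sorted list of unique IDs. A minimal stdlib sketch of that behavior; `SimpleLabelEncoder` and the sample IDs are hypothetical:

```python
class SimpleLabelEncoder:
    """Hypothetical stand-in that mimics LabelEncoder: classes_ are the sorted unique values."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        self._index = {value: code for code, value in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Original ID -> integer code (its position in classes_)
        return [self._index[v] for v in values]

    def inverse_transform(self, codes):
        # Integer code -> original ID
        return [self.classes_[c] for c in codes]


ids = ["c-uuid", "a-uuid", "b-uuid", "a-uuid"]   # hypothetical employee IDs
enc = SimpleLabelEncoder().fit(ids)
codes = enc.transform(ids)
restored = enc.inverse_transform(codes)
print(codes)     # [2, 0, 1, 0]
print(restored)  # ['c-uuid', 'a-uuid', 'b-uuid', 'a-uuid']
```

Decoding only works if the encoder is the one fitted during preprocessing; a re-fitted encoder on a different set of IDs would assign different codes.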

Group the data by employee, averaging the prediction for each course over all of that employee's communications.

In [51]:
# Group by 'employee_id' and compute the mean prediction for each course
grouped_predictions = predictions_df.groupby('employee_id').mean().reset_index()
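The test set has one row per communication, so the same employee appears several times (in the `head()` output above, rows 0 and 1 for encoded employee 2218 and rows 2 and 3 for 2102 are identical). Averaging collapses these into one probability vector per employee. The same idea in plain Python, with hypothetical IDs and three courses instead of 92:

```python
from collections import defaultdict


def mean_by_employee(rows):
    """Average per-course probabilities over all rows of each employee.

    rows: iterable of (employee_id, [p_course_0, p_course_1, ...]).
    """
    sums = {}
    counts = defaultdict(int)
    for employee_id, probs in rows:
        if employee_id not in sums:
            sums[employee_id] = [0.0] * len(probs)
        for i, p in enumerate(probs):
            sums[employee_id][i] += p
        counts[employee_id] += 1
    return {emp: [s / counts[emp] for s in vec] for emp, vec in sums.items()}


rows = [
    ("emp_a", [0.0, 1.0, 0.2]),   # two communications of the same employee
    ("emp_a", [0.0, 1.0, 0.4]),
    ("emp_b", [1.0, 0.0, 0.5]),
]
means = mean_by_employee(rows)
print(means)
```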

In [52]:
# Inspect the dataset info
grouped_predictions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2381 entries, 0 to 2380
Data columns (total 93 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   employee_id  2381 non-null   object 
 1   course_0     2381 non-null   float32
 2   course_1     2381 non-null   float32
 3   course_2     2381 non-null   float32
 4   course_3     2381 non-null   float32
 5   course_4     2381 non-null   float32
 6   course_5     2381 non-null   float32
 7   course_6     2381 non-null   float32
 8   course_7     2381 non-null   float32
 9   course_8     2381 non-null   float32
 10  course_9     2381 non-null   float32
 11  course_10    2381 non-null   float32
 12  course_11    2381 non-null   float32
 13  course_12    2381 non-null   float32
 14  course_13    2381 non-null   float32
 15  course_14    2381 non-null   float32
 16  course_15    2381 non-null   float32
 17  course_16    2381 non-null   float32
 18  course_17    2381 non-null   float32
 19  course

Now let's melt the dataset: turn the per-course columns into rows.

In [53]:
# Melt the DataFrame to turn the course columns into rows
melted_predictions = grouped_predictions.melt(id_vars='employee_id', value_vars=[f'course_{i}' for i in range(92)],
                                             var_name='course_id', value_name='course_pred')

In [54]:
# Keep only the course number in 'course_id'
melted_predictions['course_id'] = melted_predictions['course_id'].str.replace('course_', '').astype(int)

In [55]:
# Inspect the dataset info
melted_predictions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 219052 entries, 0 to 219051
Data columns (total 3 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   employee_id  219052 non-null  object 
 1   course_id    219052 non-null  int32  
 2   course_pred  219052 non-null  float32
dtypes: float32(1), int32(1), object(1)
memory usage: 3.3+ MB
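Melting turns the wide table (one row per employee, one column per course) into a long table with one row per (employee, course) pair, so the row count is the product of the two: 2381 × 92 = 219052, which matches the `RangeIndex` reported above. A stdlib sketch of the same reshaping, including removal of the `course_` prefix; the input dict is a hypothetical stand-in for `grouped_predictions`:

```python
def melt_courses(wide):
    """Reshape {employee_id: {"course_<i>": pred}} into (employee_id, course_id, course_pred) rows.

    Row order differs from pandas.melt, which emits all employees for one
    column before moving on to the next column.
    """
    long_rows = []
    for employee_id, columns in wide.items():
        for column, pred in columns.items():
            # "course_17" -> 17, like .str.replace('course_', '').astype(int)
            course_id = int(column.replace("course_", "", 1))
            long_rows.append((employee_id, course_id, pred))
    return long_rows


# Hypothetical wide table: 2 employees x 3 courses
wide = {
    "emp_a": {"course_0": 0.0, "course_1": 1.0, "course_2": 0.02},
    "emp_b": {"course_0": 0.5, "course_1": 0.0, "course_2": 1.0},
}
rows = melt_courses(wide)
print(len(rows))  # 6 rows = 2 employees x 3 courses
```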


Each employee now has 92 rows, one per course. Next, remove the courses that each employee had already completed by this period.

In [56]:
# Select the records in courses_passing where the course counts as completed
completed_courses = courses_passing[courses_passing['pass_frac'] >= 1]

# Build the unique (employee_id, course_id) pairs from completed_courses
completed_pairs = completed_courses[['employee_id', 'course_id']].drop_duplicates()

In [57]:
# Drop the rows of melted_predictions whose course is already completed (anti-join)
filtered_predictions = melted_predictions.merge(completed_pairs, on=['employee_id', 'course_id'], how='left', indicator=True)
filtered_predictions = filtered_predictions[filtered_predictions['_merge'] == 'left_only'].drop(columns=['_merge'])

In [58]:
# Inspect the dataset info
filtered_predictions.info()

<class 'pandas.core.frame.DataFrame'>
Index: 210539 entries, 0 to 219051
Data columns (total 3 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   employee_id  210539 non-null  object 
 1   course_id    210539 non-null  int32  
 2   course_pred  210539 non-null  float32
dtypes: float32(1), int32(1), object(1)
memory usage: 4.8+ MB
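The merge with `indicator=True` followed by keeping the `'left_only'` rows is an anti-join: it keeps only predictions whose (employee_id, course_id) pair does not occur among the completed courses. Here it removed 219052 - 210539 = 8513 rows. The same logic with a Python set; the data are hypothetical. A set also makes duplicate completion records harmless:

```python
def drop_completed(predictions, passing):
    """Anti-join: keep (employee_id, course_id, pred) rows whose course the employee has not completed.

    passing holds (employee_id, course_id, pass_frac) records; a course counts
    as completed when pass_frac >= 1, the same rule as in the cell above.
    """
    completed = {(emp, course) for emp, course, frac in passing if frac >= 1}
    return [row for row in predictions if (row[0], row[1]) not in completed]


predictions = [("emp_a", 0, 0.9), ("emp_a", 1, 0.8), ("emp_b", 0, 0.7)]
passing = [
    ("emp_a", 0, 1.0),  # completed -> removed
    ("emp_a", 1, 0.5),  # only half done -> kept
    ("emp_b", 0, 1.0),  # completed, recorded twice -> still removed
    ("emp_b", 0, 1.0),
]
remaining = drop_completed(predictions, passing)
print(remaining)  # [('emp_a', 1, 0.8)]
```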


Finally, select the top 20 course recommendations for each employee.

In [59]:
# Attach course names to the filtered predictions
merged_predictions = filtered_predictions.merge(courses_info[['course_id', 'course_nm']], on='course_id', how='left')

In [60]:
# Group by 'employee_id' and, for each employee, keep the top 20 courses not yet completed
# (nlargest already returns the rows sorted by 'course_pred' in descending order)
top_courses = merged_predictions.groupby('employee_id').apply(
    lambda x: x.nlargest(20, 'course_pred')
).reset_index(drop=True)
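Selecting the top 20 per employee is a per-group top-k. Many predictions are saturated at exactly 1.0 (see the `head()` output earlier), so ties are common, and `nlargest` with its default `keep='first'` breaks them by row order, not by anything the model learned. A stdlib sketch with `heapq.nlargest`, which is documented as equivalent to `sorted(..., reverse=True)[:n]` and therefore also keeps input order among ties; the rows are hypothetical:

```python
import heapq
from collections import defaultdict


def top_k_per_employee(rows, k=20):
    """Return {employee_id: [(course_id, pred), ...]} with the k highest predictions per employee."""
    by_employee = defaultdict(list)
    for employee_id, course_id, pred in rows:
        by_employee[employee_id].append((course_id, pred))
    # heapq.nlargest(k, items, key) behaves like sorted(items, key=key, reverse=True)[:k],
    # so courses with equal predictions keep their input order
    return {
        employee_id: heapq.nlargest(k, items, key=lambda item: item[1])
        for employee_id, items in by_employee.items()
    }


rows = [
    ("emp_a", 5, 1.0), ("emp_a", 8, 0.03), ("emp_a", 15, 1.0), ("emp_a", 17, 0.5),
    ("emp_b", 1, 0.02), ("emp_b", 2, 0.9),
]
top = top_k_per_employee(rows, k=2)
print(top)  # {'emp_a': [(5, 1.0), (15, 1.0)], 'emp_b': [(2, 0.9), (1, 0.02)]}
```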

In [61]:
# Show the results for the first employee (the first 20 rows are that employee's top 20)
top_courses.head(20)

Unnamed: 0,employee_id,course_id,course_pred,course_nm
0,0004d0b5-9e19-461f-f126-e3a08a814c33,5,1.0,Проактивное обслуживание клиентов: Ключевые ст...
1,0004d0b5-9e19-461f-f126-e3a08a814c33,15,1.0,Как эффективно решать проблемы клиентов: Практ...
2,0004d0b5-9e19-461f-f126-e3a08a814c33,16,1.0,Управление клиентскими ожиданиями: Ключевые ас...
3,0004d0b5-9e19-461f-f126-e3a08a814c33,37,1.0,Как преодолевать трудности в общении с трудным...
4,0004d0b5-9e19-461f-f126-e3a08a814c33,42,1.0,Создание уникального брендового опыта для клие...
5,0004d0b5-9e19-461f-f126-e3a08a814c33,52,1.0,Методы оценки и улучшения качества обслуживани...
6,0004d0b5-9e19-461f-f126-e3a08a814c33,62,1.0,Эффективное управление стрессом в клиентском с...
7,0004d0b5-9e19-461f-f126-e3a08a814c33,70,1.0,Использование игровых технологий в обучении кл...
8,0004d0b5-9e19-461f-f126-e3a08a814c33,75,1.0,Эффективное использование мультимедийных инстр...
9,0004d0b5-9e19-461f-f126-e3a08a814c33,80,1.0,Эффективное использование сетевых ресурсов для...


In [62]:
# Save the results
top_courses.to_csv('../data/recomendations/top_recomendations.csv')

## Analysis of the results

We analyze the results as follows:
1. **Course recommendation frequency.** Check which courses are recommended most often. This shows whether the model favors particular courses.

In [63]:
# Count how often each course appears among the top recommendations
top_course_counts = top_courses[['course_id','course_nm']].value_counts()

# Print the 30 most frequently recommended courses
print("Top 30 most frequently recommended courses:")
print(top_course_counts.head(30))

Top 30 most frequently recommended courses:
course_id  course_nm                                                                 
37         Как преодолевать трудности в общении с трудными клиентами                     2354
52         Методы оценки и улучшения качества обслуживания клиентов                      2348
75         Эффективное использование мультимедийных инструментов в клиентском сервисе    2341
62         Эффективное управление стрессом в клиентском сервисе                          2337
15         Как эффективно решать проблемы клиентов: Практические методы                  2330
5          Проактивное обслуживание клиентов: Ключевые стратегии                         2326
70         Использование игровых технологий в обучении клиентов                          2323
42         Создание уникального брендового опыта для клиентов                            2239
80         Эффективное использование сетевых ресурсов для решения проблем клиентов       2183
16         Управление клиен
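Assuming each course appears at most once in an employee's top-20 list, these counts are numbers of employees, and they come close to the 2381 employees in `grouped_predictions`: course 37 is in the top 20 of 2354 of them, about 98.9%. A share near 100% suggests the top of the list varies little between employees. A stdlib sketch that reports both the count and the share; the input dict is hypothetical:

```python
from collections import Counter


def recommendation_share(top, n_employees):
    """For each course: how many employees have it in their top list, and the share of all employees."""
    counts = Counter(course_id for courses in top.values() for course_id in courses)
    # most_common() orders courses by count, highest first
    return [(course_id, n, n / n_employees) for course_id, n in counts.most_common()]


# Hypothetical top lists (course IDs) for three employees
top = {"emp_a": [37, 52], "emp_b": [37, 5], "emp_c": [37, 52]}
for course_id, n, share in recommendation_share(top, n_employees=len(top)):
    print(course_id, n, f"{share:.1%}")
```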

**Conclusions:**

1. Problem solving and stress management:
    - Courses such as "How to solve customer problems effectively: practical methods" (course 15) and "Effective stress management in customer service" (course 62) emphasize learning to cope with pressure and to find solutions in difficult situations, which is critical in a fast-paced work environment.

2. Technology and innovation:
    - Several frequently recommended courses deal with technology, such as "Effective use of multimedia tools in customer service" (course 75) and "Using game technologies in customer training" (course 70), reflecting a push to integrate modern technology into customer service.

3. Emotional intelligence and personalization:
    - The focus on empathy and understanding customers, through courses such as "Developing empathy skills in customer service" and "Using emotional intelligence in communication with customers", underlines the importance of a personal approach and the ability to communicate effectively.

**Recommendations:**

1. Strengthen training in stress management and problem solving:
    - Design and run targeted trainings and workshops on problem-solving and stress-reduction techniques to make employees more resilient to workload and improve their performance.

2. Integrate new technologies into customer service:
    - Introduce current technology solutions into work with customers, including training employees on new tools and platforms, to improve service quality and efficiency.

3. Focus on interpersonal skills:
    - Support programs that develop interpersonal and communication skills, especially emotional intelligence and personalized service, so that employees can better meet customer needs and expectations.