<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Загрузка-данных-для-обучения" data-toc-modified-id="Загрузка-данных-для-обучения-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Loading the training data</a></span></li><li><span><a href="#Подготовка-пайплайна" data-toc-modified-id="Подготовка-пайплайна-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Preparing the pipeline</a></span></li><li><span><a href="#Трансформация-данных" data-toc-modified-id="Трансформация-данных-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Transforming the data</a></span></li><li><span><a href="#Обучение-модели" data-toc-modified-id="Обучение-модели-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Training the model</a></span></li></ul></div>

Project for the "ЛЦТ-2024" hackathon, task No. 10.  
The task was to read data from Excel spreadsheets exported from the automated systems of the Moscow Department of Housing and Communal Services, analyze that data automatically, predict accident probabilities for each individual building, and visualize the predictions on a map.  
  
Our team lacked the experience to deploy a full web service, but we built a visualization module on open-source libraries and prepared a prediction algorithm based on RandomForest.  
  
The team consisted of students from a Practicum cohort; nominal skill level: Junior.  
My role in the project: ML engineer and overall project lead.

  
Notable aspects of the task:  
- A large amount of task-irrelevant data in heavyweight tables;
- Weak linkage between the critical tables;
- A critical shortage of data for predicting the full range of accidents;
- The need for a multi-output model.  
  
What was implemented:
- Data processing (decoding in some cases) and automated outlier cleaning;
- Consolidation of the data into a single row per building;
- A training module for the multi-output model;
- Predictions for the top 5 accident types;
- A map display of accident probability.  
  
What this notebook is:
- The module that trains the model on the prepared data.
- The notebook produces the pipeline and the model used by the demo notebook.

In [1]:
# Install libraries
!pip install -U optuna-integration
!pip install -U optuna





In [2]:
# imports
import pandas as pd
import numpy as np
import seaborn as sns


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler, MinMaxScaler, RobustScaler

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.dummy import DummyRegressor, DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

from sklearn.metrics import roc_auc_score

from sklearn.multioutput import MultiOutputClassifier

from scipy import stats as st

import optuna
import shap
import pickle

In [3]:
# display rule: show all DataFrame columns
pd.set_option('display.max_columns', None)

In [4]:
# global constants
RANDOM_STATE = 42

## Loading the training data

In [5]:
# load the data prepared in advance
ml_data = pd.read_csv('../location_p/model_training/ml_data.csv')
ml_data = ml_data.set_index('unom')
ml_data.head()

unom,T1 < min,P1 <= 0,Сильная течь в системе отопления,Протечка труб в подъезде,Течь в системе отопления,T1 > max,P2 <= 0,Температура в квартире ниже нормативной,Отсутствие отопления в доме,Температура в помещении общего пользования ниже нормативной,Аварийная протечка труб в подъезде,Крупные пожары,ID ODS,№ ОДС,ods_adress,ID УУ,Адрес,Округ,Группа,Потребитель (или УК),ЦТП,Объём поданого теплоносителя в систему ЦО,Объём обратного теплоносителя из системы ЦО,Температура подачи,Температура обратки,Наработка часов счётчика,Проект,Количество этажей,Количество подъездов,Количество квартир,Общая площадь,Общая площадь жилых помещений,Общая площадь нежилых помещений,Материалы стен,Признак аварийности здания,Количество пассажирских лифтов,Количество грузопассажирских лифтов,Материалы кровли по БТИ,Типы жилищного фонда,Статусы МКД
309,1,0,0,0,0,0,0,0,0,0,0,0,143495140.0,ОДС №2-Восточный,"город Москва, посёлок Акулово, дом 1",785496920.0,"пос. Акулово, д. 9",ВАО,МКД,"ГБУ ""ЖИЛИЩНИК РАЙОНА ВОСТОЧНЫЙ""",03-09-315,93.805257,96.158162,55.258967,45.742535,23.330282,нет данных,3.0,2.0,24.0,1018.5,978.1,40.4,кирпичные,нет,0.0,0.0,асбофанера-шифер,МКД,в эксплуатации
313,0,0,0,0,0,0,0,1,0,0,0,0,,,,,,,,,,,,,,,нет данных,3.0,2.0,12.0,1216.4,1046.5,169.9,кирпичные,нет,0.0,0.0,стальная,МКД,в эксплуатации
314,0,0,0,0,0,0,0,1,1,0,0,0,143495140.0,ОДС №2-Восточный,"город Москва, посёлок Акулово, дом 1",785497048.0,"пос. Акулово, д. 15",ВАО,МКД,"ГБУ ""ЖИЛИЩНИК РАЙОНА ВОСТОЧНЫЙ""",03-09-315,134.79204,134.506538,57.884291,42.463973,23.997742,нет данных,5.0,3.0,60.0,2489.5,2489.5,0.0,кирпичные,нет,0.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации
316,0,0,0,0,0,0,0,1,0,0,0,0,143495140.0,ОДС №2-Восточный,"город Москва, посёлок Акулово, дом 1",785496973.0,"пос. Акулово, д. 22",ВАО,МКД,"ГБУ ""ЖИЛИЩНИК РАЙОНА ВОСТОЧНЫЙ""",03-09-315,176.248322,176.096565,56.669053,42.428112,23.948139,114-85-1,5.0,6.0,80.0,4204.5,4204.5,0.0,кирпичные,нет,0.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации
317,0,0,0,0,0,0,0,1,0,0,0,0,143495140.0,ОДС №2-Восточный,"город Москва, посёлок Акулово, дом 1",785497040.0,"пос. Акулово, д. 24",ВАО,МКД,"ГБУ ""ЖИЛИЩНИК РАЙОНА ВОСТОЧНЫЙ""",03-09-315,644.444109,642.00972,56.811822,50.302409,23.995042,I-515,9.0,4.0,143.0,7827.0,7070.0,757.0,панельные,нет,4.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации


In [6]:
# Select the top 5 targets by number of occurrences in the provided data
top_5_target = pd.DataFrame(ml_data.iloc[:, 0:12].sum()).sort_values(by = 0, ascending = False).head(5).T
top_5_target

Unnamed: 0,Температура в квартире ниже нормативной,T1 > max,Отсутствие отопления в доме,Сильная течь в системе отопления,Течь в системе отопления
0,3839,1899,1010,380,367
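
The selection above can be tried on a toy frame. This is a minimal sketch, assuming hypothetical column names `a` to `d` in place of the real accident flags: summing binary indicator columns gives a count per target, and `nlargest` keeps the most frequent ones.

```python
import pandas as pd

# Hypothetical binary accident flags, one row per building
toy = pd.DataFrame({
    "a": [1, 0, 1, 1],
    "b": [0, 0, 1, 0],
    "c": [1, 1, 1, 1],
    "d": [0, 0, 0, 0],
})

# Column sums count how many buildings had each accident type
counts = toy.sum()

# Keep the two most frequent targets (the notebook keeps five)
top_targets = counts.nlargest(2)
top_names = list(top_targets.index)
print(top_names)
```

The resulting names can then be used to slice the target columns, as the next cell does with `top_5_target.columns`.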


In [7]:
# Build the target features
ml_target = ml_data.loc[:,top_5_target.columns]
ml_target.head()

unom,Температура в квартире ниже нормативной,T1 > max,Отсутствие отопления в доме,Сильная течь в системе отопления,Течь в системе отопления
309,0,0,0,0,0
313,1,0,0,0,0
314,1,0,1,0,0
316,1,0,0,0,0
317,1,0,0,0,0


In [8]:
# Build the input features; the other features were excluded based on the analysis
ml_feature = ml_data.iloc[:, 21:]
ml_feature.head()

unom,Объём поданого теплоносителя в систему ЦО,Объём обратного теплоносителя из системы ЦО,Температура подачи,Температура обратки,Наработка часов счётчика,Проект,Количество этажей,Количество подъездов,Количество квартир,Общая площадь,Общая площадь жилых помещений,Общая площадь нежилых помещений,Материалы стен,Признак аварийности здания,Количество пассажирских лифтов,Количество грузопассажирских лифтов,Материалы кровли по БТИ,Типы жилищного фонда,Статусы МКД
309,93.805257,96.158162,55.258967,45.742535,23.330282,нет данных,3.0,2.0,24.0,1018.5,978.1,40.4,кирпичные,нет,0.0,0.0,асбофанера-шифер,МКД,в эксплуатации
313,,,,,,нет данных,3.0,2.0,12.0,1216.4,1046.5,169.9,кирпичные,нет,0.0,0.0,стальная,МКД,в эксплуатации
314,134.79204,134.506538,57.884291,42.463973,23.997742,нет данных,5.0,3.0,60.0,2489.5,2489.5,0.0,кирпичные,нет,0.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации
316,176.248322,176.096565,56.669053,42.428112,23.948139,114-85-1,5.0,6.0,80.0,4204.5,4204.5,0.0,кирпичные,нет,0.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации
317,644.444109,642.00972,56.811822,50.302409,23.995042,I-515,9.0,4.0,143.0,7827.0,7070.0,757.0,панельные,нет,4.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации


In [9]:
# Build the training data: keep only buildings that have heat-meter readings
ml_research = pd.concat([ml_target, ml_feature], axis=1)
ml_research = ml_research.loc[ml_research['Объём поданого теплоносителя в систему ЦО'].notna()]
ml_research.head()

unom,Температура в квартире ниже нормативной,T1 > max,Отсутствие отопления в доме,Сильная течь в системе отопления,Течь в системе отопления,Объём поданого теплоносителя в систему ЦО,Объём обратного теплоносителя из системы ЦО,Температура подачи,Температура обратки,Наработка часов счётчика,Проект,Количество этажей,Количество подъездов,Количество квартир,Общая площадь,Общая площадь жилых помещений,Общая площадь нежилых помещений,Материалы стен,Признак аварийности здания,Количество пассажирских лифтов,Количество грузопассажирских лифтов,Материалы кровли по БТИ,Типы жилищного фонда,Статусы МКД
309,0,0,0,0,0,93.805257,96.158162,55.258967,45.742535,23.330282,нет данных,3.0,2.0,24.0,1018.5,978.1,40.4,кирпичные,нет,0.0,0.0,асбофанера-шифер,МКД,в эксплуатации
314,1,0,1,0,0,134.79204,134.506538,57.884291,42.463973,23.997742,нет данных,5.0,3.0,60.0,2489.5,2489.5,0.0,кирпичные,нет,0.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации
316,1,0,0,0,0,176.248322,176.096565,56.669053,42.428112,23.948139,114-85-1,5.0,6.0,80.0,4204.5,4204.5,0.0,кирпичные,нет,0.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации
317,1,0,0,0,0,644.444109,642.00972,56.811822,50.302409,23.995042,I-515,9.0,4.0,143.0,7827.0,7070.0,757.0,панельные,нет,4.0,0.0,мягкая-совмещенная с рубероидным покрытием,МКД,в эксплуатации
364,1,1,1,1,1,232.970281,232.190526,92.37896,50.518503,23.999743,,,,,,,,,,,,,,
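
The row filter above drops only buildings without a meter reading; gaps in other columns remain (building 364 in the output is one such row) and are left for the imputers later on. A small sketch of that behaviour, assuming a hypothetical two-column frame with made-up column names:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "supply_volume" stands in for the meter reading column
toy = pd.DataFrame(
    {"supply_volume": [93.8, np.nan, 134.8], "floors": [3.0, 3.0, np.nan]},
    index=pd.Index([309, 313, 314], name="unom"),
)

# Keep rows where the meter reading is present
kept = toy.loc[toy["supply_volume"].notna()]

# Missing values in other columns survive the filter
remaining_gaps = int(kept["floors"].isna().sum())
print(kept.index.tolist(), remaining_gaps)
```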


In [10]:
ml_research.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3996 entries, 309 to 4200267
Data columns (total 24 columns):
 #   Column                                       Non-Null Count  Dtype  
---  ------                                       --------------  -----  
 0   Температура в квартире ниже нормативной      3996 non-null   int64  
 1   T1 > max                                     3996 non-null   int64  
 2   Отсутствие отопления в доме                  3996 non-null   int64  
 3   Сильная течь в системе отопления             3996 non-null   int64  
 4   Течь в системе отопления                     3996 non-null   int64  
 5   Объём поданого теплоносителя в систему ЦО    3996 non-null   float64
 6   Объём обратного теплоносителя из системы ЦО  3996 non-null   float64
 7   Температура подачи                           3996 non-null   float64
 8   Температура обратки                          3996 non-null   float64
 9   Наработка часов счётчика                     3995 non-null   float64
 10  

## Preparing the pipeline

In [11]:
ml_research.select_dtypes(exclude='number').columns

Index(['Проект', 'Материалы стен', 'Признак аварийности здания ',
       'Материалы кровли по БТИ', 'Типы жилищного фонда', 'Статусы МКД'],
      dtype='object')

In [12]:
ml_research.select_dtypes('float64').columns

Index(['Объём поданого теплоносителя в систему ЦО',
       'Объём обратного теплоносителя из системы ЦО', 'Температура подачи',
       'Температура обратки', 'Наработка часов счётчика', 'Количество этажей',
       'Количество подъездов', 'Количество квартир', 'Общая площадь',
       'Общая площадь жилых помещений', 'Общая площадь нежилых помещений',
       'Количество пассажирских лифтов',
       'Количество грузопассажирских лифтов'],
      dtype='object')

In [13]:
ohe_columns=[
    'Группа',
    'Потребитель (или УК)',
    'ЦТП',
    'Проект',
    'Материалы стен',
    'Признак аварийности здания ',
    'Материалы кровли по БТИ',
    'Типы жилищного фонда',
    'Статусы МКД'
    ]

num_columns=[
    'Объём поданого теплоносителя в систему ЦО',
    'Объём обратного теплоносителя из системы ЦО',
    'Температура подачи',
    'Температура обратки',
    'Наработка часов счётчика',
    'Количество этажей',
    'Количество подъездов',
    'Количество квартир',
    'Общая площадь',
    'Общая площадь жилых помещений',
    'Общая площадь нежилых помещений',
    'Количество пассажирских лифтов',
    'Количество грузопассажирских лифтов'
    ]

In [14]:
# Derive the column lists from dtypes; this overrides the manual lists from the previous cell
ohe_columns = ml_research.select_dtypes(exclude='number').columns

num_columns = ml_research.select_dtypes('float64').columns
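
Selecting by dtype works here because the targets are stored as `int64`: they match neither the non-numeric selection nor the `float64` selection, so they stay out of both lists. A brief sketch on a hypothetical three-column frame:

```python
import pandas as pd

# Hypothetical frame: an integer target, a float feature, a text feature
toy = pd.DataFrame({
    "target": [0, 1],
    "floors": [3.0, 5.0],
    "wall": ["brick", "panel"],
})

# Non-numeric columns go to one-hot encoding
cat_cols = list(toy.select_dtypes(exclude="number").columns)

# Only float columns go to scaling; the integer target is skipped
num_cols = list(toy.select_dtypes("float64").columns)
print(cat_cols, num_cols)
```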

In [15]:
# pipeline for encoding nominal (unordered) categories
ohe_pipe = Pipeline(
    [
        ('simp_ohe', SimpleImputer(missing_values=np.nan, strategy='most_frequent')),
        ('ohe', OneHotEncoder(drop='first', handle_unknown='ignore', sparse_output=False))
    ])
# pipeline for scaling numeric features
num_pipe = Pipeline(
    [
        ('simp_num', SimpleImputer(missing_values=np.nan, strategy='mean')),
        ('ohe', MinMaxScaler())
    ])


# combined pipeline for data preparation
data_preprocessor = ColumnTransformer(
    [
        ('ohe', ohe_pipe, ohe_columns),
        ('num', num_pipe, num_columns)
    ]
)

## Transforming the data

In [16]:
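# Fit the preprocessor on the features only (targets dropped) and transform them
ml_processed = data_preprocessor.fit_transform(ml_research.drop(list(ml_target.columns), axis=1))
# Wrap the resulting array in a DataFrame with the generated feature names
ml_processed = pd.DataFrame(ml_processed)
ml_processed.columns = data_preprocessor.get_feature_names_out()
display(ml_processed)
ml_processed.describe()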

Unnamed: 0,ohe__Проект_1-МГ-601,ohe__Проект_10-03-11,ohe__Проект_10-03-711,ohe__Проект_10-03-74,ohe__Проект_114-85-1,ohe__Проект_2-18-31/12А,ohe__Проект_222,ohe__Проект_3/М 23БИ,ohe__Проект_5/1-3/Ю22БИ,ohe__Проект_93/87,ohe__Проект_I-335,ohe__Проект_I-43-9-А-5,ohe__Проект_I-510,ohe__Проект_I-510-4/М6,ohe__Проект_I-511,ohe__Проект_I-511-130/37,ohe__Проект_I-511-4/М-37,ohe__Проект_I-511-4/М22БН,ohe__Проект_I-513,ohe__Проект_I-515,ohe__Проект_II-18-01-МН,ohe__Проект_II-18-01/12,ohe__Проект_II-18-01/9,ohe__Проект_II-18-21/12,ohe__Проект_II-18-31/12,ohe__Проект_II-18-31/12А,ohe__Проект_II-18/22,"ohe__Проект_II-18/МИ ""ВАРИАНТ6""",ohe__Проект_II-29-04/Ю37,ohe__Проект_II-29-05/Ю37,ohe__Проект_II-29-41/37,ohe__Проект_II-49-04/М вар Д,ohe__Проект_II-49-04/Ю,ohe__Проект_II-49-06,ohe__Проект_II-49-06-ЮД,ohe__Проект_II-49-08/Ю,ohe__Проект_II-68-01,ohe__Проект_II-68-01/12,ohe__Проект_II-68-02/12К,ohe__Проект_II-68-02/16М,ohe__Проект_II-68-03/12Ю,ohe__Проект_IМГ-601Д,ohe__Проект_Башня Вулых,"ohe__Проект_ВПО ""КАСЧАД""",ohe__Проект_И-209А,ohe__Проект_И-441,ohe__Проект_И-491А,ohe__Проект_И-522,ohe__Проект_И-700А,ohe__Проект_И-759,ohe__Проект_КОПЭ,ohe__Проект_КТЖС,ohe__Проект_МГ-601,ohe__Проект_П-14,ohe__Проект_П-14/35,ohe__Проект_П-18-21/12,ohe__Проект_П-18/22,ohe__Проект_П-28,ohe__Проект_П-29,ohe__Проект_П-29-03/Ю37,ohe__Проект_П-29-05/м37,ohe__Проект_П-3/16,ohe__Проект_П-3/17,ohe__Проект_П-3/22,ohe__Проект_П-30,ohe__Проект_П-32,ohe__Проект_П-3м,ohe__Проект_П-4,ohe__Проект_П-42,ohe__Проект_П-43,ohe__Проект_П-44,ohe__Проект_П-44м,ohe__Проект_П-44т,ohe__Проект_П-45,ohe__Проект_П-46,ohe__Проект_П-46-2/12в,ohe__Проект_П-46м,ohe__Проект_П-47,ohe__Проект_П-49 Д,ohe__Проект_П-55,ohe__Проект_П-57,ohe__Проект_П-68,ohe__Проект_ПЗ-1/16,ohe__Проект_ПЗМ-3/16,ohe__Проект_индивидуальный проект,ohe__Проект_нет данных,ohe__Проект_унифицированный каркас,ohe__Материалы стен_железобетонные,ohe__Материалы стен_из железобетонных сегментов,ohe__Материалы стен_из легкобетонных панелей,ohe__Материалы стен_из мелких бетонных блоков,ohe__Материалы стен_из унифицированных железобетонных элементов,ohe__Материалы стен_каменные и бетонные,ohe__Материалы стен_каркасно-панельные,ohe__Материалы стен_кирпичные,ohe__Материалы стен_кирпичные облегченные,ohe__Материалы стен_крупноблочные,ohe__Материалы стен_крупнопанельные,ohe__Материалы стен_легкобетонные блоки,ohe__Материалы стен_легкобетонные блоки с утеплением,ohe__Материалы стен_монолитные (ж-б),ohe__Материалы стен_не определено,ohe__Материалы стен_панели керамзитобетонные,ohe__Материалы стен_панельные,ohe__Материалы стен_шлакобетонные,ohe__Признак аварийности здания _нет,ohe__Материалы кровли по БТИ_гидростеклоизол,ohe__Материалы кровли по БТИ_мягкая-совмещенная с рубероидным покрытием,ohe__Материалы кровли по БТИ_полиуритан,ohe__Материалы кровли по БТИ_прочая(черепица;щепа;дранка),ohe__Материалы кровли по БТИ_стальная,ohe__Материалы кровли по БТИ_толь-рубероид по деревянному настилу,num__Объём поданого теплоносителя в систему ЦО,num__Объём обратного теплоносителя из системы ЦО,num__Температура подачи,num__Температура обратки,num__Наработка часов счётчика,num__Количество этажей,num__Количество подъездов,num__Количество квартир,num__Общая площадь,num__Общая площадь жилых помещений,num__Общая площадь нежилых помещений,num__Количество пассажирских лифтов,num__Количество грузопассажирских лифтов
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.843528e-23,0.037135,0.062562,3.540034e-14,0.066655,0.050000,0.125000,0.066246,0.027412,0.027921,0.017107,0.000000,0.000000
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,2.649030e-23,0.051901,0.066236,3.221567e-14,0.068562,0.150000,0.250000,0.179811,0.126780,0.143411,0.000000,0.000000,0.000000
2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,3.463758e-23,0.067915,0.064535,3.218083e-14,0.068420,0.150000,0.625000,0.242902,0.242630,0.274458,0.000000,0.000000,0.000000
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.266508e-22,0.247317,0.064735,3.982962e-14,0.068554,0.350000,0.375000,0.441640,0.487334,0.493417,0.320545,0.571429,0.000000
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,4.578498e-23,0.089515,0.114505,4.003953e-14,0.068568,0.309822,0.253491,0.278134,0.276157,0.289490,0.161456,0.195738,0.069572
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3991,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.674964e-23,0.033415,0.098449,3.879607e-14,0.068181,0.309822,0.253491,0.278134,0.276157,0.289490,0.161456,0.195738,0.069572
3992,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,8.989474e-24,0.017850,0.093183,4.611642e-14,0.061930,0.309822,0.253491,0.278134,0.276157,0.289490,0.161456,0.195738,0.069572
3993,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,4.716981e-03,0.011287,0.104103,4.920596e-14,0.068364,0.309822,0.253491,0.278134,0.276157,0.289490,0.161456,0.195738,0.069572
3994,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,4.660681e-23,0.092028,0.107992,6.370513e-14,0.066594,0.309822,0.253491,0.278134,0.276157,0.289490,0.161456,0.195738,0.069572


Unnamed: 0,ohe__Проект_1-МГ-601,ohe__Проект_10-03-11,ohe__Проект_10-03-711,ohe__Проект_10-03-74,ohe__Проект_114-85-1,ohe__Проект_2-18-31/12А,ohe__Проект_222,ohe__Проект_3/М 23БИ,ohe__Проект_5/1-3/Ю22БИ,ohe__Проект_93/87,ohe__Проект_I-335,ohe__Проект_I-43-9-А-5,ohe__Проект_I-510,ohe__Проект_I-510-4/М6,ohe__Проект_I-511,ohe__Проект_I-511-130/37,ohe__Проект_I-511-4/М-37,ohe__Проект_I-511-4/М22БН,ohe__Проект_I-513,ohe__Проект_I-515,ohe__Проект_II-18-01-МН,ohe__Проект_II-18-01/12,ohe__Проект_II-18-01/9,ohe__Проект_II-18-21/12,ohe__Проект_II-18-31/12,ohe__Проект_II-18-31/12А,ohe__Проект_II-18/22,"ohe__Проект_II-18/МИ ""ВАРИАНТ6""",ohe__Проект_II-29-04/Ю37,ohe__Проект_II-29-05/Ю37,ohe__Проект_II-29-41/37,ohe__Проект_II-49-04/М вар Д,ohe__Проект_II-49-04/Ю,ohe__Проект_II-49-06,ohe__Проект_II-49-06-ЮД,ohe__Проект_II-49-08/Ю,ohe__Проект_II-68-01,ohe__Проект_II-68-01/12,ohe__Проект_II-68-02/12К,ohe__Проект_II-68-02/16М,ohe__Проект_II-68-03/12Ю,ohe__Проект_IМГ-601Д,ohe__Проект_Башня Вулых,"ohe__Проект_ВПО ""КАСЧАД""",ohe__Проект_И-209А,ohe__Проект_И-441,ohe__Проект_И-491А,ohe__Проект_И-522,ohe__Проект_И-700А,ohe__Проект_И-759,ohe__Проект_КОПЭ,ohe__Проект_КТЖС,ohe__Проект_МГ-601,ohe__Проект_П-14,ohe__Проект_П-14/35,ohe__Проект_П-18-21/12,ohe__Проект_П-18/22,ohe__Проект_П-28,ohe__Проект_П-29,ohe__Проект_П-29-03/Ю37,ohe__Проект_П-29-05/м37,ohe__Проект_П-3/16,ohe__Проект_П-3/17,ohe__Проект_П-3/22,ohe__Проект_П-30,ohe__Проект_П-32,ohe__Проект_П-3м,ohe__Проект_П-4,ohe__Проект_П-42,ohe__Проект_П-43,ohe__Проект_П-44,ohe__Проект_П-44м,ohe__Проект_П-44т,ohe__Проект_П-45,ohe__Проект_П-46,ohe__Проект_П-46-2/12в,ohe__Проект_П-46м,ohe__Проект_П-47,ohe__Проект_П-49 Д,ohe__Проект_П-55,ohe__Проект_П-57,ohe__Проект_П-68,ohe__Проект_ПЗ-1/16,ohe__Проект_ПЗМ-3/16,ohe__Проект_индивидуальный проект,ohe__Проект_нет данных,ohe__Проект_унифицированный каркас,ohe__Материалы стен_железобетонные,ohe__Материалы стен_из железобетонных сегментов,ohe__Материалы стен_из легкобетонных панелей,ohe__Материалы стен_из мелких бетонных блоков,ohe__Материалы стен_из унифицированных железобетонных элементов,ohe__Материалы стен_каменные и бетонные,ohe__Материалы стен_каркасно-панельные,ohe__Материалы стен_кирпичные,ohe__Материалы стен_кирпичные облегченные,ohe__Материалы стен_крупноблочные,ohe__Материалы стен_крупнопанельные,ohe__Материалы стен_легкобетонные блоки,ohe__Материалы стен_легкобетонные блоки с утеплением,ohe__Материалы стен_монолитные (ж-б),ohe__Материалы стен_не определено,ohe__Материалы стен_панели керамзитобетонные,ohe__Материалы стен_панельные,ohe__Материалы стен_шлакобетонные,ohe__Признак аварийности здания _нет,ohe__Материалы кровли по БТИ_гидростеклоизол,ohe__Материалы кровли по БТИ_мягкая-совмещенная с рубероидным покрытием,ohe__Материалы кровли по БТИ_полиуритан,ohe__Материалы кровли по БТИ_прочая(черепица;щепа;дранка),ohe__Материалы кровли по БТИ_стальная,ohe__Материалы кровли по БТИ_толь-рубероид по деревянному настилу,num__Объём поданого теплоносителя в систему ЦО,num__Объём обратного теплоносителя из системы ЦО,num__Температура подачи,num__Температура обратки,num__Наработка часов счётчика,num__Количество этажей,num__Количество подъездов,num__Количество квартир,num__Общая площадь,num__Общая площадь жилых помещений,num__Общая площадь нежилых помещений,num__Количество пассажирских лифтов,num__Количество грузопассажирских лифтов
count,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0,3996.0
mean,0.00025,0.00025,0.005005,0.00025,0.00025,0.00025,0.00025,0.00025,0.00025,0.00025,0.002002,0.00025,0.010511,0.00025,0.01977,0.00025,0.00025,0.000751,0.00025,0.098348,0.001001,0.017017,0.00025,0.003504,0.002002,0.002503,0.001251,0.00025,0.00025,0.000501,0.000751,0.000751,0.006507,0.003253,0.00025,0.00025,0.008759,0.000501,0.00025,0.000751,0.000501,0.001251,0.001752,0.00025,0.03028,0.00025,0.000501,0.000501,0.001001,0.000501,0.003003,0.000751,0.001752,0.000501,0.001502,0.000501,0.0498,0.000751,0.016517,0.001251,0.00025,0.008759,0.004755,0.000501,0.006006,0.000501,0.001001,0.00025,0.00025,0.003754,0.012012,0.00025,0.002252,0.00025,0.01001,0.00025,0.003253,0.00025,0.035285,0.004505,0.004254,0.018018,0.00025,0.00025,0.086587,0.492242,0.00025,0.006507,0.195696,0.002503,0.001502,0.001001,0.00025,0.000751,0.515766,0.00025,0.010511,0.003754,0.013764,0.000751,0.002753,0.00025,0.003253,0.235235,0.003504,0.99975,0.005255,0.744995,0.000751,0.040541,0.15966,0.000751,0.0004360647,0.089774,0.088404,0.0002502503,0.067228,0.309822,0.253491,0.278134,0.276157,0.28949,0.161456,0.195738,0.069572
std,0.015819,0.015819,0.070578,0.015819,0.015819,0.015819,0.015819,0.015819,0.015819,0.015819,0.044705,0.015819,0.101993,0.015819,0.139226,0.015819,0.015819,0.027393,0.015819,0.297822,0.031627,0.129351,0.015819,0.059094,0.044705,0.049969,0.035355,0.015819,0.015819,0.022369,0.027393,0.027393,0.08041,0.056952,0.015819,0.015819,0.093189,0.022369,0.015819,0.027393,0.022369,0.035355,0.041822,0.015819,0.171379,0.015819,0.022369,0.022369,0.031627,0.022369,0.054724,0.027393,0.041822,0.022369,0.038725,0.022369,0.217558,0.027393,0.127467,0.035355,0.015819,0.093189,0.068799,0.022369,0.077275,0.022369,0.031627,0.015819,0.015819,0.06116,0.108953,0.015819,0.04741,0.015819,0.09956,0.015819,0.056952,0.015819,0.184523,0.066973,0.065094,0.133033,0.015819,0.015819,0.281263,0.500002,0.015819,0.08041,0.396785,0.049969,0.038725,0.031627,0.015819,0.027393,0.499814,0.015819,0.101993,0.06116,0.116523,0.027393,0.052401,0.015819,0.056952,0.424199,0.059094,0.015819,0.072311,0.435919,0.027393,0.197248,0.366336,0.027393,0.01594577,0.089635,0.024055,0.0158193,0.01619,0.181618,0.178915,0.12798,0.144798,0.160483,0.173905,0.225221,0.186759
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.58789e-23,0.029994,0.068101,3.660898e-14,0.066143,0.15,0.125,0.192429,0.192761,0.197256,0.015964,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,3.1212740000000003e-23,0.057338,0.096615,3.920892e-14,0.067642,0.309822,0.253491,0.255521,0.259778,0.262774,0.161456,0.195738,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,6.434965e-23,0.118788,0.105465,4.281393e-14,0.068526,0.35,0.375,0.303628,0.30589,0.299882,0.26767,0.285714,0.069572
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [17]:
# save the prepared preprocessing pipeline
with open("../location_p/model_training/data_pipe.pkl", "wb") as f:
    pickle.dump(data_preprocessor, f)

## Model training

The model type and its hyperparameters are selected with Bayesian optimization (Optuna with the TPE sampler).

In [18]:
X_train, X_test, y_train, y_test = train_test_split(
    ml_processed,
    ml_research[ml_target.columns],
    test_size=0.25,
    random_state=RANDOM_STATE,
)

In [19]:
# objective function for Optuna to optimize
def objective(trial):
  # the first search parameter is the model type
  classifier_name = trial.suggest_categorical("classifier", ["KNN", "SVC", "RandomForest"])
  if classifier_name == "KNN":
    # the remaining parameters depend on the chosen model
    n_n = trial.suggest_int("n_neighbors", 5, 15, log=True)
    classifier_obj = KNeighborsClassifier(n_neighbors=n_n)
  elif classifier_name == "SVC":
    svc_c = trial.suggest_int("svc_c", 1, 30, log=True)
    svc_kernel = trial.suggest_categorical("svc_kernel", ['linear', 'poly', 'rbf'])
    classifier_obj = SVC(
        C=svc_c,
        kernel=svc_kernel,
        probability=True,
        random_state=RANDOM_STATE)
  else:
    rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
    rf_min_split = trial.suggest_int("rf_min_split", 2, 5, log=True)
    classifier_obj = RandomForestClassifier(
        max_depth=rf_max_depth,
        min_samples_split=rf_min_split,
        random_state=RANDOM_STATE
        )

  # fit the model; MultiOutputClassifier trains one clone of it per target
  mo_opt = MultiOutputClassifier(classifier_obj).fit(X_train, y_train)
  # store the fitted model on the trial so the best one can be retrieved later
  trial.set_user_attr(key="best_model", value=mo_opt)
  # predict probabilities and average ROC-AUC over the targets
  y_pred = mo_opt.predict_proba(X_test)
  score = []
  for i in range(len(y_test.columns)):
    try:
      score.append(roc_auc_score(y_test.iloc[:, i].values, y_pred[i][:, 1]))
    except (ValueError, IndexError):
      # ROC-AUC is undefined when a target has a single class in the test split
      # (or in the training split, leaving predict_proba with one column); score it as 0
      score.append(0)
  roc_auc = sum(score) / len(score)

  # return the metric that Optuna maximizes
  return roc_auc
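
The objective above averages ROC-AUC over the targets and scores a target as 0 whenever the metric cannot be computed. Below is a minimal pure-Python sketch of that averaging, assuming binary 0/1 labels. The helpers `roc_auc` and `mean_roc_auc` and the toy data are illustrative and not part of the notebook or of scikit-learn. The sketch computes ROC-AUC as the share of positive/negative pairs that are ranked correctly, and compares the zero fallback with simply skipping undefined targets.

```python
from statistics import mean


def roc_auc(y_true, y_score):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_roc_auc(y_true_cols, y_score_cols, fallback=0.0):
    """Average per-target ROC-AUC.

    Undefined targets contribute `fallback`; with fallback=None they are skipped.
    """
    scores = []
    for y_true, y_score in zip(y_true_cols, y_score_cols):
        try:
            scores.append(roc_auc(y_true, y_score))
        except ValueError:
            if fallback is not None:
                scores.append(fallback)
    return mean(scores)


# Toy data (made up): two targets, the second has no positives in the test split.
y_true_cols = [[0, 0, 1, 1], [0, 0, 0, 0]]
y_score_cols = [[0.1, 0.4, 0.35, 0.8], [0.2, 0.1, 0.3, 0.2]]

as_in_notebook = mean_roc_auc(y_true_cols, y_score_cols, fallback=0.0)
skipping = mean_roc_auc(y_true_cols, y_score_cols, fallback=None)
print(as_in_notebook, skipping)  # 0.375 0.75
```

With the zero fallback, every target that lacks one of the classes in the test split lowers the averaged score by the same amount for all candidate models, so the ranking of trials is mostly unaffected, but the absolute value reported by the study understates the quality on the targets that can be evaluated.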



In [20]:
# callback that copies the best model onto the study so it can be used directly
def callback(study, trial):
    if study.best_trial.number == trial.number:
        study.set_user_attr(key="best_model", value=trial.user_attrs["best_model"])

In [21]:
sampler = optuna.samplers.TPESampler(seed=42)
study = optuna.create_study(direction="maximize", sampler = sampler)
study.optimize(objective, n_trials=30, callbacks=[callback])
best_model=study.user_attrs["best_model"]
print(study.best_trial)
print(best_model)

[I 2024-08-01 13:50:34,162] A new study created in memory with name: no-name-d1834fd9-1cf3-45af-bf43-ad75030ef01e
[I 2024-08-01 13:50:41,775] Trial 0 finished with value: 0.6382110437977179 and parameters: {'classifier': 'SVC', 'svc_c': 6, 'svc_kernel': 'linear'}. Best is trial 0 with value: 0.6382110437977179.
[I 2024-08-01 13:50:42,092] Trial 1 finished with value: 0.6754870517177475 and parameters: {'classifier': 'KNN', 'n_neighbors': 5}. Best is trial 1 with value: 0.6754870517177475.
[I 2024-08-01 13:50:42,249] Trial 2 finished with value: 0.6770964611492104 and parameters: {'classifier': 'KNN', 'n_neighbors': 6}. Best is trial 2 with value: 0.6770964611492104.
[I 2024-08-01 13:50:44,072] Trial 3 finished with value: 0.7125630291268634 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6, 'rf_min_split': 2}. Best is trial 3 with value: 0.7125630291268634.
[I 2024-08-01 13:50:44,358] Trial 4 finished with value: 0.6719878334864277 and parameters: {'classifier': 'KNN', '

FrozenTrial(number=21, state=TrialState.COMPLETE, values=[0.7283709540413993], datetime_start=datetime.datetime(2024, 8, 1, 13, 51, 52, 265338), datetime_complete=datetime.datetime(2024, 8, 1, 13, 51, 54, 640324), params={'classifier': 'RandomForest', 'rf_max_depth': 12, 'rf_min_split': 4}, user_attrs={'best_model': MultiOutputClassifier(estimator=RandomForestClassifier(max_depth=12,
                                                       min_samples_split=4,
                                                       random_state=42))}, system_attrs={}, intermediate_values={}, distributions={'classifier': CategoricalDistribution(choices=('KNN', 'SVC', 'RandomForest')), 'rf_max_depth': IntDistribution(high=32, log=True, low=2, step=1), 'rf_min_split': IntDistribution(high=5, log=True, low=2, step=1)}, trial_id=21, value=None)
MultiOutputClassifier(estimator=RandomForestClassifier(max_depth=12,
                                                       min_samples_split=4,
                       

In [22]:
y_pred = best_model.predict_proba(X_test)

for i, target in enumerate(y_test.columns):
  try:
    auc = roc_auc_score(y_test.iloc[:, i].values, y_pred[i][:, 1])
    print(f'ROC-AUC on the test set for target {target}: {auc}')
  except (ValueError, IndexError):
    print(f'Could not compute ROC-AUC for target {target}')

ROC-AUC on the test set for target Температура в квартире ниже нормативной: 0.9109827180693323
ROC-AUC on the test set for target T1 > max: 0.9301647247890719
ROC-AUC on the test set for target Отсутствие отопления в доме: 0.6111343003234895
ROC-AUC on the test set for target Сильная течь в системе отопления: 0.6159330011074198
ROC-AUC on the test set for target Течь в системе отопления: 0.5736400259176833

The targets, in order: apartment temperature below the norm, T1 > max, no heating in the building, severe leak in the heating system, leak in the heating system.
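
`MultiOutputClassifier.predict_proba` returns a list with one probability table per target, which is why the loop above indexes `y_pred[i]`. For per-building output it is often handier to have one row per sample. Below is a minimal sketch of that reshaping, assuming every table has two columns ordered as class 0, class 1 (which holds only when both classes were present in training). The helpers `positive_class_matrix` and `rank_targets`, the target names, and the numbers are hypothetical; nested lists stand in for the arrays scikit-learn returns.

```python
def positive_class_matrix(proba_per_target):
    """Convert per-target (n_samples, 2) tables into rows of P(class 1), one row per sample."""
    positive = [[row[1] for row in table] for table in proba_per_target]
    # zip(*...) transposes from target-major to sample-major order
    return [list(sample) for sample in zip(*positive)]


def rank_targets(sample_probs, target_names):
    """Pair each target name with its probability and sort from most to least likely."""
    return sorted(zip(target_names, sample_probs), key=lambda pair: pair[1], reverse=True)


# Toy stand-in for best_model.predict_proba(X_test): 2 targets, 3 samples.
proba_per_target = [
    [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]],
]
target_names = ["target_a", "target_b"]  # hypothetical names

matrix = positive_class_matrix(proba_per_target)
ranked_first = rank_targets(matrix[0], target_names)
print(matrix)        # [[0.1, 0.3], [0.8, 0.4], [0.5, 0.9]]
print(ranked_first)  # [('target_b', 0.3), ('target_a', 0.1)]
```

The same functions work on the real output, because they only iterate over rows and read the element at index 1.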


In [23]:
# save the trained model
with open("../location_p/model_training/rf_zkh.pkl", "wb") as f:
    pickle.dump(best_model, f)
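
Both `pickle.dump` calls in this notebook assume that the target directory already exists. Below is a minimal sketch of a save/load pair that creates the directory when needed and closes the file handles. The helpers `save_artifact` and `load_artifact` are hypothetical, and a plain dict stands in for the fitted model so that the example runs without scikit-learn.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, path):
    """Pickle obj to path, creating parent directories if they are missing."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return path


def load_artifact(path):
    """Load a pickled object; only use this on files from a trusted source."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Stand-in for the fitted model (hypothetical); a temp dir keeps the demo self-contained.
stand_in_model = {"estimator": "RandomForest", "max_depth": 12, "min_samples_split": 4}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_artifact(stand_in_model, Path(tmp) / "model_training" / "model.pkl")
    restored = load_artifact(saved)

print(restored == stand_in_model)  # True
```

Pickled scikit-learn models are not guaranteed to load under a different library version, so it helps to record the versions used for training next to the file.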