## Machine Learning in Business
### Lesson 9. Integration. Final project - client side for testing the model's Flask-based REST API

Implementing a REST API with Flask

1. Take the *data* sample from the lesson material.
2. In the send_json(data) function, set myurl to the local machine's address, since the check runs in Jupyter Notebook.
3. Run response = send_json(data) manually against a server started beforehand; changing the data yields a different prediction value.
4. Load the previously saved X_test and y_test data and simulate sending many requests.

In [1]:
import requests
import urllib.request
import json 

In [28]:
# Sample data
data = ( 
    "Stylect is a dynamic startup that helps helps women discover and buy shoes. We’re a small team based in London that has previously worked at Google, Techstars, Pixelmator and Rocket Internet.We place a high premium on simplicity no matter what we’re working on (i.e. design, programming, marketing). We’re also a team that ships fast. We built version 1 of our app in a week, the next release (built in a month) was featured in the Apple Appstore Italy as a best new fashion app. Fast release cycles are challenging, but also very fun - which is why we love them.\xa0As we’ve grown, the projects that we’re working on have grown both in scale and in technical complexity. \xa0Stylect is looking for someone who can help us improve our backend which gathers product data; analyses/categorizes it; and shows it to thousands of users daily. Each step in the process has unique challenges that demands a strong technical background.",
    "ustwo offers you the opportunity to be yourself, whilst delivering the best work on the planet for some of the biggest and most innovative brands. A culture thriving on collaboration underpins what is an amazing work smart/ live well environment.We genuinely care about the work that we deliver and the people who help make it all possible. We only invest in projects, people and practices that we believe in, to ensure we remain excited about every opportunity.",
    "We are negotiable on salary and there is the potential for equity for the right candidate."
)

**Manual single request to the server**

In [29]:
# build and send the request
def send_json(x):
    # x is a (description, company_profile, benefits) triple
    description, company_profile, benefits = x
    #print(description, company_profile, benefits)
    body = {
        'description': description,
        'company_profile': company_profile,
        'benefits': benefits
        }
    #myurl = 'http://948a-35-230-58-89.ngrok.io/' + '/predict'
    # local server; no trailing slash here, so the path is '/predict' rather than '//predict'
    myurl = 'http://127.0.0.1:5000' + '/predict'
    headers = {'content-type': 'application/json; charset=utf-8'}
    response = requests.post(myurl, json=body, headers=headers)
    return response.json()['predictions']
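
The notebook imports `urllib.request` and `json` but only uses `requests`. As a rough sketch, the same call can be made with the standard library alone, with an explicit timeout so a stopped server does not hang the cell. The names `BASE_URL`, `make_request` and `send_json_stdlib` are hypothetical and not part of the lesson material; the address assumes the local Flask server on port 5000.

```python
import json
import urllib.request

# Assumption: the lesson's Flask server listens locally on port 5000.
BASE_URL = "http://127.0.0.1:5000"


def make_request(x, base_url=BASE_URL):
    """Build (but do not send) a POST request for one record."""
    description, company_profile, benefits = x
    body = {
        "description": description,
        "company_profile": company_profile,
        "benefits": benefits,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",  # exactly one slash before 'predict'
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def send_json_stdlib(x, base_url=BASE_URL, timeout=10.0):
    """Send one record and return the 'predictions' field of the reply."""
    # urlopen raises HTTPError on 4xx/5xx replies and URLError when the
    # server cannot be reached, so failures are not silently ignored.
    with urllib.request.urlopen(make_request(x, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["predictions"]
```

With the server running, `send_json_stdlib(data)` should return the same kind of value as `send_json(data)`.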

In [30]:
# query the server with a single record (the data tuple built by hand above)
response = send_json(data)
print('prediction', response)

prediction 0.0011293598194111265


### Testing the handling of multiple requests

1. Load X_test and y_test and fill the missing values in X_test with empty strings

In [31]:
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve, precision_recall_curve
from sklearn.metrics import f1_score
from urllib import request, parse

In [33]:
X_test = pd.read_csv("X_test.csv")
y_test = pd.read_csv("y_test.csv")

In [34]:
X_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5901 entries, 0 to 5900
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   job_id               5901 non-null   int64 
 1   title                5901 non-null   object
 2   location             5800 non-null   object
 3   department           2078 non-null   object
 4   salary_range         954 non-null    object
 5   company_profile      4742 non-null   object
 6   description          5900 non-null   object
 7   requirements         5011 non-null   object
 8   benefits             3522 non-null   object
 9   telecommuting        5901 non-null   int64 
 10  has_company_logo     5901 non-null   int64 
 11  has_questions        5901 non-null   int64 
 12  employment_type      4759 non-null   object
 13  required_experience  3556 non-null   object
 14  required_education   3230 non-null   object
 15  industry             4262 non-null   object
 16  functi

In [35]:
y_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5901 entries, 0 to 5900
Data columns (total 1 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   fraudulent  5901 non-null   int64
dtypes: int64(1)
memory usage: 46.2 KB


In [36]:
X_test.fillna('', inplace=True)   # Fill missing values with empty strings
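
A likely reason this step matters: pandas loads missing text cells as NaN, and strict JSON has no NaN literal, so such a value cannot be sent cleanly in the request body. The sketch below shows the problem and a per-value cleanup using only the standard library. The helper `clean_text` and the sample `record` are hypothetical and serve only as an illustration.

```python
import json
import math


def clean_text(value):
    """Return '' for None or float NaN, otherwise the value as a string."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value)


# Hypothetical record with two missing fields, as they might come from a CSV row.
record = {"description": "text", "company_profile": float("nan"), "benefits": None}

try:
    # allow_nan=False enforces strict JSON and rejects the NaN value.
    json.dumps(record, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

cleaned = {key: clean_text(value) for key, value in record.items()}
payload = json.dumps(cleaned, allow_nan=False)  # succeeds after cleaning
```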

In [37]:
X_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5901 entries, 0 to 5900
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   job_id               5901 non-null   int64 
 1   title                5901 non-null   object
 2   location             5901 non-null   object
 3   department           5901 non-null   object
 4   salary_range         5901 non-null   object
 5   company_profile      5901 non-null   object
 6   description          5901 non-null   object
 7   requirements         5901 non-null   object
 8   benefits             5901 non-null   object
 9   telecommuting        5901 non-null   int64 
 10  has_company_logo     5901 non-null   int64 
 11  has_questions        5901 non-null   int64 
 12  employment_type      5901 non-null   object
 13  required_experience  5901 non-null   object
 14  required_education   5901 non-null   object
 15  industry             5901 non-null   object
 16  functi

In [38]:
X_test[['description', 'company_profile', 'benefits']]

| | description | company_profile | benefits |
|---|---|---|---|
| 0 | Stylect is a dynamic startup that helps helps ... | | We are negotiable on salary and there is the p... |
| 1 | General Summary: Achieves maximum sales profit... | | Great Health and DentalFast Advancement Opport... |
| 2 | At ustwo™ you get to be yourself, whilst deliv... | ustwo offers you the opportunity to be yoursel... | |
| 3 | About EDITDEDITD runs the world's biggest appa... | We build software for fashion retailers, to he... | |
| 4 | As a Web Engineer at Runscope you'll be respon... | Runscope is building tools for developers work... | Be a part of an experienced team who have work... |
| ... | ... | ... | ... |
| 5896 | Fabrication and Printing Company in Long Islan... | | Excellent opportunity to learn exci... |
| 5897 | Serve as the primary lead and project manager ... | Palerra, Inc. designed LORIC™ to protect an en... | What's In It For You?Competitive compensation ... |
| 5898 | Jiffy, a world wide leader in mobile applicati... | Jiffy Worldwide is the parent company of the J... | |
| 5899 | What our client needs…A Medical Director who i... | Human capital is usually the biggest asset and... | |


2. Single request to the server using data from X_test

In [39]:
response = send_json(X_test[['description', 'company_profile', 'benefits']].iloc[0,:])
print('prediction', response)

prediction 0.026022025069640166


3. Multiple requests to the server, timing the execution of *N* requests

In [48]:
N = 150

In [49]:
%%time
predictions = (
    X_test[['description', 'company_profile', 'benefits']]
    .iloc[:N]
    .apply(send_json, axis=1)   # one POST request per row
)

Wall time: 10.4 s
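
The run above sends the requests one after another. To imitate several clients hitting the server at once, the same rows can be dispatched from a thread pool. This is a minimal sketch: `predict_many` is a hypothetical helper, and whether it shortens the wall time depends on whether the server handles requests in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_many(rows, send_fn, max_workers=8):
    """Call send_fn on every row concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which request finishes first.
        return list(pool.map(send_fn, rows))


# Possible use in the notebook (assumes X_test, N and send_json from above):
# rows = X_test[['description', 'company_profile', 'benefits']].iloc[:N]
# predictions = pd.Series(
#     predict_many(rows.itertuples(index=False, name=None), send_json)
# )
```

Because the results come back in input order, they can be compared with `y_test[:N]` exactly as in the sequential version.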


In [50]:
print(predictions.shape)
predictions.values[:10]

(150,)


array([0.02602203, 0.04317015, 0.00370601, 0.00112958, 0.00151454,
       0.00213981, 0.00256837, 0.00373913, 0.00069803, 0.0122069 ])

4. Evaluating the metrics

In [51]:
import numpy as np

In [52]:
precision, recall, thresholds = precision_recall_curve(y_test[:N], predictions)

fscore = (2 * precision * recall) / (precision + recall)
# locate the index of the largest f score
ix = np.argmax(fscore)
print(f'Best Threshold = {thresholds[ix]}, F-Score = {fscore[ix]:.3f}, Precision = {precision[ix]:.3f}, Recall = {recall[ix]:.3f}')

Best Threshold = 0.7333309720872493, F-Score = 0.667, Precision = 1.000, Recall = 0.500
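
Two details of `precision_recall_curve` are easy to trip over in the cell above: `precision` and `recall` have one more element than `thresholds` (the final point, precision 1 and recall 0, has no threshold), and the F-score becomes 0/0 wherever both precision and recall are zero. The sketch below is one possible guard against both, not the lesson's method; `best_f1_threshold` is a hypothetical helper name.

```python
import numpy as np


def best_f1_threshold(precision, recall, thresholds):
    """Return (threshold, f1, precision, recall) at the largest F1 score."""
    # Drop the final curve point, which has no matching threshold.
    precision = np.asarray(precision, dtype=float)[:-1]
    recall = np.asarray(recall, dtype=float)[:-1]
    thresholds = np.asarray(thresholds, dtype=float)

    denom = precision + recall
    # Where precision + recall == 0, define F1 as 0 instead of NaN.
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)

    ix = int(np.argmax(f1))
    return (float(thresholds[ix]), float(f1[ix]),
            float(precision[ix]), float(recall[ix]))
```

It takes the three arrays returned by `precision_recall_curve` unchanged, for example `best_f1_threshold(precision, recall, thresholds)`.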


In [53]:
roc_auc_score(y_score=predictions.values, y_true=y_test.values[:N])

0.9606481481481481

### Conclusion:
The client and server work, and the Flask features were reproduced from the lesson material. Overall, the server and client implementation resembles the event-driven programming approach of IDEs such as Borland Delphi or MS Visual Basic. Deploying a model that draws on the rich capabilities of its external system environment clearly has much in common with deploying a desktop application. "The circle is complete" (c). It is probably technically possible to detect automatically where the notebook is running, in Colab or in Jupyter Notebook, and to run the code fragments specific to each environment accordingly.
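
One common way to tell the two environments apart is to check whether the `google.colab` package has been loaded into the running kernel. The sketch below relies on that assumption; `running_in_colab` and `pick_base_url` are hypothetical helpers, and the tunnel URL has to be supplied by the user.

```python
import sys


def running_in_colab():
    """Heuristic: Colab kernels have the 'google.colab' package loaded."""
    return "google.colab" in sys.modules


def pick_base_url(local_url="http://127.0.0.1:5000", tunnel_url=None):
    """Use the tunnel URL in Colab when one is given, else the local address."""
    if running_in_colab() and tunnel_url:
        return tunnel_url
    return local_url
```

A cell could then branch on `running_in_colab()` to run environment-specific setup, such as starting a tunnel only in Colab.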