<center>
  <font size="7">API Unit Tests</font><br>
  <font size="5">Project 7 - Implement a scoring model</font>
</center>
<div align="right">
  <font size="4"><i>by Jean Vallée</i></font>
</div>

<hr size=5>

# Unit test scenario
Since the model is a binary classifier, the unit test consists of:
- supplying the attribute values to the model via a form
- having the model's API predict the target variable

The lightweight web micro-framework [_Flask_](https://flask.palletsprojects.com/en/3.0.x/quickstart/) handles the REST requests and responses. See the [Web server](#web_server) section.

In [41]:
import pandas as pd
import json
with open('../config.json') as file_object:
    dict_config = json.load(file_object)

# Tests with _pytest_

## Installing _pytest_

In [43]:
! pip install pytest



## Test catalogue
The test files are written from this notebook so that connection parameters, such as a server's address and port, are easy to update.

### Test 1: connection and input format
Checks the connection to the server and the format of the input data.

In [246]:
dir_test_data = '../test_api/data/'
path_test_1 = './test_1.py'

#### Connection parameters

In [247]:
ip_server           = dict_config['ip_host']
port_staging_server = dict_config['port_staging']
url_staging_server  = ip_server + ':' + port_staging_server + '/invocations'

#### Connection check

In [248]:
! curl -v "http://4.233.201.217:5677/version" 

*   Trying 4.233.201.217:5677...
* TCP_NODELAY set
* Connected to 4.233.201.217 (4.233.201.217) port 5677 (#0)
> GET /version HTTP/1.1
> Host: 4.233.201.217:5677
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Tue, 25 Jun 2024 11:21:42 GMT
< Connection: close
< Content-Type: application/json
< Content-Length: 6
< 
* Closing connection 0
2.14.1
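
The same check can be scripted from Python instead of the shell. The sketch below is a minimal illustration using only the standard library; the helper names `build_url` and `fetch_version` are hypothetical and not part of the project, and the host and port in the commented call are the ones from the `curl` command above.

```python
from urllib.request import urlopen

def build_url(host, port, path):
    """Assemble an http:// URL from its parts (hypothetical helper)."""
    return 'http://' + host + ':' + str(port) + '/' + path.lstrip('/')

def fetch_version(host, port, timeout=5):
    """Return the body of GET /version; urlopen raises if the server is unreachable."""
    with urlopen(build_url(host, port, 'version'), timeout=timeout) as response:
        return response.read().decode('utf-8').strip()

print(build_url('4.233.201.217', 5677, '/version'))

# Example call (requires network access to the staging server):
# fetch_version('4.233.201.217', 5677)
```

Setting an explicit `timeout` keeps a test run from hanging when the server is down.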

#### Contents of _test_1.py_

In [249]:
str_test_1 = \
'''\
import requests
def test_connection():    
    print('__________Connection to API Server_____________')
    req_post = requests.post(   url     = 'http://ip_server:port_server/invocations', 
                                headers = {'Content-Type': 'application/json'}, 
                                data    = '{"dataframe_split": str_dataframe_split }' )
    # Evaluation of response
    is_connected = req_post.ok
    if is_connected : print('OK : Connected to API Server & validated input format')
    assert is_connected
'''

**Updating the contents**

The placeholder strings are replaced with the values of the variables:
- 1. Connection to the server via a POST request 

In [250]:
str_test_1 = str_test_1 .replace('ip_server',   ip_server) \
                        .replace('port_server', port_staging_server)

- 2. Data in a valid format (see [Deploy MLflow Model](https://mlflow.org/docs/latest/deployment/deploy-model-locally.html))

In [251]:
path_TP = '../modeling/data/out/X_TP.csv'   # True Positives dataset
df_TP = pd.read_csv(path_TP)
df_TP_sample_1 = df_TP.head(1)
str_dataframe_split = str(df_TP_sample_1.to_dict(orient="split")).replace("'", '"')
str_dataframe_split

'{"index": [0], "columns": ["NAME_EDUCATION_TYPE_Higher_education", "EXT_SOURCE_3", "EXT_SOURCE_2", "CODE_GENDER_M", "NAME_EDUCATION_TYPE_Secondary_or_secondary_special", "NAME_CONTRACT_TYPE_Cash_loans", "NAME_INCOME_TYPE_Working"], "data": [[0, 0.5638350489514956, 0.3058183171273599, 0, 1, 1, 1]]}'
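
Building the JSON text with `str(...).replace("'", '"')` works for this sample, but it relies on no value containing an apostrophe and on no Python-only literal such as `True` or `None` appearing in the data. A minimal sketch of a more robust alternative, assuming the same dictionary layout as above, serialises it with `json.dumps`; the variable names and the shortened column list are illustrative only.

```python
import json

# Same structure as the output of to_dict(orient="split"), shortened to two columns
dict_split = {
    "index": [0],
    "columns": ["EXT_SOURCE_3", "CODE_GENDER_M"],
    "data": [[0.5638350489514956, 0]],
}

# json.dumps always emits valid JSON (double quotes, true/false/null)
str_payload = json.dumps({"dataframe_split": dict_split})
print(str_payload)

# Round trip: the payload parses back to the original dictionary
assert json.loads(str_payload)["dataframe_split"] == dict_split
```

With pandas, `DataFrame.to_json(orient="split")` is another way to obtain valid JSON directly from the data frame.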

In [252]:
str_test_1 = str_test_1.replace('str_dataframe_split', str_dataframe_split)
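
Chained `str.replace` calls substitute every occurrence of a placeholder name, including any that happen to appear inside other text. As a hedged alternative, the sketch below uses `string.Template` from the standard library, where placeholders are marked explicitly with `$`. The template text, the address and the payload are simplified stand-ins for the ones in this notebook.

```python
from string import Template

template_test = Template('''\
import requests
def test_connection():
    req_post = requests.post(url='http://$ip_server:$port_server/invocations',
                             headers={'Content-Type': 'application/json'},
                             data='{"dataframe_split": $str_dataframe_split}')
    assert req_post.ok
''')

# substitute() raises KeyError if any placeholder is left without a value
str_test = template_test.substitute(
    ip_server='127.0.0.1',                 # placeholder address for the example
    port_server='5677',
    str_dataframe_split='{"columns": ["EXT_SOURCE_3"], "data": [[0.5]]}',
)
print(str_test)
```

Because `substitute` fails loudly on a missing value, a forgotten parameter is caught when the file is generated rather than when the test runs.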

**Creating the file**

In [253]:
with open(path_test_1, "w") as file_object:
    print(str_test_1, file=file_object)

In [254]:
!cat $path_test_1

import requests
def test_connection():    
    print('__________Connection to API Server_____________')
    req_post = requests.post(   url     = 'http://4.233.201.217:5677/invocations', 
                                headers = {'Content-Type': 'application/json'}, 
                                data    = '{"dataframe_split": {"index": [0], "columns": ["NAME_EDUCATION_TYPE_Higher_education", "EXT_SOURCE_3", "EXT_SOURCE_2", "CODE_GENDER_M", "NAME_EDUCATION_TYPE_Secondary_or_secondary_special", "NAME_CONTRACT_TYPE_Cash_loans", "NAME_INCOME_TYPE_Working"], "data": [[0, 0.5638350489514956, 0.3058183171273599, 0, 1, 1, 1]]} }' )
    # Evaluation of response
    is_connected = req_post.ok
    if is_connected : print('OK : Connected to API Server & validated input format')
    assert is_connected



### Test 2: prediction
Compares the predictions for a set of 3 observations expected to belong to class 1.

In [23]:
path_test_2 = './test_2.py'

In [433]:
str_test_2 = \
'''
import requests
import pandas as pd
import json

def test_predict():
    dict_inputs = str_dataframe_split                     # input observations      
    df_result_grid = pd.DataFrame(data=dict_inputs['data'], columns=dict_inputs['columns'])
    df_result_grid['target'] = str_li_targets      # expected output

    # Send input data to prediction API via POST request 
    req_post = requests.post(   url     = 'http://ip_server:port_server/invocations', 
                                headers = {'Content-Type': 'application/json'}, 
                                data    = '{"dataframe_split": str_dataframe_split }' )
    
    dict_predicted = json.loads(req_post.text)  # output predictions {'predictions': [1, 1, 1]}
    df_result_grid['prediction'] = dict_predicted['predictions']
    print(df_result_grid.round(3).T)            # results as pandas

    # Compare predicted vs. expected values
    assert df_result_grid['prediction'].equals(df_result_grid['target'])
'''

**Updating the contents**

The placeholder strings are replaced with the values of the variables:
- 1. Connection to the server via a POST request 

In [434]:
str_test_2 = str_test_2 .replace('ip_server',   ip_server) \
                        .replace('port_server', port_staging_server)

- 2. Set of 3 observations

In [435]:
nb_observations = 3
path_TP = '../modeling/data/out/X_TP.csv'   # True Positives dataset
df_TP = pd.read_csv(path_TP)
df_TP_sample_N = df_TP.tail(nb_observations)
str_dataframe_split = str(df_TP_sample_N.to_dict(orient="split")).replace("'", '"')
str_dataframe_split

'{"index": [2582, 2583, 2584], "columns": ["NAME_EDUCATION_TYPE_Higher_education", "EXT_SOURCE_3", "EXT_SOURCE_2", "CODE_GENDER_M", "NAME_EDUCATION_TYPE_Secondary_or_secondary_special", "NAME_CONTRACT_TYPE_Cash_loans", "NAME_INCOME_TYPE_Working"], "data": [[0, 0.3944954053123993, 0.2731892846252296, 0, 1, 1, 1], [0, 0.1566398270314114, 0.5342365946929042, 1, 1, 1, 0], [0, 0.1176137317080569, 0.4562911010223129, 0, 1, 1, 0]]}'

In [436]:
str_test_2 = str_test_2.replace('str_dataframe_split', str_dataframe_split)

- 3. Expected targets

In [437]:
li_targets = [1] * nb_observations
str_li_targets = str(li_targets)
str_li_targets

'[1, 1, 1]'

In [438]:
str_test_2 = str_test_2 .replace('str_li_targets', str_li_targets)
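
The comparison at the end of `test_predict` can also be written without pandas. The sketch below is an illustration under the assumption, taken from the comment in the template, that the API answers with a body of the form `{"predictions": [...]}`. The function name `find_mismatches` is hypothetical.

```python
import json

def find_mismatches(response_text, li_targets):
    """Return (position, predicted, expected) for every prediction that differs."""
    li_predicted = json.loads(response_text)['predictions']
    if len(li_predicted) != len(li_targets):
        raise ValueError('expected %d predictions, got %d'
                         % (len(li_targets), len(li_predicted)))
    return [(i, p, t)
            for i, (p, t) in enumerate(zip(li_predicted, li_targets))
            if p != t]

# Simulated response bodies (no server call is made here)
all_match = find_mismatches('{"predictions": [1, 1, 1]}', [1, 1, 1])
one_wrong = find_mismatches('{"predictions": [1, 0, 1]}', [1, 1, 1])
print(all_match, one_wrong)
```

In a _pytest_ test, asserting that the returned list is empty makes the failing positions visible in the assertion message.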

**Creating the file**

In [439]:
with open(path_test_2, "w") as file_object:
    print(str_test_2, file=file_object)

In [440]:
!cat $path_test_2


import requests
import pandas as pd
import json

def test_predict():
    dict_inputs = {"index": [2582, 2583, 2584], "columns": ["NAME_EDUCATION_TYPE_Higher_education", "EXT_SOURCE_3", "EXT_SOURCE_2", "CODE_GENDER_M", "NAME_EDUCATION_TYPE_Secondary_or_secondary_special", "NAME_CONTRACT_TYPE_Cash_loans", "NAME_INCOME_TYPE_Working"], "data": [[0, 0.3944954053123993, 0.2731892846252296, 0, 1, 1, 1], [0, 0.1566398270314114, 0.5342365946929042, 1, 1, 1, 0], [0, 0.1176137317080569, 0.4562911010223129, 0, 1, 1, 0]]}                     # input observations      
    df_result_grid = pd.DataFrame(data=dict_inputs['data'], columns=dict_inputs['columns'])
    df_result_grid['target'] = [1, 1, 1]      # expected output

    # Send input data to prediction API via POST request 
    req_post = requests.post(   url     = 'http://4.233.201.217:5677/invocations', 
                                headers = {'Content-Type': 'application/json'}, 
                                data    = '{"dataframe_sp

## Running the tests and results

In [441]:
!python -m pytest --import-mode=append -rA ../test_api/

platform linux -- Python 3.10.12, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/user_n/Documents/Dev/git/project_7/test_api
plugins: anyio-4.4.0, Faker-25.9.1
collected 2 items

test_1.py .                                                              [ 50%]
test_2.py .                                                              [100%]

_______________________________ test_connection ________________________________
----------------------------- Captured stdout call -----------------------------
__________Connection to API Server_____________
OK : Connected to API Server & validated input format
_________________________________ test_predict _________________________________
----------------------------- Captured stdout call -----------------------------
                                                        0      1      2
NAME_EDUCATION_TYPE_Higher_education   

**Observation**: both unit tests pass.

# Tests via _curl_

In [442]:
import subprocess

Retrieving the list of attributes

In [443]:
with open(dir_test_data + 'li_features.txt') as file_object:
    str_li_features = file_object.read()
str_li_features = str_li_features.replace('\'', '"').replace('\n', '')
str_li_features

'["NAME_EDUCATION_TYPE_Higher_education", "EXT_SOURCE_3", "EXT_SOURCE_2", "CODE_GENDER_M", "NAME_EDUCATION_TYPE_Secondary_or_secondary_special", "NAME_CONTRACT_TYPE_Cash_loans", "NAME_INCOME_TYPE_Working"]'

In [447]:
def get_single_prediction(str_values, str_expected) :
    shell_command = 'curl -d \'{"dataframe_split": { "columns": ' + str_li_features \
        + ', "data": [[' + str_values + ']]}}\' -H \'Content-Type: application/json\' -X POST ' \
        + url_staging_server + ' --no-progress-meter'
    print('Shell command :', shell_command, '\n')
    print(subprocess.getoutput(shell_command))
    print('expected :', str_expected, 'for features values:', str_values)

In [448]:
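# Assumed definition: dict_TP_sample_1 is not defined in the cells shown in this notebook
dict_TP_sample_1 = df_TP_sample_1.to_dict(orient='records')[0]
str_values_TP_sample_1 = ','.join([str(value) for value in dict_TP_sample_1.values()])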
str_values_TP_sample_1

'0,0.5638350489514956,0.3058183171273599,0,1,1,1'

In [449]:
get_single_prediction(str_values_TP_sample_1, '1')

Shell command : curl -d '{"dataframe_split": { "columns": ["NAME_EDUCATION_TYPE_Higher_education", "EXT_SOURCE_3", "EXT_SOURCE_2", "CODE_GENDER_M", "NAME_EDUCATION_TYPE_Secondary_or_secondary_special", "NAME_CONTRACT_TYPE_Cash_loans", "NAME_INCOME_TYPE_Working"], "data": [[0,0.5638350489514956,0.3058183171273599,0,1,1,1]]}}' -H 'Content-Type: application/json' -X POST 4.233.201.217:5677/invocations --no-progress-meter 

{"predictions": [1]}
expected : 1 for features values: 0,0.5638350489514956,0.3058183171273599,0,1,1,1
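
Concatenating the command into a single shell string is sensitive to quoting. A minimal sketch of an alternative, assuming the same endpoint and payload layout, builds the argument list with `json.dumps` and passes it to `subprocess.run` without a shell. `build_curl_args` and `run_curl` are hypothetical helpers, the URL is a placeholder, and the `curl` options are the ones already used above.

```python
import json
import subprocess

def build_curl_args(url, li_columns, li_values):
    """Build the curl argument list for one observation (no shell quoting needed)."""
    payload = json.dumps({'dataframe_split': {'columns': li_columns,
                                              'data': [li_values]}})
    return ['curl', '-d', payload,
            '-H', 'Content-Type: application/json',
            '-X', 'POST', url, '--no-progress-meter']

def run_curl(url, li_columns, li_values):
    """Run curl and return its standard output (needs curl and a reachable server)."""
    args = build_curl_args(url, li_columns, li_values)
    return subprocess.run(args, capture_output=True, text=True).stdout

args = build_curl_args('http://127.0.0.1:5677/invocations',   # placeholder URL
                       ['EXT_SOURCE_3', 'CODE_GENDER_M'], [0.56, 0])
print(args)
```

Passing a list rather than a string means the JSON payload reaches `curl` unchanged, whatever characters it contains.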


# Tests via the API web interface

### Online access

In [450]:
port_web_server = dict_config['port_flask']
print('Website URL =' , 'http://' + ip_server + ':' + port_web_server)

Website URL = http://4.233.201.217:6543


### File tree

In [453]:
! find ../website | grep -v "checkpoint" | sed -e "s/[^\/]*\//  |/g" -e "s/|\([^ ]\)/└── \1/"

  └── website
  |  └── web_router.py
  |  └── templates
  |  |  └── report.html
  |  |  └── index.html
  |  |  └── result.html
  |  |  └── form.html
  |  |  └── report_simulation.html
  |  |  └── api_observations.html
  |  └── instructions_WebServer.md
  |  └── static
  |  |  └── css
  |  |  |  └── style.css


### _HTML_ files

In [454]:
import IPython
dir_website = '../website/'
dir_html = dir_website + 'templates/'

#### Index  
Home page file

In [455]:
file_html_index = dir_html + 'index.html'
!cat $file_html_index


<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <link rel="stylesheet" href="{{ url_for('static', filename= 'css/style.css') }}">
        <title>ML model API</title>
    </head>
    <body>
       <h1>API du modèle de classification binaire</h1>      
       <a href="/form/"><h4>Formulaire</h4></a>
       <a href="/report/"><h4>Rapport Evidently</h4></a>
       <a href="/report_simul/"><h4>Rapport Evidently de simulation</h4></a>
       <img src="https://miro.medium.com/v2/resize:fit:1200/0*5llOZkUFa4KHgDMQ.jpg">
    </body>
</html>



In [456]:
IPython.display.IFrame(file_html_index, width=700, height=350) 

#### Form  
Form for entering the attribute values

In [538]:
path_form = dir_html + 'form.html'

In [572]:
str_form = \
'''\
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <link rel="stylesheet" href="{{ url_for('static', filename= 'css/style.css') }}">
        <title>ML model API</title>
    </head>
    <body>
        <h1>API du modèle de classification binaire</h1>       
        <h2>Formulaire de saisie d'attributs</h2>       
        
        <form method="post" action="{{ url_for('result') }}">
            <div>
            	<label for="port">Server:</label>
            	<select id="port" name="port">
                		<option value="5677"> Staging    </option>
                		<option value="5678"> Production </option>
            	</select>
            </div>
            <br>
            <table> \
'''

In [573]:
li_features = json.loads(str_li_features)   # the string is valid JSON, so eval is not needed
li_features

['NAME_EDUCATION_TYPE_Higher_education',
 'EXT_SOURCE_3',
 'EXT_SOURCE_2',
 'CODE_GENDER_M',
 'NAME_EDUCATION_TYPE_Secondary_or_secondary_special',
 'NAME_CONTRACT_TYPE_Cash_loans',
 'NAME_INCOME_TYPE_Working']

In [574]:
with open(dir_test_data + 'li_types.txt') as file_object:
    str_li_types = file_object.read()
str_li_types = str_li_types.replace('[', '["').replace(', ', '", "').replace(']\n', '"]')
li_types = json.loads(str_li_types)   # the rewritten string is valid JSON, so eval is not needed
li_types

['long', 'double', 'double', 'long', 'long', 'long', 'long']
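
The type names read above (`long`, `double`) can drive the conversion of form inputs, which arrive as strings, into numbers before they are sent to the model. The sketch below is a hypothetical illustration, not the project's router code. The mapping of `long` to `int` and `double` to `float`, and the names `DICT_CASTS` and `cast_form_values`, are assumptions.

```python
# Assumed mapping from the schema type names to Python casts
DICT_CASTS = {'long': int, 'double': float}

def cast_form_values(dict_form, li_features, li_types):
    """Convert string inputs to numbers, in the column order expected by the model."""
    li_values = []
    for feature_i, type_i in zip(li_features, li_types):
        li_values.append(DICT_CASTS[type_i](dict_form[feature_i]))
    return li_values

dict_form = {'EXT_SOURCE_3': '0.56', 'CODE_GENDER_M': '1'}   # as received from an HTML form
li_values = cast_form_values(dict_form,
                             ['EXT_SOURCE_3', 'CODE_GENDER_M'],
                             ['double', 'long'])
print(li_values)
```

Note that `int('1.0')` raises `ValueError`, so a `long` field filled in with a decimal value would need to be rejected or converted explicitly.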

In [575]:
def get_div_per_feature(idx, str_feature_i, str_type_i) :
    str_div_i = \
    '''
                <tr>
                    <td><label for="input_str_idx">str_feature_i :</label></td>
                    <td><input name="str_feature_i" id="input_str_idx" type="number" step="any" value="1"></td>
                    <td>str_type_i</td>
                </tr> \
    '''
    return str_div_i.replace('str_idx',       str(idx))      \
                    .replace('str_feature_i', str_feature_i) \
                    .replace('str_type_i',    str_type_i)

In [576]:
for idx, (feature_i, type_i) in enumerate(zip(li_features, li_types)) :
    str_form += get_div_per_feature(idx + 1, feature_i, type_i)
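
The row generation above can also be sketched with `str.format` and `html.escape`, which keeps the markup valid even if a feature name ever contains characters such as `<` or `&`. This is an illustrative variant, not the code used to produce the form file. `ROW_TEMPLATE` and `build_form_row` are hypothetical names.

```python
from html import escape

ROW_TEMPLATE = '''
                <tr>
                    <td><label for="input_{idx}">{feature} :</label></td>
                    <td><input name="{feature}" id="input_{idx}" type="number" step="any" value="1"></td>
                    <td>{type_name}</td>
                </tr>'''

def build_form_row(idx, str_feature, str_type):
    """Return one table row of the form for a single feature."""
    return ROW_TEMPLATE.format(idx=idx,
                               feature=escape(str_feature, quote=True),
                               type_name=escape(str_type))

li_rows = [build_form_row(idx, feature_i, type_i)
           for idx, (feature_i, type_i) in enumerate(
               zip(['EXT_SOURCE_3', 'CODE_GENDER_M'], ['double', 'long']), start=1)]
print(''.join(li_rows))
```

Named fields such as `{feature}` cannot collide with one another, unlike placeholder names that share a common prefix.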

In [577]:
str_form += \
'''
            </table>
            <br>
            <div>
                <button>Predict</button>
            </div>
        </form>
        <br>
        <p>PI, valeurs des attributs pour obtenir des TP (vrai positif):</p>
        str_pandas_as_html
    </body>
</html>
'''

In [578]:
df_TP_sample_10 = pd.read_csv(path_TP).sample(10, random_state=0)
df_TP_sample_10.T

| | 2396 | 2203 | 2033 | 443 | 1537 | 1359 | 795 | 817 | 2544 | 215 |
|---|---|---|---|---|---|---|---|---|---|---|
| NAME_EDUCATION_TYPE_Higher_education | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| EXT_SOURCE_3 | 0.345785 | 0.227613 | 0.141115 | 0.712155 | 0.093837 | 0.076474 | 0.107532 | 0.152866 | 0.165407 | 0.415347 |
| EXT_SOURCE_2 | 0.70249 | 0.317802 | 0.3499 | 0.089189 | 0.07555 | 0.018896 | 0.245341 | 0.546527 | 0.261886 | 0.40358 |
| CODE_GENDER_M | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| NAME_EDUCATION_TYPE_Secondary_or_secondary_special | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| NAME_CONTRACT_TYPE_Cash_loans | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 |
| NAME_INCOME_TYPE_Working | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


In [579]:
str_pandas_as_html = df_TP_sample_10.T.to_html(justify='center', border=0, max_rows=10) #, 
                                #float_format=lambda x: '%.10f' % x)
str_form = str_form .replace('str_pandas_as_html', str_pandas_as_html) \
                    .replace('.000000', '') #('.0000000000', '')

In [580]:
with open(path_form, 'w') as file_object:
    print(str_form, file=file_object)

In [581]:
!cat $path_form

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <link rel="stylesheet" href="{{ url_for('static', filename= 'css/style.css') }}">
        <title>ML model API</title>
    </head>
    <body>
        <h1>API du modèle de classification binaire</h1>       
        <h2>Formulaire de saisie d'attributs</h2>       
        
        <form method="post" action="{{ url_for('result') }}">
            <div>
            	<label for="port">Server:</label>
            	<select id="port" name="port">
                		<option value="5677"> Staging    </option>
                		<option value="5678"> Production </option>
            	</select>
            </div>
            <br>
            <table> 
                <tr>
                    <td><label for="input_1">NAME_EDUCATION_TYPE_Higher_education :</label></td>
                    <td><input name="NAME_EDUCATION_TYPE_Higher_education" id="input_1" type="number" step="any" value="1"></td>
                    <td>lo

In [582]:
IPython.display.IFrame(path_form, width=700, height=350) 

#### Result
Page showing the prediction returned by the model API

In [584]:
path_result = dir_html + 'result.html'

In [593]:
str_result = \
'''\
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <link rel="stylesheet" href="{{ url_for('static', filename= 'css/style.css') }}">
        <title>ML model API</title>
    </head>
    <body>
       <h1>API du modèle de classification binaire</h1>      
       <h2>Page de prédiction du modèle</h2> 
       <h3>Valeur prédite = {{target_value['predictions']}}</h3>
       <p>Pour les valeurs saisies des attributs suivants :</p>
       <table>
'''

In [594]:
li_features

['NAME_EDUCATION_TYPE_Higher_education',
 'EXT_SOURCE_3',
 'EXT_SOURCE_2',
 'CODE_GENDER_M',
 'NAME_EDUCATION_TYPE_Secondary_or_secondary_special',
 'NAME_CONTRACT_TYPE_Cash_loans',
 'NAME_INCOME_TYPE_Working']

In [595]:
def get_div_per_feature(idx, str_feature_i) :
    str_div_i = \
    ''' 
            <tr>
                <td>str_feature_i</td><td>{{features['str_feature_i']}}</td>  
            </tr> \
    '''
    return str_div_i.replace('str_idx',       str(idx))    \
                    .replace('str_feature_i', str_feature_i)                        

In [596]:
for idx, feature_i in enumerate(li_features) :
    str_result += get_div_per_feature(idx + 1, feature_i)

In [597]:
str_result += \
'''
       </table>
       <br>
       <a href="/">Retour à l'accueil</a>
       <a href="/form/">Formulaire</a>
    </body>
</html>
'''

In [598]:
with open(path_result, "w") as file_object:
    print(str_result, file=file_object)

In [599]:
!cat $path_result

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <link rel="stylesheet" href="{{ url_for('static', filename= 'css/style.css') }}">
        <title>ML model API</title>
    </head>
    <body>
       <h1>API du modèle de classification binaire</h1>      
       <h2>Page de prédiction du modèle</h2> 
       <h3>Valeur prédite = {{target_value['predictions']}}</h3>
       <p>Pour les valeurs saisies des attributs suivants :</p>
       <table>
 
            <tr>
                <td>NAME_EDUCATION_TYPE_Higher_education</td><td>{{features['NAME_EDUCATION_TYPE_Higher_education']}}</td>  
            </tr>      
            <tr>
                <td>EXT_SOURCE_3</td><td>{{features['EXT_SOURCE_3']}}</td>  
            </tr>      
            <tr>
                <td>EXT_SOURCE_2</td><td>{{features['EXT_SOURCE_2']}}</td>  
            </tr>      
            <tr>
                <td>CODE_GENDER_M</td><td>{{features['CODE_GENDER_M']}}</td>  
            </tr>      


In [600]:
IPython.display.IFrame(path_result, width=700, height=350) 

#### Style

In [603]:
dir_style = '../website/static/css/'

In [604]:
file_style = dir_style + 'style.css'
!cat $file_style

html {
    text-align: center;
    font-family: Arial;
}
h1 {
    border: 2px #eee solid;
    color: brown;
    text-align: center;
    padding: 10px;
}
h3 {
    color: darkred;
}
table { 
    margin-left: auto;
    margin-right: auto;
    font-size: 12px;
}
tr:nth-child(odd) {
    background-color: #e6f7ff;
}
tr:nth-child(even) {
    background-color: #b3e6ff;
}

### _Python_ files

#### Web router  
A single Python file handles the REST requests and responses.

In [605]:
web_router = dir_website + 'web_router.py'
!cat $web_router

import pandas as pd
import json
import subprocess
import requests
from flask import Flask, render_template, request
from mlflow import sklearn as skl
from sklearn.metrics import recall_score, roc_auc_score

# **************************************************** FUNCTIONS ************************
def run_shell(command) :
    shell_process = subprocess.run([command], shell=True, capture_output=True, text=True)
    return str(shell_process.stdout) + str(shell_process.stderr)
def pull() :
    dir_root = '/home/azureuser/project_7/'
    str_command_pull = 'cd ' + dir_root + ' ; git pull origin main'
    str_output = run_shell(str_command_pull)
def restart(str_environment) : 
    if str_environment == 'staging'    : dir_model, port = '../api/staging_model/',    '5677'
    if str_environment == 'production' : dir_model, port = '../api/production_model/', '5678' 
    str_command_serve = 'mlflow models serve -m ' + dir_model + ' -p ' + port + ' -h 0.0.0.0 --no-conda &'
    str_command_ps = 'ps 

## _Web_ server

<a id="web_server"></a>
<hr>

### Launch

In [27]:
shell_command = 'nohup flask --app $web_router run -h 0.0.0.0 -p $port_web_server --debug > ./flask_app.log 2>&1 &'
get_ipython().system_raw(shell_command) # run model API in background
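
Outside a notebook, where `get_ipython()` is not available, a similar background launch can be sketched with `subprocess.Popen`. The example below is an illustration under assumptions: the helper names `build_flask_args` and `launch_flask` are hypothetical, and the `flask` options are the ones from the command above.

```python
import subprocess

def build_flask_args(path_app, port, host='0.0.0.0'):
    """Argument list equivalent to the shell command above."""
    return ['flask', '--app', path_app, 'run',
            '-h', host, '-p', str(port), '--debug']

def launch_flask(path_app, port, path_log='./flask_app.log'):
    """Start Flask in the background, sending stdout and stderr to a log file."""
    with open(path_log, 'w') as file_log:
        # start_new_session detaches the server from the caller's session (POSIX only)
        return subprocess.Popen(build_flask_args(path_app, port),
                                stdout=file_log, stderr=subprocess.STDOUT,
                                start_new_session=True)

flask_args = build_flask_args('../website/web_router.py', 6543)
print(flask_args)
```

`launch_flask` returns a `Popen` object, so the process identifier is available through its `pid` attribute instead of having to be looked up with `ps`.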

### Checks
#### Log

In [28]:
!tail ./flask_app.log

 * Serving Flask app '../website/web_router.py'
 * Debug mode: on
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:6543
 * Running on http://10.1.0.4:6543
Press CTRL+C to quit
 * Restarting with watchdog (inotify)


#### Processes
List of processes (2)

In [606]:
li_ps = !ps aux | grep "flask" | grep -v "grep" | awk '{print $2}' 
!ps aux | grep "flask" | grep -v "grep" 

Stopping the processes

In [80]:
for ps_i in li_ps : 
    # Uncomment the next line to actually stop the processes
    #!kill -9 $ps_i
    pass
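
A hedged Python equivalent of the loop above sends `SIGTERM`, which lets a process shut down cleanly, where `kill -9` sends `SIGKILL`. The helper names `parse_pids` and `stop_processes` are hypothetical, and the input is assumed to be a list of PID strings such as the one produced by the `awk` command above.

```python
import os
import signal

def parse_pids(li_lines):
    """Keep only the entries that are valid integer PIDs."""
    return [int(line) for line in li_lines if line.strip().isdigit()]

def stop_processes(li_pids, sig=signal.SIGTERM):
    """Send a signal to each PID, ignoring processes that have already exited."""
    for pid in li_pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass

li_pids = parse_pids(['12345', '12346', ''])   # made-up PIDs for the example
print(li_pids)

# stop_processes(li_pids)   # left commented out so that nothing is killed by accident
```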