# Project 9: Build a mobile content-recommendation application

1. [Introduction](#introduction)
    1. [Importing the libraries](#imports)
    2. [Dataset transformations](#transformations)
2. [Recommenders](#recommenders)
    1. [Content-based recommender](#content)
    2. [Collaborative filtering](#collaborative)
    3. [Cosine similarity](#cosine)
    4. [Surprise library](#surprise)
3. [Deployment](#deployment)


## Introduction <a name="introduction"></a>

My Content is a start-up that wants to encourage reading by recommending relevant content to its users. As CTO and co-founder of the start-up, we are in charge of building a first MVP, which will take the form of a mobile application.

### Importing the libraries <a name="imports"></a>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


import os

from surprise import Reader, Dataset
from surprise.model_selection import train_test_split, cross_validate
from surprise import SVD, accuracy

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import linear_kernel

### Dataset transformations <a name="transformations"></a>

Our dataset consists of two types of tables. The first, articles_metadata, contains information about the articles: notably their category, along with other fields, such as the creation date, that we will not use.

In [3]:
articles_metadata = pd.read_csv('E://datasets/Globo/articles_metadata.csv')  
articles_metadata

Unnamed: 0,article_id,category_id,created_at_ts,publisher_id,words_count
0,0,0,1513144419000,0,168
1,1,1,1405341936000,0,189
2,2,1,1408667706000,0,250
3,3,1,1408468313000,0,230
4,4,1,1407071171000,0,162
...,...,...,...,...,...
364042,364042,460,1434034118000,0,144
364043,364043,460,1434148472000,0,463
364044,364044,460,1457974279000,0,177
364045,364045,460,1515964737000,0,126


The second type of table contains information about user clicks: each row corresponds to one click, with many attributes attached. Among other things, we know which user clicked, the session number, the date of the click, etc. We will mainly use the user ID. There are several tables of this type, split by time period, which we are going to merge.

In [4]:
clicks_hour_000 = pd.read_csv('E://datasets/Globo/clicks/clicks/clicks_hour_000.csv')  
clicks_hour_000

Unnamed: 0,user_id,session_id,session_start,session_size,click_article_id,click_timestamp,click_environment,click_deviceGroup,click_os,click_country,click_region,click_referrer_type
0,0,1506825423271737,1506825423000,2,157541,1506826828020,4,3,20,1,20,2
1,0,1506825423271737,1506825423000,2,68866,1506826858020,4,3,20,1,20,2
2,1,1506825426267738,1506825426000,2,235840,1506827017951,4,1,17,1,16,2
3,1,1506825426267738,1506825426000,2,96663,1506827047951,4,1,17,1,16,2
4,2,1506825435299739,1506825435000,2,119592,1506827090575,4,1,17,1,24,2
...,...,...,...,...,...,...,...,...,...,...,...,...
1878,705,1506828968165442,1506828968000,2,119592,1506830912301,4,1,17,1,21,2
1879,705,1506828968165442,1506828968000,2,284847,1506830942301,4,1,17,1,21,2
1880,706,1506828979881443,1506828979000,3,108854,1506829027334,4,3,2,1,25,1
1881,706,1506828979881443,1506828979000,3,96663,1506829095732,4,3,2,1,25,1


The cell below merges the user clicks.

In [5]:
if not os.path.exists('E://datasets/Globo/clicks/clicks/clicks.csv'):
    clicks_path = []
    clicks_dir = "E://datasets/Globo/clicks/clicks/"

    clicks_path = clicks_path + sorted(
            [
                os.path.join(clicks_dir, fname)
                for fname in os.listdir(clicks_dir)
                if fname.endswith(".csv")
            ]
        )
    print("Number of clicks csv:", len(clicks_path))

    _li = []

    for filename in clicks_path:
        df = pd.read_csv(filename, index_col=None, header=0)
        _li.append(df)

    clicks = pd.concat(_li, axis=0, ignore_index=True)
    clicks.to_csv('E://datasets/Globo/clicks/clicks/clicks.csv')
else:
    clicks= pd.read_csv('E://datasets/Globo/clicks/clicks/clicks.csv')
    
clicks

Unnamed: 0.1,Unnamed: 0,user_id,session_id,session_start,session_size,click_article_id,click_timestamp,click_environment,click_deviceGroup,click_os,click_country,click_region,click_referrer_type
0,0,0,1506825423271737,1506825423000,2,157541,1506826828020,4,3,20,1,20,2
1,1,0,1506825423271737,1506825423000,2,68866,1506826858020,4,3,20,1,20,2
2,2,1,1506825426267738,1506825426000,2,235840,1506827017951,4,1,17,1,16,2
3,3,1,1506825426267738,1506825426000,2,96663,1506827047951,4,1,17,1,16,2
4,4,2,1506825435299739,1506825435000,2,119592,1506827090575,4,1,17,1,24,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2988176,2988176,10051,1508211372158328,1508211372000,2,84911,1508211557302,4,3,2,1,25,1
2988177,2988177,322896,1508211376302329,1508211376000,2,30760,1508211672520,4,1,17,1,25,2
2988178,2988178,322896,1508211376302329,1508211376000,2,157507,1508211702520,4,1,17,1,25,2
2988179,2988179,123718,1508211379189330,1508211379000,2,234481,1508211513583,4,3,2,1,25,2


Once the clicks are merged, what we mainly want is the list of clicked articles for each user.

In [6]:
df = clicks.groupby('user_id').agg(
    LIST_click_article_id = ('click_article_id', lambda x: list(x)),
)
df

Unnamed: 0_level_0,LIST_click_article_id
user_id,Unnamed: 1_level_1
0,"[157541, 68866, 96755, 313996, 160158, 233470,..."
1,"[235840, 96663, 59758, 160474, 285719, 156723,..."
2,"[119592, 30970, 30760, 209122]"
3,"[236065, 236294, 234686, 233769, 235665, 23513..."
4,"[48915, 44488, 195887, 195084, 63307, 336499, ..."
...,...
322892,"[42567, 39894]"
322893,"[50644, 36162]"
322894,"[36162, 168401]"
322895,"[289197, 63746]"


Finally, we associate each list of articles with the list of their corresponding categories.

In [7]:
if not os.path.exists('E://datasets/Globo/df.csv'):
    # Index the categories by article_id so that each lookup does not filter the whole table
    article_to_category = articles_metadata.set_index('article_id')['category_id']
    # Assign the whole column at once; chained indexing such as df.loc[index]['categories'] = ...
    # may write to a temporary copy instead of df
    df['categories'] = df['LIST_click_article_id'].apply(
        lambda articles: [int(article_to_category[article]) for article in articles]
    )
    df.to_csv('E://datasets/Globo/df.csv')
else:
    df = pd.read_csv('E://datasets/Globo/df.csv')

In [8]:
df

Unnamed: 0,user_id,LIST_click_article_id,categories
0,0,"[157541, 68866, 96755, 313996, 160158, 233470,...","[281, 136, 209, 431, 281, 375, 186, 186]"
1,1,"[235840, 96663, 59758, 160474, 285719, 156723,...","[375, 209, 123, 281, 412, 281, 331, 412, 435, ..."
2,2,"[119592, 30970, 30760, 209122]","[247, 26, 26, 332]"
3,3,"[236065, 236294, 234686, 233769, 235665, 23513...","[375, 375, 375, 375, 375, 375, 281, 375, 375, ..."
4,4,"[48915, 44488, 195887, 195084, 63307, 336499, ...","[92, 81, 317, 317, 132, 437, 399]"
...,...,...,...
322892,322892,"[42567, 39894]","[67, 66]"
322893,322893,"[50644, 36162]","[99, 43]"
322894,322894,"[36162, 168401]","[43, 297]"
322895,322895,"[289197, 63746]","[418, 133]"


## Recommenders <a name="recommenders"></a>

### Content-based recommender <a name="content"></a>

First, we create the function inputUserRatings, which maps a user ID to a matrix describing the user's preferred categories (that is, those they clicked on most). The click counts are then normalised.

In [10]:
def inputUserRatings(userId):
    # The categories are stored as a string such as "[281, 136, 209]": split it into tokens
    _row = df.loc[userId, 'categories']
    _row = _row.replace('[', '').replace(']', '').replace(',', '').split()
    
    # Count the clicks per category (value_counts sorts by decreasing count)
    _counts = pd.Series(_row).value_counts()
    _matrix = pd.DataFrame({'category_id': _counts.index.astype(int),
                            'click': _counts.values})
    
    # Normalise the click counts so that they sum to 1
    _matrix['click_norm'] = _matrix['click'] / _matrix['click'].sum()
    
    return _matrix

In [16]:
inputUserRatings(5)

Unnamed: 0,category_id,click,click_norm
0,281,19,0.218391
1,412,17,0.195402
2,442,12,0.137931
3,327,3,0.034483
4,99,3,0.034483
5,375,3,0.034483
6,125,2,0.022989
7,413,2,0.022989
8,126,2,0.022989
9,399,2,0.022989
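
To make the normalisation step explicit, here is a minimal sketch on a toy category string, using only the standard library. The helper name `category_profile` and the sample string are assumptions made for illustration; they are not part of the notebook.

```python
from collections import Counter

def category_profile(categories_str):
    """Return (category_id, clicks, share) tuples sorted by decreasing clicks."""
    # Strip the brackets, split on commas and ignore empty tokens
    ids = [int(tok) for tok in categories_str.strip('[]').split(',') if tok.strip()]
    counts = Counter(ids)
    total = sum(counts.values())
    # The share of each category is its click count divided by the total number of clicks
    return [(cat, n, n / total) for cat, n in counts.most_common()]

profile = category_profile('[281, 412, 281, 442, 281, 412]')
for cat, n, share in profile:
    print(cat, n, round(share, 3))
```

The shares sum to 1, so they can be read as the fraction of the user's clicks that falls in each category.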


To simplify the articles_metadata table, we keep only the articles and their categories.

In [17]:
articles_matrix = articles_metadata.loc[:, ['article_id', 'category_id']]

In [18]:
articles_matrix

Unnamed: 0,article_id,category_id
0,0,0
1,1,1
2,2,1
3,3,1
4,4,1
...,...,...
364042,364042,460
364043,364043,460
364044,364044,460
364045,364045,460


Let us now move on to the recommendation function itself.

In [19]:
def recommend1(userId, articles_matrix):
    user_profile = inputUserRatings(userId)
    frames = []
    # The profile is sorted by decreasing click share, so the preferred category comes first
    for cat_id, weight in zip(user_profile['category_id'], user_profile['click_norm']):
        # Copy the slice so that adding a column does not modify articles_matrix
        new_recommendations = articles_matrix[articles_matrix['category_id'] == cat_id].copy()
        new_recommendations['weight'] = weight
        frames.append(new_recommendations)
    recommendation_matrix = pd.concat(frames)
    return recommendation_matrix.head(5)

In [20]:
%%time
res = recommend1(5 , articles_matrix)
res

Wall time: 111 ms


Unnamed: 0,article_id,category_id,weight
150044,150044,281,0.218391
150045,150045,281,0.218391
150046,150046,281,0.218391
150047,150047,281,0.218391
150048,150048,281,0.218391


### Collaborative filtering <a name="collaborative"></a>

Let us now move on to collaborative filtering. First, we create a function that converts a string representing a list into an actual list.

In [39]:
def Convert(string):
    li = list(string.replace('[', '').replace(']', '').replace(',', '').split())  
    return [int(e) for e in li]
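
As a possible alternative, assuming the column always holds a valid Python list literal such as `'[119592, 30970]'`, the same conversion can be sketched with the standard library's `ast.literal_eval`. The name `convert_literal` is hypothetical and only used here.

```python
import ast

def convert_literal(string):
    """Parse a stringified list such as '[119592, 30970]' into a list of ints."""
    # literal_eval only evaluates Python literals, so it does not execute arbitrary code
    values = ast.literal_eval(string)
    return [int(v) for v in values]

print(convert_literal('[119592, 30970, 30760, 209122]'))
```

Unlike the string-replacement approach, this raises an error on malformed input instead of silently returning a partial list.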

We will also need a function that tells whether or not two lists have a non-empty intersection.

In [40]:
def check_intersection(l1,l2):
    for e in l1:
        if e in l2:
            return True
    return False
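
The same test can be sketched with a set, which avoids scanning the second list once per element of the first. The name `has_common_element` is an assumption for illustration.

```python
def has_common_element(l1, l2):
    """Return True if l1 and l2 share at least one element."""
    # isdisjoint is True when the two collections have nothing in common
    return not set(l1).isdisjoint(l2)

print(has_common_element([119592, 30970], [96663, 119592]))
print(has_common_element([1, 2], [3, 4]))
```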

To illustrate how this method works, we will go through an example before describing the function that wraps up the whole procedure.

In [41]:
user=2
user_articles=Convert(df['LIST_click_article_id'][user])
indices=[]
for i in range(len(df)):
    test_articles=Convert(df['LIST_click_article_id'][i])
    if(i!=user and check_intersection(user_articles,test_articles)):
        indices+=[i]

Here, indices contains the list of users who have viewed at least one article in common with the reference user.

In [42]:
indices[:10]

[7, 8, 11, 12, 15, 19, 22, 26, 34, 39]

We can then take a subset of df containing only these users.

In [43]:
ratings_matrix_subset = df[df['user_id'].isin(indices)]

In [44]:
ratings_matrix_subset

Unnamed: 0,user_id,LIST_click_article_id,categories
7,7,"[235840, 284847, 156624, 284547, 123757, 16862...","[375, 412, 281, 412, 250, 297, 134, 437, 375, ..."
8,8,"[332114, 284847, 114161, 272660, 273464, 31350...","[436, 412, 237, 399, 399, 431, 375, 375, 323, ..."
11,11,"[207122, 119592, 202528, 220466, 216102, 26161...","[331, 247, 327, 353, 351, 396, 300, 331, 354, ..."
12,12,"[288431, 159359, 119592, 235132, 140720, 29311...","[418, 281, 247, 375, 265, 421, 209, 431, 412, ..."
15,15,"[119592, 108854, 284406, 283238, 313996, 10729...","[247, 230, 412, 412, 431, 228, 209]"
...,...,...,...
322851,322851,"[209122, 211442]","[332, 340]"
322855,322855,"[70986, 70758, 50644, 209122]","[136, 136, 99, 332]"
322866,322866,"[70986, 209122]","[136, 332]"
322867,322867,"[202355, 209122]","[327, 332]"


We can then recommend articles that these other users have already viewed.

In [45]:
recommendations=[]
for i in indices:
    l = Convert(ratings_matrix_subset['LIST_click_article_id'][i])
    for e in l:
        if e not in recommendations:
            recommendations+=[e]
print(recommendations[:5],len(recommendations))

[235840, 284847, 156624, 284547, 123757] 9948


The following function wraps up all these steps.

In [46]:
def recommend2(userId):
    # Articles clicked by the target user
    user_articles=Convert(df['LIST_click_article_id'][userId])
    # Users who share at least one clicked article with the target user
    indices=[]
    for i in range(len(df)):
        test_articles=Convert(df['LIST_click_article_id'][i])
        if(i!=userId and check_intersection(user_articles,test_articles)):
            indices+=[i]
    ratings_matrix_subset = df[df['user_id'].isin(indices)]
            
    # Distinct articles clicked by these users, in order of appearance
    recommendations=[]
    for i in indices:
        l = Convert(ratings_matrix_subset['LIST_click_article_id'][i])
        for e in l:
            if e not in recommendations:
                recommendations+=[e]
    return recommendations[:5]

In [48]:
%%time
res = recommend2(2)
res

Wall time: 4.67 s


[235840, 284847, 156624, 284547, 123757]
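
Most of the running time above comes from comparing the target user with every other user. A minimal sketch of one way to avoid that scan, assuming the clicks are available as a plain dictionary `{user_id: [article_id, ...]}`, is to build an inverted index from articles to users. All names and the toy data below are hypothetical.

```python
from collections import defaultdict

def build_article_index(user_clicks):
    """Map each article id to the set of users who clicked it."""
    index = defaultdict(set)
    for user, articles in user_clicks.items():
        for article in articles:
            index[article].add(user)
    return index

def neighbours(user, user_clicks, index):
    """Users sharing at least one clicked article with `user`, in ascending order."""
    found = set()
    for article in user_clicks[user]:
        found |= index[article]
    found.discard(user)
    return sorted(found)

def recommend_from_neighbours(user, user_clicks, index, n=5):
    """First n distinct articles clicked by the neighbours, in neighbour order."""
    reco = []
    already_added = set()
    for other in neighbours(user, user_clicks, index):
        for article in user_clicks[other]:
            if article not in already_added:
                already_added.add(article)
                reco.append(article)
    return reco[:n]

# Toy data: user 0 shares article 11 with user 1 and article 10 with user 3
toy_clicks = {0: [10, 11], 1: [11, 12, 13], 2: [14], 3: [10, 15]}
toy_index = build_article_index(toy_clicks)
print(neighbours(0, toy_clicks, toy_index))
print(recommend_from_neighbours(0, toy_clicks, toy_index))
```

Like `recommend2`, this sketch does not filter out articles the target user has already clicked; the index is built once and can be reused for every user.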

### Cosine similarity <a name="cosine"></a>

The article embeddings matrix can be used to define a notion of distance between articles, and thus to recommend to a user articles close to those they have already viewed. Once again, we first go through an example with a reference user, then describe the full function.
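
For reference, the cosine similarity between a profile vector $u$ built from the articles a user has viewed and an article embedding $v$ follows the standard definition below. It ranges from $-1$ to $1$, and a higher value means the two vectors point in closer directions.

$$
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
$$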

In [50]:
tfidf_matrix = pd.read_pickle('E://datasets/Globo/articles_embeddings.pickle')
tfidf_matrix.shape

(364047, 250)

In [51]:
df

Unnamed: 0,user_id,LIST_click_article_id,categories
0,0,"[157541, 68866, 96755, 313996, 160158, 233470,...","[281, 136, 209, 431, 281, 375, 186, 186]"
1,1,"[235840, 96663, 59758, 160474, 285719, 156723,...","[375, 209, 123, 281, 412, 281, 331, 412, 435, ..."
2,2,"[119592, 30970, 30760, 209122]","[247, 26, 26, 332]"
3,3,"[236065, 236294, 234686, 233769, 235665, 23513...","[375, 375, 375, 375, 375, 375, 281, 375, 375, ..."
4,4,"[48915, 44488, 195887, 195084, 63307, 336499, ...","[92, 81, 317, 317, 132, 437, 399]"
...,...,...,...
322892,322892,"[42567, 39894]","[67, 66]"
322893,322893,"[50644, 36162]","[99, 43]"
322894,322894,"[36162, 168401]","[43, 297]"
322895,322895,"[289197, 63746]","[418, 133]"


To compare what a user has viewed with the rest of the documents, we have several options: pick one article at random among those they have viewed, pick the last article they viewed, or take the mean of the embeddings of the articles they have viewed. We choose the third method for this example, but the other two are not difficult to implement.
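
The three options can be sketched side by side as follows. This is an illustration on toy data, not the notebook's code: the function name `user_profile`, the `strategy` parameter and the assumption that the list of viewed articles is in chronological order are all hypothetical.

```python
import numpy as np

def user_profile(embeddings, seen, strategy="mean", rng=None):
    """Build a profile vector from the embeddings of the articles in `seen`.

    strategy: "mean" (average of all viewed articles), "last" (most recent
    click, assuming `seen` is in chronological order) or "random".
    """
    if strategy == "mean":
        return embeddings[seen].mean(axis=0)
    if strategy == "last":
        return embeddings[seen[-1]]
    if strategy == "random":
        rng = rng or np.random.default_rng()
        return embeddings[rng.choice(seen)]
    raise ValueError(f"unknown strategy: {strategy}")

# Toy embeddings: three articles in a two-dimensional space
toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_seen = [0, 2]
print(user_profile(toy_embeddings, toy_seen, "mean"))
print(user_profile(toy_embeddings, toy_seen, "last"))
```

Whichever strategy is used, the resulting vector can then be compared with every article embedding in the same way.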

In [53]:
mean=tfidf_matrix[Convert(df['LIST_click_article_id'][0])].sum(axis=0)/len(Convert(df['LIST_click_article_id'][0]))

In [54]:
mean

array([-0.2103363 , -0.96357316, -0.19693483, -0.03603265, -0.35537797,
       -0.06501008, -0.26539654,  0.2746106 ,  0.05667167, -0.07157854,
        0.23942748, -0.28148898, -0.5039415 , -0.20950975,  0.11939192,
       -0.38639277,  0.12121646, -0.04283408, -0.19633986, -0.34574744,
        0.23315966, -0.16743226,  0.0159268 , -0.19047889,  0.06074592,
       -0.19402462, -0.12289533,  0.40420762, -0.35268888,  0.38777018,
        0.26388887,  0.28620663,  0.20145859, -0.34809983, -0.28379998,
        0.11757812,  0.13228935,  0.29330313, -0.09344876, -0.3391693 ,
       -0.25866377,  0.12169958,  0.5532014 , -0.94994354, -0.21334068,
        0.30972883, -0.00873232,  0.24818957,  0.30878684,  0.03226997,
        0.00868366, -0.22514868,  0.05656823, -0.10989898,  0.05942007,
        0.2483601 , -0.12747547,  0.3319702 ,  0.92854846,  0.3162514 ,
        0.8554979 ,  0.38393283, -0.3459077 , -0.3333925 , -0.26319897,
        0.0406845 ,  0.2873671 ,  0.06899647, -0.10329024, -0.12

In [55]:
cosine_similarities = cosine_similarity(tfidf_matrix, mean.reshape(1, -1)) 

In [56]:
cosine_similarities.shape

(364047, 1)

In [57]:
np.sort(cosine_similarities,axis=0)

array([[-0.36336488],
       [-0.36336488],
       [-0.36336488],
       ...,
       [ 0.8417565 ],
       [ 0.8489112 ],
       [ 0.8747992 ]], dtype=float32)

In [58]:
cosine_similarities.argsort(axis=0)[-6:]

array([[107637],
       [160079],
       [155943],
       [162230],
       [160966],
       [162235]], dtype=int64)

In [60]:
def recommend3(user,df,tfidf_matrix,nb_recommendations=5):
    seen = Convert(df['LIST_click_article_id'][user])
    seen_set = set(seen)
    # User profile: mean embedding of the articles already viewed
    mean=tfidf_matrix[seen].sum(axis=0)/len(seen)
    cosine_similarities = cosine_similarity(tfidf_matrix, mean.reshape(1, -1)).ravel()
    # Article indices sorted from most to least similar
    best_reco = cosine_similarities.argsort()[::-1]
    reco=[]
    for article_id in best_reco:
        # Skip the articles the user has already viewed
        if int(article_id) not in seen_set:
            reco.append(int(article_id))
        if len(reco)==nb_recommendations:
            break
    return reco

In [61]:
%%time
res = recommend3(0,df,tfidf_matrix)
res

Wall time: 372 ms


[162235, 160966, 162230, 155943, 160079]

### Surprise library <a name="surprise"></a>

In [65]:
dataframe = clicks.merge(articles_metadata, left_on='click_article_id', right_on='article_id')

In [66]:
dataframe = dataframe[['user_id', 'article_id', 'category_id']]
dataframe

Unnamed: 0,user_id,article_id,category_id
0,0,157541,281
1,20,157541,281
2,44,157541,281
3,45,157541,281
4,76,157541,281
...,...,...,...
2988176,195186,2221,1
2988177,75658,271117,399
2988178,217129,20204,9
2988179,217129,70196,136


To use the Surprise library, we need a matrix containing ratings. Since we have no explicit ratings, we use the number of clicks of each user in each category as the rating.

In [67]:
series = dataframe.groupby(['user_id', 'category_id']).size()
user_rating_matrix = series.to_frame()
user_rating_matrix = user_rating_matrix.reset_index()
user_rating_matrix.rename(columns = {0:'rate'}, inplace = True)

In [68]:
user_rating_matrix

Unnamed: 0,user_id,category_id,rate
0,0,136,1
1,0,186,2
2,0,209,1
3,0,281,2
4,0,375,1
...,...,...,...
1882297,322894,297,1
1882298,322895,133,1
1882299,322895,418,1
1882300,322896,26,1


In [69]:
reader = Reader()
data = Dataset.load_from_df(user_rating_matrix, reader)

In [70]:
trainset, testset = train_test_split(data, test_size=0.25)

In [73]:
algo = SVD()
algo.fit(trainset)

predictions = algo.test(testset)

In [72]:
accuracy.rmse(predictions)

RMSE: 3.8691


3.869136237662854
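
One possible reading of this RMSE, offered as a hedged sketch rather than a diagnosis: `Reader()` defaults to `rating_scale=(1, 5)` and Surprise clips predictions to that range by default, whereas click counts can be much larger (user 5 has 19 clicks in category 281). The toy numbers below are hypothetical and only show how clipping inflates the error on large counts.

```python
import math

def rmse(true_ratings, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(true_ratings, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def clip(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(high, value))

true_clicks = [1, 2, 19, 17]               # toy click counts, two of them above 5
raw_predictions = [1.2, 2.5, 15.0, 14.0]   # hypothetical model outputs
clipped_predictions = [clip(p, 1, 5) for p in raw_predictions]

rmse_raw = rmse(true_clicks, raw_predictions)
rmse_clipped = rmse(true_clicks, clipped_predictions)
print(round(rmse_raw, 3), round(rmse_clipped, 3))
```

If this is indeed the cause, passing an explicit scale to the reader, for example `Reader(rating_scale=(1, user_rating_matrix['rate'].max()))`, may be worth trying.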

## Déploiement <a name="deployment"></a>

To develop our mobile application, we first need an API that returns a series of recommendations for a given user. We will create a URL to an endpoint that performs this action using Azure.

In [74]:
import azureml.core

# Display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.37.0


To do so, we need to write a scoring file, scoreP9.py, that loads the datasets. Given a userId as input, it must also compute the predictions and return them as a dictionary.

In [75]:
%%writefile scoreP9.py
import os, json, logging, joblib, requests, shutil
from json import JSONEncoder
import pathlib
import numpy as np
import pandas as pd
from keras.utils.data_utils import get_file
from sklearn.metrics.pairwise import cosine_similarity

def Convert(string):
    li = list(string.replace('[', '').replace(']', '').replace(',', '').split())  
    return [int(e) for e in li]

def recommend(user,df,tfidf_matrix,nb_recommendations=5):
    seen = Convert(df['LIST_click_article_id'][user])
    seen_set = set(seen)
    # User profile: mean embedding of the articles already viewed
    mean=tfidf_matrix[seen].sum(axis=0)/len(seen)
    cosine_similarities = cosine_similarity(tfidf_matrix, mean.reshape(1, -1)).ravel()
    # Article indices sorted from most to least similar
    best_reco = cosine_similarities.argsort()[::-1]
    reco=[]
    for article_id in best_reco:
        # Skip the articles the user has already viewed
        if int(article_id) not in seen_set:
            reco.append(int(article_id))
        if len(reco)==nb_recommendations:
            break
    return reco


def init():
    global df, tfidf_matrix
    

    df_path = get_file(
    'df.csv',
    "https://p8workspace1601643265.blob.core.windows.net/azureml-blobstore-b663290d-f631-4713-8717-4c3d214088e3/UI/02-24-2022_105624_UTC/df.csv"
        )
    df = pd.read_csv(df_path)

    tfidf_matrix_path = get_file(
    'articles_embeddings.pickle',
    "https://p8workspace1601643265.blob.core.windows.net/azureml-blobstore-b663290d-f631-4713-8717-4c3d214088e3/UI/02-24-2022_105745_UTC/articles_embeddings.pickle"
        )
    tfidf_matrix = pd.read_pickle(tfidf_matrix_path)
    


def run(raw_data):
    if raw_data == "":
        return {}
    else:
        data = json.loads(raw_data)
        
    user = int(data['user_id']) 
    
    recommendations = recommend(user, df, tfidf_matrix)
    
    return_dic = {}
    for n,i in enumerate(recommendations):
        return_dic[n] = str(i)

    return return_dic

Overwriting scoreP9.py
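
Before deploying, the input/output contract of `run` can be exercised locally without Azure. The sketch below mirrors the logic of the scoring script but replaces the real recommender with a stand-in; `make_run` and `fake_recommend` are hypothetical names introduced for this illustration only.

```python
import json

def make_run(recommend_fn):
    """Wrap any recommender callable into a run(raw_data) function."""
    def run(raw_data):
        # An empty request body yields an empty result
        if raw_data == "":
            return {}
        data = json.loads(raw_data)
        user = int(data["user_id"])
        recommendations = recommend_fn(user)
        # Rank -> article id, with ids converted to strings
        return {n: str(i) for n, i in enumerate(recommendations)}
    return run

def fake_recommend(user):
    # Stand-in for the real recommender: returns three made-up article ids
    return [user + 1, user + 2, user + 3]

run = make_run(fake_recommend)
print(run(json.dumps({"user_id": "0"})))
print(run(""))
```

The same check can be repeated with the real `recommend` function once `df` and the embeddings are loaded.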


Here we create the configuration for the deployment.

In [76]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"data": "globo",  "method" : "cosine"}, 
                                               description='recommend articles with cosine similarity')

In [77]:
import uuid
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model

For this to work, we need to create a Workspace on Azure. These lines are commented out because the workspace has already been created.

In [5]:
#ws = Workspace.create(name='myworkspace',
#               subscription_id='<subscription_id>',
#               resource_group='myresourcegroup',
#               create_resource_group=True,
#               location='eastus2'
#               )

In [6]:
#myenv = Environment.from_existing_conda_environment(name="myenv",
#                                                    conda_environment_name="TF27")

In [7]:
#myenv.save_to_directory(path="C:\\Users\\Raphael\\myenv", overwrite=True)

We also need to specify the environment we work with. Because of a few errors, we had to edit the environment file by hand; we therefore comment out the environment export and load the desired environment directly.

In [8]:
newenv = Environment.load_from_directory(path="C:\\Users\\Raphael\\myenv")

We can then move on to deploying the model.

In [9]:
ws = Workspace.from_config()
model = Model(ws, 'my_model')

inference_config = InferenceConfig(entry_script="scoreP9.py", environment=newenv)

service_name = 'p9-' + str(uuid.uuid4())[:4]
service = Model.deploy(workspace=ws, 
                       name=service_name, 
                       models=[model], 
                       inference_config=inference_config, 
                       deployment_config=aciconfig)

In [10]:
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-03-01 17:22:25+01:00 Creating Container Registry if not exists.
2022-03-01 17:22:25+01:00 Registering the environment.
2022-03-01 17:22:29+01:00 Use the existing image.
2022-03-01 17:22:29+01:00 Generating deployment configuration.
2022-03-01 17:22:31+01:00 Submitting deployment to compute..
2022-03-01 17:22:38+01:00 Checking the status of deployment p9-d13d..
2022-03-01 17:26:34+01:00 Checking the status of inference endpoint p9-d13d.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [11]:
a=ws.webservices[service_name].get_logs()
#print(a)

In [12]:
print(service.scoring_uri)

http://7cc2e629-221d-4f12-8682-6b926ecb310b.westeurope.azurecontainer.io/score


To check that the method works properly, we sent a request for user 0.

In [13]:
import requests

data = {"user_id": "0"}

# Send the user id as a JSON body to the scoring endpoint
r = requests.post(service.scoring_uri, json=data)

if not r.ok:
    print(f"Error: HTTP status {r.status_code}")

In [14]:
r.text

'{"0": "162235", "1": "160966", "2": "162230", "3": "155943", "4": "160079"}'
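
On the client side, the mobile application would need to turn this response back into an ordered list of article IDs. A minimal sketch, assuming the response body keeps the shape shown above (string ranks mapped to string IDs); the helper name `parse_recommendations` is hypothetical.

```python
import json

# Response body copied from the request above
response_text = '{"0": "162235", "1": "160966", "2": "162230", "3": "155943", "4": "160079"}'

def parse_recommendations(text):
    """Return the recommended article ids as ints, ordered by rank."""
    payload = json.loads(text)
    # Keys are ranks stored as strings, so sort them numerically
    return [int(payload[key]) for key in sorted(payload, key=int)]

article_ids = parse_recommendations(response_text)
print(article_ids)
```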