## <span style="color:green">**Version en cours**</span>

# Segmentez des clients d'un site e-commerce
## Notebook 4 : Simulation d'un contrat de maintenance
OpenClassrooms - Parcours Data Scientist - Projet 05  

## Présentation du projet

**Contexte**  


* Olist est une entreprise brésilienne qui propose une solution de vente sur les marketplaces en ligne.  
* Dans un premier temps il est demandé de réaliser quelques requêtes pour le dashboard à partir de la base de données SQLite d'Olist.  
* La mission principale est de fournir aux équipes d'e-commerce d'Olist une **segmentation des clients** qu’elles pourront utiliser au quotidien pour leurs campagnes de communication.

**Démarche globale**  
* Requêtes SQL pour le dashboard (cf Notebook 1)  
* Feature ingineering (cf Notebook 2)
* Tests de modèles de clustering : (cf Notebook 3)  
* **Simulation d'un contrat de maintenance** : c'est l'objet de ce notebook 4  

**Simulation d'un contrat de maintenance**  
* Objectif : déterminer au bout de combien de temps le modèle retenu doit être réentrainé  
* Démarche :
  * Comparer les résultats du dernier clustering réalisé (fin août 2018) avec les prédictions d'un modèle entrainé X jours avant  
  * La comparaison sera mesurée par le score ARI (Adjusted Rand Index) qui mesure la divergence entre clusters  
  * Si les score obtenu est supérieur à 0.80, considérer que les prédictions sont suffisament fiables et on retester avec les prédictions d'un modèle entraîné X jours avant
  * Le délai avant de devoir réentraîner le modèle sera déterminé dès que le score ARI sera descendu sous le seuil de 0.80

## Sommaire  
**Préparation de l'environnement**  
* Environnement virtuel
* Import des modules
* Fonctions

**Préparation des données**  


# 1 Préparation de l'environnement

## 1.1 Environnement virtuel

In [2]:
# Vérification environnement virtuel
envs = !conda env list
print(f"Environnement virtuel : {[e for e in envs if '*' in e][0].split('*')[1].strip()}")

Environnement virtuel : C:\Users\chrab\anaconda3\envs\opc5


## 1.2 Import des modules

* Installation conditionnelle des librairies

In [3]:
import sys
import subprocess
import pkg_resources

def install_package(package):
    """Installe une librairie en mode silencieux si elle n'est pas encore installée"""
    try:
        pkg_resources.get_distribution(package)
    except pkg_resources.DistributionNotFound:
        print(f"Installation {package}... ", end='')
        subprocess.check_call([sys.executable, "-m", "pip", "install", package, "--quiet"])
        print(f"Terminé.")
    else:
        print(f"{package} est déjà installé.")

In [4]:
# Installation des librairies
install_package('pandas')
install_package('scikit-learn')

pandas est déjà installé.
scikit-learn est déjà installé.


* Import des modules

In [5]:
import pandas as pd
from datetime import datetime, timedelta
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

## 1.3 Fonctions

In [6]:
def get_RFM_features(df, end_date):
    """Renvoit un dataframe contenant une ligne par client ayant commandé avant 'end_date' ainsi que les features :
    - 'Récence'   : nombre de jours depuis la dernière commande
    - 'Fréquence' : 0 si le client a passé une seule commande, 1 s'il en a passé plusieurs
    - 'Montant'   : total des montants des commandes du client
    """
    # Dernière date de commande
    last_purchase_date = df['order_purchase_timestamp'].max()
    
    # Filtre sur la date de fin
    mask = df['order_purchase_timestamp'] < end_date
    
    # Préparation du dataframe
    df_data = df.loc[mask].groupby('customer_unique_id').agg(
        Récence = ('order_purchase_timestamp', 'max'),
        Fréquence = ('customer_id', 'count'),
        Montant = ('total_price', 'sum')
    ).reset_index()
    
    # Récence
    df_data['Récence'] = (last_purchase_date - df_data['Récence']).dt.days

    # Fréquence
    mask = df_data['Fréquence'] > 1
    df_data['Fréquence'] = 0
    df_data.loc[mask, 'Fréquence'] = 1

    return df_data

In [72]:
def add_transformed_RFM_features(df, recence_t=MinMaxScaler(), montant_t=MinMaxScaler()):
    """Ajoute les features RFM transformées à df et renvoit les transformers utilisés"""
    # Récence
    recence_transformer = recence_t.fit(df[['Récence']])
    df['Récence_minmax'] = recence_transformer.transform(df[['Récence']])

    # Fréquence (valeurs déjà comprise entre 0 et 1)
    df['Fréquence_minmax'] = df['Fréquence']

    # Classement par intervalles de montants
    bins = [0, 25, 50, 100, 150, 200, 250, 500, 1000, float('inf')]
    labels = range(len(bins) - 1)
    df['Montant_class'] = pd.cut(df['Montant'], bins=bins, labels=labels, right=False)
    montant_transformer = montant_t.fit(df[['Montant_class']])
    df['Montant_class_minmax'] = montant_transformer.transform(df[['Montant_class']])

    return df, recence_transformer, montant_transformer

In [8]:
def get_kmeans_labels(X, k, random_state=0):
    """Instancie un modèle KMeans et renvoit les labels
    Arguments :
    X (DataFrame)      : données
    k (int)            : nombre de clusters
    random_state (int) : seed
    Retour :
    (numpy.ndarray) : labels des clusters
    """
    # Initialisation du modèle
    kmeans = KMeans(n_clusters=k, random_state=0)
    # Entraînement du modèle
    kmeans.fit(X)

    return kmeans.labels_

In [9]:
def get_date_x_days_before(date_str, x):
    """Renvoit la date qu'il était 'x' jours avant 'date_str'"""
    # Conversion de la chaîne de caractères en date
    date_format = "%Y-%m-%d"
    date_obj = datetime.strptime(date_str, date_format).date()
    
    # Calcul de la date x jours avant
    date_before_x_days = date_obj - timedelta(days=x)
    
    # Conversion en string
    result_date_str = date_before_x_days.strftime(date_format)
    return result_date_str

# 2 Stratégie

Simuler un contrat de maintenance, c’est **déterminer combien de temps le modèle entraîné peut rester performant.**  

* **Comment déterminer cette "performance" ?**  
   * Si on entraîne un modèle à la date D0 et qu’on utilise ce modèle entraîné pour prédire les segmentations à la date D1, il est possible de comparer ces prédictions avec la réalité, en réentraînant le modèle à la date D1.  
   * Pour disposer du clustering réel en D1, il suffit de se positionner en D1, et de simuler une prédiction antérieure en D0, sur les données de D1.  
   * On pourra donc comparer :  
      * `modele_D0.predict(datas_D1).labels_` vs `model_D1.fit(datas_D1).labels_`
      * avec `modele_D0 = KMeans(k=6).fit(datas_D0)`  
   * Attention au fait que les datas_D0 sont en fait des données mises à l’échelle via `MinMaxScaler()`. Il faudra réutiliser la même instance de ce transformer sur les datas_D1 avant le predict pour obtenir une mise à l’échelle cohérente.  

* **Comment mesurer l’efficacité des prédictions ?**  
   * SciKit-learn fournit la méthode `adjusted_rand_score()` qui mesure la divergence entre une liste de labels de clusters prédits et une liste de labels de clusters réels.  
   * Cet indice est proche de 0 pour un clustering aléatoire, négatif pour un clustering particulièrement divergent, et est égal à 1 pour une prédiction parfaite.  
   * Le score de **0.8** sera retenu comme seuil pour un clustering suffisamment efficace pour ne pas nécessiter de réentraîner le modèle.  

# 3 Chargement des données

* Lors de l'étape de feature ingineering (Notebook 2), un dasaset contenant une ligne par commande, avant transformation des features RFM a été préparé

In [15]:
# Chargement du dataset
df_features = pd.read_csv('df_data.csv')

In [16]:
# Affichage de quelques lignes
display(df_features)

Unnamed: 0,customer_unique_id,customer_id,order_id,order_purchase_timestamp,total_price,satisfaction,year_month
0,861eff4711a542e4b93843c6dd7febb0,06b8999e2fba1a1fbc88172c00ba8bc7,00e7ee1b050b8499577073aeb2a297a1,2017-05-16,146.87,4.0,2017-05
1,290c77bc529b7ac935b93aa66c333dc3,18955e83d337fd6b2def6b18a428ac77,29150127e6685892b6eab3eec79f59c7,2018-01-12,335.48,5.0,2018-01
2,060e732b5b29e8181a18229c7b0b2b5e,4e7b3e00288586ebd08712fdd0374a03,b2059ed67ce144a36e2aa97d2c9e9ad2,2018-05-19,157.73,5.0,2018-05
3,259dac757896d24d7702b9acbbff3f3c,b2b6027bc5c5109e529d4dc6358b12c3,951670f92359f4fe4a63112aa7306eba,2018-03-13,173.30,5.0,2018-03
4,345ecd01c38d18a9036ed96c73b8d066,4f2d8ab171c80ec8364f7c12e35b23ad,6b7d50bd145f6fc7f33cebabd7e49d0f,2018-07-29,252.25,5.0,2018-07
...,...,...,...,...,...,...,...
97900,1a29b476fee25c95fbafc67c5ac95cf8,17ddf5dd5d51696bb3d7c6291687be6f,6760e20addcf0121e9d58f2f1ff14298,2018-04-07,88.78,4.0,2018-04
97901,d52a67c98be1cf6a5c84435bd38d095d,e7b71a9017aa05c9a7fd292d714858e8,9ec0c8947d973db4f4e8dcf1fbfa8f1b,2018-04-04,129.06,5.0,2018-04
97902,e9f50caf99f032f0bf3c55141f019d99,5e28dfe12db7fb50a4b2f691faecea5e,fed4434add09a6f332ea398efd656a5c,2018-04-08,56.04,1.0,2018-04
97903,73c2643a0a458b49f58cea58833b192e,56b18e2166679b8a959d72dd06da27f9,e31ec91cea1ecf97797787471f98a8c2,2017-11-03,711.07,5.0,2017-11


In [17]:
# Nombre de clients : doit être égal à 94703 (cf Notebook 2)
display(df_features['customer_unique_id'].nunique())

94703

In [26]:
# Affichage des types de données
df_features.dtypes.reset_index()

Unnamed: 0,index,0
0,customer_unique_id,object
1,customer_id,object
2,order_id,object
3,order_purchase_timestamp,object
4,total_price,float64
5,satisfaction,float64
6,year_month,object


* Le type de données de la variable `order_purchase_timestamp` doit être transformé en type 'date'

In [27]:
# Conversion de la date d'achat au format datetime "YYYY-MM-DD 00:00:00"
df_features['order_purchase_timestamp'] = pd.to_datetime(df_features['order_purchase_timestamp']).dt.normalize()

# 4 TROUVER TITRE

## 4.1 Liste des labels de clusters au 31/08/2018

In [28]:
# last_date = '2018-08-31'

In [29]:
# df_last_features = get_RFM_features(df_features, last_date)
# df_last_features

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19
2,0000f46a3911fa3c0805444483337064,537,0,86.22
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89
...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69


In [30]:
# df_last_transformed_features, _, _ = add_transformed_RFM_features(df_last_features)
# df_last_transformed_features

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500
...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375


In [31]:
# X = df_last_transformed_features[['Récence_minmax', 'Fréquence_minmax', 'Montant_class_minmax']]

In [75]:
# real_labels = KMeans(n_clusters=6, random_state=0, init='k-means++').fit(X).labels_
# df_last_transformed_features['R_labels'] = real_labels

## 4.2 Calculs des scores ARI

In [166]:
simul_frequency = pd.to_timedelta('100 days')
simul_frequency

Timedelta('100 days 00:00:00')

In [167]:
d0_date = pd.to_datetime("2017-07-01")
d0_date

Timestamp('2017-07-01 00:00:00')

In [168]:
d1_date = d0_date + simul_frequency
d1_date

Timestamp('2017-10-09 00:00:00')

In [169]:
d0_RFM_features = get_RFM_features(df_features, d0_date)
d0_RFM_features

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant
0,0000f46a3911fa3c0805444483337064,537,0,86.22
1,0005e1862207bf6ccc02e4228effd9a0,543,0,150.12
2,00115fc7123b5310cf6d3a3aa932699e,585,0,76.11
3,0011805441c0d1b68b48002f1d005526,492,0,297.14
4,0011857aff0e5871ce5eb429f21cdaf5,427,0,192.83
...,...,...,...,...
13978,ffedff0547d809c90c05c2691c51f9b7,517,0,32.42
13979,ffef0ffa736c7b3d9af741611089729b,457,0,139.07
13980,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42
13981,ffff371b4d645b6ecea244b27531430a,568,0,112.46


In [170]:
d0_RFM_features, d0_recence_trans, d0_montant_trans = add_transformed_RFM_features(d0_RFM_features)
d0_RFM_features

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax
0,0000f46a3911fa3c0805444483337064,537,0,86.22,0.636364,0,2,0.250
1,0005e1862207bf6ccc02e4228effd9a0,543,0,150.12,0.670455,0,4,0.500
2,00115fc7123b5310cf6d3a3aa932699e,585,0,76.11,0.909091,0,2,0.250
3,0011805441c0d1b68b48002f1d005526,492,0,297.14,0.380682,0,6,0.750
4,0011857aff0e5871ce5eb429f21cdaf5,427,0,192.83,0.011364,0,4,0.500
...,...,...,...,...,...,...,...,...
13978,ffedff0547d809c90c05c2691c51f9b7,517,0,32.42,0.522727,0,1,0.125
13979,ffef0ffa736c7b3d9af741611089729b,457,0,139.07,0.181818,0,3,0.375
13980,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.125000,0,8,1.000
13981,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.812500,0,3,0.375


In [171]:
d0_kmeans = KMeans(n_clusters=6, random_state=0, init='k-means++')
d0_kmeans.fit(d0_RFM_features[['Récence_minmax', 'Fréquence_minmax', 'Montant_class_minmax']])

In [172]:
d1_RFM_features = get_RFM_features(df_features, d1_date)
d1_RFM_features

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant
0,0000f46a3911fa3c0805444483337064,537,0,86.22
1,0005e1862207bf6ccc02e4228effd9a0,543,0,150.12
2,0006fdc98a402fceb4eb0ee528f6a8d4,407,0,29.00
3,000a5ad9c4601d2bbdd9ed765d5213b3,383,0,91.28
4,000bfa1d2f1a41876493be685390d6d3,334,0,46.85
...,...,...,...,...
27078,fff3a9369e4b7102fab406a334a678c3,383,0,102.74
27079,fff699c184bcc967d62fa2c6171765f7,362,0,55.00
27080,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42
27081,ffff371b4d645b6ecea244b27531430a,568,0,112.46


In [173]:
d1_simul, _, _ = add_transformed_RFM_features(d1_RFM_features, d0_recence_trans, d0_montant_trans)
d1_simul

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax
0,0000f46a3911fa3c0805444483337064,537,0,86.22,0.768116,0,2,0.250
1,0005e1862207bf6ccc02e4228effd9a0,543,0,150.12,0.789855,0,4,0.500
2,0006fdc98a402fceb4eb0ee528f6a8d4,407,0,29.00,0.297101,0,1,0.125
3,000a5ad9c4601d2bbdd9ed765d5213b3,383,0,91.28,0.210145,0,2,0.250
4,000bfa1d2f1a41876493be685390d6d3,334,0,46.85,0.032609,0,1,0.125
...,...,...,...,...,...,...,...,...
27078,fff3a9369e4b7102fab406a334a678c3,383,0,102.74,0.210145,0,3,0.375
27079,fff699c184bcc967d62fa2c6171765f7,362,0,55.00,0.134058,0,2,0.250
27080,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.442029,0,8,1.000
27081,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.880435,0,3,0.375


In [174]:
d1_simul['Predict'] = d0_kmeans.fit(d1_simul[['Récence_minmax', 'Fréquence_minmax', 'Montant_class_minmax']]).labels_
d1_simul

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,Predict
0,0000f46a3911fa3c0805444483337064,537,0,86.22,0.768116,0,2,0.250,1
1,0005e1862207bf6ccc02e4228effd9a0,543,0,150.12,0.789855,0,4,0.500,1
2,0006fdc98a402fceb4eb0ee528f6a8d4,407,0,29.00,0.297101,0,1,0.125,0
3,000a5ad9c4601d2bbdd9ed765d5213b3,383,0,91.28,0.210145,0,2,0.250,0
4,000bfa1d2f1a41876493be685390d6d3,334,0,46.85,0.032609,0,1,0.125,0
...,...,...,...,...,...,...,...,...,...
27078,fff3a9369e4b7102fab406a334a678c3,383,0,102.74,0.210145,0,3,0.375,2
27079,fff699c184bcc967d62fa2c6171765f7,362,0,55.00,0.134058,0,2,0.250,0
27080,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.442029,0,8,1.000,5
27081,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.880435,0,3,0.375,1


In [175]:
d1_real, _, _ = add_transformed_RFM_features(d1_RFM_features)
d1_real

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,Predict
0,0000f46a3911fa3c0805444483337064,537,0,86.22,0.768116,0,2,0.250,1
1,0005e1862207bf6ccc02e4228effd9a0,543,0,150.12,0.789855,0,4,0.500,1
2,0006fdc98a402fceb4eb0ee528f6a8d4,407,0,29.00,0.297101,0,1,0.125,0
3,000a5ad9c4601d2bbdd9ed765d5213b3,383,0,91.28,0.210145,0,2,0.250,0
4,000bfa1d2f1a41876493be685390d6d3,334,0,46.85,0.032609,0,1,0.125,0
...,...,...,...,...,...,...,...,...,...
27078,fff3a9369e4b7102fab406a334a678c3,383,0,102.74,0.210145,0,3,0.375,2
27079,fff699c184bcc967d62fa2c6171765f7,362,0,55.00,0.134058,0,2,0.250,0
27080,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.442029,0,8,1.000,5
27081,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.880435,0,3,0.375,1


In [176]:
d1_kmeans = KMeans(n_clusters=6, random_state=0, init='k-means++')
d1_kmeans.fit(d1_RFM_features[['Récence_minmax', 'Fréquence_minmax', 'Montant_class_minmax']])

In [177]:
d1_real['Labels'] = d1_kmeans.labels_
d1_real

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,Predict,Labels
0,0000f46a3911fa3c0805444483337064,537,0,86.22,0.768116,0,2,0.250,1,1
1,0005e1862207bf6ccc02e4228effd9a0,543,0,150.12,0.789855,0,4,0.500,1,1
2,0006fdc98a402fceb4eb0ee528f6a8d4,407,0,29.00,0.297101,0,1,0.125,0,0
3,000a5ad9c4601d2bbdd9ed765d5213b3,383,0,91.28,0.210145,0,2,0.250,0,0
4,000bfa1d2f1a41876493be685390d6d3,334,0,46.85,0.032609,0,1,0.125,0,0
...,...,...,...,...,...,...,...,...,...,...
27078,fff3a9369e4b7102fab406a334a678c3,383,0,102.74,0.210145,0,3,0.375,2,2
27079,fff699c184bcc967d62fa2c6171765f7,362,0,55.00,0.134058,0,2,0.250,0,0
27080,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.442029,0,8,1.000,5,5
27081,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.880435,0,3,0.375,1,1


In [178]:
ari_score = adjusted_rand_score(d1_real['Labels'], d1_simul['Predict'])
ari_score

1.0

In [70]:
ari_threshold = 0.6
delay_in_days = simul_frequency = 7
ari_results = []
ari_score = 1

In [71]:
while ari_score > ari_threshold:
    # Calcul de la date de simulation
    simul_date = get_date_x_days_before(last_date, delay_in_days)

    # Récupération des features RFM jusqu'à la date de simulation
    df_simul_RFM_features = get_RFM_features(df_features, simul_date)

    # Ajout des features transformées et récupération des transformers
    df_simul_RFM_features, recence_scaler, montant_scaler = add_transformed_RFM_features(df_simul_RFM_features)

    # Fit KMeans sur données de simulation
    X = df_simul_RFM_features[['Récence_minmax', 'Fréquence_minmax', 'Montant_class_minmax']]
    simul_kmeans = KMeans(n_clusters=6, random_state=0).fit(X)

    # Préparation dataset pour prédictions (utilisation des transformers récupérés)
    df_predict_RFM_features, _, _ = add_transformed_RFM_features(df_last_features, recence_scaler, montant_scaler)

    # Récupération des labels prédits
    X = df_predict_RFM_features[['Récence_minmax', 'Fréquence_minmax', 'Montant_class_minmax']]
    predict_labels = simul_kmeans.predict(X)
    df_predict_RFM_features['P_labels'] = predict_labels

    display(df_last_transformed_features)
    display(df_predict_RFM_features)
    # Calcul du score ARI
    ari_score = adjusted_rand_score(real_labels, predict_labels)

    # Mise à jour de la liste des résultats
    ari_results.append((delay_in_days, ari_score))
    print(f"Simulation à J-{delay_in_days} ({simul_date}): score ARI = {ari_score}")

    # Mise à jour du délai de simulation
    delay_in_days += simul_frequency   

Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,1
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,1
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,4
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,0
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,2
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,2
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,0
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,4
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,1


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,1
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,1
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,4
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,0
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,2
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,2
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,0
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,4
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,1


Simulation à J-7 (2018-08-24): score ARI = 0.6754726174692263


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,2
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,2
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,5
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,0
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,0
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,4
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,0
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,5
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,2


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,2
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,2
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,5
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,0
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,0
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,4
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,0
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,5
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,2


Simulation à J-14 (2018-08-17): score ARI = 0.9731500676322788


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,1
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,1
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,0
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,5
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,5
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,4
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,5
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,0
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,1


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,1
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,1
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,0
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,5
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,5
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,4
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,5
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,0
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,1


Simulation à J-21 (2018-08-10): score ARI = 0.954620943343971


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,1
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,1
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,0
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,4
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,4
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,3
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,4
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,0
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,1


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,1
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,1
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,0
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,4
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,4
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,3
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,4
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,0
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,1


Simulation à J-28 (2018-08-03): score ARI = 0.7415962967067257


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,5
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,0
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,3
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,3
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,5
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,4
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,0
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,3
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,5


Unnamed: 0,customer_unique_id,Récence,Fréquence,Montant,Récence_minmax,Fréquence_minmax,Montant_class,Montant_class_minmax,R_labels,P_labels
0,0000366f3b9a7992bf8c76cfdf3221e2,111,0,141.90,0.184692,0,3,0.375,4,5
1,0000b849f77a49e4a4ce2b2a4ca5be3f,114,0,27.19,0.189684,0,1,0.125,4,0
2,0000f46a3911fa3c0805444483337064,537,0,86.22,0.893511,0,2,0.250,2,3
3,0000f6ccb0745a6a4b88665a16c9f078,321,0,43.62,0.534110,0,1,0.125,0,3
4,0004aac84e0df4da2b147fca70cf8255,288,0,196.89,0.479201,0,4,0.500,0,5
...,...,...,...,...,...,...,...,...,...,...
94698,fffcf5a5ff07b0908bd4e2dbc735a684,447,0,2067.42,0.743760,0,8,1.000,5,4
94699,fffea47cd6d3cc0a88bd621562a9d061,262,0,84.58,0.435940,0,2,0.250,0,0
94700,ffff371b4d645b6ecea244b27531430a,568,0,112.46,0.945092,0,3,0.375,2,3
94701,ffff5962728ec6157033ef9805bacc48,119,0,133.69,0.198003,0,3,0.375,4,5


Simulation à J-35 (2018-07-27): score ARI = 0.41255537638896667
