# Semi-automatic labeling

After many attempts at unsupervised learning, through clustering and other similar techniques, 🤔 the results were still unsatisfactory. I therefore switched to labeling the first video by hand. 💡

The idea was to build an initial dataset with hand-made annotations ✍️, then use those annotations as a reference to train a model that pre-labels the following videos.

This manual labeling gave a solid foundation for segmenting the data. 👍 Because I could review and correct every annotation, the labels were reliable. With these labeled and corrected data as the training set, 🚀 I then retrained my model. Repeating this loop gradually adds newly corrected data to the training set, making the model more accurate over time.

The strength of this method is that it combines human expertise for the initial annotation with the model's ability to generalize from those annotations. 💪 Each batch of corrected data makes the labeling model more accurate, which in turn improves the quality and reliability of the dataset for later machine-learning or data-analysis tasks. 📈📊
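One round of the loop described above can be sketched as follows, assuming each frame is a list of joint angles. This is a minimal illustration, not the notebook's implementation: it swaps the Random Forest used later for a nearest-centroid classifier (hypothetical helpers `fit_centroids`, `predict_labels`, `labeling_round`) so it runs with the standard library only, and it treats angles as plain numbers even though they wrap around at 360°.

```python
import math
from collections import defaultdict


def fit_centroids(rows, labels):
    """Return {label: mean angle vector} computed from the labeled frames."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(values) / len(values) for values in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict_labels(centroids, rows):
    """Give each frame the label of the nearest centroid (Euclidean distance)."""
    return [
        min(centroids, key=lambda label: math.dist(row, centroids[label]))
        for row in rows
    ]


def labeling_round(train_rows, train_labels, new_rows, correct):
    """Train, pre-label a new video, let a human correct it, grow the training set."""
    centroids = fit_centroids(train_rows, train_labels)
    proposed = predict_labels(centroids, new_rows)
    corrected = correct(proposed)  # manual review step (simulated here)
    return train_rows + new_rows, train_labels + corrected, proposed


# Illustrative two-angle frames, not real data from the videos.
train_rows = [[170.0, 180.0], [20.0, 340.0]]
train_labels = ["Gum_Sau", "Start_position"]
new_rows = [[168.0, 182.0], [25.0, 335.0]]

# Simulated annotator who accepts every proposal unchanged.
rows, labels, proposed = labeling_round(
    train_rows, train_labels, new_rows, correct=lambda proposals: list(proposals)
)
```

In practice `correct` is the person editing the exported CSV; the returned `rows` and `labels` become the training set for the next video.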

### Importing the libraries

In [1]:
import math
import cv2
import numpy as np
import pandas as pd
from time import time
import mediapipe as mp
import matplotlib.pyplot as plt
from IPython.display import HTML

### Initialization

In [2]:
# Initializing mediapipe pose class.
mp_pose = mp.solutions.pose

# Setting up the Pose function.
pose = mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.3, model_complexity=2)

# Initializing mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils
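The angle columns loaded below (`Angles_Elbow_Right`, `Angles_Shoulder_Left`, their `_yz` variants, ...) all lie between 0 and 360, which suggests an oriented angle computed from three landmarks with `atan2`. The notebook does not show that step, so the following is only a guess at it. The helper name `joint_angle` is hypothetical; the points would come from MediaPipe landmarks, e.g. `results.pose_landmarks.landmark[mp_pose.PoseLandmark.RIGHT_ELBOW]`, whose `.x`, `.y`, `.z` attributes hold normalized coordinates.

```python
import math


def joint_angle(a, b, c):
    """Angle at vertex b between segments b->a and b->c, in degrees in [0, 360).

    a, b, c are 2-D points such as (landmark.x, landmark.y); passing
    (landmark.y, landmark.z) instead would give an angle in the y-z plane.
    """
    radians = math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    return math.degrees(radians) % 360  # map negative angles into [0, 360)


# Usage with illustrative coordinates (shoulder, elbow, wrist).
right_angle = joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
reflex_angle = joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))
```

Swapping the outer points turns 90° into 270°, which matches the wide 0 to 360 range seen in the data, as opposed to an unsigned angle capped at 180°.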

### Opening the labeled file

In [5]:
# Read the semicolon-delimited CSV file with pandas
df = pd.read_csv("ALONG_3_label.csv", delimiter=';')

# Display the last rows
df.tail()

Unnamed: 0,Angles_Elbow_Right,Angles_Elbow_Left,Angles_Shoulder_Right,Angles_Shoulder_Left,Angles_Hip_Right,Angles_Hip_Left,Angles_Wrist_Right,Angles_Wrist_Left,Angles_Shoulder_lateral_Left,Angles_Shoulder_lateral_right,...,Angles_Wrist_Left_yz,Angles_Shoulder_lateral_Left_yz,Angles_Shoulder_lateral_right_yz,Angles_Shoulder_lateral2_Left_yz,Angles_Shoulder_lateral2_right_yz,Angles_Hand_Left_yz,Angles_Hand_right_yz,Angles_Hand_lateral_Left_yz,Angles_Hand_lateral_right_yz,y
1365,63.786452,303.293563,23.747152,337.871317,344.413201,14.189892,296.565051,330.255119,317.919947,44.077461,...,330.255119,333.823017,316.505498,39.899371,211.715503,41.471239,28.315211,221.723975,203.260546,Start_position
1366,56.667632,295.48227,23.309335,338.264166,344.399763,14.307447,304.380345,315.0,318.226123,43.666024,...,315.0,341.491824,319.655422,50.186363,217.358843,46.558127,28.024087,228.033782,202.64159,Start_position
1367,46.735705,284.172338,22.621607,337.966237,344.613052,14.642103,315.0,296.565051,318.128189,42.881052,...,296.565051,344.238862,325.648787,53.902537,223.989976,49.42051,28.171413,231.085093,202.391575,Start_position
1368,46.735705,283.626995,22.621607,338.276468,344.613052,14.406991,315.0,333.434949,318.486704,42.881052,...,333.434949,344.488864,324.574753,51.607863,220.683015,47.20464,27.774596,228.013947,202.261942,Start_position
1369,56.667632,238.897176,23.309335,337.972721,344.399763,14.229592,304.380345,18.434949,318.117507,43.56878,...,18.434949,344.415231,323.664808,53.32629,222.227076,46.611595,27.941498,227.604058,202.592614,Start_position


In [4]:
# Read the raw Excel labels with pandas
ajout = pd.read_excel("William_Label_brut.xlsx")

# Display the first rows
ajout.head()

Unnamed: 0,Angles_Elbow_Right,Angles_Elbow_Left,Angles_Shoulder_Right,Angles_Shoulder_Left,Angles_Hip_Right,Angles_Hip_Left,Angles_Wrist_Right,Angles_Wrist_Left,Angles_Shoulder_lateral_Left,Angles_Shoulder_lateral_right,Angles_Shoulder_lateral2_Left,Angles_Shoulder_lateral2_right,Angles_Hand_Left,Angles_Hand_right,Angles_Hand_lateral_Left,Angles_Hand_lateral_right,Unnamed: 16,Unnamed: 17
0,172.133232,183.938451,17.111696,340.915051,339.267594,21.056449,185.351818,169.056434,318.507147,41.469608,260.817211,101.835099,327.658509,40.331352,158.602075,214.979533,p,1.0
1,170.578138,183.767124,16.25623,342.258208,341.747236,18.686244,189.974164,169.121395,320.394008,39.805049,261.469234,101.712472,327.658509,42.730319,158.537114,212.756155,p,2.0
2,169.633677,183.700346,16.948267,343.499624,342.547656,17.270919,187.765166,168.570215,322.884514,39.03028,261.469234,100.366323,323.514936,42.273689,154.944721,214.508523,p,3.0
3,163.526006,178.88399,18.828161,341.045054,344.377575,15.641052,191.376594,180.161169,320.079608,41.058379,258.690068,102.528808,315.0,42.969086,134.838831,211.592492,p,4.0
4,130.255556,171.221843,19.367484,326.100667,345.710484,16.32642,213.006651,197.02632,304.610395,41.225669,243.830546,101.771821,294.633188,75.296448,97.606868,222.289797,p,5.0


In [5]:
# Drop the extra frame-counter column exported with the Excel sheet
ajout.drop(columns='Unnamed: 17', inplace=True)

In [10]:
# Read the semicolon-delimited CSV file with pandas
ajout = pd.read_csv("William_3_label.csv", delimiter=';')
# Display the first rows
ajout.head()

Unnamed: 0,Angles_Elbow_Right,Angles_Elbow_Left,Angles_Shoulder_Right,Angles_Shoulder_Left,Angles_Hip_Right,Angles_Hip_Left,Angles_Wrist_Right,Angles_Wrist_Left,Angles_Shoulder_lateral_Left,Angles_Shoulder_lateral_right,...,Angles_Wrist_Left_yz,Angles_Shoulder_lateral_Left_yz,Angles_Shoulder_lateral_right_yz,Angles_Shoulder_lateral2_Left_yz,Angles_Shoulder_lateral2_right_yz,Angles_Hand_Left_yz,Angles_Hand_right_yz,Angles_Hand_lateral_Left_yz,Angles_Hand_lateral_right_yz,y
0,170.23165,180.393685,18.073954,347.019086,341.517145,14.084278,182.072298,168.774664,325.876542,41.43362,...,168.774664,15.427591,333.558171,102.267409,250.929051,353.272515,348.163188,178.180314,166.199406,Gum_Sau
1,170.774613,185.755107,16.333621,343.284605,343.757714,16.660607,185.996219,170.923632,322.185043,38.47622,...,170.923632,16.964148,336.5209,114.176952,266.203347,347.901199,351.229979,170.298454,170.71379,Gum_Sau_L+Gum_Sau_R
2,170.106079,183.743734,13.884834,344.665857,346.56105,15.861859,194.036243,174.47836,324.6525,35.634629,...,174.47836,28.164095,334.079126,121.61245,265.005383,353.345994,349.204588,175.916925,167.987304,Gum_Sau_L+Gum_Sau_R
3,167.05349,185.201396,15.990376,344.861068,344.857979,15.146799,193.404866,168.527527,324.35287,37.466707,...,168.527527,22.282089,327.001253,116.973918,260.426671,351.288055,346.747194,170.682022,160.887226,Gwat_Sau_L
4,164.357754,180.428639,15.347643,340.756696,348.172003,15.385819,196.220972,176.56637,319.960673,37.050641,...,176.56637,17.901458,296.643743,119.314751,232.520244,352.00296,348.420998,144.480971,164.758839,Start_position


In [11]:
print(df.shape)

# Concatenate the two DataFrames with pandas.concat;
# ignore_index=True rebuilds a fresh 0..n-1 index for the merged DataFrame
df = pd.concat([df, ajout], ignore_index=True)

# Display the shape of the merged DataFrame
df.shape

(3610, 49)


(4510, 49)
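`pd.concat` aligns columns by name and fills any column missing from one side with NaN, so a stray column such as `Unnamed: 16` would silently widen the result. A minimal pre-check, assuming both files are semicolon-delimited with the header on the first line, compares the headers before concatenating; `read_header` and `check_same_columns` are hypothetical helpers, shown here on in-memory samples.

```python
import csv
import io


def read_header(stream, delimiter=";"):
    """Return the column names from the first line of a delimited file."""
    return next(csv.reader(stream, delimiter=delimiter))


def check_same_columns(left, right):
    """Raise ValueError when two headers do not list the same columns."""
    only_left = [name for name in left if name not in right]
    only_right = [name for name in right if name not in left]
    if only_left or only_right:
        raise ValueError(
            f"columns only in left: {only_left}; columns only in right: {only_right}"
        )


# In-memory samples; with real files, pass open("ALONG_3_label.csv", newline="") instead.
base = io.StringIO("Angles_Elbow_Right;Angles_Elbow_Left;y\n10.0;20.0;Start_position\n")
extra = io.StringIO("Angles_Elbow_Right;Angles_Elbow_Left;y;Unnamed: 16\n10.0;20.0;Punch_L;p\n")

try:
    check_same_columns(read_header(base), read_header(extra))
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)
```

Column order is deliberately ignored, since `pd.concat` matches columns by name rather than position.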

In [5]:
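# Drop the leftover placeholder column if it exists (it is absent when only CSV files were merged)
df = df.drop(columns='Unnamed: 16', errors='ignore')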

In [15]:
df['y'].unique()

array(['Start_position', 'Double_Tan_Sau', 'Double_Gan_Sau',
       'Double_Kwan_Sau', 'Make_Fist_L', 'Punch_L', 'Tan_Sau_L',
       'Huen_Sau_L', 'Make_Fist_R', 'Punch_R', 'Tan_Sau_R', 'Huen_Sau_R',
       'Wu_Sau_L', 'Fook_Sau_L', 'Pak_Sau_L', 'Palm_Strike_L', 'Wu_Sau_R',
       'Fook_Sau_R', 'Pak_Sau_R', 'Palm_Strike_R', 'Gum_Sau',
       'Double_Rear_Palm', 'Double_Forward_Palm_prep',
       'Double_Forward_Palm', 'Double_Lan_Sau_L', 'Double_Fak_Sau',
       'Double_Lan_Sau_R', 'Double_Jam_Sau', 'Double_Tok_Sau_0',
       'Double_Tok_Sau_1', 'Double_Jut_Sau_0', 'Double_Jut_Sau_1',
       'Double_Biu_Sau', 'Double_Gum_Sau', 'Double_Tai_Sau',
       'Double_Huen_Sau', 'Side_Palm_L', 'Side_Palm_R', 'Jam_Sau_L',
       'Gwat_Sau_L_m', 'Gwat_Sau_L', 'Lau_Sao_L', 'Low_Side_Palm_L',
       'Jam_Sau_R', 'Gwat_Sau_R_m', 'Gwat_Sau_R', 'Lau_Sao_R',
       'Low_Side_Palm_R', 'Bong_Sau_L', 'Down_Low_Palm_L',
       'Hight_Low_Palm_L', 'Bong_Sau_R', 'Down_Low_Palm_R',
       'Hight_Low_Palm_R', 

In [14]:
# Remove the rows whose 'y' label is one of the values below (transitions and placeholders)
values_to_remove = ['Beginning', 'Bong_Sau_L/Tan_Sau_L', 'Bong_Sau_R/Tan_Sau_R', 
                              'Double_Biu_Sau/Double_Gum_Sau', 'Double_Fak_Sau/Double_Lan_Sau_R',
                              'Double_Forward_Palm/Double_Lan_Sau_L', 'Double_Gan_Sau/Double_Kwan_Sau',
                              'Double_Gum_Sau/Double_Tai_Sau', 'Double_Huen_Sau/Start_position', 
                              'Double_Jam_Sau/Double_Tok_Sau_0', 'Double_Jam_Sau/Double_Tok_Sau_1',
                              'Double_Jut_Sau_1/Double_Biu_Sau', 'Double_Kwan_Sau/Start_position',
                              'Double_Lan_Sau_L/Double_Fak_Sau', 'Double_Lan_Sau_R/Double_Jam_Sau',
                              'Double_Rear_Palm/Double_Forward_Palm_prep', 'Double_Tan_Sau/Double_Gan_Sau',
                              'Down_Low_Palm_L/Hight_Low_Palm_L', 'Down_Low_Palm_R/Hight_Low_Palm_R',
                              'Fook_Sau_L/Huen_Sau_L', 'Fook_Sau_R/Huen_Sau_R', 'Gan_Sau_L+Tan_Sau_R',
                              'Gum_Sau/Double_Rear_Palm', 'Gwat_Sau_L/Lau_Sao_L', 'Gwat_Sau_R/Lau_Sao_R',
                              'Huen_Sau_L/Start_position', 'Huen_Sau_L/Wu_Sau_L', 'Huen_Sau_R/Start_position',
                              'Huen_Sau_R/Wu_Sau_R', 'Lau_Sao_L/Low_Side_Palm_L', 'Lau_Sao_R/Low_Side_Palm_R',
                              'Make_Fist_L/Punch_L', 'Make_Fist_R/Punch_R', 'Pak_Sau_R/Wu_Sau_R',
                              'Palm_Strike_R/Tan_Sau_R', 'Punch_L/Punch_R', 'Punch_L/Tan_Sau_L',
                              'Punch_R/Punch_L', 'Punch_R/Tan_Sau_R', 'Start_position/Bong_Sau_L',
                              'Start_position/Bong_Sau_R', 'Start_position/Double_Tan_Sau', 
                              'Start_position/Gan_Sau_L', 'Start_position/Gum_Sau', 'Start_position/Make_Fist_L',
                              'Start_position/Make_Fist_R', 'Start_position/Pak_Sau_L', 'Start_position/Pak_Sau_R',
                              'Start_position/Tan_Sau_L', 'Start_position/Tan_Sau_R', 'Tan_Sau_L/Huen_Sau_L',
                              'Tan_Sau_L/Jam_Sau_L', 'Tan_Sau_R/Huen_Sau_R','Tan_Sau_L/Wu_Sau_L', 'Tan_Sau_R/Jam_Sau_R', 
                              'Tut_Sau/Punch_L', 'Wu_Sau_L/Fook_Sau_L', 'Wu_Sau_L/Palm_Strike_L',
                              'Wu_Sau_L/Side_Palm_L', 'Wu_Sau_R/Fook_Sau_R', 'Wu_Sau_R/Palm_Strike_R', 
                              'Wu_Sau_R/Side_Palm_R','Fook_Sau_L/Wu_Sau_L',
                               'Pak_Sau_L/Palm_Strike_L', 'Punch_L/Start_position',
                               'Fook_Sau_R/Wu_Sau_R', 'Wu_Sau_R/Pak_Sau_R',
                               'Start_position/Gum_Sau_L','Punch_R/Start_position', 'Fook_Sau_L/Tan_Sau_L','p']
df = df[~df['y'].isin(values_to_remove)]

# Display the filtered dataset
df

Unnamed: 0,Angles_Elbow_Right,Angles_Elbow_Left,Angles_Shoulder_Right,Angles_Shoulder_Left,Angles_Hip_Right,Angles_Hip_Left,Angles_Wrist_Right,Angles_Wrist_Left,Angles_Shoulder_lateral_Left,Angles_Shoulder_lateral_right,...,Angles_Wrist_Left_yz,Angles_Shoulder_lateral_Left_yz,Angles_Shoulder_lateral_right_yz,Angles_Shoulder_lateral2_Left_yz,Angles_Shoulder_lateral2_right_yz,Angles_Hand_Left_yz,Angles_Hand_right_yz,Angles_Hand_lateral_Left_yz,Angles_Hand_lateral_right_yz,y
62,280.544759,44.812672,24.479205,338.624318,336.783444,22.445813,180.000000,180.842524,320.208409,44.215271,...,180.842524,280.896513,290.555143,231.857988,51.244959,53.871957,46.403103,235.304852,227.112401,Start_position
63,347.945740,25.024895,25.231930,338.074500,337.245238,19.044790,20.854458,225.000000,317.965858,45.966482,...,225.000000,299.721610,314.364418,229.407339,53.603192,50.307283,41.095997,231.427437,217.628955,Start_position
64,355.601295,24.817131,25.831025,337.239648,337.476868,18.538332,49.398705,243.434949,316.707429,46.606003,...,243.434949,301.184484,316.701289,226.772764,52.184541,53.120161,44.826087,232.443740,222.639864,Start_position
65,351.358559,31.649768,25.261576,338.973627,338.366838,16.831344,170.537678,241.313852,318.732757,45.818758,...,241.313852,308.721253,318.891243,227.631905,52.663133,53.936127,50.342723,233.395831,231.168302,Start_position
66,13.213496,19.849996,26.679543,339.476666,335.005518,17.972869,147.264774,222.594029,318.838041,47.759176,...,222.594029,312.030182,319.935025,230.533776,53.311055,59.151704,57.510781,240.275743,238.774381,Start_position
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4505,28.703255,352.950822,24.157019,334.475121,338.893115,20.556045,188.130102,166.759480,312.850747,45.774515,...,166.759480,333.115129,328.074249,242.466008,42.905843,79.087171,71.799675,262.621637,250.852017,Start_position
4506,26.565051,349.000611,23.780394,334.590615,338.517571,21.234372,213.690068,173.198685,313.297426,45.000000,...,173.198685,335.189793,329.830907,244.697491,44.247348,76.994146,72.991416,265.043223,250.128187,Start_position
4507,19.313630,337.825738,34.892543,330.039076,329.390937,23.955995,206.903575,166.917009,309.063718,55.601111,...,166.917009,331.655793,326.273335,252.592256,45.555088,64.038592,73.584872,249.659328,253.520788,Palm Strike
4508,38.962510,325.058943,33.667807,334.724645,333.714508,21.405323,214.883557,172.532923,314.244287,53.947876,...,172.532923,333.706190,335.386363,253.567496,47.951445,89.600002,81.700513,273.208835,257.223675,Start_position
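Almost every entry of `values_to_remove` is a transition frame written as `A/B`; only `'Beginning'`, `'p'` and `'Gan_Sau_L+Tan_Sau_R'` contain no slash. A rule-based alternative, sketched here under the assumption that every label containing `/` is a transition, avoids maintaining the list by hand and also catches missing labels (NaN). `is_unwanted` and `EXTRA_LABELS_TO_REMOVE` are hypothetical names. Note that it removes more than the hand-written list: any new `A/B` label would also be dropped.

```python
# Hypothetical rule: any "A/B" label marks a transition between two moves.
# The set below repeats the entries of values_to_remove that contain no "/".
EXTRA_LABELS_TO_REMOVE = {"Beginning", "p", "Gan_Sau_L+Tan_Sau_R"}


def is_unwanted(label):
    """True for missing labels, transition frames and placeholder labels."""
    if not isinstance(label, str):  # NaN for frames that were never annotated
        return True
    return "/" in label or label in EXTRA_LABELS_TO_REMOVE


labels = ["Start_position", "Punch_L/Punch_R", "p", "Gum_Sau", float("nan")]
kept = [label for label in labels if not is_unwanted(label)]
```

With pandas, the same filter would read `df = df[~df['y'].map(is_unwanted)]`.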


In [16]:
# Save the DataFrame to CSV
df.to_csv('df_3d_label.csv', index=False)

In [8]:
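from sklearn.model_selection import train_test_split

# Drop frames that were never labeled: NaN mixed with string labels
# makes scikit-learn fail when it sorts the classes
df = df.dropna(subset=['y'])

# Features: the angle columns. If the CSV export kept an old index as an
# 'Unnamed: 0' column, drop it too: it is a row number, not a pose feature
X = df.drop(columns=['y', 'Unnamed: 0'], errors='ignore')
y = df['y']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)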

In [9]:
y

Unnamed: 0,y
0,Start_position
1,Start_position
2,Start_position
3,Start_position
4,Start_position
...,...
4365,Double_Huan_Sau
4366,Start_position
4367,Start_position
4368,Start_position


In [14]:
import pandas as pd
import numpy as np
from sklearn import set_config
set_config(display='diagram')

from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler, OneHotEncoder
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier, StackingClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier

def local_cv(X_train, y_train, model):
    cv = cross_validate(
        model,
        X_train,
        y_train,
        scoring=[
            'accuracy',
            'precision_macro',
            'recall_macro',
            'f1_macro'
        ],
        cv=10,
        n_jobs=-1
    )
    
    accuracy = cv.get('test_accuracy').mean().round(3)
    precision = cv.get('test_precision_macro').mean().round(3)
    recall = cv.get('test_recall_macro').mean().round(3)
    f1 = cv.get('test_f1_macro').mean().round(3)
    
    return accuracy, precision, recall, f1

models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    KNeighborsClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier(),
]

results = []

for model in models:
    accuracy, precision, recall, f1 = local_cv(X_train, y_train, model)
    results.append({
        'Model': model.__class__.__name__,
        'Accuracy': accuracy,
        'Precision': precision,
        'Recall': recall,
        'F1': f1
    })

results_df = pd.DataFrame(results)
print(results_df)


TypeError: '<' not supported between instances of 'float' and 'str'
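The `TypeError` above is most likely caused by frames with no label: pandas reads an empty `y` cell as the float NaN, and scikit-learn sorts the class labels when it determines the classes (for example for the stratified folds of `cross_validate`), which fails on a mix of floats and strings. The snippet reproduces the same message with plain Python sorting and shows that keeping only string labels removes it.

```python
# Labels as they might come out of df['y'] when some frames were never annotated:
# pandas stores a missing label as float('nan') next to the string labels.
labels = ["Start_position", float("nan"), "Punch_L"]

try:
    sorted(labels)
    message = ""
except TypeError as exc:
    message = str(exc)

# Keeping only string labels (df.dropna(subset=['y']) in pandas) avoids the mixed types.
clean = sorted(label for label in labels if isinstance(label, str))
```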

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters to search
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    # 'auto' was an alias of 'sqrt' for classifiers and is rejected by recent scikit-learn releases
    'max_features': ['sqrt', 'log2']
}

# Create a RandomForestClassifier instance
model = RandomForestClassifier(random_state=42)

# Create the GridSearchCV instance
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Run the search for the best parameters
grid_search.fit(X_train, y_train)

# Display the best parameters and the best cross-validated score
print("Best parameters:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)


In [None]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=42)

# ravel() flattens the labels to the 1-D array scikit-learn expects
model.fit(X_train, y_train.to_numpy().ravel())

# Predict on the test data
y_pred = model.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Accuracy: share of test frames whose label is predicted correctly
accuracy = accuracy_score(y_test, y_pred)

# Macro precision: precision of each class, averaged with equal weight;
# zero_division=0 scores classes that are never predicted as 0 without a warning
precision = precision_score(y_test, y_pred, average='macro', zero_division=0)

# Macro recall
recall = recall_score(y_test, y_pred, average='macro', zero_division=0)

# Macro F1 score
f1 = f1_score(y_test, y_pred, average='macro', zero_division=0)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)



In [None]:
# Load the angles of the new, unlabeled video into a DataFrame
new_data = pd.read_csv("William.csv")

# Display the first rows of the DataFrame
new_data.head()


In [15]:
# Predict labels for the new dataset, using the same feature columns
# in the same order as during training
predictions = model.predict(new_data[X_train.columns])

# Inspect the predicted labels
print(predictions)


NameError: name 'new_data' is not defined

In [16]:
# Store the predictions as the label column, ready for manual correction
new_data['y'] = predictions

NameError: name 'predictions' is not defined

In [17]:
new_data

NameError: name 'new_data' is not defined

In [22]:
# Save the pre-labeled DataFrame to CSV
new_data.to_csv('William_Label_brut.csv', index=False)
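The introduction states that the model becomes more accurate as corrected videos are added. One hypothetical way to measure that, once `William_Label_brut.csv` has been corrected by hand, is the share of predicted labels the annotator had to change; a rate that falls from one video to the next would support the claim. `correction_rate` is an illustrative helper, not part of the notebook.

```python
def correction_rate(predicted, corrected):
    """Fraction of frames whose predicted label was changed during manual review."""
    if len(predicted) != len(corrected):
        raise ValueError("predicted and corrected labels must cover the same frames")
    if not predicted:
        return 0.0
    changed = sum(p != c for p, c in zip(predicted, corrected))
    return changed / len(predicted)


predicted = ["Start_position", "Punch_L", "Punch_L", "Start_position"]
corrected = ["Start_position", "Punch_L", "Punch_R", "Start_position"]
rate = correction_rate(predicted, corrected)  # one frame out of four was changed
```

With pandas it could be called as `correction_rate(list(new_data['y']), list(corrected_df['y']))`, where `corrected_df` is a hypothetical DataFrame holding the reviewed file.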