# Train classifier

- Loads hand gesture data from a pickle file.
- Splits the data into training and testing sets.
- Trains a Support Vector Classifier (SVC) model using the training data.
- Evaluates the model's accuracy on the testing data.
- Saves the trained model to a file.

* Loading Data: The script first loads the hand gesture data from a pickle file named 'data.pickle'. This file contains the preprocessed data, including the hand landmarks extracted from the images and their corresponding labels.

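* Splitting Data: After loading the data, it splits it into training and testing sets using the train_test_split function from scikit-learn. The script passes test_size=0.2, giving an 80% training and 20% testing split (the function's default is 25% for testing). It also passes stratify=labels so that each class keeps the same proportion in both sets, and random_state=42 so that the split is reproducible.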

* Model Initialization: Next, it initializes a Support Vector Classifier (SVC) model with a linear kernel (SVC(kernel='linear')). SVC is a supervised learning algorithm for classification tasks, and it tends to handle high-dimensional features such as the hand landmarks extracted from images well. A RandomForestClassifier is left commented out in the code as an alternative.

* Model Training: The initialized SVC model is trained using the training data. The fit method is called on the model object (model.fit(x_train, y_train)), where x_train represents the features (hand landmarks) and y_train represents the corresponding labels.

* Model Evaluation: Once the model is trained, it predicts the labels for the testing data using the predict method (y_predict = model.predict(x_test)). Then, it calculates the accuracy of the model by comparing the predicted labels (y_predict) with the actual labels (y_test). The accuracy_score function from scikit-learn is used for this purpose.

* Saving the Model: Finally, the trained model is saved with pickle to a file named 'model.p', wrapped in a dictionary under the key 'model'. This allows you to reuse the trained model later without retraining it.

* Print Results: The script prints the accuracy, precision, recall, and F1-score on the testing data (the last three as weighted averages over the classes), followed by a per-class classification report. Together these indicate how well the model classifies hand gestures.
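
The save step above can be paired with a matching load step when the model is reused elsewhere. The following is a minimal sketch, assuming the file was written the way the script writes it (a pickled dictionary with the estimator under the key 'model'). The helpers `save_model` and `load_model` are hypothetical names, and the tiny synthetic dataset and temporary directory stand in for the real landmark data and 'model.p' so the example runs on its own.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC


def save_model(model, path):
    """Pickle the model inside a dict, mirroring the script's {'model': model} layout."""
    with open(path, 'wb') as f:
        pickle.dump({'model': model}, f)


def load_model(path):
    """Load the pickled dict and return the estimator stored under 'model'."""
    with open(path, 'rb') as f:
        return pickle.load(f)['model']


# Synthetic stand-in for the landmark features: two well-separated clusters.
rng = np.random.default_rng(0)
x = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 4)),
    rng.normal(1.0, 0.1, size=(20, 4)),
])
y = np.array(['0'] * 20 + ['1'] * 20)

model = SVC(kernel='linear').fit(x, y)

# Write to a temporary directory instead of the working directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'model.p')
    save_model(model, model_path)
    restored = load_model(model_path)

# The restored model should predict exactly like the original one.
same_predictions = np.array_equal(model.predict(x), restored.predict(x))
print("Predictions match after reload:", same_predictions)
```

Pickle files can execute arbitrary code when loaded, so only load a 'model.p' that you created yourself or otherwise trust, and preferably with the same scikit-learn version that saved it.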

---

In [2]:
import pickle
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC  # For SVC
from sklearn.ensemble import RandomForestClassifier # For ensemble
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
import numpy as np

In [3]:
# Read the preprocessed landmark data from the pickle file
with open('./data.pickle', 'rb') as f:
    data_dict = pickle.load(f)
data = np.asarray(data_dict['data'])
labels = np.asarray(data_dict['labels'])
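
Before splitting, it can help to confirm that the loaded arrays have the expected shape and that the classes are balanced. The sketch below uses a made-up `data_dict` with the same two keys ('data' and 'labels') in place of the real pickle file; the sizes (3 classes, 20 samples each, 6 features) are arbitrary assumptions. It also shows what stratify=labels does in the split: every class keeps the same 80/20 proportion.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data_dict: 3 classes, 20 samples each, 6 features per sample.
rng = np.random.default_rng(0)
data_dict = {
    'data': rng.random((60, 6)).tolist(),
    'labels': [str(i) for i in range(3) for _ in range(20)],
}

data = np.asarray(data_dict['data'])
labels = np.asarray(data_dict['labels'])

# Every sample should have the same feature length, giving a 2-D array.
print("data shape:", data.shape)
classes, counts = np.unique(labels, return_counts=True)
print("samples per class:", dict(zip(classes, counts)))

x_train, x_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, random_state=42, shuffle=True, stratify=labels
)

# With stratify=labels, each class contributes the same share to the test set.
_, test_counts = np.unique(y_test, return_counts=True)
print("train size:", len(x_train), "test size:", len(x_test))
print("test samples per class:", test_counts)
```

In the notebook's own output every class has a support of 20 in the test set, which is what a stratified 20% split would produce if each class had 100 samples.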

In [4]:
# Stratified 80/20 split; random_state fixes the shuffle so the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42, shuffle=True, stratify=labels)

# model = RandomForestClassifier()
model = SVC(kernel='linear')  # You can choose different kernels like 'rbf' or 'poly' as well


model.fit(x_train, y_train)

y_predict = model.predict(x_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_predict)
print("Accuracy:", accuracy)

# Calculate precision
precision = precision_score(y_test, y_predict, average='weighted')
print("Precision:", precision)

# Calculate recall
recall = recall_score(y_test, y_predict, average='weighted')
print("Recall:", recall)

# Calculate F1-score
f1 = f1_score(y_test, y_predict, average='weighted')
print("F1-score:", f1)

# Generate a detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_predict))

# Write the trained model (wrapped in a dict) to a file
with open('model.p', 'wb') as f:
    pickle.dump({'model': model}, f)

Accuracy: 0.9961538461538462
Precision: 0.9965034965034967
Recall: 0.9961538461538462
F1-score: 0.9961442066705224

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        20
           1       1.00      1.00      1.00        20
          10       1.00      1.00      1.00        20
          11       1.00      1.00      1.00        20
          12       1.00      1.00      1.00        20
          13       1.00      1.00      1.00        20
          14       1.00      1.00      1.00        20
          15       1.00      1.00      1.00        20
          16       1.00      1.00      1.00        20
          17       1.00      1.00      1.00        20
          18       0.91      1.00      0.95        20
          19       1.00      1.00      1.00        20
           2       1.00      1.00      1.00        20
          20       1.00      1.00      1.00        20
          21       1.00      1.00      1.00       
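
In the report above, class 18 has a precision of 0.91 with a recall of 1.00 and a support of 20. That is consistent with all 20 true samples being found plus 2 samples of other classes being predicted as 18 (20 / 22 ≈ 0.91), and the overall accuracy of 0.99615 matches 518 correct out of 520. A confusion matrix shows which true classes those false positives come from. The sketch below uses made-up labels 'a', 'b' and 'c' (with 'c' playing the role of class 18), not the notebook's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only: two samples of class 'b' are
# misclassified as 'c'; everything else is predicted correctly.
y_test = np.array(['a'] * 20 + ['b'] * 20 + ['c'] * 20)
y_predict = y_test.copy()
y_predict[20:22] = 'c'

class_names = ['a', 'b', 'c']
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_predict, labels=class_names)
print(cm)

# Precision for a class = correct predictions / all predictions of that class.
col = class_names.index('c')
precision_c = cm[col, col] / cm[:, col].sum()
print("Precision for class c:", round(precision_c, 2))

# List the true classes that were wrongly predicted as 'c'.
for row, name in enumerate(class_names):
    if row != col and cm[row, col] > 0:
        print(f"{cm[row, col]} sample(s) of class {name} predicted as c")
```

On the real data, the same call with y_test and y_predict from the notebook would give one row and one column per gesture class.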

# Theory

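### scikit-learn (sklearn)
scikit-learn, imported as sklearn, is the Python machine learning library this script uses for splitting the data, training the model, and computing the metrics. Its main features are: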
- User-Friendly Interface: scikit-learn provides a consistent and easy-to-use interface for various machine learning tasks, including classification, regression, clustering, dimensionality reduction, and more.

- Wide Range of Algorithms: It implements a wide range of machine learning algorithms, including but not limited to:

  - Supervised learning algorithms like Support Vector Machines (SVM), Decision Trees, Random Forests, Gradient Boosting, k-Nearest Neighbors (k-NN), and Neural Networks.
  - Unsupervised learning algorithms like K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA), and Independent Component Analysis (ICA).

- Model Evaluation and Validation: scikit-learn provides tools for model evaluation, including metrics such as accuracy, precision, recall, F1-score, ROC curves, and more. It also offers functions for cross-validation and hyperparameter tuning to improve model performance.

- Data Preprocessing: The library includes various utilities for data preprocessing, such as feature scaling, feature selection, imputation of missing values, and encoding of categorical variables.

- Integration with Other Libraries: scikit-learn works directly with NumPy arrays and SciPy sparse matrices, accepts Pandas DataFrames as input, and its results are easy to plot with Matplotlib, which makes it straightforward to add machine learning to data analysis workflows.
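
The preprocessing and cross-validation features listed above can be combined in a few lines. The following is a minimal sketch on synthetic data, not part of the notebook: the notebook trains the SVC on unscaled features with a single train/test split, so the StandardScaler step and the 5-fold setup are assumptions added for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 3 classes, 30 samples each, separated along every feature.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 5)) for c in range(3)])
y = np.repeat(['0', '1', '2'], 30)

# Scaling lives inside the pipeline, so it is refit on each training fold only
# and no information from the held-out fold leaks into it.
pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear'))

# Stratified folds keep the class proportions equal across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, x, y, cv=cv, scoring='accuracy')

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Averaging over several folds gives a less split-dependent estimate than a single 80/20 split, at the cost of training the model once per fold.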