<a href="https://colab.research.google.com/github/Ebasurtos/Machine-Learning/blob/main/Proyecto2_Clasificaci%C3%B3n_Grupo8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Participants (with participation percentage):

1. Jorge Palacios 35%

2. Eder Basurto 35%

3. Rodolfo Morocho 30%

Project #2 (Classification):

The goal of this project is to classify patients as COVID-19 positive or negative using only the sound of their cough. Your group may use libraries to obtain the feature vector that best represents the cough sound. The dataset contains audio signals from patients with and without COVID-19. It is built from samples collected by COSWARA and Virufy, which are considered highly reliable. There are 1207 coughs from people who tested negative and 150 from people who tested positive for COVID-19.
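
The class counts above imply a strong imbalance, which affects how the metrics should be read. As a small worked example, the sketch below uses only the counts stated in the description and a hypothetical baseline that labels every cough as negative (the baseline is an illustration, not part of the project code).

```python
# Class counts as stated in the project description.
n_negative = 1207
n_positive = 150
n_total = n_negative + n_positive

# Share of positive samples in the dataset.
positive_rate = n_positive / n_total

# Hypothetical baseline: predict "negative" for every cough.
tp, fp = 0, 0                     # no positive predictions at all
fn, tn = n_positive, n_negative   # every positive is missed

accuracy = (tp + tn) / n_total
recall = tp / (tp + fn)

print(f"Positive share: {positive_rate:.3f}")
print(f"All-negative baseline -> accuracy: {accuracy:.3f}, recall: {recall:.3f}")
```

Such a baseline reaches high accuracy while detecting no positive case, which is why Precision, Recall, and F1 on the positive class are more informative here than accuracy.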

Activities:

Use the "Toses" (coughs) dataset and apply the following classification algorithms: Logistic Regression, SVM, Decision Trees, and KNN.
Implement them yourself (graded out of 20) or use libraries (graded out of 15) to classify the dataset with SVM, KNN, and Decision Trees.
Run the training process using K-fold cross-validation and Bootstrap to estimate the error.
In a table, report Precision, Recall, and F1-Score for every hyperparameter setting tested on each model.
Finally, conclude which models give the best results.
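
For reference, the three metrics requested in the table can be computed directly from confusion counts. The following is a minimal pure-Python sketch; the function name `precision_recall_f1` and the toy label lists are hypothetical, and undefined ratios are returned as 0.0 by assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Assumption: report 0.0 when a ratio is undefined (no positive
    # predictions, or no positive samples).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only: 3 positives, 5 negatives.
y_true_demo = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred_demo = [1, 0, 0, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true_demo, y_pred_demo)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A model that never predicts the positive class gets 0.0 for all three metrics under this convention.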


!pip install librosa

import pandas as pd
import numpy as np
import librosa as lb
import os
from sklearn.model_selection import KFold, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.utils import resample

# Assume the dataset is organized into 'Positive' and 'Negative' subdirectories
# (names are case-sensitive) containing the audio files. Adjust base_dir if your data is organized differently.
base_dir = '/content/drive/MyDrive/ML_Data/tos/cleaned_data' # Replace with the actual path to your dataset

# --- Feature Extraction ---
def extract_features(file_path):
  try:
    y, sr = lb.load(file_path)
    # Extract various audio features
    mfccs = lb.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    chroma = lb.feature.chroma_stft(y=y, sr=sr)
    mel = lb.feature.melspectrogram(y=y, sr=sr)
    contrast = lb.feature.spectral_contrast(y=y, sr=sr)
    tonnetz = lb.feature.tonnetz(y=lb.effects.harmonic(y), sr=sr)

    # Aggregate features (e.g., mean)
    features = np.hstack([
        np.mean(mfccs, axis=1),
        np.mean(chroma, axis=1),
        np.mean(mel, axis=1),
        np.mean(contrast, axis=1),
        np.mean(tonnetz, axis=1)
    ])
    return features
  except Exception as e:
    print(f"Error processing file {file_path}: {e}")
    return None

features = []
labels = []

# Process positive samples
positive_dir = os.path.join(base_dir, 'Positive')
if os.path.exists(positive_dir):
  for filename in os.listdir(positive_dir):
    if filename.endswith('.wav'): # Assuming audio files are in WAV format
      file_path = os.path.join(positive_dir, filename)
      extracted_features = extract_features(file_path)
      if extracted_features is not None:
        features.append(extracted_features)
        labels.append(1) # 1 for positive

# Process negative samples
negative_dir = os.path.join(base_dir, 'Negative')
if os.path.exists(negative_dir):
  for filename in os.listdir(negative_dir):
    if filename.endswith('.wav'): # Assuming audio files are in WAV format
      file_path = os.path.join(negative_dir, filename)
      extracted_features = extract_features(file_path)
      if extracted_features is not None:
        features.append(extracted_features)
        labels.append(0) # 0 for negative

X = np.array(features)
y = np.array(labels)

# Handle potential empty dataset if file paths were incorrect or no files found
if X.shape[0] == 0:
    print("No audio files found or processed. Please check the 'base_dir' and file extensions.")
else:
    # --- Data Splitting (Initial split for Bootstrap) ---
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

    # --- Classification Algorithms and Evaluation ---
    models = {
        'Logistic Regression': LogisticRegression(max_iter=1000),
        'SVM': SVC(),
        'Decision Tree': DecisionTreeClassifier(),
        'KNN': KNeighborsClassifier()
    }

    results = {}

    # --- K-Fold Cross-Validation ---
    n_splits_kfold = 5
    kf = KFold(n_splits=n_splits_kfold, shuffle=True, random_state=42)

    print("Performing K-Fold Cross-Validation...")
    for name, model in models.items():
        precision_scores_kf = []
        recall_scores_kf = []
        f1_scores_kf = []

        for fold, (train_index, val_index) in enumerate(kf.split(X)):
            X_train_kf, X_val_kf = X[train_index], X[val_index]
            y_train_kf, y_val_kf = y[train_index], y[val_index]

            model.fit(X_train_kf, y_train_kf)
            y_pred_kf = model.predict(X_val_kf)

            precision_scores_kf.append(precision_score(y_val_kf, y_pred_kf))
            recall_scores_kf.append(recall_score(y_val_kf, y_pred_kf))
            f1_scores_kf.append(f1_score(y_val_kf, y_pred_kf))

        results[f'{name}_KFold'] = {
            'Precision': np.mean(precision_scores_kf),
            'Recall': np.mean(recall_scores_kf),
            'F1-Score': np.mean(f1_scores_kf)
        }
        print(f"{name} K-Fold results calculated.")


    # --- Bootstrap ---
    n_iterations_bootstrap = 100
    bootstrap_scores = {name: {'Precision': [], 'Recall': [], 'F1-Score': []} for name in models.keys()}

    print("\nPerforming Bootstrap...")
    for i in range(n_iterations_bootstrap):
        # Create a bootstrap sample of the training data
        X_train_bs, y_train_bs = resample(X_train, y_train, replace=True, random_state=i)

        for name, model in models.items():
            model.fit(X_train_bs, y_train_bs)
            y_pred_bs = model.predict(X_test) # Evaluate on the original test set

            bootstrap_scores[name]['Precision'].append(precision_score(y_test, y_pred_bs))
            bootstrap_scores[name]['Recall'].append(recall_score(y_test, y_pred_bs))
            bootstrap_scores[name]['F1-Score'].append(f1_score(y_test, y_pred_bs))

        if (i + 1) % 10 == 0:
            print(f"Bootstrap iteration {i + 1}/{n_iterations_bootstrap} complete.")

    for name in models.keys():
        results[f'{name}_Bootstrap'] = {
            'Precision': np.mean(bootstrap_scores[name]['Precision']),
            'Recall': np.mean(bootstrap_scores[name]['Recall']),
            'F1-Score': np.mean(bootstrap_scores[name]['F1-Score'])
        }


    # --- Presentation of Results ---
    results_df = pd.DataFrame.from_dict(results, orient='index')
    print("\nClassification Results:")
    print(results_df)

    # --- Conclusion ---
    print("\nConclusion:")
    # You can analyze the results_df to identify the best performing models based on the metrics.
    # For example, focusing on F1-Score as it balances Precision and Recall, which is important
    # for imbalanced datasets.
    best_model_kfold = results_df.loc[results_df.index.str.contains('KFold'), 'F1-Score'].idxmax()
    best_f1_kfold = results_df.loc[best_model_kfold, 'F1-Score']

    best_model_bootstrap = results_df.loc[results_df.index.str.contains('Bootstrap'), 'F1-Score'].idxmax()
    best_f1_bootstrap = results_df.loc[best_model_bootstrap, 'F1-Score']

    print(f"\nBased on K-Fold Cross-Validation (highest average F1-Score): {best_model_kfold} with F1-Score: {best_f1_kfold:.4f}")
    print(f"Based on Bootstrap (highest average F1-Score on test set): {best_model_bootstrap} with F1-Score: {best_f1_bootstrap:.4f}")

    print("\nFurther analysis is needed to consider the trade-offs between Precision and Recall based on the application's needs.")
    print("For example, a higher Recall might be preferred to avoid missing positive cases, even if it means more false positives.")
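
The cell above splits with plain `KFold` and scores every bootstrap model on one fixed test split. With roughly one positive in nine samples, two common alternatives are stratified folds, which keep the class ratio in every fold, and out-of-bag (OOB) bootstrap evaluation, which scores each resample on the rows it did not draw. The following is a minimal NumPy sketch of the index generation only. The helper names `stratified_kfold_indices` and `bootstrap_oob_indices` are hypothetical, and the label vector is synthetic, built from the class counts in the project description.

```python
import numpy as np


def stratified_kfold_indices(y, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs that keep the class ratio per fold."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    folds = [[] for _ in range(n_splits)]
    # Shuffle each class separately and deal it out across the folds.
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for k, chunk in enumerate(np.array_split(idx, n_splits)):
            folds[k].extend(chunk.tolist())
    all_idx = np.arange(len(y))
    for k in range(n_splits):
        val_idx = np.array(sorted(folds[k]))
        train_idx = np.setdiff1d(all_idx, val_idx)
        yield train_idx, val_idx


def bootstrap_oob_indices(n_samples, n_iterations=100, seed=0):
    """Yield (in_bag, oob) index arrays for each bootstrap resample."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(n_samples)
    for _ in range(n_iterations):
        # Sample with replacement; rows never drawn form the OOB set.
        in_bag = rng.integers(0, n_samples, size=n_samples)
        oob = np.setdiff1d(all_idx, in_bag)
        yield in_bag, oob


# Synthetic labels matching the stated class counts (150 positive, 1207 negative).
y_demo = np.array([1] * 150 + [0] * 1207)

for train_idx, val_idx in stratified_kfold_indices(y_demo):
    print(f"validation size={len(val_idx)}, positives={int(y_demo[val_idx].sum())}")
```

Each model would then be fit on `train_idx` (or `in_bag`) and scored on `val_idx` (or `oob`). scikit-learn's `StratifiedKFold` offers the stratified split as a library class.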






Performing K-Fold Cross-Validation...


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Logistic Regression K-Fold results calculated.
SVM K-Fold results calculated.




Decision Tree K-Fold results calculated.
KNN K-Fold results calculated.

Performing Bootstrap...



Bootstrap iteration 10/100 complete.



Bootstrap iteration 20/100 complete.



Bootstrap iteration 30/100 complete.



Bootstrap iteration 40/100 complete.




Bootstrap iteration 50/100 complete.



Bootstrap iteration 60/100 complete.



Bootstrap iteration 70/100 complete.



Bootstrap iteration 80/100 complete.



Bootstrap iteration 90/100 complete.
Bootstrap iteration 100/100 complete.

Classification Results:
                               Precision  Recall  F1-Score
Logistic Regression_KFold        0.05000   0.100  0.066667
SVM_KFold                        0.00000   0.000  0.000000
Decision Tree_KFold              0.05000   0.200  0.080000
KNN_KFold                        0.00000   0.000  0.000000
Logistic Regression_Bootstrap    0.00000   0.000  0.000000
SVM_Bootstrap                    0.00000   0.000  0.000000
Decision Tree_Bootstrap          0.12794   0.165  0.132040
KNN_Bootstrap                    0.00000   0.000  0.000000
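
The solver message in this output recommends scaling the data, and SVM and KNN both depend on distances between feature vectors, so features on very different scales may be contributing to the zero scores in the table. This is a hypothesis, not something the run confirms. The following is a minimal NumPy sketch of standardization fitted on the training rows only; the helper names and the synthetic matrix `X_demo` are hypothetical stand-ins for the real feature matrix.

```python
import numpy as np


def fit_standardizer(X_train):
    """Compute per-feature mean and standard deviation on training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unchanged instead of dividing by zero
    return mean, std


def apply_standardizer(X, mean, std):
    """Scale any split with statistics learned from the training split."""
    return (X - mean) / std


# Synthetic features on very different scales (illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[0.0, 500.0, -20.0], scale=[1.0, 200.0, 0.01], size=(100, 3))
X_train_demo, X_val_demo = X_demo[:80], X_demo[80:]

mean, std = fit_standardizer(X_train_demo)
X_train_scaled = apply_standardizer(X_train_demo, mean, std)
X_val_scaled = apply_standardizer(X_val_demo, mean, std)

print("train means after scaling:", np.round(X_train_scaled.mean(axis=0), 6))
print("train stds after scaling: ", np.round(X_train_scaled.std(axis=0), 6))
```

Fitting the statistics inside each fold or bootstrap resample, rather than on the full dataset, keeps validation rows from influencing the scaling. In scikit-learn the same effect is usually obtained by placing `StandardScaler` and the classifier in a `Pipeline`.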

Conclusion:

Based on K-Fold Cross-Validation (highest average F1-Score): Decision Tree_KFold with F1-Score: 0.0800
Based on Bootstrap (highest average F1-Score on test set): Decision Tree_Bootstrap with F1-Score: 0.1320

Further analysis is needed to consider the trade-offs between Precision and Recall based on the application's needs.
For example, a higher Recall 