# How to run

1. Create a folder in your Google Drive (for example, "my_data").
2. Add your input data file (for example, data.csv) to this folder as a CSV file.

      The input file must be a CSV file with 16 columns. Each row holds one patient's values for the following variables, in the order listed:

        1. nsyll_pataka
        2. n_pausas_pataka
        3. dur_pataka
        4. phonationtime_pataka
        5. speech_rate_pataka
        6. articulationrate_pataka
        7. asd_pataka
        8. nsyll_monologo
        9. n_pausas_monologo
        10. dur_monologo
        11. phonationtime_monologo
        12. speechrate_monologo
        13. articulationrate_monologo
        14. asd_monologo
        15. pausas_monologo
        16. pause_ratio_monologo

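      Each row of the CSV file must correspond to one patient. The file must not include a header row, because the notebook reads it with `header=None`.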

3. Edit `PATH_TO_INPUT_FILE` and `PATH_TO_OUTPUT_FOLDER` in the cell below.
4. Run all cells in this notebook.
5. Predictions are saved as output.csv in the output folder you specified.
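
The input format described above can be checked before running the notebook. The following is a minimal sketch, not part of the original notebook: `validate_input_csv` and `EXPECTED_COLUMNS` are hypothetical names introduced here for illustration. It uses only the Python standard library and reports rows that do not have 16 values or that contain non-numeric entries (a header row would be flagged this way).

```python
import csv

EXPECTED_COLUMNS = 16  # assumption: matches the 16 variables listed above


def validate_input_csv(path, expected_columns=EXPECTED_COLUMNS):
    """Return a list of human-readable problems found in the CSV at `path`.

    An empty list means every non-blank row has the expected number of
    numeric values.
    """
    problems = []
    with open(path, newline="") as handle:
        for row_number, row in enumerate(csv.reader(handle), start=1):
            if not row:
                # Blank lines are skipped rather than reported.
                continue
            if len(row) != expected_columns:
                problems.append(
                    f"row {row_number}: expected {expected_columns} values, "
                    f"found {len(row)}"
                )
                continue
            for column_number, value in enumerate(row, start=1):
                try:
                    float(value)
                except ValueError:
                    problems.append(
                        f"row {row_number}, column {column_number}: "
                        f"{value!r} is not a number"
                    )
    return problems
```

For example, after mounting the drive, `validate_input_csv(PATH_TO_INPUT_FILE)` should return an empty list if the file matches the expected layout.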

In [None]:
# ADD THE DESIRED PATHS IN YOUR DRIVE HERE

PATH_TO_INPUT_FILE = '/content/drive/MyDrive/my_data/data.csv'
PATH_TO_OUTPUT_FOLDER = '/content/drive/MyDrive/my_data/'

In [None]:
# Import libraries
import pickle
import random

import numpy as np
import pandas as pd

from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')


In [None]:
# Download trained models
!wget https://raw.githubusercontent.com/jmp-3/test-notebook/master/trained_models/kmeans_model.pkl
!wget https://raw.githubusercontent.com/jmp-3/test-notebook/master/trained_models/scaler_model.pkl

In [None]:
# Load the data from Google Drive
loaded_data = pd.read_csv(PATH_TO_INPUT_FILE, header=None)

# Load the trained KMeans model from the pickle file
with open('kmeans_model.pkl', 'rb') as model_file:
    loaded_model = pickle.load(model_file)

# Load the scaler from the separate pickle file
with open('scaler_model.pkl', 'rb') as scaler_file:
    loaded_scaler = pickle.load(scaler_file)

# Transform the new data using the loaded scaler
new_data_scaled = loaded_scaler.transform(loaded_data)

# Assign the new data to clusters using the loaded model
predictions = loaded_model.predict(new_data_scaled)

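# Save the cluster assignments to output.csv, one row per patient (no index, no header).
# PATH_TO_OUTPUT_FOLDER must end with "/" because the file name is appended directly.
pd.DataFrame(predictions).to_csv(PATH_TO_OUTPUT_FOLDER + "output.csv", index=False, header=False)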
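
Once output.csv has been written, it can be useful to see how many patients fall into each cluster. The sketch below is an illustrative addition, not part of the original notebook: `count_clusters` is a hypothetical helper, and it assumes the file contains one integer cluster label per line, which is what the `to_csv` call above produces for KMeans predictions.

```python
import csv
from collections import Counter


def count_clusters(output_csv_path):
    """Count how many rows (patients) were assigned to each cluster label."""
    with open(output_csv_path, newline="") as handle:
        # Each non-blank row holds a single value: the cluster label.
        labels = [int(row[0]) for row in csv.reader(handle) if row]
    return Counter(labels)
```

Calling `count_clusters(PATH_TO_OUTPUT_FOLDER + "output.csv")` returns a `Counter` that maps each cluster label to the number of patients assigned to it.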
