<a href="https://colab.research.google.com/github/cris-py-code/burger-bot/blob/main/knn_recommendation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Create a K-Nearest Neighbors (KNN) example for recommending treatments based on clinical profiles. The example should simulate patient data with features like age, tumor size, abnormal cell count, and CA15-3 protein level, and recommend the most frequent treatment among the k nearest neighbors for a new patient.

## Generate synthetic data

### Subtask:
Create a synthetic dataset that simulates patients' clinical profiles and their associated treatments.


**Reasoning**:
The instructions require creating a synthetic dataset using pandas and numpy. I will import the necessary libraries and create a dictionary with synthetic data, then convert it into a pandas DataFrame.



In [None]:
import pandas as pd
import numpy as np

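# 25 synthetic patients. No seed is set, so the values change on every run.
# Treatments are drawn at random, independently of the clinical features.
data = {
    'Age': np.random.randint(30, 80, 25),                  # integers in [30, 80)
    'Tumor Size': np.random.uniform(1.0, 10.0, 25),        # floats in [1.0, 10.0)
    'Abnormal Cell Count': np.random.randint(100, 1000, 25),
    'CA15-3 Level': np.random.uniform(15.0, 100.0, 25),
    'Treatment': np.random.choice(['Chemotherapy', 'Radiation', 'Surgery', 'Immunotherapy'], 25)
}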

df_patients = pd.DataFrame(data)
display(df_patients.head())

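|   | Age | Tumor Size | Abnormal Cell Count | CA15-3 Level | Treatment     |
|---|-----|------------|---------------------|--------------|---------------|
| 0 | 59  | 4.020702   | 230                 | 16.376949    | Chemotherapy  |
| 1 | 46  | 2.131193   | 773                 | 30.784707    | Immunotherapy |
| 2 | 49  | 3.224266   | 634                 | 82.129825    | Chemotherapy  |
| 3 | 43  | 2.462785   | 301                 | 64.768126    | Radiation     |
| 4 | 72  | 4.803267   | 760                 | 36.817903    | Radiation     |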
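Because the generation cell sets no seed, the rows shown above will differ on every run. The following is a minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator and an arbitrary seed of 42; the name `df_seeded` is hypothetical and is not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (42 is arbitrary) so the synthetic table is reproducible.
rng = np.random.default_rng(42)
n_patients = 25

df_seeded = pd.DataFrame({
    'Age': rng.integers(30, 80, n_patients),            # upper bound is exclusive
    'Tumor Size': rng.uniform(1.0, 10.0, n_patients),
    'Abnormal Cell Count': rng.integers(100, 1000, n_patients),
    'CA15-3 Level': rng.uniform(15.0, 100.0, n_patients),
    'Treatment': rng.choice(
        ['Chemotherapy', 'Radiation', 'Surgery', 'Immunotherapy'], n_patients
    ),
})

print(df_seeded.head())
```

Re-running this cell yields the same table each time, which makes later recommendations comparable across runs.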


## Implement KNN

### Subtask:
Develop the logic to compute distances, find the nearest neighbors, and determine the recommendation.


**Reasoning**:
Import the necessary modules, separate features and target, split the data, create a KNN classifier, and train the model as per the instructions.



In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X = df_patients[['Age', 'Tumor Size', 'Abnormal Cell Count', 'CA15-3 Level']]
y = df_patients['Treatment']

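# Hold out 20% of the patients (5 of 25); the remaining 20 form the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a 5-nearest-neighbors classifier; the features are used unscaled.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)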
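The classifier above is fitted on raw feature values, so a feature with a wide numeric range (Abnormal Cell Count spans hundreds) can dominate the distance over a narrow one (Tumor Size spans single digits). The sketch below shows one common remedy, standardizing the features inside a scikit-learn pipeline. It is an optional variant rather than what the notebook does, and `X_demo`, `y_demo`, `patient`, and `scaled_knn` are hypothetical stand-ins built from seeded random data.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data with the notebook's column names (assumption: seed 0, 25 rows).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'Age': rng.integers(30, 80, 25),
    'Tumor Size': rng.uniform(1.0, 10.0, 25),
    'Abnormal Cell Count': rng.integers(100, 1000, 25),
    'CA15-3 Level': rng.uniform(15.0, 100.0, 25),
})
y_demo = rng.choice(['Chemotherapy', 'Radiation', 'Surgery', 'Immunotherapy'], 25)

# The spread of each column shows how unequal the raw scales are.
print(X_demo.std().round(1))

# Standardize each feature to zero mean and unit variance before measuring distance.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_demo, y_demo)

patient = pd.DataFrame({'Age': [60], 'Tumor Size': [7.5],
                        'Abnormal Cell Count': [500], 'CA15-3 Level': [65.0]})
print(scaled_knn.predict(patient)[0])
```

Keeping the scaler inside the pipeline means the new patient is transformed with the statistics learned from the training rows, with no separate scaling step to remember.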

## Make a recommendation

### Subtask:
Apply the KNN model to a new patient to obtain a treatment recommendation.


**Reasoning**:
Create a new DataFrame for a new patient with the same features as the training data and then use the trained KNN model to predict the treatment for this new patient. Finally, print the recommended treatment.



In [None]:
new_patient_data = {
    'Age': [60],
    'Tumor Size': [7.5],
    'Abnormal Cell Count': [500],
    'CA15-3 Level': [65.0]
}

new_patient_df = pd.DataFrame(new_patient_data)

recommended_treatment = knn.predict(new_patient_df)

print(f"The recommended treatment for the new patient is: {recommended_treatment[0]}")

The recommended treatment for the new patient is: Radiation
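A single predicted label hides which patients drove the vote. The sketch below uses `KNeighborsClassifier.kneighbors` to list the five nearest training rows and tally their treatments. It runs on hypothetical stand-in data (`X_demo`, `y_demo`, `model`, `patient`) rather than the notebook's `knn` and `X_train`, so its output will not match the recommendation printed above.

```python
from collections import Counter

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Stand-in training data with the notebook's column names (assumption: seed 0, 20 rows).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'Age': rng.integers(30, 80, 20),
    'Tumor Size': rng.uniform(1.0, 10.0, 20),
    'Abnormal Cell Count': rng.integers(100, 1000, 20),
    'CA15-3 Level': rng.uniform(15.0, 100.0, 20),
})
y_demo = pd.Series(rng.choice(['Chemotherapy', 'Radiation', 'Surgery', 'Immunotherapy'], 20))

model = KNeighborsClassifier(n_neighbors=5).fit(X_demo, y_demo)
patient = pd.DataFrame({'Age': [60], 'Tumor Size': [7.5],
                        'Abnormal Cell Count': [500], 'CA15-3 Level': [65.0]})

# kneighbors returns distances and positional indices into the training rows,
# sorted from nearest to farthest.
distances, indices = model.kneighbors(patient)
neighbors = X_demo.iloc[indices[0]].assign(
    Treatment=y_demo.iloc[indices[0]].values,
    Distance=distances[0],
)
print(neighbors)

# Tally the neighbors' treatments and compare with the model's own prediction.
votes = Counter(neighbors['Treatment'])
prediction = model.predict(patient)[0]
print(dict(votes), '->', prediction)
```

With the notebook's own objects, the equivalent call would be `knn.kneighbors(new_patient_df)`, with the indices looked up in `X_train` and `y_train` via `.iloc`.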


## Evaluate and explain

### Subtask:
Explain how the model works and how the recommendation was reached.


**Reasoning**:
Explain the KNN model and the recommendation process in markdown.






In [None]:
print("""
## Explanation of the K-Nearest Neighbors (KNN) Model for Treatment Recommendation

The K-Nearest Neighbors (KNN) algorithm is a simple yet effective supervised machine learning model used for both classification and regression tasks. In this scenario, we are using KNN for classification to recommend a treatment based on a patient's clinical profile.

**How KNN Works:**

1.  **Distance Calculation:** When predicting the treatment for a new patient, the KNN model calculates the "distance" between the new patient's clinical features (Age, Tumor Size, Abnormal Cell Count, CA15-3 Level) and the features of all the patients in the training dataset (`X_train`). This distance is typically calculated using a distance metric, such as Euclidean distance. The smaller the distance, the more similar the two patients' clinical profiles are considered to be.

2.  **Finding the Nearest Neighbors:** After calculating the distances, the model identifies the `k` patients in the training dataset that are closest (have the smallest distances) to the new patient. The value of `k` is a parameter that is set beforehand (in our case, `n_neighbors=5`). These `k` closest patients are the "nearest neighbors".

3.  **Determining the Recommendation:** Once the `k` nearest neighbors are identified, the model looks at the treatments (`y_train`) that these neighbors received. The recommended treatment for the new patient is the treatment that is most frequent among these `k` nearest neighbors. This is a majority voting process.

**Recommendation for the New Patient:**

For the `new_patient_df`, which has the following clinical profile:

| Age | Tumor Size | Abnormal Cell Count | CA15-3 Level |
|-----|------------|---------------------|--------------|
| 60  | 7.5        | 500                 | 65.0         |

The KNN model calculated the distances to all patients in the training set (`X_train`). It then identified the 5 closest patients based on their clinical features. The treatments of these 5 neighbors were considered, and the treatment that appeared most frequently among them was selected as the recommended treatment for this new patient.

Based on the model's prediction, the `recommended_treatment` variable holds the recommended treatment for this specific new patient, which was previously determined to be **Radiation**. This means that among the 5 most clinically similar patients in our training data, "Radiation" was the most common treatment.
""")


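The three steps described in the explanation (distance calculation, nearest neighbors, majority vote) can be sketched from scratch with NumPy. This is an illustrative re-implementation, not scikit-learn's internal code; the function `knn_recommend` and the five hand-written training rows are hypothetical, and ties are broken by whichever treatment is encountered first.

```python
from collections import Counter

import numpy as np


def knn_recommend(X_train, y_train, x_new, k=5):
    """Return the most frequent label among the k training rows closest to x_new."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Step 1: Euclidean distance from the new patient to every training patient.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))

    # Step 2: positions of the k smallest distances.
    nearest = np.argsort(distances)[:k]

    # Step 3: majority vote over the neighbors' treatments.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training rows: Age, Tumor Size, Abnormal Cell Count, CA15-3 Level.
X_small = [
    [60, 7.0, 500, 60.0],
    [61, 7.5, 510, 66.0],
    [59, 8.0, 495, 64.0],
    [35, 2.0, 150, 20.0],
    [40, 1.5, 900, 90.0],
]
y_small = ['Radiation', 'Radiation', 'Surgery', 'Chemotherapy', 'Chemotherapy']

print(knn_recommend(X_small, y_small, [60, 7.5, 500, 65.0], k=3))  # prints "Radiation"
```

The first three rows are the closest to the new patient, and two of them received Radiation, so Radiation wins the vote.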

## Make a recommendation from the command line

### Subtask:
Modify the code so that it receives the new patient's data from the command line and uses the KNN model to generate a recommendation.

In [None]:
import argparse

def recommend_treatment(age, tumor_size, abnormal_cell_count, ca15_3_level):
    """
    Recommends a treatment for a new patient using the trained KNN model.

    Args:
        age (int): Age of the patient.
        tumor_size (float): Size of the tumor.
        abnormal_cell_count (int): Count of abnormal cells.
        ca15_3_level (float): CA15-3 protein level.

    Returns:
        str: Recommended treatment.
    """
    new_patient_data = {
        'Age': [age],
        'Tumor Size': [tumor_size],
        'Abnormal Cell Count': [abnormal_cell_count],
        'CA15-3 Level': [ca15_3_level]
    }
    new_patient_df = pd.DataFrame(new_patient_data)
    recommended_treatment = knn.predict(new_patient_df)
    return recommended_treatment[0]

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Recommend treatment for a new patient using KNN.')
    parser.add_argument('--age', type=int, required=True, help='Age of the patient')
    parser.add_argument('--tumor_size', type=float, required=True, help='Size of the tumor')
    parser.add_argument('--abnormal_cell_count', type=int, required=True, help='Count of abnormal cells')
    parser.add_argument('--ca15_3_level', type=float, required=True, help='CA15-3 protein level')

    args = parser.parse_args()

    recommended_treatment = recommend_treatment(
        args.age,
        args.tumor_size,
        args.abnormal_cell_count,
        args.ca15_3_level
    )

    print(f"The recommended treatment for the new patient is: {recommended_treatment}")

usage: colab_kernel_launcher.py [-h] --age AGE --tumor_size TUMOR_SIZE
                                --abnormal_cell_count ABNORMAL_CELL_COUNT
                                --ca15_3_level CA15_3_LEVEL
colab_kernel_launcher.py: error: the following arguments are required: --age, --tumor_size, --abnormal_cell_count, --ca15_3_level

The cell fails when it is run inside the notebook: `parser.parse_args()` reads the kernel's own `sys.argv`, which does not contain these flags, so argparse exits with the usage error above. The `TypeError: object of type 'NoneType' has no len()` that follows appears to come from IPython's handling of that exit rather than from the script itself. The code is intended to be run as a script from a terminal.
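To exercise the argument parser without leaving the notebook, `parse_args` accepts an explicit list of strings in place of `sys.argv`. The sketch below assumes the same four flags as the cell above; `build_parser` and `demo_argv` are hypothetical names, and the values are those of the example patient used earlier.

```python
import argparse


def build_parser():
    """Build the same four-flag parser used in the command-line cell."""
    parser = argparse.ArgumentParser(
        description='Recommend treatment for a new patient using KNN.'
    )
    parser.add_argument('--age', type=int, required=True, help='Age of the patient')
    parser.add_argument('--tumor_size', type=float, required=True, help='Size of the tumor')
    parser.add_argument('--abnormal_cell_count', type=int, required=True,
                        help='Count of abnormal cells')
    parser.add_argument('--ca15_3_level', type=float, required=True,
                        help='CA15-3 protein level')
    return parser


# Passing a list bypasses sys.argv, so this also works inside a notebook kernel.
demo_argv = ['--age', '60', '--tumor_size', '7.5',
             '--abnormal_cell_count', '500', '--ca15_3_level', '65.0']
args = build_parser().parse_args(demo_argv)

print(args.age, args.tumor_size, args.abnormal_cell_count, args.ca15_3_level)
# prints: 60 7.5 500 65.0
```

The parsed values can then be passed to `recommend_treatment` exactly as in the `__main__` block.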

**To run this code from the command line:**

1. Save the notebook as a Python file (e.g., `knn_recommendation.py`).
2. Open a terminal or command prompt.
3. Navigate to the directory where you saved the file.
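4. Run the script with the patient data as arguments, for example `python knn_recommendation.py --age 60 --tumor_size 7.5 --abnormal_cell_count 500 --ca15_3_level 65.0`.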

## Summary:

### Data Analysis Key Findings

*   A synthetic dataset was successfully generated, simulating clinical profiles for 25 patients including Age, Tumor Size, Abnormal Cell Count, CA15-3 Level, and their associated Treatments.
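*   A K-Nearest Neighbors (KNN) classifier was trained on the 20-patient training split (80% of the synthetic data). Because the treatments were assigned at random, the model illustrates the mechanics of KNN rather than any real relationship between clinical features and treatments.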
*   Using the trained KNN model with `n_neighbors=5`, a treatment recommendation was successfully generated for a new, unseen patient with specific clinical data.
*   The recommended treatment for the new patient was identified as "Radiation" based on the majority treatment among its 5 nearest neighbors in the training data.
*   The process of the KNN model was explained, highlighting distance calculation, finding the k-nearest neighbors, and determining the recommendation through a majority vote of the neighbors' treatments.

### Insights or Next Steps

*   The number of neighbors (k) can significantly change the recommendation. A next step is to compare several values of k and measure accuracy on held-out data; the test split created earlier (`X_test`, `y_test`) is never used for evaluation in this notebook.
*   While this is a simulated example, applying this approach to real-world, anonymized patient data could potentially assist healthcare professionals in making treatment decisions by identifying similar cases and their outcomes.
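One way to compare values of k, as suggested in the next steps, is k-fold cross-validation. The sketch below is a minimal example under stated assumptions: it uses seeded stand-in data (`X_demo`, `y_demo`), a standardizing pipeline, and 5-fold splitting, none of which come from the notebook. Since the labels are random, accuracy should hover near chance for every k; the point is the procedure, not the scores.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data with the notebook's column names (assumption: seed 0, 25 rows).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'Age': rng.integers(30, 80, 25),
    'Tumor Size': rng.uniform(1.0, 10.0, 25),
    'Abnormal Cell Count': rng.integers(100, 1000, 25),
    'CA15-3 Level': rng.uniform(15.0, 100.0, 25),
})
y_demo = rng.choice(['Chemotherapy', 'Radiation', 'Surgery', 'Immunotherapy'], 25)

# Plain (non-stratified) folds, shuffled once with a fixed seed.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

mean_accuracy = {}
for k in (1, 3, 5, 7, 9):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    # cross_val_score uses the classifier's default score, which is accuracy.
    mean_accuracy[k] = cross_val_score(model, X_demo, y_demo, cv=cv).mean()

for k, score in mean_accuracy.items():
    print(f"k={k}: mean accuracy {score:.2f}")
```

On real data, the k with the best cross-validated score would be a reasonable starting point, ideally confirmed on a test split that played no part in the selection.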
