### Project: Disease Prediction: Decision Support

##### ![1) Objective definition:](https://github.com/chetincho/ds_Prediccion_de_enfermedades/blob/main/img/Definici%C3%B3n%20del%20objetivo.jpg?raw=true)

The goal is to build a computational model that predicts diseases. The model will serve as a support tool for medical decision-making, aimed mainly at recently graduated medical students in their first year of residency.

##### ![2) Business context:](https://github.com/chetincho/ds_Prediccion_de_enfermedades/blob/main/img/Contexto%20comercial.jpg?raw=true)

The final outcome of the project is intended to:<br>
1- Minimize legal problems caused by medical negligence stemming from the inexperience of recent graduates.<br>
2- Give interns support for their medical inferences during an emergency or when the attending physician is absent.<br>

##### ![3) Business problem:](https://github.com/chetincho/ds_Prediccion_de_enfermedades/blob/main/img/Problema%20comercial.jpg?raw=true)

Is it possible to develop a model that, given certain inputs, lets us predict a disease?

##### ![Libraries used:](https://github.com/chetincho/ds_Prediccion_de_enfermedades/blob/main/img/Librerias%20utilizadas.jpg?raw=true)

In [1]:
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns



##### ![4) Data Acquisition:](https://github.com/chetincho/ds_Prediccion_de_enfermedades/blob/main/img/Data%20Acquisition.jpg?raw=true)

The dataset is a compilation of data from the medical records of patients who have already been diagnosed at this institution and whose cases were reviewed by this hospital's medical board.

In [2]:
# Define the data source.
url_dataset="https://raw.githubusercontent.com/chetincho/ds_Prediccion_de_enfermedades/refs/heads/main/dataset/ds_entrenamiento.csv"

# Load the dataframe
df = pd.read_csv(url_dataset)

# Set the dataframe index so that it starts at 1
df.set_index(pd.Index(range(1, len(df) + 1)), inplace=True)
# Configure pandas to display all columns
pd.set_option('display.max_columns', None)

##### ![5) Exploratory Data Analysis:](https://github.com/chetincho/ds_Prediccion_de_enfermedades/blob/main/img/Exploratory%20Data%20Analysis.jpg?raw=true)

🗒️ How many rows and columns does the dataframe have?

In [3]:
filas, columnas = df.shape
print(f"Total rows = {filas}")
print(f"Total columns = {columnas}")

Total rows = 4920
Total columns = 134


🗒️ Which columns (attributes) does it contain?

In [4]:
# Use .tolist() to convert the column index into a plain list
print("This dataframe is made up of the following columns:")
columnas = df.columns.tolist()
for columna in columnas:
    print(f"- {columna}")

This dataframe is made up of the following columns:
- itching
- skin_rash
- nodal_skin_eruptions
- continuous_sneezing
- shivering
- chills
- joint_pain
- stomach_pain
- acidity
- ulcers_on_tongue
- muscle_wasting
- vomiting
- burning_micturition
- spotting_ urination
- fatigue
- weight_gain
- anxiety
- cold_hands_and_feets
- mood_swings
- weight_loss
- restlessness
- lethargy
- patches_in_throat
- irregular_sugar_level
- cough
- high_fever
- sunken_eyes
- breathlessness
- sweating
- dehydration
- indigestion
- headache
- yellowish_skin
- dark_urine
- nausea
- loss_of_appetite
- pain_behind_the_eyes
- back_pain
- constipation
- abdominal_pain
- diarrhoea
- mild_fever
- yellow_urine
- yellowing_of_eyes
- acute_liver_failure
- fluid_overload
- swelling_of_stomach
- swelled_lymph_nodes
- malaise
- blurred_and_distorted_vision
- phlegm
- throat_irritation
- redness_of_eyes
- sinus_pressure
- runny_nose
- congestion
- chest_pain
- weakness_in_limbs
- fast_heart_rate
- pain_during_bow

⚠️ Warning: a column named "Unnamed: 133" was detected; it is removed below.

In [5]:
df = df.drop('Unnamed: 133', axis=1)
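
A column named `Unnamed: N` often appears when every line of a CSV ends with a trailing comma, so pandas reads one extra, empty field. The notebook does not confirm that this is the cause here, so the following is only a sketch under that assumption: it drops such columns only after checking that they hold no data. The toy CSV and the helper name `drop_empty_unnamed` are illustrative and not part of the original project.

```python
import io

import pandas as pd


def drop_empty_unnamed(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop columns named 'Unnamed...' only when they contain no data at all."""
    candidatas = [
        col for col in frame.columns
        if str(col).startswith("Unnamed") and frame[col].isna().all()
    ]
    return frame.drop(columns=candidatas)


# Toy CSV (illustrative): the trailing comma on each line creates an empty column.
csv_texto = "itching,skin_rash,prognosis,\n1,0,Acne,\n0,1,Psoriasis,\n"
demo = pd.read_csv(io.StringIO(csv_texto))
limpio = drop_empty_unnamed(demo)
print(limpio.columns.tolist())
```

Checking `isna().all()` first avoids silently discarding a column that is unnamed but actually carries values.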

🗒️ Are there any null values?

In [6]:
# Sum the per-column null counts to get the total for the whole dataframe
print(f"Total null values detected: {df.isnull().sum().sum()}")

Total null values detected: 0


🗒️ What is the data type of each column?

In [7]:
print("Data type per column:")
for columna, tipo in df.dtypes.items():
    print(f"- {columna}: {tipo}")

Data type per column:
- itching: int64
- skin_rash: int64
- nodal_skin_eruptions: int64
- continuous_sneezing: int64
- shivering: int64
- chills: int64
- joint_pain: int64
- stomach_pain: int64
- acidity: int64
- ulcers_on_tongue: int64
- muscle_wasting: int64
- vomiting: int64
- burning_micturition: int64
- spotting_ urination: int64
- fatigue: int64
- weight_gain: int64
- anxiety: int64
- cold_hands_and_feets: int64
- mood_swings: int64
- weight_loss: int64
- restlessness: int64
- lethargy: int64
- patches_in_throat: int64
- irregular_sugar_level: int64
- cough: int64
- high_fever: int64
- sunken_eyes: int64
- breathlessness: int64
- sweating: int64
- dehydration: int64
- indigestion: int64
- headache: int64
- yellowish_skin: int64
- dark_urine: int64
- nausea: int64
- loss_of_appetite: int64
- pain_behind_the_eyes: int64
- back_pain: int64
- constipation: int64
- abdominal_pain: int64
- diarrhoea: int64
- mild_fever: int64
- yellow_urine: int64
- yellowing_of_eyes: int64
- acute

In [8]:
print("Summary of the data types in the dataframe:")
resumen_tipos_datos = df.dtypes.value_counts()
print(resumen_tipos_datos)

Summary of the data types in the dataframe:
int64     132
object      1
Name: count, dtype: int64
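
The summary reports a single non-numeric column but does not name it. A minimal sketch of how it could be identified, using `select_dtypes` on a toy frame (the three-column `demo` frame is illustrative; on the real data the same call would be applied to `df`):

```python
import pandas as pd

# Toy frame (illustrative): two numeric symptom columns and one text column.
demo = pd.DataFrame({
    "itching": [1, 0],
    "skin_rash": [0, 1],
    "prognosis": ["Acne", "Psoriasis"],
})

# Keep only the columns whose dtype is not numeric.
columnas_no_numericas = demo.select_dtypes(exclude="number").columns.tolist()
print(columnas_no_numericas)
```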


🗒️ Let's look at a small sample of the dataframe by exploring its first and last rows.

In [9]:
print("This is a sample of the data contained in the dataframe:")
print("FIRST 10 RECORDS")
print("================")
df.head(10)

This is a sample of the data contained in the dataframe:
FIRST 10 RECORDS
================

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,muscle_wasting,vomiting,burning_micturition,spotting_ urination,fatigue,weight_gain,anxiety,cold_hands_and_feets,mood_swings,weight_loss,restlessness,lethargy,patches_in_throat,irregular_sugar_level,cough,high_fever,sunken_eyes,breathlessness,sweating,dehydration,indigestion,headache,yellowish_skin,dark_urine,nausea,loss_of_appetite,pain_behind_the_eyes,back_pain,constipation,abdominal_pain,diarrhoea,mild_fever,yellow_urine,yellowing_of_eyes,acute_liver_failure,fluid_overload,swelling_of_stomach,swelled_lymph_nodes,malaise,blurred_and_distorted_vision,phlegm,throat_irritation,redness_of_eyes,sinus_pressure,runny_nose,congestion,chest_pain,weakness_in_limbs,fast_heart_rate,pain_during_bowel_movements,pain_in_anal_region,bloody_stool,irritation_in_anus,neck_pain,dizziness,cramps,bruising,obesity,swollen_legs,swollen_blood_vessels,puffy_face_and_eyes,enlarged_thyroid,brittle_nails,swollen_extremeties,excessive_hunger,extra_marital_contacts,drying_and_tingling_lips,slurred_speech,knee_pain,hip_joint_pain,muscle_weakness,stiff_neck,swelling_joints,movement_stiffness,spinning_movements,loss_of_balance,unsteadiness,weakness_of_one_body_side,loss_of_smell,bladder_discomfort,foul_smell_of urine,continuous_feel_of_urine,passage_of_gases,internal_itching,toxic_look_(typhos),depression,irritability,muscle_pain,altered_sensorium,red_spots_over_body,belly_pain,abnormal_menstruation,dischromic 
_patches,watering_from_eyes,increased_appetite,polyuria,family_history,mucoid_sputum,rusty_sputum,lack_of_concentration,visual_disturbances,receiving_blood_transfusion,receiving_unsterile_injections,coma,stomach_bleeding,distention_of_abdomen,history_of_alcohol_consumption,fluid_overload.1,blood_in_sputum,prominent_veins_on_calf,palpitations,painful_walking,pus_filled_pimples,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze,prognosis
1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
2,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
3,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
4,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
5,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
6,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
7,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
8,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
9,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection
10,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Fungal infection


In [10]:
print("LAST 10 RECORDS")
print("===============")
df.tail(10)

LAST 10 RECORDS
===============

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,muscle_wasting,vomiting,burning_micturition,spotting_ urination,fatigue,weight_gain,anxiety,cold_hands_and_feets,mood_swings,weight_loss,restlessness,lethargy,patches_in_throat,irregular_sugar_level,cough,high_fever,sunken_eyes,breathlessness,sweating,dehydration,indigestion,headache,yellowish_skin,dark_urine,nausea,loss_of_appetite,pain_behind_the_eyes,back_pain,constipation,abdominal_pain,diarrhoea,mild_fever,yellow_urine,yellowing_of_eyes,acute_liver_failure,fluid_overload,swelling_of_stomach,swelled_lymph_nodes,malaise,blurred_and_distorted_vision,phlegm,throat_irritation,redness_of_eyes,sinus_pressure,runny_nose,congestion,chest_pain,weakness_in_limbs,fast_heart_rate,pain_during_bowel_movements,pain_in_anal_region,bloody_stool,irritation_in_anus,neck_pain,dizziness,cramps,bruising,obesity,swollen_legs,swollen_blood_vessels,puffy_face_and_eyes,enlarged_thyroid,brittle_nails,swollen_extremeties,excessive_hunger,extra_marital_contacts,drying_and_tingling_lips,slurred_speech,knee_pain,hip_joint_pain,muscle_weakness,stiff_neck,swelling_joints,movement_stiffness,spinning_movements,loss_of_balance,unsteadiness,weakness_of_one_body_side,loss_of_smell,bladder_discomfort,foul_smell_of urine,continuous_feel_of_urine,passage_of_gases,internal_itching,toxic_look_(typhos),depression,irritability,muscle_pain,altered_sensorium,red_spots_over_body,belly_pain,abnormal_menstruation,dischromic 
_patches,watering_from_eyes,increased_appetite,polyuria,family_history,mucoid_sputum,rusty_sputum,lack_of_concentration,visual_disturbances,receiving_blood_transfusion,receiving_unsterile_injections,coma,stomach_bleeding,distention_of_abdomen,history_of_alcohol_consumption,fluid_overload.1,blood_in_sputum,prominent_veins_on_calf,palpitations,painful_walking,pus_filled_pimples,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze,prognosis
4911,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Hypothyroidism
4912,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Hyperthyroidism
4913,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,Hypoglycemia
4914,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,Osteoarthristis
4915,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,Arthritis
4916,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,(vertigo) Paroymsal Positional Vertigo
4917,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,Acne
4918,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Urinary tract infection
4919,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,Psoriasis
4920,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,Impetigo


💡 If the cell value is 1 -> the symptom was part of the diagnosis.<br>
💡 If the cell value is 0 -> the symptom was NOT part of the diagnosis.
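
The 0/1 encoding can be checked rather than assumed. The sketch below uses a toy frame with the same layout as the dataset (symptom columns followed by `prognosis`); the frame and the variable names are illustrative, and on the real data the same expression would be applied to `df`.

```python
import pandas as pd

# Toy frame (illustrative): symptom columns followed by the 'prognosis' label.
demo = pd.DataFrame({
    "itching": [1, 0, 1],
    "skin_rash": [1, 1, 0],
    "prognosis": ["Fungal infection", "Acne", "Fungal infection"],
})

sintomas = demo.drop(columns="prognosis")

# True only if every value in every symptom column is 0 or 1.
es_binario = bool(sintomas.isin([0, 1]).all().all())
print(f"All symptom columns are binary: {es_binario}")
```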

🗒️ How many medical records were collected for each diagnosed disease?

In [11]:
frecuencia_valores = df['prognosis'].value_counts()
print("\nValue frequencies in column: prognosis")
print("-------------------------------------------------------")

# Number each disease starting at 1
for n, (enfermedad, frecuencia) in enumerate(frecuencia_valores.items(), start=1):
    print(f"{n} {enfermedad} -> {frecuencia}")


Value frequencies in column: prognosis
-------------------------------------------------------
1 Fungal infection -> 120
2 Hepatitis C -> 120
3 Hepatitis E -> 120
4 Alcoholic hepatitis -> 120
5 Tuberculosis -> 120
6 Common Cold -> 120
7 Pneumonia -> 120
8 Dimorphic hemmorhoids(piles) -> 120
9 Heart attack -> 120
10 Varicose veins -> 120
11 Hypothyroidism -> 120
12 Hyperthyroidism -> 120
13 Hypoglycemia -> 120
14 Osteoarthristis -> 120
15 Arthritis -> 120
16 (vertigo) Paroymsal  Positional Vertigo -> 120
17 Acne -> 120
18 Urinary tract infection -> 120
19 Psoriasis -> 120
20 Hepatitis D -> 120
21 Hepatitis B -> 120
22 Allergy -> 120
23 hepatitis A -> 120
24 GERD -> 120
25 Chronic cholestasis -> 120
26 Drug Reaction -> 120
27 Peptic ulcer diseae -> 120
28 AIDS -> 120
29 Diabetes  -> 120
30 Gastroenteritis -> 120
31 Bronchial Asthma -> 120
32 Hypertension  -> 120
33 Migraine -> 120
34 Cervical spondylosis -> 120
35 Paralysis (brain hemorrhage) -> 120
36 Jaundice -> 120
37 Malaria

💡 The dataset contains 41 possible diagnoses in total, with 120 medical records collected for each one, so the classes are evenly balanced.<br>
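
In the frequency listing, a few labels (for example `Diabetes` and `Hypertension`) are followed by an extra space before the arrow, which suggests trailing whitespace in the raw label. A minimal sketch, on illustrative toy data, of normalizing the labels and then checking the balance programmatically; the names `conteo` and `balanceado` are not from the original notebook.

```python
import pandas as pd

# Toy labels (illustrative): 'Diabetes ' carries a trailing space.
demo = pd.DataFrame({"prognosis": ["Diabetes ", "Diabetes ", "Acne", "Acne"]})

# Strip leading/trailing whitespace so 'Diabetes ' and 'Diabetes' are one class.
demo["prognosis"] = demo["prognosis"].str.strip()

conteo = demo["prognosis"].value_counts()

# The classes are balanced when every class has the same number of rows.
balanceado = conteo.nunique() == 1
print(f"Classes: {conteo.size}, rows per class: {conteo.iloc[0]}, balanced: {balanceado}")
```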

🗒️ Which symptoms occur most often?

In [12]:
# Identify the symptom columns
columnas_binarias = df.columns[:-1]  # Exclude the last column, which holds the diagnosis (prognosis)

# Count how often each symptom takes the value 1 (YES)
frecuencia_unos = df[columnas_binarias].sum()

# Total number of rows (needed to compute the proportion)
total_filas = len(df)

# Percentage of 1s (YES)
porcentaje_unos = (frecuencia_unos / total_filas) * 100

print("Summary of binary columns:")
print("--------------------------")
print("Share of records in which each symptom is part of the diagnosis")
print("")

for columna, frecuencia in frecuencia_unos.items():
    porcentaje = porcentaje_unos[columna]
    print(f"* {porcentaje:.2f}% of the time the symptom is present -> {columna}")


Summary of binary columns:
--------------------------
Share of records in which each symptom is part of the diagnosis

* 13.78% of the time the symptom is present -> itching
* 15.98% of the time the symptom is present -> skin_rash
* 2.20% of the time the symptom is present -> nodal_skin_eruptions
* 4.51% of the time the symptom is present -> continuous_sneezing
* 2.20% of the time the symptom is present -> shivering
* 16.22% of the time the symptom is present -> chills
* 13.90% of the time the symptom is present -> joint_pain
* 4.51% of the time the symptom is present -> stomach_pain
* 4.51% of the time the symptom is present -> acidity
* 2.20% of the time the symptom is present -> ulcers_on_tongue
* 2.20% of the time the symptom is present -> muscle_wasting
* 38.90% of the time the symptom is present -> vomiting
* 4.39% of the time the symptom is present -> burning_micturition
* 2.20% of the time the symptom is present -> spotting_ urination
* 

💡 Seven symptoms appear in a high share of the diagnoses:<br>
- fatigue is present in 39.27% of the diagnoses.<br>
- vomiting is present in 38.90% of the diagnoses.<br>
- high_fever is present in 27.68% of the diagnoses.<br>
- loss_of_appetite is present in 23.41% of the diagnoses.<br>
- nausea is present in 23.29% of the diagnoses.<br>
- headache is present in 23.05% of the diagnoses.<br>
- abdominal_pain is present in 20.98% of the diagnoses.<br>
<br>

A symptom that appears in a high percentage of records is not decisive for the diagnosis on its own. For example, fatigue and vomiting are each present in almost 40% of the medical records. A health professional who does not take the remaining symptoms into account may make diagnostic errors.
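
One way to put a number on how non-specific a symptom is: rank symptoms by the share of records in which they appear, and count in how many distinct diseases each one shows up. The sketch below does this on illustrative toy data. The `n_diseases` measure and the names `demo` and `resumen` are assumptions made for illustration and are not part of the original analysis.

```python
import pandas as pd

# Toy data (illustrative): three symptoms and the diagnosed disease.
demo = pd.DataFrame({
    "fatigue":   [1, 1, 1, 0],
    "vomiting":  [1, 0, 1, 0],
    "blister":   [0, 0, 0, 1],
    "prognosis": ["Hepatitis B", "Diabetes", "Hepatitis B", "Impetigo"],
})

sintomas = demo.columns[:-1].tolist()

# Share of records (in %) in which each symptom is present.
porcentaje = demo[sintomas].mean() * 100

# Number of distinct diseases in which each symptom appears at least once.
enfermedades_por_sintoma = demo.groupby("prognosis")[sintomas].max().sum()

resumen = pd.DataFrame({
    "pct_records": porcentaje,
    "n_diseases": enfermedades_por_sintoma,
}).sort_values("pct_records", ascending=False)
print(resumen)
```

A symptom with a high `pct_records` and a high `n_diseases` is spread across many diagnoses, so it helps little to tell them apart unless it is combined with other symptoms.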
