# Tuberculosis Analysis

**Problem Statement**  

You work for the World Health Organization as a data analyst. The board of directors will meet to review the most recent data on tuberculosis cases.  

The latest data has arrived, and you need to prepare it for presentation to the organization's leaders.  

The goal of the meeting is to understand the current tuberculosis situation and the trends by region, and to identify countries that have been success stories as well as those that need more support in managing the disease.  

Keep in mind the UN targets for ending tuberculosis by 2025:  

*   A 50% reduction in the incidence rate, 2025 vs. 2015.
*   A 75% reduction in the number of deaths, 2025 vs. 2015.
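
As a rough illustration of how progress toward these targets could be checked, the sketch below computes the percentage reduction between a baseline value and a later value. The function name `pct_reduction` and the sample rates are hypothetical and are not taken from the WHO data.

```python
def pct_reduction(baseline: float, current: float) -> float:
    """Return the percentage reduction from baseline to current."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100


# Hypothetical incidence rates per 100,000 people (illustrative only).
rate_2015 = 200.0
rate_2025 = 90.0

reduction = pct_reduction(rate_2015, rate_2025)
# The 2025 incidence target is a reduction of at least 50% versus 2015.
meets_incidence_target = reduction >= 50
print(f"Reduction: {reduction:.1f}% | target met: {meets_incidence_target}")
```

The same function applies to the deaths target by comparing against 75 instead of 50.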

**What is tuberculosis?**  
Tuberculosis is an infectious disease caused by the bacterium *Mycobacterium tuberculosis*. It mainly affects the lungs, although it can also damage other organs of the body, which is known as extrapulmonary tuberculosis.

It is transmitted through the air, via droplets that an infected person expels when coughing, sneezing, or talking.

**Diagnostic methods**

1.   ***Smear Positive Pulmonary Tuberculosis (sp):*** Diagnosis by positive pulmonary smear microscopy, meaning that tuberculosis bacilli are detected. This is the most traditional and specific method for detecting active pulmonary tuberculosis. It is a key indicator of transmission because these patients tend to be more infectious.
2.   ***Smear Negative Pulmonary Tuberculosis (sn):*** Diagnosis with negative pulmonary smear microscopy, meaning that no bacilli are detected. These are cases with compatible clinical symptoms but a negative smear, so the diagnosis must rely on other methods. These patients are difficult to confirm.
3.   ***Extrapulmonary Tuberculosis (ep):*** Diagnosis of extrapulmonary tuberculosis, meaning that it affects other parts of the body without directly affecting the lungs. These patients are less infectious and harder to treat.
4.   ***Relapse Cases (rel):*** Diagnosis of relapse, meaning patients who had already been treated and discharged but develop active tuberculosis again. These patients are important for evaluating the effectiveness of treatments and the risk of relapse.

In [None]:
from google.colab import drive
import pandas as pd
import numpy as np
drive.mount('/content/drive')

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)


In [None]:
who = pd.read_csv("/content/drive/MyDrive/Recursos Colab/who.csv")
population = pd.read_csv("/content/drive/MyDrive/Recursos Colab/population.csv")

In [None]:
who.head()

Unnamed: 0,country,iso2,iso3,year,new_sp_m014,new_sp_m1524,new_sp_m2534,new_sp_m3544,new_sp_m4554,new_sp_m5564,...,newrel_m4554,newrel_m5564,newrel_m65,newrel_f014,newrel_f1524,newrel_f2534,newrel_f3544,newrel_f4554,newrel_f5564,newrel_f65
0,Afghanistan,AF,AFG,1980,,,,,,,...,,,,,,,,,,
1,Afghanistan,AF,AFG,1981,,,,,,,...,,,,,,,,,,
2,Afghanistan,AF,AFG,1982,,,,,,,...,,,,,,,,,,
3,Afghanistan,AF,AFG,1983,,,,,,,...,,,,,,,,,,
4,Afghanistan,AF,AFG,1984,,,,,,,...,,,,,,,,,,


In [None]:
population.head()

Unnamed: 0,country,year,population
0,Afghanistan,1995,17586073
1,Afghanistan,1996,18415307
2,Afghanistan,1997,19021226
3,Afghanistan,1998,19496836
4,Afghanistan,1999,19987071


# Data Manipulation

**About the data**  

A dataset from the World Health Organization (WHO) Global Tuberculosis Report, together with global population data.
The dataset uses the original WHO codes.

The names of columns 5 through 60 are formed by combining *new_* with the following parts (relapse columns omit the first underscore, as in *newrel_m014*):  

*   The diagnostic method (*rel* = relapse, *sn* = smear-negative pulmonary, *sp* = smear-positive pulmonary, *ep* = extrapulmonary).
*   The gender (*f* = female, *m* = male).
*   The age group (*014* = 0 to 14 years, *1524* = 15 to 24, *2534* = 25 to 34, *3544* = 35 to 44, *4554* = 45 to 54, *5564* = 55 to 64, *65* = 65 years or older).
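
The naming scheme above can be sketched as a small generator that rebuilds the expected column names. The helper `column_name` is hypothetical; the `newrel` prefix without an underscore is an assumption based on the column listing in this notebook.

```python
from itertools import product

methods = ["sp", "sn", "ep", "rel"]
genders = ["m", "f"]
ages = ["014", "1524", "2534", "3544", "4554", "5564", "65"]


def column_name(method: str, gender: str, age: str) -> str:
    """Build one case-count column name from its three parts."""
    # Relapse columns are spelled "newrel_..." rather than "new_rel_...".
    prefix = "newrel" if method == "rel" else f"new_{method}"
    return f"{prefix}_{gender}{age}"


expected = [column_name(m, g, a) for m, g, a in product(methods, genders, ages)]

# 4 methods x 2 genders x 7 age groups
print(len(expected))
print(expected[0], expected[-1])
```

Comparing `set(expected)` against `set(who.columns)` would be one way to spot unexpected or missing columns before reshaping.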


In [None]:
who.columns

Index(['country', 'iso2', 'iso3', 'year', 'new_sp_m014', 'new_sp_m1524',
       'new_sp_m2534', 'new_sp_m3544', 'new_sp_m4554', 'new_sp_m5564',
       'new_sp_m65', 'new_sp_f014', 'new_sp_f1524', 'new_sp_f2534',
       'new_sp_f3544', 'new_sp_f4554', 'new_sp_f5564', 'new_sp_f65',
       'new_sn_m014', 'new_sn_m1524', 'new_sn_m2534', 'new_sn_m3544',
       'new_sn_m4554', 'new_sn_m5564', 'new_sn_m65', 'new_sn_f014',
       'new_sn_f1524', 'new_sn_f2534', 'new_sn_f3544', 'new_sn_f4554',
       'new_sn_f5564', 'new_sn_f65', 'new_ep_m014', 'new_ep_m1524',
       'new_ep_m2534', 'new_ep_m3544', 'new_ep_m4554', 'new_ep_m5564',
       'new_ep_m65', 'new_ep_f014', 'new_ep_f1524', 'new_ep_f2534',
       'new_ep_f3544', 'new_ep_f4554', 'new_ep_f5564', 'new_ep_f65',
       'newrel_m014', 'newrel_m1524', 'newrel_m2534', 'newrel_m3544',
       'newrel_m4554', 'newrel_m5564', 'newrel_m65', 'newrel_f014',
       'newrel_f1524', 'newrel_f2534', 'newrel_f3544', 'newrel_f4554',
       'newrel_f5564', 'n

In [None]:
population.columns

Index(['country', 'year', 'population'], dtype='object')

## Missing Values

In [None]:
who.isna().sum()

Unnamed: 0,0
country,0
iso2,34
iso3,0
year,0
new_sp_m014,4067
new_sp_m1524,4031
new_sp_m2534,4034
new_sp_m3544,4021
new_sp_m4554,4017
new_sp_m5564,4022


In [None]:
# Show the rows where iso2 is missing
who[who.iso2.isna()]

Unnamed: 0,country,iso2,iso3,year,new_sp_m014,new_sp_m1524,new_sp_m2534,new_sp_m3544,new_sp_m4554,new_sp_m5564,...,newrel_m4554,newrel_m5564,newrel_m65,newrel_f014,newrel_f1524,newrel_f2534,newrel_f3544,newrel_f4554,newrel_f5564,newrel_f65
4369,Namibia,,NAM,1980,,,,,,,...,,,,,,,,,,
4370,Namibia,,NAM,1981,,,,,,,...,,,,,,,,,,
4371,Namibia,,NAM,1982,,,,,,,...,,,,,,,,,,
4372,Namibia,,NAM,1983,,,,,,,...,,,,,,,,,,
4373,Namibia,,NAM,1984,,,,,,,...,,,,,,,,,,
4374,Namibia,,NAM,1985,,,,,,,...,,,,,,,,,,
4375,Namibia,,NAM,1986,,,,,,,...,,,,,,,,,,
4376,Namibia,,NAM,1987,,,,,,,...,,,,,,,,,,
4377,Namibia,,NAM,1988,,,,,,,...,,,,,,,,,,
4378,Namibia,,NAM,1989,,,,,,,...,,,,,,,,,,


In [None]:
# Namibia's missing iso2 values are filled with the string "NA"
who.loc[who.country == "Namibia", "iso2"] = "NA"
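
A likely reason Namibia's code shows up as missing is that `pd.read_csv` treats the text `NA` as a missing value by default. The sketch below reproduces this on a tiny in-memory CSV (the sample rows are made up) and shows one way to keep `NA` as text while still treating empty fields as missing.

```python
import io

import pandas as pd

# Made-up sample with the same shape of problem: "NA" is a real country code.
csv_text = "country,iso2,iso3,cases\nNamibia,NA,NAM,\nNepal,NP,NPL,12\n"

# Default parsing: the string "NA" is interpreted as a missing value.
default = pd.read_csv(io.StringIO(csv_text))
print(default["iso2"].isna().sum())

# Disable the default missing-value markers and treat only empty fields as missing.
fixed = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
print(fixed.loc[0, "iso2"])
print(fixed["cases"].isna().sum())
```

Reading the file this way would make the manual fix unnecessary, at the cost of having to list any other missing-value markers the file uses.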

In [None]:
# All remaining missing values in every column are filled with 0
who = who.fillna(0)
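
Filling with 0 makes a year with no reported data indistinguishable from a year with zero cases. As a point of comparison only, the sketch below uses made-up data to contrast that choice with keeping `NaN` and aggregating with `min_count=1`; it is not the approach taken in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up data: country A reported nothing, country B reported one value.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "cases": [np.nan, np.nan, 5.0, np.nan],
})

# Option 1 (used in this notebook): missing values become 0 before summing.
filled_total = toy.fillna({"cases": 0}).groupby("country")["cases"].sum()

# Option 2: keep NaN; a group with no reported values stays NaN in the total.
kept_total = toy.groupby("country")["cases"].sum(min_count=1)

print(filled_total)
print(kept_total)
```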

In [None]:
# Does population have missing values?
population.isna().sum()

Unnamed: 0,0
country,0
year,0
population,0


## Reshaping the Data

In [None]:
who.head()

Unnamed: 0,country,iso2,iso3,year,new_sp_m014,new_sp_m1524,new_sp_m2534,new_sp_m3544,new_sp_m4554,new_sp_m5564,...,newrel_m4554,newrel_m5564,newrel_m65,newrel_f014,newrel_f1524,newrel_f2534,newrel_f3544,newrel_f4554,newrel_f5564,newrel_f65
0,Afghanistan,AF,AFG,1980,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Afghanistan,AF,AFG,1981,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Afghanistan,AF,AFG,1982,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Afghanistan,AF,AFG,1983,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Afghanistan,AF,AFG,1984,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
population.head()

Unnamed: 0,country,year,population
0,Afghanistan,1995,17586073
1,Afghanistan,1996,18415307
2,Afghanistan,1997,19021226
3,Afghanistan,1998,19496836
4,Afghanistan,1999,19987071


In [None]:
who2 = who.melt(id_vars=["country", "year", "iso2", "iso3"])
who2.head(10)

Unnamed: 0,country,year,iso2,iso3,variable,value
0,Afghanistan,1980,AF,AFG,new_sp_m014,0.0
1,Afghanistan,1981,AF,AFG,new_sp_m014,0.0
2,Afghanistan,1982,AF,AFG,new_sp_m014,0.0
3,Afghanistan,1983,AF,AFG,new_sp_m014,0.0
4,Afghanistan,1984,AF,AFG,new_sp_m014,0.0
5,Afghanistan,1985,AF,AFG,new_sp_m014,0.0
6,Afghanistan,1986,AF,AFG,new_sp_m014,0.0
7,Afghanistan,1987,AF,AFG,new_sp_m014,0.0
8,Afghanistan,1988,AF,AFG,new_sp_m014,0.0
9,Afghanistan,1989,AF,AFG,new_sp_m014,0.0
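
To make the wide-to-long reshape easier to follow, here is a minimal sketch of `melt` on a made-up two-row table. The `var_name` and `value_name` arguments are optional; they only name the two new columns up front.

```python
import pandas as pd

# Made-up wide table: one row per country and year, one column per category.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "year": [2000, 2000],
    "new_sp_m014": [1, 3],
    "new_sp_f014": [2, 4],
})

# Each category column becomes its own row, identified by the "variable" column.
long_df = wide.melt(
    id_vars=["country", "year"],
    var_name="variable",
    value_name="cases",
)
print(long_df)
```

Two identifier rows times two category columns give four rows in the long table.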


In [None]:
# Add a column indicating gender; matching "_m" targets the gender marker itself
who2["gender"] = np.where(who2["variable"].str.contains("_m"), "masculino", "femenino")
who2.head(10)

Unnamed: 0,country,year,iso2,iso3,variable,value,gender
0,Afghanistan,1980,AF,AFG,new_sp_m014,0.0,masculino
1,Afghanistan,1981,AF,AFG,new_sp_m014,0.0,masculino
2,Afghanistan,1982,AF,AFG,new_sp_m014,0.0,masculino
3,Afghanistan,1983,AF,AFG,new_sp_m014,0.0,masculino
4,Afghanistan,1984,AF,AFG,new_sp_m014,0.0,masculino
5,Afghanistan,1985,AF,AFG,new_sp_m014,0.0,masculino
6,Afghanistan,1986,AF,AFG,new_sp_m014,0.0,masculino
7,Afghanistan,1987,AF,AFG,new_sp_m014,0.0,masculino
8,Afghanistan,1988,AF,AFG,new_sp_m014,0.0,masculino
9,Afghanistan,1989,AF,AFG,new_sp_m014,0.0,masculino


In [None]:
# function to assign the age group from the column name
def asignar_groupedad(valor):
  if "014" in valor:
    return "0-14"
  elif "1524" in valor:
    return "15-24"
  elif "2534" in valor:  # the code in the column name has no hyphen
    return "25-34"
  elif "3544" in valor:
    return "35-44"
  elif "4554" in valor:
    return "45-54"
  elif "5564" in valor:
    return "55-64"
  elif "65" in valor:
    return "65+"

In [None]:
who2["agegroup"] = who2["variable"].apply(asignar_groupedad)
who2.head()

Unnamed: 0,country,year,iso2,iso3,variable,value,gender,agegroup
0,Afghanistan,1980,AF,AFG,new_sp_m014,0.0,masculino,0-14
1,Afghanistan,1981,AF,AFG,new_sp_m014,0.0,masculino,0-14
2,Afghanistan,1982,AF,AFG,new_sp_m014,0.0,masculino,0-14
3,Afghanistan,1983,AF,AFG,new_sp_m014,0.0,masculino,0-14
4,Afghanistan,1984,AF,AFG,new_sp_m014,0.0,masculino,0-14


In [None]:
# function to assign the diagnostic method
def asignar_metodo(valor):
  if "rel" in valor:
    return "recaida"
  if "sn" in valor:
    return "esputo pulmonar negativo"
  if "sp" in valor:
    return "esputo pulmonar positivo"
  if "ep" in valor:
    return "extrapulmonar"

In [None]:
who2["method"] = who2["variable"].apply(asignar_metodo)
who2.head()

Unnamed: 0,country,year,iso2,iso3,variable,value,gender,agegroup,method
0,Afghanistan,1980,AF,AFG,new_sp_m014,0.0,masculino,0-14,esputo pulmonar positivo
1,Afghanistan,1981,AF,AFG,new_sp_m014,0.0,masculino,0-14,esputo pulmonar positivo
2,Afghanistan,1982,AF,AFG,new_sp_m014,0.0,masculino,0-14,esputo pulmonar positivo
3,Afghanistan,1983,AF,AFG,new_sp_m014,0.0,masculino,0-14,esputo pulmonar positivo
4,Afghanistan,1984,AF,AFG,new_sp_m014,0.0,masculino,0-14,esputo pulmonar positivo
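
As an alternative to the separate helper functions, the three parts of `variable` could be pulled out in one pass with a regular expression. This is a sketch on a made-up Series, not the method used in this notebook; the pattern and the `age_labels` mapping are assumptions derived from the naming scheme described earlier.

```python
import pandas as pd

variables = pd.Series(["new_sp_m014", "new_sn_f2534", "newrel_m65"], name="variable")

# Named groups become the column names of the resulting DataFrame.
# The underscore after "new" is optional because of the "newrel_" spelling.
pattern = r"^new_?(?P<method>sp|sn|ep|rel)_(?P<gender>[mf])(?P<age>\d+)$"
parts = variables.str.extract(pattern)

age_labels = {
    "014": "0-14", "1524": "15-24", "2534": "25-34", "3544": "35-44",
    "4554": "45-54", "5564": "55-64", "65": "65+",
}
parts["agegroup"] = parts["age"].map(age_labels)
print(parts)
```

Any name that does not fit the pattern yields `NaN` in every extracted column, which makes malformed names easy to spot.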


In [None]:
who2 = who2.drop(columns=["variable"])

In [None]:
who2.head()

Unnamed: 0,country,year,iso2,iso3,value,gender,agegroup,method
0,Afghanistan,1980,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo
1,Afghanistan,1981,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo
2,Afghanistan,1982,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo
3,Afghanistan,1983,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo
4,Afghanistan,1984,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo


In [None]:
who2 = who2.rename(columns={"value":"cases"})

In [None]:
who2.head()

Unnamed: 0,country,year,iso2,iso3,cases,gender,agegroup,method
0,Afghanistan,1980,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo
1,Afghanistan,1981,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo
2,Afghanistan,1982,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo
3,Afghanistan,1983,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo
4,Afghanistan,1984,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo


In [None]:
# inner join with population on country and year
df = pd.merge(left=who2, right=population, how="inner", on=["country", "year"])
df.head()

Unnamed: 0,country,year,iso2,iso3,cases,gender,agegroup,method,population
0,Afghanistan,1995,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo,17586073
1,Afghanistan,1996,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo,18415307
2,Afghanistan,1997,AF,AFG,0.0,masculino,0-14,esputo pulmonar positivo,19021226
3,Afghanistan,1998,AF,AFG,30.0,masculino,0-14,esputo pulmonar positivo,19496836
4,Afghanistan,1999,AF,AFG,8.0,masculino,0-14,esputo pulmonar positivo,19987071
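
An inner join silently drops every row without a matching country and year; in the output above, the Afghanistan rows now start in 1995, the first year in the population preview. The sketch below, on made-up frames, shows one way to inspect what would be dropped by using a left join with `indicator=True`.

```python
import pandas as pd

# Made-up frames: the 1994 row has no matching population record.
cases = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [1994, 1995, 1995],
    "cases": [1, 2, 3],
})
pop = pd.DataFrame({
    "country": ["A", "B"],
    "year": [1995, 1995],
    "population": [100, 200],
})

# validate="many_to_one" raises if a (country, year) pair repeats in pop.
check = cases.merge(
    pop,
    how="left",
    on=["country", "year"],
    indicator=True,
    validate="many_to_one",
)
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["country", "year"]])
```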


In [None]:
df.to_csv("tuberculosis.csv", index=False)
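
Because the merged table repeats the same population on every gender, age group, and method row, a rate per 100,000 people needs the cases summed but the population taken once per country and year. The sketch below shows this on made-up numbers; note that it yields a rate of reported new cases, which is only an approximation of the incidence rate named in the targets.

```python
import pandas as pd

# Made-up long-format rows: two categories per year for one country.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "A"],
    "year": [2000, 2000, 2001, 2001],
    "cases": [30.0, 20.0, 40.0, 20.0],
    "population": [1_000_000, 1_000_000, 1_200_000, 1_200_000],
})

# Sum cases across categories; take population once, since it is repeated.
yearly = toy.groupby(["country", "year"], as_index=False).agg(
    cases=("cases", "sum"),
    population=("population", "first"),
)
yearly["rate_per_100k"] = yearly["cases"] / yearly["population"] * 100_000
print(yearly)
```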