# Are varicella vaccines effective?

In this notebook I analyze data collected by the CDC through the National Immunization Surveys (https://www.cdc.gov/vaccines/imz-managers/nis/datasets.html#2017), specifically the 2019 dataset, the most recent one available.

After running in **R** the input statements provided on the same site, I obtained the `.csv` file used in the rest of this notebook.

### 1. The libraries needed are imported as well as the dataset

In [23]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt

NISdata = pd.read_csv('NISPUF19.csv')

In [10]:
NISdata.head()

Unnamed: 0.1,Unnamed: 0,SEQNUMC,SEQNUMHH,PDAT,PROVWT_C,PROVWT_C_TERR,RDDWT_C,RDDWT_C_TERR,STRATUM,YEAR,...,XVRCTY2,XVRCTY3,XVRCTY4,XVRCTY5,XVRCTY6,XVRCTY7,XVRCTY8,XVRCTY9,INS_STAT2_I,INS_BREAK_I
0,1,152651,15265,1,503.793619,503.793619,106.653676,106.653676,2052,2019,...,,,,,,,,,2.0,2.0
1,2,12631,1263,2,,,291.636945,291.636945,2038,2019,...,,,,,,,,,,
2,3,261061,26106,1,66.147188,66.147188,66.147188,66.147188,2044,2019,...,,,,,,,,,2.0,2.0
3,4,102261,10226,2,,,1104.531434,1104.531434,2068,2019,...,,,,,,,,,,
4,5,269631,26963,2,,,1301.585774,1301.585774,2022,2019,...,,,,,,,,,,


### 2. The data documentation is examined to understand the data

We are interested in the number of vaccine doses each child received and in whether the child ever had the disease. The Data User's Guide describes the variables and their categories.


- HAD_CPOX - did child ever have chicken pox
    - yes, no, don't know, refused, missing
- P_NUMVRC - total number of varicella doses

### 3. The values taken by the columns of interest are examined

In [18]:
NISdata['HAD_CPOX'].unique() # 1 = Yes, 2 = No

array([ 2,  1, 77, 99], dtype=int64)

In [19]:
NISdata['P_NUMVRC'].unique()

array([ 1., nan,  0.,  2.,  3.])
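Counting how often each value occurs, rather than only listing the unique values, shows how many rows the cleaning step will discard. The following is a minimal sketch on a made-up toy frame (the `toy` DataFrame and its values are assumptions for illustration, not NIS data); the same calls can be applied to `NISdata`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for NISdata; the values are invented for illustration only.
toy = pd.DataFrame({
    'HAD_CPOX': [2, 2, 1, 77, 2, 99],
    'P_NUMVRC': [1.0, np.nan, 0.0, 2.0, np.nan, 1.0],
})

# Frequency of each answer code in HAD_CPOX
cpox_counts = toy['HAD_CPOX'].value_counts()

# Frequency of each dose count, keeping NaN as its own category
dose_counts = toy['P_NUMVRC'].value_counts(dropna=False)

# Number of rows with no recorded dose count
n_missing_doses = int(toy['P_NUMVRC'].isna().sum())

print(cpox_counts)
print(dose_counts)
print(n_missing_doses)
```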

### 4. Data is cleaned by dropping NaN values and non-answers (codes 77 and 99)

In [20]:
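# Keep only the two columns of interest and drop rows with missing values
CPOXandVacc = NISdata[['HAD_CPOX', 'P_NUMVRC']].dropna()
# Discard the 77 and 99 codes so that only 1 = Yes and 2 = No remain
CPOXandVacc = CPOXandVacc[(CPOXandVacc['HAD_CPOX'] != 77) & (CPOXandVacc['HAD_CPOX'] != 99)]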
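The same filter can also be written with `isin`, which keeps an explicit list of accepted answers instead of excluding each unwanted code one by one. This is only a sketch on invented toy data (the `toy` frame and the `valid_answers` name are assumptions, not part of the original notebook).

```python
import numpy as np
import pandas as pd

# Toy stand-in for NISdata; the values are invented for illustration only.
toy = pd.DataFrame({
    'HAD_CPOX': [1, 2, 2, 77, 99, 2, 1],
    'P_NUMVRC': [0.0, 1.0, np.nan, 2.0, 1.0, 2.0, 1.0],
})

valid_answers = [1, 2]  # 1 = Yes, 2 = No

# Drop rows with a missing value, then keep only the accepted answer codes
cleaned = toy[['HAD_CPOX', 'P_NUMVRC']].dropna()
cleaned = cleaned[cleaned['HAD_CPOX'].isin(valid_answers)]

print(len(toy), '->', len(cleaned))
```

Listing the accepted codes means that any other unexpected code would be dropped as well, which may or may not be what is wanted.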

### 5. Correlation and p-value calculated with scipy library

In [21]:
corr, pval = stats.pearsonr(CPOXandVacc['HAD_CPOX'], CPOXandVacc['P_NUMVRC'])

In [22]:
(corr, pval)

(0.07892422900142637, 2.9395181283119037e-24)

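Because 'HAD_CPOX' is coded 1 = Yes and 2 = No, the positive correlation means that higher values of 'HAD_CPOX' (no chickenpox) go together with higher values of 'P_NUMVRC' (more varicella doses); that is, not having had varicella is correlated with being vaccinated. The correlation is weak (about 0.08), and it describes an association rather than showing that one variable causes the other. The small p-value (about 2.9e-24) indicates that a correlation of this size would be highly unlikely to occur by chance if the two variables were unrelated.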
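One way to make the direction of this association easier to read is to recode 'HAD_CPOX' as a 0/1 indicator and compare the share of children who had chickenpox for each number of doses. The sketch below uses invented toy data (the `toy` frame, the `had_cpox` column, and all values are assumptions for illustration); it is not a result from the NIS dataset.

```python
import pandas as pd

# Toy data for illustration only; these are not NIS values.
toy = pd.DataFrame({
    'HAD_CPOX': [1, 1, 2, 2, 2, 2, 1, 2],
    'P_NUMVRC': [0, 0, 0, 1, 1, 2, 1, 2],
})

# Recode 1 = Yes / 2 = No into 1 = had chickenpox, 0 = did not
toy['had_cpox'] = (toy['HAD_CPOX'] == 1).astype(int)

# Share of children who had chickenpox, per number of varicella doses
share_by_doses = toy.groupby('P_NUMVRC')['had_cpox'].mean()

print(share_by_doses)
```

With the indicator coded this way, a share that falls as the dose count rises corresponds to the positive correlation found with the original 1/2 coding.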