# My final project: HPV vaccination rate in Missouri
I'm using data from the [CDC's website](https://www.cdc.gov/vaccines/imz-managers/coverage/teenvaxview/data-reports/hpv/reports/2018.html) to examine and analyze the rate of boys and girls ages 13 to 17 who have received the HPV vaccine over the last 10 years or so.

For my project, I downloaded datasets from the CDC for each year from 2008 to 2018, which broke down the vaccination rate by a variety of factors, including gender, age and race. Each year's dataset was extremely broad and organized the rates by state, and even by some areas within a state, but I didn't need that much detail for this project, especially when comparing year to year. I decided to manually copy and paste the information I needed (the overall rate by age and gender for every U.S. state from 2008 to 2018) into three csv files of my own.

rate_by_year.csv = the years that I'm looking at (2008-2018), along with the HPV vaccination rate for girls age 13-17 and the rate for boys age 13-17, both in Missouri and the overall U.S. rate.

female_state.csv = the rate of vaccination for girls age 13-17 in each state for each year (plus the U.S. average) from 2008 to 2018. This dataset also included the 2018 combined rate for both boys and girls age 13-17, which I pulled into its own sheet (both2018.csv) so it wouldn't mess up the analysis.

male_state.csv = the rate of vaccination for boys age 13-17 in each state for each year (plus the U.S. average) from 2008 to 2018. It seems like the CDC didn't collect state-level information on boys' vaccinations until 2011 (the national rate for boys begins in 2010), and didn't do so consistently until 2013. There isn't any information about the Missouri boys' vaccination rate until 2013.


## My plan
* Read in all three csv files
* Determine how the boys' and girls' rates have changed between 2008 and 2018
* Use Altair to visualize change between 2008 and 2018


In [60]:
import pandas as pd
import altair as alt

In [61]:
# Leave missing years as NaN (rather than "") so the rate columns stay numeric
year = pd.read_csv("rate_by_year.csv")

In [62]:
year.head()

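|   | year | missouri_rate_girls | missouri_rate_boys | US_rate_girls | US_rate_boys |
|---|------|---------------------|--------------------|---------------|--------------|
| 0 | 2008 | 31.6 |  | 37.2 |  |
| 1 | 2009 | 32.7 |  | 44.3 |  |
| 2 | 2010 | 41.4 |  | 48.7 | 1.4 |
| 3 | 2011 | 49.5 |  | 53.0 | 8.3 |
| 4 | 2012 | 51.6 |  | 53.8 | 45.1 |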


In [63]:
# Leave missing years as NaN (rather than "") so the rate columns stay numeric
females = pd.read_csv("female_state.csv")

In [64]:
females.head()

|   | StateName | female_2008 | female_2009 | female_2010 | female_2011 | female_2012 | female_2013 | female_2014 | female_2015 | female_2016 | female_2017 | female_2018 |
|---|-----------|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 32.8 | 49.4 | 45.8 | 49.5 | 46.6 | 54.7 | 54.7 | 57.7 | 54.2 | 61.1 | 69.1 |
| 1 | Alaska | 38.8 | 40.8 | 40.8 | 59.5 | 56.1 | 52.2 | 48.7 | 57.0 | 61.9 | 61.1 | 65.7 |
| 2 | Arizona | 50.5 | 52.8 | 52.8 | 55.3 | 54.3 | 64.1 | 58.2 | 68.3 | 65.4 | 68.9 | 65.3 |
| 3 | Arkansas | 22.4 | 34.6 | 37.9 | 36.1 | 41.2 | 44.3 | 54.6 | 63.5 | 53.3 | 66.7 | 65.9 |
| 4 | California | 46.6 | 49.2 | 56.1 | 65.0 | 65.0 | 67.6 | 69.2 | 66.7 | 78.0 | 75.6 | 68.4 |


In [65]:
# Leave missing years as NaN (rather than "") so the rate columns stay numeric
males = pd.read_csv("male_state.csv")

In [66]:
males.head()

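|   | Names | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 |
|---|-------|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama |  |  |  |  |  | 18.4 | 27.6 | 39.4 | 49.2 | 55.1 | 60.4 |
| 1 | Alaska |  |  |  |  |  | 27.6 | 37.9 | 41.6 | 60.3 | 67.6 | 67.1 |
| 2 | Arizona |  |  |  | 8.4 |  | 44.4 | 40.6 | 51.3 | 60.9 | 61.4 | 69.1 |
| 3 | Arkansas |  |  |  |  |  | 17.7 | 35.1 | 44.2 | 55.4 | 55.7 | 56.0 |
| 4 | California |  |  |  | 13.0 | 49.5 | 50.9 | 52.1 | 58.5 | 67.3 | 68.2 | 78.4 |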
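
The girls' and boys' tables label their columns differently (`StateName` with `female_2008` … versus `Names` with bare years), which makes side-by-side comparison awkward. One option is to reshape both into a single "long" table with one row per state, year and sex. This is a minimal sketch, assuming the columns are exactly the ones shown in the two `head()` outputs; the helper names `to_long` and `combine` are hypothetical, and the U.S. average row simply comes through as another "state".

```python
import pandas as pd


def to_long(wide: pd.DataFrame, name_col: str, prefix: str, sex: str) -> pd.DataFrame:
    """Reshape a wide state-by-year table into one row per (state, year).

    Assumes one column holds the state name and every other column is a year,
    optionally preceded by `prefix` (e.g. "female_2008" -> 2008).
    """
    long = wide.melt(id_vars=name_col, var_name="year", value_name="rate")
    long = long.rename(columns={name_col: "state"})
    # Drop the prefix by slicing, then turn the year labels into integers.
    long["year"] = long["year"].astype(str).str[len(prefix):].astype(int)
    # Blank cells (NaN or "") become NaN so the rate column is numeric.
    long["rate"] = pd.to_numeric(long["rate"], errors="coerce")
    long["sex"] = sex
    return long


def combine(females: pd.DataFrame, males: pd.DataFrame) -> pd.DataFrame:
    """Stack the girls' and boys' tables into a single long table."""
    return pd.concat(
        [
            to_long(females, "StateName", "female_", "girls"),
            to_long(males, "Names", "", "boys"),
        ],
        ignore_index=True,
    )
```

In the notebook this would be called as `combine(females, males)`; the result can then be filtered (for example to `state == "Missouri"`) or charted with `color="sex"` in Altair.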


In [67]:
both2018 = pd.read_csv("both2018.csv").fillna("")

In [68]:
both2018.head()

|   | StateName | 2018both |
|---|-----------|----------|
| 0 | Alabama | 64.7 |
| 1 | Alaska | 66.4 |
| 2 | Arizona | 67.2 |
| 3 | Arkansas | 60.8 |
| 4 | California | 73.5 |


In [69]:
both2018.sort_values("2018both", ascending=False)

|    | StateName | 2018both |
|----|-----------|----------|
| 39 | Rhode Island | 89.3 |
| 8 | District of Columbia | 86.0 |
| 21 | Massachusetts | 85.2 |
| 45 | Vermont | 78.3 |
| 29 | New Hampshire | 77.4 |
| 5 | Colorado | 77.2 |
| 34 | North Dakota | 76.7 |
| 23 | Minnesota | 76.7 |
| 11 | Hawaii | 76.7 |
| 27 | Nebraska | 75.6 |
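
Reading Missouri's position off the full sorted list works, but it can also be computed directly. This is a small sketch using pandas `Series.rank`; `rank_from_bottom` is a hypothetical helper, and it assumes `both2018` contains only the 50 states plus D.C. (drop any U.S. average row first, or it will be counted as a place). With `method="min"`, tied rates share the lower place number.

```python
import pandas as pd


def rank_from_bottom(df: pd.DataFrame, name: str, name_col: str = "StateName",
                     rate_col: str = "2018both") -> tuple[int, int]:
    """Return (place counted up from the lowest rate, number of rows ranked)."""
    rates = pd.to_numeric(df[rate_col], errors="coerce")
    # ascending=True gives the lowest rate place 1, i.e. "first from the bottom".
    ranks = rates.rank(ascending=True, method="min")
    row = df[name_col] == name
    return int(ranks[row].iloc[0]), int(rates.notna().sum())
```

In the notebook, `rank_from_bottom(both2018, "Missouri")` returns Missouri's place from the bottom together with how many places were ranked.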


In [98]:
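alt.Chart(year).mark_bar().encode(
    x=alt.X('year:O', title='Year'),  # ordinal, so each year gets its own bar
    y=alt.Y('missouri_rate_girls:Q', title='Vaccinated (%)'),  # sort removed: it has no effect on a quantitative axis
).configure_mark(
    opacity=0.7,
    color='red'
).properties(title='Rate of HPV Vaccination among girls ages 13-17 in MO')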

In [97]:
alt.Chart(year).mark_bar().encode(
    x=alt.X('year:O', title='Year'),  # ordinal, so each year gets its own bar
    y=alt.Y('US_rate_girls:Q', title='Vaccinated (%)'),
).configure_mark(
    opacity=0.4,
    color='red'
).properties(title='Rate of HPV Vaccination among girls ages 13-17 in U.S.')

In [102]:
alt.Chart(year).mark_bar().encode(
    x=alt.X('year:O', title='Year'),  # ordinal, so each year gets its own bar
    y=alt.Y('missouri_rate_boys:Q', title='Vaccinated (%)'),
).configure_mark(
    opacity=0.8
).properties(title='Rate of HPV Vaccination among boys ages 13-17 in MO')

In [104]:
alt.Chart(year).mark_bar().encode(
    x=alt.X('year:O', title='Year'),  # ordinal, so each year gets its own bar
    y=alt.Y('US_rate_boys:Q', title='Vaccinated (%)'),
).configure_mark(
    opacity=0.4
).properties(title='Rate of HPV Vaccination among boys ages 13-17 in U.S.')

# Findings
Here are the things that I discovered during my data analysis:
* The state with the highest HPV vaccination rate in 2018 for boys and girls ages 13 to 17 combined was Rhode Island, at 89.3%. Missouri was in the bottom 10 (eighth from the bottom) for the combined boys' and girls' rate in 2018, at 61.6%.
* The rate of girls ages 13 to 17 getting vaccinated in Missouri more than doubled from 2008 to 2018. The U.S. rate for girls ages 13 to 17 saw a similar increase, though it rose a bit more steadily than Missouri's. The U.S. rate remained slightly higher than the Missouri rate in every year from 2008 to 2018.
* The rate of boys ages 13 to 17 getting vaccinated in Missouri almost tripled between 2013 (when Missouri data starts) and 2018. The U.S. rate for boys ages 13 to 17 saw a similar, if not more dramatic, increase, though this could be partly due to inconsistencies in nationwide data collection before 2013. The U.S. rate remained higher than the Missouri rate in every year from 2013 (the first year with Missouri data) to 2018.
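
Claims like "more than doubled" and "almost tripled" come down to dividing the last reported rate by the first one. The sketch below makes that calculation explicit for any column of `year`, skipping years with no data (which matters for the boys' columns). It assumes the rows are already sorted by year, as they are in `year.head()`; `growth` is a hypothetical helper name.

```python
import pandas as pd


def growth(year: pd.DataFrame, col: str) -> dict:
    """Compare the first and last years that have a reported rate in `col`."""
    rates = pd.to_numeric(year[col], errors="coerce")
    # Keep only years with data (boys' data starts later than girls').
    present = year.loc[rates.notna(), "year"]
    rates = rates.dropna()
    first, last = rates.iloc[0], rates.iloc[-1]
    return {
        "first_year": int(present.iloc[0]),
        "last_year": int(present.iloc[-1]),
        "first_rate": float(first),
        "last_rate": float(last),
        "ratio": round(float(last / first), 2),  # 2.0 would mean the rate doubled
        "points_gained": round(float(last - first), 1),  # change in percentage points
    }
```

For example, `growth(year, "missouri_rate_boys")` starts at 2013 automatically because the earlier Missouri boys' cells are empty. Reporting the percentage-point gain next to the ratio helps, since a ratio starting from a very low base (like the 2010 national boys' rate) can look dramatic.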

## Issues I ran into:
At first, I had trouble figuring out how to analyze the data I had, but after exploring the csv files a little more, I figured out what I needed to find. Because the CDC's spreadsheets were so large and would have been hard to analyze in a Jupyter notebook, I made my own spreadsheets. This left fewer things to analyze within Jupyter, but I tried to do as much there as I could. One thing I wanted to do, but wasn't sure how to do (even after some searching on Altair's website), was make a national map of the 2018 HPV vaccination rates for boys and girls combined. I already have the states in descending order above, and I'm making the map for my graphics final anyway, so I don't need it for this analysis, but I thought it would be interesting to also make it in Altair.

## Where I go from here:
* Contact MU Health experts to see why these trends may be happening:
    * Mark Hunter, director of gynecologic oncology for MU Health (done)
    * Eric Kimchi, who directs the Moon Shot program for cancer prevention at MU
* Contact Dr. Alan Barnette, a Cape Girardeau doctor who researched HPV vaccination rates in rural Missouri
* Try to find someone who has been affected by HPV or has gotten cancer as a result of HPV
    * This could be hard, so as a backup, I'll try to find someone who has children of vaccination age
* Write the story!