Planned Parenthood is an organization in the United States that provides sexual and women's health services, including birth control, STI testing, HPV vaccinations, and abortion services. There is controversy over some of the services offered and over whether Planned Parenthood should continue to be available. In this project, I explore how the availability of Planned Parenthood services impacts rates of sexual and women's health issues.

Population and square mileage data for this project are from the 2016 US census and the 2000 US census geography data, respectively. The number of Planned Parenthood locations is from the Planned Parenthood website. Teenage birth rates, STI rates, and cervical cancer rates are pulled from the CDC.

These rates were selected for comparison because they reflect issues that the services provided by Planned Parenthood can help reduce. Teenage birth rates can be lowered by access to birth control, cervical cancer can be prevented by HPV vaccinations, and chlamydia and gonorrhea are both curable STIs, so treating them early can prevent their spread.

Two states do not have any Planned Parenthood locations: Wyoming and North Dakota. For the sake of data analysis, these states were removed from the data.
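The cleaning for this project was done in Excel, but the same filtering step could be sketched in pandas. The small table below is illustrative only: it reuses two location counts shown later in this notebook plus the two states with zero locations, and the names `counts` and `with_locations` are assumptions made for the example.

```python
import pandas as pd

# Illustrative subset: location counts only (not the full dataset).
counts = pd.DataFrame({
    "State": ["Alaska", "Alabama", "North Dakota", "Wyoming"],
    "Num Locations": [5, 2, 0, 0],
})

# Keep only states with at least one location, so that per-location
# quantities such as population per location stay well defined.
with_locations = counts[counts["Num Locations"] > 0].reset_index(drop=True)
print(with_locations)
```

Dropping the zero-location rows before computing ratios avoids dividing by zero in a column like population per location.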

In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd

The data described above were cleaned in Excel and compiled into one spreadsheet. Teen birth rate is given as the number of births per 1,000 females aged 15-19, cervical cancer rate is cases per 100,000 women, and chlamydia and gonorrhea rates are cases per 100,000 people.

In [30]:
data = pd.read_csv('honorsopdata.csv')
data.head()

Unnamed: 0,State,Num Locations,Population,Area (sq mi),pop per sq mile,Pop per Location,loc per sq mi,teen birth rate,cervical cancer rate,chlamydia rate,gonorrhea rate
0,Alaska,5,737438,570641,1.292298,147487.6,9e-06,22.0,8.2,799.8,295.1
1,Alabama,2,4887871,50645,96.51241,2443936.0,3.9e-05,27.0,9.4,615.5,245.7
2,Arkansas,2,3013825,52035,57.919189,1506912.0,3.8e-05,32.8,10.4,579.6,224.5
3,Arizona,7,7171646,113594,63.134021,1024521.0,6.2e-05,22.0,7.3,571.3,180.4
4,California,108,39559045,155779,253.943375,366287.5,0.000693,15.1,7.1,557.4,192.0
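The three derived columns appear to be simple ratios of the base columns. The sketch below recomputes them for the first two rows shown above. This is an assumption about how the spreadsheet was built (the ratios were presumably computed in Excel), and the DataFrame name `base` is hypothetical.

```python
import pandas as pd

# Base columns copied from the first two rows of the output above.
base = pd.DataFrame({
    "State": ["Alaska", "Alabama"],
    "Num Locations": [5, 2],
    "Population": [737438, 4887871],
    "Area (sq mi)": [570641, 50645],
})

# Derived ratios, named to match the spreadsheet columns.
base["pop per sq mile"] = base["Population"] / base["Area (sq mi)"]
base["Pop per Location"] = base["Population"] / base["Num Locations"]
base["loc per sq mi"] = base["Num Locations"] / base["Area (sq mi)"]

print(base)
```

The recomputed values agree with the displayed rows up to display rounding, which supports reading the columns this way.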


In [31]:
data.describe()

Unnamed: 0,Num Locations,Population,Area (sq mi),pop per sq mile,Pop per Location,loc per sq mi,teen birth rate,cervical cancer rate,chlamydia rate,gonorrhea rate
count,48.0,48.0,48.0,48.0,48.0,48.0,48.0,48.0,48.0,48.0
mean,12.791667,6768798.0,70118.958333,209.885636,914526.1,0.000426,19.170833,7.435417,507.575,158.983333
std,18.158825,7426359.0,87535.844125,268.080975,759449.9,0.000682,6.417926,1.413535,112.106108,64.589905
min,1.0,626299.0,1033.0,1.292298,52191.58,9e-06,8.1,3.0,226.1,32.5
25%,2.75,2053888.0,34575.5,56.116699,347631.0,4.5e-05,14.8,6.725,442.025,116.975
50%,5.0,4773924.0,52829.5,108.866941,628074.6,0.000146,18.55,7.35,506.65,157.45
75%,18.0,7781114.0,80159.0,223.268075,1192693.0,0.000512,22.525,8.225,567.55,195.6
max,108.0,39559040.0,570641.0,1228.293854,2986530.0,0.003304,32.8,10.4,799.8,309.8


In the following cells, I will use scipy.optimize.curve_fit to try a few different fits on the data, to see which sets are correlated and how they are correlated. I will look at linear, quadratic, and exponential fits for the data, and compare the Planned Parenthood data with the sexual/women's health data.
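As a warm-up, here is a minimal sketch of how `curve_fit` can be used and how one fit might be judged. It runs on synthetic data, not the project data, and the helper `r_squared` is an assumption added for illustration: the coefficient of determination is one common way to compare fits, though not necessarily the criterion used later in this notebook.

```python
import numpy as np
from scipy.optimize import curve_fit

def lin(x, a, b):
    return a * x + b

def r_squared(f, x, y, popt):
    """Coefficient of determination for model f with fitted parameters popt."""
    residuals = y - f(x, *popt)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Synthetic data: a known line plus Gaussian noise (not real measurements).
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 50)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(0, 0.5, size=x_demo.size)

# curve_fit returns the best-fit parameters and their covariance matrix.
popt_demo, pcov_demo = curve_fit(lin, x_demo, y_demo)
perr_demo = np.sqrt(np.diag(pcov_demo))  # one-sigma parameter uncertainties
r2_demo = r_squared(lin, x_demo, y_demo, popt_demo)

print("parameters:", popt_demo)
print("uncertainties:", perr_demo)
print("R^2:", r2_demo)
```

An R² close to 1 means the model explains most of the variance in `y`, while a value near 0 suggests little relationship under that model.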

In [22]:
from scipy.optimize import curve_fit
pp_dict = {0 : data['Num Locations'], 1 : data['Pop per Location'], 2 : data['loc per sq mi']}
rates_dict = {0 : data['teen birth rate'], 1 : data['cervical cancer rate'], 2 : data['chlamydia rate'], 3 : data['gonorrhea rate']}

In [23]:
def lin(x, a, b):
    return a*x + b
def quad(x, a, b, c):
    return a*(x**2) + b*x + c
def exp(x, a, b, c):
    return a*np.exp(b*x) + c
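One practical caveat with the exponential model: when no initial guess is given, `curve_fit` starts every parameter at 1, so with `x` values in the millions (as in the population-per-location column) `np.exp(b*x)` overflows on the first evaluation. A possible workaround, sketched here on synthetic data, is to rescale `x` and pass an initial guess through `p0`. The starting values, the `maxfev` setting, and the `_big`/`_scaled` variable names are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp(x, a, b, c):
    return a * np.exp(b * x) + c

# Synthetic data on a large x scale (not the project data).
rng = np.random.default_rng(1)
x_big = np.linspace(5e4, 3e6, 60)
y_big = 10.0 * np.exp(-1e-6 * x_big) + 15.0 + rng.normal(0, 0.1, x_big.size)

# Rescale x to [0, 1] so that exp(b * x) stays in a safe numerical range.
x_scale = x_big.max()
x_scaled = x_big / x_scale

# Initial guess: decaying exponential with an offset near the mean of y.
p0 = [1.0, -1.0, y_big.mean()]
popt_scaled, pcov_scaled = curve_fit(exp, x_scaled, y_big, p0=p0, maxfev=10000)

# Convert the rate parameter back to the original x units.
a_fit, b_scaled, c_fit = popt_scaled
b_fit = b_scaled / x_scale

rms = np.sqrt(np.mean((y_big - exp(x_scaled, *popt_scaled)) ** 2))
print("a, b, c in original units:", a_fit, b_fit, c_fit)
print("RMS residual:", rms)
```

Only `b` needs converting back, because `a` and `c` are unaffected by a rescaling of `x`.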

In [24]:
results_dict = {0 : 0, 1 : 0, 2 : 0}

In [27]:
for i in range(len(pp_dict)):
    new_array = []
    for j in range(len(rates_dict)):
        x = pp_dict[i]
        y = rates_dict[j]
        lin_arr = []
        popt_lin, pcov_lin = curve_fit(lin, x, y)
        lin_arr.append(popt_lin)
        lin_arr.append(pcov_lin)
        
        quad_arr = []
        popt_quad, pcov_quad = curve_fit(quad, x, y)
        quad_arr.append(popt_quad)
        quad_arr.append(pcov_quad)
        
        exp_arr = []
        popt_exp, pcov_exp = curve_fit(exp, x, y)
        exp_arr.append(popt_exp)
        exp_arr.append(pcov_exp)
        
        arr = [lin_arr, quad_arr, exp_arr]
        new_array.append(arr)
        
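    # Store a nested list rather than np.array(new_array): the fits have
    # different numbers of parameters, so the results are ragged, and recent
    # NumPy versions raise an error when asked to pack them into an array.
    results_dict[i] = new_array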

We can visualize our results with matplotlib.

In [None]:
genx0 = np.linspace(0,108,100)
genx1 = np.linspace(5.25e4,3e6,1000)
genx2 = np.linspace(.00001,.0035,100)
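A minimal sketch of how a fitted curve could be overlaid on a scatter plot, using a grid like `genx0` above. The data points are synthetic placeholders with arbitrary values, not results from this project, and the names `x_pts`, `y_pts`, and `genx` are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

def lin(x, a, b):
    return a * x + b

# Synthetic placeholder points (arbitrary values, for illustration only).
rng = np.random.default_rng(2)
x_pts = rng.uniform(1, 108, size=48)
y_pts = 25.0 - 0.1 * x_pts + rng.normal(0, 2.0, size=48)

# Fit the model, then evaluate it on an evenly spaced grid for a smooth line.
popt, _ = curve_fit(lin, x_pts, y_pts)
genx = np.linspace(0, 108, 100)

fig, ax = plt.subplots()
ax.scatter(x_pts, y_pts, label="synthetic data")
ax.plot(genx, lin(genx, *popt), color="C1", label="linear fit")
ax.set_xlabel("x (synthetic)")
ax.set_ylabel("y (synthetic)")
ax.legend()
fig.savefig("fit_sketch.png")
```

Evaluating the model on a dense grid instead of the raw `x` values keeps the curve smooth even when the data points are unevenly spaced.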