# Detroit, Michigan - Exploring Interesting Facts

---
### Background
---
As an enthusiastic consumer of podcast journalism, I am pleased to share my impressions of a **Slate Working Podcast** episode featuring *Diana Nucera, Director of the Detroit Community Technology Project*.
Her opening statement that 40% of Detroit, Michigan residents have no access to the internet grabbed my attention and informed my decision to explore data describing the Detroit community.

Detroit has a rich history in the USA, serving as an important and productive nexus of the automobile industry in the middle of the 20th century. Once alive with prosperity, innovation and industry, this city on the US-Canadian border filed for bankruptcy in 2013.

Detroit has new leadership and enthusiastic community organizers, such as Diana Nucera, tirelessly driving change and breathing life back into the city. Return to the fact that 40% of the city has no internet access. Pause for a moment and imagine it. This fact piqued my interest in the city's current situation.

Diana’s story had so many layers of information with intersections including digital literacy, technology and open data. She credits one of the city elders with the guiding principle of “transform yourself and you will transform the world”, a principle she lives and breathes in her daily walk.

Based on her recommendation, I explored Detroit's open data with a focus on how it described the current state of prosperity in the city. This was interesting and informative. I then expanded my search of open data in order to place Detroit's data next to national data, allowing for context. 

Below are links to the podcast and to other resources provided by Diana.

https://www.alliedmedia.org/dctp

http://www.slate.com/articles/podcasts/working/2017/07/how_does_detroit_community_technology_project_director_diana_nucera_work.html

Open Technology Institute
https://www.newamerica.org/oti/

---
### Question to explore with the data
---
Can the data show that Detroit is less prosperous than other cities in the United States?

---
### Discussion
---
With Detroit undergoing revitalization after declaring bankruptcy in 2013, we may expect to see markers showing that the city is less prosperous than other cities in the nation. There are many possible ways to pull and analyze data to answer this question. For this story, we look at the percentage of adults aged 18 to 64 without health insurance. Generally speaking, those with health insurance either have the personal income to buy it on their own or are insured through their employer. A low percentage of uninsured adults would therefore suggest an economically healthy community, and a high percentage a less prosperous one.

---
### Findings and Conclusion
---
The mean percentage of adults aged 18 to 64 without health insurance was:
* Mean when grouped by state (mean of the state means): 18.1%
* Mean across all cities (all records in the data set): 20.5%
* Mean for Detroit, Michigan: 26.7%
---
Using the percentage of adults aged 18 to 64 without health insurance as our measure of prosperity, Detroit is less prosperous than the average across all cities and the average across states.





---

### Notebook Setup
---
- Python libraries: the pandas, numpy and matplotlib libraries are imported.
- Data: Two objects are created. The first is **health**, a data frame that brings in the entire data set, including all cities and states in the *500 Cities Project*. Each record has a field called *Data_Value*, which shows the percentage of adults aged 18 to 64 without health insurance. The second is **df1**, which groups the data by state and holds the mean percentage of adults aged 18 to 64 without health insurance for each state.
- Visualizations: charts are created using commands from matplotlib.
- Other commands: *groupby*, *describe* and *mean* are used to describe and analyze the data.

---

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook

In [2]:
plt.style.use('seaborn-colorblind')

---
The analysis is based on a data set from *Data Across Sectors for Health - 500 Cities Project*. This initiative uses open data to describe healthy and unhealthy communities. They encourage use of the data to improve outcomes for those with chronic diseases. Below is a quote from their website and a link to the website.
___

> The data available in the 500 Cities Project can be used to:

- Identify health issues in a city or neighborhood
- Establish key health objectives
- Guide the development and implementation of effective and targeted prevention activities
- Start community conversations about health priorities
- Understand how where we live matters to our health
- Help communities assess what they are doing well and where they can improve

> As part of the Chronic Disease and Open Data initiative, 500 Cities helps the CDC make their data more available to the public. The project offers map books for available cities containing census tract-level data for all 28 BRFSS measures for one city. Starting March 2, 2017 an interactive web application provides instant use of this first-of-its kind data to access small-area estimates for the risk behaviors that cause much illness, suffering, and early death, as well as reveal the conditions and diseases that are the most common, costly, and preventable. The data are available to download as well as visualize using tools provided on the website.

---
http://dashconnect.org/2017/03/02/the-500-cities-project-whats-on-the-horizon/

---

In [3]:
health = pd.read_csv("/Users/Becky/Downloads/500_Cities__Current_lack_of_health_insurance_among_adults_aged_18-64_years.csv")
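
Before grouping, it can help to confirm that the file actually contains the columns the rest of the notebook relies on (*StateDesc*, *CityName* and *Data_Value*). The sketch below is one possible way to do that. `load_health` and `REQUIRED_COLUMNS` are hypothetical names that are not part of the original notebook, and the small in-memory CSV is made-up data standing in for the real download.

```python
import io

import pandas as pd

# Columns used later in this notebook.
REQUIRED_COLUMNS = ["StateDesc", "CityName", "Data_Value"]


def load_health(source):
    """Read the CSV and fail early if an expected column is missing.

    `source` can be a file path or any file-like object accepted by pd.read_csv.
    """
    frame = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    # Coerce the measure to numeric; unparseable entries become NaN,
    # which pandas skips when computing means.
    frame["Data_Value"] = pd.to_numeric(frame["Data_Value"], errors="coerce")
    return frame


# Tiny made-up sample with the same column names, for illustration only.
sample_csv = io.StringIO(
    "StateDesc,CityName,Data_Value\n"
    "Michigan,Detroit,26.0\n"
    "Michigan,Ann Arbor,10.0\n"
)
sample = load_health(sample_csv)
print(sample.shape)
```

In the notebook itself, the same function could be called with the path to the downloaded CSV instead of the sample.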

#### Based on the data frame "health" created above, the groupby command is used to show the mean for each state of adults aged 18 to 64 years old who do not have health insurance. 

In [4]:
health.groupby('StateDesc').Data_Value.mean()

StateDesc
Alabama                 19.251261
Alaska                  15.519298
Arizona                 19.214047
Arkansas                19.525000
California              20.056533
Colorado                16.628371
Connecticut             20.576271
Delaware                19.473077
District of Columbia    11.527778
Florida                 24.265393
Georgia                 24.227468
Hawaii                  12.101271
Idaho                   19.591781
Illinois                19.063285
Indiana                 22.036381
Iowa                    12.891089
Kansas                  20.422903
Kentucky                11.757895
Louisiana               25.578740
Maine                   15.336364
Maryland                16.392040
Massachusetts           12.362941
Michigan                19.709524
Minnesota               12.396605
Mississippi             25.474684
Missouri                18.850339
Montana                 15.757447
Nebraska                16.302294
Nevada                  21.675610
New 

In [9]:
df1 = health.groupby('StateDesc').Data_Value.mean()

In [19]:
df1.plot.bar();

[Figure: bar chart of the mean percentage of adults aged 18 to 64 without health insurance, by state]
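
A bar chart with one bar per state is easier to read when the bars are sorted and the axes are labeled. The following is a minimal sketch of that idea, not the chart produced in the original notebook. The three values are made up and stand in for the per-state means in `df1`, and the `Agg` backend line is only there so the sketch runs without a display (inside the notebook it would be dropped in favor of `%matplotlib notebook`).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for running outside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the per-state means held in df1.
state_means = pd.Series(
    {"State A": 12.4, "State B": 25.6, "State C": 19.7},
    name="Data_Value",
)

# Sorting first makes the highest and lowest states easy to spot.
ordered = state_means.sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(8, 4))
ordered.plot.bar(ax=ax)
ax.set_xlabel("State")
ax.set_ylabel("Mean % of adults 18-64 without health insurance")
ax.set_title("Uninsured adults by state (illustrative data)")
fig.tight_layout()
```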

#### Group the "health" data frame by city and calculate the mean percentage of adults aged 18 to 64 years old who do not have health insurance

In [20]:
health.groupby('CityName').Data_Value.mean()

CityName
Abilene              27.269231
Akron                19.155882
Alameda              11.000000
Albany               23.596364
Albuquerque          17.433813
Alexandria           12.167500
Alhambra             18.790909
Allen                13.684615
Allentown            28.492857
Amarillo             25.973684
Anaheim              22.953247
Anchorage            15.519298
Ann Arbor            10.542857
Antioch              17.904545
Apple Valley         16.842857
Appleton             10.403846
Arlington            25.480000
Arlington Heights     8.545833
Arvada               11.737143
Asheville            17.841935
Athens               25.575000
Atlanta              23.489394
Auburn               14.947619
Augusta              26.777551
Aurora               19.309924
Austin               22.466000
Avondale             23.365217
Bakersfield          22.604938
Baldwin Park         31.925000
Baltimore            16.392040
                       ...    
Victorville          23.519048
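
One caveat with grouping on *CityName* alone: if two cities in different states share a name, their records are blended into a single mean. I have not checked whether that happens in this data set, so the sketch below uses made-up rows to show the effect and one way to guard against it by grouping on both *StateDesc* and *CityName*.

```python
import pandas as pd

# Made-up rows: two different cities named "Springfield" in different states.
rows = pd.DataFrame(
    {
        "StateDesc": ["Illinois", "Illinois", "Missouri", "Missouri"],
        "CityName": ["Springfield", "Springfield", "Springfield", "Springfield"],
        "Data_Value": [10.0, 12.0, 20.0, 22.0],
    }
)

# Grouping by name alone blends both cities into one mean.
by_name = rows.groupby("CityName").Data_Value.mean()

# Grouping by state and name keeps them apart.
by_state_and_name = rows.groupby(["StateDesc", "CityName"]).Data_Value.mean()

# Names that appear in more than one state are the ones at risk.
states_per_name = rows.groupby("CityName").StateDesc.nunique()
ambiguous = states_per_name[states_per_name > 1].index.tolist()

print(by_name)
print(by_state_and_name)
print(ambiguous)
```

Running the last three lines of logic against `health` would list any city names that need the two-column grouping.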

In [21]:
health[health.CityName=='Detroit'].Data_Value.mean()

26.720068027210882

In [22]:
health.Data_Value.mean()

20.51446552029777
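
With the two figures above (26.72 for Detroit and 20.51 across all records), Detroit sits about 6.2 percentage points above the overall mean. The sketch below wraps that comparison in a small helper so it can be repeated for any city. `compare_city` is a hypothetical name, and the rows are made up for illustration; the notebook would pass `health` instead.

```python
import pandas as pd


def compare_city(frame, city):
    """Return the city's mean, the overall mean, and the gap in percentage points."""
    city_mean = frame.loc[frame.CityName == city, "Data_Value"].mean()
    overall_mean = frame.Data_Value.mean()
    return {
        "city_mean": city_mean,
        "overall_mean": overall_mean,
        "gap_points": city_mean - overall_mean,
    }


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "CityName": ["Detroit", "Detroit", "Ann Arbor", "Ann Arbor"],
        "Data_Value": [30.0, 26.0, 10.0, 14.0],
    }
)
result = compare_city(toy, "Detroit")
print(result)
```

Note that the overall mean here is taken over every record, so cities with more records carry more weight, just as in `health.Data_Value.mean()`.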

In [24]:
df1.describe()

count    52.000000
mean     18.126813
std       4.408064
min      11.392308
25%      14.662815
50%      18.708436
75%      20.448495
max      29.923437
Name: Data_Value, dtype: float64
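
To put Detroit's figure next to the spread of state means, the numbers printed above can be combined into a rough standardized distance. This is only a sketch: it compares a single city's mean with a distribution of state-level means, which is not a like-for-like comparison, so the result is best read as a rough indication. All values are copied from the outputs above.

```python
# Values copied from the notebook outputs above.
detroit_mean = 26.720068027210882  # mean Data_Value for Detroit
state_mean = 18.126813             # mean of the state means (df1.describe())
state_std = 4.408064               # standard deviation of the state means
state_q3 = 20.448495               # 75th percentile of the state means
state_max = 29.923437              # highest state mean

# Distance of Detroit from the average state, in state-level standard deviations.
z_score = (detroit_mean - state_mean) / state_std

# Where Detroit falls relative to the upper end of the state distribution.
above_q3 = detroit_mean > state_q3
below_max = detroit_mean < state_max

print(round(z_score, 2), above_q3, below_max)
```

Detroit comes out at roughly 1.95 standard deviations above the average state mean: above the 75th percentile of states, but below the highest state mean.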