# Banking Deserts
---
The below script uncovers the well-known phenomenon of [Banking Deserts](https://en.wikipedia.org/wiki/Banking_desert). The concept is simple: many neighborhoods with predominantly low-income and elderly populations tend to have inadequate coverage of banking services. This leads such communities to be  vulnerable to predatory loan and pricey check casher providers.

In this script, we retrieved and plotted data from the 2013 US Census and Google Places API to show the relationship between various socioeconomic parameters and bank count across 700 randomly selected zip codes. We used Pandas, Numpy, Matplotlib, Requests, Census API, and Google API to accomplish our task.

In [22]:
# Dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
import time
import json

# Google Places API Key
gkey = "YOUR-CODE-HERE"

## Data Retrieval

In [5]:
# Import the census data into a pandas Data Frame
csv_path = "Census_Data.csv"
data_df = pd.read_csv(csv_path)
# we may want to reconsider changing the index to Zipcode; so commented out the code for now
# data_df.set_index('Zipcode', inplace=True)

# Preview the data
data_df.head()

Unnamed: 0,Zipcode,Address,Population,Median Age,Household Income,Per Capita Income,Poverty Rate
0,15081,"South Heights, PA 15081, USA",342,50.2,31500.0,22177,20.760234
1,20615,"Broomes Island, MD 20615, USA",424,43.4,114375.0,43920,5.188679
2,50201,"Nevada, IA 50201, USA",8139,40.4,56619.0,28908,7.777368
3,84020,"Draper, UT 84020, USA",42751,30.4,89922.0,33164,4.39288
4,39097,"Louise, MS 39097, USA",495,58.0,26838.0,17399,34.949495


In [6]:
# Check to see which columns are numeric; it appears all columns contain strings
data_df.describe

<bound method NDFrame.describe of      Zipcode                        Address  Population  Median Age  \
0      15081   South Heights, PA 15081, USA         342        50.2   
1      20615  Broomes Island, MD 20615, USA         424        43.4   
2      50201          Nevada, IA 50201, USA        8139        40.4   
3      84020          Draper, UT 84020, USA       42751        30.4   
4      39097          Louise, MS 39097, USA         495        58.0   
5      72315     Blytheville, AR 72315, USA       24120        34.4   
6       5454         Fairfax, VT 05454, USA        4754        38.7   
7      53816         Mt Hope, WI 53816, USA         942        33.4   
8       5483         Sheldon, VT 05483, USA        1481        35.9   
9      37345        Huntland, TN 37345, USA        2155        46.0   
10      1922         Byfield, MA 01922, USA        3025        44.4   
11     95237       Lockeford, CA 95237, USA        3269        38.7   
12     12429          Esopus, NY 12429, USA

In [16]:
# Abdul owns this part:
#   The deliverable is reduced data_df data frame with 700 randomly selected cities with population >100
#   Two columns are added for each city: lat and lng which respectively hold lattitude and longitude info for each city

# Randomly select 700 zip codes locations that have at least 100 residents
# Hint: `pd.sample()`
# Hint: `pd[pd[astype(int) > 100`]]`
# Create blank columns in DataFrame for lat/lng

# Loop through and grab the lat/lng for each of the selected zips using Google maps
# Inside the loop add the lat/lng to our DataFrame
# Note: Be sure to use try/except to handle cities with missing data

#  I added two empty columns called lat and lng to illustratet the end result
#  Abdul will need to populate these columns with real data
lat = []
lng = []
for i in range(len(data_df['Zipcode'])):
    lat.append(0)
    lng.append(0)
data_df ['Lat'] = ''
data_df ['Lng'] = ''
# Visualize the DataFrame
data_df.head()

Unnamed: 0,Zipcode,Address,Population,Median Age,Household Income,Per Capita Income,Poverty Rate,Lat,Lng,Num_Banks
0,15081,"South Heights, PA 15081, USA",342,50.2,31500.0,22177,20.760234,,,
1,20615,"Broomes Island, MD 20615, USA",424,43.4,114375.0,43920,5.188679,,,
2,50201,"Nevada, IA 50201, USA",8139,40.4,56619.0,28908,7.777368,,,
3,84020,"Draper, UT 84020, USA",42751,30.4,89922.0,33164,4.39288,,,
4,39097,"Louise, MS 39097, USA",495,58.0,26838.0,17399,34.949495,,,


In [21]:
# Alex owns this part
# Create an empty column for bank count
data_df ['Num_Banks'] = ''

# I created a small sample data frame with the first two rows of data_df and manually filled in the lat/lng columns
# with coordinates for Boise nad New York
# This approach generates warnings that I think can be ignored since I am trying to demonstrate 
sample_df = data_df[:2]
sample_df['Lat'][0] = 43.6187102
sample_df['Lng'][0] = -116.2146068
sample_df['Lat'][1] = 40.7128
sample_df['Lng'][1] = -74.0059

# Visualize the DataFrame
sample_df.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # Remove the CWD from sys.path while we load stuff.
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: h

Unnamed: 0,Zipcode,Address,Population,Median Age,Household Income,Per Capita Income,Poverty Rate,Lat,Lng,Num_Banks
0,15081,"South Heights, PA 15081, USA",342,50.2,31500.0,22177,20.760234,43.6187,-116.215,
1,20615,"Broomes Island, MD 20615, USA",424,43.4,114375.0,43920,5.188679,40.7128,-74.0059,


In [28]:
gkey = "AIzaSyDIf2LJb8kF7_7ZGHDHfsS7yoZzdwp-EqA"

# Target city
# Boise, Idaho {"lat": 43.6187102, "lng": -116.2146068}
# New York, NY {"lat": 40.7128, "lng": -74.0059}
#target_city = {"lat": 43.6187102, "lng": -116.2146068}

# Build the endpoint URL (Checks all ice cream shops)
for i in range(len(sample_df)):
    target_url = "https://maps.googleapis.com/maps/api/place/radarsearch/json" \
                 "?location=%s,%s&radius=8000&type=bank&key=%s" % (sample_df['Lat'][i], sample_df['Lng'][i], gkey)
    bank_data = requests.get(target_url).json()
    sample_df ['Num_Banks'][i] = len(bank_data['results'])

sample_df.head()
#print(len(bank_data["results"]))

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if sys.path[0] == '':
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)


Unnamed: 0,Zipcode,Address,Population,Median Age,Household Income,Per Capita Income,Poverty Rate,Lat,Lng,Num_Banks
0,15081,"South Heights, PA 15081, USA",342,50.2,31500.0,22177,20.760234,43.6187,-116.215,109
1,20615,"Broomes Island, MD 20615, USA",424,43.4,114375.0,43920,5.188679,40.7128,-74.0059,199


## Save Graphs

In [None]:
# Beatriz owns this part and will plot various graphs using data in data_df
# Save the DataFrame as a csv


## Plot Graphs

In [None]:
# Build a scatter plot for each data type 


In [None]:
# Build a scatter plot for each data type
plt.scatter(selected_zips["Bank Count"], 
            selected_zips["Median Age"],
            edgecolor="black", linewidths=1, marker="o", 
            alpha=0.8, label="Zip Codes")

# Incorporate the other graph properties
plt.title("Median Age vs. Bank Count by Zip Code")
plt.ylabel("Median Age")
plt.xlabel("Bank Count")
plt.grid(True)
plt.xlim([-2.5, 202])

# Save the figure
plt.savefig("output_analysis/Age_BankCount.png")

# Show plot
plt.show()

In [None]:
# Build a scatter plot for each data type
plt.scatter(selected_zips["Bank Count"], 
            selected_zips["Household Income"],
            edgecolor="black", linewidths=1, marker="o", 
            alpha=0.8, label="Zip Codes")

# Incorporate the other graph properties
plt.title("Household Income vs. Bank Count by Zip Code")
plt.ylabel("Household Income ($)")
plt.xlabel("Bank Count")
plt.grid(True)
plt.xlim([-2.5, 202])
plt.ylim([-2.5, 230000])

# Save the figure
plt.savefig("output_analysis/HouseholdIncome_BankCount.png")

# Show plot
plt.show()

In [None]:
# Build a scatter plot for each data type
plt.scatter(selected_zips["Bank Count"], 
            selected_zips["Per Capita Income"],
            edgecolor="black", linewidths=1, marker="o", 
            alpha=0.8, label="Zip Codes")

# Incorporate the other graph properties
plt.title("Per Capita Income vs. Bank Count by Zip Code")
plt.ylabel("Per Capita Income (%)")
plt.xlabel("Bank Count")
plt.grid(True)
plt.xlim([-2.5, 202])
plt.ylim([0, 165000])

# Save the figure
plt.savefig("output_analysis/PerCapitaIncome_BankCount.png")

# Show plot
plt.show()

In [None]:
# Build a scatter plot for each data type
plt.scatter(selected_zips["Bank Count"], 
            selected_zips["Poverty Rate"],
            edgecolor="black", linewidths=1, marker="o", 
            alpha=0.8, label="Zip Codes")

# Incorporate the other graph properties
plt.title("Poverty Rate vs. Bank Count by Zip Code")
plt.ylabel("Poverty Rate (%)")
plt.xlabel("Bank Count")
plt.grid(True)
plt.xlim([-2.5, 202])
plt.ylim([-2.5, 102])

# Save the figure
plt.savefig("output_analysis/PovertyRate_BankCount.png")

# Show plot
plt.show()