In [3]:
import pandas as pd

In this project, I want to explore the effectiveness of Wicked Free Wi-Fi, Boston's outdoor wireless network. One goal of this network is to make travel more convenient, allowing tourists and Bostonians alike to search for places to eat and shop. Another goal of Wicked Free Wi-Fi, the goal that interests me, is to give Bostonians of all income levels the same ability to access the digital world. 

I will use data published by the City of Boston to map the connections to Wicked Free Wi-Fi by neighborhood, and then I will compare the map to one of the many existing City of Boston income maps. (I ended up using this one: https://www.bostonglobe.com/business/2015/07/12/mapping-boston-disparity/vzzB9jBaNtnrpQ9XL9NMhI/story.html)

Below are the sources I used to learn about Wicked Free Wi-Fi:

https://www.boston.com/news/local-news/2014/04/10/wicked-wifi-goes-live-in-boston

https://www.boston.gov/departments/innovation-and-technology/wicked-free-wi-fi

This is the link to the dataset, which catalogs daily connections in several neighborhoods between March 2014 and May 2015:

https://data.boston.gov/dataset/wicked-free-wi-fi-daily-connections

In [4]:
df = pd.read_csv('wifi.csv')
df

Unnamed: 0,Neighborhood,Date,Connections
0,Charlestown,05/10/2014 12:00:00 AM,18
1,Charlestown,05/11/2014 12:00:00 AM,16
2,Charlestown,05/12/2014 12:00:00 AM,30
3,Charlestown,05/13/2014 12:00:00 AM,21
4,Charlestown,05/14/2014 12:00:00 AM,20
5,Charlestown,05/15/2014 12:00:00 AM,26
6,Charlestown,05/16/2014 12:00:00 AM,31
7,Charlestown,05/17/2014 12:00:00 AM,20
8,Charlestown,05/18/2014 12:00:00 AM,23
9,Charlestown,05/19/2014 12:00:00 AM,27


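A couple of small cleanups for convenience: name the index, and strip the constant midnight timestamp from each date so only the day remains.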

In [None]:
df.index.name = 'Key'
df['Date'] = df['Date'].apply(lambda x: str(x).replace(' 12:00:00 AM', ''))
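An alternative to stripping the time suffix is to parse the column into real dates, which would make later filtering by date possible. This is only a minimal sketch, assuming every value follows the format shown in the preview above; `sample` is a made-up stand-in for `df`, not the real data.

```python
import pandas as pd

# Hypothetical sample rows in the same format as the Date column above.
sample = pd.DataFrame({
    'Neighborhood': ['Charlestown', 'Charlestown'],
    'Date': ['05/10/2014 12:00:00 AM', '05/11/2014 12:00:00 AM'],
    'Connections': [18, 16],
})

# Parse the full timestamp string; 12:00:00 AM is midnight, so the
# resulting timestamps carry no time-of-day component.
sample['Date'] = pd.to_datetime(sample['Date'], format='%m/%d/%Y %I:%M:%S %p')
first_day = sample['Date'].min()
```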

I want to total the connections in each neighborhood, so it helps to have a list of all the neighborhoods. With that list, I can pick the rows for each neighborhood out of the DataFrame.

In [5]:
neighborhoods = list(df.Neighborhood.unique())
neighborhoods

['Charlestown',
 'City Hall Truck',
 'Roxbury',
 'Roslindale',
 'Downtown Boston',
 'Parks',
 'Dorchester',
 'East Boston',
 'South Boston',
 'Hyde Park',
 'Allston Brighton',
 'Columbus Park']

These don't match the neighborhoods in my JSON file (I printed the JSON names to the console and copy-pasted them here):


Roslindale, Jamaica Plain, Mission Hill, Longwood Medical Area, Bay Village, Leather District, Chinatown, North End, Roxbury, South End, Back Bay, East Boston, Charlestown, West End, Beacon Hill, Downtown, Fenway, Brighton, West Roxbury, Hyde Park, Mattapan, Dorchester, South Boston Waterfront, South Boston, Allston, Harbor Islands

Contained Directly in Both: Roslindale, Roxbury, East Boston, Charlestown, Hyde Park, Dorchester, South Boston

I used Google Maps and Wikipedia to figure out how differently named areas in the CSV and JSON files might correspond with each other.

Conversions (CSV to JSON):
 - Downtown Boston - Downtown, Bay Village, Leather District, Beacon Hill, Chinatown, North End, West End
 - Allston Brighton - Allston, Brighton
 - South Boston - South Boston, South End
 - Columbus Park - South Boston Waterfront
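The conversions above could be captured in a lookup table, so the visualization can find the CSV total for any JSON neighborhood. This is a sketch only: the names `csv_to_json` and `json_to_csv` are hypothetical and do not appear elsewhere in this notebook.

```python
# Hypothetical lookup table: CSV area name -> JSON neighborhood names.
csv_to_json = {
    'Downtown Boston': ['Downtown', 'Bay Village', 'Leather District',
                        'Beacon Hill', 'Chinatown', 'North End', 'West End'],
    'Allston Brighton': ['Allston', 'Brighton'],
    'South Boston': ['South Boston', 'South End'],
    'Columbus Park': ['South Boston Waterfront'],
}

# Areas whose names are identical in both files map to themselves.
# (South Boston is already covered by its conversion entry above.)
for name in ['Roslindale', 'Roxbury', 'East Boston', 'Charlestown',
             'Hyde Park', 'Dorchester']:
    csv_to_json[name] = [name]

# Invert the table so each JSON neighborhood points back to its CSV area.
json_to_csv = {json_name: csv_name
               for csv_name, json_names in csv_to_json.items()
               for json_name in json_names}
```

With this in place, `json_to_csv['Chinatown']` gives `'Downtown Boston'`, and any JSON neighborhood missing from the dictionary has no Wi-Fi data to display.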

That left me with these areas in the JSON that were not accounted for in the CSV: Jamaica Plain, Mission Hill, Longwood Medical Area, Fenway, West Roxbury, Mattapan, Back Bay, Harbor Islands

There were also a couple of "areas" in the CSV that were not appropriate for a map display, especially one of residential areas: connections to the City Hall Truck Wi-Fi, and connections in all parks grouped together.

So from the `neighborhoods` list, we can remove City Hall Truck and Parks.

In [6]:
neighborhoods.remove('City Hall Truck')
neighborhoods.remove('Parks')

Then, we can iterate through the list of neighborhoods and the dataframe generated from the CSV to total up all the connections in each of the neighborhoods.

In [7]:
connect_per = []
for item in neighborhoods:
    # select the rows for this neighborhood
    neighborhood_df = df.loc[df['Neighborhood'] == item]
    # sum the column directly instead of looping over rows;
    # int() keeps the totals as plain Python integers
    connections = int(neighborhood_df['Connections'].sum())
    connect_per.append({'area': item, 'total': connections})
connect_per

[{'area': 'Charlestown', 'total': 16583},
 {'area': 'Roxbury', 'total': 1167198},
 {'area': 'Roslindale', 'total': 271516},
 {'area': 'Downtown Boston', 'total': 512062},
 {'area': 'Dorchester', 'total': 245449},
 {'area': 'East Boston', 'total': 79043},
 {'area': 'South Boston', 'total': 146579},
 {'area': 'Hyde Park', 'total': 187495},
 {'area': 'Allston Brighton', 'total': 80447},
 {'area': 'Columbus Park', 'total': 4459}]
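For comparison, the same kind of per-neighborhood total can be computed in one step with a pandas `groupby`. This is a sketch on a made-up miniature table, not the real data; `sample`, `keep`, and `connect_per_sketch` are hypothetical names. Note that `groupby` returns the areas in alphabetical order rather than in the order of the `neighborhoods` list.

```python
import pandas as pd

# Hypothetical miniature of the wifi data, for illustration only.
sample = pd.DataFrame({
    'Neighborhood': ['Charlestown', 'Roxbury', 'Charlestown', 'Parks'],
    'Connections': [18, 40, 16, 7],
})
keep = ['Charlestown', 'Roxbury']  # stand-in for the filtered neighborhoods list

# Keep only the wanted areas, then sum Connections within each area.
totals = (sample[sample['Neighborhood'].isin(keep)]
          .groupby('Neighborhood')['Connections']
          .sum())

# Convert to the same list-of-dicts shape used in this notebook.
connect_per_sketch = [{'area': area, 'total': int(total)}
                      for area, total in totals.items()]
```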

Now I want to think about how a viewer will mentally split up the map when they see it. Sorting the totals from smallest to largest makes the natural groupings easier to spot.

In [8]:
connect_per = sorted(connect_per, key=lambda k: k['total']) 
connect_per

[{'area': 'Columbus Park', 'total': 4459},
 {'area': 'Charlestown', 'total': 16583},
 {'area': 'East Boston', 'total': 79043},
 {'area': 'Allston Brighton', 'total': 80447},
 {'area': 'South Boston', 'total': 146579},
 {'area': 'Hyde Park', 'total': 187495},
 {'area': 'Dorchester', 'total': 245449},
 {'area': 'Roslindale', 'total': 271516},
 {'area': 'Downtown Boston', 'total': 512062},
 {'area': 'Roxbury', 'total': 1167198}]

Just looking at the numbers, I want to break the data into the following groups.

Columbus Park: 4459

Charlestown: 16583

East Boston: 79043
Allston Brighton: 80447

South Boston: 146579
Hyde Park: 187495

Dorchester: 245449
Roslindale: 271516

Downtown Boston: 512062

Roxbury: 1167198

I will keep this in mind when I am working with colors and values in my visualization.
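One way to turn these groups into something a color scale can use is to assign each area a bin number with `pd.cut`. This is a sketch under assumptions: the bin edges below are hypothetical values I picked to fall in the gaps between the groups, and other cut points would work just as well.

```python
import pandas as pd

# Totals copied from the sorted output above.
totals = pd.Series({
    'Columbus Park': 4459, 'Charlestown': 16583, 'East Boston': 79043,
    'Allston Brighton': 80447, 'South Boston': 146579, 'Hyde Park': 187495,
    'Dorchester': 245449, 'Roslindale': 271516, 'Downtown Boston': 512062,
    'Roxbury': 1167198,
})

# Hypothetical bin edges chosen to separate the seven groups listed above.
edges = [0, 10_000, 50_000, 100_000, 200_000, 300_000, 1_000_000, 2_000_000]

# labels=False returns the integer bin index (0 = fewest connections).
groups = pd.cut(totals, bins=edges, labels=False)
```

Each bin index could then be mapped to one step of a sequential color scheme in the visualization.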

In [None]:
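# writing out the data for use in the visualization
with open('totals.csv', 'w') as f:
    # no trailing space in the header, so the column is named exactly 'total'
    f.write('area,total\n')
    for item in connect_per:
        f.write(item['area'] + "," + str(item['total']) + "\n")
    # no explicit f.close() needed: the with block closes the file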