# Analysis of Child Care Services in the Neighborhoods of Edmonton, AB

## Introduction

In this exercise, I analyze neighborhoods in the city of Edmonton to identify which areas are underserved when it comes to child care. The results can offer insight to anyone interested in starting a child care business in a given neighborhood. They can also help the city council identify areas where more investment in child care is needed.

## Table of Contents

1. <a href="#item1">Dataset description and retrieval</a>
2. <a href="#item2">Explore Edmonton Neighborhoods Childcare Venues</a>  
3. <a href="#item3">Rate Neighborhoods based on the children population</a>  
4. <a href="#item4">Conclusion</a>  

### Import Libraries

In [83]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
#!conda install -c conda-forge geopy --yes
#!conda install -c conda-forge folium=0.5.0 --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

print('Libraries imported.')

Libraries imported.


### Data Retrieval and Processing

In [2]:
# Read Edmonton neighborhood spatial data: neighborhood name, longitude, latitude
edm_lon_lat_neigh = pd.read_csv("Edmonton_Neighbourhoods.csv")
# Average the coordinates per neighborhood; selecting the two numeric columns
# first keeps the mean from failing on any non-numeric columns in the file
edm_lon_lat_neigh = (edm_lon_lat_neigh
                     .groupby('NEIGHBORHOOD_NAME')[['LONGITUDE', 'LATITUDE']]
                     .mean()
                     .reset_index())
edm_lon_lat_neigh.head(5)

Unnamed: 0,NEIGHBORHOOD_NAME,LONGITUDE,LATITUDE
0,ABBOTTSFIELD,-113.390342,53.574269
1,ALBANY,-113.552974,53.632235
2,ALBERTA AVENUE,-113.48594,53.568262
3,ALBERTA PARK INDUSTRIAL,-113.598152,53.566517
4,ALDERGROVE,-113.641484,53.518399


In [3]:
#Read Edmonton Neighborhoods Census/Population data --- Neighborhood, Population AgeGroups(0 to 4), (5 to 9), (10 to 14) 
edm_popul = pd.read_csv("Edmonton_Population_by_Age_Neighbourhood_Ward.csv")
edm_popul = edm_popul[['Neighbourhood Name', '0 - 4','5 - 9', '10 - 14']]

#Rename Columns
edm_popul.columns=['NEIGHBORHOOD_NAME','AGE_0_4', 'AGE_5_9', 'AGE_10_14']
edm_popul.sort_values(by='NEIGHBORHOOD_NAME', ascending=True, inplace=True)

#Drop neighborhoods with 0 population
edm_popul.drop(edm_popul.loc[(edm_popul['AGE_0_4']==0) & (edm_popul['AGE_5_9']==0) & (edm_popul['AGE_10_14']==0)].index, inplace=True)

edm_popul.reset_index(drop=True, inplace=True)
edm_popul.head(5)

Unnamed: 0,NEIGHBORHOOD_NAME,AGE_0_4,AGE_5_9,AGE_10_14
0,ABBOTTSFIELD,184,178,136
1,ALBANY,101,54,44
2,ALBERTA AVENUE,256,251,183
3,ALDERGROVE,269,253,192
4,ALLARD,213,148,97


#### Prepare Edmonton Full Dataset - Neighborhood, Long, Lat, AgeGroups

In [4]:
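# Join the two datasets above side by side.
# Note: pd.concat with axis=1 aligns rows by index position, not by NEIGHBORHOOD_NAME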
edm_dataset = pd.concat([edm_lon_lat_neigh, edm_popul], axis=1, join='inner')
edm_dataset = edm_dataset.loc[:,~edm_dataset.columns.duplicated()]
edm_dataset.head(5)

Unnamed: 0,NEIGHBORHOOD_NAME,LONGITUDE,LATITUDE,AGE_0_4,AGE_5_9,AGE_10_14
0,ABBOTTSFIELD,-113.390342,53.574269,184,178,136
1,ALBANY,-113.552974,53.632235,101,54,44
2,ALBERTA AVENUE,-113.48594,53.568262,256,251,183
3,ALBERTA PARK INDUSTRIAL,-113.598152,53.566517,269,253,192
4,ALDERGROVE,-113.641484,53.518399,213,148,97
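
One thing worth checking in the joined table above: `pd.concat(..., axis=1)` pairs rows by index position rather than by neighborhood name, so row 3 shows ALBERTA PARK INDUSTRIAL carrying the counts (269, 253, 192) that the population table lists for ALDERGROVE. A minimal sketch of a key-based join, assuming the names match exactly in both files, uses `pd.merge` on `NEIGHBORHOOD_NAME`. The small frames below are illustrative stand-ins built from rows in the outputs above, not the full datasets.

```python
import pandas as pd

# Illustrative stand-ins: a few rows copied from the two tables above
coords = pd.DataFrame({
    'NEIGHBORHOOD_NAME': ['ABBOTTSFIELD', 'ALBANY', 'ALBERTA AVENUE',
                          'ALBERTA PARK INDUSTRIAL', 'ALDERGROVE'],
    'LONGITUDE': [-113.390342, -113.552974, -113.48594, -113.598152, -113.641484],
    'LATITUDE': [53.574269, 53.632235, 53.568262, 53.566517, 53.518399],
})
popul = pd.DataFrame({
    'NEIGHBORHOOD_NAME': ['ABBOTTSFIELD', 'ALBANY', 'ALBERTA AVENUE',
                          'ALDERGROVE', 'ALLARD'],
    'AGE_0_4': [184, 101, 256, 269, 213],
    'AGE_5_9': [178, 54, 251, 253, 148],
    'AGE_10_14': [136, 44, 183, 192, 97],
})

# Inner join on the neighborhood name: only names present in both tables are kept
merged = pd.merge(coords, popul, on='NEIGHBORHOOD_NAME', how='inner')
print(merged)
```

With an inner join on the name, ALBERTA PARK INDUSTRIAL (no population row) and ALLARD (no coordinates in this sample) drop out, and ALDERGROVE keeps its own counts.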


## Create a map of Edmonton Neighborhoods with Population Age Category (0 to 4)

In [5]:
address = 'Edmonton, AB'

geolocator = Nominatim(user_agent="edm_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Edmonton, AB are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Edmonton, AB are 53.535411, -113.507996.


In [81]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library

# create map of Edmonton using latitude and longitude values
map_edm = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, popul_0_4, neighborhood in zip(edm_dataset['LATITUDE'], edm_dataset['LONGITUDE'], edm_dataset['AGE_0_4'], edm_dataset['NEIGHBORHOOD_NAME']):
    label = '{}, {}'.format(neighborhood, popul_0_4)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=popul_0_4/20,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_edm)
map_edm

## After Analyzing Neighborhood/Population data, let's analyze Neighborhood/Venues data

#### Define a function that gets venue data for each neighbourhood

In [7]:
import requests # library to handle requests
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    #nearby_venues.head(5)
    return(nearby_venues)
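
The function above indexes `categories[0]` for every venue, which would raise an `IndexError` if a venue came back with an empty category list. Below is a sketch of a more defensive flattening step, assuming the response items have the same shape the function above reads. The helper `flatten_items` and the sample `items` list are hypothetical and exist only for illustration.

```python
def flatten_items(name, lat, lng, items):
    """Turn one neighborhood's list of venue items into flat row tuples."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category
        category = categories[0].get('name') if categories else None
        rows.append((name, lat, lng,
                     venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     category))
    return rows

# Hypothetical sample shaped like the items the function above iterates over
items = [
    {'venue': {'name': 'Example School',
               'location': {'lat': 53.57, 'lng': -113.39},
               'categories': [{'name': 'School'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 53.58, 'lng': -113.40},
               'categories': []}},
]
rows = flatten_items('ABBOTTSFIELD', 53.574269, -113.390342, items)
print(rows)
```

Each tuple has the same seven fields, in the same order, as the columns assigned to `nearby_venues`, so the rows could be passed to `pd.DataFrame` in the same way.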

In [9]:
#Initialize Parameters for Foursquare
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (do not publish real credentials)
VERSION = '20180605' # Foursquare API version
LIMIT = 100
radius = 500

#Get the Venues data for Neighbourhoods in Edmonton
edm_venues = getNearbyVenues(names=edm_dataset['NEIGHBORHOOD_NAME'],
                                   latitudes=edm_dataset['LATITUDE'],
                                   longitudes=edm_dataset['LONGITUDE']
                                  )

In [10]:
edm_neighbourhood_data = pd.get_dummies(edm_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
edm_neighbourhood_data['NEIGHBORHOOD_NAME'] = edm_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [edm_neighbourhood_data.columns[-1]] + list(edm_neighbourhood_data.columns[:-1])
edm_neighbourhood_data = edm_neighbourhood_data[fixed_columns]


### Group by Neighborhood & Focus on Schools/Childcare

### Please note that Foursquare does not have any daycare data for Edmonton. It only has two data points in the School category

In [13]:
edm_neighbourhood_data = edm_neighbourhood_data.groupby('NEIGHBORHOOD_NAME').sum().reset_index()
edm_neighbourhood_data = edm_neighbourhood_data [['NEIGHBORHOOD_NAME', 'School']]
edm_neighbourhood_data.head()

Unnamed: 0,NEIGHBORHOOD_NAME,School
0,ABBOTTSFIELD,0
1,ALBANY,0
2,ALBERTA AVENUE,0
3,ALBERTA PARK INDUSTRIAL,0
4,ALDERGROVE,0


### Prepare the Neighborhoods data with a "SCHOOL CATEGORY SCORE"
The school category score reflects how many school venues Foursquare lists in each neighborhood. The initial intent was to use a "Childcare" venue category, but the Foursquare API did not return any such venues for Edmonton, AB.
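
The score is a per-neighborhood count of venues whose category is "School". Below is a minimal sketch of that count on a toy venues table. The rows are made up for illustration, and the column names follow those produced by `getNearbyVenues`. It is written so that a neighborhood with no school venue gets a score of 0 instead of depending on a `School` dummy column existing.

```python
import pandas as pd

# Toy venues table; rows are illustrative, not real Foursquare results
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'C'],
    'Venue Category': ['School', 'Cafe', 'Park', 'School'],
})

# Flag school venues, then sum the flags per neighborhood
is_school = venues['Venue Category'].eq('School')
school_score = (is_school.groupby(venues['Neighborhood']).sum()
                .astype(int)
                .rename('SCHOOL_CAT_SCORE')
                .reset_index())
print(school_score)
```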

In [16]:
#Get the Neighborhoods data with Long/Lat
# Note: pd.concat with axis=1 pairs rows by index position, not by NEIGHBORHOOD_NAME
edm_neighbourhood_data = pd.concat([edm_lon_lat_neigh, edm_neighbourhood_data], axis=1, join='inner')
edm_neighbourhood_data = edm_neighbourhood_data.loc[:,~edm_neighbourhood_data.columns.duplicated()]
edm_neighbourhood_data.columns=['NEIGHBORHOOD_NAME', 'LONGITUDE', 'LATITUDE', 'SCHOOL_CAT_SCORE']
edm_neighbourhood_data.head(5)

Unnamed: 0,NEIGHBORHOOD_NAME,LONGITUDE,LATITUDE,SCHOOL_CAT_SCORE
0,ABBOTTSFIELD,-113.390342,53.574269,0
1,ALBANY,-113.552974,53.632235,0
2,ALBERTA AVENUE,-113.48594,53.568262,0
3,ALBERTA PARK INDUSTRIAL,-113.598152,53.566517,0
4,ALDERGROVE,-113.641484,53.518399,0


## Now we will plot the Neighborhood data with the school score

In [15]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

# create map of Edmonton using latitude and longitude values
map_edm_school_score = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, school_score, neighborhood in zip(edm_neighbourhood_data['LATITUDE'], edm_neighbourhood_data['LONGITUDE'], edm_neighbourhood_data['SCHOOL_CAT_SCORE'], edm_neighbourhood_data['NEIGHBORHOOD_NAME']):
    label = '{}, {}'.format(neighborhood, school_score)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=school_score*10,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_edm_school_score)
map_edm_school_score

## Looking for alternative dataset

### The Government of Alberta offers a dataset, but it is incomplete. It is used here only to demonstrate the concept of the analysis

In [65]:
#Read Edmonton Child Care data --- Focus on Neighborhood, Count of Daycares/Dayhome
edm_daycares = pd.read_csv("Daycare_Edmonton_Dataset_Incomplete.csv")
edm_daycares = edm_daycares[['Neighbourhood', 'Program City', 'Capacity']]
grouper = edm_daycares.groupby('Neighbourhood')
res = grouper.count()
res['Capacity'] = grouper.Capacity.sum()
edm_daycares = res.reset_index()

#edm_daycares = edm_daycares.groupby('Neighbourhood').count().reset_index()
edm_daycares.rename(columns={'Neighbourhood': 'NEIGHBORHOOD_NAME', 'Program City': 'DAYCARE_COUNT', 'Capacity': 'DAYCARE_CAPACITY'}, inplace=True)
edm_daycares.head(5)

Unnamed: 0,NEIGHBORHOOD_NAME,DAYCARE_COUNT,DAYCARE_CAPACITY
0,ALBERTA AVENUE,8,432
1,AVENUE OF NATIONS,1,45
2,AVONMORE,1,56
3,BELMEAD,2,111
4,BELVEDERE,3,200


### Prepare the Neighbourhoods Daycares Data: Daycares Count, Daycares Capacity per Neighbourhood

In [67]:
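#Get the Neighborhoods data with Long/Lat
# Note: pd.concat with axis=1 pairs rows by index position, not by NEIGHBORHOOD_NAME,
# so daycare counts can land on a different neighborhood than the one they belong to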
edm_daycares_spatial_data = pd.concat([edm_dataset, edm_daycares], axis=1, join='inner')
edm_daycares_spatial_data = edm_daycares_spatial_data.loc[:,~edm_daycares_spatial_data.columns.duplicated()]
edm_daycares_spatial_data.columns=['NEIGHBORHOOD_NAME', 'LONGITUDE', 'LATITUDE', 'AGE_0_4', 'AGE_5_9', 'AGE_10_14', 'DAYCARE_COUNT', 'DAYCARE_CAPACITY']
edm_daycares_spatial_data.head(5)

Unnamed: 0,NEIGHBORHOOD_NAME,LONGITUDE,LATITUDE,AGE_0_4,AGE_5_9,AGE_10_14,DAYCARE_COUNT,DAYCARE_CAPACITY
0,ABBOTTSFIELD,-113.390342,53.574269,184,178,136,8,432
1,ALBANY,-113.552974,53.632235,101,54,44,1,45
2,ALBERTA AVENUE,-113.48594,53.568262,256,251,183,1,56
3,ALBERTA PARK INDUSTRIAL,-113.598152,53.566517,269,253,192,2,111
4,ALDERGROVE,-113.641484,53.518399,213,148,97,3,200
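
To move from counts toward identifying underserved areas, one option is to relate daycare capacity to the number of young children in each neighborhood. The sketch below is one possible rating, not the method used in this notebook. It joins on `NEIGHBORHOOD_NAME` with a left join so that neighborhoods with no listed daycare keep a capacity of 0, then computes children aged 0 to 4 per daycare space. The column name `CHILDREN_PER_SPACE` is hypothetical, and the rows are copied from the outputs above.

```python
import numpy as np
import pandas as pd

# A few rows copied from the population and daycare tables above
children = pd.DataFrame({
    'NEIGHBORHOOD_NAME': ['ABBOTTSFIELD', 'ALBANY', 'ALBERTA AVENUE'],
    'AGE_0_4': [184, 101, 256],
})
daycares = pd.DataFrame({
    'NEIGHBORHOOD_NAME': ['ALBERTA AVENUE', 'AVONMORE'],
    'DAYCARE_COUNT': [8, 1],
    'DAYCARE_CAPACITY': [432, 56],
})

# Left join keeps every neighborhood with children, even with no listed daycare
rated = children.merge(daycares, on='NEIGHBORHOOD_NAME', how='left')
rated[['DAYCARE_COUNT', 'DAYCARE_CAPACITY']] = (
    rated[['DAYCARE_COUNT', 'DAYCARE_CAPACITY']].fillna(0).astype(int))

# Children aged 0-4 per daycare space; infinite where there is no capacity
capacity = rated['DAYCARE_CAPACITY'].replace(0, np.nan)
rated['CHILDREN_PER_SPACE'] = (rated['AGE_0_4'] / capacity).fillna(np.inf)

print(rated.sort_values('CHILDREN_PER_SPACE', ascending=False))
```

Because the daycare dataset is incomplete, a capacity of 0 may reflect missing data rather than a true absence of daycares, so the ranking should be read with that caveat.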


## Visualize the Neighbourhoods with the Daycares

In [80]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

# create map of Edmonton using latitude and longitude values
map_edm_daycares = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, daycare_count, daycare_capacity, child_popul, neighborhood in zip(edm_daycares_spatial_data['LATITUDE'], edm_daycares_spatial_data['LONGITUDE'], edm_daycares_spatial_data['DAYCARE_COUNT'], edm_daycares_spatial_data['DAYCARE_CAPACITY'], edm_daycares_spatial_data['AGE_0_4'], edm_daycares_spatial_data['NEIGHBORHOOD_NAME']):
    label = '{}, {}, {}'.format(neighborhood, daycare_count, daycare_capacity)
    label = folium.Popup(label, parse_html=True)
    # Marker for Daycares capacity in each neighbourhood
    folium.CircleMarker(
        [lat, lng],
        radius=daycare_capacity/20,
        popup=label,
        color='red',
        fill=True,
        fill_color='#FB3A10',
        fill_opacity=0.2,
        parse_html=False).add_to(map_edm_daycares)
map_edm_daycares