# Early Childhood Care Selection in the Greater Toronto Area

## Contents
* [Introduction](#introduction)
* [Business problem](#businessproblem)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction
This project was motivated by my personal struggles as a young working mother looking for child care options for my one-year-old in a pre-pandemic world.

The purpose of the project is to help young families moving into York Region, Ontario explore early learning and child care options and make better-informed choices. It aims to build a comprehensive analysis of child care features for families coming to the region from within or outside Canada, so that they can make smart and efficient decisions about what is best for them and their young family members.

## Business problem
As mentioned above, I chose this subject because of my personal experience. My husband and I decided to move to Markham, Ontario from another city in the province while I was still pregnant, so that we would have time before the baby arrived to find the necessary facilities, services and amenities and to get to know our surroundings. As working parents with no grandparents to pitch in, we knew we needed a child care plan in place by the time my maternity leave ended. We found it extremely difficult to find and choose an option, all the more so because the waiting lists are very long.

While York Region, of which the city of Markham is part, has a very comprehensive website and resource list, the information is static and time-consuming to navigate, and we had to do quite extensive analysis to compile a list of child care options for our child in a pre-pandemic world. It was very hard to find information on daycare and other child care options in the city in a form that makes the decision process easy, efficient and results-driven.

The main aim of my analysis is to make the decision-making process easier and less stressful by providing a comprehensive, easy-to-follow list of child care options in Markham, Ontario that can be used by families already living in the city or just moving in. The end result will need to include details on learning and care options, ratings, locations, reviews, programs and fees, and other amenities and dependencies.

## Data

Based on the business problem, this project will use data from various sources in order to achieve its desired result. The main data sources will be:

* Wikipedia list of postal codes in Canada, in order to identify cities and neighbourhoods in the Greater Toronto Area: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
* Geographical coordinates from http://cocl.us/Geospatial_data in order to find specific location of each city and neighbourhood
* Foursquare API for data on child care locations and user reviews
* City data on child care options : https://insights-york.opendata.arcgis.com/datasets/childrens-service/data 
* Provincial data on child care options in the province of Ontario : https://data.ontario.ca/dataset/licensed-child-care-facilities-in-ontario

Secondary data sets will be used for analysis purposes or partial scraping where other data is not available. For example:
* National data on child care options: https://www150.statcan.gc.ca/n1/pub/11f0019m/11f0019m2006284-eng.pdf


## Getting Data

In [26]:
# First we import all necessary dependencies for the project
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

!pip install folium
import folium # map rendering library
import random
!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.display import HTML

from bs4 import BeautifulSoup
import csv

print('Libraries imported.')

Libraries imported.


In [30]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)

#Process and convert html data
data = response.text
soup = BeautifulSoup(data,'html.parser')
table_wikipedia=soup.find('table')

#Create the pandas dataframe
gtaraw_df = pd.read_html(str(table_wikipedia))[0]
gtaraw_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [31]:
# drop(0) removes the row labelled 0 (M1A, 'Not assigned'); the frame itself has only three columns
gtaraw_df.drop(0,inplace=True)
#Rename the columns names
gtaraw_df.columns = ['PostalCode','Borough','Neighborhood']
gtaraw_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
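
Because `DataFrame.drop` targets row labels by default, it is easy to mix up dropping a row with dropping a column. The following is a minimal sketch on made-up data (the `toy` frame and its values are illustrative assumptions, not the scraped table) showing the two forms side by side:

```python
import pandas as pd

# Toy frame standing in for the scraped table (values are illustrative only)
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A"],
    "Borough": ["Not assigned", "Not assigned", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods"],
})

# drop(0) uses axis=0 by default: it removes the ROW whose index label is 0
without_row = toy.drop(0)

# To remove a COLUMN instead, name it through the `columns` keyword
without_column = toy.drop(columns="Neighbourhood")

print(without_row.shape)     # two rows, three columns
print(without_column.shape)  # three rows, two columns
```

Using the `columns=` keyword makes the intent explicit and avoids removing data rows by accident.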


In [32]:
# Let's remove all 'Not assigned' values in Borough as they are not useful
gtaraw_df2=gtaraw_df[gtaraw_df['Borough'].str.contains("Not assigned") == False].reset_index()
gtaraw_df2.head()

Unnamed: 0,index,PostalCode,Borough,Neighborhood
0,2,M3A,North York,Parkwoods
1,3,M4A,North York,Victoria Village
2,4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,5,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
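
The extra `index` column in the output above comes from `reset_index()`, which keeps the old row labels as a column by default. As a hedged alternative, the same filter could be written with an exact comparison and `reset_index(drop=True)`. The sketch below uses a small made-up frame (`toy` is an assumption for illustration):

```python
import pandas as pd

# Illustrative data only; the real frame comes from the Wikipedia table
toy = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep rows whose Borough is assigned, then renumber the rows from 0
# drop=True discards the old labels instead of adding an 'index' column
assigned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)

print(assigned)
```

This keeps the frame at three columns, so no follow-up cleanup of an `index` column is needed.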


In [33]:
# To avoid redundancy, wherever several neighbourhoods share the same PostalCode and Borough, I'll merge them into one row
# Note: drop(0) removes the row labelled 0 (M3A, Parkwoods), not a column; the leftover 'index' column is left out by the groupby below
gtaraw_df2.drop(0,inplace=True)
gtaraw_df3= gtaraw_df2.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()
gtaraw_df3.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
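
To make the grouping step easier to follow, here is a minimal sketch on made-up rows in which one postal code appears twice (the `toy` frame is an assumption for illustration; in the scraped table most codes already sit on a single row):

```python
import pandas as pd

# Illustrative data: M1B is split across two rows on purpose
toy = pd.DataFrame({
    "PostalCode": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Malvern", "Rouge", "Woburn"],
})

# Group rows sharing PostalCode and Borough, then join their
# neighbourhood names into one comma-separated string
merged = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

print(merged)
```

Only the `Neighborhood` column is selected after the `groupby`, so any other column in the frame is not carried into the result.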


In [37]:
# Let's remove all 'Not assigned' values in Borough as they are not useful
gtaraw_df3=gtaraw_df3[gtaraw_df3['Borough'].str.contains("Not assigned") == False].reset_index()
gtaraw_df3.head()

Unnamed: 0,index,PostalCode,Borough,Neighborhood
0,0,M1B,Scarborough,"Malvern, Rouge"
1,1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,3,M1G,Scarborough,Woburn
4,4,M1H,Scarborough,Cedarbrae


In [38]:
gtaraw_df3.drop(gtaraw_df3.columns[0], axis=1, inplace=True)
gtaraw_df3.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [39]:
gtaraw_df3.shape

(102, 3)

In [40]:
gtaraw_df3.to_csv('GTA_RAW.csv')
print('First data set from Wikipedia saved')

First data set from Wikipedia saved


### Getting location data

In [41]:
# I'll get the data to create our data frame by leveraging the link provided in the instructions
geolocation_df=pd.read_csv('http://cocl.us/Geospatial_data')
geolocation_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [42]:
# I'll adjust the Postal Code name to match the name from our data frame from step 1
geolocation_df = geolocation_df.rename(columns = {'Postal Code':'PostalCode'}) 
geolocation_df.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [43]:
# In order to merge the 2 datasets I'll load the first one
gtaraw_df3 = pd.read_csv('GTA_RAW.csv')
gtaraw_df3.head()

Unnamed: 0.1,Unnamed: 0,PostalCode,Borough,Neighborhood
0,0,M1B,Scarborough,"Malvern, Rouge"
1,1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,3,M1G,Scarborough,Woburn
4,4,M1H,Scarborough,Cedarbrae


In [44]:
gtaraw_df3.drop(gtaraw_df3.columns[0], axis=1, inplace=True)
gtaraw_df3.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
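
The `Unnamed: 0` column that had to be dropped after reloading appears because `to_csv` writes the row index by default. A small sketch of the round trip, using an in-memory buffer instead of a file (the `toy` frame and buffer names are assumptions for illustration):

```python
import io

import pandas as pd

toy = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out, so the reload matches the original columns
no_index = io.StringIO()
toy.to_csv(no_index, index=False)
no_index.seek(0)
reloaded_clean = pd.read_csv(no_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```

Saving with `index=False`, as is done for `GTA_RAW_GEO.csv` later on, avoids the extra drop step after reading the file back.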


In [45]:
# Now we merge the 2 datasets by PostalCode
gtaraw_df3 = pd.merge(gtaraw_df3, geolocation_df, on = "PostalCode")
gtaraw_df3.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
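
`pd.merge` performs an inner join by default, so a postal code missing from either table would silently disappear from the result. One way to check for that, sketched here on made-up data (the frames and the code `M9Z` are hypothetical), is a left join with `indicator=True`:

```python
import pandas as pd

# Illustrative frames; 'M9Z' is a made-up code with no coordinates
boroughs = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
coords = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# how="left" keeps every borough row; indicator=True adds a '_merge' column
# validate="one_to_one" raises if a postal code is duplicated on either side
checked = pd.merge(
    boroughs, coords,
    on="PostalCode", how="left",
    indicator=True, validate="one_to_one",
)

# Postal codes that found no coordinates
missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

If `missing` comes back empty on the real data, the inner join used above loses nothing.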


In [46]:
#Save the new dataset
gtaraw_df3.to_csv('GTA_RAW_GEO.csv', index=False)
print('Data set with geolocation data is now saved!')

Data set with geolocation data is now saved!


### Getting data for York Region

In [55]:
# I'll get the provincial child care data related to licensed child care facilities
prov_df=pd.read_excel('https://data.ontario.ca/dataset/7efd8b4b-cc63-4337-a551-c940a346605b/resource/d2144297-fc60-4472-b954-e577d1f1a3fb/download/child_care_facilities_open_data_feb2020.xlsx')
prov_df.head()

Unnamed: 0,Report Snapshot Date,Licensee Name,Program Type Desc,Region Display Name,CMSM DSSAB Name,Child Care Center Name,Licence Number,Program Option,Original Issue Date,Licence Status,Closure or Termination Date,Language of Service Desc,Street Number,Street Name,Street Type,City,Province,Postal Code
0,2020-02-29,002350227 Ontario Inc. (Creative Minds Childre...,Child Care Centre,Southwest Region,City of Brantford,Creative Minds Children Services,6463,Full Day(6 hours or more in a day),2013-04-15,Inactive,NaT,English,61,Sherwood,Drive,Brantford,ON,N3T 1N7
1,2020-02-29,002350227 Ontario Inc. (Creative Minds Childre...,Child Care Centre,Southwest Region,City of Brantford,Creative Minds children services inc,56690,Full Day(6 hours or more in a day),2016-11-24,Active,NaT,English,5,wade,Avenue,"Brantford, On",ON,N3T 1W8
2,2020-02-29,002599266 Ontario INC.,Child Care Centre,Central East Region,Regional Municipality of Durham,Durham Montessori School and Daycare,57087,Full Day(6 hours or more in a day),2018-01-08,Active,NaT,English,200,Byron,Street,Whitby,ON,L1N 4P6
3,2020-02-29,002633409 Ontario Corporation,Child Care Centre,West Region,Regional Municipality of Halton,Western Heights Montessori Academy,57245,Full Day(6 hours or more in a day),2018-08-10,Active,NaT,"English, French",186,Morrison Rd,,Oakville,ON,L4J 4J4
4,2020-02-29,1.2.3. Look At Me Co-operative Nursery School ...,Child Care Centre,Southwest Region,City of Stratford,1.2.3. LOOK AT ME CO-OPERATIVE NURSERY SCHOOL,14085,Half day(Less than 6 hours in a day),1991-12-05,Active,NaT,English,465,Maitland,Avenue,Listowel,ON,N4W 2M7


I'll drop a few columns as they are of no use to what the project is meant to achieve.

In [56]:
# Dropping unnecessary columns (drop returns a new frame, so the result is assigned back)
prov_df = prov_df.drop(columns=['Report Snapshot Date', 'Licensee Name', 'Licence Number', 'Original Issue Date', 'Closure or Termination Date'])
prov_df.head()

Unnamed: 0,Program Type Desc,Region Display Name,CMSM DSSAB Name,Child Care Center Name,Program Option,Licence Status,Language of Service Desc,Street Number,Street Name,Street Type,City,Province,Postal Code
0,Child Care Centre,Southwest Region,City of Brantford,Creative Minds Children Services,Full Day(6 hours or more in a day),Inactive,English,61,Sherwood,Drive,Brantford,ON,N3T 1N7
1,Child Care Centre,Southwest Region,City of Brantford,Creative Minds children services inc,Full Day(6 hours or more in a day),Active,English,5,wade,Avenue,"Brantford, On",ON,N3T 1W8
2,Child Care Centre,Central East Region,Regional Municipality of Durham,Durham Montessori School and Daycare,Full Day(6 hours or more in a day),Active,English,200,Byron,Street,Whitby,ON,L1N 4P6
3,Child Care Centre,West Region,Regional Municipality of Halton,Western Heights Montessori Academy,Full Day(6 hours or more in a day),Active,"English, French",186,Morrison Rd,,Oakville,ON,L4J 4J4
4,Child Care Centre,Southwest Region,City of Stratford,1.2.3. LOOK AT ME CO-OPERATIVE NURSERY SCHOOL,Half day(Less than 6 hours in a day),Active,English,465,Maitland,Avenue,Listowel,ON,N4W 2M7
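
One detail worth keeping in mind when dropping columns: `DataFrame.drop` returns a new frame and leaves the original untouched unless the result is assigned back (or `inplace=True` is passed). A minimal sketch on a made-up frame (`toy` and its values are assumptions for illustration):

```python
import pandas as pd

# Illustrative stand-in for the provincial data
toy = pd.DataFrame({
    "Licence Number": [6463, 56690],
    "City": ["Brantford", "Brantford, On"],
    "Postal Code": ["N3T 1N7", "N3T 1W8"],
})

# The returned frame is discarded here, so `toy` still has all its columns
toy.drop(columns=["Licence Number"])
unchanged_columns = list(toy.columns)

# Assigning the result keeps the trimmed frame
trimmed = toy.drop(columns=["Licence Number"])

print(unchanged_columns)
print(list(trimmed.columns))
```

Checking `list(df.columns)` right after a drop is a quick way to confirm the columns are really gone.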
