<h1>Capstone Project - The Battle of the Neighborhoods</h1>
<h3>Applied Data Science Capstone by IBM/Coursera</h3>

## Table of Contents
* [Introduction/Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

<a name="introduction"></a>
<h2>Introduction/Business Problem</h2>
<p>Most parents want to live in the best environment possible for their kids. As a parent myself, I want my kids to grow up in the most kid-friendly environment possible. When choosing a city to live in, it is important to know how kid-friendly that city is. In this project, we will try to determine the most kid-friendly cities in the state of Oregon by evaluating the kid-friendly venues around each major city.</p>
<p>We will review the list of categories a venue can have and decide which of them are kid-friendly. We will then use those categories to find the matching venues around each city in a set of cities, and use the number of venues found to score how kid-friendly each city is.</p>
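<p>The scoring idea can be sketched as follows. This is only an illustration under stated assumptions: the <code>KID_FRIENDLY</code> set, the venue records, and the <code>score_cities</code> helper are all hypothetical, and the real category list and venues would come from the Foursquare data described in the next section. A venue counts toward a city's score if at least one of its categories is in the kid-friendly set.</p>

```python
from collections import Counter

# Hypothetical kid-friendly category names; the real list would come from
# reviewing the Foursquare category data.
KID_FRIENDLY = {"Arcade", "Aquarium", "Bowling Alley"}

# Made-up venue records for illustration: (city, venue name, categories).
venues = [
    ("Clackamas", "Venue A", ["Arcade"]),
    ("Clackamas", "Venue B", ["Art Gallery"]),
    ("Wilsonville", "Venue C", ["Bowling Alley", "Arcade"]),
    ("Wilsonville", "Venue D", ["Aquarium"]),
]


def score_cities(venues, kid_friendly):
    """Count, per city, the venues with at least one kid-friendly category."""
    scores = Counter()
    for city, _name, categories in venues:
        # A venue is counted once, no matter how many of its categories match.
        if kid_friendly.intersection(categories):
            scores[city] += 1
    return scores


scores = score_cities(venues, KID_FRIENDLY)
for city, score in scores.most_common():
    print(city, score)
```

<p>Counting each venue once, rather than once per matching category, keeps a venue with several kid-friendly labels from inflating a city's score.</p>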

<a name="data"></a>
<h2>Data</h2>
<p>Based on our problem, we will need the following collections of data for our analysis:</p>
<ul>
    <li>A dataset of cities in Oregon with their lat/long as a reference to get a list of kid-friendly venues.</li>
    <li>A dataset of categories that we can use to determine if a category is kid-friendly.</li>
    <li>A dataset of venues around a city with a kid-friendly category.</li>
</ul>

<p>We will be making a few assumptions with our data:</p>
<ul>
    <li>We will look for kid-friendly venues in a circular area around the center point of each city. This means we may not cover the entire city. It also means that if two small cities are near each other, their venues may overlap.</li>
    <li>The dataset we use may not assign certain categories to a venue, so some kid-friendly venues might be excluded. For example, suppose 'Chuck-E-Cheese' has only the category 'restaurant'; if the restaurant category is not labelled as kid-friendly, the venue will not be included. However, if it also has the category 'arcade', which we will mark as kid-friendly, then it will be included.</li>
</ul>
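<p>The overlap mentioned in the first assumption can be checked numerically. The sketch below uses the haversine formula to estimate the distance between two center points and tests whether two search circles of the same radius intersect. The coordinates are the Clackamas and Wilsonville rows from the zip code sample shown later in this report; the radius values and the helper names <code>haversine_km</code> and <code>circles_overlap</code> are assumptions made for illustration, not values chosen by this project.</p>

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def circles_overlap(center_a, center_b, radius_km):
    """True if two search circles of equal radius share any area."""
    return haversine_km(*center_a, *center_b) < 2 * radius_km


# Center points taken from the zip code sample rows.
clackamas = (45.416785, -122.52859)
wilsonville = (45.308105, -122.77266)

distance = haversine_km(*clackamas, *wilsonville)
print(f"distance: {distance:.1f} km")
# Hypothetical search radii, for illustration only.
for radius_km in (10, 15):
    print(radius_km, circles_overlap(clackamas, wilsonville, radius_km))
```

<p>Two circles of equal radius overlap whenever their centers are closer than twice that radius, so the choice of search radius directly controls how many venues are shared between neighboring cities.</p>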

<p>The following data sources will be needed to extract/generate the required information:</p>
<ul>
    <li>Venues will be obtained using the <b>Foursquare API</b>.</li>
    <li>Venue categories will be obtained using the <b>Foursquare API</b>.</li>
    <li>Lat/Long of Cities in Oregon will be obtained from <a href='https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/export/?refine.state=OR' target='_blank'>https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/export/?refine.state=OR</a>.</li>
</ul>
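<p>A venue lookup around one city might be assembled roughly as below. This is a sketch, not the request this project actually sends: the helper <code>build_venue_search_params</code> is hypothetical, the radius is an arbitrary example, and the parameter names (<code>ll</code>, <code>radius</code>, <code>categoryId</code>, <code>intent</code>, <code>limit</code>) reflect my understanding of the Foursquare v2 <code>venues/search</code> endpoint and should be checked against the Foursquare documentation. The two category IDs are the Arcade and Bowling Alley IDs from the category sample in this report.</p>

```python
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_venue_search_params(lat, lon, category_ids, radius_m,
                              client_id, client_secret):
    """Build query parameters for a venue search around one point."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": "20180323",                       # API version date
        "ll": f"{lat},{lon}",                  # center point of the search
        "radius": radius_m,                    # search radius in meters
        "categoryId": ",".join(category_ids),  # comma-separated category IDs
        "intent": "browse",                    # search within the given area
        "limit": 50,
    }


params = build_venue_search_params(
    45.416785, -122.52859,                     # Clackamas sample row
    ["4bf58dd8d48988d1e1931735",               # Arcade
     "4bf58dd8d48988d1e4931735"],              # Bowling Alley
    radius_m=5000,                             # hypothetical radius
    client_id="YOUR_CLIENT_ID",                # placeholder credentials
    client_secret="YOUR_CLIENT_SECRET",
)
print(params["ll"], params["categoryId"])

# To send the request (requires network access and valid credentials):
# import requests
# resp = requests.get(SEARCH_URL, params=params)
```

<p>Keeping the parameter construction separate from the network call makes it easy to reuse the same function for every city in the dataset.</p>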


<h3>Samples of Data</h3>

<h4>Latitude and Longitude of Cities in Oregon</h4>

In [23]:
import numpy as np
import types
import pandas as pd
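# pandas >= 1.0 exposes json_normalize at the top level;
# older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize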
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

In [24]:
# The code was removed by Watson Studio for sharing.

In [25]:
body = client.get_object(Bucket='courseraibmapplieddatasciencecaps-donotdelete-pr-1plbhmv1f3ta9w',Key='us-zip-code-latitude-and-longitude.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

zipcode_df = pd.read_csv(body, delimiter=';')
zipcode_df.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,97001,Antelope,OR,44.904051,-120.67244,-8,1,"44.904051, -120.67244"
1,97015,Clackamas,OR,45.416785,-122.52859,-8,1,"45.416785, -122.52859"
2,97070,Wilsonville,OR,45.308105,-122.77266,-8,1,"45.308105, -122.77266"
3,97110,Cannon Beach,OR,45.894287,-123.961,-8,1,"45.894287, -123.961"
4,97112,Cloverdale,OR,45.257176,-123.89141,-8,1,"45.257176, -123.89141"


<h4>Venue Categories</h4>

In [26]:
import json, requests
url = 'https://api.foursquare.com/v2/venues/categories'

params = dict(
  # Foursquare credentials redacted; supply your own values here.
  client_id='YOUR_CLIENT_ID',
  client_secret='YOUR_CLIENT_SECRET',
  v='20180323',  # API version date
)
resp = requests.get(url=url, params=params)
resp.raise_for_status()  # fail early on an HTTP error
data = json.loads(resp.text)

In [27]:
# Flatten the subcategories of the first top-level category into a DataFrame.
category_df = json_normalize(data['response']['categories'][0]['categories'])
category_df.head()

Unnamed: 0,categories,icon.prefix,icon.suffix,id,name,pluralName,shortName
0,[],https://ss3.4sqi.net/img/categories_v2/arts_en...,.png,56aa371be4b08b9a8d5734db,Amphitheater,Amphitheaters,Amphitheater
1,[],https://ss3.4sqi.net/img/categories_v2/arts_en...,.png,4fceea171983d5d06c3e9823,Aquarium,Aquariums,Aquarium
2,[],https://ss3.4sqi.net/img/categories_v2/arts_en...,.png,4bf58dd8d48988d1e1931735,Arcade,Arcades,Arcade
3,[],https://ss3.4sqi.net/img/categories_v2/arts_en...,.png,4bf58dd8d48988d1e2931735,Art Gallery,Art Galleries,Art Gallery
4,[],https://ss3.4sqi.net/img/categories_v2/arts_en...,.png,4bf58dd8d48988d1e4931735,Bowling Alley,Bowling Alleys,Bowling Alley
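
<p>The sample above covers only the subcategories of the first top-level category, and each row carries its own nested <code>categories</code> list. One way to review every category for kid-friendliness is to walk the whole tree recursively. The sketch below assumes each node has <code>id</code>, <code>name</code>, and <code>categories</code> keys, as in the output above; the <code>flatten_categories</code> helper and the top-level node in <code>sample_tree</code> are hypothetical stand-ins, not actual API output.</p>

```python
def flatten_categories(categories, parent=None):
    """Return one row per category at any depth, recording its parent name."""
    rows = []
    for cat in categories:
        rows.append({"id": cat["id"], "name": cat["name"], "parent": parent})
        # Recurse into nested subcategories, if any.
        rows.extend(flatten_categories(cat.get("categories", []),
                                       parent=cat["name"]))
    return rows


# Small stand-in tree; the top-level node is a placeholder, while the two
# children reuse the Arcade and Aquarium rows from the sample output.
sample_tree = [
    {"id": "placeholder-top-level-id", "name": "Top-level category",
     "categories": [
         {"id": "4bf58dd8d48988d1e1931735", "name": "Arcade",
          "categories": []},
         {"id": "4fceea171983d5d06c3e9823", "name": "Aquarium",
          "categories": []},
     ]},
]

flat = flatten_categories(sample_tree)
for row in flat:
    print(row["name"], "<-", row["parent"])
```

<p>With the real response, the call would be <code>flatten_categories(data['response']['categories'])</code>, and the resulting rows could be passed to <code>pd.DataFrame</code> for review.</p>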


<h3>Data Cleanup</h3>
<p>TBD</p>

<a name="methodology"></a>
<h2>Methodology</h2>
<p>Methodology section, the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and what machine learning techniques were used and why.</p>

<a name="analysis"></a>
<h2>Analysis</h2>
<p>Analysis of the data based on the methodology.</p>


<a name="results"></a>
<h2>Results and Discussion</h2>
<p>Results section where you discuss the results.</p>
<p>Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.</p>

<a name="conclusion"></a>
<h2>Conclusion</h2>
<p>Conclusion section where you conclude the report.</p>