# Coursera Capstone Project
## The Battle of Neighborhoods (Part 1)
#### Coursera Capstone - REPORT CONTENT
1. Introduction Section: Discussion of the business problem and the audience interested in this project.
2. Data Section: Description of the data used to solve the problem, and its sources.
3. Methodology Section: Discussion and description of the exploratory data analysis carried out, any inferential statistical testing performed, and any machine learning techniques used, along with the strategy and purpose behind them.
4. Results Section: Discussion of the results.
5. Discussion Section: Elaboration on any observations noted and any recommendations based on the results.
6. Conclusion Section: Conclusion of the report.


## 1. Introduction

```python
import time

import numpy as np  # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files
import requests  # library to handle HTTP requests
try:
    from pandas import json_normalize  # pandas >= 1.0: transform JSON into a DataFrame
except ImportError:
    from pandas.io.json import json_normalize  # location in older pandas versions

from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

# Uncomment the next line if folium is not installed (e.g. if you skipped the Foursquare API lab):
# !conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library

print('Libraries imported.')
```


### Scenario:
I am a beginner data scientist, and I currently live with my parents in the suburbs of Chicago. I have been offered an opportunity to interview and work for a company in Chicago, Illinois. I am very excited, and I want to use this opportunity to practice what I have learned on Coursera, both to get to know the area and to prepare for possible interview questions. The key question is: how can I find a convenient and enjoyable place in the city, similar to where I live now in the suburbs? I could certainly use real estate apps and Google, but the idea is to apply the tools from the program to compare and evaluate the rental options in Chicago, IL. The apartment in Chicago must meet the following requirements:

1. The apartment must have 2 or 3 bedrooms.
2. The location must be within a 1.0 mile (1.6 km) radius of a Metra station.
3. Rent must not exceed $3,000 per month.
4. Top amenities in the selected neighborhood should include cafes, entertainment centers, international restaurants, and grocery stores.
5. Other desirable nearby venues include cafes, restaurants, liquor stores, gyms, and grocery stores.
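The first three requirements can be checked mechanically once each listing has coordinates. The sketch below is a minimal illustration, not the method used later in this report: it assumes a hypothetical `Listing` record (address, bedrooms, monthly rent, latitude, longitude) and station coordinates given as `(lat, lng)` pairs, and it uses the haversine great-circle formula instead of geopy so that it runs with the standard library alone.

```python
import math
from typing import Iterable, NamedTuple, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


class Listing(NamedTuple):
    """Hypothetical rental listing; the field names are illustrative assumptions."""
    address: str
    bedrooms: int
    rent: float  # USD per month
    lat: float
    lng: float


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def meets_criteria(listing: Listing,
                   stations: Iterable[Tuple[float, float]],
                   max_rent: float = 3000.0,
                   max_km: float = 1.6) -> bool:
    """Requirements 1-3: 2 or 3 bedrooms, rent cap, and a station within max_km."""
    if listing.bedrooms not in (2, 3):
        return False
    if listing.rent > max_rent:
        return False
    return any(haversine_km(listing.lat, listing.lng, s_lat, s_lng) <= max_km
               for s_lat, s_lng in stations)


# Hypothetical example: one station and one listing roughly 0.16 km away
station = (41.8786, -87.6403)
apt = Listing("123 Example St", bedrooms=2, rent=2450.0, lat=41.8800, lng=-87.6400)
print(meets_criteria(apt, [station]))  # True
```

Requirements 4 and 5 (amenities) depend on venue data, so they are left to the Foursquare analysis rather than to this filter.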

### Problem:
The challenge is to find a suitable apartment for rent in Chicago, IL that meets the location, price, and venue requirements. The data required to address this challenge is described in Section 2 below.
### Audience:
This project provides information to, and helps, anyone planning to move to a big city like Chicago from a less populous place in the US or elsewhere in the world. It also serves as practice in building a solid data science toolbox.

## 2. Data
**Description of the data and its sources that will be used to solve the problem**

### Description:
The following data is required to address the problem:

1. List of neighborhoods in Chicago with their geodata (latitude and longitude).
2. List of Metra/CTA stations in Chicago with their addresses.
3. List of apartments for rent in the Chicago area with their addresses and prices.
4. Preferably, a list of apartments for rent with additional information, such as price, address, area, number of bedrooms, etc.
5. Venues for each Chicago neighborhood (that can be clustered).
6. Venues around Metra/CTA stations, as needed.
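Clustering neighborhoods by their venues requires each neighborhood to be expressed as a vector of category frequencies. A minimal sketch, assuming venues arrive as hypothetical `(neighborhood, category)` pairs such as those the Foursquare explore endpoint can supply; the result is roughly what one-hot encoding the categories and averaging them per neighborhood would produce in pandas.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Turn (neighborhood, category) pairs into per-neighborhood frequency vectors.

    Returns (categories, table): `categories` is the sorted list of all categories,
    and table[neighborhood] lists the share of that neighborhood's venues in each
    category, in the same order as `categories`.
    """
    counts = defaultdict(Counter)
    for hood, category in venues:
        counts[hood][category] += 1
    categories = sorted({c for counter in counts.values() for c in counter})
    table = {}
    for hood, counter in counts.items():
        total = sum(counter.values())
        table[hood] = [counter[c] / total for c in categories]
    return categories, table


# Hypothetical venue pairs for two Chicago neighborhoods
pairs = [("Loop", "Coffee Shop"), ("Loop", "Coffee Shop"), ("Loop", "Gym"),
         ("Pilsen", "Bakery")]
categories, table = category_frequencies(pairs)
print(categories)       # ['Bakery', 'Coffee Shop', 'Gym']
print(table["Pilsen"])  # [1.0, 0.0, 0.0]
```

The resulting vectors could then be passed to a clustering algorithm such as k-means (for example, scikit-learn's `KMeans`).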

### How the data will be used to solve the problem
The data will be used as follows:

1. Use Foursquare and geopy data to map the top 10 venues for all Chicago neighborhoods and cluster the neighborhoods into groups.
2. Use Foursquare and geopy data to map the locations of Metra/CTA stations, both separately and on top of the clustered map above, in order to identify the venues and amenities near each station or to explore each station location separately.
3. Use Foursquare and geopy data to map the locations of rental places, linked in some form to the station locations.
4. Create a map that depicts, for instance, the average rental price per square foot within a 1.0 mile (1.6 km) radius of each station, so that the station popups quickly show the relative price in each area.
5. Rental addresses will be converted to geodata (latitude, longitude) with geopy's Nominatim geocoder, and distances will be computed with geopy's distance module.
6. Data will be sought from open data sources where available: real estate sites that permit reading their data, libraries, or other government agencies.
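Step 4 boils down to grouping listings by their distance to each station and averaging rent per square foot. The sketch below assumes hypothetical inputs (stations as a `name -> (lat, lng)` dict, listings as `(lat, lng, monthly_rent, sqft)` tuples) and uses a haversine distance in place of geopy so that it needs only the standard library.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def avg_rent_per_sqft_by_station(
        stations: Dict[str, Tuple[float, float]],
        listings: Iterable[Tuple[float, float, float, float]],
        radius_km: float = 1.6) -> Dict[str, Optional[float]]:
    """Mean monthly rent per square foot of the listings within radius_km of each station.

    stations: station name -> (lat, lng)
    listings: (lat, lng, monthly_rent, sqft) tuples
    Stations with no listings in range map to None.
    """
    listings = list(listings)  # allow multiple passes over the listings
    result = {}
    for name, (s_lat, s_lng) in stations.items():
        ratios = [rent / sqft
                  for lat, lng, rent, sqft in listings
                  if sqft > 0 and haversine_km(s_lat, s_lng, lat, lng) <= radius_km]
        result[name] = sum(ratios) / len(ratios) if ratios else None
    return result


# Hypothetical inputs: one station, two nearby listings, one far-away listing
stations = {"Station A": (41.88, -87.64)}
listings = [(41.881, -87.641, 2000.0, 1000.0),
            (41.882, -87.639, 3000.0, 1000.0),
            (42.02, -88.04, 1500.0, 1000.0)]
print(avg_rent_per_sqft_by_station(stations, listings))  # {'Station A': 2.5}
```

This averages the per-listing ratios, so each listing counts equally; dividing total rent by total square footage would instead weight larger apartments more heavily.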

**Processing this data will answer the key questions needed to make a decision:**

1. What is the cost of rent (per square foot) within a 1 mile radius of each Metra/CTA station?
2. Which area of Chicago has the best rental pricing that meets the established criteria?
3. What is the distance from work to the tentative future home?
4. What venues do the two best places to live offer, and how do their prices compare?
5. How are venues distributed among Chicago neighborhoods and around Metra/CTA stations?
6. Are there tradeoffs between size, price, and location?
7. Any other interesting statistical findings from the real estate and overall data.
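For question 6, a first check on the size/price tradeoff is the correlation between apartment size and rent. A minimal Pearson correlation sketch with made-up inputs; with real listings in a DataFrame, pandas' `Series.corr` (Pearson by default) computes the same statistic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    ss_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (ss_x * ss_y)


# Hypothetical listings: size in square feet and monthly rent in USD
sqft = [800, 950, 1100, 1300]
rent = [1900, 2100, 2500, 2900]
print(round(pearson(sqft, rent), 3))
```

A value near 1 would mean rent rises almost linearly with size; location effects would need a separate comparison, for example per station area.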

#### Reference: venues around the current suburban residence, for comparison with Chicago

```python
address = 'Old Schaumburg Rd, Schaumburg, IL'

# Nominatim requires a custom user_agent; 'chicago_capstone' is an arbitrary, assumed name.
geolocator = Nominatim(user_agent='chicago_capstone')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the Illinois home are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of the Illinois home are 42.0250138, -88.0424755.
```


```python
# Coordinates of the Schaumburg home returned by the geocoder above
lat = 42.0250138
long = -88.0424755
LIMIT = 100  # maximum number of venues returned by the Foursquare API
radius = 2000  # search radius in meters
CLIENT_ID = 'YOUR_CLIENT_ID'  # replace with your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # replace with your Foursquare client secret
VERSION = '20180605'  # Foursquare API version date

# create the explore-endpoint URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    lat,
    long,
    radius,
    LIMIT)
url  # display URL
```

```
'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180605&ll=42.0250138,-88.0424755&radius=2000&limit=100'
```
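Formatting the query string by hand works here, but one alternative is to let the standard library encode the parameters, which escapes any special characters in the values. A sketch under that assumption; `build_explore_url` is a hypothetical helper, not part of any Foursquare client library. Passing `safe=","` keeps the comma in `ll` literal, so the result matches the URL above except for the stray `&` right after `?`.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Hypothetical helper: build the explore URL with urlencode instead of str.format."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,  # meters
        "limit": limit,
    }
    # safe="," leaves the comma in `ll` unescaped; other special characters are encoded
    return "{}?{}".format(EXPLORE_ENDPOINT, urlencode(params, safe=","))


print(build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                        42.0250138, -88.0424755, 2000, 100))
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` performs equivalent encoding internally.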

```python
results = requests.get(url).json()  # send the GET request and parse the JSON response
```

```python
def get_category_type(row):
    """Return the name of a venue's primary category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
```

```python
venues = results['response']['groups'][0]['items']

SGnearby_venues = json_normalize(venues)  # flatten JSON

# keep only the name, category, and coordinate columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
SGnearby_venues = SGnearby_venues.loc[:, filtered_columns]

# replace each row's category list with its primary category name
SGnearby_venues['venue.categories'] = SGnearby_venues.apply(get_category_type, axis=1)

# drop the 'venue.' and 'location.' prefixes from the column names
SGnearby_venues.columns = [col.split(".")[-1] for col in SGnearby_venues.columns]

SGnearby_venues.head(15)
```

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Olympic Park Soccer Fields | Soccer Field | 42.024142 | -88.035389 |
| 1 | Spring Valley | Nature Preserve | 42.027111 | -88.05186 |
| 2 | Shaw's Crab House | Seafood Restaurant | 42.038123 | -88.033179 |
| 3 | Benihana | Japanese Restaurant | 42.039795 | -88.049686 |
| 4 | Starbucks | Coffee Shop | 42.039272 | -88.04791 |
| 5 | Seasons 52 | New American Restaurant | 42.038636 | -88.036202 |
| 6 | Wildfire | Steakhouse | 42.040173 | -88.048348 |
| 7 | Whole Foods Market | Grocery Store | 42.042192 | -88.036541 |
| 8 | IHOP | Breakfast Spot | 42.038749 | -88.037754 |
| 9 | Starbucks | Coffee Shop | 42.040215 | -88.03361 |
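Summarizing these venues by category makes the suburban reference easier to compare with the top-10 venue lists planned for Chicago neighborhoods. A minimal sketch using `collections.Counter` over the ten categories listed above; `top_categories` is a hypothetical helper, and in pandas `SGnearby_venues['categories'].value_counts().head(10)` gives a similar summary.

```python
from collections import Counter


def top_categories(categories, n=10):
    """Return the n most common venue categories as (category, count) pairs.

    None entries (venues without a category) are skipped.
    """
    return Counter(c for c in categories if c).most_common(n)


# Categories of the ten Schaumburg venues listed above
schaumburg = ["Soccer Field", "Nature Preserve", "Seafood Restaurant",
              "Japanese Restaurant", "Coffee Shop", "New American Restaurant",
              "Steakhouse", "Grocery Store", "Breakfast Spot", "Coffee Shop"]
print(top_categories(schaumburg, n=1))  # [('Coffee Shop', 2)]
```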


#### Map of venues near the Schaumburg home (for reference)

```python
map_sg = folium.Map(location=[latitude, longitude], zoom_start=15)

# add a square marker with a name popup for each nearby venue
for v_lat, v_lng, name in zip(SGnearby_venues['lat'], SGnearby_venues['lng'], SGnearby_venues['name']):
    popup = folium.Popup(name, parse_html=True)
    folium.RegularPolygonMarker(
        [v_lat, v_lng],
        number_of_sides=4,
        radius=10,
        popup=popup,
        color='blue',
        fill_color='blue',
        fill_opacity=0.7,
    ).add_to(map_sg)

map_sg  # display the map
```