# Capstone Project - The Battle of the Neighborhoods (Week4)

### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data Requirement and Sources](#datasource)
* [Final DataFrame](#datafinal)

## Introduction: Business Problem <a name="introduction"></a>

Pune is one of the fastest-growing cities in India. Its population has grown substantially as the city establishes itself as a hub for IT and education services. It is also in the process of becoming a smart city, which attracts even more people.

Property costs in Pune are lower than in other major Indian cities, and with a good average income, the city is a good candidate for new businesses.

In this project we will try to find an **optimal location for a restaurant in the city of Pune, India**.
Some of the factors that affect the success of a restaurant are **location, area population, and other attractions in the area**.

Using data analysis techniques, we will find the density of restaurants in various neighborhoods of the city.
We will also find the top venues in each area.

Then we will look for areas that have a low density of restaurants and new attractions coming in, such as a big IT firm opening an office or a new multiplex being constructed.

## Data <a name="datasource"></a>

For the problem described above, I use the following data for the city of Pune, India:

- Neighborhoods
- Map coordinates of the neighborhoods
- Venues of various types within 500 m of each neighborhood

**Sources**:

- Neighborhood data is taken from Wikipedia
- Coordinates are obtained with the Nominatim (OpenStreetMap) geocoder through the geopy library
- Venue data is taken from the Foursquare API

In addition, I will use Google searches to check for news of new or ongoing projects, such as IT offices or multiplexes under construction, in the candidate areas, so that we can make an even better recommendation.

**For example:**
Suppose that after our analysis we get two neighborhoods, n1 and n2, that both seem like optimal options for opening a restaurant.
Using the additional data, we can check whether a new IT SEZ office with a capacity of 10,000 people is being built, or was just built, in (say) n1, while no such activity is under way in n2. In that case, we would recommend n1 over n2.
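
The tie-break in this example can be sketched in a few lines. This is only an illustration: the `Candidate` class, its field names, and the density values are hypothetical, and only the 10,000-person office capacity comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    restaurant_density: float  # hypothetical: share of nearby venues that are restaurants
    upcoming_capacity: int     # hypothetical: people expected from new projects in the area


def rank_candidates(candidates):
    # Prefer a low restaurant density first, then larger upcoming developments.
    return sorted(candidates, key=lambda c: (c.restaurant_density, -c.upcoming_capacity))


candidates = [
    Candidate("n2", restaurant_density=0.20, upcoming_capacity=0),
    Candidate("n1", restaurant_density=0.20, upcoming_capacity=10_000),
]
best = rank_candidates(candidates)[0]
print(best.name)
```

With equal densities, the neighborhood with the larger upcoming development sorts first.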

**Importing libraries**

In [7]:
import pandas as pd
import requests  # needed later for the Foursquare API calls
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
#!conda install -c conda-forge folium=0.5.0 --yes
import folium

**Loading neighborhoods into a dataframe**

In [159]:
df=pd.read_csv(r'C:\Users\user\Desktop\Coursera_Assignment\Project\Project\Pune_Neighborhoods.csv',encoding= 'unicode_escape')
df.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Aundh | NaN | NaN |
| 1 | Baner | NaN | NaN |
| 2 | Bavdhan Khurd | NaN | NaN |
| 3 | Bavdhan Budruk | NaN | NaN |
| 4 | Balewadi | NaN | NaN |


In [160]:
df=df.fillna(0)

**Creating a Nominatim (OpenStreetMap) geocoder instance with geopy**

In [161]:
locator = Nominatim(user_agent='myGeocoder')

**Fetching coordinates for all neighborhoods in the dataframe**

In [162]:
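for x in range(len(df)):
    # geocode() returns None when no match is found for the address
    location = locator.geocode(f'{df.iloc[x,0]}, Pune, India')
    if location is None:
        continue  # coordinates stay at the 0 placeholder set by fillna(0)
    df.iloc[x,1]=location.latitude
    df.iloc[x,2]=location.longitude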

In [163]:
df.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Aundh | 18.561883 | 73.810196 |
| 1 | Baner | 18.564243 | 73.776857 |
| 2 | Bavdhan Khurd | 18.313881 | 74.023109 |
| 3 | Bavdhan Budruk | 18.529135 | 73.7787 |
| 4 | Balewadi | 18.582027 | 73.768983 |
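
A lookup can fail: `geocode` returns `None` when Nominatim finds no match, and the public Nominatim service expects callers to keep their request rate low. A minimal sketch of a loop that tolerates both is shown below. The function name `geocode_neighborhoods`, the `delay_seconds` parameter, and the stand-in geocoder are assumptions made for illustration; in the notebook the `geocode` argument would be `locator.geocode`.

```python
import time
from collections import namedtuple


def geocode_neighborhoods(names, geocode, city="Pune, India", delay_seconds=1.0):
    """Return ({name: (lat, lon)}, [names that could not be geocoded])."""
    found, missing = {}, []
    for i, name in enumerate(names):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(f"{name}, {city}")
        if location is None:           # no match for this address
            missing.append(name)
            continue
        found[name] = (location.latitude, location.longitude)
    return found, missing


# Stand-in geocoder so the sketch runs offline; coordinates copied from the output above.
Point = namedtuple("Point", ["latitude", "longitude"])
known = {"Aundh, Pune, India": Point(18.561883, 73.810196)}

found, missing = geocode_neighborhoods(["Aundh", "Nowhere"], known.get, delay_seconds=0)
print(found, missing)
```

Collecting the failed names separately makes it easy to review them by hand instead of silently plotting them at a default coordinate.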


### Let's use folium to create a map of Pune

In [164]:
# create map using latitude and longitude values
location = locator.geocode('Pune, India')
latitude=location.latitude
longitude=location.longitude
neighborhoods=df

map_city = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = '{} , Pune, India'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_city)  
    
map_city

### Let's fetch the venues near each neighborhood in the city

From the Foursquare lab in the previous module, we know that all the information is in the `items` key. Before we proceed, let's borrow the `get_category_type` function from the Foursquare lab.

In [52]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the 'venue.categories' key
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
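
To make the structure concrete, the sketch below walks a hand-written payload shaped like the part of the response this notebook reads, `response["groups"][0]["items"]`, where each item holds a `venue`. The payload is illustrative rather than a real API response, and `extract_venues` is a hypothetical helper. It also shows one way to handle a venue whose `categories` list is empty.

```python
def extract_venues(payload):
    """Flatten the items of the first group into (name, lat, lng, category) tuples."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Illustrative payload; only the keys used in this notebook are included.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Westend mall",
               "location": {"lat": 18.561814, "lng": 73.80722},
               "categories": [{"name": "Shopping Mall"}]}},
    {"venue": {"name": "Example venue without a category",
               "location": {"lat": 18.56, "lng": 73.81},
               "categories": []}},
]}]}}

rows = extract_venues(sample)
print(rows)
```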

### Let's create a function to clean the JSON returned from the Foursquare API and structure it into a pandas dataframe for all the neighborhoods in Pune, India

First, provide the Foursquare credentials used to authenticate the API calls.

In [167]:
# placeholders: substitute your own credentials and keep real keys out of shared notebooks
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20200306'
LIMIT = 30

In [168]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
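
As an alternative to formatting the URL by hand, the query can be assembled as a dictionary and encoded by the standard library, which also takes care of escaping. The sketch below assumes the same endpoint and parameter names used in `getNearbyVenues`; `build_explore_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20200306", radius=500, limit=30):
    """Build the explore URL from named parameters instead of positional format slots."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude of the neighborhood
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


url = build_explore_url(18.561883, 73.810196, "YOUR_ID", "YOUR_SECRET")
print(url)
```

Named parameters make it harder to swap two values by accident than a long positional `format` call. The same dictionary could also be passed to `requests.get` through its `params` argument.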

### Now run the above function on each neighborhood and create a new dataframe called york_venues (despite its name, it holds the Pune venues)

In [106]:
york_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Let's check the newly created dataframe

In [107]:
york_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ambegaon | 19.031694 | 73.95003 | Ravikiran | 19.031083 | 73.953355 | Vegetarian / Vegan Restaurant |
| 1 | Aundh | 18.561883 | 73.810196 | Westend mall | 18.561814 | 73.80722 | Shopping Mall |
| 2 | Aundh | 18.561883 | 73.810196 | Cinepolis IMAX | 18.561756 | 73.807192 | Multiplex |
| 3 | Aundh | 18.561883 | 73.810196 | Picantos Mexican Grill | 18.560654 | 73.812447 | Mexican Restaurant |
| 4 | Aundh | 18.561883 | 73.810196 | Venkys Xprs | 18.56055 | 73.808964 | Fast Food Restaurant |
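
A quick sanity check is to confirm that a returned venue really lies within the 500 m search radius of its neighborhood. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km. The function name is an assumption, and the coordinates are taken from the Aundh and Westend mall row above.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Neighborhood centre (Aundh) versus venue (Westend mall)
distance = haversine_m(18.561883, 73.810196, 18.561814, 73.80722)
print(round(distance), distance <= 500)
```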


In [108]:
york_venues.shape

(160, 7)

In [109]:
york_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Ambegaon | 1 | 1 | 1 | 1 | 1 | 1 |
| Aundh | 31 | 31 | 31 | 31 | 31 | 31 |
| Balewadi | 1 | 1 | 1 | 1 | 1 | 1 |
| Baner | 9 | 9 | 9 | 9 | 9 | 9 |
| Bavdhan Budruk | 3 | 3 | 3 | 3 | 3 | 3 |
| Bhugaon | 5 | 5 | 5 | 5 | 5 | 5 |
| Bhukum | 1 | 1 | 1 | 1 | 1 | 1 |
| Bibvewadi | 4 | 4 | 4 | 4 | 4 | 4 |
| Ghorpadi | 4 | 4 | 4 | 4 | 4 | 4 |
| Hadapsar | 1 | 1 | 1 | 1 | 1 | 1 |
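
The venue counts above can be turned into the restaurant density mentioned in the introduction. One simple definition, assumed here, is the share of a neighborhood's venues whose category name contains the word "Restaurant". The helper name and the small sample are illustrative; the rows are copied from the venues listed earlier, and in the notebook the pairs could come from `zip(york_venues['Neighborhood'], york_venues['Venue Category'])`.

```python
from collections import defaultdict


def restaurant_density(pairs):
    """Map each neighborhood to (restaurants, total venues, share of restaurants)."""
    totals = defaultdict(int)
    restaurants = defaultdict(int)
    for neighborhood, category in pairs:
        totals[neighborhood] += 1
        if "Restaurant" in category:
            restaurants[neighborhood] += 1
    return {n: (restaurants[n], totals[n], restaurants[n] / totals[n]) for n in totals}


# Small sample copied from the venues listed earlier (not the full dataset).
sample = [
    ("Aundh", "Shopping Mall"),
    ("Aundh", "Multiplex"),
    ("Aundh", "Mexican Restaurant"),
    ("Aundh", "Fast Food Restaurant"),
    ("Ambegaon", "Vegetarian / Vegan Restaurant"),
]
density = restaurant_density(sample)
print(density)
```

A share computed from only one or two venues is not very informative, so neighborhoods with very few returned venues are worth treating separately.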


### Let's check the unique categories returned for the city

In [169]:
print('There are {} unique categories.'.format(len(york_venues['Venue Category'].unique())))

There are 64 unique categories.


## Final DataFrame <a name="datafinal"></a>

In [172]:
york_venues

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 1 | Aundh | 18.561883 | 73.810196 | Westend mall | 18.561814 | 73.807220 | Shopping Mall |
| 2 | Aundh | 18.561883 | 73.810196 | Cinepolis IMAX | 18.561756 | 73.807192 | Multiplex |
| 3 | Aundh | 18.561883 | 73.810196 | Picantos Mexican Grill | 18.560654 | 73.812447 | Mexican Restaurant |
| 4 | Aundh | 18.561883 | 73.810196 | Venkys Xprs | 18.560550 | 73.808964 | Fast Food Restaurant |
| 5 | Aundh | 18.561883 | 73.810196 | Café Maroo | 18.564801 | 73.809141 | Korean Restaurant |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 155 | Wagholi | 18.580630 | 73.983310 | Sangam Restaurant | 18.580484 | 73.986763 | Restaurant |
| 156 | Warje | 18.482044 | 73.800170 | Hotel Kaveri | 18.483746 | 73.800725 | Seafood Restaurant |
| 157 | Warje | 18.482044 | 73.800170 | Mystic Flavours | 18.485079 | 73.798939 | Indian Restaurant |
| 158 | Warje | 18.482044 | 73.800170 | Hotel Swarna Pure Veg | 18.480503 | 73.803400 | Vegetarian / Vegan Restaurant |
