# Battle of Neighborhoods: The Shop Explorer
## Business Problem

Vancouver is a major city in western Canada and the most populous city in the province of British Columbia. Its metropolitan area is the third largest in Canada, and the city has the highest population density in the country. The everyday needs of this many people are met by the numerous shopping venues spread across the metropolis.

This project segments shopping venues into clusters based on their distance from the centre of the neighborhood they belong to, and explores aspects such as the distance ranges in which shops are most commonly found. Intuitively, given the city's geography, shops located close to a neighborhood centre are grouped into one cluster. This helps identify outlying shops that customers may have to go out of their way, or pass through a secluded area, to reach. Such locations can carry risks such as robbery, delayed emergency services, or, in the worst case, life-threatening events.

## Data Description

### Wikipedia Data

Data link: [https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V)

The data is fetched as HTML text from Wikipedia and gives the postal code, city name, and neighborhood name for the Vancouver metropolitan area. Web scraping and cleaning are used to extract the table containing the required fields, which is then converted into a pandas DataFrame.
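As a rough illustration of the table-extraction step, the sketch below pulls the rows of an HTML table into a list of dictionaries. The project itself uses Beautiful Soup and Requests; to stay dependency-free, this sketch swaps in the standard library's `html.parser`. The names `TableRowParser` and `parse_postal_table`, the three-column layout, and the sample markup are all assumptions made for illustration. The live Wikipedia page may be laid out differently and need extra cleaning.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace and newlines into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_postal_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Illustrative sample markup (an assumption, not copied from the live page).
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>City</th><th>Neighborhood</th></tr>
  <tr><td>V5K</td><td>Vancouver</td><td>North Hastings-Sunrise</td></tr>
  <tr><td>V6B</td><td>Vancouver</td><td>NE Downtown</td></tr>
</table>
"""

records = parse_postal_table(SAMPLE_HTML)
print(records[0])
```

A list of dictionaries like `records` can be passed directly to `pandas.DataFrame(records)` to obtain the DataFrame described above.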

### Geocoder Data

The latitude and longitude of each neighborhood are fetched using the Geocoder library.
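The coordinate lookup might be sketched as follows. The source does not say which Geocoder provider was used, so the ArcGIS provider here is an assumption, as are the helper name `geocode_neighborhoods`, the address format, and the sample coordinates. The lookup is injectable so the function can be tried without network access.

```python
def geocode_neighborhoods(addresses, lookup=None):
    """Map each address string to a (latitude, longitude) tuple, or None if not found."""
    if lookup is None:
        # Imported lazily so a custom lookup works without the third-party package.
        import geocoder

        def lookup(address):
            # Assumption: the ArcGIS provider; the project may have used another one.
            return geocoder.arcgis(address).latlng

    coords = {}
    for address in addresses:
        latlng = lookup(address)
        coords[address] = tuple(latlng) if latlng else None
    return coords


# Offline usage with a stand-in lookup table (hypothetical coordinates).
fake_lookup = {"V6B, Vancouver, BC": [49.28, -123.11]}
result = geocode_neighborhoods(["V6B, Vancouver, BC", "unknown"], lookup=fake_lookup.get)
print(result)
```

Calling `geocode_neighborhoods(addresses)` without a `lookup` argument would query the live service, which requires the `geocoder` package and an internet connection.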

### Foursquare API Data

We also need data about the shops in each neighborhood of the city, which we obtain from Foursquare. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus, and even photos. The Foursquare location platform is therefore used as the sole source of venue data, since all of the required shop information can be obtained through its API.

After building the list of neighborhoods, we connect to the Foursquare API to gather information about the shops in each neighborhood. For each neighborhood, the search radius is set to 10 km.

The data retrieved from Foursquare contains information on shops within the specified distance of the latitude and longitude of each postal code. The following fields are obtained for each shop:

1. Neighborhood
2. Neighborhood Latitude
3. Neighborhood Longitude
4. Shop
5. Name of the Shop/Mall
6. Shop Latitude
7. Shop Longitude
8. Shop Distance from center of Neighborhood
9. Shop Category
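The eighth field, the distance of a shop from its neighborhood centre, is the quantity the rest of the analysis relies on. The source does not state whether this value was taken from the API response or computed from the coordinates. As a sketch of the second option, the great-circle (haversine) distance between two latitude/longitude pairs can be computed as below. The function name `haversine_m` and the mean Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111 km.
one_degree = haversine_m(0.0, 0.0, 1.0, 0.0)
print(round(one_degree))
```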

## Methodology

### Map of Vancouver

![picture](https://raw.githubusercontent.com/code-demoe/Coursera_Capstone/main/CourseraCapstoneImages/Vancouver.png)

### Foursquare API

This project uses the Foursquare API as its primary source of venue data. Foursquare maintains a database of millions of places, and its Places API supports location search, location sharing, and retrieval of details about a business.

### Work Flow

Using Foursquare API credentials, features of places near each neighborhood are retrieved. Because of HTTP request limits, the number of places per neighborhood is set to 50 and the radius parameter is set to 15 km.
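A request with these settings might be assembled as in the sketch below. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit` parameters. The source does not name the endpoint, so treat this as one plausible form. The helper name `build_explore_url`, the version date, the coordinates, and the credential placeholders are hypothetical. The sketch only builds the URL and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the legacy v2 "explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius_m=15_000, limit=50, version="20180605"):
    """Build the query URL for venues around (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, YYYYMMDD (placeholder value)
        "ll": f"{lat},{lng}",    # neighborhood centre
        "radius": radius_m,      # search radius in metres (15 km)
        "limit": limit,          # places per neighborhood
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


url = build_explore_url(49.28, -123.11, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
query = parse_qs(urlsplit(url).query)
print(query["radius"], query["limit"])
```

The resulting URL could then be fetched with `requests.get(url)`, once per neighborhood.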

### Clustering Approach

To achieve the project goal, we explore the neighborhoods, segment them, and group them into clusters to find similar neighborhoods in a big city like Vancouver. To do that, we cluster the data using a form of unsupervised machine learning: the k-means clustering algorithm.
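To show what k-means does with distance data, here is a minimal, dependency-free sketch of the algorithm (Lloyd's iterations) on one-dimensional values. The project itself uses scikit-learn. This hand-written version exists only to make the assignment and update steps visible. The function name `kmeans_1d`, the deterministic initialisation, the choice of `k = 3`, and the sample distances are all assumptions made for illustration.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values with Lloyd's algorithm. Returns (labels, centroids)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic initialisation: k evenly spaced points of the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Hypothetical shop distances from a neighborhood centre, in metres.
distances = [120, 150, 180, 900, 950, 1000, 4000, 4200]
labels, centroids = kmeans_1d(distances, k=3)
print(labels)
print(centroids)
```

With scikit-learn, a comparable result would come from `sklearn.cluster.KMeans(n_clusters=3)` fitted on the distance column reshaped to two dimensions. Its default initialisation is randomised, so cluster numbering may differ between runs.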

### Results

Example of the DataFrame after retrieving information from Foursquare:

![picture](https://raw.githubusercontent.com/code-demoe/Coursera_Capstone/main/CourseraCapstoneImages/Columns.png)

Description of shops:


![picture](https://raw.githubusercontent.com/code-demoe/Coursera_Capstone/main/CourseraCapstoneImages/DescriptionOfShops.png)

Number of Shops within various distance ranges:

![picture](https://raw.githubusercontent.com/code-demoe/Coursera_Capstone/main/CourseraCapstoneImages/NumShopsBasedOnDistance.png)
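Counts like these can be produced by bucketing each shop's distance into fixed-width ranges. The sketch below is one way to do it. The 1 km bin width, the helper name `count_by_distance_range`, and the sample distances are assumptions, since the source does not state the ranges used.

```python
from collections import Counter


def count_by_distance_range(distances_m, bin_width_m=1_000):
    """Count shops per [start, start + bin_width_m) range, keyed by range start in metres."""
    counts = Counter((int(d) // bin_width_m) * bin_width_m for d in distances_m)
    return dict(sorted(counts.items()))


# Hypothetical distances in metres.
sample_distances = [120, 150, 980, 1000, 1500, 4200]
range_counts = count_by_distance_range(sample_distances)
busiest_range = max(range_counts, key=range_counts.get)
print(range_counts, busiest_range)
```

Applying the same function per neighborhood and taking the range with the highest count gives the most common distance range for that neighborhood.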

Distance range containing the largest number of shops for each neighborhood:

![picture](https://raw.githubusercontent.com/code-demoe/Coursera_Capstone/main/CourseraCapstoneImages/SortBasedOnDistance.png)

#### Final Clustering Result

![picture](https://raw.githubusercontent.com/code-demoe/Coursera_Capstone/main/CourseraCapstoneImages/VancouverClustered.png)

### Libraries Used to Develop the Project

Pandas: For creating and manipulating DataFrames.

Folium: Python visualization library used to display the neighborhood cluster distribution on an interactive Leaflet map.

Scikit-learn: For the k-means clustering implementation.

Geocoder: To retrieve location data.

Beautiful Soup and Requests: To scrape the web page and to handle HTTP requests.

Matplotlib: Python plotting library.