# Capstone Business Problem

## Introduction/Business Problem

I am a huge coffee connoisseur and have long harbored a pipe dream of opening a coffee shop focused on simple but elevated coffee. The suggested prompt for this assignment got me thinking about where I could open a coffee shop in Atlanta, where I live. Atlanta has some decent coffee, especially within the city limits, but some areas are devoid of good coffee and could therefore be good locations for a new shop.

For those who have never visited, metro Atlanta is vast and spans three major counties: Fulton, Cobb, and DeKalb. Because I could not find a dataset describing Atlanta's individual suburbs, I'm going to focus on the ZIP codes within these three counties instead.

Therefore, **my business problem is**: Where is the best place to open a coffee shop in metro Atlanta?

## The first challenge is finding the ZIP codes and the latitude and longitude of each ZIP code's centroid

```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import geocoder
```

### Getting a list of the three counties

```python
counties = {'fulton': "https://www.zip-codes.com/county/ga-fulton.asp",
            'cobb': "https://www.zip-codes.com/county/ga-cobb.asp",
            'dekalb': "https://www.zip-codes.com/county/ga-dekalb.asp"}
```

### This is my function to web scrape and clean the zip code data.

```python
def get_county_zip(url):
    """Scrape a zip-codes.com county page into a DataFrame of ZIP codes and populations."""
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')

    # ZIP code labels, and the 'info' cells that hold the population figures
    zipcode = soup.find_all(class_='label')
    population = soup.find_all(class_='info')

    # Skip the first label, then keep only labels of the form "ZIP Code 30004"
    # and strip the "ZIP Code " prefix
    zip2 = []
    for tag in zipcode[1:]:
        text = tag.text
        if text.startswith('ZIP'):
            zip2.append(text.replace('ZIP Code ', ''))

    # Populations: every 5th 'info' cell, starting at index 7
    # (these offsets depend on the page layout)
    zip_pop = []
    for ii in range(7, len(population), 5):
        zip_pop.append(int(population[ii].text.replace(',', '')))

    # Combine into a DataFrame (both lists must have the same length)
    df = pd.DataFrame({'zipcode': zip2, 'population': zip_pop})

    return df
```
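Because `get_county_zip` depends on the layout of the zip-codes.com pages, it can help to check the parsing logic offline. The sketch below applies the same "ZIP Code" filtering and comma stripping to a small **hypothetical** HTML snippet; the markup is invented for illustration and is not the real page structure, and `extract_zip_codes` / `parse_population` are hypothetical helper names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: NOT the real zip-codes.com page, just enough to exercise the filter
SAMPLE_HTML = """
<table>
  <tr><td class="label">County Overview</td></tr>
  <tr><td class="label">ZIP Code 30004</td></tr>
  <tr><td class="label">Population</td></tr>
  <tr><td class="label">ZIP Code 30005</td></tr>
</table>
"""

def extract_zip_codes(html):
    """Return the ZIP codes from every class='label' element that starts with 'ZIP Code '."""
    soup = BeautifulSoup(html, 'html.parser')
    zips = []
    for tag in soup.find_all(class_='label'):
        text = tag.get_text(strip=True)
        if text.startswith('ZIP Code '):
            zips.append(text[len('ZIP Code '):])
    return zips

def parse_population(text):
    """Convert a population string such as '53,033' to an int."""
    return int(text.strip().replace(',', ''))

print(extract_zip_codes(SAMPLE_HTML))  # ['30004', '30005']
print(parse_population('53,033'))      # 53033
```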

### This function takes a ZIP code and, using geocoder, finds the latitude and longitude of its centroid

```python
def get_lat_long(zipcode):
    """Return the (latitude, longitude) of a ZIP code's centroid via the ArcGIS geocoder."""
    g = geocoder.arcgis(zipcode)

    # g.ok is False if the lookup did not succeed
    return g.lat, g.lng
```
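The county loop calls the geocoder once per ZIP code, so a failed or repeated lookup costs a network round trip each time. A minimal sketch of one way to harden this, assuming any function that maps a ZIP code to a `(lat, lng)` pair (for example `get_lat_long` above): cache results so each ZIP code is looked up only once, and record failures as `NaN` so they can be dropped later. `make_cached_geocoder` and `fake_geocoder` are hypothetical names used only for illustration.

```python
import math
from functools import lru_cache

def make_cached_geocoder(geocode_fn):
    """Wrap a (zipcode -> (lat, lng)) function with a cache and NaN for failed lookups."""
    @lru_cache(maxsize=None)
    def cached(zipcode):
        try:
            lat, lng = geocode_fn(zipcode)
        except Exception:
            # Network errors, rate limits, etc. are recorded as missing coordinates
            return (math.nan, math.nan)
        if lat is None or lng is None:
            return (math.nan, math.nan)
        return (lat, lng)
    return cached

# Stand-in geocoder (hypothetical) so the example runs offline
calls = []
def fake_geocoder(zipcode):
    calls.append(zipcode)
    return {'30004': (34.129194, -84.3099)}.get(zipcode, (None, None))

lookup = make_cached_geocoder(fake_geocoder)
print(lookup('30004'))  # (34.129194, -84.3099)
print(lookup('30004'))  # served from the cache, no second call
print(lookup('99999'))  # (nan, nan)
print(len(calls))       # 2
```

In the notebook, `lookup = make_cached_geocoder(get_lat_long)` could then replace direct calls to `get_lat_long` inside the county loop.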

### Looping over the three counties

```python
counties2 = {}
for county, url in counties.items():

    # Scrape the ZIP codes and populations for this county
    county_df = get_county_zip(url)

    # Geocode each ZIP code to the latitude/longitude of its centroid
    latitude = []
    longitude = []
    for z in county_df['zipcode']:
        la, lo = get_lat_long(z)
        latitude.append(la)
        longitude.append(lo)

    county_df['latitude'] = latitude
    county_df['longitude'] = longitude

    # Remove ZIP codes with a population of 0
    county_df = county_df[county_df['population'] > 0]

    # Store the cleaned DataFrame under the county name
    counties2[county] = county_df
```

```python
# Combine the per-county DataFrames into one
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
all_df = pd.concat(counties2.values(), ignore_index=False)

all_df.head()
```

|   | zipcode | population | latitude | longitude |
|---|---|---|---|---|
| 0 | 30004 | 53033 | 34.129194 | -84.3099 |
| 1 | 30005 | 34442 | 34.076509 | -84.225019 |
| 2 | 30009 | 13722 | 34.075075 | -84.298541 |
| 3 | 30022 | 64359 | 34.02118 | -84.23461 |
| 5 | 30075 | 52573 | 34.051165 | -84.371848 |
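Passing the `counties2` dictionary itself to `pd.concat`, rather than only its values, keeps each county name as an outer index level, which can then become an ordinary column for grouping by county later. A small sketch with synthetic frames (`example_counties` is a stand-in for the scraped data; the Cobb row is made up):

```python
import pandas as pd

# Synthetic stand-in for the scraped `counties2` dictionary (the Cobb row is made up)
example_counties = {
    'fulton': pd.DataFrame({'zipcode': ['30004', '30005'], 'population': [53033, 34442]}),
    'cobb': pd.DataFrame({'zipcode': ['30060'], 'population': [12345]}),
}

# With a dict, pd.concat uses the keys as the outer index level
example_df = pd.concat(example_counties, names=['county', 'row'])

# Turn the county level into a regular column and rebuild a simple 0..n-1 index
example_df = example_df.reset_index(level='county').reset_index(drop=True)
print(example_df)
```

On the real data, `pd.concat(counties2, names=['county', 'row'])` would give the same layout, with one `county` value per scraped page.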


```python
# Looking at where the ZIP code centroids are
import folium

# Create a map of metro Atlanta centered on these coordinates
map_atl = folium.Map(location=[33.7676338, -84.5606894], zoom_start=10)

# Add a marker for each ZIP code centroid, with the ZIP code as its popup
for lat, lng, zipcode in zip(all_df['latitude'], all_df['longitude'], all_df['zipcode']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=str(zipcode),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_atl)

map_atl
```

# Week 1 Part 2: Data Creation

While researching how others go about choosing a coffee shop location, I came across [this page](https://squareup.com/us/en/townsquare/how-to-find-a-coffee-shop-location). With regard to the Foursquare API, this point seems most salient:

> **2. Neighboring businesses**
>
> When you’re determining the best location for a coffee shop, neighboring businesses can also affect your profitability — both negatively and positively.
>
> It might be obvious to research other local coffee shops to find out where they are established, but your competitors aren’t limited to other coffee shops. If your coffee shop targets customers looking for a quick breakfast, you should keep your eye on quick breakfast alternatives including smoothie joints, juice bars, bagel places, and even fast food chains. While these businesses are all in different categories, they could all be competing with the products you serve.
>
> Neighboring businesses can also help your coffee shop by complementing your offerings. If your coffee shop has a large study space, it might be smart to open near other businesses or a university. This provides an ideal area for employees or students to come in and get work done before or after hours. If you’re targeting customers who seek a midday caffeine buzz, you may want to look at shopping centers where customers need a pick-me-up while walking around and shopping.

I will therefore focus the analysis on the venues surrounding each location, similar to the lab practicals already completed. For this project, I request the **200 most relevant venues within a search radius of 5000 m** of each ZIP code centroid. This radius is much larger than in the labs because ZIP codes cover a larger area than neighborhoods; in testing, it returned better results.
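With a 5000 m radius, any two ZIP code centroids closer than 10 km have overlapping search circles, so the same venue can be returned for both. A short sketch that counts such pairs using the haversine great-circle formula, assuming a spherical Earth of radius 6,371 km (an approximation); `haversine_m` and `overlapping_pairs` are hypothetical helper names.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical-Earth assumption

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def overlapping_pairs(centroids, radius_m=5000):
    """Return ZIP pairs whose search circles of radius_m overlap (distance < 2 * radius_m)."""
    return [
        (z1, z2)
        for (z1, p1), (z2, p2) in combinations(centroids.items(), 2)
        if haversine_m(*p1, *p2) < 2 * radius_m
    ]

# Centroids taken from the all_df.head() output above
centroids = {
    '30004': (34.129194, -84.3099),
    '30005': (34.076509, -84.225019),
    '30009': (34.075075, -84.298541),
}
print(overlapping_pairs(centroids))
```

For these three centroids every pair is less than 10 km apart, so all three pairs are reported; on the full `all_df` the same check indicates how much of the venue data is likely shared between ZIP codes.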



## Part 1: Collecting Venue Data from the Foursquare API

```python
# Using this function from earlier in the course (Foursquare v2 "explore" endpoint).
# It relies on the globals CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT being defined.
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # Build the API request; requests URL-encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # Make the GET request and keep the recommended venues
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # Keep only the relevant fields for each venue; some venues have no category
        for item in results:
            venue = item['venue']
            categories = venue.get('categories') or [{'name': None}]
            venues_list.append((
                name,
                lat,
                lng,
                venue['name'],
                venue['location']['lat'],
                venue['location']['lng'],
                categories[0]['name']))

    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])

    return nearby_venues
```
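The nested indexing in `getNearbyVenues` (`response` → `groups[0]` → `items` → `venue`) is easiest to verify against a small response. The sketch below extracts the same seven fields from a **hand-written, hypothetical** payload that only mimics the keys the function already reads; it is not a real API response. It also shows one way to guard against a venue with an empty `categories` list, which would otherwise raise an `IndexError`. `parse_explore_items` is a hypothetical helper name.

```python
# Hypothetical payload: only the keys that getNearbyVenues reads, with invented values
sample_payload = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Example Cafe',
                           'location': {'lat': 34.10, 'lng': -84.30},
                           'categories': [{'name': 'Coffee Shop'}]}},
                {'venue': {'name': 'Example Park',
                           'location': {'lat': 34.11, 'lng': -84.31},
                           'categories': []}},
            ]
        }]
    }
}

def parse_explore_items(payload, name, lat, lng):
    """Flatten an explore-style payload into (neighborhood, lat, lng, venue, vlat, vlng, category) rows."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None  # venue without a category
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

for row in parse_explore_items(sample_payload, '30004', 34.129194, -84.3099):
    print(row)
```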

### Getting all the top venues from Atlanta

```python
CLIENT_ID = 'XHZAVLMJZPZ1YERWSOQP0RZSG#####' # your Foursquare ID
CLIENT_SECRET = 'FKWRVOLJYEBTMABNFPMRYR#########3' # your Foursquare Secret
VERSION = '20180604'
```
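Hard-coding the client ID and secret in a notebook makes them easy to leak when the notebook is shared. A minimal alternative, assuming the credentials are stored in environment variables; the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices, not something Foursquare defines.

```python
import os

def load_foursquare_credentials(environ=os.environ):
    """Read the Foursquare client ID and secret from (hypothetically named) environment variables."""
    try:
        return environ['FOURSQUARE_CLIENT_ID'], environ['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Set the {missing} environment variable before calling the API') from None

# Example with an explicit mapping instead of the real environment
client_id, client_secret = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'})
print(client_id)  # my-id
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the two hard-coded lines.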

```python
LIMIT = 200  # maximum number of venues requested per ZIP code
atl_venues = getNearbyVenues(names=all_df['zipcode'],
                             latitudes=all_df['latitude'],
                             longitudes=all_df['longitude'],
                             radius=5000)
```

```python
atl_venues.shape
```

(7656, 7)
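Because the 5000 m search circles of neighboring ZIP codes overlap, some of these 7656 rows are probably the same venue returned for more than one ZIP code. That is fine when describing what surrounds each ZIP code, but it inflates metro-wide totals. A sketch of how one might get the metro-wide view, assuming a venue is identified by its name plus coordinates (an approximation); `unique_venues` is a hypothetical helper and the rows below are synthetic.

```python
import pandas as pd

def unique_venues(venues):
    """Drop repeated venues, treating name + coordinates as the identity (an approximation)."""
    return venues.drop_duplicates(subset=['Venue', 'Venue Latitude', 'Venue Longitude'])

# Synthetic example: the same Starbucks returned for two ZIP codes, plus a made-up venue
dup_example = pd.DataFrame({
    'Neighborhood': ['30004', '30005', '30005'],
    'Venue': ['Starbucks', 'Starbucks', 'Example Bagels'],
    'Venue Latitude': [34.118145, 34.118145, 34.08],
    'Venue Longitude': [-84.270546, -84.270546, -84.23],
})
print(len(dup_example), len(unique_venues(dup_example)))  # 3 2
```

Per-ZIP analyses would keep using `atl_venues` as is, while `unique_venues(atl_venues)` would be the input for city-wide counts.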

```python
atl_venues.head()
```

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 30004 | 34.129194 | -84.3099 | Union Restaurant | 34.119985 | -84.331329 | American Restaurant |
| 1 | 30004 | 34.129194 | -84.3099 | 'Cue Barbecue | 34.114239 | -84.269442 | BBQ Joint |
| 2 | 30004 | 34.129194 | -84.3099 | Starbucks | 34.118145 | -84.270546 | Coffee Shop |
| 3 | 30004 | 34.129194 | -84.3099 | White Columns Country Club | 34.151228 | -84.329553 | Golf Course |
| 4 | 30004 | 34.129194 | -84.3099 | Scottsdale Farms Garden Center | 34.162383 | -84.33263 | Garden Center |
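Following the Square article's distinction between competitors and complementary businesses, one way to turn these venue categories into per-ZIP features is to tag each category and count the tags. The category sets below are illustrative assumptions based on the examples in the quote (coffee shops, juice bars, smoothie places, bagel shops, fast food, universities, offices, shopping centers), not a definitive taxonomy; the exact Foursquare category names should be checked against `atl_venues['Venue Category'].unique()`. `role` and `role_counts` are hypothetical helper names.

```python
import pandas as pd

# Illustrative category groups (assumed names; verify against the real 'Venue Category' values)
COMPETITORS = {'Coffee Shop', 'Café', 'Juice Bar', 'Smoothie Shop', 'Bagel Shop', 'Fast Food Restaurant'}
COMPLEMENTS = {'University', 'College Academic Building', 'Office', 'Shopping Mall', 'Shopping Plaza'}

def role(category):
    """Tag a venue category as 'competitor', 'complement', or 'other'."""
    if category in COMPETITORS:
        return 'competitor'
    if category in COMPLEMENTS:
        return 'complement'
    return 'other'

def role_counts(venues):
    """Count competitor and complement venues per ZIP code (the 'Neighborhood' column)."""
    roles = venues['Venue Category'].map(role)
    counts = pd.crosstab(venues['Neighborhood'], roles)
    # Ensure both columns exist even if a role never appears
    return counts.reindex(columns=['competitor', 'complement'], fill_value=0)

# Categories from the head() output above, plus one made-up row for a second ZIP code
role_example = pd.DataFrame({
    'Neighborhood': ['30004', '30004', '30004', '30005'],
    'Venue Category': ['American Restaurant', 'Coffee Shop', 'Golf Course', 'Shopping Mall'],
})
print(role_counts(role_example))
```

Applied to `atl_venues`, the resulting table could be joined to `all_df` on ZIP code to compare competition and complementary businesses against population.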
