# WorldWideCoffee (WWC) Global Expansion

## Summary:

1. Description of business problem
2. Description of the Data
3. Demo


### 1. Description of business problem
WorldWideCoffee (WWC) is a fast-growing coffee shop startup established in Eastern Europe (Ankara, Turkey). After a successful launch in its home country, it plans to expand globally. WWC's main target market is dissatisfied customers of Starbucks, its major competitor and a global brand.

The company's strategy is summarized below:
1. WWC will consider the capital cities of all countries where Starbucks has at least one branch.
2. Using data available online, WWC will compute the average rating of Starbucks shops in each city and cluster the cities into 3 groups: Low, Medium, High.
3. It will choose one of the cities in the Low cluster and expand to that country.
4. The shop's location will be decided from the in-city ratings of Starbucks shops, i.e., close to the shops with the lowest ratings.


### 2. Description of the data

1. List of Starbucks Shops -  https://en.wikipedia.org/wiki/Starbucks#Locations
2. List of National Capitals - https://en.wikipedia.org/wiki/List_of_national_capitals
3. FourSquare - https://foursquare.com/developers/apps

### 3. Demo

In [1]:
import requests
import pandas as pd
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup
import numpy as np
from sklearn.cluster import KMeans


#### Get coordinates for a capital City

In [2]:
capital = 'Baku'
country = 'Azerbaijan'
address = capital + ", " + country

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(capital, latitude, longitude))

The geographical coordinates of Baku are 40.3754434, 49.8326748.


#### List of Stores and ratings
Use the coordinates above as input to the Foursquare API to get the list of Starbucks stores and their ratings.

A more detailed demo will be provided in the W2 submission. For now, you can use the links below to get the required response from Foursquare:
1. https://foursquare.com/developers/explore#req=venues%2F
2. https://foursquare.com/developers/explore#req=venues%2Fsearch%3F
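
As a minimal sketch of how the coordinates feed into the search request, the helper below assembles a `venues/search` URL. The function name `build_search_url` and its default values are illustrative assumptions; the endpoint and parameter names follow the v2 request used later in this notebook, and no claim is made that the endpoint is still served in this form.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lon,
                     query="Starbucks", version="20180604", limit=5):
    """Return a venues/search URL for shops matching `query` near (lat, lon).

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lon}",  # "latitude,longitude" as a single parameter
        "query": query,
        "limit": limit,
    }
    # urlencode percent-escapes special characters such as the comma in `ll`
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("MY_ID", "MY_SECRET", 40.3754434, 49.8326748)
print(url)
```

Building the query string with `urlencode` rather than string formatting keeps values such as a query containing spaces correctly escaped.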

#### Create DataFrame with Starbucks locations

In [4]:
site_url = "https://en.wikipedia.org/wiki/Starbucks#Locations"
site = requests.get(site_url).text
site = BeautifulSoup(site,"lxml")

In [5]:
countries = []

# the slice selects the <ul> blocks that hold the country lists; it depends on the page layout
for lst in site.find_all("ul")[13:19]:
    for item in lst.find_all('li'):
        link = item.find('a')
        if link is None:
            continue  # list item without a link
        c = link.get('title')
        ## to make sure it matches with list of national capitals page
        if c == 'The Bahamas':
            c = 'Bahamas'
        elif c == 'Republic of Ireland':
            c = 'Ireland'
        elif c == 'Hong Kong':
            continue
        countries.append(c)

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
starbucks_locs = pd.DataFrame({'Country': countries})

print(starbucks_locs.shape)
starbucks_locs.head()

(77, 1)


Unnamed: 0,Country
0,Egypt
1,South Africa
2,Morocco
3,China
4,Japan
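
The country names have to match between the two Wikipedia pages, which the cell above handles with an `if`/`elif` chain. One alternative, sketched here as an assumption rather than the notebook's method, keeps the renames in a lookup table so new mismatches can be added in one place. `COUNTRY_ALIASES`, `SKIPPED`, and `normalise_countries` are hypothetical names.

```python
# Hypothetical lookup table; the entries mirror the renames in the cell above.
COUNTRY_ALIASES = {
    "The Bahamas": "Bahamas",
    "Republic of Ireland": "Ireland",
}
# Titles that the cell above skips entirely.
SKIPPED = {"Hong Kong"}


def normalise_countries(titles):
    """Return the titles with aliases applied and skipped entries removed."""
    cleaned = []
    for title in titles:
        if title is None or title in SKIPPED:
            continue
        cleaned.append(COUNTRY_ALIASES.get(title, title))
    return cleaned


print(normalise_countries(["Egypt", "The Bahamas", "Hong Kong", "Republic of Ireland"]))
```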


#### Get List of National Capitals

In [6]:
cap_site_url = "https://en.wikipedia.org/wiki/List_of_national_capitals"
cap_site = requests.get(cap_site_url).text
cap_site = BeautifulSoup(cap_site,"lxml")

My_table = cap_site.find("table",{"class":"wikitable sortable"})

In [7]:
rows = []

for tr in My_table.find_all('tr'):
    tds = tr.find_all('td')
    if not tds:
        continue  # header rows contain only <th> cells
    capital, country, _ = [td.text.strip() for td in tds]
    rows.append({'Capital': capital, 'Country': country})

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
capital_df = pd.DataFrame(rows, columns=['Capital', 'Country'])

In [8]:
print(capital_df.shape)
capital_df.head()

(244, 2)


Unnamed: 0,Capital,Country
0,Abu Dhabi,United Arab Emirates
1,Abuja,Nigeria
2,Accra,Ghana
3,Adamstown,Pitcairn Islands
4,Addis Ababa,Ethiopia


In [9]:
starbucks_shops = pd.merge(capital_df, starbucks_locs, on='Country')
starbucks_shops.head()

Unnamed: 0,Capital,Country
0,Abu Dhabi,United Arab Emirates
1,Amman,Jordan
2,Amsterdam (official)The Hague (de facto),Netherlands
3,Andorra la Vella,Andorra
4,Ankara,Turkey


In [11]:
# modify the data to make sure names match between two wikipedia pages
starbucks_shops.at[2,'Capital'] = 'Amsterdam'
starbucks_shops.at[31,'Capital'] = 'Kuala Lumpur'
starbucks_shops.at[57,'Capital'] = 'Cape Town'
starbucks_shops.at[64,'Capital'] = 'Santiago'
starbucks_shops.at[69,'Capital'] = 'La Paz'
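
The corrections above address rows by position, so they break silently if the scraped tables change order. A possible alternative, shown as a sketch, keys the overrides by country instead. The notebook shows only row indices for four of the five fixes, so every country key except Netherlands is inferred from the capital name and should be treated as an assumption; `CAPITAL_OVERRIDES` and `fix_capital` are hypothetical names.

```python
# Assumed mapping: only the Netherlands row is visible in the output above,
# the other countries are inferred from the capital names.
CAPITAL_OVERRIDES = {
    "Netherlands": "Amsterdam",
    "Malaysia": "Kuala Lumpur",
    "South Africa": "Cape Town",
    "Chile": "Santiago",
    "Bolivia": "La Paz",
}


def fix_capital(capital, country, overrides=CAPITAL_OVERRIDES):
    """Return the override for `country` if one exists, else the scraped capital."""
    return overrides.get(country, capital)


print(fix_capital("Amsterdam (official)The Hague (de facto)", "Netherlands"))
print(fix_capital("Ankara", "Turkey"))
```

With the DataFrame from the merge step, this could be applied as `starbucks_shops['Capital'] = [fix_capital(c, k) for c, k in zip(starbucks_shops['Capital'], starbucks_shops['Country'])]`.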

In [15]:
print(starbucks_shops.shape)
starbucks_shops.head()

(77, 2)


Unnamed: 0,Capital,Country
0,Abu Dhabi,United Arab Emirates
1,Amman,Jordan
2,Amsterdam,Netherlands
3,Andorra la Vella,Andorra
4,Ankara,Turkey


#### Add columns for latitude and longitude

In [16]:
sLength = len(starbucks_shops)
starbucks_shops = starbucks_shops.assign(Latitude=pd.Series(np.zeros(sLength)).values)
starbucks_shops = starbucks_shops.assign(Longitude=pd.Series(np.zeros(sLength)).values)
starbucks_shops.head()

Unnamed: 0,Capital,Country,Latitude,Longitude
0,Abu Dhabi,United Arab Emirates,0.0,0.0
1,Amman,Jordan,0.0,0.0
2,Amsterdam,Netherlands,0.0,0.0
3,Andorra la Vella,Andorra,0.0,0.0
4,Ankara,Turkey,0.0,0.0


#### Fill in the lat, long data

In [17]:
# create the geocoder once and reuse it for every row
geolocator = Nominatim(user_agent="ny_explorer")

for index, row in starbucks_shops.iterrows():
    address = row["Capital"] + ", " + row["Country"]
    location = geolocator.geocode(address)
    if location is None:
        continue  # keep the 0.0 placeholders when the address cannot be resolved
    starbucks_shops.at[index,'Latitude'] = location.latitude
    starbucks_shops.at[index,'Longitude'] = location.longitude
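
A geocoder returns nothing when it cannot resolve an address, so the lookup is worth isolating in a small function that can be exercised without network access. The sketch below assumes a callable shaped like geopy's `Nominatim.geocode` (address string in, object with `.latitude` and `.longitude` or `None` out). `geocode_or_none` and `fake_geocode` are hypothetical names, and the stub stands in for the real service.

```python
from types import SimpleNamespace


def geocode_or_none(geocode, capital, country):
    """Return (latitude, longitude) for 'capital, country', or None if not found."""
    location = geocode(f"{capital}, {country}")
    if location is None:
        return None
    return location.latitude, location.longitude


def fake_geocode(address):
    """Offline stand-in for a real geocoder, for illustration only."""
    known = {
        "Baku, Azerbaijan": SimpleNamespace(latitude=40.3754434, longitude=49.8326748),
    }
    return known.get(address)


print(geocode_or_none(fake_geocode, "Baku", "Azerbaijan"))
print(geocode_or_none(fake_geocode, "Atlantis", "Nowhere"))
```

In the notebook, `geolocator.geocode` would be passed in place of `fake_geocode`.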
    

In [None]:
# Save the file
# starbucks_shops.to_csv("final_df.csv")

In [20]:
# load the file
# starbucks_shops = pd.read_csv("final_df.csv", index_col=0)

#### Starbucks Stores
The table below will be used in the next part as input to the Foursquare API to analyze store ratings.

In [21]:
starbucks_shops

Unnamed: 0,Capital,Country,Latitude,Longitude
0,Abu Dhabi,United Arab Emirates,23.997644,53.643910
1,Amman,Jordan,31.951569,35.923963
2,Amsterdam,Netherlands,52.374540,4.897976
3,Andorra la Vella,Andorra,42.506939,1.521247
4,Ankara,Turkey,39.921522,32.853793
5,Athens,Greece,37.984149,23.727984
6,Baku,Azerbaijan,40.375443,49.832675
7,Bandar Seri Begawan,Brunei,4.889545,114.941757
8,Bangkok,Thailand,13.753893,100.816080
9,Beijing,China,39.906217,116.391276


## Get shop ratings using FourSquare 

Let's first add the two columns below:
1. AvgRatings - the average rating of all Starbucks shops in the city
2. Ratings - a dictionary mapping the Foursquare ID of each shop to its rating

In [22]:
sLength = len(starbucks_shops)
starbucks_shops = starbucks_shops.assign(AvgRatings = pd.Series(np.zeros(sLength)).values)
starbucks_shops = starbucks_shops.assign(Ratings = pd.Series(np.zeros(sLength)).values)
starbucks_shops.head()

Unnamed: 0,Capital,Country,Latitude,Longitude,AvgRatings,Ratings
0,Abu Dhabi,United Arab Emirates,23.997644,53.64391,0.0,0.0
1,Amman,Jordan,31.951569,35.923963,0.0,0.0
2,Amsterdam,Netherlands,52.37454,4.897976,0.0,0.0
3,Andorra la Vella,Andorra,42.506939,1.521247,0.0,0.0
4,Ankara,Turkey,39.921522,32.853793,0.0,0.0


#### Complete the table

Let's fill in the AvgRatings and Ratings columns with the correct values.

In [None]:
CLIENT_ID = '**' #  Foursquare ID
CLIENT_SECRET = '**' #  Foursquare Secret
VERSION = '20180604'
query = 'Starbucks' # name to be searched
cat_id = '4bf58dd8d48988d1e0931735' # ID of the Coffee Shop category, used in the Foursquare search query
lim = 5 # at most 5 shops per city, because of the limit on the number of API queries

# Iterate over each capital city
# (the [15:] slice skips the first 15 rows; remove it to process every city)
for index, row in starbucks_shops[15:].iterrows():
    
    lat = row['Latitude']
    lon = row['Longitude']
   
    # to get a list of all starbucks shops in a given city
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&query={}&categoryId={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lon, 
        query, 
        cat_id,
        lim)

    
    store_results = requests.get(url).json()
    
    # creates a dictionary of store ids
    d = {}
    for strb in store_results['response']['venues']:
        d[strb['id']] = None

        
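    # to get the corresponding ratings of starbucks shops found above
    # keep the template separate from the formatted URL: reusing one variable
    # would make every request after the first one hit the first shop again
    url_template = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'
    for k in d.keys():

        url = url_template.format(k, CLIENT_ID, CLIENT_SECRET,VERSION)

        res = requests.get(url).json()
        try:
            d[k] = float(res['response']['venue']['rating'])
        except (KeyError, TypeError, ValueError):
            d[k] = 0.  # no rating available for this shop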
        
    
    # fill in the Ratings and AvgRatings columns
    starbucks_shops.at[index,'Ratings'] = d
    try:
        starbucks_shops.at[index,'AvgRatings'] = sum(d.values()) / len(d)
    except ZeroDivisionError:
        starbucks_shops.at[index,'AvgRatings'] = 0  # no shops found in this city
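
Because a shop without a rating is stored as 0.0, it pulls the city average down. One possible alternative, sketched here as an assumption rather than the notebook's method, averages only the shops that have a rating and returns `None` when none do. The helper name `average_rating` is hypothetical and the shop IDs in the example are made up.

```python
def average_rating(ratings):
    """Mean of the non-missing ratings in a {shop_id: rating} dict, or None.

    Ratings of None or 0.0 are treated as missing.
    """
    rated = [r for r in ratings.values() if r]
    if not rated:
        return None
    return sum(rated) / len(rated)


# Made-up IDs and values, for illustration only.
example = {"shop_a": 8.0, "shop_b": 0.0, "shop_c": 7.0}
print(average_rating(example))
```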

In [None]:
# Save the file
# starbucks_shops.to_csv("Ratings_final.csv")

In [23]:
# load the file
starbucks_shops_ratings = pd.read_csv('Ratings_final.csv', index_col=0)

#### Completed table

In [24]:
starbucks_shops_ratings

Unnamed: 0,Capital,Country,Latitude,Longitude,AvgRatings,Ratings
0,Abu Dhabi,United Arab Emirates,23.997644,53.643910,8.40,{'53089160498e151fa7f9a9f5': 8.4}
1,Amman,Jordan,31.951569,35.923963,9.10,"{'4bb624ee46d4a593f8c0c5c0': 9.1, '59fcc6bb286..."
2,Amsterdam,Netherlands,52.374540,4.897976,8.00,"{'529634bc11d2ab5263ab848c': 8.0, '50056329e4b..."
3,Andorra la Vella,Andorra,42.506939,1.521247,7.90,"{'5745a9cc498e4f0f9a79db85': 7.9, '595a40052bf..."
4,Ankara,Turkey,39.921522,32.853793,8.10,"{'4f25511ce4b04f6e6a110d1b': 8.1, '51386bcbe4b..."
5,Athens,Greece,37.984149,23.727984,7.90,"{'4b55dc00f964a52046f327e3': 7.9, '4b9f7ab4f96..."
6,Baku,Azerbaijan,40.375443,49.832675,8.00,"{'5698c3bc498eac8f3804d4df': 8.0, '576d0a8c498..."
7,Bandar Seri Begawan,Brunei,4.889545,114.941757,7.60,"{'52fc9a45498e79749d33f0d0': 7.6, '4c4d13aff5a..."
8,Bangkok,Thailand,13.753893,100.816080,8.30,"{'56c6a4c1498e6dd6d1bddb91': 8.3, '5a3c5fe36a5..."
9,Beijing,China,39.906217,116.391276,7.70,"{'4be7eebbee96c9289b6efdbf': 7.7, '5b373a893d4..."


Some cities in the table have no ratings (their AvgRatings is 0). We will filter those rows out and work only with the remaining cities.

In [25]:
idxs = starbucks_shops_ratings[starbucks_shops_ratings['AvgRatings'] == 0].index
idxs

Int64Index([11, 15, 22, 30, 54, 57, 60, 69], dtype='int64')

In [26]:
starbucks_shops_ratings_filtered = starbucks_shops_ratings.drop(index = idxs).reset_index(drop=True)
print(starbucks_shops_ratings_filtered.shape)
starbucks_shops_ratings_filtered

(69, 6)


Unnamed: 0,Capital,Country,Latitude,Longitude,AvgRatings,Ratings
0,Abu Dhabi,United Arab Emirates,23.997644,53.643910,8.40,{'53089160498e151fa7f9a9f5': 8.4}
1,Amman,Jordan,31.951569,35.923963,9.10,"{'4bb624ee46d4a593f8c0c5c0': 9.1, '59fcc6bb286..."
2,Amsterdam,Netherlands,52.374540,4.897976,8.00,"{'529634bc11d2ab5263ab848c': 8.0, '50056329e4b..."
3,Andorra la Vella,Andorra,42.506939,1.521247,7.90,"{'5745a9cc498e4f0f9a79db85': 7.9, '595a40052bf..."
4,Ankara,Turkey,39.921522,32.853793,8.10,"{'4f25511ce4b04f6e6a110d1b': 8.1, '51386bcbe4b..."
5,Athens,Greece,37.984149,23.727984,7.90,"{'4b55dc00f964a52046f327e3': 7.9, '4b9f7ab4f96..."
6,Baku,Azerbaijan,40.375443,49.832675,8.00,"{'5698c3bc498eac8f3804d4df': 8.0, '576d0a8c498..."
7,Bandar Seri Begawan,Brunei,4.889545,114.941757,7.60,"{'52fc9a45498e79749d33f0d0': 7.6, '4c4d13aff5a..."
8,Bangkok,Thailand,13.753893,100.816080,8.30,"{'56c6a4c1498e6dd6d1bddb91': 8.3, '5a3c5fe36a5..."
9,Beijing,China,39.906217,116.391276,7.70,"{'4be7eebbee96c9289b6efdbf': 7.7, '5b373a893d4..."


### K-means

Now that the table is ready, we will use the K-means algorithm to cluster the cities into 3 segments:

1. Cities with **High Ratings**
2. Cities with **Medium Ratings**
3. Cities with **Low Ratings**


In [27]:
# set number of clusters
kclusters = 3

grouped_clustering = starbucks_shops_ratings_filtered[['AvgRatings']]

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 0, 0, 0, 2, 2, 1, 2, 2, 1, 2, 1,
       0, 1, 1, 2, 0, 0, 1, 2, 1, 0, 0, 2, 1, 0, 2, 2, 2, 1, 2, 0, 2, 1,
       0, 2, 0, 0, 1, 0, 2, 0, 0, 1, 1, 2, 2, 2, 0, 0, 1, 0, 2, 0, 2, 2,
       0, 0, 1], dtype=int32)

In [None]:
starbucks_shops_ratings_filtered.insert(0, 'Cluster Labels', kmeans.labels_)

In [31]:
starbucks_shops_ratings_filtered

Unnamed: 0,Cluster Labels,Capital,Country,Latitude,Longitude,AvgRatings,Ratings
0,1,Abu Dhabi,United Arab Emirates,23.997644,53.643910,8.40,{'53089160498e151fa7f9a9f5': 8.4}
1,1,Amman,Jordan,31.951569,35.923963,9.10,"{'4bb624ee46d4a593f8c0c5c0': 9.1, '59fcc6bb286..."
2,2,Amsterdam,Netherlands,52.374540,4.897976,8.00,"{'529634bc11d2ab5263ab848c': 8.0, '50056329e4b..."
3,2,Andorra la Vella,Andorra,42.506939,1.521247,7.90,"{'5745a9cc498e4f0f9a79db85': 7.9, '595a40052bf..."
4,2,Ankara,Turkey,39.921522,32.853793,8.10,"{'4f25511ce4b04f6e6a110d1b': 8.1, '51386bcbe4b..."
5,2,Athens,Greece,37.984149,23.727984,7.90,"{'4b55dc00f964a52046f327e3': 7.9, '4b9f7ab4f96..."
6,2,Baku,Azerbaijan,40.375443,49.832675,8.00,"{'5698c3bc498eac8f3804d4df': 8.0, '576d0a8c498..."
7,2,Bandar Seri Begawan,Brunei,4.889545,114.941757,7.60,"{'52fc9a45498e79749d33f0d0': 7.6, '4c4d13aff5a..."
8,1,Bangkok,Thailand,13.753893,100.816080,8.30,"{'56c6a4c1498e6dd6d1bddb91': 8.3, '5a3c5fe36a5..."
9,2,Beijing,China,39.906217,116.391276,7.70,"{'4be7eebbee96c9289b6efdbf': 7.7, '5b373a893d4..."


##### Let's create 3 dataframes for low, medium, and highly rated cities

In [32]:
low_cities = starbucks_shops_ratings_filtered[starbucks_shops_ratings_filtered['Cluster Labels'] == 0]
medium_cities = starbucks_shops_ratings_filtered[starbucks_shops_ratings_filtered['Cluster Labels'] == 2]
high_cities = starbucks_shops_ratings_filtered[starbucks_shops_ratings_filtered['Cluster Labels'] == 1]
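
K-means assigns label numbers arbitrarily, so the mapping 0 = low, 2 = medium, 1 = high holds only for this particular run. A minimal sketch of deriving the mapping from the cluster centroids is given below. `name_clusters` is a hypothetical helper, and the centroid values in the example are invented for illustration.

```python
def name_clusters(centers, names=("Low", "Medium", "High")):
    """Map each cluster label to a name, ordered by ascending centroid value.

    `centers` is a flat sequence where centers[label] is that cluster's centroid.
    """
    order = sorted(range(len(centers)), key=lambda label: centers[label])
    return {label: name for label, name in zip(order, names)}


# Invented centroids: label 0 lowest, label 1 highest, label 2 in between.
mapping = name_clusters([7.0, 8.6, 7.9])
print(mapping)
```

With the fitted model from the cell above, the centroids could be passed as `kmeans.cluster_centers_.ravel()`, since clustering on a single column yields one value per cluster.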

#### Now, let's take a look at the cities with low Starbucks store ratings

In [37]:
low_cities

Unnamed: 0,Cluster Labels,Capital,Country,Latitude,Longitude,AvgRatings,Ratings
11,0,Berlin,Germany,52.517037,13.38886,6.9,"{'4adf77c3f964a520df7a21e3': 6.9, '4adf768af96..."
12,0,Bern,Switzerland,46.948271,7.451451,7.2,"{'4b7687ddf964a520b4502ee3': 7.2, '4b61bad2f96..."
13,0,Bogotá,Colombia,4.598077,-74.076103,6.7,"{'58d0742a951e7d532bac7fb2': 6.7, '54f83f18498..."
22,0,Dublin,Ireland,53.349764,-6.260273,7.3,"{'5290d704498e2d5c6000a181': 7.3, '4af1ea70f96..."
26,0,Jakarta,Indonesia,-6.175394,106.827183,7.3,"{'51047707e4b0059ce06c2cf0': 7.3, '5783a0ff498..."
27,0,Kuala Lumpur,Malaysia,3.151664,101.694303,6.8,"{'5902d2f4646e387f2b2d705c': 6.8, '538daac6498..."
31,0,London,United Kingdom,51.507322,-0.127647,6.8,"{'502904745dd7750e9d63bc17': 6.8, '4b77eb6ff96..."
32,0,Luxembourg,Luxembourg,49.815868,6.129675,7.2,"{'567eae1e498eab68b90d203b': 7.2, '568f8115498..."
35,0,Manila,Philippines,14.590622,120.97997,7.1,"{'4cd7475ab6962c0fc4b72f96': 7.1, '4bda44e6a8d..."
41,0,Nassau,Bahamas,25.078346,-77.338333,6.5,"{'4bcf7dd041b9ef3b078cf8e5': 6.5, '4ccafe8eba0..."


### Final Notes:

Now, WorldWideCoffee can use the table above to make decisions about its expansion.

It can choose the cities with the lowest ratings and open shops there.

Moreover, the Ratings column contains the rating of each store in the chosen city. WWC can use this data to further analyze the Starbucks shops and choose a location where Starbucks has the lowest rating.
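
As a sketch of that last step, the helper below picks the lowest-rated shop from one city's Ratings entry. After the table is reloaded with `pd.read_csv`, each Ratings value is the text form of a dictionary rather than a dictionary, so the helper parses strings with `ast.literal_eval` first. The name `lowest_rated_shop` is hypothetical, ratings of 0.0 are assumed to mean "missing", and the IDs in the example are made up.

```python
import ast


def lowest_rated_shop(ratings):
    """Return (shop_id, rating) of the lowest-rated shop, or None if none is rated.

    Accepts a {shop_id: rating} dict or its string form as stored in the CSV.
    """
    if isinstance(ratings, str):
        ratings = ast.literal_eval(ratings)
    rated = {shop_id: r for shop_id, r in ratings.items() if r}
    if not rated:
        return None
    shop_id = min(rated, key=rated.get)
    return shop_id, rated[shop_id]


# Made-up IDs and values, for illustration only.
print(lowest_rated_shop("{'shop_a': 8.1, 'shop_b': 6.5, 'shop_c': 0.0}"))
```

The returned ID could then be looked up through the venue details request to obtain that shop's coordinates.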