# Starbucks City Mugs Predictor
#### Applied Data Science Capstone Project by Sergey Timoshpolskiy, September 2019
Contact: timoshpolsky (at) gmail.com

This notebook is a report on the capstone project for the Coursera Applied Data Science course.

## 1. Introduction
According to Wikipedia, Starbucks Corporation is an American coffee company and coffeehouse chain. Starbucks was founded in Seattle, Washington in 1971. As of early 2019, the company operates over 30,000 locations worldwide. 

For most major locations where Starbucks is present, a collectible city mug is released that is sold only in that specific city. There are also mugs for countries, states, and territories. Here is an example of what these mugs might look like:

![New York City Been There Starbucks mug](new_york_city_bts.jpg "New York City") ![Toronto Been There Starbucks mug](toronto_bts.jpg "Toronto")

The fact that the mugs are sold only in the locations they are dedicated to makes them valuable collectibles for people traveling around the world. There is a whole community of mug collectors who hunt for and trade Starbucks mugs. The mugs come in different designs and sizes (called series), totaling several thousand collectibles. The community of collectors numbers at least in the tens of thousands.

In this project I will try to answer the following questions using data:
1. Which cities are the most likely candidates for upcoming Starbucks mug releases?
2. Which series of mug is most likely to be released for each of these cities?

Besides the collectors' community mentioned above, Starbucks' marketing department is another potential audience. Maybe one day they will find this research helpful.

## 2. Data

The general idea is to combine data from multiple sources to answer the questions above and maybe gain some insights that could not be found in any single source of information. Here is a description of the data sources used in this research:

1. The list of the largest cities in the World by population
 
 There is an [article on Wikipedia](https://en.wikipedia.org/wiki/List_of_largest_cities) listing the world's largest cities. It seems to include every city in the world with a population of more than 1 million. That could be a good start, but it might not be enough for my research. One of the most recently released Starbucks city mugs was for Nice, France, whose population is about 345,000 people. That means I need to work with a list of cities with a population of at least 300,000. I will try using the United Nations open data sources available [here](https://unstats.un.org/unsd/demographic-social/products/dyb/dyb_2017/). Conveniently, these datasets list geocoordinates for each city, along with country, region and population data. The dataset will be exported from the UN site as a \*.csv file.
 
2. A way to tell whether Starbucks is present in a given country

 To filter out of the list above the countries where there are no Starbucks shops at all, I will use this official document from the Starbucks website: [Store Counts: by Market](https://investor.starbucks.com/financial-data/supplemental-financial-data/default.aspx). It is a simple MS Excel spreadsheet that gives the number of Starbucks shops per country. I will transform it into a list of countries and use that list as a filter.
 
3. A way to tell whether Starbucks is present in a given city

 This is where the Foursquare venue search API comes in handy. For every city in the list, I will query Foursquare for a venue named Starbucks within a 10 km radius of the city's coordinates. If the result of such a query is not empty, I will consider Starbucks to be present in that city.

4. The list of existing Starbucks collectible city mugs

 There is no official Starbucks city mug catalog. Instead, there are plenty of community-supported mug lists and databases, such as [Fredorange.com](http://fredorange.com), [Michael Blass's website](http://mugs.m-blass.de) and others. I could have scraped one of these websites, but luckily I have access to an incomplete but well-structured Starbucks city mug database. That database serves as the back end of my own iOS application, [Star.Mugs](https://itunes.apple.com/us/app/star-mugs-collectors-tool/id1299038579). Each existing mug has several attributes: city, state (if available), country, series, and release date (year and month). It took several enthusiastic community members some time and effort to clean and structure this data. Although the list is not complete (not all existing mug series have been added yet), it should be a good starting point for the research. The data will be provided as a JSON file.
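
The Foursquare presence check from item 3 could be sketched roughly as below. This is only an illustration under assumptions: it targets the legacy Foursquare v2 venue search endpoint, the version date and the helper names `build_search_params` and `has_starbucks` are hypothetical, and the credentials are placeholders. The sketch builds the query and inspects a response without sending any request itself.

```python
from typing import Any, Dict

# Legacy Foursquare v2 venue search endpoint (an assumption; the report
# does not state which endpoint or API version was used).
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_params(lat: float, lon: float, client_id: str,
                        client_secret: str, radius_m: int = 10_000) -> Dict[str, Any]:
    """Build query parameters for a 'Starbucks near this point' search."""
    return {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": "20190901",                 # API version date (assumed value)
        "ll": f"{lat},{lon}",            # latitude,longitude of the city
        "query": "Starbucks",
        "radius": radius_m,              # 10 km, expressed in meters
    }


def has_starbucks(payload: Dict[str, Any]) -> bool:
    """Return True if the search response lists a venue named like Starbucks."""
    venues = payload.get("response", {}).get("venues", [])
    return any("starbucks" in v.get("name", "").lower() for v in venues)
```

In a notebook, the parameters would typically be sent with something like `requests.get(SEARCH_URL, params=...)` and the decoded JSON passed to `has_starbucks`. Checking the venue name explicitly guards against fuzzy matches that merely mention the query.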

## 3. Methodology

#### Listing the candidate cities for potential mug release

From the way Starbucks mugs have been released over the years, we can tell that the general idea is "major cities get their mugs". How do we tell whether a city is "major"? For the purpose of this research we will use a city's population as the measure of how major it is. Obviously, other characteristics can also make a city or place major, e.g. being historically important or being a major tourist attraction. But we leave them aside for now and focus on population alone.

The first thing we want to do is join the dataset of the world's largest cities by population with the dataset of existing Starbucks city mugs. Combining them gives us a list of the most populous cities in the world that do not have a dedicated Starbucks mug so far. To turn this list into a list of actual candidates for the next mug, we want to filter out:
    
 - the cities from the countries that don't have Starbucks stores yet (using Starbucks official countries list)
 - the cities that don't have Starbucks stores yet (using Foursquare API)
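
The join and the two filters above could be sketched as follows. This is a minimal plain-Python illustration, not the notebook's actual code: the record layout (`name`, `country`, `population`), the function name `candidate_cities`, and the `city_has_store` callback (standing in for the Foursquare lookup) are all assumptions.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

City = Dict[str, Any]  # assumed keys: "name", "country", "population"


def candidate_cities(cities: List[City],
                     mug_cities: Set[Tuple[str, str]],
                     starbucks_countries: Set[str],
                     city_has_store: Callable[[City], bool]) -> List[City]:
    """Return cities without a mug that pass both Starbucks presence filters.

    The result is ordered from most to least populous.
    """
    result = []
    for city in sorted(cities, key=lambda c: c["population"], reverse=True):
        if (city["name"], city["country"]) in mug_cities:
            continue  # a dedicated mug already exists
        if city["country"] not in starbucks_countries:
            continue  # no Starbucks stores in this country
        if not city_has_store(city):
            continue  # no Starbucks store found near the city
        result.append(city)
    return result
```

The cheap set lookups run first, so the comparatively expensive per-city store check is only made for cities that survive the mug and country filters.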
 
#### Predicting the series of mugs to be released in those cities

After the list of candidates is put together, we want to predict the types of mugs to be released for those cities, using the existing mug database as a training dataset. Since there is a limited list of possible mug types (series), this is a classification problem. Since there are more than two series, it is a multiclass classification problem.

The mug series seems to depend on the following features:
- country / region (right now most Starbucks shops in the Americas sell Been There series mugs and most shops in Europe sell You Are Here mugs)
- location (a numeric way to summarize country and region)
- release date, i.e. year (in the US the Global Icon series was introduced in 2008, the You Are Here series replaced it in 2013 and was in turn replaced by the Been There series in 2018)

The dependence of mug series on year and country is shown in the charts below:

![Mug series by Years](series_by_years.png)
![Mug series by Countries](series_by_countries.png)

Two machine learning classification models were used to predict the series of the mugs Starbucks might release next:

#### K Nearest Neighbor classification model

These three features were used to train the KNN classifier:
- location latitude
- location longitude
- release year

For all three of these numerical values it makes sense to treat a difference as a distance.

The prediction dataset was populated with each candidate city's actual coordinates and the year 2019, so that the nearest neighbors are the closest cities that already have a mug and the predicted series is the one nearest in release year. The intuition is as follows: if an Icon mug was released in a neighboring city in 2009 and a Been There mug in 2018, the Been There series should be the more likely one for the city being predicted in 2019.
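
To make the idea concrete, here is a minimal plain-Python sketch of k-nearest-neighbor voting over (latitude, longitude, year) vectors. It is not the notebook's implementation (a library classifier was presumably used there); the names `Sample`, `euclidean` and `knn_predict` are hypothetical, and the default k = 8 simply mirrors the value reported in the Results section.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# One training sample: ((latitude, longitude, release_year), series)
Sample = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 8) -> str:
    """Return the most common series among the k samples nearest to `query`."""
    nearest = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(series for _, series in nearest)
    return votes.most_common(1)[0][0]
```

A candidate city would be queried as `knn_predict(train, (lat, lon, 2019))`. Note that plain Euclidean distance on raw degrees and years is a simplification: it is not a true geographic distance, and it mixes units of different scale.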

#### Decision Tree classification model

For the decision tree classifier the features used were:

- country
- release year

The prediction dataset was populated with each candidate city's country and the year 2019.

## 4. Results

#### Classification model accuracy

Using a train/test split, the KNN classifier reached a prediction accuracy of 0.72 for k = 8, and the Decision Tree classifier reached 0.68. But honestly, as a Starbucks mug collector myself, I find the Decision Tree predictions much more plausible. What probably happens with the KNN classifier is that the two location features outweigh the third feature, the year. I will have to investigate this a little further. For the Results section I will stick to the Decision Tree results for now.
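
One way to probe the suspicion that location outweighs year would be to standardize every feature before computing KNN distances, so that a spread of a few years counts as much as a spread of many degrees. The sketch below is an assumption about how that could be done, not something that was tried in this project; `fit_scaler` and `scale` are hypothetical helper names.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_scaler(rows: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute the per-column mean and standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stdevs


def scale(row: Sequence[float], means: Sequence[float],
          stdevs: Sequence[float]) -> List[float]:
    """Express each feature as a number of standard deviations from its mean."""
    return [(x - m) / s for x, m, s in zip(row, means, stdevs)]
```

The scaler would be fitted on the training features only, and the same means and deviations would then be applied to the 2019 prediction rows, so that training and prediction vectors live in the same space.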

#### Top 10 cities that do not have Starbucks mugs

Here are the 10 most populous cities in the world without a mug, limited to one per country, that are likely to get a Starbucks mug soon:

| Country | City | Population | Predicted Series |
|---------|------| ----------:|:----------------:|
| India | Bengaluru | 5,104,047 | You Are Here |
| Egypt | Alexandria | 3,811,516 | You Are Here |
| Japan | Yokohama | 3,574,443 | You Are Here |
| China | Shiyan | 3,460,000 | You Are Here |
| South Africa | Cape Town | 3,433,441 | You Are Here |
| South Korea | Incheon | 2,628,000 | You Are Here |
| Brazil | Fortaleza | 2,400,000 | Been There |
| Indonesia | Surabaya | 2,374,658 | You Are Here |
| Italy | Rome | 2,318,895 | You Are Here |
| United States | Brooklyn | 2,300,664 | Been There |

Let's quickly go over this table line by line and double check the results.

##### 1. Bengaluru, India

Indeed, Bengaluru (Bangalore) is the third most populous city in India. According to the official Starbucks [Store Locator](https://www.starbucks.com/store-locator), it does have multiple Starbucks coffee shops. Both of the more populous Indian cities, Mumbai and New Delhi, have You Are Here mugs. If Starbucks decided to expand its Indian mug collection right now, the result would most probably be a You Are Here mug for Bangalore.

##### 2. Alexandria, Egypt

Alexandria, Egypt's second largest city, has never had a Starbucks mug. Cairo and Sharm El Sheikh had their Icon mugs in 2009. There are also country mugs for Egypt: an Icon in 2009 and a Christmas version of You Are Here in 2018. It would be reasonable to expect a You Are Here mug for Alexandria at some point. Cairo (as the biggest city and the capital) and Sharm El Sheikh (as a popular tourist destination) might get their You Are Here mugs soon as well.

##### 3. Yokohama, Japan

There is an insight behind this result that our datasets were not aware of. There are special collections of Japanese city mugs that include Yokohama; we just don't have these series in our mug database yet. The only You Are Here mug Japan has is the country mug. There are several seasonal versions of it, but there is no sign of You Are Here mugs being released for Japanese cities. This is one of the cases where we get inaccurate predictions because of the quality of the data fed to the model.

##### 4. Shiyan, China

The Chinese You Are Here mug family is one of the biggest and is still growing. Any fairly big Chinese city could be expected to get a YAH mug soon. I will go deeper into the Chinese city data to pull up the top 5 cities in China that do not have mugs yet. Any of those might get a mug soon.

##### 5. Cape Town, South Africa

The fun fact about this prediction is that Cape Town does not have a Starbucks store yet. For some reason Foursquare reports that there is a location, and that is how Cape Town got on our list. The opening of Starbucks in Cape Town is actually a long-awaited event. The South African cities of Johannesburg, Pretoria and Durban all have their You Are Here mugs, so having one for Cape Town is just a matter of time.

##### 6. Incheon, South Korea

Just like Yokohama, Incheon has multiple Starbucks stores, but there is already a mug for Incheon, in the Asian Artsy series, which is not in our database. There are no You Are Here mugs in South Korea so far, so this prediction is quite questionable.

##### 7. Fortaleza, Brazil

According to the Starbucks [Store Locator](https://www.starbucks.com/store-locator), there are no Starbucks stores in Fortaleza. Again, for some reason Foursquare reports the opposite, and that is why we see Fortaleza on our list. If we assume that Starbucks opens a store in Fortaleza, Been There would be the most reasonable series to expect there, since Rio de Janeiro and São Paulo got their Been There mugs just last November.

##### 8. Surabaya, Indonesia

Surabaya is a good find by our analysis, but again it would be wrong to expect a You Are Here mug for it, since there is already an Indonesia-special series mug that we did not have in our mug database. No You Are Here mugs have appeared in Indonesia so far.

##### 9. Rome, Italy

Much like Cape Town, Rome does not have a Starbucks store yet, but one is expected to open soon. You Are Here mugs for Italy and Milan were released in November 2018, so most probably we are going to see a You Are Here mug for Rome as soon as the first store opens there.

##### 10. Brooklyn, New York

This is the funniest catch of the entire research. Obviously, Brooklyn is not a city itself. Rather, it is one of the five boroughs of New York City. Since it is treated as a city by the United Nations database we used, it is reported to be the most populous "city" in the US without its own Starbucks mug. If Brooklyn were to get a mug right now, it would most probably be a Been There one, as this is the current series of US mugs, introduced by Starbucks in 2018. I will dig a little deeper into the US cities to see if there are any actual major cities that still don't have a mug.

#### Digging deeper into the US and China results

##### Top 5 cities in China

Any of these five cities might get a You Are Here mug anytime soon:

| City | Population | Predicted Series |
|------| ----------:|:----------------:|
| Shiyan | 3,460,000 | You Are Here |
| Tangshan | 3,372,102 | You Are Here |
| Zibo | 3,129,228 | You Are Here |
| Huai'an | 2,700,000 | You Are Here |
| Dadonghai | 2,000,000 | You Are Here |

##### Top 5 cities in the United States

| City | Population | Predicted Series |
|------| ----------:|:----------------:|
| Brooklyn | 2,300,664 | Been There |
| Queens | 2,272,771 | Been There |
| Manhattan | 1,487,536 | Been There |
| The Bronx | 1,385,108 | Been There |
| San Jose | 1,026,908 | Been There |

Four of the five most populous "cities" in the US without a Starbucks mug are boroughs of New York City! All the remaining US cities without mugs seem to have populations under one million. That makes San Jose, California a decent candidate for its own mug some time in the future.

## 5. Discussion

As a side effect of this research, some fun facts were found about existing Starbucks city mugs.


#### Smallest cities

Here are the 5 smallest cities in terms of population that have dedicated Starbucks mugs:

| city | country | population |
|------|---------| ----------:|
| Corfu | Greece | 27,003 |
| Bodrum | Turkey | 39,317 |
| Antigua | Guatemala | 39,368 |
| Roermond | Netherlands | 44,975 |
| Lucerne | Switzerland | 57,066 | 

Although not very useful, this is still a fun insight that could not have been obtained from any single one of the existing data sources. Only by combining them could we arrive at this small table.

#### Cities not found in the UN dataset

Several cities from the mug database were not found in the UN dataset. As it turned out, these were not cities at all but special locations that have Starbucks mugs dedicated to them. With that information we can enrich the existing Star.Mugs application database with additional details about the mugs. Here are, as an example, three such finds with descriptions from Wikipedia.

##### Los Cabos 
Los Cabos is not a city in Mexico, but a municipality located at the southern tip of Mexico's Baja California Peninsula. It encompasses the two towns of Cabo San Lucas and San José del Cabo, linked by a twenty-mile Resort Corridor of beach-front properties.

##### Penang 
Penang is a state in northwest Malaysia comprising mainland Seberang Perai and Penang Island. The state capital of Penang state is George Town.

##### Ruhrgebiet
Ruhrgebiet is the German name for the Ruhr, also referred to as the Ruhr district, Ruhr region, Ruhr area or Ruhr valley, a polycentric urban area in North Rhine-Westphalia, Germany. With a population density of 2,800/km² and a population of over 5 million, it is the largest urban area in Germany and the third-largest in the European Union. Not a city though!

## 6. Conclusion

Obviously, this research was not supposed to solve any serious problems or answer any of the tough questions humanity faces these days. Mainly, it was a way for me to get some hands-on practice with real data.

There are several thoughts, though, that I would like to share to conclude the research.

- There is a lot of data, and much of it is openly available to everyone right now.
- This data may not be clean or complete. 
- Relevant data may come from different and completely independent sources.
- Although combining data from different sources is not always easy or straightforward, by doing so we can sometimes find answers to our questions, and sometimes even to questions we did not ask in the first place.

I would also love for Starbucks to release any of the mugs predicted above. Yes, we know Brooklyn is not a city, but it is the most populous entity in the US without a Starbucks mug. Why not give it one? Hey, Starbucks, what do you think? There is now solid (data) scientific ground for such a move!