***
<h1>European Capitals Clustering</h1>
<h2>Coursera Capstone Final Project (week 5, part 1)</h2>

**Author: Ivan Letal**

Date: 2018/11/06
***

<p><img src="https://ec.europa.eu/programmes/creative-europe/sites/creative-europe/files/actions-capitals-culture.png" width="450">

<h1>Introduction</h1>

The culture of Europe is rooted in the art, architecture, film, different types of music, literature, and philosophy that originated from the continent of Europe. 

European culture is largely rooted in what is often referred to as its "common cultural heritage". Because of the great number of perspectives which can be taken on the subject, it is impossible to form a single, all-embracing conception of European culture. Nonetheless, there are core elements which are generally agreed upon as forming the cultural foundation of modern Europe. 

If I had to describe European culture in a few words, I would say: art, architecture, music, science, and cuisine. 

<h1>Objective</h1>

In this project, I focus only on the capital cities of Europe, on the assumption that they represent the culture of their countries and nations well.

Using machine learning (clustering), I will group the capitals into clusters to determine:
<br>
* Similarity or dissimilarity of the cities from a cultural point of view
* Assignment of capitals to clusters and their visualization

For the sake of the project, I will not use cuisine (food and restaurants) to describe the culture, because the data I have consist mostly of commercial venues, which would have a significant impact on the comparison. The data contain many types of restaurants, and anyone who knows Europe knows that nowadays you can find all kinds of cuisine in every European country.

As a tourist, you might find this information useful, for example, if you liked Prague (Czech Republic) you might as well like Berlin (Germany).

As a travel agent, you can also use the model to recommend different places to visit to your customers.

<h1>Data</h1>

The data come from Wikipedia, listing European countries and their capitals. This is pretty straightforward.

https://simple.wikipedia.org/wiki/List_of_European_countries

<p><img src="1.PNG">

To complement the data with geolocations (latitude and longitude), I use the Geopy library, which pulls data from Nominatim. 
Geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources.

https://geopy.readthedocs.io
https://nominatim.openstreetmap.org/
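
The geocoding step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `make_nominatim_lookup` and `geocode_capitals` and the `user_agent` string are made up for this example. The geopy import is done lazily so the rest of the sketch runs without network access.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Optional[Tuple[float, float]]


def make_nominatim_lookup(
    user_agent: str = "capitals-clustering-example",  # hypothetical identifier
) -> Callable[[str], Coordinates]:
    """Return a lookup function backed by geopy's Nominatim geocoder (needs network)."""
    from geopy.geocoders import Nominatim  # lazy import: only needed for real lookups

    geolocator = Nominatim(user_agent=user_agent)

    def lookup(query: str) -> Coordinates:
        location = geolocator.geocode(query)
        if location is None:  # nothing found for this query
            return None
        return (location.latitude, location.longitude)

    return lookup


def geocode_capitals(
    capitals: Iterable[Tuple[str, str]],
    lookup: Callable[[str], Coordinates],
) -> Dict[str, Coordinates]:
    """Map each capital to (latitude, longitude), or None when the lookup fails."""
    return {capital: lookup(f"{capital}, {country}") for capital, country in capitals}
```

With network access, `geocode_capitals([("Prague", "Czech Republic")], make_nominatim_lookup())` would query Nominatim once per city. Passing the lookup in as a parameter keeps the function easy to try with a stub, and the public Nominatim service expects a descriptive user agent and a low request rate, so a long list of capitals should be geocoded slowly.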

<p><img src="2.PNG" width="700">

Last but not least are the data from Foursquare, which I access through its API. 

https://foursquare.com/

The problem with these data, which I touched on in the Objective section, is accuracy: the captured data cannot guarantee a 100% correct grouping of cities in the real world. 

But the data are sufficient for this project and study.

<p><img src="3.PNG">

For comparing cities, I find a word cloud works best.

In the project itself (week 5), I will show in detail how the machine learning algorithm evaluates the clusters.

<h1>Methodology</h1>

The data from Wikipedia are scraped using BeautifulSoup (Python). The data are then converted to a Pandas (Python) DataFrame, and geolocations (from Nominatim, which is based on OpenStreetMap) are added using Geopy (Python). Using Foursquare data, I explore the neighborhoods of the listed cities for their most frequent venues, convert them to a one-hot encoded DataFrame, and merge it with the capital cities data. 
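
To make the encoding step more concrete, here is a dependency-free sketch of what one-hot encoding venues and averaging them per city amounts to. The function names and the sample rows are invented for illustration and are not the project's data. In a notebook the same result would more likely be obtained with Pandas.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def category_frequencies(
    venues: Iterable[Tuple[str, str]],
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Turn (city, category) pairs into one frequency vector per city.

    This equals one-hot encoding every venue and averaging the rows per city.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for city, category in venues:
        counts[city][category] += 1

    # Fixed column order shared by all cities.
    categories = sorted({cat for per_city in counts.values() for cat in per_city})
    table = {}
    for city, per_city in counts.items():
        total = sum(per_city.values())
        table[city] = [per_city.get(cat, 0) / total for cat in categories]
    return categories, table


def top_categories(categories: List[str], vector: List[float], n: int = 3) -> List[str]:
    """Return the n most frequent categories of one city (ties broken by name)."""
    ranked = sorted(zip(categories, vector), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, share in ranked[:n] if share > 0]


# Made-up sample rows, only to show the shape of the input.
sample_venues = [
    ("Prague", "Museum"), ("Prague", "Theater"),
    ("Prague", "Museum"), ("Prague", "Opera House"),
    ("Berlin", "Museum"), ("Berlin", "Concert Hall"),
]
categories, table = category_frequencies(sample_venues)
```

Each city ends up as a vector whose entries sum to 1, which makes cities with very different venue counts comparable before clustering.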

**Using the K-Means clustering method, I will group the capital cities into clusters.** Finally, each cluster is inspected for typical venue categories, which I present using a word cloud visualization.
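
For readers unfamiliar with K-Means, the following is a small pure-Python sketch of the basic algorithm (Lloyd's iteration) applied to toy vectors. It is meant only to show the idea and is not necessarily the implementation used in the project, where a library version would be the usual choice. The toy points are invented.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(
    points: List[Vector], k: int, iterations: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Basic K-Means: returns one cluster label per point and the centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k data points
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # nothing changed, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Invented frequency vectors: two "culture-heavy" and two "pub-heavy" cities.
points = [
    [0.50, 0.30, 0.20],
    [0.50, 0.25, 0.25],
    [0.05, 0.05, 0.90],
    [0.10, 0.05, 0.85],
]
labels, centroids = k_means(points, k=2)
```

The first two toy points end up sharing one label and the last two the other. The label numbers themselves carry no meaning; only the grouping does.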

<h1>Results</h1>

Using K-Means, the capital cities were divided into ten clusters and displayed on a Folium map (Python).

With geolocations added, the colors indicate which cluster each city belongs to.

<p><img src="4.PNG" width="700">

The geolocations of the cluster centroids are not particularly useful here, but for study purposes we can also display the centroids on a Folium map (Python).

<p><img src="5.PNG" width="700">

Foursquare is very useful for exploring the venues of each city and categorizing them. For example, the first ten rows for the capital Prague (Czech Republic) give an overview of what venues you can expect in the city.

<p><img src="6.PNG" width="700">

Let's use a word cloud to describe similar cities that were discovered using the described method and placed in the same cluster by K-Means (a machine learning algorithm).

As you can see, Berlin and Prague are similar cities, at least from a cultural point of view. What connects them are: Museum, Theatre, Art, Opera, Concert, Architecture.

<p><img src="8.PNG" width="700">

<h1>Discussion</h1>

The K-Means method works well for clustering capital cities. What is debatable is the Foursquare data, because for the project I excluded venues that fall into the categories of cuisine, food, and restaurants. Only venues in the categories 'Fun', 'Art', and 'Culture' are considered. From this dataset, the most common venue categories are used for clustering and hence for comparing two or more cities.

To claim an objective clustering, you need to understand the data, and it takes a little experimenting to find the best K for the K-Means algorithm, but that is a well-known problem.
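
One common way to experiment with K is the elbow heuristic: run K-Means for several values of K and look at how quickly the within-cluster sum of squared distances (inertia) drops. The sketch below is a compact pure-Python illustration on invented points. It is not the procedure used in this project, and `inertia_for_k` is a hypothetical helper name.

```python
import random
from typing import List, Sequence


def inertia_for_k(
    points: List[Sequence[float]], k: int, seed: int = 0, iterations: int = 50
) -> float:
    """Run a basic K-Means and return the within-cluster sum of squared distances."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters: List[list] = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return sum(
        min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centroids)
        for p in points
    )


# Invented points forming three loose groups.
toy_points = [
    [0.0, 0.0], [0.1, 0.2],
    [5.0, 5.0], [5.2, 4.9],
    [9.0, 0.0], [9.1, 0.3],
]
inertias = {k: inertia_for_k(toy_points, k) for k in range(1, len(toy_points) + 1)}
```

Plotting `inertias` against K and picking the point where the curve flattens gives a starting value. The final choice still depends on whether the resulting clusters make sense for the data.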

What is especially challenging is tuning the Foursquare data: the results depend very much on how you limit the query (API method) and what radius in meters you use. For instance, tiny Vatican City does not contain many venues within the radius that was used for a large city such as Paris.

Also, as you may have observed, some countries that are not typically considered European are included, because part of their territory is technically in Europe, for example Russia and Turkey. However, this is a valid assumption.

<h1>Conclusion</h1>

The Foursquare API is very powerful if used correctly; its advantage is that people use it all around the world. I like the discussion thread attached to every venue, which lets me read recommendations when I am looking to try something new.

Using the described method, I learned that Prague, Berlin, Paris, and also Kiev are similar cities in terms of culture, what people like, and historic architecture. I have never been to the last three of the four, so I might give them a chance.

One interesting finding: using K-Means (K=5) with all venue categories unfiltered, Dublin (capital of Ireland) turns out to be an outlier because of its large number of beer pubs.