# A Simple Example of Clustering 

You are given a much larger set of country data. Using the same methodology as the one in the lecture, group all the countries into 2 clusters. 

Try with other numbers of clusters and see if they match your expectations. Maybe 7 is going to be a cool one!

Plot the data using the <i> c </i> parameter to separate the data by the clusters we defined.  

<i> Note: c stands for color </i>

## Import the relevant libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.cluster import KMeans

## Load the data

Load data from the csv file: <i> 'Countries_exercise.csv'</i>.

In [None]:
# Load the data
raw_data = pd.read_csv('Countries_exercise.csv')
# Check the data
raw_data

Make a working copy of the raw data. If your file contains a duplicate index column, remove it from the dataset at this step. 

In [None]:
data = raw_data.copy()
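
As a rough sketch of how a duplicate index column could be dropped, the cell below uses a small stand-in DataFrame rather than the real file. The column name `'Unnamed: 0'` is an assumption (it is the name pandas gives to a header-less column when reading a CSV), and the coordinates are illustrative values only.

```python
import pandas as pd

# Toy stand-in for the raw data (hypothetical rows, illustrative coordinates).
# 'Unnamed: 0' is an ASSUMED name for the duplicate index column.
raw_example = pd.DataFrame({
    'Unnamed: 0': [0, 1, 2],
    'name': ['Aruba', 'Afghanistan', 'Angola'],
    'Longitude': [-69.98, 66.00, 17.54],
    'Latitude': [12.52, 33.84, -12.29],
})

# Work on a copy so the raw data stays untouched
data_example = raw_example.copy()

# Drop the duplicate index column; errors='ignore' makes this a no-op
# when the column is not present
data_example = data_example.drop(columns=['Unnamed: 0'], errors='ignore')

print(data_example.columns.tolist())
```

If your file has no such column, the `drop` call leaves the data unchanged, so the same cell can be run either way.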

## Plot the data

Plot the <i>'Longitude'</i> and <i>'Latitude'</i> columns. 

In [None]:
plt.scatter(data['Longitude'], data['Latitude'])
plt.xlim(-180,180)
plt.ylim(-90, 90)
plt.show()

## Select the features

Create a copy of that data and remove all parameters apart from <i>Longitude</i> and <i>Latitude</i>.

In [None]:
# Keep columns 1 and 2 by position (expected to be Longitude and Latitude)
x = data.iloc[:, 1:3]
x
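
Selecting by position only works if the columns are in the expected order. As a possible alternative, the sketch below selects the two features by name on a small stand-in DataFrame (hypothetical rows, illustrative coordinates) and compares the result with the positional slice.

```python
import pandas as pd

# Toy stand-in for the dataset (hypothetical rows, illustrative coordinates)
data_example = pd.DataFrame({
    'name': ['Aruba', 'Afghanistan', 'Angola'],
    'Longitude': [-69.98, 66.00, 17.54],
    'Latitude': [12.52, 33.84, -12.29],
})

# Positional selection, as in the cell above
x_by_position = data_example.iloc[:, 1:3]

# Selection by column name; .copy() gives an independent DataFrame
x_by_name = data_example[['Longitude', 'Latitude']].copy()

# Both approaches should pick out the same two columns
print(x_by_position.equals(x_by_name))
```

Selecting by name keeps working if extra columns are added to the file, at the cost of failing with a `KeyError` if a column is renamed.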

## Clustering

Here's the actual solution: 

Simply change <i> kmeans = KMeans(2) </i> to <i> kmeans = KMeans(7) </i>, or to any other number of clusters you want to try. 

Then run the remaining cells until the end.

In [None]:
kmeans = KMeans(7)

In [None]:
kmeans.fit(x)
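
When trying other numbers of clusters, it can help to compare them side by side instead of editing the cell by hand each time. The sketch below is one common way to do that, not part of the exercise itself: it fits `KMeans` for several values of k on a small stand-in set of coordinates (illustrative values) and records the within-cluster sum of squares from the fitted model's `inertia_` attribute. The names `x_example` and `wcss`, and the `n_init` and `random_state` settings, are assumptions made for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for the feature matrix (illustrative coordinates only)
x_example = pd.DataFrame({
    'Longitude': [-100.0, -95.0, -70.0, -60.0, 10.0, 20.0, 105.0, 135.0],
    'Latitude': [40.0, 35.0, -10.0, -30.0, 50.0, 45.0, 35.0, -25.0],
})

# Fit one model per candidate number of clusters and store its inertia
wcss = {}
for k in range(1, 6):
    # random_state fixes the initialisation so repeated runs agree
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(x_example)
    wcss[k] = model.inertia_

for k, value in wcss.items():
    print(k, round(value, 1))
```

A lower value means tighter clusters, but it keeps shrinking as k grows, so the numbers are a guide for comparison rather than a rule for picking k.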

### Clustering Results

In [None]:
# fit_predict fits the model again and returns the cluster label of each observation
identified_clusters = kmeans.fit_predict(x)
identified_clusters

In [None]:
data_with_clusters = data.copy()
data_with_clusters['Cluster'] = identified_clusters
data_with_clusters
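
Before plotting, a quick sanity check is to count how many countries ended up in each cluster. The sketch below does this on a small stand-in dataset with two clearly separated groups; the rows, the coordinates, and the `n_init` and `random_state` settings are assumptions made for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in dataset: three points in the Americas, three in East Asia
# (hypothetical rows, illustrative coordinates)
data_example = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Longitude': [-100.0, -95.0, -75.0, 105.0, 128.0, 138.0],
    'Latitude': [40.0, 35.0, 4.0, 35.0, 36.0, 36.0],
})
x_example = data_example[['Longitude', 'Latitude']]

# Cluster into 2 groups and attach the labels to a copy of the data
kmeans_example = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans_example.fit_predict(x_example)
clustered = data_example.copy()
clustered['Cluster'] = labels

# Number of rows assigned to each cluster label
cluster_sizes = clustered['Cluster'].value_counts().sort_index()
print(cluster_sizes)
```

The label numbers themselves are arbitrary; only the grouping matters, so a cluster called 0 in one run may be called 1 in another.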

In [None]:
# Colour each point by its cluster label
plt.scatter(data_with_clusters['Longitude'], data_with_clusters['Latitude'], c=data_with_clusters['Cluster'], cmap='rainbow')
plt.xlim(-180,180)
plt.ylim(-90, 90)
plt.show()