# DataCamp Certification Case Study

### Project Brief

You are on the data science team of a coffee company that is looking to expand its business into Ukraine. The company wants to understand the existing coffee shop market there.

You have a dataset from Google businesses. It contains information about coffee shops in Ukraine. The marketing manager wants to identify the key coffee shop segments. They will use this to construct their marketing plan. In their current location, they split the market into 5 segments. The marketing manager wants to know how many segments are in this new market, and their key features.

You will be presenting your findings to the Marketing Manager, who has no data science background.

The data you will use for this analysis can be accessed here: `"data/coffee_shops.csv"`

### Table of Contents
* [Getting to know the dataset](#getting-to-know)
* [Regrouping](#regroup)




In [None]:
# import packages
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set()
%matplotlib inline


### Getting to know the dataset <a class='anchor' id='getting-to-know'/>
The case study's data description (shown below) states that a missing value in the `Delivery option`, `Dine in option` or `Takeout option` column means the shop does not offer that option, i.e. the value is actually False. So let's fill those gaps accordingly!
<img src='data/descr.png'>

In [2]:
shops = pd.read_csv('data/coffee_shops.csv')
# a missing service option means the shop does not offer it; cast so the columns are bool
option_cols = ['Delivery option', 'Dine in option', 'Takeout option']
shops[option_cols] = shops[option_cols].fillna(False).astype(bool)
shops.head(3)

| | Region | Place name | Place type | Rating | Reviews | Price | Delivery option | Dine in option | Takeout option |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Mariupol | Dim Kavu | Coffee store | 4.6 | 206.0 | NaN | False | False | False |
| 1 | Mariupol | Коферум | Cafe | 5.0 | 24.0 | $$ | False | False | True |
| 2 | Mariupol | Кофейня Світ Чаю | Coffee shop | 5.0 | 11.0 | NaN | False | False | True |
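The fill above leans on pandas converting the filled object columns back to booleans, a silent downcast that recent pandas releases deprecate. A minimal, self-contained check on a hypothetical three-row frame that mimics the raw CSV (`True` where an option is offered, missing otherwise) might look like this. The explicit `astype(bool)` makes the dtype independent of that behaviour, and it has to come after `fillna`, because casting a missing value straight to bool gives `True`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mimicking the raw CSV: True where an option is offered, NaN otherwise
raw = pd.DataFrame({
    'Delivery option': [np.nan, np.nan, True],
    'Dine in option': [True, np.nan, np.nan],
    'Takeout option': [np.nan, True, True],
})

option_cols = ['Delivery option', 'Dine in option', 'Takeout option']
filled = raw.copy()
for col in option_cols:
    # fill first: astype(bool) on its own would turn NaN into True
    filled[col] = filled[col].fillna(False).astype(bool)

print(filled.dtypes.tolist())
print(filled.sum().tolist())  # number of shops offering each option
```

Replacing one column at a time keeps the result a plain bool column regardless of how a given pandas version handles multi-column assignment.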


In [3]:
print('Number of rows and columns:', shops.shape)
shops.info()

Number of rows and columns: (200, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Region           200 non-null    object 
 1   Place name       200 non-null    object 
 2   Place type       200 non-null    object 
 3   Rating           198 non-null    float64
 4   Reviews          198 non-null    float64
 5   Price            122 non-null    object 
 6   Delivery option  200 non-null    bool   
 7   Dine in option   200 non-null    bool   
 8   Takeout option   200 non-null    bool   
dtypes: bool(3), float64(2), object(4)
memory usage: 10.1+ KB
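The `Non-Null Count` column above already shows where data is missing: 2 of the 200 rows lack `Rating` and `Reviews`, and 78 lack `Price`. A compact way to see the same gaps as counts and percentages is a small helper such as `missing_report` below. It is a hypothetical name (not part of pandas), sketched here on a toy frame with made-up values.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing-value count and percentage per column, largest first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        'missing': counts,
        'percent': (100 * counts / len(df)).round(1),
    })
    # keep only the columns that actually have gaps
    return report[report['missing'] > 0].sort_values('missing', ascending=False)


# Toy frame with made-up values, shaped like a few of the shops columns
toy = pd.DataFrame({
    'Region': ['Mariupol', 'Mariupol', 'Mariupol', 'Mariupol'],
    'Rating': [4.6, 5.0, np.nan, 4.8],
    'Price': [np.nan, '$$', np.nan, '$$'],
})
print(missing_report(toy))
```

Run on `shops` after the fill above, it should list `Price` (78 missing, 39.0%) followed by `Rating` and `Reviews` (2 each, 1.0%), the same gaps the `info()` output reports.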


In [4]:
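# taking care of duplicates and/or misspelled place names
def correct_names(places: list) -> list:
    """Map spelling variants of the same place to one canonical name."""
    corrected = list(places)  # work on a copy so the input list is left untouched
    for idx, place in enumerate(corrected):
        name = place.lower()  # compare case-insensitively
        if name.startswith('dim ka'):
            corrected[idx] = 'dim kavy'
        elif 'gangster' in name:
            corrected[idx] = 'gangster coffee shop'
        elif 'aroma' in name:
            corrected[idx] = 'aroma kava (coffee)'
        elif 'art coffee' in name:
            corrected[idx] = 'art coffee'
        elif name.startswith('смажимо каву'):
            corrected[idx] = 'смажимо каву'
        elif 'my coffee' in name:
            corrected[idx] = 'my coffee'  # assumed target name: this line was truncated in the source
    return corrected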
        
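The if/elif chain in `correct_names` can also be expressed as an ordered rule table and applied to the whole column with `Series.map`. The sketch below is one possible variant, not the notebook's own code: `RULES` and `normalize_name` are hypothetical names, the `my coffee` rule is left out because its target name is cut off in the source cell, and the test inputs are a few of the `Place name` values listed below.

```python
import pandas as pd

# Hypothetical rule table: (test on the lower-cased name, canonical name), checked in order
RULES = [
    (lambda n: n.startswith('dim ka'), 'dim kavy'),
    (lambda n: 'gangster' in n, 'gangster coffee shop'),
    (lambda n: 'aroma' in n, 'aroma kava (coffee)'),
    (lambda n: 'art coffee' in n, 'art coffee'),
    (lambda n: n.startswith('смажимо каву'), 'смажимо каву'),
]


def normalize_name(name: str) -> str:
    """Return the canonical name of the first matching rule, else the name unchanged."""
    lowered = name.lower()
    for test, canonical in RULES:
        if test(lowered):
            return canonical
    return name


names = pd.Series([
    'Dim Kavu',
    'Gangster coffee shop',
    'ПЕРША ДЕГУСТАЦІЙНА ЗАЛА КАВИ "GANGSTER_COFFEE SHOP 3"',
    'Aroma kava',
    'Коферум',
])
print(names.map(normalize_name).tolist())
```

On the notebook's data this would be `shops['Place name'] = shops['Place name'].map(normalize_name)`. Keeping the rules in a list makes it easy to add a new spelling variant without touching the matching loop, and, as in the if/elif chain, the first matching rule wins.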

In [5]:
shops['Place name'].unique().tolist()

['Dim Kavu',
 'Коферум',
 'Кофейня Світ Чаю',
 'Кофейня Starcoff',
 'Кофейня "Friend Zone"',
 'Racers Coffee Shop',
 'Займемся Кофе',
 'Кофейня Rit Rit',
 "Кав'ярня My coffee",
 'LENЬ. Coffee & desserts.',
 'SOVA COFFEE',
 'Кава Тайм',
 'Skver кафе',
 'Кафе на Георгіївській',
 'Khosper',
 'Lekontina Шоколадна Майстерня',
 'Lecker',
 'Veterano Coffee',
 'VEIN',
 'Coffee Drive',
 'G COFFEE',
 'Kavun',
 'Buns Brew Bar',
 'Coffee House',
 '"Точка кофе"',
 'Your Coffee',
 'KOFEiN',
 "Perfect Coffee, КАВ'ЯРНЯ",
 'Misceva kavyarnya',
 'Dzhi',
 'Gangster coffee shop',
 'Crema Caffe Poltava',
 'COFFBOY',
 'Wake Up Coffee',
 'Lviv Handmade Chocolate',
 'ПЕРША ДЕГУСТАЦІЙНА ЗАЛА КАВИ "GANGSTER_COFFEE SHOP 3"',
 'Aroma kava',
 'Koffishka',
 'ЗАКУТОК - coffee hookah point',
 'CoffeePot',
 'Coffee 66',
 'Godshot Coffee',
 'Verona',
 'Prostir.coffee Таврик',
 'I love coffee',
 'coffee House',
 'Дом Кофе',
 'CoffeeOk',
 'Кофе с Совой',
 'Кофе В Херсоне',
 'Prostir.coffee',
 'Don Marco coffee shop',
 'H