# Getting city names listed in Global Properties SQM data

The Global Properties SQM data only contains country names, but I want the city names. The website that publishes the data states that each entry refers to the "capital city", so we can look up the capital city for each country using a separate data set.

## Capital city data
##### Create a dict out of the capital cities csv

In [63]:
import csv

capital_city_csv_path = r"../data/country_capital_city.csv"

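# newline='' is what the csv module documentation recommends for files passed to a reader
with open(capital_city_csv_path, 'r', newline='') as csv_file:
    # DictReader uses the header row as the keys of each row dict
    csv_reader = csv.DictReader(csv_file)
    # Map each country name to its capital city
    country_capital_dict = {row['Country']: row['Capital City'] for row in csv_reader}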
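
To illustrate what the comprehension above produces, here is a minimal sketch that runs the same pattern on an in-memory sample. The two sample rows and the name `sample_capitals` are made up for illustration; only the column names `Country` and `Capital City` come from the cell above.

```python
import csv
import io

# Hypothetical two-row sample standing in for country_capital_city.csv
sample_csv = io.StringIO(
    "Country,Capital City\n"
    "France,Paris\n"
    "United Kingdom,London\n"
)

# DictReader reads the first line as the header, so each row is a dict
# keyed by 'Country' and 'Capital City'
reader = csv.DictReader(sample_csv)
sample_capitals = {row["Country"]: row["Capital City"] for row in reader}

print(sample_capitals)
```

If the same country appeared twice in the file, the later row would silently overwrite the earlier one, since dict keys are unique.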

## Global SQM data
##### Read Global SQM data into a dataframe

In [64]:
import pandas as pd

global_sqm_csv_path = r"../data/global_sqm_prices.csv"
df = pd.read_csv(global_sqm_csv_path)
df.head()

Unnamed: 0,Country/City,Buying Price\nUS $ per Sq. M.,Price/Rent\nRatio (x),Rent per\nMonth ($ or €),Gross\nRental Yield
0,Hong Kong,"$ 23,695",36x,"$ 2,149",2.78%
1,Singapore,"$ 16,120",21x,"$ 5,075",4.74%
2,United Kingdom,"$ 15,125",16x,"€1,999",6.21%
3,France,"$ 14,808",25x,"€1,441",4.06%
4,Israel,"$ 13,820",30x,"$ 1,615",3.30%


##### Map country:capital city values into a new 'Capital City' column

In [65]:
df['Capital City'] = df['Country/City'].map(country_capital_dict)
df.head()

Unnamed: 0,Country/City,Buying Price\nUS $ per Sq. M.,Price/Rent\nRatio (x),Rent per\nMonth ($ or €),Gross\nRental Yield,Capital City
0,Hong Kong,"$ 23,695",36x,"$ 2,149",2.78%,
1,Singapore,"$ 16,120",21x,"$ 5,075",4.74%,Singapore
2,United Kingdom,"$ 15,125",16x,"€1,999",6.21%,London
3,France,"$ 14,808",25x,"€1,441",4.06%,Paris
4,Israel,"$ 13,820",30x,"$ 1,615",3.30%,Jerusalem (very limited international recognit...
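
The empty `Capital City` value for Hong Kong above is how `Series.map` behaves with a dict: any key that is not in the dict becomes `NaN`. A minimal sketch on a toy frame (the names `sample_df` and `sample_lookup` are illustrative, not part of the notebook):

```python
import pandas as pd

# Toy stand-ins for df and country_capital_dict
sample_df = pd.DataFrame({"Country/City": ["France", "Hong Kong"]})
sample_lookup = {"France": "Paris"}  # deliberately has no 'Hong Kong' key

# Keys missing from the dict map to NaN instead of raising an error
sample_df["Capital City"] = sample_df["Country/City"].map(sample_lookup)

# Boolean mask of the rows that failed to map
missing = sample_df["Capital City"].isna()
print(sample_df[missing])
```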


##### Check for values that failed to map

In [66]:
df[df['Capital City'].isna()]

Unnamed: 0,Country/City,Buying Price\nUS $ per Sq. M.,Price/Rent\nRatio (x),Rent per\nMonth ($ or €),Gross\nRental Yield,Capital City
0,Hong Kong,"$ 23,695",36x,"$ 2,149",2.78%,
5,Taiwan,"$ 10,955",48x,$ 984,2.09%,
22,Czech Republic,"$ 5,227",26x,€928,3.88%,
37,Puerto Rico,"$ 2,700",13x,n.a.,8.05%,
46,Turkey,"$ 1,955",18x,€418,5.55%,


We can see here that five entries have failed to map. After analysing the country_capital data, this is due to various reasons (e.g. the country is not recognised and hence has no capital listed, or the name is spelled differently).
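
One way to spot spelling differences is to ask the standard library's `difflib.get_close_matches` for the nearest keys to each unmapped name. This is only a sketch: the `known_countries` list below is hypothetical, because the actual keys of `country_capital_dict` are not shown in this notebook, and the `cutoff` of 0.5 is an arbitrary choice.

```python
import difflib

# Hypothetical keys; in the notebook this would be list(country_capital_dict)
known_countries = ["Czechia", "Türkiye", "France", "Singapore"]

# Names taken from the unmapped rows above
unmapped = ["Czech Republic", "Turkey", "Hong Kong"]

# For each unmapped name, list up to three similar known names.
# An empty list suggests the name is absent rather than misspelled.
suggestions = {
    name: difflib.get_close_matches(name, known_countries, n=3, cutoff=0.5)
    for name in unmapped
}

for name, candidates in suggestions.items():
    print(name, "->", candidates)
```

The suggestions still need a manual check, since string similarity can pair unrelated names.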

Because there are only a few such cases, I will add these capital names manually.

In [67]:
df.loc[df['Country/City'] == 'Hong Kong', 'Capital City'] = 'Hong Kong'
df.loc[df['Country/City'] == 'Taiwan', 'Capital City'] = 'Taipei'
df.loc[df['Country/City'] == 'Puerto Rico', 'Capital City'] = 'San Juan'
df.loc[df['Country/City'] == 'Czech Republic', 'Capital City'] = 'Prague'
df.loc[df['Country/City'] == 'Turkey', 'Capital City'] = 'Ankara'
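
An alternative to one `df.loc` assignment per row is to keep the manual corrections in a dict and fill only the missing values. The sketch below uses a toy frame and the hypothetical names `toy` and `manual_capitals`; it is not the approach used in the cell above, just a more compact variant of it.

```python
import pandas as pd

# Toy frame standing in for df after the initial .map() call
toy = pd.DataFrame({
    "Country/City": ["France", "Taiwan", "Turkey"],
    "Capital City": ["Paris", None, None],
})

# Manual corrections, taken from the assignments above
manual_capitals = {"Taiwan": "Taipei", "Turkey": "Ankara"}

# fillna aligns on the index, so only rows that are still missing are changed
toy["Capital City"] = toy["Capital City"].fillna(
    toy["Country/City"].map(manual_capitals)
)

print(toy)
```

Keeping the corrections in one dict makes them easier to review and extend if the source data changes.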

In [69]:
df[df['Capital City'].isna()]

Unnamed: 0,Country/City,Buying Price\nUS $ per Sq. M.,Price/Rent\nRatio (x),Rent per\nMonth ($ or €),Gross\nRental Yield,Capital City
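
The empty result above confirms that every row now has a capital. If the notebook is rerun on updated data, the same check could be made to fail loudly instead of relying on visual inspection. A minimal sketch on a toy frame (`checked` is an illustrative name):

```python
import pandas as pd

# Toy frame standing in for df after the manual fixes
checked = pd.DataFrame({
    "Country/City": ["France", "Taiwan"],
    "Capital City": ["Paris", "Taipei"],
})

# Count rows that still have no capital and stop if any remain
missing_count = int(checked["Capital City"].isna().sum())
assert missing_count == 0, f"{missing_count} rows still have no capital city"
```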
