# City Touristic Indexes Join

Jupyter Notebook that joins all the city touristic indexes into a single dataset.

____________________________________________________________

For this process, the following CSV files are needed: 
* ```all_indexes_by_city.csv```
* ```city_tourist_arrivals_euromonitor_2018.csv```
* ```out_city_swp.csv```

The first CSV file was generated by Marco Lupis, a bachelor's degree student in Computer Science at the University of Modena and Reggio Emilia.

The second one was manually extracted from the [2018 Euromonitor Report of the top 100 city destinations](https://go.euromonitor.com/white-paper-travel-2018-100-cities.html).

The last one must be generated by running the ```1 - swp_index_web_scraping.ipynb``` Notebook, which is contained in the ```9 - Touristic Indexes Web Scraping``` folder of this Repository.

____________________________________________________________

In particular, this notebook performs the following operations:
* Loading the CSV datasets
* Merging the different datasets

Lastly, the merged dataset is saved to disk in CSV format.

In [1]:
# Libraries Import
import pandas as pd
import numpy as np 

pd.set_option('display.max_columns', None)

## File Paths
Set the import and export directory paths below to match your environment.

In [2]:
# ******************* PATHS ********************

# Dumps Directory Path
path_file_import = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/'

# CSV Exports Directory Path
path_file_export = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/'

## Dataset Import

#### City Search Engines Results

In [3]:
# Load the search-engine result counts, using the first CSV column as the index
df_city_search_engines_results = pd.read_csv(path_file_import + 'all_indexes_by_city.csv', low_memory=False, index_col=[0])
# Move the index back into a regular column
df_city_search_engines_results = df_city_search_engines_results.reset_index(level=0)
# Give the columns descriptive names
df_city_search_engines_results = df_city_search_engines_results.rename(columns={"city": "City", "google": "N. of Google Res.", "booking": "N. of Booking Res.", "tripadvisor": "N. of Tripadvisor Res."})
df_city_search_engines_results

Unnamed: 0,City,N. of Google Res.,N. of Booking Res.,N. of Tripadvisor Res.
0,Austin,848.000.000,639.000,1.165
1,Wrocław,105.000.000,970.000,439.000
2,Innsbruck,117.000.000,288.000,306.000
3,Villefranche-sur-Saône,11.400.000,16.000,28.000
4,Zakopane,25.900.000,2.227,203.000
...,...,...,...,...
2398,Veneto,213.000.000,14.714,12.031
2399,Bastia,30.500.000,177.000,82.000
2400,Laramie,33.700.000,33.000,63.000
2401,Longyearbyen,15.300.000,12.000,51.000
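
The result counts displayed above contain dots (for example `848.000.000`), so they are likely to be handled as text rather than numbers. The following is a minimal sketch of one way to convert them, assuming every dot is a thousands separator; the source does not confirm this, and a value such as `1.165` could in principle be read differently. The helper name `parse_dotted_count` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def parse_dotted_count(values: pd.Series) -> pd.Series:
    """Convert strings such as '848.000.000' to nullable integers.

    Assumption: every dot is a thousands separator, never a decimal point.
    """
    # Remove the literal dots, then parse what remains as a number
    digits = values.astype(str).str.replace('.', '', regex=False)
    # Unparseable entries become missing values instead of raising
    return pd.to_numeric(digits, errors='coerce').astype('Int64')


# Sample values copied from the table above
sample_counts = pd.Series(['848.000.000', '1.165', '16.000'])
parsed_counts = parse_dotted_count(sample_counts)
print(parsed_counts.tolist())
```

Applying such a conversion before the join would make the count columns usable for sorting and arithmetic, provided the separator assumption holds for the whole file.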


#### Size of the Wikipedia Page Index (SWP)

In [4]:
df_city_swp = pd.read_csv(path_file_import + 'out_city_swp.csv', low_memory=False, index_col=[0])
df_city_swp

Unnamed: 0,ConferenceLocation,SWP
0,"Innsbruck, Tyrol, Austria",73456
1,"Austin, Texas, United States",178836
2,"Wrocław, Lower Silesian Voivodeship, Poland",136404
3,"Villefranche-sur-Saône, Auvergne-Rhône-Alpes, ...",27815
4,"Lisbon, Portugal",121730
...,...,...
2465,"Essex, Maryland, United States",71775
2466,"Bastia, Corsica, France",64516
2467,"Laramie, Wyoming, United States",10391
2468,"Shijiazhuang City, Hebei, China",87977


#### City Tourist Arrivals

In [5]:
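# Load the Euromonitor tourist arrivals, using the first CSV column as the index
df_city_tourist = pd.read_csv(path_file_import + 'city_tourist_arrivals_euromonitor_2018.csv', low_memory=False, index_col=[0])
# Move the index back into a regular column
df_city_tourist = df_city_tourist.reset_index(level=0)
# Arrivals are expressed in millions
df_city_tourist = df_city_tourist.rename(columns={"Arrivals(Millions)": "Tourist Arrivals"})
# Drop the columns that are not needed for the join
df_city_tourist = df_city_tourist.drop(columns=["Rank", "Country"])
df_city_tourist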

Unnamed: 0,City,Tourist Arrivals
0,Hong Kong,29.26
1,Bangkok,24.17
2,London,19.23
3,Macao,18.93
4,Singapore,18.55
...,...,...
86,Porto,2.34
87,Rhodes,2.34
88,Rio de Janeiro,2.28
89,Krabi,2.26


## Join of the Datasets

In [6]:
# Create the new columns with object dtype, so that strings can be assigned to them
for column in ("City", "Country", "State"):
    df_city_swp[column] = pd.Series(np.nan, index=df_city_swp.index, dtype="object")

# Split "City, State, Country" or "City, Country" into separate columns
for index, row in df_city_swp.iterrows():

    splitted_location = row['ConferenceLocation'].split(', ')

    if len(splitted_location) == 3:
        df_city_swp.at[index, 'City'] = splitted_location[0]
        df_city_swp.at[index, 'State'] = splitted_location[1]
        df_city_swp.at[index, 'Country'] = splitted_location[2]

    elif len(splitted_location) == 2:
        df_city_swp.at[index, 'City'] = splitted_location[0]
        df_city_swp.at[index, 'Country'] = splitted_location[1]

df_city_swp

Unnamed: 0,ConferenceLocation,SWP,City,Country,State
0,"Innsbruck, Tyrol, Austria",73456,Innsbruck,Austria,Tyrol
1,"Austin, Texas, United States",178836,Austin,United States,Texas
2,"Wrocław, Lower Silesian Voivodeship, Poland",136404,Wrocław,Poland,Lower Silesian Voivodeship
3,"Villefranche-sur-Saône, Auvergne-Rhône-Alpes, ...",27815,Villefranche-sur-Saône,France,Auvergne-Rhône-Alpes
4,"Lisbon, Portugal",121730,Lisbon,Portugal,
...,...,...,...,...,...
2465,"Essex, Maryland, United States",71775,Essex,United States,Maryland
2466,"Bastia, Corsica, France",64516,Bastia,France,Corsica
2467,"Laramie, Wyoming, United States",10391,Laramie,United States,Wyoming
2468,"Shijiazhuang City, Hebei, China",87977,Shijiazhuang City,China,Hebei
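
The row-by-row loop above can also be expressed with vectorized string operations, which tends to be faster on larger frames. The following is a sketch rather than the notebook's method: the helper name `split_location` is hypothetical, and it mirrors the loop's behavior of filling values only for locations with two or three comma-separated parts.

```python
import pandas as pd


def split_location(locations: pd.Series) -> pd.DataFrame:
    """Split 'City, State, Country' or 'City, Country' strings into columns."""
    parts = locations.str.split(', ')
    n_parts = parts.str.len()
    # Only locations with 2 or 3 parts are handled, as in the loop above
    valid = n_parts.isin([2, 3])
    return pd.DataFrame({
        'City': parts.str[0].where(valid),
        'Country': parts.str[-1].where(valid),
        # A state is present only in the three-part form
        'State': parts.str[1].where(n_parts == 3),
    })


# Sample locations copied from the table above, plus one unsupported form
sample_locations = pd.Series([
    'Innsbruck, Tyrol, Austria',
    'Lisbon, Portugal',
    'Nowhere',
])
location_columns = split_location(sample_locations)
print(location_columns)
```

The returned frame shares the index of the input, so it could be attached to `df_city_swp` with `pd.concat([df_city_swp, ...], axis=1)` or assigned column by column.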


In [7]:
# Left join: keep every city with search-engine results and attach tourist arrivals where available
df_city_turistic_indexes = pd.merge(left=df_city_search_engines_results, right=df_city_tourist, on="City", how="left")
# Right join: keep every city in the SWP dataset
df_city_turistic_indexes = pd.merge(left=df_city_turistic_indexes, right=df_city_swp, on="City", how="right")

# Sort the columns alphabetically
df_city_turistic_indexes = df_city_turistic_indexes.reindex(sorted(df_city_turistic_indexes.columns), axis=1)

df_city_turistic_indexes

Unnamed: 0,City,ConferenceLocation,Country,N. of Booking Res.,N. of Google Res.,N. of Tripadvisor Res.,SWP,State,Tourist Arrivals
0,Innsbruck,"Innsbruck, Tyrol, Austria",Austria,288.000,117.000.000,306.000,73456,Tyrol,
1,Austin,"Austin, Texas, United States",United States,639.000,848.000.000,1.165,178836,Texas,
2,Wrocław,"Wrocław, Lower Silesian Voivodeship, Poland",Poland,970.000,105.000.000,439.000,136404,Lower Silesian Voivodeship,
3,Villefranche-sur-Saône,"Villefranche-sur-Saône, Auvergne-Rhône-Alpes, ...",France,16.000,11.400.000,28.000,27815,Auvergne-Rhône-Alpes,
4,Lisbon,"Lisbon, Portugal",Portugal,4.407,178.000.000,3.052,121730,,3.54
...,...,...,...,...,...,...,...,...,...
2465,Essex,"Essex, Maryland, United States",United States,762.000,1.540.000.000,1.375,71775,Maryland,
2466,Bastia,"Bastia, Corsica, France",France,177.000,30.500.000,82.000,64516,Corsica,
2467,Laramie,"Laramie, Wyoming, United States",United States,33.000,33.700.000,63.000,10391,Wyoming,
2468,Shijiazhuang City,"Shijiazhuang City, Hebei, China",China,16.000,9.810.000,77.000,87977,Hebei,
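
Because the datasets are joined on the city name alone, it can be useful to see how many rows actually found a match. The sketch below uses the `indicator` parameter of `pd.merge` on two toy frames; the frames and the city `Exampleville` are made up for illustration and do not come from the real data.

```python
import pandas as pd

# Toy stand-ins for the search-engine and SWP frames (illustrative values only)
toy_search = pd.DataFrame({
    'City': ['Lisbon', 'Austin'],
    'N. of Google Res.': ['178.000.000', '848.000.000'],
})
toy_swp = pd.DataFrame({
    'City': ['Lisbon', 'Austin', 'Exampleville'],
    'SWP': [121730, 178836, 1000],
})

# indicator=True adds a '_merge' column describing where each row came from
toy_merged = pd.merge(left=toy_search, right=toy_swp, on='City', how='right', indicator=True)

# Count matched rows ('both') and rows present only in the right frame
match_counts = toy_merged['_merge'].value_counts()
unmatched_cities = toy_merged.loc[toy_merged['_merge'] == 'right_only', 'City'].tolist()

print(match_counts)
print(unmatched_cities)
```

Rows marked `right_only` have no search-engine results attached, which can point to spelling differences between the city names in the two files.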



## Write of the Final CSV on Disk

Saving the resulting dataframe on disk in CSV format.

In [9]:
# Write of the resulting CSV on Disk
df_city_turistic_indexes.to_csv(path_file_export + 'out_city_turistc_indexes.csv')
print(f'Successfully Exported the Processed CSV to {path_file_export}out_city_turistc_indexes.csv')

Successfully Exported the Processed CSV to /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/out_city_turistc_indexes.csv


Reload the exported CSV to verify that the export succeeded.

In [10]:
# Check of the Exported CSV
df_city_turistic_indexes_exported_csv = pd.read_csv(path_file_export + 'out_city_turistc_indexes.csv', low_memory=False, index_col=[0])
df_city_turistic_indexes_exported_csv

Unnamed: 0,City,ConferenceLocation,Country,N. of Booking Res.,N. of Google Res.,N. of Tripadvisor Res.,SWP,State,Tourist Arrivals
0,Innsbruck,"Innsbruck, Tyrol, Austria",Austria,288.000,117.000.000,306.000,73456,Tyrol,
1,Austin,"Austin, Texas, United States",United States,639.000,848.000.000,1.165,178836,Texas,
2,Wrocław,"Wrocław, Lower Silesian Voivodeship, Poland",Poland,970.000,105.000.000,439.000,136404,Lower Silesian Voivodeship,
3,Villefranche-sur-Saône,"Villefranche-sur-Saône, Auvergne-Rhône-Alpes, ...",France,16.000,11.400.000,28.000,27815,Auvergne-Rhône-Alpes,
4,Lisbon,"Lisbon, Portugal",Portugal,4.407,178.000.000,3.052,121730,,3.54
...,...,...,...,...,...,...,...,...,...
2465,Essex,"Essex, Maryland, United States",United States,762.000,1.540.000.000,1.375,71775,Maryland,
2466,Bastia,"Bastia, Corsica, France",France,177.000,30.500.000,82.000,64516,Corsica,
2467,Laramie,"Laramie, Wyoming, United States",United States,33.000,33.700.000,63.000,10391,Wyoming,
2468,Shijiazhuang City,"Shijiazhuang City, Hebei, China",China,16.000,9.810.000,77.000,87977,Hebei,
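
Beyond inspecting the reloaded table by eye, the check can be automated. The following is a minimal sketch, assuming it is enough to compare the shape and the column names of the in-memory frame with those of the reloaded file; the helper name `roundtrip_matches` and the temporary demo file are hypothetical.

```python
import os
import tempfile

import pandas as pd


def roundtrip_matches(df: pd.DataFrame, csv_path: str) -> bool:
    """Return True if the CSV at csv_path has the same shape and columns as df."""
    reloaded = pd.read_csv(csv_path, low_memory=False, index_col=[0])
    same_shape = reloaded.shape == df.shape
    same_columns = list(reloaded.columns) == list(df.columns)
    return same_shape and same_columns


# Toy frame standing in for df_city_turistic_indexes
demo = pd.DataFrame({'City': ['Lisbon', 'Innsbruck'], 'SWP': [121730, 73456]})

# Write to a temporary directory so the demo leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_export.csv')
    demo.to_csv(demo_path)
    export_ok = roundtrip_matches(demo, demo_path)

print(export_ok)
```

In the notebook, the same helper could be called with `df_city_turistic_indexes` and the export path used in the cell above.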
