# Citation and Conference Data Cleanup and Normalization

Jupyter Notebook for the cleanup and normalization of the conference locations in the joined datasets.

Microsoft Academic Graph and DBLP use two different representation schemes for the locations.

For example, some locations are represented in the *City, State* format, while others use the *City, State, USA* format.<br>
Also, some locations wrongly contain the conference name, dates, tourist locations, and so on, which need to be filtered out.<br>

These different formats create ambiguity that we need to solve.
____________________________________________________________

For this process, the following CSV files are needed: ```out_citations_and_conferences.csv``` and ```out_citations_by_year_and_conferences.csv```. <br>
The first one must be generated by running the Notebook ```2 - DBLP+MAG and COCI Data Join.ipynb```, contained in the ```3 - Citation and Conference Data Join``` folder of this project.<br>
The second one must be generated by running the Notebook ```3 - DBLP + MAG Join with COCI RAW for By Year Citations.ipynb```, contained in the same ```3 - Citation and Conference Data Join``` folder.

In particular, the following operations are going to be executed:
* Opening of the CSV joined datasets
* Drop of the useless columns
* Manual Filter and Disambiguation of the main cases
* Removal of the conference name
* Location Sanitization and Normalization Using GeoPy
* Drop of the locations that only have the state (but not the city)

Lastly, the processed datasets are going to be saved to disk in CSV format.
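The GeoPy step listed above is not shown in this part of the notebook. As a minimal sketch, assuming the geocoder is called once per distinct location string, the hypothetical helper `geocode_distinct` below skips empty strings and duplicates and pauses between requests, since the public Nominatim service allows at most one request per second. The geocoding function is passed in as a parameter, so the helper itself does not depend on GeoPy.

```python
import time
from typing import Any, Callable, Dict, Iterable


def geocode_distinct(
    locations: Iterable[str],
    geocode: Callable[[str], Any],
    min_delay_seconds: float = 1.0,
) -> Dict[str, Any]:
    """Geocode each distinct, non-empty location string exactly once (hypothetical helper).

    `geocode` is any callable that takes a query string and returns a result or None.
    """
    results: Dict[str, Any] = {}
    for loc in locations:
        if not loc or loc in results:
            continue  # skip empty strings and locations that were already resolved
        results[loc] = geocode(loc)
        time.sleep(min_delay_seconds)  # keep the request rate within the Nominatim limit
    return results
```

With GeoPy, the callable could be the bound `geocode` method of a `Nominatim` instance, e.g. `geocode_distinct(locations_fix_dict.values(), Nominatim(user_agent="conference-locations-cleanup").geocode)`; the user agent string is only an example. Calls that find nothing return `None`, which is stored as is.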

In [1]:
# Libraries Import
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
import time

pd.set_option('display.max_columns', None)

## File Paths
Please set your working directory paths.

In [2]:
# ******************* PATHS ********************

# Dumps Directory Path
path_file_import = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/COCI_RAW/'

# CSV Exports Directory Path
path_file_export = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/'

## Read of the Joined Datasets

In [3]:
df_citations_and_locations = pd.read_csv(path_file_export + 'out_citations_and_conferences.csv', low_memory=False, index_col=[0])
print(f'Successfully Imported the Conference Citations and Locations CSV')

df_citations_by_year_and_locations = pd.read_csv(path_file_export + 'out_citations_by_year_and_conferences.csv', low_memory=False, index_col=[0])
print(f'Successfully Imported the Conference Citations by Year and Locations CSV')

Successfully Imported the Conference Citations and Locations CSV
Successfully Imported the Conference Citations by Year and Locations CSV


### Conference Citations and Location

In [4]:
df_citations_and_locations.head(3)

| Unnamed: 0 | CitationCount_COCI | CitationCount_Mag | CitationCount_MagEstimated | ConferenceLocation | ConferenceNormalizedName | ConferenceTitle | Doi | OriginalTitle | Year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 12 | 12 | Austin, TX | disc 2014 | Distributed Computing - 28th International Sym... | 10.1007/978-3-662-45174-8_28 | The Adaptive Priority Queue with Elimination a... | 2014 |
| 1 | 5 | 10 | 10 | Wrocław, Poland | esa 2014 | Algorithms - ESA 2014 - 22th Annual European S... | 10.1007/978-3-662-44777-2_60 | Document Retrieval on Repetitive Collections | 2014 |
| 2 | 11 | 20 | 20 | Innsbruck, Austria | enter 2013 | Information and Communication Technologies in ... | 10.1007/978-3-319-03973-2_13 | SoCoMo Marketing for Travel and Tourism | 2013 |


### Conference Citations by Year and Location

In [5]:
df_citations_by_year_and_locations.head(3)

Unnamed: 0,ConferenceLocation,ConferenceNormalizedName,ConferenceTitle,Doi,OriginalTitle,Year,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,"Austin, TX",disc 2014,Distributed Computing - 28th International Sym...,10.1007/978-3-662-45174-8_28,The Adaptive Priority Queue with Elimination a...,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,2,1,2,0
1,"Wrocław, Poland",esa 2014,Algorithms - ESA 2014 - 22th Annual European S...,10.1007/978-3-662-44777-2_60,Document Retrieval on Repetitive Collections,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0
2,"Innsbruck, Austria",enter 2013,Information and Communication Technologies in ...,10.1007/978-3-319-03973-2_13,SoCoMo Marketing for Travel and Tourism,2013,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,2,0,1,1,0,0


## Drop of the Useless Columns
First of all, we're going to drop the columns that are not needed anymore.<br>
The following columns are going to be removed:
* ConferenceTitle: the full title of the conference. It's not defined for a lot of conferences.
* OriginalTitle: the paper's title. It's not defined for most of the papers.

In [6]:
df_citations_and_locations.drop(columns=['ConferenceTitle', 'OriginalTitle'], inplace=True)
df_citations_by_year_and_locations.drop(columns=['ConferenceTitle', 'OriginalTitle'], inplace=True)

In [7]:
df_citations_and_locations.head(3)

| Unnamed: 0 | CitationCount_COCI | CitationCount_Mag | CitationCount_MagEstimated | ConferenceLocation | ConferenceNormalizedName | Doi | Year |
|---|---|---|---|---|---|---|---|
| 0 | 10 | 12 | 12 | Austin, TX | disc 2014 | 10.1007/978-3-662-45174-8_28 | 2014 |
| 1 | 5 | 10 | 10 | Wrocław, Poland | esa 2014 | 10.1007/978-3-662-44777-2_60 | 2014 |
| 2 | 11 | 20 | 20 | Innsbruck, Austria | enter 2013 | 10.1007/978-3-319-03973-2_13 | 2013 |


In [8]:
df_citations_by_year_and_locations.head(3)

Unnamed: 0,ConferenceLocation,ConferenceNormalizedName,Doi,Year,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,"Austin, TX",disc 2014,10.1007/978-3-662-45174-8_28,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,2,1,2,0
1,"Wrocław, Poland",esa 2014,10.1007/978-3-662-44777-2_60,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0
2,"Innsbruck, Austria",enter 2013,10.1007/978-3-319-03973-2_13,2013,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,2,0,1,1,0,0


## Conference Location Manual Cleanup
Before submitting the location data to an automatic location recognizer, I decided to manually clean up and filter out most of the issues I found.

First of all, we need to filter out the papers that do not have a location:

In [9]:
original_rows = len(df_citations_and_locations.index)

df_citations_and_locations = df_citations_and_locations[df_citations_and_locations['ConferenceLocation'].notna()]
df_citations_by_year_and_locations = df_citations_by_year_and_locations[df_citations_by_year_and_locations['ConferenceLocation'].notna()]

actual_rows = len(df_citations_and_locations.index)

print(f"The operation filtered out about {round(((original_rows - actual_rows) / 1000000), 1)}M rows")

The operation filtered out about 1.5M rows


### Extraction of the Distinct Conferences Locations

Now, we're going to extract the distinct conferences locations:<br>
**Note**: since the two dataframes contain exactly the same papers and locations, the following operations are going to be executed on only one dataframe, and then replicated on the other.

In [10]:
locations_list = df_citations_and_locations.drop_duplicates(subset="ConferenceLocation")['ConferenceLocation'].tolist()

Filtering out the locations that only have the state (but not the city): they don't need to be fixed.

In [11]:
# Keep only the locations with at least two comma-separated fields (state-only locations are skipped)
locations_list = [loc for loc in locations_list if len(loc.split(',')) >= 2]

### Creation of a Support Dictionary
We're going to create a support dictionary that maps each location to its fixed name.

In [12]:
locations_fix_dict = dict()

for loc in locations_list:
    locations_fix_dict[loc] = loc
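The notebook states that the fixes are computed once and then replicated on both dataframes, but the step that applies `locations_fix_dict` is not shown here. One possible way, sketched under that assumption with the hypothetical helper `apply_location_fixes`, is to map every `ConferenceLocation` value through the dictionary and keep the values that have no entry unchanged.

```python
import pandas as pd


def apply_location_fixes(df: pd.DataFrame, fixes: dict, column: str = "ConferenceLocation") -> pd.DataFrame:
    """Return a copy of `df` whose `column` values are rewritten through `fixes` (hypothetical helper).

    Values that are not keys of `fixes` are kept as they are; missing values stay missing.
    """
    out = df.copy()
    out[column] = out[column].map(lambda loc: fixes.get(loc, loc), na_action="ignore")
    return out


# Applying the same dictionary to both joined dataframes keeps them consistent, e.g.:
# df_citations_and_locations = apply_location_fixes(df_citations_and_locations, locations_fix_dict)
# df_citations_by_year_and_locations = apply_location_fixes(df_citations_by_year_and_locations, locations_fix_dict)
```

A lookup with `dict.get(loc, loc)` is used instead of passing the dictionary straight to `Series.map`, because the latter turns every value without an entry into `NaN`.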

### Fix of the Locations in the Format "City,state_acronym"
Some locations are in the format "City,state_acronym". We need to convert them to "City, STATE_ACRONYM".

For example: "Hamilton,nz" to "Hamilton, NZ"

In [13]:
for loc in locations_fix_dict.keys():
    fields = locations_fix_dict[loc].split(',')

    # "City,xx": exactly two fields and a two-character second field
    if len(fields) == 2 and len(fields[1]) == 2:
        locations_fix_dict[loc] = fields[0] + ', ' + fields[1].upper()
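A stricter variant of the cell above can be sketched with a regular expression, assuming the two-character suffix is always made of letters. The hypothetical helper `fix_city_code` only rewrites strings with no blank after the comma, so a value such as "Paris, F" is left alone instead of being turned into "Paris,  F".

```python
import re

# "City,xx": no blank after the comma and exactly two letters, e.g. "Hamilton,nz"
CITY_CODE = re.compile(r"^([^,]+),([A-Za-z]{2})$")


def fix_city_code(location: str) -> str:
    """Turn "City,xx" into "City, XX"; any other string is returned unchanged (hypothetical helper)."""
    match = CITY_CODE.match(location)
    if match is None:
        return location
    return f"{match.group(1)}, {match.group(2).upper()}"
```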

### Fix of Some Extra Spacings

In [14]:
for loc in locations_fix_dict.keys():
    locations_fix_dict[loc] = locations_fix_dict[loc].replace(' ,', ',')

### Filter of the "- United States of America" and Other Special Cases

In [15]:
for loc in locations_fix_dict.keys():
    locations_fix_dict[loc] = locations_fix_dict[loc].replace(" - United States of America", "")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace(" - United States", "")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace(" - United Kingdom of Great Britain and Northern Ireland", "")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("Netherlands - Kingdom of the Netherlands", "The Netherlands")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("The Netherlands - Including", "The Netherlands")

### US, USA, U.S.A., U.S. and Other Special Cases

In [16]:
import re

# Every spelling of the United States is normalized to "USA" in a single pass.
# (?<!\w) and (?!\w) prevent matches inside other words, e.g. "US" in "HOUSTON" or "BUSAN"
us_variants = re.compile(r"(?<!\w)(?:United States|U\.S\.A\.?|USA\.?|U\.S\.?|US)(?!\w)")

for loc in locations_fix_dict.keys():
    locations_fix_dict[loc] = us_variants.sub("USA", locations_fix_dict[loc])

### United Kingdom, Great Britain, and Other Special Cases

In [17]:
for loc in locations_fix_dict.keys():
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("GB", "UK")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("United Kingdom", "UK")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("England", "UK")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("U.K.", "UK")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("U.K", "UK")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("G.B.", "UK")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("G.B", "UK")
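Plain `str.replace` matches substrings, so a short alias such as "GB" or "US" can also hit the inside of a longer word. As a sketch, the country aliases can instead be kept in one lookup table and applied with word-boundary regular expressions. The table `COUNTRY_ALIASES` and the function `normalize_country_aliases` are hypothetical and cover only a subset of the variants handled in the cells above.

```python
import re

# Hypothetical alias table: each variant on the left becomes the canonical token on the right
COUNTRY_ALIASES = {
    "United States of America": "USA", "United States": "USA",
    "U.S.A.": "USA", "U.S.A": "USA", "U.S.": "USA", "U.S": "USA", "US": "USA",
    "United Kingdom": "UK", "U.K.": "UK", "U.K": "UK", "G.B.": "UK", "GB": "UK",
}


def _alias_pattern(aliases) -> "re.Pattern":
    # Longest aliases first, so that "U.S.A." wins over "U.S." at the same position
    parts = sorted(aliases, key=len, reverse=True)
    # (?<!\w) and (?!\w) act as word boundaries that also work for aliases ending with "."
    return re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, parts)) + r")(?!\w)")


ALIAS_PATTERN = _alias_pattern(COUNTRY_ALIASES)


def normalize_country_aliases(location: str) -> str:
    """Replace every whole-word country alias with its canonical token."""
    return ALIAS_PATTERN.sub(lambda m: COUNTRY_ALIASES[m.group(0)], location)
```

For example, "HOUSTON, TX, US" becomes "HOUSTON, TX, USA", while an existing "USA" is left untouched because the "US" alias must not be followed by a letter.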

### South Korea and Other Special Cases

In [18]:
for loc in locations_fix_dict.keys():
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("S.Korea", "Korea (South)")
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("S. Korea", "Korea (South)")

### "(near Place)" Case

Some places are in the following format: "place_name (near big_town_name), [...]"

We need to convert them to the following format: "big_town_name, [...]"

In [19]:
for loc in locations_fix_dict.keys():
    if " (near " in locations_fix_dict[loc]:
        locations_fix_dict[loc] = locations_fix_dict[loc].split(" (near ")[1].split(")")[0] + locations_fix_dict[loc].split(" (near ")[1].split(")")[1]

### "near Place" Case

Some places are in the following format: "place_name near big_town_name, [...]"

We need to convert them to the following format: "big_town_name, [...]"

In [20]:
for loc in locations_fix_dict.keys():
    if " near " in locations_fix_dict[loc]:
        locations_fix_dict[loc] = locations_fix_dict[loc].split(" near ")[1]
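Both "near" rewrites can also be sketched with two anchored regular expressions. This hypothetical `resolve_near` helper is a stricter variant of the two cells above: it only looks for "near" inside the first comma-separated field, so a "near" appearing later in the string is left alone.

```python
import re

# "place_name (near big_town), rest" -> "big_town, rest"
NEAR_PARENS = re.compile(r"^[^,(]*\(near ([^)]*)\)")
# "place_name near big_town, rest" -> "big_town, rest"
NEAR_PLAIN = re.compile(r"^[^,]*? near ")


def resolve_near(location: str) -> str:
    """Keep the big town named after "near" in the first field (hypothetical helper)."""
    location = NEAR_PARENS.sub(r"\1", location, count=1)
    return NEAR_PLAIN.sub("", location, count=1)
```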

### Filtering the Conference Name
There are a small number of cases where the location wrongly contains the conference name. We need to filter it out.

First, we try to filter some cases automatically.

In most cases, the location follows one of these formats:
* "CONF_NAME YEAR, Location"
* "CONF_NAME'YEAR, Location"
* "CONF_NAME-YEAR, Location"
* "CONF_NAME, Location": we'll address this case manually, since it is difficult to distinguish from regular locations

In [21]:
for loc in locations_fix_dict.keys():
    loc_splitted_list = locations_fix_dict[loc].split(',')

    if len(loc_splitted_list) >= 2:
        first_field = loc_splitted_list[0]

        # "CONF_NAME YEAR, Location", "CONF_NAME'YEAR, Location" and "CONF_NAME-YEAR, Location":
        # the first field splits into exactly two parts and the second one is a number
        needs_to_be_fixed = any(
            len(first_field.split(sep)) == 2 and first_field.split(sep)[1].isnumeric()
            for sep in (' ', "'", '-')
        )

        if needs_to_be_fixed:
            # Drop the first field (the conference name) and re-join the rest,
            # removing the blank space that follows each comma (empty fields are kept as they are)
            locations_fix_dict[loc] = ', '.join(el[1:] if el.startswith(' ') else el for el in loc_splitted_list[1:])
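The same three formats can be sketched with a single regular expression. This hypothetical `drop_conference_prefix` helper is stricter than the cell above (only one token without blanks is accepted before the year) and keeps the rest of the string unchanged instead of re-joining its fields.

```python
import re

# A single token, a separator (blank, apostrophe or hyphen) and a number, then the real location,
# e.g. "ICALP 2005, ...", "SAC'08, ...", "ECAI-2012, ..."
CONF_YEAR_PREFIX = re.compile(r"^[^,\s'-]+[\s'-]\d+,\s*(?P<rest>.+)$")


def drop_conference_prefix(location: str) -> str:
    """Remove a leading "CONF_NAME YEAR" field, if present (hypothetical helper)."""
    match = CONF_YEAR_PREFIX.match(location)
    return match.group("rest") if match else location
```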

Addressing other special conferences:

In [22]:
# Keywords that identify a conference name (or another spurious prefix) in the first comma-separated field.
# If the first field contains any of them, it is dropped and the remaining fields are kept.
conference_keywords = (
    "ASIC", "COIN@AAMAS", "EvoFIN, EvoSTOC", "IEEE", "EvoSTOC", "IIT", "VLSI", "EvoFIN", "IST",
    "IMC", "DEXA", "CAMS", "ACM", "SC11", "SCIDOCA", "CBD", "TBD", "WAIM", "KAIST",
    "CompSysTech", "ESupercomputing", "HEC2016", "BIRTE", "IWANN2003", "Web3D", "WoTUG",
    "XSEDE13", "DBISP2P", "Erlang", "CNAM", "PX/16", "SESoS@ECOOP", "WISE9", "CDT&SECOMANE",
    "Reengineering", "Multimedia", "Mobile", "WBICV", "DCSA, DC", "FHPCN", "WISA",
    "P^3MA, WOPSSS", "WOPSSS", "HardBD", "MoDeVVa", "QLD", "FedCSIS", "ReConFig14", "WGLBWS",
    "ETAPS", "SoMeT_17", "PARLE", "Banff", "Informatics", "NLP&DBpedia", "SIGGRAPH", "DNA8",
    "SGAI", "DCNET", "Meta4eS", "ARRAY@PLDI", "ALSIP, SocNet, BigPMA", "ITiCSE",
    "Supercomputersystemen", "ITEE2013", "CSP", "Algorithmics",
    "China", "UK",  # not conferences, but treated in the same way
    "BigNovelTI, SW4CH", "SW4CH", "M2P", "DC", "Society", "IoTPTS@AsiaCCS", "PPREW@ACSAC",
    "Eurasia", "DUI", "Parallel", "Education", "Humanity", "NLP", "MSA", "Modeling", "MoDIC",
    "WM2SP", "QoIS", "ETheCoM", "XSDM", "Virtual Event", "Workshops", "1992", "SMAP",
    "MaLTeSQuE@SANER", "TRNC", "the UK & Ireland Computing Education Research Conference",
    "WBDB.cn", "QUOVADIS", "ULSSIS@ICSE", "Training", "ICoC",
)

for loc in locations_fix_dict.keys():
    loc_splitted_list = locations_fix_dict[loc].split(',')

    if len(loc_splitted_list) >= 2 and any(keyword in loc_splitted_list[0] for keyword in conference_keywords):
        # Drop the first field and re-join the rest, removing the blank space that follows each comma
        locations_fix_dict[loc] = ', '.join(el[1:] if el.startswith(' ') else el for el in loc_splitted_list[1:])

Virtual events:

In [23]:
for loc in locations_fix_dict.keys():
    locations_fix_dict[loc] = locations_fix_dict[loc].replace("Virtual Event / ", "")

Manual filter:

In [24]:
locations_fix_dict["York, UK / 2nd AAMAS 2002"] = "York, UK"
locations_fix_dict["Eugene, OR, USA / 2nd IWOMP 2006"] = "Eugene, OR, USA"
locations_fix_dict["Ausbildung, INFOS'95, Chemnitz"] = "Ausbildung, Chemnitz"

### Filter of Universities

In [25]:
for loc in locations_fix_dict.keys():
    loc_splitted_list = locations_fix_dict[loc].split(',')

    if len(loc_splitted_list) >= 2:
        first_field = loc_splitted_list[0]

        if "University of " in first_field or " University" in first_field:
            # Remove both university patterns from the first field (chained, so that the
            # second replacement does not discard the result of the first one)
            new_location = first_field.replace("University of ", "").replace(" University", "")

            # Append every remaining field, dropping the blank space that follows each comma
            for el in loc_splitted_list[1:]:
                if el.startswith(' '):
                    el = el[1:]
                new_location += ', ' + el

            locations_fix_dict[loc] = new_location

### Manual Fix of the Special Cases

The cases here are of various kinds: mismatched character cases, wrong spacing, or the indication of the conference venue (such as hotels, etc.).

These cases need to be addressed one by one.

In [26]:
locations_fix_dict["Lyon,\xa0France"] = "Lyon, France"
locations_fix_dict[", USA"] = "USA"
locations_fix_dict["CANCUN, Mexico"] = "Cancun, Mexico"
locations_fix_dict["Auckland, New Zealand, 8-12 August 2016"] = "Auckland, New Zealand"
locations_fix_dict["IOWA STATE UNIVERSITY, USA"] = "Iowa, USA"
locations_fix_dict["No.1, Dai Co Viet Rd, Hanoi, Vietnam"] = "Hanoi, Vietnam"
locations_fix_dict["Guilin,Guangxi, China"] = "Guilin, Guangxi, China"
locations_fix_dict["Gyeongju, Republic of Korea - March"] = "Gyeongju, Republic of Korea"
locations_fix_dict["Harbin,China"] = "Harbin, China"
locations_fix_dict["Washington, D. C., USA"] = "Washington D.C., USA"
locations_fix_dict["Funchal, Madeira - Portugal"] = "Funchal, Madeira, Portugal"
locations_fix_dict["Kuantan, Pahang, MALAYSIA"] = "Kuantan, Pahang, Malaysia"
locations_fix_dict["Phoenix Park, PyeongChang,, Korea (South)"] = "Phoenix Park, PyeongChang, Korea (South)"
locations_fix_dict["EvoFIN, EvoSTOC, Germany"] = "Germany"
locations_fix_dict["Prague,"] = "Prague"
locations_fix_dict[", York, UK"] = "York, UK"
locations_fix_dict["Royal Continental Hotel,Naples, Italy"] = "Naples, Italy"
locations_fix_dict["Puebla, MEXICO"] = "Puebla, Mexico"
locations_fix_dict["Jun 16-20, 2008"] = ""
locations_fix_dict["Taipei, Taiwan, August 29-31, 2012."] = "Taipei, Taiwan"
locations_fix_dict["YORK, UK"] = "York, UK"
locations_fix_dict["Kuala Lumpur, Malaysia."] = "Kuala Lumpur, Malaysia"
locations_fix_dict["Brisbane Convention & Exhibition Centre, Brisbane, Australia"] = "Brisbane, Australia"
locations_fix_dict["Vienna University of Technology, Vienna"] = "Vienna, Austria"
locations_fix_dict["Hammamet,Tunisia"] = "Hammamet, Tunisia"
locations_fix_dict["MIT, Cambridge, USA"] = "Cambridge, USA"
locations_fix_dict["Cumbria, United, Kngdm"] = "Cumbria, UK"
locations_fix_dict["Hilton Hotel Cyprus, Nicosia"] = "Nicosia, Cyprus"
locations_fix_dict["changsha, China"] = "Changsha, China"
locations_fix_dict["Durham, NC USA"] = "Durham, NC, USA"
locations_fix_dict["International, Mykonos Island, Greece"] = "Mykonos Island, Greece"
locations_fix_dict["GUNTUR, Vijayawada, PIN 622510,in"] = "Vijayawada, IN"
locations_fix_dict["Bolzano-Bozen, Italy"] = "Bolzano, Italy"
locations_fix_dict["Providence, RI,"] = "Providence, RI"
locations_fix_dict["Adisaptagram, Hooghly - 712121, India"] = "Adisaptagram, Hooghly, India"
locations_fix_dict["Alexandria, Virginia, U.S."] = "Alexandria, Virginia, USA"
locations_fix_dict["guilin, china"] = "Guilin, China"
locations_fix_dict["Washington, D.C. (USA)"] = "Washington D.C., USA"
locations_fix_dict["San, Diego, CA, USA"] = "San Diego, CA, USA"
locations_fix_dict["Kinsdale,"] = "Kinsdale"
locations_fix_dict["Bhubaneswar,India."] = "Bhubaneswar, India"
locations_fix_dict["Beijing, People's Republic of China"] = "Beijing, China"
locations_fix_dict["DARMSTADT, Germany."] = "Darmstadt, Germany"
locations_fix_dict["singapore, Singapore"] = "Singapore, Singapore"
locations_fix_dict["St.-Petersburg, Russia"] = "St. Petersburg, Russia"
locations_fix_dict["Suwon, Korea,"] = "Suwon, Korea"
locations_fix_dict["Curium Palace Hotel, Limassol, Cyprus"] = "Limassol, Cyprus"
locations_fix_dict["Vilanova i la Geltru, Barcelona, Spain"] = "Barcelona, Spain"
locations_fix_dict["Vancouver Convention Center, Vancouver CANADA "] = "Vancouver, Canada"
locations_fix_dict["Chiang Mai,, Thailand"] = "Chiang Mai, Thailand"
locations_fix_dict["DIVANI PALACE ACROPOLIS Athens, Greece"] = "Athens, Greece"
locations_fix_dict["Greenwich, London (UK)"] = "London, UK"
locations_fix_dict["Madrid,Spain"] = "Madrid, Spain"
locations_fix_dict["Chongqing,China"] = "Chongqing, China"
locations_fix_dict["Training, Atlanta, GA, USA"] = "Atlanta, GA, USA"
locations_fix_dict["denver, CA, USA"] = "Denver, CO, USA"
locations_fix_dict["HANGZHOU, PEOPLE'S REPUBLIC OF CHINA"] = "Hangzhou, China"
locations_fix_dict["Portland, Oregon, June 18-19, 2015"] = "Portland, Oregon"
locations_fix_dict["UK, Guildford, United Kingdom"] = "Guildford, UK"
locations_fix_dict["London (Guildford), United Kingdom"] = "London, UK"
locations_fix_dict["MIT, Cambridge, U.S.A"] = "Cambridge, USA"
locations_fix_dict["54 on Bath, Rosebank, Johannesburg, South Africa"] = "Rosebank, Johannesburg, South Africa"
locations_fix_dict["hONOLULU, hAWAII"] = "Honolulu, Hawaii"
locations_fix_dict["Hefei, P.R.China"] = "Hefei, China"
locations_fix_dict["National Ilan Unviersity, I-Lan, Taiwan"] = "I-Lan, Taiwan"
locations_fix_dict["Galt House Hotel, Louisville, Kentucky, USA - United States"] = "Louisville, Kentucky, USA"
locations_fix_dict["HIROSHIMA, JAPAN"] = "Hiroshima, Japan"
locations_fix_dict["UK, Bradford, UK"] = "Bradford, UK"
locations_fix_dict["ETH Zürich, Zurich, Switzerland"] = "Zurich, Switzerland"
locations_fix_dict["THE FAIRMONT, SAN JOSE, CA"] = "San Jose, CA"
locations_fix_dict["Shenzhen, China (collocated with HPCA)"] = "Shenzhen, China"
locations_fix_dict["Birmingham City Univ, UK"] = "Birmingham, UK"
locations_fix_dict["Dublin City, Univ., Ireland"] = "Dublin, Ireland"
locations_fix_dict["Saint John's, Newfoundland and Labrador,"] = ""
locations_fix_dict["ANNECY, FRANCE - IMPERIAL PALACE"] = "Annecy, France"
locations_fix_dict["Nanyang Technological University, Singapore"] = "Nanyang, Singapore"
locations_fix_dict["San Francisco Bay Area, USA"] = "San Francisco, USA"
locations_fix_dict["TU Berlin, Berlin, Germany"] = "Berlin, Germany"
locations_fix_dict["Grecian Bay Hotel, Ayia Napa, Cyprus"] = "Ayia Napa, Cyprus"
locations_fix_dict["Aristi Village, Zagorochoria, Greece"] = "Zagorochoria, Greece"
locations_fix_dict["KENITRA, MA"] = "Kenitra, MA"
locations_fix_dict["Exeter College, Oxford, UK - UK"] = "Oxford, UK"
locations_fix_dict["2008"] = ""
locations_fix_dict["UK, Edinburgh, UK"] = "Edinburgh, UK"
locations_fix_dict["Bhubaneswar,Odisha, India"] = "Bhubaneswar, Odisha, India"
locations_fix_dict["Hyatt Harborside, Boston, Massachusetts, USA"] = "Boston, Massachusetts, USA"
locations_fix_dict["HERAKLION, CRETE, GREECE"] = "Crete, Greece"
locations_fix_dict["Podebrady (near Prague), Czech Republic"] = "Prague, Czech Republic"
locations_fix_dict["Holiday Inn Express & Suites Ottawa Airport, Canada"] = "Ottawa, Canada"
locations_fix_dict["University of Koblenz-Landau, Koblenz, G"] = "Koblenz, Germany"
locations_fix_dict["Houston, Texas,us"] = "Houston, Texas, USA"
locations_fix_dict["BHUBANESWAR, INDIA"] = "Bhubaneswar, Odisha, India"
locations_fix_dict["Millennium Hall, Addis Ababa ETHIOPIA"] = "Addis Ababa, Ethiopia"
locations_fix_dict["Neubiberg, Germany, Germany"] = "Neubiberg, Germany"
locations_fix_dict["9.6/11.6, Brno, Czech Republic"] = "Brno, Czech Republic"
locations_fix_dict["K.lo Alto,, California, USA"] = "Palo Alto, California, USA"
locations_fix_dict["Kassel, 2.-6, Universität, Kassel"] = "Kassel, Germany"
locations_fix_dict["IBM Germany, Wildbad"] = "Wildbad, Germany"
locations_fix_dict["IBM Germany, Heidelberg"] = "Heidelberg, Germany"
locations_fix_dict["USA, Sendai, Japan"] = "Sendai, Japan"
locations_fix_dict["Los Angeles, USA, Studio"] = "Los Angeles, USA"
locations_fix_dict["Anaheim, USA, VR Village"] = "Anaheim, USA"
locations_fix_dict["Anaheim, USA, Studio"] = "Anaheim, USA"
locations_fix_dict["Orlando Area, Florida, United States"] = "Orlando, Florida, United States"
locations_fix_dict["San Diego, CA, United States"] = "San Diego, California, United States"
locations_fix_dict["Universidad de Zaragoza, Zaragoza, Spain"] = "Zaragoza, Spain"
locations_fix_dict["Eurasia, St. Petersburg, Russian Federation"] = "St. Petersburg, Russia"
locations_fix_dict["Danang, Viet Nam -"] = "Danang, Vietnam"
locations_fix_dict["Büro, Dresden"] = "Dresden, Germany"
locations_fix_dict["Büro, Oldenburg"] = "Oldenburg, Germany"
locations_fix_dict["Büro, Darmstadt"] = "Darmstadt, Germany"
locations_fix_dict["Büro, Braunschweig"] = "Braunschweig, Germany"
locations_fix_dict["Büro, Zürich"] = "Zürich, Switzerland"
locations_fix_dict["Büro, Freiburg"] = "Freiburg, Germany"
locations_fix_dict["Büro, Ulm"] = "Ulm, Germany"
locations_fix_dict["Modeling, Houston, TX, USA"] = "Houston, Texas, USA"
locations_fix_dict["Universal Village, Boston, MA, USA"] = "Boston, MA, USA"
locations_fix_dict["Ghent, Belgium (Virtual Event)"] = "Ghent, Belgium"
locations_fix_dict["Buenos Aires - Argentina"] = "Buenos Aires, Argentina"

### Applying the Fixed Locations to the Original Dataframes

In [27]:
df_citations_and_locations = df_citations_and_locations.replace({"ConferenceLocation": locations_fix_dict})
df_citations_by_year_and_locations = df_citations_by_year_and_locations.replace({"ConferenceLocation": locations_fix_dict})
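`DataFrame.replace` with a nested dictionary only searches the column named by the outer key, and with the default `regex=False` it only replaces cells whose whole value matches an inner key. A toy example (made-up rows, not taken from the datasets) shows that partial matches and other columns are left untouched:

```python
import pandas as pd

# Toy data for illustration only
df = pd.DataFrame({
    "ConferenceLocation": ["HIROSHIMA, JAPAN", "HIROSHIMA, JAPAN (Virtual)", "UK, Bradford, UK"],
    "ConferenceName": ["HIROSHIMA, JAPAN", "X", "Y"],
})

fixes = {"HIROSHIMA, JAPAN": "Hiroshima, Japan", "UK, Bradford, UK": "Bradford, UK"}

# Outer key: column to search; inner dict: exact old value -> new value
fixed = df.replace({"ConferenceLocation": fixes})

print(fixed["ConferenceLocation"].tolist())
# ['Hiroshima, Japan', 'HIROSHIMA, JAPAN (Virtual)', 'Bradford, UK']
print(fixed["ConferenceName"].tolist())
# ['HIROSHIMA, JAPAN', 'X', 'Y']
```

This is why every variant spelling needs its own key in `locations_fix_dict`: a key fixes only the exact string it names.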

### Filtering Out the Papers Whose Conference Location Has Only the State (No City)

Reset the indexes:

In [28]:
df_citations_and_locations = df_citations_and_locations.reset_index(drop=True)
df_citations_by_year_and_locations = df_citations_by_year_and_locations.reset_index(drop=True)

Drop the rows without a city from the citations and locations dataset:

In [29]:
# Keep only the rows whose location has at least two comma-separated parts (e.g. "City, Country");
# missing values become empty strings, which have a single part and are therefore dropped
n_location_parts = df_citations_and_locations["ConferenceLocation"].fillna("").str.split(',').str.len()
df_citations_and_locations = df_citations_and_locations[n_location_parts >= 2]

Drop the rows without a city from the citations by year and locations dataset:

In [30]:
# Same rule as above: keep only locations with at least two comma-separated parts
n_location_parts = df_citations_by_year_and_locations["ConferenceLocation"].fillna("").str.split(',').str.len()
df_citations_by_year_and_locations = df_citations_by_year_and_locations[n_location_parts >= 2]

Reset the indexes after the drop:

In [31]:
df_citations_and_locations = df_citations_and_locations.reset_index(drop=True)
df_citations_by_year_and_locations = df_citations_by_year_and_locations.reset_index(drop=True)
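Since the same filter and index reset are applied to both dataframes, they could be folded into one helper. The sketch below uses a hypothetical function name, `drop_state_only_locations`, and toy rows; it applies the same "at least two comma-separated parts" rule as the cells above:

```python
import pandas as pd

def drop_state_only_locations(df: pd.DataFrame, column: str = "ConferenceLocation") -> pd.DataFrame:
    """Hypothetical helper: keep rows whose location has at least two
    comma-separated parts, then reset the index."""
    # Missing values become "", which splits into a single part and is dropped
    n_parts = df[column].fillna("").str.split(",").str.len()
    return df[n_parts >= 2].reset_index(drop=True)

demo = pd.DataFrame({"ConferenceLocation": ["Ghent, Belgium", "Italy", "", None, "Crete, Greece"]})
result = drop_state_only_locations(demo)
print(result["ConferenceLocation"].tolist())
# ['Ghent, Belgium', 'Crete, Greece']
```

Note that the rule only counts parts: a region plus a country such as "Crete, Greece" still passes, so it is the manual fix dictionaries that keep the city in those cases.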

## Conference Location Automatic Cleanup and Normalization

### Extraction of the Distinct Conference Locations

Now, we extract the distinct conference locations:<br>
**Note**: since the two dataframes contain exactly the same papers and locations, the following operations are executed on one dataframe only and then replicated on the other.

In [80]:
locations_list = df_citations_and_locations.drop_duplicates(subset="ConferenceLocation")['ConferenceLocation'].tolist()

locations_fix_dict = dict()

for loc in locations_list:
    locations_fix_dict[loc] = loc
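Reusing one fix dictionary for both dataframes relies on them holding the same set of locations. A minimal check, written as a hypothetical helper `same_locations` and demonstrated on toy frames, could guard that assumption:

```python
import pandas as pd

def same_locations(df_a: pd.DataFrame, df_b: pd.DataFrame, column: str = "ConferenceLocation") -> bool:
    """Hypothetical check: True when both dataframes contain exactly the same
    set of distinct (non-missing) locations, so one mapping covers both."""
    return set(df_a[column].dropna().unique()) == set(df_b[column].dropna().unique())

a = pd.DataFrame({"ConferenceLocation": ["Ghent, Belgium", "Ghent, Belgium", "Porto, Portugal"]})
b = pd.DataFrame({"ConferenceLocation": ["Porto, Portugal", "Ghent, Belgium"]})
c = pd.DataFrame({"ConferenceLocation": ["Porto, Portugal"]})
print(same_locations(a, b), same_locations(a, c))  # True False
```

In the notebook this would be called as `same_locations(df_citations_and_locations, df_citations_by_year_and_locations)`; row counts may differ between the two datasets, so only the distinct values are compared.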

### Location Extra Fixes

Extra fixes for some locations that caused issues during the GeoPy processing.

In [81]:
locations_fix_dict["London, UK - UK"] = "London, UK"
locations_fix_dict["Capetown, South Africa (RSA)"] = "Cape Town, South Africa"
locations_fix_dict["San Franci, USA"] = "San Francisco, USA"
locations_fix_dict["Novosibirs, Russia"] = "Novosibirsk, Russia"
locations_fix_dict["La Martinique, West Indies"] = "La Martinique, France"
locations_fix_dict["Chicago, Marriott, Chicago, Illinois, USA"] = "Chicago, Illinois, USA"
locations_fix_dict["Kos Island, Greece"] = "Kos, Greece"
locations_fix_dict["Ulsan, Korea (South)"] = "Ulsan, Korea"
locations_fix_dict["Anguilla, British West Indies"] = "Anguilla, AI"
locations_fix_dict["Foster City (Silicon Valley) CA, USA"] = "Foster City, CA, USA"
locations_fix_dict["Sheraton Inn, Ann Arbor, Michigan, USA"] = "Ann Arbor, Michigan, USA"
locations_fix_dict["Fuzhou, Fujuan Province, China"] = "Fuzhou, China"
locations_fix_dict["Playa Blanca, Lanzarote, Canary Islands, Spain"] = "Lanzarote, Canary Islands, Spain"
locations_fix_dict["Smart City, Shanghai, China"] = "Shanghai, China"
locations_fix_dict["J.W. Marriott, Los Angeles, California"] = "Los Angeles, California"
locations_fix_dict["Maarten, Netherlands Antilles"] = "Sint Maarten, Netherlands"
locations_fix_dict["Phoenix Convention Centre, Phoenix, Ariz"] = "Phoenix, Arizona"
locations_fix_dict["Laughborough, UK"] = "Loughborough, UK"
locations_fix_dict["Honololu, HI, USA"] = "Honolulu, HI, USA"
locations_fix_dict["Kohala Coast, Hawaii, USA"] = "Hawaii, USA"
locations_fix_dict["Hefei, Anhuis, P.R. China"] = "Hefei, China"
locations_fix_dict["Paris, France, Gold Coast, QLD, Australia"] = "Paris, France"
locations_fix_dict["Supercomputing, Baltimore, Maryland, USA"] = "Baltimore, Maryland, USA"
locations_fix_dict["Helsiniki, Finland"] = "Helsinki, Finland"
locations_fix_dict["St. Raphael Resort, Limassol, Cyprus"] = "Limassol, Cyprus"
locations_fix_dict["Tourism, Innsbruck, Austria"] = "Innsbruck, Austria"
locations_fix_dict["Konzepte, Technologien, Wien"] = "Wien, Austria"
locations_fix_dict["Biarritz, France, San Sebastian, Spain"] = "Biarritz, France"
locations_fix_dict["Melbourne, Vicoria, Australia"] = "Melbourne, Australia"
locations_fix_dict["Monte Carlo Resort, Las Vegas, USA"] = "Las Vegas, USA"
locations_fix_dict["Hotel Sofitel,San Francisco Bay, USA"] = "San Francisco, USA"
locations_fix_dict["Kaohsiung, Taiwan, ROC"] = "Kaohsiung, Taiwan"
locations_fix_dict["Stirling, Scotland, Great Britain"] = "Stirling, Scotland, UK"
locations_fix_dict["Calabria, Southern Italy"] = np.nan # no city here
locations_fix_dict["Russia, Listvyanka, Russia"] = "Listvyanka, Russia"
locations_fix_dict["Toronto, Canada, Lyon, France"] = "Toronto, Canada"
locations_fix_dict["York, Toronto, Ontario, Canada"] = "York, Ontario, Canada"
locations_fix_dict["Polytechnique Montreal, in Montreal, QC,"] = "Montreal, Canada"
locations_fix_dict["Chania, Crete Island, Greece"] = "Chania, Greece"
locations_fix_dict["Thiruvanan, India"] = "Thiruvananthapuram, India"
locations_fix_dict["Saint Gilles, Reunion Island, France"] = "Saint Gilles, France"
locations_fix_dict["University of California, Santa Barbara"] = "Santa Barbara, USA"
locations_fix_dict["Fort Worth, Alabama, USA"] = "Fort Worth, Texas, USA"
locations_fix_dict["Supercomputing, Denver"] = "Denver, USA"
locations_fix_dict["Waikoloa, Big Island, USA"] = "Hawaii, USA"
locations_fix_dict["Smalltalk, Prague, Czech Republic"] = "Prague, Czech Republic"
locations_fix_dict["Hatfield, Herthfordshire, UK"] = "Hatfield, UK"
locations_fix_dict["Krystiansand, Norway"] = "Kristiansand, Norway"
locations_fix_dict["Smolenice castle, Slovakia"] = "Smolenice, Slovakia"
locations_fix_dict["Yanuca Island, Cuvu, Fiji"] = "Cuvu, Fiji"
locations_fix_dict["Menuires, The Three Valleys, French Alps"] = "Les Menuires, France"
locations_fix_dict["York, Toronto, Ontario, Canad"] = "York, Ontario, Canada"
locations_fix_dict["Kyongju, Korea"] = "Gyeongju, Korea"
locations_fix_dict["Heifei, China"] = "Hefei, China"
locations_fix_dict["Irsee Monastery, Bavaria, Germany"] = "Irsee, Bavaria, Germany"
locations_fix_dict["Vi a del Mar, Chile"] = "Viña del Mar, Chile"
locations_fix_dict["Cheju Island, Korea"] = "Jeju-do, Korea"
locations_fix_dict["Hinan Island, China"] = "Hainan Island, China"
locations_fix_dict["Phoenix Park, PyeongChang, Korea (South)"] = "PyeongChang, Korea"
locations_fix_dict["Puerto Rico, Dominican Rebuplic"] = "Puerto Rico, Dominican Republic"
locations_fix_dict["Metropolitan Area Nuremberg, Germany"] = "Nuremberg, Germany"
locations_fix_dict["Jeju Island, Korea (South)"] = "Jeju, Korea"
locations_fix_dict["Gazimagusa, TRNC, North Cyprus"] = "Famagusta, Cyprus"
locations_fix_dict["AI4EPT, Delhi, India"] = "Delhi, India"
locations_fix_dict["Kaosiung, Taiwan"] = "Kaohsiung, Taiwan"
locations_fix_dict["Hilton Waikoloa, Big Island"] = "Hawaii, USA"
locations_fix_dict["Aquila Atlantis Hotel, Greece"] = "Heraklion, Greece"
locations_fix_dict["Kuala Lump, Malaysia"] = "Kuala Lumpur, Malaysia"
locations_fix_dict["Regional of Blumenau (FURB), Blumenau, Brazil"] = "Blumenau, Brazil"
locations_fix_dict["Austin (Lakeway Resort), TX, USA"] = "Austin, TX, USA"
locations_fix_dict["Ischia Island, Naples, Italy"] = "Ischia, Italy"
locations_fix_dict["Olomouc, Czechoslovakia"] = "Olomouc, Czech"
locations_fix_dict["Santa Marg, Italy"] = "Santa Margherita Ligure, Italy"
locations_fix_dict["Noordweijkerhout, The Netherland"] = "Noordwijkerhout, The Netherlands"
locations_fix_dict["Nice, French Riviera, France"] = "Nice, France"
locations_fix_dict["Philadeliphia, USA"] = "Philadelphia, USA"
locations_fix_dict["Hangzhou, Zh;ejiang, China"] = "Hangzhou, China"
locations_fix_dict["Exeter College, Oxford, UK - UK"] = "Oxford, UK"
locations_fix_dict["Uml, Germany"] = np.nan #it's a conference
locations_fix_dict["Singpapore, Singapore"] = "Singapore, Singapore"
locations_fix_dict["Rio de Janeriro, Brazil"] = "Rio de Janeiro, Brazil"
locations_fix_dict["EvoSTOC, Málaga, Spain"] = "Málaga, Spain"
locations_fix_dict["Bautzen, GDR"] = "Bautzen, Germany"
locations_fix_dict["Rio de Janerio, Brazil"] = "Rio de Janeiro, Brazil"
locations_fix_dict["Montrél, Québec, Canada"] = "Montréal, Québec, Canada"
locations_fix_dict["Calimanesti-Caciucata, Romania"] = "Calimanesti, Romania"
locations_fix_dict["Roverto, Italy"] = "Rovereto, Italy"
locations_fix_dict["Reseach, Yokohama, Japan"] = "Yokohama, Japan"
locations_fix_dict["Laussane, Switzerland"] = "Lausanne, Switzerland"
locations_fix_dict["Corvalis, Oregon, USA"] = "Corvallis, Oregon, USA"
locations_fix_dict["Sorrento Peninsula, Italy"] = "Sorrento, Italy"
locations_fix_dict["Honolulu, USA, Durham, UK"] = "Honolulu, USA"
locations_fix_dict["Naha, Okinawaw, Japan"] = "Naha, Okinawa, Japan"
locations_fix_dict["St. Paphael Resort, Limassol, Cyprus"] = "Limassol, Cyprus"
locations_fix_dict["Capri Island (Naples),"] = "Capri, Italy"
locations_fix_dict["Warshaw, Poland"] = "Warsaw, Poland"
locations_fix_dict["Castle of Hagenberg, Austria"] = "Hagenberg, Austria"
locations_fix_dict["Terromolinos, Spain"] = "Torremolinos, Spain"
locations_fix_dict["Sint Maarten, Dutch Antilles"] = "Sint Maarten, The Netherlands"
locations_fix_dict["San Francsico, USA"] = "San Francisco, USA"
locations_fix_dict["The USA Grant, San Diego, CA, USA"] = "San Diego, CA, USA"
locations_fix_dict["Holiday Inn London, UK - UK"] = "London, UK"
locations_fix_dict["Germany, Bergen, Norway"] = "Bergen, Norway"
locations_fix_dict["Linz/Hagenberg, Austria"] = "Hagenberg, Austria"
locations_fix_dict["Bolzen, Italy"] = "Bozen, Italy"
locations_fix_dict["KDIR, Vienna, Austria"] = "Vienna, Austria"
locations_fix_dict["Utrecht, Netherlands, Hakodate, Japan"] = "Utrecht, Netherlands"
locations_fix_dict["Napa Valley, California, USA"] = "Napa County, California, USA"
locations_fix_dict["Christchurch, New Zealnad"] = "Christchurch, New Zealand"
locations_fix_dict["The Big Island, Hawaii, USA"] = "Hawaii, USA"
locations_fix_dict["Adisaptagram, Hooghly, India"] = "Hooghly, India"
locations_fix_dict["Russia, Kazan, Russia"] = "Kazan, Russia"
locations_fix_dict["Toulouse, Aarhus"] = "Toulouse, France"
locations_fix_dict["KyongJu City, Korea"] = "Gyeongju, Korea"
locations_fix_dict["Conference Kanpur, India"] = "Kanpur, India"
locations_fix_dict["Trondheim, Norway, Seoul, Korea"] = "Trondheim, Norway"
locations_fix_dict["Kaoshiung, Taiwan, R.O.C."] = "Kaohsiung, Taiwan"
locations_fix_dict["Beijing, China (Virtual Event)"] = "Beijing, China"
locations_fix_dict["Ifrance, Morocco"] = "Ifrane, Morocco"
locations_fix_dict["Nuremburg, Germany"] = "Nuremberg, Germany"
locations_fix_dict["Kings Mannor, York,  UK"] = "York, UK"
locations_fix_dict["Florianápolis, Santa Catarina, Brazil"] = "Florianópolis, Santa Catarina, Brazil"
locations_fix_dict["Southhampton, UK"] = "Southampton, UK"
locations_fix_dict["Mancheester, UK"] = "Manchester, UK"
locations_fix_dict["Las Vegas Hilton, Las Vegas, NV, USA"] = "Las Vegas, NV, USA"
locations_fix_dict["Kohala Coast, HI, USA"] = "Kailua, HI, USA"
locations_fix_dict["Isle of Bendor, France"] = "Bendor, France"
locations_fix_dict["Aguascalientes City, Aguascalientes, Mex"] = "Aguascalientes, Mexico"
locations_fix_dict["Bay Garden, Saint Lucia"] = "Saint Lucia, Caribbean"
locations_fix_dict["Hilton San Jose, Bay Area, CA, USA"] = "San Jose, CA, USA"
locations_fix_dict["Cavtat, near, Dubrovnik, Croatia"] = "Dubrovnik, Croatia"
locations_fix_dict["Pibulsongkram Rajabhat, Phits"] = "Phitsanulok, Thailand"
locations_fix_dict["Supercomputing, Portland, Oregon, USA"] = "Portland, Oregon, USA"
locations_fix_dict["Corfu, Greece, Heraklion, Crete, Greece"] = "Corfu, Greece"
locations_fix_dict["Weidhofen/Ybbs, Austria"] = "Waidhofen, Austria"
locations_fix_dict["Mytilene, Lesvos Island, Greece"] = "Mytilene, Greece"
locations_fix_dict["Newcastle, United Kngdm"] = "Newcastle, UK"
locations_fix_dict["Plantation Island, Fiji"] = "Fiji, Fiji"
locations_fix_dict["Troia (Lisbon area), Portugal"] = "Lisbon, Portugal"
locations_fix_dict["Emei Mountain, Sichan Province, China"] = "Emei Mountain, China"
locations_fix_dict["Edmondon, Alberta, Canada"] = "Edmonton, Alberta, Canada"
locations_fix_dict["Yokohoma, Japan"] = "Yokohama, Japan"
locations_fix_dict["SECRYPT, Porto, Portugal"] = "Porto, Portugal"
locations_fix_dict["Fuduoka, Japan"] = "Fukuoka, Japan"
locations_fix_dict["Marrakech and Essaouira, Morocco"] = "Marrakech, Morocco"
locations_fix_dict["Siena-Tuscany, University of Siena, Ital"] = "Siena, Italy"
locations_fix_dict["Whistler Moutain, Canada "] = "Whistler Mountain, Canada"
locations_fix_dict["Kloster Irsee, Kaufbeuren, Germany"] = "Kaufbeuren, Germany"
locations_fix_dict["Santa Cantarina, Brazil"] = "Santa Catarina, Brazil"
locations_fix_dict["Tratanska Lomnica, Slovakia"] = "Tatranska Lomnica, Slovakia"
locations_fix_dict["Cheju Island, South Korea"] = "Jeju, Korea"
locations_fix_dict["Island of Oahu, Hawaii, USA"] = "Hawaii, USA"
locations_fix_dict["Seoul, Republic, of, Korea"] = "Seoul, Korea"
locations_fix_dict["Kibbutz Shefayim, Israel"] = "Shefayim, Israel"
locations_fix_dict["Dunedin, New Zealand - July"] = "Dunedin, New Zealand"
locations_fix_dict["Marina Del Ray, California, USA"] = "Marina Del Rey, California, USA"
locations_fix_dict["Salangor, Malaysia"] = "Selangor, Malaysia"
locations_fix_dict["Tsukuba-City, Ibarski, Japan"] = "Tsukuba, Ibaraki, Japan"
locations_fix_dict["Halifax, Novia Scotia, Canada"] = "Halifax, Nova Scotia, Canada"
locations_fix_dict["New Orleans, United State"] = "New Orleans, United States"
locations_fix_dict["Praha, Czechoslovakia"] = "Prague, Czech"
locations_fix_dict["Florianolpolis, Brazil"] = "Florianopolis, Brazil"
locations_fix_dict["Jeju City, Jeju Island, Korea (South)"] = "Jeju, Korea"
locations_fix_dict["Santa Marherita Ligure, Italy"] = "Santa Margherita Ligure, Italy"
locations_fix_dict["Dunhuang, Gansu, Chian"] = "Dunhuang, Gansu, China"
locations_fix_dict["Istanbul, Turkey, New York, NY, USA"] = "Istanbul, Turkey"
locations_fix_dict["Mauna Lani, Big Island, USA"] = "Hawaii, USA"
locations_fix_dict["The Fairmont San Jose, San Jose, CA"] = "San Jose, CA, USA"
locations_fix_dict["Puerto de Andratx, Mallorca, Spain"] = "Mallorca, Spain"
locations_fix_dict["Morehampstead, UK"] = "Moretonhampstead, UK"
locations_fix_dict["Gangneug, Korea"] = "Gangneung, Korea"
locations_fix_dict["InterContinental San Juan, Puerto Rico"] = "San Juan, Puerto Rico"
locations_fix_dict["Cambridge, UK - June 11 - 13"] = "Cambridge, UK"
locations_fix_dict["Smolenice Castle, Slovak Republic"] = "Smolenice, Slovak Republic"
locations_fix_dict["Supercomputing, Orlando, FL, USA"] = "Orlando, FL, USA"
locations_fix_dict["Scarborough, Trinidad, Tobago"] = "Scarborough, Trinidad and Tobago"
locations_fix_dict["Shenzhen, China(collocated with PPoPP)"] = "Shenzhen, China"
locations_fix_dict["Trento, Italy, Toulouse, France"] = "Trento, Italy"
locations_fix_dict["Kuala Lumlur, Malaysia"] = "Kuala Lumpur, Malaysia"
locations_fix_dict["St. Maarten, Netherlands Antilles"] = "St. Maarten, Netherlands"
locations_fix_dict["Vijayawada, Guntur District, Andhra Pradesh, India"] = "Vijayawada, India"
locations_fix_dict["Fredericton, N.B., Canada"] = "Fredericton, Canada"
locations_fix_dict["Workshops, Limassol"] = "Limassol, Cyprus"
locations_fix_dict["Amelia Island Plantation, Florida, USA"] = "Amelia Island, Florida, USA"
locations_fix_dict["Naples, Italy, EU"] = "Naples, Italy"
locations_fix_dict["Kennesaw, Georgia, Alabama, USA"] = "Kennesaw, Georgia, USA"
locations_fix_dict["Norköping, Sweden"] = "Norrköping, Sweden"
locations_fix_dict["Morelia, Michocán, Mexico"] = "Morelia, Michoacan, Mexico"
locations_fix_dict["Zurich, Switerland"] = "Zurich, Switzerland"
locations_fix_dict["Tsukuba Science City, JAPAN"] = "Tsukuba, JAPAN"
locations_fix_dict["McLean, VA, Washington, DC, USA, USA"] = "McLean, VA, USA"
locations_fix_dict["Marciana Marina, Elba Island, Italy"] = "Marciana Marina, Italy"
locations_fix_dict["Hamammet, Tunisia"] = "Hammamet, Tunisia"
locations_fix_dict["London, UK - December"] = "London, UK"
locations_fix_dict["St. Petersburg Beach, FL, USA"] = "St. Pete Beach, FL, USA"
locations_fix_dict["Oviedo and Mieres (Asturias), Spain"] = "Mieres, Spain"
locations_fix_dict["Supercomputing, San Jose, USA"] = "San Jose, USA"
locations_fix_dict["Tourism, Amsterdam, The Netherlands"] = "Amsterdam, The Netherlands"
locations_fix_dict["Kona (Big Island), Hawaii, USA"] = "Hawaii, USA"
locations_fix_dict["Salishan Lodge, Gleneden Beach, Oregon, USA"] = "Gleneden Beach, Oregon, USA"
locations_fix_dict["San Fracisco, CA, USA"] = "San Francisco, CA, USA"
locations_fix_dict["Zagorochoria, Greece"] = "Zagori, Greece"
locations_fix_dict["Krackow, Poland"] = "Krakow, Poland"
locations_fix_dict["Hong Kong, China, Kuala Lumpur, Malaysia"] = "Hong Kong, China"
locations_fix_dict["Kuala Lumpur, Malaysia, Bali, Indonesia"] = "Kuala Lumpur, Malaysia"
locations_fix_dict["Taganrog, Rostov-on-Don, Russia"] = "Taganrog, Russia"
locations_fix_dict["Menschen, Klagenfurt, Austria"] = "Klagenfurt, Austria"
locations_fix_dict["Spinderluv Mlyn, Czech Republic"] = "Spindleruv Mlyn, Czech"
locations_fix_dict["Tbilisi, Georgia, USASR"] = "Tbilisi, Georgia"
locations_fix_dict["Frankfurt Trade Fair, Germany"] = "Frankfurt, Germany"
locations_fix_dict["Bratislava, Czechoslovakia"] = "Bratislava, Czech"
locations_fix_dict["Carlsbad, Czechoslovakia"] = "Karlovy Vary, Czech"
locations_fix_dict["6th of October, Giza Province, Egypt"] = "Giza, Egypt"
locations_fix_dict["Benelux, Eindhoven, The Netherlands"] = "Eindhoven, The Netherlands"
locations_fix_dict["Manoir St-Castin, Quebec, Canada"] = "Lac-Beauport, Quebec, Canada"
locations_fix_dict["Tourism, Lausanne, Switzerland"] = "Lausanne, Switzerland"
locations_fix_dict["Zagreb, Yugoslavia"] = "Zagreb, Croatia"
locations_fix_dict["Sesimbra-Lisbon, Portugal"] = "Lisbon, Portugal"
locations_fix_dict["Hyatt, Orlando, Kissimee, Florida, USA"] = "Kissimmee, Florida, USA"
locations_fix_dict["Frauenwörth Cloister, Germany"] = "Frauenwörth, Germany"
locations_fix_dict["Washingon, DC, DC, USA"] = "Washington DC, USA"
locations_fix_dict["Kiawah Island Resort, SC, USA"] = "Charleston, SC, USA"
locations_fix_dict["Middleboxes, London, UK"] = "London, UK"
locations_fix_dict["Barelona, Catalonia, Spain"] = "Barcelona, Catalonia, Spain"
locations_fix_dict["Incline Villiage, Nevada, USA"] = "Incline Village, Nevada, USA"
locations_fix_dict["Thessalonica, Greece"] = "Thessaloniki, Greece"
locations_fix_dict["Lodz of Technology, Lodz, Pol"] = "Lodz, Poland"
locations_fix_dict["Innovations, Portland, Oregon, USA"] = "Portland, Oregon, USA"
locations_fix_dict["Saint Maarten, Netherlands, Antilles"] = "St. Maarten, Netherlands"
locations_fix_dict["Transputer, Aachen"] = "Aachen, Germany"
locations_fix_dict["Mustererkennung, Lübeck"] = "Lubeck, Germany"
locations_fix_dict["Worksharing, San Jose, USA"] = "San Jose, USA"
locations_fix_dict["Grenobles, France"] = "Grenoble, France"
locations_fix_dict["Fachgespräch, München"] = "Munich, Germany"
locations_fix_dict["Prague, Czech Repulbic"] = "Prague, Czech"
locations_fix_dict["Newcastle, United, Kngdm"] = "Newcastle, UK"
locations_fix_dict["Benelux, Luxembourg, Kirchberg, Luxembourg"] = "Luxembourg, Luxembourg"
locations_fix_dict["Nice, France, EU"] = "Nice, France"
locations_fix_dict["Palais de Beaulieu, Lausanne, Switzerlan"] = "Lausanne, Switzerland"
locations_fix_dict["International, Cavalese, Italy"] = "Cavalese, Italy"
locations_fix_dict["Tatihou Island, Normandie, France"] = "Tatihou, France"
locations_fix_dict["Royal Holloway, University of London, UK"] = "London, UK"
locations_fix_dict["Crystal City, Washington D.C., USA"] = "Washington D.C., USA"
locations_fix_dict["Portugal, Chicago, USA"] = "Chicago, USA"
locations_fix_dict["Brno, Czech Rep."] = "Brno, Czech"
locations_fix_dict["Twente, The Netherlands, Linköping, Sweden"] = "Twente, The Netherlands"
locations_fix_dict["Moscos, Russia"] = "Moscow, Russia"
locations_fix_dict["Kelona, BC, Canada"] = "Kelowna, BC, Canada"
locations_fix_dict["Tillburg, The Netherlands"] = "Tilburg, The Netherlands"
locations_fix_dict["Atlanta, GA, USA, Bristol, UK"] = "Atlanta, GA, USA"
locations_fix_dict["Cyprus, Turkey"] = "Cyprus, Cyprus"
locations_fix_dict["Balimore, Maryland, USA"] = "Baltimore, Maryland, USA"
locations_fix_dict["PoEM, Oslo, Norway"] = "Oslo, Norway"
locations_fix_dict["Los Angeles, CA, USA / 11th MOL 2009"] = "Los Angeles, CA, USA"
locations_fix_dict["Yellow Mountain, China"] = "Huangshan, China"
locations_fix_dict["Walworth Castle, County Durham, UK"] = "Darlington, UK"
locations_fix_dict["Toronto, ON, Canana"] = "Toronto, ON, Canada"
locations_fix_dict["San Antoni, USA"] = "San Antonio, USA"
locations_fix_dict["Glasgow, Portree, Isle of Skye, UK"] = "Glasgow, UK"
locations_fix_dict["RHUL, Egham, UK"] = "Egham, UK"
locations_fix_dict["Great City, Berlin, Germany"] = "Berlin, Germany"
locations_fix_dict["Mulitimedia, Warsaw, Poland"] = "Warsaw, Poland"
locations_fix_dict["Tepla Mona, Czech Republic"] = "Tepla, Czech"
locations_fix_dict["Gangnueng, Korea"] = "Gangneung, Korea"
locations_fix_dict["New Heights, Keystone, Colorado, USA"] = "Keystone, Colorado, USA"
locations_fix_dict["The Westin Pasadena, Pasadena, CA, USA"] = "Pasadena, CA, USA"
locations_fix_dict["The Grecian Bay, Cyprus"] = "Ayia Napa, Cyprus"
locations_fix_dict["London, UK, Paris, France"] = "London, UK"
locations_fix_dict["Intra, Lake Maggiore, Italy"] = "Intra, Italy"
locations_fix_dict["Katholieke Universiteit of Leuven, Belgi"] = "Leuven, Belgium"
locations_fix_dict["University of Salama, Spain"] = "Salamanca, Spain"
locations_fix_dict["Lund, Southern Sweden"] = "Lund, Sweden"
locations_fix_dict["C3S2E13, Porto, Portugal"] = "Porto, Portugal"
locations_fix_dict["Rio de Jan, Brazil"] = "Rio de Janeiro, Brazil"
locations_fix_dict["Paderbon, Germany"] = "Paderborn, Germany"
locations_fix_dict["Loughborou, UK"] = "Loughborough, UK"
locations_fix_dict["Middleboxes, Florianopolis, Brazil"] = "Florianopolis, Brazil"
locations_fix_dict["Vancouver Convention Center, Vancouver CANADA"] = "Vancouver, CANADA"
locations_fix_dict["Posnán, Poland"] = "Poznan, Poland"
locations_fix_dict["Tsukuba, Japan, Stockholm, Sweden"] = "Tsukuba, Japan"
locations_fix_dict["Australian National (ANU), Ca"] = "Canberra, Australia"
locations_fix_dict["Prague, Karlovy Vary, Czech Republic"] = "Prague, Czech"
locations_fix_dict["Eden Roc Renaissance Miami Beach, USA"] = "Miami, USA"
locations_fix_dict["IIVC, Halkidiki, Greece"] = "Halkidiki, Greece"
locations_fix_dict["Cryptography, Leuven, Belgium"] = "Leuven, Belgium"
locations_fix_dict["Italy, Hangzhou, China"] = "Hangzhou, China"
locations_fix_dict["Pasau, Germany"] = "Passau, Germany"
locations_fix_dict["Tsukuba Science City, Japan"] = "Tsukuba, Japan"
locations_fix_dict["Singapore, Singapore, The Hague, The Netherlands"] = "Singapore, Singapore"
locations_fix_dict["Jahrestagung, Vienna, Austria"] = "Vienna, Austria"
locations_fix_dict["Shenzen, China"] = "Shenzhen, China"
locations_fix_dict["Tornoto, Canada"] = "Toronto, Canada"
locations_fix_dict["Ljubljana, Yugoslavia"] = "Ljubljana, Slovenia"
locations_fix_dict["Biomedicine, Jena, Germany"] = "Jena, Germany"
locations_fix_dict["Supercomputing, Yorktown Heights, NY, USA"] = "Yorktown Heights, NY, USA"
locations_fix_dict["Philadelph, USA"] = "Philadelphia, USA"
locations_fix_dict["Tourism, Montreal, Canada"] = "Montreal, Canada"
locations_fix_dict["Sozhou, China"] = "Suzhou, China"
locations_fix_dict["Honolulu, Hawaii, Vancouver, Canada"] = "Honolulu, Hawaii"
locations_fix_dict["Kangwondo, South Korea"] = "Gangwon, Korea"
locations_fix_dict["EDMCC2, Munich, FRG"] = "Munich, Germany"
locations_fix_dict["Jahrestagung, Darmstadt"] = "Darmstadt, Germany"
locations_fix_dict["postponed [London, UK]"] = "London, UK"
locations_fix_dict["Perspectives, Aachen, Germany"] = "Aachen, Germany"
locations_fix_dict["Vesteras, Sweden"] = "Vasteras, Sweden"
locations_fix_dict["Arbeiten, München, 23.-24"] = "Munich, Germany"
locations_fix_dict["Taormina, Italy, Vienna, Austria"] = "Taormina, Italy"
locations_fix_dict["MSFP@ICFP, Baltimore, MD, USA"] = "Baltimore, MD, USA"
locations_fix_dict["Betrieb, Datenverarbeitung, München"] = "Munich, Germany"
locations_fix_dict["Schwelle, Düsseldorf"] = "Düsseldorf, Germany"
locations_fix_dict["Images, Buffalo, NY, USA"] = "Buffalo, NY, USA"
locations_fix_dict["Jeju Island, South Korea, Republic of"] = "Jeju Island, Korea"
locations_fix_dict["Produkte, Stuttgart, Germany"] = "Stuttgart, Germany"
locations_fix_dict["Auckland, New Zealand, Lucerne, Switzerland, Brazil"] = "Auckland, New Zealand"
locations_fix_dict["KDIR, Budapest, Hungary"] = "Budapest, Hungary"
locations_fix_dict["KDIR, Porto"] = "Porto, Portugal"
locations_fix_dict["Newark, Deleware, USA"] = "Newark, Delaware, USA"
locations_fix_dict["St. Paul, MN, USA, Dunedin, New Zealand"] = "St. Paul, MN, USA"
locations_fix_dict["São Paulo, Brazil, Porto, Portugal"] = "São Paulo, Brazil"
locations_fix_dict["Taipei, Taiwan, Lyon, France"] = "Taipei, Taiwan"
locations_fix_dict["Budapest, Hungary, Toronto, ON, Canada"] = "Budapest, Hungary"
locations_fix_dict["ITMAS, Taipei, Taiwan"] = "Taipei, Taiwan"
locations_fix_dict["Budapest, Hungary, Pasadena, USA, Turin, Italy"] = "Budapest, Hungary"
locations_fix_dict["Hakodate, Japan, Honolulu, USA"] = "Hakodate, Japan"
locations_fix_dict["Cork, Ireland, Jeju Island, South Korea"] = "Cork, Ireland"
locations_fix_dict["Cannes, French Riviera, France"] = "Cannes, France"
locations_fix_dict["Regions, San Francisco, California, USA"] = "San Francisco, California, USA"
locations_fix_dict["Tourism, Helsinki, Finland"] = "Helsinki, Finland"
locations_fix_dict["AI, Bamberg, Germany"] = "Bamberg, Germany"
locations_fix_dict["Jahrestagung, Eringerfeld"] = "Eringerfeld, Germany"
locations_fix_dict["SeCIHD, Regensburg, Germany"] = "Regensburg, Germany"
locations_fix_dict["Workshops, Berlin, Germany"] = "Berlin, Germany"
locations_fix_dict["ZIH, Dresden, Germany"] = "Dresden, Germany"
locations_fix_dict["XHPC, Sorrento, Italy"] = "Sorrento, Italy"
locations_fix_dict["Intl, Intl, Beijing, China"] = "Beijing, China"
locations_fix_dict["Fachtagung, Wien"] = "Wien, Austria"
locations_fix_dict["P^3MA, WOPSSS, Frankfurt, Germany"] = "Frankfurt, Germany"
locations_fix_dict["Zuirch, Switzerland"] = "Zurich, Switzerland"
locations_fix_dict["Greece, Cambridge, MA, USA"] = "Cambridge, MA, USA"
locations_fix_dict["Prague, Czech Republic, Paris, France"] = "Prague, Czech"
locations_fix_dict["Bristol, UK, Milwaukee, WI, USA"] = "Bristol, UK"
locations_fix_dict["Mobilität, Regensburg, Germany"] = "Regensburg, Germany"
locations_fix_dict["Informationsmärkten, Konstanz, Germany"] = "Konstanz, Germany"
locations_fix_dict["Kaohisung, Taiwan"] = "Kaohsiung, Taiwan"
locations_fix_dict["Barcelona, Spain, Bozen, Italy"] = "Barcelona, Spain"
locations_fix_dict["Copenhagen, Denmark, Ljubljana, Slovenia"] = "Copenhagen, Denmark"
locations_fix_dict["Poland, Düsseldorf, Germany"] = "Düsseldorf, Germany"
locations_fix_dict["Berlin, Germany, Seattle, WA, USA"] = "Berlin, Germany"
locations_fix_dict["Madrid, Valencia, Spain, Eindhoven, The Netherlands"] = "Madrid, Valencia, Spain"
locations_fix_dict["Spain, Valencia, Norrköping, Sweden"] = "Valencia, Spain"
locations_fix_dict["Valencia, Spain, Lecce, Italy"] = "Valencia, Spain"
locations_fix_dict["Epidemiology, New Brunswick, New Jersey, USA"] = "New Brunswick, New Jersey, USA"
locations_fix_dict["Automatentheorie, Bonn"] = "Bonn, Germany"
locations_fix_dict["Niigata, Japan, Kanazawa, Japan"] = "Niigata, Japan"
locations_fix_dict["Scala, Chicago, IL, USA"] = "Chicago, IL, USA"
locations_fix_dict["Belo Horizonte, Brazil, Bologna, Italy"] = "Belo Horizonte, Brazil"
locations_fix_dict["Rechnungswesen, Anwendergespräch, Berlin"] = "Berlin, Germany"
locations_fix_dict["Riga, Latvia, New York, NY, USA"] = "Riga, Latvia"
locations_fix_dict["Melbourne, Australia, Chicago, USA"] = "Melbourne, Australia"
locations_fix_dict["Hakodate, Japan, Luxembourg, Luxembourg"] = "Hakodate, Japan"
locations_fix_dict["Utrecht, Netherlands, Klagenfurt, Austria"] = "Utrecht, Netherlands"
locations_fix_dict["Leipzig, GDR"] = "Leipzig, Germany"
locations_fix_dict["Workshop Copenhagen, Denmark"] = "Copenhagen, Denmark"
locations_fix_dict["Porto, Portugal, Florence, Italy, Montreal, QC, Canada"] = "Porto, Portugal"
locations_fix_dict["Perth, Australia, Lyon, France"] = "Perth, Australia"
locations_fix_dict["Orlando, Forida, USA"] = "Orlando, Florida, USA"
locations_fix_dict["Wien, Asutria"] = "Wien, Austria"
locations_fix_dict["Proceedings, Berlin"] = "Berlin, Germany"
locations_fix_dict["semantics4ws, Brisbane, Australia"] = "Brisbane, Australia"
locations_fix_dict["semantics4ws, Vienna, Australia"] = "Wien, Austria"
locations_fix_dict["Kohala Coast, USA"] = "Hawaii, USA"
locations_fix_dict["Novosibirsk, USASR"] = "Novosibirsk, Russia"
locations_fix_dict["Naple, Italy"] = "Naples, Italy"
locations_fix_dict["Verbindung, München"] = "Munich, Germany"
locations_fix_dict["Jahrestagung, Hamburg, Deutschland"] = "Hamburg, Deutschland"
locations_fix_dict["Jahrestagung, Hamburg"] = "Hamburg, Deutschland"
locations_fix_dict["Jahrestagung, Bonn"] = "Bonn, Germany"
locations_fix_dict["Jahrestagung, Berlin"] = "Berlin, Germany"
locations_fix_dict["Jahrestagung, Karlsruhe, Deutschland"] = "Karlsruhe, Deutschland"
locations_fix_dict["Jahrestagung, Dortmund"] = "Dortmund, Germany"
locations_fix_dict["Jahrestagung, Nürnberg"] = "Nuremberg, Germany"
locations_fix_dict["Jahrestagung, Braunschweig"] = "Braunschweig, Germany"
locations_fix_dict["Jahrestagung, Stuttgart"] = "Stuttgart, Germany"
locations_fix_dict["Hyderabad, India, Chicago, IL, USA"] = "Hyderabad, India"
locations_fix_dict["Urgench, USASR"] = "Urgench, Uzbekistan"
locations_fix_dict["Beijing, China, Rotterdam, The Netherlands"] = "Beijing, China"
locations_fix_dict["Budapest, Hungary, Kyoto, Japan"] = "Budapest, Hungary"
locations_fix_dict["Arusha, Tanzania, United Republic of"] = "Arusha, Tanzania"
locations_fix_dict["Fukuoka, Japan, Beijing, China"] = "Fukuoka, Japan"
locations_fix_dict["Ausbildung, Chemnitz"] = "Chemnitz, Germany"
locations_fix_dict["Schule, Konzepte, INFOS99, Potsdam"] = "Potsdam, Germany"
locations_fix_dict["Schule, Ausbildung, München"] = "Munich, Germany"
locations_fix_dict["Hong Kong, Sinaia, Romania"] = "Sinaia, Romania"
locations_fix_dict["EvoSTOC, Naples, Italy"] = "Naples, Italy"
locations_fix_dict["EvoSTOC, Valencia, Spain"] = "Valencia, Spain"
locations_fix_dict["Melbourne, Victoria, Australia, Armidale, NSW, Australia"] = "Melbourne, Australia"
locations_fix_dict["Kona, Big Island, USA"] = "Hawaii, USA"
locations_fix_dict["Ninjing, China"] = "Nanjing, China"
locations_fix_dict["Fachgespräch, Wien"] = "Wien, Austria"
locations_fix_dict["BenchmarX, Tsukuba, Japan"] = "Tsukuba, Japan"
locations_fix_dict["BenchmarX, Brisbane, Australia"] = "Brisbane, Australia"
locations_fix_dict["Haskell, Gothenburg, Sweden"] = "Gothenburg, Sweden"
locations_fix_dict["Berder Island, France"] = "Île Berder, France"
locations_fix_dict["Sushou, China"] = "Suzhou, China"
locations_fix_dict["Modelle, Rechensysteme, Bonn"] = "Bonn, Germany"
locations_fix_dict["Budapest, Hungary, Komárno, Slovakia"] = "Budapest, Hungary"
locations_fix_dict["Munich, Germany, Edinburgh, UK"] = "Munich, Germany"
locations_fix_dict["Montpellier, France, Trentino, Italy"] = "Montpellier, France"
locations_fix_dict["Leicester, United Kindom"] = "Leicester, UK"
locations_fix_dict["Thirtieth, San Diego, California, USA"] = "San Diego, California, USA"
locations_fix_dict["Fachgespräch, Stuttgart, Germany"] = "Stuttgart, Germany"
locations_fix_dict["Workshop, Prague, Czech Republic"] = "Prague, Czech"
locations_fix_dict["Fachtagung, Hamburg"] = "Hamburg, Germany"
locations_fix_dict["ICALP92, Vienna, Austria"] = "Wien, Austria"
locations_fix_dict["ICALP94, Jerusalem, Israel"] = "Jerusalem, Israel"
locations_fix_dict["ICALP87, Karlsruhe, Germany"] = "Karlsruhe, Germany"
locations_fix_dict["Heraklion, Crete Island, Greece"] = "Heraklion, Greece"
locations_fix_dict["Sketches, Seoul, Republic of Korea"] = "Seoul, Korea"
locations_fix_dict["K.lo Alto, California, USA"] = "Palo Alto, California, USA"
locations_fix_dict["Datenfernverarbeitung, Fachtagung, Aachen"] = "Aachen, Germany"
locations_fix_dict["Ada, Arlington, Virginia, USA"] = "Arlington, Virginia, USA"
locations_fix_dict["Tianjin, China, Xi'an, China"] = "Tianjin, China"
locations_fix_dict["Twelth, Ithaca, New York, USA"] = "Ithaca, New York, USA"
locations_fix_dict["Fachtagung, Marburg"] = "Marburg, Germany"
locations_fix_dict["OnToContent, ODIS, Vilamoura, Portugal"] = "Vilamoura, Portugal"
locations_fix_dict["Sharm El Sheik, Egypt"] = "Sharm El-Sheik, Egypt"
locations_fix_dict["Fachtagung, Boppard"] = "Boppard, Germany"
locations_fix_dict["Zurich, Switzerland, Sweden"] = "Zurich, Switzerland"
locations_fix_dict["CHINA, China, Reno, Nevada, USA"] = "Reno, Nevada, USA"
locations_fix_dict["Thessaloniki, Greece, Rome, Italy"] = "Thessaloniki, Greece"
locations_fix_dict["St. Petersburg Beach, Florida, USA"] = "St. Petersburg, Florida, USA"
locations_fix_dict["ALSIP, SocNet, BigPMA, Tainan, Taiwan"] = "Tainan, Taiwan"
locations_fix_dict["Vancouver, Britith Columbia, Canada"] = "Vancouver, Canada"
locations_fix_dict["North Africa, Manama, Bahrain"] = "Manama, Bahrain"
locations_fix_dict["North Africa, Agadir, Morocco"] = "Agadir, Morocco"
locations_fix_dict["Healthcare, Athens, Greece"] = "Athens, Greece"
locations_fix_dict["Vienna, Austria, Berlin, Germany"] = "Wien, Austria"
locations_fix_dict["York, UK, Edinburgh, UK"] = "York, UK"
locations_fix_dict["Viareggio, Italy, Trento, Italy"] = "Viareggio, Italy"
locations_fix_dict["Barcelona, Spain, Belgrade, Serbia"] = "Barcelona, Spain"
locations_fix_dict["Buenos Aires, Argentina, Colonia del Sacramento, Uruguay"] = "Buenos Aires, Argentina"
locations_fix_dict["Lanarca, Cyprus"] = "Larnaca, Cyprus"
locations_fix_dict["Internetworking, Wuhan, China"] = "Wuhan, China"
locations_fix_dict["Fachtagung, Bremen"] = "Bremen, Germany"
locations_fix_dict["USA, Sendai, Japan"] = "Sendai, Japan"
locations_fix_dict["Gaussig, GDR"] = "Gaussig, Germany"
locations_fix_dict["Workshopband, Dresden, Germany"] = "Dresden, Germany"
locations_fix_dict["Magdebug, Germany"] = "Magdeburg, Germany"
locations_fix_dict["Tagungsband, Regensburg, Germany"] = "Regensburg, Germany"
locations_fix_dict["Tagungsband, Dresden, Germany"] = "Dresden, Germany"
locations_fix_dict["Workshopband, Regensburg, Germany"] = "Regensburg, Germany"
locations_fix_dict["Workshopband, Hamburg, Germany"] = "Hamburg, Germany"
locations_fix_dict["KSIR Virtual Conference Center, USA"] = "Pittsburgh, USA"
locations_fix_dict["Medizin, Bad Münster"] = "Bad Kreuznach, Germany"
locations_fix_dict["Funchal, Madeira, Portugal, Milan, Italy"] = "Funchal, Madeira, Portugal"
locations_fix_dict["Bali, Surabaye, Indonesia"] = "Bali, Surabaya, Indonesia"
locations_fix_dict["Galtinburg, Tennessee, USA"] = "Gatlinburg, Tennessee, USA"
locations_fix_dict["Melbourne, Australia, St. Petersburg, Russia"] = "Melbourne, Australia"
locations_fix_dict["Mass Storage, Monterey, USA"] = "Monterey, USA"
locations_fix_dict["Perspectives, Monterey, USA"] = "Monterey, USA"
locations_fix_dict["Sanya, Hainan Island, China"] = "Sanya, China"
locations_fix_dict["Strasbourg, France, Germany"] = "Strasbourg, France"
locations_fix_dict["Prototyping, Washington, DC, USA"] = "Washington DC, USA"
locations_fix_dict["Datensicherung, München"] = "Munich, Germany"
locations_fix_dict["Artiminio, Italy"] = "Artimino, Italy"
locations_fix_dict["Kalamazzo, Michigan, USA"] = "Kalamazoo, Michigan, USA"
locations_fix_dict["Niagra, ON, Canada"] = "Niagara, ON, Canada"
locations_fix_dict["Salt Lake City, USA, Hangzhou, China"] = "Salt Lake City, USA"
locations_fix_dict["Tarragona, Spain, Proceedigns"] = "Tarragona, Spain"
locations_fix_dict["Geotagging, Nara, Japan"] = "Nara, Japan"
locations_fix_dict["Theorie, Barcelona, Spain"] = "Barcelona, Spain"
locations_fix_dict["San Francisco, California, USA"] = "San Francisco, California, USA"
locations_fix_dict["Kibbutz, Ein Gedi, Israel"] = "Ein Gedi, Israel"
locations_fix_dict["Austin, Texas, USA, UK"] = "Austin, Texas, USA"
locations_fix_dict["WEVR@VR, Los Angeles, USA"] = "Los Angeles, USA"
locations_fix_dict["Kohala Coast, USA, New Delhi, India, BIRTE 2017, Munich, Germany"] = "Hawaii, USA"
locations_fix_dict["Grand Cayman, British West Indies"] = "Grand Cayman, Cayman Islands"
locations_fix_dict["Madrid, Spain, Mai 8, 2012"] = "Madrid, Spain"
locations_fix_dict["Bremen, UNK, Germany"] = "Bremen, Germany"
locations_fix_dict["Boston, MA, USA, Athens, Greece"] = "Boston, MA, USA"
locations_fix_dict["BigNovelTI, SW4CH, DC, Nicosia, Cyprus"] = "Nicosia, Cyprus"
locations_fix_dict["Liverpool, UK / Changsha, China"] = "Liverpool, UK"
locations_fix_dict["Jahrestagung, Kooperation, Hannover, Deutschland"] = "Hannover, Germany"
locations_fix_dict["Adirondack Mountains, NY, USA"] = "Adirondack, NY, USA"
locations_fix_dict["Capri Island, Naples, Italy"] = "Capri, Italy"
locations_fix_dict["Fachtagung, Speyer"] = "Speyer, Germany"
locations_fix_dict["Verwaltung, Fachtagung, Linz"] = "Linz, Austria"
locations_fix_dict["Positano, Amalfitan Coast, Salerno, Italy"] = "Positano, Italy"
locations_fix_dict["Atlanta, GA, USA, Montreal, QC, Canada"] = "Atlanta, GA, USA"
locations_fix_dict["Seattle, MA, USA, London, UK"] = "Seattle, WA, USA"
locations_fix_dict["Canada, Hyderabad, India"] = "Hyderabad, India"
locations_fix_dict["New Haven, CT, USA, Los Angeles, USA"] = "New Haven, CT, USA"
locations_fix_dict["[Yokohama, Japan, postponed]"] = "Yokohama, Japan"
locations_fix_dict["Programming, Portland, USA"] = "Portland, USA"
locations_fix_dict["Preprints, Cambridge, Massachusetts, USA"] = "Cambridge, Massachusetts, USA"
locations_fix_dict["Bremen, Germany, Philadelphia, PA, USA"] = "Bremen, Germany"
locations_fix_dict["Jeju Island, Korea, Republic of"] = "Jeju Island, Korea"
locations_fix_dict["Xi'an city, Shaanxi province, China"] = "Xi'an, China"
locations_fix_dict["Yonago City, Tottori Prefecture, Japan"] = "Yonago, Japan"
locations_fix_dict["Odense, Denmark, Dortmund, Germany"] = "Odense, Denmark"
locations_fix_dict["Irkutsk, Russia, St. Petersburg, Russia"] = "Irkutsk, Russia"
locations_fix_dict["Yellow Mountains, China"] = "Huangshan, China"
locations_fix_dict["Fachtagung, Freiburg"] = "Freiburg, Germany"
locations_fix_dict["Dresden, GDR"] = "Dresden, Germany"
locations_fix_dict["Suhl, GDR"] = "Suhl, Germany"
locations_fix_dict["Sydney Australia, Paphos, Cyprus"] = "Sydney, Australia"
locations_fix_dict["Fachtagung, Technischen, Universität Berlin, Berlin"] = "Berlin, Germany"
locations_fix_dict["Konstruieren, München"] = "Munich, Germany"
locations_fix_dict["Intrepidity, Seattle, Washington, USA"] = "Seattle, USA"
locations_fix_dict["DataX, Web, Munich, Germany"] = "Munich, Germany"
locations_fix_dict["RIGiM, Florence, Italy"] = "Florence, Italy"
locations_fix_dict["Kiev, USASR"] = "Kiev, Ukraine"
locations_fix_dict["Seville, Spain, Bled, Slovenia"] = "Seville, Spain"
locations_fix_dict["Rostock, Germany, Lisbon, Portugal"] = "Rostock, Germany"
locations_fix_dict["Pilos, Messinia, Greece"] = "Pylos, Greece"
locations_fix_dict["Fachgespräch, Hamburg"] = "Hamburg, Germany"
locations_fix_dict["Internationa, Catania, Italy"] = "Catania, Italy"
locations_fix_dict["Los Angeles, USA, Bielefeld, Germany"] = "Los Angeles, USA"
locations_fix_dict["Anwendungen, Dortmund"] = "Dortmund, Germany"
locations_fix_dict["Trento, Italy, Salamanca, Spain"] = "Trento, Italy"
locations_fix_dict["Budapest, Hungary, Pasadena, USA"] = "Budapest, Hungary"
locations_fix_dict["MN, USA, Bellevue, WA, USA, Paris, France"] = "Bellevue, WA, USA"
locations_fix_dict["Toronto, ON, Canada, Cambridge, MA, USA"] = "Toronto, ON, Canada"
locations_fix_dict["Taipei, Taiwan, Barcelona, Spain"] = "Taipei, Taiwan"
locations_fix_dict["CS2@HiPEAC, Prague, Czech Republic"] = "Prague, Czech"
locations_fix_dict["Whistler Moutain, British Columbia, Canada"] = "Whistler, Canada"
locations_fix_dict["Angelès-Village, France"] = "Angle sur l'Anglin, France"
locations_fix_dict["Toronto, Canada, Barcelona, Spain"] = "Toronto, Canada"
locations_fix_dict["San Jose, USA, Pune, India"] = "San Jose, USA"
locations_fix_dict["Xi'an, China, San José, USA"] = "Xi'an, China"
locations_fix_dict["Toronto, ON, Canada, New Delhi, India"] = "Toronto, ON, Canada"
locations_fix_dict["DECSoS, Florence, Italy"] = "Florence, Italy"
locations_fix_dict["Sassur, DESEC4LCCI, Magdeburg, Germany"] = "Magdeburg, Germany"
locations_fix_dict["Wuhan, Hubei Province, China"] = "Wuhan, China"
locations_fix_dict["Toronto, Canada, Portland, USA"] = "Toronto, Canada"
locations_fix_dict["Eindhoven, The Netherlands, Madrid, Spain"] = "Eindhoven, The Netherlands"
locations_fix_dict["Tallinn, USASR"] = "Tallinn, Estonia"
locations_fix_dict["Edinburgh, Scottland, UK"] = "Edinburgh, UK"
locations_fix_dict["Lisbon, Portugal, Madrid, Spain, Seville, Spain"] = "Lisbon, Portugal"
locations_fix_dict["Sydney, NSW, Australia, Wollongong, NSW, Australia"] = "Sydney, NSW, Australia"
locations_fix_dict["Transputer, Aachen, Germany"] = "Aachen, Germany"
locations_fix_dict["Practice, Seoul, South Korea"] = "Seoul, South Korea"
locations_fix_dict["Vancouver, Canada, Krakow, Poland"] = "Vancouver, Canada"
locations_fix_dict["France, College Park, MD, USA"] = "College Park, MD, USA"
locations_fix_dict["Bonn, Germany, Lisbon, Portugal, Toronto, Canada"] = "Bonn, Germany"
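The assignments above are manual overrides, while the normalization loop later iterates over every key of the dictionary, so the dictionary presumably also contains the locations that need no manual fix. The following is one plausible way to build such a dictionary, shown as a sketch under that assumption: every distinct value of the location column is first mapped to itself, and only the problematic entries are then overridden. The sample strings and the names `raw_locations` and `fix_dict` are hypothetical.

```python
import pandas as pd

# Hypothetical raw locations, standing in for the "ConferenceLocation" column
raw_locations = pd.Series([
    "Austin, TX",
    "Leipzig, GDR",
    "Ninjing, China",
    "Austin, TX",
])

# Seed the dictionary with every distinct location mapped to itself...
fix_dict = {loc: loc for loc in raw_locations.dropna().unique()}

# ...then override only the entries that need a manual fix
fix_dict["Leipzig, GDR"] = "Leipzig, Germany"
fix_dict["Ninjing, China"] = "Nanjing, China"

print(len(fix_dict))  # 3
```

Because overrides reuse existing keys, the number of geocoding requests stays equal to the number of distinct raw locations.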

### Definition of the Geolocator Function

In [83]:
# Nominatim's usage policy asks for a custom user_agent that identifies the application
geolocator = Nominatim(user_agent="test2022_mail@gmail.com")

In [84]:
def geocode(location, recursion=0, request_delay=None, **kwargs):
    # Wait only before the first attempt: retries already sleep before calling again
    if request_delay and recursion == 0:
        time.sleep(request_delay)

    try:
        return geolocator.geocode(location, **kwargs)
    except Exception:
        if recursion > 10:      # max retries reached: treat the location as not found
            return None

        time.sleep(1) # wait before retrying
        return geocode(location, recursion=recursion + 1, **kwargs)
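The same retry policy can also be written as a loop instead of recursion. The sketch below is a generic, hypothetical helper (`call_with_retries` is not part of geopy) that accepts any callable, so it can be tried without network access. As an alternative, geopy ships `geopy.extra.rate_limiter.RateLimiter`, which wraps a function such as `geolocator.geocode` and offers `min_delay_seconds`, `max_retries` and `error_wait_seconds` options.

```python
import time


def call_with_retries(func, *args, max_retries=10, retry_wait=1.0, first_delay=None, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception.

    Hypothetical helper: like the notebook's geocode(), it returns None
    once every attempt has failed.
    """
    if first_delay:
        time.sleep(first_delay)  # throttle only before the first attempt

    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_retries:
                return None  # give up: the caller treats this as "not found"
            time.sleep(retry_wait)  # wait before the next attempt


# Usage with a stand-in function that fails twice, then succeeds
calls = {"n": 0}


def flaky_lookup(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return f"result for {query}"


result = call_with_retries(flaky_lookup, "Innsbruck, Austria", retry_wait=0)
print(result)  # result for Innsbruck, Austria

# A function that always fails yields None after max_retries + 1 attempts
never = call_with_retries(lambda q: 1 / 0, "x", max_retries=2, retry_wait=0)
print(never)  # None
```

The loop keeps the retry count explicit and avoids growing the call stack, which is the main reason to prefer it over recursion here.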

### Disambiguation and Normalization Using Geopy

In [85]:
n_locations = len(locations_fix_dict)

for count, loc in enumerate(locations_fix_dict.keys(), start=1):
    print(f"Location Normalization Request {count} out of {n_locations}: {locations_fix_dict[loc]}")

    # Geocode the manually fixed location string, waiting 1 second before each request
    location = geocode(locations_fix_dict[loc], request_delay=1, language="en", addressdetails=True, exactly_one=True, timeout=10)

    # Locations not recognised by Nominatim are marked as NaN (they are dropped later)
    if location is None:
        print("LOCATION NOT FOUND\n")
        locations_fix_dict[loc] = np.nan
        continue

    address = location.raw['address']
    normalized_loc = ""
    city_ok = False

    for key, value in address.items():
        # The first city-like component (city, municipality or town) is taken as the city
        if key in ("city", "municipality", "town") and not city_ok:
            normalized_loc = value
            city_ok = True

        # The county is used only if nothing has been collected yet
        elif key == "county":
            if len(normalized_loc) == 0:
                normalized_loc = value

        # State and country are always appended
        elif key in ("state", "country"):
            if len(normalized_loc) != 0:
                normalized_loc += ", "
            normalized_loc += value

    #print(address) # DEBUG
    print("Normalized: " + normalized_loc + "\n") # DEBUG --- REMOVE THE COMMENT

    locations_fix_dict[loc] = normalized_loc

Location Normalization Request 1 out of 5066: Austin, TX
Normalized: Austin, Texas, United States

Location Normalization Request 2 out of 5066: Wrocław, Poland
Normalized: Wrocław, Lower Silesian Voivodeship, Poland

Location Normalization Request 3 out of 5066: Innsbruck, Austria
Normalized: Innsbruck, Tyrol, Austria

Location Normalization Request 4 out of 5066: Provence, France
Normalized: Villefranche-sur-Saône, Auvergne-Rhône-Alpes, France

Location Normalization Request 5 out of 5066: Zakopane, Poland
Normalized: Zakopane, Lesser Poland Voivodeship, Poland

Location Normalization Request 6 out of 5066: Lisbon, Portugal
Normalized: Lisbon, Portugal

Location Normalization Request 7 out of 5066: Lübeck, Germany
Normalized: Lübeck, Schleswig-Holstein, Germany

Location Normalization Request 8 out of 5066: Poznań, Poland
Normalized: Poznań, Greater Poland Voivodeship, Poland

Location Normalization Request 9 out of 5066: Portland, Oregon, USA
Normalized: Portland, Oregon, United Sta
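The address-selection rules of the loop can be isolated in a pure function, which makes them testable without querying Nominatim. This is a sketch that assumes the `address` dictionary lists its components from the most to the least specific, as in the outputs above; the sample dictionaries are hand-written in the shape of Nominatim's `address` field and are not real API responses.

```python
CITY_KEYS = ("city", "municipality", "town")


def normalize_address(address):
    """Build a "City, State, Country" string from a Nominatim-style address dict."""
    normalized = ""
    city_found = False
    for key, value in address.items():
        if key in CITY_KEYS and not city_found:
            # The first city-like component wins and replaces anything collected so far
            normalized = value
            city_found = True
        elif key == "county" and not normalized:
            # County is only a fallback when nothing has been collected yet
            normalized = value
        elif key in ("state", "country"):
            # State and country are always appended
            normalized = f"{normalized}, {value}" if normalized else value
    return normalized


# Hand-written samples in the shape of Nominatim's "address" field
innsbruck = {"city": "Innsbruck", "county": "Innsbruck", "state": "Tyrol",
             "postcode": "6020", "country": "Austria", "country_code": "at"}
rural = {"village": "Somewhere", "county": "Some County", "state": "Some State",
         "country": "Some Country"}
country_only = {"country": "Portugal", "country_code": "pt"}

print(normalize_address(innsbruck))     # Innsbruck, Tyrol, Austria
print(normalize_address(rural))         # Some County, Some State, Some Country
print(normalize_address(country_only))  # Portugal
```

A country-only result such as the last one contains no comma, which is the kind of string removed by the filter later in the notebook.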

### Inserting the Fixed Locations into the Original Dataframes

In [None]:
df_citations_and_locations = df_citations_and_locations.replace({"ConferenceLocation": locations_fix_dict})
df_citations_by_year_and_locations = df_citations_by_year_and_locations.replace({"ConferenceLocation": locations_fix_dict})
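`DataFrame.replace` with a nested dictionary `{column: {old: new}}` looks only in the named column and replaces whole cell values; there is no substring matching because `regex` defaults to `False`. The following small illustration uses hypothetical data (the `Venue` column and the sample strings are made up) and includes a location mapped to NaN, as happens for unrecognised locations:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ConferenceLocation": ["Leipzig, GDR", "Lisbon, Portugal", "Atlantis"],
    "Venue": ["Leipzig, GDR", "X", "Y"],  # hypothetical second column
})

# Hypothetical fixes: one normalized value and one unrecognised location (NaN)
fixes = {"Leipzig, GDR": "Leipzig, Saxony, Germany", "Atlantis": np.nan}

# Only "ConferenceLocation" is modified; values missing from the dict are left as they are
out = df.replace({"ConferenceLocation": fixes})
print(out)
```

The `Venue` column keeps "Leipzig, GDR" because the nested dictionary is scoped to `ConferenceLocation`.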

### Filter of the NaN Locations
During the normalization operations, I assigned a NaN value to the locations that the geocoder did not recognise.

Now we'll drop them.

In [None]:
df_citations_and_locations = df_citations_and_locations.dropna(subset=["ConferenceLocation"])
df_citations_by_year_and_locations = df_citations_by_year_and_locations.dropna(subset=["ConferenceLocation"])
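`dropna(subset=[...])` takes a list of column labels given as strings and drops a row only when one of those columns is missing. Missing values in other columns are kept, and the surviving rows keep their original index labels, which is why the indexes are reset afterwards. A toy example (the `Citations` column and the values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ConferenceLocation": ["Lisbon, Portugal", np.nan, "Innsbruck, Tyrol, Austria"],
    "Citations": [10, 5, np.nan],  # hypothetical column with a missing value
})

# Only the row whose ConferenceLocation is missing is removed
cleaned = df.dropna(subset=["ConferenceLocation"])
print(cleaned.index.tolist())  # [0, 2]
```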

### Filter of the Papers that Only Have the Conference State (But Not the City)

Reset the indexes:

In [None]:
df_citations_and_locations = df_citations_and_locations.reset_index(drop=True)
df_citations_by_year_and_locations = df_citations_by_year_and_locations.reset_index(drop=True)

Row drop for the citation and locations dataset:

In [None]:
row_to_be_dropped_list = []

# Mark the rows whose location has fewer than two comma-separated parts
for index, row in df_citations_and_locations.iterrows():
    if len(row["ConferenceLocation"].split(',')) < 2:
        row_to_be_dropped_list.append(index)

df_citations_and_locations = df_citations_and_locations.drop(df_citations_and_locations.index[row_to_be_dropped_list])

Row drop for the citation by year and locations dataset:

In [None]:
row_to_be_dropped_list = []

# Mark the rows whose location has fewer than two comma-separated parts
for index, row in df_citations_by_year_and_locations.iterrows():
    if len(row["ConferenceLocation"].split(',')) < 2:
        row_to_be_dropped_list.append(index)

df_citations_by_year_and_locations = df_citations_by_year_and_locations.drop(df_citations_by_year_and_locations.index[row_to_be_dropped_list])
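The two row-drop loops can also be expressed with pandas string methods, avoiding `iterrows`. This is a sketch on hypothetical data, not a drop-in replacement tested against the real datasets. With the `< 2` threshold only single-component strings (for example a bare country) are dropped, while a state-plus-country string such as "Tyrol, Austria" has two parts and is kept.

```python
import pandas as pd

df = pd.DataFrame({"ConferenceLocation": [
    "Lisbon, Portugal",
    "Portugal",
    "Innsbruck, Tyrol, Austria",
]})

# Number of comma-separated parts in each location
n_parts = df["ConferenceLocation"].str.split(",").str.len()

# Keep the rows with at least two parts, then renumber the index
filtered = df[n_parts >= 2].reset_index(drop=True)
print(filtered)
```

A boolean mask works on the whole column at once, so it also removes the need for a separate list of indexes to drop.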

Reset the indexes after the drop:

In [None]:
df_citations_and_locations = df_citations_and_locations.reset_index(drop=True)
df_citations_by_year_and_locations = df_citations_by_year_and_locations.reset_index(drop=True)

## Write of the Final CSVs on Disk

Saving the resulting dataframes to disk in CSV format.

In [None]:
# Write of the resulting CSVs on Disk
df_citations_and_locations.to_csv(path_file_export + 'out_citations_and_conferences_location_ready.csv')
print(f'Successfully Exported the Joined CSV to {path_file_export}out_citations_and_conferences_location_ready.csv')

df_citations_by_year_and_locations.to_csv(path_file_export + 'out_citations_by_year_and_conferences_location_ready.csv')
print(f'Successfully Exported the Joined CSV to {path_file_export}out_citations_by_year_and_conferences_location_ready.csv')

Check of the Exported CSVs to be sure that everything went fine.

In [None]:
# Check of the Exported CSV
df_joined_exported_csv_cit = pd.read_csv(path_file_export + 'out_citations_and_conferences_location_ready.csv', low_memory=False, index_col=[0])
df_joined_exported_csv_cit

In [None]:
# Check of the Exported CSV
df_joined_exported_csv_cit_by_year = pd.read_csv(path_file_export + 'out_citations_by_year_and_conferences_location_ready.csv', low_memory=False, index_col=[0])
df_joined_exported_csv_cit_by_year
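The visual check above can be complemented by an automatic round-trip test. This is a minimal sketch on a hypothetical in-memory DataFrame: it writes to an `io.StringIO` buffer instead of a file, reads it back with `index_col=[0]` as the notebook does, and compares the two frames with `pd.testing.assert_frame_equal`.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "ConferenceLocation": ["Lisbon, Portugal", "Innsbruck, Tyrol, Austria"],
    "Citations": [10, 3],  # hypothetical values
})

# Write with the index (the to_csv default) and read it back as the index
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col=[0])

# Raises AssertionError if values, dtypes, column labels or index differ
pd.testing.assert_frame_equal(df, reloaded)
print("Round trip OK")
```

On the real exports the same comparison can be run between each dataframe and its reloaded CSV, although columns with mixed types may come back with different dtypes after a CSV round trip.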