# Conference Citations and Locations Dataset Preparation

Jupyter Notebook that prepares the Conference Citations and Locations dataset for the integration of some conference ranking metrics.

____________________________________________________________

For this process, the following CSV files are needed: ```out_citations_and_conferences_location_ready.csv``` and ```out_citations_by_year_and_conferences_location_ready.csv```.

These files must be generated by running the notebook ```Conference Locations Data Cleanup.ipynb```, which is contained in the ```4 - Citation and Conference Locations Data Cleanup``` folder of this project.

In particular, the following operations are executed:
* Opening of the two CSV conference citations and locations datasets
* Creation of a new column containing the normalized conference series names
* Alphabetical sort of the columns
* Removal of the rows with invalid (purely numeric) conference series names

Lastly, the processed datasets are saved on disk in CSV format.

In [1]:
# Libraries Import
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)

## File Paths
Please set your working directory paths.

In [2]:
# ******************* PATHS *******************

# Dumps Directory Path
path_file_import = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/'

# CSV Exports Directory Path
path_file_export = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/'
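
As an alternative to concatenating strings that must end with a trailing slash, the paths could be built with `pathlib`. This is only a minimal sketch: the base directory below is a hypothetical placeholder, and `path_import`, `path_export` and `ready_csv` are illustrative names that do not appear in the notebook.

```python
from pathlib import Path

# Hypothetical base directory: replace it with your own working folder.
base_dir = Path("/path/to/working/folder")

# Sub-folders mirroring the Import/Export layout used in this notebook.
path_import = base_dir / "Import"
path_export = base_dir / "Export"

# The "/" operator joins path components, so no trailing slash is needed.
ready_csv = path_export / "out_citations_and_conferences_location_ready.csv"
print(ready_csv)
```

A `Path` object can be passed directly to `pd.read_csv` and `DataFrame.to_csv`.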

## Import of the Datasets

In [3]:
# The input files are read from the export directory, where the cleanup notebook is expected to have written them
df_citations_and_locations = pd.read_csv(path_file_export + 'out_citations_and_conferences_location_ready.csv', low_memory=False, index_col=[0])
print('Successfully Imported the Conference Citations and Locations Ready CSV')

df_citations_by_year_and_locations = pd.read_csv(path_file_export + 'out_citations_by_year_and_conferences_location_ready.csv', low_memory=False, index_col=[0])
print('Successfully Imported the Conference Citations by Year and Locations Ready CSV')

Successfully Imported the Conference Citations and Locations Ready CSV
Successfully Imported the Conference Citations by Year and Locations Ready CSV


## Creation of the Conference Series Normalized Name Column

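The conference series names need to be recreated by removing the conference instance year from the ```ConferenceNormalizedName``` column. This is done by keeping only the first space-separated token of the name, in lowercase (for example, ```disc 2014``` becomes ```disc```):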

In [4]:
# Keep the first space-separated token of the name (the series acronym), in lowercase
df_citations_and_locations['ConferenceSeriesNormalizedName'] = df_citations_and_locations['ConferenceNormalizedName'].str.split(' ').str[0].str.lower()

# Same operation on the citations-by-year dataset
df_citations_by_year_and_locations['ConferenceSeriesNormalizedName'] = df_citations_by_year_and_locations['ConferenceNormalizedName'].str.split(' ').str[0].str.lower()

df_citations_and_locations.head(3)

Unnamed: 0,CitationCount_COCI,CitationCount_Mag,CitationCount_MagEstimated,ConferenceLocation,ConferenceNormalizedName,Doi,Year,ConferenceSeriesNormalizedName
0,10,12,12,"Austin, Texas, United States",disc 2014,10.1007/978-3-662-45174-8_28,2014,disc
1,5,10,10,"Wrocław, Lower Silesian Voivodeship, Poland",esa 2014,10.1007/978-3-662-44777-2_60,2014,esa
2,11,20,20,"Innsbruck, Tyrol, Austria",enter 2013,10.1007/978-3-319-03973-2_13,2013,enter
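
The effect of keeping only the first token can be checked on a few toy rows. The sketch below is illustrative: `disc 2014` and `esa 2014` come from the output above, while `big data 2015` and `1999` are made-up values, and the regex-based column is a hypothetical alternative, not what this notebook does.

```python
import pandas as pd

# Toy rows: the last two values are invented to show the edge cases.
toy = pd.DataFrame({
    "ConferenceNormalizedName": ["disc 2014", "esa 2014", "big data 2015", "1999"]
})
names = toy["ConferenceNormalizedName"]

# Approach used in this notebook: keep the first space-separated token.
toy["first_token"] = names.str.split(" ").str[0].str.lower()

# Hypothetical alternative: strip only a trailing four-digit year.
toy["year_stripped"] = names.str.replace(r"\s+\d{4}$", "", regex=True).str.lower()

print(toy)
```

The two approaches would differ only if a series name contained a space: the first-token approach would truncate it, while the regex would keep it whole. A name that consists of a year alone is left as a numeric string by both.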


The columns are then sorted alphabetically by name:

In [5]:
df_citations_and_locations = df_citations_and_locations.reindex(sorted(df_citations_and_locations.columns), axis=1)
df_citations_by_year_and_locations = df_citations_by_year_and_locations.reindex(sorted(df_citations_by_year_and_locations.columns), axis=1)
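
The ordering produced by `sorted` on the column labels can be seen on a toy dataframe. This is a small sketch with invented values (the DOI is a placeholder). It assumes that all column labels are strings, as they are after `pd.read_csv`.

```python
import pandas as pd

# Toy dataframe with one row; the values are placeholders.
toy = pd.DataFrame(
    [[2014, "disc 2014", 3, "10.1000/example"]],
    columns=["Year", "ConferenceNormalizedName", "2015", "Doi"],
)

# Same call used in this notebook: reorder the columns by sorted label.
toy_sorted = toy.reindex(sorted(toy.columns), axis=1)

print(list(toy_sorted.columns))
```

Since strings are compared character by character, labels starting with a digit (the year columns of the citations-by-year dataset) come before the labels starting with an uppercase letter.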

## Filter of Some Invalid Conference Names
I noticed that some conference series names are invalid, since they are purely numeric. This is not caused by this script: they were already broken in the extraction and preprocessing phase of the raw dataset.

These rows need to be removed, since they cannot be associated with any conference series.

In [6]:
# Keep only the rows whose series name is not purely numeric
df_citations_and_locations = df_citations_and_locations[df_citations_and_locations["ConferenceSeriesNormalizedName"].str.isnumeric() == False]
df_citations_by_year_and_locations = df_citations_by_year_and_locations[df_citations_by_year_and_locations["ConferenceSeriesNormalizedName"].str.isnumeric() == False]


# Reset of the index
df_citations_and_locations = df_citations_and_locations.reset_index(drop=True)
df_citations_by_year_and_locations = df_citations_by_year_and_locations.reset_index(drop=True)
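
The filter can be illustrated on a handful of toy names. In this sketch the names `2014` and `3dim` are invented for illustration. It shows that `str.isnumeric()` flags only the strings made entirely of numeric characters, so a name that merely contains digits is kept.

```python
import pandas as pd

# Toy series names: "2014" is purely numeric, "3dim" only starts with a digit.
toy = pd.DataFrame({
    "ConferenceSeriesNormalizedName": ["disc", "2014", "esa", "3dim"]
})

# True only where every character of the string is numeric.
is_numeric = toy["ConferenceSeriesNormalizedName"].str.isnumeric()

# Same filter as in this notebook, followed by the index reset.
kept = toy[is_numeric == False].reset_index(drop=True)

print(kept)
```

After the reset, the index of the remaining rows is contiguous again, starting from 0.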

## Write of the Final CSVs on Disk

Saving the resulting dataframes on disk in CSV format.

In [7]:
# Write of the resulting CSVs on Disk
df_citations_and_locations.to_csv(path_file_export + 'out_citations_and_conferences_location_ready_v2.csv')
print(f'Successfully Exported the Joined CSV to {path_file_export}out_citations_and_conferences_location_ready_v2.csv')

df_citations_by_year_and_locations.to_csv(path_file_export + 'out_citations_by_year_and_conferences_location_ready_v2.csv')
print(f'Successfully Exported the Joined CSV to {path_file_export}out_citations_by_year_and_conferences_location_ready_v2.csv')

Successfully Exported the Joined CSV to /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/out_citations_and_conferences_location_ready_v2.csv
Successfully Exported the Joined CSV to /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/out_citations_by_year_and_conferences_location_ready_v2.csv


The exported CSVs are reloaded to make sure that the export went fine.

In [8]:
# Check of the Exported CSV
df_joined_exported_csv_citations_and_locations = pd.read_csv(path_file_export + 'out_citations_and_conferences_location_ready_v2.csv', low_memory=False, index_col=[0])
df_joined_exported_csv_citations_and_locations

Unnamed: 0,CitationCount_COCI,CitationCount_Mag,CitationCount_MagEstimated,ConferenceLocation,ConferenceNormalizedName,ConferenceSeriesNormalizedName,Doi,Year
0,10,12,12,"Austin, Texas, United States",disc 2014,disc,10.1007/978-3-662-45174-8_28,2014
1,5,10,10,"Wrocław, Lower Silesian Voivodeship, Poland",esa 2014,esa,10.1007/978-3-662-44777-2_60,2014
2,11,20,20,"Innsbruck, Tyrol, Austria",enter 2013,enter,10.1007/978-3-319-03973-2_13,2013
3,1,0,0,"Villefranche-sur-Saône, Auvergne-Rhône-Alpes, ...",dexa 2002,dexa,10.1007/3-540-46146-9_77,2002
4,9,19,19,"Zakopane, Lesser Poland Voivodeship, Poland",icaisc 2006,icaisc,10.1007/11785231_94,2006
...,...,...,...,...,...,...,...,...
3107878,4,0,0,"Thessaloniki, Macedonia and Thrace, Greece",sapere 2011,sapere,10.1007/978-3-642-31674-6_9,2011
3107879,4,0,0,"Thessaloniki, Macedonia and Thrace, Greece",sapere 2011,sapere,10.1007/978-3-642-31674-6_20,2011
3107880,2,0,0,"Thessaloniki, Macedonia and Thrace, Greece",sapere 2011,sapere,10.1007/978-3-642-31674-6_25,2011
3107881,0,0,0,"Thessaloniki, Macedonia and Thrace, Greece",sapere 2011,sapere,10.1007/978-3-642-31674-6_12,2011


In [9]:
# Check of the Exported CSV
df_joined_exported_csv_citations_by_year_and_locations = pd.read_csv(path_file_export + 'out_citations_by_year_and_conferences_location_ready_v2.csv', low_memory=False, index_col=[0])
df_joined_exported_csv_citations_by_year_and_locations

Unnamed: 0,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,ConferenceLocation,ConferenceNormalizedName,ConferenceSeriesNormalizedName,Doi,Year
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,2,1,2,0,"Austin, Texas, United States",disc 2014,disc,10.1007/978-3-662-45174-8_28,2014
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,"Wrocław, Lower Silesian Voivodeship, Poland",esa 2014,esa,10.1007/978-3-662-44777-2_60,2014
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,2,0,1,1,0,0,"Innsbruck, Tyrol, Austria",enter 2013,enter,10.1007/978-3-319-03973-2_13,2013
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,"Villefranche-sur-Saône, Auvergne-Rhône-Alpes, ...",dexa 2002,dexa,10.1007/3-540-46146-9_77,2002
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,3,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,"Zakopane, Lesser Poland Voivodeship, Poland",icaisc 2006,icaisc,10.1007/11785231_94,2006
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3107878,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,"Thessaloniki, Macedonia and Thrace, Greece",sapere 2011,sapere,10.1007/978-3-642-31674-6_9,2011
3107879,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,2,0,0,0,"Thessaloniki, Macedonia and Thrace, Greece",sapere 2011,sapere,10.1007/978-3-642-31674-6_20,2011
3107880,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,"Thessaloniki, Macedonia and Thrace, Greece",sapere 2011,sapere,10.1007/978-3-642-31674-6_25,2011
3107881,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Thessaloniki, Macedonia and Thrace, Greece",sapere 2011,sapere,10.1007/978-3-642-31674-6_12,2011
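
Besides the visual inspection, the round trip could also be checked programmatically. The sketch below is an assumption about how such a check might look, not part of the original notebook: it uses a toy dataframe and an in-memory buffer instead of the real files, and the names `toy`, `buffer` and `reloaded` are illustrative.

```python
import io

import pandas as pd

# Toy dataframe standing in for the processed dataset.
toy = pd.DataFrame({
    "ConferenceNormalizedName": ["disc 2014", "esa 2014"],
    "ConferenceSeriesNormalizedName": ["disc", "esa"],
    "Year": [2014, 2014],
})

# Write to an in-memory buffer instead of a file on disk, then read it back
# with the same index_col option used in this notebook.
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col=[0])

# Compare shape, column order and cell values of the two dataframes.
same_shape = reloaded.shape == toy.shape
same_columns = list(reloaded.columns) == list(toy.columns)
same_values = reloaded.to_dict("list") == toy.to_dict("list")

print(same_shape, same_columns, same_values)
```

With the real data, the same comparison would be run between the in-memory dataframe and the one reloaded from the exported file.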
