# Citation and Conference Data Cleanup

**TODO**

Jupyter Notebook for the join of the conferences and location data between the DBLP + MAG and COCI dumps.

For this process, the following CSV files are needed: ```out_coci_citations_count.csv``` and ```out_dblp_and_mag_joined.csv```. <br>
The first must be generated by running the ```preprocess_opencitations.ipynb``` Notebook contained in the ```1 - Citation Dumps Preprocess``` folder of this project.
The second must be generated by running the ```1 - DBLP and MAG Data Join Notebook.ipynb``` Notebook contained in the same folder as this Notebook.

In particular, the following operations are going to be executed:
* Opening of the CSV preprocessed dumps
* Join between the two datasets
* Drop of the useless columns
* Fix of the mismatched data types

Lastly, the entire preprocessed dump is going to be saved on disk in CSV format.

In [1]:
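# Libraries Import
import glob                  # used later to list the COCI CSV files
from datetime import date    # used later to get the current year

import pandas as pd
import numpy as np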

pd.set_option('display.max_columns', None)

## File Paths
Please set your working directory paths.

In [2]:
# ******************* PATHS ********************

# Dumps Directory Path
path_file_import = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/COCI_RAW/'

# CSV Exports Directory Path
path_file_export = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/'

## Read of the Joined Datasets

In [3]:
df_citations_and_locations = pd.read_csv(path_file_export + 'out_citations_and_conferences.csv', low_memory=False, index_col=[0])
print(f'Successfully Imported the Conference Citations and Locations CSV')

df_citations_by_year_and_locations = pd.read_csv(path_file_export + 'out_citations_by_year_and_conferences.csv', low_memory=False, index_col=[0])
print(f'Successfully Imported the Conference Citations by Year and Locations CSV')

Successfully Imported the Conference Citations and Locations CSV
Successfully Imported the Conference Citations by Year and Locations CSV


### Conference Citations and Location

In [5]:
df_citations_and_locations.head(3)

Unnamed: 0,CitationCount_COCI,CitationCount_Mag,CitationCount_MagEstimated,ConferenceLocation,ConferenceNormalizedName,ConferenceTitle,Doi,OriginalTitle,Year
0,10,12,12,"Austin, TX",disc 2014,Distributed Computing - 28th International Sym...,10.1007/978-3-662-45174-8_28,The Adaptive Priority Queue with Elimination a...,2014
1,5,10,10,"Wrocław, Poland",esa 2014,Algorithms - ESA 2014 - 22th Annual European S...,10.1007/978-3-662-44777-2_60,Document Retrieval on Repetitive Collections,2014
2,11,20,20,"Innsbruck, Austria",enter 2013,Information and Communication Technologies in ...,10.1007/978-3-319-03973-2_13,SoCoMo Marketing for Travel and Tourism,2013


### Conference Citations by Year and Location

In [6]:
df_citations_by_year_and_locations.head(3)

Unnamed: 0,ConferenceLocation,ConferenceNormalizedName,ConferenceTitle,Doi,OriginalTitle,Year,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,"Austin, TX",disc 2014,Distributed Computing - 28th International Sym...,10.1007/978-3-662-45174-8_28,The Adaptive Priority Queue with Elimination a...,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,2,1,2,0
1,"Wrocław, Poland",esa 2014,Algorithms - ESA 2014 - 22th Annual European S...,10.1007/978-3-662-44777-2_60,Document Retrieval on Repetitive Collections,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0
2,"Innsbruck, Austria",enter 2013,Information and Communication Technologies in ...,10.1007/978-3-319-03973-2_13,SoCoMo Marketing for Travel and Tourism,2013,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,2,0,1,1,0,0


## Drop of the Useless Columns
First of all, we're going to drop the columns that are no longer needed.<br>
The following columns are going to be removed:
* ConferenceTitle: the full title of the conference. It's not defined for a lot of conferences.
* OriginalTitle: the paper's title. It's not defined for most of the papers.

In [4]:
df_citations_and_locations.drop(columns=['ConferenceTitle', 'OriginalTitle'], inplace=True)
df_citations_by_year_and_locations.drop(columns=['ConferenceTitle', 'OriginalTitle'], inplace=True)

In [9]:
df_citations_and_locations.head(3)

Unnamed: 0,CitationCount_COCI,CitationCount_Mag,CitationCount_MagEstimated,ConferenceLocation,ConferenceNormalizedName,Doi,Year
0,10,12,12,"Austin, TX",disc 2014,10.1007/978-3-662-45174-8_28,2014
1,5,10,10,"Wrocław, Poland",esa 2014,10.1007/978-3-662-44777-2_60,2014
2,11,20,20,"Innsbruck, Austria",enter 2013,10.1007/978-3-319-03973-2_13,2013


In [10]:
df_citations_by_year_and_locations.head(3)

Unnamed: 0,ConferenceLocation,ConferenceNormalizedName,Doi,Year,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,"Austin, TX",disc 2014,10.1007/978-3-662-45174-8_28,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,2,1,2,0
1,"Wrocław, Poland",esa 2014,10.1007/978-3-662-44777-2_60,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0
2,"Innsbruck, Austria",enter 2013,10.1007/978-3-319-03973-2_13,2013,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,2,0,1,1,0,0


## Conference Location Disambiguation
Microsoft Academic Graph (MAG) and DBLP use two different schemes to represent locations.

For example, MAG represents US locations in the format *City, State*, while DBLP uses the format *City, State, USA*.<br>

These different formats create ambiguity that we need to resolve.

First of all, we need to filter out the papers that do not have a location:

In [5]:
original_rows = len(df_citations_and_locations)

df_citations_and_locations = df_citations_and_locations[df_citations_and_locations['ConferenceLocation'].notna()]
df_citations_by_year_and_locations = df_citations_by_year_and_locations[df_citations_by_year_and_locations['ConferenceLocation'].notna()]

actual_rows = len(df_citations_and_locations)

print(f"The operation filtered about {round(((original_rows - actual_rows) / 1000000), 1)}M of rows")

The operation filtered about 1.5M of rows


### Extraction of the Distinct Conferences Locations

Now, we're going to extract the distinct conference locations:<br>
**Note**: since the two dataframes contain exactly the same papers and locations, the following operations are going to be executed on only one dataframe, and then replicated on the other.

In [None]:
locations_list = df_citations_and_locations.drop_duplicates(subset="ConferenceLocation")['ConferenceLocation'].tolist()

Filtering out the locations that only have the state (but don't have the city): they don't need to be fixed.

In [9]:
# Keep only the locations with at least two comma-separated parts
locations_list = [loc for loc in locations_list if len(loc.split(',')) >= 2]

['Austin, TX',
 'Wrocław, Poland',
 'Innsbruck, Austria',
 'Provence, France',
 'Zakopane, Poland',
 'Lisbon, Portugal',
 'Lübeck, Germany',
 'Poznań, Poland',
 'Portland, Oregon, USA',
 'Catania, Italy',
 'Glasgow, UK',
 'Las Vegas, NV, USA',
 'Brno, Czech Republic',
 'Chicago, IL, USA ',
 'Berlin, Germany',
 'Leuven, Belgium',
 'Melbourne, Australia',
 'Hong Kong, China',
 'Pittsburgh, PA, USA',
 'San Francisco, CA',
 'Patras, Greece',
 'Hefei, China',
 'Hsinchu, Taiwan',
 'New Hampshire, USA',
 'New York City, NY, USA',
 'Athens, Greece',
 'Sydney, Australia',
 'Wuhan, China',
 'Santa Barbara, CA, USA',
 'Seattle, WA, USA',
 'Uppsala, Sweden',
 'Washington, DC, USA',
 'Tallinn, Estonia',
 'Seattle, Washington, USA',
 'Chicago, IL, USA',
 'Bangkok, Thailand',
 'Hyderabad, India',
 'Santiago, Chile',
 'Turku, Finland',
 'Stirling, Scotland, UK',
 'Versailles, France',
 'Belfast, Northern Ireland, UK',
 'Beijin,China',
 'Monterey, CA, USA',
 'Augsburg, Germany',
 'Maputo, Mozambique',


### Creation of a Support Dictionary
We're going to create a support dictionary that's going to contain the locations and their fixed name.

In [18]:
locations_fix_dict = dict()

for loc in locations_list:
    locations_fix_dict[loc] = loc

### Fix of the Locations in the Format "City,state_acronym"
Some locations are in the format "City,state_acronym". We need to convert them to "City, STATE_ACRONYM".

For example: "Hamilton,nz" to "Hamilton, NZ"

In [20]:
for loc in locations_fix_dict.keys():
    parts = loc.split(',')
    # Exactly two parts, the second being a two-letter acronym with no leading space
    if len(parts) == 2 and len(parts[1]) == 2:
        locations_fix_dict[loc] = parts[0] + ', ' + parts[1].upper()
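As a quick illustration of the rule above, the following sketch applies the same test to a handful of sample strings. The helper name `fix_acronym_format` is hypothetical and is only used here to make the rule easy to try in isolation.

```python
def fix_acronym_format(location: str) -> str:
    """Return 'City, XX' for inputs shaped like 'City,xx'; leave others unchanged."""
    parts = location.split(',')
    # Exactly two parts, and the second is a two-character token with no leading space
    if len(parts) == 2 and len(parts[1]) == 2:
        return parts[0] + ', ' + parts[1].upper()
    return location


examples = ["Hamilton,nz", "Munich,de", "Austin, TX", "Beijin,China"]
fixed = {loc: fix_acronym_format(loc) for loc in examples}

for original, result in fixed.items():
    print(f"{original!r} -> {result!r}")
```

Note that "Austin, TX" is left untouched because its second part is `' TX'` (three characters, including the space), and "Beijin,China" is left untouched because "China" is not a two-letter acronym.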

### Fix of Some Extra Spacings

In [22]:
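for loc in locations_fix_dict.keys():
    # Start from the current fixed value so that the previous fixes are not overwritten
    locations_fix_dict[loc] = locations_fix_dict[loc].replace(' ,', ',')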

In [23]:
for loc in locations_fix_dict.keys():
    print(locations_fix_dict[loc])

Austin, TX
Wrocław, Poland
Innsbruck, Austria
Provence, France
Zakopane, Poland
Lisbon, Portugal
Lübeck, Germany
Poznań, Poland
Portland, Oregon, USA
Catania, Italy
Glasgow, UK
Las Vegas, NV, USA
Brno, Czech Republic
Chicago, IL, USA 
Berlin, Germany
Leuven, Belgium
Melbourne, Australia
Hong Kong, China
Pittsburgh, PA, USA
San Francisco, CA
Patras, Greece
Hefei, China
Hsinchu, Taiwan
New Hampshire, USA
New York City, NY, USA
Athens, Greece
Sydney, Australia
Wuhan, China
Santa Barbara, CA, USA
Seattle, WA, USA
Uppsala, Sweden
Washington, DC, USA
Tallinn, Estonia
Seattle, Washington, USA
Chicago, IL, USA
Bangkok, Thailand
Hyderabad, India
Santiago, Chile
Turku, Finland
Stirling, Scotland, UK
Versailles, France
Belfast, Northern Ireland, UK
Beijin,China
Monterey, CA, USA
Augsburg, Germany
Maputo, Mozambique
Brisbane, Australia
Xi'an, China
Oulu, Finland
Budapest, Hungary
Worcester, MA, USA
Zagreb, Croatia
Toulouse, France
Sendai, Japan
Bangalore, India
Aachen, Germany
Chicago, Illinois, U

### Filtering the Conference Name
There is a small number of cases where the location wrongly contains the conference name. We need to filter it out.

In [None]:
locations_fix_dict["ASIC, Chongqing, China"] = "Chongqing, China"
locations_fix_dict["ISCAS 1993, Chicago, Illinois, USA"] = "Chicago, Illinois, USA"
locations_fix_dict["WSTST'05, Muroran, Japan"] = "Muroran, Japan"
locations_fix_dict["ISCAS 2004, Vancouver, BC, Canada"] = "Vancouver, BC, Canada"
locations_fix_dict["York, UK / 2nd AAMAS 2002"] = "York, UK"
locations_fix_dict["WISEC'13, Budapest, Hungary"] = "Budapest, Hungary"
locations_fix_dict["COIN@AAMAS, Paris, France, Gold Coast, QLD, Australia"] = "Paris, France, Gold Coast, QLD, Australia"
locations_fix_dict["CSCL'07, New Brunswick, NJ, USA"] = "New Brunswick, NJ, USA"
locations_fix_dict["MASS 2019, Monterey, USA"] = "Monterey, USA"
locations_fix_dict["ICICS'97, Beijing, China"] = "Beijing, China"
locations_fix_dict["CAIP'95, Prague, Czech Republic"] = "Prague, Czech Republic"
locations_fix_dict["CAMS, Vilamoura, Portugal"] = "Vilamoura, Portugal"
locations_fix_dict["PoEM 2020, Riga, Latvia"] = "Riga, Latvia"
locations_fix_dict["PaCT-97, Yaroslavl, Russia"] = "Yaroslavl, Russia"
locations_fix_dict["COIN@AAMAS 2010, Toronto, Canada, Lyon, France"] = "Toronto, Canada, Lyon, France"
locations_fix_dict["CISIS-2018, Matsue, Japan"] = "Matsue, Japan"
locations_fix_dict["LCPC'99, La Jolla/San Diego, USA"] = "La Jolla/San Diego, USA"
locations_fix_dict["IEEE, Newark, NJ, USA"] = "Newark, NJ, USA"
locations_fix_dict["ICCBR-99, Germany"] = "Germany"
locations_fix_dict["CIRA'99, Monterey, California, USA"] = "Monterey, California, USA"
locations_fix_dict["SC11, Seattle, WA, USA"] = "Seattle, WA, USA"
locations_fix_dict["DOOD'89, Kyoto, Japan"] = "Kyoto, Japan"
locations_fix_dict["SIGUCCS'14, Salt Lake City, USA"] = "Salt Lake City, USA"
locations_fix_dict["ICACCI 2015, Kochi, India"] = "Kochi, India"
locations_fix_dict["SCIDOCA, Tsukuba, Tokyo, Japan"] = "Tsukuba, Tokyo, Japan"
locations_fix_dict["IEEE, Guadalajara, Jalisco, Mexico"] = "Guadalajara, Jalisco, Mexico"
locations_fix_dict["CBD, Suzhou, Chinao"] = "Suzhou, China"
locations_fix_dict["EvoSTOC, Istanbul, Turkey"] = "Istanbul, Turkey"
locations_fix_dict["DEXA, Zaragoza, Spain"] = "Zaragoza, Spain"
locations_fix_dict["IMC, Berlin, Germany"] = "Berlin, Germany"
locations_fix_dict["IMC, USA"] = "USA"
locations_fix_dict["ASIC, Chengdu, China"] = "Chengdu, China"
locations_fix_dict["ETAPS 2020, Dublin, Ireland"] = "Dublin, Ireland"
locations_fix_dict["MoMM 2015, Brussels, Belgium"] = "Brussels, Belgium"
locations_fix_dict["KES-2019, Budapest, Hungary"] = "Budapest, Hungary"
locations_fix_dict["SOSE 2020, Oxford, UK"] = "Oxford, UK"
locations_fix_dict["TPHOLs'96, Turku, Finland"] = "Turku, Finland"
locations_fix_dict["TMFCS'89, Poland"] = "Poland"
locations_fix_dict["HoloMAS 2007, Regensburg, Germany"] = "Regensburg, Germany"
locations_fix_dict["AIPR-07, Orlando, Florida, USA"] = "Orlando, Florida, USA"
locations_fix_dict["MoMM 2017, Salzburg, Austria"] = "Salzburg, Austria"
locations_fix_dict["RACS'13, Montreal, QC, Canada"] = "Montreal, QC, Canada"
locations_fix_dict["SAC'96, Philadelphia, PA, USA"] = "Philadelphia, PA, USA"
locations_fix_dict["IDA-99, Amsterdam, The Netherlands"] = "Amsterdam, The Netherlands"
locations_fix_dict["IIT Bombay, Mumbai, India"] = "Mumbai, India"
locations_fix_dict["FPL'99, Glasgow, UK"] = "Glasgow, UK"
locations_fix_dict["ASIC, Shenzhen, China"] = "Shenzhen, China"
locations_fix_dict["Eugene, OR, USA / 2nd IWOMP 2006"] = "Eugene, OR, USA"
locations_fix_dict["PaCT-99, St. Petersburg, Russia"] = "St. Petersburg, Russia"
locations_fix_dict["TBD, USA - United States of America"] = "USA"
locations_fix_dict["WCRE'01, Stuttgart, Germany"] = "Stuttgart, Germany"
locations_fix_dict["WI 2010, Toronto, Canada"] = "Toronto, Canada"
locations_fix_dict["SPAWC 2020, Atlanta, GA, USA"] = "Atlanta, GA, USA"
locations_fix_dict["DCGI'99, Marne-la-Vallee, France"] = "Marne-la-Vallee, France"
locations_fix_dict["IIT, Kharagpur, India"] = "Kharagpur, India"
locations_fix_dict["PASTE'02, Charleston, South Carolina, USA"] = "Charleston, South Carolina, USA"
locations_fix_dict["MFCS'91, Poland"] = "Poland"
locations_fix_dict["IMC 2004, Taormina, Sicily, Italy"] = "Taormina, Sicily, Italy"
locations_fix_dict["ACM, Tsukuba, Japan"] = "Tsukuba, Japan"
locations_fix_dict["CAMS, Montpellier, France"] = "Montpellier, France"
locations_fix_dict["HSCC'15, Seattle, WA, USA"] = "Seattle, WA, USA"
locations_fix_dict["MPC'95, Germany"] = "Germany"
locations_fix_dict["BigSpatial@SIGSPATIAL 2020, Seattle, WA, USA"] = "Seattle, WA, USA"
locations_fix_dict["MASS 2020, Delhi, India"] = "Delhi, India"
locations_fix_dict["ETAPS 2004, Barcelona, Spain"] = "Barcelona, Spain"
locations_fix_dict["WAIM, Chengdu, China"] = "Chengdu, China"
locations_fix_dict["NANOARCH 2018, Athens, Greece"] = "Athens, Greece"
locations_fix_dict["IIT Guwahati, India"] = "Guwahati, India"
locations_fix_dict["IIT Madras,  Chennai, India"] = "Chennai, India"
locations_fix_dict["WISE'01, Kyoto, Japan"] = "Kyoto, Japan"
locations_fix_dict["NBis 2015, Taipei, Taiwan"] = "Taipei, Taiwan"
locations_fix_dict["MFCS'92, Prague, Czechoslovakia"] = "Prague, Czechoslovakia"
locations_fix_dict["VLSI, Atlanta, GA, USA"] = "Atlanta, GA, USA"
locations_fix_dict["VLSI, Hong Kong, China"] = "Hong Kong, China"
locations_fix_dict["ICSC'95, Hong Kong"] = "Hong Kong"
locations_fix_dict["AI*IA'95, Florence, Italy"] = "Florence, Italy"
locations_fix_dict["DEXA'93, Prague, Czech Republic"] = "Prague, Czech Republic"
locations_fix_dict["KAIST, Taejeon, South Korea"] = "Taejeon, South Korea"
locations_fix_dict["IIT Madras, Chennai, Tamil Nadu, India"] = "Chennai, Tamil Nadu, India"
locations_fix_dict["PoEM 2017, Leuven, Belgium"] = "Leuven, Belgium"
locations_fix_dict["MFCS'90, Czechoslovakia"] = "Czechoslovakia"
locations_fix_dict["MFCS'94, Kosice, Slovakia"] = "Kosice, Slovakia"
locations_fix_dict["UKSim'11, Cambridge, UK"] = "Cambridge, UK"
locations_fix_dict["IST Austria, Klosterneuburg, Austria"] = "Klosterneuburg, Austria"
locations_fix_dict["ICGI-96, Montpellier, France"] = "Montpellier, France"
locations_fix_dict["INCoS-2017, Toronto, ON, Canada"] = "Toronto, ON, Canada"
locations_fix_dict["PASTE'07, San Diego, California, USA"] = "San Diego, California, USA"
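The entries above are written by hand. A minimal sketch of how candidate entries could be flagged for manual review is shown below. It assumes, as a heuristic that is not part of the original Notebook, that a leading token containing a digit or an `@` is more likely a conference label than a city name. The function name `looks_like_conference_prefix` is hypothetical.

```python
import re

# Hypothetical heuristic: a digit or an '@' in the first comma-separated token
# suggests a conference label such as "ISCAS 1993" or "COIN@AAMAS".
_CONFERENCE_TOKEN = re.compile(r"[0-9@]")


def looks_like_conference_prefix(location: str) -> bool:
    """Return True when the first comma-separated token looks like a conference label."""
    first_token = location.split(',')[0]
    return bool(_CONFERENCE_TOKEN.search(first_token))


sample = [
    "ISCAS 1993, Chicago, Illinois, USA",
    "WSTST'05, Muroran, Japan",
    "COIN@AAMAS, Paris, France, Gold Coast, QLD, Australia",
    "Berlin, Germany",
    "Xi'an, China",
]

# Only the first three sample strings are flagged
candidates = [loc for loc in sample if looks_like_conference_prefix(loc)]
print(candidates)
```

This only narrows down what to inspect: prefixes made of letters alone, such as "ASIC, Chongqing, China", are not detected and still require a manual entry.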

### Filter of the "- United States of America" and Other Special Cases
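A minimal sketch of how these suffixes could be removed, assuming that the text before the suffix already ends with the short country name (for example "USA - United States of America"). The helper `strip_country_suffix` and the suffix list are illustrative only, and the Netherlands example string is hypothetical.

```python
# Suffixes to remove from the end of a location string (assumed list)
SUFFIXES_TO_STRIP = [
    " - United States of America",
    " - United States",
    " - United Kingdom of Great Britain and Northern Ireland",
    " - Kingdom of the Netherlands",
]


def strip_country_suffix(location: str) -> str:
    """Remove a trailing ' - <long country name>' suffix, if present."""
    for suffix in SUFFIXES_TO_STRIP:
        if location.endswith(suffix):
            return location[: -len(suffix)]
    return location


print(strip_country_suffix("TBD, USA - United States of America"))
print(strip_country_suffix("Amsterdam, Netherlands - Kingdom of the Netherlands"))
print(strip_country_suffix("Berlin, Germany"))
```

Applied to the values of the support dictionary, this would leave every location without such a suffix unchanged.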

In [None]:
" - United States of America"
" - United States"
" - United Kingdom of Great Britain and Northern Ireland"
"Netherlands - Kingdom of the Netherlands"

### US, USA, U.S.A., U.S. and Other Special Cases

In [None]:
"U.S.A"

### United Kingdom, Great Britain, and Other Special Cases
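One possible way to handle the country variants named in this section and in the previous one is an alias table applied to the last comma-separated part of each location. The sketch below assumes "USA" and "UK" as the canonical labels; mapping "England" to "UK" is a simplifying assumption that discards information. The names `COUNTRY_ALIASES` and `normalize_country` are hypothetical.

```python
# Assumed alias table: variant -> canonical label
COUNTRY_ALIASES = {
    "US": "USA",
    "U.S.": "USA",
    "U.S.A": "USA",
    "U.S.A.": "USA",
    "GB": "UK",
    "United Kingdom": "UK",
    "England": "UK",
}


def normalize_country(location: str) -> str:
    """Replace the last comma-separated part with its canonical label, if known."""
    # Empty parts (e.g. from a trailing comma) are discarded
    parts = [part.strip() for part in location.split(',') if part.strip()]
    if not parts:
        return location
    parts[-1] = COUNTRY_ALIASES.get(parts[-1], parts[-1])
    return ', '.join(parts)


print(normalize_country("San Francisco, U.S.A"))
print(normalize_country("London (Guildford), United Kingdom"))
print(normalize_country("Glasgow, UK"))
```

As a side effect, re-joining the parts also normalizes the spacing after commas and drops trailing commas.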

In [None]:
"UK"
"GB"
"United Kingdom"
"England"

### Fix of Some Special Cases

In [13]:
locations_fix_dict["Lyon,\xa0France"] = "Lyon, France"
locations_fix_dict["Workshops, Montreal, QC, Canada"] = "Montreal, QC, Canada"
locations_fix_dict[", USA"] = "USA"
locations_fix_dict["NEW ORLEANS, USA"] = "New Orleans, USA"
locations_fix_dict["NOIDA, India"] = "Noida, India"
locations_fix_dict["Orland, Florida, U.S.A."] = "Orland, Florida, USA"
locations_fix_dict["CANCUN, Mexico"] = "Cancun, Mexico"
locations_fix_dict["Auckland, New Zealand, 8-12 August 2016"] = "Auckland, New Zealand"
locations_fix_dict["IOWA STATE UNIVERSITY, USA"] = "Iowa State University, USA"
locations_fix_dict["No.1, Dai Co Viet Rd, Hanoi, Vietnam"] = "Dai Co Viet Rd, Hanoi, Vietnam"
locations_fix_dict["GUANGZHOU,CHINA"] = "Guangzhou, China"
locations_fix_dict["Guilin,Guangxi, ChinaA"] = "Guilin, Guangxi, China"
locations_fix_dict["Gyeongju, Republic of Korea - March"] = "Gyeongju, Republic of Korea"
locations_fix_dict["Pune, INDIA"] = "Pune, India"
locations_fix_dict["International, Athens, Greece"] = "Athens, Greece"
locations_fix_dict["Tokyo, JAPAN"] = "Tokyo, Japan"
locations_fix_dict["Bangkok, THAILAND"] = "Bangkok, Thailand"
locations_fix_dict["Harbin,China"] = "Harbin, China"
locations_fix_dict["Washington, D. C., USA"] = "Washington D.C., USA"
locations_fix_dict["Funchal, Madeira - Portugal"] = "Funchal, Madeira, Portugal"
locations_fix_dict["Kuantan, Pahang, MALAYSIA"] = "Kuantan, Pahang, Malaysia"
locations_fix_dict["LEIPZIG, GERMANY"] = "Leipzig, Germany"
locations_fix_dict["THESSALONIKI, GREECEY"] = "Thessaloniki, Greece"
locations_fix_dict["Phoenix Park, PyeongChang,, Korea (South)"] = "Phoenix Park, PyeongChang, Korea (South)"
locations_fix_dict["EvoFIN, EvoSTOC, Germany"] = "Germany"
locations_fix_dict["Prague,"] = "Prague"
locations_fix_dict[", York, UK"] = "York, UK"
locations_fix_dict["Royal Continental Hotel,Naples, Italy"] = "Naples, Italy"
locations_fix_dict["Puebla, MEXICO"] = "Puebla, Mexico"
locations_fix_dict["Jun 16-20, 2008"] = ""
locations_fix_dict["Taipei, Taiwan, August 29-31, 2012."] = "Taipei, Taiwan"
locations_fix_dict["YORK, UK"] = "York, UK"
locations_fix_dict["Kuala Lumpur, Malaysia."] = "Kuala Lumpur, Malaysia"
locations_fix_dict["Vienna University of Technology, Vienna"] = "Vienna, Austria"
locations_fix_dict["Hammamet,Tunisia"] = "Hammamet, Tunisia"
locations_fix_dict["MIT, Cambridge, USA"] = "Cambridge, USA"
locations_fix_dict["Cumbria, United, Kngdm"] = "Cumbria, UK"
locations_fix_dict["Hilton Hotel Cyprus, Nicosia"] = "Nicosia, Cyprus"
locations_fix_dict["changsha, China"] = "Changsha, China"
locations_fix_dict["PADERBORN, GERMANY"] = "Paderborn, Germany"
locations_fix_dict["DA NANG, Vietnam"] = "Da Nang, Vietnam"
locations_fix_dict["Durham, NC USA"] = "Durham, NC, USA"
locations_fix_dict["International, Mykonos Island, Greece"] = "Mykonos Island, Greece"
locations_fix_dict["GUNTUR, Vijayawada, PIN 622510,in"] = "Vijayawada, IN"
locations_fix_dict["Bolzano-Bozen, Italy"] = "Bolzano, Italy"
locations_fix_dict["Providence, RI,"] = "Providence, RI"
locations_fix_dict["Adisaptagram, Hooghly - 712121, India"] = "Adisaptagram, Hooghly, India"
locations_fix_dict["Alexandria, Virginia, U.S."] = "Alexandria, Virginia, USA"
locations_fix_dict["guilin, china"] = "Guilin, China"
locations_fix_dict["Washington, D.C. (USA)"] = "Washington D.C., USA"
locations_fix_dict["San, Diego, CA, USA"] = "San Diego, CA, USA"
locations_fix_dict["Bad Herrenalb near Karlsruhe, Germany"] = "Karlsruhe, Germany"
locations_fix_dict["Kinsdale,"] = "Kinsdale"
locations_fix_dict["Bhubaneswar,India."] = "Bhubaneswar, India"
locations_fix_dict["Florence, ITALY"] = "Florence, Italy"
locations_fix_dict["Munich,de"] = "Munich, DE"
locations_fix_dict["Crete, GREECE"] = "Crete, Greece"
locations_fix_dict["Montreal, QC, CANADA"] = "Montreal, QC, Canada"
locations_fix_dict["Beijing, People's Republic of China"] = "Beijing, China"
locations_fix_dict["Ceske Budejovice,cz"] = "Ceske Budejovice, CZ"
locations_fix_dict["MEXICO CITY, Mexico"] = "Mexico City, Mexico"
locations_fix_dict["DARMSTADT, Germany."] = "Darmstadt, Germany"
locations_fix_dict["singapore, Singapore"] = "Singapore, Singapore"
locations_fix_dict["St.-Petersburg, Russia"] = "St. Petersburg, Russia"
locations_fix_dict["Suwon, Korea,"] = "Suwon, Korea"
locations_fix_dict["Curium Palace Hotel, Limassol, Cyprus"] = "Limassol, Cyprus"
locations_fix_dict["Vilanova i la Geltru, Barcelona, Spain"] = "Barcelona, Spain"
locations_fix_dict["Vancouver Convention Center, Vancouver CANADA "] = "Vancouver, Canada"
locations_fix_dict["Denver,CO,USA"] = "Denver, CO, USA"
locations_fix_dict["San Francisco, U.S.A"] = "San Francisco, USA"
locations_fix_dict["Chiang Mai,, Thailand"] = "Chiang Mai, Thailand"
locations_fix_dict["DIVANI PALACE ACROPOLIS Athens, Greece"] = "Athens, Greece"
locations_fix_dict["Greenwich, London (UK)"] = "London, UK"
locations_fix_dict["Madrid,Spain"] = "Madrid, Spain"
locations_fix_dict["Chongqing,China"] = "Chongqing, China"
locations_fix_dict["Training, Atlanta, GA, USA"] = "Atlanta, GA, USA"
locations_fix_dict["denver, CA, USA"] = "Denver, CA, USA"
locations_fix_dict["HANGZHOU, PEOPLE'S REPUBLIC OF CHINA"] = "Hangzhou, China"
locations_fix_dict["Gurgaon (near New Delhi), India"] = "New Delhi, India"
locations_fix_dict["Portland, Oregon, June 18-19, 2015"] = "Portland, Oregon"
locations_fix_dict["UK, Guildford, United Kingdom"] = "Guildford, UK"
locations_fix_dict["London (Guildford), United Kingdom"] = "London, UK"
locations_fix_dict["MIT, Cambridge, U.S.A"] = "Cambridge, USA"
locations_fix_dict["54 on Bath, Rosebank, Johannesburg, South Africa"] = "Rosebank, Johannesburg, South Africa"
locations_fix_dict["hONOLULU, hAWAII"] = "Honolulu, Hawaii"
locations_fix_dict["Hefei, P.R.China"] = "Hefei, China"
locations_fix_dict["National Ilan Unviersity, I-Lan, Taiwan"] = "I-Lan, Taiwan"
locations_fix_dict["Galt House Hotel, Louisville, Kentucky, USA - United States"] = "Louisville, Kentucky, USA"
locations_fix_dict["HIROSHIMA, JAPAN"] = "Hiroshima, Japan"

### Correction of the Locations in the Original Dataframe
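This step could be sketched as follows, using small stand-in data in place of the real dataframes and of `locations_fix_dict`. The helper `apply_location_fixes` is hypothetical. Locations that are not keys of the dictionary (for example the state-only ones filtered out earlier) are kept unchanged.

```python
import pandas as pd

# Stand-in data: the real inputs would be the dataframes and the dictionary built above
df_demo = pd.DataFrame({
    "Doi": ["10.1/a", "10.1/b", "10.1/c"],
    "ConferenceLocation": ["Harbin,China", "Berlin, Germany", "Poland"],
})
fix_demo = {"Harbin,China": "Harbin, China", "Berlin, Germany": "Berlin, Germany"}


def apply_location_fixes(df: pd.DataFrame, fixes: dict) -> pd.DataFrame:
    """Return a copy of df with ConferenceLocation rewritten through fixes."""
    fixed = df.copy()
    # map() yields NaN for locations missing from the dictionary...
    mapped = fixed["ConferenceLocation"].map(fixes)
    # ...so those rows fall back to their original value
    fixed["ConferenceLocation"] = mapped.fillna(fixed["ConferenceLocation"])
    return fixed


df_demo_fixed = apply_location_fixes(df_demo, fix_demo)
print(df_demo_fixed)
```

Since both dataframes contain the same papers and locations, the same call could be applied to `df_citations_and_locations` and to `df_citations_by_year_and_locations`.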

### Filter of the Papers that Only Have the Conference State (But Not the Cities)

In [10]:
len(locations_list)

5918

## Read of the DBLP + MAG CSV Joined Dump

In [4]:
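# combine_with_partial_csv and partial_csv_path are not defined in the cells shown here:
# they must be set before running this cell
if combine_with_partial_csv:
    new_df_joined_partial = pd.read_csv(partial_csv_path + 'out_citations_by_year_and_conferences.csv', low_memory=False, index_col=[0])
    print(f'Successfully Imported the Partial CSV')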

df_joined = pd.read_csv(path_file_export + 'out_dblp_and_mag_joined.csv', low_memory=False, index_col=[0])
print(f'Successfully Imported the DBLP + MAG CSV')

Successfully Imported the Partial CSV
Successfully Imported the DBLP + MAG CSV


## Data Preparation

### Creation of the Support Dataframe
It's going to help us extract the year of each citation.

In [5]:
# Drop of the useless mag citations column
df_joined = df_joined.drop(columns=['CitationCount_Mag', 'CitationCount_MagEstimated'])

We need to create the columns that are going to contain the citations obtained by a paper during a specific year. The support dataframe is also needed to filter out the COCI papers that are contained in neither MAG nor DBLP.

In [6]:
df_support_empty = df_joined.copy()

# Drop of the useless column
df_support_empty = df_support_empty.drop(columns=['ConferenceLocation', 'ConferenceNormalizedName', 'ConferenceTitle', 'OriginalTitle'])

# Creation of the support column
df_support_empty['Year_of_Citation'] = np.nan
df_support_empty.rename(columns={'Year': 'Year_of_Publication'}, inplace=True)
df_support_empty = df_support_empty.reindex(sorted(df_support_empty.columns), axis=1)

df_support_empty.loc[:5]

Unnamed: 0,Doi,Year_of_Citation,Year_of_Publication
0,10.1007/978-3-662-45174-8_28,,2014
1,10.1007/978-3-662-44777-2_60,,2014
2,10.1007/978-3-319-03973-2_13,,2013
3,10.1007/3-540-46146-9_77,,2002
4,10.1007/11785231_94,,2006
5,10.1007/978-3-642-22095-1_80,,2011


### Adding the Year Citation Columns to the Original Dataframe

In [7]:
start_year = 1950 # Probably there aren't citations before this date. We'll drop the empty columns later
actual_year = date.today().year

if not combine_with_partial_csv:
    for i in range(start_year, actual_year + 1):
        df_joined[str(i)] = 0
else:
    # We're going to use the partial joined dataframe
    # The original dataframe was only needed for the creation of the support dataframe structure
    df_joined = new_df_joined_partial.copy()
    new_df_joined_partial = None

df_joined.loc[:3]

Unnamed: 0,ConferenceLocation,ConferenceNormalizedName,ConferenceTitle,Doi,OriginalTitle,Year,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,"Austin, TX",disc 2014,Distributed Computing - 28th International Sym...,10.1007/978-3-662-45174-8_28,The Adaptive Priority Queue with Elimination a...,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,2,1,0,0
1,"Wrocław, Poland",esa 2014,Algorithms - ESA 2014 - 22th Annual European S...,10.1007/978-3-662-44777-2_60,Document Retrieval on Repetitive Collections,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0
2,"Innsbruck, Austria",enter 2013,Information and Communication Technologies in ...,10.1007/978-3-319-03973-2_13,SoCoMo Marketing for Travel and Tourism,2013,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,2,0,1,1,0,0
3,"Provence, France",dexa 2002,"Database and Expert Systems Applications, 13th...",10.1007/3-540-46146-9_77,Similarity Image Retrieval System Using Hierar...,2002,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0


## Read and Join of the COCI Dump

In [8]:
# Get All Files' Names
coci_all_csvs = glob.glob(path_file_import + "*.csv")

In [9]:
count = 1
tot_csvs = len(coci_all_csvs)

for current_csv_name in coci_all_csvs:

    # Empty the support dataframe
    df_support = df_support_empty.copy()

    # Open the current CSV
    print(f'Currently processing CSV {count} ({tot_csvs} total): {current_csv_name}')
    count += 1
    df_coci_current_csv = pd.read_csv(current_csv_name, low_memory=False)

    # Drop of the useless columns: 'oci', 'citing', 'creation', 'journal_sc', 'author_sc'
    df_coci_current_csv = df_coci_current_csv.drop(columns=['oci', 'citing', 'creation', 'journal_sc', 'author_sc'])

    # Column rename
    df_coci_current_csv = df_coci_current_csv.rename(columns={'cited': 'Doi'})

    # Making sure that everything has the same format
    df_coci_current_csv.Doi = df_coci_current_csv.Doi.str.lower()

    # Join with the support dataframe
    df_support = pd.merge(df_support, df_coci_current_csv, on=['Doi'], how='inner')

    # Filtering the rows with a negative timespan
    df_support.timespan = df_support["timespan"].astype(str)
    df_support = df_support[~df_support["timespan"].str.contains('-')]

    # Computing the citation's year
    df_support.Year_of_Citation = df_support.timespan.str.split('Y').str[0].str.split('P').str[1]
    df_support = df_support.dropna(subset=['Year_of_Citation']) # Drop of the broken records
    df_support.Year_of_Citation = df_support.Year_of_Citation.astype(int) + df_support.Year_of_Publication.astype(int)

    # Removing the broken records
    df_support = df_support.loc[(df_support['Year_of_Citation'] <= actual_year)] # Keeping only year <= actual year
    df_support = df_support.loc[(df_support['Year_of_Citation'] >= start_year)] # Keeping only year >= 1950

    # Reshaping the dataframe and resetting its index
    df_support_reshaped = pd.crosstab(df_support.Doi, df_support.Year_of_Citation)
    df_support_reshaped = df_support_reshaped.reset_index()

    # Fixing the column name type
    df_support_reshaped.columns = df_support_reshaped.columns.astype(str)

    # Join with the original dataframe
    df_joined = pd.merge(df_joined, df_support_reshaped, on=['Doi'], how='left')

    # Sum of the citation counts values
    for column in df_joined:
        if '_x' in str(column):
            coci_column = str(column).split('_x')[0] + '_y'

            # Replacing nan with zeros in the coci rows that didn't match
            df_joined[coci_column] = df_joined[coci_column].fillna(0).astype(int)

            # Column sum
            df_joined[column] += df_joined[coci_column]
            
            # Column rename and drop
            df_joined.rename(columns = {column: str(column).split('_x')[0]}, inplace=True)
            df_joined = df_joined.drop(columns=[coci_column])

Currently processing CSV 1 (25 total): /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/COCI_RAW/2022-03-15T025630_2_1.csv
Currently processing CSV 2 (25 total): /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/COCI_RAW/2022-01-21T100308_0_1.csv
Currently processing CSV 3 (25 total): /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/COCI_RAW/2022-01-21T100308_2_1.csv
Currently processing CSV 4 (25 total): /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/COCI_RAW/2022-03-15T025630_0_1.csv
Currently processing CSV 5 (25 total): /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/
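The per-chunk steps of the loop above (dropping negative timespans, deriving the citation year, and reshaping with `pd.crosstab`) can be tried on a toy dataframe. The sketch below uses stand-in data and assumes that the `timespan` values follow the `PnYnMnD` duration format with an optional leading `-`. It extracts the years with a regular expression instead of the chained `split` calls, which is a swapped-in technique and not the Notebook's own code.

```python
import pandas as pd

# Stand-in for one COCI chunk already merged with the support dataframe
df_demo = pd.DataFrame({
    "Doi": ["10.1/a", "10.1/a", "10.1/a", "10.1/b", "10.1/b"],
    "Year_of_Publication": [2014, 2014, 2014, 2013, 2013],
    "timespan": ["P1Y2M", "P1Y", "P3Y0M5D", "-P0Y6M", "P0Y"],
})

# Drop the rows with a negative timespan
df_demo = df_demo[~df_demo["timespan"].str.startswith("-")].copy()

# Extract the number of years from the duration; rows without a year part become NaN
years = df_demo["timespan"].str.extract(r"^P(\d+)Y", expand=False)
df_demo = df_demo[years.notna()].copy()
df_demo["Year_of_Citation"] = years.dropna().astype(int) + df_demo["Year_of_Publication"]

# One row per DOI, one column per citation year, values are citation counts
per_year = pd.crosstab(df_demo["Doi"], df_demo["Year_of_Citation"])
per_year.columns = per_year.columns.astype(str)
per_year = per_year.reset_index()

print(per_year)
```

For the stand-in data, `10.1/a` ends up with two citations in 2015 and one in 2017, while `10.1/b` has a single citation in 2013 because its negative timespan was discarded.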

In [10]:
df_joined

Unnamed: 0,ConferenceLocation,ConferenceNormalizedName,ConferenceTitle,Doi,OriginalTitle,Year,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,"Austin, TX",disc 2014,Distributed Computing - 28th International Sym...,10.1007/978-3-662-45174-8_28,The Adaptive Priority Queue with Elimination a...,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,2,1,2,0
1,"Wrocław, Poland",esa 2014,Algorithms - ESA 2014 - 22th Annual European S...,10.1007/978-3-662-44777-2_60,Document Retrieval on Repetitive Collections,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0
2,"Innsbruck, Austria",enter 2013,Information and Communication Technologies in ...,10.1007/978-3-319-03973-2_13,SoCoMo Marketing for Travel and Tourism,2013,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,2,0,1,1,0,0
3,"Provence, France",dexa 2002,"Database and Expert Systems Applications, 13th...",10.1007/3-540-46146-9_77,Similarity Image Retrieval System Using Hierar...,2002,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
4,"Zakopane, Poland",icaisc 2006,Artificial Intelligence and Soft Computing - I...,10.1007/11785231_94,Leukemia prediction from gene expression data—...,2006,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,3,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4988315,"Thessaloniki, Greece",sapere 2011,Philosophy and Theory of Artificial Intelligen...,10.1007/978-3-642-31674-6_9,,2011,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0
4988316,"Thessaloniki, Greece",sapere 2011,Philosophy and Theory of Artificial Intelligen...,10.1007/978-3-642-31674-6_20,,2011,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,2,0,0,0
4988317,"Thessaloniki, Greece",sapere 2011,Philosophy and Theory of Artificial Intelligen...,10.1007/978-3-642-31674-6_25,,2011,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0
4988318,"Thessaloniki, Greece",sapere 2011,Philosophy and Theory of Artificial Intelligen...,10.1007/978-3-642-31674-6_12,,2011,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Writing the Final CSV to Disk

Save the resulting dataframe to disk in CSV format.

In [11]:
# Write the resulting CSV to disk
df_joined.to_csv(path_file_export + 'out_citations_by_year_and_conferences.csv')
print(f'Successfully Exported the Joined CSV to {path_file_export}out_citations_by_year_and_conferences.csv')

Successfully Exported the Joined CSV to /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/out_citations_by_year_and_conferences.csv


Reload the exported CSV to verify that the export succeeded.

In [12]:
# Reload the exported CSV as a check
df_joined_exported_csv = pd.read_csv(path_file_export + 'out_citations_by_year_and_conferences.csv', low_memory=False, index_col=[0])
df_joined_exported_csv

Unnamed: 0,ConferenceLocation,ConferenceNormalizedName,ConferenceTitle,Doi,OriginalTitle,Year,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,"Austin, TX",disc 2014,Distributed Computing - 28th International Sym...,10.1007/978-3-662-45174-8_28,The Adaptive Priority Queue with Elimination a...,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,2,1,2,0
1,"Wrocław, Poland",esa 2014,Algorithms - ESA 2014 - 22th Annual European S...,10.1007/978-3-662-44777-2_60,Document Retrieval on Repetitive Collections,2014,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0
2,"Innsbruck, Austria",enter 2013,Information and Communication Technologies in ...,10.1007/978-3-319-03973-2_13,SoCoMo Marketing for Travel and Tourism,2013,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,2,0,1,1,0,0
3,"Provence, France",dexa 2002,"Database and Expert Systems Applications, 13th...",10.1007/3-540-46146-9_77,Similarity Image Retrieval System Using Hierar...,2002,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
4,"Zakopane, Poland",icaisc 2006,Artificial Intelligence and Soft Computing - I...,10.1007/11785231_94,Leukemia prediction from gene expression data—...,2006,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,3,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4988315,"Thessaloniki, Greece",sapere 2011,Philosophy and Theory of Artificial Intelligen...,10.1007/978-3-642-31674-6_9,,2011,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0
4988316,"Thessaloniki, Greece",sapere 2011,Philosophy and Theory of Artificial Intelligen...,10.1007/978-3-642-31674-6_20,,2011,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,2,0,0,0
4988317,"Thessaloniki, Greece",sapere 2011,Philosophy and Theory of Artificial Intelligen...,10.1007/978-3-642-31674-6_25,,2011,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0
4988318,"Thessaloniki, Greece",sapere 2011,Philosophy and Theory of Artificial Intelligen...,10.1007/978-3-642-31674-6_12,,2011,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
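
The visual comparison above could also be done programmatically. The sketch below is one possible approach under stated assumptions: the helper `csv_round_trip_matches` and the toy dataframe are hypothetical, and an in-memory buffer stands in for the file on disk. It relies on `pd.testing.assert_frame_equal`, which raises an `AssertionError` when the two frames differ.

```python
import io

import pandas as pd


def csv_round_trip_matches(df: pd.DataFrame) -> bool:
    """Return True if df survives a to_csv / read_csv round trip unchanged.

    Hypothetical helper; an in-memory buffer replaces the exported file.
    """
    buffer = io.StringIO()
    df.to_csv(buffer)
    buffer.seek(0)
    # Same read options as the check cell above
    reloaded = pd.read_csv(buffer, low_memory=False, index_col=[0])
    try:
        # check_names=False ignores the index/columns `names` attribute
        pd.testing.assert_frame_equal(df, reloaded, check_names=False)
    except AssertionError:
        return False
    return True


# Toy frame with the same kinds of columns as the export (no missing values)
toy = pd.DataFrame({
    'Doi': ['10.1000/a', '10.1000/b'],
    'Year': [2014, 2011],
    '2020': [2, 0],
})
round_trip_ok = csv_round_trip_matches(toy)
print(round_trip_ok)
```

On the real dataframe, missing values (such as the empty `OriginalTitle` entries) and mixed column types may need extra care before a strict comparison like this one passes.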
