# GRIN Conference Rating Integration

Jupyter Notebook for the processing and integration of the GRIN Conference Rating.

The GRIN Rating is an initiative sponsored by GII (Group of Italian Professors of Computer Engineering), GRIN (Group of Italian Professors of Computer Science), and SCIE (Spanish Computer-Science Society).

The GRIN Rating is provided in XLSX format.
____________________________________________________________

For this process, the following files are needed: ```out_citations_and_conferences_location_ready_v2.csv``` and ```GII-GRIN-SCIE-Conference-Rating-24-ott-2021-9.17.09-Output.xlsx```.

The first one must be generated by running the notebook ```1 - Citation and Locations Dataset Preparation.ipynb```, which is contained in the same folder as this notebook.<br>
The second one can be downloaded from the [GRIN website](http://www.consorzio-cini.it/gii-grin-scie-rating.html). **Note**: the name of the file could change in the future.
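
Since the file name may change, one option is to look the file up by pattern instead of hard-coding it. The following is a minimal sketch, not part of the original notebook: the helper `find_latest_grin_file` and the glob pattern are assumptions based on the current file name.

```python
from pathlib import Path
from typing import Optional


def find_latest_grin_file(
    import_dir: str,
    pattern: str = "GII-GRIN-SCIE-Conference-Rating-*.xlsx",  # assumed naming scheme
) -> Optional[Path]:
    """Return the most recently modified file matching the pattern, or None."""
    candidates = list(Path(import_dir).glob(pattern))
    # Pick the newest candidate by modification time; None if nothing matches.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

If the helper returns `None`, the file is probably missing from the import directory or the naming scheme has changed.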

In particular, the following operations are executed:
* Read the GRIN XLSX file
* Open the CSV dataset of conference citations and locations
* Drop the GRIN columns that are not needed
* Filter out the conferences without a rating
* Extract the distinct conference series names from the conference citations and locations dataset
* Join the distinct conference series names with the GRIN ratings

Lastly, the processed dataset is saved to disk in CSV format.

In [1]:
# Libraries Import
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)

## File Paths
Please set your working directory paths.

In [2]:
# ******************* PATHS ********************

# Dumps Directory Path
path_file_import = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/'

# CSV Exports Directory Path
path_file_export = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/'
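
The cells below build file paths by string concatenation, which only works if each directory path ends with a slash. As an alternative, here is a minimal sketch using `pathlib`; the base directory `work_dir` is a hypothetical placeholder, not a path from the original notebook.

```python
from pathlib import Path

# Hypothetical base directory: replace it with your own working directory.
base_dir = Path("work_dir")
path_import = base_dir / "Import"
path_export = base_dir / "Export"

# The "/" operator inserts the separator, so a missing trailing slash is harmless.
citations_csv = path_export / "out_citations_and_conferences_location_ready_v2.csv"
print(citations_csv)
```

`pd.read_csv` and `DataFrame.to_csv` accept `Path` objects directly, so no conversion to string is needed.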

## Selection of the XLSX File and of the Sheet

Specify the XLSX file name (it is appended to the import directory path set above):

In [3]:
grin_file_name = "GII-GRIN-SCIE-Conference-Rating-24-ott-2021-9.17.09-Output.xlsx"

Select the sheet that you want to use:

**Note**: there should be only one sheet, but things could change in the future

In [4]:
sheet_name = "GII-GRIN-SCIE-Conference-Rating"
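
If the workbook layout changes, the sheet selection could be made a little more defensive. The sketch below is an assumption, not part of the original notebook: `choose_sheet` is a hypothetical helper that takes the list of available sheet names (which pandas exposes as `pd.ExcelFile(path).sheet_names`) and falls back to the only sheet when the preferred name is missing.

```python
from typing import Sequence


def choose_sheet(available: Sequence[str], preferred: str) -> str:
    """Pick the preferred sheet, or the only sheet if there is exactly one."""
    if preferred in available:
        return preferred
    if len(available) == 1:
        # The workbook has a single sheet: use it even if it was renamed.
        return available[0]
    raise ValueError(f"Sheet {preferred!r} not found among {list(available)}")


# Example with hard-coded names; in the notebook the list would come from
# pd.ExcelFile(path_file_import + grin_file_name).sheet_names
print(choose_sheet(["GII-GRIN-SCIE-Conference-Rating"], "GII-GRIN-SCIE-Conference-Rating"))
```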

## Data Import

### Reading the Conference Citations and Locations Dataset

In [5]:
df_citations_and_locations = pd.read_csv(path_file_export + 'out_citations_and_conferences_location_ready_v2.csv', low_memory=False, index_col=[0])
print('Successfully Imported the Conference Citations and Locations Ready CSV')

Successfully Imported the Conference Citations and Locations Ready CSV


In [6]:
df_citations_and_locations.head(3)

Unnamed: 0,CitationCount_COCI,CitationCount_Mag,CitationCount_MagEstimated,ConferenceLocation,ConferenceNormalizedName,Doi,Year
0,10,12,12,"Austin, Texas, United States",disc 2014,10.1007/978-3-662-45174-8_28,2014
1,5,10,10,"Wrocław, Lower Silesian Voivodeship, Poland",esa 2014,10.1007/978-3-662-44777-2_60,2014
2,11,20,20,"Innsbruck, Tyrol, Austria",enter 2013,10.1007/978-3-319-03973-2_13,2013


### Read of the GRIN Rating File

Note: the first row of the sheet is an extra header row that carries no data, so it is skipped (`skiprows=1`).

In [7]:
df_grin_xls = pd.read_excel(io=path_file_import + grin_file_name, sheet_name=sheet_name, dtype=str, skiprows=1)

Here you can check the imported XLSX to be sure that the data types are correct:

In [8]:
df_grin_xls.head(5)

Unnamed: 0,0,Title,Acronym,GGS Class,GGS Rating,Qualified Classes,Collected Classes,All Qualified Classes,MA-Max. Field Rating,MA-Best Rank Field Rating,MA-Best Class Field Rating,MA-All Field Ratings,MA-All Ranks Field Rating,MA-All Classes Field Rating,MA-Max. Avg. Citations,MA-Best Rank Avg. Citations,MA-Best Class Avg. Citations,MA-All Avg. Citations,MA-All Ranks Avg. Citations,MA-All Classes Avg. Citations,MA-Max. Num. of Papers,MA-Max. Citations,MA-All Num. of Papers,MA-All Citations,LiveSHINE-Max. H-Index,LiveSHINE-Best Rank H-Index,LiveSHINE-Best Class H-Index,LiveSHINE-All H-Indexes,LiveSHINE-All Ranks H-Index,LiveSHINE-All Classes H-Index,LiveSHINE-Max. Avg. Citations,LiveSHINE-Best Rank Avg. Citations,LiveSHINE-Best Class Avg. Citations,LiveSHINE-All Avg. Citations,LiveSHINE-All Ranks Avg. Citations,LiveSHINE-All Classes Avg. Citations,LiveSHINE-Max. Num. of Papers,LiveSHINE-Max. Citations,LiveSHINE-All Num. of Papers,LiveSHINE-All Citations,CORE-Best Class,CORE-All Classes,Num. of Input Records
0,0,"INTERNATIONAL CONFERENCE ON 3D IMAGING, MODELI...",3DIMPVT,Work in Progress,Work in Progress,MA:B-,B-,MA:[C|A],27,1230,C,27,1230,C,1988,314,A,1988,314,A,129,2565,129,2565,,,,,,,,,,,,,,,,,,,1
1,1,"INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING,...",3DPVT,Work in Progress,Work in Progress,MA:B-,B-,MA:[C|A],41,654,C,41,654,C,1954,321,A,1954,321,A,436,8518,436,8518,,,,,,,,,,,,,,,,,,,1
2,2,"3DTV-CONFERENCE: THE TRUE VISION - CAPTURE, TR...",3DTV-CON,Work in Progress,Work in Progress,MA:C,C,MA:[C|C],29,1098,C,29,1098,C,744,1034,C,744,1034,C,555,4128,555,4128,,,,,,,,,,,,,,,,,,,1
3,3,IEEE SYMPOSIUM ON 3D USER INTERFACES,3DUI,Work in Progress,Work in Progress,"LiveSHINE:B, MA:C","B, C","LiveSHINE:[B|C], MA:[C|B]",39,706,C,39,706,C,121,612,B,121,612,B,548,6631,548,6631,32.0,379.0,B,32.0,379.0,B,911.0,509.0,C,911.0,509.0,C,418.0,3808.0,418.0,3808.0,,,2
4,4,INTERNATIONAL CONFERENCE ON 3D VISION,3DV,Work in Progress,Work in Progress,MA:A-,A-,MA:[B|A],46,540,B,46,540,B,2046,305,A,2046,305,A,670,13710,670,13710,,,,,,,,,,,,,,,,,,,1


## GRIN Dataframe Cleanup

### Drop of the Useless Columns

In [9]:
# Keep only the acronym, class and rating columns; drop everything else
df_grin_xls.drop(df_grin_xls.columns.difference(['Acronym', "GGS Class", "GGS Rating"]), axis=1, inplace=True)
df_grin_xls.head(5)

Unnamed: 0,Acronym,GGS Class,GGS Rating
0,3DIMPVT,Work in Progress,Work in Progress
1,3DPVT,Work in Progress,Work in Progress
2,3DTV-CON,Work in Progress,Work in Progress
3,3DUI,Work in Progress,Work in Progress
4,3DV,Work in Progress,Work in Progress


In [10]:
df_grin_xls.tail(5)

Unnamed: 0,Acronym,GGS Class,GGS Rating
2391,,Work in Progress,Work in Progress
2392,,Work in Progress,Work in Progress
2393,,Work in Progress,Work in Progress
2394,,Work in Progress,Work in Progress
2395,,Work in Progress,Work in Progress


### Filter of the Invalid Rows
We're going to remove the rows whose rating is "Work in Progress" or "Not Rated", as well as the rows that don't contain the conference acronym.

In [11]:
df_grin_xls = df_grin_xls[df_grin_xls["GGS Rating"].str.contains("Work in Progress") == False]
df_grin_xls = df_grin_xls[df_grin_xls["GGS Rating"].str.contains("Not Rated") == False]
df_grin_xls = df_grin_xls.dropna(subset=['Acronym'])

# reset of the index
df_grin_xls = df_grin_xls.reset_index(drop=True)

df_grin_xls.head(5)

Unnamed: 0,Acronym,GGS Class,GGS Rating
0,AAAI,1,A++
1,AAMAS,2,A
2,ACCV,3,B
3,DIMEA,3,B-
4,ACII,3,B-
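
The same filter can also be written as a single boolean mask. The sketch below runs on a toy frame (the rows are illustrative, not taken from the GRIN file) and is one possible formulation, not the notebook's own code. It assumes that a missing rating should be treated as unrated, which is what the `== False` comparison in the cell above does as well.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Acronym": ["AAAI", "3DV", None, "EXAMPLE"],
    "GGS Class": ["1", "Work in Progress", "2", "Not Rated"],
    "GGS Rating": ["A++", "Work in Progress", "A", "Not Rated"],
})

# True for rows without a usable rating; na=True also flags missing ratings.
unrated = df["GGS Rating"].str.contains("Work in Progress|Not Rated", na=True)

# Keep rated rows that also have an acronym, then rebuild the index.
clean = df[~unrated].dropna(subset=["Acronym"]).reset_index(drop=True)
print(clean)
```

Only the `AAAI` row survives: two rows are unrated and one has no acronym.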


## Extraction of the Distinct Conference Series from the Conference Citations and Locations Dataset

In [14]:
df_conference_series = df_citations_and_locations.drop_duplicates(subset="ConferenceSeriesNormalizedName")

# keep only the conference series name column
df_conference_series = df_conference_series.drop(df_conference_series.columns.difference(["ConferenceSeriesNormalizedName"]), axis=1)

# reset of the index
df_conference_series = df_conference_series.reset_index(drop=True)

df_conference_series

Unnamed: 0,ConferenceSeriesNormalizedName
0,disc
1,esa
2,enter
3,dexa
4,icaisc
...,...
5314,infinity
5315,calculemus
5316,agp
5317,sci


## Join Between the Conference Series (from the Conference Citations and Locations Dataset) and the GRIN Ratings

### The Idea

We're going to join the GRIN ratings to the distinct conference series that we previously extracted from the Conference Citations and Locations Dataframe.

The resulting dataframe is going to have only the conference series that are present in the Conference Citations and Locations Dataframe, so it can be easily joined with it if needed.
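
As a rough illustration of this idea, the sketch below performs a left join on three series names and counts how many of them received a rating. The `validate` and `indicator` arguments are optional additions that the notebook itself does not use: `validate="m:1"` makes pandas raise an error if an acronym appears more than once on the GRIN side, and `indicator=True` adds a `_merge` column that shows where each row came from.

```python
import pandas as pd

series = pd.DataFrame({"ConferenceSeriesNormalizedName": ["disc", "esa", "enter"]})
grin = pd.DataFrame({
    "ConferenceSeriesNormalizedName": ["disc", "esa", "aaai"],
    "GrinClass": [3, 2, 1],
    "GrinRating": ["B", "A-", "A++"],
})

joined = pd.merge(
    series,
    grin,
    on="ConferenceSeriesNormalizedName",
    how="left",        # keep every series, rated or not
    validate="m:1",    # fail if GRIN has duplicate acronyms
    indicator=True,    # adds the "_merge" column
)

# Rows marked "both" found a GRIN rating; "left_only" rows did not.
matched = int((joined["_merge"] == "both").sum())
print(f"{matched} of {len(joined)} series have a GRIN rating")
```

Series that are missing from the GRIN list, such as `enter` here, stay in the result with empty rating columns, while GRIN entries that are absent from the dataset, such as `aaai`, are left out.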

### Data Preparation and Join

Rename of some GRIN columns:

In [15]:
df_grin_xls = df_grin_xls.rename(columns={'Acronym': 'ConferenceSeriesNormalizedName', 'GGS Class': 'GrinClass', 'GGS Rating': 'GrinRating'})

Making sure that all conference acronyms are in lowercase, so that they match the normalized conference series names:

In [16]:
df_grin_xls.ConferenceSeriesNormalizedName = df_grin_xls.ConferenceSeriesNormalizedName.str.lower()

Fix of the GRIN Class column data type (it was imported as a string and is converted to an integer):

In [17]:
df_grin_xls = df_grin_xls.astype({"GrinClass": int}) 

Now we can proceed with the join and cleaning operations:

In [18]:
df_conference_series_with_grin_rank = pd.merge(df_conference_series, df_grin_xls, on=['ConferenceSeriesNormalizedName'], how='left')

# Column sort
df_conference_series_with_grin_rank = df_conference_series_with_grin_rank.reindex(sorted(df_conference_series_with_grin_rank.columns), axis=1)

Print of the final dataset:

In [19]:
df_conference_series_with_grin_rank

Unnamed: 0,ConferenceSeriesNormalizedName,GrinClass,GrinRating
0,disc,3.0,B
1,esa,2.0,A-
2,enter,,
3,dexa,3.0,B
4,icaisc,,
...,...,...,...
5314,infinity,,
5315,calculemus,,
5316,agp,,
5317,sci,,
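
The `GrinClass` values above are shown as `3.0` rather than `3` because the left join introduces missing values for the unrated series, and a plain integer column cannot hold them, so pandas falls back to floats. If integer classes are preferred, one option (an assumption, not something the notebook does) is the nullable `Int64` dtype, sketched here on a toy frame:

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the joined output above.
joined = pd.DataFrame({
    "ConferenceSeriesNormalizedName": ["disc", "esa", "enter"],
    "GrinClass": [3.0, 2.0, np.nan],
    "GrinRating": ["B", "A-", None],
})

# "Int64" (capital I) is the nullable integer dtype: missing values become <NA>.
joined["GrinClass"] = joined["GrinClass"].astype("Int64")
print(joined)
```

Note that a plain `pd.read_csv` of the exported file would infer floats again unless the dtype is passed explicitly when reading.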


## Write of the Final CSVs on Disk

Saving the resulting dataframe on disk in CSV format.

In [20]:
# Write of the resulting CSVs on Disk
df_conference_series_with_grin_rank.to_csv(path_file_export + 'out_conference_series_with_grin_rank.csv')
print(f'Successfully Exported the Joined CSV to {path_file_export}out_conference_series_with_grin_rank.csv')

Successfully Exported the Joined CSV to /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/out_conference_series_with_grin_rank.csv


Re-reading the exported CSV to make sure that everything went fine:

In [22]:
# Check of the Exported CSV
df_joined_exported_csv_conference_series_with_grin_rank = pd.read_csv(path_file_export + 'out_conference_series_with_grin_rank.csv', low_memory=False, index_col=[0])
df_joined_exported_csv_conference_series_with_grin_rank

Unnamed: 0,ConferenceSeriesNormalizedName,GrinClass,GrinRating
0,disc,3.0,B
1,esa,2.0,A-
2,enter,,
3,dexa,3.0,B
4,icaisc,,
...,...,...,...
5314,infinity,,
5315,calculemus,,
5316,agp,,
5317,sci,,
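
Instead of comparing the two printed tables by eye, the round trip can be checked programmatically. The following is a minimal sketch on a toy frame, using an in-memory buffer in place of the real export path; the specific checks (shape, column names, number of missing ratings) are an assumption about what "everything went fine" should mean.

```python
import io

import pandas as pd

original = pd.DataFrame({
    "ConferenceSeriesNormalizedName": ["disc", "esa", "enter"],
    "GrinClass": [3.0, 2.0, None],
    "GrinRating": ["B", "A-", None],
})

# Write to an in-memory buffer and read it back, as the notebook does on disk.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col=[0])

same_shape = original.shape == reloaded.shape
same_columns = list(original.columns) == list(reloaded.columns)
same_missing = int(original["GrinRating"].isna().sum()) == int(reloaded["GrinRating"].isna().sum())
print(same_shape, same_columns, same_missing)
```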
