# Poggi's Conference Acceptance Rate Data Integration

Jupyter Notebook for processing and integrating the Conference Acceptance Rate Data collected by Prof. Francesco Poggi of the University of Modena and Reggio Emilia.

*The CORE Conference Ranking provides assessments of major conferences in the computing disciplines. The rankings are managed by the CORE Executive Committee, with periodic rounds for submission of requests for addition or reranking of conferences. Decisions are made by academic committees based on objective data requested as part of the submission process.* (source: CORE)

The data was obtained by web scraping and is provided in JSON format.
____________________________________________________________

This process requires the CSV file ```out_citations_and_conferences_location_ready_v2.csv``` and the Conference Acceptance Rate JSON files.

The CSV file must be generated by running the notebook ```1 - Citation and Locations Dataset Preparation.ipynb```, contained in the ```5 - Conference Locations Ranking Integration``` folder of this repository.<br>
Poggi's Conference Acceptance Rate JSON files can be downloaded from [here]().

The notebook performs the following operations:
* Opening the CSV dataset of conference citations and locations
* Reading the JSON files sequentially
* Expanding the JSON fields
* Processing the JSON data
* Joining the processed JSON data
* Joining the distinct conference series names with the Conference Acceptance Rate Data

Finally, the processed dataset is saved to disk in CSV format.

In [1]:
# Libraries Import
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)

## File Paths
Please set your working directory paths.

In [2]:
# ******************* PATHS ********************

# Dumps Directory Path
path_file_import = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Import/data_acceptance/'

# CSV Exports Directory Path
path_file_export = r'/Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/'

## Read and Preparation of the Citation Dataset

In [3]:
df_citations_and_locations = pd.read_csv(path_file_export + 'out_citations_and_conferences_location_ready_v2.csv', low_memory=False, index_col=[0])
print('Successfully Imported the Conference Citations and Locations Ready CSV')

Successfully Imported the Conference Citations and Locations Ready CSV


In [4]:
df_citations_and_locations.head(3)

Unnamed: 0,CitationCount_COCI,CitationCount_Mag,CitationCount_MagEstimated,ConferenceLocation,ConferenceNormalizedName,ConferenceSeriesNormalizedName,Doi,Year
0,10,12,12,"Austin, Texas, United States",disc 2014,disc,10.1007/978-3-662-45174-8_28,2014
1,5,10,10,"Wrocław, Lower Silesian Voivodeship, Poland",esa 2014,esa,10.1007/978-3-662-44777-2_60,2014
2,11,20,20,"Innsbruck, Tyrol, Austria",enter 2013,enter,10.1007/978-3-319-03973-2_13,2013


### Extraction of the Distinct Conference Series from the Citations and Locations Dataset

In [5]:
df_conference_series = df_citations_and_locations.drop_duplicates(subset="ConferenceSeriesNormalizedName")

# keep only the conference series name column
df_conference_series = df_conference_series[["ConferenceSeriesNormalizedName"]]

# drop the rows without a conference series name
df_conference_series = df_conference_series.dropna(subset=["ConferenceSeriesNormalizedName"])

# reset of the index
df_conference_series = df_conference_series.reset_index(drop=True)

df_conference_series

Unnamed: 0,ConferenceSeriesNormalizedName
0,disc
1,esa
2,enter
3,dexa
4,icaisc
...,...
5307,infinity
5308,calculemus
5309,agp
5310,sci


## Understanding the Structure of the JSON Files

In [174]:
df_AcmConferences_raw = pd.read_json(path_file_import + 'AcmConferences.json')
print('Successfully Imported the AcmConferences JSON')

Successfully Imported the AcmConferences JSON


Print of the dataset structure:

In [15]:
df_AcmConferences_raw.head(3)

Unnamed: 0,confName,confAcronym,confUrl,info
0,DocEng: Document Engineering,DocEng,https://dl.acm.org/action/doSearch?target=brow...,"[{'year': '19', 'yearUnparsed': 'DocEng '19', ..."
1,"MANPU: Comics Aanalysis, Processing and Unders...",MANPU,https://dl.acm.org/action/doSearch?target=brow...,"[{'year': '16', 'yearUnparsed': 'MANPU '16', '..."
2,Nanoarch: Nanoscale Architectures,Nanoarch,https://dl.acm.org/action/doSearch?target=brow...,"[{'year': '18', 'yearUnparsed': 'NANOARCH '18'..."


### Expansion of the info field
The following cells are only needed to inspect the structure of the dataset.

In [79]:
df_first_level = pd.json_normalize(df_AcmConferences_raw.iloc[[0]]['info'])
df_first_level

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
0,"{'year': '19', 'yearUnparsed': 'DocEng '19', '...","{'year': '17', 'yearUnparsed': 'DocEng '17', '...","{'year': '16', 'yearUnparsed': 'DocEng '16', '...","{'year': '15', 'yearUnparsed': 'DocEng '15', '...","{'year': '14', 'yearUnparsed': 'DocEng '14', '...","{'year': '13', 'yearUnparsed': 'DocEng '13', '...","{'year': '12', 'yearUnparsed': 'DocEng '12', '...","{'year': '11', 'yearUnparsed': 'DocEng '11', '...","{'year': '10', 'yearUnparsed': 'DocEng '10', '...","{'year': '09', 'yearUnparsed': 'DocEng '09', '...","{'year': '08', 'yearUnparsed': 'DocEng '08', '...","{'year': '07', 'yearUnparsed': 'DocEng '07', '...","{'year': '06', 'yearUnparsed': 'DocEng '06', '...","{'year': '05', 'yearUnparsed': 'DocEng '05', '...","{'year': '04', 'yearUnparsed': 'DocEng '04', '...","{'year': '03', 'yearUnparsed': 'DocEng '03', '...","{'year': '02', 'yearUnparsed': 'DocEng '02', '...","{'year': '01', 'yearUnparsed': 'DocEng '01', '..."


In [80]:
pd.json_normalize(df_first_level.iloc[:, 0])

Unnamed: 0,year,yearUnparsed,yearUrl,submitted,accepted,percAccepted
0,19,DocEng '19,https://dl.acm.org/doi/proceedings/10.1145/334...,77,30,39%


## Processing of the JSON Files

### Definition of the JSON Processing Function

The processing function defined below is applied to all the datasets:

In [219]:
def process_json_dataframe(raw_dataframe):
    out_row_list = list()

    for index, row in raw_dataframe.iterrows():

        # expansion from the info field
        df_first_level = pd.json_normalize(row['info'])

        for index_first_level, row_first_level in df_first_level.iterrows():
            # skip rows where a required field is missing or null
            required_fields = ('year', 'percAccepted', 'submitted', 'accepted')
            if any(field not in row_first_level.index or pd.isnull(row_first_level[field]) for field in required_fields):
                continue # skip row

            # filter of some year special cases like "19 Companion"
            if isinstance(row_first_level['year'], str) and len(row_first_level['year']) > 4:
                continue # skip row

            # filter of some invalid data
            if "?" in str(row_first_level['submitted']) or "?" in str(row_first_level['accepted']) or "?" in str(row_first_level['percAccepted']):
                continue # skip row

            # normalization of the year format
            year = int(row_first_level['year'])
            if year < 100: # two-digit year
                year = 2000 + year

            # creation of the support dataframe
            support_dict = dict()
            support_dict['Conf_Acronym'] = row['confAcronym'].lower()
            support_dict['Year'] = year

            # truncate at the first ".", ",", "+", ">", "<" or "(" to keep only the leading number
            submitted_str = str(row_first_level['submitted']).split(".")[0].split(",")[0].split("+")[0].split(">")[0].split("<")[0].split("(")[0]
            if "~" in submitted_str:
                submitted_str = submitted_str.split("~")[1]
            submitted_str = submitted_str.strip()
            if not submitted_str.isnumeric(): # also covers empty strings and "--"
                continue # skip row
            submitted = int(submitted_str)

            accepted_str = str(row_first_level['accepted']).split(".")[0].split(",")[0].split("+")[0].split(">")[0].split("<")[0].split("(")[0]
            if ":" in accepted_str:
                accepted_str = accepted_str.split(":")[1]
            if "~" in accepted_str:
                accepted_str = accepted_str.split("~")[1]
            accepted_str = accepted_str.strip()
            if not accepted_str.isnumeric(): # also covers empty strings and "--"
                continue # skip row
            accepted = int(accepted_str)
                

            perc_accepted_str = str(row_first_level['percAccepted']).split('%')[0].split("+")[0].split(">")[0].split("<")[0].replace(",", ".")
            if len(perc_accepted_str) == 0:
                continue # skip row
            if "~" in perc_accepted_str:
                perc_accepted = float(perc_accepted_str.split("~")[1])
            else:
                perc_accepted = float(perc_accepted_str)

            support_dict['Papers_Submitted'] = submitted
            support_dict['Papers_Accepted'] = accepted
            support_dict['Papers_Perc_Accepted'] = perc_accepted
            
            # putting the values inside our output list
            out_row_list.append(support_dict)

    # conversion of the output list to a dataframe
    return pd.DataFrame(out_row_list) 
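
The function above maps every two-digit year to 20xx, so a value such as `'99'` would become 2099. Whether the scraped files actually contain editions from the 1900s in two-digit form is not shown here; if they do, a pivot on the current year is one possible way to handle them. The helper below is a hypothetical sketch, not part of the notebook: the name `normalize_year`, the `current_year` parameter and the rule "a two-digit year that would land in the future belongs to the 1900s" are all assumptions.

```python
from datetime import date

def normalize_year(raw_year, current_year=None):
    """Expand a two-digit year to four digits (hypothetical helper)."""
    if current_year is None:
        current_year = date.today().year

    year = int(raw_year)
    if year >= 100:
        # already a four-digit year
        return year

    candidate = 2000 + year
    # Assumption: an edition cannot lie in the future, so fall back to the 1900s
    return candidate if candidate <= current_year else candidate - 100


# Usage with a fixed reference year, so the result does not depend on today's date
print(normalize_year('19', current_year=2022))
print(normalize_year('99', current_year=2022))
```

Passing `current_year` explicitly keeps the result reproducible when the notebook is re-run in a later year.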

### Dataset Processing

In [220]:
df_AcmConferences_processed = process_json_dataframe(df_AcmConferences_raw)

Check of the resulting processed dataframe:

In [221]:
df_AcmConferences_processed
    

Unnamed: 0,Conf_Acronym,Year,Papers_Submitted,Papers_Accepted,Papers_Perc_Accepted
0,doceng,2019,77,30,39.0
1,doceng,2017,71,13,18.0
2,doceng,2016,35,11,31.0
3,doceng,2015,31,11,35.0
4,doceng,2014,41,15,37.0
...,...,...,...,...,...
4006,gpgpu,2019,15,6,40.0
4007,gpgpu,2016,23,9,39.0
4008,dpg,2012,10,6,60.0
4009,icaicr,2019,382,49,13.0


## Read and Processing of the ComputerSecurityConferencesStatistics JSON

In [75]:
df_ComputerSecurity_raw = pd.read_json(path_file_import + 'ComputerSecurityConferencesStatistics.json')
print('Successfully Imported the ComputerSecurityConferencesStatistics JSON')

Successfully Imported the ComputerSecurityConferencesStatistics JSON


Print of the dataset structure:

In [76]:
df_ComputerSecurity_raw.head(3)

Unnamed: 0,confAcronym,confUrl,info
0,IEEE S&P,http://www.ieee-security.org/TC/SP-Index.html,"[{'percAccepted': '12%', 'accepted': '84', 'su..."
1,ACM CCS,http://www.acm.org/sigs/sigsac/ccs/,"[{'percAccepted': '16.9%', 'accepted': '121', ..."
2,USENIX Security,http://www.usenix.org/events/,"[{'percAccepted': '19.1%', 'accepted': '100', ..."


### Expansion of the info field
The following cells are only needed to inspect the structure of the dataset.

In [81]:
df_first_level = pd.json_normalize(df_ComputerSecurity_raw.iloc[[0]]['info'])
df_first_level

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
0,"{'percAccepted': '12%', 'accepted': '84', 'sub...","{'percAccepted': '11.5%', 'accepted': '63', 's...","{'percAccepted': '13%', 'accepted': '60', 'sub...","{'percAccepted': '13.3%', 'accepted': '55', 's...","{'percAccepted': '13.5%', 'accepted': '55', 's...","{'percAccepted': '13%', 'accepted': '44', 'sub...","{'percAccepted': '12%', 'accepted': '38', 'sub...","{'percAccepted': '13%', 'accepted': '40', 'sub...","{'percAccepted': '11%', 'accepted': '34', 'sub...","{'percAccepted': '11.6%', 'accepted': '31', 's...","{'percAccepted': '10%', 'accepted': '26', 'sub...","{'percAccepted': '11.2%', 'accepted': '28', 's...","{'percAccepted': '8%', 'accepted': '20', 'subm...","{'percAccepted': '9.2%', 'accepted': '23', 'su...","{'percAccepted': '8.9%', 'accepted': '17', 'su...","{'percAccepted': '10.2%', 'accepted': '19', 's...","{'percAccepted': '14.5%', 'accepted': '19', 's...","{'percAccepted': '22.1%', 'accepted': '21', 's...","{'percAccepted': '17.8%', 'accepted': '19', 's...","{'percAccepted': '13.1%', 'accepted': '18', 's...","{'percAccepted': '24.6%', 'accepted': '15', 's...","{'percAccepted': '16.4%', 'accepted': '19', 's...","{'percAccepted': '18.2%', 'accepted': '20', 's...","{'percAccepted': '29.9%', 'accepted': '20', 's...","{'percAccepted': '27.8%', 'accepted': '20', 's...","{'percAccepted': '29.2%', 'accepted': '19', 's...","{'percAccepted': '24.3%', 'accepted': '17', 's...","{'percAccepted': '23.6%', 'accepted': '21', 's...","{'percAccepted': '30.4%', 'accepted': '28', 's...","{'accepted': '34', 'year': '1990'}","{'accepted': '30', 'year': '1989'}","{'accepted': '26', 'year': '1988'}","{'accepted': '26', 'year': '1987'}","{'percAccepted': '27.5%', 'accepted': '25', 's...","{'percAccepted': '39.7%', 'accepted': '25', 's...","{'percAccepted': '64.1%', 'accepted': '25', 's...","{'percAccepted': '67.6%', 'accepted': '23', 's...","{'percAccepted': '55.9%', 'accepted': '19', 's...","{'accepted': '18', 'year': '1981'}","{'percAccepted': '100%', 'accepted': '19', 'su..."


In [83]:
pd.json_normalize(df_first_level.iloc[:, 0])

Unnamed: 0,percAccepted,accepted,submitted,year
0,12%,84,679,2019


### Dataset Processing

In [140]:
df_ComputerSecurity_processed = process_json_dataframe(df_ComputerSecurity_raw)

Check of the resulting processed dataframe:

In [141]:
df_ComputerSecurity_processed

Unnamed: 0,Conf_Acronym,Year,Papers_Submitted,Papers_Accepted,Papers_Perc_Accepted
0,ieee s&p,2019,679,84,12.0
1,ieee s&p,2018,549,63,11.5
2,ieee s&p,2017,457,60,13.0
3,ieee s&p,2016,413,55,13.3
4,ieee s&p,2015,407,55,13.5
...,...,...,...,...,...
270,dfrws,2010,39,16,41.0
271,dfrws,2009,40,15,37.5
272,dfrws,2008,43,17,39.5
273,dfrws,2007,36,17,47.2


## Read and Processing of the CryptographyConferencesStatistics JSON

In [92]:
df_Cryptography_raw = pd.read_json(path_file_import + 'CryptographyConferencesStatistics.json')
print('Successfully Imported the CryptographyConferencesStatistics JSON')

Successfully Imported the CryptographyConferencesStatistics JSON


In [93]:
df_Cryptography_raw.head(3)

Unnamed: 0,confAcronym,info
0,Crypto,"[{'submitted': '227', 'accepted': '60', 'ratio..."
1,Eurocrypt,"[{'submitted': '197', 'accepted': '38', 'ratio..."
2,Asiacrypt,"[{'submitted': '269', 'accepted': '54', 'ratio..."


### Expansion of the info field
The following cells are only needed to inspect the structure of the dataset.

In [94]:
df_first_level = pd.json_normalize(df_Cryptography_raw.iloc[[0]]['info'])
df_first_level

In [95]:
pd.json_normalize(df_first_level.iloc[:, 0])


### Dataset Processing

In [142]:
df_Cryptography_processed = process_json_dataframe(df_Cryptography_raw)

Check of the resulting processed dataframe:

In [124]:
df_Cryptography_processed
    

Unnamed: 0,Conf_Acronym,Year,Papers_Submitted,Papers_Accepted,Papers_Perc_Accepted
0,crypto,2014,227,60,26.4
1,crypto,2013,227,61,26.9
2,crypto,2012,225,48,21.3
3,crypto,2011,230,42,18.3
4,crypto,2010,202,39,19.3
...,...,...,...,...,...
122,tcc,2010,100,33,33.0
123,tcc,2009,109,33,30.3
124,tcc,2008,81,33,40.7
125,tcc,2007,118,31,26.3


## Read and Processing of the NetworkingConferencesStatistics JSON

In [100]:
df_Networking_raw = pd.read_json(path_file_import + 'NetworkingConferencesStatistics.json')
print('Successfully Imported the NetworkingConferencesStatistics JSON')

Successfully Imported the NetworkingConferencesStatistics JSON


In [101]:
df_Networking_raw.head(3)

Unnamed: 0,confAcronym,confUrl,info
0,ACNS,http://icsd.i2r.a-star.edu.sg/staff/jianying/a...,"[{'year': '2003', 'yearUrl': 'http://acns2003...."
1,AIMS,,"[{'year': '2007', 'yearUrl': 'http://www.aims-..."
2,ANCS (IEEE/ACM),http://www.ancsconf.org/,"[{'year': '2005', 'yearUrl': 'http://www.cesr...."


### Expansion of the info field
The following cells are only needed to inspect the structure of the dataset.

In [102]:
df_first_level = pd.json_normalize(df_Networking_raw.iloc[[0]]['info'])
df_first_level

In [103]:
pd.json_normalize(df_first_level.iloc[:, 0])


### Dataset Processing

In [143]:
df_Networking_processed = process_json_dataframe(df_Networking_raw)

Check of the resulting processed dataframe:

In [144]:
df_Networking_processed
    

Unnamed: 0,Conf_Acronym,Year,Papers_Submitted,Papers_Accepted,Papers_Perc_Accepted
0,acns,2003,191,32,16.8
1,acns,2004,297,36,12.1
2,acns,2005,158,35,22.2
3,acns,2006,218,33,15.1
4,acns,2007,260,31,11.9
...,...,...,...,...,...
1437,www,2007,750,111,14.8
1438,www,2008,880,97,11.0
1439,www,2009,823,105,12.8
1440,www,2010,754,91,12.1


## Read and Processing of the SEConferencesStatistics JSON

In [145]:
df_SEConferences_raw = pd.read_json(path_file_import + 'SEConferencesStatistics.json')
print('Successfully Imported the SEConferencesStatistics JSON')

Successfully Imported the SEConferencesStatistics JSON


In [146]:
df_SEConferences_raw.head(3)

Unnamed: 0,confAcronym,confUrl,info
0,ICSE,[http://portal.acm.org/browse_dl.cfm?linked=1&...,"[{'accepted': '99', 'submitted': '496', 'percA..."
1,FSE/ESEC,[http://portal.acm.org/browse_dl.cfm?linked=1&...,"[{'accepted': '61', 'submitted': '280', 'percA..."
2,ASE,[http://ieeexplore.ieee.org/xpl/conferences.js...,"[{'accepted': '55', 'submitted': '276', 'percA..."


### Expansion of the info field
The following cells are only needed to inspect the structure of the dataset.

In [147]:
df_first_level = pd.json_normalize(df_SEConferences_raw.iloc[[0]]['info'])
df_first_level

In [148]:
pd.json_normalize(df_first_level.iloc[:, 0])


### Dataset Processing

In [163]:
df_SEConferences_processed = process_json_dataframe(df_SEConferences_raw)

Check of the resulting processed dataframe:

In [164]:
df_SEConferences_processed

Unnamed: 0,Conf_Acronym,Year,Papers_Submitted,Papers_Accepted,Papers_Perc_Accepted
0,icse,2014,496,99,20.0
1,icse,2013,461,85,18.0
2,icse,2012,408,87,21.0
3,icse,2011,441,62,14.0
4,icse,2010,380,52,14.0
...,...,...,...,...,...
883,policy,2006,59,18,31.0
884,policy,2005,90,20,22.0
885,policy,2004,87,18,21.0
886,fmse,2005,22,8,36.0


## Read and Processing of the TheoreticalCSConferencesStatistics JSON

In [170]:
df_TheoreticalCS_raw = pd.read_json(path_file_import + 'TheoreticalCSConferencesStatistics.json')
print('Successfully Imported the TheoreticalCSConferencesStatistics JSON')

Successfully Imported the TheoreticalCSConferencesStatistics JSON


In [171]:
df_TheoreticalCS_raw.head(3)

Unnamed: 0,confAcronym,info,usualMonth
0,AAAI,[{'yearUrl': 'https://google.com/search?source...,February
1,AAIM,[{'yearUrl': 'https://google.com/search?source...,July
2,ACML,[{'yearUrl': 'https://google.com/search?source...,November


### Expansion of the info field
The following cells are only needed to inspect the structure of the dataset.

In [172]:
df_first_level = pd.json_normalize(df_TheoreticalCS_raw.iloc[[0]]['info'])
df_first_level

In [173]:
pd.json_normalize(df_first_level.iloc[:, 0])


## Join Between the Different Processed JSON Files
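
One way the join steps listed in the introduction could look is sketched below. This is a hedged illustration, not the notebook's actual implementation: `df_a`, `df_b` and `series` are hypothetical stand-ins for two processed dataframes and for the distinct conference series, all numbers are made up, and matching `Conf_Acronym` directly against `ConferenceSeriesNormalizedName` is an assumption. Acronyms such as `ieee s&p` would probably need an extra normalization step before they match.

```python
import pandas as pd

# Hypothetical stand-ins for two processed dataframes (made-up values)
df_a = pd.DataFrame({"Conf_Acronym": ["disc", "esa"], "Year": [2014, 2014],
                     "Papers_Submitted": [100, 200], "Papers_Accepted": [30, 50],
                     "Papers_Perc_Accepted": [30.0, 25.0]})
df_b = pd.DataFrame({"Conf_Acronym": ["disc", "icse"], "Year": [2014, 2014],
                     "Papers_Submitted": [100, 496], "Papers_Accepted": [30, 99],
                     "Papers_Perc_Accepted": [30.0, 20.0]})

# Hypothetical stand-in for the distinct conference series
series = pd.DataFrame({"ConferenceSeriesNormalizedName": ["disc", "esa", "enter"]})

# Stack the sources and keep the first record for each (acronym, year) pair
acceptance = (pd.concat([df_a, df_b], ignore_index=True)
                .drop_duplicates(subset=["Conf_Acronym", "Year"], keep="first"))

# Left join: every conference series is kept, unmatched ones get NaN values
joined = series.merge(acceptance, how="left",
                      left_on="ConferenceSeriesNormalizedName",
                      right_on="Conf_Acronym").drop(columns="Conf_Acronym")
print(joined)
```

With a left join the number of distinct conference series is preserved, which makes it easy to count how many series have no acceptance data at all.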

## Write of the Final CSVs on Disk

Saving the resulting dataframe on disk in CSV format.

In [47]:
# Write of the resulting CSVs on Disk
df_conference_series_with_core_rank.to_csv(path_file_export + 'out_conference_series_with_core_rank.csv')
print(f'Successfully Exported the Joined CSV to {path_file_export}out_conference_series_with_core_rank.csv')

Successfully Exported the Joined CSV to /Users/marcoterzulli/File/Scuola Local/Magistrale/Materiale Corsi Attuali/Tirocinio/Cartella di Lavoro/Archivi Dump di Lavoro/Export/out_conference_series_with_core_rank.csv


Reload the exported CSV to verify that the export succeeded.

In [48]:
# Check of the Exported CSV
df_conference_series_with_core_rank = pd.read_csv(path_file_export + 'out_conference_series_with_core_rank.csv', low_memory=False, index_col=[0])
df_conference_series_with_core_rank

Unnamed: 0,CORE_2008_Rank,CORE_2013_Rank,CORE_2014_Rank,CORE_2017_Rank,CORE_2018_Rank,CORE_2020_Rank,CORE_2021_Rank,ConferenceSeriesNormalizedName,ERA_2010_Rank
0,A,A,A,A,A,A,A,disc,A
1,A,A,A,A,A,A,A,esa,A
2,,C,C,C,C,C,C,enter,C
3,A,B,B,B,B,B,B,dexa,B
4,,C,C,C,C,,,icaisc,C
...,...,...,...,...,...,...,...,...,...
5307,,,,,,,,infinity,
5308,,,,,,,,calculemus,
5309,,,,,,,,agp,
5310,,,,,,,,sci,
