We collected data from the [AMiner API](https://cn.aminer.org/citation) and then selected 25 top conferences across 7 fields:

- 'DM': 'Data_Mining'
- 'DP': 'Distributed_and_Parallel_Computing'
- 'ED': 'Computer_Education'
- 'ML': 'Machine_Learning'
- 'NC': 'Networks,Communications&Performance'
- 'NL': 'Natural_Language_Processing'
- 'OS': 'Operating_Systems/Simulations'
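The two-letter codes above are the values that end up in the `field` column. If the full names are needed for plots or tables, a dictionary lookup is enough. This is a minimal sketch on a toy frame; `FIELD_NAMES`, `papers` and `field_name` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical lookup from the two-letter field codes to the full names listed above
FIELD_NAMES = {
    'DM': 'Data_Mining',
    'DP': 'Distributed_and_Parallel_Computing',
    'ED': 'Computer_Education',
    'ML': 'Machine_Learning',
    'NC': 'Networks,Communications&Performance',
    'NL': 'Natural_Language_Processing',
    'OS': 'Operating_Systems/Simulations',
}

# Toy frame standing in for the labelled paper table
papers = pd.DataFrame({'title': ['A', 'B'], 'field': ['ML', 'NL']})

# Series.map looks each code up in the dict; codes missing from it become NaN
papers['field_name'] = papers['field'].map(FIELD_NAMES)
print(papers)
```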

We use [this ranking](http://webdocs.cs.ualberta.ca/~zaiane/htmldocs/ConfRanking.html) as a reference for choosing the conferences.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import numpy as np
import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt
import seaborn as sns
import gensim
import networkx as nx

In [2]:
df = pd.read_csv("dblp.csv")

In [3]:
df = df.dropna(subset=['title', 'author', 'time', 'publication', 'index'], how='any')

In [4]:
df = df.drop(['index', 'citation'], axis=1)
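Cells 2 to 4 load the raw export, drop rows missing any of the key columns, and then discard `index` and `citation`. The same steps can be chained into one expression. This sketch runs on a tiny in-memory CSV; only the column names come from the notebook, and the rest of the layout of `dblp.csv` is an assumption.

```python
import io
import pandas as pd

# Tiny stand-in for dblp.csv; only the column names come from the notebook
raw = io.StringIO(
    "index,title,author,time,publication,citation\n"
    "0,Paper A,Alice,2001,KDD,\n"     # missing citation is fine: not in the subset
    "1,Paper B,,2002,ICML,5\n"        # missing author: this row is dropped
    "2,Paper C,Bob,2003,ACL,7\n"
)

clean = (
    pd.read_csv(raw)
    # drop rows where any of these columns is missing
    .dropna(subset=['title', 'author', 'time', 'publication', 'index'], how='any')
    # 'index' and 'citation' are not needed downstream
    .drop(columns=['index', 'citation'])
)
print(clean)
```

Note that `citation` is deliberately left out of the `dropna` subset, so papers without citation data are kept even though that column is discarded afterwards.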

In [5]:
df.head()

Unnamed: 0,title,author,time,publication
254,Information Contents of Fracture Lines.,"Helena Cristina da Gama Leitão, Jorge Stolfi",2000.0,WSCG
255,Influence of Dynamic Wrinkles on the Perceived...,"Javier Alcon, David Travieso, Caroline Larboul...",2013.0,WSCG
256,Automatic Graphic User Interface Generation fo...,Wilfrid Lefer,2002.0,WSCG
257,Refinement and Hierarchical Coarsening Schemes...,"José P. Suárez, Angel Plaza",2003.0,WSCG
258,Efficient NURBS Rendering using View-Dependent...,"Michael Guthe, Reinhard Klein",2003.0,WSCG


In [6]:
dp_conf = ['PPOPP', 'PACT', 'IPDPS', 'ICPP']

In [7]:
ml_conf = ['IJCAI', 'AAAI','ICML', 'NIPS']

In [8]:
nc_conf = ['SIGCOMM', 'PERFORMANCE', 'SIGMETRICS', 'INFOCOM', 'MOBICOM']

In [9]:
dm_conf = ['ICDE', 'SIGMOD', 'KDD', 'ICDM']

In [10]:
ed_conf = ['AIED', 'ITS', 'ICALT']

In [11]:
nl_conf = ['ACL', 'EACL', 'COLING', 'EMNLP']

In [12]:
os_conf = ['MASCOTS', 'SOSP', 'OSDI']
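Instead of filtering once per field and concatenating the results, the seven lists can be inverted into a single conference-to-field lookup and applied in one pass. This is a sketch of that alternative, not the notebook's approach; `CONF_LISTS`, `conf_to_field` and `label_fields` are hypothetical names. Papers from conferences outside every list get NaN and are dropped.

```python
import pandas as pd

# Field code -> conference list, copied from the cells above
CONF_LISTS = {
    'DP': ['PPOPP', 'PACT', 'IPDPS', 'ICPP'],
    'ML': ['IJCAI', 'AAAI', 'ICML', 'NIPS'],
    'NC': ['SIGCOMM', 'PERFORMANCE', 'SIGMETRICS', 'INFOCOM', 'MOBICOM'],
    'DM': ['ICDE', 'SIGMOD', 'KDD', 'ICDM'],
    'ED': ['AIED', 'ITS', 'ICALT'],
    'NL': ['ACL', 'EACL', 'COLING', 'EMNLP'],
    'OS': ['MASCOTS', 'SOSP', 'OSDI'],
}

# Invert to conference -> field code (each conference appears in exactly one list)
conf_to_field = {conf: field for field, confs in CONF_LISTS.items() for conf in confs}


def label_fields(papers):
    """Keep only papers from the selected conferences and add a 'field' column."""
    field = papers['publication'].map(conf_to_field)
    return papers.assign(field=field).dropna(subset=['field'])


toy = pd.DataFrame({'title': ['a', 'b', 'c'],
                    'publication': ['KDD', 'WSCG', 'OSDI']})
print(label_fields(toy))
```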

In [13]:
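def get_field_df(conf_list, field):
    # Keep only papers published at one of the given conferences;
    # .copy() makes an independent frame, avoiding pandas' SettingWithCopyWarning
    temp = df[df.publication.isin(conf_list)].copy()
    temp['field'] = field  # label every row with the field code
    print(temp.shape)
    print(temp.groupby('publication').count())
    return temp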
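Assigning a new column to a boolean-filtered slice of a DataFrame can trigger pandas' `SettingWithCopyWarning`, because pandas cannot tell whether the slice is a view or a copy. A minimal sketch on toy data (the function name `get_field_df_sketch` is hypothetical) showing that filtering with `isin` and calling `.copy()` before adding the column raises no warning and leaves the source frame untouched:

```python
import warnings
import pandas as pd

papers = pd.DataFrame({'title': ['a', 'b', 'c'],
                       'publication': ['KDD', 'WSCG', 'ICDM']})


def get_field_df_sketch(papers, conf_list, field):
    # isin is a vectorised membership test, equivalent to apply(lambda x: x in conf_list)
    subset = papers[papers['publication'].isin(conf_list)].copy()
    # subset is an independent copy, so adding a column does not touch `papers`
    subset['field'] = field
    return subset


# Escalate any warning raised inside this block to an error, to show none occurs
with warnings.catch_warnings():
    warnings.simplefilter('error')
    dm = get_field_df_sketch(papers, ['KDD', 'ICDM'], 'DM')

print(dm)
print('field' in papers.columns)  # False: the original frame is unchanged
```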

In [14]:
dp_df = get_field_df(dp_conf, 'DP')

(6568, 5)
             title  author  time  field
publication                            
ICPP          2053    2053  2053   2053
IPDPS         3749    3749  3749   3749
PACT           455     455   455    455
PPOPP          311     311   311    311


In [15]:
ml_df = get_field_df(ml_conf, 'ML')

(18628, 5)
             title  author  time  field
publication                            
AAAI          5693    5693  5693   5693
ICML          2307    2307  2307   2307
IJCAI         5160    5160  5160   5160
NIPS          5468    5468  5468   5468


In [16]:
nc_df = get_field_df(nc_conf, 'NC')

(8829, 5)
             title  author  time  field
publication                            
INFOCOM       6693    6693  6693   6693
MOBICOM        521     521   521    521
SIGCOMM        816     816   816    816
SIGMETRICS     799     799   799    799


In [17]:
dm_df = get_field_df(dm_conf, 'DM')

(6985, 5)
             title  author  time  field
publication                            
ICDE          2984    2984  2984   2984
ICDM          2087    2087  2087   2087
KDD           1914    1914  1914   1914


In [18]:
ed_df = get_field_df(ed_conf, 'ED')

(4688, 5)
             title  author  time  field
publication                            
AIED           952     952   952    952
ICALT         3310    3310  3310   3310
ITS            426     426   426    426


In [19]:
nl_df = get_field_df(nl_conf, 'NL')

(6885, 5)
             title  author  time  field
publication                            
ACL           2092    2092  2092   2092
COLING        2784    2784  2784   2784
EACL           816     816   816    816
EMNLP         1193    1193  1193   1193


In [20]:
os_df = get_field_df(os_conf, 'OS')

(1668, 5)
             title  author  time  field
publication                            
MASCOTS        944     944   944    944
OSDI           252     252   252    252
SOSP           472     472   472    472


In [21]:
total_df = pd.concat((dp_df, ml_df, nc_df, dm_df, ed_df, nl_df, os_df))
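`pd.concat` stacks the per-field frames row-wise and, by default, keeps each frame's original row labels. A quick sanity check is that the combined length equals the sum of the parts, and `value_counts` on `field` gives per-field totals. This sketch uses two toy frames in place of `dp_df`, `ml_df`, and so on; passing `ignore_index=True` would renumber the rows, which the notebook does not do.

```python
import pandas as pd

# Two toy per-field frames standing in for dp_df, ml_df, ...
dp = pd.DataFrame({'title': ['p1', 'p2'], 'field': ['DP', 'DP']}, index=[10, 11])
ml = pd.DataFrame({'title': ['m1'], 'field': ['ML']}, index=[42])
parts = [dp, ml]

total = pd.concat(parts)

# Concatenation neither loses nor duplicates rows
assert len(total) == sum(len(p) for p in parts)

# The original row labels survive unless ignore_index=True is passed
print(total.index.tolist())
print(total['field'].value_counts())
```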

In [22]:
total_df.to_csv("seven_topconf_papers.csv")
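By default `to_csv` also writes the row index as an unnamed first column, which `read_csv` later reports as `Unnamed: 0`. A sketch of two ways to round-trip cleanly, using an in-memory buffer instead of `seven_topconf_papers.csv`: write with `index=False`, or keep the index and read it back with `index_col=0`.

```python
import io
import pandas as pd

papers = pd.DataFrame({'title': ['a', 'b'], 'field': ['DM', 'OS']}, index=[254, 255])

# Default: the index is written as an unnamed first column
buf = io.StringIO()
papers.to_csv(buf)
buf.seek(0)
default_cols = pd.read_csv(buf).columns.tolist()
print(default_cols)  # ['Unnamed: 0', 'title', 'field']

# Option 1: drop the index when writing
buf = io.StringIO()
papers.to_csv(buf, index=False)
buf.seek(0)
no_index = pd.read_csv(buf)

# Option 2: keep the index and restore it when reading
buf = io.StringIO()
papers.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(restored.index.tolist())
```

Option 2 is the one to use if the original `df` row labels should stay available as paper identifiers.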