# BLG 527E - Machine Learning
## Term Project

In [1]:
# import necessary libraries
import warnings
import os
warnings.filterwarnings('ignore')

#libraries for data analysis
import pandas as pd
import numpy as np
import math
from scipy import stats
from scipy.stats import norm

#libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# show all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

## Data Description
### Citation Knowledge with Position Dataset

This dataset contains information from scientific publications written by authors who have published papers in the RecSys conference. It consists of four files with information extracted from these publications. Each file is described below:

i) all_authors.tsv: This file contains the details of authors who have published research papers in the RecSys conference. The details include the authors' identifiers in various forms (number, ORCID iD, DBLP URL, DBLP key and Google Scholar URL), their first and last names, and their affiliation (where they work).

ii) all_publications.tsv: This file contains the details of publications authored by the authors listed in all_authors.tsv. (Note that the list does not contain all publications of these authors; refer to the paper for further details.)

The details include the publications' identifiers in different forms (number, DBLP key, DBLP URL and Google Scholar URL), title, filtered title, publication date, conference and abstract.

iii) selected_author_publications_information.tsv: This file contains the identifiers of authors and their publications. It covers the selected authors and the publications used in our experiment.

iv) selected_publication_citations_information.tsv: This file contains information on the selected publications, covering both the citing and the cited papers used in our experiment. It consists of the identifier of the citing paper, the identifier of the cited paper, the citation title, the citation's filtered title, the sentence before the citation, the citing sentence, the sentence after the citation, and the citation position (section).

Please note that it does not contain all the citations in the publications. For more details, please refer to the paper.

This dataset is for the use of research purposes only and if you use this dataset, please cite our paper "Capturing and exploiting citation knowledge for recommending recently published papers".

https://ieeexplore.ieee.org/abstract/document/9338486
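The four files are linked through numeric identifiers. Below is a minimal pandas sketch of how they could be joined. It assumes, from the column names rather than from the dataset description, that `author` and `publication` in selected_author_publications_information.tsv refer to `author_id` and `paper_id`, and that `citing_paper_id` and `cited_paper_id` refer to `paper_id`. The tiny frames are made-up stand-ins that reuse the real column names.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are invented for illustration).
authors = pd.DataFrame({"author_id": [1, 2], "author_lname": ["Yu", "Lops"]})
publications = pd.DataFrame({
    "paper_id": [10, 11, 12],
    "paper_title": ["Paper A", "Paper B", "Paper C"],
})
author_pubs = pd.DataFrame({"author": [1, 2, 2], "publication": [10, 10, 11]})
citations = pd.DataFrame({"citing_paper_id": [10, 11], "cited_paper_id": [12, 12]})

# Assumed keys: author -> author_id, publication -> paper_id.
author_paper = (author_pubs
                .merge(authors, left_on="author", right_on="author_id", how="left")
                .merge(publications, left_on="publication", right_on="paper_id", how="left"))

# Attach the title of the cited paper to every citation row.
cited = citations.merge(publications, left_on="cited_paper_id",
                        right_on="paper_id", how="left")

print(author_paper[["author_lname", "paper_title"]])
print(cited[["citing_paper_id", "paper_title"]])
```

On the real tables, passing `indicator=True` to `merge` and counting the `left_only` rows is a cheap way to check whether the assumed keys actually match.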

In [2]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [3]:
import os
os.chdir("/content/drive/My Drive/ML Project")
!ls

all_authors.tsv
all_publications.tsv
BLG527E-Project.ipynb
Directionally_Paired_Principal_Component_Analysis_for_Bivariate_Estimation_Problems.pdf
Feature_Extraction_and_Selection_via_Robust_Discriminant_Analysis_and_Class_Sparsity.pdf
Kernel_Discriminant_Correlation_Analysis_Feature_Level_Fusion_for_Nonlinear_Biometric_Recognition.pdf
selected_author_publications_information.tsv
selected_publication_citations_information.tsv


In [4]:
#read data from .tsv files
df_all_authors = pd.read_csv("all_authors.tsv", sep='\t', encoding='ISO-8859-1')
df_all_publications = pd.read_csv("all_publications.tsv", sep='\t')
df_selected_author_publications = pd.read_csv("selected_author_publications_information.tsv", sep='\t')
df_selected_publication_citations = pd.read_csv("selected_publication_citations_information.tsv", sep='\t')


### Data Exploration

In [5]:
#Author Data
print(df_all_authors.columns)
print(df_all_authors.shape)
df_all_authors.head(3)

Index(['author_id', 'author_orchid_id', 'author_fname', 'author_lname',
       'author_dblp_url', 'author_dblp_key', 'author_gscholar_url',
       'author_affiliation', 'author_page'],
      dtype='object')
(1931, 9)


| | author_id | author_orchid_id | author_fname | author_lname | author_dblp_url | author_dblp_key | author_gscholar_url | author_affiliation | author_page |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | Jinkai | Yu | https://dblp.org/pid/227/0764 | homepages/227/0764 | https://scholar.google.com/scholar?q=Jinkai+Yu | | |
| 1 | 2 | 0000-0002-6866-9451 | Pasquale | Lops | https://dblp.org/pers/hb/l/Lops:Pasquale.html | homepages/78/5518 | https://scholar.google.com/scholar?q=Pasquale+... | University of Bari "Aldo Moro", Italy | http://www.di.uniba.it/~swap/index.php?n=Membr... |
| 2 | 3 | | A. | Yagci | https://dblp.org/pid/16/7955 | homepages/16/7955 | https://scholar.google.com/scholar?q=A.+Murat+... | | |


In [6]:
#Author Data
df_all_authors.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1931 entries, 0 to 1930
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   author_id            1931 non-null   int64 
 1   author_orchid_id     1931 non-null   object
 2   author_fname         1931 non-null   object
 3   author_lname         1931 non-null   object
 4   author_dblp_url      1931 non-null   object
 5   author_dblp_key      1931 non-null   object
 6   author_gscholar_url  1931 non-null   object
 7   author_affiliation   379 non-null    object
 8   author_page          1931 non-null   object
dtypes: int64(1), object(8)
memory usage: 135.9+ KB


In [7]:
#All Publications Data
print(df_all_publications.columns)
print(df_all_publications.shape)
df_all_publications.head(5)

Index(['paper_id', 'paper_dblp_key', 'paper_title', 'paper_filtered_title',
       'paper_published_date', 'paper_published_conference',
       'paper_gscholar_url', 'paper_dblp_url', 'paper_abstract'],
      dtype='object')
(35473, 9)


| | paper_id | paper_dblp_key | paper_title | paper_filtered_title | paper_published_date | paper_published_conference | paper_gscholar_url | paper_dblp_url | paper_abstract |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | conf/recsys/LiuTLYGHZ18 | Field-aware probabilistic embedding neural net... | field aware probabilistic embedding neural net... | 2018 | RecSys | https://scholar.google.com/scholar?q=Field-awa... | https://dblp.org/rec/bibtex/conf/recsys/LiuTLY... | [For Click-Through Rate (CTR) prediction, Fiel... |
| 1 | 2 | journals/ijmms/MustoNLGS19 | Linked open data-based explanations for transp... | linked open data based explanations for transp... | 2019 | Int. J. Hum.-Comput. Stud. | https://scholar.google.com/scholar?q=Linked+op... | https://dblp.org/rec/bibtex/journals/ijmms/Mus... | [In this article we propose a framework that g... |
| 2 | 5 | conf/iir/AnelliNSLT18 | Moving from Item Rating to Features Relevance ... | moving from item rating to features relevance ... | 2018 | IIR | https://scholar.google.com/scholar?q=Moving+fr... | https://dblp.org/rec/bibtex/conf/iir/AnelliNSLT18 | [Although very effective in computing accurate... |
| 3 | 6 | conf/recsys/NarducciBIGLS18 | A Domain-independent Framework for building Co... | a domain independent framework for building co... | 2018 | KaRS@RecSys | https://scholar.google.com/scholar?q=A+Domain-... | https://dblp.org/rec/bibtex/conf/recsys/Narduc... | [Conversational Recommender Systems (CoRSs) im... |
| 4 | 7 | conf/recsys/BrusilovskyGFLO18 | Recsys'18 joint workshop on interfaces and hum... | recsys 18 joint workshop on interfaces and hum... | 2018 | RecSys | https://scholar.google.com/scholar?q=Recsys%27... | https://dblp.org/rec/bibtex/conf/recsys/Brusil... | [As intelligent interactive systems, recommend... |


In [8]:
#All Publications Data
df_all_publications.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35473 entries, 0 to 35472
Data columns (total 9 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   paper_id                    35473 non-null  int64 
 1   paper_dblp_key              35473 non-null  object
 2   paper_title                 35473 non-null  object
 3   paper_filtered_title        35473 non-null  object
 4   paper_published_date        35473 non-null  int64 
 5   paper_published_conference  35473 non-null  object
 6   paper_gscholar_url          35473 non-null  object
 7   paper_dblp_url              35473 non-null  object
 8   paper_abstract              32944 non-null  object
dtypes: int64(2), object(7)
memory usage: 2.4+ MB


In [9]:
#Selected Author Publications Data
print(df_selected_author_publications.columns)
print(df_selected_author_publications.shape)
df_selected_author_publications.head(5)

Index(['author', 'publication'], dtype='object')
(17637, 2)


| | author | publication |
|---|---|---|
| 0 | 4 | 173 |
| 1 | 4 | 174 |
| 2 | 4 | 175 |
| 3 | 4 | 176 |
| 4 | 4 | 177 |


In [10]:
#Selected Author Publications Data
df_selected_author_publications.info()

# There are 547 distinct authors in this table.
print(df_selected_author_publications['author'].nunique())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17637 entries, 0 to 17636
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   author       17637 non-null  int64
 1   publication  17637 non-null  int64
dtypes: int64(2)
memory usage: 275.7 KB
547


In [11]:
#Selected Publication Citations Data
print(df_selected_publication_citations.columns)
print(df_selected_publication_citations.shape)
df_selected_publication_citations.head(5)

# Example, row 0: paper 1 cites paper 5835. citation_title and citation_filtered_title give the title of the cited paper (5835); the citation_sentence_* columns come from the citing paper (1).

Index(['citing_paper_id', 'cited_paper_id', 'citation_title',
       'citation_filtered_title', 'citation_sentence_before',
       'citation_sentence_cited', 'citation_sentence_after',
       'citation_position (section)'],
      dtype='object')
(20652, 8)


| | citing_paper_id | cited_paper_id | citation_title | citation_filtered_title | citation_sentence_before | citation_sentence_cited | citation_sentence_after | citation_position (section) |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5835 | Deep neural networks for youtube recommendations | deep neural networks for youtube recommendations | though shallow models are generally easy to im... | recently due to the powerful ability of featur... | deep models however are biased to high order i... | Introduction |
| 1 | 1 | 5835 | Deep neural networks for youtube recommendations | deep neural networks for youtube recommendations | though shallow models are generally easy to im... | recently due to the powerful ability of featur... | deep models however are biased to high order i... | Introduction |
| 2 | 1 | 5835 | Deep neural networks for youtube recommendations | deep neural networks for youtube recommendations | though shallow models are generally easy to im... | recently due to the powerful ability of featur... | deep models however are biased to high order i... | Introduction |
| 3 | 2 | 12097 | The effects of transparency on trust in and ac... | the effects of transparency on trust in and ac... | in the system provided a tag based explanation... | another content based explanation approach is ... | the authors establi shed that explaining why a... | Related_Work |
| 4 | 2 | 42695 | Explaining collaborative filtering recommendat... | explaining collaborative filtering recommendat... | the importance of providing information system... | however the first attempt towards the exploita... | more recently on different explanation goals | Related_Work |


In [12]:
#Selected Publication Citations Data
df_selected_publication_citations.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20652 entries, 0 to 20651
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   citing_paper_id              20652 non-null  int64 
 1   cited_paper_id               20652 non-null  int64 
 2   citation_title               20652 non-null  object
 3   citation_filtered_title      20652 non-null  object
 4   citation_sentence_before     15147 non-null  object
 5   citation_sentence_cited      20617 non-null  object
 6   citation_sentence_after      15793 non-null  object
 7   citation_position (section)  20652 non-null  object
dtypes: int64(2), object(6)
memory usage: 1.3+ MB


## Data Preprocessing

This section shows, with examples, some operations that may be needed to preprocess the data. These are only the basics; add further preprocessing steps as needed.

1.   **Duplicate samples:**
2.   **Non-informative Features:** 
3.   **NA (Not Available) Values:**


--------------------------------

1.   **Duplicate samples:**

In [13]:
print(df_selected_publication_citations.shape)
df_selected_publication_citations.head(5)
# Rows 0, 1 and 2 appear to be duplicates of each other.

(20652, 8)


| | citing_paper_id | cited_paper_id | citation_title | citation_filtered_title | citation_sentence_before | citation_sentence_cited | citation_sentence_after | citation_position (section) |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5835 | Deep neural networks for youtube recommendations | deep neural networks for youtube recommendations | though shallow models are generally easy to im... | recently due to the powerful ability of featur... | deep models however are biased to high order i... | Introduction |
| 1 | 1 | 5835 | Deep neural networks for youtube recommendations | deep neural networks for youtube recommendations | though shallow models are generally easy to im... | recently due to the powerful ability of featur... | deep models however are biased to high order i... | Introduction |
| 2 | 1 | 5835 | Deep neural networks for youtube recommendations | deep neural networks for youtube recommendations | though shallow models are generally easy to im... | recently due to the powerful ability of featur... | deep models however are biased to high order i... | Introduction |
| 3 | 2 | 12097 | The effects of transparency on trust in and ac... | the effects of transparency on trust in and ac... | in the system provided a tag based explanation... | another content based explanation approach is ... | the authors establi shed that explaining why a... | Related_Work |
| 4 | 2 | 42695 | Explaining collaborative filtering recommendat... | explaining collaborative filtering recommendat... | the importance of providing information system... | however the first attempt towards the exploita... | more recently on different explanation goals | Related_Work |
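Before dropping rows, the visual impression can be confirmed with `DataFrame.duplicated()`, which marks every repeat of an earlier row and uses the same comparison as `drop_duplicates()`. On the real table its sum should therefore equal 20652 - 14287 = 6365. The sketch below uses a made-up frame; it also shows that `subset=` compares only the listed columns, which is a different (looser) notion of duplicate.

```python
import pandas as pd

# Toy citation rows: rows 0-2 are identical, mimicking rows 0-2 above.
df = pd.DataFrame({
    "citing_paper_id": [1, 1, 1, 2, 2],
    "cited_paper_id": [5835, 5835, 5835, 42695, 42695],
    "citation_sentence_cited": ["s1", "s1", "s1", "s2", "s3"],
})

# Exact duplicates over all columns (keep="first" by default):
# this is the number of rows drop_duplicates() would remove.
n_dupes = df.duplicated().sum()
print(f"{n_dupes} exact duplicate rows")

# Comparing only the citing/cited pair also flags row 4, because a paper
# can cite the same work in several different sentences.
n_pair_dupes = df.duplicated(subset=["citing_paper_id", "cited_paper_id"]).sum()
print(f"{n_pair_dupes} repeated citing/cited pairs")
```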


In [14]:
#Drop duplicated rows
df_selected_publication_citations.drop_duplicates(inplace=True)

print(df_selected_publication_citations.shape)
df_selected_publication_citations.head(5)

(14287, 8)


| | citing_paper_id | cited_paper_id | citation_title | citation_filtered_title | citation_sentence_before | citation_sentence_cited | citation_sentence_after | citation_position (section) |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5835 | Deep neural networks for youtube recommendations | deep neural networks for youtube recommendations | though shallow models are generally easy to im... | recently due to the powerful ability of featur... | deep models however are biased to high order i... | Introduction |
| 3 | 2 | 12097 | The effects of transparency on trust in and ac... | the effects of transparency on trust in and ac... | in the system provided a tag based explanation... | another content based explanation approach is ... | the authors establi shed that explaining why a... | Related_Work |
| 4 | 2 | 42695 | Explaining collaborative filtering recommendat... | explaining collaborative filtering recommendat... | the importance of providing information system... | however the first attempt towards the exploita... | more recently on different explanation goals | Related_Work |
| 5 | 2 | 42695 | Explaining collaborative filtering recommendat... | explaining collaborative filtering recommendat... | however the first attempt towards the exploita... | more recently on different explanation goals | these explanation goals are inspired by the wo... | Related_Work |
| 6 | 2 | 42695 | Explaining collaborative filtering recommendat... | explaining collaborative filtering recommendat... | indeed most of the works presented in the lite... | as an example in information about the neighbo... | in that direction in the information retrieval... | Related_Work |


2.   **Non-informative Features:** 


In [15]:
# For this dataset, identifying information such as the author's name and ORCID iD can be dropped, since it will not help the models.
print(df_all_authors.shape)
df_all_authors.head(3)

(1931, 9)


| | author_id | author_orchid_id | author_fname | author_lname | author_dblp_url | author_dblp_key | author_gscholar_url | author_affiliation | author_page |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | Jinkai | Yu | https://dblp.org/pid/227/0764 | homepages/227/0764 | https://scholar.google.com/scholar?q=Jinkai+Yu | | |
| 1 | 2 | 0000-0002-6866-9451 | Pasquale | Lops | https://dblp.org/pers/hb/l/Lops:Pasquale.html | homepages/78/5518 | https://scholar.google.com/scholar?q=Pasquale+... | University of Bari "Aldo Moro", Italy | http://www.di.uniba.it/~swap/index.php?n=Membr... |
| 2 | 3 | | A. | Yagci | https://dblp.org/pid/16/7955 | homepages/16/7955 | https://scholar.google.com/scholar?q=A.+Murat+... | | |


In [16]:
drop_list = ['author_orchid_id', 'author_fname', 'author_lname']
df_all_authors.drop(drop_list, axis=1, inplace=True)

print(df_all_authors.shape)
df_all_authors.head(3)

(1931, 6)


| | author_id | author_dblp_url | author_dblp_key | author_gscholar_url | author_affiliation | author_page |
|---|---|---|---|---|---|---|
| 0 | 1 | https://dblp.org/pid/227/0764 | homepages/227/0764 | https://scholar.google.com/scholar?q=Jinkai+Yu | | |
| 1 | 2 | https://dblp.org/pers/hb/l/Lops:Pasquale.html | homepages/78/5518 | https://scholar.google.com/scholar?q=Pasquale+... | University of Bari "Aldo Moro", Italy | http://www.di.uniba.it/~swap/index.php?n=Membr... |
| 2 | 3 | https://dblp.org/pid/16/7955 | homepages/16/7955 | https://scholar.google.com/scholar?q=A.+Murat+... | | |


3.   **NA (Not Available) Values:**


In [17]:
print(df_all_authors.shape)
df_all_authors.isna().sum()

# Of the 1931 samples, 1552 (about 80%) have no value for 'author_affiliation'; all other features are complete.
# If only this table is used, this feature should be dropped.
# If a new dataset is built by joining several of these tables, keep the feature for now and check the NA counts again after the join.

(1931, 6)


author_id                 0
author_dblp_url           0
author_dblp_key           0
author_gscholar_url       0
author_affiliation     1552
author_page               0
dtype: int64
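The decision above can be expressed as a rule: drop a column when its share of missing values exceeds a threshold. This is a minimal sketch; the helper name `drop_sparse_columns` and the 0.5 threshold are illustrative assumptions, not part of the assignment. With 1552 / 1931 (about 0.80) missing, 'author_affiliation' would be dropped under this threshold.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, max_na_fraction: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose share of NA values exceeds max_na_fraction."""
    na_fraction = df.isna().mean()  # per-column share of missing values
    to_drop = na_fraction[na_fraction > max_na_fraction].index
    return df.drop(columns=to_drop)

# Toy frame: 'author_affiliation' is mostly missing, like the real column.
toy = pd.DataFrame({
    "author_id": [1, 2, 3, 4, 5],
    "author_affiliation": [np.nan, "University of Bari", np.nan, np.nan, np.nan],
})
print(toy.isna().mean())
cleaned = drop_sparse_columns(toy, max_na_fraction=0.5)
print(cleaned.columns.tolist())
```

Running the same check again after joining tables shows whether a join has introduced new missing values.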

In [18]:
# Uncomment to drop the mostly empty 'author_affiliation' column
#df_all_authors.drop(['author_affiliation'], axis=1, inplace=True)

## Handling Text Data



1. **SBERT - Sentence Transformers:**

 [https://huggingface.co/sentence-transformers](https://huggingface.co/sentence-transformers)


2.   **Universal Sentence Encoder**

 [https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder](https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder)



In [19]:
# Strip the surrounding brackets; fill missing abstracts first so astype(str) does not turn NaN into the text "nan"
df_all_publications['paper_abstract'] = (df_all_publications['paper_abstract']
                                         .fillna('').astype(str)
                                         .str.lstrip('[').str.rstrip(']'))
paper_abstract = df_all_publications['paper_abstract'].tolist()

# Uncomment to inspect the paper abstracts
#paper_abstract

--------------------------------

1. **SBERT - Sentence Transformers:**
* For the available pre-trained models, see the link below:
[https://www.sbert.net/docs/pretrained_models.html](https://www.sbert.net/docs/pretrained_models.html)

In [20]:
!pip install sentence-transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [21]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')


# For other pre-trained models, see:
# https://www.sbert.net/docs/pretrained_models.html

In [22]:
sentence_embeddings = model.encode(paper_abstract[0:10])

In [23]:
#Embedding Dimension: 384
#Sample size: 10
sentence_embeddings.shape

(10, 384)

2.   **Universal Sentence Encoder**

In [24]:
import tensorflow_hub as hub

In [25]:
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

In [26]:
# Example: sentences 1, 3 and 4 are identical, so their embeddings should (almost) coincide
embeddings = embed([
    "The quick brown fox jumps over the lazy dog.",
    "I am a sentence for which I would like to get its embedding",
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog."])

print(embeddings)

tf.Tensor(
[[-0.03133019 -0.06338634 -0.016075   ... -0.03242778 -0.0457574
   0.05370456]
 [ 0.05080859 -0.01652432  0.01573779 ...  0.00976659  0.0317012
   0.01788116]
 [-0.03133017 -0.06338633 -0.01607503 ... -0.03242779 -0.04575738
   0.05370456]
 [-0.03133017 -0.06338633 -0.01607501 ... -0.03242779 -0.04575739
   0.05370457]], shape=(4, 512), dtype=float32)

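In the output above, rows 0, 2 and 3 encode the same sentence and agree up to float32 rounding, while row 1 differs. Cosine similarity is a common way to quantify this. The sketch below uses NumPy on toy vectors; on the notebook's tensor one could pass `embeddings.numpy()` instead. The helper name `cosine_similarity_matrix` is illustrative.

```python
import numpy as np

def cosine_similarity_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of emb (shape: n x d)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

# Toy 3-dimensional "embeddings": rows 0 and 2 point in the same direction.
toy = np.array([
    [1.0, 2.0, 3.0],
    [-3.0, 0.5, 1.0],
    [2.0, 4.0, 6.0],
])
sim = cosine_similarity_matrix(toy)
print(np.round(sim, 3))
```

Similarity values close to 1 for rows 0 and 2 mirror what the Universal Sentence Encoder output shows for the repeated sentence.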

In [27]:
embeddings = embed(paper_abstract[0:10])

In [28]:
#Embedding Dimension: 512
#Sample size: 10
embeddings.shape

TensorShape([10, 512])

## Clustering of Papers by Subject

In this section, you are expected to find a clustering model that places papers on similar topics in the same cluster, based on the contents of the papers. You may use the paper abstracts in the dataset for this. You can also obtain additional information about any paper from the shared URLs. How you create the dataset is up to you. Note that you need to select features that help cluster the papers by topic.

An example clustering study with SBERT: https://www.sbert.net/examples/applications/clustering/README.html


Things to do:

* Create the dataset to be used in clustering by making use of the shared datasets in this folder.

* Split the dataset into training and test data (a common choice is a 70% / 30% split). **The same training and test sets must be used for all clustering methods.**

* Four clustering methods are listed in the cells below. Write the necessary code right after the cell reserved for each method. Train each model on the training data, then report its results on the test data together with visualizations.
 
  Clustering Methods: https://scikit-learn.org/stable/modules/clustering.html


* For each clustering method, you are expected to compare the performance of 3 different text-to-vec pre-trained models. For this, you must choose 2 of the clustering performance evaluation metrics.

  Pre-trained text-to-vec models: [https://www.sbert.net/docs/pretrained_models.html](https://www.sbert.net/docs/pretrained_models.html)

  Clustering Performance Evaluation Metrics: [https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation](https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation)

* To optimize each clustering method, try at least 3 different values for one of its hyperparameters.
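One way to guarantee that every method and every embedding model sees the same training and test papers is to draw the split once, as row indices, and save it. This is a minimal sketch under that assumption; the helper `make_split`, the seed 42 and the file names are hypothetical. `sklearn.model_selection.train_test_split` with a fixed `random_state` would be an equivalent alternative.

```python
import numpy as np

def make_split(n_samples: int, test_fraction: float = 0.3, seed: int = 42):
    """Return (train_idx, test_idx) drawn from one fixed random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])

n_papers = 10  # toy size; on the real data, the number of abstracts that were embedded
train_idx, test_idx = make_split(n_papers)

# Save once, reuse for every embedding model and every clustering method (hypothetical file names).
np.save("split_train_idx.npy", train_idx)
np.save("split_test_idx.npy", test_idx)

# Any embedding matrix whose rows follow the same paper order is then split identically.
emb = np.random.default_rng(0).normal(size=(n_papers, 4))  # stand-in for an SBERT/USE matrix
X_train, X_test = emb[train_idx], emb[test_idx]
print(len(train_idx), len(test_idx))
```

This only works if all three embedding files keep the papers in the same row order, which is worth asserting when the files are loaded.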


## Dataset Creation


**Hint:** 

*   Using 3 different pre-trained text-to-vec models produces 3 different datasets.

* Do not recreate these three datasets under each clustering model. Create them once and save them to files. In the code cell for each clustering model, first read the saved data from the file, then complete the remaining steps.

* Also consider the runtime of the pre-trained text-to-vec models and the Google Colab resource limits. It may not be possible to encode the whole file in a single pass. In that case, split the dataset into subsets, feed them to the pre-trained text-to-vec model separately and combine the results.

* You may create the datasets with the pre-trained text-to-vec models on your own computer or on other hardware. However, you must share the created datasets via Google Drive and put the code you ran in the relevant cell below.
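The "split, encode, combine" advice can be sketched as a small helper that takes any encoder callable, for example `model.encode` from SBERT or `embed` from TF Hub, encodes the texts chunk by chunk and stacks the results in the original order. The helper name, the batch size and the file name are assumptions; the stand-in encoder only exists so the sketch runs without downloading a model.

```python
import numpy as np
from typing import Callable, Sequence

def encode_in_batches(texts: Sequence[str],
                      encode_fn: Callable[[Sequence[str]], np.ndarray],
                      batch_size: int = 256) -> np.ndarray:
    """Encode texts chunk by chunk and stack the results in the original order."""
    chunks = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        chunks.append(np.asarray(encode_fn(batch)))  # np.asarray also converts eager TF tensors
    return np.vstack(chunks)

# Hypothetical stand-in encoder: maps each text to [length, number of spaces].
def fake_encode(batch):
    return np.array([[len(t), t.count(" ")] for t in batch], dtype=np.float32)

texts = ["a b", "c", "d e f", "g", "h i"]
emb = encode_in_batches(texts, fake_encode, batch_size=2)
np.save("embeddings_fake.npy", emb)  # hypothetical file name; one file per text-to-vec model
reloaded = np.load("embeddings_fake.npy")
print(reloaded.shape)
```

With the real models the call would look like `encode_in_batches(paper_abstract, model.encode)`. Saving each chunk as it finishes would additionally let an interrupted Colab session resume, at the cost of a little bookkeeping.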


In [29]:
#read data from .tsv files
df_all_authors = pd.read_csv("all_authors.tsv", sep='\t', encoding='ISO-8859-1')
df_all_publications = pd.read_csv("all_publications.tsv", sep='\t')
df_selected_author_publications = pd.read_csv("selected_author_publications_information.tsv", sep='\t')
df_selected_publication_citations = pd.read_csv("selected_publication_citations_information.tsv", sep='\t')

In [30]:
#Code Below
#You can add as many code cells as you want.

## Clustering
### K-means
 
 Compare 3 different pre-trained text-to-vec models.

 Use 2 performance evaluation metrics and visualize the results.

 To optimize the method, try at least 3 different values for a hyperparameter of this method.
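A minimal sketch of the K-means loop, using synthetic `make_blobs` data in place of a saved embedding matrix. The candidate values of `n_clusters` (2, 4, 8), the split seed and the two metrics (silhouette and Davies-Bouldin) are illustrative choices. Both metrics are internal, so they need no ground-truth topic labels, which the files above do not appear to provide. K-means has a `predict` method, so test papers are simply assigned to the nearest learned centroid.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Toy stand-in for one embedding matrix; replace with a saved SBERT/USE array.
X, _ = make_blobs(n_samples=300, centers=4, n_features=8, random_state=0)
perm = np.random.default_rng(42).permutation(len(X))
X_train, X_test = X[perm[90:]], X[perm[:90]]  # 70% / 30%

results = {}
for k in (2, 4, 8):  # hyperparameter: number of clusters (example values)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
    test_labels = km.predict(X_test)  # assign unseen points to the nearest centroid
    results[k] = (silhouette_score(X_test, test_labels),      # higher is better
                  davies_bouldin_score(X_test, test_labels))  # lower is better

for k, (sil, dbi) in results.items():
    print(f"k={k}: silhouette={sil:.3f}, Davies-Bouldin={dbi:.3f}")
```

To compare pre-trained models, the same loop can be wrapped in an outer loop over the saved embedding files; plotting each metric against `k` (for example with `plt.plot`) covers the visualization step.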

In [31]:
#Code Below
#You can add as many code cells as you want.

### Spectral Clustering
 
 Compare 3 different pre-trained text-to-vec models.

 Use 2 performance evaluation metrics and visualize the results.

 To optimize the method, try at least 3 different values for a hyperparameter of this method.
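Unlike K-means, scikit-learn's `SpectralClustering` has no `predict` method, so it cannot label unseen test papers directly. One possible workaround, assumed here rather than prescribed by the assignment, is to cluster the training set with `fit_predict` and then label test points with a k-nearest-neighbours classifier fitted on the training labels. The toy data, the `n_neighbors` values of the affinity graph and the fixed `n_clusters=4` are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for one embedding matrix and the shared 70/30 split.
X, _ = make_blobs(n_samples=300, centers=4, n_features=8, random_state=0)
perm = np.random.default_rng(42).permutation(len(X))
X_train, X_test = X[perm[90:]], X[perm[:90]]

results = {}
for n_neighbors in (5, 10, 20):  # hyperparameter of the nearest-neighbours affinity graph
    sc = SpectralClustering(n_clusters=4, affinity="nearest_neighbors",
                            n_neighbors=n_neighbors, random_state=0)
    train_labels = sc.fit_predict(X_train)
    # Workaround for the missing predict(): a k-NN vote over the clustered training points.
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, train_labels)
    test_labels = knn.predict(X_test)
    results[n_neighbors] = (silhouette_score(X_test, test_labels),
                            davies_bouldin_score(X_test, test_labels))

for n, (sil, dbi) in results.items():
    print(f"n_neighbors={n}: silhouette={sil:.3f}, Davies-Bouldin={dbi:.3f}")
```

Spectral clustering builds an n x n affinity matrix, so on tens of thousands of abstracts it can be slow or memory-hungry; fitting on a subsample of the training set may be necessary.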

In [32]:
#Code Below
#You can add as many code cells as you want.

### DBSCAN
 
 Compare 3 different pre-trained text-to-vec models.

 Use 2 performance evaluation metrics and visualize the results.

 To optimize the method, try at least 3 different values for a hyperparameter of this method.
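DBSCAN also lacks a `predict` method and labels outliers as noise (`-1`). The sketch below assigns each test point the label of its nearest core sample if that sample lies within `eps`, and marks it as noise otherwise. This mirrors DBSCAN's own rule for border points, but the helper `assign_to_core` is a hypothetical workaround, not a scikit-learn API. The metrics are computed on non-noise points only, and the noise fraction is reported alongside them. The `eps` values suit the toy data only. For sentence embeddings, `metric="cosine"` in both `DBSCAN` and `NearestNeighbors` may be more natural.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=300, centers=4, n_features=8, random_state=0)
perm = np.random.default_rng(42).permutation(len(X))
X_train, X_test = X[perm[90:]], X[perm[:90]]

def assign_to_core(db: DBSCAN, X_new: np.ndarray) -> np.ndarray:
    """Label new points by their nearest core sample if it lies within eps, else -1 (noise)."""
    core = db.components_
    if len(core) == 0:
        return np.full(len(X_new), -1)
    core_labels = db.labels_[db.core_sample_indices_]
    dist, idx = NearestNeighbors(n_neighbors=1).fit(core).kneighbors(X_new)
    labels = core_labels[idx[:, 0]]
    labels[dist[:, 0] > db.eps] = -1
    return labels

results = {}
for eps in (2.0, 4.0, 8.0):  # hyperparameter: neighbourhood radius (example values)
    db = DBSCAN(eps=eps, min_samples=5).fit(X_train)
    test_labels = assign_to_core(db, X_test)
    mask = test_labels != -1  # evaluate on non-noise points only
    if len(set(test_labels[mask])) >= 2:
        results[eps] = (silhouette_score(X_test[mask], test_labels[mask]),
                        davies_bouldin_score(X_test[mask], test_labels[mask]),
                        1.0 - mask.mean())  # share of test points labelled as noise
    else:
        results[eps] = None  # fewer than two clusters: both metrics are undefined
print(results)
```

Reporting the noise share next to the metrics matters: a small `eps` can look good on the remaining points while discarding most of the papers.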

In [33]:
#Code Below
#You can add as many code cells as you want.

### Gaussian Mixture
 
 Compare 3 different pre-trained text-to-vec models.

 Use 2 performance evaluation metrics and visualize the results.

 To optimize the method, try at least 3 different values for a hyperparameter of this method.
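`GaussianMixture` does provide `predict`, so it fits the train/test protocol directly. In this sketch the tuned hyperparameter is `covariance_type`, and the model's BIC on the test data is recorded as an extra diagnostic (lower is better). The toy data and the fixed `n_components=4` are illustrative. With 384- or 512-dimensional embeddings, `"full"` covariance estimates many parameters per component, so `"diag"`, `"spherical"` or a dimensionality reduction step first may be more stable.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=4, n_features=8, random_state=0)
perm = np.random.default_rng(42).permutation(len(X))
X_train, X_test = X[perm[90:]], X[perm[:90]]

results = {}
for cov in ("full", "diag", "spherical"):  # hyperparameter: covariance_type
    gm = GaussianMixture(n_components=4, covariance_type=cov, random_state=0).fit(X_train)
    test_labels = gm.predict(X_test)
    results[cov] = (silhouette_score(X_test, test_labels),
                    davies_bouldin_score(X_test, test_labels),
                    gm.bic(X_test))

for cov, (sil, dbi, bic) in results.items():
    print(f"{cov}: silhouette={sil:.3f}, Davies-Bouldin={dbi:.3f}, BIC={bic:.1f}")
```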

In [34]:
#Code Below
#You can add as many code cells as you want.

### The most accurate clustering model you recommend

Below, present the full pipeline again for the recommended pair of clustering method and pre-trained text-to-vec model.

In [35]:
#Code Below
#You can add as many code cells as you want.

## Clustering Analysis

**Only for the most accurate clustering model you recommend**,

1.   How many clusters did the model you propose create?
2.   What is the number of papers in each cluster?
3.   What is the unique number of authors in each cluster?
4.   What are the top 3 most frequently used words (the words that describe the cluster) in each cluster?
        
        Ex: https://www.sbert.net/examples/applications/clustering/README.html
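Questions 2 to 4 can be answered with plain pandas once each paper has a cluster label. This is a minimal sketch on toy data. It assumes that `publication` in the author-publication table matches `paper_id`, and it counts words after removing a small, hypothetical stop-word list. `CountVectorizer(stop_words="english")` or TF-IDF weighting would be fuller alternatives.

```python
import re
from collections import Counter

import pandas as pd

# Toy inputs: one cluster label per paper, plus author-publication pairs.
papers = pd.DataFrame({
    "paper_id": [1, 2, 3, 4],
    "abstract": ["deep neural recommendation model",
                 "neural recommendation with deep embeddings",
                 "explanations for recommender systems",
                 "user trust in explanations"],
    "cluster": [0, 0, 1, 1],
})
author_pubs = pd.DataFrame({"author": [7, 8, 7, 9, 10], "publication": [1, 2, 3, 4, 4]})

# Q2: number of papers per cluster
papers_per_cluster = papers["cluster"].value_counts().sort_index()

# Q3: number of distinct authors per cluster (assumed key: publication -> paper_id)
merged = author_pubs.merge(papers, left_on="publication", right_on="paper_id")
authors_per_cluster = merged.groupby("cluster")["author"].nunique()

# Q4: top-3 words per cluster (hypothetical stop-word list)
STOP_WORDS = {"a", "an", "the", "for", "in", "of", "with", "and", "to", "on"}

def top_words(texts, n=3):
    counts = Counter(w for t in texts for w in re.findall(r"[a-z]+", t.lower())
                     if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(n)]

top_words_per_cluster = {c: top_words(g["abstract"]) for c, g in papers.groupby("cluster")}
print(papers_per_cluster, authors_per_cluster, top_words_per_cluster, sep="\n\n")
```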


In [36]:
#Code Below
#You can add as many code cells as you want.

## Clustering of Authors

Apply the method you recommended to a different problem: cluster the authors by research area.
How you create the dataset is up to you. Note that you need to select features that help cluster the authors by research area.
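One simple way to build author features, assumed here rather than given by the assignment, is to represent each author by the mean embedding of their papers and then cluster those profiles. The sketch uses toy 2-dimensional paper vectors and K-means as the stand-in clustering method. On the real data, papers without an embedding (for example, missing abstracts) would have to be skipped before averaging.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Toy paper embeddings (rows follow paper_ids) and author-publication pairs.
paper_ids = np.array([1, 2, 3, 4])
paper_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
author_pubs = pd.DataFrame({"author": [7, 7, 8, 9], "publication": [1, 2, 3, 4]})

# Assumed representation: an author's profile is the mean embedding of their papers.
row_of = {pid: i for i, pid in enumerate(paper_ids)}
profiles = {
    author: paper_emb[[row_of[p] for p in group["publication"]]].mean(axis=0)
    for author, group in author_pubs.groupby("author")
}
authors = sorted(profiles)
A = np.vstack([profiles[a] for a in authors])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(A)
author_cluster = dict(zip(authors, labels))
print(author_cluster)
```

Averaging can blur authors who work on several topics; weighting recent papers more heavily or keeping several profile vectors per author are possible refinements.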

In [37]:
#Code Below
#You can add as many code cells as you want.

# Students who want extra credit may also work on the following optional topics. This part is not mandatory.

## Semantic Search
 Select and present the papers closest to an input text.


*   https://www.sbert.net/examples/applications/semantic-search/README.html
*   https://huggingface.co/course/chapter5/6?fw=pt
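At its core, semantic search embeds the query with the same model as the corpus and ranks the papers by cosine similarity. This is a minimal NumPy sketch on toy vectors; with SBERT one would use `model.encode(query)` and the saved abstract embeddings. The helper name is illustrative, and `sentence_transformers.util.semantic_search` offers a library version of the same idea.

```python
import numpy as np

def semantic_search(query_emb: np.ndarray, corpus_emb: np.ndarray, top_k: int = 5):
    """Return (indices, scores) of the top_k corpus rows most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:top_k]  # highest similarity first
    return top, scores[top]

corpus_emb = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])  # toy "paper" vectors
query_emb = np.array([0.9, 0.1])                            # toy "query" vector
idx, sc = semantic_search(query_emb, corpus_emb, top_k=2)
print(idx, np.round(sc, 3))
```

The returned indices map back to rows of `df_all_publications`, for example to display `paper_title` for each hit.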





In [38]:
#This part is not mandatory
#Code Below
#You can add as many code cells as you want.

## Academic Ranking of Authors


In [39]:
#This part is not mandatory
#Code Below
#You can add as many code cells as you want.