<h1><center>Charting AI/ML Growth Across the World</center></h1>
<h4><center>
<a style="text-decoration:none" href="https://author1.github.io/">Mahika Jaguste</a>, IIT Gandhinagar, <a style="text-decoration:none" href="mailto:mahika.oj@iitgn.ac.in">mahika.oj@iitgn.ac.in</a>
<br><br>
<a style="text-decoration:none" href="https://nipun0307.github.io/">Nipun Mahajan</a>, IIT Gandhinagar, <a style="text-decoration:none" href="mailto:mahajan.n@iitgn.ac.in">mahajan.n@iitgn.ac.in</a>
<br><br>
<a style="text-decoration:none" href="https://author2.github.io/">Shrreya Singh</a>, IIT Gandhinagar, <a style="text-decoration:none" href="mailto:singh.shrreya@iitgn.ac.in">singh.shrreya@iitgn.ac.in</a>
</center></h4>

[![Repo](https://img.shields.io/badge/GitHub-<mahika_shrreya_nipun>-brightgreen)](https://github.com/orgs/CS328-Spring-2022/teams/mahika_nipun_shrreya)

### Charting The Growth of Papers by Top 10 Institutes

The growth of Machine Learning, Deep Learning, and related techniques, including Artificial Intelligence more broadly, can be estimated from the number of research papers published in the field. This hypothesis rests on the proposition that the countries and institutes leading the evolution of these techniques invest large amounts of capital in the research, development, and marketing of such tools, which in turn accelerates the growth of Machine Learning in other parts of the world.

Dissatisfaction is a human trait that, in some cases, motivates people to strive for a better way of living. With Machine Learning and Artificial Intelligence now part of our daily lives, and with the surge in the number of learning techniques in recent years, it is important to identify the institutions and organisations responsible for these trends.

The number of research papers affiliated with an institute does not necessarily reflect its contribution to better algorithms in machine-based learning. However, it does influence the pace of research and the reach of such techniques. Our analysis uses a dataset hosted on `GitHub`, available [here](https://github.com/martenlienen/icml-nips-iclr-dataset).

The dataset contains all paper titles, authors, and their affiliations from three major conferences, **ICML**, **NeurIPS**, and **ICLR**, over the following years:
- ICML: 2017-2020
- NeurIPS: 2006-2020
- ICLR: 2018-2021 (except 2020)

In [3]:
# importing libraries
import pandas as pd
import numpy as np
import plotly.express as px

In [4]:
papers_df = pd.read_csv('data/new_papers.csv')
papers_df.head()

| Unnamed: 0 | Conference | Year | Title | Author | Affiliation |
|---|---|---|---|---|---|
| 0 | NeurIPS | 2006 | Attentional Processing on a Spike-Based VLSI N... | Yingxue Wang | Swiss Federal Institute of Technology, Zurich |
| 1 | NeurIPS | 2006 | Attentional Processing on a Spike-Based VLSI N... | Rodney J Douglas | Institute of Neuroinformatics |
| 2 | NeurIPS | 2006 | Attentional Processing on a Spike-Based VLSI N... | Shih-Chii Liu | Institute for Neuroinformatics, University of ... |
| 3 | NeurIPS | 2006 | Multi-Task Feature Learning | Andreas Argyriou | Ecole Centrale de Paris |
| 4 | NeurIPS | 2006 | Multi-Task Feature Learning | Theos Evgeniou | INSEAD |
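
The year coverage listed for each conference can be checked directly from the `Conference` and `Year` columns shown in the preview. The snippet below is a minimal sketch on a small hand-made frame (`toy_df` is a hypothetical stand-in, not the real data); running the same `groupby` on `papers_df` should reproduce the ranges listed earlier, assuming the CSV matches the dataset description.

```python
import pandas as pd

# Toy stand-in for papers_df with the same Conference and Year columns.
toy_df = pd.DataFrame({
    "Conference": ["ICML", "ICML", "ICLR", "ICLR", "NeurIPS", "NeurIPS"],
    "Year": [2017, 2020, 2018, 2021, 2006, 2020],
})

# First year, last year, and number of distinct years seen per conference.
coverage = toy_df.groupby("Conference")["Year"].agg(["min", "max", "nunique"])
print(coverage)
```

A gap such as the missing ICLR 2020 would show up as a `nunique` value smaller than `max - min + 1`.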


#### Visualising the Data Points

In [5]:
# Visualising the data

fig = px.scatter_3d(papers_df.sample(n=100), x='Year', y='Conference', z='Author',
              color='Affiliation', template="plotly_dark")
# fig.update_layout(margin={"r":0,"t":10,"l":00,"b":100})
fig.show()

In [6]:
# get the list of the top 10 institutes by total number of entries so far
dict_ = {}
for insti in papers_df['Affiliation']:
    insti = str(insti)
    # skip missing affiliations (the literal string 'None', or NaN as read by pandas)
    if insti in ('None', 'nan'):
        continue
    if insti not in dict_:
        dict_[insti] = (papers_df.Affiliation == insti).sum()

# sort institutes by count, largest first, and keep the names of the first 10
sorted_by_value = sorted(dict_.items(), key=lambda item: item[1], reverse=True)
top_10_institutes = [name for name, _ in sorted_by_value[:10]]
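
The same ranking could also be obtained without an explicit loop. The following is a minimal sketch using `Series.value_counts`, which is a swapped-in technique rather than the code used in this notebook; `toy_df` and `top_institutes` are hypothetical names introduced for illustration, and ties may be ordered differently than in the loop above.

```python
import pandas as pd

# Toy stand-in for papers_df: one row per (paper, author) entry.
toy_df = pd.DataFrame({
    "Affiliation": ["MIT", "DeepMind", "MIT", "None", "Stanford",
                    "MIT", "DeepMind", None],
})

def top_institutes(df, n=10):
    """Return the n most frequent affiliations, skipping missing entries."""
    affiliations = df["Affiliation"].dropna()
    # Also drop rows where the affiliation is the literal string 'None'.
    affiliations = affiliations[affiliations != "None"]
    # value_counts sorts by frequency, largest first.
    return affiliations.value_counts().head(n).index.tolist()

top_2 = top_institutes(toy_df, n=2)
print(top_2)
```

On the real data, `top_institutes(papers_df)` would play the role of `top_10_institutes`.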

In [7]:
# build a dictionary mapping each year to the list of counts for the top 10 institutes (same order as top_10_institutes)
y_paper_count_top_10 = {}

for year in range(2006,2022, 1):
    y_paper_count_top_10[year] = []
    temp_df = papers_df[papers_df.Year == year]
    for insti in top_10_institutes:
        count = (temp_df.Affiliation==insti).sum()
        y_paper_count_top_10[year].append(count)
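
The year-by-institute counts built above can also be viewed as a single table. The sketch below uses `pd.crosstab` on a toy frame to produce the same kind of year-to-list mapping; `toy_df`, `toy_top`, and `y_counts` are hypothetical names, and this is an alternative formulation rather than the method used in the notebook.

```python
import pandas as pd

# Toy stand-in for papers_df with only the columns needed here.
toy_df = pd.DataFrame({
    "Year": [2006, 2006, 2007, 2007, 2007, 2008],
    "Affiliation": ["MIT", "DeepMind", "MIT", "MIT", "Stanford", "DeepMind"],
})
toy_top = ["MIT", "DeepMind"]  # hypothetical stand-in for top_10_institutes

# Keep only rows from the selected institutes, then count rows per (year, institute).
subset = toy_df[toy_df["Affiliation"].isin(toy_top)]
counts = pd.crosstab(subset["Year"], subset["Affiliation"])

# Fix the column order to match toy_top and fill years with no rows with zeros.
counts = counts.reindex(index=range(2006, 2010), columns=toy_top, fill_value=0)

# Convert to the year -> list-of-counts shape used for plotting.
y_counts = {year: row.tolist() for year, row in counts.iterrows()}
print(y_counts)
```

Keeping the counts in one DataFrame makes it easy to inspect a single institute across years, for example with `counts["DeepMind"]`.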


In [9]:
# Now we plot the growth on a bar chart for the top 10 institutes
# there is one slider step per year from 2006 to 2020 (2021 is not plotted)

import plotly.graph_objects as go

# Create figure
fig = go.Figure()

# Add traces, one for each slider step (one per year, 2006-2020)
for step in range(2006, 2021):
    fig.add_trace(
        go.Bar(
            visible=False,
            x=top_10_institutes,
            y=y_paper_count_top_10[step],
            name="Year=" + str(step),
        ))

# Make the first trace (year 2006) visible
fig.data[0].visible = True

# Create and add slider
steps = []
for i in range(len(fig.data)):
    step = dict(
        method="update",
        args=[{"visible": [False] * len(fig.data)},
              {"title": "Slider switched to Year: " + str(2006+i)}],  # layout attribute
        label = str(2006+i*1),
    )
    step["args"][0]["visible"][i] = True  # Toggle i'th trace to "visible"
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Trend For Year: ", },
    pad={"t": 80},
    steps=steps
)]

fig.update_layout(
    sliders=sliders
)

fig.show()

The interactive slider above displays the contribution of the top 10 organisations, measured by the number of machine-learning research papers published in a given year.

The notion of 'top organisations' in our analysis is based on the cumulative number of papers published over the years. Since our dataset extends only to the year 2021, we have retrieved the top 10 organisations based on the number of papers published during 2006-2021. However, the data for the year 2021 is not complete: per the coverage listed earlier, only ICLR is included for that year.
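
One caveat worth keeping in mind: as the data preview shows, the dataset has one row per author, so counting rows per affiliation counts author entries rather than distinct papers. The sketch below illustrates the difference on toy data, under the assumption that conference, year, and title together identify a paper; `toy_df`, `row_counts`, and `paper_counts` are hypothetical names.

```python
import pandas as pd

# Toy data in the same shape as the preview: one row per (paper, author).
toy_df = pd.DataFrame({
    "Conference": ["NeurIPS"] * 5,
    "Year": [2006] * 5,
    "Title": ["Paper A", "Paper A", "Paper A", "Paper B", "Paper B"],
    "Author": ["a1", "a2", "a3", "b1", "b2"],
    "Affiliation": ["MIT", "MIT", "Stanford", "MIT", "INSEAD"],
})

# Author-level count: every (paper, author) row contributes once.
row_counts = toy_df["Affiliation"].value_counts()

# Paper-level count: keep one row per (paper, affiliation) pair before counting.
paper_keys = ["Conference", "Year", "Title", "Affiliation"]
paper_counts = toy_df.drop_duplicates(subset=paper_keys)["Affiliation"].value_counts()

print(row_counts["MIT"], paper_counts["MIT"])
```

Which of the two counts is more appropriate depends on whether the goal is to measure author participation or the number of distinct papers.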

We use the number of research papers published as a proxy for the contribution of an organisation to the field. Charting this trend helps to analyse how the current top 10 institutes have behaved over the years. For example, **DeepMind**, currently a top institute in Deep Learning research, had no publications in the year `2006`. However, as interest in Deep Learning increased after 2014, its number of publications soared to `301` in the year `2019`. This sharp rise plausibly reflects the growing demand for faster algorithms for deep learning and related techniques after the year 2014.

Interestingly, the year `2020` witnessed the highest cumulative number of research papers across the top 10 affiliated institutes. This trend is consistent with the current demand for the technology, as many industries increasingly rely on machine-based learning to improve responsiveness.

This growth trend is evident in the average number of publications, which increases tremendously from `25` in the year `2014` to about `290` in the year `2020`.
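
The yearly average quoted above can be computed from a year-to-counts mapping like `y_paper_count_top_10`. The following is a minimal sketch with made-up numbers (`toy_counts` is a hypothetical stand-in and its values are not the real counts), assuming the average is taken across the top institutes within each year.

```python
# Hypothetical stand-in for y_paper_count_top_10: year -> counts per top institute.
toy_counts = {
    2014: [10, 20, 30],
    2020: [100, 200, 300],
}

# Mean count across institutes for each year.
yearly_mean = {year: sum(counts) / len(counts) for year, counts in toy_counts.items()}

# Growth factor between the two years of interest.
growth = yearly_mean[2020] / yearly_mean[2014]
print(yearly_mean, growth)
```

Applying the same comprehension to `y_paper_count_top_10` gives one average per year, which can then be compared across years.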