# <center> Bokeh Plot to Visualize the History of Natural Language Processing (NLP) </center>

#### I'll walk through the process of creating an interactive Bokeh plot that visualizes the history of Natural Language Processing (NLP). The plot places significant events in the history of NLP on a timeline, giving a clear and informative view of how the field has evolved over time.

## <center> Step 1: Importing Necessary Libraries </center>
#### First, I'll import the Python libraries needed to work with the data and create the Bokeh plot: NumPy, pandas and Bokeh.

```python
import numpy as np
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category20c
from bokeh.transform import linear_cmap
```


## <center> Step 2: Defining the Data </center>

#### We define the data representing significant events in the history of NLP. The data includes the year of each event and a description of the event.

```python
data = {
    'Year': [1949, 1954, 1957, 1966, 1966, 1972, 1974, 1979, 1980, 1987, 1993, 1994, 2003, 2008, 2013, 2013, 2014, 2015, 2019, 2022],
    'EventDescription': [
        "Weaver’s Memorandum and the Beginning of Machine Translation (MT)",
        "Georgetown Experiment: IBM and Russian-English Machine Translation",
        "Introduction of Generative Grammar (Chomsky)",
        "Challenges and Slow Progress in Machine Translation",
        "ALPAC Report: MT Not Yet Achievable",
        "Syntactic Theory Development",
        "Birth of Prototype Systems: ELIZA, SHRDLU, LUNAR, PARRY",
        "Conceptual Ontologies: MARGIE, TaleSpin, QUALM, SAM, PAM, Politics",
        "Symbolic Approaches in NLP (1980s)",
        "Statistical Models Revolution: Bahl, Brill, Chitrao, Brown",
        "Introduction of Decision Trees in NLP",
        "Shift to Machine Learning Algorithms",
        "First Neural Language Model (Bengio, 2003)",
        "Multi-Task Learning with CNN (Collobert and Weston, 2008)",
        "Introduction of Word2Vec (Mikolov et al., 2013)",
        "Adoption of Neural Networks in NLP (2013)",
        "Sequence-to-Sequence Learning (Sutskever et al., 2014)",
        "Principle of Attention Introduced (Bahdanau et al., 2015)",
        "Rise of Large Pretrained Language Models (2018-2019)",
        "Efficient Learning with Pretrained Language Models 2022 & No Code",
    ]
}
```

## <center> Step 3: Data Preprocessing </center>

#### Each year must line up with its description, so both lists need the same length. As a safeguard, we trim both lists to the length of the shorter one; both currently hold 20 entries, so nothing is dropped.

```python
min_length = min(len(data['Year']), len(data['EventDescription']))
data['Year'] = data['Year'][:min_length]
data['EventDescription'] = data['EventDescription'][:min_length]
```
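Trimming silently discards unmatched entries, so a year added without a description would simply vanish from the plot. If you would rather fail loudly, a minimal sketch is to check the lengths before pairing the lists. The helper name `pair_events` is hypothetical and not part of the original notebook.

```python
def pair_events(years, descriptions):
    """Pair each year with its description; raise instead of truncating.

    Hypothetical helper: a length mismatch raises ValueError rather than
    silently dropping the extra entries.
    """
    if len(years) != len(descriptions):
        raise ValueError(
            f"{len(years)} years but {len(descriptions)} descriptions"
        )
    return list(zip(years, descriptions))


# Balanced input pairs cleanly.
print(pair_events([1949, 1957], ["Weaver's Memorandum", "Generative Grammar"]))

# An extra year without a description is reported instead of dropped.
try:
    pair_events([1949, 1957, 2022], ["Weaver's Memorandum", "Generative Grammar"])
except ValueError as err:
    print("Length mismatch:", err)
```

Failing early keeps the timeline honest: a missing description shows up as an error at this step rather than as a silently shortened plot.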


## <center> Step 4: Creating a DataFrame </center>
#### We create a Pandas DataFrame from the preprocessed data, which will be used as the data source for our Bokeh plot.


```python
nlp_events_df = pd.DataFrame(data)
```


## <center> Step 5: Initializing Bokeh </center>

#### To render interactive Bokeh plots inline in a Jupyter Notebook, we call `output_notebook()` once before showing any plot:

```python
output_notebook()
```

## <center> Step 6: Creating ColumnDataSource and HoverTool </center>

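#### We create a `ColumnDataSource` to link our DataFrame to the Bokeh plot, and we set up a `HoverTool` to display tooltips when hovering over data points. The `@Year` and `@EventDescription` placeholders refer to columns of the data source, so each tooltip shows the values of the hovered event.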



```python
source = ColumnDataSource(nlp_events_df)

hover = HoverTool(
    tooltips=[("Year", "@Year"), ("Event", "@EventDescription")]
)
```
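`ColumnDataSource` stores its data column by column: a mapping from column name to an equal-length sequence, so a single plotted point corresponds to the same index `i` in every column. The standard-library sketch below models that layout and shows how an `@column` placeholder could be filled in for one row. It is a simplified model for intuition, not Bokeh's actual tooltip code; `row_at` and `fill_tooltip` are hypothetical helpers.

```python
import re

# Column-oriented layout, similar in shape to ColumnDataSource.data
columns = {
    "Year": [1949, 1957],
    "EventDescription": [
        "Weaver's Memorandum and the Beginning of Machine Translation (MT)",
        "Introduction of Generative Grammar (Chomsky)",
    ],
}


def row_at(cols, i):
    """Collect the i-th value of every column into one record."""
    return {name: values[i] for name, values in cols.items()}


def fill_tooltip(template, row):
    """Replace each @name placeholder with that field of the row (simplified)."""
    return re.sub(r"@(\w+)", lambda m: str(row[m.group(1)]), template)


tooltips = [("Year", "@Year"), ("Event", "@EventDescription")]
row = row_at(columns, 1)
for label, template in tooltips:
    print(f"{label}: {fill_tooltip(template, row)}")
# Year: 1957
# Event: Introduction of Generative Grammar (Chomsky)
```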

## <center> Step 7: Customizing Colors </center>

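#### We customize the colors used for the events on the plot to make them visually distinct. We take colors from the `Category20c` palette and build a categorical colormap with `factor_cmap`, which assigns one color to each distinct event description. (`linear_cmap` is meant for numeric fields, so it cannot color a text column such as `EventDescription`.)

```python
from bokeh.transform import factor_cmap

# One palette entry per distinct event; Category20c is keyed by palette size (up to 20)
event_descriptions = nlp_events_df['EventDescription'].unique().tolist()
colors = Category20c[len(event_descriptions)]

# Map each description (a categorical factor) to its own color
mapper = factor_cmap(field_name='EventDescription', palette=colors, factors=event_descriptions)
```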
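The difference between a linear and a categorical color mapper can be sketched in plain Python. A linear mapper scales a numeric value between `low` and `high` into a palette position, which has no meaning for text such as event descriptions; a categorical mapper simply looks each value up in a list of factors. The helpers `linear_color` and `categorical_color` are hypothetical simplifications for intuition and do not reproduce Bokeh's `LinearColorMapper` or `CategoricalColorMapper` exactly; the hex colors and short factor names are placeholders.

```python
def linear_color(value, palette, low, high):
    """Bin a numeric value in [low, high] into one of the palette colors."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # float() raises ValueError for text, which has no place on a numeric scale
    index = int((float(value) - low) * len(palette) / (high - low))
    index = max(0, min(index, len(palette) - 1))  # clamp to a valid position
    return palette[index]


def categorical_color(value, factors, palette, nan_color="gray"):
    """Return the color at the factor's position; unknown values get nan_color."""
    if value in factors:
        i = factors.index(value)
        if i < len(palette):  # in this sketch, extra factors also fall back
            return palette[i]
    return nan_color


palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]
factors = ["ALPAC Report", "Word2Vec", "Attention"]

print(categorical_color("Word2Vec", factors, palette))  # #ff7f0e
print(categorical_color("Unknown", factors, palette))   # gray
print(linear_color(2.5, palette, low=0, high=3))        # #2ca02c

try:
    linear_color("Word2Vec", palette, low=0, high=3)
except ValueError as err:
    print("Cannot map text linearly:", err)
```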


## <center> Step 8: Creating the Bokeh Plot </center>

#### Now, we create the actual Bokeh plot, specifying its dimensions and title.

```python
p = figure(width=1000, height=550, title="History of NLP")
```

## <center> Step 9: Adding Data Points </center>

#### We add circular data points to represent each event on the timeline. We customize their appearance using the previously defined colormap and set the alpha value for transparency.

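```python
# scatter() with marker="circle" draws fixed-size circles;
# recent Bokeh versions deprecate circle(size=...) in its favor
p.scatter(
    x="Year", y=0, size=20, source=source, legend_field="EventDescription",
    marker="circle",
    fill_color=mapper,  # Use the colormap from Step 7
    line_color="white", line_width=2,
    fill_alpha=0.8  # Set the alpha value
)
```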
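Because every marker sits at `y=0`, events that share a year are drawn on top of each other (for example, the two 2013 entries), and the tooltip at that spot typically lists both. One possible workaround, sketched here as an assumption rather than part of the original notebook, is to compute a small vertical offset per event. The helper `stagger_offsets` and the `y_offset` column name below are hypothetical.

```python
from collections import defaultdict


def stagger_offsets(years, step=0.5):
    """Return one y-offset per event so events sharing a year do not overlap.

    The first event in a year stays at 0; later ones alternate above and
    below the axis: +step, -step, +2*step, -2*step, ...
    """
    seen = defaultdict(int)  # how many events each year has produced so far
    offsets = []
    for year in years:
        k = seen[year]
        seen[year] += 1
        if k == 0:
            offsets.append(0.0)
        else:
            magnitude = (k + 1) // 2 * step
            offsets.append(magnitude if k % 2 == 1 else -magnitude)
    return offsets


print(stagger_offsets([1949, 2013, 2013, 2014, 2013]))  # [0.0, 0.0, 0.5, 0.0, -0.5]
```

In the notebook, the offsets could be stored with `nlp_events_df['y_offset'] = stagger_offsets(nlp_events_df['Year'])` before the `ColumnDataSource` is built, and the marker call would then use `y="y_offset"` instead of `y=0`.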

## <center> Step 10: Axis Labels and Visibility </center>
#### We label the x-axis and hide the y-axis for a cleaner appearance; every event sits at y = 0, so the vertical axis carries no information.

```python
p.xaxis.axis_label = "Year"
p.yaxis.visible = False
```


## <center> Step 11: Adding Hover Tool and Legend </center>

#### We add the hover tool we defined earlier to enable tooltips when hovering over data points. Additionally, we set the title for the legend.


```python
p.add_tools(hover)
p.legend.title = "Events"
```


## <center> Step 12: Displaying the Bokeh Plot </center>
#### Finally, we use `show(p)` to display the interactive Bokeh plot in the Jupyter Notebook.

```python
show(p)
```