# A History of NLP

The history of NLP can be broken down in many different ways. Here we will look at the long-term history and the *approach* that researchers have taken through the decades. We can consider three *eras* of NLP, which mirror the broader trends in Machine Learning as a whole.

## Symbolic AI

The first era of NLP and AI was dominated by **symbolic** AI. This era covered the period from the mid-1950s to the late 1980s. It consists of a collection of methods that treat AI as solvable using *symbolic* representations of problems. For example, researchers would manually label and categorize every scenario that an AI might encounter, and write a set of logic/rule-based instructions on how to deal with each scenario.

Applied to NLP, this meant writing a fixed set of rules. For a sentiment classification task, our rules might look something like:

```
IF 'happy' IN SENTENCE
SENTIMENT IS POSITIVE

IF 'sad' IN SENTENCE
SENTIMENT IS NEGATIVE
```

In practice these rule sets would, of course, be much more complex. As a simple toy example, a researcher might add `IF 'happy' IN SENTENCE AND 'not' BEFORE 'happy'; SENTIMENT IS NEGATIVE`.
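To make the rule-based idea concrete, here is a minimal Python sketch of the rules above, including the toy negation rule. The function name `classify`, the `UNKNOWN` fallback label and the simple regex tokenization are illustrative assumptions, not a reconstruction of any historical system.

```python
import re


def classify(sentence: str) -> str:
    """Apply hand-written sentiment rules (illustrative only) to a sentence."""
    # Assumed tokenization: lower-case and keep runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", sentence.lower())

    if "happy" in words:
        # Toy negation rule: 'not' anywhere before 'happy' flips the sentiment.
        if "not" in words[: words.index("happy")]:
            return "NEGATIVE"
        return "POSITIVE"

    if "sad" in words:
        return "NEGATIVE"

    # No rule matched: the system has nothing to say about this sentence.
    return "UNKNOWN"


print(classify("I am happy"))      # POSITIVE
print(classify("I am not happy"))  # NEGATIVE
print(classify("I am chuffed"))    # UNKNOWN
```

The last call hints at the brittleness of this approach: "chuffed" (British slang for pleased) is a word the rules were never written for, so the sentence falls straight through every rule.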

The **benefit** of this approach is interpretability: every rule is written by a human and is human-readable, so we can understand exactly what is happening.

However, there are many **drawbacks** to this approach. Even a simple symbolic representation of language is incredibly complex. The researcher designing such a system must be an expert in many different areas of language, and even if they did understand all there is to know about language (which, as far as I am aware, is not possible), there will always be strange nuances (context, for example) so complex that no reasonable person could expect to express them in a logical, symbolic representation. Encoding every possible scenario is simply not feasible because there are too many edge cases. An AI may have excellent conversational skills with English speakers, but if an English speaker from an isolated Welsh village, with their own set of slang words, attempts to converse with it, it will fail: the symbolic representation of language is not flexible enough to cope with even one unknown *'symbol'*.

Despite these drawbacks, we still use symbolic methods in present-day NLP. When we tokenize text, we are creating *symbolic representations* of words, although these methods are far less manual and form one part of a solution rather than the whole thing.
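As a rough illustration, the sketch below builds a word-level vocabulary and maps each word to an integer id, with a reserved `<unk>` id for words it has never seen. The helper names `tokenize`, `build_vocab` and `encode` and the `<unk>` token are hypothetical; tokenizers in modern libraries are typically more sophisticated than simply splitting on words.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign each distinct word in the corpus its own integer id (its 'symbol')."""
    vocab = {"<unk>": 0}  # id 0 is reserved for unknown words
    for text in corpus:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Convert text to a list of ids, falling back to <unk> for unseen words."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(text)]


vocab = build_vocab(["I am happy", "I am not sad"])
print(vocab)                             # {'<unk>': 0, 'i': 1, 'am': 2, 'happy': 3, 'not': 4, 'sad': 5}
print(encode("I am very happy", vocab))  # [1, 2, 0, 3]
```

Unlike the hand-written rules, nobody defines these symbols manually: the mapping is derived from the data, and the resulting ids are usually just the input to a statistical or neural model.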

[Symbolic artificial intelligence, Wikipedia](https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence)

[Physical symbol system, Wikipedia](https://en.wikipedia.org/wiki/Physical_symbol_system)

## Statistical AI

Statistical AI dominated from the 1980s until around 2010. During this time we relied heavily on statistical machine learning methods such as logistic regression and Naive Bayes classification. These methods became dominant as increasing computational power allowed models to be tested and tuned more quickly, and on larger (but still limited) amounts of data.
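To contrast with the hand-written rules of the symbolic era, here is a minimal multinomial Naive Bayes sentiment classifier written from scratch. Instead of relying on rules, it learns word statistics from labelled examples. The six training sentences are made up for illustration, and the add-one (Laplace) smoothing and the choice to skip words never seen in training are common simplifications rather than the only options.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}  # label -> Counter of word frequencies
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing keeps unseen (word, label) pairs from scoring zero.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy training data, for illustration only.
examples = [
    ("I am so happy today", "POSITIVE"),
    ("what a great and happy day", "POSITIVE"),
    ("I love this", "POSITIVE"),
    ("I am sad", "NEGATIVE"),
    ("this is terrible and sad", "NEGATIVE"),
    ("I hate this", "NEGATIVE"),
]
model = train(examples)
print(predict("such a great day", model))  # POSITIVE
print(predict("I feel terrible", model))   # NEGATIVE
```

Neither "such" nor "feel" appears in the training data, yet the classifier still answers using the words it does know. It is, however, only as good as its narrow training data.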

The main **benefit** of these statistical approaches was a greater ability to generalize and to deal with outliers (to an extent), while requiring less domain-specific understanding from researchers. The **drawbacks** were the limits of those same benefits: the models did generalize and handle outliers better, but they were still significantly limited in this regard. They struggled to adapt to other use cases, even slightly different ones, and so were siloed into very specific applications.

Today we still use many of these methods. Although many of their original applications have been superseded by the better performance of neural nets, we will still find them in some places, often as part of a larger model or process. Much of the knowledge and many of the methods developed during this statistical age also led directly into the current one.

[Statistical learning theory, Wikipedia](https://en.wikipedia.org/wiki/Statistical_learning_theory)

## Neural AI

Neural AI exploded from 2010 onwards with the 'rediscovery' of the neural network. By 2010, growing computational power and the wide availability of data had created the perfect conditions for neural nets to take centre stage. Neural networks require a significant amount of computational power and are incredibly data-hungry; before 2010 there was simply not enough compute, and very few 'big data' datasets in the world. Researchers found that multilayer neural networks (deep learning) provided massively improved performance compared to the previous cutting-edge statistical models, and that the same models could be applied to a huge range of use cases with ease.

The **benefits** of the new neural age of AI are incredibly diverse. We now have models that are reasonably adaptable, can deal with outliers and are incredibly accurate, and applying them to real problems is becoming easier every day. The **drawbacks** of the neural approach are perhaps harder to identify because we lack hindsight, but we can say that, despite being more adaptable than symbolic or statistical methods, it is still fundamentally brittle. Many of the models that beat human scores at games will break if the screen is rotated by a few degrees, and if you ask GPT-3 *"How many eyes does my foot have?"*, it will happily answer *"Your foot has two eyes."* ([source](https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html))

Another drawback that receives a lot of attention at the moment is **interpretability**. Neural models have become so complex, and evolve in such a way, that the people who build them often can't explain certain behaviors of the models. This raises concerns about how far we can trust these models with important tasks such as self-driving, screening resumés for hiring (without being racist or [sexist](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G)), or making financial decisions in the stock markets (see the [2010 flash crash](https://en.wikipedia.org/wiki/2010_flash_crash) and the [2017 Ethereum flash crash](https://www.cnbc.com/2017/06/22/ethereum-price-crash-10-cents-gdax-exchange-after-multimillion-dollar-trade.html)).