# Background

## Basic "epistemic" situation

Let us assume that an "agent" (an epistemic system) is situated in an environment that has its own dynamics and contains other agents. The agent tries to reach its goals by perceiving the environment (including the other agents' actions in it) and by taking actions, a subset of which are communicative actions directed at other agents.

See Wikipedia's definition of ["intelligent agent"](https://en.wikipedia.org/wiki/Intelligent_agent), which goes back to Russell & Norvig (2003, ch. 2).

We trace the notion of intelligence back to this efficient goal-seeking behavior, which in turn presupposes learning and applying invariances (knowledge) that have predictive power about the environment (and about the agent itself).

It is important to note that efficient coordination of actions requires good-quality **communication** among the agents, which in turn presupposes **common representations among the agents** of the world and of themselves.


The following steps are needed for this:

<img src="http://drive.google.com/uc?export=view&id=1S7ZapG1vzlbdTB06G-I1ln5E-G66k3N7" width=60%>

An agent that has a goal and is situated in an environment performs the following:
- Perception <-> Pattern recognition
- Learning
- Knowledge retrieval
- Reasoning
- Decision
- Action / **Communication**
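The steps above can be sketched as a simple loop. This is only a toy illustration under strong assumptions: the `ThermostatAgent` class, its method names and the dictionary-based environment are all invented here, and "learning" is reduced to remembering observations.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Toy agent whose goal is to keep a room near a target temperature."""
    target: float
    # "Knowledge": remembered observations the agent could later retrieve.
    memory: list = field(default_factory=list)

    def perceive(self, environment: dict) -> float:
        # Perception: read the relevant signal from the environment.
        return environment["temperature"]

    def learn(self, observation: float) -> None:
        # Learning: here simply storing what was observed.
        self.memory.append(observation)

    def decide(self, observation: float) -> str:
        # Reasoning + decision: compare the observation with the goal.
        if observation < self.target - 1:
            return "heat"
        if observation > self.target + 1:
            return "cool"
        return "idle"

    def step(self, environment: dict) -> str:
        observation = self.perceive(environment)
        self.learn(observation)
        action = self.decide(observation)
        # Action: the chosen action changes the environment.
        delta = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
        environment["temperature"] += delta
        return action


env = {"temperature": 17.0}
agent = ThermostatAgent(target=20.0)
actions = [agent.step(env) for _ in range(4)]
print(actions)  # ['heat', 'heat', 'idle', 'idle']
print(env)      # {'temperature': 19.0}
```

Communication is left out of this sketch; with several agents, the returned action could just as well be a message sent to another agent.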

<img src="http://drive.google.com/uc?export=view&id=13sLAx8ml5CD9XLqhZ8QxivCv1mxkA5B3" width=50%>

There are two important "aspects" of the agent's inner intelligence, which are in essence inseparable, but can be examined or "emphasized" in themselves:

* **"Learning"** and
* **"Knowledge"** (tied to **"Reasoning"**, which is the use of knowledge and perceptual information for decisions)

<img src="http://drive.google.com/uc?export=view&id=1Ba1qAlEH9D0CAdimkQmJ-ZazuBrUdk2v" width=50%>


The differing emphases on knowledge and learning correspond to the two major "schools", and two epochs, of (cognitive science and) AI research:
- The "symbolic" or "knowledge-based" style, which (among other things) had its roots in classical logical reasoning and ontological knowledge representation, and was influential in the creation of the World Wide Web. ("Knowledge bases", "expert systems", "ontologies" and "reasoning engines" are all products of this approach.)
- The "learning"-based style, which is strongly connected to statistics and can be termed "machine learning".

This **dual approach**, combining symbol- and rule-based techniques with distributed / probabilistic ones, will be a **core motif in Natural Language Processing**.

### Connection between communication and representation

If we concern ourselves with **communication** between agents, we will naturally emphasize the **knowledge representation** aspect, since we assume that this representation is closely connected to the **encoding, transferring and decoding** of **meanings** and **intents** between agents.


The typical characterization of a communication process is as follows:

<img src="https://www.skillsyouneed.com/images/communication-process.png" width=50%>

A language is:
- a set of symbols (codes)
- a ruleset for their production and consumption


**Some code in a language is the (symbolic) representation of some information bearing a communicative intent between agents.**
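As a minimal sketch of this view, a "language" can be modeled as a codebook (the symbols) plus two functions (the rules for production and consumption). The intents, codes and function names below are invented for illustration only.

```python
# A toy "language": a set of symbols (codes) plus rules for producing
# and consuming them. All intents and codes here are made up.
CODEBOOK = {"greeting": "HI", "request_help": "SOS", "farewell": "BYE"}
DECODEBOOK = {code: intent for intent, code in CODEBOOK.items()}


def encode(intents):
    """Production rule: map each intent to its symbol, join with spaces."""
    return " ".join(CODEBOOK[intent] for intent in intents)


def decode(message):
    """Consumption rule: split on spaces, map each symbol back to an intent."""
    return [DECODEBOOK.get(symbol, "<unknown>") for symbol in message.split()]


message = encode(["greeting", "request_help"])
print(message)           # HI SOS
print(decode(message))   # ['greeting', 'request_help']
print(decode("HI LOL"))  # ['greeting', '<unknown>']
```

Decoding only works when sender and receiver share the codebook, which is the "common representation" requirement in miniature. Natural language is far harder because the mapping between symbols and intents is neither one-to-one nor fixed.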

<img src="http://drive.google.com/uc?export=view&id=1MlX-vdeWGP17cdfWyCRPmDSzHSFL8bN3" width=65%>

**There is a non-trivial relationship between the concept, the observed phenomena, and the symbols used to encode them.**

For further discussion see: [Time to put an end to BERTology (or, ML/DL is not even relevant to NLU)](https://medium.com/ontologik/time-to-put-an-end-to-bertology-or-ml-dl-is-not-even-relevant-to-nlu-e5ba6fc53403)

Note: the symbolic aspect implies two things:
- There can be rules of operation that we can carry out over the symbols, but
- there is always a (challenging) decoding aspect: the original "denotatum" has to be recovered from the symbols.

There is a widespread tradition in philosophy, psychology and cognitive science that recognizes a deep connection, sometimes even an identity, between language and thought. (For the connections between self-talk and self-knowledge see e.g. [here](https://www.researchgate.net/publication/10947675_Relations_among_self-talk_self-consciousness_and_self-knowledge).)

For a detailed account of this position in cognitive science see [here](https://plato.stanford.edu/entries/mental-representation/).

This position lends itself easily to integration with the **"classical computational paradigm"** of symbol manipulation, and thus became quite widespread with the advent of computing machinery.

On the other hand, arguments arose against the symbolic paradigm on multiple fronts, since neural and experimental evidence does not support any kind of localized symbolic computation in the brain; thus a broader view of **"distributed representation"** and **"emergent cognition"** has been proposed.

This paradigm lends itself easily to integration with probabilistic and "neural" models of computation, which have recently become dominant in the field of NLP as well.

For a detailed account of the paradigm change see: [Varela, Thompson, Rosch: The Embodied Mind, Sections II-III](https://monoskop.org/images/b/b2/Varela_Thompson_Rosch_-_The_Embodied_Mind_Cognitive_Science_and_Human_Experience.pdf)




## Human communication - Language

### Human communication channels - and how much we care

<img src="https://transferoflearning.com/wp-content/uploads/2016/04/VAK-image-3.png" width=30%>

The three main sensory channels we humans use to communicate are:
- Visual
- Auditory
- Tactile (and other bodily)

There is a widely circulated mischaracterization of the relative importance of these modalities.


<img src="https://pbs.twimg.com/media/CwwsDwWWIAEhhXJ.jpg" width=30%>


**<center><font color='red'>Well, NO!!!!!!!</font></center>**

"Please note that this and other equations regarding relative importance of verbal and nonverbal messages were derived from experiments dealing with communications of feelings and attitudes (i.e., like-dislike). Unless a communicator is talking about their feelings or attitudes, these equations are not applicable." ([source](http://www.kaaj.com/psych/smorder.html)) ([other problems](https://www.psychologytoday.com/us/blog/beyond-words/201109/is-nonverbal-communication-numbers-game))

**We will regard language** (irrespective of transmission channel) as the **primary source of information exchange.** 


### Natural Language

Beyond the "naturally" occurring languages that humans use to communicate, there are multiple forms of non-natural languages:

<img src="http://www.jkerkkonen.com/julang2.gif" width=50%>


# Natural Language Processing

NLP tries to use programming languages and/or logic languages to process and manipulate the products of the naturally occurring ones.


## NLP and neighboring fields

<img src="http://drive.google.com/uc?export=view&id=1j07-GA461uucU4rAZlLTR6TAT7R5jeb5" width=100%>

- Automatic Speech Recognition and Speech Generation are concerned with spoken audio
- Optical Character Recognition and (hand)writing generation are concerned with written text
- Natural Language Processing is focused on digital text

Though the term is often used loosely, in this class we will regard Natural Language Processing proper as the manipulation and processing of **digitized text.**

These fields have strong synergies: OCR and ASR models rely heavily on NLP models (but typically not vice versa).

### NLP and linguistics

Traditionally, linguistics as a field has the following layering of study topics:

<img src="http://drive.google.com/uc?export=view&id=1u8ZPNGdxCBRPSQXOhoin2junihCWQnVK" width=90%>

([source](https://medium.com/@ehfirst/natural-language-processing-nlp-foundations-of-linguistics-layers-of-language-2d691e3c905e))

Though ASR actively uses some fields of linguistics, NLP covers the area more widely.

**The structure of NLP problems and solutions will follow this hierarchy.**

### Why are these levels important?

#### Getting the right words and marks

<img src="http://image.linotype.com/nonlatinfonts/indian/devanagari/explanation.gif" width=55%>

<img src="https://kettlefirecreative.com/wp-content/uploads/2017/12/kf-social-punday-peterpan-1.jpg" width=25%>

<img src="https://the16percent.files.wordpress.com/2013/05/punctuation_saves_lives_poster-rde0b962e192d4a14b84cfc8bf1a972ec_wir_400.jpg?w=266&h=337" width=20% heigth=20%>
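A small sketch of why words and punctuation marks matter: a regex tokenizer that keeps punctuation as separate tokens. The example sentence and the pattern are assumptions chosen for illustration; real tokenizers handle many more cases.

```python
import re

# \w+(?:'\w+)? matches a word with an optional apostrophe part ("Let's");
# [^\w\s] matches any single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


with_comma = tokenize("Let's eat, grandma!")
without_comma = tokenize("Let's eat grandma!")
print(with_comma)     # ["Let's", 'eat', ',', 'grandma', '!']
print(without_comma)  # ["Let's", 'eat', 'grandma', '!']
```

The two token sequences differ by a single comma token. A system that throws punctuation away can no longer tell the two sentences apart.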

#### Recognizing "named entities"

<img src="https://imanage.com/wp-content/uploads/2014/10/NER1.png">

<img src="https://cdn.someecards.com/someecards/usercards/1334252140708_1624830.png" width=45%>

#### Setting up sentence boundaries

<img src="http://drive.google.com/uc?export=view&id=1z-Aw53xi-FH1RtuTp2TrzSlTjEBc2RrU" width=65%>

<img src="https://i.pinimg.com/originals/83/d0/99/83d0997f7f1b978b7fc1c52518074433.jpg" width=40%>
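Sentence boundary detection looks trivial until abbreviations appear. The sketch below contrasts a naive "split after every period" rule with a slightly better heuristic. The example text and the tiny abbreviation list are assumptions made up for this illustration.

```python
import re

text = "Dr. Smith arrived at 5 p.m. on Monday. He left early."

# Naive rule: a sentence ends at every period followed by whitespace.
naive = re.split(r"(?<=\.)\s+", text)

# Slightly better: do not split after a small, hand-picked abbreviation list.
ABBREVIATIONS = {"Dr.", "p.m."}


def split_sentences(text):
    """Split on tokens that end with a period unless they are abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


print(len(naive))             # 4
print(split_sentences(text))  # ['Dr. Smith arrived at 5 p.m. on Monday.', 'He left early.']
```

Even the improved heuristic is fragile: it would fail on a sentence that happens to end with "p.m.", so the decision really depends on the wider context.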

#### Getting to know sentence structure

<img src="http://www.cs.joensuu.fi/pages/edtech/mw/pictures/parsetrees_big.gif" width=65% heigth=65%>

E.g., who or what is the real subject...

<img src="https://cdn.someecards.com/someecards/usercards/1333064468910_168538.png" width=45%>

#### Clarifying meaning

<img src="https://plato.stanford.edu/entries/computational-linguistics/fig4.png" width=65% heigth=65%>

Sometimes, though, we are out of luck unless we look at the real world, as in:

**"I see Doggy with the looking glass."**

<img src="https://st.depositphotos.com/1146092/1352/i/950/depositphotos_13520966-stock-photo-binoculars-safari-compass-dog-watching.jpg" width=25% heigth=25%>
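The ambiguity in the sentence above can be made concrete by writing down two possible structures for the same word sequence. The bracketings and labels below are a hand-made sketch (nested tuples of the form `(label, child, ...)`), not the output of any parser.

```python
# Reading 1: the looking glass is the instrument of seeing
# (the prepositional phrase attaches to the verb phrase).
SEE_WITH_GLASS = (
    "S", ("NP", "I"),
    ("VP", ("V", "see"), ("NP", "Doggy"),
     ("PP", ("P", "with"), ("NP", "the", "looking", "glass"))),
)

# Reading 2: Doggy is the one who has the looking glass
# (the prepositional phrase attaches to the noun phrase).
DOGGY_WITH_GLASS = (
    "S", ("NP", "I"),
    ("VP", ("V", "see"),
     ("NP", ("NP", "Doggy"),
      ("PP", ("P", "with"), ("NP", "the", "looking", "glass")))),
)


def leaves(tree):
    """Return the words of a tree in order, dropping the labels."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the label
        words.extend(leaves(child))
    return words


print(" ".join(leaves(SEE_WITH_GLASS)))
print(leaves(SEE_WITH_GLASS) == leaves(DOGGY_WITH_GLASS))  # True
print(SEE_WITH_GLASS == DOGGY_WITH_GLASS)                  # False
```

Both trees yield exactly the same surface string, so nothing in the text alone selects one of them. Only knowledge about the situation does.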


Ultimately, we can try to clarify the "pragmatics", that is: **"What did he want?"**

**We have to constantly be aware of the fact that language is a form of lossy compression of thought!**

<img src="http://drive.google.com/uc?export=view&id=1X5vNEmkueuunkgDrLihLB-4V6rT0MU-n" width=65%>

For further discussion see: [Time to put an end to BERTology (or, ML/DL is not even relevant to NLU)](https://medium.com/ontologik/time-to-put-an-end-to-bertology-or-ml-dl-is-not-even-relevant-to-nlu-e5ba6fc53403)



## Structure of "raw material" for NLP

Natural language has a **deep compositional structure**, which entails smaller units of meaning composing larger units, conveying more and more complex information.

This is also reflected in the **way we organize text**, the raw material for NLP.

<img src="http://narratext.com/images/narratext_story-text_structure.jpg" width=70%>

Two units are noteworthy from the above:
- A **corpus** is the collection of available documents from a given **domain**.
- A **domain** represents a form of **language usage** and a body of knowledge shared by a community.
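The nesting of text units can be sketched as nested Python containers: a corpus holds documents, which hold paragraphs, sentences and words. The two mini "recipes" and the naive splitting rules are assumptions made for illustration.

```python
# A toy corpus from a made-up "cooking" domain: document name -> raw text.
corpus = {
    "recipe_1": "Chop the onion. Fry it gently.\n\nAdd the curry paste.",
    "recipe_2": "Boil the rice.",
}


def structure(document):
    """Document -> paragraphs -> sentences -> words, using naive splitting."""
    return [
        [sentence.split() for sentence in paragraph.split(". ")]
        for paragraph in document.split("\n\n")
    ]


nested = {name: structure(text) for name, text in corpus.items()}
print(len(nested["recipe_1"]))  # 2 paragraphs
print(nested["recipe_1"][0])    # [['Chop', 'the', 'onion'], ['Fry', 'it', 'gently.']]
```

The splitting is deliberately crude (some punctuation stays glued to words), which already hints that each level of the hierarchy needs its own, more careful processing step.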

**Meaning (semantics) is strongly defined by the domain!**

- Certain meanings are **undefined** outside of the domain (e.g. anatomical jargon)
- Certain meanings are **overridden** in a domain (e.g. see below)
- Certain meanings can thus become **ambiguous** within / across domains

<img src="http://www.aerospaceweb.org/question/performance/turn/bank.gif" width=45%>

VS.

<img src="https://img.etimg.com/thumb/msid-71487585,width-1200,height-900,imgsize-169788,overlay-economictimes/photo.jpg" width=35%>
<img src="https://www.inchcalculator.com/wp-content/uploads/2015/04/angle.png" width=35%>

**Ambiguity and context dependence** are among the main problems in NLP.
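Domain dependence can be sketched with a tiny sense inventory keyed by domain and word. The domains, words and glosses below are simplified assumptions for illustration, not a real lexical resource.

```python
# Illustrative sense inventory; the glosses are simplified assumptions.
SENSES = {
    ("aviation", "bank"): "to tilt the aircraft sideways in a turn",
    ("finance", "bank"): "an institution that holds deposits and lends money",
    ("anatomy", "humerus"): "the bone of the upper arm",
}


def meaning(word, domain):
    """Look up a word's sense in a domain; None means 'undefined here'."""
    return SENSES.get((domain, word))


def domains_of(word):
    """All domains in which the word has a listed sense."""
    return sorted(d for (d, w) in SENSES if w == word)


print(meaning("bank", "aviation"))
print(meaning("bank", "finance"))
print(meaning("humerus", "finance"))  # None: undefined outside its domain
print(domains_of("bank"))             # ['aviation', 'finance']
```

Without knowing the domain, a lookup for "bank" has two candidate answers. Picking the right one is the disambiguation problem.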

## Typical tasks in NLP

### End user goals

From the user's perspective, the typical goals (s)he would like to achieve by using an NLP-enabled system are the same as in the general case for agents:
- **Information retrieval / Question answering:** User would like to gain knowledge ("What are the ingredients of chicken curry?")
- **Information organization:** User would like to organize, structure, categorize knowledge
- **Information transformation:** User would like to make the system transform knowledge from one form to the other (**machine translation**, **document summarization**) 
- **Action automation:** User would like to carry out an action or wants the system to carry it out ("Order the ingredients on Amazon, please!", "Automatically assign incoming mails to agents")
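As a hedged sketch of the first goal, information retrieval can be reduced to ranking documents by how many query words they share with the query. The three mini documents and the scoring rule are invented for illustration; real systems use much richer representations.

```python
# Hypothetical mini document collection: name -> text.
documents = {
    "curry": "chicken curry ingredients chicken onion garlic curry paste",
    "pasta": "pasta carbonara ingredients eggs bacon cheese",
    "bike": "how to repair a flat bicycle tyre",
}


def score(query, text):
    """Count how many distinct query words also occur in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query):
    """Return document names sorted from best to worst overlap score."""
    return sorted(
        documents,
        key=lambda name: score(query, documents[name]),
        reverse=True,
    )


ranking = retrieve("ingredients of chicken curry")
print(ranking)  # ['curry', 'pasta', 'bike']
```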

Though these sound obvious, when designing NLP systems it is easy to forget the "user first" design principle. It is always important to ask the question: "What does the user want to achieve **in her situation** (context) with the system?"


### Intermediate actions

To reach the above-mentioned goals, typically multiple processing steps are necessary, which are themselves complex NLP activities.

Some examples (incomplete):

<img src="https://content-static.upwork.com/blog/uploads/sites/3/2017/06/27091951/image-101.png" width=50%>


([source](https://www.upwork.com/hiring/for-clients/artificial-intelligence-and-natural-language-processing-in-big-data/))

### Vision of a pipeline

Based on this understanding of compositionality, as well as the hierarchy of tasks that can be carried out, the **most prevalent vision of NLP systems is that of a pipeline**.

<img src="https://d33wubrfki0l68.cloudfront.net/16b2ccafeefd6d547171afa23f9ac62f159e353d/48b91/pipeline-7a14d4edd18f3edfee8f34393bff2992.svg" width=65%>

As such, the pipeline carries out successive tasks that **depend on each other**, whereby one usually travels **"upwards" in the hierarchy of linguistics** (from morphemes to syntax to semantics, ...) as well as towards **larger and larger chunks of text** (from words to sentences to documents, ...).

Although this is a handy metaphor, reality is, as usual, more complicated: not infrequently, **the solutions for the sub-tasks are circular.**
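The pipeline idea can be sketched as a list of functions applied one after another to a shared document object, where each step relies on what earlier steps produced. The step names and the dictionary layout are assumptions made for this illustration and do not mirror any particular library.

```python
from functools import reduce


def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc


def lowercase(doc):
    # Depends on tokenize() having filled in doc["tokens"].
    doc["normalized"] = [t.lower() for t in doc["tokens"]]
    return doc


def count_words(doc):
    # Depends on lowercase() having filled in doc["normalized"].
    counts = {}
    for t in doc["normalized"]:
        counts[t] = counts.get(t, 0) + 1
    doc["counts"] = counts
    return doc


PIPELINE = [tokenize, lowercase, count_words]


def run(text, steps=PIPELINE):
    """Feed the document through each step in order."""
    return reduce(lambda doc, step: step(doc), steps, {"text": text})


result = run("The dog saw the other dog")
print(result["counts"])  # {'the': 2, 'dog': 2, 'saw': 1, 'other': 1}
```

Running a later step without its predecessors, for example `run("x", [lowercase])`, raises a `KeyError`, which is the dependency between stages made visible.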

Let's take for example the case of **contractions**:

<img src="http://drive.google.com/uc?export=view&id=1Nms48ml_40I6bOb91Vzj7aAriqMzTc9e" width=35%>

In this case, to resolve the contraction one would have to decide whether we are talking about _"would have"_ or _"had have"_. Although this is only a tokenization / lemmatization task, it can only be decided with knowledge of the sentence syntax.

And if we think that this is a fringe problem, well, let's think twice. In Portuguese, for example, such contractions can represent ~2% of all words. See: [Tokenization-Tagging circularity](https://pdfs.semanticscholar.org/ee95/19a48834acc972d453de0fc1c5932e8055a0.pdf)
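The circularity can be sketched with the English clitic `'d`, which may stand for "had" or "would" (an assumed example that may differ from the one in the figure). To expand it, the "tokenizer" below has to peek at the grammatical form of the next word, which is information that normally becomes available only later in the pipeline. The two word lists are a tiny hand-made stand-in for a real part-of-speech tagger.

```python
# Tiny hand-made lexicon standing in for a real part-of-speech tagger.
PAST_PARTICIPLES = {"gone", "seen", "been", "eaten"}
BASE_VERBS = {"go", "see", "be", "eat", "like"}


def expand_d(tokens):
    """Expand "'d" to 'had' or 'would' by looking at the following word."""
    expanded = []
    for i, token in enumerate(tokens):
        if token != "'d":
            expanded.append(token)
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        if following in PAST_PARTICIPLES:
            expanded.append("had")    # "She'd gone" -> "She had gone"
        elif following in BASE_VERBS:
            expanded.append("would")  # "She'd go" -> "She would go"
        else:
            expanded.append("'d")     # cannot decide without more syntax
    return expanded


print(expand_d(["She", "'d", "gone", "home"]))  # ['She', 'had', 'gone', 'home']
print(expand_d(["She", "'d", "go", "home"]))    # ['She', 'would', 'go', 'home']
print(expand_d(["She", "'d", "never", "go"]))   # ['She', "'d", 'never', 'go']
```

The third call shows the limit of the one-word lookahead: once an adverb intervenes, a fuller syntactic analysis is needed, yet that analysis itself expects properly tokenized input.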

So the pipeline is a widespread and useful, but incomplete, notion...

### Benchmarks and progress

As in all areas of machine learning, there are dedicated tasks and benchmarks that measure progress in the state of the art. These naturally **do not reflect** the full complexity and challenges of real-life projects, but can be good estimates of performance.
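Benchmarks of this kind are commonly scored with metrics such as precision, recall and F1, which compare a system's output against a hand-annotated "gold" standard. The sketch below computes them for a made-up named-entity example; the entities and labels are invented, and individual benchmarks define their own exact scoring rules.

```python
def precision_recall_f1(gold, predicted):
    """Compare two sets of annotations and return (precision, recall, F1)."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and system output: (entity, label) pairs.
gold = {("Budapest", "LOC"), ("Anna", "PER"), ("ACL", "ORG"), ("Danube", "LOC")}
predicted = {("Budapest", "LOC"), ("Anna", "ORG"), ("ACL", "ORG")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here two of the three predictions are correct (precision 2/3) and two of the four gold entities are found (recall 1/2). A single number like F1 makes systems comparable, but it hides which kinds of errors were made.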

<img src="https://www.researchgate.net/profile/Zhiyong_Lu2/publication/275772841/figure/fig3/AS:294551743942659@1447238016369/Challenges-subtasks-tracks-organized-based-on-NLP-perspectives.png" width=60%>

([source](https://www.researchgate.net/publication/275772841_Community_challenges_in_biomedical_text_mining_over_10_years_Success_failure_and_the_future))

For an estimate of progress see: [Electronic Frontier Foundation - AI Progress metrics](https://www.eff.org/ai/metrics)

To further the general progress in the field, notable "hub organizations" such as the [Association for Computational Linguistics](https://www.aclweb.org/portal/what-is-cl) organize "challenges" or "shared tasks", such as [SemEval](https://www.aclweb.org/portal/content/semeval-2019-international-workshop-semantic-evaluation), that act as a yardstick for the state of the art. These challenges have become major drivers of progress in the field.