# SS 2021 SEMINAR 08 Reinforcement Learning in der Sprachtechnologie
## Script-Based Agents

### Announcements

#### Papers

* Yannic Kilcher recently reviewed an interesting paper arguing that RL might be the only thing we need for creating AGI

* you can find the video here: https://www.youtube.com/watch?v=dmH1ZpcROMk

* you can find the paper here: https://www.sciencedirect.com/science/article/pii/S0004370221000862

#### Homework

* Wrong information in last week's notebook (sorry for that ...)
        
* DEADLINE: JUNE 16th

#### Today

* short session about script-based agents

* libraries to check out FYI

***

### A) Paper Presentation I: Jovanka

### **Title: Reinforcement learning of minimalist grammars**
**Link:** [MyPaper](https://arxiv.org/pdf/2005.00359v1.pdf)

**Summary**

The authors provide a machine learning algorithm for a cognitive agent with semantic understanding. The algorithm combines methods from computational linguistics, formal logic and abstract algebra. The syntax is based on minimalist grammar, the semantics on predicate logic. The goal of the agent is to learn a mental lexicon that encodes expert language knowledge. The acquisition process runs in a feedback loop, which makes it amenable to reinforcement learning.

    
**Problem**

* state-of-the-art semantic analysis of user inputs is based on slot-filling procedures
  * detect relevant keywords and insert them into semantic frames
  * no representation of meaning
  * no cognitive understanding of utterances
  * goal: overcome traditional slot-filling by proper cognitive information and communication technologies


* demands on cognitive user interfaces
  * processing and understanding of declarative or imperative sentences
    1. syntactic analysis
    2. semantic analysis --> semantic representation
    3. compute logical inferences
    4. respond accordingly (including generation of utterances as feedback signal)
    
*--> need for a language acquisition algorithm*

**Idea/Approach**

* machine learning algorithm for the acquisition of a minimalist grammar (MG) mental lexicon of the syntax and semantics for English declarative sentences through reinforcement learning
* simultaneous segmentation of syntax and semantics
* the approach combines methods from computational linguistics, formal logic, and abstract algebra

*minimalist grammar (MG):*
* important property: effective learnability in the sense of Gold's learning theory
* developed by Stabler: mathematical codification of Chomsky's Minimalist Program in the generative grammar framework
* consists of
  * mental lexicon storing linguistic signs as arrays of syntactic, phonetic and semantic features
    * linguistic base types: noun, verb, adjective etc.
    * selector categories
    * licensors and licensees
  * two structure-building functions: "merge" and "move"
* all syntactic information is encoded in the feature array of the mental lexicon
* syntax and compositional semantics can be combined via the lambda calculus

**Definitions**

* *utterance meaning pairs (UMP)* 
  * $u = \langle e,\sigma\rangle$, where $e \in E$ is the spoken or written utterance, given as the *exponent* of a linguistic sign and $\sigma \in \Sigma$ is the sign's *semantics* as a logical term, expressed by means of predicate logic

    example: $u = \langle \texttt{the mouse eats cheese, eat} (\texttt{cheese})(\texttt{mouse})\rangle$
    

* *linguistic signs* are ordered triples
  * $z = \langle e,t,\sigma \rangle$, where additionally $t\in T$ is a *syntactic type* and is encoded by means of MG in its chain representation
  * syntactic types control the generation of syntactic structure and the order of lambda application

    example: $z = \langle \texttt{the mouse eats cheese, :c, eat} (\texttt{cheese})(\texttt{mouse})\rangle$

  * $\texttt{:c}$ indicates that the sign is complex and a complementizer phrase of type $\texttt{c}$
    
* The compositional semantics can be described by the terms $\lambda\texttt{P}.\lambda\texttt{Q}.\texttt{P}(\texttt{Q})$ and $\lambda\texttt{P}.\lambda\texttt{Q}.\texttt{Q}(\texttt{P})$, the predicate $\texttt{eat}$ and the individuals $\texttt{cheese}$ and $\texttt{mouse}$.
  * $\lambda\texttt{P}.\lambda\texttt{Q}.\texttt{P}(\texttt{Q})(\texttt{eat})(\texttt{cheese})$
  * $\lambda\texttt{Q}.\texttt{eat}(\texttt{Q})(\texttt{cheese})$
  * $\texttt{eat}(\texttt{cheese})$
  * $\lambda\texttt{P}.\lambda\texttt{Q}.\texttt{Q}(\texttt{P})(\texttt{mouse})(\texttt{eat}(\texttt{cheese}))$
  * $\lambda\texttt{Q}.\texttt{Q}(\texttt{mouse})(\texttt{eat}(\texttt{cheese}))$
  * $\texttt{eat}(\texttt{cheese})(\texttt{mouse})$


* Given the logical term $\texttt{eat}(\texttt{cheese})(\texttt{mouse})$, you can get $\lambda\texttt{x}.\lambda\texttt{y}.\texttt{eat}(\texttt{x})(\texttt{y})$ with two successive lambda abstractions. So we have both directions: from general to special and from special to general.


* *syntactic features*
    
  * *basic types* $b\in B = \{ \texttt{n,a,v,d,...} \}$
  * *selectors* $S = \{=b|b \in B \}$, unified by *merge*
  * *licensors* $L_+ = \{+l|l \in L\}$, where $L$ is a finite set of movement identifiers
  * *licensees* $L_- = \{-l|l \in L\}$, licensors and licensees trigger move
  * *feature set* $F = B \cup S \cup L_+ \cup L_-$
  * *categories* $C=\{::, :\}$, where "::" indicates *simple, lexical* categories and ":" *complex, derived* categories
  * the ordering of syntactic features is prescribed in the lexicon as regular expressions i.e. $T=C(S\cup L_+)^*BL_-^*$
    
example: 
* $\langle\texttt{mouse, ::n, mouse} \rangle$
* $\langle\texttt{cheese, ::n -k, cheese} \rangle$
* $\langle\texttt{the, ::=n d -k, } \epsilon \rangle$
* $\langle\texttt{eat, ::=n v -f, }\lambda\texttt{x}.\lambda\texttt{y}.\texttt{eat}(\texttt{x})(\texttt{y}) \rangle$
* $\langle\texttt{-s, ::=pred +f +k t, }\epsilon\rangle$
* $\langle\epsilon\texttt{, ::=v +k =d pred, }\lambda\texttt{P}.\lambda\texttt{Q}.\texttt{eat}(\texttt{Q})(\texttt{P})\rangle$
* $\langle\epsilon\texttt{, ::=t c, }\epsilon \rangle$
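The feature ordering $T=C(S\cup L_+)^*BL_-^*$ can be checked mechanically. The following is a minimal sketch, assuming selectors are written `=b` as in the definition of $S$, and assuming small illustrative sets of base types and movement identifiers (`BASIC` and `MOVE_IDS` are hypothetical names chosen for this example, not taken from the paper).

```python
import re

# Assumed inventories for illustration only.
BASIC = ["pred", "n", "a", "v", "d", "t", "c"]   # base types B
MOVE_IDS = ["k", "f"]                            # movement identifiers L

base = "|".join(BASIC)
move = "|".join(MOVE_IDS)

# T = C (S u L+)* B L-*  with C = {"::", ":"}, S = {=b}, L+ = {+l}, L- = {-l}
TYPE_RE = re.compile(
    rf"^(::|:)\s*((=({base})|\+({move}))\s+)*({base})(\s+-({move}))*$"
)


def is_well_formed(syntactic_type: str) -> bool:
    """Return True if the feature string follows the prescribed ordering."""
    return TYPE_RE.match(syntactic_type.strip()) is not None


for t in ["::n", "::=n d -k", "::=v +k =d pred", "::d =n"]:
    print(t, is_well_formed(t))
```

A selector that appears after the base type, as in `::d =n`, is rejected because only licensees may follow the base type.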

    
* syntactic functions (p. 7): merge-1, merge-2, merge-3 and move-1, move-2
    
A minimalist derivation terminates when all syntactic features besides one distinguished *start symbol*, which is $\texttt{c}$ (complementizer phrase) in our case,  have been consumed.
    
example for derivation (bottom-up) (p. 10)
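The order of lambda application used in the definitions can be mirrored with Python closures. This is only an illustrative sketch: terms are represented as plain strings, and the names `apply_fn`, `apply_arg` and `abstracted` are made up for this example.

```python
# Curried predicate: eat(x)(y) builds the logical term as a string.
def eat(x):
    return lambda y: f"eat({x})({y})"


apply_fn = lambda P: lambda Q: P(Q)    # corresponds to lambda P. lambda Q. P(Q)
apply_arg = lambda P: lambda Q: Q(P)   # corresponds to lambda P. lambda Q. Q(P)

# Step 1: combine the predicate with its object -> eat(cheese), still a function.
verb_phrase = apply_fn(eat)("cheese")

# Step 2: feed the subject into the verb phrase -> eat(cheese)(mouse).
sentence = apply_arg("mouse")(verb_phrase)
print(sentence)

# Opposite direction: abstracting over both arguments recovers the general term.
abstracted = lambda x: lambda y: eat(x)(y)
print(abstracted("carrot")("rat"))
```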

**Utterance-Meaning Transducer (UMT)**
    
* bottom-up derivations are essential for MG
* but they are not suitable for NLP, because their computation is neither incremental nor predictive
* therefore the authors present a bidirectional utterance-meaning transducer for MG
* central object for MG language processing is derivation tree
* derivation tree with comma-separated sequence of exponents and index tuples for every node (p. 12)
* derivation = tree paths from the bottom to the top
* tree paths from the top towards the bottom allow for an interpretation in terms of *multiple context-free grammars* (MCFG)
* MCFG: categories are *n*-ary predicates over string exponents
* every branching leads to one phrase structure rule in the MCFG (p. 13)
* reversion of MG rules
* UMT with language production module and language understanding module (details pp. 13)
  * language production: get semantic representation and compute the exponent
  * language understanding: three memory tapes: input sequence, syntactic priority queue, semantic priority queue
    
**Reinforcement Learning**
* RL algorithm for learning the MG lexicon
* training algorithm that simultaneously analyzes similarities between exponents and semantic terms
* positive and negative examples to obtain a better performance through reinforcement learning
* model:    
  * cognitive agent $L$ in a state $X_t$, identified as $L$'s mental lexicon at training time $t$
  * for $t=0$ $X_0$ is initialized as $X_0 \leftarrow \emptyset$
  * $L$ is exposed to UMPs produced by a teacher $T$
  * assumption: $T$ presents already complete UMPs, not singular utterances to $L$ 
    * avoid *symbol grounding problem* of firstly assigning meaning $\sigma$ to uttered exponents $e$
  * $L$ is instructed to reproduce $T$'s utterances based on its own semantic understanding
    * provides feedback loop --> applicable for reinforcement learning
  * in each iteration, the teacher utters a UMP that should be learned by the learner

<img src="https://raw.githubusercontent.com/clause-bielefeld/SS_2021_SEMINAR_Reinforcement_Learning_in_der_Sprachtechnologie/main/materials/images/2021_05_29 19_54 Office Lens.jpg" width="500"/>

example

$t = 1$:

$u_1 = \langle \texttt{the mouse eats cheese, eat}(\texttt{cheese})(\texttt{mouse})\rangle$
* as long as $L$ is not able to detect patterns or common similarities in $T$'s UMPs, it simply adds new entries directly to its mental lexicon, assuming the UMP is complex (":") and possesses base type $\texttt{c}$
* update rule for $L$'s mental lexicon: $X_t \leftarrow X_{t-1} \cup \{\langle e_t, \texttt{:c}, \sigma_t\rangle \}$, when $u_t = \langle e_t, \sigma_t \rangle$ is the UMP presented at time $t$ by $T$
* mental lexicon $X_1 = \{\langle \texttt{the mouse eats cheese, :c, eat}(\texttt{cheese})(\texttt{mouse})\rangle \}$
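The update rule $X_t \leftarrow X_{t-1} \cup \{\langle e_t, \texttt{:c}, \sigma_t\rangle \}$ can be written down directly. A minimal sketch, assuming signs are stored as plain string triples (the function name `add_holistic_entry` is hypothetical):

```python
from typing import FrozenSet, Tuple

Sign = Tuple[str, str, str]   # (exponent, syntactic type, semantics)
UMP = Tuple[str, str]         # (exponent, semantics)


def add_holistic_entry(lexicon: FrozenSet[Sign], ump: UMP) -> FrozenSet[Sign]:
    """Store the whole UMP as one complex sign of type :c."""
    exponent, semantics = ump
    return lexicon | {(exponent, ":c", semantics)}


x0: FrozenSet[Sign] = frozenset()          # X_0 is the empty lexicon
u1 = ("the mouse eats cheese", "eat(cheese)(mouse)")
x1 = add_holistic_entry(x0, u1)
print(x1)
```

Because the lexicon is a set, presenting the same UMP twice leaves it unchanged.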

$t = 2$:

$u_2 = \langle \texttt{the rat eats cheese, eat}(\texttt{cheese})(\texttt{rat})\rangle$
* similarities between $u_1$ and $u_2$
* $L$ creates two distinct items for $\texttt{the mouse}$ and $\texttt{the rat}$
* $L$ carries out lambda abstraction to obtain the updated lexicon $X_2$
* $X_2 = \{\langle \texttt{the mouse, :d, mouse}\rangle, \langle \texttt{the rat, :d, rat}\rangle, \langle \texttt{eats cheese, :=d c, } \lambda y.\texttt{eat}(\texttt{cheese})(y)\rangle \}$
* further segmentation
* $X_{21} = \{\langle \texttt{the, ::=n d, }\epsilon \rangle, \langle \texttt{mouse, ::n, mouse}\rangle, \langle \texttt{rat, ::n, rat}\rangle, \langle \texttt{eats cheese, :=d c, } \lambda y.\texttt{eat}(\texttt{cheese})(y)\rangle \}$
* to close the reinforcement cycle, $L$ is supposed to produce utterances based on its own understanding
* assumption: $L$ wants to express the proposition $\texttt{eat}(\texttt{cheese})(\texttt{rat})$
* a correct derivation with the corresponding signs from the lexicon leads to the correct utterance $\texttt{the rat eats cheese}$
* $T$ endorses this utterance
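The similarity detection between $u_1$ and $u_2$ can be illustrated on the exponent side alone. The sketch below is a strong simplification and not the paper's algorithm: it only splits two utterances at their longest shared token suffix and ignores the simultaneous alignment of the semantic terms. The function name is hypothetical.

```python
def split_on_shared_suffix(a: str, b: str):
    """Return (rest of a, rest of b, shared suffix), compared token by token."""
    ta, tb = a.split(), b.split()
    n = 0
    # Walk backwards while the tokens at the end agree.
    while n < min(len(ta), len(tb)) and ta[-1 - n] == tb[-1 - n]:
        n += 1
    shared = " ".join(ta[len(ta) - n:])
    return " ".join(ta[:len(ta) - n]), " ".join(tb[:len(tb) - n]), shared


print(split_on_shared_suffix("the mouse eats cheese", "the rat eats cheese"))
```

The two differing parts correspond to the new `:d` items, the shared part to the item that still awaits its subject.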


$t=3$:

$u_3 = \langle \texttt{the mouse eats carrot, eat}(\texttt{carrot})(\texttt{mouse})\rangle$

* similarities with lexicon entry $\langle \texttt{eats cheese, :=d c, } \lambda y.\texttt{eat}(\texttt{cheese})(y)\rangle$
* lambda abstraction
* $X_{3} = \{\langle \texttt{the, ::=n d, }\epsilon \rangle, \langle \texttt{mouse, ::n, mouse}\rangle, \langle \texttt{rat, ::n, rat}\rangle, \langle \texttt{cheese, ::n, cheese}\rangle, \langle \texttt{carrot, ::n, carrot}\rangle, \langle \texttt{eats, ::=n =d c, } \lambda x.\lambda y.\texttt{eat}(x)(y)\rangle \}$
* $L$ produces an utterance for the novel semantic representation $\texttt{eat}(\texttt{carrot})(\texttt{rat})$
* derivation of UMP $\langle \texttt{the rat eats carrot}, \texttt{eat}(\texttt{carrot})(\texttt{rat})\rangle$
* UMP is rewarded by $T$

$t=4$:

$u_4 = \langle \texttt{the rats eat cheese, eat}(\texttt{cheese})(\texttt{rats})\rangle$

* pattern matching
* $X_{4} = \{\langle \texttt{the, ::=n d, }\epsilon \rangle, \langle \texttt{mouse, ::n, mouse}\rangle, \langle \texttt{rat, ::n, rat}\rangle, \langle \texttt{rats, ::n, rats}\rangle, \langle \texttt{cheese, ::n, cheese}\rangle, \langle \texttt{carrot, ::n, carrot}\rangle, \langle \texttt{eats, ::=n =d c, } \lambda x.\lambda y.\texttt{eat}(x)(y)\rangle, \langle \texttt{eat, ::=n =d c, } \lambda x.\lambda y.\texttt{eat}(x)(y)\rangle \}$
* new proposition $\texttt{eat}(\texttt{carrot})(\texttt{rats})$
* derivation leads to the utterance $\texttt{the rats eats carrot}$
* $T$ rejects the utterance because of a grammatical number agreement error
* $L$ has to find a suitable revision of $X_4$
* new features for number: $\texttt{a}$ and $\texttt{num}$
* new entry: plural suffix
* $X_{41} = \{\langle \texttt{the, ::=num d, }\epsilon \rangle, \langle \texttt{mouse, ::n -a, mouse}\rangle, \langle \texttt{rat, ::n -a, rat}\rangle, \langle \texttt{-s, ::=n +a num, }\epsilon \rangle, \langle \texttt{cheese, ::n, cheese}\rangle, \langle \texttt{carrot, ::n, carrot}\rangle, \langle \texttt{eats, ::=n =d c, } \lambda x.\lambda y.\texttt{eat}(x)(y)\rangle, \langle \texttt{eat, ::=n =d c, } \lambda x.\lambda y.\texttt{eat}(x)(y)\rangle \}$

and so on

**Critique**
* theoretical and abstract paper
* explanation of approach with examples
* contribution to an important problem: computational semantic representation of utterances
* language acquisition algorithm provides understanding and generation of utterances
* many points of contact to other concepts and problems: minimalist grammar, multiple context-free grammars, Gold's learning theory, symbol grounding problem, predicate logic

### A) Paper Presentation II: Toni

### Title: Reinforcement Learning for Relation Classification from Noisy Data

#### Link:
[Reinforcement Learning for Relation Classification from Noisy Data](https://arxiv.org/pdf/1808.08013.pdf)

#### Summary:
Existing relation classification methods that rely on distant supervision assume that 
a bag of sentences mentioning an entity pair are all describing a relation for the entity pair. 
Such methods, performing classification at the bag level, cannot identify the mapping between 
a relation and a sentence, and largely suffer from the noisy labeling problem. In this paper,
we propose a novel model for relation classification at the sentence level from noisy data. 


#### Problem/Task/Question:
 - Can we use reinforcement learning methods to classify relations between 2 entities from noisily labeled data?
 - NLP Problem:
     - categorize semantic relations between 2 entities given a plain text  
    
 -  Noisy labeling problem:
     - (Barack_Obama, BornIn, United_States) = (entity, relation, entity) triple
     - "Barack Obama is the 44th president of the United States" will be regarded as a positive instance
        by distant supervision for the relation BornIn
     - (53% of 100 sample bags contain no sentence that describes the relation of the entities)
        
 - previous distant supervision methods suffer from the noisy labeling problem, because they assume that every sentence mentioning the 2 entities describes their relation
 
 - instance selection problem: which sentence truly describes the relation and should be selected as a training instance?
 - relation classification problem: which semantic relation has the highest probability given a sentence and the mentioned entity pair?
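How distant supervision produces noisy labels can be shown in a few lines. This is a toy sketch: the knowledge base `KB`, the helper `distant_labels`, and the second sentence of the bag are invented for illustration, and entity matching is reduced to a substring test.

```python
# Toy knowledge base with a single relational fact.
KB = {("Barack_Obama", "United_States"): "BornIn"}


def distant_labels(sentences, head, tail, kb):
    """Label every sentence that mentions both entities with the KB relation."""
    relation = kb.get((head, tail), "NA")
    h, t = head.replace("_", " "), tail.replace("_", " ")
    return [(s, relation) for s in sentences if h in s and t in s]


bag = [
    "Barack Obama is the 44th president of the United States",  # noisy: not about birth
    "Barack Obama was born in Hawaii, United States",           # hypothetical clean example
]
labelled = distant_labels(bag, "Barack_Obama", "United_States", KB)
for sentence, relation in labelled:
    print(relation, "<-", sentence)
```

Both sentences receive the label BornIn, although only the second one expresses it. This is exactly the case the instance selector is supposed to filter.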

![alt text](https://raw.githubusercontent.com/clause-bielefeld/SS_2021_SEMINAR_Reinforcement_Learning_in_der_Sprachtechnologie/main/materials/images/sentencebaglevel.PNG "")
- bag level: a bag contains noisy sentences with the same entities (possibly not the same relation)
- 2 problems in bag level approach:
    - 1. unable to handle sentence level prediction
    - 2. sensitive to bags with all noisy sentences (no relation described)
    - decreases performance of relation classification

#### Solution/Idea/Model/Approach:


**Overall Structure:** 
- instance selector chooses sentences according to policy function
- the selected sentences are used to train a better relation classifier
- instance selector updates its parameters, with reward from relation classifier
![alt text](https://raw.githubusercontent.com/clause-bielefeld/SS_2021_SEMINAR_Reinforcement_Learning_in_der_Sprachtechnologie/main/materials/images/overallstructure.PNG "")



This model consists of 2 modules:
- 1. instance selector: 	
   - acts as agent
   - first selects high-quality sentences from a sentence bag
   - has no explicit knowledge about which sentences are labeled correctly
   - trained with reinforcement learning, because only the utility of the selected sentences as a whole can be measured
   - filters out the entire bag if all its sentences are labeled incorrectly
      
- 2. relation classifier: 
     - predicts the relation for each sentence in the cleansed data, i.e. the sentence-level probability p(r|x)
     - provide reward to instance selector

Training data: 
- text data
- widely used dataset NYT:
     - 522k sentences, 281k entity pairs, 18k relational facts + NA that says: "there is no relation"
- word embeddings trained with word2vec on the NYT corpus, plus position embeddings (relative distances from the current word to the two entities)
- Convolutional Neural Network (CNN) as relation classifier
- three-fold cross validation for tuning the model


Training: 
- we want to: 
    - optimize policy network in instance selector with policy gradient method
    - optimize the CNN component with gradient descent, minimizing the cross-entropy loss

- first pretrain CNN, then pretrain policy function by computing reward with the pretrained CNN (with frozen parameters)
- then jointly train the instance selector and relation classifier
- when training of the instance selector is done, we merge all selected sentences in each bag to obtain a cleansed dataset and train the relation classifier with it
- supervise the instance selector to maximize the average likelihood of the chosen instances


**reinforcement learning spaces:**

**state:**
 - the current sentence, the already selected sentences and the entity pair when
   making the decision on the i-th sentence of bag B
 - represented as a continuous real-valued vector

**action:**
 - action space: {0,1}
 - instance selector selects as training instance the i-th sentence of bag B (1) or not (0)
 - selects according to stochastic policy
 
**policy:**
 - normally updated after the selection on all training instances is finished (with the policy gradient theorem and the REINFORCE algorithm)
 - but we can split the training sentence instances into N bags and compute the reward for each finished bag
 - each bag corresponds to a distinct entity pair and contains a sequence of sentences with the same noisy relation label

**reward**
 - indicates the utility of the chosen sentences
 - relation classifier provides reward to instance selector
 - we only receive a delayed reward after the last sentence of the bag; the reward is zero in all other states
 - if no sentence of a bag is selected, the reward is the average likelihood of all sentences in the training data (to exclude noisy bags)
 - we aim to maximize the expected total reward
 
**transition**
 - load next sentence of bag and choose an action
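One episode over a single bag can be sketched as follows. Everything here is a simplified assumption: the policy sees only one feature per sentence (the classifier likelihood p(r|x)) instead of the full state vector, its parameters `weight` and `bias` are arbitrary, and the reward is computed from log-likelihoods.

```python
import math
import random


def select_probability(likelihood, weight=4.0, bias=-2.0):
    """Toy stochastic policy: probability of action 1 (select the sentence)."""
    return 1.0 / (1.0 + math.exp(-(weight * likelihood + bias)))


def bag_reward(selected, fallback):
    """Delayed reward at the end of a bag.

    Mean log-likelihood of the chosen sentences, or the fallback value
    (average over the training data) if nothing was selected.
    """
    if not selected:
        return fallback
    return sum(math.log(p) for p in selected) / len(selected)


def run_episode(bag, fallback, rng):
    """Visit the sentences of one bag in order; the reward arrives only at the end."""
    actions, selected = [], []
    for likelihood in bag:                      # transition: load the next sentence
        action = 1 if rng.random() < select_probability(likelihood) else 0
        actions.append(action)
        if action:
            selected.append(likelihood)
    return actions, bag_reward(selected, fallback)


actions, reward = run_episode([0.9, 0.1, 0.8], fallback=-1.5, rng=random.Random(0))
print(actions, reward)
```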



#### Results:
  ![alt text](https://raw.githubusercontent.com/clause-bielefeld/SS_2021_SEMINAR_Reinforcement_Learning_in_der_Sprachtechnologie/main/materials/images/performance.PNG "")
 - performance on sentence level classification
 - (F1 metric for classification problems, recall and precision)
 - CNN+RL has the best accuracy with 64%
 - previous methods could not filter the bags with noisy sentences
 - instance selector can exclude noisy sentences effectively (they sampled 100 deleted sentence bags
   and found that 86% of these bags consist entirely of noisy sentences)


**Different Models**

 - CNN (2014): sentence-level classification model, doesn't consider the noisy labeling problem
 - CNN+Max (2015): bag-level classification, assumes there is one sentence in the bag that describes the relation (the most correct sentence in each bag)
 - CNN+ATT (2016): bag-level model, similar to CNN+Max, but can down-weight noisy sentences because it adopts sentence-level attention


**Evaluation**

 - selected data by instance selector is better for relation classification
 ![alt text](https://raw.githubusercontent.com/clause-bielefeld/SS_2021_SEMINAR_Reinforcement_Learning_in_der_Sprachtechnologie/main/materials/images/precision.PNG "")
 - with weighting down noisy sentences
 ![alt text](https://raw.githubusercontent.com/clause-bielefeld/SS_2021_SEMINAR_Reinforcement_Learning_in_der_Sprachtechnologie/main/materials/images/precisionwithweights.PNG "")
 - accuracy of selection decision
 
 - compared reinforcement learning selection with greedy selection
 - greedy selection selects top N sentences with largest likelihood (estimated by a pre trained CNN)
 ![alt text](https://raw.githubusercontent.com/clause-bielefeld/SS_2021_SEMINAR_Reinforcement_Learning_in_der_Sprachtechnologie/main/materials/images/greedy.PNG "")
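The greedy baseline from this comparison is easy to state in code. A minimal sketch, assuming the likelihoods come from a pre-trained classifier and are simply given as a list (the function name `greedy_select` is hypothetical):

```python
def greedy_select(likelihoods, n):
    """Return the indices of the n sentences with the largest likelihood."""
    ranked = sorted(range(len(likelihoods)), key=lambda i: likelihoods[i], reverse=True)
    return sorted(ranked[:n])   # report the chosen indices in document order


print(greedy_select([0.2, 0.9, 0.5, 0.7], 2))
```

Unlike the learned selector, this rule always keeps exactly `n` sentences and therefore cannot drop an entirely noisy bag.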



#### Critical Discussion:

* **+** deals better with noisy data than merely reducing the weights of noisy sentences
* **+** classification at sentence level (relation classifier trained and tested at sentence level on cleansed data)
* **+** good structure and visualisations of the results


* **-** instance selector rejects more than just the noisy instances

***

### A) Paper Presentation III: Julia

### Reinforced Extractive Summarization with Question-Focused Rewards
##### Link:
[Reinforced Extractive Summarization with Question-Focused Rewards](https://arxiv.org/pdf/1805.10392.pdf)

##### Summary:
This paper aims at deriving extractive summaries, i.e. summaries that are made up of a subset of word sequences from the source document. As opposed to abstractive summarization, this approach is less prone to diverging from the facts in the document. The authors extract 'questions' from human-written summaries to evaluate and reinforce automatically generated summaries that retain the necessary information. In this way, they use reinforcement learning to explore the space of possible summaries.

##### Problem/Task/Question:
* Automatically extracting word sequences from a source document such that the resulting text contains the same information as the human-written summaries
* human abstracts traditionally do not align with the source on the word level, but are used as ground-truth labels for extractive summarization anyway

##### Solution/Idea/Model/Approach:
* **Design a question-oriented reward function**
    * Extract "Cloze questions" $Q_k$ from each sentence in the human-written summary by replacing **entities** or **root words**, i.e. the root of the sentence's dependency parse tree, with a placeholder token.
    * Encode the entire question sequence: Bi-LSTM$(Q_k) = q_k$
    * Given an extractive summary $Y$, use a Bi-LSTM to encode every word at position $i$ in the summary as $h_i^Y$
    * Use an attention mechanism to locate which part of the summary is relevant to the question: $\alpha_{k,i} \propto \exp(q_k W^{\alpha}h_i^Y)$
    * For every question, compute a context vector $c_k$ as the $\alpha$-weighted sum of the word encodings $h_i^Y$ to predict the correct word for the placeholder: $P(e_k|Y,Q_k)=$ softmax$(W^c c_k)$
    * The reward is defined as the log-likelihood of correctly predicting the token, averaged over all questions: $R_a(Y) = \frac{1}{K} \sum_{k=1}^{K}\log P(e_k|Y,Q_k)$
    * Other reward components: 
        - $R_s(Y)$: conciseness, by restricting the number of words
        - $R_f(Y)$: fluency, by encouraging consecutive words in the source document to be selected 
        - $R_b(Y)$: original wording, by encouraging bigram overlap
    * The final reward is a weighted sum of the reward components: 
$R(Y) = R_a(Y)+\gamma R_b(Y)+\beta R_f(Y)+\alpha R_s(Y)$
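The arithmetic of the reward can be sketched without any neural network. In the sketch below the attention scores and the probabilities of the correct tokens are simply given as numbers; in the paper they come from the Bi-LSTM encoders. All function names and example values are assumptions for illustration.

```python
import math


def attention_weights(scores):
    """Normalize raw scores q_k W h_i into weights alpha_{k,i} (softmax)."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def answer_reward(correct_token_probs):
    """R_a(Y): average log-likelihood of the correct token over all K questions."""
    return sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)


def total_reward(r_a, r_b, r_f, r_s, gamma, beta, alpha):
    """R(Y) = R_a + gamma * R_b + beta * R_f + alpha * R_s."""
    return r_a + gamma * r_b + beta * r_f + alpha * r_s


weights = attention_weights([0.1, 2.0, -1.0])
r_a = answer_reward([0.5, 0.25])
print(weights, r_a, total_reward(r_a, 0.3, 0.2, 0.1, gamma=1.0, beta=1.0, alpha=1.0))
```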

![img](https://d3i71xaburhd42.cloudfront.net/f383d0a898df9f59538a8bdeff9b44bd9055c8af/3-Figure1-1.png)

* **Extract summaries with Reinforcement Learning**
    * Seek policy $P(Y|X)$ to extract summary $Y$ for any source document $X$ such that $\mathbb{E}_{P(Y|X)}[R(Y)]$ is maximized
    * Models used for extracting summaries:
        * Bi-LSTM$(X)$ used to encode each word in the document ($h_i^X$)
        * LSTM where hidden state encodes previous sampling decisions $s_{t-1}$
        * => as concatenated input for feedforward layer with sigmoid activation to retrieve sampling decisions 
    
* **Training and Hyperparameter tuning**
    * First, the models used to compute the reward are trained on a subset of the CNN dataset (articles and summaries)
    * The CNN dataset is used to pretrain the Bi-LSTM$(X)$ and the LSTM for the sampling decisions
    * Sample multiple summaries $\hat{Y}$ by applying dropout in the extraction process
    * Use the reward of each generated summary to weight the gradients used to update the two models and the feedforward layer 
    * Hyperparameters such as $\alpha$, $\gamma$, $\beta$, the hidden state size of the model, the dropout rate and the summary length are tuned on a validation set
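Weighting gradients by the reward is the REINFORCE idea. The sketch below shows it for the sigmoid outputs of the sampling layer only: for a binary decision $y$ with logit $z$, the gradient of the log-probability with respect to $z$ is $y - \sigma(z)$. The `baseline` argument is an addition of this sketch and is not mentioned in the notes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reinforce_logit_gradient(logits, decisions, reward, baseline=0.0):
    """Reward-weighted gradient of log P(decisions) with respect to each logit.

    decisions[i] is 1 if word i was sampled into the summary, else 0.
    """
    return [(reward - baseline) * (y - sigmoid(z)) for z, y in zip(logits, decisions)]


# Two words with logit 0 (selection probability 0.5); the first one was selected.
print(reinforce_logit_gradient([0.0, 0.0], [1, 0], reward=2.0))
```

A summary with a high reward pushes the logits towards the decisions that produced it; a low or negative reward pushes them away.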
    
##### 3 Main Results/Findings:
* This approach outperforms all methods they compare to when using the ROUGE-Metric for evaluation
* There is no significant difference between using one question or using five questions, possibly because the articles are very short 
* removing sentence root words for questions is the more viable strategy

![img](https://d3i71xaburhd42.cloudfront.net/f383d0a898df9f59538a8bdeff9b44bd9055c8af/5-Table3-1.png)

![img](https://d3i71xaburhd42.cloudfront.net/f383d0a898df9f59538a8bdeff9b44bd9055c8af/4-Table2-1.png)

![img](https://d3i71xaburhd42.cloudfront.net/f383d0a898df9f59538a8bdeff9b44bd9055c8af/6-Table4-1.png)

##### Critical Discussion:
* **-** training process not explained at all
* **+** clear description of reward function
* **-** some issues with indexing
* **-** question answering accuracy seems low (?)
* **-** only one (not very convincing) qualitative result, without detailing how it was selected 
* **+** future work: qualitative analysis
* **+** comparing to many other recent works
* **-** cherry-picked comparison works. In 2018, when this was published, better ROUGE scores had already been achieved, so the claim made in the abstract ('surpassing state-of-the') is not true (https://paperswithcode.com/sota/extractive-document-summarization-on-cnn)

### Discussion



***

## B) Theory

### RL in LANGUAGE TASKS | NLP

> LANGUAGE GAMES that can be modeled by RL: 

> Translation Agent: see sentence -> action space vocab -> reward based on the appropriateness of the translation. 

> Language Learning: environment with sounds, words, images -> reward from teacher?

> Google Assistant/Personal Assistant: user asks question -> assistant gives answer -> QA-Dialogue!!!

> Information Retrieval: explanation creation -> SCORING => adaptive explanations/generations

> Human Language Learning: producing sounds, BIAS(no,..), GOALS (achieve common ground, understanding, does he/she understand me?), 

> FUTURE SESSION: language emergence, how does language emerge in an environment -> why and how do agents exchange sounds to ACHIEVE THINGS/GOALS?!!! (DeepMind group)

#### ENVIRONMENT: 
* STATE_SPACE: different possibilities to model this, one suggestion: DIALOGUE of fixed length N -> all possible sequences consisting of utterances of max_len L with max_seq_len N 
* e.g. Chat back and forth:

![alt text](https://cloud.netlifyusercontent.com/assets/344dbf88-fdf9-42bb-adb4-46f01eedd629/e8b92e3a-74f3-4403-a609-d401592c5919/topbots-chatbots-2-mitsuku-goodconversationalist-opt.png "")

* STATE: current utterance (1...N) 

* continuous state space = COMMON GROUND of the conversation of all participating agents => cross product of the agents' state graphs, agent state graph = current utterance + memory of N utterances + future n-step lookahead

![alt text](https://lh3.googleusercontent.com/proxy/5BB_NDe4kkUgNLmxo2cD5qB37qRJkGB2XVNQtJFVcpBhfs8TcYj5iKvAmdKHXhBhI4Bujyb7Vw "")

* STATE: continuous state space, common mind state/common ground of dialogue -> common ground dialogue state (memory, now, future graph cross product)

![alt text](https://www.neoformix.com/2007/StateOfUnion_2007.png "")

* OBSERVATION_SPACE: categorical: dialogue sequence, continuous: memory of dialogue utterances, current utterance, own lookahead

* OBSERVATION: categorical: single utterance, continuous: state graph

#### AGENT: 
* ACTION SPACE: [VOCABULARY]

![alt text](http://colah.github.io/posts/2015-01-Visualizing-Representations/img/wiki-pic-major.png "")

* ACTION: [WORD or SENTENCE]

![alt text](https://i.stack.imgur.com/91KjB.png "")


* REWARD FUNCTION: different possibilities to model this: 
* a) right, wrong (QA) or yes, no, true, false, (0,1) up to probabilities of agreement, disagreement (0...1)
* b) sentiment (+1 pos,-1 neg) up to probabilities of positivity, negativity, neutrality (0...1)
* c) praise and blame (you did this very well, because ...., this is not very good, because) (0..1)
* d) common sense, common ground, understanding, reflection => related to theory of mind => how much does my opponent understand me? can he follow me? are we on the same mental ground?
* e) multimodal feedback -> facial expressions, smiling, emotions
* f) ...?
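The environment and agent pieces listed above can be put together in a very small toy environment. This is only a sketch under strong assumptions: the state is the list of words uttered so far, an action is an index into the vocabulary, and the reward is variant a) in its binary form (1 for the target word, 0 otherwise). The class `ToyDialogueEnv` and its interface are hypothetical.

```python
class ToyDialogueEnv:
    """Dialogue of fixed maximum length; the agent utters one word per step."""

    def __init__(self, vocabulary, target, max_turns):
        self.vocabulary = list(vocabulary)   # ACTION SPACE
        self.target = target                 # the "right" answer for reward a)
        self.max_turns = max_turns           # maximum dialogue length N
        self.history = []

    def reset(self):
        self.history = []
        return tuple(self.history)           # STATE: dialogue so far

    def step(self, action):
        word = self.vocabulary[action]       # ACTION: one word
        self.history.append(word)
        reward = 1.0 if word == self.target else 0.0
        done = reward == 1.0 or len(self.history) >= self.max_turns
        return tuple(self.history), reward, done


env = ToyDialogueEnv(["yes", "no", "maybe"], target="yes", max_turns=2)
state = env.reset()
print(env.step(1))   # wrong word: no reward, dialogue continues
print(env.step(0))   # target word: reward 1.0, dialogue ends
```

Richer reward variants such as sentiment or praise and blame would replace the single comparison in `step` with a scoring model.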

## C) Practice

 **Script-based conversational agents**
 
 NLTK provides simple, pattern-based chatbot functionality. 
 
 It can be used for QA chatbots or chatbots that retrieve specific answers out of document corpora. 

In [1]:
# IMPORTS
from nltk.chat.util import Chat, reflections

In [2]:
# TRAIN // Hard coded // scripted responses // pattern matching
# Each entry is [regex pattern, list of possible responses]; "%1" inserts the first captured group.
# Note: in a regex, " ?" means "optional space", not a literal question mark.
pairs = [
    [r"my name is (.*)", ["Hello %1!"]],
    [r"(hi|hello|hey|holla|hola)", ["Hey there !", "Hi there !", "Hey !"]],
    [r"(.*) your name ?", ["My name is Geeky"]],
    [r"(.*) do you do ?", ["We provide a platform for tech enthusiasts, a wide range of options !"]],
    [r"(.*) created you ?", ["Geeksforgeeks created me using python and NLTK"]],
    [r"(.*) need help", ["how can i help you?"]],
]

# Usage examples:
# used for QA -> e.g. find similar phrases in a corpus using TFIDF and using these as answers
# Covid Vaccine Notifier Bots
# 

In [None]:
# TEST
chat = Chat(pairs, reflections)
chat.converse()

> whats your name ?


My name is Geeky


> what do you do?


We provide a platform for tech enthusiasts, a wide range of options !


> how do you do?


We provide a platform for tech enthusiasts, a wide range of options !
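The transcript shows that "what do you do?" and "how do you do?" get the same answer, because both match the same pattern. The matching mechanism can be reproduced with the standard library alone. The following is a simplified stand-in written for illustration, not NLTK's implementation: it always returns the single response instead of picking one at random and it skips the pronoun reflections.

```python
import re

# (pattern, response template) pairs, tried in order.
PAIRS = [
    (r"my name is (.*)", "Hello %1!"),
    (r"(.*) your name ?", "My name is Geeky"),
    (r"(.*) do you do ?", "We provide a platform for tech enthusiasts, a wide range of options !"),
]


def respond(text, pairs=PAIRS):
    """Return the response of the first pattern that matches the start of text."""
    for pattern, template in pairs:
        match = re.match(pattern, text, re.IGNORECASE)
        if match:
            # Replace %1, %2, ... with the corresponding captured group.
            return re.sub(r"%(\d)", lambda m: match.group(int(m.group(1))), template)
    return None


print(respond("my name is Ada"))
print(respond("what do you do?"))
print(respond("how do you do?"))
```

Distinguishing the two questions would require a more specific pattern placed before the general one, since the first match wins.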


#### Rasa Chatbots

<img src="https://miro.medium.com/max/2400/1*Bs0JvC6bmiwrC7we49-tjw.png" width="500"/>

Script-based conversational agents can be built using Rasa. 

Rasa extends the pattern matching approach to **DIALOGUE FLOWS** which allow guided dialogues. 

Rasa furthermore uses pre-trained ML models for better language understanding. 

Find the package here:

https://rasa.com/

https://github.com/RasaHQ/rasa

You can install it this way: 

`pip install rasa`

`pip install spacy`

`python -m spacy download en_core_web_sm`

`pip install nest_asyncio==1.3.3`


You can find a detailed explanation using **google colab** here: 

https://www.youtube.com/watch?v=G9Z6NQ3EcEw&list=PLnmxk2xwn3zHL9NwYKV2_7vJUXw8YVhZA

-----

***

# TODO's

1. Send your finished presentations (+ possibly annotated paper) by **Monday 12.00 AM/midnight** via email to henrik.voigt@uni-jena.de

2. Send your little HOMEWORK to henrik.voigt@uni-jena.de by using the naming convention: HOMEWORK_02_FIRSTNAME_LASTNAME.ipynb until **June 16th 12.00 AM/midnight**

***
