In [None]:
!pip install pandas
!pip install dtale

## Mathematically formulating Coreference Resolution:

$t$ : Current time-step or mention

$M^t$: List of mentions sorted by position

$m_t$: $t^{th}$ mention in $M$

$E^t$: List of entities/clusters sorted by position

$E_{i,k}^t$: $k^{th}$ mention of the $i^{th}$ entity in $E^t$

$L_{E^t} = L_t$: Length of $E^t$ (number of entities)

$Err_{i}^t$: List of mentions of entity $i$ on which a decision error occurred (the mention should have been mapped to $i$, but the model did not do so).

$C_{ik}^{t}$: $k^{th}$ mention (represented by its string) of the $i^{th}$ entity

$e_{m_t}$ : Entity to which $m_t$ belongs as per the ground truth

$\bar{e}_{m_t}$ : Entity to which $m_t$ is mapped by the model

$F$: Entity Ranking MLP

$T$: End time

### **Conditions:**

1. $e_{m_t} == \bar e_{m_t}$: No error
2. $e_{m_t} \neq \bar e_{m_t}$:
    1. $e_{m_t} == L_t+1$ and $\bar e_{m_t} \leq L_t$
        1. **Name:** o_c error
        2. **Ground truth:** Create a new entity
        3. **Action:** Map to an existing entity
    2. $e_{m_t} \leq L_t$ and $\bar e_{m_t} == L_t+1$
        1. **Name:** c_o error
        2. **Ground truth:** Map to an existing entity
        3. **Action:** Create a new entity
    3. $e_{m_t} \leq L_t$  and $\bar e_{m_t} \leq L_t$
        1. **Name:** c_c error
        2. **Ground truth:** Map to entity $e_{m_{t}}$
        3. **Action:** Map to entity **$\bar e_{m_t}$**
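
The conditions above reduce to a small decision function. The sketch below is only an illustration: the function name `classify_decision` and its argument names are hypothetical, and it assumes entities are 1-indexed with index $L_t+1$ standing for "create a new entity", as in the notation above.

```python
def classify_decision(true_entity: int, mapped_entity: int, num_entities: int) -> str:
    """Return 'no_error', 'o_c', 'c_o' or 'c_c' for a single mention.

    true_entity   -- e_{m_t}, the ground-truth entity index (1-indexed)
    mapped_entity -- \\bar e_{m_t}, the entity index chosen by the model
    num_entities  -- L_t, the number of entities existing at time t
    Index num_entities + 1 means "create a new entity" (assumption from the notation).
    """
    new_entity = num_entities + 1
    for name, idx in (("true_entity", true_entity), ("mapped_entity", mapped_entity)):
        if not 1 <= idx <= new_entity:
            raise ValueError(f"{name}={idx} is outside 1..{new_entity}")

    if true_entity == mapped_entity:
        return "no_error"
    if true_entity == new_entity:
        # Ground truth: create a new entity; model mapped to an existing one.
        return "o_c"
    if mapped_entity == new_entity:
        # Ground truth: map to an existing entity; model created a new one.
        return "c_o"
    # Both indices refer to existing entities, but they differ.
    return "c_c"


print(classify_decision(true_entity=6, mapped_entity=2, num_entities=5))
```

Since the two indices differ in every error branch, the three error cases are mutually exclusive and need no further checks.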


### Recorded Data:

**Intuitive:**

- **doc_key:** ID of the document
- **decision_type:** `o` or `c`
- **no_error:** `0` if there is an error, else `1`
- **detailed_error_type:** `o_c` (the `o` error type), or `c_o` / `c_c` (the `c` error types)
- **mention_ind:** Index of mention in $M$
- **entity_ind:** Index of the entity in $E^T = E$
- **entity_len:** $L_t$
- **category:** [`PROP`, `PRON`, `NOM`]
- **antecedent_ind:** $E_{e_{m_t},-1}$
- **mapped_antecedent_ind:** $E_{\bar e_{m_t},-1}$
- **antecedent_category:** Of mention $m_t$: $category(C_{e_{m_t},-1})$
- **mapped_antecedent_category:** Of mention  $m_t$ mapped to entity  $\bar{e}_{m_t}$: $category(C_{\bar{e}_{m_t},-1})$
- **type:** [ `PER`, `FAC`, `LOC`, `GPE`, `VEH`, `ORG` ]
- **mapped_antecedent_type:** Of mention $m_t$ mapped to entity $\bar{e}_{m_t}$: $type(C_{\bar{e}_{m_t},-1})$
- **mention_str**: String representation of a mention
- **cluster_str:** For mention $m_t$,
    
     $cluster\_str(m_t,t)$ = {$k\_count(C_{e_{m_t}}^t,k)$ for $k$ in $set(C_{e_{m_t}}^t)$}, i.e. the concatenation of each unique mention string in the cluster with its number of occurrences up to time $t$.
    
    - Example: {mr. villars_1 my good sir_1 you_9 your_2 dear sir_1}: there are 9 instances of "you" in the cluster.
- **mapped_cluster_str:** Let mention be $m_t$ mapped to entity $\bar e_{m_t}$, $mapped\_cluster\_str(m_t)$ = { $k\_count(C_{\bar e_{m_t}}^t,k)$ for $k$ in $set(C_{\bar e_{m_t}}^t)$}
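
A minimal sketch of how a string like the `cluster_str` example could be produced from the list of mention strings $C_{e_{m_t}}^t$. The helper name `cluster_str`, the space separator, and the first-occurrence ordering are assumptions made for illustration; the notes only fix the `string_count` form of each item.

```python
from collections import Counter


def cluster_str(mention_strings):
    """Join each unique mention string with its occurrence count, e.g. 'you_9'.

    Counter is a dict subclass, so unique strings keep first-occurrence order
    (Python 3.7+). The ordering and the separator are assumptions.
    """
    counts = Counter(mention_strings)
    return " ".join(f"{text}_{count}" for text, count in counts.items())


# Mentions of one cluster seen up to time t (toy data matching the example above).
mentions = ["mr. villars", "my good sir"] + ["you"] * 9 + ["your"] * 2 + ["dear sir"]
print(cluster_str(mentions))
# mr. villars_1 my good sir_1 you_9 your_2 dear sir_1
```

The same helper applied to $C_{\bar e_{m_t}}^t$ would give `mapped_cluster_str`.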

**Calculated for mention $m_t$:**  

- **antecedent_dist** : $t - E_{e_{m_{t}},-1}^t -1$
- **mapped_antecedant_dist** : $t - E_{\bar e_{m_{t}},-1}^t -1$
- **antecedent_dist_ent**: Number of unique entities between the antecedent and the current index.
- **mapped_antecedent_dist_ent**: Number of unique entities between the mapped antecedent and the current index.
- **cluster_size**: For mention $m_t$ : $|E_{e_{m_t}}^t|$
- **mapped_cluster_size**: For mention $m_t$  : $|E_{\bar e_{m_t}}^t|$
- **cluster_errors**:  For mention $m_t$ : $|Err_{e_{m_t}}^t|$
- **mapped_cluster_err**: For mention $m_t$ : $|Err_{\bar e_{m_t}}^t|$
- **{mapped/cluster}_props:** Fraction of proper nouns in **{mapped/cluster}** Cluster
- **{mapped/cluster}_prons:** Fraction of pronouns in **{mapped/cluster}** Cluster
- **{mapped/cluster}_noms:** Fraction of nominals in **{mapped/cluster}** Cluster
- **entity_score:** For mention $m_t$ : $F(m_t,e_{m_t})$
- **wrong_score:** For mention $m_t$ : $F(m_t,\bar e_{m_t})$
- **num_wrong_inst:** $\sum_{k = 1}^{L_t+1} \mathbb{1}(F(m_t, e_k) > F(m_t, e_{m_t}))$, where $F(m_{t}, e_{L_t+1}) = 0$
- **wrong_mean_score**: $\frac{\sum_{k = 1}^{L_t+1}F(m_t, e_k) \cdot \mathbb{1}(F(m_t, e_k) > F(m_t, e_{m_t}))}{num\_wrong\_inst}$
- **entropy:** $-\sum_{k=1}^{L_t+1} P_k \cdot \log(P_k)$  where $P_k$  is the probability of assigning $m_t$ to entity $k$
- **cluster_final_size:** $|E_{e_{m_t}}^T|$
- **mapped_cluster_final_size**: $|E_{\bar e_{m_t}}^T|$
- **[LingMess](https://arxiv.org/pdf/2205.12644.pdf)** adapted to Entity Space:
    
    ```python
    # Maps each pronoun to a class id; two pronouns are in the same class
    # when their ids are equal.
    PRONOUNS_GROUPS = {
        'i': 0, 'me': 0, 'my': 0, 'mine': 0, 'myself': 0, 'i.': 0,
        'you': 1, 'your': 1, 'yours': 1, 'yourself': 1, 'yourselves': 1, 'thou': 1,
        'he': 2, 'him': 2, 'his': 2, 'himself': 2,
        'she': 3, 'her': 3, 'hers': 3, 'herself': 3,
        'it': 4, 'its': 4, 'itself': 4,
        'we': 5, 'us': 5, 'our': 5, 'ours': 5, 'ourselves': 5,
        'they': 6, 'them': 6, 'their': 6, 'themselves': 6,
        'that': 7, 'this': 7,
    }
    ```
    
    **cluster:** For mention $m_t$: $p = e_{m_t}$
    
    **mapped:** For mention $m_t$: $p = \bar e_{m_t}$
    
    **Entity adapted value:** **{mapped/cluster}_{$x$}** = $\frac{\sum_{k=1}^{|C_p|}f_x(C_{pk}, m_t)}{|C_p|}$
    
    1. $x$ = **pron_pron_c**
        - $f_x$ = Returns 1 if both pronouns belong to the same class, otherwise 0
    2. $x$ = **pron_pron_nc**
        - $f_x$ = Returns 1 if the pronouns belong to different classes, otherwise 0
    3. $x$ = **ent_pron**
        - $f_x$ = Returns 1 if it is an entity-pronoun interaction, otherwise 0
    4. $x$ = **match**
        - $f_x$ = Returns 1 if the mentions match completely, otherwise 0
    5. $x$ = **contains**
        - $f_x$ = Returns 1 if one mention contains the other, otherwise 0
    6. $x$ = **other**
        - $f_x$ = Returns 1 if all other $f$s return 0, otherwise 0
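
The entity-adapted LingMess values can be sketched as follows: each pair (cluster mention, current mention) is assigned exactly one of the six categories, and the feature is the fraction of the cluster falling in that category. The names `pair_category` and `entity_adapted_features` are hypothetical, and the lower-casing and token-level "contains" test are simplifying assumptions rather than the exact rules used to build the table.

```python
from collections import Counter

PRONOUNS_GROUPS = {
    'i': 0, 'me': 0, 'my': 0, 'mine': 0, 'myself': 0, 'i.': 0,
    'you': 1, 'your': 1, 'yours': 1, 'yourself': 1, 'yourselves': 1, 'thou': 1,
    'he': 2, 'him': 2, 'his': 2, 'himself': 2,
    'she': 3, 'her': 3, 'hers': 3, 'herself': 3,
    'it': 4, 'its': 4, 'itself': 4,
    'we': 5, 'us': 5, 'our': 5, 'ours': 5, 'ourselves': 5,
    'they': 6, 'them': 6, 'their': 6, 'themselves': 6,
    'that': 7, 'this': 7,
}
CATEGORIES = ("pron_pron_c", "pron_pron_nc", "ent_pron", "match", "contains", "other")


def pair_category(cluster_mention: str, mention: str) -> str:
    """Assign one category x to a (cluster mention, current mention) pair."""
    a, b = cluster_mention.lower().strip(), mention.lower().strip()
    a_pron, b_pron = a in PRONOUNS_GROUPS, b in PRONOUNS_GROUPS
    if a_pron and b_pron:
        same = PRONOUNS_GROUPS[a] == PRONOUNS_GROUPS[b]
        return "pron_pron_c" if same else "pron_pron_nc"
    if a_pron or b_pron:
        return "ent_pron"
    if a == b:
        return "match"
    # Assumption: "contains" is tested on whitespace tokens, not raw substrings.
    a_tok, b_tok = set(a.split()), set(b.split())
    if a_tok <= b_tok or b_tok <= a_tok:
        return "contains"
    return "other"


def entity_adapted_features(cluster_mentions, mention):
    """Fraction of cluster mentions C_p whose pair with m_t falls in each category."""
    if not cluster_mentions:
        return {x: 0.0 for x in CATEGORIES}
    counts = Counter(pair_category(c, mention) for c in cluster_mentions)
    return {x: counts[x] / len(cluster_mentions) for x in CATEGORIES}


features = entity_adapted_features(["mr. villars", "you", "you", "your"], "you")
print(features["pron_pron_c"], features["ent_pron"])
# 0.75 0.25
```

Because every pair receives exactly one category, the six fractions for a non-empty cluster sum to 1.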
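
The score-based columns (`entity_score`, `num_wrong_inst`, `wrong_mean_score`, `entropy`) can be computed from the ranking scores of a single mention. The sketch below assumes the scores $F(m_t, e_k)$ for the $L_t$ existing entities are given as a plain list, that $P_k$ is a softmax over all $L_t+1$ scores (the notes do not say how $P_k$ is obtained), and that `wrong_mean_score` is NaN when no entity outscores the true one. The function name `score_stats` is hypothetical.

```python
import math


def score_stats(scores, true_entity):
    """Score-based statistics for one mention m_t.

    scores      -- F(m_t, e_k) for k = 1..L_t (existing entities only)
    true_entity -- e_{m_t}, 1-indexed; L_t + 1 means "new entity"
    """
    # The new-entity option has score 0 by definition: F(m_t, e_{L_t+1}) = 0.
    all_scores = list(scores) + [0.0]
    true_score = all_scores[true_entity - 1]

    # Entities that the ranker scored strictly higher than the true entity.
    wrong = [s for s in all_scores if s > true_score]
    num_wrong_inst = len(wrong)
    # Assumption: undefined (NaN) when nothing outranks the true entity.
    wrong_mean_score = sum(wrong) / num_wrong_inst if wrong else float("nan")

    # Assumption: P_k is a softmax over all L_t + 1 scores (max-shifted for stability).
    top = max(all_scores)
    exps = [math.exp(s - top) for s in all_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    return {
        "entity_score": true_score,
        "num_wrong_inst": num_wrong_inst,
        "wrong_mean_score": wrong_mean_score,
        "entropy": entropy,
    }


stats = score_stats([2.0, -1.0, 3.0], true_entity=1)
print(stats["num_wrong_inst"], stats["wrong_mean_score"])
# 1 3.0
```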

In [1]:
import pandas as pd
import dtale

In [2]:
# Load the recorded decision table.
decision_df = pd.read_csv('decision_table.csv')
# Open the table in the D-Tale browser UI.
d = dtale.show(decision_df, subprocess=False, host='localhost')

2023-09-08 15:56:15,559 - INFO     - D-Tale started at: http://localhost:40001
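
Besides browsing in D-Tale, the recorded columns can be summarised directly in pandas. The sketch below uses a small hand-made frame instead of `decision_table.csv`; the rows are invented for illustration, and it assumes `detailed_error_type` is empty for rows without an error, which the notes do not specify.

```python
import pandas as pd

# Toy stand-in for decision_table.csv (values are made up for illustration).
toy_df = pd.DataFrame({
    "doc_key": ["d1"] * 5,
    "decision_type": ["o", "c", "c", "o", "c"],
    "no_error": [1, 0, 1, 0, 0],  # 0 = error, 1 = correct decision
    "detailed_error_type": [None, "c_o", None, "o_c", "c_c"],
})

# Since no_error is 1 for correct decisions, its mean is the accuracy per decision type.
accuracy_by_type = toy_df.groupby("decision_type")["no_error"].mean()

# Count how often each detailed error type occurs among the erroneous rows.
error_counts = toy_df.loc[toy_df["no_error"] == 0, "detailed_error_type"].value_counts()

print(accuracy_by_type)
print(error_counts)
```

The same two expressions should apply to `decision_df` as long as the column names match those listed under Recorded Data.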
