## DDB-Tagger Usage Example

---

This notebook shows how to use the DDB Tagger inside a Jupyter notebook. Inside a notebook, the tagger can be used to (1) tag a string of text or (2) tag all `.txt` files in a directory. When tagging a string of text (1), the results are returned as a DataFrame and nothing is saved automatically. This also means that no information about the disambiguation process is returned or saved. When tagging the files in a directory (2), all output is saved in the specified output directory, which should be created beforehand. For each file, this output contains the tagging results and information about the disambiguation process.

### 0 Loading Tagger 
When loading the tagger, the following parameters can be set:
- `dict` (str, optional): Path to the semantic dictionary. Defaults to `"dict/dict.pkl"`.
- `da_model` (str, optional): Danish language model to use, either `"spacy"` or `"dacy"`. Defaults to `"spacy"`.

```python
# Load the tagger with the spaCy Danish language model
from src.DDB_tagger import DDB_tagger

Tagger = DDB_tagger(da_model="spacy")
```

### 1 Tagging A String of Text Input

A string of text can be tagged with the `tag_text` method, which takes the following parameters:
- `input` (str): String of input text, or path to an input file.
- `input_file` (bool, optional): Whether `input` is a path to an input file (True) or a string of text (False). Defaults to False.
- `only_top3_results` (bool, optional): Whether the results contain only the top 3 tags per token (True) or all tags (False). Defaults to True.
- `only_tagged_results` (bool, optional): Whether the results contain only the tags (True) or the tags together with their scores (False). Defaults to False.
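Judging from the two example outputs in this notebook, the two flags appear to shape the result columns as follows: the first three candidate tags of each token fill `DDB1` to `DDB3`, empty cells are shown as `-`, and with `only_top3_results=False` any further candidates are collected in a single `DDB4+` cell. The sketch below mimics that layout for a single token. `layout_candidates` is a hypothetical helper, not part of the tagger, and it keeps the candidates in the order given, since in the scored output the columns are not sorted by score.

```python
# Hypothetical sketch of the DDB1-DDB3 / DDB4+ layout seen in the example
# outputs; this is not the tagger's own implementation.
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (DDB tag, score), e.g. ("06|017|Straks", 1.0)
PLACEHOLDER = "-"  # empty cells are shown as "-" in the example outputs


def layout_candidates(
    candidates: List[Candidate],
    only_top3_results: bool = True,
    only_tagged_results: bool = False,
) -> Dict[str, object]:
    """Spread one token's candidates over DDB1, DDB2, DDB3 (and DDB4+)."""
    # Drop the scores when only the tags are wanted; keep the given order
    items = [tag for tag, _ in candidates] if only_tagged_results else list(candidates)
    row: Dict[str, object] = {}
    for i in range(3):
        row[f"DDB{i + 1}"] = items[i] if i < len(items) else PLACEHOLDER
    if not only_top3_results:
        # All remaining candidates share a single DDB4+ cell
        rest = items[3:]
        row["DDB4+"] = rest if rest else PLACEHOLDER
    return row


# A token with a single candidate, tags only
print(layout_candidates([("21|025|Politi", 1.0)], only_tagged_results=True))
# → {'DDB1': '21|025|Politi', 'DDB2': '-', 'DDB3': '-'}
```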

```python
# Path to the example input file
file_path = "in/2020-09-15.politiken.txt"

# Read the file into a string; the with-block closes the file automatically
with open(file_path, "r", encoding="utf-8") as f:
    text = f.read()

# Display the text
text
```

'Da demonstrationerne mod politivold og racisme eksploderede overalt i USA først på sommeren, greb Donald Trump straks chancen for at gøre præsidentvalget til et spørgsmål om lov og orden. Siden har han tordnet mod demokratiske borgmestre i uroplagede byer som Chicago, Portland og Seattle – hvis han da ikke bare har tweetet ordene ’LOV OG ORDEN’ med versaler.\n\nDesværre for Trump har strategien ikke virket. Hans demokratiske modkandidat Joe Biden fører stadig solidt i meningsmålingerne. Trumps republikanske kernevælgere har godt nok taget imod lov og orden-budskabet – i en grad, så hardcore Trump-tilhængere går til modangreb mod demonstranter i gaderne. Men flest amerikanere svarer rent faktisk i undersøgelser, at de mener, Joe Biden er bedre til at skabe lov og orden end Trump. I en måling i det politiske medie The Hill viser 54 procent af vælgerne for eksempel mest tillid til Biden i spørgsmålet.\n\nJoe Biden fører i øjeblikket over Donald Trump med cirka 7 procentpoint i et vægtet 

```python
# Keep only the top 3 tags per token and omit the scores; returns a DataFrame
output = Tagger.tag_text(input=text, input_file=False, only_top3_results=True, only_tagged_results=True)
output.head()
```

| | ORIGINAL_IDX | TOKEN | POS | DDB1 | DDB2 | DDB3 |
|---|---|---|---|---|---|---|
| 0 | 0 | Da | KONJ | 11\|013\|Begrundelse | 06\|017\|Straks | - |
| 1 | 1 | demonstrationerne | NOUN | 18\|017\|Protest, oprør | 11\|019\|Mening, holdning | 09\|061\|Hjælp |
| 2 | 2 | mod | ADP | 08\|023\|Styre mod, retning | 05\|024\|Modsætning | 15\|039\|Modstand |
| 3 | 3 | politivold | NOUN | 21\|025\|Politi | - | - |
| 4 | 4 | og | KONJ | 04\|028\|Tilføje | - | - |
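Each tag in these columns consists of three `|`-separated fields, for example `11|013|Begrundelse`. When post-processing the results it can help to split a tag into its parts. The sketch below assumes, as the labels suggest, that the first two fields number a section and a group of the semantic dictionary and the third is a human-readable label that may itself contain commas (`Protest, oprør`). `DDBTag` and `parse_ddb_tag` are hypothetical names, not part of the tagger.

```python
# Hypothetical helper: split a tag such as "11|013|Begrundelse" into its parts.
# Assumption: fields are (section number, group number, label).
from typing import NamedTuple, Optional


class DDBTag(NamedTuple):
    section: str  # first field, e.g. "11" (kept as a string to preserve zero padding)
    group: str    # second field, e.g. "013"
    label: str    # human-readable label; may contain commas


def parse_ddb_tag(cell: str) -> Optional[DDBTag]:
    """Return the parts of a tag, or None for the "-" placeholder."""
    cell = cell.strip()
    if cell == "-":
        return None
    # Split on the first two pipes only, so the label stays intact
    section, group, label = cell.split("|", 2)
    return DDBTag(section, group, label)


print(parse_ddb_tag("18|017|Protest, oprør"))
# → DDBTag(section='18', group='017', label='Protest, oprør')
```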


```python
# Keep all tags per token together with their scores; returns a DataFrame
output = Tagger.tag_text(input=text, input_file=False, only_top3_results=False, only_tagged_results=False)
output.head()
```

| | ORIGINAL_IDX | TOKEN | POS | DDB1 | DDB2 | DDB3 | DDB4+ |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Da | KONJ | (11\|013\|Begrundelse, 0.9999725591350639) | (06\|017\|Straks, 1.0) | - | - |
| 1 | 1 | demonstrationerne | NOUN | (18\|017\|Protest, oprør, 0.9999722553616514) | (11\|019\|Mening, holdning, 0.9999725598880443) | (09\|061\|Hjælp, 0.99997933500031) | - |
| 2 | 2 | mod | ADP | (08\|023\|Styre mod, retning, 0.999949847033452) | (05\|024\|Modsætning, 0.9999509395084138) | (15\|039\|Modstand, 0.9999791818465702) | [(09\|062\|Ombytte, 0.9999793354273434), (03\|007... |
| 3 | 3 | politivold | NOUN | (21\|025\|Politi, 1.0) | - | - | - |
| 4 | 4 | og | KONJ | (04\|028\|Tilføje, 0.9999154155212518) | - | - | - |
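With `only_tagged_results=False`, each `DDB1` to `DDB3` cell pairs a tag with a score. If the results are handled as text, for instance after saving them to CSV, a cell such as `(18|017|Protest, oprør, 0.9999722553616514)` has to be split back into tag and score; because labels can contain commas, the score is taken from the last comma. The hypothetical helper `parse_scored_cell` below assumes cells are written exactly in the form shown above. If the DataFrame already holds Python tuples in memory, no parsing is needed. It does not handle the list in the `DDB4+` column.

```python
# Hypothetical helper: recover (tag, score) from a cell rendered as
# "(18|017|Protest, oprør, 0.9999722553616514)".
from typing import Optional, Tuple


def parse_scored_cell(cell: str) -> Optional[Tuple[str, float]]:
    """Parse one DDB1-DDB3 cell; return None for the "-" placeholder."""
    cell = cell.strip()
    if cell == "-":
        return None
    if not (cell.startswith("(") and cell.endswith(")")):
        raise ValueError(f"Unexpected cell format: {cell!r}")
    inner = cell[1:-1]
    # Labels may contain commas, so split on the LAST comma only
    tag, score = inner.rsplit(",", 1)
    return tag.strip(), float(score)


print(parse_scored_cell("(21|025|Politi, 1.0)"))
# → ('21|025|Politi', 1.0)
```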


### 2 Tagging All Files in A Directory

All `.txt` files in a directory can be tagged with the `tag_directory` method, which takes the following parameters:
- `input_directory` (str, optional): Input directory containing the `.txt` files. Defaults to `"in/"`.
- `output_directory` (str, optional): Output directory in which the results are saved. Defaults to `"out/"`.
- `only_top3_results` (bool, optional): Whether the results contain only the top 3 tags per token (True) or all tags (False). Defaults to True.
- `only_tagged_results` (bool, optional): Whether the results contain only the tags (True) or the tags together with their scores (False). Defaults to False.
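Because `tag_directory` only picks up `.txt` files, it can be worth checking what the input directory contains before starting a longer run. The hypothetical helper `list_txt_files` below uses only `pathlib` and assumes that the tagger looks only at files directly inside the input directory, not in subdirectories; the notebook does not state this.

```python
# Hypothetical pre-flight check before calling Tagger.tag_directory(...):
# list the .txt files that the tagger is expected to pick up.
from pathlib import Path
from typing import List


def list_txt_files(input_directory: str = "in/") -> List[Path]:
    """Return the .txt files directly inside input_directory, sorted by name."""
    in_dir = Path(input_directory)
    if not in_dir.is_dir():
        raise FileNotFoundError(f"Input directory not found: {in_dir}")
    # glob also matches directories named "*.txt", so keep regular files only
    return sorted(p for p in in_dir.glob("*.txt") if p.is_file())


# Usage: print(f"Found {len(list_txt_files('in/'))} .txt files")
```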

```python
import os

# The output directory must exist before tagging, so create it if needed
os.makedirs("out/", exist_ok=True)
# Tag all .txt files in "in/"; the results are saved in "out/"
Tagger.tag_directory(input_directory="in/", output_directory="out/", only_tagged_results=False)
```

```
[INFO] Found 1 files, starting tagging...
100%|██████████| 1/1 [00:03<00:00,  3.29s/it]
[INFO] ...done! Results saved in /Users/nicoledwenger/Documents/CHCAA/DDB_Tagger/src/../out/scores_tagged_2020-09-15.politiken.csv
[INFO] 1 files tagged in 3.37 seconds!
```
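The saved results (here `scores_tagged_2020-09-15.politiken.csv`) can be inspected without loading the tagger again. The sketch below counts how often each section number, the first field of a tag, occurs in the `DDB1` column. It assumes, without having checked the tagger's source, that the CSV is UTF-8 encoded, has a header row with a `DDB1` column, and writes cells like those in the DataFrames above, either `21|025|Politi` or `(21|025|Politi, 1.0)`, with `-` for tokens without a tag. `count_ddb1_sections` is a hypothetical helper.

```python
# Hypothetical sketch: summarise a CSV written by tag_directory(...).
# Assumes a header row with a "DDB1" column, as in the DataFrames above.
import csv
from collections import Counter


def count_ddb1_sections(csv_path: str) -> Counter:
    """Count the section numbers (first tag field) in the DDB1 column."""
    counts: Counter = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cell = row["DDB1"].strip()
            if cell == "-":
                continue  # token without a tag
            # Works for "21|025|Politi" and "(21|025|Politi, 1.0)" alike
            section = cell.lstrip("(").split("|", 1)[0]
            counts[section] += 1
    return counts


# Usage: count_ddb1_sections("out/scores_tagged_2020-09-15.politiken.csv").most_common(5)
```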