# Extraction of Relevant Action Knowledge from the Web

Before starting this tutorial, make sure the necessary packages are installed (see requirements.txt).
Throughout the tutorial, we will import a .csv file that was created with an external tool ([WikiHow Analysis Tool](https://github.com/Food-Ninja/WikiHow-Instruction-Extraction)). Due to the time constraints of the tutorial, and to keep the interactivity as high as possible, we will not look into the inner workings or usage of this tool but will only use its extracted results.

In general, the extraction of knowledge about different and relevant actions consists of three main steps:

1. Setting the central verb (e.g. 'cut') & providing an example sentence
2. Extracting synonyms and hyponyms from WordNet & VerbNet
3. Filtering the extracted words on their relevance using a recipe and a WikiHow corpus
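
The three steps above can be sketched end to end with toy data before turning to the real resources. This is only an illustration: `TOY_LEXICON`, `TOY_CORPUS_COUNTS`, `collect_candidates` and `filter_by_relevance` are hypothetical stand-ins for WordNet/VerbNet and the recipe/WikiHow counts, and the numbers are made up.

```python
# Hypothetical stand-in for WordNet & VerbNet: related verbs per target verb.
TOY_LEXICON = {
    "cut": ["cut", "slice", "chop", "carve", "mow", "slice"],
}

# Hypothetical stand-in for corpus statistics: occurrences per verb (made-up numbers).
TOY_CORPUS_COUNTS = {"cut": 950, "slice": 420, "chop": 310, "carve": 40, "mow": 0}


def collect_candidates(target, lexicon):
    """Step 2: gather the candidate verbs for the target, without duplicates."""
    return sorted(set(lexicon.get(target, [])))


def filter_by_relevance(candidates, counts, min_count):
    """Step 3: keep only candidates that occur at least min_count times."""
    return [verb for verb in candidates if counts.get(verb, 0) >= min_count]


# Step 1: set the central verb.
target_action = "cut"

candidates = collect_candidates(target_action, TOY_LEXICON)
relevant = filter_by_relevance(candidates, TOY_CORPUS_COUNTS, min_count=100)

print(f"candidates: {candidates}")
print(f"relevant:   {relevant}")
```

The cells below follow the same shape, with the lexicon lookup replaced by WordNet and VerbNet queries and the counts replaced by the extracted .csv file.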

In [None]:
# imports
import pandas as pd

# download wordnet & verbnet corpus
import nltk
nltk.download('wordnet')
nltk.download('verbnet')
from nltk.corpus import verbnet, wordnet

### Extracting Synonyms and Hyponyms from WordNet & VerbNet

In [None]:
# setting the target action
target_action = "cut"
verbs = []

# iterating over all WordNet synsets containing the verb and ...
synsets = wordnet.synsets(target_action, pos=wordnet.VERB)
print(f"{len(synsets)} synsets found for '{target_action}'")
for syn in synsets:
    # ... gathering all synonyms & direct hyponyms
    verbs.extend(syn.lemma_names())
    for h in syn.hyponyms():
        verbs.extend(h.lemma_names())

    # ... getting the associated VerbNet classes
    # (the trailing '::' of the WordNet sense key is stripped for the VerbNet lookup)
    key = str(syn.lemmas()[0].key()).replace("::", "")
    vn_classes = verbnet.classids(wordnetid=key)
    for vn_class in vn_classes:
        verbs.extend(verbnet.lemmas(vn_class))

# removing duplicates and printing results
verbs = set(verbs)
print(f"{len(verbs)} synonyms or hyponyms found for '{target_action}'")
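
The VerbNet lookup in the cell above depends on a small string conversion that is easy to overlook. A WordNet sense key has the shape `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`; for verbs the last two fields are empty, so the key ends in `::`. The sketch below shows that conversion in isolation, assuming VerbNet stores the same key without the trailing `::`. The helpers `split_sense_key` and `to_verbnet_id` are hypothetical and not part of NLTK, and the key is only an example of the format.

```python
def split_sense_key(sense_key):
    """Split a WordNet sense key into its named fields."""
    lemma, rest = sense_key.split("%", 1)
    ss_type, lex_filenum, lex_id, head_word, head_id = rest.split(":")
    return {
        "lemma": lemma,
        "ss_type": ss_type,          # 2 marks a verb
        "lex_filenum": lex_filenum,
        "lex_id": lex_id,
        "head_word": head_word,      # empty for verbs
        "head_id": head_id,          # empty for verbs
    }


def to_verbnet_id(sense_key):
    """Drop the empty head fields, mirroring the .replace('::', '') call above."""
    return sense_key.replace("::", "")


example_key = "cut%2:35:00::"  # illustrative key in WordNet's format
fields = split_sense_key(example_key)
verbnet_id = to_verbnet_id(example_key)

print(fields)
print(verbnet_id)
```

Note that the cell above builds the key from `syn.lemmas()[0]`, the first lemma of each synset, which is not necessarily the target verb itself.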

In [None]:
# pre-process the found synonyms and hyponyms:
# keep only the first word of multi-word entries (e.g. 'cut_up' -> 'cut'), then sort
filtered_verbs = sorted({v.split('_')[0] for v in verbs})

print(f"{len(filtered_verbs)} remaining words:")
for verb in filtered_verbs:
    print(verb)
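
To make the effect of this pre-processing step concrete, here is a small sketch on a hand-picked set of lemma names (the set itself is an assumption, not actual WordNet output). It compares keeping only the first token, as done above, with a possible alternative that keeps the whole phrase and replaces underscores with spaces.

```python
# Hand-picked lemma names in WordNet's underscore notation (illustrative only).
toy_verbs = {"cut", "cut_up", "chop_up", "slice", "slice_up", "make_out"}

# Variant used in this tutorial: keep only the first token of each entry.
first_tokens = sorted({v.split("_")[0] for v in toy_verbs})

# Possible alternative: keep multi-word verbs intact, written with spaces.
spaced = sorted(v.replace("_", " ") for v in toy_verbs)

print(first_tokens)
print(spaced)
```

Keeping the first token merges variants such as `cut` and `cut_up` into one entry, which suits a corpus lookup by single verbs. The price is that phrasal verbs lose their particle, so `make_out` is reduced to `make`.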

### Filtering the Extracted Verbs

In [None]:
# read the (extracted) occurrence data
v_occurrences = "./verb_occurrences.csv"
voc_dat = pd.read_csv(v_occurrences)

# remove all verbs with 0 occurrences
most_used = voc_dat[(voc_dat['SUM'] > 0)]
print(f"{len(most_used)} verbs that occur at least once")

# remove all verbs with too few available sentences (Step Desc < threshold)
thresh = 100
most_used = most_used[most_used['Step Desc'] >= thresh]
print(f"{len(most_used)} verbs that occur in at least {thresh} step descriptions:")
print(most_used['Verb'].to_string(index=False))
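
If the extracted .csv file is not at hand, the two filters can be tried on an in-memory table instead. The sketch below assumes only the three columns used in the cell above (`Verb`, `SUM`, `Step Desc`); the real file may contain more columns, and all rows here are made up.

```python
import io

import pandas as pd

# Made-up occurrence data with the three columns the filter relies on.
toy_csv = io.StringIO(
    "Verb,SUM,Step Desc\n"
    "cut,1500,900\n"
    "slice,400,150\n"
    "carve,60,20\n"
    "hew,0,0\n"
)
toy = pd.read_csv(toy_csv)

# Filter 1: drop verbs that never occur in the corpus.
occurring = toy[toy["SUM"] > 0]

# Filter 2: keep verbs found in at least `thresh` step descriptions.
thresh = 100
frequent = occurring[occurring["Step Desc"] >= thresh]

print(f"{len(occurring)} verbs occur at least once")
print(f"{len(frequent)} verbs occur in at least {thresh} step descriptions")
print(frequent["Verb"].tolist())
```

Raising or lowering `thresh` trades coverage against the number of sentences available per verb, so the value of 100 is best treated as a tunable setting.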