Artificial Argument Corpus

Code for generating the synthetic argumentative texts in the paper "Critical Thinking for Language Models", accepted for IWCS 2021.

G Betz, C Voigt and K Richardson: Critical Thinking for Language Models, IWCS 2021

This repository does not contain the actual datasets used in the paper; these can be downloaded here.

The trained models are released via Hugging Face's model hub.

Pipeline

Step 1

In step 1, a formal argument scheme is chosen from the given config file (conf_syllogistic_corpus-01.json or conf_syllogistic_corpus-02.json), such as

{
    "id": "mb1",
    "base_scheme_group": "Modus barbara",
    "scheme_variant": "negation_variant",
    "scheme": [
        ["(x): ¬${A}x -> ${B}x", {"A": "${F}", "B": "${G}"}],
        ["¬${A}${a}", {"A": "${F}", "a": "${a}"}],
        ["${A}${a}", {"A": "${G}", "a": "${a}"}]
    ],
    "predicate-placeholders": ["F","G"],
    "entity-placeholders": ["a"]
}
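As a rough illustration of how such a scheme entry can be read, the sketch below applies each formula's placeholder mapping with Python's `string.Template`. The helper name `instantiate_formulas` is hypothetical and not taken from aacorpus.py; the scheme is copied from the example above.

```python
from string import Template

# Scheme copied from the example above; in the repository it is read from the config file.
scheme = [
    ["(x): ¬${A}x -> ${B}x", {"A": "${F}", "B": "${G}"}],
    ["¬${A}${a}", {"A": "${F}", "a": "${a}"}],
    ["${A}${a}", {"A": "${G}", "a": "${a}"}],
]


def instantiate_formulas(scheme):
    """Hypothetical helper: apply each formula's placeholder mapping."""
    return [Template(formula).substitute(mapping) for formula, mapping in scheme]


# The last formula is the conclusion, the preceding ones are the premises.
formulas = instantiate_formulas(scheme)
for formula in formulas:
    print(formula)
```

The substituted values still contain `${F}`, `${G}` and `${a}`, which are only filled in later with domain-specific terms.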

Step 2

Next, each symbolic formula in the selected inference is replaced with a natural-language sentence scheme, e.g.

{
    "scheme": [
        "If someone is not a ${F}, then they are a ${G}.",
        "${a} isn't a ${F}",
        "${a} is a ${G}"
    ],
    "predicate-placeholders": ["F","G"],
    "entity-placeholders": ["a"]
}

The appropriate translations of symbolic formulas into natural-language sentence schemes are retrieved from the config file, e.g.:

{
    "(x): ¬${A}x -> ${B}x" : [
        "Whoever is not a ${A} is a ${B}. ",
        "Nobody is neither a ${A} nor a ${B}. "
    ]
}
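A minimal sketch of this translation step, assuming that one of the available translations is picked at random (the random choice and the helper name `translate` are assumptions, not confirmed by the repository code):

```python
import random
from string import Template

# Translation table copied from the example above.
translations = {
    "(x): ¬${A}x -> ${B}x": [
        "Whoever is not a ${A} is a ${B}. ",
        "Nobody is neither a ${A} nor a ${B}. ",
    ]
}


def translate(formula, mapping, rng):
    """Hypothetical helper: pick one translation and rename its placeholders."""
    sentence_scheme = rng.choice(translations[formula])
    # Map the generic placeholders (A, B) onto the scheme's own (F, G).
    return Template(sentence_scheme).substitute(mapping).strip()


rng = random.Random(0)  # seeded only to make the sketch reproducible
sentence = translate("(x): ¬${A}x -> ${B}x", {"A": "${F}", "B": "${G}"}, rng)
print(sentence)
```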

Step 3

In step 3, appropriate substitutions for the schemes' placeholders are retrieved from the config file, such as

{
    "F": "supporter of FC Liverpool",
    "G": "fan of Tottenham Hotspurs",
    "a": "Mila"
}

Substitutes for predicate placeholders are actually generated from a binary predicate (x is a supporter of y) and an object term (FC Liverpool).
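The following sketch shows how such substitutions could be built and applied. The representation of a binary predicate as a template with a `${y}` slot and the helper name `make_predicate` are assumptions made for illustration; the config files may encode this differently.

```python
from string import Template


def make_predicate(binary_predicate, object_term):
    """Hypothetical helper: fill the object slot of a binary predicate."""
    return Template(binary_predicate).substitute(y=object_term)


# Substitutions as in the example above, built from binary predicates and object terms.
substitution = {
    "F": make_predicate("supporter of ${y}", "FC Liverpool"),
    "G": make_predicate("fan of ${y}", "Tottenham Hotspurs"),
    "a": "Mila",
}

# Sentence schemes copied from the step 2 example.
sentence_schemes = [
    "If someone is not a ${F}, then they are a ${G}.",
    "${a} isn't a ${F}",
    "${a} is a ${G}",
]

argument = [Template(s).substitute(substitution) for s in sentence_schemes]
for sentence in argument:
    print(sentence)
```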

The config files (conf_syllogistic_corpus-01.json and conf_syllogistic_corpus-02.json) contain five domains used for training and testing (female_relatives, male_relatives, football_fans, consumers_personalcare and chemical_ingredients) and two domains used for testing only (dinos and philosophers).

Each domain used for training provides at least several hundred (complex) predicates.

Step 4

The premises are mixed.
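A minimal sketch of this step, assuming that "mixed" means the order of the premises is randomly permuted while the conclusion is kept separate (the helper name `mix_premises` is hypothetical):

```python
import random


def mix_premises(premises, conclusion, rng):
    """Hypothetical helper: return the premises in random order, conclusion untouched."""
    shuffled = list(premises)  # copy, so the input list is not modified
    rng.shuffle(shuffled)
    return shuffled, conclusion


premises = [
    "If someone is not a supporter of FC Liverpool, then they are a fan of Tottenham Hotspurs.",
    "Mila isn't a supporter of FC Liverpool",
]
conclusion = "Mila is a fan of Tottenham Hotspurs"

mixed, kept_conclusion = mix_premises(premises, conclusion, random.Random(0))
```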

Step 5

Finally, the argument (premise-conclusion list) is rendered as a text paragraph by framing the argument and prepending premise and conclusion indicators to the corresponding sentences. These items are also retrieved from the config file.
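The rendering can be sketched roughly as follows. The framing and indicator phrases below are invented placeholders (the repository reads its own from the config file), and `render_argument` is a hypothetical name; the sketch does not adjust the casing of a sentence that follows an indicator.

```python
import random

# Hypothetical phrases for illustration only; the real ones come from the config file.
FRAMINGS = ["Consider the following argument:"]
PREMISE_INDICATORS = ["To begin with,", "Moreover,"]
CONCLUSION_INDICATORS = ["Therefore,", "It follows that"]


def ensure_period(sentence):
    """Strip surrounding whitespace and make sure the sentence ends with a period."""
    sentence = sentence.strip()
    return sentence if sentence.endswith(".") else sentence + "."


def render_argument(premises, conclusion, rng):
    """Hypothetical helper: frame the argument and prepend indicators."""
    parts = [rng.choice(FRAMINGS)]
    for premise in premises:
        parts.append(f"{rng.choice(PREMISE_INDICATORS)} {ensure_period(premise)}")
    parts.append(f"{rng.choice(CONCLUSION_INDICATORS)} {ensure_period(conclusion)}")
    return " ".join(parts)


paragraph = render_argument(
    [
        "Mila isn't a supporter of FC Liverpool",
        "If someone is not a supporter of FC Liverpool, then they are a fan of Tottenham Hotspurs.",
    ],
    "Mila is a fan of Tottenham Hotspurs",
    random.Random(0),
)
print(paragraph)
```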

Requirements

create_trainfiles.py requires, as is, that spgutenberg and Reuters trc2 be available.
