HADES

This is a work-in-progress repository for the CLiPS HAte speech DEtection System (HADES).

The repository currently contains the supplementary materials for the paper "A Dictionary-based Approach to Racism Detection in Dutch Social Media", presented at the TA-COS workshop at LREC 2016.

license

The dictionaries in this repository are available under a CC BY-SA 4.0 License. If you use the dictionaries in your work, please cite:

@inproceedings{tulkens2016a,
  title={A Dictionary-based Approach to Racism Detection in {Dutch} Social Media},
  author={Tulkens, St\'{e}phan and Hilte, Lisa and Lodewyckx, Elise and Verhoeven, Ben and Daelemans, Walter},
  booktitle={Proceedings of the LREC 2016 Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS)},
  year={2016},
  organization={European Language Resources Association (ELRA)}
}

We later expanded the TA-COS submission into a journal paper, which was published in the Computational Linguistics in the Netherlands (CLIN) Journal.

If you use the dictionary expansion techniques from this paper, please also consider citing it:

@article{tulkens2016automated,
  title={The automated detection of racist discourse in {Dutch} social media},
  author={Tulkens, St{\'e}phan and Hilte, Lisa and Lodewyckx, Elise and Verhoeven, Ben and Daelemans, Walter},
  journal={Computational Linguistics in the Netherlands Journal},
  volume={6},
  number={1},
  pages={3--20},
  year={2016}
}

usage

The dictionaries are in .csv format. The first word on each line is the category name; the remaining words on that line belong to that category. The repository includes a Python script (compatible with 2.7 and 3.x) that reads the dictionaries and outputs relative frequencies. It can also be used with similar dictionaries, such as the LIWC dictionaries.
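To make the file layout and the notion of relative frequency concrete, here is a minimal sketch in Python 3. It is not the code in dictfeaturizer.py: the helper names `load_dictionary` and `relative_frequencies` are hypothetical, the comma delimiter is an assumption, and matching is exact and case-insensitive, which may differ from how the real script handles entries.

```python
import csv
import io
from collections import Counter


def load_dictionary(handle):
    """Read rows shaped like: category,word1,word2,... (assumed comma-delimited)."""
    categories = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        name, words = row[0], row[1:]
        categories[name] = {w.strip().lower() for w in words if w.strip()}
    return categories


def relative_frequencies(tokens, categories):
    """Return, per category, the fraction of tokens found in that category."""
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token.lower() in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {name: counts[name] / total for name in categories}


# A two-line stand-in for a dictionary file (hypothetical content).
sample = io.StringIO("good,good,splendid\nbad,bad,useless\n")
categories = load_dictionary(sample)
scores = relative_frequencies("This stuff is splendid".split(), categories)
```

Here one of the four tokens matches the "good" category and none match "bad", so `scores` maps "good" to 0.25 and "bad" to 0.0.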

example

from dictfeaturizer import DictFeaturizer

# Load from csv
d = DictFeaturizer.load("expanded.csv")
text = "this is an example text".split()
score = d.transform(text)

# Direct initialization
direct = {"good": ["good", "splendid"], "bad": ["bad", "useless"]}
d = DictFeaturizer(direct, relative=False)
text = "This stuff is splendid".split()
score_2 = d.transform(text)
