Neural Media Bias Detection Using Distant Supervision With BABE - Bias Annotations By Experts

The repository contains the data files and scripts corresponding to the paper "Neural Media Bias Detection Using Distant Supervision With BABE".

The models are hosted on Zenodo at https://zenodo.org/record/5861846#.YeUoolkxkvE because of GitHub's file size restrictions.

Description of the data files in the /data folder (all files are also provided in CSV format)

"raw_labels_MBIC.xlsx": individual labels of MBIC's crowdsourced annotators.
"raw_labels_SG1.xlsx": individual annotator labels of SG1 (8 expert annotators).
"raw_labels_SG2.xlsx": individual annotator labels of SG2 (5 expert annotators).
"final_labels_MBIC.xlsx": MBIC's aggregated labels over all annotators based on majority vote (1700 sentences).
"final_labels_SG1.xlsx": SG1's aggregated labels over all annotators based on majority vote (same 1700 sentences as in MBIC).
"final_labels_SG2.xlsx": SG2's aggregated labels over all annotators based on majority vote (3700 sentences).
"silver-standart-dataset.xlsx": silver standard dataset containing 1000 additional unlabeled sentences with potentially biased text instances.

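The final_labels_* files are described as majority-vote aggregations of the raw_labels_* files. A minimal sketch of such an aggregation step follows. It assumes that each raw row holds one annotator's label for one sentence under columns named "text" and "label_bias" (the actual raw-file layout may differ), and it assumes that a tie between the two most frequent labels yields no label. The function names majority_vote and aggregate are hypothetical and are not taken from the repository's notebooks.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_vote(labels: List[str]) -> Optional[str]:
    """Return the most frequent label, or None if the top two labels tie (assumed tie rule)."""
    if not labels:
        return None
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    return top_label


def aggregate(rows: List[Dict[str, str]], text_col: str = "text",
              label_col: str = "label_bias") -> Dict[str, Optional[str]]:
    """Group raw annotator rows by sentence and reduce each group with majority_vote."""
    votes: Dict[str, List[str]] = {}
    for row in rows:
        votes.setdefault(row[text_col], []).append(row[label_col])
    return {sentence: majority_vote(labels) for sentence, labels in votes.items()}


# Usage on made-up rows (hypothetical sentences "A" and "B").
raw = [
    {"text": "A", "label_bias": "Biased"},
    {"text": "A", "label_bias": "Biased"},
    {"text": "A", "label_bias": "Non-biased"},
    {"text": "B", "label_bias": "Biased"},
    {"text": "B", "label_bias": "Non-biased"},
]
print(aggregate(raw))  # {'A': 'Biased', 'B': None}
```

With an odd number of annotators and two label values a tie cannot occur. The None branch only matters for even-sized groups, or for label sets with more than two values.
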
Columns:

"text": sentences extracted from news articles and labeled in terms of bias and opinion.
"news_link": URL of the news article from which the sentence was extracted.
"outlet": news platform that published the article.
"topic": news topic.
"type": political orientation of the news platform according to mediacloud.org.
"label_bias": bias label for the sentence ("Biased" or "Non-biased").
"label_opinion": opinion label for the sentence ("Expresses writer's opinion", "Somewhat factual but also opinionated", or "Entirely factual").
"biased_words": words marked as biased by the annotators.

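As an illustration of how the columns above might be consumed, here is a sketch that reads a CSV version of a final_labels_* file with Python's standard csv module. It maps "label_bias" to 1/0 and parses "biased_words" into a list. Several details are assumptions rather than documented facts: the ";" default delimiter (check the file header), the list-literal format of "biased_words" (such as "['word']"), and the integer encoding. Any label_bias value other than the two documented ones is mapped to None.

```python
import ast
import csv
import io
from typing import Dict, List, Optional, TextIO

# Assumed integer encoding of the two documented label_bias values.
BIAS_TO_INT = {"Biased": 1, "Non-biased": 0}


def parse_biased_words(cell: str) -> List[str]:
    """Parse a biased_words cell, assuming a list literal such as "['word']"."""
    cell = cell.strip()
    if not cell:
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        # Assumed fallback: treat the cell as a comma-separated string.
        return [w.strip() for w in cell.split(",") if w.strip()]
    if isinstance(value, (list, tuple)):
        return [str(w) for w in value]
    return [str(value)]


def load_sentences(handle: TextIO, delimiter: str = ";") -> List[Dict]:
    """Read a final_labels_* CSV; the ';' default delimiter is an assumption."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        label: Optional[int] = BIAS_TO_INT.get(row["label_bias"])  # None for other values
        rows.append({
            "text": row["text"],
            "label": label,
            "biased_words": parse_biased_words(row.get("biased_words") or ""),
        })
    return rows


# Usage on a made-up two-row sample in the column layout described above.
sample = io.StringIO(
    "text;label_bias;biased_words\n"
    "The senator's reckless plan;Biased;['reckless']\n"
    "The bill passed on Tuesday;Non-biased;[]\n"
)
rows = load_sentences(sample)
print(rows[0])
```

ast.literal_eval is used instead of eval so that a malformed cell can only raise an exception and never execute code.
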
Additional data used for training purposes

"bias_word_lexicon.xlsx": dictionary of biased words used to craft features
"dt_final_SG1.xlsx": final SG1 with engineered features
"dt_final_SG2.xlsx": final SG2 with engineered features
"news_headlines_usa_biased.csv": data set with distant labels of class biased
"news_headlines_usa_neutral.csv": data set with distant labels of class neutral

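The two news_headlines_usa_* files carry distant labels, meaning that every headline inherits its class from the file it comes from. The sketch below shows how one might combine them into a single labeled, shuffled training set before pre-training. It assumes the headline column is named "text" (hypothetical) and uses 1 = biased, 0 = neutral. The actual preprocessing in distant_supervision.ipynb (deduplication, balancing, tokenization) is not shown and may differ.

```python
import csv
import io
import random
from typing import Iterable, List, TextIO, Tuple


def read_headlines(handle: TextIO, text_col: str = "text") -> List[str]:
    """Read non-empty headlines from a CSV; the column name is an assumption."""
    return [row[text_col] for row in csv.DictReader(handle) if row.get(text_col)]


def build_distant_dataset(biased: Iterable[str], neutral: Iterable[str],
                          seed: int = 0) -> List[Tuple[str, int]]:
    """Label headlines by their source file (1 = biased, 0 = neutral) and shuffle them."""
    data = [(h, 1) for h in biased] + [(h, 0) for h in neutral]
    random.Random(seed).shuffle(data)  # fixed seed for a reproducible order
    return data


# Usage with made-up stand-ins for the two headline files.
biased_csv = io.StringIO("text\nHeadline one\nHeadline two\n")
neutral_csv = io.StringIO("text\nHeadline three\n")
dataset = build_distant_dataset(read_headlines(biased_csv), read_headlines(neutral_csv))
print(sorted(dataset))
```

Shuffling with a local random.Random instance keeps the order reproducible without touching the global random state.
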
Description of scripts

"data_set_evaluation.ipynb": script containing relevant code and results for the evaluation of the data sets (agreement calculations, label distribution...).
"features_engineering.ipynb": engineering features for the baseline classifier
"classification_baseline_model.ipynb": training and evaluation of the baseline classifier
"classification.ipynb": training and evaluation of neural language models
"distant_supervision.ipynb": pre-training on the data set with distant labels

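features_engineering.ipynb crafts baseline features using, among other inputs, bias_word_lexicon.xlsx. The exact feature set is not listed here. The following is only a hypothetical example of one lexicon-based feature: the count and ratio of sentence tokens that appear in the lexicon. The tokenizer is a deliberately simple assumption (lowercase ASCII letters and apostrophes, so hyphenated words are split).

```python
import re
from typing import Dict, Iterable

TOKEN_RE = re.compile(r"[a-z']+")


def lexicon_features(sentence: str, lexicon: Iterable[str]) -> Dict[str, float]:
    """Count lexicon hits in a sentence (hypothetical feature, not the notebook's exact set)."""
    words = {w.lower() for w in lexicon}
    tokens = TOKEN_RE.findall(sentence.lower())
    hits = sum(1 for t in tokens if t in words)
    return {
        "lexicon_hits": float(hits),
        "lexicon_ratio": hits / len(tokens) if tokens else 0.0,
    }


# Usage with a made-up two-word lexicon.
print(lexicon_features("The senator's reckless, radical plan", {"reckless", "radical"}))
# {'lexicon_hits': 2.0, 'lexicon_ratio': 0.4}
```

Normalizing by sentence length keeps the ratio comparable across short and long sentences, while the raw count preserves absolute signal.
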
Other files

"topics_keywords_platforms.txt": a file listing all news topics, the keywords used to retrieve relevant news articles, and the news platforms used to create the data set.
"annotator_demographics.csv": a file containing demographic information about the annotators (the corresponding demographic questionnaire is provided in "demographic_questionnaire.pdf").

Cite as

@InProceedings{Spinde2021f,
    title = "Neural Media Bias Detection Using Distant Supervision With {BABE} - Bias Annotations By Experts",
    author = "Spinde, Timo  and
      Plank, Manuel  and
      Krieger, Jan-David  and
      Ruas, Terry  and
      Gipp, Bela  and
      Aizawa, Akiko",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.101",
    doi = "10.18653/v1/2021.findings-emnlp.101",
    pages = "1166--1177",
}

More about our work can be found here: https://media-bias-research.org/
