# 2. EDA of the Hugging Face Dataset

Logical Fallacy Classification Dataset by MidhunKanadan

If you use this dataset, please cite the Hugging Face repository link:

@dataset{MidhunKanadan_logical_fallacy_classification,
  title={Logical Fallacy Classification},
  author={Midhun Kanadan},
  year={2024},
  howpublished={\url{https://huggingface.co/datasets/MidhunKanadan/logical-fallacy-classification}},
}

In [173]:
# Requires the Hugging Face `datasets` package (pip install datasets)
from datasets import load_dataset
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [174]:
# Load the dataset
dataset = load_dataset("MidhunKanadan/logical-fallacy-classification")

In [175]:
# Access the splits
train_data = dataset["train"]
dev_data = dataset["dev"]
test_data = dataset["test"]

In [176]:
# Convert to pandas dataframe
train_data = train_data.to_pandas()
test_data = test_data.to_pandas()
dev_data = dev_data.to_pandas()

In [177]:
# Concatenate the three splits into one dataframe
# (each split keeps its own index, so index values repeat in df)
df = pd.concat([train_data, test_data, dev_data])

# Remove last column
df = df.iloc[:, :-1]
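
Because `pd.concat` keeps each split's own index, index values repeat in the combined frame. A minimal sketch of one alternative, shown on toy data rather than the real splits: `ignore_index=True` produces a unique index, and an extra `split` column (a hypothetical name, not part of the dataset) records where each row came from.

```python
import pandas as pd

# Toy stand-ins for the real splits (assumed data, not taken from the dataset)
toy_train = pd.DataFrame({"statement": ["a", "b"], "label": ["x", "y"]})
toy_test = pd.DataFrame({"statement": ["c"], "label": ["x"]})

# Tag each row with its split of origin; "split" is a hypothetical column name
parts = [toy_train.assign(split="train"), toy_test.assign(split="test")]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each split's index
combined = pd.concat(parts, ignore_index=True)
print(combined)
```

Keeping the split of origin would also make it possible to check later whether duplicated rows come from within one split or from overlap between splits.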

In [178]:
df

Unnamed: 0,statement,label
0,"The book is popular because it's good, and it's good because it's popular.",circular reasoning
1,"This policy is effective because it's popular, and it's popular because it's effective.",circular reasoning
2,"I know that our TV advertisements are more effective than radio. The numbers show that we hit twice the audience with TV, and our focus groups remember the TV commercial 38 percent more than the radio slot.",faulty generalization
3,"President Trump , who in the past has called global warming a hoax , has vowed to pull the United States out of the accord as soon as that becomes possible , in 2020 . The 2017 National Climate Assessment , released in November , concluded what it has for nearly three decades : Human-made climate change is real , and the impacts have already started .",intentional
4,A commercial shows a group of friends all hanging out wearing clothes from a popular clothing company. This is an example of,ad populum
...,...,...
865,"If you're not early, you're late.",false dilemma
866,Open-minded readers should have very little difficulty dismissing the mythical global warming crisis after examining the top 10 assertions in the alarmist playbook .,appeal to emotion
867,"In school science , Shorten would have learned carbon dioxide is the food of life and without this natural gas , which occurs in space and all planets , there would be no life .",fallacy of relevance
868,"Using new salad dressing, my plants produce more fruit. The dressing helps my plants.",false causality


In [179]:
df.describe().T

Unnamed: 0,count,unique,top,freq
statement,5801,4745,"Access to cleaner water has increased ; mortality from ' Extreme Weather Events ' has declined by 99 per cent since the 1920s ; fewer people are dying from heat ; death rates from climate-sensitive diseases like malaria and diarrhoea have decreased ( since 1900 malaria death rates have declined 96 per cent ) ; hunger rates have declined ; poverty has declined ( GDP per capita has quadrupled since 1950 even as CO2 levels have sextupled ) ; life expectancy has more than doubled since the start of industrialisation ; health adjusted life expectancy has increased ; global inequality has decreased in terms of incomes , life expectancies and access to modern-day amenities ; the earth is green and more productive ; habitat lost to agriculture has peaked due to fossil fuel dependent technologies .",13
label,5801,13,faulty generalization,726


In [180]:
# Count missing (NaN) values per column
df.isnull().sum()

statement    0
label        0
dtype: int64

In [181]:
# Count fully duplicated rows (same statement and same label)
nr_duplicates = df.duplicated().sum()
print("There are", nr_duplicates, "duplicates in the dataset")

# Remove duplicates
df = df.drop_duplicates()
print("Duplicates removed")

There are 854 duplicates in the dataset
Duplicates removed


In [182]:
# Count statements that still occur more than once.
# Exact duplicate rows are already gone, so these repeats carry different labels.
nr_duplicates_statement = df.duplicated(subset=["statement"]).sum()
print("There are", nr_duplicates_statement, "duplicated statements in the dataset")

# Note: the duplicated statements are not dropped yet!

There are 202 duplicated statements in the dataset
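
Since exact duplicate rows were dropped in the previous cell and only `statement` and `label` remain, every statement that is still duplicated must appear with at least two different labels. A minimal sketch for listing such statements together with their labels, shown on toy data (the frame `toy` and its contents are made up for illustration):

```python
import pandas as pd

# Toy data (assumed): statement "s1" appears with two different labels
toy = pd.DataFrame({
    "statement": ["s1", "s1", "s2", "s3"],
    "label": ["ad hominem", "intentional", "false dilemma", "equivocation"],
})

# keep=False marks every occurrence of a repeated statement, not only the later ones
conflicts = toy[toy.duplicated(subset=["statement"], keep=False)]

# Collect the distinct labels attached to each repeated statement
labels_per_statement = conflicts.groupby("statement")["label"].apply(
    lambda s: sorted(s.unique())
)
print(labels_per_statement)
```

Applied to `df`, the same pattern would show which label pairs co-occur most often, which may help decide whether to drop, merge, or keep the conflicting rows.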


In [183]:
# Inspect the rows labelled "intentional"
intentional_rows = df[df["label"] == "intentional"]
intentional_rows

Unnamed: 0,statement,label
3,"President Trump , who in the past has called global warming a hoax , has vowed to pull the United States out of the accord as soon as that becomes possible , in 2020 . The 2017 National Climate Assessment , released in November , concluded what it has for nearly three decades : Human-made climate change is real , and the impacts have already started .",intentional
19,"Just like Dr Mann ' s ' hockey stick ' graph he had cut off the tree-ring data just at the point where it stopped showing an upward trend and swapped in thermometer temperatures for recent decades , making them look much warmer .",intentional
20,Area burned by wildfire increasing ,intentional
32,But don ' t blame climate change on humans . There are bigger forces at work here .,intentional
39,"With such relatively clean air throughout America , how can even reputable news agencies like Reuters continue spreading the well-worn lie that the United States is one of the "" biggest polluters  in the world ? Rather than follow the time-tested practice used by the World Health Organization , which measures levels of disease-causing pollutants that get into people ' s lungs , some have played a shell game , swapping a new measure of "" pollution  based solely on emissions of carbon dioxide .",intentional
...,...,...
829,A harrowing scenario analysis of how human civilization might collapse in coming decades due to climate change has been endorsed by a former Australian defense chief and senior royal navy commander .,intentional
830,"He now says : "" Anyone who tries to predict more than five to 10 years is a bit of an idiot , because so many things can change unexpectedly. ",intentional
842,"But by studying a very short time interval , it is possible to sidestep most of the complications , like "" isostatic adjustment  of the shoreline ( as continents rise after the overlying ice has melted ) and "" subsidence  of the shoreline ( as ground water and minerals are extracted ) .",intentional
852,The musician's addiction influences the melancholy in their songs.,intentional


In [184]:
df.duplicated(subset=["statement"]).sum()

202

In [185]:
# Count statements per label, sorted alphabetically by label
labels = df['label'].value_counts().sort_index()
print(labels)

label
ad hominem                505
ad populum                420
appeal to emotion         302
circular reasoning        249
equivocation              195
fallacy of credibility    264
fallacy of extension      317
fallacy of logic          280
fallacy of relevance      392
false causality           461
false dilemma             313
faulty generalization     664
intentional               585
Name: count, dtype: int64
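
The counts above can be turned into relative shares to make the class imbalance easier to read. A minimal sketch, with the counts copied by hand from the output above instead of being recomputed from `df`; the names `counts`, `shares`, and `imbalance_ratio` are illustrative only.

```python
import pandas as pd

# Label counts copied from the value_counts() output above
counts = pd.Series({
    "ad hominem": 505,
    "ad populum": 420,
    "appeal to emotion": 302,
    "circular reasoning": 249,
    "equivocation": 195,
    "fallacy of credibility": 264,
    "fallacy of extension": 317,
    "fallacy of logic": 280,
    "fallacy of relevance": 392,
    "false causality": 461,
    "false dilemma": 313,
    "faulty generalization": 664,
    "intentional": 585,
})

# Share of each label in the deduplicated dataset, largest first
shares = (counts / counts.sum()).sort_values(ascending=False)
print(shares.round(3))

# Ratio between the most and the least frequent label
imbalance_ratio = counts.max() / counts.min()
print(f"Largest class is {imbalance_ratio:.1f}x the smallest")
```

With these counts, the largest class (faulty generalization, 664) is roughly 3.4 times the size of the smallest (equivocation, 195), which is worth keeping in mind when choosing evaluation metrics.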


In [186]:
# Show full text in all cells
pd.set_option('display.max_colwidth', None)
df

Unnamed: 0,statement,label
0,"The book is popular because it's good, and it's good because it's popular.",circular reasoning
1,"This policy is effective because it's popular, and it's popular because it's effective.",circular reasoning
2,"I know that our TV advertisements are more effective than radio. The numbers show that we hit twice the audience with TV, and our focus groups remember the TV commercial 38 percent more than the radio slot.",faulty generalization
3,"President Trump , who in the past has called global warming a hoax , has vowed to pull the United States out of the accord as soon as that becomes possible , in 2020 . The 2017 National Climate Assessment , released in November , concluded what it has for nearly three decades : Human-made climate change is real , and the impacts have already started .",intentional
4,A commercial shows a group of friends all hanging out wearing clothes from a popular clothing company. This is an example of,ad populum
...,...,...
864,"The noble forefathers who created this great country did not fail; it is a great country now as it was then, by their very making.",ad hominem
865,"If you're not early, you're late.",false dilemma
866,Open-minded readers should have very little difficulty dismissing the mythical global warming crisis after examining the top 10 assertions in the alarmist playbook .,appeal to emotion
867,"In school science , Shorten would have learned carbon dioxide is the food of life and without this natural gas , which occurs in space and all planets , there would be no life .",fallacy of relevance


In [187]:
# Save the deduplicated dataset (the non-unique index is written as the first CSV column)
df.to_csv("../data/2_Huggingface_dataset.csv")