This notebook presents our approach for Task 1B of the CheckThat! Lab at CLEF 2022. We ranked #1 on the task leaderboard on CodaLab.
![image](https://private-user-images.githubusercontent.com/84636031/264465907-09d7fa11-6aac-4890-878a-65b867d93011.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgyOTMwNzcsIm5iZiI6MTcxODI5Mjc3NywicGF0aCI6Ii84NDYzNjAzMS8yNjQ0NjU5MDctMDlkN2ZhMTEtNmFhYy00ODkwLTg3OGEtNjViODY3ZDkzMDExLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjEzVDE1MzI1N1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgwYmMzZTgxNWEwZGEzMzc2ZDFkMDU3MTFhZmY1Yjg1MzRjMDcyZDg3ZDQ1NTFhZjI0NzhjZDk4MWIyNTg4ZTgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.UA_Sitz5BTZs4lueCoCLNSimLWFOu4AwJ9TYXlX9Cmk)
Verifiable factual claims detection: Given a tweet, predict whether it contains a verifiable factual claim. This is a binary classification task with two labels: Yes and No.
We translated the task's Dutch and Bulgarian datasets into English to increase the amount of training data available.
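The augmentation step can be sketched as follows. This is a minimal illustration, not our exact code: the record fields (`tweet_text`, `label`, `source_lang`) and the `translate` callable are hypothetical, and the notebook does not state which machine-translation system produced the English versions. The key assumption is that translation preserves whether a tweet contains a verifiable claim, so each label is copied over unchanged.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: {"tweet_text": ..., "label": "Yes" | "No"}
Example = Dict[str, str]


def augment_with_translations(
    english: List[Example],
    foreign: Dict[str, List[Example]],
    translate: Callable[[str, str], str],
) -> List[Example]:
    """Return the English examples plus translated copies of the foreign ones.

    `translate(text, source_lang)` is a placeholder for whatever MT system is
    used; it must return English text. Labels are carried over unchanged.
    """
    combined = [dict(ex, source_lang="en") for ex in english]
    for lang, examples in foreign.items():
        for ex in examples:
            combined.append({
                "tweet_text": translate(ex["tweet_text"], lang),
                "label": ex["label"],
                "source_lang": lang,  # keep provenance for later analysis
            })
    return combined


if __name__ == "__main__":
    # Stub translator for demonstration only: it just tags the text.
    demo = augment_with_translations(
        [{"tweet_text": "Cases rose 10% today", "label": "Yes"}],
        {"nl": [{"tweet_text": "Dit is een test", "label": "No"}]},
        lambda text, lang: f"[{lang}->en] {text}",
    )
    print(demo)
```

Keeping a `source_lang` field makes it easy to check later whether translated examples help or hurt compared with native English tweets.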
We used the Twitter API to extract the following data about each tweet:
- Numerical features:
  - number of 'followers', 'following', and 'posts' of the tweet's author
  - number of 'likes' and 'retweets' of the tweet itself
- Categorical features:
  - 'verified', an attribute of the tweet's author
  - 'url', indicating whether the tweet contains a URL
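A sketch of turning an API response into these features is shown below. It assumes the Twitter API v2 response format (tweet requested with `tweet.fields=public_metrics,entities`, author expanded via `expansions=author_id` with `user.fields=public_metrics,verified`); the notebook does not say which API version was used, and the function and output key names are illustrative.

```python
from typing import Any, Dict


def extract_features(tweet: Dict[str, Any], author: Dict[str, Any]) -> Dict[str, int]:
    """Flatten a v2 tweet object and its author's user object into the
    numerical and categorical features listed above.

    Missing fields default to 0 / False so that partially returned
    objects (e.g. deleted or protected accounts) do not raise.
    """
    user_metrics = author.get("public_metrics", {})
    tweet_metrics = tweet.get("public_metrics", {})
    return {
        # Numerical features about the author
        "followers": user_metrics.get("followers_count", 0),
        "following": user_metrics.get("following_count", 0),
        "posts": user_metrics.get("tweet_count", 0),
        # Numerical features about the tweet itself
        "likes": tweet_metrics.get("like_count", 0),
        "retweets": tweet_metrics.get("retweet_count", 0),
        # Categorical features, encoded as 0/1
        "verified": int(bool(author.get("verified", False))),
        "url": int(bool(tweet.get("entities", {}).get("urls"))),
    }
```

Defaulting missing values to zero is a simplifying choice; an explicit "missing" indicator would be an alternative if many tweets could not be retrieved.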
We used the Multimodal Toolkit for text and tabular data, with HuggingFace Transformers as the building block for the text modality.
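To illustrate the idea of combining the two modalities, here is a simplified stand-in for plain concatenation, one of the combining strategies offered by the toolkit. It is not the toolkit's implementation and does not say which combining method we used. The `log1p` squashing of the heavy-tailed counts is a choice made here for simplicity, and the text embedding is assumed to come from the transformer (for example, its pooled output).

```python
import math
from typing import Dict, List, Sequence

# Feature names follow the list above; the ordering is an arbitrary choice.
NUMERICAL = ["followers", "following", "posts", "likes", "retweets"]
CATEGORICAL = ["verified", "url"]


def tabular_vector(features: Dict[str, float]) -> List[float]:
    """Encode one tweet's tabular features.

    Counts such as followers are heavy-tailed, so they are compressed with
    log1p; the binary categorical flags are kept as 0.0 / 1.0.
    """
    nums = [math.log1p(features[name]) for name in NUMERICAL]
    cats = [float(features[name]) for name in CATEGORICAL]
    return nums + cats


def combine_concat(text_embedding: Sequence[float], features: Dict[str, float]) -> List[float]:
    """Concatenate the text representation with the tabular vector.

    A classification head (e.g. a small MLP producing Yes/No logits) would
    then be applied to the combined vector.
    """
    return list(text_embedding) + tabular_vector(features)
```

Concatenation is the simplest option: the classifier head sees both modalities at once and learns their relative weighting during fine-tuning.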