## EESTEC 2025 Hackathon Challenge

### The Challenge
Welcome to the hackathon. Let me introduce you to your task. Consider the dataset shown below. It contains political statements from various sources, including public speeches, online posts, and radio shows. Each statement is labeled by its truthfulness, ranging from "pants-fire" (the liar was caught red-handed) to "true."

Your goal is to build a `lie_detector` function that categorizes, as accurately as possible, statements that are similar to those in your dataset but not contained in it. Your function may also use the metadata collected with each statement. For further details on the dataset, please refer to the text file in the same subfolder.

In [1]:
import pandas as pd
dataset = pd.read_csv(
    'eestec_hackathon_2025_train.tsv',
    sep='\t',
    names=['ID', 'Label', 'Statement', 'Subjects', 'Speaker Name', 'Speaker Title',
           'State', 'Party Affiliation', 'Credit History: barely-true',
           'Credit History: false', 'Credit History: half-true',
           'Credit History: mostly-true', 'Credit History: pants-fire',
           'Context/Location'],
)
dataset

Unnamed: 0,ID,Label,Statement,Subjects,Speaker Name,Speaker Title,State,Party Affiliation,Credit History: barely-true,Credit History: false,Credit History: half-true,Credit History: mostly-true,Credit History: pants-fire,Context/Location
0,11972.json,true,Building a wall on the U.S.-Mexico border will...,immigration,rick-perry,Governor,Texas,republican,30,30,42,23,18,Radio interview
1,11685.json,false,Wisconsin is on pace to double the number of l...,jobs,katrina-shankland,State representative,Wisconsin,democrat,2,1,0,0,0,a news conference
2,11096.json,false,Says John McCain has done nothing to help the ...,"military,veterans,voting-record",donald-trump,President-Elect,New York,republican,63,114,51,37,61,comments on ABC's This Week.
3,5209.json,half-true,Suzanne Bonamici supports a plan that will cut...,"medicare,message-machine-2012,campaign-adverti...",rob-cornilles,consultant,Oregon,republican,1,1,3,1,1,a radio show
4,9524.json,pants-fire,When asked by a reporter whether hes at the ce...,"campaign-finance,legal-issues,campaign-adverti...",state-democratic-party-wisconsin,,Wisconsin,democrat,5,7,2,2,7,a web video
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11548,5473.json,mostly-true,There are a larger number of shark attacks in ...,"animals,elections",aclu-florida,,Florida,none,0,1,1,1,0,"interview on ""The Colbert Report"""
11549,3408.json,mostly-true,Democrats have now become the party of the [At...,elections,alan-powell,,Georgia,republican,0,0,0,1,0,an interview
11550,3959.json,half-true,Says an alternative to Social Security that op...,"retirement,social-security",herman-cain,,Georgia,republican,4,11,5,3,3,a Republican presidential debate
11551,2253.json,false,On lifting the U.S. Cuban embargo and allowing...,"florida,foreign-policy",jeff-greene,,Florida,democrat,3,1,3,0,0,a televised debate on Miami's WPLG-10 against ...
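
Since the function may use the metadata, the sketch below shows one possible way to turn two of the columns above into features: splitting the comma-separated `Subjects` field into tags, and converting the five credit-history counts into shares. The helper names `split_subjects` and `credit_history_shares` are hypothetical and not part of the provided material; treating missing values as empty or zero is an assumption.

```python
def split_subjects(subjects):
    """Split the comma-separated Subjects field into a list of tags."""
    if not isinstance(subjects, str):
        return []  # pandas represents missing text as NaN, which is not a str
    return [tag.strip() for tag in subjects.split(',') if tag.strip()]


def credit_history_shares(history_barely_true, history_false, history_half_true,
                          history_mostly_true, history_pants_fire):
    """Turn the five credit-history counts into shares of the speaker's total."""
    counts = [float(history_barely_true), float(history_false), float(history_half_true),
              float(history_mostly_true), float(history_pants_fire)]
    counts = [count if count == count else 0.0 for count in counts]  # NaN -> 0 (assumption)
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # speaker without recorded history
    return [count / total for count in counts]


# Values taken from rows 2 and 3 of the table above
print(split_subjects('military,veterans,voting-record'))
print(credit_history_shares(1, 1, 3, 1, 1))
```

Shares rather than raw counts keep frequently checked speakers from dominating purely because they have more recorded statements. Whether that helps on this dataset is something to verify on a held-out split.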


### Submission Criteria
To embrace fairness, please follow these guidelines in your submission:
1. Hand in a folder with the name "team_name" or "1stmemberfullname_2ndmemberfullname..." (e.g. "Steffen_Stand_In_Emma_Example") containing only your edited Jupyter notebook, a save file of one machine learning model your function may use, and the hackathon dataset. That is three files in total (you may additionally keep the README.txt in there).
1. Write all code in this Jupyter notebook, follow common standards of academic integrity (i.e. mark and cite anything that is not your contribution), and only use publicly available resources.
1. You may train a neural network (or similar) during the hackathon, save it in your submission folder, and load it into your lie detector function. If you do so, train the model only on the provided dataset and include the training loop in your submitted notebook. Running that loop must recreate and overwrite your saved model (up to differences caused by random weight initialization). This is needed to prove that you did not use our test data to train your model.
1. Submit your folder before the end of the hackathon. Any later submission will be ignored.

### Grading Criteria
Each statement's truthfulness is scored from zero to five, where zero corresponds to "pants-fire", one to "false", and five to "true" (the label list in the example submission below follows this order). Your lie detector function is graded on the mean squared error of these truthfulness scores on our test dataset. Your lie detector function also needs to fulfill the following three criteria:
1. Be a valid submission (see above).
1. Take the inputs as defined below.
1. Return the predicted label as a string, using one of the labels that appear in the dataset.

Therefore, only your lie detector function is considered in grading your submission; everything else serves to validate your submission. You may submit multiple times, but only your last submission before the deadline is graded.
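
To get a feel for the metric, the following sketch computes the mean squared error between predicted and true labels. It assumes the score order used by the label list in the example submission; the helper names `score_to_label` and `mean_squared_error` are hypothetical and not part of the official grading code.

```python
# Label order as in the example submission; list index = truthfulness score (assumption).
LABELS = ['pants-fire', 'false', 'barely-true', 'half-true', 'mostly-true', 'true']
LABEL_TO_SCORE = {label: score for score, label in enumerate(LABELS)}


def score_to_label(raw_score):
    """Round a numeric prediction and clamp it into the valid 0..5 range."""
    index = min(max(int(round(raw_score)), 0), len(LABELS) - 1)
    return LABELS[index]


def mean_squared_error(predicted_labels, true_labels):
    """MSE between the truthfulness scores of two equally long, non-empty label lists."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    squared_errors = [
        (LABEL_TO_SCORE[p] - LABEL_TO_SCORE[t]) ** 2
        for p, t in zip(predicted_labels, true_labels)
    ]
    return sum(squared_errors) / len(squared_errors)


# Predicting 'true' for a 'pants-fire' statement costs 25, for 'mostly-true' only 1.
print(mean_squared_error(['true', 'true'], ['pants-fire', 'mostly-true']))  # (25 + 1) / 2 = 13.0
```

Because errors are squared, a prediction that is far off is penalized much more than one that lands on a neighboring label. A model that outputs a continuous score can be mapped back to a valid label with something like `score_to_label`.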

### Hints
1. Detecting lies is a hard task. Do your best, and don't worry if your lie detector is not super accurate.
1. Statistical methods and different types of neural networks have been shown to perform not too badly on this task. You may even combine them.
1. Your lie detector may encounter new speakers, states, or locations during grading. Make sure it can handle these.
1. Make sure your submitted notebook runs with a reset kernel using only the originally provided dataset file. 
1. Enjoy the snacks and have fun.
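
For the hint about new speakers, states, or locations, one possible approach is to reserve a dedicated "unknown" index when encoding categorical metadata. The sketch below is a minimal illustration under that assumption; the `UNKNOWN` token and the helpers `normalize`, `build_vocabulary`, and `encode` are hypothetical names, not part of the provided material.

```python
UNKNOWN = '<unk>'  # hypothetical placeholder token, not a value from the dataset


def normalize(value):
    """Treat missing values (None, NaN, empty strings) as unknown and lowercase the rest."""
    if value is None or value != value:  # NaN is the only value not equal to itself
        return UNKNOWN
    text = str(value).strip().lower()
    return text if text else UNKNOWN


def build_vocabulary(values, min_count=1):
    """Map each sufficiently frequent category to an index; index 0 is reserved for unknowns."""
    counts = {}
    for value in values:
        key = normalize(value)
        counts[key] = counts.get(key, 0) + 1
    vocabulary = {UNKNOWN: 0}
    for key in sorted(counts):
        if key != UNKNOWN and counts[key] >= min_count:
            vocabulary[key] = len(vocabulary)
    return vocabulary


def encode(value, vocabulary):
    """Return the index of a category, falling back to the unknown index."""
    return vocabulary.get(normalize(value), vocabulary[UNKNOWN])


state_vocabulary = build_vocabulary(['Texas', 'Wisconsin', 'New York', None])
print(encode('Texas', state_vocabulary))    # seen during training
print(encode('Bavaria', state_vocabulary))  # unseen, falls back to index 0
```

Raising `min_count` sends rare training categories to the unknown index as well, so a model actually sees that index during training instead of meeting it for the first time at grading.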

### Example Submission
Below is an example submission using PyTorch. I recommend either editing it directly or reusing its structure.

In [2]:
# Model Creation Cell
from torch import nn, tensor, save, load

# define model
class Lie_Detector_Model(nn.Module):
    def __init__(self):
        super().__init__()

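    def forward(self, x):
        # placeholder: ignores the input and always predicts score 5 ("true")
        return tensor(5)

# train model (the placeholder has no parameters, so there is nothing to train here)
model = Lie_Detector_Model()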

# save model
with open('model_weights.pt', 'wb') as save_file: # this saves/overwrites our model
    save(model.state_dict(), save_file) 

In [3]:
# Lie Detector Function Cell
def lie_detector(statement, 
                 subjects, 
                 speaker_name, 
                 speaker_title, 
                 state, 
                 party_affiliation, 
                 history_barely_true, 
                 history_false, 
                 history_half_true, 
                 history_mostly_true, 
                 history_pants_fire, 
                 context_location):
    
    model = Lie_Detector_Model()
    with open('model_weights.pt', 'rb') as save_file: # this loads our model
        model.load_state_dict(load(save_file, weights_only=True)) # please set weights_only to True for secure model loading
    
    return ['pants-fire', 'false', 'barely-true', 'half-true', 'mostly-true', 'true'][model(statement).item()]

In [4]:
# Testing the example lie detector function (not relevant for the submission)
# values[2:] skips the ID and Label columns, leaving the 12 inputs of lie_detector
a, b, c, d, e, f, g, h, i, j, k, l = dataset.iloc[0].values[2:]
lie_detector(a, b, c, d, e, f, g, h, i, j, k, l)

'true'