# Baseline Models

In this notebook we'll implement a few baseline models to compare against our multitask model.
The baselines are:

- Single-task model that regresses the `DAYS_UNTIL_NEXT_ACCIDENT` target
- Single-task model that classifies whether the worksite will have accidents in the future, based on the `FUTURE_TOTAL_COUNT` target
- XGBoost model that regresses the `DAYS_UNTIL_NEXT_ACCIDENT` target
- XGBoost model that classifies whether the worksite will have accidents in the future, based on the `FUTURE_TOTAL_COUNT` target
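
To make the classification target in the list above concrete, here is a minimal sketch of how a binary label could be derived from `FUTURE_TOTAL_COUNT`. It assumes the column holds a non-negative count of future accidents and that the positive class means "at least one future accident". The helper name `has_future_accident` and the sample values are hypothetical and are not taken from the project code.

```python
from typing import List


def has_future_accident(future_total_count: int) -> int:
    """Return 1 if the worksite has at least one future accident, else 0.

    Assumption: FUTURE_TOTAL_COUNT is a non-negative accident count.
    """
    return int(future_total_count > 0)


# Hypothetical FUTURE_TOTAL_COUNT values for five worksites (made up for illustration).
future_total_counts: List[int] = [0, 3, 1, 0, 7]

# Binary labels a classifier baseline could be trained on.
labels = [has_future_accident(count) for count in future_total_counts]
print(labels)  # [0, 1, 1, 0, 1]
```

The regression baselines would instead use the raw `DAYS_UNTIL_NEXT_ACCIDENT` value as the target, so no such conversion is needed for them.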



In [1]:
%load_ext autoreload
%autoreload 2

In [10]:
import os
from dataclasses import dataclass, asdict, field
from typing import List
from pathlib import Path

from torch.utils.data import DataLoader
from pytorch_lightning import Trainer

from model import MultiTaskLearner
from run import build_datasets

## Single-task model: `DAYS_UNTIL_NEXT_ACCIDENT` target

In [11]:
@dataclass
class RegressionSingleTaskConfig:
    epochs: int = 100
    batch_size: int = 32
    dataset_path: str = "./datasets/worksites.csv"
    comet_logging: bool = False
    dataloader_workers: int = 8
    root_dir: str = "./logs/"
    single_task: str = "regression"
    label_columns: List[str] = field(default_factory=lambda: ["DAYS_UNTIL_NEXT_ACCIDENT"])

args = RegressionSingleTaskConfig()
train_ds, val_ds, test_ds = build_datasets(args, args.label_columns)
len(train_ds), len(val_ds), len(test_ds)

(134062, 19151, 38303)
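
As a quick sanity check on the sizes printed above, the share of each split can be computed directly from the three numbers. This sketch only does arithmetic on the printed output and assumes nothing about how `build_datasets` performs the split.

```python
# Split sizes copied from the cell output above.
sizes = {"train": 134062, "val": 19151, "test": 38303}

total = sum(sizes.values())
fractions = {name: n / total for name, n in sizes.items()}

print(f"total examples: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
# total examples: 191516
# train: 0.700
# val: 0.100
# test: 0.200
```

The sizes therefore correspond to roughly a 70/10/20 train/validation/test split.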

In [12]:
asdict(args)

{'epochs': 100,
 'batch_size': 32,
 'dataset_path': './datasets/worksites.csv',
 'comet_logging': False,
 'dataloader_workers': 8,
 'root_dir': './logs/',
 'single_task': 'regression',
 'label_columns': ['DAYS_UNTIL_NEXT_ACCIDENT']}
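
The `label_columns` field in the config above is declared with `field(default_factory=...)` because dataclasses reject a bare list as a default value. The sketch below illustrates the behavior with a trimmed-down, hypothetical `ExampleConfig` (not the notebook's class): each instance gets its own list, and individual fields can be overridden at construction time.

```python
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class ExampleConfig:
    """Hypothetical, reduced stand-in for RegressionSingleTaskConfig."""
    batch_size: int = 32
    # default_factory builds a fresh list per instance, so instances never share state.
    label_columns: List[str] = field(
        default_factory=lambda: ["DAYS_UNTIL_NEXT_ACCIDENT"]
    )


first = ExampleConfig()
second = ExampleConfig()

# Mutating one instance's list leaves the other instance untouched.
first.label_columns.append("FUTURE_TOTAL_COUNT")
print(second.label_columns)  # ['DAYS_UNTIL_NEXT_ACCIDENT']

# Any field can be overridden by keyword when creating the config.
larger_batches = ExampleConfig(batch_size=64)
print(asdict(larger_batches)["batch_size"])  # 64
```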