In [1]:
%load_ext autoreload
%autoreload 2
from scoring.classifier_scoring import *

# DataLabels
`DataLabels` is an enum class that defines which labels a `Case` (explained later) can have. It must always contain ENDEMIC and NON_CASE; all other labels are defined by the user. For demonstration purposes we include three additional labels: ONE, TWO, and THREE.

In [2]:
print(list(DataLabels))

[<DataLabels.ONE: 1>, <DataLabels.TWO: 2>, <DataLabels.THREE: 3>, <DataLabels.ENDEMIC: 4>, <DataLabels.NON_CASE: 5>]
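As a rough illustration of the idea, an enum of this shape could be declared as follows. This is only a sketch: `DemoLabels` and `REQUIRED` are hypothetical names chosen here, and the actual definition in `scoring.classifier_scoring` may differ.

```python
from enum import Enum


class DemoLabels(Enum):
    """Hypothetical stand-in for DataLabels (not the library's definition)."""

    # User-defined labels, chosen for demonstration only
    ONE = 1
    TWO = 2
    THREE = 3
    # Labels that must always be available
    ENDEMIC = 4
    NON_CASE = 5


# Check that the two mandatory labels are present in the enum
REQUIRED = {"ENDEMIC", "NON_CASE"}
missing = REQUIRED - set(DemoLabels.__members__)
if missing:
    raise ValueError(f"missing mandatory labels: {sorted(missing)}")

print([label.name for label in DemoLabels])
```

A check like the one on `REQUIRED` is one way to make sure user-defined label sets never drop ENDEMIC or NON_CASE.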


# Case
A Case is the smallest measurable unit in our scoring definition. A Case is defined by its DataLabels and its coordinates: each Case must have a probability assigned for every DataLabel, and these probabilities must sum to 1. Because a Case is defined only by its DataLabel probabilities and coordinates, it is indistinguishable from another Case with identical DataLabel probabilities and coordinates.

In [3]:
Case(coordinates=(WeekNumber(2019, 1), SpatialDim(1)))

Case(disease_probas='{<DataLabels.ONE: 1>: 0, <DataLabels.TWO: 2>: 0, <DataLabels.THREE: 3>: 0, <DataLabels.ENDEMIC: 4>: 0, <DataLabels.NON_CASE: 5>: 1}', coordinates='(WeekNumber(weeknumber='2019W01'), SpatialDim(spatial_id='1'))')
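The two properties described above (probabilities summing to 1, and equality based only on probabilities and coordinates) can be sketched with a frozen dataclass. `SketchCase` and `make_case` are hypothetical names used for illustration, and labels and coordinates are simplified to plain strings; this is not the implementation of `Case`.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SketchCase:
    """Hypothetical value object: equal fields mean equal cases."""

    probas: Tuple[Tuple[str, float], ...]  # sorted (label, probability) pairs
    coordinates: Tuple[str, str]           # (week, spatial id), simplified

    def __post_init__(self):
        # Reject label probabilities that do not sum to 1
        total = math.fsum(p for _, p in self.probas)
        if not math.isclose(total, 1.0):
            raise ValueError(f"probabilities sum to {total}, expected 1")


def make_case(probas: Dict[str, float], coordinates: Tuple[str, str]) -> SketchCase:
    # Sorting makes the representation independent of dict insertion order
    return SketchCase(tuple(sorted(probas.items())), tuple(coordinates))


a = make_case({"ONE": 0.4, "NON_CASE": 0.6}, ("2019W01", "1"))
b = make_case({"NON_CASE": 0.6, "ONE": 0.4}, ("2019W01", "1"))
print(a == b)  # True: same probabilities and coordinates
```

Because the dataclass is frozen and compares by value, two such cases are interchangeable and can also be used as dictionary keys or set members.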

# AggCase
Assuming we only have access to aggregated data, AggCase summarizes the number of cases at a coordinate for all possible DataLabels.

In [4]:
isinstance((WeekNumber(2019, 1), SpatialDim(1))[0], WeekNumber)

True

In [5]:
AggCase(coordinates=(WeekNumber(2019, 1), SpatialDim(1)))

Case(disease_probas='{<DataLabels.ONE: 1>: 0, <DataLabels.TWO: 2>: 0, <DataLabels.THREE: 3>: 0, <DataLabels.ENDEMIC: 4>: 0, <DataLabels.NON_CASE: 5>: 1}', coordinates='(WeekNumber(weeknumber='2019W01'), SpatialDim(spatial_id='1'))')

# DataCell and coordinates
`Cases` usually exist within `DataCells`, which are defined by their coordinates. For demonstration purposes we implemented a spatial and a temporal dimension for the coordinates of a `DataCell`. A `DataCell` is identical to another `DataCell` that has the same `Cases` (independent of order) and the same coordinates. A `DataCell` can also report how many cases it has for a given `DataLabel` by summing the probabilities of that `DataLabel` over all of its `Cases`.

In [8]:
# A DataCell with two Cases at the coordinates 2020W25 and SpatialDim 5
cell = DataCell(
    [Case(
        data_label_probas={
            DataLabels.ONE: 0.2,
            DataLabels.TWO: 0.2,
            DataLabels.THREE: 0.2,
            DataLabels.ENDEMIC: 0.2,
            DataLabels.NON_CASE: 0.2,
        },
        coordinates=(WeekNumber(2020, 25), SpatialDim(5))
    ),
    Case(
        data_label_probas={
            DataLabels.ONE: 0.4,
            DataLabels.TWO: 0.2,
            DataLabels.THREE: 0.2,
            DataLabels.ENDEMIC: 0,
            DataLabels.NON_CASE: 0.2,
        },
        coordinates=(WeekNumber(2020, 25), SpatialDim(5))
    )],
    coordinates=(WeekNumber(2020, 25), SpatialDim(5)),
)

In [9]:
print(cell, "\n")
print(WeekNumber(2020, 25), "\n")
print(SpatialDim(5))

{<DataLabels.ONE: 1>: 0.3, <DataLabels.TWO: 2>: 0.2, <DataLabels.THREE: 3>: 0.2, <DataLabels.ENDEMIC: 4>: 0.1, <DataLabels.NON_CASE: 5>: 0.2} 

2020W25 

5


In [10]:
print(f"The sum of all cases having DataLabels.ONE is {cell.case_number(DataLabels.ONE)}")

The sum of all cases having DataLabels.ONE is 0.6
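The numbers above can be reproduced by hand. The sketch below uses plain dictionaries instead of `Case` and `DataCell`; `case_number` and `mean_probas` are hypothetical helper functions, and reading the printed cell values as per-label means is our interpretation of the output, not something confirmed by the library.

```python
import math

# Label probabilities of the two cases in the cell (labels as plain strings)
cases = [
    {"ONE": 0.2, "TWO": 0.2, "THREE": 0.2, "ENDEMIC": 0.2, "NON_CASE": 0.2},
    {"ONE": 0.4, "TWO": 0.2, "THREE": 0.2, "ENDEMIC": 0.0, "NON_CASE": 0.2},
]


def case_number(cases, label):
    """Sum the probability of `label` over all cases in the cell."""
    return math.fsum(case[label] for case in cases)


def mean_probas(cases):
    """Average probability per label (assumed to match the printed cell)."""
    return {label: case_number(cases, label) / len(cases) for label in cases[0]}


# Rounded for display to hide floating-point noise
print(round(case_number(cases, "ONE"), 10))
print({label: round(p, 10) for label, p in mean_probas(cases).items()})
```

For ONE this gives 0.2 + 0.4 ≈ 0.6 as the case number and about 0.3 as the mean, in line with the outputs above.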


# CellGrid
The `CellGrid` combines `DataCells` that share the same coordinate system. A `CellGrid` has no gaps: within a predefined closed interval of each dimension (in this example, spatial and temporal), every coordinate without data gets one `DataCell` containing a single "dummy" `Case` whose label is NON_CASE with probability 1.

In [18]:
case_1 = Case(
    data_label_probas={
        DataLabels.ONE: 0.3,
        DataLabels.TWO: 0.2,
        DataLabels.THREE: 0.2,
        DataLabels.ENDEMIC: 0.2,
        DataLabels.NON_CASE: 0.1,
    },
    coordinates=(WeekNumber(2020, 24), SpatialDim(4))
)
case_2 = Case(
    data_label_probas={
        DataLabels.ONE: 0.3,
        DataLabels.TWO: 0.2,
        DataLabels.THREE: 0.2,
        DataLabels.ENDEMIC: 0.2,
        DataLabels.NON_CASE: 0.1,
    },
    coordinates=(WeekNumber(2020, 24), SpatialDim(5))
)
cells = [
    DataCell(case_1, coordinates=(WeekNumber(2020, 24), SpatialDim(4))),
    DataCell(case_2, coordinates=(WeekNumber(2020, 24), SpatialDim(5))),
]
grid = CellGrid(
    cells,
    time_range=(WeekNumber(2020, 24), WeekNumber(2020, 25)),
    spatial_range=(SpatialDim(4), SpatialDim(6)),
)

In [19]:
print(grid.time_range, "\n")
print("There is now one SpatialDim between 4 and 6: ", grid.spatial_range, "\n")
print("Missing Cells are added. Amount of Cells now: ", len(grid.cells), "\n")
print("New Cell: ", grid.cells[-1])

[WeekNumber(weeknumber='2020W24'), WeekNumber(weeknumber='2020W25')] 

There is now one SpatialDim between 4 and 6:  [SpatialDim(spatial_id='4'), SpatialDim(spatial_id='5'), SpatialDim(spatial_id='6')] 

Missing Cells are added. Amount of Cells now:  6 

New Cell:  {<DataLabels.ONE: 1>: 0.0, <DataLabels.TWO: 2>: 0.0, <DataLabels.THREE: 3>: 0.0, <DataLabels.ENDEMIC: 4>: 0.0, <DataLabels.NON_CASE: 5>: 1.0}


In [20]:
from itertools import product

In [21]:
list(product(grid.spatial_range, grid.time_range))

[(SpatialDim(spatial_id='4'), WeekNumber(weeknumber='2020W24')),
 (SpatialDim(spatial_id='4'), WeekNumber(weeknumber='2020W25')),
 (SpatialDim(spatial_id='5'), WeekNumber(weeknumber='2020W24')),
 (SpatialDim(spatial_id='5'), WeekNumber(weeknumber='2020W25')),
 (SpatialDim(spatial_id='6'), WeekNumber(weeknumber='2020W24')),
 (SpatialDim(spatial_id='6'), WeekNumber(weeknumber='2020W25'))]

In [15]:
grid.spatial_range

[SpatialDim(spatial_id='4'),
 SpatialDim(spatial_id='5'),
 SpatialDim(spatial_id='6')]
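The gap filling that `CellGrid` performs can be sketched with `itertools.product` over the two ranges. The sketch uses plain strings and dictionaries; `fill_gaps`, `DUMMY`, and `observed` are hypothetical names, and the real `CellGrid` may build its cells differently.

```python
from itertools import product

weeks = ["2020W24", "2020W25"]   # closed temporal interval
regions = ["4", "5", "6"]        # closed spatial interval

# Dummy case: NON_CASE with probability 1
DUMMY = {"ONE": 0.0, "TWO": 0.0, "THREE": 0.0, "ENDEMIC": 0.0, "NON_CASE": 1.0}
REAL = {"ONE": 0.3, "TWO": 0.2, "THREE": 0.2, "ENDEMIC": 0.2, "NON_CASE": 0.1}

# Cells that actually contain data, keyed by (week, region)
observed = {
    ("2020W24", "4"): [dict(REAL)],
    ("2020W24", "5"): [dict(REAL)],
}


def fill_gaps(observed, weeks, regions):
    """Return a grid with one cell per coordinate, adding dummy cells."""
    filled = dict(observed)
    for coord in product(weeks, regions):
        # Only coordinates without data receive a dummy cell
        filled.setdefault(coord, [dict(DUMMY)])
    return filled


filled = fill_gaps(observed, weeks, regions)
print(len(filled))                  # 6 cells in total
print(len(filled) - len(observed))  # 4 of them are dummy cells
```

With two weeks and three spatial units the grid has 2 × 3 = 6 coordinates, so four dummy cells are needed to complete the two observed ones.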

## Open Questions:
- How should Cases, DataCells, and CellGrids be comparable?
- How do we add dummy-Cases to a continuous dimension?