# devlog 2024-12-16

_Author: Tyler Coles_

## Table output tools now support time grouping

Implemented for [Issue 196](https://github.com/NAU-CCL/Epymorph/issues/196): all of the table output functions now accept a time aggregation. Previously they accepted only time selections.

The effect differs from function to function, so each one is described below with a simple example.

```python
import pandas as pd

from epymorph.kit import *
from epymorph.adrio import acs5, us_tiger

rume = SingleStrataRume.build(
    ipm=ipm.Sirs(),
    mm=mm.Centroids(),
    init=init.SingleLocation(7, 100),
    scope=CountyScope.in_states(["AZ"], year=2020),
    time_frame=TimeFrame.of("2020-01-01", 90),
    params={
        "beta": 0.35,
        "gamma": 1 / 10,
        "xi": 1 / 90,
        "phi": 40.0,
        "population": acs5.Population(),
        "centroid": us_tiger.InternalPoint(),
        "meta::geo::label": us_tiger.Name(),
    },
)
```

```python
with sim_messaging():
    out = BasicSimulator(rume).run()
```

```text
Loading epymorph.adrio.acs5.Population:
  |####################| 100%  (1.416s)
Loading epymorph.adrio.us_tiger.InternalPoint:
  |####################| 100%  (0.688s)
Loading epymorph.adrio.us_tiger.Name:
  |####################| 100%  (0.634s)
Running simulation (BasicSimulator):
• 2020-01-01 to 2020-03-31 (90 days)
• 15 geo nodes
  |####################| 100%
Runtime: 0.198s
```


## Quantiles

Providing a time grouping can change the results. Consider the 0.5 quantile of S → I events: without grouping, it answers "what is the median number of new infections per tick?", but grouping by week asks "what is the median number of new infections per week?"

```python
df1 = out.table.quantiles(
    quantiles=(0.025, 0.25, 0.5, 0.75, 0.975),
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.events("S->I"),
)
df1.insert(0, "time group", "tick")

df2 = out.table.quantiles(
    quantiles=(0.025, 0.25, 0.5, 0.75, 0.975),
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all().group("day").agg(),
    quantity=rume.ipm.select.events("S->I"),
)
df2.insert(0, "time group", "day")

df3 = out.table.quantiles(
    quantiles=(0.025, 0.25, 0.5, 0.75, 0.975),
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all().group("week").agg(),
    quantity=rume.ipm.select.events("S->I"),
)
df3.insert(0, "time group", "week")

pd.concat((df1, df2, df3)).reset_index(drop=True)
```

| | time group | geo | quantity | 0.025 | 0.25 | 0.5 | 0.75 | 0.975 |
|---|---|---|---|---|---|---|---|---|
| 0 | tick | * | S → I | 33.225 | 3345.25 | 20210.5 | 63047.5 | 216137.375 |
| 1 | day | * | S → I | 75.75 | 7338.0 | 39226.0 | 135151.75 | 337145.5 |
| 2 | week | * | S → I | 307.825 | 42043.0 | 255048.0 | 722059.5 | 2107984.2 |
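
To see why grouping shifts the quantiles, here is a minimal pandas sketch on synthetic data, not epymorph code. It assumes 2 ticks per day and that a group's value is the sum of the events in its ticks (the larger day and week values above are consistent with summing, but that is an assumption here). The names `events`, `per_day`, and `per_week` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical setup: 90 days with 2 ticks per day (an assumption for
# illustration; the real tick count depends on the model).
TICKS_PER_DAY = 2
days = pd.date_range("2020-01-01", periods=90, freq="D")
ticks = days.repeat(TICKS_PER_DAY)

# Synthetic per-tick S->I counts shaped like a single epidemic wave.
rng = np.random.default_rng(42)
curve = np.exp(-0.5 * ((np.arange(len(ticks)) - 90) / 20) ** 2)
events = pd.Series(rng.poisson(1000 * curve + 1), index=ticks)

# Group ticks into days, then days into calendar weeks, summing each group.
per_day = events.groupby(level=0).sum()
per_week = per_day.resample("W").sum()

# The same quantiles, computed over three different units of time.
qs = [0.025, 0.25, 0.5, 0.75, 0.975]
table = pd.DataFrame(
    {
        "tick": events.quantile(qs),
        "day": per_day.quantile(qs),
        "week": per_week.quantile(qs),
    }
).T
print(table)
```

Each row answers a different question: the median of the "week" row is a typical week's total, which is naturally much larger than a typical tick's count.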


## Range

Grouping likewise affects a range query: the minimum and maximum taken over days need not match those taken over weeks. In this run the maximums differ, while the minimums happen to coincide.

```python
df1 = out.table.range(
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.events("S->I"),
)
df1.insert(0, "time group", "tick")

df2 = out.table.range(
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all().group("day").agg(),
    quantity=rume.ipm.select.events("S->I"),
)
df2.insert(0, "time group", "day")

df3 = out.table.range(
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all().group("week").agg(),
    quantity=rume.ipm.select.events("S->I"),
)
df3.insert(0, "time group", "week")

pd.concat((df1, df2, df3)).reset_index(drop=True)
```

| | time group | geo | quantity | min | max |
|---|---|---|---|---|---|
| 0 | tick | * | S → I | 18.0 | 228238.0 |
| 1 | day | * | S → I | 41.0 | 342835.0 |
| 2 | week | * | S → I | 41.0 | 2323844.0 |
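
A minimal pandas sketch of the same idea on synthetic daily counts (not epymorph code; summing days into weeks is an assumption). It uses pandas calendar weeks (`"W"`, weeks ending Sunday). Because 2020-01-01 is a Wednesday, the 90 days span 14 calendar weeks, and the first and last weeks are partial; the last one holds a single day. A partial group can therefore make the weekly minimum as small as one day's total, while the weekly maximum can never be smaller than the daily maximum when counts are non-negative.

```python
import numpy as np
import pandas as pd

# Hypothetical daily S->I counts over 90 days (synthetic data).
days = pd.date_range("2020-01-01", periods=90, freq="D")
rng = np.random.default_rng(0)
curve = np.exp(-0.5 * ((np.arange(90) - 45) / 10) ** 2)
per_day = pd.Series(rng.poisson(5000 * curve + 10), index=days)

# Sum days into calendar weeks (pandas "W" means weeks ending Sunday).
per_week = per_day.resample("W").sum()

# The min/max range for each grouping.
ranges = pd.DataFrame(
    {
        "min": [per_day.min(), per_week.min()],
        "max": [per_day.max(), per_week.max()],
    },
    index=["day", "week"],
)
print(ranges)
```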


## Sum

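Grouping a sum, on the other hand, has no effect on the results. Every grouping covers the same ticks, so adding up the grouped values gives the same grand total as adding up the individual ticks.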

```python
df1 = out.table.sum(
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.events(),
)
df1.insert(0, "time group", "tick")

# Group by day so the grouping matches the "day" label below.
df2 = out.table.sum(
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all().group("day").agg(),
    quantity=rume.ipm.select.events(),
)
df2.insert(0, "time group", "day")

pd.concat((df1, df2)).reset_index(drop=True).sort_values(by=["quantity", "time group"])
```

| | time group | geo | quantity | sum |
|---|---|---|---|---|
| 4 | day | * | I → R | 7586213 |
| 1 | tick | * | I → R | 7586213 |
| 5 | day | * | R → S | 2179781 |
| 2 | tick | * | R → S | 2179781 |
| 3 | day | * | S → I | 8019965 |
| 0 | tick | * | S → I | 8019965 |
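
The invariance can be checked with a small pandas sketch on synthetic integer event counts (not epymorph code). It assumes 2 ticks per day and that each group's value is the sum of its ticks; under a different aggregation, such as a mean, the grand totals would generally not agree.

```python
import numpy as np
import pandas as pd

# Hypothetical tick-level event counts: 90 days x 2 ticks per day (assumption).
days = pd.date_range("2020-01-01", periods=90, freq="D")
ticks = days.repeat(2)
rng = np.random.default_rng(1)
events = pd.DataFrame(
    {
        "S->I": rng.integers(0, 1000, size=len(ticks)),
        "I->R": rng.integers(0, 1000, size=len(ticks)),
    },
    index=ticks,
)

# Sum ticks into days, then days into calendar weeks.
per_day = events.groupby(level=0).sum()
per_week = per_day.resample("W").sum()

# Each grouping partitions the same ticks, so the grand totals agree.
totals = pd.DataFrame(
    {"tick": events.sum(), "day": per_day.sum(), "week": per_week.sum()}
)
print(totals)
```

Integer counts keep the comparison exact; with floating-point values the totals could differ by rounding error.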


## Chart

Passing a grouping to `chart` gives precise control over the time period each bar covers. By default, the time series is condensed into a fixed number of bins; grouping by "week", for example, draws one bar per week instead.

```python
print("chart by n-bins (default)")
out.table.chart(
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.events("S->I"),
    result_format="print",
)

print("\nchart by day")
out.table.chart(
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all().group("day").agg(),
    quantity=rume.ipm.select.events("S->I"),
    result_format="print",
)

print("\nchart by week")
out.table.chart(
    geo=rume.scope.select.all().sum(),
    time=rume.time_frame.select.all().group("week").agg(),
    quantity=rume.ipm.select.events("S->I"),
    result_format="print",
)
```

```text
chart by n-bins (default)
geo quantity                   chart
  *    S → I ▁▁▁▁▁▁▁▂▃▅▇█▇▆▄▃▂▂▂▂▂▂▁

chart by day
geo quantity                                                                                      chart
  *    S → I ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▄▄▅▆▆▇▇██████▇▇▇▆▆▅▅▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂

chart by week
geo quantity          chart
  *    S → I ▁▁▁▁▁▂▆█▆▃▂▂▂▁
```
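
The difference between fixed bins and calendar grouping can be sketched in plain pandas and NumPy. This is a hypothetical stand-in, not epymorph's implementation: how `chart` actually chooses its bins and scales its bars isn't described in this post. Here the helper `sparkline` (a name invented for illustration) maps values to eight block characters scaled to the maximum, `N_BINS` splits the series into roughly equal chunks, and weekly bars use pandas calendar weeks.

```python
import numpy as np
import pandas as pd

BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Hypothetical helper: map each value to one of eight block characters."""
    values = np.asarray(values, dtype=float)
    top = values.max()
    if top <= 0:
        return BARS[0] * len(values)
    # Scale to 0..7 relative to the largest value and round to the nearest bar.
    idx = np.round(values / top * (len(BARS) - 1)).astype(int)
    return "".join(BARS[i] for i in idx)


# Synthetic daily S->I counts over 90 days (hypothetical data).
days = pd.date_range("2020-01-01", periods=90, freq="D")
curve = np.exp(-0.5 * ((np.arange(90) - 45) / 10) ** 2)
per_day = pd.Series(1000 * curve, index=days)

# Fixed number of bins: split the series into N roughly equal chunks.
N_BINS = 20
binned = [chunk.sum() for chunk in np.array_split(per_day.to_numpy(), N_BINS)]

# Calendar grouping: one bar per week, however many days that week contains.
weekly = per_day.resample("W").sum()

print("n-bins:", sparkline(binned))
print("week:  ", sparkline(weekly))
```

With fixed bins, the bar count stays constant no matter how long the simulation runs, but each bar covers an arbitrary span of days. Calendar grouping ties each bar to a meaningful period at the cost of a variable bar count (14 here, because the partial weeks at each end still get a bar).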
