This is a demo of the `counter` function from the `streaks` module, available at https://github.com/ArturoSbr

Import libraries

In [1]:
import streaks
import numpy as np
import pandas as pd

Create a made-up `pandas` data frame with three groups (`a`, `b`, `c`), each with 12 consecutive monthly dates (`'2020-01'` through `'2020-12'`) stored as strings

In [2]:
# 3 groups x 12 months = 36 rows; months are zero-padded strings such as '2020-01'
t = pd.DataFrame({'group': np.repeat(['a', 'b', 'c'], 12),
                  'date_str': [f'2020-{month:02d}' for month in range(1, 13)] * 3})
t.head(6)

|   | group | date_str |
|---|-------|----------|
| 0 | a | 2020-01 |
| 1 | a | 2020-02 |
| 2 | a | 2020-03 |
| 3 | a | 2020-04 |
| 4 | a | 2020-05 |
| 5 | a | 2020-06 |


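Randomly draw five row indices and drop them from the data frame so that some dates are skipped (`randint` samples with replacement, so fewer than five distinct rows may be dropped)
- Note how group `a` now skips from `'2020-02'` to `'2020-04'`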

In [3]:
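np.random.seed(123)
# Draw 5 random row labels; duplicates are possible because randint samples with replacement
to_drop = np.random.randint(low=0, high=len(t), size=5)
t = t.drop(index=to_drop)  # `index=` already targets rows, so `axis=0` is not needed
t.head(3)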

|   | group | date_str |
|---|-------|----------|
| 0 | a | 2020-01 |
| 1 | a | 2020-02 |
| 3 | a | 2020-04 |
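Because `randint` samples with replacement, it can help to check which rows were actually removed. The sketch below rebuilds the same frame, keeps only the distinct drawn indices, and prints the `(group, date_str)` pairs that disappear. The name `unique_drop` is illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Rebuild the demo frame: 3 groups x 12 monthly string dates
t = pd.DataFrame({'group': np.repeat(['a', 'b', 'c'], 12),
                  'date_str': [f'2020-{m:02d}' for m in range(1, 13)] * 3})

np.random.seed(123)
to_drop = np.random.randint(low=0, high=len(t), size=5)

# randint can repeat an index, so keep the distinct labels only (illustrative name)
unique_drop = np.unique(to_drop)
print(f'{len(unique_drop)} distinct rows dropped')

# Show which (group, date) pairs are about to be removed
print(t.loc[unique_drop, ['group', 'date_str']])

t = t.drop(index=unique_drop)
```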


Map each `date_str` value to an integer (1 to 12) that encodes its chronological order
- The string-formatted dates are now available in numeric form (`date_num`)

In [4]:
# Map each 'YYYY-MM' string to its month number (1-12); .map returns integers directly
order = {f'2020-{month:02d}': month for month in range(1, 13)}
t['date_num'] = t['date_str'].map(order)
t.head(5)

|   | group | date_str | date_num |
|---|-------|----------|----------|
| 0 | a | 2020-01 | 1 |
| 1 | a | 2020-02 | 2 |
| 3 | a | 2020-04 | 4 |
| 4 | a | 2020-05 | 5 |
| 5 | a | 2020-06 | 6 |
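The `order` dictionary only covers the twelve months of 2020. If your dates span several years, one alternative (not part of `streaks`) is to parse the strings with `pd.to_datetime` and count months as `year * 12 + month`, so consecutive months always differ by 1, including December to January. This is a minimal sketch that assumes `streaks.counter` treats periods one unit apart as consecutive, as the outputs below suggest; the dates are made up.

```python
import pandas as pd

# Made-up dates that cross a year boundary and contain one gap
dates = pd.Series(['2020-11', '2020-12', '2021-01', '2021-03'])

# Parse 'YYYY-MM' strings, then turn them into a running month count
parsed = pd.to_datetime(dates, format='%Y-%m')
date_num = parsed.dt.year * 12 + parsed.dt.month

# Assumption: consecutive periods differ by exactly 1, so a gap shows up as a larger step
print(date_num.diff().tolist())
```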


### Option 1

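Get the current streak of each row (`return_grouped=False`). The streak counts consecutive periods within a group and resets to 1 after a gap; for example, group `a` restarts at period `4` because period `3` was dropped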

In [5]:
streaks.counter(id_array=t['group'], period_array=t['date_num'], return_grouped=False)

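|   | group | period | streak |
|---|-------|--------|--------|
| 0 | a | 1 | 1 |
| 1 | a | 2 | 2 |
| 2 | a | 4 | 1 |
| 3 | a | 5 | 2 |
| 4 | a | 6 | 3 |
| 5 | a | 7 | 4 |
| 6 | a | 8 | 5 |
| 7 | a | 9 | 6 |
| 8 | a | 10 | 7 |
| 9 | a | 11 | 8 |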
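To make the per-row definition concrete, here is a pure-`pandas` sketch that produces the same kind of running count. It is not how `streaks.counter` is implemented (that code lives in the linked repository); `running_streak` is a hypothetical helper that assumes each `(group, period)` pair appears at most once.

```python
import pandas as pd

def running_streak(df, id_col, period_col):
    """Hypothetical helper (not part of streaks): current run length at each row."""
    df = df.sort_values([id_col, period_col])
    # A new run starts when the id changes or the period does not advance by exactly 1
    new_run = (df[id_col] != df[id_col].shift()) | (df[period_col].diff() != 1)
    run_id = new_run.cumsum()
    # Number the rows inside each run: 1, 2, 3, ...
    return df.assign(streak=df.groupby(run_id).cumcount() + 1)

t = pd.DataFrame({'group': ['a'] * 5 + ['b'] * 3,
                  'date_num': [1, 2, 4, 5, 6, 1, 3, 4]})
out = running_streak(t, 'group', 'date_num')
print(out)
```

Grouping by a cumulative sum of "run starts" is a common `pandas` idiom for labelling consecutive runs; `cumcount` then numbers the rows within each run.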


### Option 2

Get the longest streak of each group (`return_grouped=True`)

In [6]:
streaks.counter(id_array=t['group'], period_array=t['date_num'], return_grouped=True)

|   | group | streak |
|---|-------|--------|
| 0 | a | 9 |
| 1 | b | 5 |
| 2 | c | 5 |
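The grouped result is the maximum run length per group. A pure-`pandas` sketch of that aggregation follows; `longest_streak` is a hypothetical helper, not the `streaks` implementation, and again assumes unique `(group, period)` pairs. In the made-up data, group `a` misses only period `3`, so its longest run is periods `4` to `12` (9 periods), which mirrors group `a` above.

```python
import pandas as pd

def longest_streak(df, id_col, period_col):
    """Hypothetical helper (not part of streaks): longest consecutive run per id."""
    df = df.sort_values([id_col, period_col])
    # Label consecutive runs: a new run starts on an id change or a period gap
    new_run = (df[id_col] != df[id_col].shift()) | (df[period_col].diff() != 1)
    run_id = new_run.cumsum().rename('run')
    # Length of every run, then the longest run within each id
    run_len = df.groupby([df[id_col], run_id]).size()
    return run_len.groupby(level=0).max().rename('streak').reset_index()

t = pd.DataFrame({'group': ['a'] * 11 + ['b'] * 5,
                  'date_num': [1, 2] + list(range(4, 13)) + [1, 2, 3, 5, 6]})
res = longest_streak(t, 'group', 'date_num')
print(res)
```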
