# How long has each report been running?

**We are still considering active reports!**


In [1]:
import pandas as pd
import datetime as dt

Loading the active reports data.

In [2]:
active_reports = pd.read_csv("datasets/active_reports.csv")

Building a function that computes, for each report, the number of days between its last change (`LastModifiedDate`) and its last run (`LastRunDate`), counting both days.

In [3]:
def calc_running_days_span(report):
    # Keep only the calendar dates; the time of day is ignored
    last_run = dt.datetime.fromisoformat(report.LastRunDate).date()
    last_modified = dt.datetime.fromisoformat(report.LastModifiedDate).date()
    # +1 makes the span inclusive, so a report modified and run on the same day counts as 1 day
    return (last_run - last_modified).days + 1
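To make the `+ 1` adjustment concrete, here is a minimal sketch that calls the function on two hand-made rows. The timestamps are hypothetical, not taken from the dataset. It also shows that only calendar dates matter: two events two minutes apart across midnight yield a span of 2 days.

```python
import datetime as dt

import pandas as pd


def calc_running_days_span(report):
    # Same logic as the notebook cell: compare calendar dates, inclusive span
    last_run = dt.datetime.fromisoformat(report.LastRunDate).date()
    last_modified = dt.datetime.fromisoformat(report.LastModifiedDate).date()
    return (last_run - last_modified).days + 1


# Hypothetical rows; a Series supports the same attribute access as an apply() row
same_day = pd.Series({"LastModifiedDate": "2023-03-01T08:00:00", "LastRunDate": "2023-03-01T17:30:00"})
across_midnight = pd.Series({"LastModifiedDate": "2023-03-01T23:59:00", "LastRunDate": "2023-03-02T00:01:00"})

print(calc_running_days_span(same_day))         # 1
print(calc_running_days_span(across_midnight))  # 2
```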

Adding a new column, `RunningDaysSpan`, to store this value for each row.

In [4]:
active_reports['RunningDaysSpan'] = active_reports.apply(calc_running_days_span, axis=1)
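A row-wise `apply` is typically slower than vectorized column operations on large frames. Below is a minimal vectorized sketch of the same inclusive span, using `pd.to_datetime` and `Series.dt.normalize()` (which drops the time of day). The three rows are made up; only the column names come from the notebook. As a side benefit, `pd.to_datetime` accepts a trailing `Z` in ISO timestamps, which `datetime.fromisoformat` rejects before Python 3.11.

```python
import pandas as pd

# Hypothetical sample rows; column names match the notebook, values are made up
sample = pd.DataFrame({
    "REPORT_ID_DERIVED": ["A", "A", "B"],
    "LastModifiedDate": ["2023-03-01T08:00:00", "2023-03-01T08:00:00", "2023-03-10T12:00:00"],
    "LastRunDate": ["2023-03-05T09:15:00", "2023-03-05T09:15:00", "2023-03-10T18:00:00"],
})

# Truncate both timestamps to midnight so only calendar dates are compared
last_run = pd.to_datetime(sample["LastRunDate"]).dt.normalize()
last_modified = pd.to_datetime(sample["LastModifiedDate"]).dt.normalize()

# Whole-day difference, +1 for an inclusive span (same convention as calc_running_days_span)
sample["RunningDaysSpan"] = (last_run - last_modified).dt.days + 1
print(sample["RunningDaysSpan"].tolist())  # [5, 5, 1]
```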

Below is a preview of the per-report summary built from this column: for each report, the number of runs (`RunCount`) and its `RunningDaysSpan`.

In [5]:
active_reports[['REPORT_ID_DERIVED', 'LastRunDate', 'RunningDaysSpan']]\
    .groupby('REPORT_ID_DERIVED')\
    .agg({'LastRunDate': 'count', 'RunningDaysSpan': 'first'})\
    .rename(columns={'LastRunDate': 'RunCount'})\
    .head()

| REPORT_ID_DERIVED | RunCount | RunningDaysSpan |
|---|---:|---:|
| 00O0b000004AnheEAC | 152 | 4 |
| 00O0b000004kTazEAE | 5617 | 22 |
| 00O0b000004kkZKEAY | 305 | 4 |
| 00O2R000003JSoUUAW | 344 | 26 |
| 00O2R000003JUDGUA4 | 22 | 8 |


From this summary, we can evaluate how often each report was run during its running period.

In [6]:
runtime_span = active_reports[['REPORT_ID_DERIVED', 'LastRunDate', 'RunningDaysSpan']]\
    .groupby('REPORT_ID_DERIVED')\
    .agg({'LastRunDate': 'count', 'RunningDaysSpan': 'first'})\
    .rename(columns={'LastRunDate': 'RunCount'})\
    .reset_index()
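The `'first'` aggregation silently assumes that every row of a report carries the same `RunningDaysSpan`; if the span varied within a report, `'first'` would just pick whichever row comes first. Below is a minimal sketch, on made-up rows, that checks this assumption and builds the same summary with pandas named aggregation (available since pandas 0.25), which names the output columns directly instead of renaming afterwards.

```python
import pandas as pd

# Made-up rows: one row per run, with the report-level values repeated on each row
rows = pd.DataFrame({
    "REPORT_ID_DERIVED": ["A", "A", "A", "B"],
    "LastRunDate": ["2023-03-05", "2023-03-05", "2023-03-05", "2023-03-10"],
    "RunningDaysSpan": [5, 5, 5, 1],
})

# 'first' is only meaningful if the span is constant within each report
spans_per_report = rows.groupby("REPORT_ID_DERIVED")["RunningDaysSpan"].nunique()
assert (spans_per_report == 1).all(), "RunningDaysSpan varies within a report"

# Named aggregation: output_column=(input_column, aggregation)
summary = (
    rows.groupby("REPORT_ID_DERIVED")
    .agg(RunCount=("LastRunDate", "count"), RunningDaysSpan=("RunningDaysSpan", "first"))
    .reset_index()
)
print(summary)
```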

In [7]:
days_span_mean = runtime_span.RunningDaysSpan.mean()

In [8]:
days_span_mean

14.744186046511627

On average, the reports in our sample have been running for about 14.7 days.

In [9]:
runtime_span.RunningDaysSpan.min(), runtime_span.RunningDaysSpan.max()

(1, 27)

The spans range from 1 to 27 days.
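The mean, minimum and maximum can also be collected in a single call with `Series.agg`. With spans spread this widely, the median is a useful companion to the mean. The sketch below uses made-up span values standing in for `runtime_span.RunningDaysSpan`.

```python
import pandas as pd

# Made-up spans standing in for runtime_span.RunningDaysSpan
spans = pd.Series([1, 4, 4, 8, 22, 26, 27], name="RunningDaysSpan")

# One Series indexed by statistic name; describe() would also add std and quartiles
stats = spans.agg(["mean", "median", "min", "max"])
print(stats)
```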

To rank the reports, we can either use the length of their running period alone or also take into account how many executions took place within that period, as with the most executed reports of the last 30 days. Let's investigate both.

### Reports that have been running the longest

In [10]:
runtime_span.sort_values('RunningDaysSpan', ascending=False).head()

|    | REPORT_ID_DERIVED | RunCount | RunningDaysSpan |
|---:|---|---:|---:|
| 23 | 00O6P000000ZLmYUAW | 97 | 27 |
| 3 | 00O2R000003JSoUUAW | 344 | 26 |
| 25 | 00O6P000000ZNLzUAO | 14 | 25 |
| 14 | 00O2R000004Im1oUAC | 9 | 23 |
| 39 | 00O6P000001B7qrUAC | 1 | 23 |


In [11]:
running_the_longest = runtime_span.sort_values('RunningDaysSpan', ascending=False)
running_the_longest.to_csv("./datasets/active_reports_that_have_been_running_the_longest.csv", index=False)
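In the preview, two reports tie at 23 days. When sorting on a single column, `sort_values` defaults to `kind='quicksort'`, which is not stable, so tied rows may come out in any order in the exported CSV. The sketch below uses made-up rows and breaks ties by `RunCount`. Choosing `RunCount` as the tie-breaker is an assumption for illustration, not part of the original analysis.

```python
import pandas as pd

# Made-up summary rows with a tie on RunningDaysSpan
runtime_span = pd.DataFrame({
    "REPORT_ID_DERIVED": ["X", "Y", "Z"],
    "RunCount": [9, 1, 97],
    "RunningDaysSpan": [23, 23, 27],
})

# Longest span first; among equal spans, the more frequently run report first
ranked = runtime_span.sort_values(
    ["RunningDaysSpan", "RunCount"], ascending=[False, False]
).reset_index(drop=True)
print(ranked)
```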

### Reports that run most often within their period

To estimate this, we divide the number of runs by the length of the running period. Let's call this ratio `AvgDailyRun`, the average number of runs per day.
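Written out, with report `00O0b000004kTazEAE` (5617 runs over 22 days) as a worked example:

```latex
\text{AvgDailyRun} = \frac{\text{RunCount}}{\text{RunningDaysSpan}},
\qquad
\text{e.g.}\quad \frac{5617}{22} \approx 255.32 \text{ runs per day}
```

Because `calc_running_days_span` adds 1, `RunningDaysSpan` is at least 1, so the denominator is never zero.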

In [12]:
runtime_span['AvgDailyRun'] = runtime_span.RunCount / runtime_span.RunningDaysSpan

In [13]:
runtime_span.sort_values('AvgDailyRun', ascending=False).head()

|   | REPORT_ID_DERIVED | RunCount | RunningDaysSpan | AvgDailyRun |
|---:|---|---:|---:|---:|
| 1 | 00O0b000004kTazEAE | 5617 | 22 | 255.318182 |
| 2 | 00O0b000004kkZKEAY | 305 | 4 | 76.25 |
| 0 | 00O0b000004AnheEAC | 152 | 4 | 38.0 |
| 5 | 00O2R000003s2NZUAY | 124 | 6 | 20.666667 |
| 3 | 00O2R000003JSoUUAW | 344 | 26 | 13.230769 |


In [14]:
running_the_longest_in_period = runtime_span.sort_values('AvgDailyRun', ascending=False)
running_the_longest_in_period.to_csv("./datasets/active_reports_that_have_been_running_the_longest_in_period.csv", index=False)