# How long has each report been running?

**We are still considering only active reports!**

---

>Questions answered in this notebook:
>- [x] What is the execution period of these reports?


In [1]:
import pandas as pd
import datetime as dt

Loading the active reports data.

In [2]:
active_reports = pd.read_csv("datasets/active_reports.csv", low_memory=False)

## 1. What is the execution period of these reports?

Building two functions that count the days between a fixed reference date (`ref_date`, 2022-06-20) and each report's creation date (`CreatedDate`) or last change date (`LastModifiedDate`).

In [3]:
ref_date = dt.date(2022, 6, 20)

In [4]:
def running_days_since_creation(report):
    # Days between the report's creation date and ref_date
    delta = ref_date - dt.datetime.fromisoformat(report.CreatedDate).date()
    return delta.days + 1  # inclusive count: a report created on ref_date counts as 1 day, not 0

In [5]:
def running_days_since_last_modification(report):
    # Days between the report's last modification date and ref_date
    delta = ref_date - dt.datetime.fromisoformat(report.LastModifiedDate).date()
    return delta.days + 1  # inclusive count: a report modified on ref_date counts as 1 day, not 0
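Both functions count days inclusively: the `+ 1` makes a report created (or modified) on `ref_date` count as 1 day instead of 0. A minimal standalone check of that convention, using a hypothetical helper `inclusive_days` and made-up dates:

```python
import datetime as dt

REF_DATE = dt.date(2022, 6, 20)  # same reference date as the notebook

def inclusive_days(start: dt.date, ref: dt.date = REF_DATE) -> int:
    """Hypothetical helper: days from start to ref, counting both ends."""
    return (ref - start).days + 1

print(inclusive_days(dt.date(2022, 6, 20)))  # created on the reference date -> 1
print(inclusive_days(dt.date(2022, 6, 3)))   # 17 days earlier -> 18
print(inclusive_days(dt.date(2021, 6, 20)))  # one (non-leap) year earlier -> 366
```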

Adding two new columns, `DaysSinceCreation` and `DaysSinceLastModifiedDate`, computed row by row for each report.

In [6]:
# Apply each function row by row (axis=1 passes one report per call)
active_reports['DaysSinceCreation'] = active_reports.apply(running_days_since_creation, axis=1)

active_reports['DaysSinceLastModifiedDate'] = active_reports.apply(running_days_since_last_modification, axis=1)
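`DataFrame.apply` with `axis=1` calls the function once per row. A vectorized sketch of the same inclusive count with `pd.to_datetime`, assuming the date columns hold ISO-8601 strings without a UTC offset (the toy frame and the helper name `add_day_counts` are hypothetical; strings carrying offsets would parse to timezone-aware values that must be handled before subtracting a naive date):

```python
import pandas as pd

REF_DATE = pd.Timestamp(2022, 6, 20)  # same reference date as the notebook

def add_day_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: vectorized DaysSinceCreation / DaysSinceLastModifiedDate."""
    out = df.copy()
    for src, dst in [("CreatedDate", "DaysSinceCreation"),
                     ("LastModifiedDate", "DaysSinceLastModifiedDate")]:
        # Parse the whole column at once and drop the time of day
        day = pd.to_datetime(out[src]).dt.normalize()
        # Inclusive count, matching the "+ 1" in the row-wise functions
        out[dst] = (REF_DATE - day).dt.days + 1
    return out

# Made-up timestamps, not taken from the real dataset
toy = pd.DataFrame({
    "CreatedDate": ["2022-06-20T10:15:00", "2021-06-20T23:59:59"],
    "LastModifiedDate": ["2022-06-20T11:00:00", "2022-06-03T08:00:00"],
})
print(add_day_counts(toy)[["DaysSinceCreation", "DaysSinceLastModifiedDate"]])
```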

### 1.1. Loading run report logs

Loading a one-day sample of the report logs (2022-06-04).

In [7]:
report_logs = pd.read_csv('../data/Salesforce/ELF/Report/2022-06-04_Report.csv', low_memory=False)

In [8]:
report_logs.shape

(73258, 31)
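Only `REPORT_ID_DERIVED` is used from this log in the rest of the notebook, so one option is to read just that column with `read_csv`'s `usecols` parameter, which can reduce memory on large daily files. A minimal sketch on a made-up in-memory CSV (the other column names are placeholders, not the real log schema):

```python
import io
import pandas as pd

# Made-up stand-in for a daily report log; only REPORT_ID_DERIVED matters here
csv_text = """EVENT_TYPE,REPORT_ID_DERIVED,OTHER_FIELD
Report,00Oa,x
Report,00Ob,y
Report,00Oa,z
"""

ids_only = pd.read_csv(io.StringIO(csv_text), usecols=["REPORT_ID_DERIVED"])
print(ids_only.shape)  # (3, 1): one column instead of three
```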

In [9]:
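# Runs per report ID (each log row counted as one run). rename_axis + reset_index(name=...)
# produces Id/RunCount columns on both pandas 1.x and 2.x
report_run_count = report_logs.REPORT_ID_DERIVED\
    .value_counts()\
    .rename_axis('Id')\
    .reset_index(name='RunCount')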

In [10]:
report_run_count.head() 

|   | Id | RunCount |
|---|---|---|
| 0 | 00O2R000003zUb9UAE | 12926 |
| 1 | 00O0b000006iNwdEAE | 8817 |
| 2 | 00O0b000004kTazEAE | 8347 |
| 3 | 00O0b000004kkZKEAY | 6612 |
| 4 | 00O2R000003zUTgUAM | 5397 |


Run counts for 2022-06-04 (our one-day sample).
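Each row of the log is treated as one run, so the run count is a frequency count of report IDs. A toy sketch (made-up IDs) of the `value_counts`, `rename_axis`, `reset_index(name=...)` pattern, which yields `Id`/`RunCount` columns whether the pandas version names the counts column `count` (2.x) or after the original column (1.x):

```python
import pandas as pd

# Made-up log: one row per report execution
logs = pd.DataFrame({"REPORT_ID_DERIVED": ["r1", "r2", "r1", "r3", "r1", "r2"]})

run_count = (
    logs["REPORT_ID_DERIVED"]
    .value_counts()                 # runs per report ID, most frequent first
    .rename_axis("Id")              # name the index so it becomes the Id column
    .reset_index(name="RunCount")   # name the counts column explicitly
)
print(run_count)
```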

Keeping only the active reports that appear in the sample and attaching their run counts (`pd.merge` performs an inner join on `Id` by default).

In [11]:
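# Inner join (the pd.merge default): active reports absent from the 2022-06-04 log are dropped
active_reports = pd.merge(left=active_reports, right=report_run_count, on='Id', how='inner')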

In [12]:
active_reports.shape

(458, 23)
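Because the merge is an inner join, active reports that did not run on 2022-06-04 are dropped rather than kept with a zero count. If those matter, a left join with `indicator=True` shows which rows had no match. A toy sketch with made-up IDs (the two frames are hypothetical stand-ins for `active_reports` and `report_run_count`):

```python
import pandas as pd

# Made-up stand-ins for active_reports and report_run_count
active = pd.DataFrame({"Id": ["r1", "r2", "r3"], "Name": ["A", "B", "C"]})
counts = pd.DataFrame({"Id": ["r1", "r3", "r9"], "RunCount": [5, 2, 7]})

inner = pd.merge(active, counts, on="Id")  # default how="inner": r2 and r9 drop out

left = pd.merge(active, counts, on="Id", how="left", indicator=True)
# Active reports with no row in the log
never_ran = left.loc[left["_merge"] == "left_only", "Id"].tolist()
# Treat "no log rows" as zero runs
left["RunCount"] = left["RunCount"].fillna(0).astype(int)

print(inner.shape, never_ran)
```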

Preview of the resulting dataframe, sorted by `RunCount` in descending order.

In [13]:
# Keep the columns of interest, most-run reports first
run_span = active_reports[['Name', 'RunCount', 'DaysSinceCreation', 'DaysSinceLastModifiedDate']]\
    .sort_values('RunCount', ascending=False)\
    .rename(columns={'Name': 'ReportName'})

run_span.head()

|   | ReportName | RunCount | DaysSinceCreation | DaysSinceLastModifiedDate |
|---|---|---|---|---|
| 1 | ATM | 12926 | 391 | 288 |
| 4 | New Email By Team - w\o Sup filter | 8817 | 1130 | 103 |
| 8 | My Cases and Tasks | 8347 | 1452 | 22 |
| 17 | Agent Timesheet_Omni | 6612 | 1319 | 4 |
| 50 | Pankaj's Report | 5397 | 399 | 102 |


Most recent and oldest reports, including ties

In [14]:
# Fewest days since creation = most recent report; most days = oldest report
most_recent, oldest = run_span.DaysSinceCreation.min(), run_span.DaysSinceCreation.max()

In [15]:
run_span[(run_span.DaysSinceCreation == oldest) | (run_span.DaysSinceCreation == most_recent)]\
    .sort_values(['DaysSinceCreation', 'RunCount'], ascending=[True, False])

|   | ReportName | RunCount | DaysSinceCreation | DaysSinceLastModifiedDate |
|---|---|---|---|---|
| 381 | AMER PDP PowerScale - Config Scheduled | 19 | 18 | 18 |
| 382 | PowerScale - WIP - Unscheduled Execute | 4 | 18 | 18 |
| 14 | Chat Average Handle Time | 893 | 1591 | 766 |
| 162 | Cases by Priority | 79 | 1591 | 157 |
| 27 | Cases by Status | 63 | 1591 | 585 |
| 87 | Completed Chat Sessions | 54 | 1591 | 857 |
| 290 | Overdue Tasks by Case | 18 | 1591 | 857 |
| 251 | Cases by Age & Status | 2 | 1591 | 319 |
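Several reports share the extreme values (two at 18 days, six at 1591 days), which is why filtering on equality with `min()`/`max()` returns more than two rows. `DataFrame.nsmallest`/`nlargest` with `keep='all'` express the same tie-inclusive selection directly. A toy sketch with made-up values (the frame is a hypothetical stand-in for `run_span`):

```python
import pandas as pd

# Made-up stand-in for run_span
span = pd.DataFrame({
    "ReportName": ["A", "B", "C", "D", "E"],
    "RunCount": [19, 4, 893, 79, 12],
    "DaysSinceCreation": [18, 18, 1591, 1591, 400],
})

# Smallest DaysSinceCreation = most recent; keep="all" retains every tied row
most_recent = span.nsmallest(1, "DaysSinceCreation", keep="all")
# Largest DaysSinceCreation = oldest
oldest = span.nlargest(1, "DaysSinceCreation", keep="all")

print(most_recent["ReportName"].tolist(), oldest["ReportName"].tolist())
```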
