# Problematic Reports Regarding Page Load Time

---

>Questions answered in this notebook:
>- [x] What is a problematic report regarding page load time?
>- [x] How to retrieve the report ID from PAGE_URL in page view logs?
>- [x] How to select reports that take too long to load?
>- [x] Which reports are both active and problematic regarding page load time?

In [1]:
import pandas as pd
import numpy as np

Loading active reports.

In [2]:
active_reports = pd.read_csv("../datasets/active_reports.csv", low_memory=False)

## 1. Problematic reports regarding time

A report is considered problematic when it takes more than 60 seconds to load, counted from `PAGE_START_TIME`. This follows the definition of `EFFECTIVE_PAGE_TIME` given in the Salesforce documentation for Lightning Page View event logs.

Loading a sample of page view logs.

In [3]:
pageview_logs = pd.read_csv("../../data/Salesforce/ELF/LightningPageView/2022-06-04_LightningPageView.csv")


In [4]:
pageview_logs.shape

(792349, 50)

### 1.1. Retrieving Report ID from PAGE_URL

Removing missing values for `PAGE_URL`.

In [5]:
pageview_logs.dropna(subset=['PAGE_URL'], inplace=True)

In [6]:
pageview_logs.shape

(789728, 50)

Keeping only page views whose endpoint runs a report.

In [7]:
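import re

# Matches URLs of the form /lightning/r/<type>/<18-character ID>.
pattern = re.compile(r'/lightning/r/(?P<report_type>[a-zA-Z]{4,})/(?P<report_id>[0-9a-zA-Z]{18})')

def filter_run_report_endpoints(pattern, url, field):
    """Return the named group `field` if `url` matches `pattern`, else None."""
    m = pattern.match(url)  # anchored at the start of the URL, like re.match
    if m:
        return m.group(field)
    return None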

In [8]:
pageview_logs['REPORT_ID_DERIVED'] =\
    pageview_logs.PAGE_URL.apply(lambda url: filter_run_report_endpoints(pattern, url, 'report_id'))

In [9]:
pageview_logs['REPORT_TYPE_DERIVED'] =\
    pageview_logs.PAGE_URL.apply(lambda url: filter_run_report_endpoints(pattern, url, 'report_type'))
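The two `apply` calls above could also be written as a single vectorized step with `Series.str.extract`, which returns one column per named group. The sketch below is only an illustration on hypothetical URLs (the sample values and the `PATTERN`, `sample_urls`, and `derived` names are assumptions, not taken from the logs). Note that `str.extract` searches anywhere in the string, so a leading `^` is added to keep the start-anchored behaviour of `re.match`.

```python
import pandas as pd

# Same pattern as in the notebook; '/' needs no escaping in a Python regex.
PATTERN = r'/lightning/r/(?P<report_type>[a-zA-Z]{4,})/(?P<report_id>[0-9a-zA-Z]{18})'

# Hypothetical URLs, for illustration only.
sample_urls = pd.Series([
    '/lightning/r/Report/00O5e000004XyZaEAK/view',  # matches the pattern
    '/lightning/page/home',                          # no match
    None,                                            # missing URL
])

# '^' mimics re.match, which only matches at the start of the string.
# Rows that do not match (or are missing) get NaN in every extracted column.
derived = sample_urls.str.extract('^' + PATTERN)

print(derived)
```

Rows without a match can then be dropped with `dropna(subset=['report_id'])`, as in the next step of the notebook.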

Removing missing values after endpoint filter.

In [10]:
pageview_logs.dropna(subset=['REPORT_ID_DERIVED'], inplace=True)

In [11]:
pageview_logs.shape

(478779, 52)

Checking for inconsistent report IDs.

In [12]:
assert pageview_logs[pageview_logs.REPORT_ID_DERIVED.str.len()==18].shape == pageview_logs.shape

### 1.2. Filtering pages that take longer to load

Page views that take more than 60 s to load, or that end in an error, have `NaN` in the `EFFECTIVE_PAGE_TIME` column.

In [13]:
take_longer_to_load_pageview_logs = pageview_logs[pageview_logs.EFFECTIVE_PAGE_TIME.isna()]

To capture those that take longer than 60 s, we can check the `DURATION` column. For reports with more than one page view log, we take the maximum duration.

In [14]:
take_longer_to_load_pageview_logs = take_longer_to_load_pageview_logs[['REPORT_ID_DERIVED', 'DURATION']]\
    .groupby('REPORT_ID_DERIVED')\
    .max()\
    .reset_index()

In [15]:
take_longer_to_load_pageview_logs =\
    take_longer_to_load_pageview_logs[take_longer_to_load_pageview_logs.DURATION > 60000] # 60000ms = 60s
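The three steps above (missing `EFFECTIVE_PAGE_TIME`, maximum `DURATION` per report, 60 s threshold) can be checked on a small toy frame. This is a minimal sketch with made-up values; the report IDs `A`, `B`, `C` and the names `toy_logs`, `THRESHOLD_MS`, and `slow_reports` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical page view logs; DURATION is in milliseconds.
toy_logs = pd.DataFrame({
    'REPORT_ID_DERIVED': ['A', 'A', 'B', 'C', 'C'],
    'EFFECTIVE_PAGE_TIME': [np.nan, np.nan, np.nan, 1200.0, np.nan],
    'DURATION': [30_000, 75_000, 45_000, 90_000, 10_000],
})

THRESHOLD_MS = 60_000  # 60 s

# Step 1: keep only page views without an effective page time.
missing_ept = toy_logs[toy_logs['EFFECTIVE_PAGE_TIME'].isna()]

# Step 2: longest such page view per report.
max_duration = missing_ept.groupby('REPORT_ID_DERIVED')['DURATION'].max()

# Step 3: keep reports whose longest page view exceeds the threshold.
slow_reports = max_duration[max_duration > THRESHOLD_MS].index.tolist()

print(slow_reports)
```

Report `C` is not selected even though one of its page views lasted 90 s, because that page view has a valid `EFFECTIVE_PAGE_TIME` and is dropped in the first step.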

In [16]:
take_longer_to_load_pageview_logs.shape

(777, 2)

In [17]:
reports_on_pages_that_take_longer_to_load =\
    take_longer_to_load_pageview_logs.REPORT_ID_DERIVED.unique()

In [18]:
reports_on_pages_that_take_longer_to_load.shape

(777,)

### 1.3. Active & problematic reports

In [19]:
active_and_problematic_reports =\
    active_reports[active_reports.Id.isin(reports_on_pages_that_take_longer_to_load)].copy()  # copy so a column can be added safely

In [20]:
active_and_problematic_reports.shape

(60, 20)

Adding a flag that marks these reports as problematic regarding page load time.

In [22]:
active_and_problematic_reports['IsPageViewProblematic'] = True
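The filter-and-flag steps above can be sketched on toy data. The example assumes hypothetical IDs and names (`toy_active`, `slow_ids`, `subset`) and illustrates two points: `Series.isin` gives the same membership mask as an element-wise `in` check, and taking an explicit `.copy()` of the filtered frame before adding a column avoids pandas' `SettingWithCopyWarning`.

```python
import numpy as np
import pandas as pd

# Hypothetical active reports and slow report IDs, for illustration only.
toy_active = pd.DataFrame({'Id': ['A', 'B', 'C'], 'Name': ['r1', 'r2', 'r3']})
slow_ids = np.array(['A', 'C', 'Z'])

# Element-wise membership check, one Python call per row.
mask_apply = toy_active['Id'].apply(lambda report_id: report_id in slow_ids)

# Vectorized equivalent.
mask_isin = toy_active['Id'].isin(slow_ids)

# Copy the filtered rows so the new column is added to an independent frame.
subset = toy_active[mask_isin].copy()
subset['IsPageViewProblematic'] = True

print(subset)
```

The original `toy_active` frame is left unchanged, so the flag only exists on the filtered copy.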

Storing a dataset with only active and problematic reports regarding page load time.

In [23]:
active_and_problematic_reports.to_csv('datasets/pageview_problematic_reports.csv', index=False)