# Panopto Data Analysis

Panopto reports "Stream Source" in its "Sessions created or edited" report, but usage data (including the date last viewed) live in the "Session usage" report. Below, I use the pandas data manipulation library to join these two reports on Session ID and compile some statistics.

In [17]:
import numpy as np
import pandas as pd

sessions_file = 'sessionscreatedoredited.csv'
usage_file = 'sessionusage.csv'
sessions = pd.read_csv(sessions_file, low_memory=False).set_index('Session ID')
usage = pd.read_csv(usage_file, low_memory=False).set_index('Session ID')
# trim redundant columns from usage so we can cleanly join them
usage = usage.drop([c for c in sessions.columns if c in usage.columns], axis='columns')
# convert most recent view to datetime
usage['Most Recent View Date'] = pd.to_datetime(usage['Most Recent View Date'])
data = sessions.join(usage)
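
To illustrate why the overlapping columns are dropped before joining, here is a minimal sketch on made-up data. The frames `sessions_toy` and `usage_toy` and their values are hypothetical and are not taken from the Panopto exports. Without the drop, `DataFrame.join` raises a `ValueError` because both frames share a column name and no suffix is given.

```python
import pandas as pd

# Hypothetical stand-ins for the two Panopto reports
sessions_toy = pd.DataFrame(
    {
        "Session ID": ["a", "b", "c"],
        "Session Name": ["Lecture 1", "Lecture 2", "Demo"],
        "Stream Source": ["Zoom", "Web API", "Zoom"],
    }
).set_index("Session ID")

usage_toy = pd.DataFrame(
    {
        "Session ID": ["a", "b"],
        "Session Name": ["Lecture 1", "Lecture 2"],
        "Views and Downloads": [3, 1],
    }
).set_index("Session ID")

# Drop columns that already exist in sessions_toy so the join has no name clash
overlap = [c for c in sessions_toy.columns if c in usage_toy.columns]
usage_trimmed = usage_toy.drop(columns=overlap)

# Left join on the shared index; sessions without a usage row get NaN
joined = sessions_toy.join(usage_trimmed)
print(joined)
```

Session `c` has no usage row, so its "Views and Downloads" value is NaN after the join. The `isnull()` filter used further down relies on exactly that.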

## Total Hours

In [18]:
# derivative columns
data['hours'] = data['Session Length'] / 60
data['Root Folder'] = data['Root Folder (Level 0)']
# @TODO a "lowest-level folder" of sorts might be useful but complicated to code
# build a path from the five folder levels; missing levels show up as 'nan'
level_cols = ['Root Folder (Level 0)'] + [f'Subfolder (Level {n})' for n in range(1, 5)]
data['folder path'] = '/' + data[level_cols].astype(str).agg('/'.join, axis=1)
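
One possible approach to the "lowest-level folder" TODO above is to take the last non-null value across the folder-level columns in each row. This is only a sketch on made-up rows: the `deepest_folder` helper and the `lowest folder` column are hypothetical names, and the toy frame uses three levels instead of five.

```python
import numpy as np
import pandas as pd

# Hypothetical folder levels; NaN marks levels that do not exist for a session
level_cols = ["Root Folder (Level 0)", "Subfolder (Level 1)", "Subfolder (Level 2)"]
toy = pd.DataFrame(
    [
        ["Moodle", "Course A", "Week 1"],
        ["Users", np.nan, np.nan],
        ["Moodle", "Course A", np.nan],
        [np.nan, np.nan, np.nan],
    ],
    columns=level_cols,
)


def deepest_folder(row):
    """Return the last non-null folder level in a row, or None if all are missing."""
    present = row.dropna()
    return present.iloc[-1] if len(present) else None


toy["lowest folder"] = toy[level_cols].apply(deepest_folder, axis=1)
print(toy["lowest folder"].tolist())
```

A row-wise `apply` is slow on large frames but easy to read. If performance matters, a vectorized alternative would be worth exploring.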

In [19]:
table = data[['Stream Source', 'hours']].groupby(['Stream Source']).sum().sort_values('hours', ascending=False)
table.loc['Total'] = table.sum()
table

| Stream Source | hours |
|---|---|
| Zoom | 16169.95343 |
| Web API | 3188.285008 |
| Panopto for Mac | 134.287582 |
| Panopto for Windows | 71.140858 |
| Panopto Capture | 48.810205 |
| Mixed | 5.244231 |
| RTMP | 4.475007 |
| iOS | 3.401765 |
| Unspecified | 1.119587 |
| Android | 0.414145 |
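
The group, sum, and sort pattern used for these tables, along with the appended "Total" row, can be seen on a small example. The data below is invented for illustration and is not drawn from the reports.

```python
import pandas as pd

# Hypothetical sessions with a source and a length in hours
toy = pd.DataFrame(
    {
        "Stream Source": ["Zoom", "Zoom", "Web API", "iOS"],
        "hours": [2.0, 3.0, 1.5, 0.5],
    }
)

# Sum hours per source, largest first
table = (
    toy.groupby("Stream Source")[["hours"]]
    .sum()
    .sort_values("hours", ascending=False)
)

# Assigning to a new index label appends a row holding the column sums
table.loc["Total"] = table.sum()
print(table)
```

Because the "Total" row is added after sorting, it stays at the bottom of the table.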


In [20]:
table = data[['Root Folder', 'hours']].groupby(['Root Folder']).sum().sort_values('hours', ascending=False)
table

| Root Folder | hours |
|---|---|
| Users | 15941.91269 |
| Moodle | 3454.757359 |
| Libraries | 154.739848 |
| CCA Departments | 65.413153 |
| Tutorials | 6.732931 |
| First Year Program | 3.375854 |


### Unwatched Hours by Source

In [21]:

unwatched = data[data['Views and Downloads'].isnull()]
table = unwatched[['Stream Source', 'hours']].groupby(['Stream Source']).sum().sort_values('hours', ascending=False)
table.loc['Total'] = table.sum()
table

| Stream Source | hours |
|---|---|
| Zoom | 13313.311031 |
| Web API | 513.479401 |
| Panopto for Mac | 15.08516 |
| Panopto Capture | 5.897811 |
| Panopto for Windows | 5.234186 |
| iOS | 1.794449 |
| Mixed | 1.686691 |
| Android | 0.252345 |
| RTMP | 0.0 |
| Unspecified | 0.0 |


"Web API" refers to uploads via the website. See the list of "Stream Source" values on their reports documentation page: https://support.panopto.com/s/article/System-Usage-Report-Fields

### Unwatched Hours by Root Folder

In [22]:
unwatched[['Root Folder', 'hours']].groupby(['Root Folder']).sum().sort_values('hours', ascending=False)

| Root Folder | hours |
|---|---|
| Users | 13155.459902 |
| Moodle | 682.489447 |
| Libraries | 16.602274 |
| CCA Departments | 1.927585 |
| Tutorials | 0.261867 |


Most unwatched hours come from Zoom and sit in Users folders. There is still a significant amount of unwatched video in the Moodle folder hierarchy, though (almost 700 hours).

### Watched Hours by Source

In [23]:

watched = data[data['Views and Downloads'] >= 1]
table = watched[['Stream Source', 'hours']].groupby(['Stream Source']).sum().sort_values('hours', ascending=False)
table.loc['Total'] = table.sum()
print('Watched Hours by Source')
table

Watched Hours by Source


| Stream Source | hours |
|---|---|
| Zoom | 2856.642399 |
| Web API | 2674.805607 |
| Panopto for Mac | 119.202422 |
| Panopto for Windows | 65.906672 |
| Panopto Capture | 42.912394 |
| RTMP | 4.475007 |
| Mixed | 3.557541 |
| iOS | 1.607316 |
| Unspecified | 1.119587 |
| Android | 0.1618 |


### Watched Hours by Root Folder

In [24]:
watched[['Root Folder', 'hours']].groupby(['Root Folder']).sum().sort_values('hours', ascending=False)

| Root Folder | hours |
|---|---|
| Users | 2786.452788 |
| Moodle | 2772.267913 |
| Libraries | 138.137575 |
| CCA Departments | 63.485569 |
| Tutorials | 6.471065 |
| First Year Program | 3.375854 |


A few conclusions:
  - Zoom is the source of the large majority of hours (≈82%), but most Zoom content goes unwatched (about 13,313 of 16,170 hours)
  - Zoom recordings tend to go into Users folders and thus Users folders hold most of the unwatched content
  - Content from other sources is usually watched at least once: about 84% of Web API (uploaded) hours have views. Despite Zoom accounting for roughly five times as many hours, uploaded sessions account for almost as many watched hours (2,675 vs. 2,857). The remaining sources (like the Panopto desktop and web apps) contribute only a trace amount of content, and most of it is watched.
  - Similarly, sessions in Moodle folders account for almost as many watched hours as sessions in Users folders (2,772 vs. 2,786), even though Users folders hold far more hours overall (15,942 vs. 3,455)
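
The shares behind these conclusions can be reproduced from the tables above. The sketch below hard-codes the per-source totals and watched hours as they appear in the outputs. The variable names (`total_hours`, `watched_hours`, `summary`) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Total hours per source, copied from the "Total Hours" table
total_hours = pd.Series(
    {
        "Zoom": 16169.95343,
        "Web API": 3188.285008,
        "Panopto for Mac": 134.287582,
        "Panopto for Windows": 71.140858,
        "Panopto Capture": 48.810205,
        "Mixed": 5.244231,
        "RTMP": 4.475007,
        "iOS": 3.401765,
        "Unspecified": 1.119587,
        "Android": 0.414145,
    }
)

# Watched hours per source, copied from the "Watched Hours by Source" table
watched_hours = pd.Series(
    {
        "Zoom": 2856.642399,
        "Web API": 2674.805607,
        "Panopto for Mac": 119.202422,
        "Panopto for Windows": 65.906672,
        "Panopto Capture": 42.912394,
        "RTMP": 4.475007,
        "Mixed": 3.557541,
        "iOS": 1.607316,
        "Unspecified": 1.119587,
        "Android": 0.1618,
    }
)

# Series are aligned by source name, so the differing row order does not matter
summary = pd.DataFrame({"total": total_hours, "watched": watched_hours})
summary["watched share"] = summary["watched"] / summary["total"]

# Zoom's share of all stored hours
zoom_share = total_hours["Zoom"] / total_hours.sum()

print(summary.round(2))
print("Zoom share of all hours:", round(zoom_share, 2))
```

In the notebook itself the same columns could be derived from `data` directly; the literals here only keep the sketch self-contained.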

----

Now we want to look at how recently the watched videos were last viewed, so we can separate recently watched content from content that was watched at least once but not recently.

In [25]:
from datetime import datetime
from dateutil.relativedelta import relativedelta

six_months_ago = datetime.now() - relativedelta(months=6)
one_year_ago = datetime.now() - relativedelta(years=1)
eighteen_mo_ago = datetime.now() - relativedelta(months=18)

# NaT (never viewed) compares as False, so these keep only sessions viewed since each cutoff
watched_within_6_months = data[data['Most Recent View Date'] > six_months_ago]
watched_within_12_months = data[data['Most Recent View Date'] > one_year_ago]
watched_within_18_months = data[data['Most Recent View Date'] > eighteen_mo_ago]

print("Hours last watched within the past...")
print("6 months:", round(watched_within_6_months['hours'].sum(), 2))
print("12 months:", round(watched_within_12_months['hours'].sum(), 2))
print("18 months:", round(watched_within_18_months['hours'].sum(), 2))

Hours last watched within the past...
6 months: 1673.09
12 months: 3088.16
18 months: 4732.47
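
The comparison above keeps sessions whose most recent view falls after the cutoff. As a rough sketch of how a cutoff splits sessions into recently watched, watched but stale, and never watched, here is a toy example. The dates, the hours, and the fixed `now` value are all hypothetical, and `pd.DateOffset` stands in for `relativedelta` only to keep the example to a single import.

```python
import pandas as pd

# Fixed reference date so the example is reproducible (hypothetical)
now = pd.Timestamp("2024-01-01")
cutoff = now - pd.DateOffset(years=1)

toy = pd.DataFrame(
    {
        "hours": [1.0, 2.0, 4.0],
        # None becomes NaT, standing in for a session with no recorded view
        "Most Recent View Date": pd.to_datetime(["2023-06-15", "2021-03-01", None]),
    }
)

last_view = toy["Most Recent View Date"]

# Comparisons against NaT are False, so never-viewed rows land in neither group
recent = toy[last_view > cutoff]
stale = toy[last_view <= cutoff]
never = toy[last_view.isna()]

print("recently watched hours:", recent["hours"].sum())
print("watched, but not since the cutoff:", stale["hours"].sum())
print("never watched:", never["hours"].sum())
```

The `stale` group corresponds to content that was watched at least once but not recently, which is a natural candidate for archiving.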


Only about 5,770 hours have ever been watched, but if we keep _all_ of those hours in storage, newly created content will overflow our storage allotment. If we instead keep only videos that have been watched in the last year (about 3,088 hours, the 12-month figure above), that should leave room for new content (8800 - 3088 = 5712 hours).