# Panopto Data Analysis

Panopto reports "Stream Source" in its "Sessions created or edited" report, but usage data (including the date last viewed) is in the "Session usage" report. Below, I use the pandas data manipulation library to join these two data sources on Session ID and compile some statistics.

In [11]:
import numpy as np
import pandas as pd

sessions_file = 'sessionscreatedoredited_2020-08-08--2022-06-01.csv'
usage_file = 'sessionusage_2020-08-08--2022-06-01.csv'
sessions = pd.read_csv(sessions_file, low_memory=False).set_index('Session ID')
usage = pd.read_csv(usage_file, low_memory=False).set_index('Session ID')
# trim redundant columns from usage so we can cleanly join them
usage = usage.drop([c for c in sessions.columns if c in usage.columns], axis='columns')
# convert most recent view to datetime
usage['Most Recent View Date'] = pd.to_datetime(usage['Most Recent View Date'])
data = sessions.join(usage)
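
The drop-then-join step is easy to check on a small example. The sketch below uses made-up rows, and `Session Name` is a hypothetical column standing in for whatever fields the two reports share. It shows that removing the overlapping columns first avoids a name collision in `join()`, and that a session with no usage row ends up with `NaN` in the usage columns.

```python
import pandas as pd

# Toy stand-ins for the two Panopto reports (all values invented for illustration).
sessions = pd.DataFrame({
    'Session ID': ['a', 'b', 'c'],
    'Session Name': ['Lecture 1', 'Lecture 2', 'Lecture 3'],  # hypothetical shared column
    'Stream Source': ['Zoom', 'Web API', 'Zoom'],
}).set_index('Session ID')

usage = pd.DataFrame({
    'Session ID': ['a', 'b'],
    'Session Name': ['Lecture 1', 'Lecture 2'],  # duplicated in both reports
    'Views and Downloads': [3, 1],
}).set_index('Session ID')

# Drop columns that appear in both frames so join() has no overlapping names.
overlap = [c for c in usage.columns if c in sessions.columns]
usage = usage.drop(columns=overlap)

# Left join on the shared index: session 'c' has no usage row, so it gets NaN.
data = sessions.join(usage)
print(data)
```

Because `join()` defaults to a left join, every row of the sessions report is kept, which is what lets the later cells treat a missing `Views and Downloads` value as "never watched".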

In [27]:
# derivative columns
data['hours'] = data['Session Length'] / 60
# @TODO a "lowest-level folder" of sorts might be useful but complicated to code
data['folder path'] = '/' + data['Root Folder (Level 0)'] + '/' + data['Subfolder (Level 1)'].astype(str) + '/' + data['Subfolder (Level 2)'].astype(str) + '/' + data['Subfolder (Level 3)'].astype(str) + '/' + data['Subfolder (Level 4)'].astype(str)

print('Total Panopto stored hours:', round(data['hours'].sum(), 2))

Total Panopto stored hours: 19627.13
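
The `@TODO` above mentions a "lowest-level folder" column. One possible approach, sketched here on invented folder names, is to drop the empty levels from each row and take the last one that remains. The same idea gives a folder path without the literal `nan` segments that `astype(str)` produces for missing levels. The helper names `folder_path` and `lowest_folder` are my own, not part of the analysis above.

```python
import pandas as pd

folder_cols = [
    'Root Folder (Level 0)',
    'Subfolder (Level 1)',
    'Subfolder (Level 2)',
    'Subfolder (Level 3)',
    'Subfolder (Level 4)',
]

# Toy rows; the folder names are made up for illustration.
toy = pd.DataFrame({
    'Root Folder (Level 0)': ['Users', 'Moodle', 'Libraries'],
    'Subfolder (Level 1)': ['jdoe', '2021FA', None],
    'Subfolder (Level 2)': [None, 'ARTED-1000', None],
    'Subfolder (Level 3)': [None, None, None],
    'Subfolder (Level 4)': [None, None, None],
})

def folder_path(row):
    """Join only the folder levels that are present."""
    parts = row.dropna()
    return '/' + '/'.join(str(p) for p in parts)

def lowest_folder(row):
    """Return the deepest folder level that is present, or None if all are missing."""
    parts = row.dropna()
    return parts.iloc[-1] if len(parts) else None

toy['folder path'] = toy[folder_cols].apply(folder_path, axis=1)
toy['lowest folder'] = toy[folder_cols].apply(lowest_folder, axis=1)
print(toy[['folder path', 'lowest folder']])
```

A row-wise `apply` is slow on large frames, but it should be adequate for a report of this size.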


### Unwatched Hours by Source

In [20]:

unwatched = data[data['Views and Downloads'].isnull()]
table = unwatched[['Stream Source', 'hours']].groupby(['Stream Source']).sum().sort_values('hours', ascending=False)
table.loc['Total'] = table.sum()
table

Unwatched Hours


| Stream Source | hours |
| --- | ---: |
| Zoom | 13313.311031 |
| Web API | 513.479401 |
| Panopto for Mac | 15.08516 |
| Panopto Capture | 5.897811 |
| Panopto for Windows | 5.234186 |
| iOS | 1.794449 |
| Mixed | 1.686691 |
| Android | 0.252345 |
| RTMP | 0.0 |
| Unspecified | 0.0 |


"Web API" refers to uploads via the website. See the list of "Stream Source" values on their reports documentation page: https://support.panopto.com/s/article/System-Usage-Report-Fields

### Unwatched Hours by Root Folder

In [14]:
unwatched[['Root Folder (Level 0)', 'hours']].groupby(['Root Folder (Level 0)']).sum().sort_values('hours', ascending=False)

| Root Folder (Level 0) | hours |
| --- | ---: |
| Users | 13155.459902 |
| Moodle | 682.489447 |
| Libraries | 16.602274 |
| CCA Departments | 1.927585 |
| Tutorials | 0.261867 |


Most unwatched hours come from Zoom, and most are in the Users folders. There are still a significant number of unwatched hours in the Moodle folder hierarchy, though (almost 700).
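
The two tables above are separate one-dimensional summaries, so they do not show directly how much of the Zoom total sits in each root folder. A cross-tabulation would. Here is a minimal sketch with `pivot_table` on invented rows; on the real data, the `unwatched` frame would take the place of `toy_unwatched`.

```python
import pandas as pd

# Invented rows standing in for the unwatched sessions.
toy_unwatched = pd.DataFrame({
    'Root Folder (Level 0)': ['Users', 'Users', 'Moodle', 'Moodle', 'Users'],
    'Stream Source': ['Zoom', 'Zoom', 'Web API', 'Zoom', 'Web API'],
    'hours': [10.0, 5.0, 2.0, 1.0, 0.5],
})

# Sum hours for every (root folder, stream source) pair; empty pairs become 0.
crosstab = toy_unwatched.pivot_table(
    index='Root Folder (Level 0)',
    columns='Stream Source',
    values='hours',
    aggfunc='sum',
    fill_value=0,
)
print(crosstab)
```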

### Watched Hours by Source

In [19]:

watched = data[data['Views and Downloads'] >= 1]
table = watched[['Stream Source', 'hours']].groupby(['Stream Source']).sum().sort_values('hours', ascending=False)
table.loc['Total'] = table.sum()
print('Watched Hours by Source')
table

| Stream Source | hours |
| --- | ---: |
| Zoom | 2856.642399 |
| Web API | 2674.805607 |
| Panopto for Mac | 119.202422 |
| Panopto for Windows | 65.906672 |
| Panopto Capture | 42.912394 |
| RTMP | 4.475007 |
| Mixed | 3.557541 |
| iOS | 1.607316 |
| Unspecified | 1.119587 |
| Android | 0.1618 |


### Watched Hours by Root Folder

In [16]:
watched[['Root Folder (Level 0)', 'hours']].groupby(['Root Folder (Level 0)']).sum().sort_values('hours', ascending=False)

| Root Folder (Level 0) | hours |
| --- | ---: |
| Users | 2786.452788 |
| Moodle | 2772.267913 |
| Libraries | 138.137575 |
| CCA Departments | 63.485569 |
| Tutorials | 6.471065 |
| First Year Program | 3.375854 |


Now we want to look at videos that were watched at least once, but not recently. To size that group, we total the hours whose most recent view falls within each window; watched hours outside the window are the ones not viewed recently.

In [26]:
from datetime import datetime
from dateutil.relativedelta import relativedelta

six_months_ago = datetime.now() - relativedelta(months=6)
one_year_ago = datetime.now() - relativedelta(years=1)
eighteen_mo_ago = datetime.now() - relativedelta(months=18)

# a view date later than the cutoff means the video was watched within the window
watched_within_6_months = data[data['Most Recent View Date'] > six_months_ago]
watched_within_12_months = data[data['Most Recent View Date'] > one_year_ago]
watched_within_18_months = data[data['Most Recent View Date'] > eighteen_mo_ago]

print("Watched within the last...")
print("6 months:", round(watched_within_6_months['hours'].sum(), 2))
print("12 months:", round(watched_within_12_months['hours'].sum(), 2))
print("18 months:", round(watched_within_18_months['hours'].sum(), 2))

Watched within the last...
6 months: 1726.55
12 months: 3094.91
18 months: 4783.67
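
The filters above compare `Most Recent View Date` against a cutoff, and the direction of that comparison decides which group you get. The sketch below uses made-up rows and a fixed `as_of` date (my choice, so the result is reproducible) to split sessions into three groups: viewed after the cutoff, viewed only before it, and never viewed. It assumes a never-viewed session has a missing view date.

```python
import pandas as pd

# Invented rows; None stands for a session that was never viewed (an assumption).
toy = pd.DataFrame({
    'hours': [1.0, 2.0, 4.0, 8.0],
    'Most Recent View Date': pd.to_datetime(
        ['2022-05-01', '2021-09-15', '2020-10-01', None]),
})

as_of = pd.Timestamp('2022-06-01')          # fixed reference date for the example
cutoff = as_of - pd.DateOffset(months=6)    # six months before as_of

last_view = toy['Most Recent View Date']

# Viewed after the cutoff: watched within the window.
watched_recently = toy[last_view > cutoff]
# Viewed at least once, but not since the cutoff.
stale = toy[last_view.notna() & (last_view <= cutoff)]
# No view date at all; comparisons with NaT are False, so these need their own test.
never_watched = toy[last_view.isna()]

print('watched recently:', watched_recently['hours'].sum())
print('watched, but not recently:', stale['hours'].sum())
print('never watched:', never_watched['hours'].sum())
```

The three groups are disjoint and together cover every row, so their hour totals should add up to the overall total.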


Only 5770 hours have ever been watched, but if we keep _all_ of these hours in storage, incoming created hours will overflow our storage allotment. If we keep only videos that have been watched in the last year (3095 hours), that should leave room for new content (8800 - 3095 = 5705 hours).
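
The same subtraction can be repeated for each retention window to compare options. This is a small sketch that treats 8800 as the storage allotment in hours (my reading of the arithmetic above) and reuses the totals printed earlier.

```python
# Assumption: 8800 is the storage allotment in hours, as implied by the arithmetic above.
allotment_hours = 8800

# Hours retained if we keep only videos viewed within each window (from the output above).
kept_hours = {
    '6 months': 1726.55,
    '12 months': 3094.91,
    '18 months': 4783.67,
}

# Remaining room for new content under each retention policy.
headroom = {
    window: round(allotment_hours - hours, 2)
    for window, hours in kept_hours.items()
}

for window, room in headroom.items():
    print(f'Keep videos viewed in the last {window}: {room} hours free')
```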