# The Boys - Screentime Analysis

Simple screentime analysis of **The Boys** (seasons 1-3) <br>
Contents of the analysis:
- **Screentime**: per character, per episode
- **Runtime**
- **Shared screentime** between characters

Visualizations: [Tableau Public](https://public.tableau.com/app/profile/mattia4114/viz/boys_16621399122060/TheBoys-ScreentimeAnalysis)

In [1]:
import pandas as pd

## Data

__Data Source__:   _Amazon Prime Video_ <br>
__Collection and Preparation__:   _[link](https://www.curiousgnu.com/movie-character-screen-time)_

In [2]:
example = pd.read_csv('./Data/s01e01.csv')

In [3]:
example

,nconst,character,start,end
0,nm8488639,Benjy,32000,155000
1,nm4240263,Jamie,33000,155000
2,nm1069800,Queen Maeve,82000,155000
3,nm0651456,Desperate Thief #1,100000,155000
4,nm1102278,Homelander,130000,155000
...,...,...,...,...
81,nm1307435,Translucent,3165000,3423000
82,nm0881631,Billy Butcher,3226000,3423000
83,nm5092703,Mason,3426000,3490000
84,nm0637992,Mayor of Baltimore,3431000,3490000


## All characters in the show

In [4]:
def get_characters(file):
    # given an episode file, extract every character that appears in it
    
    f = pd.read_csv(file).set_index('nconst')
    f = f[['character']] # keep only the character name, indexed by nconst
    return f

In [5]:
characters = []
for season in range(1,4):
    for episode in range(1,9):
        file = './Data/' +'s0' + str(season) + 'e0' + str(episode) + '.csv'
        f = get_characters(file)
        characters.append(f)

In [6]:
characters = pd.concat(characters)

In [7]:
characters = characters[~characters.index.duplicated(keep='first')]
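The concatenate-then-deduplicate step can be illustrated on toy data. This is only a sketch: the two small frames stand in for the per-episode results of `get_characters`, and the names `ep1`, `ep2`, `combined` and `unique` are made up for the example.

```python
import pandas as pd

# Toy stand-ins for two episodes: the same nconst appears in both
ep1 = pd.DataFrame({"character": ["Homelander", "Benjy"]},
                   index=pd.Index(["nm1102278", "nm8488639"], name="nconst"))
ep2 = pd.DataFrame({"character": ["Homelander", "Starlight"]},
                   index=pd.Index(["nm1102278", "nm3929195"], name="nconst"))

combined = pd.concat([ep1, ep2])

# index.duplicated(keep="first") flags every repeat after the first occurrence,
# so negating it keeps exactly one row per nconst
unique = combined[~combined.index.duplicated(keep="first")]
print(unique)
```

Keeping the first occurrence means that a performer credited under different character names across episodes ends up with the name from their earliest episode.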

In [8]:
characters

nconst,character
nm8488639,Benjy
nm4240263,Jamie
nm1069800,Queen Maeve
nm0651456,Desperate Thief #1
nm1102278,Homelander
...,...
nm8493111,Starlighter
nm7624031,Hometeamer
nm0498083,Doctor
nm6364830,Stormchaser


## Screentime per episode

In [9]:
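def get_screentime(file, characters):
    # given an episode file, return that episode's screentime (in seconds)
    # for every character in the show (0 if the character does not appear)
    
    f = pd.read_csv(file)
    
    f['start'] = f['start']/1000 # in seconds
    f['end'] = f['end']/1000 # in seconds
    f['screentime'] = f['end']-f['start']
    
    # sum only the screentime column: in pandas versions where sum() no longer
    # drops non-numeric columns, a summed 'character' column would clash in the join below
    f = f.groupby(by=['nconst'])[['screentime']].sum()
    
    f = characters.join(f)
    f = f.fillna(0)
    
    f = f[['screentime']]
    f.columns = ['screentime_'+file[7:-4]] # './Data/s01e01.csv' -> 'screentime_s01e01'
    f = f.round(2)

    return f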
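To make the per-episode aggregation concrete, here is a minimal sketch on a toy frame. The first two rows reuse values from `s01e01.csv` shown above; the third row is invented to show that several appearances of the same `nconst` are summed.

```python
import pandas as pd

# Toy appearance intervals in milliseconds (third row is hypothetical)
rows = pd.DataFrame({
    "nconst": ["nm8488639", "nm4240263", "nm8488639"],
    "start": [32000, 33000, 200000],
    "end": [155000, 155000, 210000],
})

# Duration of each appearance, converted from milliseconds to seconds
rows["screentime"] = (rows["end"] - rows["start"]) / 1000

# One total per performer for the episode
per_character = rows.groupby("nconst")["screentime"].sum()
print(per_character)
```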

In [10]:
# initialize with first episode
screentime_per_episode = get_screentime('./Data/s01e01.csv', characters)

for season in range(1,4):
    for episode in range(1,9):
        if season == 1 and episode == 1:
            # skip the first episode, already handled by the initialization
            continue
            
        file = './Data/' +'s0' + str(season) + 'e0' + str(episode) + '.csv'
        screentime_per_episode = screentime_per_episode.join(get_screentime(file, characters))

In [11]:
screentime_per_episode

nconst,screentime_s01e01,screentime_s01e02,screentime_s01e03,screentime_s01e04,screentime_s01e05,screentime_s01e06,screentime_s01e07,screentime_s01e08,screentime_s02e01,screentime_s02e02,...,screentime_s02e07,screentime_s02e08,screentime_s03e01,screentime_s03e02,screentime_s03e03,screentime_s03e04,screentime_s03e05,screentime_s03e06,screentime_s03e07,screentime_s03e08
nm8488639,123.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
nm4240263,122.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
nm1069800,510.0,199.0,513.0,488.0,343.0,432.0,598.0,138.0,129.0,298.0,...,350.0,453.0,268.0,0.0,0.0,306.0,440.0,0.0,184.0,846.0
nm0651456,55.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
nm1102278,253.0,1084.0,724.0,605.0,1080.0,450.0,955.0,1064.0,1144.0,563.0,...,744.0,1509.0,1061.0,828.0,794.0,865.0,607.0,736.0,613.0,1142.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
nm8493111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,24.0
nm7624031,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46.0
nm0498083,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,53.0
nm6364830,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15.0


## Total Screentime

In [12]:
total_screentime = characters.copy()
total_screentime['Screentime'] = screentime_per_episode.sum(axis = 1)
total_screentime

nconst,character,Screentime
nm8488639,Benjy,123.0
nm4240263,Jamie,122.0
nm1069800,Queen Maeve,8153.0
nm0651456,Desperate Thief #1,55.0
nm1102278,Homelander,20432.0
...,...,...
nm8493111,Starlighter,24.0
nm7624031,Hometeamer,46.0
nm0498083,Doctor,53.0
nm6364830,Stormchaser,15.0


In [13]:
# convert to minutes
total_screentime['Screentime'] = round(total_screentime['Screentime']/60)

In [14]:
total_screentime = total_screentime.sort_values(by = ['Screentime'], ascending = False)
total_screentime.head(15)

nconst,character,Screentime
nm4425051,Hughie Campbell,466.0
nm0881631,Billy Butcher,447.0
nm3929195,Starlight,342.0
nm1102278,Homelander,341.0
nm0022306,Mother’s Milk,320.0
nm6150071,Frenchie,308.0
nm7232332,The Female,214.0
nm1069800,Queen Maeve,136.0
nm1900772,A-Train,133.0
nm2281371,Ashley Barrett,124.0


## The Seven vs The Boys Screentime

In [15]:
the_boys = ['Hughie Campbell', 'Billy Butcher', 'Mother’s Milk', 'Frenchie', 'The Female']
the_seven = ['Starlight', 'Homelander', 'Queen Maeve', 'A-Train', 'The Deep', 'Stormfront', 'Black Noir', 'Translucent']

In [16]:
tot_boys = 0
tot_seven = 0
# .iloc[0] takes the first matching row by position (the index holds nconst strings)
for boy in the_boys:
    tot_boys = tot_boys + total_screentime.loc[total_screentime['character'] == boy, 'Screentime'].iloc[0]
for seven in the_seven:
    tot_seven = tot_seven + total_screentime.loc[total_screentime['character'] == seven, 'Screentime'].iloc[0]
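The same group totals can also be computed without an explicit loop. The sketch below is one possible alternative, not the notebook's method: `group_total` is a hypothetical helper, and the toy frame reuses four of the per-character totals (in minutes) from the table above.

```python
import pandas as pd

# Four rows taken from the total screentime table (minutes)
total = pd.DataFrame({
    "character": ["Hughie Campbell", "Billy Butcher", "Starlight", "Homelander"],
    "Screentime": [466.0, 447.0, 342.0, 341.0],
})

def group_total(df, names):
    # Hypothetical helper: sum the screentime of every character whose name is in `names`
    return df.loc[df["character"].isin(names), "Screentime"].sum()

boys_total = group_total(total, ["Hughie Campbell", "Billy Butcher"])
seven_total = group_total(total, ["Starlight", "Homelander"])
print(boys_total, seven_total)
```

Unlike the loop, `isin` silently ignores a name that is missing from the table, so a misspelled character would lower the total instead of raising an error.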

In [17]:
# total screentime of the boys in minutes
tot_boys

1755.0

In [18]:
# total screentime of the seven in minutes
tot_seven

1252.0

## Shared Screentime between characters

In [19]:
# use nested dictionaries as data structure

shared_screentime = {}

# initialize all the nested dictionaries (one per character)
for i in range(len(characters)):
    shared_screentime[characters.index[i]]  = {}

In [20]:
def get_shared_screentime(episode, shared_screentime):
    # given an episode (DataFrame indexed by nconst, 'start' and 'end' in seconds),
    # add its shared screentime to the nested dictionary shared_screentime
    # assumes the rows are sorted by 'start'
    
    ids = episode.index
    start = episode['start'].to_numpy()
    end = episode['end'].to_numpy()
    
    for i in range(len(episode)):
        j = i + 1
        while j < len(episode) and end[i] > start[j]:
            # overlap from j's start to i's end (assumes j does not end before i)
            shared = end[i] - start[j]
            # accumulate symmetrically for (i, j) and (j, i)
            shared_screentime[ids[i]][ids[j]] = shared_screentime[ids[i]].get(ids[j], 0) + shared
            shared_screentime[ids[j]][ids[i]] = shared_screentime[ids[j]].get(ids[i], 0) + shared
            j += 1
    return shared_screentime
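The function above measures shared time as the earlier interval's end minus the later interval's start, which is exact when the later appearance does not end before the earlier one. A more general overlap formula, offered here only as a sketch (the helper `overlap` is hypothetical and not part of the notebook), clips on both sides:

```python
def overlap(start_a, end_a, start_b, end_b):
    # Length of the intersection of [start_a, end_a] and [start_b, end_b];
    # 0 when the intervals do not intersect
    return max(0, min(end_a, end_b) - max(start_a, start_b))

# Intervals in seconds from s01e01: Benjy (32-155), Jamie (33-155), Homelander (130-155)
benjy_jamie = overlap(32, 155, 33, 155)
benjy_homelander = overlap(32, 155, 130, 155)

# Translucent (3165-3423) and Mason (3426-3490) never share the screen
translucent_mason = overlap(3165, 3423, 3426, 3490)

# Invented case: the second interval ends first, so end_a - start_b would give 80
nested_case = overlap(0, 100, 20, 50)
print(benjy_jamie, benjy_homelander, translucent_mason, nested_case)
```

For the rows shown in the sample output, both formulas agree (122 and 25 seconds), because appearances in the same scene share the same end time.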

In [21]:
for season in range(1,4):
    for episode in range(1,9):
        file = './Data/s0'
        f = pd.read_csv(file + str(season) + 'e0' +str(episode) + '.csv').set_index('nconst')
        f = f[['start', 'end']]
        f = f/1000
        shared_screentime = get_shared_screentime(f, shared_screentime)

In [22]:
# convert to a DataFrame
character_1 = []
character_2 = []
shared = []

for char_1, partners in shared_screentime.items():
    for char_2, time in partners.items():
        character_1.append(char_1)
        character_2.append(char_2)
        shared.append(time)

temp = {
    'First_character' : character_1,
    'Second_character' : character_2,
    'Shared_time' : shared
}

shared_screentime = pd.DataFrame(temp)
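The flattening of the nested dictionary into a long table can also be written as a single comprehension. This is a sketch on a toy dictionary (values in seconds taken from the first rows of the output below); the names `nested`, `records` and `pairs` are illustrative.

```python
import pandas as pd

# Toy nested dictionary: outer key = first character, inner key = second character
nested = {
    "nm8488639": {"nm4240263": 122.0, "nm1069800": 73.0},
    "nm4240263": {"nm8488639": 122.0},
}

# One (first, second, time) record per inner entry
records = [
    (first, second, time)
    for first, partners in nested.items()
    for second, time in partners.items()
]

pairs = pd.DataFrame(records, columns=["First_character", "Second_character", "Shared_time"])
print(pairs)
```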

In [23]:
shared_screentime

,First_character,Second_character,Shared_time
0,nm8488639,nm4240263,122.0
1,nm8488639,nm1069800,73.0
2,nm8488639,nm0651456,55.0
3,nm8488639,nm1102278,25.0
4,nm4240263,nm8488639,122.0
...,...,...,...
3167,nm7624031,nm7805172,6.0
3168,nm0498083,nm0881631,53.0
3169,nm6364830,nm7624031,4.0
3170,nm5045671,nm1102278,9.0


In [24]:
# sort by shared time, longest first
shared_screentime = shared_screentime.sort_values(by = ['Shared_time'], ascending = False)

# replace nconst codes with the characters' names
shared_screentime['First_character'] = shared_screentime['First_character'].map(characters['character'])
shared_screentime['Second_character'] = shared_screentime['Second_character'].map(characters['character'])

# in minutes
shared_screentime['Shared_time'] = round(shared_screentime['Shared_time']/60)

In [25]:
shared_screentime

,First_character,Second_character,Shared_time
830,Billy Butcher,Hughie Campbell,261.0
220,Hughie Campbell,Billy Butcher,261.0
1249,Mother’s Milk,Hughie Campbell,208.0
234,Hughie Campbell,Mother’s Milk,208.0
848,Billy Butcher,Mother’s Milk,203.0
...,...,...,...
2938,Claudio,Young Black Noir,0.0
2939,Claudio,Soldier Boy,0.0
2943,Young Black Noir,Claudio,0.0
3001,Blue Hawk,Doctor,0.0


In [26]:
shared_screentime = shared_screentime.reset_index(drop = True)
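Because the shared time is stored symmetrically, every pair appears twice in the table (A with B, and B with A). If one row per pair is preferred, a possible approach, sketched here on toy rows copied from the output (the names `pairs`, `key` and `unique_pairs` are assumptions), is to build an order-independent key and drop its duplicates:

```python
import pandas as pd

# Four rows from the shared screentime table (minutes)
pairs = pd.DataFrame({
    "First_character": ["Billy Butcher", "Hughie Campbell", "Frenchie", "The Female"],
    "Second_character": ["Hughie Campbell", "Billy Butcher", "The Female", "Frenchie"],
    "Shared_time": [261.0, 261.0, 184.0, 184.0],
})

# Sorting the two names makes (A, B) and (B, A) produce the same key
key = pd.Series(
    [tuple(sorted(p)) for p in zip(pairs["First_character"], pairs["Second_character"])],
    index=pairs.index,
)

# Keep only the first row seen for each unordered pair
unique_pairs = pairs[~key.duplicated()].reset_index(drop=True)
print(unique_pairs)
```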

In [27]:
shared_screentime.head(20)

,First_character,Second_character,Shared_time
0,Billy Butcher,Hughie Campbell,261.0
1,Hughie Campbell,Billy Butcher,261.0
2,Mother’s Milk,Hughie Campbell,208.0
3,Hughie Campbell,Mother’s Milk,208.0
4,Billy Butcher,Mother’s Milk,203.0
5,Mother’s Milk,Billy Butcher,203.0
6,Mother’s Milk,Frenchie,198.0
7,Frenchie,Mother’s Milk,198.0
8,Frenchie,The Female,184.0
9,The Female,Frenchie,184.0
