In this notebook we discuss some questions (mainly about strategy) and propose some metrics to answer them.

## How to measure a viewer's preference?

This could be approached with economic theory, but here we use a simpler approach.

## 1. Promote/Acquisition

We have several streamers across several countries. Which streamers should we recommend to viewers?

Cold start: highest popularity first.\
Non-cold start: similar category first, then highest popularity.

In [3]:
import numpy as np
import pandas as pd

In [3]:
df = pd.DataFrame(
{
    'StreamerID': range(1,16)
})

In [8]:
arr = []
for i in range(1,6):
    arr += [i]*3

In [9]:
np.random.shuffle(arr)

In [11]:
df['Category'] = arr

In [16]:
arr = []
for i in range(1,4):
    arr += [i]*5

In [17]:
np.random.shuffle(arr)

In [18]:
df['Scale'] = arr

## DataFrame Description

1. StreamerID: ID of each streamer.
2. Category: [int] stream type of this streamer. We assume the encoding carries a similarity meaning, i.e.,
    1 is similar to 2, 2 is similar to 1 and 3, and so on.
3. Scale: audience-size tier of this streamer; 1 is the lowest and 3 is the highest.
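The cold-start and non-cold-start rules from the previous section can be sketched as one ranking function. This sketch assumes, as stated for `Category`, that the distance between two categories is $|x-y|$ on the integer encoding; `rank_streamers` and its parameters are hypothetical names, not part of the notebook.

```python
from typing import Optional

import pandas as pd


def rank_streamers(df: pd.DataFrame, k: int = 6,
                   clicked_category: Optional[int] = None,
                   exclude_id: Optional[int] = None) -> pd.DataFrame:
    """Return the top-k rows of df (columns StreamerID, Category, Scale)."""
    candidates = df if exclude_id is None else df[df['StreamerID'] != exclude_id]
    if clicked_category is None:
        # Cold start: no click history, so rank by popularity (Scale) only
        return candidates.sort_values('Scale', ascending=False, kind='stable').head(k)
    # Non-cold start: closest Category first (|x - y| on the integer encoding),
    # then higher Scale among equally close categories
    ranked = candidates.assign(
        Category_Distance=(candidates['Category'] - clicked_category).abs())
    return ranked.sort_values(['Category_Distance', 'Scale'],
                              ascending=[True, False]).head(k)


# Usage on a small toy table (not the notebook's df)
toy = pd.DataFrame({'StreamerID': [1, 2, 3, 4, 5],
                    'Category': [1, 3, 4, 2, 3],
                    'Scale': [1, 3, 2, 2, 1]})
cold = rank_streamers(toy, k=3)
warm = rank_streamers(toy, k=3, clicked_category=3, exclude_id=2)
```

Sorting on the distance first and `Scale` second encodes "similar category first, then popularity" directly, so both cases share one code path.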

## Cold Start: 

We give 6 recommendations for the cold-start situation.

In [27]:
recommend = df.sort_values(by='Scale',ascending=False).iloc[:6,:]

In [41]:
# Renumber rows 0..5; drop=True avoids keeping the old index as a column
recommend = recommend.reset_index(drop=True)

We then recommend these streamers to viewers.

## Case 1: The viewer clicks one of these streamers

In [45]:
# Simulate the viewer clicking the third recommended streamer
click_id = recommend.loc[2,'StreamerID']

Based on `click_id`, we recommend:
1. streamers in the same category;
2. streamers in other categories with high popularity.

In [63]:
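# Select the single Category value explicitly; calling int() on a Series is deprecated
cate = int(df.loc[df['StreamerID'] == click_id, 'Category'].iloc[0])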

In [68]:
cate

3

In [76]:
recommend_click = df.loc[(df['Category'] == cate) & ~(df.StreamerID == click_id)]

In [77]:
recommend_click

|    | StreamerID | Category | Scale |
|---:|---:|---:|---:|
| 4 | 5 | 3 | 2 |
| 10 | 11 | 3 | 1 |


In [86]:
recommend_click = pd.concat([recommend_click,
          df.loc[(df['Scale'] == 3) & ~(df.StreamerID == click_id)]
          ])

In [87]:
recommend_click

|    | StreamerID | Category | Scale |
|---:|---:|---:|---:|
| 4 | 5 | 3 | 2 |
| 10 | 11 | 3 | 1 |
| 1 | 2 | 2 | 3 |
| 5 | 6 | 1 | 3 |
| 13 | 14 | 1 | 3 |
| 14 | 15 | 4 | 3 |


## Case 2: Stable State

After the cold-start and click-based recommendations, the viewer's clicks may start to show patterns at some point.

If a specific variable (e.g., `Category`) changes little between clicks, we can focus on recommending that content to this viewer.
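One way to make "changes little" concrete is the mean absolute change of `Category` over the last few clicks. The window of 5 clicks, the 0.5 threshold, and the helper name `stable_category` are assumptions for illustration.

```python
from typing import Optional, Sequence

import pandas as pd


def stable_category(click_categories: Sequence[int], window: int = 5,
                    max_mean_jump: float = 0.5) -> Optional[int]:
    """Return the dominant Category if recent clicks barely jump, else None."""
    recent = pd.Series(list(click_categories)[-window:], dtype='int64')
    if len(recent) < 2:
        return None  # too few clicks to judge stability
    # Average absolute change in Category between consecutive clicks
    mean_jump = recent.diff().abs().mean()
    if mean_jump <= max_mean_jump:
        # Most frequent recent Category (smallest value on ties)
        return int(recent.mode().iloc[0])
    return None
```

For example, recent clicks in categories `[3, 3, 4, 3, 3]` have a mean jump of 0.5 and return 3, while `[1, 3, 5, 2, 4]` jumps too much and returns `None`.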

## 2. Evaluation

How do we evaluate `drift` in a viewer's preference?

First, we need to define what `drift` is.\
We split drift into three stages:

1. Signal: The viewer starts clicking into channels whose stream type has low similarity to the content he watched before.
2. Not a noise: He does not just pass by or stay only briefly; instead, he watches several times.
3. Compatible/Transition: Does he still watch the content he watched before, or has he stopped watching it and now only watches the new content?

For the `Signal` stage, we give a more precise definition:\
Low-similarity stream type: a `Category` that differs a lot from the one watched before, i.e., $|x-y|$ is large, where $x,y \in\text{Category}$ are the previously watched and the newly watched categories.
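A minimal sketch of this check, assuming a view counts as a signal when its `Category` is at least some distance away from every category the viewer watched before. The threshold of 2 and the helper name `flag_drift_signal` are hypothetical.

```python
from typing import Iterable

import pandas as pd


def flag_drift_signal(history_categories: Iterable[int],
                      new_views: pd.DataFrame,
                      threshold: int = 2) -> pd.DataFrame:
    """Mark views whose Category is far from every previously watched Category."""
    past = set(history_categories)
    if not past:
        raise ValueError('need at least one previously watched category')
    out = new_views.copy()
    # Distance to the closest previously watched category: min over x of |x - y|
    out['Category_Distance'] = out['Category'].map(
        lambda y: min(abs(x - y) for x in past))
    out['Is_Signal'] = out['Category_Distance'] >= threshold
    return out


views = pd.DataFrame({'Category': [2, 4, 5, 3]})
res = flag_drift_signal([1, 2], views)
```

Taking the minimum distance over all past categories means a view is only a signal if it is far from everything the viewer already watches, not just from the most recent category.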

For the `Not a noise` stage, the base rule is:
1. The viewer watches at least 3 times in a week, and each session lasts at least 30 minutes.

Special cases:

1. The viewer watches 3 times in a week but not every session lasts 30 minutes, and this viewer usually watches any stream for less than 30 minutes anyway.
2. The viewer watches fewer than 3 times in a week, but each session lasts at least 30 minutes, and this viewer watches every stream fewer than 3 times a week anyway.
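The base rule and the two special cases can be combined into one check. This sketch assumes `Watch_Duration` is measured in hours (so 0.5 means 30 minutes) and that the viewer's usual session length and usual weekly views per stream are computed elsewhere; `is_not_noise` and its parameters are hypothetical.

```python
from typing import Optional

import pandas as pd


def is_not_noise(week_views: pd.DataFrame,
                 min_times: int = 3,
                 min_hours: float = 0.5,
                 typical_hours: Optional[float] = None,
                 typical_weekly_times: Optional[float] = None) -> bool:
    """week_views: one row per view of the new content within a single week,
    with a Watch_Duration column assumed to be in hours (0.5 = 30 minutes)."""
    times = len(week_views)
    all_long = bool((week_views['Watch_Duration'] >= min_hours).all())
    # Base rule: at least 3 views in the week, each at least 30 minutes
    if times >= min_times and all_long:
        return True
    # Special case 1: enough views, and this viewer usually watches < 30 minutes anyway
    if times >= min_times and typical_hours is not None and typical_hours < min_hours:
        return True
    # Special case 2: fewer views, each long enough, and this viewer watches
    # every stream fewer than 3 times a week anyway
    if (0 < times < min_times and all_long
            and typical_weekly_times is not None
            and typical_weekly_times < min_times):
        return True
    return False
```

The special cases compare the new behavior against the viewer's own baseline, so a habitually short-session or low-frequency viewer is not filtered out by a rule tuned for average viewers.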

For the `Compatible/Transition` stage, we ask:\
**Does the viewer still watch the streams he watched before?**

If so (`Compatible`), we recommend both categories to this viewer.\
If not (`Transition`), we recommend the new category the viewer now watches.
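A minimal sketch of this decision, assuming `views_after_drift` holds the viewer's records after the drift reached the `Not a noise` stage. The `min_previous_views` cutoff (1 by default, i.e. any view of an earlier streamer counts) and the function name are assumptions for illustration.

```python
from typing import Iterable

import pandas as pd


def compatible_or_transition(views_after_drift: pd.DataFrame,
                             previous_ids: Iterable[int],
                             min_previous_views: int = 1) -> str:
    """Classify drift by whether the viewer still watches earlier streamers."""
    n_previous = int(views_after_drift['StreamerID'].isin(list(previous_ids)).sum())
    # Still watching earlier streamers: recommend both categories
    if n_previous >= min_previous_views:
        return 'Compatible'
    # Only the new content: recommend the new category
    return 'Transition'


after = pd.DataFrame({'StreamerID': [50, 35, 50, 15]})
```

Raising `min_previous_views` makes the test stricter, so that a single leftover view of an old streamer does not keep the viewer in the `Compatible` bucket.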

In [114]:
df_viewer = pd.DataFrame(
{
    'ViewerID' : [11] * 20 
})

In [115]:
import datetime

In [116]:
basis_date = datetime.datetime.strptime('20230101', '%Y%m%d')

In [117]:
watch_date = []
for _ in range(len(df_viewer)):
    basis_date += datetime.timedelta(days=1)
    watch_date += [basis_date]

In [118]:
df_viewer['Watch_Date'] = watch_date

In [119]:
view_streamID = [15] * 8 + [35] * (len(df_viewer) - 8)
np.random.shuffle(view_streamID)

In [120]:
df_viewer['StreamerID'] = view_streamID

In [121]:
watch_time = ['Morning'] * 2 + ['Afternoon'] * 3 + ['Night'] * 10 + ['MidNight'] * (len(df_viewer) - 2-3-10)
np.random.shuffle(watch_time)

In [122]:
df_viewer['Start_Watch_Time'] = watch_time 

In [123]:
watch_duration = [0.5] * 2 + [1] * 3 + [2] * 10 + [3] * (len(df_viewer) - 2-3-10)
np.random.shuffle(watch_duration)

In [124]:
df_viewer['Watch_Duration'] = watch_duration

In [125]:
df_viewer

|    | ViewerID | Watch_Date | StreamerID | Start_Watch_Time | Watch_Duration |
|---:|---:|---|---:|---|---:|
| 0 | 11 | 2023-01-02 | 15 | Morning | 2.0 |
| 1 | 11 | 2023-01-03 | 15 | Night | 2.0 |
| 2 | 11 | 2023-01-04 | 15 | Night | 1.0 |
| 3 | 11 | 2023-01-05 | 35 | Night | 2.0 |
| 4 | 11 | 2023-01-06 | 35 | MidNight | 2.0 |
| 5 | 11 | 2023-01-07 | 15 | Night | 3.0 |
| 6 | 11 | 2023-01-08 | 15 | Afternoon | 0.5 |
| 7 | 11 | 2023-01-09 | 35 | MidNight | 2.0 |
| 8 | 11 | 2023-01-10 | 35 | Night | 2.0 |
| 9 | 11 | 2023-01-11 | 35 | MidNight | 1.0 |


## Same ViewerID in another time period

In [126]:
df_viewer_2 = df_viewer.copy()

In [127]:
basis_date_2 = datetime.datetime.strptime('20230601', '%Y%m%d')
watch_date_2 = []
for _ in range(len(df_viewer_2)):
    basis_date_2 += datetime.timedelta(days=1)
    watch_date_2 += [basis_date_2]

In [128]:
df_viewer_2['Watch_Date'] = watch_date_2

In [132]:
view_streamID_2 = [15] * 4 + [35] * 10 + [50] * (len(df_viewer_2)-4-10)
np.random.shuffle(view_streamID_2)

In [133]:
df_viewer_2['StreamerID'] = view_streamID_2

In [134]:
df_viewer_2

|    | ViewerID | Watch_Date | StreamerID | Start_Watch_Time | Watch_Duration |
|---:|---:|---|---:|---|---:|
| 0 | 11 | 2023-06-02 | 35 | Morning | 2.0 |
| 1 | 11 | 2023-06-03 | 50 | Night | 2.0 |
| 2 | 11 | 2023-06-04 | 50 | Night | 1.0 |
| 3 | 11 | 2023-06-05 | 50 | Night | 2.0 |
| 4 | 11 | 2023-06-06 | 35 | MidNight | 2.0 |
| 5 | 11 | 2023-06-07 | 50 | Night | 3.0 |
| 6 | 11 | 2023-06-08 | 15 | Afternoon | 0.5 |
| 7 | 11 | 2023-06-09 | 35 | MidNight | 2.0 |
| 8 | 11 | 2023-06-10 | 35 | Night | 2.0 |
| 9 | 11 | 2023-06-11 | 50 | MidNight | 1.0 |


## Drift Signal:

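We assume that `StreamerID = 50` has low similarity with `StreamerID = 15` and `StreamerID = 35`. For simplicity, the code treats any streamer the viewer did not watch in the earlier period as a low-similarity streamer.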

In [142]:
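# Views of streamers not watched in the earlier period; .copy() makes this an
# independent DataFrame, so adding a column later does not raise SettingWithCopyWarning
signal = df_viewer_2.loc[~df_viewer_2.StreamerID.isin(df_viewer.StreamerID)].copy()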

In [143]:
signal

|    | ViewerID | Watch_Date | StreamerID | Start_Watch_Time | Watch_Duration |
|---:|---:|---|---:|---|---:|
| 1 | 11 | 2023-06-03 | 50 | Night | 2.0 |
| 2 | 11 | 2023-06-04 | 50 | Night | 1.0 |
| 3 | 11 | 2023-06-05 | 50 | Night | 2.0 |
| 5 | 11 | 2023-06-07 | 50 | Night | 3.0 |
| 9 | 11 | 2023-06-11 | 50 | MidNight | 1.0 |
| 14 | 11 | 2023-06-16 | 50 | Night | 2.0 |


From the `signal` DataFrame we can see that this viewer's drift starts on `2023-06-03`.

In [150]:
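# For each low-similarity view, count the low-similarity views in the trailing
# 7 days up to and including it (a rolling count, despite the column name)
signal['Cum_Signal'] = signal.rolling(window='7D', on='Watch_Date', min_periods=1)['StreamerID'].count()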



In [152]:
signal

|    | ViewerID | Watch_Date | StreamerID | Start_Watch_Time | Watch_Duration | Cum_Signal |
|---:|---:|---|---:|---|---:|---:|
| 1 | 11 | 2023-06-03 | 50 | Night | 2.0 | 1.0 |
| 2 | 11 | 2023-06-04 | 50 | Night | 1.0 | 2.0 |
| 3 | 11 | 2023-06-05 | 50 | Night | 2.0 | 3.0 |
| 5 | 11 | 2023-06-07 | 50 | Night | 3.0 | 4.0 |
| 9 | 11 | 2023-06-11 | 50 | MidNight | 1.0 | 3.0 |
| 14 | 11 | 2023-06-16 | 50 | Night | 2.0 | 2.0 |


In [153]:
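# Keep views with at least 3 low-similarity views in the trailing 7 days
# (the 30-minute duration condition is not checked here)
Not_A_Noise = signal[signal.Cum_Signal >= 3]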

In [154]:
Not_A_Noise

|    | ViewerID | Watch_Date | StreamerID | Start_Watch_Time | Watch_Duration | Cum_Signal |
|---:|---:|---|---:|---|---:|---:|
| 3 | 11 | 2023-06-05 | 50 | Night | 2.0 | 3.0 |
| 5 | 11 | 2023-06-07 | 50 | Night | 3.0 | 4.0 |
| 9 | 11 | 2023-06-11 | 50 | MidNight | 1.0 | 3.0 |


From `2023-06-05` on, the drift signal enters the `Not a noise` stage.

## Compatible or Transition?

In [155]:
df_viewer_2[(df_viewer_2.Watch_Date > '2023-06-05') & (df_viewer_2.StreamerID.isin(df_viewer.StreamerID))]

|    | ViewerID | Watch_Date | StreamerID | Start_Watch_Time | Watch_Duration |
|---:|---:|---|---:|---|---:|
| 4 | 11 | 2023-06-06 | 35 | MidNight | 2.0 |
| 6 | 11 | 2023-06-08 | 15 | Afternoon | 0.5 |
| 7 | 11 | 2023-06-09 | 35 | MidNight | 2.0 |
| 8 | 11 | 2023-06-10 | 35 | Night | 2.0 |
| 10 | 11 | 2023-06-12 | 15 | Night | 0.5 |
| 11 | 11 | 2023-06-13 | 35 | Night | 2.0 |
| 12 | 11 | 2023-06-14 | 35 | MidNight | 3.0 |
| 13 | 11 | 2023-06-15 | 35 | Afternoon | 2.0 |
| 15 | 11 | 2023-06-17 | 15 | Afternoon | 3.0 |
| 16 | 11 | 2023-06-18 | 35 | Night | 1.0 |


Since this viewer still frequently watches the channels he watched before, we conclude that this is a `Compatible` case.

Next, we discuss how to measure donations.

## Measuring Donations

In [1]:
n = 10
view_id = []
for i in range(n):
    view_id += [i+1] * 10

In [4]:
df = pd.DataFrame(
{
    'ViewerID' : view_id
})

In [6]:
rng = np.random.default_rng(32)

In [13]:
stream_id = [1]*10 + [2]* 10 + [3] *10 + [4] * 10 + [5] *10 + [6] * 9 + [7] * 1 + [8] * 8 + [9] * 2 + [10] * 7 + [11] * 3 \
+ [12] * 7 + [13] * 2 + [14] * 1 + [15] * 6 + [16] * 2 + [17] * 2 

In [14]:
df['StreamerID'] = stream_id

In [16]:
arr_type = [1] * 50 + [2] * 30 + [3] * 20
np.random.shuffle(arr_type)

In [17]:
df['Category'] = arr_type

In [54]:
arr_amount = []
arr_time = []

In [46]:
amount_dict = {1 : 100, 2:1000, 3: 10000}

In [49]:
amount_dict[3]

10000

In [55]:
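for i in range(10):  # one iteration per viewer (10 records each)
    # 0 = no donation, 1 = donation; sorting puts all 0s before the 1s
    donation_YN = rng.choice(2, 10, replace=True)
    donation_YN.sort()
    times = int(donation_YN.sum())
    # Distinct donation minutes in [1, 120], in ascending order
    timing = rng.choice(120, times, replace=False) + 1
    timing.sort()
    for YN in donation_YN:
        if YN == 0:
            arr_amount += [0]
            arr_time += [None]
        else:
            # Amount tier 1, 2 or 3 maps to 100, 1000 or 10000
            amount = int(rng.choice(3, 1)[0]) + 1
            arr_amount += [amount_dict[amount]]
    # The donation times line up with the trailing 1s because donation_YN is sorted
    arr_time += list(timing)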

In [56]:
df['Donate_Amount'] = arr_amount
df['Donate_Time'] = arr_time

In [57]:
df

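|    | ViewerID | StreamerID | Category | Donate_Amount | Donate_Time |
|---:|---:|---:|---:|---:|---:|
| 0 | 1 | 1 | 1 | 0 | NaN |
| 1 | 1 | 1 | 3 | 0 | NaN |
| 2 | 1 | 1 | 1 | 0 | NaN |
| 3 | 1 | 1 | 1 | 0 | NaN |
| 4 | 1 | 1 | 1 | 0 | NaN |
| ... | ... | ... | ... | ... | ... |
| 95 | 10 | 15 | 2 | 10000 | 9.0 |
| 96 | 10 | 16 | 1 | 10000 | 17.0 |
| 97 | 10 | 16 | 3 | 1000 | 51.0 |
| 98 | 10 | 17 | 3 | 1000 | 81.0 |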


In [61]:
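# 'sum' totals the donations; len counts all records in the group,
# including records without a donation (where Donate_Time is NaN)
df.groupby(by=['Category', 'ViewerID']).agg(
    {'Donate_Amount': 'sum', 'Donate_Time': len}
)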

| Category | ViewerID | Donate_Amount | Donate_Time |
|---:|---:|---:|---:|
| 1 | 1 | 10000 | 7.0 |
| 1 | 2 | 300 | 5.0 |
| 1 | 3 | 0 | 5.0 |
| 1 | 4 | 20200 | 7.0 |
| 1 | 5 | 11200 | 4.0 |
| 1 | 6 | 10200 | 6.0 |
| 1 | 7 | 100 | 2.0 |
| 1 | 8 | 11100 | 3.0 |
| 1 | 9 | 11000 | 5.0 |
| 1 | 10 | 10100 | 6.0 |


In [63]:
df.groupby(
by = ['Category','Donate_Amount'])['Donate_Amount'].count()

Category  Donate_Amount
1         0                27
          100              12
          1000              3
          10000             8
2         0                14
          100               4
          1000              6
          10000             6
3         0                10
          100               3
          1000              3
          10000             4
Name: Donate_Amount, dtype: int64
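The tables above report totals and counts. A sketch of a few per-group metrics that separate how often viewers donate from how much they give, assuming a `Donate_Amount` of 0 means no donation (as in the simulation). `donation_metrics` and the metric names are hypothetical; pass `by='StreamerID'` or `by='ViewerID'` for other groupings.

```python
import pandas as pd


def donation_metrics(df: pd.DataFrame, by: str = 'Category') -> pd.DataFrame:
    """Per-group donation summary; a Donate_Amount of 0 means no donation."""
    donated = df['Donate_Amount'] > 0
    summary = (
        df.assign(Donated=donated,
                  # Keep only actual donations so the mean ignores the zeros
                  Donation=df['Donate_Amount'].where(donated))
          .groupby(by)
          .agg(Records=('Donate_Amount', 'size'),
               Donations=('Donated', 'sum'),
               Total_Amount=('Donate_Amount', 'sum'),
               Avg_Donation=('Donation', 'mean'))
    )
    # Share of records that contain a donation
    summary['Donation_Rate'] = summary['Donations'] / summary['Records']
    return summary


toy = pd.DataFrame({'Category': [1, 1, 1, 2, 2],
                    'Donate_Amount': [0, 100, 1000, 0, 0]})
m = donation_metrics(toy)
```

Keeping `Donation_Rate` and `Avg_Donation` apart shows whether a group's total comes from many small donors or a few large ones; a group with no donations gets `NaN` for `Avg_Donation`.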

In [64]:
# Same aggregation per streamer: 'sum' totals donations, len counts all records
df.groupby(by=['Category', 'StreamerID']).agg(
    {'Donate_Amount': 'sum', 'Donate_Time': len}
)

| Category | StreamerID | Donate_Amount | Donate_Time |
|---:|---:|---:|---:|
| 1 | 1 | 10000 | 7.0 |
| 1 | 2 | 300 | 5.0 |
| 1 | 3 | 0 | 5.0 |
| 1 | 4 | 20200 | 7.0 |
| 1 | 5 | 11200 | 4.0 |
| 1 | 6 | 10100 | 5.0 |
| 1 | 7 | 100 | 1.0 |
| 1 | 8 | 100 | 2.0 |
| 1 | 10 | 1100 | 2.0 |
| 1 | 11 | 10000 | 1.0 |
