# The Participants of a Live Stream

We can generally enumerate three categories of participants:

1. User/Watcher

2. Streamer

3. Company

These three categories of participants care about different things, and their interests sometimes conflict.\
I will analyze each participant in turn, set a goal or pose a question, \
and see how data science can be applied to answer the question and achieve the goal.

## First, we start with the `User`.

What the user cares about: `he/she wants to see what he/she wants to see.`

At first glance, this tells us nothing.

So we ask a further question: `what does the user really want to see?`

This is still not specific, so we list some cases to discuss in more detail.

## 1. First-Time User

This is related to the so-called `cold-start` problem in recommender systems.

### Case1 : First-time use, knows what he/she wants to watch.

In this case, try to have the user leave a footprint or some record, such as

1. stream type, e.g. `Talk Show`, `Gaming`, `IRL`
2. streamer type, etc.

so that these attributes can be used for future recommendations.

### Case2 : First-time use, doesn't know what he/she wants to watch.

In this case, try to have the user leave a footprint or some record, such as

1. stream type, e.g. `Talk Show`, `Gaming`, `IRL`
2. streamer type, etc.

so that these attributes can be used for recommendations.

## What Can Data Science Do in Case2?

In Case2, if the user knows which type of stream he likes, we can do the following:

Suppose that there are 5 types of stream, named 1 to 5.\
We also have ratings of the stream types from 30 users \
(we assume explicit ratings here, which is not the case in general; we elaborate on this in the `streamer` part)\
(note that there may be null values, since a user may not have watched every type of stream).\
We create the artificial data as follows:

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.DataFrame(index=range(1,31),columns=range(1,6), dtype='int')

In [3]:
df.columns = ['1','2','3','4','5']

In [4]:
rng = np.random.default_rng(32)

In [5]:
len(df.columns)

5

In [6]:
# draw each rating uniformly from 0..4; 0 will later stand for "not rated"
for i in range(1,len(df.columns)+1):
    df[str(i)] = rng.choice(5, 30, replace=True)

In [7]:
df

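|  | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 4 | 0 | 2 | 2 | 3 |
| 2 | 0 | 2 | 1 | 1 | 1 |
| 3 | 4 | 3 | 0 | 1 | 1 |
| 4 | 2 | 2 | 2 | 3 | 3 |
| 5 | 2 | 3 | 0 | 0 | 2 |
| 6 | 1 | 0 | 0 | 1 | 1 |
| 7 | 1 | 3 | 4 | 2 | 0 |
| 8 | 1 | 0 | 4 | 2 | 3 |
| 9 | 3 | 4 | 0 | 0 | 2 |
| 10 | 3 | 1 | 3 | 4 | 0 |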


In [8]:
# treat a rating of 0 as "not watched / not rated" and mark it as missing
df = df.replace(0,np.nan)

In [9]:
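# each user's mean rating over the stream types he rated (NaN is skipped)
row_mean = df.mean(axis=1)

In [10]:
# subtract each user's mean from his ratings, column by column
df_mean_center = df.apply(lambda x: x-row_mean)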

In [11]:
df_mean_center

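|  | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 1.25 | NaN | -0.75 | -0.75 | 0.25 |
| 2 | NaN | 0.75 | -0.25 | -0.25 | -0.25 |
| 3 | 1.75 | 0.75 | NaN | -1.25 | -1.25 |
| 4 | -0.4 | -0.4 | -0.4 | 0.6 | 0.6 |
| 5 | -0.333333 | 0.666667 | NaN | NaN | -0.333333 |
| 6 | 0.0 | NaN | NaN | 0.0 | 0.0 |
| 7 | -1.5 | 0.5 | 1.5 | -0.5 | NaN |
| 8 | -1.5 | NaN | 1.5 | -0.5 | 0.5 |
| 9 | 0.0 | 1.0 | NaN | NaN | -1.0 |
| 10 | 0.25 | -1.75 | 0.25 | 1.25 | NaN |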


Suppose a new user with ID = 31 rates stream types `2` and `5` as `3`. How can we estimate his ratings for the other types?

We can do this with `Collaborative Filtering`.

Question1 : how do we measure the similarity between UserID = 31 and the existing users?
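
One common answer is cosine similarity between mean-centered rating vectors. As a sketch in notation that does not appear in the original notebook (the symbols $r_{u,i}$, $\bar{r}_u$ and $x_{u,i}$ are introduced here only for illustration), let $r_{u,i}$ be the rating of user $u$ for stream type $i$ and $\bar{r}_u$ that user's mean over the types he rated:

$$
x_{u,i} =
\begin{cases}
r_{u,i} - \bar{r}_u & \text{if } u \text{ rated } i \\
0 & \text{otherwise}
\end{cases}
\qquad
\operatorname{sim}(u, v) = \frac{\sum_{i \in I} x_{u,i}\, x_{v,i}}{\sqrt{\sum_{i \in I} x_{u,i}^2}\,\sqrt{\sum_{i \in I} x_{v,i}^2}}
$$

Here $I$ is the set of stream types the new user rated, i.e. types `2` and `5`. Note that in the cells that follow, the new user's row holds the raw ratings $(3, 3)$: centering two equal ratings would give the zero vector, for which this ratio is undefined.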

In [12]:
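# keep only the two stream types the new user rated
compute_df = df_mean_center.loc[:,['2','5']]

In [13]:
# put the new user's raw ratings (3, 3) in row 0, above the existing users
compute_df = pd.concat([pd.DataFrame([[3,3]], columns=compute_df.columns), compute_df])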

In [14]:
compute_df

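|  | 2 | 5 |
|---|---|---|
| 0 | 3.0 | 3.0 |
| 1 | NaN | 0.25 |
| 2 | 0.75 | -0.25 |
| 3 | 0.75 | -1.25 |
| 4 | -0.4 | 0.6 |
| 5 | 0.666667 | -0.333333 |
| 6 | NaN | 0.0 |
| 7 | 0.5 | NaN |
| 8 | NaN | 0.5 |
| 9 | 1.0 | -1.0 |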


In [15]:
from sklearn.metrics.pairwise import cosine_similarity

In [16]:
# cosine similarity between every pair of rows; missing ratings are treated as 0
result_df = pd.DataFrame(cosine_similarity(compute_df.fillna(0)),index = compute_df.index,columns = compute_df.index)

In [17]:
result_df

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.707107 | 0.447214 | -0.242536 | 0.196116 | 0.316228 | 0.0 | 0.707107 | 0.707107 | 1.014654e-17 | ... | -0.707107 | 1.0 | -0.707107 | 0.0 | 0.447214 | 0.707107 | -0.707107 | 0.0 | -0.707107 | -0.707107 |
| 1 | 0.7071068 | 1.0 | -0.316228 | -0.857493 | 0.83205 | -0.447214 | 0.0 | 0.0 | 1.0 | -0.7071068 | ... | -1.0 | 0.7071068 | 0.0 | 0.0 | 0.948683 | 1.0 | -1.0 | 0.0 | -1.0 | 0.0 |
| 2 | 0.4472136 | -0.316228 | 1.0 | 0.759257 | -0.789352 | 0.989949 | 0.0 | 0.948683 | -0.316228 | 0.8944272 | ... | 0.316228 | 0.4472136 | -0.948683 | 0.0 | -0.6 | -0.316228 | 0.316228 | 0.0 | 0.316228 | -0.948683 |
| 3 | -0.2425356 | -0.857493 | 0.759257 | 1.0 | -0.998868 | 0.843661 | 0.0 | 0.514496 | -0.857493 | 0.9701425 | ... | 0.857493 | -0.2425356 | -0.514496 | 0.0 | -0.976187 | -0.857493 | 0.857493 | 0.0 | 0.857493 | -0.514496 |
| 4 | 0.1961161 | 0.83205 | -0.789352 | -0.998868 | 1.0 | -0.868243 | 0.0 | -0.5547 | 0.83205 | -0.9805807 | ... | -0.83205 | 0.1961161 | 0.5547 | 0.0 | 0.964764 | 0.83205 | -0.83205 | 0.0 | -0.83205 | 0.5547 |
| 5 | 0.3162278 | -0.447214 | 0.989949 | 0.843661 | -0.868243 | 1.0 | 0.0 | 0.894427 | -0.447214 | 0.9486833 | ... | 0.447214 | 0.3162278 | -0.894427 | 0.0 | -0.707107 | -0.447214 | 0.447214 | 0.0 | 0.447214 | -0.894427 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.7071068 | 0.0 | 0.948683 | 0.514496 | -0.5547 | 0.894427 | 0.0 | 1.0 | 0.0 | 0.7071068 | ... | 0.0 | 0.7071068 | -1.0 | 0.0 | -0.316228 | 0.0 | 0.0 | 0.0 | 0.0 | -1.0 |
| 8 | 0.7071068 | 1.0 | -0.316228 | -0.857493 | 0.83205 | -0.447214 | 0.0 | 0.0 | 1.0 | -0.7071068 | ... | -1.0 | 0.7071068 | 0.0 | 0.0 | 0.948683 | 1.0 | -1.0 | 0.0 | -1.0 | 0.0 |
| 9 | 1.014654e-17 | -0.707107 | 0.894427 | 0.970143 | -0.980581 | 0.948683 | 0.0 | 0.707107 | -0.707107 | 1.0 | ... | 0.707107 | 1.014654e-17 | -0.707107 | 0.0 | -0.894427 | -0.707107 | 0.707107 | 0.0 | 0.707107 | -0.707107 |


We can get the top-k most similar users w.r.t. UserID = 31 (here k = 10) as

In [18]:
# row 0 is the new user; skip column 0 (self-similarity) and keep the 10 highest
top_10_similar_user = result_df.loc[0,1:].sort_values(ascending=False)[:10]

In [19]:
top_10_similar_user

22    1.000000
20    0.813733
1     0.707107
8     0.707107
26    0.707107
17    0.707107
12    0.707107
16    0.707107
7     0.707107
2     0.447214
Name: 0, dtype: float64

## Then we can recommend what these users watch to UserID = 31.
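
One way to turn the similar users into concrete recommendations is a similarity-weighted average of their mean-centered ratings, added back onto the new user's own mean. The following is only a minimal sketch under that assumption: the function `predict_ratings`, its parameters, and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def predict_ratings(centered, similarity, user_mean, k=3):
    """Similarity-weighted average of the neighbours' mean-centered ratings."""
    # keep the k most similar users, ignoring non-positive similarities
    neighbours = similarity[similarity > 0].nlargest(k)
    block = centered.loc[neighbours.index]
    # weighted sum per stream type; unrated (NaN) cells are skipped by sum()
    numer = block.mul(neighbours, axis=0).sum(axis=0)
    # total weight of the neighbours who actually rated each stream type
    denom = block.notna().astype(float).mul(neighbours, axis=0).sum(axis=0)
    # a type that no neighbour rated gets NaN instead of a division by zero
    return user_mean + numer / denom.replace(0, np.nan)

# toy data (hypothetical): mean-centered ratings of 3 users for types '1' and '3'
centered = pd.DataFrame({'1': [1.0, -1.0, np.nan], '3': [-1.0, 1.0, 0.5]}, index=[1, 2, 3])
similarity = pd.Series([1.0, 0.5, -0.2], index=[1, 2, 3])

predicted = predict_ratings(centered, similarity, user_mean=3.0, k=2)
print(predicted.sort_values(ascending=False))
```

Stream types with the highest predicted rating would be recommended first. Dropping non-positive similarities is a design choice of this sketch, not something the notebook prescribes.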

## Follow-up questions:
1. What if the new user cannot say which type of stream he likes?
2. If the similarities of the top 10 similar users are low, how do we deal with it?

For question 1, rather than guessing at random what the new user would like,\
we can consider the following:

1. The most popular streamers.
2. The most popular topics `at that time`.
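
The "most popular at that time" idea can be sketched as a count of views inside a recent time window. This is an illustration only: the view log format (a list of `(timestamp, stream_type)` pairs), the function `popular_now`, and the one-hour default window are assumptions, not something defined in this notebook.

```python
from collections import Counter
from datetime import datetime, timedelta

def popular_now(view_log, now, window=timedelta(hours=1), top_n=3):
    """Return up to top_n stream types ranked by view count within the window."""
    recent = [stream_type for ts, stream_type in view_log
              if now - window <= ts <= now]
    return [stream_type for stream_type, _ in Counter(recent).most_common(top_n)]

# hypothetical view log
now = datetime(2024, 1, 1, 20, 0)
view_log = [
    (now - timedelta(minutes=5), "Gaming"),
    (now - timedelta(minutes=10), "Gaming"),
    (now - timedelta(minutes=20), "IRL"),
    (now - timedelta(hours=3), "Talk Show"),
    (now - timedelta(hours=4), "Talk Show"),
    (now - timedelta(hours=5), "Talk Show"),
]

trending = popular_now(view_log, now)                            # last hour only
overall = popular_now(view_log, now, window=timedelta(hours=6))  # longer window
print(trending, overall)
```

A short window favours what is trending right now, while a long window approaches overall popularity; the same counting works for streamers instead of stream types.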

For question 2, we consider the following cases:

## Case1 : Enough similar users to recommend.

Then there is no problem, provided we set the similarity threshold carefully.
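
A minimal sketch of such a threshold check is shown here. The function `select_neighbours`, the threshold values, and the minimum neighbour count are hypothetical choices for illustration; the similarity values are taken from the top-10 output above.

```python
import pandas as pd

def select_neighbours(similarity, threshold=0.5, min_neighbours=3):
    """Return users with similarity >= threshold, or None if there are too few."""
    candidates = similarity[similarity >= threshold].sort_values(ascending=False)
    if len(candidates) < min_neighbours:
        return None  # the caller should fall back, e.g. to popularity-based picks
    return candidates

similarity = pd.Series({22: 1.0, 20: 0.813733, 1: 0.707107, 2: 0.447214})

strict = select_neighbours(similarity, threshold=0.9)  # only user 22 passes
loose = select_neighbours(similarity, threshold=0.7)   # users 22, 20 and 1 pass
print(strict, loose, sep="\n")
```

A threshold that is too strict leaves too few neighbours, while one that is too loose lets weakly related users influence the recommendation.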

## Case2 : Few or even no similar users.

There are some possible reasons for this situation:
* Random Sample of Data:

In this case, we can try re-sampling users from the database.

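* Model Issue:

For the rating matrix, we can also try correlation instead of cosine similarity.\
Also, cosine similarity is more informative in higher dimensions: with only two rated stream types, as here, many users collapse onto the same few similarity values (note the repeated 0.707107 in the top-10 list).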

* New User Is Indeed Different from Others:

In this case, it could be that

1. `0 similarity` : then we can try to recommend some marginal stream types, including
    - new types that have emerged recently
    - existing types with a small community.

    But we need to be careful, since this could drift toward abnormal content such as gory or adult content.

2. `-1 similarity` : then we can try to recommend the opposite of the stream types that the existing users watch.
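
For the model issue listed above, Pearson correlation over the stream types that both users rated is one alternative to cosine similarity. The sketch below assumes raw (not mean-centered) ratings with `NaN` for unrated types; the function `pearson_on_corated`, the `min_overlap` parameter, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def pearson_on_corated(ratings, new_user, min_overlap=2):
    """Pearson correlation between new_user and each row, over co-rated types only."""
    sims = {}
    for user_id, row in ratings.iterrows():
        both = row.notna() & new_user.notna()   # types rated by both users
        if both.sum() < min_overlap:
            sims[user_id] = np.nan              # not enough shared ratings
            continue
        sims[user_id] = row[both].corr(new_user[both])
    return pd.Series(sims)

# toy data (hypothetical): 3 existing users, 3 stream types
ratings = pd.DataFrame(
    {'1': [1.0, 5.0, np.nan], '2': [2.0, 4.0, 3.0], '3': [3.0, 3.0, np.nan]},
    index=[1, 2, 3],
)
new_user = pd.Series({'1': 2.0, '2': 3.0, '3': 4.0})

sims = pearson_on_corated(ratings, new_user)
print(sims)
```

Correlation needs some variation in the ratings: for a user who gave the same rating to every type, such as UserID = 31 with two ratings of `3`, it is undefined, so more ratings would have to be collected first.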

## So far, we have guided the new user to what he wants to watch, assuming everything works properly.

Now we move on to the next type of user.

## 2. Active Users

For these users we care about 2 things:
* Keep him watching the channel over the long term (not just for one day)
* Keep him using the platform over the long term (not just for one day)

There is a slight difference between these two, which shows up in the question:\
**If the user is no longer watching the channel he used to watch frequently, what can we do to help him find another channel he also likes?**

For the first point, we elaborate in the `streamer` notebook. \
For the second point, we elaborate in the `company` notebook.

Next, please see the `streamer` notebook.