# Twitch Stream Recommendation
**Name:** Matthew Susanto  
**Project:** Twitch Live stream recommendation

In [None]:
# Data
import pandas as pd
import numpy as np

# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
import networkx as nx
from pyvis.network import Network

# Model Training
from sklearn.model_selection import KFold

# Misc
import pickle

import warnings
warnings.filterwarnings("ignore")

sns.set_theme()

# Introduction
According to Forbes, live streaming grew 99% year over year in 2020. As live streaming grows, companies such as Chipotle, KFC, and Lexus are investing in sponsoring streamers. These sponsorships range from large events to smaller ad segments.  

Streamers generally stream for about 3 to 6 hours, and some stream for as long as 10 hours. Streams this long make it difficult to retain an audience, so streamers strive to build a connection with their viewers, and the content and entertainment value of a stream are key to keeping viewers watching. The biggest difference between video platforms like YouTube and live streaming platforms like Twitch is their content. A YouTube video usually has a single focus; a gaming video, for example, is typically about one game. A Twitch stream, on the other hand, can cover several kinds of content: a streamer might react to videos and then play three different games. Recommending streams with the same methods used for YouTube videos would therefore not be appropriate.  

In this paper, I will recommend new streamers based on my viewing habits. To achieve this, I will need to:
1. Collect personal data and streamer data
2. Find connections between streamers and chatters
3. Create a recommendation index  

# Data collection
Data collection is probably the most important part of recommending streams. There are two parts to it: my personal viewing habits and live stream data. For my personal viewing habits, I need data about the streamers I follow and the minutes I have watched each streamer; minutes watched indicate how much I like a given streamer. This data can be obtained by requesting a copy of my personal data from Twitch.  

The larger portion of the data collection step was collecting data about each streamer. Since there are over a million streamers on Twitch, I decided to use only the top 100 streamers on the platform, ranked by number of subscriptions. The top 100 list was scraped from websites such as SocialBlade and TwitchTracker using Beautiful Soup. With the list of streamers, we can gather more information about their streams: the chatters (the people in the chat box), the games played during the stream, the title, the tags, and the time of the stream. This data comes from the Twitch API, which exposes public data about a given stream. I used two APIs: the first gathers data about viewers/chatters, and the second gathers the stream metadata.  

The next step is scheduling the data collection. Since we need data for many streams, we need a way to know when a streamer goes live. One option is to open a WebSocket connection and receive a notification when a streamer goes live. However, this would only capture data at the start of each stream, and, as mentioned above, the content of a stream tends to change over time. We therefore need to gather data multiple times during a stream, which can be done by collecting it on a fixed schedule with a cron scheduler. A cron scheduler runs a script at set intervals; for this project I ran the collection script every 3 hours. The results were saved to a MongoDB database, since raw text files would take up more space. A database also avoids errors from repeatedly opening and closing files, because it only requires a single persistent connection. MongoDB offers about 500 MB of free storage. Within this constraint, I ran the scheduler for about two weeks, which gathered about 1,800 stream records.
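As a minimal sketch of what each scheduled run might store, the function below merges one stream's metadata with its chatter list into a single document. The field names (user_login, game_name, title, tags, viewers) and the assumption that chatters arrive grouped by role are illustrative assumptions, not the exact shape of the responses used in this project; build_snapshot and taken_at are hypothetical names.

```python
from datetime import datetime, timezone

# Hypothetical crontab entry for a 3-hour schedule (minute 0 of every third hour):
#   0 */3 * * * python collect_snapshot.py

def build_snapshot(stream, chatters_by_role, taken_at=None):
    """Combine stream metadata and chatters into one document (hypothetical helper).

    `stream` is assumed to be one stream-metadata entry (user_login, game_name,
    title, tags); `chatters_by_role` is assumed to map a role name such as
    "moderators" or "viewers" to a list of usernames.
    """
    # A chatter listed under several roles is stored only once
    viewers = sorted({name for names in chatters_by_role.values() for name in names})
    return {
        "user_login": stream["user_login"],
        "game_name": stream.get("game_name"),
        "title": stream.get("title"),
        "tags": stream.get("tags", []),
        "viewers": viewers,
        "taken_at": (taken_at or datetime.now(timezone.utc)).isoformat(),
    }

# Toy input, not collected data
snapshot = build_snapshot(
    {"user_login": "streamer_a", "game_name": "Just Chatting", "title": "example"},
    {"moderators": ["mod_a"], "viewers": ["viewer_b", "mod_a"]},
    taken_at=datetime(2021, 10, 1, tzinfo=timezone.utc),
)
# With pymongo, a document like this could be stored via collection.insert_one(snapshot)
```

Storing one document per stream per run keeps every 3-hour sample separate, which is what later allows counting how often each streamer appeared under each game.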

# Part 1: Exploration
The first part of the report is exploration. I will focus on my personal data and on creating connections between streamers and chatters. Before exploring, I need to clean and analyze some of the datasets I received from Twitch. My personal data request returned six datasets: ads, chats_cheers_sub_notifications, follow_unfollow, minutes_watched, pages_viewed, and videos_played.

In [None]:
PATH = "../data/"
minutes_watched = "pretator21_minutes_watched.csv"
follow_unfollow = "pretator21_follow_unfollow.csv"
video_played = "pretator21_video_s_played.csv"
page_viewed = "pretator21_pages_viewed.csv"

In [None]:
min_watched_data = pd.read_csv(PATH+minutes_watched)
fol_unfol_data = pd.read_csv(PATH+follow_unfollow)
vid_played_data = pd.read_csv(PATH+video_played)
page_view_data = pd.read_csv(PATH+page_viewed)

In [None]:
min_watched_data.head()

In [None]:
fol_unfol_data.head()

In [None]:
vid_played_data.head()

In [None]:
page_view_data.head()

I will only use minutes_watched and follow_unfollow, since the other datasets say little about the streamers I watch. First, we can clean and analyze the follow_unfollow dataset, starting by checking whether it contains any event types other than follow and unfollow.

In [None]:
fol_unfol_data["event_type"].unique()

Since the dataset also records unfollows, I will delete the rows for channels that I unfollowed.

In [None]:
fol_unfol_data[fol_unfol_data["event_type"] == 'unfollow']

In [None]:
unfollowed = list(fol_unfol_data[fol_unfol_data["event_type"] == "unfollow"]["channel"])
channels_unfollowed = fol_unfol_data[fol_unfol_data["channel"].isin(unfollowed)]
channels_unfollowed

We can see that the only channel that I followed after unfollowing is "dspstanky". So, we should remove all unfollowed channels from the data and only focus on those that are followed. For channels like "dspstanky", I will keep only the latest follow.
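The cell below drops hand-picked row indices, which only works for this particular export. A minimal sketch of a rule-based alternative, assuming each event has a timestamp that sorts correctly: replay the events in time order, keep the latest event per channel, and keep the channels whose latest event is a follow. The channel names and dates are hypothetical toy data.

```python
def currently_followed(events):
    """Return channels whose most recent event is a follow.

    `events` is a list of (timestamp, channel, event_type) tuples; timestamps
    only need to sort correctly (e.g. ISO-8601 strings). Events with identical
    timestamps are ordered by the tuple sort, so real data should use full
    timestamps rather than dates.
    """
    latest = {}
    for timestamp, channel, event_type in sorted(events):
        latest[channel] = event_type  # later events overwrite earlier ones
    return sorted(channel for channel, event in latest.items() if event == "follow")

# Hypothetical events: a re-follow, a plain unfollow, and a duplicated follow
events = [
    ("2021-01-05", "channel_a", "follow"),
    ("2021-02-10", "channel_a", "unfollow"),
    ("2021-03-01", "channel_a", "follow"),
    ("2021-01-07", "channel_b", "follow"),
    ("2021-04-02", "channel_b", "unfollow"),
    ("2021-05-01", "channel_c", "follow"),
    ("2021-05-01", "channel_c", "follow"),
]
followed = currently_followed(events)  # ['channel_a', 'channel_c']
```

Because the result is keyed by channel, duplicated follows such as channel_c's also collapse to a single entry.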

In [None]:
follow_data = fol_unfol_data.drop([2,7,18,19,23,54])
follow_data["event_type"].unique()

Now that the dataset includes only followed streamers, we can continue the exploration. Another aspect of the data that needs cleaning is duplicate values. Since a user can follow a given streamer only once, multiple follows without an unfollow in between should be impossible. However, the dataset contains such duplicates.

In [None]:
follow_data["channel"].value_counts()

In [None]:
follow_data[follow_data["channel"] == "valorant"]

The duplicate follows of the same streamer all occur on the same date. Since there are no unfollows between them, they are probably the result of a logging glitch. To keep the data clean, I will keep only unique streamers. This does not affect the analysis, since duplicate follows carry no extra meaning.

In [None]:
unique_channels = follow_data["channel"].unique()
cleaned_follow = follow_data.drop_duplicates(subset=["channel"])
cleaned_follow["channel"].value_counts()

Let's look at the channels sorted by followed date.

In [None]:
cleaned_follow[["day", "channel"]].sort_values(by="day")

Next, we should look at the followed channels sorted by the total watch time. This should provide more insights on which channels I enjoy watching. To do this I will use the "minutes_watched" dataset.

In [None]:
min_watched_data.columns

In [None]:
followed_min_watch = min_watched_data[min_watched_data["channel_name"].isin(unique_channels)]
followed_min_watch

In this dataset, the minutes watched are stored as integers in the context column.

In [None]:
# Select the column before summing so non-numeric columns are not aggregated
followed_min_watch.groupby("channel_name")["context"].sum().sort_values(ascending=False)[:10]

From my own knowledge, several channels that I follow are missing from the list above. So, instead of using this dataset, I will take the list of followed channels from the Twitch API, which gives a more current and accurate list.
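The Twitch API returns its results in pages of at most 100 items, together with a pagination object that holds a cursor for the next page and is empty on the last page. As a minimal sketch, the cursor-following loop can be separated from the HTTP call so that its termination logic can be checked offline; fetch_all, fetch_page, and the fake pages dictionary are hypothetical.

```python
def fetch_all(fetch_page):
    """Collect every item from a cursor-paginated endpoint.

    `fetch_page(cursor)` must return a dict shaped like
    {"data": [...], "pagination": {"cursor": "..."}}, with an empty
    pagination object on the last page. The first call passes cursor=None.
    """
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["data"])
        cursor = page.get("pagination", {}).get("cursor")
        if not cursor:
            return items

# Fake three-page endpoint used only to exercise the loop without network access
pages = {
    None: {"data": [1, 2], "pagination": {"cursor": "p2"}},
    "p2": {"data": [3, 4], "pagination": {"cursor": "p3"}},
    "p3": {"data": [5], "pagination": {}},
}
all_items = fetch_all(pages.get)  # [1, 2, 3, 4, 5]
```

In the notebook, fetch_page would wrap the real request and pass the cursor as the after parameter.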

In [None]:
import os
import requests

# Credentials are read from environment variables (names chosen here for illustration)
# instead of being hard-coded in the notebook
header = {
    "Client-ID": os.environ["TWITCH_CLIENT_ID"],
    "Authorization": "Bearer " + os.environ["TWITCH_OAUTH_TOKEN"],
}
from_id = 162289168

def get_user_follows(from_id, first=100, after=None):
    """Return one page (as parsed JSON) of the channels followed by the user `from_id`.

    Note: Twitch has since deprecated this endpoint in favor of Get Followed Channels.
    """
    params = {"from_id": from_id, "first": first}
    if after is not None:
        params["after"] = after
    response = requests.get("https://api.twitch.tv/helix/users/follows", params=params, headers=header)
    response.raise_for_status()
    return response.json()

# The API returns at most 100 objects per page plus a pagination cursor.
# Keep requesting the next page until the pagination object is empty.
follow_data = []
cursor = None
while True:
    page = get_user_follows(from_id, after=cursor)
    follow_data += page["data"]
    cursor = page.get("pagination", {}).get("cursor")
    if cursor is None:
        break
len(follow_data)

In [None]:
follow_df = pd.DataFrame(follow_data)
follow_df.head()

This is a more accurate list of the channels I follow.

In [None]:
follow_df.sort_values(by="followed_at")["to_name"]

In [None]:
follow_list = list(follow_df["to_login"])
followed_min_watch = min_watched_data[min_watched_data["channel_name"].isin(follow_list)]
channels_by_watch_time = followed_min_watch.groupby("channel_name")["context"].sum().sort_values(ascending=False)
channels_by_watch_time["xqcow"]

In [None]:
sns.barplot(x=channels_by_watch_time[:20].index, y=channels_by_watch_time[:20])
plt.xticks(rotation=90)

xqcow is the most-watched channel among the streamers I follow. Next, we can look at games: specifically, which games are most common among the streamers I follow and on Twitch in general. To do this, I will use the dataset collected through the APIs, starting by loading it from the MongoDB database.

In [None]:
import os
from pymongo import MongoClient

# The connection string embeds the database credentials, so it is read from an
# environment variable (name chosen here for illustration) instead of being hard-coded
client = MongoClient(os.environ["MONGODB_URI"])
db = client["stream_data"]
collection = db["streams"]

print(db.list_collection_names())

# Load every stored stream snapshot into memory
documents = list(collection.find())

In [None]:
chatters = pd.DataFrame(documents)
chatters.head()

I will start by looking at the trend in games for Twitch overall.

In [None]:
# Recent seaborn versions require x and y as keyword arguments
game_counts = chatters['game_name'].value_counts()[:10]
sns.barplot(x=game_counts.index, y=game_counts.values)
plt.xticks(rotation=90)

Just Chatting is the most common category among the top 100 streamers. Next, we can look at the same trend among the streamers I follow.

In [None]:
followed_chatters = chatters[chatters['user_login'].isin(follow_list)]
followed_game_counts = followed_chatters['game_name'].value_counts()[:10]
sns.barplot(x=followed_game_counts.index, y=followed_game_counts.values)
plt.xticks(rotation=90)

Just Chatting is still the most common category. However, games like CSGO, League of Legends, Fortnite, and Dota are not in the top 10 for the streamers I follow. The top 4 games played by the streamers I follow are Just Chatting, Valorant, New World, and Grand Theft Auto V. Even though these are the top 4 games, I cannot conclude that they are the type of games I enjoy watching, because streamers tend to play multiple games in one stream; this is the main difference between YouTube videos and Twitch streams. Instead, we can look at individual streamers and the games they commonly play, which gives a better picture of the games I enjoy watching.

In [None]:
grouped_chatters = followed_chatters.groupby(["user_login", "game_name"]).size()
grouped_chatters = grouped_chatters.reset_index()
grouped_chatters = grouped_chatters.rename({0: "count"}, axis=1)
grouped_chatters.sort_values(["user_login", "count"], ascending=False)[:50]

From the data above, the games played by each streamer are quite varied; no fixed set of games defines a streamer. A better way to show which streamers I enjoy watching is to visualize how similar the streamers are to each other. Perhaps multiple streamers play common games, which could be a reason why I followed them. To do this, I can draw a network graph showing the connections between the streamers I follow and the top 100 streamers. A network graph consists of nodes connected by edges, where each edge joins a source node to a target node. To test whether games determine similarity between streamers, we can create an edge between two streamers if they played the same game. This will help me recommend streamers based on the similarities between them.

# Graph Visualization: Part I
Over the two weeks of collection in October, not all 100 streamers streamed. In fact, only 79 unique streamers appear in the data.

In [None]:
len(chatters['user_login'].unique())

The first graph we can visualize is a streamer-to-streamer connection based on the games they play. To get the data for the graph, we can group by user_login and game_name and count the number of records for each pair.

In [None]:
chatters_group = chatters.groupby(['user_login', 'game_name']).count()["_id"]
pd.DataFrame(chatters_group)

With this information, we have each user and the game that they played. We can use this to create the network graph. To do this we first make a dictionary of values where the key is the game name and value is a list of streamers who played the game.

In [None]:
game_stream_list = list(chatters_group.index)
# Create dictionary of game-streamer
streamer_dict = {}
for stream in game_stream_list:
    if stream[1] not in streamer_dict:
        streamer_dict[stream[1]] = [stream[0]]
    else:
        streamer_dict[stream[1]].append(stream[0])

Next, we create an adjacency list for the graph. The adjacency list will look like the following: {streamer_1: {streamer_2: weight}}. The weight will represent how many common games are played between the 2 streamers.
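The same adjacency list can be sketched compactly with the standard library, assuming the input is a list of (streamer, game) records like the rows grouped above: collect the set of streamers per game, then add 1 to the edge between every pair of streamers that shares that game. shared_game_weights is a hypothetical helper and the records are toy data; unlike the notebook's loop, this sketch leaves out streamers that share no games with anyone.

```python
from collections import defaultdict
from itertools import combinations

def shared_game_weights(records):
    """Build {streamer: {other_streamer: number_of_shared_games}} from (streamer, game) pairs."""
    players = defaultdict(set)  # game -> streamers who played it (repeats collapse)
    for streamer, game in records:
        players[game].add(streamer)

    weights = defaultdict(lambda: defaultdict(int))
    for streamers in players.values():
        # Each game shared by a pair adds 1 to the edge in both directions
        for a, b in combinations(sorted(streamers), 2):
            weights[a][b] += 1
            weights[b][a] += 1
    return {s: dict(nbrs) for s, nbrs in weights.items()}

# Toy records, not collected data
records = [
    ("streamer_a", "Just Chatting"), ("streamer_a", "Just Chatting"), ("streamer_a", "Valorant"),
    ("streamer_b", "Just Chatting"), ("streamer_c", "Valorant"), ("streamer_c", "Just Chatting"),
]
game_graph = shared_game_weights(records)
# streamer_a and streamer_c share two games, so their edge weight is 2
```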

In [None]:
# {streamer: {streamer_2: 1}}
streamer_connection_dict = {}
for game in streamer_dict:
    for streamer in streamer_dict[game]:
        if streamer not in streamer_connection_dict:
            streamer_connection_dict[streamer] = {}
        for streamer_2 in streamer_dict[game]:
            if streamer_2 != streamer:
                if streamer_2 not in streamer_connection_dict[streamer]:
                    streamer_connection_dict[streamer][streamer_2] = 1
                else:
                    streamer_connection_dict[streamer][streamer_2] += 1

I will use pyvis to create the network graph, because pyvis produces interactive graphs that make the analysis simpler. I will also create a function that generates random colors, which will represent groups/clusters in the network graph.

In [None]:
# This will be used to assign a random color to nodes
import random

def get_random_color():
    color = ["#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)])
                 for i in range(1)]
    return color
get_random_color()

For the first network, I will draw connections between the streamers in the top 100 list. The node colors represent groups of streamers determined by how many games they have in common: streamers that share at least 2 games (an edge weight greater than 1) are given the same color.
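The coloring loop in the next cell assigns colors greedily, so a node's final color can depend on the order in which edges are visited. As an alternative technique (not the method used in this notebook), a minimal union-find sketch groups streamers into connected components using only edges whose weight meets the threshold; each resulting group could then receive one color from get_random_color. threshold_groups and the adjacency data are hypothetical.

```python
def threshold_groups(adjacency, min_shared=2):
    """Group nodes into connected components using only edges with weight >= min_shared.

    Union-find: every node starts in its own group and each strong edge merges two groups.
    """
    parent = {node: node for node in adjacency}
    for nbrs in adjacency.values():
        for other in nbrs:
            parent.setdefault(other, other)

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for node, nbrs in adjacency.items():
        for other, weight in nbrs.items():
            if weight >= min_shared:
                parent[find(node)] = find(other)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())

# Toy adjacency in the {streamer: {streamer_2: weight}} form used above
adjacency = {
    "streamer_a": {"streamer_b": 3, "streamer_c": 1},
    "streamer_b": {"streamer_a": 3, "streamer_d": 2},
    "streamer_c": {"streamer_a": 1},
    "streamer_d": {"streamer_b": 2},
}
groups = threshold_groups(adjacency)
# streamer_c is linked only by a weight-1 edge, so it forms its own group
```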

In [None]:
# Draw interactive graph based on the games that they play
network = Network(height='750px', width='100%', bgcolor='#222222', font_color='white', notebook=True)
network.repulsion(node_distance=100, central_gravity=0.2, spring_length=200, spring_strength=0.05,
damping=0.09)
node_color = {}

for node in streamer_connection_dict:
    for edge in streamer_connection_dict[node]:
        src = node
        target = edge
        width = streamer_connection_dict[node][edge]
        
        if width > 1:
            if node not in node_color:
                color = get_random_color()[0]
                node_color[node] = color
                node_color[edge] = color
            elif node in node_color:
                node_color[edge] = node_color[node]
        else:
            if node not in node_color:
                color = get_random_color()[0]
                node_color[node] = color
            if edge not in node_color:
                node_color[edge] = get_random_color()[0]
            
        network.add_node(src, src, title=src, color=node_color[node])
        network.add_node(target, target, title=target, color=node_color[edge])
        network.add_edge(src, target, value=width)
neighbor_map = network.get_adj_list()

for node in network.nodes:
    node['title'] += ' Neighbors:<br>' + '<br>'.join(neighbor_map[node['id']])
    node['value'] = len(neighbor_map[node['id']])

network.show_buttons(filter_=['physics']) 
    
network.show('top_100_streamer.html')

While it is a little difficult to see the connections, there are many green clusters. For example, streamers like kyedae, aceu, lirik, forsen, itztimmy, and others are grouped into one cluster since they share at least 2 games. Furthermore, the network graph uses physics to group clusters by connection: clusters that are closer together are more interconnected than those that are further apart. In the graph above, the nodes in the center are closely connected and clustered together, which means that those streamers share many common games. However, streamers like swagg or brawlhalla are far from the center since they only have 1 or 2 connections to nodes in the center. 

Next, we can look at the connections between the streamers I follow and those in the top 100. Note that not all followed streamers are in the top 100, so fewer nodes will be highlighted than the number of streamers I follow.

In [None]:
network = Network(height='750px', width='100%', bgcolor='#222222', font_color='white', notebook=True)
network.repulsion(node_distance=100, central_gravity=0.2, spring_length=200, spring_strength=0.05,
damping=0.09)
node_color = {}

for node in streamer_connection_dict:
    for edge in streamer_connection_dict[node]:
        src = node
        target = edge
        width = streamer_connection_dict[node][edge]
        
        if node in follow_list:
            node_color[node] = "#9E829C"
        else:
            node_color[node] = "#3A3E3B"
        if edge in follow_list:
            node_color[edge] = "#9E829C"
        else:
            node_color[edge] = "#3A3E3B"
                
        network.add_node(src, src, title=src, color=node_color[node])
        network.add_node(target, target, title=target, color=node_color[edge])
        network.add_edge(src, target, value=width)
neighbor_map = network.get_adj_list()

for node in network.nodes:
    node['title'] += ' Neighbors:<br>' + '<br>'.join(neighbor_map[node['id']])
    node['value'] = len(neighbor_map[node['id']])

network.show_buttons(filter_=['physics']) 
    
network.show('followed_stream_top_100.html')

In the graph above, only a very small portion of the top 100 streamers are streamers that I follow. Furthermore, where they sit in the graph can give us an intuition about which streamers are similar to those that I follow. For example, xqcow has many connections to streamers in the center, some with large weights. One example is the edge between xqcow and buddha: it is thick, which means that they have multiple games in common. This can help us measure streamer-to-streamer similarity.

Next, we can look at only followed streamers and group them by game. This will provide insight on whether games affect follows.

In [None]:
network = Network(height='750px', width='100%', bgcolor='#222222', font_color='white', notebook=True)
network.repulsion(node_distance=100, central_gravity=0.2, spring_length=200, spring_strength=0.05,
damping=0.09)
node_color = {}

for node in streamer_connection_dict:
    for edge in streamer_connection_dict[node]:
        src = node
        target = edge
        width = streamer_connection_dict[node][edge]
        
        if width > 1:
            if node not in node_color:
                color = get_random_color()[0]
                node_color[node] = color
                node_color[edge] = color
            elif node in node_color:
                node_color[edge] = node_color[node]
        else:
            if node not in node_color:
                color = get_random_color()[0]
                node_color[node] = color
            if edge not in node_color:
                node_color[edge] = get_random_color()[0]
        
        if node in follow_list and edge in follow_list:
            # .get(..., 0) avoids a KeyError for followed channels with no recorded watch time
            network.add_node(src, src, title=src, color=node_color[node], value=int(channels_by_watch_time.get(node, 0)))
            network.add_node(target, target, title=target, color=node_color[edge], value=int(channels_by_watch_time.get(edge, 0)))
            network.add_edge(src, target, value=width)
neighbor_map = network.get_adj_list()

for node in network.nodes:
    node['title'] += ' Neighbors:<br>' + '<br>'.join(neighbor_map[node['id']])

network.show_buttons(filter_=['physics']) 
    
network.show('streamergraph.html')

From the graph above, we see that a large number of streamers are similar. Most streamers in the center of the graph play similar games. There are a couple of outliers, such as hiko, ninja, valorant, mizkif, ludwig, shroud, and npmlol, that are not part of any group. This could be because they are not "variety streamers". A variety streamer is someone who plays multiple games in one or more streams. For example, from personal knowledge, xqcow is a variety streamer. Since xqcow plays a larger variety of games, there is a higher chance that he plays the same games as many other streamers multiple times. On the other hand, streamers like mizkif mainly stream in the Just Chatting category. This might indicate that most streamers I follow are variety streamers.

To answer the question of whether games affect follows: the streamers that I follow are mainly variety streamers. For the most part, they play similar games, since most of them share a color and are clustered in the center. Next, I can find out which streamers are variety streamers and what a user-streamer graph looks like.
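One way to make "variety streamer" concrete is to count the distinct games each streamer appears under in the collected records. This is a minimal sketch, assuming (streamer, game) pairs like the grouped data above; the cutoff of 3 games is an arbitrary assumption, and the helpers and data are hypothetical. Because the data is sampled every 3 hours, short game segments can be missed, so these counts are lower bounds.

```python
from collections import defaultdict

def distinct_games(records):
    """Count how many different games/categories each streamer appeared under."""
    games = defaultdict(set)
    for streamer, game in records:
        games[streamer].add(game)
    return {streamer: len(titles) for streamer, titles in games.items()}

def variety_streamers(records, min_games=3):
    """Streamers seen in at least `min_games` categories (the cutoff is an assumption)."""
    counts = distinct_games(records)
    return sorted(s for s, n in counts.items() if n >= min_games)

# Toy records, not collected data
records = [
    ("streamer_a", "Just Chatting"), ("streamer_a", "Valorant"), ("streamer_a", "New World"),
    ("streamer_b", "Just Chatting"), ("streamer_b", "Just Chatting"),
]
counts = distinct_games(records)   # {'streamer_a': 3, 'streamer_b': 1}
variety = variety_streamers(records)
```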

# Graph Visualization: Part 2
For the next part of the graph visualization, we can look at how streamers are connected to each other through common chatters, which shows the communities that streamers share. First, I will create a dictionary of the form {chatter: {streamer: None}}, where the inner dictionary acts as a set of the streamers whose chat the chatter appeared in. This way I can group streamers together.

In [None]:
viewer_streamer = chatters[["viewers", "user_login"]]
viewer_dict = {}
# Map each chatter to the streamers whose chat they appeared in.
# The inner dict acts as an ordered set, so its values are unused (None).
for viewers, streamer in zip(viewer_streamer["viewers"], viewer_streamer["user_login"]):
    for viewer in viewers:
        viewer_dict.setdefault(viewer, {})[streamer] = None
viewer_dict

Next, I will convert this dictionary to an adjacency list for streamers: {streamer: {streamer_2: weight}}, where the weight is the number of chatters the two streamers share. This is an adjacency list similar to the one for the previous graph.
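Equivalently, the weight between two streamers is the size of the intersection of their chatter sets. The sketch below starts from the inverse mapping, {streamer: set_of_chatters}, which is an assumption about how the data could be reshaped rather than the notebook's own structure; shared_chatter_weights and the chatters are hypothetical. Comparing every pair of streamers costs one set intersection per pair, which is cheap for the roughly 79 streamers here, whereas the loop in the next cell iterates over chatters instead.

```python
from itertools import combinations

def shared_chatter_weights(chatters_by_streamer):
    """Weight each streamer pair by the number of distinct chatters seen in both chats."""
    weights = {streamer: {} for streamer in chatters_by_streamer}
    for a, b in combinations(chatters_by_streamer, 2):
        shared = len(chatters_by_streamer[a] & chatters_by_streamer[b])
        if shared:
            weights[a][b] = weights[b][a] = shared
    return weights

# Toy chatter sets, not collected data
chatter_sets = {
    "streamer_a": {"u1", "u2", "u3"},
    "streamer_b": {"u2", "u3"},
    "streamer_c": {"u4"},
}
chatter_graph = shared_chatter_weights(chatter_sets)
# streamer_a and streamer_b share u2 and u3; streamer_c shares no chatters
```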

In [None]:
streamer_dict = {}

for viewer in viewer_dict:
    for streamer in viewer_dict[viewer]:
        if streamer not in streamer_dict:
            streamer_dict[streamer] = {}
        for streamer_2 in viewer_dict[viewer]:
            if streamer_2 != streamer:
                if streamer_2 in streamer_dict[streamer]:
                    streamer_dict[streamer][streamer_2] += 1
                else:
                    streamer_dict[streamer][streamer_2] = 1

In [None]:
streamer_dict

Now that we have the adjacency list, we can visualize it in a graph.

In [None]:
network = Network(height='750px', width='100%', bgcolor='#222222', font_color='white', notebook=True)
network.barnes_hut(central_gravity=0.2, spring_length=200, spring_strength=0.05,
damping=0.09)
for node in streamer_dict:
    for edge in streamer_dict[node]:
        src = node
        target = edge
        width = streamer_dict[node][edge]
        
        network.add_node(src, src, title=src)
        network.add_node(target, target, title=target)
        network.add_edge(src, target, value=width)

neighbor_map = network.get_adj_list()

for node in network.nodes:
    node['title'] += ' Neighbors:<br>' + '<br>'.join(neighbor_map[node['id']])
    node['value'] = len(neighbor_map[node['id']])

network.show_buttons(filter_=['physics']) 
    
network.show('streamer_chatter.html')

From the graph above, most of the top 100 streamers have overlapping communities, and streamers that are pulled toward each other generally share more chatters. This is hard to judge visually, which is why we need an index to measure how similar two streamers are. Two indices could help: the Jaccard coefficient and the Adamic/Adar index. To compute them, I will use networkx.  
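For reference, with N(u) the set of neighbors of node u, networkx documents the Jaccard coefficient as |N(u) ∩ N(v)| / |N(u) ∪ N(v)| and the Adamic/Adar index as the sum of 1 / log(degree(w)) over the common neighbors w. Both ignore edge weights, so the shared-game and shared-chatter counts only matter through which edges exist. The stdlib sketch below restates the two definitions on a plain adjacency dict so they can be checked by hand; jaccard, adamic_adar, and the toy graph are hypothetical, and the notebook itself calls networkx.

```python
import math

def jaccard(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)|, with neighbor sets taken from an adjacency dict."""
    nu, nv = set(adj[u]), set(adj[v])
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree(w)) over common neighbors w.

    A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1 / math.log(len(adj[w])) for w in set(adj[u]) & set(adj[v]))

# Toy undirected graph as a symmetric adjacency dict
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
j_ad = jaccard(adj, "a", "d")       # common {b}, union {b, c}
aa_ad = adamic_adar(adj, "a", "d")  # 1 / log(3), since b has degree 3
```

The logarithm makes Adamic/Adar discount common neighbors that are connected to almost everyone, which matters in a dense graph like the streamer-chatter one.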

I will first apply the index to the streamer-chatter graph.

In [None]:
def rank_neighbors(graph: nx.Graph, streamer, score_fn):
    # Score every existing neighbor of `streamer` with a networkx link-prediction
    # function and return (neighbor, score) pairs, highest score first
    pairs = [(streamer, neighbor) for neighbor in graph[streamer]]
    ranked = [(v, p) for _u, v, p in score_fn(graph, pairs)]
    return sorted(ranked, key=lambda x: x[1], reverse=True)

def jaccard_index(graph: nx.Graph, streamer):
    return rank_neighbors(graph, streamer, nx.jaccard_coefficient)

def adamic_adar_index(graph: nx.Graph, streamer):
    return rank_neighbors(graph, streamer, nx.adamic_adar_index)

In [None]:
streamer_chatter = nx.Graph(streamer_dict)
jaccard_index(streamer_chatter, "xqcow")[:20]

In [None]:
adamic_adar_index(streamer_chatter, "xqcow")[:20]

The two indices produce similar rankings, but they are not identical. Before evaluating the indices, we can check what they look like for the streamer-game graph.

In [None]:
streamer_game = nx.Graph(streamer_connection_dict)

In [None]:
jaccard_index(streamer_game, "mizkif")[:20]

In [None]:
adamic_adar_index(streamer_game, "mizkif")[:20]

In [None]:
jaccard_index(streamer_game, "xqcow")[:20]

Even with personal knowledge, it is difficult to tell which index makes better recommendations. We can test the indices by checking how well they predict which streamers I follow: split the followed list into training and test sets, and see where the test streamers land in the rankings produced from the training streamers. Since the followed list is very small, we can use K-fold cross-validation, where the list is randomly split into K sections and each section serves once as the test set. I will also set an arbitrary threshold on the top N entries of each index.  

First, I will find each test streamer's rank relative to the length of the index list. I use the relative rank because ranks of 6 and 7 are not comparable if the lists have different lengths, whereas a relative rank of 0.3 always means the streamer is in the top 30% of the list (the 70th percentile). A lower number means that the index considers the streamer strongly connected to a followed streamer. We can then calculate the mean for each fold in the cross-validation and the overall average for each index.
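The two ingredients of this evaluation can be sketched with the standard library: a shuffled K-fold split in which every streamer lands in exactly one test fold, and the relative rank position / length of each held-out streamer in a ranked list. k_fold_splits and relative_ranks are hypothetical helpers, the ranking is toy data, and the next cell uses scikit-learn's KFold instead. If an index ranked streamers at random, the average relative rank would be close to 0.5, so values well below 0.5 indicate the index is doing better than chance.

```python
import random

def k_fold_splits(items, k, seed=0):
    """Shuffle once, then yield (train, test) so every item is in exactly one test fold."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def relative_ranks(ranking, held_out):
    """Position / length of each held-out name found in a ranked list (0 = top)."""
    return [idx / len(ranking) for idx, (name, _score) in enumerate(ranking) if name in held_out]

# Toy ranking of (streamer, score) pairs, highest score first
ranking = [("a", 0.9), ("b", 0.7), ("c", 0.5), ("d", 0.1)]
ranks = relative_ranks(ranking, {"b", "d"})  # [0.25, 0.75]
splits = list(k_fold_splits(range(10), 3))
```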

In [None]:
# Split the followed streamers (restricted to those in the collected data) into n_splits folds.
# For each fold, rank every training streamer's neighbors with `index` and record the
# relative rank (position / list length) of each held-out streamer that appears.

def find_k_fold_average(index, graph, n_splits=5):
    # shuffle=True makes each test fold a random sample; the fixed seed keeps runs reproducible
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    top_100_streamer = set(chatters['user_login'])
    follow_list = list(follow_df[follow_df["to_login"].isin(top_100_streamer)]['to_login'])
    epoch_average = []

    for train, test in kf.split(follow_list):
        train_list = [follow_list[x] for x in train]
        test_list = {follow_list[x] for x in test}
        average_rank = []

        for streamer in train_list:
            streamer_list = index(graph, streamer)
            for idx, (streamer_2, _score) in enumerate(streamer_list):
                if streamer_2 in test_list:
                    average_rank.append(idx / len(streamer_list))
        epoch_average.append(np.mean(average_rank))
    return epoch_average

In [None]:
np.mean(find_k_fold_average(jaccard_index, streamer_chatter, 15))

In [None]:
np.mean(find_k_fold_average(adamic_adar_index, streamer_chatter, 15))

In [None]:
np.mean(find_k_fold_average(jaccard_index, streamer_game, 15))

In [None]:
np.mean(find_k_fold_average(adamic_adar_index, streamer_game, 15))