# Twitch Dataset

## Description
The Twitch dataset records user activity on the live-streaming platform Twitch. The data was collected every 10 minutes over a period of 43 days, capturing all streamers and the users connected to their chats.

## Basic Statistics
- Total Users: 100k (full dataset: 15.5M)
- Total Streamers (items): 162.6k (full dataset: 465k)
- Total Interactions: 3M (full dataset: 124M)
- Time steps: 6148
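
These figures can be recomputed from any file in this format. The sketch below makes a few assumptions: each CSV row counts as one interaction, and streamers (items) are identified by `streamer_username`. The helper name `basic_stats` is hypothetical. The inline sample is only the example entries shown later in this README, so point `read_csv` at the real file to reproduce the numbers above. How the 6148 time-step figure was derived is not stated, so the sketch reports only the range of step indices.

```python
import io

import pandas as pd

# Column order as listed under Metadata
COLUMNS = ['user_id', 'stream_id', 'streamer_username', 'time_start', 'time_stop']

# The example entries from this README; replace with the path to the full CSV file
SAMPLE = """\
1,34347669376,grimnax,5415,5419
1,34391109664,jtgtv,5869,5870
1,34395247264,towshun,5898,5899
1,34405646144,mithrain,6024,6025
2,33848559952,chfhdtpgus1,206,207
2,33881429664,sal_gu,519,524
2,33921292016,chfhdtpgus1,922,924
"""


def basic_stats(df: pd.DataFrame) -> dict:
    """Count users, streamers (items) and interactions (assumed: one per row)."""
    return {
        'users': int(df['user_id'].nunique()),
        'streamers': int(df['streamer_username'].nunique()),
        'interactions': len(df),
        # Range of 10-minute step indices present in the data
        'first_step': int(df['time_start'].min()),
        'last_step': int(df['time_stop'].max()),
    }


df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
print(basic_stats(df))
# {'users': 2, 'streamers': 6, 'interactions': 7, 'first_step': 206, 'last_step': 6025}
```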

## Metadata
Each interaction (one CSV row) contains the following fields, in order:
- User ID (anonymized)
- Stream ID
- Streamer username
- Time start
- Time stop

The start and stop times are integers that count 10-minute periods (time steps). A Stream ID identifies a single broadcast segment from one streamer and can be used to retrieve that segment.
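
Step indices convert to wall-clock durations by multiplying by 10 minutes. The sketch below rests on two assumptions that the dataset description does not state. First, `time_stop` is treated as exclusive, so a row with start 5415 and stop 5419 covers four steps. Second, step 0 is taken as the start of the collection period. The helpers `watch_duration` and `offset_from_crawl_start` are hypothetical names.

```python
from datetime import timedelta

# One time step = 10 minutes (from the dataset description)
STEP = timedelta(minutes=10)


def watch_duration(time_start: int, time_stop: int) -> timedelta:
    """Length of an interaction. Assumption: time_stop is exclusive."""
    if time_stop < time_start:
        raise ValueError('time_stop must not precede time_start')
    return (time_stop - time_start) * STEP


def offset_from_crawl_start(step: int) -> timedelta:
    """Elapsed time since step 0. Assumption: step 0 starts the collection period."""
    return step * STEP


# First example entry: 1,34347669376,grimnax,5415,5419
print(watch_duration(5415, 5419))     # 0:40:00
print(offset_from_crawl_start(5415))  # 37 days, 14:30:00
```

The metadata lists no absolute timestamps, so only relative offsets like these can be computed.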

## Example
Some example entries from the dataset, one interaction per line:

    1,34347669376,grimnax,5415,5419
    1,34391109664,jtgtv,5869,5870
    1,34395247264,towshun,5898,5899
    1,34405646144,mithrain,6024,6025
    2,33848559952,chfhdtpgus1,206,207
    2,33881429664,sal_gu,519,524
    2,33921292016,chfhdtpgus1,922,924
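
Because a Stream ID identifies one broadcast segment, grouping rows by `stream_id` gives a per-segment summary. The sketch below uses pandas and runs on the example entries above. The function name `stream_segments` and the summary column names are illustrative. It also assumes every row of a segment carries the same `streamer_username`.

```python
import io

import pandas as pd

COLUMNS = ['user_id', 'stream_id', 'streamer_username', 'time_start', 'time_stop']

# The example entries above, in the column order listed under Metadata
SAMPLE = """\
1,34347669376,grimnax,5415,5419
1,34391109664,jtgtv,5869,5870
1,34395247264,towshun,5898,5899
1,34405646144,mithrain,6024,6025
2,33848559952,chfhdtpgus1,206,207
2,33881429664,sal_gu,519,524
2,33921292016,chfhdtpgus1,922,924
"""


def stream_segments(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each broadcast segment: one output row per stream_id."""
    return df.groupby('stream_id').agg(
        streamer=('streamer_username', 'first'),  # assumed constant within a segment
        viewers=('user_id', 'nunique'),           # distinct users seen in the chat
        first_step=('time_start', 'min'),
        last_step=('time_stop', 'max'),
    )


df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
segments = stream_segments(df)

# All raw interactions belonging to one segment
print(df[df['stream_id'] == 34347669376])
print(segments.loc[34347669376])
```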

The following pandas script converts the raw interactions into one time-ordered sequence of streamer ids per user:

    import pandas as pd

    # Read the CSV file (it has no header row); columns follow the order listed under Metadata
    df = pd.read_csv(
        '100k_a.csv',
        header=None,
        names=['user_id', 'stream_id', 'streamer_username', 'time_start', 'time_stop'],
    )

    # Map each streamer username to a unique integer id (0-based, in order of first appearance)
    streamer_names = df['streamer_username'].unique()
    print(len(streamer_names))
    streamer_id_map = {name: i for i, name in enumerate(streamer_names)}
    df['streamer_username'] = df['streamer_username'].map(streamer_id_map)

    # Sort by user, then by start time, so each user's interactions are chronological
    df = df.sort_values(by=['user_id', 'time_start'])

    # Group by user and collect the mapped streamer ids into a time-ordered list
    grouped = df.groupby('user_id')['streamer_username'].apply(list)

    # Write one line per user: the user id followed by its streamer ids, space-separated.
    # Writing each list directly avoids expanding it into one column per sequence
    # position (apply(pd.Series)), which pads shorter sequences with empty fields.
    with open('sequential_recommendation_data.txt', 'w') as f:
        for user_id, items in grouped.items():
            f.write(' '.join(map(str, [user_id, *items])) + '\n')

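The `print` call outputs `162625`, the number of distinct streamers in this subset (the 162.6k listed under Basic Statistics).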
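
To load the generated file back, split each line on whitespace: the first field is the user id and the rest are that user's streamer ids. The sketch below uses a hypothetical `read_sequences` helper. Its sample lines are what the script would write if the input held only the seven example entries above, with streamer ids assigned in order of first appearance. Splitting on arbitrary whitespace also tolerates trailing separators, such as those left when shorter sequences are padded with empty columns.

```python
import os
import tempfile


def read_sequences(path: str) -> dict[int, list[int]]:
    """Parse 'user_id item_1 item_2 ...' lines into {user_id: [item_1, item_2, ...]}."""
    sequences = {}
    with open(path) as f:
        for line in f:
            fields = line.split()  # split() ignores repeated and trailing whitespace
            if not fields:
                continue  # skip blank lines
            user_id, *items = map(int, fields)
            sequences[user_id] = items
    return sequences


# Output for the seven example entries; the trailing space mimics padded exports
SAMPLE = "1 0 1 2 3\n2 4 5 4 \n"

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(SAMPLE)
sequences = read_sequences(tmp.name)
os.remove(tmp.name)
print(sequences)  # {1: [0, 1, 2, 3], 2: [4, 5, 4]}
```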


