# Procedural instructions: Twitch Live Streamer Data
## Overview
- This document describes the methods used to create a data set with Python.
- The data comes from the Twitch platform and covers metrics such as total watch time, average viewership, and follower counts for live-streaming content creators.
- Analyzing this data can help anyone aspiring to be a content creator see which metrics are most important to optimize.
- This procedure requires Python 3 and the ability to download the data file.
### Getting Started: Obtaining the data
- First, set up a Google account, or use an existing one, to access Google Colab, which provides Python 3.
- Using [this link](https://www.kaggle.com/datasets/aayushmishra1512/twitchdata?resource=download), download the .csv file of the data.
- Find your Google Colab notebook folder in your Google Drive and upload the .csv file.
### Analyzing the data in Python
- To allow Python to access the .csv file in Google Drive, run the code below.


In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
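
Before going further, it can help to confirm that the uploaded file is visible through the mounted drive. The sketch below is one way to do that. The helper name `list_csv_files` is hypothetical, and the folder path assumes the default `Colab Notebooks` folder under `My Drive`, which may differ in your account.

```python
from pathlib import Path

def list_csv_files(folder):
    """Return the sorted names of the .csv files found directly inside folder."""
    folder = Path(folder)
    if not folder.is_dir():
        # Folder missing: the drive may not be mounted, or the path differs.
        return []
    return sorted(p.name for p in folder.glob("*.csv"))

# Assumed location of the notebook folder once the drive is mounted.
colab_folder = "/content/gdrive/My Drive/Colab Notebooks"
print(list_csv_files(colab_folder))
```

If the list is empty or the data file is missing from it, check the mount step and the folder the file was uploaded to.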


- Packages such as pandas and NumPy must also be imported before they can be used.
- Run this code next so that these packages can be used with the prefixes np. and pd.


In [None]:
import numpy as np
import pandas as pd

- Next, to load the data set from Google Drive into a DataFrame named df, run the code below.

In [None]:
df = pd.read_csv('/content/gdrive/My Drive/Colab Notebooks/twitchdataupdate.csv')
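
After loading, a quick look at the shape and column names confirms the file was read as expected. The sketch below is self-contained so it can run without the Drive file: it reads two rows copied from the output shown later in this document from an in-memory string. The first column name `Channel` and its values are placeholders (assumptions), not taken from the file. With the real file, the same calls would be made on `df` directly.

```python
import io
import pandas as pd

# Two sample rows copied from the output in this document; the Channel
# column name and its values are placeholders (assumptions).
sample_csv = io.StringIO(
    "Channel,Watch time(Minutes),Stream time(minutes),Peak viewers,Average viewers,Followers\n"
    "streamer_a,6091677300,211845,310998,25610,5310163\n"
    "streamer_b,5644590915,515280,387315,10976,1767635\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)             # (rows, columns)
print(sample_df.columns.tolist())  # column names as read from the header
print(sample_df.head())            # first rows of the table
```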

## Investigating the data
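- The next step is to filter the data by running the code below.
- This code returns the rows at positions 1 through 50 and the columns at positions 1 through 5. Positions are zero-based and the end of each slice is excluded, so the first row and the first column (position 0) are not part of the result.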

In [None]:
df.iloc[1:51,1:6]

| | Watch time(Minutes) | Stream time(minutes) | Peak viewers | Average viewers | Followers |
|---|---|---|---|---|---|
| 1 | 6091677300 | 211845 | 310998 | 25610 | 5310163 |
| 2 | 5644590915 | 515280 | 387315 | 10976 | 1767635 |
| 3 | 3970318140 | 517740 | 300575 | 7714 | 3944850 |
| 4 | 3671000070 | 123660 | 285644 | 29602 | 8938903 |
| 5 | 3668799075 | 82260 | 263720 | 42414 | 1563438 |
| 6 | 3360675195 | 136275 | 115633 | 24181 | 4074287 |
| 7 | 3301867485 | 147885 | 68795 | 18985 | 508816 |
| 8 | 2928356940 | 122490 | 89387 | 22381 | 3530767 |
| 9 | 2865429915 | 92880 | 125408 | 12377 | 2607076 |
| 10 | 2834436990 | 108780 | 142067 | 25664 | 5265659 |


> Selecting the watch time, stream time, peak viewers, average viewers, and follower counts for 50 of the top-ranked Twitch streamers creates a smaller, more usable subset of the data.
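
Because `iloc` selects by zero-based position and excludes the end of each slice, it is easy to be off by one. The sketch below illustrates this on a small made-up DataFrame; the column names and values are hypothetical, chosen only for the demonstration.

```python
import pandas as pd

# Small made-up table: 4 rows, 3 columns.
demo = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [10, 20, 30, 40],
    "y": [1, 2, 3, 4],
})

# Rows at positions 1 and 2, columns at positions 1 and 2.
# Position 0 (row "a", column "name") is skipped; end position 3 is excluded.
skips_first = demo.iloc[1:3, 1:3]

# Starting the row slice at 0 keeps the first row.
keeps_first = demo.iloc[0:2, 1:3]

print(skips_first)
print(keeps_first)
```

If the first row of the file should be included, the row slice would start at 0, for example `0:50` for the first 50 rows.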

- We will name this dataframe "Top50Streamers" by running the code below.

In [None]:
Top50Streamers = df.iloc[1:51,1:6]

> At any time, the data set can be displayed by passing this name to the print() function (example below).

In [None]:
print(Top50Streamers)

    Watch time(Minutes)  Stream time(minutes)  Peak viewers  Average viewers  \
1            6091677300                211845        310998            25610   
2            5644590915                515280        387315            10976   
3            3970318140                517740        300575             7714   
4            3671000070                123660        285644            29602   
5            3668799075                 82260        263720            42414   
6            3360675195                136275        115633            24181   
7            3301867485                147885         68795            18985   
8            2928356940                122490         89387            22381   
9            2865429915                 92880        125408            12377   
10           2834436990                108780        142067            25664   
11           2832930285                128490         89170            21739   
12           2674646715                 

## Finishing Up: Exporting the new subset
- Now that a smaller, more focused subset of the data has been created, it's time to export it as a .csv file.
- To export a .csv file to the current working directory, call .to_csv() on the DataFrame (Top50Streamers) and pass the desired filename in the parentheses, as in the example code below.

In [None]:
Top50Streamers.to_csv("Top50Streamers.csv", index=False)

> index=False is included in the parentheses so that the DataFrame's row index is not written to the file as an extra, unnamed column.
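
The effect of `index=False` can be checked by writing a small DataFrame both ways and reading the result back. The sketch below uses in-memory buffers instead of files, and a two-row table with values copied from the output earlier in this document. The helper name `round_trip` is hypothetical.

```python
import io
import pandas as pd

# Two rows taken from the earlier output, keeping a non-default index (1, 2)
# like the one produced by the iloc selection.
subset = pd.DataFrame(
    {"Peak viewers": [310998, 387315], "Followers": [5310163, 1767635]},
    index=[1, 2],
)

def round_trip(frame, **to_csv_kwargs):
    """Write frame to CSV text and read it back, returning the new DataFrame."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)

with_index = round_trip(subset)                  # index written as first column
without_index = round_trip(subset, index=False)  # index left out

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

When the index is written, pandas reads it back as a column named `Unnamed: 0`; with `index=False` only the original data columns remain.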