# Top Streamer Data Acquisition

#### Summary:
1. Import dependencies
2. Pull the CSV file
3. Filter the data to English-language channels only
4. Sort the data by followers in descending order
5. Clean up the data
6. Save the data to CSV as top_50_channel.csv

Data Source: https://www.kaggle.com/aayushmishra1512/twitchdata


### 1. Import Dependencies:

In [11]:
# Importing Dependencies
import pandas as pd

### 2. Pull CSV File:

In [12]:
# Read the Twitch streamer file into a pandas DataFrame
# The Kaggle data was saved locally as twitchdata-update.csv

df_streamer = pd.read_csv("Data/twitchdata-update.csv")

df_streamer.head()


Unnamed: 0,Channel,Watch time(Minutes),Stream time(minutes),Peak viewers,Average viewers,Followers,Followers gained,Views gained,Partnered,Mature,Language
0,xQcOW,6196161750,215250,222720,27716,3246298,1734810,93036735,True,False,English
1,summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English
2,Gaules,5644590915,515280,387315,10976,1767635,1023779,102611607,True,True,Portuguese
3,ESL_CSGO,3970318140,517740,300575,7714,3944850,703986,106546942,True,False,English
4,Tfue,3671000070,123660,285644,29602,8938903,2068424,78998587,True,False,English
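
As a hedged sketch of the loading step, the helper below reads a CSV and fails early if the columns used later in this notebook are absent. The names `load_streamers`, `REQUIRED_COLUMNS`, and `SAMPLE_CSV` are hypothetical and not part of the original notebook; the sample inlines the first three rows shown above so the example runs without the Kaggle file.

```python
import io

import pandas as pd

# Hypothetical inline sample (first rows of the output above) so the sketch
# runs without Data/twitchdata-update.csv being present.
SAMPLE_CSV = """Channel,Watch time(Minutes),Stream time(minutes),Peak viewers,Average viewers,Followers,Followers gained,Views gained,Partnered,Mature,Language
xQcOW,6196161750,215250,222720,27716,3246298,1734810,93036735,True,False,English
summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English
Gaules,5644590915,515280,387315,10976,1767635,1023779,102611607,True,True,Portuguese
"""

# Assumption: these are the only columns the later steps depend on.
REQUIRED_COLUMNS = {"Channel", "Followers", "Language"}


def load_streamers(source):
    """Read the streamer CSV and check that the required columns exist."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


df_sample = load_streamers(io.StringIO(SAMPLE_CSV))
print(df_sample.shape)
```

In the notebook itself, the same helper could be called with the file path, for example `load_streamers("Data/twitchdata-update.csv")`.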


### 3. Filter Data:

In [13]:
# Filter data by language (English only)
# The analysis focuses on English-language streamers
df_streamer_english = df_streamer[df_streamer['Language'] == 'English']
df_streamer_english.head()

Unnamed: 0,Channel,Watch time(Minutes),Stream time(minutes),Peak viewers,Average viewers,Followers,Followers gained,Views gained,Partnered,Mature,Language
0,xQcOW,6196161750,215250,222720,27716,3246298,1734810,93036735,True,False,English
1,summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English
3,ESL_CSGO,3970318140,517740,300575,7714,3944850,703986,106546942,True,False,English
4,Tfue,3671000070,123660,285644,29602,8938903,2068424,78998587,True,False,English
5,Asmongold,3668799075,82260,263720,42414,1563438,554201,61715781,True,False,English
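
The exact-match filter above assumes every label is spelled exactly `English`. If the labels might differ in case or carry stray whitespace, a normalized comparison is one option. This is a minimal sketch on made-up rows; `filter_language`, the channel names, and the messy label are hypothetical and do not come from the dataset.

```python
import pandas as pd

# Hypothetical rows; the " english " label is invented to show normalization.
df_demo = pd.DataFrame(
    {
        "Channel": ["channel_a", "channel_b", "channel_c"],
        "Language": ["English", "Portuguese", " english "],
    }
)


def filter_language(df, language="English"):
    """Keep rows whose Language matches, ignoring case and outer whitespace."""
    normalized = df["Language"].astype(str).str.strip().str.casefold()
    # .copy() returns an independent frame, so later edits do not touch df.
    return df[normalized == language.casefold()].copy()


df_english_demo = filter_language(df_demo)
print(df_english_demo)
```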


### 4. Sort Data in Descending Order:

In [14]:
# Sort by followers, highest first
df_filtered = df_streamer_english.sort_values(by = 'Followers', ascending = False)
df_filtered.head()

Unnamed: 0,Channel,Watch time(Minutes),Stream time(minutes),Peak viewers,Average viewers,Followers,Followers gained,Views gained,Partnered,Mature,Language
4,Tfue,3671000070,123660,285644,29602,8938903,2068424,78998587,True,False,English
96,shroud,888505170,30240,471281,29612,7744066,833587,30621257,True,False,English
35,Myth,1479214575,134760,122552,9396,6726893,1421811,37384058,True,False,English
90,pokimane,964334055,56505,112160,16026,5367605,2085831,45579002,True,False,English
1,summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English
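
When only the top rows are needed, `DataFrame.nlargest` is an alternative to sorting the whole frame and then calling `head`. The sketch below compares the two on a small frame built from follower counts shown in the outputs above; `df_demo` and the other variable names are illustrative only.

```python
import pandas as pd

# Follower counts taken from the outputs above; the frame itself is illustrative.
df_demo = pd.DataFrame(
    {
        "Channel": ["xQcOW", "summit1g", "Tfue", "shroud"],
        "Followers": [3246298, 5310163, 8938903, 7744066],
    }
)

# Approach used in this notebook: full sort, then take the first rows.
top_sorted = df_demo.sort_values(by="Followers", ascending=False).head(2)

# Alternative: ask directly for the rows with the largest values.
top_nlargest = df_demo.nlargest(2, "Followers")

print(list(top_sorted["Channel"]))
print(list(top_nlargest["Channel"]))
```

Both approaches keep the original index labels, so `reset_index(drop=True)` is still needed afterwards if a clean 0-based index is wanted.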


### 5. Cleanup Data:

In [15]:
# Check for duplicate channels: the two counts match when every channel is unique
print("Number of unique channels: " + str(len(df_streamer_english['Channel'].unique())))
print("Number of rows: " + str(len(df_streamer_english['Channel'])))

Number of unique channels: 485
Number of rows: 485
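
The counts above match, so no deduplication was needed here. If they had differed, one possible way to inspect and resolve the duplicates is sketched below. The data is made up, and keeping the row with the most followers is an assumption for illustration, not a rule taken from this notebook.

```python
import pandas as pd

# Hypothetical data containing one repeated channel name.
df_demo = pd.DataFrame(
    {
        "Channel": ["channel_a", "channel_b", "channel_a"],
        "Followers": [100, 200, 150],
    }
)

# Flag every row whose Channel appears more than once.
dup_mask = df_demo["Channel"].duplicated(keep=False)
print(df_demo[dup_mask])

# Assumed policy: keep the row with the highest follower count per channel.
deduped = df_demo.sort_values("Followers", ascending=False).drop_duplicates(
    subset="Channel", keep="first"
)
print(deduped)
```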


In [16]:
# Reset the index and take the top 50 channels by followers
df_filtered = df_filtered.reset_index(drop = True)
df_top_50 = df_filtered.head(50)
df_top_50.head()

Unnamed: 0,Channel,Watch time(Minutes),Stream time(minutes),Peak viewers,Average viewers,Followers,Followers gained,Views gained,Partnered,Mature,Language
0,Tfue,3671000070,123660,285644,29602,8938903,2068424,78998587,True,False,English
1,shroud,888505170,30240,471281,29612,7744066,833587,30621257,True,False,English
2,Myth,1479214575,134760,122552,9396,6726893,1421811,37384058,True,False,English
3,pokimane,964334055,56505,112160,16026,5367605,2085831,45579002,True,False,English
4,summit1g,6091677300,211845,310998,25610,5310163,1370184,89705964,True,False,English


### 6. Saving Data:

In [None]:
# Write the top 50 channels to Data/top_50_channel.csv
df_top_50.to_csv("Data/top_50_channel.csv", index = False)
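
A quick way to sanity-check the export is to read the file back and compare it with the frame that was written. The sketch below does this in a temporary directory so it does not touch the real `Data/` folder; `df_demo`, `out_path`, and `round_trip_ok` are illustrative names, and the two rows reuse values from the outputs above.

```python
import os
import tempfile

import pandas as pd

# Two rows from the outputs above stand in for df_top_50.
df_demo = pd.DataFrame(
    {"Channel": ["Tfue", "shroud"], "Followers": [8938903, 7744066]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "top_50_channel.csv")
    # index=False keeps the row index out of the file, as in the cell above.
    df_demo.to_csv(out_path, index=False)
    df_reloaded = pd.read_csv(out_path)

# Compare column names and values after the round trip.
round_trip_ok = df_reloaded.to_dict("list") == df_demo.to_dict("list")
print(round_trip_ok)
```

Writing with `index=False` is what prevents an extra `Unnamed: 0` column from appearing when the file is read back.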