# Esportsearnings.com website scraper
#### Author: Kelvin García Muñiz
***
This notebook scrapes data from the website [esportsearnings.com](https://www.esportsearnings.com/) for a data analysis study. It relies primarily on the [Selenium library](https://selenium-python.readthedocs.io/), with the assistance of [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/index.html?highlight=find_all#).

The scraper produces one .csv file for each of the following datasets:
>1. [Video Games Data](#section1)
>2. [Players Data](#section2)
>3. [Esports Teams Data](#section3)
>4. [Participant Countries Data](#section4)

To view the analysis, head over to the corresponding GitHub repository.

### Imports

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import pandas as pd
from bs4 import BeautifulSoup
import re

### Make Dataframes function

The `make_df` function returns an empty dataframe whose columns depend on the data to scrape.

In [None]:
def make_df(data):
    """Return an empty dataframe with the columns for `data`, or False if `data` is not a known option."""
    if data == "games":
        df = pd.DataFrame(columns=['Game','Prize', 'NumPlayer', 'NumTournaments', 'Year'])
        return df
    elif data == "players":
        df = pd.DataFrame(columns=['Username','Name', 'Country', 'Prize', 'Year'])
        return df
    elif data == "teams":
        df = pd.DataFrame(columns=['Team', 'Prize', 'NumTournaments', 'Year'])
        return df
    elif data == "countries":
        df = pd.DataFrame(columns=['Country', 'Prize', 'NumPlayers', 'Year'])
        return df
    else:
        print("Error: The data to scrape is not an option")
        return False
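
The same column layouts can also be kept in a lookup table, which makes an unknown option fail loudly instead of returning `False`. The sketch below is a hypothetical variant, not the notebook's own function: the names `COLUMNS` and `columns_for` are assumptions made for illustration, and it returns plain lists so that it runs without pandas.

```python
# Hypothetical lookup table mirroring the column layouts used by make_df.
COLUMNS = {
    "games": ["Game", "Prize", "NumPlayer", "NumTournaments", "Year"],
    "players": ["Username", "Name", "Country", "Prize", "Year"],
    "teams": ["Team", "Prize", "NumTournaments", "Year"],
    "countries": ["Country", "Prize", "NumPlayers", "Year"],
}


def columns_for(data):
    """Return a copy of the column names for `data`; raise ValueError if unknown."""
    try:
        return list(COLUMNS[data])
    except KeyError:
        raise ValueError(f"{data!r} is not an option; choose from {sorted(COLUMNS)}") from None


print(columns_for("teams"))  # ['Team', 'Prize', 'NumTournaments', 'Year']
```

An empty dataframe would then follow from `pd.DataFrame(columns=columns_for("teams"))`.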

The following variables set the first year to scrape and the number of consecutive years to cover. They are kept in one place so the scope of the project is easy to change. With the values below, the scraper covers 2002 through 2021.

In [None]:
starting_year = 2002
year_range = 20
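
To make the effect of these two values concrete, the sketch below lists the history URLs they produce, following the URL pattern used in the scraping cells. `history_urls` and `BASE_URL` are hypothetical names introduced for illustration, not part of the notebook.

```python
BASE_URL = "https://www.esportsearnings.com/history/"


def history_urls(starting_year, year_range, page):
    """Build one history URL per year for a page such as "games" or "top_players"."""
    return [f"{BASE_URL}{starting_year + offset}/{page}" for offset in range(year_range)]


urls = history_urls(2002, 20, "games")
print(len(urls))   # 20
print(urls[0])     # https://www.esportsearnings.com/history/2002/games
print(urls[-1])    # https://www.esportsearnings.com/history/2021/games
```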

## Scrape Games Data
<a id="section1"></a>

In [None]:
driver = webdriver.Firefox()
df = make_df("games")
if isinstance(df, pd.DataFrame) and df.empty:  # run only if make_df returned a fresh, empty dataframe
    for year in range(year_range):
        urlYear = str(starting_year + year)
        url = "https://www.esportsearnings.com/history/" + urlYear + "/games"

        driver.get(url)

        games = driver.find_elements(By.CLASS_NAME, "detail_list_player")
        data = driver.find_elements(By.CLASS_NAME, "detail_list_prize")  # the remaining columns all share this class
        gamesList = [game.text for game in games]
        prizesList = []
        numPlayersList = []
        tournamentsList = []
        i = 7
        while i < (len(data) - 20):  # the site reuses this class for other information, hence the start at 7 and the "-20" cutoff
            prizesList.append(data[i].text)
            numPlayersList.append(data[i + 1].text)
            tournamentsList.append(data[i + 2].text)
            i += 3
        data_tuples = list(zip(gamesList, prizesList, numPlayersList, tournamentsList))  # one (game, prize, players, tournaments) tuple per game
        temp_df = pd.DataFrame(data_tuples, columns=['Game', 'Prize', 'NumPlayer', 'NumTournaments'])  # one row per tuple
        temp_df['Year'] = urlYear  # tag every row with the scraped year
        df = pd.concat([df, temp_df])  # DataFrame.append was removed in pandas 2.0
    driver.quit()  # end the browser session
    display(df)
    df.to_csv("gamesData.csv")
else:
    driver.quit()
    print("\tCheck make_df function argument")
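
The index arithmetic in the cell above amounts to trimming a flat list of cells and cutting it into fixed-size rows. A minimal sketch of that idea on made-up data follows. `rows_from_cells` and its parameter names are hypothetical, and the sample uses a small header and footer rather than the site's 7 and 20. Unlike the `while` loop, this version drops a trailing partial row.

```python
def rows_from_cells(cells, skip_head, skip_tail, width):
    """Drop unrelated cells at both ends, then group the rest into rows of `width`."""
    body = cells[skip_head:len(cells) - skip_tail]
    # Stop before a trailing partial row so every tuple has exactly `width` items.
    return [tuple(body[i:i + width]) for i in range(0, len(body) - width + 1, width)]


cells = ["x", "x",             # unrelated leading cells
         "$100", "10", "3",    # prize, players, tournaments
         "$50", "4", "1",
         "y"]                  # unrelated trailing cell
print(rows_from_cells(cells, skip_head=2, skip_tail=1, width=3))
# [('$100', '10', '3'), ('$50', '4', '1')]
```

With the notebook's numbers, the games page would correspond to `rows_from_cells(texts, 7, 20, 3)`, where `texts` is the list of `.text` values, and the teams and countries pages to a width of 2.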


## Scrape Players Data
<a id="section2"></a>

In [None]:
driver = webdriver.Firefox()
df = make_df("players")
if isinstance(df, pd.DataFrame) and df.empty:  # run only if make_df returned a fresh, empty dataframe
    for year in range(year_range):
        urlYear = str(starting_year + year)
        url = "https://www.esportsearnings.com/history/" + urlYear + "/top_players"

        driver.get(url)
        response = requests.get(url)  # raw HTML for BeautifulSoup

        # The site displays each country as a flag image; the country name is read from the image title.
        # BeautifulSoup is used here because Selenium could not separate the image title from the selected data.
        soup = BeautifulSoup(response.content, 'html.parser')
        results = soup.find_all("img")

        playerList = driver.find_elements(By.CLASS_NAME, "detail_list_player")
        data = driver.find_elements(By.CLASS_NAME, "detail_list_prize")  # the remaining columns all share this class
        playerUserList = []
        playerNameList = []
        prizesList = []
        countryList = [img.get('title') for img in results]  # title attribute of every image on the page
        i = 0
        while i < (len(playerList) - 1):  # usernames and real names alternate in playerList
            playerUserList.append(playerList[i].text)
            playerNameList.append(playerList[i + 1].text)
            i += 2

        j = 7
        while j < (len(data) - 20):  # the site reuses this class for other information, hence the start at 7 and the "-20" cutoff
            prizesList.append(data[j].text)  # only the first cell of each group of 3 is kept
            j += 3

        data_tuples = list(zip(playerUserList, playerNameList, countryList, prizesList))  # one (username, name, country, prize) tuple per player
        temp_df = pd.DataFrame(data_tuples, columns=['Username', 'Name', 'Country', 'Prize'])  # one row per tuple
        temp_df['Year'] = urlYear  # tag every row with the scraped year
        df = pd.concat([df, temp_df])  # DataFrame.append was removed in pandas 2.0

    driver.quit()  # end the browser session
    display(df)
    df.to_csv("playerData.csv")
else:
    driver.quit()
    print("\tCheck make_df function argument")
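
On the players page the loop above reads the `detail_list_player` entries in pairs, a username followed by a real name, which is why it advances in steps of two. The same split can be written with slicing, as in this sketch. The sample values are invented for illustration.

```python
# Made-up stand-in for the text of the "detail_list_player" elements.
entries = ["alpha", "Ann Lee", "bravo", "Bo Cruz", "charlie", "Cy Diaz"]

usernames = entries[0::2]   # items at even positions
names = entries[1::2]       # items at odd positions

# zip pairs the two lists and stops at the shorter one, so a dangling
# username without a name is dropped, as in the while loop.
players = list(zip(usernames, names))
print(players)
# [('alpha', 'Ann Lee'), ('bravo', 'Bo Cruz'), ('charlie', 'Cy Diaz')]
```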
    


## Scrape Teams Data
<a id="section3"></a>

In [None]:
driver = webdriver.Firefox()
df = make_df("teams")
if isinstance(df, pd.DataFrame) and df.empty:  # run only if make_df returned a fresh, empty dataframe
    for year in range(year_range):
        urlYear = str(starting_year + year)
        url = "https://www.esportsearnings.com/history/" + urlYear + "/teams"

        driver.get(url)

        teams = driver.find_elements(By.CLASS_NAME, "detail_list_player")
        data = driver.find_elements(By.CLASS_NAME, "detail_list_prize")  # the remaining columns all share this class
        teamsList = [team.text for team in teams]
        prizesList = []
        tournamentsList = []
        i = 7
        while i < (len(data) - 20):  # the site reuses this class for other information, hence the start at 7 and the "-20" cutoff
            prizesList.append(data[i].text)
            tournamentsList.append(data[i + 1].text)
            i += 2
        data_tuples = list(zip(teamsList, prizesList, tournamentsList))  # one (team, prize, tournaments) tuple per team
        temp_df = pd.DataFrame(data_tuples, columns=['Team', 'Prize', 'NumTournaments'])  # one row per tuple
        temp_df['Year'] = urlYear  # tag every row with the scraped year
        df = pd.concat([df, temp_df])  # DataFrame.append was removed in pandas 2.0
    driver.quit()  # end the browser session
    display(df)
    df.to_csv("teamsData.csv")
else:
    driver.quit()
    print("\tCheck make_df function argument")
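
One caveat of building rows with `zip` is that it silently stops at the shortest list, so a page where the name and prize counts disagree would lose rows without any error. The sketch below shows one possible guard. `zip_checked` is a hypothetical helper, not part of the notebook.

```python
def zip_checked(**columns):
    """Zip named columns into rows, raising ValueError if their lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return list(zip(*columns.values()))


rows = zip_checked(team=["A", "B"], prize=["$10", "$5"], tournaments=["3", "1"])
print(rows)  # [('A', '$10', '3'), ('B', '$5', '1')]

try:
    zip_checked(team=["A", "B"], prize=["$10"])
except ValueError as err:
    print(err)  # column lengths differ: {'team': 2, 'prize': 1}
```

On Python 3.10 and later, `zip(..., strict=True)` performs a similar check without a helper.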


## Scrape Countries Data
<a id="section4"></a>

In [None]:
driver = webdriver.Firefox()
df = make_df("countries")
if isinstance(df, pd.DataFrame) and df.empty:  # run only if make_df returned a fresh, empty dataframe
    for year in range(year_range):
        urlYear = str(starting_year + year)
        url = "https://www.esportsearnings.com/history/" + urlYear + "/countries"

        driver.get(url)

        countries = driver.find_elements(By.CLASS_NAME, "detail_list_player")
        data = driver.find_elements(By.CLASS_NAME, "detail_list_prize")  # the remaining columns all share this class
        countryList = [country.text for country in countries]
        prizesList = []
        playersList = []
        i = 7
        while i < (len(data) - 20):  # the site reuses this class for other information, hence the start at 7 and the "-20" cutoff
            prizesList.append(data[i].text)
            playersList.append(data[i + 1].text)
            i += 2
        data_tuples = list(zip(countryList, prizesList, playersList))  # one (country, prize, players) tuple per country
        temp_df = pd.DataFrame(data_tuples, columns=['Country', 'Prize', 'NumPlayers'])  # one row per tuple
        temp_df['Year'] = urlYear  # tag every row with the scraped year
        df = pd.concat([df, temp_df])  # DataFrame.append was removed in pandas 2.0
    driver.quit()  # end the browser session
    display(df)
    df.to_csv("countriesData.csv")
else:
    driver.quit()
    print("\tCheck make_df function argument")
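
Every scraping cell opens a browser and shuts it down only at the end, so an exception in the middle of the loop would leave the browser window open. A common remedy is a `try`/`finally` block. The sketch below demonstrates the pattern with a stand-in class so that it runs without a browser. `FakeDriver` and `scrape` are hypothetical names, and with Selenium the `finally` branch would call the real driver's `quit()`.

```python
class FakeDriver:
    """Stand-in for a Selenium driver that only records what happened."""

    def __init__(self):
        self.visited = []
        self.closed = False

    def get(self, url):
        if "bad" in url:
            raise RuntimeError(f"could not load {url}")
        self.visited.append(url)

    def quit(self):
        self.closed = True


def scrape(driver, urls):
    """Visit every URL; the driver is shut down even if a page fails."""
    try:
        for url in urls:
            driver.get(url)
    finally:
        driver.quit()


driver = FakeDriver()
try:
    scrape(driver, ["page/2002", "page/bad", "page/2004"])
except RuntimeError as err:
    print(err)                        # could not load page/bad
print(driver.visited, driver.closed)  # ['page/2002'] True
```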

## References: 
> 1. [Bryan Pfalzgraf via Medium](https://towardsdatascience.com/how-to-use-selenium-to-web-scrape-with-example-80f9b23a843a)
> 2. [Selenium Documentation](https://selenium-python.readthedocs.io/)
> 3. [Beautiful Soup](https://beautiful-soup-4.readthedocs.io/en/latest/index.html?highlight=find_all#find-all)
> 4. [Stack Overflow](https://stackoverflow.com/)