# Choose a Data Set

You can choose to analyze any data that you would like! Remember, you need 1000 rows of non-null data in order to get 5 points for the "Data" criterion of my [rubric](https://docs.google.com/document/d/1s3wllcF3LLnytxwD8mZ-BCypXKnfaahnizWGNojT-B4/edit?usp=sharing). Consider looking at [Kaggle](https://www.kaggle.com/datasets) or [free APIs](https://free-apis.github.io/#/browse) for datasets of this size. Alternatively, you can scrape the web to make your own dataset! :D

Once you have chosen your dataset, please read your data into a dataframe and call `.info()` below. If you don't call `info`, I will give you 0 points for the first criterion described in the [rubric](https://docs.google.com/document/d/1s3wllcF3LLnytxwD8mZ-BCypXKnfaahnizWGNojT-B4/edit?usp=sharing).

In [None]:
# Read data into a dataframe and call info()
    # Example call:
    # df = pd.DataFrame({"A":[1, 2, 3], "B":[4, 5, 6]})
    # df.info()
    

# My Question

# Does North America SUCK at League of Legends?
# Relative to Korea, which is considered the best region, how does North America compare in game length, in how much game length affects the result, and in international win rate?

# My Analysis

In [1]:
import pandas as pd
import seaborn as sns
import requests
from bs4 import BeautifulSoup
df=pd.read_csv('LeagueofLegends.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7620 entries, 0 to 7619
Data columns (total 57 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   League            7620 non-null   object
 1   Year              7620 non-null   int64 
 2   Season            7620 non-null   object
 3   Type              7620 non-null   object
 4   blueTeamTag       7582 non-null   object
 5   bResult           7620 non-null   int64 
 6   rResult           7620 non-null   int64 
 7   redTeamTag        7583 non-null   object
 8   gamelength        7620 non-null   int64 
 9   golddiff          7620 non-null   object
 10  goldblue          7620 non-null   object
 11  bKills            7620 non-null   object
 12  bTowers           7620 non-null   object
 13  bInhibs           7620 non-null   object
 14  bDragons          7620 non-null   object
 15  bBarons           7620 non-null   object
 16  bHeralds          7620 non-null   object
 17  goldred       

In [2]:
# Print the frame. Note: df.describe without parentheses only shows the bound method; df.describe() returns summary statistics.
print(df)

     League  Year  Season    Type blueTeamTag  bResult  rResult redTeamTag  \
0     NALCS  2015  Spring  Season         TSM        1        0         C9   
1     NALCS  2015  Spring  Season         CST        0        1        DIG   
2     NALCS  2015  Spring  Season         WFX        1        0         GV   
3     NALCS  2015  Spring  Season         TIP        0        1         TL   
4     NALCS  2015  Spring  Season         CLG        1        0         T8   
...     ...   ...     ...     ...         ...      ...      ...        ...   
7615    TCL  2018  Spring  Season          YC        0        1        SUP   
7616    TCL  2018  Spring  Season         GAL        0        1         DP   
7617    OPL  2018  Spring  Season         SIN        0        1         DW   
7618    OPL  2018  Spring  Season         LGC        1        0        TTC   
7619    OPL  2018  Spring  Season         TTC        0        1        LGC   

      gamelength             

In [38]:
Korea=df.loc[df['League'] == 'LCK']
Korea['gamelength'].mean()

38.811764705882354

In [39]:
NA=df.loc[df['League'] == 'NALCS']
NA['gamelength'].mean()

36.956761006289305

# Korea's games are longer on average than North America's (mean gamelength of about 38.8 vs. 37.0)
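A side-by-side summary is easier to check than two separate means, and the spread and game counts help judge whether a gap of about two is meaningful. The sketch below assumes only the `League` and `gamelength` columns shown in `df.info()`. The helper name `summarize_game_length` and the small `demo` frame are made up for illustration and are not the real data.

```python
import pandas as pd

def summarize_game_length(df, leagues=("LCK", "NALCS")):
    """Return count, mean, median and std of gamelength for the given leagues."""
    subset = df[df["League"].isin(leagues)]
    return subset.groupby("League")["gamelength"].agg(["count", "mean", "median", "std"])

# Tiny made-up frame (NOT the real dataset), just to show the shape of the result
demo = pd.DataFrame({
    "League": ["LCK", "LCK", "NALCS", "NALCS", "EULCS"],
    "gamelength": [40, 38, 35, 37, 33],
})
summary = summarize_game_length(demo)
print(summary)
```

On the real data this would be called as `summarize_game_length(df)`.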

In [29]:
Korea['gamelength'].corr(Korea['rResult'])

0.07828043049054842

# My Answer

In [30]:
NA['gamelength'].corr(NA['rResult'])

0.03446126015075999

# Game length has almost no correlation with which side wins, for both North America (r ≈ 0.03) and Korea (r ≈ 0.08)
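It helps to be precise about what these numbers measure. Correlating `gamelength` with `rResult` asks whether longer games favor the red side, not whether longer games favor a particular team. Assuming every game has exactly one winner (so `bResult` is `1 - rResult`, as in the rows displayed above; this has not been checked for the whole file), the correlation with `bResult` is the same value with the opposite sign. A minimal sketch on made-up games:

```python
import pandas as pd

# Made-up games (NOT the real dataset); exactly one side wins each game
demo = pd.DataFrame({
    "gamelength": [25, 30, 35, 40, 45, 50],
    "rResult":    [0, 1, 0, 1, 1, 0],
})
demo["bResult"] = 1 - demo["rResult"]

# Pearson correlation of game length with each side's result
r_red = demo["gamelength"].corr(demo["rResult"])
r_blue = demo["gamelength"].corr(demo["bResult"])
print(r_red, r_blue)  # same magnitude, opposite sign
```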

In [90]:
inter=df.loc[df['Type'] == 'International']
SKTBLUE=inter.loc[inter['blueTeamTag']=='SKT']
TSMBLUE=inter.loc[inter['blueTeamTag']=='TSM']
SKTRED=inter.loc[inter['redTeamTag']=='SKT']
TSMRED=inter.loc[inter['redTeamTag']=='TSM']

In [96]:
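# TSM wins on red side
TSMREDRESULT=TSMRED.loc[TSMRED['rResult'] == 1]
tsmred=TSMREDRESULT['rResult'].value_counts()
# TSM wins on blue side (filter the blue-side games, not the red-side ones)
TSMBLUERESULT=TSMBLUE.loc[TSMBLUE['bResult'] == 1]
tsmblue=TSMBLUERESULT['bResult'].value_counts()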

In [99]:
TSMREDRESULT = TSMRED.loc[TSMRED['rResult'] == 1]
SKTREDRESULT = SKTRED.loc[SKTRED['rResult'] == 1]
TSMBLUERESULT = TSMBLUE.loc[TSMBLUE['bResult'] == 1]
SKTBLUERESULT = SKTBLUE.loc[SKTBLUE['bResult'] == 1]

tsmred_wins = TSMREDRESULT['rResult'].count()
sktred_wins = SKTREDRESULT['rResult'].count()
tsmblue_wins = TSMBLUERESULT['bResult'].count()
sktblue_wins = SKTBLUERESULT['bResult'].count()

win_counts = pd.DataFrame({
    'Blue Wins': [sktblue_wins, tsmblue_wins],
    'Red Wins': [sktred_wins, tsmred_wins]
}, index=['SKT', 'TSM'])

win_counts

     Blue Wins  Red Wins
SKT         44        41
TSM         20        19


# SKT, which is Korea's best team, has more than twice as many international wins (85) as TSM (39), which is North America's best team.
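Raw win counts depend on how many games each team played, and the question asks about win rate. One way to compute it is to divide wins by games played across both sides. The sketch below assumes the `blueTeamTag`, `redTeamTag`, `bResult` and `rResult` columns from the dataset. The helper `win_rate` and the `demo` frame are hypothetical and do not contain real results.

```python
import pandas as pd

def win_rate(games, team):
    """Fraction of games won by `team`, counting both blue-side and red-side games."""
    blue = games[games["blueTeamTag"] == team]
    red = games[games["redTeamTag"] == team]
    played = len(blue) + len(red)
    if played == 0:
        return float("nan")  # team never appears in these games
    wins = blue["bResult"].sum() + red["rResult"].sum()
    return wins / played

# Made-up games (NOT real results), just to exercise the helper
demo = pd.DataFrame({
    "blueTeamTag": ["SKT", "TSM", "SKT", "FNC"],
    "redTeamTag":  ["TSM", "FNC", "FNC", "SKT"],
    "bResult":     [1, 0, 1, 0],
    "rResult":     [0, 1, 0, 1],
})
skt_rate = win_rate(demo, "SKT")
tsm_rate = win_rate(demo, "TSM")
print(skt_rate, tsm_rate)
```

With the notebook's variables, the calls would be `win_rate(inter, 'SKT')` and `win_rate(inter, 'TSM')`.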

The data shows that, on average, games in Korea run longer than games in North America (a mean gamelength of about 38.8 versus 37.0). This may indicate that Korean games are closer, since the closer a game is, the more likely it is to run long.

However, there is almost no correlation between game length and which side wins, for either North America or Korea. The reasoning is fairly intuitive: the League of Legends map is mirrored, so the side a team plays on has only a small impact on the result of a game, which in turn leaves little for game length to correlate with.

The deciding factor in North America being worse than Korea is international wins. Internationally, TSM, North America's best team, holds only 39 wins, while SKT, Korea's best team, holds 85. This gap is massive and shows that Korea is significantly better than North America.

Does this mean that North America sucks? Absolutely. North America pales in comparison to Korea. The difference between them is 46 international wins, which may not seem like a lot, but most teams only play 4-5 matches a tournament. At that pace, the gap is worth roughly 9 to 12 tournaments of play. This is only theoretical math, though. In actuality, TSM has won 0 international trophies, while SKT has won 5. North America does not just suck at League, they are garbage at League.