# Capstone Project 2 Proposal - League of Legends Competitive Dataset

# Michael Phillips

## What is League of Legends?

League of Legends (LoL) is an online, 5 vs. 5 competitive PC game. It is one of the most popular games on any platform, if not the most popular. In September 2016, the game's developer, Riot Games, estimated the worldwide player count at over 100 million. Despite that, LoL has received surprisingly little mainstream coverage, most likely because of the game's complexity: it is hard to learn the basics, let alone actually be good at it.

If you have never heard of LoL, it might help to imagine pick-up basketball games at the park or gym: 5 vs. 5 competitive matches, different tactics for different players on each team, and the same rules (more or less) no matter where or when you play. The goals and how to win are not difficult to understand, but mastery takes a very long time.

On top of the base game, LoL also has a professional league. The top prize for the best team is over \$5 million. The average player makes a six-figure income, and one player is rumored to make \$3 million per year. Peak concurrent viewership for the 2016 league finals was 14.7 million people, a number that compares favorably with recent MLB World Series TV ratings.

So, now that you have some idea of what LoL is, why is it exciting to have a dataset pulled from competitive matches?

Just as data informs other professional sports around the world, it can reveal trends and best practices in LoL.

## 1. What is the problem you want to solve?

League of Legends is very complex. Even a cursory tour of this dataset reveals a wealth of variables covering many aspects of the game.

Traditional professional sports have advanced statistics that quantify how and why teams win, but LoL has no widely accepted equivalent. I want to find those signals in the data.

## 2. Who is your client and why do they care about this problem?

I envision my client as a team owner or coach. I'll reduce the drive to win in professional sports to one thing: money.

Riot has begun the process of franchising the professional LoL league, with franchising expected to take effect within the next six months to a year. This will function similarly to franchising in other professional sports. Teams that secure a franchise spot will get a cut of the viewership revenue (a pot of roughly \$400 million guaranteed over the next several years), as well as sponsorship, advertising, and merchandising deals that will bring in additional revenue that is hard to quantify.

The LoL league is also highly attractive for branding reasons. Over the past 18 months, NBA, MLB, and professional soccer teams have begun buying LoL teams. LoL is a global game that reaches the far corners of the world, so being able to spread a brand worldwide to a complementary set of fans and viewers could be invaluable.

## 3. What data are you going to use?

The League of Legends competitive dataset is publicly available on Kaggle, provided by user Chuck Ephron. It contains complete data from 2015 up through the recent Mid-Season Invitational tournament.

## 4. What is your approach to solving this problem?

I plan to get a better sense of what the data holds using exploratory data analysis (EDA).
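As a first EDA pass, a minimal sketch might count games and compute the blue-side win rate and average game length per year. It assumes a one-row-per-game DataFrame like the `df` loaded from `_LeagueofLegends.csv` further down, with `bResult` equal to 1 when the blue team wins (as in the sample rows). The helper name `side_summary` is hypothetical, and the toy numbers are made up so the sketch runs without the CSV.

```python
import pandas as pd


def side_summary(frame: pd.DataFrame, by=("Year",)) -> pd.DataFrame:
    """Hypothetical helper: games, blue-side win rate, and mean length per group.

    Assumes one row per game, a 0/1 ``bResult`` column (1 = blue team won),
    and a numeric ``gamelength`` column.
    """
    return (
        frame.groupby(list(by))
        .agg(
            games=("bResult", "size"),
            blue_win_rate=("bResult", "mean"),
            avg_gamelength=("gamelength", "mean"),
        )
        .reset_index()
    )


# Tiny made-up frame so the sketch runs without the Kaggle CSV.
toy = pd.DataFrame(
    {
        "Year": [2015, 2015, 2016, 2016, 2016],
        "bResult": [1, 0, 1, 1, 0],
        "gamelength": [40, 38, 35, 30, 41],
    }
)
summary = side_summary(toy)
print(summary)
```

On the real data, `side_summary(df, by=("Year", "League"))` would split the same numbers by region as well as by year.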

I expect some data wrangling will be necessary to get the data into the form I want: one row per team per game. I also want to split the gold values, which are currently stored in list form, into their own columns.
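A minimal sketch of both wrangling steps is below. It assumes the list-like columns come back from `read_csv` as strings such as `"[475, 475, 552, ...]"` (they appear as `object` in `df.dtypes`), and it parses them with `ast.literal_eval`. Each game is split into a blue row and a red row using `goldblue`/`goldred`, and the parsed gold list is then spread into `gold_0`, `gold_1`, ... columns; reading each list entry as one minute of game time is an assumption. The helper names and the new column names (`side`, `team`, `opponent`, `win`, `gold`) are hypothetical, and the two games are made up.

```python
import ast

import pandas as pd

# Per-game columns copied onto both team rows (names taken from df.dtypes).
SHARED = ["MatchHistory", "League", "Season", "Year", "gamelength"]


def to_team_rows(games: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per game -> one row per team per game."""
    blue = games[SHARED].assign(
        side="blue",
        team=games["blueTeamTag"],
        opponent=games["redTeamTag"],
        win=games["bResult"],
        gold=games["goldblue"],
    )
    red = games[SHARED].assign(
        side="red",
        team=games["redTeamTag"],
        opponent=games["blueTeamTag"],
        win=games["rResult"],
        gold=games["goldred"],
    )
    return pd.concat([blue, red], ignore_index=True)


def expand_list_column(frame: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: parse list strings into col_0, col_1, ... columns.

    Longer games produce longer lists, so shorter lists are padded with
    NaN in the trailing columns.
    """
    lists = frame[col].apply(ast.literal_eval)  # "[1, 2]" -> [1, 2]
    wide = pd.DataFrame(lists.tolist(), index=frame.index)
    wide.columns = [f"{col}_{i}" for i in wide.columns]
    return frame.drop(columns=col).join(wide)


# Two made-up games so the sketch runs without the Kaggle CSV.
games = pd.DataFrame(
    {
        "MatchHistory": ["m1", "m2"],
        "League": ["North_America", "North_America"],
        "Season": ["Spring_Season", "Spring_Season"],
        "Year": [2015, 2015],
        "gamelength": [3, 2],
        "blueTeamTag": ["TSM", "CST"],
        "redTeamTag": ["C9", "DIG"],
        "bResult": [1, 0],
        "rResult": [0, 1],
        "goldblue": ["[2400, 2400, 2900]", "[2400, 2450]"],
        "goldred": ["[2400, 2400, 2800]", "[2400, 2500]"],
    }
)
team_rows = expand_list_column(to_team_rows(games), "gold")
print(team_rows)
```

Keeping the blue and red rows in one long table lets later steps treat side as just another feature instead of duplicating every analysis per side.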

Lastly, I will apply machine learning to the wrangled data to find the signals that most strongly correlate with winning.
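Before fitting any model, a simple correlation screen can give a first ranking of candidate signals. The sketch below is that baseline, not a machine learning model: it computes the Pearson correlation of each chosen feature with a 0/1 `win` column (which is the point-biserial correlation) using `DataFrame.corrwith`, then orders features by absolute strength while keeping the sign. It assumes a one-row-per-team table with a `win` column, like the wrangled form described above; `rank_signals` and the toy feature names `a`, `b`, `c` are hypothetical.

```python
import pandas as pd


def rank_signals(rows: pd.DataFrame, features: list, target: str = "win") -> pd.Series:
    """Hypothetical helper: correlate each feature with a 0/1 target.

    Pearson correlation against a binary target equals the point-biserial
    correlation. Results are sorted by absolute value, strongest first,
    and keep their sign so negative signals stay visible.
    """
    corr = rows[features].corrwith(rows[target].astype(float))
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Made-up team rows: "a" tracks winning exactly, "b" is unrelated, "c" runs opposite.
toy = pd.DataFrame(
    {
        "win": [1, 1, 0, 0],
        "a": [1.0, 1.0, 0.0, 0.0],
        "b": [0.0, 1.0, 0.0, 1.0],
        "c": [0.0, 0.0, 1.0, 2.0],
    }
)
ranked = rank_signals(toy, ["a", "b", "c"])
print(ranked)
```

Correlation only captures one feature at a time and linear association, so it is a starting point for choosing inputs to the models rather than a substitute for them.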

## 5. What are your deliverables?

I will submit an EDA of the data with my own analysis, the most important features identified by the machine learning techniques I use, and (ideally) a baseline set of statistics that correlate with winning, perhaps broken down by year, since LoL evolves significantly over time.

Finally, I will present my findings in a Tableau slide deck.

## Dataset Link:

https://www.kaggle.com/chuckephron/leagueoflegends

```python
import pandas as pd

# Kaggle CSV from the link above, assumed to be in the working directory.
df = pd.read_csv('_LeagueofLegends.csv')
```

```python
df.shape
```

(3802, 56)

```python
df.head()
```


| Unnamed: 0 | MatchHistory | League | Season | Year | blueTeamTag | bResult | rResult | redTeamTag | gamelength | golddiff | ... | redMiddle | redMiddleChamp | goldredMiddle | redADC | redADCChamp | goldredADC | redSupportChamp | redSupport | goldredSupport | redBans |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | http://matchhistory.na.leagueoflegends.com/en/... | North_America | Spring_Season | 2015 | TSM | 1 | 0 | C9 | 40 | [0, 0, -14, -65, -268, -431, -488, -789, -494,... | ... | Hai | Fizz | [475, 475, 552, 842, 1178, 1378, 1635, 1949, 2... | Sneaky | Sivir | [475, 475, 532, 762, 1097, 1469, 1726, 2112, 2... | Thresh | LemonNation | [515, 515, 577, 722, 911, 1042, 1194, 1370, 14... | ['Tristana', 'Leblanc', 'Nidalee'] |
| 1 | http://matchhistory.na.leagueoflegends.com/en/... | North_America | Spring_Season | 2015 | CST | 0 | 1 | DIG | 38 | [0, 0, -26, -18, 147, 237, -152, 18, 88, -242,... | ... | Shiphtur | Azir | [475, 475, 552, 786, 1097, 1389, 1660, 1955, 2... | CoreJJ | Corki | [475, 475, 532, 868, 1220, 1445, 1732, 1979, 2... | Annie | KiWiKiD | [515, 515, 583, 752, 900, 1066, 1236, 1417, 15... | ['RekSai', 'Janna', 'Leblanc'] |
| 2 | http://matchhistory.na.leagueoflegends.com/en/... | North_America | Spring_Season | 2015 | WFX | 1 | 0 | GV | 40 | [0, 0, 10, -60, 34, 37, 589, 1064, 1258, 913, ... | ... | Keane | Azir | [475, 475, 533, 801, 1006, 1233, 1385, 1720, 1... | Cop | Corki | [475, 475, 533, 781, 1085, 1398, 1782, 1957, 2... | Janna | BunnyFuFuu | [515, 515, 584, 721, 858, 1002, 1168, 1303, 14... | ['Leblanc', 'Zed', 'RekSai'] |
| 3 | http://matchhistory.na.leagueoflegends.com/en/... | North_America | Spring_Season | 2015 | TIP | 0 | 1 | TL | 41 | [0, 0, -15, 25, 228, -6, -243, 175, -346, 16, ... | ... | Fenix | Lulu | [475, 475, 532, 771, 1046, 1288, 1534, 1776, 2... | KEITH | KogMaw | [475, 475, 532, 766, 1161, 1438, 1776, 1936, 2... | Janna | Xpecial | [515, 515, 583, 721, 870, 1059, 1205, 1342, 15... | ['RekSai', 'Rumble', 'LeeSin'] |
| 4 | http://matchhistory.na.leagueoflegends.com/en/... | North_America | Spring_Season | 2015 | CLG | 1 | 0 | T8 | 35 | [40, 40, 44, -36, 113, 158, -121, -191, 23, 20... | ... | Slooshi8 | Lulu | [475, 475, 532, 807, 1042, 1338, 1646, 1951, 2... | Maplestreet8 | Corki | [475, 475, 532, 792, 1187, 1488, 1832, 2136, 2... | Annie | Dodo8 | [475, 475, 538, 671, 817, 948, 1104, 1240, 136... | ['Rumble', 'Sivir', 'Rengar'] |


```python
df.dtypes
```

MatchHistory        object
League              object
Season              object
Year                 int64
blueTeamTag         object
bResult              int64
rResult              int64
redTeamTag          object
gamelength           int64
golddiff            object
goldblue            object
bKills              object
bTowers             object
bInhibs             object
bDragons            object
bBarons             object
bHeralds            object
goldred             object
rKills              object
rTowers             object
rInhibs             object
rDragons            object
rBarons             object
rHeralds            object
blueTop             object
blueTopChamp        object
goldblueTop         object
blueJungle          object
blueJungleChamp     object
goldblueJungle      object
blueMiddle          object
blueMiddleChamp     object
goldblueMiddle      object
blueADC             object
blueADCChamp        object
goldblueADC         object
blueSupport         object
b