As an avid League of Legends player, I will be checking this dataset often for insights, and this is the first of many visits. Today we will only look at the game results dataset, LeagueofLegends.csv. It should be simple to analyze, and will hopefully pave the way for more Kagglers to analyze the other datasets, including the main one, _LeagueofLegends.csv.

In [None]:

library(ggplot2) # Data visualization
library(readr) # CSV file I/O, e.g. the read_csv function
library(dplyr)
system("ls ../input")

**Loading the data**
-----------------

Let's start by loading the data!

In [None]:
games <- read.csv('../input/LeagueofLegends.csv',stringsAsFactors=FALSE)

Let's check the dimensions of the dataset to see how many rows and columns it contains.

In [None]:
dim(games)

What does the dataset contain? Let's have a look:

In [None]:
head(games)

So here's what we have: 

**Match History**: URL of the page the match data was obtained from.

**League**: The region or tournament in which the match took place. These are North_America for US and Canada games, Europe for European games, LCK for Korean games, LMS for Taiwan and Hong Kong games, Season World Championship for that season's world championship games, Mid Season Invitational for the mid-season tournament that takes place in May, and CBLOL for the Brazilian league.

**Season**: The part of the season in which the game took place. Games are split into Season for regular season games and Playoffs for end-of-season games. Spring season games determine Mid-Season Invitational entries, and Summer season games determine World Championship entries. There are also Regional (off-season) games, Winter games (which I personally haven't seen) and International games.

**Year**: Match year.

**Blue/Red Team tag**: Team tag for each team. 

**Result**: Which team was victorious, with 1 being a win and 0 being a loss.

**Game Length** : Length of the game in minutes.

**Player Name/Champion for Position**: There are 5 positions: Top laners (usually tanky, durable and self-sufficient characters), Junglers (roaming characters that help all other positions and protect their own jungle or invade the enemy's), Mid laners (usually mages or assassins: heavy damage dealers and/or zone-controlling characters), ADC (long-range marksmen focused on dealing consistent damage, or DPS), and Support (characters focused on protecting and assisting their team, as well as maintaining the upper hand on map vision). These columns contain the name of the player filling a specific role for the team in that match, as well as the name of the Champion (character) that player was using. Almost every Champion is designed with a role or position in mind.

Now let's look at the general structure of the data.

In [None]:
str(games)

Well, something is strange. There are 111 unique blue team tags, but only 109 red team tags. It looks like some teams have never played on the red side.
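One way to check this directly is to count the distinct tags on each side and list the ones that only ever appear on blue. The sketch below is a minimal illustration on a made-up data frame; `toy_games` and the helper `side_only_tags` are hypothetical names, and with the real data you would pass `games` instead.

```r
# Hypothetical toy data standing in for `games`; only the two tag columns matter here.
toy_games <- data.frame(
  blueTeamTag = c("TSM", "C9", "FW"),
  redTeamTag  = c("C9", "TSM", "TSM"),
  stringsAsFactors = FALSE
)

# Tags that appear in column `a` but never in column `b`.
side_only_tags <- function(df, a, b) {
  setdiff(unique(df[[a]]), unique(df[[b]]))
}

n_blue <- length(unique(toy_games$blueTeamTag))  # distinct blue-side tags
n_red  <- length(unique(toy_games$redTeamTag))   # distinct red-side tags
blue_only <- side_only_tags(toy_games, "blueTeamTag", "redTeamTag")

print(c(blue = n_blue, red = n_red))
print(blue_only)
```

Swapping the last two arguments gives the tags that only appear on the red side.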

There are also duplicated teams: as we can see, ahq appears twice, once in lowercase and once in uppercase.

Let's find out why:

In [None]:
head(games[games$redTeamTag=='AHQ',])
head(games[games$redTeamTag=='ahq',])

It seems that on their home turf AHQ is spelled with lowercase letters, which is not the case at international tournaments. In case this happens with other teams, let's just rename all teams to uppercase.
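To see which tags would actually be merged by that rename, it can help to list every tag whose uppercase form collides with a differently spelled tag. This is only a sketch: `toy_tags` and `case_variants` are hypothetical names, and on the real data the input would be something like `unique(c(games$blueTeamTag, games$redTeamTag))`.

```r
# Hypothetical tags standing in for the combined blue and red team tags.
toy_tags <- c("AHQ", "ahq", "FW", "TSM")

# Return every tag whose uppercase form is shared with a differently spelled tag.
case_variants <- function(tags) {
  tags <- unique(tags)
  upper <- toupper(tags)
  tags[duplicated(upper) | duplicated(upper, fromLast = TRUE)]
}

variants <- case_variants(toy_tags)
print(variants)
```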

In [None]:
games$blueTeamTag <- toupper(games$blueTeamTag)
games$redTeamTag <- toupper(games$redTeamTag)

str(games$blueTeamTag)

Well something's still not right. There's a team with no name! Let's have a look at that match.

In [None]:
games[games$blueTeamTag == "" ,]

Let's fix that match and continue on to the visualizations.

In [None]:
games$blueTeamTag[games$blueTeamTag == ""] <- "FW"
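Only the blue column was inspected above, so a quick count of blank tags in both columns could confirm that nothing else is missing. The following is a minimal sketch on made-up data; `toy_games` and `is_blank` are hypothetical names, and treating `NA` and whitespace-only strings as blank is an assumption of this sketch.

```r
# Hypothetical toy data with one empty blue tag and one missing red tag.
toy_games <- data.frame(
  blueTeamTag = c("TSM", "", "FW"),
  redTeamTag  = c("C9", "AHQ", NA),
  stringsAsFactors = FALSE
)

# TRUE for entries that are NA or empty after trimming whitespace.
is_blank <- function(x) is.na(x) | trimws(x) == ""

blank_counts <- c(
  blue = sum(is_blank(toy_games$blueTeamTag)),
  red  = sum(is_blank(toy_games$redTeamTag))
)
print(blank_counts)
```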

## Data Visualizations ##

First, let's ask ourselves some questions that can be answered with the dataset in its current state (later we can modify the dataset to find even more insights).

What is the average game time for each League?

What teams are the slowest/fastest to end a game?

What role has the most diversity?

What are the most/least played champions?

Let's answer these questions for now.

In [None]:
str(games)

In [None]:
#Average game time for each league.

ggplot(games, aes(x=League, y=gamelength, fill = League)) +
    stat_summary(fun="mean",geom="bar")+ # `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
    ggtitle("Average game time for each League")

On average, games in the Brazilian league take the longest to finish, at over 40 minutes. On the other hand, games at the Mid Season Invitational, the annual international tournament, are the shortest; as those who have watched them know, they are usually stomps.
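The exact averages behind a chart like this can also be read from a small table. As a sketch, base R's `aggregate()` computes the mean game length per league; the data frame `toy_games` and its values are made up for illustration, and on the real data you would use `games`.

```r
# Hypothetical toy data; the real `games` has the same two columns.
toy_games <- data.frame(
  League = c("CBLOL", "CBLOL", "LCK", "LCK"),
  gamelength = c(42, 40, 35, 37),
  stringsAsFactors = FALSE
)

# Mean game length per league, sorted from longest to shortest.
league_means <- aggregate(gamelength ~ League, data = toy_games, FUN = mean)
league_means <- league_means[order(-league_means$gamelength), ]
print(league_means)
```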

In [None]:
#Average game time per team. Since the team can be on the blue or red side, things could get complicated.
#I'll create a for loop to fill a dataframe with  the Team, number of games and a sum of their length.
#If anyone can provide a more efficient solution, I'd be glad.


teams<- data.frame(Team = unique(games$blueTeamTag),TotalGames = 0,TotalLength=0, AvgGameLength = NA)
for(i in 1:nrow(teams)){
    
    teams$TotalGames[i] <- nrow(games[games$blueTeamTag == teams$Team[i] ,]) 
    teams$TotalGames[i] <- teams$TotalGames[i] + nrow(games[games$redTeamTag==teams$Team[i],])
    teams$TotalLength[i] <- sum(games$gamelength[games$blueTeamTag == teams$Team[i]])
    teams$TotalLength[i] <- teams$TotalLength[i] + sum(games$gamelength[games$redTeamTag == teams$Team[i]])
    teams$AvgGameLength[i] <- teams$TotalLength[i]/teams$TotalGames[i]
}


head(teams)
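One possible alternative to the loop is to stack the blue and red appearances into a single long table, so that every game contributes one row per team, and then aggregate once. This is a sketch on made-up data; `toy_games`, `long` and `teams_alt` are hypothetical names. Unlike the loop above, it also picks up teams that only ever appear on the red side.

```r
# Hypothetical toy data with the three columns the calculation needs.
toy_games <- data.frame(
  blueTeamTag = c("TSM", "C9", "TSM"),
  redTeamTag  = c("C9", "TSM", "FW"),
  gamelength  = c(30, 40, 35),
  stringsAsFactors = FALSE
)

# Stack the blue and red appearances so each game contributes one row per team.
long <- data.frame(
  Team = c(toy_games$blueTeamTag, toy_games$redTeamTag),
  gamelength = rep(toy_games$gamelength, times = 2),
  stringsAsFactors = FALSE
)

# Games played, total length, then the average; both aggregates sort by Team,
# so their rows line up.
teams_alt <- aggregate(gamelength ~ Team, data = long, FUN = length)
names(teams_alt)[2] <- "TotalGames"
teams_alt$TotalLength <- aggregate(gamelength ~ Team, data = long, FUN = sum)$gamelength
teams_alt$AvgGameLength <- teams_alt$TotalLength / teams_alt$TotalGames
print(teams_alt)
```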

Now that we have created our teams data frame with each team's average game length, let's plot which teams take the most time and which take the least. Since some teams have almost no games under their belt, let's only consider teams with at least 10 games.

In [None]:
teams <- teams[teams$TotalGames>9,]

In [None]:
#Teams that take the least amount of time to end a game.

ggplot(teams[order(teams$AvgGameLength),][1:10,], aes(x=Team, y=AvgGameLength, fill = Team)) +
    geom_bar(stat="identity")+
    ggtitle("Fastest teams")

The team that takes the least time to finish a game is DOR. Well-known teams among the fastest include Edward Gaming (EDG) from China, FlyQuest (FLY) from North America, Immortals, also from North America, Splyce (SPY) from Europe, and, back in their day, Samsung White (SSW, now Samsung Galaxy) from Korea. On average, these teams take 30 minutes to finish a game. Now, which teams take the most time?

In [None]:
#Teams that take the most amount of time to end a game.

ggplot(teams[order(-teams$AvgGameLength),][1:10,], aes(x=Team, y=AvgGameLength, fill = Team)) +
    geom_bar(stat="identity")+
    ggtitle("Slowest teams")

Wow, these teams take 40 minutes on average to finish a game, 10 more than the fastest ones. Some popular teams here include CJ Entus from Korea, Najin (NJE), also from Korea, and, back in their day, Team Dragon Knights (TDK) from North America.

In [None]:
#What role has the most diversity? This is done by counting distinct champions in each role
#on each side, and keeping the larger of the blue-side and red-side counts.

roleCount <- data.frame(Role=c("Top","Jungle","Mid","ADC","Support"),Count = 0)
roleCount$Count[roleCount$Role=="Top"] <- max(c(length(unique(games$blueTopChamp)),
                                                length(unique(games$redTopChamp))))
roleCount$Count[roleCount$Role=="Jungle"] <- max(c(length(unique(games$blueJungleChamp)),
                                                length(unique(games$redJungleChamp))))
roleCount$Count[roleCount$Role=="Mid"] <- max(c(length(unique(games$blueMiddleChamp)),
                                                length(unique(games$redMiddleChamp))))
roleCount$Count[roleCount$Role=="ADC"] <- max(c(length(unique(games$blueADCChamp)),
                                                length(unique(games$redADCChamp))))
roleCount$Count[roleCount$Role=="Support"] <- max(c(length(unique(games$blueSupportChamp)),
                                                length(unique(games$redSupportChamp))))

ggplot(roleCount, aes(x=Role, y=Count, fill = Role)) +
    geom_bar(stat="identity")+
    ggtitle("Most Diverse Role")

Top and Mid are by far the most diverse roles, with over 60 unique champions played in each. This is no surprise: when the meta changes, these two roles are the ones that change the most. ADC is the most stagnant role, with 25 unique champions. This makes sense, as ADC is fairly strict about which champions can be played there (I even find 25 a bit high, considering champion.gg lists about 18 champions as ADCs).
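Note that taking the larger of the two per-side counts can understate diversity when a champion was only ever picked on one side. A variation, sketched here on made-up data, counts the distinct champions across both sides together. The helper `role_union_count` and the data frame `toy_games` are hypothetical; the column naming pattern (`blue<Role>Champ`, `red<Role>Champ`) is the one used in the code above.

```r
# Hypothetical toy data using the dataset's column naming pattern.
toy_games <- data.frame(
  blueTopChamp = c("Maokai", "Gnar", "Maokai"),
  redTopChamp  = c("Rumble", "Gnar", "Shen"),
  blueADCChamp = c("Sivir", "Sivir", "Jhin"),
  redADCChamp  = c("Jhin", "Sivir", "Sivir"),
  stringsAsFactors = FALSE
)

# Count distinct champions for a role across the blue and red columns combined.
role_union_count <- function(df, role) {
  cols <- paste0(c("blue", "red"), role, "Champ")
  length(unique(unlist(df[cols], use.names = FALSE)))
}

union_counts <- sapply(c(Top = "Top", ADC = "ADC"), role_union_count, df = toy_games)
print(union_counts)
```

In this toy example the per-side maximum for Top would be 3 (the red side), while the combined count is 4.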

In [None]:
#Most/Least Played champions. Again I would appreciate a more efficient approach to this.

TopChamps <- unique(games$blueTopChamp)
TopChamps <- append(TopChamps,unique(games$redTopChamp[!games$redTopChamp %in% TopChamps]))

JungleChamps <- unique(games$blueJungleChamp)
JungleChamps <- append(JungleChamps,unique(games$redJungleChamp[!games$redJungleChamp %in% JungleChamps]))

MiddleChamps <- unique(games$blueMiddleChamp)
MiddleChamps <- append(MiddleChamps,unique(games$redMiddleChamp[!games$redMiddleChamp %in% MiddleChamps]))

ADCChamps <- unique(games$blueADCChamp)
ADCChamps <- append(ADCChamps,unique(games$redADCChamp[!games$redADCChamp %in% ADCChamps]))

SupportChamps <- unique(games$blueSupportChamp)
SupportChamps <- append(SupportChamps,unique(games$redSupportChamp[!games$redSupportChamp %in% SupportChamps]))

AllChamps <- TopChamps
AllChamps <- append(AllChamps,JungleChamps[!JungleChamps %in% AllChamps])
AllChamps <- append(AllChamps,MiddleChamps[!MiddleChamps %in% AllChamps])
AllChamps <- append(AllChamps,ADCChamps[!ADCChamps %in% AllChamps])
AllChamps <- append(AllChamps,SupportChamps[!SupportChamps %in% AllChamps])

ChampCount <- data.frame(Champion= AllChamps, PlayCount = 0)

for(i in 1:length(AllChamps)){
    #This will be long...
    
    ChampCount$PlayCount[i] <- nrow(games[games$redTopChamp==AllChamps[i],])
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$blueTopChamp==AllChamps[i],])
    
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$redJungleChamp==AllChamps[i],])
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$blueJungleChamp==AllChamps[i],])
    
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$redMiddleChamp==AllChamps[i],])
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$blueMiddleChamp==AllChamps[i],])
    
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$redADCChamp==AllChamps[i],])
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$blueADCChamp==AllChamps[i],])
    
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$redSupportChamp==AllChamps[i],])
    ChampCount$PlayCount[i] <- ChampCount$PlayCount[i] + nrow(games[games$blueSupportChamp==AllChamps[i],])
}

rm(AllChamps)
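A shorter route to the same kind of count might be to flatten all champion columns into one vector and let `table()` do the tallying. The sketch below runs on made-up data; `toy_games` and `champ_count_alt` are hypothetical names, and it assumes that every champion-pick column ends in "Champ" and that no other column does, which should be checked against `names(games)` first.

```r
# Hypothetical toy data with four of the ten champion columns.
toy_games <- data.frame(
  blueTopChamp    = c("Gragas", "Maokai"),
  redTopChamp     = c("Rumble", "Gragas"),
  blueJungleChamp = c("RekSai", "Gragas"),
  redJungleChamp  = c("Gragas", "RekSai"),
  stringsAsFactors = FALSE
)

# Pick every column whose name ends in "Champ", flatten, and tabulate.
champ_cols <- grep("Champ$", names(toy_games), value = TRUE)
play_table <- table(unlist(toy_games[champ_cols], use.names = FALSE))

# Turn the table into a data frame sorted from most to least played.
champ_count_alt <- data.frame(
  Champion = names(play_table),
  PlayCount = as.integer(play_table),
  stringsAsFactors = FALSE
)
champ_count_alt <- champ_count_alt[order(-champ_count_alt$PlayCount), ]
print(champ_count_alt)
```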

In [None]:
##10 Most used Champions

ggplot(ChampCount[order(-ChampCount$PlayCount),][1:10,], aes(x=Champion, y=PlayCount, fill = Champion)) +
    geom_bar(stat="identity")+
    ggtitle("Most used champions")

In [None]:
ChampCount[ChampCount$Champion == "Jhin",]

So RekSai, in the jungle, is the most used champion by quite a bit, followed by Sivir as ADC and Gragas in the Jungle/Top. Together, the three account for approximately one third of the games.

In [None]:
ggplot(ChampCount[order(ChampCount$PlayCount),][1:10,], aes(x=Champion, y=PlayCount, fill = Champion)) +
    geom_bar(stat="identity")+
    ggtitle("Least used champions")

And these are the least used champions (MonkeyKing is Wukong's internal name). Some champions have only been used once, a huge contrast with the most used champions.

Now let's add each team's win rate to our teams data frame.

In [None]:
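# Winner of each game: the blue team when bResult is 1, otherwise the red team.
winners <- ifelse(games$bResult == 1,games$blueTeamTag,games$redTeamTag)
games$Winner <- winners

# Create the new columns up front so the loop only fills them in.
teams$Wins <- 0
teams$Winrate <- NA
for(i in seq_len(nrow(teams)))
    {
    teams$Wins[i] <- nrow(games[games$Winner == teams$Team[i],])
    teams$Winrate[i] <- round(teams$Wins[i]/teams$TotalGames[i],4)
}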

head(teams)
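For comparison, wins and games played can also be tabulated without a loop. The sketch below uses made-up data and hypothetical names (`toy_games`, `winrate_alt`); it tabulates against a shared set of team levels so that teams with zero wins still get a row.

```r
# Hypothetical toy data; bResult is 1 when the blue team won, as in the dataset.
toy_games <- data.frame(
  blueTeamTag = c("TSM", "C9", "TSM", "FW"),
  redTeamTag  = c("C9", "TSM", "FW", "C9"),
  bResult     = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

winner <- ifelse(toy_games$bResult == 1, toy_games$blueTeamTag, toy_games$redTeamTag)

# Tabulate against a shared set of levels so teams with zero wins still appear.
all_teams <- sort(unique(c(toy_games$blueTeamTag, toy_games$redTeamTag)))
wins   <- table(factor(winner, levels = all_teams))
played <- table(factor(c(toy_games$blueTeamTag, toy_games$redTeamTag), levels = all_teams))

winrate_alt <- data.frame(
  Team = all_teams,
  Wins = as.integer(wins),
  TotalGames = as.integer(played),
  stringsAsFactors = FALSE
)
winrate_alt$Winrate <- round(winrate_alt$Wins / winrate_alt$TotalGames, 4)
print(winrate_alt)
```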
