# Context

League of Legends is a MOBA (multiplayer online battle arena) where 2 teams (blue and red) face off. There are 3 lanes, a jungle, and 5 roles. The goal is to take down the enemy Nexus to win the game.

# Content

This dataset contains the stats from the first 10 minutes of approximately 10k ranked solo queue games at high Elo (Diamond I to Master). Players have roughly the same level.

There are 19 features per team (38 in total) collected after 10min in-game. This includes kills, deaths, gold, experience, level etc.

# Glossary

- Warding totem: An item that a player can put on the map to reveal the nearby area. Very useful for map/objectives control.
- Minions: NPCs that belong to either team. They give gold when killed by players.
- Jungle minions: NPCs that belong to NO TEAM. They give gold and buffs when killed by players.
- Elite monsters: Monsters with high hp/damage that give a massive bonus (gold/XP/stats) when killed by a team.
- Dragons: Elite monsters that give a team bonus when killed. The 4th dragon killed by a team gives a massive stats bonus. The 5th dragon (Elder Dragon) offers a huge advantage to the team.
- Herald: An elite monster that gives a stats bonus when killed by a player. It helps push a lane and destroy structures.
- Towers: Structures you have to destroy to reach the enemy Nexus. They give gold.
- Level: Champion level. Starts at 1; the maximum is 18.

In [None]:
from IPython.display import Image
Image(filename='../input/lol-map/map.jpg') 

League of Legends' famous Summoner's Rift! As you can see, Blue Team's base is in the lower-left corner, whereas Red Team's base is in the upper-right corner. To win, you have to destroy the enemy team's base. In the early game, Blue Team has easier access to the Herald, while Red Team has easier access to the Dragons for the whole game. 20 minutes into the game, Baron Nashor spawns on the Rift! Unfortunately, our data covers only the first 10 minutes of each game, so I won't be able to study Baron Nashor's impact, which, as an ex Diamond 2 player, I think is one of the most important objectives in the game, if not the most important.

# Exploratory Data Analysis

#### Let's start by importing the essential libraries.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv("../input/league-of-legends-diamond-ranked-games-10-min/high_diamond_ranked_10min.csv")
pd.set_option("display.max_columns", None)

In [None]:
df.head()

In [None]:
df.shape

#### The data has 9879 rows and 40 columns: `gameId`, the `blueWins` target, and 38 features. Let's check whether there are any missing values.

In [None]:
df.isnull().sum()

In [None]:
df.info()

#### Great! No missing values! Now, for readability, let's create a new column from `blueWins`.

In [None]:
df["whoWins"] = df.blueWins.map({0:"RedWins", 1:"BlueWins"})

#### All of these stats are snapshots taken at the 10-minute mark, so the per-minute columns are just the totals divided by 10 and add no new information. I will drop them.

In [None]:
columns = ["redGoldPerMin", "blueGoldPerMin", "redCSPerMin", "blueCSPerMin"]
df = df.drop(columns,axis=1)
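The redundancy claim can also be checked directly before dropping the columns. The sketch below assumes each per-minute column is its 10-minute total divided by 10 (gold per minute from `TotalGold`, CS per minute from `TotalMinionsKilled`); the helper `per_minute_is_redundant` and the mapping `PER_MINUTE_SOURCES` are hypothetical names, and the check would have to run on `df` before the drop cell above. The toy frame holds made-up numbers for illustration only.

```python
import numpy as np
import pandas as pd

# Assumed mapping from each per-minute column to the total it is derived from
PER_MINUTE_SOURCES = {
    "blueGoldPerMin": "blueTotalGold",
    "redGoldPerMin": "redTotalGold",
    "blueCSPerMin": "blueTotalMinionsKilled",
    "redCSPerMin": "redTotalMinionsKilled",
}


def per_minute_is_redundant(frame, minutes=10):
    """For each per-minute column present, report whether it equals total / minutes."""
    return {
        per_min: bool(np.allclose(frame[per_min], frame[total] / minutes))
        for per_min, total in PER_MINUTE_SOURCES.items()
        if per_min in frame.columns and total in frame.columns
    }


# Toy frame with made-up numbers, not rows from the real dataset
toy = pd.DataFrame({
    "blueTotalGold": [15000, 18000],
    "blueGoldPerMin": [1500.0, 1800.0],
    "blueTotalMinionsKilled": [200, 210],
    "blueCSPerMin": [20.0, 21.5],  # deliberately inconsistent second value
})
print(per_minute_is_redundant(toy))
```

Columns missing from the frame are simply skipped, so the same call works before or after other columns are dropped.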

#### Let's look at distribution of winning sides.

In [None]:
palette=['#FF3500',"#0059FF"]

In [None]:
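plt.figure(figsize=(4,3), dpi=200)
# Fix the bar order so the red/blue palette always lines up with the right team
sns.countplot(data=df, x="whoWins", order=["RedWins", "BlueWins"], palette=palette);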

In [None]:
df.whoWins.value_counts()

In [None]:
# Both teams have roughly the same win rate.
blue_win_rate = df["blueWins"].mean() * 100  # share of games Blue Team won, in %
print(f"Blue Team has a {blue_win_rate:.2f}% win rate.")
print("-"*27)
print(f"Red Team has a {100 - blue_win_rate:.2f}% win rate.")

# Win Rate With a Gold Advantage

#### Let's check the win rate when a team has a gold advantage.

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(12,4), dpi=200)

# Win/Lose counts for each team in the games where it led in gold at 10 minutes
blue_ahead = df[df["blueGoldDiff"]>0]["blueWins"].replace({1:"Win", 0:"Lose"}).value_counts()
red_ahead = df[df["redGoldDiff"]>0]["blueWins"].replace({0:"Win", 1:"Lose"}).value_counts()

sns.barplot(x=blue_ahead.index, y=blue_ahead.values, ax=ax[0], palette=palette[::-1])
sns.barplot(x=red_ahead.index, y=red_ahead.values, ax=ax[1], palette=palette)

ax[0].set_title("Blue Win When Blue Has Gold Advantage")
ax[1].set_title("Red Win When Red Has Gold Advantage")

plt.tight_layout()

In [None]:
# Win rate of each team in the games where it led in gold at 10 minutes
blue_ahead_wr = (df[df["blueGoldDiff"]>0]["blueWins"] == 1).mean() * 100
red_ahead_wr = (df[df["redGoldDiff"]>0]["blueWins"] == 0).mean() * 100
print(f"When Blue Team has a gold advantage at the 10-minute mark, Blue Team has a {blue_ahead_wr:.2f}% win rate.")
print("-"*98)
print(f"When Red Team has a gold advantage at the 10-minute mark, Red Team has a {red_ahead_wr:.2f}% win rate.")
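The gold-advantage split above is binary: a 1-gold lead counts the same as a 5000-gold lead. A possible refinement is to bucket `blueGoldDiff` and compute Blue Team's win rate per bucket. The bucket edges in `DEFAULT_EDGES` and the helper name `win_rate_by_gold_lead` are illustrative assumptions, not part of the analysis above, and the toy frame is made-up data.

```python
import pandas as pd

# Illustrative bucket edges (in gold); not taken from the analysis above
DEFAULT_EDGES = (float("-inf"), -2000, -500, 500, 2000, float("inf"))


def win_rate_by_gold_lead(frame, edges=DEFAULT_EDGES):
    """Blue Team's win rate and number of games for each blueGoldDiff bucket."""
    buckets = pd.cut(frame["blueGoldDiff"], bins=list(edges))
    # observed=False keeps empty buckets so every interval appears in the output
    return frame.groupby(buckets, observed=False)["blueWins"].agg(win_rate="mean", games="count")


# Toy frame with made-up games, not rows from the real dataset
toy = pd.DataFrame({
    "blueGoldDiff": [-3000, -1000, 0, 1000, 3000, 2500],
    "blueWins": [0, 0, 1, 1, 1, 0],
})
print(win_rate_by_gold_lead(toy))
```

On the real data this would be called as `win_rate_by_gold_lead(df)`; buckets with few games deserve less trust than the overall split.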

#### Both teams' win rates are essentially the same when they have a gold advantage. Let's look into the impact of first blood.

# Impact of the First Blood

In [None]:
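df["whoFirstBlood"] = df.blueFirstBlood.map({0:"Red", 1:"Blue"})
plt.figure(figsize=(12,8), dpi=200)
# hue_order pins RedWins to red and BlueWins to blue regardless of row order
sns.countplot(data=df, x="whoFirstBlood", hue="whoWins", hue_order=["RedWins", "BlueWins"], palette=palette);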

In [None]:
print(f'When Blue Team got first blood, their win rate is {round(len(df[(df["blueFirstBlood"]==1) & (df["blueWins"]==1)]) / len(df[df["blueFirstBlood"]==1]) * 100,2)}%')
print(f'When Red Team got first blood, their win rate is {round(len(df[(df["redFirstBlood"]==1) & (df["blueWins"]==0)]) / len(df[df["redFirstBlood"]==1]) * 100,2)}%')
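Assuming exactly one team draws first blood in every game, both numbers above can be read off a single normalized cross-tabulation. The sketch below uses `pd.crosstab` with `normalize="index"`, so each row gives the share of games won by each side; the helper name `first_blood_table` is hypothetical and the toy frame is made-up data.

```python
import pandas as pd


def first_blood_table(frame):
    """Share of games won by each side, split by which team drew first blood."""
    # Assumes blueFirstBlood == 0 means Red Team drew first blood
    who_first = frame["blueFirstBlood"].map({1: "Blue", 0: "Red"}).rename("firstBlood")
    who_wins = frame["blueWins"].map({1: "BlueWins", 0: "RedWins"}).rename("winner")
    # normalize="index" turns each row's counts into proportions that sum to 1
    return pd.crosstab(who_first, who_wins, normalize="index")


# Toy frame with made-up games, not rows from the real dataset
toy = pd.DataFrame({"blueFirstBlood": [1, 1, 1, 0, 0], "blueWins": [1, 1, 0, 0, 1]})
print(first_blood_table(toy))
```

With `first_blood_table(df)`, the `Blue`/`BlueWins` and `Red`/`RedWins` cells correspond to the two win rates printed above (as fractions rather than percentages).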


# Dragon

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(12,4), dpi=200)
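# hue_order pins RedWins to red and BlueWins to blue regardless of row order
sns.countplot(data=df, x="blueDragons", hue="whoWins", hue_order=["RedWins", "BlueWins"], ax=ax[0], palette=palette)
sns.countplot(data=df, x="redDragons", hue="whoWins", hue_order=["RedWins", "BlueWins"], ax=ax[1], palette=palette)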
plt.suptitle("Dragon's Impact On The Game")

ax[0].set_title("Blue Team")
ax[1].set_title("Red Team")

plt.tight_layout()

In [None]:
print(f"Out of {len(df)} games, Red Team took the dragon {df.redDragons.value_counts()[1]} times. In those games Red Team's win rate is {round(len(df[(df['redDragons'] == 1) & (df['blueWins'] == 0)]) / df.redDragons.value_counts()[1] * 100,2)}%")
print(f"Out of {len(df)} games, Blue Team took the dragon {df.blueDragons.value_counts()[1]} times. In those games Blue Team's win rate is {round(len(df[(df['blueDragons'] == 1) & (df['blueWins'] == 1)]) / df.blueDragons.value_counts()[1] * 100,2)}%")
print(f"Out of {len(df)} games, neither team took the dragon {len(df[(df['blueDragons'] == 0) & (df['redDragons'] == 0)])} times. In those games Blue Team's win rate is {round(len(df[(df['redDragons'] == 0) & (df['blueDragons'] == 0) & (df['blueWins'] == 1)]) / len(df[(df['blueDragons'] == 0) & (df['redDragons'] == 0)]) * 100,2)}%.")

#### When neither team took the dragon, both teams have roughly a 50% win rate.
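The same three-way split (Blue took it, Red took it, neither did) is repeated for dragons and heralds. A small helper could compute it for any objective whose columns follow the `blue<Objective>` / `red<Objective>` naming, for example `objective_win_rates(df, "Dragons")` or `objective_win_rates(df, "Heralds")`. The function name is hypothetical, a subset with no games yields NaN, and the toy frame is made-up data.

```python
import pandas as pd


def objective_win_rates(frame, objective):
    """Win rates (%) for an objective stored as blue<objective> / red<objective> columns."""
    blue_took = frame[f"blue{objective}"] > 0
    red_took = frame[f"red{objective}"] > 0
    blue_won = frame["blueWins"] == 1
    return {
        # Blue Team's win rate in games where Blue took the objective
        "blue_took": blue_won[blue_took].mean() * 100,
        # Red Team's win rate in games where Red took the objective
        "red_took": (~blue_won[red_took]).mean() * 100,
        # Blue Team's win rate when neither side took it (Red's is 100 minus this)
        "neither": blue_won[~blue_took & ~red_took].mean() * 100,
    }


# Toy frame with made-up games, not rows from the real dataset
toy = pd.DataFrame({
    "blueDragons": [1, 1, 0, 0, 0, 0],
    "redDragons": [0, 0, 1, 1, 0, 0],
    "blueWins": [1, 0, 0, 0, 1, 0],
})
print(objective_win_rates(toy, "Dragons"))
```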

# Herald

#### Since this data covers only the first 10 minutes of each game, securing the Herald is hard for both teams, and because of its location on the map, Red Team took the Herald less often than Blue Team. Let's look into the Herald's impact on the game.

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(12,4), dpi=200)
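# hue_order pins RedWins to red and BlueWins to blue regardless of row order
sns.countplot(data=df, x="blueHeralds", hue="whoWins", hue_order=["RedWins", "BlueWins"], ax=ax[0], palette=palette)
sns.countplot(data=df, x="redHeralds", hue="whoWins", hue_order=["RedWins", "BlueWins"], ax=ax[1], palette=palette)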
plt.suptitle("Herald's Impact On The Game")

ax[0].set_title("Blue Team")
ax[1].set_title("Red Team")

plt.tight_layout()

In [None]:
print(f"Out of {len(df)} games, Blue Team got the herald {len(df[(df['blueHeralds'] == 1)])} times. In those games Blue Team's win rate is {round(len(df[(df['blueHeralds'] == 1) & (df['blueWins'] == 1)]) / len(df[(df['blueHeralds'] == 1)]) * 100,2)}%.")
print(f"Out of {len(df)} games, Red Team got the herald {len(df[(df['redHeralds'] == 1)])} times. In those games Red Team's win rate is {round(len(df[(df['redHeralds'] == 1) & (df['blueWins'] == 0)]) / len(df[(df['redHeralds'] == 1)]) * 100,2)}%.")
print(f"Out of {len(df)} games, neither team got the herald {len(df[(df['redHeralds'] == 0) & (df['blueHeralds'] == 0)])} times. In those games Blue Team's win rate is {round(len(df[(df['redHeralds'] == 0) & (df['blueHeralds'] == 0) & (df['blueWins'] == 1)]) / len(df[(df['redHeralds'] == 0) & (df['blueHeralds'] == 0)]) * 100,2)}%.")

#### Just like with the dragon, when neither team got the herald, the win rate is about 50% for both teams.
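Because Red Team took the herald in fewer games than Blue Team, its win rate rests on a smaller sample. One standard way to show that uncertainty is a Wilson score interval for a proportion. The sketch below is a plain implementation with z = 1.96 (about 95% coverage); the helper name `wilson_interval` is hypothetical. It could be fed counts such as `((df.redHeralds == 1) & (df.blueWins == 0)).sum()` wins out of `(df.redHeralds == 1).sum()` games.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval (as fractions) for a win rate of wins / games."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half_width, center + half_width


# Example: 50 wins out of 100 games
print(wilson_interval(50, 100))
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] even for small samples or win rates near 0% or 100%.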

# Elite monsters

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(12,4), dpi=200)
# hue_order pins RedWins to red and BlueWins to blue regardless of row order
sns.countplot(data=df, x="blueEliteMonsters", hue="whoWins", hue_order=["RedWins", "BlueWins"], ax=ax[0], palette=palette)
sns.countplot(data=df, x="redEliteMonsters", hue="whoWins", hue_order=["RedWins", "BlueWins"], ax=ax[1], palette=palette)
plt.suptitle("Elite Monsters' Impact On The Game")

ax[0].set_title("Blue Team")
ax[0].set_xlabel("Number of elite monsters Blue Team took")

ax[1].set_title("Red Team")
ax[1].set_xlabel("Number of elite monsters Red Team took")

plt.tight_layout()

In [None]:
print(f"Out of {len(df)} games, Blue Team got both the herald and the dragon {len(df[df['blueEliteMonsters'] == 2])} times. In those games Blue Team's win rate is {round(len(df[(df['blueEliteMonsters'] == 2) & (df['blueWins'] == 1)]) / len(df[df['blueEliteMonsters'] == 2]) * 100,2)}%")
print(f"Out of {len(df)} games, Red Team got both the herald and the dragon {len(df[df['redEliteMonsters'] == 2])} times. In those games Red Team's win rate is {round(len(df[(df['redEliteMonsters'] == 2) & (df['blueWins'] == 0)]) / len(df[df['redEliteMonsters'] == 2]) * 100,2)}%")

In [None]:
print(f"When neither team got an elite monster, Blue Team's win rate is {round(len(df[(df['redEliteMonsters'] == 0) & (df['blueEliteMonsters'] == 0) & (df['blueWins'] == 1)]) / len(df[(df['redEliteMonsters'] == 0) & (df['blueEliteMonsters'] == 0)]) * 100,2)}%.")

#### If a team has taken 2 elite monsters within 10 minutes, the probability of winning the game is very high.

#### If neither team has taken a dragon or the herald, the win rates are almost the same.

#### At the 10-minute mark, a team that has not yet taken an elite monster while the enemy has is much less likely to win the game.
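To see every combination at once rather than one case per print, Blue Team's win rate can be laid out as a grid over `blueEliteMonsters` (rows) and `redEliteMonsters` (columns) with `DataFrame.pivot_table`. This is a sketch; the helper name `elite_monster_grid` is hypothetical, combinations with no games show up as NaN, and the toy frame is made-up data.

```python
import pandas as pd


def elite_monster_grid(frame):
    """Blue Team win rate (%) for each (blueEliteMonsters, redEliteMonsters) combination."""
    grid = frame.pivot_table(
        index="blueEliteMonsters",
        columns="redEliteMonsters",
        values="blueWins",
        aggfunc="mean",
    )
    # Combinations that never occur stay NaN
    return grid * 100


# Toy frame with made-up games, not rows from the real dataset
toy = pd.DataFrame({
    "blueEliteMonsters": [2, 2, 0, 0, 1],
    "redEliteMonsters": [0, 0, 0, 0, 1],
    "blueWins": [1, 1, 1, 0, 0],
})
print(elite_monster_grid(toy))
```

Red Team's win rate in any cell is 100 minus the value shown, so one grid covers both sides.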

# Total Minions Killed

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(8,4), dpi=200)

sns.histplot(data=df, x="blueTotalMinionsKilled", bins=25, ax=ax[0], color="b")
sns.histplot(data=df, x="redTotalMinionsKilled", bins=25, ax=ax[1], color="r")

plt.suptitle("Total Minions Killed")

ax[0].set_title("Blue Team")
ax[1].set_title("Red Team")

plt.tight_layout()

In [None]:
print(f"When Blue Team wins, their average minion score is {round(df[df['blueWins']==1]['blueTotalMinionsKilled'].mean(),2)}")
print(f"When Blue Team loses, their average minion score is {round(df[df['blueWins']==0]['blueTotalMinionsKilled'].mean(),2)}")
print("-"*70)
print(f"When Red Team wins, their average minion score is {round(df[df['blueWins']==0]['redTotalMinionsKilled'].mean(),2)}")
print(f"When Red Team loses, their average minion score is {round(df[df['blueWins']==1]['redTotalMinionsKilled'].mean(),2)}")

#### Minion scores are close whether a team wins or loses.
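The same four-line comparison comes back for jungle minions, experience, level, and wards. A compact alternative is a 2x2 table (team by outcome) built by one hypothetical helper, `averages_by_outcome`, which assumes the `blue<Stat>` / `red<Stat>` column naming, for example `averages_by_outcome(df, "TotalMinionsKilled")` or `averages_by_outcome(df, "WardsPlaced")`. The toy frame is made-up data.

```python
import pandas as pd


def averages_by_outcome(frame, stat):
    """Average of a per-team stat for each side, split by whether that side won."""
    blue_won = frame["blueWins"] == 1
    blue_col, red_col = f"blue{stat}", f"red{stat}"
    table = pd.DataFrame(
        {
            # Red Team wins exactly when Blue Team loses, and vice versa
            "Win": [frame.loc[blue_won, blue_col].mean(), frame.loc[~blue_won, red_col].mean()],
            "Lose": [frame.loc[~blue_won, blue_col].mean(), frame.loc[blue_won, red_col].mean()],
        },
        index=["Blue", "Red"],
    )
    return table.round(2)


# Toy frame with made-up games, not rows from the real dataset
toy = pd.DataFrame({
    "blueTotalMinionsKilled": [200, 180, 220, 190],
    "redTotalMinionsKilled": [210, 230, 170, 200],
    "blueWins": [1, 0, 1, 0],
})
print(averages_by_outcome(toy, "TotalMinionsKilled"))
```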

# Jungle minions killed

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(8,4), dpi=200)

sns.histplot(data=df, x="blueTotalJungleMinionsKilled", bins=25, ax=ax[0], color="b")
sns.histplot(data=df, x="redTotalJungleMinionsKilled", bins=25, ax=ax[1], color="r")

plt.suptitle("Total Jungle Minions Killed")

ax[0].set_title("Blue Team")
ax[1].set_title("Red Team")

plt.tight_layout()

In [None]:
print(f'When Blue Team wins, their average jungle minion score is {round(df[df["blueWins"]==1]["blueTotalJungleMinionsKilled"].mean(),2)}')
print(f'When Blue Team loses, their average jungle minion score is {round(df[df["blueWins"]==0]["blueTotalJungleMinionsKilled"].mean(),2)}')
print("-"*65)
print(f'When Red Team wins, their average jungle minion score is {round(df[df["blueWins"]==0]["redTotalJungleMinionsKilled"].mean(),2)}')
print(f'When Red Team loses, their average jungle minion score is {round(df[df["blueWins"]==1]["redTotalJungleMinionsKilled"].mean(),2)}')

#### Again, jungle minion scores are close too.

# Experience

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(8,4), dpi=200)

sns.histplot(data=df, x="blueTotalExperience", bins=25, ax=ax[0], color="b")

sns.histplot(data=df, x="redTotalExperience", bins=25, ax=ax[1], color="r")

plt.tight_layout()

In [None]:
print(f'When Blue Team wins, their average total experience is {round(df[df["blueWins"] == 1].blueTotalExperience.mean(),2)}')
print(f'When Blue Team loses, their average total experience is {round(df[df["blueWins"] == 0].blueTotalExperience.mean(),2)}')
print("-"*70)
print(f'When Red Team wins, their average total experience is {round(df[df["blueWins"] == 0].redTotalExperience.mean(),2)}')
print(f'When Red Team loses, their average total experience is {round(df[df["blueWins"] == 1].redTotalExperience.mean(),2)}')

#### Raw experience totals do not mean much to most players, who think in terms of champion levels instead. Let's look into the level difference between teams when they win or lose.

# Average Level

In [None]:
print(f'When Blue Team wins, their average level is {round(df[df["blueWins"]==1].blueAvgLevel.mean(),2)}')
print(f'When Blue Team loses, their average level is {round(df[df["blueWins"]==0].blueAvgLevel.mean(),2)}')
print("-"*50)
print(f'When Red Team wins, their average level is {round(df[df["blueWins"]==0].redAvgLevel.mean(),2)}')
print(f'When Red Team loses, their average level is {round(df[df["blueWins"]==1].redAvgLevel.mean(),2)}')

#### The average levels are almost identical.

# Wards

#### For high-elo players, warding is really important. It protects you from dying to a jungle gank and gives you important information, such as whether the enemy team is taking the drake or the herald. Let's check whether warding really has a huge impact on the games.

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(8,4), dpi=200)

sns.histplot(data=df, x="blueWardsPlaced", bins=25, ax=ax[0], color="b")
ax[0].set_xlim(0,120)

sns.histplot(data=df, x="redWardsPlaced", bins=25, ax=ax[1], color="r")
ax[1].set_xlim(0,120)
plt.tight_layout()

In [None]:
print(f"When Blue Team wins, their average ward score is {round(df[df['blueWins'] == 1]['blueWardsPlaced'].mean(),2)}")
print(f"When Blue Team loses, their average ward score is {round(df[df['blueWins'] == 0]['blueWardsPlaced'].mean(),2)}")
print("-"*60)
print(f'When Red Team wins, their average ward score is {round(df[df["blueWins"] == 0]["redWardsPlaced"].mean(),2)}')
print(f'When Red Team loses, their average ward score is {round(df[df["blueWins"] == 1]["redWardsPlaced"].mean(),2)}')
print("-"*60)
print(f'When Blue Team wins, they destroyed {round(df[df["blueWins"] == 1]["blueWardsDestroyed"].mean(),2)} wards per game.')
print(f'When Red Team wins, they destroyed {round(df[df["blueWins"] == 0]["redWardsDestroyed"].mean(),2)} wards per game.')

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(8,4), dpi=200)

sns.histplot(data=df, x="blueWardsDestroyed", bins=25, ax=ax[0], color="b")
ax[0].set_xlim(0,30)

sns.histplot(data=df, x="redWardsDestroyed", bins=25, ax=ax[1], color="r")
ax[1].set_xlim(0,30)
plt.tight_layout()

#### Ward scores are very close for both teams, whether they win or lose. As I said before, all of these players are really good at this game, and warding is considered one of the most important things in it. Given that almost all games are so close in terms of ward score, I wonder how correlated warding and winning are.
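One way to follow up on that question: since ward counts are close for both sides, the difference in wards placed may say more than either team's count alone. The sketch below correlates `blueWardsPlaced - redWardsPlaced` with `blueWins`; with a 0/1 outcome, Pearson's r is the point-biserial correlation. The helper name `ward_diff_correlation` is hypothetical, the toy frame is made-up data, and no result on the real dataset is claimed here.

```python
import pandas as pd


def ward_diff_correlation(frame):
    """Pearson (point-biserial) correlation between the ward difference and Blue Team winning."""
    ward_diff = frame["blueWardsPlaced"] - frame["redWardsPlaced"]
    # Series.corr defaults to Pearson; with a 0/1 target this is the point-biserial r
    return ward_diff.corr(frame["blueWins"])


# Toy frame with made-up games, not rows from the real dataset
toy = pd.DataFrame({
    "blueWardsPlaced": [11, 12, 13, 14],
    "redWardsPlaced": [10, 10, 10, 10],
    "blueWins": [0, 0, 1, 1],
})
print(ward_diff_correlation(toy))
```

A correlation near zero here would still not rule out the indirect effect discussed in the conclusion, where wards matter through objectives such as dragons and the herald.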

# Kills

In [None]:
fig, ax = plt.subplots(ncols=2, figsize=(8,4), dpi=200)

sns.histplot(data=df, x="blueKills", bins=25, ax=ax[0], color="b")

sns.histplot(data=df, x="redKills", bins=25, ax=ax[1], color="r")
plt.tight_layout()

In [None]:
print("KDA means Kills/Deaths/Assists. Let's check the KDAs of each team when they win and when they lose.")
print(f'When Blue Team wins, their average KDA is {round(df[df["blueWins"] == 1].blueKills.mean(),1)}/{round(df[df["blueWins"] == 1].blueDeaths.mean(),1)}/{round(df[df["blueWins"] == 1].blueAssists.mean(),1)}')
print(f'When Blue Team loses, their average KDA is {round(df[df["blueWins"] == 0].blueKills.mean(),1)}/{round(df[df["blueWins"] == 0].blueDeaths.mean(),1)}/{round(df[df["blueWins"] == 0].blueAssists.mean(),1)}')
print("-"*60)
print(f'When Red Team wins, their average KDA is {round(df[df["blueWins"] == 0].redKills.mean(),1)}/{round(df[df["blueWins"] == 0].redDeaths.mean(),1)}/{round(df[df["blueWins"] == 0].redAssists.mean(),1)}')
print(f'When Red Team loses, their average KDA is {round(df[df["blueWins"] == 1].redKills.mean(),1)}/{round(df[df["blueWins"] == 1].redDeaths.mean(),1)}/{round(df[df["blueWins"] == 1].redAssists.mean(),1)}')

#### The two teams' KDAs are almost identical, both when winning and when losing.
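The KDAs above are three separate averages. A common single-number summary is the KDA ratio, (kills + assists) / deaths. The sketch below treats zero deaths as one death to avoid dividing by zero, which is an assumption of this example rather than a rule from the game; the helper name `kda_ratio` is hypothetical. It works on scalars and on pandas Series, for example `kda_ratio(df.blueKills, df.blueDeaths, df.blueAssists).groupby(df.blueWins).mean()`.

```python
import numpy as np
import pandas as pd


def kda_ratio(kills, deaths, assists):
    """(kills + assists) / deaths, counting 0 deaths as 1 (assumption) to avoid division by zero."""
    # np.maximum works element-wise on Series and also on plain numbers
    return (kills + assists) / np.maximum(deaths, 1)


print(kda_ratio(5, 2, 3))
print(kda_ratio(pd.Series([5, 5]), pd.Series([2, 0]), pd.Series([3, 3])).tolist())
```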

#### Now let's check correlation between winning game and other features.

# Correlation

In [None]:
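# numeric_only=True skips the text columns (whoWins, whoFirstBlood); gameId is only an identifier
corr = df.corr(numeric_only=True)["blueWins"].sort_values(ascending=False).drop(["blueWins", "gameId"])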

In [None]:
plt.figure(figsize=(12,8), dpi=200)
sns.barplot(x=corr.index, y=corr.values)
plt.xticks(rotation=90);

In [None]:
corr

### We can clearly see from the correlation bar plot that the gold difference between the teams is the most important feature for winning the game. It is followed by experience difference, total gold, total experience, and KDA stats. Wonderful! Let's draw conclusions from these findings.
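One caveat when reading this bar plot: each game is described from both sides, so some columns are likely mirror images of others (for example, `redGoldDiff` appears to be the negative of `blueGoldDiff`), and the same signal then shows up twice with opposite signs. A hypothetical helper, `mirrored_pairs`, could list column pairs whose absolute correlation is at or above a threshold (0.999 here, an arbitrary choice), called as `mirrored_pairs(df)`. The toy frame is made-up data.

```python
import pandas as pd


def mirrored_pairs(frame, threshold=0.999):
    """Column pairs whose absolute Pearson correlation is at least `threshold`."""
    corr = frame.corr(numeric_only=True).abs()
    cols = list(corr.columns)
    pairs = []
    # Only scan the upper triangle so each pair is reported once
    for i, left in enumerate(cols):
        for right in cols[i + 1:]:
            if corr.loc[left, right] >= threshold:
                pairs.append((left, right))
    return pairs


# Toy frame: "b" mirrors "a"; "c" is only weakly related to either
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [-1, -2, -3, -4], "c": [1, 0, 1, 0]})
print(mirrored_pairs(toy))
```

Keeping one column from each reported pair would make the correlation bar plot easier to read without losing information.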

# Conclusion

As I said at the beginning of the analysis, I am an ex Diamond 2 League of Legends player. This data contains the top 0.4% of players, who are called 'High-Elo Players' among League players. In high elo, you usually play with the same players most of the time, because there are not many high-elo players. That means most players know their opponents: their playstyle, their warding timings, their reactions to repeated ganks from junglers, etc. This is very different from low-elo games, where you play with different people every game. Knowing your opponents really shapes how players play. In League of Legends, there are many small factors beyond our analysis here, which works with very little information to begin with.


Let's consider the correlation table. It says that the gold difference between the teams is the single most important factor in winning the game. That makes sense: if you have more gold, you can buy more items and become more powerful than your enemies. But how do you get more gold than your opponents? Killing your enemies is one answer, and especially in low-elo games, most players will tell you this. What about high-elo players? Unlike low-elo players, high-elo players play a more macro-oriented game, which means seeing the big picture. They don't focus on killing their enemies (although of course they do it when they can); instead, they try to gain an advantage through small wins. What are those small wins? By managing minion waves better than your opponent, you can get a better base timing, and winning those 30-40 seconds can help your team secure the first dragon. If you are in the bot lane, pushing your lane means your jungler can take the enemy team's bot-side jungle, and so on. Those small victories over your enemies help you win the game.


I am not trying to explain League of Legends' macro game plan or teach you how to win the game; I actually haven't played for months. What I am trying to say is *DOMAIN KNOWLEDGE IS MORE IMPORTANT THAN YOU THINK*. If I weren't a League player myself, I would just say, 'Look at the correlation plot. Warding has a 0.000087 correlation with Blue Team winning the game. It means nothing.' Although warding seems to have very little correlation with winning, I know it is one of the most important things in the game, because without warding you can't safely get the herald or dragons, and taking elite monsters has a huge impact on winning the game. Perhaps an experienced data scientist would still see this without domain knowledge, but junior data scientists like me often wouldn't recognize the importance of warding. This EDA really made me understand how important it is to have domain knowledge. From now on, before working on a dataset whose domain I know little or nothing about, I will try to learn about that domain first.


This EDA was a really great experience for me. I really enjoyed it, and I hope you enjoyed it too. I look forward to your feedback, which is really important for my improvement. Thank you very much.

### Ban Yasuo Please!!!