# Questions to address:

Q1: Can the outcome of a game be predicted at 10 minutes?

Q1.1: Which factors affect the win rate the most?

Q1.2: Which ones have no effect?


Q2: From how large an advantage is the game won or lost?

Q3: Does the blue side have advantages compared to the red side?

Q4: Which epic monster is the best?


# Imports and dataset loading:

In [4]:
import os

import altair as alt
import pandas as pd
import warnings
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', 999)
alt.data_transformers.enable('csv')
# __path__ is the folder that contains the source files:
# https://www.kaggle.com/datasets/bobbyscience/league-of-legends-diamond-ranked-games-10-min
# https://www.kaggle.com/datasets/bobbyscience/league-of-legends-soloq-ranked-games


# Path to the repository
__path__ = os.getcwd() + '/'

# Load dataset
data = pd.read_csv(__path__+'high_diamond_ranked_10min.csv')

# Convert blueWins and blueFirstBlood to booleans for the charts.
data['blueWins'] = data['blueWins'].astype(bool)
data['blueFirstBlood'] = data['blueFirstBlood'].astype(bool)

data["blueCSDiff"] = data["blueCSPerMin"] - data["redCSPerMin"]
data["redCSDiff"] = data["redCSPerMin"] - data["blueCSPerMin"]

data.head(1)



Unnamed: 0,gameId,blueWins,blueWardsPlaced,blueWardsDestroyed,blueFirstBlood,blueKills,blueDeaths,blueAssists,blueEliteMonsters,blueDragons,blueHeralds,blueTowersDestroyed,blueTotalGold,blueAvgLevel,blueTotalExperience,blueTotalMinionsKilled,blueTotalJungleMinionsKilled,blueGoldDiff,blueExperienceDiff,blueCSPerMin,blueGoldPerMin,redWardsPlaced,redWardsDestroyed,redFirstBlood,redKills,redDeaths,redAssists,redEliteMonsters,redDragons,redHeralds,redTowersDestroyed,redTotalGold,redAvgLevel,redTotalExperience,redTotalMinionsKilled,redTotalJungleMinionsKilled,redGoldDiff,redExperienceDiff,redCSPerMin,redGoldPerMin,blueCSDiff,redCSDiff
0,4519157822,False,28,2,True,9,6,11,0,0,0,0,17210,6.6,17039,195,36,643,-8,19.5,1721.0,15,6,0,6,9,8,0,0,0,0,16567,6.8,17047,197,55,-643,8,19.7,1656.7,-0.2,0.2


# Data processing and analysis:

In [5]:
# Check whether there are any NA values in the dataset.
print(data.isna().sum())

gameId                          0
blueWins                        0
blueWardsPlaced                 0
blueWardsDestroyed              0
blueFirstBlood                  0
blueKills                       0
blueDeaths                      0
blueAssists                     0
blueEliteMonsters               0
blueDragons                     0
blueHeralds                     0
blueTowersDestroyed             0
blueTotalGold                   0
blueAvgLevel                    0
blueTotalExperience             0
blueTotalMinionsKilled          0
blueTotalJungleMinionsKilled    0
blueGoldDiff                    0
blueExperienceDiff              0
blueCSPerMin                    0
blueGoldPerMin                  0
redWardsPlaced                  0
redWardsDestroyed               0
redFirstBlood                   0
redKills                        0
redDeaths                       0
redAssists                      0
redEliteMonsters                0
redDragons                      0
redHeralds    

### Balance of games won and of first bloods:

In [6]:
colors=alt.Scale(domain=['True', 'False'], range=['#1f77b4', '#d62728'])

blue_first_blood = alt.Chart(data).mark_bar().encode(
    alt.X('blueFirstBlood').sort(alt.SortOrder('descending')),
    y='count(blueFirstBlood)',
    color=alt.Color('blueFirstBlood', scale=colors),
    tooltip='count(blueFirstBlood)'
).properties(
    title='Nombre de premier sang gagné'
)

blue_win = alt.Chart(data).mark_bar().encode(
    alt.X('blueWins').sort(alt.SortOrder('descending')),
    y='count(blueWins)',
    color=alt.Color('blueWins', scale=colors),
    tooltip='count(blueWins)'
).properties(
    title='Nombre de parties gagnées'
)


blue_win | blue_first_blood

### Displaying the outliers of each field:

In [7]:
for field in data.columns:
   print(field + " max : " + str(max(data[field])) + " min: " + str(min(data[field])))
   print(data[field].describe())
   print("")

gameId max : 4527990640 min: 4295358071
count    9.879000e+03
mean     4.500084e+09
std      2.757328e+07
min      4.295358e+09
25%      4.483301e+09
50%      4.510920e+09
75%      4.521733e+09
max      4.527991e+09
Name: gameId, dtype: float64

blueWins max : True min: False
count      9879
unique        2
top       False
freq       4949
Name: blueWins, dtype: object

blueWardsPlaced max : 250 min: 5
count    9879.000000
mean       22.288288
std        18.019177
min         5.000000
25%        14.000000
50%        16.000000
75%        20.000000
max       250.000000
Name: blueWardsPlaced, dtype: float64

blueWardsDestroyed max : 27 min: 0
count    9879.000000
mean        2.824881
std         2.174998
min         0.000000
25%         1.000000
50%         3.000000
75%         4.000000
max        27.000000
Name: blueWardsDestroyed, dtype: float64

blueFirstBlood max : True min: False
count     9879
unique       2
top       True
freq      4987
Name: blueFirstBlood, dtype: object

blueKills

There are no aberrant values in the dataset; everything is consistent with the game.

### Number of boring games (no first blood before 10 minutes):

In [8]:
data['boring'] = ((data['blueFirstBlood'] == False) & (data['redFirstBlood'] == False))
data['boring'].describe()


count      9879
unique        1
top       False
freq       9879
Name: boring, dtype: object

There are no boring games in the dataset.

## Q1: Can the outcome of a game be predicted at 10 minutes?
#### We look at the correlation between each variable and victory.

In [9]:
correlation = data.corr()
correlation = correlation['blueWins'].sort_values(ascending=False)

correlation = correlation.to_frame(name='correlation').reset_index()
correlation = correlation.rename(columns={'index': 'variable'})
correlation["positif"] = correlation['correlation'] > 0

# Take the absolute value of the correlation.
correlation['correlation'] = correlation['correlation'].abs()


# Bar chart of the correlation of each variable with victory.

alt.Chart(correlation).mark_bar().transform_filter(alt.datum.variable != "blueWins").encode(
    x=alt.X('correlation'),
    y=alt.Y('variable').sort('-x'),
    color=alt.condition(
        alt.datum.positif == "True",
        alt.value('#1f77b4'),  
        alt.value('#d62728')
    ),
    tooltip='correlation'
).properties(
    title='Corrélation entre les variables et la victoire'
)



Given these results, a game cannot be declared lost at 10 minutes:
<br>
no single variable is correlated strongly enough with victory to say so.

### We go further by fitting a basic logistic regression and checking how the model performs on the data.

In [10]:
# Drop variables whose correlation is below a given threshold to see which ones matter most when training the model
score = []
correlations = [0.1, 0.2, 0.3, 0.4, 0.5]
for i in correlations:
    correlation_filtered = correlation[correlation['correlation'] > i]['variable']
    linear_regression_df = data[correlation_filtered]
    X = linear_regression_df.drop(columns=['blueWins'])
    y = linear_regression_df['blueWins']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LogisticRegression()
    model.fit(X_train, y_train)
    prediction = model.predict(X_test)
    score.append(accuracy_score(y_test, prediction))
    
alt.Chart(pd.DataFrame({'correlation': correlations, 'score': score})).mark_bar().encode(
    y='correlation:O',
    x='score',
    tooltip='score'
).properties(
    title='Score du model en fonction de la corrélation minimale autorisée',
    height=alt.Step(40)
)

The model performs reasonably well, with an accuracy of about 0.73: the variables are clearly correlated with victory, but not strongly enough to call a game lost at 10 minutes.
<br>
The fact that the score does not increase significantly when the least correlated variables are removed suggests that the gold difference carries most of the weight in the model.
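
To put the 0.73 in context, it can be compared with a majority-class baseline. The sketch below only reuses the counts printed by `describe()` above (9879 games, 4949 blue losses); the variable names are illustrative, and `model_accuracy` is the rounded figure quoted in the text, not a recomputed one.

```python
# Counts taken from the describe() output of blueWins shown earlier.
n_games = 9879
n_blue_losses = 4949
n_blue_wins = n_games - n_blue_losses

# A majority-class baseline always predicts the most frequent outcome.
baseline_accuracy = max(n_blue_wins, n_blue_losses) / n_games

# Rounded accuracy quoted in the text (assumption: not recomputed here).
model_accuracy = 0.73
lift = model_accuracy - baseline_accuracy

print(f"baseline: {baseline_accuracy:.3f}, model: {model_accuracy:.2f}, lift: {lift:.2f}")
```

Because the target is almost perfectly balanced, the baseline sits near 0.50, so the model gains roughly 23 points of accuracy over always predicting the most frequent class.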

In [11]:
data.corr()['blueExperienceDiff']['blueGoldDiff']

0.8947294549589994

The strong correlation between the XP difference and the gold difference explains why taking one or both into account does not change the result.
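
As a reminder of what this coefficient measures, here is a minimal, dependency-free sketch of the Pearson correlation. The helper name `pearson` and the toy values are hypothetical; they are not taken from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical toy values: XP difference moves almost in lockstep with gold difference.
gold_diff = [-2000, -1000, 0, 1000, 2000]
xp_diff = [-1800, -700, 100, 900, 1700]

r = pearson(gold_diff, xp_diff)
print(f"r = {r:.3f}")
```

When two features are this close to collinear, the second one adds little information that the first does not already carry, which is consistent with the flat scores observed above.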

In [12]:
alt.Chart(data).mark_point().encode(
    x=alt.X('blueGoldDiff', title='Différence de gold'),
    y=alt.Y('blueExperienceDiff', title="Différence d'XP"),
    color=alt.Color('blueWins', scale=colors),
).properties(
    title='la différence en xp en fonction de la différence de gold'
)


So the answer is that the outcome of a game cannot be predicted in every case at 10 minutes.
<br>
But are there variables that can help make better decisions?

## Q2: From how large a gold/XP advantage is the game won or lost?

In [13]:
xp_diff_when_win = data[data["blueWins"] == True]["blueExperienceDiff"]
xp_diff_when_lose = data[data["blueWins"] == False]["blueExperienceDiff"]
gold_diff_when_win = data[data["blueWins"] == True]["blueGoldDiff"]
gold_diff_when_lose = data[data["blueWins"] == False]["blueGoldDiff"]

trop_bas_pour_win_xp = xp_diff_when_win.quantile(0.05)
trop_haute_pour_lose_xp = xp_diff_when_lose.quantile(0.95)

trop_bas_pour_win_gold = gold_diff_when_win.quantile(0.05)
trop_haute_pour_lose_gold = gold_diff_when_lose.quantile(0.95)

gold_diff_when_win


5        698
6       2411
9      -1548
12      3274
14      -470
        ... 
9872     756
9873    2639
9874    2519
9875     782
9878     927
Name: blueGoldDiff, Length: 4930, dtype: int64

### Gold difference:

In [14]:
alt.Chart(data).mark_point().encode(
    x=alt.X('blueGoldDiff', title="Différence de golds"),
    y='blueWins',
    color=alt.Color('blueWins', scale=colors),
).properties(
    title='Victoire en fonction de la différence de golds'
)

# 95% quantile for victory as a function of the gold difference:

In [15]:
range_ = ['#ADBEFF', '#308AFF']


gold_distrib = alt.Chart(data).mark_bar().encode(
    x=alt.X("blueGoldDiff", title="Différence de golds").bin(maxbins=100),
    y=alt.Y("count(blueGoldDiff)", title="Nombre de parties"),
    color=alt.Color('blueWins').scale(range=range_)
).properties(
    title='Seuil de golds a partir duquel l\'issue est connue (à 5%)'
)

x_rule1 = alt.Chart().mark_rule(size=2, color='red').encode(
    x=alt.datum(trop_bas_pour_win_gold)
)
x_rule2 = alt.Chart().mark_rule(size=2, color='green').encode(
    x=alt.datum(trop_haute_pour_lose_gold)
)
text1 = alt.Chart().mark_text(color='red').encode(
    text=alt.datum(trop_bas_pour_win_gold),
    x=alt.datum(-5000),
    y=alt.datum(750)
)

text2 = alt.Chart().mark_text(color='green').encode(
    text=alt.datum(trop_haute_pour_lose_gold),
    x=alt.datum(4500),
    y=alt.datum(750),
)


gold_distrib + x_rule1 + x_rule2 + text1 + text2

### XP difference:

In [16]:
alt.Chart(data).mark_point().encode(
    x=alt.X('blueExperienceDiff:Q', title="Différence d'XP"),
    y='blueWins:O',
    color=alt.Color('blueWins', scale=colors),
).properties(
    title='Victoire en fonction de la différence en xp'
)


In [17]:
range_ = ['#ADBEFF', '#308AFF']

xp_distib = alt.Chart(data).mark_bar().encode(
    x=alt.X("blueExperienceDiff", title="Différence d'XP").bin(maxbins=100),
    y=alt.Y("count(blueExperienceDiff)", title="Nombre de parties"),
    color=alt.Color('blueWins').scale(range=range_)
).properties(
    title='Seuil d\'xp a partir duquel l\'issue est connue (à 5%)'
)

x_rule1 = alt.Chart().mark_rule(size=2, color='red').encode(
    x=alt.datum(trop_bas_pour_win_xp)
)
x_rule2 = alt.Chart().mark_rule(size=2, color='green').encode(
    x=alt.datum(trop_haute_pour_lose_xp)
)
text1 = alt.Chart().mark_text(color='red').encode(
    text=alt.datum(trop_bas_pour_win_xp),
    x=alt.datum(-3200),
    y=alt.datum(425)
)

text2 = alt.Chart().mark_text(color='green').encode(
    text=alt.datum(trop_haute_pour_lose_xp),
    x=alt.datum(3200),
    y=alt.datum(425),
)


x_rule1 + xp_distib + x_rule2 + text1 + text2

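On these charts, the vertical lines mark the 5% quantile of the difference among games blue won (red line) and the 95% quantile among games blue lost (green line): only 5% of wins happen below the red line, and only 5% of losses happen above the green line.
<br>
For a gap of more than about 2080 gold or 1700 XP, the outcome of the game can be called with reasonable confidence.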
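
The threshold logic can be illustrated on a handful of made-up games. This is only a sketch: the `quantile` helper mimics linear interpolation (the default method of pandas' `quantile`), `win_rate_below` is a hypothetical helper, and the toy `games` list is invented for the example.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def win_rate_below(rows, threshold):
    """Share of wins among games whose gold difference is below the threshold."""
    outcomes = [won for diff, won in rows if diff < threshold]
    return sum(outcomes) / len(outcomes) if outcomes else None

# Hypothetical toy games: (blue gold difference at 10 min, 1 if blue won else 0).
games = [(-3000, 0), (-2500, 0), (-2000, 0), (-1500, 0), (-1500, 1), (-500, 0),
         (-200, 1), (0, 0), (300, 1), (800, 1), (1500, 1), (2500, 1)]

# 5% quantile of the gold difference among wins: 95% of wins are above it.
win_diffs = [diff for diff, won in games if won]
too_low_to_win = quantile(win_diffs, 0.05)

# Complementary check: how often does blue actually win below that threshold?
rate = win_rate_below(games, too_low_to_win)
print(too_low_to_win, rate)
```

The quantile answers "how rare is such a deficit among wins", while `win_rate_below` answers "how often is such a deficit overturned"; the two figures are related but not interchangeable, so it is worth looking at both.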

## Q3: Does the blue side have advantages compared to the red side?

In [18]:
BlueList = ['blueWardsPlaced', 'blueWardsDestroyed', 'blueFirstBlood', 'blueKills', 'blueDeaths', 'blueAssists', 'blueEliteMonsters', 'blueDragons', 'blueHeralds', 'blueTowersDestroyed', 'blueTotalGold', 'blueAvgLevel', 'blueTotalExperience', 'blueTotalMinionsKilled', 'blueTotalJungleMinionsKilled', 'blueCSPerMin', 'blueGoldPerMin']
RedList = ['redWardsPlaced', 'redWardsDestroyed', 'redFirstBlood', 'redKills', 'redDeaths', 'redAssists', 'redEliteMonsters', 'redDragons', 'redHeralds', 'redTowersDestroyed', 'redTotalGold', 'redAvgLevel', 'redTotalExperience', 'redTotalMinionsKilled', 'redTotalJungleMinionsKilled', 'redCSPerMin', 'redGoldPerMin']


for i in range(len(BlueList)):
    redMean = data[RedList[i]].mean()
    blueMean = data[BlueList[i]].mean()

    Blue = alt.Chart(data).mark_bar().encode(
        x=alt.datum(BlueList[i]),
        y=alt.datum(blueMean)
    ).properties(
        title=BlueList[i]
    )

    Red = alt.Chart(data).mark_bar(color='red').encode(
        x=alt.datum(RedList[i]),
        y=alt.datum(redMean)
    ).properties(
        title=RedList[i]
    )
    
    if i == 0:
        Charts = Blue + Red
    else:
        Charts = Charts | Blue + Red



Charts  # Visualize potential gaps between the blue and red teams for each variable

The blue side shows a slight advantage on heralds and towers destroyed, while the red side takes more dragons.
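
As a complement to the per-variable means, one can check whether the overall blue-side win rate departs from 50%. The sketch below reuses the counts shown in the outputs above (9879 games, 4930 blue wins) and assumes a normal approximation to the binomial distribution; the 1.96 cutoff corresponds to a two-sided 5% test.

```python
import math

# Counts taken from the outputs shown earlier in the notebook.
n_games = 9879
blue_wins = 4930

p_hat = blue_wins / n_games

# z-score of the observed win count under the hypothesis of a fair 50/50 split.
expected = 0.5 * n_games
std_dev = math.sqrt(n_games * 0.5 * 0.5)
z = (blue_wins - expected) / std_dev

# Two-sided test at the 5% level (normal approximation, an assumption of this sketch).
significant = abs(z) > 1.96
print(f"blue win rate = {p_hat:.3f}, z = {z:.2f}, significant = {significant}")
```

Under these assumptions the win rate is statistically indistinguishable from 50%, so the small differences on heralds, towers and dragons do not translate into a measurable side advantage at the level of wins in this dataset.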

## New dataset: testing the importance of the different epic monsters.

Same initial manipulations and checks:

In [19]:
data2 = pd.read_csv(__path__+'lol_ranked_games.csv')

data2

Unnamed: 0,gameId,gameDuration,hasWon,frame,goldDiff,expDiff,champLevelDiff,isFirstTower,isFirstBlood,killedFireDrake,killedWaterDrake,killedAirDrake,killedEarthDrake,killedElderDrake,lostFireDrake,lostWaterDrake,lostAirDrake,lostEarthDrake,lostElderDrake,killedBaronNashor,lostBaronNashor,killedRiftHerald,lostRiftHerald,destroyedTopInhibitor,destroyedMidInhibitor,destroyedBotInhibitor,lostTopInhibitor,lostMidInhibitor,lostBotInhibitor,destroyedTopNexusTurret,destroyedMidNexusTurret,destroyedBotNexusTurret,lostTopNexusTurret,lostMidNexusTurret,lostBotNexusTurret,destroyedTopBaseTurret,destroyedMidBaseTurret,destroyedBotBaseTurret,lostTopBaseTurret,lostMidBaseTurret,lostBotBaseTurret,destroyedTopInnerTurret,destroyedMidInnerTurret,destroyedBotInnerTurret,lostTopInnerTurret,lostMidInnerTurret,lostBotInnerTurret,destroyedTopOuterTurret,destroyedMidOuterTurret,destroyedBotOuterTurret,lostTopOuterTurret,lostMidOuterTurret,lostBotOuterTurret,kills,deaths,assists,wardsPlaced,wardsDestroyed,wardsLost
0,4546233126,1443000,1,10,-448,-147,-0.2,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,7,5,21,3,5
1,4546233126,1443000,1,12,-1306,-925,-0.6,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,11,6,28,4,6
2,4546233126,1443000,1,14,2115,2578,0.4,1,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,10,11,12,35,4,6
3,4546233126,1443000,1,16,1195,2134,0.4,1,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,10,12,12,45,6,10
4,4546233126,1443000,1,18,2931,4382,0.6,1,1,1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,13,13,16,49,7,12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
242567,4402156483,1774000,0,30,-8523,-13498,-1.6,1,1,1,0,0,0,0,1,0,1,1,0,0,1,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,1,0,0,1,1,33,41,50,80,18,17
242568,4379826739,1013000,0,10,-271,-1243,-0.2,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,5,6,18,1,2
242569,4379826739,1013000,0,12,-2013,-3493,-0.8,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,8,6,23,1,5
242570,4379826739,1013000,0,14,-2388,-4543,-0.8,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,13,9,27,5,6


In [20]:
print(data2.isna().sum())  # Here too, there are no missing values.

gameId                     0
gameDuration               0
hasWon                     0
frame                      0
goldDiff                   0
expDiff                    0
champLevelDiff             0
isFirstTower               0
isFirstBlood               0
killedFireDrake            0
killedWaterDrake           0
killedAirDrake             0
killedEarthDrake           0
killedElderDrake           0
lostFireDrake              0
lostWaterDrake             0
lostAirDrake               0
lostEarthDrake             0
lostElderDrake             0
killedBaronNashor          0
lostBaronNashor            0
killedRiftHerald           0
lostRiftHerald             0
destroyedTopInhibitor      0
destroyedMidInhibitor      0
destroyedBotInhibitor      0
lostTopInhibitor           0
lostMidInhibitor           0
lostBotInhibitor           0
destroyedTopNexusTurret    0
destroyedMidNexusTurret    0
destroyedBotNexusTurret    0
lostTopNexusTurret         0
lostMidNexusTurret         0
lostBotNexusTu

In [21]:
alt.Chart(data2).mark_arc().encode(
    color=alt.Color("hasWon:N").scale(domain=['0','1'], range=['#FF0203','#0203FF']),
    theta="count(hasWon)",
    tooltip="count(hasWon)"
).properties(
    title='pourcentage de victoire bleu/rouge'
)

# Checking the balance of the dataset:

The dataset is balanced in terms of the number of wins.

In [22]:
usefull_variable = ["hasWon","killedFireDrake","killedWaterDrake","killedAirDrake","killedEarthDrake","killedElderDrake","lostFireDrake","lostWaterDrake","lostAirDrake","lostEarthDrake","lostElderDrake","killedBaronNashor","lostBaronNashor","killedRiftHerald","lostRiftHerald"]

# Simplify the dataset to keep only the columns of interest.

monster_elit_df = data2[usefull_variable].copy()  # copy so that columns can be added later without SettingWithCopyWarning
monster_elit_df

Unnamed: 0,hasWon,killedFireDrake,killedWaterDrake,killedAirDrake,killedEarthDrake,killedElderDrake,lostFireDrake,lostWaterDrake,lostAirDrake,lostEarthDrake,lostElderDrake,killedBaronNashor,lostBaronNashor,killedRiftHerald,lostRiftHerald
0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1
2,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1
3,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1
4,1,1,0,0,1,0,0,1,0,0,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
242567,0,1,0,0,0,0,1,0,1,1,0,0,1,0,2
242568,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
242569,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
242570,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0


In [23]:
monster_elit_df.corr()

Unnamed: 0,hasWon,killedFireDrake,killedWaterDrake,killedAirDrake,killedEarthDrake,killedElderDrake,lostFireDrake,lostWaterDrake,lostAirDrake,lostEarthDrake,lostElderDrake,killedBaronNashor,lostBaronNashor,killedRiftHerald,lostRiftHerald
hasWon,1.0,0.181943,0.178357,0.150468,0.178618,0.031146,-0.191382,-0.186028,-0.155285,-0.18239,-0.034934,0.165918,-0.181528,0.167418,-0.167824
killedFireDrake,0.181943,1.0,0.042943,0.03445,0.038339,0.041317,-0.182591,-0.0492,-0.038621,-0.041946,0.022871,0.192136,0.101226,0.171789,0.039953
killedWaterDrake,0.178357,0.042943,1.0,0.038825,0.040624,0.038611,-0.040689,-0.169755,-0.046433,-0.041288,0.030487,0.2134,0.107486,0.16136,0.051539
killedAirDrake,0.150468,0.03445,0.038825,1.0,0.039841,0.038355,-0.039332,-0.038113,-0.172645,-0.04113,0.028112,0.207201,0.111077,0.168218,0.040766
killedEarthDrake,0.178618,0.038339,0.040624,0.039841,1.0,0.043851,-0.045856,-0.046803,-0.039111,-0.17852,0.026279,0.206986,0.114853,0.159296,0.056996
killedElderDrake,0.031146,0.041317,0.038611,0.038355,0.043851,1.0,0.027172,0.025099,0.024276,0.032367,0.068892,0.157634,0.131659,0.018956,0.029791
lostFireDrake,-0.191382,-0.182591,-0.040689,-0.039332,-0.045856,0.027172,1.0,0.037377,0.036806,0.04239,0.040797,0.094294,0.210699,0.012308,0.188707
lostWaterDrake,-0.186028,-0.0492,-0.169755,-0.038113,-0.046803,0.025099,0.037377,1.0,0.036446,0.039151,0.040799,0.098067,0.217106,0.011425,0.194909
lostAirDrake,-0.155285,-0.038621,-0.046433,-0.172645,-0.039111,0.024276,0.036806,0.036446,1.0,0.037325,0.050972,0.116456,0.214777,0.010842,0.201843
lostEarthDrake,-0.18239,-0.041946,-0.041288,-0.04113,-0.17852,0.032367,0.04239,0.039151,0.037325,1.0,0.042505,0.110614,0.226386,0.018662,0.185691


In [24]:
def correlate(df, titre):
    correlation_drakes = df.corr()['hasWon'].sort_values(ascending=False)
    correlation_drakes = correlation_drakes.rename('correlation').reset_index()
    correlation_drakes = correlation_drakes.rename(columns={'index': 'drake'})
    correlation_drakes["positif"] = correlation_drakes['correlation'] > 0
    correlation_drakes['correlation'] = correlation_drakes['correlation'].abs()

    return alt.Chart(correlation_drakes).mark_bar().transform_filter(alt.datum.drake != "hasWon").encode(
        x=alt.X('correlation'),
        y=alt.Y('drake').sort('-x'),
        color=alt.condition(
            alt.datum.positif == "True",
            alt.value('#1f77b4'),  
            alt.value('#d62728')
        ),
        tooltip='correlation'
    ).properties(
        title=alt.Title(titre)
    )

correlate(monster_elit_df, titre='Corrélation entre les variables et la victoire')


The dragon with the most impact is therefore the Infernal (FireDrake).
<br>
Losing a drake is more strongly correlated with the outcome, in absolute value, than killing one.
<br>
This may point to an imbalance in the dataset.

To make things more readable and work around this problem, the "lost" and "killed" columns can be tied together by taking their difference.

In [25]:
monster_elit_df["DiffFireDrake"] = monster_elit_df["killedFireDrake"] - monster_elit_df["lostFireDrake"]
monster_elit_df["DiffAirDrake"] = monster_elit_df["killedAirDrake"] - monster_elit_df["lostAirDrake"]
monster_elit_df["DiffEarthDrake"] = monster_elit_df["killedEarthDrake"] - monster_elit_df["lostEarthDrake"]
monster_elit_df["DiffWaterDrake"] = monster_elit_df["killedWaterDrake"] - monster_elit_df["lostWaterDrake"]
monster_elit_df["DiffBaronNashor"] = monster_elit_df["killedBaronNashor"] - monster_elit_df["lostBaronNashor"]
monster_elit_df["DiffElderDrake"] = monster_elit_df["killedElderDrake"] - monster_elit_df["lostElderDrake"]
monster_elit_df["DiffRiftHerald"] = monster_elit_df["killedRiftHerald"] - monster_elit_df["lostRiftHerald"]

diff_df = monster_elit_df[["hasWon","DiffFireDrake","DiffAirDrake","DiffEarthDrake","DiffWaterDrake","DiffBaronNashor","DiffElderDrake","DiffRiftHerald"]]

all = correlate(diff_df, titre='Pour toute les parties')

The Baron Nashor now takes first place.
<br>
Indeed, once the difference is taken, a value of zero only covers balanced games (no Baron for either team, or as many for each), whereas a zero in "killed" or "lost" alone lumped those games together with games where the other team took the Baron.

To answer our initial question, the best dragon is therefore indeed the Infernal drake.

The ElderDrake is not strongly correlated with victory because it is only available late in the game.
<br>
We therefore look at its correlation with victory only in games where it is taken.
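
The effect of taking the difference can be illustrated on a few made-up games. This is a hedged sketch: the toy `games` list and the helper `win_rate_by` are hypothetical and only meant to show why "killed" alone mixes two different situations.

```python
from collections import defaultdict

# Hypothetical toy games: (killed Baron, lost Baron, won).
games = (
    [(1, 0, 1)] * 3 + [(1, 0, 0)]        # team took the Baron
    + [(0, 1, 0)] * 3 + [(0, 1, 1)]      # opponent took the Baron
    + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 2  # nobody took the Baron
)

def win_rate_by(key_fn, rows):
    """Group games by key_fn(killed, lost) and return the win rate per group."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for killed, lost, won in rows:
        key = key_fn(killed, lost)
        totals[key] += 1
        wins[key] += won
    return {key: wins[key] / totals[key] for key in sorted(totals)}

# With "killed" alone, the 0 group mixes "no Baron" and "Baron lost" games.
by_killed = win_rate_by(lambda killed, lost: killed, games)

# With the difference, -1 / 0 / +1 separate the three situations.
by_diff = win_rate_by(lambda killed, lost: killed - lost, games)

print(by_killed)
print(by_diff)
```

In this toy example the difference separates three distinct win rates, while "killed" alone blends the losing group and the neutral group into one, which weakens the apparent link with victory.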

In [26]:
# Restrict to games where an Elder was taken by either team:
game_with_Elder = diff_df[(monster_elit_df["killedElderDrake"] > 0) | (monster_elit_df["lostElderDrake"] > 0)]

game_with_Elder

late = correlate(game_with_Elder, titre='Losque l\'Elder est atteint')

In [27]:
(all & late).properties(title="Correlation entre les écarts d'obtention des montres élites:")

In games that reach the ElderDrake, taking it therefore has a moderate importance, but one well above that of the other drakes.

To conclude, the best dragon overall is the Infernal.
<br>
It nevertheless remains a weaker objective than the Baron Nashor, which in most games is the best objective to play for.
<br>
However, in games long enough to reach it, the Elder Drake is the objective most strongly tied to victory.