
# Machine learning prediction algorithm for CS:GO matches

## Dependencies

```python
import pandas as pd
import numpy as np
from datetime import date
import math
```


## Reading the csv file
The file `results14_10_2018_csgo.csv` contains all results of professional CS:GO matches from 30-06-2017 to 14-10-2018.

```python
resultsdf = pd.read_csv('../Datafiles/csv/results14_10_2018_csgo.csv', index_col=0, encoding='latin1')
resultsdf.sort_values(by='date', ascending=True, inplace=True)
resultsdf.reset_index(drop=True, inplace=True)
```
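Sorting on the raw `date` strings only gives chronological order because the dates are assumed to be ISO-formatted (`YYYY-MM-DD`, the layout `getDate` below parses), and such strings sort lexicographically in date order. The default `quicksort` is not guaranteed to be stable, so results sharing a date (for example several maps of one match) may be reordered. A minimal sketch with made-up rows, assuming the CSV's original order within a day is meaningful, shows the stable `kind='mergesort'` option:

```python
import pandas as pd

# Made-up rows: two maps played on the same day, listed in playing order.
df = pd.DataFrame({
    'date': ['2018-10-14', '2017-06-30', '2018-10-14'],
    'map': ['Mirage', 'Inferno', 'Nuke'],
})

# ISO date strings sort lexicographically in chronological order.
# kind='mergesort' is stable, so rows with equal dates keep their original order.
sorted_df = df.sort_values(by='date', ascending=True, kind='mergesort')
sorted_df = sorted_df.reset_index(drop=True)
print(sorted_df)
```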

## Calculating the features
The features are stored in a pandas DataFrame. Results are added to a graph one at a time in chronological order, and the features for each result are calculated before that result is added, so every feature row only uses information from earlier matches.
### Features
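 - x<sub>0</sub>: 1 (constant bias term)
 - x<sub>1</sub>: score difference (+/-) of the last shared match, from team1's perspective
 - x<sub>2</sub>: summed score difference (+/-) over the last 3 shared matches
 - x<sub>3</sub>: Relative momentum = (wins in last five results) - (wins in opponent's last five results)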
 - x<sub>4</sub>: Less Simple Algorithm score
 - x<sub>5</sub>: x<sub>1</sub>x<sub>2</sub>
 - x<sub>6</sub>: x<sub>1</sub>x<sub>3</sub>
 - x<sub>7</sub>: x<sub>1</sub>x<sub>4</sub>
 - x<sub>8</sub>: x<sub>2</sub>x<sub>3</sub>
 - x<sub>9</sub>: x<sub>2</sub>x<sub>4</sub>
 - x<sub>10</sub>: x<sub>3</sub>x<sub>4</sub>
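Features x<sub>5</sub> to x<sub>10</sub> are exactly the six pairwise products of x<sub>1</sub> to x<sub>4</sub>, in the order listed. A small sketch (the helper name `interaction_features` is hypothetical, not part of the notebook) reproduces them with `itertools.combinations` and checks the result against row 7009 of the feature table displayed later:

```python
from itertools import combinations

def interaction_features(x1, x2, x3, x4):
    """Return [x5, ..., x10]: the pairwise products of x1..x4 in the order listed above."""
    base = [x1, x2, x3, x4]
    # combinations() yields (x1,x2), (x1,x3), (x1,x4), (x2,x3), (x2,x4), (x3,x4)
    return [a * b for a, b in combinations(base, 2)]

# x1..x4 taken from row 7009 of the displayed feature table.
print(interaction_features(2, 8, 2, -137))
```

Generating the products this way keeps the column order fixed, which matters because each θ coefficient is tied to a specific column.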

### Graph structure
Teams are represented by vertices, and each edge stores the head-to-head results of the two teams it connects. The `Result` class represents a single result, which consists of a winner, a loser, the score difference between them, a date, and a map.

```python
class Graph:
    def __init__(self):
        # Dictionary of {team name: Vertex object}
        self.vertices = dict()

    # team = string
    def addTeam(self, team):
        """Return the Vertex for team, creating it on first use."""
        if team not in self.vertices:
            self.vertices[team] = Vertex(team)
        return self.vertices[team]

    # team = string
    def getTeam(self, team):
        """Return the Vertex for team, or None if the team is unknown."""
        return self.vertices.get(team)


class Vertex:
    # Number of most recent results kept for the momentum feature (x3)
    x = 5

    def __init__(self, name):
        self.name = name
        # Dictionary of {opponent name: Edge object}
        self.edges = dict()
        # The x most recent Result objects of this team, oldest first
        self.lastxresults = []

    def getEdges(self):
        return self.edges

    # team = string
    def hasPlayed(self, team):
        return team in self.edges

    # result = Result object
    def addToLastResults(self, result):
        # Keep a sliding window of at most x results
        if len(self.lastxresults) == self.x:
            self.lastxresults.pop(0)
        self.lastxresults.append(result)

    # Only called on the winner of a result
    def addResult(self, result):
        opponent = result.getLoser()  # Vertex object
        if opponent.toString() in self.edges:
            self.edges[opponent.toString()].addResult(result)
        else:
            # First meeting: create one Edge shared by both teams
            newEdge = Edge()
            newEdge.addResult(result)
            self.edges[opponent.toString()] = newEdge
            opponent.addEdge(self.toString(), newEdge)
        self.addToLastResults(result)
        opponent.addToLastResults(result)

    # team = string, edge = Edge object
    def addEdge(self, team, edge):
        self.edges[team] = edge

    # opponent = string
    def getHeadToHeadScore(self, opponent):
        """Mean signed score difference against opponent, times 100 (0 if they never met)."""
        if not self.hasPlayed(opponent):
            return 0
        results = self.edges[opponent].getResults()
        score = 0
        for result in results:
            if result.isWinner(self.toString()):
                score = score + result.getDif()
            else:
                score = score - result.getDif()
        return int((score / len(results)) * 100)

    # opponent = Vertex object (feature x3)
    def getRelativeMomentum(self, opponent):
        return self.getMomentum() - opponent.getMomentum()

    # opponent = string (features x1 and x2)
    def getSharedResults(self, opponent):
        lastOne = None
        lastThree = None
        if opponent in self.edges:
            lastOne = self.edges[opponent].getLastResultDif(self.toString())
            lastThree = self.edges[opponent].getLastThreeResultsDif(self.toString())
        return (lastOne, lastThree)

    def getMomentum(self):
        # Number of wins among the last x results
        momentum = 0
        for result in self.lastxresults:
            if result.isWinner(self.toString()):
                momentum = momentum + 1
        return momentum

    def toString(self):
        return self.name


class Edge:
    # Head-to-head history of the two teams this edge connects
    def __init__(self):
        self.results = []

    def addResult(self, result):
        self.results.append(result)

    def getResults(self):
        return self.results

    # team1 = string; signed score difference of the latest result from team1's view
    def getLastResultDif(self, team1):
        if self.results[-1].isWinner(team1):
            return self.results[-1].getDif()
        else:
            return -self.results[-1].getDif()

    # team1 = string; returns None until the teams have met at least three times
    def getLastThreeResultsDif(self, team1):
        # Note: the sum starts at 1, not 0, so the value is 1 + the summed score differences
        returnable = 1
        if len(self.results) > 2:
            for result in self.results[-3:]:
                if result.isWinner(team1):
                    returnable = returnable + result.getDif()
                else:
                    returnable = returnable - result.getDif()
            return returnable
        else:
            return None


class Result:
    # dateResult = date object, winner & loser = Vertex objects, dif = positive int, playedMap = string
    def __init__(self, winner, loser, dif, dateResult, playedMap):
        self.winner = winner
        self.loser = loser
        self.dif = dif
        self.dateResult = dateResult
        self.playedMap = playedMap

    def getDif(self):
        return self.dif

    def getDate(self):
        return self.dateResult

    # Returns a Vertex object
    def getLoser(self):
        return self.loser

    # team = string
    def isWinner(self, team):
        return team == self.winner.toString()

    # Compare results by date (>= means "played on the same day or later")
    def __ge__(self, other):
        return self.dateResult >= other.getDate()
```
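`Vertex.addToLastResults` keeps a fixed-size window of the `x` most recent results by popping the oldest entry. A hedged alternative, not what this notebook uses, is `collections.deque` with `maxlen`, which discards the oldest item automatically. The sketch below simplifies results to booleans (True = win) and computes the relative momentum x<sub>3</sub> from two such windows; the helper names are hypothetical:

```python
from collections import deque

WINDOW = 5  # mirrors Vertex.x in the cell above

def make_window():
    # A deque with maxlen drops the oldest entry when a new one is appended to a full window.
    return deque(maxlen=WINDOW)

def momentum(window):
    # Number of wins among the most recent results (True counts as 1).
    return sum(window)

team_a, team_b = make_window(), make_window()
for won in [True, True, False, True, True, True]:  # six results; the first one falls out
    team_a.append(won)
for won in [False, True, False]:
    team_b.append(won)

relative_momentum = momentum(team_a) - momentum(team_b)
print(momentum(team_a), momentum(team_b), relative_momentum)
```

Appending to a full deque is O(1), whereas `list.pop(0)` shifts every element; with a window of five the difference is negligible, so this is mainly a readability choice.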

### Less Simple Algorithm
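The Less Simple Algorithm compares two teams through their shared opponents. It starts from team1's head-to-head score against team2. Then, for every opponent that both teams have played, it adds team1's head-to-head score against that opponent and that opponent's head-to-head score against team2. The total is divided by one plus the number of shared opponents.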

```python
# team1 = Vertex object, team2 = Vertex object (feature x4)
def getLessSimpleAlgorithmScore(team1, team2, graph):
    # Start from the direct head-to-head score
    score = team1.getHeadToHeadScore(team2.toString())
    divider = 1
    for key in team1.getEdges():
        sharedOpponent = graph.getTeam(key)
        # Only opponents that team2 has also played count (team2 itself never has an edge to itself)
        if sharedOpponent.hasPlayed(team2.toString()):
            score = score + team1.getHeadToHeadScore(sharedOpponent.toString())
            score = score + sharedOpponent.getHeadToHeadScore(team2.toString())
            divider = divider + 1
    # Average over the direct match-up and all shared opponents
    return int(score/divider)
```
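Written as a formula, with H(a, b) the value of `getHeadToHeadScore` (the mean signed score difference of a against b, times 100) and C the set of shared opponents, the score is (H(team1, team2) + Σ<sub>c ∈ C</sub> [H(team1, c) + H(c, team2)]) / (1 + |C|), truncated toward zero by `int`. The sketch below reproduces that arithmetic on a hypothetical `h2h` dictionary of precomputed head-to-head scores instead of the graph classes; team names and numbers are made up:

```python
def less_simple_score(h2h, team1, team2):
    """Hypothetical re-implementation of getLessSimpleAlgorithmScore on a dict.

    h2h maps (a, b) -> head-to-head score of a against b, with h2h[(b, a)] == -h2h[(a, b)].
    """
    score = h2h[(team1, team2)]
    divider = 1
    opponents_of_team1 = {b for (a, b) in h2h if a == team1 and b != team2}
    for c in sorted(opponents_of_team1):
        if (c, team2) in h2h:  # c is a shared opponent
            score += h2h[(team1, c)] + h2h[(c, team2)]
            divider += 1
    return int(score / divider)  # int() truncates toward zero, as in the notebook

def symmetric(pairs):
    # Add the mirrored entry (b, a) with the opposite sign for every (a, b).
    table = {}
    for (a, b), s in pairs.items():
        table[(a, b)] = s
        table[(b, a)] = -s
    return table

h2h = symmetric({
    ('A', 'B'): 300,   # A beat B by 3 on average (score difference x 100)
    ('A', 'C'): 500,
    ('C', 'B'): -250,
    ('A', 'E'): -100,
    ('E', 'B'): 160,
    ('A', 'D'): 100,   # D never played B, so it is not a shared opponent
})
print(less_simple_score(h2h, 'A', 'B'))
```

Here the shared opponents are C and E, so the score is (300 + (500 - 250) + (-100 + 160)) / 3 = 610 / 3, truncated to 203.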

### Construct features DataFrame
Every result is added to the graph after its features have been calculated. A feature row is only added if the two teams have played each other at least three times before this result.
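The important detail is the order inside the loop: features for a result are computed from the graph as it stands before that result, and only then is the result added, so no row uses information from its own match or from later ones. A minimal sketch of this compute-then-update pattern, using a hypothetical `meetings` counter and `(team1, team2)` input format instead of the graph, shows which results would get a feature row under the three-previous-meetings rule:

```python
from collections import Counter

def rows_with_features(results, min_meetings=3):
    """Return the indices of results that would get a feature row.

    results: chronologically ordered (team1, team2) pairs (hypothetical input format).
    """
    meetings = Counter()
    selected = []
    for i, (team1, team2) in enumerate(results):
        pair = frozenset((team1, team2))
        # 1) Features: only earlier meetings are visible at this point.
        if meetings[pair] >= min_meetings:
            selected.append(i)
        # 2) Update: record this result after the features were computed.
        meetings[pair] += 1
    return selected

history = [('A', 'B'), ('B', 'A'), ('A', 'C'), ('A', 'B'), ('B', 'A'), ('A', 'B')]
print(rows_with_features(history))
```

The fourth A-B meeting (index 4) is the first one with three earlier meetings, so only indices 4 and 5 qualify.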

```python
# Parse a 'YYYY-MM-DD' string into a date object
def getDate(string):
    year = int(string[:4])
    month = int(string[5:7])
    day = int(string[-2:])
    dateObject = date(year, month, day)
    return dateObject
```
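`getDate` slices the string by position. On Python 3.7 or later the standard library's `date.fromisoformat` parses the same layout directly; a quick check, assuming the `date` column always uses `YYYY-MM-DD`:

```python
from datetime import date

def getDate(string):
    # Positional parsing, as in the notebook cell above.
    return date(int(string[:4]), int(string[5:7]), int(string[-2:]))

sample = '2018-10-14'
print(getDate(sample), date.fromisoformat(sample))
```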

```python
# Running time about 20s
graph = Graph()
columns = ['matchcode', 'x0', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10', 'y']
# Collect rows in a list and build the DataFrame once at the end:
# DataFrame.append was removed in pandas 2.0 and is slow inside a loop.
rows = []
for index, row in resultsdf.iterrows():

    # Get Vertex objects
    team1 = graph.addTeam(row['team1'])
    team2 = graph.addTeam(row['team2'])

    # Build the Result object
    # y = True if team1 is the winner, False if team2 is the winner
    y = row['score1'] > row['score2']
    winner = team1 if y else team2
    loser = team2 if y else team1
    dif = abs(row['score1'] - row['score2'])
    dateResult = getDate(row['date'])
    result = Result(winner=winner, loser=loser, dateResult=dateResult, dif=dif, playedMap=row['map'])

    # Compute the features from earlier results only (x2 is None until 3 meetings)
    (x1, x2) = team1.getSharedResults(team2.toString())
    if x1 is not None and x2 is not None:
        x3 = team1.getRelativeMomentum(team2)
        x4 = getLessSimpleAlgorithmScore(team1, team2, graph)
        x5 = x1*x2
        x6 = x1*x3
        x7 = x1*x4
        x8 = x2*x3
        x9 = x2*x4
        x10 = x3*x4
        rows.append({'matchcode': row['matchcode'], 'x0': 1, 'x1': x1, 'x2': x2, 'x3': x3, 'x4': x4, 'x5': x5,
                     'x6': x6, 'x7': x7, 'x8': x8, 'x9': x9, 'x10': x10, 'y': int(y)*100})
    # Add the result to the graph only after its features were computed
    winner.addResult(result)

featureFrame = pd.DataFrame(rows, columns=columns)
display(featureFrame[-10:])
```

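|      | matchcode | x0 | x1  | x2  | x3 | x4   | x5  | x6 | x7    | x8  | x9    | x10  | y   |
|-----:|----------:|---:|----:|----:|---:|-----:|----:|---:|------:|----:|------:|-----:|----:|
| 7009 | 2327903   | 1  | 2   | 8   | 2  | -137 | 16  | 4  | -274  | 16  | -1096 | -274 | 0   |
| 7010 | 2327900   | 1  | -4  | 9   | -1 | -72  | -36 | 4  | 288   | -9  | -648  | 72   | 100 |
| 7011 | 2327900   | 1  | 6   | 12  | 0  | -68  | 72  | 0  | -408  | 0   | -816  | 0    | 100 |
| 7012 | 2327903   | 1  | -11 | -6  | 0  | -163 | 66  | 0  | 1793  | 0   | 978   | 0    | 0   |
| 7013 | 2327386   | 1  | -5  | 1   | 1  | -135 | -5  | -5 | 675   | 1   | -135  | -135 | 0   |
| 7014 | 2327386   | 1  | -6  | 4   | 0  | -141 | -24 | 0  | 846   | 0   | -564  | 0    | 0   |
| 7015 | 2327813   | 1  | 8   | 20  | 2  | -41  | 160 | 16 | -328  | 40  | -820  | -82  | 100 |
| 7016 | 2327835   | 1  | 9   | 24  | 1  | 1116 | 216 | 9  | 10044 | 24  | 26784 | 1116 | 100 |
| 7017 | 2327812   | 1  | -2  | -17 | 1  | 146  | 34  | -2 | -292  | -17 | -2482 | 146  | 100 |
| 7018 | 2327387   | 1  | 9   | 12  | 2  | -357 | 108 | 18 | -3213 | 24  | -4284 | -714 | 0   |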


## Split Training and test data
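Split the feature DataFrame chronologically: the first 85% of the rows form the training set and the remaining 15% the test set. Each set is then converted into a feature matrix and a result vector.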
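Because `resultsdf` was sorted by date, taking the first rows with `head` and the rest with `tail` means the model is tested on matches played after every training match. For the 7,019 feature rows in this notebook (index 0 to 7018 in the table above), the split formula gives 5,966 training and 1,053 test rows, which agrees with the right/wrong totals reported at the end (3649 + 2317 and 641 + 412). A small sketch of the same split on NumPy arrays; the function name `chronological_split` is hypothetical and the data is placeholder:

```python
import numpy as np

def chronological_split(X, y, training_split=85):
    """Split time-ordered rows: the first training_split percent train, the rest test."""
    n = len(X)
    n_training = int((n / 100) * training_split)  # same formula as the notebook
    return X[:n_training], y[:n_training], X[n_training:], y[n_training:]

n = 7019  # number of feature rows in this notebook
X = np.arange(n * 2).reshape(n, 2)  # placeholder features, in time order
y = np.arange(n)                    # placeholder targets; value = row position
train_X, train_y, test_X, test_y = chronological_split(X, y)
print(len(train_X), len(test_X))
```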

```python
# Decide the sizes of the training and test sets
trainingSplit = 85  # percentage of rows used for training
n = len(featureFrame)
nTraining = int((n/100)*trainingSplit)
nTest = n - nTraining

# Split featureFrame into training and test frames (earlier matches train, later ones test)
trainingFrame = featureFrame.head(nTraining)
testFrame = featureFrame.tail(nTest)

# Get result vectors
testY = testFrame['y'].values.tolist()
trainY = trainingFrame['y'].values.tolist()

# Drop the 'matchcode' and 'y' columns and convert the remaining features to matrices.
# drop() without inplace=True returns a new frame, which avoids the
# SettingWithCopyWarning raised when modifying a slice of featureFrame in place.
testX = testFrame.drop(['matchcode', 'y'], axis=1).values.tolist()
trainX = trainingFrame.drop(['matchcode', 'y'], axis=1).values.tolist()
```




## Train

### Normal equation to get theta
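`normalEquation` solves the least-squares problem in closed form. With X the matrix whose rows are the feature vectors (x<sub>0</sub> = 1 through x<sub>10</sub>) and y the vector of targets (100 if team1 won, 0 otherwise), the θ that minimizes the squared error is, provided XᵀX is invertible:

$$
\theta = \arg\min_{\theta} \lVert X\theta - y \rVert_2^2 = \left(X^{\top} X\right)^{-1} X^{\top} y
$$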

```python
def normalEquation(X, y):
    # Closed-form least squares: theta = (X^T X)^-1 X^T y
    xT = np.transpose(X)
    xTx = xT.dot(X)
    xTx_inv = np.linalg.inv(xTx)
    theta = xTx_inv.dot(xT).dot(y)
    return theta
```
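Inverting XᵀX explicitly works here, but it can lose precision when columns are nearly collinear, and the interaction features x<sub>5</sub> to x<sub>10</sub> are built from the same inputs and span very different scales. A hedged alternative, not what produced the θ printed below, is `np.linalg.lstsq`, which solves the same least-squares problem without forming the inverse. On well-conditioned random data (made up for this check) both give the same θ:

```python
import numpy as np

def normalEquation(X, y):
    # Closed form theta = (X^T X)^-1 X^T y, as in the notebook cell above.
    X = np.asarray(X, dtype=float)
    xT = X.T
    return np.linalg.inv(xT.dot(X)).dot(xT).dot(y)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])  # x0 = 1 plus three features
true_theta = np.array([50.0, 2.0, -1.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=200)

theta_inv = normalEquation(X, y)
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta_inv, theta_lstsq))
```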

### Calculate theta

```python
theta = normalEquation(trainX, trainY)
print(theta)
```

```
[  5.19425871e+01   3.69448516e-01   2.75875468e-01   1.15050090e+00
   2.62257187e-02  -1.89999776e-02   8.08290878e-02  -1.06286848e-03
   2.01099647e-02  -5.98258239e-04   3.55743588e-03]
```


## Implementing theta to calculate predictions

### Calculate success rate for test data

```python
def mdfdSigmoid(x):
    # Modified sigmoid: maps the linear score to (0, 100), with mdfdSigmoid(50) == 50
    return 100/(1+math.exp(-((x-50)/150)))

rights = 0
wrongs = 0
for i in range(nTest):
    inproduct = theta.dot(testX[i])
    prediction = mdfdSigmoid(inproduct)
    # Labels are 100 (team1 won) or 0 (team2 won); a prediction of exactly 50 counts as wrong
    if prediction > 50 and testY[i] > 50:
        rights = rights + 1
    elif prediction < 50 and testY[i] < 50:
        rights = rights + 1
    else:
        wrongs = wrongs + 1

print("right: ", rights)
print("wrong: ", wrongs)
print("prediction rate: ", rights/nTest)
```

```
right:  641
wrong:  412
prediction rate:  0.6087369420702754
```
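Since `mdfdSigmoid` is strictly increasing and returns exactly 50 when its input is 50, the test `prediction > 50` is equivalent to `theta.dot(x) > 50`. The sigmoid rescales the output to the 0 to 100 range but does not change which side of the threshold a prediction falls on. A vectorized sketch of the same count with NumPy (the `accuracy` helper and the numbers are made up for illustration; as in the loop, a score of exactly 50 counts as wrong):

```python
import numpy as np

def accuracy(theta, X, y, threshold=50):
    """Share of rows where the linear score and the 0/100 label fall on the same side of 50."""
    scores = np.asarray(X, dtype=float) @ np.asarray(theta, dtype=float)
    labels = np.asarray(y)
    right = ((scores > threshold) & (labels > threshold)) | ((scores < threshold) & (labels < threshold))
    return right.mean()

theta = np.array([50.0, 10.0])              # made-up intercept and one feature weight
X = np.array([[1, 2], [1, -1], [1, 0.5], [1, -3]])
y = np.array([100, 0, 0, 0])
print(accuracy(theta, X, y))
```

The scores are 70, 40, 55 and 20; only the third row (55 against a label of 0) is wrong, giving 0.75.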


### Calculate success rate for training data

```python
rights = 0
wrongs = 0
for i in range(nTraining):
    inproduct = theta.dot(trainX[i])
    prediction = mdfdSigmoid(inproduct)
    # Same rule as for the test data: compare both sides of the 50 threshold
    if prediction > 50 and trainY[i] > 50:
        rights = rights + 1
    elif prediction < 50 and trainY[i] < 50:
        rights = rights + 1
    else:
        wrongs = wrongs + 1

print("right: ", rights)
print("wrong: ", wrongs)
print("prediction rate: ", rights/nTraining)
```

```
right:  3649
wrong:  2317
prediction rate:  0.6116325846463292
```
