# CHESS DATA ANALYSIS

TEAM: Abraham Borg, Mehar Rekhi, Sarom Thin, Cristian Vazquez

INTRO: We want to analyze thousands of chess games and use what we learn to predict who will win a game of chess. The data is taken from this Kaggle data set: https://www.kaggle.com/datasets/milesh1/35-million-chess-games?resource=download
A sample of those games is hosted here: https://raw.githubusercontent.com/abecsumb/DataScienceProject/main/Chess_Data.txt

DESCRIPTION OF SOURCE DATA SET: Each record lists every move made by White and Black, the date of the game, its result, the number of moves played, and the Elo rating of each player at the time of the game. The source also contains many fields that we are not interested in, so those need to be removed. In addition, the moves are stored as one long string, whereas we want each move to occupy its own dataframe column.
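
To make that layout concrete, the sketch below splits one record into its metadata fields and its move list. The sample line is reconstructed from the previews shown later in this notebook and shortened to four moves. The helper name `parse_record` is hypothetical, and the sketch assumes every record separates metadata from moves with `###`.

```python
# A shortened sample record. The layout (metadata, '###', moves) is an
# assumption based on the previews in this notebook.
sample = ("1 2000.03.14 1-0 2851 None 67 date_false result_false welo_false "
          "belo_true edate_true setup_false fen_false result2_false "
          "oyrange_false blen_false ### W1.d4 B1.d5 W2.c4 B2.e6 ")

def parse_record(line):
    """Split one raw record into (metadata fields, list of moves)."""
    meta, _, moves = line.partition("###")
    return meta.split(), moves.split()

fields, moves = parse_record(sample)
print(fields[1:6])  # date, result, White Elo, Black Elo, number of moves
print(moves)
```

Only the fields at positions 1 to 5 are kept by the cleaning steps that follow; the rest are flags that we discard.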

PREDICTION AND FEATURES TO USE AS PREDICTORS: 
- What were the most common first 8 moves for white and black each year? How have these opening moves changed over time?
- For each year in the data set, which pieces were most commonly left in play at the endgame? 
- Is there a piece in the game that is a good predictor of who will win? For example, if white loses a bishop first, does that correlate with more losses for white? 
- If a player still has their queen and the other player doesn't, does the player with the queen always win? 
- Can we build a model that predicts who will win a game based on the first 8 chess moves? 
- Can our model predict who will win based on the endgame pieces available to both players?

Predict: Who will win based on the opening for each player?
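
Before any model can answer that question, the game result has to become a target label. The following is a minimal sketch on toy data, assuming the result codes are `1-0` (White wins), `0-1` (Black wins), and `1/2-1/2` (draw). The mapping name `RESULT_TO_WINNER` and the `winner` column are hypothetical and not part of the notebook.

```python
import pandas as pd

# Assumed result codes; any other code is left as a missing value.
RESULT_TO_WINNER = {'1-0': 'white', '0-1': 'black', '1/2-1/2': 'draw'}

toy = pd.DataFrame({'Game Result': ['1-0', '1/2-1/2', '0-1', '*']})
toy['winner'] = toy['Game Result'].map(RESULT_TO_WINNER)  # unmapped codes become NaN
print(toy)
```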

# DATA PREPARATION

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import rcParams
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

In [2]:
# show up to 500 columns when displaying a dataframe instead of truncating wide output
pd.set_option('display.max_columns', 500)

# switch to seaborn default stylistic parameters
# see the useful https://seaborn.pydata.org/tutorial/aesthetics.html
sns.set()
sns.set_context('paper') # 'talk' for slightly larger

# change default plot size
rcParams['figure.figsize'] = 9,7

In [4]:
# column names, without the chess moves column
misc_columnNames = ['PNG_File_Pos - DELETE ME', 'Date of Game', 'Game Result', 'W-ELO', 'B-ELO', 
                    'Num Moves', 'miscDate - DELETE ME', 'result - DELETE ME', 'wELO - DELETE ME', 'bELO - DELETE ME', 
                    'event date - DELETE ME', 'setup - DELETE ME', 'fen - DELETE ME', 'flag - DELETE ME', 'oyrange - DELETE ME', 
                    'bad len - DELETE ME']

In [8]:
# read all data except the chess moves: comment = '#' makes the parser ignore everything
# from the first '#' onward, which cuts off the move list that follows '###'.
misc_chess_data = pd.read_csv('https://raw.githubusercontent.com/abecsumb/DataScienceProject/main/Chess_Data.txt', comment = '#', header = None, sep = ' ', on_bad_lines = 'skip')
misc_chess_data.drop(misc_chess_data.columns[16], axis = 1, inplace = True)
misc_chess_data.columns = misc_columnNames
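
The effect of `comment = '#'` can be checked on a small in-memory sample. This is a sketch only: the two records are shortened and made up in the assumed layout, and the variable names are hypothetical.

```python
import io
import pandas as pd

# Two shortened, made-up records in the assumed layout: metadata, '###', moves.
raw = ("1 2000.03.14 1-0 2851 None 67 ### W1.d4 B1.d5 \n"
       "2 2000.02.20 1/2-1/2 2851 2633 97 ### W1.e4 B1.e5 \n")

# Everything from the first '#' onward is ignored, so only metadata is parsed.
meta = pd.read_csv(io.StringIO(raw), comment='#', header=None, sep=' ')
print(meta.shape)
print(meta)
```

The space in front of `###` leaves an empty trailing column, which would explain why the cell above drops one extra column before assigning the column names.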

In [9]:
misc_chess_data.head()

Unnamed: 0,PNG_File_Pos - DELETE ME,Date of Game,Game Result,W-ELO,B-ELO,Num Moves,miscDate - DELETE ME,result - DELETE ME,wELO - DELETE ME,bELO - DELETE ME,event date - DELETE ME,setup - DELETE ME,fen - DELETE ME,flag - DELETE ME,oyrange - DELETE ME,bad len - DELETE ME
0,1,2000.03.14,1-0,2851,,67,date_false,result_false,welo_false,belo_true,edate_true,setup_false,fen_false,result2_false,oyrange_false,blen_false
1,2,2000.03.14,1-0,2851,,53,date_false,result_false,welo_false,belo_true,edate_true,setup_false,fen_false,result2_false,oyrange_false,blen_false
2,3,1999.11.20,1-0,2851,,57,date_false,result_false,welo_false,belo_true,edate_false,setup_false,fen_false,result2_false,oyrange_false,blen_false
3,4,1999.11.20,1-0,2851,,49,date_false,result_false,welo_false,belo_true,edate_false,setup_false,fen_false,result2_false,oyrange_false,blen_false
4,5,2000.02.20,1/2-1/2,2851,2633.0,97,date_false,result_false,welo_false,belo_false,edate_false,setup_false,fen_false,result2_false,oyrange_false,blen_false


In [5]:
# Isolate game moves from everything else.
game_moves = pd.read_csv('https://raw.githubusercontent.com/abecsumb/DataScienceProject/main/Chess_Data.txt', sep = '###', on_bad_lines = 'skip', header = None, engine = 'python')  # multi-character separators need the python engine


In [6]:
game_moves.head()

Unnamed: 0,0,1
0,1 2000.03.14 1-0 2851 None 67 date_false resul...,W1.d4 B1.d5 W2.c4 B2.e6 W3.Nc3 B3.Nf6 W4.cxd5...
1,2 2000.03.14 1-0 2851 None 53 date_false resul...,W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4....
2,3 1999.11.20 1-0 2851 None 57 date_false resul...,W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bc4 B3.Bc5 W4.c3...
3,4 1999.11.20 1-0 2851 None 49 date_false resul...,W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4....
4,5 2000.02.20 1/2-1/2 2851 2633 97 date_false r...,W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bb5 B3.a6 W4.Ba4...


In [10]:
# drop first column of game moves (this is the misc chess data)
game_moves.drop(game_moves.columns[0], axis = 1, inplace = True)

In [11]:
game_moves.head()

Unnamed: 0,1
0,W1.d4 B1.d5 W2.c4 B2.e6 W3.Nc3 B3.Nf6 W4.cxd5...
1,W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4....
2,W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bc4 B3.Bc5 W4.c3...
3,W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4....
4,W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bb5 B3.a6 W4.Ba4...


In [12]:
# split the game moves into one column per move.
game_moves = game_moves.iloc[:, 0].str.lstrip()
game_moves = game_moves.str.split(pat = ' ', expand = True)
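
The whole isolate-and-split step can be tried on a small in-memory sample. This sketch assumes the same `###` layout as above; the records are shortened and made up. It also uses `str.strip()` rather than `str.lstrip()` so that a trailing space does not create an empty last column, which is a deliberate difference from the cell above.

```python
import io
import pandas as pd

# Two shortened, made-up records with different numbers of moves.
raw = ("1 2000.03.14 1-0 2851 None 67 ### W1.d4 B1.d5 W2.c4 \n"
       "2 2000.02.20 1/2-1/2 2851 2633 97 ### W1.e4 B1.e5 \n")

# A multi-character separator needs the python parsing engine.
parts = pd.read_csv(io.StringIO(raw), sep='###', header=None, engine='python')

# Column 1 holds the move string; expand=True yields one column per move.
moves = parts[1].str.strip().str.split(' ', expand=True)
print(moves)
```

Shorter games are padded with missing values on the right, which is why the previews below end in long runs of empty cells.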

In [13]:
game_moves.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313
0,W1.d4,B1.d5,W2.c4,B2.e6,W3.Nc3,B3.Nf6,W4.cxd5,B4.exd5,W5.Bg5,B5.Be7,W6.e3,B6.Ne4,W7.Bxe7,B7.Nxc3,W8.Bxd8,B8.Nxd1,W9.Bxc7,B9.Nxb2,W10.Rb1,B10.Nc4,W11.Bxc4,B11.dxc4,W12.Ne2,B12.O-O,W13.Nc3,B13.b6,W14.d5,B14.Na6,W15.Bd6,B15.Rd8,W16.Ba3,B16.Bb7,W17.e4,B17.f6,W18.Ke2,B18.Nc7,W19.Rhd1,B19.Ba6,W20.Ke3,B20.Kf7,W21.g4,B21.g5,W22.h4,B22.h6,W23.Rh1,B23.Re8,W24.f3,B24.Bb7,W25.hxg5,B25.fxg5,W26.d6,B26.Nd5+,W27.Nxd5,B27.Bxd5,W28.Rxh6,B28.c3,W29.d7,B29.Re6,W30.Rh7+,B30.Kg8,W31.Rbh1,B31.Bc6,W32.Rh8+,B32.Kf7,W33.Rxa8,B33.Bxd7,W34.Rh7+,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,W1.e4,B1.d5,W2.exd5,B2.Qxd5,W3.Nc3,B3.Qa5,W4.d4,B4.Nf6,W5.Nf3,B5.c6,W6.Ne5,B6.Bf5,W7.g4,B7.Be4,W8.f3,B8.Bd5,W9.a3,B9.Nbd7,W10.Be3,B10.Nxe5,W11.dxe5,B11.Nxg4,W12.Bd4,B12.e6,W13.b4,B13.Qd8,W14.Nxd5,B14.Qxd5,W15.c4,B15.Ne3,W16.cxd5,B16.Nxd1,W17.dxc6,B17.bxc6,W18.Rxd1,B18.Be7,W19.Ba6,B19.O-O,W20.Ke2,B20.Rab8,W21.Rc1,B21.Rfd8,W22.Rhd1,B22.c5,W23.Bxc5,B23.Rxd1,W24.Rxd1,B24.Bxc5,W25.bxc5,B25.g6,W26.c6,B26.Rb2+,W27.Rd2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,W1.e4,B1.e5,W2.Nf3,B2.Nc6,W3.Bc4,B3.Bc5,W4.c3,B4.Nf6,W5.d3,B5.d6,W6.Bb3,B6.O-O,W7.Nbd2,B7.Be6,W8.O-O,B8.Qd7,W9.Re1,B9.Rfe8,W10.Nf1,B10.Ne7,W11.Ng3,B11.Bg4,W12.h3,B12.Be6,W13.Bg5,B13.Kh8,W14.Bxf6,B14.gxf6,W15.d4,B15.exd4,W16.cxd4,B16.Bb4,W17.Re3,B17.Rg8,W18.d5,B18.Bxh3,W19.Qd4,B19.Rg6,W20.Qxb4,B20.c5,W21.Qc3,B21.Bg4,W22.Bc2,B22.Rh6,W23.Nh2,B23.b5,W24.b4,B24.Rc8,W25.Bd3,B25.c4,W26.Bc2,B26.Bh5,W27.Nxh5,B27.Rxh5,W28.Qxf6+,B28.Kg8,W29.Bd1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,W1.e4,B1.d5,W2.exd5,B2.Qxd5,W3.Nc3,B3.Qa5,W4.d4,B4.e6,W5.Nf3,B5.c6,W6.Bd3,B6.Nf6,W7.O-O,B7.Be7,W8.Re1,B8.Nbd7,W9.Ne5,B9.O-O,W10.Bg5,B10.Qd8,W11.Qf3,B11.Re8,W12.Rad1,B12.Nf8,W13.Ne4,B13.Ng6,W14.h4,B14.Nxe5,W15.dxe5,B15.Nxe4,W16.Bxe4,B16.Qc7,W17.Bxe7,B17.Qxe7,W18.h5,B18.Bd7,W19.h6,B19.gxh6,W20.Qf4,B20.h5,W21.Qh6,B21.f5,W22.exf6,B22.Qf7,W23.Re3,B23.Kh8,W24.Rg3,B24.Rg8,W25.Rg7,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,W1.e4,B1.e5,W2.Nf3,B2.Nc6,W3.Bb5,B3.a6,W4.Ba4,B4.Nf6,W5.O-O,B5.Be7,W6.Re1,B6.b5,W7.Bb3,B7.d6,W8.c3,B8.O-O,W9.h3,B9.Na5,W10.Bc2,B10.c5,W11.d4,B11.Qc7,W12.Nbd2,B12.Bd7,W13.Nf1,B13.cxd4,W14.cxd4,B14.Rac8,W15.Ne3,B15.Nc6,W16.d5,B16.Nb4,W17.Bb1,B17.a5,W18.a3,B18.Na6,W19.b4,B19.Ra8,W20.Bd2,B20.Rfc8,W21.Bd3,B21.Qb7,W22.g4,B22.g6,W23.Nf1,B23.axb4,W24.axb4,B24.Bd8,W25.Ng3,B25.Nc7,W26.Qe2,B26.Rxa1,W27.Rxa1,B27.Ra8,W28.Qe1,B28.Nfe8,W29.Qc1,B29.Ng7,W30.Rxa8,B30.Qxa8,W31.Bh6,B31.Nce8,W32.Qb2,B32.Qa4,W33.Kg2,B33.Bb6,W34.Bc2,B34.Qa7,W35.Bd3,B35.Qa4,W36.Ne2,B36.Nc7,W37.Nxe5,B37.dxe5,W38.Qxe5,B38.Nce8,W39.Bxg7,B39.Qd1,W40.Bh6,B40.Qxd3,W41.Qe7,B41.Ng7,W42.Ng3,B42.Qc2,W43.Qf6,B43.Nf5,W44.Qxb6,B44.Nh4+,W45.Kh2,B45.Nf3+,W46.Kg2,B46.Nh4+,W47.Kh2,B47.Nf3+,W48.Kg2,B48.Nh4+,W49.Kh2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [14]:
# merge misc data and game moves into one df, and drop all unnecessary columns
chess_data = pd.concat([misc_chess_data, game_moves], axis = 1)
chess_data.drop(labels = ['PNG_File_Pos - DELETE ME', 'miscDate - DELETE ME', 'result - DELETE ME', 
               'wELO - DELETE ME', 'bELO - DELETE ME', 'event date - DELETE ME', 
               'setup - DELETE ME', 'fen - DELETE ME', 'flag - DELETE ME', 'oyrange - DELETE ME', 'bad len - DELETE ME'], axis = 1, inplace = True)
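
`pd.concat` with `axis = 1` aligns rows by index, so the merge is only correct if both reads kept exactly the same lines. Because the two reads use different separators, `on_bad_lines = 'skip'` could in principle drop different rows in each. A minimal guard is sketched below; the helper name `concat_checked` is hypothetical. Equal lengths are a necessary condition rather than a proof of alignment.

```python
import pandas as pd

def concat_checked(meta, moves):
    """Concatenate column-wise only if both frames have matching rows."""
    if len(meta) != len(moves) or not meta.index.equals(moves.index):
        raise ValueError('metadata and moves are not row-aligned')
    return pd.concat([meta, moves], axis=1)

# Toy frames standing in for misc_chess_data and game_moves.
meta = pd.DataFrame({'Game Result': ['1-0', '0-1']})
moves = pd.DataFrame({0: ['W1.e4', 'W1.d4'], 1: ['B1.e5', 'B1.d5']})
combined = concat_checked(meta, moves)
print(combined.shape)
```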

In [15]:
chess_data.head()

Unnamed: 0,Date of Game,Game Result,W-ELO,B-ELO,Num Moves,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313
0,2000.03.14,1-0,2851,,67,W1.d4,B1.d5,W2.c4,B2.e6,W3.Nc3,B3.Nf6,W4.cxd5,B4.exd5,W5.Bg5,B5.Be7,W6.e3,B6.Ne4,W7.Bxe7,B7.Nxc3,W8.Bxd8,B8.Nxd1,W9.Bxc7,B9.Nxb2,W10.Rb1,B10.Nc4,W11.Bxc4,B11.dxc4,W12.Ne2,B12.O-O,W13.Nc3,B13.b6,W14.d5,B14.Na6,W15.Bd6,B15.Rd8,W16.Ba3,B16.Bb7,W17.e4,B17.f6,W18.Ke2,B18.Nc7,W19.Rhd1,B19.Ba6,W20.Ke3,B20.Kf7,W21.g4,B21.g5,W22.h4,B22.h6,W23.Rh1,B23.Re8,W24.f3,B24.Bb7,W25.hxg5,B25.fxg5,W26.d6,B26.Nd5+,W27.Nxd5,B27.Bxd5,W28.Rxh6,B28.c3,W29.d7,B29.Re6,W30.Rh7+,B30.Kg8,W31.Rbh1,B31.Bc6,W32.Rh8+,B32.Kf7,W33.Rxa8,B33.Bxd7,W34.Rh7+,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2000.03.14,1-0,2851,,53,W1.e4,B1.d5,W2.exd5,B2.Qxd5,W3.Nc3,B3.Qa5,W4.d4,B4.Nf6,W5.Nf3,B5.c6,W6.Ne5,B6.Bf5,W7.g4,B7.Be4,W8.f3,B8.Bd5,W9.a3,B9.Nbd7,W10.Be3,B10.Nxe5,W11.dxe5,B11.Nxg4,W12.Bd4,B12.e6,W13.b4,B13.Qd8,W14.Nxd5,B14.Qxd5,W15.c4,B15.Ne3,W16.cxd5,B16.Nxd1,W17.dxc6,B17.bxc6,W18.Rxd1,B18.Be7,W19.Ba6,B19.O-O,W20.Ke2,B20.Rab8,W21.Rc1,B21.Rfd8,W22.Rhd1,B22.c5,W23.Bxc5,B23.Rxd1,W24.Rxd1,B24.Bxc5,W25.bxc5,B25.g6,W26.c6,B26.Rb2+,W27.Rd2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,1999.11.20,1-0,2851,,57,W1.e4,B1.e5,W2.Nf3,B2.Nc6,W3.Bc4,B3.Bc5,W4.c3,B4.Nf6,W5.d3,B5.d6,W6.Bb3,B6.O-O,W7.Nbd2,B7.Be6,W8.O-O,B8.Qd7,W9.Re1,B9.Rfe8,W10.Nf1,B10.Ne7,W11.Ng3,B11.Bg4,W12.h3,B12.Be6,W13.Bg5,B13.Kh8,W14.Bxf6,B14.gxf6,W15.d4,B15.exd4,W16.cxd4,B16.Bb4,W17.Re3,B17.Rg8,W18.d5,B18.Bxh3,W19.Qd4,B19.Rg6,W20.Qxb4,B20.c5,W21.Qc3,B21.Bg4,W22.Bc2,B22.Rh6,W23.Nh2,B23.b5,W24.b4,B24.Rc8,W25.Bd3,B25.c4,W26.Bc2,B26.Bh5,W27.Nxh5,B27.Rxh5,W28.Qxf6+,B28.Kg8,W29.Bd1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,1999.11.20,1-0,2851,,49,W1.e4,B1.d5,W2.exd5,B2.Qxd5,W3.Nc3,B3.Qa5,W4.d4,B4.e6,W5.Nf3,B5.c6,W6.Bd3,B6.Nf6,W7.O-O,B7.Be7,W8.Re1,B8.Nbd7,W9.Ne5,B9.O-O,W10.Bg5,B10.Qd8,W11.Qf3,B11.Re8,W12.Rad1,B12.Nf8,W13.Ne4,B13.Ng6,W14.h4,B14.Nxe5,W15.dxe5,B15.Nxe4,W16.Bxe4,B16.Qc7,W17.Bxe7,B17.Qxe7,W18.h5,B18.Bd7,W19.h6,B19.gxh6,W20.Qf4,B20.h5,W21.Qh6,B21.f5,W22.exf6,B22.Qf7,W23.Re3,B23.Kh8,W24.Rg3,B24.Rg8,W25.Rg7,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,2000.02.20,1/2-1/2,2851,2633.0,97,W1.e4,B1.e5,W2.Nf3,B2.Nc6,W3.Bb5,B3.a6,W4.Ba4,B4.Nf6,W5.O-O,B5.Be7,W6.Re1,B6.b5,W7.Bb3,B7.d6,W8.c3,B8.O-O,W9.h3,B9.Na5,W10.Bc2,B10.c5,W11.d4,B11.Qc7,W12.Nbd2,B12.Bd7,W13.Nf1,B13.cxd4,W14.cxd4,B14.Rac8,W15.Ne3,B15.Nc6,W16.d5,B16.Nb4,W17.Bb1,B17.a5,W18.a3,B18.Na6,W19.b4,B19.Ra8,W20.Bd2,B20.Rfc8,W21.Bd3,B21.Qb7,W22.g4,B22.g6,W23.Nf1,B23.axb4,W24.axb4,B24.Bd8,W25.Ng3,B25.Nc7,W26.Qe2,B26.Rxa1,W27.Rxa1,B27.Ra8,W28.Qe1,B28.Nfe8,W29.Qc1,B29.Ng7,W30.Rxa8,B30.Qxa8,W31.Bh6,B31.Nce8,W32.Qb2,B32.Qa4,W33.Kg2,B33.Bb6,W34.Bc2,B34.Qa7,W35.Bd3,B35.Qa4,W36.Ne2,B36.Nc7,W37.Nxe5,B37.dxe5,W38.Qxe5,B38.Nce8,W39.Bxg7,B39.Qd1,W40.Bh6,B40.Qxd3,W41.Qe7,B41.Ng7,W42.Ng3,B42.Qc2,W43.Qf6,B43.Nf5,W44.Qxb6,B44.Nh4+,W45.Kh2,B45.Nf3+,W46.Kg2,B46.Nh4+,W47.Kh2,B47.Nf3+,W48.Kg2,B48.Nh4+,W49.Kh2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [16]:
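# remove rows with missing player ratings. Only the players playing as Black had their ELO missing.
# depending on the pandas version, the 'None' marker is read as a string or as NaN, so filter out both.
chess_data = chess_data[chess_data['B-ELO'].notna() & (chess_data['B-ELO'] != 'None')]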

In [17]:
# remove rows where the number of game moves is 0.
chess_data = chess_data[chess_data['Num Moves'] != 0]

In [18]:
# we only want games with openings that we can analyze, so keep only games
# where Num Moves is at least 16 (8 moves per side)
chess_data = chess_data[chess_data['Num Moves'] >= 16]
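
The three filters above can also be written as a single mask. The sketch below runs on toy data and uses `pd.to_numeric` with `errors='coerce'` as an alternative way to detect missing ratings; it is not the approach taken in the cells above, and the names `toy`, `black_elo`, and `keep` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'B-ELO': ['2633', 'None', '2748', None],
    'Num Moves': [97, 67, 12, 40],
})

# Coerce ratings to numbers: 'None' strings and real missing values both become NaN.
black_elo = pd.to_numeric(toy['B-ELO'], errors='coerce')

# Keep games with a known Black rating and at least 8 moves per side.
keep = black_elo.notna() & (toy['Num Moves'] >= 16)
filtered = toy[keep]
print(filtered)
```

A threshold of 16 already excludes games with 0 moves, so the separate zero-move filter is subsumed by it.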

In [19]:
# we only care about the year the game took place, so keep the first four characters of the date.
chess_data = chess_data.copy()  # work on a copy to avoid a SettingWithCopyWarning on the filtered frame
chess_data['Date of Game'] = chess_data['Date of Game'].str.slice(0, 4)

In [20]:
chess_data.head()

Unnamed: 0,Date of Game,Game Result,W-ELO,B-ELO,Num Moves,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313
4,2000,1/2-1/2,2851,2633,97,W1.e4,B1.e5,W2.Nf3,B2.Nc6,W3.Bb5,B3.a6,W4.Ba4,B4.Nf6,W5.O-O,B5.Be7,W6.Re1,B6.b5,W7.Bb3,B7.d6,W8.c3,B8.O-O,W9.h3,B9.Na5,W10.Bc2,B10.c5,W11.d4,B11.Qc7,W12.Nbd2,B12.Bd7,W13.Nf1,B13.cxd4,W14.cxd4,B14.Rac8,W15.Ne3,B15.Nc6,W16.d5,B16.Nb4,W17.Bb1,B17.a5,W18.a3,B18.Na6,W19.b4,B19.Ra8,W20.Bd2,B20.Rfc8,W21.Bd3,B21.Qb7,W22.g4,B22.g6,W23.Nf1,B23.axb4,W24.axb4,B24.Bd8,W25.Ng3,B25.Nc7,W26.Qe2,B26.Rxa1,W27.Rxa1,B27.Ra8,W28.Qe1,B28.Nfe8,W29.Qc1,B29.Ng7,W30.Rxa8,B30.Qxa8,W31.Bh6,B31.Nce8,W32.Qb2,B32.Qa4,W33.Kg2,B33.Bb6,W34.Bc2,B34.Qa7,W35.Bd3,B35.Qa4,W36.Ne2,B36.Nc7,W37.Nxe5,B37.dxe5,W38.Qxe5,B38.Nce8,W39.Bxg7,B39.Qd1,W40.Bh6,B40.Qxd3,W41.Qe7,B41.Ng7,W42.Ng3,B42.Qc2,W43.Qf6,B43.Nf5,W44.Qxb6,B44.Nh4+,W45.Kh2,B45.Nf3+,W46.Kg2,B46.Nh4+,W47.Kh2,B47.Nf3+,W48.Kg2,B48.Nh4+,W49.Kh2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,2000,1/2-1/2,2851,2748,52,W1.d4,B1.e6,W2.Nf3,B2.Nf6,W3.c4,B3.d5,W4.Nc3,B4.dxc4,W5.e4,B5.Bb4,W6.Bg5,B6.c5,W7.Bxc4,B7.cxd4,W8.Nxd4,B8.Qa5,W9.Bd2,B9.O-O,W10.Nc2,B10.Bxc3,W11.Bxc3,B11.Qg5,W12.Qe2,B12.Qxg2,W13.O-O-O,B13.Qxe4,W14.Rhg1,B14.g6,W15.Ne3,B15.e5,W16.f4,B16.Be6,W17.Bd3,B17.Qxf4,W18.Rgf1,B18.Qh4,W19.Be1,B19.Qa4,W20.Rxf6,B20.Nc6,W21.Rxe6,B21.Nd4,W22.Qg4,B22.Qxa2,W23.Bxg6,B23.hxg6,W24.Rxg6+,B24.fxg6,W25.Qxg6+,B25.Kh8,W26.Qh5+,B26.Kg8,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,2000,1-0,2851,2191,79,W1.e4,B1.c5,W2.Nf3,B2.Nc6,W3.Bb5,B3.g6,W4.Bxc6,B4.dxc6,W5.d3,B5.Bg7,W6.h3,B6.Nf6,W7.Nc3,B7.O-O,W8.Be3,B8.Qa5,W9.Qd2,B9.Rd8,W10.O-O,B10.Bd7,W11.Bh6,B11.Qc7,W12.Bxg7,B12.Kxg7,W13.Qe3,B13.b6,W14.Nh2,B14.Rf8,W15.f4,B15.Rad8,W16.Rae1,B16.Bc8,W17.f5,B17.e5,W18.Rf2,B18.Qd6,W19.Ref1,B19.h6,W20.b3,B20.Qd4,W21.Qe1,B21.b5,W22.Ne2,B22.Qd6,W23.Ng3,B23.c4,W24.dxc4,B24.bxc4,W25.Qa5,B25.cxb3,W26.axb3,B26.Rd7,W27.Nf3,B27.c5,W28.Qc3,B28.Re7,W29.Ra1,B29.Rfe8,W30.Ra5,B30.Rc7,W31.Qe3,B31.Qb6,W32.Ra4,B32.Bb7,W33.Qc3,B33.Nd7,W34.Ra1,B34.c4,W35.b4,B35.f6,W36.fxg6,B36.Kxg6,W37.Nh4+,B37.Kh7,W38.Qf3,B38.Qe6,W39.Rxa7,B39.Rg8,W40.Nhf5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,2000,1/2-1/2,2851,2175,72,W1.c4,B1.e6,W2.g3,B2.d5,W3.Bg2,B3.Nf6,W4.Nf3,B4.Be7,W5.b3,B5.O-O,W6.O-O,B6.c5,W7.Bb2,B7.Nc6,W8.e3,B8.b6,W9.Nc3,B9.Bb7,W10.cxd5,B10.Nxd5,W11.Nxd5,B11.Qxd5,W12.d4,B12.Rad8,W13.Ne5,B13.Qd6,W14.dxc5,B14.Qxc5,W15.Qe2,B15.Nxe5,W16.Bxb7,B16.Qc7,W17.Bg2,B17.Bc5,W18.Rfd1,B18.a5,W19.Rxd8,B19.Rxd8,W20.Rd1,B20.Rxd1+,W21.Qxd1,B21.Qd7,W22.Qxd7,B22.Nxd7,W23.Bc6,B23.Nf6,W24.Kg2,B24.Kf8,W25.Kf3,B25.Ke7,W26.g4,B26.h6,W27.h4,B27.g6,W28.Bb5,B28.h5,W29.g5,B29.Nd5,W30.Ke2,B30.Nc7,W31.Bd3,B31.Nd5,W32.Bc4,B32.Kd6,W33.a3,B33.Ne7,W34.e4,B34.Nc6,W35.f4,B35.Nd4+,W36.Bxd4,B36.Bxd4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,2000,1-0,2851,2646,49,W1.e4,B1.c5,W2.Nf3,B2.d6,W3.d4,B3.cxd4,W4.Nxd4,B4.Nf6,W5.Nc3,B5.a6,W6.Be3,B6.e6,W7.f3,B7.b5,W8.g4,B8.h6,W9.Qd2,B9.Nbd7,W10.O-O-O,B10.Bb7,W11.h4,B11.b4,W12.Na4,B12.d5,W13.Bh3,B13.g5,W14.Bg2,B14.gxh4,W15.Rxh4,B15.dxe4,W16.g5,B16.Nd5,W17.Rxe4,B17.hxg5,W18.Bxg5,B18.Qa5,W19.f4,B19.Rh2,W20.Nxe6,B20.fxe6,W21.Rxe6+,B21.Kf7,W22.Qd3,B22.Bg7,W23.Qf5+,B23.Kg8,W24.Rxd5,B24.Qxa4,W25.Re7,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


# DATA EXPLORATION AND VISUALIZATION

In [21]:
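# consider the opening as the first 8 moves by each side.
# .loc slices by column label and includes the end label: even labels 0-14 are White's moves, odd labels 1-15 are Black's.
openings_white = chess_data.loc[:, 0: 14: 2]
openings_black = chess_data.loc[:, 1: 15: 2]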

In [22]:
openings_white.head()

Unnamed: 0,0,2,4,6,8,10,12,14
4,W1.e4,W2.Nf3,W3.Bb5,W4.Ba4,W5.O-O,W6.Re1,W7.Bb3,W8.c3
5,W1.d4,W2.Nf3,W3.c4,W4.Nc3,W5.e4,W6.Bg5,W7.Bxc4,W8.Nxd4
6,W1.e4,W2.Nf3,W3.Bb5,W4.Bxc6,W5.d3,W6.h3,W7.Nc3,W8.Be3
8,W1.c4,W2.g3,W3.Bg2,W4.Nf3,W5.b3,W6.O-O,W7.Bb2,W8.e3
9,W1.e4,W2.Nf3,W3.d4,W4.Nxd4,W5.Nc3,W6.Be3,W7.f3,W8.g4
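
One way to approach the question of the most common openings per year is to join each game's opening moves into a single string and count the strings within each year. The sketch below does this on toy data with only two moves per game. The column name `white_opening` and the count label `games` are hypothetical; for the real data the same idea would apply to the eight columns of `openings_white`.

```python
import pandas as pd

toy = pd.DataFrame({
    'Date of Game': ['2000', '2000', '2000', '1999'],
    0: ['W1.e4', 'W1.e4', 'W1.d4', 'W1.e4'],
    2: ['W2.Nf3', 'W2.Nf3', 'W2.c4', 'W2.Nf3'],
})

# Join White's opening moves into one string per game.
toy['white_opening'] = toy[[0, 2]].apply(' '.join, axis=1)

# Count how often each opening occurs in each year.
counts = (toy.groupby('Date of Game')['white_opening']
             .value_counts()
             .rename('games')
             .reset_index())
print(counts)
```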


# MACHINE LEARNING