<a href="https://colab.research.google.com/github/GOTWIC/AI-Chess-Engine/blob/main/Chess_Evaluator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import Libraries

In [2]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import gc
from google.colab import drive

# Import Original Dataset and Reformat to Custom Dataframe (Not Recommended)

The steps in this section achieve the following:
1. Import the Original Kaggle dataset
2. Load the Dataset into a dataframe
3. Parse and Reformat the dataframe for all 13 million positions into a new dataframe
4. Upload new dataframe to Google Drive as a .CSV file.

Because reformatting 13 million chess positions is incredibly resource intensive, it's better to download the reformatted .CSV file that is generated at the end of this process. Using the reformatted .CSV file will significantly reduce the setup time for subsequent sessions.



### *If you want to download the reformatted .CSV file:*
*It is better to save the reformatted file to drive and download from there (10 minutes total), rather than save the file to colab's local storage and download directly from here (20 minutes or more).*

## Import Dataset 

1.   Download Kaggle API .json file
2.   Upload to Google Drive (root folder)
3.   Mount Google Drive



In [None]:
! pip install kaggle
! mkdir ~/.kaggle
! cp /content/drive/MyDrive/kaggle.json ~/.kaggle/kaggle.json
! chmod 600 ~/.kaggle/kaggle.json
! kaggle datasets download ronakbadhe/chess-evaluations
! unzip chess-evaluations.zip

mkdir: cannot create directory ‘/root/.kaggle’: File exists
chess-evaluations.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  chess-evaluations.zip
replace chessData.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

## Load and Read Dataset

In [3]:
path = 'chessData.csv' 
rawData = pd.read_csv(path)

## Parse FEN + Evaluation into Custom Data Frame

The original data frame includes only two columns: the FEN and the Evaluation. To prepare the dataset for inputting, we need to reformat the input. The reformatting parses the FEN string and allocates a column for every square on the board (64 squares), and an additional column for the evaluation (65 columns in total).

***This step will take around 18 minutes to complete.***

In [4]:
def parsePiece(piece):
  return {
        'P': 1,
        'N': 2,
        'B': 3,
        'R': 4,
        'Q': 5,
        'K': 6,
        'p': -1,
        'n': -2,
        'b': -3,
        'r': -4,
        'q': -5,
        'k': -6,
    }[piece]

rows = len(rawData)
rawParse = np.empty([rows, 65])

for row in range(rows):
  FENIndex = 0
  for char in rawData.FEN[row]:
    if char.isalpha():
      rawParse[row][FENIndex] = parsePiece(char)
      FENIndex += 1
    elif char.isdigit():
      for emptySpace in range(int(char)):
        rawParse[row][FENIndex] = 0
        FENIndex += 1
    elif char == ' ':
      break

  eval = rawData.Evaluation[row]
  if eval[0] == '+':
    rawParse[row][64] = int(eval[1:])
  elif eval[0] == '-':
    rawParse[row][64] = -1 * int(eval[1:])
  elif eval[0] == '#':
    if eval[1] == '+':
      rawParse[row][64] = 100000
    elif eval[1] == '-':
      rawParse[row][64] = -100000
  else:
    rawParse[row][64] = 0

  if row%100000 == 0:
    print("Currently reformatting position #" + str(row))

print("Finished processing " + str(rows) + " positions")
  

squareLabels = []
for i in range(1, 9):
  for j in range(1, 9):
    squareLabels.append(chr(j + 96) + chr((8-i) + 49))
squareLabels.append('Evaluation')

data = pd.DataFrame(rawParse, columns = squareLabels)

print("Reformatting Complete")





Currently reformatting position #0
Currently reformatting position #100000
Currently reformatting position #200000
Currently reformatting position #300000
Currently reformatting position #400000
Currently reformatting position #500000
Currently reformatting position #600000
Currently reformatting position #700000
Currently reformatting position #800000
Currently reformatting position #900000
Currently reformatting position #1000000
Currently reformatting position #1100000
Currently reformatting position #1200000
Currently reformatting position #1300000
Currently reformatting position #1400000
Currently reformatting position #1500000
Currently reformatting position #1600000
Currently reformatting position #1700000
Currently reformatting position #1800000
Currently reformatting position #1900000
Currently reformatting position #2000000
Currently reformatting position #2100000
Currently reformatting position #2200000
Currently reformatting position #2300000
Currently reformatting position

## Delete the Original Dataframe
Reformatting pretty much deletes all of your ram. This step will free up some memory space.

In [7]:
del rawData
gc.collect()

533

## Save New Dataframe as a .CSV file and download to Google Drive
After this step, you can import the new dataframe directly from Google Drive. 

***This process will take about 7 minutes.*** 

In [9]:
path = '/content/drive/My Drive/reformattedChessData.csv'
with open(path, 'w', encoding = 'utf-8-sig') as f:
  data.to_csv(f)

# The Reformatted Dataset