1. Title:
   Chess Endgame Database for White King and Rook against Black King (KRK) -
   **Black-to-move Positions Drawn or Lost in N Moves**.


2. Source Information:
   -- Creators: Database generated by Michael Bain and Arthur van Hoff
      at the Turing Institute, Glasgow, UK.
   -- Donor: Michael Bain (mike@cse.unsw.edu.au), AI Lab, Computer Science,
      University of New South Wales, Sydney 2052, Australia.
      (tel) +61 2 385 3939
      (fax) +61 2 663 4576
   -- Date: June, 1994.


3. Past Usage:

	Chess endgames are complex domains which are enumerable. Endgame
   databases are tables of stored game-theoretic values for the enumerated
   elements (legal positions) of the domain. The game-theoretic values stored
   denote whether or not positions are won for either side, or include also
   the depth of win (number of moves) assuming minimax-optimal play. From the
   point of view of experiments on computer induction such databases provide
   not only a source of examples but also an oracle (Roycroft, 1986) for
   testing induced rules. However a chess endgame database differs from, say,
   a relational database containing details of parts and suppliers in the
   following important respect. The combinatorics of computing the required
   game-theoretic values for individual position entries independently would
   be prohibitive. Therefore all the database entries are generated in a single
   iterative process using the standard backup algorithm (Thompson, 1986).
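
To illustrate the idea of backing up values from terminal positions, here is a minimal sketch on a toy game graph. It is not Thompson's implementation and contains no chess rules: the position names, the `moves` dictionary and the `backup` function are all hypothetical, and depths are counted in plies rather than in moves.

```python
def backup(moves, mated):
    """Label positions by backward (retrograde) analysis.

    moves: dict mapping each position to the positions reachable in one ply
           (the side to move alternates on every edge).
    mated: positions where the side to move has already lost.
    Returns {position: (outcome for the side to move, depth in plies)}.
    Positions that never receive a label are draws.
    """
    value = {p: ("loss", 0) for p in mated}
    while True:
        snapshot = dict(value)  # only rely on labels from earlier passes
        for pos, succs in moves.items():
            if pos in snapshot or not succs:
                continue
            outcomes = [snapshot.get(s, ("unknown", 0)) for s in succs]
            lost = [depth for kind, depth in outcomes if kind == "loss"]
            if lost:
                # Some move reaches a position lost for the opponent.
                value[pos] = ("win", 1 + min(lost))
            elif all(kind == "win" for kind, _ in outcomes):
                # Every move reaches a position won for the opponent.
                value[pos] = ("loss", 1 + max(depth for _, depth in outcomes))
        if len(value) == len(snapshot):  # nothing new was labelled
            return value


# Hypothetical toy graph: "m" is a mated position, "x" and "y" loop forever.
moves = {
    "m": [],
    "a": ["m"],
    "b": ["a"],
    "c": ["b", "d"],
    "d": ["c"],
    "x": ["y"],
    "y": ["x"],
}
values = backup(moves, mated={"m"})
print(values)
```

Each pass extends the labelled set by one ply, so every entry is computed once from already-known successors instead of being searched independently.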

   A KRK database was described by Clarke (1977). The current database was
   described and used for machine learning experiments in Bain (1992; 1994). It
   should be noted that our database is not guaranteed correct, but the class
   distribution is the same as Clarke's database. In (Bain 1992; 1994) the
   task was classification of positions in the database as won for white in a
   fixed number of moves, assuming optimal play by both sides. The problem was
   structured into separate sub-problems by depth-of-win ordered draw, zero,
   one, ..., sixteen. When learning depth d all examples at depths > d are
   used as negatives. Quinlan (1994) applied Foil to learn a complete and
   correct solution for this task.
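
The sub-problem structure described above can be sketched as follows. This is one possible reading, not code from Bain or Quinlan: it assumes that "depths > d" means classes that come later in the order draw, zero, one, ..., sixteen, and that examples from earlier classes are simply left out. The function name and the placeholder position ids are hypothetical.

```python
DEPTHS = ["draw", "zero", "one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
          "fourteen", "fifteen", "sixteen"]
RANK = {name: i for i, name in enumerate(DEPTHS)}


def depth_subproblem(examples, d):
    """Split (position, label) pairs into positives and negatives for depth d.

    Positives carry the label d; negatives carry any later label.
    Assumption: examples with an earlier label are not used at all.
    """
    positives = [pos for pos, label in examples if RANK[label] == RANK[d]]
    negatives = [pos for pos, label in examples if RANK[label] > RANK[d]]
    return positives, negatives


# Placeholder position ids, made up for illustration only.
examples = [("p0", "draw"), ("p1", "zero"), ("p2", "one"), ("p3", "sixteen")]
positives, negatives = depth_subproblem(examples, "zero")
print(positives, negatives)
```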

   The typical complexity of induced classifiers in this domain suggests
   that the task is demanding when background knowledge is restricted.


4. Relevant Information:
   An Inductive Logic Programming (ILP) or relational learning framework is
   assumed (Muggleton, 1992). The learning system is provided with examples
   of chess positions described only by the coordinates of the pieces on the
   board. Background knowledge in the form of row and column differences is
   also supplied. The relations necessary to form a correct and concise
   classifier for the target concept must be discovered by the learning system
   (the examples already provide a complete extensional definition).
   The task is closely related to Quinlan's (1983) application of ID3 to
   classify White King and Rook against Black King and Knight (KRKN) positions
   as lost 2-ply or lost 3-ply. The framework is similar in that the example
   positions supply only low-grade data. An important difference is that
   additional background predicates of the kind supplied in the KRKN study via
   hand-crafted attributes are not provided for this KRK domain.
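
As a rough illustration of background knowledge "in the form of row and column differences", the sketch below converts a position to integer coordinates and computes the differences between each pair of pieces. The exact predicates used in the original experiments are not given here, so the use of absolute differences, and the names `square_to_coords` and `piece_differences`, are assumptions.

```python
from itertools import combinations

PIECES = ("WK", "WR", "BK")  # White King, White Rook, Black King


def square_to_coords(file_letter, rank):
    """Map a file 'a'..'h' and a rank 1..8 to integer (column, row)."""
    return ord(file_letter) - ord("a") + 1, int(rank)


def piece_differences(row):
    """Return {(piece_a, piece_b): (column difference, row difference)}.

    row: (wk_file, wk_rank, wr_file, wr_rank, bk_file, bk_rank)
    Assumption: differences are taken as absolute values.
    """
    coords = [square_to_coords(row[i], row[i + 1]) for i in (0, 2, 4)]
    diffs = {}
    for (name_a, a), (name_b, b) in combinations(zip(PIECES, coords), 2):
        diffs[(name_a, name_b)] = (abs(a[0] - b[0]), abs(a[1] - b[1]))
    return diffs


# A position with the white king on a1, rook on b3 and black king on c2.
diffs = piece_differences(("a", 1, "b", 3, "c", 2))
print(diffs)
```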


5. Number of Instances: 28056


6. Number of Attributes:
   There are six attribute variables and one class variable.


7. Attribute Information:
   1. White King file (column)
   2. White King rank (row)
   3. White Rook file
   4. White Rook rank
   5. Black King file
   6. Black King rank
   7. optimal depth-of-win for White in 0 to 16 moves, otherwise drawn
	{draw, zero, one, two, ..., sixteen}.

In [2]:
import pandas as pd
# The file has no header row: columns 0-5 hold the piece coordinates, column 6 the class
data = pd.read_csv('krkopt.data', sep=",", engine='python', header=None)
print(data)

       0  1  2  3  4  5        6
0      a  1  b  3  c  2     draw
1      a  1  c  1  c  2     draw
2      a  1  c  1  d  1     draw
3      a  1  c  1  d  2     draw
4      a  1  c  2  c  1     draw
...   .. .. .. .. .. ..      ...
28051  b  1  g  7  e  5  sixteen
28052  b  1  g  7  e  6  sixteen
28053  b  1  g  7  e  7  sixteen
28054  b  1  g  7  f  5  sixteen
28055  b  1  g  7  g  5  sixteen

[28056 rows x 7 columns]


## Problem statement: given the positions of the three pieces, predict the number of moves White needs to win

1. Proportion of wins and draws
2. Correlation between each square and the outcome
3. Dependence between each square and the outcome
4. Identify any preprocessing needed for a decision tree model

Total number of ways to place the three pieces on distinct squares:
64 * 63 * 62 = 249 984

White king in a corner: 3 restricted squares, or 2 if the rook is adjacent (4 cases where the black king has 60 choices) (the rook has 62 choices)
White king on an edge: 5 restricted squares, or 4 if the rook is adjacent (24 cases where the black king has 58 choices) (the rook has 62 choices)
White king in the middle: 8 restricted squares, or 7 if the rook is adjacent (36 cases where the black king has 55 choices) (the rook has 62 choices)

- 4 * 60 * 62 = 14 880
- 24 * 58 * 62 = 86 304
- 36 * 55 * 62 = 122 760
- 14880 + 86304 + 122760 = 223 944
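
The total of 223 944 can be checked by brute force. The sketch below makes the same simplifying assumptions as the hand count: the two kings may not share a square or stand on adjacent squares, and the rook may go on any of the 62 remaining squares. It does not check whose turn it is or whether the position is otherwise legal. The helper name `kings_adjacent` is made up for this example.

```python
from itertools import product

squares = list(product(range(8), range(8)))  # (column, row) pairs


def kings_adjacent(a, b):
    """True if the squares are identical or touch (Chebyshev distance <= 1)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= 1


count = 0
for wk in squares:
    for bk in squares:
        if kings_adjacent(wk, bk):
            continue
        # The rook can occupy any square not taken by a king.
        count += len(squares) - 2

print(count)  # 223944
```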




### Proportion of Wins and Draws

In [3]:
x = data.iloc[:,6].value_counts()
Results = pd.DataFrame({'Number':x,'pourcent':(x/x.sum())*100}, index = x.index)
Results

Unnamed: 0,Number,pourcent
fourteen,4553,16.228258
thirteen,4194,14.948674
twelve,3597,12.820787
eleven,2854,10.172512
draw,2796,9.965783
fifteen,2166,7.720274
ten,1985,7.075135
nine,1712,6.102082
eight,1433,5.107642
seven,683,2.434417


### Preprocessing

In [4]:
print(data.iloc[0,0] + str(data.iloc[0,1]))


a1


In [5]:
# Build one square label per piece (e.g. 'a1') by joining the file and rank columns
data2 = pd.DataFrame({
    'WhiteKing': data[0] + data[1].astype(str),
    'WhiteRook': data[2] + data[3].astype(str),
    'BlackKing': data[4] + data[5].astype(str),
    'Results': data[6],
})
data2

Unnamed: 0,WhiteKing,WhiteRook,BlackKing,Results
0,a1,b3,c2,draw
1,a1,c1,c2,draw
2,a1,c1,d1,draw
3,a1,c1,d2,draw
4,a1,c2,c1,draw
...,...,...,...,...
28051,b1,g7,e5,sixteen
28052,b1,g7,e6,sixteen
28053,b1,g7,e7,sixteen
28054,b1,g7,f5,sixteen


### First Look at the Data

In [6]:
print(data2.iloc[:,0].value_counts())
print(len(data2.iloc[:,0].value_counts()))

c1    3596
b1    3596
d1    3596
d2    3410
d3    3410
c2    3410
a1    1878
c3    1720
d4    1720
b2    1720
Name: WhiteKing, dtype: int64
10


In [7]:
print(data2.iloc[:,1].value_counts())
print(len(data2.iloc[:,1].value_counts()))

e2    455
e3    455
e1    454
e4    454
e5    453
     ... 
d2    403
d3    402
c1    400
b1    399
d1    398
Name: WhiteRook, Length: 64, dtype: int64
64


In [8]:
print(data2.iloc[:,2].value_counts())
print(len(data2.iloc[:,2].value_counts()))

h2    620
f3    620
h6    620
g4    620
f4    620
     ... 
d2    248
c1    248
b2    220
c3    220
c2    124
Name: BlackKing, Length: 64, dtype: int64
64


### Correlation Between Each Square and the Outcome

In [11]:
selection = data2[data2.Results == 'sixteen']
selection

Unnamed: 0,WhiteKing,WhiteRook,BlackKing,Results
27666,a1,b2,c2,sixteen
27667,a1,b2,c3,sixteen
27668,a1,b2,d2,sixteen
27669,a1,b2,d3,sixteen
27670,a1,b2,d4,sixteen
...,...,...,...,...
28051,b1,g7,e5,sixteen
28052,b1,g7,e6,sixteen
28053,b1,g7,e7,sixteen
28054,b1,g7,f5,sixteen


In [23]:
# Count how often each white rook square occurs among the 'sixteen' positions
a = selection.iloc[:,1].value_counts()
a

c6    27
f6    25
c2    23
d6    21
f2    20
g6    18
f3    18
c7    18
g7    17
g3    16
f7    14
h3    13
c3    12
h7    12
b2    12
h6    11
b6    11
g2    10
h8     9
h2     9
f8     9
c8     9
g8     8
b8     6
b7     6
b3     6
d7     6
d2     5
d3     4
e2     4
g4     4
d4     3
f4     2
c4     1
b5     1
Name: WhiteRook, dtype: int64

### Analysis of the White King on a1

In [26]:
Resultat = ["draw","zero","one","two","three","four","five","six","seven","eight","nine","ten","eleven","twelve","thirteen","fourteen","fifteen","sixteen"]
P = []
total = data2.iloc[:,0].value_counts()

for x in Resultat:
    selection = data2[data2.Results == x]
    position = selection.iloc[:,0].value_counts()
    # .get returns 0 when 'a1' never occurs for this outcome
    P.append(position.get('a1', 0))

# Share of each outcome among all positions with the white king on a1
data_P = pd.DataFrame({'Nombre':P,'Probabilité':[n/total['a1'] for n in P]},index = Resultat)
data_P

Unnamed: 0,Nombre,Probabilité
draw,200,0.106496
zero,0,0.0
one,0,0.0
two,0,0.0
three,0,0.0
four,0,0.0
five,0,0.0
six,0,0.0
seven,12,0.00639
eight,19,0.010117


Interpretation: with the white king on a1, about 10.6% of the positions in the dataset (200 of 1878) are draws.

### Analysis of the White Rook on c6

In [28]:
Resultat = ["draw","zero","one","two","three","four","five","six","seven","eight","nine","ten","eleven","twelve","thirteen","fourteen","fifteen","sixteen"]
P = []
total = data2.iloc[:,1].value_counts()

for x in Resultat:
    selection = data2[data2.Results == x]
    position_tour = selection.iloc[:,1].value_counts()
    # .get returns 0 when 'c6' never occurs for this outcome
    P.append(position_tour.get('c6', 0))

# Share of each outcome among all positions with the white rook on c6
data_P = pd.DataFrame({'Nombre':P,'Probabilité':[n/total['c6'] for n in P]},index = Resultat)
data_P

Unnamed: 0,Nombre,Probabilité
draw,48,0.110345
zero,0,0.0
one,1,0.002299
two,4,0.009195
three,1,0.002299
four,0,0.0
five,5,0.011494
six,9,0.02069
seven,2,0.004598
eight,23,0.052874
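
The two analyses above repeat the same computation for a different piece and square. A possible way to factor it out is sketched below in plain Python, so it can be tried without the dataset. The function name `outcome_distribution` and the dict-based row format are assumptions, and the four sample rows are copied from the outputs shown earlier.

```python
from collections import Counter


def outcome_distribution(rows, piece, square):
    """Share of each outcome among the rows where `piece` stands on `square`.

    rows: iterable of dicts with keys 'WhiteKing', 'WhiteRook',
          'BlackKing' and 'Results'.
    Returns an empty dict if the square never occurs for that piece.
    """
    counts = Counter(r["Results"] for r in rows if r[piece] == square)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


rows = [
    {"WhiteKing": "a1", "WhiteRook": "b3", "BlackKing": "c2", "Results": "draw"},
    {"WhiteKing": "a1", "WhiteRook": "c1", "BlackKing": "c2", "Results": "draw"},
    {"WhiteKing": "a1", "WhiteRook": "b2", "BlackKing": "c2", "Results": "sixteen"},
    {"WhiteKing": "b1", "WhiteRook": "g7", "BlackKing": "e5", "Results": "sixteen"},
]
dist = outcome_distribution(rows, "WhiteKing", "a1")
print(dist)
```

With the full table, the same call could be made on `data2.to_dict("records")`.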


In [12]:
import openpyxl  # engine pandas uses to write .xlsx files

def dataframes_to_excel(df):
    # The context manager saves and closes the file on exit
    # (ExcelWriter.save() was removed in pandas 2.0)
    with pd.ExcelWriter(r'C:\Users\Andy\Documents\3A\GHATTAS\analyse.xlsx', engine='openpyxl') as writer:
        df.to_excel(writer)
    print('DataFrame is written successfully to Excel File.')

dataframes_to_excel(selection)

DataFrame is written successfully to Excel File.
