# **Initial Analysis**
##### Chess Games Database - Available at [Kaggle](https://www.kaggle.com/datasets/arevel/chess-games/data)

This database is used throughout the project to analyse the data, find insights, and build a model that evaluates a player's weaknesses. The model can then be combined with processed data from the Lichess or chess.com APIs to assess the performance of our target audience.

### **Content**

This dataset contains 6.25 million chess games played on lichess.org during July 2016.
Some of the games include Stockfish evaluations such as *[%eval 2.35] (235 centipawn advantage)*, always from White's point of view. Each evaluation refers to the move a player has just made.
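
The evaluation annotations can be pulled out of a game's move text with a regular expression. The snippet below is a minimal sketch and not part of the original analysis: the function name `extract_evals_centipawns` and the sample string are hypothetical, and it assumes evaluations are written as decimal pawn values inside `[%eval ...]`. Mate scores, which appear to be written with a `#`, are skipped here.

```python
import re
from typing import List

# Matches evaluation comments such as "[%eval 2.35]" or "[%eval -0.4]".
# Annotations that do not contain a plain number (e.g. "[%eval #-3]") are ignored.
EVAL_PATTERN = re.compile(r"\[%eval (-?\d+(?:\.\d+)?)\]")


def extract_evals_centipawns(movetext: str) -> List[int]:
    """Return the evaluations found in a movetext string, converted to centipawns."""
    return [round(float(value) * 100) for value in EVAL_PATTERN.findall(movetext)]


# Hand-written sample for illustration only; not taken from the dataset.
sample = (
    "1. e4 { [%eval 0.17] } 1... c5 { [%eval 0.19] } "
    "2. Nf3 { [%eval 2.35] } 2... d6 { [%eval #-3] }"
)
print(extract_evals_centipawns(sample))  # [17, 19, 235]
```

Positive values favour White and negative values favour Black, since the evaluations are always given from White's point of view.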

- **Event**: Game type.
- **White**: White's ID.
- **Black**: Black's ID.
- **Result**: Game result (1-0 White wins, 0-1 Black wins, 1/2-1/2 draw).
- **UTCDate**: UTC Date.
- **UTCTime**: UTC Time.
- **WhiteElo**: White's Elo rating.
- **BlackElo**: Black's Elo rating.
- **WhiteRatingDiff**: White's rating points difference after the game.
- **BlackRatingDiff**: Black's rating points difference after the game.
- **ECO**: Opening in ECO encoding.
- **Opening**: Opening name.
- **TimeControl**: Time control in seconds, written as base+increment: each player's initial clock time, followed by the number of seconds added to the clock after every move (for example, 300+5). Correspondence games show "-" instead.
- **Termination**: Reason the game ended.
- **AN**: Moves in Movetext format.
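
To work with **TimeControl** numerically, the string has to be split into its two parts. The following is a minimal sketch under the assumption that values follow the lichess convention of initial time plus a per-move increment, both in seconds. The helper name `parse_time_control` is hypothetical and not part of the original notebook.

```python
from typing import Optional, Tuple


def parse_time_control(value: str) -> Optional[Tuple[int, int]]:
    """Split a 'base+increment' string into (base_seconds, increment_seconds).

    Returns None when the value has no clock information, such as the '-'
    that appears for correspondence games.
    """
    base, sep, increment = value.strip().partition("+")
    if not sep or not base.isdigit() or not increment.isdigit():
        return None
    return int(base), int(increment)


print(parse_time_control("300+5"))  # (300, 5)
print(parse_time_control("180+0"))  # (180, 0)
print(parse_time_control("-"))      # None
```

Once the DataFrame is loaded, the same function could be applied column-wide with `df["TimeControl"].map(parse_time_control)`.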

#### 1. Importing Packages

In [294]:
import pandas as pd

In [295]:
path = '../../dados/experimentos/chess_games.csv'
df = pd.read_csv(path, nrows=100000)
df.head()

| | Event | White | Black | Result | UTCDate | UTCTime | WhiteElo | BlackElo | WhiteRatingDiff | BlackRatingDiff | ECO | Opening | TimeControl | Termination | AN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Classical | eisaaaa | HAMID449 | 1-0 | 2016.06.30 | 22:00:01 | 1901 | 1896 | 11.0 | -11.0 | D10 | Slav Defense | 300+5 | Time forfeit | 1. d4 d5 2. c4 c6 3. e3 a6 4. Nf3 e5 5. cxd5 e... |
| 1 | Blitz | go4jas | Sergei1973 | 0-1 | 2016.06.30 | 22:00:01 | 1641 | 1627 | -11.0 | 12.0 | C20 | King's Pawn Opening: 2.b3 | 300+0 | Normal | 1. e4 e5 2. b3 Nf6 3. Bb2 Nc6 4. Nf3 d6 5. d3 ... |
| 2 | Blitz tournament | Evangelistaizac | kafune | 1-0 | 2016.06.30 | 22:00:02 | 1647 | 1688 | 13.0 | -13.0 | B01 | Scandinavian Defense: Mieses-Kotroc Variation | 180+0 | Time forfeit | 1. e4 d5 2. exd5 Qxd5 3. Nf3 Bg4 4. Be2 Nf6 5.... |
| 3 | Correspondence | Jvayne | Wsjvayne | 1-0 | 2016.06.30 | 22:00:02 | 1706 | 1317 | 27.0 | -25.0 | A00 | Van't Kruijs Opening | - | Normal | 1. e3 Nf6 2. Bc4 d6 3. e4 e6 4. Nf3 Nxe4 5. Nd... |
| 4 | Blitz tournament | kyoday | BrettDale | 0-1 | 2016.06.30 | 22:00:02 | 1945 | 1900 | -14.0 | 13.0 | B90 | Sicilian Defense: Najdorf, Lipnitsky Attack | 180+0 | Time forfeit | 1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. N... |


In [296]:
df.describe()

| | WhiteElo | BlackElo | WhiteRatingDiff | BlackRatingDiff |
|---|---|---|---|---|
| count | 100000.0 | 100000.0 | 99954.0 | 99954.0 |
| mean | 1737.93875 | 1735.91776 | 0.623887 | -0.189527 |
| std | 264.168375 | 265.711759 | 22.549863 | 22.435702 |
| min | 799.0 | 738.0 | -537.0 | -570.0 |
| 25% | 1557.0 | 1554.0 | -9.0 | -10.0 |
| 50% | 1738.0 | 1735.0 | 1.0 | -1.0 |
| 75% | 1914.25 | 1914.0 | 10.0 | 9.0 |
| max | 2737.0 | 2731.0 | 638.0 | 644.0 |


In [297]:
print(f"Columns: {df.columns}")

Columns: Index(['Event', 'White', 'Black', 'Result', 'UTCDate', 'UTCTime', 'WhiteElo',
       'BlackElo', 'WhiteRatingDiff', 'BlackRatingDiff', 'ECO', 'Opening',
       'TimeControl', 'Termination', 'AN'],
      dtype='object')


In [298]:
df.Event.value_counts()

Event
 Blitz                    37532
 Classical                24832
 Bullet                   19766
 Bullet tournament         7379
 Blitz tournament          7342
 Classical tournament      2752
 Correspondence             396
Blitz tournament              1
Name: count, dtype: int64
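
The output above appears to list `Blitz tournament` twice, which suggests that some **Event** labels carry leading or trailing whitespace and are therefore counted as separate categories. The sketch below illustrates the effect with hand-made labels (an assumption, not values read from the dataset) and shows that stripping whitespace merges the duplicates.

```python
from collections import Counter

# Hand-made labels imitating the output above; not taken from the dataset.
raw_events = [" Blitz ", " Blitz tournament ", "Blitz tournament", " Bullet ", " Blitz "]

# Counting the raw strings keeps the padded and unpadded variants apart.
raw_counts = Counter(raw_events)

# Stripping surrounding whitespace first merges them into one category.
clean_counts = Counter(label.strip() for label in raw_events)

print(len(raw_counts))                   # 4 distinct labels before stripping
print(len(clean_counts))                 # 3 distinct labels after stripping
print(clean_counts["Blitz tournament"])  # 2
```

On the DataFrame itself, the equivalent cleanup would be `df["Event"] = df["Event"].str.strip()` before calling `value_counts()`.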

In [299]:
df.White.value_counts()

White
playfasterordie     110
BeautifulSquares     94
ssww94               93
companal2            91
palang1359           89
                   ... 
billy_bazooka         1
StopDryFarting        1
Waclock               1
Humus_Land            1
users7777             1
Name: count, Length: 24307, dtype: int64

In [300]:
df.Black.value_counts()

Black
playfasterordie     110
companal2            99
BeautifulSquares     88
ssww94               86
Beserking101         82
                   ... 
Uz_2020               1
Nidert                1
Diegouuu              1
Jok777                1
Drex78                1
Name: count, Length: 24112, dtype: int64

In [301]:
df.Result.value_counts()

Result
1-0        49713
0-1        46535
1/2-1/2     3752
Name: count, dtype: int64

In [302]:
df.Opening.value_counts()

Opening
Van't Kruijs Opening                                             2064
Scandinavian Defense: Mieses-Kotroc Variation                    1854
Modern Defense                                                   1613
Horwitz Defense                                                  1485
Sicilian Defense                                                 1402
                                                                 ... 
Queen's Indian Defense: Fianchetto Variation                        1
Blackmar-Diemer Gambit Declined, Elbert Countergambit               1
Russian Game: Moody Gambit                                          1
Benko Gambit Declined, Quiet Line                                   1
Bishop's Opening: Calabrese Countergambit, Jaenisch Variation       1
Name: count, Length: 2087, dtype: int64

In [303]:
df.TimeControl.value_counts()

TimeControl
300+0      16840
180+0      15297
60+0       15001
600+0       9974
30+0        3360
           ...  
2100+10        1
960+10         1
1500+4         1
2100+8         1
2700+2         1
Name: count, Length: 366, dtype: int64

In [304]:
df.Termination.value_counts()

Termination
Normal              67644
Time forfeit        32117
Abandoned             238
Rules infraction        1
Name: count, dtype: int64