# AI Bot: Teaching a model to play like Magnus Carlsen

<b>Overview</b> <br>
The project involves building a neural network-based AI that plays chess, with the goal of mimicking the playing style of Magnus Carlsen. The AI is trained on data extracted from games played by Magnus Carlsen, with positions evaluated by the Stockfish chess engine. The project uses TensorFlow to train the neural networks and takes a phased approach, training separate models for different stages of the game: opening, postopening, middlegame, postmiddlegame, and endgame.

<b>Data Preparation</b> <br>
The data used for training the models is sourced from a PGN (Portable Game Notation) file containing games played by Magnus Carlsen. The parse_pgn function reads the PGN file and extracts chess positions (in FEN format) and their evaluations from Stockfish. Each position is annotated with the corresponding phase of the game and the best move. This information is saved in a CSV file for further use. The phases are determined based on the ply count (half-moves) of the game, with each phase representing a specific segment of a game.  
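Here is a minimal sketch of the phase labelling and CSV output. It maps a ply count to one of the five phases and writes the annotated rows with Python's `csv` module. The ply boundaries (20/40/60/80), the column names, and the `positions_to_csv` helper are hypothetical. The project's actual thresholds are not stated here, and the PGN and Stockfish parsing inside `parse_pgn` is not reproduced.

```Python
import csv
import io
from bisect import bisect_right

# Hypothetical ply boundaries: the project's real cut-offs are not stated in this README.
PHASE_BOUNDARIES = [20, 40, 60, 80]
PHASES = ["opening", "postopening", "middlegame", "postmiddlegame", "endgame"]


def phase_for_ply(ply: int) -> str:
    """Map a ply count (half-moves already played) to one of the five phases."""
    # bisect_right counts how many boundaries are <= ply, which is the phase index.
    return PHASES[bisect_right(PHASE_BOUNDARIES, ply)]


def positions_to_csv(records) -> str:
    """Write (fen, evaluation, ply, best_move) records as CSV text with a phase column."""
    out = io.StringIO()
    writer = csv.writer(out)
    # Hypothetical column names for the positions file.
    writer.writerow(["fen", "evaluation", "phase", "best_move"])
    for fen, evaluation, ply, best_move in records:
        writer.writerow([fen, evaluation, phase_for_ply(ply), best_move])
    return out.getvalue()
```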

<b>Data Encoding</b>
<br>
To make the data suitable for training a neural network, the chess board positions are encoded into a numerical format. The encode_board function converts a FEN string into a one-hot encoded matrix, where each piece type (including empty squares) is represented by a unique binary vector. This encoding captures the spatial configuration of pieces on the board, which is essential for the neural network to learn and make predictions.  <br>

<b>Training the Models</b> <br>
The project involves training five separate models, each corresponding to a different phase of the game. This phased approach is used because the nature of the game changes significantly from the opening to the endgame, and a single model might not capture the nuances of each phase. For each phase, the dataset is split into training and validation sets, and a neural network model is built and trained on these sets. The models are relatively simple: a flattening layer followed by dense layers with ReLU activation functions. Results could likely be improved by adding more layers with more neurons, but for our purposes this was enough.<br>
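The per-phase split can be sketched in plain Python as follows. The `phase` key, the 20% validation fraction, the fixed seed, and the `split_by_phase` helper are all assumptions for illustration.

```Python
import random
from collections import defaultdict


def split_by_phase(rows, val_fraction=0.2, seed=42):
    """Group rows by their 'phase' key and return {phase: (train_rows, val_rows)}."""
    by_phase = defaultdict(list)
    for row in rows:
        by_phase[row["phase"]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {}
    for phase, phase_rows in by_phase.items():
        shuffled = phase_rows[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_fraction)
        splits[phase] = (shuffled[n_val:], shuffled[:n_val])
    return splits
```

Each `(train, val)` pair would then feed the model for its phase. This keeps an opening position from ever being used to validate the endgame model.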

<b> Examples of data fed to a model</b> <br>
Now that the basics are covered, let's take a closer look at what the model is actually fed: a matrix. <br>
For example: <br>
Each piece has its own index: <br>
<blockquote>rnbqkpRNBQKP.</blockquote> <br>
Following FEN notation, lowercase letters represent black pieces and uppercase letters represent white pieces. The dot at the end stands for an empty square. Because the boards in the CSV file are stored in FEN notation, every square maps directly to one of these indices. Doing this for each square on the board gives a 64x13 matrix
(64 squares on the board and 13 possibilities per square, including the empty square). <br>
So if we encounter a black rook (`r`, index 0), its row looks like this: <br>
<blockquote>[1, 0, 0, ... , 0]</blockquote> 
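Below is a minimal sketch of such an encoder. It assumes the index order above and reads only the piece-placement field of the FEN string, ignoring side to move, castling rights, and the remaining fields. Squares come out in FEN order (a8 to h8, then a7 to h7, and so on). This is one way to produce the 64x13 matrix described, not the project's actual `encode_board`.

```Python
PIECES = "rnbqkpRNBQKP."  # index order from the text; '.' marks an empty square


def encode_board(fen: str) -> list[list[int]]:
    """One-hot encode the piece-placement field of a FEN string as a 64x13 matrix."""
    placement = fen.split()[0]
    indices = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                # A digit n in FEN stands for n consecutive empty squares.
                indices.extend([PIECES.index(".")] * int(ch))
            else:
                indices.append(PIECES.index(ch))  # raises ValueError on unknown symbols
    if len(indices) != 64:
        raise ValueError(f"expected 64 squares, got {len(indices)}")
    # One row per square, with a single 1 at the piece's index.
    return [[1 if i == idx else 0 for i in range(len(PIECES))] for idx in indices]
```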

<b>Problems and possible solutions</b> <br>
The main problem with this model was that it did not think ahead the way humans or engines usually do. It focused only on the current position
and did not consider what would happen a few moves later. To make it look ahead and discard bad moves that it rated as good,
we had it 'play' the move it considered best and then analyze the resulting position again to see whether it was missing something. This does not fix the problem
completely, but it removes some of the biggest mistakes that can happen.
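One way to express this "play the move, then look again" check is a shallow two-ply search. First, rank the model's candidate moves. Then, for each of the top few, assume the opponent answers with the reply that is worst for us, and keep the candidate that holds up best. The sketch is written against hypothetical `legal_moves`, `play`, and `evaluate` callables. In this project, `evaluate` would be the phase model scoring a position from the bot's point of view. It illustrates the idea and is not the project's code.

```Python
from typing import Callable, Sequence, TypeVar

Position = TypeVar("Position")
Move = TypeVar("Move")


def choose_move_with_lookahead(
    position: Position,
    legal_moves: Callable[[Position], Sequence[Move]],
    play: Callable[[Position, Move], Position],
    evaluate: Callable[[Position], float],  # higher = better for the bot (assumption)
    top_k: int = 3,
) -> Move:
    # Rank our moves by the model's one-ply opinion and keep the top few.
    candidates = sorted(
        legal_moves(position),
        key=lambda m: evaluate(play(position, m)),
        reverse=True,
    )[:top_k]
    if not candidates:
        raise ValueError("no legal moves")

    def worst_case(move: Move) -> float:
        after = play(position, move)
        replies = legal_moves(after)
        if not replies:
            return evaluate(after)
        # Assume the opponent picks the reply that is worst for us.
        return min(evaluate(play(after, r)) for r in replies)

    # max() keeps the first (best-ranked) candidate when worst cases tie.
    return max(candidates, key=worst_case)
```

In a toy tree where move `a1` looks best (score 5) but allows a reply scoring -10, and move `a2` scores 1 with no bad reply, the function picks `a2`. That is the kind of "stupid move" the check is meant to filter out.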

<b>Scripts and csv used for this project:</b> <br>
[AI_model](https://github.com/PastMatter/PR24MVDBACSAJR/tree/main/Ai_model) <br>
5phases_model.py (for better results, you can add more layers and more neurons to the models)<br>
positions.csv (NOTE: this is not the full dataset used to train the model; the original was too big to upload to Git) <br>
To get the full data, run the sorting_data.py script on the Carlsen.pgn file. (NOTE: this takes some time, since the original positions.csv had 400,000 lines)

# Elo Rating and Age Correlation: Analyzing the Impact of Age on Chess Performance

The relationship between age and performance in chess has long intrigued enthusiasts and researchers alike. One of the most prominent metrics for evaluating a player's skill level in chess is the Elo rating system, developed by Arpad Elo. Elo ratings offer a standardized method to compare players' proficiency, dynamically updating based on their game outcomes. By analyzing the correlation between players' ages and their Elo ratings, we can gain valuable insights into how age impacts performance, identifying trends such as peak performance ages and the typical career lifespan of top players.

<b>Code Overview</b> <br>
The provided code performs an analysis of the correlation between chess players' ages and their Elo ratings. The process includes data preparation, visualization, correlation analysis, and polynomial regression to explore the relationship between age and performance.

The data is read from a CSV file containing chess players' rankings from 1851 to 2001. The pd.read_csv function reads the data, handling potential encoding issues. The 'Age' column is converted to numeric, and rows with missing values in either 'Age' or 'Rating' are dropped to ensure clean data for analysis.
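The cleaning step can be sketched with pandas as follows. `pd.to_numeric(..., errors="coerce")` and `DataFrame.dropna(subset=...)` are standard pandas calls, while the `clean_age_rating` name is hypothetical.

```Python
import pandas as pd


def clean_age_rating(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce 'Age' to numbers and drop rows missing 'Age' or 'Rating'."""
    df = df.copy()
    # Non-numeric ages (blanks, stray text) become NaN instead of raising an error.
    df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
    return df.dropna(subset=["Age", "Rating"])
```

Typical use would be `data_df = clean_age_rating(pd.read_csv("ranking_chessplayers_1851_2001.csv", encoding="latin-1"))`. The `latin-1` encoding is only a guess at how the encoding issues were handled.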

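```Python
from scipy.stats import pearsonr

# Correlation analysis (data_df is the cleaned DataFrame with numeric 'Age' and 'Rating')
correlation, _ = pearsonr(data_df['Age'], data_df['Rating'])
print(f"Correlation between age and Elo rating: {correlation:.2f}")
```

Output: `Correlation between age and Elo rating: 0.02`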

The Pearson correlation coefficient is calculated to quantify the strength and direction of the relationship between age and Elo rating. A correlation coefficient close to 0 suggests a weak or no linear relationship.
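For reference, `pearsonr` computes the standard sample Pearson coefficient. Here $x_i$ is a player's age, $y_i$ the corresponding Elo rating, and $\bar{x}$, $\bar{y}$ their means. The value always lies between -1 and 1, so 0.02 is essentially no linear relationship.

```math
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```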

![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/report_alem/peakPerformance-Alem.png?raw=true)

Polynomial regression is used to model the relationship between age and Elo rating. The PolynomialFeatures class from sklearn.preprocessing transforms the age data into polynomial features, and LinearRegression fits a polynomial model to the data. The predicted ratings are then plotted against the actual data, showing the fitted polynomial curve.
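Here is a minimal sketch of this fit using the two scikit-learn classes named above. The polynomial degree (2), the helper names, and the grid search for the age at which the fitted curve peaks are assumptions, not taken from the notebook.

```Python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures


def fit_age_curve(ages, ratings, degree=2):
    """Fit rating as a polynomial in age; returns the feature transformer and the model."""
    X = np.asarray(ages, dtype=float).reshape(-1, 1)
    y = np.asarray(ratings, dtype=float)
    # include_bias=False because LinearRegression already fits an intercept.
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    model = LinearRegression().fit(poly.fit_transform(X), y)
    return poly, model


def predicted_peak_age(poly, model, age_min=15, age_max=60, steps=451):
    """Evaluate the fitted curve on an age grid and return the age with the highest rating."""
    grid = np.linspace(age_min, age_max, steps).reshape(-1, 1)
    predictions = model.predict(poly.transform(grid))
    return float(grid[np.argmax(predictions), 0])
```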

<b>Scripts and data used:</b> <br>
[ELO studies](https://github.com/PastMatter/PR24MVDBACSAJR/tree/main/report_alem) <br>
elo_studies.ipynb <br>
ranking_chessplayers_1851_2001.csv

# Cheaters in chess

As our world becomes more and more competitive, so does the world of chess. For many, cheating seems like the easy way out,
either the old-fashioned way with an earpiece or by simply copying moves from a machine.
Cheating is evolving, and so are the ways of detecting it.

<b>ANALYSIS OF ELO RANK GROWTH USING A Q-Q PLOT</b> <br>
<br>
I will try to find out whether, in the dataset "ranking_chessplayers_1851_2001.csv", which records changes in Elo
over twenty years, anyone's Elo has changed to such a degree that it would be an outlier in the data.
For that I will use a Q-Q plot, which shows whether the data fits a given distribution.

As a general assumption, I will set the growth threshold at 50 points, which is the normal Elo growth
we can expect from high-ranking players.

With the Q-Q plot we can see whether the values actually follow the assumed distribution.
Outliers would overperform the distribution and would be quickly visible.
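To make the idea concrete, here is a stdlib-only sketch of how Q-Q points can be computed. The sorted observations are paired with quantiles of a reference distribution fitted to the same sample. The Pearson r of those pairs (a probability-plot correlation) then summarizes how straight the plot is. The plotting positions `(i + 0.5) / n`, the min/max fit for the uniform case, and the helper names are assumptions of this sketch. The notebook may build its plots differently.

```Python
from math import sqrt
from statistics import NormalDist, fmean, pstdev


def qq_points(sample, dist="normal"):
    """Return (theoretical quantile, observed value) pairs for a Q-Q plot."""
    data = sorted(sample)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Plotting positions (i + 0.5) / n avoid the infinite quantiles at 0 and 1.
    probs = [(i + 0.5) / n for i in range(n)]
    if dist == "normal":
        sigma = pstdev(data)
        if sigma == 0:
            raise ValueError("sample has zero variance")
        fitted = NormalDist(fmean(data), sigma)
        theoretical = [fitted.inv_cdf(p) for p in probs]
    elif dist == "uniform":
        lo, hi = data[0], data[-1]
        theoretical = [lo + p * (hi - lo) for p in probs]
    else:
        raise ValueError(f"unsupported distribution: {dist}")
    return list(zip(theoretical, data))


def straightness(points):
    """Pearson r of the Q-Q points; values near 1 mean the points lie on a line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

You could pass the per-player Elo changes to `qq_points(..., "normal")` and `qq_points(..., "uniform")` and compare the two `straightness` scores. This gives a numeric counterpart to the two plots. Points far above the line at the top end would be the candidate outliers.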

![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/report-Rozman/qqNormalDistribution.png?raw=true)

We can clearly see that the distribution isn't normal, which is an interesting implication in itself.
A normal distribution would mean that some players naturally climb very fast, well past the threshold of fifty points,
and some fall quite quickly, but that isn't the case.
We can try a uniform distribution instead.

![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/report-Rozman/qqUniformDistribution.png?raw=true)

This is quite interesting. A uniform distribution implies that the high-ranking players in the data
have an equal chance of gaining and losing Elo, and that no one fell outside this bound. That shows either great sportsmanship or
a great ability to cheat.
We can conclude that highly rated players don't simply climb to their top-tier ranking quickly and then stay there, but
that it's a slow and gradual process of continually and slowly grinding out their well deserved rankings.

<b>Conclusion</b> <br>
If you want to become a chess grandmaster, you have to play a lot of games; there are no shortcuts.

<b>Scripts and data used</b> <br>
[Cheaters in chess](https://github.com/PastMatter/PR24MVDBACSAJR/tree/main/outliersCheatersQQ) <br>
outliersCheatersQQ.ipynb <br>
ranking_chessplayers_1851_2001.csv

# Socio-Economic Factors and Chess Performance

<b>Introduction</b> <br>
This report explores the relationship between socio-economic factors and chess performance, focusing on the ratings of titled chess players (Grandmasters, International Masters, and Candidate Masters) across different countries. The analysis utilizes various socio-economic indicators, including GDP per capita, education enrollment rates, and happiness scores, to understand how these factors correlate with chess ratings and the number of titled players per capita.


<b>Data Collection and Preparation</b> <br>

The data used in this analysis was sourced from several datasets:

<b>GDP per Capita:</b> Extracted from the World Bank.

<b>World Happiness Report:</b> Provided insights into the happiness scores and various socio-economic indicators for countries worldwide.

<b>FIDE 2023 Chess Players:</b> Contained detailed information about titled chess players, including their ratings and countries.

<b>Additional Socio-Economic Data:</b> Included various socio-economic metrics from different sources.

The data was merged and cleaned to ensure consistency and accuracy. Non-numeric values were converted, and missing values were handled appropriately. Titled players were grouped by country, and their total numbers were calculated for analysis.

<b>Correlation Analysis Between Chess Ratings and Socio-Economic Stats</b>

In this section, we examined the correlation between chess ratings (Standard, Rapid, Blitz) and various socio-economic factors. The data was normalized using MinMaxScaler to ensure consistent scaling across different metrics. A correlation matrix was computed to quantify the relationships.
The heatmap visualization provided insights into the strength of these correlations. Most correlations were relatively weak, but factors such as health, economy, and family support were positively correlated with chess ratings, whereas corruption was negatively correlated with chess performance.
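Here is a sketch of the scaling and correlation step using scikit-learn's `MinMaxScaler` and pandas' `DataFrame.corr`. Column names such as `Standard`, `GDP`, or `Corruption` are placeholders for whatever the merged table actually contains. `scaled_correlations` is a hypothetical helper.

```Python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def scaled_correlations(df: pd.DataFrame, rating_cols, factor_cols) -> pd.DataFrame:
    """Min-max scale the chosen columns, then correlate each rating with each factor."""
    cols = list(rating_cols) + list(factor_cols)
    scaled = pd.DataFrame(MinMaxScaler().fit_transform(df[cols]), columns=cols, index=df.index)
    corr = scaled.corr()  # Pearson correlation by default
    # Keep only the ratings-vs-factors block of the full matrix.
    return corr.loc[list(rating_cols), list(factor_cols)]
```

Pearson correlation does not change under a positive linear rescaling of a column. So the coefficients come out the same with or without `MinMaxScaler`; the scaling mainly puts the raw metrics on a common 0 to 1 range. The returned table can be passed directly to a heatmap function such as seaborn's `heatmap`.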

![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/socio-economics%20vs%20chess/headmap-David.png?raw=true)

<b>Distribution of Titled Players per Country</b> <br>

A bar plot was created to visualize the distribution of titled chess players (Grandmasters, International Masters, and Candidate Masters) across the top 30 countries. The countries were sorted in descending order based on the total number of titled players. This visualization highlighted the countries with the highest concentrations of chess talent.
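The counting behind this bar plot can be sketched with pandas. The `country` and `title` column names and the `CM` title code are assumptions about the data layout. `titled_players_by_country` is a hypothetical helper.

```Python
import pandas as pd

TITLES = ["GM", "IM", "CM"]  # Grandmaster, International Master, Candidate Master


def titled_players_by_country(players: pd.DataFrame, top_n: int = 30) -> pd.Series:
    """Count GM/IM/CM players per country, largest first, keeping the top_n countries."""
    titled = players[players["title"].isin(TITLES)]
    counts = titled.groupby("country").size().sort_values(ascending=False)
    return counts.head(top_n)
```

`titled_players_by_country(players).plot.bar()` would then draw the chart (pandas plotting requires matplotlib).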


![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/socio-economics%20vs%20chess/dist-David.png?raw=true)

<b>Clustering Analysis Based on Socio-Economic Profiles</b> <br>

To identify groups of countries with similar socio-economic profiles, K-Means clustering was applied to the socio-economic data. Principal Component Analysis (PCA) was used to visualize the clusters. The analysis grouped countries into clusters based on GDP per capita, education enrollment rates, and happiness scores.

The clusters were analyzed to understand their socio-economic characteristics. For example, Cluster 1 included high-income countries with high education enrollment and happiness scores, while Cluster 0 comprised low-income countries with lower education enrollment and moderate happiness scores.
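Below is a minimal sketch of the clustering step. It assumes a numeric feature matrix with one row per country (for example GDP per capita, education enrollment, and happiness score). Standardizing the features first is an assumption of this sketch, not a step the text mentions. It is added so that GDP, which is measured in much larger units, does not dominate the distances. The number of clusters is left to the caller.

```Python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_countries(features, n_clusters: int, seed: int = 0):
    """Return K-Means cluster labels and 2-D PCA coordinates for plotting."""
    # Put every feature on mean 0 / std 1 so no single unit scale dominates.
    X = StandardScaler().fit_transform(np.asarray(features, dtype=float))
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # PCA is only used to project the clusters onto two axes for the scatter plot.
    coords = PCA(n_components=2).fit_transform(X)
    return labels, coords
```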


![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/socio-economics%20vs%20chess/gdp-clusters-David.png?raw=true)

<b>Analysis of Titled Players per Capita and Socio-Economic Factors</b> <br>

This section analyzed the relationship between the number of titled players per capita and various socio-economic factors. The data was processed to calculate the number of titled players per 100,000 people for each country. The correlation matrix showed how socio-economic factors like health, wealth, and social support relate to the number of titled chess players.
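The per-capita figure is titled players / population x 100,000. Here is a pandas sketch, assuming two Series indexed by country. The helper name and its inputs are hypothetical.

```Python
import pandas as pd


def titled_per_100k(titled_counts: pd.Series, population: pd.Series) -> pd.DataFrame:
    """Join titled-player counts with population and scale to players per 100,000 people."""
    # Inner join: countries missing from either Series are dropped.
    df = titled_counts.rename("titled").to_frame().join(population.rename("population"), how="inner")
    df["per_100k"] = df["titled"] / df["population"] * 100_000
    return df.sort_values("per_100k", ascending=False)
```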


![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/socio-economics%20vs%20chess/clusters-David.png?raw=true)

<b>Happiness and Chess Performance</b> <br>

The final analysis visualized the relationship between chess player ratings (Standard, Rapid, Blitz) and the Happiness Score for various countries. Scatter plots depicted the relationship between happiness scores and each type of chess rating. The analysis found that while there was no strong correlation, certain trends indicated that higher happiness scores could be associated with better chess performance.


![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/socio-economics%20vs%20chess/happines-David.png?raw=true)

<b>Conclusion</b> <br>
The analysis provides a comprehensive understanding of how socio-economic factors impact chess performance and the distribution of titled players across countries. The findings suggest that while socio-economic conditions play a role, the development of chess talent is influenced by a complex interplay of various factors. Further research could explore additional socio-economic metrics to deepen the understanding of these relationships.

<b>Scripts and data used:</b> <br>
[Socio Economics VS Chess](https://github.com/PastMatter/PR24MVDBACSAJR/tree/main/socio-economics%20vs%20chess) <br>
soc-eco-che.ipynb

# Women in chess

Chess has long been regarded as a game that challenges intellect, strategy, and mental fortitude. Historically dominated by male players, the realm of competitive chess has seen a growing number of formidable female players making their mark on the global stage. This analysis delves into the historical trends of top female chess players, exploring how their ratings have evolved over time and comparing the initial performance of different generations. By examining these patterns, we gain insights into the progress and ongoing challenges within women's chess, highlighting the dedication and skill of these exceptional players as they strive for excellence in a traditionally male-dominated sport.


![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/women%20in%20chess/rtg-Stefanija.png?raw=true)

The graph shows a general decline in average ratings for players born in the 2000s. This possibly indicates fewer high-performing players, or younger players who have not yet reached their peak, which this analysis places at age 41. Early 20th-century data reveal high variation, particularly in Standard ratings, likely due to a few exceptional players or incomplete data. From the 1930s to the 1970s, ratings are stable across all formats, with Standard ratings consistently higher than Rapid and Blitz. After the 1980s there is a noticeable decline, especially in Blitz ratings, suggesting fewer top female players or incomplete data for these birth years.
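The averages behind such a graph can be sketched as a pandas group-by on birth decade. The `birth_year`, `Standard`, `Rapid`, and `Blitz` column names and the helper name are assumptions.

```Python
import pandas as pd


def average_ratings_by_birth_decade(players: pd.DataFrame, rating_cols=("Standard", "Rapid", "Blitz")) -> pd.DataFrame:
    """Average each rating column over players grouped by birth decade (e.g. 1950, 1960)."""
    decade = (players["birth_year"] // 10) * 10
    # mean() skips missing ratings (NaN) by default.
    return players.groupby(decade)[list(rating_cols)].mean()
```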

![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/women%20in%20chess/boxplot-Stefanija.png?raw=true)

The boxplot compares the initial Standard ratings of female chess players across different generations. It shows that median ratings have remained relatively stable from the 1960s through the 2000s, with slight variations. The interquartile range (IQR) and the spread of outliers indicate a consistent level of top performers in each generation, although the 1980s and 1990s show a slightly wider IQR, suggesting more variability in initial ratings during these decades. The presence of outliers above the 2400 rating mark across all generations highlights a number of exceptional players consistently emerging over time. However, the 2000s exhibit a slightly lower median and IQR, suggesting that younger players might still be developing their skills.

<b>Distribution of ratings</b>

![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/women%20in%20chess/rtg-dist-Stefanija.png?raw=true)

The presented histograms provide a detailed look at the distributions of Standard, Rapid, and Blitz ratings among top female chess players, as well as the distribution of chess titles held by these players. The Standard Rating Distribution shows a right-skewed distribution with a peak around 2000, indicating that most players have Standard ratings between 1800 and 2200, with fewer players reaching the higher echelons above 2400. In contrast, the Rapid and Blitz Rating Distributions exhibit a more bell-shaped curve, centered around 1900 and 1800 respectively, suggesting a more typical distribution with fewer extreme outliers.

The Title Distribution highlights the prevalence of different chess titles among female players. The most common titles are Woman FIDE Master (WFM) and Woman International Master (WIM), reflecting a substantial number of players achieving significant but not top-tier titles. There are fewer Grandmasters (GM) and International Masters (IM), which are the highest titles in chess, indicating the challenges women face in reaching the pinnacle of chess excellence. The distributions provide a comprehensive overview of the competitive landscape, illustrating the concentration of ratings and titles, and highlighting the areas where female players excel and where there might be opportunities for further advancement.

![](https://github.com/PastMatter/PR24MVDBACSAJR/blob/main/women%20in%20chess/rtg-comparison-Stefanija.png?raw=true)

The comparison of top female and male chess players in Standard, Rapid, and Blitz reveals the dominance of key figures like Judit Polgar and Hou Yifan among women, and Magnus Carlsen and Hikaru Nakamura among men. Polgar (even in retirement) and Hou consistently rank at the top, while Carlsen leads in Standard and Rapid and Nakamura excels in Blitz. This highlights the exceptional talent in both genders, though the notable rating gaps between male and female players feed ongoing discussions about gender disparity in competitive chess.

<b>Scripts and data used</b> <br>

[Women in chess](https://github.com/PastMatter/PR24MVDBACSAJR/tree/main/women%20in%20chess) <br>
distribution_of_ratings.ipynb <br>
averages_across_the_years.ipynb<br>
female_distribution.ipynb<br>
male_and_female_top_players.ipynb<br>