The Algorithmic Approach to Winning the "Guess Who?" Game

This repository grew out of many sessions of playing "Guess Who?" with my children. To help them play better, I built a model that determines the optimal strategy for winning the game.

Repository Description

This repository implements an algorithmic approach to winning the "Guess Who?" game. It applies machine learning models to the game's characters to derive a systematic questioning strategy.

Key Features:

- Algorithmic Strategy Development: Machine learning models are used to identify which questions lead to the mystery character in the fewest steps.
- Dataset Utilization: A curated dataset of the characters from the "Guess Who?" game is used to train the models and to identify the attributes that matter most for the strategy.
- Model Implementation: Using Scikit-Learn and LightGBM, the repository implements several ensemble models (Random Forest, Gradient Boosting, and LightGBM), all trained on the dataset.
- Visualization Tools: Diagrams created with draw.io give an intuitive view of the decision paths within the game.
- Documentation and Resources: The accompanying documentation describes the methodology, the model implementation, and the visualization techniques.

Target Audience:

- Game Enthusiasts: Players of "Guess Who?" who want to improve their strategy with algorithms and machine learning.
- Machine Learning Practitioners: Data scientists, researchers, and enthusiasts interested in applying machine learning to unconventional domains such as board games.
- Educators and Researchers: Professionals looking for educational material or research insight into machine learning applied to recreational activities and strategic decision-making.

By combining gaming and machine learning, this repository lets users explore how artificial intelligence can be applied to recreational gameplay, and to the strategy of "Guess Who?" in particular.

Understanding the "Guess Who?" Game

For those unfamiliar with "Guess Who?": each player selects a mystery character. By asking a series of yes-or-no questions, each player tries to deduce the identity of the opponent's mystery character. A correct guess wins the game, while an incorrect guess loses it. Players can also play a Championship Series, in which the first to win 5 games becomes the "Guess Who?" champion.

For detailed instructions, refer to Hasbro's Official Instructions.

Dataset Overview

The machine learning (ML) models are trained on the characters of the © 2018 Hasbro edition of the game, shown in the figure taken from the Geeky Hobbies "History of Guess Who?" webpage.

The dataset, comprising 24 characters, includes their English and Greek names (pertaining to the Greek version of the game). Each character is described by a set of attributes, with a binary (Yes/No) value indicating their possession of each trait. A comprehensive summary of these attributes is provided below:

Initial Approach: Leveraging Oracle Data Mining

To find optimal strategies quickly, the dataset was first loaded into an Oracle Database using SQL Developer.

Oracle Data Mining (ODM) was then used to try to build a decision tree model. Following the instructions on the Oracle Help Center, these steps were carried out:

Workflow Creation and Model Selection:
- Selected the Class Build node in the workflow.
- Deselected all models except for the Decision Tree (DT) model.
- Adjusted the Decision Tree parameters to grow the most informative tree possible.
- Added a new Data Source node.
- Selected the table GUESS_WHO.
- Connected the GUESS_WHO node to the Class Build node.
- Copied and pasted the GUESS_WHO node and renamed the node to GUESS_WHO_APPLY.
- Added an Apply node.
- Connected the Class Build node and the GUESS_WHO_APPLY node to the Apply node.
Model Execution:
- Ran the Apply node.
- Checked for green check marks on all workflow nodes.

Unfortunately, this approach did not produce the expected results. Despite careful parameter tuning, the workflow failed with an unresolved error, attributed mainly to the target variable consisting of unique values. Python's Scikit-Learn was therefore used instead.

Alternative Approach: Employing Python Scikit-Learn

After the Oracle Data Mining (ODM) attempt, Python's Scikit-Learn library was used to work out the optimal strategy for the "Guess Who?" game. The code proceeds through data preparation, model training, feature importance analysis, and decision tree visualization.

Package Management and Environment Setup

To keep Python package dependencies manageable, the first code block records the packages currently installed in the environment and saves the list to a file named "freeze_file.txt". It then installs the packages required by the Jupyter Notebook. The "requirements.txt" file lists these packages together with their version numbers, so that the project environment can be reproduced.
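The snapshot-then-install sequence could be sketched in plain Python as follows. This is only an illustration, not the notebook's actual cell: the function names `snapshot_installed_packages` and `install_requirements` are hypothetical, and the sketch assumes that `pip` is available to the current interpreter.

```python
import subprocess
import sys
from pathlib import Path


def snapshot_installed_packages(freeze_path="freeze_file.txt"):
    """Write the output of `pip freeze` to freeze_path and return the path."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True,
        text=True,
        check=False,  # do not raise if pip is missing; the file is then empty
    )
    path = Path(freeze_path)
    path.write_text(result.stdout)
    return path


def install_requirements(requirements_path="requirements.txt"):
    """Install the pinned packages listed in requirements_path."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", requirements_path],
        check=True,  # fail loudly if the installation does not succeed
    )
```

Calling `snapshot_installed_packages()` before `install_requirements()` keeps a record of the environment as it was before the project's dependencies were added.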

Data Preparation

The first step reads the dataset of all 24 characters from a CSV file. The first two columns hold each character's English and Greek names; the remaining columns hold the binary (Yes/No) answers to the questions that can be asked about each character.

The dataset is then split into the features (X) and the target variable (y).
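A minimal sketch of this step with pandas is shown below. The column names and the three rows are made up for illustration and are not taken from guess_who_dataset.csv. The sketch also assumes that the English name is the target and that Yes/No answers are encoded as 1/0.

```python
import io

import pandas as pd

# Hypothetical excerpt; the real column names and values may differ.
SAMPLE_CSV = """Name_EN,Name_GR,Question 1,Question 2,Question 3
Character A,Χαρακτήρας Α,No,No,No
Character B,Χαρακτήρας Β,Yes,No,No
Character C,Χαρακτήρας Γ,No,Yes,Yes
"""


def load_dataset(source):
    """Split the table into binary features X and the target names y."""
    df = pd.read_csv(source)
    y = df.iloc[:, 0]                            # assumption: English name is the target
    X = (df.iloc[:, 2:] == "Yes").astype(int)    # Yes -> 1, No -> 0
    return X, y


X, y = load_dataset(io.StringIO(SAMPLE_CSV))
print(X.shape)  # (3, 3)
```

For the real data, `load_dataset("guess_who_dataset.csv")` would be the equivalent call, provided the file follows the layout described above.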

Decision Tree Modeling

Decision Tree Overview

Decision Trees are a non-parametric supervised learning method used for classification and regression. They predict the target variable by learning simple decision rules inferred from the features. Here, a Decision Tree identifies a character by asking a sequence of yes/no questions, which mirrors how the game itself is played.
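The parallel between a tree's decision rules and the game's questions can be seen in a few lines of plain Python. The characters and traits below are hypothetical placeholders, not entries from the dataset; each call to the illustrative helper `ask` plays the role of one internal node of a tree.

```python
def ask(candidates, attribute, answer):
    """Keep only the candidates whose value for attribute matches answer."""
    return {
        name: traits
        for name, traits in candidates.items()
        if traits[attribute] == answer
    }


# Hypothetical characters and traits, for illustration only.
candidates = {
    "Character A": {"glasses": True, "hat": False},
    "Character B": {"glasses": True, "hat": True},
    "Character C": {"glasses": False, "hat": False},
    "Character D": {"glasses": False, "hat": True},
}

remaining = ask(candidates, "glasses", True)   # leaves Character A and B
remaining = ask(remaining, "hat", False)       # leaves Character A only
print(sorted(remaining))  # ['Character A']
```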

Model Instantiation and Configuration

The Decision Tree Classifier is instantiated with parameters that control the behavior and structure of the model, chosen to suit the characteristics of the dataset. Key parameters include:

criterion: Determines the impurity measure for node splitting, with 'gini' indicating Gini impurity.
splitter: Dictates the strategy for selecting node splits, with 'best' opting for optimal splits.
max_depth: Specifies the maximum depth of the decision tree.
min_samples_split: Sets the minimum number of samples required to split an internal node.
min_samples_leaf: Specifies the minimum number of samples required to be at a leaf node.
random_state: Ensures reproducibility of results by fixing the random seed.
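The parameters listed above map directly onto the `DecisionTreeClassifier` constructor. The values in this sketch are assumptions chosen for a toy dataset of 8 made-up characters, not the values used in the notebook.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the real dataset: 8 hypothetical characters, each with a
# unique combination of 3 yes/no traits (1 = Yes, 0 = No).
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [f"Character {i}" for i in range(8)]

clf = DecisionTreeClassifier(
    criterion="gini",
    splitter="best",
    max_depth=None,        # assumption: grow until every leaf is pure
    min_samples_split=2,   # assumption: smallest value that still allows a split
    min_samples_leaf=1,    # assumption: every character is a single sample
    random_state=42,       # assumption: any fixed integer works
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())  # 3 8
```

Because every character appears exactly once in this toy data, a `min_samples_leaf` greater than 1 would stop the tree before each character reaches its own leaf.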

Model Training and Visualization

After instantiation, the Decision Tree Classifier is trained on the dataset's features and classes. The fitted tree is then exported as DOT data and rendered with the Graphviz library, showing every decision path and the criterion applied at each node.
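A minimal sketch of the export step, assuming the same kind of toy data as a stand-in for the real dataset. The trait names are hypothetical. Only the DOT string is produced here; rendering it to an image additionally needs the `graphviz` Python package and the Graphviz binaries.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Toy data: 8 hypothetical characters with 3 yes/no traits each.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [f"Character {i}" for i in range(8)]
feature_names = ["Trait 1", "Trait 2", "Trait 3"]  # hypothetical names

clf = DecisionTreeClassifier(random_state=42).fit(X, y)

# With out_file=None the function returns the DOT source as a string.
dot_data = export_graphviz(
    clf,
    out_file=None,
    feature_names=feature_names,
    class_names=list(clf.classes_),
    filled=True,
    rounded=True,
)

# Optional rendering step (requires graphviz to be installed):
#   import graphviz
#   graphviz.Source(dot_data).render("guess_who_tree", format="png")
```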

The same visualization procedure is repeated for each ensemble model (Random Forest, Gradient Boosting, and LightGBM): a decision tree is built from that model's top 11 features and visualized as described above.
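One way to read this procedure is sketched below: rank the features by an ensemble's importances, keep the top k, and refit a decision tree on those columns only. The helper `tree_on_top_features` is hypothetical, a Random Forest stands in for any of the three ensembles, and k is 3 rather than 11 because the toy data has only 4 columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def tree_on_top_features(model, X, y, k):
    """Refit a decision tree on the k most important features of model."""
    X = np.asarray(X)
    top = np.argsort(model.feature_importances_)[::-1][:k]  # best first
    tree = DecisionTreeClassifier(random_state=42).fit(X[:, top], y)
    return tree, top


# Toy data: 3 informative yes/no traits plus one constant, useless column.
X = [[a, b, c, 0] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [f"Character {i}" for i in range(8)]

forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)
tree, top = tree_on_top_features(forest, X, y, k=3)
print(sorted(top.tolist()))  # [0, 1, 2]
```

The constant column can never produce a split, so its importance is zero and it is the one dropped.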

Random Forest Modeling

Random Forest Overview

A Random Forest is an ensemble learning method that fits many decision tree classifiers on different sub-samples of the dataset. Averaging their predictions improves predictive accuracy and reduces overfitting.

Model Instantiation and Configuration

Similar to Decision Trees, Random Forest instantiation involves parameter configuration to tailor model behavior. Key parameters include:

n_estimators: Specifies the number of decision trees in the forest.
max_depth: Determines the maximum depth of each decision tree.
min_samples_split: Sets the minimum number of samples required to split an internal node.
min_samples_leaf: Specifies the minimum number of samples required to be at a leaf node.
random_state: Ensures reproducibility of results.

Model Training and Feature Importance Analysis

After instantiation, the Random Forest Classifier is trained, and the importance of each feature is then computed. The importances are shown as a bar plot, which makes it easy to see which features contribute most to the predictions.
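The training and ranking steps might look like the sketch below. The parameter values and trait names are assumptions for a toy dataset, and the bar plot is left as a comment so that the example does not depend on a plotting library.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data: 8 hypothetical characters with 3 yes/no traits each.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [f"Character {i}" for i in range(8)]
feature_names = ["Trait 1", "Trait 2", "Trait 3"]  # hypothetical names

forest = RandomForestClassifier(
    n_estimators=100,      # assumption: number of trees
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=42,
)
forest.fit(X, y)

# Pair each feature with its importance and sort from most to least important.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")

# A bar plot could then be drawn with, for example, matplotlib:
#   plt.barh([n for n, _ in ranked], [v for _, v in ranked])
```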

Gradient Boosting Modeling

Gradient Boosting Overview

Gradient Boosting performs boosting in function space. At each stage a weak learner, typically a shallow decision tree, is fitted to the pseudo-residuals (the negative gradient of the loss) of the current ensemble, and the weak learners are added together to form the final model.
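The idea of fitting weak learners to pseudo-residuals is easiest to see in a simplified regression setting with squared-error loss, where the pseudo-residuals are just the ordinary residuals. The sketch below is a teaching example on synthetic data, not the classifier used in this repository; the function name `fit_boosted_stumps` is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.5):
    """Gradient boosting for squared-error loss with depth-1 trees."""
    prediction = np.full(len(y), y.mean())   # start from a constant model
    losses = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - f)^2 the negative gradient is y - f.
        residuals = y - prediction
        stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
        prediction = prediction + learning_rate * stump.predict(X)
        losses.append(float(np.mean((y - prediction) ** 2)))
    return prediction, losses


# Synthetic data, unrelated to the game.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])

prediction, losses = fit_boosted_stumps(X, y)
print(losses[0] > losses[-1])  # True: the training error shrinks
```

The `learning_rate` argument scales the contribution of each stump, which is the role the parameter of the same name plays in the classifiers described here.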

Model Instantiation and Configuration

Gradient Boosting instantiation entails parameter configuration to tailor model behavior. Key parameters include:

loss: Specifies the loss function optimized during training.
learning_rate: Shrinks the contribution of each tree, acting as the step size of each boosting stage.
n_estimators: Specifies the number of decision trees in the ensemble.
max_depth: Determines the maximum depth of each decision tree.
random_state: Ensures reproducibility of results.

Model Training and Feature Importance Analysis

After instantiation, the Gradient Boosting Classifier is trained, and feature importances are analyzed in the same way as for the previous models to identify the most informative features.

LightGBM Modeling

LightGBM Overview

LightGBM is a gradient boosting framework designed for fast training, low memory usage, and high accuracy. It is aimed primarily at large datasets and complex predictive modeling tasks.

Model Instantiation and Configuration

LightGBM instantiation involves parameter configuration, aligning model behavior with dataset characteristics. Key parameters include:

boosting_type: Specifies the boosting algorithm employed.
num_leaves: Sets the maximum number of leaves in each tree.
learning_rate: Shrinks the contribution of each tree, acting as the step size of each boosting round.
n_estimators: Specifies the number of boosting rounds.
random_state: Ensures reproducibility of results.

Model Training and Feature Importance Analysis

After instantiation, the LightGBM Classifier is trained, and feature importances are analyzed in the same way as for the previous models to identify the most informative features.

Model Summary and Character Identification

Each model is summarized by the minimum and maximum number of questions it needs to identify a character. This summary helps in selecting the most efficient model for gameplay.

Models	Min # Questions	Max # Questions
1. GradientBoostingClassifier	3	6
2. LGBMClassifier	3	7
3. RandomForestClassifier	3	8
4. DecisionTreeClassifier	3	8

The results of the models implemented above can be reproduced by running the Jupyter Notebook.
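If the number of questions is read as the depth of the leaf that holds a character, the minimum and maximum in the table correspond to the shallowest and deepest leaves of the fitted tree. The sketch below computes these depths for a three-character toy example. The helper `leaf_depths` is hypothetical and this reading of the table is an assumption, not something taken from the notebook.

```python
from sklearn.tree import DecisionTreeClassifier


def leaf_depths(fitted_tree):
    """Return the depth of every leaf (one question per level)."""
    tree = fitted_tree.tree_
    depths = []
    stack = [(0, 0)]                  # (node id, depth), starting at the root
    while stack:
        node, depth = stack.pop()
        left = tree.children_left[node]
        right = tree.children_right[node]
        if left == right:             # leaves have both children set to -1
            depths.append(depth)
        else:
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
    return depths


# Toy data: 3 hypothetical characters described by 2 yes/no traits.
X = [[0, 0], [1, 0], [1, 1]]
y = ["Character A", "Character B", "Character C"]

clf = DecisionTreeClassifier(random_state=42).fit(X, y)
depths = leaf_depths(clf)
print(min(depths), max(depths))  # 1 2
```

For reference, separating 24 characters with yes/no questions alone needs at least ceil(log2(24)) = 5 questions in the worst case, whichever strategy is used.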

Diagram Creation

To present the optimal model more clearly, the diagrams were redrawn with the draw.io tool. They are more concise and informative than the ones generated by the Python code. The Diagrams folder contains the .drawio files and the exported images for both the English and Greek versions of the game, in both Horizontal and Vertical flows.

Below are the Horizontal flow representations of the created diagrams:

Best Strategy for Winning the "Guess Who?" Game (English - Αγγλικά)

Η καλύτερη στρατηγική για να κερδίσετε το "Μάντεψε ποιος;" (Greek - Ελληνικά)
