- Overview
- Prerequisites
- Deployment Steps
- Deployment Validation
- Running the Guidance
- Next Steps
- Cleanup
Game studios rely on a variety of key metrics to make informed, data-driven decisions that improve the overall player experience and help them better understand their players. However, these studios often lack the skills, resources, and personnel required to effectively employ machine learning (ML) techniques to glean these player insights. This guidance enables novice ML practitioners to extract predictive insights from their labeled, tabular player data by leveraging automated machine learning (AutoML) to streamline the manual, complex, and iterative processes of engineering data features and of training, tuning, and evaluating ML models. By using this guidance, novice ML practitioners need only upload their labeled player data to Amazon S3, and the automated end-to-end workflow will generate a production-grade model for predicting player behavior and gaining player insights. This eliminates the heavy lifting for the following tasks:
- Data exploration, with respect to the target column, to infer the ML problem type.
- Data preprocessing, and feature engineering for the inferred ML problem type.
- ML algorithm selection for the inferred ML problem type.
- Ensemble model training to explore different model configurations, and find the best model for the inferred problem type.
- Evaluation of the best model against a preset performance threshold.
- Deployment of the evaluated model as either a hosted or a serverless endpoint.
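The last two tasks amount to a simple gate: the model is deployed only when its evaluation metric is greater than or equal to the configured threshold. The following is a minimal Python sketch of that decision, assuming a single numeric metric where higher is better. The function name should_deploy is hypothetical and is not part of the guidance code.

```python
def should_deploy(metric_value: float, performance_threshold: float) -> bool:
    """Return True when the evaluated model meets the threshold.

    Hypothetical helper for illustration only; assumes a metric where
    higher values are better.
    """
    return metric_value >= performance_threshold


# A model scoring 0.62 against a threshold of 0.5 passes the gate.
print(should_deploy(0.62, 0.5))
# A model scoring 0.41 does not.
print(should_deploy(0.41, 0.5))
```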
You are responsible for the cost of the AWS services used while running this Guidance. As of January 2024, the cost for running this Guidance with the default settings in the N. Virginia AWS Region (us-east-1) is approximately $4.50 per end-to-end pipeline execution.
NOTE: The deployment automatically creates an AWS cost allocation tag based on the configured WORKLOAD_NAME variable in the constants.py file (see Deployment Steps). You can use AWS Cost Explorer and cost allocation tags to view the monthly costs for each Amazon SageMaker Pipeline execution.
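For rough budgeting, the per-execution figure above can be multiplied by the number of pipeline runs you expect. The sketch below is only an arithmetic illustration: it uses the January 2024 estimate quoted above, covers nothing beyond that per-execution figure, and the function name estimate_monthly_cost is hypothetical.

```python
# Approximate cost per end-to-end pipeline execution (us-east-1, default
# settings, as of January 2024), taken from the estimate quoted above.
COST_PER_EXECUTION_USD = 4.50


def estimate_monthly_cost(executions_per_month: int,
                          cost_per_execution: float = COST_PER_EXECUTION_USD) -> float:
    """Estimate the monthly pipeline cost for a given number of executions."""
    if executions_per_month < 0:
        raise ValueError("executions_per_month must not be negative")
    return round(executions_per_month * cost_per_execution, 2)


# Four pipeline executions in a month (for example, one retraining per week).
print(estimate_monthly_cost(4))
```

Actual charges depend on your data size, Region, and settings, so treat the result as a starting point and confirm it in AWS Cost Explorer.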
These deployment instructions are optimized to work best on a pre-configured Amazon Linux 2023 AWS Cloud9 development environment. Refer to the Individual user setup for AWS Cloud9 for more information on how to set up Cloud9 as a user in the AWS account. Deployment using another OS may require additional steps and configured Python libraries (see Third-party tools).
NOTE: A GitHub dev container configuration has been provided should you wish to use GitHub Codespaces or Visual Studio Code Dev Containers as your development environment.
Before deploying the guidance code, ensure that the following required tools have been installed:
- AWS Cloud Development Kit (CDK) >= 2.126
- Python >= 3.8
- NodeJS >= 18
NOTE: The Guidance has been tested using AWS CDK version 2.126. If you wish to update the CDK application to a later version, make sure to update the requirements.txt file, in the root of the repository, with the updated version of the AWS CDK.
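The minimum versions listed above can be checked programmatically. The following sketch assumes version strings in the form typically printed by cdk --version (for example, 2.126.0 followed by a build identifier) and node --version (for example, v18.19.0). The helper names parse_version and meets_minimum are hypothetical.

```python
import sys

# Minimum versions from the list of required tools above.
MINIMUM_VERSIONS = {"aws-cdk": (2, 126), "python": (3, 8), "node": (18,)}


def parse_version(text: str) -> tuple:
    """Turn a string such as 'v18.19.0' or '2.126.0 (build abc)' into ints."""
    tokens = text.strip().lstrip("v").split()
    if not tokens:
        return ()
    parts = []
    for piece in tokens[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(tool: str, version_text: str) -> bool:
    """Compare a reported version against the documented minimum."""
    return parse_version(version_text) >= MINIMUM_VERSIONS[tool]


# Check the interpreter that is running this snippet.
current_python = ".".join(str(n) for n in sys.version_info[:3])
print("python", current_python, meets_minimum("python", current_python))
```

The strings for the other tools could be captured from the output of cdk --version and node --version and passed to meets_minimum in the same way.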
This deployment requires that you have an existing Amazon SageMaker Domain in your AWS account. A SageMaker Domain is required in order to provide access to monitor, and track the following SageMaker resources:
- SageMaker AutoML
- SageMaker Pipelines
- SageMaker Model Registry
NOTE: See the Quick onboard to Amazon SageMaker Domain section of the Amazon SageMaker Developer Guide for more information on how to configure an Amazon SageMaker Domain in your AWS account.
This Guidance uses AWS CDK. If you are using aws-cdk for the first time, please see the Bootstrapping section of the AWS Cloud Development Kit (AWS CDK) v2 developer guide to provision the required resources before you deploy AWS CDK apps into an AWS environment.
Before you deploy the guidance code, customize it to suit your specific usage requirements. Guidance configuration and customization are managed using the constants.py file, located in the root of the repository. The following steps walk you through customizing the guidance code configuration to suit your use case and then deploying the guidance code:
-
In the Cloud9 IDE, use the terminal to clone the repository:
git clone https://github.com/aws-solutions-library-samples/guidance-for-ai-driven-player-insights-on-aws player-insights
-
Change to the repository root folder:
cd player-insights
-
Initialize the Python virtual environment:
python3 -m venv .venv
-
Activate the virtual environment:
source .venv/bin/activate
-
Install the necessary Python libraries in the virtual environment:
python -m pip install -r requirements.txt
-
Open the constants.py file for editing. The following settings can be adjusted to suit your use case:
WORKLOAD_NAME
- Description: The name of the workload that matches your use case. This will be used as a prefix for any component deployed in your AWS account.
- Type: String
- Example:
"PlayerChurn"
REGION
- Description: The name of the AWS region into which you want to deploy the use case.
- Type: String
- Example:
"us-east-1"
SM_DOMAIN_ID
- Description: The ID for your prerequisite Amazon SageMaker Domain in your configured AWS region. You can view the ID for your domain in the AWS Console, or by running the following command:
aws sagemaker list-domains --query "Domains[*].DomainId" --output text
- Type: String
- Example:
"d-abcdef12gh3i"
DATA_FILE
- Description: The name of the comma-separated values (CSV) file representing your player data. (See Player churn data for more information.)
- Type: String
- Example:
"player-churn.csv"
TARGET_ATTRIBUTE
- Description: The name of the target variable column of the DATA_FILE that you wish to train the machine learning model to predict. (See Player churn data for more information.)
- Type: String
- Example:
"player_churn"
PERFORMANCE_THRESHOLD
- Description: The decision threshold that indicates whether or not the trained model is considered production grade. If the model evaluation metric is greater than or equal to this value, the model will be deployed into production.
- Type: Float
- Example:
0.5
ENDPOINT_TYPE
- Description: The type of inference endpoint for a production model, either SERVERLESS for Amazon SageMaker Serverless Inference, or HOSTED for Amazon SageMaker Real-time Inference.
- Type: String
- Example:
"SERVERLESS"
NOTE: Make sure to save the constants.py file after updating your use case settings.
-
Verify that the CDK deployment correctly synthesizes the proper CloudFormation templates:
cdk synth
-
Deploy the guidance:
cdk deploy
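The settings edited in constants.py can be sanity-checked before you run cdk synth. The following is a sketch only: the values mirror the examples given for each setting, and the validate_settings helper is hypothetical and not part of the repository.

```python
# Example values taken from the setting descriptions in the steps above.
settings = {
    "WORKLOAD_NAME": "PlayerChurn",
    "REGION": "us-east-1",
    "SM_DOMAIN_ID": "d-abcdef12gh3i",
    "DATA_FILE": "player-churn.csv",
    "TARGET_ATTRIBUTE": "player_churn",
    "PERFORMANCE_THRESHOLD": 0.5,
    "ENDPOINT_TYPE": "SERVERLESS",
}

STRING_KEYS = ("WORKLOAD_NAME", "REGION", "SM_DOMAIN_ID",
               "DATA_FILE", "TARGET_ATTRIBUTE")


def validate_settings(cfg: dict) -> list:
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    for key in STRING_KEYS:
        value = cfg.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} must be a non-empty string")
    data_file = cfg.get("DATA_FILE")
    if isinstance(data_file, str) and data_file and not data_file.lower().endswith(".csv"):
        problems.append("DATA_FILE should be a CSV file")
    threshold = cfg.get("PERFORMANCE_THRESHOLD")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        problems.append("PERFORMANCE_THRESHOLD must be a number")
    if cfg.get("ENDPOINT_TYPE") not in ("SERVERLESS", "HOSTED"):
        problems.append("ENDPOINT_TYPE must be SERVERLESS or HOSTED")
    return problems


print(validate_settings(settings))
```

Catching a mistyped ENDPOINT_TYPE or an empty SM_DOMAIN_ID at this point is cheaper than discovering it after a failed deployment.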
To verify that the guidance has been successfully deployed, open the AWS CloudFormation Console, select the AWS region into which you deployed the guidance, and verify that the stack <WORKLOAD_NAME>-Stack has a CREATE_COMPLETE status. For example, if the WORKLOAD_NAME variable in the constants.py file is PlayerChurn, then the CloudFormation stack name is PlayerChurn-Stack.
NOTE: Make sure to capture the DataBucketName value from the Outputs tab of the CloudFormation stack.
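The same check can be scripted. The sketch below works on a dictionary shaped like the response of the CloudFormation DescribeStacks API (for example, from boto3 describe_stacks, or from aws cloudformation describe-stacks with JSON output). The helper name summarize_stack is hypothetical, and the bucket name in the sample is made up for illustration.

```python
def summarize_stack(response: dict, output_key: str = "DataBucketName") -> tuple:
    """Return (is_create_complete, output_value) for the first stack listed."""
    stack = response["Stacks"][0]
    is_complete = stack.get("StackStatus") == "CREATE_COMPLETE"
    value = None
    for output in stack.get("Outputs", []):
        if output.get("OutputKey") == output_key:
            value = output.get("OutputValue")
            break
    return is_complete, value


# Sample response for illustration only; the bucket name is invented.
sample_response = {
    "Stacks": [{
        "StackName": "PlayerChurn-Stack",
        "StackStatus": "CREATE_COMPLETE",
        "Outputs": [
            {"OutputKey": "DataBucketName", "OutputValue": "example-bucket-name"},
        ],
    }]
}

print(summarize_stack(sample_response))
```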
The following example demonstrates how to use the deployed guidance to create an end-to-end ML workflow that trains, evaluates, and deploys an AutoML-generated model from player churn data.
Being able to identify and predict player churn is often a key metric, or valuable player insight, that can be used to prevent player attrition in games. More importantly, identifying these players before they leave, or stop playing the game, allows game developers to implement prescriptive solutions that mitigate churn. However, detecting the various patterns or factors that influence churn behavior can be a difficult task for data analysts. Therefore, game developers look to employ ML-based prediction models, which often requires them to hire teams of qualified data scientists to build and manage these models in production.
A synthetic sample data file of player event telemetry, player-churn.csv, has been provided. The dataset features various player metrics, such as the player_type, session_count, and various session details over a period of a month, and classifies a player as "churned" if they have not logged into the game after 3 days.
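Before uploading a dataset, it can help to check how the target column is distributed, since a heavily imbalanced label affects how a performance threshold should be read. The following sketch uses only the Python standard library and assumes the file has a header row containing the target column. The helper name target_distribution is hypothetical, and the in-memory rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def target_distribution(csv_stream, target_column: str) -> Counter:
    """Count how often each label appears in the target column."""
    reader = csv.DictReader(csv_stream)
    if reader.fieldnames is None or target_column not in reader.fieldnames:
        raise KeyError(f"column {target_column!r} not found in CSV header")
    return Counter(row[target_column] for row in reader)


# Tiny in-memory sample with invented rows, standing in for the real file.
sample_csv = io.StringIO(
    "player_id,session_count,player_churn\n"
    "a1,2,False\n"
    "b2,3,False\n"
    "c3,1,True\n"
)
print(target_distribution(sample_csv, "player_churn"))

# For the provided file, the call would look like this:
# with open("assets/examples/player-churn.csv", newline="") as handle:
#     print(target_distribution(handle, "player_churn"))
```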
The following shows an example of this data:
player_id | cohort_id | cohort_day_of_week | player_type | player_lifetime | session_count | player_churn | begin_session_count_last_day(-1) | end_session_count_last_day(-1) | begin_session_count_last_day(-2) | end_session_count_last_day(-2) | begin_session_count_last_day(-3) | end_session_count_last_day(-3) | begin_session_count_last_day(-4) | end_session_count_last_day(-4) | begin_session_count_last_day(-5) | end_session_count_last_day(-5) | begin_session_count_last_day(-6) | end_session_count_last_day(-6) | begin_session_count_last_day(-7) | end_session_count_last_day(-7) | begin_session_count_last_day(-8) | end_session_count_last_day(-8) | begin_session_count_last_day(-9) | end_session_count_last_day(-9) | begin_session_count_last_day(-10) | end_session_count_last_day(-10) | begin_session_count_last_week(-1) | end_session_count_last_week(-1) | begin_session_count_last_week(-2) | end_session_count_last_week(-2) | begin_session_count_last_week(-3) | end_session_count_last_week(-3) | begin_session_count_last_month(-1) | end_session_count_last_month(-1) | begin_session_count_last_month(-2) | end_session_count_last_month(-2) | begin_session_time_of_day_mean_last_day(-1) | end_session_time_of_day_mean_last_day(-1) | begin_session_time_of_day_mean_last_day(-2) | end_session_time_of_day_mean_last_day(-2) | begin_session_time_of_day_mean_last_day(-3) | end_session_time_of_day_mean_last_day(-3) | begin_session_time_of_day_mean_last_day(-4) | end_session_time_of_day_mean_last_day(-4) | begin_session_time_of_day_mean_last_day(-5) | end_session_time_of_day_mean_last_day(-5) | begin_session_time_of_day_mean_last_day(-6) | end_session_time_of_day_mean_last_day(-6) | begin_session_time_of_day_mean_last_day(-7) | end_session_time_of_day_mean_last_day(-7) | begin_session_time_of_day_mean_last_day(-8) | end_session_time_of_day_mean_last_day(-8) | begin_session_time_of_day_mean_last_day(-9) | end_session_time_of_day_mean_last_day(-9) | begin_session_time_of_day_mean_last_day(-10) 
| end_session_time_of_day_mean_last_day(-10) | begin_session_time_of_day_mean_last_week(-1) | end_session_time_of_day_mean_last_week(-1) | begin_session_time_of_day_mean_last_week(-2) | end_session_time_of_day_mean_last_week(-2) | begin_session_time_of_day_mean_last_week(-3) | end_session_time_of_day_mean_last_week(-3) | begin_session_time_of_day_mean_last_month(-1) | end_session_time_of_day_mean_last_month(-1) | begin_session_time_of_day_mean_last_month(-2) | end_session_time_of_day_mean_last_month(-2) | begin_session_time_of_day_std_last_day(-1) | end_session_time_of_day_std_last_day(-1) | begin_session_time_of_day_std_last_day(-2) | end_session_time_of_day_std_last_day(-2) | begin_session_time_of_day_std_last_day(-3) | end_session_time_of_day_std_last_day(-3) | begin_session_time_of_day_std_last_day(-4) | end_session_time_of_day_std_last_day(-4) | begin_session_time_of_day_std_last_day(-5) | end_session_time_of_day_std_last_day(-5) | begin_session_time_of_day_std_last_day(-6) | end_session_time_of_day_std_last_day(-6) | begin_session_time_of_day_std_last_day(-7) | end_session_time_of_day_std_last_day(-7) | begin_session_time_of_day_std_last_day(-8) | end_session_time_of_day_std_last_day(-8) | begin_session_time_of_day_std_last_day(-9) | end_session_time_of_day_std_last_day(-9) | begin_session_time_of_day_std_last_day(-10) | end_session_time_of_day_std_last_day(-10) | begin_session_time_of_day_std_last_week(-1) | end_session_time_of_day_std_last_week(-1) | begin_session_time_of_day_std_last_week(-2) | end_session_time_of_day_std_last_week(-2) | begin_session_time_of_day_std_last_week(-3) | end_session_time_of_day_std_last_week(-3) | begin_session_time_of_day_std_last_month(-1) | end_session_time_of_day_std_last_month(-1) | begin_session_time_of_day_std_last_month(-2) | end_session_time_of_day_std_last_month(-2) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
97cea075a9954326bb0c71b31fcab437 | 2022_06_08 | 2 | churner | 97891.783281 | 2 | False | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 69078.272027 | 76150.867241 | 64659.083961 | 71881.314174 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23668.677994 | 30816.090708 | 0 | 0 | 0 | 0 | 23668.677994 | 30816.090708 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 64218.863743 | 64113.05582 | 0 | 0 | 0 | 0 | 64218.863743 | 64113.05582 | 0 | 0 | ||||
56d4b533b42740e990ce0aac3bdfcfc6 | 2022_06_08 | 2 | churner | 78252.746837 | 2 | False | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 15081.716635 | 22274.939247 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15081.716635 | 22274.939247 | 0 | 0 | 0 | 0 | 15081.716635 | 22274.939247 | 0 | 0 | 50229.649963 | 50263.69293 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 50229.649963 | 50263.69293 | 0 | 0 | 0 | 0 | 50229.649963 | 50263.69293 | 0 | 0 |
16ca20d622c04f96971ac359cd8f4151 | 2022_06_08 | 2 | churner | 161992.623939 | 3 | False | 2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 3 | 3 | 0 | 0 | 18328.872222 | 25516.40685 | 73935.322063 | 81085.45611 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 65664.355502 | 72839.42327 | 0 | 0 | 0 | 0 | 65664.355502 | 72839.42327 | 0 | 0 | 53215.18547 | 53190.74877 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 77421.168823 | 77431.646079 | 0 | 0 | 0 | 0 | 77421.168823 | 77431.646079 | 0 | 0 | ||
e14c495dc6544134bd51e7eb7bfd91f4 | 2022_06_08 | 2 | churner | 89544.033403 | 2 | False | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 76617.072141 | 42513.73832 | 80601.831075 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 35409.451608 | 42513.73832 | 0 | 0 | 0 | 0 | 35409.451608 | 42513.73832 | 0 | 0 | 58311.032016 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 58276.37583 | 58311.032016 | 0 | 0 | 0 | 0 | 58276.37583 | 58311.032016 | 0 | 0 |
As you can see, the player_churn attribute of the example data is the variable we want the ML model to predict. To automatically train, evaluate, and deploy an ML model that predicts this variable, perform the following steps:
- Upload the example data, ./assets/examples/player-churn.csv, to the S3 Bucket created by the deployment:
export ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export REGION=$(aws configure get region)
export WORKLOAD=$(python3 -c "import constants; print(constants.WORKLOAD_NAME.lower())")
export BUCKET=s3://$WORKLOAD-data-$REGION-$ACCOUNT
aws s3 cp ./assets/examples/player-churn.csv $BUCKET/raw-data/player-churn.csv
- Open SageMaker Studio Classic IDE, and view the SageMaker Pipelines execution. The pipeline is named after the WORKLOAD_NAME variable in the constants.py file, e.g. PlayerChurn-AutoMLPipeline.
NOTE: For more information on how to view the pipeline execution, see the View, Track, and Execute SageMaker Pipelines in SageMaker Studio section of the Amazon SageMaker developer guide.
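The upload command in the first step builds the bucket name from the lowercased workload name, the Region, and the account ID. A Python equivalent might look like the sketch below. The helper names are hypothetical, the account ID shown is a placeholder, and the upload function assumes boto3 is installed and AWS credentials are configured.

```python
import os


def data_bucket_name(workload_name: str, region: str, account_id: str) -> str:
    """Mirror the naming used by the shell commands in the upload step."""
    return f"{workload_name.lower()}-data-{region}-{account_id}"


def raw_data_key(file_name: str) -> str:
    """Object key under the raw-data/ prefix used by the upload step."""
    return f"raw-data/{file_name}"


def upload_player_data(local_path: str, workload_name: str,
                       region: str, account_id: str) -> str:
    """Upload a local CSV file and return its S3 URI (requires boto3)."""
    import boto3  # imported lazily so the helpers above work without it

    bucket = data_bucket_name(workload_name, region, account_id)
    key = raw_data_key(os.path.basename(local_path))
    boto3.client("s3", region_name=region).upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Placeholder account ID, for illustration only.
print(data_bucket_name("PlayerChurn", "us-east-1", "123456789012"))
print(raw_data_key("player-churn.csv"))
```

If the bucket name differs in your deployment, prefer the DataBucketName value from the CloudFormation stack outputs over a derived name.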
The pipeline should take approximately 40 minutes to run, and should look as follows once complete:
To review the best model candidates that are automatically generated during the AutoMLTrainingStep of the SageMaker Pipeline, perform the following steps:
- Using the SageMaker Studio Classic IDE, view the SageMaker Pipelines execution, and select the AutoMLTrainingStep of the pipeline.
- Select the Details tab for the step and review the Job name of the Information section.
- Select the AutoML option in the studio IDE navigation panel, and click on the name of the Autopilot job that matches the Job name from the pipeline step.
- Review the various automated trials and the respective model details, by clicking on each Trial name.
NOTE: For more information on model details, see the View model details section of the Amazon SageMaker developer guide.
A test script, inference_test.py, has been provided to test deriving player churn insights from the deployed model. To execute the test on sample data, run the following:
- Using the Cloud9 IDE terminal, change to the assets/examples folder:
cd ~/environment/player-insights/assets/examples
- Run the test script, supplying the name of the workload endpoint. For example, if the WORKLOAD_NAME variable in the constants.py file is PlayerChurn, then the SageMaker Endpoint name is PlayerChurn-Endpoint:
python3 churn_inference.py --endpoint-name PlayerChurn-Endpoint
The output from the test script should look as follows:
Using SageMaker Endpoint: PlayerChurn-Endpoint
Sending inference request with test payload ...
SageMaker returned the following response: False
As you can see, based on the sample player event data, the deployed player churn model predicts that this sample player will NOT leave the game. At this point, the game client or game servers can be configured to call the player churn model to make predictions for new users, based on their event telemetry.
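A game server calling the endpoint could follow roughly the pattern below. This is a sketch under stated assumptions, not the provided test script: it assumes the model accepts a single CSV row of feature values in the training column order (without the target column) and returns a True or False label as text. The helper names are hypothetical, and predict_churn requires boto3 and AWS credentials.

```python
def build_csv_payload(feature_values) -> str:
    """Join feature values into one CSV row (assumed request format)."""
    return ",".join(str(value) for value in feature_values)


def parse_prediction(raw_body: str) -> bool:
    """Interpret the textual response as a churn label (assumed format)."""
    text = raw_body.strip().strip('"').lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"unexpected prediction value: {raw_body!r}")


def predict_churn(endpoint_name: str, feature_values, region=None) -> bool:
    """Send one inference request to the SageMaker endpoint."""
    import boto3  # imported lazily so the helpers above work without it

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_csv_payload(feature_values).encode("utf-8"),
    )
    return parse_prediction(response["Body"].read().decode("utf-8"))


# The pure helpers can be exercised without an AWS account.
print(build_csv_payload([2, "churner", 97891.783281]))
print(parse_prediction("False\n"))
```

A call such as predict_churn("PlayerChurn-Endpoint", feature_values) would then return the label for one player, provided the feature order matches the training data.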
Each deployment of the guidance is specific to a unique business case and its supporting labeled dataset. For each use case, update the constants.py file with the variables specific to the use case and the dataset, and then deploy the CDK application, as shown in the Deployment Steps section.
To generate a newer version of the deployed model from newer data, simply add the updated dataset to the Amazon S3 bucket. The SageMaker Endpoint will be automatically updated with the newer version of the best model.
There are two options for deleting the deployed guidance:
-
Using the AWS console
- Open the AWS CloudFormation console, and select the AWS region into which you have deployed the guidance.
- Select the radio button for the deployed stack, e.g. PlayerChurn-Stack.
- Click the Delete button to start the stack deletion.
- Click Delete to confirm stack deletion.
-
Using the CDK CLI
- Using the Cloud9 terminal window, change to the root of the cloned repository:
cd ~/environment/player-insights
- Run the command to delete the CloudFormation stack:
cdk destroy
- When prompted, Are you sure you want to delete, enter y to confirm stack deletion.
Deleting the deployed resources will not delete the Amazon S3 bucket, in order to protect any training data already stored. See the Deleting a bucket section of the Amazon Simple Storage Service user guide for the various ways to delete the S3 bucket. Additionally, deleting the deployed resources will not delete the SageMaker Endpoint. See the Delete Endpoints and Resources section of the Amazon SageMaker developer guide on how to delete the Endpoint, Endpoint Configuration, and Models.
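Removing the remaining endpoint resources can also be scripted. The sketch below looks up the endpoint configuration and its models, then deletes the Endpoint, Endpoint Configuration, and Models in that order. The helper name delete_endpoint_resources is hypothetical, and the client argument is expected to be a boto3 SageMaker client. The deletions are permanent, so review the names before running anything like this against a real account.

```python
def delete_endpoint_resources(sagemaker_client, endpoint_name: str) -> dict:
    """Delete an endpoint, its configuration, and the models it references."""
    # Look up the related resource names before anything is deleted.
    endpoint = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sagemaker_client.describe_endpoint_config(
        EndpointConfigName=config_name
    )
    model_names = [
        variant["ModelName"] for variant in config["ProductionVariants"]
    ]

    # Delete in dependency order: endpoint, then configuration, then models.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sagemaker_client.delete_model(ModelName=model_name)

    return {
        "endpoint": endpoint_name,
        "endpoint_config": config_name,
        "models": model_names,
    }


# Example call (requires boto3 and AWS credentials):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), "PlayerChurn-Endpoint")
```

Passing the client in as an argument keeps the function easy to dry-run against a stub before pointing it at a real account.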