# 🎮 GameRx | App Data Preparation  

**Goal:**  
Prepare small, clean datasets for the Streamlit app.  
No analysis. No modeling. Just organize, filter, and export.

### What This Notebook Does
- Loads the final master dataset  
- Creates lightweight slices for each relief tag  
- Prepares helper files the app needs  
- Cleans column names for consistency  
- Saves everything into the `app_data` folder

### Why This Matters
These files are what the app will read directly.  
They keep loading fast, the UI smooth, and the logic simple.

### Outputs
- `12_comfort_games.csv`  
- `12_catharsis_games.csv`  
- `12_distraction_games.csv`  
- `12_validation_games.csv`  
- `12_light_master_dataset.csv` (lightweight copy with key columns)  
- `12_data_dictionary.csv`

Clean, small, and app-ready.

---

## Table of Contents
1. [Setup & Imports](#1-setup--imports)  
2. [Load Final Master Dataset](#2-load-final-master-dataset)  
3. [Column Cleanup](#3-column-cleanup)  
4. [Create Relief Tag Slices](#4-create-relief-tag-slices) (Comfort, Catharsis, Distraction, Validation)  
5. [Build Helper Files](#5-build-helper-files)  
6. [Export App-Ready Files](#6-export-app-ready-files)  
7. [Quick Review](#7-quick-review)

---

## 1. Setup & Imports  

#### What this section does  
- Loads the tools we need  
- Keeps everything simple and clean  
- Uses only pandas, NumPy, and `os`  

#### Quick note  
Nothing complex here.  
Just getting the notebook ready to work.

In [1]:
# Setup and Imports

import pandas as pd
import numpy as np
import os

# Display settings (optional, helps keep tables readable)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 1200)

---

## 2. Load Final Master Dataset  

### What this step does  
- Loads the final cleaned dataset  
- Serves as the single source of truth for the app  
- Provides the base for all steps in this notebook  

### Reminder  
Only one file is loaded here.  
`11_master_dataset_final.csv`
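The load cell hard-codes an absolute Windows path, which only works on one machine. A minimal sketch of a more portable load, assuming the notebook runs from the project root and that the `02 Data/cleaned/app_data` folder layout from the original path is kept. `MASTER_DIR`, `REQUIRED_COLUMNS`, and `load_master` are hypothetical names, and the required-column set is an assumption based on the columns later steps use.

```python
from pathlib import Path

import pandas as pd

# Hypothetical project-relative folder (assumes the notebook runs from the project root).
MASTER_DIR = Path("02 Data") / "cleaned" / "app_data"

# Raw column names (before cleanup) that later steps depend on; an assumed minimum set.
REQUIRED_COLUMNS = {"AppID", "Name", "hybrid_relief_tag"}


def load_master(path: Path) -> pd.DataFrame:
    """Load the master CSV and fail early if any required column is missing."""
    df = pd.read_csv(path, low_memory=False)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Master dataset is missing columns: {sorted(missing)}")
    return df


# Usage:
# df = load_master(MASTER_DIR / "11_master_dataset_final.csv")
```

Failing on a missing column here gives a clearer message than a `KeyError` several cells later.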

In [3]:
# Load the dataset
master_path = r"D:\YVC\YVC Portfolio Implementation\Data Analytics Projects\GameRx Your Digital Dose\02 Data\cleaned\app_data\11_master_dataset_final.csv"
df = pd.read_csv(master_path, low_memory=False)

# Quick shape check to confirm it loaded
print(df.shape)  # print it, since only a cell's last expression is displayed

# Preview
df.head()

Unnamed: 0,AppID,Name,Release date,About the game,Languages,Developers,Publishers,Metacritic score,User score,Positive,Negative,Recommendations,Genres,Tags,genre_list,primary_genre,genre_count,anger_per_100w,anticipation_per_100w,disgust_per_100w,fear_per_100w,joy_per_100w,sadness_per_100w,surprise_per_100w,trust_per_100w,positive_per_100w,negative_per_100w,primary_emotion,emotion_richness,normalized_intensity,relief_tag,hybrid_relief_tag,cluster_label,archetype,Average playtime forever,Average playtime two weeks,Median playtime forever,Median playtime two weeks,Categories,Release date_hyb,About the game_hyb,Languages_hyb,Metacritic score_hyb,User score_hyb,Positive_hyb,Negative_hyb,Recommendations_hyb,Average playtime forever_hyb,Average playtime two weeks_hyb,Median playtime forever_hyb,Median playtime two weeks_hyb,Developers_hyb,Publishers_hyb,Categories_hyb,Genres_hyb,Tags_hyb,genre_list_hyb,primary_genre_hyb,genre_count_hyb,Name_dup,Name_review,Review,review_score,review_votes,anger,anticipation,disgust,fear,joy,sadness,surprise,trust,positive,negative,review_words,affect_terms,affect_coverage_pct,primary_genre_relief,genre_list_emotion,primary_genre_emotion,genre_count_emotion,Name_review_emotion,Review_emotion,review_score_emotion,review_votes_emotion,review_clean,review_length,anger_emotion,anticipation_emotion,disgust_emotion,fear_emotion,joy_emotion,sadness_emotion,surprise_emotion,trust_emotion,positive_emotion,negative_emotion,review_words_emotion,affect_terms_emotion,affect_coverage_pct_emotion,anger_per_100w_emotion,anticipation_per_100w_emotion,disgust_per_100w_emotion,fear_per_100w_emotion,joy_per_100w_emotion,sadness_per_100w_emotion,surprise_per_100w_emotion,trust_per_100w_emotion,positive_per_100w_emotion,negative_per_100w_emotion,primary_emotion_emotion,emotion_richness_emotion,normalized_intensity_emotion,primary_genre_g,relief_tag_cluster,anger_per_100w_cluster,anticipation_per_100w_cluster,disgust_per_100w_cluster,fear_per_100w_cluster,joy_per_100w_cluster,sadness_per_100w_cluster,surprise_per_100w_cluster,trust_per_100w_cluster,positive_per_100w_cluster,negative_per_100w_cluster,game_display_name,AppID_str,description_preview,emotion_relief_combo,missing_metadata_flag
0,20200,Galactic Bowling,10/21/2008,Galactic Bowling is an exaggerated and stylize...,['English'],Perpetual FX Creative,Perpetual FX Creative,0,0,6,11,30,"Casual,Indie,Sports","Indie,Casual,Sports,Bowling","['Casual', 'Indie', 'Sports']",Casual,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,Comfort,1,Balanced Mixers,0,0,0,0,"Single-player,Multi-player,Steam Achievements,...",10/21/2008,Galactic Bowling is an exaggerated and stylize...,['English'],0,0,6,11,30,0,0,0,0,Perpetual FX Creative,Perpetual FX Creative,"Single-player,Multi-player,Steam Achievements,...","Casual,Indie,Sports","Indie,Casual,Sports,Bowling","['Casual', 'Indie', 'Sports']",Casual,3,,,,,,,,,,,,,,,,,,,Casual,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Casual,Comfort,1.593892,3.160882,0.869213,1.849752,3.367774,1.6556,1.720136,3.096474,6.298978,3.139139,Galactic Bowling,20200,Galactic Bowling is an exaggerated and stylize...,nan_Comfort,False
1,655370,Train Bandit,10/12/2017,THE LAW!! Looks to be a showdown atop a train....,"['English', 'French', 'Italian', 'German', 'Sp...",Rusty Moyher,Wild Rooster,0,0,53,5,12,"Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...","['Action', 'Indie']",Action,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,Catharsis,1,Balanced Mixers,0,0,0,0,"Single-player,Steam Achievements,Full controll...",10/12/2017,THE LAW!! Looks to be a showdown atop a train....,"['English', 'French', 'Italian', 'German', 'Sp...",0,0,53,5,12,0,0,0,0,Rusty Moyher,Wild Rooster,"Single-player,Steam Achievements,Full controll...","Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...","['Action', 'Indie']",Action,2,,,,,,,,,,,,,,,,,,,Action,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Action,Catharsis,1.187622,2.922583,0.667616,1.436899,3.037823,1.006064,1.38561,2.418105,5.151687,2.075129,Train Bandit,655370,THE LAW!! Looks to be a showdown atop a train....,nan_Catharsis,False
2,1732930,Jolt Project,11/17/2021,Jolt Project: The army now has a new robotics ...,"['English', 'Portuguese - Brazil']",Campião Games,Campião Games,0,0,0,0,0,"Action,Adventure,Indie,Strategy",,"['Action', 'Adventure', 'Indie', 'Strategy']",Action,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,Catharsis,1,Balanced Mixers,0,0,0,0,Single-player,11/17/2021,Jolt Project: The army now has a new robotics ...,"['English', 'Portuguese - Brazil']",0,0,0,0,0,0,0,0,0,Campião Games,Campião Games,Single-player,"Action,Adventure,Indie,Strategy",,"['Action', 'Adventure', 'Indie', 'Strategy']",Action,4,,,,,,,,,,,,,,,,,,,Action,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Action,Catharsis,1.187622,2.922583,0.667616,1.436899,3.037823,1.006064,1.38561,2.418105,5.151687,2.075129,Jolt Project,1732930,Jolt Project: The army now has a new robotics ...,nan_Catharsis,True
3,1355720,Henosis™,7/23/2020,HENOSIS™ is a mysterious 2D Platform Puzzler w...,"['English', 'French', 'Italian', 'German', 'Sp...",Odd Critter Games,Odd Critter Games,0,0,3,0,0,"Adventure,Casual,Indie","2D Platformer,Atmospheric,Surreal,Mystery,Puzz...","['Adventure', 'Casual', 'Indie']",Adventure,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,Validation,1,Balanced Mixers,0,0,0,0,"Single-player,Full controller support",7/23/2020,HENOSIS™ is a mysterious 2D Platform Puzzler w...,"['English', 'French', 'Italian', 'German', 'Sp...",0,0,3,0,0,0,0,0,0,Odd Critter Games,Odd Critter Games,"Single-player,Full controller support","Adventure,Casual,Indie","2D Platformer,Atmospheric,Surreal,Mystery,Puzz...","['Adventure', 'Casual', 'Indie']",Adventure,3,,,,,,,,,,,,,,,,,,,Adventure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Adventure,Validation,1.295086,3.343911,0.873413,1.24501,3.291786,1.061535,1.316523,2.516317,5.93775,2.443059,Henosis™,1355720,HENOSIS™ is a mysterious 2D Platform Puzzler w...,nan_Validation,False
4,1139950,Two Weeks in Painland,2/3/2020,ABOUT THE GAME Play as a hacker who has arrang...,"['English', 'Spanish - Spain']",Unusual Games,Unusual Games,0,0,50,8,17,"Adventure,Indie","Indie,Adventure,Nudity,Violent,Sexual Content,...","['Adventure', 'Indie']",Adventure,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,Validation,1,Balanced Mixers,0,0,0,0,"Single-player,Steam Achievements",2/3/2020,ABOUT THE GAME Play as a hacker who has arrang...,"['English', 'Spanish - Spain']",0,0,50,8,17,0,0,0,0,Unusual Games,Unusual Games,"Single-player,Steam Achievements","Adventure,Indie","Indie,Adventure,Nudity,Violent,Sexual Content,...","['Adventure', 'Indie']",Adventure,2,,,,,,,,,,,,,,,,,,,Adventure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Adventure,Validation,1.295086,3.343911,0.873413,1.24501,3.291786,1.061535,1.316523,2.516317,5.93775,2.443059,Two Weeks in Painland,1139950,ABOUT THE GAME Play as a hacker who has arrang...,nan_Validation,False


---

## 3. Column Cleanup  

### What this step does  
- Cleans and standardizes column names  
- Strips stray whitespace and collapses repeated underscores  
- Keeps the dataset simple for the app  

### Why this matters  
Clean columns reduce errors  
and keep the next steps smooth.

In [4]:
# Clean column names
# Strip edge whitespace, lowercase, replace spaces with underscores,
# and collapse any run of underscores into a single one
df.columns = (
    df.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
    .str.replace(r"_+", "_", regex=True)
)

# Quick check of cleaned column names
df.columns.tolist()[:25]

['appid',
 'name',
 'release_date',
 'about_the_game',
 'languages',
 'developers',
 'publishers',
 'metacritic_score',
 'user_score',
 'positive',
 'negative',
 'recommendations',
 'genres',
 'tags',
 'genre_list',
 'primary_genre',
 'genre_count',
 'anger_per_100w',
 'anticipation_per_100w',
 'disgust_per_100w',
 'fear_per_100w',
 'joy_per_100w',
 'sadness_per_100w',
 'surprise_per_100w',
 'trust_per_100w']

### 🔍 Results: Column Cleanup  
The column names are now standardized and easier to work with.

- All names are lowercase  
- Spaces were replaced with underscores  
- Leading and trailing whitespace was stripped  
- Formatting is consistent across the dataset  

This makes filtering, slicing, and exporting more reliable in the next steps.
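The cleanup cell strips edge whitespace, lowercases, and replaces spaces with underscores; it does not touch other punctuation such as colons or dashes. If such characters ever appear in raw column names, a slightly stricter variant could handle them. A minimal sketch, assuming names should contain only lowercase letters, digits, and single underscores; `clean_column_names` is a hypothetical helper, not part of this notebook.

```python
import pandas as pd


def clean_column_names(columns: pd.Index) -> pd.Index:
    """Lowercase names, turn any run of non-alphanumeric characters into "_",
    and trim underscores from both ends."""
    cleaned = (
        columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)  # spaces, colons, dashes, "__" -> "_"
        .str.strip("_")
    )
    # Different raw names can collapse to the same cleaned name; catch that early.
    if not cleaned.is_unique:
        dupes = cleaned[cleaned.duplicated()].unique().tolist()
        raise ValueError(f"Cleaning produced duplicate column names: {dupes}")
    return cleaned


# Usage:
# df.columns = clean_column_names(df.columns)
```

The duplicate check matters because the dataset already has near-identical names (for example `Release date` and `Release date_hyb`), and a stricter rule makes accidental collisions more likely.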

---

## 4. Create Relief Tag Slices  

### Goal  
Build small filtered datasets for each relief category.  
These files will be used directly in the app.

### Relief Groups Created  
- Comfort  
- Catharsis  
- Distraction  
- Validation  

### Why this matters  
Each group loads faster in the app  
and keeps the recommendation flow simple.

In [7]:
# Create relief tag slices (updated to match capitalized tags)

comfort_df = df[df["hybrid_relief_tag"] == "Comfort"].copy()
catharsis_df = df[df["hybrid_relief_tag"] == "Catharsis"].copy()
distraction_df = df[df["hybrid_relief_tag"] == "Distraction"].copy()
validation_df = df[df["hybrid_relief_tag"] == "Validation"].copy()

# Quick checks
comfort_df.shape, catharsis_df.shape, distraction_df.shape, validation_df.shape

((35078, 130), (75804, 130), (3801, 130), (22830, 130))

### 🔍 Results: Relief Tag Slices  

The relief groups were created successfully.  
Row counts per group (each slice keeps all 130 columns):

- Comfort: 35,078 games  
- Catharsis: 75,804 games  
- Distraction: 3,801 games  
- Validation: 22,830 games  

The slices are now ready to be exported in Section 6.
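The four filters above repeat the same pattern, and nothing checks whether some rows fell outside all four tags (for example a missing `hybrid_relief_tag`). A minimal sketch that builds every slice in one loop and reports unassigned rows; `RELIEF_TAGS` and `build_relief_slices` are hypothetical names.

```python
import pandas as pd

# The four capitalized tags used in hybrid_relief_tag.
RELIEF_TAGS = ["Comfort", "Catharsis", "Distraction", "Validation"]


def build_relief_slices(df: pd.DataFrame, tag_col: str = "hybrid_relief_tag") -> dict[str, pd.DataFrame]:
    """Return one filtered copy per relief tag and report rows that match none."""
    slices = {tag: df[df[tag_col] == tag].copy() for tag in RELIEF_TAGS}
    covered = sum(len(s) for s in slices.values())
    unassigned = len(df) - covered
    if unassigned:
        # Rows with a missing or unexpected tag end up in no slice.
        print(f"{unassigned} rows did not match any relief tag")
    return slices


# Usage:
# slices = build_relief_slices(df)
# comfort_df = slices["Comfort"]
```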

---

## 5. Build Helper Files  

### Purpose  
Create small support files the app can use for quick lookups.  
These files help with labels, descriptions, and basic structure.

### Files created in this step  
- A data dictionary listing every column (descriptions can be filled in later)  
- A lightweight master copy with only the key columns the app needs  

### Why this matters  
Helper files keep the app fast  
and make the interface easier to build and maintain.
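The data dictionary built below has one row per column and an empty `description`. A minimal sketch of a slightly richer version that also records each column's dtype and how much of it is missing, which helps decide which fields the app can safely show. `build_data_dictionary` is a hypothetical helper, and the extra columns are assumptions about what is useful; descriptions still need to be written by hand.

```python
import pandas as pd


def build_data_dictionary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: name, dtype, non-null count, percent missing, blank description."""
    return pd.DataFrame({
        "column_name": df.columns,
        "dtype": df.dtypes.astype(str).values,
        "non_null_count": df.notna().sum().values,
        "null_pct": (df.isna().mean() * 100).round(1).values,
        "description": "",  # to be filled in manually
    })


# Usage:
# data_dictionary = build_data_dictionary(df)
```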

In [10]:
# Build Helper Files

# Create a simple data dictionary
data_dictionary = pd.DataFrame({
    "column_name": df.columns,
    "description": [""] * len(df.columns)   # descriptions can be filled in later if needed
})

# Quick preview (printed, since only a cell's last expression is displayed)
print(data_dictionary.head())


# Create a lightweight master copy using only the columns that exist in the dataset

light_master = df[[
    # Core identifiers
    "appid",
    "appid_str",
    "name",
    "game_display_name",

    # Metadata
    "primary_genre",
    "genre_count",
    "average_playtime_forever",
    "median_playtime_forever",
    "metacritic_score",
    "user_score",

    # Relief and emotion tags
    "hybrid_relief_tag",
    "primary_emotion",
    "emotion_relief_combo",

    # Emotion intensities
    "joy_per_100w",
    "trust_per_100w",
    "sadness_per_100w",
    "fear_per_100w",
    "anger_per_100w",
    "anticipation_per_100w",
    "disgust_per_100w",
    "surprise_per_100w",

    # Text fields
    "description_preview",
    "about_the_game"
]].copy()

# Quick preview
light_master.head()

Unnamed: 0,appid,appid_str,name,game_display_name,primary_genre,genre_count,average_playtime_forever,median_playtime_forever,metacritic_score,user_score,hybrid_relief_tag,primary_emotion,emotion_relief_combo,joy_per_100w,trust_per_100w,sadness_per_100w,fear_per_100w,anger_per_100w,anticipation_per_100w,disgust_per_100w,surprise_per_100w,description_preview,about_the_game
0,20200,20200,Galactic Bowling,Galactic Bowling,Casual,3,0,0,0,0,Comfort,,nan_Comfort,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Galactic Bowling is an exaggerated and stylize...,Galactic Bowling is an exaggerated and stylize...
1,655370,655370,Train Bandit,Train Bandit,Action,2,0,0,0,0,Catharsis,,nan_Catharsis,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,THE LAW!! Looks to be a showdown atop a train....,THE LAW!! Looks to be a showdown atop a train....
2,1732930,1732930,Jolt Project,Jolt Project,Action,4,0,0,0,0,Catharsis,,nan_Catharsis,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Jolt Project: The army now has a new robotics ...,Jolt Project: The army now has a new robotics ...
3,1355720,1355720,Henosis™,Henosis™,Adventure,3,0,0,0,0,Validation,,nan_Validation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,HENOSIS™ is a mysterious 2D Platform Puzzler w...,HENOSIS™ is a mysterious 2D Platform Puzzler w...
4,1139950,1139950,Two Weeks in Painland,Two Weeks in Painland,Adventure,2,0,0,0,0,Validation,,nan_Validation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,ABOUT THE GAME Play as a hacker who has arrang...,ABOUT THE GAME Play as a hacker who has arrang...


### 🔍 Results: Helper Files Created  

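The lightweight master copy was created with 23 of the 130 columns.  
The preview shows the key fields the app needs.

- Core identifiers (`appid`, `appid_str`, `name`, `game_display_name`) are included  
- Primary genre and genre count are present  
- Each game carries its `hybrid_relief_tag`  
- Emotion intensity columns are present, but the first five rows all show 0.0 and no `primary_emotion`, so `emotion_relief_combo` reads `nan_<tag>` for them; worth checking how common this is before the app displays these fields  
- Description fields are available for the app UI  

The data dictionary lists all 130 columns; its descriptions are still blank.

This smaller table should help the app load faster  
and keep the interface simple and responsive.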
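To check the claim that the light copy is meaningfully smaller, one option is to compare in-memory sizes. A minimal sketch using `DataFrame.memory_usage(deep=True)`; `compare_memory` is a hypothetical helper, and in-memory size is only a proxy, since CSV load time also depends on file size and parsing.

```python
import pandas as pd


def compare_memory(full: pd.DataFrame, light: pd.DataFrame) -> pd.Series:
    """In-memory size in MB of two frames, plus the share the light copy keeps."""
    full_mb = full.memory_usage(deep=True).sum() / 1_000_000
    light_mb = light.memory_usage(deep=True).sum() / 1_000_000
    return pd.Series({
        "full_mb": round(full_mb, 2),
        "light_mb": round(light_mb, 2),
        "light_share_pct": round(100 * light_mb / full_mb, 1),
    })


# Usage:
# compare_memory(df, light_master)
```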

---

## 6. Export App-Ready Files  

### Goal  
Save the final cleaned files that the app will load directly.

### Files exported in this step  
- Relief tag slices  
- Lightweight master file  
- Data dictionary  

### Why this matters  
Exporting these files keeps the app fast, organized, and easy to maintain.
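The export cell below writes each file with its own `to_csv` call. A minimal sketch of the same idea as a loop, which also creates the target folder if it does not exist yet; `export_frames` is a hypothetical helper, and the `12_` prefix mirrors the file names used below.

```python
import os

import pandas as pd


def export_frames(frames: dict[str, pd.DataFrame], export_dir: str, prefix: str = "12_") -> list[str]:
    """Write each DataFrame to <export_dir>/<prefix><name>.csv and return the written paths."""
    os.makedirs(export_dir, exist_ok=True)  # create the folder if it is missing
    paths = []
    for name, frame in frames.items():
        path = os.path.join(export_dir, f"{prefix}{name}.csv")
        frame.to_csv(path, index=False)
        paths.append(path)
    return paths


# Usage:
# export_frames({
#     "comfort_games": comfort_df,
#     "catharsis_games": catharsis_df,
#     "distraction_games": distraction_df,
#     "validation_games": validation_df,
#     "light_master_dataset": light_master,
#     "data_dictionary": data_dictionary,
# }, export_path)
```

Keeping the file names in one mapping makes it harder for the notebook and the app to drift apart.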

In [11]:
# Export App-Ready Files (Notebook 12)

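# Set export path (the same app_data folder the master dataset was loaded from)
export_path = r"D:\YVC\YVC Portfolio Implementation\Data Analytics Projects\GameRx Your Digital Dose\02 Data\cleaned\app_data"
os.makedirs(export_path, exist_ok=True)  # no-op if the folder already exists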

# ---------------------------------
# Export relief tag slices
# ---------------------------------
comfort_df.to_csv(os.path.join(export_path, "12_comfort_games.csv"), index=False)
catharsis_df.to_csv(os.path.join(export_path, "12_catharsis_games.csv"), index=False)
distraction_df.to_csv(os.path.join(export_path, "12_distraction_games.csv"), index=False)
validation_df.to_csv(os.path.join(export_path, "12_validation_games.csv"), index=False)

# ---------------------------------
# Export lightweight master file
# ---------------------------------
light_master.to_csv(os.path.join(export_path, "12_light_master_dataset.csv"), index=False)

# ---------------------------------
# Export data dictionary
# ---------------------------------
data_dictionary.to_csv(os.path.join(export_path, "12_data_dictionary.csv"), index=False)

# Confirmation
print("All Notebook 12 files exported successfully.")

All Notebook 12 files exported successfully.
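The confirmation message only shows that no `to_csv` call raised an error; it does not check what landed on disk. A minimal sketch of a read-back check that reloads each exported CSV and compares its shape with the in-memory frame; `verify_exports` is a hypothetical helper.

```python
import os

import pandas as pd


def verify_exports(expected: dict[str, pd.DataFrame], export_dir: str) -> pd.DataFrame:
    """Reload each exported CSV and compare its shape with the frame that was written."""
    rows = []
    for filename, frame in expected.items():
        path = os.path.join(export_dir, filename)
        exists = os.path.exists(path)
        reloaded_shape = pd.read_csv(path, low_memory=False).shape if exists else None
        rows.append({
            "file": filename,
            "exists": exists,
            "expected_shape": frame.shape,
            "reloaded_shape": reloaded_shape,
            "match": reloaded_shape == frame.shape,
        })
    return pd.DataFrame(rows)


# Usage:
# verify_exports({"12_comfort_games.csv": comfort_df, "12_light_master_dataset.csv": light_master}, export_path)
```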


---

## 7. Quick Review  

Everything for the app is now organized and exported.  
The files are clean, consistently named, and ready for the app to load directly.

### What was prepared  
- relief tag slices for all four categories  
- a lightweight master dataset for fast app performance  
- a data dictionary for easy reference  
- consistent column names and structure  
- clean text previews for the display

### Why this matters  
Having these files in the `app_data` folder keeps the app simple and stable.  
It creates a clear data flow the app can rely on without extra processing.
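On the app side, this data flow could look like reading one slice per relief tag and caching it. A minimal sketch, assuming the exported file names from Section 6 and a hypothetical `APP_DATA_DIR`; `SLICE_FILES` and `load_relief_slice` are also hypothetical. In a Streamlit app, `st.cache_data` is the usual caching decorator; `functools.lru_cache` is used here only to keep the sketch free of extra dependencies.

```python
import os
from functools import lru_cache

import pandas as pd

# Hypothetical folder the app reads from; point it at your app_data directory.
APP_DATA_DIR = "app_data"

# Relief tag -> file exported by this notebook.
SLICE_FILES = {
    "Comfort": "12_comfort_games.csv",
    "Catharsis": "12_catharsis_games.csv",
    "Distraction": "12_distraction_games.csv",
    "Validation": "12_validation_games.csv",
}


@lru_cache(maxsize=None)
def load_relief_slice(tag: str, data_dir: str = APP_DATA_DIR) -> pd.DataFrame:
    """Read one relief slice once per process; later calls return the cached frame."""
    if tag not in SLICE_FILES:
        raise ValueError(f"Unknown relief tag: {tag!r}")
    return pd.read_csv(os.path.join(data_dir, SLICE_FILES[tag]), low_memory=False)
```

`lru_cache` returns the same DataFrame object on every call, so callers should `.copy()` it before modifying it; `st.cache_data` instead hands back a fresh copy each time.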

### ➡️ Next Step  
Move into `13_build_recommendation_engine.ipynb`  

This will:  
- generate fit percentages  
- create recommendation logic  
- add explanation text  
- build the ranking functions the app will use  

Simple, organized, and ready for the next phase.