# How GPU Specifications Influence Gaming Performance

### 1. Introduction  

The relationship between hardware and gaming performance is inherently complex, shaped by multiple factors including hardware specifications, display resolution, graphical settings, and the characteristics of individual games. Within the hardware domain, both the central processing unit (CPU) and the graphics processing unit (GPU) contribute significantly to performance outcomes. However, the GPU plays the dominant role in modern gaming, as it is primarily responsible for rendering complex visuals and executing the parallelized computations required for real-time graphics.

This study isolates the GPU and investigates how its technical specifications influence frames per second (FPS), a common measure of gaming performance. By focusing on GPU-level features, the analysis aims to clarify the specific contribution of graphics hardware while controlling for the broader complexity of full system configurations.

### 2. Research Question

    How do GPU specifications influence gaming performance (FPS), and how does this relationship change across different resolutions and graphics settings?

### 3. Data and Features

The dataset used in this analysis includes GPU hardware specifications alongside measured FPS values across multiple games, resolutions, and graphical settings. The FPS benchmarks come from https://www.kaggle.com/datasets/baraazaid/gpus-fps-on-games?resource=download, and the hardware specifications were scraped from https://www.techpowerup.com/.

- Hardware Features (continuous variables)
    - Process Size  
    - Transistors  
    - Density  
    - Die_size  
    - Base Clock  
    - Memory Size  
    - Memory Bus  
    - Bandwidth  
    - Shading Units  
    - Texture Mapping Units (TMUs)
    - Render Output Units (ROPs)  
    - L1_cache  
    - L2_cache  
    - Directx  
    - Thermal Design Power (TDP)  
    - Memory Clock  
    - Fp32  
    - Fp64  
    - Pixel Rate  
    - Texture Rate  

- Categorical Features
    - Architecture (build type)
    - Memory Type
    - Game Title
    - Graphic Setting (low, med, high, ultra)
    - Resolution (e.g. 1920x1080)

- Target Variable (continuous)
    - Average Frames Per Second (Hz)

### 4. Preprocessing and Feature Engineering

1. **Raw Data Sources**
    - **Hardware Specifications Table**
        - Raw data scraped for each GPU Name in the FPS Benchmark dataset
        - Data stored in a dictionary with one entry per GPU and corresponding specifications
    - **Benchmark FPS Table**
        - Downloaded raw json from Kaggle

2. **Feature Selection and Cleaning per Table**
    - **Hardware Specifications**
        - Removed features not relevant to performance modeling
        - Stripped the units from the values in each column and moved them into the column headers
        - Converted all numeric columns to float
    - **Benchmark FPS Table**
        - Extracted unique GPU Names 
        - Included meaningful variables (*GPU_Name*, *Game_Name*, *Avg_FPS*, *Setting*, *Resolution*)
        - GPU Name set as index with one entry for each combination of (*Game_Name* X *Setting* X *Resolution*) with *Avg_FPS* listed
        - ~820 observations for each GPU

3. **Merging Tables**
    - Inner-joined the hardware specifications table to the benchmark table on GPU Name
    - Set GPU Name as the index

4. **Feature Engineering**
    - Kept categorical variables (*architecture*, *memory_type*, *Game_Name*, *Resolution*, *Setting*) as strings
    - Limited *architecture* and *memory type* to the top 5 most common with all others grouped as "other"
    - Categorized separate feature lists
        - Numerical hardware features
        - Categorical hardware features
        - Software features (*Resolution*, *Setting*)
        - All hardware features
        - All features
    - Applied one-hot encoding for categorical variables
    - Dropped Avg_FPS (outcome variable) from predictors
    - Dropped Game_Name from predictors
        - The focus was to analyze how hardware specifications, *resolution*, and *setting* are correlated with performance
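
The cleaning, merging, and encoding steps above could be sketched roughly as follows with pandas. The two tiny tables, their values, and the helper names (`move_units_to_header`, `keep_top_n`) are hypothetical stand-ins for the scraped specifications and the Kaggle benchmarks; the real column names and unit formats in the project may differ.

```python
import pandas as pd

# Hypothetical stand-in for the scraped specifications (illustrative values only).
specs = pd.DataFrame(
    {
        "GPU_Name": ["GPU A", "GPU B", "GPU C"],
        "Memory Size": ["8 GB", "12 GB", "16 GB"],
        "Bandwidth": ["448.0 GB/s", "360.0 GB/s", "512.0 GB/s"],
        "architecture": ["Arch X", "Arch Y", "Arch Z"],
    }
)

# Hypothetical stand-in for the benchmark table.
bench = pd.DataFrame(
    {
        "GPU_Name": ["GPU A", "GPU A", "GPU B", "GPU D"],
        "Game_Name": ["Game 1", "Game 2", "Game 1", "Game 1"],
        "Setting": ["low", "ultra", "low", "low"],
        "Resolution": ["1920x1080", "3840x2160", "1920x1080", "1920x1080"],
        "Avg_FPS": [144.0, 41.0, 120.0, 95.0],
    }
)


def move_units_to_header(df, columns):
    """Split '<number> <unit>' strings: the number becomes a float, the unit joins the header."""
    out = df.copy()
    for col in columns:
        parts = out[col].str.extract(r"^\s*([\d.]+)\s*(.*)$")
        unit = parts[1].iloc[0]  # assumes one unit per column
        out[f"{col} ({unit})"] = parts[0].astype(float)
        out = out.drop(columns=col)
    return out


def keep_top_n(series, n=5, other="other"):
    """Keep the n most frequent categories and group the rest under one label."""
    top = series.value_counts().nlargest(n).index
    return series.where(series.isin(top), other)


specs_clean = move_units_to_header(specs, ["Memory Size", "Bandwidth"])

# Inner join keeps only GPUs present in both tables (GPU C and GPU D drop out).
merged = bench.merge(specs_clean, on="GPU_Name", how="inner")
merged["architecture"] = keep_top_n(merged["architecture"], n=5)
merged = merged.set_index("GPU_Name")

# Predictors exclude the outcome (Avg_FPS) and Game_Name, then are one-hot encoded.
y = merged["Avg_FPS"]
X = pd.get_dummies(
    merged.drop(columns=["Avg_FPS", "Game_Name"]),
    columns=["architecture", "Setting", "Resolution"],
)
```

Grouping rare categories before encoding keeps the number of one-hot columns small, which is presumably the motivation for the top-5 limit described above.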

### 5. Model Training

1. **Train/Test Splitting**
    - The merged dataset was split into training (80%) and test (20%) subsets. StandardScaler was applied to standardize the features to zero mean and unit variance so that they are on a comparable scale

2. **Modeling Approaches**
    - Two model families were evaluated:
        - Linear regression as a baseline
        - XGBoost as a non-linear, tree-based model that can capture feature interactions

3. **Hyperparameter Optimization**
    - Hyperparameters for the XGBoost model (e.g., learning rate, depth, subsampling) were tuned using randomized search with five-fold cross-validation on the training set
    - The linear model was used in its default form

4. **Evaluation Metrics**
    - Model performance was assessed on the test set using R² and mean squared error (MSE)
    - To enhance interpretability, feature importance rankings were generated for tree-based models
    - SHAP values were used to quantify each feature’s contribution to FPS predictions
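
The training procedure above might look roughly like the following sketch. It is an illustration under stated assumptions, not the project's actual code: the data are synthetic, the search grid is invented, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so the example depends only on scikit-learn. `xgboost.XGBRegressor` follows the same scikit-learn estimator interface and could be substituted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix and the Avg_FPS target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 60 + 10 * X[:, 0] - 5 * X[:, 1] * X[:, 2] + rng.normal(scale=2.0, size=400)

# 80/20 split into training and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline: standardize features, then fit linear regression with default settings.
linear = make_pipeline(StandardScaler(), LinearRegression())
linear.fit(X_train, y_train)

# Randomized search with five-fold cross-validation, using the training set only.
# The candidate values below are assumptions chosen for illustration.
param_distributions = {
    "learning_rate": [0.03, 0.1, 0.3],
    "max_depth": [2, 3, 4],
    "subsample": [0.6, 0.8, 1.0],
    "n_estimators": [50, 100],
}
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=5,
    scoring="r2",
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate both models on the held-out test set with R² and MSE.
results = {}
for name, model in {"linear": linear, "boosted": search.best_estimator_}.items():
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }
```

Wrapping the scaler and the linear model in one pipeline means the scaling statistics are learned from the training rows only, which avoids leaking test-set information into the fit.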

### 6. Analysis

The analysis was structured to evaluate how different subsets of features influenced predictive performance. For each subset, results are reported for both the baseline linear regression model and the XGBoost model, allowing for direct comparison between linear and non-linear approaches.
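
A comparison of this kind could be organized as in the sketch below, which fits both model types on each feature subset using one shared train/test split. The function name `compare_subsets`, the synthetic data, and the column groupings are hypothetical, and scikit-learn's `GradientBoostingRegressor` is used as a stand-in for XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_subsets(X, y, subsets, seed=0):
    """Return test-set R² per (subset, model), using one shared train/test split."""
    idx_train, idx_test = train_test_split(
        np.arange(len(y)), test_size=0.2, random_state=seed
    )
    scores = {}
    for subset_name, cols in subsets.items():
        models = {
            "linear": LinearRegression(),
            "boosted": GradientBoostingRegressor(random_state=seed),
        }
        for model_name, model in models.items():
            model.fit(X[idx_train][:, cols], y[idx_train])
            pred = model.predict(X[idx_test][:, cols])
            scores[(subset_name, model_name)] = r2_score(y[idx_test], pred)
    return scores


# Synthetic data: columns 0-2 play the role of hardware features,
# columns 3-4 the role of resolution and setting.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = 50 + 8 * X[:, 0] + 6 * X[:, 3] - 4 * X[:, 4] + rng.normal(scale=3.0, size=500)

subsets = {
    "hardware_only": [0, 1, 2],
    "all_features": [0, 1, 2, 3, 4],
}
scores = compare_subsets(X, y, subsets)
```

Reusing the same split for every subset keeps the R² values comparable, since each model is scored on the same held-out rows.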

1. **Hardware Specifications only**
    - This feature set included all numerical hardware variables as well as categorical descriptors such as architecture and memory_type. When trained on this subset, both models produced similar results, with R² values of 0.4132 for linear regression and 0.4241 for XGBoost. These outcomes suggest that hardware specifications alone provide a moderate but incomplete explanation of FPS performance.

    ![Feature importance for hardware-only model](images/lin_reg_cof_hardware.png)  
   *Figure 1. Feature importances (Linear Regression) for hardware-only features, highlighting memory types, architecture types, and fp64 as the most predictive.*  

    ![Feature importance for hardware-only model](images/xg_boost_importance_hardware.png)  
   *Figure 2. Feature importances (XGBoost) for hardware-only features, highlighting pixel_rate and fp32 as the most predictive.*  

2. **Game_Name only**
    - Because FPS outcomes also depend on the computational demands of individual games, the predictive value of Game_Name was analyzed independently. Using this feature alone, the models achieved R² values of 0.1854 (linear regression) and 0.1790 (XGBoost), indicating that while game identity contributes to FPS variance, it is insufficient as a standalone predictor. Because the research question concerns hardware specifications, resolution, and setting, this variable was dropped from the overall analysis.

3. **Overall Analysis (All features except Game_Name)**
    - The linear regression model was able to capture a substantial portion of the variance in FPS, achieving an R² value of 0.6236. This indicates that hardware specifications, resolution, and setting collectively explain over 60% of the observed variability, marking a significant improvement over using hardware or game identity alone.
    - The XGBoost model provided a stronger fit. With default parameters, XGBoost achieved an R² of 0.7007, outperforming linear regression by nearly eight percentage points. After hyperparameter tuning, the R² improved slightly further to 0.7034, suggesting that most of the performance gains were realized even before tuning.
    - These results highlight two important findings: first, that combining hardware and context-related features substantially improves predictive accuracy, and second, that tree-based models such as XGBoost are more effective than linear models at capturing complex, non-linear relationships in the data. Nonetheless, the relatively small gain from tuning suggests that the model is already approaching the ceiling imposed by the available features, leaving some variance unexplained by hardware and resolution alone.
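    - Game_Name alone produced only limited predictive power (R² ≈ 0.18), but excluding it from the overall feature set likely contributes to the roughly 30% of variance that the tuned XGBoost model leaves unexplained (1 − 0.7034 ≈ 0.30). R² values from separately fitted models are not additive, so the 0.18 figure cannot be read directly as the share of variance lost by omitting the feature. FPS is nonetheless highly game-dependent, and omitting game identity reduces the model's ability to capture performance differences across titles.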

4. **Model Interpretability (SHAP analysis)**
    - To better understand the drivers of FPS predictions in the tuned XGBoost model, SHAP (SHapley Additive exPlanations) analysis was conducted.
    - **Global Feature Importance**
        - A bar chart of mean absolute SHAP values (Figure 3) showed that setting, resolution, pixel_rate, fp32 throughput, transistor count, and TDP were the strongest contributors to model predictions. The prominence of setting and resolution is intuitive, given their direct role in FPS performance. More notable are the hardware-related factors, particularly fp32 throughput, transistor count, and TDP, which highlight the importance of raw computational power and power budget in explaining frame rate variability.
    ![SHAP Values XG Boost](images/xg_boost_SHAP_bar.png)  
   *Figure 3. Mean absolute SHAP values for the tuned XGBoost model, showing the strongest global contributors.*  

    - **Distribution of Contributions**
        - The SHAP beeswarm plot (Figure 4) complements the global ranking by illustrating the spread of feature effects across individual samples. Features such as pixel_rate and fp32 throughput exhibit wide variation, indicating strong context-dependent effects. In contrast, features like TDP show more stable contributions, reinforcing their consistent role in performance prediction.

    ![SHAP Values XG Boost](images/xg_boost_SHAP_beeswarm.png)  
   *Figure 4. SHAP beeswarm plot for the tuned XGBoost model, showing the distribution of feature contributions across observations.*  

    - **Contextual Trends by Resolution and Setting**
        - To explore how feature importance changes under different usage contexts, SHAP values were averaged across Resolution and Setting categories. This produced trend plots (Figure 5) that highlight systematic shifts in predictive drivers.
        - **Resolution Trends**
            - At the lowest resolution (1920×1080), pixel_rate and fp32 throughput are the dominant contributors to FPS prediction. As resolution increases, the importance of transistors and TDP rises, which is consistent with the growing computational and power demands of rendering more pixels. Notably, the importance of pixel_rate peaks at 3440×1440 before declining at 4K, which may indicate a bottleneck where other resources become limiting.
        - **Settings Trends**
            - Across graphical settings (low → ultra), the importance of pixel_rate and fp32 throughput steadily declines. This suggests that at more demanding settings, raw compute throughput alone is insufficient; instead, aggregate hardware capacity (transistors, TDP) plays a stronger role.

    ![SHAP Values XG Boost](images/xg_boost_SHAP_MA_lines.png)  
   *Figure 5. Mean absolute SHAP values aggregated by resolution (left) and setting (right)*  
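
The global ranking and the per-resolution and per-setting trends described in the SHAP analysis rest on the same statistic, the mean absolute SHAP value, computed either over all observations or within each category. A minimal sketch of that aggregation is shown below. The `shap_values` array here is filled with random placeholder numbers, not real results; in practice it would be the per-observation SHAP matrix produced by an explainer from the shap library. The feature subset and the context labels are likewise hypothetical.

```python
import numpy as np
import pandas as pd

feature_names = ["pixel_rate", "fp32", "transistors", "tdp"]

# Placeholder standing in for per-observation SHAP values
# (rows = observations, columns = features); random numbers, not real results.
rng = np.random.default_rng(0)
shap_values = rng.normal(size=(12, len(feature_names)))

# Context labels for the same observations (hypothetical ordering).
context = pd.DataFrame(
    {
        "Resolution": ["1920x1080", "3440x1440", "3840x2160"] * 4,
        "Setting": ["low", "med", "high", "ultra"] * 3,
    }
)

abs_shap = pd.DataFrame(np.abs(shap_values), columns=feature_names)

# Global importance: mean absolute SHAP value per feature, largest first.
global_importance = abs_shap.mean().sort_values(ascending=False)

# Contextual trends: the same statistic computed within each category.
by_resolution = abs_shap.groupby(context["Resolution"]).mean()
setting_order = ["low", "med", "high", "ultra"]
by_setting = abs_shap.groupby(context["Setting"]).mean().reindex(setting_order)
```

Each row of `by_resolution` or `by_setting` corresponds to one point per feature on a trend line, so plotting the columns against the ordered index would give charts of the kind shown in Figure 5.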
