# CSCA 5622 Final Project - Consumer PC Hardware Trends and Predictions
### By Moshiur Howlader

## Introduction

In the digital age, computer hardware is ubiquitous, and its performance continues to improve year by year. Intel co-founder Gordon Moore offered an influential observation about how computers would evolve, now known as **Moore's Law** ([see Wikipedia](https://en.wikipedia.org/wiki/Moore%27s_law)): the number of transistors in an integrated circuit (IC) doubles approximately every two years. The chart below illustrates the trend from 1970 to 2020:

<br><br>
<img src="../images/moores_law_transistor_count_1970_2020.png" alt="Transistor count over time" width="1200" height="800">
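
As a rough illustration of the doubling rule, the sketch below projects transistor counts forward from a baseline chip. The function name `projected_transistors` and the choice of the Intel 4004 (about 2,300 transistors, released in 1971) as the baseline are assumptions made for this example; they are not part of the project's dataset.

```python
def projected_transistors(year, base_year=1971, base_count=2300, doubling_years=2.0):
    """Transistor count predicted by an idealized Moore's Law curve."""
    doublings = (year - base_year) / doubling_years
    return base_count * 2 ** doublings


# Print the idealized projection for a few sample years
for year in (1971, 1981, 2001, 2021):
    print(year, f"{projected_transistors(year):,.0f}")
```

Under these assumptions the idealized curve gives roughly 77 billion transistors for 2021. Real chips deviate from the curve, which is exactly what the rest of this project examines.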

Based on Moore's Law, consumers might expect computer hardware with double the transistors every two years, leading to predictable and consistent increases in computing power. However, the reality is far more complex. As more transistors are crammed into a fixed area, **quantum physics** begins to interfere, imposing physical limitations. These constraints prevent engineers from following Moore's Law indefinitely. According to [nano.gov](https://www.nano.gov/nanotech-101/what/nano-size), a single gold atom is about 1/3 nm in diameter, so a feature a few nanometers wide is only a handful of atoms across. Clearly, there is a limit to how many transistors can be packed into computer parts. Below are the trends in chip lithography size according to Wikipedia:

| Feature Size | Year |
|--------------|------|
| 20 μm        | 1968 |
| 10 μm        | 1971 |
| 6 μm         | 1974 |
| 3 μm         | 1977 |
| 1.5 μm       | 1981 |
| 1 μm         | 1984 |
| 800 nm       | 1987 |
| 600 nm       | 1990 |
| 350 nm       | 1993 |
| 250 nm       | 1996 |
| 180 nm       | 1999 |
| 130 nm       | 2001 |
| 90 nm        | 2003 |
| 65 nm        | 2005 |
| 45 nm        | 2007 |
| 32 nm        | 2009 |
| 28 nm        | 2010 |
| 22 nm        | 2012 |
| 14 nm        | 2014 |
| 10 nm        | 2016 |
| 7 nm         | 2018 |
| 5 nm         | 2020 |
| 3 nm         | 2022 |
| 2 nm         | ~2025 (Future) |
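
To put a number on the table above, the following sketch estimates how long each halving of feature size took between two of its rows. It assumes, as a rough idealization, that transistor density scales with the inverse square of the feature size; recent node names are partly marketing labels, so the result is indicative only. The helper name `years_per_halving` is made up for this example.

```python
import math

# (feature size in nm, year) taken from two rows of the table above
start = (10_000, 1971)  # 10 μm
end = (5, 2020)         # 5 nm


def years_per_halving(start, end):
    """Average number of years it took for the feature size to halve."""
    (size0, year0), (size1, year1) = start, end
    halvings = math.log2(size0 / size1)
    return (year1 - year0) / halvings


halving = years_per_halving(start, end)
# Halving the feature size quadruples density under the inverse-square
# assumption, so density doubles in half that time.
print(f"feature size halves every {halving:.2f} years")
print(f"density doubles every {halving / 2:.2f} years (idealized)")
```

For these two rows the estimate comes out at about 4.5 years per halving of feature size, or roughly 2.2 years per doubling of density, which is close to the two-year cadence Moore's Law describes.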

According to Jensen Huang, the CEO of Nvidia, **Moore's Law is dead** ([TechSpot article](https://www.techspot.com/news/96094-nvidia-jensen-huang-once-again-claims-moore-law.html)). This statement seems reasonable given the physical limitations of current chip designs. As the rate of improvement in transistor count decreases year over year, will consumers start paying more for diminishing performance gains?

## Why Should Consumers Care About the Death of Moore's Law?

With the decline of Moore's Law, we can expect smaller improvements in transistor density with each upcoming generation. As traditional computing approaches its physical limits, incremental gains will shrink, and consumers may end up paying more for less. That outcome potentially benefits corporations more than the people buying the hardware, which is undesirable.

## The Economic Reality Today

Inflation has steadily eroded purchasing power in the USA over the last 50 years. As prices rise, the nominal cost of consumer goods, including technology, increases, and affordability suffers whenever incomes fail to keep pace. Here are links to inflation-related data:

- [Purchasing power of the US dollar over time](https://elements.visualcapitalist.com/purchasing-power-of-the-u-s-dollar-over-time/)
- [America's growing rent burden](https://www.axios.com/2023/05/22/americas-growing-rent-burden)

## Purpose of This Project

This project aims to answer the following key questions:

1. What are the trends in CPU and GPU parts over the past 20 years?
2. Is the price-to-performance ratio of these parts keeping up? Are consumers getting a fair deal compared to 10 to 20 years ago?
3. Can we predict the performance of next-gen, unreleased CPU and GPU parts using supervised machine learning models?
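
Question 2 depends on comparing prices across years, which only makes sense after adjusting for inflation. A minimal sketch of that comparison is below. The function names, the CPI figures, and the two example parts are placeholders invented for illustration; none of them are values from the dataset.

```python
def real_price(nominal_price, cpi_then, cpi_now):
    """Convert a historical price into today's dollars using a CPI ratio."""
    return nominal_price * (cpi_now / cpi_then)


def performance_per_dollar(score, nominal_price, cpi_then, cpi_now):
    """Benchmark points bought per inflation-adjusted dollar."""
    return score / real_price(nominal_price, cpi_then, cpi_now)


# Placeholder inputs (NOT real data): benchmark score, launch price, CPI at launch
old_part = {"score": 1_000, "price": 300.0, "cpi": 200.0}
new_part = {"score": 8_000, "price": 600.0, "cpi": 300.0}
CPI_NOW = 300.0

old_value = performance_per_dollar(old_part["score"], old_part["price"], old_part["cpi"], CPI_NOW)
new_value = performance_per_dollar(new_part["score"], new_part["price"], new_part["cpi"], CPI_NOW)
print(f"old: {old_value:.2f} points per dollar, new: {new_value:.2f} points per dollar")
```

Dividing by the inflation-adjusted price rather than the launch price keeps an older part from looking artificially cheap next to a newer one.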


## Data Collection & Description

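The data for both CPUs and GPUs was collected from the TechPowerUp GPU and CPU spec databases (`https://www.techpowerup.com/gpu-specs/` and `https://www.techpowerup.com/cpu-specs/`), which are the URLs queried by the collection script.

Various other sources were considered, but they were either difficult to scrape or not thorough enough in data quality, so they were not used: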
- https://www.hwcompare.com/
- https://www.userbenchmark.com/Software
- https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html
- https://www.tomshardware.com/reviews/cpu-hierarchy,4312.html

The script used to collect the data is below (uncomment the entire code to run it):

In [4]:
import os
import requests
import time

# Define the relative paths to save HTML files for GPU and CPU
gpu_html_directory = os.path.join('..', 'data', 'gpu')
cpu_html_directory = os.path.join('..', 'data', 'cpu')

# Create the directories if they don't exist
for directory in [gpu_html_directory, cpu_html_directory]:
    os.makedirs(directory, exist_ok=True)

# Base URLs for TechPowerUp GPU and CPU Specs by year and manufacturer
gpu_base_url = 'https://www.techpowerup.com/gpu-specs/?mfgr={}&released={}&sort=name'
cpu_base_url = 'https://www.techpowerup.com/cpu-specs/?mfgr={}&released={}&sort=name'
cpu_mobile_filter_url = 'https://www.techpowerup.com/cpu-specs/?mfgr={}&released={}&mobile={}&sort=name'

# Intel hard-coded queries for 2023 (socket types)
intel_2023_queries = [
    'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2023&mobile=No&socket=Intel%20BGA%202579&sort=name',
    'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2023&mobile=No&socket=Intel%20Socket%201700&sort=name',
    'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2023&mobile=No&socket=Intel%20Socket%204677&sort=name'
]

# Intel hard-coded queries for 2021, 2014, and 2012 (split by the server filter)
intel_server_queries = {
    2021: [
        'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2021&mobile=No&server=No&sort=name',
        'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2021&mobile=No&server=Yes&sort=name'
    ],
    2014: [
        'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2014&mobile=No&server=No&sort=name',
        'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2014&mobile=No&server=Yes&sort=name'
    ],
    2012: [
        'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2012&mobile=No&server=No&sort=name',
        'https://www.techpowerup.com/cpu-specs/?mfgr=Intel&released=2012&mobile=No&server=Yes&sort=name'
    ]
}

# List of years from 2004 to 2024
years = list(range(2004, 2024 + 1))
gpu_manufacturers = ['AMD', 'Intel', 'NVIDIA']
cpu_manufacturers = ['AMD', 'Intel']

# Years with more than 100 CPUs for Intel and AMD; TechPowerUp limits each query result
# to 100 rows, so these years have to be split with extra filters
intel_cpu_years_over_100 = [2024, 2023, 2021, 2015, 2014, 2013, 2012, 2011, 2010]
amd_cpu_years_over_100 = [2005, 2023]

# Intel filtering for years with more than 100 CPUs and mobile=no
intel_additional_filters = {
    2023: {
        # Socket names kept consistent with the hard-coded 2023 query URLs above
        'mobile_no': ['Intel BGA 2579', 'Intel Socket 1700', 'Intel Socket 4677']
    },
    2021: {
        'mobile_no': ['server_yes', 'server_no']
    },
    2014: {
        'mobile_no': ['server_yes', 'server_no']
    },
    2012: {
        'mobile_no': ['server_yes', 'server_no']
    }
}

# Helper function to generate filenames based on year, spec type, and manufacturer
def get_filename_from_year_and_type(year, spec_type, manufacturer=None, mobile=None, filter_type=None):
    file_name = f'{year}_{manufacturer}_{spec_type}_database_TechPowerUp.html'
    if mobile is not None:
        file_name = file_name.replace('.html', f'_{mobile}_mobile.html')
    if filter_type:
        file_name = file_name.replace('.html', f'_{filter_type}.html')
    return file_name

# Function to download HTML for a specific year and manufacturer (handles GPU)
def download_html_for_year_and_manufacturer(year, manufacturer, spec_type, base_url, directory):
    full_url = base_url.format(manufacturer, year)
    file_name = get_filename_from_year_and_type(year, spec_type, manufacturer)
    file_path = os.path.join(directory, file_name)

    if os.path.exists(file_path):
        print(f"{file_name} already exists. Skipping download.")
        return True

    try:
        print(f"Downloading {full_url}...")
        response = requests.get(full_url, timeout=10)
        if response.status_code == 200:
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(response.text)
            print(f"Saved {file_name}")
            return True
        else:
            print(f"Error downloading {full_url}: {response.status_code}")
            return False
    except requests.exceptions.Timeout:
        print(f"Timeout error for {year} - {manufacturer}.")
        return False
    except Exception as e:
        print(f"Failed to download {full_url}: {e}")
        return False

# Function to download HTML for Intel 2023 using hard-coded queries
def download_html_for_intel_2023(directory):
    for url in intel_2023_queries:
        # Extract the socket name to use in the filename
        socket_name = url.split('socket=')[1].replace('%20', '_')
        file_name = f'2023_Intel_CPU_database_TechPowerUp_No_mobile_{socket_name}.html'
        file_path = os.path.join(directory, file_name)

        if os.path.exists(file_path):
            print(f"{file_name} already exists. Skipping download.")
            continue

        try:
            print(f"Downloading {url}...")
            response = requests.get(url, timeout=10)
            if response.status_code == 200:
                with open(file_path, 'w', encoding='utf-8') as file:
                    file.write(response.text)
                print(f"Saved {file_name}")
            else:
                print(f"Error downloading {url}: {response.status_code}")
        except requests.exceptions.Timeout:
            print(f"Timeout error for Intel 2023 socket query {url}.")
        except Exception as e:
            print(f"Failed to download {url}: {e}")

# Function to download HTML for CPU with mobile and additional filters
def download_html_for_year_and_manufacturer_with_filters(year, manufacturer, spec_type, base_url, directory, mobile=None, filter_type=None):
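    mobile_filter = '' if mobile is None else ('Yes' if mobile else 'No')
    # Map the filter value onto its query parameter: 'server_yes'/'server_no' become
    # &server=Yes/No; any other value is treated as a socket name
    if not filter_type:
        additional_filter = ''
    elif filter_type.startswith('server_'):
        additional_filter = '&server=' + filter_type.split('_', 1)[1].capitalize()
    else:
        additional_filter = '&socket=' + filter_type.replace(' ', '%20')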
    
    full_url = base_url.format(manufacturer, year, mobile_filter) + additional_filter
    file_name = get_filename_from_year_and_type(year, spec_type, manufacturer, mobile_filter, filter_type)
    file_path = os.path.join(directory, file_name)

    if os.path.exists(file_path):
        print(f"{file_name} already exists. Skipping download.")
        return True

    try:
        print(f"Downloading {full_url}...")
        response = requests.get(full_url, timeout=10)
        if response.status_code == 200:
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(response.text)
            print(f"Saved {file_name}")
            return True
        else:
            print(f"Error downloading {full_url}: {response.status_code}")
            return False
    except requests.exceptions.Timeout:
        print(f"Timeout error for {year} - {manufacturer}.")
        return False
    except Exception as e:
        print(f"Failed to download {full_url}: {e}")
        return False

# Function to download GPU specs, retrying each failed download up to max_attempts times
def download_gpu_specs(max_attempts=3):
    iteration_count = 0

    for year in years:
        for manufacturer in gpu_manufacturers:
            for attempt in range(1, max_attempts + 1):
                success = download_html_for_year_and_manufacturer(year, manufacturer, 'GPU', gpu_base_url, gpu_html_directory)
                if success:
                    break
                if attempt < max_attempts:
                    print(f"Retrying download for {manufacturer} year {year} due to error...")
                    time.sleep(1)  # Wait before retrying
            else:
                # The loop finished without a successful download
                print(f"Skipping year {year} - {manufacturer} after {max_attempts} failed attempts.")

            iteration_count += 1
            time.sleep(1)  # Delay to avoid overwhelming the server

            # Pause a little longer every 7 iterations
            if iteration_count % 7 == 0:
                print(f"Pausing after {iteration_count} GPU iterations...")
                time.sleep(0.5)

# Function to download HTML for Intel 2021, 2014, and 2012 using hard-coded queries
def download_html_for_intel_with_server_filters(year, directory):
    if year not in intel_server_queries:
        return
    
    for url in intel_server_queries[year]:
        # Extract the server type from the URL for the filename
        server_type = "server_no" if "server=No" in url else "server_yes"
        file_name = f'{year}_Intel_CPU_database_TechPowerUp_No_mobile_{server_type}.html'
        file_path = os.path.join(directory, file_name)

        if os.path.exists(file_path):
            print(f"{file_name} already exists. Skipping download.")
            continue

        try:
            print(f"Downloading {url}...")
            response = requests.get(url, timeout=10)
            if response.status_code == 200:
                with open(file_path, 'w', encoding='utf-8') as file:
                    file.write(response.text)
                print(f"Saved {file_name}")
            else:
                print(f"Error downloading {url}: {response.status_code}")
        except requests.exceptions.Timeout:
            print(f"Timeout error for Intel {year} server query {url}.")
        except Exception as e:
            print(f"Failed to download {url}: {e}")

# Function to download CPU specs with filtering for Intel and AMD
def download_cpu_specs_with_filters():
    for year in years:
        # Process AMD CPUs: split by mobile/non-mobile in years with more than 100 CPUs,
        # otherwise fetch the single unfiltered page
        if year in amd_cpu_years_over_100:
            print(f"Processing AMD CPUs for year {year} with more than 100 CPUs")
            for mobile_status in [True, False]:
                download_html_for_year_and_manufacturer_with_filters(year, 'AMD', 'CPU', cpu_mobile_filter_url, cpu_html_directory, mobile=mobile_status)
        else:
            print(f"Processing AMD CPUs for year {year}")
            download_html_for_year_and_manufacturer(year, 'AMD', 'CPU', cpu_base_url, cpu_html_directory)

        # Process Intel CPUs separately (even if AMD was processed)
        if year in intel_cpu_years_over_100:
            print(f"Processing Intel CPUs for year {year}")
            for mobile_status in [True, False]:
                if year == 2023 and not mobile_status:
                    download_html_for_intel_2023(cpu_html_directory)
                elif year in intel_server_queries and not mobile_status:
                    # Use hard-coded queries for Intel 2021, 2014, 2012
                    download_html_for_intel_with_server_filters(year, cpu_html_directory)
                elif year in intel_additional_filters and not mobile_status:
                    for filter_type in intel_additional_filters[year]['mobile_no']:
                        download_html_for_year_and_manufacturer_with_filters(year, 'Intel', 'CPU', cpu_mobile_filter_url, cpu_html_directory, mobile=mobile_status, filter_type=filter_type)
                else:
                    download_html_for_year_and_manufacturer_with_filters(year, 'Intel', 'CPU', cpu_mobile_filter_url, cpu_html_directory, mobile=mobile_status)

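        # Intel has no more than 100 CPUs this year: fetch the unfiltered page for each manufacturer
        # (AMD is requested again here; a file that already exists is skipped)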
        else:
            for manufacturer in cpu_manufacturers:
                download_html_for_year_and_manufacturer(year, manufacturer, 'CPU', cpu_base_url, cpu_html_directory)





# Start downloading GPU and CPU specs
# download_gpu_specs()
download_cpu_specs_with_filters()

print("Script finished!")


2004_AMD_CPU_database_TechPowerUp.html already exists. Skipping download.
2004_AMD_CPU_database_TechPowerUp.html already exists. Skipping download.
2004_Intel_CPU_database_TechPowerUp.html already exists. Skipping download.
2005_AMD_CPU_database_TechPowerUp_Yes_mobile.html already exists. Skipping download.
2005_AMD_CPU_database_TechPowerUp_No_mobile.html already exists. Skipping download.
2005_AMD_CPU_database_TechPowerUp.html already exists. Skipping download.
2005_Intel_CPU_database_TechPowerUp.html already exists. Skipping download.
2006_AMD_CPU_database_TechPowerUp.html already exists. Skipping download.
2006_AMD_CPU_database_TechPowerUp.html already exists. Skipping download.
2006_Intel_CPU_database_TechPowerUp.html already exists. Skipping download.
2007_AMD_CPU_database_TechPowerUp.html already exists. Skipping download.
2007_AMD_CPU_database_TechPowerUp.html already exists. Skipping download.
2007_Intel_CPU_database_TechPowerUp.html already exists. Skipping download.
2008_AMD_

Please note that the website rate limits how much data you can scrape in a day. After running the script above, I manually collected the sublinks for every GPU and CPU and stored them in Python lists, which are used for further web scraping of individual GPU and CPU details.

To find the sublinks, I looked for the part of each HTML page that started with:
```html
<div id="list" class="table-wrapper">
```
and ended near:
```html
<div id="ajaxresults" class="table-wrapper">
```
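
The manual step described above could, in principle, be automated. The sketch below slices a saved page between the two markers and collects the links inside. The `/gpu-specs/` and `/cpu-specs/` link pattern and the function name `extract_spec_links` are assumptions for illustration and have not been checked against the live site's markup.

```python
import re

START_MARKER = '<div id="list" class="table-wrapper">'
END_MARKER = '<div id="ajaxresults" class="table-wrapper">'

# Assumed link shape: href="/gpu-specs/<slug>" or href="/cpu-specs/<slug>"
LINK_PATTERN = re.compile(r'href="(/(?:gpu|cpu)-specs/[^"?]+)"')


def extract_spec_links(html):
    """Return de-duplicated spec-page links found between the two markers."""
    start = html.find(START_MARKER)
    if start == -1:
        return []
    end = html.find(END_MARKER, start)
    section = html[start:end] if end != -1 else html[start:]
    # dict.fromkeys keeps first-seen order while dropping repeats
    return list(dict.fromkeys(LINK_PATTERN.findall(section)))


# Tiny hand-written sample page (hypothetical markup, not copied from the site)
sample = (
    '<a href="/gpu-specs/ignored.c0">outside</a>'
    '<div id="list" class="table-wrapper">'
    '<a href="/gpu-specs/example-card.c1">Example</a>'
    '<a href="/gpu-specs/example-card.c1">Example again</a>'
    '</div><div id="ajaxresults" class="table-wrapper"></div>'
)
print(extract_spec_links(sample))
```

Slicing on the markers first keeps navigation and filter links elsewhere on the page out of the result, which mirrors what was done by hand.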

In [7]:
# Obtained list for GPU and CPU:
import sys
import os

# Get the current working directory (where the notebook is)
notebook_dir = os.getcwd()

# Navigate to the parent directory (where the 'src' folder is)
project_dir = os.path.abspath(os.path.join(notebook_dir, '..'))

# Add the 'src' folder to the Python path
src_dir = os.path.join(project_dir, 'src')
sys.path.append(src_dir)

# Now you can import your module
from gpu_cpu_link_web_scrape import *


# The imported lists are named gpu_<year> and cpu_<year>, newest year first
for year in range(2024, 2003, -1):
    print(len(globals()[f'gpu_{year}']))

for year in range(2024, 2003, -1):
    print(globals()[f'cpu_{year}'])



SyntaxError: invalid syntax (gpu_cpu_link_web_scrape.py, line 2936)

## Exploratory Data Analysis

## Feature Engineering

## Model Selection

## Model Evaluation

## Prediction & Results

## Conclusion

## References