# CSCA 5622 Final Project - Consumer PC Hardware Trends and Predictions
### By Moshiur Howlader

## Introduction

In the digital age, computer hardware is ubiquitous, and its performance continues to improve year by year. Intel co-founder Gordon Moore made an influential observation about how computers would evolve, now known as **Moore's Law** ([see Wikipedia](https://en.wikipedia.org/wiki/Moore%27s_law)): the number of transistors in an integrated circuit (IC) doubles approximately every two years. The chart below illustrates the trend from 1970 to 2020:

<br><br>
<img src="../images/moores_law_transistor_count_1970_2020.png" alt="Transistor count over time" width="1200" height="800">
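Stated as a formula, Moore's Law says that a chip's transistor count $N$ grows as $N(t) = N_0 \cdot 2^{(t - t_0)/2}$, where $N_0$ is the count in a reference year $t_0$. The short sketch below projects this forward from the Intel 4004 (1971, roughly 2,300 transistors). The starting chip and the fixed two-year doubling period are illustrative assumptions. They are not the data used later in this project.

```python
def moores_law_projection(start_count: float, start_year: int, year: int,
                          doubling_period: float = 2.0) -> float:
    """Project a transistor count, assuming it doubles every `doubling_period` years."""
    return start_count * 2 ** ((year - start_year) / doubling_period)


# Starting point (assumption for illustration): Intel 4004, 1971, ~2,300 transistors.
for year in (1971, 1981, 1991, 2001, 2011, 2021):
    count = moores_law_projection(2_300, 1971, year)
    print(f"{year}: ~{count:,.0f} transistors")
```

You can compare projections like these against real chips to see how far actual hardware has drifted from the idealized curve.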

Based on Moore's Law, consumers might expect the transistor count of computer hardware to double every two years, leading to predictable and consistent increases in computing power. However, the reality is far more complex. As transistors shrink and more of them are packed into a fixed area, **quantum effects** such as electron tunneling begin to interfere, imposing physical limits. These constraints prevent engineers from following Moore's Law indefinitely. According to [nano.gov](https://www.nano.gov/nanotech-101/what/nano-size), a gold atom is only about a third of a nanometer across! Clearly, there is a limit to how small transistors can get, and therefore to how many can be packed into a chip. Below is the progression of chip manufacturing process nodes (feature sizes) according to Wikipedia. Note that for recent nodes, the name is largely a marketing label rather than a literal measurement of any transistor feature:

| Feature Size | Year |
|--------------|------|
| 20 μm        | 1968 |
| 10 μm        | 1971 |
| 6 μm         | 1974 |
| 3 μm         | 1977 |
| 1.5 μm       | 1981 |
| 1 μm         | 1984 |
| 800 nm       | 1987 |
| 600 nm       | 1990 |
| 350 nm       | 1993 |
| 250 nm       | 1996 |
| 180 nm       | 1999 |
| 130 nm       | 2001 |
| 90 nm        | 2003 |
| 65 nm        | 2005 |
| 45 nm        | 2007 |
| 32 nm        | 2009 |
| 28 nm        | 2010 |
| 22 nm        | 2012 |
| 14 nm        | 2014 |
| 10 nm        | 2016 |
| 7 nm         | 2018 |
| 5 nm         | 2020 |
| 3 nm         | 2022 |
| 2 nm         | ~2025 (Future) |
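The table can be turned into a rough growth rate. Fit a straight line to the base-2 logarithm of the node size against the year. The negative reciprocal of the slope is then the number of years it takes the node size to halve. The sketch below does this with NumPy's `polyfit` and leaves out the ~2025 entry. It makes two simplifications. It treats node names as literal feature sizes, which is questionable for recent nodes. It also assumes transistor density scales with the inverse square of feature size.

```python
import numpy as np

# (year, feature size in nm) pairs copied from the table above; the ~2025 entry is excluded
NODES = [
    (1968, 20_000), (1971, 10_000), (1974, 6_000), (1977, 3_000),
    (1981, 1_500), (1984, 1_000), (1987, 800), (1990, 600),
    (1993, 350), (1996, 250), (1999, 180), (2001, 130),
    (2003, 90), (2005, 65), (2007, 45), (2009, 32),
    (2010, 28), (2012, 22), (2014, 14), (2016, 10),
    (2018, 7), (2020, 5), (2022, 3),
]


def feature_halving_time(nodes):
    """Fit log2(size) = slope * year + intercept and return years per halving (-1 / slope)."""
    years = np.array([year for year, _ in nodes], dtype=float)
    log_sizes = np.log2([size for _, size in nodes])
    slope, _intercept = np.polyfit(years, log_sizes, 1)
    return -1.0 / slope


halving = feature_halving_time(NODES)
print(f"Node size halves every ~{halving:.1f} years")
# Assuming density ~ 1 / size**2, density doubles twice for every halving of size.
print(f"Implied (nominal) density doubling time: ~{halving / 2:.1f} years")
```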

According to Jensen Huang, the CEO of Nvidia, **Moore's Law is dead** ([TechSpot article](https://www.techspot.com/news/96094-nvidia-jensen-huang-once-again-claims-moore-law.html)). This statement seems reasonable given the physical limitations of current chip designs. As the rate of improvement in transistor count decreases year over year, will consumers start paying more for diminishing performance gains?

## Why Should Consumers Care About the Death of Moore's Law?

As Moore's Law slows, each new hardware generation is likely to bring smaller gains in transistor density. This is a concern for consumers. As traditional computing approaches its physical limits, performance improvements will shrink, but prices may not fall to match. The result could be consumers paying more for less, with each new generation potentially benefiting manufacturers more than buyers.

## The Economic Reality Today

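Inflation has steadily eroded the purchasing power of the US dollar over the last 50 years. As the general price level rises, consumer goods, including technology, become less affordable unless incomes keep pace. This also means that hardware prices from different years cannot be compared directly; they must first be adjusted for inflation. Here are links to related data: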

- [Purchasing power of the US dollar over time](https://elements.visualcapitalist.com/purchasing-power-of-the-u-s-dollar-over-time/)
- [America's growing rent burden](https://www.axios.com/2023/05/22/americas-growing-rent-burden)
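One common way to make prices from different years comparable is to rescale each nominal price by a ratio of price-index values. Multiply the price by the index value in the reference year, then divide by the index value in the year of purchase. The Consumer Price Index (CPI) is a typical choice. The sketch below assumes this approach. The CPI values and the launch price are placeholders, not real data.

```python
def adjust_for_inflation(price: float, cpi_at_purchase: float, cpi_reference: float) -> float:
    """Convert a nominal price into reference-year dollars using a price-index ratio."""
    if cpi_at_purchase <= 0 or cpi_reference <= 0:
        raise ValueError("CPI values must be positive")
    return price * (cpi_reference / cpi_at_purchase)


# Placeholder index values, NOT real CPI data: substitute official figures
# (e.g., from the U.S. Bureau of Labor Statistics) before drawing conclusions.
CPI_THEN = 100.0
CPI_NOW = 150.0

launch_price = 400.0  # hypothetical launch price of a part in the earlier year
adjusted = adjust_for_inflation(launch_price, CPI_THEN, CPI_NOW)
print(f"${launch_price:.2f} then is about ${adjusted:.2f} in today's dollars")
```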

## Purpose of This Project

This project aims to answer the following key questions:

1. What are the trends in CPU and GPU parts over the past 20 years?
2. Is the price-to-performance ratio of these parts keeping up? Are consumers getting a fair deal compared to 10 to 20 years ago?
3. Can we predict the performance of next-gen, unreleased CPU and GPU parts using supervised machine learning models?
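Question 2 needs a concrete definition of "price-to-performance." One simple option is benchmark score divided by price, ideally with the price already adjusted for inflation, and then compared across generations as a ratio. This is an assumption for illustration, not necessarily the metric this project settles on. The `Part` class, its field names, and the numbers below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    benchmark_score: float  # higher is better; which benchmark to use is an assumption
    price_usd: float        # ideally already adjusted to a common year's dollars


def performance_per_dollar(part: Part) -> float:
    """Benchmark points per dollar: one possible price-to-performance metric."""
    return part.benchmark_score / part.price_usd


def value_change(old: Part, new: Part) -> float:
    """How many times more performance per dollar `new` offers than `old`."""
    return performance_per_dollar(new) / performance_per_dollar(old)


# Hypothetical parts with made-up numbers, for illustration only.
old_gpu = Part("Hypothetical GPU (older)", benchmark_score=1_000, price_usd=500)
new_gpu = Part("Hypothetical GPU (newer)", benchmark_score=3_000, price_usd=1_000)
print(f"{new_gpu.name} gives {value_change(old_gpu, new_gpu):.2f}x the performance per dollar")
```

A ratio above 1 means the newer part delivers more performance per dollar even if its sticker price is higher. A ratio near 1 would support the concern that consumers are paying more for diminishing gains.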


## Data Collection & Description

The data for both CPUs and GPUs was collected from TechPowerUp's GPU and CPU spec databases:

- https://www.techpowerup.com/gpu-specs/
- https://www.techpowerup.com/cpu-specs/

Various other sources were considered, but they were either difficult to scrape or their data was not thorough enough, so they were not used:
- https://www.hwcompare.com/
- https://www.userbenchmark.com/Software
- https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html
- https://www.tomshardware.com/reviews/cpu-hierarchy,4312.html

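The script below downloads the yearly GPU and CPU listing pages from TechPowerUp for 2004 to 2024. It skips files that already exist and pauses between requests to avoid overwhelming the server: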

import os
import time

import requests

# Relative paths where the HTML files for GPUs and CPUs are saved
gpu_html_directory = os.path.join('..', 'data', 'gpu')
cpu_html_directory = os.path.join('..', 'data', 'cpu')

# Create the directories if they don't exist
for directory in [gpu_html_directory, cpu_html_directory]:
    os.makedirs(directory, exist_ok=True)

# Base URLs for the TechPowerUp GPU and CPU spec databases, filtered by release year
gpu_base_url = 'https://www.techpowerup.com/gpu-specs/?released='
cpu_base_url = 'https://www.techpowerup.com/cpu-specs/?released='

# Release years from 2004 to 2024 (inclusive)
years = list(range(2004, 2024 + 1))

def get_filename_from_year_and_type(year, spec_type):
    # Generate the filename in the format <year>_<type>_database_TechPowerUp.html
    return f'{year}_{spec_type}_database_TechPowerUp.html'

def download_html_for_year(year, spec_type, base_url, directory):
    """Save one year's listing page. Returns True if saved or already on disk, False on failure."""
    full_url = f'{base_url}{year}&sort=name'
    file_name = get_filename_from_year_and_type(year, spec_type)
    file_path = os.path.join(directory, file_name)

    # If the file already exists, skip downloading
    if os.path.exists(file_path):
        print(f"{file_name} already exists. Skipping download.")
        return True

    try:
        print(f"Downloading {full_url}...")
        response = requests.get(full_url, timeout=10)  # 10-second timeout for the request
    except requests.exceptions.Timeout:
        print(f"Timeout error occurred for {spec_type} year {year}.")
        return False
    except requests.exceptions.RequestException as e:
        print(f"Failed to download {full_url}: {e}")
        return False

    # Any status other than 200 (e.g., 429 Too Many Requests) counts as a failure
    if response.status_code != 200:
        print(f"Error downloading {full_url}: {response.status_code}")
        return False

    # Write the HTML content to a file
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(response.text)
    print(f"Saved {file_name}")
    return True

def download_specs(spec_type, base_url, directory, max_attempts=3):
    """Download the listing page for every year, trying each year up to max_attempts times."""
    for count, year in enumerate(years, start=1):
        for attempt in range(1, max_attempts + 1):
            if download_html_for_year(year, spec_type, base_url, directory):
                break  # saved, or already on disk
            if attempt < max_attempts:
                print(f"Retrying download for {spec_type} year {year} due to error...")
                time.sleep(10)  # Wait 10 seconds before retrying
        else:
            print(f"Skipping year {year} after {max_attempts} failed attempts.")

        time.sleep(1)  # Add a delay between years to avoid overwhelming the server

        # Pause for 1 minute after every 7 years processed (once per year, not per retry)
        if count % 7 == 0:
            print(f"Pausing for 1 minute after {count} {spec_type} iterations...")
            time.sleep(60)

# Start downloading GPU and CPU specs
download_specs('GPU', gpu_base_url, gpu_html_directory)
download_specs('CPU', cpu_base_url, cpu_html_directory)
print("Script finished!")

2004_GPU_database_TechPowerUp.html already exists. Skipping download.
2005_GPU_database_TechPowerUp.html already exists. Skipping download.
2006_GPU_database_TechPowerUp.html already exists. Skipping download.
2007_GPU_database_TechPowerUp.html already exists. Skipping download.
2008_GPU_database_TechPowerUp.html already exists. Skipping download.
2009_GPU_database_TechPowerUp.html already exists. Skipping download.
2010_GPU_database_TechPowerUp.html already exists. Skipping download.
Pausing for 1 minute after 7 GPU iterations...
2011_GPU_database_TechPowerUp.html already exists. Skipping download.
2012_GPU_database_TechPowerUp.html already exists. Skipping download.
2013_GPU_database_TechPowerUp.html already exists. Skipping download.
2014_GPU_database_TechPowerUp.html already exists. Skipping download.
2015_GPU_database_TechPowerUp.html already exists. Skipping download.
2016_GPU_database_TechPowerUp.html already exists. Skipping download.
2017_GPU_database_TechPowerUp.html already 

After running the script, I manually collected the sublinks to every GPU and CPU page and stored them in a Python list for further scraping of the individual spec pages.

Please note that the website rate limits how much data you can scrape in a day.
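The sublinks were collected by hand for this project. Alternatively, they could be pulled out of the saved listing pages automatically with Python's built-in `html.parser`, as in the sketch below. It assumes that each spec page is linked with a relative `href` beginning with `/gpu-specs/` or `/cpu-specs/`, and that the year-filtered listing links contain a `?`. Neither assumption has been checked against TechPowerUp's actual markup. `SpecLinkParser`, `extract_spec_links`, and `extract_from_directory` are hypothetical names.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin


class SpecLinkParser(HTMLParser):
    """Collect href values from <a> tags whose path starts with a given prefix."""

    def __init__(self, path_prefix: str):
        super().__init__()
        self.path_prefix = path_prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumed filter: keep spec-page links, skip the bare prefix and query-string listing URLs
        if href and href.startswith(self.path_prefix) and href != self.path_prefix and "?" not in href:
            self.links.append(href)


def extract_spec_links(html_text: str, path_prefix: str,
                       base_url: str = "https://www.techpowerup.com") -> list:
    """Return unique, absolute spec-page URLs (in first-seen order) from one saved page."""
    parser = SpecLinkParser(path_prefix)
    parser.feed(html_text)
    parser.close()
    return list(dict.fromkeys(urljoin(base_url, href) for href in parser.links))


def extract_from_directory(directory: str, path_prefix: str) -> list:
    """Gather unique links from every saved .html file in `directory`."""
    links = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".html"):
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                links.extend(extract_spec_links(f.read(), path_prefix))
    return list(dict.fromkeys(links))
```

For example, `extract_from_directory(os.path.join('..', 'data', 'gpu'), '/gpu-specs/')` would return the GPU links from all saved years. Since the website limits daily scraping, the individual pages would still need to be fetched slowly.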


## Feature Engineering

## Model Selection

## Model Evaluation

## Prediction & Results

## Conclusion

## References