<h1>Random Forest Regressor model</h1>

Let's try to predict future Nvidia and AMD GPU names. Since the names change quite a lot, this should theoretically be an impossible task, so let's do it.

In [413]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.impute import SimpleImputer
import numpy as np

df=pd.read_csv('../Datasets/gpuspecs.csv')
df.head()

Unnamed: 0,manufacturer,productName,releaseYear,memSize,memBusWidth,gpuClock,memClock,unifiedShader,tmu,rop,pixelShader,vertexShader,igp,bus,memType,gpuChip
0,NVIDIA,GeForce RTX 4050,2023.0,8.0,128.0,1925,2250.0,3840.0,120,48,,,No,PCIe 4.0 x16,GDDR6,AD106
1,Intel,Arc A350M,2022.0,4.0,64.0,300,1500.0,768.0,48,24,,,No,PCIe 4.0 x8,GDDR6,DG2-128
2,Intel,Arc A370M,2022.0,4.0,64.0,300,1500.0,1024.0,64,32,,,No,PCIe 4.0 x8,GDDR6,DG2-128
3,Intel,Arc A380,2022.0,4.0,64.0,300,1500.0,1024.0,64,32,,,No,PCIe 4.0 x8,GDDR6,DG2-128
4,Intel,Arc A550M,2022.0,8.0,128.0,300,1500.0,2048.0,128,64,,,No,PCIe 4.0 x16,GDDR6,DG2-512


Based on this dataset, it seems that we can also attempt to predict future GPU and memory clocks.

Preprocessing

In [414]:
# Let's first do some preprocessing
# There are a few missing values, so let's use SimpleImputer to replace them with the column means.
target_columns=['memSize','memBusWidth','gpuClock','memClock','unifiedShader','tmu','rop','pixelShader','vertexShader']
targets=df[target_columns]
imputer=SimpleImputer(strategy='mean')
targets_imputed=imputer.fit_transform(targets)
df[target_columns]=targets_imputed

# The releaseYear column holds discrete years rather than a continuous quantity, so strategy='most_frequent' is used instead of the mean
imputer_year=SimpleImputer(strategy='most_frequent')
releaseyear=df[['releaseYear']]
year_mf=imputer_year.fit_transform(releaseyear)
df['releaseYear']=year_mf
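
To make the difference between the two imputation strategies concrete, here is a minimal sketch on a tiny made-up frame. The `toy` values are hypothetical and not taken from gpuspecs.csv; only the `SimpleImputer` calls mirror the cell above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data, NOT rows from gpuspecs.csv
toy = pd.DataFrame({
    "memSize": [4.0, 8.0, np.nan, 12.0],
    "releaseYear": [2022.0, 2022.0, np.nan, 2023.0],
})

# 'mean' fills a gap with the column average: (4 + 8 + 12) / 3 = 8
mean_imputer = SimpleImputer(strategy="mean")
toy[["memSize"]] = mean_imputer.fit_transform(toy[["memSize"]])

# 'most_frequent' fills a gap with the most common value, so the year stays a real year
mode_imputer = SimpleImputer(strategy="most_frequent")
toy[["releaseYear"]] = mode_imputer.fit_transform(toy[["releaseYear"]])

print(toy)
```

With the mean strategy a missing year could become a fractional value, which is why the most frequent year is the safer choice for that column.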

In [415]:
# Note: this assumes AMD cards are labelled 'AMD' in the manufacturer column
Amd_GPUs=df[df['manufacturer']=='AMD']
Nvidia_GPUs=df[df['manufacturer']=='NVIDIA']
Nvidia_GPUs = Nvidia_GPUs.sort_values(by='releaseYear', ascending=False)
Nvidia_GPUs.head(5)

Unnamed: 0,manufacturer,productName,releaseYear,memSize,memBusWidth,gpuClock,memClock,unifiedShader,tmu,rop,pixelShader,vertexShader,igp,bus,memType,gpuChip
0,NVIDIA,GeForce RTX 4050,2023.0,8.0,128.0,1925.0,2250.0,3840.0,120.0,48.0,6.739078,2.622573,No,PCIe 4.0 x16,GDDR6,AD106
21,NVIDIA,GeForce RTX 3080 Ti Mobile,2022.0,16.0,256.0,810.0,2000.0,7424.0,232.0,96.0,6.739078,2.622573,No,PCIe 4.0 x16,GDDR6,GA103S
10,NVIDIA,GeForce MX550,2022.0,2.0,64.0,1065.0,1500.0,1024.0,32.0,16.0,6.739078,2.622573,No,PCIe 4.0 x8,GDDR6,TU117
52,NVIDIA,RTX A5500 Mobile,2022.0,16.0,256.0,900.0,1750.0,7424.0,232.0,96.0,6.739078,2.622573,No,PCIe 4.0 x16,GDDR6,GA103S
51,NVIDIA,RTX A5500,2022.0,24.0,384.0,1170.0,2000.0,10240.0,320.0,96.0,6.739078,2.622573,No,PCIe 4.0 x16,GDDR6,GA102
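
The manufacturer filter is a case-sensitive string match, so it can be worth checking the exact labels before splitting. The sketch below uses a hypothetical toy frame (the product names and the 'AMD' label are assumptions for illustration) and splits it with `groupby` instead of one filter per vendor.

```python
import pandas as pd

# Hypothetical toy data, NOT rows from gpuspecs.csv
toy = pd.DataFrame({
    "manufacturer": ["NVIDIA", "AMD", "Intel", "NVIDIA", "AMD"],
    "productName": ["GPU A", "GPU B", "GPU C", "GPU D", "GPU E"],
    "releaseYear": [2021.0, 2022.0, 2022.0, 2023.0, 2020.0],
})

# Inspect the labels that actually occur in the column
print(toy["manufacturer"].value_counts())

# One sub-frame per manufacturer, newest cards first
subsets = {
    name: group.sort_values("releaseYear", ascending=False)
    for name, group in toy.groupby("manufacturer")
}

print(subsets["AMD"])
```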


In [416]:
# Let's start with Nvidia
# Dropping the string columns. Since I'm using the multioutput regressor, X can include the target columns as well.
# Keep in mind that this lets the model see the very values it is asked to predict.
X_nvidia = Nvidia_GPUs.drop(['manufacturer','productName','bus','memType','gpuChip', 'igp'], axis=1)

# Multiple targets, using MultiOutputRegressor
targets_nvidia=Nvidia_GPUs[['memSize','memBusWidth','gpuClock', 'memClock','unifiedShader','tmu', 'rop', 'pixelShader','vertexShader']]
y_nvidia=targets_nvidia

model=MultiOutputRegressor(estimator=RandomForestRegressor())

model.fit(X_nvidia,y_nvidia)
y_pred=model.predict(X_nvidia)

results_df = pd.DataFrame(index=y_nvidia.index)  # Create an empty DataFrame with the same index as y_nvidia

for i, column in enumerate(y_nvidia.columns):
    predicted_column = 'predicted_' + column
    results_df[predicted_column] = y_pred[:, i]  # Assign the predicted values to the corresponding column in results_df

print(results_df.head())

    predicted_memSize  predicted_memBusWidth  predicted_gpuClock   
0                 8.0                  128.0             1911.25  \
21               16.0                  256.0              811.24   
10                2.0                   64.0             1064.97   
52               16.0                  256.0              899.82   
51               24.0                  384.0             1171.31   

    predicted_memClock  predicted_unifiedShader  predicted_tmu  predicted_rop   
0              2250.70               3850.24000         120.00          48.00  \
21             2000.03               7424.00000         231.92          96.00   
10             1500.00               1025.51938          32.00          16.00   
52             1750.00               7388.16000         231.52          96.00   
51             2000.02              10270.72000         319.60          98.08   

    predicted_pixelShader  predicted_vertexShader  
0                6.739078                2.622573  
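
The predictions above are made on the same rows the model was fitted on, and X contains the target columns, so they mostly show how well the forest can reproduce its own inputs. A minimal sketch of a held-out check follows, using the `train_test_split` and `mean_absolute_error` imports from the first cell. The `demo` frame is synthetic and its trends are invented purely so the block runs on its own; using only `releaseYear` as the feature is an assumption about what a fair setup could look like, not the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in for the GPU table (hypothetical trends, not real specs)
rng = np.random.default_rng(0)
n = 200
years = rng.integers(2000, 2024, size=n)
demo = pd.DataFrame({
    "releaseYear": years.astype(float),
    "memSize": (years - 1999) * 0.5 + rng.normal(0, 1, n),
    "gpuClock": (years - 1999) * 60 + rng.normal(0, 100, n),
})

# Features exclude the targets, so nothing leaks from y into X
X = demo[["releaseYear"]]
y = demo[["memSize", "gpuClock"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = MultiOutputRegressor(RandomForestRegressor(n_estimators=50, random_state=0))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# One mean absolute error per target column, computed on unseen rows
mae = mean_absolute_error(y_test, y_pred, multioutput="raw_values")
for name, err in zip(y.columns, mae):
    print(f"{name}: MAE = {err:.2f}")
```

Errors measured this way are usually much larger than the near-perfect match in the training-set table, which gives a more honest picture of what the model can do for a GPU it has never seen.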


Let's try the model on new data.<br>
But first, which columns do we need to provide as input in order to predict?

In [417]:
column_diff = set(X_nvidia.columns) - set(y_nvidia.columns)
print(column_diff)

{'releaseYear'}


In [419]:
# Let's test it on a new GPU
# 'releaseYear' is the input that has to be provided,
# while 'memSize','memBusWidth','gpuClock','memClock','unifiedShader','tmu','rop','pixelShader','vertexShader' are the values to be predicted.
# But one input alone is not enough, so the model gets additional help: memSize=12 and gpuClock=1313 are filled in, the rest are set to 0
inputs=pd.DataFrame(data=[[2023,12,0,1313,0,0,0,0,0,0]], columns=X_nvidia.columns)
RTX4070prediction=model.predict(inputs)
predictions = RTX4070prediction.round(2)  # Round the values to 2 decimal places
formatted_predictions = ['{:g}'.format(value) for value in predictions.flatten()]
print(formatted_predictions)

['12', '32', '1314.78', '76.42', '8', '1', '1', '1.38', '0.01']
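
The input row above is built from a positional list, which makes it easy to put a value in the wrong column. A small sketch of building the same row by column name is shown below. The `feature_columns` list is an assumption about the order of `X_nvidia.columns`, and `known` is a hypothetical helper dict.

```python
import pandas as pd

# Assumed to match X_nvidia.columns in the notebook (order matters for the model)
feature_columns = [
    "releaseYear", "memSize", "memBusWidth", "gpuClock", "memClock",
    "unifiedShader", "tmu", "rop", "pixelShader", "vertexShader",
]

# Only the values we actually want to provide; everything else becomes 0
known = {"releaseYear": 2023, "memSize": 12, "gpuClock": 1313}

# reindex adds the missing columns in the right order and fills them with 0
inputs = pd.DataFrame([known]).reindex(columns=feature_columns, fill_value=0)

print(inputs.iloc[0].tolist())
# -> [2023, 12, 0, 1313, 0, 0, 0, 0, 0, 0]
```

The resulting frame can be passed to `model.predict` in place of the hand-written list, and adding or changing a known value only requires editing the dict.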
