<h1>Random Forest Regressor model</h1>

Let's try to predict future Nvidia and AMD GPU names. Since the naming schemes change quite a lot, this should in theory be an impossible task, so let's do it.

In [246]:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv('../Datasets/gpuspecs.csv')
df.head()

Unnamed: 0,manufacturer,productName,releaseYear,memSize,memBusWidth,gpuClock,memClock,unifiedShader,tmu,rop,pixelShader,vertexShader,igp,bus,memType,gpuChip
0,NVIDIA,GeForce RTX 4050,2023.0,8.0,128.0,1925,2250.0,3840.0,120,48,,,No,PCIe 4.0 x16,GDDR6,AD106
1,Intel,Arc A350M,2022.0,4.0,64.0,300,1500.0,768.0,48,24,,,No,PCIe 4.0 x8,GDDR6,DG2-128
2,Intel,Arc A370M,2022.0,4.0,64.0,300,1500.0,1024.0,64,32,,,No,PCIe 4.0 x8,GDDR6,DG2-128
3,Intel,Arc A380,2022.0,4.0,64.0,300,1500.0,1024.0,64,32,,,No,PCIe 4.0 x8,GDDR6,DG2-128
4,Intel,Arc A550M,2022.0,8.0,128.0,300,1500.0,2048.0,128,64,,,No,PCIe 4.0 x16,GDDR6,DG2-512


Preprocessing by imputing missing values with the mean of each column

In [247]:
# Let's first do some preprocessing

# There are a few missing values, so let's use SimpleImputer to replace them with each column's mean.
num_cols = ['memSize', 'memBusWidth', 'gpuClock', 'memClock', 'unifiedShader', 'tmu', 'rop']
imputer = SimpleImputer(strategy='mean')
df[num_cols] = imputer.fit_transform(df[num_cols])

# The releaseYear column is a bit different, so strategy='most_frequent' is used instead
# Perhaps the same could be said for pixelShader and vertexShader since their values differ quite a lot
imputer = SimpleImputer(strategy='most_frequent')
df[['releaseYear']] = imputer.fit_transform(df[['releaseYear']])

# Let's try K-nearest neighbours (KNN) imputation on those 2
shader_cols = ['pixelShader', 'vertexShader']
imputer = KNNImputer(n_neighbors=2)
# Fit the imputer on the two shader columns and fill in the missing values
df[shader_cols] = imputer.fit_transform(df[shader_cols])

# The KNN imputer gives the same results as SimpleImputer with the mean strategy (as used above) would
# So dropping those 2 columns will probably be more beneficial
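
A likely reason the KNN result matches a plain mean: as far as the scikit-learn documentation describes it, when a row has no observed feature at all, `KNNImputer` has no defined distance to any neighbour and falls back to the training average of the column. Here is a minimal sketch on a made-up toy array (the numbers are hypothetical and not taken from gpuspecs.csv), assuming the two shader columns are missing together as they are in the `head()` output above:

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy data (hypothetical): the last two rows have both columns missing,
# mimicking pixelShader and vertexShader being empty at the same time.
toy = np.array([
    [4.0, 2.0],
    [8.0, 4.0],
    [12.0, 3.0],
    [np.nan, np.nan],
    [np.nan, np.nan],
])

knn_filled = KNNImputer(n_neighbors=2).fit_transform(toy)
mean_filled = SimpleImputer(strategy='mean').fit_transform(toy)

# A row without any observed feature has no distance to any neighbour,
# so both imputers end up writing the column mean into that row.
print(knn_filled[3:])
print(np.allclose(knn_filled, mean_filled))
```

If the same thing happens on the real data, every imputed row carries an identical constant, which adds no information for the model and supports dropping the two columns.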

Let's sort manufacturers first

In [248]:
df.manufacturer.unique()
# Let's focus on the first 3: NVIDIA, Intel and AMD

array(['NVIDIA', 'Intel', 'AMD', 'ATI', 'Sony', 'Matrox', 'XGI', '3dfx'],
      dtype=object)

Nvidia:

In [249]:
Nvidia_GPUs = df[df['manufacturer'] == 'NVIDIA']
Nvidia_GPUs = Nvidia_GPUs.sort_values(by='releaseYear', ascending=False)
Nvidia_GPUs.head(5)

Unnamed: 0,manufacturer,productName,releaseYear,memSize,memBusWidth,gpuClock,memClock,unifiedShader,tmu,rop,pixelShader,vertexShader,igp,bus,memType,gpuChip
0,NVIDIA,GeForce RTX 4050,2023.0,8.0,128.0,1925.0,2250.0,3840.0,120.0,48.0,6.739078,2.622573,No,PCIe 4.0 x16,GDDR6,AD106
21,NVIDIA,GeForce RTX 3080 Ti Mobile,2022.0,16.0,256.0,810.0,2000.0,7424.0,232.0,96.0,6.739078,2.622573,No,PCIe 4.0 x16,GDDR6,GA103S
10,NVIDIA,GeForce MX550,2022.0,2.0,64.0,1065.0,1500.0,1024.0,32.0,16.0,6.739078,2.622573,No,PCIe 4.0 x8,GDDR6,TU117
52,NVIDIA,RTX A5500 Mobile,2022.0,16.0,256.0,900.0,1750.0,7424.0,232.0,96.0,6.739078,2.622573,No,PCIe 4.0 x16,GDDR6,GA103S
51,NVIDIA,RTX A5500,2022.0,24.0,384.0,1170.0,2000.0,10240.0,320.0,96.0,6.739078,2.622573,No,PCIe 4.0 x16,GDDR6,GA102


AMD:

In [250]:
Amd_GPUs = df[df['manufacturer'] == 'AMD']
Amd_GPUs = Amd_GPUs.sort_values(by='releaseYear', ascending=False)
Amd_GPUs.head(5)

Unnamed: 0,manufacturer,productName,releaseYear,memSize,memBusWidth,gpuClock,memClock,unifiedShader,tmu,rop,pixelShader,vertexShader,igp,bus,memType,gpuChip
32,AMD,Radeon 660M,2022.0,3.113803,274.874445,1500.0,868.578119,384.0,24.0,16.0,6.739078,2.622573,Yes,PCIe 4.0 x8,System Shared,Rembrandt
43,AMD,Radeon RX 6700S,2022.0,8.0,128.0,1700.0,1750.0,1792.0,112.0,64.0,6.739078,2.622573,No,PCIe 4.0 x8,GDDR6,Navi 23
33,AMD,Radeon 680M,2022.0,3.113803,274.874445,2000.0,868.578119,768.0,48.0,32.0,6.739078,2.622573,Yes,PCIe 4.0 x8,System Shared,Rembrandt
53,AMD,Steam Deck GPU,2022.0,16.0,128.0,1000.0,1375.0,512.0,32.0,8.0,6.739078,2.622573,No,IGP,LPDDR5,Van Gogh
50,AMD,Radeon RX 7900 XT,2022.0,16.0,256.0,1800.0,2250.0,12288.0,768.0,256.0,6.739078,2.622573,No,PCIe 5.0 x16,GDDR6,Navi 31


Intel:

In [251]:
Intel_GPUs = df[df['manufacturer'] == 'Intel']
Intel_GPUs = Intel_GPUs.sort_values(by='releaseYear', ascending=False)
Intel_GPUs.head(5)

Unnamed: 0,manufacturer,productName,releaseYear,memSize,memBusWidth,gpuClock,memClock,unifiedShader,tmu,rop,pixelShader,vertexShader,igp,bus,memType,gpuChip
1,Intel,Arc A350M,2022.0,4.0,64.0,300.0,1500.0,768.0,48.0,24.0,6.739078,2.622573,No,PCIe 4.0 x8,GDDR6,DG2-128
9,Intel,Arctic Sound-M,2022.0,16.0,4096.0,900.0,1200.0,8192.0,256.0,128.0,6.739078,2.622573,No,PCIe 4.0 x16,HBM2e,Arctic Sound
2,Intel,Arc A370M,2022.0,4.0,64.0,300.0,1500.0,1024.0,64.0,32.0,6.739078,2.622573,No,PCIe 4.0 x8,GDDR6,DG2-128
57,Intel,UHD Graphics 730,2022.0,3.113803,274.874445,300.0,868.578119,192.0,12.0,8.0,6.739078,2.622573,Yes,Ring Bus,System Shared,Alder Lake GT1
56,Intel,UHD Graphics 710,2022.0,3.113803,274.874445,300.0,868.578119,128.0,8.0,8.0,6.739078,2.622573,Yes,Ring Bus,System Shared,Alder Lake GT1
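
The three filter-and-sort cells above repeat the same two steps per manufacturer. As a rough sketch of an alternative, the same split can be built in one pass with `isin` and `groupby`. The toy frame, the `vendors` list and the `by_vendor` dictionary below are illustrative names, not part of the notebook:

```python
import pandas as pd

# Toy stand-in for the GPU table (hypothetical rows, not taken from gpuspecs.csv)
toy = pd.DataFrame({
    'manufacturer': ['NVIDIA', 'AMD', 'Intel', 'NVIDIA', 'AMD', 'Matrox'],
    'productName': ['A', 'B', 'C', 'D', 'E', 'F'],
    'releaseYear': [2020.0, 2019.0, 2022.0, 2023.0, 2021.0, 2001.0],
})

vendors = ['NVIDIA', 'AMD', 'Intel']

# Keep only the three vendors, then build one newest-first frame per vendor
by_vendor = {
    name: group.sort_values(by='releaseYear', ascending=False)
    for name, group in toy[toy['manufacturer'].isin(vendors)].groupby('manufacturer')
}

print(by_vendor['NVIDIA'])
```

Separate variables such as `Nvidia_GPUs` are just as valid; the dictionary mainly saves repetition if more manufacturers are added later.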


In [252]:
# Let's start with Nvidia
# Dropping the string columns, the 5 numerical columns from unifiedShader to vertexShader,
# and the target column releaseYear
X_nvidia = Nvidia_GPUs.drop(['manufacturer','productName','bus','memType','gpuChip', 'igp','unifiedShader','tmu','rop','pixelShader','vertexShader','releaseYear'], axis=1)

# Multiple targets (e.g. gpu / mem clocks) would need something like MultiOutputRegressor.
# For now there is a single target: let's see if the model can predict the release year.
targets_nvidia = Nvidia_GPUs[['releaseYear']]
y_nvidia = targets_nvidia

model = RandomForestRegressor()

# Fitting on the whole data (Nvidia GPUs portion of the dataset only).
# y_nvidia is a one-column DataFrame, so flatten it to the 1-D array that fit() expects.
model.fit(X_nvidia, y_nvidia.values.ravel())
# Note: these are predictions on the same rows the model was trained on
y_pred = model.predict(X_nvidia)

# Let's round the results to whole years
y_pred = y_pred.round()
print(y_pred)


[2021. 2021. 2020. ... 1997. 1995. 1995.]
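
The predictions above come from the same rows the model was fitted on, so they say little about unseen GPUs. The notebook already imports `train_test_split` and `mean_absolute_error`, so a held-out check could look roughly like the sketch below. It runs on synthetic data so that it is self-contained: `X_toy`, `y_toy`, `model_toy` and all the numbers are made up for illustration and are not the Nvidia data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X_nvidia / y_nvidia (hypothetical numbers, not the real data)
rng = np.random.default_rng(0)
n = 400
X_toy = pd.DataFrame({
    'memSize': rng.uniform(0.1, 24, n),
    'memBusWidth': rng.choice([64, 128, 256, 384], n),
    'gpuClock': rng.uniform(100, 2000, n),
    'memClock': rng.uniform(100, 2500, n),
})
# Release year loosely tied to the GPU clock, plus noise the model cannot learn
y_toy = 1995 + X_toy['gpuClock'] / 100 + rng.normal(0, 2, n)

# Hold out 20% of the rows; the model never sees them during fit()
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0
)

model_toy = RandomForestRegressor(random_state=0)
model_toy.fit(X_train, y_train)

# Compare the error on seen rows with the error on held-out rows
mae_train = mean_absolute_error(y_train, model_toy.predict(X_train))
mae_test = mean_absolute_error(y_test, model_toy.predict(X_test))
print(f"train MAE: {mae_train:.2f} years, test MAE: {mae_test:.2f} years")
```

On the real data the same pattern would apply with `X_nvidia` and `y_nvidia.values.ravel()` in place of the toy frame. The gap between the two errors gives a rough idea of how optimistic the training-set predictions are.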


In [253]:
column_diff = set(X_nvidia.columns) - set(y_nvidia.columns)
print(column_diff)

{'memClock', 'memBusWidth', 'memSize', 'gpuClock'}
