Jake Onkka
https://www.kaggle.com/datasets/aadhavvignesh/pubg-weapon-stats

In [1]:
import pandas as pd
import numpy as np
import math
from sklearn.pipeline import Pipeline
from sklearn.base import ClassifierMixin, BaseEstimator
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score

In [2]:
data = pd.read_csv('pubg-weapon-stats.csv')
data.fillna(value=0,inplace=True)
columns_to_drop = ["Weapon Name", "Fire Mode"] #get rid of categorical values
data = data.drop(columns = columns_to_drop)
train = data.sample(frac = 0.70)
test = data.drop(train.index)
train_xs = train.drop(columns = "Weapon Type")
train_ys = train['Weapon Type']
test_xs = test.drop(columns = "Weapon Type")
test_ys = test['Weapon Type']
pd.set_option('display.max_rows',5)

print(train_xs)
print(train_ys)
print(test_xs)
print(test_ys)

    Bullet Type  Damage  Magazine Capacity  Range  Bullet Speed  Rate of Fire  \
4          5.56      43                 30  600.0         880.0         0.086   
12         9.00      31                 19   50.0         300.0         0.055   
..          ...     ...                ...    ...           ...           ...   
8          5.56      43                 30  600.0         880.0         0.086   
13         9.00      26                 25  200.0         350.0         0.048   

    Shots to Kill (Chest)  Shots to Kill (Head)  Damage Per Second  BDMG_0  \
4                       4                     2              502.0    41.0   
12                      5                     3              569.0    31.0   
..                    ...                   ...                ...     ...   
8                       4                     2              502.0    41.0   
13                      7                     4              542.0    26.0   

    BDMG_1  BDMG_2  BDMG_3  HDMG_0  HDMG_1  

In [3]:
class myClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self):
        self.train_xs = None
        self.train_ys = None
        self.class_count = {}
        self.alpha = 0.01  # additive smoothing constant
        self.D = 9         # multiplier on alpha in the smoothing denominator
    def countClass(self):
        # number of training rows for each weapon type
        self.class_count = self.train_ys['Weapon Type'].value_counts()
    def p_conditional(self, feature, value, weapon_type):
        # smoothed estimate of P(feature == value | weapon_type)
        count = self.class_count[weapon_type]
        train = pd.concat([self.train_xs, self.train_ys], axis=1)

        count_feature_given_class = train[(train['Weapon Type'] == weapon_type) & (train[feature] == value)].shape[0]
        val = (count_feature_given_class + self.alpha) / (count + (self.alpha * self.D))
        return val
    def p_class(self, weapon_type):
        # prior P(weapon_type): share of training rows with this weapon type
        numerator = self.class_count[weapon_type]
        denominator = len(self.train_xs)
        return numerator / denominator
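    def fit(self, X, y):
        train_xs = pd.DataFrame(X, columns = ['Bullet Type', 'Damage', 'Magazine Capacity', 'Range', 'Bullet Speed', 'Rate of Fire', 'Shots to Kill (Chest)', 'Shots to Kill (Head)', 'Damage Per Second',  "BDMG_0", "BDMG_1", "BDMG_2", "BDMG_3", "HDMG_0", "HDMG_1", "HDMG_2", "HDMG_3"])
        # The scaler hands X over as a plain array, so train_xs gets a fresh 0..n-1 index.
        # Build the labels on the same index so they line up row-for-row with the
        # features when p_conditional concatenates the two frames.
        train_ys = pd.DataFrame({'Weapon Type': np.asarray(y)})
        self.train_xs = train_xs
        self.train_ys = train_ys
        self.countClass()
        return self  # scikit-learn estimators return self from fit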

    def predict(self, X):
        predictions = []
        test_xs = pd.DataFrame(X, columns = ['Bullet Type', 'Damage', 'Magazine Capacity', 'Range', 'Bullet Speed', 'Rate of Fire', 'Shots to Kill (Chest)', 'Shots to Kill (Head)', 'Damage Per Second',  "BDMG_0", "BDMG_1", "BDMG_2", "BDMG_3", "HDMG_0", "HDMG_1", "HDMG_2", "HDMG_3"])
        for index, row in test_xs.iterrows():
            best_class = None
            best_score = -math.inf
            for weapon_type, count in self.class_count.items(): #for each weapon type, or class
                # log prior plus the sum of log conditionals; log space avoids underflow
                score = math.log2(self.p_class(weapon_type))
                for feature in self.train_xs.columns:  #now for every feature, add up totals
                    value = row[feature]
                    score += math.log2(self.p_conditional(feature,value,weapon_type))
                # the class with the highest log score also has the highest probability,
                # so no normalisation is needed to pick the winner
                if score > best_score:
                    best_score = score
                    best_class = weapon_type
            predictions.append(best_class)
        return np.array(predictions)

scaler = MinMaxScaler()
classifier = myClassifier()
pipeline = Pipeline([
        ('scaler',scaler),
        ('classify',classifier)
])
pipeline.fit(train_xs,train_ys)

In [None]:
predicted_ys = pipeline.predict(test_xs)
accuracy_score(test_ys,predicted_ys)

0.3076923076923077

I chose 'Weapon Type' as my target column because it correlates strongly with every other column. Each weapon type is suited to a different role, so it tends to have distinctive values in the other columns. For example, sniper rifles typically have larger bullets, higher damage, and longer range, whereas pistols have smaller bullets, lower damage, a higher rate of fire, and shorter range.
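One quick way to check whether weapon types really do have distinctive feature values is to compare per-type averages. The sketch below uses a tiny frame of made-up numbers (they are not rows from the dataset) purely to show the `groupby` pattern; the same call could be run on `data` before the split.

```python
import pandas as pd

# Made-up illustrative rows, NOT taken from pubg-weapon-stats.csv.
toy = pd.DataFrame({
    "Weapon Type": ["Sniper Rifle", "Sniper Rifle", "Pistol", "Pistol"],
    "Damage": [80, 100, 30, 40],
    "Range": [800, 1000, 50, 100],
})

# Average of each feature within each weapon type.
type_means = toy.groupby("Weapon Type")[["Damage", "Range"]].mean()
print(type_means)
```

If the per-type means sit far apart relative to the spread within each type, the feature is likely to help separate the classes.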

My model doesn't perform well in this case. The dataset is far too small for my classifier to be of any real use: there are only 44 rows, and sampling 70% of such a small dataset for training means some weapon types are likely to have little or no training data. The most severe problem I have seen is when the test set contains every instance of a weapon type, which guarantees the classifier will be wrong on those rows. With more rows and more samples per weapon type, the classifier should perform a lot better, because it would have training data for every weapon type and could better pick the most likely type when the features are similar to, or overlap with, those of other weapon types.
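One possible way to keep a whole weapon type from landing only in the test set is a stratified split. The sketch below is not what the notebook above does; it swaps in scikit-learn's `train_test_split` with `stratify`, run on a toy frame with hypothetical classes "A", "B", and "C". Stratifying needs at least two rows per class, so any weapon type with a single row would have to be handled separately.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: three hypothetical classes with four rows each.
toy = pd.DataFrame({
    "feature": range(12),
    "Weapon Type": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
})

# stratify keeps the class proportions the same in both parts of the split.
toy_train, toy_test = train_test_split(
    toy, test_size=0.25, stratify=toy["Weapon Type"], random_state=0
)

# Classes that appear in the test set but never in the training set.
unseen = set(toy_test["Weapon Type"]) - set(toy_train["Weapon Type"])
print(unseen)
```

An empty `unseen` set means every class in the test rows was also available during training.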

I chose the accuracy score metric because it fits a multiclass classification problem well. It checks whether each predicted label exactly matches the true label and reports the fraction that do, which is exactly what is needed here since I'm attempting to predict the weapon type.
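As a small worked example of how the metric is computed, the snippet below scores four hand-written labels. The label strings are placeholders chosen for illustration and are not predictions from the model above.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels for four weapons.
true_labels = ["Pistol", "Sniper Rifle", "Shotgun", "Pistol"]
predicted_labels = ["Pistol", "Sniper Rifle", "Pistol", "Pistol"]

# Three of the four predictions match, so the score is 3 / 4.
score = accuracy_score(true_labels, predicted_labels)
print(score)
```

Because accuracy counts every row equally, a score on a very small test set can swing a lot from a single misclassified row.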