## Step:1 Dataset Description and Objective:

**Dataset Description:**

`Counter-Strike (CS)` is a popular series of tactical first-person shooter (FPS) video games. The series originated as a modification for Half-Life and quickly gained its own dedicated following. Here is an overview of Counter-Strike:

`Gameplay Overview:` Counter-Strike is primarily a multiplayer game where two teams, the Counter-Terrorists (CTs) and the Terrorists (Ts), compete against each other.

The objective of each round varies based on the game mode, but the primary goals include:

`Counter-Terrorists:` Prevent the Terrorists from achieving their objectives, for example by defusing a planted bomb or rescuing hostages.

`Terrorists:` Achieve their objectives, which may include planting a bomb at a designated site or holding hostages.

Rounds are relatively short, typically lasting a few minutes, and players have only one life per round. When a player is eliminated, they must wait until the next round to respawn.

**Key Features:**

`Weapons:` Players can purchase and use a wide variety of firearms, grenades, and equipment. The choice of weaponry is an essential strategic element in the game.

`Economy:` Players earn in-game money based on their performance in the previous rounds. Money is used to buy weapons and equipment for the next round.

`Maps:` Counter-Strike features a range of maps, each with its own layout and objectives. Popular maps include Dust II, Mirage, Inferno, and more.

`Teamwork:` Successful gameplay in Counter-Strike heavily relies on teamwork, communication, and strategy. Players often coordinate their actions with their teammates to achieve objectives.

`Competitive Play:` Counter-Strike is well-known for its competitive scene, with professional esports tournaments held worldwide.

**Popular Game Modes:**

`Bomb Defusal (de_):` In this mode, Terrorists attempt to plant a bomb at one of the designated bomb sites, while Counter-Terrorists aim to prevent the bomb from being planted or defuse it if it's planted.

`Hostage Rescue (cs_):` In hostage rescue mode, Counter-Terrorists must rescue hostages held by the Terrorists, while the Terrorists aim to prevent the rescues.

`Arms Race:` A fast-paced mode where players cycle through a series of weapons, aiming to be the first to get a kill with each weapon.

`Deathmatch:` A mode where players respawn quickly and aim to get as many kills as possible within a set time limit.

`Wingman:` A 2v2 competitive mode with smaller maps and shorter rounds.

Counter-Strike has evolved over the years through several versions, including Counter-Strike 1.6, Counter-Strike: Source, and Counter-Strike: Global Offensive (CS:GO), the installment this project's dataset comes from.

CS:GO is known for its competitive gameplay, professional esports scene, and ongoing updates that have kept the game relevant and enjoyable for players worldwide. It remains a cornerstone of the first-person shooter genre.


**Objective:**
The objective of this project is to build and compare multiple machine learning algorithms to classify the round winners in Counter-Strike: Global Offensive (CS:GO). The goal is to analyze the dataset, apply feature selection and engineering, and determine which machine learning models perform best for this classification task. Specifically, we aim to predict the winner of each round based on various in-game attributes.

## Step:2 Import Necessary Libraries

In [None]:
# Basic Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Preprocessing and Model Selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Machine Learning Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Evaluation Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Library for Model Saving and Loading
import joblib

## Step:3 Data Loading

In [None]:
df = pd.read_csv(r"E:\DS & ML Syllabus\DS and ML projects intellipat\CS-Go-Project\cs-go.csv")

## Step:4 Exploratory Data Analysis (EDA)

#### 4.1 Data Overview
* Check the dimensions of the dataset (number of rows and columns).
* Inspect the first few rows to understand the structure of the data.

In [None]:
# Configure pandas to display a maximum of 10 rows and all columns (unlimited)
pd.set_option('display.max_rows',10)
pd.set_option('display.max_columns',None)

In [None]:
# Display first 5 rows
df.head()

In [None]:
# Display last 5 rows
df.tail()

In [None]:
# checking shape of dataset
df.shape

* The dataset has 122,410 rows and 97 columns.

#### 4.2 Checking for Null Values

In [None]:
# checking null values
df.isnull().sum().sum()

#### 4.3 Checking for Duplicates

In [None]:
# checking for duplicated values
df.duplicated().sum()

* We identified 4,962 duplicate records and now we are removing them to ensure a clean dataset.

In [None]:
# Now we are removing duplicate rows
df.drop_duplicates(inplace=True)

In [None]:
# Now again checking and confirming that no duplicate values present
df.duplicated().sum()

#### 4.4 Understanding Data Types and Information

In [None]:
# It helps to understand the data type and information about data
df.info()

* Out of the 97 columns, 94 are of type float64.
* The 'bomb_planted' column is of type bool.
* The 'map' and 'round_winner' columns are of type object.

The 'map' column has 6 different values. Ordinarily, we would apply one-hot encoding to this column. However, because the dataset already has a large number of columns, I chose label encoding instead to avoid adding more. Note that label encoding assigns arbitrary integer codes, which linear models will treat as an ordering.
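
The trade-off between the two encodings can be illustrated with a minimal sketch on a toy column. The map names and the `toy` frame below are hypothetical and are not taken from the dataset; the point is only that label encoding keeps a single column while one-hot encoding adds one column per distinct value.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column; the map names are illustrative only
toy = pd.DataFrame({"map": ["de_dust2", "de_mirage", "de_inferno", "de_dust2"]})

# Label encoding: a single integer column, codes follow the sorted label order
le = LabelEncoder()
label_encoded = le.fit_transform(toy["map"])
code_to_label = dict(enumerate(le.classes_))

# One-hot encoding: one indicator column per distinct value
one_hot = pd.get_dummies(toy["map"], prefix="map")

print(label_encoded)
print(code_to_label)
print(one_hot.shape)
```

With only a handful of distinct values, one-hot encoding would add a few columns at most, so it may be worth comparing both encodings if the linear models turn out to be sensitive to the arbitrary ordering.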

## Step:5 Data Preprocessing

#### 5.1 Label Encoding

In [None]:
# Create an empty dictionary to store the mapping
encoded_to_original = {}

for col in df.columns:
    if df[col].dtype == "object" or df[col].dtype == "bool":
        le = LabelEncoder()
        # Fit and transform the column
        df[col] = le.fit_transform(df[col]) 
        # Store the mapping in the dictionary
        encoded_to_original[col] = {i: label for i, label in enumerate(le.classes_)}

# Now you can access the original labels corresponding to each encoded value for each column
encoded_to_original

In [None]:
df.head()

In [None]:
# Splitting the data into independent and dependent variables for applying Feature scaling
X=df.iloc[:,:-1]
Y=df["round_winner"]

#### 5.2 Feature Scaling

In [None]:
# Apply Feature scaling in independent variables
sc = StandardScaler()
X_std = sc.fit_transform(X)
X_std

## Step:6 Model Building

#### 6.1 Splitting the Dataset

In [None]:
x_train,x_test,y_train,y_test=train_test_split(X_std,Y,test_size=0.3,random_state=45)

In [None]:
x_train.shape, x_test.shape, y_train.shape, y_test.shape
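
In this notebook the scaler is fitted on all rows before the split, so the test rows contribute to the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that order on synthetic data; `X_demo`, `y_demo`, and the other names are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: 200 rows, 3 numeric features, binary target)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_demo = rng.integers(0, 2, size=200)

# Split first, so the test rows stay unseen during preprocessing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=45
)

scaler = StandardScaler()
X_tr_std = scaler.fit_transform(X_tr)  # mean and std come from training rows only
X_te_std = scaler.transform(X_te)      # test rows reuse the training statistics

print(X_tr_std.shape, X_te_std.shape)
```

On a dataset of this size the difference in scores is likely to be small, but this order keeps the test set fully independent of the preprocessing.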

#### 6.2 Apply Linear Discriminant Analysis
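* Applying Linear Discriminant Analysis (LDA) to rank features: LDA is a supervised dimensionality-reduction technique, and the coefficients of the fitted model can serve as a heuristic measure of feature importance when the features are standardized.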

In [None]:
lda = LinearDiscriminantAnalysis()
# The projected arrays are not stored; only the fitted lda.coef_ is used in the steps below
lda.fit_transform(x_train,y_train)
lda.transform(x_test)

* The LDA coefficients are the weights of the linear discriminant function fitted on the training data.
* Because the features were standardized, the magnitude of each coefficient gives a rough indication of how strongly that column influences the classification, and therefore whether it is worth keeping for model training.
* By analyzing these coefficients, we can interpret the relative importance and impact of each feature in the classification task.
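
To make the idea concrete, here is a minimal sketch on synthetic data in which one feature separates the two classes strongly, one weakly, and one not at all. All names (`X_demo`, `y_demo`, `lda_demo`) and the shift sizes are assumptions chosen for illustration, not values from the dataset.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic binary problem (assumption: 300 rows, 3 unit-variance features)
rng = np.random.default_rng(0)
n = 300
y_demo = rng.integers(0, 2, size=n)
X_demo = rng.normal(size=(n, 3))
X_demo[:, 0] += 3.0 * y_demo   # feature 0: strong class separation
X_demo[:, 1] += 0.5 * y_demo   # feature 1: weak class separation
                               # feature 2: pure noise

lda_demo = LinearDiscriminantAnalysis()
lda_demo.fit(X_demo, y_demo)

# For a two-class problem coef_ has one row with one coefficient per feature
print(lda_demo.coef_.shape)

# Rank features by coefficient magnitude
importance = np.abs(lda_demo.coef_).ravel()
print(importance)
```

The magnitudes are only comparable across features because all columns share the same scale here, which is also why the notebook standardizes `X` before fitting LDA.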

In [None]:
X.columns

In [None]:
# X has 96 columns, so there are 96 LDA coefficients, one per column
# e.g. the 1st column, time_left, has coefficient 1.28511070e-01
# and similarly for the other 95 columns
lda.coef_.shape

In [None]:
lda.coef_

* LDA coefficients can be negative, and NumPy displays very small or very large values in scientific (exponential) notation.
* The sign only indicates which class a feature pushes the prediction towards, so we rank features by magnitude using the absolute value.
* Applying np.log and then np.exp to the absolute values returns the same numbers, so this round trip changes neither the scale nor the ranking.
* Scientific notation is only a display format; the stored values are unaffected.
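
A quick check on a few hypothetical coefficient values suggests that the `np.log` / `np.exp` round trip leaves the absolute values unchanged, so the feature ranking depends only on `np.abs`. The array `coef_demo` below is made up for illustration.

```python
import numpy as np

# Hypothetical coefficients with mixed signs and magnitudes
coef_demo = np.array([[0.1285, -2.5e-3, 4.0, -0.75]])

abs_only = np.abs(coef_demo)
round_trip = np.exp(np.log(abs_only))

# The round trip reproduces the absolute values (up to floating-point error)
print(np.allclose(abs_only, round_trip))

# Scientific notation is a display setting and can be switched off for printing
np.set_printoptions(suppress=True, precision=4)
print(abs_only)
```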

In [None]:
# Rank features by magnitude: the sign of an LDA coefficient only indicates
# which class the feature pushes towards, so we take absolute values
lda_coefficients_abs = np.abs(lda.coef_)

# np.exp(np.log(a)) returns a unchanged for a > 0, so a log/exp round trip is
# not needed (np.log would also warn on a coefficient that is exactly zero)
lda_coefficients = lda_coefficients_abs

In [None]:
# Here we are flattening the 2D array of shape (1, 96) into a 1D array
lda_coefficients=lda_coefficients.flatten()
lda_coefficients

In [None]:
# Now we are loading all column names into the feature_names variable
feature_names=X.columns

In [None]:
# Here we are plotting a bar graph of the LDA coefficients against the column names
plt.figure(figsize=(20,10))
plt.bar(feature_names,lda_coefficients)
plt.title("Bar graph between lda_coefficients and column_names")
plt.xticks(rotation=90)
plt.xlabel("Features")
plt.ylabel("Score")
plt.show()

In [None]:
# Creating a new DataFrame with column names 'Feature_names' and 'feature_scores'
df_feature_score = pd.DataFrame({"Feature_names":feature_names,"feature_scores":lda_coefficients})

In [None]:
df_feature_score

In [None]:
# Now selecting the top 20 columns with the highest feature scores to train our model
top_20_values = df_feature_score.nlargest(20,"feature_scores")

In [None]:
top_20_values

In [None]:
# Extracting the indices of these columns to be used in x_train and stored in imp_col
imp_col = top_20_values.index

In [None]:
imp_col

In [None]:
# Updating x_train and x_test with the selected columns and converting them into dataframes
x_train=x_train[:,imp_col]
x_test=x_test[:,imp_col]

x_train=pd.DataFrame(x_train)
x_test=pd.DataFrame(x_test)
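
After this step the selected columns are labelled 0 to 19, so the original feature names are lost. One possible variation, sketched below on toy values, keeps the names attached to the selected columns. `feature_names_demo`, `scores_demo`, and `X_demo` are hypothetical stand-ins for `X.columns`, `lda_coefficients`, and the scaled array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the feature names, their scores, and the scaled data
feature_names_demo = pd.Index(["f0", "f1", "f2", "f3", "f4"])
scores_demo = np.array([0.2, 1.5, 0.05, 0.9, 0.4])
X_demo = np.arange(15, dtype=float).reshape(3, 5)

# Index the scores by feature name, then take the k largest
scores = pd.Series(scores_demo, index=feature_names_demo)
top_names = scores.nlargest(2).index

# Translate the names back to column positions in the NumPy array
top_pos = feature_names_demo.get_indexer(top_names)

# Build a DataFrame that carries the original column names
X_top = pd.DataFrame(X_demo[:, top_pos], columns=top_names)
print(X_top)
```

Keeping the names makes it easier to check later which in-game attributes a saved model expects as input.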

In [None]:
x_train

In [None]:
y_test

## Step:7 Predictive Model Implementation

### 7.1 Applying Logistic Regression

In [None]:
# Initialize Logistic Regression model
lg_model = LogisticRegression()

# Train the model
lg_model.fit(x_train, y_train)

# Predict using the trained model
lg_pred = lg_model.predict(x_test)

### 7.2 Applying Decision Tree Classifier

In [None]:
# Initialize Decision Tree Classifier model
dt_model = DecisionTreeClassifier()

# Train the model
dt_model.fit(x_train, y_train)

# Predict using the trained model
dt_pred = dt_model.predict(x_test)

### 7.3 Applying Random Forest Classifier

In [None]:
# Initialize Random Forest Classifier model
rf_model = RandomForestClassifier()

# Train the model
rf_model.fit(x_train, y_train)

# Predict using the trained model
rf_pred = rf_model.predict(x_test)

## Step:8 Model Evaluation and Comparison

#### 8.1 Model Evaluation (Using Accuracy, Precision, Recall, F1-score, and ROC-AUC)

In [None]:
# Logistic Regression Evaluation
lg_pred = lg_model.predict(x_test)
lg_accuracy = accuracy_score(y_test, lg_pred)
lg_precision = precision_score(y_test, lg_pred)
lg_recall = recall_score(y_test, lg_pred)
lg_f1 = f1_score(y_test, lg_pred)
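# ROC-AUC is computed from the predicted probability of the positive class, not from hard labels
lg_roc_auc = roc_auc_score(y_test, lg_model.predict_proba(x_test)[:, 1])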

# Decision Tree Classifier Evaluation
dt_pred = dt_model.predict(x_test)
dt_accuracy = accuracy_score(y_test, dt_pred)
dt_precision = precision_score(y_test, dt_pred)
dt_recall = recall_score(y_test, dt_pred)
dt_f1 = f1_score(y_test, dt_pred)
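# ROC-AUC is computed from the predicted probability of the positive class, not from hard labels
dt_roc_auc = roc_auc_score(y_test, dt_model.predict_proba(x_test)[:, 1])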

# Random Forest Classifier Evaluation
rf_pred = rf_model.predict(x_test)
rf_accuracy = accuracy_score(y_test, rf_pred)
rf_precision = precision_score(y_test, rf_pred)
rf_recall = recall_score(y_test, rf_pred)
rf_f1 = f1_score(y_test, rf_pred)
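# ROC-AUC is computed from the predicted probability of the positive class, not from hard labels
rf_roc_auc = roc_auc_score(y_test, rf_model.predict_proba(x_test)[:, 1])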

# Print or display evaluation metrics for each model
print("Logistic Regression Metrics:")
print("Accuracy:", lg_accuracy)
print("Precision:", lg_precision)
print("Recall:", lg_recall)
print("F1-score:", lg_f1)
print("ROC-AUC:", lg_roc_auc)
print()

print("Decision Tree Classifier Metrics:")
print("Accuracy:", dt_accuracy)
print("Precision:", dt_precision)
print("Recall:", dt_recall)
print("F1-score:", dt_f1)
print("ROC-AUC:", dt_roc_auc)
print()

print("Random Forest Classifier Metrics:")
print("Accuracy:", rf_accuracy)
print("Precision:", rf_precision)
print("Recall:", rf_recall)
print("F1-score:", rf_f1)
print("ROC-AUC:", rf_roc_auc)
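
The three evaluation blocks repeat the same five calls, so they could be folded into one helper. The sketch below is one possible shape for such a helper, shown on synthetic data; the function name `evaluate` and all `_demo` names are hypothetical. It also reports ROC-AUC twice, once from class probabilities and once from hard labels, since the two generally give different values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def evaluate(model, X_eval, y_eval):
    """Return the five metrics for a fitted binary classifier."""
    pred = model.predict(X_eval)
    proba = model.predict_proba(X_eval)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "precision": precision_score(y_eval, pred),
        "recall": recall_score(y_eval, pred),
        "f1": f1_score(y_eval, pred),
        "roc_auc": roc_auc_score(y_eval, proba),
        # For comparison only: AUC computed from hard 0/1 predictions
        "roc_auc_from_labels": roc_auc_score(y_eval, pred),
    }


# Synthetic binary problem (assumption: 400 rows, 4 features, signal in feature 0)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 4))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=45
)

model_demo = LogisticRegression().fit(X_tr, y_tr)
metrics_demo = evaluate(model_demo, X_te, y_te)
for name, value in metrics_demo.items():
    print(f"{name}: {value:.3f}")
```

Calling the same helper for each of the three models would also make it straightforward to collect the results into a single table for comparison.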

#### 8.2 Model Comparison

In [None]:
# Comparing Model Performance based on Accuracy
models_accuracy = {'Logistic Regression': lg_accuracy, 'Decision Tree Classifier': dt_accuracy, 'Random Forest Classifier': rf_accuracy}
best_model = max(models_accuracy, key=models_accuracy.get)
print("Best Model based on Accuracy:", best_model)

## Step:9 Model Saving and Loading for Scalability and Reproducibility

In [None]:
# Step 1: Saving the model
# Specify the file path where you want to save the model
model_file_path = '../models/random_forest_model.pkl'

# Save the model to the specified file path
joblib.dump(rf_model, model_file_path)

# Step 2: Loading the model
# Load the saved model from the file path
loaded_model = joblib.load(model_file_path)

In [None]:
# Step 3: Perform predictions using the loaded model
# For 1st row from x_test (the 20 selected, standardized feature values)
predictions = loaded_model.predict([[-0.002918,-0.002918,-0.054919,-0.168787,-0.940986,1.869350,-1.035043,-0.992892,0.726451,-0.740199,-0.876780,-0.159325,-0.690560,-0.363148,-0.366084,-0.439628,-0.884340,-0.297517,0.612828,-0.479267]])
predictions

* So here we got output as 1: means round_winner is Terrorist

In [None]:
# For 3rd row from X_test
# Perform predictions using the loaded model
predictions = loaded_model.predict([[-0.002918,-0.002918,0.054203,0.482854,1.984910,-0.742782,0.683520,0.614080,1.544110,-0.528024,2.081069,-0.771821,1.388414,-0.363148,1.268759,-0.439628,-0.844547,-0.297517,1.167441,-0.479267]])
predictions

* So here we got output as 0: means round_winner is Counter-Terrorist
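
Instead of typing feature values by hand and decoding the result manually, the save, load, and decode steps can be combined. The sketch below does this with a small stand-in model on synthetic data. The `_demo` names are hypothetical, and the `label_names` dictionary is an assumed mapping that mirrors the 0 and 1 codes described above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 20 selected features (assumption: 200 rows)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 20))
y_demo = (X_demo[:, 0] > 0).astype(int)
model_demo = RandomForestClassifier(n_estimators=20, random_state=0)
model_demo.fit(X_demo, y_demo)

# Assumed mapping from encoded value to label, as stated in the text above
label_names = {0: "Counter-Terrorist", 1: "Terrorist"}

# Save and reload inside a temporary directory so no fixed path is required
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_demo.pkl")
    joblib.dump(model_demo, path)
    loaded_demo = joblib.load(path)

# Predict directly on rows of the feature matrix and decode the integer codes
rows = X_demo[:3]
pred_codes = loaded_demo.predict(rows)
pred_names = [label_names[int(code)] for code in pred_codes]
same = np.array_equal(pred_codes, model_demo.predict(rows))

print(pred_names)
print(same)
```

In the notebook, the same pattern would pass rows such as `x_test.iloc[[0]]` to `loaded_model.predict` and look the result up in `encoded_to_original["round_winner"]`.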

## Step:10 Conclusion

In this project, our objective was to use machine learning algorithms to classify round winners in Counter-Strike: Global Offensive (CS:GO). Following data preprocessing, feature selection, and model training, we evaluated three classification algorithms: Logistic Regression, Decision Tree Classifier, and Random Forest Classifier.

The evaluation criteria included Accuracy, Precision, Recall, F1-score, and ROC-AUC. Notably, the Random Forest Classifier emerged as the top-performing model, achieving an accuracy of 0.841.

Upon analysis, the Random Forest Classifier demonstrated superior performance across all metrics, showcasing its effectiveness in predicting round winners in CS. With strong precision, recall, F1-score, and ROC-AUC values, it stands out as the most reliable choice for this classification task.

In conclusion, the Random Forest Classifier offers robust performance in leveraging game attributes for classification. Further optimization and fine-tuning of the model could potentially enhance its performance even more.