# Borderlands 2 Weapon Rarity Prediction – First Draft

## Introduction

For Project 3, I wanted to focus on data from my favorite game of all time, Borderlands 2. The goal is to predict weapon rarity from attributes such as manufacturer, damage type, and required content. Rarity plays a significant role in the game's progression and overall gameplay experience, so a model that predicts it from these attributes could show which attributes are associated with rarer weapons and how rarity is distributed across them.

## What is Regression?
Regression is a statistical method used to find relationships between a dependent variable and independent variables. In my project, the dependent variable is weapon rarity, and the independent variables are attributes such as manufacturer, damage type, and required content.

Linear regression finds the best-fit line, the one that minimizes the sum of squared errors between actual and predicted values. The formula is:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$$

For my project, the variables in the equation relate to the following:
* $Y$ is the weapon rarity being predicted
* $X_1, X_2, \dots, X_n$ are the independent variables (e.g., manufacturer, damage type, etc.)
* $\beta_0$ is the intercept and $\beta_1, \dots, \beta_n$ are the coefficients
* $\epsilon$ is the error term
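
To make the formula concrete, here is a minimal sketch that recovers the intercept and coefficients from a tiny, noise-free toy dataset using NumPy's least-squares solver. The data and the "true" values (1, 2, -3) are made up for illustration and have nothing to do with the Borderlands 2 spreadsheet.

```python
import numpy as np

# Toy inputs with two independent variables (X1, X2); purely illustrative.
X = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 1.0],
])

# Generate Y from known values: intercept 1, coefficients 2 and -3, no error term.
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the first fitted value plays the role of the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem: minimize ||X_design @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("intercept:", beta[0])
print("coefficients:", beta[1:])
```

Because the toy data has no noise, the fitted values match the ones used to generate it. With real data the error term is nonzero and the fit is only approximate.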

## Experiment 1: Data Understanding
The dataset contains the following columns:

* Required Content: This one seems pretty relevant for predicting weapon rarity. Some weapons might be tied to specific content packs or downloadable content (DLCs), which could impact their rarity. I plan to keep this feature and see how it influences rarity prediction.

* Item and Name: In the loaded data these are two separate columns. Item is the weapon type (e.g., Assault Rifle) and Name is the individual weapon (e.g., Hammer Buster). Since each name is unique, it doesn’t really help in predicting weapon rarity, so I’ll exclude it from the model.

* Flavor Text Perk: This one is mostly descriptive, so the loading code drops it for now. It could be useful for feature engineering later if it turns out to have any patterns connected to weapon rarity.

* Manufacturer: This is a categorical feature (e.g., Hyperion, Jakobs). It could be useful since some manufacturers might have rarer weapons in the game, so it might affect the rarity prediction. I’ll test it out.

* Damage Type: Another categorical feature (e.g., Fire, Corrosive). While it might not directly determine rarity, it could influence it depending on how weapons of different types are distributed in terms of rarity. This is worth exploring.

* Dropped By: This feature tells us where the weapon drops, but it might not be very useful for predicting rarity. I’ll evaluate it during the analysis phase to see if it’s worth keeping.

* Minimum Task Required: This could provide some valuable insight, especially since higher-difficulty tasks might drop rarer weapons. Depending on how it correlates with rarity, I might include it in the model.

* Location: Not sure if this will be a strong predictor for rarity, but it’s still worth checking if certain locations tend to have rarer weapons. I’ll investigate this to see if it’s useful.

* Rarity: This is the target variable, the one I’m trying to predict. It includes categories like Common, Rare, and Legendary, which are the different levels of rarity we’ll focus on.
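
One way to check the hunches in the list above is to count distinct values per column and cross-tabulate a candidate feature against Rarity. The sketch below uses a small made-up table (the weapon names and rarity assignments are placeholders, not rows from the real spreadsheet), so only the technique carries over.

```python
import pandas as pd

# Made-up rows for illustration only; these are NOT taken from the real spreadsheet.
toy = pd.DataFrame({
    "Name": ["weapon_a", "weapon_b", "weapon_c", "weapon_d", "weapon_e", "weapon_f"],
    "Manufacturer": ["Jakobs", "Torgue", "Bandit", "Torgue", "Jakobs", "Hyperion"],
    "Rarity": ["Legendary", "Legendary", "Rare", "Common", "Common", "Rare"],
})

# Number of distinct values per column. A column with as many distinct values
# as there are rows (like Name here) cannot generalize to unseen weapons.
cardinality = toy.nunique()
print(cardinality)

# Count how often each manufacturer appears at each rarity level.
rarity_by_manufacturer = pd.crosstab(toy["Manufacturer"], toy["Rarity"])
print(rarity_by_manufacturer)
```

On the real data, the same two calls could be run on `combined_df` for each candidate column to see whether its values are spread unevenly across rarity levels.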


In [178]:
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)           
pd.set_option('display.expand_frame_repr', False)  
pd.set_option('display.max_colwidth', None)     

file_path = r"C:\Users\19857\Downloads\Borderlands 2 Loot Multi Pages 150520012018.xlsx"
sheet_names = ['Assault Rifles', 'Grenades', 'Pistols', 'Rocket Launchers', 'Shotguns', 'Sniper Rifles', 'SMGs']

weapon_dataframes = []

for sheet_name in sheet_names:
    df = pd.read_excel(file_path, sheet_name=sheet_name, skiprows=1)
    df = df.drop(columns=['Flavor Text Perk'], errors='ignore')
    
    weapon_dataframes.append(df)

combined_df = pd.concat(weapon_dataframes, ignore_index=True)

print("\nDataset Preview:")
print(combined_df.head())



Dataset Preview:
                   Required Content           Item           Name Manufacturer           Damage Type               Dropped By          Minimum Task Required              Location     Rarity
0                     Borderlands 2  Assault Rifle  Hammer Buster       Jakobs                   NaN                  McNally                       The Bane              The Dust  Legendary
1                     Borderlands 2  Assault Rifle     KerBlaster       Torgue             Explosive               Midge-Mong                      Symbiosis  Southern Shelf - Bay  Legendary
2                     Borderlands 2  Assault Rifle       Madhous!       Bandit  Multi (Nonexplosive)                  Mad Dog              Breaking the Bank             Lynchwood  Legendary
3                     Borderlands 2  Assault Rifle           Ogre       Torgue             Explosive             King of Orcs  Magic Slaughter: Badass Round    Murderlin's Temple  Legendary
4  Mr. Torgue's Campaign of Carn

## Experiment 1: Data Preprocessing
Handling Missing Values: I dropped rows with a missing Rarity value and forward-filled the remaining missing values, so each gap takes the value from the row above it.

Feature Selection: I selected features that are likely to influence the rarity of the weapon: Required Content, Manufacturer, Damage Type, Dropped By, Minimum Task Required, and Location.

Feature Encoding: Since these features are categorical, I used Label Encoding to convert them into numerical values, which machine learning models require. Rarity is also categorical and is encoded the same way in the modeling step.

Data Splitting: The data is split into a training set and a testing set in the modeling step to evaluate the model's performance.
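
One detail of Label Encoding is worth seeing on a small example: `LabelEncoder` numbers the labels in sorted order, which for text means alphabetical. The sketch below compares that with an explicit ordinal mapping. The tier order Common < Rare < Legendary is an assumption made for illustration, and the real dataset may contain more rarity levels than these three.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A few example labels; the real Rarity column may contain other levels too.
rarity = pd.Series(["Common", "Rare", "Legendary", "Common"])

# LabelEncoder sorts the distinct labels, so the codes follow alphabetical order.
encoder = LabelEncoder()
alphabetical_codes = encoder.fit_transform(rarity)
print(list(encoder.classes_))    # ['Common', 'Legendary', 'Rare']
print(list(alphabetical_codes))  # [0, 2, 1, 0]

# Assumed tier order (hypothetical, for illustration): Common < Rare < Legendary.
rarity_order = ["Common", "Rare", "Legendary"]
ordinal_codes = pd.Categorical(rarity, categories=rarity_order, ordered=True).codes
print(list(ordinal_codes))       # [0, 1, 2, 0]
```

With the alphabetical codes, Legendary (1) sits between Common (0) and Rare (2), which matters when a regression model treats the codes as distances.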

In [190]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)

file_path = r"C:\Users\19857\Downloads\Borderlands 2 Loot Multi Pages 150520012018.xlsx"
sheet_names = ['Assault Rifles', 'Grenades', 'Pistols', 'Rocket Launchers', 'Shotguns', 'Sniper Rifles', 'SMGs']

weapon_dataframes = []

for sheet_name in sheet_names:
    df = pd.read_excel(file_path, sheet_name=sheet_name, skiprows=1)
    df = df.drop(columns=['Flavor Text Perk'], errors='ignore')
    
    weapon_dataframes.append(df)

combined_df = pd.concat(weapon_dataframes, ignore_index=True)

features = ['Required Content', 'Manufacturer', 'Damage Type', 'Dropped By', 'Minimum Task Required', 'Location']
target = 'Rarity'

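# Drop rows where the target is missing; they cannot be used for training.
combined_df = combined_df.dropna(subset=[target])
# Forward fill the remaining gaps for simplicity.
# DataFrame.ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions.
combined_df = combined_df.ffill()

# Label-encode each text column and keep the encoder so codes can be mapped back to names.
label_encoders = {}
for col in features:
    if combined_df[col].dtype == 'object':
        label_encoder = LabelEncoder()
        combined_df[col] = label_encoder.fit_transform(combined_df[col])
        label_encoders[col] = label_encoder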


print("First few rows of the pre-processed data:")
print(combined_df.head())
print("\nMissing values after pre-processing:")
print(combined_df.isnull().sum())

print("\nUnique values in 'Manufacturer' column after encoding:")
print(combined_df['Manufacturer'].unique())


combined_df.to_csv('preprocessed_data.csv', index=False)
X = combined_df[features]
y = combined_df[target]

X.to_csv('X_features.csv', index=False)
y.to_csv('y_target.csv', index=False)


First few rows of the pre-processed data:
   Required Content           Item           Name  Manufacturer  Damage Type  Dropped By  Minimum Task Required  Location     Rarity
0                 0  Assault Rifle  Hammer Buster             3           10          44                     75        49  Legendary
1                 0  Assault Rifle     KerBlaster             6            2          46                     74        41  Legendary
2                 0  Assault Rifle       Madhous!             0            4          40                     11        27  Legendary
3                 0  Assault Rifle           Ogre             6            2          33                     48        31  Legendary
4                 5  Assault Rifle     Shredifier             7            4          28                     35        64  Legendary

Missing values after pre-processing:
Required Content         0
Item                     0
Name                     0
Manufacturer             0
Damage Type   



## Experiment 1: Modeling
In this experiment, I’m creating a linear regression model using the features I’ve cleaned and selected. First, I’m converting the "Rarity" column into numeric values using LabelEncoder because it’s a categorical variable. LabelEncoder assigns integer codes in sorted (alphabetical) order of the labels, so the codes do not necessarily follow the in-game rarity tiers. After that, I’ll define the features (independent variables) and the target variable (dependent variable), which in this case is the "Rarity" that has now been encoded into numbers.

Next, I’ll split the data into training and testing sets (80-20 split). Then, I’ll set up the linear regression model, train it using the training data, and test it on the test data.

In [199]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

label_encoder_rarity = LabelEncoder()
combined_df['Rarity'] = label_encoder_rarity.fit_transform(combined_df['Rarity'])

X = combined_df[features]
y = combined_df['Rarity'] 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

linear_model = LinearRegression(fit_intercept=True, copy_X=True, positive=False)
linear_model.fit(X_train, y_train)
y_pred_linear = linear_model.predict(X_test)

## Experiment 1: Evaluation
For this experiment, we need to evaluate how well our model is performing. The code below calculates the Mean Squared Error (MSE), the average squared difference between predicted and actual values, and R-squared, the share of variance in the target that the model explains. Taking the square root of the MSE gives the Root Mean Squared Error (RMSE), which is in the same units as the encoded rarity and shows how far off our predictions typically are:

In [201]:
mse = mean_squared_error(y_test, y_pred_linear)
r_squared = r2_score(y_test, y_pred_linear)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r_squared}")

Mean Squared Error: 1.8685375751079552
R-squared: -0.10346707191414661
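
To show where these numbers come from, here is a small sketch that computes MSE, RMSE, and R-squared by hand with NumPy. The `y_true` and `y_pred` arrays are hypothetical values chosen for easy arithmetic, not the model's real predictions. The last line applies the square root to the MSE reported above.

```python
import numpy as np

# Hypothetical actual and predicted values, chosen so the arithmetic is easy to follow.
y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = np.array([0.5, 1.0, 1.5, 3.5])

# MSE: mean of the squared differences.
mse = np.mean((y_true - y_pred) ** 2)

# RMSE: square root of the MSE, in the same units as the target.
rmse = np.sqrt(mse)

# R-squared: 1 - (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Baseline: always predict the mean of the actual values. Its R-squared is exactly 0,
# so a negative R-squared means the model does worse than this baseline.
baseline_pred = np.full_like(y_true, y_true.mean())
r_squared_baseline = 1.0 - np.sum((y_true - baseline_pred) ** 2) / ss_tot

print(mse, rmse, r_squared, r_squared_baseline)

# RMSE implied by the MSE reported in the output above.
reported_rmse = np.sqrt(1.8685375751079552)
print(round(reported_rmse, 3))
```

The RMSE implied by the reported MSE comes out to roughly 1.37 on the encoded rarity scale.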


From the output, we can draw a few conclusions:

**Mean Squared Error (MSE):**

- The MSE is 1.8685, which corresponds to an RMSE of about 1.37, so predictions are typically off by more than one step on the encoded rarity scale. Without a reference such as a baseline model, it's hard to say if that is good or not. Ideally, we want to lower this number.

**R-Squared:**

- The R-squared value is -0.1035, which is negative. This means that the model is not explaining the data well at all. In fact, it does worse on the test set than predicting the average rarity value every time.

**Conclusion:**
- The negative R-squared tells us that the model is not working well and is making poor predictions.

- The MSE isn’t huge, but since the R-squared is bad, we need to improve the model. It might need better features, a different approach, or more tuning.
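
As one possible "different approach", rarity could be treated as a class label rather than a number, with the categorical features one-hot encoded so that no artificial ordering is imposed on them. The sketch below is only an illustration under those assumptions: the rows are made up, the choice of a decision tree is arbitrary, and nothing here has been run on the real dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up rows for illustration only; rarity assignments are NOT from the real data.
toy = pd.DataFrame({
    "Manufacturer": ["Torgue", "Jakobs", "Bandit", "Torgue", "Jakobs", "Bandit"],
    "Damage Type": ["Explosive", "Fire", "Fire", "Explosive", "Corrosive", "Corrosive"],
    "Rarity": ["Legendary", "Rare", "Common", "Legendary", "Rare", "Common"],
})
feature_cols = ["Manufacturer", "Damage Type"]

# One-hot encode the categorical columns, then fit a classifier on the rarity labels.
model = Pipeline([
    ("encode", ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), feature_cols),
    ])),
    ("classify", DecisionTreeClassifier(random_state=0)),
])
model.fit(toy[feature_cols], toy["Rarity"])

# Predicting on the training rows only confirms the pipeline runs end to end;
# it says nothing about how well the approach would generalize.
predictions = model.predict(toy[feature_cols])
print(list(predictions))
```

On the real data, the same pipeline would be fit on `X_train` and scored on `X_test` with a classification metric such as accuracy, and the result compared against the regression model above.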