# Predicting the rank of Starcraft II players

**Ethan Shapiro**

## Summary of Findings


### Introduction
**Our Questions:** Can we predict a Starcraft II player's rank from their past ranked performance? If so, which information is useful now, and which would be worth collecting for future predictions?

**Type of Prediction Problem:** This is a multiclass classification problem: we predict the player's rank.

**Response Variable:** The player's rank. Being able to predict it could help in a few ways:
1. Placing veteran players in the new ranked season
2. Placing new players into their first ranked league
3. Working in conjunction with another model to adjust ranks (ranked reset, rank inflation adjustments, etc.)

**Metric:** We chose accuracy because we want to weigh false positives and false negatives equally.

### Exploratory Data Analysis


### Baseline Model
For our baseline model, we chose to use a **Decision Tree Classifier**.

We included these features:
 - **Officer Age at the incident (mos_age_incident)**
    - Type: Quantitative
    - Encoding: None
 - **Officer Ethnicity (mos_ethnicity)**
    - Type: Categorical
    - Encoding: One Hot Encoding
 - **Officer Gender (mos_gender)**
    - Type: Categorical
    - Encoding: One Hot Encoding
 - **Complainant Ethnicity (complainant_ethnicity)**
    - Type: Categorical
    - Encoding: One Hot Encoding
 - **Complainant Gender (complainant_gender)**
    - Type: Categorical
    - Encoding: One Hot Encoding
 - **Complainant Age (complainant_age_incident)**
    - Type: Quantitative
    - Encoding: None
 - **Type of Complaint (fado_type)**
    - Type: Categorical
    - Encoding: One Hot Encoding
    
We were trying to predict: **Outcome of Complaint (complaint_outcome)**
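
The feature list above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the exact code behind the reported numbers: the column names come from the list, while `build_baseline`, `handle_unknown="ignore"`, and `random_state=0` are assumptions made for the example.

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Column names taken from the feature list above.
CATEGORICAL = [
    "mos_ethnicity",
    "mos_gender",
    "complainant_ethnicity",
    "complainant_gender",
    "fado_type",
]
QUANTITATIVE = ["mos_age_incident", "complainant_age_incident"]


def build_baseline():
    """Return an unfitted baseline: one-hot categoricals + raw ages -> decision tree."""
    preprocess = ColumnTransformer(
        [
            # Assumption: categories unseen during fit are encoded as all zeros.
            ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            # Quantitative columns are passed through without encoding.
            ("numeric", "passthrough", QUANTITATIVE),
        ]
    )
    return Pipeline(
        [
            ("preprocess", preprocess),
            # Assumption: default hyperparameters with a fixed seed.
            ("tree", DecisionTreeClassifier(random_state=0)),
        ]
    )
```

Calling `build_baseline().fit(X_train, y_train)` on a DataFrame with these columns and the `complaint_outcome` labels would train the model; keeping the encoder inside the pipeline means the same encoding is applied at prediction time.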

Our baseline model's performance was:
 - Test accuracy: ~0.4944
 - Test precision: ~0.4628
 
We don't believe our baseline model is good because it does not do much better than guessing an outcome at random (which would be ~33% accurate).<br>
There is no point in telling a complainant a likely outcome if we are not even 50% sure of our prediction.

### Final Model
For our final model, we stuck with a **Decision Tree Classifier**.

We feature engineered the following:
 - **Officer is a Minority**
    - *Type:* Categorical/Binary
    - *Encoding:* 1 for any non-white ethnicity, 0 for white.
    - *Why it's a good fit:* We believe there could be bias in the CCRB decision process based on ethnicity. Therefore, knowing if an officer is a minority might give information about the outcome of the complaint.
 - **Complainant is a Minority**
    - *Type:* Categorical/Binary
    - *Encoding:* 1 for any non-white ethnicity, 0 for white.
    - *Why it's a good fit:* We believe there could be bias in the CCRB decision process based on ethnicity. Therefore, knowing if a complainant is a minority might give information about the outcome of the complaint.
 - **Officer is a High Rank**
    - *Type:* Categorical/Binary
    - *Encoding:* 1 for Deputy Inspector, Inspector, and Chief/other high ranks; 0 otherwise.
    - *Why it's a good fit:* Higher ranking officers might have more favorable outcomes than lower ranking officers. Therefore, knowing if the rank is a high rank can help us predict the outcome.
 - **General Allegation Type**
    - *Type:* Categorical
    - *Encoding:* We group the original 76 unique allegation types into 11 general groups, then One Hot Encode the groups.
    - *Why it's a good fit:* We believe the severity of the complaint will give information about the outcome of the complaint. For example, more severe allegations might be more likely to be Substantiated, and vice versa.
 - **General Outcome Type**
    - *Type:* Categorical/Ordinal
    - *Encoding:* We generalized the original outcome types into three major types: Arrest, Summons, or No Arrest. Then, we ordinally encoded it, making Arrest the most severe outcome, followed by Summons, and then No Arrest.
    - *Why it's a good fit:* We believe the outcome of the Officer and Complainant interaction can influence the decision of the CCRB. For example, if someone is arrested during the interaction, their complaint might be made out of spite and not necessarily be true.
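
The binary and ordinal encodings described above could look roughly like the following. This is a sketch under assumptions: the function names, the exact label strings (`"White"`, the entries of `HIGH_RANKS`, the keys of `OUTCOME_ORDER`), and the integer codes are hypothetical, and the "other high ranks" mentioned above are not enumerated here.

```python
# Hypothetical label strings; the real dataset may spell these differently,
# and the full set of "other high ranks" is not listed in this report.
HIGH_RANKS = {"Deputy Inspector", "Inspector", "Chief"}

# Ordinal codes, least to most severe (the integers themselves are an assumption).
OUTCOME_ORDER = {"No Arrest": 0, "Summons": 1, "Arrest": 2}


def is_minority(ethnicity):
    """Return 0 for white and 1 for any other value, as described above."""
    return 0 if ethnicity == "White" else 1


def is_high_rank(rank):
    """Return 1 if the rank is in the assumed set of high ranks, else 0."""
    return 1 if rank in HIGH_RANKS else 0


def encode_outcome(general_outcome):
    """Map a generalized outcome (Arrest, Summons, No Arrest) to its ordinal code."""
    return OUTCOME_ORDER[general_outcome]
```

With pandas, each function can be applied column-wise, for example `df["mos_ethnicity"].map(is_minority)`. Note that under the rule as stated, a missing or unknown ethnicity is also encoded as 1, which may or may not be the intended behavior.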

After creating these features, we ran a **GridSearchCV** with a **Decision Tree**, **5 folds**, and fitted on **75%** of the original data. We ended up with the following best hyperparameters:
 - criterion: entropy
 - max_depth: 12
 - min_samples_split: 15
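
A search of this shape can be sketched as follows. The 75% training split and 5 folds come from the description above; the candidate values in `PARAM_GRID`, the `accuracy` scoring, the seed, and the helper name `tune_tree` are assumptions for illustration, since the original grid is not listed.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed candidate values; only the winning combination is reported above.
PARAM_GRID = {
    "criterion": ["gini", "entropy"],
    "max_depth": [4, 8, 12, 16],
    "min_samples_split": [2, 5, 15, 30],
}


def tune_tree(X, y, seed=0):
    """Grid-search a decision tree on 75% of the data; score it on the other 25%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, random_state=seed
    )
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=seed),
        PARAM_GRID,
        cv=5,                # 5-fold cross-validation on the training split
        scoring="accuracy",  # assumption: tuned on accuracy
    )
    search.fit(X_train, y_train)
    # search.score uses the refit best estimator and the scoring given above.
    return search, search.score(X_test, y_test)
```

After `search, test_accuracy = tune_tree(X, y)`, the chosen hyperparameters are available as `search.best_params_`. Holding out the 25% test split before the search keeps the reported test score independent of the tuning.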

Our Final Model's performance was:
 - Test accuracy: ~0.5349
 - Test precision: ~0.5408

This means our model improved its accuracy by ~4.05 percentage points and its precision by ~7.80 percentage points over the baseline.
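
The improvement figures are differences between the final and baseline test scores reported in this document. A quick check of that arithmetic, which also shows the relative change for comparison (the variable names are illustrative):

```python
# Test scores as reported for the baseline and final models.
baseline = {"accuracy": 0.4944, "precision": 0.4628}
final = {"accuracy": 0.5349, "precision": 0.5408}

improvement = {}
for metric in baseline:
    absolute = final[metric] - baseline[metric]   # difference in score
    relative = absolute / baseline[metric]        # change relative to baseline
    improvement[metric] = (absolute, relative)
    print(
        f"{metric}: +{absolute * 100:.2f} percentage points "
        f"({relative:.1%} relative to baseline)"
    )
```

The absolute differences are the ones quoted in the text; the relative figures are larger because they are measured against the baseline score rather than against 100%.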

### Fairness Analysis

We ran a permutation test comparing our model's **precision** for Minority vs. Non-Minority Complainants, with the following hypotheses:

<b>Null Hypothesis:</b> Our model is fair. Its precision for minorities and non-minorities is roughly the same, and any differences are due to random chance.

<b>Alternative Hypothesis:</b> Our model is unfair. Its precision for minorities is better than its precision for non-minorities.

<b>Evaluation Metric:</b> Precision

<b>Our significance level:</b> 0.05

With a p-value of 0.013 < 0.05, we reject the null hypothesis and conclude that our model is unfair.
Our model has worse precision for complainants who are non-minorities than for complainants who are minorities.
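
A one-sided permutation test of this kind can be sketched as below. This is an illustration rather than the exact procedure behind the reported p-value: it assumes precision is computed for a single `positive` class (the averaging used for the multiclass precision is not stated above), and the function names, number of permutations, and seed are hypothetical.

```python
import random


def precision(y_true, y_pred, positive):
    """Share of `positive` predictions that are correct (0.0 if none were made)."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not hits:
        return 0.0
    return sum(t == positive for t in hits) / len(hits)


def precision_gap(y_true, y_pred, is_minority, positive):
    """Precision on minority rows minus precision on non-minority rows."""
    def group_precision(flag):
        rows = [(t, p) for t, p, m in zip(y_true, y_pred, is_minority) if m == flag]
        return precision([t for t, _ in rows], [p for _, p in rows], positive)

    return group_precision(1) - group_precision(0)


def permutation_p_value(y_true, y_pred, is_minority, positive,
                        n_permutations=1000, seed=0):
    """Return (observed gap, one-sided p-value) by shuffling the group labels."""
    rng = random.Random(seed)
    observed = precision_gap(y_true, y_pred, is_minority, positive)
    shuffled = list(is_minority)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between group and predictions
        if precision_gap(y_true, y_pred, shuffled, positive) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_permutations
```

The p-value is the fraction of shuffles whose gap is at least as large as the observed one, which matches the one-sided alternative that precision is better for minorities. Comparing it against the 0.05 level gives the reject or fail-to-reject decision.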