# Grid.gg Esports Match Prediction Pipeline
This notebook implements a machine learning pipeline for predicting esports match outcomes from Grid.gg API data. The pipeline covers data cleaning, preprocessing, feature engineering, model training, and evaluation.

## Pipeline Structure
The pipeline is organized into the following main sections:
1. Data Collection and Cleaning
    - Reading in combined_player_stats_20241117_1343.csv
    - Handling missing values and outliers

2. Feature Engineering

3. Model Development and Evaluation
Three models will be developed and compared:

 - Neural Network: Suited to capturing complex, nonlinear patterns in player performance
 - XGBoost: Has historically performed well on similar prediction tasks with our data, including in Shayne's models
 - [Additional Model TBD]: To be selected based on feature characteristics and data structure

4. Model Optimization Possibilities 
    - Hyperparameter tuning
    - Feature importance analysis
    - Performance validation
    - Cross-validation strategies
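One cross-validation strategy worth considering is a grouped split. If rows from the same match appear in both the training and validation folds, validation scores can look better than they really are. The sketch below uses only NumPy and assumes that each row will carry a match identifier once player stats are linked to match results. The `groups` argument and the `group_kfold_indices` helper are hypothetical. scikit-learn's `GroupKFold` implements the same idea as a library class.

```python
import numpy as np


def group_kfold_indices(groups, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) arrays so each group lands in exactly one validation fold.

    `groups` holds one label per row (for example, a hypothetical match ID).
    """
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    if not 2 <= n_splits <= len(unique_groups):
        raise ValueError("n_splits must be between 2 and the number of distinct groups")
    # Shuffle the group labels, not the rows, so a group is never split across folds
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_groups)
    for val_groups in np.array_split(shuffled, n_splits):
        val_mask = np.isin(groups, val_groups)
        yield np.flatnonzero(~val_mask), np.flatnonzero(val_mask)
```

The folds balance the number of groups, not the number of rows. Fold sizes can therefore differ when some matches contribute more rows than others.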

## Data Cleaning and Preprocessing

### Data Loading and Initial Assessment
- Loading data from 'combined_player_stats_20241117_1343.csv'
- Initial examination of data structure and completeness
- Documentation of current data shape and basic statistics
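The loading and initial-assessment steps above could be sketched as follows. This is a minimal illustration, not the notebook's actual code. The helper names `load_player_stats` and `initial_assessment` are hypothetical, and the sketch makes no assumptions about the CSV's column names.

```python
import pandas as pd


def load_player_stats(path: str) -> pd.DataFrame:
    """Read the combined player-stats CSV into a DataFrame."""
    return pd.read_csv(path)


def initial_assessment(df: pd.DataFrame) -> dict:
    """Summarize shape, dtypes, missing values, duplicates, and basic statistics."""
    missing = df.isna().sum()
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Only report columns that actually contain missing values
        "missing_per_column": missing[missing > 0].to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        # describe() summarizes numeric columns by default
        "numeric_summary": df.describe(),
    }


# Usage, run from the directory that contains the file:
# stats = load_player_stats("combined_player_stats_20241117_1343.csv")
# report = initial_assessment(stats)
# print(report["shape"], report["missing_per_column"])
```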

### Data Quality Issues to Address

1. Known Invalid Records
    - Identification and removal of ~55 batches of players with all-zero statistics
    - Documentation of removed records for future data collection improvement

2. Partial Zero Statistics
    - Analysis of players with some zero statistics but otherwise valid data
    - Distinguishing legitimate zero values from missing or erroneous data
    - Strategy for handling partially complete player records

3. Team Information Linking
    - Assessment of available team information
    - Preparation for linking player statistics with match results
    - Identification of any missing team affiliations
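The data quality checks above could look roughly like the sketch below. It rests on several assumptions:

* The stat columns are the numeric columns, minus any identifier columns.
* A record counts as invalid only when every stat column is exactly zero.
* The team column is named `team_id`.

All helper names and the `team_id` and `player_id` columns are hypothetical. Adapt them to the real schema.

```python
import pandas as pd


def numeric_stat_columns(df: pd.DataFrame, exclude=()) -> list:
    """Return numeric columns, skipping identifier columns listed in `exclude`."""
    return [c for c in df.select_dtypes(include="number").columns if c not in exclude]


def split_all_zero_rows(df: pd.DataFrame, stat_cols: list):
    """Split rows into (kept, removed), where removed rows have every stat equal to 0.

    NaN is not treated as zero, so rows with missing stats stay in `kept`.
    """
    all_zero = df[stat_cols].eq(0).all(axis=1)
    return df[~all_zero].copy(), df[all_zero].copy()


def flag_partial_zeros(df: pd.DataFrame, stat_cols: list) -> pd.DataFrame:
    """Add a per-row zero count and a flag for rows with some, but not all, stats at 0."""
    zeros = df[stat_cols].eq(0)
    out = df.copy()
    out["zero_stat_count"] = zeros.sum(axis=1)
    out["partial_zero"] = zeros.any(axis=1) & ~zeros.all(axis=1)
    return out


def missing_team_rows(df: pd.DataFrame, team_col: str = "team_id") -> pd.DataFrame:
    """Return rows whose team affiliation is missing or blank."""
    team = df[team_col]
    blank = team.isna() | team.astype(str).str.strip().eq("")
    return df[blank]


# Usage, with `stats` loaded from combined_player_stats_20241117_1343.csv:
# cols = numeric_stat_columns(stats, exclude=("player_id",))
# kept, removed = split_all_zero_rows(stats, cols)
# flagged = flag_partial_zeros(kept, cols)
# no_team = missing_team_rows(kept)
```

Keeping `removed` as a separate DataFrame supports the documentation step in item 1. For example, it can be saved with `DataFrame.to_csv`. The `partial_zero` flag only marks rows for review. It does not decide which zeros are genuine.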

```python
# Imports
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
```