# Data Cleaning and Feature Engineering

by Mark JP Sanchez

## Table of Contents


1.   Introduction
2.   Clean Up
3.   Feature Engineering
     - i. Basic Features
     - ii. Elo
     - iii. Offensive Power Rating
4.   Creating the Use Case CSV
5.   Final Thoughts

## 1. Introduction

Now that we have all our data from the API, we need to clean it up and build some features for our use case. To start off, we import our basic packages:

In [3]:
#Set Up
import pandas as pd
import torch

if torch.cuda.is_available():
  print('GPU is available')
else:
  print('GPU is not available')

GPU is available


We are using two main packages for our data clean up and feature engineering. Pandas is the standard Python package for working with DataFrames. We will use it to read and format our data for the use case.

PyTorch (torch) will be used later on for engineering some features. Features such as Offensive Power Rating (explained later) require a lot of matrix operations, and PyTorch comes with many optimized functions for fast matrix manipulation. PyTorch also makes it easy to run those operations on a GPU, which speeds them up even further. The code was run on Google Colab so that I could make use of their free GPUs.

## 2. Clean Up

To start off with our clean up, we need to actually read in the data we collected. Data collection code can be found in **data_collector.py**.

In [4]:
#Get the three dataframes
team_df = pd.read_csv('raw_frc_teams.csv')
match_df = pd.read_csv('raw_frc_matches.csv')
awards_df = pd.read_csv('raw_frc_awards.csv')

Now that we have our data, we can see that we have a lot of raw match data. While it is tempting to use all of the data at our disposal, doing so might be more detrimental than useful. One reason is the ever-changing nature of FIRST Robotics competitions. In the beginning, matches were played 2v2, whereas modern matches are 3v3. Also, some of the features we are going to engineer require us to "simulate" the matches in chronological order, which is not possible for matches that are missing the time component. Matches that do have a time value store it as a Unix timestamp (read as seconds by the code below), which we will convert to a datetime for easier use. These are just a few of the things we need to remove or fix to clean up our data. The code below contains all the changes we made.

In [5]:
#Removal of rows and columns

#Remove unwanted columns
match_df = match_df.drop(columns=['Unnamed: 0', 'actual_time', 'predicted_time'])

#Remove rows that have matches that don't contain 3 teams per alliance
for i in range(3):
    match_df = match_df[match_df['red_' + str(i)].notna()]
    match_df = match_df[match_df['blue_' + str(i)].notna()]
    
#Remove rows that have no time or negative time
match_df = match_df[match_df['time'].notna()]
match_df = match_df.drop(match_df[match_df.time < 0].index)

#Remove rows that have no winners
match_df = match_df[match_df['winning_alliance'].notna()]

#Remove teams that are not found in team df
frc_team_keys = team_df['key'].tolist()
    
for i in range(3):
    match_df = match_df[match_df['red_' + str(i)].isin(frc_team_keys)]
    match_df = match_df[match_df['blue_' + str(i)].isin(frc_team_keys)]

#Change unix time to datetime
match_df['time'] = pd.to_datetime(match_df['time'], unit='s')
#Sort by time
match_df.sort_values(by='time', inplace=True)

#Get earliest years for awards and matches
earliest_award_year, earliest_match_year = 1992, 2016

#Add a year column since every year is a new game
match_df['year'] = match_df['time'].apply(lambda x : x.year)
#Data before earliest_match_year is scarce, so keep only matches from that year onward
match_df = match_df[match_df['year'] >= earliest_match_year]

#Drop all nan
match_df = match_df.dropna()

#Reset the index
match_df = match_df.reset_index(drop=True)
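
To see what these filters do, here is a minimal sketch on a made-up four-row frame. The team keys, timestamps and the `toy_df`/`known_teams` names are invented for illustration, and only one team per alliance is used to keep it short. Only the column names follow the cell above.

```python
import pandas as pd

# Toy data (hypothetical values), one team per alliance for brevity
toy_df = pd.DataFrame({
    'red_0': ['frc1', 'frc2', None, 'frc9'],
    'blue_0': ['frc3', 'frc4', 'frc5', 'frc1'],
    'time': [1457800000, -1, 1457800600, 1457801200],
    'winning_alliance': ['red', 'blue', 'red', 'blue'],
})
known_teams = ['frc1', 'frc2', 'frc3', 'frc4', 'frc5']

# Drop rows with a missing team, time or winner in a single call
cleaned = toy_df.dropna(subset=['red_0', 'blue_0', 'time', 'winning_alliance'])
# Drop rows with a negative timestamp
cleaned = cleaned[cleaned['time'] >= 0]
# Keep only rows where every listed team is a known team
for col in ['red_0', 'blue_0']:
    cleaned = cleaned[cleaned[col].isin(known_teams)]

# Convert Unix seconds to datetime, derive the year, and sort chronologically
cleaned = cleaned.assign(time=pd.to_datetime(cleaned['time'], unit='s'))
cleaned = cleaned.assign(year=cleaned['time'].dt.year)
cleaned = cleaned.sort_values(by='time').reset_index(drop=True)

print(cleaned)
```

Only the first row survives: the second has a negative time, the third is missing a red team, and the fourth contains a team that is not in `known_teams`.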

## 3. Feature Engineering

What is feature engineering and why do we do it?

Ultimately, our goal with this data is to create a model that predicts the winner of a given FIRST Robotics match. To create such models we need to feed them data on which they will base their predictions. The models look at the different variables/features found within that data, so we need to make sure those features make sense and help the models with their predictions. Sometimes it is useful to present data in a different way to help our models.

Be aware that not all features will be used for our models. We will explore the features in more depth in **FRC Match Simulator.ipynb** and see how they relate to each other. We might find that some of our features are redundant or even make our models worse. Here we only aim to come up with potentially useful features that cover all bases.

### i. Basic Features

#### Games Played and Won

We will be counting the average number of games/matches played and won by each alliance, both in total and per season. We get this feature by keeping track of how many games each team has played and won. For each alliance we then average the number of games played across its teams, and we compute the alliance's winrate as its teams' combined wins divided by their combined games played (defaulting to 0.5 when none of the teams has played a game yet). To help our model compare the two alliances, we create variables that hold the difference between the red alliance's value and the blue alliance's value, for both average games played and winrate. It is possible that an old team has a track record of losing many of its matches, so we need a feature that can reflect this.
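
As a rough illustration of the winrate part, the sketch below pools wins and games across an alliance the same way `get_avg_games` does in the code later in this notebook. The function name `alliance_winrate` and the records are made up for this example.

```python
def alliance_winrate(records):
    """records: list of (games_played, games_won) tuples, one per team."""
    total_played = sum(played for played, _ in records)
    total_won = sum(won for _, won in records)
    # No match history at all: fall back to a neutral 0.5
    return 0.5 if total_played == 0 else total_won / total_played

red = [(10, 8), (20, 10), (0, 0)]   # 18 wins in 30 games, one rookie
blue = [(0, 0), (0, 0), (0, 0)]     # three rookies with no games

winrate_diff = alliance_winrate(red) - alliance_winrate(blue)
print(winrate_diff)
```

Here the red alliance has a winrate of 0.6 and the all-rookie blue alliance gets the neutral 0.5, so the difference is roughly 0.1 in red's favour.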

#### Awards

FIRST Robotics hands out many awards during its competitions for a multitude of reasons. Some awards pertain to the matches the teams played, while others pertain to things such as safety and the engineering capabilities of the team. Since there are different types of awards, we split them into two categories: Match Awards, which are given to teams that do well in matches, and Other Awards, which covers everything else. We find the average number of awards per team in each alliance and take the difference between the two averages.

#### Age

This is mostly self-explanatory. We will find the difference between the average age of the red alliance and the blue alliance. Intuitively, older teams are likely to know how to build better robots, and they will probably have more sponsors since they have had more time to find them.

#### Points

As with the features above, we will find the average number of points each alliance scores per season. We will not, however, take the difference between the averages. Since every season is a different game, the average point values differ from game to game. Some games see an average of 150 points while others see an average of 2000. So instead of taking the difference, we divide the red alliance's average points by the blue alliance's average points: $red\_points\_ratio =\frac {red\ points} {blue\ points}$. This way, if the red alliance scores double the points the blue alliance scores, we have a metric that reflects this no matter what the season is. In cases where not all members of an alliance have played at least 1 game that season, we set the ratio to 1 so that the models treat the two alliances as equal when there is very little data on them.
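
A small sketch of why the ratio is season-independent. The function `points_ratio` and the numbers are hypothetical; `None` stands in for a team with no games played this season.

```python
def points_ratio(red_team_avgs, blue_team_avgs):
    """Each argument holds per-team season point averages (None = no games yet)."""
    if any(avg is None for avg in red_team_avgs + blue_team_avgs):
        return 1.0  # too little data: treat the alliances as equal
    red_avg = sum(red_team_avgs) / len(red_team_avgs)
    blue_avg = sum(blue_team_avgs) / len(blue_team_avgs)
    return red_avg / blue_avg

# Same relative strength in a low-scoring and a high-scoring game
low_scoring = points_ratio([60, 50, 40], [30, 25, 20])
high_scoring = points_ratio([600, 500, 400], [300, 250, 200])
# One red team has not played yet this season
with_rookie = points_ratio([60, 50, None], [30, 25, 20])

print(low_scoring, high_scoring, with_rookie)
```

Both seasons give a ratio of 2 even though the raw point differences are 25 and 250. Note that this sketch, like the ratio itself, would fail with a division by zero if the blue alliance had played games but averaged 0 points.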

### ii. Elo

Elo is a rating system that originated in chess and has since been used in many other games. It works by giving every player a number (their Elo value) that represents their rank or skill. Winning against someone with a higher Elo value than you increases your Elo value by a large margin and decreases theirs by a large margin. Winning against someone with a lower Elo value than you increases your Elo value by a small margin and decreases theirs by a small margin. To set up an Elo system, we arbitrarily choose a baseline rating that is given to every new team. Our baseline will be the standard value of 1500. The Elo system also has a formula that predicts the likelihood of a player winning a match, and that is the value we will be using for our models.

We are using the Elo system because it assigns a skill value to each team based on its wins and losses. Good new teams can be recognized by the models once they start winning against teams with high Elo values. The Elo system also gives more context to each win and loss a team receives. A team that got lucky and only competed against low-Elo teams might have the same winrate as a team that competed against many high-Elo teams. The Elo system ensures that the models don't treat these two teams as equal in skill.

Formula for elo value change (*elo_result: 1 if player won and 0 if player lost*): $elo = old\_elo + 10*(elo\_result - elo\_prediction)$ 

Formula for elo prediction (the probability that the red alliance wins): $elo\_prediction = \frac {1} {1 + 10^{(blue\_elo - red\_elo)/400}}$

For more information on the elo ranking system check here: https://en.wikipedia.org/wiki/Elo_rating_system
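
To make the update rule concrete, here is a small worked example. It follows the `get_elo_prediction` function in the code below, where the exponent is the opponent's rating minus the alliance's own rating. The function names and the two ratings are made up for illustration.

```python
K = 10  # step size used in the elo value change formula

def elo_prediction(own_elo, opponent_elo):
    """Predicted probability that the alliance rated own_elo wins."""
    return 1 / (1 + 10 ** ((opponent_elo - own_elo) / 400))

def elo_update(old_elo, prediction, result):
    """result is 1 for a win and 0 for a loss."""
    return old_elo + K * (result - prediction)

red_elo, blue_elo = 1500, 1600
red_pred = elo_prediction(red_elo, blue_elo)    # roughly 0.36
blue_pred = elo_prediction(blue_elo, red_elo)   # roughly 0.64

# Upset: the lower-rated red alliance wins
new_red = elo_update(red_elo, red_pred, 1)      # gains roughly 6.4
new_blue = elo_update(blue_elo, blue_pred, 0)   # loses roughly 6.4

print(red_pred, blue_pred, new_red, new_blue)
```

The two predictions add up to 1, and whatever one side gains the other loses. Had the favoured blue alliance won instead, the change would have been only about 3.6 points in each direction.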

*Note: The code below engineers all of the above features.*

In [6]:
#Format proper data and extract new data 

#Change winning alliance to red win
match_df['red_won'] = 0
match_df.loc[match_df['winning_alliance'] == 'red', 'red_won'] = 1

#Get Average Data
team_data = {}

#Adds a team to team data and initializes its default values
def initialize_team(team_key):
    team_data[team_key] = {}
    
    #Initialize awards
    team_data[team_key]['match_awards'] = 0
    team_data[team_key]['other_awards'] = 0
    
    #Initialize games
    team_data[team_key]['games_played'] = 0
    team_data[team_key]['games_won'] = 0
    
    team_data[team_key]['games_played_season'] = 0
    team_data[team_key]['games_won_season'] = 0
    team_data[team_key]['points_season'] = 0
    
    team_data[team_key]['elo'] = 1500
    
    team_data[team_key]['rookie_year'] = team_df[team_df['key'] == team_key]['rookie_year'].item()
    
#Adds awards to team from a certain event
def add_event_awards(event_key):
    if event_key is None:
        return
    
    event_awards_df = awards_df[awards_df['event_key'] == event_key]
    
    for index, row in event_awards_df.iterrows():
        team_key = row['recipient']
        
        if team_key not in team_data:
            initialize_team(team_key)
            
        if row['award_type'] in match_award_types:
            team_data[team_key]['match_awards'] += 1
        else:
            team_data[team_key]['other_awards'] += 1
        

#Get initial award data
#Splitting the awards into match awards and other awards
#Award types can be found here: https://github.com/the-blue-alliance/the-blue-alliance/blob/master/consts/award_type.py#L15
match_award_types = [0, 1, 2, 10, 14]
    
print('Initializing initial awards...')
for year in range(earliest_award_year, earliest_match_year):
    print('Formatting awards for the year ' + str(year))
    year_award_df = awards_df[awards_df['year'] == year]
    
    for index, row in year_award_df.iterrows():
        team_key = row['recipient']
        
        if team_key not in team_data:
            initialize_team(team_key)
            
        if row['award_type'] in match_award_types:
            team_data[team_key]['match_awards'] += 1
        else:
            team_data[team_key]['other_awards'] += 1
            
#Stores number of matches for giving awards from events
print("Recording number of matches per event")
num_of_matches_per_event = {}
event_groupby = match_df.groupby('event_key')
for event_key, group in event_groupby:
    num_of_matches_per_event[event_key] = len(group)
            
#Adding team data to matches dataframe
print('Adding team data to matches...')
match_df['red_avg_match_awards'] = 0
match_df['red_avg_other_awards'] = 0

match_df['blue_avg_match_awards'] = 0
match_df['blue_avg_other_awards'] = 0

match_df['red_avg_winrate'] = 0
match_df['red_avg_games_played'] = 0

match_df['blue_avg_winrate'] = 0
match_df['blue_avg_games_played'] = 0

match_df['red_avg_age'] = 0
match_df['blue_avg_age'] = 0

match_df['red_points_ratio_season'] = 0
match_df['red_points_avg'] = 0
match_df['red_avg_winrate_season'] = 0
match_df['red_avg_games_played_season'] = 0
match_df['blue_points_avg'] = 0
match_df['blue_avg_winrate_season'] = 0
match_df['blue_avg_games_played_season'] = 0

match_df['red_elo'] = 0
match_df['blue_elo'] = 0
match_df['elo_prediction'] = 0

#Functions for formatting data
def get_avg_awards(alliance):
    total_match_awards = 0.0
    total_other_awards = 0.0
    
    for team_key in alliance:
        total_match_awards += team_data[team_key]['match_awards']
        total_other_awards += team_data[team_key]['other_awards']
        
    return total_match_awards / len(alliance), total_other_awards / len(alliance)

def get_avg_games(alliance, boundary=''):
    total_games_played = 0
    total_wins = 0
    
    for team_key in alliance:
        total_games_played += team_data[team_key]['games_played' + boundary]
        total_wins += team_data[team_key]['games_won' + boundary]
        
    winrate = 0.5 if total_games_played == 0 else total_wins / total_games_played
        
    return total_games_played / len(alliance), winrate

def get_avg_age(alliance, current_year):
    total_age = 0
    
    for team_key in alliance:
        total_age += current_year - team_data[team_key]['rookie_year']
        
    return total_age / len(alliance)

def get_red_points_ratio(red_avg, blue_avg):
    
    if red_avg == -1 or blue_avg == -1:
        return 1
    
    return red_avg / blue_avg

def get_points_avg(alliance, boundary='_season'):
    total_avgs = 0
    
    for team_key in alliance:
        if team_data[team_key]['games_played' + boundary] == 0:
            return -1
        
        total_avgs += (team_data[team_key]['points' + boundary] / team_data[team_key]['games_played' + boundary])
        
    return total_avgs / len(alliance)

def get_elo_average(alliance):
    total_elo = 0
    
    for team_key in alliance:
        total_elo += team_data[team_key]['elo']
        
    return total_elo / len(alliance)

def get_elo_prediction(red_alliance_elo, blue_alliance_elo):
    elo_delta = blue_alliance_elo - red_alliance_elo
    
    return 1 / (1 + (10**(elo_delta / 400)))

def change_elo(alliance, prediction, result):
    for team_key in alliance:
        team_data[team_key]['elo'] += 10 * (result - prediction)

def reset_season():
    
    for team_key in team_data:
        team_data[team_key]['games_played'] += team_data[team_key]['games_played_season']
        team_data[team_key]['games_won'] += team_data[team_key]['games_won_season']
        
        team_data[team_key]['games_played_season'] = 0
        team_data[team_key]['games_won_season'] = 0
        team_data[team_key]['points_season'] = 0

#Set up variables for iteration through the matches
curr_year = None
len_match = len(match_df)
match_df = match_df.reset_index(drop=True)
#Iterate/simulate each match
for index, row in match_df.iterrows():
    
    print('Engineering features for match ' + str(index) + "/" + str(len_match))
    
    if row['year'] != curr_year:
        curr_year = row['year']
        reset_season()
        
    #Get the red and blue alliance
    red_alliance = [row['red_0'], row['red_1'], row['red_2']]
    blue_alliance = [row['blue_0'], row['blue_1'], row['blue_2']]
    
    #Initialize the teams
    for i in range(3):
        if red_alliance[i] not in team_data:
            initialize_team(red_alliance[i])
        if blue_alliance[i] not in team_data:
            initialize_team(blue_alliance[i])
        
    #Add data to the dataframe
    row['red_avg_match_awards'], row['red_avg_other_awards'] = get_avg_awards(red_alliance)
    row['blue_avg_match_awards'], row['blue_avg_other_awards'] = get_avg_awards(blue_alliance)
    
    row['red_avg_games_played'], row['red_avg_winrate'] = get_avg_games(red_alliance)
    row['blue_avg_games_played'], row['blue_avg_winrate'] = get_avg_games(blue_alliance)
    
    row['red_avg_games_played_season'], row['red_avg_winrate_season'] = get_avg_games(red_alliance, boundary='_season')
    row['blue_avg_games_played_season'], row['blue_avg_winrate_season'] = get_avg_games(blue_alliance, boundary='_season')
    
    row['red_points_avg'] = get_points_avg(red_alliance)
    row['blue_points_avg'] = get_points_avg(blue_alliance)
    row['red_points_ratio_season'] = get_red_points_ratio(row['red_points_avg'], row['blue_points_avg'])
    
    row['red_avg_age'] = get_avg_age(red_alliance, row['time'].year)
    row['blue_avg_age'] = get_avg_age(blue_alliance, row['time'].year)
    
    row['red_elo'] = get_elo_average(red_alliance)
    row['blue_elo'] = get_elo_average(blue_alliance)
    
    red_pred = get_elo_prediction(row['red_elo'], row['blue_elo'])
    blue_pred = get_elo_prediction(row['blue_elo'], row['red_elo'])
    
    row['elo_prediction'] = red_pred
    
    #Add match stats to the teams
    if row['red_won'] == 1:
        for team_key in red_alliance:
            team_data[team_key]['games_won_season'] += 1
            
        change_elo(red_alliance, red_pred, 1)
        change_elo(blue_alliance, blue_pred, 0)
    else:
        for team_key in blue_alliance:
            team_data[team_key]['games_won_season'] += 1
            
        change_elo(red_alliance, red_pred, 0)
        change_elo(blue_alliance, blue_pred, 1)
            
    for team_key in (blue_alliance + red_alliance):
        team_data[team_key]['games_played_season'] += 1
       
    for team_key in red_alliance:
        team_data[team_key]['points_season'] += row['red_score']
        
    for team_key in blue_alliance:
        team_data[team_key]['points_season'] += row['blue_score']
        
    #Add new event awards if previous event has passed
    num_of_matches_per_event[row['event_key']] -= 1
    if num_of_matches_per_event[row['event_key']] <= 0:
        add_event_awards(row['event_key'])
        # reset_match(row['event_key'])
    
    match_df.loc[index] = row
    
#Get differences for the features that require differences
match_df['avg_match_awards_diff'] = match_df['red_avg_match_awards'] - match_df['blue_avg_match_awards']
match_df['avg_other_awards_diff'] = match_df['red_avg_other_awards'] - match_df['blue_avg_other_awards']
match_df['avg_winrate_diff'] = match_df['red_avg_winrate'] - match_df['blue_avg_winrate']
match_df['avg_games_played_diff'] = match_df['red_avg_games_played'] - match_df['blue_avg_games_played']
match_df['avg_age_diff'] = match_df['red_avg_age'] - match_df['blue_avg_age']
match_df['avg_winrate_diff_season'] = match_df['red_avg_winrate_season'] - match_df['blue_avg_winrate_season']
match_df['avg_games_played_diff_season'] = match_df['red_avg_games_played_season'] - match_df['blue_avg_games_played_season']

Streaming output truncated to the last 5000 lines.
Engineering features for match 67792/72791
Engineering features for match 67793/72791
Engineering features for match 67794/72791
...

### iii. Offensive Power Rating (OPR)

The Offensive Power Rating (OPR) system has been used by a number of FIRST Robotics teams as a way to gauge a team's skill. Each team is assigned an OPR value which is calculated from their previous matches. The OPR of a team is the number of points they are predicted to contribute in a match, so the sum of all the OPR values in an alliance is the number of points that alliance is predicted to earn in that match. Here is how we find a team's OPR:

Assume that instead of 3 teams per alliance there are 2 teams per alliance. Also assume that each team contributes the same number of points in every match it plays. Let **A**, **B**, and **C** be the number of points teams A, B and C contribute, respectively. Let's say teams A and B play a match together in which they score 40 points. The equation for that match would be $A + B = 40$. A few more matches occur, and these are the equations for all the matches:

$A+B=40$

$B+C=35$

$A+C=20$

As we can see, the matches form a system of linear equations, so we can solve for **A**, **B** and **C**; these are the OPR values for teams A, B and C. Since it is just a system of linear equations, we can easily turn it into an $Ax=B$ matrix equation, which we can use PyTorch to solve.

There are caveats to this system. For one, if we don't have enough matches we cannot get a unique solution, so in that case we cannot find the OPR values. It is also possible to have more matches than unknowns, in which case the system generally has no exact solution and we use the least squares approximation instead.
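
The three-team example can be checked numerically. The sketch below uses NumPy's `np.linalg.lstsq` rather than the PyTorch normal-equation approach used in the notebook's own code, purely because it is compact and runs on CPU. The variable names are made up for this example.

```python
import numpy as np

# Rows are alliances in a match, columns are teams A, B, C
# (1 means the team played in that alliance)
participation = np.array([
    [1, 1, 0],   # A + B = 40
    [0, 1, 1],   # B + C = 35
    [1, 0, 1],   # A + C = 20
], dtype=float)
scores = np.array([40, 35, 20], dtype=float)

# Least squares covers both the exactly determined case and the case of extra matches
oprs, residuals, rank, _ = np.linalg.lstsq(participation, scores, rcond=None)

# A unique set of OPRs exists only if the rank equals the number of teams
has_unique_solution = rank == participation.shape[1]

# With only the first two matches the rank drops below the number of teams
_, _, rank_two_matches, _ = np.linalg.lstsq(participation[:2], scores[:2], rcond=None)

print(oprs, has_unique_solution, rank_two_matches)
```

Solving by hand gives the same values: adding the three equations yields $A+B+C=47.5$, so $A=12.5$, $B=27.5$ and $C=7.5$. With only two matches recorded the rank is 2 for 3 teams, which is the "not enough matches" caveat described above.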

The OPR system gives us another way to look at the points each team may contribute. A bad team might have a lot of matches where their alliance scores a lot of points, but only because they were partnered with good alliance members. The OPR system attempts to downplay their contribution if everyone else on their alliance is known to score a lot of points on average. Also, by the nature of linear systems of equations, it is possible for a team's OPR to be negative, which represents a team that might actually be a detriment to their alliance.

For our model we need a way to compare the two alliances' OPRs. We cannot take the difference since, as stated previously, each season comes with a different game and therefore different average point values. Since the OPR of an alliance is the predicted number of points it will score, the average OPR changes per season. To create a variable that compares the two OPRs, we add 100,000 to each alliance's OPR and then divide the red OPR by the blue OPR. We take a ratio for the same reason we took the ratio of average points. We offset the OPRs by 100,000 because OPRs can be negative, so a straight division between the two might not be meaningful. No season has ever had scores reach 100,000 points, so after the offset both values are positive and we never divide negative numbers.
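
A quick sketch of what the offset does to the ratio. The helper name `opr_ratio` and the two OPR values are made up for this example; the 100,000 offset is the one described above.

```python
OPR_OFFSET = 100_000  # offset described in the text

def opr_ratio(red_opr, blue_opr, offset=OPR_OFFSET):
    """Ratio of red to blue alliance OPR after shifting both to be positive."""
    return (red_opr + offset) / (blue_opr + offset)

# A negative red OPR makes the raw ratio negative, which is hard to interpret
plain = -5 / 10
# After the offset the ratio is just under 1: red is slightly weaker than blue
shifted = opr_ratio(-5, 10)

print(plain, shifted)
```

One consequence of such a large offset is that the ratio always stays very close to 1, which is also visible in the `opr_ratio` column printed at the end of this notebook.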

To learn more about the Offensive Power Rating system look here: https://blog.thebluealliance.com/2017/10/05/the-math-behind-opr-an-introduction/

#### Competition OPR

In [7]:
#Calculate Competition OPR
match_df['red_opr'] = 0
match_df['blue_opr'] = 0
match_df['opr_ratio'] = 1
opr_offset = 100000

# Used to make opr calculation easier
class MatchOPRCounter:
    
    def __init__(self, event_key):
        self.event_key = event_key
        #2D matrix of all the matches
        #The A in Ax=B
        self._match_matrix = []
        #The B in Ax=B
        self._match_score_vector = []
        #Matches team key to position on index
        self._team_positions = {}
        #Current OPRs for teams
        self._team_oprs = {}
        
    def get_opr(self, team_key):
        return self._team_oprs.get(team_key, None)

    def get_alliance_opr(self, alliance):
        opr = 0
        for alliance_member in alliance:
          if alliance_member in self._team_oprs:
            opr += self.get_opr(alliance_member)
          else:
            return None

        return opr
    
    def record_half_match(self, alliance, score):
        for alliance_member in alliance:
            if alliance_member not in self._team_positions:
                self._add_team(alliance_member)
                
        opr_match_info = [0] * len(self._team_positions)
        for alliance_member in alliance:
            opr_match_info[self._team_positions[alliance_member]] = 1
        
        self._match_matrix.append(opr_match_info)
        self._match_score_vector.append(score)
    
    def _add_team(self, team_key):
        self._team_positions[team_key] = len(self._team_positions)
        
        for opr_match_info in self._match_matrix:
            opr_match_info.append(0)
    
    def update_oprs(self):
        match_matrix = torch.tensor(self._match_matrix, device="cuda", dtype=torch.float64)
        score_vector = torch.tensor(self._match_score_vector, device="cuda", dtype=torch.float64)
        
        #For the least squares solution
        normal_matrix = match_matrix.t().mm(match_matrix)
        normal_score = match_matrix.t().mv(score_vector)

        #Checks if it is invertible
        if -0.001 < normal_matrix.det() < 0.001:
            return

        #torch.linalg.solve(A, b) is the current API; Tensor.solve is gone from recent PyTorch releases
        try:
            raw_oprs = torch.linalg.solve(normal_matrix, normal_score)
        except RuntimeError:
            return

        #OPRs are truncated to integers when stored
        for team_key, index in self._team_positions.items():
            self._team_oprs[team_key] = int(raw_oprs[index])

#Simulates matches to calculate opr
match_oprs = {}
for index, row in match_df.iterrows():
    
    print('Calculating OPR for match ' + str(index + 1) + "/" + str(len_match))
    
    event_key = row['event_key']
    
    if event_key not in match_oprs:
        match_oprs[event_key] = MatchOPRCounter(event_key)
        
    opr_counter = match_oprs[event_key]
    
    #Get the red and blue alliance
    red_alliance = [row['red_0'], row['red_1'], row['red_2']]
    blue_alliance = [row['blue_0'], row['blue_1'], row['blue_2']]

    red_opr = opr_counter.get_alliance_opr(red_alliance)
    blue_opr = opr_counter.get_alliance_opr(blue_alliance)

    row['blue_opr'] = blue_opr
    row['red_opr'] = red_opr

    if blue_opr is not None and red_opr is not None:
      row['opr_ratio'] = (red_opr + opr_offset) / (blue_opr + opr_offset)
    
    opr_counter.record_half_match(red_alliance, row['red_score'])
    opr_counter.record_half_match(blue_alliance, row['blue_score'])
    
    opr_counter.update_oprs()

    match_df.loc[index] = row

Streaming output truncated to the last 5000 lines.
Calculating OPR for match 67793/72791
Calculating OPR for match 67794/72791
Calculating OPR for match 67795/72791
...

## 4. Creating the Use Case CSV

Now that we have engineered all of our features, it is time to save them to a CSV so that we can use them in **FRC Match Simulator.ipynb**. There we will look more closely at the features we just engineered and construct our models.

In [8]:
#Get the use case dataframe and create a csv
use_case_df = match_df
use_case_df.to_csv('frc_use_case.csv')

## 5. Final Thoughts

If you have made it this far, thank you for taking the time to read the whole thing. Due to the nature of FIRST Robotics we had to make a lot of assumptions with our features as well as construct them in interesting ways. If you want to know more about FIRST Robotics, why a model can be useful for teams, and the results of our feature engineering, I would encourage you to check out the **FRC Match Simulator.ipynb** file, which should be in the same repo as this one.

In [11]:
print(match_df)

       red_score    red_0    red_1  ... red_opr  blue_opr opr_ratio
0             66   frc157   frc166  ...     NaN       NaN   1.00000
1             50  frc1519    frc58  ...     NaN       NaN   1.00000
2             62    frc78   frc319  ...     NaN       NaN   1.00000
3             66   frc811   frc509  ...     NaN       NaN   1.00000
4             24   frc157  frc1735  ...     NaN       NaN   1.00000
...          ...      ...      ...  ...     ...       ...       ...
72786         88  frc6025  frc6417  ...   122.0     133.0   0.99989
72787        151  frc6989  frc7296  ...   136.0     133.0   1.00003
72788         42  frc6989  frc7296  ...   136.0     131.0   1.00005
72789        143  frc6989  frc7296  ...   163.0     131.0   1.00032
72790         45  frc6989  frc7296  ...   158.0     129.0   1.00029

[72791 rows x 47 columns]
