# Overview
This notebook contains my exploration of the dyno run data.

## Notes
For future analyses that involve a wide variety of 'objects' and modifications with a continuous target:
1. If possible, reduce the variety of 'objects' for feature engineering and statistical purposes
2. If the variety cannot be reduced, group instead on similar values of the continuous target
    * Choose the peak of the distribution if it is normal; if it is not normally distributed, choose the highest point in the distribution
3. Create features based on the reduced-variety data
4. Determine the new feature values for all observations
5. Categorize 'objects' into cluster features using keywords
    * EX: A Nissan GT-R often has more than 500 horsepower, so make a feature called over_500 that encodes Nissan GT-R as True
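
The keyword idea in note 5 can be sketched as follows. This is a minimal illustration, not the notebook's actual feature code: the rows are toy data, and the keyword list `over_500_keywords` is a hypothetical name introduced here.

```python
import re

import pandas as pd

# Toy rows; only the column name car_model mirrors the notebook.
cars = pd.DataFrame({
    "car_model": [
        "2009 GT-R",
        "2005 Impreza WRX STI",
        "2008 EVO X GSR",
        "2012 GT-R",
    ],
})

# Hypothetical keyword list: models assumed to usually exceed 500 hp.
over_500_keywords = ["GT-R"]

# Escape each keyword and join them into one regex alternation,
# then flag every row whose model name matches any keyword.
pattern = "|".join(re.escape(k) for k in over_500_keywords)
cars["over_500"] = cars["car_model"].str.contains(pattern, case=False, regex=True)

print(cars)
```

Adding another model to the cluster only requires appending a keyword to the list, which keeps the feature definition in one place.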


# Objectives
From a practical perspective, this project should do two things:
1. Determine which factors increase max horsepower for any given car
2. Predict expected horsepower accurately given some information about the car

This exploration addresses objective 1; the plan is laid out below.

## 1. Determine which factors increase horsepower the most
To determine which parts packages and setups increase horsepower overall, we need to do the following:
1. Limit cars to one make and model with the same stock horsepower rating
2. Create a new version of the car_info df for that single make+model+horsepower combination
3. Get the stock performance of that combination
4. Create features for parts, fuel, and more based on 'specs' values
    * Simplify dyno run data to max horsepower, max torque, and max boost
    * Append each run's max horsepower, torque, and boost to the car_info dataframe
    * Append stock max horsepower, torque, and boost to the car_info dataframe
5. Use a correlation heatmap to visually identify drivers
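
Steps 4 and 5 might look roughly like the sketch below. It runs on toy data and rests on assumptions: only `run` and `specs` are column names known from the notebook, while `hp`, `torque`, `boost`, and the `e85` flag are hypothetical names chosen for illustration.

```python
import pandas as pd

# Toy stand-ins for the info and runs frames.
# Assumed columns: hp, torque, boost (the real runs columns may differ).
info = pd.DataFrame({
    "run": [1, 2, 3],
    "specs": ["Stock Turbo, 93 Octane", "20G, E85", "COBB DP, 91 Octane"],
})
runs = pd.DataFrame({
    "run":    [1, 1, 2, 2, 3, 3],
    "hp":     [250.0, 290.0, 380.0, 410.0, 270.0, 305.0],
    "torque": [260.0, 300.0, 350.0, 390.0, 280.0, 310.0],
    "boost":  [14.0, 17.0, 20.0, 24.0, 15.0, 18.0],
})

# Collapse every dyno run to its peak horsepower, torque, and boost.
peaks = (
    runs.groupby("run")[["hp", "torque", "boost"]]
    .max()
    .add_prefix("max_")
    .reset_index()
)

# Append the per-run peaks to the per-car table.
car_info = info.merge(peaks, on="run", how="left")

# Hypothetical keyword feature derived from the free-text specs.
car_info["e85"] = car_info["specs"].str.contains("E85", case=False)

# Pairwise correlations between the peaks and the engineered feature.
corr = car_info.drop(columns=["run", "specs"]).astype(float).corr()
print(corr["max_hp"].sort_values(ascending=False))
```

Passing `corr` to a plotting call such as seaborn's `heatmap` would then give the visual version described in step 5.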

# Imports

In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd

import wrangle

In [2]:
info, runs = wrangle.prep_explore()
info.shape, runs.shape

((2244, 5), (920014, 5))

In [3]:
# check for most numerous make+model
info.car_model.value_counts().head(25)

2005 Impreza WRX STI    126
2011 Impreza WRX STI    113
2008 Impreza WRX STI    104
2006 Impreza WRX STI    103
2002 Impreza WRX         87
2009 GT-R                84
2008 EVO X GSR           79
2009 Impreza WRX         77
2007 Impreza WRX STI     76
2004 Impreza WRX         72
2004 Impreza WRX STI     67
2011 Impreza WRX         66
2006 Impreza WRX         51
2010 Impreza WRX STI     44
2005 Impreza WRX         40
2003 Impreza WRX         40
2010 Impreza WRX         36
2007 Impreza WRX         32
2013 Impreza WRX STI     29
2012 Impreza WRX STI     28
2013 Impreza WRX         27
2008 Impreza WRX         27
2011 EVO X GSR           26
2010 EVO X GSR           26
2008 EVO X MR            26
Name: car_model, dtype: int64

Based on these counts, we're going with the **WRX STI** as our single make+model; it makes 300 to 305 horsepower stock. We will use **305 BHP** as our baseline.
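
Because the counts above are split by model year, one way to compare models directly is to strip the leading year before counting. The sketch below is an assumption about how that could be done, not a step the notebook performs; it uses a five-row subset of the counts shown above.

```python
import pandas as pd

# Subset of the value counts printed above (model-year strings and their counts).
counts = pd.Series({
    "2005 Impreza WRX STI": 126,
    "2011 Impreza WRX STI": 113,
    "2002 Impreza WRX": 87,
    "2009 GT-R": 84,
    "2008 EVO X GSR": 79,
})

# Drop the leading four-digit year so all model years share one label.
model_only = counts.index.str.replace(r"^\d{4}\s+", "", regex=True)

# Sum the per-year counts for each model and rank them.
by_model = counts.groupby(model_only).sum().sort_values(ascending=False)
print(by_model)
```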

In [4]:
# new dataframe for the WRX STI
df = info[info.car_model.str.contains('STI')]
df

Unnamed: 0,run,name,specs,car_make,car_model
1583,2002,Geoffrey Rollwitz,"Stage2 93 octane - AEM CAI, Invidia Catless TB...",Subaru,2008 Impreza WRX STI
118,186,Trevor Ott,"20G, Full Race Header, 92 Octane",Subaru,2008 Impreza WRX STI
1744,2198,Brian Baker,20G-XT 20psi 93 Octane Sport#,Subaru,2011 Impreza WRX STI
2922,3667,Andrew Pearson,91OCT 18.5 psi Godspeed catless DP Invidia G20...,Subaru,2004 Impreza WRX STI
3873,4797,Sous Vorana,"COBB Tuning Accessport, 3"" Turboback Exhaust, ...",Subaru,2013 Impreza WRX STI
...,...,...,...,...,...
548,764,David Olson,"COBB Tuning AccessPORT, Turboback Exhaust, COB...",Subaru,2005 Impreza WRX STI
1029,1334,Taison Kane,"COBB Tuning AccessPORT, DOM 3.0 XTR, AVO TMIC,...",Subaru,2008 Impreza WRX STI
1408,1810,Brant Proden,"COBB Tuning AccessPORT, COBB Tuning SF Intake/...",Subaru,2011 Impreza WRX STI
1179,1537,Stephen Duering,"Pump Gas - 17psi, Stock Turbo, COBB DP, Stock ...",Subaru,2010 Impreza WRX STI
