# Analyzing Motorcycle Power Trends

- Data set selection
- Executive summary
- Loading and file IO
- Preparation and feature engineering
- Analysis and visualization
- Conclusions
- Appendix and References

## Data set selection

Source:
> This project uses the “Motorcycle Specifications Dataset” from Kaggle
> (https://www.kaggle.com/datasets/emmanuelfwerr/motorcycle-technical-specifications-19702022)


Fields:
> Each record is identified by brand, model, year, and category. The specification fields include:
 - Rating
 - Displacement (ccm)
 - Power (hp)
 - Torque (Nm)
 - Engine cylinder
 - Engine stroke
 - Gearbox
 - Bore (mm)
 - Stroke (mm)
 - Fuel capacity (lts)
 - Fuel system
 - Fuel control
 - Cooling system
 - Transmission type
 - Dry weight (kg)
 - Wheelbase (mm)
 - Seat height (mm)
 - Front brakes
 - Rear brakes
 - Front tire
 - Rear tire
 - Front suspension
 - Rear suspension
 - Color options


License:
> This dataset has a CC0: Public Domain license.
 - "The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission."

### Data set selection rationale

> This dataset covers a wide range of motorcycle specifications, including many engine parameters.
> Because I am interested in motorcycles, it lets me apply hobby-grade knowledge to a real-world analysis.
> Its fields support many questions about how engine, chassis, and tire specifications relate.
> The dataset is large (38,472 rows and 28 columns), which gives performance comparisons more data to draw on.
> Working with a familiar subject makes a difficult skill more enjoyable to practice.


### Questions to be answered

> Using statistical analysis and visualization, we can answer the following:

- Does engine torque scale directly with engine displacement?
 - Value: Clarifies how these two major engine specs are connected.

- Do motorcycles with more cylinders produce more horsepower on average?
 - Value: Shows whether engine configuration impacts power delivery.

- Do motorcycles with higher power output tend to have wider rear tires?
 - Value: Useful for identifying how manufacturers match tire size to performance needs.

- Which engine configurations are the most common in the dataset?
 - Value: Shows which engine types are most prevalent.

- Is there a correlation between motorcycle weight and engine size?
 - Value: Reveals whether heavier motorcycles typically have larger engines.


### Visualization ideas
> To answer the questions above, I plan to create the following visualizations:

- Scatterplot: Displacement (cc) vs Torque (Nm)
 - Insight: Shows whether engine displacement influences torque and how strong that relationship is.

- Bar Chart: Average Power (hp) grouped by Engine Cylinder count
 - Insight: Reveals how different cylinder configurations (1-cylinder, 2-cylinder, 4-cylinder, etc.) impact horsepower.

- Scatterplot: Power (hp) vs Rear Tire Width
 - Insight: Shows whether motorcycles with higher power output tend to use wider rear tires, which relates to traction and performance.

- Histogram: Distribution of Power (hp)
 - Insight: Helps identify how horsepower is spread across the dataset and where most models fall.

- Scatterplot: Dry weight (kg) vs Displacement (cc)
 - Insight: Shows whether heavier motorcycles tend to have larger engines.

- Boxplot: Power-to-Weight Ratio (hp per kg) grouped by Cylinder Count
 - Insight: Highlights differences in performance efficiency across engine types.

- Plotly Interactive Chart: Displacement vs Torque with hover details (cylinders, weight, model)
 - Insight: Allows stakeholders to explore individual motorcycle models interactively.
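
Two of the planned charts rely on values that are not raw columns: rear tire width and power-to-weight ratio. The sketch below shows one way these could be derived. It assumes the columns are named `Rear tire`, `Power (hp)`, and `Dry weight (kg)`, and that tire strings start with the section width in millimetres followed by a slash (for example `180/55-ZR17`). Neither assumption has been verified against the file, and the helper and output column names are hypothetical.

```python
import re

import pandas as pd

# Assumed tire format: section width in mm before the slash, e.g. "180/55-ZR17".
_TIRE_WIDTH = re.compile(r"^\s*(\d{2,3})\s*/")


def rear_tire_width_mm(spec) -> float:
    """Extract the leading width from a tire string; return NaN if it does not match."""
    if not isinstance(spec, str):
        return float("nan")
    match = _TIRE_WIDTH.match(spec)
    return float(match.group(1)) if match else float("nan")


def add_plot_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with hypothetical derived columns used by the planned charts."""
    out = frame.copy()
    out["rear_tire_width_mm"] = out["Rear tire"].map(rear_tire_width_mm)
    # Treat non-positive weights as missing so the ratio never divides by zero.
    weight = out["Dry weight (kg)"].where(out["Dry weight (kg)"] > 0)
    out["power_to_weight"] = out["Power (hp)"] / weight
    return out


# Toy rows for illustration only; these are not values from the dataset.
toy = pd.DataFrame(
    {
        "Power (hp)": [15.0, 120.0, 200.0],
        "Dry weight (kg)": [120.0, 190.0, 0.0],
        "Rear tire": ["130/70-17", "180/55-ZR17", None],
    }
)

features = add_plot_features(toy)
print(features[["rear_tire_width_mm", "power_to_weight"]])
```

The derived columns can then be passed to the plotting calls, for example as the `y` argument of `sns.boxplot` with the cylinder column as `x`.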

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the dataset into a pandas DataFrame
df = pd.read_csv('data/motorcycledata.csv')
df.head()
df.shape
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38472 entries, 0 to 38471
Data columns (total 28 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Brand                38472 non-null  object 
 1   Model                38444 non-null  object 
 2   Year                 38472 non-null  int64  
 3   Category             38472 non-null  object 
 4   Rating               21788 non-null  float64
 5   Displacement (ccm)   37461 non-null  float64
 6   Power (hp)           26110 non-null  float64
 7   Torque (Nm)          16634 non-null  float64
 8   Engine cylinder      38461 non-null  object 
 9   Engine stroke        38461 non-null  object 
 10  Gearbox              32675 non-null  object 
 11  Bore (mm)            28689 non-null  float64
 12  Stroke (mm)          28689 non-null  object 
 13  Fuel capacity (lts)  31704 non-null  float64
 14  Fuel system          27844 non-null  object 
 15  Fuel control         22008 non-null 
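
The summary above reports `Stroke (mm)` as `object` even though `Bore (mm)` is `float64`, and several columns have far fewer non-null values than the 38,472 rows. A minimal sketch of how this could be checked before analysis, assuming the non-numeric entries are stray strings (the toy values and the helper name `missing_share` are hypothetical):

```python
import pandas as pd


def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return frame.isna().mean().sort_values(ascending=False)


# Toy rows for illustration only; these are not values from the dataset.
toy = pd.DataFrame(
    {
        "Bore (mm)": [52.4, 67.0, None, 81.0],
        "Stroke (mm)": ["57.8", "42.5", "n/a", None],
    }
)

# errors="coerce" turns unparseable entries into NaN instead of raising.
toy["Stroke (mm)"] = pd.to_numeric(toy["Stroke (mm)"], errors="coerce")

print(toy.dtypes)
print(missing_share(toy))
```

Coercion hides bad entries as missing values, so it is worth comparing the missing share before and after to see how many rows were affected.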

