Performing a multiple linear regression analysis to identify variables in a dataset that predict the mpg of prototype cars

BaileeRice/MechaCar_Statistical_Analysis

MechaCar_Statistical_Analysis

TASKS:

Perform multiple linear regression analysis to identify which variables in the dataset predict the mpg of MechaCar prototypes

Collect summary statistics on the pounds per square inch (PSI) of the suspension coils from the manufacturing lots

Run t-tests to determine if the manufacturing lots are statistically different from the population mean

Design a statistical study to compare vehicle performance of the MechaCar vehicles against vehicles from other manufacturers. For each statistical analysis, you’ll write a summary interpretation of the findings.


Linear Regression to Predict MPG

image

  1. Which variables/coefficients provided a non-random amount of variance to the mpg values in the dataset?

Based on their p-values, the length of the vehicle and its ground clearance provide a non-random amount of variance to the mpg values.

  • ground clearance: p-value ≈ 0 (< 0.05)
  • vehicle length: p-value ≈ 0 (< 0.05)
  2. Is the slope of the linear model considered to be zero?

image

This model has a very low p-value compared to the typical significance level of 0.05 (5%), so the null hypothesis that the slope is zero can be rejected. The slope of the linear model is therefore not considered to be zero.

  3. Does this linear model predict mpg of MechaCar prototypes effectively?

image

Yes and no. The model has an R-squared of 0.7149, meaning it accounts for about 71% of the variation in mpg, but that still leaves a lot of room for improvement. Whether that counts as effective depends on how accurate the predictions need to be.
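To make the 71% figure concrete, here is a minimal Python sketch of how an R-squared value is computed from observed values and model predictions. The function name `r_squared` and the mpg numbers are hypothetical, chosen only for illustration; they are not taken from the MechaCar dataset.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Variation the model fails to explain.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # Total variation around the mean of the observed values.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Hypothetical mpg values and model predictions, for illustration only.
observed_mpg = [30.0, 42.0, 45.0, 51.0, 60.0]
predicted_mpg = [33.0, 40.0, 47.0, 49.0, 59.0]
print(round(r_squared(observed_mpg, predicted_mpg), 4))
```

A value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than always predicting the mean.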


Summary Statistics on Suspension Coils

All lots:

image

Individual lots:

image

The design specifications for the MechaCar suspension coils dictate that the variance of the suspension coils must not exceed 100 pounds per square inch. Does the current manufacturing data meet this design specification for all manufacturing lots in total and each lot individually?

The variance across all lots combined is 62, well within the limit of 100. However, looking at the individual lots shows that Lot 3 has a variance of 170, which exceeds the limit. So while the combined summary statistics meet the design specification, examining the lots individually shows that not all of them do.
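The comparison above, combined variance against per-lot variance, can be sketched in Python as follows. This is only an illustration: the lot names, the PSI readings, and the helper `variance_report` are hypothetical, and only the limit of 100 comes from the design specification quoted above.

```python
from statistics import variance

SPEC_MAX_VARIANCE = 100  # design limit quoted in the specification


def variance_report(readings_by_lot):
    """Return {name: (sample variance, meets_spec)} for all lots combined and each lot."""
    combined = [psi for readings in readings_by_lot.values() for psi in readings]
    groups = {"all": combined, **readings_by_lot}
    report = {}
    for name, readings in groups.items():
        var = variance(readings)  # sample variance (n - 1 denominator)
        report[name] = (var, var <= SPEC_MAX_VARIANCE)
    return report


# Hypothetical PSI readings, for illustration only.
readings = {
    "lot1": [1499.0, 1500.0, 1501.0, 1500.0],
    "lot2": [1497.0, 1503.0, 1500.0, 1500.0],
    "lot3": [1480.0, 1520.0, 1490.0, 1510.0],
}
report = variance_report(readings)
for name, (var, ok) in report.items():
    print(f"{name}: variance={var:.1f} meets_spec={ok}")
```

With these made-up readings the combined data passes while one lot fails, which is the same pattern described above: a pooled summary can hide a single out-of-spec lot.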


T-Tests on Suspension Coils

image

Based on the p-value, the lots taken as a whole are not statistically different from the population mean, so the null hypothesis cannot be rejected. (.60 > .05)

image

Similarly, Lot 1 is not statistically different from the population mean, with a p-value of 1. (1 > .05)

image

Lot 2 is the same, with little difference in distribution: its p-value is .60 against the .05 significance level we're comparing to. (.60 > .05)

image

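Lot 3 has a p-value lower than our .05 significance level, so one can conclude that its mean is statistically different from the population mean. (.04 < .05) Note that a sample mean always lies inside its own 95 percent confidence interval; for a two-sided test, a p-value below .05 means that interval does not contain the population mean.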
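For readers who want to see what sits behind these p-values, here is a minimal Python sketch of a two-sided one-sample t-test using only the standard library. The p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is an approach chosen for this sketch rather than the method used to produce the results above. The function names, the sample readings, and the population mean of 1500 are all hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the density from 0 to |t|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central_area)


def one_sample_t_test(sample, population_mean):
    """Return (t statistic, two-sided p-value) for H0: mean == population_mean."""
    n = len(sample)
    std_err = stdev(sample) / math.sqrt(n)
    t_stat = (mean(sample) - population_mean) / std_err
    return t_stat, two_sided_p(t_stat, n - 1)


# Hypothetical PSI readings and population mean, for illustration only.
sample_psi = [1498.0, 1502.0, 1500.0, 1499.0, 1501.0]
t_stat, p_value = one_sample_t_test(sample_psi, 1500.0)
print(t_stat, p_value)
```

A p-value above .05 means the null hypothesis (the lot mean equals the population mean) is not rejected, and a p-value below .05 means it is rejected.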


Study Design: MechaCar vs Competition

Write a short description of a statistical study that can quantify how the MechaCar performs against the competition. In your study design, think critically about what metrics would be of interest to a consumer:

Metrics I'd consider would be horsepower and safety ratings. A null hypothesis could be that there is no difference in mean safety rating, relative to horsepower, between MechaCar and its competitors; the alternative would be that the competitors' lineup rates a star higher.

A multiple linear regression can be done to show the relation between horsepower and safety ratings across the companies' product lines and to test whether they correlate. A minimum of 30 sample records from each company, containing info on cost, horsepower, and safety ratings, would be needed to run an analysis.
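As a rough illustration of the regression step in this study design, the Python sketch below fits an ordinary least squares model by solving the normal equations with Gauss-Jordan elimination. Everything here is an assumption made for the example: the function name `fit_linear_regression`, the choice of horsepower and cost as predictors of safety rating, and the synthetic data, which is generated from known coefficients so the fit can be checked. A real study would use at least the 30 records per company mentioned above and a statistics package that also reports p-values.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor lists; an intercept column is added here.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic (horsepower, cost in $1000s) rows, for illustration only.
rows = [[150.0, 20.0], [200.0, 25.0], [250.0, 40.0], [300.0, 35.0], [180.0, 30.0]]
# Safety ratings generated from known coefficients so the fit can be verified.
targets = [5.0 - 0.01 * hp + 0.05 * cost for hp, cost in rows]
intercept, hp_coef, cost_coef = fit_linear_regression(rows, targets)
print(intercept, hp_coef, cost_coef)
```

Because the synthetic ratings contain no noise, the fitted coefficients should come back close to the values used to generate them; with real data, the size and significance of each coefficient would indicate how strongly each metric relates to the safety rating.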
