---
title: "Pony Car Showdown: Mustang vs. Camaro Sales"
author: "Thomas J. Gette"
date: "2023-02-20"
output: html_document
---

## Introduction
The battle between the Mustang and Camaro is an American pastime. The Ford Mustang galloped onto the scene in the famous year of 1964 1/2. This new "pony car" was a wild success, and so Chevrolet introduced the Camaro in 1967 as a way to compete in the marketplace. 

## Purpose
This is a hobbyist's journey into the historical sales data. The study initially focuses on consolidated yearly sales figures for the United States (all body styles combined). As more data is obtained, it will expand to include breakdowns by body type, engine type, editions, etc. This notebook also serves to make the data available in a form useful for analysis. It is also an homage to a dear friend and fellow 'Stang enthusiast.

## Data Mining and Cleaning Process
[Click here for full notes](#)

Data for this project was difficult to come by. Multiple websites list sales data, but their figures show discrepancies. I have put in a request with Ford for more reliable data; in the meantime I defaulted to Wikipedia for the initial data because it is freely available and cites the sources of its figures, unlike the other websites, which cite none.

[Mustang Wikipedia](https://en.wikipedia.org/wiki/Ford_Mustang#Sales) | [Camaro Wikipedia](https://en.wikipedia.org/wiki/Chevrolet_Camaro#Sales)

Other sites: [Car and Driver](https://www.caranddriver.com/news/a15352949/warning-graphic-content-50-years-of-camaro-vs-mustang-sales-numbers-in-living-color/), [Mustang Specs](https://www.mustangspecs.com/mustang-sales-numbers-by-year/), [Car Figures](https://carfigures.com/us-market-brand/ford/mustang), [Carsalesbase](https://carsalesbase.com/us-ford-mustang/), [CJ Pony Parts](https://www.cjponyparts.com/resources/mustang-sales-throughout-years)

## Link to Visual Analysis Presented in Google Slides

## Link to Full Notes for Analysis Process


**Here begins the R code for handling the data:**

## Load Libraries, Set Directory

In [1]:
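library(tidyverse)  # for wrangling data; attaches ggplot2, dplyr, readr, and more
library(lubridate)  # to work with datetime values
library(ggplot2)  # for data viz (already attached by tidyverse)
library(dplyr)  # for data manipulation (already attached by tidyverse)
# Note: write_csv(), which is faster than write.csv(), comes from readr
setwd("/kaggle/input/mustang-camaro-sales")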

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Loading required package: timechange

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union




## Load Datasets from CSV
Data from first production year to 2022

In [2]:
mustang_sales <- read_csv("mustang_sales.csv")
camaro_sales <- read_csv("camaro_sales.csv")

Rows: 59 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): make, model
dbl (3): year, sales, generation

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 56 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): make, model
dbl (3): year, sales, generation

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


## Combine All Datasets into One Long Dataset

In [3]:
all_sales <- bind_rows(mustang_sales, camaro_sales)
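
As a small illustration of what this step does, here is a minimal sketch on two toy data frames that stand in for the CSV files. The frame names (`mustang_toy`, `camaro_toy`, `toy_sales`) are made up for this example, and the Camaro sales values are placeholders rather than real figures. It shows that `bind_rows()` stacks the rows and lines columns up by name.

```r
library(dplyr)

# Toy stand-ins for the two CSV files (hypothetical names).
mustang_toy <- data.frame(
  year = c(1964, 1965),
  sales = c(121538, 559451),
  generation = c(1, 1),
  make = "Ford",
  model = "Mustang"
)
camaro_toy <- data.frame(
  year = c(1967, 1968),
  sales = c(100, 200),  # placeholder values, NOT real sales figures
  generation = c(1, 1),
  make = "Chevrolet",
  model = "Camaro"
)

# bind_rows() stacks the frames and matches columns by name, not position
toy_sales <- bind_rows(mustang_toy, camaro_toy)

nrow(toy_sales)         # row counts of the two inputs add up
table(toy_sales$model)  # one group per model
```

Because the two files share the same five columns, the combined table keeps those columns and simply gets longer, which is consistent with the 59 + 56 = 115 rows reported in the inspection output.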

## Inspect New Table

In [4]:
colnames(all_sales)  # List of column names
nrow(all_sales)  # How many rows are in the data frame?
dim(all_sales)  # Dimensions of the data frame
head(all_sales)  # See the first 6 rows of the data frame. Also tail(all_sales)
str(all_sales)  # See list of columns and data types (numeric, character, etc.)
summary(all_sales)  # Statistical summary of the data

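| year | sales | generation | make | model |
|------|-------|------------|------|-------|
| `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` |
| 1964 | 121538 | 1 | Ford | Mustang |
| 1965 | 559451 | 1 | Ford | Mustang |
| 1966 | 607568 | 1 | Ford | Mustang |
| 1967 | 472121 | 1 | Ford | Mustang |
| 1968 | 317404 | 1 | Ford | Mustang |
| 1969 | 299824 | 1 | Ford | Mustang |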


spc_tbl_ [115 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ year      : num [1:115] 1964 1965 1966 1967 1968 ...
 $ sales     : num [1:115] 121538 559451 607568 472121 317404 ...
 $ generation: num [1:115] 1 1 1 1 1 1 1 1 1 1 ...
 $ make      : chr [1:115] "Ford" "Ford" "Ford" "Ford" ...
 $ model     : chr [1:115] "Mustang" "Mustang" "Mustang" "Mustang" ...
 - attr(*, "spec")=
  .. cols(
  ..   year = col_double(),
  ..   sales = col_double(),
  ..   generation = col_double(),
  ..   make = col_character(),
  ..   model = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 


      year          sales          generation        make          
 Min.   :1964   Min.   :     0   Min.   :1.000   Length:115        
 1st Qu.:1980   1st Qu.: 71464   1st Qu.:2.000   Class :character  
 Median :1994   Median :122349   Median :3.000   Mode  :character  
 Mean   :1994   Mean   :134620   Mean   :3.509                     
 3rd Qu.:2008   3rd Qu.:169485   3rd Qu.:5.000                     
 Max.   :2022   Max.   :607568   Max.   :6.000                     
                                 NA's   :7                         
    model          
 Length:115        
 Class :character  
 Mode  :character  
                   
                   
                   
                   

## Check Number of Observations per Model

In [5]:
table(all_sales$model)


 Camaro Mustang 
     56      59 
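
The `summary()` output earlier reported 7 `NA` values in `generation` and a minimum `sales` of 0, so it may be worth checking which model those rows belong to before analysis. Below is a minimal base R sketch on a toy data frame. The values and the names `toy_sales`, `na_counts`, and `flagged` are hypothetical, chosen only to illustrate the check.

```r
# Toy data frame with the same kinds of columns as all_sales.
# All values here are hypothetical and for illustration only.
toy_sales <- data.frame(
  year = c(2001, 2002, 2003, 2004),
  sales = c(10, 0, 30, 40),
  generation = c(1, 1, NA, NA),
  model = c("Mustang", "Mustang", "Camaro", "Camaro")
)

# Count missing values in each column
na_counts <- colSums(is.na(toy_sales))
na_counts

# Pull out rows with a missing generation or zero sales;
# which() drops NA comparisons so no all-NA rows sneak in
flagged <- toy_sales[which(is.na(toy_sales$generation) | toy_sales$sales == 0), ]

# See how the flagged rows are split between the models
table(flagged$model)
```

Running the same three steps on `all_sales` should show whether the gaps sit with one model or both.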

## Inspect Column Structure

In [6]:
str(all_sales)

spc_tbl_ [115 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ year      : num [1:115] 1964 1965 1966 1967 1968 ...
 $ sales     : num [1:115] 121538 559451 607568 472121 317404 ...
 $ generation: num [1:115] 1 1 1 1 1 1 1 1 1 1 ...
 $ make      : chr [1:115] "Ford" "Ford" "Ford" "Ford" ...
 $ model     : chr [1:115] "Mustang" "Mustang" "Mustang" "Mustang" ...
 - attr(*, "spec")=
  .. cols(
  ..   year = col_double(),
  ..   sales = col_double(),
  ..   generation = col_double(),
  ..   make = col_character(),
  ..   model = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 


## Write Dataset to CSV (disabled)

In [7]:
# write_csv(all_sales, "mustang_camaro_sales.csv")
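
If the write step is re-enabled, one thing to watch is the output location: the working directory was set to a folder under `/kaggle/input`, which, as far as I know, is read-only on Kaggle, so a relative path would likely fail there. The sketch below is only an assumption-laden illustration. It writes a small toy frame to a temporary directory and reads it back to confirm the round trip. The names `toy_sales`, `out_path`, and `round_trip` are hypothetical.

```r
library(readr)

# Small toy frame; the two rows are the first Mustang rows shown earlier
toy_sales <- data.frame(
  year = c(1964, 1965),
  sales = c(121538, 559451),
  model = "Mustang"
)

# Write somewhere known to be writable (tempdir() is an assumption for this
# sketch; on Kaggle a writable output folder would be used instead)
out_path <- file.path(tempdir(), "mustang_camaro_sales.csv")
write_csv(toy_sales, out_path)

# Read the file back and confirm nothing was lost
round_trip <- read_csv(out_path, show_col_types = FALSE)
nrow(round_trip) == nrow(toy_sales)
```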