# Exploratory Data Analysis 

In this notebook we will analyse the data available to find evidence to support or refute the claim that the Covid19 pandemic affected production in the automotive industry.

## Data sets used

- [OICA - International Organization of Motor Vehicle Manufacturers](https://www.oica.net/production-statistics/)
  - Initially we looked at the OICA data; however, it became clear that it would not be detailed enough, since only yearly summaries of production are available.


In [38]:
# Imports 
import os
import numpy as np
import pandas as pd
import scipy.stats as st
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [77]:
# Loading data sets 

# Root data dir 
data_dir = os.path.join(os.pardir, 'data')

# Automotive Data
automotive_dir = os.path.join(data_dir, 'automotive')

# OICA Continents data 
oica_continents_pdf = pd.read_csv(os.path.join(automotive_dir, 'oica_continents_17_to_21.csv'))

# OICA Countries data 
oica_countries_pdf = pd.read_csv(os.path.join(automotive_dir, 'oica_countries_17_to_21.csv'))

# FRED DAUPSA
daupsa_pdf = pd.read_csv(os.path.join(automotive_dir, 'FRED_DAUPSA.csv'))

# Covid data 
covid_dir = os.path.join(data_dir, 'covid')
covid_year_pdf = pd.read_csv(os.path.join(covid_dir, 'covid_year_agg.csv'))
covid_month_pdf = pd.read_csv(os.path.join(covid_dir, 'covid_month_agg.csv'))
covid_confirmed_pdf = pd.read_csv(os.path.join(covid_dir, 'covid_confirmed.csv'))

# US new cases
us_new_cases_pdf = covid_confirmed_pdf[covid_confirmed_pdf['Country_Region']=='US'].sort_values(by=['year', 'month'])
us_new_cases_pdf['New Cases'] = (us_new_cases_pdf['Confirmed'] - us_new_cases_pdf['Confirmed'].shift(1)).fillna(us_new_cases_pdf['Confirmed'])

# China new cases
ch_new_cases_pdf = covid_confirmed_pdf[covid_confirmed_pdf['Country_Region']=='China'].sort_values(by=['year', 'month'])
ch_new_cases_pdf['New Cases'] = (ch_new_cases_pdf['Confirmed'] - ch_new_cases_pdf['Confirmed'].shift(1)).fillna(ch_new_cases_pdf['Confirmed'])
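
The new-case columns above are derived by subtracting each row's cumulative `Confirmed` count from the previous row's. A minimal sketch of the same idea on a made-up frame, assuming (as the code above implies) that `Confirmed` is cumulative with one row per country and month. The frame `toy_confirmed` and its values are hypothetical and not taken from `covid_confirmed.csv`. Grouping by country before differencing handles several countries in one pass.

```python
import pandas as pd

# Hypothetical cumulative counts, for illustration only
toy_confirmed = pd.DataFrame({
    'Country_Region': ['US', 'US', 'US', 'China', 'China', 'China'],
    'year': [2020, 2020, 2020, 2020, 2020, 2020],
    'month': [1, 2, 3, 1, 2, 3],
    'Confirmed': [10, 70, 200, 100, 150, 160],
})

toy_confirmed = toy_confirmed.sort_values(['Country_Region', 'year', 'month'])

# Difference within each country; the first month has no predecessor,
# so it falls back to the cumulative value itself
toy_confirmed['New Cases'] = (
    toy_confirmed.groupby('Country_Region')['Confirmed'].diff()
    .fillna(toy_confirmed['Confirmed'])
)
print(toy_confirmed)
```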



## Investigating global trend for automotive production 

For this we will base our figures on the data scraped from OICA, taking the sum of the continental production values.

In [40]:
# Formatting data 
# Wide (one column per year) to long, then sum the production column per year
global_production_pdf = oica_continents_pdf.melt(id_vars=['country'], var_name='year', value_name='production')\
                                           .groupby('year')['production'].sum().reset_index()

# Plotting 
fig = px.line(global_production_pdf, y='production', x='year',
              title='Global automotive production')
fig.show()
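
The reshaping step above can be checked on a small frame. This is a minimal sketch, assuming the OICA file is laid out with one row per region and one column per year, as the `melt` call implies. The frame `toy_wide` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one row per region, one column per year
toy_wide = pd.DataFrame({
    'country': ['Region A', 'Region B'],
    '2017': [100, 50],
    '2018': [90, 40],
})

# Wide to long: one row per (region, year) pair
toy_long = toy_wide.melt(id_vars=['country'], var_name='year', value_name='production')

# Sum production across regions for each year
toy_totals = toy_long.groupby('year', as_index=False)['production'].sum()
print(toy_totals)
```

Selecting the `production` column before calling `sum()` keeps the string `country` column out of the aggregation.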

From the trend we can see that global production was already falling prior to 2019. 

We observe a sharper decline between 2019 and 2021, from ~67M to less than 40M. The 2021 value should, however, be treated only as an indicator, since statistics are still missing for 2021, notably from countries that report data only once a year.

We now examine the difference in production more closely.

In [41]:
global_production_pdf['Variation'] = global_production_pdf['production'] - global_production_pdf['production'].shift(1)
global_production_pdf['percentage_diff'] = round(global_production_pdf['production']*100/global_production_pdf['production'].shift(1)) -100
global_production_pdf

| | year | production | Variation | percentage_diff |
|---|---|---|---|---|
| 0 | 2017 | 72663012.0 | NaN | NaN |
| 1 | 2018 | 71750946.0 | -912066.0 | -1.0 |
| 2 | 2019 | 67163769.0 | -4587177.0 | -6.0 |
| 3 | 2020 | 55834455.0 | -11329314.0 | -17.0 |
| 4 | 2021 | 39681237.0 | -16153218.0 | -29.0 |


We see a cumulative loss in production of about 7.6% between 2017 and 2019 (from 72.7M to 67.2M).

Between 2019 and 2021 the cumulative loss is about 41% (from 67.2M to 39.7M), more than five times the loss of the previous period, although the 2021 total is still incomplete. This is not statistical evidence that the pandemic is directly responsible, but it does suggest that an outside factor has caused production to decline.
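
Summing year-over-year percentages only approximates the change over several years, because each percentage is taken against a different base. A minimal sketch that computes the change directly from the endpoint totals in the table above; the helper name `pct_change` is ours, not part of the notebook.

```python
# Yearly totals copied from the table above
production = {
    2017: 72663012,
    2018: 71750946,
    2019: 67163769,
    2020: 55834455,
    2021: 39681237,
}

def pct_change(start_year, end_year):
    """Percentage change in production from start_year to end_year."""
    start, end = production[start_year], production[end_year]
    return (end - start) / start * 100

pre_pandemic = pct_change(2017, 2019)
pandemic = pct_change(2019, 2021)

print(f"2017 to 2019: {pre_pandemic:.1f}%")          # -7.6%
print(f"2019 to 2021: {pandemic:.1f}%")              # -40.9%
print(f"ratio: {pandemic / pre_pandemic:.1f}x")      # 5.4x
```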

## USA automotive production

Limited by the available data we will now focus on automotive production in the USA.

In [60]:
# Formatting OICA USA data 2017-2021
usa_oica_pdf = oica_countries_pdf.melt(id_vars=['country'], var_name='year', value_name='production')
usa_oica_pdf.country = usa_oica_pdf.country.astype(str)
# .copy() so the column assignments below act on an independent frame
usa_oica_pdf = usa_oica_pdf[usa_oica_pdf['country'] == 'USA'].copy()


# Calculating variation and % diff 
usa_oica_pdf['Variation'] = usa_oica_pdf['production'] - usa_oica_pdf['production'].shift(1)
usa_oica_pdf['percentage_diff'] = round(usa_oica_pdf['production']*100/usa_oica_pdf['production'].shift(1)) -100

# Covid values 
us_covid_data = covid_year_pdf[covid_year_pdf['Country_Region'] == 'US']

# Plotting 
fig = px.line(usa_oica_pdf, y='production', x='year', hover_data=['Variation', 'percentage_diff'],
              title='USA automotive production')
fig.update_traces(mode="text+markers+lines")
#fig.add_bar(x=us_covid_data['year'], y=us_covid_data['Confirmed'] ,)
fig.show()

In [43]:
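fig = make_subplots(specs=[[{"secondary_y": True}]])
# Add traces: Covid incident rate on the secondary axis, production on the primary axis
fig.add_trace(
    go.Scatter(x=us_covid_data['year'], y=us_covid_data['Incident_Rate'], name='US Covid19 incident rate'),
    secondary_y=True,
)

fig.add_trace(
    go.Scatter(x=usa_oica_pdf['year'], y=usa_oica_pdf['production'], name='USA automotive production'),
    secondary_y=False,
)
fig.show()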



The OICA data limits us to yearly observations, so we now look at the FRED Domestic Auto Production series (DAUPSA), which is reported monthly. Domestic auto production is defined as all autos assembled in the U.S.

In [80]:
# formatting daupsa data 
daupsa_pdf.DATE = pd.to_datetime(daupsa_pdf.DATE) 
# filtering out values 
daupsa_pdf = daupsa_pdf[daupsa_pdf['DATE']>'2019-11']


# Formatting monthly covid data 
us_new_cases_pdf['day'] = 1
us_new_cases_pdf['DATE'] = pd.to_datetime(us_new_cases_pdf[['year', 'month', 'day']])

# Plotting 
fig = make_subplots(specs=[[{"secondary_y": True}]])
# Add traces
fig.add_trace(
    go.Scatter(x=us_new_cases_pdf['DATE'], y=us_new_cases_pdf['New Cases'], name='US Covid19 monthly new cases'),
    secondary_y=True,
)

fig.add_trace(
    go.Scatter(x=daupsa_pdf['DATE'], y=daupsa_pdf['DAUPSA'], name='Auto units produced'),
    secondary_y=False,
)

# Add figure title
fig.update_layout(
    title_text="DAUPSA and Covid new cases"
)

# Set x-axis title
fig.update_xaxes(title_text="Date")

# Set y-axes titles (the secondary axis shows new cases, not the incident rate)
fig.update_yaxes(title_text="<b>primary</b> Auto production in thousands", secondary_y=False)
fig.update_yaxes(title_text="<b>secondary</b> Covid new cases", secondary_y=True)
fig.add_vrect(x0='2020-03-01', x1='2020-04-07', line_width=0, fillcolor="grey", opacity=0.4,
                annotation_text="lockdown", annotation_position="top left",)
fig.show()
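
The overlay above is a visual comparison only. One possible next step, sketched here under assumptions, is to align the two monthly series on `DATE` and compute a Pearson correlation. The frames `toy_cases` and `toy_production` and all their values are made up for illustration, and a correlation over so few points would not by itself show that the pandemic caused the drop.

```python
import pandas as pd

# Hypothetical monthly new cases, keyed by year and month as in the covid data
toy_cases = pd.DataFrame({
    'year': [2020, 2020, 2020, 2020],
    'month': [2, 3, 4, 5],
    'New Cases': [0, 100, 300, 50],
})
toy_cases['day'] = 1
# pd.to_datetime assembles a date from year/month/day columns
toy_cases['DATE'] = pd.to_datetime(toy_cases[['year', 'month', 'day']])

# Hypothetical monthly production, keyed by date as in the DAUPSA file
toy_production = pd.DataFrame({
    'DATE': pd.to_datetime(['2020-02-01', '2020-03-01', '2020-04-01', '2020-05-01']),
    'DAUPSA': [200.0, 100.0, 50.0, 150.0],
})

# Keep only the months present in both series
merged = toy_production.merge(toy_cases[['DATE', 'New Cases']], on='DATE', how='inner')

# Pearson correlation (the default method of Series.corr)
corr = merged['DAUPSA'].corr(merged['New Cases'])
print(f"Pearson correlation: {corr:.2f}")
```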
