Scenario

In this project, I create plots that analyse the "historical_automobile_sales" data to understand historical trends in automobile sales during recession periods.
Recession period 1 - year 1980
Recession period 2 - year 1981 to 1982
Recession period 3 - year 1991
Recession period 4 - year 2000 to 2001
Recession period 5 - year end 2007 to mid 2009
Recession period 6 - year 2020, Sep to Dec (Covid-19 impact)

Data Description

The dataset used for this visualization assignment, historical_automobile_sales, contains automobile sales and related variables for recession and non-recession periods.

The dataset includes the following variables:
1. Date: The date of the observation.
2. Recession: A binary variable indicating the recession period; 1 means a recession, 0 means a normal period.
3. Automobile_Sales: The number of vehicles sold during the period.
4. GDP: The per capita GDP value in USD.
5. Unemployment_Rate: The monthly unemployment rate.
6. Consumer_Confidence: A synthetic index representing consumer confidence, which can impact consumer spending and automobile purchases.
7. Seasonality_Weight: The weight representing the seasonality effect on automobile sales during the period.
8. Price: The average vehicle price during the period.
9. Advertising_Expenditure: The advertising expenditure of the company.
10. Vehicle_Type: The type of vehicles sold; Supperminicar, Smallfamiliycar, Mediumfamilycar, Executivecar, Sports.
11. Competition: The measure of competition in the market, such as the number of competitors or market share of major manufacturers.
12. Month: Month of the observation extracted from Date.
13.Year: Year of the observation extracted from Date.
By examining various factors mentioned above from the dataset, I aim to gain insights into how recessions impacted automobile sales for the company.

In [None]:
%pip install seaborn

In [None]:
%pip install folium

In [None]:
import numpy as np  # Importing necessary libraries

In [None]:
import pandas as pd  # Importing necessary libraries

In [None]:
%matplotlib inline

In [None]:
import matplotlib as mpl  # Importing necessary libraries

In [None]:
import matplotlib.pyplot as plt  # Importing necessary libraries

In [None]:
import seaborn as sns  # Importing necessary libraries

In [None]:
import folium  # Importing necessary libraries

In [None]:
import requests  # Importing necessary libraries

In [None]:
import io  # Importing necessary libraries

In [None]:
print('import done')

In [None]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/historical_automobile_sales.csv"

In [None]:
response = requests.get(URL)

response.raise_for_status()  # Raises an error if the request failed

In [None]:
# Read the CSV data into a pandas DataFrame

In [None]:
df = pd.read_csv(io.StringIO(response.text))  # Reading data from CSV file

In [None]:
print('Data downloaded and read into a dataframe!')

In [None]:
df.head()  # Displaying the first few rows of the dataset

In [None]:
df.columns  # columns is an attribute, not a method
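Before plotting, it can help to confirm that the columns the later cells rely on are actually present. The sketch below is only an illustration: the helper `missing_columns` and the `EXPECTED` list are hypothetical names, the expected names are assumed from the data description above, and the frame is a tiny made-up stand-in for `df`.

```python
import pandas as pd

# Column names assumed from the data description; the real file may spell them differently.
EXPECTED = ["Date", "Recession", "Automobile_Sales", "GDP", "Vehicle_Type", "Month", "Year"]

def missing_columns(frame, expected):
    """Return the expected names with no case-insensitive match in frame.columns."""
    present = {name.lower() for name in frame.columns}
    return [name for name in expected if name.lower() not in present]

# Tiny made-up frame, for illustration only (not the real dataset).
toy = pd.DataFrame({"Date": ["1/31/1980"], "recession": [1], "Automobile_Sales": [450.0]})

print(missing_columns(toy, EXPECTED))
```

The comparison is case-insensitive on purpose, so a column that differs from the description only in capitalisation is not reported as missing.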

Develop a Line chart using the functionality of pandas to show how automobile sales fluctuate from year to year

In [None]:
yearly_sales = df.groupby('Year')['Automobile_Sales'].mean()

In [None]:
plt.figure(figsize=(10,6))

yearly_sales.plot()

In [None]:
plt.title('Yearly Sales')

In [None]:
plt.xlabel('Year')

In [None]:
plt.ylabel('Sales')

In [None]:
plt.show()

In [None]:
print('done')
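The line chart above plots the mean of Automobile_Sales per year. As a rough sketch of how that aggregation behaves, the toy frame below (made-up numbers, not taken from the dataset) compares the yearly mean with the yearly sum and the row count.

```python
import pandas as pd

# Made-up rows for illustration; the real data has many more observations per year.
toy = pd.DataFrame({
    "Year": [1980, 1980, 1981, 1981, 1981],
    "Automobile_Sales": [400.0, 600.0, 300.0, 500.0, 700.0],
})

# One row per year with the mean, the sum and the number of observations.
by_year = toy.groupby("Year")["Automobile_Sales"].agg(["mean", "sum", "count"])
print(by_year)
```

In this toy example both years have the same mean even though their sums differ, because 1981 simply has more rows. That is worth keeping in mind when labelling the y-axis as an average or a total.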

Add ticks on the x-axis for every year, to make the recession years easier to identify

In [None]:
plt.figure(figsize=(10,6))

yearly_sales.plot()

In [None]:
plt.xticks(list(range(1980, 2024)), rotation=75)

In [None]:
plt.xlabel('Year')

In [None]:
plt.ylabel('Average Automobile Sales')  # yearly_sales holds the yearly mean, not the total

In [None]:
plt.title('Automobile Sales during Recession')

In [None]:
plt.text(1982, yearly_sales.get(1982, 0) + 100, '1981-82 Recession')

In [None]:
plt.text(2009, yearly_sales.get(2009, 0) + 100, '2008-09 Recession')

In [None]:
plt.legend(['Average Sales'])  # yearly_sales holds the yearly mean, not the total

In [None]:
plt.show()

Plot a separate line for each vehicle type and analyse the trend to answer the question: is there a noticeable difference in sales trends between vehicle types during recession periods?

In [None]:
df_rec = df[df['Recession'] == 1]

In [None]:
df_Mline = df_rec.groupby(['Year', 'Vehicle_Type'], as_index=False)['Automobile_Sales'].mean()

In [None]:
df_Mline['Normalized_Sales'] = df_Mline.groupby('Vehicle_Type')['Automobile_Sales'].transform(lambda x: x / x.mean())

In [None]:
df_Mline.set_index('Year', inplace=True)

In [None]:
plt.figure(figsize=(12, 8))

In [None]:
vehicle_colors = {

'Mediumfamilycar': 'blue',

'Smallfamiliycar': 'orange',

'Supperminicar': 'red',

'Sports': 'yellow',

'Executivecar': 'black',

}

In [None]:
for vehicle_type in df_Mline['Vehicle_Type'].unique():
    data = df_Mline[df_Mline['Vehicle_Type'] == vehicle_type]
    color = vehicle_colors.get(vehicle_type, 'black')
    plt.plot(data.index, data['Normalized_Sales'], label=vehicle_type, marker='o', color=color)

In [None]:
recession_years = df_rec['Year'].unique()

for year in recession_years:
    plt.axvline(x=year, color='gray', linestyle='--', alpha=0.5)

In [None]:
plt.legend(title="Vehicle Type", bbox_to_anchor=(1.05, 1), loc='upper left')

In [None]:
plt.ylabel("Normalized Sales")

In [None]:
plt.xlabel("Year")

In [None]:
plt.title("Normalized Automobile Sales by Vehicle Type During Recession")

In [None]:
plt.tight_layout()

In [None]:
plt.show()

Sports cars show growth across the recession periods, while supermini cars show a downward trend. Medium and small family cars show less consistent trends, rising and falling across the recession periods. Executive cars were not sold during recessions except in 1991.
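The normalization used for this chart divides each vehicle type's yearly sales by that type's own mean, so types with very different volumes can share one axis. A minimal sketch on made-up numbers (not values from the dataset):

```python
import pandas as pd

# Made-up yearly averages for two vehicle types, for illustration only.
toy = pd.DataFrame({
    "Year": [1980, 1981, 1980, 1981],
    "Vehicle_Type": ["Sports", "Sports", "Supperminicar", "Supperminicar"],
    "Automobile_Sales": [100.0, 300.0, 600.0, 400.0],
})

# Divide every value by the mean of its own vehicle type.
toy["Normalized_Sales"] = (
    toy.groupby("Vehicle_Type")["Automobile_Sales"].transform(lambda s: s / s.mean())
)
print(toy)
```

After this step a value of 1.0 means "equal to that vehicle type's own average", so the lines show relative movement rather than absolute volume.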

Use the functionality of Seaborn Library to create a visualization to compare the sales trend per vehicle type for a recession period with a non-recession period.

Now compare the sales of different vehicle types during a recession and a non-recession period

In [None]:
dd = df.groupby(['Recession', 'Vehicle_Type'])['Automobile_Sales'].mean().reset_index()

In [None]:
plt.figure(figsize=(12, 8))

In [None]:
sns.barplot(x='Recession', y='Automobile_Sales', hue='Vehicle_Type', data=dd)

In [None]:
plt.xticks(ticks=[0, 1], labels=['Non-Recession', 'Recession'])

In [None]:
plt.xlabel('Economic Condition')

In [None]:
plt.ylabel('Average Automobile Sales')

In [None]:
plt.title('Vehicle-Wise Sales during Recession and Non-Recession Period')

In [None]:
plt.tight_layout()

In [None]:
plt.show()

From this plot, we can see a drastic decline in overall automobile sales during recessions. The most affected vehicle type is sports cars.
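To back a reading like this with numbers, the grouped means can be pivoted so that each vehicle type has a recession and a non-recession column, and the relative drop can be computed per type. The sketch below assumes a frame shaped like `dd` above and uses made-up values; the column name `pct_change` is introduced here only for illustration.

```python
import pandas as pd

# Made-up averages in the same shape as dd (one row per Recession/Vehicle_Type pair).
toy = pd.DataFrame({
    "Recession": [0, 0, 1, 1],
    "Vehicle_Type": ["Sports", "Supperminicar", "Sports", "Supperminicar"],
    "Automobile_Sales": [200.0, 1000.0, 50.0, 800.0],
})

# One row per vehicle type, one column per economic condition (0 and 1).
wide = toy.pivot(index="Vehicle_Type", columns="Recession", values="Automobile_Sales")

# Percentage change from non-recession (0) to recession (1).
wide["pct_change"] = (wide[1] - wide[0]) / wide[0] * 100
print(wide)
print("Largest relative drop:", wide["pct_change"].idxmin())
```

A relative measure avoids favouring high-volume types, which can show the largest absolute drop even when their percentage drop is modest.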

Use sub plotting to compare the variations in GDP during recession and non-recession period by developing line plots for each period.

Now, I want to find more insights from the data to understand the reason.

How did the GDP vary over time during recession and non-recession periods?

In [None]:
rec_data = df[df['Recession'] == 1]

In [None]:
non_rec_data = df[df['Recession'] == 0]

In [None]:
# Create figure

In [None]:
fig = plt.figure(figsize=(14, 6))

In [None]:
# Create different axes for subplots

In [None]:
ax0 = fig.add_subplot(1, 2, 1)  # subplot 1

In [None]:
ax1 = fig.add_subplot(1, 2, 2)  # subplot 2

In [None]:
# Line plot for recession period

In [None]:
sns.lineplot(x='Year', y='GDP', data=rec_data, label='GDP during Recession', ax=ax0)

ax0.set_xlabel('Year')

ax0.set_ylabel('GDP')

ax0.set_title('GDP Variation during Recession Period')

In [None]:
# Line plot for non-recession period

In [None]:
sns.lineplot(x='Year', y='GDP', data=non_rec_data, label='GDP during Non-Recession', ax=ax1)

ax1.set_xlabel('Year')

ax1.set_ylabel('GDP')

ax1.set_title('GDP Variation during Non-Recession Period')

In [None]:
# Add vertical lines to both plots

In [None]:
recession_years = df[df['Recession'] == 1]['Year'].unique()

for year in recession_years:
    ax0.axvline(x=year, color='gray', linestyle='--', alpha=0.5)
    ax1.axvline(x=year, color='gray', linestyle='--', alpha=0.5)

In [None]:
# Adjust layout

In [None]:
plt.tight_layout()

In [None]:
plt.show()

From this plot, it is evident that GDP was in a lower range during recessions, which might have affected the company's overall sales.
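A statement about the GDP range can also be checked with a small summary table. The sketch below is a rough illustration on made-up GDP values, grouped by the Recession flag.

```python
import pandas as pd

# Made-up GDP values for illustration; 1 marks recession rows, 0 marks normal rows.
toy = pd.DataFrame({
    "Recession": [0, 0, 0, 1, 1],
    "GDP": [50.0, 60.0, 70.0, 20.0, 30.0],
})

# Minimum, median and maximum GDP for each economic condition.
summary = toy.groupby("Recession")["GDP"].agg(["min", "median", "max"])
print(summary)
```

If the two ranges barely overlap, as in this toy example, that supports reading the line plots as "GDP was lower during recessions".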

Develop a Bubble plot for displaying the impact of seasonality on Automobile Sales

How has seasonality affected sales, and in which months were sales high or low? Check non-recession years to understand the trend.

Develop a Bubble plot for displaying Automobile Sales for every month and use Seasonality Weight for representing the size of each bubble

In [None]:
non_rec_data = df[df['Recession'] == 0]

In [None]:
# Create bubble plot

In [None]:
plt.figure(figsize=(12, 6))

In [None]:
sns.scatterplot(
    data=non_rec_data,
    x='Month',
    y='Automobile_Sales',
    size='Seasonality_Weight',
    hue='Seasonality_Weight',
    sizes=(100, 1000),
    palette='coolwarm',
    legend=False
)

In [None]:
# Add labels and title

In [None]:
plt.xlabel('Month')

In [None]:
plt.ylabel('Automobile Sales')

In [None]:
plt.title('Seasonality Impact on Automobile Sales')

In [None]:
plt.grid(True)

In [None]:
plt.tight_layout()

In [None]:
plt.show()

From this plot, it is evident that seasonality has not affected overall sales. However, there is a drastic rise in sales in the month of April.
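The month with the highest or lowest sales can also be read off a monthly average instead of the bubble plot alone. The sketch below uses made-up values and assumes that Month holds three-letter labels such as 'Jan'; `MONTH_ORDER` is a helper list introduced here for illustration.

```python
import pandas as pd

# Assumed month labels; adjust if the Month column uses another format.
MONTH_ORDER = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up non-recession rows for illustration only.
toy = pd.DataFrame({
    "Month": ["Jan", "Apr", "Apr", "Dec", "Jan"],
    "Automobile_Sales": [500.0, 3000.0, 3400.0, 900.0, 700.0],
})

# Average sales per month, sorted in calendar order; months without data are dropped.
monthly = toy.groupby("Month")["Automobile_Sales"].mean().reindex(MONTH_ORDER).dropna()
print(monthly)
print("Highest:", monthly.idxmax(), "Lowest:", monthly.idxmin())
```

Reindexing by calendar order matters because grouping on month labels would otherwise sort them alphabetically.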

Use the functionality of Matplotlib to develop a scatter plot to identify the correlation between average vehicle price and sales volume during recessions.

In [None]:
import matplotlib.pyplot as plt  # Importing necessary libraries

In [None]:
# Create dataframe for recession period

In [None]:
rec_data = df[df['Recession'] == 1]

In [None]:
# Scatter plot

In [None]:
plt.figure(figsize=(10, 6))

In [None]:
plt.scatter(rec_data['Consumer_Confidence'], rec_data['Automobile_Sales'], color='blue', alpha=0.6)

In [None]:
# Labels and title

In [None]:
plt.xlabel('Consumer Confidence')

In [None]:
plt.ylabel('Automobile Sales')

In [None]:
plt.title('Consumer Confidence and Automobile Sales during Recessions')

In [None]:
plt.grid(True)

In [None]:
plt.tight_layout()

In [None]:
plt.show()

How does the average vehicle price relate to the sales volume during recessions?

In [None]:
import matplotlib.pyplot as plt  # Importing necessary libraries

In [None]:
# Create dataframe for recession period

In [None]:
rec_data = df[df['Recession'] == 1]

In [None]:
# Scatter plot

In [None]:
plt.figure(figsize=(10, 6))

In [None]:
plt.scatter(rec_data['Price'], rec_data['Automobile_Sales'], color='green', alpha=0.6)

In [None]:
# Labels and title

In [None]:
plt.xlabel('Average Vehicle Price')

In [None]:
plt.ylabel('Automobile Sales')

In [None]:
plt.title('Relationship between Average Vehicle Price and Sales during Recessions')

In [None]:
plt.tight_layout()

In [None]:
plt.show()

The plot shows little relationship between average vehicle price and sales volume during recessions.
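A scatter plot gives a visual impression; a correlation coefficient puts a number on it. The sketch below computes the Pearson correlation with `Series.corr` on made-up, perfectly linear data, so the result here says nothing about the real dataset.

```python
import pandas as pd

# Made-up, perfectly linear data for illustration; real data will be noisier.
toy = pd.DataFrame({
    "Price": [10.0, 20.0, 30.0, 40.0],
    "Automobile_Sales": [400.0, 300.0, 200.0, 100.0],
})

# Pearson correlation: close to -1 or 1 means a strong linear relation, close to 0 means a weak one.
r = toy["Price"].corr(toy["Automobile_Sales"])
print(round(r, 3))
```

Applied to the recession rows, the same call on the Price and Automobile_Sales columns would show whether "not much relation" corresponds to a coefficient near zero.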

Create a pie chart to display the portion of advertising expenditure of XYZAutomotives during recession and non-recession periods.

In [None]:
# Filter the data

In [None]:
Rdata = df[df['Recession'] == 1]

In [None]:
NRdata = df[df['Recession'] == 0]

In [None]:
# Calculate the total advertising expenditure for both periods

In [None]:
RAtotal = Rdata['Advertising_Expenditure'].sum()

In [None]:
NRAtotal = NRdata['Advertising_Expenditure'].sum()

In [None]:
# Create a pie chart for the advertising expenditure

In [None]:
plt.figure(figsize=(8, 6))

In [None]:
labels = ['Recession', 'Non-Recession']

In [None]:
sizes = [RAtotal, NRAtotal]

In [None]:
colors = ['lightcoral', 'lightgreen']

In [None]:
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, colors=colors)

In [None]:
plt.title('Advertising Expenditure of XYZAutomotives: Recession vs Non-Recession')

In [None]:
plt.axis('equal')  # Ensures the pie is a circle

In [None]:
plt.show()

It seems XYZAutomotives spent much more on advertising during non-recession periods than during recessions. Fair enough!
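One caveat when comparing totals: non-recession periods may simply cover more observations. As a rough sketch on made-up numbers, the table below shows the total, the per-row average and the share of advertising spend for each condition; the column name `share_pct` is introduced here for illustration.

```python
import pandas as pd

# Made-up rows: one recession observation and three non-recession observations.
toy = pd.DataFrame({
    "Recession": [1, 0, 0, 0],
    "Advertising_Expenditure": [3000.0, 2000.0, 3000.0, 4000.0],
})

# Total, average per observation, and number of observations per condition.
spend = toy.groupby("Recession")["Advertising_Expenditure"].agg(["sum", "mean", "count"])

# Share of the overall total, which is what the pie chart displays.
spend["share_pct"] = spend["sum"] / spend["sum"].sum() * 100
print(spend)
```

In this toy example the non-recession share is three times larger even though the average spend per observation is identical, so the mean is a useful companion to the pie chart.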

Develop a pie chart to display the total Advertisement expenditure for each vehicle type during recession period.

In [None]:
Rdata = df[df['Recession'] == 1]

In [None]:
# Calculate the advertising expenditure by vehicle type during recessions

In [None]:
VTexpenditure = Rdata.groupby('Vehicle_Type')['Advertising_Expenditure'].sum()

In [None]:
# Create a pie chart for the share of each vehicle type in total expenditure during recessions

In [None]:
plt.figure(figsize=(8, 6))

In [None]:
labels = VTexpenditure.index

In [None]:
sizes = VTexpenditure.values

In [None]:
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)

In [None]:
plt.title('Vehicle-Type Wise Advertising Expenditure during Recession')

In [None]:
plt.axis('equal')  # Ensures pie chart is circular

In [None]:
plt.show()

During recessions, the advertisements were mostly focused on low-price-range vehicle types. A wise decision!

Develop a lineplot to analyse the effect of the unemployment rate on vehicle type and sales during the Recession Period.

Analyze the effect of the unemployment rate on vehicle type and sales during the Recession Period.

In [None]:
df_rec = df[df['Recession'] == 1]

In [None]:
# Create the lineplot

In [None]:
plt.figure(figsize=(10, 6))

In [None]:
vehicle_colors = {

'Mediumfamilycar': 'blue',

'Smallfamiliycar': 'orange',

'Supperminicar': 'red',

'Sports': 'yellow',

'Executivecar': 'black'}

In [None]:
sns.lineplot(
    data=df_rec,
    x='unemployment_rate',
    y='Automobile_Sales',
    hue='Vehicle_Type',
    style='Vehicle_Type',
    markers='o',
    err_style=None,
    palette=vehicle_colors
)

In [None]:
plt.ylim(0, 850)

In [None]:
plt.xlabel('Unemployment Rate')

In [None]:
plt.ylabel('Automobile Sales')

In [None]:
plt.title('Effect of Unemployment Rate on Vehicle Type and Sales')

In [None]:
plt.legend(loc=(0.05, 0.3))

In [None]:
plt.show()

Except for sports cars, whose sales fell as the unemployment rate rose, the other vehicle types show no consistent trend.
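A per-type correlation is one way to check whether a trend is consistent. The sketch below uses made-up values and the column name `unemployment_rate` as it appears in the plotting cell above; the dictionary `trend` is a name introduced here for illustration.

```python
import pandas as pd

# Made-up recession rows for two vehicle types, for illustration only.
toy = pd.DataFrame({
    "Vehicle_Type": ["Sports"] * 3 + ["Supperminicar"] * 3,
    "unemployment_rate": [2.0, 4.0, 6.0, 2.0, 4.0, 6.0],
    "Automobile_Sales": [300.0, 200.0, 100.0, 500.0, 700.0, 500.0],
})

# Pearson correlation between unemployment rate and sales, computed per vehicle type.
trend = {
    name: group["unemployment_rate"].corr(group["Automobile_Sales"])
    for name, group in toy.groupby("Vehicle_Type")
}
for name, r in trend.items():
    print(name, round(r, 3))
```

A coefficient near -1 indicates sales falling steadily as unemployment rises, while a value near 0 matches the "no consistent trend" reading.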

Create a map of the company's highest-sales regions/offices during the recession period

In [None]:
def download(url, filename):
    response = requests.get(url)
    if response.status_code == 200:
        with open(filename, "wb") as f:
            f.write(response.content)

In [None]:
path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/us-states.json'

download(path, "us-states.json")

In [None]:
# Filter the data for the recession period

In [None]:
recession_data = df[df['Recession'] == 1]

In [None]:
# Calculate the total sales by city

In [None]:
sales_by_city = recession_data.groupby('City')['Automobile_Sales'].sum().reset_index()

In [None]:
# Create a base map centered on the United States

In [None]:
map1 = folium.Map(location=[37.0902, -95.7129], zoom_start=4)

In [None]:
# Create a choropleth layer using Folium

In [None]:
choropleth = folium.Choropleth(
    geo_data='us-states.json',  # GeoJSON file with state boundaries
    data=sales_by_city,
    columns=['City', 'Automobile_Sales'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Automobile Sales during Recession'
).add_to(map1)

In [None]:
# Add tooltips to the choropleth layer

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['name'], labels=True)
)

In [None]:
# Display the map

map1
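The choropleth binds each row to a shape by comparing the first entry of `columns` with the property named in `key_on`, so a value that matches no feature name cannot be drawn on the map. The sketch below is a rough pre-check using only the standard library; the GeoJSON fragment, the key list and the helper `unmatched_keys` are all hypothetical and hand-written for illustration.

```python
# Hypothetical, hand-written GeoJSON fragment; the real us-states.json is much larger.
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "California"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "New York"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Georgia"}, "geometry": None},
    ],
}

# Made-up key values; note the trailing space in the last one.
data_keys = ["California", "New York", "Georgia "]

def unmatched_keys(geojson, keys, prop="name"):
    """Return the keys that match no feature's properties[prop], sorted."""
    names = {feature["properties"][prop] for feature in geojson["features"]}
    return sorted(key for key in keys if key not in names)

print(unmatched_keys(geo, data_keys))
```

With the real files, the same check could be run on the unique values of the City column and the GeoJSON loaded with `json.load`; an empty result means every row has a shape to colour.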