# Project Title - BMW Sales Analysis (2010–2024)


## Data set selection

> In this section, you will need to provide the following information about the selected data set:
>
> - Source: [Kaggle – BMW Sales Dataset (2010–2024)](https://www.kaggle.com/)
> - Fields: Model, Year, Region, Color, Fuel_Type, Transmission, Engine_Size_L, Mileage_KM, Price_USD, Sales_Volume, Sales_Classification
> - License: Open Data / Educational Use

### Data set selection rationale

> Why did you select this data set?

I selected this dataset because I'm interested in how luxury vehicle sales evolve over time and across regions.
Analyzing BMW's sales offers insights into consumer demand, growth trends in different markets, and the impact of external factors such as new model releases or economic events.


### Questions to be answered

> Using statistical analysis and visualization, what questions would you like to be able to answer about this dataset?

1. How have BMW's total global sales changed from 2010 to 2024?
2. Which regions have shown the strongest growth or decline?
3. Which BMW model lines (3-Series, X-Series, etc.) contribute most to sales?
4. Are there noticeable seasonal sales patterns across years?
5. How does the shift toward electric models affect total performance?
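
As a rough illustration of how the first three questions might be approached, the sketch below aggregates sales by year, region, and model. It assumes the column names `Year`, `Region`, `Model`, and `Sales_Volume` reported by `df.info()` in this notebook. The helper names (`total_sales_by_year`, `region_growth`, `model_share`) are hypothetical and not part of the original analysis.

```python
import pandas as pd


def total_sales_by_year(df: pd.DataFrame) -> pd.Series:
    """Sum Sales_Volume for each Year (question 1)."""
    return df.groupby("Year")["Sales_Volume"].sum().sort_index()


def region_growth(df: pd.DataFrame) -> pd.Series:
    """Relative change in Sales_Volume per Region between the first
    and last year in the data (question 2).

    Regions with no sales in the first year yield NaN or inf.
    """
    by_region = df.groupby(["Region", "Year"])["Sales_Volume"].sum().unstack("Year")
    first, last = by_region.columns.min(), by_region.columns.max()
    growth = (by_region[last] - by_region[first]) / by_region[first]
    return growth.sort_values(ascending=False)


def model_share(df: pd.DataFrame) -> pd.Series:
    """Each Model's fraction of total Sales_Volume (question 3)."""
    totals = df.groupby("Model")["Sales_Volume"].sum()
    return (totals / totals.sum()).sort_values(ascending=False)
```

Once the data is loaded, these could be called as `total_sales_by_year(df)`, `region_growth(df)`, and `model_share(df)`.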

### Visualization ideas

> Provide a few examples of what you plan to visualize to answer the questions you posed in the previous section. In this project, you will be producing 6-8 visualizations. You will also be producing an interactive chart using Plotly.
> Think about what those visualizations could be: What are the variables used in the charts? What insights do you hope to gain from them?

- Line chart – total global sales by year (trend analysis)
- Bar chart – sales by region or model line (comparison)
- Heatmap – correlations between revenue, sales, and time
- Box plot – monthly sales distribution (seasonality)
- Stacked area chart – EV vs non-EV share over time
- Interactive Plotly chart – dynamic regional sales explorer

These visualizations will help reveal patterns in sales performance, highlight regional differences, and provide a baseline for forecasting potential future trends.
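
Two of these ideas, the yearly trend line and the interactive regional explorer, might be sketched as follows. This is a minimal sketch, not the final charts: it assumes the `Year`, `Region`, and `Sales_Volume` columns from the loaded data, and the function names are hypothetical. Matplotlib and Plotly Express are imported inside the plotting functions, so the data-preparation helper works even where those libraries are not installed.

```python
import pandas as pd


def sales_by_year_region(df: pd.DataFrame) -> pd.DataFrame:
    """Long-format table: one row per (Year, Region) with summed Sales_Volume."""
    out = df.groupby(["Year", "Region"], as_index=False)["Sales_Volume"].sum()
    return out.sort_values(["Year", "Region"]).reset_index(drop=True)


def plot_yearly_trend(df: pd.DataFrame):
    """Static line chart of total Sales_Volume per Year (Matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    yearly = df.groupby("Year")["Sales_Volume"].sum().sort_index()
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(yearly.index, yearly.values, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Total Sales_Volume")
    ax.set_title("Total sales by year")
    return fig


def plot_regional_explorer(df: pd.DataFrame):
    """Interactive line chart with one line per Region (Plotly Express)."""
    import plotly.express as px  # lazy import: only needed for plotting

    long_df = sales_by_year_region(df)
    return px.line(
        long_df,
        x="Year",
        y="Sales_Volume",
        color="Region",
        title="Sales by region and year",
    )
```

In a notebook, `plot_regional_explorer(df).show()` would render the interactive figure, where clicking legend entries toggles individual regions.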

## 🧮 Analysis Plan

> I will analyze BMW's global sales data from 2010–2024 using Pandas for data exploration and Seaborn/Matplotlib for visualization.
>
> The plan is to:
> 1. Clean and prepare the data by checking for missing values and duplicates, and converting the Year column to numeric.
> 2. Perform descriptive statistics (mean, median, growth rates).
> 3. Create 6–8 visualizations to examine sales trends, regional performance, model popularity, and correlations.
> 4. Interpret findings to explain which regions and models drive performance and how the data reflects broader market trends.
>
> These insights will help visualize BMW's historical sales behavior and provide a baseline for future forecasting analysis.
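
Steps 1 and 2 of the plan could look roughly like the sketch below. It assumes the `Year` and `Sales_Volume` columns from the loaded data. The function names `clean_sales` and `yearly_summary` are hypothetical, and dropping rows with an unparseable year is one possible choice rather than the only reasonable one.

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and make sure Year is an integer column."""
    out = df.drop_duplicates().copy()
    # Values that cannot be parsed as numbers become NaN ...
    out["Year"] = pd.to_numeric(out["Year"], errors="coerce")
    # ... and those rows are removed, along with rows missing Sales_Volume.
    out = out.dropna(subset=["Year", "Sales_Volume"])
    out["Year"] = out["Year"].astype(int)
    return out


def yearly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-year sum, mean, and median of Sales_Volume plus year-over-year growth."""
    yearly = df.groupby("Year")["Sales_Volume"].agg(["sum", "mean", "median"])
    # Growth is the fractional change in the yearly total; the first year is NaN.
    yearly["yoy_growth"] = yearly["sum"].pct_change(fill_method=None)
    return yearly
```

Calling `yearly_summary(clean_sales(df))` would then give one row per year, with `yoy_growth` expressed as a fraction (0.05 meaning 5% growth).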




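```python
# 🚀 Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv("data/BMW sales data (2010-2024) (1).csv")

# Note: a notebook cell echoes only its last expression, so head(),
# describe(), and isnull().sum() below produce no visible output unless
# wrapped in display() or print(). info() prints on its own.

# Display first few rows
df.head()

# Data overview
df.info()
df.describe()

# Check for missing values or duplicates
df.isnull().sum()
df.duplicated().sum()
```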

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Model                 50000 non-null  object 
 1   Year                  50000 non-null  int64  
 2   Region                50000 non-null  object 
 3   Color                 50000 non-null  object 
 4   Fuel_Type             50000 non-null  object 
 5   Transmission          50000 non-null  object 
 6   Engine_Size_L         50000 non-null  float64
 7   Mileage_KM            50000 non-null  int64  
 8   Price_USD             50000 non-null  int64  
 9   Sales_Volume          50000 non-null  int64  
 10  Sales_Classification  50000 non-null  object 
dtypes: float64(1), int64(4), object(6)
memory usage: 4.2+ MB

np.int64(0)
```
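
The output above contains only the `df.info()` printout and the final duplicate count, most likely because a notebook cell echoes only its last expression. One way to see every check at once is to collect the results in a dictionary. The helper name `quality_report` is hypothetical and is offered only as a sketch.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Collect basic data-quality checks into a single dictionary."""
    return {
        "rows": len(df),
        # Number of missing values in each column.
        "missing_per_column": df.isnull().sum().to_dict(),
        # Number of rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
    }
```

Running `print(quality_report(df))` in the cell would show the row count, the per-column missing counts, and the duplicate count together.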