# Sales Data Analysis
## Descriptive Statistics
© Copyright Notice 2025, Sales Data Analysis - All Rights Reserved.


### Introduction
In this notebook, we will explore descriptive statistics using the `sales_data.csv` dataset. We will cover:
- Loading the dataset
- Measures of Central Tendency: Mean, Median, Mode
- Dispersion Metrics: Variance, Standard Deviation
- Visualizations: Bar charts, Histograms, Box plots


In [None]:
import numpy as np  # Import the numpy library for numerical operations
import pandas as pd  # Import the pandas library for data manipulation and analysis
import matplotlib.pyplot as plt  # Import the matplotlib library for data visualization

# Load the dataset
data = pd.read_csv('../data/sales_data.csv')  # Read the dataset into a pandas DataFrame
data.head()  # Show the first 5 rows of the dataset to get an overview
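
Before computing statistics, it can help to confirm that the column loaded as numbers and to count missing values. The following is a minimal sketch, not part of the original notebook: it reads a tiny in-memory CSV instead of `sales_data.csv`, and the `ORDER_ID` column and all values are hypothetical. Only the `SALES` column name comes from the notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales_data.csv; ORDER_ID and the values are made up.
csv_text = """ORDER_ID,SALES
1,2871.00
2,2765.90
3,
4,3884.34
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Confirm the column exists before computing statistics on it.
assert 'SALES' in sample.columns

# Coerce to numeric so stray text values become NaN instead of failing later.
sample['SALES'] = pd.to_numeric(sample['SALES'], errors='coerce')

# Count rows where SALES is missing.
n_missing = int(sample['SALES'].isna().sum())
print(sample.dtypes)
print(f"Missing SALES values: {n_missing}")
```

The same checks should apply unchanged to `data` once the real file is loaded.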

### Measures of Central Tendency
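We will calculate the mean, median, and mode for the `SALES` column. Because `mode()` returns every value tied for most frequent, the code keeps only the first (smallest) one.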

In [None]:
# Calculate mean, median, and mode for the SALES column
mean_sales = data['SALES'].mean()
median_sales = data['SALES'].median()
mode_sales = data['SALES'].mode().iloc[0]  # mode() returns a Series of all tied modes, sorted ascending; keep the first

mean_sales, median_sales, mode_sales
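
To see how the three measures can disagree, here is a small worked example on hypothetical values (not taken from `sales_data.csv`). It is a sketch that assumes a single large value and a tie for the most frequent value, which shows why the mean is sensitive to outliers and why `mode()` can return more than one result.

```python
import pandas as pd

# Hypothetical values: one large outlier, and two values tied for most frequent.
sales = pd.Series([100.0, 100.0, 250.0, 250.0, 400.0, 1500.0])

mean_value = sales.mean()      # sum / count; pulled upward by the 1500 outlier
median_value = sales.median()  # average of the two middle values for an even count
modes = sales.mode()           # Series of all tied modes, sorted ascending

print(mean_value, median_value, modes.tolist())
```

Here the median (250) sits well below the mean (about 433), and `modes` holds both 100 and 250, so taking only the first element silently drops one of the tied values.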

### Dispersion Metrics
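Next, we will calculate the variance and standard deviation for the `SALES` column. By default, pandas computes the sample versions of both (`ddof=1`, dividing by n - 1), and both skip missing values.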

In [None]:
# Sample variance and standard deviation (pandas default: ddof=1)
variance_sales = data['SALES'].var()  # Sample variance of the SALES column
std_dev_sales = data['SALES'].std()  # Sample standard deviation (square root of the variance)

variance_sales, std_dev_sales
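
The difference between the sample and population formulas can be checked by hand. The sketch below uses hypothetical values (not from the dataset) and recomputes the variance from squared deviations, then compares it with what `var()` and `std()` return.

```python
import numpy as np
import pandas as pd

# Hypothetical values with mean 5 and squared deviations summing to 32.
sales = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

n = sales.count()                           # number of non-missing values
squared_dev = (sales - sales.mean()) ** 2   # squared distance of each value from the mean

sample_var = squared_dev.sum() / (n - 1)    # what Series.var() computes by default
population_var = squared_dev.sum() / n      # what Series.var(ddof=0) computes

print(sample_var, sales.var())
print(population_var, sales.var(ddof=0))
print(np.sqrt(sample_var), sales.std())
```

For a large dataset the two versions are nearly identical; the choice matters mainly for small samples.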

### Visualizations
We will create visualizations to better understand the distribution of sales data.

In [None]:
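# Bar chart of how many rows fall into each of 10 equal-width sales bins
data['SALES'].value_counts(bins=10).sort_index().plot(kind='bar', title='Sales Frequency')
plt.xlabel('Sales range')
plt.ylabel('Frequency')
plt.tight_layout()  # Keep the rotated bin labels inside the figure
plt.show()  # Display the bar chart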

# Histogram for the distribution of sales
data['SALES'].hist(bins=30, edgecolor='black')
plt.title('Distribution of Sales')
plt.xlabel('Sales')
plt.ylabel('Frequency')
plt.show()  # Display the histogram

# Boxplot for sales data
plt.boxplot(data['SALES'].dropna())  # Create a boxplot, dropping any NaN values
plt.title('Boxplot of Sales')
plt.ylabel('Sales')
plt.show()  # Display the boxplot
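
The box plot above summarizes quartiles, and with matplotlib's default setting (`whis=1.5`) points beyond 1.5 times the interquartile range from the box are drawn as outliers. The same numbers can be computed directly. This is a sketch on hypothetical values, not on the real `SALES` column.

```python
import pandas as pd

# Hypothetical values with one obvious high outlier.
sales = pd.Series([10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 60.0])

q1 = sales.quantile(0.25)  # lower edge of the box
q3 = sales.quantile(0.75)  # upper edge of the box
iqr = q3 - q1              # interquartile range (height of the box)

# Fences: whiskers extend to the most extreme data points inside these limits.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the ones a box plot would mark individually.
outliers = sales[(sales < lower_fence) | (sales > upper_fence)]
print(q1, q3, iqr, lower_fence, upper_fence)
print(outliers.tolist())
```

Applying the same steps to `data['SALES'].dropna()` would list the rows that appear as individual points in the plot.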