## Sales Data Analysis

In this notebook, we conduct an exploratory data analysis (EDA) on a sample sales dataset to understand its properties and derive initial insights.


We'll start by loading the dataset and examining the first few rows. This first look shows which columns are present, what kinds of values they hold, and whether anything looks obviously malformed.


In [None]:
import pandas as pd

# Load the CSV data into a pandas DataFrame
sales_data = pd.read_csv("/mnt/data/sample_sales_data.csv")

# Display the first few rows of the dataset
sales_data.head()
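
Beyond `head()`, it can help to check column types, missing values, and duplicate rows before going further. The following is a minimal sketch on a toy DataFrame; the helper `summarize_structure`, the toy columns `unit_price` and `quantity`, and all values are assumptions for illustration, not part of the sales dataset.

```python
import pandas as pd

# Toy stand-in for sales_data; columns and values are illustrative only.
toy_sales = pd.DataFrame({
    "category": ["fruit", "vegetables", None, "fruit"],
    "unit_price": [3.99, 1.49, 2.25, 3.99],
    "quantity": [2, 1, 4, 2],
})


def summarize_structure(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),  # missing values are not counted as distinct
    })


structure = summarize_structure(toy_sales)

# Rows that exactly repeat an earlier row
n_duplicate_rows = int(toy_sales.duplicated().sum())

print(structure)
print("duplicate rows:", n_duplicate_rows)
```

The same helper could be called as `summarize_structure(sales_data)` once the CSV is loaded.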


Next, we'll generate basic statistics for the numerical columns in the dataset. `describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum for each column; the median appears as the `50%` row. Together these give a quick view of how the values are distributed.


In [None]:
# Generate descriptive statistics for the numerical columns
sales_data.describe()
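
As a small illustration of reading the `describe()` output, the sketch below pulls the median from the `50%` row and compares it with the mean. The toy columns and values are assumptions for illustration only.

```python
import pandas as pd

# Toy numeric columns; values are illustrative, not from the sales dataset.
toy_sales = pd.DataFrame({
    "unit_price": [1.0, 2.0, 3.0, 10.0],
    "quantity": [1, 2, 2, 4],
})

summary = toy_sales.describe()

# describe() has no row named "median"; the median is the "50%" row.
median_from_describe = summary.loc["50%"]
median_direct = toy_sales.median()

# A mean well above the median hints at a right-skewed column.
mean_minus_median = summary.loc["mean"] - summary.loc["50%"]

print(summary)
print(mean_minus_median)
```

Here `unit_price` has a mean pulled upward by a single large value, which is the kind of pattern worth checking for in real price or quantity columns.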


Following this, we'll explore the categorical columns `category`, `customer_type`, and `payment_type` to see how often each value occurs.


In [None]:
# Explore the distribution of categorical columns
category_distribution = sales_data['category'].value_counts()
customer_type_distribution = sales_data['customer_type'].value_counts()
payment_type_distribution = sales_data['payment_type'].value_counts()

category_distribution, customer_type_distribution, payment_type_distribution
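
Raw counts are easier to compare across columns when shown alongside proportions. The sketch below uses `value_counts(normalize=True)` for shares and `dropna=False` to surface missing values; the payment labels in the toy data are assumptions for illustration only.

```python
import pandas as pd

# Toy categorical column; labels are illustrative only.
toy_sales = pd.DataFrame({
    "payment_type": ["cash", "credit card", "cash", "e-wallet", None, "cash"],
})

# Absolute counts (missing values are excluded by default)
counts = toy_sales["payment_type"].value_counts()

# Relative frequencies that sum to 1 over the non-missing values
shares = toy_sales["payment_type"].value_counts(normalize=True)

# Include missing values as their own bucket
counts_with_missing = toy_sales["payment_type"].value_counts(dropna=False)

# Side-by-side view of counts and shares
distribution = pd.DataFrame({"count": counts, "share": shares.round(3)})
print(distribution)
```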


To make these distributions easier to compare at a glance, we'll plot them. Specifically:
- A bar chart for product category distribution.
- A pie chart for customer type distribution.
- A pie chart for payment type distribution.


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Bar chart for product category distribution
plt.figure(figsize=(15, 8))
sns.barplot(x=category_distribution.index, y=category_distribution.values, palette="viridis")
plt.xticks(rotation=90)
plt.ylabel('Number of Transactions')
plt.xlabel('Product Category')
plt.title('Distribution of Product Categories')
plt.show()

# Pie charts for customer type and payment type distributions
fig, ax = plt.subplots(1, 2, figsize=(15, 7))
ax[0].pie(
    customer_type_distribution.values,
    labels=customer_type_distribution.index,
    autopct='%1.1f%%',
    startangle=90,
    colors=sns.color_palette("pastel", len(customer_type_distribution)),
)
ax[0].set_title('Distribution of Customer Types')
ax[1].pie(
    payment_type_distribution.values,
    labels=payment_type_distribution.index,
    autopct='%1.1f%%',
    startangle=90,
    colors=sns.color_palette("pastel", len(payment_type_distribution)),
)
ax[1].set_title('Distribution of Payment Types')
plt.tight_layout()
plt.show()


Based on the distributions above, we have the following recommendations:

1. Given the popularity of **fruits** and **vegetables**, consider strategies to improve their availability and visibility in stores.
2. Implement programs or incentives to convert non-members to premium or gold members and strengthen customer loyalty.
3. Make sure every payment method works smoothly at checkout, since customers use all of them.

This analysis is only a first pass. To answer the client's overarching question of "How to better stock the items that they sell" in more detail, we'd need a more comprehensive dataset with additional features, collected over a longer period.
