# Sales & Revenue Analysis with Python

This notebook analyzes a synthetic sales dataset using Python and pandas.
The goal is to demonstrate data loading, cleaning, transformation, and business-oriented analysis.


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
customers = pd.read_csv("customers.csv")
products = pd.read_csv("products.csv")
orders = pd.read_csv("orders.csv")
order_items = pd.read_csv("order_items.csv")

In [5]:
orders["order_date"] = pd.to_datetime(orders["order_date"])

In [6]:
orders.isnull().sum(), order_items.isnull().sum()

(order_id       0
 customer_id    0
 order_date     0
 dtype: int64,
 order_item_id    0
 order_id         0
 product_id       0
 quantity         0
 unit_price       0
 dtype: int64)

Neither `orders` nor `order_items` contains missing values, so every order item has the `quantity` and `unit_price` needed to compute revenue.
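The same check can be made easier to scan when several tables are involved. The sketch below is illustrative only: `missing_report` is a hypothetical helper, not part of the notebook, and the toy frames stand in for the real CSV files. It collects the per-column null counts into one Series so that only the problem columns need to be displayed.

```python
import pandas as pd


def missing_report(frames):
    """Return missing-value counts indexed by (table, column).

    `frames` maps a table name to its DataFrame. Hypothetical helper
    written for this illustration.
    """
    counts = {name: df.isnull().sum() for name, df in frames.items()}
    return pd.concat(counts, names=["table", "column"])


# Toy stand-ins for the real tables (assumed data, one deliberate gap).
toy_orders = pd.DataFrame({"order_id": [1, 2], "customer_id": [1, 2]})
toy_items = pd.DataFrame(
    {"order_id": [1, 2], "quantity": [1, None], "unit_price": [10.0, 5.0]}
)

report = missing_report({"orders": toy_orders, "order_items": toy_items})

# Keep only the columns that actually contain missing values.
problems = report[report > 0]
print(problems)
```

On the notebook's actual data, `problems` would be empty, which matches the output shown above.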

In [7]:
sales_df = (
    order_items
    .merge(orders, on="order_id", how="left")
    .merge(products, on="product_id", how="left")
    .merge(customers, on="customer_id", how="left")
)

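Left joins keep every row of `order_items`: if an order, product, or customer has no match, the row is preserved and the columns from that table are filled with `NaN` instead of the row being dropped.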
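Because a left join hides unmatched keys behind `NaN` values, it can help to make them visible. The following is a minimal sketch on toy data (the frames and the unmatched `product_id` 99 are assumptions for illustration). It uses two documented options of `DataFrame.merge`: `indicator=True`, which adds a `_merge` column, and `validate="many_to_one"`, which raises an error if the right-hand key is duplicated and would therefore multiply rows.

```python
import pandas as pd

# Toy order items; product_id 99 deliberately has no matching product.
toy_items = pd.DataFrame(
    {
        "order_item_id": [1, 2, 3],
        "order_id": [1, 1, 2],
        "product_id": [1, 3, 99],
        "quantity": [1, 2, 1],
        "unit_price": [1150, 140, 50],
    }
)
toy_products = pd.DataFrame(
    {"product_id": [1, 3], "product_name": ["Laptop", "Headphones"]}
)

checked = toy_items.merge(
    toy_products,
    on="product_id",
    how="left",
    validate="many_to_one",  # raises MergeError if product_id repeats in toy_products
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Rows that found no matching product are still present, flagged as left_only.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["order_item_id", "product_id"]])
```

The row count of `checked` equals that of `toy_items`, which is the property the left join is meant to guarantee.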


In [8]:
sales_df["revenue"] = sales_df["quantity"] * sales_df["unit_price"]

In [9]:
sales_df.head()

|   | order_item_id | order_id | product_id | quantity | unit_price | customer_id | order_date | product_name | category | price | customer_name | region | revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1150 | 1 | 2024-01-15 | Laptop | Electronics | 1200 | Ana García | Spain | 1150 |
| 1 | 2 | 1 | 3 | 2 | 140 | 1 | 2024-01-15 | Headphones | Accessories | 150 | Ana García | Spain | 280 |
| 2 | 3 | 2 | 2 | 1 | 780 | 2 | 2024-01-20 | Smartphone | Electronics | 800 | Carlos López | Mexico | 780 |
| 3 | 4 | 3 | 4 | 1 | 290 | 3 | 2024-02-05 | Office Chair | Furniture | 300 | Lucía Fernández | Argentina | 290 |
| 4 | 5 | 4 | 5 | 2 | 75 | 1 | 2024-02-18 | Desk Lamp | Furniture | 80 | Ana García | Spain | 150 |


In [10]:
sales_df["revenue"].sum()

np.int64(4000)

This is the total revenue across all order items in the dataset, expressed in the same units as `unit_price`.

## Data Preparation Summary

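The four datasets (`order_items`, `orders`, `products`, and `customers`) were merged into a single dataframe, `sales_df`, to simplify analysis.
A `revenue` column was computed as `quantity` × `unit_price`, using the `unit_price` recorded in `order_items` rather than the `price` column from `products`, so that it reflects the value of each sale as recorded.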
This prepared dataset serves as the foundation for all subsequent analysis.
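As a sanity check on the revenue definition, the five rows displayed by `sales_df.head()` can be recomputed in isolation. The values below are copied from that output. The `list_value` column is an addition for illustration, and reading `price` as a list price is an assumption, since the notebook does not say so.

```python
import pandas as pd

# The five rows shown by sales_df.head(), restricted to the relevant columns.
head_rows = pd.DataFrame(
    {
        "product_name": [
            "Laptop", "Headphones", "Smartphone", "Office Chair", "Desk Lamp",
        ],
        "quantity": [1, 2, 1, 1, 2],
        "unit_price": [1150, 140, 780, 290, 75],
        "price": [1200, 150, 800, 300, 80],  # assumed to be the list price
    }
)

# Revenue as defined in the notebook: quantity times the recorded unit price.
head_rows["revenue"] = head_rows["quantity"] * head_rows["unit_price"]

# Illustrative comparison: the same quantities valued at the products price.
head_rows["list_value"] = head_rows["quantity"] * head_rows["price"]

print(head_rows["revenue"].tolist())   # [1150, 280, 780, 290, 150]
print(head_rows["revenue"].sum())      # 2650
print(head_rows["list_value"].sum())   # 2760
```

These five rows account for 2650 of the 4000 total; the remainder comes from order items not shown in the preview.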