Customer Revenue In March

Calculate the total revenue from each customer in March 2019. Include only customers who were active in March 2019.


Output the revenue along with the customer id and sort the results based on the revenue in descending order.

In [1]:
import pandas as pd
import numpy as np

In [4]:
orders = pd.read_csv("../CSV/orders.csv", usecols=[0,1,2,3,4])
orders.head()

Unnamed: 0,id,cust_id,order_date,order_details,total_order_cost
0,1,3,2019-03-04,Coat,100
1,2,3,2019-03-01,Shoes,80
2,3,3,2019-03-07,Skirt,30
3,4,7,2019-02-01,Coat,25
4,5,7,2019-03-10,Shoes,80


In [5]:
orders['order_date'] = orders['order_date'].apply(pd.to_datetime)
orders.head()

Unnamed: 0,id,cust_id,order_date,order_details,total_order_cost
0,1,3,2019-03-04,Coat,100
1,2,3,2019-03-01,Shoes,80
2,3,3,2019-03-07,Skirt,30
3,4,7,2019-02-01,Coat,25
4,5,7,2019-03-10,Shoes,80


In [6]:
march = orders[orders['order_date'].dt.month == 3]
march

Unnamed: 0,id,cust_id,order_date,order_details,total_order_cost
0,1,3,2019-03-04,Coat,100
1,2,3,2019-03-01,Shoes,80
2,3,3,2019-03-07,Skirt,30
4,5,7,2019-03-10,Shoes,80
7,8,15,2019-03-11,Slipper,20
8,9,15,2019-03-01,Jeans,80
9,10,15,2019-03-09,Shirts,50
12,13,12,2019-03-11,Slipper,20


In [7]:
march_2019 = march[march['order_date'].dt.year == 2019]
march_2019

Unnamed: 0,id,cust_id,order_date,order_details,total_order_cost
0,1,3,2019-03-04,Coat,100
1,2,3,2019-03-01,Shoes,80
2,3,3,2019-03-07,Skirt,30
4,5,7,2019-03-10,Shoes,80
7,8,15,2019-03-11,Slipper,20
8,9,15,2019-03-01,Jeans,80
9,10,15,2019-03-09,Shirts,50
12,13,12,2019-03-11,Slipper,20


In [8]:
result = march_2019.groupby(['cust_id'])['total_order_cost'].sum().to_frame('revenue').reset_index().sort_values('revenue', ascending = False)
result

Unnamed: 0,cust_id,revenue
0,3,210
3,15,150
1,7,80
2,12,20


Solution Walkthrough
In this problem, we are given a dataset of orders and we need to calculate the total revenue from each customer in March 2019. We are only interested in customers who were active in March 2019. The output should include the customer ID and revenue, sorted in descending order of revenue.

To solve this problem, we will use the pandas library, a data manipulation and analysis tool for Python. The notebook also imports numpy, although none of the steps below call it.

We will go through the solution code step by step and explain each part in detail.

Let's get started!

Understanding The Data
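Before we dive into the code, let's understand the dataset we are working with. Each row of the orders dataframe is one order, with the columns id (order ID), cust_id (customer ID), order_date, order_details (the item ordered), and total_order_cost. A customer can appear in several rows, so revenue per customer means summing total_order_cost over that customer's orders.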
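To make the column layout concrete, here is a minimal sketch that rebuilds the five rows shown by orders.head() as an in-memory dataframe. The name sample is illustrative only, and the real orders.csv contains more rows than these.

```python
import pandas as pd

# Rows copied from the orders.head() output above; the full file has more rows.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cust_id": [3, 3, 3, 7, 7],
    "order_date": ["2019-03-04", "2019-03-01", "2019-03-07", "2019-02-01", "2019-03-10"],
    "order_details": ["Coat", "Shoes", "Skirt", "Coat", "Shoes"],
    "total_order_cost": [100, 80, 30, 25, 80],
})

# Dates read from a CSV without date parsing arrive as plain strings,
# so order_date is not yet a datetime column at this point.
print(sample.dtypes)
print(sample["cust_id"].nunique(), "distinct customers in the sample")
```

Checking the dtypes first shows why the date conversion step is needed before any month or year filtering.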

The Problem Statement
Our task is to calculate the total revenue from each customer in March 2019, considering only the customers who were active in that month. We also need to include the customer ID in the output and sort the results in descending order of revenue.

Breaking Down The Code
Now let's break down the code and understand each part separately.

import pandas as pd
import numpy as np
The first two lines import the pandas and numpy libraries. pandas does all of the data manipulation in this solution; numpy is imported but not used by any later step.

orders["order_date"] = orders["order_date"].apply(pd.to_datetime)
This line converts the 'order_date' column from strings to datetime values by applying pd.to_datetime() to each element. The conversion is required because the .dt accessor used in the next steps works only on datetime-like columns.
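As an alternative sketch, pd.to_datetime() also accepts a whole column, which converts it in one vectorized call instead of one call per element. The small dates series below is made up for illustration, using dates that appear in the outputs above.

```python
import pandas as pd

# A few order dates as strings, the form they have right after read_csv.
dates = pd.Series(["2019-03-04", "2019-02-01", "2019-03-10"], name="order_date")

# Passing the whole Series converts every value in a single call.
converted = pd.to_datetime(dates)

# Once converted, the .dt accessor exposes the date components.
print(converted.dt.month.tolist())
print(converted.dt.year.tolist())
```

Another option is to pass parse_dates=["order_date"] to pd.read_csv() so the column is parsed while the file is loaded.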

march = orders[orders["order_date"].dt.month == 3]
This line of code creates a new dataframe called 'march' by filtering the 'orders' dataframe for rows where the month in the 'order_date' column is equal to 3 (which represents March). On its own, this filter keeps March orders from any year.

march_2019 = march[march["order_date"].dt.year == 2019]
This line of code creates another new dataframe called 'march_2019' by further filtering the 'march' dataframe for rows where the year in the 'order_date' column is equal to 2019.
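The two filters can also be combined into a single boolean mask. The sketch below assumes a small stand-in orders dataframe built from a few of the rows shown earlier; the variable name in_march_2019 is illustrative.

```python
import pandas as pd

# Stand-in for the orders dataframe, with order_date already converted.
orders = pd.DataFrame({
    "cust_id": [3, 7, 7, 15],
    "order_date": pd.to_datetime(["2019-03-04", "2019-02-01", "2019-03-10", "2019-03-11"]),
    "total_order_cost": [100, 25, 80, 20],
})

# Both conditions in one mask. Each comparison needs its own parentheses
# because & binds more tightly than ==.
in_march_2019 = (orders["order_date"].dt.year == 2019) & (orders["order_date"].dt.month == 3)
march_2019 = orders[in_march_2019]

print(march_2019)
```

The result is the same as filtering in two steps; the single mask simply avoids the intermediate 'march' dataframe.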

result = (
    march_2019.groupby(["cust_id"])["total_order_cost"]
    .sum()
    .to_frame("revenue")
    .reset_index()
    .sort_values("revenue", ascending=False)
)
This statement calculates the total order cost for each customer in the 'march_2019' dataframe. It uses the groupby() function to group the rows by 'cust_id', then selects the 'total_order_cost' column and applies the sum() function to calculate the total for each group. At this point the totals are a Series indexed by 'cust_id'.

The to_frame() function converts that Series into a dataframe whose single column is named 'revenue', and the reset_index() function moves 'cust_id' out of the index into a regular column. Finally, the sort_values() function sorts the dataframe by the 'revenue' column in descending order, and the outcome is stored in 'result'. Because only customers with at least one order in March 2019 remain after filtering, the "active customers" requirement is satisfied automatically.
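To see what each link in the chain produces, the sketch below runs the same steps one at a time. The march_2019 dataframe here is rebuilt from the March rows shown in the notebook output, and the intermediate names totals, framed, and flat are illustrative.

```python
import pandas as pd

# The eight March 2019 rows from the output above (only the columns needed here).
march_2019 = pd.DataFrame({
    "cust_id": [3, 3, 3, 7, 15, 15, 15, 12],
    "total_order_cost": [100, 80, 30, 80, 20, 80, 50, 20],
})

totals = march_2019.groupby("cust_id")["total_order_cost"].sum()  # Series indexed by cust_id
framed = totals.to_frame("revenue")   # one-column DataFrame, cust_id is still the index
flat = framed.reset_index()           # cust_id becomes a regular column
result = flat.sort_values("revenue", ascending=False)

print(result)
```

After sorting, the row labels keep the values assigned by reset_index(), which is why the notebook output shows them out of order. Passing ignore_index=True to sort_values() would renumber them from 0 if that matters for the output.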

Bringing It All Together
The complete code for calculating the total revenue from each customer in March 2019, including only active customers, and sorting the results based on revenue is as follows:

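import pandas as pd
import numpy as np

# Load the orders data (same call as in the notebook cell above)
orders = pd.read_csv("../CSV/orders.csv", usecols=[0, 1, 2, 3, 4])
orders["order_date"] = orders["order_date"].apply(pd.to_datetime)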

march = orders[orders["order_date"].dt.month == 3]
march_2019 = march[march["order_date"].dt.year == 2019]

result = (
    march_2019.groupby(["cust_id"])["total_order_cost"]
    .sum()
    .to_frame("revenue")
    .reset_index()
    .sort_values("revenue", ascending=False)
)

Conclusion
In this walkthrough, we learned how to calculate the total revenue from each customer in March 2019 using pandas. We converted the order dates to datetime values, filtered the dataset to March 2019, grouped the remaining orders by customer, and summed 'total_order_cost' for each one. Finally, we sorted the results in descending order of revenue.
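If the same report is needed for other months, the steps can be wrapped in a small helper. This is a sketch only: the function name customer_revenue and its year and month parameters are hypothetical and not part of the original solution, and the sample rows are the ones shown by orders.head().

```python
import pandas as pd


def customer_revenue(orders: pd.DataFrame, year: int, month: int) -> pd.DataFrame:
    """Total order cost per customer for one calendar month, highest first."""
    dates = pd.to_datetime(orders["order_date"])
    in_period = (dates.dt.year == year) & (dates.dt.month == month)
    return (
        orders[in_period]
        .groupby("cust_id")["total_order_cost"]
        .sum()
        .to_frame("revenue")
        .reset_index()
        .sort_values("revenue", ascending=False)
    )


# Rows copied from the orders.head() output; the real file has more rows.
sample = pd.DataFrame({
    "cust_id": [3, 3, 3, 7, 7],
    "order_date": ["2019-03-04", "2019-03-01", "2019-03-07", "2019-02-01", "2019-03-10"],
    "total_order_cost": [100, 80, 30, 25, 80],
})

march_report = customer_revenue(sample, 2019, 3)
print(march_report)
```

Customer 7's February order is excluded by the mask, so only the March order counts toward that customer's revenue.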