Monthly Percentage Difference

Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year.
The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month's revenue - last month's revenue) / last month's revenue)*100.
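As a quick sanity check of the formula, the arithmetic can be worked by hand for one pair of months. The sketch below uses the January and February 2019 revenue totals that appear in the notebook output further down; the variable names are illustrative only.

```python
# Monthly revenue totals taken from the notebook output for 2019-01 and 2019-02.
last_month = 1_332_636
this_month = 952_031

# ((this month's revenue - last month's revenue) / last month's revenue) * 100
pct_change = round((this_month - last_month) / last_month * 100, 2)
print(pct_change)  # -28.56
```

The result matches the value the notebook reports for 2019-02.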

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime
pd.options.display.float_format = "{:,.2f}".format

In [3]:
sf_transactions = pd.read_excel("../CSV/sf_transactions.xlsx", header=1)
sf_transactions.head()

Unnamed: 0,id,created_at,value,purchase_id
0,1,2019-01-01,172692,43
1,2,2019-01-05,177194,36
2,3,2019-01-09,109513,30
3,4,2019-01-13,164911,30
4,5,2019-01-17,198872,39


In [4]:
sf_transactions['created_at'] = sf_transactions['created_at'].apply(pd.to_datetime)
sf_transactions.head()

Unnamed: 0,id,created_at,value,purchase_id
0,1,2019-01-01,172692,43
1,2,2019-01-05,177194,36
2,3,2019-01-09,109513,30
3,4,2019-01-13,164911,30
4,5,2019-01-17,198872,39


In [5]:
sf_transactions['year_month'] = pd.to_datetime(sf_transactions['created_at']).dt.to_period('M')
sf_transactions.head()

Unnamed: 0,id,created_at,value,purchase_id,year_month
0,1,2019-01-01,172692,43,2019-01
1,2,2019-01-05,177194,36,2019-01
2,3,2019-01-09,109513,30,2019-01
3,4,2019-01-13,164911,30,2019-01
4,5,2019-01-17,198872,39,2019-01


In [6]:
df = sf_transactions.groupby('year_month')['value'].sum().reset_index(name='monthly_revenue').sort_values('year_month')
df.head()

Unnamed: 0,year_month,monthly_revenue
0,2019-01,1332636
1,2019-02,952031
2,2019-03,1174373
3,2019-04,1011869
4,2019-05,1148390


In [7]:
df['prev_value'] = df['monthly_revenue'].shift(1)
df

Unnamed: 0,year_month,monthly_revenue,prev_value
0,2019-01,1332636,
1,2019-02,952031,1332636.0
2,2019-03,1174373,952031.0
3,2019-04,1011869,1174373.0
4,2019-05,1148390,1011869.0
5,2019-06,1116470,1148390.0
6,2019-07,1049530,1116470.0
7,2019-08,1347176,1049530.0
8,2019-09,1280233,1347176.0
9,2019-10,1117846,1280233.0


In [8]:
df['revenue_diff_pct'] = round(((df['monthly_revenue'] - df['prev_value'])/df['prev_value'])*100, 2)
df

Unnamed: 0,year_month,monthly_revenue,prev_value,revenue_diff_pct
0,2019-01,1332636,,
1,2019-02,952031,1332636.0,-28.56
2,2019-03,1174373,952031.0,23.35
3,2019-04,1011869,1174373.0,-13.84
4,2019-05,1148390,1011869.0,13.49
5,2019-06,1116470,1148390.0,-2.78
6,2019-07,1049530,1116470.0,-6.0
7,2019-08,1347176,1049530.0,28.36
8,2019-09,1280233,1347176.0,-4.97
9,2019-10,1117846,1280233.0,-12.68


In [9]:
result = df[['year_month','revenue_diff_pct']].fillna('')
result

Unnamed: 0,year_month,revenue_diff_pct
0,2019-01,
1,2019-02,-28.56
2,2019-03,23.35
3,2019-04,-13.84
4,2019-05,13.49
5,2019-06,-2.78
6,2019-07,-6.0
7,2019-08,28.36
8,2019-09,-4.97
9,2019-10,-12.68


Solution Walkthrough
This walkthrough explains the notebook above step by step. The goal is a table with one row per month that holds the year-month date and the month-over-month percentage change in revenue, rounded to two decimal places and sorted chronologically.

Understanding The Data
The data is a table named sf_transactions with one row per purchase. The solution uses two of its columns:

created_at: The date on which the purchase occurred.
value: The value of the purchase.

The table also contains the columns id and purchase_id, which the solution does not use.

The Problem Statement
The task is to calculate the month-over-month percentage change in revenue using the given table of purchases. The output should include the year-month date (in the format 'YYYY-MM') and the corresponding percentage change, rounded to the 2nd decimal point. The output table should be sorted from the beginning of the year to the end of the year.

Breaking Down The Code
Let's break down the code step by step to understand the solution:

Importing the necessary libraries:
import pandas as pd
import numpy as np
from datetime import datetime
The code begins by importing pandas, numpy, and datetime. Only pandas is actually used in the steps below, so the numpy and datetime imports can be dropped without changing the result.

Setting the display format for float values:
pd.options.display.float_format = "{:,.2f}".format
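This line sets the display format for float values to two decimal places with a comma as the thousands separator. It changes only how values are displayed, not the values stored in the DataFrame.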

Converting the 'created_at' column to datetime format:
sf_transactions["created_at"] = sf_transactions["created_at"].apply(
    pd.to_datetime
)
This statement converts the 'created_at' column of the 'sf_transactions' table to datetime format by applying the pd.to_datetime function to each value in the column.
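pd.to_datetime also accepts a whole column, which avoids the element-by-element apply call. The following is a minimal sketch of that variant; the small sample frame is hypothetical and only mirrors the first rows of sf_transactions.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of sf_transactions.
sample = pd.DataFrame({"created_at": ["2019-01-01", "2019-01-05", "2019-01-09"]})

# Passing the whole column converts it in one vectorized call.
sample["created_at"] = pd.to_datetime(sample["created_at"])

# The column now has a datetime dtype instead of plain strings.
print(sample["created_at"].dtype)
```

Both forms should give the same column; the vectorized call is usually the faster of the two on large tables.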

Creating a new column 'year_month':
sf_transactions["year_month"] = pd.to_datetime(
    sf_transactions["created_at"]
).dt.to_period("M")
This statement creates a new column named 'year_month' in the 'sf_transactions' table. The .dt.to_period('M') method maps each date in 'created_at' to its monthly period, which is displayed as 'YYYY-MM'. Because 'created_at' was already converted to datetime in the previous step, the pd.to_datetime call here is redundant but harmless.
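To make the effect of to_period concrete, the sketch below converts three hypothetical dates to monthly periods. It also shows strftime as one possible alternative when plain 'YYYY-MM' strings are wanted instead of Period objects; that alternative is not part of the original solution.

```python
import pandas as pd

# Hypothetical dates: one in January, two in February.
dates = pd.Series(pd.to_datetime(["2019-01-17", "2019-02-03", "2019-02-28"]))

# Monthly periods: every date in the same month maps to the same value.
periods = dates.dt.to_period("M")

# Plain 'YYYY-MM' strings, if text is preferred over Period objects.
labels = dates.dt.strftime("%Y-%m")

print(periods.astype(str).tolist())  # ['2019-01', '2019-02', '2019-02']
print(labels.tolist())               # ['2019-01', '2019-02', '2019-02']
```

Periods sort chronologically and keep their monthly frequency, which is convenient for the grouping step that follows.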

Grouping the data by year and month, and calculating the sum of 'value':
df = (
    sf_transactions.groupby("year_month")["value"]
    .sum()
    .reset_index(name="monthly_revenue")
    .sort_values("year_month")
)
This statement groups the 'sf_transactions' table by the 'year_month' column and calculates the sum of the 'value' column for each group. The reset_index(name='monthly_revenue') method turns the grouped result back into a DataFrame and names the sum column 'monthly_revenue'. Finally, the DataFrame is sorted by 'year_month'. Since groupby already sorts by the group keys by default, the sort_values call mainly makes the chronological order explicit.
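The aggregation can be tried on a tiny frame. In this sketch the two January values are the first two rows of the notebook output, while the February row is made up for illustration.

```python
import pandas as pd

# First two values come from the notebook output; the February row is made up.
sample = pd.DataFrame({
    "year_month": pd.PeriodIndex(["2019-01", "2019-01", "2019-02"], freq="M"),
    "value": [172692, 177194, 100000],
})

# One row per month, with the purchase values summed into monthly_revenue.
monthly = (
    sample.groupby("year_month")["value"]
    .sum()
    .reset_index(name="monthly_revenue")
)
print(monthly["monthly_revenue"].tolist())  # [349886, 100000]
```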

Adding a column with the previous month's revenue:
df["prev_value"] = df["monthly_revenue"].shift(1)
This line adds a new column named 'prev_value' to the DataFrame 'df' which contains the previous month's revenue. The shift(1) method moves the values of the 'monthly_revenue' column down by one row, so each row sees the revenue of the row before it. The first month has no predecessor, so its 'prev_value' is NaN, which is also why the column is stored as floats.
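Note that shift(1) works on row positions, so it returns the previous month only when no month is missing from the table. The data in this notebook has consecutive months, so the solution is unaffected. As a hedged sketch for data with gaps, one option is to reindex over the full monthly range before shifting; the sample frame and the names full_range and filled are hypothetical.

```python
import pandas as pd

# Hypothetical monthly totals with March missing.
monthly = pd.DataFrame({
    "year_month": pd.PeriodIndex(["2019-01", "2019-02", "2019-04"], freq="M"),
    "monthly_revenue": [100, 120, 90],
})

# A plain shift pairs April with February, because it only looks one row up.
naive = monthly["monthly_revenue"].shift(1)

# Reindexing over every month in the range turns the gap into an explicit NaN row.
full_range = pd.period_range(
    monthly["year_month"].min(), monthly["year_month"].max(), freq="M"
)
filled = (
    monthly.set_index("year_month")
    .reindex(full_range)
    .rename_axis("year_month")
    .reset_index()
)
filled["prev_value"] = filled["monthly_revenue"].shift(1)
print(filled)
```

With the gap made explicit, April's previous value is NaN rather than February's revenue, so no misleading percentage is produced for it.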

Calculating the month-over-month percentage change in revenue:
df["revenue_diff_pct"] = round(
    ((df["monthly_revenue"] - df["prev_value"]) / df["prev_value"])
    * 100,
    2,
)
This statement calculates the month-over-month percentage change in revenue and stores the result in a new column named 'revenue_diff_pct'. It applies the formula ((this month's revenue - last month's revenue) / last month's revenue)*100 and rounds the result to 2 decimal places. For the first month 'prev_value' is NaN, so the percentage change is NaN as well.
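pandas also offers Series.pct_change, which computes the same ratio without a helper column. The sketch below compares it with the manual formula on the first three monthly totals from the notebook output; it is an alternative formulation, not the method used in the solution.

```python
import pandas as pd

# First three monthly revenue totals from the notebook output.
revenue = pd.Series([1332636, 952031, 1174373], name="monthly_revenue")

# Manual formula, as in the solution.
prev = revenue.shift(1)
manual = ((revenue - prev) / prev * 100).round(2)

# pct_change returns a fraction, so it is scaled to a percentage.
builtin = (revenue.pct_change() * 100).round(2)

print(builtin.tolist())  # [nan, -28.56, 23.35]
```

Both approaches leave the first month as NaN and agree on the remaining months after rounding.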

Selecting the required columns and filling missing values:
result = df[["year_month", "revenue_diff_pct"]].fillna("")
This line selects the 'year_month' and 'revenue_diff_pct' columns from the 'df' DataFrame and stores the result in a new DataFrame called 'result'. The fillna('') call replaces the missing percentage change of the first month with an empty string, so that row is shown blank rather than as NaN.
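One side effect worth knowing is that filling a numeric column with a string changes its dtype. The sketch below illustrates this on three hypothetical values shaped like 'revenue_diff_pct'; whether it matters depends on what is done with the result afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical column shaped like revenue_diff_pct: first month missing.
pct = pd.Series([np.nan, -28.56, 23.35], name="revenue_diff_pct")

# Filling with an empty string mixes text and numbers in one column.
as_text = pct.fillna("")

print(pct.dtype)
print(as_text.dtype)
```

If the column is needed for further arithmetic, keeping the NaN and filling only at display or export time is a possible alternative.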

Bringing It All Together
The complete code for calculating the month-over-month percentage change in revenue is as follows:

import pandas as pd
import numpy as np
from datetime import datetime
pd.options.display.float_format = "{:,.2f}".format

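# Load the purchases table (same call as in the notebook above).
sf_transactions = pd.read_excel("../CSV/sf_transactions.xlsx", header=1)

sf_transactions["created_at"] = sf_transactions["created_at"].apply(
    pd.to_datetime
)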
sf_transactions["year_month"] = pd.to_datetime(
    sf_transactions["created_at"]
).dt.to_period("M")
df = (
    sf_transactions.groupby("year_month")["value"]
    .sum()
    .reset_index(name="monthly_revenue")
    .sort_values("year_month")
)
df["prev_value"] = df["monthly_revenue"].shift(1)
df["revenue_diff_pct"] = round(
    ((df["monthly_revenue"] - df["prev_value"]) / df["prev_value"])
    * 100,
    2,
)
result = df[["year_month", "revenue_diff_pct"]].fillna("")

Conclusion
The code snippet provided calculates the month-over-month percentage change in revenue using a table of purchases by date. The resulting DataFrame 'result' contains the year-month date and the corresponding percentage change, rounded to the 2nd decimal point, sorted from the beginning of the year to the end of the year.