In [2]:
import pandas as pd

# Problem Description

Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month (YYYY-MM) and the percentage change, rounded to two decimal places, sorted from the beginning of the year to the end of the year.

The percentage change column will be populated from the 2nd month forward and can be calculated as:

### $\frac{\text{thisMonthsRevenue} - \text{lastMonthsRevenue}}{\text{lastMonthsRevenue}} \times 100$
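As a quick sanity check of the formula, here is a minimal sketch in plain Python. The helper name `pct_change` and the revenue figures are hypothetical and are not taken from the dataset.

```python
def pct_change(this_month: float, last_month: float) -> float:
    """Percentage change from last_month to this_month."""
    return (this_month - last_month) / last_month * 100


# Hypothetical revenues, chosen only to illustrate the formula
increase = round(pct_change(120.0, 100.0), 2)  # revenue grew from 100 to 120
decrease = round(pct_change(75.0, 100.0), 2)   # revenue fell from 100 to 75
print(increase, decrease)
```

A positive result means revenue grew compared with the previous month, and a negative result means it shrank.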


## First Look at the Data

In [5]:
transactions = pd.read_csv('sf_transactions.csv')
transactions.head(3)

Unnamed: 0,id,created_at,value,purchase_id
0,1,2019-01-01 00:00:00,172692,43
1,2,2019-01-05 00:00:00,177194,36
2,3,2019-01-09 00:00:00,109513,30


## First Thoughts

* Create a year month column

* Group by year-month and sum `value`

* For each month, except the first, calculate the change [%]

## Data Analysis

In [6]:
# Check for missing values and the dtype of each column
transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 92 entries, 0 to 91
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           92 non-null     int64 
 1   created_at   92 non-null     object
 2   value        92 non-null     int64 
 3   purchase_id  92 non-null     int64 
dtypes: int64(3), object(1)
memory usage: 3.0+ KB


There are no missing values, but the date column (`created_at`) is stored as `object` (strings) rather than as a datetime type.

date -> to_datetime

In [7]:
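# The strings include a time part (e.g. '2019-01-01 00:00:00'), so the format must cover it
transactions['created_at'] = pd.to_datetime(transactions['created_at'], format='%Y-%m-%d %H:%M:%S')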
transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 92 entries, 0 to 91
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   id           92 non-null     int64         
 1   created_at   92 non-null     datetime64[ns]
 2   value        92 non-null     int64         
 3   purchase_id  92 non-null     int64         
dtypes: datetime64[ns](1), int64(3)
memory usage: 3.0 KB
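As an alternative, the dates could be parsed while the file is loaded, using the `parse_dates` argument of `pd.read_csv`. The sketch below inlines the three rows shown by `transactions.head(3)` so that it runs on its own; the names `sample_csv` and `sample` are only illustrative.

```python
import io

import pandas as pd

# The three rows from transactions.head(3), inlined so the sketch is self-contained
sample_csv = io.StringIO(
    "id,created_at,value,purchase_id\n"
    "1,2019-01-01 00:00:00,172692,43\n"
    "2,2019-01-05 00:00:00,177194,36\n"
    "3,2019-01-09 00:00:00,109513,30\n"
)

# parse_dates converts the listed columns to datetime while reading
sample = pd.read_csv(sample_csv, parse_dates=["created_at"])

is_datetime = pd.api.types.is_datetime64_any_dtype(sample["created_at"])
print(is_datetime)
```

With the real file, the same argument would be passed to the `pd.read_csv('sf_transactions.csv')` call, assuming every value in `created_at` follows the format seen above.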


Now that the dates are parsed, let's start.

## Solution

In [14]:
# Create a YYYY-MM column
transactions['year-month']= transactions.created_at.dt.strftime('%Y-%m')

# Group by YYYY-MM and sum value; reset_index gives the 0..n-1 labels that .loc relies on below
gby = transactions.groupby('year-month').value.sum().reset_index().sort_values('year-month')

#Except for the first month, calculate the change %
for i in range(1,gby.shape[0]):
    gby.loc[i,'change (%)'] = (gby.loc[i,'value'] - gby.loc[i-1,'value'])/ gby.loc[i-1,'value'] *100
output = gby[['year-month','change (%)']]
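The explicit loop works, but the same calculation can probably be written without it using `Series.pct_change`, which returns the fractional change from the previous row. This is a minimal sketch on hypothetical monthly totals (the `monthly` frame stands in for `gby`; its numbers are made up for illustration).

```python
import pandas as pd

# Hypothetical monthly totals, standing in for the grouped frame `gby`
monthly = pd.DataFrame({
    "year-month": ["2019-01", "2019-02", "2019-03"],
    "value": [200, 150, 180],
})

# pct_change gives (current - previous) / previous; the first row has no
# previous month, so it stays NaN. Multiply by 100 to get a percentage.
monthly["change (%)"] = (monthly["value"].pct_change() * 100).round(2)

result = monthly[["year-month", "change (%)"]]
print(result)
```

Unlike the loop, this version does not depend on the index labels being 0..n-1, and it still creates the `change (%)` column when there is only one month of data.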

## Final Output

In [15]:
output.round(2)

Unnamed: 0,year-month,change (%)
0,2019-01,
1,2019-02,-28.56
2,2019-03,23.35
3,2019-04,-13.84
4,2019-05,13.49
5,2019-06,-2.78
6,2019-07,-6.0
7,2019-08,28.36
8,2019-09,-4.97
9,2019-10,-12.68
