### Converting new bookings forecasts for deferred revenue model

Since Karen Burgess changed roles, I now have two sources of data for the bookings forecast:

DX - 
DME - 

The data files they provide are structured very differently, so I need to reshape both of them to get a single bookings dataframe.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import pickle

### DME bookings

In [2]:
filename = r'/Volumes/Treasury/Financial_Database/Deferred_Revenue/Inputs/DATA_2020_p12/DME_Bookings_FY21_Plan.xlsx'
sheetname = 'Raw'
df_DME = pd.read_excel(filename, sheetname)

In [3]:
df_DME.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4403 entries, 0 to 4402
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Metrics          4403 non-null   object 
 1   Profit center    4403 non-null   object 
 2   Market Area      4403 non-null   object 
 3   Market Segement  4403 non-null   object 
 4   GTM              4403 non-null   object 
 5   Q1 2021          4403 non-null   float64
 6   Q2 2021          4403 non-null   float64
 7   Q3 2021          4403 non-null   float64
 8   Q4 2021          4403 non-null   float64
 9   2021             4403 non-null   float64
dtypes: float64(5), object(5)
memory usage: 344.1+ KB


In [4]:
df_DME.sample(5)

Unnamed: 0,Metrics,Profit center,Market Area,Market Segement,GTM,Q1 2021,Q2 2021,Q3 2021,Q4 2021,2021
2553,Net ACV,10800 - Acrobat Desk,SEA (R),EDUCATION,Strategic,-2916.0,0.0,-400.1562,0.0,-3316.156
553,Total Subscription Attrition,13450 - Stock Photography,Switzerland (MA),COMMERCIAL,Corporate,-8223.643,-8985.996285,73011.120612,-42290.914504,13510.57
2788,Net ACV,IS17 - Adobe Sign,Japan (R),COMMERCIAL,Territory,216587.1,344513.318009,376764.719168,329357.481727,1267223.0
823,Total Subscription Attrition,IS15 - Acrobat DC,Switzerland (MA),GOVERNMENT,Corporate,-9314.473,-717.893164,-1495.732462,-2110.574143,-13638.67
902,Total Subscription Attrition,10800 - Acrobat Desk,EMEA (G),COMMERCIAL,Corporate,-1483631.0,-967392.599598,-441973.632527,-607492.171553,-3500490.0


In [5]:
# Clean Column Names
df_DME.columns

Index(['Metrics', 'Profit center', 'Market Area', 'Market Segement', 'GTM',
       'Q1 2021', 'Q2 2021', 'Q3 2021', 'Q4 2021', '2021'],
      dtype='object')

### The DX bookings have no information about segment. We need to delete segment and then group by the following
- pc_descr
- geo
- region
- market_area


In [6]:
df_DME = df_DME.rename(columns = {'Metrics': 'metrics',
                        'Profit center': 'profit_center',
                        'Market Area': 'market_area',
                        'Market Segement': 'segment',
                        'Q1 2021':'Q1_2021',
                        'Q2 2021':'Q2_2021',
                        'Q3 2021':'Q3_2021',
                        'Q4 2021':'Q4_2021' 
                        })

In [7]:
df_DME = df_DME.drop(columns = ['segment', 'GTM', '2021'])

In [8]:
df_DME.columns

Index(['metrics', 'profit_center', 'market_area', 'Q1_2021', 'Q2_2021',
       'Q3_2021', 'Q4_2021'],
      dtype='object')

### Clear out anything that is not 'Net ACV'


In [9]:
df_DME['metrics'].value_counts(dropna=False)

Net ACV                         1797
ASV & Usage Based               1510
Total Subscription Attrition    1096
Name: metrics, dtype: int64

In [10]:
# .copy() so the new columns added below go onto an independent frame, not a slice
df_DME = df_DME[df_DME['metrics']=='Net ACV'].copy()

In [11]:
df_DME.head(5)

Unnamed: 0,metrics,profit_center,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021
1096,Net ACV,EB10 - Creative,AMER (G),12830970.0,13262380.0,19437170.0,20683690.0
1097,Net ACV,EB10 - Creative,AMER (G),-420283.6,829917.1,1133272.0,1270177.0
1098,Net ACV,EB10 - Creative,AMER (G),3921053.0,3845633.0,5875375.0,6629117.0
1099,Net ACV,EB10 - Creative,AMER (G),-259452.3,-300933.2,-1303246.0,-499390.9
1100,Net ACV,EB10 - Creative,AMER (G),2279720.0,2284183.0,3444284.0,3827079.0


# The DME profit_center is completely nested on the DME raw sheet.

We will need to understand the nesting and see if there is a way to adjust for this.


In [12]:
df_DME['profit_center'].value_counts()

IS10 - Creative - Professional    161
10100 - Design                    161
GP10 - Creative                   161
EB10 - Creative                   161
GP15 - Document Cloud             155
EB15 - Document Cloud             155
IS15 - Acrobat DC                 146
10800 - Acrobat Desk              146
14400 - Adobe Sign                102
IS17 - Adobe Sign                 102
13450 - Stock Photography          97
IS18 - Stock Photography           97
10850 - DCE                        78
10110 - CCE + Stock                75
Name: profit_center, dtype: int64

# After checking the spreadsheet, it appears GP10 numbers map to EB10 and EB15 numbers map to GP15.
So there is no further subset of the broader EB mapping to GP (there is not a separate GP within an EB).

### To fix the data here, we will split the profit center on the hyphen (-).
- First column is BU_id
- Second column is BU_segment

We will then filter the dataframe to include only EB10 and EB15.
The BU_segment will be kept (Creative, Document Cloud)
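
As a minimal sketch of the split described above, the same result can be reached with pandas string methods instead of `str.find` slicing. The toy labels below are copied from the `profit_center` values shown earlier; the variable names (`pc`, `parts`, `keep`) are illustrative only and not part of the notebook.

```python
import pandas as pd

# Toy profit-center labels in the same "<id> - <description>" shape as the raw sheet
pc = pd.Series(['EB10 - Creative', 'EB15 - Document Cloud', '10110 - CCE + Stock'])

# Split on the first hyphen only, then strip the padding around each piece
parts = pc.str.split('-', n=1, expand=True)
bu_id = parts[0].str.strip()
bu_segment = parts[1].str.strip()

# Keep only the top-level EB rows
keep = bu_id.isin(['EB10', 'EB15'])
print(bu_segment[keep].tolist())
```

Splitting with `n=1` matters if a description ever contains a hyphen of its own, since only the first hyphen separates the id from the description.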


In [13]:
# creating the BU_ID
df_DME['BU_id'] =  df_DME['profit_center'].apply(lambda st: st[0:st.find("-")])
df_DME['BU_segment'] = df_DME['profit_center'].apply(lambda st: st[st.find("-")+1:])

In [14]:
df_DME['BU_id'] = df_DME['BU_id'].str.strip()
df_DME['BU_segment'] = df_DME['BU_segment'].str.strip()

In [15]:
# check the BU_id value counts
df_DME['BU_id'].value_counts(dropna=False)

GP10     161
EB10     161
IS10     161
10100    161
GP15     155
EB15     155
IS15     146
10800    146
14400    102
IS17     102
IS18      97
13450     97
10850     78
10110     75
Name: BU_id, dtype: int64

In [16]:
list_BU_keepers = ['EB10', 'EB15']

In [17]:
# .copy() so later column assignments do not act on a slice of the previous frame
df_DME = df_DME[df_DME['BU_id'].isin(list_BU_keepers)].copy()

In [18]:
df_DME['BU_segment'].value_counts(dropna=False)

Creative          161
Document Cloud    155
Name: BU_segment, dtype: int64

## Now Dealing with the GEO, Region, Market Area mess

It looks like there are three separate cuts of the data that form a hierarchy:

1. G is GEO
2. R is region
3. MA is Market Area

I am going to split these into separate columns and do successive forward fills to cover all of them.
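
To make the idea concrete, here is a small sketch on made-up rows (the numbers and the `toy` frame are hypothetical). It uses a regular expression with `str.extract` rather than the `str.find` slicing used in the cells below, but the forward-fill logic is the same: label only the rows at one level, then fill downward so every market-area row inherits its parent.

```python
import pandas as pd

# Hypothetical rows in the same "<name> (<level>)" shape as the raw sheet
toy = pd.DataFrame({
    'market_area': ['AMER (G)', 'North America (R)', 'Canada (MA)',
                    'United States (MA)', 'Latin America (R)', 'Brazil (MA)'],
    'Q1_2021': [6.0, 5.0, 2.0, 3.0, 1.0, 1.0],
})

# Pull out the bare name and the level code inside the parentheses
extracted = toy['market_area'].str.extract(r'^(?P<name>.*?)\s*\((?P<level>[^)]*)\)$')
toy['name'] = extracted['name']
toy['level'] = extracted['level']

# Label only the rows at each level, then forward fill down the hierarchy
toy['geo'] = toy['name'].where(toy['level'] == 'G').ffill()
toy['region'] = toy['name'].where(toy['level'] == 'R').ffill()

# The market-area rows now carry their parent geo and region
leaves = toy[toy['level'] == 'MA']
print(leaves[['geo', 'region', 'name', 'Q1_2021']])
```

This only works because the sheet lists each parent row before its children; if the row order changed, the forward fill would attach market areas to the wrong parent.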



In [19]:
# extract the level code between the parentheses (G, R, or MA)
df_DME['in_parens'] =  df_DME['market_area'].apply(lambda st: st[st.find("(")+1:st.find(")")])

In [20]:
df_DME['in_parens'].value_counts()

MA    177
R     102
G      37
Name: in_parens, dtype: int64

In [21]:
# check the dataframe before deleting the parentheses (and the code between them) from the 'market_area' column
df_DME.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 316 entries, 1096 to 2163
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   metrics        316 non-null    object 
 1   profit_center  316 non-null    object 
 2   market_area    316 non-null    object 
 3   Q1_2021        316 non-null    float64
 4   Q2_2021        316 non-null    float64
 5   Q3_2021        316 non-null    float64
 6   Q4_2021        316 non-null    float64
 7   BU_id          316 non-null    object 
 8   BU_segment     316 non-null    object 
 9   in_parens      316 non-null    object 
dtypes: float64(4), object(6)
memory usage: 27.2+ KB


In [22]:
df_DME['market_area'] = df_DME['market_area'].apply(lambda st: st[0:st.find("(")-1])
df_DME.head(5)

Unnamed: 0,metrics,profit_center,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,in_parens
1096,Net ACV,EB10 - Creative,AMER,12830970.0,13262380.0,19437170.0,20683690.0,EB10,Creative,G
1097,Net ACV,EB10 - Creative,AMER,-420283.6,829917.1,1133272.0,1270177.0,EB10,Creative,G
1098,Net ACV,EB10 - Creative,AMER,3921053.0,3845633.0,5875375.0,6629117.0,EB10,Creative,G
1099,Net ACV,EB10 - Creative,AMER,-259452.3,-300933.2,-1303246.0,-499390.9,EB10,Creative,G
1100,Net ACV,EB10 - Creative,AMER,2279720.0,2284183.0,3444284.0,3827079.0,EB10,Creative,G


In [23]:
df_DME['market_area'].value_counts()

Korea                   24
India                   18
North America           15
United States           15
AMER                    15
China                   12
ASIA                    12
Aus and New Zealand     12
Southeast Asia          12
Greater China           12
SEA                     12
ANZ                     12
Canada                  10
Japan                    8
Brazil                   8
Hong Kong & Taiwan       8
Latin America            8
Nordic                   6
Middle East              6
France                   6
Central Europe           6
United Kingdom           6
Northern Europe          6
Southwest Europe         6
SSA & Israel             6
Benelux                  6
Switzerland              6
Germany                  6
Strat. Latin America     6
Iberica                  6
Italy                    6
EMEA                     6
JPN                      4
Mexico                   3
Russia & CIS             3
Eastern Europe           3
Name: market_area, dtype: int64

In [24]:
df_DME.head(10)

Unnamed: 0,metrics,profit_center,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,in_parens
1096,Net ACV,EB10 - Creative,AMER,12830970.0,13262380.0,19437170.0,20683690.0,EB10,Creative,G
1097,Net ACV,EB10 - Creative,AMER,-420283.6,829917.1,1133272.0,1270177.0,EB10,Creative,G
1098,Net ACV,EB10 - Creative,AMER,3921053.0,3845633.0,5875375.0,6629117.0,EB10,Creative,G
1099,Net ACV,EB10 - Creative,AMER,-259452.3,-300933.2,-1303246.0,-499390.9,EB10,Creative,G
1100,Net ACV,EB10 - Creative,AMER,2279720.0,2284183.0,3444284.0,3827079.0,EB10,Creative,G
1101,Net ACV,EB10 - Creative,AMER,1520516.0,1897964.0,1755988.0,1382500.0,EB10,Creative,G
1102,Net ACV,EB10 - Creative,AMER,-2833.68,0.0,-18591.05,-3333.574,EB10,Creative,G
1103,Net ACV,EB10 - Creative,North America,11632490.0,12072000.0,17632070.0,18664150.0,EB10,Creative,R
1104,Net ACV,EB10 - Creative,North America,-364743.7,894220.9,1166460.0,1287489.0,EB10,Creative,R
1105,Net ACV,EB10 - Creative,North America,3921053.0,3845633.0,5875375.0,6629117.0,EB10,Creative,R


In [25]:
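# split on the first hyphen; unlike BU_id / BU_segment above, these are not stripped,
# so pc_ID keeps a trailing space and pc_descr keeps a leading space
df_DME['pc_ID'] = df_DME['profit_center'].apply(lambda st: st[0:st.find('-')])
df_DME['pc_descr'] = df_DME['profit_center'].apply(lambda st: st[st.find('-')+1:])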

In [26]:
df_DME.sample(20)

Unnamed: 0,metrics,profit_center,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,in_parens,pc_ID,pc_descr
1187,Net ACV,EB10 - Creative,Korea,800655.7,779105.5,891039.8,937944.5,EB10,Creative,MA,EB10,Creative
1186,Net ACV,EB10 - Creative,Korea,261000.0,270000.0,270000.0,270000.0,EB10,Creative,R,EB10,Creative
2118,Net ACV,EB15 - Document Cloud,Central Europe,25179.23,76149.45,98721.07,230401.6,EB15,Document Cloud,R,EB15,Document Cloud
1243,Net ACV,EB10 - Creative,France,0.0,1862.573,361771.3,362649.0,EB10,Creative,MA,EB10,Creative
2138,Net ACV,EB15 - Document Cloud,SSA & Israel,0.0,-12844.26,0.0,0.0,EB15,Document Cloud,MA,EB15,Document Cloud
1155,Net ACV,EB10 - Creative,Greater China,267541.0,372762.1,403324.3,418434.0,EB10,Creative,R,EB10,Creative
2107,Net ACV,EB15 - Document Cloud,SEA,39000.0,48750.0,48750.0,65000.0,EB15,Document Cloud,R,EB15,Document Cloud
1209,Net ACV,EB10 - Creative,Central Europe,226173.9,241757.2,553494.6,896329.9,EB10,Creative,R,EB10,Creative
1101,Net ACV,EB10 - Creative,AMER,1520516.0,1897964.0,1755988.0,1382500.0,EB10,Creative,G,EB10,Creative
2038,Net ACV,EB15 - Document Cloud,Latin America,449342.3,573233.4,870363.5,1166855.0,EB15,Document Cloud,R,EB15,Document Cloud


Adding the columns geo and region (market_area already exists).

In [27]:
df_DME['geo'] = df_DME[df_DME['in_parens']=='G']['market_area']

In [28]:
df_DME.tail(60)

Unnamed: 0,metrics,profit_center,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,in_parens,pc_ID,pc_descr,geo
2104,Net ACV,EB15 - Document Cloud,SEA,10100.13,13016.13,12615.97,16734.13,EB15,Document Cloud,R,EB15,Document Cloud,
2105,Net ACV,EB15 - Document Cloud,SEA,5578.341,5578.341,5578.341,10078.34,EB15,Document Cloud,R,EB15,Document Cloud,
2106,Net ACV,EB15 - Document Cloud,SEA,12963.45,33966.44,198139.4,292834.1,EB15,Document Cloud,R,EB15,Document Cloud,
2107,Net ACV,EB15 - Document Cloud,SEA,39000.0,48750.0,48750.0,65000.0,EB15,Document Cloud,R,EB15,Document Cloud,
2108,Net ACV,EB15 - Document Cloud,Southeast Asia,462020.0,430457.7,357097.9,461331.7,EB15,Document Cloud,MA,EB15,Document Cloud,
2109,Net ACV,EB15 - Document Cloud,Southeast Asia,209810.8,250312.9,277798.0,295365.8,EB15,Document Cloud,MA,EB15,Document Cloud,
2110,Net ACV,EB15 - Document Cloud,Southeast Asia,10100.13,13016.13,12615.97,16734.13,EB15,Document Cloud,MA,EB15,Document Cloud,
2111,Net ACV,EB15 - Document Cloud,Southeast Asia,5578.341,5578.341,5578.341,10078.34,EB15,Document Cloud,MA,EB15,Document Cloud,
2112,Net ACV,EB15 - Document Cloud,Southeast Asia,12963.45,33966.44,198139.4,292834.1,EB15,Document Cloud,MA,EB15,Document Cloud,
2113,Net ACV,EB15 - Document Cloud,Southeast Asia,39000.0,48750.0,48750.0,65000.0,EB15,Document Cloud,MA,EB15,Document Cloud,


In [29]:
df_DME['geo'] = df_DME['geo'].ffill()

In [30]:
df_DME['geo'].value_counts(dropna=False)

ASIA    134
EMEA     90
AMER     80
JPN      12
Name: geo, dtype: int64

In [31]:
#df_DME.head(60)

### Comfortable that the forward fill of 'geo' worked well

### Working on the region now
Note: There will be a problem with the geo-level rows here. After the forward fill they will all contain a region even though they are subtotals (apart from the first geo block, which has nothing above it to fill from). This is OK because we will only be using the market area rows and can recreate the region and geo totals from the new columns we have created.
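
A short sketch of why dropping the subtotal rows loses nothing: once each market-area row carries its `geo` and `region`, the subtotals can be rebuilt with a `groupby`. The `ma_rows` frame and its numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical market-area rows, already tagged with geo and region
ma_rows = pd.DataFrame({
    'geo': ['AMER', 'AMER', 'AMER'],
    'region': ['North America', 'North America', 'Latin America'],
    'market_area': ['Canada', 'United States', 'Brazil'],
    'Q1_2021': [2.0, 3.0, 1.0],
})

# Rebuild the region and geo subtotals from the leaf rows
region_totals = ma_rows.groupby(['geo', 'region'], as_index=False)['Q1_2021'].sum()
geo_totals = ma_rows.groupby('geo', as_index=False)['Q1_2021'].sum()

print(region_totals)
print(geo_totals)
```

Comparing totals rebuilt this way against the original R and G rows would be one way to confirm that the forward fill assigned every market area to the right parent.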


In [32]:
df_DME['region'] = df_DME[df_DME['in_parens']=='R']['market_area']
df_DME.head(20)

Unnamed: 0,metrics,profit_center,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,in_parens,pc_ID,pc_descr,geo,region
1096,Net ACV,EB10 - Creative,AMER,12830970.0,13262380.0,19437170.0,20683690.0,EB10,Creative,G,EB10,Creative,AMER,
1097,Net ACV,EB10 - Creative,AMER,-420283.6,829917.1,1133272.0,1270177.0,EB10,Creative,G,EB10,Creative,AMER,
1098,Net ACV,EB10 - Creative,AMER,3921053.0,3845633.0,5875375.0,6629117.0,EB10,Creative,G,EB10,Creative,AMER,
1099,Net ACV,EB10 - Creative,AMER,-259452.3,-300933.2,-1303246.0,-499390.9,EB10,Creative,G,EB10,Creative,AMER,
1100,Net ACV,EB10 - Creative,AMER,2279720.0,2284183.0,3444284.0,3827079.0,EB10,Creative,G,EB10,Creative,AMER,
1101,Net ACV,EB10 - Creative,AMER,1520516.0,1897964.0,1755988.0,1382500.0,EB10,Creative,G,EB10,Creative,AMER,
1102,Net ACV,EB10 - Creative,AMER,-2833.68,0.0,-18591.05,-3333.574,EB10,Creative,G,EB10,Creative,AMER,
1103,Net ACV,EB10 - Creative,North America,11632490.0,12072000.0,17632070.0,18664150.0,EB10,Creative,R,EB10,Creative,AMER,North America
1104,Net ACV,EB10 - Creative,North America,-364743.7,894220.9,1166460.0,1287489.0,EB10,Creative,R,EB10,Creative,AMER,North America
1105,Net ACV,EB10 - Creative,North America,3921053.0,3845633.0,5875375.0,6629117.0,EB10,Creative,R,EB10,Creative,AMER,North America


In [33]:
df_DME['region'] = df_DME['region'].ffill()
df_DME['region'].value_counts(dropna=False)

North America       40
Latin America       37
Southwest Europe    34
Greater China       32
SEA                 30
Northern Europe     30
ANZ                 24
Korea               24
Central Europe      24
India               18
Japan               16
NaN                  7
Name: region, dtype: int64

In [34]:
df_DME.tail(40)

Unnamed: 0,metrics,profit_center,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,in_parens,pc_ID,pc_descr,geo,region
2124,Net ACV,EB15 - Document Cloud,Switzerland,0.0,194.2695,38721.07,14398.38,EB15,Document Cloud,MA,EB15,Document Cloud,EMEA,Central Europe
2125,Net ACV,EB15 - Document Cloud,Switzerland,71612.97,66141.2,95195.4,326199.6,EB15,Document Cloud,MA,EB15,Document Cloud,EMEA,Central Europe
2126,Net ACV,EB15 - Document Cloud,Eastern Europe,33632.13,36196.2,159125.3,204538.3,EB15,Document Cloud,MA,EB15,Document Cloud,EMEA,Central Europe
2127,Net ACV,EB15 - Document Cloud,Russia & CIS,35295.56,48806.11,50100.99,116023.9,EB15,Document Cloud,MA,EB15,Document Cloud,EMEA,Central Europe
2128,Net ACV,EB15 - Document Cloud,Northern Europe,3764673.0,4523071.0,3921933.0,8809946.0,EB15,Document Cloud,R,EB15,Document Cloud,EMEA,Northern Europe
2129,Net ACV,EB15 - Document Cloud,Northern Europe,-4310.175,-12844.26,58.87947,4747.28,EB15,Document Cloud,R,EB15,Document Cloud,EMEA,Northern Europe
2130,Net ACV,EB15 - Document Cloud,Northern Europe,168735.3,586957.2,483221.2,664062.7,EB15,Document Cloud,R,EB15,Document Cloud,EMEA,Northern Europe
2131,Net ACV,EB15 - Document Cloud,Nordic,929095.3,1152535.0,836562.6,2779185.0,EB15,Document Cloud,MA,EB15,Document Cloud,EMEA,Northern Europe
2132,Net ACV,EB15 - Document Cloud,Nordic,0.0,0.0,0.0,1948.999,EB15,Document Cloud,MA,EB15,Document Cloud,EMEA,Northern Europe
2133,Net ACV,EB15 - Document Cloud,Nordic,26106.46,64028.2,114317.0,182383.4,EB15,Document Cloud,MA,EB15,Document Cloud,EMEA,Northern Europe


#### Now we only want to keep the market areas

In [35]:
df_DME = df_DME[df_DME['in_parens']=='MA'].copy()
df_DME['market_area'].value_counts(dropna=False)

United States           15
Southeast Asia          12
Korea                   12
China                   12
Aus and New Zealand     12
Canada                  10
India                    9
Hong Kong & Taiwan       8
Brazil                   8
SSA & Israel             6
United Kingdom           6
France                   6
Benelux                  6
Switzerland              6
Germany                  6
Nordic                   6
Strat. Latin America     6
Iberica                  6
Italy                    6
Middle East              6
Japan                    4
Mexico                   3
Eastern Europe           3
Russia & CIS             3
Name: market_area, dtype: int64

In [36]:
len(df_DME)

177

In [37]:
df_DME.sample(20)

Unnamed: 0,metrics,profit_center,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,in_parens,pc_ID,pc_descr,geo,region
1192,Net ACV,EB10 - Creative,Korea,260999.999998,270000.0,270000.0,270000.0,EB10,Creative,MA,EB10,Creative,ASIA,Korea
2042,Net ACV,EB15 - Document Cloud,Brazil,449342.262766,573233.4,870363.5,1166855.0,EB15,Document Cloud,MA,EB15,Document Cloud,AMER,Latin America
1114,Net ACV,EB10 - Creative,Canada,-107311.016392,-52618.21,-43329.15,-18793.36,EB10,Creative,MA,EB10,Creative,AMER,North America
2153,Net ACV,EB15 - Document Cloud,Iberica,0.0,1609.229,193574.7,0.0,EB15,Document Cloud,MA,EB15,Document Cloud,EMEA,Southwest Europe
1151,Net ACV,EB10 - Creative,Aus and New Zealand,520009.698493,454125.8,470668.4,543177.1,EB10,Creative,MA,EB10,Creative,ASIA,ANZ
1170,Net ACV,EB10 - Creative,Hong Kong & Taiwan,42500.0,42500.0,42500.0,42500.0,EB10,Creative,MA,EB10,Creative,ASIA,Greater China
2080,Net ACV,EB15 - Document Cloud,Hong Kong & Taiwan,123462.955655,97747.37,152374.1,147863.3,EB15,Document Cloud,MA,EB15,Document Cloud,ASIA,Greater China
2032,Net ACV,EB15 - Document Cloud,United States,884775.875018,1226003.0,1895016.0,2396398.0,EB15,Document Cloud,MA,EB15,Document Cloud,AMER,North America
2066,Net ACV,EB15 - Document Cloud,Aus and New Zealand,491022.387813,481485.8,581470.4,679000.9,EB15,Document Cloud,MA,EB15,Document Cloud,ASIA,ANZ
1113,Net ACV,EB10 - Creative,Canada,335711.243074,350931.2,514795.1,552220.3,EB10,Creative,MA,EB10,Creative,AMER,North America


In [38]:
df_DME = df_DME.drop(columns=['profit_center', 'in_parens' ])

In [39]:
df_DME.head()

Unnamed: 0,metrics,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,pc_ID,pc_descr,geo,region
1110,Net ACV,Canada,609548.258713,623444.413051,918945.896499,984383.352738,EB10,Creative,EB10,Creative,AMER,North America
1111,Net ACV,Canada,-37140.794047,-49386.365826,-116295.456059,-35329.061308,EB10,Creative,EB10,Creative,AMER,North America
1112,Net ACV,Canada,0.0,0.0,-116219.13333,-13053.575018,EB10,Creative,EB10,Creative,AMER,North America
1113,Net ACV,Canada,335711.243074,350931.151868,514795.103694,552220.273334,EB10,Creative,EB10,Creative,AMER,North America
1114,Net ACV,Canada,-107311.016392,-52618.206674,-43329.145301,-18793.361067,EB10,Creative,EB10,Creative,AMER,North America


In [40]:
df_DME.columns

Index(['metrics', 'market_area', 'Q1_2021', 'Q2_2021', 'Q3_2021', 'Q4_2021',
       'BU_id', 'BU_segment', 'pc_ID', 'pc_descr', 'geo', 'region'],
      dtype='object')

In [41]:
df_DME['metrics'].value_counts()

Net ACV    177
Name: metrics, dtype: int64

## To keep the columns the same as the old bookings file, we need to change some column names

- pc_descr to segment
- Add 'BU' which is all 'Digital Media'


In [42]:
df_DME.rename(columns = {'pc_descr': 'segment'}, inplace=True)

In [43]:
df_DME.sample(4)

Unnamed: 0,metrics,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021,BU_id,BU_segment,pc_ID,segment,geo,region
2121,Net ACV,Germany,25179.233,75955.180496,60000.0,216003.234762,EB15,Document Cloud,EB15,Document Cloud,EMEA,Central Europe
1127,Net ACV,Brazil,-21038.832792,-55285.03564,-33187.957736,-11402.781261,EB10,Creative,EB10,Creative,AMER,Latin America
1219,Net ACV,Russia & CIS,130157.246329,241765.884505,253089.574767,246622.838986,EB10,Creative,EB10,Creative,EMEA,Central Europe
2045,Net ACV,Brazil,-3751.1626,-135.057409,-539.189465,-3760.054219,EB15,Document Cloud,EB15,Document Cloud,AMER,Latin America


In [44]:
df_DME['BU'] = 'Digital Media'

In [45]:
# We need to sum away the market-segment detail: it is not included in the DX bookings
df_DME = df_DME.groupby(by = ['BU', 'segment', 'geo', 'region', 'market_area']).sum()
df_DME = df_DME.reset_index()

In [46]:
df_DME = df_DME[['BU', 'segment', 'geo', 'region', 'market_area', 'Q1_2021','Q2_2021', 'Q3_2021', 'Q4_2021']]

In [47]:
df_DME.sample(10)

Unnamed: 0,BU,segment,geo,region,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021
33,Digital Media,Document Cloud,ASIA,Korea,Korea,184078.1,315237.7,383758.3,428030.8
47,Digital Media,Document Cloud,JPN,Japan,Japan,1677403.0,2320611.0,2051876.0,2582073.0
18,Digital Media,Creative,EMEA,Northern Europe,United Kingdom,3448298.0,3228927.0,3991542.0,6352543.0
7,Digital Media,Creative,ASIA,Greater China,Hong Kong & Taiwan,561192.8,634980.9,708514.7,729525.3
1,Digital Media,Creative,AMER,Latin America,Mexico,-25806.51,-10743.23,0.0,-9129.548
15,Digital Media,Creative,EMEA,Northern Europe,Middle East,303142.2,33220.54,318099.2,299655.4
5,Digital Media,Creative,ASIA,ANZ,Aus and New Zealand,1647356.0,2063449.0,2171380.0,3754770.0
34,Digital Media,Document Cloud,ASIA,SEA,Southeast Asia,739472.7,782081.5,899979.5,1141344.0
21,Digital Media,Creative,EMEA,Southwest Europe,Iberica,139474.6,352771.9,719710.5,492612.2
44,Digital Media,Document Cloud,EMEA,Southwest Europe,France,622207.3,1277408.0,948072.6,2115803.0


In [48]:
df_DME.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   BU           48 non-null     object 
 1   segment      48 non-null     object 
 2   geo          48 non-null     object 
 3   region       48 non-null     object 
 4   market_area  48 non-null     object 
 5   Q1_2021      48 non-null     float64
 6   Q2_2021      48 non-null     float64
 7   Q3_2021      48 non-null     float64
 8   Q4_2021      48 non-null     float64
dtypes: float64(4), object(5)
memory usage: 3.5+ KB


In [49]:
df_DME.sum()

BU             Digital MediaDigital MediaDigital MediaDigital...
segment         Creative Creative Creative Creative Creative ...
geo            AMERAMERAMERAMERAMERASIAASIAASIAASIAASIAASIAEM...
region         Latin AmericaLatin AmericaLatin AmericaNorth A...
market_area    BrazilMexicoStrat. Latin AmericaCanadaUnited S...
Q1_2021                                               8.3782e+07
Q2_2021                                              1.00621e+08
Q3_2021                                              1.24743e+08
Q4_2021                                               1.6877e+08
dtype: object

# Done with DME, now moving to DX



In [50]:
filename = r'/Volumes/Treasury/Financial_Database/Deferred_Revenue/Inputs/DATA_2020_p12/DX_Bookings_FY21_Plan.xlsx'
sheetname = 'Sheet1'
start=12
df_DX = pd.read_excel(filename, sheetname, skiprows=start)

In [51]:
df_DX.sample(10)

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,2021,Q1 2021,Q2 2021,Q3 2021,Q4 2021
350,Enterprise,SSA & Israel (MA),14300 - Adobe Campaign,38759.43,5813.91505,9302.26408,8720.872575,14922.38
108,Enterprise,ANZ (R),14800 - Customer Journey Analytics,1000000.0,140000.0,200000.0,250000.0,410000.0
395,Enterprise,Benelux (MA),14600 - Adobe Exp Platform,1050386.0,189069.470059,262596.486193,199573.329507,399146.7
735,Strategic,Benelux (MA),14700 - Real Time CDP,170000.0,30600.0,42500.0,32300.0,64600.0
824,Corporate,All Market Areas,14900 - Journeys,3274044.0,558504.511698,736555.28853,740225.886046,1238759.0
1742,Greenfield,Hong Kong & Taiwan (MA),12630 - Magento,348867.4,59081.366901,88622.050352,96007.221215,105156.7
1359,Territory,India (R),13010 - Assets,263821.8,47487.920281,58040.791454,73870.098214,84422.97
279,Enterprise,Eastern Europe (MA),13020 - Sites,647104.6,103536.733318,135891.96248,148834.054144,258841.8
1583,Territory,France (MA),13030 - Forms,173000.0,31140.0,43250.0,32870.0,65740.0
1712,Greenfield,Aus and New Zealand (MA),12500 - Adobe Target,29218.06,3928.6944,5758.037557,7211.061571,12320.27


In [52]:
df_DX = df_DX.rename(columns = {'Unnamed: 0': 'segment',
                                'Unnamed: 1': 'market_area',
                                'Unnamed: 2': 'profit_center',
                                'Q1 2021':'Q1_2021',
                                'Q2 2021':'Q2_2021',
                                'Q3 2021':'Q3_2021',
                                'Q4 2021':'Q4_2021' 
                                })

In [53]:
df_DX = df_DX.drop(columns = ['segment', '2021'])

## It looks like there are three separate cuts of the data that form a hierarchy again, but they also have an 'All Market Areas' row that is a total by profit center. We don't need this, but there is no need to change the code here.
It will be removed when we filter down to the market area (MA) rows.

1. G is GEO
2. R is region
3. MA is Market Area

I am going to split these into separate columns and do successive forward fills to cover all of them.
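
The reason the 'All Market Areas' rows need no special handling can be checked directly. The sketch below wraps the same slicing used in the cells that follow in a hypothetical helper, `in_parens`; because `str.find` returns -1 when the character is absent, a label without parentheses produces a value that is not 'MA' and so falls out at the market-area filter.

```python
def in_parens(st):
    # Same slicing as the notebook cells: the text between "(" and ")"
    return st[st.find("(") + 1:st.find(")")]

normal = in_parens('Benelux (MA)')
total = in_parens('All Market Areas')  # no parentheses, so both find() calls return -1

print(repr(normal))  # 'MA'
print(repr(total))   # 'All Market Area'
```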



In [54]:
df_DX.sample(10)

Unnamed: 0,market_area,profit_center,Q1_2021,Q2_2021,Q3_2021,Q4_2021
1968,Iberica (MA),13020 - Sites,221760.0,308000.0,234080.0,468160.0
1913,United Kingdom (MA),12500 - Adobe Target,73187.604116,117100.166585,109781.406174,187848.183897
1763,India (MA),12640 - Marketo,18863.310563,23025.918727,28359.433449,29821.258356
944,China (MA),12500 - Adobe Target,1336.07792,1985.922305,2380.771199,3233.099096
1185,Iberica (MA),12500 - Adobe Target,21548.572306,29928.572648,22745.715212,45491.430424
1142,Southwest Europe (R),14700 - Real Time CDP,130140.0,180750.0,137370.0,274740.0
1883,Northern Europe (R),12500 - Adobe Target,91505.893123,146409.428996,137258.839684,234865.125681
1174,France (MA),13020 - Sites,187576.99115,260523.59882,197997.935103,395995.870206
583,Aus and New Zealand (MA),12400 - Adobe Analytics,85296.944731,121963.591258,152464.191125,250366.255225
268,Switzerland (MA),12630 - Magento,136615.934866,179308.414511,196385.406369,341539.837164


In [55]:
# extract the level code between the parentheses (G, R, or MA)
df_DX['in_parens'] =  df_DX['market_area'].apply(lambda st: st[st.find("(")+1:st.find(")")])

In [56]:
df_DX['market_area'] = df_DX['market_area'].apply(lambda st: st[0:st.find("(")-1])

In [57]:
df_DX['pc_ID'] = df_DX['profit_center'].apply(lambda st: st[0:st.find('-')])
df_DX['pc_descr'] = df_DX['profit_center'].apply(lambda st: st[st.find('-')+1:])

In [58]:
df_DX['geo'] = df_DX[df_DX['in_parens']=='G']['market_area']
df_DX['geo'] = df_DX['geo'].ffill()

In [59]:
df_DX['region'] = df_DX[df_DX['in_parens']=='R']['market_area']
df_DX['region'] = df_DX['region'].ffill()

In [60]:
# filter to just include market area
df_DX = df_DX[df_DX['in_parens']=='MA'].copy()

In [61]:
# need to add the BU and segment information to the bookings file so that it matches
df_DX['BU'] = 'Digital Experience'
df_DX['segment'] = 'Experience Cloud'

In [62]:
# drop unnecessary columns and reorder the columns
df_DX = df_DX[['BU', 'segment', 'geo', 'region', 'market_area', 'Q1_2021','Q2_2021', 'Q3_2021', 'Q4_2021']]

In [63]:
# Sum the remaining rows (one per GTM segment and profit center) down to one row per market area, matching the DME layout
df_DX = df_DX.groupby(by = ['BU', 'segment', 'geo', 'region', 'market_area']).sum()
df_DX = df_DX.reset_index()


In [64]:
df_DX.head(20)

Unnamed: 0,BU,segment,geo,region,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021
0,Digital Experience,Experience Cloud,AMER,Latin America,Brazil,3427051.0,2403855.0,5079044.0,7421805.0
1,Digital Experience,Experience Cloud,AMER,North America,Canada,7924589.0,11150700.0,10689460.0,13065770.0
2,Digital Experience,Experience Cloud,AMER,North America,United States,187462800.0,213939600.0,252227800.0,329381600.0
3,Digital Experience,Experience Cloud,ASIA,ANZ,Aus and New Zealand,10551180.0,15073110.0,18841380.0,30899870.0
4,Digital Experience,Experience Cloud,ASIA,Greater China,China,1210874.0,1816312.0,1967671.0,2573108.0
5,Digital Experience,Experience Cloud,ASIA,Greater China,Hong Kong & Taiwan,1348579.0,2022851.0,2191645.0,2866115.0
6,Digital Experience,Experience Cloud,ASIA,India,India,4697714.0,5741650.0,7307555.0,8351491.0
7,Digital Experience,Experience Cloud,ASIA,Korea,Korea,981859.1,1472789.0,1595521.0,2086450.0
8,Digital Experience,Experience Cloud,ASIA,SEA,Southeast Asia,3797880.0,5649756.0,5495533.0,7397304.0
9,Digital Experience,Experience Cloud,EMEA,Central Europe,Eastern Europe,835436.0,1096510.0,1200939.0,2088590.0


In [65]:
df_DX.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23 entries, 0 to 22
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   BU           23 non-null     object 
 1   segment      23 non-null     object 
 2   geo          23 non-null     object 
 3   region       23 non-null     object 
 4   market_area  23 non-null     object 
 5   Q1_2021      23 non-null     float64
 6   Q2_2021      23 non-null     float64
 7   Q3_2021      23 non-null     float64
 8   Q4_2021      23 non-null     float64
dtypes: float64(4), object(5)
memory usage: 1.7+ KB


# Final Check on the data


In [66]:
# DX
df_DX.head(20)

Unnamed: 0,BU,segment,geo,region,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021
0,Digital Experience,Experience Cloud,AMER,Latin America,Brazil,3427051.0,2403855.0,5079044.0,7421805.0
1,Digital Experience,Experience Cloud,AMER,North America,Canada,7924589.0,11150700.0,10689460.0,13065770.0
2,Digital Experience,Experience Cloud,AMER,North America,United States,187462800.0,213939600.0,252227800.0,329381600.0
3,Digital Experience,Experience Cloud,ASIA,ANZ,Aus and New Zealand,10551180.0,15073110.0,18841380.0,30899870.0
4,Digital Experience,Experience Cloud,ASIA,Greater China,China,1210874.0,1816312.0,1967671.0,2573108.0
5,Digital Experience,Experience Cloud,ASIA,Greater China,Hong Kong & Taiwan,1348579.0,2022851.0,2191645.0,2866115.0
6,Digital Experience,Experience Cloud,ASIA,India,India,4697714.0,5741650.0,7307555.0,8351491.0
7,Digital Experience,Experience Cloud,ASIA,Korea,Korea,981859.1,1472789.0,1595521.0,2086450.0
8,Digital Experience,Experience Cloud,ASIA,SEA,Southeast Asia,3797880.0,5649756.0,5495533.0,7397304.0
9,Digital Experience,Experience Cloud,EMEA,Central Europe,Eastern Europe,835436.0,1096510.0,1200939.0,2088590.0


In [67]:
# DME
df_DME.sample(20)

Unnamed: 0,BU,segment,geo,region,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021
25,Digital Media,Document Cloud,AMER,Latin America,Mexico,-587.8793,0.0,0.0,0.0
26,Digital Media,Document Cloud,AMER,Latin America,Strat. Latin America,-4793.599,-1201.923,-22.46623,0.0
2,Digital Media,Creative,AMER,Latin America,Strat. Latin America,-39291.25,-11805.57,-14172.09,-10252.29
44,Digital Media,Document Cloud,EMEA,Southwest Europe,France,622207.3,1277408.0,948072.6,2115803.0
15,Digital Media,Creative,EMEA,Northern Europe,Middle East,303142.2,33220.54,318099.2,299655.4
7,Digital Media,Creative,ASIA,Greater China,Hong Kong & Taiwan,561192.8,634980.9,708514.7,729525.3
11,Digital Media,Creative,EMEA,Central Europe,Eastern Europe,213265.4,233441.0,215792.4,278030.9
17,Digital Media,Creative,EMEA,Northern Europe,SSA & Israel,120558.4,-42623.44,20487.44,136417.0
46,Digital Media,Document Cloud,EMEA,Southwest Europe,Italy,202525.2,191023.7,384494.7,479230.1
13,Digital Media,Creative,EMEA,Central Europe,Russia & CIS,130157.2,241765.9,258158.3,258354.3


In [68]:
df_DME.sample(5)

Unnamed: 0,BU,segment,geo,region,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021
40,Digital Media,Document Cloud,EMEA,Northern Europe,Nordic,955201.7,1216563.0,950879.6,2963517.0
28,Digital Media,Document Cloud,AMER,North America,United States,19417750.0,23456720.0,35558280.0,48877930.0
7,Digital Media,Creative,ASIA,Greater China,Hong Kong & Taiwan,561192.8,634980.9,708514.7,729525.3
11,Digital Media,Creative,EMEA,Central Europe,Eastern Europe,213265.4,233441.0,215792.4,278030.9
21,Digital Media,Creative,EMEA,Southwest Europe,Iberica,139474.6,352771.9,719710.5,492612.2


In [69]:
df_DME.columns

Index(['BU', 'segment', 'geo', 'region', 'market_area', 'Q1_2021', 'Q2_2021',
       'Q3_2021', 'Q4_2021'],
      dtype='object')
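
Before combining the two frames, it can help to confirm programmatically that they share one schema instead of comparing the printed column lists by eye. The following is a minimal sketch, not part of the original workflow: `same_schema`, `dme_demo`, and `dx_demo` are hypothetical names, and the one-row frames are made-up stand-ins for `df_DME` and `df_DX`.

```python
import pandas as pd

# Column names as printed by df_DME.columns above
cols = ['BU', 'segment', 'geo', 'region', 'market_area',
        'Q1_2021', 'Q2_2021', 'Q3_2021', 'Q4_2021']

# Hypothetical one-row stand-ins; the values are invented for illustration only
dme_demo = pd.DataFrame(
    [['Digital Media', 'Creative', 'ASIA', 'Korea', 'Korea', 1.0, 2.0, 3.0, 4.0]],
    columns=cols)
dx_demo = pd.DataFrame(
    [['Digital Experience', 'Experience Cloud', 'ASIA', 'Korea', 'Korea', 5.0, 6.0, 7.0, 8.0]],
    columns=cols)


def same_schema(a, b):
    """Return True when both frames have the same column names, order, and dtypes."""
    return list(a.columns) == list(b.columns) and a.dtypes.equals(b.dtypes)


schema_ok = same_schema(dme_demo, dx_demo)
print(schema_ok)
```

In the notebook itself the call would be `same_schema(df_DME, df_DX)`, assuming both frames are still in memory at this point.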

In [70]:
df_DX.sample(5)

Unnamed: 0,BU,segment,geo,region,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021
11,Digital Experience,Experience Cloud,EMEA,Central Europe,Russia & CIS,172333.6,226187.8,247729.5,430833.9
18,Digital Experience,Experience Cloud,EMEA,Southwest Europe,Benelux,4788053.0,6650073.0,5054056.0,10108110.0
21,Digital Experience,Experience Cloud,EMEA,Southwest Europe,Italy,2816852.0,3912295.0,2973344.0,5946688.0
19,Digital Experience,Experience Cloud,EMEA,Southwest Europe,France,7019966.0,9749952.0,7409964.0,14819930.0
13,Digital Experience,Experience Cloud,EMEA,Med,Mediterranean,-1920000.0,-2800000.0,-2800000.0,-3880000.0


# Checking totals


In [71]:
# sum() also concatenates the text columns; only the Q*_2021 rows are meaningful totals
df_DX.sum()

BU             Digital ExperienceDigital ExperienceDigital Ex...
segment        Experience CloudExperience CloudExperience Clo...
geo            AMERAMERAMERASIAASIAASIAASIAASIAASIAEMEAEMEAEM...
region         Latin AmericaNorth AmericaNorth AmericaANZGrea...
market_area    BrazilCanadaUnited StatesAus and New ZealandCh...
Q1_2021                                              2.88639e+08
Q2_2021                                              3.58238e+08
Q3_2021                                              3.98039e+08
Q4_2021                                              5.68557e+08
dtype: object

In [72]:
# Same check for DME; again only the Q*_2021 rows are meaningful totals
df_DME.sum()

BU             Digital MediaDigital MediaDigital MediaDigital...
segment         Creative Creative Creative Creative Creative ...
geo            AMERAMERAMERAMERAMERASIAASIAASIAASIAASIAASIAEM...
region         Latin AmericaLatin AmericaLatin AmericaNorth A...
market_area    BrazilMexicoStrat. Latin AmericaCanadaUnited S...
Q1_2021                                               8.3782e+07
Q2_2021                                              1.00621e+08
Q3_2021                                              1.24743e+08
Q4_2021                                               1.6877e+08
dtype: object
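
The `.sum()` calls above also concatenate every text column, which buries the four numbers that matter. A sketch of a cleaner totals check restricts the sum to the quarter columns. The frame `demo` and its values are hypothetical stand-ins, not the real bookings data.

```python
import pandas as pd

# Quarter columns as they appear in df_DX and df_DME
quarters = ['Q1_2021', 'Q2_2021', 'Q3_2021', 'Q4_2021']

# Hypothetical two-row stand-in for either bookings frame
demo = pd.DataFrame({
    'BU': ['Digital Media', 'Digital Media'],
    'segment': ['Creative', 'Document Cloud'],
    'Q1_2021': [100.0, 50.0],
    'Q2_2021': [110.0, 60.0],
    'Q3_2021': [120.0, 70.0],
    'Q4_2021': [130.0, 80.0],
})

# Sum only the numeric quarter columns so the text columns are left out
quarter_totals = demo[quarters].sum()

# Full-year total, useful for comparing against the plan's annual figure
annual_total = quarter_totals.sum()

print(quarter_totals)
print(annual_total)
```

Applied to the real data this would be `df_DX[quarters].sum()` and `df_DME[quarters].sum()`. `df.sum(numeric_only=True)` gives the same quarter totals as long as the quarter columns are the only numeric ones.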

In [73]:
# Stack the DME and DX forecasts into one bookings dataframe
df = pd.concat([df_DME, df_DX])

In [74]:
df.sample(30)

Unnamed: 0,BU,segment,geo,region,market_area,Q1_2021,Q2_2021,Q3_2021,Q4_2021
20,Digital Experience,Experience Cloud,EMEA,Southwest Europe,Iberica,2889054.0,4012575.0,3049557.0,6099113.0
7,Digital Experience,Experience Cloud,ASIA,Korea,Korea,981859.1,1472789.0,1595521.0,2086450.0
8,Digital Experience,Experience Cloud,ASIA,SEA,Southeast Asia,3797880.0,5649756.0,5495533.0,7397304.0
32,Digital Media,Document Cloud,ASIA,India,India,756929.4,838168.1,976901.5,1044341.0
35,Digital Media,Document Cloud,EMEA,Central Europe,Eastern Europe,33632.13,36196.2,159125.3,204538.3
6,Digital Media,Creative,ASIA,Greater China,China,6390890.0,7214689.0,7461772.0,7697756.0
28,Digital Media,Document Cloud,AMER,North America,United States,19417750.0,23456720.0,35558280.0,48877930.0
9,Digital Media,Creative,ASIA,Korea,Korea,2855967.0,2971994.0,2983956.0,3035081.0
4,Digital Experience,Experience Cloud,ASIA,Greater China,China,1210874.0,1816312.0,1967671.0,2573108.0
0,Digital Media,Creative,AMER,Latin America,Brazil,1151616.0,1118700.0,1765603.0,1978504.0
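
The sample above shows that the combined frame keeps each source's original row labels, so the same label can refer to both a DME row and a DX row (label 7, for example, is Hong Kong & Taiwan in `df_DME` and Korea in `df_DX`). One possible alternative, sketched here with hypothetical stand-in frames, is to pass `ignore_index=True` to `pd.concat` and then confirm that the combined quarter totals still equal the sum of the two sources.

```python
import numpy as np
import pandas as pd

quarters = ['Q1_2021', 'Q2_2021', 'Q3_2021', 'Q4_2021']

# Hypothetical stand-ins for df_DME and df_DX; the values are invented
dme_demo = pd.DataFrame({
    'BU': ['Digital Media', 'Digital Media'],
    'market_area': ['Korea', 'India'],
    'Q1_2021': [10.0, 20.0], 'Q2_2021': [11.0, 21.0],
    'Q3_2021': [12.0, 22.0], 'Q4_2021': [13.0, 23.0],
})
dx_demo = pd.DataFrame({
    'BU': ['Digital Experience', 'Digital Experience'],
    'market_area': ['Korea', 'India'],
    'Q1_2021': [30.0, 40.0], 'Q2_2021': [31.0, 41.0],
    'Q3_2021': [32.0, 42.0], 'Q4_2021': [33.0, 43.0],
})

# ignore_index=True renumbers the rows 0..n-1 so each label is unique
combined = pd.concat([dme_demo, dx_demo], ignore_index=True)

# Reconcile: combined totals should equal DME totals plus DX totals
expected = dme_demo[quarters].sum() + dx_demo[quarters].sum()
totals_match = np.allclose(combined[quarters].sum(), expected)
index_unique = combined.index.is_unique

print(totals_match, index_unique)
```

Whether to renumber is a design choice: keeping the original labels preserves a link back to each source frame, while a fresh index avoids ambiguous `.loc` lookups later on.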
