This notebook runs weekly ETLs for three purposes. **KPIs** keeps an archive of prospection and Lead Generation KPIs such as Marketing Qualified Leads (MQLs) and revenue. **Cluster Brazil Total** delivers the information Brazil's Sales Team needs for its analysis. **Weekly Stats** collects prospection numbers and Go-to-Markets from the Prospection Team.

In [None]:
!pip install -q --upgrade gspread
!pip install -q pydrive

# Google Authorization
---

In [None]:
from google.colab import auth
auth.authenticate_user()
import pandas as pd
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# KPIs
----

In [None]:
#Personal KPI File
kpi = gc.open_by_key('KEY')
# Prospection Archives - Sheet
prosparchives = kpi.worksheet('Prospection Archives')

Accessing and working with Sales Data 6.0

In [None]:
#General Sales - Coorp 6.0 Workbook
wb = gc.open_by_url('KEY')
# Sales Data - Sheet
ws = wb.worksheet("Sales Data")

Accessing Hubspot's Deals Report

In [None]:
#Hubspot - DealsReport
deals_download = drive.CreateFile({'id':'KEY'})
deals_download.GetContentFile('DealsReport.xlsx')
deals_df = pd.read_excel('DealsReport.xlsx')
print(deals_df.columns)

Index(['Deal ID', 'CI', 'Deal Name', 'Pipeline', 'Deal Stage', 'Create Date',
       'First task completed date', 'Last Activity Date', 'Last Contacted',
       'MQL Date', 'SQL Date', 'Opportunity Date', 'Source',
       'Source Marketing', 'Inbound Type', 'Last Modified Date', 'Close Date',
       'Amount in company currency', 'Strategy', 'BDR', 'LGA', 'Deal owner',
       'Channel', 'Country', 'Segment', 'Industry',
       'Lead Response Time (Hours)', 'Demo Date', 'Deal Type'],
      dtype='object')


Filtering and Updating **Prospection Archives**

In [None]:
# get all values from rows from Sales Data
rows = ws.get_all_values()
df = pd.DataFrame.from_records(rows[3:],columns=rows[1])
print(df.columns)
#Apply filters in the df from Sales Data
personaldf = df[df['LGA']=='Vinicius']
print(personaldf)

Index(['', '# employees ( >100)', 'Website in English', 'HQ Non Spanish',
       'Best Place2Work or Ranking Merco', 'HR Department', 'Date (M-D-Y)',
       'Company Name', 'ID', 'LGA', 'Country', 'City', 'Score Corp',
       'Course 1', 'Course 2', 'Course 3\nSoft skills', 'Industry', 'Web Page',
       'Contact Name', 'Score', 'Title', 'Área', 'Email', 'Linkedin', 'Phone',
       'NeverBounce', 'Status', 'Campaign Number \nCampaing \n(día/mes/año)',
       '', 'OUT OF CRITERIA', '', '', '', '', '', '', '', '', '', '', '', ''],
      dtype='object')
         # employees ( >100) Website in English HQ Non Spanish  ...            
25411                201-500                Yes             No  ...            
25412                201-500                Yes             No  ...            
25413                201-500                Yes             No  ...            
25414               501-1000                Yes             No  ...            
25415               501-1000               

In [None]:
# gspread writes plain Python lists, not DataFrames, so build a list of rows:
# the header row followed by the data rows converted through NumPy.
listconvertpersonaldf = [personaldf.columns.tolist()] + personaldf.to_numpy().tolist()
# Keyword arguments keep the call valid across gspread versions that order these parameters differently
prosparchives.update(range_name="A1", values=listconvertpersonaldf)
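
The DataFrame-to-list conversion used for every sheet update could be wrapped in one function, as the commented-out `updateSpreadsheet` stub suggests. The sketch below is only an illustration: `dataframe_to_rows` and the sample data are hypothetical names and values, not part of the original notebook. It also replaces missing values with empty strings, on the assumption that blank cells are wanted in the sheet.

```python
import pandas as pd

def dataframe_to_rows(frame: pd.DataFrame) -> list:
    """Return the header row plus data rows as plain Python values (hypothetical helper)."""
    # Cast to object first so missing values can be replaced with empty strings
    cleaned = frame.astype(object).where(frame.notna(), "")
    return [cleaned.columns.tolist()] + cleaned.to_numpy().tolist()

# Made-up sample data, for illustration only
sample = pd.DataFrame({"Company Name": ["Acme", "Globex"], "Score": [3.0, None]})
rows_out = dataframe_to_rows(sample)
print(rows_out)
```

The returned list can then be passed as the `values` argument of a worksheet's `update` call.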

## Filtering and Updating **KPIs Metrics**

In [None]:
kpiupdate = kpi.worksheet('Input Hubspot')

In [None]:
LGAsHubs = deals_df["LGA"]
uniqueLGAsHubs = LGAsHubs.drop_duplicates().tolist()

In [None]:
# List every distinct LGA value found in the deals report
for lga_name in uniqueLGAsHubs:
  print(lga_name)

Isabella Rivera
nan
Ginna Acuña
Aline Omote
Harbey Morato
Divermedios
Camila Acosta
Own
Juan Manuel Jauregui
Renata Texeira
Daniela Ojeda
Inbound
Juan Pablo Peñuela
Valeria Silvera
Paola Adrianofabre
Melanie Quintero
Diana Dávila
Natalia de Vivero
Laura Restrepo
Angela Martinez
Paula Jaramillo
Tatiana Shayo
Brenda Merino
Vinicius Ramos
Arturo Salazar
Juliana Padilla
Alfredo Loredo
Maria Lucia Pardo
Diego Trujillo
Andrea Moncada
Referral


In [None]:
deals_personal = deals_df[(deals_df["LGA"].isin(['Vinicius Ramos']))]
deals_personal

In [None]:
deals_per2 = deals_personal.copy()

In [None]:
display(deals_personal.columns)

Index(['Deal ID', 'CI', 'Deal Name', 'Pipeline', 'Deal Stage', 'Create Date',
       'First task completed date', 'Last Activity Date', 'Last Contacted',
       'MQL Date', 'SQL Date', 'Opportunity Date', 'Source',
       'Source Marketing', 'Inbound Type', 'Last Modified Date', 'Close Date',
       'Amount in company currency', 'Strategy', 'BDR', 'LGA', 'Deal owner',
       'Channel', 'Country', 'Segment', 'Industry',
       'Lead Response Time (Hours)', 'Demo Date', 'Deal Type'],
      dtype='object')

Metrics:

In [None]:
deals_personal.dtypes

Deal ID                                int64
CI                                   float64
Deal Name                             object
Pipeline                              object
Deal Stage                            object
Create Date                   datetime64[ns]
First task completed date     datetime64[ns]
Last Activity Date            datetime64[ns]
Last Contacted                datetime64[ns]
MQL Date                      datetime64[ns]
SQL Date                      datetime64[ns]
Opportunity Date              datetime64[ns]
Source                                object
Source Marketing                      object
Inbound Type                          object
Last Modified Date            datetime64[ns]
Close Date                    datetime64[ns]
Amount in company currency           float64
Strategy                              object
BDR                                   object
LGA                                   object
Deal owner                            object
Channel   

In [None]:
deals_opp = deals_per2[deals_per2['Opportunity Date']> '2021-07-12']

In [None]:
deals_sqls = deals_per2[deals_per2['SQL Date']> '2021-07-12']

In [None]:
kpis_sqlvalue = deals_sqls[['Create Date','SQL Date','Amount in company currency']].groupby(pd.Grouper(key='SQL Date',freq='M')).sum()
print(kpis_sqlvalue)

            Amount in company currency
SQL Date                              
2021-08-31                      6350.0
2021-09-30                     37228.0
2021-10-31                     29890.0
2021-11-30                    132805.0


In [None]:
kpis_oppvalue = deals_opp[['Create Date','Opportunity Date','Amount in company currency']].groupby(pd.Grouper(key='Opportunity Date',freq='M')).sum()
print(kpis_oppvalue)

                  Amount in company currency
Opportunity Date                            
2021-09-30                           18793.0
2021-10-31                               0.0
2021-11-30                          126175.0


In [None]:
kpis_metricsMQL = deals_personal[['Create Date', 'MQL Date']].groupby(pd.Grouper(key='Create Date',freq='M')).count()
print(kpis_metricsMQL)

             MQL Date
Create Date          
2021-08-31          3
2021-09-30          9
2021-10-31          6
2021-11-30         11


In [None]:
kpis_metricsSQL = deals_personal[['SQL Date','Deal Name']].groupby(pd.Grouper(key='SQL Date',freq='M')).count()
print(kpis_metricsSQL)

            Deal Name
SQL Date             
2021-08-31          1
2021-09-30          4
2021-10-31          2
2021-11-30          4


In [None]:
kpis_metricsOPPs = deals_personal[['Opportunity Date','Deal Name']].groupby(pd.Grouper(key='Opportunity Date',freq='M')).count()
print(kpis_metricsOPPs)

                  Deal Name
Opportunity Date           
2021-09-30                2
2021-10-31                0
2021-11-30                2


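Merge the KPI dataframes

In [None]:
# Give each amount column a distinct name so the columns do not collide after merging
kpis_sqlvalue = kpis_sqlvalue.rename(columns={'Amount in company currency': 'SQL Amount'})
kpis_oppvalue = kpis_oppvalue.rename(columns={'Amount in company currency': 'Opportunity Amount'})
dataframes = [kpis_metricsMQL,kpis_sqlvalue,kpis_oppvalue]

In [None]:
# Each frame is indexed by month end under a different index name, so join on the index
kpis_final = pd.merge(kpis_metricsMQL,kpis_sqlvalue,left_index=True,right_index=True)

In [None]:
kpis_final_opp = pd.merge(kpis_final,kpis_oppvalue,how='outer',left_index=True,right_index=True)
print(kpis_final_opp)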

In [None]:
kpis_final_opp.fillna('', inplace=True)
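
Monthly totals keyed on different date columns can only be combined if they share a common month key. The following is a minimal sketch of that idea on made-up data; `monthly_sum`, the `Amount` column and the output column names are assumptions for illustration, and it groups by calendar-month periods rather than by the `pd.Grouper` month-end frequency used above.

```python
import pandas as pd

# Toy deals table; the values are made up
deals = pd.DataFrame({
    "SQL Date": pd.to_datetime(["2021-08-10", "2021-09-02", "2021-09-20"]),
    "Opportunity Date": pd.to_datetime(["2021-09-15", None, "2021-11-03"]),
    "Amount": [100.0, 200.0, 50.0],
})

def monthly_sum(frame, date_col, out_name):
    """Sum 'Amount' per calendar month of date_col; rows without a date are dropped."""
    dated = frame.dropna(subset=[date_col])
    months = dated[date_col].dt.to_period("M").rename("Month")
    return dated.groupby(months)["Amount"].sum().rename(out_name)

sql_value = monthly_sum(deals, "SQL Date", "SQL Amount")
opp_value = monthly_sum(deals, "Opportunity Date", "Opportunity Amount")

# Both series share a monthly PeriodIndex, so an outer join aligns them by month
kpis = pd.concat([sql_value, opp_value], axis=1).sort_index().fillna(0.0)
print(kpis)
```

Because the index is the month itself, months that appear in only one series still get a row, with the missing side filled with zero.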

Update command

In [None]:
listkpisdf = [kpis_final_opp.columns.tolist()] + kpis_final_opp.to_numpy().tolist()
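# Keyword arguments keep the call valid across gspread versions that order these parameters differently
kpiupdate.update(range_name="A1", values=listkpisdf)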

# Cluster Brazil Total
--------


In [None]:
cluster = gc.open_by_key('KEY')
# LGA Input - Sheet
lga_input = cluster.worksheet('Teste LGA Input')

Accessing Sales Data 6.0

In [None]:
clusterdf = df[(df["LGA"].isin(['Aline', 'Vinicius', 'Tatiana','Renata']))]
cluster1_lgainput_vita = clusterdf[["Date (M-D-Y)","Company Name","LGA","Country","Industry","Score"]]

Accessing Sales Data 5.0 (Ali's)

In [None]:
#General Sales - Coorp 5.0 Workbook
wb2 = gc.open_by_url('KEY')
# Sales Data - Sheet
ws2 = wb2.worksheet("Sales Data")

In [None]:
# get all values from rows from Sales Data
rows = ws2.get_all_values()
df2 = pd.DataFrame.from_records(rows[3:],columns=rows[1])
#Apply filters in the df from Sales Data
#personaldf = df[(df["LGA"]=="Vinicius") | (df["LGA"]=="Tatiana") | (df["LGA"]=="Aline")]
personaldf2 = df2[(df2["LGA"].isin(['Aline']))]
cluster1_lgainput_ali = personaldf2[["Date (M-D-Y)","Company Name","LGA","Country","Industry","Score"]]
print(cluster1_lgainput_ali)

Joining the two DataFrames into one for Cluster Review

In [None]:
Cluster1_LGA_DF = pd.concat([cluster1_lgainput_ali,cluster1_lgainput_vita])
print(Cluster1_LGA_DF.head())

In [None]:
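# Keep only rows that have both a company name and a score; the filtered frame must be assigned back
has_company = Cluster1_LGA_DF['Company Name'].str.len() > 0
has_score = Cluster1_LGA_DF['Score'].str.len() > 0
Cluster1_LGA_DF = Cluster1_LGA_DF[has_company & has_score]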
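
Boolean-mask filtering returns a new DataFrame and leaves the original untouched, so the result only takes effect once it is assigned. A small sketch on made-up data (the `cluster_sample` frame and its values are hypothetical):

```python
import pandas as pd

# Made-up rows: one complete, one without a name, one without a score
cluster_sample = pd.DataFrame({
    "Company Name": ["Acme", "", "Globex"],
    "Score": ["4", "3", ""],
})

# Keep only rows where both fields are non-empty, then assign the result back
has_name = cluster_sample["Company Name"].str.len() > 0
has_score = cluster_sample["Score"].str.len() > 0
cluster_sample = cluster_sample[has_name & has_score].reset_index(drop=True)
print(cluster_sample)
```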

Update Command

In [None]:
# Build the header row plus data rows as plain Python lists, which is what gspread writes
listconvertpersonaldf = [Cluster1_LGA_DF.columns.tolist()] + Cluster1_LGA_DF.to_numpy().tolist()
lga_input.update(range_name='A1', values=listconvertpersonaldf)

## BDR Cluster 1 - Update

### Workbook Access & Filtering

In [None]:
# BDR Input - Sheet
bdr_input = cluster.worksheet('Teste BDR input')

In [None]:
#Hubspot - DealsReport
deals_download = drive.CreateFile({'id':'KEY'})
deals_download.GetContentFile('DealsReport.xlsx')
deals_df = pd.read_excel('DealsReport.xlsx')
print(deals_df.columns)

Index(['Deal ID', 'CI', 'Deal Name', 'Pipeline', 'Deal Stage', 'Create Date',
       'First task completed date', 'Last Activity Date', 'Last Contacted',
       'MQL Date', 'SQL Date', 'Opportunity Date', 'Source',
       'Source Marketing', 'Inbound Type', 'Last Modified Date', 'Close Date',
       'Amount in company currency', 'Strategy', 'BDR', 'LGA', 'Deal owner',
       'Channel', 'Country', 'Segment', 'Industry',
       'Lead Response Time (Hours)', 'Demo Date', 'Deal Type'],
      dtype='object')


In [None]:
BDRss = deals_df['BDR']
BDRSUNI = BDRss.drop_duplicates().tolist()

In [None]:
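# .copy() makes the filtered frame independent, so the column assignments below do not raise SettingWithCopyWarning
debs_df = deals_df[(deals_df["BDR"].isin(['Débora Boschini','Lara Almeida','Gabriela Iglesias']))].copy()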

In [None]:
Opps_Dates = debs_df["Opportunity Date"] = debs_df["Opportunity Date"].dt.strftime('%Y-%m-%d %H:%M:%S')



In [None]:
MQL_Dates = debs_df["MQL Date"] = debs_df["MQL Date"].dt.strftime('%Y-%m-%d %H:%M:%S')




In [None]:
SQL_Dates = debs_df["SQL Date"] = debs_df["SQL Date"].dt.strftime('%Y-%m-%d %H:%M:%S')




In [None]:
debs_df["MQL Dates"] = MQL_Dates
debs_df["SQL Dates"] = SQL_Dates
debs_df["OPPs Dates"] = Opps_Dates
print(debs_df.columns)

Index(['Deal ID', 'CI', 'Deal Name', 'Pipeline', 'Deal Stage', 'Create Date',
       'First task completed date', 'Last Activity Date', 'Last Contacted',
       'MQL Date', 'SQL Date', 'Opportunity Date', 'Source',
       'Source Marketing', 'Inbound Type', 'Last Modified Date', 'Close Date',
       'Amount in company currency', 'Strategy', 'BDR', 'LGA', 'Deal owner',
       'Channel', 'Country', 'Segment', 'Industry',
       'Lead Response Time (Hours)', 'Demo Date', 'Deal Type', 'MQL Dates',
       'SQL Dates', 'OPPs Dates'],
      dtype='object')




In [None]:
bdr_dates_df = pd.concat([Opps_Dates,MQL_Dates,SQL_Dates], axis=1)

In [None]:
debs_df.fillna('', inplace=True)



In [None]:
cluster1_bdrinput = debs_df[["Deal Name","BDR","LGA","Country","Industry","MQL Dates","SQL Dates","OPPs Dates","Amount in company currency"]]
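
Formatting date columns on a filtered subset is safest when the subset is an explicit copy, because pandas then knows the assignment targets an independent frame. A minimal sketch on made-up data follows; the BDR names `Ana` and `Bia` and the sample timestamps are hypothetical.

```python
import pandas as pd

# Made-up deals; the last row has no MQL date
deals = pd.DataFrame({
    "BDR": ["Ana", "Bia", "Ana"],
    "MQL Date": pd.to_datetime(["2021-08-01 09:30:00", "2021-08-05 14:00:00", None]),
})

# .copy() makes the subset an independent frame, so column assignment is unambiguous
subset = deals[deals["BDR"].isin(["Ana"])].copy()

# strftime leaves missing dates as missing values; replace those with an empty string
subset["MQL Dates"] = subset["MQL Date"].dt.strftime("%Y-%m-%d %H:%M:%S").fillna("")
print(subset["MQL Dates"].tolist())
```

The original `deals` frame is left without the new column, which confirms the subset is independent.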


In [None]:
# Build the header row plus data rows as plain Python lists, which is what gspread writes
listconvertbdrdf = [cluster1_bdrinput.columns.tolist()] + cluster1_bdrinput.to_numpy().tolist()
bdr_input.update(range_name="A1", values=listconvertbdrdf)

# Weekly Stats
-----


In [None]:
# Cami Estatísticas File
stats_vini = gc.open_by_key('KEY')
#Sheets
stats_vini_sheets = stats_vini.worksheet('Input Py')

In [None]:
# Cami Estatísticas File
stats_rena = gc.open_by_key('KEY')
#Sheets
stats_rena_sheets = stats_rena.worksheet('Input Py')

In [None]:
# Cami Estatísticas File
stats_tati = gc.open_by_key('KEY')
#Sheets
stats_tati_sheets = stats_tati.worksheet('Input Py')

In [None]:
# Stats DF
st_df = df.copy()
st_df['Date (M-D-Y)'] = pd.to_datetime(st_df['Date (M-D-Y)'])
# .copy() avoids SettingWithCopyWarning when columns are added below
stats_df = st_df[['LGA','Date (M-D-Y)','Company Name','# employees ( >100)','Industry']].copy()
# Duplicate companies are dropped per LGA in the cells below
# ISO week number; Series.dt.week was removed in pandas 2.0
stats_df['WeekNum'] = stats_df['Date (M-D-Y)'].dt.isocalendar().week.astype(str)
stats_df['Date (M-D-Y)'] = stats_df['Date (M-D-Y)'].astype(str)

In [None]:
stats_df
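
The week number of a date column can be read from the ISO calendar accessor, `Series.dt.isocalendar()`. A short sketch with made-up dates:

```python
import pandas as pd

# Three sample dates in 2021
dates = pd.to_datetime(pd.Series(["2021-01-04", "2021-07-12", "2021-12-31"]))

# isocalendar() returns year, week and day columns; take the ISO week
week_numbers = dates.dt.isocalendar().week.astype(int)
print(week_numbers.tolist())  # [1, 28, 52]
```

ISO weeks start on Monday, so dates in the first days of January can belong to the last week of the previous year.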

**Vini's Stats**

In [None]:
#Apply filters in the df from Sales Data
vinidf = stats_df[stats_df['LGA']=='Vinicius']
vinidf_final = vinidf.drop_duplicates('Company Name')

In [None]:
print(vinidf_final)

In [None]:
# Build the header row plus data rows as plain Python lists, which is what gspread writes
listconvertvinidf = [vinidf_final.columns.tolist()] + vinidf_final.to_numpy().tolist()
stats_vini_sheets.update(range_name="A1", values=listconvertvinidf)

**Tati's Stats**

In [None]:
#Apply filters in the df from Sales Data
tatidf = stats_df[stats_df['LGA']=='Tatiana']
tatidf_final = tatidf.drop_duplicates('Company Name')

In [None]:
# Build the header row plus data rows as plain Python lists, which is what gspread writes
listconverttatidf = [tatidf_final.columns.tolist()] + tatidf_final.to_numpy().tolist()
stats_tati_sheets.update(range_name="A1", values=listconverttatidf)

**Rena's Stats**

In [None]:
#Apply filters in the df from Sales Data
renadf = stats_df[stats_df['LGA']=='Renata']
renadf_final = renadf.drop_duplicates('Company Name')

In [None]:
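# Build the header row plus data rows as plain Python lists, which is what gspread writes
listconvertrenadf = [renadf_final.columns.tolist()] + renadf_final.to_numpy().tolist()
stats_rena_sheets.update(range_name="A1", values=listconvertrenadf)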
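
The three per-LGA blocks follow the same filter, de-duplicate and update pattern, so they could be driven by one loop over a mapping from LGA name to worksheet. The sketch below is an assumption-laden illustration: `FakeWorksheet` is a hypothetical stand-in that only records the call, and the sample data is made up. In the notebook the mapping would hold the real `Input Py` worksheets.

```python
import pandas as pd

class FakeWorksheet:
    """Hypothetical stand-in for a gspread worksheet; it only records the call."""
    def __init__(self):
        self.written = None

    def update(self, range_name, values):
        self.written = (range_name, values)

# Made-up stats rows; Vinicius has a duplicated company
stats = pd.DataFrame({
    "LGA": ["Vinicius", "Vinicius", "Tatiana", "Renata"],
    "Company Name": ["Acme", "Acme", "Globex", "Initech"],
})

sheets = {
    "Vinicius": FakeWorksheet(),
    "Tatiana": FakeWorksheet(),
    "Renata": FakeWorksheet(),
}

for lga, sheet in sheets.items():
    # Filter to one LGA and keep the first row per company
    per_lga = stats[stats["LGA"] == lga].drop_duplicates("Company Name")
    payload = [per_lga.columns.tolist()] + per_lga.to_numpy().tolist()
    sheet.update(range_name="A1", values=payload)

print(sheets["Vinicius"].written)
```

Adding another LGA then only requires one more entry in the mapping.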