# Description

This notebook was used to prototype ways of post-processing DocSend visit data that has been exported to a Google Sheet in Google Drive (or downloaded as a CSV).

- Pulling the DocSend data through this Zapier integration, which creates Google Sheet rows for new DocSend visits, may work well if combined properly with the Google Sheets API: https://zapier.com/apps/docsend/integrations/google-sheets/175058/create-google-sheet-rows-for-new-visits-in-docsend

## Instructions

- Run `pip install gspread-pandas` in a terminal
- Choose either the Google Sheets method or the CSV method below


### Method 1 - GSheets
- Follow the *Client Credentials* directions here, taking the *Service Account Route*: https://gspread-pandas.readthedocs.io/en/latest/getting_started.html
    - Remember to save your credentials file to the gspread_pandas configuration folder
- Get the email address of your service account
    - It is listed under the Service Accounts heading on the Credentials tab of your Google Cloud project
- Share your Google Sheets file with the service account email
- Set the `gsheets_name` variable to the name of the file
- Run the notebook; it loads from Google Sheets whenever `gsheets_name` is not empty

### Method 2 - CSV Download
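- Download the CSV to the current directory (the folder containing this notebook)
- Set the `csv_file_name` variable to the file name
- Set `gsheets_name` to an empty string (`""`) so the notebook reads the CSV instead of the Google Sheet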
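With the CSV route, `pd.read_csv` leaves `Created At` and `Duration` as plain text. Below is a minimal sketch of a loader that parses them up front. It assumes the column names from the data output further down. It also assumes that raw timestamps look like `2023-11-15 14:36:52 UTC` and raw durations like `00:03:53`, because the unprocessed export format isn't shown here. `load_docsend_csv` is a hypothetical helper, not part of the notebook.

```python
import io

import pandas as pd


def load_docsend_csv(path_or_buffer):
    """Read a DocSend visit export and parse key columns into proper dtypes."""
    df = pd.read_csv(path_or_buffer)  # blank cells already become NaN here
    # Assumed timestamp layout, e.g. "2023-11-15 14:36:52 UTC"; the literal
    # " UTC" suffix is matched by the format and the result is tz-aware.
    df["Created At"] = pd.to_datetime(
        df["Created At"], format="%Y-%m-%d %H:%M:%S UTC", utc=True
    )
    # Assumed "HH:MM:SS" duration strings, converted to Timedelta values.
    df["Duration"] = pd.to_timedelta(df["Duration"])
    # Assumed 0-1 completion fraction; unparseable values become NaN.
    df["% Completion"] = pd.to_numeric(df["% Completion"], errors="coerce")
    return df


# Made-up sample rows standing in for docsend_data.csv.
sample = io.StringIO(
    "Created At,Email,Duration,% Completion\n"
    "2023-11-15 14:36:52 UTC,a@example.com,00:03:53,1\n"
    "2023-11-15 14:35:02 UTC,,00:00:10,0.04\n"
)
visits = load_docsend_csv(sample)
print(visits.dtypes)
```

To use it in the notebook, call `load_docsend_csv(csv_file_name)` in place of the bare `pd.read_csv(csv_file_name)`. If the real export uses a different timestamp layout, adjust `format`.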

In [59]:
# imports

import numpy as np
import pandas as pd
from gspread_pandas import Spread

In [60]:
## Data sources (set gsheets_name to "" to read csv_file_name instead)
gsheets_name = "Kaizen - VC presentation - v1.5-export"
csv_file_name = "docsend_data.csv"

In [61]:
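## load data
if gsheets_name != "":
    # Google Sheets mode: read every cell of the first worksheet as strings
    spread = Spread(gsheets_name)
    data = spread.sheets[0].get_values()
    headers = data.pop(0)  # the first row holds the column names
    df = pd.DataFrame(data, columns=headers)
else:
    # CSV mode: read the exported file from the notebook's directory
    df = pd.read_csv(csv_file_name)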
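The Google Sheets branch above builds the DataFrame from the worksheet's raw list of rows and uses the first row as headers. Here is the same step written as a small reusable function, `rows_to_dataframe` (a hypothetical helper). Unlike the cell, it leaves the input list untouched instead of calling `pop(0)`. It also drops rows that are entirely blank, on the assumption that such rows can show up in the sheet.

```python
import pandas as pd


def rows_to_dataframe(rows):
    """Build a DataFrame from sheet rows whose first row holds the headers."""
    if not rows:
        return pd.DataFrame()
    # Unpack instead of rows.pop(0) so the caller's list is not modified.
    headers, *body = rows
    # Skip rows where every cell is empty or whitespace.
    body = [row for row in body if any(str(cell).strip() for cell in row)]
    return pd.DataFrame(body, columns=headers)


# Made-up rows in the list-of-lists shape returned by get_values().
rows = [
    ["Email", "Duration"],
    ["a@example.com", "0:03:53"],
    ["", ""],
]
visits = rows_to_dataframe(rows)
print(visits)
```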

In [62]:
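## clean data
# parse Duration strings into Timedelta values so they sort by length, not alphabetically
df["Duration"] = pd.to_timedelta(df["Duration"])
# treat empty or whitespace-only cells as missing values
df = df.replace(r"^\s*$", np.nan, regex=True)
df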

| | Created At | Name | Email | Link Name | Duration | % Completion | Link Owner | Content Version | Account | Downloaded At | Printed At |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-11-15 14:36:52 UTC | | dteten@versatilevc.com | Mailmerge | 0 days 00:03:53 | 1 | Crypto Crypto | 4 | | | |
| 1 | 2023-11-15 14:35:02 UTC | | dennis@rre.com | Mailmerge | 0 days 00:00:10 | 0.04 | Crypto Crypto | 4 | | | |
| 2 | 2023-11-15 14:34:32 UTC | | dennis@rre.com | Mailmerge | 0 days 00:00:00 | 0 | Crypto Crypto | 4 | | | |
| 3 | 2023-11-15 14:33:56 UTC | | dennis@rre.com | Mailmerge | 0 days 00:00:10 | 0.08 | Crypto Crypto | 4 | | | |
| 4 | 2023-11-15 04:37:54 UTC | | jschmidt@a16z.com | Mailmerge | 0 days 00:03:16 | 0.96 | Crypto Crypto | 4 | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 157 | 2023-10-25 06:49:34 UTC | | | All | 0 days 00:01:58 | 1 | Crypto Crypto | 2 | All | | |
| 158 | 2023-10-24 18:11:05 UTC | | | All | 0 days 00:00:08 | 0.04 | Crypto Crypto | 2 | All | | |
| 159 | 2023-10-23 23:26:45 UTC | | | All | 0 days 04:32:55 | 1 | Crypto Crypto | 2 | All | | |
| 160 | 2023-10-23 20:53:57 UTC | | | All | 0 days 00:02:55 | 0.92 | Crypto Crypto | 2 | All | | |
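In Google Sheets mode every cell arrives as a string, and the clean cell converts only `Duration`. That leaves `% Completion` as text, so a numeric filter such as `>= 0.9` would fail. Below is a sketch of a cleaning function that also makes that column numeric. It assumes the column holds fractions such as `0.96`, and `clean_docsend` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def clean_docsend(df):
    """Return a cleaned copy: blanks -> NaN, Duration -> Timedelta, % Completion -> float."""
    out = df.replace(r"^\s*$", np.nan, regex=True)  # same regex as the clean cell
    out["Duration"] = pd.to_timedelta(out["Duration"])
    # Text such as "0.96" becomes 0.96; anything unparseable becomes NaN.
    out["% Completion"] = pd.to_numeric(out["% Completion"], errors="coerce")
    return out


# Made-up string-typed rows, like those produced in Google Sheets mode.
raw = pd.DataFrame(
    {
        "Email": ["a@example.com", " "],
        "Duration": ["0:03:53", "0:00:10"],
        "% Completion": ["1", "0.04"],
    }
)
cleaned = clean_docsend(raw)
# Numeric comparisons now work, e.g. visits that reached 90% or more.
print(cleaned[cleaned["% Completion"] >= 0.9])
```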


In [63]:
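## compute data
# longest visits first
sorted_by_duration_df = df.sort_values(by="Duration", ascending=False)
# ignore visits with no email address attached
sorted_by_duration_df = sorted_by_duration_df.dropna(subset=["Email"])
# unique() keeps first-seen order, so these are the ten addresses with the longest single visits
top_ten_emails = sorted_by_duration_df["Email"].unique()[:10]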
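The compute cell ranks addresses by their single longest visit. A viewer who came back several times for short visits therefore ranks low, so it may help to compare against total viewing time. Here is a hedged sketch with a hypothetical `top_viewers` helper that supports both rankings and returns a Series of durations indexed by email. With `by="max"` it should pick the same addresses as `top_ten_emails`, up to ties in duration.

```python
import pandas as pd


def top_viewers(df, n=10, by="max"):
    """Rank email addresses by visit duration.

    by="max": each address's single longest visit (the compute cell's ranking).
    by="sum": total time across all visits from that address.
    """
    per_email = df.dropna(subset=["Email"]).groupby("Email")["Duration"].agg(by)
    return per_email.sort_values(ascending=False).head(n)


# Made-up visits; the last one has no email and is ignored.
visits = pd.DataFrame(
    {
        "Email": ["a@example.com", "b@example.com", "b@example.com", None],
        "Duration": pd.to_timedelta(["0:03:00", "0:02:00", "0:02:30", "4:00:00"]),
    }
)
print(top_viewers(visits, by="max"))  # a@example.com first (3 min vs 2.5 min)
print(top_viewers(visits, by="sum"))  # b@example.com first (4.5 min vs 3 min)
```

On the notebook's own data, `top_viewers(df, by="sum").index.tolist()` would give the total-time ranking.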

In [64]:
## show data
print(top_ten_emails)

['info@electriccapital.com' 'jschmidt@a16z.com' 'jayhao1@gmail.com'
 'dteten@versatilevc.com' 'james@fika.vc' 'info@rre.com'
 'cf@foxventures.io' 'eesa.ahmad@crypto.com' 'dennis@rre.com'
 'jackdavis@eureliosollutions.com']
