## Marketing Funnel by Olist - Data Manipulation
#### Analyst: Isabela Barbosa 
#### Last updated on: 04/12/2022

#### Project description

This notebook prepares the data used in the Forecast project.
The dataset was provided by Olist and is available on Kaggle at the following link:
https://www.kaggle.com/datasets/olistbr/marketing-funnel-olist?resource=download

### 1. Importing libraries

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import datetime

### 2. Loading data

In [2]:
qualified_leads_raw=pd.read_csv('olist_marketing_qualified_leads_dataset.csv')
closed_deals_raw=pd.read_csv('olist_closed_deals_dataset.csv')

In [3]:
qualified_leads_raw.head()

Unnamed: 0,mql_id,first_contact_date,landing_page_id,origin
0,dac32acd4db4c29c230538b72f8dd87d,2018-02-01,88740e65d5d6b056e0cda098e1ea6313,social
1,8c18d1de7f67e60dbd64e3c07d7e9d5d,2017-10-20,007f9098284a86ee80ddeb25d53e0af8,paid_search
2,b4bc852d233dfefc5131f593b538befa,2018-03-22,a7982125ff7aa3b2054c6e44f9d28522,organic_search
3,6be030b81c75970747525b843c1ef4f8,2018-01-22,d45d558f0daeecf3cccdffe3c59684aa,email
4,5420aad7fec3549a85876ba1c529bd84,2018-02-21,b48ec5f3b04e9068441002a19df93c6c,organic_search


In [4]:
closed_deals_raw.head()

Unnamed: 0,mql_id,seller_id,sdr_id,sr_id,won_date,business_segment,lead_type,lead_behaviour_profile,has_company,has_gtin,average_stock,business_type,declared_product_catalog_size,declared_monthly_revenue
0,5420aad7fec3549a85876ba1c529bd84,2c43fb513632d29b3b58df74816f1b06,a8387c01a09e99ce014107505b92388c,4ef15afb4b2723d8f3d81e51ec7afefe,2018-02-26 19:58:54,pet,online_medium,cat,,,,reseller,,0.0
1,a555fb36b9368110ede0f043dfc3b9a0,bbb7d7893a450660432ea6652310ebb7,09285259593c61296eef10c734121d5b,d3d1e91a157ea7f90548eef82f1955e3,2018-05-08 20:17:59,car_accessories,industry,eagle,,,,reseller,,0.0
2,327174d3648a2d047e8940d7d15204ca,612170e34b97004b3ba37eae81836b4c,b90f87164b5f8c2cfa5c8572834dbe3f,6565aa9ce3178a5caf6171827af3a9ba,2018-06-05 17:27:23,home_appliances,online_big,cat,,,,reseller,,0.0
3,f5fee8f7da74f4887f5bcae2bafb6dd6,21e1781e36faf92725dde4730a88ca0f,56bf83c4bb35763a51c2baab501b4c67,d3d1e91a157ea7f90548eef82f1955e3,2018-01-17 13:51:03,food_drink,online_small,,,,,reseller,,0.0
4,ffe640179b554e295c167a2f6be528e0,ed8cb7b190ceb6067227478e48cf8dde,4b339f9567d060bcea4f5136b9f5949e,d3d1e91a157ea7f90548eef82f1955e3,2018-07-03 20:17:45,home_appliances,industry,wolf,,,,manufacturer,,0.0


### 3. Copying the data and renaming columns

In [5]:
qualified_leads=qualified_leads_raw.copy()
closed_deals=closed_deals_raw.copy()

In [6]:
## checking data volume (rows, columns)
qualified_leads.shape

(8000, 4)

In [7]:
closed_deals.shape ## 842 closed deals out of 8000 leads: a conversion rate of about 10.5%

(842, 14)
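
The conversion rate mentioned in the comment can be checked directly from the two row counts. This is a minimal sketch that assumes each row in `closed_deals` corresponds to exactly one lead in `qualified_leads`; the variable names are illustrative and not part of the original notebook.

```python
# Row counts taken from the two .shape outputs above
n_leads = 8000  # rows in qualified_leads
n_deals = 842   # rows in closed_deals

# Share of qualified leads that ended in a closed deal
# (assumes one closed-deal row per lead)
conversion_rate = n_deals / n_leads
print(f"Conversion rate: {conversion_rate:.4f}")
```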

In [8]:
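### Renaming columns
# Note: the source column is 'sr_id', so the 'sr.id' key below matches no column
# and that column keeps its original name (see the output of the next cell)
qualified_leads=qualified_leads.rename(columns={'mql_id':'Marketing Lead id','origin':'Origem'})
closed_deals=closed_deals.rename(columns={'mql_id':'Marketing Lead id','sdr_id':'Sales Development id','sr.id':'Sales Representative id'})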
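
The `rename` call above passes the key `'sr.id'`, while the column in the data is `sr_id`. By default pandas ignores keys that match no column, so a mistyped name goes unnoticed. A small sketch of how `errors="raise"` could surface this, using a made-up one-row frame (`deals` and its values are hypothetical, not the Olist data):

```python
import pandas as pd

# Toy frame with the same id columns as closed_deals (values are made up)
deals = pd.DataFrame({"mql_id": ["a"], "sdr_id": ["b"], "sr_id": ["c"]})

# Default behaviour: a key that matches no column is silently ignored
silent = deals.rename(columns={"sr.id": "Sales Representative id"})

# errors="raise" turns the same mistyped key into a KeyError
try:
    deals.rename(columns={"sr.id": "Sales Representative id"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

# With the correct key the rename is applied
renamed = deals.rename(columns={"sr_id": "Sales Representative id"}, errors="raise")
print(list(renamed.columns))
```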

In [9]:
closed_deals.head()

Unnamed: 0,Marketing Lead id,seller_id,Sales Development id,sr_id,won_date,business_segment,lead_type,lead_behaviour_profile,has_company,has_gtin,average_stock,business_type,declared_product_catalog_size,declared_monthly_revenue
0,5420aad7fec3549a85876ba1c529bd84,2c43fb513632d29b3b58df74816f1b06,a8387c01a09e99ce014107505b92388c,4ef15afb4b2723d8f3d81e51ec7afefe,2018-02-26 19:58:54,pet,online_medium,cat,,,,reseller,,0.0
1,a555fb36b9368110ede0f043dfc3b9a0,bbb7d7893a450660432ea6652310ebb7,09285259593c61296eef10c734121d5b,d3d1e91a157ea7f90548eef82f1955e3,2018-05-08 20:17:59,car_accessories,industry,eagle,,,,reseller,,0.0
2,327174d3648a2d047e8940d7d15204ca,612170e34b97004b3ba37eae81836b4c,b90f87164b5f8c2cfa5c8572834dbe3f,6565aa9ce3178a5caf6171827af3a9ba,2018-06-05 17:27:23,home_appliances,online_big,cat,,,,reseller,,0.0
3,f5fee8f7da74f4887f5bcae2bafb6dd6,21e1781e36faf92725dde4730a88ca0f,56bf83c4bb35763a51c2baab501b4c67,d3d1e91a157ea7f90548eef82f1955e3,2018-01-17 13:51:03,food_drink,online_small,,,,,reseller,,0.0
4,ffe640179b554e295c167a2f6be528e0,ed8cb7b190ceb6067227478e48cf8dde,4b339f9567d060bcea4f5136b9f5949e,d3d1e91a157ea7f90548eef82f1955e3,2018-07-03 20:17:45,home_appliances,industry,wolf,,,,manufacturer,,0.0


### 4. Create BigTable

In [10]:
# table_name = pd.merge(left_table, right_table, ...)
# on = name of the column used as the join key
# how = join type; a left join keeps all 8000 lead rows, and leads that did not convert get empty deal columns

bigtable=pd.merge(qualified_leads,closed_deals, on='Marketing Lead id', how='left')

In [18]:
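# check that the merge worked: expect 17 columns (14 + 4 - 1, since 'Marketing Lead id' is shared)
# the 19 shown below presumably includes the two month_year_* columns created in a later cell,
# as the execution counter suggests this cell was re-run out of order
bigtable.shape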

(8000, 19)
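
Beyond checking the shape, `pd.merge` accepts `validate` and `indicator` arguments that make the same check more explicit. The sketch below uses two tiny made-up frames (`leads` and `deals` are illustrative, not the Olist data) and assumes the join key is unique on both sides.

```python
import pandas as pd

# Toy stand-ins for qualified_leads and closed_deals (values are made up)
leads = pd.DataFrame({
    "Marketing Lead id": ["a", "b", "c"],
    "Origem": ["social", "email", "paid_search"],
})
deals = pd.DataFrame({
    "Marketing Lead id": ["c"],
    "won_date": ["2018-02-26 19:58:54"],
})

# validate raises MergeError if a key is duplicated on either side;
# indicator adds a "_merge" column showing where each row came from
merged = pd.merge(
    leads, deals,
    on="Marketing Lead id", how="left",
    validate="one_to_one", indicator=True,
)

# A left join on unique keys keeps exactly one row per lead
rows_preserved = len(merged) == len(leads)

# "_merge" is "both" for converted leads and "left_only" for the rest
n_converted = int((merged["_merge"] == "both").sum())
print(rows_preserved, n_converted)
```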

In [12]:
bigtable.head()

Unnamed: 0,Marketing Lead id,first_contact_date,landing_page_id,Origem,seller_id,Sales Development id,sr_id,won_date,business_segment,lead_type,lead_behaviour_profile,has_company,has_gtin,average_stock,business_type,declared_product_catalog_size,declared_monthly_revenue
0,dac32acd4db4c29c230538b72f8dd87d,2018-02-01,88740e65d5d6b056e0cda098e1ea6313,social,,,,,,,,,,,,,
1,8c18d1de7f67e60dbd64e3c07d7e9d5d,2017-10-20,007f9098284a86ee80ddeb25d53e0af8,paid_search,,,,,,,,,,,,,
2,b4bc852d233dfefc5131f593b538befa,2018-03-22,a7982125ff7aa3b2054c6e44f9d28522,organic_search,,,,,,,,,,,,,
3,6be030b81c75970747525b843c1ef4f8,2018-01-22,d45d558f0daeecf3cccdffe3c59684aa,email,,,,,,,,,,,,,
4,5420aad7fec3549a85876ba1c529bd84,2018-02-21,b48ec5f3b04e9068441002a19df93c6c,organic_search,2c43fb513632d29b3b58df74816f1b06,a8387c01a09e99ce014107505b92388c,4ef15afb4b2723d8f3d81e51ec7afefe,2018-02-26 19:58:54,pet,online_medium,cat,,,,reseller,,0.0


In [20]:
### convert the dates (first_contact_date and won_date) to year-month periods
# table_name['new column name'] = pd.to_datetime(...) parses the values as dates, then .dt.to_period('M') keeps only year and month

bigtable['month_year_first_contact'] = pd.to_datetime(bigtable['first_contact_date']).dt.to_period('M')
bigtable['month_year_won_date'] = pd.to_datetime(bigtable['won_date']).dt.to_period('M')
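
To illustrate what this conversion does with leads that never closed a deal, here is a minimal sketch on a two-value series (`won` is a hypothetical stand-in for the `won_date` column): missing dates become `NaT` and stay `NaT` after the period conversion.

```python
import pandas as pd

# One real timestamp and one missing value, as in won_date after the left join
won = pd.Series(["2018-02-26 19:58:54", None])

# Parse to datetime, then truncate to a monthly period
months = pd.to_datetime(won).dt.to_period("M")

first = str(months.iloc[0])          # year-month label of the first value
missing = bool(months.isna().iloc[1])  # True if the second value is NaT
print(first, missing)
```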

In [21]:
bigtable.head()

Unnamed: 0,Marketing Lead id,first_contact_date,landing_page_id,Origem,seller_id,Sales Development id,sr_id,won_date,business_segment,lead_type,lead_behaviour_profile,has_company,has_gtin,average_stock,business_type,declared_product_catalog_size,declared_monthly_revenue,month_year_first_contact,month_year_won_date
0,dac32acd4db4c29c230538b72f8dd87d,2018-02-01,88740e65d5d6b056e0cda098e1ea6313,social,,,,,,,,,,,,,,2018-02,NaT
1,8c18d1de7f67e60dbd64e3c07d7e9d5d,2017-10-20,007f9098284a86ee80ddeb25d53e0af8,paid_search,,,,,,,,,,,,,,2017-10,NaT
2,b4bc852d233dfefc5131f593b538befa,2018-03-22,a7982125ff7aa3b2054c6e44f9d28522,organic_search,,,,,,,,,,,,,,2018-03,NaT
3,6be030b81c75970747525b843c1ef4f8,2018-01-22,d45d558f0daeecf3cccdffe3c59684aa,email,,,,,,,,,,,,,,2018-01,NaT
4,5420aad7fec3549a85876ba1c529bd84,2018-02-21,b48ec5f3b04e9068441002a19df93c6c,organic_search,2c43fb513632d29b3b58df74816f1b06,a8387c01a09e99ce014107505b92388c,4ef15afb4b2723d8f3d81e51ec7afefe,2018-02-26 19:58:54,pet,online_medium,cat,,,,reseller,,0.0,2018-02,2018-02


### 5. Export BigTable to an Excel file

In [24]:
bigtable.to_excel('bigtable_marketing_funel.xlsx')