## Olist dataset: SQL exploratory analysis and visualisation

Thank you for taking the time to read this exploratory analysis project, written in SQL and Python in a Markdown notebook. My aim is to answer the questions below in SQL, support the answers with relevant Python visualisations, and follow up with a Tableau dashboard that gives executives actionable insights.

Olist is an online e-commerce platform that lets sellers list their products on the main marketplaces of Brazil, e.g. Amazon, Vivo, Casa and Video. The business works as follows:
- Sellers list their items through Olist.
- Olist lists the items on different marketplaces, giving sellers more visibility.
- Olist charges each seller a monthly fee to hold an account, plus a commission per sale.

Olist has approached me, a data analyst consultant, to answer the questions below and to provide a dashboard with the key KPIs for its executives.

1: What is the total revenue generated by Olist, and how has it changed over time?

2: How many orders were placed on Olist, and how does this vary by month or season?

3: What are the most popular product categories on Olist, and how do their sales volumes compare to each other?

4: What is the average order value (AOV) on Olist, and how does this vary by product category or payment method?

5: How many sellers are active on Olist, and how does this number change over time?

6: What is the distribution of seller ratings on Olist, and how does this impact sales performance?

7: How many customers have made repeat purchases on Olist, and what percentage of total sales do they account for?

8: What is the average customer rating for products sold on Olist, and how does this impact sales performance?

9: What is the average order cancellation rate on Olist, and how does this impact seller performance?

10: What are the top-selling products on Olist, and how have their sales trends changed over time?

11: Which payment methods are most commonly used by Olist customers, and how does this vary by product category or geographic region?

12: How do customer reviews and ratings affect sales and product performance on Olist?

13: Which product categories have the highest profit margins on Olist, and how can the company increase profitability across different categories?

14: How do Olist’s marketing spend and channel mix impact sales and customer acquisition costs, and how can the company optimize its marketing strategy to increase ROI?

15: Which geolocations have a high customer density, and what is the customer retention rate by geolocation?

The questions come from this Medium article, with many thanks: https://medium.com/@tobye070/the-exploratory-data-analysis-on-olist-e-commerce-dataset-cbddd09d936c

The SQL queries that follow are my own, based on my interpretation of the questions.



In [10]:
# importing dependencies into notebook

import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account
import plotly.express as px


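# Authenticate with a service-account key file and pin the client to that account's project
credentials = service_account.Credentials.from_service_account_file('totemic-studio-372000-56c1543a7106.json')
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# Smoke test: check the BigQuery connection and that results load into a DataFrame
query = """
SELECT *
FROM `totemic-studio-372000.olist_data_set.olist_orders_dataset`
LIMIT 5
"""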

df = client.query(query).to_dataframe()
display(df)


| | order_id | customer_id | order_status | order_purchase_timestamp | order_approved_at | order_delivered_carrier_date | order_delivered_customer_date | order_estimated_delivery_date | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 7a4df5d8cff4090e541401a20a22bb80 | 725e9c75605414b21fd8c8d5a1c2f1d6 | created | 2017-11-25 11:10:00+00:00 | NaT | NaT | NaT | 2017-12-12 00:00:00+00:00 | 0.0 |
| 1 | 35de4050331c6c644cddc86f4f2d0d64 | 4ee64f4bfc542546f422da0aeb462853 | created | 2017-12-05 01:07:00+00:00 | NaT | NaT | NaT | 2018-01-08 00:00:00+00:00 | 0.0 |
| 2 | b5359909123fa03c50bdb0cfed07f098 | 438449d4af8980d107bf04571413a8e7 | created | 2017-12-05 01:07:00+00:00 | NaT | NaT | NaT | 2018-01-11 00:00:00+00:00 | 0.0 |
| 3 | dba5062fbda3af4fb6c33b1e040ca38f | 964a6df3d9bdf60fe3e7b8bb69ed893a | created | 2018-02-09 17:21:00+00:00 | NaT | NaT | NaT | 2018-03-07 00:00:00+00:00 | 0.0 |
| 4 | 90ab3e7d52544ec7bc3363c82689965f | 7d61b9f4f216052ba664f22e9c504ef1 | created | 2017-11-06 13:12:00+00:00 | NaT | NaT | NaT | 2017-12-01 00:00:00+00:00 | 0.0 |


**1: What is the total revenue generated by Olist, and how has it changed over time?**

In [11]:
# Monthly revenue: sum payment values per calendar month of purchase
query = '''
SELECT
  DATE_TRUNC(DATE(o.order_purchase_timestamp), MONTH) AS month_start,
  ROUND(SUM(p.payment_value), 2) AS total_payment
FROM `totemic-studio-372000.olist_data_set.olist_orders_dataset` o
JOIN `totemic-studio-372000.olist_data_set.olist_order_payments_dataset` p
  ON o.order_id = p.order_id
GROUP BY month_start
ORDER BY month_start
'''
df = client.query(query).to_dataframe()
display(df)


| | month_start | total_payment |
|---|---|---|
| 0 | 2016-09-01 | 252.24 |
| 1 | 2016-10-01 | 59090.48 |
| 2 | 2016-12-01 | 19.62 |
| 3 | 2017-01-01 | 138488.04 |
| 4 | 2017-02-01 | 291908.01 |
| 5 | 2017-03-01 | 449863.6 |
| 6 | 2017-04-01 | 417788.03 |
| 7 | 2017-05-01 | 592918.82 |
| 8 | 2017-06-01 | 511276.38 |
| 9 | 2017-07-01 | 592382.92 |
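
As a cross-check on the SQL, the same month-level grouping can be sketched locally in pandas. This is only an illustration, not part of the original notebook: the `sample` rows are made up, and `monthly_check` is a name introduced here. It assumes the timestamps are timezone-aware UTC values, as in the order preview above.

```python
import pandas as pd

# Hypothetical rows standing in for the joined orders and payments tables.
sample = pd.DataFrame({
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-05 10:00", "2017-01-20 15:30", "2017-02-02 09:15", "2017-02-02 09:15"],
        utc=True,
    ),
    "payment_value": [100.00, 50.50, 20.00, 30.25],
})

# Drop the timezone, then truncate each timestamp to the first day of its month
# (the pandas counterpart of grouping by month start in SQL).
month_start = (
    sample["order_purchase_timestamp"]
    .dt.tz_localize(None)
    .dt.to_period("M")
    .dt.to_timestamp()
)

# Sum the payment values per month and round to two decimals.
monthly_check = (
    sample.assign(month_start=month_start)
    .groupby("month_start", as_index=False)["payment_value"]
    .sum()
    .round({"payment_value": 2})
    .rename(columns={"payment_value": "total_payment"})
)
print(monthly_check)
```

Running the same aggregation on a small extract of the real tables and comparing it with the BigQuery result is a cheap way to catch join or grouping mistakes.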


In [16]:
# Total revenue over the whole period covered by the dataset

total_revenue_query = """
SELECT SUM(payment_value) AS total_revenue
FROM `totemic-studio-372000.olist_data_set.olist_order_payments_dataset`
"""
total_payments = client.query(total_revenue_query).to_dataframe()
display(total_payments)

| | total_revenue |
|---|---|
| 0 | 16008870.0 |


In [19]:
# Show the total as a plain two-decimal figure instead of scientific notation
total_revenue = total_payments['total_revenue'].iloc[0]
display(f'${total_revenue:.2f}')


'$16008872.12'
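
A figure this large is easier to read with thousands separators. The short sketch below uses Python's `,` format option on the value shown above; the variable name `formatted` is introduced here for illustration. Because the dataset is Brazilian, the amounts are presumably in Brazilian reais rather than dollars, but that is an assumption, so the sketch leaves the currency symbol out.

```python
# Value copied from the formatted output above.
total_revenue = 16008872.12

# ',' inserts thousands separators; '.2f' keeps two decimal places.
formatted = f"{total_revenue:,.2f}"
print(formatted)
```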

In [20]:
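# Bar chart of the monthly totals held in `df` from the monthly revenue query
fig = px.bar(df, x='month_start', y='total_payment', text_auto=True,
             labels={'month_start': 'Month', 'total_payment': 'Total payment value'})
fig.show()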
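
To put a number on how revenue changed over time, one option is the month-over-month percentage change. The sketch below is illustrative and not part of the original notebook: it hard-codes the ten monthly totals shown in the monthly revenue output, and the names `monthly`, `series` and `mom_change` are introduced here. Since November 2016 is missing from that output, the series is first reindexed to a continuous monthly calendar so that December 2016 is not compared against October 2016 by accident.

```python
import pandas as pd

# Monthly totals copied from the monthly revenue output (first ten rows only).
monthly = pd.DataFrame({
    "month_start": pd.to_datetime([
        "2016-09-01", "2016-10-01", "2016-12-01", "2017-01-01", "2017-02-01",
        "2017-03-01", "2017-04-01", "2017-05-01", "2017-06-01", "2017-07-01",
    ]),
    "total_payment": [
        252.24, 59090.48, 19.62, 138488.04, 291908.01,
        449863.60, 417788.03, 592918.82, 511276.38, 592382.92,
    ],
})

# Reindex to every month start so the missing month appears as NaN.
series = monthly.set_index("month_start")["total_payment"].asfreq("MS")

# Percentage change versus the previous calendar month;
# fill_method=None leaves gaps as NaN instead of forward-filling them.
mom_change = series.pct_change(fill_method=None).mul(100).round(2)
print(mom_change)
```

The 2016 months have very small totals, so their percentage changes are extreme and say little; the 2017 values are the ones worth reading.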

**2: How many orders were placed on Olist, and how does this vary by month or season?**

**3: What are the most popular product categories on Olist, and how do their sales volumes compare to each other?**

**4: What is the average order value (AOV) on Olist, and how does this vary by product category or payment method?**