Using a fictitious ecommerce database hosted in Google BigQuery, I built a Power BI dashboard for analysis with SQL and DAX.
Data source: Google BigQuery open datasets, "thelook_ecommerce"
In this project, I imagined being tasked by the Vice President of Logistics with discovering any major issues in the back end of the supply chain at "TheLook," a clothing and textile ecommerce company. This led to a goods-to-consumer dashboard and analysis. The Vice President wanted a way to quickly ingest all order and shipping data in a simple yet holistic manner, in real time and at the national level. (Knowing this is a Google-generated fictitious database, I was bound to find some things that were odd.)
The "thelook_ecommerce" database in the Google BigQuery open datasets environment consists of 6 tables:
- distribution_centers (dimension) - the company's distribution centers and their locations
- events (fact) - web event data associated with each user's purchase activity
- inventory_items (dimension) - items contained in inventory
- order_items (fact) - the items within each order
- orders (fact) - orders placed and their attributes
- products (dimension) - products and their attributes
- users (dimension) - customers
With the VP being a very busy person, there would be limited time to sift through many visuals on a report page. With that in mind, I opted for a multi-page approach where each page has a specific purpose and a limited number of visuals.
I used SQL to extract the data from the database. Rather than a live (DirectQuery) connection, I chose Import mode, which makes a copy of the data in the Power BI environment. Importing improves dashboard performance and removes limits on building measures and calculated columns. If I were to publish this dashboard, I would schedule a refresh every night at midnight so the data would be current through the most recently completed business day.
- The original SQL query extracted the orders, their current status, the distribution center each order shipped from, and the items contained in each order.
- Next, I created a copy of this table in Power Query, removed duplicate order IDs, and dropped the product columns. This became the table used for the report visuals and measures. My purpose was a clean table focused exclusively on distribution center performance, while the original table kept product information for future scalability, if desired.
- To maintain a scalable model with an automatically generated calendar table, I changed each DATETIME column to a DATE column. DAX time intelligence expects a date table at day granularity; it is not designed for timestamp-level data, nor is that granularity necessary here.
- I created a calendar table with CALENDARAUTO() using the "create table" option and related it to the deduplicated table for clean time-series visuals and analysis. I set the fiscal year to begin on 7/1 and end on 6/30 (see the sketch below).
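As a minimal sketch of that calendar step, the following DAX generates the date table with CALENDARAUTO(6), whose argument sets the fiscal year-end to month 6 (June). The added column names are illustrative assumptions, not necessarily the exact columns used in the report.

```dax
Calendar =
-- CALENDARAUTO(6) returns one row per date spanning every date column
-- in the model; the argument 6 ends the fiscal year on 6/30.
ADDCOLUMNS (
    CALENDARAUTO ( 6 ),
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month", FORMAT ( [Date], "MMM" ),
    -- Dates on or after 7/1 roll into the next fiscal year
    "Fiscal Year", "FY" & ( YEAR ( [Date] ) + IF ( MONTH ( [Date] ) >= 7, 1, 0 ) )
)
```

Relating the calendar's [Date] column to the orders table's date column is what drives the time-series visuals and the same-period-last-year targets described below.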
The second part of the data modeling process required me to design flags on unique order IDs to build the KPIs that measure performance (a DAX sketch of these definitions follows the list below).
- Delayed Shipping Flag - any order that took longer than two days to ship is flagged as delayed; blank created_at and shipped_at cells are filtered out to stay within the scope of the measure
- Failed to Ship Flag - any order that has been stuck in the processing stage for longer than 7 days
- Unshipped Order Flag - any order that is waiting to be shipped but has not yet passed the 2-day delay threshold
- Delayed Shipping Rate - the percentage of orders considered "delayed" per the flag custom column
- Delayed Shipping Target - the same delayed shipping rate calculated for the same period last year; for use with a KPI card
- Order Failure Rate - the percentage of orders considered "failed" per the flag custom column
- Order Failure Target - the same order failure rate calculated for the same period last year; for use with a KPI card
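Below is a minimal DAX sketch of how two of these flags and their companion measures could be defined. The table name Orders is an assumption, and the denominator and the "Processing" status value reflect my reading of the thelook_ecommerce schema rather than the report's exact definitions.

```dax
-- Calculated column: 1 if a shipped order took more than two days to ship.
-- Blank created_at / shipped_at rows are excluded, per the flag's scope.
Delayed Shipping Flag =
IF (
    NOT ISBLANK ( Orders[created_at] )
        && NOT ISBLANK ( Orders[shipped_at] )
        && DATEDIFF ( Orders[created_at], Orders[shipped_at], DAY ) > 2,
    1,
    0
)

-- Calculated column: 1 if an order has sat in "Processing" for over 7 days.
Failed to Ship Flag =
IF (
    Orders[status] = "Processing"
        && DATEDIFF ( Orders[created_at], TODAY (), DAY ) > 7,
    1,
    0
)

-- Measure: share of shipped orders that were delayed.
Delayed Shipping Rate =
DIVIDE (
    SUM ( Orders[Delayed Shipping Flag] ),
    COUNTROWS ( FILTER ( Orders, NOT ISBLANK ( Orders[shipped_at] ) ) )
)

-- Measure: the same rate for the same period last year, for the KPI card.
Delayed Shipping Target =
CALCULATE ( [Delayed Shipping Rate], SAMEPERIODLASTYEAR ( 'Calendar'[Date] ) )
```

DIVIDE avoids divide-by-zero errors on empty filter contexts, and SAMEPERIODLASTYEAR relies on the calendar relationship established above; the Order Failure Rate and Target follow the same pattern using the Failed to Ship Flag.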
The report consists of four pages:
- Overview - a high-level dashboard with a status pipeline visualized as a heat-map matrix, plus a stacked bar chart showing what share of orders are failures, all filterable with a date slider; clicking a distribution center in the matrix updates the overview cards at the top for that specific warehouse
- Delayed Shipping Over Time - a detailed view and YoY KPI of how each distribution center performs with shipping delays over time; can be isolated to view a single distribution center and specific year
- Failed Shipping Over Time - a detailed view and YoY KPI of how each distribution center performs with shipping failures over time; can be isolated to view a single distribution center and specific year
- Shipping Decomp Analysis - a supplementary tree map showing the flow of goods to consumers and the number of items in each shipping status
Because this is a refreshable dashboard, the analysis documented here is as of 4/15/2023.
As mentioned earlier in the documentation, the data presented was bound to have oddities. With the VP of Logistics tasking me to find major areas of concern as well as a monitoring process, I quickly developed two KPIs.
- Nationally, the order failure rate is 33.38%. I don't have the data to dive into this issue any further, but this share of orders placed that never shipped is a major, major problem. No national ecommerce business can survive for long with logistics failures of this magnitude.
- Of the orders that did ship, 8.58% took longer than two days to ship. This is not good, but it is a more manageable area to tackle.
- The Los Angeles, Philadelphia, and Mobile distribution centers appear to be correcting delayed-shipping errors over time.
- The Los Angeles distribution center has maintained decreasing failure and delayed shipping rates since the company's start-up (after a 2020 exception), and it is the only distribution center to maintain decreasing rates for both KPIs.
- Most distribution centers see returns on close to 10% of orders in each fiscal year.
The very serious problem areas discovered in this report should prompt at least two more targeted analyses.
- We need to start collecting data on warehouse staffing and design. The current "thelook_ecommerce" database does not contain the data to investigate further, but with order failure and delay rates this poor at a national scale, it's almost certain the problems are occurring in the logistics model.
- We need to collect data on the software and/or ordering system used within the company. We have access to the "events" table, which could be joined to our "orders" table; an analysis of web events could help us find whether specific browsers or regions contribute to order failure.
- This would be out of scope for the current dashboard, so it will have to be a separate analysis.
- Visit the Los Angeles distribution center to meet with leadership. This is a great location to benchmark how they are correcting shipping errors.
- Collect data by way of quarterly surveys or day-of return reasons to diagnose the magnitude of returns.
Overall, TheLook has a few very large issues to resolve if it is to recover from the logistical nightmare that has occurred over the past 5 years. Immediately tasking data analyst teams with the four projects above is vital to the survival of the company.
I encourage anyone who has read this project and analysis to investigate possible causes of TheLook's goods-to-consumer nightmare.
Thank you for reading!