Welcome! This case study is designed to simulate a typical project for an Analytics Engineer on our Data & Integration team. Our goal is to see how you approach a real-world dataset and transform it into a valuable, trusted analytical asset.
This exercise focuses on your data modeling philosophy, technical skills with modern tools, and your ability to create a foundation for business insights, all while adhering to software engineering best practices.
Fictional Company Context: You are a new Analytics Engineer at "Nordic Trail Outfitters," an e-commerce company specializing in outdoor and hiking gear. We've just acquired a new company that operates in Brazil, and we need you to analyze their historical sales data to understand their business.
This repository is a self-contained, ready-to-use dbt project. We have already handled the initial setup to allow you to focus on the core analytics engineering tasks.
Included in this repository:
- A standard `dbt` project structure.
- `uv` for simple and fast Python environment and dependency management (see `pyproject.toml`).
- The raw Brazilian E-Commerce Public Dataset by Olist, located in the `seeds` directory.
- A pre-configured `profiles.yml` that configures `dbt` to use a local `duckdb` database file (`olist.duckdb`), which will be created in the project directory.
To begin, clone this repository and ensure you have `uv` installed (`pip install uv`). Then, follow these steps in your terminal:
- Install Dependencies: This command creates the virtual environment (`.venv`) if it doesn't exist and installs all dependencies specified in `pyproject.toml`.

  ```bash
  uv sync
  ```
- Set Up Your IDE (VS Code): To ensure features like auto-completion and extensions (e.g., dbt Power User) work correctly, you must select the project's Python interpreter.
  - Open the Command Palette (`Ctrl+Shift+P` or `Cmd+Shift+P`).
  - Search for `Python: Select Interpreter`.
  - Choose the interpreter located in the `./.venv` folder of this project.
- Load Raw Data: Use the `dbt seed` command to load all the raw CSV files from the `seeds` directory into your local `duckdb` database.

  ```bash
  uv run dbt seed
  ```
- Verify Setup: You can verify that the data has been loaded correctly by running the test model.

  ```bash
  uv run dbt run
  ```

  This should output a `test.parquet` file into the `data` folder.
You are now ready to begin development!
PRO TIP: If you want a nice UI for working with your data, you can use DuckDB's web UI. After running the seed command, run `uv run duckdb -ui` in your terminal. This will open a UI in your browser where you can inspect the database. For more details, see the DuckDB UI documentation. Remember to attach the `olist.duckdb` database in the UI.
The leadership team needs to understand the performance of our newly acquired Brazilian operation. Your mission is to create a well-modeled data warehouse foundation using dbt that will empower our business analysts to answer critical questions and power their future BI dashboards.
Your final data models should be designed to easily answer high-level business questions like:
- Customer Behavior: Who are our most valuable customers? What are their purchasing patterns?
- Product Performance: What are our best-selling products and product categories?
- Geographic Performance: Where are our key markets geographically (by city/state)?
- Order Fulfillment: What is the average time between an order being placed and delivered? How does this vary?
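As a concreteness check, the fulfillment question above could eventually be answered with a single query against a well-built mart. The sketch below is only illustrative: `fct_orders` and its `ordered_at`/`delivered_at` columns are hypothetical names, and it assumes DuckDB's `date_diff` function.

```sql
-- Illustrative analyst query: average days from purchase to delivery.
-- fct_orders, ordered_at, and delivered_at are assumed names.
select
    avg(date_diff('day', ordered_at, delivered_at)) as avg_days_to_deliver
from fct_orders
where delivered_at is not null  -- exclude orders still in flight
```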
- Model the Data:
  - Following `dbt` best practices, transform the raw data into a logical, multi-layered architecture (e.g., staging, intermediate, marts).
  - You have the freedom to decide on the final data models. Your choices should be driven by the goal of answering the business questions listed above. Consider creating models like a `fct_orders` (fact table) and `dim_customers` (dimension table), or other structures you feel are appropriate.
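For orientation, a first staging model might look like the sketch below. The seed name `olist_orders_dataset` and the column list are assumptions based on the public Olist dataset; adjust them to the actual files in the `seeds` directory.

```sql
-- models/staging/stg_orders.sql (illustrative)
-- Staging layer: rename and cast only; no joins or business logic.
with source as (

    select * from {{ ref('olist_orders_dataset') }}

)

select
    order_id,
    customer_id,
    order_status,
    cast(order_purchase_timestamp as timestamp) as ordered_at,
    cast(order_delivered_customer_date as timestamp) as delivered_at

from source
```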
- Ensure Data Trust:
  - Implement comprehensive `dbt` tests (both generic and custom) to ensure the accuracy, integrity, and validity of your data models. Quality is more important than quantity.
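As a starting point, generic tests are declared in a schema `.yml` file next to the models. The sketch below is hedged: the model name, column names, and status values are assumptions based on the public Olist dataset.

```yaml
# models/staging/_staging__models.yml (illustrative)
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_status
        tests:
          - accepted_values:
              values: ['created', 'approved', 'invoiced', 'processing',
                       'shipped', 'delivered', 'canceled', 'unavailable']
```

A custom (singular) test is simply a `.sql` file under `tests/` that returns any rows violating an expectation, e.g. orders delivered before they were purchased.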
- Document Your Work:
  - Use `dbt`'s documentation features (`dbt docs generate`) by adding descriptions for your models and columns in `.yml` files.
  - Crucially, you must update the second half of this `README.md` file in your submission to explain your work to the hiring team.
Your final deliverable is a link to your forked repository containing all your completed work.
Please complete the sections below in this README.md file. Your written explanations are as important as your code.
(Please add any specific instructions for running your final models and tests here. For example: `uv run dbt build`)
(This is the most important section. Please justify your architectural choices.)
- Architectural Pattern: (e.g., "I chose a dimensional model with fact and dimension tables because...")
- Key Models: (Describe the purpose of your main Silver/Gold layer models. For example:)
  - `dim_customers`: This model provides a single, clean view of every customer...
  - `fct_orders`: This model captures every order event and includes key metrics like...
- Answering Business Questions: (Explain how your models would be used to answer one of the business questions from the challenge. For example: "To find our most valuable customers, an analyst could join `dim_customers` with `fct_orders` and sum the `total_amount` per customer...")
(Briefly describe any assumptions you made and any trade-offs you considered.)
- Assumptions: (e.g., "I assumed that any order without a `delivered_at` timestamp is still in-flight and excluded it from delivery time calculations.")
- Trade-offs: (e.g., "For performance, I chose to denormalize the product category directly into the `fct_orders` table. This increases data redundancy but simplifies queries for analysts, avoiding an extra join.")
(If you had more time, what would you add or change? e.g., CI/CD setup, performance optimizations, more advanced testing.)