Analytics Engineer - Take-Home Case Study

Introduction

Welcome! This case study is designed to simulate a typical project for an Analytics Engineer on our Data & Integration team. Our goal is to see how you approach a real-world dataset and transform it into a valuable, trusted analytical asset.

This exercise focuses on your data modeling philosophy, technical skills with modern tools, and your ability to create a foundation for business insights, all while adhering to software engineering best practices.

Fictional Company Context: You are a new Analytics Engineer at "Nordic Trail Outfitters," an e-commerce company specializing in outdoor and hiking gear. We've just acquired a new company that operates in Brazil, and we need you to analyze their historical sales data to understand their business.

The Project Environment

This repository is a self-contained, ready-to-use dbt project. We have already handled the initial setup to allow you to focus on the core analytics engineering tasks.

Included in this repository:

  • A standard dbt project structure.
  • uv for simple and fast Python environment and dependency management (see pyproject.toml).
  • The raw Brazilian E-Commerce Public Dataset by Olist, located in the seeds directory.
  • A pre-configured profiles.yml that points dbt at a local DuckDB database file (olist.duckdb), which will be created in the project directory.

Getting Started

To begin, clone this repository and ensure you have uv installed (pip install uv). Then, follow these steps in your terminal:

  1. Install Dependencies: This command creates the virtual environment (.venv) if it doesn't exist and installs all dependencies specified in pyproject.toml.

    Bash

    uv sync
    
  2. Set Up Your IDE (VS Code): To ensure features like auto-completion and extensions (e.g., dbt Power User) work correctly, you must select the project's Python interpreter.

    • Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P).
    • Search for Python: Select Interpreter.
    • Choose the interpreter located in the ./.venv folder of this project.
  3. Load Raw Data: Use the dbt seed command to load all the raw CSV files from the seeds directory into your local DuckDB database.

    Bash

    uv run dbt seed
    
  4. Verify Setup: Confirm that the data has been loaded correctly by running the included test model.

    Bash

    uv run dbt run
    

    This should output a test.parquet file into the data folder.

You are now ready to begin development!

PRO TIP: If you want a nice UI for working with your data, you can use DuckDB's web UI. After running the seed command, run uv run duckdb -ui in your terminal. This will open a UI in your browser where you can inspect the database; for more details, see the DuckDB UI documentation. Remember to attach the olist.duckdb database in the UI.
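
If you prefer the command line instead, you can open the database file directly with uv run duckdb olist.duckdb and inspect the seeded tables. A minimal sketch, assuming the seeds keep their original Olist CSV names (e.g. olist_orders_dataset) and land in the default main schema:

SQL

    -- list everything dbt seed created, then peek at the orders table
    -- (table names are assumptions based on the public Olist CSV file names)
    show tables;

    select order_status, count(*) as orders
    from olist_orders_dataset
    group by order_status
    order by orders desc;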

Your Mission

The leadership team needs to understand the performance of our newly acquired Brazilian operation. Your mission is to create a well-modeled data warehouse foundation using dbt that will empower our business analysts to answer critical questions and power their future BI dashboards.

Your final data models should be designed to easily answer high-level business questions like:

  • Customer Behavior: Who are our most valuable customers? What are their purchasing patterns?
  • Product Performance: What are our best-selling products and product categories?
  • Geographic Performance: Where are our key markets geographically (by city/state)?
  • Order Fulfillment: What is the average time between an order being placed and delivered? How does this vary?
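
For illustration only, the fulfillment question above could eventually be answered with a query like the one below. The model and column names (fct_orders, dim_customers, the order timestamps) are hypothetical placeholders rather than required names; the timestamp columns follow the raw Olist naming.

SQL

    -- hypothetical marts-layer query: average delivery time in days, by customer state
    -- fct_orders / dim_customers and their columns are placeholders, not prescribed names
    select
        c.customer_state,
        avg(datediff('day', o.order_purchase_timestamp, o.order_delivered_customer_date)) as avg_delivery_days
    from fct_orders as o
    join dim_customers as c on o.customer_id = c.customer_id
    where o.order_delivered_customer_date is not null
    group by c.customer_state
    order by avg_delivery_days;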

Core Tasks

  1. Model the Data:

    • Following dbt best practices, transform the raw data into a logical, multi-layered architecture (e.g., staging, intermediate, marts). A minimal sketch of a staging model appears after this list.
    • You have the freedom to decide on the final data models. Your choices should be driven by the goal of answering the business questions listed above. Consider creating models such as fct_orders (a fact table) and dim_customers (a dimension table), or other structures you feel are appropriate.
  2. Ensure Data Trust:

    • Implement comprehensive dbt tests (both generic and custom) to ensure the accuracy, integrity, and validity of your data models. Quality is more important than quantity. An example of a custom test also follows this list.
  3. Document Your Work:

    • Use dbt's documentation features by adding descriptions for your models and columns in .yml files, then generate the docs site with dbt docs generate.
    • Crucially, you must update the second half of this README.md file in your submission to explain your work to the hiring team.
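
To make the layering in task 1 concrete, here is a minimal sketch of a staging model. It assumes a seed named olist_orders_dataset with the standard Olist column names; treat it as an illustration of the pattern, not a prescribed solution.

SQL

    -- models/staging/stg_orders.sql (illustrative sketch)
    -- selects from the seeded raw table, renames columns, and casts timestamps
    with source as (
        select * from {{ ref('olist_orders_dataset') }}
    )

    select
        order_id,
        customer_id,
        order_status,
        cast(order_purchase_timestamp as timestamp) as ordered_at,
        cast(order_delivered_customer_date as timestamp) as delivered_at
    from source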
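
For task 2, note that a custom (singular) test in dbt is simply a SQL file under tests/ that must return zero rows to pass. A sketch, assuming the stg_orders model above:

SQL

    -- tests/assert_delivery_not_before_purchase.sql (illustrative sketch)
    -- returns any order delivered before it was placed; dbt fails the test if rows come back
    select
        order_id
    from {{ ref('stg_orders') }}
    where delivered_at is not null
      and delivered_at < ordered_at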

Final Deliverable

Your final deliverable is a link to your forked repository containing all your completed work.

Please complete the sections below in this README.md file. Your written explanations are as important as your code.

Candidate Submission

1. How to Run This Project

(Please add any specific instructions for running your final models and tests here. For example: uv run dbt build)

2. Data Modeling Decisions

(This is the most important section. Please justify your architectural choices.)

  • Architectural Pattern: (e.g., "I chose a dimensional model with fact and dimension tables because...")
  • Key Models: (Describe the purpose of your main intermediate/marts (Silver/Gold) layer models. For example:)
    • dim_customers: This model provides a single, clean view of every customer...
    • fct_orders: This model captures every order event and includes key metrics like...
  • Answering Business Questions: (Explain how your models would be used to answer one of the business questions from the challenge. For example: "To find our most valuable customers, an analyst could join dim_customers with fct_orders and sum the total_amount per customer...")

3. Assumptions & Trade-offs

(Briefly describe any assumptions you made and any trade-offs you considered.)

  • Assumptions: (e.g., "I assumed that any order without a delivered_at timestamp is still in-flight and excluded it from delivery time calculations.")
  • Trade-offs: (e.g., "For performance, I chose to denormalize the product category directly into the fct_orders table. This increases data redundancy but simplifies queries for analysts, avoiding an extra join.")

4. Suggestions for Improvement

(If you had more time, what would you add or change? e.g., CI/CD setup, performance optimizations, more advanced testing.)
