## Project Startup Structure

When building a dashboard project, especially one with multiple views, it's important to separate the data logic from the layout logic. This keeps things simple and easy to debug.

In this notebook, we focus on the startup structure:
* How we load and manage data
* How we define constants
* How we organize helper functions (like for computing KPIs)
* How we centralize styles

We won't cover layout components such as dbc.Container or the full app structure yet. That comes in a separate notebook: 01_layout_structure.ipynb.

This notebook is all about setting up the backend smartly, so that your layout and callback logic can be clean, reusable, and easy to extend.

### Setup
To start, we'll have to do some simple imports. Make sure you import all relevant features that you want to use from dash, and the helper functions you'll use for the visualizations and data extraction:

In [1]:
from pathlib import Path  # optional: I like pathlib because we can store the data path once, like a variable, and reuse it everywhere

import dash_bootstrap_components as dbc #see explanation of bootstrap below
import pandas as pd  # for loading and handling tabular data (DataFrames)
from dash import Dash, dcc, html, Input, Output

from src.const import get_constants #helper function to get constants from our datasets
from src import dash1, dash2, dash3, dash4 #helper functions to create the visualizations based on selected tabs.

**What's Bootstrap?**

Bootstrap is a CSS framework that helps you build responsive and well-structured UIs using a 12-column grid and pre-styled components like cards, buttons, and rows. In our case, we use Dash Bootstrap Components (dbc) to bring Bootstrap's layout and styling into the Dash ecosystem. 

We don't actually use .css files here, because Bootstrap handles the styling on its own. You are free to use a custom .css file if that suits your dashboard; there is a separate .css notebook in the GitHub repository that covers this briefly.

In this IMDb dashboard example: 
* Bootstrap provides the grid system (dbc.Container, dbc.Row, dbc.Col) for clean layout and automatic responsiveness.
* It also gives us themeable UI elements like dbc.Card, which we use for KPI tiles.
* Dash handles the interactivity (via Python + callbacks), while Bootstrap ensures the app looks good on all screen sizes without writing custom CSS.

Think of Bootstrap as the design scaffolding, and Dash as the brain.

---

Then, before we define the layout and logic, we define a few global constants and mappings. These support dynamic tab logic, dropdown filtering, card rendering, and consistent styling across the app.

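We load our datasets, in this case the movie and series datasets (a .csv file and an .xlsx file for each), once at startup. The paths are built with Path(...) / filename, which is cleaner and OS-agnostic. Passing sheet_name=None to pd.read_excel loads every sheet of the workbook into a dictionary of DataFrames keyed by sheet name.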

In [2]:
DATA_DIR = Path("./data") #whatever the path to your data is

MOVIES = pd.read_csv(DATA_DIR / "movie_after_cleaning.csv")
MOVIES_SPLITS = pd.read_excel(DATA_DIR / "splits_movie.xlsx", sheet_name=None)
SERIES = pd.read_csv(DATA_DIR / "series_after_cleaning.csv")
SERIES_SPLITS = pd.read_excel(DATA_DIR / "splits_series.xlsx", sheet_name=None)
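To make the "OS-agnostic" point concrete, here is a minimal sketch of what the `/` operator does. It uses `PurePosixPath` and `PureWindowsPath` (instead of plain `Path`) only so that both flavours can be shown on any machine; the file name is the one used above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same join expression, rendered for two different path flavours.
posix_path = PurePosixPath("data") / "movie_after_cleaning.csv"
windows_path = PureWindowsPath("data") / "movie_after_cleaning.csv"

print(posix_path)    # data/movie_after_cleaning.csv
print(windows_path)  # data\movie_after_cleaning.csv
```

With plain `Path`, Python picks the right flavour for the machine the app runs on, so you never have to type a separator yourself.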

Instead of using if tab == ..., we use dictionaries for clarity and extensibility:

In [3]:
DATA_BY_TAB = {
    "movie": (MOVIES, MOVIES_SPLITS),
    "series": (SERIES, SERIES_SPLITS),
}

So inside the callback we can just do something like:
```
data, splits = DATA_BY_TAB[data_tab]
```

However, if you are not familiar with using a dictionary (or just don't want to :P), you can use a simple if–elif structure like this:
```
if data_tab == "movie":
    data, splits = MOVIES, MOVIES_SPLITS
elif data_tab == "series":
    data, splits = SERIES, SERIES_SPLITS
```
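One thing the dictionary version makes easy is handling a tab value you did not expect. The sketch below is only an illustration: `select_data` and its `default_tab` parameter are hypothetical names, and plain strings stand in for the loaded DataFrames.

```python
# Stand-in values; in the notebook these would be the loaded DataFrames.
DATA_BY_TAB = {
    "movie": ("MOVIES", "MOVIES_SPLITS"),
    "series": ("SERIES", "SERIES_SPLITS"),
}


def select_data(data_tab, default_tab="movie"):
    """Return (data, splits) for a tab, falling back to a default for unknown keys."""
    return DATA_BY_TAB.get(data_tab, DATA_BY_TAB[default_tab])


data, splits = select_data("series")
fallback = select_data("not_a_tab")  # unknown key, so we get the movie data

print(data, splits)  # SERIES SERIES_SPLITS
print(fallback)      # ('MOVIES', 'MOVIES_SPLITS')
```

Adding a third dataset later would then mean adding one dictionary entry, rather than another `elif` branch.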

Likewise, we define which builder function to use per analysis tab:

In [4]:
VISUALIZATION_BUILDERS = {
    "overview": (dash1.generate_visualizations, 4),
    "content_creators": (dash2.generate_visualizations, 4),
    "parental": (dash3.generate_visualizations, 2),
    "year": (dash4.generate_visualizations, 2),
}

Here, dashX.generate_visualizations refers to the function generate_visualizations defined in the corresponding .py file (dash1, dash2, ...). Each dictionary entry pairs that function with the number of figures it returns, so we can look both up by tab name later.

This way, every builder takes (data, splits) and returns n figures. With a little extra code in the callback, this lets us catch errors early, so that one faulty line of code doesn't bring down the whole dashboard.
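What that "extra code in the callback" could look like is sketched below. This is an assumption about one possible approach, not the notebook's actual callback: `safe_build`, `build_overview`, and `build_broken` are hypothetical stand-ins, and the "figures" are plain strings so the example runs without Dash or Plotly.

```python
def build_overview(data, splits):
    """Stand-in builder: returns four placeholder 'figures'."""
    return ["fig_a", "fig_b", "fig_c", "fig_d"]


def build_broken(data, splits):
    """Stand-in builder that fails, to show the error path."""
    raise ValueError("column 'year' not found")


# Same shape as VISUALIZATION_BUILDERS: tab name -> (builder, expected figure count)
BUILDERS = {
    "overview": (build_overview, 4),
    "year": (build_broken, 2),
}


def safe_build(tab, data, splits):
    """Return (figures, error_message); exactly one of the two is None."""
    builder, expected = BUILDERS[tab]
    try:
        figures = builder(data, splits)
    except Exception as exc:  # broad on purpose: any builder failure becomes a message
        return None, f"Could not build '{tab}': {exc}"
    if len(figures) != expected:
        return None, f"'{tab}' returned {len(figures)} figures, expected {expected}"
    return figures, None


figures, error = safe_build("overview", data=None, splits=None)
broken_figures, broken_error = safe_build("year", data=None, splits=None)

print(figures)       # ['fig_a', 'fig_b', 'fig_c', 'fig_d']
print(broken_error)  # Could not build 'year': column 'year' not found
```

In a real callback, the error message could be shown in the page instead of the graphs, so a bug in one tab leaves the other tabs usable.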

Then, we call get_constants(...) once and store the KPI values globally:

In [5]:
# Top-level stats
NUM_WORKS, NUM_COUNTRIES, NUM_LANGUAGES, AVG_VOTES = get_constants(
    MOVIES, SERIES, MOVIES_SPLITS, SERIES_SPLITS
)

We use these numbers as a display at the top of the dashboard in the stat cards. By computing them once, we avoid recalculating every time the dashboard refreshes. _(KPI values here means Key Performance Indicators: summary statistics that give users a quick, at-a-glance overview of the dataset.)_

_**How do we build such a helper function?**_

Let’s quickly look at how we define a helper function, which is a vital part of a clean back-end structure that keeps your head from exploding when coding such a dashboard! In this example (you can of course do this differently), the helper function lives in _const.py_ and calculates the four KPI values used in the dashboard. These values are shown in the stat cards at the top of the app layout.

I will copy the entire function here, just as you would put it in a .py file; the comments in the code explain what each step does:

In [6]:
import pandas as pd

def get_constants(movies, series, movies_splits, series_splits):
    """
    Return four key KPI values for the dashboard:
    1. Total number of works (movies + series)
    2. Total unique countries represented
    3. Total unique languages represented
    4. Average votes (integer) across movies and series
    """

    # Count total rows in both datasets
    # If movies has 120,000 entries and series has 50,000, this returns 170000
    num_of_works = len(movies) + len(series)

    # Extract all country names from both movie and series splits
    # Then concatenate into one Series and count unique values
    countries = pd.concat(
        [movies_splits["country"]["country"], series_splits["country"]["country"]],
        ignore_index=True,  # Ensures clean index after concat
    )
    num_of_countries = countries.nunique()

    # Same for spoken languages: merge both columns, then count unique entries
    languages = pd.concat(
        [movies_splits["language"]["language"], series_splits["language"]["language"]],
        ignore_index=True,
    )
    num_of_lang = languages.nunique()

    # Average the two per-dataset means of the "votes" column
    # We don't pool the dataframes first, so the larger dataset doesn't dominate the result
    avg_votes = int((movies["votes"].mean() + series["votes"].mean()) / 2)

    # Return a tuple of all four KPI values
    return num_of_works, num_of_countries, num_of_lang, avg_votes
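To see why averaging the two means differs from pooling all titles first, here is a small worked example. The vote counts are made-up toy numbers, not values from the IMDb data.

```python
from statistics import mean

# Toy vote counts (made-up numbers, not from the IMDb data).
movie_votes = [100, 200, 300, 400]  # four movies, mean 250
series_votes = [1000, 2000]         # two series, mean 1500

# What get_constants does: average the two per-dataset means.
mean_of_means = int((mean(movie_votes) + mean(series_votes)) / 2)

# The alternative: pool every title into one list first.
pooled_mean = int(mean(movie_votes + series_votes))

print(mean_of_means)  # 875
print(pooled_mean)    # 666
```

Neither number is wrong; they answer different questions. The mean of means gives each dataset equal weight, while the pooled mean gives each title equal weight, so the larger dataset pulls it toward its own average.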


Then, as we saw, we can simply call it once in app.py when the app starts:
```
from src.const import get_constants

NUM_WORKS, NUM_COUNTRIES, NUM_LANGUAGES, AVG_VOTES = get_constants(
    MOVIES, SERIES, MOVIES_SPLITS, SERIES_SPLITS
)
```

As a quick intermission: a handy way to keep dropdowns fast and responsive in Dash is to include only the first 3,300 titles:

In [7]:
MAX_OPTIONS_DISPLAY = 3_300
DROPDOWN_OPTIONS = {
    "movie": [{"label": t, "value": t} for t in MOVIES["title"][:MAX_OPTIONS_DISPLAY]],
    "series": [{"label": t, "value": t} for t in SERIES["title"][:MAX_OPTIONS_DISPLAY]],
}

This avoids crashing the browser with too many options, especially on slower machines.
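The same pattern on a plain list, as a minimal sketch with a tiny cap and made-up titles, shows what the slice does and that it is safe even when there are fewer titles than the cap:

```python
MAX_OPTIONS_DISPLAY = 3  # tiny cap for the demo; the notebook uses 3_300

titles = ["Alien", "Brazil", "Casablanca", "Dune", "Eraserhead"]  # made-up sample

# Only the first MAX_OPTIONS_DISPLAY titles become dropdown options.
options = [{"label": t, "value": t} for t in titles[:MAX_OPTIONS_DISPLAY]]

# Slicing past the end does not raise; it just returns what is there.
all_options = [{"label": t, "value": t} for t in titles[:100]]

print(len(options))      # 3
print(options[0])        # {'label': 'Alien', 'value': 'Alien'}
print(len(all_options))  # 5
```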

Lastly, again using dictionaries, we define all the core styles we will use. I annotated each line to give you an idea of what every part does, but I recommend looking up Dash/Bootstrap tutorials or CSS references to find the exact properties you need for your aesthetic goals:

In [8]:
# Define the brand yellow once, so we can reuse it everywhere
BRAND_COLOR = "#deb522"

# Style settings for the KPI cards at the top of the dashboard
CARD_STYLE = {
    "paddingBlock": "10px",          # Adds vertical padding (top + bottom)
    "backgroundColor": BRAND_COLOR,  # Sets the card background to our yellow
    "border": "none",                # Removes the default card border
    "borderRadius": "10px",          # Rounds the corners slightly
}

# Style for tab buttons (when they are *not* selected)
TAB_STYLE_IDLE = {
    "borderRadius": "10px",          # Rounded corners
    "padding": 0,                    # No inner spacing
    "marginInline": "5px",           # Horizontal spacing between tabs
    "display": "flex",               # Enables centering with flexbox
    "alignItems": "center",          # Vertically centers the text/icon
    "justifyContent": "center",      # Horizontally centers the text
    "fontWeight": "bold",            # Makes the tab label bold
    "backgroundColor": BRAND_COLOR,  # Brand yellow tab background
    "border": "none",                # Removes default borders
}

# Style for the *selected* tab
# We reuse the idle style, but add an underline
TAB_STYLE_ACTIVE = {
    **TAB_STYLE_IDLE,                # Unpack all the properties from TAB_STYLE_IDLE
    "textDecoration": "underline"   # Add an underline to show it’s selected
}
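If the `**` unpacking is new to you, this small sketch shows the two properties we rely on: the base dictionary is left untouched, and keys listed after the unpacking win. The style values are shortened for the demo, and the `"border"` override is a hypothetical addition just to show the ordering rule.

```python
TAB_STYLE_IDLE = {"fontWeight": "bold", "border": "none"}

# Keys listed after the unpacked base override it; the base itself is not modified.
TAB_STYLE_ACTIVE = {
    **TAB_STYLE_IDLE,
    "textDecoration": "underline",
    "border": "1px solid black",  # hypothetical override, for illustration only
}

print(TAB_STYLE_IDLE)
# {'fontWeight': 'bold', 'border': 'none'}
print(TAB_STYLE_ACTIVE)
# {'fontWeight': 'bold', 'border': '1px solid black', 'textDecoration': 'underline'}
```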

---

### More helper functions

To keep our app.layout and callback code clean and readable, we define two small helper functions: one for KPI cards, and one for wrapping graphs into a layout.

Let’s look at them in detail:

In [9]:
def stats_card(title: str, value, img: str) -> html.Div:
    """Single KPI card."""
    return html.Div(
        dbc.Card(
            [
                # Top image (e.g., country flag or language icon)
                dbc.CardImg(src=img, top=True, style={"width": "50px", "alignSelf": "center"}),

                # Text content inside the card body
                dbc.CardBody(
                    [
                        # The actual number, styled boldly
                        html.P(value, style={"margin": 0, "fontSize": "22px", "fontWeight": "bold"}),

                        # The label under the number (e.g., "Languages")
                        html.H4(title, style={"margin": 0, "fontSize": "18px", "fontWeight": "bold"}),
                    ],
                    style={"textAlign": "center"},  # Center everything inside the card body
                ),
            ],
            style=CARD_STYLE,  # Use our predefined yellow + rounded card style
        )
    )


So why use a function for this, and not just code it into our callbacks?
If you were to repeat the same code for all 4 KPI cards (the values we discussed above, which the const.py helper function computes), you would be violating the DRY ("Don't Repeat Yourself") principle. The 4 KPI cards are visually identical, except for their titles, values, and icons. So instead of repeating the same layout 4 times, we just call this function with different arguments.

This also helps with central styling. I.e., if you ever want to change how all the cards look, you only need to update stats_card() once. This practice goes for all repeatable code, of course!

This function above returns a full html.Div containing a dbc.Card, which you can insert directly into your layout using dbc.Col(). We'll see more about that later.

Now for the second one:

In [10]:
def wrap_figures(figures) -> html.Div:
    """Lay out a list of Plotly figures in a 2-column grid."""
    return html.Div(
        [
            html.Div(dcc.Graph(figure=fig), style={"width": "50%", "display": "inline-block"})
            for fig in figures
        ]
    )

This function is useful for our use case because every visualisation function we define returns a list of Plotly figures (usually 2 or 4).

Dash does not automatically lay these out nicely (that would be too easy, I guess...). So, instead of laying each one out manually (which is a chore, believe me), we wrap them using this function!

It auto-formats them in a responsive 2-column grid using width: 50% and display: inline-block. So whether we have 2 or 4 graphs, they always show up neatly in two columns. You are welcome to use something similar for your dashboard: whether you want 2 columns like this, or 4, or 8, just alter the code accordingly.
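As one way to "alter the code accordingly", the cell width can be computed from the number of columns instead of being hard-coded. This is a minimal sketch; `cell_style` is a hypothetical helper, not part of the original project.

```python
def cell_style(columns: int) -> dict:
    """Inline style for one grid cell when figures are laid out in `columns` columns."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # Each cell gets an equal share of the row width.
    return {"width": f"{100 / columns:.2f}%", "display": "inline-block"}


print(cell_style(2))  # {'width': '50.00%', 'display': 'inline-block'}
print(cell_style(4))  # {'width': '25.00%', 'display': 'inline-block'}
print(cell_style(3))  # {'width': '33.33%', 'display': 'inline-block'}
```

Inside wrap_figures, you could then give the function a `columns` argument and pass `style=cell_style(columns)` to each inner html.Div.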