# Data Modelling in Power BI

> Data modelling in Power BI is the process of designing and structuring data so that it is ready for analysis and reporting. Using Power BI's intuitive interface, users can define how different data sources relate, define new columns derived from existing columns, and generate aggregated values, known as measures. These processes put the data into a format that can be used effectively when generating visuals and reports.


## Motivation

By learning how to perform data modelling tasks in Power BI, you will be able to:

- Effectively organise complex data, making it easier to work with and to generate more complex and insightful visuals
- Design custom measures using the DAX (Data Analysis Expressions) language to generate powerful insights
- Ensure data integrity by creating and managing relationships between various tables




## The Model View

> The **Model View** in Power BI Desktop is a visual interface that allows you to manage and observe relationships between tables and fields in your data model. A relationship is where two or more tables are linked together because they contain related data. This enables users to run queries for related data across multiple tables.


A data model inside Power BI comprises the following elements:

### Tables

> Your data model is typically composed of multiple tables, arranged into some form of schema. These tables appear as rectangular objects in the **Model View**. Each table lists its individual fields.

<img src="images/table.png"  width="400" > <br>
<br>

### Fields 

> Fields are individual columns or aggregations within a table. For example, a table about customers might include fields like `Name`, `Email Address`, and `Purchase History`.

<img src="images/field.png"  width="400" > <br>
<br>

### Relationships

> Relationships are links between tables that allow you to combine and compare data from different sources. For instance, you might link a `Sales` table to a `Products` table using a common `[Product ID]` field. In the **Model View**, these relationships are shown as lines connecting the relevant tables.

<img src="images/relationship.png"  width="400" height="200" > <br>
<br>

### Calculated Columns and Measures

Within the **Model View**, you can add *calculated columns* and *measures* to your tables. Measures are calculations or aggregations that are performed on the data in the table, such as sums, averages, or more complex formulas. A measure returns a single aggregated value, whereas a calculated column holds one value for each row in the table. Calculated columns and measures become fields in their respective tables.
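As a rough illustration of the difference, the sketch below defines one calculated column and one measure in DAX. The `Sales` table and its `Quantity` column are assumed names used only for this example, and the threshold of 10 is arbitrary. Each definition would be entered separately in the formula bar.

```DAX
-- Calculated column: evaluated once for every row of Sales and stored in the model
Order Size = IF ( Sales[Quantity] >= 10, "Bulk", "Standard" )

-- Measure: evaluated when a visual requests it, returning one aggregated value
Total Quantity = SUM ( Sales[Quantity] )
```

The calculated column adds a new value to each row, while the measure has no per-row values at all until it is placed in a visual.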

### Formatting
**Model View** allows you to format your model, including renaming fields or tables, hiding fields, changing data types, etc.



## Model Schemas

When crafting a data model in Power BI, a typical first step is to determine the architecture that underpins how data is organised and related. This architecture is known as the model *schema*. At the heart of this architecture are the concepts of fact and dimension tables, which delineate the types of data being dealt with and offer a framework for structuring data optimally.

### Fact Tables

Fact tables are the core of most data models, primarily containing measurable, quantitative data that a business might track. This could range from sales figures to web traffic counts. Each row of a fact table is typically an event of some sort, such as an order or a log entry.

### Dimension Tables

Dimension tables predominantly house descriptive data which provide context to the data in the fact tables, such as product details, customer profiles, or time frames. They are typically related to the fact tables by a common column, usually an ID, which acts as the primary key in the dimension table and the foreign key in the fact table. 

By segregating data into fact and dimension tables, the model not only streamlines data organisation but also optimises query performance, since it can quickly aggregate numbers from fact tables and associate them with descriptive context from dimension tables.  



## Star and Snowflake Schemas

*Star* and *snowflake* schemas are two of the most common schemas you will encounter. You have encountered the **star schema** in a previous lesson. It is a straightforward design, with a central fact table surrounded by dimension tables. Each dimension table directly connects to the fact table, forming a pattern reminiscent of a starburst. 

Building on the foundation of the star schema, the **snowflake** design introduces an added layer of complexity by normalising the dimension tables. This normalisation breaks down the dimension tables further into sub-dimensions, leading to a more branched, "snowflake-like" appearance. The primary motivation behind this schema is to reduce redundancy by presenting data in its most granular form.




## Relationships and Filtering

*Relationships* in Power BI determine how your tables interact. They function similarly to primary and foreign keys in an SQL database, with added nuances in the context of *filtering*.

>Filtering is one of the core concepts in Power BI, allowing you to restrict the values in one table based on the context of another. This is achieved by defining relationships between the tables, which allows filters to act on interconnected data. 

Consider the following example with two tables: `Orders` and `Products`.

#### Products:

| product_ID | Product_Name  | Product_Category |
|------------|---------------|------------------|
| 1          | Running Pro   | Shoes            |
| 2          | Frog Optima   | Dive Mask        |
| ...        | ...           | ...              |

#### Orders:

| Order_ID | product_ID | Order_Date  | Quantity |
|----------|------------|-------------|----------|
| A1       | 1          | 01/01/2023  | 2        |
| A2       | 2          | 01/02/2023  | 1        |
| ...      | ...        | ...         | ...      |

Using the `product_ID`, you can establish a **relationship** between the two tables. This linkage lets you filter data in the `Orders` table based on attributes in the `Products` table. For example, we could filter the rows of the `Orders` table based on the product category. Filtering is used extensively when creating visualisations and building reports, and we will discuss it in more detail later.
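To make the idea concrete, here is a minimal DAX sketch using the two example tables above. It assumes a one-to-many relationship already exists from `Products[product_ID]` to `Orders[product_ID]`; the measure names are illustrative.

```DAX
-- Total units across whichever Orders rows are currently visible
Total Quantity = SUM ( Orders[Quantity] )

-- Units for products in the "Shoes" category only. The filter is placed on
-- Products and reaches Orders through the product_ID relationship
Shoes Quantity =
CALCULATE (
    SUM ( Orders[Quantity] ),
    Products[Product_Category] = "Shoes"
)
```

Without the relationship, the category filter on `Products` would have no way to restrict the rows of `Orders`.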



### Types of Relationship in Power BI

1. **One to One (1:1)**: In this relationship, each row in a table corresponds to one, and only one, row in another table. While less common, it's essential when tables have unique key columns that align perfectly.

2. **One to Many (1:N)**: A prevalent relationship in Power BI. Here, one row in a table (often the dimension table) relates to one or more rows in another table (typically the fact table). For instance, one product in a `Products` table might have several associated orders in an `Orders` table.

3. **Many to One (N:1)**: The reverse of the 1:N relationship, where multiple rows in one table associate with a single row in another table.

4. **Many to Many (N:N)**: A more intricate relationship, where multiple rows in one table can relate to multiple rows in another. This relationship type should be used judiciously, as it can complicate the data model and potentially impact performance.

### Filter Direction

An essential property of relationships in Power BI is the filter direction. It defines how filters, applied to one table, propagate to another table through a relationship. There are two main directions:

- **Single Direction**: This is the most common setting. Filters from the primary (or "one" side) table flow to the related (or "many" side) table. For instance, in a **1:N** relationship between a `Products` table and an `Orders` table, a filter applied to `Products` would influence the data shown from `Orders`. However, the reverse isn't true: filtering `Orders` doesn't impact `Products`.

- **Bidirectional**: Here, filters applied to one table can affect the other table and vice versa. This setting is prevalent in many-to-many relationships or specific scenarios where cross-filtering between tables is required. However, caution is necessary when using bidirectional filtering, as it can introduce ambiguity into the data model and can make reports less performant.
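One way to limit the risks of bidirectional filtering is to enable it for a single calculation rather than on the relationship itself. The sketch below uses the DAX `CROSSFILTER` function for this. It assumes the `Products` and `Orders` tables from the earlier example, with an existing single-direction relationship on `product_ID`; the measure name is illustrative.

```DAX
-- Counts the products that appear in the currently visible orders.
-- CROSSFILTER lets filters on Orders flow back to Products for this
-- measure only, leaving the relationship itself set to Single
Products Ordered =
CALCULATE (
    DISTINCTCOUNT ( Products[product_ID] ),
    CROSSFILTER ( Orders[product_ID], Products[product_ID], BOTH )
)
```

With this approach, a filter on `Order_Date` would restrict which products are counted, even though filtering `Orders` does not normally affect `Products`.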


## Data Analysis Expressions (DAX)

> *DAX* is a formula language in Power BI that enables users to perform data manipulation and calculations. It is used for a variety of data modelling functions as well as for building interactive visualisations. 

Some of the functions that can be performed with DAX expressions include:

- Creating **calculated columns** and **measures** to derive new data and metrics from existing tables
- Working with **relationships** between tables, for example looking up a value from a related table
- Creating **calculated tables**, which are entire new tables made of calculated columns
- Building interactive visualisations

We will learn about DAX in a lot more detail in a later lesson, but for now, just be aware that the calculated columns, measures, and calculated tables you create in the **Model View** are all defined by DAX expressions, just as the M language underpins any UI changes made in Power Query.

## Load Your Project

In the lesson on the Power Query Editor, we created a Power BI project called `Power_BI_demo_session.pbix`. We will use this project to demonstrate some of the features of **Model View**. Open this project inside Power BI Desktop to follow along with the next sections of the lesson. If you have not done this yet and wish to follow along, please complete that lesson before continuing.

## Hiding and Formatting Fields in **Model View**

### Hiding Fields

>Hiding fields in the Power BI Desktop **Model View** can streamline and de-clutter your reports by removing irrelevant or redundant data, thus enhancing the user's focus on the most essential information.

To hide a column, navigate to the **Model View** and find the table containing the column you wish to hide. Right-click on the column's header and select **Hide in report view**. The column will still be visible in the **Model View** and can be used in calculations, but will not be visible in the **Report View**, making your reports cleaner and less confusing.

As an example, let's hide the `Region` field in `dim_customer`:

- Go to **Model View** and right-click the `Region` field
- Select **Hide in report view** from the menu

### Formatting Fields

> Field formatting involves altering the appearance and data types of fields, such as setting a numeric field to display as currency or a date field to a specific date format, to improve data readability and interpretation. We have already encountered formatting in Power BI Desktop during the lesson on Power Query Editor, but it can also be accomplished in **Model View**.

To format a column in **Model View**, again locate the column in its table. Left-click on the column's header and you will see its properties in a pane on the right of the screen, in which you can change the data type and format. For instance, for a numeric column, you could set the format to be currency, percentage, or whole number, among others. 

These steps can help you customise the **Model View** to best suit your needs, facilitating better understanding and usability of your data.

As an example, let's make sure the `date` fields in our `Sales` fact table and `dim_datetime` are both of the `date` datatype.

- In **Model View**, highlight the `date` field in the `Sales` table
- Left click on it
- Change the data type in the **Data Type** dropdown on the right of the screen
- Click **Yes** in the resulting dialog box
- Repeat the same process for the `date` field in `dim_datetime`


It will also be helpful for a later section of this lesson if you convert the `month_number` field in `dim_datetime` to `Whole Number`.

## Managing Relationships in Model View

<img src="images/model_view.png"  width="700" > <br>
<br>


### Why We Need Relationships

Before we start editing relationships in our data model, it will be helpful to do a quick demonstration of why relationships between tables matter. Note that this will use the **Report View** tab, which we will cover in more detail in a future lesson. Don't worry about the details too much for now, just follow along with the instructions.

- Go to the **Report View** tab in the left-hand pane of the Power BI workspace
- Select **Clustered Column Chart** from the **Visualisations** pane on the right
- Drag the `Day of Week` field from `dim_datetime` into the **X-axis** field in the **Visualisations** pane
- Drag the `CustomerID` field from the `Sales` table into the **Y-axis** field

You will now see a column chart of `Day of Week` vs the number of sales, but every value will be the same. This is because there is no link between the `Date` field of the `Sales` table and the `Date` field of the `dim_datetime` table. Power BI creates the column values by filtering the `Sales` table by the `Day of Week` field of the `dim_datetime` table, and because there is no relationship between the two tables, the filter cannot flow from one table to the other.

<img src="images/graph_notworking.png"  width="400" > <br>

### Creating a Relationship

Now navigate to the **Model View** pane.

<img src="images/find_model_view.png"  width="300" > <br>

Inside we can see the schema of our dataset. The first thing to notice is that Power BI has already understood the relationships between the dim tables we created from our initial `Sales` table, and has established the relationships for us. They are visible as lines linking the two tables.

<img src="images/prelink_datamodel.png"  width="700" > <br>

We need to create a new relationship between the `Date` field of the `Sales` table, and that of the `dim_datetime` table.

- In the **Model View**, click on the `Date` field in the `Sales` table, and drag it over the `Date` field in `dim_datetime`
- A new relationship will be created. Click on the line in the schema diagram to bring up the properties in the right-hand pane.
- Make sure that the direction is from `Sales` to `dim_datetime`, the relationship type is **Many to one**, and the cross-filter direction is **Single**

When we return to the **Report View**, we should see that our graph is now displaying the correct information.
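If the chart still looks wrong, one possible sanity check is a measure that counts the `Sales` rows with no matching date in `dim_datetime`. This is only a sketch: it assumes the relationship on `Date` described above has been created, and the measure name is illustrative.

```DAX
-- Counts Sales rows whose Date has no matching row in dim_datetime.
-- RELATED follows the many-to-one relationship and returns blank
-- when no matching row exists
Unmatched Sales Rows =
COUNTROWS (
    FILTER ( Sales, ISBLANK ( RELATED ( dim_datetime[Date] ) ) )
)
```

A blank result means every row found a match. Any number greater than zero suggests the two `Date` columns differ in data type or range.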




## Managing Relationships in the **Manage Relationships** Tab

As an alternative to the graphical UI provided by the **Model View** tab, it is also possible to view the relationships in your data model as a list. To access this alternative view, navigate to the `Home` tab of the Ribbon and click on **Manage Relationships**. This will open a **Manage Relationships** dialog box that represents your relationships in a list form rather than a graphical illustration. 

In this dialog box, the **Autodetect** function can identify relationships in newly added or revised data. Choose **Edit** to change a relationship manually. The **Edit** dialog also contains the advanced settings, where you can define the Cardinality and Cross-filter direction of each relationship.

<img src="images/manage_relationships_tab.png"  width="700" > <br>

## Creating Calculated Tables

> *Calculated tables* are tables created by defining and executing a DAX formula. Unlike regular tables that are loaded with data from a source, calculated tables are computed and generated within the data model itself. In most cases, you will import data into your model from an external data source. There are a number of uses for calculated tables, however: they can be used for intermediate calculations, to cross join two tables, or to store pre-aggregated data, which can improve performance in certain scenarios by reducing calculation complexity in visuals.

To create a calculated table in **Model View**:

- Go to the `Modeling` tab on the ribbon
- Click on the `New Table` button. This will open a formula bar at the top
- In the formula bar, write your DAX formula that defines the calculated table
- Once you have written your DAX formula, press Enter
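As a minimal sketch of the pre-aggregation use case, the formula below builds a calculated table holding one row per customer with their total quantity ordered. It assumes the `Sales` table has `CustomerID` and `Quantity` columns; the table name `quantity_by_customer` is illustrative.

```DAX
-- One row per CustomerID, with the summed Quantity stored alongside it
quantity_by_customer =
SUMMARIZECOLUMNS (
    Sales[CustomerID],
    "Total Quantity", SUM ( Sales[Quantity] )
)
```

Because the totals are computed when the model refreshes, visuals built on this table do not need to aggregate the full `Sales` table each time.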

Let's say we wanted to calculate the year-to-date revenue for each order in our fact table. A first step might be to create a calculated table which adds the `Unit Price` and `Unit Cost` fields from `dim_product` to the existing `Sales` fact table. We don't want to store our data like this in the database, as it would create redundancy and violate our star schema. Instead, we can create the table as part of our model, as follows:

- In the **Model** or **Data** view, select the **New Table** option from the **Table Tools** panel in the ribbon
- Enter the following DAX formula into the formula bar

```DAX
revenue_table = 
ADDCOLUMNS(
    'Sales', 
    "Product Price", RELATED(dim_product[Unit Price]),
    "Product Cost", RELATED(dim_product[Unit Cost])
)
```

- Press **Enter** or click the tick icon to commit the formula



## Creating Calculated Columns in Model View

> Calculated columns in Power BI Desktop are columns that you add to existing tables in your data model, where the column values are computed using a formula that applies to each row in the column. We covered the topic of creating calculated columns in the previous lesson on the `Power Query Editor`. In that case the columns were created using M, but you can also use the **Model View** to create calculated columns, in which case they are generated from a DAX expression instead.

To create a calculated column in the **Model View** of Power BI Desktop, start by navigating to the model view. Here, select the table to which you want to add a calculated column. Right-click on the column headers and then select `New column` from the context menu that appears. This action will open a formula bar at the top. Now, you can write your DAX expression, which will be used to calculate the values in the new column. When you finish writing your expression, press **Enter**. Power BI will evaluate the expression for each row in the table, generating a value for the calculated column.

### Add a `revenue` column

To work towards the year-to-date revenue figure, let's first add a `revenue` column to the calculated table:

- With the `revenue_table` highlighted in **Model View**, click **New Column** in the **Table Tools** tab of the ribbon
- Add the following formula:

```DAX
revenue = revenue_table[Quantity] * revenue_table[Product Price]
```

Don't worry about the DAX syntax at this stage; we will cover it in a later lesson. 


## Creating Measures

>In Power BI, *measures* are calculations used in data analysis that are created using Data Analysis Expressions (DAX). They are defined calculations on your data that are performed at the time of your query. Measures are calculated as you interact with your reports and aren't stored in your database.

You can create measures in both the **Model View** and the **Report View** in Power BI Desktop. To create a measure in the **Model View**, you can right-click on the table where you want the measure to reside and select **New Measure**. Then, you can enter your DAX formula in the formula bar. 

In the **Report View**, you can select **New Measure** from the **Home** tab in the ribbon, and enter your DAX formula in the formula bar. Note that once created, all measures are available across all views: **Report**, **Data**, and **Model**.



We can now create a measure of year-to-date revenue, based on the calculated table we created previously:

- Click `New Measure` in the `Table Tools` tab of the ribbon
- Enter the following DAX expression:


```DAX
YTD Revenue = TOTALYTD(SUM(revenue_table[revenue]), revenue_table[Date])
```
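For comparison, a measure can also compute revenue directly from the `Sales` table, without the intermediate calculated table. The sketch below is one possible alternative rather than the approach used in this lesson. It assumes the relationship between `Sales` and `dim_product` that Power BI detected earlier, and the measure name is illustrative.

```DAX
-- Iterates over the visible Sales rows, multiplying each row's Quantity
-- by the Unit Price looked up from the related dim_product row,
-- then sums the results
Total Revenue =
SUMX (
    Sales,
    Sales[Quantity] * RELATED ( dim_product[Unit Price] )
)
```

Because the calculation runs at query time, nothing extra is stored in the model, at the cost of more work each time a visual is rendered.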

## Sorting Visualisation Data

>The **Sort by Column** feature in the **Model View** can be used to sort visualisation data by another field, such as sorting `month_name` (January, February, etc.) by `month_number` (1, 2, etc.), to ensure that the data are presented in a logical and chronological order, rather than being sorted alphabetically.

For example, in the graph of orders by month pictured below, the months are ordered alphabetically rather than logically. You can create this figure for yourself by replacing `Day of Week` with `month` in the **X-axis** field of the figure you created earlier. 

<img src="images/o_by_month_unsorted.png" width="600"> <br> <br>

We can fix this issue by sorting by the `month_number` column. To achieve this in **Model View**, left-click on the `month` column to highlight it, and in the `Properties` pane, open the `Advanced` subpanel. Then select the `month_number` column in the `Sort by column` dropdown.

Returning to **Report View**, we can see the figure has been corrected. If the months are still in the wrong order for you, then make sure you have converted `month_number` to the `Whole Number` data type!

<img src="images/o_by_month_sorted.png" width="600" > <br> <br>

## Defining Hierarchies

> Hierarchies in Power BI Desktop are a way to organize data attributes into a specific order, often to facilitate drill-down capabilities in reports and visuals. Hierarchies can be especially useful when working with time-based data or geographic data, as they allow users to view data at varying levels of granularity. For instance, a geographic hierarchy might start at the country level, then drill down to state, city, and finally to a specific address.

To define a hierarchy in the **Model View** of Power BI Desktop, follow these steps:

1. Navigate to the **Model View**.
2. In the **Data** pane, locate the table where you want to define the hierarchy.
3. **Right-click** on the column you wish to use as your top level in the hierarchy, and select **Create hierarchy** from the context menu.
4. By default, Power BI will create a hierarchy with a single level. You can rename this level and then start adding columns or fields to it.
5. To add more levels, drag and drop columns from the table (or even other related tables) onto the hierarchy. Ensure you arrange them in the desired order of granularity.


<img src="images/hierarchy.gif" width="1000" > <br> <br>

This hierarchy now allows users to drill down from a country level view all the way to specific addresses in their Power BI reports.

Hierarchies can be used in various visuals like maps, charts, and tables to provide a structured drill-down experience. We'll explore how to leverage these hierarchies in visuals in upcoming lessons.


## Key Takeaways
- Data modelling is the process of creating relationships between the different data your business collects, so that it is organised in the correct way for your analysis
- **Model View** is used to manage table and field relationships for efficient querying of related data across multiple tables
- Relationships are links between tables that allow you to combine and compare data from different sources
- DAX (Data Analysis Expressions) is a formula language in Power BI used for data manipulation, calculations, and analysis
- Calculated tables are computed and generated within the data model itself, via a DAX formula
- Measures are calculations used in data analysis that are created using a DAX expression