Creating a great data model is one of the most important tasks that a data analyst can perform in Microsoft Power BI. By doing this job well, we help make it easier for people to understand our data, which will make building valuable Power BI reports easier for them and for us.

A good data model offers the following benefits:

* Faster data exploration
* Simpler aggregation building
* More accurate reporting
* Quicker report writing
* Easier report maintenance

Providing set rules for what makes a good data model is difficult because all data is different, and the usage of that data varies. Generally, a smaller data model is better because it performs faster and is simpler to use. However, defining what a smaller data model entails is equally problematic because "smaller" is a heuristic, subjective concept.

Typically, a smaller data model consists of fewer tables and fewer columns in each table that the user can see. If we import all necessary tables from a sales database and the total comes to 30 tables, the user won't find that intuitive; collapsing those tables into five will make the data model easier to grasp. Likewise, a user who opens a table and finds 100 columns might find it overwhelming. Removing unneeded columns to leave a more manageable number increases the likelihood that the user will read all the column names. To summarize, we should aim for simplicity when designing our data models.

The following image is an example data model. The boxes contain tables of data, where each line item within the box is a column. The lines that connect the boxes represent relationships between the tables. These relationships can be complex, even in such a simple model. The data model can easily become disorganized, and the total table count in the model can gradually increase. Keeping a data model simple, comprehensive, and accurate requires constant effort.

![image.png](attachment:image.png)

Relationships are defined between tables through primary and foreign keys. A primary key is one or more columns whose values are unique and non-null, so that each value identifies exactly one row. For instance, if we have a Customers table, we could have an index that identifies each unique customer. The first row will have an ID of `1`, the second row an ID of `2`, and so on.

Because each row is assigned a unique value, we can refer to the whole row by that single value: the **primary key**. This becomes important when we reference rows from a different table, which is what foreign keys do: a foreign key is a column in one table that stores primary key values from another table. Relationships between tables are formed when a foreign key in one table matches the primary key of another.

Power BI allows relationships to be built from tables with different data sources, a powerful function that enables us to pull one table from Microsoft Excel and another from a relational database. We would then create the relationship between those two tables and treat them as a unified dataset.

Now that we have learned about the relationships that make up the data schema, we can explore a specific type of schema design, the **star schema**, which is optimized for high performance and usability. We'll get to that next, but first, let's load some sample data provided by Microsoft.

We can design a star schema to simplify our data. It's not the only way to simplify our data, but it's a popular method; therefore, every Power BI data analyst should understand it. In a **star schema**, each table within our dataset is defined as a dimension or a fact table, as we see in the following visual.

![image.png](attachment:image.png)

**Fact tables** contain observational or event data values:

* Sales orders
* Product counts
* Prices
* Transactional dates and times
* Quantities

Fact tables can contain several repeated values. For example, one product can appear multiple times in multiple rows, for different customers on different dates. These values can be aggregated to create visuals. For instance, a visual of the total sales orders is an aggregation of all sales orders in the fact table. With fact tables, it is common to see columns that are filled with numbers and dates. The numbers can be units of measurement, such as sale amount, or they can be keys, such as a customer ID. The dates represent time that is being recorded, like order date or shipped date.

**Dimension tables** contain the details about the data in fact tables: products, locations, employees, and order types. These tables are connected to the fact table through key columns and are used to filter and group the data in fact tables. In contrast to fact tables, dimension tables contain unique values: for instance, one row for each product in the Products table and one row for each customer in the Customer table. For the total sales orders visual, we could group the data so that we see total sales orders by product, where product comes from the dimension table.

Fact tables are usually much larger than dimension tables because they record many events, such as individual sales. Dimension tables are typically smaller because the number of items that we can filter and group on is limited. For instance, a year contains only so many months, and the United States consists of only a certain number of states.

Considering this information about fact tables and dimension tables, we might wonder how we can build a visual of sales orders by employee in Power BI.

The pertinent data resides in two tables, **Employee** and **Sales**, as shown in the following data model.

* Because the **Sales** table contains the sales order values, which can be aggregated, it is considered a fact table.
* The **Employee** table contains the specific employee name, which filters the sales orders, so it would be a dimension table.

The common column between the two tables, which is the primary key in the **Employee** table, is `EmployeeID`, so we can establish a relationship between the two tables based on this column.

![image.png](attachment:image.png)

After creating this relationship, we can build the visual according to the requirements, as shown in the following figure. Without this relationship, which rests on the column that the two tables have in common, building the visual would have been more difficult.

![image.png](attachment:image.png)
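As a rough illustration of what the relationship enables, the two DAX definitions below sketch an aggregation over the fact table and a lookup into the dimension table. The column names `SalesAmount` and `EmployeeName` are assumptions made for this sketch; only `EmployeeID` is named above, and the relationship on `EmployeeID` is assumed to exist already.

```dax
-- Measure (New Measure): aggregates the Sales fact table.
-- Sales[SalesAmount] is a hypothetical column name.
Total Sales Orders = SUM ( Sales[SalesAmount] )

-- Calculated column on the Sales table (New Column): follows the
-- relationship on EmployeeID to fetch a value from the dimension table.
-- Employee[EmployeeName] is a hypothetical column name.
Employee Name = RELATED ( Employee[EmployeeName] )
```

With the relationship in place, putting an `Employee` column on a visual next to the measure groups the total by employee. Without it, `RELATED` raises an error and the measure shows the same grand total for every employee.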

Star schemas and the underlying data model are the foundation of organized reports; the more time we spend creating these connections and designs, the easier it will be to create and maintain reports.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

When users see fewer tables, they will enjoy using our data model considerably more. For example, suppose we've imported dozens of tables from many data sources and now the visual appears disorderly. In this case, we need to ensure (before we begin working on building reports) that our data model and table structure are simplified.

A simple table structure will have the following features:

* Simple to navigate because of column and table properties that are specific and user-friendly
* Merged or appended tables to simplify the tables within our data structure
* Good-quality relationships between tables (that make sense)

There are two ways to work with tables:

* Configuring data model and building relationships between tables
* Configuring table and column properties

![image.png](attachment:image.png)

Assuming that we've already retrieved our data and cleaned it in Power Query, we can then go to the **Model** tab, where the data model is located. The following image shows the relationship between the `Orders` and `Sales` tables through the `OrderDate` column.

![image.png](attachment:image.png)

To manage these relationships, go to **Manage Relationships** on the ribbon, where the following window will appear:

![image.png](attachment:image.png)

In this view, we can create, edit, and delete relationships between tables, and also autodetect relationships that already exist. When we load our data into Power BI, the **Autodetect** feature helps us establish relationships between columns that are named similarly. Relationships can be active or inactive. Only one active relationship can exist between any two tables, which we will discuss in a future module.

While the **Manage Relationships** feature allows us to configure relationships between tables, we can also configure table and column properties to ensure organization in our table structure.

Let's inspect the relationship of the tables in the **Sales and Marketing sample report**.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

The **Model** view in Power BI Desktop provides many options within the column properties that we can view or update. A simple way to reach this menu and update several tables or fields is to **Ctrl + click** or **Shift + click** items on this pane. The **Properties** pane is to the right of the **Model** view, near the **Fields** pane.

![image.png](attachment:image.png)

Under the **General** tab of the **Properties** pane, we can do the following:

* Edit the name and description of the column
* Add synonyms that can be used to identify the column when we are using the Q&A feature
* Add a column into a folder to further organize the table structure
* Hide or show the column

Under the **Formatting** tab of the **Properties** pane, we can do the following:

* Change the data type
* Format the date

For instance, suppose that the dates in our column are formatted, as we saw in the previous screenshot, in the form of `Wednesday, March 14, 2001`. If we want to change the format so that the date is in the `mm/dd/yyyy` format, we would select the drop-down menu under **All date formats** and then choose the appropriate date format, as we see in the following figure:

![image.png](attachment:image.png)

**Important:** the **Formatting** tab appears only after we select a specific table and column in the **Fields** pane. If **Formatting** is missing from the **Properties** pane, check that a column is selected there.

After selecting the appropriate date format, return to the `Date` column, where we should see that the format has indeed changed, as we see in the following figure:

![image.png](attachment:image.png)

Under the **Advanced** tab, we can do the following:

* Sort by a specific column
* Assign a specific category to the data
* Summarize the data
* Determine if the column or table contains null values

Additionally, Power BI lets us update these properties on many tables and fields at once by **Ctrl + clicking** or **Shift + clicking** items.

These examples are only some of the many types of transformations that we can make to simplify the table structure. This step is important to take before we begin making our visuals so that we don't have to go back and forth when making formatting changes. We can also do this process of formatting and configuring tables in **Power Query**.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

During report creation in Power BI, a common business requirement is to make calculations based on date and time. Organizations want to know how their business is doing over months, quarters, fiscal years, and so on. For this reason, it's crucial that these time-oriented values are formatted correctly. Power BI autodetects date columns and tables; however, situations can occur in which we need to take extra steps to get the dates in the format that our organization requires.

For example, suppose that we are developing reports for the sales team at our organization. The database contains tables for sales, orders, products, and more. We notice that many of these tables, including Sales and Orders, contain their own date columns, as shown by the `ShipDate` and `OrderDate` columns in the `Sales` and `Orders` tables. We are tasked with developing a table of the total sales and orders by year and month.

How can we build a visual from multiple tables, each with its own date column?

![image.png](attachment:image.png)

To solve this problem, we can create a common date table that can be used by multiple tables. Ways to do this include the following:

* Source data
* Data Analysis Expressions (DAX)
* Power Query

Let's start with Source data.

### Source data

Occasionally, source databases and data warehouses already have their own date tables. If the administrator who designed the database did a thorough job, these tables can be used to perform the following tasks:

* Identify company holidays
* Separate calendar and fiscal year
* Identify weekends versus weekdays

Source date tables are mature and ready for immediate use. If we have such a table, we should bring it into the data model and not use any of the other methods that are outlined in this section. We recommend using a source date table because it's likely shared with other tools that we might be using in addition to Power BI.

If we don't have a source date table, we can use other ways to build a common date table.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

We can use **Data Analysis Expressions (DAX)** functions to build our common date table. DAX is a programming language that is used throughout Microsoft Power BI for creating calculated columns, measures, and custom tables. It's a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values. We'll learn all about DAX in another file, so we won't go into depth just yet, but let's cover the basics of how to use DAX functions to build common date tables.

The `CALENDAR()` function returns a contiguous range of dates based on a start date and an end date that are entered as arguments in the function. Alternatively, the `CALENDARAUTO()` function returns a contiguous, complete range of dates that is determined automatically from our dataset. The range starts at the beginning of the year that contains the earliest date in our dataset and ends at the close of the year that contains the latest date; an optional argument sets the last month of the fiscal year, which shifts those year boundaries accordingly. For the purposes of this example, the `CALENDAR()` function is used because we only want to see the data from **May 31, 2011** (the first day that the sales team began tracking this data) through the end of 2022.

In Power BI Desktop, we go to the **Table** tab on the ribbon. We select **New Table**, and then enter in the following DAX formula:

`Dates = CALENDAR(DATE(2011, 5, 31), DATE(2022, 12, 31))`

![image.png](attachment:image.png)
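For comparison, the `CALENDARAUTO()` alternative described earlier, and a `CALENDAR()` variant that takes its bounds from the data, might look like the sketch below. The table names `Dates Auto` and `Dates From Sales`, and the fiscal year ending in June, are assumptions chosen for illustration.

```dax
-- Calculated table (New Table): whole fiscal years covering every date
-- found in the model. The argument 6 assumes a fiscal year ending in June.
Dates Auto = CALENDARAUTO ( 6 )

-- Calculated table (New Table): bounds taken from the data instead of
-- being typed in. Assumes Sales[OrderDate] is a date column.
Dates From Sales =
CALENDAR ( MIN ( Sales[OrderDate] ), MAX ( Sales[OrderDate] ) )
```

Each definition is entered as its own table. The hard-coded `CALENDAR()` call remains the better fit when the range should extend into the future, beyond the dates that currently exist in the data.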

Now, we have a column of dates that we can use. However, a single column is limiting: we also want to see columns for just the year, the month number, the week of the year, and the day of the week. We can accomplish this task by selecting **New Column** on the ribbon and entering the following DAX formula, which retrieves the year from each date in our **Dates** table.

`Year = YEAR(Dates[Date])`

![image.png](attachment:image.png)

Let's break down the syntax above. `Year` is the new column we created. `YEAR` is one of many built-in functions for working with dates. `Dates` is the table we created with the `CALENDAR()` function, and `Date` is the column of dates in that table. We did not specify the name `Date` ourselves; it was created by default.

`Year = YEAR(Dates[Date])`

`Column_Name = FUNCTION(Table_Name[Column_Name])`

We can perform the same process to retrieve the month number, week number, and day of the week (selecting **New Column** on the ribbon each time):

`MonthNum = MONTH(Dates[Date])`

`WeekNum = WEEKNUM(Dates[Date])`

`DayoftheWeek = FORMAT(Dates[Date], "DDDD")`

When we have finished, our table will contain the columns that we see in the following figure.

![image.png](attachment:image.png)
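The same table can also be produced in a single step. The following is a sketch, not the method used above: it wraps the `CALENDAR()` call in `ADDCOLUMNS()` so that all four extra columns are defined together with the table.

```dax
-- Calculated table (New Table): the date range plus the derived columns.
-- [Date] refers to the column that CALENDAR() generates by default.
Dates =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2011, 5, 31 ), DATE ( 2022, 12, 31 ) ),
    "Year", YEAR ( [Date] ),
    "MonthNum", MONTH ( [Date] ),
    "WeekNum", WEEKNUM ( [Date] ),
    "DayoftheWeek", FORMAT ( [Date], "DDDD" )
)
```

Because this definition also creates a table named `Dates`, it would take the place of the table built step by step, not sit alongside it.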

We have now created a common date table by using DAX. Let's now practice with the **Financial Sample data**.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

We can use **M**, the formula language that is used to build queries in Power Query, to define a common date table.

We select **Transform Data** in Power BI Desktop, which will direct us to Power Query. In the blank space of the left **Queries** pane, we can right-click to open the following drop-down menu, where we'll select **New Query > Blank Query**.

![image.png](attachment:image.png)

In the resulting **New Query** view, enter the following **M** formula to build a calendar table:

`= List.Dates(#date(2011,05,31), 365*10, #duration(1,0,0,0))`

![image.png](attachment:image.png)

For our sales data, we want the start date to reflect the earliest date that we have in our data: May 31, 2011. Additionally, we want to see dates for roughly the next 10 years (`365*10` days), including dates in the future. This approach ensures that, as new sales data flows in, we won't have to re-create this table. We can also change the duration. In this case, we want a data point for every day, but we can also increment by hours, minutes, and seconds. The following figure shows the result.

![image.png](attachment:image.png)

After the query runs, we notice that we have a list of dates instead of a table of dates. To change this, go to the **Transform** tab on the ribbon, and select **Convert > To Table**. As the name suggests, this feature will convert our list into a table. We can also rename the column to `DateCol`.

![image.png](attachment:image.png)

Next, we want to add columns to our new table to see dates in terms of year, month, week, and day so that we can build a hierarchy in our visual. Our first task is to change the column type by selecting the icon next to the name of the column and, in the resulting drop-down menu, selecting the **Date** type.

![image.png](attachment:image.png)

After we have finished selecting the **Date** type, we can add columns for year, months, weeks, and days. For that, we go to **Add Column**, select the drop-down menu under **Date**, and then select **Year**, as shown in the following figure.

![image.png](attachment:image.png)

Notice that Power BI has added a column of all years that are pulled from `DateCol`.

![image.png](attachment:image.png)

Complete the same process for months, weeks, and days. After we have finished this process, the table will contain the columns that we see in the following figure.

![image.png](attachment:image.png)

We have now successfully used Power Query to build a common date table.

The previous steps show how to get the table into the data model. Now, we need to mark our table as the **official date table** so that Power BI can recognize it for all future values and ensure that formatting is correct.

Let's practice building a common date table.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Creating a date table with DAX or M only adds our new table to the data model; we'll still need to establish relationships between our date table and the `Sales` and `Orders` tables, and then mark our table as the official date table of our data model.

Our first task in marking our table as the official date table is to find the new table on the **Fields** pane. **Right-click the name of the table**, and then select **Mark as date table**, as we see in the following figure.

![image.png](attachment:image.png)

When we mark our table as a date table, Power BI performs validations to ensure that the date column contains no null values, contains only unique values, and contains contiguous date values over a period. We can also choose a specific column in our table to mark as the date, which can be useful when we have many columns within our table. Right-click the table, select **Mark as date table**, and then select **Date table settings**. The following window will appear, where we can choose which column should be marked as **Date**.

![image.png](attachment:image.png)
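Before marking the table, we can approximate these checks ourselves. The measures below are a sketch that assumes the `Dates` table and `Date` column from the DAX example; the measure names are made up for illustration, and this is not the validation that Power BI itself runs.

```dax
-- Each definition is a separate measure (New Measure).
Date Rows = COUNTROWS ( Dates )
Distinct Dates = DISTINCTCOUNT ( Dates[Date] )
Blank Dates = COUNTBLANK ( Dates[Date] )

-- Number of calendar days between the first and last date, inclusive.
Expected Days =
DATEDIFF ( MIN ( Dates[Date] ), MAX ( Dates[Date] ), DAY ) + 1
```

For a column of whole dates, if `Blank Dates` is 0 and the other three values are equal, the column has no nulls, no duplicates, and no gaps.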

Selecting **Mark as date table** will remove autogenerated hierarchies from the Date field in the table that we marked as a date table. For other date fields, the auto hierarchy will still be present until we establish a relationship between that field and the date table or until we turn off the **Auto Date/Time** feature. We can manually add a hierarchy to our common date table by right-clicking the year, month, week, or day columns in the **Fields** pane and then selecting **New hierarchy**. We'll discuss this process later in this module.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

To build a visual that draws on both the `Sales` and `Orders` tables, we'll need to establish a relationship between the new common date table and each of them. As a result, we'll be able to build visuals by using the new date table. To complete this task, we go to the **Model** tab > **Manage Relationships**, where we can create relationships between the common date table and the `Orders` and `Sales` tables by using the `OrderDate` column. The following screenshot shows an example of one such relationship:

![image.png](attachment:image.png)
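Only one relationship between two tables can be active, so a table with two date columns needs extra care. As a hedged sketch, suppose `Sales` is related to the date table through `OrderDate` (active) and also through `ShipDate` (inactive). The measure name and the second relationship are assumptions for illustration; `Dates[Date]` and `Sales[Amount]` follow the names used elsewhere in this section.

```dax
-- Measure (New Measure): total sales by shipping date.
-- USERELATIONSHIP switches on the inactive Sales[ShipDate] relationship
-- for this calculation only; the relationship must already exist.
Sales by Ship Date =
CALCULATE (
    SUM ( Sales[Amount] ),
    USERELATIONSHIP ( Sales[ShipDate], Dates[Date] )
)
```

Measures that don't call `USERELATIONSHIP` keep using the active `OrderDate` relationship, so both views of the data can sit side by side in one report.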

After we have built the relationships, we can build our **Total Sales** and **Order Quantity** by **Time** visual with our common date table that we developed by using the DAX or Power Query method.

To determine the total sales, we need to add up all sales, because the `Amount` column in the `Sales` table holds the revenue for each individual sale, not the total sales revenue. We can complete this task with the following measure, which we will explain in later discussions:

`#Total Sales = SUM(Sales[Amount])`

After we have finished, we can create a table by returning to the **Visualizations** pane and selecting the **Table** visual. We want to see the total orders and sales by year and month, so we include only the `Year` and `Month` columns from our date table, the `OrderQty` column, and the `#Total Sales` measure. When we learn about hierarchies, we can also build a hierarchy that will allow us to drill down from years to months. For this example, we can view them side-by-side. We have now successfully created a visual with a common date table.
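The table visual effectively asks the model for one row per year and month. The same result can be approximated as a DAX query, sketched here under several assumptions: the date table is named `Dates` with `Year` and `MonthNum` columns (as in the DAX example), `OrderQty` lives in the `Orders` table, and both `Sales` and `Orders` are related to `Dates`.

```dax
-- DAX query (run in a tool that accepts DAX queries, such as DAX query
-- view), not a measure or a calculated column.
EVALUATE
SUMMARIZECOLUMNS (
    Dates[Year],
    Dates[MonthNum],
    "Total Sales", SUM ( Sales[Amount] ),
    "Order Quantity", SUM ( Orders[OrderQty] )
)
ORDER BY
    Dates[Year],
    Dates[MonthNum]
```

If either relationship to `Dates` is missing, the corresponding total repeats unchanged on every row, which is a quick way to spot a gap in the model.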

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

We've learned about modeling data in Power BI, which includes such topics as creating common date tables, learning about and configuring many-to-many relationships, resolving circular relationships, designing star schemas, and much more. These skills are a crucial part of the Power BI practitioner's toolkit: they make it easier to build visuals and to hand off our report elements to other teams. With this foundation, we can now explore the many nuances of the data model.