# Overview

In this section, we import our cleaned and modelled data from MySQL into Power BI to create a report, summarising the questions answered and information obtained from the SQL insights section.

The layout of this walkthrough is as follows: each section will focus on a specific type of task (e.g., data modelling, creating measures, building visuals, adding slicers and functionality), along with additional sections highlighting major challenges encountered and how they were resolved.

To maintain chronological flow, closely following the order in which the report was developed, the report shown throughout this walkthrough will remain in a rough, unpolished state, with final formatting and presentation completed in the last step.

# Importing the Data and Checking That Relationships Are Set Up Correctly

We load the data from a MySQL database. The model view shows us the following:

![model view of imported schema from MySQL](documentation_images/ImportMySQLPowerBI.png)

Firstly, we add a calculated column `yearmonth = [year] * 100 + [month]` to both the `datesdim` and `cpih` tables, as this allows us to effectively create a one-to-one relationship on (year, month) pairs.

With regard to the relationships between `datesdim` and the median salary tables (both regional and national), a potential issue is that the `year` column in the dimension table, `datesdim`, does not contain unique values. Any relationship on `year` therefore has a "many" on the `datesdim` side.

To fix this, another dimension table, `yearsdim`, is created and related to `datesdim` with a bi-directional cross-filter. The new `yearsdim` table is then related to the median salary tables.
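
If the year dimension is built inside Power BI rather than upstream in MySQL (the walkthrough does not say which), one possible sketch is a calculated table holding each distinct year once. `DISTINCT` is used here on the assumption that the table will be related back to `datesdim`.

```dax
-- calculated table (Modeling → New table); an assumed construction, not confirmed by the source
-- yields a single column named `year` with one row per distinct year in datesdim
yearsdim = DISTINCT ( datesdim[year] )
```

Because every value in the resulting `year` column is unique, it can sit on the "one" side of the relationships to the median salary tables.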

![Model view with relationships set up in PowerBI](documentation_images/RelationshipsPowerBI.png)

# Handling national vs. regional data, and creating the required measures (e.g. CPIH-adjusted house prices)

## Adding calculated columns for the CPIH adjusted prices and the affordability index

For each house price entry (in both the regional and national tables), we need to adjust the price for CPIH, and also calculate its ratio to median salary (the "affordability index"). These are added as calculated columns, since they involve row-by-row calculations.

We begin with the national table, `housepricesnational`. To calculate CPIH-adjusted house prices, we use the `RELATED` function. Each house price entry has an associated date, and the idea is that `RELATED` follows the established relationships to retrieve the corresponding CPIH index for that date.

The calculated column for CPIH-adjusted national house prices is defined using the following DAX formula. Note that because some CPIH values may be missing or zero, we first check whether the index is non-zero. If it is, we perform the adjustment; otherwise, we return a blank value:

```dax
average_price_CPIH_adjusted = 
IF(
    RELATED(cpih[index_normed]) <> 0, 
    housepricesnational[average_price] / RELATED(cpih[index_normed]), 
    BLANK()
)
```

The following diagram illustrates how the `RELATED` function works. Travel from one table to another is facilitated by the relationships that have been defined. Within a table, travel from one column to another corresponds to "reading across the row".

![A diagram showing how RELATED travels through tables to find the related value](documentation_images/RELATEDVisualised.png)

Creating a calculated column for the national affordability index (i.e. average price/median salary) follows the exact same format, with a related value taken from the `mediansalarynational` table instead.

```dax
affordability_index = 
IF(
    RELATED(mediansalarynational[MedianSalary]) <> 0, 
    housepricesnational[average_price] / RELATED(mediansalarynational[MedianSalary]), 
    BLANK()
)
```

------

For the regional table, creating the CPIH-adjusted average price is essentially the same, achieved with the DAX formula:

```dax
average_price_CPIH_adjusted = 
IF(
    RELATED(cpih[index_normed]) <> 0,
    housepricesregional[average_price] / RELATED(cpih[index_normed]),
    BLANK()
)
```

For the affordability index, however, we have an issue. Specifically, there is no relationship between `housepricesregional` and `mediansalaryregional` (look closely at the directions of the relationships in the model view). 

Making these relationships bi-directional is a solution, but not a great one. 

A better approach is to create a helper column, `regionyear`, in both the `mediansalaryregional` and `housepricesregional` tables, and relate the two tables directly on it. The column is simply the region name concatenated with the year; it does not need to be usable in any other situation, so long as it uniquely identifies a row of `mediansalaryregional`.

```dax
-- for the `mediansalaryregional` table
regionyear = CONCATENATE([region],[year])

-- for the `housepricesregional` table
regionyear = CONCATENATE([region],YEAR([date]))
```

These helper columns will allow us to create a one-to-many relationship from `mediansalaryregional` to `housepricesregional`, thus enabling the use of the `RELATED` function.

It should be noted that we must disable the current active relationships from `regionsdim` and `yearsdim` to the `mediansalaryregional` table to allow for this new relationship. This is OK, as any information required from the `mediansalaryregional` table can be obtained using our new relationship anyway.

With this in place, the affordability index column is calculated just as in the national table (with the obvious modifications):

```dax
affordability_index = 
IF(
    RELATED(mediansalaryregional[salary]) <> 0, 
    housepricesregional[average_price] / RELATED(mediansalaryregional[salary]), 
    BLANK()
)
```
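
For comparison, the same lookup could in principle be done without the helper relationship by using `LOOKUPVALUE`, which searches `mediansalaryregional` for the row matching the current region and year. This is a different technique from the one used in the report, shown only as a sketch; the column name `affordability_index_lookup` is hypothetical, and the `[region]`, `[year]` and `[date]` columns are those already used to build `regionyear`.

```dax
-- hypothetical alternative calculated column on `housepricesregional`
-- looks up the salary by (region, year) instead of following a relationship
affordability_index_lookup =
VAR salary =
    LOOKUPVALUE (
        mediansalaryregional[salary],
        mediansalaryregional[region], housepricesregional[region],
        mediansalaryregional[year], YEAR ( housepricesregional[date] )
    )
RETURN
    -- a missing (blank) salary compares equal to 0 here, so it also yields BLANK()
    IF ( salary <> 0, housepricesregional[average_price] / salary, BLANK () )
```

The relationship-based version keeps the column formula shorter and consistent with the national table, which is one reason to prefer the helper column.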

Part of the new model view (with the updated relationship) is as follows:

![New part of the model view, with the new regionyear one-to-many relationship in place](documentation_images/RegionYearRelationship.png)

## Creating dynamic regional vs. national measures

The plan is for the report to feature a slicer that allows users to select a specific region (e.g., West Midlands) or select all regions. Typically, this would be no problem, as any calculations would adjust dynamically based on the slicer selection (i.e. the filter context).

However, in our case, we have separate tables for regional and national data. Recall that this is because the national median cannot be derived from the regional medians. To work around this, we created distinct regional and national tables.

So, we need to tell Power BI to use regional data if only one region is selected, and national data if all regions are selected. This can be achieved using the `HASONEFILTER` function, which returns TRUE when exactly one value of the specified column is directly filtered in the current context, and FALSE otherwise.
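
When testing the slicer, it can help to see which branch is currently active. The following is a small sketch of a diagnostic measure that can be dropped into a card visual; the name `data_source_RorN` is hypothetical and the measure is not part of the report.

```dax
-- hypothetical diagnostic measure: reports which data source the RorN logic would pick
data_source_RorN =
IF (
    HASONEFILTER ( 'regionsdim'[Region] ),
    "Regional",   -- exactly one region is directly filtered
    "National"    -- no region filter, or more than one region filtered
)
```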

We will produce these measures for average price, sales volume, detached average, semi-detached average, terraced average, flat average, CPIH-adjusted average price, and affordability index from the housing price tables, as well as the salary from the median salary tables (nine in total).

Each of the nine measures has essentially the same structure. We name them with a suffix "RorN" (Regional or National) so we can immediately identify them as dynamic/contextual measures. 

The measure for average house price is:

```dax
avg_price_RorN = 
IF (
    HASONEFILTER('regionsdim'[Region]),
    SUMX('housepricesregional', [average_price] * [sales_volume]) / SUM('housepricesregional'[sales_volume]),
    SUMX('housepricesnational', [average_price] * [sales_volume]) / SUM('housepricesnational'[sales_volume])
)
```

Notice that to find the average, we must multiply each `average_price` entry by its `sales_volume`, and then divide by `SUM([sales_volume])`. The reason for this is to give higher weight to rows with a higher `sales_volume` (i.e. to use a weighted average).
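
As a worked illustration with made-up numbers (not taken from the dataset), suppose the filter context contains two rows: an average price of 200,000 with 100 sales, and an average price of 300,000 with 300 sales. The weighted average is then

$$
\frac{200{,}000 \times 100 + 300{,}000 \times 300}{100 + 300} = \frac{110{,}000{,}000}{400} = 275{,}000,
$$

whereas the unweighted mean of the two `average_price` values would be 250,000, which understates the influence of the row with more sales.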

Note that with the measure set up this way, if two or more regions are in the active context, the national average price will be used. This is not an issue for us, as the slicer will be set up to allow either a single region to be selected, or all regions to be selected. Therefore, selecting 2 or 3 regions (but not all of them) will not be possible.
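
The same weighted average can also be written with `DIVIDE`, which by default returns `BLANK()` when the denominator is zero (for example, when the current selection leaves no sales). The following is a sketch of that variant, not the measure used in the report; the name `avg_price_RorN_safe` is hypothetical.

```dax
-- hypothetical variant of avg_price_RorN; DIVIDE guards against a zero denominator
avg_price_RorN_safe =
IF (
    HASONEFILTER ( 'regionsdim'[Region] ),
    DIVIDE (
        SUMX ( 'housepricesregional', 'housepricesregional'[average_price] * 'housepricesregional'[sales_volume] ),
        SUM ( 'housepricesregional'[sales_volume] )
    ),
    DIVIDE (
        SUMX ( 'housepricesnational', 'housepricesnational'[average_price] * 'housepricesnational'[sales_volume] ),
        SUM ( 'housepricesnational'[sales_volume] )
    )
)
```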

The other eight measures look very similar. They are as follows:

```dax
sales_volume_RorN = 
IF (
    HASONEFILTER('regionsdim'[Region]),
    SUM('housepricesregional'[sales_volume]),
    SUM('housepricesnational'[sales_volume])
)
```

```dax
detached_avg_RorN = 
IF (
    HASONEFILTER('regionsdim'[Region]),
    SUMX('housepricesregional', [detached_avg] * [sales_volume]) / SUM('housepricesregional'[sales_volume]),
    SUMX('housepricesnational', [detached_avg] * [sales_volume]) / SUM('housepricesnational'[sales_volume])
)
```

```dax
semi_avg_RorN = 
IF (
    HASONEFILTER('regionsdim'[Region]),
    SUMX('housepricesregional', [semi_avg] * [sales_volume]) / SUM('housepricesregional'[sales_volume]),
    SUMX('housepricesnational', [semi_avg] * [sales_volume]) / SUM('housepricesnational'[sales_volume])
)
```

```dax
terraced_avg_RorN = 
IF (
    HASONEFILTER('regionsdim'[Region]),
    SUMX('housepricesregional', [terraced_avg] * [sales_volume]) / SUM('housepricesregional'[sales_volume]),
    SUMX('housepricesnational', [terraced_avg] * [sales_volume]) / SUM('housepricesnational'[sales_volume])
)
```

```dax
flat_avg_RorN = 
IF (
    HASONEFILTER('regionsdim'[Region]),
    SUMX('housepricesregional', [flat_avg] * [sales_volume]) / SUM('housepricesregional'[sales_volume]),
    SUMX('housepricesnational', [flat_avg] * [sales_volume]) / SUM('housepricesnational'[sales_volume])
)
```

```dax
avg_price_CPIH_adjusted_RorN =
IF (
    HASONEFILTER('regionsdim'[Region]),
    SUMX('housepricesregional', [average_price_CPIH_adjusted] * [sales_volume]) / SUM('housepricesregional'[sales_volume]),
    SUMX('housepricesnational', [average_price_CPIH_adjusted] * [sales_volume]) / SUM('housepricesnational'[sales_volume])
)
```

```dax
affordability_index_RorN = 
IF (
    HASONEFILTER('regionsdim'[Region]),
    SUMX('housepricesregional', [affordability_index] * [sales_volume]) / SUM('housepricesregional'[sales_volume]),
    SUMX('housepricesnational', [affordability_index] * [sales_volume]) / SUM('housepricesnational'[sales_volume])
)
```

```dax
median_salary_RorN = 
IF (
    HASONEFILTER('regionsdim'[Region]),
    AVERAGE(mediansalaryregional[salary]),
    AVERAGE(mediansalarynational[MedianSalary])
)
```

These are stored in a separate `MyMeasures` table for organisational reasons.

![overview of the MyMeasures table, holding all the RorN measures](documentation_images/MeasuresTable.png)

## Slicer allowing selection of a single or all regions

An unexpected issue arises here, as Power BI does not currently have a slicer configuration that allows both single select and a 'select all' option.

Luckily, there is a trick/workaround, as seen [in this YouTube Video.](https://www.youtube.com/watch?v=CTsNxnQvxns&t=141s)

The "trick" is to add a helper column into the `regionsdim` table with a constant value "Select All", and add this helper column to the single select slicer, alongside the `region` column, to provide a makeshift "select all" button. 

The helper column can easily be added using DAX, i.e. `SelectAll = "Select All"`, and the outcome is the following:

![Method to add Select All option to a single select slicer](documentation_images/SelectAllTrick.png)

# Creating the visualisations

The report will consist of the following visualisations:

1. A title...
2. Slicers for year and region selection
3. Housing price trends, presented in three ways, namely, as nominal prices, CPIH-adjusted prices, and an affordability index
4. A map displaying average affordability by region (see later, a bar chart visualisation is used instead).
5. Distribution-based charts showing: the percentage distribution of average prices of housing by type; total yearly sales by region; the percentage distribution of total sales by month

As mentioned above, we begin by creating the visuals with minimal or no formatting. Once all visualisations are confirmed and positioned, we then choose a colour scheme and styling, and apply consistent formatting across the entire report.

The following is a rough plan for the layout of the report (foreshadowing: the final report doesn't end up looking exactly like this):

![A rough plan of the layout for the report](documentation_images/ReportRoughLayoutPlan.png)

## Creating the title

There's not too much to say here: we choose a short, catchy title and ensure the terms "UK housing", "price", "trend" and "affordability" are in it, since this is what the report focuses on.

We create a text box, add the main title "UK Housing Report", and add a subtitle "Housing Price Trends and Affordability".

We leave the formatting to the end. This gives us the following visual:

![Report title visual with no formatting](documentation_images/ReportTitle.png)

## Creating the slicers

Not much needs to be done here, since the tricky part, namely, getting a 'select all' option to appear within a single-select slicer, has already been addressed.

The `year` column of `yearsdim` is added to a slicer to create the year slicer.

Both the `SelectAll` and `region` columns from the `regionsdim` table are added to another slicer to create the region slicer.

They are as follows:

![report slicers (year and region) with no formatting](documentation_images/ReportSlicers.png)

## Creating the three housing price trends visuals (nominal, CPIH adjusted, and affordability)

These visuals make use of our contextual measures, ensuring that the charts dynamically use either regional or national data based on the selection in the region slicer.

Each of the visuals is created by dragging the `year` column into the x-axis field, and dragging the relevant measure into the y-axis field.

Checking **Format → Edit Interactions** for each slicer and briefly testing them out, we verify that the slicers filter the visuals, which is the intended behaviour.

They are as follows (note: as it stands, they are too small, and if they cannot be enhanced via formatting, we may remove one or rearrange the report):

![Line charts showing nominal price, CPIH adjusted price, and affordability](documentation_images/ReportPriceAffordabilityLineCharts.png)

## Creating the map visual (due to issues, we use a different visual instead...)

This visual displays the average affordability index for each region, calculated over the currently selected years from the year slicer.

Attempting to use the filled map visual, however, results in issues. Namely, dragging `region` (whose Data category has been changed to "State or Province") into the location section of the visual gives unexpected results. For example, the `north west` value fills in the North West province of South Africa...

![north west region is shaded the north west of south africa](documentation_images/FilledMapRegionIssue.png)

In truth, this is reasonable. The region "North West" is not descriptive enough: we know this to mean "North West [of England]", but Power BI should not be expected to gather the required context to do this (potentially, this will change in the future with AI integration).

To fix this, we add a column to our `regionsdim` table, giving the ITL1 (International Territorial Level 1) code for each region, and use this as our location data, rather than the (vague) region name data we currently have.

The [codes can be found here](https://en.wikipedia.org/wiki/First-level_NUTS_of_the_European_Union#United_Kingdom) (including NI, Wales, and Scotland). Note, however, that since 2021 the [codes now use "TL" instead of "UK"](https://en.wikipedia.org/wiki/ITL_1_statistical_regions_of_England#List_of_regions).

![A column containing the ITL1 codes added to regionsdim table](documentation_images/TLCodesAddedToRegion.png)

No luck...

The regions "Northern Ireland", "Wales" and "Scotland" worked perfectly before, so the next thing to try is to add a column of the form [region],[country], where country is one of England, Wales, Scotland, Northern Ireland. This will hopefully give Power BI enough context to place the regions correctly.

![A column containing the [region], [country] added to regionsdim table](documentation_images/CountryAddedToRegion.png)

Once again, no luck...

A further attempt using "[region], UK" was also unsuccessful.

So, we concede and use a column chart instead... 

On the bright side, this is easy to set up: simply drag `region` from `regionsdim` into the x-axis section and the `affordability_index_RorN` measure into the y-axis section. Note that due to the visual filter context (namely the region name on the x-axis), the measure will always use regional data.

It looks as follows:

![column chart displaying average affordability index by region](documentation_images/AverageAffordabilityVisual.png)

Lastly, we adjust the interaction between the region slicer and this visual, ensuring that the slicer selection has no impact on the chart.

## Creating the distribution-based visuals

Finally, we create charts showing the average relative cost of each housing type by year (distribution of housing price), the total sales by region by year, and the monthly distribution of sales.

---

First up, the distribution of average price per type (detached, semi, terraced, flat) per year. 

We use a 100% stacked area chart, adding `year` to the x-axis and the measures `detached_avg_RorN`, `semi_avg_RorN`, `terraced_avg_RorN` and `flat_avg_RorN` to the y-axis.

This produces the following visual:

![A 100% stacked area chart showing the distribution of average price by house type](documentation_images/ReportPriceByTypeDistribution.png)

---

Secondly, we produce a visual showing the total sales by region by year. 

For this, we use a stacked area chart. We add `year` to the x-axis, `sales_volume_RorN` to the y-axis, and `region` to the legend. 

This produces the following:

![a visual showing the sales per region in each year](documentation_images/ReportSalesByRegion.png)

---

Lastly, we produce a visual showing the monthly distribution of sales.

Once again, we use a 100% stacked area chart.

This produces the following visual:

![A 100% stacked area chart showing the distribution of sales by month](documentation_images/ReportSalesByMonth.png)

# Refinements to the report

The report currently looks as follows:

![the first draft of the report](documentation_images/ReportDraft1.png)

We start with some functional adjustments, before finishing with the visual formatting (changing axes labels, titles, subtitles, adding a colour scheme and borders, etc).

Adjustments to be made are as follows (we will not explicitly walk through the majority of these, as the fixes are either straightforward or not very insightful):

1. The period covered (1968 to 2024, 56 years) is too long. In order to save on horizontal space, whilst still producing visuals for a reasonable time period, we restrict the period to 2000-2024.
   
   This is achieved by adding a page level filter (**View -> Filters -> Filters on this page**) to only show `YearsDim.year` $\geq$ 2000.

2. The visualisation showing the distribution of average house prices by type (the middle right visualisation) is a bit... boring... especially compared to the other visualisations. Moreover, the visualisations in general currently look a bit cramped, especially in the bottom corner. Overall, the report would benefit from removing this visualisation and making the others around it bigger.

    This is achieved by deleting the middle right visualisation.


This yields the following:

![second draft of report, after removing a visual and filtering the years from 2000 onwards](documentation_images/ReportDraft2.png)

More adjustments are needed; it still feels too cramped.

3. The affordability index visual is too big relative to the information it displays, especially as the other visuals still look cramped.

    To resolve this, we make the affordability index by region visualisation a bar chart, rather than a column chart, and place it in the middle column of the report. We then move the sales-volume related visuals to the right-hand column, and stretch them vertically.

4. The visuals on the left could do with more vertical space. The only difference between the average price charts is that one shows nominal prices, whilst the other shows CPIH-adjusted prices, so rather than have two charts, we could have one chart with a dropdown. This would allow us to extend the (remaining) two visuals vertically.

   This is achieved in two steps.

   Firstly, we create a new table, `PriceType`, with one column, `Price Type`, and two values ("Nominal" and "CPIH Adjusted"). These are the values which will appear in the dropdown slicer, and the second must match the string tested in the measure below.

   The second step is to create a new measure (which will be used as the y-axis value in the chart) which tells Power BI whether to use nominal or CPIH-adjusted values, based on the dropdown selection. Explicitly, a measure with this functionality is:

   ```dax
    avg_price_RorN_nominal_or_CPIH_adjusted = 
    IF(
        SELECTEDVALUE(PriceType[Price Type]) = "CPIH Adjusted",
        [avg_price_CPIH_adjusted_RorN], 
        [avg_price_RorN] --note that if no value is selected, then the "ELSE" value is used, and so nominal prices are used
    )
   ```
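
The one-column table from the first step of item 4 could be defined in DAX roughly as follows. This is a sketch under assumptions: the walkthrough does not state how the table was actually created (it could equally be typed in manually via **Enter data**), and the table and column names are inferred from the `PriceType[Price Type]` reference in the measure above.

```dax
-- calculated table (Modeling → New table); assumed construction
-- the strings must match those tested by SELECTEDVALUE in the measure
PriceType =
DATATABLE (
    "Price Type", STRING,
    {
        { "Nominal" },
        { "CPIH Adjusted" }
    }
)
```

The table needs no relationships to the rest of the model, since it only feeds the slicer and is read back through `SELECTEDVALUE`.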

After making these adjustments, the report looks as follows:

![Draft of report after making the affordability index a bar chart, moving the sales volume related visuals to the right, and merging the nominal/CPIH-adjusted house price charts into one](documentation_images/ReportDraft3.png)

This is acceptable in terms of sizing and spacing. 

All that is left is to make the visuals look nice by adding an overall colour scheme, sorting out titles, axes, legends and tick labels.

As this is essentially just going through each visual's formatting pane and choosing suitable options, as well as the occasional new calculated column (of which we have walked through many), we omit the walk-through of the final formatting...

The resulting report is as follows:

![The final report](documentation_images/ReportDraft4Final.png)

The Year and Region slicers can be adjusted to filter the data (the Region slicer does not interact with the centrepiece).

Moreover, the title of the centrepiece visual is dynamic with respect to the year(s) selected in the slicer.

![An example of the slicers being used in the final report](documentation_images/ReportDraft4FinalSlicerAdjustment.png)
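
The dynamic title mentioned above could be driven by a measure along the following lines, bound to the visual's title through conditional formatting (the **fx** button next to the title text). This is only a sketch: the measure name `affordability_title` and the exact title wording are hypothetical, as the walkthrough does not show the formula used.

```dax
-- hypothetical title measure; reads the years left visible by the year slicer
affordability_title =
VAR first_year = MIN ( 'yearsdim'[year] )
VAR last_year = MAX ( 'yearsdim'[year] )
RETURN
    IF (
        first_year = last_year,
        -- a single year is selected
        "Average Affordability Index by Region, " & first_year,
        -- a range of years (or no explicit selection) is in context
        "Average Affordability Index by Region, " & first_year & " to " & last_year
    )
```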