# Power Query

## Motivation

In this lesson we will learn about the Power Query Editor in Power BI. Learning about this tool will allow you to clean and transform your data quickly and effectively.

## The Power Query Editor

> Power Query Editor is a powerful tool for ETL (Extract, Transform, Load) processes inside Power BI. It allows you to import data from various sources, clean and transform that data to suit your needs, and then load the processed data into Power BI for further analysis and visualization.

It's important to understand that the underlying data isn't changed by any of the steps performed in Power Query Editor. Instead, you are creating a sequence of transforms to the data which will be repeated each time the data is loaded from the source.
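
As an analogy only (this is Python, not Power Query), the sketch below models a query as an ordered list of step functions that are replayed over the source every time it is loaded. The helper names `keep_columns` and `filter_rows`, and the sample rows, are hypothetical.

```python
from typing import Callable

Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


def run_query(source: list[Row], steps: list[Step]) -> list[Row]:
    """Replay every step, in order, on a copy of the source rows."""
    table = [dict(row) for row in source]  # copies: the source is never modified
    for step in steps:
        table = step(table)
    return table


def keep_columns(names: list[str]) -> Step:
    """Hypothetical step: keep only the named columns."""
    return lambda table: [{name: row[name] for name in names} for row in table]


def filter_rows(predicate: Callable[[Row], bool]) -> Step:
    """Hypothetical step: keep rows for which predicate(row) is True."""
    return lambda table: [row for row in table if predicate(row)]


steps = [keep_columns(["CustomerID", "Units"]), filter_rows(lambda row: row["Units"] > 1)]

source = [
    {"CustomerID": 1, "Units": 3, "City": "Leeds"},
    {"CustomerID": 2, "Units": 1, "City": "York"},
]
first_load = run_query(source, steps)

# New data arrives at the source; a "refresh" replays the same steps
source.append({"CustomerID": 3, "Units": 5, "City": "Hull"})
refreshed = run_query(source, steps)
```

Calling `run_query` again after the source changes picks up the new row without touching the steps, and the source rows still carry their `City` column because no step writes back to them.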

## Components of the Power Query Editor View



<p align="center">
    <img src="images/query_overview.png"  width="700"/>
</p>
<br>

### 1. The Ribbon

> The Ribbon in Power Query Editor is a command panel that sits at the top of the interface, providing quick access to a wide range of functions for data extraction, transformation, and manipulation. It's divided into several tabs:

- Home tab: contains options for basic tasks such as adding new data sources and choosing which rows and columns to keep or remove

- Transform tab: offers a variety of tools for modifying data, such as grouping, sorting, and changing data types

- Add Column tab: provides a suite of options to create new columns in your dataset based on existing data. This could be as simple as a duplicate of an existing column, or as complex as a column derived from advanced calculations or transformations on one or more existing columns.

- View tab: provides a range of options to customize the appearance and functionality of the Power Query Editor workspace according to user preferences.

- Tools tab: contains the query diagnostics tools, which record how each step is evaluated and help with error detection and performance evaluation

### 2. The Queries Pane

> The Queries Pane, located on the left side of the interface, lists all the queries in your current Power BI file. Each query typically represents one table of data from a connected source, together with the transformations applied to it. You can duplicate a query in this pane to apply different sets of transformations to the same source data.

This pane allows you to manage your queries, such as creating new queries or duplicating existing ones. It is also possible to group queries into folders for better organization, especially useful in complex projects with multiple queries.

When you select a query in the Queries Pane, its data preview and applied steps are shown in the main workspace area. Each query listed in the Queries Pane has a series of steps associated with it, which are the transformations or manipulations applied to the original data. You can review, modify, reorder, or delete these steps as needed.


### 3. Centre (Data) Pane

> The center pane, often referred to as the Data Preview pane, provides a preview of the data contained within the selected query, allowing you to see the impact of your transformations in real-time. 

The Data Preview pane displays your data in a table format, similar to an Excel spreadsheet. The data types of each column are indicated by icons in the column headers. Any transformations or manipulations you apply to your data, such as sorting, filtering, splitting, or merging columns, are instantly reflected in this pane, providing immediate feedback on your actions.

In addition to viewing data, the data pane is also an interactive workspace where you can select columns, rows, or individual cells and apply various transformations using the options in the Power Query Editor ribbon. For instance, by right-clicking on a column header, you can access a context menu with a variety of operations like rename, duplicate, change type, remove, and others.

### 4. Right (Query Settings) Pane

> The Query Settings pane sits on the right side of the editor. At the top, you'll find the "Properties" section, where you can name or rename your query. The name you provide will be used to reference the table of data in Power BI's data model once you've loaded it from the Power Query Editor.

Below the "Properties" section is the "Applied Steps" section. This section provides a list of all transformations that have been applied to your data in the order they were applied. It serves as a history of your data transformations, giving you a clear picture of the sequence of operations that have led to your current dataset.


### The Advanced Editor

> The Advanced Editor lets you see the code that Power Query Editor generates for each step. Every transformation that you create in the GUI is also represented as code in a functional language called M, which is written in the background as you work. You can also write your own transformations directly in M. To launch the Advanced Editor, select View from the ribbon, then select Advanced Editor. A window appears in which you can view and edit the code for the whole of the current query.
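
The M code shown in the Advanced Editor is typically a `let ... in` expression: each applied step is a named value built from an earlier one, and the name after `in` is the step whose output the query returns. The Python sketch below mimics that shape with a dictionary of named steps. It is an analogy rather than M itself, and the step contents and sample data are hypothetical.

```python
from typing import Callable

Table = list[dict[str, object]]
StepFn = Callable[[dict[str, Table]], Table]


def evaluate_steps(steps: dict[str, StepFn]) -> dict[str, Table]:
    """Evaluate named steps in insertion order, like the bindings of an M `let` expression.

    Each step receives the results computed so far and may refer to any earlier step by name.
    """
    results: dict[str, Table] = {}
    for name, build in steps.items():
        results[name] = build(results)
    return results


def distinct(table: Table) -> Table:
    """Keep the first of any rows that are identical in every column."""
    unique = dict.fromkeys(tuple(row.items()) for row in table)
    return [dict(items) for items in unique]


# Hypothetical steps, named in the style Power Query uses for Applied Steps
steps: dict[str, StepFn] = {
    "Source": lambda r: [
        {"CustomerID": "1", "City": "Leeds"},
        {"CustomerID": "1", "City": "Leeds"},
        {"CustomerID": "2", "City": "York"},
    ],
    "Changed Type": lambda r: [{**row, "CustomerID": int(row["CustomerID"])} for row in r["Source"]],
    "Removed Duplicates": lambda r: distinct(r["Changed Type"]),
}

results = evaluate_steps(steps)
final = results["Removed Duplicates"]  # the step named after `in`: what the query returns
preview = results["Changed Type"]      # like clicking an earlier step in Applied Steps
```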

### Saving your work
When your query is where you want it, select `Close & Apply` from Power Query Editor's File menu. This action applies the changes and closes the editor.


## Power Query Worked Example - Creating a Star-Based Schema

To follow along with this tutorial, make sure you have downloaded and installed Power BI Desktop, and then downloaded the dataset from [this link](https://link-url-here.org).


### Loading the data

- First, we need to load our data table into Power BI. Begin by clicking the `Get data` button on the Home tab of the ribbon, and selecting the 'Text/CSV' option from the dropdown

- Navigate to the 'Sales.csv' file on your hard drive and open it. This will open a dialog box showing a preview of the table we are importing

- Choose 'Transform Data' at the bottom of the screen to open the Power Query Editor


<p align="center">
    <img src="images/sales_table.png"  width="900"/>
</p>
<br>
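
After 'Transform Data', the Text/CSV connector usually records a few steps for you, such as 'Source', 'Promoted Headers' and 'Changed Type'. The Python sketch below is a rough analogy of those three steps, using a hypothetical three-column sample rather than the real `Sales.csv`. Its type detection (whole number, then decimal, else text, decided per column) is an assumption and far simpler than Power Query's.

```python
import csv
import io


def detect_column(values: list[str]) -> list[object]:
    """Naive type detection for one column: whole number, then decimal, else text."""
    for cast in (int, float):
        try:
            return [cast(value) for value in values]
        except ValueError:
            continue
    return list(values)


def load_csv(text: str) -> list[dict[str, object]]:
    rows = list(csv.reader(io.StringIO(text)))  # "Source": every cell is text
    header, body = rows[0], rows[1:]            # "Promoted Headers": first row becomes column names
    columns = [detect_column(list(col)) for col in zip(*body)]  # "Changed Type", column by column
    return [dict(zip(header, values)) for values in zip(*columns)]


sample = "ProductID,CustomerID,Units\n101,7,2\n102,9,1\n"  # hypothetical, not the real Sales.csv
table = load_csv(sample)
```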

## Creating the Star-Based Schema

Inside the Power Query Editor, you can see the columns of the Sales table. Each row represents one order from a hypothetical business. You will see that some columns pertain to the individual order, while others relate to the customer or to the product.

A flat table like this contains a lot of redundant information (each customer's details, for example, are repeated on every one of their orders), so data analysts often transform it into a star schema.


<p align="center">
    <img src="images/star_schema.png"  width="500"/>
</p>
<br>

A star schema is a popular data modeling approach, named for its resemblance to a star, with lines radiating from a central table to multiple surrounding tables. The central table, known as the fact table, contains transactional data or quantitative measures, such as sales amount or units sold, typically with one row per transaction.

The associated dimension or 'dim' tables store descriptive attributes in a non-redundant manner; for example, the customers table has one row per customer. This approach to data modelling can improve query performance and data integrity, and can make complex queries easier to write.
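
A small Python illustration of the redundancy argument, using hypothetical rows rather than the lesson's data: the customer's `City` is stored once in a dimension keyed by `CustomerID`, the fact table keeps only the key and the transaction's own columns, and looking the key up again recovers the flat view.

```python
# Hypothetical flat rows: the customer's City is repeated on every order
flat = [
    {"CustomerID": 7, "City": "Leeds", "Date": "2011-01-03", "Units": 2},
    {"CustomerID": 7, "City": "Leeds", "Date": "2011-01-09", "Units": 5},
    {"CustomerID": 9, "City": "York", "Date": "2011-01-04", "Units": 1},
]

# Dimension table: one row per customer, keyed by CustomerID
# (assumes each customer's attributes are the same on every order)
dim_customer = {row["CustomerID"]: {"City": row["City"]} for row in flat}

# Fact table: one row per transaction, keeping only the key and the transaction's own columns
fact_sales = [{k: row[k] for k in ("CustomerID", "Date", "Units")} for row in flat]

# Looking each key up in the dimension recovers the original flat rows
rebuilt = [{**fact, **dim_customer[fact["CustomerID"]]} for fact in fact_sales]
```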

Turning a flat table into a star schema is easy in Power Query, and will allow us to explore a lot of the functionality.

### Creating the dimension tables from the Fact table

1. Create two duplicates of the original fact table

- Right click on the `Sales` table in the Queries Pane, and choose 'Duplicate'
- Repeat a second time
- Select the first new table in the Queries Pane, and rename it to `dim_customer`
- Rename the second one to `dim_product`

2. Create the dim_customer table

- Highlight the `dim_customer` table in the Queries Pane, and use the central pane to select the columns: `CustomerID`, `Email Name`, `City`, `ZipCode`, `State`, `Region`, `District`, and `Country`
- You can select multiple columns at the same time by holding down the Ctrl key (Cmd on Mac)
- Right-click one of the selected columns and choose 'Remove Other Columns' from the dropdown
- Now highlight all the remaining columns in the central pane, right click and choose 'Remove Duplicates'
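
The two operations in this step can be sketched in Python as follows. This is an analogy with hypothetical data and helper names; because all remaining columns are selected before 'Remove Duplicates', a row only counts as a duplicate when every column matches.

```python
Row = dict[str, object]


def remove_other_columns(table: list[Row], keep: list[str]) -> list[Row]:
    """Keep only the named columns, dropping the rest (like 'Remove Other Columns')."""
    return [{col: row[col] for col in keep} for row in table]


def remove_duplicates(table: list[Row]) -> list[Row]:
    """Drop any row identical to an earlier row in every column, keeping the first."""
    seen: set[tuple] = set()
    unique = []
    for row in table:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical Sales rows
sales = [
    {"CustomerID": 7, "City": "Leeds", "ProductID": 101, "Units": 2},
    {"CustomerID": 7, "City": "Leeds", "ProductID": 102, "Units": 5},
    {"CustomerID": 9, "City": "York", "ProductID": 101, "Units": 1},
]
dim_customer = remove_duplicates(remove_other_columns(sales, ["CustomerID", "City"]))
```

If the same `CustomerID` appeared with two different spellings of a city, both rows would survive, so it is worth checking that each `CustomerID` occurs only once in the finished table.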

3. Split and clean the `Email Name` column

The `Email Name` column in the `dim_customer` table contains multiple pieces of information that would be better as separate columns. There are multiple ways to do this in Power Query, but the easiest is using the 'split column' function:

- Right-click on the column and choose 'Split Column' from the dropdown, or click the 'Split Column' button on the Home tab of the ribbon
- Choose the 'by delimiter' option, and select the 'colon' option
- Now do the same for the new column containing the names, splitting by the comma delimiter to create first and last names
- Rename the columns to `email`, `first_name` and `last_name` by right-clicking and choosing the rename option

- Finally let's remove the `()` parentheses from the emails: right click the email column and choose 'replace values'. Then enter the open parenthesis (`(`) in the `Value to Find` field, and leave the `Replace With` field blank. Then repeat the step with the `)` symbol.
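
The split and clean-up steps amount to the string operations below. The `Email Name` value is hypothetical, and its layout (email in parentheses, a colon, then names separated by a comma) is only inferred from the steps above, so check it against the real column before relying on it.

```python
email_name = "(jane.doe@example.com):Jane, Doe"  # hypothetical value, layout assumed

# Split Column > By Delimiter > Colon
email_part, name_part = email_name.split(":")

# Split the name column again, this time on the comma
first_name, last_name = name_part.split(",")
# The comma split leaves a leading space here; in Power Query this would likely need Transform > Format > Trim
last_name = last_name.strip()

# Replace Values: "(" with nothing, then ")" with nothing (substring replacement)
email = email_part.replace("(", "").replace(")", "")
```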

4. Create a website login for each customer

We can create a unique website login for each customer using the powerful 'Column From Examples' feature:

- Select the `Column From Examples` dropdown from the Add Column tab of the ribbon
- Choose 'From All Columns'
- In the new column, type the value for `last_name` for the topmost row, followed by a `-`, and then the `CustomerID` value
- Click 'OK'
- A new column will be created, filling the correct info for each row according to the example given
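
From the single example, Power Query should infer a transformation equivalent to joining `last_name`, a hyphen, and `CustomerID` converted to text; you can inspect the exact formula it generated in the new step. The Python analogy below, with hypothetical rows, shows why the result is unique even when last names repeat.

```python
# Hypothetical rows: two customers share a last name
customers = [
    {"CustomerID": 7, "last_name": "Doe"},
    {"CustomerID": 9, "last_name": "Doe"},
]

for row in customers:
    # last_name, a hyphen, then CustomerID converted to text
    row["login"] = f"{row['last_name']}-{row['CustomerID']}"
```

The uniqueness comes entirely from `CustomerID`; the last name only makes the login readable.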

We can see all the changes we have made to our `dim_customer` table in the Applied Steps section of the Query Settings pane, on the right of the Power Query window.


<p align="center">
    <img src="images/customer_actions.png"  width="400"/>
</p>
<br>

5. Create the `dim_product` table

- In the `dim_product` duplicate table, simply highlight the following columns: `ProductID`, `Product`, `Category`, `Segment`, `Manufacturer`, `ManufacturerID`, `Unit Cost` and `Unit Price`
- Right click and choose 'remove other columns'
- Select all remaining columns, right-click on one of the column headings and choose 'remove duplicates'

### Creating the new Sales table

Next we should return to our `Sales` table and remove any redundant columns. 

- Remove all columns except for the following: `ProductID`, `CustomerID`, `Date`, `Units`
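
Removing `Unit Price` and the other descriptive columns from `Sales` loses nothing, because they remain reachable through the keys. The Python sketch below uses hypothetical rows and prices: it first checks that every `ProductID` in the fact table has a matching row in a `dim_product` dictionary, then computes total revenue by looking up each row's price.

```python
# Hypothetical rows and prices; the real tables come from Sales.csv
dim_product = {
    101: {"Product": "Widget", "Unit Price": 12.5},
    102: {"Product": "Gadget", "Unit Price": 40.0},
}
sales = [
    {"ProductID": 101, "CustomerID": 7, "Date": "2011-01-03", "Units": 2},
    {"ProductID": 102, "CustomerID": 9, "Date": "2011-01-04", "Units": 1},
    {"ProductID": 101, "CustomerID": 9, "Date": "2011-01-05", "Units": 4},
]

# Every ProductID in the fact table should have a matching dimension row
unmatched = [row["ProductID"] for row in sales if row["ProductID"] not in dim_product]

# Unit Price is no longer in Sales, but it is still reachable through ProductID
revenue = sum(row["Units"] * dim_product[row["ProductID"]]["Unit Price"] for row in sales)
```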

### Create a `dim_datetimes` table

Often, when doing time-based analysis, it is useful to have a separate date-times table, which contains lots of different information about a range of dates. Columns can include things like day-of-week, financial year, business quarter etc.

It is often possible to use the same date-times table for multiple projects, by re-generating it over the date range necessary for whatever modelling task is needed. 

In this case, we will use a free script written in the M query language by Devin Knight, which can be found at this [URL](https://devinknightsql.com/2015/06/16/creating-a-date-dimension-with-power-query/). You can find the query on its own in this [text file](example_datetime.txt).

- Right-click in the Queries Pane on the left and select New Query > Blank Query
- Select the Advanced Editor from the View tab of the ribbon
- Paste the contents of the [text file](example_datetime.txt) into the Advanced Editor
- Click Done
- This will create a function. To invoke it we just need to choose start and end dates
- Choose 01/01/2011 as the start date, and today's date as the end date
- Press 'invoke'
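
Devin Knight's M script is what this tutorial uses. Purely to illustrate what a date dimension contains (one row per day plus descriptive columns), here is a Python sketch; its column names and the April financial-year start are assumptions, not a description of the script's output.

```python
from datetime import date, timedelta


def date_dimension(start: date, end: date, fy_start_month: int = 4) -> list[dict[str, object]]:
    """Build one row per day from start to end inclusive.

    fy_start_month is an assumption: the calendar month in which the financial year begins.
    Financial years are labelled by the calendar year in which they end.
    """
    rows = []
    day = start
    while day <= end:
        ends_next_year = fy_start_month > 1 and day.month >= fy_start_month
        rows.append({
            "Date": day,
            "Year": day.year,
            "Quarter": (day.month - 1) // 3 + 1,
            "MonthName": day.strftime("%B"),
            "DayOfWeek": day.strftime("%A"),  # names depend on the active locale
            "FinancialYear": day.year + 1 if ends_next_year else day.year,
        })
        day += timedelta(days=1)
    return rows


dates = date_dimension(date(2011, 1, 1), date.today())
```

Regenerating the table for another project only means calling it with a different date range, which is the reuse described above.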

### Finishing Off

We have now finished creating the tables for our star schema, and so we want to save our work and close the Power Query Editor. 

- Click the 'Close & Apply' button on the Home tab of the ribbon
- The data will now be loaded into Power BI's data model according to the schema that you have created


<p align="center">
    <img src="images/close_and_apply.png"  width="400"/>
</p>
<br>

## Save Your Work!

In subsequent lessons we will see how to create a working data model based on the schema we have created, and how to use it to generate visualisations. Please make sure to __SAVE YOUR SESSION__ before closing Power BI.

Go to `File` > `Save As`, and name your session `Power_BI_demo_session.pbix`.



## Key Takeaways

- Power Query Editor is an ETL tool enabling data import, transformation, and loading into Power BI for further analysis
- Power Query transforms do not alter the underlying data source, but are run each time the data are loaded or refreshed
- The Ribbon in Power Query Editor provides quick access to various functions for data extraction, transformation, and manipulation
- The Queries pane shows a list of all current queries
- The Centre pane provides a real-time preview of the selected query data and its transformations
- The Query Settings pane includes the "Applied Steps" section that lists all applied data transformations
- M is a functional language used in Power Query for data manipulation and transformation across various sources
- Transformations you make in the UI create code snippets in M in the background
- You can edit the M code for each query in the Advanced Editor