# Power Query Editor

> *Power Query Editor* is a powerful tool for ETL (Extract, Transform, Load) processes in Power BI. It allows you to import data from various sources, clean and transform that data to suit your needs, and then load the processed data into Power BI for further analysis and visualization.

It's important to understand that the underlying data isn't changed by any of the steps performed in Power Query Editor. Instead, you are creating a sequence of transformations that is repeated each time the data is loaded from the source. When you finish making modifications in Power Query Editor and apply them, the data is reloaded and the steps you specified in the editor are applied as part of the load.
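
The idea of recorded steps that are replayed on every load can be sketched in a few lines of Python. This is only a conceptual illustration, not how Power Query is implemented; the sample rows and the step names `trim_names` and `sales_to_int` are hypothetical.

```python
# Conceptual sketch: a "query" is a source plus an ordered list of steps.
source_rows = [
    {"name": " alice ", "sales": "1200"},
    {"name": "bob", "sales": "1500"},
]

def trim_names(rows):
    # Build new rows; the rows passed in are left untouched.
    return [{**row, "name": row["name"].strip()} for row in rows]

def sales_to_int(rows):
    return [{**row, "sales": int(row["sales"])} for row in rows]

# The recorded steps, in the order they were added.
applied_steps = [trim_names, sales_to_int]

def load(source, steps):
    """Replay every recorded step, in order, on the source rows."""
    rows = source
    for step in steps:
        rows = step(rows)
    return rows

loaded = load(source_rows, applied_steps)
```

After `load` runs, `loaded` holds the cleaned rows while `source_rows` still contains the original untrimmed text values, which mirrors the way the source data is never modified.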

## Accessing Power Query Editor

Power Query Editor can be accessed either during the data loading process, or from the ribbon in the Report, Data, or Model views.

When loading data into your Power BI session, it is typically useful to go through a stage of transforming your queries prior to the loading process. To do this, choose the **Transform Data** button in the loading screen:

<p align="center">
    <img src="images/transform_data_hires.gif"  width="1200"/>
</p>
<br>

It is also often necessary to revisit the Power Query Editor while creating the data model or building a Power BI report. In this case, you can return to it via the **Transform Data** option on the **Home** tab of the ribbon at the top of any of the Data, Model, or Report views.


<p align="center">
    <img src="images/Transform_data_ribbon.jpeg"  width="1200"/>
</p>
<br>

## Components of the Power Query Editor View



<p align="center">
    <img src="images/query_overview.jpeg"  width="1200"/>
</p>
<br>

### 1. The Ribbon

> The Ribbon in Power Query Editor is a command panel that sits at the top of the interface, providing quick access to a wide range of functions for data extraction, transformation, and manipulation. It's divided into several tabs:

- **Home tab:** Contains options for basic tasks such as adding new data sources, and choosing which rows and columns to keep or remove

- **Transform tab:** Offers a variety of tools for modifying data, such as grouping, sorting, and changing data types

- **Add Column tab:** Provides a suite of options to create new columns in your dataset based on existing data. This could be as simple as a duplicate of an existing column, or as complex as a column derived from advanced calculations or transformations on one or more existing columns.

- **View tab:** Provides a range of options to customize the appearance and functionality of the Power Query Editor workspace. It also lets you switch on data profiling tools for the columns, which we will cover later in the lesson, and open the query dependencies view.

- **Tools tab:** Contains the query diagnostics tools, which help with error detection and performance evaluation

### 2. The Queries Pane

> In Power Query Editor, each table is referred to as a *query*, whether it was generated by loading a data source, by duplicating an existing query, or created from scratch inside Power BI. The *Queries Pane* provides an overview of all the queries in your current file. It is located on the left side of the interface. Each query represents a specific instance of a connected dataset that you are working with, and you can use the Queries Pane to create multiple copies of the current query in order to perform different transformations on the same data.

This pane allows you to manage your queries, such as creating new queries or duplicating existing ones. It is also possible to group queries into folders for better organization, especially useful in complex projects with multiple queries.

When you select a query in the Queries Pane, its data preview and applied steps are shown in the main workspace area. Each query listed in the Queries Pane has a series of steps associated with it, which are the transformations or manipulations applied to the original data. You can review, modify, reorder, or delete these steps as needed.

<p align="center">
    <img src="images/queries_pane.png"  width="1000"/>
</p>
<br>


### 3. Centre (Data) Pane

> The centre pane, often referred to as the Data Preview pane, provides a preview of the data contained within the selected query, allowing you to see the impact of your transformations in real time.

The Data Preview pane displays your data in a table format, similar to an Excel spreadsheet. The data types of each column are indicated by icons in the column headers. Any transformations or manipulations you apply to your data, such as sorting, filtering, splitting, or merging columns, are instantly reflected in this pane, providing immediate feedback on your actions. 

In addition to viewing data, the data pane is also an interactive workspace where you can select columns, rows, or individual cells and apply various transformations using the options in the Power Query Editor ribbon. For instance, by right-clicking on a column header, you can access a context menu with a variety of operations like rename, duplicate, change type, remove, and others.

### 4. Right (Query Settings) Pane

> At the top of the Query Settings pane, you'll find the **Properties** section. Here, you can name or rename your query. The name you provide will be used to reference the table of data in Power BI's data model once you've loaded it from the Power Query Editor.

Below the **Properties** section is the **Applied Steps** section. This section provides a list of all transformations that have been applied to your data in the order they were applied. It serves as a history of your data transformations, giving you a clear picture of the sequence of operations that have led to your current dataset.


### The Advanced Editor

> The **Advanced Editor** lets you see the code that Power Query Editor is creating with each step. Every transformation that you create in the GUI is also represented as a code snippet in a functional language called M, which is written in the background as you work. You can also write your own transformations directly in M. To launch the **Advanced Editor**, select **View** from the ribbon, then select **Advanced Editor**. A window appears, in which you can view and edit the code associated with the current transformation.

### Saving your work
When your query is where you want it, select **Close & Apply** from the **Home** tab of the **Power Query Editor**. This action applies the changes and closes the editor.


## Data Types in Power Query Editor

> Power Query Editor can be used to change the data type of columns in the query. The data types you can select at this stage of the ETL process (e.g. text, whole number, Boolean etc.) are the building blocks that you'll use to shape your data before loading it into the Power BI model. This contrasts with a more diverse set of "semantic" data types, such as geographical categories, that can be selected when building the data model itself. We will cover these in more detail in another lesson.


### Different Types of Data

**1. Text:** Used for storing alphanumeric characters. Suitable for names, addresses, and other string-based information.

**2. Whole Number:** Stores integers, both positive and negative. It's optimized for arithmetic operations.

**3. Decimal Number:** Used for storing decimal numbers, suitable for calculations requiring fractional numbers

**4. Percentage:** Though essentially a decimal number, this type is specially formatted to appear as a percentage

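**5. Currency (Fixed Decimal Number):** Similar to decimal numbers, but with a fixed four digits after the decimal separator, which makes it suitable for monetary values where rounding errors must be avoided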

**6. Date/Time:** This type contains both date and time information down to the second. It's useful for timestamping events.

**7. Date:** Stores only the date information. Suitable for scenarios where the time of day is not required.

**8. Time:** Holds the time of day without date information

**9. Duration:** Stores a span of time, measured in days, hours, minutes, and seconds

**10. Boolean:** Stores `True` or `False` values, suitable for flagging or binary choices




### Formatting Data Types
In Power Query Editor, you can change the data type of a column by:

1. **Manual Selection**: Click the data type icon next to the column name and select the appropriate type from the dropdown list

<p align="center">
    <img src="images/change_dtype_dropdown.gif"  width="900"/>
</p>
<br>
  
2. **Using Transform Menu**: Go to the **Transform** tab and find the **Data Type** dropdown where you can select the data type

<p align="center">
    <img src="images/datatype_transform_tab__hires.gif"  width="1200"/>
</p>
<br>

3. **Via Formula**: Use an M function such as `Table.TransformColumnTypes` in the formula bar or the **Advanced Editor** to set the type programmatically

### Considerations
- Changing the data type might result in errors in individual cells if the existing data is not compatible with the selected type
- When converting to types like `Date/Time` and `Currency`, the **Using Locale** option lets you specify the regional format in which the source values are written
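
The first consideration can be illustrated with a small Python sketch: a conversion is attempted cell by cell, and any value that cannot be converted is flagged rather than stopping the whole load. This is an analogy only; Python's `int` does not follow the same conversion rules as Power Query, and the helper `to_whole_number` and the sample values are hypothetical.

```python
def to_whole_number(value):
    """Try to convert one cell; return an error marker instead of raising."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return "Error"

# A text column in which two values are not valid whole numbers.
units_as_text = ["12", "7", "n/a", "forty", "40"]

units = [to_whole_number(value) for value in units_as_text]
error_count = units.count("Error")
```

Here the two incompatible cells become error markers while the other three convert cleanly, which is the kind of result the **Column Quality** tool described later helps you spot.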





## Cleaning Data in Power BI

Data rarely comes in a perfect form that's ready for analysis. Power Query Editor in Power BI offers a full set of tools for identifying and fixing problems such as nulls, errors, and duplicate values.

### Table and Column Naming Conventions

In Power BI, it is best practice to use descriptive names for columns, using spaces rather than `_` underscores or `-` hyphens. This is because columns are specified inside DAX or Power Query M functions using `[Column Name]`, and because the names of columns often appear in the legends of the visuals you create.

For example, `SalePrice` or `sale_price` should be renamed to `Sale Price` or `Sale price`.

Tables should ideally still be named without spaces, although if spaces are used the table can be referenced in DAX by wrapping its name in single quotes, as in `'Table Name'`. However, any prefixes or abbreviations carried over from database management should be removed. So for example `dim_products` should be renamed to `Products`, while `fact_orders` can become just `Orders`.

### Profiling Data

Before you begin cleaning, you should identify what needs to be addressed. The Power Query Editor contains a range of tools for profiling your data. These can be accessed from the **View** tab of the ribbon, and are designed to help you understand the quality and characteristics of your data at a glance, and identify missing or inconsistent data.

<p align="center">
    <img src="images/profiling_tools.jpeg"  width="900"/>
</p>
<br>

1. **Column Quality**: This tool provides a quick overview of the data quality within a specific column. It shows the percentage of valid, error, and empty values, allowing you to assess the reliability of the data you're working with.

<p align="center">
    <img src="images/column_quality.jpeg"  width="900"/>
</p>
<br>

2. **Column Distribution**: This feature gives you a snapshot of how individual data points are distributed among the range of possible values in a column. It displays a histogram of value frequencies, which is useful for spotting patterns, outliers, or potential errors in your dataset, and shows the counts of *distinct* and *unique* values. 

> Power BI uses some rather idiosyncratic terminology here. The term **distinct** is used to mean the set of values in the column, so for the list `[1,2,2,3,3,4]`, there would be 4 **distinct** values: `1`,`2`,`3` and `4`. Meanwhile the term **unique** is reserved for those values that appear only once in a column, so `1` and `4` in our example list.

<p align="center">
    <img src="images/column_distribution.jpeg"  width="900"/>
</p>
<br>
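
The difference between *distinct* and *unique* values described above can be checked with a short Python sketch, using the same example list. The variable names are illustrative only.

```python
from collections import Counter

values = [1, 2, 2, 3, 3, 4]
counts = Counter(values)  # how many times each value occurs

# Distinct: every value that occurs at least once.
distinct_values = sorted(counts)

# Unique: only the values that occur exactly once.
unique_values = sorted(value for value, n in counts.items() if n == 1)
```

For this list there are four distinct values and two unique values, matching the counts that **Column Distribution** would report.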


3. **Column Profile**: This is a comprehensive profiling feature that combines elements of both **Column Quality** and **Column Distribution**. It provides additional metadata like minimum, maximum, and average values, as well as the data type. The **Column Profile** can be particularly helpful when you are exploring a new dataset or diagnosing data issues.

<p align="center">
    <img src="images/column_profile.jpeg"  width="900"/>
</p>
<br>

## Cleaning Data in Power Query Editor

### Removing Duplicates or Errors
You can easily remove duplicate rows by selecting **Remove Duplicates** from the **Home** tab, or by right-clicking the column. This keeps one copy of each distinct row (or of each distinct value in the selected columns) and discards the repeats. Similarly, it is possible to remove rows containing errors with the **Remove Errors** option.

### Replacing Values
The **Replace Values** option allows you to substitute a specific value with another. This can be used to replace missing or null values with a specific value. There is also a **Replace Errors** option to do the same with cells containing errors.

<p align="center">
    <img src="images/replace_values.jpeg"  width="900"/>
</p>
<br>
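
As a rough illustration of what these two cleaning operations do to a table, here is a minimal Python sketch. It is not Power Query code; the functions `remove_duplicates` and `replace_value` and the sample rows are hypothetical stand-ins for the menu options.

```python
rows = [
    {"customer": "Alice", "city": "Leeds"},
    {"customer": "Bob", "city": None},
    {"customer": "Alice", "city": "Leeds"},  # exact duplicate of the first row
]

def remove_duplicates(rows):
    """Keep the first copy of each distinct row."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

def replace_value(rows, column, old, new):
    """Replace `old` with `new` in one column, leaving other cells alone."""
    return [
        {**row, column: new if row[column] == old else row[column]}
        for row in rows
    ]

cleaned = replace_value(remove_duplicates(rows), "city", None, "Unknown")
```

The result has two rows: one for Alice, and one for Bob whose missing city has been replaced with `"Unknown"`.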

### Filtering a Column to a Subset of Values

In Power Query Editor, you can filter a column to display only a subset of its values:

1. **Select the Column:** Click on the column header of the column you wish to filter.
2. **Apply Filter:** Click the dropdown arrow in the column header. This will display a list of the distinct values in the column.
3. **Choose Values:** From the dropdown list, select the values you want to retain. You can also search for specific values using the search bar at the top.
4. **Confirm Selection:** Once you've selected the desired values, click OK to apply the filter. Only rows containing the selected values in the filtered column will be retained.

<p align="center">
    <img src="images/filter_column.gif"  width="900"/>
</p>
<br>
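
The filtering steps above amount to keeping only the rows whose value in one column belongs to a chosen set. A minimal Python sketch of that logic, using hypothetical order data and region names, might look like this:

```python
orders = [
    {"order_id": 1, "region": "North"},
    {"order_id": 2, "region": "South"},
    {"order_id": 3, "region": "East"},
    {"order_id": 4, "region": "North"},
]

# The list of values offered in the filter dropdown: each distinct value once.
available_regions = sorted({row["region"] for row in orders})

# The values ticked in the dropdown.
selected_regions = {"North", "East"}

# Only rows whose region is among the selected values are kept.
filtered = [row for row in orders if row["region"] in selected_regions]
```

The row for the `South` region is dropped from the result, while the original `orders` list is unchanged.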







## Common Data Transformations

### Splitting Columns
When a single column contains multiple elements like first name and last name, or address components, you can use the **Split Column** feature. This divides the data in a single column into multiple columns based on a delimiter such as a comma, space, or custom text.  Power Query Editor provides several methods to effectively split a column into multiple columns:


- **Delimiter-Based:** Splits columns using a chosen character, such as a comma or space

    - Choose your column and go to **Home** > **Split Column** > **By Delimiter**
    - Select your delimiter from a list of common options, or input a custom delimiter
    - Specify where to split: left-most, right-most, or on each occurrence <br><br>


- **Specified Positions:** Splits the column into multiple columns at a set of different positions within the column
    - Select your column and go to **Home** > **Split Column** > **By Positions**
    - Enter the specific positions at which you'd like to split the column

There are several other methods, including at transitions between letters and numbers, or between lower and upper case letters. For a full list see the following [link](https://support.microsoft.com/en-au/office/split-a-column-of-text-power-query-5282d425-6dd0-46ca-95bf-8e0da9539662).
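
To make the split options more concrete, the following Python sketch mimics them on a single text value. It is an analogy rather than Power Query code; the helper `split_by_positions` is hypothetical, and it assumes that each position marks the start index of a new column.

```python
value = "2024-03-15"

# Delimiter-based splitting, with the three "where to split" choices.
left_most = value.split("-", 1)    # split once, at the first delimiter
right_most = value.rsplit("-", 1)  # split once, at the last delimiter
each_occurrence = value.split("-") # split at every delimiter

def split_by_positions(text, positions):
    """Cut `text` into pieces that start at each of the given positions."""
    bounds = list(positions) + [len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]

by_positions = split_by_positions(value, [0, 5, 8])
```

Note that splitting by positions keeps every character, including the hyphens, whereas splitting by delimiter discards the delimiter itself.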


### Removing Columns
In some cases, you'll have columns that are unnecessary for your analysis. To declutter your dataset and make it easier to work with, you can remove these columns. Simply right-click on the column header and choose **Remove**. Alternatively, to keep only a few columns, select the ones you want to keep using **CTRL + CLICK**, then right-click one of them and select **Remove Other Columns**, which removes every column that is not selected.

<p align="center">
    <img src="images/remove_cols.gif"  width="900"/>
</p>
<br>


### Column from Examples
The **Column from Examples** feature lets you create a new column based on data from existing columns. You manually provide example outputs, and Power Query Editor uses these examples to automatically recognize the pattern and generate the entire column for you. This is useful for tasks like extracting substrings or combining values from several columns in each row.

<p align="center">
    <img src="images/col_from_examples_hires.gif"  width="1000"/>
</p>
<br>



### Transposing Data
Sometimes the rows and columns of your data will be the wrong way round for your analysis: for example, each product is a row and each quarter is a column, when you want one row per quarter. Using the **Transpose** feature, you can switch rows and columns with each other to better fit your analytical needs.

For example, if we had the following table:

| Product  | Q1 Sales | Q2 Sales | Q3 Sales | Q4 Sales |
|----------|----------|----------|----------|----------|
| Product A|    500   |    600   |    700   |    800   |
| Product B|    400   |    450   |    550   |    650   |

We could transpose it into something like this:


| Quarter  | Product A | Product B |
|----------|-----------|-----------|
| Q1 Sales |    500    |    400    |
| Q2 Sales |    600    |    450    |
| Q3 Sales |    700    |    550    |
| Q4 Sales |    800    |    650    |

To achieve this in Power BI, perform the following steps:

1. With the table selected, apply **Home** > **Use Headers as First Row**
2. Transpose the table using **Transform** > **Transpose**
3. Apply **Home** > **Use First Row as Headers**
4. Rename the `[Product]` column as `[Quarter]`


<p align="center">
    <img src="images/proper_transpose_hires.gif"  width="1200"/>
</p>
<br>
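
The four steps above can be sketched in Python to show why the headers have to be demoted before transposing and promoted again afterwards. This is a conceptual illustration using plain lists, not Power Query code.

```python
header = ["Product", "Q1 Sales", "Q2 Sales", "Q3 Sales", "Q4 Sales"]
rows = [
    ["Product A", 500, 600, 700, 800],
    ["Product B", 400, 450, 550, 650],
]

# Step 1: demote the headers so they become the first row of data.
grid = [header] + rows

# Step 2: transpose, so that each original column becomes a row.
transposed = [list(column) for column in zip(*grid)]

# Step 3: promote the first row back to headers.
new_header, new_rows = transposed[0], transposed[1:]

# Step 4: rename the first column from "Product" to "Quarter".
new_header = ["Quarter"] + new_header[1:]
```

Without step 1, the quarter names in the original header would not be part of the data and would be lost when the table is transposed.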





### Grouping
Grouping data can help you organize a large dataset into more manageable sub-groups, which can then be aggregated using functions like `SUM`, `COUNT`, `AVERAGE` etc. You can do this by selecting **Group By** from the **Transform** tab.

For example, if you had the following data:

| Salesperson | Sales  |
|-------------|--------|
| Alice       |  1200  |
| Alice       |  1100  |
| Alice       |   500  |
| Bob         |  1500  |
| Bob         |   600  |
| Bob         |   400  |
| Carol       |  1000  |
| Carol       |   700  |

You might want to summarise the total sales for each salesperson. You would do this by selecting the `[Salesperson]` column, and selecting the **Group By** option. In the following screen, there is the option to select the aggregation type and the column to which it applies. In this case it would be `SUM` and `[Sales]`.

<p align="center">
    <img src="images/group_by.gif"  width="1200"/>
</p>
<br>
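
The grouping in this example can be reproduced with a short Python sketch, which shows the logic behind a grouped sum. The data is the table above; the variable names are illustrative.

```python
sales = [
    ("Alice", 1200), ("Alice", 1100), ("Alice", 500),
    ("Bob", 1500), ("Bob", 600), ("Bob", 400),
    ("Carol", 1000), ("Carol", 700),
]

# Group by salesperson and sum the sales within each group.
totals = {}
for salesperson, amount in sales:
    totals[salesperson] = totals.get(salesperson, 0) + amount
```

The result has one entry per salesperson: 2800 for Alice, 2500 for Bob, and 1700 for Carol.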



### Unpivoting

Unpivoting is the process of transforming a table's structure to move values that are spread across multiple columns into a single column. By condensing information from multiple columns into one, you facilitate easier data manipulation and enable a broader range of analytic functions. 

Let's look at an example:

| Month    | Product A Sales | Product B Sales |
|----------|-----------------|-----------------|
| January  |       1200      |       1500      |
| February |       1100      |       1600      |
| March    |       1300      |       1550      |
| April    |       1250      |       1400      |

In a typical **Unpivot** operation, you could convert this wide format into a long format, which makes it easier to analyze sales by product across multiple months.

| Month    | Product      | Sales  |
|----------|-------------|--------|
| January  | Product A    | 1200   |
| January  | Product B    | 1500   |
| February | Product A    | 1100   |
| February | Product B    | 1600   |
| March    | Product A    | 1300   |
| March    | Product B    | 1550   |
| April    | Product A    | 1250   |
| April    | Product B    | 1400   |

This long format is more flexible for analysis. For example, it allows you to easily compare sales figures for `Product A` and `Product B` across different months or to calculate monthly averages. The **Unpivot** option in Power BI can perform this transformation easily.

To perform this operation in Power BI, select the `[Month]` column, right-click and select **Unpivot Other Columns**.

<p align="center">
    <img src="images/unpivot.gif"  width="900"/>
</p>
<br>
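
The reshaping performed by **Unpivot Other Columns** can be sketched in Python as follows. This is a conceptual illustration with a hypothetical helper, `unpivot_other_columns`. In Power Query the two new columns are named `Attribute` and `Value` by default, so renaming them and trimming the `Sales` suffix from the product names are additional steps, shown here as a final tidy-up.

```python
wide = [
    {"Month": "January", "Product A Sales": 1200, "Product B Sales": 1500},
    {"Month": "February", "Product A Sales": 1100, "Product B Sales": 1600},
    {"Month": "March", "Product A Sales": 1300, "Product B Sales": 1550},
    {"Month": "April", "Product A Sales": 1250, "Product B Sales": 1400},
]

def unpivot_other_columns(rows, keep, attribute_name, value_name):
    """Turn every column not listed in `keep` into its own row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in keep:
                continue
            long_row = {name: row[name] for name in keep}
            long_row[attribute_name] = column
            long_row[value_name] = value
            long_rows.append(long_row)
    return long_rows

long_format = unpivot_other_columns(wide, ["Month"], "Product", "Sales")

# Tidy-up: drop the " Sales" suffix from the product names.
long_format = [
    {**row, "Product": row["Product"].replace(" Sales", "")}
    for row in long_format
]
```

Four months with two product columns each produce eight rows in the long format, matching the second table above.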



## Power Query Worked Example - Creating a Star-Based Schema

To follow along with this tutorial, make sure you have downloaded and installed Power BI, and then downloaded the dataset from [this link](https://cdn.theaicore.com/content/lessons/652420d4-f578-40ab-a153-cea0f66f210f/Sales.csv).


### Loading the data

- First, we need to load our data table into Power BI. Begin by clicking the **Get Data** button on the **Home** tab, and selecting the **Text/CSV** option from the dropdown

- Navigate to the `Sales.csv` file on your hard drive and open it. This will open a dialog box showing a preview of the table we are importing.

- Choose **Transform Data** at the bottom of the screen to open the Power Query Editor


<p align="center">
    <img src="images/sales_table.jpeg"  width="900"/>
</p>
<br>

## Creating the Star-Based Schema

Inside the Power Query editor, you can see the columns of the sales spreadsheet. Each row represents one order from a hypothetical business. You will see that some columns pertain to the individual order, while others relate to the customer or to the product. 

A flat 'fact' table like this contains a lot of redundant information, so data analysts often transform it into a star schema.


<p align="center">
    <img src="images/star_schema.jpeg"  width="500"/>
</p>
<br>

A star schema is a popular data modeling approach, named for its resemblance to a star, with lines radiating from a central table to multiple surrounding tables. The central table, known as the fact table, contains transactional data or measures that are quantitative in nature, such as sales amount or units sold, with one row per transaction.

The associated dimension or 'dim' tables store descriptive attributes in a non-redundant manner, so for example the customers table has one row per customer. This approach to data modelling can lead to enhanced query performance, improved data integrity, and can make it easier to write complex queries.

Turning a flat table into a star schema is easy in Power Query, and will allow us to explore a lot of the functionality.

### Creating the dimension tables from the Fact table

1. Create two duplicates of the original fact table

    - Right click on the `Sales` table in the Queries Pane, and choose **Duplicate**
    - Repeat a second time
    - Select the first new table in the Queries Pane, and rename to `dim_customer`
    - Rename the second one to `dim_product`
    <br><br>

2. Create the `dim_customer` table

    - Highlight the `dim_customer` table, and use the central pane to select the columns: `CustomerID`, `Email Name`, `City`, `ZipCode`, `State`, `Region`, `District`, and `Country`
    - You can select multiple columns at the same time by holding down the **Ctrl** key (**Cmd** on Mac)
    - Right-click one of the selected columns and choose **Remove Other Columns** from the dropdown
    - Now highlight all the remaining columns in the central pane, right click and choose **Remove Duplicates**
    <br><br>
3. Split and clean the `Email Name` column

    The `Email Name` column in the `dim_customer` table contains multiple pieces of information that would be better as separate columns. There are multiple ways to do this in Power Query, but the easiest is using the 'split column' function:

    - Right-click on the column and choose **Split Column** from the dropdown, or choose the **Split Column** button from the **Home** tab of the ribbon
    - Choose the **By Delimiter** option, and select the **Colon** delimiter
    - Now do the same for the newly generated column, splitting by the comma delimiter to create first and last names
    - Rename the columns to `email`, `first_name` and `last_name` by right-clicking and choosing the **Rename** option
    - Finally let's remove the `()` parentheses from the emails: right click the email column and choose **Replace values**. Then enter the open parenthesis (`(`) in the `Value to Find` field, and leave the `Replace With` field blank. Then repeat the step with the `)` symbol.
    <br><br>
4. Create a website login for each customer

    We can create a unique website login for each customer using the powerful **Column from Examples** feature:

    - Select the **Column from Examples** dropdown from the **Add Column** tab of the ribbon
    - Choose **From All Columns**
    - In the new column, type the value for `last_name` for the topmost row, followed by a `-`, and then the `CustomerID` value
    - Click **OK**
    - A new column will be created, filling in the correct value for each row according to the example given

    We can see all the changes we have made to our `dim_customer` table in the **Applied Steps** section of the Query Settings pane, to the right of the Power Query window.


<p align="center">
    <img src="images/customer_actions.jpeg"  width="400"/>
</p>
<br>
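
The logic of steps 2 to 4 can be summarised in a Python sketch. This is not what Power Query runs, and the sample rows are invented: in particular, the layout assumed for the `Email Name` values, `(email): first, last`, is a guess made for illustration and may not match the real dataset. Only three of the customer columns are used to keep the example short.

```python
sales_rows = [
    {"CustomerID": 101, "Email Name": "(ada@example.com): Ada, Lovelace",
     "City": "London", "Units": 2},
    {"CustomerID": 102, "Email Name": "(alan@example.com): Alan, Turing",
     "City": "Wilmslow", "Units": 1},
    {"CustomerID": 101, "Email Name": "(ada@example.com): Ada, Lovelace",
     "City": "London", "Units": 5},
]
customer_columns = ["CustomerID", "Email Name", "City"]

# Step 2: keep only the customer columns, then remove duplicate rows.
seen = set()
dim_customer = []
for row in sales_rows:
    kept = tuple(row[column] for column in customer_columns)
    if kept not in seen:
        seen.add(kept)
        dim_customer.append(dict(zip(customer_columns, kept)))

for row in dim_customer:
    # Step 3: split on the colon, then on the comma, and strip the parentheses.
    email_part, name_part = row.pop("Email Name").split(":", 1)
    first_name, last_name = (part.strip() for part in name_part.split(",", 1))
    row["email"] = email_part.replace("(", "").replace(")", "").strip()
    row["first_name"] = first_name
    row["last_name"] = last_name
    # Step 4: build the login from the last name and the customer ID.
    row["login"] = f"{last_name}-{row['CustomerID']}"
```

Three order rows collapse into two customer rows, each with separate `email`, `first_name`, `last_name` and `login` values.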

5. Create the `dim_product` table

    - In the `dim_product` duplicate table, simply highlight the following columns: `ProductID`, `Product`, `Category`, `Segment`, `Manufacturer`, `ManufacturerID`, `Unit Cost` and `Unit Price`
    - Right-click and choose **Remove Other Columns**
    - Select all remaining columns, right-click on one of the column headings and choose **Remove Duplicates**
    <br><br>
### Creating the new Sales table

Next we should return to our `Sales` table and remove any redundant columns. 

- Remove all columns except for the following: `ProductID`, `CustomerID`, `Date`, `Units`



### Create a `dim_datetimes` table

Often, when doing time-based analysis, it is useful to have a separate date-times table, which contains descriptive information about each date in a range. Columns can include things like day of week, financial year, and business quarter. One way to achieve this is using the M query language in the **Advanced Editor** window of the Power Query Editor. The same task can also be done via DAX while creating the data model, as we will see in a later lesson. Generally, actions performed in Power Query using M are more performant than those performed with DAX during data modelling. However, many people find the DAX syntax easier, and it is quicker to add a new column to your date table using DAX than it is to return to the Power Query Editor to adjust your M query.

In this case, we will use a free script written in the M query language by Devin Knight, which can be found at this [URL](https://devinknightsql.com/2015/06/16/creating-a-date-dimension-with-power-query/). You can find the query on its own in this [text file](https://cdn.theaicore.com/content/lessons/652420d4-f578-40ab-a153-cea0f66f210f/example_datetime.txt).

- Right-click in the Queries Pane on the left and select **New Query** > **Blank Query**
- Select the **Advanced Editor** from the **View** tab of the ribbon
- Paste the contents of the [text file](https://cdn.theaicore.com/content/lessons/652420d4-f578-40ab-a153-cea0f66f210f/example_datetime.txt) into the **Advanced Editor**
- Click **Done**
- This will create a function. To invoke it we just need to choose start and end dates.
- Choose `01/01/2011` as the start date, and today's date as the end date
- Press **Invoke**
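
To give a feel for what a date table generator does, here is a minimal Python sketch that produces one row per day between a start and an end date. It is not a translation of the M script used above, which defines its own set of columns; the function `build_date_table` and the columns chosen here are assumptions made for illustration.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_date_table(start, end):
    """Return one row per calendar day from `start` to `end` inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "Date": current,
            "Year": current.year,
            "Quarter": (current.month - 1) // 3 + 1,
            "Month": current.month,
            "Day of Week": DAY_NAMES[current.weekday()],
        })
        current += timedelta(days=1)
    return rows

# One year of dates, starting from the tutorial's start date.
date_table = build_date_table(date(2011, 1, 1), date(2011, 12, 31))
```

Because every date in the range appears exactly once, a table like this can later be related to the `Date` column of the `Sales` table.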

### Finishing Off

We have now finished creating the tables for our star schema, and so we want to save our work and close the Power Query Editor. 

- Click the **Close & Apply** button on the **Home** tab of the ribbon
- The data will now be loaded into your session according to the schema that you have created


<p align="center">
    <img src="images/close_and_apply.jpeg"  width="400"/>
</p>
<br>

## Save Your Work!

In subsequent lessons we will see how to create a working data model based on the schema we have created, and use it to generate visualisations. Please make sure to __SAVE YOUR SESSION__ before closing Power BI.

Go to **File** > **Save As**, and name your session `Power_BI_demo_session.pbix`.



## Key Takeaways

- Power Query Editor is an ETL tool enabling data import, transformation, and loading into Power BI for further analysis
- Power Query transforms do not alter the underlying data source, but are run each time the data are loaded or refreshed
- The Ribbon in **Power Query Editor** provides quick access to various functions for data extraction, transformation, and manipulation
- The **Queries** pane shows a list of all current queries
- The **Centre** pane provides a real-time preview of the selected query data and its transformations
- The **Query Settings** pane includes the **Applied Steps** section that lists all applied data transformations
- **M** is a functional language used in **Power Query** for data manipulation and transformation across various sources
- Transformations you make in the UI create code snippets in M in the background
- You can edit the M code for each query in the **Advanced Editor**