The ability to combine queries is powerful because it allows us to append or merge different tables or queries. We can combine multiple tables into a single table in the following circumstances:

* Too many tables exist, making it difficult to navigate an overly complicated data model
* Several tables have similar roles
* A table has only a column or two that can fit into a different table
* We want to use several columns from different tables in a custom column

In this file, we will learn how to do the following:

* Profile data to learn more about a specific column before using it
* Apply data shape transformations to table structures
* Combine queries
* Edit **M code** in the Advanced Editor

We can combine tables in two different ways:

* Merging
* Appending

Assume that we are developing Power BI reports for the **Sales and HR teams**. They have asked us to create a contact information report that contains the contact information and location of every employee, supplier, and customer. The data is in the `HR.Employees`, `Production.Suppliers`, and the `Sales.Customers` tables, as shown in the following image.

![image.png](attachment:image.png)

However, this data comes from multiple tables, so the dilemma is determining how we can merge the data in these multiple tables and create one source-of-truth table from which to create a report. Power BI allows us to combine and merge queries into a single table.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

When we append queries, we'll be adding rows of data to another table or query. For example, we could have two tables, one with 300 rows and another with 100 rows, and when we append queries, we will end up with 400 rows. When we merge queries, we'll be adding columns from one table (or query) into another. To merge two tables, we must have a column that is the key between the two tables.

For the previously mentioned scenario, we'll append the `HR.Employees` table with the `Production.Suppliers` and `Sales.Customers` tables so that we have one master list of contact information. 

Because we want to create one table that has all contact information for employees, suppliers, and customers, when we combine the queries, the pertinent columns that we require in our combined table must be named the same in our original data tables to see one consolidated view.

Before we begin combining queries, we can remove extraneous columns that we don't need for this task from our tables. To complete this task, we'll format each table to have only four columns with our pertinent information and rename them so they all have the same column headers: `ID`, `company`, `name`, and `phone`. The following images are snippets of the reformatted `Sales.Customers`, `Production.Suppliers`, and `HR.Employees` tables.

![image.png](attachment:image.png)
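The reshaping step can also be sketched outside Power BI. The following Python snippet is only an illustration of the idea (keep the pertinent columns and give them shared headers); it is not how Power Query works internally. The sample rows, the source column names such as `custid` and `contactname`, and the `reshape` helper are all assumptions made up for the example.

```python
# Hypothetical sample rows; the source column names are assumptions,
# not the real schema of Sales.Customers.
customers = [
    {"custid": 1, "companyname": "Customer A", "contactname": "Ana Diaz",
     "phone": "555-0100", "city": "Berlin"},
    {"custid": 2, "companyname": "Customer B", "contactname": "Li Wei",
     "phone": "555-0101", "city": "Madrid"},
]

# Source column -> shared header used by every contact table.
CUSTOMER_HEADERS = {
    "custid": "ID",
    "companyname": "company",
    "contactname": "name",
    "phone": "phone",
}


def reshape(rows, headers):
    """Keep only the columns named in `headers` and rename them."""
    return [{new: row[old] for old, new in headers.items()} for row in rows]


customer_contacts = reshape(customers, CUSTOMER_HEADERS)
print(customer_contacts[0])
```

Applying the same helper to the supplier and employee tables, each with its own header mapping, would leave all three tables with identical columns.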

After we have finished reformatting, we can combine the queries. On the **Home** tab on the **Power Query Editor ribbon**, we select **Combine**, and then select the drop-down list for **Append Queries**. We can select the following:

* **Append Queries as New**, which means that the output of appending will result in a new query or table,
* **Append Queries**, which will add the rows from an existing table into another.

The next task is to create a new master table, so we need to select **Append Queries as New**. This selection will bring us to a window where we can add the tables that we want to append from Available Tables to Tables to Append, as shown in the following image.

![image.png](attachment:image.png)

After we have added the tables that we want to append, select **OK**.

We'll be routed to a new query that contains all rows from all three of our tables, as shown in the following image.

![image.png](attachment:image.png)
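Conceptually, appending stacks the rows of tables that share the same headers. The Python sketch below illustrates that idea with made-up sample rows; the `append_queries` helper is hypothetical, and its strict header check is a choice made for this example rather than a description of Power Query's behavior.

```python
# Hypothetical rows that already use the shared headers.
employees = [
    {"ID": 1, "company": "Our Company", "name": "Sara Davis", "phone": "555-0001"},
    {"ID": 2, "company": "Our Company", "name": "Don Funk", "phone": "555-0002"},
]
suppliers = [
    {"ID": 1, "company": "Supplier A", "name": "Ken Ito", "phone": "555-0200"},
]
customers = [
    {"ID": 1, "company": "Customer A", "name": "Ana Diaz", "phone": "555-0100"},
]


def append_queries(*tables):
    """Return a new table holding the rows of every input table, in order."""
    headers = list(tables[0][0].keys())
    combined = []
    for table in tables:
        for row in table:
            # This sketch is strict about headers so that mismatches surface early.
            if list(row.keys()) != headers:
                raise ValueError(f"unexpected columns: {list(row.keys())}")
            combined.append(row)
    return combined


master = append_queries(employees, suppliers, customers)
print(len(master))  # 2 + 1 + 1 rows
```

Because the helper builds a new list and leaves the inputs untouched, it corresponds to the **Append Queries as New** option rather than **Append Queries**.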

We have now succeeded in creating a master table that contains the information for the employees, suppliers, and customers. We can exit Power Query Editor and build any report elements surrounding this master table.

However, if we wanted to merge tables instead of appending the data from one table to another, the process would be different.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

When we merge queries, we are combining the data from multiple tables into one based on a column that is common between the tables. This process is similar to the **JOIN clause in SQL**. Consider a scenario wherein the `Sales` team now wants us to consolidate orders and their corresponding details (which are currently in two tables) into a single table. We can do this by merging the two tables, `Orders` and `OrderDetails`, as shown in the following image. The column that is shared between these two tables is `OrderID`.

![image.png](attachment:image.png)

To do this, we go to **Home** on the **Power Query Editor ribbon**, select **Combine**, and then open the **Merge Queries** drop-down list, where we can select **Merge Queries as New**. This selection opens a new window, where we choose the tables that we want to merge from the drop-down lists and then select the column that matches between the tables, which in this case is `orderid`.

![image.png](attachment:image.png)

We can also choose how to join the two tables together, a process that is also similar to JOIN statements in SQL. These join options include the following:

* **Left Outer** - Displays all rows from the first table and only the matching rows from the second.
* **Full Outer** - Displays all rows from both tables.
* **Inner** - Displays the matched rows between the two tables.
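To see how the three join kinds differ, we can model them in a few lines of Python. This is a minimal sketch under stated assumptions: the sample rows and the `merge` helper are made up, the two tables are assumed to share only the key column, and the code models only which rows each join kind keeps.

```python
# Hypothetical sample rows for the two tables.
orders = [
    {"orderid": 1, "custid": 10},
    {"orderid": 2, "custid": 11},
]
order_details = [
    {"orderid": 1, "productid": 100, "qty": 2},
    {"orderid": 1, "productid": 101, "qty": 1},
    {"orderid": 3, "productid": 102, "qty": 5},
]


def merge(left, right, key, kind="left_outer"):
    """Join two lists of row dicts on `key`.

    `kind` is "left_outer", "full_outer", or "inner". Columns that have
    no match on the other side are filled with None.
    """
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]

    # Index the right table by key; one key may match several rows.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    result = []
    matched_keys = set()
    for lrow in left:
        matches = index.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            for rrow in matches:
                result.append({**lrow, **rrow})
        elif kind in ("left_outer", "full_outer"):
            result.append({**lrow, **{c: None for c in right_cols}})

    if kind == "full_outer":
        # Add right-hand rows whose key never appeared on the left.
        for rrow in right:
            if rrow[key] not in matched_keys:
                result.append({**{c: None for c in left_cols}, **rrow})
    return result


for kind in ("left_outer", "full_outer", "inner"):
    print(kind, len(merge(orders, order_details, "orderid", kind)))
```

With this sample data, order 2 has no details and detail order 3 has no order, so the row counts differ between the join kinds.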

For this scenario, we'll choose to use a **Left Outer** join and select **OK**, which will route us to a new window where we can view our merged query.

![image.png](attachment:image.png)

Now, we can merge two queries or tables in different ways so that we can view our data in the most appropriate way for our business requirements.

For more information on this topic, see the [**Shape and Combine Data in Power BI** documentation](https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-shape-and-combine-data/).

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Profiling data is about studying the nuances of the data, which includes the following tasks:

* Determining anomalies
* Examining and developing the underlying data structures
* Querying data statistics such as row counts, value distributions, minimum and maximum values, averages, and so on

This concept is important because it lets us shape and organize the data so that interacting with it and identifying its distribution are straightforward, which in turn simplifies building report elements on the front end.

Assume that we're developing reports for the Sales team at our organization. We're uncertain how the data is structured and contained within the tables, so we want to profile the data behind the scenes before we begin developing the visuals. Power BI has inherent functionality that makes these tasks user-friendly and straightforward.

Before we begin examining the data in Power Query Editor, we should first learn about the underlying structures in which the data is organized. We can view the current data model under the **Model tab on Power BI Desktop**.

![image.png](attachment:image.png)

On the **Model** tab, we can edit specific column and table properties by selecting a table or columns, and we can transform the data by using the **Transform Data** button, which takes us to Power Query Editor. Additionally, we can manage, create, edit, and delete relationships between different tables by using **Manage Relationships**, which is located on the ribbon.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

After we've created a connection to a data source and selected **Transform Data**, we are brought to Power Query Editor, where we can determine if anomalies exist within our data. Data anomalies are outliers within our data. Determining what those anomalies are can help us identify what the normal distribution of our data looks like and whether specific data points exist that we need to investigate further. Power Query Editor determines data anomalies by using the **Column Distribution** feature.

Select **View** on the ribbon, and under Data Preview, we can choose from a few options. To understand data anomalies and statistics, check the **Column Distribution**, **Column Quality**, and **Column Profile** options. The following figure shows the statistics that appear.

Column quality and Column distribution are shown in the graphs above the columns of data. Column quality shows us the percentages of values that are valid, in error, and empty. In an ideal situation, we want 100 percent of the data to be valid.

![image.png](attachment:image.png)
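The three column quality percentages are simple ratios, as the following Python sketch shows. It is only an approximation for illustration: the sample values, the `column_quality` helper, and the rule that a non-numeric value counts as an error are all assumptions for this example, not Power Query's own definition of an error.

```python
def column_quality(values, is_valid):
    """Return the percentage of valid, error, and empty values."""
    total = len(values)
    empty = sum(1 for v in values if v is None or v == "")
    error = sum(
        1 for v in values
        if v is not None and v != "" and not is_valid(v)
    )
    valid = total - empty - error
    return {
        "valid": round(100 * valid / total, 1),
        "error": round(100 * error / total, 1),
        "empty": round(100 * empty / total, 1),
    }


# Hypothetical Profit column with one bad value and two empty cells.
profit = [120.5, None, "n/a", 88.0, ""]
quality = column_quality(profit, lambda v: isinstance(v, (int, float)))
print(quality)
```

The three percentages add up to 100, so a column that is 100 percent valid has no errors and no empty cells.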

**Note:** By default, Power Query examines the first 1,000 rows of your dataset. To change this, select the profiling status in the status bar and select Column profiling based on entire dataset.

* **Column distribution** shows us the distribution of the data within the column and the counts of distinct and unique values, both of which can tell us details about the data counts. Distinct values are the different values in a column, with each value counted once no matter how often it repeats, while unique values are the values that appear exactly once. Therefore,
    * **distinct** in this table tells us how many different values are present, while
    * **unique** tells us how many of those values only appear once.

* **Column profile** gives us a more in-depth look into the statistics within the columns for the first 1,000 rows of data. This feature provides several different values, including the count of rows, which is important when verifying whether our data was imported successfully. For example, if our original database had 100 rows, we could use this row count to verify that 100 rows were, in fact, imported correctly. The profile also shows how many rows Power BI has deemed outliers, how many empty rows and strings exist, and the min and max, which tell us the smallest and largest values in a column, respectively. The min and max are particularly important for numeric data because they immediately reveal a maximum value that is beyond what our business identifies as a "maximum," which tells us where to focus our efforts when delving deeper into the data. For a text column, as seen in the previous image, the minimum value is the first value and the maximum value is the last value in alphabetical order.
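The counts described above can be reproduced with a short Python sketch, counting each different value once for distinct and keeping only values that occur exactly once for unique. The sample column and the `profile_column` helper are made up for illustration, and leaving empty cells out of the distinct and unique counts is an assumption of this sketch.

```python
from collections import Counter


def profile_column(values):
    """Compute a few profile statistics for one column."""
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "count": len(values),                      # total rows
        "empty": len(values) - len(present),       # None or empty string
        "distinct": len(counts),                   # different values
        "unique": sum(1 for n in counts.values() if n == 1),
        "min": min(present, default=None),         # first alphabetically for text
        "max": max(present, default=None),         # last alphabetically for text
    }


# Hypothetical SalesPerson column.
sales_person = ["Anthony Grosse", "Lily Code", "Anthony Grosse", None, "Amy Trefl"]
profile = profile_column(sales_person)
print(profile)
```

Here "Anthony Grosse" appears twice, so it counts toward distinct but not toward unique.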

Additionally, the Value distribution graph tells us the counts for each distinct value in that specific column. When looking at the graph in the previous image, notice that the value distribution indicates that "Anthony Grosse" appears the greatest number of times within the `SalesPerson` column and that "Lily Code" appears the fewest times. This information is particularly important because it identifies outliers. If a value appears far more often than other values in a column, the Value distribution feature allows us to pinpoint a place to begin our investigation into why this is so.
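A value distribution is a count per distinct value. As a rough Python illustration, with made-up counts for the two names mentioned above plus one invented placeholder name, we could compute it like this:

```python
from collections import Counter

# Hypothetical SalesPerson values; the counts are invented for the example.
sales_person = (
    ["Anthony Grosse"] * 4
    + ["Placeholder Person"] * 2
    + ["Lily Code"] * 1
)

distribution = Counter(sales_person)

# Most and least frequent values, each as a (value, count) pair.
most_frequent = distribution.most_common(1)[0]
least_frequent = min(distribution.items(), key=lambda item: item[1])

print(most_frequent)
print(least_frequent)
```

Sorting the counts this way points us to the values at either extreme, which is where an investigation into outliers would typically start.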

On a numeric column, **Column Statistics** will also include how many zeroes and null values exist, along with the average value in the column, the standard deviation of the values in the column, and how many even and odd values are in the column. These statistics give us an idea of the distribution of data within the column. They're important because they summarize the data in the column and serve as a starting point to determine the outliers.
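The numeric statistics listed above are straightforward to compute by hand, which can help when we want to sanity-check what the profile shows. The following Python sketch uses made-up integer values and a hypothetical `numeric_stats` helper; using the sample standard deviation is an assumption of the sketch.

```python
import statistics


def numeric_stats(values):
    """Summarize a column of integers that may contain None."""
    numbers = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(numbers),
        "zeros": sum(1 for v in numbers if v == 0),
        "average": statistics.mean(numbers),
        "std_dev": statistics.stdev(numbers),   # sample standard deviation
        "even": sum(1 for v in numbers if v % 2 == 0),
        "odd": sum(1 for v in numbers if v % 2 == 1),
    }


# Hypothetical quantity column.
quantity = [0, 2, 4, None, 5, 9]
stats = numeric_stats(quantity)
print(stats)
```

A value that lies several standard deviations away from the average is a natural candidate for closer inspection.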

For example, while looking through invoice data, we notice that the Value distribution graph shows that a few salespeople in the `SalesPerson` column appear the same number of times within the data. Additionally, we notice the same situation has occurred in the Profit column and in a few other tables as well. During our investigation, we discover that the data we were using was bad data and needed to be refreshed, so we immediately complete the refresh. Without viewing this graph, we might not have seen this error so quickly. This is why value distribution is essential.

After we have completed our edits in Power Query Editor and are ready to begin building visuals, we return to **Home** on the Power Query Editor ribbon and select **Close & Apply**, which returns us to Power BI Desktop and applies any column edits and transformations that we made.

We have now determined the elements that make up profiling data in Power BI, which include loading data in Power BI, interrogating column properties to gain clarity about and make further edits to the type and format of data in columns, finding data anomalies, and viewing data statistics in Power Query Editor.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Each time we shape data in Power Query, we create a step in the Power Query process. Those steps can be reordered, deleted, and modified where it makes sense. Each cleaning step that we made was likely created by using the graphical interface, but Power Query uses the **M language** behind the scenes. The combined steps are available to read by using the Power Query Advanced Editor. 

The **M language** is always available to read and modify directly. However, it isn't required to use **M code** to take advantage of Power Query. We will rarely need to write **M code**, but it can still prove useful. Because each step in Power Query is written in **M code**, even if the UI created it for us, we can use those steps to learn **M code** and customize it to suit our needs.

After creating steps to clean data, select the **View** ribbon of Power Query, and then select **Advanced Editor**.

![image.png](attachment:image.png)

The following screen should appear.

![image.png](attachment:image.png)

Each Power Query step will roughly align with one or two lines of **M code**. We don't have to be an expert in **M code** to be able to read it. We can even experiment with changing it. For instance, if we need to change the name of a database, we could do it right in the code and then select **Done**.

We might notice that **M code** is written top-down. Later steps in the process can refer to previous steps by the variable name to the left of the equal sign. Be careful about reordering these steps because it could break the dependencies between statements. The `in` statement names the step whose value the query returns; generally, the last query step is used as the final dataset result.
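For readers who know Python better than M, the following sketch is a loose analogy of that structure, not M code itself. Each assignment plays the role of one named step, later steps refer to earlier ones by name, and the returned value corresponds to the step named after `in`. The step names, the `contact_query` function, and the sample row are all hypothetical.

```python
def contact_query(source_rows):
    """Mimic a query made of named steps that build on each other."""
    # Step 1: the source data.
    source = source_rows
    # Step 2: refers to the previous step by its name, `source`.
    removed_columns = [{"ID": r["ID"], "name": r["name"]} for r in source]
    # Step 3: refers to `removed_columns`; moving it above step 2 would fail.
    renamed_columns = [
        {"ID": r["ID"], "contact": r["name"]} for r in removed_columns
    ]
    # Like the expression after `in`, the last step is the query result.
    return renamed_columns


# Hypothetical input row.
rows = [{"ID": 1, "name": "Ana Diaz", "city": "Berlin"}]
result = contact_query(rows)
print(result)
```

In the analogy, moving step 3 above step 2 would raise an error because `removed_columns` would not exist yet, which mirrors the warning about reordering steps.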

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In this lesson, we learned how we can profile data to understand more about a specific column before using it. We also learned to apply data shape transformations to table structures.

Additionally, we learned how to combine queries so that they were fewer in number, which streamlines data navigation.