# Clean Data

**Clean data has the following advantages:**

1. Measures and columns produce more accurate results  
1. Tables are organized so that users can find the data in an intuitive manner  
1. Duplicates are removed  
1. Columns can be used in slicers and filters.  
1. A complicated column can be split into two, simpler columns.  
1. Multiple columns can be combined into one column for readability.  
1. Codes and integers can be replaced with human-readable values.

# Shape the Initial Data

Power Query Editor in Power BI Desktop allows you to shape (transform) your imported data, for example by:
1. Renaming columns or tables
1. Changing text to numbers
1. Removing rows
1. Setting the first row as headers

## Get started with Power Query Editor  
**Shaping your data**
1. Open Power Query Editor by selecting the Transform data option on the Home tab of Power BI Desktop.
1. The selected query displays in the middle of the screen.
1. On the left side, the Queries pane lists the available queries (tables).
1. __[Power Query Ribbon](https://docs.microsoft.com/en-us/power-query/power-query-quickstart-using-power-bi#the-query-ribbon)__

## Identify column headers and names 
The first step in shaping your initial data is to identify the column headers and names within the data and then evaluate where they are located to ensure that they are in the right place.

## Promote headers
A data source might contain its column names in the first row of data instead of in the headers. Promote that row to headers in one of two ways: 
1. Select the Use First Row as Headers option on the Home tab
2. Select the drop-down button next to Column1 and then select Use First Row as Headers
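
As a rough illustration of what promoting headers does to the data, here is a minimal Python sketch. It is not Power Query code; the `promote_headers` function and the sample rows are hypothetical and only mirror the idea of turning the first row into column names.

```python
def promote_headers(rows):
    """Use the first row as column names and return the remaining rows as dicts."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Raw import where the column names arrived as the first data row.
raw = [
    ["Month", "2018", "2019"],
    ["January", 12, 13],
    ["February", 14, 12],
]

table = promote_headers(raw)
for row in table:
    print(row)
```

After promotion, the first row no longer counts as data, so the table has one row fewer than the raw import.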

## Rename columns 
The next step in shaping your data is to examine the column headers. Correct any inconsistencies or errors. Rename column headers in two ways:
1. Right-click the header, select Rename, edit the name, and then press Enter
1. Double-click the column header and overwrite the name with the correct name. 

## Remove top rows 
To remove top rows:
1. Select Remove Rows > Remove Top Rows on the Home tab. 

## Remove columns 
Remove columns in two ways:
1. Select the columns to remove and then, on the Home tab, select Remove Columns. 
1. Select the columns to keep and then, on the Home tab, select Remove Columns > Remove Other Columns.

## Unpivot columns 

Unpivoting is a relational operation that takes a list of columns and generates a row for each column in the list, storing the column name and its value in two new columns (Attribute and Value).
In short, it turns columns into rows.

|Month|2018|2019|
|:-|:-|:-|
|January|12|13|
|February|14|12|

When the 2018 and 2019 columns are unpivoted, this becomes:

|Month|Attribute|Value|
|:-|:-|:-|
|January|2018|12|
|January|2019|13|
|February|2018|14|
|February|2019|12|
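
The same transformation can be sketched in plain Python to make the row-per-column logic explicit. This is an illustration of the idea only, not how Power Query implements it; the `unpivot` function and its parameter names are hypothetical.

```python
def unpivot(rows, id_columns, value_columns):
    """Turn each listed value column into its own row (Attribute/Value pair)."""
    result = []
    for row in rows:
        for column in value_columns:
            # Carry over the identifying columns unchanged.
            record = {key: row[key] for key in id_columns}
            record["Attribute"] = column   # former column name
            record["Value"] = row[column]  # former cell value
            result.append(record)
    return result


wide_table = [
    {"Month": "January", "2018": 12, "2019": 13},
    {"Month": "February", "2018": 14, "2019": 12},
]

long_table = unpivot(wide_table, id_columns=["Month"], value_columns=["2018", "2019"])
for record in long_table:
    print(record)
```

Two input rows with two unpivoted columns produce four output rows, matching the table above.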

## Pivot columns 
When flat data has a lot of detail but is not organized or grouped in any way, use the Pivot Column feature to convert it into a table that contains an aggregate value for each unique value in a column.
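
To show what "an aggregate value for each unique value in a column" means, here is a small Python sketch. It is a conceptual illustration, not Power Query code: the `pivot` function, the `Sales` column, and the sample rows are assumptions, and the aggregate defaults to a sum.

```python
from collections import defaultdict


def pivot(rows, index, column, value, aggregate=sum):
    """Create one output column per unique value of `column`, aggregating `value`."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        groups[row[index]][row[column]].append(row[value])
    return [
        {index: key, **{name: aggregate(values) for name, values in columns.items()}}
        for key, columns in groups.items()
    ]


# Flat, detailed data: January 2018 appears twice and must be aggregated.
flat = [
    {"Month": "January", "Year": "2018", "Sales": 5},
    {"Month": "January", "Year": "2018", "Sales": 7},
    {"Month": "January", "Year": "2019", "Sales": 13},
    {"Month": "February", "Year": "2018", "Sales": 14},
    {"Month": "February", "Year": "2019", "Sales": 12},
]

pivoted = pivot(flat, index="Month", column="Year", value="Sales")
for row in pivoted:
    print(row)
```

Passing a different function as `aggregate` (for example `max` or `len`) changes how repeated detail rows are summarized.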

# Simplify the Data Structure
When you import data from multiple sources into Power BI Desktop, the data retains its predefined table and column names. You might want to change some of these names so that they are in a consistent format, easier to work with, and more meaningful to a user.

## Rename a query 
To rename a query:
1. In Power Query Editor, in the Queries pane to the left of your data, select the query that you want to rename. 
1. Right-click the query and select Rename. 
1. Edit the current name or type a new name.
1. Press Enter.

## Replace values
To replace any value with another value in a selected column:
1. Select the column that contains the value that you want to replace 
1. Select Replace Values on the Transform tab. 
1. In the Value to Find box, enter the name of the value that you want to replace
1. In the Replace With box, enter the correct value name
1. Select OK.

## Replace null values
Follow the same steps as in Replace values, entering `null` as the value to find.
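
The effect of replacing a value, including a null, can be sketched in Python as follows. This is an illustration rather than Power Query code: `replace_value` and the sample data are hypothetical, Python's `None` stands in for null, and the sketch only matches whole cell values.

```python
def replace_value(rows, column, old, new):
    """Return a copy of rows where cells in `column` equal to `old` become `new`."""
    return [
        {**row, column: new} if row[column] == old else dict(row)
        for row in rows
    ]


orders = [
    {"OrderID": 1, "Region": "N. America"},
    {"OrderID": 2, "Region": None},  # None plays the role of null
    {"OrderID": 3, "Region": "Europe"},
]

cleaned = replace_value(orders, "Region", "N. America", "North America")
cleaned = replace_value(cleaned, "Region", None, "Unknown")
for row in cleaned:
    print(row)
```

The function returns new rows and leaves the input untouched, similar in spirit to how each transformation is recorded as a separate step on top of the source data.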

## Remove duplicates 
To remove duplicates from a column so that only unique values remain:
1. Right-click on the header of the column
1. Select the Remove Duplicates option. 
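
Conceptually, removing duplicates on a column drops every row whose value in that column has already been seen. The Python sketch below illustrates this; it is not Power Query code, the `remove_duplicates` function and sample data are hypothetical, and it assumes the first occurrence of each value is the one kept.

```python
def remove_duplicates(rows, column):
    """Keep only the first row for each distinct value in `column`."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row[column] not in seen:
            seen.add(row[column])
            unique_rows.append(row)
    return unique_rows


products = [
    {"Product": "Bike", "Price": 500},
    {"Product": "Helmet", "Price": 40},
    {"Product": "Bike", "Price": 520},  # duplicate product name
]

unique_products = remove_duplicates(products, "Product")
for row in unique_products:
    print(row)
```

Note that the whole row is removed, not just the repeated cell, so other columns of the dropped row (here the second price) are lost as well.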

## Naming best practices
Use the language and abbreviations that are commonly used within your organization and that everyone agrees on as common terminology. 

# Evaluate and change column data types
When you import a table from any data source, Power BI Desktop automatically starts scanning the first 1,000 rows (default setting) and tries to detect the type of data in the columns. 

You have a higher chance of getting data type errors when you are dealing with flat files.
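
The following Python sketch illustrates why detecting types from only the first rows can lead to errors later. It is a deliberately simplified assumption about sampling-based detection, not Power BI's actual algorithm; `infer_type`, `conversion_errors`, and the sample column are hypothetical.

```python
from itertools import islice


def infer_type(values, sample_size=1000):
    """Guess a column type by looking only at the first `sample_size` values."""
    sample = list(islice(values, sample_size))
    for caster, name in ((int, "whole number"), (float, "decimal number")):
        try:
            for value in sample:
                caster(value)
            return name
        except ValueError:
            continue  # this type does not fit the sample, try the next one
    return "text"


def conversion_errors(values, caster):
    """Return the values that cannot be converted with `caster`."""
    bad = []
    for value in values:
        try:
            caster(value)
        except ValueError:
            bad.append(value)
    return bad


# A flat-file column: 1,000 clean values followed by a manually typed "n/a".
column = ["42"] * 1000 + ["n/a"]

detected = infer_type(column)
errors = conversion_errors(column, int)
print(detected, errors)
```

The sample looks like whole numbers, so that type is chosen, but the value just past the sample cannot be converted and surfaces as an error only when the full column is processed.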

## Implications of incorrect data types
1. Incorrect data types will prevent you from creating certain calculations, deriving hierarchies, or creating proper relationships with other tables. 
1. For example, an incorrect data type on a date field prevents you from creating a date hierarchy, which would allow you to analyze your data on a yearly, monthly, or weekly basis. 

## Change the column data type
Change the data type of a column in two places: 
1. Power Query Editor
    a. Select the column that has the issue  
    b. Select Data Type on the Transform tab, or select the data type icon next to the column header  
    c. Select the correct data type from the list
1. Power BI Desktop Report view, by using the column tools

Best practice is to change the data type in Power Query Editor, before the data is loaded.

__[Data Types in Power BI](https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-data-types/)__

# Combine multiple tables into a single table
Power Query Editor allows you to append or merge different tables or queries together. Combining tables into a single table is useful in the following circumstances:
1. Too many tables exist
1. Several tables have a similar role
1. A table has only a column or two that can fit into a different table
1. You want to use several columns from different tables in a custom column

You can combine tables in two ways: **merging** and **appending**.

## Append queries
When you append queries, you add rows of data to another table or query.
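
Appending can be pictured as stacking the rows of one table under another. The Python sketch below illustrates this; it is not Power Query code, and `append_queries` and the sample tables are hypothetical. It assumes that a column missing from one of the tables is filled with an empty value (`None` here).

```python
def append_queries(*tables):
    """Stack the rows of all tables; missing columns are filled with None."""
    # Collect every column name in first-seen order.
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


sales_2018 = [{"Month": "January", "Sales": 12}]
sales_2019 = [{"Month": "January", "Sales": 13, "Region": "West"}]

combined = append_queries(sales_2018, sales_2019)
for row in combined:
    print(row)
```

The row count of the result is the sum of the row counts of the inputs, and the column set is the union of their columns.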

## Merge queries
When you merge queries, you are combining the data from multiple tables into one based on a column that is common between the tables. 

Join options include:
1. Left Outer - Displays all rows from the first table and only the matching rows from the second.
1. Full Outer - Displays all rows from both tables.
1. Inner - Displays the matched rows between the two tables.

__[Shape and Combine Data in Power BI](https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-shape-and-combine-data/)__
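
The difference between the three join kinds can be sketched in Python. This is a conceptual illustration, not Power Query code: the `merge` function, its `kind` values, and the sample tables are hypothetical, and the sketch assumes the non-key columns of the two tables have different names.

```python
def other_columns(rows, key):
    """List the non-key column names of a table in first-seen order."""
    names = []
    for row in rows:
        for name in row:
            if name != key and name not in names:
                names.append(name)
    return names


def merge(left, right, key, kind="left_outer"):
    """Join two tables on `key`; kind is left_outer, inner, or full_outer."""
    result, matched_keys = [], set()
    for left_row in left:
        matches = [r for r in right if r[key] == left_row[key]]
        if matches:
            matched_keys.add(left_row[key])
            result.extend({**left_row, **r} for r in matches)
        elif kind in ("left_outer", "full_outer"):
            # Keep the unmatched left row; right-hand columns stay empty.
            result.append({**left_row, **dict.fromkeys(other_columns(right, key))})
    if kind == "full_outer":
        for right_row in right:
            if right_row[key] not in matched_keys:
                result.append({**dict.fromkeys(other_columns(left, key)), **right_row})
    return result


orders = [{"ProductID": 1, "Qty": 2}, {"ProductID": 3, "Qty": 1}]
products = [{"ProductID": 1, "Name": "Bike"}, {"ProductID": 2, "Name": "Helmet"}]

left_outer = merge(orders, products, "ProductID", "left_outer")
inner = merge(orders, products, "ProductID", "inner")
full_outer = merge(orders, products, "ProductID", "full_outer")
print(len(left_outer), len(inner), len(full_outer))
```

With this sample data, the order for product 3 has no match and the Helmet product has no order, so the three join kinds return different numbers of rows.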

# Profile data in Power BI
Profiling data is about studying the nuances of the data: determining anomalies, examining and developing the underlying data structures, and querying data statistics such as row counts, value distributions, minimum and maximum values, averages, and so on.

## Examine data structures

1. View the current data model under the Model tab on Power BI Desktop
1. Edit specific column and table properties by selecting a table or columns
1. Transform the data by using the Transform Data button
1. Manage, create, edit, and delete relationships between different tables by using Manage Relationships

## Find data anomalies and data statistics 
Data anomalies are outliers within your data. Determining what those anomalies are can help you identify what the normal distribution of your data looks like and whether specific data points exist that you need to investigate further. Power Query Editor determines data anomalies by using the **Column Distribution** feature.

To understand data anomalies and statistics:
1. On the View tab, select the Column Distribution, Column Quality, and Column Profile options
1. Column quality and column distribution are shown in the graphs above the columns of data  
    a. Column quality shows you the percentages of data that are valid, in error, and empty  
    b. Column distribution shows you the distribution of the data within the column and the counts of distinct and unique values  
1. Column profile gives you a more in-depth look into the statistics within the columns for the first 1,000 rows of data  
    a. Row count  
    b. Outliers, empty rows and strings, and the minimum and maximum values  
    c. A value distribution graph that shows the counts for each distinct value in that specific column  
    d. For numeric columns: how many zeroes and null values exist, the average value, the standard deviation, and the counts of even and odd values
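
To make the profiling terms concrete, the Python sketch below computes a few of these statistics for one column. It is an illustration, not Power Query code: `profile_column` and the sample data are hypothetical, error values are not modeled, and excluding empty values from the distinct and unique counts is an assumption of this sketch. Here "distinct" counts the different values present, and "unique" counts the values that appear exactly once.

```python
from collections import Counter


def profile_column(values):
    """Summarize a non-empty column: quality percentages, counts, min and max."""
    total = len(values)
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "row_count": total,
        "valid_pct": 100 * len(present) / total,
        "empty_pct": 100 * (total - len(present)) / total,
        "distinct": len(counts),
        "unique": sum(1 for n in counts.values() if n == 1),
        "min": min(present),
        "max": max(present),
        "value_distribution": dict(counts),
    }


ages = [34, 34, 51, None, 27, 27, 27, 62]
stats = profile_column(ages)
for name, value in stats.items():
    print(name, value)
```

In this sample, four different ages occur, but only two of them (51 and 62) occur exactly once, which is why the distinct and unique counts differ.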

# The Elements of Profiling Data in Power BI Summary
1. Loading data in Power BI
1. Interrogating column properties
1. Making further edits to the type and format of data in columns
1. Finding data anomalies
1. Viewing data statistics in Power Query Editor

# Use Advanced Editor to modify M code
If you need to