# Importing Data in Power BI

## Motivation

Power BI has a range of powerful tools for data ingestion. Learning about them in this lesson will allow you to do the following:

- Efficiently import multiple data types into Power BI
- Choose the correct import mode for your project, so that the data are updated in the correct way
- Combine data from spreadsheets, relational databases and other sources
- Understand the additional steps required to connect an AWS database to Power BI

> Power BI supports a wide range of data sources, including relational databases, Excel files, flat files (e.g., CSV, JSON, XML), cloud-based services (e.g., Azure, Salesforce, Google Analytics), and web-based APIs. Each has its own dedicated data connector, available via the `Get Data` dropdown on the `Home` tab.


<p align="center">
    <img src="images/get_data.png"  width="400"/>
</p>
<br>


## General Process for Importing Data


> Power BI offers a very broad selection of data connectivity options, allowing you to connect to a huge range of data sources. This includes traditional relational databases like SQL Server, MySQL, and PostgreSQL, as well as cloud databases like Azure SQL Database and AWS RDS. Power BI can also connect to NoSQL databases like MongoDB, and big data platforms like Azure Data Lake Storage, Databricks and Apache Hadoop. Furthermore, connections can be made directly to online services like Google Analytics, GitHub, LinkedIn and Mixpanel. And of course Power BI can also import data directly from flat files such as CSV, XML, and JSON, or from Excel spreadsheets.

The general procedure is as follows:

- Click the `Get Data` dropdown menu in the `Home` tab of the ribbon at the top of the screen
- Choose the appropriate connector. See here for a list of [possible connector types](https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-data-sources)
- Select which data you want to import from the subsequent window
- Choose either `Load` or `Transform Data`. The latter will take you to the Power Query Editor, where the data can be cleaned and reshaped before loading


## Importing Flat Files

> *Flat files* store data in a plain text format with a simple structure, typically with rows representing records and columns representing fields. In contrast to relational databases such as SQL Server or PostgreSQL, they do not contain relationships between records. Examples include CSV files; Excel `.xlsx` workbooks are not strictly plain text, but are usually treated alongside flat files because they are imported in the same way.

There are several options for importing flat files to Power BI, in terms of the location where the files are stored.

<p align="center">
    <img src="images/files.png"  width="700"/>
</p>
<br>


### Local 

Data are loaded into Power BI from a local file on your computer. The file isn't transferred to Power BI, nor does a connection persist to the original file. Rather, a fresh dataset is generated in Power BI, populated with the data from the local file. Consequently, any modifications to the original file will not be mirrored in the Power BI dataset unless the file is re-imported. Local data import is suitable for static data that remains unchanged.
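
The snapshot behaviour described above can be illustrated outside Power BI with a minimal Python sketch. It is an analogy, not Power BI's actual mechanism: the file name `sales.csv`, its columns, and the helper `import_flat_file` are all hypothetical. It also shows the flat-file structure, with each row read as one record and each column as one field.

```python
import csv
import tempfile
from pathlib import Path


def import_flat_file(path):
    """Read a CSV file into a list of dicts: one dict per row (record),
    one key per column (field). The result is an independent in-memory copy."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical source file, created here only so the sketch is runnable.
workdir = Path(tempfile.mkdtemp())
source = workdir / "sales.csv"
source.write_text("order_id,region,amount\n1,North,120\n2,South,80\n", encoding="utf-8")

dataset = import_flat_file(source)      # initial import: a snapshot of the file

# The source file changes after the import...
with open(source, "a", encoding="utf-8") as handle:
    handle.write("3,East,45\n")

rows_before_refresh = len(dataset)      # the snapshot still has the old rows
dataset = import_flat_file(source)      # a manual "refresh" re-imports the file
rows_after_refresh = len(dataset)       # the new row is now included

print(rows_before_refresh, rows_after_refresh)
```

The in-memory copy only changes when the file is read again, which mirrors why a locally imported dataset has to be refreshed by hand.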

### Web

This is a similar solution to loading from a local drive, except that the file is fetched via a URL. It does not support real-time updating. The same is true for data hosted on Google Drive. In the latter case it is not possible to link directly to the file on Google Drive, but a workaround is to publish the Google Sheet in question to the web, and then connect via the sharing URL.

In Google Sheets, click on `File` -> `Publish to the web` -> `Entire Document` -> `CSV`, then click on `Publish`.

For either a standard URL or a published Google Sheets document, go to `Get Data` -> `Web` and enter the URL you want to import data from.

### OneDrive for Business

This approach is useful for maintaining consistency between a flat file and your Power BI dataset, reports, and dashboards. Power BI checks your file on OneDrive for changes at regular intervals (about once an hour). If any updates are detected, your dataset, reports, and dashboards in Power BI are refreshed automatically.

### OneDrive (Personal)

You can use data from files on a personal OneDrive account, and get many of the same benefits that you would with OneDrive for Business. However, you'll need to sign in with your personal OneDrive account, and select the `Keep me signed in` option. Check with your system administrator to determine whether this type of connection is allowed in your organization.

### SharePoint - Team Sites 

Storing your Power BI Desktop files on SharePoint Team Sites resembles the process of saving to OneDrive for Business. The main difference is in the method of connecting to the file from Power BI. You have the option to designate a URL or establish a connection to the root folder.





## Refreshing Data Imported from Local or URL-Hosted Flat Files

Data imported from a local or URL-hosted flat file will not automatically refresh to reflect changes to the source file. To bring the Power BI copy up to date, you must refresh it manually by clicking the `Refresh` button on the `Home` tab in Power BI Desktop. This reloads the data from the flat file into your Power BI report.

<p align="center">
    <img src="images/refresh.png"  width="100"/>
</p>
<br>


## Importing Data from Relational Sources

> Power BI makes it relatively easy to connect to SQL databases, either locally hosted or hosted on Microsoft Azure. It is also possible to connect to databases hosted on AWS RDS but, per the now familiar pattern for anything outside of the Microsoft ecosystem, it requires more steps!

First let's look at how to connect to a locally hosted database, or one hosted on Azure. RDS will then be covered at the end of this section.

1. Open Power BI Desktop and select `Get Data` on the `Home` ribbon
2. Select `SQL Server` under the `Database` category
3. In the SQL Server window, provide the server name and database name. If it's a local database, use the format `server\instance` for the server name. For Azure, it'll be the fully qualified domain name.
4. Choose the appropriate authentication method. If the database requires a username and password, select `Database` authentication and enter these credentials. For Azure, you can use the `Microsoft Account` or `Azure Active Directory` options.
5. Click `Connect`

After the connection is established, you can start importing or directly querying data from the database.



## Authenticating Access to the Database

There are three sign-in options for authenticating access to a SQL database with Power BI:

- Database: use your database credentials, for example your PostgreSQL username and password
- Windows: use your Windows account (domain credentials)
- Microsoft account: use your Microsoft account credentials. This option is often used for Azure services


<p align="center">
    <img src="images/three_connection_modes.png"  width="700"/>
</p>
<br>

## Import Modes for Relational Databases

> There are two main connectivity modes for relational database connections in Power BI: Import and DirectQuery. Import is selected by default. Their characteristics and relative merits are detailed below, along with a third option, Dual mode.

#### Import Mode

> In this mode, a snapshot of the data is imported into Power BI. This data can be refreshed on a schedule, or manually updated, but it isn't real-time. 

- Allows you to import all the data from the data source into the Power BI model
- All queries you then run against the data are run locally
- Enables quick data exploration and provides high performance
- The volume of data you can import is limited by memory restrictions, which depend on your licence plan
- Data isn't real-time, so not suitable for rapidly changing data

#### DirectQuery

> This mode doesn't import or copy the data into Power BI. Instead, it directly runs the queries against the data source whenever you interact with the visual. This allows real-time updates.

- Doesn't require you to import data into Power BI, so low memory usage
- Allows real-time data visualization as it queries the data source directly for each interaction
- Good for large datasets which exceed Power BI's memory limits
- Less performant, as queries must be run anew for each interaction with the visualisation
- Some Power BI features, such as certain types of calculated columns or measures, are limited or unavailable because the data is not stored within Power BI


<p align="center">
    <img src="images/import_vs_directquery.png"  width="400"/>
</p>
<br>
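
The trade-off between the two modes can be sketched in Python, using an in-memory SQLite database as a stand-in for the source. This is an analogy only, not how Power BI is implemented: the `orders` table, its values, and the `ImportModel` and `DirectQueryModel` classes are hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in for the source database (hypothetical table and values).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (20.0,)])
source.commit()


class ImportModel:
    """Import-style: copy the rows once, then answer queries from memory."""

    def __init__(self, conn):
        self._conn = conn
        self.refresh()

    def refresh(self):
        # Take a fresh snapshot of the source table.
        self._rows = self._conn.execute("SELECT amount FROM orders").fetchall()

    def total(self):
        # Computed locally; the source database is not contacted.
        return sum(amount for (amount,) in self._rows)


class DirectQueryModel:
    """DirectQuery-style: keep no copy, query the source on every request."""

    def __init__(self, conn):
        self._conn = conn

    def total(self):
        query = "SELECT COALESCE(SUM(amount), 0) FROM orders"
        return self._conn.execute(query).fetchone()[0]


imported = ImportModel(source)
direct = DirectQueryModel(source)

# A new order arrives in the source after the import.
source.execute("INSERT INTO orders (amount) VALUES (?)", (5.0,))
source.commit()

stale_total = imported.total()      # snapshot: does not include the new order
live_total = direct.total()         # live query: includes the new order
imported.refresh()
refreshed_total = imported.total()  # snapshot catches up after a refresh

print(stale_total, live_total, refreshed_total)
```

The import-style model answers from memory but goes stale until refreshed, while the direct-style model is always current at the cost of a round trip to the source for every request.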

#### Dual Mode

> In Dual mode, you can identify some data to be directly imported and other data that must be queried. A table set to Dual can behave as either an imported table or a DirectQuery table, depending on the query being run, which allows Power BI to choose the most efficient form of data retrieval. Storage mode is set per table, so a single model can combine Import, DirectQuery and Dual tables.




## Connecting to AWS RDS



This is similar to the general instructions for connecting to a relational database, in that you should use the data connector appropriate to the type of database you are hosting on AWS. However, there are additional steps to authenticate the connection. The options are as follows:


<p align="center">
    <img src="images/SQL_db.png"  width="700"/>
</p>
<br>

### 1. Convert Certificate

If you require a secure connection to your database, use the following approach:

1. Download the RDS certificate `.pem` file and convert it to a **PKCS#7/P7B** certificate. It can be converted online using this [SSL converter](https://www.sslshopper.com/ssl-converter.html)
2. Import the certificate into the Trusted Root Certification Authorities store. The steps to import the certificate are available [here](https://www.sslsupportdesk.com/how-to-enable-or-disable-all-puposes-of-root-certificates-in-mmc/)
3. Open Power BI Desktop. Click on `Get Data` -> `Database` -> `PostgreSQL database`. Enter the connection details (`Database` is `postgres`, `Server` is the RDS endpoint, `Port` is the database port, plus the username and password of the AWS RDS database) and try to log in.


### 2. Turn off Secure Connection

Alternatively, you can bypass the above steps by unchecking the `Encrypt connection` option in the SQL Server database connection settings. Please note, however, that this is not recommended practice. While it might conceivably be acceptable for a hobby project, it would not be acceptable for sensitive data in the workplace.

<p align="center">
    <img src="images/encryption_settings3.png"  width="700"/>
</p>
<br>


## Selecting Data to Import

Once you have successfully connected to the relational database, you have the option to import data either via the UI or using an SQL query. The latter option is useful if you do not wish to import the entirety of a given table. For example, if you only want to focus on sales in a particular year, there is no need to import the data for any of the other years.

If you choose to use an SQL query, it can be written in the `SQL statement` box under `Advanced options`:

<p align="center">
    <img src="images/via_query.png"  width="700"/>
</p>
<br>
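
As a rough illustration of the sales-by-year case, the Python sketch below runs a filtered query against an in-memory SQLite table. The `sales` table, its columns, and its rows are hypothetical, and the exact SQL dialect depends on the database you are connecting to. The text held in `sql_statement` is the kind of query that would go in the SQL statement box.

```python
import sqlite3

# Hypothetical source table, built in memory so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2022-11-03", 120.0),
        (2, "2023-01-15", 80.0),
        (3, "2023-06-30", 45.5),
        (4, "2024-02-01", 60.0),
    ],
)

# Select only the columns and rows needed, here orders placed in 2023.
# Dates are stored as ISO-8601 text, so string comparison matches date order.
sql_statement = """
SELECT order_id, order_date, amount
FROM sales
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'
ORDER BY order_id
"""

rows_2023 = conn.execute(sql_statement).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(f"Imported {len(rows_2023)} of {total_rows} rows")
```

Filtering at the source in this way keeps the imported dataset small, which matters most in Import mode, where the data is held in memory.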




## Key Takeaways
- Power BI supports diverse data sources such as relational databases, flat files, cloud services, and web APIs, each accessible via a dedicated data connector from the `Get Data` dropdown
- Flat files, such as CSV and Excel files, can be imported into Power BI from various locations including a local machine, a web URL, OneDrive for Business, OneDrive Personal, and SharePoint Team Sites
- Changes to data imported from local or URL-hosted flat files are not picked up automatically and require a manual refresh using the `Refresh` button on the `Home` tab
- Power BI readily connects to SQL databases hosted locally or on Azure platforms
- Connecting to databases on AWS RDS is feasible but requires additional steps
- Power BI offers two main connectivity modes for relational databases: Import and DirectQuery
- Import Mode is best for quick data exploration and smaller datasets due to memory restrictions
- DirectQuery Mode suits larger datasets and situations where real-time updates are crucial, but is less performant
