<img src="./images/logo.png" alt="Drawing" style="width: 500px;"/>

# **Exercise 3:** Visualizing Data with Superset

In the previous two exercises, we generated a large amount of synthetic sales data and created a smaller dataset using the Query Editor and EzPresto. In this exercise, we will visualize insights within this smaller dataset with **Apache Superset**. Superset allows users to create interactive dashboards and visualizations from various data sources, making it a valuable tool for data analysis and exploration.

In this exercise, you will:

- Explore how to create a dataset from a cached data asset, ensuring efficient data retrieval and processing.
- Learn to generate several types of charts, including bar charts, line charts, and scatter plots, to visualize different aspects of your data.
- Combine these individual charts into a comprehensive dashboard, providing a holistic view of your data and facilitating insights and decision-making.

By the end of this exercise, you will have the skills to leverage Superset effectively for data visualization, enabling you to communicate complex insights and trends with clarity and impact.

Let's dive in and start visualizing data with Superset!

## **1. Connecting Superset to EzPresto**

First, you will need to make the data assets available to Superset. As demonstrated throughout Exercise 2, **HPE Ezmeral Unified Analytics** leverages **EzPresto** to connect the Data Sources and Cached Assets to hosted applications. 

To create an EzPresto connection in Superset:


1. Navigate back to the Unified Analytics dashboard.
1. In the sidebar navigation menu, select `Tools & Applications` > `Data Engineering`.
1. Under `Superset`, click `Open`. Superset will now open in a new tab.
1. In Superset, click the `+` button in the top-right corner.
1. Under `Data`, click `Connect database`.

<img src="./images/exercise3/DBConnections.png" alt="Drawing" style="width: 40%;"/>

6. Click `Presto`.
7. In the `SQLAlchemy URI` field, enter the following:

    `presto://ezpresto.<YOUR_DOMAIN>:443/cache`

    Replace `<YOUR_DOMAIN>` with the domain name of your Unified Analytics cluster. To find it, look at the URL of your Unified Analytics dashboard, which has the format `https://home.<YOUR_DOMAIN>`.

<img src="./images/exercise3/presto.png" alt="Drawing" style="width: 30%;"/>

8. Click `Test Connection`. If the connection is successful, a message reading 'Connection looks good!' appears in the bottom-right corner.
9. Click `Connect`. 
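If you want to double-check the URI before pasting it, the rule from step 7 (take the domain from the `https://home.<YOUR_DOMAIN>` dashboard URL and prepend `ezpresto.`) can be sketched in a few lines of Python. This is a minimal sketch under that assumption only: the function name `ezpresto_uri` and the `example.com` domain are hypothetical, and the code just builds the string; it does not contact the cluster or replace `Test Connection`.

```python
from urllib.parse import urlparse


def ezpresto_uri(dashboard_url: str) -> str:
    """Build the EzPresto SQLAlchemy URI from the Unified Analytics
    dashboard URL, assumed to look like https://home.<YOUR_DOMAIN>."""
    host = urlparse(dashboard_url).hostname
    if host is None or not host.startswith("home."):
        raise ValueError(f"expected https://home.<YOUR_DOMAIN>, got {dashboard_url!r}")
    domain = host[len("home."):]  # strip the "home." prefix to get <YOUR_DOMAIN>
    return f"presto://ezpresto.{domain}:443/cache"


# Hypothetical domain, for illustration only
print(ezpresto_uri("https://home.example.com"))
```

Any path after the host (for example, a dashboard page) is ignored because only the hostname is used.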

<div class="alert alert-block alert-danger">
<b>Important:</b> Make sure you have successfully completed Exercise 2 and have the name of your Cached Asset at hand.
</div>

## **2. Connecting Cached Assets to Superset**

In the first two Exercises, you synthetically generated sales data, ran an SQL query and saved the resulting dataset as a Cached Asset. Regardless of the data source or query, you can now shape data into datasets for specific use cases and make those datasets available (as Cached Assets) to applications on Unified Analytics via EzPresto. One such use is visualizing data through charts, graphs and maps in Superset.

To visualize the sales dataset, which you saved as a Cached Asset, we must first add the Cached Asset as a **Dataset** in Superset.

1. On the Superset home screen, click the `+` button in the top-right corner once again.
1. Under `Data`, click `Create dataset`.
1. Under `Database`, select `Presto`.
1. Under `Schema`, select `retailschema`.
1. Under `Table`, select the table for your Cached Asset (`retail` in this exercise) and check that the schema is correct. There should be **eleven** table columns.


6. Click `Create Dataset and Create Chart`.
7. You have successfully added the Cached Asset as a Dataset in Superset. 

After clicking `Create Dataset and Create Chart`, you will automatically be presented with the 'Create Chart' menu. To understand how to get back to this window in the future to create more Charts, we'll navigate here a different way. 

Click the **Superset** logo in the top-left corner to return to the home screen.

## **3. Creating Data Charts**

In Superset, a Chart is a visual representation of data created using various types of visualizations such as bar charts, line charts, and scatter plots. Users can easily create Charts by selecting the desired visualization type, specifying the data source and columns, and customizing various aspects of the chart's appearance. 

Let's create some Charts to visualize different aspects of sales data and gather some insights. 



### Creating a new Chart
1. In the Superset home screen's top-left navbar menu, select `Charts`. As your number of data sources, datasets and Charts grows, you can use the filter boxes at the top of the Charts table to filter and sort all of your Charts. You can also import Charts that others have made.
2. Click `+ Chart` in the top-right corner.

<img src="./images/exercise3/filters.png" alt="Drawing" style="width: 80%;"/>

### Selecting a Chart type
1. Under 'Choose a dataset', select `retail`. 

Explore the Chart options by clicking on a few in this window. You can see what types of data are best represented by each type of Chart through the **grey tags** that appear in the bottom-left corner whenever you select a Chart. For our first chart, we are going to represent the **total sales and revenue of each country** using a **Bar chart**. 

<img src="./images/exercise3/charts.png" alt="Drawing" style="width: 60%;"/>

2. Select `Bar Chart` in the Chart options.
3. Click `Create New Chart`.

### Defining Chart Data

Let's first take a quick tour of the Chart creation page:

<img src="./images/exercise3/chartdetails.png" alt="Drawing" style="width: 40%;"/>

**1. Chart Name**: The name that identifies the chart. If you create several similar charts, a common practice is to distinguish each variant with a number suffix (e.g., *German Sales Chart 3*).

**2. Chart Source**: Defines the data used in your chart. In the example, data comes from the table `retailschema.retail` from the `retail` dataset.

**3. Chart Types**: Change the type of Chart you are creating. The most common chart types are provided as easy access icons. Clicking `View all charts` will bring up the full Chart Creation window that you saw earlier.

**4. Search Metrics & Columns**: Search through the dataset to find the specific data to be displayed in your chart. This is useful for very large datasets that can contain tens or even hundreds of columns. 
* **Metrics:** Represent numerical values computed with aggregate functions (e.g., `f(x) SUM(totalsales)` returns the sum of the `totalsales` column for each group, such as each product ID).
* **Columns:** Represent categorical data (e.g., `# productid`).

**5. X-Axis**: Configures the x-axis of your chart.
* **X-Axis:** Defines the independent (*input*) variable. For example, when plotting the amount of sales (*output*) over a time period (*input*), the time period goes on the x-axis; when plotting the total number (*output*) of each item (*input*) sold at a given location (*input*), the item goes on the x-axis.
* **X-Axis Sort By:** Defines how to sort data on the x-axis (e.g., sorting by `# productid` in ascending order).
    
**6. Metrics**: Numerical values you want to measure or analyze. 
* These are typically displayed on the y-axis of charts and represent quantities you want to visualize (e.g., total sales, average order value, customer count).

**7. Dimensions**: Categorical attributes that define groups or categories within your data. 
* They are often used on the x-axis of charts to segment the data (e.g., product ID, customer name, region).
* You can view Dimension data by hovering over bars or segments on graphs, where the axes may not specifically display that variable.

**8. Filters**: Refine the data used in your chart by applying filters in the form of columns or metrics. 
* This helps you analyze trends or patterns within a particular segment. For example, you might filter a chart showing total sales by region to focus only on sales in "Germany".

**9. Series Limit**: Sets a limit on the number of categories or groups (series) displayed in your chart.
* When your chart displays data with many series, it can become cluttered. Applying a Series Limit helps you control the number of series displayed.

**10. Sort By**: Define the order in which data is displayed within your chart.
* Sorting by metrics can highlight categories with the highest or lowest values, while sorting by dimensions can group related data together.
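As a rough mental model, these controls compose in a fixed order: filter the rows, group them by the dimension, aggregate the metric (such as `SUM(totalsales)`), keep only the top N series, then sort what remains for display. The Python sketch below illustrates that order on hypothetical rows. It is a conceptual model under stated assumptions, not Superset's implementation (Superset turns these settings into a SQL query that EzPresto runs); the `country` and `product` field names, the values and the `chart_series` function are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: "totalsales" matches the metric column used in this
# exercise; "country" and "product" are assumed field names.
ROWS = [
    {"country": "Germany", "product": "apple", "totalsales": 30},
    {"country": "Germany", "product": "banana", "totalsales": 50},
    {"country": "Germany", "product": "cherry", "totalsales": 10},
    {"country": "Germany", "product": "apple", "totalsales": 25},
    {"country": "Switzerland", "product": "banana", "totalsales": 99},
]


def chart_series(rows, dimension, metric_field, filters, series_limit, ascending=True):
    """Conceptual model of the chart controls (not Superset's actual code)."""
    # Filters: keep only rows where every column equals the given value
    kept = [r for r in rows if all(r[col] == val for col, val in filters.items())]
    # Dimension + Metric: SUM(metric_field) for each value of the dimension
    totals = defaultdict(int)
    for r in kept:
        totals[r[dimension]] += r[metric_field]
    # Series Limit: keep the N series with the largest metric values
    top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:series_limit]
    # Sort By: order the remaining series by the metric for display
    return sorted(top, key=lambda kv: kv[1], reverse=not ascending)


print(chart_series(ROWS, "product", "totalsales", {"country": "Germany"}, series_limit=2))
# [('banana', 50), ('apple', 55)]
```

Limiting before sorting is what makes "top N in ascending order" possible: the limit picks the largest series, and the sort only changes the order in which they are displayed.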

For our first chart, we will make a bar chart to visualize the **total sales** of the **top 10** selling **fruits** of each year in ascending order of total sales. 

Update your fields (including the Chart Name, `Fruit Sales YoY`) to match the following, then click `Update Chart`. The resulting Chart should look like this:

<img src="./images/exercise3/firstchart.png" alt="Drawing" style="width: 80%;"/>



Now, click `Save` in the top-right corner.

In the pop-up window that appears, you will notice the `Add to Dashboard` field. Once you have created your first Dashboard, this field is by far the easiest way to add Charts built for a specific use case or presentation directly to a particular Dashboard. Because we don't have any Dashboards yet, the `Save and Go to Dashboard` option is greyed out.

Finally, click `Save`. With that, you've just created your first Superset chart!


## **4. Creating a Superset Dashboard**

A Superset Dashboard is a collection of charts and visualizations that provide a comprehensive overview of data insights. These dashboards can be used for monitoring key metrics, analyzing trends, and making data-driven decisions within organizations.

1. In the Superset home screen's top-left navbar menu, select `Dashboards`. As your number of Dashboards grows, you can use the filter boxes at the top of the Dashboards table to filter and sort them. You can also import Dashboards that others have made and add your own Charts to them.
2. Click `+ Dashboard` in the top-right corner.

<img src="./images/exercise3/dashboardfilters.png" alt="Drawing" style="width: 90%;"/>

3. Name your Dashboard `Retail Dashboard` with the field in the top-left corner (replace the default `[untitled dashboard]`).

The `Charts` tab in the right-hand menu lists all of your Charts. You can filter them by type, keyword and schema, and sort them in several ways.

In the `Layout Elements` tab, you can add design elements such as headers, rows and columns to label your Charts and arrange them in whatever layout works best.

4. Click on the `Layout Elements` tab. **Drag** a `Row` into the empty Dashboard field. 
5. Drag a `Header` above the Row and write *Fruit Sales*. 
6. Drag a `Divider` below the Row.  
7. Now, click on the `Charts` layout tab. Drag the `Fruit Sales YoY` Chart into the Row. 
8. Click on the **right edge** of the `Fruit Sales YoY` Chart. You will see several blue columns appear across the Dashboard. These are equal dividers to assist with spacing between Charts. 
9. Drag that right edge to the furthest-right column. It should snap into place.
10. Click `Save` in the top-right corner. 

Your Dashboard should resemble the following: 

<img src="./images/exercise3/dashboard.png" alt="Drawing" style="width: 90%;"/>

11. Click on the `Dashboards` option in the top-left navbar menu. 
12. Confirm your newly-created Dashboard is visible in the Dashboards list. 

And with that, you've created your first Chart and Dashboard in Apache Superset! 

## **5. Self-Learning Challenge: Add more Charts to your Retail Dashboard!**

Leveraging the skills you have just acquired, let's explore the art of data analysis with Superset! 

**Create three Charts from the list below**.  

Add each to your Retail Dashboard, styled as you wish!

**Sales Performance**

* **Bar Chart:** Visualize total sales by product category or country. This can reveal which categories or regions are driving the most sales.
* **Line Chart:** Create a line graph showing the quantity of hammers sold between 2020 and 2023. Identify supplier bottlenecks or growth trends.

**Product Analysis**

* **Pie Chart:** Show the proportion of total sales for each country (Germany, Switzerland and the Czech Republic).
* **Scatter Plot:** Explore the relationship between product price and total sales. This might reveal pricing trends or identify potential outliers.

**Customer Insights**

* **Histogram:** Analyze the distribution of customer order values by country. This can help you understand typical order sizes and identify high-spending customers.
* **GeoJSON Map:** Plot the total sales of each country on a map. 

**Comparative Analysis**

* **Dual Line Chart:** Compare sales trends between different product types. This can highlight performance differences between product categories.
* **Stacked Bar Chart:** Visualize total sales by product type and further break down each category by another variable, such as country or stores in a country.
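Two of these challenge charts rest on simple arithmetic that is worth understanding before you configure them: each Pie Chart slice is a category's share of the grand total, and a Histogram groups values into bins and counts how many fall into each. The Python sketch below computes both on made-up numbers, assuming equal-width bins for the histogram. The figures, the bin width and the function names are hypothetical; Superset performs these calculations for you.

```python
def pie_shares(totals):
    """Return each category's share of the grand total, as a percentage."""
    grand_total = sum(totals.values())
    return {name: 100 * value / grand_total for name, value in totals.items()}


def histogram(values, bin_width):
    """Count values per equal-width bin, keyed by each bin's lower edge.
    Empty bins are omitted."""
    counts = {}
    for v in values:
        lower = (v // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical sales totals per country and hypothetical order values
print(pie_shares({"Germany": 500, "Switzerland": 300, "Czech Republic": 200}))
# {'Germany': 50.0, 'Switzerland': 30.0, 'Czech Republic': 20.0}
print(histogram([5, 12, 18, 23, 27, 29, 41], bin_width=10))
# {0: 1, 10: 2, 20: 3, 40: 1}
```

The bin width changes the story a histogram tells: wider bins smooth out detail, while narrower bins make outliers such as unusually large orders easier to spot.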

**Additional Tips:**

* Consider using filters to focus on specific product categories, regions, or timeframes.
* Leverage Sorting and Series Limit options to enhance readability and focus on crucial insights.
* Experiment with different chart types to find the best way to represent your data.

By using Apache Superset's features and exploring various chart options, you can create compelling data visualizations that reveal valuable insights from your retail schema.


# **Conclusion**

In this exercise, you entered the world of data analytics - working with data to tell a visual story. You learned how to navigate Apache Superset, create and manage Charts, and make custom Dashboards fit for a C-suite presentation.

In the next exercise, we will take a slightly different route and delve into using machine learning to create a produce detection model that will form the basis of our final end product - a cashierless, barcodeless checkout experience! 