Creating Dashboards with Cloudera Data Viz

In this workshop you will create a simple interactive real-time dashboard to visualize sensor data that is being stored in Kudu.

The data you will use is the sensor data collected and processed in previous workshops (see Preparation below).

Preparation

This workshop builds upon the content developed in the Edge and NiFi workshops.

To clean your environment and make it ready for the beginning of this lab, please SSH to your cluster host and run the following command:

Note
The command below will undo everything done in the cluster in previous workshops.
/tmp/resources/reset-to-lab.sh dataviz 1

Labs summary

  • Lab 1 - Navigate to Cloudera Data Visualization

  • Lab 2 - Creating a new connection

  • Lab 3 - Exploring the data

  • Lab 4 - Creating a dashboard

  • Lab 5 - Adding a chart

Lab 1 - Navigate to Cloudera Data Visualization

This lab shows you how to navigate to the Cloudera Data Visualization (DataViz) page.

If you are in a guided workshop you may already have been given the link to the DataViz page. If that’s the case, feel free to skip to the next lab.

  1. Open CDP Data Visualization and log in

    CDP Data Visualization can be accessed through the Cloudera Data Science Workbench (CDSW). Follow the navigation steps below if you don’t know how to get there:

    1. In Cloudera Manager, click on Clusters > Cloudera Data Science Workbench.

    2. On the CDSW page, click on the CDSW Web UI link.

    3. Log on to CDSW.

    4. On the CDSW page, click on Applications and then on the "Viz Server Application", which has been previously set up for the workshop.

      opening dataviz
    5. Log in to the Cloudera Data Visualization application. After logging in you should see the application home page:

      dataviz home page

Lab 2 - Creating a new connection

Kudu is purely a storage engine and does not provide a SQL interface for querying. SQL access to Kudu is done through an Impala engine, which is what you will use in this workshop. You will set up a new connection to the Impala engine to use for your dashboard queries.

  1. Select the Data tab and click on NEW CONNECTION.

    new data connection
  2. At the top of the form, set the following properties:

    Connection type: Impala
    Connection name: Local Impala
  3. In the Basic tab set the following:

    Hostname: <CLUSTER_HOSTNAME> (something like: cdp.x.x.x.x.nip.io)
    Port #:   21050
    Username: [leave blank]
    Password: [leave blank]
    new connection basic
  4. In the Advanced tab set the following:

    Connection mode:     Binary
    Socket type:         Normal
    Authentication mode: NoSasl
  5. Click on TEST to test the connection.

    You should see "Connection Verified", as shown below.

    new connection advanced
  6. Click on CONNECT.
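
If the TEST step fails, it can help to first confirm that the Impala port is reachable from your machine. The snippet below is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and not part of DataViz or Impala. It only checks that a TCP connection can be opened on port 21050, not that Impala itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example usage (replace the placeholder with your own <CLUSTER_HOSTNAME>):
#   port_is_open("<CLUSTER_HOSTNAME>", 21050)
```

A result of False usually points to a wrong hostname, a firewall rule or a stopped Impala daemon rather than to a problem in the connection form.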

Lab 3 - Exploring the data

Cloudera Data Visualization provides a Data Explorer tool that enables you to explore, transform and create views of the data to suit your needs. In this lab you will look at the data available in Kudu and prepare it for your dashboard.

  1. Select the newly created Local Impala connection, which you can see on the left-hand pane.

  2. Select the Connection Explorer tab, then the default database and finally the sensors table. A preview with sample data will be loaded.

    connection explorer table

    You can see in the data sample that the sensor_ts column contains the timestamp as a number of microseconds. For your dashboard you need to convert these numeric values into a TIMESTAMP. In the next steps you will create a new dataset and make the necessary data adjustments.

  3. Click on the New dataset option beside the sensors table. Name the dataset "sensor data".

    add dataset

    A new dataset will be created and displayed under the Datasets tab:

    new dataset
  4. Click on the dataset to open it and select the Fields tab. You will notice that DataViz didn’t automatically detect any dimension for the dataset.

    Since the sensor_ts column is of a numeric type rather than a date/time type (as indicated by the # icon beside the field name), it was classified as a measure rather than a dimension. You will fix this in the next steps.

    dataset fields
  5. You need to convert the numeric sensor_ts field from microseconds to seconds and then to a TIMESTAMP data type. To do this, click on the EDIT FIELDS button.

    edit dataset
  6. In the Measures list, find the sensor_ts measure, open its drop-down menu and click on Clone. A new measure Copy of sensor_ts will appear.

    clone field
  7. Open the drop-down menu for this new measure, and select Edit field.

    edit measure
  8. In the Edit Field Parameters window, change the following:

    1. In the Basic Settings tab:

      Display Name: sensor_timestamp
      Category:     Dimension
    2. In the Expression tab, enter the following expression:

      microseconds_add(to_timestamp(cast([sensor_ts]/1000000 as bigint)), [sensor_ts] % 1000000)
    3. Validate the expression by clicking on VALIDATE EXPRESSION.

    4. Click APPLY to save the changes

      add expression
  9. You will notice that the category (Dim), data type (calendar icon) and field name were updated. The field still shows up in the Measures category, though.

    updated field category

    This is just a refresh issue. Click on the REFRESH button at the top and you should see the sensor_timestamp field "jump" to the Dimensions category.

    refreshed fields
  10. The sensor_id field is also a dimension and needs to be moved to the correct category.

    To do this, find the sensor_id field under the Measures category and click on its Mes icon to toggle it to Dim. Click on the REFRESH button again and you should see the following structure for your dataset:

    updated dataset
  11. Save your changes by clicking the green Save button.

You have just created a dataset to feed your dashboard and performed the necessary adjustments for your data source. In the next lab you will create the dashboard from it.
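
To make the sensor_timestamp expression from step 8 easier to follow, here is a minimal Python sketch of the same arithmetic: split the microsecond value into whole seconds and leftover microseconds, build a timestamp from the seconds, then add the remainder back. The function name is hypothetical, and the sketch assumes non-negative values interpreted as UTC; your Impala time zone settings may differ.

```python
from datetime import datetime, timedelta, timezone


def sensor_ts_to_timestamp(sensor_ts: int) -> datetime:
    """Convert epoch microseconds to a UTC datetime, keeping sub-second precision."""
    # Same split as [sensor_ts]/1000000 and [sensor_ts] % 1000000.
    whole_seconds, leftover_micros = divmod(sensor_ts, 1_000_000)
    # Counterpart of to_timestamp(cast(... as bigint)); UTC is an assumption.
    base = datetime.fromtimestamp(whole_seconds, tz=timezone.utc)
    # Counterpart of microseconds_add(...).
    return base + timedelta(microseconds=leftover_micros)


print(sensor_ts_to_timestamp(1_600_000_000_123_456))
# prints 2020-09-13 12:26:40.123456+00:00
```

Splitting the value this way keeps the microsecond precision that a plain division by 1000000 followed by a cast would discard.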

Lab 4 - Creating a dashboard

You have everything ready now to start building your dashboard. Let’s jump straight into it:

  1. On your dataset page, click on the NEW DASHBOARD button.

    new dasboard
  2. Since you initiated the dashboard creation from the dataset page, you will notice that the dashboard is already created by default with a "table visual" displaying all fields of the dataset.

    create dashboard
  3. Click on the table visual to ensure it is selected (you see a blue border around the visual when it is selected). With the table visual selected, click on the Build tab on the right.

  4. Click on the Measures input box to select it. Then click on the fields sensor_0 and sensor_1 from the Measures list. These fields will be added to the Measures input box.

    add measures
  5. The measures are added, by default, with the sum() aggregation. Change it to avg() by selecting each one of the newly added measures and selecting Aggregates > Average. Ensure this is done for both measures.

    change measure aggregation type
  6. Click on the Dimensions input box to select it. Then click on the fields sensor_timestamp and sensor_id from the Dimensions list. These fields will be added to the Dimensions input box.

  7. Highlight the sensor_timestamp field in the Dimensions input box and select Order and Top K > Descending. This will show the values in the table visual in descending order with the newest sensor readings on top.

    dashboard add dimensions
  8. Click on Refresh visual to update the visual with the latest changes.

  9. Finally, select the Settings tab on the right of the screen and change the value for Auto-refresh period (sec) to 5.

    dashboard auto refresh
  10. Click on the Save button at the top of the dashboard to save the changes and click View to enter view/publish mode. This is what your dashboard consumers will see: the sensor readings coming in through the streaming pipeline, displayed in a real-time dashboard that updates automatically.

    dashboard view
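
As a rough illustration of what the table visual built in this lab computes, the sketch below groups readings by the two dimensions, averages the two measures per group, and sorts by timestamp in descending order. It is an approximation written for this workshop, not the query DataViz actually generates; the function name and the output keys `avg_sensor_0` and `avg_sensor_1` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def table_visual_rows(readings):
    """Average sensor_0 and sensor_1 per (sensor_timestamp, sensor_id), newest first."""
    groups = defaultdict(list)
    for reading in readings:
        # The two dimensions form the grouping key.
        key = (reading["sensor_timestamp"], reading["sensor_id"])
        groups[key].append(reading)

    rows = [
        {
            "sensor_timestamp": ts,
            "sensor_id": sensor_id,
            # avg() aggregation instead of the default sum().
            "avg_sensor_0": mean(r["sensor_0"] for r in members),
            "avg_sensor_1": mean(r["sensor_1"] for r in members),
        }
        for (ts, sensor_id), members in groups.items()
    ]
    # Order and Top K > Descending on sensor_timestamp.
    rows.sort(key=lambda row: row["sensor_timestamp"], reverse=True)
    return rows
```

Each input reading is assumed to be a dictionary with the keys sensor_timestamp, sensor_id, sensor_0 and sensor_1, matching the fields selected in the Build tab.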

Lab 5 - Adding a chart

Dashboards are usually synonymous with graphs and charts. Cloudera Data Visualization comes with a wide range of chart types to help visualize your data. In this lab you’ll add a simple scatter chart to your dashboard to make it more interesting.

  1. On the view mode dashboard above, click on the EDIT button to go back into editing mode.

  2. Click on the Visuals tab on the right. Ensure the Local Impala connection and the sensor data dataset are selected and click on the NEW VISUAL button.

    add visuals
  3. On the Visuals tab, select the Scatter visual type:

    explore visuals icon
  4. Based on what you learned in the previous lab, enter the following properties:

    X Axis:  sensor_id
    Y Axis:  avg(sensor_0)
    Colors:  sensor_id
    Size:    avg(sensor_0)
    Filters: sensor_timestamp
  5. Click on the sensor_timestamp filter to select it and then click on [] Enter/Edit Expression.

    edit filter expresion
  6. Enter the following expression in the Enter/Edit Expression window to limit the data shown in the chart to the last minute of data received. This will create a chart over a rolling window of 1 minute.

    [sensor_timestamp] > seconds_sub(now(), 60)
  7. Validate the expression and click Save.

  8. Click on VISUAL > Style on the right-hand tab, and select a colorful palette in the Colors section.

    visual style
  9. Click on VISUAL > Settings on the right-hand tab, and set the Y Axis Scale to log10 in the Axes section.

    visual axes settings
  10. Expand the Marks section and set the Legend style to None.

    visual legend style
  11. Click on the layout button at the top of the Dashboard Designer to arrange the visuals in your dashboard. Drag the two visuals in the diagram to position them as you would like. Once you are done, click on APPLY LAYOUT.

    layout
  12. Click on the Save button to save the changes to your dashboard and then click on View to switch to the view mode and check your real-time dashboard in action:

    real time dashboard
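
The rolling-window filter from step 6 of this lab, `[sensor_timestamp] > seconds_sub(now(), 60)`, can be read as the following check. This is a minimal Python sketch with a hypothetical function name; the current time is passed in explicitly so the behaviour is easy to reason about, whereas Impala evaluates now() each time the visual refreshes.

```python
from datetime import datetime, timedelta


def in_rolling_window(sensor_timestamp: datetime, now: datetime,
                      window_seconds: int = 60) -> bool:
    """Return True if the reading is strictly newer than now minus the window."""
    # Counterpart of seconds_sub(now(), 60).
    window_start = now - timedelta(seconds=window_seconds)
    # Strict comparison, matching the > in the filter expression.
    return sensor_timestamp > window_start


now = datetime(2020, 9, 13, 12, 30, 0)
print(in_rolling_window(datetime(2020, 9, 13, 12, 29, 30), now))  # prints True
print(in_rolling_window(datetime(2020, 9, 13, 12, 28, 0), now))   # prints False
```

Because the window start moves forward on every refresh, older readings drop out of the chart as new ones arrive.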