Clickstream Analysis with IBM Db2 Event Store

IBM Db2 Event Store offers high-speed ingestion and real-time analytics for large volumes of streaming data. The platform enables event-driven applications to persist event data at scale and powers high-performance Spark analytics on all data for quick insights. In this code pattern, we will see how a retail business uses IBM Db2 Event Store to capture and analyze clickstream data from its web channels. Clickstream analysis helps the business track customer browsing patterns closely and better understand customers' changing interests. Acting on these insights, the business can personalize each customer's experience with targeted offers to drive sales.

Sample notebooks demonstrate the use case of clickstream analysis with IBM Db2 Event Store using Scala APIs to ingest and analyze web event data. Credit goes to Siva Anne of the IBM Data Science Elite Team for the original Jupyter Notebooks.

When the reader has completed this code pattern, they will understand how to:

  • Install IBM Db2 Event Store developer edition
  • Ingest data into Event Store using Scala in a Jupyter Notebook
  • Query the Event Store using Scala and Spark SQL in a Jupyter Notebook
  • Use Brunel to visualize the data with interactive charts

Flow

  1. Add a CSV file as a data asset
  2. Run a Jupyter Notebook using Scala to ingest data from the CSV file into Event Store
  3. Run a Jupyter Notebook using Scala and the Brunel visualization language to analyze the data from Event Store

Included components

  • IBM Db2 Event Store: In-memory database optimized for event-driven data processing and analysis.
  • Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
  • Scala: Scala combines object-oriented and functional programming in one concise, high-level language.
  • Brunel: A highly succinct language for defining interactive data visualizations based on tabular data.

Featured technologies

  • Databases: Repository for storing and managing collections of data.
  • Analytics: Analytics delivers the value of data for the enterprise.
  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.

Steps

Run locally

  1. Install IBM Db2 Event Store Developer Edition
  2. Clone the repo
  3. Add the CSV file as a data asset
  4. Import and run the Jupyter Notebook to ingest data
  5. Import and run the Jupyter Notebook to analyze the data
  6. See the results

1. Install IBM Db2 Event Store Developer Edition

Install IBM® Db2® Event Store Developer Edition on Mac, Linux, or Windows by following the instructions here.

Note: This code pattern was developed with Event Store Developer Edition 1.1.4.

2. Clone the repo

Clone the db2-event-store-clickstream repo locally. In a terminal, run:

git clone https://github.com/IBM/db2-event-store-clickstream

3. Add the CSV file as a data asset

Use the Db2 Event Store UI to add the CSV input file as a data asset.

  1. From the drop-down menu in the upper-left corner, select My Notebooks.

  2. Scroll down and click on add data assets.

  3. Click browse and navigate to the data directory in your cloned repo. Open the file clickstream_data.csv.

4. Import and run the Jupyter Notebook to ingest data

Use the Db2 Event Store UI to create, edit, and run the notebook.

  1. From the drop-down menu in the upper-left corner, select My Notebooks.

  2. Click on add notebooks.

  3. Select the From File tab.

  4. Provide a name.

  5. Click Choose File and navigate to the notebooks directory in your cloned repo. Open the file ingest_clickstream_events.ipynb.

  6. Scroll down and click on Create Notebook.

  7. Edit the HOST constant in the first code cell. You will need to enter your host's IP address in place of the XXX.XXX.XXX.XXX value.

  8. Run the notebook using the menu Cell > Run all or run the cells individually with the play button.
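A quick sanity check on the edited HOST value can catch a forgotten placeholder before any cell tries to connect. The following is a minimal sketch in plain Scala, not code from the notebook: the `HostCheck` object and its `isIpv4` helper are hypothetical names, and the check assumes the host is given as a dotted-quad IPv4 address, as step 7 describes.

```scala
object HostCheck {
  // Placeholder value that step 7 asks you to replace.
  val Placeholder = "XXX.XXX.XXX.XXX"

  // Four dot-separated groups of one to three digits.
  // A Regex used as a pattern must match the whole string.
  private val Ipv4 = """(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})""".r

  /** True when `host` is a dotted-quad IPv4 address with every octet in 0..255. */
  def isIpv4(host: String): Boolean = host match {
    case Ipv4(a, b, c, d) => Seq(a, b, c, d).forall(_.toInt <= 255)
    case _                => false
  }
}
```

Calling `HostCheck.isIpv4(HOST)` in a cell after the constant is set returns `false` while the placeholder is still in place.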

This notebook demonstrates how to:

  • Connect to Event Store
  • Create a database
  • Drop a database
  • Create a table
  • Load data from a CSV file or a DataFrame
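To make the CSV loading step concrete, here is a small sketch of turning CSV lines into typed records using only the Scala standard library. It is an illustration under stated assumptions, not the notebook's code: the `ClickEvent` fields and the three-column layout are hypothetical (the real clickstream_data.csv columns may differ), and the notebook itself works with Spark and Event Store rather than plain collections.

```scala
import scala.util.Try

// Hypothetical event shape; the real CSV columns may differ.
final case class ClickEvent(userId: String, productLine: String, timeSpentSecs: Int)

object CsvSketch {
  /** Parses one comma-separated line; returns None for malformed rows.
    * Quoted fields containing commas are not handled in this sketch. */
  def parseLine(line: String): Option[ClickEvent] =
    line.split(",", -1).map(_.trim) match {
      case Array(user, productLine, secs) if user.nonEmpty =>
        Try(secs.toInt).toOption.map(s => ClickEvent(user, productLine, s))
      case _ => None
    }

  /** Skips the header row and drops rows that fail to parse. */
  def parseAll(lines: Seq[String]): Seq[ClickEvent] =
    lines.drop(1).flatMap(line => parseLine(line).toList)
}
```

Returning `Option` from `parseLine` keeps bad rows from aborting the whole load; a stricter loader might collect the rejected lines for inspection.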

5. Import and run the Jupyter Notebook to analyze the data

Use the Db2 Event Store UI to create, edit, and run the notebook.

  1. Follow the same steps as above, but select the file analyze_clickstream_events.ipynb from your repo's notebooks directory.

  2. Edit the HOST constant in the first code cell. You will need to enter your host's IP address in place of the XXX.XXX.XXX.XXX value.

  3. Run the notebook using the menu Cell > Run all or run the cells individually with the play button.

This notebook demonstrates how to:

  • Connect to Event Store
  • Query Event Store using Spark SQL
  • Prepare and aggregate data for analysis
  • Use Brunel to create interactive charts
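The "prepare and aggregate" step can be pictured as a group-and-sum over events. The sketch below uses plain Scala collections in place of the Spark SQL queries the notebook runs, so it only illustrates the shape of the aggregation. The `PageEvent` and `LineMetrics` types and their field names are hypothetical.

```scala
// Hypothetical per-event record; field names are illustrative, not the notebook's schema.
final case class PageEvent(productLine: String, timeSpentSecs: Long)

// Aggregated metrics for one product line.
final case class LineMetrics(productLine: String, pageViews: Int, totalSecs: Long) {
  /** Average seconds per page view; 0.0 when there are no views. */
  def secsPerView: Double =
    if (pageViews == 0) 0.0 else totalSecs.toDouble / pageViews
}

object Aggregation {
  /** Counts page views and sums time per product line,
    * sorted by page views descending (ties broken by name). */
  def byProductLine(events: Seq[PageEvent]): Seq[LineMetrics] =
    events
      .groupBy(_.productLine)
      .map { case (line, evs) => LineMetrics(line, evs.size, evs.map(_.timeSpentSecs).sum) }
      .toSeq
      .sortBy(m => (-m.pageViews, m.productLine))
}
```

Comparing `secsPerView` across product lines is one way to surface a line whose total time is high relative to its page hits.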

6. See the results

  • Code cells that prepare DataFrames with calculated and aggregated fields include show() output to give you a peek at the data as it is being processed.

  • The first Brunel charts use aggregated web metrics for product lines. Here we show four charts to help you compare page views with time spent on web pages.

    • The bar charts use the same order and color for product lines (sorted by page hits). The charts are placed with one directly below the other so that your eyes will easily spot where they differ.

      • The charts show that smart phone web pages are the most popular in both page views and time spent on pages.

      • videogames stands out as a product line with significantly higher total time relative to its page hits.

    • Notice the tooltips when you hover over the bars.

    • Click on the videogames bar.

      • The charts are wired so that when you select a bar, it will highlight that product line's area in the treemap charts. The treemap charts, on the right side, show another way to visualize the relative stats of the product lines. The top one is weighted by page views. The bottom one is weighted by time spent on web pages.

  • The next Brunel charts show aggregated web metrics for products in the smart phones product line. Here we show four charts similar to those described above.

    • These charts show that the A-phone is the leading smart phone product in terms of both page hits and time spent on a page.

    • Notice that the X-phone stands out as the phone with higher time spent on web pages per page view.

  • Next we look at specific features of the A-phone.

    • Here we use a bar chart to show page views by feature and a pie chart to show time spent on pages.

      • Clicking on a bar will highlight the same feature in the pie chart.

      • The tooltips show additional information when hovering over bars or pie slices.

      • color drew the most page views and the most time spent on web pages of any feature.

  • Finally, after more data manipulation, we look into web metrics for a specific user.

    • This view could be used by a support agent or a targeted offering campaign to analyze a user's current interests.

      • This user has shown significant interest in smart phones.
      • This user has also visited web pages for headphones and computers.
    • A legend is displayed on the right. Color is by product line.

    • The bar chart shows the user's page views over the past seven days. A stacked bar is used to show each product line viewed.

    • Clicking on a bar will highlight the pie chart slices for that day and that product line.

      • The pie slices are also divided by day and product.
      • The pie chart tooltip shows how you can use HTML tags for formatting in a tooltip.
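The per-user view described above amounts to counting one user's page views by day and product line; each count is one segment of a stacked bar. Here is a minimal sketch with plain Scala collections. It is not the notebook's code, which prepares this data with Spark. The `UserView` type and its fields are hypothetical, and restricting the data to the past seven days is left out.

```scala
// Hypothetical record of one page view; the field names are illustrative.
final case class UserView(userId: String, day: String, productLine: String)

object UserMetrics {
  /** Page-view counts for one user, keyed by (day, product line).
    * Each entry corresponds to one segment of a stacked bar chart
    * that has one bar per day. */
  def dailyViews(views: Seq[UserView], userId: String): Map[(String, String), Int] =
    views
      .filter(_.userId == userId)
      .groupBy(v => (v.day, v.productLine))
      .map { case (key, vs) => key -> vs.size }
}
```

Keying on the `(day, productLine)` pair mirrors how clicking a bar selects the pie slices for that day and that product line.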

Sample output

See the notebook with example output and interactive charts here.

Links

Learn more

  • Data Analytics Code Patterns: Enjoyed this code pattern? Check out our other data analytics code patterns
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our code pattern videos
  • IBM Watson Studio: Master the art of data science with IBM's Watson Studio

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ
