
duyguHsnHsn/vmware-explore



Purpose

The purpose of this scenario is to demonstrate how to operationalize Jupyter notebooks using the Versatile Data Kit (VDK) Jupyter integration. By the end of this guide, you'll understand how to:

  • Create a data job with VDK within a Jupyter notebook.
  • Write a data workflow in a notebook and make it ready to be put in a production environment.

Background

Objective:

All of the following objectives are carried out within a Jupyter notebook:

  1. Retrieve Data: Extract data from the specified URL using pandas.
  2. Data Cleansing: Eliminate records associated with 'testuser'.
  3. Score Classification: Assign scores to predefined categories for clarity.
  4. Data Ingestion: Use VDK job_input to ingest the organized data.
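The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial-job code: the column names (`user`, `score`), the score bands, the `scores` table name, and the `RecordingJobInput` stand-in are all assumptions made so the example runs without VDK or network access. In a real notebook the data would be loaded from the tutorial's URL with pandas, and `job_input` would be the object provided by the VDK Jupyter integration.

```python
import pandas as pd

# Hypothetical column names and score bands; the guide does not specify them.
USER_COLUMN = "user"
SCORE_COLUMN = "score"
SCORE_BINS = [0, 50, 80, 100]
SCORE_LABELS = ["low", "medium", "high"]


def remove_test_users(df: pd.DataFrame) -> pd.DataFrame:
    """Data cleansing: drop every record that belongs to 'testuser'."""
    return df[df[USER_COLUMN] != "testuser"].reset_index(drop=True)


def classify_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Score classification: add a 'category' column based on the score band."""
    out = df.copy()
    out["category"] = pd.cut(
        out[SCORE_COLUMN], bins=SCORE_BINS, labels=SCORE_LABELS, include_lowest=True
    ).astype(str)
    return out


def ingest(job_input, df: pd.DataFrame, destination_table: str) -> None:
    """Data ingestion: hand the rows to VDK's tabular ingestion call."""
    job_input.send_tabular_data_for_ingestion(
        rows=df.values.tolist(),
        column_names=df.columns.tolist(),
        destination_table=destination_table,
    )


class RecordingJobInput:
    """Stand-in for VDK's job_input (assumption) that only records calls."""

    def __init__(self):
        self.calls = []

    def send_tabular_data_for_ingestion(self, rows, column_names, destination_table, **kwargs):
        self.calls.append((destination_table, column_names, list(rows)))


# Retrieve data: an in-memory sample replaces the download from the tutorial URL.
raw = pd.DataFrame({"user": ["alice", "testuser", "bob"], "score": [92, 10, 47]})

cleaned = classify_scores(remove_test_users(raw))
fake_job_input = RecordingJobInput()
ingest(fake_job_input, cleaned, "scores")
```

Keeping each step in its own small function mirrors how a notebook would split the workflow across cells, which makes it easier to decide which cells belong in the production job.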

Versatile Data Kit (VDK)

For detailed instructions on working with VDK, please refer to the guide from the provided link.

Exercises

The tutorial-job directory contains the ready-to-use code from this demo. Explore it for hands-on experience with the objectives and the VDK Jupyter integration discussed in this guide. Open MyBinder to get started on the exercises!

Binder

If the link does not work, try this one: Binder

Lessons Learned

Throughout this scenario, you've:

  • Explored the capabilities of the VDK Jupyter integration.
  • Retrieved, cleaned, and processed data using Jupyter and VDK tools.
  • Classified scores into meaningful categories.
  • Understood the process of ingesting data through VDK within a Jupyter environment.

Congratulations!

> Go back to the main page of the Tutorial.
