
davew-msft/E2EBigData


EndtoEndBigData

Getting started with Big Data can be daunting if you haven't used the tooling before. In this session we'll start with a large data set and load it into Spark/Databricks using standard Big Data ingestion techniques. We'll build a basic ML model using Jupyter notebooks against Spark to demonstrate data analytics in practice. We'll finish with Power BI visuals against the data. The goal is to show the end-to-end Data Science Process: discovering, enriching, and modeling data.
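The ingest, enrich, and model flow described above can be sketched in miniature. This is a plain-Python illustration under stated assumptions, not the workshop's Spark/Databricks code: the CSV columns (`trip_miles`, `trip_minutes`, `fare`), the derived `miles_per_minute` feature, and the helper names are all hypothetical, and a one-variable least-squares fit stands in for the ML model you would build in Spark.

```python
import csv
import io

# Hypothetical sample data standing in for the workshop's large data set.
RAW_CSV = """trip_miles,trip_minutes,fare
1.0,6,5.5
2.0,11,8.0
3.0,15,10.5
4.0,21,13.0
"""


def ingest(text):
    """Load CSV text into a list of dict rows with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def enrich(rows):
    """Add a hypothetical derived feature: average speed in miles per minute."""
    for row in rows:
        row["miles_per_minute"] = row["trip_miles"] / row["trip_minutes"]
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    rows = enrich(ingest(RAW_CSV))
    slope, intercept = fit_line(
        [r["trip_miles"] for r in rows], [r["fare"] for r in rows]
    )
    print(f"fare = {slope:.2f} * miles + {intercept:.2f}")
```

On Spark the same steps would typically use `spark.read.csv(...)`, `withColumn` for the derived feature, and a `pyspark.ml` estimator such as `LinearRegression`; the plain-Python version only makes each step visible.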

Requirements:

  • Azure subscription

Goals

  • develop proficiency with Spark
  • build ML models
  • get an introduction to Deep Learning
  • understand the data exploration (profiling) process
  • this is a short workshop so we can't cover everything, but we want you to come away with your appetite whetted to learn more
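Data profiling, one of the goals listed above, boils down to computing a few summary statistics per column before modeling. The sketch below is a plain-Python assumption of what such a profile might contain (non-missing count, missing count, distinct values, and min/max for numeric columns); the `profile` helper and the sample rows are hypothetical. In Spark, `DataFrame.describe()` and `DataFrame.summary()` report similar per-column counts and min/max values.

```python
def profile(rows):
    """Return per-column stats for a list of dict rows (hypothetical helper).

    None or empty strings count as missing. min/max are reported only
    when every non-missing value in the column is numeric.
    """
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        stats = {
            "count": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if present and numeric:
            stats["min"] = min(present)
            stats["max"] = max(present)
        report[col] = stats
    return report


if __name__ == "__main__":
    # Hypothetical sample rows; one value is missing.
    sample = [
        {"city": "Seattle", "temp_f": 48, "rain_in": 0.2},
        {"city": "Seattle", "temp_f": 52, "rain_in": None},
        {"city": "Austin", "temp_f": 75, "rain_in": 0.0},
    ]
    for column, stats in profile(sample).items():
        print(column, stats)
```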

Audience

  • anyone interested in learning about data science using a hands-on approach
  • some experience with notebooks helps
  • some python experience helps
  • some Azure experience helps

What we won't cover

In a one day workshop we can't cover everything. Please keep that in mind.

Workshop Contents

  1. Provision Databricks (exercise)

    • Data files can be found at /data
    • Lab files can be found at /Labs
    • consider downloading Power BI Desktop if you don't already have it
  2. Understand the basics of Databricks

    • import the Basics.dbc file, then run through its notebooks
  3. Advanced Data Exploration

  4. Build a Supervised Learning Model

  5. Advanced Supervised Learning Models

  6. Unsupervised Learning: Recommenders and Clustering

  7. MMLSpark: Microsoft ML tools for Spark

Supplementary Information

Notes

Due to time constraints, I removed administration.dbc and replaced it with Basics.dbc. Most of the Event Hubs (EH) streaming material was also removed, but it can be restored if needed.
