Getting started with Big Data can be daunting if you haven't used the tooling before. In this session we'll start with a large data set and load it into Spark/Databricks using standard Big Data ingestion techniques. We'll build a basic ML model in Jupyter notebooks against Spark to demonstrate data analytics, and we'll finish with Power BI visuals against the data. The goal is to show the end-to-end Data Science Process: discovering, enriching, and modeling data.
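As a rough sketch of the ingestion step described above, the snippet below reads a CSV file into a Spark DataFrame. It assumes the data is a CSV file with a header row, and that a `spark` session object is available, as it is in Databricks notebooks. The helper name `load_csv`, the `CSV_OPTIONS` dictionary, and the file path are hypothetical and chosen only for illustration; the workshop notebooks may load data differently.

```python
# Assumption: the source file is CSV with a header row.
CSV_OPTIONS = {"header": "true", "inferSchema": "true"}


def load_csv(spark, path):
    """Read a CSV file into a Spark DataFrame using an existing session."""
    # DataFrameReader.options(...) sets reader options; .csv(...) runs the read.
    return spark.read.options(**CSV_OPTIONS).csv(path)


# In a Databricks notebook `spark` is already defined, so usage would be:
# df = load_csv(spark, "/data/example.csv")  # hypothetical file name
# df.printSchema()
```

Passing the session in as an argument keeps the helper usable both inside a notebook and from a standalone script that builds its own session.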
Requirements:
- Azure subscription
Goals:
- develop proficiency with Spark
- build ML models
- get an introduction to Deep Learning
- understand the data exploration (profiling) process
- this is a short workshop so we can't cover everything, but we want you to come away with your appetite whetted to learn more
Target audience:
- anyone interested in learning about data science using a hands-on approach
- some experience with notebooks helps
- some Python experience helps
- some experience with Azure helps
In a one-day workshop we can't cover everything. Please keep that in mind.
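To give a feel for the data exploration (profiling) process mentioned in the list above, here is a minimal sketch in plain Python. It is not the workshop's method: it works on a few in-memory rows rather than a Spark DataFrame, the `profile` helper is hypothetical, and the sample rows are invented purely for illustration. The questions it answers (how many values, how many missing, how many distinct, what range) are the same ones you would ask of a large data set.

```python
def profile(rows):
    """Summarize each column of a list of dict rows."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v is not None]
        summary[col] = {
            "count": len(values),                      # total rows seen
            "missing": len(values) - len(present),     # rows with no value
            "distinct": len(set(present)),             # unique non-missing values
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return summary


# Hypothetical sample rows, invented for illustration only.
rows = [
    {"city": "Seattle", "trips": 12},
    {"city": "Austin", "trips": None},
    {"city": "Seattle", "trips": 7},
]

for column, stats in profile(rows).items():
    print(column, stats)
```

On Spark, the DataFrame `describe()` method gives a comparable numeric summary without pulling the data onto one machine.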
- Provision Databricks Exercise
  - Data files can be found at /data
  - Lab files can be found at /Labs
  - consider downloading Power BI if you don't have it
- Understand the basics of Databricks
  - import the Basics.dbc file, then run through the notebooks
- Unsupervised Learning: Recommenders and Clustering
- MMLSpark: Microsoft ML tools for Spark
  - see Open Hack slides for another deck
  - distributed learning notebook
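The clustering topic in the agenda above can be illustrated with a small k-means sketch. This is plain Python on a handful of 2-D points, not the Spark code used in the labs; the sample points are invented, and using the first `k` points as starting centroids is an assumption made to keep the run deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster 2-D points into k groups with Lloyd's algorithm."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


# Hypothetical sample points forming two loose groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

The two alternating steps, assign then update, are the core of the technique; distributed implementations spread the same steps across many machines.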
Due to time constraints, I removed administration.dbc and replaced it with Basics.dbc. EH streaming was mostly removed but can be restored if needed.