EndtoEndBigData

Getting started with Big Data can be daunting if you haven't used the tooling before. In this session we'll start with a large data set and load it into Spark/Databricks using standard Big Data ingestion techniques. We'll build a basic ML model in Jupyter notebooks against Spark to demonstrate data analytics, and we'll finish with Power BI visuals against the data. The goal is to show the end-to-end data science process: discovering, enriching, and modeling data.

Requirements:

Azure subscription

Goals

develop proficiency with Spark
build ML models
get an introduction to deep learning
understand the data exploration (profiling) process
this is a short workshop, so we can't cover everything, but we want you to come away with your appetite whetted to learn more
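
To make the data exploration (profiling) goal concrete, here is a minimal sketch of column profiling in plain Python rather than Spark, so it runs without a cluster. The sample data and the `profile` helper are hypothetical and made up for illustration; they are not taken from the workshop notebooks or from the files under /data.

```python
import csv
import io
import statistics

# Hypothetical sample data, invented for this sketch only.
SAMPLE_CSV = """trip_id,distance_km,fare,payment_type
1,2.5,9.80,card
2,,14.20,cash
3,7.1,21.00,card
4,3.3,,card
"""


def profile(rows):
    """Return per-column stats: row count, missing count, distinct count,
    plus min/max/mean when every present value parses as a number."""
    # Pivot the list of row dicts into one list of values per column.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    report = {}
    for name, values in columns.items():
        present = [v for v in values if v not in ("", None)]
        stats = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []  # treat the column as non-numeric
        if numbers:
            stats["min"] = min(numbers)
            stats["max"] = max(numbers)
            stats["mean"] = statistics.mean(numbers)
        report[name] = stats
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile(rows)
for name, stats in report.items():
    print(name, stats)
```

The same questions (how many rows, how many missing values, how many distinct values, what range) are the ones a profiling pass asks of a much larger data set; at that scale the work would be pushed down to Spark instead of being done in local Python lists.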

Audience

anyone interested in learning about data science using a hands-on approach
some experience with notebooks helps
some Python experience helps
some experience with Azure helps

What we won't cover

In a one-day workshop we can't cover everything. Please keep that in mind.

Workshop Contents

Provision Databricks (exercise)
- Data files can be found at /data
- Lab files can be found at /Labs
- Consider downloading Power BI if you don't have it
Understand the basics of Databricks
- Import the Basics.dbc file, then run through the notebooks
Advanced Data Exploration
Build a Supervised Learning Model
Advanced Supervised Learning Models
Unsupervised Learning: Recommenders and Clustering
MMLSpark : Microsoft ML tools for Spark

Supplementary Information

see Open Hack slides for another deck
distributed learning notebook

Notes

Due to time constraints, I removed administration.dbc and replaced it with Basics.dbc. EH streaming was mostly removed but can be restored if needed.

davew-msft/E2EBigData

Folders and files

Latest commit

History

Repository files navigation

EndtoEndBigData

Goals

Audience

What we won't cover

Workshop Contents

Supplementary Information

Notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages