Distributed Computing with Spark SQL by University of California Davis on Coursera

Linlin-Li-1/Distributed-Computing-with-Spark-SQL

Distributed Computing with Spark SQL

This course is offered by the University of California, Davis on Coursera and gives a comprehensive overview of distributed computing with Spark.

The four modules build on one another and cover:

  • Spark architecture
  • Spark DataFrames
  • Optimizing reading and writing data
  • Building a machine learning model

By learning when to use Spark, whether to scale out because the model or data is too large to process on a single machine or simply to get results faster, students like me can hone their SQL skills and become more adept data scientists.

This repository includes the following things:
