Distributed Computing with Spark SQL

This course is offered by the University of California, Davis on Coursera and provides a comprehensive overview of distributed computing using Spark.

The four modules build on one another and cover:

Spark architecture
Spark DataFrames
Optimizing reading and writing data
Building a machine learning model

By understanding when to use Spark, whether to scale out because the model or data is too large to process on a single machine or simply to get results faster, students like me will hone their SQL skills and become more adept data scientists.
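To make the scale-out idea concrete, here is a minimal sketch in plain Python (not Spark itself, and not taken from the course notebooks) of the pattern that conceptually sits behind a grouped count on a cluster: the data is split into partitions, each partition is counted independently, and the partial counts are merged. The function names, the sample rows, and the use of threads as stand-ins for worker machines are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions lists."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def count_by_key(partition):
    """Partial aggregation: count rows per key within one partition."""
    return Counter(key for key, _ in partition)


def distributed_count(rows, num_partitions=4):
    """Count rows per key by aggregating each partition, then merging."""
    partitions = split_into_partitions(rows, num_partitions)
    # Threads stand in for worker machines (an assumption for this sketch).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_by_key, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # merge step: add up the partial counts
    return dict(total)


# Hypothetical sample data: (city, record_id) pairs.
rows = [("Davis", 1), ("Sacramento", 2), ("Davis", 3), ("Davis", 4), ("Sacramento", 5)]
counts = distributed_count(rows, num_partitions=2)
print(counts)
```

Because each partition is processed without looking at the others, the same logic can in principle run on separate machines, which is what makes this style of computation a fit for data that does not fit on one.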

This repository includes the following things:

assignments
quizes
Distributed-Computing-with-Spark-SQL-1.2.3.dbc
README.md

1. Assignments

2. Quizzes

3. Notebooks

Linlin-Li-1/Distributed-Computing-with-Spark-SQL
