
PySpark-Tutorial

My self-learning about PySpark

Run PySpark with Miniconda environment

After you set up Apache Spark, following our tutorial in setup.

1. Create a Miniconda environment and install PySpark

conda create -n pyspark python=3.8 -y
conda activate pyspark
pip install -r requirements.txt

2. Getting started with PySpark

cd 1.getting-start
python 1.initalize_spark.py

If the script runs successfully, your setup is working.

3. Lectures about PySpark

Read my own document; it includes the following parts:

    1. Introduction to Big Data.
    2. Common terminologies in Big Data.
    3. Apache Hadoop.
    4. Apache Spark.
    5. Comparison of Apache Spark and Hadoop.
    6. Spark Streaming.
