My self-learning notes on PySpark
After you set up Apache Spark by following our tutorial in setup, run the following commands:
conda create -n pyspark python=3.8 -y
conda activate pyspark
pip install -r requirements.txt
cd 1.getting-start
python 1.initalize_spark.py
If the script runs successfully, your setup is working.
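If you want to check the installation on its own, here is a minimal sketch of a setup check. It is not the content of `1.initalize_spark.py`, which may differ. The function name `check_spark_setup` and the app name `setup-check` are made up for illustration, and the sketch assumes `pyspark` is installed in the active conda environment and that Java is available.

```python
def check_spark_setup(app_name="setup-check"):
    """Start a local SparkSession, run a tiny job, and return the Spark version.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Imported inside the function so this file can be loaded
    # even before pyspark is installed.
    from pyspark.sql import SparkSession

    # Assumption: run Spark locally, using all available cores.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName(app_name)
        .getOrCreate()
    )
    try:
        # spark.range(5) builds a DataFrame with the ids 0..4,
        # so counting its rows forces Spark to run a real job.
        row_count = spark.range(5).count()
        if row_count != 5:
            raise RuntimeError(f"Unexpected row count: {row_count}")
        return spark.version
    finally:
        # Always release the local Spark resources.
        spark.stop()
```

Calling `print(check_spark_setup())` should print the installed Spark version. If it raises an error instead, the most likely causes are a missing `pyspark` package or a missing Java installation.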
Read my own document, which includes the following parts:
- Introduction to Big Data.
- Common terminologies in Big Data.
- Apache Hadoop.
- Apache Spark.
- Compare Apache Spark and Hadoop.
- Spark Streaming.