a big data lab about large-scale data processing using the PySpark framework

chouaibMo/big-data-pySpark

Big Data: Introduction to Spark

Requirements

- Python 3
- PySpark
- Docker (only needed for the Docker-based setup)

Using Spark on your machine:

Assuming Spark is already installed on your machine, run the script from a terminal:

$ python3 ./navigation.py
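Before running the script, it can help to confirm that PySpark is usable from the same Python interpreter. The following is a minimal sketch, not part of this repository: the function names and the application name are hypothetical, and it assumes a local Spark setup with a working Java runtime.

```python
import importlib.util


def pyspark_available() -> bool:
    """Return True if the pyspark package can be imported by this interpreter."""
    return importlib.util.find_spec("pyspark") is not None


def run_smoke_test() -> int:
    """Start a local Spark session, sum the integers 0..99, and return the total."""
    # Imported lazily so that pyspark_available() still works without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")      # run Spark locally, using all available cores
        .appName("smoke-test")   # hypothetical application name
        .getOrCreate()
    )
    try:
        # Distribute the numbers as an RDD and add them up on the workers.
        return spark.sparkContext.parallelize(range(100)).sum()
    finally:
        spark.stop()
```

If pyspark_available() returns False, PySpark is not installed for this interpreter. On a working installation, run_smoke_test() should return 4950.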

Using Spark installed with Docker:

Run the command below, replacing {absolute_path_to_folder} with the absolute path to the directory where your Python scripts are stored. That directory is mounted at /home/jovyan/work inside the container.

When you launch the command, logs are written to the terminal. The last line displayed is a URL; copy it into a web browser to open a Jupyter Notebook that gives access to Spark.

docker run -v {absolute_path_to_folder}:/home/jovyan/work -it \
       --rm -p 8888:8888 -p 4040:4040 jupyter/pyspark-notebook
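
Substituting the placeholder by hand is easy to get wrong when the path is relative. As a rough sketch, the helper below (a hypothetical function, not part of this repository) resolves a folder to an absolute path and builds the same docker run arguments as the command above.

```python
import shlex
from pathlib import Path
from typing import List


def build_docker_command(folder: str) -> List[str]:
    """Build the docker run arguments, mounting `folder` at /home/jovyan/work."""
    # Docker bind mounts need an absolute host path, so resolve it first.
    host_path = Path(folder).expanduser().resolve()
    return [
        "docker", "run",
        "-v", f"{host_path}:/home/jovyan/work",
        "-it", "--rm",
        "-p", "8888:8888",  # Jupyter Notebook
        "-p", "4040:4040",  # Spark web UI
        "jupyter/pyspark-notebook",
    ]


if __name__ == "__main__":
    # Print a shell-ready command for the current directory.
    print(shlex.join(build_docker_command(".")))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run to start the container directly.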
