
PySpark Example

PySpark examples showing how to load data from different formats into Spark DataFrames.
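As a rough sketch of the kind of loader these examples cover (the SparkSession settings and the data/sample.* paths below are illustrative placeholders, not files shipped with this repo):

from pyspark.sql import SparkSession

# Minimal sketch: read a CSV file and a JSON file into DataFrames.
spark = SparkSession.builder.appName("pyspark_example").getOrCreate()

csv_df = spark.read.csv("data/sample.csv", header=True, inferSchema=True)
json_df = spark.read.json("data/sample.json")

csv_df.show(5)
json_df.printSchema()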

Installation

Python 3.x should be available on the OS. Create a virtual environment in the $HOME directory ($HOME/venv3x), as shown below.
Ensure JAVA_HOME is set in the environment.
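For example, assuming python3 is on the PATH, the virtual environment can be created with:

$:~$ python3 -m venv ~/venv3x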

Setup

$:~/pyspark_example$ source ~/venv3x/bin/activate
$:~/pyspark_example$ pip install -r requirements.txt

Add the src folder to PYTHONPATH

$:~/pyspark_example$ export PYTHONPATH=$PYTHONPATH:$PWD/src

Run a module

$:~/pyspark_example$ python csv_2_dataframe.py

Meta

https://github.com/afzals2000/pyspark_example

Contributing

  1. Fork it (https://github.com/afzals2000/pyspark_example)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin feature/fooBar)
  5. Create a new Pull Request