Pyspark Example

PySpark examples showing how to load data from different formats into Spark DataFrames.

Installation

Python 3.x must be available on the OS. Create a virtual environment in your $HOME directory ($HOME/venv3x).
Ensure JAVA_HOME is set in your environment.
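
The two prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the function name `check_environment` and its parameters are hypothetical, and it only verifies that the interpreter is Python 3.x and that JAVA_HOME points to an existing directory.

```python
import os
import sys


def check_environment(env=None, version=None):
    """Return a list of problems; an empty list means both prerequisites look satisfied.

    `env` and `version` are hypothetical parameters added so the check can be
    exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version

    problems = []
    # Prerequisite 1: a Python 3.x interpreter.
    if version[0] < 3:
        problems.append("Python 3.x is required")

    # Prerequisite 2: JAVA_HOME is set and points to a directory.
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append("JAVA_HOME does not point to a directory: {}".format(java_home))
    return problems


if __name__ == "__main__":
    # Print one line per problem found; print nothing when everything is in place.
    for problem in check_environment():
        print(problem)
```

Saved under any file name and run with the interpreter you intend to use, the script prints nothing when both prerequisites are met.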

Setup

$:~/pyspark_example$ source ~/venv3x/bin/activate
$:~/pyspark_example$ pip install -r requirements.txt

Add src folder to PYTHONPATH

$:~/pyspark_example$ export PYTHONPATH=$PYTHONPATH:$PWD/src

Run a module

$:~/pyspark_example$ python csv_2_dataframe.py
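
The contents of csv_2_dataframe.py are not shown in this README. As a rough illustration of what a CSV-loading module of this kind might contain, here is a minimal sketch. The function names `build_session` and `load_csv`, the application name, and the file path in the usage comment are assumptions made for illustration; the only PySpark calls used are `SparkSession.builder` and `DataFrameReader.csv` with its `header` and `inferSchema` options.

```python
def build_session(app_name="pyspark_example"):
    """Create (or reuse) a local SparkSession.

    The import is deferred so that this module can be imported even when
    pyspark is not installed; the app name is an illustrative assumption.
    """
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).getOrCreate()


def load_csv(spark, path):
    """Load a CSV file into a Spark DataFrame.

    header=True treats the first row as column names;
    inferSchema=True asks Spark to guess column types from the data.
    """
    return spark.read.csv(path, header=True, inferSchema=True)


# Example usage (requires pyspark and a Java runtime):
#   spark = build_session()
#   df = load_csv(spark, "data/example.csv")  # hypothetical path
#   df.printSchema()
#   df.show(5)
#   spark.stop()
```

Inferring the schema costs an extra pass over the file; for large inputs, passing an explicit schema to the reader is usually the cheaper choice.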

Meta

https://github.com/afzals2000/pyspark_example

Contributing

  1. Fork it (https://github.com/afzals2000/pyspark_example)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin feature/fooBar)
  5. Create a new Pull Request
