Pyspark-Config is a Python module for data processing in Pyspark by means of a configuration file. It lets you build distributed data pipelines with configurable inputs, transformations and outputs.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
To install the current release (Ubuntu and Windows):
$ pip install pyspark_config
- Python (>= 3.6)
- Pyspark (>= 2.4.5)
- PyYaml (>= 5.3.1)
- Dataclasses (>= 0.0.0)
Given the YAML configuration file '../example.yaml':
input:
  sources:
    - type: 'Parquet'
      label: 'parquet'
      parquet_path: '../table.parquet'

transformations:
  - type: "Select"
    cols: ['A', 'B']
  - type: "Concatenate"
    cols: ['A', 'B']
    name: 'Concatenation_AB'
    delimiter: "-"

output:
  - type: 'Parquet'
    name: "example"
    path: "../outputs"
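To make the 'Concatenate' step concrete: it joins the values of the listed columns with the given delimiter into a new column. Here is a minimal pure-Python sketch of that behaviour; the `concatenate` helper is a hypothetical illustration, not part of the pyspark_config API.

```python
def concatenate(row, cols, name, delimiter):
    # Hypothetical helper: mirrors the 'Concatenate' transformation
    # configured above by joining the listed column values with the
    # delimiter into a new column.
    row = dict(row)
    row[name] = delimiter.join(str(row[c]) for c in cols)
    return row

print(concatenate({'A': 'foo', 'B': 'bar'}, ['A', 'B'],
                  'Concatenation_AB', '-'))
# {'A': 'foo', 'B': 'bar', 'Concatenation_AB': 'foo-bar'}
```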
With the input source saved in '../table.parquet', the following code can then be applied:
from pyspark_config import Config
from pyspark_config.transformations.transformations import *
from pyspark_config.output import *
from pyspark_config.input import *

config_path = "../example.yaml"
configuration = Config()
configuration.load(config_path)  # parse the YAML configuration
configuration.apply()            # run the pipeline: read, transform, write
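Conceptually, load/apply follows a common config-driven dispatch pattern: each configured transformation type is looked up and applied in order. The sketch below shows that pattern in plain Python; the names (`TRANSFORMS`, `run_pipeline`) and the list-of-dicts data model are assumptions for illustration, not pyspark_config's actual internals.

```python
# Illustrative sketch of a config-driven pipeline dispatcher.
# TRANSFORMS and run_pipeline are hypothetical names; they do not
# reflect pyspark_config's real implementation.

TRANSFORMS = {
    'Select': lambda rows, spec: [
        {c: r[c] for c in spec['cols']} for r in rows],
    'Concatenate': lambda rows, spec: [
        {**r, spec['name']: spec['delimiter'].join(
            str(r[c]) for c in spec['cols'])} for r in rows],
}

def run_pipeline(rows, transformations):
    # Apply each configured transformation in order, as apply() does
    # conceptually for the 'transformations' section of the YAML file.
    for spec in transformations:
        rows = TRANSFORMS[spec['type']](rows, spec)
    return rows

config = [
    {'type': 'Select', 'cols': ['A', 'B']},
    {'type': 'Concatenate', 'cols': ['A', 'B'],
     'name': 'Concatenation_AB', 'delimiter': '-'},
]
result = run_pipeline([{'A': 'x', 'B': 'y', 'C': 0}], config)
print(result)  # [{'A': 'x', 'B': 'y', 'Concatenation_AB': 'x-y'}]
```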
The output will then be saved in '../outputs/example.parquet'.
See the changelog for a history of notable changes to pyspark-config.
This project is distributed under the 3-Clause BSD license; see the LICENSE.md file for details.