
Streaming

A sample stream-processing pipeline using PySpark. It reads CSV files as they are generated, applies transformations on the fly, and stores the results in Parquet files.

Recommended environment

GitHub
Gitpod
Visual Studio Code

Files and their functionality

  1. Prod-env1.py --> This part of the streaming pipeline performs the flatten job only: it transforms unstructured JSON into tabular form and saves the data as CSV files (minimal sketches of all three jobs follow this list).

  2. Prod-env2.py --> This job aggregates the column values every 5 seconds, taking the sum of the columns from the files that arrived in the last 5 seconds, in delta mode. The window was shortened to 5 seconds from a longer wait time such as 5 minutes: with a 5-minute window, the Spark job writes empty information into the Parquet files while the stream holds intermediate state, and you would have to wait 5 minutes to see the data.

  3. main.py --> Generates source files on a continuous basis. Press Ctrl+C to cancel the execution.
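
For illustration, here are minimal, hedged sketches of what each of the three jobs might look like. Every path, schema, column name, and interval below is an assumption chosen for the example, not code taken from this repository.

A possible shape for main.py, writing one small JSON source file per second into the watched directory:

```python
# Hypothetical source-file generator in the spirit of main.py.
# The record shape, file names, and 1-second interval are assumptions.
import json
import random
import time
from datetime import datetime

i = 0
while True:  # press Ctrl+C (KeyboardInterrupt) to cancel the execution
    record = {
        "id": i,
        "payload": {"name": f"sensor{i % 3}", "value": random.random()},
        "ts": datetime.now().isoformat(timespec="milliseconds"),
    }
    # main.py is run from POC/SourceFiles, so files land in the watched dir.
    with open(f"source_{i}.json", "w") as f:
        json.dump(record, f)
    i += 1
    time.sleep(1)  # emit one file per second
```

A possible shape for the flatten job in Prod-env1.py, turning the nested JSON into flat rows and appending them as CSV:

```python
# Hypothetical flatten job in the spirit of Prod-env1.py.
# Paths, the schema, and the field names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("FlattenJob").getOrCreate()

# An explicit schema is required for streaming JSON sources.
schema = "id INT, payload STRUCT<name: STRING, value: DOUBLE>, ts TIMESTAMP"

raw = spark.readStream.schema(schema).json("POC/SourceFiles/")

# Flatten the nested struct into top-level columns (tabular form).
flat = raw.select(
    col("id"),
    col("payload.name").alias("name"),
    col("payload.value").alias("value"),
    col("ts"),
)

# Append each micro-batch as CSV for the downstream aggregation job.
query = (
    flat.writeStream.format("csv")
    .option("path", "POC/FlattenedFiles/")
    .option("checkpointLocation", "POC/checkpoints/flatten/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

And a possible shape for the aggregation in Prod-env2.py. The Parquet sink is append-only, so a streaming aggregation needs an event-time window plus a watermark, and a window's sum is only written out once the watermark passes the window end. That is also why a long window such as 5 minutes makes the output look empty for a long time: the sums sit in intermediate state until each window closes.

```python
# Hypothetical 5-second windowed aggregation in the spirit of Prod-env2.py.
# Paths, column names, and the exact window/watermark are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window
from pyspark.sql.functions import sum as sum_  # avoid shadowing the builtin

spark = SparkSession.builder.appName("AggregateJob").getOrCreate()

flat = (
    spark.readStream
    .schema("id INT, name STRING, value DOUBLE, ts TIMESTAMP")
    .csv("POC/FlattenedFiles/")
)

# Sum `value` over 5-second event-time windows; the watermark lets Spark
# finalize a window and emit its row to the append-only Parquet sink.
agg = (
    flat.withWatermark("ts", "5 seconds")
    .groupBy(window(col("ts"), "5 seconds"))
    .agg(sum_("value").alias("total_value"))
)

query = (
    agg.writeStream.format("parquet")
    .option("path", "POC/Output/")
    .option("checkpointLocation", "POC/checkpoints/aggregate/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```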

Requirements

Install the PySpark library in the environment using pip:

pip install pyspark
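
To verify the installation, print the installed version:

python -c "import pyspark; print(pyspark.__version__)"

Note that PySpark also requires a Java runtime (JDK) on the machine.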

Execution instructions

Open 3 separate terminals, one for each of the 3 Python files.

Commands Terminal 1 :

cd POC
cd SourceFiles
python main.py

Commands Terminal 2 :

cd POC
python Prod-env1.py

Commands Terminal 3 :

cd POC
python Prod-env2.py
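
Once Prod-env2.py has produced some output, the Parquet results can be inspected from a separate Python shell. The output path below is an assumption; use whatever path Prod-env2.py actually writes to:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InspectOutput").getOrCreate()

# Read whatever the aggregation job has written so far (path is assumed).
df = spark.read.parquet("POC/Output/")
df.show(truncate=False)
```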
