RammySekham/spark-kb: Spark standalone architecture, local architecture and reading Hadoop file formats, i.e. Avro, Parquet and ORC

Introduction

This repository contains code snippets and explanations covering the Hadoop file formats ORC, Parquet and Avro, as well as Spark's local and standalone architectures.

File Structure

Local_Mode_Architecture: Describes the local mode architecture, the different ways to use local mode, and memory allocation.

  Tip: To try different local mode settings, edit the conf file (located in the Spark folder) before running the notebook.
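In local mode the driver and the executor share a single JVM, so parallelism comes from the thread count in the master URL (`local`, `local[K]`, `local[*]`, or `local[K,F]` with a task retry count), and the heap that gets split up comes from spark.driver.memory. The sketch below illustrates both under stated assumptions: it uses the Spark 3.0 unified memory manager defaults (300 MB reserved, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5), and the function names are hypothetical. Real figures in the Spark UI come out slightly lower because the JVM reports a little less heap than -Xmx.

```python
import os
import re

RESERVED_MB = 300  # heap Spark sets aside before computing its memory regions


def local_threads(master: str) -> int:
    """Number of worker threads requested by a local-mode master URL."""
    if master == "local":
        return 1  # a single worker thread: no parallelism
    match = re.fullmatch(r"local\[(\*|\d+)(?:\s*,\s*\d+)?\]", master)
    if match is None:
        raise ValueError(f"not a local-mode master URL: {master!r}")
    if match.group(1) == "*":
        return os.cpu_count() or 1  # one thread per logical core
    threads = int(match.group(1))
    if threads < 1:
        raise ValueError("local mode needs at least one thread")
    return threads


def unified_memory_mb(heap_mb: float, memory_fraction: float = 0.6,
                      storage_fraction: float = 0.5) -> dict:
    """Approximate split of one JVM heap under Spark's unified memory model."""
    if heap_mb < RESERVED_MB * 1.5:
        raise ValueError("Spark requires at least 450 MB of heap")
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction    # shared by execution and storage
    storage = unified * storage_fraction  # cached data immune to eviction
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,         # user data structures, UDF objects
        "storage": storage,
        "execution": unified - storage,   # shuffles, joins, sorts, aggregations
    }


def build_local_session(master: str = "local[2]", driver_memory: str = "2g"):
    """Start (or reuse) a local SparkSession; requires PySpark."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    return (
        SparkSession.builder.master(master)
        .appName("local-mode-sketch")  # hypothetical application name
        # Only honoured if the JVM has not started yet in this process.
        .config("spark.driver.memory", driver_memory)
        .getOrCreate()
    )
```

For example, `unified_memory_mb(2048)` (spark.driver.memory = 2g) leaves roughly 1049 MB shared between execution and storage. Because spark.driver.memory is read only when the JVM starts, setting it in the conf file (spark-defaults.conf is assumed here) is more reliable than setting it from an already running notebook.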

File_formats: Describes the options available in Spark for reading different file formats, with an evaluation of each option and each file format.

  Tip: Some data formats need external packages, available as jar files in the Maven repository. Download them and save them in the jars folder (located in the Spark folder) before running the notebook.
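As a minimal sketch, assuming PySpark 3.0.1 built with Scala 2.12 (matching the spark-3.0.1-bin-hadoop2.7 build used in this project): Parquet and ORC are read and written with built-in data sources, while Avro needs the separate spark-avro module. Setting spark.jars.packages is shown as an alternative to copying the jar into the jars folder by hand; it downloads the package from Maven at start-up, so it needs network access. The helper names `avro_package`, `required_package`, `roundtrip` and `roundtrip_demo` are hypothetical.

```python
import os
import tempfile
from typing import Optional

# Parquet and ORC ship with Spark; Avro lives in the separate spark-avro module.
BUILT_IN_FORMATS = {"parquet", "orc"}


def avro_package(spark_version: str = "3.0.1", scala_version: str = "2.12") -> str:
    """Maven coordinate of the spark-avro module matching a given Spark build."""
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


def required_package(fmt: str, spark_version: str = "3.0.1",
                     scala_version: str = "2.12") -> Optional[str]:
    """Extra Maven package a format needs, or None if it is built in.

    Only the three formats covered in file_formats.ipynb are handled here.
    """
    fmt = fmt.lower()
    if fmt in BUILT_IN_FORMATS:
        return None
    if fmt == "avro":
        return avro_package(spark_version, scala_version)
    raise ValueError(f"format not covered by this sketch: {fmt!r}")


def roundtrip(spark, fmt: str, path: str):
    """Write a tiny DataFrame in `fmt` to `path`, then read it back."""
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.write.mode("overwrite").format(fmt).save(path)
    return spark.read.format(fmt).load(path)


def roundtrip_demo() -> None:
    """Round-trip all three formats on a local session (needs PySpark and network)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    spark = (
        SparkSession.builder.master("local[2]")
        # Fetch spark-avro at start-up instead of copying the jar into jars/.
        .config("spark.jars.packages", avro_package())
        .getOrCreate()
    )
    try:
        with tempfile.TemporaryDirectory() as tmp:
            for fmt in ("parquet", "orc", "avro"):
                result = roundtrip(spark, fmt, os.path.join(tmp, fmt))
                print(fmt, result.count(), "rows")
    finally:
        spark.stop()
```

`spark.read.parquet(path)` and `spark.read.orc(path)` are equivalent shortcuts for the two built-in formats; Avro has no such shortcut in PySpark 3.0, so `format("avro")` is used throughout for symmetry.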

Standalone_Mode_Architecture: Describes setting up a Spark cluster in standalone mode on a single machine, including the environment setup and memory allocation.

  Tip: Set the configuration in spark-env.cmd (Windows) and start the master and worker processes with the PowerShell commands in powershell_commands.ps1 before running the notebook.
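A rough sketch of how a standalone master packs one application's executors onto workers may help when choosing memory settings. It assumes each worker's resources come from SPARK_WORKER_CORES and SPARK_WORKER_MEMORY in spark-env.cmd, that spark.executor.cores is set explicitly, and that the workers are idle. It is a simplified model that ignores spreading order and resources held by other applications. The function names are hypothetical, and spark://localhost:7077 assumes the master runs on the same machine on its default port.

```python
from typing import Optional


def executors_per_worker(worker_cores: int, worker_memory_mb: int,
                         executor_cores: int, executor_memory_mb: int) -> int:
    """How many executors of one application fit on a single idle worker."""
    if executor_cores < 1 or executor_memory_mb < 1:
        raise ValueError("executor cores and memory must be positive")
    # An executor is launched only while the worker still has both enough
    # free cores and enough free memory, so the scarcer resource wins.
    return min(worker_cores // executor_cores,
               worker_memory_mb // executor_memory_mb)


def total_executors(workers: int, worker_cores: int, worker_memory_mb: int,
                    executor_cores: int, executor_memory_mb: int,
                    cores_max: Optional[int] = None) -> int:
    """Executors granted across identical idle workers, capped by spark.cores.max."""
    total = workers * executors_per_worker(worker_cores, worker_memory_mb,
                                           executor_cores, executor_memory_mb)
    if cores_max is not None:
        # spark.cores.max limits the total cores the application may claim.
        total = min(total, cores_max // executor_cores)
    return total


def build_standalone_session(master_url: str = "spark://localhost:7077",
                             executor_cores: int = 1,
                             executor_memory: str = "1g",
                             cores_max: Optional[int] = None):
    """Connect a notebook driver to a running standalone master (needs PySpark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = (
        SparkSession.builder.master(master_url)
        .appName("standalone-sketch")  # hypothetical application name
        .config("spark.executor.cores", str(executor_cores))
        .config("spark.executor.memory", executor_memory)
    )
    if cores_max is not None:
        builder = builder.config("spark.cores.max", str(cores_max))
    return builder.getOrCreate()
```

For instance, a single worker with 4 cores and 4096 MB running executors of 2 cores and 1024 MB is limited by cores, not memory: `executors_per_worker(4, 4096, 2, 1024)` gives 2 executors, leaving 2 GB of worker memory unused.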

Tools and Environment

spark-3.0.1-bin-hadoop2.7, pyspark library, Jupyter Notebook, Windows 10

How to use the project

The best way to learn a tool is hands-on practice. Through this project I explored the three areas of Spark listed above, challenging myself along the way to understand and question every related 'why'. I gathered information from multiple sources, mainly Stack Overflow and the Apache Spark documentation, to clear up doubts. There may be gaps that I have not captured because of my own prior familiarity with the topics, but the project should serve as a useful supplement or revision guide for Spark learners and users.

Repository contents

Data, .gitignore, LICENSE, Local_Mode_Architecture.ipynb, README.md, Standalone_Mode_Architecture.ipynb, file_formats.ipynb, powershell_commands.ps1, spark-env.cmd
