anishchapagain/PySpark

PySpark

PySpark - Big Data Analysis

PySpark is an interface for Apache Spark in Python.

It not only allows you to write Spark applications using Python APIs, but also provides the PySpark shell for interactively analyzing your data in a distributed environment.

The PySpark library lets you apply SQL-like analysis to large amounts of structured or semi-structured data.

PySparkSQL is a wrapper over the PySpark core that lets you run SQL queries. It can also be connected to Apache Hive, in which case HiveQL can be used as well.

PySparkSQL introduced the DataFrame, a tabular representation of structured data that is similar to a table in a relational database management system.

PySpark SQL is a Spark module that integrates relational processing with Spark's functional programming API. Data can be extracted with SQL queries, written much as they would be in standard SQL.


Apache Spark: https://spark.apache.org/

PySpark 3.2.1 documentation: https://spark.apache.org/docs/latest/api/python/

Apache Spark Architecture Explained in Detail: https://www.projectpro.io/article/apache-spark-architecture-explained-in-detail/338

Introduction to Spark with Python: Spark Architecture and Components Explained in Detail: https://medium.datadriveninvestor.com/introduction-to-spark-with-python-spark-architecture-and-components-explained-in-detail-54e2ba09d6fe

Pandas to PySpark in 6 Examples: https://towardsdatascience.com/pandas-to-pyspark-in-6-examples-bd8ab825d389

A journey from Pandas to Spark Data Frames - Indellient: https://www.indellient.com/blog/a-journey-from-pandas-to-spark-data-frames/

How to use Hive and MySql in Pyspark along with some handy transformations: https://sharmashorya1996.medium.com/how-to-use-hive-and-mysql-in-pyspark-along-with-some-handy-transformations-620337a05437

Apache Spark: what makes it a better tool for ETL: https://jay-reddy.medium.com/apache-spark-what-makes-it-a-better-tool-for-etl-15e74267ac8a

Spark by {Examples}: https://sparkbyexamples.com/

Comparing performance between Apache Spark and PySpark: https://medium.com/@sahandfarazzarrinkoub/comparing-performance-between-apache-spark-and-pyspark-63d68c067a55
