
GooglePlayStore_Analysis_Spark_BigData

Notebook

https://github.com/LucasTrenzado/GooglePlayStore_Analysis_Spark_BigData/blob/main/GooglePlayStore%20Big%20Data%20Tools%20Analysis.ipynb

Abstract

The objective of this project is to dig into a real data problem from a technical point of view, in order to learn and understand a potential solution built with Big Data, Data Engineering and Analytics tools: NiFi for data ingestion, HDFS for storage, and Spark for processing.

The first step was to use NiFi to ingest the data into HDFS from sources such as CSV files (web-scraped data on about 10k Play Store apps, collected for analysing the Android market). This ensured that the data was properly formatted before being stored in HDFS for further processing. Once the data was in HDFS, I used Spark to process and analyze it: Spark SQL for SQL-like queries and statistics, and Spark DataFrames for transformations and aggregations. Finally, I used Spark to generate insights and visualizations from the processed data.

Overall, this project gave me hands-on experience with NiFi, HDFS, and Spark for processing and analyzing large datasets. I was also able to draw insights from the data that could inform decision-making and generate business value. Play Store app data has enormous potential to drive app-making businesses to success: developers and app owners can draw actionable insights to improve the performance of their current and future apps. The following analysis focuses on describing the current status, trends and statistics of the market.
