Business Problem: Company XYZ wants to analyse the year-to-date total revenue from its visiting customers, using the retail sales database called Adventure Works. For this analysis, the company wants to use only the customer, individual and credit card details. These datasets consist of bulk, raw data that must be cleaned and transformed before it can be used for analytics and for drawing insights.
Approach:
Data is present in a MySQL database.
Load the data from MySQL into HDFS using Sqoop.
Create Hive tables and load the data into them.
Read the data from Hive in Spark and perform data cleaning.
Load the cleaned data back into Hive and perform analytics.
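The first two loading steps above can be sketched in Python as helpers that only build the Sqoop command line and the Hive DDL text. This is a minimal sketch and not the project's actual scripts: the JDBC URL, user, password file, HDFS directory, table names and column names are all hypothetical placeholders chosen for illustration. The Sqoop options used (--connect, --username, --password-file, --table, --target-dir, --fields-terminated-by, --num-mappers) are standard `sqoop import` options.

```python
import shlex

# Hypothetical connection details; none of these values come from the project.
JDBC_URL = "jdbc:mysql://localhost:3306/adventureworks"
DB_USER = "etl_user"
PASSWORD_FILE = "/user/etl/.mysql_password"
HDFS_BASE_DIR = "/user/etl/adventureworks"

# Assumed MySQL table names for the three datasets.
TABLES = ["customer", "individual", "creditcard"]


def sqoop_import_command(table):
    """Return the argument list for a `sqoop import` of one MySQL table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", DB_USER,
        "--password-file", PASSWORD_FILE,
        "--table", table,
        "--target-dir", f"{HDFS_BASE_DIR}/{table}",
        "--fields-terminated-by", ",",
        "--num-mappers", "1",
    ]


def hive_external_table_ddl(table, columns):
    """Return a CREATE EXTERNAL TABLE statement over the Sqoop output directory.

    `columns` is a list of (name, hive_type) pairs supplied by the caller.
    """
    column_sql = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_sql}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{HDFS_BASE_DIR}/{table}'"
    )


if __name__ == "__main__":
    # Print one shell-ready Sqoop command per table.
    for table in TABLES:
        print(shlex.join(sqoop_import_command(table)))
    # Column names here are illustrative only, not the real schema.
    print(hive_external_table_ddl(
        "customer", [("customerid", "INT"), ("accountnumber", "STRING")]
    ))
```

The helpers only produce text, so the commands can be reviewed before being run on a cluster. An external Hive table is used in this sketch so that Hive reads the files where Sqoop wrote them; a managed table with a separate load step would also fit the approach described above.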
- Sqoop
- Hive
- Spark
- PySpark
Customer Table: This table contains all customer-related data.
Individual Table: This table contains all data on individuals.
Credit Card Table: This table contains all credit card data.