divya-anand21/customer_demographic_hive_analysis

HIVE Analytics on Customer Demographics Data

Business Problem: Company XYZ wants to analyse the year-to-date (YTD) total revenue of its visiting customers, using AdventureWorks, a sample database of retail sales data. For this analysis, the company wants to use only the customer, individual and credit card details. These datasets contain bulk, raw data that must be cleaned and transformed before it can be used for analytics and for extracting insights.

Approach:

  • The source data is stored in a MySQL database.
  • Import the data from MySQL into HDFS using Sqoop.
  • Create Hive tables and load the imported data into them.
  • Read the data from Hive into Spark and clean it.
  • Write the cleaned data back to Hive and run the analytics there.
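The Sqoop step could look roughly like the sketch below, which builds one `sqoop import` command per source table. This is a minimal sketch and not the project's actual script. The JDBC URL, database name, user, password-file path and HDFS base directory are all hypothetical. The comma delimiter and the single mapper are assumptions chosen to keep the HDFS output simple to load into Hive.

```python
import shlex
import subprocess

# Hypothetical connection details: replace with your own MySQL host, database and user.
JDBC_URL = "jdbc:mysql://localhost:3306/adventureworks"
USERNAME = "hive_user"
PASSWORD_FILE = "/user/hive_user/.mysql.password"  # hypothetical file holding only the password
HDFS_BASE = "/user/hive_user/adventureworks"       # hypothetical HDFS landing directory

# The three source tables named in this README.
TABLES = ["Customer", "Individual", "CreditCard"]


def sqoop_import_cmd(table: str) -> list[str]:
    """Build a `sqoop import` command that copies one MySQL table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", USERNAME,
        "--password-file", PASSWORD_FILE,
        "--table", table,
        "--target-dir", f"{HDFS_BASE}/{table.lower()}",
        "--fields-terminated-by", ",",  # assumed delimiter; the Hive table must use the same one
        "--num-mappers", "1",           # single mapper: one output file, no split column required
    ]


def run_imports(dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them with the local sqoop client."""
    for table in TABLES:
        cmd = sqoop_import_cmd(table)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_imports(dry_run=True)
```

Running the file only prints the commands. Calling `run_imports(dry_run=False)` executes them and requires the `sqoop` client on the PATH. `--password-file` keeps the password off the command line, unlike `--password`. If any column can contain commas, a different `--fields-terminated-by` value would be safer.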


Hadoop Ecosystem

  • Sqoop
  • Hive
  • Spark
  • PySpark

Data Source Description

Customer Table: This table contains the customer records.

[Image: Customer table]
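Loading the Sqoop output for this table into Hive might be sketched as follows. The column list is an assumption based on the classic AdventureWorks `Sales.Customer` table and should be checked against the MySQL schema actually used. The table name `customer` and any HDFS directory passed to `load_data_sql` are hypothetical. The helpers only build HiveQL strings, so the output can be pasted into Beeline or passed to `spark.sql(...)`.

```python
# Assumed column list (classic AdventureWorks Sales.Customer); verify against the MySQL schema.
CUSTOMER_COLUMNS = [
    ("CustomerID", "INT"),
    ("TerritoryID", "INT"),
    ("AccountNumber", "STRING"),
    ("CustomerType", "STRING"),
    ("rowguid", "STRING"),
    ("ModifiedDate", "STRING"),  # kept as text; conversion is left to the cleaning step
]


def create_table_sql(table: str, columns: list[tuple[str, str]]) -> str:
    """HiveQL for a text table whose delimiter matches the Sqoop import (comma assumed)."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE"
    )


def load_data_sql(table: str, hdfs_dir: str) -> str:
    """HiveQL that moves (not copies) the files in hdfs_dir into the table."""
    return f"LOAD DATA INPATH '{hdfs_dir}' OVERWRITE INTO TABLE {table}"


print(create_table_sql("customer", CUSTOMER_COLUMNS))
```

Hive text tables are schema-on-read, so a value that does not parse as the declared type is read as `NULL`. Keeping `ModifiedDate` as `STRING` avoids hiding such values until they can be inspected during cleaning.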

Individual Table: This table contains information about individual customers.

[Image: Individual table]

Credit Card Table: This table contains the credit card details.

[Image: CreditCard table]
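The Spark cleaning step could be sketched in PySpark as shown below. The sketch applies generic cleaning to each Hive table:

* trim strings
* turn placeholder strings into real nulls
* drop rows without a key
* drop exact duplicates

It then saves the result as a new Hive table. The table names, key columns, the `_clean` suffix and the list of placeholder strings are assumptions, not the project's actual rules. The placeholder list includes `null` because Sqoop's text import, by default, writes SQL `NULL` as that literal string unless `--null-string`/`--null-non-string` are set.

```python
# Hypothetical Hive table names mapped to their (assumed) key columns.
RAW_TABLES = {
    "customer": "CustomerID",
    "individual": "CustomerID",
    "creditcard": "CreditCardID",
}

# String values treated as missing after trimming (assumption).
NULL_TOKENS = ("null", "NULL", "\\N", "")


def cleaned_table_name(table: str) -> str:
    """Name of the output Hive table for a raw table (hypothetical convention)."""
    return f"{table}_clean"


def get_spark(app_name: str = "customer_demographics"):
    """Create (or reuse) a SparkSession that can read and write Hive tables."""
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).enableHiveSupport().getOrCreate()


def clean_table(spark, table: str, key: str):
    """Read a raw Hive table, apply generic cleaning and save it back to Hive."""
    from pyspark.sql import functions as F

    df = spark.table(table)
    for name, dtype in df.dtypes:
        if dtype == "string":
            trimmed = F.trim(F.col(name))
            # Placeholder strings become real nulls; other values keep their trimmed form.
            df = df.withColumn(
                name, F.when(trimmed.isin(*NULL_TOKENS), F.lit(None)).otherwise(trimmed)
            )
    # Rows without a key cannot be joined later, so drop them, then remove exact duplicates.
    df = df.dropna(subset=[key]).dropDuplicates()
    df.write.mode("overwrite").saveAsTable(cleaned_table_name(table))
    return df


def clean_all(spark) -> None:
    """Clean every raw table listed in RAW_TABLES."""
    for table, key in RAW_TABLES.items():
        clean_table(spark, table, key)
```

On a cluster where the raw Hive tables exist, `clean_all(get_spark())` would run the whole step. `pyspark` is imported inside the functions, so the module can be loaded and inspected without Spark installed. Writing to a separate `_clean` table also avoids overwriting a table while it is being read.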

Project Architecture

[Image: sql_hive_architecture]
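The final analytics step, which answers the year-to-date revenue question from the business problem, might be a HiveQL aggregation like the one below. To keep the sketch runnable without a cluster, it uses Python's built-in `sqlite3` as a local stand-in for Hive, with a few made-up rows. The query itself uses only standard joins and aggregates that HiveQL also accepts.

The table names `customer_clean` and `individual_clean` and the columns `TerritoryID` and `TotalPurchaseYTD` are assumptions. The sketch presumes the cleaning step produced a numeric `TotalPurchaseYTD` per customer. In the classic AdventureWorks schema, this figure is stored inside the `Individual` table's `Demographics` XML.

```python
import sqlite3

# HiveQL-compatible query; table and column names are hypothetical.
YTD_REVENUE_BY_TERRITORY = """
SELECT c.TerritoryID,
       SUM(i.TotalPurchaseYTD) AS ytd_revenue,
       COUNT(*) AS customers
FROM customer_clean c
JOIN individual_clean i ON c.CustomerID = i.CustomerID
GROUP BY c.TerritoryID
ORDER BY ytd_revenue DESC
"""


def demo() -> list[tuple]:
    """Run the query against tiny made-up tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_clean (CustomerID INTEGER, TerritoryID INTEGER)")
    conn.execute("CREATE TABLE individual_clean (CustomerID INTEGER, TotalPurchaseYTD REAL)")
    conn.executemany("INSERT INTO customer_clean VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
    conn.executemany(
        "INSERT INTO individual_clean VALUES (?, ?)", [(1, 100.0), (2, 50.5), (3, 200.0)]
    )
    rows = conn.execute(YTD_REVENUE_BY_TERRITORY).fetchall()
    conn.close()
    return rows


print(demo())  # [(2, 200.0, 1), (1, 150.5, 2)]
```

On the cluster, the same string could be run with `spark.sql(YTD_REVENUE_BY_TERRITORY).show()` or in Beeline. Grouping by another demographic column would only change the `SELECT` and `GROUP BY` lines.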

About

SQL Hive Analysis on the AdventureWorks dataset
