
Produced and shared synthetic e-commerce transaction data via Kafka. Teams cleaned, transformed, and analyzed the received data, culminating in insightful visualizations of transaction quality.


NewyorkMengHer/Ecommerce-Website-Transaction


Ecommerce Website Transaction

Description

The "Ecommerce Website Transaction" project is a collaboration between two teams to simulate and analyze e-commerce transaction data. Each team generates a large volume of simulated data from a predefined schema, streams it to the other team, cleans and transforms the data it receives, and then runs analytical queries on it. The results are visualized in Zeppelin and presented to an audience with diverse backgrounds.

Our team's primary objective is to analyze the other team's data for trends and patterns. Our generator also deliberately produces "bad data" by overwriting selected columns with unrelated values, which challenges the receiving team's cleaning and transformation steps.
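As a rough illustration of the "bad data" idea described above, the sketch below corrupts a row with a configurable probability by overwriting one chosen column with an unrelated value. It is not the project's actual generator: the object name `BadDataInjector`, the `junkValues` list, and the representation of a row as a `Map[String, String]` are all assumptions made for this example.

```scala
import scala.util.Random

object BadDataInjector {
  // Hypothetical junk values; the project's real replacement data is not documented here.
  val junkValues: Vector[String] = Vector("", "N/A", "???", "-1", "lorem ipsum")

  /** With probability `rate`, overwrite one of `targetColumns` with a junk value.
    * Otherwise the row is returned unchanged.
    */
  def maybeCorrupt(
      row: Map[String, String],
      targetColumns: Seq[String],
      rate: Double,
      rng: Random
  ): Map[String, String] = {
    if (targetColumns.isEmpty || rng.nextDouble() >= rate) row
    else {
      // Pick one column and one junk value at random.
      val column = targetColumns(rng.nextInt(targetColumns.length))
      val junk = junkValues(rng.nextInt(junkValues.length))
      row.updated(column, junk)
    }
  }
}
```

Calling `BadDataInjector.maybeCorrupt(row, Seq("qty", "price"), 0.03, new Random())` would corrupt roughly 3% of rows, which matches the rate listed under Features. Passing a seeded `Random` keeps runs reproducible.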

Click here to see the demo

Schema

The schema used to generate the transaction data includes:

  • order_id: Order ID
  • customer_id: Customer ID
  • customer_name: Customer Name
  • product_id: Product ID
  • product_name: Product Name
  • product_category: Product Category
  • payment_type: Payment Type
  • qty: Quantity Ordered
  • price: Price of Product
  • datetime: Date & Time when Order was Placed
  • country: Customer Country
  • city: Customer City
  • ecommerce_website_name: Site where Order was Placed
  • payment_txn_id: Payment Transaction ID
  • payment_txn_success: Payment Success/Failure
  • failure_reason: Reason for Payment Failure
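
One way to model this schema in Scala is a case class with one field per column. The sketch below is illustrative only: the field types (for example `Int` for `qty`, `BigDecimal` for `price`, `Option[String]` for `failure_reason`), the `TransactionCsv` helper, and the plain comma-joined output are assumptions, not the project's actual definitions.

```scala
// One field per schema column; the types are assumptions made for this sketch.
final case class Transaction(
    orderId: String,
    customerId: String,
    customerName: String,
    productId: String,
    productName: String,
    productCategory: String,
    paymentType: String,
    qty: Int,
    price: BigDecimal,
    datetime: String, // kept as text because the timestamp format is not specified
    country: String,
    city: String,
    ecommerceWebsiteName: String,
    paymentTxnId: String,
    paymentTxnSuccess: Boolean,
    failureReason: Option[String] // empty when the payment succeeded
)

object TransactionCsv {
  // Column names in the order given by the schema list.
  val header: Seq[String] = Seq(
    "order_id", "customer_id", "customer_name", "product_id", "product_name",
    "product_category", "payment_type", "qty", "price", "datetime", "country",
    "city", "ecommerce_website_name", "payment_txn_id", "payment_txn_success",
    "failure_reason"
  )

  /** Naive CSV rendering: no quoting, so field values must not contain commas. */
  def toCsvRow(t: Transaction): String = Seq(
    t.orderId, t.customerId, t.customerName, t.productId, t.productName,
    t.productCategory, t.paymentType, t.qty.toString, t.price.toString,
    t.datetime, t.country, t.city, t.ecommerceWebsiteName, t.paymentTxnId,
    t.paymentTxnSuccess.toString, t.failureReason.getOrElse("")
  ).mkString(",")
}
```

A real pipeline would likely need proper CSV quoting, since customer and product names can contain commas.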

Features

  • Generates over 2 million rows of transaction data spanning 10 years.
  • Uses base data on products, companies, and customers stored in files for transaction generation.
  • Highly customizable data generation.
  • Generates customers from over 20 different countries with region-accurate names.
  • Converts transaction prices from USD to the customer's local currency.
  • Introduces bad data at a rate of 3% for testing and validation.
  • Simulates logistic growth for each company at different rates.
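
Two of the features above, logistic growth and currency conversion, can be sketched in a few lines. The code uses the standard logistic curve N(t) = K / (1 + ((K - N0) / N0) * e^(-r t)); how the project actually parameterizes growth per company is not documented here, so the parameter names are assumptions. The exchange rates in `usdRates` are made-up placeholders, not real rates and not the project's data.

```scala
object GrowthAndCurrency {
  /** Standard logistic curve.
    * capacity = upper limit (K), initial = value at t = 0 (N0), rate = growth rate (r).
    */
  def logistic(t: Double, capacity: Double, initial: Double, rate: Double): Double = {
    require(initial > 0 && capacity > 0, "initial and capacity must be positive")
    capacity / (1.0 + ((capacity - initial) / initial) * math.exp(-rate * t))
  }

  // Placeholder USD-to-local multipliers keyed by country code (illustrative values only).
  val usdRates: Map[String, BigDecimal] = Map(
    "US" -> BigDecimal(1),
    "JP" -> BigDecimal(150),
    "DE" -> BigDecimal("0.9")
  )

  /** Converts a USD price to the customer's local currency, rounded to 2 decimals.
    * Returns None when no rate is known for the country.
    */
  def toLocal(priceUsd: BigDecimal, country: String): Option[BigDecimal] =
    usdRates.get(country).map { r =>
      (priceUsd * r).setScale(2, BigDecimal.RoundingMode.HALF_UP)
    }
}
```

Giving each company its own `rate` and `capacity` would produce order volumes that start slowly, accelerate, and then level off at different speeds over the simulated years.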

Workflow

  1. Both teams generate transaction data based on the schema.
  2. Data is streamed to the opposite team via Kafka and stored in a CSV file.
  3. Each team cleans and transforms the received data.
  4. Analytical queries are performed on the cleaned data.
  5. Results are visualized using Tableau and Zeppelin.
  6. Teams come together to share findings and present to a mixed audience.
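
Steps 3 and 4 of this workflow can be sketched with plain Scala collections standing in for Spark. This is a simplified illustration, not the project's Spark SQL code: it assumes the received CSV has 16 unquoted, comma-separated columns in the same order as the schema list (so `product_category`, `qty`, and `price` sit at indexes 5, 7, and 8), and the names `CleanAndAnalyze`, `CleanRow`, `parse`, and `revenueByCategory` are hypothetical.

```scala
import scala.util.Try

object CleanAndAnalyze {
  // Only the columns needed for the example query are kept.
  final case class CleanRow(productCategory: String, qty: Int, price: BigDecimal)

  /** Cleaning step: returns None for rows with the wrong column count,
    * an empty category, or a qty/price that is non-numeric or not positive.
    */
  def parse(line: String): Option[CleanRow] = {
    val cols = line.split(",", -1) // -1 keeps trailing empty fields
    if (cols.length != 16) None
    else {
      val category = cols(5).trim
      val qty = Try(cols(7).trim.toInt).toOption.filter(_ > 0)
      val price = Try(BigDecimal(cols(8).trim)).toOption.filter(_ > 0)
      if (category.isEmpty) None
      else for (q <- qty; p <- price) yield CleanRow(category, q, p)
    }
  }

  /** Analytical step: total revenue (price * qty) per product category. */
  def revenueByCategory(rows: Seq[CleanRow]): Map[String, BigDecimal] =
    rows.groupBy(_.productCategory).map { case (category, rs) =>
      category -> rs.map(r => r.price * r.qty).sum
    }
}
```

Given the received lines as a `Seq[String]`, `CleanAndAnalyze.revenueByCategory(lines.flatMap(CleanAndAnalyze.parse))` drops malformed rows and aggregates the rest. In Spark the same shape would map to a filter followed by a group-by aggregation.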

Technologies

  • Apache Spark
  • Spark SQL
  • Kafka
  • Scala 2.12.11
  • Zeppelin

Contributors

A big thank you to all our contributors who made this project possible:
