PrinceOfCoding007/ETL-Process-Java

ETL Project: Users & Products with SQL Server

Overview

This project is a Java-based ETL (Extract, Transform, Load) pipeline that moves data from external APIs into a SQL Server database. It also supports basic CRUD operations on product data and logs all operations for verification.

It demonstrates how to:

  • Extract data from APIs
  • Transform and filter data
  • Load data into database tables
  • Handle duplicates and updates automatically
  • Log and visualize data

Flow of Data

1️⃣ Extract

  • User data and Product data are fetched from external APIs.
  • The APIs return JSON responses, which are converted into Java objects.
  • This process is handled by:
    • APIExtractor.java → Extracts user data
    • ProductAPIExtractor.java → Extracts product data

At this stage, data is still in memory and ready for transformation.
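A minimal sketch of the fetch half of this step, using the JDK's built-in `java.net.http` client (Java 11+). The class name `ApiFetchSketch`, its method names, and the URL in the usage note are hypothetical; the actual code in APIExtractor.java may look different. Converting the JSON text into Java objects is not shown, because the JDK has no built-in JSON parser and this README does not say which library the project uses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not the project's actual extractor.
class ApiFetchSketch {

    // Builds a GET request that asks the API for a JSON response.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body as a String.
    // Any status other than 200 is treated as a failure.
    static String fetchJson(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Usage would be along the lines of `ApiFetchSketch.fetchJson(HttpClient.newHttpClient(), "https://example.com/users")`, where the URL is a placeholder for the real API endpoint.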


2️⃣ Transform

  • The extracted user data may need filtering or modifications before loading.
  • For example, only users with a valid city are retained.
  • This transformation ensures the database only receives clean and relevant data.
  • Transformation is handled by:
    • DataTransformer.java → Applies filtering and transformation logic
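The "valid city" filter could be sketched as follows. The `SketchUser` record and its fields are assumptions made for illustration (records need Java 16+), and "valid" is interpreted here as "not null and not blank"; DataTransformer.java may define it differently.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical user shape; the real model class may have other fields.
record SketchUser(int id, String name, String city) {}

class CityFilterSketch {

    // Keeps only users whose city is present and not blank.
    static List<SketchUser> keepUsersWithCity(List<SketchUser> users) {
        return users.stream()
                .filter(u -> u.city() != null && !u.city().isBlank())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SketchUser> extracted = List.of(
                new SketchUser(1, "Ada", "London"),
                new SketchUser(2, "Bob", ""),
                new SketchUser(3, "Cy", null));
        List<SketchUser> cleaned = keepUsersWithCity(extracted);
        System.out.println(cleaned.size() + " of " + extracted.size() + " users kept");
    }
}
```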

3️⃣ Load

  • Transformed data is written to SQL Server tables.
  • Users are inserted directly into the Users table.
  • Products use a MERGE strategy:
    • Existing product records are updated
    • New products are inserted automatically
  • Loading is handled by:
    • DatabaseLoader.java → Loads user data
    • ProductDAO.java → Handles product data, including insert, update, and delete

This step moves data from Java objects into permanent storage in the database.
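The MERGE strategy for products might look roughly like this over JDBC. The table name `Products`, the columns `id`, `title`, and `price`, and the `SketchProduct` record are all assumptions; the real schema and the statement in ProductDAO.java may differ. Note that SQL Server requires a MERGE statement to end with a semicolon.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical product shape, for illustration only.
record SketchProduct(int id, String title, BigDecimal price) {}

class ProductMergeSketch {

    // T-SQL upsert: update the row when the id already exists, insert it otherwise.
    static final String MERGE_SQL = String.join("\n",
            "MERGE INTO Products AS target",
            "USING (VALUES (?, ?, ?)) AS source (id, title, price)",
            "ON target.id = source.id",
            "WHEN MATCHED THEN",
            "    UPDATE SET title = source.title, price = source.price",
            "WHEN NOT MATCHED THEN",
            "    INSERT (id, title, price)",
            "    VALUES (source.id, source.title, source.price);");

    // Binds one product to the statement and runs it.
    // Returns the number of rows affected as reported by the driver.
    static int upsert(Connection conn, SketchProduct p) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(MERGE_SQL)) {
            ps.setInt(1, p.id());
            ps.setString(2, p.title());
            ps.setBigDecimal(3, p.price());
            return ps.executeUpdate();
        }
    }
}
```

Because the match is on `id`, running the pipeline twice with the same API data updates rows in place instead of producing duplicates.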


4️⃣ CRUD Operations on Products

  • After loading, products can be read, updated, or deleted as needed.
  • ProductDAO.java provides methods to:
    • Read all products
    • Insert or update products (using MERGE)
    • Delete products by ID
  • A separate update step is optional, because the MERGE statement already updates existing records.
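The read and delete operations could be sketched with plain JDBC as below. The SQL text, the column names, the `ProductRow` record, and the method names are hypothetical and only illustrate the shape such DAO methods usually take; they are not copied from ProductDAO.java.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical row shape, for illustration only.
record ProductRow(int id, String title, BigDecimal price) {}

class ProductCrudSketch {

    static final String SELECT_ALL_SQL = "SELECT id, title, price FROM Products ORDER BY id";
    static final String DELETE_SQL = "DELETE FROM Products WHERE id = ?";

    // Reads every product row into a list.
    static List<ProductRow> readAll(Connection conn) throws SQLException {
        List<ProductRow> products = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SELECT_ALL_SQL);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                products.add(new ProductRow(
                        rs.getInt("id"),
                        rs.getString("title"),
                        rs.getBigDecimal("price")));
            }
        }
        return products;
    }

    // Deletes one product by ID; returns true if a row was actually removed.
    static boolean deleteById(Connection conn, int id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_SQL)) {
            ps.setInt(1, id);
            return ps.executeUpdate() > 0;
        }
    }
}
```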

5️⃣ Logging & Visualization

  • All steps in the pipeline are logged for tracking and debugging.
  • TablePrinter.java prints formatted tables of users and products to the console.
  • Logs show:
    • API data received
    • Records after transformation
    • Data successfully loaded into SQL Server
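Console table output of the kind described here can be produced by padding each column to its widest cell. The sketch below is an assumption about how such a printer might work; the class name, method name, and layout are hypothetical and may not match TablePrinter.java.

```java
import java.util.List;

class TablePrintSketch {

    // Formats headers and rows as a left-aligned text table with a separator line.
    // Every row is expected to have as many cells as there are headers.
    static String formatTable(List<String> headers, List<List<String>> rows) {
        // Each column is as wide as its longest cell (at least 1 character).
        int[] widths = new int[headers.size()];
        for (int c = 0; c < widths.length; c++) {
            widths[c] = Math.max(1, headers.get(c).length());
        }
        for (List<String> row : rows) {
            for (int c = 0; c < widths.length; c++) {
                widths[c] = Math.max(widths[c], row.get(c).length());
            }
        }

        StringBuilder sb = new StringBuilder();
        appendRow(sb, headers, widths);
        for (int c = 0; c < widths.length; c++) {
            if (c > 0) sb.append("-+-");
            sb.append("-".repeat(widths[c]));
        }
        sb.append('\n');
        for (List<String> row : rows) {
            appendRow(sb, row, widths);
        }
        return sb.toString();
    }

    // Appends one row, padding each cell to its column width.
    private static void appendRow(StringBuilder sb, List<String> cells, int[] widths) {
        for (int c = 0; c < widths.length; c++) {
            if (c > 0) sb.append(" | ");
            sb.append(String.format("%-" + widths[c] + "s", cells.get(c)));
        }
        sb.append('\n');
    }

    public static void main(String[] args) {
        System.out.print(formatTable(
                List.of("id", "city"),
                List.of(List.of("1", "London"), List.of("22", "Rome"))));
    }
}
```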

Key Takeaways

  • Data moves in a linear flow: Extract → Transform → Load → CRUD.
  • MERGE handles duplicate products automatically, so no manual existence check is needed before writing a product.
  • Each Java file has a clear responsibility:
    • Extractors fetch API data
    • Transformer filters or modifies data
    • Loaders / DAO move data to the database and manage CRUD operations
    • TablePrinter helps visualize data
  • Logging ensures transparency and aids debugging for new developers.
