📦 CSV Batch Upload Service (Spring Boot + Spring Batch)

This project provides an asynchronous CSV upload and processing service using Spring Boot and Spring Batch.
It supports large CSV files, dynamic table & column mapping (via configuration), and background batch processing with job tracking.


🚀 Features

✅ Upload large CSV files (up to 200MB by default)
✅ Asynchronous batch processing — API returns immediately
✅ Job progress/status tracking through REST APIs
✅ Dynamic table name and column mapping (from properties)
✅ Fault-tolerant batch processing with configurable chunk size and thread pool
✅ Auto database schema creation for Spring Batch metadata


⚙️ Tech Stack

  • Java 17+
  • Spring Boot 3+
  • Spring Batch
  • MySQL
  • HikariCP
  • Maven


⚙️ Configuration (application.properties)

# ===============================
# = DataSource Configuration
# ===============================
spring.datasource.url=jdbc:mysql://localhost:3306/commonUtils?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC
spring.datasource.username=root
spring.datasource.password=root
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

# HikariCP Connection Pool
spring.datasource.hikari.maximum-pool-size=20

# ===============================
# = Spring Batch Configuration
# ===============================
# Do not auto-run jobs on startup
spring.batch.job.enabled=false
spring.batch.jdbc.initialize-schema=always

# ===============================
# = Custom App Configuration
# ===============================
# Records per transaction batch
app.csv.chunk-size=1000
# Parallel workers for the batch job
app.csv.thread-pool-size=4

# Allow large uploads
spring.servlet.multipart.max-file-size=200MB
spring.servlet.multipart.max-request-size=200MB

# ===============================
# = Dynamic Table Mapping
# ===============================
app.csv.table-name=employee_data
app.csv.columns=id,firstName,lastName

📝 Notes:

  • Change app.csv.table-name to match your target database table.
  • app.csv.columns must exactly match both your CSV header and your database column names.
  • The service will automatically generate SQL like:
    INSERT INTO employee_data (id, firstName, lastName)
    VALUES (:id, :firstName, :lastName)
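
For reference, here is a minimal sketch of how such a statement can be assembled from the two properties. The class and method names below are illustrative only, not this repository's actual code:

import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper: builds a named-parameter INSERT from app.csv.table-name
// and app.csv.columns. Illustrative only, not this project's actual class.
public class InsertSqlBuilder {

    public static String build(String tableName, String columnsCsv) {
        String[] columns = Arrays.stream(columnsCsv.split(","))
                .map(String::trim)
                .toArray(String[]::new);
        String columnList = String.join(", ", columns);
        String paramList = Arrays.stream(columns)
                .map(c -> ":" + c)
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + tableName
                + " (" + columnList + ") VALUES (" + paramList + ")";
    }

    public static void main(String[] args) {
        // Prints: INSERT INTO employee_data (id, firstName, lastName) VALUES (:id, :firstName, :lastName)
        System.out.println(build("employee_data", "id,firstName,lastName"));
    }
}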

💾 Database Requirements

Your database must contain a table whose columns match those defined in app.csv.columns. Example:

CREATE TABLE employee_data (
    id INT PRIMARY KEY,
    firstName VARCHAR(100),
    lastName VARCHAR(100)
);

Also create the job_status table, which the service uses to track job progress:

CREATE TABLE job_status (
    job_id VARCHAR(64) PRIMARY KEY,
    upload_id VARCHAR(64),
    status VARCHAR(32),
    processed_rows BIGINT DEFAULT 0,
    failed_rows BIGINT DEFAULT 0,
    started_at DATETIME,
    ended_at DATETIME,
    last_updated DATETIME
);
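
One way such a table can be kept up to date is through a Spring Batch JobExecutionListener. The sketch below is an assumption about the wiring (the class name and exact SQL are illustrative, not the project's actual JobStatusService):

import java.time.LocalDateTime;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

// Hypothetical listener that mirrors job lifecycle events into job_status.
@Component
public class JobStatusListener implements JobExecutionListener {

    private final JdbcTemplate jdbcTemplate;

    public JobStatusListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Record a STARTED row keyed by the uploadId job parameter
        jdbcTemplate.update(
                "INSERT INTO job_status (job_id, upload_id, status, started_at, last_updated) VALUES (?, ?, ?, ?, ?)",
                String.valueOf(jobExecution.getJobId()),
                jobExecution.getJobParameters().getString("uploadId"),
                "STARTED", LocalDateTime.now(), LocalDateTime.now());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Flip the row to the final batch status (COMPLETED, FAILED, ...)
        jdbcTemplate.update(
                "UPDATE job_status SET status = ?, ended_at = ?, last_updated = ? WHERE job_id = ?",
                jobExecution.getStatus().name(), LocalDateTime.now(), LocalDateTime.now(),
                String.valueOf(jobExecution.getJobId()));
    }
}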

Spring Batch will also create the following metadata tables automatically (for tracking job executions):

BATCH_JOB_INSTANCE
BATCH_JOB_EXECUTION
BATCH_JOB_EXECUTION_PARAMS
BATCH_STEP_EXECUTION
BATCH_STEP_EXECUTION_CONTEXT
BATCH_JOB_EXECUTION_CONTEXT

🔗 API Endpoints

1️⃣ Upload CSV file (Async)

POST /api/uploadAsync

Uploads the CSV and triggers background processing.

Request:

  • Content-Type: multipart/form-data
  • Form Field: file

Example (Postman):

POST http://localhost:8080/api/uploadAsync
Body → form-data:
  file → <select your CSV file>

Response:

{
  "jobId": "3",
  "uploadId": "7e4d6b5a-47a9-47a7-84aa-77aefc734b64"
}

🕒 The job runs asynchronously in the background. You can track its status using the endpoint below.
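
A rough sketch of what this endpoint could look like follows. It assumes a JobLauncher backed by an asynchronous TaskExecutor (so run() returns before the job finishes); class and parameter names are illustrative, not this repository's actual code:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.UUID;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

// Hypothetical controller sketch; not this project's actual class.
@RestController
@RequestMapping("/api")
public class UploadController {

    private final JobLauncher asyncJobLauncher; // configured with a TaskExecutor
    private final Job csvImportJob;

    public UploadController(JobLauncher asyncJobLauncher, Job csvImportJob) {
        this.asyncJobLauncher = asyncJobLauncher;
        this.csvImportJob = csvImportJob;
    }

    @PostMapping("/uploadAsync")
    public Map<String, String> uploadAsync(@RequestParam("file") MultipartFile file) throws Exception {
        // Persist the upload so the batch job can stream it later
        String uploadId = UUID.randomUUID().toString();
        Path target = Paths.get("uploads", uploadId + ".csv");
        Files.createDirectories(target.getParent());
        file.transferTo(target);

        // Launch asynchronously; the HTTP response does not wait for completion
        JobExecution execution = asyncJobLauncher.run(csvImportJob, new JobParametersBuilder()
                .addString("uploadId", uploadId)
                .addString("filePath", target.toString())
                .toJobParameters());

        return Map.of("jobId", String.valueOf(execution.getJobId()), "uploadId", uploadId);
    }
}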


2️⃣ Check Job Status

GET /api/job/async/{uploadId}

Retrieves the current job status (STARTED, COMPLETED, FAILED, etc.).

Example:

GET http://localhost:8080/api/job/async/7e4d6b5a-47a9-47a7-84aa-77aefc734b64

Response:

{
  "jobId": "3",
  "uploadId": "7e4d6b5a-47a9-47a7-84aa-77aefc734b64",
  "status": "COMPLETED",
  "exitCode": "COMPLETED",
  "startTime": "2025-11-02T11:12:23",
  "endTime": "2025-11-02T11:12:41"
}
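
A minimal sketch of the lookup, assuming it reads the job_status table shown earlier (the class name and query are illustrative, not this repository's actual code):

import java.util.Map;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical status endpoint sketch; not this project's actual class.
@RestController
@RequestMapping("/api/job")
public class JobStatusController {

    private final JdbcTemplate jdbcTemplate;

    public JobStatusController(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @GetMapping("/async/{uploadId}")
    public Map<String, Object> status(@PathVariable String uploadId) {
        // Returns one row keyed by column name; throws if the uploadId is unknown
        return jdbcTemplate.queryForMap(
                "SELECT job_id, upload_id, status, started_at, ended_at FROM job_status WHERE upload_id = ?",
                uploadId);
    }
}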

📊 Example CSV

employee_data.csv

id,firstName,lastName
1,John,Doe
2,Jane,Smith
3,Robert,Brown
4,Alice,Johnson

🧩 How it Works

  1. The file is uploaded to a temporary folder (uploads/).
  2. A new Spring Batch Job is launched asynchronously.
  3. The job:
    • Reads CSV lines using FlatFileItemReader (see the sketch after this list)
    • Processes data in chunks (size defined in app.csv.chunk-size)
    • Inserts records into the configured database table
  4. Job progress & status are tracked in a job status table (via JobStatusService).
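
A rough sketch of how the reader and the chunk-oriented step could be wired (Spring Batch 5 style). The Map-based item type, bean names, and the hard-coded resource are assumptions for illustration; in the real service the file path would come from a job parameter:

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class CsvStepConfig {

    @Bean
    public FlatFileItemReader<Map<String, Object>> csvReader(
            @Value("${app.csv.columns}") String[] columns) {
        return new FlatFileItemReaderBuilder<Map<String, Object>>()
                .name("csvReader")
                .resource(new FileSystemResource("uploads/input.csv")) // placeholder path
                .linesToSkip(1) // skip the header row
                .delimited()
                .names(columns) // field names come from app.csv.columns
                .fieldSetMapper(fieldSet -> {
                    // Map each row to column-name -> value pairs for a named-parameter writer
                    Map<String, Object> row = new HashMap<>();
                    for (String c : columns) {
                        row.put(c, fieldSet.readString(c));
                    }
                    return row;
                })
                .build();
    }

    @Bean
    public Step csvStep(JobRepository jobRepository,
                        PlatformTransactionManager txManager,
                        FlatFileItemReader<Map<String, Object>> csvReader,
                        ItemWriter<Map<String, Object>> csvWriter,
                        @Value("${app.csv.chunk-size}") int chunkSize) {
        // Each chunk of chunkSize records is read, written, and committed together
        return new StepBuilder("csvStep", jobRepository)
                .<Map<String, Object>, Map<String, Object>>chunk(chunkSize, txManager)
                .reader(csvReader)
                .writer(csvWriter) // e.g. a JdbcBatchItemWriter using the generated INSERT
                .build();
    }
}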

⚠️ Important Notes

  • The CSV must have a header matching the columns defined in app.csv.columns.
  • Ensure the target table exists before running the job.
  • Uploads larger than 200MB will be rejected unless you raise the limits, e.g.:
    spring.servlet.multipart.max-file-size=500MB
    spring.servlet.multipart.max-request-size=500MB
  • The job runs asynchronously, but database inserts still occur in chunks for performance and rollback safety.

🧪 Example Flow

  1. Start your Spring Boot app:
    mvn spring-boot:run
  2. Upload CSV using Postman:
    • Method: POST
    • URL: http://localhost:8080/api/uploadAsync
    • Body: form-data, field name file
  3. Copy the returned uploadId.
  4. Check job status:
    GET http://localhost:8080/api/job/async/{uploadId}
    

🧱 Future Enhancements

  • CSV-to-table schema auto-mapping (no manual table creation)
  • Real-time job progress updates (WebSocket)
  • Configurable delimiter (, / ; / |)
  • Error logging for failed rows
  • File retention & cleanup policy
