A Node.js API that converts CSV files to JSON and uploads the data to a PostgreSQL database, with age distribution analysis.
- Custom CSV Parser: Handles quoted fields and nested properties with dot notation (e.g., `name.firstName`, `address.line1`); see the expansion sketch after this list
- Database Integration: Auto-creates the database and table if missing
- Batch Processing: Efficiently handles large files (>50k records) with chunked inserts
- Age Distribution Analysis: Calculates and displays age group percentages
- Transaction Safety: Uses database transactions with rollback on errors
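For illustration, here is a minimal sketch of how a dot-notation header can expand into a nested object; the helper names `setNested` and `expandRow` are hypothetical and not taken from the repository:

```js
// Expand a flat record such as { "name.firstName": "Rohit", "address.line1": "A-563" }
// into nested objects: { name: { firstName: "Rohit" }, address: { line1: "A-563" } }.
function setNested(target, dottedKey, value) {
  const parts = dottedKey.split('.');
  let node = target;
  for (let i = 0; i < parts.length - 1; i++) {
    node[parts[i]] = node[parts[i]] || {};
    node = node[parts[i]];
  }
  node[parts[parts.length - 1]] = value;
  return target;
}

function expandRow(flatRow) {
  const nested = {};
  for (const [key, value] of Object.entries(flatRow)) {
    setNested(nested, key, value);
  }
  return nested;
}
```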
- Node.js (>=18.0.0)
- PostgreSQL server
- npm or yarn
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd csv-json-api
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Create a `.env` file in the project root:
```env
PORT=4000
CSV_FILE_PATH="C:\path\to\your\users.csv"
DB_HOST=127.0.0.1
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=csv_users
```
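A minimal sketch of reading these settings at startup, assuming the project loads them with the `dotenv` package; the `config.js` module shown here is illustrative, not necessarily the repository's actual layout:

```js
// config.js (illustrative) - load .env once and expose typed settings
require('dotenv').config();

module.exports = {
  port: Number(process.env.PORT) || 4000,
  csvFilePath: process.env.CSV_FILE_PATH,
  db: {
    host: process.env.DB_HOST,
    port: Number(process.env.DB_PORT) || 5432,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
  },
};
```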
The application creates the following table structure:

```sql
CREATE TABLE public.users (
    name            varchar NOT NULL,  -- Concatenated firstName + lastName
    age             int4    NOT NULL,  -- User age
    address         jsonb   NULL,      -- Address object (line1, line2, city, state)
    additional_info jsonb   NULL,      -- Other properties not mapped to specific fields
    id              serial4 NOT NULL,  -- Auto-increment primary key
    CONSTRAINT users_pkey PRIMARY KEY (id)
);
```

Each CSV row must contain these fields (case-sensitive): `name.firstName`, `name.lastName`, and `age`. For example:
```csv
name.firstName,name.lastName,age,address.line1,address.line2,address.city,address.state,gender
Rohit,Prasad,35,"A-563 Rakshak Society","New Pune Road",Pune,Maharashtra,male
Anita,Sharma,19,"12, Park View Apt","Sector 5",Mumbai,Maharashtra,female
```
- Mandatory fields → Direct table columns: `name.firstName` + `name.lastName` → `name` (concatenated), `age` → `age` (integer)
- Address fields → `address` JSONB: `address.*` properties → JSON object
- Other fields → `additional_info` JSONB: all remaining properties → JSON object (see the mapping sketch below)
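A minimal sketch of this mapping, assuming the parser has already expanded dot-notation keys into nested objects; the function name `toDbRecord` is hypothetical:

```js
// Map one parsed (already nested) CSV row onto the users table columns.
function toDbRecord(row) {
  const { name, age, address, ...rest } = row;
  return {
    name: `${name.firstName} ${name.lastName}`,                   // mandatory -> name column
    age: parseInt(age, 10),                                       // mandatory -> age column
    address: address || null,                                     // address.* -> address JSONB
    additional_info: Object.keys(rest).length > 0 ? rest : null,  // everything else -> additional_info JSONB
  };
}

// The first sample row above would map to:
// { name: "Rohit Prasad", age: 35,
//   address: { line1: "A-563 Rakshak Society", line2: "New Pune Road", city: "Pune", state: "Maharashtra" },
//   additional_info: { gender: "male" } }
```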
- Start the server:

  ```bash
  npm run dev
  ```

- Upload CSV data:

  ```powershell
  # PowerShell
  Invoke-RestMethod -Method Post -Uri http://localhost:4000/api/upload
  ```

  ```bash
  # curl
  curl -X POST http://localhost:4000/api/upload
  ```

- Check server status:

  ```powershell
  # Browser: http://localhost:4000/
  # Or PowerShell:
  Invoke-RestMethod -Method Get -Uri http://localhost:4000/
  ```

`GET /` returns the API status and available endpoints.
Response:

```json
{
  "message": "CSV to JSON API is running",
  "endpoints": {
    "upload": {
      "method": "POST",
      "path": "/api/upload"
    }
  }
}
```

`POST /api/upload` processes the CSV file and uploads the data to PostgreSQL.
Response:

```json
{
  "message": "Data uploaded successfully",
  "distribution": {
    "<20": 33.33,
    "20-40": 33.33,
    "40-60": 0,
    "60+": 33.33
  }
}
```

The application calculates and displays the age distribution in the console:
```text
Age-Group % Distribution
┌──────────┬────────┐
│ (index)  │ Values │
├──────────┼────────┤
│   < 20   │ 33.33  │
│ 20 to 40 │ 33.33  │
│ 40 to 60 │   0    │
│   > 60   │ 33.33  │
└──────────┴────────┘
```
Age groups:

- `< 20`: Under 20 years
- `20 to 40`: 20-39 years
- `40 to 60`: 40-59 years
- `> 60`: 60+ years
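A minimal sketch of how this distribution could be computed and printed with `console.table`; the function name `printAgeDistribution` is hypothetical:

```js
// Count users per fixed age bucket, convert counts to percentages, and print a table.
function printAgeDistribution(ages) {
  const buckets = { '< 20': 0, '20 to 40': 0, '40 to 60': 0, '> 60': 0 };
  for (const age of ages) {
    if (age < 20) buckets['< 20']++;
    else if (age < 40) buckets['20 to 40']++;
    else if (age < 60) buckets['40 to 60']++;
    else buckets['> 60']++;
  }
  const distribution = {};
  for (const [label, count] of Object.entries(buckets)) {
    distribution[label] = Number(((count / ages.length) * 100).toFixed(2));
  }
  console.log('Age-Group % Distribution');
  console.table(distribution);
  return distribution;
}

// printAgeDistribution([35, 19, 72]) prints a table like the one above and returns
// { '< 20': 33.33, '20 to 40': 33.33, '40 to 60': 0, '> 60': 33.33 }
```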
- CSV Format: First row contains headers; subsequent rows contain data
- Field Separator: Comma (`,`) as delimiter
- Quoted Fields: Double quotes (`"`) for fields containing commas or special characters
- Nested Properties: Dot notation (`.`) for hierarchical field names
- Data Types: Age values are numeric; other fields are treated as strings
- Database: PostgreSQL with JSONB support for flexible field storage
- Mandatory Fields: Every record must have `name.firstName`, `name.lastName`, and `age`
- Name Concatenation: Full name = `firstName + " " + lastName`
- Address Grouping: All `address.*` fields grouped into a single JSONB object
- Additional Fields: Non-mandatory, non-address fields stored in `additional_info`
- Age Buckets: Fixed age ranges for distribution analysis
- Error Handling: Missing database/table auto-created; invalid data causes transaction rollback
- File Size: Handles files with 50k+ records efficiently
- Memory Usage: Processes CSV in chunks to avoid memory issues
- Database: Uses connection pooling and batch inserts (see the chunked-insert sketch after this list)
- Concurrency: Single-threaded processing (no concurrent uploads)
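A minimal sketch of chunked inserts with node-postgres (`pg`); the helper name and the chunk size of 1,000 are illustrative assumptions, and the record shape follows the mapping sketch above:

```js
const { Pool } = require('pg');

// Pool configured from the .env settings documented above.
const pool = new Pool({
  host: process.env.DB_HOST,
  port: Number(process.env.DB_PORT),
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
});

// Insert mapped records in fixed-size chunks so one huge multi-row INSERT is never built.
async function insertInChunks(records, chunkSize = 1000) {
  for (let i = 0; i < records.length; i += chunkSize) {
    const chunk = records.slice(i, i + chunkSize);
    const values = [];
    const placeholders = chunk.map((r, idx) => {
      const base = idx * 4;
      values.push(
        r.name,
        r.age,
        r.address ? JSON.stringify(r.address) : null,
        r.additional_info ? JSON.stringify(r.additional_info) : null
      );
      return `($${base + 1}, $${base + 2}, $${base + 3}, $${base + 4})`;
    });
    await pool.query(
      `INSERT INTO public.users (name, age, address, additional_info) VALUES ${placeholders.join(', ')}`,
      values
    );
  }
}
```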
- Rationale: Avoid external dependencies while maintaining RFC 4180 compliance
- Features: Handles quoted fields, escaped quotes, CRLF/LF line endings (see the parser sketch after this list)
- Performance: Memory-efficient line-by-line processing
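A minimal sketch of parsing one CSV line with quoted fields and `""` escapes; `parseCsvLine` is a hypothetical name, and multi-line quoted fields are ignored here for brevity:

```js
// Split a single CSV line into fields, honouring double quotes and "" escapes (RFC 4180 style).
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;                        // closing quote
      else field += ch;
    } else if (ch === '"') {
      inQuotes = true;                                              // opening quote
    } else if (ch === ',') {
      fields.push(field);                                           // field separator
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// parseCsvLine('Anita,Sharma,19,"12, Park View Apt"')
//   -> ['Anita', 'Sharma', '19', '12, Park View Apt']
```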
- JSONB Usage: Flexible schema for variable field structures
- Normalization: Separate `address` and `additional_info` for logical grouping
- Indexing: Primary key on `id` for efficient queries
- Transaction Safety: All-or-nothing database operations (see the transaction sketch after this list)
- Validation: File existence and format validation before processing
- Graceful Degradation: Clear error messages for common issues
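A minimal sketch of the all-or-nothing pattern with node-postgres; the pool and record shape follow the sketches above and are illustrative:

```js
// Wrap the whole upload in a single transaction: either every row is inserted or none are.
async function uploadAll(pool, records) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (const r of records) {
      await client.query(
        'INSERT INTO public.users (name, age, address, additional_info) VALUES ($1, $2, $3, $4)',
        [
          r.name,
          r.age,
          r.address ? JSON.stringify(r.address) : null,
          r.additional_info ? JSON.stringify(r.additional_info) : null,
        ]
      );
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK'); // undo all inserts on any failure
    throw err;
  } finally {
    client.release();
  }
}
```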
- Modular Structure: Separate concerns (parsing, database, routes, utilities)
- Consistent Naming: Descriptive function and variable names
- Error Handling: Try-catch blocks with meaningful error messages
- Documentation: Inline comments for complex logic
- Unit Tests: Individual functions (CSV parsing, age distribution); see the test sketch after this list
- Integration Tests: Database operations and API endpoints
- Edge Cases: Empty files, malformed CSV, database connection failures
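A minimal unit-test sketch using Node's built-in `node:test` runner (available from Node 18); the module path and the `parseCsvLine` import are hypothetical and would need to match the project's actual layout:

```js
// test/csvParser.test.js (illustrative) - run with `node --test`
const test = require('node:test');
const assert = require('node:assert');

const { parseCsvLine } = require('../src/csvParser'); // hypothetical module path

test('parses quoted fields containing commas', () => {
  assert.deepStrictEqual(
    parseCsvLine('Anita,Sharma,19,"12, Park View Apt"'),
    ['Anita', 'Sharma', '19', '12, Park View Apt']
  );
});

test('keeps empty fields as empty strings', () => {
  assert.deepStrictEqual(parseCsvLine('a,,c'), ['a', '', 'c']);
});
```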
- "CSV_FILE_PATH is not configured"
  - Ensure the `.env` file exists with `CSV_FILE_PATH` set
  - Use an absolute path with proper escaping for Windows
- "CSV file not found"
  - Verify the file exists at the specified path
  - Check file permissions
- "Database connection failed"
  - Ensure PostgreSQL is running
  - Verify credentials in `.env`
  - Check firewall settings
- "Cannot GET /api/upload"
  - Use the POST method, not GET
  - Browser visits send GET requests by default
For very large files (>100k records):
- Consider streaming CSV processing (a sketch follows this list)
- Implement pagination for database queries
- Add database indexes on frequently queried fields
- Use connection pooling for concurrent requests
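A minimal sketch of line-by-line streaming with Node's `readline` module; the naive `split(',')` stands in for a real quoted-field parser, and the helper names are illustrative:

```js
const fs = require('node:fs');
const readline = require('node:readline');

// Stream the CSV line by line so memory use stays flat even for very large files.
async function streamCsv(filePath, onRow) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let headers = null;
  for await (const line of rl) {
    if (!line.trim()) continue;                    // skip blank lines
    const values = line.split(',');                // simplistic; real code must honour quoted fields
    if (!headers) { headers = values; continue; }  // first row holds the headers
    const row = Object.fromEntries(headers.map((h, i) => [h, values[i]]));
    await onRow(row);                              // e.g. buffer rows and flush in chunks
  }
}
```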
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request
For issues and questions, please create an issue in the repository.