Shreya20002/csv-json-api

CSV to JSON Converter API

A Node.js API that converts CSV files to JSON and uploads the data to a PostgreSQL database, with age distribution analysis.

Features

  • Custom CSV Parser: Handles quoted fields, nested properties with dot notation (e.g., name.firstName, address.line1)
  • Database Integration: Auto-creates database and table if missing
  • Batch Processing: Efficiently handles large files (>50k records) with chunked inserts
  • Age Distribution Analysis: Calculates and displays age group percentages
  • Transaction Safety: Uses database transactions with rollback on errors

Prerequisites

  • Node.js (>=18.0.0)
  • PostgreSQL server
  • npm or yarn

Installation

  1. Clone the repository:
git clone <repository-url>
cd csv-json-api
  2. Install dependencies:
npm install
  3. Create a .env file in the project root:
PORT=4000
CSV_FILE_PATH="C:\path\to\your\users.csv"
DB_HOST=127.0.0.1
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=csv_users
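At startup, the application reads these variables from the environment. A minimal sketch of that step, assuming a `loadConfig` helper (the function name and defaults are hypothetical; the key names match the .env file above):

```javascript
// Sketch: read the configuration above from process.env.
// loadConfig and the default values are assumptions for illustration.
function loadConfig() {
  const config = {
    port: parseInt(process.env.PORT || '4000', 10),
    csvFilePath: process.env.CSV_FILE_PATH,
    db: {
      host: process.env.DB_HOST || '127.0.0.1',
      port: parseInt(process.env.DB_PORT || '5432', 10),
      user: process.env.DB_USER || 'postgres',
      password: process.env.DB_PASSWORD,
      database: process.env.DB_NAME || 'csv_users',
    },
  };
  if (!config.csvFilePath) {
    // Matches the startup error described under Troubleshooting.
    throw new Error('CSV_FILE_PATH is not configured');
  }
  return config;
}
```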

Database Schema

The application creates the following table structure:

CREATE TABLE public.users (
  name varchar NOT NULL,           -- Concatenated firstName + lastName
  age int4 NOT NULL,               -- User age
  address jsonb NULL,             -- Address object (line1, line2, city, state)
  additional_info jsonb NULL,     -- Other properties not mapped to specific fields
  id serial4 NOT NULL,            -- Auto-increment primary key
  CONSTRAINT users_pkey PRIMARY KEY (id)
);

CSV Format Requirements

Mandatory Fields

Each CSV row must contain these fields (case-sensitive):

  • name.firstName
  • name.lastName
  • age

Example CSV Structure

name.firstName,name.lastName,age,address.line1,address.line2,address.city,address.state,gender
Rohit,Prasad,35,"A-563 Rakshak Society","New Pune Road",Pune,Maharashtra,male
Anita,Sharma,19,"12, Park View Apt","Sector 5",Mumbai,Maharashtra,female

Field Mapping

  • Mandatory fields → Direct table columns:
    • name.firstName + name.lastName → name (concatenated)
    • age → age (integer)
  • Address fields → address JSONB:
    • address.* properties → JSON object
  • Other fields → additional_info JSONB:
    • All remaining properties → JSON object
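The mapping above can be sketched as a single transformation from a flat dot-notation record to a users-table row. This is an illustrative sketch, not the project's actual code; `mapRecord` and `setNested` are hypothetical names:

```javascript
// Sketch: build a nested object from a dot-notation path (hypothetical helper).
function setNested(obj, path, value) {
  const parts = path.split('.');
  let cur = obj;
  for (let i = 0; i < parts.length - 1; i++) {
    cur = cur[parts[i]] = cur[parts[i]] || {};
  }
  cur[parts[parts.length - 1]] = value;
}

// Sketch: map one flat CSV record to a row matching the users table.
function mapRecord(record) {
  const address = {};
  const additionalInfo = {};
  for (const [key, value] of Object.entries(record)) {
    if (key === 'name.firstName' || key === 'name.lastName' || key === 'age') continue;
    if (key.startsWith('address.')) {
      setNested(address, key.slice('address.'.length), value); // address.* -> address JSONB
    } else {
      setNested(additionalInfo, key, value);                   // everything else -> additional_info
    }
  }
  return {
    name: `${record['name.firstName']} ${record['name.lastName']}`, // concatenated full name
    age: parseInt(record.age, 10),                                  // numeric age
    address: Object.keys(address).length ? address : null,
    additional_info: Object.keys(additionalInfo).length ? additionalInfo : null,
  };
}
```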

Usage

  1. Start the server:
npm run dev
  2. Upload CSV data:
# PowerShell
Invoke-RestMethod -Method Post -Uri http://localhost:4000/api/upload

# curl
curl -X POST http://localhost:4000/api/upload
  3. Check server status:
# Browser: http://localhost:4000/
# Or PowerShell:
Invoke-RestMethod -Method Get -Uri http://localhost:4000/

API Endpoints

GET /

Returns API status and available endpoints.

Response:

{
  "message": "CSV to JSON API is running",
  "endpoints": {
    "upload": {
      "method": "POST",
      "path": "/api/upload"
    }
  }
}

POST /api/upload

Processes the CSV file and uploads data to PostgreSQL.

Response:

{
  "message": "Data uploaded successfully",
  "distribution": {
    "<20": 33.33,
    "20-40": 33.33,
    "40-60": 0,
    "60+": 33.33
  }
}

Age Distribution Report

The application calculates and displays age distribution in the console:

Age-Group % Distribution
┌──────────┬────────┐
│ (index)  │ Values │
├──────────┼────────┤
│ < 20     │ 33.33  │
│ 20 to 40 │ 33.33  │
│ 40 to 60 │ 0      │
│ > 60     │ 33.33  │
└──────────┴────────┘

Age groups:

  • < 20: Under 20 years
  • 20 to 40: 20-39 years
  • 40 to 60: 40-59 years
  • > 60: 60+ years
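The bucketing above can be sketched as follows (the `ageDistribution` name is hypothetical; bucket edges follow the groups listed above, and percentages are rounded to two decimals as in the sample report):

```javascript
// Sketch: count ages into the four fixed buckets and convert to percentages.
function ageDistribution(ages) {
  const buckets = { '< 20': 0, '20 to 40': 0, '40 to 60': 0, '> 60': 0 };
  for (const age of ages) {
    if (age < 20) buckets['< 20']++;
    else if (age < 40) buckets['20 to 40']++;
    else if (age < 60) buckets['40 to 60']++;
    else buckets['> 60']++;
  }
  const total = ages.length || 1; // avoid division by zero on an empty file
  for (const key of Object.keys(buckets)) {
    buckets[key] = Math.round((buckets[key] / total) * 10000) / 100;
  }
  return buckets;
}
```

Passing the result to console.table produces a report shaped like the one above.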

Implementation Assumptions

Technical Assumptions

  1. CSV Format: First row contains headers; subsequent rows contain data
  2. Field Separator: Comma (,) as delimiter
  3. Quoted Fields: Double quotes (") for fields containing commas or special characters
  4. Nested Properties: Dot notation (.) for hierarchical field names
  5. Data Types: Age values are numeric; other fields are treated as strings
  6. Database: PostgreSQL with JSONB support for flexible field storage

Business Logic Assumptions

  1. Mandatory Fields: Every record must have name.firstName, name.lastName, and age
  2. Name Concatenation: Full name = firstName + " " + lastName
  3. Address Grouping: All address.* fields grouped into single JSONB object
  4. Additional Fields: Non-mandatory, non-address fields stored in additional_info
  5. Age Buckets: Fixed age ranges for distribution analysis
  6. Error Handling: Missing database/table auto-created; invalid data causes transaction rollback

Performance Assumptions

  1. File Size: Handles files with 50k+ records efficiently
  2. Memory Usage: Processes CSV in chunks to avoid memory issues
  3. Database: Uses connection pooling and batch inserts
  4. Concurrency: Single-threaded processing (no concurrent uploads)
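The chunked batch-insert strategy from the assumptions above can be sketched as two small helpers. These names (`chunk`, `buildInsert`) are hypothetical, and in the real code each chunk's statement would run through the pg driver inside the surrounding transaction:

```javascript
// Sketch: split the parsed rows into fixed-size chunks.
function chunk(rows, size) {
  const chunks = [];
  for (let i = 0; i < rows.length; i += size) chunks.push(rows.slice(i, i + size));
  return chunks;
}

// Sketch: build one parameterized multi-row INSERT for a chunk.
// Four columns per row: name, age, address, additional_info.
function buildInsert(rowCount) {
  const tuples = [];
  for (let r = 0; r < rowCount; r++) {
    const base = r * 4;
    tuples.push(`($${base + 1}, $${base + 2}, $${base + 3}, $${base + 4})`);
  }
  return `INSERT INTO public.users (name, age, address, additional_info) VALUES ${tuples.join(', ')}`;
}

// Hypothetical usage inside the transaction:
//   for (const c of chunk(rows, 1000)) await client.query(buildInsert(c.length), c.flat());
```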

Architecture Decisions

Custom CSV Parser

  • Rationale: Avoid external dependencies while maintaining RFC 4180 compliance
  • Features: Handles quoted fields, escaped quotes, CRLF/LF line endings
  • Performance: Memory-efficient line-by-line processing
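The core of such a parser, splitting one line into fields, might look like the sketch below (not the project's actual implementation; `parseCsvLine` is a hypothetical name). It handles quoted fields and the RFC 4180 convention of doubling quotes inside a quoted field:

```javascript
// Sketch: split one CSV line into fields, honoring quotes and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') { field += '"'; i++; } // doubled quote -> literal "
        else inQuotes = false;                          // closing quote
      } else {
        field += ch;                                    // comma inside quotes is data
      }
    } else if (ch === '"') {
      inQuotes = true;                                  // opening quote
    } else if (ch === ',') {
      fields.push(field);                               // field separator
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);                                   // last field on the line
  return fields;
}
```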

Database Design

  • JSONB Usage: Flexible schema for variable field structures
  • Normalization: Separate address and additional_info for logical grouping
  • Indexing: Primary key on id for efficient queries

Error Handling

  • Transaction Safety: All-or-nothing database operations
  • Validation: File existence and format validation before processing
  • Graceful Degradation: Clear error messages for common issues
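The all-or-nothing transaction pattern above can be sketched as a small wrapper. The `withTransaction` name is hypothetical; `client` is assumed to expose a pg-style async `query(sql)` method:

```javascript
// Sketch: run work inside BEGIN/COMMIT, rolling back on any error.
async function withTransaction(client, work) {
  await client.query('BEGIN');
  try {
    const result = await work(client);
    await client.query('COMMIT');   // all inserts succeeded
    return result;
  } catch (err) {
    await client.query('ROLLBACK'); // invalid data: undo everything
    throw err;                      // surface the original error to the caller
  }
}
```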

Development Notes

Code Style

  • Modular Structure: Separate concerns (parsing, database, routes, utilities)
  • Consistent Naming: Descriptive function and variable names
  • Error Handling: Try-catch blocks with meaningful error messages
  • Documentation: Inline comments for complex logic

Testing Considerations

  • Unit Tests: Individual functions (CSV parsing, age distribution)
  • Integration Tests: Database operations and API endpoints
  • Edge Cases: Empty files, malformed CSV, database connection failures

Troubleshooting

Common Issues

  1. "CSV_FILE_PATH is not configured"

    • Ensure .env file exists with CSV_FILE_PATH set
    • Use absolute path with proper escaping for Windows
  2. "CSV file not found"

    • Verify file exists at specified path
    • Check file permissions
  3. "Database connection failed"

    • Ensure PostgreSQL is running
    • Verify credentials in .env
    • Check firewall settings
  4. "Cannot GET /api/upload"

    • Use POST method, not GET
    • Browser visits send GET requests by default

Performance Optimization

For very large files (>100k records):

  • Consider streaming CSV processing
  • Implement pagination for database queries
  • Add database indexes on frequently queried fields
  • Use connection pooling for concurrent requests

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Submit a pull request

Support

For issues and questions, please create an issue in the repository.
