A Node.js API that converts CSV files to JSON and uploads the data to a PostgreSQL database, with age distribution analysis.
- Custom CSV Parser: Handles quoted fields and nested properties with dot notation (e.g., `name.firstName`, `address.line1`); see the expansion sketch after this list
- Database Integration: Auto-creates the database and table if missing
- Batch Processing: Efficiently handles large files (>50k records) with chunked inserts
- Age Distribution Analysis: Calculates and displays age group percentages
- Transaction Safety: Uses database transactions with rollback on errors
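For illustration, here is a minimal sketch of how a dot-notation header can expand into a nested object; the helper names `setNested` and `expandRow` are hypothetical and not taken from the repository:

```js
// Expand a flat record such as { "name.firstName": "Rohit", "address.line1": "A-563" }
// into nested objects: { name: { firstName: "Rohit" }, address: { line1: "A-563" } }.
function setNested(target, dottedKey, value) {
  const parts = dottedKey.split('.');
  let node = target;
  for (let i = 0; i < parts.length - 1; i++) {
    node[parts[i]] = node[parts[i]] || {};
    node = node[parts[i]];
  }
  node[parts[parts.length - 1]] = value;
  return target;
}

function expandRow(flatRow) {
  const nested = {};
  for (const [key, value] of Object.entries(flatRow)) {
    setNested(nested, key, value);
  }
  return nested;
}
```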
- Node.js (>=18.0.0)
- PostgreSQL server
- npm or yarn
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd csv-json-api
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Create a `.env` file in the project root:
```env
PORT=4000
CSV_FILE_PATH="C:\path\to\your\users.csv"
DB_HOST=127.0.0.1
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=csv_users
```
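A minimal sketch of reading these settings at startup, assuming the project loads them with the `dotenv` package; the `config.js` module shown here is illustrative, not necessarily the repository's actual layout:

```js
// config.js (illustrative) - load .env once and expose typed settings
require('dotenv').config();

module.exports = {
  port: Number(process.env.PORT) || 4000,
  csvFilePath: process.env.CSV_FILE_PATH,
  db: {
    host: process.env.DB_HOST,
    port: Number(process.env.DB_PORT) || 5432,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
  },
};
```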
The application creates the following table structure:

```sql
CREATE TABLE public.users (
    name            varchar NOT NULL,  -- Concatenated firstName + lastName
    age             int4    NOT NULL,  -- User age
    address         jsonb   NULL,      -- Address object (line1, line2, city, state)
    additional_info jsonb   NULL,      -- Other properties not mapped to specific fields
    id              serial4 NOT NULL,  -- Auto-increment primary key
    CONSTRAINT users_pkey PRIMARY KEY (id)
);
```

Each CSV row must contain these fields (case-sensitive): `name.firstName`, `name.lastName`, and `age`. For example:
```csv
name.firstName,name.lastName,age,address.line1,address.line2,address.city,address.state,gender
Rohit,Prasad,35,"A-563 Rakshak Society","New Pune Road",Pune,Maharashtra,male
Anita,Sharma,19,"12, Park View Apt","Sector 5",Mumbai,Maharashtra,female
```
- Mandatory fields → Direct table columns: `name.firstName` + `name.lastName` → `name` (concatenated), `age` → `age` (integer)
- Address fields → `address` JSONB: `address.*` properties → JSON object
- Other fields → `additional_info` JSONB: all remaining properties → JSON object (see the mapping sketch below)
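A minimal sketch of this mapping, assuming the parser has already expanded dot-notation keys into nested objects; the function name `toDbRecord` is hypothetical:

```js
// Map one parsed (already nested) CSV row onto the users table columns.
function toDbRecord(row) {
  const { name, age, address, ...rest } = row;
  return {
    name: `${name.firstName} ${name.lastName}`,                   // mandatory -> name column
    age: parseInt(age, 10),                                       // mandatory -> age column
    address: address || null,                                     // address.* -> address JSONB
    additional_info: Object.keys(rest).length > 0 ? rest : null,  // everything else -> additional_info JSONB
  };
}

// The first sample row above would map to:
// { name: "Rohit Prasad", age: 35,
//   address: { line1: "A-563 Rakshak Society", line2: "New Pune Road", city: "Pune", state: "Maharashtra" },
//   additional_info: { gender: "male" } }
```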
- Start the server:

  ```bash
  npm run dev
  ```

- Upload CSV data:

  ```powershell
  # PowerShell
  Invoke-RestMethod -Method Post -Uri http://localhost:4000/api/upload
  ```

  ```bash
  # curl
  curl -X POST http://localhost:4000/api/upload
  ```

- Check server status:

  ```powershell
  # Browser: http://localhost:4000/
  # Or PowerShell:
  Invoke-RestMethod -Method Get -Uri http://localhost:4000/
  ```

`GET /` returns the API status and available endpoints.
Response:

```json
{
  "message": "CSV to JSON API is running",
  "endpoints": {
    "upload": {
      "method": "POST",
      "path": "/api/upload"
    }
  }
}
```

`POST /api/upload` processes the CSV file and uploads the data to PostgreSQL.
Response:

```json
{
  "message": "Data uploaded successfully",
  "distribution": {
    "<20": 33.33,
    "20-40": 33.33,
    "40-60": 0,
    "60+": 33.33
  }
}
```

The application calculates and displays the age distribution in the console:
```text
Age-Group % Distribution
┌──────────┬────────┐
│ (index)  │ Values │
├──────────┼────────┤
│   < 20   │ 33.33  │
│ 20 to 40 │ 33.33  │
│ 40 to 60 │   0    │
│   > 60   │ 33.33  │
└──────────┴────────┘
```
Age groups:

- `< 20`: Under 20 years
- `20 to 40`: 20-39 years
- `40 to 60`: 40-59 years
- `> 60`: 60+ years
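A minimal sketch of how this distribution could be computed and printed with `console.table`; the function name `printAgeDistribution` is hypothetical:

```js
// Count users per fixed age bucket, convert counts to percentages, and print a table.
function printAgeDistribution(ages) {
  const buckets = { '< 20': 0, '20 to 40': 0, '40 to 60': 0, '> 60': 0 };
  for (const age of ages) {
    if (age < 20) buckets['< 20']++;
    else if (age < 40) buckets['20 to 40']++;
    else if (age < 60) buckets['40 to 60']++;
    else buckets['> 60']++;
  }
  const distribution = {};
  for (const [label, count] of Object.entries(buckets)) {
    distribution[label] = Number(((count / ages.length) * 100).toFixed(2));
  }
  console.log('Age-Group % Distribution');
  console.table(distribution);
  return distribution;
}

// printAgeDistribution([35, 19, 72]) prints a table like the one above and returns
// { '< 20': 33.33, '20 to 40': 33.33, '40 to 60': 0, '> 60': 33.33 }
```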
- CSV Format: First row contains headers; subsequent rows contain data
- Field Separator: Comma (`,`) as delimiter
- Quoted Fields: Double quotes (`"`) for fields containing commas or special characters
- Nested Properties: Dot notation (`.`) for hierarchical field names
- Data Types: Age values are numeric; other fields are treated as strings
- Database: PostgreSQL with JSONB support for flexible field storage
- Mandatory Fields: Every record must have `name.firstName`, `name.lastName`, and `age`
- Name Concatenation: Full name = `firstName + " " + lastName`
- Address Grouping: All `address.*` fields grouped into a single JSONB object
- Additional Fields: Non-mandatory, non-address fields stored in `additional_info`
- Age Buckets: Fixed age ranges for distribution analysis
- Error Handling: Missing database/table auto-created; invalid data causes transaction rollback
- File Size: Handles files with 50k+ records efficiently
- Memory Usage: Processes CSV in chunks to avoid memory issues
- Database: Uses connection pooling and batch inserts (see the chunked-insert sketch after this list)
- Concurrency: Single-threaded processing (no concurrent uploads)
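A minimal sketch of chunked inserts with node-postgres (`pg`); the helper name and the chunk size of 1,000 are illustrative assumptions, and the record shape follows the mapping sketch above:

```js
const { Pool } = require('pg');

// Pool configured from the .env settings documented above.
const pool = new Pool({
  host: process.env.DB_HOST,
  port: Number(process.env.DB_PORT),
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
});

// Insert mapped records in fixed-size chunks so one huge multi-row INSERT is never built.
async function insertInChunks(records, chunkSize = 1000) {
  for (let i = 0; i < records.length; i += chunkSize) {
    const chunk = records.slice(i, i + chunkSize);
    const values = [];
    const placeholders = chunk.map((r, idx) => {
      const base = idx * 4;
      values.push(
        r.name,
        r.age,
        r.address ? JSON.stringify(r.address) : null,
        r.additional_info ? JSON.stringify(r.additional_info) : null
      );
      return `($${base + 1}, $${base + 2}, $${base + 3}, $${base + 4})`;
    });
    await pool.query(
      `INSERT INTO public.users (name, age, address, additional_info) VALUES ${placeholders.join(', ')}`,
      values
    );
  }
}
```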
- Rationale: Avoid external dependencies while maintaining RFC 4180 compliance
- Features: Handles quoted fields, escaped quotes, CRLF/LF line endings (see the parser sketch after this list)
- Performance: Memory-efficient line-by-line processing
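A minimal sketch of parsing one CSV line with quoted fields and `""` escapes; `parseCsvLine` is a hypothetical name, and multi-line quoted fields are ignored here for brevity:

```js
// Split a single CSV line into fields, honouring double quotes and "" escapes (RFC 4180 style).
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;                        // closing quote
      else field += ch;
    } else if (ch === '"') {
      inQuotes = true;                                              // opening quote
    } else if (ch === ',') {
      fields.push(field);                                           // field separator
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// parseCsvLine('Anita,Sharma,19,"12, Park View Apt"')
//   -> ['Anita', 'Sharma', '19', '12, Park View Apt']
```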
- JSONB Usage: Flexible schema for variable field structures
- Normalization: Separate `address` and `additional_info` for logical grouping
- Indexing: Primary key on `id` for efficient queries
- Transaction Safety: All-or-nothing database operations (see the transaction sketch after this list)
- Validation: File existence and format validation before processing
- Graceful Degradation: Clear error messages for common issues
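A minimal sketch of the all-or-nothing pattern with node-postgres; the pool and record shape follow the sketches above and are illustrative:

```js
// Wrap the whole upload in a single transaction: either every row is inserted or none are.
async function uploadAll(pool, records) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (const r of records) {
      await client.query(
        'INSERT INTO public.users (name, age, address, additional_info) VALUES ($1, $2, $3, $4)',
        [
          r.name,
          r.age,
          r.address ? JSON.stringify(r.address) : null,
          r.additional_info ? JSON.stringify(r.additional_info) : null,
        ]
      );
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK'); // undo all inserts on any failure
    throw err;
  } finally {
    client.release();
  }
}
```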
- Modular Structure: Separate concerns (parsing, database, routes, utilities)
- Consistent Naming: Descriptive function and variable names
- Error Handling: Try-catch blocks with meaningful error messages
- Documentation: Inline comments for complex logic
- Unit Tests: Individual functions (CSV parsing, age distribution); see the test sketch after this list
- Integration Tests: Database operations and API endpoints
- Edge Cases: Empty files, malformed CSV, database connection failures
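A minimal unit-test sketch using Node's built-in `node:test` runner (available from Node 18); the module path and the `parseCsvLine` import are hypothetical and would need to match the project's actual layout:

```js
// test/csvParser.test.js (illustrative) - run with `node --test`
const test = require('node:test');
const assert = require('node:assert');

const { parseCsvLine } = require('../src/csvParser'); // hypothetical module path

test('parses quoted fields containing commas', () => {
  assert.deepStrictEqual(
    parseCsvLine('Anita,Sharma,19,"12, Park View Apt"'),
    ['Anita', 'Sharma', '19', '12, Park View Apt']
  );
});

test('keeps empty fields as empty strings', () => {
  assert.deepStrictEqual(parseCsvLine('a,,c'), ['a', '', 'c']);
});
```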
- "CSV_FILE_PATH is not configured"
  - Ensure the `.env` file exists with `CSV_FILE_PATH` set
  - Use an absolute path with proper escaping for Windows
- "CSV file not found"
  - Verify the file exists at the specified path
  - Check file permissions
- "Database connection failed"
  - Ensure PostgreSQL is running
  - Verify credentials in `.env`
  - Check firewall settings
- "Cannot GET /api/upload"
  - Use the POST method, not GET
  - Browser visits send GET requests by default
For very large files (>100k records):
- Consider streaming CSV processing (a sketch follows this list)
- Implement pagination for database queries
- Add database indexes on frequently queried fields
- Use connection pooling for concurrent requests
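A minimal sketch of line-by-line streaming with Node's `readline` module; the naive `split(',')` stands in for a real quoted-field parser, and the helper names are illustrative:

```js
const fs = require('node:fs');
const readline = require('node:readline');

// Stream the CSV line by line so memory use stays flat even for very large files.
async function streamCsv(filePath, onRow) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let headers = null;
  for await (const line of rl) {
    if (!line.trim()) continue;                    // skip blank lines
    const values = line.split(',');                // simplistic; real code must honour quoted fields
    if (!headers) { headers = values; continue; }  // first row holds the headers
    const row = Object.fromEntries(headers.map((h, i) => [h, values[i]]));
    await onRow(row);                              // e.g. buffer rows and flush in chunks
  }
}
```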
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request
For issues and questions, please create an issue in the repository.