csv-auto-datapipe is a Node.js data ingestion toolkit that converts CSV files into structured database records. It provides an automated pipeline that parses raw CSV files, infers a database schema with AI, and uploads the transformed data to a connected PostgreSQL, MySQL, or MongoDB database.
- Zero-Dependency CSV Parser: Fast, lightweight CSV parsing with support for nested objects via dot notation
- AI-Powered Schema Generation: Automatically generates optimal database schemas using Google Gemini AI
- Natural Language Queries: Query databases using plain English - no SQL knowledge required
- Multi-Database Support: Works with PostgreSQL, MySQL, and MongoDB
- Smart Data Handling: Intelligently merges related fields and uses JSON/JSONB for complex structures
- Automatic Type Inference: Converts strings to numbers, booleans, and null values automatically
- Nested Object Support: Handles dot notation (e.g., `name.firstName`, `address.city`) elegantly
- Production-Ready: Follows database best practices with proper constraints, indexes, and data types
- Getting Started Guide - Complete setup guide for contributors and users
- Contributing Guidelines - How to contribute to the project
- API Reference - Detailed API documentation below
- Installation
- Quick Start
- CSV Parser
- Parse and Upload
- Natural Language Query
- Supported Databases
- AI Schema Generation
- API Reference
- Examples
- Troubleshooting
- Contributing
- License
npm install csv-auto-datapipe

Install the database driver(s) you need:
# For PostgreSQL
npm install pg
# For MySQL
npm install mysql2
# For MongoDB
npm install mongodb

Get your free API key from Google AI Studio.
const { CSVParser } = require('csv-auto-datapipe');
const parser = new CSVParser();
const csvString = `name,age,city
John,30,New York
Jane,25,Los Angeles`;
const data = parser.parse(csvString);
console.log(data);
// Output: [
// { name: 'John', age: 30, city: 'New York' },
// { name: 'Jane', age: 25, city: 'Los Angeles' }
// ]

const csvString = `name.firstName,name.lastName,age,address.city
Fahed,Khan,23,Mumbai
Rohit,Prasad,35,Pune`;
const data = parser.parse(csvString);
console.log(data);
// Output: [
// {
// name: { firstName: 'Fahed', lastName: 'Khan' },
// age: 23,
// address: { city: 'Mumbai' }
// },
// ...
// ]

const { parseAndUpload } = require('csv-auto-datapipe');
await parseAndUpload({
filePath: './users.csv',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: 'your_password',
geminiApiKey: process.env.GEMINI_API_KEY,
aiSchema: true
});

const { parseAndUpload } = require('csv-auto-datapipe');
const result = await parseAndUpload({
filePath: './users.csv',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: 'your_password',
geminiApiKey: process.env.GEMINI_API_KEY,
aiSchema: true,
query: 'Find all users older than 25'
});
console.log('Upload:', result.rowsInserted, 'rows');
console.log('Query Results:', result.queryResult.results);
console.log('Generated SQL:', result.queryResult.generatedQuery);

The CSV parser is a standalone, zero-dependency module that converts CSV data to JSON.
const { CSVParser } = require('csv-auto-datapipe');
const parser = new CSVParser();
const data = parser.parse(csvString, options);

| Option | Type | Default | Description |
|---|---|---|---|
| `delimiter` | string | `,` | Field delimiter character |
| `trimValues` | boolean | `true` | Trim whitespace from values |
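As a rough illustration of what these two options control, the sketch below splits a single CSV row on a configurable delimiter, honours double-quoted fields, and optionally trims each value. `splitRow` is a hypothetical helper written for this example. It is not part of the csv-auto-datapipe API, and the library's own parser may behave differently (for instance, this sketch assumes a single-character delimiter).

```javascript
// Hypothetical helper, not part of csv-auto-datapipe: splits one CSV row.
function splitRow(line, { delimiter = ',', trimValues = true } = {}) {
  const fields = [];
  let current = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote ("") inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch; // delimiters inside quotes are kept as data
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field has no trailing delimiter

  return trimValues ? fields.map((field) => field.trim()) : fields;
}

console.log(splitRow('John; 30 ;NYC', { delimiter: ';' }));
console.log(splitRow('"Smith, John","123 Main St, Apt 4"'));
```

With `trimValues: false` the padding around ` 30 ` in the first call would be preserved.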
Automatically converts values to appropriate types:
const csv = `name,age,active,score
Alice,25,true,98.5`;
const data = parser.parse(csv);
// { name: 'Alice', age: 25, active: true, score: 98.5 }

Supports dot notation for creating nested structures:
const csv = `user.name,user.email,user.settings.theme
John,john@example.com,dark`;
const data = parser.parse(csv);
// {
// user: {
// name: 'John',
// email: 'john@example.com',
// settings: { theme: 'dark' }
// }
// }

Handles commas and quotes within fields:
const csv = `name,address
"Smith, John","123 Main St, Apt 4"`;
const data = parser.parse(csv);
// { name: 'Smith, John', address: '123 Main St, Apt 4' }

const csv = `name;age;city
John;30;NYC`;
const data = parser.parse(csv, { delimiter: ';' });

The main feature that combines CSV parsing with AI-powered schema generation and database insertion.
await parseAndUpload({
// Required
filePath: './data.csv',
dbType: 'postgres', // 'postgres', 'mysql', or 'mongodb'
geminiApiKey: 'your-api-key',
// Database Connection
host: 'localhost',
port: 5432,
user: 'postgres',
password: 'password',
database: 'mydb', // Optional: will be auto-generated if not provided
// Optional
tableName: 'users', // Optional: will be auto-generated if not provided
aiSchema: true,
createDatabase: true,
parseOptions: { delimiter: ',', trimValues: true },
// NEW: Optional natural language query to run after upload
query: 'Find all users older than 25'
});

| Parameter | Type | Description |
|---|---|---|
| `filePath` | string | Path to the CSV file |
| `dbType` | string | Database type: `'postgres'`, `'mysql'`, or `'mongodb'` |
| `geminiApiKey` | string | Google Gemini API key |

| Parameter | Type | Default | Description |
|---|---|---|---|
| `host` | string | `'localhost'` | Database host |
| `port` | number | DB-specific | Database port |
| `user` or `username` | string | - | Database username |
| `password` | string | - | Database password |
| `database` | string | Auto-generated | Database name |
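
To make the defaults in the connection table concrete, here is a minimal sketch of how they might be resolved. `resolveConnection` and `DEFAULT_PORTS` are hypothetical names introduced for this example, and the port numbers are simply the ones used in the connection examples in this README. The library's actual defaulting logic may differ.

```javascript
// Ports taken from the connection examples in this README (assumed defaults).
const DEFAULT_PORTS = { postgres: 5432, mysql: 3306, mongodb: 27017 };

// Hypothetical helper: fills in connection defaults as the table describes.
function resolveConnection(config) {
  const { dbType } = config;
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_PORTS, dbType)) {
    throw new Error(`Unsupported dbType: ${dbType}`);
  }
  return {
    dbType,
    host: config.host ?? 'localhost',
    port: config.port ?? DEFAULT_PORTS[dbType], // "DB-specific" default
    user: config.user ?? config.username, // either spelling is accepted
    password: config.password,
    database: config.database, // undefined means "auto-generate later"
  };
}

console.log(resolveConnection({ dbType: 'mysql', username: 'root' }));
```

Explicit values always win over the defaults, so passing `port: 5433` for a non-standard PostgreSQL instance would be left untouched.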

| Parameter | Type | Default | Description |
|---|---|---|---|
| `tableName` | string | Auto-generated | Table/collection name |
| `aiSchema` | boolean | `true` | Use AI for schema generation |
| `createDatabase` | boolean | `true` | Create database if not exists |
| `parseOptions` | object | `{}` | CSV parser options |
| `query` | string | `undefined` | NEW: Natural language query to run after upload |

{
success: true,
rowsParsed: 100,
rowsInserted: 100,
database: 'users_db',
table: 'users',
schema: { /* Generated schema object */ },
// NEW: queryResult is included if query parameter was provided
queryResult: {
success: true,
originalQuery: 'Find all users older than 25',
generatedQuery: 'SELECT * FROM users WHERE age > $1',
queryType: 'SELECT',
explanation: 'Retrieves all users where age is greater than 25',
results: [ /* query results */ ],
rowCount: 10,
executionTime: 45
}
}

Query your databases using plain English! No need to write SQL or NoSQL queries manually.
const { naturalQuery } = require('csv-auto-datapipe');
const result = await naturalQuery({
query: 'Find all users older than 25',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: 'password',
database: 'mydb',
tableName: 'users',
geminiApiKey: process.env.GEMINI_API_KEY
});
console.log(result.results); // Query results
console.log(result.generatedQuery); // Generated SQL query
console.log(result.explanation); // Human-readable explanation

- Multi-Database: Works with PostgreSQL, MySQL, and MongoDB
- Intelligent Translation: Converts natural language to optimized database queries
- Query Types: Supports SELECT, aggregations, filtering, sorting, and more
- Auto-Explanation: Provides clear explanation of what each query does
- Performance Metrics: Returns execution time and row counts
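
The metadata listed above can be condensed into a single log line. The sketch below assumes a result object shaped like the documented `queryResult` (`success`, `queryType`, `rowCount`, `generatedQuery`). `summarizeQueryResult` is a hypothetical helper for illustration, not a function exported by the library.

```javascript
// Hypothetical helper: condenses a naturalQuery-style result into one line.
function summarizeQueryResult(result) {
  if (!result || !result.success) {
    return 'Query failed or returned no result object';
  }
  return `${result.queryType} returned ${result.rowCount} row(s): ${result.generatedQuery}`;
}

// Sample object copied from the return value documented in this README.
console.log(
  summarizeQueryResult({
    success: true,
    queryType: 'SELECT',
    rowCount: 10,
    generatedQuery: 'SELECT * FROM users WHERE age > $1',
  })
);
```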
// Simple filtering
"Find all people older than 30"
"Show users from Mumbai"
// Aggregation
"Count users by city"
"What is the average age of customers"
// Complex queries
"Find people between ages 20 and 30 living in Mumbai, sorted by age"
"Group orders by month and show total revenue"
// Pattern matching
"Find users whose name starts with A"
"Search products containing 'laptop'"

const { naturalQuery, getSchemaInfo } = require('csv-auto-datapipe');
// Get schema for better accuracy
const schemaInfo = await getSchemaInfo({
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: 'password',
database: 'mydb',
tableName: 'users'
});
// Use schema in query
const result = await naturalQuery({
query: 'Show me the top 5 highest paid employees',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: 'password',
database: 'mydb',
tableName: 'users',
schemaInfo: schemaInfo,
geminiApiKey: process.env.GEMINI_API_KEY
});

await parseAndUpload({
filePath: './data.csv',
dbType: 'postgres',
host: 'localhost',
port: 5432,
user: 'postgres',
password: 'password',
geminiApiKey: process.env.GEMINI_API_KEY
});

Features:
- JSONB support for complex nested data
- Full transaction support
- Advanced indexing
await parseAndUpload({
filePath: './data.csv',
dbType: 'mysql',
host: 'localhost',
port: 3306,
user: 'root',
password: 'password',
geminiApiKey: process.env.GEMINI_API_KEY
});

Features:
- JSON column type support
- Automatic type conversion
- Optimized for performance
await parseAndUpload({
filePath: './data.csv',
dbType: 'mongodb',
host: 'localhost',
port: 27017,
geminiApiKey: process.env.GEMINI_API_KEY
});

Features:
- Native nested document support
- Schema validation
- Flexible document structure
Google Gemini intelligently analyzes your CSV headers and generates optimal database schemas.
- Analyzes Headers: Examines CSV column names and patterns
- Identifies Relationships: Detects related fields (e.g., `name.firstName`, `name.lastName`)
- Decides Structure:
  - Merges name parts into single columns
  - Groups address fields into JSONB/JSON
  - Creates appropriate data types
- Generates Schema: Produces production-ready database schema
- Maps Fields: Creates field mapping for data transformation
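
The merge-and-group step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's transformation code: `applyFieldMapping` is a hypothetical helper, rows are assumed to use flat dotted keys, columns named in `jsonColumns` are serialized as JSON strings, and any other column fed by several fields is joined with a space.

```javascript
// Hypothetical helper: reshapes one flat CSV row using a field mapping.
function applyFieldMapping(row, fieldMapping, jsonColumns = []) {
  // Collect [leafName, value] pairs per target column, in mapping order.
  const grouped = {};
  for (const [sourceField, targetColumn] of Object.entries(fieldMapping)) {
    if (!(sourceField in row)) continue; // field absent from this row
    const leafName = sourceField.split('.').pop();
    if (!grouped[targetColumn]) grouped[targetColumn] = [];
    grouped[targetColumn].push([leafName, row[sourceField]]);
  }

  const result = {};
  for (const [column, parts] of Object.entries(grouped)) {
    if (jsonColumns.includes(column)) {
      result[column] = JSON.stringify(Object.fromEntries(parts)); // grouped
    } else if (parts.length === 1) {
      result[column] = parts[0][1]; // single field keeps its original type
    } else {
      result[column] = parts.map(([, value]) => value).join(' '); // merged
    }
  }
  return result;
}

const mapping = {
  'name.firstName': 'name',
  'name.lastName': 'name',
  age: 'age',
  'address.line1': 'address',
  'address.city': 'address',
  'address.state': 'address',
};
const row = {
  'name.firstName': 'Fahed',
  'name.lastName': 'Khan',
  age: 23,
  'address.line1': 'Raushan Nagar',
  'address.city': 'Mumbai',
};
console.log(applyFieldMapping(row, mapping, ['address']));
```

Which columns end up merged and which are grouped is decided by the generated schema in the real pipeline. Here that decision is passed in by hand through `jsonColumns`.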
Input CSV:
name.firstName,name.lastName,age,address.line1,address.city,address.state

Generated Schema (PostgreSQL):
CREATE TABLE people (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL, -- Merged: firstName + lastName
age INTEGER,
address JSONB -- Grouped: {line1, city, state}
);

Field Mapping:
{
"name.firstName": "name",
"name.lastName": "name",
"age": "age",
"address.line1": "address",
"address.city": "address",
"address.state": "address"
}

Transformed Data:
// Original
{ "name.firstName": "Fahed", "name.lastName": "Khan", "age": 23, "address.line1": "Raushan Nagar", "address.city": "Mumbai" }
// After transformation
{ "name": "Fahed Khan", "age": 23, "address": "{\"line1\": \"Raushan Nagar\", \"city\": \"Mumbai\"}" }

const parser = new CSVParser();

`parser.parse(csvString, options)` parses a CSV string and returns an array of objects.
Parameters:
- `csvString` (string): The CSV data
- `options` (object): Parser options
  - `delimiter` (string): Field delimiter (default: `,`)
  - `trimValues` (boolean): Trim whitespace (default: `true`)
Returns: Array<Object>
Throws: Error if the CSV is invalid or there is a field count mismatch
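
The behaviour documented for the parser (type inference, dot-notation nesting, and an error on field count mismatch) can be approximated as follows. All three functions are hypothetical helpers for illustration and are not the library's implementation. In particular, treating an empty string as `null` and recognising only plain decimal numbers are assumptions of this sketch.

```javascript
// Converts a raw string to a number, boolean, or null where it clearly is one.
// Assumption: empty strings and the literal "null" both become null.
function inferType(value) {
  const lower = value.toLowerCase();
  if (value === '' || lower === 'null') return null;
  if (lower === 'true') return true;
  if (lower === 'false') return false;
  if (/^-?\d+(\.\d+)?$/.test(value)) return Number(value);
  return value; // anything else stays a string
}

// Writes value into target at a dotted path such as "user.settings.theme".
function setNested(target, path, value) {
  const keys = path.split('.');
  let node = target;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
}

// Builds row objects from already-split rows; throws on field count mismatch.
function rowsToObjects(headers, rows) {
  return rows.map((fields, index) => {
    if (fields.length !== headers.length) {
      throw new Error(
        `Row ${index + 1}: expected ${headers.length} fields, got ${fields.length}`
      );
    }
    const record = {};
    headers.forEach((header, i) => setNested(record, header, inferType(fields[i])));
    return record;
  });
}

const headers = ['user.name', 'user.settings.theme', 'age', 'active'];
console.log(rowsToObjects(headers, [['John', 'dark', '30', 'true']]));
```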
`parseAndUpload` parses a CSV file and uploads the data to a database with an AI-generated schema.
Parameters: See Configuration
Returns: Promise<Object> with result summary
Throws: Error if file not found, database connection fails, or schema generation fails
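
Some of these failures can be caught before any network call is made. The following is a hedged sketch of a pre-flight check built only from the required parameters and supported database types listed in this README. `checkUploadConfig` is a hypothetical helper, and `parseAndUpload` may validate its input differently.

```javascript
const fs = require('fs');

// Hypothetical pre-flight check; returns a list of human-readable problems.
function checkUploadConfig(config) {
  const problems = [];
  for (const key of ['filePath', 'dbType', 'geminiApiKey']) {
    if (!config[key]) problems.push(`missing required option "${key}"`);
  }
  if (config.dbType && !['postgres', 'mysql', 'mongodb'].includes(config.dbType)) {
    problems.push(`unsupported dbType "${config.dbType}"`);
  }
  if (config.filePath && !fs.existsSync(config.filePath)) {
    problems.push(`file not found: ${config.filePath}`);
  }
  return problems;
}

// An empty array means the config passed every check in this sketch.
console.log(checkUploadConfig({ filePath: '/definitely/not/here.csv', dbType: 'sqlite' }));
```

Returning a list rather than throwing on the first problem lets a caller report every configuration mistake in one pass.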
const generator = new SchemaGenerator(apiKey);

Generates database schema using Gemini AI.
Parameters:
- `headers` (Array): CSV headers
- `dbType` (string): Database type
- `tableName` (string|null): Optional table name
- `databaseName` (string|null): Optional database name
Returns: Promise<Object> with schema information
const executor = new QueryExecutor(geminiApiKey);

Execute a natural language query against a database.
Parameters:
- `query` (string): Natural language query
- `dbType` (string): Database type
- `dbConnection` (object): Database connection instance
- `database` (string): Database name
- `tableName` (string): Optional table/collection name
- `schemaInfo` (object): Optional schema information
Returns: Promise<Object> with query results and metadata
`naturalQuery`: High-level API for natural language queries. See the Natural Language Query section.

`getSchemaInfo`: Retrieves database schema information.
Parameters:
- `dbType` (string): Database type
- `host` (string): Database host
- `user` (string): Database username
- `password` (string): Database password
- `database` (string): Database name
- `tableName` (string): Table/collection name
Returns: Promise<Object> with schema details
const connector = new DatabaseConnector(dbType, connectionParams);

Connects to the database.
Returns: Promise<void>
Creates a database if it doesn't exist.
Returns: Promise<void>
Creates table/collection based on schema.
Returns: Promise<void>
Inserts data into database.
Returns: Promise<Object> with insertion count
Closes database connection.
Returns: Promise<void>
const { CSVParser } = require('csv-auto-datapipe');
const fs = require('fs');
const parser = new CSVParser();
const csvContent = fs.readFileSync('./data.csv', 'utf-8');
const data = parser.parse(csvContent);
console.log(`Parsed ${data.length} rows`);
console.log(data[0]); // First row

const { naturalQuery } = require('csv-auto-datapipe');
async function queryDatabase() {
const result = await naturalQuery({
query: 'Find all users from Mumbai who are older than 25',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: process.env.POSTGRES_PASSWORD,
database: 'users_db',
tableName: 'users',
geminiApiKey: process.env.GEMINI_API_KEY
});
console.log('Generated Query:', result.generatedQuery);
console.log('Results:', result.results);
console.log('Explanation:', result.explanation);
}
queryDatabase();

const { parseAndUpload } = require('csv-auto-datapipe');
async function uploadToPostgres() {
try {
const result = await parseAndUpload({
filePath: './users.csv',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: process.env.POSTGRES_PASSWORD,
geminiApiKey: process.env.GEMINI_API_KEY
});
console.log(`Success! Inserted ${result.rowsInserted} rows into ${result.database}.${result.table}`);
} catch (error) {
console.error('Upload failed:', error.message);
}
}
uploadToPostgres();

const { parseAndUpload } = require('csv-auto-datapipe');
async function uploadAndQuery() {
try {
const result = await parseAndUpload({
filePath: './users.csv',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: process.env.POSTGRES_PASSWORD,
geminiApiKey: process.env.GEMINI_API_KEY,
// Query the data immediately after upload
query: 'Find all users from Mumbai who are older than 25'
});
console.log(`Uploaded ${result.rowsInserted} rows`);
if (result.queryResult && result.queryResult.success) {
console.log('Generated SQL:', result.queryResult.generatedQuery);
console.log('Query Results:', result.queryResult.results);
console.log(`Found ${result.queryResult.rowCount} matching records`);
}
} catch (error) {
console.error('Error:', error.message);
}
}
uploadAndQuery();

const { parseAndUpload } = require('csv-auto-datapipe');
async function uploadToMongo() {
const result = await parseAndUpload({
filePath: './products.csv',
dbType: 'mongodb',
host: 'localhost',
port: 27017,
database: 'ecommerce',
geminiApiKey: process.env.GEMINI_API_KEY
});
console.log('Uploaded to MongoDB:', result);
}
uploadToMongo();

const { naturalQuery } = require('csv-auto-datapipe');
async function aggregateQuery() {
const result = await naturalQuery({
query: 'Count users by city and show average age for each city',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: process.env.POSTGRES_PASSWORD,
database: 'users_db',
tableName: 'users',
geminiApiKey: process.env.GEMINI_API_KEY
});
console.log('Generated SQL:', result.generatedQuery);
console.log('Results:', result.results);
}
aggregateQuery();

const { parseAndUpload } = require('csv-auto-datapipe');
async function uploadWithAggregation() {
const result = await parseAndUpload({
filePath: './users.csv',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: process.env.POSTGRES_PASSWORD,
geminiApiKey: process.env.GEMINI_API_KEY,
// Run aggregation immediately after upload
query: 'Count users by city and show the average age for each city'
});
console.log(`Uploaded ${result.rowsInserted} rows`);
console.log('City Statistics:', result.queryResult.results);
}
uploadWithAggregation();

const { parseAndUpload } = require('csv-auto-datapipe');
await parseAndUpload({
filePath: './data.tsv',
dbType: 'postgres',
host: 'localhost',
user: 'postgres',
password: 'password',
parseOptions: {
delimiter: '\t', // Tab-separated
trimValues: true
},
geminiApiKey: process.env.GEMINI_API_KEY
});

const { parseAndUpload } = require('csv-auto-datapipe');
await parseAndUpload({
filePath: './orders.csv',
dbType: 'mysql',
host: 'localhost',
user: 'root',
password: 'password',
database: 'sales_db', // Specify database name
tableName: 'orders_2024', // Specify table name
geminiApiKey: process.env.GEMINI_API_KEY
});

require('dotenv').config();
const { parseAndUpload } = require('csv-auto-datapipe');
await parseAndUpload({
filePath: './data.csv',
dbType: 'postgres',
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT, 10), // explicit radix avoids surprises
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
database: process.env.DB_NAME,
geminiApiKey: process.env.GEMINI_API_KEY
});

Create a .env file:
# Gemini API
GEMINI_API_KEY=your_gemini_api_key
# PostgreSQL
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password
# MySQL
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_password
# MongoDB
MONGODB_HOST=localhost
MONGODB_PORT=27017

Use with dotenv:
require('dotenv').config();
const { parseAndUpload } = require('csv-auto-datapipe');

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Fahed Khan (@12fahed)
- Google Gemini AI for intelligent schema generation
- PostgreSQL, MySQL, and MongoDB communities
- Open source contributors
- Issues: GitHub Issues
- Documentation: Full Documentation
- Examples: See the `__test__` directory for more examples
We welcome contributions from the community! Whether it's bug fixes, new features, or documentation improvements, your help is appreciated.
- Read the Getting Started Guide
- Review Contributing Guidelines
- Set up your development environment
- Run tests: `node __test__/test-entire-system.js`
- Make your changes and submit a PR
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Ensure all tests pass
- Update documentation
- Submit a pull request
See CONTRIBUTING.md for detailed guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Free to use - Use in personal and commercial projects
- Modify - Change the code as you need
- Distribute - Share with others
- Private Use - Use privately without publishing
- No Warranty - Provided "as is" without warranty
Copyright (c) 2025 Fahed Khan
- npm Package
- GitHub Repository
- Getting Started Guide
- Contributing Guidelines
- Google Gemini API
- PostgreSQL Documentation
- MySQL Documentation
- MongoDB Documentation
Made with ❤️ by Fahed Khan