Skip to content

12fahed/csv-auto-datapipe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

csv-auto-datapipe

npm version License: MIT

csv-auto-datapipe is an intelligent, Node.js–based data ingestion toolkit designed to simplify the process of converting CSV files into structured database records. The core idea is to provide a fully automated data pipeline that can parse raw CSV files, understand their structure, infer database schemas using AI, and then upload the transformed data into any connected database, all in a clean, efficient, and developer-friendly manner.

Features

  • Zero-Dependency CSV Parser: Fast, lightweight CSV parsing with support for nested objects via dot notation
  • AI-Powered Schema Generation: Automatically generates optimal database schemas using Google Gemini AI
  • Natural Language Queries: Query databases using plain English - no SQL knowledge required
  • Multi-Database Support: Works with PostgreSQL, MySQL, and MongoDB
  • Smart Data Handling: Intelligently merges related fields and uses JSON/JSONB for complex structures
  • Automatic Type Inference: Converts strings to numbers, booleans, and null values automatically
  • Nested Object Support: Handles dot notation (e.g., name.firstName, address.city) elegantly
  • Production-Ready: Follows database best practices with proper constraints, indexes, and data types

📚 Documentation

Table of Contents

Installation

npm install csv-auto-datapipe

Database Drivers

Install the database driver(s) you need:

# For PostgreSQL
npm install pg

# For MySQL
npm install mysql2

# For MongoDB
npm install mongodb

Google Gemini API Key

Get your free API key from Google AI Studio

Quick Start

1. Simple CSV Parsing

const { CSVParser } = require('csv-auto-datapipe');

const parser = new CSVParser();
const csvString = `name,age,city
John,30,New York
Jane,25,Los Angeles`;

const data = parser.parse(csvString);
console.log(data);
// Output: [
//   { name: 'John', age: 30, city: 'New York' },
//   { name: 'Jane', age: 25, city: 'Los Angeles' }
// ]

2. Parse CSV with Nested Objects

const csvString = `name.firstName,name.lastName,age,address.city
Fahed,Khan,23,Mumbai
Rohit,Prasad,35,Pune`;

const data = parser.parse(csvString);
console.log(data);
// Output: [
//   {
//     name: { firstName: 'Fahed', lastName: 'Khan' },
//     age: 23,
//     address: { city: 'Mumbai' }
//   },
//   ...
// ]

3. Parse and Upload to Database

const { parseAndUpload } = require('csv-auto-datapipe');

await parseAndUpload({
  filePath: './users.csv',
  dbType: 'postgres',
  host: 'localhost',
  user: 'postgres',
  password: 'your_password',
  geminiApiKey: process.env.GEMINI_API_KEY,
  aiSchema: true
});

4. Parse, Upload, and Query in One Step

const { parseAndUpload } = require('csv-auto-datapipe');

const result = await parseAndUpload({
  filePath: './users.csv',
  dbType: 'postgres',
  host: 'localhost',
  user: 'postgres',
  password: 'your_password',
  geminiApiKey: process.env.GEMINI_API_KEY,
  aiSchema: true,
  query: 'Find all users older than 25'
});

console.log('Upload:', result.rowsInserted, 'rows');
console.log('Query Results:', result.queryResult.results);
console.log('Generated SQL:', result.queryResult.generatedQuery);

CSV Parser

The CSV parser is a standalone, zero-dependency module that converts CSV data to JSON.

Basic Usage

const { CSVParser } = require('csv-auto-datapipe');

const parser = new CSVParser();
const data = parser.parse(csvString, options);

Parser Options

Option Type Default Description
delimiter string , Field delimiter character
trimValues boolean true Trim whitespace from values

Features

Type Inference

Automatically converts values to appropriate types:

const csv = `name,age,active,score
Alice,25,true,98.5`;

const data = parser.parse(csv);
// { name: 'Alice', age: 25, active: true, score: 98.5 }

Nested Objects

Supports dot notation for creating nested structures:

const csv = `user.name,user.email,user.settings.theme
John,john@example.com,dark`;

const data = parser.parse(csv);
// {
//   user: {
//     name: 'John',
//     email: 'john@example.com',
//     settings: { theme: 'dark' }
//   }
// }

Quoted Fields

Handles commas and quotes within fields:

const csv = `name,address
"Smith, John","123 Main St, Apt 4"`;

const data = parser.parse(csv);
// { name: 'Smith, John', address: '123 Main St, Apt 4' }

Custom Delimiter

const csv = `name;age;city
John;30;NYC`;

const data = parser.parse(csv, { delimiter: ';' });

Parse and Upload

The main feature that combines CSV parsing with AI-powered schema generation and database insertion.

Configuration

await parseAndUpload({
  // Required
  filePath: './data.csv',
  dbType: 'postgres', // 'postgres', 'mysql', or 'mongodb'
  geminiApiKey: 'your-api-key',
  
  // Database Connection
  host: 'localhost',
  port: 5432,
  user: 'postgres',
  password: 'password',
  database: 'mydb', // Optional: will be auto-generated if not provided
  
  // Optional
  tableName: 'users', // Optional: will be auto-generated if not provided
  aiSchema: true,
  createDatabase: true,
  parseOptions: { delimiter: ',', trimValues: true },
  
  // NEW: Optional natural language query to run after upload
  query: 'Find all users older than 25'
});

Parameters

Required Parameters

Parameter Type Description
filePath string Path to the CSV file
dbType string Database type: 'postgres', 'mysql', or 'mongodb'
geminiApiKey string Google Gemini API key

Connection Parameters

Parameter Type Default Description
host string 'localhost' Database host
port number DB-specific Database port
user or username string - Database username
password string - Database password
database string Auto-generated Database name

Optional Parameters

Parameter Type Default Description
tableName string Auto-generated Table/collection name
aiSchema boolean true Use AI for schema generation
createDatabase boolean true Create database if not exists
parseOptions object {} CSV parser options
query string undefined NEW: Natural language query to run after upload

Return Value

{
  success: true,
  rowsParsed: 100,
  rowsInserted: 100,
  database: 'users_db',
  table: 'users',
  schema: { /* Generated schema object */ },
  
  // NEW: queryResult is included if query parameter was provided
  queryResult: {
    success: true,
    originalQuery: 'Find all users older than 25',
    generatedQuery: 'SELECT * FROM users WHERE age > $1',
    queryType: 'SELECT',
    explanation: 'Retrieves all users where age is greater than 25',
    results: [ /* query results */ ],
    rowCount: 10,
    executionTime: 45
  }
}

Natural Language Query

Query your databases using plain English! No need to write SQL or NoSQL queries manually.

Quick Start

const { naturalQuery } = require('csv-auto-datapipe');

const result = await naturalQuery({
  query: 'Find all users older than 25',
  dbType: 'postgres',
  host: 'localhost',
  user: 'postgres',
  password: 'password',
  database: 'mydb',
  tableName: 'users',
  geminiApiKey: process.env.GEMINI_API_KEY
});

console.log(result.results); // Query results
console.log(result.generatedQuery); // Generated SQL query
console.log(result.explanation); // Human-readable explanation

Natural Language Query Features

  • Multi-Database: Works with PostgreSQL, MySQL, and MongoDB
  • Intelligent Translation: Converts natural language to optimized database queries
  • Query Types: Supports SELECT, aggregations, filtering, sorting, and more
  • Auto-Explanation: Provides clear explanation of what each query does
  • Performance Metrics: Returns execution time and row counts

Example Queries

// Simple filtering
"Find all people older than 30"
"Show users from Mumbai"

// Aggregation
"Count users by city"
"What is the average age of customers"

// Complex queries
"Find people between ages 20 and 30 living in Mumbai, sorted by age"
"Group orders by month and show total revenue"

// Pattern matching
"Find users whose name starts with A"
"Search products containing 'laptop'"

Advanced Usage with Schema Context

const { naturalQuery, getSchemaInfo } = require('csv-auto-datapipe');

// Get schema for better accuracy
const schemaInfo = await getSchemaInfo({
  dbType: 'postgres',
  host: 'localhost',
  user: 'postgres',
  password: 'password',
  database: 'mydb',
  tableName: 'users'
});

// Use schema in query
const result = await naturalQuery({
  query: 'Show me the top 5 highest paid employees',
  dbType: 'postgres',
  host: 'localhost',
  user: 'postgres',
  password: 'password',
  database: 'mydb',
  tableName: 'users',
  schemaInfo: schemaInfo,
  geminiApiKey: process.env.GEMINI_API_KEY
});

Supported Databases

PostgreSQL

await parseAndUpload({
  filePath: './data.csv',
  dbType: 'postgres',
  host: 'localhost',
  port: 5432,
  user: 'postgres',
  password: 'password',
  geminiApiKey: process.env.GEMINI_API_KEY
});

Features:

  • JSONB support for complex nested data
  • Full transaction support
  • Advanced indexing

MySQL

await parseAndUpload({
  filePath: './data.csv',
  dbType: 'mysql',
  host: 'localhost',
  port: 3306,
  user: 'root',
  password: 'password',
  geminiApiKey: process.env.GEMINI_API_KEY
});

Features:

  • JSON column type support
  • Automatic type conversion
  • Optimized for performance

MongoDB

await parseAndUpload({
  filePath: './data.csv',
  dbType: 'mongodb',
  host: 'localhost',
  port: 27017,
  geminiApiKey: process.env.GEMINI_API_KEY
});

Features:

  • Native nested document support
  • Schema validation
  • Flexible document structure

AI Schema Generation

Google Gemini intelligently analyzes your CSV headers and generates optimal database schemas.

How It Works

  1. Analyzes Headers: Examines CSV column names and patterns
  2. Identifies Relationships: Detects related fields (e.g., name.firstName, name.lastName)
  3. Decides Structure:
    • Merges name parts into single columns
    • Groups address fields into JSONB/JSON
    • Creates appropriate data types
  4. Generates Schema: Produces production-ready database schema
  5. Maps Fields: Creates field mapping for data transformation

Example Transformation

Input CSV:

name.firstName,name.lastName,age,address.line1,address.city,address.state

Generated Schema (PostgreSQL):

CREATE TABLE people (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,  -- Merged: firstName + lastName
  age INTEGER,
  address JSONB                -- Grouped: {line1, city, state}
);

Field Mapping:

{
  "name.firstName": "name",
  "name.lastName": "name",
  "age": "age",
  "address.line1": "address",
  "address.city": "address",
  "address.state": "address"
}

Transformed Data:

// Original
{ "name.firstName": "Fahed", "name.lastName": "Khan", "age": 23, "address.line1": "Raushan Nagar", "address.city": "Mumbai" }

// After transformation
{ "name": "Fahed Khan", "age": 23, "address": "{\"line1\": \"Raushan Nagar\", \"city\": \"Mumbai\"}" }

API Reference

CSVParser

Constructor

const parser = new CSVParser();

Methods

parse(csvString, options)

Parses CSV string and returns array of objects.

Parameters:

  • csvString (string): The CSV data
  • options (object): Parser options
    • delimiter (string): Field delimiter (default: ,)
    • trimValues (boolean): Trim whitespace (default: true)

Returns: Array<Object>

Throws: Error if CSV is invalid or field count mismatch

parseAndUpload(options)

Parses CSV file and uploads data to database with AI-generated schema.

Parameters: See Configuration

Returns: Promise<Object> with result summary

Throws: Error if file not found, database connection fails, or schema generation fails

SchemaGenerator

Constructor

const generator = new SchemaGenerator(apiKey);

Methods

generateSchema(headers, dbType, tableName, databaseName)

Generates database schema using Gemini AI.

Parameters:

  • headers (Array): CSV headers
  • dbType (string): Database type
  • tableName (string|null): Optional table name
  • databaseName (string|null): Optional database name

Returns: Promise<Object> with schema information

QueryExecutor

Constructor

const executor = new QueryExecutor(geminiApiKey);

Methods

executeNaturalQuery(options)

Execute a natural language query against a database.

Parameters:

  • query (string): Natural language query
  • dbType (string): Database type
  • dbConnection (object): Database connection instance
  • database (string): Database name
  • tableName (string): Optional table/collection name
  • schemaInfo (object): Optional schema information

Returns: Promise<Object> with query results and metadata

naturalQuery(options)

High-level API for natural language queries. See Natural Language Query section.

getSchemaInfo(options)

Retrieve database schema information.

Parameters:

  • dbType (string): Database type
  • host (string): Database host
  • user (string): Database username
  • password (string): Database password
  • database (string): Database name
  • tableName (string): Table/collection name

Returns: Promise<Object> with schema details

DatabaseConnector

Constructor

const connector = new DatabaseConnector(dbType, connectionParams);

Methods

connect()

Connects to the database.

Returns: Promise<void>

createDatabase(databaseName)

Creates a database if it doesn't exist.

Returns: Promise<void>

createSchema(schema)

Creates table/collection based on schema.

Returns: Promise<void>

insertData(tableOrCollection, data)

Inserts data into database.

Returns: Promise<Object> with insertion count

close()

Closes database connection.

Returns: Promise<void>

Examples

Example 1: Basic CSV Parsing

const { CSVParser } = require('csv-auto-datapipe');
const fs = require('fs');

const parser = new CSVParser();
const csvContent = fs.readFileSync('./data.csv', 'utf-8');
const data = parser.parse(csvContent);

console.log(`Parsed ${data.length} rows`);
console.log(data[0]); // First row

Example 2: Natural Language Query

const { naturalQuery } = require('csv-auto-datapipe');

async function queryDatabase() {
  const result = await naturalQuery({
    query: 'Find all users from Mumbai who are older than 25',
    dbType: 'postgres',
    host: 'localhost',
    user: 'postgres',
    password: process.env.POSTGRES_PASSWORD,
    database: 'users_db',
    tableName: 'users',
    geminiApiKey: process.env.GEMINI_API_KEY
  });
  
  console.log('Generated Query:', result.generatedQuery);
  console.log('Results:', result.results);
  console.log('Explanation:', result.explanation);
}

queryDatabase();

Example 3: Parse and Upload to PostgreSQL

const { parseAndUpload } = require('csv-auto-datapipe');

async function uploadToPostgres() {
  try {
    const result = await parseAndUpload({
      filePath: './users.csv',
      dbType: 'postgres',
      host: 'localhost',
      user: 'postgres',
      password: process.env.POSTGRES_PASSWORD,
      geminiApiKey: process.env.GEMINI_API_KEY
    });
    
    console.log(`Success! Inserted ${result.rowsInserted} rows into ${result.database}.${result.table}`);
  } catch (error) {
    console.error('Upload failed:', error.message);
  }
}

uploadToPostgres();

Example 3a: Parse, Upload, and Query

const { parseAndUpload } = require('csv-auto-datapipe');

async function uploadAndQuery() {
  try {
    const result = await parseAndUpload({
      filePath: './users.csv',
      dbType: 'postgres',
      host: 'localhost',
      user: 'postgres',
      password: process.env.POSTGRES_PASSWORD,
      geminiApiKey: process.env.GEMINI_API_KEY,
      
      // Query the data immediately after upload
      query: 'Find all users from Mumbai who are older than 25'
    });
    
    console.log(`Uploaded ${result.rowsInserted} rows`);
    
    if (result.queryResult && result.queryResult.success) {
      console.log('Generated SQL:', result.queryResult.generatedQuery);
      console.log('Query Results:', result.queryResult.results);
      console.log(`Found ${result.queryResult.rowCount} matching records`);
    }
  } catch (error) {
    console.error('Error:', error.message);
  }
}

uploadAndQuery();

Example 3b: Parse and Upload to MongoDB

const { parseAndUpload } = require('csv-auto-datapipe');

async function uploadToMongo() {
  const result = await parseAndUpload({
    filePath: './products.csv',
    dbType: 'mongodb',
    host: 'localhost',
    port: 27017,
    database: 'ecommerce',
    geminiApiKey: process.env.GEMINI_API_KEY
  });
  
  console.log('Uploaded to MongoDB:', result);
}

uploadToMongo();

Example 4: Natural Language Aggregation Query

const { naturalQuery } = require('csv-auto-datapipe');

async function aggregateQuery() {
  const result = await naturalQuery({
    query: 'Count users by city and show average age for each city',
    dbType: 'postgres',
    host: 'localhost',
    user: 'postgres',
    password: process.env.POSTGRES_PASSWORD,
    database: 'users_db',
    tableName: 'users',
    geminiApiKey: process.env.GEMINI_API_KEY
  });
  
  console.log('Generated SQL:', result.generatedQuery);
  console.log('Results:', result.results);
}

aggregateQuery();

Example 4a: Upload with Aggregation Query

const { parseAndUpload } = require('csv-auto-datapipe');

async function uploadWithAggregation() {
  const result = await parseAndUpload({
    filePath: './users.csv',
    dbType: 'postgres',
    host: 'localhost',
    user: 'postgres',
    password: process.env.POSTGRES_PASSWORD,
    geminiApiKey: process.env.GEMINI_API_KEY,
    
    // Run aggregation immediately after upload
    query: 'Count users by city and show the average age for each city'
  });
  
  console.log(`Uploaded ${result.rowsInserted} rows`);
  console.log('City Statistics:', result.queryResult.results);
}

uploadWithAggregation();

Example 5: Custom Delimiter and Options

const { parseAndUpload } = require('csv-auto-datapipe');

await parseAndUpload({
  filePath: './data.tsv',
  dbType: 'postgres',
  host: 'localhost',
  user: 'postgres',
  password: 'password',
  parseOptions: {
    delimiter: '\t',  // Tab-separated
    trimValues: true
  },
  geminiApiKey: process.env.GEMINI_API_KEY
});

Example 6: Specify Database and Table Names

const { parseAndUpload } = require('csv-auto-datapipe');

await parseAndUpload({
  filePath: './orders.csv',
  dbType: 'mysql',
  host: 'localhost',
  user: 'root',
  password: 'password',
  database: 'sales_db',      // Specify database name
  tableName: 'orders_2024',  // Specify table name
  geminiApiKey: process.env.GEMINI_API_KEY
});

Example 7: Using Environment Variables

require('dotenv').config();
const { parseAndUpload } = require('csv-auto-datapipe');

await parseAndUpload({
  filePath: './data.csv',
  dbType: 'postgres',
  host: process.env.DB_HOST,
  port: parseInt(process.env.DB_PORT),
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
  geminiApiKey: process.env.GEMINI_API_KEY
});

Environment Variables

Create a .env file:

# Gemini API
GEMINI_API_KEY=your_gemini_api_key

# PostgreSQL
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password

# MySQL
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_password

# MongoDB
MONGODB_HOST=localhost
MONGODB_PORT=27017

Use with dotenv:

require('dotenv').config();
const { parseAndUpload } = require('csv-auto-datapipe');

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Fahed Khan (@12fahed)

Acknowledgments

  • Google Gemini AI for intelligent schema generation
  • PostgreSQL, MySQL, and MongoDB communities
  • Open source contributors

Support

Contributing

We welcome contributions from the community! Whether it's bug fixes, new features, or documentation improvements, your help is appreciated.

For Contributors

  1. Read the Getting Started Guide
  2. Review Contributing Guidelines
  3. Set up your development environment
  4. Run tests: node __test__/test-entire-system.js
  5. Make your changes and submit a PR

Quick Contribution Checklist

  • Fork the repository
  • Create a feature branch
  • Make your changes with tests
  • Ensure all tests pass
  • Update documentation
  • Submit a pull request

See CONTRIBUTING.md for detailed guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

What this means:

  • Free to use - Use in personal and commercial projects
  • Modify - Change the code as you need
  • Distribute - Share with others
  • Private Use - Use privately without publishing
  • No Warranty - Provided "as is" without warranty

Copyright (c) 2025 Fahed Khan

Links


Made with ❤️ by Fahed Khan

About

A powerful Node.js package for parsing CSV files and automatically uploading data to databases with dynamically generating Data Specific schemas

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors