DataForge

A powerful command-line tool for importing structured data files (JSON, XML, CSV) into SQL Server databases with automatic schema inference and table generation.

Features

  • Multi-Format Support: Import data from JSON, XML, and CSV files
  • Automatic Schema Inference: Analyzes your data files and automatically generates appropriate database schemas
  • SQL Server Integration: Seamlessly connects to SQL Server instances (local or remote)
  • Windows Authentication: Supports both Windows Authentication and SQL Server Authentication
  • Hierarchical Data Handling: Intelligently converts nested JSON/XML structures into relational tables with foreign key relationships
  • Smart Server Discovery: Automatically discovers available SQL Server instances on your network
  • Interactive Wizard: User-friendly step-by-step wizard interface for configuration
  • Progress Tracking: Real-time progress bars during data import operations
  • Transaction Safety: Uses database transactions to ensure data integrity (rollback on errors)
  • Database Auto-Creation: Automatically creates the target database if it doesn't exist

Prerequisites

  • .NET 10.0 Runtime or SDK
  • SQL Server (LocalDB, Express, Developer, Standard, or Enterprise)
  • Windows OS (for Windows Authentication support)

Installation

From Source

  1. Clone the repository:
     git clone https://github.com/Vexelior/DataForge.git
     cd DataForge
  2. Build the project:
     dotnet build
  3. Run the application:
     dotnet run

Build as Executable

dotnet publish -c Release -r win-x64 --self-contained

The executable will be available in bin/Release/net10.0/win-x64/publish/

Usage

Quick Start

  1. Run the application:

    dotnet run
  2. Select your data file:

    • Enter the full path to your JSON, XML, or CSV file
    • The format will be automatically detected
  3. Analyze the data:

    • The tool will analyze your file and display statistics
    • Review the detected structure and record count
  4. Configure database connection:

    • Type list to discover available SQL Server instances
    • Select a server or enter a custom connection string
    • Choose authentication method (Windows or SQL Server)
    • Specify the target database name (will be created if needed)
    • Type test to verify the connection
  5. Review schema:

    • Preview the automatically inferred database schema
    • See table structures, column types, and relationships
  6. Import data:

    • Type run to start the import process
    • Monitor progress with the real-time progress bar
    • Wait for completion confirmation
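
The two authentication choices in step 4 differ only in the connection string they produce. A minimal sketch in Python (the helper name and parameter names are illustrative, not part of DataForge, which is a .NET application):

```python
def build_connection_string(server, database, user=None, password=None):
    """Build a SQL Server connection string for Windows or SQL auth."""
    parts = [f"Server={server}", f"Database={database}"]
    if user is None:
        # Windows Authentication: the current process identity is used.
        parts.append("Integrated Security=True")
    else:
        # SQL Server Authentication: explicit credentials.
        parts.append(f"User Id={user}")
        parts.append(f"Password={password}")
    return ";".join(parts)

print(build_connection_string(r"localhost\SQLEXPRESS", "CustomerDB"))
```

Either form targets the same database; only the credential section changes.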

Navigation Commands

  • next - Move to the next step
  • back - Return to the previous step
  • exit - Exit the application
  • list - Discover SQL Server instances (Database step)
  • test - Test database connection (Database step)
  • reset - Reset database configuration (Database step)
  • run - Execute data import (Import step)

Example Workflow

=== Select Source File ===
Enter path to file:
> C:\data\customers.json

=== Analyze Data ===
Analyzing file...
✓ Found 1,250 records

=== Database Configuration ===
Type 'list' to show common SQL Server options
> list

Common SQL Server connection options:
  1. localhost\SQLEXPRESS
  2. (localdb)\MSSQLLocalDB

> 1

Detected local SQL Server instance. Use Windows Authentication? (y/n):
> y

Enter database name (will be created if it doesn't exist):
> CustomerDB

✓ Database 'CustomerDB' created successfully!

=== Schema Preview ===
Table: customers
  Columns:
    - id (INT)
    - name (NVARCHAR(255))
    - email (NVARCHAR(255))
    - created_date (DATETIME)

=== Import Data ===
Type 'run' to begin the import.
> run

Starting data import...
Creating database schema...
Importing data...
[####################] 100% (1,250/1,250)
Import completed successfully!

Supported File Formats

JSON

  • Single objects or arrays of objects
  • Nested objects (converted to related tables)
  • Arrays of primitives and objects
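
The parent/child split described above can be sketched in a few lines of Python (the function is illustrative only; DataForge itself is a .NET application and its internals may differ):

```python
import json

def split_record(record, parent_id):
    """Split one JSON record into a parent row plus child-table rows,
    mimicking the nested-object-to-related-table conversion."""
    parent, children = {"id": parent_id}, []
    for key, value in record.items():
        if isinstance(value, dict):
            # Nested object -> one row in a child table, FK to the parent.
            children.append((key, {**value, "parent_id": parent_id}))
        elif isinstance(value, list):
            # Array -> one child row per element (primitives get a value column).
            for item in value:
                row = item if isinstance(item, dict) else {"value": item}
                children.append((key, {**row, "parent_id": parent_id}))
        else:
            parent[key] = value
    return parent, children

doc = json.loads('{"name": "Ada", "address": {"city": "London"}, "tags": ["a", "b"]}')
parent, children = split_record(doc, 1)
```

Here `address` becomes a row in an `address` child table and each `tags` element becomes a row in a `tags` child table, all carrying a `parent_id` back-reference.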

XML

  • Element-based structures
  • Nested elements (converted to related tables)
  • Attributes mapped to columns

CSV

  • Headers in the first row
  • Automatic type detection
  • Handles quoted values and delimiters
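
Quoted values are the usual trip hazard in CSV parsing; Python's standard csv module demonstrates the expected behaviour (this illustrates the format rules, not DataForge's own parser):

```python
import csv
import io

# A quoted field may contain the delimiter; doubled quotes escape a quote.
data = 'name,notes\n"Smith, John","said ""hi"""\n'
rows = list(csv.DictReader(io.StringIO(data)))
print(rows[0])  # {'name': 'Smith, John', 'notes': 'said "hi"'}
```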

Architecture

The application is built using a wizard-based architecture with the following key components:

Core Components

  • WizardEngine: Orchestrates the step-by-step workflow
  • WizardStep: Base class for all wizard steps with validation and state management
  • IWizardStateStore: Handles persistence of wizard state

Wizard Steps

  1. SelectFileStep: File selection and format detection
  2. AnalyzeStep: Data analysis and statistics
  3. DatabaseStep: Database connection configuration
  4. SchemaStep: Schema inference and preview
  5. ImportExecutionStep: Data import execution

Schema Components

  • IDataAnalyzer: Interface for analyzing data files
  • IDataImporter: Interface for importing data to SQL Server
  • JsonSchemaInferer: Infers database schema from JSON
  • HierarchicalJsonAnalyzer: Handles nested JSON structures
  • TableDefinition: Represents database table structure
  • ColumnDefinition: Represents column metadata

Utilities

  • SqlServerDiscovery: Discovers SQL Server instances on the network
  • FileFormatDetector: Detects file format from extension and content
  • ProgressBar: Console-based progress visualization
  • ImportProgressTracker: Tracks import progress
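
A detector along the lines of FileFormatDetector can combine the file extension with a peek at the content. A minimal Python sketch (the function name and return values are assumptions, not DataForge's API):

```python
import os

def detect_format(path, sample):
    """Guess the file format from the extension first, then the content."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".json", ".xml", ".csv"):
        return ext[1:]
    head = sample.lstrip()
    if head.startswith(("{", "[")):
        return "json"
    if head.startswith("<"):
        return "xml"
    # Fall back to CSV when the first line looks like delimited headers.
    return "csv" if "," in sample.splitlines()[0] else "unknown"
```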

Technical Details

Dependencies

  • Microsoft.Data.SqlClient (6.1.3): Modern SQL Server client library
  • System.Data.SqlClient (4.8.6): Legacy SQL Server support

Schema Inference

DataForge uses intelligent type inference to determine appropriate SQL Server data types:

  • Numeric Detection: INT, BIGINT, DECIMAL
  • Date/Time Detection: DATETIME, DATE
  • Boolean Detection: BIT
  • Text: NVARCHAR with appropriate lengths
  • Binary: VARBINARY for Base64 data
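
The detection order above amounts to a fall-through over sample values, from the most specific type to plain text. A simplified Python illustration (DataForge's actual rules may differ, e.g. in width sizing and Base64 detection, which is omitted here):

```python
from datetime import datetime

def infer_sql_type(values):
    """Pick the narrowest SQL Server type fitting every sample value."""
    def is_int(v):
        return v.lstrip("-").isdigit()
    def is_decimal(v):
        try:
            float(v)
            return True
        except ValueError:
            return False
    def is_date(v):
        try:
            datetime.fromisoformat(v)
            return True
        except ValueError:
            return False

    if all(v.lower() in ("true", "false") for v in values):
        return "BIT"
    if all(is_int(v) for v in values):
        # Widen to BIGINT when any value overflows the 32-bit range.
        return "INT" if all(abs(int(v)) < 2**31 for v in values) else "BIGINT"
    if all(is_decimal(v) for v in values):
        return "DECIMAL(18, 4)"
    if all(is_date(v) for v in values):
        return "DATETIME"
    # Fall through to text, sized to the longest sample.
    return f"NVARCHAR({max((len(v) for v in values), default=1)})"
```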

Hierarchical Data Handling

For nested JSON/XML structures:

  1. Parent table is created with a primary key
  2. Child tables are created with foreign keys
  3. Relationships are automatically established
  4. Cascading foreign key references preserve referential integrity
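
Steps 1–3 boil down to emitting parent and child DDL. A hedged Python sketch of a child-table generator (the table and column names echo the earlier schema preview; the generator itself is illustrative, not DataForge's code):

```python
def child_table_ddl(parent, child, columns):
    """Emit CREATE TABLE DDL for a child table with a FK to its parent."""
    cols = ["    id INT IDENTITY(1,1) PRIMARY KEY"]
    cols += [f"    {name} {sqltype}" for name, sqltype in columns.items()]
    # Foreign key back to the parent; cascading delete keeps integrity.
    cols.append(f"    {parent}_id INT NOT NULL "
                f"REFERENCES {parent}(id) ON DELETE CASCADE")
    return f"CREATE TABLE {child} (\n" + ",\n".join(cols) + "\n);"

print(child_table_ddl("customers", "addresses", {"city": "NVARCHAR(255)"}))
```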

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is open source and available under the MIT License.

Author

Vexelior

Acknowledgments

  • Built with .NET 10.0
  • Uses Microsoft.Data.SqlClient for SQL Server connectivity
  • Inspired by the need for quick data import tools in development workflows

Support

If you encounter any issues or have questions:

  1. Check existing Issues
  2. Create a new issue with detailed information
  3. Include sample data files (if possible) to reproduce the problem

Roadmap

Future enhancements under consideration:

  • Support for PostgreSQL and MySQL
  • Custom schema mapping configuration
  • Batch import from multiple files
  • Export functionality (database to file)
  • GUI version
  • Docker support
  • Configuration file support for automation
  • Advanced type inference options
  • Data transformation rules

Built with .NET 10.0
