This project is designed to parse CSV files containing taxi trip data, identify and handle duplicate records, and bulk insert the valid records into an MS SQL database. The project leverages dependency injection for service management and follows best practices for configuration and error handling.
The database is hosted as an Amazon RDS SQL Server instance.
The project is organized into the following folders:
- Common: Contains common utilities and the DI (Dependency Injection) extension.
- Helpers: Provides helper methods for CSV parsing.
- Interfaces: Defines the interfaces for services used in the project.
- Models: Contains the data models.
- Services: Implements the services for CSV parsing, writing, and database operations.
The project depends on the following NuGet packages:
- CsvHelper
- Microsoft.Extensions.Configuration
- Microsoft.Extensions.DependencyInjection
- System.Data.SqlClient
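The DI extension in `Common` wires these packages together. A minimal sketch of how the registration might look, assuming hypothetical interface and service names (`ICsvParserService`, `CsvParserService`, etc. are illustrative placeholders, not taken from the project):

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

// Placeholder abstractions standing in for the project's real
// Interfaces/Services types; the actual names may differ.
public interface ICsvParserService { }
public class CsvParserService : ICsvParserService { }
public interface IDatabaseService { }
public class DatabaseService : IDatabaseService { }

public static class ServiceCollectionExtensions
{
    // Hypothetical DI extension: registers the configuration and the
    // CSV/database services into the container.
    public static IServiceCollection AddTaxiTripServices(
        this IServiceCollection services, IConfiguration configuration)
    {
        services.AddSingleton(configuration);
        services.AddSingleton<ICsvParserService, CsvParserService>();
        services.AddSingleton<IDatabaseService, DatabaseService>();
        return services;
    }
}
```

Consumers then resolve services from the built `ServiceProvider` rather than constructing them directly.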
The project expects a configuration file `appsettings.json` with the following structure:

```json
{
  "environmentVariables": {
    "ConnectionString": "your-database-connection-string",
    "csvUrl": "path-to-your-csv-file",
    "duplicateFilePath": "path-to-save-duplicates.csv"
  }
}
```
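These values can be loaded with Microsoft.Extensions.Configuration (the `AddJsonFile` call requires the Microsoft.Extensions.Configuration.Json package). A minimal sketch, with key names matching the JSON above:

```csharp
using System;
using Microsoft.Extensions.Configuration;

var configuration = new ConfigurationBuilder()
    .SetBasePath(AppContext.BaseDirectory)
    .AddJsonFile("appsettings.json", optional: false)
    .Build();

// Nested keys are addressed with a colon-separated path.
string? connectionString = configuration["environmentVariables:ConnectionString"];
string? csvUrl = configuration["environmentVariables:csvUrl"];
string? duplicateFilePath = configuration["environmentVariables:duplicateFilePath"];
```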
The table was created with the following command:

```sql
CREATE TABLE TaxiTrips (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    TpepPickupDatetime DATETIME NOT NULL,
    TpepDropoffDatetime DATETIME NOT NULL,
    PassengerCount INT NOT NULL,
    TripDistance DECIMAL(10, 2) NOT NULL,
    StoreAndFwdFlag NVARCHAR(3) NOT NULL,
    PULocationID INT NOT NULL,
    DOLocationID INT NOT NULL,
    FareAmount DECIMAL(10, 2) NOT NULL,
    TipAmount DECIMAL(10, 2) NOT NULL
);
```
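Bulk insertion into this table can be done with `SqlBulkCopy` from System.Data.SqlClient. A sketch, assuming the parsed records have been staged in a `DataTable` whose columns mirror the schema above (the connection string placeholder stands in for the value from `appsettings.json`):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Stand-in for the value read from configuration.
string connectionString = "your-database-connection-string";

// Columns mirror the TaxiTrips schema; Id is omitted because
// it is an IDENTITY column generated by the server.
var table = new DataTable("TaxiTrips");
table.Columns.Add("TpepPickupDatetime", typeof(DateTime));
table.Columns.Add("TpepDropoffDatetime", typeof(DateTime));
table.Columns.Add("PassengerCount", typeof(int));
table.Columns.Add("TripDistance", typeof(decimal));
table.Columns.Add("StoreAndFwdFlag", typeof(string));
table.Columns.Add("PULocationID", typeof(int));
table.Columns.Add("DOLocationID", typeof(int));
table.Columns.Add("FareAmount", typeof(decimal));
table.Columns.Add("TipAmount", typeof(decimal));

using var connection = new SqlConnection(connectionString);
connection.Open();
using var bulkCopy = new SqlBulkCopy(connection)
{
    DestinationTableName = "TaxiTrips",
    BatchSize = 10_000
};
// Map source columns to destination columns by name so column
// order in the DataTable does not matter.
foreach (DataColumn column in table.Columns)
    bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
bulkCopy.WriteToServer(table);
```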
- Ensure `appsettings.json` is properly configured.
- Place the CSV file in the build output directory (`bin/Debug/net8.0`).
- The connection string will be sent by e-mail.
- Build the project (or use Docker).
- Run the project.
If we had to work with a 10 GB file, or other really large datasets, we might want to look into distributed processing frameworks such as Apache Spark or Hadoop. These tools are designed for big data and can process large files more efficiently than a single machine.
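Before reaching for a distributed framework, a single machine can often cope with a file of that size by streaming it in fixed-size batches instead of loading it into memory. A minimal sketch, assuming a hypothetical file name and a placeholder batch handler:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

const int BatchSize = 50_000;
var batch = new List<string>(BatchSize);

using var reader = new StreamReader("trips.csv"); // hypothetical file name
reader.ReadLine();                                // skip the header row
while (reader.ReadLine() is { } line)
{
    batch.Add(line);
    if (batch.Count == BatchSize)
    {
        // Parse and bulk-insert this batch, then release the memory.
        ProcessBatch(batch);
        batch.Clear();
    }
}
if (batch.Count > 0)
    ProcessBatch(batch); // flush the final partial batch

static void ProcessBatch(List<string> rows)
{
    // Placeholder: parse the rows and hand them to the bulk-insert service.
    Console.WriteLine($"Processed {rows.Count} rows");
}
```

Memory usage stays bounded by the batch size regardless of the file size, which pairs naturally with `SqlBulkCopy`'s batched writes.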