Cursively

A fast, RFC 4180-conforming CSV reading library for .NET. Written in C#.

License CI (AppVeyor) NuGet MyGet (pre-release)

Documentation

Documentation is currently being published as GitHub Pages.

Usage

Create a subclass of CsvReaderVisitorBase (or of one of its built-in subclasses) that contains your own logic for processing the individual elements in order. You then have a few options for feeding data to it.

Example Visitor

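using System;
using System.IO;
using System.Text;

using Cursively;

public sealed class MyVisitor : CsvReaderVisitorBase
{
    // Stateful decoder: it carries over a multi-byte UTF-8 sequence that is split across chunks.
    private readonly Decoder _utf8Decoder = Encoding.UTF8.GetDecoder();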

    private readonly char[] _buffer;

    private int _bufferConsumed;

    public MyVisitor(int maxFieldLength) =>
        _buffer = new char[maxFieldLength];

    public override void VisitPartialFieldContents(ReadOnlySpan<byte> chunk) =>
        VisitFieldContents(chunk, flush: false);

    public override void VisitEndOfField(ReadOnlySpan<byte> chunk) =>
        VisitFieldContents(chunk, flush: true);

    public override void VisitEndOfRecord() =>
        Console.WriteLine("End of fields for this record.");

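    private void VisitFieldContents(ReadOnlySpan<byte> chunk, bool flush)
    {
        // Count the decoded chars before writing, so the fixed-size buffer cannot overflow.
        int charCount = _utf8Decoder.GetCharCount(chunk, flush);
        if (charCount + _bufferConsumed <= _buffer.Length)
        {
            // Append after whatever the earlier chunks of this field already produced.
            _utf8Decoder.GetChars(chunk, new Span<char>(_buffer, _bufferConsumed, charCount), flush);
            _bufferConsumed += charCount;
        }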
        else
        {
            throw new InvalidDataException($"Field is longer than {_buffer.Length} characters.");
        }

        if (flush)
        {
            Console.Write("Field: ");
            Console.WriteLine(_buffer, 0, _bufferConsumed);
            _bufferConsumed = 0;
        }
    }
}
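
The visitor above passes flush: false for partial chunks and flush: true only at the end of a field. The following is a minimal sketch of why that matters, using only System.Text and no Cursively types. The class and method names (ChunkedDecodingDemo, DecodeChunks) are hypothetical and exist only for this illustration: a single Decoder instance holds on to an incomplete UTF-8 sequence at the end of one chunk and completes it with the next.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of Cursively.
public static class ChunkedDecodingDemo
{
    // Decodes UTF-8 bytes that arrive in separate chunks, using one stateful
    // Decoder so that a character split across two chunks is reassembled.
    public static string DecodeChunks(params byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var result = new StringBuilder();
        for (int i = 0; i < chunks.Length; i++)
        {
            // Flush only on the final chunk, mirroring what MyVisitor does
            // in VisitEndOfField.
            bool flush = i == chunks.Length - 1;
            byte[] chunk = chunks[i];

            // GetCharCount takes the decoder's pending bytes into account.
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length, flush)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, chars, 0, flush);
            result.Append(chars, 0, written);
        }

        return result.ToString();
    }
}
```

For example, "é" is the two-byte sequence 0xC3 0xA9 in UTF-8. Calling DecodeChunks(new byte[] { 0x63, 0x61, 0x66, 0xC3 }, new byte[] { 0xA9 }) returns "café", even though the chunk boundary falls in the middle of that character. Decoding each chunk independently with Encoding.UTF8.GetString would instead produce replacement characters at the boundary.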

Fastest

All of the other methods of processing the data are built on top of this, so it gives you the most control:

  1. Create a new instance of your visitor.
  2. Create a new instance of CsvTokenizer.
  3. Call CsvTokenizer.ProcessNextChunk for each chunk of the file.
  4. Call CsvTokenizer.ProcessEndOfStream after the last chunk of the file.

Example:

public static void ProcessCsvFile(string csvFilePath)
{
    var myVisitor = new MyVisitor(maxFieldLength: 1000);
    var tokenizer = new CsvTokenizer();
    using (var file = File.OpenRead(csvFilePath))
    {
        Console.WriteLine($"Started reading '{csvFilePath}'.");
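        // One reusable read buffer. Chunk boundaries may fall anywhere, even in the middle of a field.
        Span<byte> fileReadBuffer = new byte[4096];
        while (true)
        {
            int count = file.Read(fileReadBuffer);
            if (count == 0)
            {
                // Read returns 0 only at the end of the stream.
                break;
            }

            // Pass along only the bytes that were actually read.
            var chunk = fileReadBuffer.Slice(0, count);
            tokenizer.ProcessNextChunk(chunk, myVisitor);
        }

        // Signal that no more data is coming, so the last field and record get reported.
        tokenizer.ProcessEndOfStream(myVisitor);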
    }

    Console.WriteLine($"Finished reading '{csvFilePath}'.");
}

Simpler

  1. Create a new instance of your visitor.
  2. Use one of the CsvSyncInput or CsvAsyncInput methods to create an input object you can use to describe the data to your visitor.

Examples:

public static void ProcessCsvFile(string csvFilePath)
{
    Console.WriteLine($"Started reading '{csvFilePath}'.");
    CsvSyncInput.ForMemoryMappedFile(csvFilePath)
                .Process(new MyVisitor(maxFieldLength: 1000));
    Console.WriteLine($"Finished reading '{csvFilePath}'.");
}

public static void ProcessCsvStream(Stream csvStream)
{
    Console.WriteLine("Started reading CSV file.");
    CsvSyncInput.ForStream(csvStream)
                .Process(new MyVisitor(maxFieldLength: 1000));
    Console.WriteLine("Finished reading CSV file.");
}

public static async Task ProcessCsvStreamAsync(Stream csvStream)
{
    Console.WriteLine("Started reading CSV file.");
    await CsvAsyncInput.ForStream(csvStream)
                       .ProcessAsync(new MyVisitor(maxFieldLength: 1000));
    Console.WriteLine("Finished reading CSV file.");
}