Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the flexibility to specify DateTimeStyle in CsvDataReaderOptions. #254

Closed
wants to merge 0 commits into from

Conversation

JaseRandall
Copy link

@JaseRandall JaseRandall commented Jun 8, 2024

I had a csv file containing date time values which were in UTC format, but the format did not contain the appropriate notations within the date time string specifying it was in UTC time format.

Datetime string is automatically parsed as a local time unless the DateTimeStyles is set as DateTimeStyles.AssumeUniversal, a change needed to be made because the solution was using a hardcoded style other than what I needed (i.e. was hard coded as DateTimeStyles.RoundtripKind).

i.e. needed to be able to do this:
var dateTimeString = "2020-01-01 00:00:00";
var format = "yyyy-MM-dd hh:mm:ss";
DateTime value;
DateTime.TryParseExact(dateTimeString, format, opts.Culture, DateTimeStyles.AssumeUniversal, out value);

So:

In CsvDataReaderOptions, I added the flexibility to specify DateTimeStyles.
In CsvDataReader, I added a DateTimeStyles property set with a default value of DateTimeStyles.RoundtripKind as that was the value previously hardcoded in the solution.
In CsvDataReaderTests, I added regression test Date4 to ensure the change worked correctly.

Hopefully the change and test are acceptable for a merge, if not, please let me know what further is required.

@MarkPflug
Copy link
Owner

There is also GetDate and GetTime in the CsvDataReader.net6.cs file. Do those need similar treatment? Maybe not. I'm actually not sure.

@JaseRandall
Copy link
Author

I haven't used net6 before but on face value, it appears they do however, I'm not sure how best to make the change as there is an inconsistency between CsvDataReader.cs and CsvDataReader.net6.cs.

CsvDataReader.cs uses DateTimeStyles.RoundtripKind by default.
CsvDataReader.net6.cs use DateTimeStyles.None by default.

In my pull request, I left the default in CsvDataReader.cs as DateTimeStyles.RoundtripKind, however the default setting probably should be DateTimeStyles.None; but to change that may be undesirable as it would be a breaking change?

In CsvDataReaderOptions where I set the default for DateTimeStyle, I can change the code to have the different default for NET6_0_OR_GREATER:

public DateTimeStyles DateTimeStyle { get { return _dateTimeStyle; } set { _dateTimeStyle = value; } }

#if NET6_0_OR_GREATER
DateTimeStyles _dateTimeStyle = DateTimeStyles.None;
#else
DateTimeStyles _dateTimeStyle = DateTimeStyles.RoundtripKind;
#endif

I think I've done this correctly and it seems to pass the test (and fail when I play around with it to try and make it fail) but this is outside of experience level so I'm not certain.

Please let me know how best to proceed with the default values and I can (try to) commit a change to this pull request.
(Again, outside of my experience level - I'm usually a solo citizen/hobby developer rather than someone who does this for a living.)

p.s.

I'll have another pull request coming for additional changes; I had a couple of use cases where the headers are not located on the first row because there was metadata and notes on the rows prior to the header row. I've written changes that allow 1. the header row number to be defined OR 2. the header row string to be defined. The code finds and changes the buffer index position to correspond to the row number by counting \r\n OR it matches the headerRowString against the chunks of chr between sets of \r\n.

I just need to write the tests before I can submit (which might not happen for a week). I'm thinking it best to keep it separate to this pull request? (More so letting you know in case you want to hold off pushing a new release to GitHub NuGet.)

@MarkPflug
Copy link
Owner

I just did a quick experiment, and it looks like the style for DateOnly and TimeOnly parsing is pretty limited:
The only allowed values for the styles are AllowWhiteSpaces, AllowTrailingWhite, AllowLeadingWhite, and AllowInnerWhite. I think leaving those as None is fine. I'll do a full review of this PR and merge it in the next few days. Thanks.

@MarkPflug
Copy link
Owner

MarkPflug commented Jun 28, 2024

@JaseRandall Looking at this again, I'm a bit reluctant to add this feature. I guess I'm worried that I'll end up adding similar options for specifying number format styles, and I don't want to proliferate a ton of options if there are potential workarounds.

So, as a workaround, can you please consider this code:

using Sylvan.Data;
using Sylvan.Data.Csv;
using System.Data.Common;

const string Data =
    """
    a,b
    1,2022-01-13T12:33:45.1234567
    2,2022-01-13T12:33:45.1234567-08:00
    3,2022-01-13T12:33:45.1234567Z
    """;

DbDataReader r = CsvDataReader.Create(new StringReader(Data));

r = new DateTimeAdapter(r);

while (r.Read())
{
    var id = r.GetInt32(0);
    var x = r.GetDateTime(1);
    Console.WriteLine($"{id} {x} {x.Kind}");
}

class DateTimeAdapter : DataReaderAdapter // <-- DataReaderAdapater is from Sylvan.Data nupkg
{
    public DateTimeAdapter(DbDataReader dr) : base(dr)
    {
    }

    public override DateTime GetDateTime(int ordinal)
    {
        var value = this.Reader.GetDateTime(ordinal);
        if(value.Kind == DateTimeKind.Unspecified)
        {
            value = new DateTime(value.Ticks, DateTimeKind.Utc);
        }
        return value;
    }
}

Basically, you can wrap the CsvDataReader with a custom data reader that adjusts your DateTime values as you desire. Would this be an adequate solution for your problem?

@JaseRandall
Copy link
Author

JaseRandall commented Jun 30, 2024

@MarkPflug I agree with keeping the package streamlined rather than catering to different number format options, especially if there are workaround.

That workaround would be perfect except I'm using a CsvSchema:

using Sylvan.Data;
using Sylvan.Data.Csv;
using System.Data.Common;

const string Data =
    """
    a,b
    1,2022-01-13T12:33:45.1234567
    2,2022-01-13T12:33:45.1234567-08:00
    3,2022-01-13T12:33:45.1234567Z
    """;

var schema = new CsvSchema(Schema.Parse("a:int,b:datetime"));
var options = new CsvDataReaderOptions { Schema = schema, HasHeaders = true };

DbDataReader r = CsvDataReader.Create(new StringReader(Data), options);

r = new DateTimeAdapter(r);

while (r.Read())
{
    var id = r.GetInt32(0);
    var x = r.GetDateTime(1);
    var y = r[1];
    Console.WriteLine($"{id} {x} {x.Kind}");
    Console.WriteLine($"{id} {y} {((DateTime)y).Kind}");
}

class DateTimeAdapter : DataReaderAdapter // <-- DataReaderAdapater is from Sylvan.Data nupkg
{
    public DateTimeAdapter(DbDataReader dr) : base(dr)
    {
    }

    public override DateTime GetDateTime(int ordinal)
    {
        var value = this.Reader.GetDateTime(ordinal);
        if(value.Kind == DateTimeKind.Unspecified)
        {
            value = new DateTime(value.Ticks, DateTimeKind.Utc);
        }
        return value;
    }
}

So, I still get this:
1 13/01/2022 12:33:45 PM Utc
1 13/01/2022 12:33:45 PM Unspecified

I have a potential workout for this issue at https://github.com/JaseRandall/Sylvan/tree/Proposed-Change-to-allow-Providing-a-CustomAccessor-in-the-CsvDataAccessor-class. I'll submit this as an issue, and you can decide if you want to include idea before I submit a pull request.

@MarkPflug
Copy link
Owner

Oh, I think you've pointed about a bug with the DataReaderAdapter.

Currently the indexers are implemented to delegate to the inner/wrapped datareader:

	/// <inheritdoc/>
	public override object this[int ordinal] => dr[ordinal];

	/// <inheritdoc/>
	public override object this[string name] => dr[name];

These need to be fixed to be:

	/// <inheritdoc/>
	public override object this[int ordinal] => this.GetValue(ordinal);

	/// <inheritdoc/>
	public override object this[string name] => this.GetValue(this.GetOrdinal(name));

I will make this fix, which should allow this to work correctly. Until then, you can override these and fix it yourself, by adding this override to the adapter:

    public override object this[int ordinal]
    {
        get
        {
            var o = base[ordinal];
            if (o is DateTime dt)
            {
                return new DateTime(dt.Ticks, DateTimeKind.Utc);
            }
            return o;
        }
    }

Maybe not the most elegant solution, but it should allow things to work until I fix the base implementations, at which point you can delete this override (or leave it, at it will be harmless).

@MarkPflug
Copy link
Owner

Nevermind. Not sure what I was thinking, you can just provide the same override:

public override object this[int ordinal] => this.GetValue(ordinal);

I'll still make this change to the base implementation though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants