Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix unicsv reader truncation differences on i386/debian #1201

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tsteven4
Copy link
Collaborator

with 1.9.0 debian had some test failures on i386.
in the unicsv test with reference unicsv_subsecond.csv
13:20:20.060 was echoed as 13:20:20.059.
13:20:21.600 was echoed as 13:20:21.599.

The same bug also caused a failure in the track test with reference utm_subsecond_track~csv.csv.

These variations aren't serious, but this PR uses an integer arithmetic conversion instead of a floating point arithmetic to avoid any variation in the intentional truncation to milliseconds.

I also see an error in the nmea test with reference nmeadatetime.csv. This is a rounding difference. It is not fixed.
I also see an error in the xcsv test with reference datetime~xcsv.xcsv. This is a Y2038 bug. It is not fixed.

with 1.9.0 debian had some test failures on i386.
in the unicsv test with reference unicsv_subsecond.csv
13:20:20.060 was echoed as 13:20:20.059.
13:20:21.600 was echoed as 13:20:21.599.

These variations aren't serious, but this PR uses an integer arithmetic
conversion instead of a floating point arithmetic to avoid any variation
in the intentional truncation to milliseconds.
@codacy-production
Copy link

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.00% 100.00%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (c95e1cf) 23128 16018 69.26%
Head commit (e5b3093) 23130 (+2) 16020 (+2) 69.26% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#1201) 4 4 100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@@ -297,14 +296,17 @@ UnicsvFormat::unicsv_parse_time(const char* str, QDate& date)
str += consumed;
date = ldate;
}
int ct = sscanf(str, "%d%*1[.://]%d%*1[.://]%d%lf", &hour, &min, &sec, &frac_sec);
char msecstr[4];
int ct = sscanf(str, "%d%*1[.://]%d%*1[.://]%d.%3[0123456789]", &hour, &min, &sec, msecstr);
Copy link
Collaborator

@robertlipe robertlipe Oct 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're tweaking that's there, but this passes testo:

@@ -310,6 +311,11 @@ UnicsvFormat::unicsv_parse_time(const char* str, QDate& date)
   }

   QTime result{hour, min, sec, msec};
+#if 1
+  QTime qt = QTime::fromString(QString(str).section(" ", 0, 0).trimmed(), Qt::ISODateWithMs);
+  qDebug() << result << qt;
+  assert(result == qt);
+#endif
   if (!result.isValid()) {
     fatal(FatalMsg() << MYNAME << ": Invalid time parsed from string (" << str << ").");
   }
 
 
 Which makes this a drop-in replacement for the original:
 diff --git a/unicsv.cc b/unicsv.cc
index 92ca8504..5a24b85f 100644
--- a/unicsv.cc
+++ b/unicsv.cc
@@ -282,12 +282,14 @@ UnicsvFormat::unicsv_parse_date(const char* str, int* consumed)
 QTime
 UnicsvFormat::unicsv_parse_time(const char* str, QDate& date)
 {
+#if 0
   int hour;
   int min;
   int sec;
   int msec;
-  int consumed = 0;
   double frac_sec;
+#endif
+  int consumed = 0;
 
   /* If we have something we're pretty sure is a date, parse that
    * first, skip over it, and pass that back to the caller)
@@ -297,6 +299,8 @@ UnicsvFormat::unicsv_parse_time(const char* str, QDate& date)
     str += consumed;
     date = ldate;
   }
+#if 0
+ qDebug() << str;
   int ct = sscanf(str, "%d%*1[.://]%d%*1[.://]%d%lf", &hour, &min, &sec, &frac_sec);
   if (ct < 3) {
     fatal(FatalMsg() << MYNAME << ": Could not parse time string (" << str << ").");
@@ -310,6 +314,11 @@ UnicsvFormat::unicsv_parse_time(const char* str, QDate& date)
   }
 
   QTime result{hour, min, sec, msec};
+#else
+  QTime result;
+
+  result = QTime::fromString(QString(str).trimmed().section(" ", 0, 0), Qt::ISODateWithMs);
+#endif
   if (!result.isValid()) {
     fatal(FatalMsg() << MYNAME << ": Invalid time parsed from string (" << str << ").");
   }

@robertlipe
Copy link
Collaborator

robertlipe commented Oct 28, 2023 via email

@tsteven4
Copy link
Collaborator Author

It sure is cleaner, but we have been burned with QTime::fromString before, but apparently not from Qt::ISODateWithMs although I thought I had tried that back then.

gpsbabel/nmea.cc

Lines 351 to 355 in ae4ac82

QTime NmeaFormat::nmea_parse_hms(const QString& str)
{
// QTime::fromString z expects 1 to 3 digit fractional part of seconds.
// It specifically does not accept 0 digits or > 3 digits.
// QTime::fromString zzz expects exactly 3 digits representing milliseconds.

Does section(" ", 0, 0) do anything?

I do see test changes, I added the last one "22,33.784805,-117.474327,"WPT022",2014/09/17,23:59:59.999999999999". So it appears QTime::fromString may be rounding, but is smart enough not to round to an illegal QTime.

Running unicsv.test
--- ./reference/unicsv_subsec~csv.csv   2023-10-28 05:45:52.092839226 -0600
+++ /tmp/gpsbabel.2699/unicsv_subsec.csv        2023-10-28 05:52:58.487210034 -0600
@@ -17,7 +17,8 @@
 16,33.784805,-117.474327,"WPT016",2014/09/17,13:20:23
 17,33.784805,-117.474327,"WPT017",2014/09/17,13:20:23.690
 18,33.784805,-117.474327,"WPT018",2014/09/17,13:20:23.699
-19,33.784805,-117.474327,"WPT019",2014/09/17,13:20:23.699
-20,33.784805,-117.474327,"WPT020",2014/09/17,13:20:23.699
-21,33.784805,-117.474327,"WPT021",2014/09/17,13:20:23.699
-22,33.784805,-117.474327,"WPT022",2014/09/17,13:20:23.699
+19,33.784805,-117.474327,"WPT019",2014/09/17,13:20:23.700
+20,33.784805,-117.474327,"WPT020",2014/09/17,13:20:23.700
+21,33.784805,-117.474327,"WPT021",2014/09/17,13:20:23.700
+22,33.784805,-117.474327,"WPT022",2014/09/17,13:20:23.700
+23,33.784805,-117.474327,"WPT022",2014/09/17,23:59:59.999

@tsteven4
Copy link
Collaborator Author

QTime::fromString is quite complicated, rounding, watching for rounding up to cause an invalid QTime, potentially limiting at 23:59:59.999 or rolling over into the next day.
https://github.com/qt/qtbase/blob/6.2.4/src/corelib/time/qdatetime.cpp#L2182-L2211

@tsteven4
Copy link
Collaborator Author

@robertlipe some questions:

The unicsv date and time parsing came in with e3f54a668, and has a tortured 24 year history of rounding problems.

  1. Isn't the format specifier %1[-.//] the same as %1[-./]? (unicsv_parse_date) We are not escaping a backslash.
  2. Isn't the format specifier %1[.://] the same as %1[.:/]? (unicsv_parse_time) We are not escaping a backslash.
  3. I can imagine a date using a '-', '.', '/' as a separator, e.g. 10.11.2022 or 10-11-2022 or 10/11/2022, so those three separators make some sense. But I have a harder time imaging a time using '.' or '/' as a separator, e.g. 12.10.55 or 12/10/55, wouldn't everyone use 12:10:55 if they used a separator?

@tsteven4
Copy link
Collaborator Author

https://www.ibm.com/docs/en/i/7.3?topic=80-timsep-time-separator-keyword-display-files suggests time separators of colon, period, comma or blank.

@tsteven4
Copy link
Collaborator Author

https://docs.oracle.com/cd/E19455-01/806-0169/overview-6/index.html suggests time separators of a colon or period.

@GPSBabelDeveloper
Copy link
Collaborator

GPSBabelDeveloper commented Oct 28, 2023 via email

@robertlipe
Copy link
Collaborator

robertlipe commented Oct 29, 2023

I"m not sure about #1 vs #2, but I agree that the // is bogus. For #3, I've never seen periods used, but the original submission (from Olaf, in Germany) said they were used there. Are they common? I have no idea. Quoting the dot may be necessary to stop it from matching any single character.

I was experimenting with letting Bard write them - since I hate writing regexes AND reading them - it produced an overkill solution that's a bit less clever and therefore, IMO, a bit easier to read:
https://g.co/bard/share/3c91a625a6ba
I like that it gives examples of wht it will and will not read. It almost writes our test cases.
This is a bit more on parity with our current parser:
https://g.co/bard/share/95e0b1fc56a6
Even if we don't use that, it clarifies that [.:] (no escaping on dot needed after all) is fine.

But I really don't want to gum this up.

@tsteven4
Copy link
Collaborator Author

Are our wires crossed? How did we get from a format string to a regular expression? Bard is interesting, I have used other online regular expression checkers when I have to go there.

@tsteven4
Copy link
Collaborator Author

After looking at the Qt QDateTime/QDate/QTime::fromString code I don't feel as bad about the amount of thrashing we have done trying to get these conversions correct.

@robertlipe
Copy link
Collaborator

robertlipe commented Oct 29, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants