Use smaller integer field types for GTFS entity classes #1273

Merged 2 commits on Oct 27, 2022

Collaborator

@aababilov aababilov commented Oct 14, 2022

GTFS enums usually fit into one byte, so we don't need an int. GTFS tables usually have fewer than 16 fields, so a byte or a short is enough to store the bitmask of assigned fields.

Smaller fields save memory for the largest table, stop_times.txt, which may have 30 M lines.
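
As a rough illustration of the idea above, here is a minimal sketch of an entity that keeps an enum-like value in a byte and tracks which fields were assigned in a short bitmask. The class name `StopTimeSketch`, the bit numbering, and the accessor names are assumptions made for this example; the validator's generated classes may differ.

```java
/** Hypothetical sketch; not the validator's actual generated code. */
final class StopTimeSketch {
  // Assumed bit positions inside the presence mask, one bit per field.
  private static final int PICKUP_TYPE_BIT = 1 << 0;
  private static final int TIMEPOINT_BIT = 1 << 1;

  private byte pickupType;   // enum number stored in a single byte
  private byte timepoint;
  private short bitField0_;  // 16 presence bits cover up to 16 fields

  void setPickupType(int value) {
    pickupType = checkedByte(value);
    bitField0_ |= PICKUP_TYPE_BIT; // compound assignment narrows back to short
  }

  void setTimepoint(int value) {
    timepoint = checkedByte(value);
    bitField0_ |= TIMEPOINT_BIT;
  }

  boolean hasPickupType() {
    return (bitField0_ & PICKUP_TYPE_BIT) != 0;
  }

  boolean hasTimepoint() {
    return (bitField0_ & TIMEPOINT_BIT) != 0;
  }

  int pickupType() {
    return pickupType; // widened back to int for callers
  }

  int timepoint() {
    return timepoint;
  }

  // Rejects values that would silently overflow when narrowed to a byte.
  private static byte checkedByte(int value) {
    if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
      throw new IllegalArgumentException("Value does not fit in a byte: " + value);
    }
    return (byte) value;
  }

  public static void main(String[] args) {
    StopTimeSketch row = new StopTimeSketch();
    row.setPickupType(2);
    System.out.println(row.hasPickupType() + " " + row.pickupType() + " " + row.hasTimepoint());
  }
}
```

Callers still read and write int values; only the storage narrows, so the range check in the setter is where an out-of-range enum number would surface.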

Instead of:

class GtfsStopTime {
  // ...
  private int pickupType;
  private int dropOffType;
  private int continuousPickup;
  private int continuousDropOff;
  private int timepoint;
  private int bitField0_;
}

we have now:

class GtfsStopTime {
  // ...
  private byte pickupType;
  private byte dropOffType;
  private byte continuousPickup;
  private byte continuousDropOff;
  private byte timepoint;
  private short bitField0_;
}

which is 5 * (4 - 1) + (4 - 2) = 17 bytes smaller. The JVM aligns object sizes to 8 bytes, so we actually save 16 bytes per line.
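
The step from 17 raw bytes to 16 effective bytes can be sketched as follows. The unpadded size of 64 bytes for the old class is a made-up number chosen for illustration; the real figure depends on the other fields, the object header, and how the JVM lays out the object, so the effective saving could differ on another layout.

```java
/** Hypothetical sketch of the alignment arithmetic; sizes are illustrative. */
final class AlignmentSketch {
  static final long ALIGNMENT = 8; // typical JVM object alignment in bytes

  // Rounds a raw object size up to the next multiple of ALIGNMENT.
  static long alignUp(long rawBytes) {
    return (rawBytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
  }

  // Five int fields become bytes, one int bitmask becomes a short.
  static long rawSaving() {
    return 5 * (4 - 1) + (4 - 2);
  }

  // Saving per object after both sizes are padded to the alignment.
  static long alignedSaving(long rawBefore) {
    long rawAfter = rawBefore - rawSaving();
    return alignUp(rawBefore) - alignUp(rawAfter);
  }

  public static void main(String[] args) {
    long rawBefore = 64; // assumed unpadded size of the old class
    System.out.println(rawSaving());              // prints 17
    System.out.println(alignedSaving(rawBefore)); // prints 16
  }
}
```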

Total savings: about 0.45 GiB (30 M lines * 16 bytes = 480,000,000 bytes) for stop_times.txt
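
For reference, the total can be checked with a few lines of arithmetic. This sketch only multiplies the row count and per-row saving quoted above and converts the result to GiB; the class and method names are invented for the example.

```java
import java.util.Locale;

/** Hypothetical sketch; just the arithmetic behind the total. */
final class TotalSavingSketch {
  static long totalBytes(long rows, long bytesPerRow) {
    return rows * bytesPerRow;
  }

  static double toGiB(long bytes) {
    return bytes / (double) (1L << 30); // 1 GiB = 2^30 bytes
  }

  public static void main(String[] args) {
    long bytes = totalBytes(30_000_000L, 16);
    // prints "480000000 bytes = 0.45 GiB"
    System.out.println(String.format(Locale.ROOT, "%d bytes = %.2f GiB", bytes, toGiB(bytes)));
  }
}
```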

@CLAassistant

CLAassistant commented Oct 14, 2022

CLA assistant check
All committers have signed the CLA.

@isabelle-dr
Contributor

👋 Welcome back!

Collaborator

@asvechnikov2 asvechnikov2 left a comment

Thanks for the tests!

@f8full
Contributor

f8full commented Oct 21, 2022

Nice

@aababilov
Collaborator Author

Nice

Thanks!

@aababilov
Collaborator Author

👋 Welcome back!

I missed GTFS Validator :)

Collaborator

@asvechnikov2 asvechnikov2 left a comment

LGTM! Thanks!

@aababilov aababilov merged commit bd68666 into MobilityData:master Oct 27, 2022