Use smaller integer field types for GTFS entity classes #1273
👋 Welcome back!
Thanks for the tests!
...or/src/main/java/org/mobilitydata/gtfsvalidator/processor/EntityImplementationGenerator.java
.../tests/src/test/java/org/mobilitydata/gtfsvalidator/processor/tests/EnumSizesSchemaTest.java
...sor/tests/src/main/java/org/mobilitydata/gtfsvalidator/processor/tests/ManyFieldsSchema.java
Nice
GTFS enums usually fit into one byte, so we don't need an int. GTFS tables usually have fewer than 16 fields, so a byte or a short is enough to store the bitmask of assigned fields.

Smaller fields save memory for the largest table, stop_times.txt, which may have 30 M lines.

Instead of:

    class GtfsStopTime {
      // ...
      private int pickupType;
      private int dropOffType;
      private int continuousPickup;
      private int continuousDropOff;
      private int timepoint;
      private int bitField0_;
    }

we now have:

    class GtfsStopTime {
      // ...
      private byte pickupType;
      private byte dropOffType;
      private byte continuousPickup;
      private byte continuousDropOff;
      private byte timepoint;
      private short bitField0_;
    }

which is 5 * (4 - 1) + 2 = 17 bytes smaller. The JVM aligns objects to 8 bytes by default, so we actually save 16 bytes per line.

Total savings: about 0.45 GiB (16 bytes * 30 M = 480 MB) for 30 M lines in stop_times.txt.
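For readers following along, here is a minimal sketch of the idea, not the code that EntityImplementationGenerator actually emits. The class name `StopTimeSketch`, the bit constants, the accessor names, and the 12-byte object header are all assumptions made for illustration; the real GtfsStopTime has more fields, so its absolute size differs, but the alignment arithmetic works the same way.

```java
/** Hypothetical sketch: byte-sized enum fields plus a short "assigned fields" bitmask. */
final class StopTimeSketch {
    // Hypothetical bit positions inside the bitmask, one bit per optional field.
    private static final short PICKUP_TYPE_BIT = 1 << 0;
    private static final short TIMEPOINT_BIT = 1 << 1;

    private byte pickupType;   // enum number stored in one byte instead of an int
    private byte timepoint;
    private short bitField0_;  // 16 bits are enough for tables with up to 16 fields

    void setPickupType(int value) {
        pickupType = (byte) value;      // narrowing is safe while enum numbers fit in a byte
        bitField0_ |= PICKUP_TYPE_BIT;  // remember that the field was assigned
    }

    void setTimepoint(int value) {
        timepoint = (byte) value;
        bitField0_ |= TIMEPOINT_BIT;
    }

    int pickupType() { return pickupType; }
    int timepoint() { return timepoint; }

    boolean hasPickupType() { return (bitField0_ & PICKUP_TYPE_BIT) != 0; }
    boolean hasTimepoint() { return (bitField0_ & TIMEPOINT_BIT) != 0; }

    /** Rounds a size up to the next multiple of the alignment. */
    static long alignUp(long bytes, long alignment) {
        return (bytes + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        StopTimeSketch st = new StopTimeSketch();
        st.setPickupType(2);
        System.out.println(st.hasPickupType() + " " + st.hasTimepoint());

        // Assumption: a 12-byte object header and only the six fields from the description.
        long header = 12;
        long before = alignUp(header + 6 * 4, 8);      // six int fields
        long after = alignUp(header + 5 * 1 + 2, 8);   // five bytes and one short
        long savedPerObject = before - after;
        long savedTotal = savedPerObject * 30_000_000L; // 30 M lines in stop_times.txt
        System.out.println(savedPerObject + " bytes per object, " + savedTotal + " bytes total");
    }
}
```

Under these assumptions the raw field difference of 17 bytes rounds to a 16-byte saving per object after 8-byte alignment. A tool such as JOL would be needed to confirm the actual layout of the generated class on a given JVM.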
4fd9ac8 to 2168d70
ed5c615 to d441c66
Thanks!
I missed GTFS Validator :)
LGTM! Thanks!