Ensure that there are at least 7 contiguous days during which all feeds are valid #4

Closed
antrim opened this issue Aug 17, 2017 · 26 comments

Comments

@antrim
Collaborator

antrim commented Aug 17, 2017

For each new GTFS dataset, ensure that there are at least 7 contiguous days during which all feeds are valid (i.e., between the feed start and end dates; a feed can be valid and still have no service for the period in question). [Trillium is the logical lead on this activity]
Possible tactics include:

  • Attention to timing of GTFS feed capture
  • Changes to Trillium start/end date feed practices
  • Requests to non-Trillium feed producers to modify start/end practices
  • Forced editing of valid feed range (not-so-desirable)
  • Other…?

This is particularly useful for GTFS-ride, i.e., it is particularly useful to know whether past service dates were valid.
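The requirement can be read as an interval intersection: every feed is valid between the latest feed start date and the earliest feed end date, so 7 contiguous all-valid days exist exactly when that intersection spans at least 7 days. A minimal Python sketch, assuming each feed's validity is already known as an inclusive (start, end) date pair; the function names are hypothetical:

```python
from datetime import date

def common_valid_window(ranges):
    """Return the inclusive (start, end) window in which every feed is valid, or None.

    `ranges` maps feed name -> (start_date, end_date), both inclusive.
    """
    latest_start = max(start for start, _ in ranges.values())
    earliest_end = min(end for _, end in ranges.values())
    if latest_start > earliest_end:
        return None  # some feed expires before another one starts
    return latest_start, earliest_end

def has_contiguous_days(ranges, min_days=7):
    """True if all feeds are simultaneously valid for at least `min_days` days."""
    window = common_valid_window(ranges)
    if window is None:
        return False
    start, end = window
    return (end - start).days + 1 >= min_days
```

Because each feed's validity is a single contiguous range, the intersection is also contiguous, so checking its length is enough.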

@antrim
Collaborator Author

antrim commented Aug 17, 2017

@pouyalireza Can you close or delete this?

@BenFields22
Collaborator

This task is addressed in the newly added feed duration visualizer.

@ODOT-RPTD-mb
Collaborator

This is still a Trillium process issue: how does Trillium trigger a build so that, looking back seven days, all feeds are valid? We should consider this task complete when we have a new db in place with 7+ days of contiguous validity and Trillium understands the issues associated with making this happen. Speed matters to ODOT here; we can use a "solid" snapshot of the network to build some report documents. This db build should include the P&R update.

@PPaulsonOregonDOT
Collaborator

This may be obvious, but because this needs another db build, I think it needs to be preceded by an answer to #16 about where the tool will be hosted moving forward. @BenFields724, @pouyalireza, do we have an answer to that question? I think this may also require us to exclude services that don't run year-round, because those significantly shrink our window. @ed-g, looking at the overlap of service, it looks like feed_end_date could be adjusted to extend this window while keeping end_date in calendar.txt the same, which I would hope would allow those short-term services to continue showing up correctly in Google Transit, etc.
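The adjustment suggested here touches only feed_info.txt. A minimal sketch using Python's csv module, assuming the file content is already in memory as a string; the helper name extend_feed_end_date is hypothetical, and whether such an edit is appropriate is the same policy question as the "forced editing of valid feed range" tactic above:

```python
import csv
import io

def extend_feed_end_date(feed_info_text, new_end):
    """Return feed_info.txt content with feed_end_date extended to `new_end` (YYYYMMDD).

    calendar.txt is not touched, so short-term services keep their real last day.
    """
    reader = csv.DictReader(io.StringIO(feed_info_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames)
    if "feed_end_date" not in fieldnames:
        fieldnames.append("feed_end_date")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        current = row.get("feed_end_date") or ""
        # YYYYMMDD strings compare correctly as text; only extend, never shorten.
        if not current or current < new_end:
            row["feed_end_date"] = new_end
        writer.writerow(row)
    return out.getvalue()
```

Refusing to shorten the range keeps the helper from accidentally retracting an assertion the producer already made.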

@antrim
Collaborator Author

antrim commented Aug 31, 2017

Trillium should proceed "with an eye to an ongoing, perhaps quarterly build process by Trillium." (via M Barnes, @ODOT-RPTD-mb).

cc: @ed-g

Can someone please assign this to @antrim or @ed-g ? I do not have access to make assignments on this repo.

@antrim
Collaborator Author

antrim commented Aug 31, 2017

cc: @thomastrillium

Here is what the Spec says about feed_info.feed_start_date and feed_info.feed_end_date:

Field Name | Required | Details
feed_start_date | Optional | The feed provides complete and reliable schedule information for service in the period from the beginning of the feed_start_date day to the end of the feed_end_date day. Both days are given as dates in YYYYMMDD format as for calendar.txt, or left empty if unavailable. The feed_end_date date must not precede the feed_start_date date if both are given. Feed providers are encouraged to give schedule data outside this period to advise of likely future service, but feed consumers should treat it mindful of its non-authoritative status. If feed_start_date or feed_end_date extend beyond the active calendar dates defined in calendar.txt and calendar_dates.txt, the feed is making an explicit assertion that there is no service for dates within the feed_start_date or feed_end_date range but not included in the active calendar dates.
feed_end_date | Optional | (see above)

(https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#feed_infotxt)
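Read literally, the spec makes the feed_info dates the authoritative window whenever they are given. A consumer still needs a rule for when they are left empty; the sketch below assumes a fallback to the span of calendar dates, which is our assumption and not spec text. The function names are hypothetical:

```python
from datetime import datetime

def parse_gtfs_date(value):
    """Parse a GTFS YYYYMMDD date; an empty string means 'not given'."""
    return datetime.strptime(value, "%Y%m%d").date() if value else None

def authoritative_window(feed_info, calendar_starts, calendar_ends):
    """Return the inclusive (start, end) period the feed vouches for.

    feed_info: dict with optional 'feed_start_date' / 'feed_end_date' strings.
    calendar_starts / calendar_ends: dates from calendar.txt / calendar_dates.txt,
    used only when feed_info leaves a bound empty (an assumed fallback).
    """
    start = parse_gtfs_date(feed_info.get("feed_start_date", ""))
    end = parse_gtfs_date(feed_info.get("feed_end_date", ""))
    if start is None:
        start = min(calendar_starts)
    if end is None:
        end = max(calendar_ends)
    if end < start:
        raise ValueError("feed_end_date must not precede feed_start_date")
    return start, end
```

Under this reading, days inside the window with no active calendar dates are asserted "no service" days, not missing data.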

GTFS Best Practices says:

One GTFS dataset should contain current and upcoming service (sometimes called a “merged” dataset). Google transitfeed tool's merge function can be used to create a merged dataset from two different GTFS feeds.

  • At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated.
  • If possible, the GTFS dataset should cover at least the next 30 days of service.

(http://gtfs.org/best-practices/#publishing_4)
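The two Best Practices bullets can be checked against the day a feed is captured. A minimal sketch, assuming the feed's validity window is already known and counting the capture day itself as the first covered day (that counting convention is our assumption); the function names are hypothetical:

```python
from datetime import date

def forward_coverage(window_start, window_end, as_of):
    """Days from `as_of` (inclusive) through `window_end` that the feed vouches for."""
    if as_of < window_start or as_of > window_end:
        return 0  # the feed is not valid on the capture day itself
    return (window_end - as_of).days + 1

def best_practice_status(window_start, window_end, as_of):
    """Classify coverage against the 7-day minimum and the 30-day target."""
    days = forward_coverage(window_start, window_end, as_of)
    if days >= 30:
        return "meets 7-day minimum and 30-day target"
    if days >= 7:
        return "meets 7-day minimum only"
    return "below 7-day minimum"

# Example: trimet-portland-or-us dates from the September 2017 db, checked on 2017-09-18.
status = best_practice_status(date(2017, 9, 3), date(2017, 12, 2), date(2017, 9, 18))
```

Note that these practices look forward from publication, while the TNA use case needs validity looking backward, so a feed can satisfy both bullets and still leave no usable past window.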

Proposed next steps:

  • Look at TNA software and use case: If the Spec and Best Practices are followed, does that give all the data that is necessary for this use case? If not, do we need to propose new data practices?
  • Look at Trillium data practices: What are we providing in the way of feed_start_date and feed_end_date? Do we need to change anything?
  • Trillium changes data practices if necessary: This might include changing our practices around feed_end_date. We could change practice around querying Oregon agencies for upcoming service changes…
  • Trillium changes data practices, if necessary
  • Trillium suggests that other (non-ODOT published GTFS) transit agencies should conform with practice, if necessary
  • We look at/for specific examples where it is not possible to get 7 contiguous days and figure out how to make surgical or systematic changes/improvements.

@ODOT-RPTD-mb
Collaborator

First attempt to capture various perspectives on GTFS feed start and end date:

[image not reproduced]

@ed-g
Collaborator

ed-g commented Sep 14, 2017

Needed for the reporting document. Was hoping to have it in place by the 15th! But we can survive with it landing within the next week.

@ed-g ed-g self-assigned this Sep 16, 2017
@ed-g
Collaborator

ed-g commented Sep 16, 2017

  • build a new db, "September 2017". Copied from May17.
  • import gtfs feeds to new db
    • Current (2017-09-18 10:32am) feeds are just copied from May 2017.
    • Can we use the list of public+private feeds from archive.oregon-gtfs.com or are there other feeds we need to include?
  • check for where gaps in 7 contiguous days exist, look for 3 candidates for 7 contiguous days
    • For each candidate try to find gtfs feeds of proper start/end dates to fill gap
      • cccxpress. Maybe there is an older feed I can use.
      • klamath. Still doing research on why it's been expired for so long.
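The "check for gaps" step can be sketched as a scan over candidate 7-day windows that reports which feeds fail to cover each one, so the feeds to chase (like cccxpress and klamath here) fall out directly. Feed ranges are assumed to be inclusive (start, end) dates; the function names are hypothetical:

```python
from datetime import date, timedelta

def feeds_blocking(ranges, window_start, days=7):
    """Sorted names of feeds that are not valid on every day of the candidate window."""
    window_end = window_start + timedelta(days=days - 1)
    return sorted(
        name
        for name, (start, end) in ranges.items()
        if start > window_start or end < window_end
    )

def candidate_windows(ranges, first_day, last_day, days=7, limit=3):
    """Return up to `limit` (start, blocking_feeds) pairs, fewest blocking feeds first."""
    scored = []
    day = first_day
    while day + timedelta(days=days - 1) <= last_day:
        scored.append((len(feeds_blocking(ranges, day, days)), day))
        day += timedelta(days=1)
    scored.sort()  # by blocking count, then earliest start date
    return [(start, feeds_blocking(ranges, start, days)) for _, start in scored[:limit]]
```

Ranking by the number of blocking feeds points at the candidates that need the fewest replacement feeds.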

@ed-g
Collaborator

ed-g commented Sep 18, 2017

@ODOT-RPTD-mb @PPaulsonOregonDOT @srinivas13794 @BenFields724 are the GTFS feeds in the TNA tool loaded from the Public+Private GTFS archive site? If not, what is the process for fetching GTFS feeds?

@ODOT-RPTD-mb
Collaborator

Yes, feeds are captured from private+public Trillium site.

@ed-g
Collaborator

ed-g commented Sep 18, 2017

Uploading feeds from http://archive.oregon-gtfs.com/archive-download-private/Oregon-Private-GTFS-feeds-2017-09-18Z.zip (you'll need a login to access that URL).

Notes to self:

Next steps after GTFS upload are "Run Update Queries" in the admin interface, and then "Activate Database".

  • "Run Update Queries" might be a good place to add any other pre-calculation necessary to speed up various queries.

It's necessary to make a database "inactive" before uploading GTFS. That part of the admin interface is currently broken due to #64, but you can update database_status directly using psql for the same effect.

@ed-g
Collaborator

ed-g commented Sep 18, 2017

If we create a GiST index on census_blocks.shape, testing for contained points goes much faster.

Similar story for gtfs_trips.shape and census_tracts. In general it's a good idea to index any geometry columns when using PostGIS.

create index gtfs_trips_shape_idx on gtfs_trips using gist (shape);

create index parknride_geom_idx on parknride using gist (geom);

create index census_blocks_shape_idx on census_blocks using gist (shape);
create index census_tracts_shape_idx on census_tracts using gist (shape);
create index census_places_shape_idx on census_places using gist (shape);
create index census_congdists_shape_idx on census_congdists using gist (shape);
create index census_states_shape_idx on census_states using gist (shape);
create index census_urbans_shape_idx on census_urbans using gist (shape);

@ed-g
Collaborator

ed-g commented Sep 18, 2017

This will enable parallel queries on the database, which could speed things up depending on the exact query.

alter database september17 set max_parallel_workers_per_gather= 8;

@ed-g
Collaborator

ed-g commented Sep 18, 2017

Pretty close; there are a couple of agencies ruining the party. Klamath seems to have expired about a year ago (2016-09-05), and CCCXPRESS starts in the future.

september17=# select enddate::date, feedname from gtfs_feed_info order by enddate asc limit 10;
  enddate   |          feedname           
------------+-----------------------------
 2016-09-05 | klamathshuttle-or-us
 2017-10-31 | washingtonparkshuttle-or-us
 2017-12-02 | trimet-portland-or-us
 2017-12-06 | cccxpress-or-us
 2017-12-25 | valleyretriever-or-us
 2017-12-30 | salem-or-us
 2018-01-01 | cascadespoint-or-us
 2018-01-01 | albanytransit-or-us
 2018-01-01 | cooscounty-or-us
 2018-01-01 | amtrakcascades-or-us
(10 rows)


september17=# select startdate::date, feedname from gtfs_feed_info order by startdate desc limit 10;
 startdate  |          feedname           
------------+-----------------------------
 2017-09-25 | cccxpress-or-us
 2017-09-10 | ctran-wa-us
 2017-09-03 | salem-or-us
 2017-09-03 | trimet-portland-or-us
 2017-08-01 | northwestpoint-or-us
 2017-08-01 | pacifictransit-wa-us
 2017-06-18 | lanetransitdistrict-or-us
 2017-05-11 | highdesertpoint-or-us
 2017-04-01 | washingtonparkshuttle-or-us
 2017-02-01 | hut-or-us
(10 rows)
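The two queries above bound the answer: an all-valid window would start at the latest start date and end at the earliest end date. A sketch using only the top rows printed above, so it illustrates the computation rather than recomputing it over every feed; the function name is hypothetical:

```python
from datetime import date

# Earliest end dates and latest start dates, copied from the psql output above.
earliest_ends = {
    "klamathshuttle-or-us": date(2016, 9, 5),
    "washingtonparkshuttle-or-us": date(2017, 10, 31),
    "trimet-portland-or-us": date(2017, 12, 2),
}
latest_starts = {
    "cccxpress-or-us": date(2017, 9, 25),
    "ctran-wa-us": date(2017, 9, 10),
    "salem-or-us": date(2017, 9, 3),
}

def window_from_bounds(starts, ends):
    """Return (start, end, limiting_start_feed, limiting_end_feed); start > end means no window."""
    start_feed = max(starts, key=starts.get)
    end_feed = min(ends, key=ends.get)
    return starts[start_feed], ends[end_feed], start_feed, end_feed

start, end, start_feed, end_feed = window_from_bounds(latest_starts, earliest_ends)
# start (2017-09-25) is after end (2016-09-05), so there is no all-valid window:
# klamathshuttle-or-us has already expired and cccxpress-or-us has not started yet.
```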

@ed-g
Collaborator

ed-g commented Sep 18, 2017

The CCCXpress feed from 14 September looks like it starts on the 13th. I wonder why the TNA tool thinks it starts on the 25th.

oregon-gtfs.com also shows it as starting on the 13th.

feed_publisher_url,feed_publisher_name,feed_lang,feed_version,feed_license,feed_contact_email,feed_contact_url,feed_start_date,feed_end_date
http://www.trilliumtransit.com,"Trillium Solutions, Inc.",en,UTC: 14-Sep-2017 00:38,,support+cccxpress-or-us@trilliumtransit.com,http://support.trilliumtransit.com,20170913,20171206

UPDATE: it's probably because cccxpress is a college commuter bus and doesn't start until term does.

september17=# select * from gtfs_calendars where serviceid_agencyid = '256';
 serviceid_agencyid |    serviceid_id    | gid | monday | tuesday | wednesday | thursday | friday | saturday | sunday | startdate | enddate  
--------------------+--------------------+-----+--------+---------+-----------+----------+--------+----------+--------+-----------+----------
 256                | c_842_b_2858_d_31  |   0 |      1 |       1 |         1 |        1 |      1 |        0 |      0 | 20170925  | 20171206
 256                | c_842_b_2858_d_15  |   0 |      1 |       1 |         1 |        1 |      0 |        0 |      0 | 20170925  | 20171206
 256                | c_2279_b_2859_d_15 |   0 |      1 |       1 |         1 |        1 |      0 |        0 |      0 | 20170925  | 20171206
(3 rows)
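The two sources disagree: feed_info.txt claims 2017-09-13, while every calendar row for agency 256 starts on 2017-09-25. If a tool intersects the feed_info range with the span of calendar dates, it reports the 25th. A sketch of that narrowing over the rows shown above; this illustrates the explanation and is assumed, not taken from the TNA tool's source:

```python
import csv
import io
from datetime import datetime

FEED_INFO = (
    "feed_publisher_url,feed_publisher_name,feed_lang,feed_version,feed_license,"
    "feed_contact_email,feed_contact_url,feed_start_date,feed_end_date\n"
    'http://www.trilliumtransit.com,"Trillium Solutions, Inc.",en,'
    "UTC: 14-Sep-2017 00:38,,support+cccxpress-or-us@trilliumtransit.com,"
    "http://support.trilliumtransit.com,20170913,20171206\n"
)
# (startdate, enddate) of the three calendar rows for agency 256.
CALENDARS = [("20170925", "20171206")] * 3

def to_date(value):
    return datetime.strptime(value, "%Y%m%d").date()

def narrowed_window(feed_info_text, calendars):
    """Intersect the feed_info range with the span of calendar service dates."""
    info = next(csv.DictReader(io.StringIO(feed_info_text)))
    cal_start = min(to_date(s) for s, _ in calendars)
    cal_end = max(to_date(e) for _, e in calendars)
    start = max(to_date(info["feed_start_date"]), cal_start)
    end = min(to_date(info["feed_end_date"]), cal_end)
    return start, end

start, end = narrowed_window(FEED_INFO, CALENDARS)
```

Under the spec text quoted earlier, 2017-09-13 through 2017-09-24 would be asserted "no service" days rather than missing data, which is why the narrowing choice matters.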

@ed-g
Collaborator

ed-g commented Sep 18, 2017

Klamath Shuttle (aka Crater Lake Trolley) should run this year until October 10th or 11th, but we're checking with them to make sure.

UPDATE: Crater Lake Trolley is a loop route where you can't disembark the trolley around the lake. It is not included in google transit for that reason. The Klamath Shuttle from the Amtrak station to the lake is active only during summer months, next from 7/1/2018 - 8/3/2018.

UPDATE 2: The schedule information was outdated from 2016, and I've loaded the current version of the feed with updated schedule for 2018.

@ed-g
Collaborator

ed-g commented Sep 18, 2017

Protip: send an "accept: application/json" header to fetch data from the Daterange backend. A browser confuses the backend by asking for HTML and XML.

curl -H 'accept: application/json' http://tna.trilliumtransit.com:8080/TNAtoolAPI-Webapp/queries/transit/Daterange?dbindex=10
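The same request from Python's standard library, for scripting the check instead of using curl. The URL and dbindex=10 come from the command above; the server may no longer be reachable, so treat this as a sketch, and the function names are hypothetical:

```python
import json
import urllib.request

DATERANGE_URL = (
    "http://tna.trilliumtransit.com:8080/TNAtoolAPI-Webapp/queries/transit/Daterange?dbindex=10"
)

def build_daterange_request(url=DATERANGE_URL):
    """Ask explicitly for JSON so the backend does not pick HTML or XML."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})

def fetch_daterange(url=DATERANGE_URL, timeout=30):
    """Fetch and decode the Daterange response (requires network access)."""
    with urllib.request.urlopen(build_daterange_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```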

@ed-g
Collaborator

ed-g commented Sep 19, 2017

@ODOT-RPTD-mb @PPaulsonOregonDOT it looks like there is a little over a month of contiguous validity (September 25th through October 31st), with the single exception of the Klamath shuttle which runs only seasonally and is done for the year.

http://tna.trilliumtransit.com/TNAtoolAPI-Webapp/Daterange.html?&dbindex=10

Let me know if this gives you what you need for the report.

september17=# select enddate::date, feedname from gtfs_feed_info order by enddate asc limit 5;  select startdate::date, feedname from gtfs_feed_info order by startdate desc limit 5;
  enddate   |          feedname           
------------+-----------------------------
 2017-10-31 | washingtonparkshuttle-or-us
 2017-12-02 | trimet-portland-or-us
 2017-12-06 | cccxpress-or-us
 2017-12-25 | valleyretriever-or-us
 2017-12-30 | salem-or-us
(5 rows)

 startdate  |       feedname        
------------+-----------------------
 2018-07-01 | klamathshuttle-or-us
 2017-09-25 | cccxpress-or-us
 2017-09-10 | ctran-wa-us
 2017-09-03 | trimet-portland-or-us
 2017-09-03 | salem-or-us
(5 rows)
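Excluding the seasonal feed is what turns "no window" into September 25th through October 31st. A sketch using the bounding rows printed above (not the full feed list); the SEASONAL set is a hypothetical hand-maintained list and the function name is also hypothetical:

```python
from datetime import date

# Bounding rows from the psql output above.
LATEST_STARTS = {
    "klamathshuttle-or-us": date(2018, 7, 1),
    "cccxpress-or-us": date(2017, 9, 25),
    "ctran-wa-us": date(2017, 9, 10),
}
EARLIEST_ENDS = {
    "washingtonparkshuttle-or-us": date(2017, 10, 31),
    "trimet-portland-or-us": date(2017, 12, 2),
}
SEASONAL = {"klamathshuttle-or-us"}  # hypothetical, maintained by hand

def window_excluding(starts, ends, excluded=frozenset()):
    """All-valid window over the non-excluded feeds, or None if the bounds cross."""
    start = max(d for name, d in starts.items() if name not in excluded)
    end = min(d for name, d in ends.items() if name not in excluded)
    return (start, end) if start <= end else None

all_feeds = window_excluding(LATEST_STARTS, EARLIEST_ENDS)             # None
year_round = window_excluding(LATEST_STARTS, EARLIEST_ENDS, SEASONAL)
days = (year_round[1] - year_round[0]).days + 1                         # 37 days
```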

@ODOT-RPTD-mb
Collaborator

This potentially works as a short-term fix. I would expect Columbia Gorge Express to show up (or not show up) the same way the Klamath Shuttle does, since it is also a seasonal summer service that is not currently operating?

For the longer term we probably want to:

  • Only consider contiguous time blocks in the past with respect to the capture date, when looking for 7 consecutive days to capture, because feed updates can happen at any time (see red circled text in table below). We might want to add data capture date to bottom of agency dropdown menu for context.
  • Make sure that Trillium is using consistent and predictable methodology around service start-end dates
  • Consider how to handle seasonal services like the Klamath Shuttle and Columbia Gorge Express in a way that does not mislead end users into reading a period when seasonal service is inactive as a period of incomplete data.
  • Consider how to detect/handle contradictions/inconsistencies between feed start/end date and calendar files. Is this already being tested for?
  • Consider modifying text at bottom of agency list dropdown, or modifying data processes, so that it does not show a "no valid date range" message, when the only "expired" data is from out of season, seasonal service.

[image: table referenced above, not reproduced]
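The first bullet (look only backward from the capture date) can be sketched as picking the latest 7-day run inside the all-valid window that ends on or before the capture date. This assumes the all-valid window is already known and that the capture date itself counts as a usable day; both the function name and that convention are hypothetical:

```python
from datetime import date, timedelta

def latest_past_week(window_start, window_end, capture_date, days=7):
    """Latest `days`-long run inside the window ending on or before capture_date, or None."""
    end = min(window_end, capture_date)
    start = end - timedelta(days=days - 1)
    if start < window_start:
        return None  # not enough validated history before the capture date
    return start, end
```

Anchoring on the capture date means a later feed update cannot retroactively invalidate the chosen week.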

@ed-g
Collaborator

ed-g commented Sep 19, 2017

Sorry, just a quick response for now.

  • Feed capture date is available in the archive; we'll just want a way to load it into TNA tool.

    • Spreadsheet column in last_updates.csv
    • For manually uploaded feeds we should provide the user a means to specify the feed capture date, defaulting to today's date upon load.
    • It should be possible for GTFS-Archive to provide a rolling archive of feeds which are their newest version and which were posted more than a week ago.
      • Edge case for very frequently updated feeds: the feed has been updated, but the schedule has not materially changed between the "earlier" and "current" feed for the time period we are analyzing. In other words, a new feed is posted but it didn't change the schedule for the time period we care about: either it doesn't claim to know (start_date in the future) or it gives exactly the same information as the older feed. If we run "enough" metrics for each of the 7 days, can we be confident enough of this with reasonable effort?
      • Or if we trust the feed provider we put them on a good-list, "Trimet feeds are generally valid for at least a week".
  • Aaron can say more but we're planning to ask agencies about the expected end date of their feeds on each export -- right now we give the end date based on the last calendar which assumes too much.

  • I don't understand the details for how to analyze seasonal feeds, the naive question I'd ask is can we identify seasonal feeds automatically or should they be marked in the data upon load (I'm guessing we'd want to mark them).

    • To identify seasonal maybe we can look back a couple years for periodic service. And then suggest to the user as a warning that "this service looks seasonal, is it?"
  • It looks like inconsistencies are being checked for (bounds narrowed to the overlap of feed dates and calendar dates) thanks to the OSU team's SQL code, but we can audit to make sure all corner cases are handled.

  • Having a date range "excluding" seasonal feeds seems pretty easy once seasonal feeds are marked as such.
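For the rolling-archive idea (the newest version of each feed among those posted at least a week ago), a sketch over a hypothetical list of (feed name, capture date) entries. The record shape is an assumption, since the archive's actual layout (e.g., last_updates.csv) isn't described here, and "at least 7 days" is one choice of boundary:

```python
from datetime import date, timedelta

def rolling_snapshot(entries, today, min_age_days=7):
    """Per feed, pick the newest archived version captured at least `min_age_days` ago.

    entries: iterable of (feed_name, capture_date) tuples.
    Returns {feed_name: capture_date}; feeds with only recent versions are omitted.
    """
    cutoff = today - timedelta(days=min_age_days)
    chosen = {}
    for name, captured in entries:
        if captured <= cutoff and (name not in chosen or captured > chosen[name]):
            chosen[name] = captured
    return chosen
```

Feeds updated within the last week fall back to their previous version, which trades some freshness for a week of after-the-fact confirmation.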

@PPaulsonOregonDOT
Collaborator

@ed-g Generally, we know which feeds are seasonal (at this point, I think it's just Klamath Shuttle and Columbia Gorge Express).

@PPaulsonOregonDOT
Collaborator

One observation I just made related to this: the floating box with all of the Oregon Transportation Agencies (OTA) currently shows no agencies with expired feeds, which suggests that at least today should be considered a valid day, and I believe this entire week should qualify. However, the data graph shows no days when all feeds are valid. This might be related to issue #32, but it's unclear where in the process the date information is getting corrupted.

@antrim antrim assigned antrim and unassigned ed-g Sep 27, 2017
@ed-g
Collaborator

ed-g commented Oct 30, 2017

@antrim I'll let you make a time estimate here

@ODOT-RPTD-mb
Collaborator

@PPaulsonOregonDOT Phil's comment ported from closed issue:
This points to a bigger conversation that we need to have about what the tool treats as "up to date" data. RVTD's feed doesn't expire until 2025, but the calendar.txt end date is 9/1 in the old feed, and the same file has a start date of 9/25 in the newest feed, which is the same date the feed was refreshed on oregon-gtfs.com. The feed validator didn't flag this feed, which suggests it's not out of spec, but it causes some problematic behavior in the tool.
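Per the spec text quoted earlier, days inside feed_start_date..feed_end_date that no calendar covers are an explicit "no service" assertion, so a validator has no reason to flag them. A tool can still surface long gaps as a warning. A minimal sketch, assuming calendar.txt periods are loaded as inclusive (start, end) dates and ignoring calendar_dates.txt additions for brevity; the function name and the 7-day threshold are hypothetical:

```python
from datetime import date, timedelta

def uncovered_gaps(feed_start, feed_end, periods, min_gap_days=7):
    """Return (start, end) runs inside the feed range not covered by any calendar period."""
    gaps = []
    cursor = feed_start  # first day not yet known to be covered
    for start, end in sorted(periods):
        if start > cursor:
            gap_end = min(start - timedelta(days=1), feed_end)
            if (gap_end - cursor).days + 1 >= min_gap_days:
                gaps.append((cursor, gap_end))
        cursor = max(cursor, end + timedelta(days=1))
        if cursor > feed_end:
            return gaps
    if cursor <= feed_end and (feed_end - cursor).days + 1 >= min_gap_days:
        gaps.append((cursor, feed_end))
    return gaps
```

For an RVTD-like old feed (with an assumed feed start of 2017-06-01, feed end of 2025-12-31, and one calendar ending 2017-09-01), this reports a single gap from 2017-09-02 to 2025-12-31, which is the pattern that confused the tool.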

@ODOT-RPTD-mb
Collaborator

@antrim @PPaulsonOregonDOT @ed-g - word file with some additional notes on GTFS feed set updates and archiving.
TNAST GTFS DATABASE UPDATE NOTES.docx
