Rotate bike_id on free_bike_status #147

Merged: 7 commits merged on Jan 27, 2020


morganherlocker
Contributor

This PR changes the description of bike_id to a random ID, rotated after each trip. This is a non-breaking change, as some providers already rotate to a new random ID daily.

@jcn
Contributor

jcn commented Aug 30, 2019

Regardless of where we end up on the issue of bike_id, I would argue that this is a breaking change to the spec as it means that vendors that do not rotate their ids will then be out of compliance.

@yocontra
Contributor

yocontra commented Sep 30, 2019

If the primary purpose of having a bike_id is booking hand-off, wouldn't it be better to rename this field to booking_token or similar, to make its purpose explicit and allow providers to use their own hashing/rolling method, with a baseline of rotating at least once per trip? This would also make clear to consumers that the field is ephemeral and should not be used as an index or unique vehicle ID, without their having to read the spec deeply.

@heidiguenin
Contributor

heidiguenin commented Oct 7, 2019

This was one issue discussed during an in-person GBFS developers workshop held before the NABSA 2019 conference (workshop agenda and notes).

During the workshop, we discussed how GBFS dataset producers and consumers use bike_id and whether and how bike_id is currently rotated. We learned that some producers currently rotate bike_id, including one example where a city required rotation of bike_id as a condition of permitted operations. We also heard about GBFS producers removing bike_id from their GBFS datasets and then being required by cities to add it back as a condition of their permits. There was speculation about why producers may have done this, and privacy was discussed as a likely factor.

Identified uses of bike_id include:

  • Booking: bike_id is used for in-app booking.
  • Operations: bike_id can be used to identify stale bikes.
  • Regulatory compliance: bike_id may be used for counting bikes in support of regulatory enforcement.

Because GBFS is concerned with traveler-facing information, we are primarily concerned with the booking use case, though participants did discuss how to address the other two use cases assuming that bike_id would be rotated regularly in the future. Ultimately, the inconvenience of changing current use cases needs to be weighed against current privacy concerns, and participants generally agreed that the rotation of bike_id was the simplest solution for now.

For booking, the main purpose of bike_id is as a hand-off token. With the deeplinks PR #25 expected to move forward in the near future, it’s possible that bike_id will no longer serve even that purpose. Participants discussed making bike_id an optional field, but ultimately, to reduce the possibility of this being a breaking change, the consensus was to rotate bike_id at least after every trip (some operators suggested they’d prefer to rotate more often for operational reasons) and to provide best practice guidance suggesting methods operators can use for rotation and for reversibility.
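As an illustration of what "rotate at least after every trip, with reversibility for the operator" could mean in practice, here is a minimal sketch. It is an assumption-laden example, not guidance adopted by GBFS: the class BikeIdRotator, its method names, and the use of Python's secrets.token_hex for the random string are all hypothetical choices. The operator keeps the public-to-internal mapping private, so a published bike_id can still be resolved for a booking hand-off while old IDs stop resolving once rotated.

```python
import secrets
from typing import Dict, Optional


class BikeIdRotator:
    """Hypothetical operator-side helper (not part of GBFS).

    Publishes a random bike_id per vehicle, replaces it after every trip, and
    keeps a private reverse mapping so the operator can still resolve a
    published ID (e.g. for a booking hand-off).
    """

    def __init__(self) -> None:
        self._public_by_internal: Dict[str, str] = {}
        self._internal_by_public: Dict[str, str] = {}

    def current_id(self, internal_id: str) -> str:
        # Assign a public ID the first time a vehicle appears in the feed.
        if internal_id not in self._public_by_internal:
            return self.rotate(internal_id)
        return self._public_by_internal[internal_id]

    def rotate(self, internal_id: str) -> str:
        # Retire the previous public ID so it no longer resolves to the vehicle.
        old = self._public_by_internal.get(internal_id)
        if old is not None:
            del self._internal_by_public[old]
        new = secrets.token_hex(16)  # 128 random bits, unrelated to the old ID
        while new in self._internal_by_public:  # guard against a (very unlikely) collision
            new = secrets.token_hex(16)
        self._public_by_internal[internal_id] = new
        self._internal_by_public[new] = internal_id
        return new

    def on_trip_end(self, internal_id: str) -> str:
        # The proposal's minimum: a new random ID after each trip.
        # Operators may also call rotate() on other events.
        return self.rotate(internal_id)

    def resolve(self, public_id: str) -> Optional[str]:
        # Operator-only reversibility; this mapping must never be published.
        return self._internal_by_public.get(public_id)
```

For example, after `r = BikeIdRotator()`, `r.current_id("vehicle-42")` returns a 32-character hex string; `r.on_trip_end("vehicle-42")` issues a different one, and `r.resolve()` on the old value returns `None`. A real deployment would presumably keep this mapping in durable, access-controlled storage rather than in memory.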

This change is intended to reduce the exposure of private data and does not guarantee anonymization. This proposal is part of a series of minimum viable proposals we’re hoping to move forward in the coming weeks to bring the spec better in line with industry practices and concerns, but we anticipate the community continuing to examine this issue in more depth on an ongoing basis.

@sven4all
Contributor

sven4all commented Oct 9, 2019 via email

@barbeau
Member

barbeau commented Oct 9, 2019

@sven4all In the workshop we reinforced the division of responsibilities between GBFS and MDS. GBFS is intended to be used for traveler-facing information, and MDS is intended to be used for regulatory monitoring and enforcement. Some may have used GBFS for regulatory purposes, but going forward, privacy concerns outweigh alternate use cases like this for GBFS that fall outside its primary purpose. Hence the need to rotate bike_ids.

If MDS doesn't fit the exact use case you're looking at for monitoring, I'd suggest bringing that up on the MDS repository as a new issue - https://github.com/CityOfLosAngeles/mobility-data-specification.

In the workshop we did discuss fleshing out this PR with the exact expected steps for rotating IDs to make expectations clearer to producers and consumers, and that work remains to be done.

@morganherlocker
Contributor Author

@HeidiMG thanks for the informative summary for those of us following along online. I have committed the change in language to specify that bike_id needs to be rotated after every trip, "at minimum".

@heidiguenin
Contributor

@morganherlocker Want to go ahead and call a vote on this now that versioning has passed?

@tmontes

tmontes commented Nov 28, 2019

Quoting @barbeau: “@sven4all In the workshop we reinforced the division of responsibilities between GBFS and MDS. GBFS is intended to be used for traveler-facing information, and MDS is intended to be used for regulatory monitoring and enforcement. Some may have used GBFS for regulatory purposes, but going forward, privacy concerns outweigh alternate use cases like this for GBFS that fall outside its primary purpose. Hence the need to rotate bike_ids.”

IIUC, @sven4all could be referring to authenticated (and encrypted?) access to free_bike_status.json feeds. Under those circumstances, mobility providers do know who they're "talking to" and could (should, IMHO) be ready to provide more precise information.

In the case of the City of Lisbon, most free_bike_status.json feeds are both authenticated and transferred over HTTPS such that the city can monitor public space utilisation.

Regarding privacy, and the particularly strict requirements of GDPR in Europe, there should be no limitations given that the City itself is GDPR compliant in the ways it processes and stores information.

My 2c.

@sven4all
Contributor

@tmontes that is exactly what I mean. I think it would be helpful if GBFS supported and standardized this use case as well (including authentication and encryption). We really use GBFS as an "MDS light" variant (it's easier for operators to implement and less data has to be handed over). If it's decided that free_bike_status should have rotating bike IDs, it will lead to a lot of confusion for many operators while governments are still requiring static bike_ids in the free_bike_status.json endpoint.

Would it be worth introducing a monitor_bike_status.json, modeled on free_bike_status.json, exclusively for government access?

@jcn
Contributor

jcn commented Nov 28, 2019

In the interest of not duplicating too much data between MDS and GBFS, while respecting the use cases of both (and noting that MDS currently requires providers to produce a GBFS feed), I wonder if a solution might be to leave bike_id as the booking token (which is how people are generally using it) and add something like an immutable_bike_id that would only be provided in authenticated feeds. It must be excluded from public feeds.

I would only propose that something like that even be considered if a producer was willing to actually test it out, but it could serve both needs in a relatively lightweight way.

@heidiguenin
Contributor

heidiguenin commented Dec 4, 2019

We need to have a broader community conversation about how to avoid breaking the existing use cases of bike_id and what guidance to provide around when feed authorization is appropriate and how fields might be different in that case. But we also need updated spec documentation as soon as possible to ensure that we create a new understanding for operators and regulators that publishing stable bike_id in a publicly available GBFS dataset is not GBFS compliant. Everything that we have heard from GBFS consumers and producers indicates that there is a shared understanding about the risks of publicly publishing stable bike_id, and that we just need to come up with the right solutions for how to address it and still meet industry needs.

With other GBFS enhancements we have started with minimum viable proposals, and we should do the same here. What do others think of this plan forward?

  1. For this proposal, we also add "for publicly published GBFS data sets" to the definition of the field?
  2. We open another new issue to discuss bike_id more broadly - considering both publicly published and authenticated feeds?

@barbeau
Member

barbeau commented Dec 4, 2019

@heidiguenin I think that path forward sounds reasonable.

I agree with @jcn that a new immutable_bike_id field that is only allowed in authenticated feeds is probably the best solution.

I agree that we need better language around what is allowed in a public feed for bike_ids ASAP - this issue surfaced in #201 yesterday.

@barbeau
Member

barbeau commented Dec 4, 2019

@heidiguenin Well, thinking more, I'm going to walk back a thumbs up for:

For this proposal, we also add "for publicly published GBFS data sets" to the definition of the field?

I think the above text should be part of the broader issue of private feeds and immutable bike IDs in these feeds.

The GBFS Guiding Principles say (emphasis mine):

  • GBFS is a specification for real-time or semi-real-time, read-only data. The spec is not intended for historical or archival data such as trip records. The spec is about public information intended for bikeshare users.

  • GBFS is targeted at providing transit information to the bikeshare end user. Its primary purpose is to power tools for riders that will make bikesharing more accessible to users. GBFS is about public information. Producers and owners of GBFS data should take licensing and discoverability into account when publishing GBFS feeds.

IMHO adding explicit text saying "for publicly published GBFS data sets" implies that privately published GBFS data sets are also encouraged/supported, which seems to go directly against the above guiding principles.

So I would propose in this issue we focus only on providing guidance for handling bike IDs in public GBFS feeds, and leave all implicit and explicit modifications related to private feeds for another issue.

@heidiguenin
Contributor

heidiguenin left a comment

Added language to provide some context for the change, as it will break certain use cases.

gbfs.md (outdated, resolved)
Co-Authored-By: Heidi G <38441752+heidiguenin@users.noreply.github.com>
@barbeau
Member

barbeau commented Dec 10, 2019

I talked to @heidiguenin further about this issue offline.

Given the critical privacy issues at stake, and that we want to follow the community process for changes to GBFS that meet the needs of the entire community, I'd like to propose that we try to address both issues (privacy and IDs in private feeds) in this proposal in order to move forward with a viable solution quickly.

I think the simplest path forward is adding the immutable_bike_id that @jcn proposed earlier that is used only in feeds with authentication (i.e., not open publicly available feeds).

So, we would redefine bike_id as already proposed (with additional text):

| Field Name | Required | Defines |
|---|---|---|
| - bike_id | Yes | Identifier of a bike, rotated to a random string, at minimum, after each trip to protect privacy. Note: Persistent bike_id, published publicly, poses a threat to individual traveler privacy. See immutable_bike_id for a secure way to publish persistent IDs. |

and we would also add:

| Field Name | Required | Defines |
|---|---|---|
| - immutable_bike_id | No | Unique identifier of a bike. IMPORTANT: To protect the privacy of end users, this field must not be published in a publicly available open feed. Any feeds containing immutable_bike_id must be properly secured and protected by authentication. |
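To make the proposed split concrete, here is a minimal sketch of how a producer might render one vehicle record for each audience. The function name build_bike_record and its authenticated flag are hypothetical; only bike_id and immutable_bike_id come from the proposal above. How a feed is actually secured is out of scope here, and other required free_bike_status fields (for example is_reserved and is_disabled) are omitted for brevity.

```python
from typing import Any, Dict


def build_bike_record(
    rotated_id: str,
    persistent_id: str,
    lat: float,
    lon: float,
    authenticated: bool,
) -> Dict[str, Any]:
    """Hypothetical helper: build one free_bike_status entry under the proposal above."""
    record: Dict[str, Any] = {
        "bike_id": rotated_id,  # always the rotated, random identifier
        "lat": lat,
        "lon": lon,
    }
    if authenticated:
        # The persistent ID is only emitted to authenticated consumers;
        # a public open feed never contains immutable_bike_id.
        record["immutable_bike_id"] = persistent_id
    return record
```

The design choice being illustrated is that both audiences receive the same rotated bike_id, so public consumers see no difference, and the persistent identifier is purely additive for authenticated feeds.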

@morganherlocker @tmontes @sven4all @jcn @heidiguenin What do you all think of this?

@mplsmitch
Collaborator

@barbeau what do you think of @sven4all's idea of a separate file, monitor_bike_status.json, to hold immutable IDs so that all the non-PII data in free_bike_status can remain free of authentication? That way we could make a clear distinction between feeds where authentication is appropriate and where it's not. I have a hunch that this won't be the last time there's a valid argument for including data that shouldn't be fully public.

@Empty2k12

“And it's clear to data producers that they should take rider privacy seriously.”

That's why bike_id should remain public and be set at the discretion of the city/jurisdiction/operator! I don't see why educating these parties and putting best practice items in the Guidelines is not as good as, if not better than, removing it completely. This way, safe applications can still benefit from bike_ids, while potentially insecure use cases (e.g., scooters) can choose not to include it.

“Could you provide an exact use case of bike_id in free_bike_status.json that removing bike_id from open feeds would break?”

An application that uses a public scooter GBFS feed to display mobility locations on a map, along with each scooter's ID. For reference, I attached a screenshot from an application running on one of our scooter feeds:

[Screenshot (2020-01-08): the application displaying scooters and their IDs]

@morganherlocker
Contributor Author

Hey everyone, thank you for the excellent discussion. I really appreciate all the wonderful feedback and perspectives from you all. 🙇

It sounds like we are at a consensus point where:

  1. There is general agreement that stable bike_id does not meet the privacy goals of the spec
  2. Rotating bike_ids or dropping bike_id both solve the immediate privacy/safety concerns
  3. Dropping bike_id will not hurt any primary use cases, but could prevent certain quality improvements (reducing GPS drift)

Since this is an active, widespread vulnerability, I would like to see us move to a vote and get this merged as quickly as possible. I'm supportive of the emerging regulatory use cases around authenticated feeds that could contain new metadata, and believe we should follow this up with separate formal proposals to shore up these ideas in the specification. These use cases are important and deserving of dedicated discussion and design process.

With expediency in mind, I voice my support for merging bike_id rotation as the PR currently describes, since this position seems to have the fewest operational concerns and the broadest level of consensus, while addressing the user safety issue. I'll follow up this comment with a call for a formal vote, following the GBFS governance guidelines.

Thank you for taking the time to follow along and contribute to this process!

@morganherlocker
Contributor Author

I hereby call a vote on this proposal. Voting will be open for 7 full days, until 11:59PM UTC on January 16.

Please vote for or against the proposal, and include the organization for which you are voting in your comment.

Please note if you can commit to implementing the proposal.

@tmontes

tmontes commented Jan 9, 2020

Quoting an earlier comment: “@kanagy As @tmontes pointed out earlier, there doesn't seem to be a valid remaining use case for a rotated bike_id, now that the deep links proposal has passed (#25). The only other use case I'm aware of for a non-stable bike_id is caching and optimization (e.g., to reduce jitter of a bike on a map due to GPS drift), but to my knowledge these aren't critical features and privacy concerns are likely more important.”

Clarification:

  • I see no use case for a rotated bike_id with the current wording: "rotated to a random string, at minimum, after each trip".
  • I do see (and have) use cases for a rotated bike_id if it is "rotated to a random string exactly once after each trip".
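The jitter-reduction use case mentioned above relies on a consumer matching the same bike across consecutive feed snapshots, which is why the rotation wording matters. The following is a minimal sketch under stated assumptions: the function name smooth_positions, the exponential moving average, and the default alpha are illustrative choices, not anything specified by GBFS. Smoothing state is keyed by bike_id, so it accumulates only while an ID stays the same. If IDs rotate exactly once per trip, a parked bike keeps its ID between trips; rotation on additional events shortens or resets each track.

```python
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # (lat, lon)


def smooth_positions(
    snapshots: List[Dict[str, Position]], alpha: float = 0.3
) -> Dict[str, Position]:
    """Hypothetical consumer-side smoothing of GPS jitter across feed snapshots.

    Each snapshot maps bike_id -> raw (lat, lon). Smoothing state is keyed by
    bike_id, so a rotated ID starts a fresh track with no link to the old one.
    """
    state: Dict[str, Position] = {}
    for snapshot in snapshots:
        next_state: Dict[str, Position] = {}
        for bike_id, (lat, lon) in snapshot.items():
            if bike_id in state:
                prev_lat, prev_lon = state[bike_id]
                # Exponential moving average damps small GPS fluctuations.
                next_state[bike_id] = (
                    prev_lat + alpha * (lat - prev_lat),
                    prev_lon + alpha * (lon - prev_lon),
                )
            else:
                next_state[bike_id] = (lat, lon)  # new or rotated ID: no history
        state = next_state  # IDs absent from this snapshot are dropped
    return state
```

For example, with alpha=0.5, the snapshots {"a": (0.0, 0.0)} and {"a": (1.0, 0.0)} yield {"a": (0.5, 0.0)}, while replacing "a" with a rotated "b" in the second snapshot yields {"b": (1.0, 0.0)} with no memory of the earlier position.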

I know a call for vote is ongoing and I'll review this with the City of Lisbon officials before casting any vote. Nevertheless, I suppose a better PR would say:

  • No bike_id field in public feeds -- privacy 100% guaranteed.
  • Mandatory and stable bike_id in private feeds -- private meaning authenticated and encrypted.
  • I don't see the need for bike_id with respect to the recently integrated deep link capabilities, given that its wording explicitly states "(...) Note that the URI does not necessarily include the bike_id (...)" -- whether those links are exploitable from a privacy standpoint is a whole other matter.

My 2c.

@antrim
Contributor

antrim commented Jan 14, 2020

Hi Tiago - Thanks for the thorough comments. Here are some notes for your and the City of Lisbon's consideration.

(Responses inline)

  • I see no use case for a rotated bike_id with the current wording: "rotated to a random string, at minimum, after each trip".

I believe some operators intend to rotate bike_id after a bike is moved or after some other "non-trip" event. "…At a minimum" would explicitly allow that, and would not diminish functionality in traveler-facing applications.

  • Mandatory and stable bike_id in private feeds -- private meaning authenticated and encrypted.

The stated purpose of GBFS is for traveler-facing applications. GBFS has also been useful for the purposes of monitoring and oversight by cities. I believe there needs to be a more thorough process to determine best practices and standards for that purpose -- beyond this immediate privacy patch. Long-term, I think there is a broadly held intention that GBFS and associated best practices should offer sound guidance to cities looking to set their own policies and requirements related to data for traveler-facing and monitoring applications.

  • I don't see the need for bike_id with respect to the recently integrated deep link capabilities, given that its wording explicitly states "(...) Note that the URI does not necessarily include the bike_id (...)" -- whether those links are exploitable from a privacy standpoint is a whole other matter.

Some consumers and producers have committed to implementing the deeplinks proposal (#25), but to my knowledge it hasn't been implemented yet. It takes time for new features to be implemented. The purpose of specifying bike_id rotation is that this is already current GBFS practice in many cases and easily implementable. It is an expedient way of solving a privacy issue that otherwise remains open. In future versions, the community will have the opportunity to consider some of the ideas you've put forward.

@quicklywilliam

Ride Report votes in favor of this change.

@kanagy

kanagy commented Jan 15, 2020

Google Maps votes in favor of the current proposal.

@charlesjump

Uber/JUMP votes in favor of this change.

@rickbruce

Ito World votes in favor of this change.

@bhandzo

bhandzo commented Jan 15, 2020

Bird votes in favor of this change.

@yocontra
Contributor

Stae votes in favor of this change.

@MuteQ

MuteQ commented Jan 16, 2020

Transit votes in favor of this change.

@heidiguenin
Contributor

Voting has closed, and this change has been passed with 7 votes in favor and 0 votes against:

Consumers (5): Ride Report, Google Maps, Ito World, Stae, Transit
Producers (2): Uber/JUMP, Bird

Next steps:
We'll merge the PR and add it to the v2.0 Major Release candidate.
@antrim @barbeau

@heidiguenin heidiguenin added the v2.0 Candidate change for GBFS 2.0 (Major release) label Jan 17, 2020
antrim added a commit that referenced this pull request on Jan 24, 2020 (in README.md)
@antrim antrim merged commit 310a8a8 into MobilityData:master Jan 27, 2020
@heidiguenin
Contributor

We'd love to make this an official part of the spec, but first we need to see this change being implemented. Could you comment here if your organization has implemented this?

@MuteQ @contra @bhandzo @rickbruce @charlesjump @kanagy @quicklywilliam Others?

@mdwestervelt

@heidiguenin Bird has implemented rotating IDs

@heidiguenin
Contributor

I opened a new issue #237 to discuss options for maintaining the existing GBFS use cases that may have been broken by bike_id rotation.

chris-sarli added a commit to chris-sarli/GBFS-Viewer that referenced this pull request Jun 26, 2020
GBFS now specifies that IDs should change after each trip, for privacy reasons.

See here: MobilityData/gbfs#147
Labels: gbfs.md, v2.0 (Candidate change for GBFS 2.0, major release)