New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow anonymous API access #552

Open
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
3 participants
@CHTJonas
Copy link
Member

CHTJonas commented Dec 12, 2018

Bit of a placeholder really - feel free to comment...

Can be merged in as/when we feel it's appropriate, but I suggest we give 4-8 weeks notice by email to anyone who has an API app registered, and maybe do a post on social media etc. as well for anyone who's using it anonymously.

@CHTJonas

This comment has been minimized.

Copy link
Member

CHTJonas commented Dec 12, 2018

In the meantime, the API unit tests in here need looking at.

@GKFX
Copy link
Contributor

GKFX left a comment

Seems a bit drastic tbh - nothing wrong with people being able to get the JSON view of a show anonymously, for example. Authentication is always more faff and more to break, esp. for third parties! Would make more sense to require API keys for anything person-related but maybe not for the basics of show data.

Do we have stats on how much traffic, with and without API keys, the API gets?

@GKFX

This comment has been minimized.

Copy link
Contributor

GKFX commented Dec 12, 2018

Although web scraping the people data is not that hard, especially for a low-JS site like Camdram. There seems to be pretty minimal gain to any of this and it will result in wasted time for legitimate API users.

@hoyes

This comment has been minimized.

Copy link
Member

hoyes commented Dec 13, 2018

So...

I agree we do need to lock down the API much more than it is at the moment, and maybe we'll end up with something close to this by some point. I don't really know how many people are using the API (which is admittedly a little scary), but I think we need to go a little slower as it'll be Camdram that looks bad if we break too many other websites at once. And while we don't want to make the rules too complex either, I think there are disadvantages to having a one-size-fits-all approach.

In lieu of a fully thought out response, here's a bit of a brain dump for now:

  • If people can access the data via iCal or RSS, there's not much point restricting the equivalent JSON/XML responses IMO.
  • Maybe still require a User-Agent: header for less sensitive API requests. I think feed/calendar viewers will always specify one, but HTTP client libraries likely won't so this might be a way to force people to manually identity their API requests for less sensitive data without so much implementation effort on their end.
  • We should audit all uses of FOSRestController and consider refactoring some to be a standard Symfony controller (without JSON/XML outputs). Off the top of my head, API apps themselves and users don't really need to be part of the API.
  • Maybe remove anything that isn't listed in https://github.com/CHTJonas/camdram-ruby from the API completely.
  • Consider blocking all writable aspects of the API for now (i.e. GET requests only) until we have time to think about it - I doubt anyone's using this at the moment and use cases are likely limited anyway. We could do this by removing the CSRF token bypass config.
  • Need to consider how to support client-side JS. The G&S website is currently using jsonp IIRC, which doesn't work with OAuth. Likely CORS, but need to think about implications re exposing HTML views which contain CSRF tokens (and maybe other sensitive data) too.
  • As a step 1, we could maybe implement this PR just for role/people endpoints, which limits the number of people we might annoy in one go. There's obviously one specific API client that we'll have to manage if we do this.
  • Maybe build in some sort of IP address white-list feature that disables OAuth as a last resort temporary stop-gap thing to deploy on a case-by-case basis.
  • We maybe ought to balance restricting non-OAuth usage with documentation on how to actually use the OAuth process. I alsways thought it would be nice of more society websites pulled our data instead of maintaining their own lists, but needs some thought as to how to make such use cases as trivial as possible. Some sort of iframe-able HTML view might be handy.
  • We may be any to glean some limited API usage data from the Apache logs.
@CHTJonas

This comment has been minimized.

Copy link
Member

CHTJonas commented Dec 13, 2018

Nothing wrong with people being able to get the JSON view of a show anonymously...

In principle I agree with you - and I do assume good faith on the part of users. However I do worry that Camdram is a potential treasure trove of information that could be quite easily monetised or otherwise exploited in a way that, crucially, users are not expecting nor consenting to. I know this issues seems quite abstract but it would be so easy for someone to graph the entirety of Camdram data for the last, say, 3 years and then predict friendship groups, real-world habits/patters and people's locations from this data. This is actually an issue that someone has directly raised with me in person before now. I think this possibility is remote, but I think it would be more responsible of us than not to implement API access controls. I'm not necessarily talking full registration & approval like a social media giant, but at least free & open registration so we can keep track of what & who is using our site and its data. Then if we find out someone is abusing the API in this way we can revoke access & block their user account.

Additionally, I think it would make API rate limiting, logging and abuse prevention much easier to implement. If we can trace excessive usage of an endpoint to a single application via it's credentials then we in turn know the user responsible; we can contact them and advise them how to change their app to improve their + our performance. Currently the only data we get from anonymous usage is an IP address which is a lot more work to trace (and potentially impossible).

Authentication is always more faff and more to break

I actually dispute this! There's some really great third-party OAuth2 clients out there that can be wrapped around the camdram.net API more easily than trying to roll-ones-own in my opinion. Obviously I don't want to prescribe the way a developer should work but if someone has written a module that handles App ID's, Secrets & Access Tokens completely transparently then why reinvent the wheel.

Do we have stats on how much traffic, with and without API keys, the API gets?

Working on it right now (although Pete's giving me quite a bit of help)!

There seems to be pretty minimal gain to any of this and it will result in wasted time for legitimate API users.

Well if they're legitimate then they should really be using proper credentials as advised on the API page! 😛

I do think this discussion is a bit academic. I would guess the total number of serious services/platforms/apps that use the API numbers ~5 and judging by acts_api_apps, those who have registered are all people who I would be comfortable approaching personally and entering into discussion with, although obviously it's possible there's more anonymous apps out there.

@CHTJonas

This comment has been minimized.

Copy link
Member

CHTJonas commented Dec 13, 2018

Think we commented at the same time @hoyes! Yes happy fro this PR to change to address certain/different endpoints. And more than happy for people to 'chip in' and add commits themselves.

@hoyes

This comment has been minimized.

Copy link
Member

hoyes commented Dec 13, 2018

How about this potential reduced scope for this PR:

  • Require OAuth for anything involving roles or people. I think these are the ones that are particularly sensitive.
  • Disable third party POST/PUT/DELETE requests completely.
  • Add some of IP address white list that can be managed by configuration.

I think all the above can still be achieved fairly simply using security.yml alone (https://symfony.com/doc/current/security/access_control.html#matching-options)

Other, potential separate steps for the near future (perhaps the subject of separate tickets):

  • Audit FOSRestController usage and ensure we're only exposing JSON data where intended.
  • Implement CORS (and consider security implications).
  • Reevaluate non-restricted API endpoints in a few month's time once the dust has settled on this and decide if we want to make any further restrictions.

Thanks for getting the ball rolling @CHTJonas ... I'll admit the complexities of this have probably deterred me from making a move to date.

@CHTJonas

This comment has been minimized.

Copy link
Member

CHTJonas commented Dec 13, 2018

Sounds good to me. I should add that I'm about to release version 2 of https://github.com/CHTJonas/camdram-ruby and whilst obviously I maintain it, it's very much not meant to be Camdram doctrine! Was written as a side project to go with new room booking. Client-side JS & CORS is an interesting point - don't really know a lot about it myself.

To be honest, in order to make POST/PUT/DELETE requests an app would have to be using OAuth already. And I'm tempted to say that they can continue doing this if they want since they're doing nothing wrong. As you say though, I suspect this affects nobody/nothing.

glean some limited API usage data from the Apache logs

Can track endpoint usage & query strings but auth bearer & user agent headers won't show up so difficult to trace usage to a specific application, save dig -x'ing the IP.

This was definitely a long-term proposal on my part and was meant to be coupled with documentation on how OAuth works before being merged.

@hoyes

This comment has been minimized.

Copy link
Member

hoyes commented Dec 13, 2018

I guess the point I was trying to make is that due to the sprawling use of FOSRestController I think Camdram needs to prune the available API endpoints a little, and camdram-ruby is an interesting pseudo-third-party perspective on what a useful subset might look like - obviously evolve it as you see fit!

Hmm maybe we can leave PUT/POST/DELETE in place for now, but leave them out of the documentation or something, or at least put some "here be dragons" warnings. I'm not very confident that we're returning sensible responses in the case of validation errors etc so worry about managing expectations etc.

It might be interesting to summarise which json/xml URLs are being hit with which methods, even if we can't (yet) tell who's making the requests.

@CHTJonas CHTJonas referenced this pull request Dec 13, 2018

Open

API rate limiting #416

@GKFX

This comment has been minimized.

Copy link
Contributor

GKFX commented Dec 17, 2018

Happy with where things are now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment