Help migrating away from ORES #1

Closed
elukey opened this issue Jun 8, 2023 · 13 comments

Comments

@elukey

elukey commented Jun 8, 2023

Hi! I am part of the Wikimedia ML team. We are starting to migrate ORES clients to another infrastructure, since we are planning to deprecate ORES. More info at https://wikitech.wikimedia.org/wiki/ORES

TL;DR:

  • The ORES infrastructure is going to be replaced by Lift Wing, a more modern, Kubernetes-based service.
  • All the ORES models (damaging, goodfaith, etc.) are running on Lift Wing; more on how to use them at https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage
  • We have new models called Revert Risk that can replace goodfaith and damaging, for example. They are available on Lift Wing, and we'd like to offer them as a valid and more precise/performant alternative to the ORES models. If you'd like to try them, we'd be happy to help with the migration!

Thanks in advance,

ML team
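
For readers following along, a request to a Lift Wing model through the API gateway can be sketched as below. This is a minimal sketch and not taken from the thread: the base URL, the `<wiki>-<model>` naming, and the `rev_id` / `lang` payload fields are assumptions based on the usage page linked above and should be checked against it. `build_predict_request` is a hypothetical helper that only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed base URL of the public API gateway for Lift Wing; verify it
# against the LiftWing/Usage page before relying on it.
LIFTWING_BASE = "https://api.wikimedia.org/service/lw/inference/v1/models"


def build_predict_request(model, payload, user_agent="example-bot/0.1"):
    """Build (but do not send) a POST request for one Lift Wing model."""
    url = f"{LIFTWING_BASE}/{model}:predict"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "User-Agent": user_agent},
        method="POST",
    )


# Assumption: ORES-style models are addressed as "<wiki>-<model>".
ores_style = build_predict_request("enwiki-damaging", {"rev_id": 12345})

# Assumption: the language-agnostic Revert Risk model also takes "lang".
revert_risk = build_predict_request(
    "revertrisk-language-agnostic", {"rev_id": 12345, "lang": "en"}
)

# Sending would then be: urllib.request.urlopen(ores_style, timeout=30)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before any call counts against the hourly limit.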

@elukey
Author

elukey commented Jun 28, 2023

@SiarheiGribov @ajbura @kfiven Hi! Do you know who I can talk with for this migration? Thanks in advance :)

@kfiven
Collaborator

kfiven commented Jun 28, 2023

@elukey discord would be the best way to discuss it. All of us are available there.

@SiarheiGribov
Collaborator

Hello, @elukey . You can join our discord server.

  1. Are you planning to create simple and ready to use stream / integrate data into existing streams? Unfortunately, the ORES stream had a very long delay (up to 60 seconds). This is especially important due to the fact that you set fairly strict limits per hour.
  2. Bearer tokens are only available in oauth2.0. Is oauth1.0 not supported?
  3. No authorization required for internal endpoints? Will it be the same in the future?
  4. Do you have a machine-readable list of wikis and models?
Sorry for my English.

@elukey
Author

elukey commented Jun 28, 2023

Hi!

Hello, @elukey . You can join our discord server.

  1. Are you planning to create simple and ready to use stream / integrate data into existing streams? Unfortunately, the ORES stream had a very long delay (up to 60 seconds). This is especially important due to the fact that you set fairly strict limits per hour.

We are deprecating the revision-score stream and will create new ones (one stream per model score type), but only on demand. Do you have any specific requirements? I expect the latency to be a few seconds after an edit happens, not faster than that :)

  2. Bearer tokens are only available in oauth2.0. Is oauth1.0 not supported?

I am not 100% sure, do you need only oauth1.0 or is it ok oauth2.0 too?

  3. No authorization required for internal endpoints? Will it be the same in the future?

By internal endpoints we mean those inside the WMF infrastructure. For the outside world we offer anonymous access without a bearer token (at the moment limited to 10000 requests/hour).

  4. Do you have a machine-readable list of wikis and models?

I don't think we have one at the moment, since our new infrastructure is one micro-service for each model. We support the same models that ORES supports, plus other ones, but if you need the list for a specific use case we can work on having one.

Sorry for my English.

Not at all, I am a non-native English speaker myself, and yours is great. Thanks for taking the time!

@SiarheiGribov
Collaborator

SiarheiGribov commented Jun 28, 2023

wmf-ca-certificates.crt not found on Toolforge with root/etc/ssl/certs/ path.

but if you need the list for a specific use case we can work on having one.

I really don't want to check the pages (1 and 2) and update my code when adding supported new wiki :).

do you need only oauth1.0 or is it ok oauth2.0 too?

In my opinion, a better option would be to set token / private key on api.wikimedia without oAuth. oAuth2.0 requires updating token every 4 hours (extra sourcecode), and oauth1.0 is too complicated. oAuth also imposes a requirement to hide oAuth credentials (browser-based JavaScript client apps?). ORES was very easy to use, but now we need to create a large project, 90% of which will be authorization logic. This doesn't scare me (I have projects for Wikimedia on 1.0 and 2.0 oAuth), but for others it increases the threshold.

@elukey
Author

elukey commented Jun 28, 2023

@SiarheiGribov Toolforge is separate from the Wikimedia production infrastructure, so your client needs to go through the API gateway (as if it were a regular client on the public Internet). You can test the endpoint without authentication, but your IP will be limited to 10000 requests/hour.

I really don't want to check the pages (1 and 2) and update my code when adding supported new wiki :).

Yes, I definitely understand. For the moment we don't have such a list, but I'll talk with my team to figure out how we can add the feature.

do you need only oauth1.0 or is it ok oauth2.0 too?

In my opinion, a better option would be to set token / private key on api.wikimedia without oAuth. oAuth2.0 requires updating token every 4 hours (extra sourcecode), and oauth1.0 is too complicated. oAuth also imposes a requirement to hide oAuth credentials (browser-based JavaScript client apps?). ORES was very easy to use, but now we need to create a large project, 90% of which will be authorization logic. This doesn't scare me (I have projects for Wikimedia on 1.0 and 2.0 oAuth), but for others it increases the threshold.

IIUC the token from Meta shouldn't last 4 hours, it doesn't have an expiry IIRC, but we'd need to verify. I completely understand about the complexity of the code, but once you have your token it should just be a matter of setting the bearer auth header and that's it. All steps are indicated in https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage, but if anything doesn't work as expected let us know and we'll try to fix it. It would mean a lot to us to have new clients migrated, so we can understand what's missing, thanks in advance for the patience!
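
The "setting the bearer auth header" step mentioned above can be sketched as follows. This is a minimal illustration, assuming the API gateway accepts a standard `Authorization: Bearer <token>` header as the usage page describes; `with_auth` is a hypothetical helper and the URL is a placeholder, not a real endpoint.

```python
import urllib.request


def with_auth(request, token=None):
    """Attach a bearer token if one is given; otherwise stay anonymous."""
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


# Placeholder URL for illustration only.
anonymous = with_auth(urllib.request.Request("https://example.org/predict"))
authenticated = with_auth(
    urllib.request.Request("https://example.org/predict"), token="my-token"
)
```

With this shape the same client code serves both the anonymous and the authenticated case, so the authorization logic stays a one-line concern once a token exists.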

@SiarheiGribov
Collaborator

SiarheiGribov commented Jun 28, 2023

"Common" (not only for owner and not identity only) Bearer token from Meta expires after 14400 s (4 h).
Access token from oAuth1.0 is indefinite (or a very long time; a year?)

UPD: owner-only oauth2.0 token is indefinite.
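
For the non-owner-only case, the "extra sourcecode" needed to cope with the 14400 s lifetime can be kept small. The sketch below is only an illustration under assumptions: `fetch_token` is a hypothetical callable standing in for whatever OAuth 2.0 flow the client uses, and the 60 s safety margin is an arbitrary choice.

```python
import time


class TokenCache:
    """Reuse a bearer token until shortly before it is expected to expire."""

    def __init__(self, fetch_token, lifetime_s=14400, margin_s=60,
                 clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical: returns a new token
        self._lifetime_s = lifetime_s    # 4 h, as reported in this thread
        self._margin_s = margin_s        # refresh a little early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime_s - self._margin_s
        return self._token
```

A client would call `cache.get()` before each request instead of storing the token directly; an owner-only token, being indefinite, would not need this at all.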

@elukey
Author

elukey commented Jun 29, 2023

"Common" (not only for owner and not identity only) Bearer token from Meta expires after 14400 s (4 h). Access token from oAuth1.0 is indefinite (or a very long time; a year?)

I admit my ignorance about Meta's tokens. Reading https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration/propose/oauth2 it seemed to me that we issued only OAuth 2.0 tokens, but I see the "Consumer version" set to 1.0. The API Gateway should support both, since we already tested one that doesn't expire.

@SiarheiGribov
Collaborator

SiarheiGribov commented Jun 29, 2023

Done.

if (wiki is wikipedia and namespace is 0) {
    if (wiki in multilingual list and user is anon)
        multilingual
    else
        agnostic
}
else {
    if (wiki in damaging ORES list)
        damaging
    if (wiki in reverted ORES list)
        reverted
}
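
The selection logic above can be written as a small runnable function. This is a sketch of the pseudocode rather than the project's actual code: the function name, its parameters, and the three wiki lists are hypothetical, and the returned labels simply mirror the names used in the pseudocode.

```python
def select_models(wiki, is_wikipedia, namespace, user_is_anon,
                  multilingual_wikis, damaging_wikis, reverted_wikis):
    """Return the model labels to query for one edit."""
    if is_wikipedia and namespace == 0:
        # Revert Risk: multilingual only for anonymous edits on listed wikis.
        if wiki in multilingual_wikis and user_is_anon:
            return ["multilingual"]
        return ["agnostic"]
    # Fall back to ORES models; both checks are independent, so a wiki
    # present in both lists gets both models.
    models = []
    if wiki in damaging_wikis:
        models.append("damaging")
    if wiki in reverted_wikis:
        models.append("reverted")
    return models
```

For example, an anonymous main-namespace edit on a Wikipedia in the multilingual list selects `["multilingual"]`, while a non-Wikipedia project present in both ORES lists selects `["damaging", "reverted"]`.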

2-4k requests at peak time.

  1. Today's list of bugs.
  2. ~8% of requests end as { httpCode: 504, httpReason: 'upstream request timeout' }. See full log (~1.5 h; default settings of requests (python) / request (nodejs)). The same request can take from 1 second to 30+ seconds, or time out. 8% is too much.
  3. be-x-old timeout.
  4. Need a machine-readable list of available languages (agnostic / multilingual) and available model names (ORES). Checking four locations manually every day is not an option. :)
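
Until the timeouts are fixed on the server side, a client can soften the 504s with a bounded retry. The sketch below is one possible approach and not something proposed in the thread: `UpstreamTimeout` is a hypothetical exception that the caller's HTTP layer would raise on a 504, and the attempt count and delays are arbitrary.

```python
import time


class UpstreamTimeout(Exception):
    """Hypothetical error raised by the caller's HTTP layer on a 504."""


def call_with_retries(send, max_attempts=4, base_delay_s=1.0,
                      sleep=time.sleep):
    """Call send(); on UpstreamTimeout retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except UpstreamTimeout:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay_s * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
```

Note that every retry is another request against the hourly limit, so the attempt count should stay low.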

@elukey
Author

elukey commented Jul 7, 2023

Some updates:

  • The rate limit for anonymous calls has been raised to 50k calls/hour (from 10k).
  • The Research team published recommendations about when to use Revert Risk models.
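
To stay under an hourly limit such as the 50k above, a client can pace its own calls. The sketch below is a simple illustration under assumptions, not a description of how the gateway counts requests: `Pacer` is a hypothetical helper that spaces calls evenly (3600 s / 50000 = 0.072 s apart), which is stricter than an hourly quota strictly requires.

```python
import time


class Pacer:
    """Space calls evenly so they never exceed max_per_hour."""

    def __init__(self, max_per_hour=50_000, clock=time.monotonic,
                 sleep=time.sleep):
        self.interval_s = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve a slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval_s
```

Calling `pacer.wait()` before each request is enough; a burst-tolerant scheme such as a token bucket would suit spiky traffic better.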

@elukey
Author

elukey commented Jul 13, 2023

@SiarheiGribov hi! Not sure whether you changed the calls to the Revert Risk models as explained in https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language-agnostic_revert_risk#Ethical_considerations,_caveats,_and_recommendations. If so, did you notice fewer errors than with multi-lingual?

@elukey
Author

elukey commented Aug 27, 2023

@SiarheiGribov Hi! I think we can close this. What do you think? Is anything missing?

@elukey elukey closed this as completed Sep 11, 2023

3 participants