Help migrating away from ORES #1

elukey · 2023-06-08T07:54:55Z

Hi! I am part of the Wikimedia ML team. We are starting to migrate ORES clients to another infrastructure, since we are planning to deprecate ORES. More info at https://wikitech.wikimedia.org/wiki/ORES

TL;DR:

The ORES infrastructure is going to be replaced by Lift Wing, a more modern, Kubernetes-based service.
All the ORES models (damaging, goodfaith, etc.) are running on Lift Wing; more on how to use them at https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage
We have new models called Revert Risk that can replace, for example, goodfaith and damaging. They are available on Lift Wing, and we'd like to offer them as a valid, more precise and performant alternative to the ORES models. If you'd like to try them, we'd be happy to help with the migration process!

Thanks in advance,

ML team

elukey · 2023-06-28T08:44:20Z

@SiarheiGribov @ajbura @kfiven Hi! Do you know who I can talk with for this migration? Thanks in advance :)

kfiven · 2023-06-28T10:48:55Z

@elukey discord would be the best way to discuss it. All of us are available there.

SiarheiGribov · 2023-06-28T13:08:51Z

Hello, @elukey . You can join our discord server.

Are you planning to create simple and ready to use stream / integrate data into existing streams? Unfortunately, the ORES stream had a very long delay (up to 60 seconds). This is especially important due to the fact that you set fairly strict limits per hour.
Bearer tokens are only available in OAuth 2.0. Is OAuth 1.0 not supported?
No authorization required for internal endpoints? Will it be the same in the future?
Do you have a machine-readable list of wikis and models?
Sorry for my English.

elukey · 2023-06-28T14:00:31Z

Hi!

> Hello, @elukey . You can join our discord server.
>
> Are you planning to create simple and ready to use stream / integrate data into existing streams? Unfortunately, the ORES stream had a very long delay (up to 60 seconds). This is especially important due to the fact that you set fairly strict limits per hour.

We are deprecating the revision-score stream and will create new ones (one stream per model score type), but only on demand. Do you have any specific requirements? I think the latency will be a few seconds after an edit happens, not faster than that :)

> Bearer tokens are only available in OAuth 2.0. Is OAuth 1.0 not supported?

I am not 100% sure, do you need only oauth1.0 or is it ok oauth2.0 too?

> No authorization required for internal endpoints? Will it be the same in the future?

By internal endpoints we mean those inside the WMF infrastructure. For the outside world we offer anonymous access without a bearer token (at the moment limited to 10000 requests/hour).

> Do you have a machine-readable list of wikis and models?

I don't think we have one at the moment, since our new infrastructure runs one micro-service per model. We support the same models that ORES supports, plus other ones, but if you need the list for a specific use case we can work on having one.

> Sorry for my English.

Not at all, I am a non-native English speaker too, and yours is great. Thanks for taking the time!

SiarheiGribov · 2023-06-28T14:30:43Z

~~wmf-ca-certificates.crt not found on Toolforge with root/etc/ssl/certs/ path.~~

> but if you need the list for a specific use case we can work on having one.

I really don't want to check the pages (1 and 2) and update my code when adding supported new wiki :).

> do you need only oauth1.0 or is it ok oauth2.0 too?

In my opinion, a better option would be to set token / private key on api.wikimedia without oAuth. oAuth2.0 requires updating token every 4 hours (extra sourcecode), and oauth1.0 is too complicated. oAuth also imposes a requirement to hide oAuth credentials (browser-based JavaScript client apps?). ORES was very easy to use, but now we need to create a large project, 90% of which will be authorization logic. This doesn't scare me (I have projects for Wikimedia on 1.0 and 2.0 oAuth), but for others it increases the threshold.

elukey · 2023-06-28T15:42:42Z

@SiarheiGribov Toolforge is separated from the Wikimedia production infrastructure, so your client needs to go through the API gateway (as if it were a regular client on the public Internet). You can test the endpoint without authentication, but your IP will then be limited to 10000 requests/hour.

> I really don't want to check the pages (1 and 2) and update my code when adding supported new wiki :).

Yes, I definitely understand. For the moment we don't have such a list, but I'll talk with my team to figure out how we can add the feature.

> > do you need only oauth1.0 or is it ok oauth2.0 too?
>
> In my opinion, a better option would be to set token / private key on api.wikimedia without oAuth. oAuth2.0 requires updating token every 4 hours (extra sourcecode), and oauth1.0 is too complicated. oAuth also imposes a requirement to hide oAuth credentials (browser-based JavaScript client apps?). ORES was very easy to use, but now we need to create a large project, 90% of which will be authorization logic. This doesn't scare me (I have projects for Wikimedia on 1.0 and 2.0 oAuth), but for others it increases the threshold.

IIUC the token from Meta shouldn't last 4 hours, it doesn't have an expiry IIRC, but we'd need to verify. I completely understand about the complexity of the code, but once you have your token it should just be a matter of setting the bearer auth header and that's it. All steps are indicated in https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage, but if anything doesn't work as expected let us know and we'll try to fix it. It would mean a lot to us to have new clients migrated, so we can understand what's missing, thanks in advance for the patience!
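For readers following along, "setting the bearer auth header" could look roughly like the sketch below. It is only an illustration: the base URL, the `enwiki-damaging` model name, and the `{"rev_id": ...}` payload are assumptions recalled from the linked Usage page and should be checked there, and `build_predict_request` / `predict` are hypothetical helper names, not part of any Wikimedia library.

```python
import json
import urllib.request

# Assumed base URL for external clients going through the API Gateway;
# verify it against the Usage page before relying on it.
LIFTWING_BASE = "https://api.wikimedia.org/service/lw/inference/v1/models"


def build_predict_request(model, payload, token=None, user_agent="example-tool/0.1"):
    """Build (but do not send) a POST request for one Lift Wing model."""
    headers = {
        "Content-Type": "application/json",
        "User-Agent": user_agent,
    }
    if token:
        # Anonymous access works without this header, at a lower rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{LIFTWING_BASE}/{model}:predict",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def predict(model, payload, token=None, timeout=30):
    """Send the request and return the decoded JSON response."""
    req = build_predict_request(model, payload, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `predict("enwiki-damaging", {"rev_id": 12345}, token=my_token)` would then be the whole client-side change, provided the names above match the deployed service.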

SiarheiGribov · 2023-06-28T16:06:56Z

"Common" (not only for owner and not identity only) Bearer token from Meta expires after 14400 s (4 h).
Access token from oAuth1.0 is indefinite (or a very long time; a year?)

UPD: owner-only oauth2.0 token is indefinite.
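For clients that do end up with the 4-hour token, the "extra source code" can stay fairly small. The following is a minimal sketch, assuming a caller-supplied `fetch_token` function (hypothetical; how the token is actually obtained from Meta is not shown here) and the 14400 s lifetime reported above. `TokenCache` is an illustrative name, not an existing API.

```python
import time


class TokenCache:
    """Cache a bearer token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, lifetime_s=14400, margin_s=300, clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical callable returning a token string
        self._lifetime_s = lifetime_s    # 4 h, the expiry reported in this thread
        self._margin_s = margin_s        # refresh this many seconds early
        self._clock = clock              # injectable for testing
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime_s
        return self._token
```

With an owner-only OAuth 2.0 token, which does not expire, none of this is needed and the token can be used directly.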

elukey · 2023-06-29T05:28:44Z

> "Common" (not only for owner and not identity only) Bearer token from Meta expires after 14400 s (4 h). Access token from oAuth1.0 is indefinite (or a very long time; a year?)

I admit my ignorance about Meta's tokens, reading https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration/propose/oauth2 it seemed to me that we issued only OAuth 2.0 tokens, but I see the "Consumer version" set to 1.0. The API Gateway should support both, since we already tested one that doesn't expire.

SiarheiGribov · 2023-06-29T20:48:20Z

Done.

if (wiki is wikipedia and namespace is 0) {
    if (wiki in multilingual list and user is anon)
        multilingual
    else
        agnostic
}
else {
    if (wiki in damaging ORES list)
        damaging
    if (wiki in reverted ORES list)
        reverted
}
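The selection logic above could be written in Python along these lines. This is a sketch only: the function name, its parameters, and the returned model-name strings (`revertrisk-multilingual`, `revertrisk-language-agnostic`, `<wiki>-damaging`, `<wiki>-reverted`) are assumptions for illustration, and the wiki lists have to be supplied by the caller since no machine-readable list exists yet.

```python
def choose_models(wiki, is_wikipedia, namespace, user_is_anon,
                  multilingual_wikis, damaging_wikis, reverted_wikis):
    """Return the list of model names to query for one edit."""
    if is_wikipedia and namespace == 0:
        # Revert Risk: multilingual for anonymous edits on supported wikis,
        # language-agnostic for everything else in the main namespace.
        if wiki in multilingual_wikis and user_is_anon:
            return ["revertrisk-multilingual"]
        return ["revertrisk-language-agnostic"]

    # Everything else falls back to the ORES-style models. The two checks
    # are independent, as in the pseudocode, so both models may be returned.
    models = []
    if wiki in damaging_wikis:
        models.append(f"{wiki}-damaging")
    if wiki in reverted_wikis:
        models.append(f"{wiki}-reverted")
    return models
```

Returning a list keeps the two independent `if` branches of the original intact; a wiki that is in neither ORES list simply yields an empty list.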

2-4k requests at peak time.

Today's list of bugs.
~8% of requests end with { httpCode: 504, httpReason: 'upstream request timeout' }. See full log (~1.5 h; default settings of requests (python) / request (nodejs)). The same request can take anywhere from 1 second to 30+ seconds, or time out. 8% is too much.
be-x-old times out.
Need a machine-readable list of available languages (agnostic / multilingual) and available model names (ORES). Checking four locations manually every day is not an option. :)
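Until the 504s are fixed server-side, a client can soften them with a bounded retry. Below is a minimal sketch of exponential backoff with jitter; `UpstreamTimeout` and `call_with_retries` are hypothetical names, and the caller is assumed to raise `UpstreamTimeout` when it sees a 504 or a client-side timeout. Retries also count against the hourly request limit, so the attempt count is kept small.

```python
import random
import time


class UpstreamTimeout(Exception):
    """Raised by the caller-supplied function on a 504 or a client timeout."""


def call_with_retries(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on UpstreamTimeout with growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except UpstreamTimeout:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller handle it
            # Exponential backoff plus jitter: 1-2 s, 2-3 s, 4-5 s, ...
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

This only hides transient failures; requests that always time out (such as the be-x-old case) will still fail after the last attempt.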

elukey · 2023-06-30T09:54:30Z

To keep archives happy, I created:

https://phabricator.wikimedia.org/T340812
https://phabricator.wikimedia.org/T340811
https://phabricator.wikimedia.org/T340813
https://phabricator.wikimedia.org/T340822
https://phabricator.wikimedia.org/T340824

Thanks a lot for the report :)

elukey · 2023-07-07T13:40:16Z

Some updates:

The anonymous calls/hour has been raised to 50k (from 10k).
The Research team published recommendations about when to use Revert Risk models.

elukey · 2023-07-13T13:57:39Z

@SiarheiGribov hi! Not sure whether you changed the calls to the Revert Risk models as explained in [this link](https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language-agnostic_revert_risk#Ethical_considerations,_caveats,_and_recommendations). If so, did you notice fewer errors than with multilingual?

elukey · 2023-08-27T08:32:33Z

@SiarheiGribov Hi! I think that we can close, what do you think? Anything missing?

elukey closed this as completed Sep 11, 2023
