Smarter link-with-namespace filter #1494

Closed
BoboTiG opened this issue Jan 5, 2023 · 17 comments · Fixed by #1502 or #1530
Labels
bug Something isn't working

Comments

@BoboTiG
Owner

BoboTiG commented Jan 5, 2023

Wikicode:

[[Stó:lō]]

Output:

None

Expected:

Stó:lō

The wikicode is stripped at

text = sub(r"\[\[[^:\]]+:[^\]]+\]\]", "", text) # [[foo:b]] -> ''

Not sure what to do for now, just reporting the issue.
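A minimal reproduction of the behaviour described above, using only the quoted pattern. The helper name `strip_namespace_links` is made up for this sketch and is not the project's function:

```python
import re

# The pattern quoted above: every [[...:...]] link is removed,
# whatever comes before the colon.
NAMESPACE_LINK = re.compile(r"\[\[[^:\]]+:[^\]]+\]\]")


def strip_namespace_links(text: str) -> str:
    """Hypothetical helper wrapping the single sub() call quoted above."""
    return NAMESPACE_LINK.sub("", text)


print(repr(strip_namespace_links("[[Stó:lō]]")))         # ''
print(repr(strip_namespace_links("[[Catégorie:foo]]")))  # ''
print(repr(strip_namespace_links("[[foo]]")))            # '[[foo]]'
```

The pattern cannot tell a real namespace prefix from a word that merely contains a colon, which is why the whole link disappears.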

@BoboTiG BoboTiG added the bug Something isn't working label Jan 5, 2023
@BoboTiG BoboTiG changed the title Smarter link with namespace filtering? Smarter link-with-namespace filtering? Jan 5, 2023
@lasconic
Collaborator

lasconic commented Jan 5, 2023

List of namespaces on the French Wiktionary: https://fr.wiktionary.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces

Probably the same for other languages. Spanish: https://es.wiktionary.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces

So we could create a script to get the namespaces, and only filter known namespaces...
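A rough sketch of what such a script might look like. The function names are hypothetical, `&format=json` is added to the URL above as an assumption, and the response shape (local name under `"*"`, or `"name"` with `formatversion=2`, plus an optional `"canonical"` name) is assumed rather than checked against the project:

```python
import json
import re
from urllib.request import urlopen

# Assumption: same endpoint as the links above, with format=json appended.
API = (
    "https://{locale}.wiktionary.org/w/api.php"
    "?action=query&meta=siteinfo&siprop=namespaces&format=json"
)


def namespace_names(payload: dict) -> list[str]:
    """Collect the non-empty local and canonical namespace names."""
    names = set()
    for ns in payload["query"]["namespaces"].values():
        for key in ("*", "name", "canonical"):
            value = ns.get(key)
            if value:  # the main namespace has an empty name: skip it
                names.add(value)
    return sorted(names)


def fetch_namespaces(locale: str) -> list[str]:
    """Download the list for one locale (network access; a User-Agent may be needed)."""
    with urlopen(API.format(locale=locale)) as response:
        return namespace_names(json.load(response))


def build_pattern(namespaces: list[str]) -> re.Pattern:
    """Match only links whose prefix is a known namespace."""
    alternatives = "|".join(re.escape(name) for name in namespaces)
    return re.compile(rf"\[\[(?:{alternatives}):[^\]]+\]\]")


# Offline usage with a hand-written payload shaped like the assumed response.
sample = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "*": ""},
            "6": {"id": 6, "*": "Fichier", "canonical": "File"},
            "14": {"id": 14, "*": "Catégorie", "canonical": "Category"},
        }
    }
}
pattern = build_pattern(namespace_names(sample))
print(pattern.sub("", "[[Stó:lō]] [[Catégorie:foo]]").strip())  # [[Stó:lō]]
```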

@BoboTiG
Owner Author

BoboTiG commented Jan 6, 2023

With 1b09061 it will be easier to use locale-specific code then.

@lasconic
Collaborator

lasconic commented Jan 6, 2023

Mmm OK, I would have passed the language code to clean() to get the correct array of namespaces. If I understand correctly, here we will need to write a "clean" function per language, with the current clean being the default. Correct?

@BoboTiG
Owner Author

BoboTiG commented Jan 6, 2023

I didn't want to add the new argument to clean() everywhere. We already have the locale name passed to process_template().
So you can still go with your idea; it'll just be a two-line change to actually add the new argument to clean() (instead of ~30 changes before the refactoring).
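To illustrate the idea of threading the locale through, here is a simplified stand-in. The real `clean()` and `process_template()` do far more and their signatures may differ; the `NAMESPACES` table and its sample values are assumptions made for this sketch:

```python
import re

# Hypothetical per-locale table; sample values only.
NAMESPACES = {
    "fr": ("Catégorie", "Fichier"),
}


def clean(text: str, locale: str = "en") -> str:
    """Simplified stand-in: drop known namespace links, unwrap the others."""
    namespaces = NAMESPACES.get(locale, ())
    if namespaces:
        alternatives = "|".join(map(re.escape, namespaces))
        text = re.sub(rf"\[\[(?:{alternatives}):[^\]]+\]\]", "", text)
    # [[a|b]] -> b and [[a]] -> a
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    return text.strip()


def process_template(template: str, locale: str) -> str:
    """Stand-in caller: it already knows the locale and simply forwards it."""
    return clean(template, locale=locale)


print(process_template("[[Stó:lō]]", "fr"))             # Stó:lō
print(process_template("[[Catégorie:foo]] bar", "fr"))  # bar
print(process_template("[[gray#fr-nom|gray]]", "fr"))   # gray
```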

@lasconic
Collaborator

lasconic commented Jan 6, 2023

I have something more or less working, but I'm scratching my head over these two (on top of [[Stó:lō]], of course). Not sure how to handle them yet.

[[Fichier:Blason ville fr Petit-Bersac 24.svg|vignette|120px|'''Base''' d’or ''(sens héraldique)'']]
[[Annexe:Principales puissances de 10|10{{e|−6}}]] [[gray#fr-nom|gray]]

@BoboTiG
Owner Author

BoboTiG commented Jan 6, 2023

Would you like to open a draft PR so that I can try too?

lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Jan 6, 2023
@lasconic
Collaborator

lasconic commented Jan 6, 2023

There you go.
1/ I'm not sure we really need to keep all-namespaces.py in scripts. Namespaces probably don't change that often.
2/ I can't find a way to differentiate the Fichier and Annexe namespaces, but we need to clean them at different times.
3/ The line you reference above is commented out in my code, but it was used to pass this test:

clean("[[http://www.tv5monde.com/cms/chaine-francophone/lf/Merci-Professeur/p-17081-Une-peur-bleue.htm?episode=10 Voir aussi l’explication de Bernard Cerquiglini en images]]")

@BoboTiG
Owner Author

BoboTiG commented Jan 6, 2023

1/ You're right, we could move the file to langs.
2/ 3/ Still checking ⌚

@BoboTiG
Owner Author

BoboTiG commented Jan 6, 2023

OK I made progress, will propose a patch on the PR.

@BoboTiG BoboTiG changed the title Smarter link-with-namespace filtering? Smarter link-with-namespace filter Jan 6, 2023
@BoboTiG BoboTiG changed the title Smarter link-with-namespace filter Smarter namespace link filter Jan 6, 2023
@BoboTiG BoboTiG changed the title Smarter namespace link filter Smarter link-with-namespace filter Jan 6, 2023
@BoboTiG
Owner Author

BoboTiG commented Jan 7, 2023

1/ After more thinking, keeping namespaces in scripts would ease our work when adding new locales, and it will handle updates automatically. I would say we can keep it as-is. The only tiny detail is about using ALL_LOCALES instead of having duplicate lists; but it's a minor issue.

@lasconic
Collaborator

lasconic commented Jan 7, 2023

OK, but then it's less easy to implement two categories of namespaces: those whose links carry text to keep and those whose links don't.
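One way the two categories could be sketched, purely as an illustration. Which namespace belongs to which group is an assumption drawn from the examples in this thread, and all names are hypothetical:

```python
import re

# Assumed split, for illustration only.
DROP_NAMESPACES = ("Fichier", "Catégorie")          # whole link removed
KEEP_TEXT_NAMESPACES = ("Annexe", "Reconstruction")  # display text kept


def _alternatives(names: tuple[str, ...]) -> str:
    return "|".join(map(re.escape, names))


# Links without nested brackets only; captions containing [[...]] are not handled.
DROP = re.compile(rf"\[\[(?:{_alternatives(DROP_NAMESPACES)}):[^\[\]]*\]\]")
# Only links with a "|display text" part; bare [[Annexe:foo]] is left untouched.
KEEP = re.compile(
    rf"\[\[(?:{_alternatives(KEEP_TEXT_NAMESPACES)}):[^\]|]*\|([^\]]*)\]\]"
)


def filter_namespace_links(text: str) -> str:
    text = DROP.sub("", text)
    return KEEP.sub(r"\1", text)


print(filter_namespace_links("[[Annexe:Principales puissances de 10|10{{e|−6}}]]"))
# 10{{e|−6}}
print(filter_namespace_links("[[Stó:lō]]"))
# [[Stó:lō]]
```

Keeping two separate patterns also leaves room to run them at different stages of the cleaning, which was the concern with Fichier and Annexe.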

@BoboTiG BoboTiG closed this as completed Jan 16, 2023
@lasconic
Collaborator

I believe it breaks

{{recons|lang-mot-vedette=fr|[[Reconstruction:proto-germanique/*berhtaz|berht]]}}

@BoboTiG
Owner Author

BoboTiG commented Jan 16, 2023

Can you share the word using it 🙏 ?

@lasconic
Collaborator

@BoboTiG BoboTiG reopened this Jan 16, 2023
lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Jan 17, 2023
@lasconic
Collaborator

OK. I have a simple solution... It passes all the tests already in the code... I must be missing something.

#1530

@lasconic
Collaborator

Ok. Already found a problem ...

https://fr.wiktionary.org/wiki/Daghestan uses the File namespace ...

A solution could be to always add File and Category, the English namespaces, to the pattern list.

@BoboTiG
Owner Author

BoboTiG commented Jan 17, 2023

A solution could be to always add File and Category, the English namespaces, to the pattern list.

👍
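A small sketch of that suggestion, assuming the pattern is built from a per-locale list. `ALWAYS_FILTERED` and `build_pattern` are hypothetical names, and the file name in the usage lines is made up:

```python
import re

# English names accepted on every wiki, per the suggestion above.
ALWAYS_FILTERED = ("File", "Category")


def build_pattern(local_namespaces: list[str]) -> re.Pattern:
    """Merge the locale's namespaces with the always-filtered English ones."""
    names = sorted(set(local_namespaces) | set(ALWAYS_FILTERED))
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(rf"\[\[(?:{alternatives}):[^\[\]]*\]\]")


fr_pattern = build_pattern(["Fichier", "Catégorie"])
print(repr(fr_pattern.sub("", "[[File:example.svg|thumb]]")))  # ''
print(fr_pattern.sub("", "[[Stó:lō]]"))                        # [[Stó:lō]]
```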
