Template expansion does not seem to work for french #32
Comments
I tried to troubleshoot this. Part of the problem is that the French templates use the localized namespace prefix "Modèle:" instead of "Template:", and "Template:" is hardcoded in the Python script. When I fix this, the templating works. However, there are still issues with Lua modules. Example: Modèle:lang points to Modèle:Langue, which uses Module:Langue, and this requires Lua support. So I guess the pass that collects templates also needs to collect the Lua modules... |
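One way to avoid hardcoding the prefix is to read it from the `<siteinfo>` header of the dump, where MediaWiki lists namespace names by numeric key (10 for templates, 828 for Scribunto modules). The following is only a sketch under that assumption: `SAMPLE_SITEINFO` is a hand-written fragment and `namespace_names` is a hypothetical helper, not part of WikiExtractor.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration; a real dump header lists many more
# namespaces and carries an xmlns attribute on the root element.
SAMPLE_SITEINFO = """<siteinfo>
  <namespaces>
    <namespace key="0" case="first-letter" />
    <namespace key="10" case="first-letter">Modèle</namespace>
    <namespace key="828" case="first-letter">Module</namespace>
  </namespaces>
</siteinfo>"""


def namespace_names(siteinfo_xml):
    """Map numeric namespace keys to their localized names."""
    root = ET.fromstring(siteinfo_xml)
    names = {}
    for elem in root.iter():
        # Drop any "{xmlns}" prefix so the lookup also works on real dumps.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "namespace":
            names[int(elem.get("key"))] = elem.text or ""
    return names


names = namespace_names(SAMPLE_SITEINFO)
template_prefix = names[10] + ":"   # localized equivalent of "Template:"
module_prefix = names[828] + ":"    # namespace holding the Lua modules
print(template_prefix, module_prefix)
```

With the prefixes looked up this way, the same collection pass could match both template and module titles in any language edition.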
The problem is related to that. When loading previously saved templates, it assumes that the template namespace is 'Template'. |
It is fixed in release 2.35. |
Thank you for the quick fix. There are other issues, though, and at least one of them is related to the original description. For some reason, the #redirect directives are lowercase in the French dump, while the regex expects uppercase: check for redirects
or even like this check for redirects
|
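The case mismatch described above can be handled by matching the redirect keyword case-insensitively. This is a minimal sketch, not the script's actual code: `redirect_target` and its `keywords` parameter are hypothetical names, and the `REDIRECTION` alias in the usage lines is an assumption about a localized keyword that should be checked against the wiki in question.

```python
import re


def redirect_target(text, keywords=("REDIRECT",)):
    """Return the target title if text starts with a redirect, else None."""
    alternation = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(
        r"\s*#(?:%s)\b.*?\[\[([^\]]*)\]\]" % alternation,
        re.IGNORECASE,  # accept #REDIRECT, #redirect, #Redirect, ...
    )
    m = pattern.match(text)
    return m.group(1) if m else None


print(redirect_target("#redirect [[Aïkibudo]]"))
print(redirect_target("#REDIRECT [[Aïkibudo]]"))
# Assumed localized alias, passed explicitly rather than hardcoded:
print(redirect_target("#REDIRECTION [[Aïkibudo]]",
                      keywords=("REDIRECT", "REDIRECTION")))
```

Passing the keywords as a parameter keeps the language-specific part out of the regex itself.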
After fixing an issue at line 478, I get:
L'aïkibudo (合気武道, #redirect ) est un art martial traditionnel d'origine japonaise ("budō") essentiellement basé sur des techniques de défense. |
Got it, thank you. |
After fixing
m = re.match('#(REDIRECT|redirect).?[[([^]])]]', page[0], re.IGNORECASE)
I get:
L'aïkibudo (合気武道, ) est un art martial traditionnel d'origine japonaise ("budō") essentiellement basé sur des techniques de défense.
But now the Japanese transliteration is missing. The expandtemplates API returns:
{"expandtemplates":{"wikitext":"'''a\u00efkibudo'''<span style="font-weight: normal"> (<span class="lang-ja" lang="ja" xml:lang="ja" title="Japonais">\u5408\u6c17\u6b66\u9053, <span class="t_nihongo_romaji" title="Transcription Hepburn"><span class="lang-ja-latn-alalc97" lang="ja-latn-alalc97">aikibud\u014d<span class="t_nihongo_help"><span class="t_nihongo_icon" style="color:#00e;font:bold 80% sans-serif;text-decoration:none;padding:0 .1em;">[[Aide:Japonais|?]])"}}
The missing part comes from a Lua module:
DEBUG: INVOCATION 0 japonais|'''aïkibudo'''|合気武道|aikibudō
DEBUG: TITLE #if:aikibudō
see :
Hence the need to extract the modules in the same way as the templates and evaluate them using Lua. |
I know, the extensions are written in Lua, and you will need access to the code of those extensions. |
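To see which collected templates would need Lua support, one option is to scan their bodies for `{{#invoke:...}}` calls, which is how wikitext hands control to a Scribunto module. The sketch below is illustrative only: `invoked_modules` and `modules_needed` are hypothetical helpers, and the template bodies in `SAMPLE_TEMPLATES` are made up, not the real frwiki definitions.

```python
import re

# "{{#invoke:ModuleName|function|...}}" is the Scribunto entry point.
INVOKE_RE = re.compile(r"\{\{\s*#invoke\s*:\s*([^|}]+)", re.IGNORECASE)


def invoked_modules(template_text):
    """Return the set of module names a template body invokes."""
    return {name.strip() for name in INVOKE_RE.findall(template_text)}


def modules_needed(templates):
    """Map each template title to the modules it invokes, skipping pure-wikitext ones."""
    needed = {}
    for title, text in templates.items():
        mods = invoked_modules(text)
        if mods:
            needed[title] = mods
    return needed


# Made-up bodies, only to exercise the helpers.
SAMPLE_TEMPLATES = {
    "Modèle:Langue": "{{#invoke:Langue|langue|{{{1|}}}|{{{2|}}}}}",
    "Modèle:Exemple": "'''{{{1}}}''' ({{{2}}})",
}

print(modules_needed(SAMPLE_TEMPLATES))
```

A report like this only identifies the dependency; evaluating the modules would still require a Lua runtime, as noted in the thread.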
First, generate the template file as TEMPLATES; this requires parsing the whole file.
python extractPage.py --id 275 ../frwiki-20150602-pages-articles.xml.bz2 >aikibudo
python WikiExtractor.py -o extracted --templates ../TEMPLATES -a aikibudo
I get:
L' est un art martial traditionnel d'origine japonaise ("budō") essentiellement basé sur des techniques de défense.
Correct sentence:
L'aïkibudo (合気武道, aikibudō?) est un art martial traditionnel d'origine japonaise (budō) essentiellement basé sur des techniques de défense.
Wiki text :
L'{{japonais|'''aïkibudo'''|合気武道|aikibudō}} est un [[art martial]] traditionnel d'origine [[japon]]aise (''[[budō]]'') essentiellement basé sur des techniques de défense.
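For reference, the `{{japonais|...}}` call in this wikitext breaks down into a template name and three positional parameters, which is what the DEBUG INVOCATION line in the comments shows. Below is a minimal sketch of that split; `split_template_call` is a hypothetical helper, not WikiExtractor's own parser, and it ignores named parameters and triple-brace arguments.

```python
def split_template_call(call):
    """Split '{{name|a|b}}' into (name, [args]).

    Pipes nested inside [[...]] or {{...}} are not treated as separators.
    """
    if not (call.startswith("{{") and call.endswith("}}")):
        raise ValueError("not a template call: %r" % call)
    body = call[2:-2]
    parts, current, depth = [], [], 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            # Top-level pipe: close the current part and start a new one.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts[0].strip(), parts[1:]


name, args = split_template_call("{{japonais|'''aïkibudo'''|合気武道|aikibudō}}")
print(name, args)
```

When the template named here resolves to a Lua-backed definition and no module is available, the whole call expands to nothing, which matches the empty "L' est un art martial" output above.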