Support multiple Wikipedias #2

Open
CristianCantoro opened this Issue May 27, 2016 · 9 comments

Contributor

CristianCantoro commented May 27, 2016

Hi,

I am interested in extending this library to support the Italian Wikipedia. I am opening this issue to track work on this topic.

Member

wetneb commented May 27, 2016

That's fantastic! @symac, we need to do the same for the French Wikipedia.

symac commented May 31, 2016

@wetneb that would be great. I don't think I will have much time for this, but if you need help creating the lookup table between the French template parameters and the English ones, I can help.

Member

wetneb commented May 31, 2016

The problem is that each Wikipedia has not only its own templates but also its own template-processing code in Lua. So we would basically need to redo, for each language, the work I have done for en.wiki, AND translate the JSON outputs into the same schema. The situation is bad.

French template processing code: https://fr.wikipedia.org/wiki/Module:Biblio
Italian template processing code: https://it.wikipedia.org/wiki/Modulo:Citazione
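As a concrete illustration of the second half of the problem, the following minimal Python sketch renames language-specific parameter names to a common schema. It assumes the common schema is the set of English CS1 parameter names; `PARAM_MAPS` and `to_common_schema` are hypothetical names, and the French and Italian entries are a small illustrative sample, not a checked lookup table.

```python
from typing import Dict, Tuple

# Illustrative, incomplete tables (assumption: the common schema uses the
# English CS1 parameter names). Keys are local template parameter names.
PARAM_MAPS: Dict[str, Dict[str, str]] = {
    "fr": {"titre": "title", "auteur": "author", "année": "year",
           "éditeur": "publisher", "lieu": "location"},
    "it": {"titolo": "title", "autore": "author", "anno": "year",
           "editore": "publisher", "città": "location"},
}


def to_common_schema(lang: str, params: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Rename language-specific parameter names to the common schema.

    Returns (translated, unmapped): parameters missing from the table are
    reported rather than silently dropped, so the tables can be extended.
    """
    mapping = PARAM_MAPS.get(lang)
    if mapping is None:
        raise ValueError(f"no parameter map for language {lang!r}")
    translated: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for key, value in params.items():
        # MediaWiki parameter names are case-sensitive, so only surrounding
        # whitespace is normalised here.
        name = key.strip()
        if name in mapping:
            translated[mapping[name]] = value
        else:
            unmapped[name] = value
    return translated, unmapped


if __name__ == "__main__":
    print(to_common_schema("it", {"titolo": "Titolo di esempio", "anno": "2016", "pagine": "12"}))
    # ({'title': 'Titolo di esempio', 'year': '2016'}, {'pagine': '12'})
```

Returning the unmapped parameters separately makes gaps in a language's table visible instead of losing data.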

Contributor

CristianCantoro commented May 31, 2016

I am working on adding the code from Modulo:Citazione on itwiki (see CristianCantoro@91104a0). Again, the idea is to put all the language-dependent code in separate subdirectories.

This needs some further tweaking, because the Lua code from the module does not work as of now. @wetneb is absolutely right in saying that:

  • the code from the module needs to be adapted to make it work (this has not been done yet for Italian);
  • the JSON outputs must be translated to the same schema.

I do not know how hard this will be or how much work it will take. The code imported from Modulo:Citazione is itself based on an old (2013) version of Module:Citation/CS1 from enwiki.

I am not an expert in using or writing Lua code, so any help in that regard is more than welcome. I think that once we have ported the library to 2 or 3 languages, we will see whether this design works or not.
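The per-language subdirectory idea could look roughly like the following minimal sketch. The directory layout (`<base>/<lang>/*.lua`) and the helper `load_lua_sources` are hypothetical and not necessarily the layout used in CristianCantoro@91104a0.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_lua_sources(base_dir: Path, lang: str) -> Dict[str, str]:
    """Return {file name: Lua source} for every .lua file of one language.

    Hypothetical layout: one subdirectory per language code, e.g.
    base_dir/en/*.lua and base_dir/it/*.lua.
    """
    lang_dir = base_dir / lang
    if not lang_dir.is_dir():
        raise ValueError(f"unsupported language {lang!r}: no directory {lang_dir}")
    # Sorted so the load order is the same on every platform.
    return {path.name: path.read_text(encoding="utf-8")
            for path in sorted(lang_dir.glob("*.lua"))}


if __name__ == "__main__":
    # Build a throwaway tree with a single Italian module and load it.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "it").mkdir()
        (base / "it" / "citazione.lua").write_text("return {}", encoding="utf-8")
        print(load_lua_sources(base, "it"))  # {'citazione.lua': 'return {}'}
```

Adding a language then means adding a directory, without touching the code shared by all languages.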

Member

wetneb commented May 31, 2016

OK. My understanding is that these Lua modules (at least CS1) work like this:

  • Map the template parameters to an internal representation as a dictionary.
  • Render the HTML output, the errors, and the COinS embedded metadata from this representation.

My strategy for wrapping the CS1 module was to get rid of the second part, keep only the internal representation, and send it back to Python.

As the Lua code calls MediaWiki-specific functions (which are not available in Lua by default), I had to simulate them with Python functions, which are passed to the Lua code as arguments.

I wonder whether there is a clean way to do this, so that updates to the Lua code remain manageable.
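A pure-Python sketch of the shape of this design may help. It only mimics the structure described above: `parse_params` stands in for the first stage (parameters to internal dictionary), `render_html` for the second, and the wrapper calls only the first. The MediaWiki-specific helper is injected as a plain callable, the way Python functions are passed to the Lua code as arguments. All names (`MwHelpers`, `parse_params`, `render_html`, `wrap`) are hypothetical; the actual wrapper runs the real Lua module in a Lua runtime, which this sketch does not show.

```python
import html
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class MwHelpers:
    """Python stand-ins for MediaWiki (Scribunto) functions that plain Lua
    lacks. Only a trim helper is modelled here (cf. mw.text.trim)."""
    trim: Callable[[str], str]


def parse_params(args: Dict[str, str], mw: MwHelpers) -> Dict[str, str]:
    """Stage 1: map raw template parameters to an internal dictionary,
    using MediaWiki functionality only through the injected helpers."""
    internal: Dict[str, str] = {}
    for key, value in args.items():
        value = mw.trim(value)
        if value:  # this sketch drops empty parameters
            internal[mw.trim(key)] = value
    return internal


def render_html(internal: Dict[str, str]) -> str:
    """Stage 2: render output from the internal dictionary. In the real
    module this also produces error messages and COinS metadata."""
    return "<cite>" + html.escape(internal.get("title", "")) + "</cite>"


def wrap(args: Dict[str, str], mw: MwHelpers) -> Dict[str, str]:
    """The wrapper runs stage 1 only and returns the internal representation."""
    return parse_params(args, mw)


if __name__ == "__main__":
    helpers = MwHelpers(trim=str.strip)  # rough approximation of mw.text.trim
    print(wrap({" title ": " Foo ", "year": ""}, helpers))  # {'title': 'Foo'}
```

Because the helpers are passed in rather than imported, the same stage-1 code can run against Python stand-ins outside MediaWiki.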

Contributor

CristianCantoro commented May 31, 2016

I think the code coming from the Lua modules on Wikipedia should be wrapped with as few modifications as possible, to make it easier to apply updates.
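One possible way to follow this (a hypothetical approach, not something the project is known to do) is to keep each module's upstream source unmodified and apply the few required changes as small, checked text patches at load time. When the module is updated on Wikipedia, only the patch list has to be reviewed. `apply_patches` and the example patch are illustrative.

```python
from typing import List, Tuple

# (exact upstream text, replacement) pairs; the example patch is hypothetical.
Patch = Tuple[str, str]


def apply_patches(upstream_src: str, patches: List[Patch]) -> str:
    """Apply small textual patches to an unmodified copy of an upstream
    Lua module.

    Each target must occur exactly once. Otherwise an error is raised, so an
    upstream update that invalidates a patch is noticed instead of silently
    changing behaviour.
    """
    src = upstream_src
    for old, new in patches:
        count = src.count(old)
        if count != 1:
            raise RuntimeError(f"patch target found {count} times: {old!r}")
        src = src.replace(old, new)
    return src


if __name__ == "__main__":
    upstream = "function p.citation(frame)\n  return render(args)\nend\n"
    # Hypothetical patch: return the internal table instead of rendering it.
    print(apply_patches(upstream, [("return render(args)", "return args")]))
```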

Contributor

CristianCantoro commented Jun 2, 2016

I wanted to point out this initiative by @nemobis on the Italian Wikipedia:

Today I wrote a small script https://github.com/nemobis/bots/blob/master/doi-doai-openaccess.py that finds, among existing DOI links, those which are available in open access via DOAI.io.

I'm now running the script for the ~40 most visited Wikipedias, but here is the output for the Italian Wikipedia (430 DOIs): https://it.wikipedia.org/wiki/Progetto:Coordinamento/Bibliografia_e_fonti/DOI

(source: OpenAccess-l)

Member

wetneb commented Jun 2, 2016

Fantastic! Thank you very much for pointing that out! I can see Dario has pointed Federico to our Wikicite mailing list so I'll just wait for him to go there before replying. But I've joined the list.

Member

wetneb commented Jun 2, 2016

@JackPotte has a correspondence between French and English citation parameters here: https://github.com/JackPotte/JackBot/blob/master/hyperlynx.py
If we can extract the same thing from other bots and store it in a structured format in one place, I think we're almost done, aren't we?
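If such correspondences are collected from several bots, one simple structured format would be a single JSON file with one table per language, validated when it is loaded. The sketch below assumes that format; the file content, the `COMMON_SCHEMA` set and `load_mappings` are hypothetical, and the sample entries are illustrative rather than taken from hyperlynx.py.

```python
import json
from typing import Dict

# Hypothetical shared file content: one table per language, mapping the local
# parameter name to the English CS1 parameter name. Entries are illustrative.
MAPPINGS_JSON = """
{
  "fr": {"titre": "title", "périodique": "journal", "année": "year"},
  "it": {"titolo": "title", "rivista": "journal", "anno": "year"}
}
"""

# Assumed target schema: the subset of English CS1 names used above.
COMMON_SCHEMA = {"title", "journal", "year"}


def load_mappings(text: str) -> Dict[str, Dict[str, str]]:
    """Parse the shared mapping file and reject unknown target names."""
    mappings = json.loads(text)
    for lang, table in mappings.items():
        unknown = set(table.values()) - COMMON_SCHEMA
        if unknown:
            raise ValueError(f"{lang}: unknown target parameters {sorted(unknown)}")
    return mappings


if __name__ == "__main__":
    print(load_mappings(MAPPINGS_JSON)["it"])
```

Checking every target against one list of schema names catches typos introduced while merging tables from different bots.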
