Templates don't get expanded #151
Any idea why none of the templates get expanded? I ran WikiExtractor.py once and saved all templates to a file (named "templates"; it's 2358539 lines long) to try to debug. I'm trying to extract all Wiktionary articles, but the resulting text looks like this (blank text in place of templates):
This was the command I ran:
This was the output:
I have been working on extracting templates for months and this looks like an amazing tool if I can get it to work. Thanks for all the work you all are doing on it!
@dnishiyama Do you still have this issue? I also encountered a similar problem, and it seems that there is an issue with the current script when it's applied to Wiktionary dumps. Specifically, when it expands templates, it tries to "normalize" template titles by converting the first letter of the template to upper case, although template titles are stored without normalization.
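To illustrate the mismatch described above, here is a minimal sketch. It is not the actual WikiExtractor code: `ucfirst`, `templates`, and the two lookup helpers are hypothetical names, and the template body is a placeholder. It only shows why capitalizing the first letter at lookup time misses entries that were stored under their original titles.

```python
def ucfirst(title):
    """Uppercase only the first character and leave the rest unchanged."""
    return title[:1].upper() + title[1:]


# Hypothetical template store, keyed by titles exactly as they appear in the dump.
templates = {"en-noun": "<template body placeholder>"}


def lookup_normalized(title):
    # Normalizes the title before the lookup, so "en-noun" becomes "En-noun".
    return templates.get(ucfirst(title))


def lookup_raw(title):
    # Looks the title up as written, matching how it was stored.
    return templates.get(title)


if __name__ == "__main__":
    print(lookup_normalized("en-noun"))  # no entry under "En-noun"
    print(lookup_raw("en-noun"))         # finds the stored body
```

If the store and the lookup disagree on normalization like this, every lowercase-titled template resolves to nothing, which would be consistent with the blank text reported in this issue.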
After removing those applications of
I am also encountering this problem on the July 20th, 2018 English Wikipedia dump. Here was my command:
Here is an example of an incorrectly extracted sentence from Wikipedia Page ID 12.
WikiExtractor Output: The word "anarchism" is composed from the word "anarchy" and the suffix -ism, themselves derived respectively from the Greek , i.e. "anarchy" (from , "anarchos", meaning "one without rulers"; from the privative prefix ἀν- ("an-", i.e. "without") and , "archos", i.e. "leader", "ruler"; (cf. "archon" or , "arkhē", i.e. "authority", "sovereignty", "realm", "magistracy")) and the suffix or ("-ismos", "-isma", from the verbal infinitive suffix , "-izein").
Real Wikipedia Value: The word "anarchism" is composed from the word "anarchy" and the suffix -ism, themselves derived respectively from the Greek ἀναρχία, i.e. anarchy (from ἄναρχος, anarchos, meaning "one without rulers"; from the privative prefix ἀν- (an-, i.e. "without") and ἀρχός, archos, i.e. "leader", "ruler"; (cf. archon or ἀρχή, arkhē, i.e. "authority", "sovereignty", "realm", "magistracy")) and the suffix -ισμός or -ισμα (-ismos, -isma, from the verbal infinitive suffix -ίζειν, -izein).
I've also found other types of template expansions missing, such as distance measurements.
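One rough way to find more of these gaps in extracted text is to look for whitespace directly before a comma, which is what remains when a template in front of the comma expands to nothing. This is only a heuristic sketch; `count_gaps` is a hypothetical helper, and the two strings are shortened fragments of the sentences quoted above.

```python
import re

# Whitespace immediately followed by a comma: a likely trace of a dropped template.
GAP = re.compile(r"\s,")


def count_gaps(text):
    """Count places where a comma is preceded by whitespace."""
    return len(GAP.findall(text))


extracted = 'derived respectively from the Greek , i.e. "anarchy" (from , "anarchos"'
reference = "derived respectively from the Greek ἀναρχία, i.e. anarchy (from ἄναρχος, anarchos"

if __name__ == "__main__":
    print(count_gaps(extracted))
    print(count_gaps(reference))
```

It will not catch templates dropped in other positions (for example, the missing suffixes before "or"), so treat a zero count as inconclusive.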