Hello,
First of all, I want to thank you for sharing this script with the community.
I'm trying to regenerate the REBEL dataset.
By running `python -m wikiextractor.wikiextractor.WikiExtractor data/$1/$1wiki-latest-pages-articles-multistream.xml.bz2 --links --language $1 --output text/$1 --templates data/$1/templates.txt`, I get the page articles.
The Wikidata entities are described by their links (`--links`), but `wikidata-triplets.py` uses the Wikidata IDs.
How did you turn the links into IDs?
Hi there, sorry for the late reply; I didn't see the issue until now. From the README:
For ./wikiextractor we use a submodule which is a fork of the original wikiextractor that implements wikimapper to extract the Wikidata entities. You can find the fork here, and clone it to the corresponding folder.
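For reference, wikimapper does the title-to-ID resolution by looking up page titles in a precomputed SQLite index (built from the dump with its `create` command). Below is a minimal, self-contained sketch of that lookup using an in-memory toy table; the table name and columns follow wikimapper's index schema as I understand it, but the rows and the `title_to_id` helper here are illustrative, not the library's actual code.

```python
import sqlite3

# Toy stand-in for the wikimapper SQLite index. The real index is built
# from a Wikipedia dump; these rows are illustrative examples only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mapping (wikipedia_id INT, wikipedia_title TEXT, wikidata_id TEXT)"
)
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?)",
    [(1, "Douglas_Adams", "Q42"), (2, "Berlin", "Q64")],
)

def title_to_id(title: str):
    """Resolve an underscored page title to its Wikidata Q-ID,
    mimicking wikimapper's title_to_id lookup."""
    row = conn.execute(
        "SELECT wikidata_id FROM mapping WHERE wikipedia_title = ?", (title,)
    ).fetchone()
    return row[0] if row else None

print(title_to_id("Berlin"))  # Q64
```

With the actual library, the equivalent call would be along the lines of `WikiMapper("index_enwiki-latest.db").title_to_id("Berlin")`, after downloading or building the index database for the language in question.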