This repository has been archived by the owner on Nov 25, 2019. It is now read-only.

Italian Wikipedia dictionary isn't displayed correctly #65

Closed
bittachi opened this issue Mar 21, 2013 · 8 comments

Comments

@bittachi
Contributor

Hi, I sometimes use Aard Dictionary to read the Italian Wikipedia offline. A few days ago I downloaded the latest version of the it.wiki dictionary (2013-02-22) from http://aarddict.org/d/itwiki/, and I noticed that some articles (such as "Italia" or "Roma") are displayed badly in the Android app. The text looks heavily shifted to the left, especially when you zoom in, and the articles are impossible to read because they can't be scrolled to the right or zoomed out far enough to fit the device screen. I've tested this dictionary on both a Samsung Galaxy GT-S5570 and a Samsung Galaxy S III, and the problem is the same on both. Can you fix it, please? Thanks.

@itkach
Member

itkach commented Mar 21, 2013

I see what you mean, but the defect is not in the program: the formatting of these articles themselves is broken. It may be possible to remove or replace the broken HTML during dictionary compilation, though.

@bittachi
Contributor Author

So, if the defect is in the dictionary, can you contact whoever compiled it and ask them to rebuild it without this broken markup?

@itkach
Member

itkach commented Mar 21, 2013

Can I contact me? I suppose I can :) The only difficulty is that chasing down every bad article and setting up content filters to remove or fix all the broken markup is extremely time-consuming and tedious, so I physically can't fix everything for every Wikipedia. I accept patches, though :) Users can help by setting up aardtools, learning about content filters, and sending me improved content filter files.

@bittachi
Contributor Author

Well, I would like to help you, but I can't: some time ago I tried to compile an it.wiki dump on my modest 32-bit dual-core notebook, and as I remember it took far too long. What kind of computer do you use (I mean, how many cores do I need to compile a dictionary in one day)?

@itkach
Member

itkach commented Mar 21, 2013

I'm using a 4-core 2.66 GHz i7, but there's no need to compile the whole wiki. You can compile a large enough sample with the --article-count command line option, or compile individually selected titles with --title. --title @mytitles.txt reads the titles to compile from a text file (one title per line; the file must be UTF-8 encoded).
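For testing a content filter against just the problem articles, a titles file like the one described above can be generated with a short script. This is a minimal sketch that assumes only what the comment states (one title per line, UTF-8); the helper name write_titles_file is hypothetical, and the script only writes the file, it does not run aardtools.

```python
from pathlib import Path


def write_titles_file(titles, path="mytitles.txt"):
    """Write titles one per line, UTF-8 encoded, for use with --title @mytitles.txt.

    write_titles_file is a hypothetical helper, not part of aardtools.
    Blank entries are skipped and surrounding whitespace is trimmed.
    """
    cleaned = [t.strip() for t in titles if t.strip()]
    for t in cleaned:
        # Each line is read as one title, so an embedded line break would split it.
        if "\n" in t or "\r" in t:
            raise ValueError(f"title contains a line break: {t!r}")
    # Explicit UTF-8 so non-ASCII titles (e.g. "Città del Vaticano") survive intact.
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


if __name__ == "__main__":
    # The two articles reported as broken in this issue.
    write_titles_file(["Italia", "Roma"])
```

The resulting mytitles.txt can then be passed as --title @mytitles.txt, so each test compile only processes the articles whose markup is being fixed.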

@bittachi
Contributor Author

I know I can compile a sample from the whole wiki dump, but if I want to compile all the articles in it.wiki (1,000,000+), how long would it take on a machine like yours? A day? Two days? A week? Thanks.

@itkach
Member

itkach commented Mar 21, 2013

I don't remember exactly; I think it was around 11-12 hours. The whole enwiki takes 2.5 days (about 24 hours if we exclude reference lists and infobox templates).

@bittachi
Contributor Author

OK, thank you so much for the answers. I'm closing this issue, since in the end it's just a problem of bad markup that has to be fixed with a specific filter configuration before compiling, and after a few tests it should be possible to find the right one.
