full code review #44

Closed
wants to merge 361 commits into
base: deploy
from

@xchrdw
Member

xchrdw commented Mar 27, 2014

No description provided.

felixniemeyer and others added some commits Jan 23, 2014

Merge branch 'modify_entity_selector_from_extension' of github.com:Wi…
…kidata-lib/PropertySuggester into SuggestionsByPrefix
Merge branch 'SuggestionsByPrefix' of https://github.com/Wikidata-lib…
…/property-suggester into modify_entity_selector_from_extension

fix whitespace and some code improvements
Merge pull request #24 from Wikidata-lib/modify_entity_selector_from_…
…extension

Modify entity selector from extension to show suggestions
Access to Property Information changed
Small fix: Only show aliases if available
Merge pull request #29 from Wikidata-lib/use_claim_object_instead_of_…
…tupels

Use entity and claim objects instead of tupels

Dacry and others added some commits Jun 12, 2014

Merge pull request #74 from Wikidata-lib/fixResultSize
fix ResultSize + fix naming + strategy in generateSuggestionsByPropertyList
Merge pull request #76 from Wikidata-lib/fix_review_issues_3
remove condition for autoloader, secure $minProbability
Merge pull request #77 from filbertkm/readme
Fix packagist links in readme
Check that vendor/autoload.php exists
If we install property suggester via composer,
by including it in the Wikidata "build", in
MediaWiki's composer.json or other such setup
then autoload.php will be elsewhere and handled
elsewhere.
Merge pull request #78 from filbertkm/autoload
Check that vendor/autoload.php exists
Remove reference to wikidata in description
Best to make the description general without
specific mention of Wikidata.
Merge pull request #80 from Wikidata-lib/fix_bug
fix qualifier were counted as mainsnak
Fix remoteExtPath for installation in non-standard locations
Same hack used in Wikibase, ValueView, etc. to allow this
to work if an extension is installed in a non-standard place.
Merge pull request #82 from filbertkm/remoteextpath
Fix remoteExtPath for installation in non-standard locations
parent::__construct();
$this->mDescription = "Read CSV Dump and refill probability table";
$this->addOption( 'file', 'CSV table to be loaded (relative path)', true, true );
$this->setBatchSize( 10000 );

@AaronSchulz

AaronSchulz Jun 26, 2014

1000 would be more appropriate. The current value might cause some slave lag due to having too many rows at once.

@xchrdw

xchrdw Jun 26, 2014

Member

We currently use 10000 as suggested here: https://bugzilla.wikimedia.org/show_bug.cgi?id=63224#c5
Should it be reduced to 1000?

@AaronSchulz

AaronSchulz Jun 26, 2014

Preferably. Of course it can always be set at run-time, but 1000 seems like a safer default (and doesn't lose much speed vs 10000).
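
To make the trade-off concrete, here is a minimal sketch of batched writes in plain PHP. It is not the extension's code: `insertInBatches` and both callbacks are hypothetical stand-ins (a real maintenance script would write to the database and wait for replication between batches), and the `count` column is an assumed name.

```php
<?php
// Hypothetical helper, not part of PropertySuggester or MediaWiki.
// $insertBatch stands in for the real database write, $afterBatch for
// whatever replication-lag wait the deployment uses.
function insertInBatches( array $rows, $batchSize, callable $insertBatch, callable $afterBatch ) {
	$written = 0;
	// array_chunk() splits the rows into groups of at most $batchSize.
	foreach ( array_chunk( $rows, $batchSize ) as $batch ) {
		$insertBatch( $batch );
		$written += count( $batch );
		$afterBatch( $written );
	}
	return $written;
}

// Usage with stand-in callbacks that only record what happened.
$rows = array();
for ( $i = 0; $i < 2500; $i++ ) {
	// 'count' is an assumed column name used for illustration only.
	$rows[] = array( 'pid1' => $i, 'pid2' => $i + 1, 'count' => 1 );
}
$batchSizes = array();
$total = insertInBatches(
	$rows,
	1000,
	function ( array $batch ) use ( &$batchSizes ) {
		$batchSizes[] = count( $batch );
	},
	function ( $written ) {
		// A real script would pause here until the replicas catch up.
	}
);
```

A smaller batch size means each write touches fewer rows before the pause, so the replicas have less to replay at once; the cost is more round trips.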

@@ -0,0 +1,13 @@

@AaronSchulz

AaronSchulz Jun 26, 2014

It would be nice if the fields had one-line description comments.

) /*$wgDBTableOptions*/;
CREATE INDEX /*i*/propertypairs_pid1_pid2_qid1_context ON /*_*/wbs_propertypairs (pid1, qid1, pid2, context);

@AaronSchulz

AaronSchulz Jun 26, 2014

What order of size will this table be?

@filbertkm

filbertkm Jun 26, 2014

Contributor

will be ~84,000 rows initially

)
);
$this->lb->reuseConnection( $dbr );

@AaronSchulz

AaronSchulz Jun 26, 2014

Is there a sense of how many rows there are per pid1 value, per (pid1, qid1) tuple, and per (pid1, qid1, pid2) tuple? How often is qid1 going to be null? Also, what variations might context have? The current secondary index looks plausible for the WHERE, though it's hard to say for sure without knowing the selectivity.

What range of values might $limit have? This query will need a quicksort, though that's OK as long as the number of matching rows from the WHERE is moderate. Likewise the HAVING can't use any index, though it's ok if the matching rows from the WHERE are moderate. Is this likely to be the case?

@xchrdw

xchrdw Jun 26, 2014

Member

There are currently 80,000 rows generated from the last Wikidata dump. The default limit will be 7, as this will mostly be queried by the entityselector widget. The limit is increased to 500 in case the suggestions are filtered by a string afterwards. At the moment qid1 is null in all cases, as the code that uses values for suggestions is not ready (and will probably not be finished during the project). We measured the performance with and without the index on a local MySQL database.
Here is the result with suggestions for differently sized input property sets:
https://raw.githubusercontent.com/wiki/Wikidata-lib/PropertySuggester/img/performance.png
(labels: query time / number of properties)
With the current state of Wikidata, most queries will be for relatively small items:
https://raw.githubusercontent.com/wiki/Wikidata-lib/PropertySuggester/img/property-count.png
(labels: number of properties per item / number of items)
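
As a rough illustration of why the number of rows matching the WHERE matters, the sketch below emulates the query's stages in plain PHP. The query itself is not shown in this thread, so its shape is an assumption: the `probability` column, the averaging over input properties, the `'item'` context value and all sample rows are hypothetical.

```php
<?php
// Sketch only: emulates WHERE -> GROUP BY -> HAVING -> ORDER BY -> LIMIT.
// The 'probability' column and every value below are assumptions.
function suggest( array $table, array $pids, $context, $minProbability, $limit ) {
	// WHERE pid1 IN (...) AND context = ...: the only stage an index can narrow.
	$sums = array();
	foreach ( $table as $row ) {
		if ( in_array( $row['pid1'], $pids, true ) && $row['context'] === $context ) {
			$pid2 = $row['pid2'];
			$sums[$pid2] = ( isset( $sums[$pid2] ) ? $sums[$pid2] : 0.0 ) + $row['probability'];
		}
	}
	// HAVING: filters aggregated values, so every group must be inspected.
	$result = array();
	foreach ( $sums as $pid2 => $sum ) {
		$average = $sum / count( $pids );
		if ( $average > $minProbability ) {
			$result[$pid2] = $average;
		}
	}
	// ORDER BY ... DESC LIMIT: sort all surviving groups, then cut.
	arsort( $result );
	return array_slice( $result, 0, $limit, true );
}

$table = array(
	array( 'pid1' => 31, 'pid2' => 21, 'context' => 'item', 'probability' => 0.75 ),
	array( 'pid1' => 31, 'pid2' => 569, 'context' => 'item', 'probability' => 0.5 ),
	array( 'pid1' => 106, 'pid2' => 21, 'context' => 'item', 'probability' => 0.5 ),
	array( 'pid1' => 106, 'pid2' => 19, 'context' => 'item', 'probability' => 0.125 ),
	array( 'pid1' => 31, 'pid2' => 21, 'context' => 'qualifier', 'probability' => 0.5 ),
);
// Suggest up to 7 properties for an item that already has properties 31 and 106.
$suggestions = suggest( $table, array( 31, 106 ), 'item', 0.2, 7 );
```

Only the first loop benefits from an index; the filter on the aggregate and the sort both run over everything that loop lets through, which is why a small limit such as 7 does not by itself bound the work.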

@xchrdw xchrdw closed this Jul 1, 2014

@mariushoch mariushoch deleted the review branch Mar 24, 2015
