Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt more exact Category match #2

Merged
merged 6 commits into from Sep 27, 2017
Merged

Attempt more exact Category match #2

merged 6 commits into from Sep 27, 2017

Conversation

Vesihiisi
Copy link
Owner

Attempt to find a more exact category match based on name.

Task: https://phabricator.wikimedia.org/T176838

# Add parish/municipality categorisation when needed
if item.needs_place_cat:
item.make_place_category()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break the whole section below out into a separate function. item.exact_category_from_name(cache) and call it with self.category_cache.

Not sure you want to put all

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to a separate function.

Could you develop your last sentence?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that. The last sentence became the comment above instead.

if item.needs_place_cat:
item.make_place_category()

if self.category_exists(self.name):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make use of the full features of category_exists i.e.

exact_category = self.kmb_info.category_exists(cache)
if exact_category:
    parent_cats = exact_category.categories()
    for cat ....

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Add parish/municipality categorisation when needed
if item.needs_place_cat:
item.make_place_category()
# Add parish/municipality categorisation when needed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want to put the municipality/parish categorisation under the if not found_commonscat.

I would suggest simply putting the item.exact_category_from_name under its own if not found_commonscat but after the parish/municipality categorisation has been dealt with.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to a separate function.

exact_cat_name = exact_category.title(withNamespace=False)
self.content_cats.add(exact_cat_name)
else:
self.meta_cats.add('needing more specific categorisation')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per other category names maybe "needing categorisation (no exact match)"

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


return list(item.content_cats)

def get_exact_cat_from_name(self, cache):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should live in the KMBItem class

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's where it is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. github diff made me confused.

@@ -465,6 +465,10 @@ def generate_content_cats(self, item):
:param item: the KMBItem to analyse
:return: list of categories (without "Category:" prefix)
"""
# Add parish/municipality categorisation when needed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these need to be after both # depicted and # add tag categories unless a commonscat was found as they modify item.needs_place_cat.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this not re-add categories that might have been removed as part of exact category adding?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm suggesting is the below. I.e. exact match is the last to be called since it relies on both class, keyword and place categories (if appropriate) for parent category matching. And since it's in the end nothing it removes will be re-added.

# depicted
found_commonscat = item.make_commonscat_categories(self.category_cache)

# add tag categories unless a commonscat was found
if not found_commonscat:
    item.make_item_class_categories(self.category_cache)
    item.make_item_keyword_categories(self.category_cache)

# Add parish/municipality categorisation when needed
if item.needs_place_cat:
    item.make_place_category()

if not found_commonscat:
    item.get_exact_cat_from_name(self.category_cache)

return list(item.content_cats)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Vesihiisi Vesihiisi merged commit 5c85967 into master Sep 27, 2017
@Vesihiisi Vesihiisi deleted the categorization branch September 27, 2017 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants