New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt more exact Category match #2
Conversation
importer/make_KMB_info.py
Outdated
# Add parish/municipality categorisation when needed | ||
if item.needs_place_cat: | ||
item.make_place_category() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Break the whole section below out into a separate function. item.exact_category_from_name(cache)
and call it with self.category_cache
.
Not sure you want to put all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to a separate function.
Could you develop your last sentence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for that. The last sentence became the comment above instead.
importer/make_KMB_info.py
Outdated
if item.needs_place_cat: | ||
item.make_place_category() | ||
|
||
if self.category_exists(self.name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make use of the full features of category_exists
i.e.
exact_category = self.kmb_info.category_exists(cache)
if exact_category:
parent_cats = exact_category.categories()
for cat ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
importer/make_KMB_info.py
Outdated
# Add parish/municipality categorisation when needed | ||
if item.needs_place_cat: | ||
item.make_place_category() | ||
# Add parish/municipality categorisation when needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't want to put the municipality/parish categorisation under the if not found_commonscat
.
I would suggest simply putting the item.exact_category_from_name
under its own if not found_commonscat
but after the parish/municipality categorisation has been dealt with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to a separate function.
importer/make_KMB_info.py
Outdated
exact_cat_name = exact_category.title(withNamespace=False) | ||
self.content_cats.add(exact_cat_name) | ||
else: | ||
self.meta_cats.add('needing more specific categorisation') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per other category names maybe "needing categorisation (no exact match)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
importer/make_KMB_info.py
Outdated
|
||
return list(item.content_cats) | ||
|
||
def get_exact_cat_from_name(self, cache): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should live in the KMBItem class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's where it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. github diff made me confused.
importer/make_KMB_info.py
Outdated
@@ -465,6 +465,10 @@ def generate_content_cats(self, item): | |||
:param item: the KMBItem to analyse | |||
:return: list of categories (without "Category:" prefix) | |||
""" | |||
# Add parish/municipality categorisation when needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these need to be after both # depicted
and # add tag categories unless a commonscat was found
as they modify item.needs_place_cat
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this not re-add categories that might have been removed as part of exact category adding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm suggesting is the below. I.e. exact match is the last to be called since it relies on both class, keyword and place categories (if appropriate) for parent category matching. And since it's in the end nothing it removes will be re-added.
# depicted
found_commonscat = item.make_commonscat_categories(self.category_cache)
# add tag categories unless a commonscat was found
if not found_commonscat:
item.make_item_class_categories(self.category_cache)
item.make_item_keyword_categories(self.category_cache)
# Add parish/municipality categorisation when needed
if item.needs_place_cat:
item.make_place_category()
if not found_commonscat:
item.get_exact_cat_from_name(self.category_cache)
return list(item.content_cats)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Attempt to find a more exact category match based on name.
Task: https://phabricator.wikimedia.org/T176838