-
Notifications
You must be signed in to change notification settings - Fork 51
Fill creator name in finnish museum DAG #978
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of thoughts!
openverse_catalog/dags/providers/provider_api_scripts/finnish_museums.py
Outdated
Show resolved
Hide resolved
author = "; ".join(list(author.keys())) | ||
authors.append(author) | ||
|
||
return "; ".join(authors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will return an empty string if no authors are found, but it might be better to return None
in that case to have NULL
in the database rather than an empty string. Maybe we could do self.get_creator(data) or None
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure that we don't have different implementations for when creator name is not available (""
vs None
), we could add a check and conversion here:
def prepare_string(self, value): |
@AetherUnbound, should we create an issue for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! 😮 I was naive to believe it would return None
this way. Fixed in last commits. Thanks, @AetherUnbound.
@obulat Adding an extra check there or/and in the MediaStore
classes sounds good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your idea about adding a check in MediaStore
is better than the columns one, @krysal. We could add
if title === "":
title = None
in the MediaStore here:
def clean_media_metadata(self, **media_data) -> dict | None: |
Are there any other string fields that need that, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think title
and creator
are both good fields to check for that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
We should fix the utf-8 issue in jsons, though.
"Kuva on julkaistu Nime\u00e4 4.0 Kansainv\u00e4linen (CC BY 4.0)-lisenssill\u00e4. Verkkopalvelussa julkaistua n\u00e4ytt\u00f6kuvaa voi k\u00e4ytt\u00e4\u00e4 vapaasti my\u00f6s kaupallisessa k\u00e4yt\u00f6ss\u00e4. Kuvan yhteydess\u00e4 on mainittava tekij\u00e4 ja/tai kuvaaja sek\u00e4 organisaation ja kokoelman nimi. Kuvan k\u00e4ytt\u00e4j\u00e4ll\u00e4 on vastuu tekij\u00e4noikeuksien ja yksityisyyden suojan kunnioittamisesta. Esimerkiksi henkil\u00f6kuvien k\u00e4ytt\u00f6 markkinoinnissa ja/tai mainonnassa on kielletty ilman kuvassa olevan henkil\u00f6n suostumusta. Museovirasto pyrkii julkaisemansa tiedon oikeellisuuteen. Kuvien tiedoissa voi kuitenkin olla puutteellisuuksia ja virheit\u00e4. Virheen huomatessanne pyyd\u00e4mme ottamaan yhteytt\u00e4 Kuvakokoelmien asiakaspalveluun. Kuvia ja tietoja julkaistaessa asiakkaan vastuulla on varmistaa niiden oikeellisuus. Painolaatuiset kuvat ovat maksullisia. Lis\u00e4tietoja tarvittaessa: Museoviraston Kuvakokoelmat (kuvakokoelmat@museovirasto.fi)." | ||
], | ||
"link": "http://creativecommons.org/licenses/by/4.0/deed.fi" | ||
"SA-kuvat ovat lisensoitu Nime\u00e4 4.0 Kansainv\u00e4linen (CC BY 4.0) -lisenssill\u00e4 ja ne ovat vapaasti k\u00e4ytett\u00e4viss\u00e4 ja julkaistavissa. Kuvat ovat n\u00e4ht\u00e4viss\u00e4 my\u00f6s www.sa-kuva.fi -palvelussa." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that ensureAscii=false
wasn't used when saving the json
file so that the utf-8 characters are not converted to \uxxx
-type codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! It seems that the diff is also showing some changes from #934, it might be best to rebase before we merge just to resolve those!
* Fill creator name in finnish museum DAG (#978) * Bump pre-commit from 2.21.0 to 3.0.2 (#983) * 🔄 synced local '.pre-commit-config.yaml' with remote 'templates/.pre-commit-config.yaml.jinja' * Update docstrings to conform to linting * Update DAGs.md --------- Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: openverse-bot <null> Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
Fixes
Fixes WordPress/openverse#1294 by @krysal
Description
This PR changes the query parameters of the Finnish museum to request only the fields that the DAG actually uses (including the creators). The creator can be of three types: "primary", "secondary" and/or "corporate". I made it include the available ones in said order, separated by a ";" since the names of an individual author can include commas as well.
Testing Instructions
Run the
finnish_museums_workflow
DAG, wait for a while and observe the rows in the image table.Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin