Get multimedia categories from multimedia.txt #61

peterdesmet · 2015-02-11T08:29:26Z

Description

For a given dataset, I want to know how many records have associated media. I also want to know if there are any issues. This implementation is an enhancement over #43: we now also take into account media without a type (and there are lots of those).

Output

datasetKey
multimedia_not_provided
multimedia_url_invalid
multimedia_valid

Terms we need

issues (occurrence.txt)
gbifID (multimedia.txt)

Process

Search for related media.

FOR EACH occurrence
    IF issues CONTAINS MULTIMEDIA_URI_INVALID
        THEN category = "multimedia_url_invalid" // This issue takes priority over valid multimedia
    ELSE search for related multimedia in multimedia.txt
        IF found
            THEN category = "multimedia_valid"
        ELSE
            category = "multimedia_not_provided" // No related media found
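
The process above could be sketched in Python roughly as follows. This is only an illustration, not the project's implementation: the function names `categorize` and `count_categories` are hypothetical, and it assumes that each occurrence row carries a `gbifID` and an `issues` string in which flags are separated by semicolons.

```python
from collections import Counter
from typing import Iterable, Mapping, Set

INVALID_URI_ISSUE = "MULTIMEDIA_URI_INVALID"


def categorize(issues: str, has_media: bool) -> str:
    """Return the multimedia category for a single occurrence."""
    # Assumption: issue flags are separated by semicolons.
    flags = {flag.strip() for flag in issues.split(";") if flag.strip()}
    if INVALID_URI_ISSUE in flags:
        # The invalid-URL issue takes priority over valid multimedia.
        return "multimedia_url_invalid"
    if has_media:
        return "multimedia_valid"
    return "multimedia_not_provided"


def count_categories(
    occurrences: Iterable[Mapping[str, str]], media_ids: Set[str]
) -> Counter:
    """Count occurrences per category for one dataset.

    `media_ids` holds the gbifIDs that appear in multimedia.txt.
    """
    counts: Counter = Counter()
    for occurrence in occurrences:
        has_media = occurrence["gbifID"] in media_ids
        counts[categorize(occurrence.get("issues", ""), has_media)] += 1
    return counts
```

The resulting counter maps directly onto the three output fields listed under Output; the `datasetKey` would be added by whatever code calls it.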


peterdesmet · 2015-02-11T08:37:49Z

@niconoe, some notes:

1. I'm ignoring multiple media for the same occurrence, as I want to assess the number of occurrences with media, not the number of linked media. A drawback is that I only select the type of the first media. My loop assumes gbifID is ordered so that duplicates can be skipped. That is the case in the example I have, but it might not always be so.
2. In the process described above, we're not populating media_not_provided: for that we need to look into occurrence.txt as well. Currently, we assume that the remainder (total occurrences minus media_video, media_audio, media_image, media_unknown) is media_not_provided, but that's somewhat flaky.
3. In the process described above, we're not populating media_url_invalid: for that we need to look into occurrence.txt as well. This is quite a big drawback, as linked images might not work.

If you have a solution for looking into occurrence.txt and multimedia.txt together, we can solve 2 and 3.
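
As a side note on the ordering assumption: collecting the gbifIDs from multimedia.txt into a set would make the check independent of row order and of repeated media per occurrence. A minimal sketch, assuming multimedia.txt is tab-delimited with a `gbifID` column header (the helper name `read_media_ids` is hypothetical):

```python
import csv
from typing import Set, TextIO


def read_media_ids(multimedia_file: TextIO) -> Set[str]:
    """Collect the distinct gbifIDs that have at least one media row."""
    # Assumption: tab-delimited file with a header row, no quoting.
    reader = csv.DictReader(
        multimedia_file, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    # A set ignores duplicates and does not care about row order.
    return {row["gbifID"] for row in reader if row.get("gbifID")}
```

Each occurrence in occurrence.txt can then be checked against this set, which covers both files at once without relying on sorted input.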

niconoe · 2015-02-13T13:15:48Z

@peterdesmet: there are definitely possible improvements over issue #43, since I have tools to look into occurrence.txt and multimedia.txt in parallel. But they currently work by starting from the occurrence and then reaching the attached extensions, so your algorithm is not really applicable as-is...

So to be honest, I'm a bit lost in the implementation discussions...
Could you clarify the "what and why"? I'll take care of the "how". More specifically:

Compared to the existing implementation: what should change in the output data? The format? Or only richer content because we inspect multimedia.txt? Or both?
What advantage does that provide over the existing implementation?
What's the priority of this ticket compared to others?

Generally speaking, I think we can become even better if we think more in terms of interface/implementation separation (black box analogy)! Best

peterdesmet · 2015-02-16T09:23:22Z

Ok, it's actually better that you look in parallel. I'll update the issue.

peterdesmet · 2015-02-16T09:38:22Z

Issue updated. Let me know if it makes sense.

peterdesmet · 2015-02-16T10:15:12Z

@niconoe, I discussed this with @bartaelterman. The precedence of one type over the other is a bit weird. We decided to keep it simpler:

Has valid media
URL invalid
Has no media

Media types are something we should tackle for media, not occurrences. We'll do this in another issue, and it is probably beyond the scope of this POC.

See #61

bartaelterman · 2015-02-16T11:33:16Z

As discussed with @peterdesmet, I documented an extraction procedure in #63 that combines the backend requirements of this issue and those of #60.

peterdesmet added enhancement backend labels Feb 11, 2015

peterdesmet assigned niconoe Feb 11, 2015

peterdesmet added this to the Media type milestone Feb 11, 2015

niconoe mentioned this issue Feb 13, 2015

Get sample of images #60

Closed

peterdesmet changed the title ~~Get media type categories from multimedia.txt~~ Get multimedia from multimedia.txt Feb 16, 2015

peterdesmet changed the title ~~Get multimedia from multimedia.txt~~ Get multimedia categories from multimedia.txt Feb 16, 2015

peterdesmet added a commit that referenced this issue Feb 16, 2015

Use revised multimedia categories

5a91758

See #61

peterdesmet mentioned this issue Feb 16, 2015

Show multimedia categories #44

Closed

bartaelterman mentioned this issue Feb 16, 2015

Extract multimedia information #63

Closed

bartaelterman closed this as completed Feb 16, 2015

peterdesmet added the duplicate label Feb 16, 2015
