Merged
19 changes: 18 additions & 1 deletion scrapi/harvesters/umontreal.py
@@ -7,14 +7,31 @@
 '''
 from __future__ import unicode_literals

+from scrapi.base.helpers import updated_schema
 from scrapi.base import OAIHarvester


+def umontreal_language_processor(languages):
+
+    if not languages:
+        languages = []
+
+    return languages

In the original schema, each item in the languages array is transformed to an ISO 639-3 code, for example, 'English' becomes 'eng' and 'Russian' becomes 'rus'. I checked the source and it seems that the languages are not correctly coded (for example, it uses 'en' instead of 'eng'). There is a function get_code in helpers.py that can do that for you. Can you use it in your function to return the correct code?
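
For illustration only, the normalization requested here could be sketched as follows. This is a self-contained sketch, not the get_code helper from helpers.py: the lookup table ISO_639_1_TO_3 and the function normalize_languages are hypothetical names, and the table covers only three languages.

```python
# Hypothetical lookup table (an assumption for this sketch); a real harvester
# would rely on a complete ISO 639 data source rather than this subset.
ISO_639_1_TO_3 = {
    'en': 'eng',
    'fr': 'fra',
    'ru': 'rus',
}


def normalize_languages(languages):
    """Return three-letter codes for the given language tags.

    Tags without a known mapping come back as their lowercased
    primary subtag.
    """
    if not languages:
        return []
    normalized = []
    for tag in languages:
        # Drop any region subtag ("en-GB" -> "en") and ignore case.
        primary = tag.strip().lower().split('-')[0]
        normalized.append(ISO_639_1_TO_3.get(primary, primary))
    return normalized
```

A tag that is already three letters long, such as 'rus', passes through unchanged because it has no entry in the table.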

Contributor Author

Below is the description in the schema:

  • "description": "The primary languages in which the content of the resource is presented. Values used for this element MUST conform to ISO 639\u20133. This offers two and three letter tags e.g. "en" or "eng" for English and "en-GB" for English used in the UK."

You can find all the tags here: https://www.loc.gov/standards/iso639-2/php/code_list.php

It is my understanding that the tags are correct: 'en' maps to English and 'fr' maps to French.

Can you confirm that this still needs to be done?

Contributor Author

@jeffreyliu3230 This is ready to merge, right?

Oh good find! Yes this should be ready to merge.



 class UmontrealHarvester(OAIHarvester):
     short_name = 'umontreal'
     long_name = u"PAPYRUS - Dépôt institutionnel de l'Université de Montréal"
     url = 'http://papyrus.bib.umontreal.ca'

     base_url = 'http://papyrus.bib.umontreal.ca/oai/request'
-    property_list = ['date', 'identifier', 'type', 'format', 'setSpec']
+
+    @property
+    def schema(self):
+        return updated_schema(self._schema, {
+            'languages': ('//dc:language/node()', umontreal_language_processor)
+        })
+
+    property_list = ['identifier', 'type', 'format', 'setSpec']
+
     timezone_granularity = True
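
The schema override in this diff replaces one field of the base schema with an (XPath, processor) pair. A minimal sketch of that pattern, assuming updated_schema behaves like a recursive dictionary merge (scrapi's actual implementation is not shown in this diff): merge_schema_sketch, language_processor, and base_schema are hypothetical stand-ins, not scrapi code.

```python
from copy import deepcopy


def merge_schema_sketch(base, overrides):
    """Return a copy of `base` with `overrides` merged in recursively.

    Hypothetical stand-in for updated_schema; the real helper may differ.
    """
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_schema_sketch(merged[key], value)
        else:
            merged[key] = value
    return merged


def language_processor(languages):
    # Same behavior as the processor in the diff: a missing value becomes [].
    return languages or []


# Hypothetical base schema; the real one comes from OAIHarvester.
base_schema = {
    'title': '//dc:title/node()',
    'languages': '//dc:language/text()',
}

schema = merge_schema_sketch(base_schema, {
    'languages': ('//dc:language/node()', language_processor),
})

# The overridden field now carries both the XPath and the processor.
xpath, processor = schema['languages']
```

Copying the base schema before merging keeps the override local to this harvester, so other harvesters that share the base schema would be unaffected.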