"value too long for type character varying(3)" error when importing SKOS with language values more than 3 characters long #65

Closed
c-vradis opened this issue Aug 8, 2023 · 1 comment · Fixed by #68

c-vradis commented Aug 8, 2023

Hi,

I installed VocabsEditor locally (a great tool indeed) and tried importing the entire GEMET Concepts SKOS vocabulary, version 4.2.3. Many of its skos:prefLabel elements carry language values longer than three characters, such as "en-US", e.g.:

<skos:prefLabel xml:lang="en-US">SOCIAL ASPECTS, ENVIRONMENTAL POLICY MEASURES</skos:prefLabel>

The import then failed with this error on the job_status page:

{"exc_type": "DataError", "exc_message": ["value too long for type character varying(3)\n"], "exc_module": "django.db.utils"}

I modified models.py and changed every CharField with max_length=3 to max_length=10. I then modified forms.py accordingly, ran makemigrations + migrate, and the import worked just fine.
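To see ahead of an import which language tags would overflow a given column width, the file can be scanned for xml:lang values. The following is only a minimal sketch using the Python standard library; the function name `long_language_tags`, the `SAMPLE` document and its example.org URI are made up for illustration and are not part of VocabsEditor.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def long_language_tags(rdf_xml: str, max_length: int = 3) -> dict[str, int]:
    """Count xml:lang values that are longer than max_length characters."""
    counts: dict[str, int] = {}
    for element in ET.fromstring(rdf_xml).iter():
        tag = element.get(XML_LANG)
        if tag is not None and len(tag) > max_length:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


# Hypothetical two-label document; only the first label is taken from the report.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/1">
    <skos:prefLabel xml:lang="en-US">SOCIAL ASPECTS, ENVIRONMENTAL POLICY MEASURES</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Beispiel</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

print(long_language_tags(SAMPLE))       # {'en-US': 1}
print(long_language_tags(SAMPLE, 10))   # {}
```

With the limit raised to 10, as in the local change described above, the sample no longer reports any offending tag.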

So my suggestion is to allow a longer maximum length for the language value. I can't say precisely what the maximum should be, but maybe this Best Practice RFC could be of help: https://www.rfc-editor.org/rfc/rfc5646. Hope it helps.

Cheers

Member

zozlak commented Aug 9, 2023

Just a normative note: according to RDF 1.1 Concepts and Abstract Syntax, the language tag should be formatted according to BCP47, which redirects to RFC5646, according to which:

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]
 language      = 2*3ALPHA            ; shortest ISO 639 code
                 ["-" extlang]       ; sometimes followed by
                                     ; extended language subtags
               / 4ALPHA              ; or reserved for future use
               / 5*8ALPHA            ; or registered language subtag
 extlang       = 3ALPHA              ; selected ISO 639 codes
                 *2("-" 3ALPHA)      ; permanently reserved
 script        = 4ALPHA              ; ISO 15924 code
 region        = 2ALPHA              ; ISO 3166-1 code
               / 3DIGIT              ; UN M.49 code
 variant       = 5*8alphanum         ; registered variants
               / (DIGIT 3alphanum)
 extension     = singleton 1*("-" (2*8alphanum))
                                     ; Single alphanumerics
                                     ; "x" reserved for private use
 singleton     = DIGIT               ; 0 - 9
               / %x41-57             ; A - W
               / %x59-5A             ; Y - Z
               / %x61-77             ; a - w
               / %x79-7A             ; y - z
 privateuse    = "x" 1*("-" (1*8alphanum))

This means that the language tag can be of more or less arbitrary length, because the variant, extension and (an internal part of) privateuse parts can be repeated any number of times. Assuming all parts are used exactly once, it's something like 8 [language] + 5 [script] + 4 [region] + 9 [variant] + 16 [extension] + 11 [privateuse] = 53.
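The estimate can be checked mechanically. The sketch below is an assumption-laden illustration, not a reference implementation: `LANGTAG` is a hand-written regex transcription of the `langtag` production quoted above (grandfathered and private-use-only tags are left out), and `LONGEST_PARTS` builds the longest value of each part used once, with exactly one extension subtag. The exact total depends on how the parts are counted: this sketch counts the extension as 11 characters (hyphen, singleton, hyphen, eight alphanumerics), so it prints a somewhat lower figure than the rough one above. Either way the result is far beyond 3.

```python
import re

# Hand-written transcription of the RFC 5646 `langtag` production.
# Assumption: grandfathered and private-use-only tags are not covered.
LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}            # language with optional extlang
      |[a-z]{4}                                # reserved for future use
      |[a-z]{5,8})                             # registered language subtag
    (?:-[a-z]{4})?                             # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*   # variants, repeatable
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*       # extensions, repeatable
    (?:-x(?:-[a-z0-9]{1,8})+)?                 # private use
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Longest value of each part, including its leading hyphen, used exactly once.
LONGEST_PARTS = {
    "language": "a" * 8,
    "script": "-" + "b" * 4,
    "region": "-123",
    "variant": "-" + "c" * 8,
    "extension": "-d-" + "e" * 8,
    "privateuse": "-x-" + "f" * 8,
}

longest = "".join(LONGEST_PARTS.values())

print({name: len(part) for name, part in LONGEST_PARTS.items()})
print(len(longest))                          # 48
print(bool(LANGTAG.fullmatch(longest)))      # True
print(bool(LANGTAG.fullmatch("en-US")))      # True

# Repeating a repeatable part keeps the tag well-formed while it grows.
with_more_variants = longest.replace("-cccccccc", "-cccccccc" * 3)
print(len(with_more_variants))               # 66
print(bool(LANGTAG.fullmatch(with_more_variants)))  # True
```

Since variants, extensions and private-use subtags can repeat, no fixed width is safe in the strict sense; a column limit is a practical choice rather than something the grammar guarantees.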

csae8092 self-assigned this Oct 13, 2023
csae8092 linked a pull request Oct 13, 2023 that will close this issue