data model: compare existing MARC tags with JSON data model #779

Closed
tiborsimko opened this issue Feb 27, 2015 · 9 comments

Comments

@tiborsimko
Member

@pamfilos noticed one record containing 556 $u and $y, which are not present in the JSON data model configuration.

@tiborsimko should run the comparison script to discover more cases like this.

@pherterich
Member

It will be included in the JSON data model config. It is already modeled on paper, and we're working on moving it from paper into the config file; this will happen next week. It should be the only missing field, since we worked from the config file plus my notes.

@tiborsimko
Member Author

I've run the script and there are actually more "unknown" tags. Here is the list:

  • 037__a
  • 041__a
  • 269__a
  • 269__b
  • 269__c
  • 300__a
  • 516__y
  • 538__u
  • 556__u
  • 556__y

This may be important if we would like to index them and/or display them properly.

@pherterich Are you going to amend opendata.cfg or shall I do that?
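
The comparison described above boils down to a set difference between the tags found in the records and the tags the data model knows about. The following is a minimal sketch, not the actual comparison script: the function name `unknown_tags` and both input lists are hypothetical, and how the known tags are extracted from opendata.cfg is left out.

```python
def unknown_tags(used_tags, known_tags):
    """Return tags used in records but absent from the data model, sorted."""
    return sorted(set(used_tags) - set(known_tags))


# Hypothetical inputs: tags seen in records vs. tags declared in the config.
used = ["245__a", "556__a", "556__u", "556__y", "300__a"]
known = ["245__a", "556__a"]

for tag in unknown_tags(used, known):
    print(tag)
```

Sorting the result keeps the list stable between runs, which makes it easier to compare against an earlier report.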

@tiborsimko tiborsimko added this to the v2.0 milestone Jul 14, 2015
@tiborsimko tiborsimko removed their assignment Jul 14, 2015
@tiborsimko
Member Author

For reference, here is a complete overview of which tags are used and how frequently:

# TAG     NB_RECORDS    # recID1 recID2 ... recID9 (example records)
005__     99         # 1 2 3 4 5 6 7 8 9
0247_2    58         # 1102 1100 1101 328 329 331 700 201 202
0247_a    58         # 1 2 3 4 5 6 7 8 9
037__a    1          # 410
041__a    11         # 55 330 50 59 61 57 58 410 411
100__a    57         # 50 52 55 56 57 58 61 101 200
100__h    1          # 450
100__i    1          # 450
100__u    6          # 329 330 450 61 56 410
110__a    46         # 1 2 3 4 5 6 7 8 9
110__g    2          # 53 54
245__a    99         # 1 2 3 4 5 6 7 8 9
246__a    17         # 1 2 3 4 5 6 7 8 9
256__a    62         # 1200 1201 1202 1203 331 212 200 101 553
256__b    47         # 1 2 3 4 5 6 7 8 9
256__e    52         # 1 2 3 4 5 6 7 8 9
256__f    47         # 1 2 3 4 5 6 7 8 9
256__t    62         # 1200 1201 1202 1203 328 331 203 204 205
260__b    87         # 1200 1201 1202 1203 1204 320 321 322 323
260__c    94         # 1200 1201 1202 1203 1102 1100 1101 203 204
264_0c    14         # 1 13 2 3 4 5 12 11 9
269__a    4          # 410 411 412 413
269__b    4          # 410 411 412 413
269__c    4          # 410 411 412 413
300__a    6          # 330 55 410 411 412 413
500__a    1          # 328
505__g    1          # 328
505__t    1          # 328
516__a    12         # 50 51 59 61 101 200 329 331 550
516__w    6          # 329 331 200 552 101 553
516__y    1          # 329
520__a    97         # 1 2 3 4 5 6 7 8 9
538__a    36         # 201 202 600 601 602 603 604 605 606
538__b    28         # 600 601 602 603 604 605 606 607 608
538__i    6          # 200 101 553 552 550 555
538__u    1          # 249
538__w    5          # 200 101 553 552 550
540__a    35         # 331 200 101 552 550 201 202 300 301
556__a    46         # 1 2 3 4 5 6 7 8 9
556__u    1          # 250
556__w    14         # 1 13 2 3 4 5 12 11 9
556__y    1          # 250
567__a    47         # 1 2 3 4 5 6 7 8 9
567__w    15         # 700 600 601 602 603 604 605 606 607
581__a    78         # 1 2 3 4 5 6 7 8 9
581__u    61         # 1 2 3 4 5 6 7 8 9
581__w    15         # 328 201 202 203 204 205 206 207 208
581__y    67         # 1200 1201 1202 1100 1101 1204 328 331 200
583__a    24         # 328 700 201 202 300 1 13 2 3
583__u    16         # 1 13 2 3 4 5 12 11 9
583__w    4          # 700 200 101 550
583__y    16         # 1 13 2 3 4 5 12 11 9
6531_a    22         # 1201 1202 55 60 53 56 1102 50 51
653__a    4          # 328 329 330 331
693__a    88         # 1200 1201 1202 1203 1102 55 1100 1101 1204
693__e    88         # 1200 1201 1202 1203 1102 1100 1101 1204 412
700__a    15         # 50 203 204 205 206 207 208 209 210
700__h    1          # 450
700__i    1          # 450
700__u    2          # 329 450
710__a    2          # 53 59
772__a    31         # 1 2 3 4 5 6 7 8 9
772__o    17         # 201 202 600 601 602 603 604 605 606
772__w    17         # 201 202 600 601 602 603 604 605 606
777__a    2          # 201 202
777__w    2          # 201 202
787__a    6          # 200 101 553 552 550 249
787__n    1          # 550
787__o    1          # 101
787__w    5          # 200 553 552 550 249
8564_s    79         # 1102 55 1100 1101 320 321 322 323 324
8564_u    90         # 1102 55 1100 1101 320 321 322 323 324
8564_y    11         # 328 329 60 50 51 52 53 54 59
8564_z    22         # 320 321 322 323 324 325 1 13 2
8567_2    31         # 1200 1201 1202 1203 1102 1100 1101 320 321
8567_s    31         # 1 2 3 4 5 6 7 8 9
8567_u    31         # 1 2 3 4 5 6 7 8 9
942__e    28         # 1 2 3 4 5 6 7 8 9
960__c    14         # 1 13 2 3 4 5 12 11 9
980__a    99         # 1 2 3 4 5 6 7 8 9
980__b    55         # 60 600 601 602 603 604 605 606 607
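
An overview like the one above could be produced by walking MARCXML records and collecting, for each tag, the set of record IDs that use it. This is only a sketch under assumptions, not the script that generated the report: the sample XML is invented and omits the MARCXML namespace, the names `tag_usage` and `_ind` are hypothetical, and skipping control field 001 is a guess based on the report starting at 005.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample data; real MARCXML normally carries a namespace.
SAMPLE = """<collection>
  <record>
    <controlfield tag="001">250</controlfield>
    <controlfield tag="005">20150227</controlfield>
    <datafield tag="556" ind1=" " ind2=" ">
      <subfield code="u">http://example.org/doc</subfield>
      <subfield code="y">Documentation</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">1</controlfield>
    <datafield tag="024" ind1="7" ind2=" ">
      <subfield code="a">example-id</subfield>
    </datafield>
    <datafield tag="556" ind1=" " ind2=" ">
      <subfield code="u">http://example.org/other</subfield>
    </datafield>
  </record>
</collection>"""


def _ind(value):
    """Render a blank or missing indicator as an underscore."""
    value = (value or "").strip()
    return value or "_"


def tag_usage(marcxml):
    """Map each tag (e.g. '556__u') to the set of record IDs using it."""
    usage = defaultdict(set)
    root = ET.fromstring(marcxml)
    for record in root.iter("record"):
        recid = record.findtext("controlfield[@tag='001']")
        for cf in record.findall("controlfield"):
            if cf.get("tag") != "001":  # assumption: the record ID is not reported
                usage[cf.get("tag") + "__"].add(recid)
        for df in record.findall("datafield"):
            prefix = df.get("tag") + _ind(df.get("ind1")) + _ind(df.get("ind2"))
            for sf in df.findall("subfield"):
                usage[prefix + sf.get("code")].add(recid)
    return dict(usage)


# Print in a layout similar to the report: tag, record count, example IDs.
for tag, recids in sorted(tag_usage(SAMPLE).items()):
    examples = " ".join(sorted(recids, key=int)[:9])
    print(f"{tag:<9} {len(recids):<10} # {examples}")
```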

@pherterich
Member

Sorry, I put that on hold after the half-finished last attempt. I can amend this before I go on holiday.
When I move all this to JSON schema, is it better to create one master file, or is it okay to create one per collection?

@tiborsimko
Member Author

For the current production site, we still need to amend opendata.cfg, as I've been doing in the following commits:

  • 70301ea records: fix for 110__g/710__g in data model
  • e41bb3f templates: updates - fixes url section
  • 9e5ca8b templates: fixes wrong link displaying for '516'
  • 2c56b10 records: addition of 505 to JSON data model
  • ce5fb77 records: addition of 787__n
  • e8f3556 records: addition of 787__o
  • d38347a records: MARC tag 8567 in data model
  • a5b4be4 records: new field 538__i
  • 0bf57c6 records: fix for 583__b in data model
  • 385208e records: new 583__b in data model
  • 9452e6c records: methodology_note is now a list
  • 4de702b records: material_publication_note forced as list
  • 94fc506 records: JSON model update
  • 2415d55 records: new record data model fields
  • e98249c base: JSON data model fix for external URLs
  • 652ac91 records: initial release of JSON data model

For the future JSON schema, we can decide later, because we will most probably not need any MARC mapping anymore.

@tiborsimko
Member Author

@AnxhelaDani This issue is relevant for your PR #1024. I'll re-run the comparison scripts and update the latest status.

@tiborsimko tiborsimko assigned AnxhelaDani and unassigned suenjedt Apr 8, 2016
@tiborsimko
Member Author

Here is the tag usage overview on the latest QA instance (which does not yet run the last several commits with field changes, such as run period).

# TAG     NB_RECORDS    # recID1 recID2 ... recID9 (example records)
005__     2061           # 1 2 3 4 5 6 7 8 9
0247_2    500            # 1120 1100 1101 1102 1103 1104 1105 1106 1107
0247_a    500            # 1 2 3 4 5 6 7 8 9
035__9    408            # 3000 3001 3002 3003 3004 3005 3006 3007 3008
035__a    408            # 3000 3001 3002 3003 3004 3005 3006 3007 3008
037__a    1              # 410
041__a    12             # 40 330 50 59 61 55 57 58 410
100__a    86             # 50 52 55 56 57 58 61 101 200
100__h    2              # 450 451
100__i    2              # 450 451
100__u    7              # 329 330 450 451 61 56 410
110__a    1978           # 1200 1201 1202 1203 1120 40 1100 1101 1102
110__g    2              # 53 54
245__a    2061           # 1 2 3 4 5 6 7 8 9
246__a    431            # 1 2 3 4 5 6 7 8 9
256__a    548            # 1200 1201 1202 1203 331 352 212 200 101
256__b    514            # 1 2 3 4 5 6 7 8 9
256__e    535            # 1 2 3 4 5 6 7 8 9
256__f    514            # 1 2 3 4 5 6 7 8 9
256__t    548            # 1200 1201 1202 1203 328 331 554 203 204
260__b    2036           # 1200 1201 1202 1203 1204 320 321 322 323
260__c    2054           # 1200 1201 1202 1203 1120 1100 1101 1102 1103
264_0c    1944           # 201 202 600 601 602 603 604 605 606
269__a    4              # 410 411 412 413
269__b    4              # 410 411 412 413
269__c    4              # 410 411 412 413
300__a    6              # 330 55 410 411 412 413
500__a    1              # 328
505__g    2              # 328 554
505__t    2              # 328 554
516__a    20             # 50 51 59 61 101 200 220 221 233
516__u    2              # 560 460
516__w    8              # 101 200 233 234 329 331 552 553
516__y    1              # 329
520__a    2058           # 1 2 3 4 5 6 7 8 9
538__a    464            # 201 202 600 601 602 603 604 605 606
538__b    448            # 600 601 602 603 604 605 606 607 608
538__i    12             # 352 200 101 553 552 550 560 551 460
538__u    2              # 249 251
538__w    10             # 200 101 553 552 550 560 460 551 233
540__a    445            # 331 200 101 552 550 560 551 233 234
556__a    471            # 700 201 202 230 231 600 601 602 603
556__u    2              # 250 252
556__w    34             # 1 13 2 3 4 5 12 11 9
556__y    2              # 250 252
567__a    474            # 1 2 3 4 5 6 7 8 9
567__u    1              # 252
567__w    40             # 554 700 201 202 600 601 602 603 604
581__a    573            # 1200 1201 1202 1100 1101 1102 1103 1104 1105
581__u    546            # 1200 1201 1202 1100 1101 1102 1103 1104 1105
581__w    18             # 328 201 202 55 203 204 205 206 207
581__y    502            # 1200 1201 1202 1100 1101 1102 1103 1104 1105
583__a    434            # 328 700 201 202 300 1 13 2 3
583__u    38             # 1 13 2 3 4 5 12 11 9
583__w    12             # 700 201 200 101 550 560 460 551 233
583__y    38             # 1 13 2 3 4 5 12 11 9
593__a    381            # 1300 1305 1310 1312 1314 1315 1317 1318 1320
593__b    381            # 1300 1301 1302 1303 1304 1305 1306 1307 1308
6531_a    72             # 1201 1202 40 60 53 41 1120 50 51
653__a    4              # 328 329 330 331
655_79    381            # 1300 1301 1302 1303 1304 1305 1306 1307 1308
655_7a    381            # 1300 1301 1302 1303 1304 1305 1306 1307 1310
6931_a    56             # 320 321 322 323 324 325 340 341 342
6931_e    56             # 320 321 322 323 324 325 340 341 342
693__a    2002           # 1200 1201 1202 1203 1120 40 1100 1101 1102
693__e    2002           # 1200 1201 1202 1203 1120 40 1100 1101 1102
700__a    20             # 50 203 204 205 206 207 208 209 210
700__h    2              # 450 451
700__i    2              # 450 451
700__u    3              # 329 450 451
710__a    2              # 53 59
770__a    381            # 1300 1301 1302 1303 1304 1305 1306 1307 1308
770__w    381            # 1300 1301 1302 1303 1304 1305 1306 1307 1308
772__a    454            # 1 2 3 4 5 6 7 8 9
772__o    39             # 554 609 700 201 606 202 602 600 601
772__w    39             # 554 609 700 201 606 202 602 600 601
777__a    4              # 201 202 230 231
777__w    4              # 201 202 230 231
787__a    10             # 200 233 101 553 552 550 551 249 251
787__n    2              # 550 551
787__o    1              # 101
787__u    2              # 550 551
787__w    8              # 200 233 249 251 550 551 552 553
787__y    2              # 550 551
8564_a    2              # 57 58
8564_s    1060           # 1120 40 1100 1101 1102 1103 1104 1105 1106
8564_u    1073           # 1120 40 1100 1101 1102 1103 1104 1105 1106
8564_y    12             # 40 328 329 60 50 51 52 53 54
8564_z    473            # 320 321 322 323 324 325 340 341 342
8567_2    493            # 1200 1201 1202 1203 1120 1100 1101 1102 1103
8567_s    493            # 1 2 3 4 5 6 7 8 9
8567_u    493            # 1 2 3 4 5 6 7 8 9
942__e    1909           # 600 601 602 603 604 605 606 607 608
942__r    30             # 600 601 602 603 604 605 606 607 608
960__c    34             # 1 13 2 3 4 5 12 11 9
964_0c    1906           # 201 202 200 101 553 552 550 250 249
980__a    2061           # 1200 1201 1202 1203 1204 1120 40 1100 1101
980__b    1488           # 40 60 600 601 602 603 604 605 606
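
Because the report is line-oriented, two overviews can be parsed and diffed to spot tags that appeared between runs. The sketch below is an illustration only: `parse_report` is a hypothetical helper, and the two inputs are short excerpts copied from the overviews in this thread.

```python
def parse_report(text):
    """Parse 'TAG  NB_RECORDS  # recIDs' lines into {tag: (count, [recids])}."""
    report = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and the header line
            continue
        left, _, right = line.partition("#")
        tag, count = left.split()
        report[tag] = (int(count), right.split())
    return report


# Excerpts from the earlier and the later overview.
OLDER = """\
# TAG     NB_RECORDS    # recID1 recID2 ... recID9 (example records)
556__u    1          # 250
556__y    1          # 250
"""
NEWER = """\
# TAG     NB_RECORDS    # recID1 recID2 ... recID9 (example records)
556__u    2              # 250 252
556__y    2              # 250 252
567__u    1              # 252
"""

older, newer = parse_report(OLDER), parse_report(NEWER)
for tag in sorted(set(newer) - set(older)):
    count, recids = newer[tag]
    print(tag, count, recids)
```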

@tiborsimko
Member Author

... and here are the tags that should be especially verified:

  • 005__
  • 035__9
  • 035__a
  • 037__a
  • 041__a
  • 269__a
  • 269__b
  • 269__c
  • 300__a
  • 516__u
  • 516__y
  • 538__u
  • 567__u
  • 593__a
  • 593__b
  • 6531_a
  • 653__a
  • 655_79
  • 655_7a
  • 6931_a
  • 6931_e
  • 770__a
  • 770__w
  • 787__u
  • 787__y
  • 942__r
  • 964_0c

Perhaps some of them are typos and the data should be corrected, but most of them are missing from opendata.cfg and should be added.

@AnxhelaDani
Contributor

6931_a
6931_e

According to the local field definitions in Invenio, both indicators should be undefined, so I'll change the 56 records to 693__a and 693__e.

942__r

I won't include it, as we decided to use the 964_0c field for run period.
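
The indicator change for 693 mentioned above amounts to blanking `ind1` and `ind2` on the affected data fields. The following is a minimal sketch, assuming the records are available as MARCXML without a namespace; it is not how the 56 records were actually updated, and `clear_indicators` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample record with a 6931_ field.
SAMPLE = """<record>
  <controlfield tag="001">320</controlfield>
  <datafield tag="693" ind1="1" ind2=" ">
    <subfield code="a">value-a</subfield>
    <subfield code="e">value-e</subfield>
  </datafield>
</record>"""


def clear_indicators(record, tag):
    """Blank both indicators on every datafield with the given tag.

    Returns the number of datafields that were actually changed.
    """
    changed = 0
    for df in record.findall("datafield"):
        if df.get("tag") != tag:
            continue
        if df.get("ind1", " ") != " " or df.get("ind2", " ") != " ":
            df.set("ind1", " ")
            df.set("ind2", " ")
            changed += 1
    return changed


record = ET.fromstring(SAMPLE)
print(clear_indicators(record, "693"))
print(ET.tostring(record, encoding="unicode"))
```

Returning the number of changed fields makes it easy to check that a batch run touched the expected number of records.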

tiborsimko pushed a commit that referenced this issue Apr 20, 2016
* Addition of `520__u` and `520__w` to the data model.
  (addresses #779) (PR #1084)

* Deletion of `653__a`, using only `6531_a`. The remaining 4 records
  will be updated later.

* Addition of schema with forcing list for `964_0c`.
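
"Forcing a list" for a field such as `964_0c` means that a single value and a repeated value both end up as a list, so templates and indexing code can treat them the same way. A minimal sketch of the idea follows; `force_list` is a hypothetical helper, not the data model's actual implementation.

```python
def force_list(value):
    """Return value as a list: None -> [], scalar -> [scalar], sequence -> list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# A field that occurs once and a field that occurs twice look the same afterwards.
print(force_list("2011"))
print(force_list(["2011", "2012"]))
```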