<article>
<articleinfo>
<title>MARC4J tutorial</title>
<copyright>
<year>2002-2006</year>
<holder>Bas Peters</holder>
</copyright>
</articleinfo>
<sect1>
<title>Introduction</title>
<para>This tutorial is for library programmers who want to learn to use MARC4J to process MARC and XML data. MARC4J (<ulink url="http://marc4j.tigris.org">http://marc4j.tigris.org</ulink>) is an open source software library for working with MARC records in Java, a popular platform-independent programming language. The MARC (Machine Readable Cataloging) format was originally designed to enable the exchange of bibliographic data between computer systems by providing a structure and format for the storage of bibliographic records on half-inch magnetic tape. Though today most records are transferred by other media, the exchange format has not changed since its first release in 1967 and is still widely used worldwide. At the same time, there is a growing interest in the use of XML in libraries, mainly because the Web is moving towards a platform- and application-independent interface for information services, with XML as its universal data format.</para>
<para>MARC4J is designed to bridge the gap between MARC and XML. The software library has built-in support for reading and writing MARC and MARC XML data. MARC XML is a simple XML schema for MARC data published by the Library of Congress. MARC4J also provides a "pipeline" to enable MARC records to go through further transformations using XSLT, for example to convert MARC records to MODS (Metadata Object Description Schema). This feature is particularly useful because there is currently no agreed-upon standard for XML in library applications.</para>
<para>Although MARC4J can be used as a command-line tool for conversions between MARC and XML, its main goal is to provide an Application Programming Interface (API) to develop any kind of Java program or servlet that involves reading or writing MARC data. The core piece is a MARC reader that hides the complexity of the MARC record by providing a simple interface to extract information from MARC records. Support for XML is implemented using the standard Java XML interfaces as specified in Sun's Java API for XML Processing (<ulink url="http://java.sun.com/webservices/jaxp/">JAXP</ulink>). By limiting itself to the JAXP API, MARC4J is XML processor-independent and easy to integrate into applications that build on industry standards such as SAX (Simple API for XML) or DOM (Document Object Model).</para>
</sect1>
<sect1>
<title>What you should already know</title>
<para>This tutorial assumes that you are interested in developing Java applications that involve MARC and XML. You have a basic understanding of a MARC format like <ulink url="http://www.loc.gov/marc/">MARC 21</ulink> or <ulink url="http://www.ifla.org/VI/3/p1996-1/sec-uni.htm">UNIMARC</ulink> and you are familiar with the basics of <ulink url="http://www.w3.org/XML/">XML</ulink> and related standards like XML Namespaces and XSLT. Working with MARC4J does not require exceptional skills in Java programming. The API is designed to be easy to learn and easy to use, so you should be able to get up and running with MARC4J quickly. If you have no experience with the Java programming language at all, start by getting familiar with the basic concepts of the language. Sun's <ulink url="http://java.sun.com/">Java Technology</ulink> site provides some good introductory tutorials on Java.</para>
</sect1>
<sect1>
<title>Getting the Software</title>
<para>You can download a MARC4J distribution at <ulink url="http://marc4j.tigris.org">http://marc4j.tigris.org</ulink>. On the project home page you can find a direct link to the distribution in the Download section. You can also find links to MARC4J distributions on the Documents &amp; Files page. A link to this page can be found in the Project tools menu. The latest version at the time of this writing is MARC4J 2.2. The download includes Javadoc documentation, source code and two JAR files: <filename>marc4j.jar</filename> and <filename>normalizer.jar</filename>. Add both files to your CLASSPATH environment variable.</para>
<note>
<para>Starting with release 2.0, MARC4J was completely rebuilt. The 2.0 and later releases are not compatible with older versions of MARC4J. The event-based parser was replaced by an easier-to-use interface that uses a simple iterator over a collection of MARC records.</para>
</note>
<para>MARC4J requires Sun JDK 1.4 or later because it uses the <classname>java.util.regex</classname> package (since version 2.1). The JDK already contains the JAXP and SAX2 compliant XML parser and XSLT processor required by MARC4J, but you can use a different implementation.</para>
</sect1>
<sect1>
<title>Reading MARC data</title>
<para>For reading MARC data, MARC4J provides implementations of an interface called <classname>org.marc4j.MarcReader</classname>. This interface has two methods that provide an iterator to read MARC data from an input source:</para>
<variablelist>
<varlistentry>
<term><methodname>hasNext()</methodname></term>
<listitem>
<para>Returns true if the iteration has more records, false otherwise.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><methodname>next()</methodname></term>
<listitem>
<para>Returns the next record in the iteration as a <classname>org.marc4j.marc.Record</classname> object.</para>
</listitem>
</varlistentry>
</variablelist>
<para>If you are familiar with the Java Collections Framework you have probably used iterators. For example, when you have a <classname>java.util.List</classname> you can access its items through a <classname>java.util.Iterator</classname> obtained from the <classname>List</classname> object:</para>
<programlisting lang="Java">
Iterator i = list.iterator();
while (i.hasNext()) {
    Object item = i.next();
    // do something with the item object
}
</programlisting>
<para>MARC4J provides two classes that implement <classname>MarcReader</classname>:</para>
<variablelist>
<varlistentry>
<term><classname>org.marc4j.MarcStreamReader</classname></term>
<listitem>
<para>An iterator over a collection of MARC records in ISO 2709 format.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><classname>org.marc4j.MarcXmlReader</classname></term>
<listitem>
<para>An iterator over a collection of MARC records in MARC XML format.</para>
</listitem>
</varlistentry>
</variablelist>
<para>Let's start with reading MARC records in ISO 2709 format. To do this we need to import some classes:</para>
<programlisting lang="Java">
import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.marc.Record;
</programlisting>
<para>The first two imports are required to read MARC data; the third is the class that represents a MARC record. We also need an input stream to read records from, for example:</para>
<programlisting lang="Java">
InputStream in = new FileInputStream("summerland.mrc");
</programlisting>
<para>We can then initialize the <classname>org.marc4j.MarcReader</classname> implementation with the given input stream:</para>
<programlisting lang="Java">
MarcReader reader = new MarcStreamReader(in);
</programlisting>
<para>And start reading records:</para>
<programlisting lang="Java">
while (reader.hasNext()) {
    Record record = reader.next();
}
</programlisting>
<para>If we simply want to examine the records we can write each record to standard output using the <methodname>toString()</methodname> method:</para>
<programlisting lang="Java">
System.out.println(record.toString());
</programlisting>
<para>Here is the complete program:</para>
<example>
<title>A first example</title>
<programlisting lang="Java">
import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.marc.Record;
import java.io.InputStream;
import java.io.FileInputStream;

public class ReadMarcExample {

    public static void main(String args[]) throws Exception {
        InputStream in = new FileInputStream("summerland.mrc");
        MarcReader reader = new MarcStreamReader(in);
        while (reader.hasNext()) {
            Record record = reader.next();
            System.out.println(record.toString());
        }
    }
}
</programlisting>
</example>
<para>When you compile and run this program it will write each record in tagged display format to standard output:</para>
<example>
<title>Output in tagged display format</title>
<programlisting>
LEADER 00714cam a2200205 a 4500
001 12883376
005 20030616111422.0
008 020805s2002    nyu    j      000 1 eng  
020   $a0786808772
020   $a0786816155 (pbk.)
040   $aDLC$cDLC$dDLC
100 1 $aChabon, Michael.
245 10$aSummerland /$cMichael Chabon.
250   $a1st ed.
260   $aNew York :$bMiramax Books/Hyperion Books for Children,$cc2002.
300   $a500 p. ;$c22 cm.
520   $aEthan Feld, the worst baseball player in the history of the game, finds
himself recruited by a 100-year-old scout to help a band of fairies triumph over
an ancient enemy.
650  1$aFantasy.
650  1$aBaseball$vFiction.
650  1$aMagic$vFiction.
</programlisting>
</example>
<para>If you know that your input stream contains only a single record, you can also simply read the record using the <methodname>next()</methodname> method:</para>
<programlisting>
MarcReader reader = new MarcStreamReader(input);
Record record = reader.next();
System.out.println(record.toString());
</programlisting>
<para>You can check if there is a record using an if statement:</para>
<programlisting>
MarcReader reader = new MarcStreamReader(input);
if (reader.hasNext()) {
    Record record = reader.next();
    System.out.println(record.toString());
} else {
    System.err.println("Reader has no record.");
}
</programlisting>
<para>This can be useful when a different class reads each single record as a byte stream. You can then create a <classname>java.io.ByteArrayInputStream</classname> using the constructor that takes a byte array as a parameter and use that to initialize the <classname>MarcReader</classname> implementation.</para>
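<para>A minimal sketch of that approach, using only the JDK: it assumes the records arrive concatenated in one byte array and cuts the array at the ISO 2709 record terminator (byte 0x1D). The class name <classname>RecordSplitter</classname> and its methods are illustrative and not part of MARC4J; trailing bytes without a terminator are ignored.</para>
<programlisting lang="Java"><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of MARC4J
class RecordSplitter {

    // ISO 2709 record terminator
    static final byte RECORD_TERMINATOR = 0x1D;

    /** Splits a buffer into one byte array per record, terminator included. */
    static List<byte[]> split(byte[] data) {
        List<byte[]> records = new ArrayList<byte[]>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == RECORD_TERMINATOR) {
                records.add(Arrays.copyOfRange(data, start, i + 1));
                start = i + 1;
            }
        }
        // bytes after the last terminator are not a complete record and are dropped
        return records;
    }

    /** Wraps a single record in a stream for a MarcReader implementation. */
    static InputStream asStream(byte[] record) {
        return new ByteArrayInputStream(record);
    }

    public static void main(String[] args) {
        // two dummy "records"; real input would be complete ISO 2709 records
        byte[] data = {'a', 0x1D, 'b', 'c', 0x1D};
        List<byte[]> records = split(data);
        System.out.println(records.size() + " records");
        // with MARC4J on the classpath:
        // MarcReader reader = new MarcStreamReader(asStream(records.get(0)));
    }
}
```
]]></programlisting>
<para>Each stream returned by <methodname>asStream()</methodname> can then be passed to the <classname>MarcStreamReader</classname> constructor shown earlier.</para>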
</sect1>
<sect1>
<title>The record object model</title>
<para>Now let's examine the <classname>org.marc4j.marc.Record</classname> class more closely. Basically, a <classname>Record</classname> object provides access to the leader and variable fields. For example, the following method returns the leader:</para>
<programlisting lang="Java">
Leader leader = record.getLeader();
</programlisting>
<para>The <classname>org.marc4j.marc.Leader</classname> class provides access to all the leader values. While the Leader represents mostly MARC structural information, some character positions provide bibliographic information. The method <methodname>getTypeOfRecord()</methodname> for example identifies the type of material being catalogued, such as map, musical sound recording, or projected medium.</para>
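<para>As an illustration of what these character positions hold, the sketch below reads a few values from the leader string of the example record with plain string operations. The positions follow MARC 21; the class <classname>LeaderPositions</classname> is a hypothetical helper, not MARC4J code, and in practice you would call the accessor methods of <classname>Leader</classname> instead.</para>
<programlisting lang="Java"><![CDATA[
```java
// Hypothetical helper that reads MARC 21 leader positions from a 24-character string
class LeaderPositions {

    /** Record length, leader positions 00-04. */
    static int recordLength(String leader) {
        return Integer.parseInt(leader.substring(0, 5));
    }

    /** Type of record, leader position 06 ('a' is language material). */
    static char typeOfRecord(String leader) {
        return leader.charAt(6);
    }

    /** Bibliographic level, leader position 07 ('m' is monograph). */
    static char bibliographicLevel(String leader) {
        return leader.charAt(7);
    }

    /** Base address of data, leader positions 12-16. */
    static int baseAddressOfData(String leader) {
        return Integer.parseInt(leader.substring(12, 17));
    }

    public static void main(String[] args) {
        // leader of the Summerland record
        String leader = "00714cam a2200205 a 4500";
        System.out.println("Record length: " + recordLength(leader));
        System.out.println("Type of record: " + typeOfRecord(leader));
        System.out.println("Bibliographic level: " + bibliographicLevel(leader));
        System.out.println("Base address of data: " + baseAddressOfData(leader));
    }
}
```
]]></programlisting>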
<para>There are several methods available to retrieve variable fields. The <methodname>getVariableFields()</methodname> method for example returns all variable fields as a <classname>java.util.List</classname>, but in most cases you will use methods that provide more control. The following method for example returns all control fields:</para>
<programlisting lang="Java">
// returns fields for tags 001 through 009
List fields = record.getControlFields();
</programlisting>
<para>And this method returns all data fields:</para>
<programlisting lang="Java">
// returns fields for tags 010 through 999
List fields = record.getDataFields();
</programlisting>
<para>For control fields MARC4J does not provide you with the level of detail you might expect. You can retrieve the tag and the data, but to retrieve specific data elements at character positions you need to use some standard Java. This is because MARC4J is designed to handle different MARC formats like MARC 21 and UNIMARC. To retrieve the language of the item in a MARC 21 record, for example, you should do something like this:</para>
<programlisting lang="Java">
// get control field with tag 008
ControlField field = (ControlField) record.getVariableField("008");
String data = field.getData();
// the three-character MARC language code takes character positions 35-37
String lang = data.substring(35,38);
System.out.println("Language: " + lang);
</programlisting>
<para>For our example record this would produce the following output: </para>
<programlisting>
Language: eng
</programlisting>
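<para>When several positions are needed, a small bounds-checked helper avoids a <classname>StringIndexOutOfBoundsException</classname> on short or damaged fields. The sketch below is plain Java with a hypothetical helper class; the positions for Date 1 (07-10), place of publication (15-17) and language (35-37) are those of MARC 21 field 008, and the sample value is assumed to be the full 40-character field of the example record.</para>
<programlisting lang="Java"><![CDATA[
```java
// Hypothetical helper for fixed-position data elements in a control field
class Field008Positions {

    // assumed 40-character 008 field of the Summerland record
    static final String SAMPLE = "020805s2002    nyu    j      000 1 eng  ";

    /** Returns positions start..end (both inclusive), or "" if the data is too short. */
    static String positions(String data, int start, int end) {
        if (data == null || start < 0 || end < start || data.length() <= end) {
            return "";
        }
        return data.substring(start, end + 1);
    }

    public static void main(String[] args) {
        System.out.println("Date 1: " + positions(SAMPLE, 7, 10));
        System.out.println("Place: " + positions(SAMPLE, 15, 17));
        System.out.println("Language: " + positions(SAMPLE, 35, 37));
    }
}
```
]]></programlisting>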
<para>For the control number field MARC4J provides two specific methods. Use <methodname>getControlNumberField()</methodname> to retrieve the control number object for tag 001, or use <methodname>getControlNumber()</methodname> to retrieve the control number as a <classname>String</classname> object.</para>
<para>The previous example also showed how you can retrieve variable fields for a given tag using the <methodname>getVariableField(String tag)</methodname> method. If you want to retrieve specific fields you can use one of the following methods:</para>
<programlisting lang="Java">
// get the first field occurrence for a given tag
DataField title = (DataField) record.getVariableField("245");
// get all occurrences for a particular tag
List subjects = record.getVariableFields("650");
// get all occurrences for a given list of tags
String[] tags = {"010", "100", "245", "250", "260", "300"};
List fields = record.getVariableFields(tags);
</programlisting>
<para>These methods return <classname>org.marc4j.marc.VariableField</classname> objects, so if you need to access specific methods, like <methodname>getData()</methodname> for a control field, you need to cast the variable field to a <classname>org.marc4j.marc.ControlField</classname> or <classname>org.marc4j.marc.DataField</classname>. A <classname>DataField</classname> is slightly more complex than a control field since it has indicators and subfields. The following example retrieves the title information field and writes the tag, indicators and subfields to standard output:</para>
<programlisting lang="Java">
DataField field = (DataField) record.getVariableField("245");
String tag = field.getTag();
char ind1 = field.getIndicator1();
char ind2 = field.getIndicator2();
System.out.println("Tag: " + tag + " Indicator 1: " + ind1 + " Indicator 2: " + ind2);
List subfields = field.getSubfields();
Iterator i = subfields.iterator();
while (i.hasNext()) {
    Subfield subfield = (Subfield) i.next();
    char code = subfield.getCode();
    String data = subfield.getData();
    System.out.println("Subfield code: " + code + " Data element: " + data);
}
</programlisting>
<para>For our record for <emphasis>Summerland</emphasis> by Michael Chabon this would produce the following output:</para>
<programlisting>
Tag: 245 Indicator 1: 1 Indicator 2: 0
Subfield code: a Data element: Summerland /
Subfield code: c Data element: Michael Chabon.
</programlisting>
<para>The <classname>org.marc4j.marc.DataField</classname> class also provides some methods to retrieve specific subfields:</para>
<programlisting>
// retrieve the first occurrence of subfield with code 'a'
Subfield subfield = field.getSubfield('a');
// retrieve all subfields with code 'a'
List subfields = field.getSubfields('a');
</programlisting>
<para>The following code snippet uses <methodname>getSubfield(char code)</methodname> to retrieve the title proper. It then removes the non-sort characters:</para>
<programlisting>
// get data field 245
DataField field = (DataField) record.getVariableField("245");
// get the second indicator (the number of non-sorting characters) as a char value
char ind2 = field.getIndicator2();
// get the title proper
Subfield subfield = field.getSubfield('a');
String title = subfield.getData();
// remove the non-sorting characters; Character.digit() returns -1
// when the indicator is not a digit (for example a blank)
int nonSort = Character.digit(ind2, 10);
if (nonSort > 0) {
    title = title.substring(nonSort);
}
</programlisting>
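<para>The same logic can be wrapped in a small helper that also copes with a blank indicator or a title shorter than the indicator value. This is a sketch in plain Java; <classname>NonSortCharacters</classname> and <methodname>stripNonSort()</methodname> are hypothetical names, and the sample titles are only test data.</para>
<programlisting lang="Java"><![CDATA[
```java
// Hypothetical helper, not part of MARC4J
class NonSortCharacters {

    /**
     * Strips the leading non-sorting characters from a title.
     * ind2 is the second indicator of field 245, normally a digit '0'-'9'.
     */
    static String stripNonSort(String title, char ind2) {
        int nonSort = Character.digit(ind2, 10); // -1 if ind2 is not a digit
        if (nonSort <= 0 || nonSort > title.length()) {
            return title;
        }
        return title.substring(nonSort);
    }

    public static void main(String[] args) {
        // "The " is four characters long, so the indicator is '4'
        System.out.println(stripNonSort("The amazing adventures of Kavalier and Clay /", '4'));
        System.out.println(stripNonSort("Summerland /", '0'));
    }
}
```
]]></programlisting>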
<para>In addition to retrieving fields by tag name, you can also retrieve fields by data element values using the <methodname>find()</methodname> methods. The search capabilities are limited, but they can be useful when processing records. The following code snippet provides some basic examples:</para>
<programlisting lang="Java">
// find any field containing 'Chabon'
List fields = record.find("Chabon");
// find 'Summerland' in a title field
fields = record.find("245", "Summerland");
// find 'Graham, Paul' in a main entry or subject added entry for a personal name
String[] tags = {"100", "600"};
fields = record.find(tags, "Graham, Paul");
</programlisting>
<para>The <methodname>find()</methodname> method is also useful if you want to retrieve records that meet certain criteria, such as a specific control number, title words or a particular publisher or subject. The example below checks if the cataloging agency is DLC. It also shows how you can extend the find capabilities to specific subfields. This feature is not directly available in MARC4J, but it is easy to accomplish using the record object model together with the standard Java APIs.</para> <example>
<title>A check agency program</title>
<programlisting lang="Java">
import java.io.InputStream;
import java.io.FileInputStream;
import java.util.List;
import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.marc.Record;
import org.marc4j.marc.DataField;
import org.marc4j.marc.Subfield;

public class CheckAgencyExample {

    public static void main(String args[]) throws Exception {
        InputStream input = new FileInputStream("file.mrc");
        MarcReader reader = new MarcStreamReader(input);
        while (reader.hasNext()) {
            Record record = reader.next();
            // check if the cataloging agency is DLC
            List result = record.find("040", "DLC");
            if (result.size() > 0) {
                System.out.println("Agency for this record is DLC");
                // there is no find method for a specific subfield, so check
                // subfield a to see if DLC is the original cataloging agency
                DataField field = (DataField) result.get(0);
                Subfield agency = field.getSubfield('a');
                if (agency != null && "DLC".equals(agency.getData())) {
                    System.out.println("DLC is the original agency");
                }
            }
        }
    }
}
</programlisting>
</example>
<para>By using <methodname>find()</methodname> you can also implement a kind of search and replace to batch update records that meet certain criteria. You can use Java regular expressions in <methodname>find()</methodname> methods. Check the <package>java.util.regex</package> package for more information and examples.</para>
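<para>For readers new to <package>java.util.regex</package>, the sketch below contrasts a substring match with a whole-string match and shows a simple search and replace on a data element. It uses only the JDK; the class <classname>RegexOnSubfieldData</classname> is hypothetical, and how MARC4J applies a pattern inside <methodname>find()</methodname> is not shown here, so check the Javadoc before relying on either behaviour.</para>
<programlisting lang="Java"><![CDATA[
```java
import java.util.regex.Pattern;

// Hypothetical helper working on plain strings, for example subfield data
class RegexOnSubfieldData {

    /** True if the pattern occurs anywhere in the data (substring semantics). */
    static boolean contains(String data, String regex) {
        return Pattern.compile(regex).matcher(data).find();
    }

    /** Replaces every match of the pattern in the data element. */
    static String replace(String data, String regex, String replacement) {
        return Pattern.compile(regex).matcher(data).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String publisher = "Miramax Books/Hyperion Books for Children,";
        // substring match: true
        System.out.println(contains(publisher, "Hyperion"));
        // String.matches() requires the whole string to match: false
        System.out.println(publisher.matches("Hyperion"));
        // turn a copyright date such as "c2002." into "2002"
        System.out.println(replace("c2002.", "^c(\\d{4})\\.?$", "$1"));
    }
}
```
]]></programlisting>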
</sect1>
<sect1>
<title>Creating and updating records</title>
<para>You can also use the record object model to create or update records. This is done using the <classname>org.marc4j.marc.MarcFactory</classname>. For example:</para>
<programlisting lang="Java">
// create a factory instance
MarcFactory factory = MarcFactory.newInstance();
// create a record with leader
Record record = factory.newRecord("00000cam a2200000 a 4500");
// add a control field
record.addVariableField(factory.newControlField("001", "12883376"));
// add a data field
DataField df = factory.newDataField("245", '1', '0');
df.addSubfield(factory.newSubfield('a', "Summerland /"));
df.addSubfield(factory.newSubfield('c', "Michael Chabon."));
record.addVariableField(df);
</programlisting>
<para>You can then use a <classname>org.marc4j.MarcWriter</classname> implementation to serialize your records, for example to MARC in ISO 2709 format or to MARC XML. The code snippet below writes a single record in ISO 2709 format to standard output:</para>
<programlisting lang="Java">
MarcWriter writer = new MarcStreamWriter(System.out);
writer.write(record);
writer.close();
</programlisting>
</sect1>
<sect1>
<title>Reading MARC XML data</title>
<para>Until now we have been processing MARC data in ISO 2709 format, but you can also read MARC data in <ulink url="http://www.loc.gov/standards/marcxml/">MARC XML</ulink> format. The MARC 21 XML schema was published in June 2002 by the Library of Congress to encourage the standardization of MARC 21 records in the XML environment. The schema was developed in collaboration with OCLC and RLG after a survey of schemas that were used in various projects trying to bridge the gap between MARC and XML, including a MARC XML schema developed by the OAI (Open Archives Initiative) and the one used in early versions of MARC4J, published as James (Java MARC Events). The MARC XML schema is specified in a W3C XML Schema and provides lossless conversion between MARC ISO 2709 and MARC XML. Because the conversion is lossless, a MARC ISO 2709 record can be recreated from a MARC XML record without loss of data. This is the record for <emphasis>Summerland</emphasis> by Michael Chabon in MARC XML:</para>
<example>
<title>MARC XML record</title>
<programlisting><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<leader>00714cam a2200205 a 4500</leader>
<controlfield tag="001">12883376</controlfield>
<controlfield tag="005">20030616111422.0</controlfield>
<controlfield tag="008">020805s2002    nyu    j      000 1 eng  </controlfield>
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">0786808772</subfield>
</datafield>
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">0786816155 (pbk.)</subfield>
</datafield>
<datafield tag="040" ind1=" " ind2=" ">
<subfield code="a">DLC</subfield>
<subfield code="c">DLC</subfield>
<subfield code="d">DLC</subfield>
</datafield>
<datafield tag="100" ind1="1" ind2=" ">
<subfield code="a">Chabon, Michael.</subfield>
</datafield>
<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">Summerland /</subfield>
<subfield code="c">Michael Chabon.</subfield>
</datafield>
<datafield tag="250" ind1=" " ind2=" ">
<subfield code="a">1st ed.</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="a">New York :</subfield>
<subfield code="b">Miramax Books/Hyperion Books for Children,</subfield>
<subfield code="c">c2002.</subfield>
</datafield>
<datafield tag="300" ind1=" " ind2=" ">
<subfield code="a">500 p. ;</subfield>
<subfield code="c">22 cm.</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">Ethan Feld, the worst baseball player in the history of the game, finds himself
recruited by a 100-year-old scout to help a band of fairies triumph over an ancient enemy.</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="1">
<subfield code="a">Fantasy.</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="1">
<subfield code="a">Baseball</subfield>
<subfield code="v">Fiction.</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="1">
<subfield code="a">Magic</subfield>
<subfield code="v">Fiction.</subfield>
</datafield>
</record>
</collection>]]>
</programlisting>
</example>
<para>Reading MARC XML data is no different from reading MARC data in ISO 2709 format, but the MARC XML reader provides some additional XML-related features. Here is our first example again, this time reading a file containing records in MARC XML format:</para>
<example>
<title>Reading MARC XML</title>
<programlisting lang="Java">
import org.marc4j.MarcReader;
import org.marc4j.MarcXmlReader;
import org.marc4j.marc.Record;
import java.io.InputStream;
import java.io.FileInputStream;

public class ReadMarcXmlExample {

    public static void main(String args[]) throws Exception {
        InputStream in = new FileInputStream("summerland.xml");
        MarcReader reader = new MarcXmlReader(in);
        while (reader.hasNext()) {
            Record record = reader.next();
            System.out.println(record.toString());
        }
    }
}
</programlisting>
</example>
<para>When you compile and run this program it will write each record in tagged display format to standard output:</para>
<example>
<title>Output from MARC XML in tagged display format</title>
<programlisting>
LEADER 00714cam a2200205 a 4500
001 12883376
005 20030616111422.0
008 020805s2002    nyu    j      000 1 eng  
020   $a0786808772
020   $a0786816155 (pbk.)
040   $aDLC$cDLC$dDLC
100 1 $aChabon, Michael.
245 10$aSummerland /$cMichael Chabon.
250   $a1st ed.
260   $aNew York :$bMiramax Books/Hyperion Books for Children,$cc2002.
300   $a500 p. ;$c22 cm.
520   $aEthan Feld, the worst baseball player in the history of the game, finds
himself recruited by a 100-year-old scout to help a band of fairies triumph over
an ancient enemy.
650  1$aFantasy.
650  1$aBaseball$vFiction.
650  1$aMagic$vFiction.
</programlisting>
</example>
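<para>Since MARC XML is ordinary namespace-qualified XML, the same data can also be inspected with the JAXP interfaces that ship with the JDK, without MARC4J. The following DOM-based sketch looks up the first subfield for a given tag and code; the class <classname>MarcXmlWithDom</classname> and its method are hypothetical, the sample document is a shortened version of the record above, and the code is written for a current JDK rather than JDK 1.4.</para>
<programlisting lang="Java"><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper that reads MARC XML with plain JAXP/DOM
class MarcXmlWithDom {

    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    // shortened version of the Summerland record
    static final String SAMPLE =
        "<collection xmlns=\"http://www.loc.gov/MARC21/slim\"><record>"
      + "<datafield tag=\"245\" ind1=\"1\" ind2=\"0\">"
      + "<subfield code=\"a\">Summerland /</subfield>"
      + "<subfield code=\"c\">Michael Chabon.</subfield>"
      + "</datafield></record></collection>";

    /** Returns the data of the first subfield with the given tag and code, or null. */
    static String firstSubfield(String xml, String tag, String code) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // MARC XML elements live in a namespace
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = doc.getElementsByTagNameNS(MARC_NS, "datafield");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (!tag.equals(field.getAttribute("tag"))) {
                    continue;
                }
                NodeList subfields = field.getElementsByTagNameNS(MARC_NS, "subfield");
                for (int j = 0; j < subfields.getLength(); j++) {
                    Element subfield = (Element) subfields.item(j);
                    if (code.equals(subfield.getAttribute("code"))) {
                        return subfield.getTextContent();
                    }
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse MARC XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstSubfield(SAMPLE, "245", "a"));
    }
}
```
]]></programlisting>
<para>For anything beyond a quick lookup, the <classname>MarcXmlReader</classname> shown above is the more convenient route, because it returns complete <classname>Record</classname> objects.</para>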
</sect1>
<sect1>
<title>Reading MODS data</title>
<para>Now let's look at the specific XML-related features of <classname>org.marc4j.MarcXmlReader</classname>. Probably the most interesting feature is that you can pre-process the input using a stylesheet. This makes it possible to create a stylesheet in XSLT that transforms some kind of XML data to MARC XML. You can then process the result as you would MARC XML or MARC in ISO 2709 format. The Library of Congress, for example, provides a stylesheet that transforms MODS (Metadata Object Description Schema) to MARC XML. MODS is a schema for a bibliographic element set that is maintained by the Library of Congress. The schema provides a subset of the MARC standard, but an advantage over the MARC XML format is that it uses language-based tags rather than numeric ones. A bibliographic record for <emphasis>Summerland</emphasis> by Michael Chabon in MODS looks like this:</para>
<example>
<title>MODS record</title>
<programlisting>
<![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods version="3.0">
<titleInfo>
<title>Summerland</title>
</titleInfo>
<name type="personal">
<namePart>Chabon, Michael.</namePart>
<role>
<roleTerm authority="marcrelator" type="text">creator</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<originInfo>
<place>
<placeTerm type="code" authority="marccountry">nyu</placeTerm>
</place>
<place>
<placeTerm type="text">New York</placeTerm>
</place>
<publisher>Miramax Books/Hyperion Books for Children</publisher>
<dateIssued>c2002</dateIssued>
<dateIssued encoding="marc">2002</dateIssued>
<edition>1st ed.</edition>
<issuance>monographic</issuance>
</originInfo>
<language>
<languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</language>
<physicalDescription>
<form authority="marcform">print</form>
<extent>500 p. ; 22 cm.</extent>
</physicalDescription>
<abstract>Ethan Feld, the worst baseball player in the history of the game, finds
himself recruited by a 100-year-old scout to help a band of fairies triumph over
an ancient enemy.</abstract>
<targetAudience authority="marctarget">juvenile</targetAudience>
<note type="statement of responsibility">Michael Chabon.</note>
<subject>
<topic>Fantasy</topic>
</subject>
<subject>
<topic>Baseball</topic>
<topic>Fiction</topic>
</subject>
<subject>
<topic>Magic</topic>
<topic>Fiction</topic>
</subject>
<identifier type="isbn">0786808772</identifier>
<identifier type="isbn">0786816155 (pbk.)</identifier>
<recordInfo>
<recordContentSource authority="marcorg">DLC</recordContentSource>
<recordCreationDate encoding="marc">020805</recordCreationDate>
<recordChangeDate encoding="iso8601">20030616111422.0</recordChangeDate>
<recordIdentifier>12883376</recordIdentifier>
</recordInfo>
</mods>
</modsCollection>
]]>
</programlisting>
</example>
<para>By using a <ulink url="http://www.loc.gov/standards/marcxml/xslt/MODS2MARC21slim.xsl">stylesheet</ulink> available from The Library of Congress you can process the bibliographic information contained in a collection of MODS records as MARC data:</para>
<example>
<title>Reading MODS data</title>
<programlisting lang="Java">
import org.marc4j.MarcReader;
import org.marc4j.MarcXmlReader;
import org.marc4j.marc.Record;
import java.io.InputStream;
import java.io.FileInputStream;

public class ModsToMarc21Example {

    public static void main(String args[]) throws Exception {
        InputStream in = new FileInputStream("mods.xml");
        MarcXmlReader reader = new MarcXmlReader(in, "http://www.loc.gov/standards/marcxml/xslt/MODS2MARC21slim.xsl");
        while (reader.hasNext()) {
            Record record = reader.next();
            System.out.println(record.toString());
        }
    }
}
</programlisting>
</example>
<para>When you compile and run this program it will write each record in tagged display format to standard output:</para>
<example>
<title>Output from MODS data in tagged format</title>
<programlisting>
LEADER 00000nam  2200000uu 4500
001 12883376
005 20030616111422.0
008 020805|2002    nyu||||j |||||||||||eng||
020   $a0786808772
020   $a0786816155 (pbk.)
040   $aDLC
100 1 $aChabon, Michael.$ecreator
245 10$aSummerland$cMichael Chabon.
250   $a1st ed.
260   $aNew York$bMiramax Books/Hyperion Books for Children$cc2002$c2002
300   $a500 p. ; 22 cm.
520   $aEthan Feld, the worst baseball player in the history of the game, finds
himself recruited by a 100-year-old scout to help a band of fairies triumph over
an ancient enemy.
650 1 $aFantasy
650 1 $aBaseball$xFiction
650 1 $aMagic$xFiction
</programlisting>
</example>
<para>The stylesheet first transforms the MODS record to MARC XML and the XSLT output is then parsed by the <classname>org.marc4j.MarcXmlReader</classname>.</para>
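<para>The two steps described above, an XSLT transformation followed by a parse of the XSLT output, can be sketched with the JDK's own XML classes. This is only an illustration of the pattern, not the MARC4J implementation: the class name, the tiny stylesheet and the <sgmltag>field</sgmltag> element are hypothetical and stand in for the real MODS stylesheet and MARC XML.</para>
<example>
<title>Sketch of a transform-then-parse pipeline (JDK classes only)</title>
<programlisting lang="Java"><![CDATA[
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TwoStepTransformSketch {

    // Hypothetical stylesheet: turns every title element into a field element.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"title\">"
        + "<field><xsl:value-of select=\".\"/></field>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Step 1: run the stylesheet over the input and return the XSLT output.
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
            new StreamResult(out));
        return out.toString();
    }

    // Step 2: parse the intermediate XML and read the text of its root element.
    static String parseField(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String intermediate = transform("<title>Summerland</title>");
        System.out.println(intermediate);
        System.out.println(parseField(intermediate));
    }
}
]]>
</programlisting>
</example>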
</sect1>
<sect1>
<title>Writing MARC data</title>
<para>For writing MARC data, MARC4J provides the <classname>org.marc4j.MarcWriter</classname> interface. This interface provides two important methods:</para>
<variablelist>
<varlistentry>
<term><methodname>write(Record record)</methodname></term>
<listitem>
<para>Writes a single <classname>org.marc4j.marc.Record</classname> to the output stream.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><methodname>close()</methodname></term>
<listitem>
<para>Closes the writer.</para>
</listitem>
</varlistentry>
</variablelist>
<para>Let's look at an example. The following program reads the record for <emphasis>Summerland</emphasis> and writes the same record back in ISO 2709 format:</para>
<example>
<title>Write MARC in ISO 2709</title>
<programlisting>
import java.io.FileInputStream;
import java.io.InputStream;

import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.MarcStreamWriter;
import org.marc4j.MarcWriter;
import org.marc4j.marc.Record;

public class WriteMarcExample {

    public static void main(String args[]) throws Exception {
        InputStream input = new FileInputStream("summerland.mrc");
        MarcReader reader = new MarcStreamReader(input);
        MarcWriter writer = new MarcStreamWriter(System.out);
        // Copy every record from the input file to standard output.
        while (reader.hasNext()) {
            Record record = reader.next();
            writer.write(record);
        }
        writer.close();
    }
}
</programlisting>
</example>
<para>Make sure that you close the <classname>MarcWriter</classname> using the <methodname>close()</methodname> method.</para>
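<para>The general reason why closing matters for a stream-based writer can be illustrated with plain <classname>java.io</classname> classes. The sketch below does not use MARC4J: the class name is hypothetical and a <classname>BufferedWriter</classname> stands in for a writer that buffers its output. It reports how many bytes reached the underlying stream before and after <methodname>close()</methodname>, and uses <literal>try</literal>/<literal>finally</literal> so the writer is closed even when writing fails.</para>
<example>
<title>Sketch of why a writer must be closed (JDK classes only)</title>
<programlisting lang="Java"><![CDATA[
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class CloseWriterSketch {

    // Writes one short line through a buffered writer and returns the number
    // of bytes in the underlying stream before and after close().
    static int[] bytesBeforeAndAfterClose() throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer writer = new BufferedWriter(
            new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        int before;
        try {
            writer.write("245 10$aSummerland");
            before = sink.size(); // the text is still held in the buffer
        } finally {
            writer.close();       // flushes the buffer and closes the stream
        }
        return new int[] { before, sink.size() };
    }

    public static void main(String[] args) throws Exception {
        int[] sizes = bytesBeforeAndAfterClose();
        System.out.println("before close: " + sizes[0] + " bytes");
        System.out.println("after close: " + sizes[1] + " bytes");
    }
}
]]>
</programlisting>
</example>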
<para>To write the same record as MARC XML:</para>
<example>
<title>Write MARC in MARC XML format</title>
<programlisting>
import java.io.FileInputStream;
import java.io.InputStream;

import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.MarcWriter;
import org.marc4j.MarcXmlWriter;
import org.marc4j.marc.Record;

public class Marc2MarcXmlExample {

    public static void main(String args[]) throws Exception {
        InputStream input = new FileInputStream("summerland.mrc");
        MarcReader reader = new MarcStreamReader(input);
        // The second argument turns on indentation of the XML output.
        MarcWriter writer = new MarcXmlWriter(System.out, true);
        while (reader.hasNext()) {
            Record record = reader.next();
            writer.write(record);
        }
        writer.close();
    }
}
</programlisting>
</example>
<para>You can also convert in the other direction: use a <classname>org.marc4j.MarcXmlReader</classname> to read MARC XML data and a <classname>org.marc4j.MarcStreamWriter</classname> to write the records in ISO 2709 format.</para>
</sect1>
<sect1>
<title>Perform character conversions</title>
<para>When serializing <classname>Record</classname> objects you can perform character conversions. This feature is especially important when you convert MARC data between ISO 2709 and MARC XML formats. Most MARC formats use specific character sets and MARC4J is able to convert some of them to UCS/Unicode and back. Converters are available for the following encodings:</para>
<simplelist>
<member>MARC-8</member>
<member>ISO 5426</member>
<member>ISO 6937</member>
</simplelist>
<para>Using the converters is not difficult, but there are some things to remember. MARC4J reads and writes ISO 2709 records as binary data, but data elements in control fields and subfields are converted to <classname>String</classname> values. When Java converts a byte array to a <classname>String</classname> it needs a character encoding. Java can use a default character encoding, but this might not always be the right encoding to use. Therefore both <classname>org.marc4j.MarcReader</classname> and <classname>org.marc4j.MarcWriter</classname> implementations let you specify a character encoding when constructing a new instance. If you do not provide a character encoding, the following defaults are used:</para>
<table>
<title>Character encodings in MARC4J</title>
<tgroup cols="2">
<thead>
<row>
<entry valign="top">Class name</entry>
<entry>Encoding</entry>
</row>
</thead>
<tbody>
<row>
<entry valign="top"><classname>org.marc4j.MarcStreamReader</classname></entry>
<entry>
<para>Tries to detect the encoding from the <classname>org.marc4j.marc.Leader</classname> by reading the character encoding scheme in the leader using the <methodname>getCharEncoding()</methodname> method. You can override the value when instantiating a <classname>MarcStreamReader</classname>:</para>
<programlisting>
MarcReader reader = new MarcStreamReader(input, "UTF8");
</programlisting>
</entry>
</row>
<row>
<entry valign="top"><classname>org.marc4j.MarcXmlReader</classname></entry>
<entry>
<para>Relies on the underlying XML parser implementation. Normally you would provide the encoding in the XML declaration of the input file:</para>
<programlisting><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>]]>
</programlisting>
</entry>
</row>
<row>
<entry valign="top"><classname>org.marc4j.MarcStreamWriter</classname></entry>
<entry>
<para>Uses ISO 8859-1 (Latin 1) by default as an 8-bit character set, since encodings such as MARC-8 are not supported by Java. You can override the value when instantiating a <classname>MarcStreamWriter</classname>:</para>
<programlisting>
MarcWriter writer = new MarcStreamWriter(output, "UTF8");
</programlisting>
</entry>
</row>
<row>
<entry valign="top"><classname>org.marc4j.MarcXmlWriter</classname></entry>
<entry>
<para>Uses UTF-8 by default. You can override the value when instantiating a <classname>MarcXmlWriter</classname>:</para>
<programlisting>
MarcWriter writer = new MarcXmlWriter(output, "UTF8");
</programlisting>
<para>For the encoding in the XML declaration MARC4J relies on the underlying parser.</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>Check the Java <ulink url="http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html">supported encodings</ulink> for the canonical name to use for a specific encoding.</para>
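<para>The effect of choosing an encoding when bytes are converted to a <classname>String</classname> can be sketched with JDK classes alone. The class and method names below are hypothetical and this is not MARC4J's detection logic. The sketch assumes the MARC 21 convention that leader position 09 holds <literal>a</literal> for UCS/Unicode and a blank for MARC-8, and it uses ISO 8859-1 as a stand-in for MARC-8, which Java does not support.</para>
<example>
<title>Sketch of decoding bytes according to the leader (JDK classes only)</title>
<programlisting lang="Java"><![CDATA[
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodingSketch {

    // Leader position 09: 'a' means UCS/Unicode, a blank means MARC-8.
    // ISO 8859-1 stands in for MARC-8 here (an assumption of this sketch).
    static Charset charsetFor(String leader) {
        return leader.charAt(9) == 'a'
            ? StandardCharsets.UTF_8
            : StandardCharsets.ISO_8859_1;
    }

    // Converts raw field data to a String using the encoding the leader implies.
    static String decode(byte[] data, String leader) {
        return new String(data, charsetFor(leader));
    }

    public static void main(String[] args) {
        // The two bytes that encode U+00E9 (e with acute) in UTF-8.
        byte[] data = { (byte) 0xC3, (byte) 0xA9 };
        String unicodeLeader = "00000nam a2200000uu 4500";
        String marc8Leader = "00000nam  2200000uu 4500";
        // The same bytes give one character or two, depending on the encoding.
        System.out.println(decode(data, unicodeLeader).length());
        System.out.println(decode(data, marc8Leader).length());
    }
}
]]>
</programlisting>
</example>
<para>Decoding the same two bytes with the wrong encoding does not fail. It silently produces different characters, which is why it is worth stating the encoding explicitly when you know it.</para>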
<para>Now let's look at some examples. The following program reads ISO 2709 records using the default encoding and writes the records in ISO 2709 performing a MARC-8 to UCS/Unicode conversion:</para>
<example>
<title>Convert MARC-8 to UCS/Unicode in ISO 2709</title>
<programlisting>
import java.io.FileInputStream;
import java.io.InputStream;

import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.MarcStreamWriter;
import org.marc4j.MarcWriter;
import org.marc4j.converter.impl.AnselToUnicode;
import org.marc4j.marc.Record;

public class Marc8ToUnicodeExample {

    public static void main(String args[]) throws Exception {
        InputStream input = new FileInputStream("summerland.mrc");
        MarcReader reader = new MarcStreamReader(input);
        // Write UTF-8 instead of the default Latin 1 encoding.
        MarcWriter writer = new MarcStreamWriter(System.out, "UTF8");
        // Convert MARC-8 data to UCS/Unicode while writing.
        AnselToUnicode converter = new AnselToUnicode();
        writer.setConverter(converter);
        while (reader.hasNext()) {
            Record record = reader.next();
            writer.write(record);
        }
        writer.close();
    }
}
</programlisting>
</example>
<para>Since <classname>MarcStreamWriter</classname> uses the Latin-1 character encoding by default, we instantiate the writer with the UTF-8 character encoding.</para>
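<para>Why the output encoding matters after a conversion to Unicode can be sketched with JDK classes alone. The example below is independent of MARC4J and its class name is hypothetical. It encodes a string and decodes it again, once with UTF-8 and once with Latin 1. Characters that Latin 1 cannot represent are replaced during encoding and cannot be recovered.</para>
<example>
<title>Sketch of data loss when writing Unicode text as Latin 1 (JDK classes only)</title>
<programlisting lang="Java"><![CDATA[
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLossSketch {

    // Encodes the text with the given charset and decodes the bytes again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // The city name Lodz in Polish spelling. The first and the last
        // letter (U+0141 and U+017A) do not exist in Latin 1.
        String name = "\u0141\u00f3d\u017a";
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1).equals(name));
    }
}
]]>
</programlisting>
</example>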
<para>To convert ISO 2709 in MARC-8 to MARC XML in UCS/Unicode:</para>
<example>
<title>Convert MARC-8 in ISO 2709 to MARC XML in UCS/Unicode</title>
<programlisting>
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.MarcWriter;
import org.marc4j.MarcXmlWriter;
import org.marc4j.converter.impl.AnselToUnicode;
import org.marc4j.marc.Record;

public class Marc8ToMarcXmlExample {

    public static void main(String args[]) throws Exception {
        InputStream input = new FileInputStream("summerland.mrc");
        OutputStream out = new FileOutputStream("summerland.xml");
        MarcReader reader = new MarcStreamReader(input);
        // MarcXmlWriter uses UTF-8 by default; true turns on indentation.
        MarcWriter writer = new MarcXmlWriter(out, true);
        // Convert MARC-8 data to UCS/Unicode while writing.
        AnselToUnicode converter = new AnselToUnicode();
        writer.setConverter(converter);
        while (reader.hasNext()) {
            Record record = reader.next();
            writer.write(record);
        }
        writer.close();
    }
}
</programlisting>
</example>
<para>In addition to using a character converter, you can perform Unicode normalization. The MARC-8 to UCS/Unicode converter, for example, does not do this itself. Unicode normalization transforms text into the canonical composed form: for example, "a´bc" (the letter "a" followed by a separate acute accent) is normalized to "ábc". To perform normalization, set Unicode normalization to true:</para>
<programlisting>
MarcXmlWriter writer = new MarcXmlWriter(out, true);
AnselToUnicode converter = new AnselToUnicode();
writer.setConverter(converter);
writer.setUnicodeNormalization(true);
</programlisting>
<note>
<para>Please note that converting normalized Unicode back to MARC-8 is not guaranteed to work.</para>
</note>
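<para>What canonical composition does to a string can be sketched with the JDK's <classname>java.text.Normalizer</classname>. This class is used here only as a stand-in for illustration; the sketch makes no claim about which normalizer MARC4J uses internally, and the class and method names are hypothetical. The decomposed input is the letter "a" followed by the combining acute accent (U+0301).</para>
<example>
<title>Sketch of canonical composition and decomposition (JDK classes only)</title>
<programlisting lang="Java"><![CDATA[
import java.text.Normalizer;

public class NormalizationSketch {

    // Combines base characters and combining marks into precomposed
    // characters where possible (normalization form NFC).
    static String compose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    // Splits precomposed characters into base character plus combining
    // marks (normalization form NFD).
    static String decompose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String decomposed = "a\u0301bc"; // 'a' + COMBINING ACUTE ACCENT + "bc"
        String composed = compose(decomposed);
        System.out.println(decomposed.length() + " characters before");
        System.out.println(composed.length() + " characters after");
        System.out.println(composed.equals("\u00e1bc"));
    }
}
]]>
</programlisting>
</example>
<para>A composed string no longer carries its accents as separate characters. This is one plausible reason why a conversion back to an encoding built on separate diacritics may not succeed without decomposing the text first.</para>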
</sect1>
<sect1>
<title>Advanced MARC XML features</title>
<para>You can write the output of <classname>org.marc4j.MarcXmlWriter</classname> to an implementation of the <classname>javax.xml.transform.Result</classname> interface. This enables you to tightly integrate MARC4J with your XML application. Below are just some examples of what you can do using the <classname>Result</classname> interface. If you want to know more about Java and XML there are numerous books and tutorials available. A good tutorial is <ulink url="http://www.cafeconleche.org/books/xmljava/">Processing XML with Java</ulink> by Elliotte Rusty Harold.</para>
<para>The <classname>org.marc4j.MarcXmlWriter</classname> class provides very basic formatting options. If you need more advanced formatting options, you can use a <classname>SAXResult</classname> containing a <classname>ContentHandler</classname> derived from a dedicated XML serializer. The following example uses <classname>org.apache.xml.serialize.XMLSerializer</classname> to write MARC records to XML using MARC-8 to UCS/Unicode conversion and Unicode normalization:</para>
<example>
<title>Formatting output with the Xerces serializer</title>
<programlisting>
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.transform.Result;
import javax.xml.transform.sax.SAXResult;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.MarcXmlWriter;
import org.marc4j.converter.impl.AnselToUnicode;
import org.marc4j.marc.Record;

public class XercesSerializerExample {

    public static void main(String args[]) throws Exception {
        InputStream input = new FileInputStream("summerland.mrc");
        MarcReader reader = new MarcStreamReader(input);
        // Let the Xerces serializer handle formatting: XML, UTF-8, indented.
        OutputFormat format = new OutputFormat("xml", "UTF-8", true);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        Result result = new SAXResult(serializer.asContentHandler());
        MarcXmlWriter writer = new MarcXmlWriter(result);
        writer.setConverter(new AnselToUnicode());
        writer.setUnicodeNormalization(true);
        while (reader.hasNext()) {
            Record record = reader.next();
            writer.write(record);
        }
        writer.close();
    }
}
</programlisting>
</example>
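<para>A <classname>SAXResult</classname> accepts any <classname>ContentHandler</classname>, not only a serializer. The sketch below uses JDK classes only and is not part of MARC4J: an identity <classname>Transformer</classname> stands in for the writer as the source of SAX events, and a hypothetical handler collects the <sgmltag>tag</sgmltag> attribute of each <sgmltag>datafield</sgmltag> element instead of writing XML.</para>
<example>
<title>Sketch of a custom ContentHandler behind a SAXResult (JDK classes only)</title>
<programlisting lang="Java"><![CDATA[
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxResultSketch {

    // Collects the tag attribute of every datafield element it receives.
    static class TagCollector extends DefaultHandler {
        final StringBuilder tags = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes atts) {
            if ("datafield".equals(localName) || "datafield".equals(qName)) {
                tags.append(atts.getValue("tag")).append(' ');
            }
        }
    }

    // Sends the XML through an identity transformer into the handler.
    static String collectTags(String xml) throws Exception {
        TagCollector handler = new TagCollector();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new StreamSource(new StringReader(xml)),
            new SAXResult(handler));
        return handler.tags.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record><datafield tag=\"245\"/>"
            + "<datafield tag=\"260\"/></record>";
        System.out.println(collectTags(xml));
    }
}
]]>
</programlisting>
</example>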
<para>You can post-process the result using a <classname>Source</classname> object pointing to a stylesheet resource and a <classname>Result</classname> object to hold the transformation result tree. The example below converts MARC to MARC XML and transforms the result tree to MODS using the <ulink url="http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl">stylesheet</ulink> provided by The Library of Congress:</para>
<example>
<title>Write MODS data</title>
<programlisting>
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.MarcXmlWriter;
import org.marc4j.converter.impl.AnselToUnicode;
import org.marc4j.marc.Record;

public class Marc2ModsExample {

    public static void main(String args[]) throws Exception {
        String stylesheetUrl = "http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl";
        Source stylesheet = new StreamSource(stylesheetUrl);
        Result result = new StreamResult(System.out);
        InputStream input = new FileInputStream("summerland.mrc");
        MarcReader reader = new MarcStreamReader(input);
        // The writer applies the stylesheet to the MARC XML it produces.
        MarcXmlWriter writer = new MarcXmlWriter(result, stylesheet);
        writer.setConverter(new AnselToUnicode());
        while (reader.hasNext()) {
            Record record = reader.next();
            writer.write(record);
        }
        writer.close();
    }
}
</programlisting>
</example>
<para>It is also possible to write the result into a DOM Node. You can then use the DOM document for further processing in your XML application, for example to embed MARC XML or MODS data in other XML documents.</para>
<example>
<title>Write output to a DOM tree</title>
<programlisting>
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.transform.dom.DOMResult;

import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.MarcXmlWriter;
import org.marc4j.converter.impl.AnselToUnicode;
import org.marc4j.marc.Record;
import org.w3c.dom.Document;

public class Marc2DomExample {

    public static void main(String args[]) throws Exception {
        InputStream input = new FileInputStream("summerland.mrc");
        MarcReader reader = new MarcStreamReader(input);
        DOMResult result = new DOMResult();
        MarcXmlWriter writer = new MarcXmlWriter(result);
        writer.setConverter(new AnselToUnicode());
        while (reader.hasNext()) {
            Record record = reader.next();
            writer.write(record);
        }
        writer.close();
        // The DOM tree is complete once the writer has been closed.
        Document doc = (Document) result.getNode();
    }
}
</programlisting>
</example>
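<para>Embedding a DOM tree in another XML document can be sketched with JDK classes alone. The example below is independent of MARC4J: an identity <classname>Transformer</classname> fills the <classname>DOMResult</classname> in place of the writer, and the class name and the <sgmltag>envelope</sgmltag> element are hypothetical. Because a DOM node belongs to the document that created it, the sketch imports a deep copy into the target document before appending it.</para>
<example>
<title>Sketch of embedding a DOM result in another document (JDK classes only)</title>
<programlisting lang="Java"><![CDATA[
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class EmbedDomSketch {

    // Builds a DOM tree from XML text by way of a DOMResult.
    static Document toDom(String xml) throws Exception {
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer()
            .transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    // Wraps the root element of the payload in a new envelope document.
    static Document embed(Document payload) throws Exception {
        Document target = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element envelope = target.createElement("envelope");
        target.appendChild(envelope);
        // Import a deep copy, since nodes cannot be shared between documents.
        Node copy = target.importNode(payload.getDocumentElement(), true);
        envelope.appendChild(copy);
        return target;
    }

    public static void main(String[] args) throws Exception {
        Document doc = embed(toDom("<record><leader>00000nam</leader></record>"));
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName());
        System.out.println(root.getFirstChild().getNodeName());
    }
}
]]>
</programlisting>
</example>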
</sect1>
<sect1>
<title>Summary</title>
<para>This tutorial covers many features of MARC4J. If it did not show you how to do what you need, look in the Javadoc included in the MARC4J distribution or send an e-mail to the <ulink url="http://marc4j.tigris.org/servlets/ProjectMailingListList">mailing list</ulink> for MARC4J users. Most of the samples in this tutorial are available in the <classname>org.marc4j.samples</classname> package.</para>
</sect1>
</article>