<meta charset="gbk">
<title>Using XML in Java</title>
<script type="text/javascript">
var uagent = navigator.userAgent.toLowerCase();
if (uagent.search("android") > -1) {
document.write('<link rel="stylesheet" href="../css/refcardz_html_android.css" type="text/css" media="screen">');
}
</script>
<base href="http://refcardz.dzone.com/" />
<h1><span id="main_topic">Using XML</span> in Java</h1>
<p class="author_name">By Masoud Kalali</p>
<h2>ABOUT XML</h2>
<p>XML is a general-purpose specification for creating custom
mark-up languages. It is classified as an extensible language
because it allows its users to define their own elements. Its
primary purpose is to help information systems share structured
data, particularly via the Internet, and it is used both to encode
documents and to serialize data. In the latter context, it is
comparable with other text-based serialization languages such
as JSON and YAML.</p>
<p>As a diverse platform, Java has several solutions for working
with XML. This refcard gives developers a concise overview
of the different XML processing technologies in Java, and a use
case for each technology.</p>
<h2>XML FILE SAMPLE</h2>
<pre><code>
1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE publications SYSTEM "publications.dtd">
3 <publications
4 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
5 xsi:schemaLocation="http://xml.dzone.org/schema/publications
6 publications.xsd"
7 xmlns="http://xml.dzone.org/schema/publications"
8 xmlns:extras="http://xml.dzone.org/schema/publications">
9 <book id="_001">
10 <title>Beginning XML, 4th Edition</title>
11 <author>David Hunter</author>
12 <copyright>2007</copyright>
13 <publisher>Wrox</publisher>
14 <isbn kind="10">0470114878</isbn>
15 </book>
16 <book id="_002">
17 <title>XML in a Nutshell, Third Edition</title>
18 <author>O’Reilly Media, Inc</author>
19 <copyright>2004</copyright>
20 <publisher>O’Reilly Media, Inc.</publisher>
21 <isbn kind="10">0596007647</isbn>
22 </book>
23 <extras:book id="_003" image="erik_xml.jpg">
24 <title>Learning XML, Second Edition</title>
25 <author>Erik Ray</author>
26 <copyright>2003</copyright>
27 <publisher>O’Reilly Media, Inc.</publisher>
28 <isbn kind="10">0596004206</isbn>
29 </extras:book>
30 </publications>
</code>
</pre>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="light_cream"><strong>Line 1:</strong> An XML document always starts with a prolog which describes the XML file. This
prolog can be minimal, e.g. <?xml version="1.0"?> or can contain other information. For
example, the encoding:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?></td>
</tr>
<tr>
<td class="light_cream"><strong>Line 2:</strong> DOCTYPE: DTD definitions can either be embedded in the XML document or referenced
from a DTD file. Using the SYSTEM keyword with a bare file name, as here, means that the DTD file
should be in the same folder as our XML file.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 3:</strong> ROOT ELEMENT: Every well-formed document should have one and only one root
element. All other elements reside inside the root element.</td>
</tr>
<tr>
<td class="light_cream"><strong>Lines 4 – 8:</strong> Namespace declarations: line 4 defines the xsi prefix, lines 5 and 6 define the
namespace URI and the location of its XSD file, line 7 defines the default namespace of the document,
and line 8 defines the extras prefix for a namespace.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 20:</strong> Element: An element is composed of its start tag, end tag and the possible content
which can be text or other nested elements.</td>
</tr>
</tbody></table>
<h3>XML File Sample, continued</h3>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="light_cream"><strong>Line 23:</strong> Namespace-prefixed tag: a start tag prefixed by a namespace. The end tag (line 29) must carry
the same namespace prefix for the document to be well-formed.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 28:</strong> Attribute: an attribute is part of an element, consisting of an attribute name and
its value.</td>
</tr>
</tbody></table>
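<p>The namespace declarations on lines 7, 8 and 23 can be explored with a short sketch. It assumes a namespace-aware DOM parser; the class name <code>NamespaceLookup</code> and the URI <code>urn:example:pubs</code> are made up for illustration and are not part of the sample file. Because the default namespace and the <code>extras</code> prefix are bound to the same URI, a lookup by namespace URI and local name should match both the unprefixed and the prefixed <code>book</code> elements.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the refcard sample.
class NamespaceLookup {

    // Counts <book> elements that belong to the given namespace URI,
    // regardless of which prefix (if any) the document used for them.
    static int countBooks(String xml, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookup below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, "book").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Assumed miniature of the sample: both prefixes map to one URI.
        String xml = "<publications xmlns='urn:example:pubs' xmlns:extras='urn:example:pubs'>"
                + "<book id='_001'/><extras:book id='_003'/></publications>";
        System.out.println(countBooks(xml, "urn:example:pubs"));
    }
}
```

<p>Matching on the namespace URI rather than on the prefix is the usual design choice, since prefixes are only local shorthand chosen by the document author.</p>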
<h3>Capabilities of Element and Attribute</h3>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>Capability</strong></td>
<td class="dark_cream"><strong>Attribute</strong></td>
<td class="dark_cream"><strong>Element</strong></td>
</tr>
<tr>
<td class="light_blue">Hierarchical</td>
<td class="light_cream">No – flat</td>
<td class="light_cream">Yes</td>
</tr>
<tr>
<td class="light_blue">Ordered</td>
<td class="light_cream">No – undefined</td>
<td class="light_cream">Yes</td>
</tr>
<tr>
<td class="light_blue">Complex types</td>
<td class="light_cream">No – string only</td>
<td class="light_cream">Yes</td>
</tr>
<tr>
<td class="light_blue">Verbose</td>
<td class="light_cream">Less – usually</td>
<td class="light_cream">More</td>
</tr>
<tr>
<td class="light_blue">Readability</td>
<td class="light_cream">Less</td>
<td class="light_cream">More – usually</td>
</tr>
</tbody></table>
<h3>XML Use Cases</h3>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>Requirement/Characteristic</strong></td>
<td class="dark_cream"><strong>Suitable XML Features</strong></td>
</tr>
<tr>
<td class="light_blue">Interoperability</td>
<td class="light_cream">XML can be used independent of the target language or
platform or target device.
Use XML when you need to support or interact with
multiple platforms.</td>
</tr>
<tr>
<td class="light_blue">Multiple output formats for multiple devices</td>
<td class="light_cream">XML Transformation can help you get a required format
from plain XML files.
Use XML as the preferred output format when multiple
output formats are required.</td>
</tr>
<tr>
<td class="light_blue">Content size</td>
<td class="light_cream">Use XML when messaging and processing efficiency is
less important than interoperability and availability of
standard tools.<br>
Large content can create a big XML document. Use
compression for XML documents or use other industry
standards like ASN.1.</td>
</tr>
<tr>
<td class="light_blue">Project size</td>
<td class="light_cream">Using XML requires at least XML parsing libraries
and helper classes. Weigh the project size against the
XML-related man-hours required before adopting XML.<br>
For small projects with simple requirements, you might
not want to incur the overhead of XML.</td>
</tr>
<tr>
<td class="light_blue">Searching</td>
<td class="light_cream">There are some technologies for searching in an XML
document, like XPath (<a href="http://www.w3schools.com/XPath/default.asp">www.w3schools.com/XPath/default.asp</a>)
and XQuery (<a href="http://www.xquery.com/">http://www.xquery.com/</a>), but they are
relatively young and immature.
Don’t use XML documents when searching is important.
Instead, store the content in a traditional database, use
XML databases or use XML-aware databases.</td>
</tr>
</tbody></table>
<h2>PARSING TECHNIQUES</h2>
<p>To use an XML file or document inside an application, the application must first read and tokenize it. For XML
files, this is called XML parsing, and the piece of software that performs this task is called a parser.</p>
<p>There are two general parsing techniques:
</p><ul>
<li>In Memory Tree: The entire document is read into memory
as a tree structure which allows random access to any part
of the document by the calling application.</li>
<li>Streaming (event processing): A parser reads the document
and fires a corresponding event for each XML construct
it encounters.</li>
</ul>
<p></p>
<p>Two types of parsers use streaming techniques:
</p><ul>
<li>Push parsers: Parsers are in control of the parsing and
the parser client has no control over the parsing flow.</li>
<li>Pull parsers: The Parser client is in control of the parsing
and the parser goes forward to the next infoset element
when it is asked to.</li>
</ul>
<p></p>
<p>Following are parsers generally available in the industry:
</p><ul>
<li>DOM: DOM is a tree-based parsing technique that builds up an entire parse tree in memory. It allows complete dynamic access to a whole XML document.</li>
<li>SAX: SAX is an event-driven push model for processing XML. It is not a W3C standard, but it is a very well-recognized API that most SAX parsers implement in a compliant way. Rather than building a tree representation of an entire document as DOM does, a SAX parser fires off a series of events as it reads through the document.</li>
<li>StAX (JSR 173): StAX was designed as a middle ground between DOM and SAX. In StAX, the application moves the cursor forward, ‘pulling’ the information from the parser as it needs it. So there is no event firing by the parser and no huge memory consumption. You can use third-party libraries for Java SE 5 and older, or the bundled StAX parser of Java SE 6 and above.</li>
</ul>
<p></p>
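<p>The difference between the push and pull models can be sketched with the JAXP classes bundled in the JDK. The class and method names below (<code>PushVsPull</code>, <code>countWithPush</code>, <code>countWithPull</code>) are hypothetical and the input is an in-memory string rather than a file; the sketch only illustrates who drives the parsing loop.</p>

```java
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison of the two streaming styles.
class PushVsPull {

    // Push (SAX): the parser owns the loop and calls back into the handler.
    static int countWithPush(String xml, String name) {
        AtomicInteger count = new AtomicInteger();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes atts) {
                // The factory below is not namespace aware, so qName holds the tag name.
                if (name.equals(qName)) {
                    count.incrementAndGet();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("push parsing failed", e);
        }
        return count.get();
    }

    // Pull (StAX): the application owns the loop and asks for the next event.
    static int countWithPull(String xml, String name) {
        int count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("pull parsing failed", e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<publications><book id='_001'/><book id='_002'/></publications>";
        System.out.println(countWithPush(xml, "book") + " " + countWithPull(xml, "book"));
    }
}
```

<p>With the pull version the application could stop reading as soon as it has what it needs, simply by leaving the loop; with the push version the handler has to keep state between callbacks.</p>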
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>Feature</strong></td>
<td class="dark_cream"><strong>StAX</strong></td>
<td class="dark_cream"><strong>SAX</strong></td>
<td class="dark_cream"><strong>DOM</strong></td>
</tr>
<tr>
<td class="light_blue"><strong>API Type</strong></td>
<td class="light_cream">Pull, streaming</td>
<td class="light_cream">Push, streaming</td>
<td class="light_cream">In memory tree</td>
</tr>
<tr>
<td class="light_blue"><strong>Ease of Use</strong></td>
<td class="light_cream">High</td>
<td class="light_cream">Medium</td>
<td class="light_cream">High</td>
</tr>
<tr>
<td class="light_blue"><strong>XPath Capability</strong></td>
<td class="light_cream">No</td>
<td class="light_cream">No</td>
<td class="light_cream">Yes</td>
</tr>
<tr>
<td class="light_blue"><strong>CPU and Memory<br> Efficiency</strong></td>
<td class="light_cream">Good</td>
<td class="light_cream">Good</td>
<td class="light_cream">Varies</td>
</tr>
<tr>
<td class="light_blue"><strong>Forward Only</strong></td>
<td class="light_cream">Yes</td>
<td class="light_cream">Yes</td>
<td class="light_cream">No</td>
</tr>
<tr>
<td class="light_blue"><strong>Read XML</strong></td>
<td class="light_cream">Yes</td>
<td class="light_cream">Yes</td>
<td class="light_cream">Yes</td>
</tr>
<tr>
<td class="light_blue"><strong>Write XML</strong></td>
<td class="light_cream">Yes</td>
<td class="light_cream">No</td>
<td class="light_cream">Yes</td>
</tr>
<tr>
<td class="light_blue"><strong>Create, Read,<br>Update or Delete<br>Nodes</strong></td>
<td class="light_cream">No</td>
<td class="light_cream">No</td>
<td class="light_cream">Yes</td>
</tr>
</tbody></table>
<h3>Parsing Techniques, continued</h3>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="light_blue"><strong>Best for<br>Applications in<br>need of:</strong></td>
<td class="light_cream"><ul>
<li>Streaming Model</li>
<li>Not modifying <br> the document</li>
<li>Memory efficiency</li>
<li>XML read and XML write </li>
<li>Parsing multiple<br>documents in the<br>same thread</li>
<li>Small devices</li>
<li>Looking for a certain tag</li>
</ul></td>
<td class="light_cream">
<ul>
<li>Read only manipulation</li>
<li>Not modifying the document </li>
<li>Memory efficiency</li>
<li>Small devices</li>
<li>Looking for a certain tag</li>
</ul>
</td>
<td class="light_cream"><ul>
<li>Modifying the XML document </li>
<li>XPath, XSLT</li>
<li>XML tree traversing and<br> random access<br>to any section </li>
<li>Merging documents</li>
</ul></td>
</tr>
</tbody></table>
<p>All of these parsers are available through JAXP. The following code samples show how we can use the XML processing capabilities of Java SE 6 for XML parsing.</p>
<h2>PARSING XML USING DOM</h2>
<pre><code>
14 DocumentBuilderFactory factory = DocumentBuilderFactory
15         .newInstance();
16 factory.setValidating(true);
17 factory.setNamespaceAware(true);
18 factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
19         "http://www.w3.org/2001/XMLSchema");
20 DocumentBuilder builder = factory.newDocumentBuilder();
21 builder.setErrorHandler(new SimpleErrorHandler());
22 Document doc = builder.parse("src/books.xml");
23 NodeList list = doc.getElementsByTagName("*");
24 for (int i = 0; i < list.getLength(); i++) {
25     Element element = (Element) list.item(i);
26     System.out.println(element.getNodeName() + " " +
27             element.getTextContent());
28     if (element.getNodeName().equalsIgnoreCase("book")) {
29         System.out.println("Book ID= " + element
30                 .getAttribute("id"));
31     }
32     if (element.getNodeName().equalsIgnoreCase("isbn")) {
33         System.out.println("ISBN Kind=" + element
34                 .getAttribute("kind"));
35     }
36 }
</code>
</pre>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="light_cream"><strong>Line 16:</strong> To validate the XML against an internal DTD, we only need to call setValidating(true).
To validate a document against a DTD using DOM, ensure that there is no schema in the document
and no element prefix on our start and end tags.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 17:</strong> The created parser is namespace aware (the namespace prefix will be dealt with
as a prefix, and not a part of the element).</td>
</tr>
<tr>
<td class="light_cream"><strong>Lines 18 – 19:</strong> The created parser uses the internal XSD to validate the document.
DocumentBuilderFactory instances accept several attributes and features that let developers enable or disable
a functionality; one of them is validating against the internal XSD.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 21:</strong> Although DOM can use a default error handler, it is usually better to set
our own error handler to handle the different levels of possible errors in the document. The
default handler behaves differently depending on the implementation that we use. A
simple error handler might be:
<pre><code>
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SimpleErrorHandler implements ErrorHandler {

    // Warnings: the parser can continue normally.
    public void warning(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }

    // Recoverable errors, such as validity violations.
    public void error(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }

    // Non-recoverable errors, such as a document that is not well-formed.
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }
}
</code>
</pre></td>
</tr>
</tbody></table>
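<p>Because DOM keeps the whole tree in memory, it can be queried with XPath, which the streaming parsers cannot offer. The following is a minimal sketch using the <code>javax.xml.xpath</code> API on an in-memory document; the class name <code>XPathOnDom</code>, the method <code>titleOf</code> and the unqualified element names are assumptions made for the example, not part of the sample above.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper showing an XPath query over a DOM tree.
class XPathOnDom {

    // Returns the title of the book with the given id, or "" if there is none.
    // bookId is assumed to be trusted input, since it is pasted into the expression.
    static String titleOf(String xml, String bookId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expression = "/publications/book[@id='" + bookId + "']/title";
            // evaluate(String, Object) returns the string value of the first match.
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<publications>"
                + "<book id='_001'><title>Beginning XML</title></book>"
                + "<book id='_002'><title>Learning XML</title></book>"
                + "</publications>";
        System.out.println(titleOf(xml, "_002"));
    }
}
```

<p>The sketch uses a document without namespaces to keep the expression short; a namespaced document such as the sample file would additionally need a <code>NamespaceContext</code> registered on the <code>XPath</code> object.</p>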
<h2>PARSING XML USING SAX</h2>
<p>For using SAX, we need the parser and an event handler that
should respond to the parsing events. Events can be a start
element event, end element event, and so forth.</p>
<p>A simple event handler might be:</p>
<pre><code>
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SimpleHandler extends DefaultHandler {
    // Fired for every start tag: print the book id or the element name.
    @Override
    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts)
            throws SAXException {
        if ("book".equals(localName)) {
            System.out.print("Book details: Book ID: " + atts
                    .getValue("id"));
        } else {
            System.out.print(localName + ": ");
        }
    }

    // Fired for character data between tags.
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        System.out.print(new String(ch, start, length));
    }

    // Fired for every end tag: print a separator after each book.
    @Override
    public void endElement(String namespaceURI, String localName,
            String qName)
            throws SAXException {
        if ("book".equals(localName)) {
            System.out.println("=================================");
        }
    }
}
</code>
</pre>
<p>The parser code that uses the event handler to parse the books.xml document might be:</p>
<pre><code>
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
SAXParser saxParser = factory.newSAXParser();
saxParser.setProperty(
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
        "http://www.w3.org/2001/XMLSchema");
XMLReader reader = saxParser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.setContentHandler(new SimpleHandler());
reader.parse("src/books.xml");
</code>
</pre>
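<p>One detail worth keeping in mind with SAX is that a parser may deliver the text of a single element through several <code>characters</code> calls. A handler that needs the complete text can buffer it until the end tag arrives. The sketch below illustrates this under stated assumptions: <code>TitleCollector</code> and its <code>titles</code> method are hypothetical names, and the document is parsed from a string.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of every <title> element.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes atts) {
        text.setLength(0); // start a fresh buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may run more than once per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(localName)) {
            titles.add(text.toString().trim());
        }
    }

    // Parses the given XML string and returns the titles in document order.
    static List<String> titles(String xml) {
        TitleCollector handler = new TitleCollector();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that localName is populated
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return handler.titles;
    }

    public static void main(String[] args) {
        String xml = "<publications>"
                + "<book id='_001'><title>Beginning XML &amp; more</title></book>"
                + "</publications>";
        System.out.println(titles(xml));
    }
}
```

<p>Buffering in a <code>StringBuilder</code> keeps the handler correct whether the parser reports the text in one piece or in several.</p>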
<h2>PARSING XML USING StAX</h2>
<p>StAX is a streaming pull parser. It means that the parser client can ask the parser to go forward in the document when it needs. StAX provides two sets of APIs:
</p><ul>
<li>Cursor API: its methods return XML information as strings, which minimizes object allocation requirements.</li>
<li>Iterator-based API: represents the current state of the parser as an object. The parser client can get all the required information about the element underlying the event from that object.</li>
</ul>
<p></p>
<h4>Differences and features of StAX APIs</h4>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>Cursor API: Best in frameworks and libraries</strong></td>
<td class="dark_cream"><strong>Iterator API: Best in applications</strong></td>
</tr>
<tr>
<td class="light_blue">More memory efficient</td>
<td class="light_cream">XMLEvent subclasses are immutable (they can be used<br>directly in other parts of the application)</td>
</tr>
<tr>
<td class="light_blue">Better overall performance</td>
<td class="light_cream">New subclass of XMLEvent can be<br>developed and used when required</td>
</tr>
<tr>
<td class="light_blue">Forward only</td>
<td class="light_cream">Applying event filters to reduce event<br>processing costs</td>
</tr>
</tbody></table>
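<p>The cursor API can be sketched with <code>XMLStreamReader</code>, which exposes the current position directly instead of handing out event objects. The class name <code>CursorApiSketch</code>, the method <code>isbns</code> and the output format are assumptions chosen for the example.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example of the StAX cursor API.
class CursorApiSketch {

    // Returns one line per <isbn> element, combining its kind attribute and text.
    static List<String> isbns(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "isbn".equals(reader.getLocalName())) {
                    // A null namespace means the attribute namespace is not checked.
                    String kind = reader.getAttributeValue(null, "kind");
                    // Reads the text content and moves the cursor to the end tag.
                    String value = reader.getElementText();
                    result.add("ISBN-" + kind + ": " + value);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cursor parsing failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<book id='_001'><isbn kind='10'>0470114878</isbn></book>";
        System.out.println(isbns(xml));
    }
}
```

<p>No event objects are allocated here; the reader itself answers questions about the current position, which is the trade-off the table above describes.</p>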
<h2>A SAMPLE USING StAX PARSER</h2>
<pre><code>
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream in = new FileInputStream("src/books.xml");
XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    if (event.isEndElement()) {
        if (event.asEndElement().getName().getLocalPart()
                .equals("book")) {
            // Print a separator after each book.
            System.out.println("=================================");
            continue;
        }
    }
    if (event.isStartElement()) {
        // For each element of interest, the next event carries its text.
        if (event.asStartElement().getName().getLocalPart()
                .equals("title")) {
            event = eventReader.nextEvent();
            System.out.println("title: " + event.asCharacters()
                    .getData());
            continue;
        }
        if (event.asStartElement().getName().getLocalPart()
                .equals("author")) {
            event = eventReader.nextEvent();
            System.out.println("author: " + event.asCharacters()
                    .getData());
            continue;
        }
        if (event.asStartElement().getName().getLocalPart()
                .equals("copyright")) {
            event = eventReader.nextEvent();
            System.out.println("copyright: " + event
                    .asCharacters().getData());
            continue;
        }
        if (event.asStartElement().getName().getLocalPart()
                .equals("publisher")) {
            event = eventReader.nextEvent();
            System.out.println("publisher: " + event.asCharacters()
                    .getData());
            continue;
        }
        if (event.asStartElement().getName().getLocalPart()
                .equals("isbn")) {
            event = eventReader.nextEvent();
            System.out.println("isbn: " + event.asCharacters()
                    .getData());
            continue;
        }
    }
}
</code>
</pre>
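<p>The feature table earlier lists StAX as able to write XML as well as read it. A minimal sketch of the writing side, assuming the bundled <code>XMLStreamWriter</code> and hypothetical names (<code>StaxWriterSketch</code>, <code>bookXml</code>), might look like this:</p>

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example of writing XML with the StAX cursor API.
class StaxWriterSketch {

    // Builds a <book> element with an id attribute and a nested <title>.
    static String bookXml(String id, String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out);
            writer.writeStartElement("book");
            writer.writeAttribute("id", id);
            writer.writeStartElement("title");
            writer.writeCharacters(title); // special characters are escaped by the writer
            writer.writeEndElement();      // closes title
            writer.writeEndElement();      // closes book
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("writing failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(bookXml("_001", "Beginning XML"));
    }
}
```

<p>Like the reader, the writer is forward only: elements are emitted in document order and cannot be revisited once written.</p>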
<h2>XML STRUCTURE</h2>
<p>There are two levels of correctness of an XML document:
</p><ol>
<li><strong>Well-formed-ness.</strong> A well-formed document conforms to
all of XML’s syntax rules. For example, if a start-tag appears
without a corresponding end-tag, it is not well-formed. A
document that is not well-formed is not considered to be
XML.</li>
</ol>
<p>Sample characteristics:</p>
<ul>
<li>XML documents must have a root element</li>
<li>XML elements must have a closing tag</li>
<li>XML tags are case sensitive</li>
<li>XML elements must be properly nested</li>
<li>XML attribute values must always be quoted</li>
</ul>
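<p>These rules can be checked programmatically: a non-validating parser rejects any document that is not well-formed. The sketch below assumes a hypothetical helper class <code>WellFormedCheck</code> and treats any <code>SAXException</code> raised during parsing as a well-formedness failure.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and stays silent otherwise.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser could not read the document as XML
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser setup or I/O problem", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));  // properly nested
        System.out.println(isWellFormed("<a><b></a>"));   // missing closing tag
        System.out.println(isWellFormed("<A></a>"));      // case mismatch
        System.out.println(isWellFormed("<a x=1/>"));     // unquoted attribute value
    }
}
```

<p>Each input in <code>main</code> corresponds to one of the characteristics listed above.</p>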
<ol start="2">
<li><strong>Validity.</strong> A valid document additionally conforms to semantic rules. The rules are defined in a schema, such as a DTD or an XML Schema. Examples of invalid documents include: a required
attribute or element is not present in the document; the document contains an undefined element; an element that is meant to appear once appears more than once; or the value of an attribute does not conform to the defined pattern or data type.</li>
</ol>
<h3>XML Structure, continued</h3>
<p>XML validation mechanisms include DTDs and XML schema languages such as XML Schema (XSD) and RELAX NG.</p>
<h3>Document Type Definition (DTD)</h3>
<p>A DTD defines the tags and attributes used in an XML or HTML document. Elements defined in a DTD can be used along with the predefined tags and attributes of each markup language.
DTD support is ubiquitous due to its inclusion in the XML 1.0 standard.</p>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>DTD Advantages:</strong></td>
<td class="dark_cream"><strong>DTD Disadvantages:</strong></td>
</tr>
<tr>
<td class="light_blue">Easy to read and write (plain text file with a simple semi-XML format).</td>
<td class="light_cream">No type definition system.</td>
</tr>
<tr>
<td class="light_blue">Can be used as an in-line definition inside the XML documents.</td>
<td class="light_cream">No means of element and attribute content definition and validation.</td>
</tr>
<tr>
<td class="light_blue">Includes counterparts of #define, #include, and #ifdef: the ability
to define shorthand abbreviations, external
content, and some conditional parsing.</td>
<td class="light_cream"></td>
</tr>
</tbody></table>
<h4>A sample DTD document</h4>
<pre><code>
1 <?xml version="1.0" encoding="UTF-8"?>
2 <!ELEMENT publications (book*)>
3 <!ELEMENT book (title, author+, copyright, publisher, isbn,
4 description?)>
5 <!ELEMENT title (#PCDATA)>
6 <!ELEMENT author (#PCDATA)>
7 <!ELEMENT copyright (#PCDATA)>
8 <!ELEMENT publisher (#PCDATA)>
9 <!ELEMENT isbn (#PCDATA)>
10 <!ELEMENT description (#PCDATA)>
11 <!ATTLIST book id ID #REQUIRED image CDATA #IMPLIED>
12 <!ATTLIST isbn kind (10|13) #REQUIRED >
</code>
</pre>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="light_cream"><strong>Line 2:</strong> publications element has 0...unbounded number of book elements inside it.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 3:</strong> book element has one or more author elements, 0 or 1 description elements and<br>
exactly one title, copyright, publisher and isbn elements inside it.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 11:</strong> book element has two attributes, one named id of type ID which is mandatory,<br>
and an image attribute from type CDATA which is optional.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 12:</strong> isbn element has an attribute named kind which can have 10 or 13 as its value.</td>
</tr>
</tbody></table>
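<p>A DTD like this one can be exercised from Java with a validating DOM parser. The sketch below is written under a few assumptions: it uses a hypothetical class <code>DtdValidation</code>, embeds a cut-down DTD as an internal subset in <code>main</code> instead of referencing publications.dtd, and collects validity problems in a list rather than printing them.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates a document against the DTD it declares.
class DtdValidation {

    // Returns the validity problems found; an empty list means the document is valid.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { problems.add(e.getMessage()); }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Assumed miniature of the DTD above, embedded as an internal subset.
        String dtd = "<!DOCTYPE publications ["
                + "<!ELEMENT publications (book*)>"
                + "<!ELEMENT book (title)>"
                + "<!ELEMENT title (#PCDATA)>"
                + "<!ATTLIST book id ID #REQUIRED>"
                + "]>";
        // The book below lacks the required id attribute.
        System.out.println(validate(dtd
                + "<publications><book><title>T</title></book></publications>"));
    }
}
```

<p>Validity violations arrive through <code>error</code>, so the parse itself still completes; only well-formedness problems reach <code>fatalError</code>.</p>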
<h4>DTD Attribute Types</h4>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>DTD Attribute Type</strong></td>
<td class="dark_cream"><strong>Description</strong></td>
</tr>
<tr>
<td class="light_blue"><strong>CDATA</strong></td>
<td class="light_cream">Any character string acceptable in XML</td>
</tr>
<tr>
<td class="light_blue"><strong>NMTOKEN</strong></td>
<td class="light_cream">Similar to an XML name, but with looser rules for the first character</td>
</tr>
<tr>
<td class="light_blue"><strong>NMTOKENS</strong></td>
<td class="light_cream">One or more NMTOKEN tokens separated by white space</td>
</tr>
<tr>
<td class="light_blue"><strong>Enumeration</strong></td>
<td class="light_cream">List of the only allowed values for an attribute</td>
</tr>
<tr>
<td class="light_blue"><strong>ENTITY</strong></td>
<td class="light_cream">Associates a name with a macro-like replacement</td>
</tr>
<tr>
<td class="light_blue"><strong>ENTITIES</strong></td>
<td class="light_cream">White-space-separated list of ENTITY names</td>
</tr>
<tr>
<td class="light_blue"><strong>ID</strong></td>
<td class="light_cream">XML name unique within the entire document</td>
</tr>
<tr>
<td class="light_blue"><strong>IDREF</strong></td>
<td class="light_cream">Reference to an ID attribute within the document</td>
</tr>
<tr>
<td class="light_blue"><strong>IDREFS</strong></td>
<td class="light_cream">White-space-separated list of IDREF tokens</td>
</tr>
<tr>
<td class="light_blue"><strong>NOTATION</strong></td>
<td class="light_cream">Associates a name with information used by the client</td>
</tr>
</tbody></table>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>What a DTD can validate</strong></td>
</tr>
<tr>
<td class="light_blue">Element nesting</td>
</tr>
<tr>
<td class="light_blue">Element occurrence</td>
</tr>
<tr>
<td class="light_blue">Permitted attributes of an element</td>
</tr>
<tr>
<td class="light_blue">Attribute types and default values</td>
</tr>
</tbody></table>
<h3>XML Schema Definition (XSD)</h3>
<p>XSD provides the syntax for defining how elements and attributes can be represented in an XML document. It also lets a schema require that content follow a specific format and data type. XSD is a W3C Recommendation
for defining the structure of an XML document. XSD documents are themselves written in XML.</p>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>XSD Advantages:</strong></td>
<td class="dark_cream"><strong>XSD Disadvantages:</strong></td>
</tr>
<tr>
<td class="light_blue">XSD has a much richer language for describing what element or attribute content “looks like.”
This is related to the type system.</td>
<td class="light_cream">Verbose language, hard to read and write</td>
</tr>
<tr>
<td class="light_blue">XSD Schema supports Inheritance, where one schema can inherit from another schema. This is a
great feature because it provides the opportunity for re-usability.</td>
<td class="light_cream">Provides no mechanism for the user to add new primitive data types.</td>
</tr>
<tr>
<td class="light_blue">It is namespace aware and provides the ability to define its own data type from the existing data type.</td>
<td class="light_cream"></td>
</tr>
</tbody></table>
<h4>A sample XSD document</h4>
<pre><code>
1 <?xml version="1.0" encoding="UTF-8"?>
2 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
3 xmlns:extras="http://xml.dzone.org/schema/publications"
4 attributeFormDefault="unqualified" elementFormDefault="unqualified"
5 xmlns="http://xml.dzone.org/schema/publications"
6 targetNamespace="http://xml.dzone.org/schema/publications"
7 version="4">
8 <xs:element name="publications">
9 <xs:complexType>
10 <xs:sequence>
11 <xs:element minOccurs="0" maxOccurs="unbounded"
12 ref="book"/>
13 </xs:sequence>
14 </xs:complexType>
15 </xs:element>
16 <xs:element name="book">
17 <xs:complexType>
18 <xs:sequence>
19 <xs:element ref="title"/>
20 <xs:element minOccurs="1" maxOccurs="unbounded"
21 ref="author"/>
22 <xs:element ref="copyright"/>
23 <xs:element ref="publisher"/>
24 <xs:element ref="isbn"/>
25 <xs:element minOccurs="0" ref="description"/>
26 </xs:sequence>
27 <xs:attributeGroup ref="attlist.book"/>
28 </xs:complexType>
29 </xs:element>
30 <xs:element name="title" type="xs:string"/>
31 <xs:element name="author" type="xs:string"/>
32 <xs:element name="copyright" type="xs:string"/>
33 <xs:element name="publisher" type="xs:string"/>
34 <xs:element name="isbn">
35 <xs:complexType mixed="true">
36 <xs:attributeGroup ref="attlist.isbn"/>
37 </xs:complexType>
38 </xs:element>
39 <xs:element name="description" type="xs:string"/>
40 <xs:attributeGroup name="attlist.book">
41 <xs:attribute name="id" use="required" type="xs:ID"/>
42 <xs:attribute name="image"/>
43 </xs:attributeGroup>
44 <xs:attributeGroup name="attlist.isbn">
45 <xs:attribute name="kind" use="required">
46 <xs:simpleType>
47 <xs:restriction base="xs:token">
48 <xs:enumeration value="10"/>
49 <xs:enumeration value="13"/>
50 </xs:restriction>
51 </xs:simpleType>
52 </xs:attribute>
53 </xs:attributeGroup>
54 </xs:schema>
</code>
</pre>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="light_cream"><strong>Lines 2 – 7:</strong> Line 2 defines the XML Schema namespace. Line 3 declares the extras prefix for a namespace whose vocabulary the schema can use. Line 4 specifies whether locally declared elements and attributes are namespace qualified or not. A locally declared element is an element
declared directly inside a complexType (not by reference). Line 5 declares the default namespace for this schema document. Line 6 defines the target namespace, which an XML document must use in order to be validated with this schema, and line 7 gives the version of the schema.</td>
</tr>
</tbody></table>
<h3>XML Schema Definition (XSD), continued</h3>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="light_cream"><strong>Lines 9 – 14:</strong> An element named publications has a sequence of an unbounded number of books inside it.</td>
</tr>
<tr>
<td class="light_cream"><strong>Line 20:</strong> The element named book contains a sequence of elements, including author, which must appear at least once, and an element named description with a minimum occurrence of 0. Its maximum occurrence is the default value, which is 1.</td>
</tr>
<tr>
<td class="light_cream"><strong>Lines 34 – 38:</strong> The isbn element has a group of attributes referenced by attlist.isbn. This attribute group includes one attribute named kind (Lines 46 – 51) with a simple type. The type has a restriction that requires the value to be one of the enumerated values included in the definition.</td>
</tr>
</tbody></table>
<div class="hot_tip">
<p><img src="/sites/all/modules/dzone/assets/refcardz/035/../images/hot_tip.gif" alt="Hot Tip" width="64" height="64" class="hot_tip_icon"></p>
Separating an element declaration from its use. We declared our elements and attribute groups separately
from where we reference (use) them. A ref attribute points to a declaration with the same
name. Using this technique, we can keep separate XSD files, each containing the definitions and declarations related
to one specific package. We can also import or include them in other XSD documents, if needed.
</div>
<div class="hot_tip">
<p><img src="/sites/all/modules/dzone/assets/refcardz/035/../images/hot_tip.gif" alt="Hot Tip" width="64" height="64" class="hot_tip_icon"></p>
Import and include. The import and include elements help to construct a schema from multiple
documents and namespaces. The import element brings in a schema from a different
namespace, while the include element brings in a schema from the same namespace. When include is used, the target
namespace of the included schema must be the same as the target namespace of the including schema. In the case of
import, the target namespace of the imported schema must be different from that of the importing schema.
</div>
<p>To validate XML files using an external XSD, replace lines 17 – 20 of the DOM sample with:
</p><ul>
<li>factory.setValidating(false);</li>
<li>factory.setNamespaceAware(true);</li>
<li>SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");</li>
<li>factory.setSchema(schemaFactory.newSchema(new Source[]{new StreamSource("src/publication.xsd")}));</li>
</ul>
<p></p>
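<p>The replacement lines above can be tried out in isolation with a minimal sketch like the following. It is not the DOM sample itself: the class name XsdDomValidationSketch, the cut-down inline schema, and the isValid helper are assumptions made for illustration, and the schema is read from a string instead of src/publication.xsd so that the example is self-contained.</p>

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class XsdDomValidationSketch {

    // Hypothetical, cut-down schema: a book element with one required title child.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Parses xml with a schema-aware DOM parser; true if no validation error was reported. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);    // false switches off DTD validation only
            factory.setNamespaceAware(true); // schema validation needs namespace awareness
            factory.setSchema(schema);

            DocumentBuilder builder = factory.newDocumentBuilder();
            final boolean[] ok = {true};
            // Without a handler, validation errors are only printed and parsing continues.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { ok[0] = false; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return ok[0];
        } catch (Exception e) {
            return false; // malformed XML or an unusable schema
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<book><title>XML</title></book>")); // true
        System.out.println(isValid("<book><name>XML</name></book>"));   // false
    }
}
```

<p>Registering an ErrorHandler is the design choice that matters here: schema violations are reported as recoverable errors, so a parser without a handler would still return a Document for an invalid file.</p>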
<h4>XML Schema validation factors</h4>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>Validation factor</strong></td>
<td class="dark_cream"><strong>Description</strong></td>
</tr>
<tr>
<td class="light_blue"><strong>length, minLength, maxLength, maxExclusive, maxInclusive, minExclusive, minInclusive</strong></td>
<td class="light_cream">length, minLength, and maxLength enforce an exact, minimum, or maximum length for string-derived values. maxExclusive, maxInclusive, minExclusive, and minInclusive set an upper or lower bound on the value, excluding or including the bound itself.</td>
</tr>
<tr>
<td class="light_blue"><strong>enumeration</strong></td>
<td class="light_cream">Restricts values to a member of a defined list</td>
</tr>
<tr>
<td class="light_blue"><strong>totalDigits, fractionDigits</strong></td>
<td class="light_cream">totalDigits limits the maximum number of digits in a number; the sign and decimal point are not counted. fractionDigits limits the maximum number of digits after the decimal point.</td>
</tr>
<tr>
<td class="light_blue"><strong>whiteSpace</strong></td>
<td class="light_cream">Used to preserve, replace, or collapse document white space</td>
</tr>
</tbody></table>
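<p>A few of these facets can be seen at work in the following sketch, which uses the javax.xml.validation Validator API. The item element, its code and price attributes, and the chosen facet values are hypothetical and exist only for this illustration.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class FacetValidationSketch {

    // Hypothetical schema: code is a string of 2 to 5 characters; price is a
    // non-negative decimal with at most 6 digits, 2 of them after the decimal point.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='item'><xs:complexType>"
      + "<xs:attribute name='code' use='required'>"
      + "<xs:simpleType><xs:restriction base='xs:string'>"
      + "<xs:minLength value='2'/><xs:maxLength value='5'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      + "<xs:attribute name='price' use='required'>"
      + "<xs:simpleType><xs:restriction base='xs:decimal'>"
      + "<xs:minInclusive value='0'/>"
      + "<xs:totalDigits value='6'/><xs:fractionDigits value='2'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if xml satisfies every facet declared in XSD. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first violation is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<item code='AB12' price='19.99'/>"));  // true
        System.out.println(isValid("<item code='A' price='19.99'/>"));     // false: minLength
        System.out.println(isValid("<item code='AB12' price='19.999'/>")); // false: fractionDigits
        System.out.println(isValid("<item code='AB12' price='-1'/>"));     // false: minInclusive
    }
}
```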
<h4>XML Schema built-in types</h4>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>Type</strong></td>
<td class="dark_cream"><strong>Description</strong></td>
</tr>
<tr>
<td class="light_blue"><strong>anyURI</strong></td>
<td class="light_cream">Uniform Resource Identifier</td>
</tr>
<tr>
<td class="light_blue"><strong>base64Binary</strong></td>
<td class="light_cream">base64 encoded binary value</td>
</tr>
<tr>
<td class="light_blue"><strong>boolean; byte; dateTime; integer; string</strong></td>
<td class="light_cream">true, false or 0, 1; Signed integer from -128 to 127, inclusive; An absolute date and time; Signed integer; Unicode string</td>
</tr>
<tr>
<td class="light_blue"><strong>ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS</strong></td>
<td class="light_cream">Same definitions as those in DTD</td>
</tr>
<tr>
<td class="light_blue"><strong>language</strong></td>
<td class="light_cream">"xml:lang" values from XML 1.0 Recommendation.</td>
</tr>
<tr>
<td class="light_blue"><strong>Name</strong></td>
<td class="light_cream">An XML name</td>
</tr>
</tbody></table>
<h4>DTD and XSD validation capabilities</h4>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>W3C XML Schema Features</strong></td>
<td class="dark_cream"><strong>DTD Features</strong></td>
</tr>
<tr>
<td class="light_blue">Namespace-qualified element and attribute declarations</td>
<td class="light_cream">Element nesting</td>
</tr>
<tr>
<td class="light_blue">Simple and complex data types</td>
<td class="light_cream">Element occurrence</td>
</tr>
<tr>
<td class="light_blue">Type derivation and inheritance</td>
<td class="light_cream">Permitted attributes of an element</td>
</tr>
<tr>
<td class="light_blue">Element occurrence constraints</td>
<td class="light_cream">Attribute types and default values</td>
</tr>
</tbody></table>
<h2>XPATH</h2>
<p>XPath is a declarative language used for referring to sections of XML documents. XPath expressions are used for locating a set
of nodes in a given XML document. Many XML technologies, like XSLT and XQuery, use XPath extensively. To use these
technologies, you’ll need to understand the basics of XPath. All samples in this section assume we are working on an XML
document similar to the XML document on page 1.</p>
<h3>Sample XPath Expressions and Output</h3>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>XPath Expression</strong></td>
<td class="dark_cream"><strong>Output</strong></td>
</tr>
<tr>
<td class="light_blue">/publications/book[publisher="Wrox"]/copyright</td>
<td class="light_cream">2007</td>
</tr>
<tr>
<td class="light_blue">/publications//book[contains(title,"XML")]/author</td>
<td class="light_cream">David Hunter O’Reilly Media, Inc Erik Ray</td>
</tr>
<tr>
<td class="light_blue">/publications//book[contains(title,"XML") and position()=3]/@id</td>
<td class="light_cream">_003</td>
</tr>
<tr>
<td class="light_blue">/publications//book[contains(title,"XML") and position()=3 ]/copyright mod 7</td>
<td class="light_cream">1</td>
</tr>
</tbody></table>
<p>As you can see, contains and position are two widely used XPath functions.</p>
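<p>Expressions like the ones in the table can be evaluated from Java with the javax.xml.xpath API. The sketch below is only an illustration: the inline document is a hypothetical stand-in for the XML document on page 1 (its titles, authors, and publishers are made up), and the class and helper names are assumptions.</p>

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Hypothetical stand-in for the publications document; all values are made up.
    static final String XML =
        "<publications>"
      + "<book id='_001'><title>Beginning XML</title><author>Author A</author>"
      + "<copyright>2007</copyright><publisher>Wrox</publisher></book>"
      + "<book id='_002'><title>Java Basics</title><author>Author B</author>"
      + "<copyright>2005</copyright><publisher>Example Press</publisher></book>"
      + "<book id='_003'><title>Learning XML</title><author>Author C</author>"
      + "<copyright>2003</copyright><publisher>Example Press</publisher></book>"
      + "</publications>";

    /** Evaluates the expression against XML and returns the result converted to a string. */
    static String evalString(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    /** Evaluates the expression as a node set and returns the number of matching nodes. */
    static int countNodes(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Single quotes inside the expressions avoid escaping Java string quotes.
        System.out.println(evalString("/publications/book[publisher='Wrox']/copyright")); // 2007
        System.out.println(countNodes("/publications//book[contains(title,'XML')]/author")); // 2
        System.out.println(evalString(
            "/publications//book[contains(title,'XML') and position()=3]/@id")); // _003
        System.out.println(evalString(
            "/publications//book[contains(title,'XML') and position()=3]/copyright mod 7")); // 1
    }
}
```

<p>When an expression selects several nodes, the string form returns the value of the first node only, which is why the author query is evaluated as a node set here.</p>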
<h3>Important XPath Functions</h3>
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<td class="dark_blue"><strong>Operate On</strong></td>
<td class="dark_cream"><strong>Function</strong></td>
<td class="dark_cream"><strong>Description</strong></td>
</tr>
<tr>
<td class="light_blue">Node set</td>
<td class="light_cream">count(node-set)</td>
<td class="light_cream">Returns the number of nodes that are in the node set.</td>
</tr>
<tr>
<td class="light_blue">Node set</td>
<td class="light_cream">last()</td>
<td class="light_cream">Returns the position of the last node in the node set.</td>
</tr>
<tr>
<td class="light_blue">Numbers</td>
<td class="light_cream">ceiling(number)</td>
<td class="light_cream">Returns the smallest integer that is equal to or greater than the specified number.</td>
</tr>
<tr>
<td class="light_blue">Numbers</td>
<td class="light_cream">sum(node-set)</td>
<td class="light_cream">Returns the sum of the numerical values in the specified node set.</td>
</tr>
<tr>
<td class="light_blue">Boolean</td>
<td class="light_cream">lang(language)</td>
<td class="light_cream">Checks whether the language of the context node, as specified by the xml:lang attribute, matches the given language.</td>
</tr>
<tr>
<td class="light_blue">Boolean</td>
<td class="light_cream">boolean(argument)</td>
<td class="light_cream">Converts the argument to Boolean.</td>
</tr>
<tr>
<td class="light_blue">String</td>
<td class="light_cream">substring-after(string1, string2)</td>
<td class="light_cream">Returns the portion of string1 that comes after the first occurrence of string2 (which is a substring of string1).</td>
</tr>
<tr>
<td class="light_blue">String</td>
<td class="light_cream">normalize-space(string)</td>
<td class="light_cream">Returns the given string with leading and trailing whitespace removed and each internal sequence of whitespace characters replaced by a single space.</td>