NGen XML Parser

Versions

  • 1.0.1 Added support for arbitrary entity mappings, e.g. instances do not require unique object class in branch

Introduction

The NGen XML Parser is a generic XML parser engineered for n-to-m mapping from XML document(s) to Java object(s). It is designed to efficiently map XML documents with different schemas from different data sources to a common Java data model. It is based upon a StAX parser library and requires Java 8 or greater. It is particularly well suited when the following conditions hold:

  • The XML file structure does not map one-to-one to the desired Java object model.
  • The XML file contains more data than needed.
  • The XML can be parsed as a stream, and random access to elements is not needed.
  • The mapping is n-to-m: each XML file affects m Java objects, and each Java object is affected by n XML files.

The NGen XML Parser is built from three major components that a user needs to understand.

  1. The engine itself parses the XML in a StAX manner. It is configured with one settings structure for each XML tag that shall be parsable. The settings structures are thus per XML tag and not per Java object. An XML tag does not have to be parsed into one specific Java object: text content and attributes can be mapped to values in several different Java objects (or be discarded).
  2. The Settings structure defines five properties. It is further described in the javadoc of the NGen XML Parser.
    1. The Name of the element that it is meant to process.
    2. A callback named Start Processor that is invoked each time a start tag of this element is found. This will typically create one or several Java objects that are needed to store the information from this element.
    3. A list of Attribute Mappers that each map one or several attributes to a Java typed value and then set it on a Java object.
    4. A list of Text Content Mappers that each map the text content of the element to a Java type and set it on a Java object.
    5. A callback named End Processor that is invoked each time an end tag is found. This will typically add a created Java object to a list in another Java object.
  3. The Object Branch is a representation of one branch in a tree of Java objects. This object branch is where the end result is built up. If class A contains a list of objects of class B, and B contains a list of C, the object branch will contain one object of type A, one of type B and one of type C. These objects are the working branch in the tree. Although not mapped one-to-one, this tree will most probably grow and shrink as the XML tree is traversed.
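The interplay of the three components can be approximated with the StAX API that ships with the JDK. The sketch below is not the NGen XML Parser API: `PerTagSettingsSketch`, `TagSettings`, `parse` and `marketNames` are hypothetical names, and a plain `Map` stands in for the Object Branch. It only illustrates the idea of one settings entry per XML tag with a start callback, an attribute callback and an end callback, where tags without settings are skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; none of these types belong to the NGen XML Parser.
final class PerTagSettingsSketch {

    /** One entry per XML tag: a start callback, an attribute callback and an end callback. */
    static final class TagSettings {
        final Consumer<Map<Class<?>, Object>> startProcessor;
        final BiConsumer<Map<Class<?>, Object>, Map<String, String>> attributeMapper;
        final Consumer<Map<Class<?>, Object>> endProcessor;

        TagSettings(Consumer<Map<Class<?>, Object>> startProcessor,
                    BiConsumer<Map<Class<?>, Object>, Map<String, String>> attributeMapper,
                    Consumer<Map<Class<?>, Object>> endProcessor) {
            this.startProcessor = startProcessor;
            this.attributeMapper = attributeMapper;
            this.endProcessor = endProcessor;
        }
    }

    /** Streams through the XML and invokes the callbacks of every configured tag. */
    static Map<Class<?>, Object> parse(String xml, Map<String, TagSettings> settingsByTag) {
        Map<Class<?>, Object> branch = new HashMap<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType != XMLStreamConstants.START_ELEMENT
                        && eventType != XMLStreamConstants.END_ELEMENT) {
                    continue; // text, comments and so on are not handled in this sketch
                }
                TagSettings settings = settingsByTag.get(reader.getLocalName());
                if (settings == null) {
                    continue; // tags without settings are skipped
                }
                if (eventType == XMLStreamConstants.START_ELEMENT) {
                    settings.startProcessor.accept(branch);
                    Map<String, String> attributes = new HashMap<>();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        attributes.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    settings.attributeMapper.accept(branch, attributes);
                } else {
                    settings.endProcessor.accept(branch);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return branch;
    }

    /** Usage: collect the name attribute of every market tag, ignoring all other tags. */
    @SuppressWarnings("unchecked")
    static List<String> marketNames(String xml) {
        Map<String, TagSettings> settingsByTag = new HashMap<>();
        settingsByTag.put("event", new TagSettings(
                branch -> branch.put(ArrayList.class, new ArrayList<String>()),
                (branch, attributes) -> { },
                branch -> { }));
        settingsByTag.put("market", new TagSettings(
                branch -> { },
                (branch, attributes) ->
                        ((List<String>) branch.get(ArrayList.class)).add(attributes.get("name")),
                branch -> { }));
        return (List<String>) parse(xml, settingsByTag).get(ArrayList.class);
    }
}
```

Because the settings are keyed by tag name rather than by Java class, the market entry in this sketch writes into an object that was created by the event entry, which mirrors the statement that a tag does not need a Java object of its own.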

The Object Branch reference is also available in the javadoc, but a first-time user may need a more conceptual description. This is elaborated in the rest of this document.

The Object Branch is simply a map of unique objects. It is not possible for two objects derived from the same class to exist in the object branch at the same time. There is no relationship between the objects in the object branch, as one might expect a branch to have. This is not needed, since the object branch is supposed to represent only the present branch at any given time. If a new Java object is created as the result of an XML start tag, it is pushed to the object branch. When the XML end tag is processed, the object is removed from the object branch since it is now "out of scope".
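As a rough mental model, the behaviour described above can be sketched as a map keyed by class. The class `ObjectBranchSketch` and its `push`, `get` and `pop` methods are hypothetical and are not the library's own Object Branch type; the sketch assumes the one-object-per-class rule stated in the paragraph above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Object Branch concept; not the library's implementation.
final class ObjectBranchSketch {
    private final Map<Class<?>, Object> objects = new LinkedHashMap<>();

    /** Called when a start tag creates a new object: the object enters the current branch. */
    <T> void push(Class<T> type, T instance) {
        if (objects.containsKey(type)) {
            throw new IllegalStateException("Branch already holds a " + type.getSimpleName());
        }
        objects.put(type, instance);
    }

    /** Looks up the object of the given class, or returns null if none is in scope. */
    <T> T get(Class<T> type) {
        return type.cast(objects.get(type));
    }

    /** Called when the end tag is processed: the object leaves the current branch. */
    <T> T pop(Class<T> type) {
        return type.cast(objects.remove(type));
    }
}
```

Keying by class is what makes a lookup such as "give me the current Event" possible without tracking any parent-child relationship between the stored objects.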

When doing 1-to-m parsing, one document parser may push several result objects to the Object Branch. These result objects are not connected and thus not part of a common tree, but totally independent. After the parsing is done, all objects pushed to the Object Branch are available as part of the result through the documentParser.getResult(CLASS) API invocation.

Installation

NGen XML Parser is available at Maven Central. Maven users can include NGen XML Parser as a dependency by adding this to the project pom.xml file:

<dependencies>
    <dependency>
      <groupId>com.mobenga.ngen.xml</groupId>
      <artifactId>ngen-xml-parser</artifactId>
      <version>1.0.0</version>
    </dependency>
</dependencies>

NGen XML Parser Code Examples

Content

  • Java Usage Example 1 describes the usage of the XML parser for a 1-to-1 mapping.
  • Java Usage Example 2 describes the usage of the XML parser for a 1-to-m mapping.
  • Java Usage Example 3 describes the usage of the XML parser for an n-to-1 mapping.
  • Action Sequence 1 describes in text a mapping from one XML file to several Java objects.
  • Example 1 describes a direct mapping from an XML file to a Java object structure that reflects the XML.
  • Example 2 describes a mapping from an XML file to a Java object structure, where the XML structure contains outer elements with information that shall be duplicated in Java objects further down the logical tree.
  • Example 3 describes a mapping from an XML file to a Java object structure, where the XML structure contains inner elements with information that shall be available in a list of primitives in Java objects on an upper level of the logical tree.
  • Example 4 describes a mapping from two XML files to a single Java object structure.

Java Usage Example 1

This example maps a single XML to a Java object. We assume that a few methods and classes are available:

 - getCallUrl(eventId) // URL where to fetch the XML files.
 - XmlMappings() // creates a new mapping class (com.mobenga.ngen.xml.parser.Mappings) for the XML.
    public Event mapData(String eventId) {

        // Setup the parser for the expected data - Event data
        DocumentParser documentParser = new DocumentParser(new XmlMappings());

        // Parse data
        return xmlDataFetcher.fetchAndParseXml(getCallUrl(eventId), documentParser, Event.class);
    }

Java Usage Example 2

This example maps a single XML to two different Java objects. We assume that a few methods and classes are available:

 - getCallUrl(eventId) // URL where to fetch the XML file.
 - XmlMappings() // creates a new mapping class (com.mobenga.ngen.xml.parser.Mappings) for the XML.
public Object[] mapData(String eventId) {
    // Setup the parser for the expected data - Event data
    DocumentParser documentParser = new DocumentParser(new XmlMappings());

    // Parse data
    xmlDataFetcher.fetchAndParseXml(getCallUrl(eventId), documentParser, null);

    MyDataClassOne myDataClassOne = documentParser.getResult(MyDataClassOne.class);
    MyDataClassTwo myDataClassTwo = documentParser.getResult(MyDataClassTwo.class);

    return new Object[]{myDataClassOne, myDataClassTwo};
}

Java Usage Example 3

This example maps two different XMLs to a common Java object. The two XMLs are not available at the same time (they are the results of two different API calls), so the parsed data is stored between the calls. We assume that a few methods and classes are available:

 - getCurrentDataForEventOrCreateNewObject(id) // get the previously parsed data or a new data object if no previous data is available.
 - getCallOneUrl(eventId) // URL where to fetch the first XML file.
 - getCallTwoUrl(eventId) // URL where to fetch the second XML file.
 - XmlOneMappings() // creates a new mapping class (com.mobenga.ngen.xml.parser.Mappings) for the first XML.
 - XmlTwoMappings() // creates a new mapping class (com.mobenga.ngen.xml.parser.Mappings) for the second XML.
    public Event mapData1(String eventId) {
        // Get current Event Data Component and add data from this Content API call
        Event event = getCurrentDataForEventOrCreateNewObject(eventId);

        // Provide current Event data to the XML Parser
        ProtectedClassMap objectBranch = new ProtectedClassMap(Event.class, event);

        // Setup the parser for the expected data - Event data
        DocumentParser documentParser = new DocumentParser(new XmlOneMappings(), objectBranch);

        // Parse data
        return xmlDataFetcher.fetchAndParseXml(getCallOneUrl(eventId), documentParser, Event.class);
    }

    public Event mapData2(String eventId) {
        // Get current Event Data Component and add data from this Content API call
        Event event = getCurrentDataForEventOrCreateNewObject(eventId);

        // Provide current Event data to the XML Parser
        ProtectedClassMap objectBranch = new ProtectedClassMap(Event.class, event);

        // Setup the parser for the expected data - Event data
        DocumentParser documentParser = new DocumentParser(new XmlTwoMappings(), objectBranch);

        // Parse data
        return xmlDataFetcher.fetchAndParseXml(getCallTwoUrl(eventId), documentParser, Event.class);
    }

Action Sequence 1

In this example the following XML will be parsed:

<sport name="Football">
 <league name="Premier League">
  <event id="1" name ="LEICESTER - NORWICH">
   <market id="1" name="Total Goals - Over/Under 2.5">
    <selection id="1" name="over">
     <price value="1.72"/>
    </selection>
    <selection id="1" name="under">
     <price value="2.00"/>
    </selection>
   </market>
  </event>
 </league>
 <league name="Champions League">
  <event id="2" name ="ARSENAL - BARCELONA">
   <market id="1" name="Total Goals - Over/Under 2.5">
    <selection id="1" name="over">
     <price value="1.61"/>
    </selection>
    <selection id="1" name="under">
     <price value="2.20"/>
    </selection>
   </market>
  </event>
 </league>
</sport>

The desired result is a list of all events, where the information about sport and league is stored in each event (in the Java data structure). Also, the Java structure does not have individual objects for price; this information is stored in the selection object.

The table below walks through a typical XML structure (in the left column) that is built into a Java data structure in the form of a tree. The Java data structure and the XML structure are shown in the image below. In the table below the image, each row describes a single step in the XML parsing process. Walk through the table to follow the process step by step as the StAX parser works its way through the XML tag by tag. (More elaborate examples, including source code, are available in Examples 1 to 4 below.)

| XML tag | Action | New Java Object | Object Branch |
|---|---|---|---|
| `<sport>` | Sport name is extracted. A new proprietary Java object for data storage is created and the sport name is stored there. The data storage object is pushed to the object branch. An ArrayList of Events is also created and pushed to the object branch; this list is the expected end result of the parsing. | DataStorage, `ArrayList<Event>` | DataStorage, `ArrayList<Event>` |
| `<league>` | League name is extracted. The data storage object is fetched from the object branch and the league name is stored there. | - | DataStorage, `ArrayList<Event>` |
| `<event>` | Event information is extracted. This will be mapped to a Java object called Event. The ArrayList object is fetched from the object branch and the Event is added to this list. The data storage object is also fetched from the object branch, and the sport and league names are read and stored in the Event object. The Event object is pushed to the object branch. | Event | DataStorage, `ArrayList<Event>`, Event |
| `<market>` | Market information is extracted. A new Java object of type Market is instantiated and the information is stored there. The Event object is fetched from the object branch and the market object is added to a list of markets for the event. The market object is pushed to the object branch. | Market | DataStorage, `ArrayList<Event>`, Event, Market |
| `<selection>` | Selection information is extracted. A new Java object of type Selection is instantiated and the information is stored there. The Market object is fetched from the object branch and the selection object is added to a list of selections for the market. The selection object is pushed to the object branch. | Selection | DataStorage, `ArrayList<Event>`, Event, Market, Selection |
| `<price/>` | Price information is extracted. No new Java object is created. The Selection object is fetched from the object branch and the price information is mapped directly to the selection object. | - | Same as above |
| `</selection>` | Remove the Selection object from the object branch. | - | DataStorage, `ArrayList<Event>`, Event, Market |
| `<selection>` | Selection information is extracted for the second selection. A new Java object of type Selection is instantiated and the information is stored there. The Market object is fetched from the object branch and the selection object is added to a list of selections for the market. The selection object is pushed to the object branch. | Selection | DataStorage, `ArrayList<Event>`, Event, Market, Selection (no 2) |
| `<price/>` | Price information is extracted. No new Java object is created. The Selection object is fetched from the object branch and the price information is mapped directly to the selection object. It is now the second selection object in the market's selection list that is present on the object branch. | - | Same as above |
| `</selection>` | Remove the Selection object from the object branch. | - | DataStorage, `ArrayList<Event>`, Event, Market |
| `</market>` | Remove the Market object from the object branch. | - | DataStorage, `ArrayList<Event>`, Event |
| `</event>` | Remove the Event object from the object branch. | - | DataStorage, `ArrayList<Event>` |
| `</league>` | Fetch the data storage object from the object branch and set the league name to null, since it is no longer valid in the XML scope. | - | DataStorage, `ArrayList<Event>` |
| `</sport>` | Remove the DataStorage object from the object branch. | - | `ArrayList<Event>` |
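The same sequence of actions can be written out by hand with the JDK's StAX reader, which may help when comparing against the NGen settings. This is a plain-Java sketch under stated assumptions, not the NGen XML Parser: the classes `Event`, `Market` and `Selection` and the method `parse` are hypothetical, local variables play the role of the object branch, and the `sport` and `league` variables play the role of the DataStorage object.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hand-written StAX walk-through of the action sequence; all types here are hypothetical.
final class ActionSequenceSketch {
    static final class Selection { String name; double price; }
    static final class Market { String name; final List<Selection> selections = new ArrayList<>(); }
    static final class Event {
        String name, sport, league;
        final List<Market> markets = new ArrayList<>();
    }

    static List<Event> parse(String xml) {
        List<Event> events = new ArrayList<>();   // the expected end result
        String sport = null, league = null;       // stands in for the DataStorage object
        Event event = null;                       // the three variables below are the working branch
        Market market = null;
        Selection selection = null;
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int type = r.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    switch (r.getLocalName()) {
                        case "sport": sport = r.getAttributeValue(null, "name"); break;
                        case "league": league = r.getAttributeValue(null, "name"); break;
                        case "event":
                            event = new Event();
                            event.name = r.getAttributeValue(null, "name");
                            event.sport = sport;      // outer information copied into the event
                            event.league = league;
                            events.add(event);
                            break;
                        case "market":
                            market = new Market();
                            market.name = r.getAttributeValue(null, "name");
                            event.markets.add(market);
                            break;
                        case "selection":
                            selection = new Selection();
                            selection.name = r.getAttributeValue(null, "name");
                            market.selections.add(selection);
                            break;
                        case "price": // no object of its own: stored in the current selection
                            selection.price = Double.parseDouble(r.getAttributeValue(null, "value"));
                            break;
                        default: break;
                    }
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    switch (r.getLocalName()) {   // objects leave the branch on their end tag
                        case "selection": selection = null; break;
                        case "market": market = null; break;
                        case "event": event = null; break;
                        case "league": league = null; break;
                        default: break;
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return events;
    }
}
```

The sketch assumes well-nested input of the shape shown above; a price tag outside a selection, for example, would fail with a NullPointerException.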

Example 1

In this example the following XML will be parsed

<event id="1" name ="Foo">
 Main Market Name
 <market id="1" name="Mkt Foo">
 </market>
 <market id="2" name="Mkt Bar">
 </market>
</event>

The result is a Java object structure consisting of one event with a list of two markets. The table below shows the flow when parsing the XML.

| XML Line | NGen XML Parser Event | Java Object Event | Object Branch Content After Operation |
|---|---|---|---|
| `<event id="1" name ="Foo">` | Start Processor | new Event() | Event.class (the new event) |
| -"- | Attribute Mappings | get event from obj branch; event.setId("1"); event.setName("Foo") | Event.class |
| Main Market Name | Element Text Mappings | event.setMainMarket("Main Market Name") | Event.class |
| `<market id="1" name="Mkt Foo">` | Start Processor | new Market() | Event.class, Market.class (the new market) |
| -"- | Attribute Mappings | market.setId("1"); market.setName("Mkt Foo") | Event.class, Market.class |
| `</market>` | End Processor | get event from obj branch; pop market from obj branch; event.getMarkets().add(market) | Event.class |
| `<market id="2" name="Mkt Bar">` | Start Processor | new Market() | Event.class, Market.class (the new market) |
| -"- | Attribute Mappings | market.setId("2"); market.setName("Mkt Bar") | Event.class, Market.class |
| `</market>` | End Processor | get event from obj branch; pop market from obj branch; event.getMarkets().add(market) | Event.class |
| `</event>` | End Processor | Do nothing | Event.class |

Finally, the Java settings that drive the flow described above are found in the file EventMapperExample1.java.

Here is a unit test that executes the code.

Example 2

In this example the following XML will be parsed

<sport name="Football">
 <event id="1" name ="Foo">
  <market id="1" name="Mkt Foo">
  </market>
 </event>
</sport>

The result is a Java object structure consisting of one event that contains the sport name and a list of one market. The table below shows the flow when parsing the XML. The sport name is temporarily stored in an object called ProprietaryDataStorage.

| XML Line | NGen XML Parser Event | Java Object Event | Object Branch Content After Operation |
|---|---|---|---|
| `<sport name="Football">` | Start Processor | new ProprietaryDataStorage() | ProprietaryDataStorage.class |
| -"- | Attribute Mappings | Get data storage from obj branch; myDataStorage.setSport("Football") | ProprietaryDataStorage.class |
| `<event id="1" name ="Foo">` | Start Processor | Get data storage from obj branch; new Event(); event.setSport(myDataStorage.getSport()) | ProprietaryDataStorage.class, Event.class (the new event) |
| -"- | Attribute Mappings | Get event from obj branch; event.setId("1"); event.setName("Foo") | ProprietaryDataStorage.class, Event.class |
| `<market id="1" name="Mkt Foo">` | Start Processor | new Market() | ProprietaryDataStorage.class, Event.class, Market.class (the new market) |
| -"- | Attribute Mappings | market.setId("1"); market.setName("Mkt Foo") | ProprietaryDataStorage.class, Event.class, Market.class |
| `</market>` | End Processor | Get event from obj branch; pop market from obj branch; event.getMarkets().add(market) | ProprietaryDataStorage.class, Event.class |
| `</event>` | End Processor | Do nothing | ProprietaryDataStorage.class, Event.class |
| `</sport>` | End Processor | Pop data storage from obj branch | Event.class |

Finally, the Java settings that drive the flow described above are found in EventMapperExample2.java.

Here is a unit test that executes the code.

Example 3

In this example the following XML will be parsed. Selections will not be represented by objects in the Market object, but just as a list of names.

<event id="1" name ="Foo">
 <market id="1" name="Mkt Foo">
  <selection id="1" name="Sel Foo">
   <price odds_frac="1/3" odds_dec="1.33"/>
  </selection>
 </market>
</event>

The result is a Java object structure consisting of one event with one market, where the market holds its selections as a list of names. The table below shows the flow when parsing the XML.

| XML Line | NGen XML Parser Event | Java Object Event | Object Branch Content After Operation |
|---|---|---|---|
| `<event id="1" name ="Foo">` | Start Processor | Get data storage from obj branch; new Event(); event.setSport(myDataStorage.getSport()) | Event.class (the new event) |
| -"- | Attribute Mappings | Get event from obj branch; event.setId("1"); event.setName("Foo") | Event.class |
| `<market id="1" name="Mkt Foo">` | Start Processor | new Market() | Event.class, Market.class (the new market) |
| -"- | Attribute Mappings | market.setId("1"); market.setName("Mkt Foo") | Event.class, Market.class |
| `<selection id="1" name="Sel Foo"/>` | Start Processor | Do nothing | Event.class, Market.class |
| -"- | Attribute Mappings | market.addSelection("Sel Foo") | Event.class, Market.class |
| -"- | End Processor | Do nothing | Event.class, Market.class |
| `<selection id="1" name="Sel Bar"/>` | Start Processor | Do nothing | Event.class, Market.class |
| -"- | Attribute Mappings | market.addSelection("Sel Bar") | Event.class, Market.class |
| -"- | End Processor | Do nothing | Event.class, Market.class |
| `</market>` | End Processor | Get event from obj branch; pop market from obj branch; event.getMarkets().add(market) | Event.class |
| `</event>` | End Processor | Do nothing | Event.class |

Finally, the Java settings that drive the flow described above are found in EventMapperExample3.java.

Here is a unit test that executes the code.
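For comparison, the core idea of this example (inner elements collapsed into a list of primitives on an upper-level object) can be sketched with the JDK's StAX reader alone. The code below is an illustration under assumptions and not the content of EventMapperExample3.java: `SelectionNamesSketch`, `Event`, `Market` and `parse` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: selections become plain names in the market, not objects.
final class SelectionNamesSketch {
    static final class Market {
        String id, name;
        final List<String> selectionNames = new ArrayList<>();
    }
    static final class Event {
        String id, name;
        final List<Market> markets = new ArrayList<>();
    }

    static Event parse(String xml) {
        Event event = null;
        Market market = null; // the market currently in scope, if any
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int type = r.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    String tag = r.getLocalName();
                    if ("event".equals(tag)) {
                        event = new Event();
                        event.id = r.getAttributeValue(null, "id");
                        event.name = r.getAttributeValue(null, "name");
                    } else if ("market".equals(tag)) {
                        market = new Market();
                        market.id = r.getAttributeValue(null, "id");
                        market.name = r.getAttributeValue(null, "name");
                    } else if ("selection".equals(tag) && market != null) {
                        // No Selection object is created; only the name is kept.
                        market.selectionNames.add(r.getAttributeValue(null, "name"));
                    }
                    // Any other tag, such as price, is simply ignored.
                } else if (type == XMLStreamConstants.END_ELEMENT
                        && "market".equals(r.getLocalName())
                        && event != null && market != null) {
                    event.markets.add(market); // attach on the end tag, then leave scope
                    market = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return event;
    }
}
```

As in the table, the market is attached to the event when its end tag is reached, while the selection tag never produces an object of its own.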

Example 4

This example describes a mapping from two XML files to a single Java object structure. The following two XML files will be parsed. The second XML contains more information that shall be added to a specific market.

<event id="1" name ="Foo">
 Main Market Name
 <market id="1" name="Mkt Foo">
 </market>
 <market id="2" name="Mkt Bar">
 </market>
</event>

<event id="1">
 <market id="1">
  <selection id="1" name="Sel Foo" price="7/5"/>
  <selection id="2" name="Sel Bar" price="5/3"/>
  <selection id="3" name="Sel Mitzvah" price="3/1"/>
 </market>
</event>

The first XML is parsed according to Example 1.

When the second document parser is instantiated, it is set up not only with the parser settings for the second XML but also with the output data from the first parser. This imitates a second API call that extends the existing data. The old data is injected by pre-populating an Object Branch and then using that Object Branch in the second parsing. The Java setup is shown in the method testEventMapperExample4 of EventMapperExampleTest.java on GitHub.

The result is a Java object structure consisting of one event with a list of two markets. The table below shows the flow when parsing the second XML. Note that the Start Processor is used. The Start Processor is a callback that is invoked each time a start tag for this XML element is found; see the javadoc for ElementParserSettings#setElementStartProcessor.

| XML Line | NGen XML Parser Event | Java Object Event | Object Branch Content After Operation |
|---|---|---|---|
| `<event id="1">` | Start Processor | Do nothing | Event.class (the old event) |
| -"- | Attribute Mappings | Do nothing | Event.class |
| `<market id="1">` | Start Processor(Market, id) | Get event from obj branch; search market in event by ID; push found market to obj branch | Event.class, Market.class (the found market) |
| -"- | Attribute Mappings | Do nothing | Event.class, Market.class |
| `<selection id="1" name="Sel Foo" price="7/5"/>` | Start Processor | Create a Selection object; push Selection object to obj branch | Event.class, Market.class, Selection.class |
| -"- | Attribute Mappings | Map attributes to Selection object | Event.class, Market.class, Selection.class |
| -"- | End Processor | Add Selection object to Market; pop Selection object from obj branch | Event.class, Market.class |
| `<selection id="2" name="Sel Bar" price="5/3"/>` | Start Processor | Create a Selection object; push Selection object to obj branch | Event.class, Market.class, Selection.class |
| -"- | Attribute Mappings | Map attributes to Selection object | Event.class, Market.class, Selection.class |
| -"- | End Processor | Add Selection object to Market; pop Selection object from obj branch | Event.class, Market.class |
| `<selection id="3" name="Sel Mitzvah" price="3/1"/>` | Start Processor | Create a Selection object; push Selection object to obj branch | Event.class, Market.class, Selection.class |
| -"- | Attribute Mappings | Map attributes to Selection object | Event.class, Market.class, Selection.class |
| -"- | End Processor | Add Selection object to Market; pop Selection object from obj branch | Event.class, Market.class |
| `</market>` | End Processor | Pop market from obj branch | Event.class |
| `</event>` | End Processor | Do nothing | Event.class |

Finally, the Java settings that drive the flow described above are found in EventMapperExample4.java.

Here is a unit test that executes the code.
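The merge step of this example, where a second XML extends objects produced by an earlier parse, can also be sketched in plain StAX. This is an assumption-laden illustration rather than the content of EventMapperExample4.java: `MergeSecondXmlSketch`, `merge`, `findMarket` and the data classes are hypothetical names, and the existing event is passed in directly instead of through a pre-populated Object Branch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of extending already parsed data with a second XML document.
final class MergeSecondXmlSketch {
    static final class Selection { String id, name, price; }
    static final class Market {
        final String id;
        final List<Selection> selections = new ArrayList<>();
        Market(String id) { this.id = id; }
    }
    static final class Event { final List<Market> markets = new ArrayList<>(); }

    /** Adds the selections found in the XML to the matching markets of an existing event. */
    static void merge(Event existing, String xml) {
        Market current = null; // the market found by id, in scope until its end tag
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int type = r.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    String tag = r.getLocalName();
                    if ("market".equals(tag)) {
                        // Start processor: look up the old market instead of creating a new one.
                        current = findMarket(existing, r.getAttributeValue(null, "id"));
                    } else if ("selection".equals(tag) && current != null) {
                        Selection selection = new Selection();
                        selection.id = r.getAttributeValue(null, "id");
                        selection.name = r.getAttributeValue(null, "name");
                        selection.price = r.getAttributeValue(null, "price");
                        current.selections.add(selection);
                    }
                } else if (type == XMLStreamConstants.END_ELEMENT && "market".equals(r.getLocalName())) {
                    current = null; // the market leaves scope
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    private static Market findMarket(Event event, String id) {
        for (Market market : event.markets) {
            if (market.id.equals(id)) {
                return market;
            }
        }
        return null; // unknown markets are ignored in this sketch
    }
}
```

Selections that belong to a market id not present in the existing event are dropped here; whether to ignore, create or reject such markets is a design choice the sketch does not try to settle.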