Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finish shaving the yak for <xs:all> elements #1

Closed
duck2 opened this issue Jun 27, 2019 · 16 comments
Closed

Finish shaving the yak for <xs:all> elements #1

duck2 opened this issue Jun 27, 2019 · 16 comments

Comments

@duck2
Copy link
Member

duck2 commented Jun 27, 2019

This has really turned into a snake story.

What we are seeking is a solution to the state explosion problem when we want to accept N independent inputs, in any order.

The solution is running N parallel state machines:
fsm

The problem with the solution is knowing how to run parallel state machines. When generating a single state machine, it is OK to generate something like this:

while(1){
    switch(state){
    case 0:
        if(inp == "sizing"){
            load(current_element);
            state = 1;
        } else {
            error("expected sizing, found %s" % inp);
        }
        break;
    case 1:
        if(inp == "end of input"){
            goto accept;
        } else {
            error("expected end of input, found %s" % inp);
        }
        break;
    }
    if(next_element) inp = next_element.name;
    else inp = "end of input";
}
accept:
    return;

When generating several state machines, we can:

  1. Make state machines which silently accept inputs unrelated to them:

modifiedfsm

  1. Add dispatching code which first checks the input and passes it only to the relevant state machine(s).

We also need to know when to accept the input. For that, we would need to check if each of the parallel state machines ended up in an accepting state.

The code for both are very inelegant. If we make silently accepting state machines, it's something like this:

while(1){
    switch(state1){
    case 0:
        if(inp == "sizing"){
            load(current_element);
            state1 = 1;
        } else if(inp == "connection_block" || inp == "default_fc") {
            /* do nothing */
        } else {
            error("expected sizing or connection_block or default_fc, found %s" % inp);
        }
        break;
    case 1:
        if(inp == "end of input"){
            accept[0] = 1;
        } else if(inp == "connection_block" || inp == "default_fc") {
            /* do nothing */
        } else {
            error("expected end of input or connection_block or default_fc, found %s" % inp);
        }
        break;
    }
    switch(state2){
    case 0:
        if(inp == "connection_block"){
            load(current_element);
            state1 = 1;
        } else if(inp == "sizing" || inp == "default_fc") {
            /* do nothing */
    [...]
    if(all_accepted(accept)) break;
    if(next_element) inp = next_element.name;
    else inp = "end of input";
}

The code for the second would look like this:

while(1){
    if(inp == "sizing" || inp == "end of input"){
        switch(state1){
        case 0:
            if(inp == "sizing"){
                load(current_element);
                state1 = 1;
        } else {
            error("expected sizing, found %s" % inp);
        }
        break;
        case 1:
            if(inp == "end of input"){
                accept[0] = 1;
            } else {
                error("expected end of input, found %s" % inp);
            }
        break;
        }
    } else if(inp == "connection_block" || inp == "end of input") {
        switch(state2){
        case 0:
            if(inp == "connection_block"){
                load(current_element);
                state1 = 1;
            } else {
    [...]
    if(all_accepted(accept)) break;
    if(next_element) inp = next_element.name;
    else inp = "end of input";
}

which are definitely not OK to generate, since the code is too long and it's not easy to see what it's doing. On top of that, error messages and checking for accepted states are not complete. I see no way to elegantly implement them in a while-switch loop either.

@duck2
Copy link
Member Author

duck2 commented Jun 27, 2019

A good step to take here is bringing in state tables for transitions. If we had a magic state table where we could look up next_state = states[current_state][current_input], a single state machine would reduce to this:

states = {-1};
states[0][SIZING] = 1;
states[1][END] = ACCEPT;
[...]
while(1){
    state = states[state][inp];
    if(state == -1) error(state, inp); /* can read the table and see what was expected */
    if(state == ACCEPT) break;
    if(next_element) inp = tokenize(next_element.name);
    else inp = END;
}

And, multiple state machines could be run like this:

states0 = {-1};
states0[0][SIZING] = 1;
states0[1][END] = ACCEPT;
states1 = {-1};
states1[0][CONNECTION_BLOCK] = 1;
states1[1][END] = ACCEPT;
[...]
while(1){
    if(inp == SIZING || inp == END){
        state0 = states0[state0][inp];
    }else if(inp == CONNECTION_BLOCK || inp == END){
        state1 = states1[state1][inp];
    }
    if(state0 == -1 || state1 == -1) error(state0, state1, inp); /* I'm sure this would work out. */
    if(state0 == ACCEPT && state1 == ACCEPT) break;
    if(next_element) inp = tokenize(next_element.name);
    else inp = END;
}

This makes much more sense, there aren't countless string comparisons and indentation levels. But this too, misses an important point. The state transitions have actions associated with them. How do we load the elements?

The tokenize function can look like magic, but I'm sure we could have a perfect hash function to reduce every tokenize to a hash+strcmp. I think even if we don't implement it, ~6-7 strcmps for every element won't hurt anyone.

@duck2
Copy link
Member Author

duck2 commented Jun 27, 2019

About the actions, I think I can bail out to using if statements, since typically only a few state transitions in xs:alls have actions associated with them:

states0 = {-1};
states0[0][SIZING] = 1;
states0[1][END] = ACCEPT;
states1 = {-1};
states1[0][CONNECTION_BLOCK] = 1;
states1[1][END] = ACCEPT;
[...]
while(1){
    if(inp == SIZING || inp == END){
        state0 = states0[state0][inp];
        if(state0 == 0 && inp == SIZING){
            load_element(current_node); /* This isn't the same code for every field. If it was so, we could make something more elegant. */
        }
    }else if(inp == CONNECTION_BLOCK || inp == END){
        state1 = states1[state1][inp];
        if(state1 == 1 && inp == CONNECTION_BLOCK){
            load_element(current_node);
        }
    }
    if(state0 == -1 || state1 == -1) error(state0, state1, inp); /* I'm sure this would work out. */
    if(state0 == ACCEPT && state1 == ACCEPT) break;
    if(next_element) inp = tokenize(next_element.name);
    else inp = END;
}

At least we have separated the element loading logic(side effect) from the validation&error logic(pure), which is a win.

Phew. Now, this is a lot of work to do in Python, but at least I poured my floating thoughts about this here.

@mithro
Copy link
Member

mithro commented Jun 28, 2019

@duck2 - So the only <xs:all> I could find in the xsd files we care about is the following;

  <xs:complexType name="device">
    <xs:all>
      <xs:element name="sizing">
        <xs:complexType>
          <xs:attribute name="R_minW_nmos" type="xs:double" use="required"/>
          <xs:attribute name="R_minW_pmos" type="xs:double" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="connection_block">
        <xs:complexType>
          <xs:attribute name="input_switch_name" type="xs:string" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="area">
        <xs:complexType>
          <xs:attribute name="grid_logic_tile_area" type="xs:double" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="switch_block">
        <xs:complexType>
          <xs:attribute name="type" type="switch_block_type" use="required"/>
          <xs:attribute name="fs" type="xs:int"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="chan_width_distr" minOccurs="0" type="chan_width_distr"/>
      <xs:element name="default_fc" minOccurs="0">
        <xs:complexType>
          <xs:attribute name="in_type" type="fc_type_enum"/>
          <xs:attribute name="in_val" type="xs:double"/>
          <xs:attribute name="out_type" type="fc_type_enum"/>
          <xs:attribute name="out_val" type="xs:double"/>
        </xs:complexType>
      </xs:element>
    </xs:all>
  </xs:complexType>

@mithro
Copy link
Member

mithro commented Jun 28, 2019

Ignoring the inside of each of the xs:element, you get the following;

  <xs:complexType name="device">
    <xs:all>
      <xs:element name="sizing"/>
      <xs:element name="connection_block"/>
      <xs:element name="area"/>
      <xs:element name="switch_block"/>
      <xs:element name="chan_width_distr" minOccurs="0" type="chan_width_distr"/>
      <xs:element name="default_fc" minOccurs="0"/>
    </xs:all>
  </xs:complexType>

@mithro
Copy link
Member

mithro commented Jun 28, 2019

This says, you want;

  • One sizing element.
  • One connection_block element.
  • One area element.
  • One switch_block element.
  • Zero or One chan_width_distr element.
  • Zero or One default_fc element.

@mithro
Copy link
Member

mithro commented Jun 28, 2019

Then you could end up with something like this?

def validate_device(device):
  if 'sizing' not in device:
    raise TypeError('sizing tag missing from device tag')
  if 'connection_block' not in device:
    raise TypeError('connection_block tag missing from device tag')
  ...

def parse_sizing(tag):
  ....

def parse_connection_block(tag):
  connection_block = {}
  ....
  validate_connection_block(connection_block)
  return connection_block

def parse_device(tag):
  device = {}
  sub_tag_parsers = {
    'sizing': parse_sizing,
    'connection_block': parse_connection_block,
    'area': parse_area,
    ...,
  } 
  for child in children:
    if child.tag not in sub_tag_parsers:
       raise TypeError("Unknown tag {} while parsing device".format(child.tag))
    if child.tag in device:
       raise TypeError("Duplicate tag {} while parsing device".format(child.tag))
    device[child.tag] = sub_tag_parsers(child)
  validate_device(device)
  return device

@duck2
Copy link
Member Author

duck2 commented Jun 28, 2019

Putting parse functions in a map is a good idea! I can make another table and stuff function pointers in it.

Unfortunately, counting to validate only works well when we only have <xs:element>s in the model group. If we had something like this:

  <xs:complexType name="device">
    <xs:all>
      <xs:element name="sizing"/>
      <xs:element name="connection_block"/>
      <xs:sequence>
          <xs:choice>
              <xs:element name="area"/>
              <xs:element name="area_legacy"/>
          </xs:choice>
          <xs:element name="switch_block" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:element name="chan_width_distr" minOccurs="0" type="chan_width_distr"/>
      <xs:element name="another_chan_width_distr" type="chan_width_distr" />
      <xs:element name="default_fc" minOccurs="0"/>
    </xs:all>
  </xs:complexType>

then this would get hairy, since we need to count the occurrences of (<area>|<area_legacy>)<switch_block>+ in the input, which requires state machine-like code. If we do that for every member group, we end up with parallel state machines.

@duck2
Copy link
Member Author

duck2 commented Jun 28, 2019

We have only one big xs:all currently, but some types are written as sequences or choices while they really should be alls. For instance:

  <xs:element name="architecture">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="models" type="models"/>
        <xs:element name="layout" type="layout"/>
        <xs:element name="device" type="device"/>
        <xs:element name="switchlist" type="switchlist"/>
        <xs:element name="switchblocklist" type="switchblocklist"/>
        <xs:element name="segmentlist" type="segmentlist"/>
        <xs:element name="power" type="global_power"/>
        <xs:element name="clocks" type="clocks"/>
        <xs:element name="directlist" type="directlist"/>
        <xs:element name="tiles" type="tiles"/>
        <xs:element name="complexblocklist" type="complexblocklist"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

or

  <xs:complexType name="segment">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="sb" type="segment_block"/>
      <xs:element name="cb" type="segment_block"/>
      <xs:element name="mux" type="segment_mux"/>
      <xs:element name="wire_switch" type="segment_wire_switch"/>
      <xs:element name="opin_switch" type="segment_wire_switch"/>
    </xs:choice>
[...]

@duck2
Copy link
Member Author

duck2 commented Jun 28, 2019

This issue also applies to <xs:attribute> tags, since they also occur in any order.

@mithro
Copy link
Member

mithro commented Jun 28, 2019

Then you could end up with something like this?

def parse_arch(tag):
  arch = {
    'models': [],
    'layouts': [],
    ....
  }
  sub_tag_parsers = {
    'model': parse_model,
    'layout': parse_layout,
    ...,
  } 
  for child in children:
    if child.tag not in sub_tag_parsers:
       raise TypeError("Unknown tag {} while parsing device".format(child.tag))
    subparser = sub_tag_parsers[child.tag]
    current = arch.get(child.tag, None)
    if type(arch[child.tag]) == list:
        arch[child.tag].append(subparser(child))
    elif arch[child.tag] is None:
        arch[child.tag] = subparser(child)
   else:
        raise TypeError("Duplicate tag {} while parsing device".format(child.tag))
  validate_arch(arch)
  return arch

@mithro
Copy link
Member

mithro commented Jun 28, 2019

(FYI that parse doesn't quite match what is described in the xsd... The xsd says you can have at most one models tag which can have multiple model tags under it.)

@duck2
Copy link
Member Author

duck2 commented Jun 30, 2019

Turns out xmlschema itself doesn't support sequences in alls, which is a XSD 1.1 feature. This makes everything about the parallel state machines and all void. I'm currently implementing xs:alls with bitsets.

@mithro
Copy link
Member

mithro commented Jun 30, 2019

Can you add support for sequences in alls to xmlschema?

@duck2
Copy link
Member Author

duck2 commented Jun 30, 2019

I thought about that, but my impression is that the maintainer of xmlschema would like to release support for XSD 1.1 when it's ready and it's not complete yet.

It does have an XMLSchema11 class, the definition of which starts like

# ++++ UNDER DEVELOPMENT, DO NOT USE!!! ++++
class XMLSchema11(XMLSchemaBase):
[...]

Currently, that class can parse this XSD:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
	<xsd:group name="hey">
		<xsd:sequence>
			<xsd:element name="greeting" type="xsd:string"></xsd:element>
			<xsd:element name="name" type="xsd:string"></xsd:element>
		</xsd:sequence>
	</xsd:group>
    
	<xsd:complexType name="hello">
		<xsd:all>
			<xsd:group ref="hey"/>
			<xsd:element name="whatsup" type="xsd:int"></xsd:element>
		</xsd:all>
	</xsd:complexType>

	<xsd:element name="hello" type="hello"></xsd:element>
</xsd:schema>

and produce something like this from it:

Xsd11Group(model='all', occurs=[1, 1])
{'_group': [XsdGroup(ref='hey', model='sequence', occurs=[1, 1]),
            Xsd11Element(name='whatsup', occurs=[1, 1])],
 'annotation': None,
 'elem': <Element '{http://www.w3.org/2001/XMLSchema}all' at 0x7f8c14390ae8>,
 'errors': [],
 'model': 'all',
 'name': None,
 'parent': Xsd11ComplexType(name='hello'),
 'schema': XMLSchema11(basename='hello.xsd', namespace=''),
 'validation': 'strict'}

XsdGroup(ref='hey', model='sequence', occurs=[1, 1])
{'_group': [Xsd11Group(name='hey', model='sequence', occurs=[1, 1])],
 'annotation': None,
 'elem': <Element '{http://www.w3.org/2001/XMLSchema}group' at 0x7f8c14390b38>,
 'errors': [],
 'model': 'sequence',
 'name': 'hey',
 'parent': Xsd11Group(model='all', occurs=[1, 1]),
 'schema': XMLSchema11(basename='hello.xsd', namespace=''),
 'validation': 'strict'}

Xsd11Group(name='hey', model='sequence', occurs=[1, 1])
{'_group': [Xsd11Element(name='greeting', occurs=[1, 1]),
            Xsd11Element(name='name', occurs=[1, 1])],
 'annotation': None,
 'elem': <Element '{http://www.w3.org/2001/XMLSchema}group' at 0x7f8c143907c8>,
 'errors': [],
 'model': 'sequence',
 'name': 'hey',
 'schema': XMLSchema11(basename='hello.xsd', namespace=''),
 'validation': 'strict'}

@duck2
Copy link
Member Author

duck2 commented Jun 30, 2019

After reading the W3C standard more carefully, I see group support in xs:all is weird(even in XSD 1.1):

  • You can only put referenced groups(<group ref=...>) in xs:all. It's my fault not reading the standard very carefully before- I saw group in all's possible children and assumed it would be easy to put sequences inside alls.
  • You can't assign min_occurs or max_occurs to anything inside an xs:all.

Now, in this case, we know elements inside alls are only bits(found or not found). Sequences or choices can be run aside as slightly more complicated state machines. Then it's fine to emit elements as bitsets and sequences/choices as state machines.

@duck2
Copy link
Member Author

duck2 commented Jul 21, 2019

I ended up just implementing XSD 1.0 alls. They are good enough for most purposes.

@duck2 duck2 closed this as completed Jul 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants