Machine readable format #84

burner1024 · 2020-02-15T13:48:26Z

I'm thinking that it could be beneficial to have IESDP data in a machine-readable format. Currently there are multiple tools all parsing and reading IE files on their own: DLTCEP, NearInfinity, iesh, probably others. Inevitably, differences and inconsistencies arise. But if the file formats were described in a structured way, we could have a single, truly definitive source of information, and pull updates from it (semi)automatically. (Which is my ulterior motive for BGforge MLS.)

As an example, take a look at the sfall docs: functions are listed in a YAML file, converted to markdown with some Python scripting, and the markdown is published with Jekyll, resulting in a nice site.

If you look at the opcode descriptions, they're already basically YAML, with HTML additions. They could probably be converted into true YAML semi-automatically.

To be clear, this is not about how IESDP looks, just the internal data representation: binary file formats, script actions/triggers, effects.

Tagging @Argent77, @4Luke4, @AvengerTeamBG, @FredrikLindgren, @ALIENQuake.

lynxlynxlynx · 2020-02-15T14:25:29Z

Something like this might have been useful 15 years ago, but now that almost everything is already written, I don't see anyone gaining anything by rewriting all the parsers, writing generators and rewriting the way the data is stored.

On the IESDP side of things the last part may still happen though, since it could then convey the data better (eg. #6 #19 #26).

burner1024 · 2020-02-16T05:36:27Z

> almost everything is already written

MLS isn't, hence this issue.
Obviously, I can't speak for devs of other tools, but MLS would benefit.

@lynxlynxlynx let me put it like this: if such a pull request came your way, would you be opposed to it?

lynxlynxlynx · 2020-02-16T18:48:03Z

Depends on what exactly you'd put in it. E.g. if you want to make the file formats machine readable, that's great and wouldn't need to affect deployment, since Jekyll has the needed support: https://jekyllrb.com/docs/datafiles/

burner1024 · 2020-02-16T19:24:58Z

Do you want to define a format for, say, opcodes? Or should I do that myself?

lynxlynxlynx · 2020-02-16T20:37:39Z

I don't know what you mean, since the opcodes are already easily machine readable. Everything is parametrized and the description is plain html most of the time (a few liquid tags here and there).

burner1024 · 2020-02-17T05:00:12Z

Well, HTML with Liquid parts is not exactly machine readable.
Still, being parametrized is exactly why it's easier to start with them.

To give an example, maybe something like this:
_data/opcodes.yml

```yaml
- n: 0
  name: AC vs. Damage Type Modifier
  type: stat
  param1: AC Modifier
  param2: Type
  bg1: 1
  bg2: 1
  bgee: 1
  iwd1: 1
  iwd2: 0
  pst: 1
  doc: |
    Applies the modifier value specified by the 'AC Modifier' field to the category specified
    by the 'Type' field.

    Known values for 'Type' are:
      - 0   All
      - 1   Crushing
      - 2   Missile
      - 4   Piercing
      - 8   Slashing
      - 16  Base AC setting (sets the target's AC to the value specified by the 'AC Modifier' field.
            If the target's AC is already 'AC Modifier' or below, this effect will do nothing.)

    Each modifier type to AC from this opcode is capped to the range [-20, 20]. Each AC type total is capped to the range [-32768, 32767].

  notes:
    - |
      IWD1 and PST use a slightly different version. The "Base AC" sets to **field - 1** instead.

      IWD2 uses different parameters altogether.
```
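For illustration, a minimal Python sketch of the generator side: it renders one entry in the proposed format into markdown, assuming the YAML has already been loaded into plain dicts (e.g. via PyYAML's `yaml.safe_load`). `render_opcode` and the `GAMES` display names are hypothetical; a real generator would follow the current IESDP templates.

```python
# Hypothetical mapping from data keys to display names.
GAMES = {
    "bg1": "BG1", "bg2": "BG2", "bgee": "BGEE",
    "iwd1": "IWD1", "iwd2": "IWD2", "pst": "PST",
}


def render_opcode(op: dict) -> str:
    """Render one opcode entry (as loaded from YAML) into markdown."""
    # A game is listed only when its flag is 1; 0 or a missing key hides it.
    supported = [name for key, name in GAMES.items() if op.get(key) == 1]
    lines = [
        f"## #{op['n']} {op['name']}",
        "",
        f"Type: {op['type']}",
        f"Games: {', '.join(supported)}",
        f"Parameter 1: {op.get('param1', 'Unused')}",
        f"Parameter 2: {op.get('param2', 'Unused')}",
        "",
        op["doc"].rstrip(),
    ]
    for note in op.get("notes", []):
        lines += ["", f"**Note:** {note.strip()}"]
    return "\n".join(lines) + "\n"


opcode = {
    "n": 0, "name": "AC vs. Damage Type Modifier", "type": "stat",
    "param1": "AC Modifier", "param2": "Type",
    "bg1": 1, "bg2": 1, "bgee": 1, "iwd1": 1, "iwd2": 0, "pst": 1,
    "doc": "Applies the modifier value specified by the 'AC Modifier' field "
           "to the category specified by the 'Type' field.\n",
    "notes": ["IWD2 uses different parameters altogether.\n"],
}
print(render_opcode(opcode))
```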

ALIENQuake · 2020-02-17T06:50:43Z

The "Known values" could also be a YAML key with values, right?

lynxlynxlynx · 2020-02-17T12:51:01Z

The notes are sometimes interspersed in the description, not always at the end. And there can be several of different severities, so the proposed format is not good enough. I also see no way to avoid needing to clean up the descriptions on the user side of things, eg. to get rid of broken links.

And I'm definitely opposed to cramming them all into one file. That's a simple step users can do if they need it.

burner1024 · 2020-02-17T13:40:29Z

> The "Known values" could also be a YAML key with values, right?

Right, but I'm not sure it will be possible to parametrize them and keep the current HTML layout unchanged. And I don't know how I'd use this data yet. If anyone needs it, they are welcome to chime in.
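If the values were parametrized, a generator could still emit today's layout. A sketch, assuming a hypothetical `values` mapping from code to label (not part of the proposed format above):

```python
def render_known_values(field: str, values: dict) -> str:
    """Re-create the "Known values for 'X' are:" list from a code -> label mapping."""
    lines = [f"Known values for '{field}' are:"]
    # Pad codes to the widest one so the labels line up like the current docs.
    width = max(len(str(code)) for code in values)
    for code in sorted(values):
        lines.append(f"  - {str(code).ljust(width)}  {values[code]}")
    return "\n".join(lines)


type_values = {0: "All", 1: "Crushing", 2: "Missile", 4: "Piercing",
               8: "Slashing", 16: "Base AC setting"}
print(render_known_values("Type", type_values))
```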

> The notes are sometimes interspersed in the description, not always at the end. And there can be several of different severities, so the proposed format is not good enough. I also see no way to avoid needing to clean up the descriptions on the user side of things, eg. to get rid of broken links.

Severity is easy to deal with, adding a separate stanza (warning, important) should be enough.
Being interspersed is harder. I'm not sure there's a way to parametrize that properly. I'll search around; if not, I guess keeping the doc monolithic will have to do.
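A sketch of the separate-stanza idea for severities, assuming hypothetical `important`, `warning` and `notes` keys, each holding a list of markdown strings; the rendering order chosen here is also an assumption:

```python
# Hypothetical stanza keys, rendered most severe first.
STANZAS = [("important", "Important"), ("warning", "Warning"), ("notes", "Note")]


def render_stanzas(entry: dict) -> list:
    """Render each severity stanza's markdown strings with a bold label."""
    rendered = []
    for key, label in STANZAS:
        for text in entry.get(key, []):
            rendered.append(f"**{label}:** {text.strip()}")
    return rendered


print(render_stanzas({"notes": ["IWD2 uses different parameters altogether."],
                      "warning": ["Hypothetical warning text."]}))
```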

Could you clarify the bit about broken links? Which ones do you mean, why are they broken?

> And I'm definitely opposed to cramming them all into one file. That's a simple step users can do if they need it.

Jekyll can read data from directories, e.g. _data/opcodes/0.yml. Granted, that starts to sound like simply moving/renaming the opcodes dir, but I'm trying to work out something that will also work for actions and the rest with minimal changes later.
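With one file per opcode, whatever consumes the data has to order entries by number rather than by file name, since as strings `10` sorts before `2`. A minimal importer-side sketch, assuming the hypothetical `_data/opcodes/<n>.yml` layout:

```python
from pathlib import Path


def data_files_by_id(data_dir: Path) -> list:
    """Return (id, path) pairs for numerically named .yml files, sorted by id.

    Files whose stem is not a plain number are ignored.
    """
    pairs = []
    for path in data_dir.glob("*.yml"):
        if path.stem.isdigit():
            pairs.append((int(path.stem), path))
    # Sorting on the integer id gives 0, 2, 10 rather than "0", "10", "2".
    return sorted(pairs)
```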

lynxlynxlynx · 2020-02-17T17:15:02Z

Links: opcodes link to each other via anchors, so you'd end up with broken links unless you replicated the naming and therefore layout (all in one file).

Actions and triggers won't be much better; except for less data, the same problems apply.

burner1024 · 2020-02-19T07:03:13Z

Depending on how this works out, links might be kept and lead back to IESDP:

Maybe it's better to start with something other than opcodes, indeed. Here, I took a shot at actions. Sample data is stored in the data files.
I wanted to avoid duplicating info (action numbers in both file names and the files themselves, etc.), but Liquid is not flexible enough, so the overall system is almost the same as for opcodes, except that a game doesn't need to be set to 0 to filter it out (the key can simply be omitted).
You can launch it locally and check out the BG1 and IWD2 action pages.
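The difference in filtering can be sketched like this, assuming each action entry carries per-game keys like the opcode example (`entries_for_game` and the sample data are hypothetical):

```python
def entries_for_game(entries: list, game: str) -> list:
    """Keep entries present in `game`.

    Opcode data marks absent games explicitly (e.g. `iwd2: 0`); here an
    omitted key means the same thing, so both forms are treated as absent.
    """
    return [entry for entry in entries if entry.get(game)]


# Hypothetical sample data.
actions = [
    {"n": 1, "bg1": 1, "bg2": 1},
    {"n": 2, "bg2": 1, "iwd2": 1},
    {"n": 3, "bg2": 1, "iwd2": 0},
]
print([a["n"] for a in entries_for_game(actions, "iwd2")])
```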

lynxlynxlynx · 2020-02-22T08:28:49Z

Looks good. From what I can tell, action descriptions only use colors (can be replaced) and links (markdownify takes care of that?) on top of what's shown above.

burner1024 · 2020-02-23T08:05:56Z

Loaded BG2 actions.
There are minor differences in styling, mostly because markdownify wraps everything in paragraphs. I made adjustments, but didn't go out of my way to make it exactly the same before getting feedback.
(One thing I did fix intentionally is the ugly code blocks.)

If all's good, the next step would be to load actions from the other files, run comm/md5 to find and delete the identical ones, then use diffs to find and merge those that differ only in wording, and finally move all the rest in as variants.
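The first two steps could look roughly like the following Python sketch: `hashlib.md5` stands in for the md5 comparison and `difflib.SequenceMatcher` for eyeballing diffs. The 0.9 similarity threshold and the input shape (game name mapped to that game's description of one action) are assumptions.

```python
import difflib
import hashlib


def split_duplicates(descriptions: dict, threshold: float = 0.9):
    """Classify per-game copies of one action's description.

    Compares every game's text against the first game's and returns
    (identical, similar, variants) lists of game names.
    """
    games = list(descriptions)
    base = descriptions[games[0]]
    base_hash = hashlib.md5(base.encode()).hexdigest()
    identical, similar, variants = [], [], []
    for game in games[1:]:
        text = descriptions[game]
        if hashlib.md5(text.encode()).hexdigest() == base_hash:
            identical.append(game)  # safe to delete
        elif difflib.SequenceMatcher(None, base, text).ratio() >= threshold:
            similar.append(game)  # differs only in wording, merge by hand
        else:
            variants.append(game)  # keep as a separate variant
    return identical, similar, variants
```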

lynxlynxlynx · 2020-02-23T08:45:10Z

Sounds good, but let's resolve #85 first, so it's clearer how some of the more interesting actions turn up (since you skipped a few).

burner1024 · 2020-02-23T08:46:01Z

I didn't skip anything on purpose, could you point out some?

lynxlynxlynx · 2020-02-23T08:54:32Z

Mostly for #85, but also 349, which doesn't have any colouring. Do something like find | wc -l in the dir and you'll see it doesn't match the max action id + 1.
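Instead of comparing `find | wc -l` with the max action id + 1, listing the missing ids directly makes it easier to tell script mistakes from intentional gaps. A small sketch (the ids would come from the data file names):

```python
def missing_ids(ids) -> list:
    """Return every id between 0 and max(ids) that has no data file."""
    present = set(ids)
    return [i for i in range(max(present) + 1) if i not in present]


print(missing_ids([0, 1, 3, 5]))
```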

burner1024 · 2020-02-23T11:49:03Z

These were missed due to a mistake in the script, now included. Count won't match anyway, since there are gaps (NID*).
349 is a special one: some of its formatting is messed up, but I think it can be corrected later, after adding variants, along with other manual cleanups here and there.
The table in 349 is just styled as code. I don't think it's worth adding a separate kludge just for it, considering that it looks fine, but if you think it's important to have the same background, I guess it can be done.
Pushed updated data.

lynxlynxlynx · 2020-02-23T12:14:48Z

NIDSpecial1 and co are not a gap, just useless to the modder.

burner1024 · 2020-02-23T12:20:51Z

Well, they are counted as one "action" as far as Jekyll is concerned. Otherwise, they'd produce a full list of "not working" actions, so I decided to keep them combined.
If everything looks good so far, please let me know and I'll proceed with variants.

lynxlynxlynx · 2020-02-23T12:24:11Z

It doesn't matter for the output, but for people like you that want it as data, it makes no sense to jumble them together.

burner1024 · 2020-02-23T12:30:33Z

I'm not sure what use they are, beyond completeness. There's certainly no point in adding them to completion; everything that doesn't work will be skipped.
So do you want to separate them?

lynxlynxlynx · 2020-02-23T12:36:42Z

Yes, just for consistency.

burner1024 · 2020-02-23T13:07:10Z

All right. I will do that at manual stage. Anything else?

lynxlynxlynx · 2020-02-23T13:11:24Z

Nothing comes to mind, except that we should do plain BG2 first and iron out any problems before continuing. Also, another PR that touches the EE action list is open, and it would just cause conflicts if it weren't merged first.

burner1024 · 2020-02-23T13:33:28Z

You mean merge upstream? If just BG2 has manual updates applied and merged, that'll make it a little harder to search for differences later. Also, there won't be links to variants, since that data doesn't exist yet.

One more thing I'd like to point out: currently action aliases are added to the same file. How's that?

lynxlynxlynx · 2020-02-23T13:53:45Z

Ok, then wait a bit, since @4Luke4 is almost done.

Aliases I don't like that way, since I think eg. the RES variants are present in some games without the default version, so it would complicate the layout logic to iterate properly. Just keep it KISS and create a separate file.

burner1024 · 2020-02-23T14:11:50Z

I would like to avoid duplicating descriptions in data and displaying duplicates too, but not sure how to do that while allowing for variants. I'll think about that meanwhile.

Edit: ah, the current version doesn't have variant links either, so my second point about merging is moot.

lynxlynxlynx · 2020-02-23T14:42:17Z

Ideally the descriptions wouldn't be duplicates anyway, since they should explain the parameters used.

burner1024 · 2020-02-23T14:57:59Z

There are many Dialogue/Dialog synonyms, though.

lynxlynxlynx · 2020-07-27T19:58:17Z

What about the various tables, eg. for bits. Would you just leave that as-is? I don't know how much you need for your golem.

Offsets as stripped hex sound fine. But with the work you've already done, it'd be simple to compute them. I'd say specify the first and last, compute the in-between and use the last offset as an assert to verify the data still matches.
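A sketch of that suggestion in Python rather than Liquid: offsets are accumulated from field sizes, starting at the first offset, and the documented last offset acts as the assert. Field sizes as plain byte counts and the zero-padded `0x%04x` display format are assumptions.

```python
def compute_offsets(sizes: list, first: int, last: int) -> list:
    """Return padded hex offsets for consecutive fields.

    sizes: byte size of each field, in file order.
    first: offset of the first field.
    last:  documented offset of the last field, used as an assert.
    """
    offsets = [first]
    # Each field starts where the previous one ends.
    for size in sizes[:-1]:
        offsets.append(offsets[-1] + size)
    if offsets[-1] != last:
        raise AssertionError(
            f"computed last offset 0x{offsets[-1]:04x}, data says 0x{last:04x}"
        )
    return [f"0x{offset:04x}" for offset in offsets]


# Hypothetical header: resref (8), strref (4), word (2), word (2).
print(compute_offsets([8, 4, 2, 2], first=0x0, last=0xE))
```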

The spl example has a bunch of "usused" typos btw.

burner1024 · 2020-07-27T20:59:19Z

Golem is a different beast; it's a CI runner for mods. I'm talking about importing into MLS, a VS Code extension.

About other tables, for now I don't see the need. Well, it can be done at any point, so they can wait.

Offsets can indeed be computed, I haven't considered that. I'll see what I can do.

And as for programmatic ids, I don't know whether they'd be useful elsewhere, but they are plain additions, so I can add them to my own fork which would pull the changes from upstream automatically, and then import from that.

burner1024 · 2020-07-27T22:46:46Z

OK, offsets are now calculated automatically. I didn't find a way to assert natively in Jekyll, so here's a kludge, along with another one for padding.

lynxlynxlynx · 2020-07-28T05:52:51Z

Looks fine. Interesting that addition works with those stripped hex values.

burner1024 · 2020-07-28T13:08:52Z

There's just one value left, at the end. I left it in hex, since it doesn't make much difference.

burner1024 · 2020-07-28T15:48:19Z

Stripping paragraph tags doesn't work well when there are multiple paragraphs. It would take yet another custom filter to strip just the leading/trailing ones.
The alternative is to keep using only HTML in descriptions, although that wouldn't be consistent with how opcodes/actions are handled.
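The logic such a filter would need is small. Jekyll filters are Ruby plugins; this Python sketch (with a hypothetical `strip_outer_p`) only illustrates stripping the leading and trailing tags while leaving inner paragraph breaks alone:

```python
import re


def strip_outer_p(html: str) -> str:
    """Strip only the leading <p> and trailing </p> from rendered markdown.

    Unlike removing every paragraph tag, the inner tags of multi-paragraph
    text are left in place, so the paragraph break survives.
    """
    html = html.strip()
    html = re.sub(r"\A<p>", "", html)
    html = re.sub(r"</p>\Z", "", html)
    return html


print(strip_outer_p("<p>One</p>\n<p>Two</p>\n"))
```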

lynxlynxlynx · 2020-07-28T15:57:43Z

If someone cares enough, they can always clean it up into markdown first. No need to pile too much work on yourself.

burner1024 · 2020-07-28T16:22:03Z

I didn't consider it at first, but offset descriptions can actually be imported too, for tooltips. And VS Code does use markdown under the hood. So in fact it's all the same to me; it's really down to your preference for keeping them in HTML or markdown.

lynxlynxlynx · 2020-07-28T16:39:03Z

HTML is more versatile, but if everything we need is possible through markdown, I see no objection to migrating, if that's better for the plugin.

burner1024 · 2020-07-28T20:51:33Z

I dropped tag stripping and just re-styled paragraphs in the table. It's a more correct solution anyway. The resulting style is not a perfect 1:1 match, but close (I can put more effort into that if needed).
Added other spl data, so the SPL page is basically finished.

lynxlynxlynx · 2020-07-28T20:57:21Z

Is it rendered somewhere?

burner1024 · 2020-07-28T20:59:57Z

No, but I can set it up, will share a link.

burner1024 · 2020-07-29T02:22:24Z

here

lynxlynxlynx · 2020-07-29T08:28:00Z

looks fine 👍

burner1024 · 2020-07-29T19:31:21Z

Added itm v1, eff v1 and v2. Feature blocks seem to be identical except for some spelling, so I just symlinked the itm one to the spl one.
I think this is good enough for a start.

Next I plan to try an import and see if any issues come up; maybe some adjustments will be needed. Then I'll send a pull request, and after it's merged I'll be able to do a real import and publish a release. And then make an announcement on the forum, to see if anyone else wants to jump in.

Much of the text is still duplicated between formats (target type, timing, resistance, etc) and probably could be externalized into includes too, but that would make importing somewhat harder. If you ever get to that, please notify.

lynxlynxlynx · 2020-07-29T20:45:10Z

itmv1 lost the "Melee animation" link. Feature block looks worse due to removal of spacing around =. And timing mode info is better in itm (and target type). It might be better to link it the other way around, but it's now simpler just to add the missing info.

eff looks fine.

burner1024 · 2020-07-29T21:34:15Z

Addressed all, also applied the same style to Spell type.

lynxlynxlynx · 2020-07-29T21:50:37Z

Target type and timing mode still lost some info.

burner1024 · 2020-07-29T22:05:03Z

Ah, sorry, missed that.
But it looks like the target type 7 clause in ITM doesn't apply to SPL? Then they have to be kept separate. (Or change the clause text to something like "In ITM: ranged ability type only, otherwise no target".)
As for timing mode, why isn't it in sync? Is it just that the spl page wasn't updated when itm was, or does it actually work differently?

lynxlynxlynx · 2020-07-30T08:35:35Z

type: sure, be conservative
timing: they do behave the same.

burner1024 · 2020-07-30T10:10:10Z

Cool, added the missing lines.

burner1024 · 2020-07-30T21:56:40Z

Looks good:

If there isn't anything else, I can send a PR.

lynxlynxlynx · 2020-07-31T07:20:51Z

Nice, feel free to go ahead. :)

burner1024 · 2020-08-01T21:01:02Z

A simple script to convert an offsets table into YAML: offsets_to_yaml.py.zip

File naming example: _data/file_formats/itm_v1/extended_header.yml (filename is singular).

Format reference:

```yaml
- desc: |             # required - markdown
    Attack type
    - 0 = None
    - 1 = Melee
  type: char         # required
  length: 1          # optional; if not specified, size is inferred from type. Known types: char, byte, word, dword, resref, strref
  offset: 0x1        # optional; if specified, the current offset is checked against this value and an error is raised if they differ
  mult: 3            # optional; allows things like "2*3 (word)"
  unknown: 1         # optional; applies the "unknown" style span
  unused: 1          # optional; appends " (unused)" to the description and applies the "unknown" style span
```
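One possible reading of the reference above, as a Python sketch that turns loaded entries into table rows. The assumptions: `length` overrides the type's size, `mult` multiplies whichever size applies, and `offset: 0x1` arrives as the integer 1 (as PyYAML loads it). Byte sizes follow the IE conventions (8-byte resref, 4-byte strref); the function names are hypothetical.

```python
# Byte sizes of the known types listed in the format reference.
TYPE_SIZES = {"char": 1, "byte": 1, "word": 2, "dword": 4, "resref": 8, "strref": 4}


def base_size(field: dict) -> int:
    """Explicit `length` if given, else the size inferred from `type`."""
    if "length" in field:
        return field["length"]
    return TYPE_SIZES[field["type"]]


def size_label(field: dict) -> str:
    """Size column text, e.g. '2*3 (word)' when `mult` is set."""
    if "mult" in field:
        return f"{base_size(field)}*{field['mult']} ({field['type']})"
    return f"{base_size(field)} ({field['type']})"


def description(field: dict) -> str:
    text = field["desc"].strip()
    return text + " (unused)" if field.get("unused") else text


def layout(fields: list, start: int = 0) -> list:
    """Return (offset, size label, description, unknown-styled) rows.

    When a field carries `offset`, it is checked against the running offset.
    """
    rows, offset = [], start
    for field in fields:
        if "offset" in field and field["offset"] != offset:
            raise ValueError(
                f"offset mismatch: at 0x{offset:x}, data says 0x{field['offset']:x}"
            )
        styled = bool(field.get("unknown") or field.get("unused"))
        rows.append((offset, size_label(field), description(field), styled))
        offset += base_size(field) * field.get("mult", 1)
    return rows
```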

lynxlynxlynx · 2022-10-24T13:54:38Z

While researching the recent opcode regression, I came across a new Liquid link tag. Unfortunately it doesn't look like it supports creating relative paths:
https://jekyllrb.com/docs/liquid/tags/#links

burner1024 · 2023-06-22T02:00:22Z

Something I came across recently: https://kaitai.io/.
A similar YAML-based format declaration, which then compiles into parsers for C++, Python, Java, etc. It could actually be used to unify the format databases in GemRB, iesh, NearInfinity, etc.
Just food for thought, not suggesting anything yet.

lynxlynxlynx · 2023-06-22T18:21:30Z

Looks familiar, but I don't think it's feasible. There are so many exceptions, dependencies between fields and special treatment needed besides the fact that we'd need per-game-version copies of many of the formats.

burner1024 mentioned this issue Feb 21, 2020

More formats/constants BGforgeNet/BGforge-MLS-IElib#3

Closed

burner1024 mentioned this issue Jul 31, 2020

Externalised file formats #99

Merged

lynxlynxlynx mentioned this issue May 18, 2022

Opcode #135 / opcode #335 #130

Merged

burner1024 added a commit to BGforgeNet/iesdp that referenced this issue Feb 5, 2023

sto v1 format formalized, ref Gibberlings3#84

e50e28f
