Fixed parsing File offsets / sizes to make all files decode properly #4

JanFrederick00 · 2022-09-25T19:41:21Z

I noticed that some files were just a garbled binary mess.
This was the result of an incorrectly detected file length (used as the starting value for the decoding).

I had a look at the way the file offsets / lengths are read from the index block and noticed that this was probably not the intended way to read these files.
(Apologies if other people have already figured this out before me; I couldn't find anything.)

I hypothesized that what was being read (the list of offsets to strings) was only a table to be referenced by index later.
I searched for the binary representation of the number of entries in the file and found a matching value at offset 0x14.
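
The search step can be sketched roughly like this. It is only an illustration: `find_uint32_le` is a hypothetical helper, the count is assumed to be stored as a little-endian uint32, and the sample buffer is synthetic rather than taken from a real pack file.

```python
import struct


def find_uint32_le(data: bytes, value: int) -> list[int]:
    """Return every offset at which `value` occurs as a little-endian uint32."""
    needle = struct.pack("<I", value)
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        # Continue one byte further so overlapping matches are not skipped.
        pos = data.find(needle, pos + 1)
    return offsets


# Synthetic buffer (an assumption, not real data): 0x14 filler bytes,
# then an entry count of 2, then more filler.
sample = b"\xff" * 0x14 + struct.pack("<I", 2) + b"\xff" * 4
print([hex(o) for o in find_uint32_le(sample, 2)])
```

In a real file a small count will usually match in several places, so each candidate offset still has to be checked against the bytes that follow it.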

Theory: the following bytes must contain the entries' actual descriptions.
The record offsets section for the file Weird.ggpack4a was significantly shorter than the one from Weird.ggpack1a, which I had been using.
This pattern of data repeats every 0x15 bytes: 02 03 00 00 00 01 00 04. The rest of the data is similar, but differs from file to file.

I printed out the list of Strings contained in this File:
[0]: "files"
[1]: "filename"
[2]: "MasterBank.strings.bank"
[3]: "offset"
[4]: "16"
[5]: "size"
[6]: "54282"
[7]: "MasterBank.bank"
[8]: "54304"
[9]: "5416174"
[10]: "guid"
[11]: "b554baf88ff004c50cc0214575794b8c"

If my theory was correct and each 0x15-byte entry contained some sort of dictionary, each one had to contain references to the following strings:
File 0: "filename", "MasterBank.strings.bank", "offset", "16", "size", "54282"
File 1: "filename", "MasterBank.bank", "offset", "54304", "size", "5416174"
or - by index:
File 0: 1, 2, 3, 4, 5, 6
File 1: 1, 7, 3, 8, 5, 9

The actual pattern of bytes was:
File 0: 02 03 00 00 00 01 00 04 02 00 03 00 05 04 00 05 00 05 06 00 02
File 1: 02 03 00 00 00 01 00 04 07 00 03 00 05 08 00 05 00 05 09 00 02

Theory: The indices are stored as 16-bit numbers to save space.
I therefore grouped together the values that matched the expected numbers with the 0x00 after them.

File 0: 02 03 00 00 00 0100 04 0200 0300 05 0400 0500 05 0600 02
File 1: 02 03 00 00 00 0100 04 0700 0300 05 0800 0500 05 0900 02
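
One way to sanity-check this grouping is to diff the two records byte by byte and read a little-endian 16-bit value at each differing offset. This is a minimal sketch using the two byte patterns quoted above; `differing_offsets` is a hypothetical helper name.

```python
FILE0 = bytes.fromhex("02 03 00 00 00 01 00 04 02 00 03 00 05 04 00 05 00 05 06 00 02")
FILE1 = bytes.fromhex("02 03 00 00 00 01 00 04 07 00 03 00 05 08 00 05 00 05 09 00 02")


def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Offsets at which two equally long records differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diff = differing_offsets(FILE0, FILE1)
# Read a little-endian uint16 starting at every differing offset.
values0 = [int.from_bytes(FILE0[i:i + 2], "little") for i in diff]
values1 = [int.from_bytes(FILE1[i:i + 2], "little") for i in diff]

print(diff)     # [8, 13, 18]
print(values0)  # [2, 4, 6]
print(values1)  # [7, 8, 9]
```

The only bytes that differ between the two records are exactly the positions where the value indices (2, 4, 6 versus 7, 8, 9) were expected, which is consistent with the 16-bit theory.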

My final Theory is that a file entry is structured as follows:

byte 0x02 (purpose unknown)
uint32 NumberOfKeyValuePairs
KeyValuePair * NumberOfKeyValuePairs, where each KeyValuePair is structured like this:
  uint16 String list index of the key
  uint8 unknown - possibly the data type (0x04 between "filename" and "MasterBank.bank", 0x05 between "offset" and "54304")
  uint16 String list index of the value
byte 0x02 (purpose unknown)
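
A minimal sketch of a parser for this layout, applied to the two records and the string list quoted earlier. It is not the code from this pull request: `parse_entry`, `STRINGS` and `RECORDS` are hypothetical names, little-endian byte order is assumed, and the unknown type byte is read but ignored.

```python
import struct

STRINGS = [
    "files", "filename", "MasterBank.strings.bank", "offset", "16", "size",
    "54282", "MasterBank.bank", "54304", "5416174", "guid",
    "b554baf88ff004c50cc0214575794b8c",
]

# The two 0x15-byte records from above, back to back.
RECORDS = bytes.fromhex(
    "02 03 00 00 00 01 00 04 02 00 03 00 05 04 00 05 00 05 06 00 02"
    "02 03 00 00 00 01 00 04 07 00 03 00 05 08 00 05 00 05 09 00 02"
)


def parse_entry(data: bytes, pos: int, strings: list[str]) -> tuple[dict[str, str], int]:
    """Parse one file entry at `pos`; return (key/value dict, next position)."""
    marker, count = struct.unpack_from("<BI", data, pos)  # 0x02, pair count
    if marker != 0x02:
        raise ValueError(f"unexpected start byte {marker:#04x} at {pos}")
    pos += 5
    entry = {}
    for _ in range(count):
        # uint16 key index, uint8 unknown (possibly a type), uint16 value index
        key_idx, _unknown, val_idx = struct.unpack_from("<HBH", data, pos)
        entry[strings[key_idx]] = strings[val_idx]
        pos += 5
    if data[pos] != 0x02:
        raise ValueError(f"unexpected end byte {data[pos]:#04x} at {pos}")
    return entry, pos + 1


first, next_pos = parse_entry(RECORDS, 0, STRINGS)
second, _ = parse_entry(RECORDS, next_pos, STRINGS)
print(first)   # {'filename': 'MasterBank.strings.bank', 'offset': '16', 'size': '54282'}
print(second)  # {'filename': 'MasterBank.bank', 'offset': '54304', 'size': '5416174'}
```

The explicit `<` in the format strings disables padding, so each key/value pair consumes exactly 5 bytes and a three-pair entry comes to 1 + 4 + 15 + 1 = 0x15 bytes, matching the observed record length.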

I have implemented this method and the files that were previously garbled are now correct (for example Credits_en.txt from Weird.ggpack1a).
This only applies to RtMI for now, as I have not yet tested whether or not Thimbleweed Park uses the same format.

Further questions:

Where is "guid" = "b554baf88ff004c50cc0214575794b8c" referenced in the File?
I suspect the 0x00000001 at offset 4 references the string "files", as this is otherwise not used.

(Sorry for spamming pull requests lately.)

JanFrederick00 · 2022-09-25T19:49:54Z

A particularly good example is the .lip files: many of them are the same size, so the string list does not repeat the size value for each of them.
I think this should also fix the graphics that couldn't be decompressed previously (but not the ones that appear blank).

bgbennyboy · 2022-09-25T19:57:41Z

Please don't call this spam, it's brilliant!
My reading of the file records for Thimbleweed was always wonky. The idea that some entries were missing was very dodgy and my 'temporary' solution was a massive hack. I think that others have since figured out the format completely but I never got around to going back and updating it.
I'm sorry I'm not more proactively engaged in this. I'm really busy with work at the moment, and I really appreciate the pull requests.

JanFrederick00 · 2022-09-25T20:11:32Z

Interestingly, these tools don't seem to work with RtMI's files (at least the json tool does not).
I tested my code with TWP's files, where a few extra bytes seem to be present somewhere in the header (I think there are two before the number of files in the dictionary - good thing this change only applies to RtMI).

JanFrederick00 · 2022-09-26T14:23:01Z

I have created a pull request on that other repo; it should now also be able to open RtMI's files.
I was able to decode the .json files from RtMI - they seem to be created using TexturePacker (a URL to the website was included in the first file I tried).
They seem to have changed the GGDict-format so it uses 16-bit string indices, which is why my test with Thimbleweed Park's files failed yesterday.

bgbennyboy · 2022-09-26T18:29:38Z

Great job again :)
I haven't had much time, but I've got audio extraction with the .bank files working manually using the dumper I wrote for my Telltale programs. I'll hopefully add that in this weekend, and then it's time to decide on a new name for the program. "Grumpy Explorer" or "Terrible Toolbox Explorer" are both possibilities.

JanFrederick00 added 2 commits September 25, 2022 21:39

Changed the parsing of the pack files to make all Files decode properly

885211e

Merge branch 'RtMI-Support' of https://github.com/JanFrederick00/Thim…

3684b39

…bleweed-Park-Explorer into RtMI-Support

JanFrederick00 changed the title ~~Rt mi support~~ Fixed parsing File offsets / sizes to make all files decode properly Sep 25, 2022

bgbennyboy merged commit 37a2a19 into bgbennyboy:master Sep 25, 2022

JanFrederick00 mentioned this pull request Oct 3, 2022

Decide on a new name and upload a release #13

Closed
