Fixed parsing File offsets / sizes to make all files decode properly #4
I noticed that some files were just a garbled binary mess.
This was the result of an incorrectly detected file length (used as the starting value for the decoding).
I looked at how the file offsets and lengths are read from the index block and noticed that this was probably not the intended way to read these files.
(Apologies if someone else has already figured this out before me; I couldn't find anything.)
I hypothesized that what was being read (the list of offsets to strings) was only a table to be referenced by index later.
I searched for the binary representation of the number of entries in the file and found a matching value at offset 0x14.
Theory: the following bytes must contain the entries' actual descriptions.
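The search for the entry count can be sketched in Python as below. This assumes the count is stored as a little-endian uint32 (the same width the entry layout further down uses for its pair count); the function name `find_le_uint32` and the header bytes in the example are made up for illustration.

```python
import struct

def find_le_uint32(data: bytes, value: int) -> list[int]:
    """Return every offset at which `value` occurs as a little-endian uint32."""
    needle = struct.pack("<I", value)
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)  # step by one byte so overlapping hits are kept
    return hits

# Hypothetical index block: 0x14 filler bytes, then an entry count of 2.
header = bytes(0x14) + struct.pack("<I", 2) + b"\x02"
print([hex(off) for off in find_le_uint32(header, 2)])  # ['0x14']
```

In real data a small count can match in several places, so each hit still has to be checked against the bytes that follow it.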
The record-offsets section of Weird.ggpack4a is significantly shorter than the one in Weird.ggpack1a, which I had been using.
In it, the following byte pattern repeats every 0x15 bytes: 02 03 00 00 00 01 00 04. The rest of the data is similar but differs from file to file.
I printed out the list of strings contained in this file:
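One way to confirm the fixed record length is to find every occurrence of the shared prefix and check that consecutive hits are exactly 0x15 bytes apart. This Python sketch uses the 8-byte prefix common to both entry dumps further down as its sample data; `record_starts` and `has_fixed_stride` are illustrative names.

```python
# 8-byte prefix shared by both entry dumps further down.
PREFIX = bytes.fromhex("0203000000010004")
RECORD_SIZE = 0x15  # 21 bytes

def record_starts(data: bytes, prefix: bytes = PREFIX) -> list[int]:
    """Offsets of every occurrence of `prefix` in `data`."""
    starts = []
    pos = data.find(prefix)
    while pos != -1:
        starts.append(pos)
        pos = data.find(prefix, pos + 1)
    return starts

def has_fixed_stride(starts: list[int], stride: int = RECORD_SIZE) -> bool:
    """True if every pair of consecutive starts is exactly `stride` bytes apart."""
    return all(b - a == stride for a, b in zip(starts, starts[1:]))

# The two entries dumped further down, back to back.
entries = bytes.fromhex(
    "020300000001000402000300050400050005060002"
    "020300000001000407000300050800050005090002"
)
starts = record_starts(entries)
print(starts, has_fixed_stride(starts))  # [0, 21] True
```

This only works while every entry starts with the same bytes, i.e. the same number of key/value pairs and the same first key.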
[0]: "files"
[1]: "filename"
[2]: "MasterBank.strings.bank"
[3]: "offset"
[4]: "16"
[5]: "size"
[6]: "54282"
[7]: "MasterBank.bank"
[8]: "54304"
[9]: "5416174"
[10]: "guid"
[11]: "b554baf88ff004c50cc0214575794b8c"
If my theory was correct that every 0x15-byte entry contained some sort of dictionary, each one must contain references to the following strings:
File 0: "filename", "MasterBank.strings.bank", "offset", "16", "size", "54282"
File 1: "filename", "MasterBank.bank", "offset", "54304", "size", "5416174"
Or, by index:
File 0: 1, 2, 3, 4, 5, 6
File 1: 1, 7, 3, 8, 5, 9
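Turning the expected strings into indices is just a lookup in the printed string list. A small Python check (the helper name `expected_indices` is illustrative) reproduces both index lists:

```python
# String table exactly as printed above, in order.
STRINGS = [
    "files", "filename", "MasterBank.strings.bank", "offset", "16", "size",
    "54282", "MasterBank.bank", "54304", "5416174", "guid",
    "b554baf88ff004c50cc0214575794b8c",
]

def expected_indices(pairs: list[tuple[str, str]]) -> list[int]:
    """Map each key and value string to its position in the string table."""
    return [STRINGS.index(s) for pair in pairs for s in pair]

file0 = [("filename", "MasterBank.strings.bank"), ("offset", "16"), ("size", "54282")]
file1 = [("filename", "MasterBank.bank"), ("offset", "54304"), ("size", "5416174")]
print(expected_indices(file0))  # [1, 2, 3, 4, 5, 6]
print(expected_indices(file1))  # [1, 7, 3, 8, 5, 9]
```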
The actual pattern of bytes was:
File 0: 02 03 00 00 00 01 00 04 02 00 03 00 05 04 00 05 00 05 06 00 02
File 1: 02 03 00 00 00 01 00 04 07 00 03 00 05 08 00 05 00 05 09 00 02
Theory: The indices are stored as 16-bit numbers to save space.
I therefore grouped each value that matched an expected index together with the 0x00 byte following it:
File 0: 02 03 00 00 00 0100 04 0200 0300 05 0400 0500 05 0600 02
File 1: 02 03 00 00 00 0100 04 0700 0300 05 0800 0500 05 0900 02
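Reading each grouped pair as a 16-bit little-endian integer (low byte first, so `01 00` is 1) gives back exactly the index lists expected above. The fixed offsets in this Python check are simply where the grouped values sit in these two dumps; it is a verification, not a general parser.

```python
import struct

# Positions (within one 0x15-byte entry) of the 16-bit groups marked above.
GROUP_OFFSETS = (5, 8, 10, 13, 15, 18)

def read_grouped_indices(entry: bytes) -> list[int]:
    """Read each grouped byte pair as a little-endian uint16 (low byte first)."""
    return [struct.unpack_from("<H", entry, off)[0] for off in GROUP_OFFSETS]

entry0 = bytes.fromhex("020300000001000402000300050400050005060002")
entry1 = bytes.fromhex("020300000001000407000300050800050005090002")
print(read_grouped_indices(entry0))  # [1, 2, 3, 4, 5, 6]
print(read_grouped_indices(entry1))  # [1, 7, 3, 8, 5, 9]
```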
My final theory is that a file entry is structured as follows:
byte 0x02 (purpose unknown)
uint32 NumberOfKeyValuePairs
KeyValuePair * NumberOfKeyValuePairs
- where each KeyValuePair is structured like this:
uint16 String list index of the key
uint8 unknown, possibly the data type (0x04 between "filename" and "MasterBank.bank", 0x05 between "offset" and "54304")
uint16 String list index of the value
byte 0x02 (purpose unknown)
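A minimal Python sketch of a reader for this layout. The names `parse_file_entry` and `parse_file_entries` are hypothetical (not taken from the PR's code); the middle byte of each pair is read but ignored because its meaning is still unknown; and the caller is assumed to already know where the first entry starts and how many entries follow (the count found at 0x14). The sample data is the two entries dumped above.

```python
import struct

# String table as printed above (index = position in the list).
STRINGS = [
    "files", "filename", "MasterBank.strings.bank", "offset", "16", "size",
    "54282", "MasterBank.bank", "54304", "5416174", "guid",
    "b554baf88ff004c50cc0214575794b8c",
]

def parse_file_entry(data: bytes, pos: int, strings: list[str]) -> tuple[dict[str, str], int]:
    """Parse one entry at `pos` using the hypothesized layout.

    Returns the key/value strings and the offset just past the entry.
    """
    if data[pos] != 0x02:
        raise ValueError(f"expected opening 0x02 at {pos:#x}, got {data[pos]:#04x}")
    (pair_count,) = struct.unpack_from("<I", data, pos + 1)  # uint32 NumberOfKeyValuePairs
    pos += 5
    entry: dict[str, str] = {}
    for _ in range(pair_count):
        # uint16 key index, uint8 unknown (type?), uint16 value index; "<" means no padding
        key_idx, _unknown, value_idx = struct.unpack_from("<HBH", data, pos)
        entry[strings[key_idx]] = strings[value_idx]
        pos += 5
    if data[pos] != 0x02:
        raise ValueError(f"expected closing 0x02 at {pos:#x}, got {data[pos]:#04x}")
    return entry, pos + 1

def parse_file_entries(data: bytes, pos: int, count: int,
                       strings: list[str]) -> list[tuple[str, int, int]]:
    """Parse `count` consecutive entries into (filename, offset, size) tuples."""
    files = []
    for _ in range(count):
        entry, pos = parse_file_entry(data, pos, strings)
        # offset and size are stored as decimal strings in the string table
        files.append((entry["filename"], int(entry["offset"]), int(entry["size"])))
    return files

# The two entries dumped above, back to back.
entries = bytes.fromhex(
    "020300000001000402000300050400050005060002"
    "020300000001000407000300050800050005090002"
)
for name, offset, size in parse_file_entries(entries, 0, 2, STRINGS):
    print(name, offset, size)
# MasterBank.strings.bank 16 54282
# MasterBank.bank 54304 5416174
```

The pair count is read per entry rather than assuming three pairs, so entries with extra keys would still parse. The resulting offset and size would then select a file's bytes (for example `pack[offset:offset + size]`) before the existing decoding step.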
I have implemented this method, and the files that were previously garbled now decode correctly (for example, Credits_en.txt from Weird.ggpack1a).
This only applies to RtMI for now, as I have not yet tested whether or not Thimbleweed Park uses the same format.
Further questions:
(Sorry for spamming pull requests lately.)