
Docs and Thoughts

Youjose edited this page Aug 3, 2023 · 8 revisions

Thoughts

I am going to put my thoughts here, along with progress notes and additional info about these formats. This library is still a WIP and, well, it's badly coded; I am new to this, so you will see a lot of weird stuff in the code. As for the codecs themselves, I plan to add more to cover all Criware formats, from old to new, if I have the resources and the time. I got tired of using multiple tools to get things done, so I am trying to gather them in one place using my limited knowledge.

Codecs and Options

ADX

This one is more or less where I want it to be. It only supports ADX 1 (same as AHX?), but it can encode as well as decode using a C++ extension. It's now mostly complete; the only things left are very minor:

  • ADX Encryption and Decryption
  • AHX support, which I might or might not do. The main use for this is to make ADX files for building USM files.

Now for the codec itself: the header is fairly simple, at least at the basic level.

struct AdxHeader{
    unsigned short Signature;         /* 0x8000 */
    unsigned short DataOffset;        /* Good to standardize, but made variable here for generic purposes. */
    unsigned char  EncodingMode;      /* Fixed = 2, Linear = 3, Exponential = 4 */
    unsigned char  BlockSize;         /* 0x12 standardized in ADXPlay, but made variable in this encoder and will pad the PCM data if needed, change with care. */
    unsigned char  BitDepth;          /* Fixed at 4 in ADXPlay, but made variable here as long as it fits into BlockSize, change with care. */
    unsigned char  Channels;          /* Same as input PCM */
    unsigned int   SampleRate;        /* Same as input PCM */
    unsigned int   SampleCount;       /* Same as input PCM, calculated for per channel in ADX. */
    unsigned short HighpassFrequency; /* Cutoff Frequency. Not sure what it does, saw some say it alleviates noise when compressing. */
    unsigned char  AdxVersion;        /* 3, 4, or 5. 3 is an old version, 4 is most used, 5 does not support Looping it seems. */
    unsigned char  Flag;              /* 8 or 9 for encryption, unsupported if so. AHX can also be encrypted. */
};
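As a rough illustration of the layout above, here is a minimal Python sketch that unpacks those 20 bytes with the standard `struct` module. It assumes big-endian fields (the usual description of ADX), and the names `ADX_HEADER`, `AdxHeader` and `parse_adx_header` are made up for this example; they are not part of the library.

```python
import struct
from typing import NamedTuple

# Big-endian layout mirroring the AdxHeader struct above (20 bytes in total).
# Assumption: fields are packed with no padding, in the order listed.
ADX_HEADER = struct.Struct(">HHBBBBIIHBB")


class AdxHeader(NamedTuple):
    signature: int
    data_offset: int
    encoding_mode: int
    block_size: int
    bit_depth: int
    channels: int
    sample_rate: int
    sample_count: int
    highpass_frequency: int
    adx_version: int
    flag: int


def parse_adx_header(data: bytes) -> AdxHeader:
    """Unpack the fixed part of an ADX header (hypothetical helper)."""
    if len(data) < ADX_HEADER.size:
        raise ValueError("not enough bytes for an ADX header")
    header = AdxHeader(*ADX_HEADER.unpack_from(data))
    if header.signature != 0x8000:
        raise ValueError("missing 0x8000 ADX signature")
    return header
```

Whatever follows these 20 bytes (Hist values or looping info) depends on the ADX version and is not handled by this sketch.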

After that, the Hist values or the looping info come right after, depending on the ADX version. The code, which I shamelessly copied (then modified) from the Nyagamon/bnnm CRID extractor, turned out pretty good. It supports encoding with any block size and channel count given the correct WAV file (I didn't test it), and it also decodes any given block size, unlike VGMstream, which surprised me when I noticed that it doesn't. It still doesn't support arbitrary bit depths, though, and now that I think about it, it might not be worth changing, as that is of no use for modifications.
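To make the relation between BlockSize, BitDepth and PCM padding a bit more concrete, here is a small sketch. It assumes each block starts with a 2-byte scale followed by packed samples, which is how ADX frames are commonly described; `adx_block_layout` is a hypothetical helper, not a function of this library.

```python
def adx_block_layout(sample_count: int, block_size: int = 0x12, bit_depth: int = 4):
    """Return (samples_per_block, blocks_per_channel, padding_samples).

    Assumption: every block is a 2-byte scale followed by packed samples.
    """
    payload_bits = (block_size - 2) * 8
    if payload_bits <= 0 or bit_depth <= 0 or payload_bits % bit_depth:
        raise ValueError("bit depth does not fit into the block size")
    samples_per_block = payload_bits // bit_depth
    # Integer ceiling division: the last block is padded if it is not full.
    blocks = -(-sample_count // samples_per_block)
    padding = blocks * samples_per_block - sample_count
    return samples_per_block, blocks, padding
```

With the defaults (0x12 and 4), a block holds 32 samples, so 100 samples per channel would need 4 blocks and 28 padding samples.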

As for my lib, you can specify some other stuff when encoding.

def encode(data: bytes, BitDepth = 0x4, Blocksize = 0x12, Encoding = 3, AdxVersion = 0x4, Highpass_Frequency = 0x1F4, Filter = 0, force_not_looping = False) -> bytes

As you can see, you can specify BitDepth, Blocksize, and AdxVersion within their limits. It works for all encoding versions, but there is still no encryption/decryption support. The code might crash a lot; if it does, report it to me. The likely cause is WAV files having extra data in the header.
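If a crash does come from extra data in the WAV header, one possible workaround is to rewrite the file with Python's standard `wave` module before encoding, which keeps only the format and the PCM frames. This is just a sketch of that idea; `strip_wav_extras` is a made-up name and the library does not do this for you as far as this page says.

```python
import io
import wave


def strip_wav_extras(wav_bytes: bytes) -> bytes:
    """Re-write a PCM WAV so it only has a plain header and the sample data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)   # channels, sample width, sample rate, ...
        dst.writeframes(frames)  # header sizes are patched when closing
    return out.getvalue()
```

The returned bytes can then be passed as the `data` argument of `encode`. Note that the `wave` module only handles uncompressed PCM input.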

CPK

I finished this one recently, but it took a long time because of building the @UTF chunk parser and standardizing it to Donmai's WannaCri. My code is insanely bad, so I advise you not to look at it if possible.

CPK extraction and building are now supported for CpkMode 0, 1, 2, and 3. CPKs are made of "tables" and "contents", where each table has a @UTF chunk inside. So far there are 7 known tables:

from enum import Enum

class CPKChunkHeaderType(Enum):
    CPK   = b"CPK "
    TOC   = b"TOC "
    ITOC  = b"ITOC"
    ETOC  = b"ETOC"
    GTOC  = b"GTOC"
    HTOC  = b"HTOC"
    HGTOC = b"HGTOC"

Their data is in little endian (contrary to the rest of the Criware file formats). Each table header is 16 bytes long and seems to contain 4 elements:

struct CpkHeader{
    unsigned int header;
    unsigned int encflag;
    unsigned int packet_size;
    unsigned int unk0C;
};

Right after each table header follows a @UTF chunk, which has the needed information.
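Reading such a table could be sketched roughly as below. This follows the 16-byte layout described above and assumes that packet_size is the length of the @UTF chunk that follows the header; that is my reading of the field, not something this page confirms. `read_cpk_table` is a hypothetical name.

```python
import struct


def read_cpk_table(data: bytes, offset: int = 0):
    """Split one CPK table into (magic, encflag, utf_chunk_bytes)."""
    magic = data[offset:offset + 4]
    # Little-endian: encflag, packet_size, unk0C (the last one is ignored here).
    encflag, packet_size, _unk0c = struct.unpack_from("<III", data, offset + 4)
    start = offset + 16
    # Assumption: packet_size counts the bytes of the @UTF chunk that follows.
    utf_chunk = data[start:start + packet_size]
    return magic, encflag, utf_chunk
```

The returned chunk bytes are what a UTF parser would then take as input.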

CpkMode 0 is the rarest I could find; "Nichijou - Uchujin" uses it. It only has an ITOC table, so files are sorted by size and name with no filenames to match.

CpkMode 1 has just a TOC; I found this in one of the "Tekken 7" CPKs.

CpkMode 2 has both a TOC and an ITOC, with an ETOC at the end as well.

CpkMode 3 includes a TOC, a GTOC and an ETOC. I still haven't found a game which uses an HGTOC or an HTOC; perhaps those belong to modes 4 and 5, if they exist.
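The four descriptions above can be restated as a small lookup table. This is only a summary of what is written here, with `CPK_MODE_TABLES` and `guess_cpk_mode` as made-up names; it is not how the library detects the mode.

```python
# Tables seen for each CpkMode, as described above (the "CPK " table itself is left out).
CPK_MODE_TABLES = {
    0: {"ITOC"},
    1: {"TOC"},
    2: {"TOC", "ITOC", "ETOC"},
    3: {"TOC", "GTOC", "ETOC"},
}


def guess_cpk_mode(tables):
    """Return the CpkMode whose table set matches exactly, or None."""
    present = set(tables)
    for mode, expected in CPK_MODE_TABLES.items():
        if present == expected:
            return mode
    return None  # e.g. HTOC/HGTOC archives, which are not covered above
```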

This lib also supports CRILAYLA decompression, taken from code made by tpu (and modified greatly). It does not support CRILAYLA compression yet when building a CPK file.

UTF

This was a pain to code, and my code for it is abysmally bad; however, it works. Given a UTF file or UTF chunk bytes, the PyCriCodecs.UTF class will parse it in two different ways.

One way is used internally by my CPK tools (it needs to be scrapped), and the other is a standardized array of dictionaries used to build custom UTF tables.

The standardization comes from WannaCri's lib (although my approach is largely different), which uses a list of dictionaries; each dictionary maps a key to a tuple of type and value:

CpkHeader = [
                {
                    "UpdateDateTime": (UTFTypeValues.ullong, 0),
                    "ContentOffset": (UTFTypeValues.ullong, ContentOffset),
                    "ContentSize": (UTFTypeValues.ullong, ContentSize),
                    "TocOffset": (UTFTypeValues.ullong, 0x800),
                    # etc....
                },
                {
                    ....
                }
            ]

This format allows us to parse any data into a UTF table, possibly for modding these files. I use it to build the CPK archives, and possibly the USM and ACB/AWB files as well. You can do it fairly simply as well:

from PyCriCodecs import *
# Parsing UTF
data = UTF("filename_or_bytes.dat")
data.table # This will give you the internal parsed table that is used by my CPK lib.
payload = data.get_payload() # This will return a list of dictionaries which can be used to build UTF tables, you can modify them as well.
data.encoding # Returns the type of encoding of the UTF data, although not specified in the UTF table, the code will try all 3 possible encodings.
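Modifying such a payload mostly means swapping the value while keeping the type tag of the tuple. Here is a self-contained sketch of that idea; the `UTFTypeValues` enum below is only a stand-in with an arbitrary value so the example runs on its own (the real one comes from PyCriCodecs), and `set_field` is a hypothetical helper.

```python
from enum import Enum, auto


class UTFTypeValues(Enum):
    # Stand-in for the library's enum; the member value here is arbitrary.
    ullong = auto()


def set_field(payload, row, key, value):
    """Replace the value of one field, keeping its existing type tag."""
    type_tag, _old_value = payload[row][key]
    payload[row][key] = (type_tag, value)
    return payload


payload = [{"TocOffset": (UTFTypeValues.ullong, 0x800)}]
set_field(payload, 0, "TocOffset", 0x1000)
```

After this, the single row still carries the `ullong` tag, only with the new offset.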

You can also use these payloads or make your own to build UTF data as well:

def __init__(self, dictarray: list[dict], encrypt: bool = False, encoding: str = "utf-8", table_name: str = "PyCriCodecs_table") -> None:

As seen above, the UTFBuilder class only needs a list of dictionaries to build any table; ideally you would want to set your table_name as well. The encrypt option encrypts the UTF, as some games have that feature; it's fairly simple to implement, so why not. The encoding option sets how the strings given to the UTF data are encoded, which can be either UTF-8, shift-jis, or UTF-16; whichever of these 3 is used, the code will check if they produce any null bytes.

utfObj = UTFBuilder(payload, table_name="my_table")
utfObj.parse() # returns a bytearray of the UTF binary data to use.
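The null-byte check mentioned for the encoding option could look roughly like this. It is a sketch of the idea only, not the check UTFBuilder actually runs; `pick_encoding` is a made-up helper that returns the first candidate encoding in which every string encodes without errors and without embedded null bytes.

```python
def pick_encoding(strings, candidates=("utf-8", "shift-jis", "utf-16")):
    """Return the first encoding that yields no null bytes, or None."""
    for name in candidates:
        try:
            encoded = [s.encode(name) for s in strings]
        except UnicodeEncodeError:
            continue  # this encoding cannot represent one of the strings
        if all(b"\x00" not in chunk for chunk in encoded):
            return name
    return None
```

Plain ASCII names pass with UTF-8, while UTF-16 output for the same names contains null bytes and would be rejected by this check.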

ACB/AWB, HCA and USM

The README says it all really, but USM and ACB/AWB building is still left to do. It is not hard, but adding HCA support took most of my time recently. USM is simple: UTF chunks for the header, SFV and SFA, as well as any other chunk; some chunks are actually just UTFs, like CUE. Besides that, chunking the data is the hardest part; the metadata within can be random and it seems USMs will still play.

As for ACB and AWB: AWBs are just containers. They can technically contain any type of file, but they are used for audio files, mainly (only?) HCAs.

HCA took me a long time to implement, despite only copy-pasting it from VGMStream and VGAudio and editing it slightly. I was determined to understand the codec though, and I kind of get it, but it is still advanced for me.

Extensive TODO List:

  • Better USM building.
  • ACB correct extraction.
  • ACB/AWB building.
  • ADX Encryption and Decryption.
  • AHX codec. (probably won't do it, it's MPEG Layer II with custom Frame size.)
  • SofDec 1 (Old USM's), should be easy?
  • Port everything into C/C++ except, so this lib can be both used as C/C++ and within python.
  • and after all of that, a GUI, since the normal user can't be bothered with learning Python.

Closing thoughts.

That's all for now; I will update this wiki with the C++ API usage as well.
