Add some Honkai: Star Rail wwnames #15

Merged
merged 1 commit into from
Feb 24, 2024

Conversation

davispuh
Contributor

Really need some better tools when working with such huge games...

Honkai: Star Rail consists of 194 .pck files taking ~45 GiB. Extracting those we get ~17k .bnk and ~48k .wem files.

From those, it looks like wwiser gets ~29k IDs. After trying a bunch of things I have maybe ~10k reversed correctly, mainly names starting with Ev_ and State.

There are still a lot of wrong mappings, so it needs more cleanup and fixing. I also think it's doable to get even more names.

Anyway I won't be working on this anymore so here it is :)

Sadly, I noticed that no .bnk files are created for the Music*, External* and Streamed* folders, which means there's a huge amount of audio that's essentially undiscoverable because there doesn't seem to be any way to find or search it.

I'm mostly interested in the voice dialogs, which are in External*. I tried whisper speech recognition, which transcribes quite well, but it's incredibly slow and would take a very long time... And even with the text it's not clear who exactly said it. Another idea would be to use some speaker recognition model to narrow down the search, but I haven't tried that.

Anyway, do you have any ideas how to map those .wem in External* to something useful? I really need some metadata, because there are hours and hours of audio, so it's not realistic to listen manually to find what you're looking for...

@bnnm
Owner

bnnm commented Feb 19, 2024

Thanks, I think I could clean it up a bit and use words.py to extract some more, but could you re-generate the names using wwiser from the latest commits (in case you didn't): wwiser.zip. I've recently fixed a few bugs that were including some unreversible IDs (mainly SFX).

I don't have a decent way to handle huge games, but probably the best approach would be understanding and trying to mimic how the game handles pck/bnk, as it would be a hassle for them too unless streamlined somehow. I'm not familiar with the game though. An idea I have in mind is loading the .pck directly and reading all the .bnk inside; not sure if it would really help your case.

For the External/Stream/etc dirs I assume they are using the "externals" feature, see here, where a single event may swap .wem on the fly. From your list they are probably events like Ev_vo_external_archive_play. wwiser can handle this more or less with some manual fiddling, but can't autodetect external-event<>dir/files because (AFAIK) this is done in the game's code and is outside what Wwise can see. So it's mostly the same as manually playing the .wem in the dir with vgmstream, and the few named events won't help you much. The game probably has a database somewhere with this external-event<>ids mapping though, so finding that is probably what you'd need.

@davispuh
Contributor Author

davispuh commented Feb 24, 2024

I already used the latest Git master version. I created the names like this:

$ find . -type f -iname '*.pck' -print0 | xargs -0 -I!!! -n 1 quickbms wwiser-utils/scripts/wwise_pck_extractor.bms !!! ./extracted/
$ wwiser.py -sl -r '**/*.bnk'

I can't use the shell's glob expansion here because with this many files you run out of ARGV space.
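The same batch extraction could also be driven from a script that walks the tree and starts quickbms once per file, so no argument list ever grows large. This is only a minimal Python sketch: it assumes quickbms is on PATH and takes its arguments in the same order as in the command above (script, input file, output directory), and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path

# Script path as used in the command above.
BMS_SCRIPT = "wwiser-utils/scripts/wwise_pck_extractor.bms"

def find_pck_files(root):
    """Return every .pck file under root (case-insensitive match), sorted."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pck"
    )

def build_command(pck, out_dir, script=BMS_SCRIPT):
    """Mirror the xargs call: quickbms <script> <input> <output dir>."""
    return ["quickbms", script, str(pck), str(out_dir)]

def extract_all(root=".", out_dir="./extracted/"):
    """Run quickbms once per .pck; stops on the first failing file."""
    for pck in find_pck_files(root):
        subprocess.run(build_command(pck, out_dir), check=True)
```

Calling `extract_all(".", "./extracted/")` from the game's audio directory would then be the equivalent of the find/xargs line.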

I also tried words.py, but it only found wrong names and nothing new that was correct. I didn't play with it much though, so you could definitely get more with a proper format list.

Anyway, I went down this rabbit hole and looked into how the game works. It's a Unity game that uses IL2CPP to compile C# into C++. Generally most of these games can be decompiled and reversed pretty easily, but Star Rail uses a lot of protections and encrypts a lot of stuff, so none of the public Unity extractors/decompilers work. But I found that some people have managed to extract the game's files - https://github.com/Dimbreath/StarRailData/ . Unfortunately they don't reveal how they did it, but my guess would be by loading their own DLL at runtime that invokes the game's functions to get the data back.

For now we can just ignore that first step and use their already extracted data directly. If we look there we can find mission files like Config/Level/Mission/1030601/Act/Act103060123.json, and inside we can see:

{
  "OnInitSequece": [],
  "OnStartSequece": [
    {
      "TaskList": [
        ...
        {
          "$type": "RPG.GameCore.PlayAndWaitSimpleTalk",
          "BlackMask": true,
          "SimpleTalkList": [
            {
              "TalkSentenceID": 103060481,
              "ProtectTime": 0.3
            },
            {
              "TalkSentenceID": 103060482,
              "ProtectTime": 0.3
            },
            {
              "TalkSentenceID": 103060483,
              "ProtectTime": 0.3
            },
            {
              "TalkSentenceID": 103060484,
              "ProtectTime": 0.3
            },
            {
              "TalkSentenceID": 103060485,
              "ProtectTime": 0.3
            },
            {
              "TalkSentenceID": 103060486,
              "ProtectTime": 0.3
            },
            {
              "TalkSentenceID": 103060487,
              "ProtectTime": 0.3
            }
          ]
        },
...
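Pulling the TalkSentenceID values out of such a mission file can be sketched as a recursive walk over the parsed JSON. The sample below is a shortened copy of the snippet above; the helper name is made up, and the assumption is that every integer stored under a TalkSentenceID key is wanted regardless of nesting depth.

```python
import json

def collect_talk_sentence_ids(node):
    """Recursively gather every integer TalkSentenceID in document order."""
    ids = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "TalkSentenceID" and isinstance(value, int):
                ids.append(value)
            else:
                ids.extend(collect_talk_sentence_ids(value))
    elif isinstance(node, list):
        for item in node:
            ids.extend(collect_talk_sentence_ids(item))
    return ids

# Shortened copy of the mission snippet (key spelling kept as in the data).
sample = json.loads("""
{
  "OnInitSequece": [],
  "OnStartSequece": [
    {"TaskList": [
      {"$type": "RPG.GameCore.PlayAndWaitSimpleTalk",
       "BlackMask": true,
       "SimpleTalkList": [
         {"TalkSentenceID": 103060481, "ProtectTime": 0.3},
         {"TalkSentenceID": 103060482, "ProtectTime": 0.3}
       ]}
    ]}
  ]
}
""")

print(collect_talk_sentence_ids(sample))  # [103060481, 103060482]
```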

Next we can look at ExcelOutput/TalkSentenceConfig.json and there we see:

{
  ...
  "103060481": {
    "TalkSentenceID": 103060481,
    "TextmapTalkSentenceName": {
      "Hash": 371857150
    },
    "TalkSentenceText": {
      "Hash": 1291295548
    }
  },
  "103060482": {
    "TalkSentenceID": 103060482,
    "TextmapTalkSentenceName": {
      "Hash": 371857150
    },
    "TalkSentenceText": {
      "Hash": 1291295549
    }
  },
  "103060483": {
    "TalkSentenceID": 103060483,
    "TextmapTalkSentenceName": {
      "Hash": 371857150
    },
    "TalkSentenceText": {
      "Hash": 1291295550
    }
  },
  "103060484": {
    "TalkSentenceID": 103060484,
    "VoiceID": 103060484,
    "TextmapTalkSentenceName": {
      "Hash": 2092232028
    },
    "TalkSentenceText": {
      "Hash": 1291295543
    }
  },
  "103060485": {
    "TalkSentenceID": 103060485,
    "VoiceID": 103060485,
    "TextmapTalkSentenceName": {
      "Hash": 2092232028
    },
    "TalkSentenceText": {
      "Hash": 1291295544
    }
  },
  "103060486": {
    "TalkSentenceID": 103060486,
    "TextmapTalkSentenceName": {
      "Hash": 371857150
    },
    "TalkSentenceText": {
      "Hash": 1291295545
    }
  },
  "103060487": {
    "TalkSentenceID": 103060487,
    "VoiceID": 103060487,
    "TextmapTalkSentenceName": {
      "Hash": 2092232028
    },
    "TalkSentenceText": {
      "Hash": 1291295546
    }
  },
...

Now we can take the TalkSentenceID from the previous step and match it against an entry here. This gives us the text Hash values and, for voiced lines, a VoiceID.
Next we look at ExcelOutput/VoiceConfig.json, and there we see:

{
  ...
  "103060484": {
    "VoiceID": 103060484,
    "VoicePath": "chapter3_5_firefly_104"
  },
  "103060485": {
    "VoiceID": 103060485,
    "VoicePath": "chapter3_5_firefly_105"
  },
...

So this allows us to match a VoiceID to a VoicePath, which is the name used to find the external audio. Note that we can't use this path directly: we need to prepend the language (e.g. English) plus /voice/ and append .wem. So the real path is English/voice/chapter3_5_firefly_104.wem, whose 64-bit FNV-1 hash is 7061846086923521376 (0x6200B799CD8F0D60). That is the ExternalID used in StarRail_Data/Persistent/Audio/AudioPackage/Windows/English/External12.pck.
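The path-to-ExternalID step can be sketched in Python as follows. The FNV-1 constants are the standard 64-bit offset basis and prime; everything else is an assumption for illustration: the helper names are made up, and the thread does not say whether the path is lowercased before hashing, so the sketch prints both variants to compare against 0x6200B799CD8F0D60.

```python
FNV64_OFFSET = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3        # standard 64-bit FNV prime

def fnv1_64(data: bytes) -> int:
    """64-bit FNV-1: multiply by the prime first, then XOR in each byte."""
    h = FNV64_OFFSET
    for byte in data:
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
        h ^= byte
    return h

def external_path(voice_path, language="English"):
    """Build <Language>/voice/<VoicePath>.wem as described above."""
    return f"{language}/voice/{voice_path}.wem"

def external_id(voice_path, language="English", lowercase=False):
    """Hash the full path; 'lowercase' is an untested assumption to try."""
    path = external_path(voice_path, language)
    if lowercase:
        path = path.lower()
    return fnv1_64(path.encode("utf-8"))

# Print both casings as the hex file name an extractor would write.
for lowercase in (False, True):
    eid = external_id("chapter3_5_firefly_104", lowercase=lowercase)
    print(f"lowercase={lowercase}: {eid:016x}.wem")
```

Whichever variant reproduces the ID quoted above is the one to use for the rest of VoiceConfig.json.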

Unfortunately, if we use wwise_pck_extractor.bms we lose the ExternalID, so we can't map it back. But I found that someone has already created a tool that can correctly extract from .pck files while keeping the ExternalID - https://github.com/RazTools/Audio - and that works incredibly well :)

By the way, regarding that Hash: we can resolve those through TextMap/TextMapEN.json. It's some other hash though, not FNV.

{
  ...
  "371857150": "",
  "1291295548": "...",
  "2092232028": "Firefly",
  "1291295543": "In my dream, I saw a scorched earth, and a new sapling emerging from it. It bloomed against the morning sun, and whispered to me.",
  "1291295544": "Why do people choose to sleep? I think...",
  "1291295546": "...It is because they're afraid to awaken from the dream.",
  ...

So if we want to find the audio file for a given line, we just need to do this in reverse: start from the text in TextMapEN.json, then go through TalkSentenceConfig.json to VoiceConfig.json.
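That reverse lookup can be sketched as a small join over the three tables. The data below is copied from the excerpts in this comment; the function name and the substring-search behaviour are assumptions for illustration, and real use would load the full files with `json.load` instead of inline dicts.

```python
# Excerpts of TextMapEN.json, TalkSentenceConfig.json and VoiceConfig.json.
text_map = {
    "371857150": "",
    "1291295548": "...",
    "2092232028": "Firefly",
    "1291295543": "In my dream, I saw a scorched earth, and a new sapling "
                  "emerging from it. It bloomed against the morning sun, "
                  "and whispered to me.",
    "1291295544": "Why do people choose to sleep? I think...",
}
talk_sentences = {
    "103060481": {"TalkSentenceID": 103060481,
                  "TextmapTalkSentenceName": {"Hash": 371857150},
                  "TalkSentenceText": {"Hash": 1291295548}},
    "103060484": {"TalkSentenceID": 103060484, "VoiceID": 103060484,
                  "TextmapTalkSentenceName": {"Hash": 2092232028},
                  "TalkSentenceText": {"Hash": 1291295543}},
    "103060485": {"TalkSentenceID": 103060485, "VoiceID": 103060485,
                  "TextmapTalkSentenceName": {"Hash": 2092232028},
                  "TalkSentenceText": {"Hash": 1291295544}},
}
voice_config = {
    "103060484": {"VoiceID": 103060484, "VoicePath": "chapter3_5_firefly_104"},
    "103060485": {"VoiceID": 103060485, "VoicePath": "chapter3_5_firefly_105"},
}

def find_voice_lines(query, text_map, talk_sentences, voice_config):
    """Return (speaker, VoicePath, text) for voiced lines containing query."""
    query = query.lower()
    hits = {int(h) for h, text in text_map.items() if query in text.lower()}
    results = []
    for sentence in talk_sentences.values():
        text_hash = sentence["TalkSentenceText"]["Hash"]
        if text_hash not in hits:
            continue
        voice_id = sentence.get("VoiceID")  # unvoiced lines have no VoiceID
        voice = voice_config.get(str(voice_id)) if voice_id is not None else None
        if voice is None:
            continue
        name_hash = sentence["TextmapTalkSentenceName"]["Hash"]
        speaker = text_map.get(str(name_hash), "")
        results.append((speaker, voice["VoicePath"], text_map[str(text_hash)]))
    return results

for speaker, path, text in find_voice_lines("sapling", text_map,
                                            talk_sentences, voice_config):
    print(speaker, path, text)
```

The returned VoicePath then goes through the Language + /voice/ + .wem step to get the ExternalID.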

As for other sounds, various files contain keys like SoundName and SpecialHitSoundEvent whose values can be hashed directly with 32-bit FNV-1, with no need to change them in any way. I have now updated this PR with all such names extracted from that repo. Note that we still don't get all names, because that repo is incomplete and some names need interpolation, which I didn't bother with.

For example, in Config/AudioConfig.json we can see:
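For checking a candidate name against the IDs wwiser reports, a 32-bit FNV-1 helper is enough. This is a minimal sketch: the constants are the standard 32-bit FNV offset basis and prime, while lowercasing the name first is an assumption based on how Wwise short IDs are generally computed (pass `lowercase=False` to hash the bytes exactly as written).

```python
FNV32_OFFSET = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV32_PRIME = 0x01000193   # standard 32-bit FNV prime

def fnv1_32(name: str, lowercase: bool = True) -> int:
    """32-bit FNV-1 of a name: multiply by the prime, then XOR each byte."""
    if lowercase:
        name = name.lower()  # assumption: IDs are case-insensitive
    h = FNV32_OFFSET
    for byte in name.encode("utf-8"):
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h

# Event name taken from earlier in this thread.
print(fnv1_32("Ev_vo_external_archive_play"))
```

If the printed number appears among the IDs in the banks, the name is a valid reversal.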

{
  ...
  "JoinTeamWithSpecialTeamate": "Ev_vo_avatar_addtoteam_to{0}_{1}",
  ...
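Templates like this could be expanded into concrete candidate names by filling the placeholders with every combination of plausible values. The sketch below uses `str.format` and `itertools.product`; the placeholder values are purely hypothetical, since the thread does not say what the game substitutes for {0} and {1}.

```python
from itertools import product

def expand_template(template, *value_lists):
    """Fill positional placeholders with every combination of values."""
    return [template.format(*combo) for combo in product(*value_lists)]

template = "Ev_vo_avatar_addtoteam_to{0}_{1}"

# Hypothetical values for illustration only, not taken from the game data.
first_values = ["1001", "1002"]
second_values = ["a", "b"]

for name in expand_template(template, first_values, second_values):
    print(name)
# Ev_vo_avatar_addtoteam_to1001_a
# Ev_vo_avatar_addtoteam_to1001_b
# Ev_vo_avatar_addtoteam_to1002_a
# Ev_vo_avatar_addtoteam_to1002_b
```

Each generated name would still need to be hashed and checked against the known IDs before being added to the list.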

@bnnm bnnm merged commit 246b3ef into bnnm:master Feb 24, 2024
@bnnm
Owner

bnnm commented Feb 24, 2024

Thanks, added and updated the list with names from words.py + this. It's a bit involved to use (best explanation I could come up with) but will get you many names when used properly.

wwise_pck_extractor.bms should support externals (here) so it may be a bug, if you could post some sample that doesn't properly extract them and some expected ID.

@davispuh
Contributor Author

Awesome, thanks! It looks really great! And yeah I had read that and tried it a bit but didn't spend much time on that, it's very involved 😄

> wwise_pck_extractor.bms should support externals (here) so it may be a bug, if you could post some sample that doesn't properly extract them and some expected ID.

Indeed it works correctly. I see it did extract 6200b799cd8f0d60.wem. Previously for some reason I didn't notice it.

@davispuh davispuh deleted the rail branch February 24, 2024 22:20