How can I read Japanese text? #4

Open
uwodb opened this issue Sep 30, 2020 · 18 comments
Comments

@uwodb

uwodb commented Sep 30, 2020

text.zip

Do you know what encoding is used here? I can't tell.
Can anyone help me with this? Thanks in advance :)

@galaxyhaxz
Member

The entire Japanese character set is quite large, so they came up with a custom encoding to optimize it.

Check out the decompiled code here to get a general idea of how it works:
https://github.com/diabpsx/skeleton/blob/master/JAP_1998_05_29/DIABPSX/PSXSRC/KANJI.CPP

The .OUT file is the font data which contains only the characters used for the text. The .JAP language file is an optimized SHIFT-JIS that points to indices in that file.

The .LGH file is the one you want. It's the regular SHIFT-JIS encoded Japanese text before it was optimized. My guess is they left these on the disc by accident, since they aren't used, which is funny because it defeats the purpose of reducing file sizes.
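
If the .LGH files really are plain Shift-JIS as described, reading one should only take a few lines. Here is a minimal Python sketch under that assumption; `decode_lgh` is a made-up helper name, and the sample bytes are generated in the script rather than taken from a real game file.

```python
def decode_lgh(data: bytes) -> str:
    """Decode raw bytes as Shift-JIS, marking undecodable bytes with U+FFFD."""
    # Assumption: the file is plain Shift-JIS with no header or custom codes.
    return data.decode("shift_jis", errors="replace")


# Round-trip a known string instead of relying on a real game file.
sample = "ディアブロ".encode("shift_jis")
print(decode_lgh(sample))
```

For a real file you would pass in the bytes read from disk, for example `decode_lgh(open(path, "rb").read())` with `path` pointing at one of the .LGH files. If some characters come out as replacement marks, Python's `cp932` codec (the Windows variant of Shift-JIS) may be worth trying instead.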

All work related to the PSX version is being continued in the repo linked above. I'm pretty sure this repo has been dead since I left.

@uwodb
Author

uwodb commented Dec 8, 2020

WOW! This is what I'm looking for :) Thank you for your help.

@galaxyhaxz
Member

There was a set of about 60 beta Japanese builds for the PSX sold at auction a few years ago. Don't forget to tell your friends that help is needed and ask if they know anything about the matter. Those discs could contain the source code or original assets, so I don't have to spend countless hours reversing this crap. We could have Kanji support sooner ;)

@uwodb
Author

uwodb commented Dec 8, 2020

My goal is Korean & Japanese language support in devilutionX and I wanted to read the PSX's Japanese text. Thank you :)
diasurgical/devilution#1762
It seemed to me that they want to stay close to the original, so I gave up on using TTF and I'm using my own customized bitmap font.

@AJenbo
Member

AJenbo commented Dec 8, 2020

Please see diasurgical/devilutionX#66 for the latest status regarding translation and font handling in DevilutionX.

@galaxyhaxz
Member

Thankfully we have fonts for both Korean and Japanese! See here: https://d2mods.info/forum/viewtopic.php?t=55894

Diablo 2 has pixel mapped fonts ready for color transforming, in the same sizes as d1! So 42, 30, 24, and 16 pixels! Diablo 2 uses utf8 iirc rather than shift jis.

@uwodb
Author

uwodb commented Dec 10, 2020

Really? I didn't know that Diablo 2 used a bitmap font. :(
Warcraft(1994) bitmap, ascii
Warcraft II(1995) bitmap, ascii
Diablo(1996) bitmap, ascii
Starcraft(1998) bitmap, ascii
Diablo II(2000) bitmap?, unicode?
Warcraft III(2002) ttf, unicode
I see...

@AJenbo
Member

AJenbo commented Dec 10, 2020

D1 uses TTF for some UI elements.

@galaxyhaxz
Member

Diablo 2 has the standard ascii bitmap font for most languages. It has a unicode font for japanese, korean, and chinese. There is also an ascii Russian and Polish font. On top of that it has some extra fonts like a small font and formal font for typing+UI.

Diablo 1 had a combination of pixel fonts and TTF. The developers outsourced the UI to Blizzard South who in turn got lazy and decided to use proprietary TTF for formal fonts instead of coming up with a proper font like they did in Diablo 2.

See below an example of things you can do with the 6pt small font from D2.
[Screenshot: DIABLO_20201210_025658]

@galaxyhaxz
Member

Added a font dumping tool. Just as I feared, it looks like there isn't a way to directly translate the Japanese text files back. Since the special characters (0x8000+) just point to pixel data, one would need either a Japanese keyboard to retype them by hand or some sort of OCR that can remap them to standard SJIS/Unicode. Example: if the file has the character 0x8265, it references an address in MAINTXT.OUT, which holds a bit-based pixel font. Here is the output from the tool, with the pound sign (#) representing the pixels:

----- Id 34 (0x8265) -----
    #       
    #       
 ########   
   #        
   #        
  # #####   
  ##     #  
  #      #  
         #  
        #   
   #####    
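
For anyone who wants to reproduce a dump like this, here is a rough Python sketch. Everything about the layout is an assumption inferred from this thread, not confirmed by the tool's source: glyphs are taken to be 12x12 pixels at 1 bit per pixel (18 bytes each), packed as a continuous bit stream with the most significant bit first, and codes are taken to start at 0x8001 and advance by 18 per glyph. That guess is at least consistent with the header above, where 0x8265 is Id 34. The names `glyph_id` and `render_glyph` are made up for illustration.

```python
GLYPH_SIZE = 12                               # assumed glyph width and height in pixels
GLYPH_BYTES = GLYPH_SIZE * GLYPH_SIZE // 8    # 18 bytes at 1 bit per pixel
CODE_BASE = 0x8001                            # assumed code of glyph 0


def glyph_id(code: int) -> int:
    """Turn a custom two-byte code into a glyph index (assumed layout)."""
    offset = code - CODE_BASE
    if offset < 0 or offset % GLYPH_BYTES:
        raise ValueError(f"0x{code:04X} is not a glyph reference under these assumptions")
    return offset // GLYPH_BYTES


def render_glyph(blob: bytes) -> list[str]:
    """Render one glyph as rows of '#' and ' ' (assumed MSB-first bit packing)."""
    if len(blob) != GLYPH_BYTES:
        raise ValueError(f"expected {GLYPH_BYTES} bytes, got {len(blob)}")
    bits = "".join(f"{byte:08b}" for byte in blob)
    rows = [bits[r * GLYPH_SIZE:(r + 1) * GLYPH_SIZE] for r in range(GLYPH_SIZE)]
    return [row.replace("1", "#").replace("0", " ") for row in rows]


print(glyph_id(0x8265))  # 34, matching the dump header

# Synthetic glyph: a solid top row followed by empty rows.
for row in render_glyph(bytes([0xFF, 0xF0] + [0] * 16)):
    print(row)
```

With real data, the 18 bytes for a glyph would presumably be sliced out of MAINTXT.OUT at `glyph_id(code) * GLYPH_BYTES`, plus whatever header offset the file turns out to have.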

@uwodb
Author

uwodb commented Dec 10, 2020

Great. Translation of the PSX version is possible as well as translation of DevilutionX.

@galaxyhaxz
Member

galaxyhaxz commented Mar 9, 2021

If anyone is still reading this, I need help from a Japanese speaker. I have a file with all the Japanese glyphs that aren't mapped yet, printed out like above, but I have no idea what they mean and need them typed out so I can map the Shift-JIS code to the binary.

Edit: there are about 100 glyphs left in total, so it shouldn't take too long.

@uwodb
Author

uwodb commented Mar 9, 2021

I will see what I can do :) Show me your file and I'll take a look

@galaxyhaxz
Member

That's great @uwodb ! Below is a file with all the missing characters, printed out to look like pixels. I've already mapped the rest out with automation. Once we have these last things mapped we can convert the lore text back into text format.

Download: missing.txt

AJenbo closed this as completed Mar 10, 2021
AJenbo reopened this Mar 10, 2021

@uwodb
Author

uwodb commented Mar 23, 2021

These results may not all be accurate because they were typed by hand rather than automated :(
9333=競
9369=協
938D=境
93D5=況
93F9=狭
940B=胸
942F=響
9453=凝
94BF=僅
94D1=緊
953D=屈
9585=係
95BB=形
95CD=掲
9603=継
9627=警
97C5=固
97D7=弧
97FB=互
9831=誤
9843=交
98C1=坑
98D3=拘
98E5=控
9951=郊
9A29=困
9AA7=鎖
9ACB=挫
9B13=砕
9B25=際
9B6D=策
9B7F=索
9B91=錯
9BA3=擦
9BEB=惨
9C45=刺
9CD5=市
9CF9=志
9D65=至
9E07=七
9E2B=嫉
9E3D=室
9ECD=釈
9F03=惹
9FC9=宗
A023=讐
A047=醜
A07D=従?
A0E9=瞬
A0FB=殉
A11F=巡
A179=諸
A1AF=序
A1C1=徐
A1F7=召
A251=尚
A275=晶
A2BD=証
A329=状
A35F=伸
A383=侵
A3B9=浸
A3DD=申
A46D=陣
A4B5=遂
A4D9=枢
A69B=節
A6E3=宣
A7BB=疎
A7DF=訴
A803=創
A815=双
A86F=巣
A8A5=窓
A8B7=総
A8C9=荘
A8FF=送
A96B=側
A97D=則
AA0D=孫
AA1F=尊
AA31=村
AA67=唾
AAD3=態
ABBD=担
AC05=弾
AC83=致
ACA7=秩
ACB9=着
AD7F=頂
ADA3=沈
AE33=廷
AE7B=徹
AE9F=展
AEB1=転
AEF9=堵
AF0B=塗
AF1D=妬
AFD1=統
B073=洞
B0A9=徳
B0DF=独
B235=波
B26B=廃
B2C5=薄
B2D7=迫
B30D=肌
B331=罰
B38B=繁
B3AF=卑
B3E5=比
B3F7=疲
B4AB=貧
B4E1=布
B53B=赴
B595=風
B5A7=副
B5DD=福
B649=奮
B67F=併
B757=奉
B7D5=飽
B7F9=妨
B853=謀
B865=貌
B877=貿
B8BF=摩
B9BB=務
B9F1=冥
BA39=盟
BA5D=鳴
BAA5=模
BAC9=猛
BB11=悶
BB7D=躍
BBD7=有
BBFB=裕
BC31=余
BC55=余
BCD3=踊
BCE5=遥
BD09=浴
BDAB=律
BE17=侶
BE3B=虜
BE71=糧
BEA7=臨
BEDD=令
BEEF=冷
BF7F=路
BFA3=弄
BFD9=論
C021=枠
C033=墟
C045=愕
C057=枷
C069=沐
C07B=狡
C08D=禍
C09F=瞞
C0B1=膠
C0C3=貪
C0D5=踪
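
A list in this `CODE=character` shape is easy to load into a lookup table. Here is a small Python sketch; `parse_mapping` is a hypothetical helper, and treating a trailing `?` as "uncertain" is my reading of the one entry marked that way.

```python
def parse_mapping(text: str) -> tuple[dict[int, str], set[int]]:
    """Parse 'HEX=char' lines into {code: char} plus the set of uncertain codes."""
    mapping: dict[int, str] = {}
    uncertain: set[int] = set()
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and ordinary prose
        code_text, _, value = line.partition("=")
        try:
            code = int(code_text, 16)
        except ValueError:
            continue  # left side is not a hex code
        if value.endswith("?"):
            # Assumption: a trailing '?' marks a guess that needs a second look.
            uncertain.add(code)
            value = value.rstrip("?")
        if value:
            mapping[code] = value
    return mapping, uncertain


mapping, uncertain = parse_mapping("9333=競\nA07D=従?\n")
print(mapping, [f"{code:04X}" for code in uncertain])
```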

@galaxyhaxz
Member

Wow, incredible work @uwodb! Very thankful you typed these out, as it would have taken me forever fiddling with OCR software and the like. As a result, all but two characters are mapped and everything seems to be translating back correctly!

When you get the chance, could you have a second look at BC55 and C08D? BC55 appears to be a duplicate of BC31 but it looks a bit different.
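
Duplicates like this can be caught mechanically by checking whether two different codes were given the same character. A quick Python sketch (the function name `find_duplicates` is made up, and the three-entry table is just a sample from the list above):

```python
from collections import defaultdict


def find_duplicates(mapping: dict[int, str]) -> dict[str, list[int]]:
    """Return every character that more than one code maps to."""
    by_char: dict[str, list[int]] = defaultdict(list)
    for code, char in mapping.items():
        by_char[char].append(code)
    return {char: sorted(codes) for char, codes in by_char.items() if len(codes) > 1}


table = {0xBC31: "余", 0xBC55: "余", 0xBCD3: "踊"}
for char, codes in find_duplicates(table).items():
    print(char, [f"{code:04X}" for code in codes])
```

This only flags candidates. Whether two glyphs really are the same character still needs a human to compare the pixel dumps.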

Please find below all of the game's text restored back into Shift-JIS!!! Note that the lore section is missing those two characters and may have some slight errors, the other two should be perfect.
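
For the curious, the restore step can be sketched roughly as below in Python. This is not the actual tool, and the byte layout is an assumption: bytes below 0x80 are treated as plain ASCII, and any byte at or above 0x80 is treated as the high byte of a two-byte code looked up in the mapping table. Unmapped codes are left as `[XXXX]` placeholders so they are easy to find later. `restore_text` is a hypothetical name.

```python
def restore_text(data: bytes, mapping: dict[int, str]) -> str:
    """Replace custom two-byte codes with real characters (assumed layout)."""
    out: list[str] = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte >= 0x80 and i + 1 < len(data):
            # Assumption: high byte first, then low byte.
            code = (byte << 8) | data[i + 1]
            out.append(mapping.get(code, f"[{code:04X}]"))
            i += 2
        else:
            # Plain single byte (a stray high byte at the very end also lands here).
            out.append(chr(byte))
            i += 1
    return "".join(out)


text = restore_text(b"A\x93\x33\xbc\x55", {0x9333: "競"})
print(text)  # A競[BC55]
```

Once every placeholder is gone, `text.encode("shift_jis")` gives bytes that can be written back out as a regular Shift-JIS file.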

jap.zip.txt (rename to .zip)

@uwodb
Author

uwodb commented Mar 23, 2021

Wow! Thanks for noticing.
BC31=余
BC55=幼
C08D=猾
What's the next plan?

@galaxyhaxz
Member

I guess once translation support is complete the text can be used. I'm working on my own game engine, but progress is a bit slow. I'd expect you to have translation support in DevilutionX soon, though fonts are still missing for Asian languages. For now you can use the Diablo 2 versions.
