dviasm and python 3 #7

Closed
u-fischer opened this issue May 25, 2019 · 23 comments

@u-fischer u-fischer commented May 25, 2019

dviasm doesn't work with Python 3. Would it be possible to update it or add a Python 3 variant?

Contributor

@khaledhosny khaledhosny commented May 25, 2019

I tried that a while ago, but ran into several problems (the code does lots of fiddling with bytes, and the Python 3 distinction between bytes and strings requires changes everywhere).
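To make the kind of change concrete: in Python 3, indexing a bytes object yields an int rather than a one-character string, and any text inside the binary data must be decoded explicitly. The sketch below reads a DVI preamble (pre = 247, followed by i[1] num[4] den[4] mag[4] k[1] x[k]) under those rules. parse_preamble is a hypothetical helper written for illustration, not dviasm's own code.

```python
import struct

PRE = 247  # DVI preamble opcode


def parse_preamble(data: bytes):
    """Parse 'pre i[1] num[4] den[4] mag[4] k[1] x[k]' from raw DVI bytes."""
    # data[0] is already an int in Python 3; ord(data[0]) would raise TypeError.
    if data[0] != PRE:
        raise ValueError("not a DVI preamble")
    dvi_id = data[1]
    num, den, mag = struct.unpack(">III", data[2:14])
    k = data[14]
    # The comment is raw bytes; decode it explicitly to obtain a str.
    comment = data[15:15 + k].decode("ascii")
    return dvi_id, num, den, mag, comment


# A preamble as TeX writes it (the 27-byte comment is " TeX output YYYY.MM.DD:HHMM").
sample = (bytes([PRE, 2]) + struct.pack(">III", 25400000, 473628672, 1000)
          + bytes([27]) + b" TeX output 2019.11.13:1233")
print(parse_preamble(sample))
```

In Python 2 the same slicing returned str objects, which is why code written for it tends to sprinkle ord() and implicit str/bytes mixing everywhere.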

Author

@u-fischer u-fischer commented Nov 12, 2019

On the TeX Live mailing list, Norbert just commented that Python 2 will be deprecated at the end of the year and that a number of distributions will remove it.

https://tug.org/pipermail/tex-live/2019-November/044366.html

Contributor

@khaledhosny khaledhosny commented Nov 12, 2019

Someone has to step forward and do the port, or we might just retire this tool.

Contributor

@khaledhosny khaledhosny commented Nov 13, 2019

Fixed with #8, I’ll upload to CTAN sometime next week.


@norbusan norbusan commented Nov 14, 2019

Hi @khaledhosny @reutenauer
thanks for the Py3 updates, but I still see problems in pTeX mode (-p):

$ dviasm -p -o /dev/null ptex-sample.dvi
Traceback (most recent call last):
  File "./dviasm.py", line 1214, in <module>
    if options.output: aDVI.Dump(options.output, tabsize=options.tabsize, encoding=options.encoding)
  File "./dviasm.py", line 882, in Dump
    self.DumpToFile(fp, tabsize=tabsize, encoding=encoding)
  File "./dviasm.py", line 937, in DumpToFile
    fp.write("set: %s\n" % PutStr(cmd[1]))
  File "./dviasm.py", line 230, in PutStrUTF8
    s += b''.join([b'\x1b$B', bytes.fromhex('%02x%02x' % (o//256, o%256))]).decode('iso2022-jp')
UnicodeDecodeError: 'iso2022_jp' codec can't decode byte 0xff in position 3: illegal multibyte sequence

Interestingly, the very same file works when I run dviasm without -p.

The input is, I guess, irrelevant, but here it is:

\ifdefined\ucs
  \documentclass[uplatex]{jsarticle}
\else
  \documentclass{jsarticle}
\fi
\begin{document}
\LARGE
\vbox{\hsize=20zw\noindent\kanjiskip=0pt
12345678901234567890
逢芦飴溢茨鰯淫迂厩噂餌襖迦牙廻恢晦蟹葛鞄
釜翰翫徽祇汲灸笈卿饗僅喰櫛屑粂祁隙倦捲牽
鍵諺巷梗膏鵠甑叉榊薩鯖錆鮫餐杓灼酋楯薯藷
哨鞘杖蝕訊逗摺撰煎煽穿箭詮噌遡揃遜腿蛸辿
樽歎註瀦捗槌鎚辻挺鄭擢溺兎堵屠賭瀞遁謎灘
楢禰牌這秤駁箸叛挽誹樋稗逼謬豹廟瀕斧蔽瞥
蔑篇娩鞭庖蓬鱒迄儲餅籾爺鑓愈猷漣煉簾榔屢
冤叟咬嘲囀徘扁棘橙狡甕甦疼祟竈筵篝腱艘芒
虔蜃蠅訝靄靱騙鴉}
\end{document}

Contributor

@khaledhosny khaledhosny commented Nov 14, 2019

Should be fixed now (I hope).


@norbusan norbusan commented Nov 14, 2019

Hmmm ...

$ ./dviasm.py -p -o /dev/null ~/foo/ptex-sample.dvi 
Traceback (most recent call last):
  File "./dviasm.py", line 1214, in <module>
    if options.output: aDVI.Dump(options.output, tabsize=options.tabsize, encoding=options.encoding)
  File "./dviasm.py", line 882, in Dump
    self.DumpToFile(fp, tabsize=tabsize, encoding=encoding)
  File "./dviasm.py", line 937, in DumpToFile
    fp.write("set: %s\n" % PutStr(cmd[1]))
  File "./dviasm.py", line 230, in PutStrUTF8
    s += bytes.fromhex("1b 24 42 %02x %02x" % (o//256, o%256)).decode('iso2022-jp')
UnicodeDecodeError: 'iso2022_jp' codec can't decode byte 0xff in position 3: illegal multibyte sequence

Contributor

@reutenauer reutenauer commented Nov 14, 2019

I can reproduce that, but then again the current dviasm crashes on it too, with the same error message. I suspect your document is UTF-8-encoded, and you’re not supposed to use -p on it.

Which is not to say that we shouldn’t try to fix the bug, of course. But the conversion to Python 3 seems very faithful :-)
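Some background on why -p fails on this file: judging from line 230 in the traceback, in pTeX mode dviasm treats the two bytes of each character code as a JIS X 0208 code and recovers the text by prefixing the ISO-2022-JP escape sequence ESC $ B and decoding. A minimal sketch of that step (jis_to_unicode is a hypothetical name, not dviasm's function) shows that a valid JIS code decodes, while a value that is not JIS, such as a Unicode code point written by upTeX, raises the same UnicodeDecodeError.

```python
def jis_to_unicode(code: int) -> str:
    """Decode a 2-byte JIS X 0208 code, as stored by pTeX, into a str."""
    hi, lo = code >> 8, code & 0xFF
    # ESC $ B switches ISO-2022-JP to JIS X 0208; the next two bytes form one character.
    return bytes([0x1B, 0x24, 0x42, hi, lo]).decode("iso2022_jp")


print(jis_to_unicode(0x3021))  # JIS 0x3021 is U+4E9C

try:
    jis_to_unicode(0xFF11)  # not a JIS code: 0xFF lies outside the 0x21..0x7E byte range
except UnicodeDecodeError as err:
    print("not JIS:", err)
```

Catching the exception only avoids the crash; the output is still wrong unless the DVI really came from pTeX.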

Owner

@aminophen aminophen commented Nov 14, 2019

When compiling with upTeX, you should NOT use -p option; it's only for pTeX.

(As a maintainer of pTeX, I'm sorry for not providing a way to detect whether a DVI was generated by pTeX or by upTeX.)

Owner

@aminophen aminophen commented Nov 14, 2019

By the way, I cannot re-compile the dumped text back into a new DVI.

% hello.tex
\font\x=ec-lmr10\x Hello, \TeX!\bye

OK with the current python2 version

$ tex hello
$ dviasm hello.dvi >hello-o.txt
$ dviasm hello-o.txt -o hello-o.dvi

Error with python3 version

$ ./dviasm.py hello.dvi >hello-n.txt
$ ./dviasm.py hello-n.txt -o hello-n.dvi
Traceback (most recent call last):
  File "./dviasm.py", line 1217, in <module>
    aDVI.Parse(args[0], encoding=options.encoding)
  File "./dviasm.py", line 713, in Parse
    self.ParseFromString(s, encoding=encoding)
  File "./dviasm.py", line 721, in ParseFromString
    for l in s.split('\n'):
TypeError: a bytes-like object is required, not 'str'
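This TypeError arises because the dump is held as bytes but split with a str separator, and Python 3 refuses to mix the two. A minimal sketch of one fix, assuming the dump file is read in binary mode and is UTF-8 encoded (split_dump_lines is a hypothetical helper): decode once at the boundary, then work with str throughout.

```python
from typing import List


def split_dump_lines(raw: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode a text dump that was read in binary mode and split it into lines."""
    # raw.split('\n') raises "TypeError: a bytes-like object is required, not 'str'";
    # decoding first lets every later string operation work on str.
    return raw.decode(encoding).split("\n")


raw = "first line\nsecond line".encode("utf-8")
print(split_dump_lines(raw))
```

The alternative, splitting with b'\n' and keeping bytes, works too, but then every later comparison against a str literal would need the same treatment.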

Contributor

@reutenauer reutenauer commented Nov 14, 2019

@aminophen I’m really sorry, I didn’t test that at all, so it’s not surprising that it fails. Let me have a look now (and catch the UnicodeDecodeError at the same time).

Contributor

@reutenauer reutenauer commented Nov 14, 2019

Can you try the file from my master branch now? It works on your example although I note that the “undumped” DVI file is not identical to the original one. But the result is the same as with the Python 2 version.


@norbusan norbusan commented Nov 14, 2019

@reutenauer yes, that version works without any problem!

Owner

@aminophen aminophen commented Nov 15, 2019

@reutenauer Testing your master branch, I noticed two issues:

  1. It prints some diagnostics during DVI compilation, which were not present in py2:
$ ../dviasm.py hello-ja.dvi >hello-ja-3.txt
$ ../dviasm.py -o hello-ja-3.dvi hello-ja-3.txt
 TeX output 2019.11.13:1233
False
0 b'\x00'
243
False
0 b'\x01'
243
False
0 b'\x00'
243
False
0 b'\x01'
243
  2. DVI compilation cannot be completed with the -p option:
$ ../dviasm.py -p hello-ja-iso2022-jp.dvi >hello-ja-iso2022-jp-3.txt
$ ../dviasm.py -p -o hello-ja-iso2022-jp-3.dvi hello-ja-iso2022-jp-3.txt
Traceback (most recent call last):
  File "../dviasm.py", line 1228, in <module>
    aDVI.Parse(args[0], encoding=options.encoding)
  File "../dviasm.py", line 724, in Parse
    self.ParseFromString(s, encoding=encoding)
  File "../dviasm.py", line 791, in ParseFromString
    ol = GetStr(val)
  File "../dviasm.py", line 193, in GetStrUTF8
    if is_ptex: return [UCS2toJIS(c) for c in t]
  File "../dviasm.py", line 193, in <listcomp>
    if is_ptex: return [UCS2toJIS(c) for c in t]
  File "../dviasm.py", line 188, in UCS2toJIS
    else:           return (ord(s[3]) << 8) + ord(s[4])
TypeError: ord() expected string of length 1, but int found
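This is the other side of the same bytes/str issue: judging from line 188 in the traceback, UCS2toJIS reads the two JIS bytes at positions 3 and 4 of a character's ISO-2022-JP encoding, and in Python 3 those positions are already ints. A minimal sketch of such a conversion, with the hypothetical name unicode_to_jis and assuming an ASCII character encodes to one byte and a kanji to ESC $ B, two JIS bytes, then ESC ( B:

```python
def unicode_to_jis(ch: str) -> int:
    """Map one character to the code pTeX stores in the DVI (ASCII or JIS X 0208)."""
    encoded = ch.encode("iso2022_jp")
    if len(encoded) == 1:
        return encoded[0]  # plain ASCII byte
    # For a kanji, encoded starts with b'\x1b$B' followed by the two JIS bytes.
    # encoded[3] and encoded[4] are ints in Python 3, so no ord() is needed.
    return (encoded[3] << 8) + encoded[4]


print(hex(unicode_to_jis("\u4e9c")))  # U+4E9C
print(hex(unicode_to_jis("A")))
```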

Contributor

@reutenauer reutenauer commented Nov 15, 2019

Thanks @aminophen. Better now?

Owner

@aminophen aminophen commented Nov 15, 2019

Thanks @reutenauer, now it seems ok on pTeX.

I noticed another problem with XeTeX: please find the file "native.xdv" available here. Converting XDV -> Text, I see an extra "b" at the beginning of xxx and fnt, which wasn't there in py2:

34c34
< xxx: 'pdf:pagesize default'
---
> xxx: b'pdf:pagesize default'
43c43
<       fnt: "c:/w32tex/share/texmf-dist/fonts/opentype/public/tex-gyre/texgyretermes-regular.otf:color=220022FF" at 10pt
---
>       fnt: "b'c:/w32tex/share/texmf-dist/fonts/opentype/public/tex-gyre/texgyretermes-regular.otf':color=220022FF" at 10pt

Contributor

@reutenauer reutenauer commented Nov 15, 2019

Thanks @aminophen, I can reproduce that too. The b means that a Python 3 bytes object, rather than a str, was dumped. It should be fixed now (still in my repo), as should the compilation stage, which didn’t work either.
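A minimal illustration of that explanation: formatting a bytes object with %s embeds its repr, b'...', while decoding first gives the plain text that the Python 2 version printed. format_xxx is a hypothetical helper, and the UTF-8 default is an assumption.

```python
def format_xxx(payload: bytes, encoding: str = "utf-8") -> str:
    """Render a special (xxx) payload the way the py2 dump did."""
    # "%s" % payload calls str() on the bytes object and yields "b'...'".
    return "xxx: '%s'" % payload.decode(encoding)


payload = b"pdf:pagesize default"
print("xxx: %s" % payload)   # the bug:  xxx: b'pdf:pagesize default'
print(format_xxx(payload))   # fixed:    xxx: 'pdf:pagesize default'
```

The font name in the fnt line goes wrong the same way, since it is interpolated into a larger string before being written out.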

Owner

@aminophen aminophen commented Nov 15, 2019

Thanks @reutenauer, XDV decoding and compilation now work:

./dviasm.py native.xdv >native-3.txt
./dviasm.py native-3.txt -o native-3.xdv

Lots of diagnostics are printed during compilation; it would be nice to silence them.


Good news: the py2 version could not decode code points at or above U+10000:

ValueError: unichr() arg not in range(0x10000) (narrow Python build)

The py3 version, however, works just fine with:

$ cat overbmp.tex
%#!uptex
𠮷野家% U+20BB7
\bye
$ uptex overbmp
$ ./dviasm.py overbmp.dvi >overbmp-3.txt
$ ./dviasm.py overbmp-3.txt -o overbmp-3.dvi
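Python 3's chr() and ord() cover every code point up to U+10FFFF, so there is no narrow-build limit like the one Python 2's unichr() hit. A character above U+FFFF simply needs a wider DVI set command. The sketch below assumes the writer picks the smallest one (set_char_0 to set_char_127 for codes below 128, otherwise set1 to set4, opcodes 128 to 131, followed by 1 to 4 big-endian bytes); dvi_set_command is a hypothetical helper, not dviasm's code.

```python
def dvi_set_command(code: int) -> bytes:
    """Encode 'typeset character <code>' with the shortest DVI set command."""
    if code < 0:
        raise ValueError("negative character code")
    if code < 128:
        return bytes([code])  # set_char_0 .. set_char_127
    for n in range(1, 5):  # set1 .. set4 are opcodes 128 .. 131
        if code < 1 << (8 * n):
            return bytes([127 + n]) + code.to_bytes(n, "big")
    raise ValueError("code does not fit in four bytes")


ch = chr(0x20BB7)  # fine in Python 3; unichr() failed here on narrow Python 2 builds
print(hex(ord(ch)), dvi_set_command(ord(ch)).hex())
```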

Owner

@aminophen aminophen commented Nov 15, 2019

One more proposal:

The help message contains

  -p, --ptex            extended DVI for Japanese pTeX

but it may be misleading for upTeX users, who should not use this option. It should be

  -p, --ptex            ISO-2022-JP encoded DVI for Japanese pTeX
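Assuming the command line is parsed with the standard optparse module (the options/args names in the earlier tracebacks hint at this, but it is an assumption), the proposal amounts to changing one help string. The parser below is hypothetical and defines only -p and -o.

```python
from optparse import OptionParser


def build_parser() -> OptionParser:
    """Hypothetical parser showing only the -p and -o options."""
    parser = OptionParser(usage="%prog [options] input")
    parser.add_option("-p", "--ptex", action="store_true", dest="ptex", default=False,
                      help="ISO-2022-JP encoded DVI for Japanese pTeX")
    parser.add_option("-o", dest="output", help="output file")
    return parser


options, args = build_parser().parse_args(["-p", "-o", "out.dvi", "in.txt"])
print(options.ptex, options.output, args)
```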

Owner

@aminophen aminophen commented Nov 16, 2019

checked bac1a75
Improved help message and remove debug printouts.

@reutenauer Thanks.

I noticed another problem, one that was also present in the py2 version.

Save the following as "uptate.tex":

%#!uptex
\shipout\vbox{\hbox{\dtou 吉野家}\hbox{\yoko 吉野家}\hbox{\tate 吉野家}}
\bye
$ uptex uptate
$ ./dviasm.py uptate.dvi >uptate-3.txt
$ ./dviasm.py uptate-3.txt -o uptate-3.dvi
$ dvipdfmx uptate-3.dvi
uptate-3.dvi -> uptate-3.pdf

dvipdfmx:fatal: DVI opcode 255 only valid for Ascii pTeX

No output PDF file written.

The resulting "uptate-3.dvi" is invalid, since the ID byte in the postamble is 2 instead of 3. Please refer to "ptex-guide-en.pdf" (texdoc ptex-guide-en):

In addition, pTeX/upTeX defines one additional DVI command.

  • dir (255): Used to change directions of text alignment.

The DVI format in the preamble is always set to 2, as with TeX82. On the other hand, the DVI ID in the postamble can be special. Normally it is set to 2, as with TeX82; however, when dir (255) appears at least once in a single pTeX/upTeX DVI, the post_post table of postamble contains ID = 3.
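Following the rule quoted above, a DVI writer only has to remember whether it emitted a dir command and choose the postamble ID accordingly, while the preamble keeps ID 2. A minimal sketch of the post_post trailer (opcode 249, q[4], i[1], then four to seven 223 bytes so the file length is a multiple of four, as the DVI format requires); post_post_bytes and its parameters are hypothetical, and the used_dir flag is assumed to be tracked by the writer rather than found by scanning raw bytes, where 255 can also occur as an operand.

```python
import struct

POST_POST = 249  # post_post opcode
TRAILER = 223    # filler byte after post_post
ID_TEX = 2       # standard DVI ID (always used in the preamble)
ID_PTEX_DIR = 3  # postamble ID when a pTeX/upTeX dir (255) command occurred


def post_post_bytes(post_offset: int, used_dir: bool, bytes_so_far: int) -> bytes:
    """Build the post_post trailer for a DVI whose first bytes_so_far bytes are written."""
    dvi_id = ID_PTEX_DIR if used_dir else ID_TEX
    body = bytes([POST_POST]) + struct.pack(">I", post_offset) + bytes([dvi_id])
    # At least four 223s, plus enough to make the total length a multiple of 4.
    pad = 4 + (-(bytes_so_far + len(body))) % 4
    return body + bytes([TRAILER]) * pad


print(post_post_bytes(100, True, 0).hex())
```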

Contributor

@khaledhosny khaledhosny commented Nov 16, 2019

I’ll wait a few more days then push to CTAN.

Owner

@aminophen aminophen commented Nov 16, 2019

@khaledhosny @reutenauer Thanks, all test files I can conceive of are fine now.

Contributor

@khaledhosny khaledhosny commented Nov 26, 2019

I uploaded the new release to CTAN.

5 participants