Unexpected error on WHILE(<FILE>) loop #646

Closed
p5pRT opened this issue Sep 21, 1999 · 4 comments

@p5pRT p5pRT commented Sep 21, 1999

Migrated from rt.perl.org#1519 (status was 'resolved')

Searchable as RT1519$


@p5pRT p5pRT commented Sep 21, 1999

From George.Nugent@Barra.COM

Created by George.Nugent@barra.com

Whilst parsing a large file using the WHILE(<>) loop, the loop seems to exit
when it encounters a \xf8 character in the file.
Under debug, if the line in the file was "text abc\xf8 more text", then the
$_ variable would contain "text abc" and the loop would exit on next.

Attempted to use the following script to screen out the \xf8 character:

while (<>) {
  if (/\xf8/) {
    s/\xf8//g;
  }
  print;
}

This failed.

Using MKS sed -e "s/\xf8//g" filename, successfully read the whole file.

Note: As a test, I installed the latest release version of perl (5.00503),
but this has the same problem.

Perl Info


Site configuration information for perl 5.00402:

Configured by GEORGE at Thu Apr 11 06:20:49 PDT 1996.

Summary of my perl5 (5.0 patchlevel 4 subversion 02) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32
    uname=''
    hint=recommended, useposix=true, d_sigaction=undef
    bincompat3=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cl.exe', optimize='-O', gccversion=
    cppflags='-DWIN32'
    ccflags ='-MD -DWIN32'
    stdchar='unsigned char', d_stdstdio=define, usevfork=false
    voidflags=15, castflags=0, d_casti32=define, d_castneg=define
    intsize=4, alignbytes=8, usemymalloc=n, randbits=15
  Linker and Libraries:
    ld='link', ldflags ='-nologo -subsystem:windows'
    libpth=\lib
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib  winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib  oleaut32.lib netapi32.lib
uuid.lib wsock32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib
    libc=msvcrt.lib, so=dll
    useshrplib=undef, libperl=undef
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='', lddlflags='-dll'



@INC for perl 5.00402:
	c:\perl\lib\site
	c:\perl\lib
	c:\perl\lib
	c:\perl\lib\site
	c:\perl\lib\site
	.


Environment for perl 5.00402:
    HOME=c:/home
    LANG (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
 
    PATH=c:\Sybase\dll;c:\Sybase\bin;.;c:\perl\bin;c:\myDlls;C:\Utils;C:\mksnt;C:\WINNT\system32;C:\WINNT;C:\WIN32APP\TOOLKIT;C:\sybtools\WIN32;C:\sybtools\ASEP;c:\PROGRA~1\DEVSTU~1\SHARED~1\bin\ide;.;c:\perl\bin;c:\lynx\sqlsrvr\utils;c:\PROGRA~1\DEVSTU~1\SHARED~1\bin;c:\PROGRA~1\DEVSTU~1\vc\bin;c:\PROGRA~1\DEVSTU~1\vss\win32;Z:.;c:\program files\devstudio\sharedide\bin\ide;c:\lynx\sqlsrvr\utils;c:\FT_datafeed\utils;c:\program files\devstudio\sharedide\bin;c:\program files\devstudio\vc\bin;c:\program files\devstudio\vss\win32
    PERL_BADLANG (unset)
    SHELL=C:/mksnt/sh.exe


@p5pRT p5pRT commented Sep 21, 1999

From @tamias

On Tue, Sep 21, 1999 at 05:30:38PM +0100, George Nugent wrote:

> This is a bug report for perl from George.Nugent@barra.com,
> generated with the help of perlbug 1.19 running under perl 5.00402.
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> Whilst parsing a large file using the WHILE(<>) loop, the loop seems to exit
> when it encounters a \xf8 character in the file.
> Under debug, if the line in the file was "text abc\xf8 more text", then the
> $_ variable would contain "text abc" and the loop would exit on next.
>
> Attempted to use the following script to screen out the \xf8 character:
>
> while (<>) {
>   if (/\xf8/) {
>     s/\xf8//g;
>   }
>   print;
> }
>
> This failed.
>
> Using MKS sed -e "s/\xf8//g" filename, successfully read the whole file.
>
> Note: As a test, I installed the latest release version of perl (5.00503),
> but this has the same problem.

After installing 5.00503, I'm not sure why you would go back to using
5.00402...

> Site configuration information for perl 5.00402:
>
> Configured by GEORGE at Thu Apr 11 06:20:49 PDT 1996.
>
> Summary of my perl5 (5.0 patchlevel 4 subversion 02) configuration:
>   Platform:
>     osname=MSWin32, osvers=4.0, archname=MSWin32

I am unable to duplicate this problem with the character "\xf8". Rather, I
would expect it to occur with the character "\x1a". Nonetheless, try
calling binmode() on the filehandle before you read from it.

Ronald
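
The binmode() suggestion above can be sketched as follows. On Win32, a filehandle opened in text mode treats the byte \x1a (Ctrl-Z) as end-of-file, which is why that byte is the likelier suspect. This is only an illustration, not code from the thread: it assumes a modern Perl (lexical filehandles and three-argument open, 5.6 or later, rather than the 5.004/5.005 releases discussed here), and strip_byte_lines() and the sample data are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read every line of a file in binary mode, removing one byte value.
# strip_byte_lines() is a hypothetical helper name, not from the thread.
sub strip_byte_lines {
    my ($path, $byte) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # no text-mode translation: no CRLF mapping, no Ctrl-Z EOF
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/\Q$byte\E//g;
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

# Build a small sample file containing both bytes discussed in the thread.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "text abc\xf8 more text\n", "ctrl-z \x1a here\n", "last line\n";
close($out) or die "Cannot close $path: $!";

my @lines = strip_byte_lines($path, "\xf8");
print scalar(@lines), " lines read\n";
print $lines[0];
```

With the magic <> handle used in the original script, the equivalent step would be to open each file named in @ARGV explicitly and call binmode() on that handle before reading.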


@p5pRT p5pRT commented Sep 22, 1999

From [Unknown Contact. See original ticket]

> Whilst parsing a large file using the WHILE(<>) loop, the loop seems to exit
> when it encounters a \xf8 character in the file.
> Under debug, if the line in the file was "text abc\xf8 more text", then the
> $_ variable would contain "text abc" and the loop would exit on next.
>
> Attempted to use the following script to screen out the \xf8 character:
>
> while (<>) {
>   if (/\xf8/) {
>     s/\xf8//g;
>   }
>   print;
> }
>
> This failed.

I do confirm the opposite result with the same script and data -
\xf8 processed well in my case. Moreover, \xf8 is a Russian letter 'sh'
that is used in our text processing quite often with no problem.

> Note: As a test, I installed the latest release version of perl (5.00503),
> but this has the same problem.

The problem is probably somewhere else.

> After installing 5.00503, I'm not sure why you would go back to using
> 5.00402...

I'm still using 5.004_02 on Win32 because its binaries are more
complete (they contain perlTk, ODBC and some others) and may be linked with BC++
libraries in order to embed Perl into my application (it will not link with
the MSVC++ binaries provided by the ActiveState standard port).
Fortunately I can compile 5.005_xx with BC++ (thank you, porters)
and use higher versions in my application, but I'm worried about some modules I
need to adapt from 5.004 to 5.005.

  Good luck,
  Vadim


@p5pRT p5pRT commented Sep 22, 1999

From [Unknown Contact. See original ticket]

Konovalov, Vadim writes:

> I do confirm the opposite result with the same script and data -
> \xf8 processed well in my case. Moreover, \xf8 is a Russian letter 'sh'
> that is used in our text processing quite often with no problem.

<OFFTOPIC>

In view of pending utf8 stuff let us start to use a correct
terminology *now*.

  \xf8 *denotes* a Russian letter 'sh'

If a Russian letter 'sh' *is* anything, it is

  ord("\N{CYRILLIC SMALL LETTER SHA}") # is it SMALL?

which is 0x448 with Perl as shipped now.

But I think it is too early in the development curve to insist on
Unicode encoding of letters. Perl has no internal notion of encoding.
[Cultural info - which is the only meaning of encoding if you leave
glyphs aside - is assigned to (characters=) numbers via external
scripts, see lib/Unicode directory.]

So probably one should say that a Russian letter 'sh' *isn't* anything
as far as Perl is concerned. ;-)

Ilya
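
The distinction drawn above between a byte that *denotes* a letter and the letter's character number can be illustrated with a short sketch. It assumes the text is encoded in Windows-1251 (cp1251), which the thread never states, and it uses the Encode module that ships with Perl 5.8 and later, well after this discussion.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ':full';    # needed for \N{...} on Perls older than 5.16

# Assumption: the input bytes are Windows-1251 (cp1251) text.
my $byte = "\xf8";
my $char = decode('cp1251', $byte);

printf "byte value:      0x%02x\n", ord($byte);
printf "character value: 0x%04x\n", ord($char);
print  "same as \\N{CYRILLIC SMALL LETTER SHA}\n"
    if $char eq "\N{CYRILLIC SMALL LETTER SHA}";
```

Under that assumption the byte 0xf8 decodes to character 0x448, the value quoted above. Under a different single-byte encoding the same byte would denote a different character.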
