Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable UTF-8 in the source code of the skeleton app #782

Closed
wants to merge 1 commit into from
Closed

Enable UTF-8 in the source code of the skeleton app #782

wants to merge 1 commit into from

Conversation

mvuets
Copy link

@mvuets mvuets commented Nov 27, 2014

Since the skeleton app generated by dancer2 gen is by default configured to
use UTF-8 (in config.yml), it makes sense to enable UTF-8 in the source code of
the main module (lib/${APP_NAME}.pm) as well. Thus Unicode string literals are
not double encoded and are rendered properly.

Consider this route:

get '/' => sub { "щука\n" }; # stored in a file with UTF-8 encoding

Before change it would give:

$ curl localhost:3000
��ка

After change:

$ curl localhost:3000
щука

Since the skeleton app generated by `dancer2 gen` is by default configured to
use UTF-8 (in config.yml), it makes sense to enable UTF-8 in the source code of
the main module (lib/${APP_NAME}.pm) as well. Thus Unicode string literals are
not double encoded and are rendered properly.

Consider this route:

    get '/' => sub { "щука\n" }; # stored in a file with UTF-8 encoding

Before change it would give:

    $ curl localhost:3000
    ��ка

After change:

    $ curl localhost:3000
    щука

♥
@DavsX
Copy link

DavsX commented Nov 27, 2014

UTF-8 in config.yml has little to do with 'use utf8' in perl scripts. The latter enables you to have the source code itself in UTF-8, thus enabling for example utf characters in variable names, in string constants. Personally I don't see a reason to use utf8 in the source code, unless someone actually needs it, but that is not very common in my experience (source code that is capable of dealing with unicode data does not neccesarily need 'use utf8'). On the other hand I don't really see any negative aspects of it. Maybe just the one, that it can create confusion, when someone is not that familiar with unicode.

@mvuets
Copy link
Author

mvuets commented Nov 27, 2014

Hey @DavsX, thanks for the quick reaction!

You are right the correlation is very small. I was indeed referring to "string constants" (sorry if my phrasing "Unicode string literals" was obscure). IMO need of non-ASCII string constants is arguable from a point of debugging and a not native English speaker, but I leave it up to you (-:

I should say that I came up with this change after reading remarks to the charset setting in the default config file, that reads:

# when the charset is set to UTF-8 Dancer2 will handle for you
# all the magic of encoding and decoding. You should not care
# about unicode within your app when this setting is set (recommended).

I think one not familiar with Unicode might be confused either utf8 is used or not. And in some cases (like in ones being discussed) someone actually would need to care.

If this change is discouraged, then maybe change phrasing in the mentioned config file remark? Namely avoid using "magic" and "should not care". What do you think?

@DavsX
Copy link

DavsX commented Nov 27, 2014

Well, you are right, that comment in this form is not 100% correct. I personally don't have any preference what regards of rewriting that comment or adding 'use utf8', so it is best to wait for the 'bosses' to come and comment on this :)

@xsawyerx
Copy link
Member

We are already trying to import utf8 to the caller. Seems like it isn't working...
use Dancer2 calls:

strict->import;
warnings->import;
utf8->import;

@mvuets
Copy link
Author

mvuets commented Nov 28, 2014

Yeah, I ran across that code last night and wondered if that was the same intent.

I think that code does not work, because you don't use utf8;: despite utf8 namespace always exists (it is created by core) it does not contain extra functionality loaded and activated by the utf8 pragma, namely enabling utf8-aware source code. This patch seems to help:

diff --git a/lib/Dancer2.pm b/lib/Dancer2.pm
index be5ccfc..d4167db 100644
--- a/lib/Dancer2.pm
+++ b/lib/Dancer2.pm
@@ -3,6 +3,7 @@ package Dancer2;

 use strict;
 use warnings;
+use utf8;
 use List::Util  'first';
 use Class::Load 'load_class';
 use Dancer2::Core;

@DavsX
Copy link

DavsX commented Nov 28, 2014

Putting 'use utf8' in Dancer2.pm should not affect your own Dancer2 scripts in the way you think. If we add use utf8 in the scaffolding, it would mean that you would HAVE to remember to put it in every package of yours when that is needed. Dancer2 apps rarely consist of only one .pm file. On the other hand the current utf8->import would mean, that in every file in which you use Dancer2; would automatically get the 'use utf8'. That is a more general approach I think, altough if it is not working then we should do something about it.

@xsawyerx
Copy link
Member

I meant the utf8->import that we do on sub import in Dancer2.pm.

I checked and utf8 pragma is... special. We could import it using import::into. This raises the question of "should we really force user code to always be in utf8? Would it be better to put it in the scaffolding app? What are the implications of requiring utf8 when you don't need it?

In general I'm not in favor of writing utf8 in source code and being more explicit when you do.

@xsawyerx
Copy link
Member

I'm voting for "we already have utf8->import in our code which we thought would be import utf8 pragma", so if we already had that assumption, let's just put it in the skeleton code.

In short, I vote to merge.

@veryrusty
Copy link
Member

I've added some simple tests (merged in 02f4894) but these pass on all versions of perl we test with TravisCI.

Playing devil's advocate, assuming the added tests are correct then if this is merged, then for some users removing the use utf8; line won't turn off the import of utf8 that Dancer2->import does, leading to someone else getting really confused.

@mvuets could you provide what version(s) of Dancer2 and what platform(s) you ran into this issue?

veryrusty added a commit that referenced this pull request Dec 30, 2014
We were calling $pragma->import which should "just work" to apply the
pragma to the caller. #782 hints that the utf8 pragma may not be
imported correctly. By using Import::Into we have a consistent way to
import pragmas (or anything else may wish to) into the caller.
@mvuets
Copy link
Author

mvuets commented Dec 30, 2014

I ran @veryrusty's unit test and it passed on my (ordinal) system as well. Though the original problem remained - utf8 wasn't imported by Dancer2 in a user app. It took me a good while to figure out why. See my findings below - hope that helps and makes sense.

First off, I'd like to stress out once more - utf8 namespace always exists (it's described in utf8.pod). Here is a quick demo:

$ perl -wE 'say "utf8:: exists" if %utf8::'
utf8:: exists
$ perl -wE 'say "rubbish:: does not exist" if !%rubbish::'
rubbish:: does not exist

Now another bit - you can invoke import on a package with no import defined. So here I lack core understanding of why this is a case. I guess it's made optional to make use always work. Whatever that is, this code appears to work perfectly well:

$ perl -wE 'package rubbish; package main; rubbish->import()'

The last basic piece - utf8 controls the UTF-8 flag in import/unimport. So to give a compiler a hint whether code to be compiled as UTF-8, I need to load utf8.pm and invoke import or unimport subroutines. use utf8 and no utf8 do exactly that.

Now combine all bits together. To utf8->import (like in Dancer2.pm), that import must exist first. Otherwise it silently does nothing.

$ perl -wE 'say "utf8::import", (utf8->can("import") ? "" : " not"), " exists"'
utf8::import not exists
# but
$ perl -wE 'no utf8; say "utf8::import", (utf8->can("import") ? "" : " not"), " exists"'
utf8::import exists
$ perl -wE 'require utf8; say "utf8::import", (utf8->can("import") ? "" : " not"), " exists"'
utf8::import exists

Most importantly:

$ perl -wE 'use Dancer2; say "utf8::import", (utf8->can("import") ? "" : " not"), " exists"'
utf8::import not exists

Now, back to the concrete example - the unit test. It works simply because it loads utf8.pm via no utf8 on line 02f4894#diff-4f40dcfb5080590785ed30832bb822adR218. And once utf8::import is loaded and compiled - everything works as you wanted in the first place. Look:

$ perl -wE 'use Dancer2; say length "щука"'
8
$ perl -wE 'BEGIN {require utf8} use Dancer2; say length "щука"'
4

Summary.

  • My given pull request is not necessary (assuming you do indeed want to keep certain things implicit for users).

  • The @veryrusty's unit test is not accurate (and it is not trivial to make it so, I think).

  • My second proposal to use utf8 in Dancer2.pm was about right: right direction, wrong actions. To make it work you need to load utf8.pm and do it in compile time:

    diff --git a/lib/Dancer2.pm b/lib/Dancer2.pm
    index 9cb0b43..0a9a7b5 100644
    --- a/lib/Dancer2.pm
    +++ b/lib/Dancer2.pm
    @@ -28,6 +28,7 @@ sub import {
    
         strict->import;
         warnings->import;
    +    BEGIN { require utf8 }
         utf8->import;
    
         my @final_args;
    

    With patch applied now that former example seems to work properly:

      $ perl -wE 'use Dancer2; say length "щука"'
      4
    

That's it, thanks. I would like to hear your thoughts.

@veryrusty
Copy link
Member

@mvuets++, hell, @mvuets+5 ! Thanks for spending the time investigating this and the excellent feedback. I even learnt something new! 😄

@xsawyerx its not that the utf8 pragma is "special" - all pragmas are; eg. what do you expect this to do?

perl -wE '"strict"->import; $v=2; say $v'

The suggestion to use Import::Into has been merged (#821). That effectively does use_module($pragma)->import(), which provides what's required. Running @mvuets last example against current master gives:

$ perl -Ilib -wE 'use Dancer2; say length "щука"'
4

I think this can now be closed - though we may want to update the Changes file to reflect this issue is also resolved.

@xsawyerx
Copy link
Member

xsawyerx commented Jan 1, 2015

@mvuets++ Wow!
@veryrusty++

I will add this to the Changes. Amazing work!

@xsawyerx xsawyerx closed this Jan 1, 2015
xsawyerx added a commit that referenced this pull request Jan 1, 2015
    ** Happy new year! **

    [ ENHANCEMENT ]
    * GH #778: Avoid hard-coded static page location. (Dávid Kovács)
    * Speed up big uploads significantly. (Rick Myers)
    * GH #821: Use Import::Into to import pragmas. (Russell Jenkins)
    * GH #782: Fix utf8 pragma import. (Maxim Vuets)
    * GH #786: Perlbrew fix. (Dávid Kovács)
    * GH #622: Refactoring. (James Raspass)

    [ DOCUMENTATION ]
    * GH #713: Change order of statements in Cookbook to not imply that
      D2::P::Ajax::ajax() calls have priority. (Sawyer X)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants