Branch: blead
Commits on Sep 14, 2012
  1. @iabyn

    [MERGE] eliminate PL_reginput

    iabyn authored
    The variable PL_reginput (which is actually part of the
    global/per-interpreter variable PL_reg_state), is mainly used just
    locally within the S_regmatch() function. In this role, it effectively
    competes with the local-to-regmatch() variable locinput, as a pointer
    that tracks the current match position.
    
    Having two variables that do this is less efficient, and makes the code
    harder to understand. So this series of commits:
    
    1) removes PL_reginput, and replaces it with a var, reginput, local to
       regmatch();
    2) successively removes more and more uses of the reginput variable, until
    3) it is eliminated altogether, leaving locinput as the sole 'here we are'
       pointer.
    
    Looking at the CPU usage of running the t/re/*.t tests on a -O2,
    non-threaded build, running each test suite 3 times, gives:
    
    before: 55.35 55.66 55.69
    after:  55.10 55.13 55.33
    
    which indicates a small performance improvement of around 0.5%.
    
    (The CPU usage of a single run of the whole perl test suite dropped from
    783.31s to 777.23s).
  2. @iabyn

    regmatch(): eliminate reginput variable

    iabyn authored
    The remaining uses of reginput are all assignments; its value is
    never used. So eliminate it.
    
    Also, update the description of S_regrepeat(), which was woefully out of
    date (but mentioned reginput).
  3. @iabyn

    regmatch(): remove remaining reads of reginput

    iabyn authored
    In the remaining place where the value of reginput is used, its value
    should always be equal to locinput, so it can be eliminated there.
    
    This is part of a campaign to eliminate the reginput variable.
  4. @iabyn

    regmatch(): remove reginput from CURLY etc

    iabyn authored
    reginput mostly tracked locinput, except when regrepeat() was called.
    With a bit of jiggling, it could be eliminated for these blocks of code.
    
    This is part of a campaign to eliminate the reginput variable.
  5. @iabyn

    regmatch(): remove reginput from CURLYM

    iabyn authored
    reginput, locinput and st->locinput were being used in a little
    ballet to determine the length of the first match.
    This is now simply locinput - st->locinput, or its unicode equivalent;
    so the code can be simplified.
    
    Elsewhere in the block: where reginput was being used, locinput and/or
    nextchr already contain the same info, so use them instead.
    
    This is part of a campaign to eliminate the reginput variable.
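
The length calculation described above can be sketched as follows. This is only an illustration, not the code in regexec.c: the function names are hypothetical, and the character count assumes well-formed UTF-8 with both pointers on character boundaries.

```c
#include <stddef.h>

/* Length of the match in bytes: plain pointer subtraction
 * (the "locinput - st->locinput" case). */
size_t match_len_bytes(const char *start, const char *locinput)
{
    return (size_t)(locinput - start);
}

/* Length of the match in characters for UTF-8 text: count every byte
 * that is not a continuation byte (continuation bytes look like
 * 10xxxxxx). Hypothetical helper; assumes well-formed input. */
size_t match_len_chars_utf8(const char *start, const char *locinput)
{
    const unsigned char *p = (const unsigned char *)start;
    const unsigned char *e = (const unsigned char *)locinput;
    size_t n = 0;

    for (; p < e; p++) {
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

With a single position pointer, both lengths come from the saved start and the current position alone, so no third variable is needed.
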
  6. @iabyn

    regmatch(): remove reginput from IFMATCH etc

    iabyn authored
    It was being used essentially as a temporary var within the branch,
    so replace it with a temp var in a new block scope.
    
    On return in IFMATCH_A / IFMATCH_A_fail, there's no need to set reginput
    any more, so don't. The SUSPEND case used to set locinput = reginput, but
    at that point, the two variables already always had the same value anyway.
    
    This is part of a campaign to eliminate the reginput variable.
  7. @iabyn

    regmatch(): remove reginput from TRIE_next_fail:

    iabyn authored
    It was being used essentially as a temporary var within the branch,
    so replace it with a temp var in a new block scope.
    
    This is part of a campaign to eliminate the reginput variable.
  8. @iabyn

    regmatch(): make PUSH_STATE_GOTO dest explicit

    iabyn authored
    Currently, the string position from where matching continues after a PUSH
    is implicitly specified by the value of reginput, which is usually just
    equal to locinput. Make this explicit by adding an extra argument to
    PUSH_STATE_GOTO() etc.
    
    This is part of a campaign to eliminate the reginput variable.
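
A rough sketch of the idea, using a function rather than a macro. The types, field names and `push_state()` itself are hypothetical and much simpler than the real state stack; the point is only that the position to continue from is passed in rather than read from a shared variable.

```c
#include <stddef.h>

#define MAX_DEPTH 16

/* One saved frame: where to resume on failure, and the position to restore. */
typedef struct {
    int resume;
    const char *saved;
} frame;

typedef struct {
    frame stack[MAX_DEPTH];
    size_t depth;
    const char *locinput;   /* position at which matching continues */
} matcher;

/* Push the current state and continue matching at continue_at.
 * The destination is an explicit argument, so nothing else has to be
 * set up before the call. Returns 0 if the stack is full. */
int push_state(matcher *m, int resume, const char *continue_at)
{
    if (m->depth == MAX_DEPTH)
        return 0;
    m->stack[m->depth].resume = resume;
    m->stack[m->depth].saved = m->locinput;
    m->depth++;
    m->locinput = continue_at;
    return 1;
}
```
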
  9. @iabyn

    eliminate PL_reginput

    iabyn authored
    PL_reginput (which is actually #defined to PL_reg_state.re_state_reginput)
    is, to all intents and purposes, state that is only used within
    S_regmatch().
    
    The only other places it is referenced are in S_regtry() and S_regrepeat(),
    where it is used to pass the current match position back and forth between
    the subs.
    
    Do this passing instead via function args, and bingo! PL_reginput is now
    just a local var of S_regmatch().
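
The general pattern, passing the position through function arguments instead of through shared state, might look like this. The functions below are hypothetical stand-ins, not the real S_regrepeat() or S_regmatch() signatures.

```c
#include <stddef.h>

/* Hypothetical stand-in for a repeat-matching helper. Instead of reading
 * and writing a shared "current position" variable, it receives a pointer
 * to the caller's position and advances it in place. */
size_t repeat_char(const char **posp, const char *end, char c, size_t max)
{
    const char *p = *posp;
    size_t n = 0;

    while (p < end && n < max && *p == c) {
        p++;
        n++;
    }
    *posp = p;   /* hand the new position back to the caller */
    return n;    /* number of repetitions matched */
}

/* Hypothetical caller: matches a*b at s (greedy, no backtracking needed)
 * using a purely local position pointer. */
int match_a_star_b(const char *s, const char *end)
{
    const char *locinput = s;

    repeat_char(&locinput, end, 'a', (size_t)-1);
    return locinput < end && *locinput == 'b';
}
```

Because the position lives in a local variable, the compiler is free to keep it in a register, which is one plausible reason for the small speed-up reported in the merge commit.
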
  10. Fix compilation for -DPERL_POISON and -DPERL_OLD_COPY_ON_WRITE together.

    Nicholas Clark authored
    These compilation failures have been present since PERL_POISON was added
    in June 2005 by commit 94010e7. It seems that no-one has tried compiling
    with both defined together.
  11. Fix buggy -DPERL_POISON code in S_rxres_free(), exposed by a recent test.

    Nicholas Clark authored
    The code had been buggily attempting to overwrite just-freed memory since
    PERL_POISON was added by commit 94010e7 in June 2005. However, no
    regression test exercised this code path until recently.
    
    Also fix the offset in the array of UVs used by PERL_OLD_COPY_ON_WRITE to
    store RX_SAVED_COPY(). It now uses p[2]. Previously it had used p[1],
    directly conflicting with the use of p[1] to store RX_NPARENS().
    
    The code is too intertwined to meaningfully do these as separate commits.
  12. Restore the build under -DPERL_OLD_COPY_ON_WRITE

    Nicholas Clark authored
    This was broken as a side effect of commit 6502e08, recently merged
    to blead.
  13. @perlDreamer @tsee
  14. @perlDreamer @tsee
  15. @perlDreamer @tsee

    Refactor t/op/overload_integer.t to use test.pl instead of making TAP by hand.

    perlDreamer authored tsee committed
    With minor change from committer: Always assign $@ asap after an eval.
  16. @perlDreamer @tsee
  17. Merge branch for mostly regen/regcharclass.pl into blead

    Karl Williamson authored
    I started this work planning to enhance regen/regcharclass.pl to accept
    Unicode properties as input so that some small properties used in \X
    could be compiled in, instead of having to be read from disk.  In doing
    so, I saw some opportunities to move some EBCDIC dependencies down to a
    more basic level, thus replacing quite a few existing ones with just a
    couple at the lower levels.  This also led to my enhancing the macros
    output by regcharclass.pl to be at least as good (in terms of numbers of
    branches, etc) as the hand-coded ones it replaces.
    
    I also spotted a few bugs in existing code that hadn't been triggered
    yet.
  18. utf8.h: Use machine generated IS_UTF8_CHAR()

    Karl Williamson authored
    This takes the output of regen/regcharclass.pl for all the 1-4 byte
    UTF8-representations of Unicode code points, and replaces the current
    hand-rolled definition there.  It does this only for ASCII platforms,
    leaving EBCDIC to be machine generated when run on such a platform.
    
    I would rather have both versions regenerated each time they are
    needed, to save an EBCDIC dependency, but it takes more than 10 minutes
    on my computer to process the 2 billion code points that have to be
    checked for on ASCII platforms, and currently t/porting/regen.t runs
    this program every time; that slowdown would be unacceptable.  If
    this is ever run under EBCDIC, the macro should be machine computed
    (very slowly).  So, even though there is an EBCDIC dependency, it has
    essentially been solved.
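
For comparison, a hand-written check for 1-4 byte sequences could look like the sketch below. It is not the generated macro and not Perl's definition: it follows the well-formed byte sequence table of the Unicode Standard, so it rejects surrogates and stops at U+10FFFF, which may be stricter than what IS_UTF8_CHAR() accepts. The function names are made up for this example.

```c
#include <stddef.h>

/* True if byte b lies within [lo, hi]. */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/* Returns the length (1-4) of the well-formed UTF-8 character that starts
 * at s, or 0 if the bytes in [s, e) do not begin with one. */
size_t utf8_char_len(const unsigned char *s, const unsigned char *e)
{
    size_t avail = (size_t)(e - s);
    unsigned char b0, lo, hi;

    if (avail == 0)
        return 0;
    b0 = s[0];
    if (b0 <= 0x7F)
        return 1;
    if (in_range(b0, 0xC2, 0xDF))
        return (avail >= 2 && in_range(s[1], 0x80, 0xBF)) ? 2 : 0;
    if (in_range(b0, 0xE0, 0xEF)) {
        lo = (b0 == 0xE0) ? 0xA0 : 0x80;   /* no overlong forms */
        hi = (b0 == 0xED) ? 0x9F : 0xBF;   /* no surrogates */
        return (avail >= 3 && in_range(s[1], lo, hi)
                && in_range(s[2], 0x80, 0xBF)) ? 3 : 0;
    }
    if (in_range(b0, 0xF0, 0xF4)) {
        lo = (b0 == 0xF0) ? 0x90 : 0x80;   /* no overlong forms */
        hi = (b0 == 0xF4) ? 0x8F : 0xBF;   /* nothing above U+10FFFF */
        return (avail >= 4 && in_range(s[1], lo, hi)
                && in_range(s[2], 0x80, 0xBF)
                && in_range(s[3], 0x80, 0xBF)) ? 4 : 0;
    }
    return 0;
}
```
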
  19. regen/regcharclass.pl: Add ability to restrict platforms

    Karl Williamson authored
    This adds the capability to skip definitions if they are for other than
    a desired platform.
  20. utf8.h: Remove some EBCDIC dependencies

    Karl Williamson authored
    regen/regcharclass.pl has been enhanced in previous commits so that it
    generates as good code as these hand-defined macro definitions for
    various UTF-8 constructs.  And, it should be able to generate EBCDIC
    ones as well.  By using its definitions, we can remove the EBCDIC
    dependencies for them.  It is quite possible that the EBCDIC versions
    were wrong, since they have never been tested.  Even if
    regcharclass.pl has bugs under EBCDIC, it is easier to find and fix
    those in one place, than all the sundry definitions.
  21. regen/regcharclass.pl: Add optimization

    Karl Williamson authored
    On UTF-8 input known to be valid, continuation bytes must be in the
    range 0x80 .. 0xBF.  Therefore, any tests for being within those bounds
    will always be true, and may be omitted.
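
As an illustration (not actual regcharclass.pl output), take the block U+0400 to U+043F, which encodes as the lead byte 0xD0 followed by any continuation byte. On ASCII platforms UTF-8 continuation bytes are 0x80 to 0xBF, so on input already known to be valid the test on the second byte adds nothing. Both function names are hypothetical.

```c
/* Full test: usable on arbitrary bytes. */
int is_d0_block_checked(const unsigned char *s)
{
    return s[0] == 0xD0 && s[1] >= 0x80 && s[1] <= 0xBF;
}

/* Same class when the input is known to be valid UTF-8: the second byte
 * is necessarily a continuation byte, so its range test is dropped. */
int is_d0_block_safe(const unsigned char *s)
{
    return s[0] == 0xD0;
}
```
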
  22. regen/regcharclass.pl: White-space only

    Karl Williamson authored
    Indent a newly-formed block
  23. regen/regcharclass.pl: Extend previously added optimization

    Karl Williamson authored
    A previous commit added an optimization to save a branch in the
    generated code at the expense of an extra mask when the input class has
    certain characteristics.  This extends that to the case where
    sub-portions of the class have similar characteristics.  The first
    optimization for the entire class is moved to right before the new loop
    that checks each range in it.
  24. regen/regcharclass.pl: Rmv always true components from gen'd macro

    Karl Williamson authored
    This adds a test and returns 1 from a subroutine if the condition will
    always match; and in the caller it adds a check for that, and omits the
    condition from the generated macro.
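
The same idea, sketched in C rather than in the Perl of the real script: a helper reports when its condition would always match, and the caller leaves that condition out of the generated text. The function names and the exact output format are assumptions made for this example.

```c
#include <stdio.h>

/* Writes a C condition testing lo <= var <= hi (for a byte value) into buf.
 * Returns 1, writing only an empty string, when the condition would
 * always be true; returns 0 otherwise. */
int range_cond(const char *var, unsigned lo, unsigned hi,
               char *buf, size_t size)
{
    if (lo == 0 && hi >= 0xFF) {
        buf[0] = '\0';
        return 1;
    }
    if (lo == 0)
        snprintf(buf, size, "%s <= 0x%02X", var, hi);
    else if (hi >= 0xFF)
        snprintf(buf, size, "%s >= 0x%02X", var, lo);
    else
        snprintf(buf, size, "( %s >= 0x%02X && %s <= 0x%02X )",
                 var, lo, var, hi);
    return 0;
}

/* Caller: builds the test for a two-byte sequence and omits the
 * second-byte condition when range_cond() says it always matches. */
void two_byte_cond(unsigned b0, unsigned lo, unsigned hi,
                   char *out, size_t size)
{
    char second[64];

    if (range_cond("s[1]", lo, hi, second, sizeof second))
        snprintf(out, size, "s[0] == 0x%02X", b0);
    else
        snprintf(out, size, "s[0] == 0x%02X && %s", b0, second);
}
```
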
  25. regen/regcharclass.pl: Add an optimization

    Karl Williamson authored
    Branches can be eliminated from the macros that are generated here
    by using a mask in cases where applicable.  This adds checking to see if
    this optimization is possible, and applies it if so.
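
A small example of the kind of transformation meant here, assuming a class whose members differ in exactly one bit. The class and function names are chosen for illustration and are not taken from the generated macros.

```c
/* Branching form: two comparisons joined by ||. */
int is_a_branchy(unsigned char c)
{
    return c == 0x41 || c == 0x61;          /* 'A' or 'a' */
}

/* Masked form: 0x41 and 0x61 differ only in bit 0x20, so clearing that
 * bit and comparing once accepts exactly the same two values. */
int is_a_masked(unsigned char c)
{
    return (c & 0xDF) == 0x41;
}
```

The mask form trades a possible branch for one extra AND, and it is only valid when every combination of the masked-out bits belongs to the class.
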
  26. regen/regcharclass.pl: Rename a variable

    Karl Williamson authored
    I find it confusing that the array element name is the same as the full array.
  27. regen/regcharclass.pl: Pass options deeper into call stack

    Karl Williamson authored
    This is to prepare for future commits which will act differently at the
    deep level depending on some of the options.
  28. Use macro not swash for utf8 quotemeta

    Karl Williamson authored
    The rules for deciding whether an above-Latin1 code point should be
    quoted are now saved in a macro generated from a trie by
    regen/regcharclass.pl, and these are
    now used by pp.c to test these cases.  This allows removal of a wrapper
    subroutine, and also there is no need for dynamic loading at run-time
    into a swash.
    
    This macro is about as big as I'm comfortable compiling in, but it
    saves the building of a hash that can grow over time, and removes a
    subroutine and interpreter variables.  Indeed, performance benchmarks
    show that it is about the same speed as a hash, but it does not require
    having to load the rules in from disk the first time it is used.
  29. regen/regcharclass.pl: Add new output macro type

    Karl Williamson authored
    The new type 'high' is used on only above-Latin1 code points.  It is
    designed for code that already knows the tested code point is not
    Latin1, and avoids unnecessary tests.
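
To illustrate the difference, here is a sketch using the vertical-space code points (U+000A to U+000D, U+0085, U+2028, U+2029). The function names are hypothetical and the real generated macros look different; the second form is only correct when the caller guarantees the code point is above 0xFF.

```c
/* General form: must handle any code point. */
int is_vertspace_cp(unsigned long cp)
{
    return (cp >= 0x0A && cp <= 0x0D) || cp == 0x85
        || cp == 0x2028 || cp == 0x2029;
}

/* 'high'-style form: the caller already knows cp > 0xFF, so the tests
 * for the Latin1 members of the class are left out. */
int is_vertspace_high(unsigned long cp)
{
    return cp == 0x2028 || cp == 0x2029;
}
```
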
  30. regen/regcharclass.pl: Add documentation

    Karl Williamson authored
  31. regen/regcharclass.pl: Error check input better

    Karl Williamson authored
    This makes sure that the modifiers specified in the input are known to
    the program.
  32. regen/regcharclass.pl: Allow comments in input

    Karl Williamson authored
    Lines whose first non-blank character is a '#' are now considered to be
    comments, and ignored.  This allows the moving of some lines that have
    been commented out back to after the __DATA__ where they really belong.
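
The comment rule is simple enough to state as code. The sketch below is in C rather than the Perl of the actual script, treats only spaces and tabs as blanks, and uses a made-up function name.

```c
/* Returns 1 if the first non-blank character of line is '#',
 * meaning the line should be treated as a comment and ignored. */
int is_comment_line(const char *line)
{
    while (*line == ' ' || *line == '\t')
        line++;
    return *line == '#';
}
```

A '#' that appears after other text does not make the line a comment under this rule.
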
  33. regen/unicode_constants.pl: Add name parameter

    Karl Williamson authored
    A future commit will want to use the first surrogate code point's UTF-8
    value.  Add this to the generated macros, and give it a name, since
    there is no official one.  The program has to be modified to cope with
    this.
  34. Move 2 functions from utf8.c to regexec.c

    Karl Williamson authored
    One of these functions is currently commented out.  The other is called
    only in regexec.c in one place, and was recently revised to no longer
    require the static function in utf8.c that it formerly called.  They can
    be made static inline.
  35. regexec.c: Use new macros instead of swashes

    Karl Williamson authored
    A previous commit has caused macros to be generated that will match
    Unicode code points of interest to the \X algorithm.  This patch uses
    them.  This speeds up modern Korean processing by 15%.
    
    Together with recent previous commits, the throughput of modern Korean
    under \X has more than doubled, and is now comparable to other
    languages (which have increased themselves by 35%).