Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add token index #12

Merged
merged 2 commits into from
Apr 17, 2018
Merged

Add token index #12

merged 2 commits into from
Apr 17, 2018

Conversation

instabledesign
Copy link
Contributor

Hi, i recently work on new project Xpression

My need is to resetPosition at token index but the $token['position'] was the string position of this token in the input string.

My actual workaround was to keep the lexerI index in my Parser code and reset it each time i need it.

So i think if the token index was store in the token i can get it easily with $token['index']

Thank to read.

@instabledesign
Copy link
Contributor Author

Ping @beberlei @stof just to have answer to process or close it..

@stof
Copy link
Member

stof commented Nov 30, 2017

I'm not maintainer on this project

@instabledesign
Copy link
Contributor Author

ok sorry

@jwage
Copy link
Member

jwage commented Apr 11, 2018

Thanks for the change. I think this makes sense and it should be BC. Can you add a unit test?

@jwage
Copy link
Member

jwage commented Apr 16, 2018

Thanks for making this change and adding the tests! I will merge a little later and I am going to follow this PR up with another to add Travis CI.

@jwage jwage added this to the v1.0.2 milestone Apr 16, 2018
@guilhermeblanco guilhermeblanco merged commit 0eda1aa into doctrine:master Apr 17, 2018
@Majkl578
Copy link
Contributor

Unfortunately this change is a BC break and needs to be reverted. :/
It has completely broken egeloen/ivory-serializer: https://gist.github.com/Majkl578/8c1500fd14884091e65e6af3ddef5c84

Thanks @goetas for spotting this!

@jwage
Copy link
Member

jwage commented May 13, 2018

@Majkl578 I reverted this here #18

I kept the other changes from the PR and just reverted the changed functionality.

@jwage
Copy link
Member

jwage commented May 13, 2018

Looking at the code in https://github.com/egeloen/ivory-serializer/tree/master/src/Type and I don't immediately see what it was depending on that caused the break. I will look more later.

@instabledesign
Copy link
Contributor Author

instabledesign commented May 14, 2018

Hi, i think we get drop the

$this->tokens[$index] = array(

but we can keep

'index' => $index,

@jwage
Copy link
Member

jwage commented May 14, 2018

@instabledesign If you have time, can you look at https://github.com/egeloen/ivory-serializer and see why it broke after this change so that we can add tests to cover it?

@instabledesign
Copy link
Contributor Author

Yes.

@instabledesign
Copy link
Contributor Author

Investigation report:

Working solution :
change the egeloen/ivory-serializer for(...)

for ($i = 0; ($i < $count) || ($token === $nextToken); ++$i) {
for ($i = 0; ($i < $count) || ($token['value'] === $nextToken['value'] && $token['type'] === $nextToken['type'] && $token['position'] === $nextToken['position']); ++$i) {

I'll try to fix the AbstractLexer::$index in order to increment only when the match is a not a capture of previous one but without succeed, and i dont think is a good solution.

@instabledesign
Copy link
Contributor Author

I try to fix the 2 problem from above but theire is some logic to build the fixtures with some private method, so this is not easy to reproduce and fix correctly what is going on! I continue to work on it on my free time.

@instabledesign
Copy link
Contributor Author

tests fixed i ping him in order to merge

@instabledesign
Copy link
Contributor Author

egeloen/ivory-serializer look like not active anymore with only one release (from jan 2017)

@jwage did you plan to create new version with this modification?

@jwage
Copy link
Member

jwage commented Jun 13, 2018

Did we figure out a way to make the change in this repo so it doesn't break existing implementations? (even if their regex is "wrong")

@instabledesign
Copy link
Contributor Author

First i try with group naming but the group naming doesn't work with preg_split

preg_split('/(?<FOO>=|>|<)|(?<BAR>[a-z]+)|(?<BAZ>\d+)/i', 'price>5', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);
/*
array(3) {
  [0]=>
  array(2) {
    [0]=>
    string(5) "price"
    [1]=>
    int(0)
  }
  [1]=>
  array(2) {
    [0]=>
    string(1) ">"
    [1]=>
    int(5)
  }
  [2]=>
  array(2) {
    [0]=>
    string(1) "5"
    [1]=>
    int(6)
  }
}
*/

The second way is to deduplicate the matched element with offset

$matches = preg_split('/((=|>|<)|([a-z]+)|(\d+))/i', 'price>5', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);
$offset = null;
$matchesDeduplicate = array_filter($matches, function($item)use(&$offset){
    if (null === $offset) {
        $offset = $item[1];
        return true;
    }
    $filter = $offset !== $item[1];
    $offset = $item[1];
    
    return $filter;
});
/*
array(3) {
  [0]=>
  array(2) {
    [0]=>
    string(5) "price"
    [1]=>
    int(0)
  }
  [2]=>
  array(2) {
    [0]=>
    string(1) ">"
    [1]=>
    int(5)
  }
  [4]=>
  array(2) {
    [0]=>
    string(1) "5"
    [1]=>
    int(6)
  }
}
*/

With 300 tokens match 10 each (9000 tokens) we already have 1Mo memory more consuption
preg_split without deduplicate
preg_split with deduplicate

@instabledesign
Copy link
Contributor Author

what did you think about it @jwage

@jwage
Copy link
Member

jwage commented Jul 12, 2018

I don't think we can make this change without breaking BC or increasing memory usage as you noted.

@instabledesign
Copy link
Contributor Author

Hi can you consider apply this change on the v2 ?
it was originally revert because of breaking unmaintained libs.
@greg0ire

@greg0ire
Copy link
Member

If there is a breaking change, then it should go into v4 I'm afraid.

@instabledesign
Copy link
Contributor Author

Im not completely sure it was a BC because it only add a new value in the token details.
Like a explain earlier, the egeloen/ivory-serializer not use properly the lexer, so my change alter his behavior.
For me it was ok is this change go in V3 or at least in V4.

@greg0ire
Copy link
Member

If it's not a breaking change then you should target v3.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants