Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotations 2.0 #75

Open
wants to merge 28 commits into
base: master
from

Conversation

@FabioBatSilva
Copy link
Member

commented Mar 20, 2016

  • Move namespace to Doctrine\Annotations (Removing Common)
  • Uses hoa/compiler instead of doctrine/lexer ( see : grammar )
  • Drop AnnotationRegistry and all autoload magic
  • Drop Attribute/Attributes annotations
  • Drop SimpleAnnotationReader
  • Drop FileCacheReader
  • Drop IndexedReader
  • Requires php 7

TODO:

Local reviews (checkout + run locally):

@Ocramius Ocramius added this to the v2.0.0 milestone Mar 21, 2016

@schmittjoh

This comment has been minimized.

Copy link
Member

commented Mar 21, 2016

How does it affect userland? Any benefits/goals?

],
"require": {
"php": ">=5.3.2",
"doctrine/lexer": "1.*"
"php": ">=7.0.0",

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

please don't align constraints. It leads to a nightmare when adding new constraints (merge conflicts for nothing due to alignment changes)

This comment has been minimized.

Copy link
@localheinz

localheinz Mar 23, 2016

Contributor

@FabioBatSilva

Aligning constraints tells us you're manually editing composer.son, which should not be necessary when requiring dependencies.

},
"autoload": {
"psr-0": { "Doctrine\\Common\\Annotations\\": "lib/" }
"psr-4": {
"Doctrine\\Annotations\\": "src/"

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

couldn't we keep the same namespace to make the change less disruptive for userland (possibility to keep using the same API) ?

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

btw, the namespace change (and so the file renaming) also means that many changed files are actually considered as being removed and added in the new location, making it harder to see actual changes (and so harder to understand what are the changes in the tests for instance)

This comment has been minimized.

Copy link
@guilhermeblanco

guilhermeblanco Mar 21, 2016

Member

Majors are for BC breaks, no?

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

Indeed. But breaking BC just because you can makes it harder for anyone using your library, and means that they will take much more time to upgrade to the newer version (and ask you to continue providing support for the older version in parallel for much longer).

Rewriting all internals of the library to use different dependencies may be worth it (but we still wait for benefits/goals asked by @schmittjoh), and breaking BC to use PHP 7 return typehints is something I can agree with (it will make it impossible for people to support for 1.x and 2.x if they are extending Doctrine classes, but they should not need to use inheritance on them IMO anyway). But being friendlier to consumers of the library would be good.
Due to the PHP 7 requirement, I can guarantee you that Symfony cannot switch to the v2 only until Symfony 4 (as Symfony 3.x still supports PHP 5.x). This gives us 2 options:

  • supporting 1.x only
  • supporting 1.x and 2.x at the same time (letting people using PHP 7 upgrade to doctrine/annotations 2.x in their projects)

The second case is better (as the version supported by Symfony also impacts all projects build on top of Symfony, and their compatibility with other packages using doctrine/annotations), but only if we can easily use both. Currently the only changes in the Reader interface are its namespace and scalar typehints. adding scalar typehints does not hurt us (as long as we always call it properly by passing a string, the scalar typehint change does not impact the caller. The signature change only impacts people wanting to implement the interface). the namespace change however makes it a huge pain, as we don't have an interface anymore, but 2 different interfaces, and so we cannot typehint anymore when supporting both.
I don't see any benefit in the class renaming, and only drawbacks here (btw, this is why Twig still does not use namespaces for instance, and won't use them either in 2.x: the BC break would split the community using it without bringing any real benefit)

This comment has been minimized.

Copy link
@mickaelandrieu

mickaelandrieu Mar 21, 2016

Due to the PHP 7 requirement, I can guarantee you that Symfony cannot switch to the v2 only until Symfony 4

It's ok... I mean, the world don't turn around Symfony framework full edition, and any developper is able to customize his dependencies

This comment has been minimized.

Copy link
@schmittjoh

schmittjoh Mar 25, 2016

Member

@stof I guess one can introduce a compatibility layer so that you would not use the annotations library directly, but just the compatibility layer:

interface MyAnnotationReader { }
class Doctrinev1Adapter implements MyAnnotationReader { }
class Doctrinev2Adapter implements MyAnnotationReader { }

// just code against MyAnnotationReader in Symfony

I'm still missing why one would want to use version 2 as proposed here at all. Is it faster? Does it have more features? Why would I want to use this over v1?

This comment has been minimized.

Copy link
@FabioBatSilva

FabioBatSilva Mar 25, 2016

Author Member

@schmittjoh the main goal here is to drop our parser and also some code cleanup.
But those are mainly internal improvements with very little userland affect.

This comment has been minimized.

Copy link
@FabioBatSilva

FabioBatSilva Mar 25, 2016

Author Member

@stof we won't force anyone to migrate to 2.x,
And we are probably going continue to maintain 1.x for a while.

Supporting both 1.x and 2.x it's not a very complicated problem,
Having a adapter layer will do the job just fine.

People can pick and choose their dependencies,
and eventually most people are going to end up using PHP7 anyways.

If we are not willing to do some cleanup or upgrade to PHP7 we might as well drop 2.0 and continue with 1.0

This comment has been minimized.

Copy link
@stof

stof Nov 29, 2016

Member

@FabioBatSilva I'm not arguing at all with cleaning up the internals.
the only point I'm arguing with is the namespace change, which does not bring any benefit (except maybe some theoretical one), but has a huge impact on users.
If you don't do this remaining, updating to 2.0 will be a no-brainer for most users as changes are mostly internal.
The fact that a major version can break BC does not mean it must break BC anywhere. The less BC break you have, the easier the migration is (and the less time you will have to maintain 1.x in parallel with its entirely different codebase)

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 27, 2017

I don't know all the details of this, but I agree with @stof to focus on internal changes.

I always found the global AnnotationRegistry to be ugly. Instead, the parser should produce an AST first, where annotation names are not treated as classes. Then a separate step should resolve those names as objects.

I see that the PR does this. The first step is to let the hoa parser construct an AST. Then the DocVisitor (strange name, btw) turns it into an annotation object tree. And instead of the awkward class registry, ist just checks class_exists() || interface_exists(). Which is ok I guess.

This part looks reasonable, so this should be the focus - imo.

$file = new SplFileObject($filename);
while (!$file->eof()) {
$content = '';
$file = new SplFileObject($filename);

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

I suggest not aligning = signs. This creates useless git conflicts when adding new variables, and makes git blame much less useful by filling the history with meaningless changes on lines unrelated to the actual change of the commit

%token at @ -> annot
%token text [^@].*

%token annot:identifier [\\a-zA-Z_][\\a-zA-Z0-9_]* -> values

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

Should be \\?[a-zA-Z_][a-zA-Z0-9_](\\[a-zA-Z_][\\a-zA-Z0-9_])* actually. Any namespace separator must be followed by a non-digit char (and not by a second separator).
And btw, this regex is more restrictive that the regex of valid PHP class name. Is it on purpose to forbid unicode chars in class names of annotations ?

This comment has been minimized.

Copy link
@guilhermeblanco

guilhermeblanco Mar 21, 2016

Member

Your suggested change is wrong because you're forcing to have 2 chars before ns separator

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

sorry, a star is missing in my suggestion. should be \\?[a-zA-Z_][a-zA-Z0-9_]*(\\[a-zA-Z_][a-zA-Z0-9_])*

%token value:number \-?(0|[1-9]\d*)(\.\d+)?([eE][\+\-]?\d+)?

%token value:quote_ " -> string
%token string:string [^"]+

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

what about quote escaping for the case where the string needs to contain a quote ?

use Doctrine\Annotations\Reader;
use ReflectionClass;
require __DIR__ . '/Fixtures/Reader/TopLevelAnnotation.php';

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

why not relying on autoloading ?

This comment has been minimized.

Copy link
@FabioBatSilva

FabioBatSilva Mar 25, 2016

Author Member

TopLevelAnnotation it's not PSR compliant.

This comment has been minimized.

Copy link
@stof

stof Nov 29, 2016

Member

you can rely on composer classmap autoloading (or make it PSR compliant)

@stof

This comment has been minimized.

Copy link
Member

commented Mar 21, 2016

Drop Attribute/Attributes annotations

Why is it dropped ?

}
/**
* @return \Doctrine\Annotations\Metadata\MetadataFactory

This comment has been minimized.

Copy link
@yourwebmaker

yourwebmaker Mar 21, 2016

Sometimes you're using FQCN and sometimes you're using alias. It'd be better to use only one.

This comment has been minimized.

Copy link
@guilhermeblanco
{
public function testConfigurationAccessors()
{
$metadata = $this->getMock('Doctrine\Annotations\Metadata\MetadataFactory', [], [], '', false);

This comment has been minimized.

Copy link
@yourwebmaker

yourwebmaker Mar 21, 2016

Why not ::class for everything ?

This comment has been minimized.

Copy link
@guilhermeblanco

guilhermeblanco Mar 21, 2016

Member

Yeeeeesss!!!!! (Yoda style)

This comment has been minimized.

Copy link
@stof

stof Mar 21, 2016

Member

I suggest doing this change in the whole testsuite, but in a separate PR. This will allow to review the v2 rewriting of the library without having all these changes in the diff, making it easier to spot what actually changed in the testsuite due to the rewriting (allowing to spot BC breaks).

$ignoredNames = ['todo' => true];
$imports = [
'template' => 'Doctrine\AnnotationsTests\Fixtures\Annotation\Template',
'Route' => 'Doctrine\AnnotationsTests\Fixtures\Annotation\Route;'

This comment has been minimized.

Copy link
@yourwebmaker
* @author Fabio B. Silva <fabio.bat.silva@hotmail.com>
* @author Guilherme Blanco <guilhermeblanco@hotmail.com>
*/
class AnnotationReader implements Reader

This comment has been minimized.

Copy link
@lcobucci

lcobucci Mar 21, 2016

Member

Could this class also be final?

/**
* @return Doctrine\Annotations\Resolver
*/
public function getResolver() : Resolver

This comment has been minimized.

Copy link
@lcobucci

lcobucci Mar 21, 2016

Member

Is this getter really needed? I mean, it's a bit strange a Builder exposing a way to retrieve it's resolver.

If we take a look on DocVisitor we might be able to find where it's being used and it seems a LoD violation to me.

/**
* @return AnnotationException
*/
public static function optimizerPlusSaveComments()

This comment has been minimized.

Copy link
@lcobucci

lcobucci Mar 21, 2016

Member

We could add a return type hint for those methods 😄

*/
private function isAnnotation(ReflectionClass $class, array $annotations) : bool
{
if ($class->isSubclassOf('Doctrine\Annotations\Annotation')) {

This comment has been minimized.

Copy link
@lcobucci

lcobucci Mar 21, 2016

Member

Add ::class also in this class.

This comment has been minimized.

Copy link
@stof

stof Mar 22, 2016

Member

@lcobucci see my previous comment. Review will be much easier if the switch to ::class is done in a separate PR

@@ -27,7 +29,7 @@
* @author Fabien Potencier <fabien@symfony.com>
* @author Christian Kaps <christian.kaps@mohiva.com>
*/
final class PhpParser
class PhpParser

This comment has been minimized.

Copy link
@lcobucci

lcobucci Mar 21, 2016

Member

What's the intent of removing the final modifier here?

.travis.yml Outdated
- composer install -n
matrix:
allow_failures:
- php: hhvm

This comment has been minimized.

Copy link
@localheinz

localheinz Mar 21, 2016

Contributor

@FabioBatSilva

How about allowing builds to finish quickly?

💁 For reference see https://docs.travis-ci.com/user/customizing-the-build/#Fast-Finishing.

This comment has been minimized.

Copy link
@stof

stof Mar 22, 2016

Member

you should just remove HHVM entirely for now, as it cannot work: they provide a PHP 5.6 runtime, not a PHP 7 one currently, meaning they cannot install deps

.travis.yml Outdated
- hhvm

install:
- composer self-update

This comment has been minimized.

Copy link
@localheinz

localheinz Mar 21, 2016

Contributor

@FabioBatSilva

How about updating composer itself in the before_install section?

💁 For reference see https://docs.travis-ci.com/user/customizing-the-build/#The-Build-Lifecycle.

.travis.yml Outdated

install:
- composer self-update
- composer --prefer-source install

This comment has been minimized.

Copy link
@localheinz

localheinz Mar 21, 2016

Contributor

@FabioBatSilva

If you actually want to cache dependencies installed with composer between builds, they need to be installed with --prefer-dist.

(comments() | annotations())*
#annotations:
annotation() ( annotation() )*

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

Can be simplified as annotation()+.

::at:: identifier() ( parameters() | comments() )?
#comments:
text() ( text() )*

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

Can be simplified as text()+.

text() ( text() )*
#values:
value() ( ::comma:: value() )* (::comma::)?

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

No need for a group if you have a single element in it. Thus:

- (::comma::)?
+ ::comma::?

would still be valid.

::brace_:: ( (value() ( ::comma:: value() )*) (::comma::)? )? ::_brace::
#pairs:
pair() ( ::comma:: pair() )*

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

An empty list of pairs is not possible then. Is it what the expected behavior? If no, pairs should be called with ?.

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

I assume the trailing comma can be present with all pairs, so we can add ::comma::? here and remove it in map.

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

Also, some languages adopt ::comma::* (several trailing commas). I don't like this personnally, but I would like to mention it.

::brace_:: pairs() (::comma::)? ::_brace::
#list:
::brace_:: ( (value() ( ::comma:: value() )*) (::comma::)? )? ::_brace::

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

An empty list is possible, but not an empty list of pairs.

value() ( ::comma:: value() )* (::comma::)?
#map:
::brace_:: pairs() (::comma::)? ::_brace::

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

Can be simplified in:

- (::comma::)?
+ ::comma::?
<boolean> | <null> | string() | map() | list() | number() | pair() | annotation() | constant()
parameters:
( ::parenthesis_:: ( values() )? ::_parenthesis:: ) | ( string() )?

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

Empty list of parameters is still valid, so I reckon the expected behavior for list of pairs is to accept empty ones.

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

Again:

- ( values() )?
+ values()?

and

- ( string() )?
+ string()?

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

Also the parenthesis around ::parenthesis_:: … ::_parenthesis:: are not useful, but it helps the reading so it's fine to keep them.

foreach ($element->getChildren() as $child) {
$result = $child->accept($this, $handle, $eldnah);
$values = $values + $result;
// array_merge won`t preserve numeric keys

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

- won`t
+ won't
use Doctrine\Annotations\Builder;
use Doctrine\Annotations\Exception\ParserException;
use Hoa\Compiler\Llk\Llk;

This comment has been minimized.

Copy link
@Hywan

Hywan Mar 22, 2016

I might be wrong but these two declarations seem to not be used.

use ReflectionClass;
use ReflectionMethod;
use ReflectionProperty;

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 26, 2017

Why import those root namespace classes? I always prefer to read \ReflectionClass. If I see ReflectionClass, it makes me think it is some kind of custom reflection layer.

@@ -34,7 +41,7 @@
*
* @return array An array of Annotations.
*/
function getClassAnnotations(\ReflectionClass $class);
public function getClassAnnotations(ReflectionClass $class) : array;

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 26, 2017

I agree with adding public access modifier. But would it not be better to split this into a separate PR, which gets applied before this one, to every branch?

@@ -82,7 +82,7 @@ public function getClassAnnotations(ReflectionClass $class) : array
/**
* {@inheritDoc}
*/
public function getClassAnnotation(ReflectionClass $class, $annotationName)
public function getClassAnnotation(ReflectionClass $class, string $annotationName)

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 26, 2017

I would suggest to split decorative CS changes and BC-breaking signature changes into separate distinct commits.
Adding a type hint changes the signature, which breaks any implementing classes.

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 26, 2017

And then, the decorative code style changes can go into a dedicated PR to the 1.x branch.

* Constructor.
*
* @param \Doctrine\Annotations\Resolver $resolver
* @param \Doctrine\Annotations\MetadataFactory $metadataFactory

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 27, 2017

The type is incorrect, it should be \Doctrine\Annotations\Metadata\MetadataFactory.

$class = new \ReflectionClass($className);
$constructor = $class->getConstructor();
$docComment = $class->getDocComment();

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 27, 2017

Unused local variable $docComment.

@donquixote

This comment has been minimized.

Copy link

commented Aug 27, 2017

Did anyone run a benchmark to compare the hoa compiler to the one currently in use?
Maybe it is all "fast enough" so we don't have to worry. Just asking.

@@ -107,11 +107,11 @@ public function getMetadataFor(string $className)
*/
private function isAnnotation(ReflectionClass $class, array $annotations) : bool
{
if ($class->isSubclassOf('Doctrine\Annotations\Annotation')) {
if ($class->isSubclassOf(Annotation::CLASS)) {

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 28, 2017

::CLASS should be lowercase ::class.

@donquixote

This comment has been minimized.

Copy link

commented Aug 28, 2017

Architecture: I propose to drop Metadata and MetadataFactory.
Instead, have one factory object per class (or give it a different name, dunno (*)).

class Builder
{
    [..]

    /**
     * @param Context   $context
     * @param Reference $reference
     *
     * @return object
     */
    public function create(Context $context, Reference $reference)
    {
        $target    = $reference->nested ? Target::TARGET_ANNOTATION : $context->getTarget();
        $fullClass = $this->resolver->resolve($context, $reference->name);
        $values    = $reference->values;

        if (null === $factory = $this->factoryProvider->classGetFactory($fullClass)) {
            throw InvalidAnnotationException::notAnnotationException($fullClass, $reference->name, $context->getDescription());
        }

        return $factory->instantiate($context, $values, $target);
    }
}

Now all the metadata stuff can be encapsulated in the $factory object.
There can even be separate factory classes depending how the class should be constructed, and how the annotation values are transformed into arguments.
There can also be a ClassNotFoundFactory, which always throws an exception in the ->instantiate() method.
The interface for those factories really only has this one method.

(*) I was initially going to call the "factory" "instantiator". But this name already exists in Doctrine\Instantiator\Instantiator. So...

}
return $this->imports = array_merge($classImports, $traitImports);
}

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 28, 2017

I think this is confusing. If the method is coming from a trait, then the annotation is living in the trait file, and the imports should be from the trait's file only. If the method is declared in the class itself, then the annotation is living in the class file, and it should use the imports from the class file. I don't see a case where it should combine imports from different files.

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 28, 2017

Here is how to find out if a method is defined in a trait, https://stackoverflow.com/a/45912866/246724.

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 28, 2017

Btw, imo all of this reflection + inheritance adds unnecessary complexity.

This comment has been minimized.

Copy link
@stof

stof Aug 28, 2017

Member

Well, forbidding people to use inheritance and traits when using annotations would mean that nobody would migrate to version 2

This comment has been minimized.

Copy link
@Ocramius

Ocramius Aug 28, 2017

Member

From a functional perspective, it should stay the exact same, maybe with some API changes, but no behavioral ones.

The point here is getting rid of our own hacky parser, using a formalised one (HOA's)

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 28, 2017

Oh, maybe I was unclear.

all of this reflection + inheritance adds unnecessary complexity.

What I mean is we don't need to inherit from \ReflectionClass to find the imports. Instead, have an ImportFinder or something like that.

Of course, people who write annotated classes should be allowed to use inheritance, and traits!

From a functional perspective, it should stay the exact same, maybe with some API changes, but no behavioral ones.

If the code in the PR is replicating existing behavior, then it needs to stay this way.
Or, if we agree that the old behavior is wrong, we could have two implementations of ImportFinder or of the class name resolver: One that operates the BC way, another that operates the "correct" way.

Why would we say that "the old behavior is wrong"?

Consider this example:

File T.php:

<?php
namespace Acme\Foo;
use Acme\Annotation\Hello;
trait T {
  /**
   * @Hello("I am an annotation on a trait method.")
   * @Goodbye("I am annotation on a trait method, but the import is in the class file.")
   */
  function foo() {}
}

File C.php:

<?php
namespace Acme\Bar;
use Acme\Annotation\Goodbye;
class C {
  use T;
}

With the behavior proposed in the PR, which I assume is also the current behavior, the second annotation @Goodbye(..) will use the import Acme\Annotation\Goodbye from the class file.

I am saying this is wrong. It should only use the imports from the trait file. So the @Hello(..) should work, but the @Goodbye(..) should not.

This would be consistent with how the language itself works.
Imports are only available within the same file.

Well personally I think having annotations on a method in a trait is probably a bad idea anyway. but if we support it, it should at least be "correct". Unless, of course, it is for BC reasons.

The point here is getting rid of our own hacky parser, using a formalised one (HOA's)

Which I assume will be more maintainable, more reliable, more understandable (people have to look at the grammar only). So yeah, seems like a good idea.

Personally I care more about the registry going away.

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 28, 2017

About the methods defined in traits: It gets even more interesting if the method is renamed.

Trait T {
  /**
   * @Hello()
   */
  function foo() {}
}

class C {
  use T {
    foo as bar;
  }
}

$m = new \ReflectionMethod('C', 'bar');
$reader->getMethodAnnotations($m);

The current behavior will not understand that the method is defined in a trait under a different name.

This comment has been minimized.

Copy link
@donquixote

donquixote Aug 28, 2017

And about properties in traits - this is more difficult. There is no \ReflectionProperty::getFileName().
https://stackoverflow.com/questions/18257158/how-to-extract-start-line-of-a-property-declaration-in-php

As a heuristic, we could say that:

  • If none of the traits of the class has a property with the same name, then the property belongs to the class, obviously.
  • If one or more of the traits define the property then we compare the doc comment. See https://3v4l.org/KY9nl.

Maybe all of this should be discussed in a separate issue. I only brought it up here because the PR affects the code where this behavior is implemented.

This comment has been minimized.

Copy link
@Ocramius

Ocramius Aug 28, 2017

Member

@donquixote we kinda fixed all of these horrors in roave/better-reflection, although it is not the primary aim here to provide very precise reflection of ugly stuff like traits.

@Hywan

This comment has been minimized.

Copy link

commented Aug 28, 2017

Just a link about hoa/compiler performances, https://blog.hoa-project.net/2016/08-Performance-boost-for-Hoa-Compiler.html. cc @donquixote

@donquixote

This comment has been minimized.

Copy link

commented Aug 28, 2017

Thanks @Hywan !
This does not compare it with a hand-written parser, only with previous hoa parsers.

In general, a hand-written or a generated parser should be faster than a parser combinator, or one that needs to interpret a piece of grammar in every step. Since it adds overhead and indirection to every micro-operation, the difference could be something like factor 2 (or more, or less). I remember this, because I used to experiment with the vektah parser combinator.

Your linked article, "Exporting the parser into PHP code", claims that the parser can be exported to PHP code. If this is equivalent to a generator, then it should be similarly fast as a hand-written parser.

Of course, even if it would take 2x longer, does not mean we have to care, if overall it is still "fast enough".

@stof

This comment has been minimized.

Copy link
Member

commented Aug 28, 2017

I'm still against the namespace change if it does not have a migration path:

  • if packages are not able to easily support both v1 and v2 of the library, it means that a project cannot migrate to v2 until all its dependencies using doctrine/annotations are migrated, and it cannot update other dependencies already migrated until that time. Given the number of packages relying on doctrine/annotations out there to parse annotations, such community split is a bad news
  • just migrating to v2 is not an option for packages needing to keep support for PHP 5.x (meaning that if they cannot support both versions, they will stay on v1 forever).

So this means we need a continuous migration path if you want to keep the namespace change.
This could be done by doing a new 1.x release adding class aliases using the new namespace. See how Twig did it for instance.

@Hywan

This comment has been minimized.

Copy link

commented Aug 29, 2017

This does not compare it with a hand-written parser, only with previous hoa parsers.

Yes it doesn't, but it gives an overview of the last big improvements :-). I should have clarified this, sorry.

In general, a hand-written or a generated parser should be faster than a parser combinator, or one that needs to interpret a piece of grammar in every step. Since it adds overhead and indirection to every micro-operation, the difference could be something like factor 2 (or more, or less). I remember this, because I used to experiment with the vektah parser combinator.

True and false (well, you have started your sentence with “In general” 😉). In PHP, a parser combinator might be slower than a generated parser because of function calls and indirections, but it will not be the bottleneck I guess. The real bottleneck is the data copy. In a parser combinator, you have to copy the data being parsed into each parser. Even if PHP does a COW (Copy-On-Write), each data split (substr) will generate a new copy, and so it's going to be slow. This particular problem can also be present in a generated parser, but the API surface is much smaller. For instance, the lexer for hoa/compiler does not consume the data by doing a substring, it just read it with an offset: https://github.com/hoaproject/Compiler/blob/c86ccfbce9b9cad17cf84ffdf5c505c695d83d7a/Llk/Lexer.php#L276-L282 This is highly optimized.

In a parser combinator however, the lexer and the parser phases are “merged”, so the memory peak should be smaller than in a generated parser. However, regarding the last improvement in hoa/compiler, the lexer now works as a buffered iterator, so the behavior is similar to a parser combinator: https://github.com/hoaproject/Compiler/blob/c86ccfbce9b9cad17cf84ffdf5c505c695d83d7a/Llk/Parser.php#L162-L165

A parser combinator is like a hand-written parser, except it has a predefined formalism, is more testable, is more re-usable etc. Compared to a generated parser, the API is larger. However, a generated parser can be seen as a parser combinator with a small API. hoa/compiler has only one method: _parse, which adapts its behavior whether it meets a token, a concatenation, a choice, or a repetition, which are the rules (the grammar description language intrinsics/constructions). One method also means a better caching by the VM, and the CPU.

This thread is not the place to debate about this, but: A generated parser, a parser combinator, or a hand-written parser can all be fast and efficient, or slow and ineffective. It really depends of how they are implemented. They all have pros and cons. I personally prefer a parser combinator when working with Rust (see nom) because it is testable and brings interesting garantees, while when working with PHP, I prefer a generated parser.

Your linked article, "Exporting the parser into PHP code", claims that the parser can be exported to PHP code. If this is equivalent to a generator, then it should be similarly fast as a hand-written parser.

I would claim that a hand-written parser is most of the time not fast. You have to re-optimise and re-implement everything, like the lexer (a good one is not simple) and the parser with all the optimisation. And the error-management, the AST builder, the memory management, the profiling etc. It's better to have a hackable compiler toolchain I guess.

But indeed, once a Hoa\Compiler\Llk\Parser is compiled into PHP code, it just creates an instance of Hoa\Compiler\Llk\Parser directly without loading the grammar from a textual file. It builds the grammar as a set of rules, which is fast to instanciate, and cachable by the VM. The lexing and parsing in themselves are optimized and use a very small API, which is also cachable correctly by the VM.

The most obvious way to be faster now is to use really good data structures instead of generic array, but we are limited by the language (I want php-ds in the core, pleaaase).

Of course, even if it would take 2x longer, does not mean we have to care, if overall it is still "fast enough".

Correct. I don't want to speak for the Doctrine team, but my understanding of the problem is the following: Drop a hand-written, hard to maintain, hacky, and maybe buggy parser by a formal parser which is easy to maintain and fast enough. hoa/compiler plays this role. Also, hoa/compiler brings interesting algorithms to generate data from a grammar (it is called Grammar-based Testing). More resources about this:

These algorithms can help to test the Doctrine annotations, and DQL.

@donquixote

This comment has been minimized.

Copy link

commented Aug 29, 2017

I would claim that a hand-written parser is most of the time not fast. You have to re-optimise and re-implement everything, like the lexer (a good one is not simple) and the parser with all the optimisation. And the error-management, the AST builder, the memory management, the profiling etc. It's better to have a hackable compiler toolchain I guess.

Maybe I should have said "hardcoded" rather than "hand-written".
And instead of "should be faster", I should have said we are comparing the fastest theoretically possible parsers of each category. If you make all the right choices, what remains is the overhead and indirection. E.g. for this same reason, a parser in C would be faster than one in PHP.
You don't need a lexer or memory management, if all you do is string index lookups, like here:https://github.com/donquixote/annotation-parser/blob/1.0/src/Parser/AnnotationParser.php. Also an AST doesn't have to be complicated.

This is not an argument against the hoa parser, just a conversation.

@Hywan

This comment has been minimized.

Copy link

commented Aug 30, 2017

We agree 😃.

#constant:
<identifier> (<colon> <colon> <identifier>)?

string:

This comment has been minimized.

Copy link
@Hywan

Hywan Sep 14, 2017

If a rule recognizes only one token, then it's faster to just use this token. I assume being fast is important in this context.

Same for the text, number, and identifier rules.

@Majkl578 Majkl578 referenced this pull request Feb 3, 2018

Open

Enhancement: Use doctrine/coding-standard #177

1 of 2 tasks complete
Merge pull request #1 from lennerd/2.0
Annotations 2.0: Read annotations from functions

@Majkl578 Majkl578 changed the base branch from 2.0 to master May 7, 2018

@Majkl578 Majkl578 referenced this pull request Dec 18, 2018

Closed

[meta] Annotations 2.0 (ng) #232

8 of 27 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.