This library allows you to collect objects and values through associations and provides entity fetching optimizations for Doctrine ORM that address the N+1 queries problem.
It can play especially nicely with the Deferred implementation from webonyx/graphql-php, allowing you to significantly reduce the number of database queries.
This bundle is under the MIT license. See the complete license in the LICENSE file.
Include this bundle in your project using Composer as follows (assuming it is installed globally):
$ composer require xsolve-pl/associate
For more information on Composer see its Introduction.
To get the basic collector you may use the facade provided with the library:
<?php
$facade = new \Xsolve\Associate\Facade();
$basicCollector = $facade->getBasicCollector();
If you want to use the collector dedicated to Doctrine ORM, provide the appropriate entity manager when instantiating the facade and retrieve the dedicated collector:
<?php
$facade = new \Xsolve\Associate\Facade($entityManager);
$doctrineOrmCollector = $facade->getDoctrineOrmCollector();
You can also compose your own collectors using building blocks provided by this library. It's also possible to replace the provided facade with configuration for the DI container your framework uses.
That's all - now you're ready to go!
The first functionality provided by this library is retrieving all objects that can be reached via specified associations, starting from some base objects.
Let's assume we have the following classes defined:
<?php
class Car
{
    /**
     * @var Engine|null
     */
    protected $engine;

    /**
     * @param Engine|null $engine
     */
    public function __construct(?Engine $engine = null)
    {
        $this->engine = $engine;
    }

    /**
     * @return Engine|null
     */
    public function getEngine()
    {
        return $this->engine;
    }
}

class Engine
{
    /**
     * @var Part[]
     */
    public $parts;

    /**
     * @param Part[] $parts
     */
    public function __construct(array $parts)
    {
        $this->parts = $parts;
    }
}

class Part
{
    /**
     * @var string
     */
    protected $name;

    /**
     * @param string $name
     */
    public function __construct(string $name)
    {
        $this->name = $name;
    }

    /**
     * @return string
     */
    public function getName(): string
    {
        return $this->name;
    }

    /**
     * @return string[]
     */
    public function getNameAsWords(): array
    {
        return explode(' ', $this->name);
    }

    /**
     * @return array
     */
    public function getNameStats(): array
    {
        return ['wordCount' => count($this->getNameAsWords())];
    }
}
Now let's assume we have some instances of the Car class in the $cars array, as well as some associated objects:
<?php
$cars = [
    $sportCar = new Car(
        $fastEngine = new Engine([
            $valve = new Part('valve'),
            $cylinder = new Part('cylinder'),
        ])
    ),
    $sedan = new Car(
        $turboEngine = new Engine([
            $valve,
            $sparkPlug = new Part('nano spark plug'),
            $smartCylinder = new Part('smart cylinder'),
        ])
    ),
    $suv = new Car(),
];
Now we'd like to collect all Engine instances that $cars are associated with. It's as easy as this:
<?php
$engines = $basicCollector->collect($cars, ['engine']);
// $engines ~= [$fastEngine, $turboEngine]; - order is not guaranteed.
Important! Note that the order of $engines is not guaranteed. This is because \SplObjectStorage is used internally to ensure the uniqueness of collected objects.
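The role of \SplObjectStorage can be illustrated with a minimal, self-contained sketch in plain PHP. This is not the library's code; uniqueObjects is a hypothetical helper that keys objects by identity, so an object passed twice (like $valve, shared by two engines) is kept only once.

```php
<?php

// Hypothetical helper: deduplicates objects by identity using \SplObjectStorage,
// similar in spirit to how collected objects are kept unique.
function uniqueObjects(array $objects): array
{
    $storage = new \SplObjectStorage();
    foreach ($objects as $object) {
        // Attaching an object that is already stored does not add a second entry.
        $storage->attach($object);
    }

    $result = [];
    foreach ($storage as $object) {
        $result[] = $object;
    }

    return $result;
}

$shared = new \stdClass();
$other = new \stdClass();
$unique = uniqueObjects([$shared, $other, $shared]); // two distinct objects remain
```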
We can go further and collect objects that are two associations away from $cars by doing:
<?php
$parts = $basicCollector->collect($cars, ['engine', 'parts']);
// $parts ~= [$valve, $cylinder, $sparkPlug, $smartCylinder]; - order is not guaranteed.
Note that $valve will be included only once, as the collector detects that the same object is associated with both $fastEngine and $turboEngine.
It is also possible to collect scalar values, but in this case uniqueness will not be imposed on them:
<?php
$names = $basicCollector->collect($cars, ['engine', 'parts', 'name']);
// $names ~= ['valve', 'cylinder', 'nano spark plug', 'smart cylinder']; - order is not guaranteed.
If a given association yields an array with sequential numeric indices starting with 0, it is automatically assumed to be a collection of objects or scalars (i.e. the association links the given object to many objects or scalars). Therefore it's possible to write:
<?php
$words = $basicCollector->collect($cars, ['engine', 'parts', 'nameAsWords']);
// $words ~= [
// 'valve', 'cylinder', 'nano', 'spark', 'plug', 'smart', 'cylinder'
// ]; - order is not guaranteed.
This time cylinder is present twice, as it is a scalar value and uniqueness is not imposed.
However, if an array is associative, we can go even deeper by reading its values by key with the [key] syntax:
<?php
$wordCounts = $basicCollector->collect($cars, ['engine', 'parts', 'nameStats', '[wordCount]']);
// $wordCounts ~= [1, 1, 3, 2]; - order is not guaranteed.
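The list-versus-associative rule described above can be sketched as follows. This assumes that "sequential numeric indices starting with 0" means keys that are exactly 0, 1, ..., n-1 in order; isSequentialList and flattenValues are hypothetical helpers, not part of the library. On PHP 8.1+ the built-in array_is_list() performs the same check.

```php
<?php

// Hypothetical helper: true if the keys are exactly 0, 1, ..., n-1 in order.
function isSequentialList(array $value): bool
{
    return $value === [] || array_keys($value) === range(0, count($value) - 1);
}

// Hypothetical helper: a list contributes each of its elements separately,
// anything else (including an associative array) is kept as a single value.
function flattenValues(array $values): array
{
    $result = [];
    foreach ($values as $value) {
        if (is_array($value) && isSequentialList($value)) {
            foreach ($value as $item) {
                $result[] = $item; // scalars are not deduplicated
            }
        } else {
            $result[] = $value;
        }
    }

    return $result;
}

$words = flattenValues([['valve'], ['smart', 'cylinder'], ['cylinder']]);
$stats = flattenValues([['wordCount' => 2]]); // kept whole, ready for '[wordCount]'
```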
Internally, symfony/property-access is used to follow associations, so they may be accessible in different ways, for instance as a public property or via a getter method. Please consult its documentation for possible options.
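To illustrate how a path like ['engine', 'parts', 'nameStats', '[wordCount]'] could be followed step by step, here is a self-contained sketch in plain PHP. It is not the library's implementation: readSegment is a hypothetical, heavily simplified stand-in for symfony/property-access (it handles only the [key] syntax, get* getters and public properties), and collectPath is a hypothetical helper mirroring the behavior described above: nulls skipped, lists flattened, objects deduplicated, scalars kept.

```php
<?php

// Hypothetical, simplified stand-in for a property accessor:
// resolves one path segment via "[key]" index, getter, or public property.
function readSegment($subject, string $segment)
{
    if (preg_match('/^\[(.+)\]$/', $segment, $matches) === 1) {
        return is_array($subject) ? ($subject[$matches[1]] ?? null) : null;
    }
    if (!is_object($subject)) {
        return null;
    }
    $getter = 'get' . ucfirst($segment);
    if (method_exists($subject, $getter)) {
        return $subject->$getter();
    }

    // Missing or inaccessible properties yield null instead of an error.
    return $subject->$segment ?? null;
}

// Hypothetical sketch of collecting along a path: lists are flattened,
// nulls skipped, objects deduplicated by identity, scalars kept as-is.
function collectPath(array $bases, array $path): array
{
    $current = $bases;
    foreach ($path as $segment) {
        $next = [];
        $seen = new \SplObjectStorage();
        foreach ($current as $item) {
            $value = readSegment($item, $segment);
            $isList = is_array($value)
                && ($value === [] || array_keys($value) === range(0, count($value) - 1));
            foreach ($isList ? $value : [$value] as $candidate) {
                if ($candidate === null) {
                    continue;
                }
                if (is_object($candidate)) {
                    if ($seen->contains($candidate)) {
                        continue;
                    }
                    $seen->attach($candidate);
                }
                $next[] = $candidate;
            }
        }
        $current = $next;
    }

    return $current;
}

$valve = (object) ['name' => 'valve', 'stats' => ['wordCount' => 1]];
$plug = (object) ['name' => 'nano spark plug', 'stats' => ['wordCount' => 3]];
$cars = [
    (object) ['engine' => (object) ['parts' => [$valve]]],
    (object) ['engine' => (object) ['parts' => [$valve, $plug]]],
    (object) ['engine' => null],
];

$parts = collectPath($cars, ['engine', 'parts']);                          // [$valve, $plug]
$counts = collectPath($cars, ['engine', 'parts', 'stats', '[wordCount]']); // [1, 3]
```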
Let's assume that we're building an e-commerce website using doctrine/orm for persistence. One of the things we can run into is the N+1 queries problem, which occurs when we fetch some entities from the database and then traverse their associations via getters.
For example, we can have some products. Each of them has some variants, which in turn have a property storing the available inventory quantity. Now we would like to find out which products are available for sale, and we already have Product instances loaded from the database (e.g. after taking into account some filters that the user applied).
We could use code like this:
<?php
$availableProducts = array_filter(
    $products,
    function (Product $product) {
        foreach ($product->getVariants() as $variant) {
            if ($variant->getInventoryQuantity() > 0) {
                return true;
            }
        }

        return false;
    }
);
While this will work perfectly fine, it will incur one SELECT query each time we call the getVariants method on a given Product instance for the first time. Hence, if we want to check availability for 100 products, we end up with 101 database queries executed: one that loaded the products and one per product for its variants.
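The arithmetic behind 101 can be made concrete with a self-contained simulation that counts queries instead of running them. QueryLog, loadVariantsLazily and loadVariantsInBatch are hypothetical names; no database or Doctrine API is involved, and the SQL in the comments only indicates what each counted query would roughly correspond to.

```php
<?php

// Hypothetical query counter standing in for a database connection.
final class QueryLog
{
    public int $count = 0;
}

// Lazy loading: one query per product on first access (the N+1 pattern).
function loadVariantsLazily(array $productIds, QueryLog $log): array
{
    $variants = [];
    foreach ($productIds as $id) {
        $log->count++; // roughly: SELECT ... WHERE product_id = ?
        $variants[$id] = ["variant-of-$id"];
    }

    return $variants;
}

// Batched loading: one query covering all products at once.
function loadVariantsInBatch(array $productIds, QueryLog $log): array
{
    $log->count++; // roughly: SELECT ... WHERE product_id IN (?, ?, ...)
    $variants = [];
    foreach ($productIds as $id) {
        $variants[$id] = ["variant-of-$id"];
    }

    return $variants;
}

$productIds = range(1, 100);

$lazy = new QueryLog();
$lazy->count++; // the initial query that loaded the products
loadVariantsLazily($productIds, $lazy);

$batched = new QueryLog();
$batched->count++; // the same initial query
loadVariantsInBatch($productIds, $batched);
```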
You can find out more about this problem in 5 Doctrine ORM Performance Traps You Should Avoid by Benjamin Eberlei; see the section titled Lazy-Loading and N+1 Queries. It points out four ways to address this problem.
Eager loading (solution 3) can be the simplest way to go in some cases, but often it is too rigid: we may want a specific association to be loaded only in some cases, not always.
Other solutions are more flexible, like using a dedicated DQL query (solution 1) or triggering eager loading of entities after collecting their identifiers (solution 2).
These solutions, however, result in clunky code, and they have to be adjusted depending on whether a given association is of the -to-one or -to-many type and whether the entities that are already initialized are on the inverse or the owning side of the association. Also, some minor optimizations can be applied: \Doctrine\Common\Persistence\Proxy instances or \Doctrine\ORM\PersistentCollection instances that are already initialized can be skipped.
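The "skip what is already initialized" optimization can be sketched in plain PHP. LazyVariants and idsToLoad are hypothetical names; the isInitialized() method only mimics the check that Doctrine's PersistentCollection provides, and no Doctrine classes are used. The idea is that only the remaining owner IDs need to go into a single IN (...) query.

```php
<?php

// Hypothetical stand-in for a lazily loaded collection of variants.
final class LazyVariants
{
    public function __construct(public int $ownerId, public bool $initialized)
    {
    }

    public function isInitialized(): bool
    {
        return $this->initialized;
    }
}

// Returns the owner IDs that still need loading, skipping initialized
// collections, so that one batch query can fetch the rest.
function idsToLoad(array $collections): array
{
    $ids = [];
    foreach ($collections as $collection) {
        if (!$collection->isInitialized()) {
            $ids[] = $collection->ownerId;
        }
    }

    return $ids;
}

$ids = idsToLoad([
    new LazyVariants(1, false),
    new LazyVariants(2, true),
    new LazyVariants(3, false),
]);
```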
This library tries to do exactly what is proposed in solutions 1 and 2, but in a clean and encapsulated manner. Thanks to it, loading associated entities is simple and easy to apply. In the example above, it would only be necessary to precede the previously given code with:
<?php
$facade = new \Xsolve\Associate\Facade($entityManager);
$doctrineOrmCollector = $facade->getDoctrineOrmCollector();
$doctrineOrmCollector->collect($products, ['variants']);
After executing this snippet, all variants for the given products will be loaded with a single SELECT query, and calling getVariants will not result in any additional queries.
If the number of products or associated entities is high, they will be split into chunks, and associations for each chunk will be loaded separately. The chunk size is set to 1000 by default, but you are free to alter it or set it to null to disable chunking.
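How chunking affects the number of queries can be sketched with array_chunk(). chunkIds is a hypothetical helper, not part of the library's API; it assumes one query per chunk and shows what the default chunk size of 1000 and the null setting would mean for 2500 IDs.

```php
<?php

// Hypothetical helper: splits IDs into chunks, each of which would be loaded
// with one query. A null chunk size disables chunking (one single chunk).
function chunkIds(array $ids, ?int $chunkSize = 1000): array
{
    if ($ids === []) {
        return [];
    }
    if ($chunkSize === null) {
        return [$ids];
    }

    return array_chunk($ids, $chunkSize);
}

$ids = range(1, 2500);
$chunks = chunkIds($ids);       // 3 chunks: 1000, 1000 and 500 IDs
$single = chunkIds($ids, null); // 1 chunk with all 2500 IDs
```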
Property values can be collected this way as well. If each variant has a property containing its price and we would like to collect the prices of all variants of all given products, we could execute the following code:
<?php
$facade = new \Xsolve\Associate\Facade($entityManager);
$doctrineOrmCollector = $facade->getDoctrineOrmCollector();
$prices = $doctrineOrmCollector->collect($products, ['variants', 'price']);
It's as simple as that!
Important! You won't be able to reduce the number of queries for one-to-one associations starting from the inverse side: Doctrine ORM loads them by default, issuing a separate SELECT for each entity. You may consider changing such an association to one-to-many (and using the collector afterwards) or using an embeddable if possible (in which case embedded objects will be loaded with the same query that loads the entities containing them).
If you're working on a project using Doctrine ORM and providing a GraphQL API, this library can play nicely with the Deferred class provided by webonyx/graphql-php. You can read more about the general idea behind this approach in the Solving N+1 Problem section of its documentation.
Let's assume we need to implement a resolve function that will return Variant instances for a Product instance. A basic implementation could look as follows:
<?php
$resolve = function (Product $product) {
    return $product->getVariants();
};
But using this approach we would again end up with N+1 queries executed against our database.
To alleviate this problem and load these objects efficiently, we can use an instance of BufferedCollector like this:
<?php
$facade = new \Xsolve\Associate\Facade($entityManager);
$bufferedCollector = $facade->getBufferedDoctrineOrmCollector();

$resolve = function (Product $product) use ($bufferedCollector) {
    // The collect job is buffered here; loading happens later, in a batch.
    $bufferedCollectClosure = $bufferedCollector->createCollectClosure([$product], ['variants']);

    return new \GraphQL\Deferred(function () use ($bufferedCollectClosure) {
        return $bufferedCollectClosure();
    });
};
Et voilà! BufferedCollector will accumulate all collect jobs while the query result is built breadth-first. When the GraphQL library attempts to resolve a Deferred returned by our resolve function, the collector will group all similar jobs stored before (comparing base object class and association path) and load all of them in a single batch, issuing only one SELECT query (or one query per chunk if the number of base entities is high, as mentioned above).
Hence we end up with 2 queries instead of 101: one that loaded the products and one for all of their variants.
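The buffering and grouping idea can be sketched in plain PHP. This is not the library's BufferedCollector: JobBuffer, add and flush are hypothetical names, the batch loader only counts calls, and the real collector triggers loading when a Deferred is resolved rather than through an explicit flush() call. Jobs are grouped by base object class and association path, so 100 jobs for ['variants'] plus one for a hypothetical ['images'] path produce two batches.

```php
<?php

// Hypothetical sketch: jobs are queued instead of executed, then grouped by
// (base object class, association path) and run once per group.
final class JobBuffer
{
    /** @var array<string, array{path: array, objects: array}> */
    private array $groups = [];

    /** @var callable */
    private $batchLoader;

    public function __construct(callable $batchLoader)
    {
        $this->batchLoader = $batchLoader;
    }

    public function add(object $base, array $path): void
    {
        $key = get_class($base) . '|' . implode('.', $path);
        $this->groups[$key]['path'] = $path;
        $this->groups[$key]['objects'][] = $base;
    }

    public function flush(): void
    {
        foreach ($this->groups as $group) {
            ($this->batchLoader)($group['objects'], $group['path']);
        }
        $this->groups = [];
    }
}

$batches = 0;
$buffer = new JobBuffer(function (array $objects, array $path) use (&$batches): void {
    $batches++; // one batch query per group (ignoring chunking)
});

for ($i = 0; $i < 100; $i++) {
    $buffer->add(new \stdClass(), ['variants']);
}
$buffer->add(new \stdClass(), ['images']);
$buffer->flush();
```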