DDC-950: Selecting entities in random order #5479

Closed
doctrinebot opened this Issue Dec 26, 2010 · 16 comments

6 participants

@doctrinebot

Jira issue originally created by user enumag:

I need some way to select a random row from a table. There was something like "ORDER BY RAND()" in SQL but I can't really find anything like that for Doctrine.

@doctrinebot

Comment created by @beberlei:

Doctrine allows you to execute SQL and hydrate objects from that using the Native Query API as described in the documentation.

@doctrinebot

Issue was closed with resolution "Won't Fix"

@doctrinebot

Comment created by darkangel:

I need this as well ... why can't we do something like:

$qb->orderBy('RAND()');

Or maybe something at a lower level, like:

$qb->add('orderBy', new Expr\Func('RAND', array()));

?

@doctrinebot

Comment created by petersvp:

I need this as well! ORDER BY RAND() or something like this is a MUST HAVE for Doctrine! I do NOT want to fetch ALL then Hydrate ALL to select some rows, nor I want to do 2 queries in a row! Also, last one method actually can return 2 SAME records....

Hope I see this soon. I'm using Symfony 2

@doctrinebot

Comment created by daslicht:

+1 Please add this feature

@beberlei beberlei was assigned by doctrinebot Dec 6, 2015
@doctrinebot doctrinebot closed this Dec 6, 2015
@thagenbeek

I see this is closed, but this is labelled as a new feature... this meaning this will not happen, or this will be built as a new feature... I really want this, since i agree that any real way of doing this is either by multiple actions, always retrieving more records then wanted, or having to go back to writing queries.. no biggy just not what i want.

It seems pretty default to (ie) only need one random record out of millions, especially if i want to do any testing agains integrity...

@DHager

Is there a compelling reason to make this "integrated" as opposed to a custom DQL method or a native query?

Anything that boils down to ORDER BY RAND() LIMIT 1 sounds like it could cause some unanticipated performance problems in some engines, if they go through the effort of generating a random number for every row and then sorting them all before trimming off the rest.

@Badlapje

+1 to adding this.

Reason for making it integrated imo is that it's a very common use case to get either 1 random result or multiple random results. The purpose of Doctrine is to ease DX ... adding this seems like a no-brainer to me. It's counter-productive to have every dev in need of this feature develop his/her own DQL method or use a native query.

@Ocramius Ocramius added the Won't Fix label Jul 5, 2016
@Ocramius
Doctrine member

ORDER BY RAND() is basically database suicide: don't do it, since it has a complexity of O(rand()) or greater, which can be quite problematic.

There are ways of selecting random sets of records via cursors, but the solution to make it work in one query is very DB-specific, and won't be implemented by Doctrine.

Running n queries (n being the number of records you want) using OFFSET and LIMIT 1 is the safest portable approach, in theory, but it has indeed some performance implications.

@thagenbeek

I have read now plenty that Doctrine won't implement this. I understand but disagree with the argument.
The reason i would give for implementing it, in maybe a different way that is effective and "not suicidal", since come on you guys are smart enough, anyway the reason would be it is basic DB functionality, which you therefor alone should offer. It's available in, as far as what i worked with, all the engines and again you can't come up with an integrated way to do it right?

Every day this kind of thing is argumented away, in what i hear "it's too difficult", makes me step away from doctrine, and also the frameworks that promote using it.

Now you may not care, and maybe you shouldn't but i wanted to have said it none the less.

@Ocramius
Doctrine member

come on you guys are smart enough

"you guys" in an open source project includes also "you" (@thagenbeek).

it is basic DB functionality

It's actually not. There is no "pick a random set of objects" in SQL-92, as far as I know. Here are the possible implementations:

  • SELECT * FROM foo ORDER BY RAND(). This may or may not yield a result within the year, and it may or may not bring down your entire RDBMS or the server where it is hosted. This is unacceptable
  • DB-specific SQL via cursors and stored procedures - no, we won't go there anyway. This is unmaintainable
  • SELECT *, RAND() as sorting_thing FROM foo ORDER BY sorting_thing - this is a full table scan on all your data. It will work via O(n), but n in RDBMSs context is "millions". This is unacceptable
  • SELECT COUNT(1) FROM foo + SELECT * FROM foo OFFSET $rand LIMIT 1 (query to be repeated once per fetched record), $rand being a value from 0 to the result of the first query. Exceptional cases to be considered. Complexity is O(n+1), n being the number of requested records, and it is simply not good for a generic implementation. This is unacceptable

Basically, either you go down the rabbit hole and implement this functionality in DBAL in a decent way (which basically means better than O(n)) or you get a userland/abstracted version that bombs your DB with repeated queries (O(n+1)). The latter is flawed by design, and it will only lead users to complain about its performance implications, once they find it.

It's available in, as far as what i worked with, all the engines and again you can't come up with an integrated way to do it right?

Since you did the research, then release a tiny library that generates the SQL for all the platforms.

Every day this kind of thing is argumented away, in what i hear "it's too difficult", makes me step away from doctrine, and also the frameworks that promote using it.

You are welcome to step into contributing to it though. I just explained above that we have our reasons to not have this in core, and it won't land in core unless somebody provides a maintainable implementation that works cross-platform in a CSPRNG-sorting fashion (because sorting is also used for stuff like lotteries and such: no kidding with it).

Now you may not care, and maybe you shouldn't but i wanted to have said it none the less.

You are free to use the tools of your choice, but if you expect random people on the internet to build and maintain a piece of software just because you said "omg, I'm gonna switch to another tool", then you probably misunderstood our intentions. Doctrine ORM is a very stable, very business-critical and very closed to modification piece of software. We focus on the 80/20 use-cases, and this particular issue can be implemented via custom SQL for the <20% use-cases that need it.

@beberlei
Doctrine member

Implementing this efficiently on your side is easy:

SELECT Max(e.id) FROM entity e;
$randomId = rand(1, $maxId);
SELECT e FROM entity e WHERE e.id > $randomId LIMIT 1;

No need for an extra function.

@Badlapje
Badlapje commented Jul 6, 2016 edited

Here's what i ended up doing, 2 queries:

  • get a headcount of how many entities of that kind there are
  • generate an array of random numbers (taking care to generate a few more then needed in case of doubles), then filter out all unique ones and query the dbase to give the desired amount

I'm sure someone smarter can implement this in a better way, but anywho: here's my php code for it.

  /**
   * Faster alternative to get an array containing only unique values.  Precondition: the array should only contain strings or integers.  It works by flipping the array, removing the doubles, then taking the keys and returning those as the new array.
   * @param  array $array an array to be unique-icized
   * @return array        an array containing only unique numbers/strings
   */
  function uniqueArrayValues($array){
    return array_keys(array_flip($array));
  }

/**
 * Generate an array containing random integer values
 * @param  integer $min         the lowest possible value in the array
 * @param  integer $max         the highest possible value in the array
 * @param  integer $amount      the number of values in the array
 * @param  integer $contingency an offset to ensure there's enough unique values in the array.  Standard value is 10, this should be higher if the spread is very big.
 * @return array                array of random unique integers
 */
  function generateRandomUniqueIntegerArray($min, $max, $amount, $contingency = 10){
    $array = array();
    for ($i = 0; $i < ($amount + $contingency); $i++) {
      $array[] = mt_rand($min, $max);
    }

    return array_slice($this->uniqueArrayValues($array), 0, $amount);
  }

  /**
   * Get the headcount for a specific entity type
   * @param  EntityManager $em    entity manager for Doctrine
   * @param  string        $class a valid class name managed by Doctrine
   * @return integer              the amount of entities in the table
   */
  function getCount(EntityManager $em, $class){
    return $em->createQuery('select Count(c) from ' . $class . ' c')
       ->getSingleScalarResult();
  }

I agree that dbase-implementations are a no-go, as are x amount of queries, one for each random you want (you'd still have to filter those after the query to get rid of doubles, or include previous results in the query, performance wise that seems unwise ... haven't done any tests though).

A small code-piece to generate the random numbers for id's should work though

@thagenbeek

@Ocramius

You stated a little conflicting information, first you state it as if i could promote functionality that would make this work, but you also clearly stated this will not be implemented, so NO i am NOT part of this open source project, because it gets kicked out on its thought.

Then also i mentioned i knew it was implemented in the platforms i have worked with, these are, but not limited to, MySQL (Percona), PostgreSQL, Oracle, MSSQL.

And i've been in the biz for a while, and have talked to many developers in my time, and i might be off, but those are the major ones...

"Since you did the research, then release a tiny library that generates the SQL for all the platforms."
I don't need to research your smart :)

"You are welcome to step into contributing to it though"
Not really true, you are very much discouraging contributing by statements like "This will not be implemented", why would i bother?

Don't get me wrong, i appreciate you even taking the time to discuss it, i know you don't have to, but
"We focus on the 80/20 use-cases, and this particular issue can be implemented via custom SQL for the <20% use-cases that need it."
Where do you get your numbers from? This is much more in the 80 range, at least not in the 20, there are many things you guys do that are in the 1% use case area so that is also a crap argument.

I am going to really leave it at this, you are obviously welcome to respond, and i will let you have the last word.
Thanks again for the hard work and have a great day!

@Ocramius
Doctrine member
Ocramius commented Jul 6, 2016 edited

@thagenbeek:

You stated a little conflicting information, first you state it as if i could promote functionality that would make this work, but you also clearly stated this will not be implemented

It will not be implemented because we can see it from very far far away: this is a huge rabbit hole, bigger than our pagination API, on which we already spent countless hours.

so NO i am NOT part of this open source project, because it gets kicked out on its thought.

As far as I can see, you are not working on any OSS project on github, at least not with this account of yours. This is not to be considered a bad thing, but going around and expecting others to step up and build things that you need is possibly bad, at least in my opinion.

And i've been in the biz for a while, and have talked to many developers in my time, and i might be off, but those are the major ones...

These are the ones you need to consider: https://github.com/doctrine/dbal/tree/97f423a42c100c0ea9a8ff78f17a9ed020b8b6bf/lib/Doctrine/DBAL/Platforms
Each platform would need a specific implementation, while a solution that is domain-specific, like the one suggested by @beberlei, usually takes 30 minutes to implement and test.

Not really true, you are very much discouraging contributing by statements like "This will not be implemented", why would i bother?

I listed the current known methods to make it work reliably: this particular feature needs research on how to make the unmaintainable one maintainable. My snarky language is caused mainly by your really unfriendly way of requesting a feature. I even suggested you to simply build a tiny library for it yourself, since there are issues in keeping such functionality maintained inside doctrine.

Where do you get your numbers from? This is much more in the 80 range, at least not in the 20,

Mostly discussions at conferences, this issue tracker, projects I work on (at least a dozen a year), IRC, etc. If you need a market research, then I'm unlikely the best person for more reliable numbers.

there are many things you guys do that are in the 1% use case area so that is also a crap argument.

Indeed, many things we do are in the 1% use case and will likely be dropped in future version, depending on interest and impact on consumers (reads: "how much time does it take to port this to userland?")

I am going to really leave it at this, you are obviously welcome to respond, and i will let you have the last word.

That's not what I aimed for, but you obviously need to grow some understanding of how software doesn't magically grow off wishes thrown into a well.

@Ocramius
Doctrine member

@Badlapje that solution would work with following assumptions:

  1. the table is using an auto-incremental identifier
  2. indexing starts at 1
  3. no gaps between primary key values (adjacent numbers)
  4. table allows no deletes

That is perfectly fine for your own app, where you can make those assumptions, but not for a general-purpose library

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment