Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoExecute does not validate $fields_values #154

Closed
orcinus opened this issue Oct 5, 2015 · 11 comments
Closed

AutoExecute does not validate $fields_values #154

orcinus opened this issue Oct 5, 2015 · 11 comments
Labels
core ADOdb core (library and base classes) enhancement
Milestone

Comments

@orcinus
Copy link

orcinus commented Oct 5, 2015

Attempting to use AutoExecute with an empty array passed for $fields_values will still build a query and attempt to execute it. Yes, fields passed should probably be validated in the external (app) code, but still, not validating it at all and building syntactically invalid queries seems like an omission.

Suggested fix - at the very least:

adodb.inc.php:2070

    if (empty($fields_values)) {
      return false;
    }

or

    if (empty($fields_values)) {
      $this->outp_throw('AutoExecute: Empty fields array', 'AutoExecute');
      return false;
    }
@dregad
Copy link
Member

dregad commented Oct 6, 2015

Checking for an empty array would would by no means be sufficient - we would have to validate each individual field as well, which would have a performance impact.

fields passed should probably be validated in the external (app) code

I think this is how it should be.

@orcinus
Copy link
Author

orcinus commented Oct 6, 2015

Wasn't it argued recently that adding a regexp into the very core (Execute method) is just fine today, as far as performance is concerned? I don't see how this could come even close to the performance impact of that ;)

Note that i wrote that fields passed should be validated in the external code. The array holding it, however, should be validated by the function/method accepting it. If a non-empty array of values is expected, it should validate whether the argument passed is an array and is, in fact, non empty.

At least my 2c.

Not that critical of an issue, really, because the DB will likely error out with an SQL syntax error. The only setups where this would be potentially dangerous are those that have error reporting / exceptions turned off, which is dangerous anyways.

@mnewnham
Copy link
Contributor

mnewnham commented Oct 6, 2015

Well that was me, wasn't it - We all have our own opinions, and my opinion on autoExecute() is this:

AutoExecute is, above all, a helper routine for programmers who need an easy way of manipulating data. It is, after all, a wrapper round a wrapper(getupdatesql) round a wrapper(adodb) round a wrapper(php). Any consideration of performance should take a distant last place to ease of use and robustness, As such, we should try to handle even the most inconceivable data issues and gracefully tell the programmer what the problem is. So lets lay on those regular expression tests until they start popping out of the monitor.

I could be wrong though.

@orcinus
Copy link
Author

orcinus commented Oct 6, 2015

sigh Fine, i give up. I won't bother you again.

@orcinus orcinus closed this as completed Oct 6, 2015
@dregad
Copy link
Member

dregad commented Oct 7, 2015

@orcinus I don't understand why you're reacting like that... I was just attempting to start a discussion, not rejecting your suggestion.

My point on performance with regards to validating $field_values is that to do it "properly" we would have to check each individual entry in the array and check that the field (key) is valid. That implies more than a regular expression (foreach loop on the array, and looking up the list of fields for the affected table).

A performance degradation is acceptable, but the question is whether it is worth the cost considering that the RDBMS will choke on an invalid statement and return an error anyway.

I leave it up to you to decide if you want to resume the discussion or leave it at that. Your call.

@Mike-Benoit
Copy link
Contributor

Checking every field seems like overkill, as like you said the DB will do that for us anyways, but at least checking for an empty array makes perfect sense to me as that could be difficult to debug and the DB might not throw a useful error message in that case.

Performance is always important, and this is a critical function for any application that uses it, bogging it down anymore will just drive people away with a "ADODB is slow" taste in their mouth.

Personally, if a empty array is passed in, there is nothing to do so in my opinion this should be an optimization short circuit and return TRUE without performing any quer or further checks. Similar to how it would be treated if all values were the same as what the DB currently has and no changes are necessary.

@orcinus
Copy link
Author

orcinus commented Oct 7, 2015

Apologies if i interpreted it wrong, but mnewnham's reply seemed like pure sarcasm.
I had a fairly stressful work day and don't really need that, hence my reaction.

Validating every field truly is an overkill, but it's also not really feasible, as you don't really have anything to check or validate against - you don't and can't know what they're supposed to contain, unless you were to grab the schema from the DB, keep a library of data type validators for every DB flavour out there, and checked every field against them. Which would be insanity.

I think a basic empty and is_array check, however, is fairly reasonable, perhaps coupled with checking whether the array passed truly is associative (array_keys($fields_values) !== array_keys(array_keys($fields_values)), but that's an extra.

Regarding performance worries - bear in mind that AutoExecute does an extra SELECT on top of the intended query anyways. I'd say it's fair to assume that the latency introduced by that renders any light-weight typing or string-function checks (not regexps) completely unnoticable, as far as performance is concerned.

@orcinus orcinus reopened this Oct 7, 2015
@dregad
Copy link
Member

dregad commented Oct 7, 2015

mnewnham's reply seemed like pure sarcasm.

I didn't read it that way. Funny enough, I thought his note was targeted at me ;-) maybe he can clarify.

Anyway, I think we agree that an in-depth check is both complex and unnecessary. I'm fine with adding basic sanity check as suggested (is_array, empty).

@mnewnham
Copy link
Contributor

mnewnham commented Oct 7, 2015

Perhaps there is a language difficulty. There was no intended sarcasm in this post at all (and as a Brit, rereading, my comment I still cannot detect any). However, 100% apologies. My position
is that I agree totally with @orcinus. I will lay out my position again in a more professional way.

For me (both as a major contributor to the project and as a user of the product in a large-scale commercial environment), there is clearly a sharp distinction between the lowest level of the product
($db->execute) and the highest($db->autoexecute). If I wanted to write a batch routine that updated, say, 250,000 records, I would find it inconceivable to use autoExecute, for the plain reason that the multiple layers of wrappers are a serious performance impediment. Indeed, my own production version of ADOdb has numerous caching tweaks to maximize the performance and eliminate a lot of the current re-reading done in ADOdb. However, If I'm given 30 minutes to produce a trivial record insertion routine that will likely be discarded in a week, I would of course drop autoExecute into my code, and indeed, anything that provides a more robust interface and better error handling would simplify my own code and make it easier to write and debug.

We could lay out a position in the documentation, e.g.

autoExecute is designed to provide a robust simple interface to record updating and insertion. As such, it contains multiple levels of table, column and data validation to ensure data integrity prior to record insertion or update. Consequently, there may be some performance degradation when used for large numbers of record updates.

Or we could make it a little more complicated, and add optional validations, although I'd probably look to benchmark it to see if its worth it.

@dregad
Copy link
Member

dregad commented Oct 8, 2015

Updating the documentation as suggested is an excellent idea.

@orcinus
Copy link
Author

orcinus commented Oct 8, 2015

@mnewnham I apologise, i've completely misinterpreted the tone of your comment. Don't think it's so much about language difficulty (although English is not my native tongue), as me being a bit overly sensitive that day due to a combination of a lack of sleep and a pile of customer bug reports with 0 useful information in them. Once again, apologies for my reaction.

Regarding the issue at hand - i also tend not to use AutoExecute at all, mostly for performance reasons, but also because i can't leverage some of the (infamous) MySQL specific "shortcuts" using it (REPLACE, ON DUPLICATE KEY, INSERT IGNORE). When i'm updating / inserting objects to DB, i tend to write routines and queries manually, anyway. The code i'm maintaining, however, does use it in some legitimate scenarios (e.g. saving changed object properties to DB) that i leave alone, if they're handling a single object at a time and performance is of no concern. Also, if portability were the focus, i would've used it as well, rather than munge up the codebase with MySQL specific queries, performance be damned.

My point being - i'm willing to bet that most users are aware that AutoExecute is all about convenience, compatibility and flexibility, rather than performance. Robustness would be a good addition to that, and stating all of this in documentation truly is an excellent idea.

@dregad dregad closed this as completed in 180b52a Feb 8, 2016
@dregad dregad added enhancement core ADOdb core (library and base classes) labels Feb 8, 2016
@dregad dregad added this to the v5.21 milestone Feb 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core ADOdb core (library and base classes) enhancement
Projects
None yet
Development

No branches or pull requests

4 participants