New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of fetchTableCols() #61

Closed
mindplay-dk opened this Issue Oct 27, 2013 · 11 comments

Comments

Projects
None yet
4 participants
@mindplay-dk

mindplay-dk commented Oct 27, 2013

Actively fetching the table column meta-data on demand, every time the method is called, seems like a pretty big performance pitfall.

How about adding a getTableCols() method, which would cache the table columns (in memory) for the connection, once fetched for the first time?

Or is there some reason this responsibility is deliberately left to the consumer?

@pmjones

This comment has been minimized.

Show comment
Hide comment
@pmjones

pmjones Oct 27, 2013

Member

What you're describing is a caching mechanism, which is fine, but not a core function of schema discovery. Once you add a little caching, suddenly you're going to need cache clearing, and then you're going to need something that clears it automatically when a table column is modified, etc etc. On the other hand, one could implement the barest possible caching, but then people will ask why the more complicated behaviors are deliberately left to the consumer.

I hope that helps to explain where we're coming from.

Member

pmjones commented Oct 27, 2013

What you're describing is a caching mechanism, which is fine, but not a core function of schema discovery. Once you add a little caching, suddenly you're going to need cache clearing, and then you're going to need something that clears it automatically when a table column is modified, etc etc. On the other hand, one could implement the barest possible caching, but then people will ask why the more complicated behaviors are deliberately left to the consumer.

I hope that helps to explain where we're coming from.

@harikt

This comment has been minimized.

Show comment
Hide comment
@harikt

harikt Oct 27, 2013

Member

@pmjones if I understand @mindplay-dk I feel he is right.

See https://github.com/auraphp/Aura.Sql/blob/develop/src/Aura/Sql/Connection/Mysql.php#L97
If we tried a ->fetchTableCols('user') , it will be nice to save it in memory.

I feel the script will defnitely end once the call is done, or may be we will add a flag also

fetchTableCols($spec, $cached = false)
{
    if ($cached) {
        // do we have the cols
        return $this->cols;
    }
}

So any call after the fetchTableCols will never look again describing the table and getting the column fields.

Member

harikt commented Oct 27, 2013

@pmjones if I understand @mindplay-dk I feel he is right.

See https://github.com/auraphp/Aura.Sql/blob/develop/src/Aura/Sql/Connection/Mysql.php#L97
If we tried a ->fetchTableCols('user') , it will be nice to save it in memory.

I feel the script will defnitely end once the call is done, or may be we will add a flag also

fetchTableCols($spec, $cached = false)
{
    if ($cached) {
        // do we have the cols
        return $this->cols;
    }
}

So any call after the fetchTableCols will never look again describing the table and getting the column fields.

@pmjones

This comment has been minimized.

Show comment
Hide comment
@pmjones

pmjones Oct 27, 2013

Member

We're not adding a caching system to an SQL library.

Member

pmjones commented Oct 27, 2013

We're not adding a caching system to an SQL library.

@pmjones pmjones closed this Oct 27, 2013

@mindplay-dk

This comment has been minimized.

Show comment
Hide comment
@mindplay-dk

mindplay-dk Oct 27, 2013

Given that this is PHP, that's kind of silly - I'm talking about in-memory caching only, to avoid having the same connection instance fetching the same results repeatedly, during the same script run.

I was not suggesting you change the behavior of fetchTableCols(), but rather that you add a second method that does cache the result - you don't need control of that cache, it doesn't need to clear automatically, and it doesn't even need to be manually clearable, since, in the marginal case where you're writing a script that changes the schema, you can still call fetchTableCols() and get an uncached result.

Most of the time, that's not the likely use-case though.

You can argue that everyone should do their own caching, but if they do, it won't help much, because 3 different libraries that each do their own caching will still result in 3 redundant operations when those libraries are used together.

Caching the result really needs to be available at the core API level of this library to be of any real use.

mindplay-dk commented Oct 27, 2013

Given that this is PHP, that's kind of silly - I'm talking about in-memory caching only, to avoid having the same connection instance fetching the same results repeatedly, during the same script run.

I was not suggesting you change the behavior of fetchTableCols(), but rather that you add a second method that does cache the result - you don't need control of that cache, it doesn't need to clear automatically, and it doesn't even need to be manually clearable, since, in the marginal case where you're writing a script that changes the schema, you can still call fetchTableCols() and get an uncached result.

Most of the time, that's not the likely use-case though.

You can argue that everyone should do their own caching, but if they do, it won't help much, because 3 different libraries that each do their own caching will still result in 3 redundant operations when those libraries are used together.

Caching the result really needs to be available at the core API level of this library to be of any real use.

@pmjones

This comment has been minimized.

Show comment
Hide comment
@pmjones

pmjones Oct 27, 2013

Member

"Caching the result really needs to be available at the core API level of this library to be of any real use." Not really; anyone who wants to wrap the caching system of their choice can do so. Also, you are imagining only one use case, when others (e.g. batch processing to pre-create model classes) do not need a cache. And I can guarantee you, from experience, that once you add a little bit of something like a cache, you start seeing people asking for more and more cache features. So at this point I cannot agree that caching built into the SQL library is in line with Aura's goals of separation, decoupling, and independence.

Member

pmjones commented Oct 27, 2013

"Caching the result really needs to be available at the core API level of this library to be of any real use." Not really; anyone who wants to wrap the caching system of their choice can do so. Also, you are imagining only one use case, when others (e.g. batch processing to pre-create model classes) do not need a cache. And I can guarantee you, from experience, that once you add a little bit of something like a cache, you start seeing people asking for more and more cache features. So at this point I cannot agree that caching built into the SQL library is in line with Aura's goals of separation, decoupling, and independence.

@koriym

This comment has been minimized.

Show comment
Hide comment
@koriym

koriym Oct 27, 2013

Member

I cannot agree that caching built into the SQL library is in line with Aura's goals of separation, decoupling, and independence.

+1 That is Aura.

Member

koriym commented Oct 27, 2013

I cannot agree that caching built into the SQL library is in line with Aura's goals of separation, decoupling, and independence.

+1 That is Aura.

@mindplay-dk

This comment has been minimized.

Show comment
Hide comment
@mindplay-dk

mindplay-dk Oct 27, 2013

I think you've completely misunderstood me.

All I'm asking for is this:

abstract class AbstractConnection
{
    ....

    protected $table_cols = array();

    ....

    /**
     * 
     * Returns a cached array of columns in a table.
     * 
     * @param string $spec Return the columns in this table. This may be just
     * a `table` name, or a `schema.table` name.
     * 
     * @return array An associative array where the key is the column name
     * and the value is a Column object.
     * 
     */
    public function getTableCols($spec)
    {
        if (false === isset($this->table_cols[$spec])) {
            $this->table_cols[$spec] = $this->fetchTableCols($spec);
        }

        return $this->table_cols[$spec];
    }
}

It doesn't replace or affect anything else - you can still call fetchTableCols() and avoid the cache if your component expects the schema might change during the same request. To most components, it isn't going to matter, and loading and constructing all of the Column objects repeatedly is simply a waste of time.

It doesn't "grow out of control" - you don't need to clear the cache, you don't benefit from switching "cache providers" or anything else advanced or complicated; it's simply a choice between "performance optimized" or "accurate to the millisecond".

I don't see how this violates any concerns about separation - it's dealing with table columns only. It's independent. It isn't coupled to anything (like a caching provider) and doesn't need to be, now or ever.

Lots of components cache the results of expensive operations - that doesn't violate any principles I'm aware of?

And yes, this is simple, and every developer can implement this easily in their own components - but if three different developers write three different components that consume this information, and each of them implement their own caching, well, so much for those optimizations.

Your component is responsible for providing schema reflection - now, to make sure this happens with acceptable performance, my own components have to take over that responsibility. Doesn't that violate all sorts of principles?

mindplay-dk commented Oct 27, 2013

I think you've completely misunderstood me.

All I'm asking for is this:

abstract class AbstractConnection
{
    ....

    protected $table_cols = array();

    ....

    /**
     * 
     * Returns a cached array of columns in a table.
     * 
     * @param string $spec Return the columns in this table. This may be just
     * a `table` name, or a `schema.table` name.
     * 
     * @return array An associative array where the key is the column name
     * and the value is a Column object.
     * 
     */
    public function getTableCols($spec)
    {
        if (false === isset($this->table_cols[$spec])) {
            $this->table_cols[$spec] = $this->fetchTableCols($spec);
        }

        return $this->table_cols[$spec];
    }
}

It doesn't replace or affect anything else - you can still call fetchTableCols() and avoid the cache if your component expects the schema might change during the same request. To most components, it isn't going to matter, and loading and constructing all of the Column objects repeatedly is simply a waste of time.

It doesn't "grow out of control" - you don't need to clear the cache, you don't benefit from switching "cache providers" or anything else advanced or complicated; it's simply a choice between "performance optimized" or "accurate to the millisecond".

I don't see how this violates any concerns about separation - it's dealing with table columns only. It's independent. It isn't coupled to anything (like a caching provider) and doesn't need to be, now or ever.

Lots of components cache the results of expensive operations - that doesn't violate any principles I'm aware of?

And yes, this is simple, and every developer can implement this easily in their own components - but if three different developers write three different components that consume this information, and each of them implement their own caching, well, so much for those optimizations.

Your component is responsible for providing schema reflection - now, to make sure this happens with acceptable performance, my own components have to take over that responsibility. Doesn't that violate all sorts of principles?

@mindplay-dk

This comment has been minimized.

Show comment
Hide comment
@mindplay-dk

mindplay-dk Oct 27, 2013

also, for the record:

you are imagining only one use case, when others (e.g. batch processing to pre-create model classes) do not need a cache

You're imagining only one use-case, where nobody needs a cache.

I'm imagining two use-cases, one where caching is beneficial, and one where it's not.

mindplay-dk commented Oct 27, 2013

also, for the record:

you are imagining only one use case, when others (e.g. batch processing to pre-create model classes) do not need a cache

You're imagining only one use-case, where nobody needs a cache.

I'm imagining two use-cases, one where caching is beneficial, and one where it's not.

@pmjones

This comment has been minimized.

Show comment
Hide comment
@pmjones

pmjones Oct 27, 2013

Member

The issue with the code you present is: what happens when someone issues a query to modify the table after it's schema has been read once? Even a little bit of caching is going to be an issue.

"You're imagining only one use-case, where nobody needs a cache." Nice. ;-) But in truth, I'm imagining every possible use case, and deferring it to the user who wants to implement it, using the system of their choice.

Again, I get where you're coming from, but if you need it that badly, it is not hard at all to either extend or wrap the classes in question and add the caching system of your preference.

Member

pmjones commented Oct 27, 2013

The issue with the code you present is: what happens when someone issues a query to modify the table after it's schema has been read once? Even a little bit of caching is going to be an issue.

"You're imagining only one use-case, where nobody needs a cache." Nice. ;-) But in truth, I'm imagining every possible use case, and deferring it to the user who wants to implement it, using the system of their choice.

Again, I get where you're coming from, but if you need it that badly, it is not hard at all to either extend or wrap the classes in question and add the caching system of your preference.

@mindplay-dk

This comment has been minimized.

Show comment
Hide comment
@mindplay-dk

mindplay-dk Oct 27, 2013

what happens when someone issues a query to modify the table after it's schema has been read once?

Absolutely nothing - if you use the fetch method, the data will be fetched; if you use the other method, you get the state of the schema as it was at the beginning of the current request/run.

There's nothing weird or unpredictable about that.

In fact, there may be situations where getting a changed version of the schema could be a problem for you - if you've built dependent data based on the first schema-reflection, and then somewhere else in your program you fetch it again and get an extra column that wasn't there before...

There are situations where you don't want the latest most accurate state of something - where the state you're dealing with has to be consistent during the current script execution, for whatever reason.

To give a simpler, similar example, you don't always want time(), which could change between calls - sometimes, in applications that are time-sensitive and perform time-consuming operations, you need to "cache" the time at the beginning of the request and use that for various calculations.

I'm imagining every possible use case, and deferring it to the user who wants to implement it

The only problem with that is, as explained, you can't defer this for multiple users to implement in their own components - or, you can, of course, but with several libraries doing this independently, they won't collectively accomplish what they're trying to achieve individually.

it is not hard at all to either extend or wrap the classes in question and add the caching system of your preference

As explained, once everybody adds their own caching layer, well, there's no point in re-iterating this again...

Anyhow, I believe this debate is going nowhere - we disagree, that's okay, such is life with software... if everyone agreed on everything, there wouldn't be more than one library for anything ;-)

What this library does do, it does very well, even if it doesn't do everything I wished for.

mindplay-dk commented Oct 27, 2013

what happens when someone issues a query to modify the table after it's schema has been read once?

Absolutely nothing - if you use the fetch method, the data will be fetched; if you use the other method, you get the state of the schema as it was at the beginning of the current request/run.

There's nothing weird or unpredictable about that.

In fact, there may be situations where getting a changed version of the schema could be a problem for you - if you've built dependent data based on the first schema-reflection, and then somewhere else in your program you fetch it again and get an extra column that wasn't there before...

There are situations where you don't want the latest most accurate state of something - where the state you're dealing with has to be consistent during the current script execution, for whatever reason.

To give a simpler, similar example, you don't always want time(), which could change between calls - sometimes, in applications that are time-sensitive and perform time-consuming operations, you need to "cache" the time at the beginning of the request and use that for various calculations.

I'm imagining every possible use case, and deferring it to the user who wants to implement it

The only problem with that is, as explained, you can't defer this for multiple users to implement in their own components - or, you can, of course, but with several libraries doing this independently, they won't collectively accomplish what they're trying to achieve individually.

it is not hard at all to either extend or wrap the classes in question and add the caching system of your preference

As explained, once everybody adds their own caching layer, well, there's no point in re-iterating this again...

Anyhow, I believe this debate is going nowhere - we disagree, that's okay, such is life with software... if everyone agreed on everything, there wouldn't be more than one library for anything ;-)

What this library does do, it does very well, even if it doesn't do everything I wished for.

@pmjones

This comment has been minimized.

Show comment
Hide comment
@pmjones

pmjones Oct 27, 2013

Member

"we disagree, that's okay, such is life with software... if everyone agreed on everything, there wouldn't be more than one library for anything ;-) What this library does do, it does very well, even if it doesn't do everything I wished for."

A gracious and civil thing to say, sir, and I thank you.

Member

pmjones commented Oct 27, 2013

"we disagree, that's okay, such is life with software... if everyone agreed on everything, there wouldn't be more than one library for anything ;-) What this library does do, it does very well, even if it doesn't do everything I wished for."

A gracious and civil thing to say, sir, and I thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment