datastore Api repeatedly create index #3932

Closed
zyxbest opened this issue Nov 21, 2017 · 5 comments

@zyxbest
Contributor

zyxbest commented Nov 21, 2017

CKAN Version if known (or site URL)

2.3

Please describe the expected behaviour

I ran the SQL string from ckanext/datastore/db.py::_get_index_names() directly in PostgreSQL:

    sql = u"""
        SELECT
            i.relname AS index_name
        FROM
            pg_class t,
            pg_class i,
            pg_index idx
        WHERE
            t.oid = idx.indrelid
            AND i.oid = idx.indexrelid
            AND t.relkind = 'r'
            AND t.relname = %s
        """

(I also replaced %s with my resource_id.)
This gave the output below:
image

I expect the method in db.py (in the latest CKAN it is ckanext/datastore/backend/postgres.py) to return the same result (16 indexes).
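
For anyone who wants to repeat this check from Python rather than psql, here is a minimal sketch. It assumes a DB-API connection whose driver uses the `%s` parameter style (psycopg2 does), and the helper name `list_index_names` is hypothetical. It is not CKAN's own function, just the same catalog query wrapped for reuse.

```python
# Same catalog query as _get_index_names(), with the table name bound
# as a query parameter instead of being pasted into the string.
INDEX_NAMES_SQL = """
    SELECT
        i.relname AS index_name
    FROM
        pg_class t,
        pg_class i,
        pg_index idx
    WHERE
        t.oid = idx.indrelid
        AND i.oid = idx.indexrelid
        AND t.relkind = 'r'
        AND t.relname = %s
"""


def list_index_names(connection, resource_id):
    """Return the names of all indexes on the table named `resource_id`.

    `connection` is assumed to be a DB-API connection (hypothetical
    caller-supplied object, e.g. from psycopg2.connect()).
    """
    cursor = connection.cursor()
    try:
        cursor.execute(INDEX_NAMES_SQL, (resource_id,))
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()
```

Comparing `len(list_index_names(conn, resource_id))` with the count logged inside create_indexes() shows whether the two lookups see the same set of indexes at that moment.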

Please describe the actual behaviour

I've added some logs for create_indexes():

    current_indexes = _get_index_names(context['connection'],
                                       data_dict['resource_id'])
    log.info(current_indexes)
    log.info('index_num:{}'.format(len(current_indexes)))
    for sql_index_string in sql_index_strings:
        has_index = [c for c in current_indexes
                     if sql_index_string.find(c) != -1]
        if not has_index:
            log.info('has no index!create one!')
            log.info(sql_index_string)
            connection.execute(sql_index_string)

When I call datastore_create to insert data (again and again), I always get the log below:
image
Only 15 indexes are returned, so the API executes the SQL to create a unique index.
But this unique index had already been created; I even ran the SQL directly in PostgreSQL:
image

It seems _get_index_names does not work as expected, and this index creation takes a long time on every call, which makes the API perform badly.
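
To make the check in create_indexes() easier to reason about, here is a standalone sketch of the same substring test. The function name `missing_index_statements` and the index and table names in the usage lines are made up for illustration; only the `find()`-based test is taken from the loop above.

```python
def missing_index_statements(current_indexes, sql_index_strings):
    """Return the CREATE INDEX statements that name no existing index.

    Mirrors the loop above: a statement counts as already satisfied when
    any existing index name occurs as a substring of the statement text.
    """
    missing = []
    for sql_index_string in sql_index_strings:
        has_index = [c for c in current_indexes
                     if sql_index_string.find(c) != -1]
        if not has_index:
            missing.append(sql_index_string)
    return missing


# Hypothetical names, for illustration only.
existing = ["idx_full_text", "idx_pk_field"]
statements = [
    'CREATE INDEX "idx_full_text" ON "res" USING gist(_full_text)',
    'CREATE UNIQUE INDEX "idx_pk_field" ON "res" ("pk_field")',
]
print(missing_index_statements(existing, statements))      # nothing to create
print(missing_index_statements(existing[:1], statements))  # unique index missing
```

When both names are present nothing is returned. As soon as the unique index is absent from `current_indexes`, its statement is reported as missing and would be executed again, which matches the log output described above.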

What steps can be taken to reproduce the issue?

Add the logging above and run the script below:

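import json

import requests
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# HOST, API_KEY, local_40k (the resource id) and get_random() are defined elsewhere.


def create_resource_primarykey(records, resource_id, primary_key):
    # POST the records to datastore_create, declaring the unique key each time.
    r = requests.post(
        urljoin(HOST, 'api/action/datastore_create'),
        data=json.dumps({
            'resource_id': resource_id,
            'force': True,
            'records': records,
            'primary_key': primary_key,
        }),
        headers={
            'Authorization': API_KEY,
            'Content-Type': 'application/json',
        }
    )
    print(r.content)


data = [{
    'primary_key_field': '{}'.format(get_random()),  # get_random just returns a random string
    'data': 'exampledata',
}]
create_resource_primarykey(data, local_40k, 'primary_key_field')
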
@zyxbest
Contributor Author

zyxbest commented Nov 23, 2017

Well, maybe I found the reason.
In create_indexes there is this code:

    if primary_key is not None:
        _drop_indexes(context, data_dict, True)
        indexes.append(primary_key)

This means that even if I already have the same primary_key, the datastore drops the existing unique index and then builds it again.
So every call is a drop-and-rebuild operation.
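
The effect can be illustrated with a small in-memory model. Everything here is hypothetical (the `FakeTable` class, the two function names and the field name); it does not call CKAN or PostgreSQL, and it is not necessarily the change that was later merged. It only contrasts "always drop the unique index, then recreate it" with "recreate it only when the requested key differs".

```python
class FakeTable(object):
    """Toy stand-in for a datastore table that counts index builds."""

    def __init__(self):
        self.unique_index = None   # tuple of field names, or None
        self.builds = 0            # how many times the index was built

    def drop_unique_index(self):
        self.unique_index = None

    def create_unique_index(self, fields):
        self.unique_index = tuple(fields)
        self.builds += 1


def set_primary_key_always_rebuild(table, primary_key):
    # Behaviour described above: the drop happens first, so the index
    # never exists by the time the create step runs.
    if primary_key is not None:
        table.drop_unique_index()
        table.create_unique_index(primary_key)


def set_primary_key_if_changed(table, primary_key):
    # One possible alternative: leave a matching index alone.
    if primary_key is None:
        return
    if table.unique_index == tuple(primary_key):
        return
    table.drop_unique_index()
    table.create_unique_index(primary_key)


slow, fast = FakeTable(), FakeTable()
for _ in range(3):  # three datastore_create calls with the same key
    set_primary_key_always_rebuild(slow, ["primary_key_field"])
    set_primary_key_if_changed(fast, ["primary_key_field"])
print(slow.builds, fast.builds)  # 3 1
```

On a large table each rebuild is the expensive part, so skipping it when the key is unchanged is where the time would be saved.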

@wardi
Contributor

wardi commented Nov 23, 2017

@shengshuaijohnson is the same problem present in more recent versions? 2.3 is no longer supported

@zyxbest
Contributor Author

zyxbest commented Nov 24, 2017

@wardi There is similar logic in the latest version: https://github.com/ckan/ckan/blob/master/ckanext/datastore/backend/postgres.py#L785
In my case I made some changes like this:

image

@smotornyuk
Member

Thanks, @shengshuaijohnson. I verified this problem and created a PR with the provided patch.

@zyxbest
Contributor Author

zyxbest commented Nov 30, 2017

Ok, thank you :)

amercader added a commit that referenced this issue Dec 5, 2017
…y-create-index

[#3932] Create datastore indexes only if they are not exist