Automatically create sparse indexes in unique and null fields #80

Open
wants to merge 2 commits into from

4 participants

@maraujop

When you allow null in a unique field, you expect to be able to introduce several nulls in that column field. This code generates the sparse indexes on those fields automatically. Otherwise you can forget to create the indexes and get bitten by it in the future.

@maraujop maraujop Creating sparse indexes in unique and null fields
When you allow null in a unique field, you expect to be able to introduce
several nulls in that column field. This generates the sparse indexes
automatically. Otherwise you can forget to create the indexes and get bitten
by it in the future.
1a53f87
@flaper87
Django and NoSQL member

I think that people should explicitly specify whether that index is sparse or not. I don't thing that implying that unique + null creates a sparse index is the right choice.

Another thing. If you create a sparse index and you also set it as unique, you wont be able to insert 2 docs with that field null because that would raise a unique error and besides that, creating a sparse index and propagating null over docs without that field doesn't make any sense, that loses the sparse index scope.

@jonashaag

Miguel, can you give a reason why unique + null should default to sparse indexes? It's not obvious to me.

@maraujop

Wow, that was fast! Ok, let me explain myself. If I have this model:

class ExampleModel(models.Model):
    [...] # Some other fields here
    user = models.OneToOneField(User, null=True, blank=True)

When I go add an instance and save it. I cannot introduce two rows with user set to null, because Mongo doesn't treat null as a special value and says I have repeated null in the key user.

My understanding is that mongodb-engine should generate an sparse index over this field automatically, so that I can do what is the normal behavior with Django models. Otherwise I guess some third party applications will not be plug and play compatible.

Maybe I created a too broad case and we should narrow it to some specific fields.

Cheers,
Miguel

@jonashaag

Thanks for the explanation! I agree that this should be fixed somehow.

I'm not an expert in terms of sparse indices so I need advice from either of you. Flavio said a sparse+unique index treats null as a normal value, therefore making it impossible to store two documents with the same key: null pair. Miguel said, however, that a sparse index would fix that problem. Who of you is right? :-)

Another way to fix this could be to simply leave out these keys from the document -- does that work with non-sparse indices?

@maraujop

No problem. I just want to help improve mongodb-engine :)

What I meant to say is this:

"You can combine sparse with unique to produce a unique constraint that ignores documents with missing fields."
http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes

This is extracted from Mongodb official docs. does it make sense?

@flaper87
Django and NoSQL member
> use sparse
switched to db sparse
> db.test.ensureIndex({title : 1}, {unique : true, sparse : true})
> db.test.insert({title : null})
> db.test.insert({title : null})
E11000 duplicate key error index: sparse.test.$title_1  dup key: { : null }
> db.test.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "sparse.test",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "title" : 1
        },
        "unique" : true,
        "ns" : "sparse.test",
        "name" : "title_1",
        "sparse" : true
    }
]
>

It is not possible to leave those keys out of the document because mongodb will add them with a null value.

http://www.mongodb.org/display/DOCS/Indexes#Indexes-UniqueIndexesandMissingKeys

@flaper87
Django and NoSQL member

hmm,

Seems like in the case of sparse indexes it doesn't add the field. In that case ignoring/removing the field from the doc could work.

Could you please add that to your code?

Thanks Miguel!!

@maraujop

Ok, But you could do:

db.test.insert({otherfield: "blabla"})

There has to be a way to leave unique + sparse indexes fields out of the document, right?

@maraujop

We've answered at the same time.

I forgot to add the rest of it, sorry. I didn't realize it.

I will look at it right away.

@jonashaag

Still I'm not convinced that we should implicitly create sparse indexes here. Indices are a complex topic -- we should not assume anything at all.

Maybe we reached the limit of what's possible adapting MongoDB to SQL databases. That's fine and we should not put up with unexpected behavior only to make code run out of the box.

If the "leave out keys" thing works we should implement that in a more generalized way (always leave out empty fields, independent of indices). We'd need to investigate in how that interferes with default values (and changing the default value after saves), compared to a SQL database. And then we could add a troubleshooting note to the docs explaining why things work differently and how to fix it (using a sparse index).

@maraujop

I don't think this one of those limits. Mongodb treats this a special case and tells you to set a sparse index in the docs, see the link I posted.

I believe this is not a hidden case and has bitten me twice in my first two models using mongodb-engine. Right now even if the coder creates a sparse index, he cannot call things like form.save() because it fails. So now it's blocking me, forcing me to do custom inserts.

Sparse index are not that a big thing, nothing that complicated. The way I've set it, the user can still define his custom indexes and if he sets a different sparse index on it, it overrides the automatic one.

The part I need to include as @FlaPer87 says is ignore those fields when saving a doc that has not set values for them.

@maraujop maraujop Removing unique+null fields from data to be saved
This makes the automatically generated sparse index work. See #80
1e46496
@maraujop

Here is my patch for SQLCompiler. I'm not very familiar with mongodb-engine code and database backends. But I believe this the right place to have the checking.

Of course this could be optimized or implemented different ways:

  • Flagging those fields, so we don't have to iterate them
  • Avoiding them to be put into data directly, not sure if this is mongodb-engine's scope or djangotoolbox.

I've tried my best, if you don't like it, give me some hints on where to look at.

Cheers,
Miguel

@maraujop

BTW I can confirm this is working with my models, out of the box, without having to setup MongoMeta class.

Let me know what you think :)

@jonashaag

Leaving fields out is tricky because of changing default values. Say you have default=None, save the document (i.e. the field is left out), then change the field default value to default=42. On MongoDB, the model instance attribute would be 42, on SQL it is None. Any ideas on how handle this case gracefully are welcome :-)

@jonashaag

You've found the right place for that code. But I think it'd more intuitive to iterate over the data dictionary and get the field instance in case the value is None instead of iterating over the fields . Plus the None in data.values() check is redundant.

The checks you added to creation.py should be located after the if not (... or ...) check.

And of course, it all needs tests. You can put them into tests/query/.

And don't forget yourself to the list of AUTHORS.rst :-)

@maraujop

Leaving fields out is tricky because of changing default values. Say you have default=None, save the document (i.e. the field is left out), then change the field default value to default=42. On MongoDB, the model instance attribute would be 42, on SQL it is None. Any ideas on how handle this case gracefully are welcome :-)

Well. My understanding is that if default is set to 42. Then data[field.column] will be 42 and not None, so it would work, right?

But I think it'd more intuitive to iterate over the data dictionary and get the field instance in case the value is None instead of iterating over the fields. Plus the None in data.values() check is redundant.

Well, is there a 1:1 relation between fields and data. Because data doesn't have an id set, while fields has an AutoField. I didn't assume it. If there is a way to get the field that maps to a value, which one is it?

The checks you added to creation.py should be located after the if not (... or ...) check.

Ok, glad I got it right. But I don't see any if not (... or ...) check, can you point me to the line in github?

And of course, it all needs tests. You can put them into tests/query/.

Sure :) First I wanted to make sure you agree with me on this.

And don't forget yourself to the list of AUTHORS.rst :-)

:D I will.

@flaper87
Django and NoSQL member
@maraujop

Ok, I understand, now we are talking two different things at the same time.

Leaving fields out is tricky because of changing default values. Say you have default=None, save the document (i.e. the field is left out), then change the field default value to default=42. On MongoDB, the model instance attribute would be 42, on SQL it is None. Any ideas on how handle this case gracefully are welcome :-)

Jonas you referring to my sentence "Avoiding them to be put into data directly, not sure if this is mongodb-engine's scope or djangotoolbox.", right?

If that is the case, skip my previous comment:

Well. My understanding is that if default is set to 42. Then data[field.column] will be 42 and not None, so it would work, right?

I would say first let's try to get my code optimized and more robust. I think we don't need an extra setting the way I implemented it.

Cheers,
Miguel

@jonashaag

Miguel, the problem is that if we don't store None value fields and then change the default to something not None, on SQL databases the column value will still be None (because it's stored as NULL) whereas using MongoDB it's the new default (because it is missing from the document).

Flavio, foo=None queries should still work because "key missing" is the same as "value is null" IIRC:

> db.test.insert({a:1})
> db.test.find({b: null})
{ "_id" : ObjectId("4eb111aaf42dc436bcfcf1c2"), "a" : 1 }
@flaper87
Django and NoSQL member

In that case it might work.

Like @maraujop said, lets wait 'til he optimizes his code and in the meanwhile we can think about other issues that might appear.

@maraujop

Now I understand. SQL has default values associated to columns in the RDBMS. While in Mongo defaults are in code.

But I guess this is not the first time this happens. How are you handling defaults in other cases? For example this also happens with an IntegerField that has a default=4 and then is changed to default=5. Future documents will have a different default, right?

Is this something we can manage somehow?

Also Jonas, please answer my above questions so I can work on improving the patch meanwhile.

Cheers,
Miguel

@jonashaag

Well currently default values are written to the db, so if the default changes, we still got the old value. Default values are only applied to fields that aren't represented in the document fetched from MongoDB. But if we decide to leave out default values, and then the default changes...

Example:

class Foo(models.Model):
    a = IntegerField(null=True, default=None)
  • Foo().save(). Record in SQL: {a: NULL}. Record in MongoDB: {}.
  • Foo.objects.get().a is None in both cases.
  • Change the default of a: a = IntegerField(..., default=42).
  • Foo.objects.get().a is None using SQL but 42 using MongoDB (because the key is missing from the document and thus the default value is used)

I don't have any good idea how to handle this case.

The condition I was talking about is https://github.com/django-mongodb-engine/mongodb-engine/pull/80/files#L1R98.

Forget about the iteration thing, there's no column -> field mapping that we can use.

@jonashaag

Miguel, any news on this issue?

@flaper87
Django and NoSQL member

ping ?

@maraujop

Sorry to take so long to answer.

I did a workaround to the OneToOneField to get it working, due to time constraints. I'm sorry, but these were my two cents, I'm not planning to continue working on this patch any farther.

Cheers,
Miguel

@rbdcti

+1 on this feature, I'm migrating over from mysql and it would be great to have!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment