-
Notifications
You must be signed in to change notification settings - Fork 49
Add postgres-specific id generator #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
4e7f7f8
5c8d7c1
e3d6849
93fda3d
39a450a
2f0d7f2
10b201c
b4955eb
bf13c3e
36722b6
f272356
c6aca05
87688a9
aafe214
60ca204
37e674f
ff8e003
b9c7742
e534b5f
48a5086
3790972
ce7a895
aab706c
3ecaad9
b01bb6c
0158889
4e53196
285c1dd
10ef680
17ddae3
1d34707
bab3c31
e91868c
12b2aff
b85e02e
875de50
2a60c73
94ed98c
927730d
121571e
eca172a
1285872
3534a2f
5bebbd7
62b15db
d42ff2d
bbcac79
dcc3610
fd0f929
838eb4e
9363f03
cf22b24
9a47ef7
de130ef
f23a964
f819d0a
88fec2c
358bb94
63fce0c
a3f27f1
0d16903
f4e7e08
9d3f513
85ce1fe
10ed281
ef1b6a5
a402efa
c76760d
49bc198
43b465f
de1fc41
e5b5101
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,3 +30,7 @@ _book | |
|
|
||
| # node | ||
| node_modules | ||
|
|
||
| # Local dev | ||
| Dockerfile | ||
| docker-compose.yml | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| class Backends(object): | ||
| MYSQL = 'django.db.backends.mysql' | ||
| POSTGRES = 'django.db.backends.postgresql_psycopg2' | ||
| SQLITE = 'django.db.backends.sqlite3' | ||
| MYSQL = ('django.db.backends.mysql', 'django.contrib.gis.db.backends.mysql') | ||
| POSTGRES = ('django.db.backends.postgresql_psycopg2', 'django.db.backends.postgresql', 'django.contrib.gis.db.backends.postgis') | ||
| SQLITE = ('django.db.backends.sqlite3', 'django.contrib.gis.db.backends.spacialite') |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,15 @@ | ||
| from django.conf import settings | ||
| from django.apps import apps | ||
| from django_sharding_library.constants import Backends | ||
| from django.utils.six import iteritems | ||
| from django.db.models import Manager | ||
|
|
||
| from django_sharding_library.exceptions import NonExistentDatabaseException, ShardedModelInitializationException | ||
| from django_sharding_library.fields import ShardedIDFieldMixin | ||
| from django_sharding_library.manager import ShardManager | ||
| from django.db.models import Manager | ||
| from django_sharding_library.fields import ShardedIDFieldMixin, PostgresShardGeneratedIDField | ||
| from django_sharding_library.utils import register_migration_signal_for_model_receiver | ||
|
|
||
| PRE_MIGRATION_DISPATCH_UID = "PRE_MIGRATE_FOR_MODEL_%s" | ||
|
|
||
|
|
||
| def model_config(shard_group=None, database=None, sharded_by_field=None): | ||
|
|
@@ -26,13 +32,29 @@ def configure(cls): | |
| ) | ||
| setattr(cls, 'django_sharding__database', database) | ||
|
|
||
| postgres_shard_id_fields = list(filter(lambda field: issubclass(type(field), PostgresShardGeneratedIDField), cls._meta.fields)) | ||
| if postgres_shard_id_fields: | ||
| database_dicts = [settings.DATABASES[database]] if database else [db_settings for db, db_settings in | ||
| iteritems(settings.DATABASES) if | ||
| db_settings["SHARD_GROUP"] == shard_group] | ||
| if any([database_dict['ENGINE'] not in Backends.POSTGRES for database_dict in database_dicts]): | ||
| raise ShardedModelInitializationException( | ||
| 'You cannot use a PostgresShardGeneratedIDField on a non-Postgres database.') | ||
|
|
||
| register_migration_signal_for_model_receiver(apps.get_app_config(cls._meta.app_label), | ||
| PostgresShardGeneratedIDField.migration_receiver, | ||
| dispatch_uid=PRE_MIGRATION_DISPATCH_UID % cls._meta.app_label) | ||
|
|
||
| if shard_group: | ||
| sharded_fields = list(filter(lambda field: issubclass(type(field), ShardedIDFieldMixin), cls._meta.fields)) | ||
| if not sharded_fields: | ||
| raise ShardedModelInitializationException('All sharded models require a ShardedIDFieldMixin.') | ||
| if not sharded_fields and not postgres_shard_id_fields: | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. update those two exception messages to include that using the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do that today |
||
| raise ShardedModelInitializationException('All sharded models require a ShardedIDFieldMixin or a ' | ||
| 'PostgresShardGeneratedIDField.') | ||
|
|
||
| if not list(filter(lambda field: field == cls._meta.pk, sharded_fields)): | ||
| raise ShardedModelInitializationException('All sharded models require the ShardedAutoIDField to be the primary key. Set primary_key=True on the field.') | ||
| if not list(filter(lambda field: field == cls._meta.pk, sharded_fields)) and not postgres_shard_id_fields: | ||
| raise ShardedModelInitializationException('All sharded models require the ShardedAutoIDField or ' | ||
| 'PostgresShardGeneratedIDFieldto be the primary key. Set ' | ||
| 'primary_key=True on the field.') | ||
|
|
||
| if not callable(getattr(cls, 'get_shard', None)): | ||
| raise ShardedModelInitializationException('You must define a get_shard method on the sharded model.') | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,8 +1,15 @@ | ||
| from django.apps import apps | ||
| from django.conf import settings | ||
| from django.db.models import AutoField, CharField, ForeignKey | ||
| from django.db.models import AutoField, CharField, ForeignKey, BigIntegerField, OneToOneField | ||
|
|
||
| from django_sharding_library.constants import Backends | ||
| from django.db import connections, transaction, DatabaseError | ||
| from django_sharding_library.utils import create_postgres_global_sequence, create_postgres_shard_id_function | ||
|
|
||
| try: | ||
| from django.db.backends.postgresql.base import DatabaseWrapper as PostgresDatabaseWrapper | ||
| except ImportError: | ||
| from django.db.backends.postgresql_psycopg2.base import DatabaseWrapper as PostgresDatabaseWrapper | ||
|
|
||
|
|
||
| class BigAutoField(AutoField): | ||
|
|
@@ -11,12 +18,15 @@ class BigAutoField(AutoField): | |
| 9223372036854775807. | ||
| """ | ||
| def db_type(self, connection): | ||
| if connection.settings_dict['ENGINE'] == Backends.MYSQL: | ||
| if connection.settings_dict['ENGINE'] in Backends.MYSQL: | ||
| return 'serial' | ||
| if connection.settings_dict['ENGINE'] == Backends.POSTGRES: | ||
| if connection.settings_dict['ENGINE'] in Backends.POSTGRES: | ||
| return 'bigserial' | ||
| return super(BigAutoField, self).db_type(connection) | ||
|
|
||
| def get_internal_type(self): | ||
| return "BigIntegerField" | ||
|
|
||
|
|
||
| class ShardedIDFieldMixin(object): | ||
| """ | ||
|
|
@@ -156,3 +166,64 @@ class ShardForeignKeyStorageField(ShardForeignKeyStorageFieldMixin, ForeignKey): | |
| the shard using a pre_save signal. | ||
| """ | ||
| pass | ||
|
|
||
|
|
||
| class PostgresShardGeneratedIDField(AutoField): | ||
| """ | ||
| A field that uses a Postgres stored procedure to return an ID generated on the database. | ||
| """ | ||
| def db_type(self, connection, *args, **kwargs): | ||
|
|
||
| if not hasattr(settings, 'SHARD_EPOCH'): | ||
| raise ValueError("PostgresShardGeneratedIDField requires a SHARD_EPOCH to be defined in your settings file.") | ||
|
|
||
| if connection.vendor == PostgresDatabaseWrapper.vendor: | ||
| return "bigint DEFAULT next_sharded_id()" | ||
| else: | ||
| return super(PostgresShardGeneratedIDField, self).db_type(connection) | ||
|
|
||
| def get_internal_type(self): | ||
| return 'BigIntegerField' | ||
|
|
||
| def rel_db_type(self, connection): | ||
| return BigIntegerField().db_type(connection=connection) | ||
|
|
||
| @staticmethod | ||
| def migration_receiver(*args, **kwargs): | ||
| sequence_name = "global_id_sequence" | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ohh it's one sequence per DB,, as opposed to per model?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correct, its per DB. There is no reason to have a per model, no DB (that I am aware of) can insert more than 1024 rows per millisecond (which is what this id generator is designed to be able to do)!
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. when we were deciding on that strategy, we just weren't sure about id growth and sharing them, although now that I consider the actual size of a bigint, I have no idea why the decision was made not to do that. |
||
| db_alias = kwargs.get('using') | ||
| if not db_alias: | ||
| raise EnvironmentError("A pre-migration receiver did not receive a database alias. " | ||
| "Perhaps your app is not registered correctly?") | ||
| if settings.DATABASES[db_alias]['ENGINE'] in Backends.POSTGRES: | ||
| shard_id = settings.DATABASES[db_alias].get('SHARD_ID', 0) | ||
| create_postgres_global_sequence(sequence_name, db_alias, True) | ||
| create_postgres_shard_id_function(sequence_name, db_alias, shard_id) | ||
|
|
||
|
|
||
| class PostgresShardForeignKey(ForeignKey): | ||
| def db_type(self, connection): | ||
| # The database column type of a ForeignKey is the column type | ||
| # of the field to which it points. An exception is if the ForeignKey | ||
| # points to an AutoField/PositiveIntegerField/PositiveSmallIntegerField, | ||
| # in which case the column type is simply that of an IntegerField. | ||
| # If the database needs similar types for key fields however, the only | ||
| # thing we can do is making AutoField an IntegerField. | ||
| rel_field = self.target_field | ||
| if rel_field.get_internal_type() is "BigIntegerField": | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you mean to put |
||
| return BigIntegerField().db_type(connection=connection) | ||
| return super(PostgresShardForeignKey, self).db_type(connection) | ||
|
|
||
|
|
||
| class PostgresShardOneToOne(OneToOneField): | ||
| def db_type(self, connection): | ||
| # The database column type of a ForeignKey is the column type | ||
| # of the field to which it points. An exception is if the ForeignKey | ||
| # points to an AutoField/PositiveIntegerField/PositiveSmallIntegerField, | ||
| # in which case the column type is simply that of an IntegerField. | ||
| # If the database needs similar types for key fields however, the only | ||
| # thing we can do is making AutoField an IntegerField. | ||
| rel_field = self.target_field | ||
| if rel_field.get_internal_type() is "BigIntegerField": | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ditto |
||
| return BigIntegerField().db_type(connection=connection) | ||
| return super(PostgresShardOneToOne, self).db_type(connection) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| postgres_shard_id_function_sql = """CREATE OR REPLACE FUNCTION next_sharded_id(OUT result bigint) AS $$ | ||
| DECLARE | ||
| start_epoch bigint := %(shard_epoch)d; | ||
| seq_id bigint; | ||
| now_millis bigint; | ||
| shard_id int := %(shard_id)d; | ||
| BEGIN | ||
| -- there is a typo here in the online example, which is corrected here | ||
| SELECT nextval('%(sequence_name)s') %% 1024 INTO seq_id; | ||
|
|
||
| SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis; | ||
| result := (now_millis - start_epoch) << 23; | ||
| result := result | (shard_id << 10); | ||
| result := result | (seq_id); | ||
| END; | ||
| $$ LANGUAGE PLPGSQL;""" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| from django.db import connections, DatabaseError, transaction | ||
| from django.conf import settings | ||
| from django_sharding_library.sql import postgres_shard_id_function_sql | ||
| from django.db.models import signals | ||
|
|
||
|
|
||
| def create_postgres_global_sequence(sequence_name, db_alias, reset_sequence=False): | ||
| cursor = connections[db_alias].cursor() | ||
| sid = transaction.savepoint(db_alias) | ||
| try: | ||
| cursor.execute("CREATE SEQUENCE %s;" % sequence_name) | ||
| except DatabaseError: | ||
| transaction.savepoint_rollback(sid, using=db_alias) | ||
| if reset_sequence: | ||
| cursor.execute("SELECT setval('%s', 1, false)" % (sequence_name,)) | ||
| else: | ||
| transaction.savepoint_commit(sid, using=db_alias) | ||
| cursor.close() | ||
|
|
||
|
|
||
| def create_postgres_shard_id_function(sequence_name, db_alias, shard_id): | ||
| cursor = connections[db_alias].cursor() | ||
| cursor.execute(postgres_shard_id_function_sql % {'shard_epoch': settings.SHARD_EPOCH, | ||
| 'shard_id': shard_id, | ||
| 'sequence_name': sequence_name}) | ||
| cursor.close() | ||
|
|
||
|
|
||
| def register_migration_signal_for_model_receiver(model, function, dispatch_uid=None): | ||
| signals.pre_migrate.connect(function, sender=model, dispatch_uid=dispatch_uid) | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why pre_migrate? Is there a check that it was successful or a chance there will be a silent failure such that one node doesn't have it?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It will get a DatabaseError if it fails. Its in pre-migrate because its the only common area to add a migration step without editing the migration file itself after its produced, which I don't think is a good feature for a library since it adds complexity to using the library that can be taken care of automatically in the pre migration. Also, it has to be in the pre migrate because if one of the postgres fields gets created and the stored procedure doesnt exist yet, it will raise a programmingerror. Doing it in a post migration or anywhere else doesn't work.
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm definitely don't think we need to force users to modify sharded tables. On Tue, May 3, 2016, 9:02 PM Titus Peterson notifications@github.com
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As far as I can find in the documentation, there is no way to force a dependency because of a field when a user run makemigrations. I also asked in the django irc and the response from the admin there at the time was that he didn't think it was possible. This is exactly why the pre-migration signal was added in though, its for pre-migration processing. I am pretty sure this is the "django" way of doing it short of writing a migration, putting it in the app, and figuring out how to force it as a dependency on another app based on arbitrary, non-built-in logic (for example, foreign key fields have special logic already built into the migration builder). I am not sure I can think of a better use case of the pre migration signal than this
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let me know if you find something to handle a dependency, I actually already have a migration written to do this that I could include if we figure it out (in my current project, I was adding the migration manually to the automated migration files before I wrote this)
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Like I said, I have no ideas, make a special note in your docs about this. Having to run this. Ideally I'd also like to add a management command to run this, as well as one to check that ti was run. Since the django way feels a bit hacky to me.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I updated the docs to indicate this is happening. I can add a migrate command in the future: this is pretty failure-safe. If for any reason the required function or sequence do not exist, postgres itself will error out and send the error message to Django. But I agree, a manage command is in order, at least for checking that the function exists and is working properly. Will do that in a future pull request.
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add the management command in this one, at least a rudimentary one. ipdb> cursor.execute("SELECT count(*) FROM pg_class c WHERE c.relkind = '%s' and c.relname = '%s';" % ('S', sequence_name))
ipdb> cursor.fetchone()
(1L,)and just take a sequence name (or whatever requirements to generate the ones you use) and the database to check. As well as one to create the sequence and to reset it, using the code you already wrote.
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. well I supose it can be in a different PR but I don't want to cut a release till it's in. |
||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,7 +2,7 @@ | |
|
|
||
| In order to shard your database, one of the first decisions to makee is how you assign identifiers to the sharded objects. While it is not required, it is highly recommended that you choose a unique identifier. The main reason here being that you may want to either move data across shards later or that you may choose to analyze data across various shards for analytics and you will have to differentiate those objects before moving them to another server. | ||
|
|
||
| This repository is initially shipping with two strategies but you may impliment your own. The base requirement at the moment is that you define a class like this: | ||
| This repository is initially shipping with three strategies but you may impliment your own. The base requirement for defining your own strategy at the moment is that you define a class like this: | ||
|
|
||
| ```python | ||
| class BaseIDGenerationStrategy(object): | ||
|
|
@@ -22,6 +22,7 @@ The two included in the package are: | |
|
|
||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't it say
Also, I wonder if one could clarify what is enumerated here, since all these are mentioned above: strategies, additional arguments, or requirements. (<-- this is me being very pedantic)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are right, it should say three. I will update. What do you mean clarify what is enumerated? There are no additional requirements other than what is listed, which basically includes Postgres and defining a shard epoch time. All of the exact requirements are in the Using section for this feature, which is in the ShardingAModel.md file. If thats not what you mean, let me know! |
||
| 1. Use an autoincrement field to mimic the way a default table handles the operation | ||
| 2. Assign each item a UUID with the shard name appended to the end. | ||
| 3. A postgres-specific field that works similarly to Django's auto field, but in a shard safe way (only works for Postgres, don't try it with anything else!) | ||
|
|
||
| ##### The Autoincrement Method | ||
|
|
||
|
|
@@ -33,6 +34,10 @@ Note: The MySQL implementation uses a single row to accomplish this task while P | |
|
|
||
| While the odds of a UUID collision are very low, it is still possible and so we append the database shard name as a way to guarantee that they remain unique. The only drawback to this method is that the items cannot be moved across shards. However, it is the recommendation of the author that you refrain from shard rebalancing and instead focus on maintaining lots of shards rather than worry about balancing few large ones. | ||
|
|
||
| ##### The PostgresShardGeneratedIDField Method | ||
|
|
||
| This strategy is an automated implementation of how Instagram does shard IDs. It uses built-in Postgres functionality to generate a shard-safe ID on the database server at the time of the insert. A stored procedure is created and uses a user-defined epoch time and a shard ID to make sure the IDs it generates are unique. This method (currently) supports up to 8191 shards and up to 1024 inserts per millisecond, which should be more than enough for most use cases, up to and including Instagram scale usage! | ||
|
|
||
|
|
||
| They recently wrote a [lovely article](https://engineering.pinterest.com/blog/sharding-pinterest-how-we-scaled-our-mysql-fleet) about their sharding strategy. They use a 64 bit ID that works like so: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent more time reading that code for dependencies and it isn't worth trying to hack into it, given they don't expose any good hooks for us to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah its unfortunate that django has opted to keep migrations behind a closed curtain it seems like. Will look at updating the docs today.