Merged
72 commits
4e7f7f8
Added manager and queryset for automatic shard selection in the router
May 1, 2016
5c8d7c1
Updated custom ShardQueryset to call super() functions to help handle…
May 1, 2016
e3d6849
Updated model_config decorator to take an optional "sharded_by_field"…
May 1, 2016
93fda3d
Updated the router to use the new optional sharded by field on the mo…
May 1, 2016
39a450a
Fixed a bug with staticmethod not working properly in the manager
May 1, 2016
2f0d7f2
Fixed a bug with staticmethod not working properly in the manager
May 1, 2016
10b201c
Updated logic for grabbing the shard in the router when the instance …
May 1, 2016
b4955eb
Updated the settings helper database_configs to put a shard id proper…
May 3, 2016
bf13c3e
Fixed a test bug when using SQLite where the BigAutoField would fail …
May 3, 2016
36722b6
Fixed another BigAutoField bug with SQLite being too large to store
May 3, 2016
f272356
Added common local dev files to gitignore
May 3, 2016
c6aca05
Added custom postgres id field that will automatically generate shard…
May 3, 2016
87688a9
Added unit tests for custom postgres shard id field
May 3, 2016
aafe214
Fixed a syntax error with bit manipulation
May 3, 2016
60ca204
Fixed another syntax error with bit manipulation
May 3, 2016
37e674f
Added logging to troubleshoot CI
May 3, 2016
ff8e003
More logging for CI
May 3, 2016
b9c7742
Fixed formatting issue with psycopg2. Removed logging. Updated unit t…
May 3, 2016
e534b5f
Added another test to verify the id coming back from the postgres id …
May 3, 2016
48a5086
Fixed multiple syntax errors in the router where it would look up the…
May 3, 2016
3790972
Created a base model for sharding to go along with the decorator. All…
May 3, 2016
ce7a895
Updated the model instance create function to copy out the kwargs befo…
May 3, 2016
aab706c
Updated tests for filter() and create() shard auto detection (no more…
May 3, 2016
3ecaad9
Fixed a bug in a test where the user created does not get pk=1
May 3, 2016
b01bb6c
Updated constants with all of the backend for each vendor. Updated re…
May 4, 2016
0158889
Updated decorator to raise an error if a person tries to use the post…
May 4, 2016
4e53196
Fixed a bug that would sometimes attempt to run postgres specific mig…
May 4, 2016
285c1dd
Updated docs with Postgres id generator feature sections
May 15, 2016
10ef680
Updated the decorators function with comments from github: updated ex…
May 15, 2016
17ddae3
Refactored the common logic for getting the shard in the router per t…
May 16, 2016
1d34707
Removed the test case settings from github comment, they did not work…
May 16, 2016
bab3c31
Removed the requirement for inheriting from ShardedModel. Added check…
May 19, 2016
e91868c
Updated docs to include the `sharded_by_field` information
May 19, 2016
12b2aff
Merge branch 'postgres-auto-shard-id' into shard-id-and-router-combined
May 19, 2016
b85e02e
Updated error messaging when dealing with an abstract base class and …
May 23, 2016
875de50
Updated test name
May 23, 2016
2a60c73
Removed todo lines for docs that have been completed
May 23, 2016
94ed98c
Fixed a typo
May 23, 2016
927730d
Removed extra import that is no longer needed
May 23, 2016
121571e
Removed some commented out code in the manager
May 23, 2016
eca172a
Merge branch 'master' into shard-by-manager-and-router
May 23, 2016
1285872
Updated tests to make sure exceptions are being raised correctly when …
May 23, 2016
3534a2f
Added more decorator tests
May 23, 2016
5bebbd7
Added test db settings
May 23, 2016
62b15db
Removed delete()'s from tests, travis did not error with the test db …
May 23, 2016
d42ff2d
Merge branch 'master' into postgres-auto-shard-id
May 24, 2016
bbcac79
Fixed an import bug related to Django 1.8
May 24, 2016
dcc3610
Merge branch 'postgres-auto-shard-id' into shopventory-branch
May 25, 2016
fd0f929
Merge branch 'master' into shard-by-manager-and-router
May 26, 2016
838eb4e
Merge branch 'master' into postgres-auto-shard-id
May 26, 2016
9363f03
Commented out test settings again, pretty sure its breaking the CI
May 26, 2016
cf22b24
Merge branch 'shard-by-manager-and-router' into shopventory-branch
May 27, 2016
9a47ef7
Merge branch 'postgres-auto-shard-id' into shopventory-branch
May 27, 2016
de130ef
Fixed an issue with tests locally. Moved an import to top of file in …
May 27, 2016
f23a964
Removed a duplicate function declaration in the docs
Jun 6, 2016
f819d0a
Removed TestCase settings overrides in runtests. The changes are not …
Jun 6, 2016
88fec2c
Removed duplicate function declaration in docs
Jun 6, 2016
358bb94
Removed TestCase import from runtests: not needed for testing with Tr…
Jun 6, 2016
63fce0c
Merge branch 'shard-by-manager-and-router' into shopventory-branch
Jun 6, 2016
a3f27f1
Merge branch 'master' into shopventory-branch
Jun 6, 2016
0d16903
Added Postgres specific ID field functionality:
Jun 6, 2016
f4e7e08
Added tests for Postgres-specific ID field
Jun 6, 2016
9d3f513
Updated docs with postgres-specific ID information
Jun 6, 2016
85ce1fe
Added more common development files for git to ignore
Jun 6, 2016
10ed281
Updated runtests.py with required settings for postgres ID field testing
Jun 6, 2016
ef1b6a5
Added a foreign key field that can handle Postgres generated ID field…
Jul 14, 2016
a402efa
Merge branch 'postgres-auto-shard-id' into shopventory-branch
Jul 14, 2016
c76760d
Added one-to-one field for Django to work correctly with
Jul 14, 2016
49bc198
Merge branch 'postgres-auto-shard-id' into shopventory-branch
Jul 14, 2016
43b465f
Added rel_db_type function to postgres shard ID field
Jul 15, 2016
de1fc41
Merge branch 'postgres-auto-shard-id' into shopventory-branch
Jul 15, 2016
e5b5101
Fixed formatting in migration command for 1.10
Jul 15, 2016
4 changes: 4 additions & 0 deletions .gitignore
@@ -30,3 +30,7 @@ _book

# node
node_modules

# Local dev
Dockerfile
docker-compose.yml
6 changes: 3 additions & 3 deletions django_sharding_library/constants.py
@@ -1,4 +1,4 @@
class Backends(object):
MYSQL = 'django.db.backends.mysql'
POSTGRES = 'django.db.backends.postgresql_psycopg2'
SQLITE = 'django.db.backends.sqlite3'
MYSQL = ('django.db.backends.mysql', 'django.contrib.gis.db.backends.mysql')
POSTGRES = ('django.db.backends.postgresql_psycopg2', 'django.db.backends.postgresql', 'django.contrib.gis.db.backends.postgis')
SQLITE = ('django.db.backends.sqlite3', 'django.contrib.gis.db.backends.spatialite')
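Because each `Backends` attribute is now a tuple of engine paths, callers have to test membership with `in` rather than equality with `==`. A minimal sketch of that check follows; the helper name `is_postgres_engine` and the standalone `POSTGRES_ENGINES` constant are hypothetical and not part of the library.

```python
# Engine paths copied from the POSTGRES tuple in the diff above.
POSTGRES_ENGINES = (
    'django.db.backends.postgresql_psycopg2',
    'django.db.backends.postgresql',
    'django.contrib.gis.db.backends.postgis',
)


def is_postgres_engine(database_settings):
    """Return True when a DATABASES entry uses one of the Postgres engines.

    Hypothetical helper for illustration; `database_settings` is a dict
    shaped like one value of Django's DATABASES setting.
    """
    return database_settings.get('ENGINE') in POSTGRES_ENGINES


print(is_postgres_engine({'ENGINE': 'django.contrib.gis.db.backends.postgis'}))
print(is_postgres_engine({'ENGINE': 'django.db.backends.sqlite3'}))
```

Keeping the GIS variants in the same tuple means a PostGIS database is treated the same as a plain Postgres one.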
34 changes: 28 additions & 6 deletions django_sharding_library/decorators.py
@@ -1,9 +1,15 @@
from django.conf import settings
from django.apps import apps
from django_sharding_library.constants import Backends
from django.utils.six import iteritems
from django.db.models import Manager

from django_sharding_library.exceptions import NonExistentDatabaseException, ShardedModelInitializationException
from django_sharding_library.fields import ShardedIDFieldMixin
from django_sharding_library.manager import ShardManager
from django.db.models import Manager
from django_sharding_library.fields import ShardedIDFieldMixin, PostgresShardGeneratedIDField
from django_sharding_library.utils import register_migration_signal_for_model_receiver

PRE_MIGRATION_DISPATCH_UID = "PRE_MIGRATE_FOR_MODEL_%s"


def model_config(shard_group=None, database=None, sharded_by_field=None):
@@ -26,13 +32,29 @@ def configure(cls):
)
setattr(cls, 'django_sharding__database', database)

postgres_shard_id_fields = list(filter(lambda field: issubclass(type(field), PostgresShardGeneratedIDField), cls._meta.fields))
if postgres_shard_id_fields:
database_dicts = [settings.DATABASES[database]] if database else [db_settings for db, db_settings in
iteritems(settings.DATABASES) if
db_settings["SHARD_GROUP"] == shard_group]
if any([database_dict['ENGINE'] not in Backends.POSTGRES for database_dict in database_dicts]):
raise ShardedModelInitializationException(
'You cannot use a PostgresShardGeneratedIDField on a non-Postgres database.')

register_migration_signal_for_model_receiver(apps.get_app_config(cls._meta.app_label),
Owner

I spent more time reading that code for dependencies and it isn't worth trying to hack into it, given they don't expose any good hooks for us to use.

Contributor Author

Yeah, it's unfortunate that Django seems to have opted to keep migrations behind a closed curtain. Will look at updating the docs today.

PostgresShardGeneratedIDField.migration_receiver,
dispatch_uid=PRE_MIGRATION_DISPATCH_UID % cls._meta.app_label)

if shard_group:
sharded_fields = list(filter(lambda field: issubclass(type(field), ShardedIDFieldMixin), cls._meta.fields))
if not sharded_fields:
raise ShardedModelInitializationException('All sharded models require a ShardedIDFieldMixin.')
if not sharded_fields and not postgres_shard_id_fields:
Owner

update those two exception messages to include that using the PostgresShardGeneratedIDField is also valid.

Contributor Author

Will do that today

raise ShardedModelInitializationException('All sharded models require a ShardedIDFieldMixin or a '
'PostgresShardGeneratedIDField.')

if not list(filter(lambda field: field == cls._meta.pk, sharded_fields)):
raise ShardedModelInitializationException('All sharded models require the ShardedAutoIDField to be the primary key. Set primary_key=True on the field.')
if not list(filter(lambda field: field == cls._meta.pk, sharded_fields)) and not postgres_shard_id_fields:
raise ShardedModelInitializationException('All sharded models require the ShardedAutoIDField or '
'PostgresShardGeneratedIDField to be the primary key. Set '
'primary_key=True on the field.')

if not callable(getattr(cls, 'get_shard', None)):
raise ShardedModelInitializationException('You must define a get_shard method on the sharded model.')
77 changes: 74 additions & 3 deletions django_sharding_library/fields.py
@@ -1,8 +1,15 @@
from django.apps import apps
from django.conf import settings
from django.db.models import AutoField, CharField, ForeignKey
from django.db.models import AutoField, CharField, ForeignKey, BigIntegerField, OneToOneField

from django_sharding_library.constants import Backends
from django.db import connections, transaction, DatabaseError
from django_sharding_library.utils import create_postgres_global_sequence, create_postgres_shard_id_function

try:
from django.db.backends.postgresql.base import DatabaseWrapper as PostgresDatabaseWrapper
except ImportError:
from django.db.backends.postgresql_psycopg2.base import DatabaseWrapper as PostgresDatabaseWrapper


class BigAutoField(AutoField):
@@ -11,12 +18,15 @@ class BigAutoField(AutoField):
9223372036854775807.
"""
def db_type(self, connection):
if connection.settings_dict['ENGINE'] == Backends.MYSQL:
if connection.settings_dict['ENGINE'] in Backends.MYSQL:
return 'serial'
if connection.settings_dict['ENGINE'] == Backends.POSTGRES:
if connection.settings_dict['ENGINE'] in Backends.POSTGRES:
return 'bigserial'
return super(BigAutoField, self).db_type(connection)

def get_internal_type(self):
return "BigIntegerField"


class ShardedIDFieldMixin(object):
"""
@@ -156,3 +166,64 @@ class ShardForeignKeyStorageField(ShardForeignKeyStorageFieldMixin, ForeignKey):
the shard using a pre_save signal.
"""
pass


class PostgresShardGeneratedIDField(AutoField):
"""
A field that uses a Postgres stored procedure to return an ID generated on the database.
"""
def db_type(self, connection, *args, **kwargs):

if not hasattr(settings, 'SHARD_EPOCH'):
raise ValueError("PostgresShardGeneratedIDField requires a SHARD_EPOCH to be defined in your settings file.")

if connection.vendor == PostgresDatabaseWrapper.vendor:
return "bigint DEFAULT next_sharded_id()"
else:
return super(PostgresShardGeneratedIDField, self).db_type(connection)

def get_internal_type(self):
return 'BigIntegerField'

def rel_db_type(self, connection):
return BigIntegerField().db_type(connection=connection)

@staticmethod
def migration_receiver(*args, **kwargs):
sequence_name = "global_id_sequence"
Owner

Oh, it's one sequence per DB, as opposed to per model?

Contributor Author

Correct, it's per DB. There is no reason to have one per model: no DB (that I am aware of) can insert more than 1024 rows per millisecond, which is what this ID generator is designed to handle!

Owner

When we were deciding on that strategy, we just weren't sure about ID growth and sharing them, although now that I consider the actual size of a bigint, I have no idea why the decision was made not to do that.

db_alias = kwargs.get('using')
if not db_alias:
raise EnvironmentError("A pre-migration receiver did not receive a database alias. "
"Perhaps your app is not registered correctly?")
if settings.DATABASES[db_alias]['ENGINE'] in Backends.POSTGRES:
shard_id = settings.DATABASES[db_alias].get('SHARD_ID', 0)
create_postgres_global_sequence(sequence_name, db_alias, True)
create_postgres_shard_id_function(sequence_name, db_alias, shard_id)


class PostgresShardForeignKey(ForeignKey):
def db_type(self, connection):
# The database column type of a ForeignKey is the column type
# of the field to which it points. An exception is if the ForeignKey
# points to an AutoField/PositiveIntegerField/PositiveSmallIntegerField,
# in which case the column type is simply that of an IntegerField.
# If the database needs similar types for key fields however, the only
# thing we can do is making AutoField an IntegerField.
rel_field = self.target_field
if rel_field.get_internal_type() is "BigIntegerField":
Owner

Did you mean to put is and not == for the string comparison?

return BigIntegerField().db_type(connection=connection)
return super(PostgresShardForeignKey, self).db_type(connection)


class PostgresShardOneToOne(OneToOneField):
def db_type(self, connection):
# The database column type of a ForeignKey is the column type
# of the field to which it points. An exception is if the ForeignKey
# points to an AutoField/PositiveIntegerField/PositiveSmallIntegerField,
# in which case the column type is simply that of an IntegerField.
# If the database needs similar types for key fields however, the only
# thing we can do is making AutoField an IntegerField.
rel_field = self.target_field
if rel_field.get_internal_type() is "BigIntegerField":
Owner

ditto

return BigIntegerField().db_type(connection=connection)
return super(PostgresShardOneToOne, self).db_type(connection)
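The two `db_type` overrides above compare the internal type with `is`, which the review comments question. `is` tests object identity, and whether two equal strings are the same object is an interpreter detail, so equality is the safer test. A minimal sketch of the value comparison, using a hypothetical helper name that is not part of the PR:

```python
def targets_bigint_column(internal_type):
    """Return True when the related field reports a BigIntegerField type.

    Compares by value with ==; identity (`is`) is not guaranteed to hold
    for two equal strings.
    """
    return internal_type == "BigIntegerField"


# A string assembled at runtime equals the literal, but it does not have
# to be the same object in memory.
runtime_value = "".join(["BigInteger", "Field"])
print(targets_bigint_column(runtime_value))    # True
print(targets_bigint_column("IntegerField"))   # False
```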
2 changes: 1 addition & 1 deletion django_sharding_library/id_generation_strategies.py
@@ -36,7 +36,7 @@ def get_next_id(self, database=None):
"""
from django.conf import settings
backing_table_db = getattr(self.backing_model, 'database', 'default')
if settings.DATABASES[backing_table_db]['ENGINE'] == Backends.MYSQL:
if settings.DATABASES[backing_table_db]['ENGINE'] in Backends.MYSQL:
with transaction.atomic(backing_table_db):
cursor = connections[backing_table_db].cursor()
sql = "REPLACE INTO `{0}` (`stub`) VALUES ({1})".format(
2 changes: 1 addition & 1 deletion django_sharding_library/management/commands/migrate.py
@@ -17,7 +17,7 @@ def handle(self, *args, **options):
options['database'] = database
# Written in green text to stand out from the surrounding headings
if options['verbosity'] >= 1:
self.stdout.write(self.style.MIGRATE_SUCCESS("\nDatabase: {}\n").format(database))
self.stdout.write(getattr(self.style, "MIGRATE_SUCCESS", getattr(self.style, "SUCCESS"))("\nDatabase: {}\n").format(database))
super(Command, self).handle(*args, **options)

def get_all_but_replica_dbs(self):
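The nested `getattr` on the changed line evaluates its default argument eagerly, so `getattr(self.style, "SUCCESS")` runs even when `MIGRATE_SUCCESS` exists. Whether that can fail depends on which attributes a given Django version's style object exposes, which is not verified here. The sketch below shows a lazy variant of the same fallback. `OldStyle`, `NewStyle` and `success_style` are stand-ins invented for illustration, not Django classes.

```python
class OldStyle(object):
    """Stand-in for a style object that only defines MIGRATE_SUCCESS."""

    def MIGRATE_SUCCESS(self, text):
        return "[green]" + text


class NewStyle(object):
    """Stand-in for a style object that only defines SUCCESS."""

    def SUCCESS(self, text):
        return "[green]" + text


def success_style(style):
    """Return the first available success formatter.

    The SUCCESS lookup only happens when MIGRATE_SUCCESS is missing.
    """
    formatter = getattr(style, "MIGRATE_SUCCESS", None)
    if formatter is None:
        formatter = getattr(style, "SUCCESS")
    return formatter


print(success_style(OldStyle())("Database: default"))
print(success_style(NewStyle())("Database: default"))
```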
12 changes: 11 additions & 1 deletion django_sharding_library/settings_helpers.py
@@ -70,8 +70,9 @@ def database_configs(databases_dict):
}
"""
configuration = {}
shard_id_hash = {} # Keep track of the IDs of the shards currently. Used to help with migrations.
for (databases, is_sharded) in [(databases_dict.get('unsharded_databases', []), False), (databases_dict.get('sharded_databases', []), True)]:
for database in databases:
for idx, database in enumerate(databases):
db_config = database_config(
database['environment_variable'],
database['default_database_url'],
@@ -89,4 +90,13 @@ def database_configs(databases_dict):
)
if db_config:
configuration[replica['name']] = db_config

# We assume the numeric shard ID is constant based on the entries in the configuration helper (we assume
# they won't change order, and that new shards will be appended and not inserted randomly).
# This is noted in the docs; leaving this comment for whoever may work on this in the future.
if is_sharded:
shard_id = shard_id_hash.get(configuration[database['name']]['SHARD_GROUP'], 0)
configuration[database['name']]['SHARD_ID'] = shard_id
shard_id_hash[configuration[database['name']]['SHARD_GROUP']] = shard_id + 1

return configuration
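The shard ID bookkeeping above can be sketched in isolation. This is a simplified illustration under the assumption that each sharded database is described by a `(name, shard_group)` pair; the function name `assign_shard_ids` and the database names are hypothetical.

```python
def assign_shard_ids(sharded_databases):
    """Assign each database a SHARD_ID counting up from 0 within its group.

    `sharded_databases` is an ordered list of (name, shard_group) pairs.
    The position in the list decides the ID, which is why the order of
    the configured shards must stay stable.
    """
    next_id_by_group = {}
    shard_ids = {}
    for name, shard_group in sharded_databases:
        shard_id = next_id_by_group.get(shard_group, 0)
        shard_ids[name] = shard_id
        next_id_by_group[shard_group] = shard_id + 1
    return shard_ids


databases = [
    ("app_shard_001", "default"),
    ("app_shard_002", "default"),
    ("other_shard_001", "other"),
]
print(assign_shard_ids(databases))
```

Inserting a new entry at the front of the list would shift every later ID in that group, so new shards belong at the end.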
16 changes: 16 additions & 0 deletions django_sharding_library/sql.py
@@ -0,0 +1,16 @@
postgres_shard_id_function_sql = """CREATE OR REPLACE FUNCTION next_sharded_id(OUT result bigint) AS $$
DECLARE
start_epoch bigint := %(shard_epoch)d;
seq_id bigint;
now_millis bigint;
shard_id int := %(shard_id)d;
BEGIN
-- there is a typo here in the online example, which is corrected here
SELECT nextval('%(sequence_name)s') %% 1024 INTO seq_id;

SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - start_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;"""
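The bit layout used by `next_sharded_id()` can be read off the shifts in the function: the sequence value modulo 1024 fills the low 10 bits, the shard ID sits in the 13 bits above it, and the milliseconds since the epoch are shifted left by 23. A pure-Python sketch of the same arithmetic, plus a decoder, follows. The function names and the example epoch are assumptions for illustration, not library code.

```python
TIMESTAMP_SHIFT = 23           # 13 shard bits + 10 sequence bits below the timestamp
SHARD_SHIFT = 10
SHARD_MASK = (1 << 13) - 1     # shard IDs 0 through 8191
SEQUENCE_MODULUS = 1024


def make_sharded_id(now_millis, shard_epoch, shard_id, sequence_value):
    """Mirror the arithmetic of next_sharded_id() for illustration."""
    seq_id = sequence_value % SEQUENCE_MODULUS
    result = (now_millis - shard_epoch) << TIMESTAMP_SHIFT
    result |= shard_id << SHARD_SHIFT
    result |= seq_id
    return result


def split_sharded_id(sharded_id, shard_epoch):
    """Recover (creation time in ms, shard ID, sequence slot) from an ID."""
    created_millis = (sharded_id >> TIMESTAMP_SHIFT) + shard_epoch
    shard_id = (sharded_id >> SHARD_SHIFT) & SHARD_MASK
    seq_id = sharded_id % SEQUENCE_MODULUS
    return created_millis, shard_id, seq_id


epoch = 1451606400000  # example epoch in milliseconds (assumed value)
example = make_sharded_id(epoch + 1000, epoch, shard_id=5, sequence_value=1025)
print(example)                           # 8388613121
print(split_sharded_id(example, epoch))  # (1451606401000, 5, 1)
```

A shard ID above 8191 would spill into the timestamp bits, which is where the limit on the number of logical shards comes from.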
32 changes: 32 additions & 0 deletions django_sharding_library/utils.py
@@ -0,0 +1,32 @@
from django.db import connections, DatabaseError, transaction
from django.conf import settings
from django_sharding_library.sql import postgres_shard_id_function_sql
from django.db.models import signals


def create_postgres_global_sequence(sequence_name, db_alias, reset_sequence=False):
cursor = connections[db_alias].cursor()
sid = transaction.savepoint(db_alias)
try:
cursor.execute("CREATE SEQUENCE %s;" % sequence_name)
except DatabaseError:
transaction.savepoint_rollback(sid, using=db_alias)
if reset_sequence:
cursor.execute("SELECT setval('%s', 1, false)" % (sequence_name,))
else:
transaction.savepoint_commit(sid, using=db_alias)
cursor.close()


def create_postgres_shard_id_function(sequence_name, db_alias, shard_id):
cursor = connections[db_alias].cursor()
cursor.execute(postgres_shard_id_function_sql % {'shard_epoch': settings.SHARD_EPOCH,
'shard_id': shard_id,
'sequence_name': sequence_name})
cursor.close()
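`create_postgres_shard_id_function` interpolates `settings.SHARD_EPOCH` with `%(shard_epoch)d`, so the setting must be an integer number of milliseconds. The `runtests.py` value in this PR derives it with `time.mktime`, which interprets the time tuple in the server's local timezone. The sketch below is a timezone-independent alternative using `calendar.timegm`, offered as an assumption about what is wanted rather than as the library's documented approach.

```python
import calendar
from datetime import datetime

# timegm() treats the time tuple as UTC, so the result does not depend on
# the timezone of the machine that loads the settings file.
SHARD_EPOCH = calendar.timegm(datetime(2016, 1, 1).timetuple()) * 1000

print(SHARD_EPOCH)  # 1451606400000
```

Every server that creates the stored procedure then embeds the same epoch, whatever its local timezone.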


def register_migration_signal_for_model_receiver(model, function, dispatch_uid=None):
signals.pre_migrate.connect(function, sender=model, dispatch_uid=dispatch_uid)
Owner

why pre_migrate? Is there a check that it was successful or a chance there will be a silent failure such that one node doesn't have it?

Contributor Author

It will get a DatabaseError if it fails. It's in pre-migrate because that is the only common place to add a migration step without editing the migration file itself after it's produced, which I don't think is a good feature for a library, since it adds complexity to using the library that can be taken care of automatically in the pre-migration.

Also, it has to be in the pre-migrate because if one of the Postgres fields gets created and the stored procedure doesn't exist yet, it will raise a ProgrammingError. Doing it in a post-migration or anywhere else doesn't work.

Owner

Hmm definitely don't think we need to force users to modify sharded tables.
If we can, I have to look into how database migration dependencies work but
could use a migration dependency on a migration shipped with the app?

(Email reply quoting Titus Peterson's comment of May 3, 2016, 9:02 PM: https://github.com/JBKahn/django-sharding/pull/26/files/e534b5ffa93bd79fab7bcb8f90fb315a344f3661#r61980374)

Contributor Author

As far as I can find in the documentation, there is no way to force a dependency because of a field when a user runs makemigrations. I also asked in the Django IRC channel, and the response from the admin there at the time was that he didn't think it was possible.

This is exactly why the pre-migration signal was added, though: it's for pre-migration processing. I am pretty sure this is the "Django" way of doing it, short of writing a migration, putting it in the app, and figuring out how to force it as a dependency on another app based on arbitrary, non-built-in logic (for example, foreign key fields have special logic already built into the migration builder). I am not sure I can think of a better use case for the pre-migration signal than this.

Contributor Author

Let me know if you find something to handle a dependency, I actually already have a migration written to do this that I could include if we figure it out (in my current project, I was adding the migration manually to the automated migration files before I wrote this)

Owner

Like I said, I have no ideas; make a special note in your docs about having to run this. Ideally I'd also like to add a management command to run this, as well as one to check that it was run, since the Django way feels a bit hacky to me.

Contributor Author

I updated the docs to indicate this is happening. I can add a migrate command in the future: this is pretty failure-safe. If for any reason the required function or sequence do not exist, postgres itself will error out and send the error message to Django. But I agree, a manage command is in order, at least for checking that the function exists and is working properly. Will do that in a future pull request.

Owner

add the management command in this one, at least a rudimentary one.
i.e.

ipdb> cursor.execute("SELECT count(*) FROM pg_class c WHERE c.relkind = '%s' and c.relname = '%s';" % ('S', sequence_name))
ipdb> cursor.fetchone()
(1L,)

and just take a sequence name (or whatever requirements to generate the ones you use) and the database to check. As well as one to create the sequence and to reset it, using the code you already wrote.

Owner

Well, I suppose it can be in a different PR, but I don't want to cut a release till it's in.
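The existence check suggested in the thread above could be wrapped in a small helper along these lines. This is a sketch only: `sequence_exists` is a hypothetical name, the cursor is assumed to be a DB-API cursor that accepts `%s` placeholders (as psycopg2 does), and the values are passed as query parameters instead of being interpolated into the SQL string.

```python
def sequence_exists(cursor, sequence_name):
    """Return True if a sequence with this name is listed in pg_class.

    `cursor` is any DB-API cursor connected to the Postgres database to
    check; relkind 'S' marks sequences in the pg_class catalog.
    """
    cursor.execute(
        "SELECT count(*) FROM pg_class c WHERE c.relkind = %s AND c.relname = %s;",
        ('S', sequence_name),
    )
    (count,) = cursor.fetchone()
    return count > 0
```

A management command could call this once per Postgres database alias and report the aliases where the sequence is missing.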



7 changes: 6 additions & 1 deletion docs/components/IDGeneration.md
@@ -2,7 +2,7 @@

In order to shard your database, one of the first decisions to make is how you assign identifiers to the sharded objects. While it is not required, it is highly recommended that you choose a unique identifier. The main reason is that you may later want to move data across shards, or analyze data across several shards, and you will have to differentiate those objects before moving them to another server.

This repository is initially shipping with two strategies but you may implement your own. The base requirement at the moment is that you define a class like this:
This repository is initially shipping with three strategies but you may implement your own. The base requirement for defining your own strategy at the moment is that you define a class like this:

```python
class BaseIDGenerationStrategy(object):
@@ -22,6 +22,7 @@ The two included in the package are:


Shouldn't it say

The three included in the package are

Also, I wonder if one could clarify what is enumerated here, since all these are mentioned above: strategies, additional arguments, or requirements. (<-- this is me being very pedantic)

Contributor Author

You are right, it should say three. I will update.

What do you mean by clarifying what is enumerated? There are no additional requirements other than what is listed, which basically includes Postgres and defining a shard epoch time. All of the exact requirements are in the Using section for this feature, which is in the ShardingAModel.md file. If that's not what you mean, let me know!

1. Use an autoincrement field to mimic the way a default table handles the operation
2. Assign each item a UUID with the shard name appended to the end.
3. A postgres-specific field that works similarly to Django's auto field, but in a shard safe way (only works for Postgres, don't try it with anything else!)

##### The Autoincrement Method

@@ -33,6 +34,10 @@ Note: The MySQL implementation uses a single row to accomplish this task while P

While the odds of a UUID collision are very low, it is still possible and so we append the database shard name as a way to guarantee that they remain unique. The only drawback to this method is that the items cannot be moved across shards. However, it is the recommendation of the author that you refrain from shard rebalancing and instead focus on maintaining lots of shards rather than worry about balancing few large ones.

##### The PostgresShardGeneratedIDField Method

This strategy is an automated implementation of how Instagram does shard IDs. It uses built-in Postgres functionality to generate a shard-safe ID on the database server at the time of the insert. A stored procedure is created that combines a user-defined epoch time, a shard ID, and a sequence value to make sure the IDs it generates are unique. This method (currently) supports up to 8191 shards and up to 1024 inserts per millisecond, which should be more than enough for most use cases, up to and including Instagram-scale usage!

##### Pinterest

They recently wrote a [lovely article](https://engineering.pinterest.com/blog/sharding-pinterest-how-we-scaled-our-mysql-fleet) about their sharding strategy. They use a 64 bit ID that works like so:
11 changes: 11 additions & 0 deletions docs/usage/Migrations.md
@@ -65,3 +65,14 @@ class Command(MigrationCommand):
```

By using the included router, it's as simple as calling migrate on all the primary databases in the system and allowing the system to decide which databases to run the migration on. The above changes were made to make the interface more simple than having to specify all the relevant databases.

### PostgresShardGeneratedIDField Migration Info

This library hooks into the Django migrations and creates (or updates) the necessary stored procedures before every migration. We made it work this way for two reasons:

1. Django does not have a good way to force a field-specific migration dependency without having to edit the migration files themselves after they are generated
2. This allows unit tests to be run on any arbitrary (PostgreSQL) database without any administrative overhead.

The migration hooks should not affect you in any way, but you should be aware that there is a little bit of "magic" going on to make this field work with Django's migrations, without actually being part of the migration file itself.

If the Django team ever makes migrations easier to customize by adding dependency injection based on specific fields, we will update this and add the migration step to your migration files when they are generated!
23 changes: 23 additions & 0 deletions docs/usage/ShardingAModel.md
@@ -111,3 +111,26 @@ CoolGuyShardedModel.objects.filter(user_pk=123, some_field='some_value')
```

Once you've defined your model, we can move onto how to run migrations.

### Using the PostgresShardGeneratedIDField

If you would like to use the PostgresShardGeneratedIDField, there are a few subtle differences and caveats that you need to be aware of.

1. If you define a PostgresShardGeneratedIDField, you should not use another shard ID generation strategy with that model. Additionally, the field should be marked as the primary key. An example of a model with a PostgresShardGeneratedIDField:
```python
@model_config(shard_group='default')
class CoolGuyShardedModel(models.Model):
id = PostgresShardGeneratedIDField(primary_key=True)
cool_guy_string = models.CharField(max_length=120)
user_pk = models.PositiveIntegerField()
```
2. You must define a "SHARD_EPOCH" variable in your Django settings file. This can be any epoch start time you want, but once chosen, should NEVER be changed. Here is an example of what it should look like (which will make your shard epoch Jan 1, 2016):
```python
import time
from datetime import datetime
# other settings go here...
SHARD_EPOCH=int(time.mktime(datetime(2016, 1, 1).timetuple()) * 1000)
```

3. When you are editing your DATABASES settings, the order of the shards MUST be maintained. If you add a new shard, it needs to be added to the end of the list of databases, not to the beginning or middle.
4. There is a maximum number of logical shards supported by this field. You can only have up to 8191 logical shards: if you try to go beyond, you will get duplicate IDs between your shards. Do not try to add more than 8191 shards. If you need more than that, I recommend you choose one of the other ID generation strategies.
3 changes: 3 additions & 0 deletions runtests.py
@@ -1,5 +1,7 @@
import os
import sys
from datetime import datetime
import time

try:
import django
@@ -64,6 +66,7 @@
],
SITE_ID=1,
MIDDLEWARE_CLASSES=(),
SHARD_EPOCH=int(time.mktime(datetime(2016, 1, 1).timetuple()) * 1000),
)
django.setup()

8 changes: 8 additions & 0 deletions tests/migrations/0001_initial.py
@@ -107,4 +107,12 @@ class Migration(migrations.Migration):
name='test',
field=models.ForeignKey(to='tests.UnshardedTestModel'),
),
migrations.CreateModel(
name='PostgresCustomIDModel',
fields=[
('id', django_sharding_library.fields.PostgresShardGeneratedIDField(verbose_name='ID', serialize=False, auto_created=True, primary_key=True)),
('random_string', models.CharField(max_length=120)),
('user_pk', models.PositiveIntegerField()),
],
),
]