Changes to convert dynamoDb number type to a python decimal type, to pre... #890

Closed
wants to merge 1 commit into from

4 participants

@elee-nst

...serve the 38 decimal digits that dynamoDb supports (after the decimal point).

This avoids converting the number to a float or int, which causes problems since the value no longer matches what is in dynamoDb.

See boto issue 873 for more details.
#873 (comment)
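
To make the mismatch concrete, here is a minimal sketch comparing the two conversions. The `convert_num` body mirrors the helper this change removes from `boto/dynamodb/types.py`; the sample value is made up for illustration and is not taken from issue 873.

```python
from decimal import Decimal

def convert_num(s):
    # Mirrors the helper this pull request removes: float for anything with a '.'
    if '.' in s:
        return float(s)
    return int(s)

# Hypothetical value as DynamoDB would return it (numbers arrive as strings).
wire_value = '0.12345678901234567890123456789'

as_float = convert_num(wire_value)
as_decimal = Decimal(wire_value)

# The float cannot hold all the digits, so it does not round-trip to the same text.
print(repr(as_float) == wire_value)   # False
print(str(as_decimal) == wire_value)  # True
```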

@elee-nst elee-nst Changes to convert dynamoDb number type to a python decimal type, to …
…preserve the 38 decimal digits that dynamoDb supports (after the decimal point).

This avoids converting the number to a float or int, which causes problems since the value no longer matches what is in dynamoDb.

See boot issue 873 for more details.
boto#873 (comment)
64e3c06
@garnaat
Owner

I should have used Decimal to begin with. That was my mistake. I'm trying to figure out the impact, if any, to existing users. Any thoughts on that? I'll do some local testing, too.

@elee-nst

Some thoughts from looking at the decimal docs :


If existing users were using the results for arithmetic operations they might see TypeError exceptions (since the returned item is now a Decimal instead of an int or float). It looks like Decimal() + int() is fine, but Decimal() + float() gives an exception. Casting a Decimal to a float (or vice-versa) seems simple, although it is an extra step.

>>> from decimal import *
>>> a = Decimal(10)
>>> b = float(1)
>>> c = a + b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
>>> b2 = Decimal(b)
>>> c = a + b2
>>> c
Decimal('11')
>>> c = float(a) + b
>>> c
11.0

http://docs.python.org/library/decimal.html

Decimal objects cannot generally be combined with floats in arithmetic operations: an attempt to add a Decimal to a float, for example, will raise a TypeError. There’s one exception to this rule: it’s possible to use Python’s comparison operators to compare a float instance x with a Decimal instance y. ...


I guess that if an application switched to using Decimal instead of floats it would require more memory. For me that is a "don't care" and I guess for most people (i.e. using dynamoDB more for "high-availability" than for massive storage). Actually for me it is a plus since we are storing long numeric keys and want to keep all the digits!

If value is a float, the binary floating point value is losslessly converted to its exact decimal equivalent. This conversion can often require 53 or more digits of precision. For example, Decimal(float('1.1')) converts to Decimal('1.100000000000000088817841970012523233890533447265625').


This is more of a "future" issue, than a backward compatibility issue, but I noticed this in the Decimal docs. I was wondering if an application took in arbitrary Decimal values, and stored them in dynamoDB, what would happen for the special values below (i.e. what would dynamoDB do with them). There is already a conversion when storing things in dynamoDB anyway (38 digits max after the decimal point, etc.) so you can't expect an arbitrary Decimal to come back unchanged but it would be nice to figure out the limits.

9.4.5.2. Special values
The number system for the decimal module provides special values including NaN, sNaN, -Infinity, Infinity, and two zeros, +0 and -0.
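
Nothing in this thread establishes what DynamoDB does with these special values, so the following is only a sketch of how a caller could screen them out before storing. `is_storable` is a hypothetical helper name; `Decimal.is_finite()` is the standard-library method it relies on.

```python
from decimal import Decimal

def is_storable(value):
    # Hypothetical helper: reject NaN, sNaN and +/-Infinity before sending.
    return value.is_finite()

print(is_storable(Decimal('1.5')))        # True
print(is_storable(Decimal('NaN')))        # False
print(is_storable(Decimal('-Infinity')))  # False

# Negative zero is finite, and str() keeps its sign.
print(str(Decimal('-0')))                 # -0
```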


My $0.02 is that Decimal is a good match to the large numbers allowed by DynamoDB and is the best default.

For backward compatibility maybe there could be a configuration option to restore the "old behavior", so if someone upgrades and sees exceptions (that they don't have time to fix right away) they can set a config option and quickly get the old behavior until they have time to deal with the exceptions.

Regards,
Eric

@disruptek

I think I'd prefer to extend Decimal to allow it to warn (or die) during truncation, handle float/int math, and implement handling for the special values. Allowing the user to override our "DynamoNumber" class would allow for both lightweight (float?) use or more sophisticated implementations.

@elee-nst

Allowing the user to override our "DynamoNumber" class would allow for both lightweight (float?) use or more sophisticated implementations.

If there was a config option to use Decimal that would work for me also.

I haven't looked at the boto config options, but if it could be passed in
boto.connect_dynamodb( aws_access_key_id=zzz, aws_secret_access_key=zzz, dynamoNumber='Decimal')
it would be very convenient (one line in the code). A separate config file would be less convenient, but would work too.
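
As a rough sketch of what such an option could look like internally: the response hook could be built from a caller-supplied number type. `make_item_object_hook` and its `number_type` argument are hypothetical names, not existing boto parameters; the hook body follows the `item_object_hook` function shown in this pull request's diff.

```python
from decimal import Decimal

def make_item_object_hook(number_type=Decimal):
    # number_type is a hypothetical option; it is not an existing boto parameter.
    def item_object_hook(dct):
        if 'S' in dct:
            return dct['S']
        if 'N' in dct:
            return number_type(dct['N'])
        if 'SS' in dct:
            return set(dct['SS'])
        if 'NS' in dct:
            return set(map(number_type, dct['NS']))
        return dct
    return item_object_hook

exact_hook = make_item_object_hook()        # Decimal by default
legacy_hook = make_item_object_hook(float)  # opt back into lossy floats

print(repr(exact_hook({'N': '33.456'})))   # Decimal('33.456')
print(repr(legacy_hook({'N': '33.456'})))  # 33.456
```

Passing a callable rather than a string such as 'Decimal' would also leave room for a user-defined number class.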

@elee-nst elee-nst closed this
@elee-nst elee-nst reopened this
@elee-nst

oops, I meant to comment, not "close and comment".

@garnaat
Owner

I've made some changes locally and I'm testing, but when I try to add a Decimal attribute to an Item and then store it, I'm getting an error from DynamoDB.

>>> i = table.new_item('mitch', 3)
>>> i['fie'] = Decimal(33.456)
>>> i.put()

Gives me:

DynamoDBValidationError: DynamoDBValidationError: 400 Bad Request
{u'message': u'Attempting to store more than 38 significant digits in a Number', u'__type': u'com.amazon.coral.validate#ValidationException'}

This is what is actually sent over the wire:

{"Item": {"foo": {"S": "mitch"}, "bar": {"N": "3"}, "fie": {"N": "33.4560000000000030695446184836328029632568359375"}}, "TableName": "foobar"}
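
The extra digits appear to come from the float literal rather than from Decimal itself. A small sketch, assuming Python 2.7 or later (where `Decimal` accepts a float directly), comparing construction from a float with construction from a string:

```python
from decimal import Decimal

from_float = Decimal(33.456)     # exact expansion of the binary float
from_string = Decimal('33.456')  # exactly the digits typed

# as_tuple().digits holds every significant digit of the coefficient.
print(len(from_float.as_tuple().digits) > 38)  # True
print(len(from_string.as_tuple().digits))      # 5
```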

@elee-nst

Here is my understanding:

From: http://docs.python.org/library/decimal.html#decimal.getcontext

By default Decimal will store the "exact" value (i.e. tries to not lose precision when doing conversions).

There is a
getcontext().prec = 28
setting to tell decimal how many decimal digits to store. This is used when computing the results of an operation with decimals (adding/multiplying, etc.).

There is also a .quantize() method that will trim to the desired number of digits.

Since amazon dynamodb supports 38 decimal digits of precision (after the decimal point), I tried setting the decimal.getcontext().prec to 38. But, that didn't give 38 digits after the decimal point, I needed to set it to 40 to give room for the integer portion also, "33"

>>> import decimal
>>> i = decimal.Decimal(33.456)
>>> i
Decimal('33.4560000000000030695446184836328029632568359375')

>>> decimal.getcontext()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])

>>> decimal.getcontext().prec = 40

>>> # The quantize method now trims to 38 digits after the decimal point.
>>> i2 = decimal.Decimal(33.456).quantize(decimal.Decimal('0.00000000000000000000000000000000000001'))

>>> # Multiplying by one will also cause it to trim the result to the 'prec' specified.
>>> i * 1
Decimal('33.45600000000000306954461848363280296326')

One trick seems to be how to specify 38 digits AFTER the decimal point (which is the dynamoDB limit). But this looks like the right place to tweak things in the decimal library.
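
An alternative sketch that avoids changing the global context: round through a separate `Context` object. This assumes the limit is 38 significant digits in total, which is how the validation error quoted earlier in this thread words it, rather than 38 digits after the decimal point. `round_for_dynamodb` is a hypothetical helper name.

```python
from decimal import Context, Decimal

# Assumption: 38 significant digits in total, per the DynamoDB error message.
DYNAMODB_CONTEXT = Context(prec=38)

def round_for_dynamodb(value):
    # create_decimal applies the context's precision and rounding,
    # unlike the plain Decimal() constructor.
    return DYNAMODB_CONTEXT.create_decimal(value)

rounded = round_for_dynamodb(Decimal(33.456))
print(len(rounded.as_tuple().digits))  # 38
```

Because the context is local, other code that relies on the default precision of 28 is not affected.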

Rgds,
Eric

@elee-nst

Or, perhaps it could be trimmed to 38 digits (after the decimal point) after it is converted to a dynamo string?

def dynamize_value(val):
    """
    Take a scalar Python value and return a dict consisting
    of the Amazon DynamoDB type specification and the value that
    needs to be sent to Amazon DynamoDB. If the type of the value
    is not supported, raise a TypeError
    """
    def _str(val):
        """
        DynamoDB stores booleans as numbers. True is 1, False is 0.
        This function converts Python booleans into DynamoDB friendly
        representation.
        """
        if isinstance(val, bool):
            return str(int(val))
        return str(val)

perhaps replacing str(val) with something like:

str(val)[ 0 : str(val).find('.')+39 ]
i.e. return everything from the start of the string to 38 digits after the .

I'm not certain that is 100% right. I think str(val).find returns -1 if there isn't a decimal point, so the slice would end at -1 + 39 = 38 and a big positive integer might be trimmed to 38 characters?

maybe something like the code below (warning, I didn't have time to test this runs!).

string = str(val)
hasDot = string.find('.')
if hasDot > -1:
    string = string[0 : hasDot+39]
return string
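
A standalone version of the same idea that can be run and checked. `trim_fraction` is a hypothetical helper, not part of boto; it truncates rather than rounds, and it assumes the string is in plain notation with no exponent.

```python
def trim_fraction(text, places=38):
    # Hypothetical helper: keep at most `places` digits after the decimal point.
    # This truncates rather than rounds, and assumes plain notation (no exponent).
    dot = text.find('.')
    if dot == -1:
        return text  # no decimal point: leave integers untouched
    return text[:dot + 1 + places]

print(trim_fraction('1.23456', places=3))  # 1.234
print(trim_fraction('12345', places=3))    # 12345
```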

@garnaat
Owner

I could trim the string prior to sending it to DynamoDB. I'm just wondering what would be better from a user perspective. Trimming behind the scenes may be confusing because the value actually stored in DynamoDB would be different than the value the user sees in Python. Not trimming would probably result in errors like this from DynamoDB but it would force the user to explicitly deal with it in Python. I'm kind of leaning towards the latter approach. Comments?

@disruptek

Leaving the handling to the user seems like the right approach to me; it's more pythonic and doesn't obscure semantics that we can't automate expertly. If we don't like my idea of a DynamoNumber class that can handle that expertise and be overridden by the user, how about designing the number handling code as abstractly as possible, so that non-Decimal instances can be consumed even if Decimal instances are produced?

@garnaat
Owner

@disruptek It's not that I don't like the idea of a DynamoNumber class, I just haven't had time to implement it.

@elee-nst

One thought (which I have not looked at much, so it might not be practical).

There might be cases (like 33.456) where a user is expecting dynamoDB to contain exactly 33.456, not the "nearest floating point number" of '33.45600000000000306954461848363280296326'.

Since dynamoDB sends everything as a string when talking to the cloud, perhaps there could be an option to input/read out a number as a string (where the end user has full control over the number of digits, etc.).

The only hiccup I could see is that dynamoDB will remove leading/trailing 0's (if you send 1.234000000 I believe it will store it as 1.234) so you could not always do an exact string match between what you sent and what you get back. But that is inherent in dynamoDB; any library interfacing to it would have this behavior.
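
A short sketch of how a caller could compare such values without being tripped up by the stripped zeros, assuming the stored value comes back as described above:

```python
from decimal import Decimal

sent = Decimal('1.234000000')
returned = Decimal('1.234')  # what this comment expects DynamoDB to hand back

print(str(sent) == str(returned))  # False: the text differs
print(sent == returned)            # True: the numeric value is the same
print(sent.normalize())            # 1.234
```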

@disruptek

I like that idea. Maybe what we really want is for our internal encoding of numeric values to be strings. This matches the DynamoDB model more explicitly and allows for the "33.456" string assignment without any lossy casting. I'd like to contribute some work on this, but I'm pretty well slammed at the moment.

@elee-nst
@disruptek

I agree that Layer1 should match the API as closely as possible, but as with any protocol design, I think we want to be strict in what we produce and loose in what we consume. I don't see the harm in performing Layer1 string conversions on numeric types that we find in lieu of strings, but any lossy precision conversion or under/over-flow should raise an exception.

We can perform the conversion to Decimal using a custom Context that asserts those conditions. This will break existing code that is using lossy float storage, but it will expertly handle strings, (exact) floats, and numeric types including bool and Decimal.
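
A minimal sketch of what such a Context might look like, assuming the limit is the 38 significant digits named in the DynamoDB error quoted earlier in this thread. `STRICT_CONTEXT` and `to_dynamodb_number` are hypothetical names, and the exponent limits are left at the library defaults rather than matched to DynamoDB's range.

```python
from decimal import Context, Decimal, Inexact, Overflow, Underflow

# Assumption: 38 significant digits, per the DynamoDB error quoted in this thread.
# Trapping Inexact turns any lossy rounding into an exception.
STRICT_CONTEXT = Context(prec=38, traps=[Inexact, Overflow, Underflow])

def to_dynamodb_number(value):
    # Strings, ints and Decimals convert directly; floats go through their
    # exact binary expansion first, so an "untidy" float fails loudly.
    if isinstance(value, float):
        value = Decimal(value)
    return STRICT_CONTEXT.create_decimal(value)

print(to_dynamodb_number('33.456'))  # 33.456
print(to_dynamodb_number(0.5))       # 0.5

try:
    to_dynamodb_number(33.456)
except Inexact:
    print('33.456 as a float needs more than 38 digits')
```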

Decimal does seem to be the appropriate type for wrapping Layer2 numeric values retrieved from DynamoDB, as it will yield lossless reads. The user can drop to Layer1 if they want to perform their own type conversions with an eye towards resource conservation or controlled loss of precision.

Again, this almost certainly will yield new exceptions in existing code, but the reality is that the sooner we make a change, the better.

@jamesls
Owner

Closing in favor of #1183, which supersedes this.

@jamesls jamesls closed this
Commits on Jul 30, 2012
Showing with 7 additions and 13 deletions.
  1. +4 −3 boto/dynamodb/layer2.py
  2. +3 −10 boto/dynamodb/types.py
7 boto/dynamodb/layer2.py
@@ -26,7 +26,8 @@
 from boto.dynamodb.schema import Schema
 from boto.dynamodb.item import Item
 from boto.dynamodb.batch import BatchList, BatchWriteList
-from boto.dynamodb.types import get_dynamodb_type, dynamize_value, convert_num
+from boto.dynamodb.types import get_dynamodb_type, dynamize_value
+from decimal import Decimal
 def item_object_hook(dct):
@@ -40,11 +41,11 @@ def item_object_hook(dct):
     if 'S' in dct:
         return dct['S']
     if 'N' in dct:
-        return convert_num(dct['N'])
+        return Decimal(dct['N'])
     if 'SS' in dct:
         return set(dct['SS'])
     if 'NS' in dct:
-        return set(map(convert_num, dct['NS']))
+        return set(map(Decimal, dct['NS']))
     return dct
 def table_generator(tgen):
13 boto/dynamodb/types.py
@@ -24,10 +24,11 @@
 Some utility functions to deal with mapping Amazon DynamoDB types to
 Python types and vice-versa.
 """
+from decimal import Decimal
-
+#120717 elee, add Decimal to the list of types recognized as a number.
 def is_num(n):
-    types = (int, long, float, bool)
+    types = (int, long, float, bool, Decimal)
     return isinstance(n, types) or n in types
@@ -35,14 +36,6 @@ def is_str(n):
     return isinstance(n, basestring) or (isinstance(n, type) and issubclass(n, basestring))
-def convert_num(s):
-    if '.' in s:
-        n = float(s)
-    else:
-        n = int(s)
-    return n
-
-
 def get_dynamodb_type(val):
     """
     Take a scalar Python value and return a string representing