Changes to convert dynamoDb number type to a python decimal type, to preserve the 38 decimal digits that dynamoDb supports (after the decimal point) #890

This avoids converting the number to a float or int, which causes problems since the value no longer matches what is in dynamoDb.

See boto issue 873 for more details.
#873 (comment)
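The precision loss this change avoids can be sketched as follows; this is an illustrative reproduction (the wire value is made up), not boto code.

```python
from decimal import Decimal

# DynamoDB sends numbers over the wire as strings. Decoding with float()
# can silently change a high-precision value, while Decimal preserves it.
wire_value = "1.0000000000000000000000000000000000001"  # hypothetical payload

assert str(float(wire_value)) == "1.0"         # the float round-trip loses digits
assert str(Decimal(wire_value)) == wire_value  # the Decimal round-trip is exact
```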


garnaat commented Aug 23, 2012

I should have used Decimal to begin with. That was my mistake. I'm trying to figure out the impact, if any, to existing users. Any thoughts on that? I'll do some local testing, too.

Some thoughts from looking at the decimal docs:

If existing users were using the results for arithmetic operations they might see TypeError exceptions (since the returned item is now a Decimal instead of an int or float). It looks like Decimal() + int() is fine, but Decimal() + float() raises an exception. Casting a Decimal to a float (or vice versa) seems simple, although it is an extra step.

>>> from decimal import *
>>> a = Decimal(10)
>>> b = float(1)
>>> c = a + b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
>>> b2 = Decimal(b)
>>> c = a + b2        # works
>>> c = float(a) + b  # works


Decimal objects cannot generally be combined with floats in arithmetic operations: an attempt to add a Decimal to a float, for example, will raise a TypeError. There’s one exception to this rule: it’s possible to use Python’s comparison operators to compare a float instance x with a Decimal instance y. ...
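A minimal sketch of the behavior the docs describe (mixed arithmetic raises, mixed comparison works):

```python
from decimal import Decimal

# Arithmetic between Decimal and float raises TypeError...
raised = False
try:
    Decimal('1.1') + 2.5
except TypeError:
    raised = True
assert raised

# ...but comparisons are allowed, and convert the float exactly:
assert Decimal('2.5') == 2.5   # 2.5 is exactly representable in binary
assert Decimal('1.1') < 2.5
assert Decimal('1.1') != 1.1   # float 1.1 is slightly larger than decimal 1.1
```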

I guess that if an application switched to using Decimal instead of floats it would require more memory. For me that is a "don't care", and I guess for most people too (i.e. people using dynamoDB more for high availability than for massive storage). Actually for me it is a plus, since we are storing long numeric keys and want to keep all the digits!

If value is a float, the binary floating point value is losslessly converted to its exact decimal equivalent. This conversion can often require 53 or more digits of precision. For example, Decimal(float('1.1')) converts to Decimal('1.100000000000000088817841970012523233890533447265625').
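The quoted behavior is easy to demonstrate; constructing from a string avoids picking up the float's binary expansion:

```python
from decimal import Decimal

# From a float: the exact binary value, with dozens of digits.
assert str(Decimal(1.1)) == '1.100000000000000088817841970012523233890533447265625'

# From a string: exactly the digits you wrote.
assert str(Decimal('1.1')) == '1.1'
```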

This is more of a "future" issue than a backward-compatibility issue, but I noticed this in the Decimal docs. I was wondering: if an application took in arbitrary Decimal values and stored them in dynamoDB, what would happen with the special values below (i.e. what would dynamoDB do with them)? There is already a conversion when storing things in dynamoDB anyway (38 digits max after the decimal point, etc.) so you can't expect an arbitrary Decimal to come back unchanged, but it would be nice to figure out the limits.

Special values
The number system for the decimal module provides special values including NaN, sNaN, -Infinity, Infinity, and two zeros, +0 and -0.
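For reference, those special values are straightforward to construct in Python (what DynamoDB would do with them is the open question, and is not tested here):

```python
from decimal import Decimal

nan = Decimal('NaN')
inf = Decimal('Infinity')
neg_zero = Decimal('-0')

assert nan.is_nan()
assert inf.is_infinite()
assert neg_zero == Decimal('0')   # the two zeros compare equal...
assert str(neg_zero) == '-0'      # ...but keep their sign in string form
```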

My $0.02 is that Decimal is a good match to the large numbers allowed by DynamoDB and is the best default.

For backward compatibility maybe there could be a configuration option to have the "old behavior", so if someone upgrades and see exceptions (that they don't have time to fix right away) they can set a config option and quickly get the old behavior until they have time to deal with the exceptions.



disruptek commented Aug 23, 2012

I think I'd prefer to extend Decimal to allow it to warn (or die) during truncation, handle float/int math, and implement handling for the special values. Allowing the user to override our "DynamoNumber" class would allow for both lightweight (float?) use or more sophisticated implementations.
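A rough sketch of what such a class could look like; `DynamoNumber` is only a proposed name in this discussion, not an existing boto class, and this sketch only covers float-tolerant addition:

```python
from decimal import Decimal

class DynamoNumber(Decimal):
    """Hypothetical wrapper: a Decimal that tolerates float operands
    by converting them to their exact Decimal equivalent first."""

    def __add__(self, other):
        if isinstance(other, float):
            other = Decimal(other)
        return DynamoNumber(Decimal.__add__(self, other))

    __radd__ = __add__

# Decimal(10) + 1.5 would raise TypeError; DynamoNumber does not:
assert DynamoNumber('10') + 1.5 == Decimal('11.5')
assert 1.5 + DynamoNumber('10') == Decimal('11.5')
```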

> Allowing the user to override our "DynamoNumber" class would allow for both lightweight (float?) use or more sophisticated implementations.

If there was a config option to use Decimal, that would work for me also. I haven't looked at the boto config options, but if it could be passed in, e.g.

boto.connect_dynamodb(aws_access_key_id=zzz, aws_secret_access_key=zzz, dynamoNumber='Decimal')

it would be very convenient (one line in the code). A separate config file would be less convenient, but would work too.

@elee-nst elee-nst closed this Aug 23, 2012

@elee-nst elee-nst reopened this Aug 23, 2012

oops, I meant to comment, not "close and comment".


garnaat commented Sep 4, 2012

I've made some changes locally and I'm testing, but when I try to add a Decimal attribute to an Item and then store it, I'm getting an error from DynamoDB.

>>> i = table.new_item('mitch', 3)
>>> i['fie'] = Decimal(33.456)
>>> i.put()

Gives me:

DynamoDBValidationError: DynamoDBValidationError: 400 Bad Request
{u'message': u'Attempting to store more than 38 significant digits in a Number', u'__type': u'com.amazon.coral.validate#ValidationException'}

This is what is actually sent over the wire:

{"Item": {"foo": {"S": "mitch"}, "bar": {"N": "3"}, "fie": {"N": "33.4560000000000030695446184836328029632568359375"}}, "TableName": "foobar"}
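Those extra digits are just the exact binary expansion of the float 33.456 being carried into the Decimal; a quick reproduction (illustrative, not boto code):

```python
from decimal import Decimal

# Converting the float 33.456 exposes the exact binary value that
# ended up in the request payload above.
s = str(Decimal(33.456))
assert s == '33.4560000000000030695446184836328029632568359375'

# Far more than DynamoDB's 38-digit limit:
assert len(s.split('.')[1]) > 38
```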

elee-nst commented Sep 4, 2012

Here is my understanding:
From: http://docs.python.org/library/decimal.html#decimal.getcontext

By default Decimal will store the "exact" value (i.e. tries to not lose precision when doing conversions).

There is a setting,

getcontext().prec = 28

that tells decimal how many digits of precision to use. The precision is applied when computing the results of operations on decimals (adding, multiplying, etc.).

There is also a .quantize() method that will trim to the desired number of digits.

Since amazon dynamodb supports 38 decimal digits (after the decimal point), I tried setting decimal.getcontext().prec to 38. But that didn't give 38 digits after the decimal point; I needed to set it to 40 to give room for the integer portion ("33") as well.

>>> import decimal
>>> decimal.getcontext()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])
>>> i = decimal.Decimal(33.456)
>>> i
Decimal('33.4560000000000030695446184836328029632568359375')
>>> decimal.getcontext().prec = 40
>>> i2 = decimal.Decimal(33.456).quantize(decimal.Decimal('0.00000000000000000000000000000000000001'))
>>> i2
Decimal('33.45600000000000306954461848363280296326')
One tricky part seems to be how to specify 38 digits AFTER the decimal point (which is the dynamoDB limit). But this looks like the right place to tweak things in the decimal library.


elee-nst commented Sep 4, 2012

Or, perhaps it could be trimmed to 38 digits (after the decimal point) after it is converted to a dynamo string?

def dynamize_value(val):
    """Take a scalar Python value and return a dict consisting
    of the Amazon DynamoDB type specification and the value that
    needs to be sent to Amazon DynamoDB.  If the type of the value
    is not supported, raise a TypeError.
    """
    def _str(val):
        """DynamoDB stores booleans as numbers.  True is 1, False is 0.
        This function converts Python booleans into DynamoDB friendly
        representations.
        """
        if isinstance(val, bool):
            return str(int(val))
        return str(val)

perhaps replacing str(val) with something like

str(val)[0 : str(val).find('.') + 39]

i.e. return everything from the start of the string up to 38 digits after the '.'.

I'm not certain that is 100% right; I think str(val).find returns -1 if there isn't a decimal point, so a big positive integer might possibly get trimmed to 38 characters.
maybe something like the code below (warning, I didn't have time to test that this runs!).

string = str(val)
dot = string.find('.')
if dot > -1:
    string = string[0:dot + 39]
return string
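Packaged as a self-contained helper (the function name is made up for illustration), the idea above can be tested; note that it truncates rather than rounds:

```python
def trim_fraction(val, max_digits=38):
    """Trim the string form of a number to at most max_digits digits
    after the decimal point (truncating, not rounding)."""
    s = str(val)
    dot = s.find('.')
    if dot > -1:
        s = s[:dot + 1 + max_digits]
    return s

assert trim_fraction('1234') == '1234'                   # no '.', untouched
assert trim_fraction('0.' + '1' * 50) == '0.' + '1' * 38
assert trim_fraction('33.456') == '33.456'               # short values pass through
```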


garnaat commented Sep 5, 2012

I could trim the string prior to sending it to DynamoDB. I'm just wondering what would be better from a user perspective. Trimming behind the scenes may be confusing because the value actually stored in DynamoDB would be different than the value the user sees in Python. Not trimming would probably result in errors like this from DynamoDB but it would force the user to explicitly deal with it in Python. I'm kind of leaning towards the latter approach. Comments?


disruptek commented Sep 5, 2012

Leaving the handling to the user seems like the right approach to me; it's more pythonic and doesn't obscure semantics that we can't automate expertly. If we don't like my idea of a DynamoNumber class that can handle that expertise and be overridden by the user, how about designing the number handling code as abstractly as possible, so that non-Decimal instances can be consumed even if Decimal instances are produced?


garnaat commented Sep 5, 2012

@disruptek It's not that I don't like the idea of a DynamoNumber class, I just haven't had time to implement it.

elee-nst commented Sep 6, 2012

One thought (which I have not looked at much, so it might not be practical).
There might be cases (like 33.456) where a user is expecting dynamoDB to contain exactly 33.456, not the "nearest floating point number" of '33.45600000000000306954461848363280296326'.

Since dynamoDB sends everything as a string when talking to the cloud, perhaps there could be an option to input/read out a number as a string (where the end user has full control over the number of digits, etc.).

The only hiccup I could see is that dynamoDB will remove leading/trailing 0's (if you send 1.234000000 I believe it will store it as 1.234), so you could not always do an exact string match between what you sent and what you get back. But that is inherent in dynamoDB; any library interfacing to it would have this behavior.
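One way around the trailing-zero normalization is to compare numerically rather than as strings; a sketch:

```python
from decimal import Decimal

sent = Decimal('1.234000000')
returned = Decimal('1.234')   # DynamoDB is said to strip trailing zeros

assert str(sent) != str(returned)                # string comparison fails...
assert sent == returned                          # ...numeric comparison matches
assert sent.normalize() == returned.normalize()  # normalize() also lines them up
```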


disruptek commented Sep 6, 2012

I like that idea. Maybe what we really want is for our internal encoding of numeric values to be strings. This matches the DynamoDB model more explicitly and allows for the "33.456" string assignment without any lossy casting. I'd like to contribute some work on this, but I'm pretty well slammed at the moment.

My apologies for a slow response! I am not that familiar with the Layer1/Layer2 model of dynamoDB (I am a new user) and did not have a chance to read up on it until now.

It does seem to me that Layer1 is a good place to allow JSON strings like "SomeNum":{"N":"1307654345.1234567890123456789"} to be directly input.

This is the lowest-level interface to DynamoDB. Methods at this layer map
directly to API requests and parameters to the methods are either simple,
scalar values or they are the Python equivalent of the JSON input as
defined in the DynamoDB Developer’s Guide. All responses are direct
decoding of the JSON response bodies to Python data structures via the json
or simplejson modules.

What might the API look like? Do you have a small example showing how it
would be used?


On Tue, Sep 11, 2012 at 4:26 AM, jimdanz notifications@github.com wrote:

Weighing in on this, as it's relevant to a project that I'm working on.
My personal vote would be to provide a mechanism for the user of boto to
specify the DynamoDB type and value explicitly -- as in, directly specify
the strings that should go over the wire as JSON.

When I first looked at the boto API, I half expected layer1 to provide
this mechanism. Like, whereas layer2 put_item takes items that are
dictionaries of (key, value) and does type inference, to me it'd be great if
layer1 put_item would take items that are dictionaries of
(key, (type, value)), and pass both type and value along faithfully into the
JSON without any extra parsing/conversion/yada.

And then, of course, there would be the expectation that "that which is
put with layer1 cannot necessarily be retrieved successfully with layer2."

To disruptek's earlier point, I think providing this mechanism in layer1
would most closely satisfy the goal of "it's more pythonic and doesn't
obscure semantics that we can't automate expertly," since it allows users
to use boto without boto making any assumptions about their datatypes.

Thoughts? I'd be happy to whip up a pull request, but figured I'd test the
waters first...
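The (key, (type, value)) shape described above might look like this; the dict layout is only a sketch of the proposal, not an existing layer1 API:

```python
# Every attribute carries its wire type explicitly, and the value is
# already the string that should go over the wire.
item = {
    "foo": ("S", "mitch"),
    "bar": ("N", "3"),
    "fie": ("N", "33.456"),  # exactly 33.456, no float round-trip
}

# Converting to the DynamoDB JSON wire shape is then purely mechanical:
wire = {name: {dtype: value} for name, (dtype, value) in item.items()}
assert wire == {"foo": {"S": "mitch"}, "bar": {"N": "3"}, "fie": {"N": "33.456"}}
```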


Eric Lee



disruptek commented Oct 13, 2012

I agree that Layer1 should match the API as closely as possible, but as with any protocol design, I think we want to be strict in what we produce and liberal in what we consume. I don't see the harm in performing Layer1 string conversions on numeric types that we find in lieu of strings, but any lossy precision conversion or under/overflow should raise an exception.

We can perform the conversion to Decimal using a custom Context that asserts those conditions. This will break existing code that is using lossy float storage, but it will expertly handle strings, (exact) floats, and numeric types including bool and Decimal.
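A sketch of such a strict context using the stdlib decimal module (assumed behavior, not boto code): trap Inexact so that any conversion that would drop digits raises instead of silently rounding.

```python
from decimal import Context, Decimal, Inexact

# 38 significant digits, matching DynamoDB; Inexact trapped.
STRICT = Context(prec=38, traps=[Inexact])

assert STRICT.create_decimal('1.5') == Decimal('1.5')  # string input: exact
assert STRICT.create_decimal(1.5) == Decimal('1.5')    # 1.5 is an exact float

lossy = False
try:
    STRICT.create_decimal(33.456)   # exact expansion needs far more digits
except Inexact:
    lossy = True
assert lossy
```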

Decimal does seem to be the appropriate type for wrapping Layer2 numeric values retrieved from DynamoDB, as it will yield lossless reads. The user can drop to Layer1 if they want to perform their own type conversions with an eye towards resource conservation or controlled loss of precision.

Again, this almost certainly will yield new exceptions in existing code, but the reality is that the sooner we make a change, the better.


jamesls commented Jan 7, 2013

Closing in favor of #1183, which supersedes this.

@jamesls jamesls closed this Jan 7, 2013
