Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deep_type throws error on list of numpy ints #27

Closed
kyao opened this issue Jan 9, 2018 · 15 comments
Closed

deep_type throws error on list of numpy ints #27

kyao opened this issue Jan 9, 2018 · 15 comments
Labels

Comments

@kyao
Copy link

kyao commented Jan 9, 2018

The following code causes deep_type to throw a ValueError:

import pytypes
import numpy as np
pytypes.deep_type([np.int64(1), np.int64(1)])
@Stewori
Copy link
Owner

Stewori commented Jan 9, 2018

Are you using the latest release or the repository version?
I think this was fixed with 15ec80c, but there was not yet a release after that. A repeated element in a list caused a false-positive recursion detection.

@Stewori Stewori added the fixed? label Jan 9, 2018
@kyao
Copy link
Author

kyao commented Jan 9, 2018

I am using the latest from the repository. The error occurs even if there is no repeated element.

>>> pytypes.deep_type([np.int64(1), np.int64(2)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ktyao/miniconda3/envs/d3m/lib/python3.6/site-packages/pytypes/type_util.py", line 282, in deep_type
    return _deep_type(obj, [], 0, depth, max_sample)
  File "/home/ktyao/miniconda3/envs/d3m/lib/python3.6/site-packages/pytypes/type_util.py", line 317, in _deep_type
    tpl = tuple(_deep_type(t, checked, checked_len2, depth-1) for t in obj)
  File "/home/ktyao/miniconda3/envs/d3m/lib/python3.6/site-packages/pytypes/type_util.py", line 317, in <genexpr>
    tpl = tuple(_deep_type(t, checked, checked_len2, depth-1) for t in obj)
  File "/home/ktyao/miniconda3/envs/d3m/lib/python3.6/site-packages/pytypes/type_util.py", line 301, in _deep_type
    if depth == 0 or obj in checked[:checked_len]:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

@Stewori
Copy link
Owner

Stewori commented Jan 9, 2018

Okay, confirmed. However this boils down to strange behavior with np.int64 and list:

one64 = np.int64(1)
a = [one64, one64]
one64 in a # True
one64 is a # False
one64 in [a] # ValueError

yields the same error, completely independent of pytypes. Is this intended/sane behaviour? I suppose it should maybe filed as a numpy bug. Do know if this is a known issue?

We can add a workaround like this:

def _obj_in_list(obj, lst):
	"""Simply using ``in`` is sometimes not reliable, e.g. for ``numpy.int64``,
	see https://github.com/Stewori/pytypes/issues/27.
	"""
	try:
		return obj in lst
	except ValueError:
		# As a fallback we use the index method
		try:
			lst.index(obj)
			return True
		except ValueError:
			return False

I'm a bit concerned to become bug-compatible with numpy. However in favor of workability and robustness I would add this workaround. Opinions?

@Stewori Stewori added bug and removed fixed? labels Jan 9, 2018
@kyao
Copy link
Author

kyao commented Jan 10, 2018

It is the intended behavior, however I am not sure if it is sane or not. With numpy you can compare a number against a list.

one64 == [one64,  one64]  # Returns an array of True, instead of False

Having a workaround for robustness is good.

With numpy the index method behaves the like in. So, it also fails.

[one64, [a]].index(one64)  # Returns 0
[[a], one64].index(one64)  # ValueError
_obj_in_list(one64, [one64, [a]]) # Returns True
_obj_in_list(one64, [[a], one64]) # Returns False

Perhaps, something like this:

def _obj_in_list2(obj, lst):
    """Simply using ``in`` is sometimes not reliable, e.g. for ``numpy.int64``,
    see https://github.com/Stewori/pytypes/issues/27.
    """
    try:
        return obj in lst
    except ValueError:
        # As a fallback we check each element individually
        for elt in lst:
            result = elt == obj
            if not hasattr(result, '__iter__') and result:
                return True
        return False

@Stewori
Copy link
Owner

Stewori commented Jan 10, 2018

one64 == [one64, one64] # Returns an array of True, instead of False

Then maybe an actual numpy-specific workaround would be best:

def _obj_in_list3(obj, lst):
    """Simply using ``in`` is sometimes not reliable, e.g. for ``numpy.int64``,
    see https://github.com/Stewori/pytypes/issues/27.
    """
    try:
        return obj in lst
    except ValueError:
        # NumPy-specific fallback
        return any(obj == lst) # for non-NumPy behavior raises TypeError: 'bool' object is not iterable

I'm not sure if we should also catch the TypeError that occurs if neither in nor NumPy behavior applies. We could add your suggestion as a third fallback or just deal with further issues as they occur.

@kyao
Copy link
Author

kyao commented Jan 10, 2018

This third version _obj_in_list3 looks good to me.

@mitar
Copy link
Collaborator

mitar commented Jan 13, 2018

There is another similar issue with numpy and type checking with this package.

import numpy
from pytypes import type_util

type_util._isinstance(numpy.array([1,2,3]), numpy.ndarray)
  File ".../pytypes/type_util.py", line 1489, in _isinstance
    if obj == {}:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Why is there obj == {} check at all? I mean, why a specific check for an empty dict?

@Stewori
Copy link
Owner

Stewori commented Jan 14, 2018

Empty lists, sets, dicts and other containers are specially problematic for deep_type, see python/typing#157. And since pytypes' main way of typechecking is combining deep_type and is_subtype, this can cause trouble. Given that there is not satisfying answer in python/typing#157 we model this with a pytypes-specific Empty[T] generic.
I think the if obj == {}: is a legacy hack that I forgot to remove. Maybe it was supposed to speed up _isinstance for empty dicts, but I don't see the usecase any more / cannot remember it (yes, code comments are great). It looks like we can safely remove this block, at least it causes no test failures.

That said, I did not test with numpy types so far. I didn't expect them to be specifically problematic.
Does it work if you remove the if obj == {}: block?

@Stewori
Copy link
Owner

Stewori commented Jan 14, 2018

Also: Sorry that I currently have no time to work much on these issues. I'm rather bound by another project. A soon as I can I will wrap things up for a new release, including this issue.

@mitar
Copy link
Collaborator

mitar commented Jan 14, 2018

I opened: numpy/numpy#10399 and numpy/numpy#10400

I think that the crazy conversion of int64s to numpy array is really a bug in numpy. But probably it is something magical they like and will probably not change. So we should think about how to support that.

@mitar
Copy link
Collaborator

mitar commented Jan 14, 2018

That said, I did not test with numpy types so far. I didn't expect them to be specifically problematic.
Does it work if you remove the if obj == {}: block?

Yes, this fixes the issue for me.

@mitar
Copy link
Collaborator

mitar commented Jan 14, 2018

The issue @kyao reported above, is in fact coming from type checking as well:

import numpy
import typing
from pytypes import type_util

type_util._isinstance({'mapping': [numpy.int64(1), numpy.int64(1)]}, typing.Dict)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

It seems _isinstance first tries to construct a type and it fails in that. But it is also interesting that typing.Dict does not really require such complex type to be constructed. But of course code cannot know that in advance in a general case.

@mitar
Copy link
Collaborator

mitar commented Jan 14, 2018

One idea. In obj in checked[:checked_len], do we really want equality defined by == or do we want equality defined by is? I think we in fact want the latter, to detect loops. So fix is simple, we should just iterate and instead of == use is.

@mitar
Copy link
Collaborator

mitar commented Jan 14, 2018

I made #29 which fixes all of the issues mentioned here.

@Stewori
Copy link
Owner

Stewori commented Jan 14, 2018

Hey, thanks a lot for the commits. I am currently focused on fixing #22 and somehow want to keep my queue clean, that's why I do not merge it right away. But I will accept it after committing a fix for #22.

Stewori added a commit that referenced this issue Jan 16, 2018
Fixes #27 to support numpy values
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants