Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

int, Int, and box #1470

Closed
StefanKarpinski opened this issue Oct 29, 2012 · 155 comments · Fixed by #10452
Closed

int, Int, and box #1470

StefanKarpinski opened this issue Oct 29, 2012 · 155 comments · Fixed by #10452
Assignees
Labels
breaking This change will break code decision A decision on this change is needed speculative Whether the change will be implemented is speculative
Milestone

Comments

@StefanKarpinski
Copy link
Sponsor Member

I.e. write Int(1.5) instead of int(1.5). See this thread for discussion:

https://groups.google.com/d/topic/julia-dev/gy1HlWWcxBA/discussion

@JeffBezanson
Copy link
Sponsor Member

There are actually 3 separate points here:

  • changing box(Int, x) to Int(x)
  • changing int(x) to Int(x)
  • eliminating convert

The last one will not really work, since convert lets you associate conversions with types, but not all types can have methods added to them. If it were possible to associate methods with arbitrary types, the mechanism for doing so would be exactly the same as convert is now, just more obscure.

@JeffBezanson
Copy link
Sponsor Member

I also find it a bit odd to use Int(A) to convert an array to an integer array, since it feels like Int should return an Int, but this is not necessarily a technical problem.

@StefanKarpinski
Copy link
Sponsor Member Author

Any thoughts on the other two?

@JeffBezanson
Copy link
Sponsor Member

Changing int to Int would be no problem. But making bits types and abstract types callable seems to make the convert/construct redundancy even worse, since we'd have both T(x) and convert(T,x) for a bigger set of types.

There are two things we need, so there should be two syntaxes: one is "make this be of type T or else raise an error" (currently convert), and the other is various user-defined behaviors around a type, like converting arrays. So there are two options: merge int and Int, or merge Int and convert(Int, _).

Thinking more about how we could do the second one, each call f(x) could effectively become

if iscallable(f)
  f(x)
else
  convert(f, x)
end

where convert may no longer be first class. Instead it might only implement built-in conversion rules for tuple and union types.

@toivoh
Copy link
Contributor

toivoh commented Oct 29, 2012

Would convert for user defined types would be replaced by calls to the constructor? ( Otherwise, ignore the following )
I don't really feel comfortable with that; I want to be able to write types where the constructor is basically a private matter for the type, and objects are constructed using factory functions that know how to invoke the constructor.
And a single-argument constructor call might mean a bunch of other things besides "convert the value I gave you to your type".

Also, I know there are some convert methods that convert to abstract types, and I think they can be quite handy.
I guess that wouldn't be possible with a constructor call. Or to define a single convert method for all descendents of some abstract type.

Another thing to think about: I wouldn't expect convert(T,x) to necessarily create a new object, but I probably would expect T(x) to copy x even if x::T already.

@JeffBezanson
Copy link
Sponsor Member

Those are some good points. As much as I would like to remove excess concepts, convert will probably have to stay. If all our types were classes, it might tip the decision the other way.

@StefanKarpinski
Copy link
Sponsor Member Author

These are definitely good points, especially regarding convert(T,x) being the identity when x::T.

@StefanKarpinski
Copy link
Sponsor Member Author

Now that bits types and composites are all instances of DataType, I think we should revisit this. It's rather awkward trying to explain why you can add methods to BigInt but not to Int.

@JeffBezanson
Copy link
Sponsor Member

This is definitely in the category of issues that keep me up at night.

If/when we allow adding methods to Int, to me the immediate next problem is confusion between Int(x) and convert(Int,x). We might want some arrangement that T(x) calls convert by default (T(args...) = convert(T, args...)) or the other way around, so that there is consistency in which thing you define.

BigInt is indeed the best example of this right now, since it happens to mostly define constructors instead of convert methods. And I just discovered that while BigInt() accepts integers and strings, you can only convert from BigFloat via convert. This is the kind of confusion we should be able to avoid.

@JeffBezanson
Copy link
Sponsor Member

I've never really liked that some types have generic functions stuck inside them. It may be that all constructors should actually be methods of convert; i.e. make a(b) mean isa(a,Type) ? convert(a,b) : usual_apply(a,b).

@ivarne
Copy link
Sponsor Member

ivarne commented Nov 24, 2013

Do you suggest a special behaviour for 1 argument constructors? That would feel a bit strange for some types, like MyArray where convert(MyArray, 2) creates a MyArray with value 2, but MyArray(2) creates a 2 dimensional array.

@JeffBezanson
Copy link
Sponsor Member

Fair point; making 1-argument constructors a special case certainly doesn't seem right. And especially for mutable types conversion and construction start to seem more different.

@JeffBezanson
Copy link
Sponsor Member

Also, I didn't intend to suggest that 1 argument be a special case; I should have written a(b...).

@StefanKarpinski
Copy link
Sponsor Member Author

I know you don't like it, but it's really convenient and no one's going to be happy if that goes away. I really don't see why it's a big problem. It fits quite nicely with the general idea of a single object being the locus of many different roles – BigInt is both a type and the way to construct that type. A little messy from the internals perspective, but very easy for humans to understand. Consider also the apply idea. If we had a hypothetical construct function and apply, then we could just write apply(T::Type, args...) = construct(T,args...).

I also think we might want to introduce coerce in addition to convert as a generic version of the kind of operation that int does. The relationship between these three operations could be:

  • coerce(T,x) defaults to convert(T,x) but you can make it more forceful, e.g. by doing something like coerce(::Type{Int}, x::Float64) = iround(x)
  • construct(T,x) has a fallback that calls coerce(T,x) and writing T(x) where T is a type calls construct(T,x) – which defaults to trying to coerce x to type T, which in turn defaults to converting x to type T.

It's a bit layered and nuanced, but I think it would remove a lot of repetitive definitions.

@JeffBezanson
Copy link
Sponsor Member

I'm not against the notation BigInt(x), I'm just thinking about what it should mean. Your suggestion is the same as mine, except preserving the distinction between conversion and construction by introducing construct. And given that it looks like we need to keep the distinction, that's probably the way to go.

@ghost ghost assigned JeffBezanson Nov 25, 2013
@StefanKarpinski
Copy link
Sponsor Member Author

The apply business is really just a way of letting us define in Julia code what f(x) means, where f is an arbitrary object, rather than being restricted to function objects.

@quinnj
Copy link
Member

quinnj commented Nov 30, 2013

+1 for the construct/coerce machinery.

I think there also needs to be some stricter style guidelines that come out of this discussion with the key being consistency across all kinds of types for construction/conversion. For example, as a user, I want to naturally be able to guess that to convert/coerce one type to another, I just need to use the lowercase name of the type I want; i.e. int(1.5), float(1.5), bigint(1.5), string(1.5) etc. I think the guideline should be that lowercase names of types are really just calls to convert a single argument to the lowercase type.

I think the construct/coerce part helps clean up the concepts from the developer perspective and I just want to make sure that users win here too with consistent Proper/lowercase usage.

@StefanKarpinski
Copy link
Sponsor Member Author

My proposal was actually to make all of these the names of types: Int(1.5), Float(1.5), BigInt(1.5), String(1.5). It feels a bit weird since I'm not used to this scheme, but I think it would be much less confusing to newcomers.

@quinnj
Copy link
Member

quinnj commented Nov 30, 2013

Ah, that makes sense. I didn't quite piece it all together. That's great, I was more concerned about the consistency, so that solves the problem nicely (though will it be Float64(1.5) or will Float(1.5) be allowed?).

To make sure I understand, it defines conversion by construction (i.e. all conversions/coercions are a type of construction). So in building a new type, I would overload construct(T,x...) for constructors, coerce(T,x...) for forceful conversions to my type, and convert(T,x...) for other conversions?

@StefanKarpinski
Copy link
Sponsor Member Author

Yes, I think that's about it, although it does feel like a lot of different things to remember. In particular, I think this idea does imply that you would provide constructors for a type by adding methods to construct.

@JeffBezanson
Copy link
Sponsor Member

I'm hoping we'll be able to make this mostly backwards-compatible; method definitions on types will secretly add methods to construct.

The proposed design lets you "define what you can" and let everything else fall out. For example if a new type has a natural conversion to another one, you can just define convert and be all set. So despite the added function names, this approach seems easier to use to me.

Overall I'm in favor but I have a couple concerns.

coerce is a bit fuzzy --- it's so permissive, it's hard to decide what operations might be valid coercions. For example, can you coerce a Vector to a Task that generates its values? As usual, the function is mostly for numbers, but there it is not even sufficient --- there are itrunc, ifloor, iceil, and iround. Can you coerce a complex to a real? Do you just discard the imaginary part? It's hard to know what the rules are.

I've never liked the idea that Int(x) or convert(Int,x) might return something other than an Int (e.g. when x is an array). Then again I also hate having both Int and int, so it's a bit of a stalemate. I guess I'm willing to err on the side of removing lots of confusingly-redundant names.

@ggggggggg
Copy link
Contributor

As a relative newcomer I like the idea of Int(x) replacing int(x). It seems odd that the base types follow different rules from the types I create, although I guess they don't since a constructor is different from conversion.

Also I find it odd that Int and int exist, and float exists, but Float does not.

@ssfrr
Copy link
Contributor

ssfrr commented Jan 10, 2014

I just got bit by this trying to do an Int32(frames). The "type cannot be constructed" error wasn't particularly helpful as I was doing the conversion inside a new(), so I thought it was the object I was constructing that had the problem until a google search set me straight.

Just a ping.

@JeffBezanson
Copy link
Sponsor Member

It does look like the more generic conversion functions are more popular:

jeff@gurren:~/src/julia/base$ ack 'complex(32|64|128)\('|wc -l
38
jeff@gurren:~/src/julia/base$ ack 'complex\('|wc -l
78
jeff@gurren:~/src/julia/base$ ack 'float(16|32|64)\('|wc -l
128
jeff@gurren:~/src/julia/base$ ack 'float\('|wc -l
171

I'd like to get rid of all the lower-case conversions except float and complex. Good replacements include Int(x), map(Int, x), and Array{Int}(x).

We should also remove utf8, utf16, and utf32 (and wstring). I think the main barrier there is just the verbosity of UTF16String(x). Maybe the types should be renamed to UTF16 etc.

@ViralBShah
Copy link
Member

+1

@garborg
Copy link
Contributor

garborg commented Mar 8, 2015

Love it.

@ivarne
Copy link
Sponsor Member

ivarne commented Mar 9, 2015

Just for comparison, searching all METADATA packages gives:

ivarne~/dev/allpkg$ ack 'complex(32|64|128)\('|wc -l
      15
ivarne~/dev/allpkg$ ack 'complex\('|wc -l
      91
ivarne~/dev/allpkg$ ack 'float(16|32|64)\('|wc -l
     756
ivarne~/dev/allpkg$ ack 'float\('|wc -l
     630

@timholy
Copy link
Sponsor Member

timholy commented Mar 27, 2015

I'm really pleased about this change. But one practical matter: to avoid deprecation warnings, does anyone see an alternative to forking a 0.4 version of packages? The syntax Float32(2.8) can't work on julia 0.3.

@stevengj
Copy link
Member

@timholy, you can use @compat Float32(2.8) via the Compat package, or convert(Float32, 2.8).

@timholy
Copy link
Sponsor Member

timholy commented Mar 27, 2015

I'd thought about the rewriting part, but missed the @compat method. Thanks!

@timholy
Copy link
Sponsor Member

timholy commented Mar 27, 2015

Ah, although

`convert` has no method matching convert(::Type{Float32}, ::UnitRange{Int64})

suggests a certain amount of rewriting might be necessary anyway.

@stevengj
Copy link
Member

For a range, you can do convert(Vector{Float32}, r) or @compat Vector{Float32}(r) (Float32(r) is not the syntax in 0.4 for range conversion)

@timholy
Copy link
Sponsor Member

timholy commented Mar 27, 2015

I know. I was just hoping for something I could do via search-replace with a text editor. I have a lot of packages :-). And a lot of lab-specific code.

@ViralBShah
Copy link
Member

While I love this change, I don't like the idea of writing map(Int, x) instead of int(x), which is clear and concise. What about vectorized Int(x)?

@johnmyleswhite
Copy link
Member

Also relevant: does map(Int, x) make the output array of fixed length?

@tkelman
Copy link
Contributor

tkelman commented Apr 30, 2015

What about vectorized Int(x)?

Calling a constructor for a scalar type on a non-scalar input and implicitly mapping seems like a bad idea to me. map is pretty concise too, unless you want to get into Mathematica territory of adding infix syntax like /@ for map.

@ViralBShah
Copy link
Member

Why is it a bad idea? Intuitively, it does not feel right, but I dislike doing a map even more. Doing map is not as clear as the original int(x). I am pretty sure @alanedelman thinks so as well.

@quinnj
Copy link
Member

quinnj commented Apr 30, 2015

I agree with @tkelman. While writing Int(x) may be convenient, I think it leads to the crux of the "vectorized" discussions that have been going on: namely, do we continue the use of @vectorized_1arg to auto-vectorize scalar functions, or arbitrarily define vectorized versions of scalar functions here and there, or enforce a convention like map or some kind of operator that would vectorize scalar functions over a collection. I'm highly in favor of the latter; I don't think map(Int,x) is terrible, but I think some kind of operator not unlike the .+,.* family would be ideal.

It makes it explicit, both when writing and reading code that the operation is vectorized, and doesn't require any macro-ing or auto-vectorizing for the developer.

@timholy
Copy link
Sponsor Member

timholy commented Apr 30, 2015

@ViralBShah, it's the classic issue: does exp mean elementwise-exponential or matrix-exponential? Once functions-as-arguments have good performance, I suspect all elementwise operations will be performed using map.

@JeffBezanson
Copy link
Sponsor Member

I agree map(Int, x) is not nice syntax. However we simply cannot keep adding piecemeal vectorized versions of things. From one perspective int(x) is clear, but it just doesn't scale. As you start working with more types, how do you use them? If you know int(x), and then you learn about Rational{Int8}, it's not clear how to combine them. I think this is still an unsolved problem.

@ViralBShah
Copy link
Member

I am in favour of having a generic way to vectorize things as well, and that the current method does not scale. The only issue I have with map is that we have just made julia a little more difficult and annoying for those used to writing vectorized codes.

Perhaps we can retain some of those old definitions and not deprecate them yet - after all, you still have sin(x). Once we find nice syntax for map, we can remove all the vecotrized methods in one shot.

@timholy
Copy link
Sponsor Member

timholy commented May 1, 2015

In my julia travels, I have found at least one case where I wonder whether vectorization is more useful than we generally credit. Say I have two images, img1::Image{Ufixed8} and img2::Image{RGB{Ufixed8}}, and I want to convert these so that the raw values are encoded as Float32 rather than Ufixed8. For the purposes of this discussion, let's sometimes think of an RGB simply as a fixed-length 3-vector, so really what we have in one case is an array and in the other case an array-of-arrays (except that I also often think of an RGB as a single object). For writing generic code, float32(img) does have its appeal; otherwise I have to write this as:

map(Float32, img1)
map(x->map(Float32, x), img2)

The implication is that I have to write more versions of whatever function this conversion appears inside of.

Or maybe I'm missing a better way?

@ViralBShah
Copy link
Member

That was exactly one of the cases that happened when I was working on a demo using Images.

@pao
Copy link
Member

pao commented May 1, 2015

(This discussion is turning into an #8450 sequel. Map Syntax II: APL Strikes Back)

@timholy
Copy link
Sponsor Member

timholy commented May 1, 2015

@ViralBShah, FYI, float32 exists in Images and does what is described above.

@pao, good point, let's not add more to this here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code decision A decision on this change is needed speculative Whether the change will be implemented is speculative
Projects
None yet
Development

Successfully merging a pull request may close this issue.