Record Types #120

Open
RossTate opened this issue Dec 24, 2011 · 9 comments

Comments

@RossTate
Member

In the discussion for Tuples (#81), @luxspes brought up the issue of record types, a variant of tuples that is particularly important to databases and relational algebra.

One example given was:

RELATION {
        TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName 'Friend'},
        TUPLE {SourceUserName 'Jane', TargetUserName 'Alfred', RelationshipName 'Enemy'}
} where SourceUserName = 'Joseph'

resulting in RELATION { TUPLE {SourceUserName 'Joseph', TargetUserName 'Peter', RelationshipName 'Friend'} }.

If we use the syntax {SourceUserName:String,TargetUserName:String,RelationshipName:String} to denote a record type in Ceylon, and the syntax {SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"} to denote a record instance in Ceylon, then we can encode that example using comprehensions (#82):

value rel
    = {for rec in {{SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"},
                   {SourceUserName="Jane", TargetUserName="Alfred", RelationshipName="Enemy"}}
       if rec.SourceUserName == "Joseph"
       rec};

and rel would have inferred type {SourceUserName:String,TargetUserName:String,RelationshipName:String}* and essentially have value {{SourceUserName="Joseph", TargetUserName="Peter", RelationshipName="Friend"}}.
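
For comparison, here is a rough analogue in TypeScript, which already has anonymous structural record types; this is purely an illustration of the intended semantics, not the proposed Ceylon syntax:

// The relation is an array of anonymous records; the element type
// { SourceUserName: string; TargetUserName: string; RelationshipName: string }
// is inferred and never declared as a named class.
const relation = [
    { SourceUserName: "Joseph", TargetUserName: "Peter", RelationshipName: "Friend" },
    { SourceUserName: "Jane", TargetUserName: "Alfred", RelationshipName: "Enemy" },
];

// rel keeps the same inferred record type and holds only Joseph's row.
const rel = relation.filter(rec => rec.SourceUserName === "Joseph");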

Another example was, rather than

SELECT DISTINCT a as X,c,d,e,f,g,h,i, g+h as z from someTable

where you have to list all the fields you're not even messing with, we should be able to do

someTable RENAME (a AS X) REMOVE (b) ADD (g+h z)

So, if we extend the syntax for records with

  • rec.+field=value to take a record rec and make a new record that additionally has the field field with value value
  • rec.-field to take a record rec and make a new record that's the same except without the field field

then we can do this in Ceylon with the following:

{for row in someTable (row.+X=row.a).-a.-b.+z=row.g+row.h}

The syntax is a bit wordier than the database example, but that's because the database example is extremely context-sensitive, with a lot of implicit capturing. For example, if there were some previously defined variable g, then it wouldn't be clear whether the g+h expression should reference the previously defined g or the column g.
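
To make the .+ and .- steps concrete, here is the same rename/remove/add transformation sketched in TypeScript using spread and rest patterns; this is illustrative only, the row shape and field names are made up, and the columns c through i are omitted for brevity:

// Hypothetical row shape for someTable.
interface Row { a: number; b: string; g: number; h: number; }

// Rename a to X, drop b, and add z = g + h. The result type
// { g: number; h: number; X: number; z: number } is computed structurally,
// with no new named class declared anywhere.
function transform(someTable: Row[]) {
    return someTable.map(({ a, b, ...rest }) => ({ ...rest, X: a, z: rest.g + rest.h }));
}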

These thoughts open up a number of issues:

  1. Do we want record types?
  2. How should subtyping for record types work?
  3. How should polymorphism (generics) for record types work?
  4. How do we handle database joins?
  5. How do we optimize comprehensions into database queries?
@quintesse
Member

Am I correct in understanding that {SourceUserName:String,TargetUserName:String,RelationshipName:String} is basically a shorthand for:

class TUPLE(String sourceUserName, String targetUserName, String relationshipName) {
    shared String sourceUserName = sourceUserName;
    shared String targetUserName = targetUserName;
    shared String relationshipName = relationshipName;
}

where TUPLE might actually be something explicit, anonymous, or defined somehow by the typechecker; I haven't thought that through.

I'm not sure I see much of an advantage; with the proposed changes to shared initializer parameters, the above could already be shortened to:

class TUPLE(shared String sourceUserName, shared String targetUserName, shared String relationshipName) { }

and with the builder pattern intrinsic to Ceylon we could use this to make something like:

RELATION(
    TUPLE("foo", "bar", "baz"),
    TUPLE("wim", "zus", "jet")
)

and I'm sure with some clever thinking we could make a library that does what @luxspes wants, right?

@RossTate
Member Author

So, if I understand you correctly, no, it can't be shortened to that. I forgot to mention that these are anonymous record types. We don't want to have to specify a new class each time, because we want to be able to modify the types. For example, the rec.+field=value construct implicitly creates a new class which has all the same fields as the type of rec plus the field field. When joining two records, one makes a new type out of the union of their fields (as well as checking for equal values in the intersection of their fields).

As for a library, the challenge is making a library that works for all anonymous record types. The only approaches I know of that can address features like .+ and .- involve type classes (such as the many proposals for Haskell, which I've been meaning to read in more detail) or type-level computation (such as Adam Chlipala's Ur), both of which I think we're currently avoiding. I'll explain why this is such a challenge eventually (although it's looking like it'll probably be after the new year, unfortunately).

We might have some differences in our goals that we can take advantage of, but it'll take some investigating. Or we might just have to come to some sort of compromise. That is, of course, if we even choose to have record types.

@quintesse
Member

Ok, I understand, but that means we either statically determine those anonymous classes, generating them wherever necessary (which would be nice, but given that queries are very often built dynamically based upon user input, I don't know if it's actually the way to go), or we allow dynamic building of new types, or even just transform everything to plain collections of values. But then we're basically just putting the Hibernate query builder with some kind of in-memory database into the language ^^

-Tako


@gavinking
Member

@quintesse I think at the actual Java level a record is just a tuple (you add support for tuples and later layer support for records on top of it). The names would be something that exists at the typechecker level and is eliminated during lowering.
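
A tiny sketch of that lowering strategy, written in TypeScript purely as an illustration (the field order and helper are assumptions, not the actual backend):

// The record type exists only at the typechecker level; at runtime the value
// is a plain tuple, with field names mapped to fixed positions.
type PersonRecord = { name: string; age: number };   // typechecker-level view
type PersonLowered = [string, number];               // runtime representation

// The compiler fixes an order for the fields (here, declaration order) and
// rewrites every field access as a positional access.
const fieldIndex = { name: 0, age: 1 } as const;

function getName(p: PersonLowered): string {
    return p[fieldIndex.name];   // lowered form of `person.name`
}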

@quintesse
Member

But can that handle the dynamic situation? Because normally with tuples you know at compile time what "size" of tuple you're dealing with. Isn't that a possible problem?

-Tako


@gavinking
Member

Because normally with tuples you know at compile time with what "size" tuple you're dealing.

Same is true of records. Really, they're just considered syntax sugar for a tuple in the languages I've looked at.

@quintesse
Member

Which then removes a large part of its usefulness, IMO.

@RossTate
Member Author

You raise good points. Interestingly, the implementation concerns suggest an approach for the type problem as well.

Implementation (for dynamic situations)

All record types erase to the same implementation, say an associative array (we could do a hashmap, but record types are typically small, so an associative array would be faster). If an instance has type {foo:String,bar:Integer}, then that guarantees the associative array will have the key foo with a corresponding value of type String, and similarly for bar.
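
A minimal sketch of that erased representation (TypeScript, names made up for illustration):

// Every record value erases to the same small associative structure; the
// static record type only constrains which keys are present and their types.
type ErasedRecord = Record<string, unknown>;

function getField<T>(rec: ErasedRecord, key: string): T {
    // If the (erased) static type was {foo:String, bar:Integer}, the key "foo"
    // is guaranteed to be present and to hold a String, so the cast reflects
    // the static guarantee rather than a runtime check.
    return rec[key] as T;
}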

Type Approach (suggested by implementation)

Let {foo:String,bar:Integer} be a subtype of both {foo:String} and {bar:Integer}. So if you want to write something polymorphic over records with a foo field of type String, you can write <R extends {foo:String}> blah(R* records).
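
In TypeScript terms, which already allow this kind of bound, the polymorphic signature above looks roughly like this (placeholder names, illustration only):

// Any record type with at least a foo field of type string satisfies the
// bound; extra fields are simply carried along (width subtyping).
function blah<R extends { foo: string }>(records: R[]): string[] {
    return records.map(r => r.foo);
}

// Accepted, because {foo, bar} is a structural subtype of {foo: string}.
blah([{ foo: "a", bar: 1 }, { foo: "b", bar: 2 }]);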

For rec.+field=val, where rec has record type R and val has type T, the resulting type is R&{field:T}.
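
Sketched in TypeScript (the addField helper is hypothetical, only to show the typing): the result carries every field of R plus the new one, so it satisfies both R and {field:T} at once, which is exactly the intersection.

// Adding a field yields R & Record<K, T>: the result is a subtype of both.
function addField<R extends object, K extends string, T>(rec: R, key: K, val: T): R & Record<K, T> {
    return { ...rec, [key]: val } as R & Record<K, T>;
}

const extended = addField({ a: 1, b: "x" }, "z", true);
// extended has type { a: number; b: string } & Record<"z", boolean>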

No removing fields (sorry); subtyping should take care of some of that though.

If r1 has record type R1 and r2 has record type R2, with R1|R2 satisfying Equatable<R1|R2>, then the result of intersect(r1, r2) has type (R1&R2)?. There is an issue if r1 and r2 have hidden fields of the same name, in which case join will work oddly, but I'm guessing this won't come up much for the applications we're considering.
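
A rough sketch of that join (TypeScript illustration; intersect here is not an actual API): merge the two records into a value of type R1 & R2, but only when every shared field holds equal values.

function intersect<R1 extends object, R2 extends object>(r1: R1, r2: R2): (R1 & R2) | null {
    for (const key of Object.keys(r1)) {
        // If a field appears in both records with different values, the join
        // fails; null stands in for the ? in (R1&R2)?.
        if (key in r2 && (r1 as Record<string, unknown>)[key] !== (r2 as Record<string, unknown>)[key]) {
            return null;
        }
    }
    return { ...r1, ...r2 } as R1 & R2;
}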

@gavinking
Member

Related: #745.
