Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a Dictionary abstraction over the raw Cons list. #327

Merged
merged 2 commits into from
Mar 3, 2021

Conversation

jsiek
Copy link
Contributor

@jsiek jsiek commented Mar 2, 2021

A subsequent PR will replace the uses of AssocList with this Dictionary class.

@jsiek jsiek requested review from dabrahams and jonmeow March 2, 2021 14:34
@google-cla google-cla bot added the cla: yes PR meets CLA requirements according to bot. label Mar 2, 2021
@jonmeow
Copy link
Contributor

jonmeow commented Mar 2, 2021

Would it be possible to go with something more like std::hash_map? I'm wondering if we can simplify things and use standard classes, essentially.

@jsiek
Copy link
Contributor Author

jsiek commented Mar 2, 2021

In general I agree with going with std data structures when possible.

Regarding std::hash_map... the problem is that the type checker and interpreter very often copy the dictionary (at every AST node) and then make independent changes to the two copies. The copies would be O(n) with a std::hash_map but are O(1) with the cons-based Dictionary. The Dictionary has O(n) lookup, but that only happens at variable nodes in the AST, and not every AST node.

BTW, I think the Stack class could and probably should be replaced by std::stack.

@jsiek
Copy link
Contributor Author

jsiek commented Mar 2, 2021

This Fixes #316

@jonmeow
Copy link
Contributor

jonmeow commented Mar 2, 2021

In general I agree with going with std data structures when possible.

Regarding std::hash_map... the problem is that the type checker and interpreter very often copy the dictionary (at every AST node) and then make independent changes to the two copies. The copies would be O(n) with a std::hash_map but are O(1) with the cons-based Dictionary. The Dictionary has O(n) lookup, but that only happens at variable nodes in the AST, and not every AST node.

BTW, I think the Stack class could and probably should be replaced by std::stack.

Maybe you can modify code to demonstrate how this is intended to be used? e.g., if you backed out the iterator code, would this still work? Is Cons still required after Dictionary is created?

Part of my hesitant around this change is probably also based on SetValueAt versus SettingValueAt -- I would suggest rename SettingValueAt because it's difficult to understand the difference between the two from that name.

FWIW, I've had some concerns over Cons because it feels like an abbreviation, but I don't know what it stands for - that may be adding to my hesitance to approve here. I lack an intuitive understanding of this code, and it's not clear to me how this is fitting together.

@jsiek
Copy link
Contributor Author

jsiek commented Mar 2, 2021

Regarding how it is used, I've got a branch named dict on my fork jsiek/carbon-lang that is updated to use Dictionary.

Regarding the naming of SetValueAt and SettingValueAt, I'm open to suggestions. Those names were suggestions from Dave. I was also thinking about operator[]...

Cons is from good old Lisp and is how linked lists are built. A better name would be Node or ListNode. I'm happy to change to that name. But more importantly, Dictionary shields the rest of the code from having to know about low-level stuff like Cons or the similarly low-level old AssocList class.

@jsiek
Copy link
Contributor Author

jsiek commented Mar 2, 2021

Forgot to mention the purpose of the iterator... that's needed, for example, to print out a dictionary for tracing purposes. To replace PrintEnv and PrintTypeEnv with code that's not directly working with low-level stuff like Cons or AssocList.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ListNode would be appreciated (Node alone sounds tree-ish). Not needed here though. Doing a pass of comments...

Regarding SettingValueAt, how about "CopyAndSetValue"? As you can probably tell, I'm happy with explicit names.

Regarding operator[] overloading, style advice is to be judicious. In this case, I would be fine overloading it for Lookup and SetValueAt, although having the functions may be easier on the caller to read.

As an alternative, you might consider shortening names:

  • Lookup -> Get
  • SetValueAt -> Set
  • SettingValueAt -> CopyAndSetValueAt (above suggestion) -> CopyAndSet

executable_semantics/interpreter/dictionary.h Outdated Show resolved Hide resolved
executable_semantics/interpreter/dictionary.h Outdated Show resolved Hide resolved
executable_semantics/interpreter/dictionary.h Outdated Show resolved Hide resolved
executable_semantics/interpreter/dictionary.h Outdated Show resolved Hide resolved
executable_semantics/interpreter/cons_list.h Outdated Show resolved Hide resolved
executable_semantics/interpreter/dictionary.h Outdated Show resolved Hide resolved
@jsiek
Copy link
Contributor Author

jsiek commented Mar 3, 2021

I went with the method names Set and Get. I ditched SettingValueAt because the same thing can be accomplished with a copy and a Set.

@dabrahams
Copy link

On Tue, Mar 2, 2021 at 1:13 PM Jon Meow notifications@github.com wrote:

Part of my hesitant around this change is probably also based on
SetValueAt versus SettingValueAt -- I would suggest rename SettingValueAt
because it's difficult to understand the difference between the two from
that name.

This is part of a pattern that comes up repeatedly in programming oriented
around value semantics: you have two “versions” of the same operation, one
that mutates a thing in place and the other that, semantically, copies the
first one and mutates the copy, returning it. Example:

things.Append(x);                         // mutates things
let extendedThings = things.Appending(x); // does not mutate things

One often wants both versions of the operation, both for expressivity—it
makes code cleaner—and efficiency—appending to a vector is an example where
starting with a copy is inefficient.

(Ideally, there would be language features so the operations could be
related under one name, and so one operation can be synthesized from the
other one by the compiler when there's no efficiency advantage to writing
them separately, e.g:

 things.=Appending(x);                     // mutates things

we had this feature in Swift for a week or two before somebody panicked
and… well, it's a long story).

In any case, the way we came up to relate and distinguish the two
operations in Swift was that the mutating version would form an active verb
phrase (like "append x" at the use site) and the non-mutating operation
would form a related noun phrase ("things, appending x") at the use site.
This convention was extensively deployed in Swift libraries and has worked
out really well for users.

I'm certainly open to considering other conventions if you have a
suggestion, but without an alternative, it made sense to use one that was
proven in practice.

FWIW, I've had some concerns over Cons because it feels like an

abbreviation, but I don't know what it stands for - that may be adding to
my hesitance to approve here. I lack an intuitive understanding of this
code, and it's not clear to me how this is fitting together.

Jeremy already answered about Cons; I'll just add that it's an established
term of art in CS, but in the long run it's my aim to eliminate it and in
the short run to keep it as a low-level implementation detail of more
comprehensible data structures like Dictionary. There are other, better,
persistent dictionary data structures we can use eventually (e.g. a
red-black tree with shared nodes).

I very much suspect, though, even if we do use std components like stack,
that we will want to create wrappers around them that make the executable
semantics code more expressive and readable. For example, one thing we see
often with the stacks in this code is that the top n elements are replaced
with one or more elements, which amounts to several pops and subsequent
pushes. I made a dent in the repetitiveness of that pattern by having an
operation that pops N elements off the top of the stack, but something like

pile.ReplaceTop(3, { thingA, thingB });

will be much better. Method syntax is another important tool in clarifying
code (e.g. which thing is being mutated, reducing nesting), which is why I
don't want to build this as a free function on std::stack.

I hope this begins to explain why we're headed this way

/cc @geoffromer

--
-Dave

@dabrahams
Copy link

@jonmeow wrote:

ListNode would be appreciated (Node alone sounds tree-ish). Not needed here though. Doing a pass of comments...

FWIW, in lisp, cons cells are in fact used to build everything, including trees. I'm not defending the name or the usage. At the same time, there's limited benefit in spending the time to rename something we know we want to throw out soon.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes @jsiek.

@dabrahams I think this code is more understandable as it's currently implemented. I think the SettingValueAt method has simply been dropped, which I think eliminates that dissent.

@jsiek jsiek merged commit f374cae into carbon-language:trunk Mar 3, 2021
jsiek added a commit that referenced this pull request Mar 3, 2021
* add command-line flag to enable/disable tracing output

* adding missing exit for pattern variable in wrong context and a test case for it (#324)

* Update executable_semantics/interpreter/interpreter.cpp

comment on separate line as code

Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>

* fix interpreter's handling of optional else of if statement (#323)

* Update pattern_variable_fail.golden due to error (#334)

* Use llvm's CommandLine for parsing (#332)

* GitHub testing action (#331)

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>

* add copyright

* Create a Dictionary abstraction over the raw Cons list. (#327)

* Create a Dictionary abstraction over the raw Cons list.

* renamed Cons and some methods of Dictionary, various other cleanup

* Update executable_semantics/tracing_flag.cpp

added namespace comment

Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>

* added a comment to cpp file

Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>
Co-authored-by: Dave Abrahams <dabrahams@google.com>
Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
@jsiek jsiek deleted the dictionary branch March 3, 2021 21:11
chandlerc pushed a commit that referenced this pull request Jun 28, 2022
* Create a Dictionary abstraction over the raw Cons list.

* renamed Cons and some methods of Dictionary, various other cleanup
chandlerc added a commit that referenced this pull request Jun 28, 2022
* add command-line flag to enable/disable tracing output

* adding missing exit for pattern variable in wrong context and a test case for it (#324)

* Update executable_semantics/interpreter/interpreter.cpp

comment on separate line as code

Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>

* fix interpreter's handling of optional else of if statement (#323)

* Update pattern_variable_fail.golden due to error (#334)

* Use llvm's CommandLine for parsing (#332)

* GitHub testing action (#331)

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>

* add copyright

* Create a Dictionary abstraction over the raw Cons list. (#327)

* Create a Dictionary abstraction over the raw Cons list.

* renamed Cons and some methods of Dictionary, various other cleanup

* Update executable_semantics/tracing_flag.cpp

added namespace comment

Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>

* added a comment to cpp file

Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>
Co-authored-by: Dave Abrahams <dabrahams@google.com>
Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes PR meets CLA requirements according to bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants