[#1395] Marshalling Set and HashSet #1405

jiegillet · 2019-10-07T14:55:58Z

As discussed in #1395.

So far I have only implemented a version where duplicates are ignored. Data.Set doesn't provide tools to check for duplicates, and the notion of a set doesn't really exist in Dhall, so I'm still a bit unsure about that...

sjakobi

This looks pretty good to me. We should definitely offer these Types.

If however we decide to add variants that don't accept duplicates, I think we need to discuss the naming of the Types, and the question of which variant the Interpret instances should use.

sjakobi · 2019-10-07T21:59:31Z

dhall/src/Dhall.hs

@@ -1659,6 +1690,9 @@ instance Inject a => Inject (Vector a) where
 instance Inject a => Inject (Data.Set.Set a) where
    injectWith = fmap (contramap Data.Set.toList) injectWith

+instance Inject a => Inject (Data.HashSet.HashSet a) where


I wonder whether we should sort the elements… 🤔

In math, sets have no notion of ordering so I would tend towards not sorting it...

I'm having second thoughts, maybe we should sort it... Extra opinion @Gabriel439?

On second thought, this is something that we could change later on. For now, maybe just point out that there's no sorting, and add a doctest that demonstrates it.

OK, I've done one for HashSet (not pushed yet), but what about Set? Data.Set.toList and Data.Set.toAscList are both O(n), so I might as well use toAscList, right?

I think Data.Set.toList is defined via toAscList. toAscList is more explicit though. 👍

You are correct. Good.
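To make the ordering point concrete, here is a minimal sketch that uses only the containers package. The names setElements and example are made up for illustration and are not part of the PR. It covers Data.Set only; a HashSet gives no comparable ordering guarantee, which is why the thread suggests documenting that case with a doctest instead.

```haskell
import qualified Data.Set as Set

-- | Elements in the order an instance built on 'Set.toAscList' would emit
-- them. (Hypothetical helper, not code from the PR.)
setElements :: Ord a => Set.Set a -> [a]
setElements = Set.toAscList

-- | A set built from unsorted input that contains a repeated element.
example :: Set.Set Int
example = Set.fromList [3, 1, 2, 1]
```

Loading this in GHCi and evaluating setElements example yields the elements in ascending order, and Set.toList example gives the same list.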

jiegillet · 2019-10-08T02:16:23Z

I added distinctList, the implementation works and I'm happy to discuss the name.

In my opinion the Interpret instance should use set because it's the standard way of dealing with duplicates.

sjakobi

👍 on the idea of just comparing the numbers of elements!

sjakobi · 2019-10-08T10:56:01Z

dhall/src/Dhall.hs

+> Non-distinct elements in fromList [NaturalLit 1,NaturalLit 1,NaturalLit 3]
+>
+-}
+distinctList :: (Ord a) => Type a -> Type (Data.Set.Set a)


I would prefer to call this setFromDistinctList.

sjakobi · 2019-10-08T11:02:38Z

dhall/src/Dhall.hs

+distinctList (Type extractIn expectedIn) = Type extractOut expectedOut
+  where
+    extractOut (ListLit _ es)
+      | length es == length (seqToSet es) = seqToSet <$> traverse extractIn es


I think this would fail to catch cases where distinct Dhall Exprs map to the same Haskell value. So I believe we should check the size of the resulting Set instead of the size of the Set of Exprs.
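A small sketch of the concern, with a toy decoder standing in for extractIn. Everything here (decode, distinctSources, distinctValues) is hypothetical and only assumes the containers package; it is not the PR's code.

```haskell
import Data.Char (toLower)
import qualified Data.Set as Set

-- | Hypothetical decoder standing in for extractIn. It is deliberately not
-- injective: "A" and "a" decode to the same value.
decode :: String -> String
decode = map toLower

-- | Distinctness judged on the source terms (the approach under review).
distinctSources :: [String] -> Bool
distinctSources es = Set.size (Set.fromList es) == length es

-- | Distinctness judged on the decoded values (the suggested approach).
distinctValues :: [String] -> Bool
distinctValues es = Set.size (Set.fromList vs) == length vs
  where
    vs = map decode es
```

For the input ["A", "a"], distinctSources reports no duplicates while distinctValues does, which is the case the source-level check would miss.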

sjakobi · 2019-10-08T11:05:31Z

dhall/src/Dhall.hs

+distinctList (Type extractIn expectedIn) = Type extractOut expectedOut
+  where
+    extractOut (ListLit _ es)
+      | length es == length (seqToSet es) = seqToSet <$> traverse extractIn es


We've previously had little bugs where, after a refactoring, null was applied to an Either instead of a [] or so.

Would you mind using the respective size functions instead of length?
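The class of bug described here comes from Foldable's polymorphic length and null also accepting types such as Either. A minimal sketch (the names rightLength, leftLength and sameSize are illustrative only) contrasting that with the monomorphic size functions from containers:

```haskell
import qualified Data.Sequence as Seq
import qualified Data.Set as Set

-- | Foldable's 'length' type-checks on an Either and treats a Right as a
-- one-element container, regardless of what the payload holds.
rightLength :: Int
rightLength = length (Right "abc" :: Either String String) -- 1, not 3

-- | A Left counts as an empty container.
leftLength :: Int
leftLength = length (Left "abc" :: Either String String) -- 0

-- | The size comparison written with container-specific functions. If a
-- refactoring changed either argument to another type, this would no
-- longer compile instead of silently changing meaning.
sameSize :: Ord a => Seq.Seq a -> Set.Set a -> Bool
sameSize vSeq vSet = Set.size vSet == Seq.length vSeq
```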

sjakobi · 2019-10-08T11:11:48Z

dhall/src/Dhall.hs

+        seqToSet :: (Ord a, Foldable t) => t a -> Data.Set.Set a
+        seqToSet = Data.Set.fromList . Data.Foldable.toList
+
+        err = "Non-distinct elements in " <> Data.Text.pack (show es)


This error message could get pretty huge. Maybe we could try to find the first duplicate (Haskell) element and just report that?
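One way to find the first duplicate, sketched on plain lists with Data.Set from containers. firstDuplicate is a hypothetical helper, not the function the PR ended up with (the later iteration uses Data.List.\\ instead).

```haskell
import qualified Data.Set as Set

-- | Return the first element that repeats an earlier one, if any.
-- Walks the list once while tracking the elements seen so far.
firstDuplicate :: Ord a => [a] -> Maybe a
firstDuplicate = go Set.empty
  where
    go _ [] = Nothing
    go seen (v : vs)
        | v `Set.member` seen = Just v
        | otherwise           = go (Set.insert v seen) vs
```

Reporting only this element keeps the error message short no matter how long the list is.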

jiegillet · 2019-10-08T13:43:05Z

Next iteration, unfortunately it became more complex...

If I want to show the duplicate Haskell elements, I need to apply extractIn to the values of es. Since Validation is not a monad, I have to resort to pattern matching, and I'm not exactly sure what to do if I get a Failure; just pass it along, I guess.
Also, I need to require Show a for the Haskell element. In practice it's probably fine, but it seems kind of ugly to me.

sjakobi · 2019-10-08T15:21:32Z

dhall/src/Dhall.hs

+            esList = Data.Foldable.toList esSeq
+            esSet = Data.Set.fromList esList
+            sameSize = Data.Set.size esSet == Data.Sequence.length esSeq
+            duplicates = esList Data.List.\\ Data.Foldable.toList esSet


If we're serious about the possibility of duplicates due to distinct Dhall terms having the same Haskell representation, this isn't sufficient. deleteBy (\x y -> extractIn x == extractIn y) should work.

At this stage, esSet and esList are Haskell values already because we are inside of the case traverse extractIn es of.
Maybe the es notation isn't helping. To be honest, I don't even know what es stands for; I copied it from other functions.

Ah, yes, my misunderstanding!

es is for "expressions", I guess. You could use values or vs for the Haskell values.
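For reference, a sketch of how the list-difference expression from the diff behaves once the values are already decoded. The wrapper name duplicates mirrors the diff, but the function itself is an illustration on plain lists, not the PR's code.

```haskell
import Data.List ((\\))
import qualified Data.Set as Set

-- | Remove one occurrence of every distinct value from the list; whatever
-- remains are the surplus (duplicate) occurrences.
duplicates :: Ord a => [a] -> [a]
duplicates vList = vList \\ Set.toList vSet
  where
    vSet = Set.fromList vList
```

For [1, 1, 3] this leaves [1]. Note that (\\) deletes elements one at a time, so this is quadratic in the worst case; that is likely acceptable on an error path but worth keeping in mind.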

Gabriella439 · 2019-10-08T15:26:49Z

dhall/src/Dhall.hs

+Duplicate elements are ignored.
+-}
+set :: (Ord a) => Type a -> Type (Data.Set.Set a)
+set = fmap Data.Set.fromList . list


I think this should be the same as setFromDistinctList. My reasoning is that by default we should fail loudly if we detect anything wrong

Let's maybe keep this definition as ~setIgnoringDuplicates or setAllowingDuplicates?

Yeah, that seems reasonable to me
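The two behaviours being named here can be contrasted on plain lists. This is a sketch only: ignoringDuplicates and fromDistinctList are list-level stand-ins with made-up names and a made-up error message, not the Type-level combinators from the PR.

```haskell
import qualified Data.Set as Set

-- | Lenient variant: repeated elements are silently collapsed.
ignoringDuplicates :: Ord a => [a] -> Set.Set a
ignoringDuplicates = Set.fromList

-- | Strict variant: fail loudly if the list contains a repeated element.
fromDistinctList :: Ord a => [a] -> Either String (Set.Set a)
fromDistinctList vs
    | Set.size vSet == length vs = Right vSet
    | otherwise                  = Left "One or more duplicate elements in the list"
  where
    vSet = Set.fromList vs
```

With the input [1, 1, 3], the lenient variant returns a two-element set while the strict variant returns a Left, which matches the "fail loudly by default" reasoning above.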

sjakobi

LGTM, apart from a few more comments.

I'd be interested in feedback from @Gabriel439 and @neongreen though.

sjakobi · 2019-10-09T15:36:56Z

dhall/src/Dhall.hs

@@ -935,6 +1064,12 @@ instance Interpret a => Interpret [a] where
 instance Interpret a => Interpret (Vector a) where
    autoWith opts = vector (autoWith opts)

+instance (Interpret a, Ord a, Show a) => Interpret (Data.Set.Set a) where


Can you document how this instance handles duplicates? Maybe just reference setFromDistinctList.

There can't be any duplicates here because the source is a Set which turns into a Dhall List.

I take that as an indication that we should find better names for Interpret and Inject. ;)

(Take a look at the definition on the line below)

Oh my God, don't I feel silly!
Thank you for taking some of the blame away from me :)

sjakobi · 2019-10-09T15:37:03Z

dhall/src/Dhall.hs

+instance (Interpret a, Ord a, Show a) => Interpret (Data.Set.Set a) where
+    autoWith opts = setFromDistinctList (autoWith opts)
+
+instance (Interpret a, Hashable a, Ord a, Show a) => Interpret (Data.HashSet.HashSet a) where


sjakobi · 2019-10-09T15:40:14Z

dhall/src/Dhall.hs

+    extractOut (ListLit _ es) = case traverse extractIn es of
+        Success vSeq
+            | sameSize               -> Success vSet
+            | length duplicates == 1 -> extractError err1


I think something like this would make it cheaper to distinguish Success and Failure:

Suggested change

-            | length duplicates == 1 -> extractError err1
+            | otherwise -> extractError err
+          where err | length duplicates == 1 …

Why would it be cheaper?
I don't mind, but err1 and errN take up a few lines, so I think the code structure may be harder to read...

In the current situation it seems that we need to compute length duplicates to get the Failure constructor. I believe that if we move the length duplicates check into the computation of the error message, the caller can determine the constructor without computing length duplicates.

Maybe GHC is smart enough, but it's better not to rely on it too much.

But Failure is only accessed if it's not a Success. If it's not a success, it doesn't go in the conditions.

    f x = case x of
        Just x | error "whoops" -> 1
        Nothing -> 0

    ghci> f Nothing
    0

I'm not quite sure what you mean. extractError results in Failure too.

I'm assuming usage like

    case toMonadic (extract (setFromDistinctList elementType) expr) of
        Right s -> …
        Left _ -> … -- handle failure but ignore the error message

I'm arguing that even if expr contains duplicates, it shouldn't be necessary to force length duplicates, but the current code looks like it does.

OK, I understand what you mean now, users should be able to pattern match on Failure without extra computation. That's a good point and sorry for being so dense ^^

No worries. I wasn't very clear!
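The laziness argument from this thread can be sketched with a toy result type. Result, isFailure, eagerCheck and lazyCheck are all hypothetical stand-ins (for Validation and extractError), and the messages are invented; the sketch only shows where length duplicates gets forced.

```haskell
-- | Toy stand-in for the extraction result; the message field is lazy.
data Result = Failure String | Success

isFailure :: Result -> Bool
isFailure (Failure _) = True
isFailure Success     = False

-- | Guard-based version: deciding which alternative applies requires
-- evaluating 'length duplicates' before any constructor is returned.
eagerCheck :: Bool -> [Int] -> Result
eagerCheck sameSize duplicates
    | sameSize               = Success
    | length duplicates == 1 = Failure "One duplicate element"
    | otherwise              = Failure "Several duplicate elements"

-- | Restructured version: the Failure constructor is returned right away
-- and 'length duplicates' is only evaluated if the message is inspected.
lazyCheck :: Bool -> [Int] -> Result
lazyCheck sameSize duplicates
    | sameSize  = Success
    | otherwise = Failure err
  where
    err | length duplicates == 1 = "One duplicate element"
        | otherwise              = "Several duplicate elements"
```

A caller that only evaluates isFailure (lazyCheck False undefined) gets True without touching the duplicates list, whereas the same call through eagerCheck would force it and raise an exception.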

Gabriella439

Looks great to me! Feel free to merge whenever you are ready (or add the merge-me label)

sjakobi

LGTM, apart from one more wibble. :)

dhall/src/Dhall.hs

[dhall-lang#1395] Marshalling Set and HashSet

b654063

sjakobi reviewed Oct 7, 2019

sjakobi reviewed Oct 7, 2019

jiegillet and others added 3 commits October 8, 2019 08:46

Update dhall/src/Dhall.hs

cc091a6

Co-Authored-By: Simon Jakobi <simon.jakobi@gmail.com>

Update dhall/src/Dhall.hs

f385dcc

Co-Authored-By: Simon Jakobi <simon.jakobi@gmail.com>

Added distinctList

7c9b0e7

sjakobi reviewed Oct 8, 2019

setFromDistinctList updates

74088fa

sjakobi reviewed Oct 8, 2019

Gabriella439 reviewed Oct 8, 2019

jiegillet added 2 commits October 9, 2019 18:27

Doctests abour ordering

6c4b41a

both set and hashset can ignore or fail with duplicates

b25699e

sjakobi approved these changes Oct 9, 2019

Gabriella439 approved these changes Oct 10, 2019

jiegillet added 2 commits October 10, 2019 21:04

Comments on instances

ea23379

Moved length computation deeper

4867d54

sjakobi approved these changes Oct 10, 2019

Little wibble

e88aafd

Co-Authored-By: Simon Jakobi <simon.jakobi@gmail.com>

jiegillet added the merge me label Oct 10, 2019

Merge branch 'master' into jie-set

d7a6fc7

mergify bot merged commit 34f706e into dhall-lang:master Oct 10, 2019
