Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1395] Marshalling Set and HashSet #1405

Merged
merged 11 commits into from
Oct 10, 2019
Merged

Conversation

jiegillet
Copy link
Collaborator

As discussed in #1395.

So far I have only implemented a version where duplicates are ignored, Data.Set doesn't provide tools to check for duplicates, and the notion of set doesn't really exist in dhall so I'm still a bit unsure about that...

Copy link
Collaborator

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me. We should definitely offer these Types.

If however we decide to add variants that don't accept duplicates, I think we need to discuss the naming of the Types, and the question of which variant the Interpret instances should use.

dhall/src/Dhall.hs Outdated Show resolved Hide resolved
dhall/src/Dhall.hs Outdated Show resolved Hide resolved
@@ -1659,6 +1690,9 @@ instance Inject a => Inject (Vector a) where
instance Inject a => Inject (Data.Set.Set a) where
injectWith = fmap (contramap Data.Set.toList) injectWith

instance Inject a => Inject (Data.HashSet.HashSet a) where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should sort the elements… 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In math, sets have no notion of ordering so I would tend towards not sorting it...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having second thoughts, maybe we should sort it... Extra opinion @Gabriel439?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, this is something that we could change later on. For now, maybe just point out that there's no sorting, and add a doctest that demonstrates it.

Copy link
Collaborator Author

@jiegillet jiegillet Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've done one for HashSet (not pushed yet), but what about Set? Data.Set. toList and Data.Set.toAscList are both O(n), I might as well use toAscList right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Data.Set.toList is defined via toAscList. toAscList is more explicit though. 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Good.

jiegillet and others added 3 commits October 8, 2019 08:46
Co-Authored-By: Simon Jakobi <simon.jakobi@gmail.com>
Co-Authored-By: Simon Jakobi <simon.jakobi@gmail.com>
@jiegillet
Copy link
Collaborator Author

I added distinctList, the implementation works and I'm happy to discuss the name.

In my opinion the Interpret instance should use set because it's the standard way of dealing with duplicates.

Copy link
Collaborator

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on the idea of just comparing the numbers of elements!

> Non-distinct elements in fromList [NaturalLit 1,NaturalLit 1,NaturalLit 3]
>
-}
distinctList :: (Ord a) => Type a -> Type (Data.Set.Set a)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to call this setFromDistinctList.

distinctList (Type extractIn expectedIn) = Type extractOut expectedOut
where
extractOut (ListLit _ es)
| length es == length (seqToSet es) = seqToSet <$> traverse extractIn es
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would fail to catch cases where distinct Dhall Exprs map to the same Haskell value. So I believe we should check the size of the resulting Set instead of the size of the Set of Exprs.

distinctList (Type extractIn expectedIn) = Type extractOut expectedOut
where
extractOut (ListLit _ es)
| length es == length (seqToSet es) = seqToSet <$> traverse extractIn es
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've previously had little bugs where, after a refactoring, null was applied to an Either instead of a [] or so.

Would you mind using the respective size functions instead of length?

seqToSet :: (Ord a, Foldable t) => t a -> Data.Set.Set a
seqToSet = Data.Set.fromList . Data.Foldable.toList

err = "Non-distinct elements in " <> Data.Text.pack (show es)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message could get pretty huge. Maybe we could try to find the first duplicate (Haskell) element and just report that?

@jiegillet
Copy link
Collaborator Author

jiegillet commented Oct 8, 2019

Next iteration, unfortunately it became more complex...

If I want to show the Haskell duplicate elements, I need to apply extractIn on the values of es and since Validation is not a monad, I have to resort to pattern matching, and I'm not exactly sure what to do if I get a Failure, just pass it along I guess.
Also, I need to ask for Show a for the Haskell element. In practice it's probably fine but it's kind of ugly to me.

esList = Data.Foldable.toList esSeq
esSet = Data.Set.fromList esList
sameSize = Data.Set.size esSet == Data.Sequence.length esSeq
duplicates = esList Data.List.\\ Data.Foldable.toList esSet
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're serious about the possibility of duplicates due to distinct Dhall terms having the same Haskell representation, this isn't sufficient. deleteBy (\x y -> extractIn x == extractIn y) should work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage, esSet and esList are Haskell values already because we are inside of the case traverse extractIn es of.
Maybe the es notation isn't helping, to be honest I don't even know what es stands for, I copied it from other functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, my misunderstanding!

es is for "expressions", I guess. You could use values or vs for the Haskell values.

Duplicate elements are ignored.
-}
set :: (Ord a) => Type a -> Type (Data.Set.Set a)
set = fmap Data.Set.fromList . list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the same as setFromDistinctList. My reasoning is that by default we should fail loudly if we detect anything wrong

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's maybe keep this definition as ~setIgnoringDuplicates or setAllowingDuplicates?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that seems reasonable to me

Copy link
Collaborator

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, apart from a few more comments.

I'd be interested in feedback from @Gabriel439 and @neongreen though.

@@ -935,6 +1064,12 @@ instance Interpret a => Interpret [a] where
instance Interpret a => Interpret (Vector a) where
autoWith opts = vector (autoWith opts)

instance (Interpret a, Ord a, Show a) => Interpret (Data.Set.Set a) where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document how this instance handles duplicates? Maybe just reference setFromDistinctList.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can't be any duplicates here because the source is a Set which turns into a Dhall List.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take that as an indication that we should find better names for Interpret and Inject. ;)

(Take a look at the definition on the line below)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my God, don't I feel silly!
Thank you for taking some of the blame away from me :)

instance (Interpret a, Ord a, Show a) => Interpret (Data.Set.Set a) where
autoWith opts = setFromDistinctList (autoWith opts)

instance (Interpret a, Hashable a, Ord a, Show a) => Interpret (Data.HashSet.HashSet a) where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

extractOut (ListLit _ es) = case traverse extractIn es of
Success vSeq
| sameSize -> Success vSet
| length duplicates == 1 -> extractError err1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like this would make it cheaper to distinguish Success and Failure:

Suggested change
| length duplicates == 1 -> extractError err1
| otherwise -> extractError err
where err | length duplicates == 1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be cheaper?
I don't mind, but err1 and errN take up a few lines, I think the structure code may be harder to read...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current situation it seems that we need to compute length duplicates to get the Failure constructor. I believe that if we move the length duplicates check into the computation of the error message, the caller can determine the constructor without computing length duplicates.

Maybe GHC is smart enough, but it's better not to rely on it too much.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Failure is only accessed if it's not a Success. If it's not a success, it doesn't go in the conditions.

f x = case x of
    Just x | error "whoops" -> 1
    Nothing -> 0

ghci> f Nothing
0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what you mean. extractError results in Failure too.

I'm assuming usage like

case toMonadic (extract (setFromDistinctList elementType) expr) of
  Right s -> …
  Left _ -> … -- handle failure but ignore the error message

I'm arguing that even if expr contains duplicates, it shouldn't be necessary to force length duplicates, but the current code looks like it does.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand what you mean now, users should be able to pattern match on Failure without extra computation. That's a good point and sorry for being so dense ^^

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I wasn't very clear!

Copy link
Collaborator

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Feel free to merge whenever you are ready (or add the merge-me label)

Copy link
Collaborator

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, apart from one more wibble. :)

dhall/src/Dhall.hs Outdated Show resolved Hide resolved
Co-Authored-By: Simon Jakobi <simon.jakobi@gmail.com>
@mergify mergify bot merged commit 34f706e into dhall-lang:master Oct 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants