Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataStructure organization #169

Closed
jackfoxy opened this issue Nov 30, 2012 · 58 comments
Closed

DataStructure organization #169

jackfoxy opened this issue Nov 30, 2012 · 58 comments

Comments

@jackfoxy
Copy link
Contributor

A timely recommendation for moving forward with FSharpx.DataStructures from Don Syme on this thread #165 (comment) . Jon Harrop recently wrote a lengthy response on this thread https://groups.google.com/forum/?hl=en&fromgroups=#!topic/pragmatic-functional-programming-research/K_yKAMDGJS0 to my article discussing DataStructure project organization http://jackfoxy.com/fsharpx-data-structures-discussion/

DataStructures has had an "R&D" quality, which is beneficial, so I suggest breaking DataStructures off into two projects or subprojects, FunctionalRandD and Functional. The first project would continue where the current DataStructures leaves off, but Functional would have more stringent requirements for entry and update. We could learn from failed experiments in R&D and only promote a single best-of-breed queue, heap, etc. to Functional.

I also suggest porting my project DS_Benchmark https://github.com/jackfoxy/DS_Benchmark to FunctionalRandD. It offers a consistent benchmarking infrastructure and scripting, so benchmarks can be run independently and verified. (For some reason the readme.md stopped displaying, but if you click the readme.md file it does display.) It has its architectural warts, but will benefit from other eyes making improvements.

@jackfoxy
Copy link
Contributor Author

@forki I found the Vector breaking change because it is in a project of mine I update every week, so it was no big deal for me to change code in my project. Any change I might make to Vector in its current form would be a breaking change for someone else.

I really want to move forward on DataStructure work within a new framework, as has been discussed on several threads. I don't think you have published all of your ideas on this subject. I am willing to put together the source pull requests to set-up a new structure, but you know much more about the build and NuGet. On some other thread you mentioned FSharpx 1.7. Does it make sense to institute DS organizational changes in this release?

@forki
Copy link
Member

forki commented Dec 10, 2012

Do you want to create a new nuget package "Fsharpx.Datastructures" or a complete new product?

@forki
Copy link
Member

forki commented Dec 10, 2012

I would propose you create a new project in a Datastructure branch. I then try to create an additional build for a nuget pre-release. After a while we release it together with the JSON changes as 1.7. - I always wanted to create a pre-release build for fsharpx.

@jackfoxy
Copy link
Contributor Author

I suggest the following:

  1. We leave the collections and datastructures folders "as is" for backwards compatibility, but "deprecate" them, meaning current users should migrate off of them, and the project no longer accepts changes.

  2. We create a new project / namespace / NuGet package (FSharpx.Collections.SandBox?) and copy all the existing collections and datastructures to this project. This project serves as the "Research and Development" data structures space for people interested in experimenting with data structures. We continue to accept any data structure someone in the community wants to create to this project.

  3. We create another new project / NuGet package (not sure what to call this, maybe FSharpx.Collections). This project only accepts "finished" data structures. It should only contain the "best" queue, the "best" heap, etc. and unique structures like BKTree, etc. This project should have 2 namespaces: FSharpx.Collections.Mutable and FSharpx.Collections.Persistent (or functional).

I can update the Data Structure "rules" I posted earlier with feedback I have received and any feedback you want to give and post it to the Wiki. Today I don't think there are any collections or data structures fully ready for the new "finished" project. Ideally there would be some sort of voting for structures to graduate to this project. I can create a "roadmap" of finishing touches I think are required. Community members can "finish" collections in the SandBox, and eventually a copy of a finished collection gets pulled into the finished project.

@forki
Copy link
Member

forki commented Dec 11, 2012

Ok. That sounds perfect.

@jackfoxy
Copy link
Contributor Author

@dsyme @ovatsus @mausch @DanielFabian @7sharp9 @forki and anyone else interested: pending final input from you all, FSharpx 1.7 is moving forward with the following re-organization of collections/datastructures. Naming standards are important to get right from the beginning, and impossible to correct after the fact, so please make your thoughts known on namespace names or anything else about the proposal.

  1. The Collections folder, DataStructures folder and namespace will be deprecated in 1.7. All files (except LazyList) will be copied to a new stand-alone project tentatively named "FSharpx.Collections.SandBox".

  2. FSharpx.Collections.SandBox is a stand-alone project, NuGet package, and namespace.

  3. FSharpx.Collections is another stand-alone project and NuGet package. It contains two namespaces "FSharpx.Collections.Mutable" and "FSharpx.Collections.Persistent".

  4. LazyList will have XML comments added and be moved (deleted from collections) to FSharpx.Collections.Persistent as the first collection meeting standards for this project.

  5. Community standards for FSharpx.Collections.Mutable and FSharpx.Collections.Persistent will be published to the Wiki and as a readme.md in FSharpx.Collections.

  6. FSharpx.Collections.SandBox will be for first submissions and updates of any collection/datastructure from the community. Only when a collection meets community standards will it be copied to FSharpx.Collections.

  7. Types in current Collections and DataStructures folders will be marked with a "DEPRECATED: use version in FSharpx.Collections.SandBox, FSharpx.Collections.Mutable, or FSharpx.Collections.Persistent" XML comment. Eventually (v2.0?) these folders will be deleted.

@forki
Copy link
Member

forki commented Dec 12, 2012

+1 - thanks for your work.

@7sharp9
Copy link
Member

7sharp9 commented Dec 12, 2012

Does that include adding the obsolete attribute?

On 12 Dec 2012, at 17:34, jackfoxy notifications@github.com wrote:

@dsyme @ovatsus @mausch @DanielFabian @7sharp9 @forki and anyone else interested: pending final input from you all, FSharpx 1.7 is moving forward with the following re-organization of collections/datastructures. Naming standards are important to get right from the beginning, and impossible to correct after the fact, so please make your thoughts known on namespace names or anything else about the proposal.

  1. The Collections folder, DataStructures folder and namespace will be deprecated in 1.7. All files (except LazyList) will be copied to a new stand-alone project tentatively named "FSharpx.Collections.SandBox".

  2. FSharpx.Collections.SandBox is a stand-alone project, NuGet package, and namespace.

  3. FSharpx.Collections is another stand-alone project and NuGet package. It contains two namespaces "FSharpx.Collections.Mutable" and "FSharpx.Collections.Persistent".

  4. LazyList will have XML comments added and be moved (deleted from collections) to FSharpx.Collections.Persistent as the first collection meeting standards for this project.

  5. Community standards for FSharpx.Collections.Mutable and FSharpx.Collections.Persistent will be published to the Wiki and as a readme.md in FSharpx.Collections.

  6. FSharpx.Collections.SandBox will be for first submissions and updates of any collection/datastructure from the community. Only when a collection meets community standards will it be copied to FSharpx.Collections.

  7. Types in current Collections and DataStructures folders will be marked with a "DEPRECATED: use version in FSharpx.Collections.SandBox, FSharpx.Collections.Mutable, or FSharpx.Collections.Persistent" XML comment. Eventually (v2.0?) these folders will be deleted.


Reply to this email directly or view it on GitHub.

@jackfoxy
Copy link
Contributor Author

@7sharp9 consider the obsolete attribute included (pending further comment).

@ovatsus
Copy link

ovatsus commented Dec 12, 2012

+1

@ghost
Copy link

ghost commented Dec 13, 2012

+1

I don't think "persistent" works as a name (e.g. due to confusion with "persistence" in the sense of automatic storage of the collection to disk). Perhaps FSharpx.Collections or FSharpx.Collections.Immutable

@jackfoxy
Copy link
Contributor Author

"Immutable" it is. "Functional" has also been suggested ...

@ghost
Copy link

ghost commented Dec 13, 2012

I don't favour "Functional" because, strictly speaking, these data structures uually have little to do with functional programming in the "lambda" and "function value" sense of the term - for example, they would make sense in a first-order programming environment where functions are not first-class values.

@jackfoxy
Copy link
Contributor Author

"Immutable" it is, that makes sense.

@DanielFabian
Copy link
Contributor

+1

Thanks for the work

@forki
Copy link
Member

forki commented Dec 13, 2012

"Persistent" is used in clojure and scala world. We could link to: http://en.wikipedia.org/wiki/Persistent_data_structure

I am personally not sure if the distinction makes that much sense, but the vector is not fully immutable. It is often constructed by a TransientVector which is mutable. After the first use it is flagged as read-only and allows only to create new vectors.

@ovatsus
Copy link

ovatsus commented Dec 13, 2012

Why don't we just leave them directly under FSharpx.Collections, and just use the sub-namespace for the mutables?

@forki
Copy link
Member

forki commented Dec 13, 2012

yep that sounds like a good idea.

@letrec
Copy link

letrec commented Dec 13, 2012

Does it really worth emphasising that much mutable vs immutable? Wouldn't it be sufficient to put a comment for each collection?

@forki
Copy link
Member

forki commented Dec 13, 2012

Actually it's a rather theoretical discussion. AFAIK atm we don't have
mutable collections in fsharpx. We have the TransientVector but it's
internal.
Am 13.12.2012 13:10 schrieb "letrec" notifications@github.com:

Does it really worth emphasising that much mutable vs immutable?
Wouldn't it be sufficient to put a comment for each collection?


Reply to this email directly or view it on GitHubhttps://github.com//issues/169#issuecomment-11332215.

@jackfoxy
Copy link
Contributor Author

By visual inspection Collections/CircularBuffer and DataStructures/RingBuffer appear to be mutable to me. There has been discussion on other threads of adding HashSet from PowerPack, which is mutable.

Has the number of files in the Collections folder shrunk? (I had not looked closely at it in the last 3 weeks or so.) It seems like it almost only contains extension methods now.

Since LazyList is now an FSharpx collection/datastructure, does it not make sense to incorporate its module method "mapAccum" directly into its module instead of having it in Collections.fs? Also some "helper" methods datastructures/Infrastructure.fs.

@ovatsus
Copy link

ovatsus commented Dec 13, 2012

I think the HashSet in PowerPack is the same as the HashSet in System.Collections.Generic and is just there from the time it wasn't still included in the BCL

@jackfoxy
Copy link
Contributor Author

@ovatsus You're right. I meant to refer to HashMultiMap in PP, which is mutable.

@ovatsus
Copy link

ovatsus commented Dec 13, 2012

That would be more interesting, but I would first rename it to MultiMap and do a little bit of interface cleanup to be more in line with the current Map in F#, at least removing the deprecated members

@jackfoxy
Copy link
Contributor Author

@dsyme If you had to do it over again, would you distinguish mutable/immutable in the Microsoft.FSharp.Collections namespace?

@jackfoxy
Copy link
Contributor Author

@ovatsus says "Why don't we just leave them directly under FSharpx.Collections, and just use the sub-namespace for the mutables?"

I'm liking this idea more. The FSharpx.Core.Collections folder looks smaller and more organized than I remember it. Still all of the collections in it (LazyList, ByteString, CircularBuffer, and NonEmptyList) are in the FSharpx namespace, so changing them to FSharpx.Collections namespace (and in the case of CircularBuffer FSharpx.Collections.Mutable namespace) is a breaking change.

@ovatsus
Copy link

ovatsus commented Dec 13, 2012

I tend to like simpler solutions, I really don't like too many nested namespaces. About breaking changes, I honestly don't think we should worry too much about it and leave old deprecated versions, etc... I don't think there are that many people using them (I might be wrong), and it's an easy fix, you can always get the previous version on NuGet. FSharpx is in a relatively early stage, it's not an old mature library (I think the fact we're reorganizing stuff and discussing namespaces proves that), it's better to get things right at this phase than to maintain stability.

@jackfoxy
Copy link
Contributor Author

I'm a minimalist too. I have to admit I had not looked at the collections folder lately, and my memory of it was of a more mixed-up state. Seems very organized now. If @forki agrees I will make namespace changes before the end of the weekend.

Still planning to spin-off datastructures into a stand-alone project and Collections.SandBox namespace.

@forki
Copy link
Member

forki commented Dec 14, 2012

yes please.

@jackfoxy
Copy link
Contributor Author

#190

Almost a non-breaking change, by adding AutoOpen of new namespaces to FSharpx.Core AssemblyInfo.

But strangely some files in CSharpTests had to have using FSharpx.Collections added when previously they worked without using FSharpx; Something about C# interop I do not understand.

@forki
Copy link
Member

forki commented Dec 20, 2012

@jackfoxy
Copy link
Contributor Author

Yes, I saw that. I'm planning on running some benchmarks.

@jackfoxy jackfoxy reopened this Jan 2, 2013
@jackfoxy
Copy link
Contributor Author

jackfoxy commented Jan 2, 2013

I must have accidently closed this issue. Appologies.

I opened a discussion thread https://groups.google.com/forum/?hl=en&fromgroups#!topic/pragmatic-functional-programming-research/CtcBeLepCDkon on naming data structure functions. Please contribute to the discussion.

@jackfoxy
Copy link
Contributor Author

jackfoxy commented Jan 2, 2013

@forki I've been benchmarking pre-release System.Collections.Immutable.Queue against the Queue I have ready for FSharpx.Collections. The one I have is just as fast (or slightly faster in some cases) and has more functions.

@forki
Copy link
Member

forki commented Jan 3, 2013

Feel free to replace it.

@7sharp9
Copy link
Member

7sharp9 commented Jan 3, 2013

@jackfoxy Which queue were you benchmarking against? The mutable array
based one in System.Collections?

On 3 January 2013 09:43, Steffen Forkmann notifications@github.com wrote:

Feel free to replace it.


Reply to this email directly or view it on GitHubhttps://github.com//issues/169#issuecomment-11838635.

@jackfoxy
Copy link
Contributor Author

jackfoxy commented Jan 3, 2013

@7sharp9 In total I benchmarked 10 queues, including the mutable System.Collections.Queue and System.Collections.Concurrent.Queue, which of course won almost every benchmark.

But the interesting one was pre-release System.Collections.Immutable.Queue. I think a modified version of the immutable FSharpx.Datastructures.BatchedQueue is actually better. I will post it as FSharpx.Collections.Queue as soon as I resolve the function naming issue.

I recently started a contract job which is taking a lot of my time, but I will publish the full benchmarks in a couple days.

@jackfoxy
Copy link
Contributor Author

jackfoxy commented Jan 5, 2013

I've been meaning to write a more coherent exposition on this topic, here it finally is http://jackfoxy.com/semantics-and-list-like-data-structures

@jackfoxy
Copy link
Contributor Author

@forki My proposal for the next step evolving FSharpx.Collections:

  1. Create an FSharpx.Collections.Test project and move all the collections tests to it from FSharpx.Tests (purely an organizational change)

  2. Create a FSharpx.Collections.Experimental project and copy all the Datastructures into this project. Mark all current Datastructures as obsolete (and maybe when FSharpx 2.0 is released the Datastructures folder gets deleted). This "Experimental" project would be a separate NuGet package. (I earlier suggested naming this "sandbox", but I think "experimental" has established itself as a better name.)

I can make both of these changes, but I still have not come up to speed with Fake or NuGet, so I realize this makes more work for you. What do you think?

@forki
Copy link
Member

forki commented Jan 23, 2013

You can do it. I'll fix the build if needed.

@jackfoxy
Copy link
Contributor Author

Time for interested people to let me know what they think:

It occurred to me we can keep the Git history of Datastructure test and collection test files if I refer to them in their current folders from their new projects. However, if no one objects, I will create new folders for the new test projects (FSharpx.Collections.Tests and FSharpx.Collections.Experimental.Tests) and we will lose history for the test files involved. Let me know if you object to this, or have another idea of how to preserve the history.

@mausch
Copy link
Member

mausch commented Jan 24, 2013

If by "losing history" you mean losing the ability to run a git blame on those files and get the history prior to this hypothetical move, then I don't really mind in this case. IMHO go ahead and do it...

@jackfoxy
Copy link
Contributor Author

@mausch Yeah, that's what I mean by "losing history".

@forki
Copy link
Member

forki commented Jan 24, 2013

Actually even this can be analysed (with some weird tricks) - but I think it's really no problem.

@jackfoxy
Copy link
Contributor Author

@forki NuGet FSharpx.Collections.Experimental ships without FSharpx.Core. That may be by design. I can think of good reasons not to bundle Core.dll in the Experimental NuGet package. I really do not not which way is better...anyway, thanks for your work in completing the Datastructure re-organization!

@forki
Copy link
Member

forki commented Jan 29, 2013

we can set up a dependency on the FSharpx.Core package if that helps?!

@jackfoxy
Copy link
Contributor Author

@forki That's probably a good idea.

@forki
Copy link
Member

forki commented Jan 29, 2013

Seems this is already the case see http://nuget.org/packages/FSharpx.Collections.Experimental/

@jackfoxy
Copy link
Contributor Author

...hmmm. Good old NuGet! There is no way to force NuGet to include Core in the Experimental package? I wouldn't worry about it too much. If you do install Experimental without Core the error message is pretty explicit about what needs fixing, and I assume if there were a version mis-match between Experimental and Core that would also be apparent. It is an "experimental" project, after all.

@forki
Copy link
Member

forki commented Jan 29, 2013

It would be easy to include the Core.dll in the package but this might result in conflicts when some installs the FSharpx.Core.dll package. Actually I think the current solution is the best (modulo nuget bugs).

@jackfoxy
Copy link
Contributor Author

Yeah, I agree.

@7sharp9
Copy link
Member

7sharp9 commented Jan 29, 2013

Just a quick query, is fsharpx portable profile friendly?

On 29 Jan 2013, at 16:56, jackfoxy notifications@github.com wrote:

Yeah, I agree.


Reply to this email directly or view it on GitHub.

@jackfoxy
Copy link
Contributor Author

DataStructures and Collections have been re-organized into namespaces FSharpx.Collections, FSharpx.Collections.Mutable, and FSharpx.Collections.Experimental. This issue is complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants