-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Splitting varinfo.jl #118
Comments
I don't mind if it's one large file or multiple small files as long as the implementation is clearly and reasonably structured. Apart from that, I'd say 400 loc for these three methods seems a bit much and might indicate that we should try to simplify the current implementation, regardless of how we organize it 😄 |
Well, we have |
In general I also prefer "multiple small files" than "single large file". I belive splittng them could also pave our way to simplify the code if possible as David suggested. |
I am ambivalent. Either file structure is fine, I suspect mostly because I don't tend to develop a lot in DynamicPPL. I can say there is a lot of code here and I would probably want to split them up if they were in one of the packages I work more in. |
I don't feel it splitting the files will necessarily simplify the code. Personally, I like to keep related code in one file, organised by sections. This makes it easy to read the code, and easy to track changes IMO. This also makes the More importantly, we want to have a readable, modular, well-tested DSL implementation. We need to keep in mind there are only a few people who can understand (let alone modify) all code in Turing. It's important to bear in mind what we want to support directly, and indirectly, e.g by providing more documentation and advanced examples. Moving forward, I would gradually prioritise readability and performance over adding new features upon requests. |
It doesn't simplify the code but it simplifies the development and debugging process. It is easier to draw a mental model of which files use which other files than it is to draw the same model with sections in a file. Switching windows or tabs is much easier and faster than scrolling 100s of locs to see another function you suspect has something to do with an error. Tests can be also organized into smaller files encouraging more granularity and exhaustiveness. We can add longer and more detailed docstrings without risking the file growing to be many 1000s of locs. Now let's take some examples from stdlib to see how they differ. This huge file https://github.com/JuliaLang/julia/blob/master/stdlib/SparseArrays/src/linalg.jl defines linalg functions for
I find smaller self-contained files more readable than a large file which defines many things that call each other. |
I don’t disagree with that small, self-contained files are more readable. I would be happy to read those files in fact. What worries me is that, splitting VarInfo doesn’t lead to self-contained files at the moment. Instead, it leads to complex dependencies between small files, which is hard to understand and hard to track for people not super familiar about the code base. I think we can and should gradually simplify the code in VarInfo, before deciding whether it should be split into many files. If we can archive the self-contained principle, i.e., minimal clear dependency between files, then I am not against splitting VarInfo. |
I completely agree.
I don't think it will cause real dependency issues. I believe what Mohamed wants is simply
where those codes can also be seperated by sections in Hong's suggestion, i.e.
What comes to end is what Mohamed said "Switching windows or tabs is much easier and faster than scrolling 100s of locs to see another function you suspect has something to do with an error." and I completely agree with it from my own experience. Modern text editors has no support to switch between sections we define but do have support to switch files we seperate. |
Code complexity also has a lot to do with the impression. I think function call patterns are complex in varinfo.jl if you consider all of them at the same time. But looking at the 400 locs of indexing and pushing, you realize that this is only 40 locs per implementation on average including docstrings and multi-line function headers. Can it be improved? Maybe. If we can improve it, can we do it without changing any other file? Most likely yes if we don't change the underlying data structure. Is it worth the time/effort? Well, spending a few weeks to go from 40 locs per implementation to 30 may or may not be the best investment of time. I will leave that for you to judge.
I think it has been established that the single file varinfo.jl is hard to understand by people who are not super familiar with the code base. In fact, it is hard to understand by people who wrote the code and left it for a few months without working on it. I think it is worth giving the smaller files a try and see if it's actually less intimidating or not.
Well, there is a chicken and egg problem here. Splitting the files encourages us to make them more self-contained. But without making them self-contained, splitting into multiple files doesn't really help. I feel like at least for indexing and linking, we can have 2 self-contained files that define 3-4 functions at most. The remaining getters and setters are used all over the place so in the reorganization PR, I just threw them all in the kitchen sink file utils.jl. Breaking down utils.jl into self-contained files can be the next step if needed. But most of the time, I find myself tinkering with indexing or linking functions and not the other getters and setters. So maybe utils.jl can be left alone but each function can be well documented. So you see, dividing the file into files makes us think about modularity more seriously. Different sections in a file...not so much. |
I am really really in favor of splitting the gigantic varinfo.jl file into multiple files. It is currently doing a whole lot of things revolving around
VarInfo
but I think it can be managed better by splitting it into multiple files, most of which are relatively self-contained and do specific tasks.For instance, the
getindex
,setindex!
andpush!
code is almost 400 lines of code that all attempt to define these 3 functions only. Everything else is helper code. If we put these 400 lines inindexing.jl
, we know that if we suspect a bug has something to do with indexing, that it's probably somewhere in this 400 line file which is a relatively self-contained rather than in a 1100 loc file. That makes development and debuggingVarInfo
-related code easier at least for me. Even if a bug has something to do with an interaction of 2 files. It is easier for me to inspect the 2 files simultaneously and understand what's happening by switching windows or tabs, than it would be if I had to scroll up then scroll down again repeatedly.What do you all think?
The text was updated successfully, but these errors were encountered: