add gather, count, fix #17, fix isNumber, fix #19 #18
Merged
Previously our definition of what constitutes a number was off. But why don't we use parseutils.parseBiggestFloat? I assume because originally this thing was supposed to be simple? Or did we have cases which were not covered by parseBiggestFloat?
The previous definition of the lift templates was problematic if one wanted to lift a proc only locally (which may be necessary in some circumstances, e.g. when lifting a proc for a unit test). For that case, set the `toExport` static bool argument to false. Also exports the string templates for the user to use.
Vindaar force-pushed the addGatherAndFix17 branch from 06b41f8 to fdea91e on November 7, 2019 10:57
Rebased onto current master.
Phew, this proved more work / thinking than I thought it would.
NOTE: I decided against using `parseBiggestFloat`, because while it would have been easier (and possibly faster) it parses floats from a string. That does not mean the whole string is a valid number. We could probably have gotten away with something like:
```nim
var tmp: float64
# `parseBiggestFloat` returns the number of characters it consumed
let numParsed = parseBiggestFloat(s, tmp, 0)
if numParsed > 0 and numParsed == s.len:
  result = true
```
but this still wouldn't have accounted for allowed spaces at the end of the string... Well, I guess part of me wanted to see how hard it would be to write a float checker (which probably still has bugs). And well, we don't even want the number from this proc... :)
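For comparison, the `parseBiggestFloat` route with trailing spaces allowed could be sketched roughly as below. This is not the checker added in this PR; `looksLikeNumber` is a hypothetical name, and the sketch assumes that only plain spaces may follow the number.

```nim
import std/parseutils

proc looksLikeNumber(s: string): bool =
  ## Hypothetical helper (not part of this PR): true if `s` is a float
  ## literal that is followed by nothing but spaces.
  var tmp: BiggestFloat
  # number of characters consumed, 0 if no float could be parsed
  let numParsed = parseBiggestFloat(s, tmp)
  if numParsed == 0:
    return false
  # everything after the parsed prefix has to be a space
  for i in numParsed ..< s.len:
    if s[i] != ' ':
      return false
  result = true
```

Note that this still computes the float value only to throw it away, which is one of the reasons given above for a dedicated checker.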
Fortunately the CSV parser `rowEntry` proc returns a `var string`, which allows us to remove the white space in place before we assign it. This finally gets rid of our darn string copy bottleneck. Yay!
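The idea of trimming through a `var string` without allocating a copy can be illustrated with a small sketch. `stripTrailingInPlace` is a hypothetical name, and the sketch only handles trailing spaces and tabs; it is not the code from this commit.

```nim
proc stripTrailingInPlace(s: var string) =
  ## Hypothetical helper: drops trailing spaces and tabs by shrinking
  ## the string itself, so no new string is allocated.
  var n = s.len
  while n > 0 and s[n - 1] in {' ', '\t'}:
    dec n
  s.setLen(n)
```

Because the argument is `var`, the caller's string is modified directly, which is what makes a `var string` return value useful here.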
`count` is in principle a convenience proc for `group_by` followed by `summarize`.
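As a rough illustration of what "group, then summarize by counting" means, here is a sketch on a plain `seq[string]` using the standard library. It does not use the data frame type of this project; `countValues` is a hypothetical name.

```nim
import std/tables

proc countValues(col: seq[string]): CountTable[string] =
  ## Hypothetical stand-in for `count` on a single column: group equal
  ## values together and record how many elements each group has.
  result = toCountTable(col)
```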
Otherwise the `ids` set will be printed completely. Since we often have the set full this "breaks" printing of scales.
Since `marker` is now a field of a ginger.Style, should hash it.
Adds a proc to assign the value `val` to the position `idx` in column `k`.
Now `f{}` works both for cases where a called function is a field of an object and also if simply a function is available under an identifier and not a raw proc. This allows such a formula:
```nim
f{ s.col ~ s.trans(s.col) }
```
to compile (and work!), mapping `s.col` to its transformed values.
These are required to decide which kind of plot to draw, i.e. whether it is a discrete or continuous plot along each axis. Note that discrete in y direction is not implemented yet!
Previously this forbade the application of functions more complicated than simply e.g. `tan("cty")`.
The `callHistogram` proc handles the binning related fields of the Geom object and takes the correct information as input. Also takes care to assign the `dcKind*` fields of the FilledGeom as well as adding the `binWidths` field to the resulting DF for the binning stat.
Previously wasn't allowed to draw ticks with relative coordinates, but for discrete ticks it makes sense, since the labels aren't numerical anyways and we know perfectly well where the ticks are going to be.
The assertion that the resulting `numX` should be the same as the input geom.numBins is bad due to the additional ways to set the number of bins beyond just the `numBins` geom field (binWidth, breaks etc). Then we also don't want to assign the `numX` back to the geom, since that is supposed to store the value the user assigned originally. Later we're going to use `numX` only anyways.
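A small sketch of why the resulting number of bins can differ from `numBins`: once a bin width is given, it determines the count. `BinSpec` and `binEdges` are hypothetical names for illustration only, and the rule "bin width wins over `numBins`" is an assumption, not necessarily what the library does.

```nim
import std/math

type
  BinSpec = object
    numBins: int     # what the user asked for originally
    binWidth: float  # 0.0 means "not set" in this sketch

proc binEdges(lo, hi: float, spec: BinSpec): seq[float] =
  ## Hypothetical helper: returns the bin edges for the range [lo, hi].
  ## If a bin width is set it decides the number of bins.
  let useWidth = spec.binWidth > 0.0
  let width =
    if useWidth: spec.binWidth
    else: (hi - lo) / float(spec.numBins)
  let n =
    if useWidth: int(ceil((hi - lo) / width))
    else: spec.numBins
  for i in 0 .. n:
    result.add lo + float(i) * width
```

With `numBins = 30` and `binWidth = 2.5` on the range 0 to 10 this yields 4 bins, so asserting that the result equals `numBins` would fail even though nothing is wrong.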
This allows complete control both over the binning for the stat_bin-like case and over the interpretation of the data in the stat identity case (in case the user hands over a prebinned DF and wants geom_point overlaid onto the center of the bins).
Also raises an exception if an unsupported geom is to be post processed instead of just ignoring it. `geom_point` can now be binned due to changes to the binning related fields of the Geom object. Information is available and accessible via geom_point proc.
Instead of simply having one proc for each geom which has to deal with all sorts of different cases we now divide the logic rather by:
- discreteness
- stat kinds
- and only then geoms

This allows us to handle all sorts of different options in a more streamlined way, especially regarding adding more options in the future (at least I hope so). Though the code could probably still be compactified some more, but well. Fine for now.
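The ordering of that dispatch (discreteness first, stat kind second, geom last) can be sketched with nested `case` statements. All type, enum, and proc names below are made up for illustration and are not taken from the actual code; which combinations are rejected is also an assumption.

```nim
type
  DiscreteKind = enum
    dcDiscrete, dcContinuous
  StatKind = enum
    stIdentity, stBin, stCount
  GeomKind = enum
    gkPoint, gkBar, gkHistogram

proc describe(dc: DiscreteKind, st: StatKind, geom: GeomKind): string =
  ## Hypothetical dispatcher: decides on discreteness first, then on the
  ## stat kind, and only looks at the geom at the very end.
  case dc
  of dcDiscrete:
    case st
    of stIdentity: result = "discrete/identity"
    of stCount: result = "discrete/count"
    of stBin:
      raise newException(ValueError, "binning needs continuous data")
  of dcContinuous:
    case st
    of stIdentity: result = "continuous/identity"
    of stBin: result = "continuous/bin"
    of stCount:
      raise newException(ValueError, "counting needs discrete data")
  # the geom only matters once the data handling has been decided
  result.add " -> " & $geom
```

Structured like this, a new geom only touches the last step, while a new stat kind adds one branch per discreteness case.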
We still have to add facet plots back in. This will simply be done by having a facet field in the FilledScales and if facet is set in the GgPlot object we'll group_by the desired field before the `postProcessScales` step.
Have to extract the ginger view from the PlotView object now, instead of receiving it from ggcreate directly. Pretty surprised that the tests still passed though!
If a scale does not exist, i.e. the user wants to set some fixed value, we shouldn't use `select`. Thus use `getIdentityData` here as well for the DF we hand to `fillScaleImpl`.
Test compares the resulting PDF files line by line except the line containing the creation date.
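Such a comparison could look roughly like the following sketch, which works on the file contents as strings. The proc name is hypothetical, and it assumes the date line can be recognized by the substring `CreationDate`; the actual test may identify that line differently.

```nim
import std/strutils

proc sameExceptCreationDate(a, b: string): bool =
  ## Hypothetical helper: compares two texts line by line and skips
  ## lines that contain "CreationDate" in both inputs.
  let la = a.splitLines()
  let lb = b.splitLines()
  if la.len != lb.len:
    return false
  for i in 0 ..< la.len:
    if "CreationDate" in la[i] and "CreationDate" in lb[i]:
      continue
    if la[i] != lb[i]:
      return false
  result = true
```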
Vindaar force-pushed the addGatherAndFix17 branch 3 times, most recently from f758336 to 374f1b7 on November 16, 2019 06:18
Apparently the order is machine specific after all, hehe.
Vindaar force-pushed the addGatherAndFix17 branch from 3cf821c to d88cdd3 on November 16, 2019 07:26
Vindaar force-pushed the addGatherAndFix17 branch from 2970bf8 to becf674 on November 16, 2019 08:13
See update below
This PR first of all changes our definition of what constitutes a number, since our definition was broken (question remains why we still use that proc, instead of `parseutils.parseBiggestFloat`).
Then adds `gather`, so that the plot mentioned in issue #17 can be written as:
instead of the handling with `select` and `bind_rows`.
Finally it will fix #17 properly.
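To illustrate the wide-to-long reshaping that a `gather` performs, here is a sketch on a plain table of float columns. It does not use the data frame type or the `gather` signature from this PR; `gatherCols` and `LongRow` are hypothetical names.

```nim
import std/tables

type LongRow = tuple[key: string, value: float]

proc gatherCols(df: OrderedTable[string, seq[float]],
                cols: seq[string]): seq[LongRow] =
  ## Hypothetical helper: stacks the given columns on top of each other
  ## and remembers in `key` which column each value came from.
  for c in cols:
    for v in df[c]:
      result.add((key: c, value: v))
```

The resulting key column can then be mapped to an aesthetic such as color, which avoids selecting each column separately and binding the rows afterwards.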
- `gather`
- `count` for DF
- put facet wrap back in (will be done in another PR soon)
- write new tests for new features (for now done via plot comparison)
- `geom_point` w/ `stat="identity"` work

Update:
This turned out to be a major rewrite of a big chunk of the `ggplotnim.nim` file. Essentially the calculation part of the code and the drawing portion are better split now.
It also contains a (hastily written) discussion of a performance bottleneck I encountered during the rewrite. While the specific case of that has been fixed, it seems to be a bigger problem, which is GC related. If weird performance regressions happen, attempt to compile the code with `--gc:boehm`, which does not suffer from the same slowdowns.
Taken from the commit with the major changes:
Update 2
Well, after the last change, which rewrote the data processing I then realized that the drawing logic wasn't really made to handle the new data that was available. So I rewrote that code too.
Now the geom that's being drawn is almost decoupled from the calculations that happen for the data, so that e.g. points can be used to represent (or add to) a bar plot, add points to a binned plot etc.
More recipes for all these cases will be added.