Add random sampling and improve performance of Parallel Coordinates viz #420

Merged
8 commits merged into DistrictDataLabs:develop on May 23, 2018

Conversation

@thekylesaurus (Contributor)

Fixes #59 (partially)

Allows for random sampling via a "shuffle" parameter, consistent with sklearn's train/test split, for instance.

Improves performance by at least an order of magnitude for several test data sets.

@DistrictDataLabs/team-oz-maintainers

@thekylesaurus (Contributor Author)

thekylesaurus commented May 14, 2018

The performance analysis shown below is based on the iris data set, where replicates were used to increase the data set size (as indicated by the factor on the x-axis).

[Images: performance benchmark plots (capture2, capture)]

@thekylesaurus (Contributor Author)

Unfortunately, the approach used results in a subtly modified style. There is no longer alpha addition intra-class. The argument could be made that the new style reduces intra-class information; or, conversely, that it reduces unnecessary busyness. Is this style change going to be acceptable? If not, I will abandon this request and simply submit a version with only the random subsampling added.

[Images: current_style, new_style]

@bbengfort (Member)

@thekylesaurus awesome PR, thank you so much for taking a look at this! I'm very impressed by the speed up graphs and the thorough presentation, you definitely addressed questions I may have had right off the bat.

I'm going to start my review now, and I'll dive into the alpha issue as I review. In the meantime, the tests aren't passing due to image closeness errors. It appears that your PR does not include updated baseline images for the sampling method -- my guess is that you were omitting them until the style question was resolved. However, it is fairly simple to synchronize the actual and baseline images, if you'd like to follow these steps:

  1. Run the tests: pytest tests/test_features/test_pcoords.py
  2. In the root directory run the sync: python -m tests.images tests/test_features/test_pcoords.py

This will move the test case images generated by the tests in tests/actual_images to the baseline comparison directory, tests/baseline_images. You can then add the baseline images (the actual images are git ignored), commit and push them.

@bbengfort (Member)

Oh, make sure you have the test dependencies installed; pip install -r tests/requirements.txt!

@bbengfort (Member) left a comment

This is an excellent PR and a welcome contribution, thank you so much! I had a few comments in the code that hopefully will be easy enough to address during the sprints today. Other than that, there are some tests that should be written ... or at least stubbed out - both for pcoords and for the new is_series util.

I'm sensitive to the fact that you're traveling - do you have time to get these things sorted, e.g. are you at the sprints again today? If not, we can work something out to ensure your contribution gets included in a timely manner; just let me know!

@@ -218,7 +218,7 @@ def fit(self, X, y=None, **kwargs):
# Store the classes for the legend if they're None.
if self.classes_ is None:
# TODO: Is this the most efficient method?
-            self.classes_ = [str(label) for label in set(y)]
+            self.classes_ = [label for label in set(y)]
@bbengfort (Member)

y may contain integers for already label encoded vectors; is there a reason you removed the forced string conversion?

@thekylesaurus (Contributor Author)

I didn't understand why you would want to cast integers (or even floats) to string. It upsets the mapping between self.classes_ and y. Removing this cast makes certain operations easier downstream.

Nonetheless, I admittedly don't know all of the repercussions for the various children of DataVisualizer. I'll revert and handle this inside of ParallelCoordinates.draw()
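For illustration, a minimal sketch (with made-up values) of how the forced str cast breaks type consistency between self.classes_ and an integer-encoded y:

    import numpy as np

    y = np.array([0, 1, 1, 2, 0])                    # already label-encoded target

    classes_cast = [str(label) for label in set(y)]  # ['0', '1', '2']
    classes_raw = [label for label in set(y)]        # [0, 1, 2]

    # Downstream lookups against the original integer labels only work
    # when the cast is removed.
    print(y[0] in classes_cast)   # False -- '0' != 0
    print(y[0] in classes_raw)    # True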

@bbengfort (Member)

I appreciate that - these are class labels, not classes, and are generally used for display on the image. We do have a lot to figure out with this; it's a fairly large topic that includes sequential vs. discrete target values (#73), using label encoders, etc. Scikit-learn also has some check_target utilities that we'd like to employ more meaningfully.

I think if you handle this inside of ParallelCoordinates.draw() you'll be setting a pretty good example.

@thekylesaurus (Contributor Author)

Good deal. It's completed and will be sent in the next push.

Attributes
--------

n_samples : int
@bbengfort (Member)

Because this attribute is "learned" on fit, I recommend that we call it n_samples_ -- the underscore suffix is used in scikit-learn to distinguish between learned parameters and hyperparameters.

@thekylesaurus (Contributor Author)

Totally agree. But self.n_samples existed prior to my contribs; I didn't know if this could break anything that inherits from ParallelCoordinates.

@bbengfort (Member)

Ah, gotcha - yeah, I see it now. I checked and actually nothing subclasses ParallelCoordinates, so it should be fine (you can run make uml if you have pyreverse installed to generate these images) - though I suspect there is a test that checks that property that might have to be updated.

[Image: classes_yellowbrick (UML class diagram)]

@thekylesaurus (Contributor Author)

There actually isn't a test that references that field. But I'll add something.

@bbengfort (Member)

Awesome thanks, in that case, it must just be left-over technical debt; but I like having it.

# Visual Parameters
self.show_vlines = vlines
self.vlines_kwds = vlines_kwds or {
'linewidth': 1, 'color': 'black'
}

# Defaults for derived values
self.n_samples = None
@bbengfort (Member)

If this parameter is learned on fit, it does not need a default specified in __init__, and even if we're not sampling, it can just contain the number of instances which might be handy information.

@thekylesaurus (Contributor Author)

PyCharm gets mad when you add a member variable outside of __init__. I tend to agree with this PEP 8 convention - adding member variables outside of __init__ can make it difficult to track different attributes. Having everything allocated in __init__ helps interpretability IMO.

@thekylesaurus (Contributor Author)

Nonetheless, I'll remove.

@bbengfort (Member)

That's certainly true, and in any other project, I would agree. The issue is that scikit-learn uses the existence of these attributes for check_fitted and other fit-related validation.
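As a minimal sketch of that convention: scikit-learn's check_is_fitted looks for trailing-underscore attributes that only exist after fit has run (the estimator below is a toy, not Yellowbrick code):

    from sklearn.base import BaseEstimator
    from sklearn.exceptions import NotFittedError
    from sklearn.utils.validation import check_is_fitted


    class ToySampler(BaseEstimator):

        def __init__(self, sample=1.0):
            self.sample = sample                         # hyperparameter: set in __init__

        def fit(self, X, y=None):
            self.n_samples_ = int(len(X) * self.sample)  # learned attribute: set in fit
            return self


    est = ToySampler(sample=0.5)
    try:
        check_is_fitted(est, "n_samples_")               # attribute absent -> not fitted
    except NotFittedError:
        print("not fitted yet")

    est.fit([[1], [2], [3], [4]])
    check_is_fitted(est, "n_samples_")                   # passes once fit has run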

@thekylesaurus (Contributor Author)

Removed from init in next push


# Choose a subset of samples
# TODO: allow selection of a random subset of samples instead of head
# Get the total number of observations
@bbengfort (Member)

In the original version of the code, the "sampling" was done in draw since it was only a single line and prevented us from having to override DataVisualizer.fit -- however, I think with the inclusion of more sampling code and the "learned" parameter, n_samples_, it now makes sense to add a fit() method to this class that calls super and performs the sampling. That will keep draw a bit more discrete and allow us to test the sampling mechanism without creating an image (e.g. we can patch draw and inspect the input passed to it).
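For concreteness, a rough sketch of the shape such a fit() override might take (the _subsample helper named here is hypothetical at this point in the thread; a helper by that name appears later in the PR):

    def fit(self, X, y=None, **kwargs):
        """
        Subsample (and normalize) the data, then delegate to DataVisualizer.fit,
        which in turn calls draw() and returns self.
        """
        X, y = self._subsample(X, y)
        return super(ParallelCoordinates, self).fit(X, y, **kwargs)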

@thekylesaurus (Contributor Author)

Good call. Will do.

@thekylesaurus (Contributor Author)

@bbengfort - do you want this new ParallelCoordinates.fit() to return X and y? Maybe only in some non-default verbose mode used for testing only?

@bbengfort (Member)

We do want to ensure the signatures for sklearn interface methods remain the same, for fit it is:

class Estimator(object):
    
    def fit(self, X, y=None):
        return self     

This will ensure that helper methods such as fit_transform in pipelines continue to function as expected.

To test, I recommend we mock ParallelCoordinates.draw, which X and y will be passed to; we can then inspect the output of fit there.
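A rough sketch of that testing approach (the exact arguments forwarded to draw are an assumption; adjust to the real call signature):

    from unittest import mock

    import numpy as np
    from yellowbrick.features import ParallelCoordinates

    X = np.random.rand(100, 4)
    y = np.random.randint(0, 3, 100)

    # Patch draw so no image is rendered, then inspect what fit() forwards to it.
    with mock.patch.object(ParallelCoordinates, "draw") as mock_draw:
        viz = ParallelCoordinates(sample=0.5, shuffle=True)
        assert viz.fit(X, y) is viz                 # fit returns self

        X_drawn, y_drawn = mock_draw.call_args[0]   # positional (X, y) assumed
        assert len(X_drawn) == 50                   # subsampled before drawing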

# Choose a subset of samples
# TODO: allow selection of a random subset of samples instead of head
# Get the total number of observations
nobs = len(X)
@bbengfort (Member)

Because X is an np.ndarray at this point, there is actually no cost to calling len(X) -- it's just returning an attribute, it's not actually counting or anything. I would prefer to use len(X) over a variable name because it is a bit more readable if that's ok.

@thekylesaurus (Contributor Author)

Interesting. Sounds good.

if (self.n_samples < nobs) and self.shuffle:
indices = choice(nobs, self.n_samples, replace=False)
else:
indices = slice(self.n_samples)
@bbengfort (Member)

Very neat, I haven't used slice before, glad to learn something new!

@thekylesaurus (Contributor Author)

Just learned about it at PyCon2018!
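For anyone else meeting slice for the first time, the two branches produce interchangeable indexers; a quick standalone check:

    import numpy as np

    X = np.arange(20).reshape(10, 2)

    # slice(n) used as an index is the programmatic spelling of the literal [:n]
    assert np.array_equal(X[slice(4)], X[:4])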

self.n_samples = int(nobs * self.sample)

if (self.n_samples < nobs) and self.shuffle:
indices = choice(nobs, self.n_samples, replace=False)
@bbengfort (Member)

This is not a problem for this PR, but perhaps you could add a comment to the issue or a new issue (or maybe note this in the docstring) -- this method of sampling does not stratify by class, it's a uniform random sample of all instances. This means that the ratio of classes in the sampled dataset probably will not match the ratio of classes in the original data set.

Along those lines, scikit-learn has a number of shuffle-split utilities for cross-validation, we could recommend to the user that if they did need more advanced sampling they could use one of those utilities prior to passing data to the visualizer.
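For example, a user who needs class-balanced sampling could stratify with scikit-learn before handing data to the visualizer (a sketch with synthetic data, not part of this PR):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from yellowbrick.features import ParallelCoordinates

    X = np.random.rand(1000, 4)
    y = np.random.randint(0, 3, 1000)

    # Stratify on y so class ratios in the sample match the full data set,
    # then pass the reduced data to the visualizer with no further sampling.
    X_sample, _, y_sample, _ = train_test_split(
        X, y, train_size=0.25, stratify=y, random_state=42
    )

    viz = ParallelCoordinates()
    viz.fit_transform(X_sample, y_sample)
    viz.poof()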

@thekylesaurus (Contributor Author)

I think it makes sense to add the sklearn idiomatic "stratify" option to this class (and probably others). As this PR is already getting a little heavy, I'll create a new issue.

@bbengfort (Member)

Awesome, thanks - I think that would be an excellent follow-on PR!

y = y[indices]

# Get shape of the sampled data
nrows, ncols = X.shape

# Normalize
if self.normalize is not None:
X = self.normalizers[self.normalize].fit_transform(X)
@bbengfort (Member)

Would you mind also moving normalization into the fit() method, please?

@thekylesaurus (Contributor Author)

Will do.

increments_in_class = increments_separated * len(x_in_class)
if len(x_in_class) > 0:
self.ax.plot(increments_in_class, x_in_class.flatten(), label=label, color=colors[label],
alpha=0.5, linewidth=1, **kwargs)
@bbengfort (Member)

Would you mind walking me through this new drawing method? I can see that the overall goal here was to call the plot method once per class instead of once per row - a laudable goal and one that I can see dramatically improving the performance of the visualizer. It might help to have a few comments to guide other contributors to this method, and if I could get a more detailed sense of what you're doing I can work to apply it to other visualizers.

The alpha is set here to 0.5 -- so I'm not sure I understand exactly the change in the opacity style. Given the two figures you presented, I believe the top figure is preferable because it shows darker regions of greater instance density on a per-class basis. Was the top figure generated with this code or the old code?

Frankly, even if the bottom figure was generated with your revisions, I think these revisions are worth it to keep the opacity changes and we can work on detailed styling in another PR.

@thekylesaurus (Contributor Author)

Sounds good. I will add comments to the code and some documentation to this PR to clarify.

The top image was the "current" method. The bottom image was the "new" method.

I'll revert back to 0.25 alpha. The 0.5 seemed to give a better visual agreement with the former style. Are you OK with the linewidth specification? Again, this gives better stylistic agreement for the samples I investigated.

@bbengfort (Member)

I'm ok with the linewidth argument -- in the past, we've had issues with fixed settings like this (e.g. in a "poster" mode of matplotlib where rcParams are set, this fixed value might cause some issues) but I think it will be ok for now.
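For other contributors reading along, a standalone sketch of the per-class drawing idea discussed in this thread: one plot() call per class, with NaNs inserted between rows so matplotlib lifts the pen instead of drawing back across the figure (an illustration only, not the PR's exact code):

    import numpy as np
    import matplotlib.pyplot as plt


    def plot_class(ax, X_class, color, label):
        nrows, ncols = X_class.shape
        # Append a NaN column, then flatten row by row: each row becomes one
        # segment of a single long trace, and the NaN breaks the line between rows.
        values = np.hstack([X_class, np.full((nrows, 1), np.nan)]).flatten()
        increments = np.tile(np.append(np.arange(ncols), np.nan), nrows)
        ax.plot(increments, values, color=color, label=label, alpha=0.25, linewidth=1)


    X_a = np.random.rand(50, 4)
    X_b = np.random.rand(50, 4) + 0.5

    fig, ax = plt.subplots()
    plot_class(ax, X_a, "steelblue", "class a")
    plot_class(ax, X_b, "darkorange", "class b")
    ax.legend()
    plt.show()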

@@ -202,6 +202,27 @@ def is_dataframe(obj):
isdataframe = is_dataframe


def is_series(obj):
@bbengfort (Member)

Thanks for adding this! It would also be helpful if you wouldn't mind creating some tests for the type checking; the tests are pretty well defined in tests/test_utils/test_types.py -- should be a matter of copy and paste.

@thekylesaurus (Contributor Author)

No problem!
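A plausible sketch of the helper and matching type-check tests, mirroring the existing is_dataframe pattern of comparing the class name so pandas is not a hard import (the actual implementation in the PR may differ):

    import numpy as np
    import pandas as pd


    def is_series(obj):
        """
        Returns True if the given object appears to be a pandas Series.
        """
        return obj.__class__.__name__ == "Series"


    def test_series_is_series():
        assert is_series(pd.Series([1, 2, 3]))


    def test_ndarray_is_not_series():
        assert not is_series(np.array([1, 2, 3]))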

@thekylesaurus (Contributor Author)

Hey @bbengfort - thanks for your energy and optimism! I'm happy to try to contribute.

I am on the case, but I am having some issues with the pytest-flakes and pytest-spec test dependencies. They are not conda-installable and my work machine is not amenable to conda-forge or PyPI installs. I'll try to find a way around it, but completing this may require access to my personal machine. I won't have access to that until next weekend at the earliest.

@thekylesaurus (Contributor Author)

@bbengfort - I can also SSH to my work's Linux box, but without certain Windows conveniences, my abilities become a little bit handcuffed. I'll try to make it happen.

Thanks for your support!

@bbengfort (Member)

@thekylesaurus I completely understand, and I must say - we're thrilled to get a Windows developer on the team, we mainly use macOS or Linux. I've been trying to add AppVeyor support for Windows CI for the past few weeks, but it's been a tricky road since the only Windows box I have to test with is the CI itself!

We do have a couple of options:

  1. Merge this PR without the tests and add issues and stubs to create the tests -- with this option, though, hopefully you can work with someone there during the sprints to fix the tests quickly.
  2. Wait until you have access to your machine to finish up the PR.

Either option is fine with me, let me know what you prefer!

@thekylesaurus (Contributor Author)

@bbengfort - the test_parallel_coords() test was failing due to my decision to have shuffle=True by default, consistent with sklearn test/train split subsampling. I have now set shuffle=False by default, and it rectifies the problem.

However, I would think the default behavior should be to randomly subsample. Thoughts?

@bbengfort (Member)

I agree that the default behavior should be to randomly subsample, perhaps we can tackle that in another PR so that the tests pass now?

@thekylesaurus (Contributor Author)

@bbengfort - I had about 9 commits in my working repo today. Unfortunately, I corrupted some git-related parameter in that repo and was unable to interact with the remote. I tried a number of workarounds, to no avail. So, I re-cloned yellowbrick and manually migrated all changes relative to commit 32b292e... over to the new repo.

In the latest commit (9994e09), all of the comments in the above discussion are addressed, but you don't have the luxury of granular commits. I apologize for the inconvenience.

@thekylesaurus (Contributor Author)

Illustrations of the current and new methods are shown below:

Current: [image: current_method]

New: [image: new_method]

@bbengfort (Member) left a comment

This looks great and we're very close to completion! Thank you so much for your hard work, and no worries about the granular commits, sorry that you had to redo all that work.

I had a couple of comments that should be addressed, but they're relatively minor. Other than that, it's about getting the tests to pass -- they're image comparison failures, possibly because you're generating the test images on a Windows machine, right?

If so, I can merge this PR with the tests failing and fix the tests to work on travis (we're still trying to figure out the windows situation). If you're on linux and want to try to generate the test images there and sync them, that's also fine with me.

visualizer.fit_transform(self.X, self.y)

visualizer = ParallelCoordinates(sample=3, shuffle=True, random_state=np.random.RandomState())
visualizer.fit_transform(self.X, self.y)
@bbengfort (Member)

Should we also add a shuffle=False?

@thekylesaurus (Contributor Author)

Added test_pcoords_sample_shuffle_false()
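Roughly, the added test method might look like the following inside the existing pcoords test class (a sketch; the committed test body may differ):

    def test_pcoords_sample_shuffle_false(self):
        """
        Assert no errors occur with an int sample and shuffling disabled
        """
        visualizer = ParallelCoordinates(sample=3, shuffle=False)
        visualizer.fit_transform(self.X, self.y)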

visualizer.fit_transform(self.X, self.y)

visualizer = ParallelCoordinates(sample=0.5, shuffle=True, random_state=np.random.RandomState())
visualizer.fit_transform(self.X, self.y)
@bbengfort (Member)

Same comments as above in the int sample.

Also, do we want to ensure no shuffling occurs when there is no sampling? Or is this satisfied in test_static_subsample?

@thekylesaurus (Contributor Author)

Added test_pcoords_sample_float_shuffle_false()

Tested in test_static_subsample. I would argue this is the best place because X and y are exposed on input and output.

@@ -131,3 +157,65 @@ def test_integrated_pcoords(self):
visualizer.fit_transform(X, y)
visualizer.poof()
self.assert_images_similar(visualizer)

@staticmethod
@bbengfort (Member)

What does making this a static method do for the test case?

@thekylesaurus (Contributor Author)

PEP8-compliance. That is about it.

ncols = 50

y = np.arange(ntotal).reshape(ntotal, 1)
X = np.ones((ntotal, ncols)) * y
@bbengfort (Member)

I very much like this idea for a test case, it definitely allows you to determine the ordering with ease! Note, however, that y should have shape (100,) and instead, it has shape (100,1). Because you need this shape in order to compute X, I recommend the following:

y = np.arange(ntotal)
X = np.ones((ntotal, ncols)) * y.reshape(ntotal,1) 

@thekylesaurus (Contributor Author)

Good call. Now done.

"""
Test that is_series does not match non-dataframes
"""
assert not is_series(obj)
@bbengfort (Member)

Thank you for adding these tests!

@thekylesaurus (Contributor Author)

👍

@@ -149,6 +152,15 @@ class ParallelCoordinates(DataVisualizer):
If int, specifies the maximum number of samples to display.
If float, specifies a fraction between 0 and 1 to display.

random_state : int, RandomState instance or None
@bbengfort (Member)

Awesome +100 for this - thank you so much for thinking to add this in!

@thekylesaurus (Contributor Author)

👍

n_samples_ : int
number of samples included in the visualization object

rng_ : np.random.RandomState
@bbengfort (Member)

This is a complete nit, and I apologize for it, but since I made a big deal of n_samples_ I feel like I should point out that this is not a learned parameter in fit, but a private attribute mirroring random_state and should probably be _rng. If I didn't have any other comments I would have just ignored this and perhaps made a quick change after the PR, but perhaps you could quickly update it?

@thekylesaurus (Contributor Author)

Done. Removed reference to _rng from docstring.

self.n_samples = int(len(X) * self.sample)
X = X[:self.n_samples, :]
# Subsample
X, y = self._subsample(X, y)
@bbengfort (Member)

Making the sampling a separate function was a very good idea - nice!

@thekylesaurus (Contributor Author)

👍


# Normalize
if self.normalize is not None:
X = self.normalizers[self.normalize].fit_transform(X)

# Get the shape of the data
super(ParallelCoordinates, self).fit(X, y, **kwargs)

@bbengfort (Member)

Perhaps I'm missing this in the code review pane, but fit() should call draw() -- am I missing where this is happening?

@bbengfort (Member)

Hehe, I think I got it, but I'm leaving my original comment above in - the super method is calling draw() right? Perhaps it would be more clear if we did something like:

        # the super method calls draw and returns self 
        return super(ParallelCoordinates, self).fit(X, y, **kwargs)

@thekylesaurus (Contributor Author)

Comment added. I should have had this in there.

@bbengfort (Member)

Your illustration of the current and new incrementation method is very well done, thank you. I understand in principle what you're doing - I guess I'm not sure why you had to add the null column at the end to actually create the darkening effect; I would have thought that if you put two semi-transparent lines on top of each other, they would have darkened rather than blending as they did. Anyway - great work on figuring out a solution, the performance benefits are definitely worth it!

@thekylesaurus (Contributor Author)

@bbengfort - comments addressed in most recent commits.

Pertaining to the failed tests, can you direct me to the test that is failing? I am able to pass all tests in test_pcoords, including the image comparison test.

Pertaining to the nans, they are in there to prevent the line plot from drawing from right to left. If you didn't have the nans, there would be lines crossing from the right-hand side of the image to the left-hand side of the window.

The nans don't address the alpha effect. The lack of alpha darkening is the unfortunate result of matplotlib not accounting for alpha darkening for intersections of a single trace with itself. I spent a solid hour trying to find a work-around, with no luck.

@bbengfort (Member)

Ah, sorry I forgot about the crossing lines thing - that was part of the reason we did the one line per instance implementation; the nans are a good solution! As for the alpha darkening, perhaps we can open an issue for that? I can't help feeling that there is something in matplotlib that might make it work, but if you've already spent hours on it, we should punt, the visualizer is working very well right now.

The tests are passing for you? Well, that's good news sort of - what system are you on?

I think potentially the easiest thing is for me to merge your PR and then try to fix the Travis tests myself, it's notoriously tricky and annoying. Are you ok with that?

For reference, here are the tests that are failing:

_______________ ParallelCoordinatesTests.test_integrated_pcoords _______________
self = <tests.test_features.test_pcoords.ParallelCoordinatesTests testMethod=test_integrated_pcoords>
    @pytest.mark.xfail(
        sys.platform == 'win32', reason="images not close on windows"
    )
    def test_integrated_pcoords(self):
        """
            Test parallel coordinates on a real data set (downsampled for speed)
            """
        occupancy = self.load_data('occupancy')
    
        X = occupancy[[
            "temperature", "relative_humidity", "light", "C02", "humidity"
        ]]
    
        y = occupancy['occupancy'].astype(int)
    
        # Convert X to an ndarray
        X = X.copy().view((float, len(X.dtype.names)))
    
        # Test the visualizer
        visualizer = ParallelCoordinates(sample=200)
        visualizer.fit_transform(X, y)
        visualizer.poof()
>       self.assert_images_similar(visualizer)
E       yellowbrick.exceptions.ImageComparisonFailure: images not close (RMS 26.357)
E       tests/actual_images/test_features/test_pcoords/test_integrated_pcoords.png
E       	vs
E       tests/baseline_images/test_features/test_pcoords/test_integrated_pcoords.png
tests/test_features/test_pcoords.py:185: ImageComparisonFailure
_______________ ParallelCoordinatesTests.test_normalized_pcoords _______________
self = <tests.test_features.test_pcoords.ParallelCoordinatesTests testMethod=test_normalized_pcoords>
    @pytest.mark.xfail(
        sys.platform == 'win32', reason="images not close on windows"
    )
    def test_normalized_pcoords(self):
        """
            Assert no errors occur using 'normalize' argument
            """
        visualizer = ParallelCoordinates(normalize='l2')
        visualizer.fit_transform(self.X, self.y)
        visualizer.poof()
>       self.assert_images_similar(visualizer)
E       yellowbrick.exceptions.ImageComparisonFailure: images not close (RMS 0.074)
E       tests/actual_images/test_features/test_pcoords/test_normalized_pcoords.png
E       	vs
E       tests/baseline_images/test_features/test_pcoords/test_normalized_pcoords.png
tests/test_features/test_pcoords.py:65: ImageComparisonFailure

@thekylesaurus (Contributor Author)

My machine is Windows 7 Enterprise on a Dell Latitude E5570 with Intel HD Graphics 530 and AMD Radeon R7 M370.

I did more digging. In the current code base, both of the failing tests indicated above had a wrapper applied to them:

@pytest.mark.xfail(
    sys.platform == 'win32', reason="images not close on windows"
)

This wrapper evidently marks these tests as expected failures when run on a Windows machine.

When I removed these wrappers, I actually passed the test_normalized_pcoords() test.

I failed the test_integrated_pcoords() test. However, not because the image match failed, but because my computer's firewall is blocking the load_data('occupancy') step: "SSLError: ("bad handshake: SysCallError(10054, 'WSAECONNRESET')",)"

I am in favor of your merging, but I would also express caution. Some would argue that this PR modifies the look and feel of the ParallelCoordinates plot (due to the removal of intra-class alpha darkening). If you think the speed-up is worth it, please take the lead on merging and fixing the tests.

Thanks @bbengfort

@bbengfort (Member)

@DistrictDataLabs/team-oz-maintainers @thekylesaurus suggests that we carefully consider merging this parallel coordinates implementation because it changes the style of the plots:

The current, poorly performing style plots each instance as its own line, one at a time, allowing us to achieve a darkening effect where there are more instances:

[Image: current_style]

The new implementation draws each class as a single line, breaking it up with nan values so there is no crossing back and forth from the end of the parallel coordinates to the beginning. However, because these lines are the "same line", there is no darkening effect:

[Image: new_style]

The performance boost from the proposed implementation is significant, especially as more instances are added to the plot (almost a 30x improvement for 100x the iris data set).

Any thoughts on how to proceed?

@rebeccabilbro (Member)

@bbengfort - I recommend we make this an alternate visualizer, FastParallelCoordinates, and keep the current (slow) visualizer as ParallelCoordinates; we can use the docstrings and documentation for each to explain the tradeoffs to users and leave it up to them to decide which is best for their use case.

@lwgray (Contributor)

lwgray commented May 18, 2018

Is key information being lost by not showing the density?

Disregard my above comment - I know now that it reduces intra-class information.

@thekylesaurus (Contributor Author)

thekylesaurus commented May 18, 2018 via email

@bbengfort (Member)

@rebeccabilbro @thekylesaurus I think leaving it up to the user is probably the best thing to do; I'd say it's a tough call between overriding the draw method in a subclass and using a flag since the implications of the fast method are fairly subtle. For simplicity, I'm leaning towards using a flag in the constructor fast=False with a short explanation in the docstring about what's happening.

My feeling is that the usage will be that the user tries ParallelCoordinates without the fast flag, gets aggravated it's taking so long, sets the fast flag and tries again.

We should also create a section in the documentation with the graphs @thekylesaurus created and the speed up to show the implications in detail.

@bbengfort (Member)

bbengfort commented May 20, 2018

I will take the following steps when I get the chance:

  1. create an issue to fix tests/add fast and docs
  2. merge this PR
  3. synchronize baseline images and repair tests
  4. update Parallel Coordinates is getting slower #230 and Improve Parallel Coordinates #59
  5. update tests, add fast, and submit a PR

Hopefully tomorrow or the day after. I would just merge this in now but am reluctant to because the tests are failing, and I'd like to get them sorted ASAP so other contributors don't have failing tests related to parallel coordinates.

@bbengfort (Member)

See #447

@bbengfort bbengfort merged commit 6e6c58f into DistrictDataLabs:develop May 23, 2018
@bbengfort (Member)

@thekylesaurus thank you so much for this PR and your hard work; we're glad to have you as a contributor, and hope you'll consider working with us more in the future!
