simplifying API #105

jeremymanning · 2017-04-24T14:40:42Z

The API could be simplified in a few places. For example:

In hyp.plot we could include an align flag that runs hyp.tools.align on the data if set to True (default: False).
In hyp.tools.align and hyp.tools.procrustes we could include a ndimsflag that runs hyp.tools.reduce on the dataset prior to alignment if not None (default: None)
In hyp.tools.align and hyp.tools.procrustes, if the data matrices don't have the same numbers of features, we should zero-pad all of the matrices to ensure they have the same number of features as the matrix with the most features
in hyp.tools.load we could include align and ndims flags that pass the data through the appropriate other functions (hyp.tools.reduce, followed by hyp.tools.align) so that the reduced/aligned data are returned from the start, without needed to save extra copies of the dataset

The text was updated successfully, but these errors were encountered:

rarredon · 2017-06-01T23:10:46Z

Hi there, I'm Ryan. I found your project through the Mozilla Sprint. This issue sounds like something I can work on. I have a few questions to clarify.

In regards to the first bullet point, where does the alignment algorithm fit in with normalization and reduction? From details of the other bullet points, it sounds like alignment should come after reduction. So should order of operations be: normalization, reduction, then alignment?
The plot function (hyperplot.plot) has many keyword args. Should I add the align flag at the end, or maybe somewhere in the middle? I think standard practice is to put the more heavily used arguments closer to the front.

andrewheusser · 2017-06-02T00:11:40Z

Hi Ryan! We'd love to have you help out! Regarding your questions -

In regards to the first bullet point, where does the alignment algorithm fit in with normalization and reduction? From details of the other bullet points, it sounds like alignment should come after reduction. So should order of operations be: normalization, reduction, then alignment?

I believe the order should be normalize -> align -> reduce. @jeremymanning can you verify this order of transforms?

The plot function (hyperplot.plot) has many keyword args. Should I add the align flag at the end, or maybe somewhere in the middle? I think standard practice is to put the more heavily used arguments closer to the front.

I would put that right after the ndims flag (i clustered the 'data transforms' in that section). It should be false by default, and throw an error if the len(x) < 1 (its doesn't make sense to 'align' something to itself). If you could add a short description to the doc string, that would also be helpful. you can find info on the operation in the docstring of hyp.tools.align

Let me know if any questions come up! you can leave comments here, or in our gitter: https://gitter.im/hypertools/Lobby

jeremymanning · 2017-06-02T02:37:47Z

For plotting, I think we want normalize --> reduce --> align. This is because:

align runs slowly on high-dimensional data, so it'll be faster to align reduced data
aligning reduced data also means that the visual appearance (i.e. what actually gets plotted) will be of aligned data, whereas if we align prior to reducing, the alignment will be only partially preserved in the plot
we should keep the behavior of plot consistent with (the proposed modifications of) align and reduce

Re: the positions of the new keyword arguments, @andrewheusser's suggestion looks good to me...

rarredon · 2017-06-02T04:36:53Z

Thanks guys, your comments are helpful.

Also, in regards to bullet point 3, how would you like the matrices to be zero padded? Should they be padded on the left, on the right, on both edges equally (if padding width is even)? Maybe there could be an extra kw arg pad_edge if you think it would be beneficial for the user to choose.

jeremymanning · 2017-06-02T10:57:09Z

Great question!

None of the functions should pad with zeros, except hyp.tools.align. (And I believe this is already implemented.) So we shouldn't need to add any zero padding to implement this issue. To explain our reasoning for not zero padding (except for align):

The basic assumption we make in reduce is that each dimension (feature) of the input is the same across the different input matrices. So that assumption is violated if the dimension labels don't match across the matrices. As a basic easy-to-implement sanity check, if the sizes of the inputs don't match (along the columns), we know for sure that the column labels can't possibly match, so the reduce function should error out. Additional thoughts on this:

If the numbers of columns do match, the onus is on the user to ensure that the input matrices are compatible
Question for @andrewheusser: if the user inputs pandas dataframes, do we do any sort of matching or re-ordering of the columns? And/or do we check to make sure the column labels match? If not, we should add an issue to implement this for reduce! Specifically:
- If there are multiple dataframes, first make sure that the set of column labels is common across all of those dataframes. If not, throw an error.
- Second, make sure that the column label orders are the same across all of those dataframes. If not, re-order the columns to match. (It doesn't matter exactly how we do this, but a simple implementation would be to re-order the columns to match the first dataframe in the list.)

The exception to all of this is align. Align's basic assumption is that the dimension labels don't match. So for align, we should zero pad (and it doesn't actually matter how we do it-- align should give the same answer regardless).

andrewheusser · 2017-06-02T12:16:54Z

@jeremymanning - re dataframes, we do not do any reordering. If the input data is a dataframe (or list of dfs) we:

transform all text columns into binary dummy variables (see here leaving the numerical columns as they are
turn the whole df into a np array
optionally output the column labels if return_labels=True

I'll add an issue to check to see if the column labels are the same across dfs when reducing. if they are the same, just out of order, they will be resorted.

@rarredon I will be working on hypertools all day today, so ping me if you need clarification as things come up!

rarredon · 2017-06-02T22:09:06Z

Okay, so I'll finish up the other parts, and forget the part about zero padding. Should be done soon!

jeremymanning · 2017-06-02T22:12:34Z

That's great @rarredon-- thanks!

jeremymanning · 2017-06-05T11:34:57Z

Thanks @rarredon! Closing this issue...

jeremymanning · 2017-06-05T11:42:07Z

Actually...re-opening, sorry. I realized: the tutorials on readthedocs still need updating.

jeremymanning · 2017-06-05T11:42:53Z

(Should be done only after we push the next version of hypertools to pip)

andrewheusser · 2017-06-05T12:21:53Z

do you mean adding some examples using these new features to the gallery? @lucywowen is working on some tutorials like http://cdl-quail.readthedocs.io/en/latest/tutorial.html, but those are still underway

jeremymanning · 2017-06-05T12:53:34Z

No-- I just mean updating the current examples (e.g. align, reduce, etc.) to use the simpler syntax. But the tutorials will be helpful too!

andrewheusser · 2017-06-05T13:23:48Z

Got it! I updated all the examples and created a log of the changes, ready for pip?

jeremymanning · 2017-06-05T13:39:03Z

ready for pip-- pip, pip, pooray! 🎉

jeremymanning · 2017-06-05T13:43:49Z

eek-- wait!!! saving animations is now crashing...

In [8]: hyp.plot([group1, group2], animate=True, zoom=2.5, save_path='/Users/jmanning/Desktop/animation.mp4')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-8-1eebb1ac5546> in <module>()
----> 1 hyp.plot([group1, group2], animate=True, zoom=2.5, save_path='/Users/jmanning/Desktop/animation.mp4')

/Users/jmanning/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/hypertools/plot/plot.pyc in plot(x, fmt, marker, markers, linestyle, linestyles, color, colors, palette, group, labels, legend, title, elev, azim, ndims, normalize, n_clusters, save_path, animate, duration, tail_duration, rotations, zoom, chemtrails, precog, bullettime, frame_rate, explore, show)
    271     if save_path is not None:
    272         if animate:
--> 273             Writer = animation.writers['ffmpeg']
    274             writer = Writer(fps=frame_rate, bitrate=1800)
    275             line_ani.save(save_path, writer=writer)

NameError: global name 'animation' is not defined

jeremymanning · 2017-06-05T13:44:51Z

^ this also seems to affect 0.2.0 😿

andrewheusser · 2017-06-05T13:45:11Z

do you have the most version of master? i patched that

jeremymanning · 2017-06-05T13:48:25Z

ah-- nvm; just installed directly from git and it works again. sorry for the scare.

andrewheusser · 2017-06-05T13:49:21Z

all good! pushing to pip!

jeremymanning added the enhancement label Apr 24, 2017

jeremymanning assigned andrewheusser Apr 24, 2017

andrewheusser added this to the v2 release milestone Apr 25, 2017

andrewheusser added the high priority label Apr 25, 2017

jeremymanning added the easy(ish) label May 24, 2017

jeremymanning modified the milestones: mozsprint, v2 release May 24, 2017

andrewheusser added the mozilla sprint label May 24, 2017

jeremymanning unassigned andrewheusser Jun 2, 2017

rarredon mentioned this issue Jun 2, 2017

Simplifying API by adding kw args ndims and align to some key functions #133

Merged

jeremymanning closed this as completed Jun 5, 2017

jeremymanning reopened this Jun 5, 2017

andrewheusser closed this as completed Jun 14, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

simplifying API #105

simplifying API #105

jeremymanning commented Apr 24, 2017

rarredon commented Jun 1, 2017

andrewheusser commented Jun 2, 2017

jeremymanning commented Jun 2, 2017

rarredon commented Jun 2, 2017

jeremymanning commented Jun 2, 2017

andrewheusser commented Jun 2, 2017

rarredon commented Jun 2, 2017

jeremymanning commented Jun 2, 2017

jeremymanning commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

andrewheusser commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

andrewheusser commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

andrewheusser commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

andrewheusser commented Jun 5, 2017

simplifying API #105

simplifying API #105

Comments

jeremymanning commented Apr 24, 2017

rarredon commented Jun 1, 2017

andrewheusser commented Jun 2, 2017

jeremymanning commented Jun 2, 2017

rarredon commented Jun 2, 2017

jeremymanning commented Jun 2, 2017

andrewheusser commented Jun 2, 2017

rarredon commented Jun 2, 2017

jeremymanning commented Jun 2, 2017

jeremymanning commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

andrewheusser commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

andrewheusser commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

andrewheusser commented Jun 5, 2017

jeremymanning commented Jun 5, 2017

andrewheusser commented Jun 5, 2017