Pandas forward compat #1828

kernc · 2016-12-16T16:48:24Z

Description of changes

A minimal (non-exhaustive) pandas forward-compatible API built on our current Table implementation. The current API still (mostly) works, but it is deprecated, and the tests have been updated to use the new API.

Includes

Code changes
Tests
Documentation

codecov-io · 2016-12-20T23:10:17Z

Current coverage is 89.08% (diff: 85.00%)

Merging #1828 into master will decrease coverage by 0.15%

@@             master      #1828   diff @@
==========================================
  Files            86         85     -1   
  Lines          9100       9169    +69   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           8121       8168    +47   
- Misses          979       1001    +22   
  Partials          0          0

Powered by Codecov. Last update 7acb5fc...a572b5d

astaric · 2016-12-21T14:15:10Z

Orange/misc/distmatrix.py

@@ -261,10 +262,10 @@ def save(self, filename):
        with open(filename, "wt") as fle:
            fle.write(data + "\n")
            if col_labels is not None:
-                fle.write("\t".join(str(e.metas[0]) for e in col_labels) + "\n")
+                fle.write("\t".join(str(m[0]) for m in col_labels.metas) + "\n")


astaric · 2016-12-21T14:15:19Z

Orange/misc/distmatrix.py

            for i, row in enumerate(self):
                if row_labels is not None:
-                    fle.write(str(row_labels[i].metas[0]) + "\t")
+                    fle.write(str(row_labels.metas.T[0][i]) + "\t")


astaric · 2016-12-21T14:16:08Z

Orange/preprocess/impute.py

@@ -81,8 +81,7 @@ class DropInstances(BaseImputeMethod):
    description = ""

    def __call__(self, data, variable):
-        index = data.domain.index(variable)
-        return numpy.isnan(data[:, index]).reshape(-1)
+        return numpy.isnan([list(row) for _, row in data[variable].iterrows()]).reshape(-1)


I haven't measured it, but I suppose it is not much slower than it was before. numpy.isnan(data[:, index]) also did a full iteration over data (via Sequence.__iter__() via table.__getitem__(i)).

In pandas, this will be: return data[variable].isnull().

I'll fix it up to use Table.get_column_view() in the interim.
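For illustration, a minimal sketch of the pandas form mentioned above. It uses a plain pandas DataFrame rather than an Orange Table, and the column names, values, and the drop_mask helper are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a data table; this is not an Orange Table.
data = pd.DataFrame({"age": [31.0, np.nan, 47.0],
                     "height": [1.8, 1.7, np.nan]})


def drop_mask(data, variable):
    """Return a boolean array marking rows where `variable` is missing."""
    # isnull() is vectorized, so no Python-level loop over rows is needed.
    return data[variable].isnull().values


mask = drop_mask(data, "age")
print(mask)
```

The result has one entry per row, which is the same shape the reshape(-1) call in the diff produces.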

astaric · 2016-12-21T14:17:20Z

Orange/statistics/contingency.py



-def get_contingencies(dat, skipDiscrete=False, skipContinuous=False):
+def get_contingencies(dat, skip_discrete=False, skip_continuous=False):


Was this rename really necessary?

astaric · 2016-12-21T14:18:57Z

Orange/tests/test_contingency.py

@@ -39,8 +39,8 @@ def test_discrete(self):

    def test_discrete_missing(self):
        d = data.Table("zoo")
-        d.Y[25] = float("nan")
-        d[0][0] = float("nan")
+        d.loc[d.index[25], d.domain.class_var] = float("nan")


I just love how pandas makes the code cleaner :)

This is not code. These are tests, made compatible with as little change as possible. The useful function code will always be much cleaner.

Ideally, this would be:

    d.type.iloc[25] = np.nan                  # 'type' is classvar name
    # or
    d.target.iloc[25] = np.nan                # target as an introduced alias
    # or
    d[d.domain.class_var].iloc[25] = np.nan   # familiar, verbose

But our current code makes it hard to adapt it so. I suppose it will have to get a bit uglier before it gets neat.
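As a rough sketch of what the .loc form in the test does, here is the same pattern on a plain pandas DataFrame. The frame, its index, and the class_var name are assumptions for the example, not the real "zoo" data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "zoo" table, with a non-default index
# so that position 1 and label 1 are different things.
d = pd.DataFrame(
    {"legs": [4.0, 2.0, 0.0], "type": ["mammal", "bird", "fish"]},
    index=[10, 11, 12],
)
class_var = "type"  # assumed name of the class column

# Label-based: translate position 1 into its index label first.
d.loc[d.index[1], class_var] = np.nan

# Purely positional equivalent, applied to another row.
d.iloc[2, d.columns.get_loc(class_var)] = np.nan

print(d[class_var].isnull().tolist())
```

Both forms set the value in a single indexing call. The shorter spellings such as d.type.iloc[25] = np.nan are chained assignments, which pandas may warn about.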

astaric · 2016-12-21T14:27:06Z

Orange/widgets/data/owfeatureconstructor.py

@@ -544,8 +544,7 @@ def duplicateFeature(self):
    def check_attrs_values(self, attr, data):
        for i in range(len(data)):
            for var in attr:
-                if not math.isnan(data[i, var]) \
-                        and int(data[i, var]) >= len(var.values):
+                if data[var].iloc[data.index[i]] not in range(var.values):


I did not know that you can use "in" with generators :)

In Python 3, range(x) is a special sequence object, not a generator, so it supports in directly. A real generator also accepts in, but the test consumes it.
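A small sketch of the difference between membership tests on a range and on a generator. The values list is hypothetical. Since range() takes an integer, the sketch uses len(values); that is an assumption about what the diff intends.

```python
values = ["red", "green", "blue"]   # hypothetical stand-in for var.values
valid = range(len(values))          # range() needs an int, hence len()

# range supports `in` directly and is not consumed by the test.
assert 2 in valid
assert 3 not in valid
assert 2 in valid                   # still true on a second check

# NaN compares unequal to everything, so it is never `in` a range.
nan_check = float("nan") in valid   # False

# A generator also accepts `in`, but the search consumes it.
gen = (i for i in range(len(values)))
first_check = 1 in gen              # True; advances past 0 and 1
second_check = 1 in gen             # False; 1 was already consumed
print(first_check, second_check)    # prints: True False
```

One consequence is that a missing value (NaN) fails a plain `in range(...)` test, whereas the removed code skipped NaN explicitly with math.isnan.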

astaric · 2016-12-21T14:30:24Z

Orange/widgets/evaluate/owpredictions.py

@@ -278,11 +278,6 @@ def _update_predictions_model(self):
            results = []
            for p in slots:
                values, prob = p.results
-                if p.predictor.domain.class_var.is_discrete:



astaric · 2017-12-20T11:21:15Z

As this PR touches too many files in the repository and has not been updated for almost a year, it is highly unlikely that it will be merged. We are still interested in porting the Table to pandas, but it will have to be done in a more gradual way.

If anyone is interested in working on this, I would suggest the following steps:

1. Add new methods to Table object that match pandas interface
   (empty(), iterrows(), ... - see commit 9a1a35a for more ideas)
2. Modify the code to call the new methods while deprecating the old ones
   (code calls empty(), __bool__ becomes deprecated, ...)
3. Figure out if Table can be ported to pandas without modifying more than 10 files, if not, return to 1. :)

Each "pandas compatible" method should be added in a separate pull request, as that eases reviewing and raises the chances of the PR actually being merged. I would also split 1. and 2. into two PRs, as 1. should only touch a single file, and rebasing it should be much easier than rebasing the modifications in 2., which touch every place the code is used.
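The first two steps could look roughly like the sketch below. The Table class here is a hypothetical, heavily simplified stand-in and not Orange's implementation. pandas exposes DataFrame.empty as a property rather than a method, so the sketch does the same.

```python
import warnings


class Table:
    """Hypothetical, heavily simplified stand-in for Orange's Table."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    @property
    def empty(self):
        # Step 1: new pandas-style API, added without touching callers.
        return len(self._rows) == 0

    def iterrows(self):
        # Yield (index, row) pairs, like DataFrame.iterrows().
        yield from enumerate(self._rows)

    def __bool__(self):
        # Step 2: the old behaviour still works but warns callers.
        warnings.warn(
            "Table.__bool__ is deprecated; use Table.empty instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return not self.empty


table = Table([[1.0, 2.0], [3.0, 4.0]])
if not table.empty:                  # new style, no warning
    for i, row in table.iterrows():
        print(i, row)
```

Keeping the old entry point alive behind a DeprecationWarning lets the class change land in one small PR, with call sites migrated afterwards.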

kernc force-pushed the pandas branch 9 times, most recently from 4ea0ab7 to a572b5d on December 20, 2016 23:09

kernc changed the title from "[WIP] Pandas forward compat" to "Pandas forward compat" on Dec 20, 2016

kernc assigned astaric Dec 20, 2016

astaric reviewed Dec 21, 2016


kernc force-pushed the pandas branch from a572b5d to 755aba8 on January 5, 2017 19:58

sstanovnik added 16 commits January 5, 2017 21:00

Some basic fixes for subscripting pandas.

1618db6

Ported contingency to pandas.

4399dc1

Adapt distances and tests to work with the new Table.

f197839

Sparse support pending.

Miscellaneous test adaptations.

a996cca

Miscellaneous test compatibility fixes.

9065ea8

Migrate remover and its tests.

72cb35e

Simple tree and softmax adaptation.

28c6548

Of note is the forced value type to float: without it, integer arrays are interpreted as double pointers and their values are 0, which breaks weights.

Use 0 instead of NA when values don't exist in distributions.

9acdd5e

A bucketload of fixes for widgets.

6f519ba

Over 90 % of base widgets now work. I haven't checked every single button, but they produce the intended result. Merge data, feature constructor and Venn diagram need a bit more work, so those aren't functional yet.

A small fix for the new single-class test.

e9aeacd

Use proper top-level imports. D'oh!

8f14ecd

Fix some broken Table imports.

5eb9004

Widget test adaptation and widget fixes.

4ab5d18

REVIEWME: 'fixed' displaying SQL tables.

0dae51d

Fix OWHeatmap and its recent tests.

fdc7dc5

Remove Table.append.

3bd4c51

sstanovnik and others added 3 commits January 5, 2017 21:00

Change usages of checksum to hash.

886b5cc

Fix a failing owkmeans test.

6e6c547

Table, Instance: minimal forward-pandas-compatible API

9a1a35a

kernc force-pushed the pandas branch from 755aba8 to efa392d on January 5, 2017 20:01

kernc added 2 commits January 5, 2017 21:05

Avoid Table API deprecations

7fad5c7

Replaced were:
if [data|table]: (for __bool__)
(data|table)\[ (changed to pandas indexing)
\[.{10}(\.\.\.|Ellipsis).{10}\] (unsupported Ellipsis replaced with slice(None))
for row in data (replaced with .iterrows() call)
...
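To illustrate the Ellipsis replacement listed in this commit message, here is a small sketch of how the two spellings reach __getitem__. ShowKey and replace_ellipsis are hypothetical helpers written for this example. The two keys are only interchangeable when the Ellipsis stands for a single axis, as in a two-dimensional table.

```python
class ShowKey:
    """Hypothetical helper that returns whatever key indexing passes in."""

    def __getitem__(self, key):
        return key


show = ShowKey()

# `...` in a subscript arrives as the Ellipsis singleton,
assert show[..., 0] == (Ellipsis, 0)
# while a bare `:` arrives as slice(None).
assert show[:, 0] == (slice(None), 0)


def replace_ellipsis(key):
    """Swap each Ellipsis in an index key for slice(None)."""
    if isinstance(key, tuple):
        return tuple(slice(None) if k is Ellipsis else k for k in key)
    return slice(None) if key is Ellipsis else key


print(replace_ellipsis(show[..., 0]))
```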

Avoid Table API deprecations (SQL tests)

36a2dea

kernc force-pushed the pandas branch from efa392d to 36a2dea on January 5, 2017 20:05

astaric removed their assignment Apr 7, 2017

astaric closed this Dec 20, 2017

irgolic mentioned this pull request Nov 7, 2020

OWCSVImport: Get ready for OWFile merge #5077

Draft

3 tasks
