Pandas forward compat #1828
kernc force-pushed the kernc:pandas branch 9 times, most recently from 4ea0ab7 to a572b5d on Dec 20, 2016
kernc changed the title from "[WIP] Pandas forward compat" to "Pandas forward compat" on Dec 20, 2016
kernc assigned astaric on Dec 20, 2016
codecov-io commented on Dec 20, 2016

Current coverage is 89.08% (diff: 85.00%)

@@             master   #1828   diff @@
==========================================
  Files            86      85     -1
  Lines          9100    9169    +69
  Methods           0       0
  Messages          0       0
  Branches          0       0
==========================================
+ Hits           8121    8168    +47
- Misses          979    1001    +22
  Partials          0       0
astaric reviewed on Dec 21, 2016
@@ -261,10 +262,10 @@ def save(self, filename):
         with open(filename, "wt") as fle:
             fle.write(data + "\n")
             if col_labels is not None:
-                fle.write("\t".join(str(e.metas[0]) for e in col_labels) + "\n")
+                fle.write("\t".join(str(m[0]) for m in col_labels.metas) + "\n")
             for i, row in enumerate(self):
                 if row_labels is not None:
-                    fle.write(str(row_labels[i].metas[0]) + "\t")
+                    fle.write(str(row_labels.metas.T[0][i]) + "\t")
@@ -81,8 +81,7 @@ class DropInstances(BaseImputeMethod):
     description = ""

     def __call__(self, data, variable):
-        index = data.domain.index(variable)
-        return numpy.isnan(data[:, index]).reshape(-1)
+        return numpy.isnan([list(row) for _, row in data[variable].iterrows()]).reshape(-1)
kernc (Author, Member) commented on Dec 21, 2016
Haven't measured it, but I suppose not much slower than it was before. numpy.isnan(data[:, index]) also did a full iteration over data (via Sequence.__iter__() via table.__getitem__(i)).
In pandas, this will be: return data[variable].isnull().
I'll fix it up to use Table.get_column_view() interim.
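To illustrate the two approaches mentioned in this comment, here is a minimal sketch that uses a plain pandas DataFrame as a stand-in for the Table. The column name `age` and the data are made up for illustration, and a one-column DataFrame is selected with a list so that `iterrows()` is available; this is an assumption about how the interim Table behaves, not the PR's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a Table: one numeric column with missing values.
data = pd.DataFrame({"age": [23.0, np.nan, 31.0, np.nan]})
variable = "age"

# Row-by-row approach, similar in spirit to the interim code in the diff.
# Selecting with a list keeps a DataFrame, so iterrows() is available.
slow_mask = np.isnan(
    [list(row) for _, row in data[[variable]].iterrows()]
).reshape(-1)

# Vectorized pandas approach: a boolean Series marking missing entries.
fast_mask = data[variable].isnull()

print(slow_mask.tolist())
print(fast_mask.tolist())
```

Both masks flag the same rows; the vectorized call avoids building a Python list per row.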
-def get_contingencies(dat, skipDiscrete=False, skipContinuous=False):
+def get_contingencies(dat, skip_discrete=False, skip_continuous=False):
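A rename like this breaks callers that still pass the old keyword names. One possible way to keep them working during a deprecation period is a small decorator that maps old names to new ones. This is only a sketch: `renamed_kwargs` is a hypothetical helper, the function body is a placeholder, and nothing here is taken from the PR itself.

```python
import functools
import warnings


def renamed_kwargs(**old_to_new):
    """Accept deprecated keyword names and forward them under their new names."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in old_to_new.items():
                if old in kwargs:
                    warnings.warn(
                        f"'{old}' is deprecated; use '{new}' instead",
                        DeprecationWarning, stacklevel=2)
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@renamed_kwargs(skipDiscrete="skip_discrete", skipContinuous="skip_continuous")
def get_contingencies(dat, skip_discrete=False, skip_continuous=False):
    # Placeholder body: the real function computes contingencies.
    return skip_discrete, skip_continuous
```

With this in place, `get_contingencies(dat, skipDiscrete=True)` still works but emits a `DeprecationWarning`.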
@@ -39,8 +39,8 @@ def test_discrete(self):

     def test_discrete_missing(self):
         d = data.Table("zoo")
-        d.Y[25] = float("nan")
-        d[0][0] = float("nan")
+        d.loc[d.index[25], d.domain.class_var] = float("nan")
kernc (Author, Member) commented on Dec 21, 2016
This is not code. These are tests, made compatible with as little change as possible. The useful function code will always be much cleaner.
Ideally, this would be:
d.type.iloc[25] = np.nan # 'type' is classvar name
# or
d.target.iloc[25] = np.nan # target as an introduced alias
# or
d[d.domain.class_var].iloc[25] = np.nan # familiar, verbose
But our current code makes it hard to adapt it so. I suppose it will have to get a bit more ugly before it gets neat.
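For comparison, here is a small sketch of setting a single class value to NaN in plain pandas with one indexing step. The DataFrame is a made-up stand-in for the "zoo" table, with `type` playing the class variable. Chained forms such as `d.type.iloc[25] = ...` may assign to a temporary copy depending on the pandas version and copy-on-write settings, so the single `.loc` or `.iloc` call is the safer form.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "zoo" table; "type" plays the class variable.
d = pd.DataFrame(
    {"legs": [4.0, 2.0, 0.0], "type": [1.0, 2.0, 3.0]},
    index=["aardvark", "chicken", "bass"],
)
class_var = "type"

# Translate positional row 1 to its index label, then assign in one .loc call.
d.loc[d.index[1], class_var] = np.nan

# Equivalent purely positional form for row 2.
d.iloc[2, d.columns.get_loc(class_var)] = np.nan

print(d)
```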
@@ -544,8 +544,7 @@ def duplicateFeature(self):
     def check_attrs_values(self, attr, data):
         for i in range(len(data)):
             for var in attr:
-                if not math.isnan(data[i, var]) \
-                        and int(data[i, var]) >= len(var.values):
+                if data[var].iloc[data.index[i]] not in range(var.values):
kernc (Author, Member) commented on Jan 4, 2017
In Python 3, range(x) is a special object. You can't use in with generators.
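For reference, a small sketch of how membership tests against `range` behave in Python 3, together with a validity check in the spirit of the diff above. `values` is a hypothetical stand-in for `var.values`, and `is_valid_value` is an illustrative helper, not part of the PR. Since `range()` requires an integer argument, a check against a list of values would presumably go through `len()`.

```python
import math

values = ["no", "yes", "maybe"]  # hypothetical stand-in for var.values


def is_valid_value(x, n_values):
    """True if x is missing (NaN) or an integer-valued index in [0, n_values)."""
    if isinstance(x, float) and math.isnan(x):
        return True
    # range supports `in`; whole-number floats such as 2.0 compare equal to 2.
    return x in range(n_values)


# range() needs an integer, so passing the list itself raises TypeError.
try:
    range(values)
    raised = False
except TypeError:
    raised = True

print(is_valid_value(2.0, len(values)))
print(is_valid_value(3.0, len(values)))
print(raised)
```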
@@ -278,11 +278,6 @@ def _update_predictions_model(self):
         results = []
         for p in slots:
             values, prob = p.results
-            if p.predictor.domain.class_var.is_discrete:
kernc force-pushed the kernc:pandas branch from a572b5d to 755aba8 on Jan 5, 2017
sstanovnik added some commits on Jul 11, 2016
sstanovnik and others added some commits on Aug 16, 2016
kernc force-pushed the kernc:pandas branch from 755aba8 to efa392d on Jan 5, 2017
kernc added some commits on Dec 15, 2016
kernc force-pushed the kernc:pandas branch from efa392d to 36a2dea on Jan 5, 2017
astaric removed their assignment on Apr 7, 2017
As this PR touches too many files in the repository and has not been updated for almost a year, it is highly unlikely that it will be merged. We are still interested in porting the Table to pandas, but it will have to be done in a more gradual way. If anyone is interested in working on this, I would suggest the following steps:
Each "pandas compatible" method should be added in a separate pull request, as that eases reviewing and raises the chances of the PR actually being merged. I would also split 1. and 2. into two PRs, as 1. should only touch a single file and rebasing it should be much easier than the modifications of all places the code is used in 2.
kernc commented on Dec 16, 2016 (edited)
Description of changes
Minimal (non-exhaustive) pandas forward-compatible API with our current Table technology. Current API still (mostly) works but is deprecated and updated in tests.
Includes