Backport DataTables using merge commit #1220

nalimilan · 2017-09-02T16:14:11Z

This is equivalent to #1214 but uses git merge rather than git rebase. The merge commit is quite small (it can be viewed locally by running git show from the head of the branch). The DataTables history has been preserved, except for the DataFrames -> DataTables rename, which has been erased using this script.

The subdatatable/views code did not have a clear function hierarchy. Sometimes view called subdatatable, sometimes subdatatable called back up to view, and other times view would call view again before later calling subdatatable. The code was also relying on a custom Index that was used nowhere else and seemed unnecessary. Fixes the issue that users could only specify single columns to subset on (rather than arrays of columns), and adds tests for datatables and subdatatables to assert that view works as expected.

CategoricalValue entries should always be printed via showcompact() in order to get a short representation. This uses ourshowcompact() to do that when printing DataTables to REPL via show, as well as when printing them to HTML, LaTeX and CSV via printtable(). Also fix a failure due to duplicated keyword arguments on Julia 0.7.

nalimilan · 2017-09-02T16:15:41Z

Of course this cannot be merged via the GitHub interface since that's not fast-forward. It will have to be done manually.

nalimilan · 2017-09-02T16:34:41Z

(Even if the colors don't indicate it, tests are green on Travis with Julia 0.6.)

rofinn · 2017-09-02T19:07:05Z

Diffing rf/datatables and nl/datatables-backport wasn't very helpful for reviewing, but a few things I did notice were:

- A few remaining incorrect names (e.g., dt -> df, JuliaData -> JuliaStats).
- Documentation that needs to be removed or fixed (e.g., the formulas documentation should be removed).

I suppose those can probably be fixed after this is merged.

nalimilan · 2017-09-02T20:11:07Z

Good catch! New version should fix these issues.

ararslan

Looks alright to me. A lot of the things I noted can be addressed by another FemtoCleaner run once this is merged.

ararslan · 2017-09-02T21:01:19Z

docs/src/lib/utilities.md

 ```@docs
 eltypes
 head
 completecases
-completecases!


Shouldn't we be removing completecases here in addition to completecases!?

nvm looks like they do different things; the former identifies complete cases as a boolean vector and the latter is now dropnull!.

ararslan · 2017-09-02T21:05:01Z

docs/src/man/reshaping_and_pivoting.md

-iris = dataset("datasets", "iris")
+using DataFrames
+using CSV
+iris = CSV.read(joinpath(Pkg.dir("DataFrames"), "test/data/iris.csv"), DataFrame)


The slashes in the path here kind of defeat the purpose of using joinpath 😉
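
A minimal sketch of the point about joinpath: passing each path component as its own argument lets Julia insert the platform's separator. The `root` value below is a hypothetical stand-in for the package directory, not a path taken from this PR, and the sketch assumes a current Julia version.

```julia
# Hypothetical stand-in for the package directory (an assumption for illustration).
root = "DataFrames"

# One argument per component: no hard-coded "/" in the call.
path = joinpath(root, "test", "data", "iris.csv")

# splitpath recovers the individual components regardless of the separator used.
components = splitpath(path)
```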

ararslan · 2017-09-02T21:07:12Z

src/DataFrames.jl

       colwise,
       combine,
       completecases,
-       completecases!,


Same comment regarding completecases

ararslan · 2017-09-02T21:10:47Z

src/abstractdataframe/abstractdataframe.jl


 ##############################################################################
 ##
 ## Equality
 ##
 ##############################################################################

+# Imported in DataFrames.jl for compatibility across Julia 0.4 and 0.5
+Base.:(==)(df1::AbstractDataFrame, df2::AbstractDataFrame) = isequal(df1, df2)


Didn't we change this recently in DataTables? Having == call isequal defeats the purpose of having them be separate functions. If this is only for 0.4 and 0.5 compatibility, we don't support those anymore, so this could be removed.

Yeah, I thought this looked fishy too.

Yeah, that's also related to issues around how to define == for Nullable. We should fix this in later PRs.
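
To illustrate why == and isequal are kept separate, here is a small sketch using only Base floating-point values (it does not touch DataFrames or Nulls, and assumes a current Julia version). == follows IEEE comparison rules, while isequal is the relation used for hashing and sorting, so the two disagree on NaN and on signed zeros.

```julia
# == uses IEEE semantics: NaN is not equal to itself, and the two zeros compare equal.
nan_eq  = NaN == NaN            # false
zero_eq = 0.0 == -0.0           # true

# isequal treats NaN as equal to itself and distinguishes the signed zeros.
nan_iseq  = isequal(NaN, NaN)   # true
zero_iseq = isequal(0.0, -0.0)  # false

# The same difference carries over to containers.
vec_eq   = [1.0, NaN] == [1.0, NaN]          # false
vec_iseq = isequal([1.0, NaN], [1.0, NaN])   # true
```

Defining == as a plain call to isequal would erase exactly these distinctions.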

ararslan · 2017-09-02T21:11:40Z

src/abstractdataframe/abstractdataframe.jl

@@ -405,14 +378,43 @@ function StatsBase.describe(io, df::AbstractDataFrame)
    end
 end

+function StatsBase.describe{T}(io::IO, X::AbstractVector{Union{T, Null}})


This syntax is deprecated, should use where
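
For reference, a sketch of the same kind of signature written with where. It assumes a current Julia version and uses Base's Missing as a stand-in for the Null type from Nulls; the function name nonmissing_eltype is hypothetical and not part of DataFrames or StatsBase.

```julia
# Deprecated form (Julia 0.5 and earlier), shown only as a comment:
#     function nonmissing_eltype{T}(x::AbstractVector{Union{T, Missing}})
#
# Equivalent `where` form: T is bound to the non-missing part of the element type.
function nonmissing_eltype(x::AbstractVector{Union{T, Missing}}) where T
    return T
end

v = Union{Int, Missing}[1, missing, 3]
result = nonmissing_eltype(v)   # Int
```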

ararslan · 2017-09-02T21:24:35Z

src/dataframe/dataframe.jl

@@ -689,7 +663,7 @@ function deleterows!(df::DataFrame, ind::AbstractVector{Int})
    idf = 1
    iind = 1
    ikeep = 1
-    keep = Vector{Int}(n - length(ind2))
+    keep = Array{Int}(n-length(ind2))


This was Array before the fork, and has been changed to Vector in DataFrames since then. I've changed the merge commit to use Vector since that's slightly better.

ararslan · 2017-09-02T21:28:20Z

src/other/utils.jl

@@ -96,95 +95,54 @@ end
 #'
 #' DataFrames.gennames(10)
 function gennames(n::Integer)
-    res = Vector{Symbol}(n)
+    res = Array{Symbol}(n)


Another seemingly unnecessary change from Vector to Array
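
For context, Vector{T} is an alias for Array{T, 1}, so the two spellings in these hunks construct the same type; Vector simply states the dimensionality. A small sketch, assuming a current Julia version where uninitialized arrays are requested with undef (the Vector{Int}(n) form in the diff is the Julia 0.6 spelling):

```julia
n = 4

# Vector{T} and Array{T, 1} are the same type.
same_type = Vector{Symbol} === Array{Symbol, 1}   # true

# Uninitialized vector of length n; contents are arbitrary until assigned.
keep = Vector{Int}(undef, n)
for i in 1:n
    keep[i] = i
end
```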

ararslan · 2017-09-02T21:29:09Z

src/other/utils.jl

-#' DataFrames.countna(@data([1, 2, 3]))
-countna(da::DataArray) = sum(da.na)
+#' DataFrames.countnull([1, 2, 3])
+function countnull(a::AbstractArray)


Seems like this should either be defined in Nulls or deprecated in favor of count(isnull, a)

I think it's countnull to allow, e.g. DataArray/NullableArray/CategoricalArray to define optimized versions.

You still can quite easily:

Base.count(::typeof(isnull), a::SweetArray) = # awesome implementation

Yeah, let's remove it in a subsequent PR.
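
The suggestion above can be sketched as follows in current Julia, with Base's ismissing standing in for isnull. MaskedVector is a hypothetical array type invented for this illustration (like SweetArray in the comment); it is not part of DataFrames, Nulls, or any other package.

```julia
# Hypothetical array type that stores values and a missingness mask separately.
struct MaskedVector <: AbstractVector{Union{Float64, Missing}}
    values::Vector{Float64}
    mask::Vector{Bool}      # true where the entry is missing
end

# Minimal AbstractArray interface.
Base.size(v::MaskedVector) = size(v.values)
Base.getindex(v::MaskedVector, i::Int) = v.mask[i] ? missing : v.values[i]

# Optimized method: count the mask directly instead of visiting every element.
Base.count(::typeof(ismissing), v::MaskedVector) = count(v.mask)

v = MaskedVector([1.0, 2.0, 3.0], [false, true, false])
n_missing = count(ismissing, v)   # 1
```

Callers keep writing count(ismissing, v); dispatch picks the specialized method for the custom type, so a separate countnull-style function is not needed for that purpose.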

ararslan · 2017-09-02T21:30:22Z

src/subdataframe/subdataframe.jl

@@ -5,6 +5,42 @@
 ##
 ##############################################################################

+if VERSION >= v"0.6.0-dev.2643"


Could simplify this to remove the conditional

ararslan · 2017-09-02T21:31:54Z

test/data.jl

@@ -42,18 +34,24 @@ module TestData
    # lots more to do

    #test_group("assign")
-    df6[3] = @data(["un", "deux", "troix", "quatre"])
+    df6[3] = ["un", "deux", "trois", "quatre"]


That was really a shame for all JuliaStats! ;-)

C'est dommage! ("What a pity!")

…fork)

Resolve all conflicts in favor of DataTables, except for two cases where DataFrames contained improvements not in DataTables in areas touched since the fork:
- Add <thead> and <tbody> tags in HTML output.
- Use Vector instead of Array in one case where appropriate.

nalimilan · 2017-09-03T14:35:41Z

src/dataframerow/dataframerow.jl

+    (ncol(r1.df) == ncol(r2.df)) ||
+        throw(ArgumentError("Rows of the data tables that have different number of columns cannot be compared ($(ncol(df1)) and $(ncol(df2)))"))
+    @inbounds for i in 1:ncol(r1.df)
+        isless(r1.df[i][r1.row], r2.df[i][r2.row]) && return true


Since we're doing a general review, we should have a look at the semantics of this function at some point. It was added in JuliaData/DataTables.jl#17 and I'm not sure what its purpose is. Its behavior looks quite counter-intuitive to me.

For now I've kept it, but simplified it a bit since null uses the right isless definition.

See #1222. Actually, the way the code was written looked weird to me, but the behavior was OK.

nalimilan · 2017-09-03T14:48:33Z

I've fixed the issues which are related to the merge. Let's handle the others later (using FemtoCleaner where possible). I've also noticed a few remaining uses of unsafe_get and removed them from @quinnj's commit to keep the history clean.

I've just pushed it to master since I'm afraid of what GitHub is going to do in this complex situation. We should have a look at the failures on Julia 0.7, they don't seem too hard to fix.

quinnj · 2017-09-03T16:53:31Z

Note that one of the 0.7 failures is a bug in current isbits Union array copying. A fix will be incoming to Base in the next few days.

Gord Stephen and others added 30 commits September 14, 2016 10:13

RFC: Add compatibility with pre-contrasts ModelFrame constructor (#1042)

968e980

Add compatibility with pre-contrasts ModelFrame constructor

Reindex transposed sparse contrast matrix into modelmat_cols column-w…

d4ad15b

…ise for speed improvement (#1070)

Fill existing arrays with scalars (#1057)

2931693

Port to NullableArrays and CategoricalArrays

9963dcd

Only sort duplicated columns once (#1072)

948ea09

* Only sort duplicated columns once
* Added comments
* moved check for identical arrays inside of for loop
* Don't mention PR under /src, only mention them under /test

collecting with brackets is deprecated (#939)

b182e37

Resolve "WARNING: [a] concatenation is deprecated; use collect(a) instead"

test empty frames joins

b294846

Fix test failures on master (#1075)

c216961

Use vcat() instead of collect() in colwise(), and identity() instead of abs(), since the latter do not work with Nullable.

test empty frames groupby()

34b129a

Update Documenter syntax (#966)

6514cb6

Also switch from mkdocs output to Documenter's HTML output.

more DataFrame assignment tests

8c1991c

Add querying section with links to other packages to documentation (#…

6179620

…1078)

readonly AbstractVector interface for Cols

f72eb10

simplify eltypes()

b3900d3

small cleanups to stack/unstack

20474ab

immutable GroupApplied, enhance combine()

250ee81

make GroupApplied immutable by adding subframe type parameter

aggregate() optimizations

99fa895

* avoid [:], use reshape()
* avoid unnecessary Symbol<->String conversions

fix groupby() doc

3a35409

Merge pull request #1076 from alyst/misc_fixes

8bca104

Misc minor enhancements

Add output to LaTeX (useful for IJulia notebook export to PDF) (#845)

1310b23

handle A ~ B - 1 and add tests (#1086)

98b9e48

* handle -1 and add tests
* replace `import Base.==` with `Base.:(==)`
* typo and error test

Fix join when mixing NullableArray and Array{Nullable} (#1089)

1f75868

Also return a NullableCategoricalArray from sharepools() since the code currently doesn't check that no null values are present. Anyway, this function is internal and the change imposes no overhead.

Better display of Nullables (#1084)

a080174

* Better display of Nullables
* Don't write trailing space in Latex output

Also fix missing newline in show test

Update StatsBase.df to dof (#1097)

b5edb5e

limit attribute of IOContext is used for html generation (#1099)

a2c900f

* limit attribute of IOContext is used for html generation
* fixup

Fix docstring example (#1107)

09737f6

Closes #1103

Loosen constructor for a DataFrame (#1108)

3fa777b

Use the tagged version of Documenter (#1109)

9777ed7

fix typo in Nullable holding 1 example (#1112)

fa3030b

Small docs fixes (#1077)

39a2934

I apparently missed these occurrences when removing these functions.

cjprybol and others added 9 commits July 8, 2017 19:04

Remove describe code moved to StatsBase and NullableArrays

1e1fd7f

Remove _setdiff, which has been implemented as setdiff in Base

5af53e5

Remove references to io.md and pooling.md in manual

578e442

io.md does not exist since the readtable() has been removed. Pooling should be called categorical, so rename the file and sections.

Fix hyperlink to paper

71a80fe

Enable builds on nightlies, add Julia 0.6 badge

0239e71

Import Base.unique! on Julia 0.7

6d94bae

Fixes a failure on nightlies.

Allow renaming column without change (#77)

326b23c

nalimilan force-pushed the nl/datatables-backport branch from 3da60ca to 51c6f65 Compare September 2, 2017 16:20

nalimilan mentioned this pull request Sep 2, 2017

WIP: DataTables.jl Backport #1214

Closed

4 tasks

nalimilan force-pushed the nl/datatables-backport branch from 51c6f65 to 31005d1 Compare September 2, 2017 16:25

kleinschmidt mentioned this pull request Sep 2, 2017

Cox Regression JuliaStats/Survival.jl#2

Merged

nalimilan force-pushed the nl/datatables-backport branch from 31005d1 to d006adb Compare September 2, 2017 20:10

ararslan reviewed Sep 2, 2017

rofinn and others added 3 commits September 3, 2017 15:54

Drop Julia 0.5 support

cc3c880

- Changed julia requirement to 0.6 minimum
- Stopped testing on 0.5
- Removed Compat dependency
- Fixed a few 0.6 warnings.

Port from Nullable to Union{Null, T}

6035da8

This requires Nulls, as well as new versions of CategoricalArrays, DataStreams and WeakRefStrings.

nalimilan force-pushed the nl/datatables-backport branch from d006adb to b293dd7 Compare September 3, 2017 14:26

nalimilan commented Sep 3, 2017

nalimilan merged commit b293dd7 into master Sep 3, 2017

nalimilan deleted the nl/datatables-backport branch September 3, 2017 14:49

rofinn mentioned this pull request Sep 7, 2017

Drop DataArrays requirement? JuliaData/RData.jl#19

Closed

cjprybol mentioned this pull request Sep 11, 2017

Backporting to DataFrames JuliaData/DataTables.jl#81

Closed
