Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance the DataFrame keyword constructor to recycle #882

Closed
wants to merge 1 commit into from

Conversation

tshort
Copy link
Contributor

@tshort tshort commented Nov 9, 2015

This fixes an issue with the DataFrame constructor with keyword arguments. This comes up when using by, particularly with DataFramesMeta use. Here is the issue:

julia> DataFrame(a = 1:5, b = 1)
5x2 DataFrames.DataFrame
| Row | a | b |
|-----|---|---|
| 1   | 1 | 1 |
| 2   | 2 | 1 |
| 3   | 3 | 1 |
| 4   | 4 | 1 |
| 5   | 5 | 1 |

julia> DataFrame(a = 1, b = 1:5)
ERROR: New columns must have the same length as old columns
 in insert_single_column! at /home/tshort/.julia/v0.4/DataFrames/src/dataframe/dataframe.jl:309
 in setindex! at /home/tshort/.julia/v0.4/DataFrames/src/dataframe/dataframe.jl:368
 in DataFrame at /home/tshort/.julia/v0.4/DataFrames/src/dataframe/dataframe.jl:104

Here are results with this PR:

julia> DataFrame(a = 1, b = 1:5)
5x2 DataFrames.DataFrame
| Row | a | b |
|-----|---|---|
| 1   | 1 | 1 |
| 2   | 1 | 2 |
| 3   | 1 | 3 |
| 4   | 1 | 4 |
| 5   | 1 | 5 |

julia> DataFrame(a = 1:5, b = 1)
5x2 DataFrames.DataFrame
| Row | a | b |
|-----|---|---|
| 1   | 1 | 1 |
| 2   | 2 | 1 |
| 3   | 3 | 1 |
| 4   | 4 | 1 |
| 5   | 5 | 1 |

julia> DataFrame(a = 1:5, b = 1:3)
5x2 DataFrames.DataFrame
| Row | a | b |
|-----|---|---|
| 1   | 1 | 1 |
| 2   | 2 | 2 |
| 3   | 3 | 3 |
| 4   | 4 | 1 |
| 5   | 5 | 2 |

Note that this doesn't change the lack of recycling for other methods.

@nalimilan
Copy link
Member

Makes sense, but I'd prefer making the recycling behaviour stricter: IMHO it should only work when the requested length is a multiple of the current length of the vector. It would thus be more similar to how broadcasting works. I've been bitten badly in R by silent recycling in cases where it was clearly a mistake, which would have been caught by a less tolerant policy.

@tshort
Copy link
Contributor Author

tshort commented Nov 9, 2015

I agree with that argument, @nalimilan. I'll update the PR tonight.

@tshort
Copy link
Contributor Author

tshort commented Nov 19, 2015

Any last comments? If not, I'll merge over the weekend.

# Return and AbstractVector of length `len` filled with `x`.
# `x` is recycled if `len(x)` is an even multiple of `len`.
rep_len(x, len) = rep(x, len)
function rep_len(x::AbstractVector, len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather raise an error than return a vector of a different length than the requested one:

julia> rep_len([1, 2], 7)
6-element Array{Int64,1}:
 1
 2
 1
 2
 1
 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Thx.

@nalimilan
Copy link
Member

Do you still want to merge this?

@tshort
Copy link
Contributor Author

tshort commented Mar 12, 2016

No. After rethinking, I only want to handle the case of extending a single element. Ill redo it.

@tshort
Copy link
Contributor Author

tshort commented Mar 30, 2016

I've updated this to only recycle for a single element. If this is okay, I'll squash.

@johnmyleswhite
Copy link
Contributor

That works for me.

@@ -98,10 +98,32 @@ type DataFrame <: AbstractDataFrame
end
end

# Return and AbstractVector of length `len` filled with `x`.
# `x` is recycled if `len(x)` is an even multiple of `len`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is length(x). Also, the description doesn't sound completely accurate: x is always recycled an integer number of times, so that the length is less than or equal to len. Though the behaviour you describe would be sensible (i.e. raise an error if not a multiple).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching. This is also out of date. I forgot to update the docs.

@nalimilan
Copy link
Member

+1 too, modulo the small comment I added.

@tshort
Copy link
Contributor Author

tshort commented Mar 30, 2016

Squashed and ready to go, assuming that Travis doesn't complain.

for (k, v) in kwargs
result[k] = v
if length(v) != 1 && length(v) != len
error("Incompatible lengths of arguments")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better throw an ArgumentError? Also I would be really nice to give the name of the problematic column as well as its size vs. the expected one. :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also check that an error is thrown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps DimensionMismatch would be an appropriate exception here?

@nalimilan
Copy link
Member

Bump. With the small fix I noted should be good to go.

@nalimilan
Copy link
Member

Re-bump (see issue #973).

@tshort
Copy link
Contributor Author

tshort commented May 22, 2016

Sorry. Slammed until next weekend...
On May 22, 2016 11:46 AM, "Milan Bouchet-Valat" notifications@github.com
wrote:

Re-bump (see issue #973
#973).


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#882 (comment)

@nalimilan
Copy link
Member

No worries, let's just no forget about this completely. :-)

@nalimilan
Copy link
Member

Re-bump. :-)

nalimilan pushed a commit that referenced this pull request Jul 8, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
nalimilan pushed a commit that referenced this pull request Jul 8, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
nalimilan pushed a commit that referenced this pull request Jul 8, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
nalimilan pushed a commit that referenced this pull request Jul 8, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
nalimilan pushed a commit that referenced this pull request Jul 8, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
nalimilan pushed a commit that referenced this pull request Jul 8, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
rofinn pushed a commit that referenced this pull request Aug 17, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
nalimilan pushed a commit that referenced this pull request Aug 25, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
quinnj pushed a commit that referenced this pull request Sep 2, 2017
Consolidating the constructors minimized the number of places where
auto promotion could take place. The new constructor recycles scalars
such that if DataTable is created with a mix of scalars and vectors
the scalars will be recycled to the same length as the vectors. Fixes an
outstanding bug where scalar recycling only worked if the scalar
assignments came after the vector assignments of the desired length, see
#882. Tests that
used to assume NullableArray promotion now explicitly use NullableArrays
and new constructor tests have been added to test changes.
@quinnj
Copy link
Member

quinnj commented Sep 7, 2017

I think this has been resolved on master.

@quinnj quinnj closed this Sep 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants