Erroneous behaviours of MultiIndex #342
@zverok, can you please show some examples and the expected output?
@Shekharrajak, I believe that:
E.g.:

```ruby
df = Daru::DataFrame.new({b: [11,12], a: [101,102], c: [11,22]},
  order: [:a, :b, :c],
  index: [[:k], [:l]])
# v1:
# ArgumentError: MultiIndex can't consist of single-element tuples!
# or v2:
df.index
# => #<Daru::Index(2): {k, l}> -- not MultiIndex!
```

And:

```ruby
df = Daru::DataFrame.new({b: [11,12,13,14,15], a: [101,102,103,104,105],
  c: [11,22,33,44,55]},
  order: [:a, :b, :c],
  index: [[:k], [:k], [:k], [:l], [:l]])
# ArgumentError: repeating values in index!
```
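The rules proposed above (v2: single-element tuples collapse to a plain index; repeated labels are rejected) can be sketched in plain Ruby. This is not Daru code: `build_index` is a hypothetical helper, and it returns a `[kind, labels]` pair rather than a real `Daru::Index` or `Daru::MultiIndex`.

```ruby
# Hypothetical sketch of the proposed index rules (plain Ruby, not Daru).
# `tuples` is assumed to be an Array of Arrays, one tuple per row.
def build_index(tuples)
  # An index is treated as unique row names, so repeats are an error.
  if tuples.uniq.size != tuples.size
    raise ArgumentError, 'repeating values in index!'
  end

  # "v2": single-element tuples collapse to a flat index of plain labels.
  if tuples.all? { |t| t.is_a?(Array) && t.size == 1 }
    [:flat, tuples.map(&:first)]
  else
    [:multi, tuples]
  end
end

p build_index([[:k], [:l]])        # => [:flat, [:k, :l]]
p build_index([[:a, 1], [:a, 2]])  # => [:multi, [[:a, 1], [:a, 2]]]

begin
  build_index([[:k], [:k], [:k], [:l], [:l]])
rescue ArgumentError => e
  puts e.message                   # prints "repeating values in index!"
end
```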
Thanks! I think v2 will be good for the 1st example. For the 2nd example, I think it should allow repeating index values. That means, in the 2nd example
So when the user wants values in the index
That means
not this:
So that we can access the rows using
Is it a good idea? @zverok
I believe an index should, by definition, be unique. (It becomes complicated with "category indexes", and I don't feel I clearly understand the matter there, but the general rule is simple: "the index is unique names for rows".) But it is just my opinion.
Pandas allows repeating values in an index. However, since we haven't come across a concrete use case where this functionality is useful, I think there is no need to spend effort on making it happen. We would most likely need to change the underlying data structure for storing the index (it's currently a Hash), and making the new structure as fast as a Hash (in pure Ruby) will be a challenge.
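To illustrate the Hash point: with unique labels, a Hash from label to row position gives constant-time lookup, but with repeated labels a plain label-to-position Hash silently keeps only the last position, so each key has to map to an array of positions instead. A minimal plain-Ruby sketch, not Daru's actual internals (the thread only states that the index is backed by a Hash):

```ruby
# Unique labels: label => row position, O(1) lookup.
labels = %i[k l m]
unique_index = labels.each_with_index.to_h
p unique_index[:l]  # => 1

# Repeated labels: a label => position Hash keeps only the last position.
dup_labels = %i[k k k l l]
lossy = dup_labels.each_with_index.to_h
p lossy[:k]         # => 2 (rows 0 and 1 are lost)

# Keeping every row requires label => array of positions.
multi = {}
dup_labels.each_with_index { |label, pos| (multi[label] ||= []) << pos }
p multi[:k]         # => [0, 1, 2]
p multi[:l]         # => [3, 4]
```

The cost is that every lookup now returns an array, so callers have to handle the one-row and many-row cases differently.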
@zverok
which doesn't presume that an index should be unique.
Lokesh has a point. However, let's put off the uniqueness issue until someone comes up with a concrete use case.
I don't believe "database index" is a good metaphor here: in that case the index would be an auxiliary structure added to the dataframe for easier access (and we could have 10 different indexes for different types of access). As far as I understand, a dataframe index is rather the unique names of the rows, and therefore https://en.wikipedia.org/wiki/Index_(publishing) is a better comparison.
I'm still struggling to understand why indexes more complex than sequential integers are really necessary for dataframes. Ideally,
@gnilrets We cannot increase the lookup performance of
@gnilrets Do you agree?
At least because of "special" indexes: MultiIndex, which is easy to slice by part of a tuple, and DateTimeIndex, where you can query an entire year. I believe the notion of Index as we use it in Daru comes from spreadsheets/accounting, and typical tables looking like
This is the typical way scientists think of data, I believe.
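A rough plain-Ruby sketch of the two kinds of lookup mentioned above: selecting rows by a prefix of their tuple, and selecting rows by year. The helpers `positions_for_prefix` and `positions_for_year` are hypothetical; Daru's MultiIndex and DateTimeIndex have their own APIs, which this does not reproduce.

```ruby
require 'date'

# MultiIndex-style row names: each row is labelled by a tuple.
tuples = [[:a, :one], [:a, :two], [:b, :one], [:b, :two]]

# Positions of rows whose tuple starts with the given prefix.
def positions_for_prefix(tuples, prefix)
  tuples.each_index.select { |i| tuples[i].first(prefix.size) == prefix }
end

# DateTimeIndex-style row names: one Date per row.
dates = [Date.new(2015, 12, 31), Date.new(2016, 1, 1),
         Date.new(2016, 6, 15), Date.new(2017, 1, 1)]

# Positions of rows whose date falls in the given year.
def positions_for_year(dates, year)
  dates.each_index.select { |i| dates[i].year == year }
end

p positions_for_prefix(tuples, [:a])  # => [0, 1]
p positions_for_year(dates, 2016)     # => [1, 2]
```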
I think if indexes are not unique, then
That means
Isn't that good?
I am using CategoricalIndex when only one level of labels is left (and duplicate index values are present); see: https://github.com/SciRuby/daru/pull/340/files#diff-df0c816a5a6b82ab4d961bf9d1a0acbfR248
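The situation can be illustrated with a small sketch: keeping only one level of MultiIndex tuples can produce repeated labels, which a unique index cannot hold, so a categorical-style index (one label, many rows) is needed instead. `index_kind_after_dropping` is a hypothetical helper that only reports which kind of index would be needed; the linked PR contains the actual change.

```ruby
# Hypothetical sketch (plain Ruby, not Daru): keep one level of the
# MultiIndex tuples and report which kind of index the result needs.
def index_kind_after_dropping(tuples, keep_level: 0)
  labels = tuples.map { |t| t[keep_level] }
  # Unique labels can back an ordinary index; duplicates cannot, so they
  # call for a categorical-style index (label => many rows).
  kind = labels.uniq.size == labels.size ? :unique : :categorical
  [kind, labels]
end

p index_kind_after_dropping([[:a, 1], [:a, 2], [:b, 1]])
# => [:categorical, [:a, :a, :b]]
p index_kind_after_dropping([[:a, 1], [:b, 1]])
# => [:unique, [:a, :b]]
p index_kind_after_dropping([[:a, 1], [:a, 2], [:b, 1]], keep_level: 1)
# => [:categorical, [1, 2, 1]]
```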
Shown at #340
Problems: