concat with axis=1, join='outer' not working correctly #1719

Closed
wesm opened this issue Aug 2, 2012 · 2 comments

@wesm
Member

wesm commented Aug 2, 2012

http://stackoverflow.com/questions/11761884/pandas-concatouter-not-doing-union


It looks like pandas.concat is doing a 'left outer' join instead of simply taking the union of the indexes. This seems like a bug to me, but maybe I'm missing something obvious.

    import pandas
    import pandas.util.testing as put

    ts1 = put.makeTimeSeries()
    ts2 = put.makeTimeSeries()[::2]
    ts3 = put.makeTimeSeries()[::3]
    ts4 = put.makeTimeSeries()[::4]

    # join='outer' should give the union of the indexes,
    # yet these two results have different lengths!
    pandas.concat([ts1, ts2], join='outer', axis=1)
    pandas.concat([ts2, ts1], join='outer', axis=1)
Any idea how I can get the full union (as the pandas documentation says join='outer' should do)?
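For reference, here is a minimal sketch of the behaviour `join='outer'` is documented to have, written against a current pandas release rather than the 2012 one. It assumes `pd.date_range` data in place of `makeTimeSeries()`; the names `ts1`/`ts2` and the 10-day range are illustrative only. With `axis=1`, both argument orders should yield the union of the two indexes (all 10 dates), with NaN wherever `ts2` has no observation.

```python
import numpy as np
import pandas as pd

# Illustrative data standing in for makeTimeSeries(): 10 business days.
idx = pd.date_range("2000-01-03", periods=10, freq="B")
ts1 = pd.Series(np.arange(10.0), index=idx, name="ts1")
ts2 = ts1[::2].rename("ts2")  # every other day: 5 observations

# join='outer' should produce the union of the two indexes...
a = pd.concat([ts1, ts2], join="outer", axis=1)
# ...and the same set of rows regardless of argument order.
b = pd.concat([ts2, ts1], join="outer", axis=1)

print(len(a), len(b))
print(a["ts2"].isna().sum())  # rows where ts2 had no value
```

If either result came back shorter than the longer input, that would be the "left outer" behaviour described above.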

Thanks.

@lesteve
Contributor

lesteve commented Aug 2, 2012

Just to add my two cents: this seems to have little to do with pandas.concat itself; it happens whenever the fast union path for DatetimeIndex objects is used. Taking inspiration from the original example:

    import pandas
    import pandas.util.testing as put
    ts = put.makeTimeSeries(4)
    tsWithGaps = ts[::2]
    index = ts.index
    indexWithGaps = tsWithGaps.index
    index
    indexWithGaps
    index.union(indexWithGaps)
    indexWithGaps.union(index)
    pandas.Index.union(indexWithGaps, index)

and the output:

    In [7]: index
    Out[7]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2000-01-03 00:00:00, ..., 2000-01-06 00:00:00]
    Length: 4, Freq: B, Timezone: None

    In [8]: indexWithGaps
    Out[8]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2000-01-03 00:00:00, 2000-01-05 00:00:00]
    Length: 2, Freq: 2B, Timezone: None

    In [9]: index.union(indexWithGaps)
    Out[9]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2000-01-03 00:00:00, ..., 2000-01-06 00:00:00]
    Length: 4, Freq: B, Timezone: None

    In [10]: indexWithGaps.union(index)
    Out[10]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2000-01-03 00:00:00, ..., 2000-01-06 00:00:00]
    Length: 3, Freq: 2B, Timezone: None

    In [11]: pandas.Index.union(indexWithGaps, index)
    Out[11]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2000-01-03 00:00:00, ..., 2000-01-06 00:00:00]
    Length: 4, Freq: None, Timezone: None

For the sake of clarity: the problem is in [10], where the union of the indices has length 3 instead of 4 (the 2000-01-04 timestamp is missing). [11] shows what happens in the non-fast-union case, and that result is correct. On top of that, it's slightly odd that indexWithGaps.union(index) ends up with a 2B frequency.
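To make the expected behaviour concrete, here is a hypothetical `sorted_union` helper sketching what a "fast" union of two sorted, duplicate-free indexes can look like: a single linear merge pass. It illustrates the general technique only and is not pandas' actual DatetimeIndex code; it uses plain `datetime.date` values for the four dates in the example.

```python
from datetime import date


def sorted_union(left, right):
    """Merge two sorted, duplicate-free sequences into their sorted union.

    Hypothetical helper: because both inputs are already sorted, one linear
    pass suffices, and every element of both inputs is kept regardless of
    which sequence is passed first.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        elif right[j] < left[i]:
            result.append(right[j])
            j += 1
        else:  # equal values: keep a single copy
            result.append(left[i])
            i += 1
            j += 1
    # One input is exhausted; append whatever remains of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


# The four business days from the example, and every other one of them.
index = [date(2000, 1, 3), date(2000, 1, 4), date(2000, 1, 5), date(2000, 1, 6)]
index_with_gaps = index[::2]

print(sorted_union(index, index_with_gaps))
print(sorted_union(index_with_gaps, index))
```

Both calls return all four dates, including 2000-01-04, so the result does not depend on which index is the caller; that is the symmetry [10] violates.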

Looking at the existing issues, this probably overlaps somewhat with #1708. Interestingly, the indexing problem doesn't seem to affect aligning time series: both ts.align(tsWithGaps) and tsWithGaps.align(ts) work fine. DataFrame construction, on the other hand, is affected, as the original bug report showed:

    In [84]: pandas.DataFrame(collections.OrderedDict([('ts', ts), ('tsWithGaps', tsWithGaps)]))
    Out[84]:
                      ts  tsWithGaps
    2000-01-03 -0.045699   -0.045699
    2000-01-04 -1.611032         NaN
    2000-01-05 -1.055301   -1.055301
    2000-01-06  1.024215         NaN

    In [85]: pandas.DataFrame(collections.OrderedDict([('tsWithGaps', tsWithGaps), ('ts', ts)]))
    Out[85]:
                      ts  tsWithGaps
    2000-01-03 -0.045699   -0.045699
    2000-01-05 -1.055301   -1.055301
    2000-01-06  1.024215         NaN

i.e. the resulting DataFrame depends on the order in which the time series are provided.
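A minimal check of the same scenario on a current pandas release might look like the following. It assumes a plain `dict` (insertion-ordered since Python 3.7) in place of `collections.OrderedDict`, and synthetic values from `np.arange` in place of `makeTimeSeries(4)`. With the union computed correctly, both DataFrames should have all four rows, and `Series.align` (an outer join by default) should agree.

```python
import numpy as np
import pandas as pd

# Four business days starting 2000-01-03, as in the example above.
idx = pd.date_range("2000-01-03", periods=4, freq="B")
ts = pd.Series(np.arange(4.0), index=idx)
ts_with_gaps = ts[::2]  # keeps 2000-01-03 and 2000-01-05

# A DataFrame built from a dict of Series aligns on the union of their
# indexes, so the set of rows should not depend on insertion order.
df1 = pd.DataFrame({"ts": ts, "tsWithGaps": ts_with_gaps})
df2 = pd.DataFrame({"tsWithGaps": ts_with_gaps, "ts": ts})

# Series.align defaults to join='outer' and returns both aligned Series.
left, right = ts_with_gaps.align(ts)

print(len(df1), len(df2), len(left), len(right))
```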

@wesm
Member Author

wesm commented Aug 9, 2012

This was fixed in 5382985

wesm closed this as completed in 4fb1bd6 on Aug 9, 2012
wesm added a commit that referenced this issue Aug 9, 2012