Branch: master
Commits on Dec 14, 2019
  1. Support 'compute.ops_on_diff_frames' for NumPy ufunc compat in Series (#1128)

    HyukjinKwon authored and ueshin committed Dec 14, 2019
    
    This PR proposes supporting the `compute.ops_on_diff_frames` option for NumPy ufunc compatibility in Series.
    
    ```python
    >>> import databricks.koalas as ks
    >>> import numpy as np
    >>> a = ks.range(10).id
    >>> b = ks.range(10, 20).id
    >>> ks.set_option('compute.ops_on_diff_frames', True)
    >>> np.arctan2(a, b)
    0    0.000000
    1    0.090660
    2    0.165149
    3    0.226799
    4    0.278300
    5    0.321751
    6    0.358771
    7    0.390607
    8    0.418224
    9    0.442374
    Name: id, dtype: float64
    ```
Commits on Dec 13, 2019
  1. 'isna' type functions should return proper message for MultiIndex (#1130)

    itholic authored and HyukjinKwon committed Dec 13, 2019
    
    In pandas, `pd.MultiIndex.isna()`, `pd.MultiIndex.isnull()`, `pd.MultiIndex.notna()`, and `pd.MultiIndex.notnull()` all raise the same error, `isna is not defined for MultiIndex`, as shown below.
    
    ```python
    >>> pidx = pd.MultiIndex.from_tuples([('a', 'x', 1), ('b', 'y', 2)])
    
    >>> pidx.isna()
    Traceback (most recent call last):
    ...
    NotImplementedError: isna is not defined for MultiIndex
    
    >>> pidx.isnull()
    Traceback (most recent call last):
    ...
    NotImplementedError: isna is not defined for MultiIndex
    
    >>> pidx.notna()
    Traceback (most recent call last):
    ...
    NotImplementedError: isna is not defined for MultiIndex
    
    >>> pidx.notnull()
    Traceback (most recent call last):
    ...
    NotImplementedError: isna is not defined for MultiIndex
    ```
    
    I think we should mimic this behavior in our functions.
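
A minimal sketch of that idea, assuming a hypothetical `MultiIndexSketch` class rather than the real Koalas `MultiIndex`: the four methods share one implementation, so each raises the same message that pandas reports.

```python
class MultiIndexSketch:
    """Hypothetical stand-in; not the Koalas MultiIndex."""

    def isna(self):
        raise NotImplementedError("isna is not defined for MultiIndex")

    # pandas reports the same message for all four methods, so alias them.
    isnull = isna
    notna = isna
    notnull = isna


messages = []
for name in ("isna", "isnull", "notna", "notnull"):
    try:
        getattr(MultiIndexSketch(), name)()
    except NotImplementedError as error:
        messages.append(str(error))

print(messages)
```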
  2. Introduce _IndexerLike. (#1126)

    ueshin authored and HyukjinKwon committed Dec 13, 2019
    Simple refinements.
  3. Implement DataFrame.info (#1124)

    HyukjinKwon committed Dec 13, 2019
    This PR proposes `DataFrame.info`.
    
    ```python
    >>> import databricks.koalas as ks
    >>> ks.range(100).info()
    <class 'databricks.koalas.frame.DataFrame'>
    Index: 100 entries, 0 to 99
    Data columns (total 1 columns):
    id    100 non-null int64
    ```
    
    Resolves #872
Commits on Dec 12, 2019
  1. Refine loc. (#1119)

    ueshin committed Dec 12, 2019
    This PR is one of several refactorings of `indexing`.
  2. Fill up the missing docs for Index/MultiIndex (#1123)

    itholic authored and HyukjinKwon committed Dec 12, 2019
    As commented in #1114 (comment), I filled in the missing docs for `Index` and `MultiIndex`.
  3. Implements pct_change() for DataFrame (#1051)

    itholic authored and HyukjinKwon committed Dec 12, 2019
    Resolves #878 
    (https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.pct_change.html)
    
    ```python
    >>> df = ks.DataFrame({
    ...     'FR': [4.0405, 4.0963, 4.3149],
    ...     'GR': [1.7246, 1.7482, 1.8519],
    ...     'IT': [804.74, 810.01, 860.13]},
    ...     index=['1980-01-01', '1980-02-01', '1980-03-01'])
    >>> df
                    FR      GR      IT
    1980-01-01  4.0405  1.7246  804.74
    1980-02-01  4.0963  1.7482  810.01
    1980-03-01  4.3149  1.8519  860.13
    
    >>> df.pct_change()
                      FR        GR        IT
    1980-01-01       NaN       NaN       NaN
    1980-02-01  0.013810  0.013684  0.006549
    1980-03-01  0.053365  0.059318  0.061876
    ```
    
    You can set `periods` to control how many rows to shift when computing the percent change:
    
    ```python
    >>> df.pct_change(2)
                      FR        GR       IT
    1980-01-01       NaN       NaN      NaN
    1980-02-01       NaN       NaN      NaN
    1980-03-01  0.067912  0.073814  0.06883
    ```
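
As a rough illustration of what `periods` means, the plain-Python sketch below divides each value by the value `periods` rows earlier and subtracts 1. The helper name `pct_change_sketch` is hypothetical, and this is not how Koalas computes the result on Spark.

```python
import math


def pct_change_sketch(values, periods=1):
    """Percent change between each value and the one `periods` rows earlier."""
    changes = []
    for position, current in enumerate(values):
        if position < periods:
            # No earlier row to compare against, so the result is NaN.
            changes.append(math.nan)
        else:
            changes.append(current / values[position - periods] - 1)
    return changes


fr = [4.0405, 4.0963, 4.3149]  # the 'FR' column from the example above
one_step = pct_change_sketch(fr)
two_step = pct_change_sketch(fr, periods=2)
print(one_step)
print(two_step)
```

With `periods=2` the first two rows have nothing to compare against, which is why both show `NaN` in the output above.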
  4. Implement min/max for Index/MultiIndex (#1114)

    itholic authored and HyukjinKwon committed Dec 12, 2019
    Implement min/max for Index/MultiIndex
    
    For Index
    
    ```python
    >>> idx = ks.Index([3, 2, 1])
    >>> idx.min()
    1
    >>> idx.max()
    3
    ```
    
    For MultiIndex
    
    ```python
    >>> idx = ks.MultiIndex.from_tuples([('a', 'x', 1), ('b', 'y', 2)])
    >>> idx.min()
    ('a', 'x', 1)
    >>> idx.max()
    ('b', 'y', 2)
    ```
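
One way to read the MultiIndex result: each entry behaves like a tuple, and Python compares tuples lexicographically (first level, then the second, and so on). The sketch below uses plain tuples only; the sample entries are made up, and it assumes nothing about how Koalas evaluates this on Spark.

```python
# Hypothetical entries; the first two levels tie for two of them.
entries = [('b', 'y', 2), ('a', 'x', 1), ('a', 'x', 0)]

# Ties on earlier levels are broken by later levels.
smallest = min(entries)
largest = max(entries)
print(smallest, largest)
```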
Commits on Dec 10, 2019
  1. Implements idxmax()/idxmin() for DataFrame (#1054)

    itholic authored and HyukjinKwon committed Dec 10, 2019
    Resolves #1043 
    
    ```python
    >>> kdf = ks.DataFrame({'a': [1, 2, 3, 2],
    ...                     'b': [4.0, 2.0, 3.0, 1.0],
    ...                     'c': [300, 200, 400, 200]})
    >>> kdf
       a    b    c
    0  1  4.0  300
    1  2  2.0  200
    2  3  3.0  400
    3  2  1.0  200
    
    >>> kdf.idxmax()
    a    2
    b    0
    c    2
    Name: 0, dtype: int64
    ```
    
    For MultiIndex columns:
    
    ```python
    >>> kdf = ks.DataFrame({'a': [1, 2, 3, 2],
    ...                     'b': [4.0, 2.0, 3.0, 1.0],
    ...                     'c': [300, 200, 400, 200]})
    >>> kdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
    >>> kdf
       a    b    c
       x    y    z
    0  1  4.0  300
    1  2  2.0  200
    2  3  3.0  400
    3  2  1.0  200
    
    >>> kdf.idxmax()
    (a, x)    2
    (b, y)    0
    (c, z)    2
    Name: 0, dtype: int64
    ```
  2. Refine iloc. (#1111)

    ueshin authored and HyukjinKwon committed Dec 10, 2019
    This PR is one of several refactorings of `indexing`.
  3. Implements Top-level function to_numeric() (#1060)

    itholic authored and HyukjinKwon committed Dec 10, 2019
    Resolves #1044 
    (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html#pandas.to_numeric)
    
    I think we can use the pandas implementation directly, except for the `Series` case.
    
    ```python
    >>> kser = ks.Series(['1.0', '2', '-3'])
    >>> kser
    0    1.0
    1      2
    2     -3
    Name: 0, dtype: object
    
    >>> ks.to_numeric(kser)
    0    1.0
    1    2.0
    2   -3.0
    Name: 0, dtype: float32
    
    # If the given Series contains a value that cannot be cast to float, it is cast to `np.nan`.
    
    >>> kser = ks.Series(['apple', '1.0', '2', '-3'])
    >>> kser
    0    apple
    1      1.0
    2        2
    3       -3
    Name: 0, dtype: object
    
    >>> ks.to_numeric(kser)
    0    NaN
    1    1.0
    2    2.0
    3   -3.0
    Name: 0, dtype: float32
    
    # Lists, tuples, np.array, and single numeric values are also supported.
    
    >>> ks.to_numeric(['1.0', '2', '-3'])
    array([ 1.,  2., -3.])
    
    >>> ks.to_numeric(('1.0', '2', '-3'))
    array([ 1.,  2., -3.])
    
    >>> ks.to_numeric(np.array(['1.0', '2', '-3']))
    array([ 1.,  2., -3.])
    
    >>> ks.to_numeric(1.0)
    1.0
    ```
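
The coercion rule for invalid values can be sketched in plain Python. This is only an illustration under the assumption that each element is converted independently; `to_float_or_nan` is a hypothetical helper, not part of Koalas or pandas.

```python
import math


def to_float_or_nan(value):
    """Convert one value to float, falling back to NaN when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


converted = [to_float_or_nan(v) for v in ['apple', '1.0', '2', '-3']]
print(converted)
```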
  4. disable 'str' for 'SeriesGroupBy', disable 'DataFrame' for 'GroupBy' (#1097)

    itholic authored and HyukjinKwon committed Dec 10, 2019
    
    Resolve #1095 
    
    ```python
    >>> kser = ks.Series([1, 2, 3, 4, 5], name='x')
    >>> kser.groupby('x').head(2)
    Traceback (most recent call last):
    ...
    KeyError: ('x',)
    ```
    ```python
    >>> pdf = pd.DataFrame({'a': [1, 2, 6, 4, 4, 6, 4, 3, 7],
    ...                             'b': [4, 2, 7, 3, 3, 1, 1, 1, 2],
    ...                             'c': [4, 2, 7, 3, None, 1, 1, 1, 2],
    ...                             'd': list('abcdefght')},
    ...                            index=[0, 1, 3, 5, 6, 8, 9, 9, 9])
    >>> pdf.groupby(pdf)
    Traceback (most recent call last):
    ...
    ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
    
    >>> pdf.a.groupby(pdf)
    Traceback (most recent call last):
    ...
    ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
    ```
Commits on Dec 6, 2019
  1. Implement MultiIndex.levshape (#1086)

    RainFung authored and ueshin committed Dec 6, 2019
    Gets the number of unique values in each level of a MultiIndex.
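
A small sketch of that description, using plain tuples in place of a real MultiIndex. The helper `levshape_sketch` is hypothetical and only mirrors the wording of the commit message (one count of distinct values per level).

```python
def levshape_sketch(entries):
    """Count the distinct values in each level of a list of index tuples."""
    levels = zip(*entries)  # regroup the tuples level by level
    return tuple(len(set(level)) for level in levels)


shape = levshape_sketch([('a', 'x', 1), ('a', 'y', 1), ('b', 'x', 1)])
print(shape)
```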
  2. Setting index name / names for Series (#1079)

    itholic authored and HyukjinKwon committed Dec 6, 2019
    Unlike pandas, `koalas.Series` currently cannot set the index name, as shown below.
    
    ```python
    >>> pser = pd.Series([1, 2, 3, 4, 5])
    >>> pser.index.name = 'koalas'
    >>> pser.index.name
    'koalas'
    
    >>> kser = ks.Series([1, 2, 3, 4, 5])
    >>> kser.index.name = 'koalas'
    >>> kser.index.name
    ```
    
    The same applies to MultiIndex:
    
    ```python
    >>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
    ...                       ['speed', 'weight', 'length']],
    ...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
    ...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
    >>> pser = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
    >>> pser.index.names
    FrozenList([None, None])
    >>> pser.index.names = ['hello', 'koalas']
    >>> pser.index.names
    FrozenList(['hello', 'koalas'])
    
    >>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
    ...                       ['speed', 'weight', 'length']],
    ...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
    ...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
    >>> kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
    >>> kser.index.names
    [None, None]
    >>> kser.index.names = ['hello', 'koalas']
    >>> kser.index.names
    [None, None]
    ```
    
    This PR makes the same assignments work in Koalas:
    
    ```python
    >>> kser = ks.Series([1, 2, 3, 4, 5])
    >>> kser.index.name = 'koalas'
    >>> kser.index.name
    'koalas'
    
    >>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
    ...                       ['speed', 'weight', 'length']],
    ...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
    ...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
    >>> kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
    >>> kser.index.names
    [None, None]
    >>> kser.index.names = ['hello', 'koalas']
    >>> kser.index.names
    ['hello', 'koalas']
    ```
Commits on Dec 5, 2019
  1. Implement Index.fillna (#1102)

    RainFung authored and ueshin committed Dec 5, 2019
    Implement Index.fillna via spark.DataFrame.fillna
  2. Bump version to 0.23.0

    ueshin committed Dec 5, 2019
  3. Set the pyarrow version upper bound again. (#1108)

    ueshin committed Dec 5, 2019
    It still seems unstable with PyArrow 0.15, so this sets the upper bound again for now.
Commits on Dec 4, 2019
  1. Add __doc__ for replaced properties when logging is enabled. (#1107)

    ueshin committed Dec 4, 2019
    When logging is enabled, docs for replaced properties didn't exist, and as a side-effect, doctests for the properties were not run.
  2. Complete NumPy ufunc compatibility (#1106)

    HyukjinKwon authored and ueshin committed Dec 4, 2019
    This PR completes NumPy's ufunc support (followup of #1096).
    
    See also https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#standard-array-subclasses
    
    E.g.:
    
    ```python
    >>> import databricks.koalas as ks
    >>> import numpy as np
    >>> kdf = ks.range(10)
    >>> kser = np.sqrt(kdf.id)
    >>> type(kser)
    <class 'databricks.koalas.series.Series'>
    >>> kser
    0    0.000000
    1    1.000000
    2    1.414214
    3    1.732051
    4    2.000000
    5    2.236068
    6    2.449490
    7    2.645751
    8    2.828427
    9    3.000000
    ```
  3. Fix idxmax() / idxmin() for Series work properly (#1078)

    itholic authored and ueshin committed Dec 4, 2019
    Reopens #1065.
    
    This fixes the behavior more thoroughly, taking into account the additional examples raised in #1065 (comment).
    
    ```python
    >>> pser = pd.Series([1, 100, None, 100, 1, 100], index=[10, 3, 5, 2, 1, 8])
    >>> kser = ks.from_pandas(pser)
    >>>
    >>> pser.idxmax()
    3
    >>> kser.idxmax()
    3
    >>> pser.idxmin()
    10
    >>> kser.idxmin()
    10
    ```
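
The example appears to follow two rules: missing values are skipped, and ties resolve to the first occurrence. The plain-Python sketch below reproduces those two rules only; `idx_extreme` is a hypothetical helper and says nothing about the Spark-side implementation.

```python
import operator


def idx_extreme(values, labels, better):
    """Label of the first value for which no other value is strictly `better`."""
    best_value = None
    best_label = None
    for value, label in zip(values, labels):
        if value is None:
            continue  # missing values never win
        if best_value is None or better(value, best_value):
            best_value, best_label = value, label
    return best_label


values = [1, 100, None, 100, 1, 100]
labels = [10, 3, 5, 2, 1, 8]
max_label = idx_extreme(values, labels, operator.gt)
min_label = idx_extreme(values, labels, operator.lt)
print(max_label, min_label)
```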
  4. Avoid to return a different DataFrame from Series where (and multiple small bug fixes) (#1100)

    HyukjinKwon committed Dec 4, 2019
    
    **1.** We can avoid operations on different DataFrames. See below:
    
    ```python
    >>> import databricks.koalas as ks
    >>> kdf = ks.DataFrame({'a': [1,2,3]})
    >>> kdf.a.where(kdf.a > 1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../koalas/databricks/koalas/series.py", line 3916, in where
        kdf['__tmp_cond_col__'] = cond
      File "/.../koalas/databricks/koalas/frame.py", line 8089, in __setitem__
        kdf = align_diff_frames(assign_columns, self, value, fillna=False, how="left")
      File "/.../koalas/databricks/koalas/utils.py", line 187, in align_diff_frames
        combined = combine_frames(this, that, how=how)
      File "/.../koalas/databricks/koalas/utils.py", line 122, in combine_frames
        "Cannot combine the series or dataframe because it comes from a different dataframe. "
    ValueError: Cannot combine the series or dataframe because it comes from a different dataframe. In order to allow this operation, enable 'compute.ops_on_diff_frames' option.
    ```
    
    `kdf.a` is from the same DataFrame but currently it requires `compute.ops_on_diff_frames` to be enabled.
    
    This is because `self.to_frame()` creates another DataFrame and loses its anchor DataFrame; it is therefore treated as a different DataFrame.
    
    
    **2.** We should set `column_index` and `column_index_names`. Otherwise, output DataFrames will lose column indexes (e.g., multi-index columns).
    
    **3.** When we use `self._kdf` in a `Series` API implementation, we should check whether the output data column matters. APIs that do not need to create another DataFrame (which is preferred) only replace `scol`, so code that goes through `self._kdf` will ignore the applied column. See the example below:
    
    ```python
    ser = df.a.round()  # `kdf` is as is but it only replaces `scol`.
    ser._kdf.filter(...).show()  # `scol` is missing in the output
    ```
    
    **4.** Currently, `compute.ops_on_diff_frames` should only be enabled in `OpsOnDiffFramesEnabledTest`. This option is disabled by default and discouraged due to performance issues.
    
    It was mistakenly enabled in `SeriesTest`, so I disabled it there again.
Commits on Dec 3, 2019
  1. Implement Dataframe.iterrows using spark.dataframe.toLocalIterator (#1070)

    guyao authored and ueshin committed Dec 3, 2019
    
    Resolves #885.
    Inspired by #1003 (comment)
    
    Each row yielded by the iterator is a `pd.Series`. I assume each row is small enough.
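
The general shape of such an iterator can be sketched as a generator that consumes rows one at a time instead of collecting them all first. Everything here is an assumption for illustration: `iterrows_sketch` and `fake_local_iterator` are hypothetical, and rows are plain dicts rather than `pd.Series`.

```python
def iterrows_sketch(local_rows, index_key):
    """Yield (index, row) pairs one at a time from any row iterator."""
    for row in local_rows:
        data = dict(row)
        index = data.pop(index_key)  # separate the index value from the data
        yield index, data


def fake_local_iterator():
    # Hypothetical stand-in for an iterator that fetches rows lazily.
    yield {'idx': 'Q', 'a': 1}
    yield {'idx': 'W', 'a': 2}


pairs = list(iterrows_sketch(fake_local_iterator(), 'idx'))
print(pairs)
```

Because only one row is held at a time, memory use depends on the size of a single row, which matches the assumption stated in the commit message.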
  2. Add a basic NumPy ufunc compatibility (#1096)

    HyukjinKwon authored and ueshin committed Dec 3, 2019
    This PR implements `__array_ufunc__` (see https://docs.scipy.org/doc/numpy/reference/ufuncs.html#output-type-determination) so that some basic ufuncs can run against Koalas Series and Index (via some dunder APIs).
    
    ```python
    >>> import databricks.koalas as ks
    >>> import numpy as np
    >>> kdf = ks.range(10)
    >>> kdf = np.add(kdf.id, kdf.id)
    >>> type(kdf)
    <class 'databricks.koalas.series.Series'>
    >>> kdf
    0     0
    1     2
    2     4
    3     6
    4     8
    5    10
    6    12
    7    14
    8    16
    9    18
    Name: id, dtype: int64
    ```
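
For readers unfamiliar with the protocol, here is a minimal sketch of `__array_ufunc__` on a toy container. `TinySeries` is hypothetical and unrelated to the Koalas classes; it only shows how NumPy hands a ufunc call to an object that defines the hook, and it handles plain calls only.

```python
import numpy as np


class TinySeries:
    """Hypothetical stand-in for a Series; not the Koalas class."""

    def __init__(self, values):
        self.values = list(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.add(a, b) are handled in this sketch.
        if method != "__call__":
            return NotImplemented
        # Unwrap TinySeries operands so NumPy can compute on plain arrays.
        arrays = [np.asarray(x.values) if isinstance(x, TinySeries) else x
                  for x in inputs]
        result = ufunc(*arrays, **kwargs)
        # Re-wrap so the caller gets the container type back.
        return TinySeries(result.tolist())


doubled = np.add(TinySeries(range(3)), TinySeries(range(3)))
roots = np.sqrt(TinySeries([4.0, 9.0]))
print(type(doubled).__name__, doubled.values, roots.values)
```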
  3. Produce correct output against multiIndex when 'compute.ops_on_diff_frames' is enabled (#1089)

    itholic authored and HyukjinKwon committed Dec 3, 2019
    
    Resolves #1088 
    
    ```python
    >>> kser1
    lama    speed      45.0
    koalas  length    200.0
    cow     speed       1.2
            power      30.0
            length    250.0
    falcon  speed       1.5
            weight    320.0
            power       1.0
    Name: 0, dtype: float64
    
    >>> kser2
    lama    speed     -45.0
            weight    200.0
            length     -1.2
    cow     speed      30.0
            weight   -250.0
            length      1.5
    falcon  speed     320.0
            weight      1.0
            length     -0.3
    Name: 0, dtype: float64
    ```
    
    Previously, `kser1 + kser2` produced the incorrect result below.
    
    ```python
    >>> kser1 + kser2
    cow     length    None
    falcon  weight    None
    cow     speed     None
    falcon  speed     None
    koalas  length    None
    lama    speed     None
    cow     power     None
    falcon  power     None
            falcon    None
            falcon    None
            falcon    None
    lama    lama      None
            lama      None
            lama      None
    cow     cow       None
            cow       None
            cow       None
    Name: 0, dtype: object
    ```
    
    Now it behaves the same as pandas:
    
    ```python
    >>> kser1 + kser2
    cow     weight      NaN
            length    251.5
    lama    weight      NaN
    falcon  weight    321.0
    cow     speed      31.2
    lama    length      NaN
    falcon  speed     321.5
    koalas  length      NaN
    lama    speed       0.0
    cow     power       NaN
    falcon  power       NaN
            length      NaN
    Name: 0, dtype: float64
    ```
  4. Implement pct_change() for Series (#1071)

    itholic authored and HyukjinKwon committed Dec 3, 2019
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pct_change.html#pandas.Series.pct_change
    
    ```python
    >>> kser = ks.Series([90, 91, 85])
    >>> kser
    0    90
    1    91
    2    85
    Name: 0, dtype: int64
    
    >>> kser.pct_change()
    0         NaN
    1    0.011111
    2   -0.065934
    Name: 0, dtype: float64
    
    >>> kser.pct_change(periods=2)
    0         NaN
    1         NaN
    2   -0.055556
    Name: 0, dtype: float64
    ```
  5. Allow Series.__getitem__ to take boolean Series (#1075)

    harupy authored and HyukjinKwon committed Dec 3, 2019
    Resolves #1073
Commits on Dec 2, 2019
  1. Implement value_counts() for Index & MultiIndex (#949)

    itholic authored and HyukjinKwon committed Dec 2, 2019
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.value_counts.html#pandas.Index.value_counts
    
    ```python
    >>> s = ks.Series([0, 1, 2, 3, 4, 5], index=[3, 1, 2, 3, 4, np.nan])
    >>> s.index
    Float64Index([3.0, 1.0, 2.0, 3.0, 4.0, nan], dtype='float64')
    
    >>> s.index.value_counts().sort_index()
    1.0    1
    2.0    1
    3.0    2
    4.0    1
    Name: count, dtype: int64
    
    >>> s.index.value_counts(sort=True)  # doctest: +SKIP
    3.0    2
    4.0    1
    1.0    1
    2.0    1
    Name: count, dtype: int64
    
    >>> s.index.value_counts(normalize=True).sort_index()
    1.0    0.2
    2.0    0.2
    3.0    0.4
    4.0    0.2
    Name: count, dtype: float64
    
    >>> s.index.value_counts(dropna=False).sort_index()
    1.0    1
    2.0    1
    3.0    2
    4.0    1
    NaN    1
    Name: count, dtype: int64
    ```
    
    For MultiIndex
    
    ```python
    >>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
    ...                       ['speed', 'weight', 'length']],
    ...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
    ...                       [1, 1, 1, 1, 1, 2, 1, 2, 2]])
    >>> s = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
    >>> s.index
    MultiIndex([(  'lama', 'weight'),
                (  'lama', 'weight'),
                (  'lama', 'weight'),
                (   'cow', 'weight'),
                (   'cow', 'weight'),
                (   'cow', 'length'),
                ('falcon', 'weight'),
                ('falcon', 'length'),
                ('falcon', 'length')],
               )
    
    >>> s.index.value_counts().sort_index()
    (cow, length)       1
    (cow, weight)       2
    (falcon, length)    2
    (falcon, weight)    1
    (lama, weight)      3
    Name: count, dtype: int64
    
    >>> s.index.value_counts(normalize=True).sort_index()
    (cow, length)       0.111111
    (cow, weight)       0.222222
    (falcon, length)    0.222222
    (falcon, weight)    0.111111
    (lama, weight)      0.333333
    Name: count, dtype: float64
    ```
  2. Add Null handling for different frames (#1083)

    harupy authored and HyukjinKwon committed Dec 2, 2019
    As @ueshin pointed out in #1029 (comment), the fix #1029 didn't cover the case below. This PR fixes it.
    
    ```python
    >>> ks.options.compute.ops_on_diff_frames = True
    >>> s1 = pd.Series([True, False, True], index=list("ABC"), name="x")
    >>> s2 = pd.Series([True, True, False], index=list("ABD"), name="x")
    >>> s1
    A     True
    B    False
    C     True
    Name: x, dtype: bool
    >>> s2
    A     True
    B     True
    D    False
    Name: x, dtype: bool
    >>> s1 | s2
    A     True
    B     True
    C     True
    D    False
    Name: x, dtype: bool
    >>> (ks.from_pandas(s1) | ks.from_pandas(s2)).sort_index()
    A    True
    B    True
    C    True
    D    None
    Name: x, dtype: object
    ```
  3. Explicitly don't use Series.ravel() (#1074)

    itholic authored and HyukjinKwon committed Dec 2, 2019
    I tried to implement `Series.ravel()`, but it appears to require computing the whole dataset, which can potentially cause an out-of-memory error.
    
    So I think it is better to explicitly not support this function and to point users to the alternative (`to_numpy().ravel()`).
    
    ```python
    >>> ks.Series([1, 2, 3]).ravel()
    Traceback (most recent call last):
    databricks.koalas.exceptions.PandasNotImplementedError: The method `ks.Series.ravel()` is not implemented. If you want to collect your flattened underlying data as an NumPy array, use 'to_numpy().ravel()' instead.
    ```
  4. Implement Index.shape & MultiIndex.shape (#1085)

    RainFung authored and HyukjinKwon committed Dec 2, 2019
  5. Document MultiIndex.nlevels (#1087)

    RainFung authored and HyukjinKwon committed Dec 2, 2019
  6. Implement first_valid_index for DataFrame & Integrate with Series (#1092)

    itholic authored and HyukjinKwon committed Dec 2, 2019
    
    Implement DataFrame.first_valid_index
    
    
    ```python
    >>> kdf = ks.DataFrame({'a': [None, 2, 3, 2],
    ...                     'b': [None, 2.0, 3.0, 1.0],
    ...                     'c': [None, 200, 400, 200]},
    ...                     index=['Q', 'W', 'E', 'R'])
    >>> kdf
         a    b      c
    Q  NaN  NaN    NaN
    W  2.0  2.0  200.0
    E  3.0  3.0  400.0
    R  2.0  1.0  200.0
    
    >>> kdf.first_valid_index()
    'W'
    ```
    
    Support for MultiIndex columns
    
    ```python
    >>> kdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
    >>> kdf
         a    b      c
         x    y      z
    Q  NaN  NaN    NaN
    W  2.0  2.0  200.0
    E  3.0  3.0  400.0
    R  2.0  1.0  200.0
    
    >>> kdf.first_valid_index()
    'W'
    ```
    
    It is also integrated with `Series.first_valid_index()` in `generic.py`, since the two can share the same implementation.
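
A plain-Python sketch of the lookup, assuming a row counts as valid once at least one of its values is non-null (the pandas rule; the commit message does not state which condition Koalas uses). `first_valid_index_sketch` is a hypothetical helper, and rows are tuples with `None` for missing values.

```python
def first_valid_index_sketch(rows, labels):
    """Return the label of the first row with at least one non-null value."""
    for row, label in zip(rows, labels):
        if any(value is not None for value in row):
            return label
    return None  # every row is entirely null


rows = [(None, None, None), (2, 2.0, 200), (3, 3.0, 400), (2, 1.0, 200)]
found = first_valid_index_sketch(rows, ['Q', 'W', 'E', 'R'])
print(found)
```

A Series can be treated as the one-column case of the same loop, which is one way to see why the two APIs can share an implementation.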
Commits on Nov 27, 2019
  1. Include link to Help Thirsty Koalas Fund (#1082)

    dennyglee authored and ueshin committed Nov 27, 2019
    Include link to Help Thirsty Koalas Fund
  2. Add more assertions for constructors. (#1081)

    ueshin authored and HyukjinKwon committed Nov 27, 2019
  3. Add option_context. (#1077)

    ueshin authored and HyukjinKwon committed Nov 27, 2019