Flexible binary operations
====

In binary operations between pandas data structures, two points are of particular interest:

> - Broadcasting behavior between higher-dimensional (e.g. DataFrame) and lower-dimensional (e.g. Series) objects.
> - Missing data in computations.

We will demonstrate how to manage these issues independently, though they can be handled simultaneously.

**Matching / broadcasting behavior**

DataFrame has the methods [`add()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.add.html#pandas.DataFrame.add), [`sub()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.sub.html#pandas.DataFrame.sub), [`mul()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.mul.html#pandas.DataFrame.mul), [`div()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.div.html#pandas.DataFrame.div) and related functions [`radd()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.radd.html#pandas.DataFrame.radd), [`rsub()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.rsub.html#pandas.DataFrame.rsub), ... for carrying out binary operations. For broadcasting behavior, Series input is of primary interest. With these functions, you can match on either the *index* or the *columns* via the **axis** keyword:

In [129]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

In [130]:
df

        one       two     three
a  0.810245 -1.062772       NaN
b -1.247890  0.546119  0.585633
c -0.503521  0.459264 -0.803528
d       NaN  0.552072 -0.219398


In [131]:
row = df.iloc[1]

column = df['two']

row

one     -1.247890
two      0.546119
three    0.585633
Name: b, dtype: float64

In [132]:
column

a   -1.062772
b    0.546119
c    0.459264
d    0.552072
Name: two, dtype: float64

In [133]:
df.sub(row, axis='columns')

        one       two     three
a  2.058136 -1.608892       NaN
b  0.000000  0.000000  0.000000
c  0.744369 -0.086855 -1.389160
d       NaN  0.005953 -0.805030


In [134]:
df.sub(row, axis=1)

        one       two     three
a  2.058136 -1.608892       NaN
b  0.000000  0.000000  0.000000
c  0.744369 -0.086855 -1.389160
d       NaN  0.005953 -0.805030


In [135]:
df.sub(column, axis='index')

        one  two     three
a  1.873018  0.0       NaN
b -1.794010  0.0  0.039513
c -0.962785  0.0 -1.262792
d       NaN  0.0 -0.771470


In [136]:
df.sub(column, axis=0)

        one  two     three
a  1.873018  0.0       NaN
b -1.794010  0.0  0.039513
c -0.962785  0.0 -1.262792
d       NaN  0.0 -0.771470
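
Because the frames above are built from random numbers, their outputs change from run to run. The following is a minimal sketch on a small hand-made frame (the names `scores`, `first_row`, and `col_x` are illustrative and not part of the original example). It shows that `axis='columns'` matches the Series labels against the column labels, while `axis='index'` matches them against the row labels.

```python
import pandas as pd

# Small deterministic frame; all names here are illustrative only.
scores = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]},
                      index=["a", "b", "c"])

first_row = scores.iloc[0]   # Series indexed by column labels: x, y
col_x = scores["x"]          # Series indexed by row labels: a, b, c

# Subtract the first row from every row (labels matched against columns).
by_columns = scores.sub(first_row, axis="columns")

# Subtract column "x" from every column (labels matched against the index).
by_index = scores.sub(col_x, axis="index")

print(by_columns)
print(by_index)
```

With `axis="columns"` the first row of the result is all zeros; with `axis="index"` the `x` column is all zeros.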


Furthermore, you can align a level of a multi-indexed DataFrame with a Series.

In [137]:
dfmi = df.copy()

dfmi

        one       two     three
a  0.810245 -1.062772       NaN
b -1.247890  0.546119  0.585633
c -0.503521  0.459264 -0.803528
d       NaN  0.552072 -0.219398


In [138]:
dfmi.index = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a')],
                                       names=['first', 'second'])

dfmi.index

MultiIndex(levels=[[1, 2], ['a', 'b', 'c']],
           labels=[[0, 0, 0, 1], [0, 1, 2, 0]],
           names=['first', 'second'])

In [139]:
dfmi.sub(column, axis=0, level='second')

                   one       two     three
first second
1     a       1.873018  0.000000       NaN
      b      -1.794010  0.000000  0.039513
      c      -0.962785  0.000000 -1.262792
2     a            NaN  1.614844  0.843375
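
The level-based alignment can be easier to follow with round numbers. The sketch below assumes a two-level index named `first`/`second` as in the example above; the data and the names `frame` and `offsets` are made up for illustration. The Series is keyed only by the labels of the `second` level, so every row that shares such a label receives the same value.

```python
import pandas as pd

# Illustrative data: the inner label "a" appears under both outer labels.
idx = pd.MultiIndex.from_tuples([(1, "a"), (1, "b"), (2, "a")],
                                names=["first", "second"])
frame = pd.DataFrame({"x": [10, 30, 50], "y": [20, 40, 60]}, index=idx)

# Series keyed by the labels of the "second" level only.
offsets = pd.Series({"a": 1, "b": 2})

# Each row is matched to `offsets` through its "second" label,
# so both (1, "a") and (2, "a") have 1 subtracted.
result = frame.sub(offsets, axis=0, level="second")
print(result)
```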


Series and Index also support the built-in `divmod()`. It performs floor division and the modulo operation at the same time, returning a two-tuple whose elements have the same type as the left-hand side. For example:

In [140]:
s = pd.Series(np.arange(10))
s

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32

In [141]:
div, rem = divmod(s, 3)

In [142]:
div

0    0
1    0
2    0
3    1
4    1
5    1
6    2
7    2
8    2
9    3
dtype: int32

In [143]:
rem

0    0
1    1
2    2
3    0
4    1
5    2
6    0
7    1
8    2
9    0
dtype: int32

In [144]:
idx = pd.Index(np.arange(10))

idx

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

In [145]:
div, rem = divmod(idx, 3)

In [146]:
div

Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2, 3], dtype='int64')

In [147]:
rem

Int64Index([0, 1, 2, 0, 1, 2, 0, 1, 2, 0], dtype='int64')

We can also do elementwise [`divmod()`](https://docs.python.org/3/library/functions.html#divmod):

In [148]:
div, rem = divmod(s, [2, 2, 3, 3, 4, 4, 5, 5, 6, 6])

In [149]:
div

0    0
1    0
2    0
3    1
4    1
5    1
6    1
7    1
8    1
9    1
dtype: int32

In [150]:
rem

0    0
1    1
2    2
3    0
4    0
5    1
6    1
7    2
8    2
9    3
dtype: int32
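
As a compact check of both forms, the sketch below (with an illustrative three-element Series named `values`) applies a scalar divisor and a list of divisors, then verifies the usual identity `quotient * divisor + remainder == original`.

```python
import pandas as pd

values = pd.Series([7, 8, 9])

# Scalar divisor: the same divisor is applied to every element.
quot, rem = divmod(values, 4)

# List divisor: element i is divided by divisor i.
quot_each, rem_each = divmod(values, [2, 3, 4])

# The defining identity of divmod holds element by element.
rebuilt = quot * 4 + rem

print(quot.tolist(), rem.tolist())
print(quot_each.tolist(), rem_each.tolist())
print(rebuilt.equals(values))
```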

**Missing data / operations with fill values**

In Series and DataFrame (though not yet in Panel), the arithmetic functions accept an optional *fill_value*: a value to substitute when at most one of the values at a location is missing. For example, when adding two DataFrame objects, you may wish to treat NaN as 0 unless both DataFrames are missing that value, in which case the result will be NaN (you can later replace NaN with some other value using `fillna` if you wish).

In [151]:
df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

In [152]:
df

        one       two     three
a  0.947482 -1.371815       NaN
b  1.619801 -0.810365 -1.542614
c  0.477267 -0.027348  0.486127
d       NaN -1.236228 -0.518147


In [153]:
df2 = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three': pd.Series(np.random.randn(4), index=['a','b', 'c', 'd'])})

In [154]:
df2

        one       two     three
a -0.207255  0.112672  0.555723
b  1.369395  2.230512 -0.313849
c -0.879237  0.359246 -1.422131
d       NaN  0.978959 -1.371178


In [155]:
df + df2

        one       two     three
a  0.740227 -1.259143       NaN
b  2.989197  1.420148 -1.856464
c -0.401970  0.331898 -0.936004
d       NaN -0.257268 -1.889325


In [156]:
df.add(df2, fill_value=0)

        one       two     three
a  0.740227 -1.259143  0.555723
b  2.989197  1.420148 -1.856464
c -0.401970  0.331898 -0.936004
d       NaN -0.257268 -1.889325
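
The three possible cases (neither value missing, one missing, both missing) can be seen side by side in a small sketch. The frames `left` and `right` are illustrative and chosen so that each row covers one case.

```python
import numpy as np
import pandas as pd

# Row "a": both present. Row "b": missing on the left only. Row "c": missing on both sides.
left = pd.DataFrame({"v": [1.0, np.nan, np.nan]}, index=["a", "b", "c"])
right = pd.DataFrame({"v": [10.0, 20.0, np.nan]}, index=["a", "b", "c"])

plain = left + right                     # NaN wherever either side is NaN
filled = left.add(right, fill_value=0)   # NaN only where both sides are NaN

print(plain)
print(filled)
```

Row `b` becomes `20.0` only with `fill_value=0`, while row `c` stays NaN in both results.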


**Flexible Comparisons**

Starting in v0.8, pandas introduced the binary comparison methods `eq`, `ne`, `lt`, `gt`, `le`, and `ge` to Series and DataFrame. Their behavior is analogous to the binary arithmetic operations described above:

In [157]:
df.gt(df2)

     one    two  three
a   True  False  False
b   True  False  False
c   True  False   True
d  False  False   True


In [158]:
df2.ne(df)

    one   two  three
a  True  True   True
b  True  True   True
c  True  True   True
d  True  True   True


These operations produce a pandas object of the same type as the left-hand-side input, with dtype `bool`. These boolean objects can be used in indexing operations; see [here](http://pandas.pydata.org/pandas-docs/version/0.20.3/indexing.html#indexing-boolean).
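
A short sketch of a flexible comparison feeding a boolean-indexing step follows. The frame `prices` and the per-column thresholds in `limits` are invented for illustration; the `axis` keyword is used the same way as for the arithmetic methods.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({"open": [1.0, 5.0, np.nan], "close": [2.0, 4.0, 3.0]})
limits = pd.Series({"open": 2.0, "close": 3.5})

# Flexible form: compare each row against `limits`, matching on column labels.
# A comparison involving NaN yields False.
above = prices.gt(limits, axis="columns")

# The boolean frame can be used directly as a mask; cells where it is False become NaN.
masked = prices[above]

print(above)
print(masked)
```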

**Boolean Reductions**

You can apply the reductions [`empty`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.empty.html#pandas.DataFrame.empty), [`any()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.any.html#pandas.DataFrame.any), [`all()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.all.html#pandas.DataFrame.all), and [`bool()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.bool.html#pandas.DataFrame.bool) to summarize a boolean result.

In [159]:
(df > 0).all()

one      False
two      False
three    False
dtype: bool

In [160]:
(df > 0).any()

one       True
two      False
three     True
dtype: bool

You can reduce to a final boolean value:

In [161]:
(df > 0).any().any()

True
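
The reductions above work column by column by default. The sketch below, on an illustrative boolean frame named `mask`, shows the per-column and per-row forms and how chaining a second reduction collapses the result to a single value.

```python
import pandas as pd

mask = pd.DataFrame({"p": [True, True], "q": [True, False]})

per_column = mask.all()        # one result per column
per_row = mask.all(axis=1)     # one result per row
any_true = mask.any().any()    # reduce twice to get a single value
all_true = mask.all().all()

print(per_column.tolist(), per_row.tolist(), any_true, all_true)
```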

You can test whether a pandas object is empty via the `empty` property:

In [162]:
df.empty

False

In [163]:
pd.DataFrame(columns=list('ABC')).empty

True

To evaluate single-element pandas objects in a boolean context, use the method `bool()`:

In [164]:
pd.Series([True]).bool()

True

In [165]:
pd.Series([False]).bool()

False

In [166]:
pd.DataFrame([[True]]).bool()

True

In [167]:
pd.DataFrame([[False]]).bool()

False
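
The `bool()` method belongs to the pandas version documented here. To the best of our knowledge, later pandas releases (2.1 onward) deprecate it, so the sketch below shows one possible alternative that does not depend on it. The names `single` and `one_cell` are illustrative, and this is only one way to extract a lone value.

```python
import pandas as pd

single = pd.Series([True])
one_cell = pd.DataFrame([[False]])

# Series.item() returns the only element and raises ValueError
# if the Series does not hold exactly one value.
flag_from_series = single.item()

# For a 1x1 DataFrame, positional access pulls out the single cell.
flag_from_frame = bool(one_cell.iloc[0, 0])

print(flag_from_series, flag_from_frame)
```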

**Warning** You might be tempted to do the following:

```
>>> if df:
     ...
```
Or

```
>>> df and df2
```


Both of these will raise an error, because you are asking for the truth value of an object that holds multiple values:

`ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().`

See [gotchas](http://pandas.pydata.org/pandas-docs/version/0.20.3/gotchas.html#gotchas-truth) for a more detailed discussion.
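
The sketch below reproduces the error on an illustrative two-row frame and then states the intended test explicitly. The exact wording of the error message may differ between pandas versions, so the code only stores it rather than matching it in full.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 0]})

try:
    if frame:            # ambiguous: which of the values should decide?
        pass
except ValueError as exc:
    message = str(exc)
    print("raised:", message)

# State the intent explicitly instead.
has_rows = not frame.empty
any_nonzero = bool((frame != 0).any().any())
all_nonzero = bool((frame != 0).all().all())

print(has_rows, any_nonzero, all_nonzero)
```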

**Comparing if objects are equivalent**

Often you may find there is more than one way to compute the same result. As a simple example, consider `df+df` and `df*2`. To test that these two computations produce the same result, given the tools shown above, you might imagine using `(df+df == df*2).all()`. But in fact, this expression is not True for every column:

In [168]:
df+df == df*2

     one   two  three
a   True  True  False
b   True  True   True
c   True  True   True
d  False  True   True


In [169]:
(df+df == df*2).all()

one      False
two       True
three    False
dtype: bool

Notice that the boolean DataFrame `df+df == df*2` contains some False values! That is because NaNs do not compare as equal:

In [170]:
np.nan == np.nan

False

So, as of v0.13.1, NDFrames (such as Series, DataFrames, and Panels) have an [`equals()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.equals.html#pandas.DataFrame.equals) method for testing equality, with NaNs in corresponding locations treated as equal.

In [171]:
(df+df).equals(df*2)

True
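
The difference between the two tests can be reproduced without random data. In this sketch the one-column frame `doubled` is illustrative and contains a single NaN, which is enough to make the elementwise check fail while `equals()` succeeds.

```python
import numpy as np
import pandas as pd

doubled = pd.DataFrame({"v": [1.0, np.nan]})

# Elementwise comparison: the NaN position compares as False.
elementwise = bool((doubled + doubled == doubled * 2).all().all())

# equals(): NaNs in matching positions count as equal.
nan_aware = (doubled + doubled).equals(doubled * 2)

print(elementwise, nan_aware)
```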

Note that the Series or DataFrame index needs to be in the same order for `equals()` to return True:

In [172]:
df1 = pd.DataFrame({'col':['foo', 0, np.nan]})

df2 = pd.DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0])

In [173]:
df1

   col
0  foo
1    0
2  NaN


In [174]:
df2

   col
2  NaN
1    0
0  foo


In [175]:
df1.equals(df2)

False

In [176]:
df1.equals(df2.sort_index())

True
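
When two objects hold the same labelled data in a different order, bringing them into a common order first makes `equals()` succeed. The sketch below uses two illustrative Series, `ordered` and `shuffled`, and shows two ways of restoring the order: sorting the index, or reindexing against the other object's index.

```python
import pandas as pd

ordered = pd.Series([1, 2, 3], index=["a", "b", "c"])
shuffled = pd.Series([3, 1, 2], index=["c", "a", "b"])

same_as_is = ordered.equals(shuffled)                 # same pairs, different order
same_sorted = ordered.equals(shuffled.sort_index())   # order restored by sorting
same_reindexed = ordered.equals(shuffled.reindex(ordered.index))

print(same_as_is, same_sorted, same_reindexed)
```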