Flexible binary operations
====


For binary operations between pandas data structures, there are two key points of interest:

> - Broadcasting behavior between higher-dimensional (e.g. DataFrame) and lower-dimensional (e.g. Series) objects.
> - Missing data in computations.

We will demonstrate how to manage these issues independently, though they can be handled simultaneously.

**Matching / broadcasting behavior**

DataFrame has the methods [`add()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.add.html#pandas.DataFrame.add), [`sub()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.sub.html#pandas.DataFrame.sub), [`mul()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.mul.html#pandas.DataFrame.mul), [`div()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.div.html#pandas.DataFrame.div) and related functions [`radd()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.radd.html#pandas.DataFrame.radd), [`rsub()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.rsub.html#pandas.DataFrame.rsub), ... for carrying out binary operations. For broadcasting behavior, Series input is of primary interest. With these functions, you can use the **axis** keyword to match on either the *index* or the *columns*:

In [2]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

In [3]:
df

        one       two     three
a -0.951262  0.748569       NaN
b -2.300643  0.066718 -0.908840
c  1.387979 -1.886431  2.140897
d       NaN  1.000606 -1.012071


In [4]:
row = df.iloc[1]

column = df['two']

row

one     -2.300643
two      0.066718
three   -0.908840
Name: b, dtype: float64

In [5]:
column

a    0.748569
b    0.066718
c   -1.886431
d    1.000606
Name: two, dtype: float64

In [6]:
df.sub(row, axis='columns')

        one       two     three
a  1.349381  0.681852       NaN
b  0.000000  0.000000  0.000000
c  3.688622 -1.953149  3.049737
d       NaN  0.933888 -0.103231


In [7]:
df.sub(row, axis=1)

        one       two     three
a  1.349381  0.681852       NaN
b  0.000000  0.000000  0.000000
c  3.688622 -1.953149  3.049737
d       NaN  0.933888 -0.103231


In [8]:
df.sub(column, axis='index')

        one  two     three
a -1.699831  0.0       NaN
b -2.367360  0.0 -0.975558
c  3.274411  0.0  4.027328
d       NaN  0.0 -2.012676


In [9]:
df.sub(column, axis=0)

        one  two     three
a -1.699831  0.0       NaN
b -2.367360  0.0 -0.975558
c  3.274411  0.0  4.027328
d       NaN  0.0 -2.012676
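Because `df` above is built from random numbers, its outputs change on every run. The sketch below uses small fixed values (chosen here for illustration, not taken from the original) so the effect of `axis` can be checked by hand. It also compares against the plain `-` operator, which aligns a Series on the DataFrame's columns just as `axis='columns'` does.

```python
import pandas as pd

# Small, fixed frame (illustrative values) so results are easy to verify.
df = pd.DataFrame({'one': [1.0, 2.0, 3.0],
                   'two': [10.0, 20.0, 30.0]},
                  index=['a', 'b', 'c'])

row = df.loc['b']     # Series indexed by the column labels: one=2.0, two=20.0
column = df['two']    # Series indexed by the row labels: a=10.0, b=20.0, c=30.0

# axis='columns': match the Series index against df's columns, so the
# same row is subtracted from every row.
by_columns = df.sub(row, axis='columns')

# axis='index': match the Series index against df's row labels, so the
# same column is subtracted from every column.
by_index = df.sub(column, axis='index')

print(by_columns)
print(by_index)
```

Because the operator form already matches on the columns, an explicit `axis` is mainly needed when you want to match on the index.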


Furthermore, you can align a Series with one level of a MultiIndexed DataFrame by passing the `level` keyword:

In [10]:
dfmi = df.copy()

dfmi

        one       two     three
a -0.951262  0.748569       NaN
b -2.300643  0.066718 -0.908840
c  1.387979 -1.886431  2.140897
d       NaN  1.000606 -1.012071


In [11]:
dfmi.index = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a')],
                                       names=['first', 'second'])

dfmi.index

MultiIndex(levels=[[1, 2], ['a', 'b', 'c']],
           labels=[[0, 0, 0, 1], [0, 1, 2, 0]],
           names=['first', 'second'])

In [12]:
dfmi.sub(column, axis=0, level='second')

                   one       two     three
first second
1     a      -1.699831  0.000000       NaN
      b      -2.367360  0.000000 -0.975558
      c       3.274411  0.000000  4.027328
2     a            NaN  0.252036 -1.760640
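To make the level matching easier to follow, here is a minimal sketch with a hypothetical two-level frame and fixed values (not from the original). Each row is matched to the Series by its `second` label only, whatever its `first` label is.

```python
import pandas as pd

# Hypothetical two-level index; only the 'second' level is used for alignment.
index = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')],
                                  names=['first', 'second'])
dfmi = pd.DataFrame({'val': [1.0, 2.0, 3.0, 4.0]}, index=index)

# One offset per label of the 'second' level.
offset = pd.Series({'a': 10.0, 'b': 100.0})

# Rows labelled 'a' lose 10, rows labelled 'b' lose 100.
result = dfmi.sub(offset, axis=0, level='second')
print(result)
```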


Series and Index also support the `divmod()` builtin. This function performs floor division and the modulo operation at the same time, returning a two-tuple of objects of the same type as the left-hand side. For example:

In [13]:
s = pd.Series(np.arange(10))
s

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32

In [14]:
div, rem = divmod(s, 3)

In [15]:
div

0    0
1    0
2    0
3    1
4    1
5    1
6    2
7    2
8    2
9    3
dtype: int32

In [16]:
rem

0    0
1    1
2    2
3    0
4    1
5    2
6    0
7    1
8    2
9    0
dtype: int32

In [17]:
idx = pd.Index(np.arange(10))

idx

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

In [18]:
div, rem = divmod(idx, 3)

In [19]:
div

Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2, 3], dtype='int64')

In [20]:
rem

Int64Index([0, 1, 2, 0, 1, 2, 0, 1, 2, 0], dtype='int64')

We can also do element-wise [`divmod()`](https://docs.python.org/3/library/functions.html#divmod), passing one divisor per element:

In [21]:
div, rem = divmod(s, [2, 2, 3, 3, 4, 4, 5, 5, 6, 6])

In [22]:
div

0    0
1    0
2    0
3    1
4    1
5    1
6    1
7    1
8    1
9    1
dtype: int32

In [23]:
rem

0    0
1    1
2    2
3    0
4    0
5    1
6    1
7    2
8    2
9    3
dtype: int32
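All the values above are non-negative. With negative values, the quotient and remainder follow floor-division rules, so `div * divisor + rem` still reconstructs the original values. A minimal check with illustrative values (not from the original):

```python
import pandas as pd

s = pd.Series([7, -7, 9])
div, rem = divmod(s, 4)

# Floor division rounds toward negative infinity: -7 // 4 == -2, -7 % 4 == 1.
print(div.tolist(), rem.tolist())

# The identity behind divmod: quotient * divisor + remainder == dividend.
print((div * 4 + rem).equals(s))
```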

**Missing data / operations with fill values**

In Series and DataFrame (though not yet in Panel), the arithmetic functions accept a *fill_value* argument: a value to substitute when at most one of the two values at a location is missing. For example, when adding two DataFrame objects, you may wish to treat NaN as 0 unless both DataFrames are missing that value, in which case the result will be NaN (you can later replace NaN with some other value using `fillna` if you wish).


In [24]:
df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

In [25]:
df

        one       two     three
a  0.522600 -1.342961       NaN
b -0.610163  0.231069 -1.118179
c  0.234201 -0.514295  0.167796
d       NaN  0.247945  1.064971


In [26]:
df2 = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three': pd.Series(np.random.randn(4), index=['a','b', 'c', 'd'])})

In [27]:
df2

        one       two     three
a  1.051070 -0.770872  2.090719
b -1.264317  0.083863  0.026365
c -0.904693 -1.133956  1.939191
d       NaN -0.948008  0.452249


In [28]:
df + df2

        one       two     three
a  1.573670 -2.113833       NaN
b -1.874480  0.314932 -1.091814
c -0.670492 -1.648251  2.106987
d       NaN -0.700064  1.517221


In [29]:
df.add(df2, fill_value=0)

        one       two     three
a  1.573670 -2.113833  2.090719
b -1.874480  0.314932 -1.091814
c -0.670492 -1.648251  2.106987
d       NaN -0.700064  1.517221
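The rule "fill when one side is missing, keep NaN when both are" is easier to see on fixed data. A minimal sketch with illustrative frames (not from the original), finishing with the optional `fillna` step mentioned above:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'x': [1.0, np.nan, np.nan]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'x': [10.0, 20.0, np.nan]}, index=['a', 'b', 'c'])

plain = left + right                     # any NaN operand gives NaN
filled = left.add(right, fill_value=0)   # one missing side is treated as 0

print(plain['x'].tolist())   # 'b' and 'c' are NaN
print(filled['x'].tolist())  # 'b' becomes 20.0; 'c' stays NaN (both sides missing)

# Replace the remaining NaN afterwards if a concrete value is needed.
final = filled.fillna(0)
```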


**Flexible Comparisons**

Starting in v0.8, pandas introduced the binary comparison methods `eq`, `ne`, `lt`, `gt`, `le`, and `ge` to Series and DataFrame. Their behavior is analogous to the binary arithmetic operations described above:

In [30]:
df.gt(df2)

     one    two  three
a  False  False  False
b   True   True  False
c   True   True  False
d  False   True   True


In [31]:
df2.ne(df)

    one   two three
a  True  True  True
b  True  True  True
c  True  True  True
d  True  True  True


These operations produce a pandas object of the same type as the left-hand-side input, with dtype `bool`. These boolean objects can be used in indexing operations; see [Boolean indexing](http://pandas.pydata.org/pandas-docs/version/0.20.3/indexing.html#indexing-boolean).
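A minimal sketch, on a small illustrative frame (not from the original), of feeding a comparison result straight into indexing. It also shows that, like the arithmetic methods, the comparison methods accept an `axis` argument when comparing a DataFrame with a Series.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, -2.0, 3.0],
                   'two': [0.5, 0.5, 0.5]},
                  index=['a', 'b', 'c'])

# Series vs Series: the result is a boolean Series with the same index.
mask = df['one'].gt(df['two'])

# The boolean Series selects the rows where it is True.
selected = df[mask]

# DataFrame vs Series with axis='index': every column is compared
# against the 'two' column, row by row.
ge_two = df.ge(df['two'], axis='index')

print(selected)
print(ge_two)
```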

**Boolean Reductions**

You can apply the reductions [`empty`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.empty.html#pandas.DataFrame.empty), [`any()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.any.html#pandas.DataFrame.any), [`all()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.all.html#pandas.DataFrame.all), and [`bool()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.bool.html#pandas.DataFrame.bool) to summarize a boolean result.

In [32]:
(df > 0).all()

one      False
two      False
three    False
dtype: bool

In [33]:
(df > 0).any()

one      True
two      True
three    True
dtype: bool

You can chain reductions down to a single boolean value:

In [34]:
(df > 0).any().any()

True

You can test whether a pandas object is empty via the `empty` property:

In [35]:
df.empty

False

In [36]:
pd.DataFrame(columns=list('ABC')).empty

True
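The reductions above run on random data. Here is a deterministic sketch (illustrative values, not from the original) of the same chain, plus one detail of `empty` worth knowing: it checks for a zero-length axis, so a frame that holds only NaN values is not considered empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, -2], 'y': [3, 4]})
positive = df > 0

per_column_all = positive.all()           # one bool per column
per_column_any = positive.any()
anything_positive = positive.any().any()  # chained down to a single bool

# `empty` is True only when an axis has length zero.
no_rows = pd.DataFrame(columns=['A', 'B', 'C']).empty
only_nan = pd.DataFrame({'A': [np.nan]}).empty  # has one element, so not empty
```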

To evaluate a single-element pandas object in a boolean context, use the `bool()` method:

In [37]:
pd.Series([True]).bool()

True

In [38]:
pd.Series([False]).bool()

False

In [39]:
pd.DataFrame([[True]]).bool()

True

In [40]:
pd.DataFrame([[False]]).bool()

False

**Warning** You might be tempted to do the following:
```
>>> if df:
...     pass
```
Or

```
>>> df and df2
```


Both of these raise an error, because Python asks for a single truth value from an object that holds multiple values, which is ambiguous:

`ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().`

See [gotchas](http://pandas.pydata.org/pandas-docs/version/0.20.3/gotchas.html#gotchas-truth) for a more detailed discussion.
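A minimal sketch, using a hypothetical two-row frame, that triggers the error and then states the intended meaning explicitly through one of the reductions from the previous section:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2]})

message = None
try:
    if df:          # Python calls bool(df), which pandas refuses to guess
        pass
except ValueError as exc:
    message = str(exc)

# Explicit alternatives that say which "truth" is meant:
has_rows = not df.empty                   # "is there any data at all?"
all_positive = bool((df['x'] > 0).all())  # "is every value positive?"
any_large = bool((df['x'] > 1).any())     # "is at least one value > 1?"
```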

**Comparing if objects are equivalent**

Often you may find there is more than one way to compute the same result. As a simple example, consider `df+df` and `df*2`. To test that these two computations produce the same result, given the tools shown above, you might imagine using `(df+df == df*2).all()`. But in fact, this expression does not return True for every column:

In [41]:
df+df == df*2

     one   two  three
a   True  True  False
b   True  True   True
c   True  True   True
d  False  True   True


In [42]:
(df+df == df*2).all()

one      False
two       True
three    False
dtype: bool

Notice that the boolean DataFrame `df+df == df*2` contains some False values! That is because NaN does not compare equal to anything, including another NaN:

In [43]:
np.nan == np.nan

False

So, as of v0.13.1, NDFrames (such as Series, DataFrames, and Panels) have an [`equals()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.equals.html#pandas.DataFrame.equals) method for testing equality, with NaNs in corresponding locations treated as equal.

In [44]:
(df+df).equals(df*2)

True
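A minimal sketch on fixed Series (illustrative values, not from the original) that contrasts element-wise `==` with `equals()`. It also shows that `equals()` compares dtypes, so an integer Series and a float Series with the same numbers are not considered equal.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])

elementwise = (a == b)    # NaN == NaN is False, so position 1 is False
print(elementwise.all())  # not all True
print(a.equals(b))        # NaNs in the same position count as equal

# Same numbers, different dtypes (int64 vs float64): not equal.
int_vs_float = pd.Series([1, 2]).equals(pd.Series([1.0, 2.0]))
```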

Note that the Series or DataFrame index needs to be in the same order for `equals()` to return True:

In [45]:
df1 = pd.DataFrame({'col':['foo', 0, np.nan]})

df2 = pd.DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0])

In [46]:
df1

   col
0  foo
1    0
2  NaN


In [47]:
df2

   col
2  NaN
1    0
0  foo


In [48]:
df1.equals(df2)

False

In [49]:
df1.equals(df2.sort_index())

True

**Comparing array-like objects**

You can conveniently do element-wise comparisons when comparing a pandas data structure with a scalar value:

In [50]:
pd.Series(['foo', 'bar', 'baz']) == 'foo'

0     True
1    False
2    False
dtype: bool

In [51]:
pd.Index(['foo', 'bar', 'baz']) == 'foo'

array([ True, False, False])

Pandas also handles element-wise comparisons between different array-like objects of the same length:

In [52]:
pd.Series(['foo', 'bar', 'baz']) == pd.Index(['foo', 'bar', 'qux'])

0     True
1     True
2    False
dtype: bool

In [53]:
pd.Series(['foo', 'bar', 'baz']) == np.array(['foo', 'bar', 'qux'])

0     True
1     True
2    False
dtype: bool

Trying to compare `Index` or `Series` objects of different lengths will raise a `ValueError`:

In [54]:
pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo', 'bar'])

ValueError: Can only compare identically-labeled Series objects

In [55]:
pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo'])

ValueError: Can only compare identically-labeled Series objects
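If label-based matching is what you actually want, the flexible comparison methods introduced earlier align the two indexes before comparing instead of raising. A minimal sketch with illustrative numeric Series (not from the original):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([1, 2])

# The == operator requires identically labeled Series and raises otherwise.
try:
    _ = s1 == s2
    raised = False
except ValueError:
    raised = True

# eq() aligns on the union of the indexes; label 2 has no partner in s2,
# so it is compared against NaN and yields False.
aligned = s1.eq(s2)
print(aligned)
```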

Note that this is different from the NumPy behavior, where a comparison can be broadcast:

In [56]:
np.array([1, 2, 3]) == np.array([2])

array([False,  True, False])

or, in the NumPy version used here, the comparison returns False (with a warning) if broadcasting cannot be done; newer NumPy releases raise an error instead:

In [57]:
np.array([1, 2, 3]) == np.array([1, 2])

False

**Combining overlapping data sets**

A problem that occasionally arises is combining two similar data sets where values in one are preferred over the other. An example would be two data series representing a particular economic indicator, where one is considered to be of “higher quality”. However, the lower-quality series might extend further back in history or have more complete data coverage. As such, we would like to combine two DataFrame objects where missing values in one DataFrame are conditionally filled with like-labeled values from the other DataFrame. The function implementing this operation is [`combine_first()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.combine_first.html#pandas.DataFrame.combine_first), which we illustrate:

In [58]:
df1 = pd.DataFrame({'A' : [1., np.nan, 3., 5., np.nan],
                    'B' : [np.nan, 2., 3., np.nan, 6.]})

df2 = pd.DataFrame({'A' : [5., 2., 4., np.nan, 3., 7.],
                    'B' : [np.nan, np.nan, 3., 4., 6., 8.]})

In [59]:
df1

     A    B
0  1.0  NaN
1  NaN  2.0
2  3.0  3.0
3  5.0  NaN
4  NaN  6.0


In [60]:
df2

     A    B
0  5.0  NaN
1  2.0  NaN
2  4.0  3.0
3  NaN  4.0
4  3.0  6.0
5  7.0  8.0


In [61]:
df1.combine_first(df2)

     A    B
0  1.0  NaN
1  2.0  2.0
2  3.0  3.0
3  5.0  4.0
4  3.0  6.0
5  7.0  8.0


In [62]:
df2.combine_first(df1)

     A    B
0  5.0  NaN
1  2.0  2.0
2  4.0  3.0
3  5.0  4.0
4  3.0  6.0
5  7.0  8.0
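To spell out what `combine_first()` does, here is a sketch (our own reconstruction, not the library's implementation) that reproduces `df1.combine_first(df2)` by hand for the frames above. It first aligns both frames on the union of their row labels, then keeps `df1`'s value wherever it is present and takes `df2`'s value otherwise.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1., np.nan, 3., 5., np.nan],
                    'B': [np.nan, 2., 3., np.nan, 6.]})
df2 = pd.DataFrame({'A': [5., 2., 4., np.nan, 3., 7.],
                    'B': [np.nan, np.nan, 3., 4., 6., 8.]})

# Align both frames on the union of their row labels.
labels = df1.index.union(df2.index)
left = df1.reindex(labels)
right = df2.reindex(labels)

# Prefer df1's value; fall back to df2's where df1 is missing.
manual = left.where(left.notna(), right)

result = df1.combine_first(df2)
```

Row 0, column `B` stays NaN because neither frame has a value there.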


**General DataFrame Combine**

The [`combine_first()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.combine_first.html#pandas.DataFrame.combine_first) method above calls the more general DataFrame method [`combine()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.combine.html#pandas.DataFrame.combine). This method takes another DataFrame and a combiner function, aligns the input DataFrames, and then passes pairs of Series (i.e., columns whose names are the same) to the combiner function.

So, for instance, to reproduce [`combine_first()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.combine_first.html#pandas.DataFrame.combine_first) as above:

In [64]:
combiner = lambda x, y: np.where(pd.isnull(x), y, x)

combiner

<function __main__.<lambda>(x, y)>

In [65]:
df1.combine(df2, combiner)

     A    B
0  1.0  NaN
1  2.0  2.0
2  3.0  3.0
3  5.0  4.0
4  3.0  6.0
5  7.0  8.0
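The combiner does not have to deal with missing values. As a sketch with a hypothetical `take_larger` function and illustrative integer frames (not from the original), the same `combine()` call can keep the element-wise larger value of each column pair:

```python
import pandas as pd

df_a = pd.DataFrame({'A': [1, 5, 3], 'B': [4, 2, 6]})
df_b = pd.DataFrame({'A': [2, 2, 2], 'B': [3, 3, 3]})

def take_larger(s1, s2):
    # Hypothetical combiner: called once per column name with two aligned
    # Series; keeps s1 where it is larger, otherwise takes s2.
    return s1.where(s1 > s2, s2)

result = df_a.combine(df_b, take_larger)
print(result)
```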
