Iteration
====

The behavior of basic iteration over pandas objects depends on the type. When iterating over a Series, it is regarded as array-like, and basic iteration produces the values. Other data structures, like DataFrame and Panel, follow the dict-like convention of iterating over the “keys” of the objects.

In short, basic iteration (`for i in object`) produces:

- **Series**: values
- **DataFrame**: column labels
- **Panel**: item labels

Thus, for example, iterating over a DataFrame gives you the column names:

In [2]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
                  index=['a', 'b', 'c'])

df

       col1      col2
a  0.510256 -0.621711
b  1.598638 -0.574800
c  1.244569  0.931037


In [3]:
for col in df:
    print(col)

col1
col2
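
For contrast, the following minimal sketch puts Series and DataFrame iteration side by side. The data and the variable names (`s`, `frame`, `series_items`, `frame_items`) are made up for illustration and do not come from the text above.

```python
import pandas as pd

# Hypothetical example data, not taken from the surrounding text.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
frame = pd.DataFrame([[1, 3], [2, 4]], columns=['col1', 'col2'])

# Iterating over a Series yields its values, not its index labels.
series_items = [value for value in s]

# Iterating over a DataFrame yields its column labels.
frame_items = [label for label in frame]

print(series_items)
print(frame_items)   # ['col1', 'col2']
```

To loop over the index labels of a Series instead, iterate over `s.index`.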


Pandas objects also have the dict-like [`iteritems()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.iteritems.html#pandas.DataFrame.iteritems) method to iterate over the (key, value) pairs.

To iterate over the rows of a DataFrame, you can use the following methods:

- [`iterrows()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows): Iterate over the rows of a DataFrame as (index, Series) pairs. This converts the rows to Series objects, which can change the dtypes and has some performance implications.
- [`itertuples()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples): Iterate over the rows of a DataFrame as namedtuples of the values. This is a lot faster than [`iterrows()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows), and is in most cases preferable to use to iterate over the values of a DataFrame.

**Warning**

Iterating through pandas objects is generally **slow**. In many cases, iterating manually over the rows is not needed and can be avoided with one of the following approaches:

- Look for a *vectorized* solution: many operations can be performed using built-in methods, NumPy functions, or (boolean) indexing.
- When you have a function that cannot work on the full DataFrame/Series at once, it is better to use [`apply()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply) instead of iterating over the values. See the docs on [function application](http://pandas.pydata.org/pandas-docs/version/0.20.3/basics.html#basics-apply).
- If you need to do iterative manipulations on the values but performance is important, consider writing the inner loop using e.g. Cython or Numba. See the [enhancing performance](http://pandas.pydata.org/pandas-docs/version/0.20.3/enhancingperf.html#enhancingperf) section for some examples of this approach.
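
The first two alternatives can be sketched as follows. This is an illustrative example only: the `prices` frame, its columns, and the `describe` helper are hypothetical names introduced here.

```python
import pandas as pd

# Hypothetical data for illustration.
prices = pd.DataFrame({'price': [10.0, 25.0, 40.0], 'qty': [3, 1, 2]})

# Explicit loop over the rows: works, but is usually the slowest option.
loop_total = [row.price * row.qty for row in prices.itertuples()]

# Vectorized arithmetic operates on whole columns at once.
vector_total = prices['price'] * prices['qty']

# Boolean indexing selects rows without writing a loop.
expensive = prices[prices['price'] > 20]

# apply() with axis=1 calls a function once per row; it is a fallback
# for logic that has no obvious vectorized form.
def describe(row):
    return 'bulk' if row['qty'] > 1 else 'single'

labels = prices.apply(describe, axis=1)
```

All three non-loop forms return pandas objects aligned on the original index, so their results can be assigned straight back as new columns.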

**Warning**

You should **never modify** something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect!

For example, in the following case setting the value has no effect:

In [4]:
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
df

   a  b
0  1  a
1  2  b
2  3  c


In [5]:
for index, row in df.iterrows():
    row['a'] = 10

In [6]:
df

   a  b
0  1  a
1  2  b
2  3  c
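
If the goal is to change the stored values, one option is to assign through the DataFrame itself rather than through the row yielded by the iterator. The sketch below is one possible approach, not the only one; `df_demo` is a hypothetical copy of the frame used above.

```python
import pandas as pd

# Hypothetical frame with the same contents as the example above.
df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})

# Assigning to a column updates every row in one step, no loop needed.
df_demo['a'] = 10

# .loc with a boolean mask updates only the rows that match a condition.
df_demo.loc[df_demo['b'] == 'b', 'a'] = 99

print(df_demo['a'].tolist())  # [10, 99, 10]
```

Both assignments write to `df_demo` directly, so there is no intermediate copy for the change to get lost in.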


**iteritems**

Consistent with the dict-like interface, [`iteritems()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.iteritems.html#pandas.DataFrame.iteritems) iterates through key-value pairs:

- **Series**: (index, scalar value) pairs
- **DataFrame**: (column, Series) pairs
- **Panel**: (item, DataFrame) pairs

For example:

In [7]:
wp = pd.Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
              major_axis=pd.date_range('1/1/2000', periods=5),
              minor_axis=['A', 'B', 'C', 'D'])

wp

Panel is deprecated and will be removed in a future version.
The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method
Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/.
Pandas provides a `.to_xarray()` method to help automate this conversion.


<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-05 00:00:00
Minor_axis axis: A to D

In [8]:
for item, frame in wp.iteritems():
    print(item)
    print(frame)

Item1
                   A         B         C         D
2000-01-01  0.972171 -1.700848  0.105587 -0.955621
2000-01-02 -1.417761  0.687126  1.023519  0.789865
2000-01-03 -0.316405  0.174491  0.573109  0.552623
2000-01-04 -1.708996  0.257747 -0.255739  0.885072
2000-01-05  0.393431 -0.235166  0.967107 -1.576310
Item2
                   A         B         C         D
2000-01-01  3.297379  0.261529  1.295011  0.150658
2000-01-02 -1.350281  0.357317 -0.481826  1.300997
2000-01-03 -0.445518 -0.299142  1.079205  0.898727
2000-01-04  0.396400 -0.593469  1.041679  0.675742
2000-01-05  0.376862 -0.005547  0.642474  2.700365
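
Because Panel is deprecated (see the warning printed above), the same key-value pattern is sketched here for Series and DataFrame only. Note that the sketch calls `items()` rather than `iteritems()`: `items()` yields the same pairs, and `iteritems()` was removed in pandas 2.0, so `items()` is the spelling that runs on both old and new versions. The data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series([1.5, 2.5], index=['x', 'y'])
frame = pd.DataFrame([[1, 3], [2, 4]], columns=['a', 'b'])

# Series: (index label, scalar value) pairs.
series_pairs = list(s.items())

# DataFrame: (column label, Series) pairs; each column is converted
# to a plain list here only to make the result easy to inspect.
column_pairs = [(name, column.tolist()) for name, column in frame.items()]

print(series_pairs)  # [('x', 1.5), ('y', 2.5)]
print(column_pairs)  # [('a', [1, 2]), ('b', [3, 4])]
```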


**iterrows**

[`iterrows()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows) allows you to iterate through the rows of a DataFrame as Series objects. It returns an iterator yielding each index value along with a Series containing the data in each row:

In [9]:
for row_index, row in df.iterrows():
    print('%s\n%s' % (row_index, row))

0
a    1
b    a
Name: 0, dtype: object
1
a    2
b    b
Name: 1, dtype: object
2
a    3
b    c
Name: 2, dtype: object


**Note**

Because [`iterrows()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows) returns a Series for each row, it does **not** preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

In [11]:
df_orig = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

df_orig.dtypes

int        int64
float    float64
dtype: object

In [13]:
row = next(df_orig.iterrows())[1]

row

int      1.0
float    1.5
Name: 0, dtype: float64

All values in `row`, returned as a Series, are now upcast to floats, including the original integer value in column `int`:

In [14]:
row['int'].dtype

dtype('float64')

In [15]:
df_orig['int'].dtype

dtype('int64')

To preserve dtypes while iterating over the rows, it is better to use [`itertuples()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples), which returns namedtuples of the values and is generally much faster than `iterrows()`.
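
The difference can be checked directly. The following sketch uses a hypothetical two-column frame (`mixed`, with columns `n` and `x`) rather than `df_orig`, and compares what each method yields for the first row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one float column.
mixed = pd.DataFrame([[1, 1.5], [2, 2.5]], columns=['n', 'x'])
mixed['n'] = mixed['n'].astype('int64')

# iterrows(): each row is a Series with a single dtype, so the
# integer value in 'n' is upcast to float.
_, first_series = next(mixed.iterrows())
series_dtype = str(first_series.dtype)
n_from_series_is_float = isinstance(first_series['n'], float)

# itertuples(): values are returned one by one, so 'n' stays an integer.
first_tuple = next(mixed.itertuples())
n_from_tuple_is_int = isinstance(first_tuple.n, (int, np.integer))

print(series_dtype)            # float64
print(n_from_series_is_float)  # True
print(n_from_tuple_is_int)     # True
```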

For instance, a contrived way to transpose the DataFrame would be:

In [17]:
df2 = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

df2

   x  y
0  1  4
1  2  5
2  3  6


In [18]:
df2.T

   0  1  2
x  1  2  3
y  4  5  6


In [19]:
df2_t = pd.DataFrame(dict((idx,values) for idx, values in df2.iterrows()))

In [20]:
df2_t

   0  1  2
x  1  2  3
y  4  5  6


**itertuples**

The [`itertuples()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples) method will return an iterator yielding a namedtuple for each row in the DataFrame. The first element of the tuple will be the row’s corresponding index value, while the remaining values are the row values.

For instance,

In [21]:
for row in df.itertuples():
    print(row)

Pandas(Index=0, a=1, b='a')
Pandas(Index=1, a=2, b='b')
Pandas(Index=2, a=3, b='c')


This method does not convert the row to a Series object but just returns the values inside a namedtuple. Therefore, [`itertuples()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples) preserves the data type of the values and is generally faster than [`iterrows()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows).
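
The yielded namedtuples support both attribute and positional access, and the `index` and `name` parameters of `itertuples()` control their shape. A small sketch, using a hypothetical `people` frame:

```python
import pandas as pd

# Hypothetical example data with a non-default index.
people = pd.DataFrame(
    [['Ann', 31], ['Bob', 45]],
    columns=['name', 'age'],
    index=['r1', 'r2'],
)

# name= sets the namedtuple class name (the default is 'Pandas').
rows = list(people.itertuples(name='Person'))
first = rows[0]

# Fields can be read by attribute or by position; position 0 is the index.
first_label = first.Index
first_age = first.age
first_name_by_position = first[1]

# index=False drops the index field; name=None yields regular tuples.
plain = list(people.itertuples(index=False, name=None))

print(first)  # Person(Index='r1', name='Ann', age=31)
print(plain)  # [('Ann', 31), ('Bob', 45)]
```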

**Note**

Column names are replaced with positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.
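
The renaming can be seen by inspecting the `_fields` attribute of a yielded namedtuple. In this sketch the frame `odd` and its column names are hypothetical; a positional name has the form `_N`, where `N` is the field's position in the tuple (position 0 is the index).

```python
import pandas as pd

# Hypothetical frame: one valid column name, one containing a space,
# and one starting with an underscore.
odd = pd.DataFrame([[1, 2, 3]], columns=['ok', 'my col', '_hidden'])

row = next(odd.itertuples())
fields = row._fields

# The two unusable names are replaced by their positions in the tuple.
print(fields)   # ('Index', 'ok', '_2', '_3')
print(row._2)   # 2
```

When column names are not known to be valid identifiers, positional access (`row[2]`) avoids depending on the generated names.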
