Link to Medium blog post: https://towardsdatascience.com/how-to-iterate-over-rows-in-a-pandas-dataframe-6aa173fc6c84

### Do you really need to iterate over rows?

As highlighted in the official pandas documentation, iterating through DataFrames is very inefficient and can usually be avoided. pandas newcomers are often unfamiliar with the concept of vectorisation and unaware that most operations in pandas should (and can) be performed in a non-iterative context.
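To make the idea of vectorisation concrete, here is a minimal sketch comparing an explicit row-by-row loop with the equivalent vectorised expression. The `prices` DataFrame and its `price` and `qty` columns are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data (not part of the article's example)
prices = pd.DataFrame({
    'price': [10.0, 20.0, 30.0],
    'qty': [1, 2, 3],
})

# Iterative approach: one Python-level multiplication per row
totals_loop = []
for _, row in prices.iterrows():
    totals_loop.append(float(row['price'] * row['qty']))

# Vectorised approach: a single operation over whole columns,
# executed in optimised NumPy-backed code
totals_vec = prices['price'] * prices['qty']

print(totals_loop)          # [10.0, 40.0, 90.0]
print(totals_vec.tolist())  # [10.0, 40.0, 90.0]
```

Both approaches produce the same values, but the vectorised version avoids the per-row Python overhead, which is what makes it scale to large DataFrames.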

Before attempting to iterate through pandas objects, first make sure that none of the options below suits your use case:

- Vectorisation over iteration: pandas comes with a rich set of built-in methods whose performance is optimised, and most operations can be performed using one of them. You can also check whether any of numpy’s functions can be used in your context.
- Applying a function to rows: A common requirement is to apply, to every row, a function that is designed to work on one row at a time rather than on the full DataFrame or Series. In such cases, it’s generally best to use the apply() method instead of iterating through the pandas object. For more details, you can refer to this section of the pandas documentation, which explains how to apply your own or another library’s functions to pandas objects.
- Iterative manipulations: If you need to perform iterative manipulations and performance is a concern, take a look at Cython or Numba. For more details on these tools, you can read this section of the pandas documentation.
- Printing a DataFrame: If you want to print out a DataFrame, simply use the DataFrame.to_string() method to render it as console-friendly tabular output.
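The first, second and fourth options can be sketched on a small DataFrame like the one used later in this post. The extra columns `colD` and `colE` and the `describe()` function are hypothetical, used only to illustrate the calls.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3, 4, 5],
    'colB': ['a', 'b', 'c', 'd', 'e'],
    'colC': [True, True, False, True, False],
})

# Vectorisation with a NumPy function: choose a value for every row
# based on a boolean column, without writing an explicit loop
df['colD'] = np.where(df['colC'], df['colA'] * 10, 0)

# Hypothetical function designed to work on a single row
def describe(row):
    return f"{row['colB']}{row['colA']}"

# apply() with axis=1 passes each row (as a Series) to the function
df['colE'] = df.apply(describe, axis=1)

# Render the whole DataFrame as a console-friendly table
print(df.to_string())
```

Note that apply(axis=1) still calls the Python function once per row, so it is more concise than a manual loop but not vectorised; prefer a built-in or NumPy operation when one exists.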

### Iterating over the rows of a DataFrame

If none of the options above works for you, you may still want to iterate through pandas objects. You can do so using either of the built-in iterrows() or itertuples() methods.

Before seeing both methods in action, let’s create an example DataFrame to iterate over.

In [1]:
import pandas as pd

df = pd.DataFrame({
    'colA': [1, 2, 3, 4, 5],
    'colB': ['a', 'b', 'c', 'd', 'e'],
    'colC': [True, True, False, True, False],
})
print(df)

   colA colB   colC
0     1    a   True
1     2    b   True
2     3    c  False
3     4    d   True
4     5    e  False


pandas.DataFrame.iterrows() iterates over DataFrame rows as (index, Series) pairs. Because each row is converted into a Series, this method does not preserve dtypes across rows (dtypes are only preserved across columns). If you need to preserve the dtypes of the pandas object, use itertuples() instead.

In [2]:
for index, row in df.iterrows():
    print(row['colA'], row['colB'], row['colC'])

1 a True
2 b True
3 c False
4 d True
5 e False
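The dtype caveat of iterrows() is easy to see on a frame that mixes an integer and a float column. The `nums` DataFrame and its column names are hypothetical; the sketch compares the row Series returned by iterrows() with the tuple returned by itertuples().

```python
import pandas as pd

# Hypothetical frame with one integer and one float column
nums = pd.DataFrame([[1, 1.5]], columns=['int_col', 'float_col'])

# iterrows() packs the row into a single Series with one dtype,
# so the integer value is upcast to float
_, row = next(nums.iterrows())
print(nums['int_col'].dtype)  # int64
print(row.dtype)              # float64
print(row['int_col'])         # 1.0

# itertuples() returns each column's value without a common upcast
tup = next(nums.itertuples(index=False))
print(tup.int_col)            # 1
```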


pandas.DataFrame.itertuples() iterates over DataFrame rows as namedtuples. In general, itertuples() is faster than iterrows().


In [3]:
for row in df.itertuples():
    print(row.colA, row.colB, row.colC)

1 a True
2 b True
3 c False
4 d True
5 e False
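A couple of details about the namedtuples are worth knowing: the index is included as the first field, named `Index`, and column names that are not valid Python identifiers are replaced with positional names. The `odd` DataFrame below is a hypothetical example built to show this.

```python
import pandas as pd

# Hypothetical columns: one contains a space, one starts with a digit
odd = pd.DataFrame({'col A': [1, 2], '1st': ['x', 'y']})

for row in odd.itertuples():
    # 'Index' comes first; the invalid identifiers become _1 and _2
    print(row._fields, row.Index, row._1, row._2)

# index=False drops the index field; name=None yields plain tuples
for row in odd.itertuples(index=False, name=None):
    print(row)
```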


### Modifying while iterating over rows

At this point, it’s important to highlight that you should never modify a pandas DataFrame or Series you are iterating over. Depending on the data types of your pandas object, the iterator may return a copy of the object rather than a view, in which case writing to that copy won’t have the desired effect.

For instance, let’s suppose we want to double the value of colA in every row. An iterative approach won’t do the trick:

In [4]:
for index, row in df.iterrows():
    # row is a copy, so this assignment never reaches df
    row['colA'] = row['colA'] * 2
print(df)

   colA colB   colC
0     1    a   True
1     2    b   True
2     3    c  False
3     4    d   True
4     5    e  False


In use cases like this, you should use the apply() method instead.

In [5]:
df['colA'] = df['colA'].apply(lambda x: x * 2)
print(df)


   colA colB   colC
0     2    a   True
1     4    b   True
2     6    c  False
3     8    d   True
4    10    e  False
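For a simple arithmetic change like this one, a vectorised expression does the same job without calling a function per row. If a loop is genuinely unavoidable, one option (an assumption on our part, not a pattern from the pandas documentation) is to read the rows, collect the results in a list, and assign the column only after the loop has finished, so the DataFrame is never modified while it is being iterated over. The sketch below starts again from the original example DataFrame.

```python
import pandas as pd

# The original example DataFrame, recreated so the sketch is self-contained
df = pd.DataFrame({
    'colA': [1, 2, 3, 4, 5],
    'colB': ['a', 'b', 'c', 'd', 'e'],
    'colC': [True, True, False, True, False],
})

# Vectorised: the whole column is multiplied in one operation
vectorised = df['colA'] * 2

# Loop fallback: read rows with itertuples(), collect the new values,
# and write them back only once the iteration is complete
doubled = [row.colA * 2 for row in df.itertuples()]
print(vectorised.tolist() == doubled)  # True

df['colA'] = doubled
print(df)
```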
