In [1]:
import numpy as np
import pandas as pd

- items()      --> iterates over **columns**, like a dictionary
- iterrows()   --> iterates over **rows**; does not preserve dtypes across a row
- itertuples() --> iterates over **rows**; faster than iterrows(); returns a **namedtuple**; preserves dtypes


### iteration on Series

In [3]:
s = pd.Series(np.random.randn(5))
s

0   -0.686844
1   -0.869235
2   -0.492136
3   -0.499241
4    0.594399
dtype: float64

In [4]:
for i in s :
    print(i)

-0.6868443204149748
-0.8692350769605786
-0.4921364952704538
-0.4992410637393864
0.5943986093123964


Iterating over a Series yields its elements (the values, without the index labels).
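When the index labels are needed as well, a Series also offers items(). The sketch below uses a small hypothetical Series (the name `prices` and its labels are made up for illustration) to contrast plain iteration with items():

```python
import pandas as pd

# Hypothetical labelled Series, used only for illustration
prices = pd.Series([1.5, 2.0, 0.25], index=["x", "y", "z"])

# Plain iteration yields the values only
values = [v for v in prices]

# Series.items() yields (index label, value) pairs
pairs = [(label, v) for label, v in prices.items()]

print(values)
print(pairs)
```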

### iteration on DataFrame

In [5]:
df = pd.DataFrame(
    {
        'a': [1, 2, 3],
        'b': ['a', 'b', 'c'],
        'c': [1.2, 2.55, 0.005]
    }
)
df

   a  b      c
0  1  a  1.200
1  2  b  2.550
2  3  c  0.005


In [6]:
for i in df :
    print(i)

a
b
c


Iterating over a DataFrame yields the column names.

**Warning**

Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided with one of the following approaches:

- Look for a **vectorized** solution: many operations can be performed using **built-in methods or NumPy functions**, (boolean) indexing, …
- When you have a function that cannot work on the full DataFrame/Series at once, it is better to use **apply()** instead of iterating over the values.
- If you need to do iterative manipulations on the values but performance is important, consider writing the inner loop with Cython or Numba. See the **enhancing performance section of the pandas docs** for some examples of this approach.
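The first two approaches can be sketched as follows. This is a minimal illustration on a hypothetical frame (`frame` and the helper `describe` are made-up names, not part of pandas):

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for illustration
frame = pd.DataFrame({"x": [1.0, 4.0, 9.0], "y": [10, 20, 30]})

# Row-by-row loop: works, but tends to be slow on large frames
loop_result = [row.x ** 0.5 + row.y for row in frame.itertuples()]

# Vectorized: a NumPy function and arithmetic applied to whole columns
vectorized = np.sqrt(frame["x"]) + frame["y"]

# Boolean indexing instead of an if-statement inside a loop
large_y = frame[frame["y"] > 15]

# apply() for a function that only handles one value at a time
def describe(value):  # hypothetical helper
    return "big" if value > 15 else "small"

labels = frame["y"].apply(describe)
```

The loop and the vectorized expression compute the same numbers; the vectorized form simply lets pandas and NumPy do the iteration internally.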

**Warning**

You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect!

For example, in the following case setting the value has no effect:

In [19]:
for index, value in df.iterrows():
    # value is a copy of the row, so this does not touch df
    value['a'] += 1
print(df)

   a  b      c
0  1  a  1.200
1  2  b  2.550
2  3  c  0.005


As you can see, the DataFrame hasn't changed: during the iteration, pandas gave us a **copy** of each row, not a view into the actual DataFrame.

Note: To update a dataframe in pandas while iterating row by row, you can use **df.at**:

In [27]:
for index, value in df.iterrows() :
    df.at[index, 'a'] += 1
print(df)

   a  b      c
0  2  a  1.200
1  3  b  2.550
2  4  c  0.005


This time the DataFrame has changed, because df.at writes to the DataFrame itself rather than to the row copy.

### items()

Consistent with the **dict-like** interface, items() iterates through key-value pairs: it iterates over the **columns**, yielding each column label together with the column as a Series.

In [44]:
for column, value in df.items() :
    print(f"---- {column:^2}----")
    print(value, end='\n\n')

---- a ----
0    2
1    3
2    4
Name: a, dtype: int64

---- b ----
0    a
1    b
2    c
Name: b, dtype: object

---- c ----
0    1.200
1    2.550
2    0.005
Name: c, dtype: float64
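Because each value yielded by items() is a full column, per-column properties can be collected in a single pass. A minimal sketch, rebuilding a frame shaped like the one above under the assumed name `frame`:

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 2, 3], "b": ["a", "b", "c"], "c": [1.2, 2.55, 0.005]}
)

# Each value yielded by items() is a whole column (a Series),
# so per-column attributes such as dtype are available
dtypes = {column: series.dtype for column, series in frame.items()}

# Collect the labels of the numeric columns only
numeric_columns = [
    column
    for column, series in frame.items()
    if pd.api.types.is_numeric_dtype(series)
]

print(dtypes)
print(numeric_columns)
```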



### iterrows

In [46]:
for index, value in df.iterrows():
    print(f"---- {index:^2}----")
    print(value, end='\n\n')

---- 0 ----
a      2
b      a
c    1.2
Name: 0, dtype: object

---- 1 ----
a       3
b       b
c    2.55
Name: 1, dtype: object

---- 2 ----
a        4
b        c
c    0.005
Name: 2, dtype: object



**Note**: Because iterrows() returns a Series for each row, **it does not preserve dtypes across the rows** (dtypes are preserved across columns for DataFrames).
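The dtype loss is easiest to see on a frame with one integer and one float column. The sketch below uses a hypothetical frame named `mixed`; the integer in the first row comes back as a float because the row Series needs a single common dtype:

```python
import pandas as pd

# Hypothetical frame with one integer and one float column
mixed = pd.DataFrame({"count": [1, 2], "ratio": [1.5, 2.5]})

# First row as yielded by iterrows(): a single Series holding both values
_, first_row = next(mixed.iterrows())

column_dtype = mixed["count"].dtype  # integer dtype of the column
row_dtype = first_row.dtype          # common (float) dtype of the row Series

print(column_dtype, row_dtype)
print(first_row["count"])            # the integer has been upcast to a float
```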

To preserve dtypes while iterating over the rows, it is better to use **itertuples()**, which returns **namedtuples** of the values and is generally much **faster** than iterrows().

As an illustration of iterrows(), a contrived way to transpose the DataFrame would be to build a new DataFrame from its (index, row) pairs:

In [47]:
df

   a  b      c
0  2  a  1.200
1  3  b  2.550
2  4  c  0.005


In [48]:
df_T = df.T
df_T

     0     1      2
a    2     3      4
b    a     b      c
c  1.2  2.55  0.005


In [56]:
pd.DataFrame({key: value for key, value in df.iterrows()})

     0     1      2
a    2     3      4
b    a     b      c
c  1.2  2.55  0.005


### itertuples()

The itertuples() method will return an iterator yielding a **namedtuple** for each row in the DataFrame. The first element of the tuple will be the row's corresponding index value, while the remaining values are the row values.

In [60]:
for value in df.itertuples() :
    print(value)

Pandas(Index=0, a=2, b='a', c=1.2)
Pandas(Index=1, a=3, b='b', c=2.55)
Pandas(Index=2, a=4, b='c', c=0.005)
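The fields of each namedtuple can be read by attribute or by position, and itertuples() accepts `index` and `name` parameters to control the shape of the result. A minimal sketch, rebuilding the frame above under the assumed name `frame` ("Record" is an arbitrary type name chosen for this example):

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [2, 3, 4], "b": ["a", "b", "c"], "c": [1.2, 2.55, 0.005]}
)

# Fields can be read by attribute name or by position
for row in frame.itertuples():
    print(row.Index, row.a, row[2])

# index=False drops the index field; name sets the namedtuple's type name
records = list(frame.itertuples(index=False, name="Record"))

# name=None yields plain tuples instead of namedtuples
plain = list(frame.itertuples(index=False, name=None))
```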


**Warning**

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore.

In [67]:
tmp = pd.DataFrame(
    {'name': ['ali', 'kazem'],
     'family name': ['a', 'b'],
     'age': [1, 2]
    }
)
tmp.set_index('name', inplace=True)
tmp

      family name  age
name
ali             a    1
kazem           b    2


In [68]:
for i in tmp.itertuples() :
    print(i)

Pandas(Index='ali', _1='a', age=1)
Pandas(Index='kazem', _1='b', age=2)


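As you can see, the column name 'family name' is not a valid Python identifier (it contains a space), so in the namedtuple it has been replaced with the positional name _1. The DataFrame itself is unchanged.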
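There are two simple ways to cope with a renamed field, sketched below on a frame shaped like `tmp` (rebuilt here as `people`). Replacing spaces with underscores is just one possible convention, not something pandas does for you:

```python
import pandas as pd

people = pd.DataFrame(
    {"family name": ["a", "b"], "age": [1, 2]},
    index=pd.Index(["ali", "kazem"], name="name"),
)

# Option 1: read the renamed field by its positional name or by position
first = next(people.itertuples())
by_positional_name = first._1
by_position = first[1]

# Option 2: make the labels valid identifiers before iterating
# (replacing spaces with underscores is an assumed convention)
renamed = people.rename(columns=lambda label: label.replace(" ", "_"))
families = [row.family_name for row in renamed.itertuples()]

print(by_positional_name, by_position, families)
```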