# Chapter 19: Looping and Aggregation

- We may want to apply an operation to each item (row or column) in a dataframe.
- We can loop, use the ``.apply`` method, or use an aggregation method.

Three methods to loop over a dataframe:
- ``.items`` method (named ``.iteritems`` before pandas 2.0; deprecated in 1.5 and then removed) gives you a tuple with a column name and the column (a series)
- ``.iterrows`` method gives you a tuple with an index value and the row (converted into a series)
- ``.itertuples`` method gives you each row as a named tuple (with the index in position 0)
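Here is a minimal sketch of what each method yields. It uses a tiny hypothetical frame called `toy` rather than the presidents data. The column names are made up for illustration, and the name `toy` avoids overwriting the notebook's `df`.

```python
import pandas as pd

# Hypothetical toy data, indexed from 1 like the presidents data
toy = pd.DataFrame({'Party': ['Whig', 'Democratic'], 'Luck': [3, 7]},
                   index=[1, 2])

# .items(): yields (column name, column as a Series)
col_names = [col_name for col_name, col in toy.items()]

# .iterrows(): yields (index value, row as a Series)
row_labels = [idx for idx, row in toy.iterrows()]
row_parties = [row['Party'] for idx, row in toy.iterrows()]

# .itertuples(): yields one namedtuple per row, index in position 0
tuple_labels = [tup[0] for tup in toy.itertuples()]
tuple_luck = [tup.Luck for tup in toy.itertuples()]

print(col_names)     # ['Party', 'Luck']
print(row_labels)    # [1, 2]
print(tuple_labels)  # [1, 2]
```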

In [1]:
import pandas as pd
import numpy as np

url = 'https://github.com/mattharrison/datasets/raw/master/data/siena2018-pres.csv'
df = pd.read_csv(url, index_col=0)

In [2]:
def tweak_siena_pres(df):
    def int64_to_uint8(df_):
        cols = df_.select_dtypes('int64')
        return (df_
                .astype({col:'uint8' for col in cols}))


    return (df
     .rename(columns={'Seq.':'Seq'})    # 1 removes the period from the column name Seq.
     .rename(columns={k:v.replace(' ', '_') for k,v in
        {'Bg': 'Background',
         'PL': 'Party leadership', 'CAb': 'Communication ability',
         'RC': 'Relations with Congress', 'CAp': 'Court appointments',
         'HE': 'Handling of economy', 'L': 'Luck',
         'AC': 'Ability to compromise', 'WR': 'Willing to take risks',
         'EAp': 'Executive appointments', 'OA': 'Overall ability',
         'Im': 'Imagination', 'DA': 'Domestic accomplishments',
         'Int': 'Integrity', 'EAb': 'Executive ability',
         'FPA': 'Foreign policy accomplishments',
         'LA': 'Leadership ability',
         'IQ': 'Intelligence', 'AM': 'Avoid crucial mistakes',
         'EV': "Experts' view", 'O': 'Overall'}.items()})
     .astype({'Party':'category'})  # 2 sets the type of Party column to category
     .pipe(int64_to_uint8)  # 3 converts all the int64 columns to unsigned 8-bit columns
     .assign(Average_rank=lambda df_:(df_.select_dtypes('uint8') # 4 creates the Average_rank and Quartile columns
                 .sum(axis=1).rank(method='dense').astype('uint8')),
             Quartile=lambda df_:pd.qcut(df_.Average_rank, 4,
                 labels='1st 2nd 3rd 4th'.split())
            )
    )
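Step 4 calls ``rank(method='dense')``. With this method, tied totals get the same rank and the next rank is not skipped. The sketch below compares it with the default ``'average'`` method. The totals are hypothetical values, not the Siena data.

```python
import pandas as pd

# Hypothetical per-president score totals; two of them tie
totals = pd.Series([50, 70, 70, 90], index=[1, 2, 3, 4])

average = totals.rank()               # default method='average'
dense = totals.rank(method='dense')   # ties share a rank, no gaps

print(average.tolist())  # [1.0, 2.5, 2.5, 4.0]
print(dense.tolist())    # [1.0, 2.0, 2.0, 3.0]

# step 4 then casts to uint8, which is safe because dense ranks are small positive integers
dense_u8 = dense.astype('uint8')
```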

In [3]:
pres = tweak_siena_pres(df)
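Step 3 converts the ``int64`` score columns to ``uint8``. A ``uint8`` value takes 1 byte instead of 8, but it can only hold values from 0 to 255. The sketch below shows the memory saving on a hypothetical column of 44 scores (assumed, not read from the dataset). It also adds a range check, which is an extra safeguard and not part of ``tweak_siena_pres``.

```python
import numpy as np
import pandas as pd

scores = pd.Series(np.arange(1, 45), dtype='int64')  # 44 hypothetical scores
small = scores.astype('uint8')

print(scores.memory_usage(index=False))  # 352 (44 values x 8 bytes)
print(small.memory_usage(index=False))   # 44  (44 values x 1 byte)

# uint8 holds 0..255, so confirm the values fit before downcasting
info = np.iinfo(np.uint8)
assert scores.between(info.min, info.max).all()
```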

## 19.1 For Loops

In [7]:
# iteration over columns (.items replaces the deprecated .iteritems)
for col_name, col in pres.items():
    print(col_name, type(col))


Seq <class 'pandas.core.series.Series'>
President <class 'pandas.core.series.Series'>
Party <class 'pandas.core.series.Series'>
Background <class 'pandas.core.series.Series'>
Imagination <class 'pandas.core.series.Series'>
Integrity <class 'pandas.core.series.Series'>
Intelligence <class 'pandas.core.series.Series'>
Luck <class 'pandas.core.series.Series'>
Willing_to_take_risks <class 'pandas.core.series.Series'>
Ability_to_compromise <class 'pandas.core.series.Series'>
Executive_ability <class 'pandas.core.series.Series'>
Leadership_ability <class 'pandas.core.series.Series'>
Communication_ability <class 'pandas.core.series.Series'>
Overall_ability <class 'pandas.core.series.Series'>
Party_leadership <class 'pandas.core.series.Series'>
Relations_with_Congress <class 'pandas.core.series.Series'>
Court_appointments <class 'pandas.core.series.Series'>
Handling_of_economy <class 'pandas.core.series.Series'>
Executive_appointments <class 'pandas.core.series.Series'>
Domestic_accomplishment
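
A column loop like the one above can often be replaced by an aggregation method that works on every column at once. Here is a minimal sketch on hypothetical data, assuming we want each column's maximum.

```python
import pandas as pd

toy = pd.DataFrame({'Luck': [3, 7, 5], 'Integrity': [9, 2, 4]})

# Loop version: one Python-level step per column
loop_max = {}
for col_name, col in toy.items():
    loop_max[col_name] = col.max()

# Aggregation version: a single call returns a Series of column maxima
agg_max = toy.max()

# Both give Luck -> 7 and Integrity -> 9
assert loop_max == agg_max.to_dict()
```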



In [9]:
# iteration over rows
for idx, row in pres.iterrows():
    print(idx, type(row))

1 <class 'pandas.core.series.Series'>
2 <class 'pandas.core.series.Series'>
3 <class 'pandas.core.series.Series'>
4 <class 'pandas.core.series.Series'>
5 <class 'pandas.core.series.Series'>
6 <class 'pandas.core.series.Series'>
7 <class 'pandas.core.series.Series'>
8 <class 'pandas.core.series.Series'>
9 <class 'pandas.core.series.Series'>
10 <class 'pandas.core.series.Series'>
11 <class 'pandas.core.series.Series'>
12 <class 'pandas.core.series.Series'>
13 <class 'pandas.core.series.Series'>
14 <class 'pandas.core.series.Series'>
15 <class 'pandas.core.series.Series'>
16 <class 'pandas.core.series.Series'>
17 <class 'pandas.core.series.Series'>
18 <class 'pandas.core.series.Series'>
19 <class 'pandas.core.series.Series'>
20 <class 'pandas.core.series.Series'>
21 <class 'pandas.core.series.Series'>
22 <class 'pandas.core.series.Series'>
23 <class 'pandas.core.series.Series'>
24 <class 'pandas.core.series.Series'>
25 <class 'pandas.core.series.Series'>
26 <class 'pandas.core.series.Seri
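
``.iterrows`` turns each row into a series, so a row that mixes column types is coerced to one common dtype. The minimal sketch below uses hypothetical column names and shows an integer coming back as a float. ``.itertuples`` does not do this coercion.

```python
import pandas as pd

# Hypothetical frame: one integer column, one float column
toy = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5]})

idx, row = next(toy.iterrows())
print(row.dtype)              # float64: the int was upcast to share a dtype
print(row['int_col'])         # 1.0
print(toy['int_col'].dtype)   # int64: the column itself is unchanged

# .itertuples keeps each value in its column's type
tup = next(toy.itertuples())
print(tup.int_col, type(tup.int_col))
```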

In [11]:
# iteration over rows as namedtuple
for tup in pres.itertuples():
    print(tup[0], tup.Party)

1 Independent
2 Federalist
3 Democratic-Republican
4 Democratic-Republican
5 Democratic-Republican
6 Democratic-Republican
7 Democratic
8 Democratic
9 Whig
10 Independent
11 Democratic
12 Whig
13 Whig
14 Democratic
15 Democratic
16 Republican
17 Democratic
18 Republican
19 Republican
20 Republican
21 Republican
22 Democratic
23 Republican
24 Republican
25 Republican
26 Republican
27 Democratic
28 Republican
29 Republican
30 Republican
31 Democratic
32 Democratic
33 Republican
34 Democratic
35 Democratic
36 Republican
37 Republican
38 Democratic
39 Republican
40 Republican
41 Democratic
42 Republican
43 Democratic
44 Republican
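
``.itertuples`` has a naming caveat. A column name that is not a valid Python identifier cannot be a namedtuple field, so it is replaced by a positional name such as ``_1``. The ``Experts'_view`` column created by ``tweak_siena_pres`` has an apostrophe and would be affected. Here is a sketch with a hypothetical two-column frame:

```python
import pandas as pd

toy = pd.DataFrame({"Experts'_view": [10, 20], 'Luck': [3, 7]}, index=[1, 2])

tup = next(toy.itertuples())
print(tup._fields)     # ('Index', '_1', 'Luck')

# The renamed field is still reachable by attribute or by position
print(tup._1, tup[1])  # 10 10
```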


## 19.3 The .apply Method

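- As with the series ``.apply`` method, we should be wary of the dataframe ``.apply`` method: it calls a Python function once per row or column, which is slow.
- If dealing with numbers, check whether the operation can be done in a vectorized way instead.
- When we call ``.apply`` on a dataframe, the function receives a whole column (the default, ``axis='index'``) or a whole row (``axis='columns'``).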
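Here is a minimal sketch of the two ``axis`` choices on a hypothetical 2x2 frame. By default the function receives each column. With ``axis='columns'`` it receives each row.

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 5], 'b': [4, 2]}, index=['x', 'y'])

# default axis='index': the lambda sees column 'a', then column 'b'
per_column = toy.apply(lambda col: col.max() - col.min())

# axis='columns': the lambda sees row 'x', then row 'y'
per_row = toy.apply(lambda row: row.max() - row.min(), axis='columns')

# column ranges: a -> 4, b -> 2; row ranges: x -> 3, y -> 3
print(per_column)
print(per_row)
```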

In [12]:
# using the apply method
# with axis='columns', the function is called on each row
(pres
 .select_dtypes('number')
 .apply(lambda row: row.max() - row.min(), axis='columns'))

1     17
2     28
3     19
4     16
5     13
6     28
7     34
8     18
9     22
10    19
11    16
12    15
13     8
14     3
15     8
16    27
17    10
18    21
19    13
20    21
21    24
22    12
23     8
24    21
25    13
26    19
27    28
28    10
29    26
30    31
31    15
32    27
33    18
34    28
35    38
36    31
37    23
38    35
39    28
40    19
41    36
42    24
43    22
44    34
dtype: uint8

In [13]:
# using a better (vectorized) method
(pres
 .select_dtypes('number')
 .pipe(lambda df_: df_.max(axis='columns') - df_.min(axis='columns')))

1     17
2     28
3     19
4     16
5     13
6     28
7     34
8     18
9     22
10    19
11    16
12    15
13     8
14     3
15     8
16    27
17    10
18    21
19    13
20    21
21    24
22    12
23     8
24    21
25    13
26    19
27    28
28    10
29    26
30    31
31    15
32    27
33    18
34    28
35    38
36    31
37    23
38    35
39    28
40    19
41    36
42    24
43    22
44    34
dtype: uint8
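
A rough timing sketch shows why the vectorized version is preferred. It uses hypothetical random scores; the size (1,000 rows by 20 columns) and the seed are arbitrary assumptions. Both approaches produce the same values. Exact timings depend on the machine, but the vectorized version is typically much faster because it avoids one Python function call per row.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical scores in 1..44, stored as uint8 like the pres columns
scores = pd.DataFrame(rng.integers(1, 45, size=(1_000, 20)), dtype='uint8')

def range_with_apply(df):
    # one Python function call per row
    return df.apply(lambda row: row.max() - row.min(), axis='columns')

def range_vectorized(df):
    # row-wise max and min computed by vectorized reductions, then one subtraction
    return df.max(axis='columns') - df.min(axis='columns')

same = (range_with_apply(scores).to_numpy()
        == range_vectorized(scores).to_numpy()).all()
print('same values:', same)

for func in (range_with_apply, range_vectorized):
    seconds = timeit.timeit(lambda: func(scores), number=5)
    print(f'{func.__name__}: {seconds:.4f} s for 5 runs')
```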