[back](./13-dataframe-operations.ipynb)

---
## `DataFrame Comparisons and Iterations`

- [Comparing entire DataFrames](#comparing-entire-dataframes)
- [Comparing DataFrame rows and columns](#comparing-dataframe-rows-and-columns)
- [Iterating through DataFrames](#iterating-through-dataframes)


### `Initial Setup`

In [1]:
# Importing pandas and NumPy

import pandas as pd
import numpy as np


In [2]:
# Data set-up

df1 = pd.DataFrame({
    'col1': {'row1': 1, 'row2': 1, 'row3': 3},
    'col2': {'row1': 4, 'row3': 9, 'row4': 6}
})

df2 = pd.DataFrame({
    'col1': {'row1': 2, 'row2': 5, 'row3': 1},
    'col2': {'row1': 4, 'row3': 8, 'row4': 7}
})

def reset_df1():
  global df1
  df1 = pd.DataFrame({
      'col1': {'row1': 1, 'row2': 1, 'row3': 3},
      'col2': {'row1': 4, 'row3': 9, 'row4': 6}
  })
  print_df1()

def reset_df2():
  global df2
  df2 = pd.DataFrame({
      'col1': {'row1': 2, 'row2': 5, 'row3': 1},
      'col2': {'row1': 4, 'row3': 8, 'row4': 7}
  })
  print_df2()


def print_df1():
  print('Original DataFrame 1:')
  print(df1)
  divider()

def print_df2():
  print('Original DataFrame 2:')
  print(df2)
  divider()

def divider():
  print('-'*80)


print_df1()
print_df2()


Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Original DataFrame 2:
      col1  col2
row1   2.0   4.0
row2   5.0   NaN
row3   1.0   8.0
row4   NaN   7.0
--------------------------------------------------------------------------------


### `Comparing entire DataFrames`

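The simplest form of comparison uses the basic comparison operators: `<`, `>`, `==`, `!=`, `<=`, `>=`.

Much like with Series, these return a new DataFrame of `True` or `False` values, compared element by element. Any comparison involving `NaN` evaluates to `False`, except `!=`, which evaluates to `True`.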

In [3]:
print_df1()
print_df2()

print('Comparing if df1 > df2:')
print(df1 > df2)

Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Original DataFrame 2:
      col1  col2
row1   2.0   4.0
row2   5.0   NaN
row3   1.0   8.0
row4   NaN   7.0
--------------------------------------------------------------------------------
Comparing if df1 > df2:
       col1   col2
row1  False  False
row2  False  False
row3   True   True
row4  False  False
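
Because the result is an ordinary boolean DataFrame, it can be summarised with the usual reductions. The following is a minimal sketch that rebuilds `df1` and `df2` from the setup cell; the variable names `greater` and `true_counts` are only illustrative.

```python
import pandas as pd

# Same data as the setup cell
df1 = pd.DataFrame({
    'col1': {'row1': 1, 'row2': 1, 'row3': 3},
    'col2': {'row1': 4, 'row3': 9, 'row4': 6}
})
df2 = pd.DataFrame({
    'col1': {'row1': 2, 'row2': 5, 'row3': 1},
    'col2': {'row1': 4, 'row3': 8, 'row4': 7}
})

greater = df1 > df2  # element-wise boolean DataFrame

# True counts as 1, so sum() counts the True values in each column
true_counts = greater.sum()
print(true_counts)

# any() / all() collapse each column to a single boolean
print(greater.any())  # is at least one value in the column True?
print(greater.all())  # are all values in the column True?
```

Here only `row3` satisfies `df1 > df2`, so each column contains exactly one `True`.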



---
Instead of the basic operators, we can also use the equivalent DataFrame methods: `lt`, `gt`, `eq`, `ne`, `le`, `ge`.

In [4]:
print_df1()
print_df2()

print('Comparing if df1.lt(df2):')
print(df1.lt(df2))


Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Original DataFrame 2:
      col1  col2
row1   2.0   4.0
row2   5.0   NaN
row3   1.0   8.0
row4   NaN   7.0
--------------------------------------------------------------------------------
Comparing if df1.lt(df2):
       col1   col2
row1   True  False
row2   True  False
row3  False  False
row4  False   True
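
One practical reason to prefer the method form is that it accepts an `axis` argument, which matters when a DataFrame is compared against a Series. The sketch below assumes a hypothetical per-row `threshold` Series (not part of the original notebook) and compares every column of `df1` against it, row by row.

```python
import pandas as pd

# Same data as the setup cell
df1 = pd.DataFrame({
    'col1': {'row1': 1, 'row2': 1, 'row3': 3},
    'col2': {'row1': 4, 'row3': 9, 'row4': 6}
})

# Hypothetical thresholds, one per row label
threshold = pd.Series({'row1': 2, 'row2': 0, 'row3': 5, 'row4': 5})

# axis='index' aligns the Series with the row labels,
# so each row is compared against its own threshold
result = df1.ge(threshold, axis='index')
print(result)
```

With the default `axis='columns'`, the Series labels would instead be matched against the column labels `col1` and `col2`.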



---
**NOTE:** These are different from `.equals()`, which returns a single value: `True` if the two DataFrames have the same shape and all elements are equal, `False` otherwise

In [5]:
print_df1()
print_df2()

print('df1.equals(df2):')
print(df1.equals(df2))

Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Original DataFrame 2:
      col1  col2
row1   2.0   4.0
row2   5.0   NaN
row3   1.0   8.0
row4   NaN   7.0
--------------------------------------------------------------------------------
df1.equals(df2):
False
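
A further difference shows up with missing values: `==` treats `NaN` as unequal to everything, including another `NaN`, while `.equals()` treats `NaN` values in the same position as equal. A small sketch, using a copy of `df1` (the names `copy_of_df1` and `elementwise` are illustrative):

```python
import pandas as pd

# Same data as the setup cell
df1 = pd.DataFrame({
    'col1': {'row1': 1, 'row2': 1, 'row3': 3},
    'col2': {'row1': 4, 'row3': 9, 'row4': 6}
})
copy_of_df1 = df1.copy()

# Element-wise comparison: the NaN positions come out as False
elementwise = df1 == copy_of_df1
print(elementwise)

# Reducing the element-wise result therefore reports "not all equal"
print(elementwise.all().all())

# equals() considers NaN values in the same location to be equal
print(df1.equals(copy_of_df1))
```

So `.equals()` is usually the safer check when the question is whether two DataFrames hold the same data.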


### `Comparing DataFrame rows and columns`

Comparing rows and columns works the same way as comparing Series, because selecting a single row or column from a DataFrame returns a Series

In [6]:
print_df1()
print_df2()

# Using basic operator

print('If df1[\'col1\'] > df2[\'col1\']:')
print(df1['col1'] > df2['col1'])

Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Original DataFrame 2:
      col1  col2
row1   2.0   4.0
row2   5.0   NaN
row3   1.0   8.0
row4   NaN   7.0
--------------------------------------------------------------------------------
If df1['col1'] > df2['col1']:
row1    False
row2    False
row3     True
row4    False
Name: col1, dtype: bool


In [7]:
print_df1()
print_df2()

# Using inbuilt function

print('If df1[\'col1\'].gt(df2[\'col1\']):')
print(df1['col1'].gt(df2['col1']))

Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Original DataFrame 2:
      col1  col2
row1   2.0   4.0
row2   5.0   NaN
row3   1.0   8.0
row4   NaN   7.0
--------------------------------------------------------------------------------
If df1['col1'].gt(df2['col1']):
row1    False
row2    False
row3     True
row4    False
Name: col1, dtype: bool
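
The boolean Series produced by a column comparison is often used as a mask to select rows. The sketch below is one way to do that with the same data; `mask`, `both` and `filtered` are illustrative names.

```python
import pandas as pd

# Same data as the setup cell
df1 = pd.DataFrame({
    'col1': {'row1': 1, 'row2': 1, 'row3': 3},
    'col2': {'row1': 4, 'row3': 9, 'row4': 6}
})
df2 = pd.DataFrame({
    'col1': {'row1': 2, 'row2': 5, 'row3': 1},
    'col2': {'row1': 4, 'row3': 8, 'row4': 7}
})

# Boolean Series indexed by row label
mask = df1['col1'] > df2['col1']

# Keep only the rows of df1 where the mask is True
filtered = df1[mask]
print(filtered)

# Masks can be combined with & (and), | (or), ~ (not);
# the parentheses are required because of operator precedence
both = (df1['col1'] > df2['col1']) & (df1['col2'] > df2['col2'])
print(df1[both])
```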


In [8]:
print_df1()
print_df2()

# Comparing across rows, using default operators

print('If df1.loc[\'row1\'] < df2.loc[\'row1\']:')
print(df1.loc['row1'] < df2.loc['row1'])


Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Original DataFrame 2:
      col1  col2
row1   2.0   4.0
row2   5.0   NaN
row3   1.0   8.0
row4   NaN   7.0
--------------------------------------------------------------------------------
If df1.loc['row1'] < df2.loc['row1']:
col1     True
col2    False
Name: row1, dtype: bool


In [9]:
print_df1()
print_df2()

# Comparing across rows, using inbuilt function

print('If df1.loc[\'row1\'].gt(df2.loc[\'row1\']):')
print(df1.loc['row1'].gt(df2.loc['row1']))


Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Original DataFrame 2:
      col1  col2
row1   2.0   4.0
row2   5.0   NaN
row3   1.0   8.0
row4   NaN   7.0
--------------------------------------------------------------------------------
If df1.loc['row1'].gt(df2.loc['row1']):
col1    False
col2    False
Name: row1, dtype: bool


### `Iterating through DataFrames`

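There are several ways to iterate through a DataFrame:

- Columns (iterating over the DataFrame itself yields the column labels)
- Indexes (iterating over `.index` yields the row labels)
- Items (`.items()` yields each column label together with that column's values as a Series)
- Rows (`.iterrows()` yields each row label together with that row's values as a Series)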

In [10]:
print_df1()

# Columns (labels)
for col in df1:
  print(col)
divider()

Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
col1
col2
--------------------------------------------------------------------------------


In [11]:
print_df1()

# Row (labels)
for row in df1.index:
  print(row)
divider()

Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
row1
row2
row3
row4
--------------------------------------------------------------------------------
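
The two label loops above can be combined to visit every individual cell. This is a minimal sketch that uses `.at` for scalar lookups by label; the `cells` list is only there to collect the results.

```python
import pandas as pd

# Same data as the setup cell
df1 = pd.DataFrame({
    'col1': {'row1': 1, 'row2': 1, 'row3': 3},
    'col2': {'row1': 4, 'row3': 9, 'row4': 6}
})

cells = []
for row in df1.index:      # row labels
    for col in df1:        # column labels
        value = df1.at[row, col]  # single value at (row, col)
        cells.append((row, col, value))
        print(row, col, value)
```

This is fine for small DataFrames and for inspection, but for larger data a vectorised operation on whole columns is usually much faster than a Python loop.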


In [12]:
print_df1()

# Iterate column by column: items() yields (column label, column as a Series)
print('Iterating based on column:')
divider()
for col_name, col_values in df1.items():
  print(col_name)
  print(col_values)
  print(type(col_values))
  divider()

Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Iterating based on column:
--------------------------------------------------------------------------------
col1
row1    1.0
row2    1.0
row3    3.0
row4    NaN
Name: col1, dtype: float64
<class 'pandas.core.series.Series'>
--------------------------------------------------------------------------------
col2
row1    4.0
row2    NaN
row3    9.0
row4    6.0
Name: col2, dtype: float64
<class 'pandas.core.series.Series'>
--------------------------------------------------------------------------------
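
Since each item is a full Series, any Series method can be applied inside the loop. As a small sketch, the following counts the missing values in each column; the `missing` dictionary is an illustrative name.

```python
import pandas as pd

# Same data as the setup cell
df1 = pd.DataFrame({
    'col1': {'row1': 1, 'row2': 1, 'row3': 3},
    'col2': {'row1': 4, 'row3': 9, 'row4': 6}
})

missing = {}
for col_name, col_values in df1.items():
    # isna() marks the NaN entries; sum() counts them
    missing[col_name] = int(col_values.isna().sum())

print(missing)  # {'col1': 1, 'col2': 1}
```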


In [13]:
print_df1()

# Iterate row by row: iterrows() yields (row label, row as a Series)
print('Iterating based on rows:')
divider()
for row_name, row_values in df1.iterrows():
  print(row_name)
  print(row_values)
  divider()


Original DataFrame 1:
      col1  col2
row1   1.0   4.0
row2   1.0   NaN
row3   3.0   9.0
row4   NaN   6.0
--------------------------------------------------------------------------------
Iterating based on rows:
--------------------------------------------------------------------------------
row1
col1    1.0
col2    4.0
Name: row1, dtype: float64
--------------------------------------------------------------------------------
row2
col1    1.0
col2    NaN
Name: row2, dtype: float64
--------------------------------------------------------------------------------
row3
col1    3.0
col2    9.0
Name: row3, dtype: float64
--------------------------------------------------------------------------------
row4
col1    NaN
col2    6.0
Name: row4, dtype: float64
--------------------------------------------------------------------------------
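
A related option is `itertuples()`, which yields each row as a namedtuple instead of a Series and is generally faster than `iterrows()`. A minimal sketch with the same data (the `by_label` dictionary is only illustrative):

```python
import pandas as pd

# Same data as the setup cell
df1 = pd.DataFrame({
    'col1': {'row1': 1, 'row2': 1, 'row3': 3},
    'col2': {'row1': 4, 'row3': 9, 'row4': 6}
})

by_label = {}
for row in df1.itertuples():
    # The row label is exposed as .Index,
    # and each column as an attribute named after it
    print(row.Index, row.col1, row.col2)
    by_label[row.Index] = row
```

Attribute access such as `row.col1` only works when the column labels are valid Python identifiers; otherwise the fields are renamed positionally.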



---
[next](./15-reading-csv-into-dataframe.ipynb)