# Is it possible to compute the set difference between DataFrames?
[reference](https://stackoverflow.com/questions/18180763/set-difference-for-pandas)

In [1]:
import pandas as pd
import numpy as np

Test data:

In [2]:
# An example in which the DataFrames have differing rows
df1 = pd.DataFrame({'col1':[10,2,3], 
                    'col2':[40,5,6]})

df2 = pd.DataFrame({'col1':[1,2,30], 
                    'col2':[4,5,60]})

# An example with identical rows that appear in a different order.
# df1 = pd.DataFrame({'col1':[1,2,3], 
#                     'col2':[4,5,6]})
# df2 = pd.DataFrame({'col1':[3,2,1], 
#                     'col2':[6,5,4]})

In [3]:
print(df1,"\n")
print(df2)

   col1  col2
0    10    40
1     2     5
2     3     6 

   col1  col2
0     1     4
1     2     5
2    30    60


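## Using NumPy

`df1 - df2`, computed column by column with `np.setdiff1d`. Each column is differenced independently, so the result is not guaranteed to consist of rows from `df1`, and `pd.DataFrame` raises a `ValueError` if the per-column differences have different lengths: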

In [4]:
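# Caution: np.setdiff1d returns the sorted, unique values of each column that
# are absent from the same column of df2; columns are handled independently.
setdf = pd.DataFrame({
    col: np.setdiff1d(df1[col].to_numpy(), df2[col].to_numpy())
    for col in df1.columns
})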

In [5]:
print(setdf)

   col1  col2
0     3     6
1    10    40
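The NumPy version only gives the right answer here because the differing values happen to line up row by row. A minimal sketch with two small hypothetical frames, `a` and `b` (not part of the original test data), shows the column-wise result producing a row that exists in neither frame, while the MultiIndex approach from the next section compares whole rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data: neither row of `a` appears in `b`.
a = pd.DataFrame({'col1': [1, 2], 'col2': [5, 4]})
b = pd.DataFrame({'col1': [1], 'col2': [4]})

# Column-wise difference, as in the NumPy cell above.
colwise = pd.DataFrame({
    col: np.setdiff1d(a[col].to_numpy(), b[col].to_numpy())
    for col in a.columns
})

# Row-wise difference: each row is treated as one tuple.
rowwise = (
    pd.MultiIndex.from_frame(a)
    .difference(pd.MultiIndex.from_frame(b))
    .to_frame(index=False)
)

print(colwise)  # a single row (2, 5), which is a row of neither a nor b
print(rowwise)  # both rows of a, since neither occurs in b
```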


## Using Pandas MultiIndex objects

Pandas MultiIndex objects have fast set operations implemented as methods, so you can convert the DataFrames to MultiIndexes, use the difference() method, then convert the result back to a DataFrame.

Both DataFrames must have the same columns in the same order, because rows are compared as tuples by position, not by column name.
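To illustrate why column order matters, here is a minimal sketch that reuses the test data but swaps the columns of `df2` (the name `df2_swapped` is purely illustrative). Selecting `df2`'s columns in `df1`'s order before building the MultiIndex restores the expected result:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [10, 2, 3], 'col2': [40, 5, 6]})
df2 = pd.DataFrame({'col1': [1, 2, 30], 'col2': [4, 5, 60]})

# Same data as df2, but with the columns in the opposite order.
df2_swapped = df2[['col2', 'col1']]

# Tuples are compared by position, so (2, 5) from df1 no longer matches (5, 2).
naive = pd.MultiIndex.from_frame(df1).difference(
    pd.MultiIndex.from_frame(df2_swapped)
)

# Reordering the columns to match df1 first gives the intended difference.
aligned = pd.MultiIndex.from_frame(df1).difference(
    pd.MultiIndex.from_frame(df2_swapped[df1.columns])
)

print(len(naive))     # every row of df1 survives
print(list(aligned))  # only the rows of df1 that are truly absent from df2
```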

In [6]:
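# Each row becomes one tuple in a MultiIndex
df1mi = pd.MultiIndex.from_frame(df1)
df2mi = pd.MultiIndex.from_frame(df2)

# Tuples of df1 not found in df2, converted back to a DataFrame with a fresh RangeIndex
dfdiff = df1mi.difference(df2mi).to_frame(index=False)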

print(dfdiff)

   col1  col2
0     3     6
1    10    40
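The same steps can be wrapped in a small function. This is a sketch; `frame_difference` is a hypothetical name, not a pandas API. Because whole rows are compared as tuples, the commented-out test data (same rows, different order) should produce an empty result. `MultiIndex.difference` also has set semantics, so a row repeated in the left frame appears at most once in the output, and the result is sorted where possible:

```python
import pandas as pd

def frame_difference(left, right):
    # Hypothetical helper wrapping the MultiIndex approach:
    # rows of `left` that do not occur anywhere in `right`.
    left_mi = pd.MultiIndex.from_frame(left)
    right_mi = pd.MultiIndex.from_frame(right)
    return left_mi.difference(right_mi).to_frame(index=False)

# The main test data from above.
df1 = pd.DataFrame({'col1': [10, 2, 3], 'col2': [40, 5, 6]})
df2 = pd.DataFrame({'col1': [1, 2, 30], 'col2': [4, 5, 60]})
main_diff = frame_difference(df1, df2)

# The commented-out test data: same rows, different order.
same1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
same2 = pd.DataFrame({'col1': [3, 2, 1], 'col2': [6, 5, 4]})
print(frame_difference(same1, same2).empty)  # True

# Set semantics: the duplicated row (3, 6) is kept only once.
dups = pd.DataFrame({'col1': [3, 3, 2], 'col2': [6, 6, 5]})
dedup_diff = frame_difference(dups, same2.iloc[[1]])  # removes the row (2, 5)
```

Note that this drops duplicate rows; if repeated rows in `df1` must be preserved, a set-based approach like this one is not the right tool.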
