# Set Operations in NumPy

NumPy's set operations treat arrays like mathematical sets: they find the values two arrays share, the values only one of them has, or the distinct values within a single array. Here, I'll guide you through the most commonly used ones.


In [1]:
import numpy as np


## np.unique()

The `np.unique()` function returns the sorted unique elements of an array. This is a great way to identify all unique elements within your data.


In [2]:
# Defining the array
arr = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

# Using np.unique()
unique = np.unique(arr)
print(unique)

[1 2 3 4 5]
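
Beyond the values themselves, `np.unique()` can also report how often each value occurs and where it first appears. The following is a minimal sketch using the documented `return_counts` and `return_index` parameters; the sample array and variable names are made up for illustration.

```python
import numpy as np

# Illustrative array (not from the example above)
sample = np.array([3, 1, 2, 3, 1, 1])

# return_counts=True also returns the number of occurrences of each unique value
values, counts = np.unique(sample, return_counts=True)
print(values)  # [1 2 3]
print(counts)  # [3 1 2]

# return_index=True also returns the index of the first occurrence of each unique value
values, first_idx = np.unique(sample, return_index=True)
print(first_idx)  # [1 2 0]
```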



## np.intersect1d()

The `np.intersect1d()` function returns the sorted, unique values that are present in both of its two input arrays.


In [3]:
# Defining the arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 4, 5, 6, 7])

# Using np.intersect1d()
intersect = np.intersect1d(arr1, arr2)
print(intersect)

[3 4 5]
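
Since `np.intersect1d()` accepts exactly two arrays, one common way to intersect three or more is to chain the calls with `functools.reduce`. The sketch below assumes a hypothetical third array, `arr3`, and also shows the `return_indices` option, which reports where the common values sit in each input.

```python
from functools import reduce

import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 4, 5, 6, 7])
arr3 = np.array([4, 5, 8])  # hypothetical third array, for illustration only

# Chain pairwise intersections: intersect1d(intersect1d(arr1, arr2), arr3)
common_all = reduce(np.intersect1d, (arr1, arr2, arr3))
print(common_all)  # [4 5]

# return_indices=True also returns the positions of the common values in each input
common, idx1, idx2 = np.intersect1d(arr1, arr2, return_indices=True)
print(common)  # [3 4 5]
print(idx1)    # [2 3 4]
print(idx2)    # [0 1 2]
```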



## np.union1d()

The `np.union1d()` function returns the sorted union of two arrays, i.e., every unique value that appears in either one.


In [4]:
# Using np.union1d()
union = np.union1d(arr1, arr2)
print(union)

[1 2 3 4 5 6 7]
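
Two details are easy to miss: `np.union1d()` drops duplicates, and it flattens inputs that are not already 1-D. A small sketch with made-up arrays `a` and `b`:

```python
import numpy as np

# Illustrative inputs: a 2-D array and a 1-D array with a repeated value
a = np.array([[3, 1], [1, 2]])
b = np.array([2, 5, 5])

# The inputs are flattened, merged, de-duplicated, and sorted
merged = np.union1d(a, b)
print(merged)  # [1 2 3 5]
```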



## np.setdiff1d()

The `np.setdiff1d()` function returns the sorted, unique values in the first array that are not in the second.


In [5]:
# Using np.setdiff1d()
diff = np.setdiff1d(arr1, arr2)
print(diff)

[1 2]
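
Unlike intersection and union, the set difference depends on argument order. As a quick sketch with the same two arrays, swapping the arguments gives a different result:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 4, 5, 6, 7])

# Values in arr1 that are missing from arr2
only_in_arr1 = np.setdiff1d(arr1, arr2)
print(only_in_arr1)  # [1 2]

# Swapping the arguments: values in arr2 that are missing from arr1
only_in_arr2 = np.setdiff1d(arr2, arr1)
print(only_in_arr2)  # [6 7]
```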



## np.setxor1d()

The `np.setxor1d()` function returns the sorted, unique values that appear in exactly one of the two input arrays (the symmetric difference).


In [6]:
# Using np.setxor1d()
xor = np.setxor1d(arr1, arr2)
print(xor)

[1 2 6 7]
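
One way to read this result is as the union of the two one-sided differences. The sketch below checks that equivalence on the same arrays; the name `manual_xor` is only illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 4, 5, 6, 7])

xor = np.setxor1d(arr1, arr2)

# Build the same result by hand: (arr1 - arr2) united with (arr2 - arr1)
manual_xor = np.union1d(np.setdiff1d(arr1, arr2), np.setdiff1d(arr2, arr1))

print(xor)                              # [1 2 6 7]
print(np.array_equal(xor, manual_xor))  # True
```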


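## Example: comparing monthly customer IDs

The same functions can be applied to a practical case: comparing the customer IDs seen in January with those seen in February.


In [10]: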
customers_january = np.array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110])
customers_february = np.array([106, 107, 108, 111, 112, 113, 114, 115])

In [16]:
# Finding unique customer IDs across both months
# (np.union1d already returns sorted, unique values, so no extra np.unique call is needed)
unique_customers = np.union1d(customers_january, customers_february)
print('Unique customer IDs:', unique_customers)
print('Number of unique customers:', len(unique_customers))

Unique customer IDs: [100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115]
Number of unique customers: 16


In [17]:
# Finding common customer IDs across both months
common_customers = np.intersect1d(customers_january, customers_february)
print('Common customer IDs:', common_customers)
print('Number of common customers:', len(common_customers))

Common customer IDs: [106 107 108]
Number of common customers: 3


In [18]:
# Finding customer IDs present in January but not in February
lost_customers = np.setdiff1d(customers_january, customers_february)
print('Lost customer IDs:', lost_customers)
print('Number of lost customers:', len(lost_customers))

Lost customer IDs: [100 101 102 103 104 105 109 110]
Number of lost customers: 8
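
The same comparison can be run in the other direction to find customers who first appear in February. As a further sketch, `np.isin()` gives a boolean mask over the January array, which is handy when you need to filter the original array rather than get a sorted set back. The names `new_customers`, `retained_mask`, and `retention_rate` are illustrative and not part of the example above.

```python
import numpy as np

customers_january = np.array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110])
customers_february = np.array([106, 107, 108, 111, 112, 113, 114, 115])

# Customer IDs present in February but not in January
new_customers = np.setdiff1d(customers_february, customers_january)
print('New customer IDs:', new_customers)  # [111 112 113 114 115]

# Boolean mask: True where a January customer also appears in February
retained_mask = np.isin(customers_january, customers_february)
print('Retained customer IDs:', customers_january[retained_mask])  # [106 107 108]

# Share of January customers who came back in February (3 of 11)
retention_rate = retained_mask.mean()
print(f'Retention rate: {retention_rate:.1%}')
```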



Together, these functions offer a concise way to compare datasets and pinpoint what they share and where they differ, which is a routine task in data preprocessing and analysis.

> Content created by [**Carlos Cruz-Maldonado**](https://www.linkedin.com/in/carloscruzmaldonado/).  
> I am available to answer any questions or provide further assistance.   
> Feel free to reach out to me at any time.  