Learn how to find the distinct values in a column of a Pandas DataFrame.

# Using the <font color='brown'>unique</font> and <font color='brown'>nunique</font> functions


The columns in a <font color='red'>DataFrame</font> might be categorical or continuous. A categorical column may have many distinct values, but their number is finite, whereas a continuous column can, in principle, take infinitely many values. In this sense, the columns can be considered as either discrete or continuous random variables.
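
To make the distinction concrete, here is a minimal sketch on a small, made-up DataFrame (the column values are hypothetical, not taken from `sales.csv`). `DataFrame.nunique` counts the distinct values in every column at once. A categorical column repeats a few labels, while a continuous column tends to have almost as many distinct values as rows.

```python
import pandas as pd

# Hypothetical data: one categorical column and one continuous column
df = pd.DataFrame({
    "product_group": ["PG1", "PG2", "PG1", "PG3", "PG2", "PG1"],
    "price": [120.50, 98.75, 143.20, 87.10, 101.99, 76.40],
})

# DataFrame.nunique returns a Series with one distinct-value count per column
counts = df.nunique()
print(counts)
```

Here `product_group` has 3 distinct values across 6 rows, while every `price` is distinct (6 of 6).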

Checking the number of distinct values in a categorical column is an essential part of exploratory data analysis. The <font color='red'>nunique</font> function returns <u>the number of distinct values in a column</u>, and the <font color='red'>unique</font> function <u>shows the distinct values themselves</u>. We can apply them to the product_group column of the <font color='brown'>sales</font> DataFrame.


```python
import pandas as pd

sales = pd.read_csv("sales.csv")

print(sales.head(4))
```

```
   product_code product_group  stock_qty    cost    price  last_week_sales  \
0          4187           PG2        498  420.76   569.91               13   
1          4195           PG2        473  545.64   712.41               16   
2          4204           PG2        968  640.42   854.91               22   
3          4219           PG2        241  869.69  1034.55               14   

   last_month_sales  
0                58  
1                58  
2                88  
3                45  
```


```python
print(sales["product_group"].nunique())

print(sales["product_group"].unique())
```

```
6
['PG2' 'PG4' 'PG6' 'PG5' 'PG3' 'PG1']
```
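
The two functions treat missing values differently, which matters when a categorical column has gaps. The sketch below uses a small hypothetical column with one missing entry: `nunique` skips missing values by default (`dropna=True`), whereas `unique` always includes them and lists values in order of first appearance.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value (not the sales.csv data)
df = pd.DataFrame({"product_group": ["PG1", "PG2", "PG1", np.nan, "PG2"]})

# By default, nunique ignores NaN
n_default = df["product_group"].nunique()
# With dropna=False, NaN is counted as one more distinct value
n_with_nan = df["product_group"].nunique(dropna=False)
# unique keeps NaN and preserves the order of first appearance
values = df["product_group"].unique()

print(n_default)    # 2
print(n_with_nan)   # 3
print(values)       # ['PG1' 'PG2' nan]
```

So `nunique()` and `len(unique())` can disagree when a column contains missing values.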


# The <font color='brown'>value_counts</font> function
There are six distinct product groups. We might also need to check <u>how many rows each product group has</u>. The <font color='red'>value_counts</font> function returns every distinct value in a column along with the number of times it occurs, sorted from most to least frequent by default.

```python
import pandas as pd

sales = pd.read_csv("sales.csv")

print(sales["product_group"].value_counts())
```

```
product_group
PG4    349
PG5    255
PG6    243
PG2     75
PG3     39
PG1     39
Name: count, dtype: int64
```
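
Two parameters of `value_counts` are often useful during exploration: `normalize=True` returns each value's share of the total instead of a raw count, and `dropna=False` keeps a row for missing values, which are otherwise dropped. A minimal sketch on a hypothetical Series (not the sales data):

```python
import numpy as np
import pandas as pd

# Hypothetical product groups, including one missing value
groups = pd.Series(["PG4", "PG5", "PG4", np.nan, "PG4", "PG5"], name="product_group")

# Raw counts, most frequent first; NaN is dropped by default
counts = groups.value_counts()
# Proportions of the non-missing values instead of counts
shares = groups.value_counts(normalize=True)
# Keep a separate row for missing values
counts_with_missing = groups.value_counts(dropna=False)

print(counts)
print(shares)
print(counts_with_missing)
```

With the default `dropna=True`, the proportions are computed over the 5 non-missing values, so PG4 gets 0.6 and PG5 gets 0.4.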


The <font color = 'red'>value_counts</font> function is one of the most frequently used Pandas functions because it gives a quick overview of how the values in a categorical column are distributed.
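The three functions in this section describe the same column from different angles, so their results should agree. The sketch below checks this on a small hypothetical Series with no missing values. It also shows `sort_index`, which orders the counts by label rather than by frequency when that reads more naturally.

```python
import pandas as pd

# Hypothetical column used only for illustration (no missing values)
s = pd.Series(["PG2", "PG4", "PG4", "PG6", "PG2", "PG4"], name="product_group")

counts = s.value_counts()

# One row per distinct value, so the length matches nunique
assert len(counts) == s.nunique()
# The index holds the same values that unique returns (order may differ)
assert set(counts.index) == set(s.unique())
# With no missing values, the counts add up to the number of rows
assert counts.sum() == len(s)

# Order by label (PG2, PG4, PG6) instead of by frequency
by_label = counts.sort_index()
print(by_label)
```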


