# Pandas.apply()

Series.apply() lets you pass a function and apply it to every value of a pandas Series. Because the function can contain arbitrary logic, apply() is a convenient way to categorize or transform data according to custom conditions, which is why it is widely used in data science and machine learning workflows.

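Installation:
Install pandas by running the following command in the terminal, then import it in your Python file:

pip install pandas
To read a CSV file with a single column and squeeze it into a pandas Series, use the following commands (the squeeze=True argument of read_csv() seen in older tutorials was deprecated in pandas 1.4 and removed in pandas 2.0):

import pandas as pd
s = pd.read_csv("stock.csv").squeeze("columns")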
Syntax:


s.apply(func, convert_dtype=True, args=())
Parameters:

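func: The function to apply to every value of the Series.
convert_dtype: Try to find a better dtype for the element-wise results; if False, leave them as dtype=object. This parameter is deprecated since pandas 2.1.
args=(): Positional arguments passed to func after the Series value.
Return Type: A pandas Series holding the result of the function for each value (a DataFrame if func returns a Series).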
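To see how args reaches the function, here is a minimal sketch using made-up prices rather than stock.csv; classify() is a hypothetical helper that takes the two thresholds as extra positional parameters.

```python
import pandas as pd

# Hypothetical helper: label a price using thresholds supplied through args
def classify(price, low, high):
    if price < low:
        return "Low"
    elif price < high:
        return "Normal"
    return "High"

# Made-up prices for illustration; not taken from stock.csv
prices = pd.Series([150.0, 250.0, 450.0])

# Each element is passed first, then the values in args: classify(price, 200, 400)
labels = prices.apply(classify, args=(200, 400))
print(labels.tolist())  # ['Low', 'Normal', 'High']
```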

In [None]:
import pandas as pd
  
# reading csv and squeezing its single column into a Series
s = pd.read_csv("stock.csv").squeeze("columns")
  
# defining function to check price
def fun(num):
  
    if num<200:
        return "Low"
  
    elif num < 400:
        return "Normal"
  
    else:
        return "High"
  
# passing function to apply and storing returned series in new
new = s.apply(fun)
  
# printing first 3 element
print(new.head(3))
  
# printing elements somewhere near the middle of series
print(new[1400], new[1500], new[1600])
  
# printing last 3 elements
print(new.tail(3))
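The same three-way labelling can also be done without apply() by binning the values with pd.cut(). This is a sketch on made-up prices (stock.csv is not reproduced here); right=False makes each bin include its left edge, which matches the < 200 and < 400 tests in fun().

```python
import numpy as np
import pandas as pd

# Made-up prices standing in for the stock.csv column
prices = pd.Series([120.5, 200.0, 399.9, 400.0, 812.3])

# Same rule as fun(): below 200 is Low, 200 up to (not including) 400 is Normal, the rest High.
# right=False makes the bins [-inf, 200), [200, 400), [400, inf)
labels = pd.cut(prices, bins=[-np.inf, 200, 400, np.inf],
                labels=["Low", "Normal", "High"], right=False)
print(labels.astype(str).tolist())
# ['Low', 'Normal', 'Normal', 'High', 'High']
```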


# Apply function to every row in a Pandas DataFrame


Python is a great language for data analysis tasks: it provides many classes and functions that make analyzing and manipulating data easier.
To apply a function to every row of a DataFrame, call DataFrame.apply() with axis=1; each row is then passed to the function as a Series. Let's look at some ways to do this.
Example #1: 

In [1]:

# Import pandas package
import pandas as pd
 
# Function to add
def add(a, b, c):
    return a + b + c
 
def main():
     
    # create a dictionary with
    # three fields each
    data = {
            'A':[1, 2, 3],
            'B':[4, 5, 6],
            'C':[7, 8, 9] }
     
    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)
     
    df['add'] = df.apply(lambda row : add(row['A'],
                     row['B'], row['C']), axis = 1)
  
    print('\nAfter Applying Function: ')
    # printing the new dataframe
    print(df)
  
if __name__ == '__main__':
    main()

Original DataFrame:
    A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

After Applying Function: 
   A  B  C  add
0  1  4  7   12
1  2  5  8   15
2  3  6  9   18
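For simple arithmetic like this, a row-wise apply() is usually slower than plain column arithmetic, which pandas evaluates in vectorized form. A small sketch comparing the two on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Row-wise apply, as in the example above
via_apply = df.apply(lambda row: row['A'] + row['B'] + row['C'], axis=1)

# Vectorized column arithmetic computes the same values without a Python-level loop
via_columns = df['A'] + df['B'] + df['C']

print(via_apply.tolist())    # [12, 15, 18]
print(via_columns.tolist())  # [12, 15, 18]
```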


You can also pass a NumPy function, such as np.sum, directly to apply().

In [2]:
import pandas as pd
import numpy as np
  
def main():
     
    # create a dictionary with
    # three fields each
    data = {
            'A':[1, 2, 3],
            'B':[4, 5, 6],
            'C':[7, 8, 9] }
     
    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)
     
    # applying function to each row in the dataframe
    # and storing result in a new column
    df['add'] = df.apply(np.sum, axis = 1)
  
    print('\nAfter Applying Function: ')
    # printing the new dataframe
    print(df)
  
if __name__ == '__main__':
    main()

Original DataFrame:
    A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

After Applying Function: 
   A  B  C  add
0  1  4  7   12
1  2  5  8   15
2  3  6  9   18
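DataFrame.apply() also accepts raw=True, which passes each row to the function as a NumPy ndarray instead of a Series; this can be faster for NumPy reductions. A sketch on the same data, compared with the built-in df.sum(axis=1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# raw=True hands each row to np.sum as a plain NumPy array instead of a Series
row_sums = df.apply(np.sum, axis=1, raw=True)

# The built-in reduction gives the same result
builtin_sums = df.sum(axis=1)

print(row_sums.tolist())      # [12, 15, 18]
print(builtin_sums.tolist())  # [12, 15, 18]
```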


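# Normalization

The normalize() function below rescales X relative to the pair (X, Y) in its row: it subtracts the mean of the two values and divides by their range. Because x - (x + y)/2 = (x - y)/2 and the range is |x - y|, the result is always -0.5 when X < Y and 0.5 when X > Y, which is why every row of the output below is -0.5.
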
In [3]:
# Import pandas and numpy packages
import pandas as pd
import numpy as np
 
def normalize(x, y):
    # subtract the mean of the pair, then divide by the range of the pair
    x_new = ((x - np.mean([x, y])) /
             (max(x, y) - min(x, y)))
    return x_new
 
def main():
     
    # create a dictionary with two fields
    data = {
        'X':[1, 2, 3],
        'Y':[45, 65, 89] }
     
    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)
     
    df['X'] = df.apply(lambda row : normalize(row['X'],
                                  row['Y']), axis = 1)
  
    print('\nNormalized:')
    print(df)
  
if __name__ == '__main__':
    main()

Original DataFrame:
    X   Y
0  1  45
1  2  65
2  3  89

Normalized:
     X   Y
0 -0.5  45
1 -0.5  65
2 -0.5  89
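For comparison, the more common column-wise min-max scaling (a different technique from the row-wise normalize() above, shown here only as an illustration) maps each column onto the range [0, 1] and needs no apply() at all:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [45, 65, 89]})

# Column-wise min-max scaling: (x - column min) / (column max - column min)
scaled = (df - df.min()) / (df.max() - df.min())

print(scaled)
```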


# Range

In [4]:
import pandas as pd
 
# Function to generate range
def generate_range(n):
     
    # return the range containing n, e.g.
    # input 67 gives output '60-70'
    n = int(n)
     
    lower_limit = n // 10 * 10
    upper_limit = lower_limit + 10
     
    return f"{lower_limit}-{upper_limit}"
      
def replace(col):
    # build a new Series of range labels with map()
    # instead of writing strings into the integer Series in place
    return col.map(generate_range)
          
  
def main():
    # create a dictionary with
    # three fields each
    data = {
            'A':[0, 2, 3],
            'B':[4, 15, 6],
            'C':[47, 8, 19] }
     
    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
  
    print('Before applying function: ')
    print(df)
      
    # apply() uses axis=0 by default, so replace() receives one
    # column at a time; the returned frame replaces df
    df = df.apply(replace)
      
  
    print('After Applying Function: ')
    # printing the new dataframe
    print(df)
  
if __name__ == '__main__':
    main()

Before applying function: 
   A   B   C
0  0   4  47
1  2  15   8
2  3   6  19
After Applying Function: 
      A      B      C
0  0-10   0-10  40-50
1  0-10  10-20   0-10
2  0-10   0-10  10-20
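The same labels can be produced with whole-frame operations instead of a per-column function: integer division finds the lower bound of each range, and element-wise string concatenation builds the label. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 2, 3], 'B': [4, 15, 6], 'C': [47, 8, 19]})

# Lower bound of each range, computed for the whole frame at once
lower = df // 10 * 10

# Element-wise string concatenation builds labels such as '40-50'
ranges = lower.astype(str) + '-' + (lower + 10).astype(str)

print(ranges)
```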


# Pandas Series.apply()

Pandas series is a One-dimensional ndarray with axis labels. The labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index.

Pandas Series.apply() invokes the passed function on each element of the given Series object.

Syntax: Series.apply(func, convert_dtype=True, args=(), **kwds)

Parameter :
func : Python function or NumPy ufunc to apply.
convert_dtype : Try to find better dtype for elementwise function results.
args : Positional arguments passed to func after the series value.
**kwds : Additional keyword arguments passed to func.


Returns : Series
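A short sketch of how args and **kwds reach the function: each element is passed first, followed by the values in args, and any extra keyword arguments are forwarded as keywords. tag() is a hypothetical helper used only for illustration.

```python
import pandas as pd

# Hypothetical helper with one extra positional and one keyword parameter
def tag(city, prefix, suffix=""):
    return f"{prefix}{city}{suffix}"

sr = pd.Series(['Chicago', 'Lisbon'])

# args supplies positional values after the element; suffix is forwarded via **kwds
result = sr.apply(tag, args=("City: ",), suffix="!")
print(result.tolist())  # ['City: Chicago!', 'City: Lisbon!']
```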

Example #1: Use the Series.apply() function to change the city name to ‘Montreal’ if the city is ‘Rio’.

In [5]:
# importing pandas as pd
import pandas as pd
  
# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio'])
  
# Create the Index
index_ = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5'] 
  
# set the index
sr.index = index_
  
# Print the series
print(sr)

City 1    New York
City 2     Chicago
City 3     Toronto
City 4      Lisbon
City 5         Rio
dtype: object


In [6]:
# Now we will use Series.apply() function to change the city name to ‘Montreal’ if the city is ‘Rio’.

# change 'Rio' to 'Montreal'
# we have used a lambda function
result = sr.apply(lambda x : 'Montreal' if x =='Rio' else x )
  
# Print the result
print(result)

City 1    New York
City 2     Chicago
City 3     Toronto
City 4      Lisbon
City 5    Montreal
dtype: object


In [7]:
#Use Series.apply() function to return True if the value in the given series object is greater than 30 else return False.


# importing pandas as pd
import pandas as pd
  
# Creating the Series
sr = pd.Series([11, 21, 8, 18, 65, 18, 32, 10, 5, 32, None])
  
# Create the Index
# apply yearly frequency
index_ = pd.date_range('2010-10-09 08:45', periods = 11, freq ='Y')
  
# set the index
sr.index = index_
  
# Print the series
print(sr)

2010-12-31 08:45:00    11.0
2011-12-31 08:45:00    21.0
2012-12-31 08:45:00     8.0
2013-12-31 08:45:00    18.0
2014-12-31 08:45:00    65.0
2015-12-31 08:45:00    18.0
2016-12-31 08:45:00    32.0
2017-12-31 08:45:00    10.0
2018-12-31 08:45:00     5.0
2019-12-31 08:45:00    32.0
2020-12-31 08:45:00     NaN
Freq: A-DEC, dtype: float64


In [8]:
#Now we will use Series.apply() function to return True if a value in the given series object is greater than 30 else return False.

# return True if greater than 30
# else return False
result = sr.apply(lambda x: x > 30)
  
# Print the result
print(result)

2010-12-31 08:45:00    False
2011-12-31 08:45:00    False
2012-12-31 08:45:00    False
2013-12-31 08:45:00    False
2014-12-31 08:45:00     True
2015-12-31 08:45:00    False
2016-12-31 08:45:00     True
2017-12-31 08:45:00    False
2018-12-31 08:45:00    False
2019-12-31 08:45:00     True
2020-12-31 08:45:00    False
Freq: A-DEC, dtype: bool
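For a simple comparison like this, apply() is not required: comparing the Series directly gives the same boolean result element by element, and NaN compares as False in both cases. A sketch with the same values (using a default index instead of the dates):

```python
import pandas as pd

sr = pd.Series([11, 21, 8, 18, 65, 18, 32, 10, 5, 32, None])

# Element-wise comparison on the whole Series; NaN > 30 is False
vectorized = sr > 30
via_apply = sr.apply(lambda x: x > 30)

print(vectorized.tolist() == via_apply.tolist())  # True
```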


# Pandas dataframe.aggregate()
Author: Shubham__Ranjan

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

DataFrame.aggregate() (also available as DataFrame.agg()) applies one or more aggregations over one or more columns. The aggregation can be given as a callable, a string, a dict, or a list of strings/callables. The most frequently used aggregations are:

sum: Return the sum of the values for the requested axis
min: Return the minimum of the values for the requested axis
max: Return the maximum of the values for the requested axis

Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)


Parameters:
func : callable, string, dictionary, or list of string/callables. Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. For a DataFrame, can pass a dict, if the keys are DataFrame column names.
axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’: apply function to each column. 1 or ‘columns’: apply function to each row.

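Returns: The aggregated result: a scalar, Series, or DataFrame depending on func. A list of functions returns a DataFrame with one row per function.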
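A small self-contained sketch (made-up data rather than nba.csv) of two forms of func: a list of function names, which returns one row per aggregation, and a callable, which is applied to each column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# A list of function names gives one row per aggregation
summary = df.agg(['sum', 'min', 'max'])
print(summary)

# A callable is applied to each column (axis=0); here it returns the column's spread
spread = df.agg(lambda col: col.max() - col.min())
print(spread)
```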

In [9]:
# importing pandas package
import pandas as pd
  
# making data frame from csv file
df = pd.read_csv("nba.csv")
  
df.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|------|------|--------|----------|-----|--------|--------|---------|--------|
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 |


In [10]:
# printing the first 10 rows of the dataframe
df[:10]

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|------|------|--------|----------|-----|--------|--------|---------|--------|
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 |
| 5 | Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | NaN | 12000000.0 |
| 6 | Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0 |
| 7 | Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0 |
| 8 | Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0 |
| 9 | Marcus Smart | Boston Celtics | 36.0 | PG | 22.0 | 6-4 | 220.0 | Oklahoma State | 3431040.0 |


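These aggregations are meant for numeric columns. Older pandas versions silently dropped the non-numeric columns (with a FutureWarning); pandas 2.0 no longer does this, so select the numeric columns explicitly, for example with df.select_dtypes("number").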



In [12]:
df.select_dtypes("number").aggregate(['sum', 'min', 'max'])


|   | Number | Age | Weight | Salary |
|---|--------|-----|--------|--------|
| sum | 8079.0 | 12311.0 | 101236.0 | 2159837000.0 |
| min | 0.0 | 19.0 | 161.0 | 30888.0 |
| max | 99.0 | 40.0 | 307.0 | 25000000.0 |


In [13]:
# importing pandas package
import pandas as pd
  
# making data frame from csv file
df = pd.read_csv("nba.csv")
  
# We are going to find aggregation for these columns
df.aggregate({"Number":['sum', 'min'],
              "Age":['max', 'min'],
              "Weight":['min', 'sum'], 
              "Salary":['sum']})

|   | Number | Age | Weight | Salary |
|---|--------|-----|--------|--------|
| sum | 8079.0 | NaN | 101236.0 | 2159837000.0 |
| min | 0.0 | 19.0 | 161.0 | NaN |
| max | NaN | 40.0 | NaN | NaN |


# Pandas DataFrame mean() Method



Pandas DataFrame.mean() returns the mean of the values over the requested axis. When the method is applied to a pandas Series, it returns a scalar: the mean of all the observations in the Series. When it is applied to a DataFrame, it returns a Series containing the mean of the values along the specified axis.

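Syntax: DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This is the pandas 1.x signature; pandas 2.0 removed the level parameter and changed the default of numeric_only to False.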

Parameters :

axis : {index (0), columns (1)}
skipna : Exclude NA/null values when computing the result
level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Returns : mean : Series or DataFrame (if level specified)
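A sketch on made-up data showing the effect of numeric_only and skipna: numeric_only=True leaves out the string column, and skipna=False lets a missing value propagate into that column's mean.

```python
import pandas as pd

# Made-up data: one string column and two numeric columns
df = pd.DataFrame({'name': ['a', 'b', 'c'],
                   'x': [1.0, None, 3.0],
                   'y': [4.0, 5.0, 6.0]})

# numeric_only=True leaves out the string column 'name'
skipped = df.mean(numeric_only=True)
print(skipped)     # x 2.0, y 5.0

# skipna=False lets the missing value in 'x' propagate into its mean
propagated = df.mean(numeric_only=True, skipna=False)
print(propagated)  # x NaN, y 5.0
```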

Pandas DataFrame.mean() Examples
Example 1:

Use mean() function to find the mean of all the observations over the index axis. 

In [14]:
# importing pandas as pd
import pandas as pd
  
# Creating the dataframe
df = pd.DataFrame({"A":[12, 4, 5, 44, 1],
                "B":[5, 2, 54, 3, 2],
                "C":[20, 16, 7, 3, 8],
                "D":[14, 3, 17, 2, 6]})
  
# Print the dataframe
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 12 | 5 | 20 | 14 |
| 1 | 4 | 2 | 16 | 3 |
| 2 | 5 | 54 | 7 | 17 |
| 3 | 44 | 3 | 3 | 2 |
| 4 | 1 | 2 | 8 | 6 |


Let’s use the DataFrame.mean() function to find the mean over the index axis.

In [17]:
# Even if we do not specify axis = 0,
# the method will return the mean over
# the index axis by default
df.mean()

# or df.mean(axis=0)

A    13.2
B    13.2
C    10.8
D     8.4
dtype: float64

Example 2: Use the mean() function on a DataFrame that contains None values, and find the mean over the column axis.

In [20]:
# importing pandas as pd
import pandas as pd
  
# Creating the dataframe
df = pd.DataFrame({"A":[12, 4, 5, None, 1],
                "B":[7, 2, 54, 3, None],
                "C":[20, 16, 11, 3, 8],
                "D":[14, 3, None, 2, 6]})
  
# Print the dataframe
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 12.0 | 7.0 | 20 | 14.0 |
| 1 | 4.0 | 2.0 | 16 | 3.0 |
| 2 | 5.0 | 54.0 | 11 | NaN |
| 3 | NaN | 3.0 | 3 | 2.0 |
| 4 | 1.0 | NaN | 8 | 6.0 |


In [27]:
df.mean(axis = 1,skipna=True)

0    13.250000
1     6.250000
2    23.333333
3     2.666667
4     5.000000
dtype: float64

# Pandas Series.mean()



Pandas Series.mean() returns the mean of the underlying data in the given Series object.

Syntax: Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Parameter :
axis : Axis for the function to be applied on.
skipna : Exclude NA/null values when computing the result.
level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar.
numeric_only : Include only float, int, boolean columns.
**kwargs : Additional keyword arguments to be passed to the function.


Returns : mean : scalar or Series (if level specified)

Example #1: Use Series.mean() function to find the mean of the underlying data in the given series object.

In [28]:
# importing pandas as pd
import pandas as pd
  
# Creating the Series
sr = pd.Series([10, 25, 3, 25, 24, 6])
  
# Create the Index
index_ = ['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew', 'ThumbsUp']
  
# set the index
sr.index = index_
  
# Print the series
print(sr)

Coca Cola    10
Sprite       25
Coke          3
Fanta        25
Dew          24
ThumbsUp      6
dtype: int64


In [30]:
result = sr.mean()
result

15.5

As we can see in the output, the Series.mean() function has successfully returned the mean of the given series object.
 
Example #2: Use Series.mean() function to find the mean of the underlying data in the given series object. The given series object also contains some missing values.

In [31]:
# importing pandas as pd
import pandas as pd
  
# Creating the Series
sr = pd.Series([19.5, 16.8, None, 22.78, 16.8, 20.124, None, 18.1002, 19.5])
  
# Print the series
print(sr)

0    19.5000
1    16.8000
2        NaN
3    22.7800
4    16.8000
5    20.1240
6        NaN
7    18.1002
8    19.5000
dtype: float64


In [32]:
# return the mean
# skip all the missing values
result = sr.mean(skipna = True)
  
# Print the result
print(result)

19.086314285714284
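With skipna=True the mean is simply the sum of the non-missing values divided by their count, since both sum() and count() ignore NaN. A quick check on the same series:

```python
import pandas as pd

sr = pd.Series([19.5, 16.8, None, 22.78, 16.8, 20.124, None, 18.1002, 19.5])

# sum() and count() both skip NaN, so their ratio is the skipna mean
manual = sr.sum() / sr.count()
print(manual, sr.mean())
```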


# Pandas dataframe.mad()



Pandas DataFrame.mad() returns the mean absolute deviation of the values over the requested axis. The mean absolute deviation of a dataset is the average distance between each data point and the mean, so it indicates how spread out the data are. Note that mad() was deprecated in pandas 1.5 and removed in pandas 2.0, so the mad() calls in this section require an older pandas version.

Syntax: DataFrame.mad(axis=None, skipna=None, level=None)

Parameters :
axis : {index (0), columns (1)}
skipna : Exclude NA/null values when computing the result
level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.


Returns : mad : Series or DataFrame (if level specified)

Example #1: Use the mad() function to find the mean absolute deviation over the index axis.

In [33]:
# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.DataFrame({"A":[12, 4, 5, 44, 1],
                   "B":[5, 2, 54, 3, 2], 
                   "C":[20, 16, 7, 3, 8],
                   "D":[14, 3, 17, 2, 6]})
  
# Print the dataframe
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 12 | 5 | 20 | 14 |
| 1 | 4 | 2 | 16 | 3 |
| 2 | 5 | 54 | 7 | 17 |
| 3 | 44 | 3 | 3 | 2 |
| 4 | 1 | 2 | 8 | 6 |


In [35]:
#use the dataframe.mad() function to find the mean absolute deviation.

# find the mean absolute deviation 
# over the index axis
df.mad(axis = 0)


A    12.32
B    16.32
C     5.76
D     5.68
dtype: float64
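On pandas versions without mad(), the same column-wise values can be computed straight from the definition (average absolute distance from the column mean). A sketch on the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
                   "B": [5, 2, 54, 3, 2],
                   "C": [20, 16, 7, 3, 8],
                   "D": [14, 3, 17, 2, 6]})

# Mean absolute deviation per column: average of |x - column mean|
mad_cols = (df - df.mean()).abs().mean()
print(mad_cols)
```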

Example #2: Use the mad() function to find the mean absolute deviation over the column axis of a DataFrame that contains some NaN values.

In [36]:
# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.DataFrame({"A":[12, 4, 5, None, 1],
                   "B":[7, 2, 54, 3, None],
                   "C":[20, 16, 11, 3, 8], 
                   "D":[14, 3, None, 2, 6]})
  
# To find the mean absolute deviation
# skip the Na values when finding the mad value
df.mad(axis = 1, skipna = True)


0     3.750000
1     4.875000
2    20.444444
3     0.444444
4     2.666667
dtype: float64
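The row-wise version of the same definition subtracts each row's mean from that row; NaN entries stay NaN after the subtraction and are then skipped by mean(). A sketch that reproduces the values above without mad():

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})

# Subtract each row's mean from that row (axis=0 aligns the Series on the row index);
# NaN stays NaN after subtraction and is skipped by mean()
row_mad = df.sub(df.mean(axis=1), axis=0).abs().mean(axis=1)
print(row_mad)
```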

# Pandas Series.mad() to calculate Mean Absolute Deviation of a Series


Pandas provides the Series.mad() method (available in pandas versions before 2.0) to make calculating the MAD (mean absolute deviation) easy. The MAD is defined as the average distance between each value and the mean.

The formula used to calculate MAD is:

$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$$

where $\bar{x}$ is the mean of the $n$ values.

In [37]:

# importing pandas module
import pandas as pd

# creating the data; named values so the built-in list is not shadowed
values = [5, 12, 1, 0, 4, 22, 15, 3, 9]
  
# creating series
series = pd.Series(values)
  
# calling .mad() method
result = series.mad()
  
# display
result


5.876543209876543
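The formula can also be evaluated directly with NumPy as a cross-check on the same list of values:

```python
import numpy as np

values = np.array([5, 12, 1, 0, 4, 22, 15, 3, 9])

# MAD = (1/n) * sum(|x_i - mean(x)|)
mad = np.mean(np.abs(values - values.mean()))
print(mad)  # approximately 5.876543209876543
```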


# Pandas dataframe.sem()



Pandas DataFrame.sem() returns the unbiased standard error of the mean over the requested axis. The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution, or an estimate of that standard deviation. When the statistic is the mean, it is called the standard error of the mean (SEM).

Syntax : DataFrame.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Parameters :
axis : {index (0), columns (1)}
skipna : Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
ddof : Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series


Return : sem : Series or DataFrame (if level specified)
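The standard error of the mean is the sample standard deviation (divisor N - ddof) divided by the square root of the number of observations N. A sketch checking this against sem() on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration
s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# SEM = standard deviation with divisor N - ddof, divided by sqrt(N)
manual_sem = s.std(ddof=1) / np.sqrt(s.count())
print(manual_sem, s.sem())
```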

In [38]:
# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.read_csv("nba.csv")
  
# Print the dataframe
df

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|------|------|--------|----------|-----|--------|--------|---------|--------|
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 453 | Shelvin Mack | Utah Jazz | 8.0 | PG | 26.0 | 6-3 | 203.0 | Butler | 2433333.0 |
| 454 | Raul Neto | Utah Jazz | 25.0 | PG | 24.0 | 6-1 | 179.0 | NaN | 900000.0 |
| 455 | Tibor Pleiss | Utah Jazz | 21.0 | C | 26.0 | 7-3 | 256.0 | NaN | 2900000.0 |
| 456 | Jeff Withey | Utah Jazz | 24.0 | C | 26.0 | 7-0 | 231.0 | Kansas | 947276.0 |


In [39]:
#use the dataframe.sem() function to find the standard error of the mean over the index axis.

# find standard error of the mean of all the columns
df.sem(axis = 0, numeric_only = True)


Number         0.746862
Age            0.206011
Weight         1.233459
Salary    247611.576815
dtype: float64

Notice that only the numeric columns appear in the result. Older pandas versions dropped non-numeric columns from sem() automatically; since pandas 2.0 the default is numeric_only=False, so pass numeric_only=True when the DataFrame contains non-numeric columns.
 
Example #2: Use sem() function to find the standard error of the mean over the column axis. Also do not skip the NaN values in the calculation of the dataframe.

In [40]:
# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.read_csv("nba.csv")
  
# Calculate the standard error of 
# the mean of all the rows in dataframe
df.sem(axis = 1, skipna = False, numeric_only = True)


0      1.932567e+06
1      1.698999e+06
2               NaN
3      2.871404e+05
4      1.249978e+06
           ...     
453    6.083135e+05
454    2.249810e+05
455    7.249748e+05
456    2.367956e+05
457             NaN
Length: 458, dtype: float64

With skipna=False, any row that contains a NaN in one of its numeric columns produces NaN in the result (for example, rows 2 and 457 above).

# Pandas Series.value_counts()



Pandas Series.value_counts() returns a Series containing the counts of unique values. The result is in descending order, so the first element is the most frequently occurring value. NA values are excluded by default.

Syntax: Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)


Parameter :
normalize : If True then the object returned will contain the relative frequencies of the unique values.
sort : Sort by values.
ascending : Sort in ascending order.
bins : Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.
dropna : Don’t include counts of NaN.

Returns : counts : Series
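A sketch of the normalize parameter on made-up data: with normalize=True the counts are divided by the total, so the result holds relative frequencies that sum to 1.

```python
import pandas as pd

# Made-up data for illustration
sr = pd.Series(['New York', 'Chicago', 'Lisbon', 'Chicago', 'Lisbon', 'Chicago'])

# normalize=True returns relative frequencies instead of raw counts
freq = sr.value_counts(normalize=True)
print(freq)
```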

Example #1: Use Series.value_counts() function to find the unique value counts of each element in the given Series object.

In [41]:
# importing pandas as pd
import pandas as pd
  
# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio', 'Chicago', 'Lisbon'])
  
# Print the series
print(sr)

0    New York
1     Chicago
2     Toronto
3      Lisbon
4         Rio
5     Chicago
6      Lisbon
dtype: object


In [42]:
#Now we will use Series.value_counts() function to find the values counts of each unique value in the given Series object.

# find the value counts
sr.value_counts()

Chicago     2
Lisbon      2
New York    1
Toronto     1
Rio         1
dtype: int64

In [43]:
sr.describe()

count           7
unique          5
top       Chicago
freq            2
dtype: object

In [44]:
#Use Series.value_counts() function to find the unique value counts of each element in the given Series object.


# importing pandas as pd
import pandas as pd
  
# Creating the Series
sr = pd.Series([100, 214, 325, 88, None, 325, None, 325, 100])
  
# Print the series
print(sr)



0    100.0
1    214.0
2    325.0
3     88.0
4      NaN
5    325.0
6      NaN
7    325.0
8    100.0
dtype: float64


In [45]:
sr.value_counts()

325.0    3
100.0    2
214.0    1
88.0     1
dtype: int64
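By default the NaN values are left out of the counts. A sketch on the same data showing dropna=False, which adds an entry for the missing values:

```python
import pandas as pd

sr = pd.Series([100, 214, 325, 88, None, 325, None, 325, 100])

# dropna=False keeps a count for NaN instead of excluding it
counts = sr.value_counts(dropna=False)
print(counts)
```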

As we can see in the output, the Series.value_counts() function has returned the value counts of each unique value in the given Series object.

# Pandas Index.value_counts()

Pandas Index.value_counts() returns a Series containing the counts of unique values. The result is in descending order, so the first element is the most frequently occurring value. NA values are excluded by default.
 

Syntax: Index.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Parameters : 
normalize : If True then the object returned will contain the relative frequencies of the unique values. 
sort : Sort by values 
ascending : Sort in ascending order 
bins : Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data 
dropna : Don’t include counts of NaN.
Returns : counts : Series 
 

Example #1: Use Index.value_counts() function to count the number of unique values in the given Index.

In [46]:
# importing pandas as pd
import pandas as pd
 
# Creating the index
idx = pd.Index(['Harry', 'Mike', 'Arther', 'Nick',
                'Harry', 'Arther'], name ='Student')
 
# Print the Index
print(idx)

Index(['Harry', 'Mike', 'Arther', 'Nick', 'Harry', 'Arther'], dtype='object', name='Student')


In [47]:

# find the count of unique values in the index
idx.value_counts()

Harry     2
Arther    2
Mike      1
Nick      1
Name: Student, dtype: int64

The function has returned the count of all unique values in the given index. Notice the object returned by the function contains the occurrence of the values in descending order. 

In [48]:
# importing pandas as pd
import pandas as pd
 
# Creating the index
idx = pd.Index([21, 10, 30, 40, 50, 10, 50])
 
# Print the Index
print(idx)

Int64Index([21, 10, 30, 40, 50, 10, 50], dtype='int64')


In [49]:
#Let’s count the occurrence of all the unique values in the Index.

# for finding the count of all
# unique values in the index.
idx.value_counts()

10    2
50    2
21    1
30    1
40    1
dtype: int64

The function has returned the count of all unique values in the index.