
# **Functions & Description**
Let us now look at the descriptive statistics functions in Python Pandas. The following list shows the most important ones:

1.	count()	:Number of non-null observations
2.	sum()	:Sum of values
3.	mean()	:Mean of Values
4.	median()	:Median of Values
5.	mode()	:Mode of values
6.	std()	:Standard Deviation of the Values
7.	min()	:Minimum Value
8.	max()	:Maximum Value
9.	abs()	:Absolute Value
10.	prod()	:Product of Values
11.	cumsum()	:Cumulative Sum
12.	cumprod()	:Cumulative Product

Note − Because a DataFrame is a heterogeneous data structure, not every function works with every column type.

•	Functions like sum() and cumsum() work with both numeric and character (or string) data without raising an error. Character aggregations are rarely used in practice, but these functions do not throw an exception.

•	Functions like abs() and cumprod() throw an exception when the DataFrame contains character or string data, because such operations cannot be performed on strings.
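
The difference between the two bullets can be seen on a small object-dtype Series. This is a minimal sketch: the data and variable names are illustrative and are not part of the notebook, and the exact exception text may differ between pandas versions.

```python
import pandas as pd

# Illustrative object-dtype Series of strings (not taken from the examples below)
s = pd.Series(["a", "b", "c"], dtype=object)

# sum() and cumsum() fall back to string concatenation
total = s.sum()
running = s.cumsum().tolist()
print(total)
print(running)

# abs() has no meaning for strings, so pandas raises a TypeError
try:
    s.abs()
    abs_failed = False
except TypeError:
    abs_failed = True
print(abs_failed)
```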


In [1]:
#max()
import pandas as pd
df = pd.DataFrame({"A":[12, 4, 5, 44, 1], 
                   "B":[5, 2, 54, 3, 2], 
                   "C":[20, 16, 7, 3, 8],  
                   "D":[14, 3, 17, 2, 6]}) 
print(df)
print(df.max())


    A   B   C   D
0  12   5  20  14
1   4   2  16   3
2   5  54   7  17
3  44   3   3   2
4   1   2   8   6
A    44
B    54
C    20
D    17
dtype: int64


In [2]:
print(df.max(axis=0)) #gets max of each column

A    44
B    54
C    20
D    17
dtype: int64


In [3]:
print(df.max(axis=1)) #gets maximum of each row

0    20
1    16
2    54
3    44
4     8
dtype: int64


In [4]:
df = pd.DataFrame({"A":[12, 4, 5, None, 1],  
                   "B":[7, 2, 54, 3, None], 
                   "C":[20, 16, 11, 3, 8], 
                   "D":[14, 3, None, 2, 6]}) 

df.max()

A    12.0
B    54.0
C    20.0
D    14.0
dtype: float64

In [5]:
df = pd.DataFrame({"A":[12, 4, 5, None, 1],  
                   "B":[7, 2, 54, None, 3], 
                   "C":[20, 16, 11, None, 8], 
                   "D":[14, 3, 2, None, 6]}) 

df.max()

A    12.0
B    54.0
C    20.0
D    14.0
dtype: float64

In [6]:
df.max(axis=1)

0    20.0
1    16.0
2    54.0
3     NaN
4     8.0
dtype: float64

In [7]:
# skip the NaN values while finding the maximum 
df.max(axis = 0, skipna = True)

A    12.0
B    54.0
C    20.0
D    14.0
dtype: float64
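
For comparison, here is a small sketch of what happens when skipna is turned off. It assumes a reduced two-column version of the frame above; with skipna=False, any column that contains a missing value reports NaN as its maximum.

```python
import numpy as np
import pandas as pd

# Reduced version of the frame above: each column holds one missing value
df_nan = pd.DataFrame({"A": [12, 4, 5, None, 1],
                       "B": [7, 2, 54, None, 3]})

with_skip = df_nan.max(skipna=True)      # NaN ignored (the default)
without_skip = df_nan.max(skipna=False)  # NaN propagates to the result

print(with_skip)
print(without_skip)
```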

In [9]:
#std()
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}
#Create a DataFrame (must happen before printing it)
df = pd.DataFrame(d)
print(df)
#Standard deviation of the numeric columns only; 'Name' holds strings
print(df.std(numeric_only=True))


      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Smith   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65
Age       9.232682
Rating    0.661628
dtype: float64
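
The value 9.232682 is the sample standard deviation, because pandas divides by n - 1 (ddof=1) by default. The sketch below contrasts it with the population standard deviation (ddof=0), which is also what NumPy computes by default. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

sample_std = ages.std()               # pandas default: ddof=1
population_std = ages.std(ddof=0)     # divide by n instead of n - 1
numpy_std = np.std(ages.to_numpy())   # NumPy default: ddof=0

print(sample_std, population_std, numpy_std)
```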


# **Summarizing Data**

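describe() computes summary statistics for the columns of a DataFrame. By default it covers only the numeric columns; the include argument selects other column types.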


In [10]:
#Create a Dictionary of series
d = {'Name':pd.Series(['Tina','Ria','Amit','Sumit','Manish','Mayank','Vishal','Vardan','Rahul,','Vikesh','Priyank','Bhavesh']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())


             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000


In [11]:

print(df.describe(include=['object']))

           Name
count        12
unique       12
top     Bhavesh
freq          1


In [12]:
#Create a Dictionary of series
d = {'Name':pd.Series(['Tina','Ria','Amit','Sumit','Bhavesh','Mayank','Vishal','Vardan','Rahul,','Vikesh','Priyank','Bhavesh']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe(include=['object']))

           Name
count        12
unique       11
top     Bhavesh
freq          2


In [13]:
print(df.describe(include='all'))

           Name        Age     Rating
count        12  12.000000  12.000000
unique       11        NaN        NaN
top     Bhavesh        NaN        NaN
freq          2        NaN        NaN
mean        NaN  31.833333   3.743333
std         NaN   9.232682   0.661628
min         NaN  23.000000   2.560000
25%         NaN  25.000000   3.230000
50%         NaN  29.500000   3.790000
75%         NaN  35.500000   4.132500
max         NaN  51.000000   4.800000
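
Each row of the describe() output can be reproduced with the individual functions listed earlier. The following is a sketch under the assumption that only the Name and Age columns of the frame above are needed; the variable names are illustrative.

```python
import pandas as pd

df_people = pd.DataFrame({
    "Name": ["Tina", "Ria", "Amit", "Sumit", "Bhavesh", "Mayank", "Vishal",
             "Vardan", "Rahul,", "Vikesh", "Priyank", "Bhavesh"],
    "Age": [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
})

# Summaries computed by describe() on each column
name_summary = df_people["Name"].describe()
age_summary = df_people["Age"].describe()

# The same numbers obtained from the individual functions
unique_names = df_people["Name"].nunique()
top_freq = int(df_people["Name"].value_counts().max())
mean_age = df_people["Age"].mean()

print(name_summary)
print(age_summary)
```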


# **Quantile**

The word “quantile” comes from the word quantity. In simple terms, quantiles are cut points that divide a sample into equal-sized, adjacent subgroups (which is why a quantile is sometimes called a “fractile”). The term can also refer to dividing a probability distribution into areas of equal probability.

In [14]:
#Create a Dictionary of series
d = {'Name':pd.Series(['Tina','Ria','Amit','Sumit','Manish','Mayank','Vishal','Vardan','Rahul,','Vikesh','Priyank','Bhavesh']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
#quantile of 0.2 for each column

# numeric_only=True skips the string column 'Name'
df.quantile(.2, axis = 0, numeric_only = True)


Age       25.000
Rating     3.208
Name: 0.2, dtype: float64
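
The Rating value 3.208 does not appear in the data because quantile() interpolates linearly between neighbouring sorted values by default. The sketch below reproduces it by hand; linear_quantile is a hypothetical helper written for this illustration, not a pandas function.

```python
import math
import pandas as pd

ratings = pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8,
                     3.78, 2.98, 4.80, 4.10, 3.65])

def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)   # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

manual = linear_quantile(ratings.tolist(), 0.2)
builtin = ratings.quantile(0.2)
print(manual, builtin)
```

Here the position is 0.2 × 11 = 2.2, so the result lies 20% of the way from the third sorted value (3.20) to the fourth (3.24).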

In [16]:
# find (0.1 0.25 0.5 0.75) quantiles along the index axis.
print(df)
df.quantile([.1, .25, .5, .75], axis = 0, numeric_only = True)


       Name  Age  Rating
0      Tina   25    4.23
1       Ria   26    3.24
2      Amit   25    3.98
3     Sumit   23    2.56
4    Manish   30    3.20
5    Mayank   29    4.60
6    Vishal   23    3.80
7    Vardan   34    3.78
8    Rahul,   40    2.98
9    Vikesh   30    4.80
10  Priyank   51    4.10
11  Bhavesh   46    3.65


       Age  Rating
0.10  23.2  3.0020
0.25  25.0  3.2300
0.50  29.5  3.7900
0.75  35.5  4.1325


# **Pivot**

Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame. This function does not support data aggregation; multiple values will result in a MultiIndex in the columns.

# **Parameters:**
index : string or object, optional
Column to use to make new frame’s index. If None, uses existing index.

columns : string or object
Column to use to make new frame’s columns.

values : string, object or a list of the previous, optional
Column(s) to use for populating new frame’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.

Changed in version 0.23.0: Also accept list of column names.

# **Returns:**
DataFrame
Returns reshaped DataFrame.

# **Raises:**
ValueError: When there are any index, columns combinations with multiple values. Use DataFrame.pivot_table when you need to aggregate.



In [19]:
#Create a Dictionary of series
d = {'Name':pd.Series(['Tina','Ria','Amit','Sumit','Manish','Mayank','Vishal','Vardan','Rahul,','Vikesh','Priyank','Bhavesh']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
# Keyword arguments are required by recent pandas versions
p = df.pivot(index='Name', columns='Age', values='Rating')
print(p)
print(type(p))

Age        23    25    26   29   30    34    40    46   51
Name                                                      
Amit      NaN  3.98   NaN  NaN  NaN   NaN   NaN   NaN  NaN
Bhavesh   NaN   NaN   NaN  NaN  NaN   NaN   NaN  3.65  NaN
Manish    NaN   NaN   NaN  NaN  3.2   NaN   NaN   NaN  NaN
Mayank    NaN   NaN   NaN  4.6  NaN   NaN   NaN   NaN  NaN
Priyank   NaN   NaN   NaN  NaN  NaN   NaN   NaN   NaN  4.1
Rahul,    NaN   NaN   NaN  NaN  NaN   NaN  2.98   NaN  NaN
Ria       NaN   NaN  3.24  NaN  NaN   NaN   NaN   NaN  NaN
Sumit    2.56   NaN   NaN  NaN  NaN   NaN   NaN   NaN  NaN
Tina      NaN  4.23   NaN  NaN  NaN   NaN   NaN   NaN  NaN
Vardan    NaN   NaN   NaN  NaN  NaN  3.78   NaN   NaN  NaN
Vikesh    NaN   NaN   NaN  NaN  4.8   NaN   NaN   NaN  NaN
Vishal   3.80   NaN   NaN  NaN  NaN   NaN   NaN   NaN  NaN
<class 'pandas.core.frame.DataFrame'>


In [20]:
df = pd.DataFrame({'A': ['John', 'Boby', 'Mina'], 
      'B': ['Masters', 'Graduate', 'Graduate'], 
      'C': [27, 23, 21]}) 
print(df)

      A         B   C
0  John   Masters  27
1  Boby  Graduate  23
2  Mina  Graduate  21


In [21]:
p = df.pivot(index='A', columns='B', values='C')
print(p)

B     Graduate  Masters
A                      
Boby      23.0      NaN
John       NaN     27.0
Mina      21.0      NaN


In [22]:
print(type(p))

<class 'pandas.core.frame.DataFrame'>


In [23]:
p=df.pivot(index ='A', columns ='B', values =['C', 'A'])
print(p)

            C                A        
B    Graduate Masters Graduate Masters
A                                     
Boby       23     NaN     Boby     NaN
John      NaN      27      NaN    John
Mina       21     NaN     Mina     NaN


In [25]:
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two','two'],
                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                    'baz': [1, 2, 3, 4, 5, 6],
                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

print(df)


   foo bar  baz zoo
0  one   A    1   x
1  one   B    2   y
2  one   C    3   z
3  two   A    4   q
4  two   B    5   w
5  two   C    6   t


In [26]:
df.pivot(index='foo', columns='bar', values='baz')

bar  A  B  C
foo         
one  1  2  3
two  4  5  6


In [27]:
# Pivot on all remaining columns, then select the 'baz' block
p = df.pivot(index='foo', columns='bar')['baz']
print(p)

bar  A  B  C
foo         
one  1  2  3
two  4  5  6


In [28]:
# A list of values produces hierarchically indexed (MultiIndex) columns
p = df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
print(p)

    baz       zoo      
bar   A  B  C   A  B  C
foo                    
one   1  2  3   x  y  z
two   4  5  6   q  w  t
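
When values is a list, the resulting columns have two levels: the value name and the original column value. The following sketch, which rebuilds the foo/bar/baz/zoo frame so that it runs on its own, shows one way to select from such a result. The variable names are illustrative.

```python
import pandas as pd

df_wide_src = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                            'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                            'baz': [1, 2, 3, 4, 5, 6],
                            'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

wide = df_wide_src.pivot(index='foo', columns='bar', values=['baz', 'zoo'])

# Top-level selection returns the sub-table for one value column
baz_block = wide['baz']

# A (value, column) tuple addresses a single cell
single = wide.loc['one', ('zoo', 'B')]

print(baz_block)
print(single)
```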


In [30]:
df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'], "bar": ['A', 'A', 'B', 'C'], "baz": [1, 2, 3, 4]})
print(df)

   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4
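
In this frame the combination foo='one', bar='A' occurs twice, which is the case described under Raises above. The sketch below shows the ValueError from pivot() and then uses pivot_table() to aggregate the duplicates. The choice of aggfunc='sum' is an assumption made for illustration; any aggregation could be used.

```python
import pandas as pd

df_dup = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                       "bar": ['A', 'A', 'B', 'C'],
                       "baz": [1, 2, 3, 4]})

# pivot() cannot decide which of the duplicate values to keep
try:
    df_dup.pivot(index='foo', columns='bar', values='baz')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# pivot_table() resolves duplicates by aggregating them (here: summing)
agg = df_dup.pivot_table(index='foo', columns='bar', values='baz', aggfunc='sum')
print(agg)
```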


In [39]:
d = {'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}
df = pd.DataFrame(d)
print(df)
df.pivot(index='Age', columns='Rating')

    Age  Rating
0    25    4.23
1    26    3.24
2    25    3.98
3    23    2.56
4    30    3.20
5    29    4.60
6    23    3.80
7    34    3.78
8    40    2.98
9    30    4.80
10   51    4.10
11   46    3.65


Age
23
25
26
29
30
34
40
46
51


In [32]:
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two','two'],
                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                    'baz': [1, 2, 3, 4, 5, 6],
                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

print(df)

   foo bar  baz zoo
0  one   A    1   x
1  one   B    2   y
2  one   C    3   z
3  two   A    4   q
4  two   B    5   w
5  two   C    6   t


In [33]:
m = df.min()
print(m)

foo    one
bar      A
baz      1
zoo      q
dtype: object


In [40]:
d = {'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}
df = pd.DataFrame(d)
p = df.abs()
print(p)

     Age  Rating
0   25.0    4.23
1   26.0    3.24
2   25.0    3.98
3   23.0    2.56
4   30.0    3.20
5   29.0    4.60
6   23.0    3.80
7   34.0    3.78
8   40.0    2.98
9   30.0    4.80
10  51.0    4.10
11  46.0    3.65


In [41]:
pr = df.prod()
print(pr)

Age       7.158408e+17
Rating    6.320128e+06
dtype: float64


In [42]:
cpr = df.cumprod()
print(cpr)

                   Age        Rating
0                   25  4.230000e+00
1                  650  1.370520e+01
2                16250  5.454670e+01
3               373750  1.396395e+02
4             11212500  4.468465e+02
5            325162500  2.055494e+03
6           7478737500  7.810877e+03
7         254277075000  2.952512e+04
8       10171083000000  8.798485e+04
9      305132490000000  4.223273e+05
10   15561756990000000  1.731542e+06
11  715840821540000000  6.320128e+06


In [43]:
cs = df.cumsum()
print(cs)

    Age  Rating
0    25    4.23
1    51    7.47
2    76   11.45
3    99   14.01
4   129   17.21
5   158   21.81
6   181   25.61
7   215   29.39
8   255   32.37
9   285   37.17
10  336   41.27
11  382   44.92
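
The last row of a cumulative result equals the corresponding full aggregate: the final cumsum() row matches sum(), and the final cumprod() row matches prod(). Here is a short check of that relationship on the same Age and Rating data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df_num = pd.DataFrame({
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Final cumulative values compared with the one-shot aggregates
sum_matches = np.allclose(df_num.cumsum().iloc[-1].to_numpy(dtype=float),
                          df_num.sum().to_numpy(dtype=float))
prod_matches = np.allclose(df_num.cumprod().iloc[-1].to_numpy(dtype=float),
                           df_num.prod().to_numpy(dtype=float))
age_total = int(df_num['Age'].sum())

print(sum_matches, prod_matches, age_total)
```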


In [44]:
m = df.mean()
print(m)

Age       31.833333
Rating     3.743333
dtype: float64


In [45]:
md = df.median()
print(md)

Age       29.50
Rating     3.79
dtype: float64


In [46]:
m = df.mode()
print(m)

     Age  Rating
0   23.0    2.56
1   25.0    2.98
2   30.0    3.20
3    NaN    3.24
4    NaN    3.65
5    NaN    3.78
6    NaN    3.80
7    NaN    3.98
8    NaN    4.10
9    NaN    4.23
10   NaN    4.60
11   NaN    4.80
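
The NaN padding in the Age column appears because mode() returns every value tied for the highest frequency, and the columns have different numbers of such values. Age has three values that occur twice, while every Rating occurs once, so all twelve ratings are modes. A small sketch on the Age column alone, with illustrative variable names:

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

# All values tied for the highest count, in sorted order
age_modes = ages.mode().tolist()

# value_counts() shows the frequencies behind that result
counts = ages.value_counts()
highest_count = int(counts.max())

print(age_modes)
print(highest_count)
```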
