In this notebook, we will look at indexing, handling missing values, replacing values and reindexing in pandas.

## Notebook Contents

##### <span style="color:green">1. Indexing using .loc()</span>
##### <span style="color:green">2. Indexing using .iloc()</span>
##### <span style="color:green">3. Missing Values</span>
##### <span style="color:green">4. DataFrame.isnull()</span>
##### <span style="color:green">5. DataFrame.notnull()</span>
##### <span style="color:green">6. DataFrame.fillna()</span>
##### <span style="color:green">7. DataFrame.dropna()</span>
##### <span style="color:green">8. Replacing values</span>
##### <span style="color:green">9. Reindexing</span>

# Loading and viewing data

Before we start, let us load the daily data for the Bank Nifty February futures contract.

In [1]:
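import numpy as np
import pandas as pd

# Load the daily Bank Nifty February futures data (the path is specific to the author's machine)
df = pd.read_csv(r"C:\Users\ramsu\ltphd\banknifty_feb Future daily.csv")
print(df.head())   # first five rows
df.count()         # number of non-missing values in each column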

              Ticker          Date/Time      Open      High       Low  \
0  BANKNIFTY20FEBFUT  20-01-02 00:00:00  32444.85  32711.15  32394.00   
1  BANKNIFTY20FEBFUT  20-01-03 00:00:00  32510.00  32539.70  32184.40   
2  BANKNIFTY20FEBFUT  20-01-06 00:00:00  32080.20  32109.70  31403.95   
3  BANKNIFTY20FEBFUT  20-01-07 00:00:00  31740.30  32018.45  31410.00   
4  BANKNIFTY20FEBFUT  20-01-08 00:00:00  31191.95  31645.60  31067.15   

      Close  Volume  
0  32685.85   25920  
1  32266.85   37860  
2  31474.10   63980  
3  31598.55   60220  
4  31559.05   72340  


Ticker       39
Date/Time    39
Open         39
High         39
Low          39
Close        39
Volume       39
dtype: int64

In [2]:
df.shape

(39, 7)

## Indexing

In pandas, the index holds the axis labels of a DataFrame. Indexing uses these labels (or integer positions) to locate exactly the data we need, which is essential when analysing data. <br>

While studying indexing, we will also see how to slice and dice a DataFrame to select exactly the rows and columns we need.

## Indexing using .loc()

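'.loc' is a label-based indexer: it selects rows and columns by their labels (or by a boolean array). It is used with square brackets, with the row selection first and the column selection second, e.g. 'df.loc[rows, columns]'.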
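
To see '.loc' select by label rather than by position, here is a minimal sketch on a hand-built excerpt of the first five rows shown above (only the `Date/Time` and `Close` columns), with `Date/Time` made the index. The names `excerpt`, `close_on_jan3` and `window` are illustrative, not part of the original notebook.

```python
import pandas as pd

# Hand-built excerpt of the first five rows of the Bank Nifty data shown above
excerpt = pd.DataFrame({
    "Date/Time": ["20-01-02 00:00:00", "20-01-03 00:00:00", "20-01-06 00:00:00",
                  "20-01-07 00:00:00", "20-01-08 00:00:00"],
    "Close": [32685.85, 32266.85, 31474.10, 31598.55, 31559.05],
}).set_index("Date/Time")

# A single row label and a single column label return a scalar
close_on_jan3 = excerpt.loc["20-01-03 00:00:00", "Close"]

# A label slice includes BOTH endpoints, so this returns three rows
window = excerpt.loc["20-01-03 00:00:00":"20-01-07 00:00:00", "Close"]

print(close_on_jan3)
print(window)
```

Because the labels here are date strings rather than 0, 1, 2, the selection cannot be confused with positional indexing.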

In [3]:
# Using .loc[]

# Import pandas (aliased as pd) and NumPy (aliased as np)

import pandas as pd
import numpy as np

# Select all rows (:) of a single column; .head() shows only the first five.
# This is similar to slicing in NumPy, except that .loc selects by label.

print(df.loc[:, 'Close'].head())

0    32685.85
1    32266.85
2    31474.10
3    31598.55
4    31559.05
Name: Close, dtype: float64

In [4]:
# Select the first five rows (labels 0 to 4) and the columns from 'Close' to the end.

# Remember that .loc INCLUDES the stop label of a slice, for rows and columns alike.

# Observe that ':4' therefore returns 5 rows, labelled 0 to 4.

# The loc indexer takes the row arguments first and the column arguments second.

print(df.loc[:4, "Close":])

      Close  Volume
0  32685.85   25920
1  32266.85   37860
2  31474.10   63980
3  31598.55   60220
4  31559.05   72340


In [5]:
# Select the rows labelled 2 to 7 (inclusive) and all columns

print(df.loc[2:7, :])

              Ticker          Date/Time      Open      High       Low  \
2  BANKNIFTY20FEBFUT  20-01-06 00:00:00  32080.20  32109.70  31403.95   
3  BANKNIFTY20FEBFUT  20-01-07 00:00:00  31740.30  32018.45  31410.00   
4  BANKNIFTY20FEBFUT  20-01-08 00:00:00  31191.95  31645.60  31067.15   
5  BANKNIFTY20FEBFUT  20-01-09 00:00:00  31874.10  32307.00  31874.10   
6  BANKNIFTY20FEBFUT  20-01-10 00:00:00  32313.30  32489.55  32100.00   
7  BANKNIFTY20FEBFUT  20-01-13 00:00:00  32313.25  32490.00  32245.00   

      Close  Volume  
2  31474.10   63980  
3  31598.55   60220  
4  31559.05   72340  
5  32265.90   64720  
6  32231.90   48840  
7  32354.95   31440  


In [6]:
# Select specific rows and columns by passing lists of labels

print(df.loc[[1,2,3,4,7,8],["Date/Time","High","Low"]])

           Date/Time      High       Low
1  20-01-03 00:00:00  32539.70  32184.40
2  20-01-06 00:00:00  32109.70  31403.95
3  20-01-07 00:00:00  32018.45  31410.00
4  20-01-08 00:00:00  31645.60  31067.15
7  20-01-13 00:00:00  32490.00  32245.00
8  20-01-14 00:00:00  32363.25  32154.80


In [7]:
# Check, element-wise, whether each value in the columns Open to Close is greater than 32000

print(df.loc[:,"Open":"Close"]>32000)

     Open   High    Low  Close
0    True   True   True   True
1    True   True   True   True
2    True   True  False  False
3   False   True  False  False
4   False  False  False  False
5   False   True  False   True
6    True   True   True   True
7    True   True   True   True
8    True   True   True   True
9    True   True  False   True
10   True   True  False   True
11  False  False  False  False
12   True   True  False  False
13  False  False  False  False
14  False  False  False  False
15  False  False  False  False
16  False  False  False  False
17  False  False  False  False
18  False  False  False  False
19  False  False  False  False
20  False  False  False  False
21  False  False  False  False
22  False  False  False  False
23  False  False  False  False
24  False  False  False  False
25  False  False  False  False
26  False  False  False  False
27  False  False  False  False
28  False  False  False  False
29  False  False  False  False
30  False  False  False  False
31  Fals
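
The comparison above only produces a boolean mask. A minimal sketch of two common follow-ups, on a hand-built excerpt of rows 0 to 4 from the output above: '.all(axis=1)' answers whether every value in a row exceeds 32000, and passing a boolean Series to '.loc' keeps only the matching rows. The names `prices`, `mask`, `all_above` and `high_close` are illustrative.

```python
import pandas as pd

# Open/High/Low/Close for rows 0-4, copied from the df.head() output above
prices = pd.DataFrame({
    "Open":  [32444.85, 32510.00, 32080.20, 31740.30, 31191.95],
    "High":  [32711.15, 32539.70, 32109.70, 32018.45, 31645.60],
    "Low":   [32394.00, 32184.40, 31403.95, 31410.00, 31067.15],
    "Close": [32685.85, 32266.85, 31474.10, 31598.55, 31559.05],
})

# Element-wise comparison gives a boolean DataFrame of the same shape
mask = prices.loc[:, "Open":"Close"] > 32000

# True only for rows where all four prices are above 32000
all_above = mask.all(axis=1)

# A boolean Series inside .loc selects the rows where it is True
high_close = prices.loc[prices["Close"] > 32000, ["Open", "Close"]]

print(all_above)
print(high_close)
```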

## Indexing using .iloc()

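Another way to perform indexing is with the '.iloc' indexer, which selects purely by integer position.

It works much like NumPy array indexing: positions start at 0, negative positions count from the end, and, unlike '.loc', the stop position of a slice is excluded.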
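
The difference between the two indexers is easiest to see when the row labels are not simply 0, 1, 2 and so on. A minimal sketch, using a hypothetical DataFrame `df_demo` whose index starts at 10:

```python
import pandas as pd

# Hypothetical frame whose row labels (10, 11, ...) differ from positions (0, 1, ...)
df_demo = pd.DataFrame({"Close": [100.0, 101.0, 102.0, 103.0, 104.0]},
                       index=[10, 11, 12, 13, 14])

by_label = df_demo.loc[10:12]      # labels 10, 11 AND 12 (stop label included)
by_position = df_demo.iloc[0:2]    # positions 0 and 1 only (stop position excluded)
last_row = df_demo.iloc[-1]        # negative positions count from the end, as in NumPy

print(by_label)
print(by_position)
print(last_row)
```

With the default 0-based index used in this notebook, 'df.loc[:4]' and 'df.iloc[:5]' happen to return the same rows, which is why the two are easy to mix up.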

In [8]:
# Rows at positions 0-4 (stop position 5 excluded) and columns from position 4 (Low) to the end
print(df.iloc[:5, 4:])

        Low     Close  Volume
0  32394.00  32685.85   25920
1  32184.40  32266.85   37860
2  31403.95  31474.10   63980
3  31410.00  31598.55   60220
4  31067.15  31559.05   72340


In [9]:
# Select the rows at positions 1, 2, 3 and 7 and the columns at positions 1 to 4 (Date/Time to Low)

print(df.iloc[[1,2,3,7],[1,2,3,4]])

           Date/Time      Open      High       Low
1  20-01-03 00:00:00  32510.00  32539.70  32184.40
2  20-01-06 00:00:00  32080.20  32109.70  31403.95
3  20-01-07 00:00:00  31740.30  32018.45  31410.00
7  20-01-13 00:00:00  32313.25  32490.00  32245.00


## Missing values

Missing values are entries that are absent from a DataFrame; pandas represents them as 'NaN' (Not a Number). Most real-world DataFrames you work with will be large, and many of them will contain some missing values.

Hence, it is important to learn how to detect and handle them.

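## DataFrame.isnull()

This method returns a Boolean DataFrame of the same shape, with 'True' wherever a value is missing ('NaN') and 'False' elsewhere.

In [10]: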
df.isnull()

   Ticker  Date/Time   Open   High    Low  Close  Volume
0   False      False  False  False  False  False   False
1   False      False  False  False  False  False   False
2   False      False  False  False  False  False   False
3   False      False  False  False  False  False   False
4   False      False  False  False  False  False   False
5   False      False  False  False  False  False   False
6   False      False  False  False  False  False   False
7   False      False  False  False  False  False   False
8   False      False  False  False  False  False   False
9   False      False  False  False  False  False   False
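
Since 'df.count()' reported 39 non-missing values in every column of the 39-row file, this mask is all 'False'. A common next step is to sum the mask, because 'True' counts as 1. A minimal sketch on a small hypothetical DataFrame `data` with deliberately inserted `np.nan` values:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing entries
data = pd.DataFrame({
    "Open":  [100.0, np.nan, 102.0, 103.0],
    "Close": [101.0, 100.5, np.nan, np.nan],
})

# isnull() marks each missing cell with True; summing counts them per column
missing_per_column = data.isnull().sum()

# any(axis=1) flags rows that contain at least one missing value
rows_with_missing = data.isnull().any(axis=1)

print(missing_per_column)
print(rows_with_missing)
```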


## DataFrame.notnull()

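This method returns a Boolean DataFrame of the same shape as the original.<br>
<br>
It returns 'True' where the data point is not a 'NaN' (Not a Number) value and 'False' where it is, so it is the element-wise inverse of 'isnull()'. Missing data is represented by a NaN value.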

In [11]:
df.iloc[:,4:].notnull()

    Low  Close  Volume
0  True   True    True
1  True   True    True
2  True   True    True
3  True   True    True
4  True   True    True
5  True   True    True
6  True   True    True
7  True   True    True
8  True   True    True
9  True   True    True
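
A frequent use of 'notnull()' is as a row filter. A minimal sketch with a hypothetical `Close` column containing one `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices with one missing value
data = pd.DataFrame({"Close": [101.0, np.nan, 99.5]})

# notnull() is exactly the negation of isnull()
assert (data.notnull() == ~data.isnull()).all().all()

# Keep only the rows whose Close value is present
present = data.loc[data["Close"].notnull()]

print(present)
```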


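pandas provides two methods for handling missing values: 'fillna()' replaces each 'NaN' with a value of our choice, and 'dropna()' removes the rows or columns that contain 'NaN' values.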
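
A minimal sketch of 'fillna()' on a hypothetical price column with gaps: filling with a constant, filling with the column mean, and forward-filling with 'ffill()', which carries the last valid observation forward. (Older code often writes the latter as 'fillna(method="ffill")', which recent pandas versions deprecate.)

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices with two gaps
data = pd.DataFrame({"Close": [101.0, np.nan, 99.5, np.nan]})

# Replace every NaN with a fixed value
filled_zero = data.fillna(0)

# Carry the last valid value forward into each gap
filled_forward = data.ffill()

# Fill with a per-column statistic: the mean of the non-missing values
filled_mean = data.fillna(data.mean())

print(filled_zero)
print(filled_forward)
print(filled_mean)
```

Like most pandas methods, 'fillna()' returns a new DataFrame and leaves 'data' unchanged.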

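For example, 'df.dropna(axis=1)' drops every column that contains at least one 'NaN'; with the default 'axis=0', 'df.dropna()' drops such rows instead.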
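
A minimal sketch of the main 'dropna()' options on a hypothetical DataFrame, where column `B` has one gap and column `C` is entirely empty:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "B" has one missing value, "C" has only missing values
data = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, np.nan, 6.0],
    "C": [np.nan, np.nan, np.nan],
})

# axis=1: drop every column with at least one NaN (B and C)
no_nan_columns = data.dropna(axis=1)

# Default axis=0: every row has a NaN in C, so every row is dropped
no_nan_rows = data.dropna()

# how="all": drop only the columns in which every value is NaN (C)
drop_empty_cols = data.dropna(axis=1, how="all")

# subset: only look for NaN in column B when deciding which rows to drop
subset_rows = data.dropna(subset=["B"])

print(no_nan_columns)
print(drop_empty_cols)
print(subset_rows)
```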

## Replacing values

'DataFrame.replace()' finds every occurrence of a given value anywhere in the DataFrame and replaces it with a value of our choice.

In [12]:
# Build a small DataFrame from a dictionary of lists
my = {'one': [10, 20, 30, 40, 50, 2000], 'two': [1000, 0, 30, 40, 50, 60]}
df1 = pd.DataFrame(my)

print(df1)
type(my)  # the source data is a plain Python dict

    one   two
0    10  1000
1    20     0
2    30    30
3    40    40
4    50    50
5  2000    60


dict

In [13]:
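# Map each old value to a new one: every 2000 becomes 20 and every 1000 becomes 10
print(df1.replace({2000: 20, 1000: 10}))
type({2000: 20, 1000: 10})  # the mapping is a plain Python dict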

   one  two
0   10   10
1   20    0
2   30   30
3   40   40
4   50   50
5   20   60


dict
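
Besides a flat '{old: new}' dictionary, 'replace()' accepts other forms. A minimal sketch, rebuilding the same 'df1' so that it runs on its own: a list of values mapped to a single replacement, and a nested dictionary that restricts the replacement to one column.

```python
import pandas as pd

# Same data as df1 above
df1 = pd.DataFrame({"one": [10, 20, 30, 40, 50, 2000],
                    "two": [1000, 0, 30, 40, 50, 60]})

# Every occurrence of 2000 or 1000, in any column, becomes 0
list_form = df1.replace([2000, 1000], 0)

# Nested dict: look only in column "two" and replace 0 with 1
column_form = df1.replace({"two": {0: 1}})

# replace() returns a new DataFrame; df1 itself is unchanged
print(list_form)
print(column_form)
```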

## Reindexing 

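Reindexing conforms the rows and/or columns of a DataFrame to a new set of labels.<br> 
<br> 
To reindex means to conform the data to match a given set of labels along a particular axis: labels that already exist keep their data, existing labels that are not in the new set are dropped, and new labels are filled with 'NaN'.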

In [14]:
df.head()

              Ticker          Date/Time      Open      High       Low     Close  Volume
0  BANKNIFTY20FEBFUT  20-01-02 00:00:00  32444.85  32711.15  32394.00  32685.85   25920
1  BANKNIFTY20FEBFUT  20-01-03 00:00:00  32510.00  32539.70  32184.40  32266.85   37860
2  BANKNIFTY20FEBFUT  20-01-06 00:00:00  32080.20  32109.70  31403.95  31474.10   63980
3  BANKNIFTY20FEBFUT  20-01-07 00:00:00  31740.30  32018.45  31410.00  31598.55   60220
4  BANKNIFTY20FEBFUT  20-01-08 00:00:00  31191.95  31645.60  31067.15  31559.05   72340


In [16]:
df.reindex(index=[0,2,4,6,8],columns=["Open","Close"])

       Open     Close
0  32444.85  32685.85
2  32080.20  31474.10
4  31191.95  31559.05
6  32313.30  32231.90
8  32307.75  32282.80
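
The reindex above only asks for labels that already exist. A minimal sketch, on a hypothetical two-row DataFrame `small`, of what happens with labels that do not exist: they are introduced and filled with 'NaN', unless 'fill_value' is given.

```python
import pandas as pd

# Hypothetical frame with row labels 0 and 1
small = pd.DataFrame({"Open": [100.0, 101.0], "Close": [100.5, 101.5]})

# Row 2 and column "Volume" are new labels, so their cells become NaN;
# the "Open" column is not requested, so it is dropped
expanded = small.reindex(index=[0, 1, 2], columns=["Close", "Volume"])

# fill_value supplies a value for the new cells instead of NaN
expanded_zero = small.reindex(index=[0, 1, 2], columns=["Close", "Volume"], fill_value=0)

print(expanded)
print(expanded_zero)
```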
