# Pandas

* Pandas is an open-source Python library built mainly for working with relational or labeled data easily and intuitively.
* It provides various data structures and operations for manipulating numerical data and time series.
* For more information, see https://pandas.pydata.org/
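As a small illustration of the "time series" point above, here is a minimal sketch of a date-indexed `Series`. The variable names and temperature values are made up for the example.

```python
import pandas as pd

# A Series is a one-dimensional labeled array; here the labels are dates.
# The readings below are illustrative values, not real data.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
temps = pd.Series([20.5, 21.0, 19.8, 22.1], index=dates, name="temp_c")

# Look up one reading by its date label
first = temps.loc[pd.Timestamp("2024-01-01")]

# Aggregate over the whole series
avg = temps.mean()

print(temps)
print(first, avg)
```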

In [1]:
# Import the libraries

import pandas as pd
import numpy as np

#### So, what is a DataFrame?

* A DataFrame is a data structure that organizes data into a 2-D table of labeled rows and columns.
* Reference - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

![image-2.png](attachment:image-2.png)
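To make the rows-and-columns picture concrete, here is a small sketch that builds a frame from a dictionary of columns. The name `df_from_dict` is only used for this illustration.

```python
import pandas as pd

# Each dict key becomes a column label; each list holds that column's values.
# The index argument supplies the row labels.
df_from_dict = pd.DataFrame(
    {"Col_A": [1, 4, 7], "Col_B": [2, 5, 8], "Col_C": [3, 6, 9]},
    index=["Row_A", "Row_B", "Row_C"],
)

print(df_from_dict.index.tolist())    # row labels
print(df_from_dict.columns.tolist())  # column labels
```

This is one of several ways to construct a DataFrame; passing a list of rows works as well.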

In [2]:
# Creating a dataframe 
# Syntax - pd.DataFrame(data, index, columns, dtype, copy)

df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['Row_A', 'Row_B', 'Row_C'], columns=['Col_A', 'Col_B', 'Col_C'])

# Viewing a dataframe using head(n), where n is the number of rows to display
# By default it displays the first 5 rows (here all 3, since the frame has fewer than 5).
df.head()

       Col_A  Col_B  Col_C
Row_A      1      2      3
Row_B      4      5      6
Row_C      7      8      9


In [3]:
# Viewing a dataframe using tail(n), where n is the number of rows to display
# By default it displays the last 5 rows (here all 3, since the frame has fewer than 5).
df.tail()

       Col_A  Col_B  Col_C
Row_A      1      2      3
Row_B      4      5      6
Row_C      7      8      9
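Because this frame has only three rows, `head()` and `tail()` both return everything. A short sketch with an explicit `n` shows the difference more clearly. The frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["Row_A", "Row_B", "Row_C"],
    columns=["Col_A", "Col_B", "Col_C"],
)

first_two = df.head(2)  # rows Row_A and Row_B
last_one = df.tail(1)   # row Row_C only

print(first_two)
print(last_one)
```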


* There are two main ways of accessing elements of the dataframe:
    1. .loc (label-based): selects rows and columns by their **index labels** and **column names**
    2. .iloc (integer-location-based): selects rows and columns by their **integer positions**, similar to accessing elements in an n-D array
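The two accessors can be compared side by side. The sketch below reads the same cell once by label and once by position; the frame is rebuilt so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["Row_A", "Row_B", "Row_C"],
    columns=["Col_A", "Col_B", "Col_C"],
)

# .loc takes labels: row label first, then column label
by_label = df.loc["Row_B", "Col_C"]

# .iloc takes zero-based integer positions: row position, then column position
by_position = df.iloc[1, 2]

print(by_label, by_position)  # both refer to the cell holding 6
```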

In [4]:
# Accessing a row of the dataframe by its label using 'loc'

df.loc['Row_B']

Col_A    4
Col_B    5
Col_C    6
Name: Row_B, dtype: int64

In [5]:
print(type(df.loc['Row_B']))

<class 'pandas.core.series.Series'>
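Selecting a single row label returns a `Series`, as the output above shows. Passing a list of labels instead keeps the result two-dimensional; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["Row_A", "Row_B", "Row_C"],
    columns=["Col_A", "Col_B", "Col_C"],
)

row_as_series = df.loc["Row_B"]   # single label -> Series
row_as_frame = df.loc[["Row_B"]]  # list of labels -> DataFrame with one row

print(type(row_as_series))
print(type(row_as_frame))
```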


In [6]:
# Accessing elements of dataframe using 'iloc'

df.iloc[0:2, 1:2] # Rows 0-1 and column 1; slice ends are excluded, as in an n-D array

       Col_B
Row_A      2
Row_B      5
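One detail worth keeping in mind: `iloc` slices exclude the end position, while `loc` slices include the end label. The sketch below selects the same block both ways.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["Row_A", "Row_B", "Row_C"],
    columns=["Col_A", "Col_B", "Col_C"],
)

# Positions 0 and 1 for rows, position 1 for columns (end positions excluded)
by_position = df.iloc[0:2, 1:2]

# Labels Row_A through Row_B, column Col_B (end labels included)
by_label = df.loc["Row_A":"Row_B", "Col_B":"Col_B"]

print(by_position.equals(by_label))
```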


In [7]:
# Shape of dataframe
# Returns a tuple: (number of rows, number of columns)

df.shape 

(3, 3)
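Since `shape` is a plain tuple, it can be unpacked directly. A small sketch, with the frame rebuilt for self-containment:

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["Row_A", "Row_B", "Row_C"],
    columns=["Col_A", "Col_B", "Col_C"],
)

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
total_cells = df.size      # number of cells, i.e. rows * columns

print(n_rows, n_cols, total_cells)
```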

In [8]:
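# Convert a slice of the dataframe to a NumPy array
# (.to_numpy() is the equivalent that the pandas docs currently recommend over .values)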

df.iloc[1:3, 0:2].values

array([[4, 5],
       [7, 8]], dtype=int64)
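The same conversion can be written with `to_numpy()`. This sketch assumes a reasonably recent pandas release (the method was added in 0.24).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["Row_A", "Row_B", "Row_C"],
    columns=["Col_A", "Col_B", "Col_C"],
)

# Rows at positions 1-2, columns at positions 0-1, as a NumPy array
arr = df.iloc[1:3, 0:2].to_numpy()

print(arr)
print(type(arr))
```

The row and column labels are dropped in the conversion; only the values remain.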

In [9]:
# Counting how many times each unique value occurs in a column

df['Col_B'].value_counts()

2    1
5    1
8    1
Name: Col_B, dtype: int64
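Every value in `Col_B` occurs once, so the counts above are all 1. A sketch with repeated values shows the method's behavior more clearly; the `colors` data is invented for the example.

```python
import pandas as pd

# Illustrative data with repeats
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

counts = colors.value_counts()                # sorted from most to least frequent
shares = colors.value_counts(normalize=True)  # relative frequencies instead of counts

print(counts)
print(shares)
```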

In [10]:
# Counting the null values in each column of the dataframe

df.isnull().sum()

Col_A    0
Col_B    0
Col_C    0
dtype: int64
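The frame above has no missing data, so every count is 0. The following sketch uses a hypothetical frame, `df_missing`, with two `NaN` entries to show non-zero results.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing values
df_missing = pd.DataFrame(
    {
        "Col_A": [1.0, np.nan, 7.0],
        "Col_B": [2.0, 5.0, np.nan],
        "Col_C": [3.0, 6.0, 9.0],
    }
)

per_column = df_missing.isnull().sum()            # missing values per column
total_missing = int(per_column.sum())             # missing values overall
any_missing = bool(df_missing.isnull().values.any())  # True if anything is missing

print(per_column)
print(total_missing, any_missing)
```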

In [11]:
# Listing the unique values in a column

df['Col_A'].unique()

array([1, 4, 7], dtype=int64)
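When only the number of distinct values matters, `nunique()` avoids building the array. A short sketch on made-up data with duplicates:

```python
import pandas as pd

# Illustrative values with duplicates
values = pd.Series([1, 4, 4, 7, 1])

distinct = values.unique()      # distinct values, in order of first appearance
n_distinct = values.nunique()   # how many distinct values there are

print(distinct, n_distinct)
```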