# Introduction to Pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

In [1]:
import pandas as pd

In [2]:
# checking version
pd.__version__

'2.1.3'

# What kind of data does pandas handle?

When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you to explore, clean, and process your data.

pandas has two main data structures:

* DataFrame : two-dimensional (rows and columns)

* Series : one-dimensional

In [3]:
# Series objects from list
name = ['Sita','Rita','Gita','Hari','Ram']
s = pd.Series(name)
s

0    Sita
1    Rita
2    Gita
3    Hari
4     Ram
dtype: object

In [4]:
# Check the dimension
s.ndim

1

In [5]:
# Check the shape (a Series has a single axis, so only the number of rows is reported)
s.shape

(5,)

In [6]:
s.dtype  # 'O' means the object dtype (used here for Python strings)

dtype('O')

* pandas automatically creates an index for the values of a Series.

* The values 'Sita' to 'Ram' are the data supplied by the user.

* A Series therefore consists of two parts: an index and the values.
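The pairing of index and values described above can be made visible by passing the index explicitly. This is only a minimal sketch: the names `names`, `s_default` and `s_explicit` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

names = ['Sita', 'Rita', 'Gita', 'Hari', 'Ram']

# Default: pandas generates the index 0..4 automatically
s_default = pd.Series(names)

# Equivalent: pass the same index labels explicitly
s_explicit = pd.Series(names, index=range(len(names)))

# Both Series carry the same index labels and the same values
print(s_default.equals(s_explicit))

# items() yields each index label together with its value
for label, value in s_default.items():
    print(label, value)
```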

In [7]:
s

0    Sita
1    Rita
2    Gita
3    Hari
4     Ram
dtype: object

In [8]:
# Inspect the index (a RangeIndex by default)
s.index

RangeIndex(start=0, stop=5, step=1)

In [9]:
# Convert the index of the Series into a list
s.index.tolist()

[0, 1, 2, 3, 4]

In [10]:
# The values of the Series are stored in a NumPy array
s.values

array(['Sita', 'Rita', 'Gita', 'Hari', 'Ram'], dtype=object)

In [11]:
# Convert the values of the Series into a list
s.values.tolist()

['Sita', 'Rita', 'Gita', 'Hari', 'Ram']

In [12]:
# 3 is an index label; use it to look up the corresponding value of the Series
s[3]


'Hari'

In [13]:
# s.values is a NumPy array, so we can index it by position to get any value
s.values[3]

'Hari'

In [14]:
# first 3 values of Series 
s.values[:3]

array(['Sita', 'Rita', 'Gita'], dtype=object)

In [15]:
# first 3 items of the Series (index and values)
s[:3]

0    Sita
1    Rita
2    Gita
dtype: object

# We can define our own index in Series

In [16]:
s1 = pd.Series(['BMW','Mercedes','Audi','Bentley'],index = [100,120,99,105]) # user-defined index labels (arbitrary values)
s1

100         BMW
120    Mercedes
99         Audi
105     Bentley
dtype: object

In [17]:
# This fails because s1 has no index label 3; to get 'Audi', use its label: s1[99]
s1[3] 

KeyError: 3

In [None]:
# This works because s1.values is a NumPy array, which is indexed by position
s1.values[2]

'Audi'
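The difference between looking up by label and by position can be made explicit with the `.loc` and `.iloc` accessors, which avoid the ambiguity of plain square brackets. A minimal sketch, using a separate illustrative Series named `cars` so that it does not depend on the cells above:

```python
import pandas as pd

cars = pd.Series(['BMW', 'Mercedes', 'Audi', 'Bentley'],
                 index=[100, 120, 99, 105])

# .loc looks up by index label
by_label = cars.loc[99]

# .iloc looks up by integer position (0-based), like a list
by_position = cars.iloc[2]

print(by_label, by_position)
```

Both lookups return the same element here, because the label 99 sits at position 2.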

# We can assign Name to a Pandas Series

In [None]:
fruits = pd.Series(['Apple','Banana','Orange','Kiwi','Cherry','Mango'],name='Fruits')
fruits

0     Apple
1    Banana
2    Orange
3      Kiwi
4    Cherry
5     Mango
Name: Fruits, dtype: object
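As a small sketch of what the name is used for: it can be read back through the `name` attribute, changed with `rename`, and it becomes the column label when the Series is turned into a DataFrame. The shortened fruit list and the new name `Fruit_name` are illustrative assumptions.

```python
import pandas as pd

fruits = pd.Series(['Apple', 'Banana', 'Orange'], name='Fruits')

# The name is available as an attribute
print(fruits.name)

# rename() with a scalar returns a new Series with a different name
renamed = fruits.rename('Fruit_name')
print(renamed.name)

# to_frame() uses the Series name as the column label
frame = fruits.to_frame()
print(list(frame.columns))
```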

# We can concatenate Series column-wise to create a DataFrame

In [None]:
s1 = pd.Series(['Sachhyam','Aditya','Dipa','Ashmita'],name= 'Student_name')
s2 = pd.Series(['Male','Male','Female','Female'],name = 'gender')

In [None]:
s1,s2

(0    Sachhyam
 1      Aditya
 2        Dipa
 3     Ashmita
 Name: Student_name, dtype: object,
 0      Male
 1      Male
 2    Female
 3    Female
 Name: gender, dtype: object)

In [None]:
# Concatenate s1 and s2 side by side (as columns) to make a DataFrame
df = pd.concat([s1,s2],axis='columns')
df

  Student_name  gender
0     Sachhyam    Male
1       Aditya    Male
2         Dipa  Female
3      Ashmita  Female


In [None]:
# A DataFrame looks like a table, but it is a pandas object with its own attributes and methods
type(df)

pandas.core.frame.DataFrame

In [None]:
df.dtypes

Student_name    object
gender          object
dtype: object

In [None]:
df.shape

(4, 2)

In [None]:
df.ndim

2
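To see why the `axis` argument matters when concatenating, the sketch below compares both directions on two short Series. The variable names `names`, `genders`, `side_by_side` and `stacked` are illustrative and not part of the notebook.

```python
import pandas as pd

names = pd.Series(['Sachhyam', 'Aditya'], name='Student_name')
genders = pd.Series(['Male', 'Male'], name='gender')

# axis='columns' (same as axis=1): each Series becomes a column of a DataFrame
side_by_side = pd.concat([names, genders], axis='columns')

# axis='index' (same as axis=0, the default): values are stacked into one longer Series
stacked = pd.concat([names, genders], axis='index')

print(side_by_side.shape)
print(stacked.shape)
print(type(stacked).__name__)
```

Only the column-wise form produces a two-dimensional DataFrame; the default stacks the values end to end and keeps the result one-dimensional.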

# How to access a column of a DataFrame

In [None]:
df['gender']  # access the gender column with bracket notation (the preferred way)

0      Male
1      Male
2    Female
3    Female
Name: gender, dtype: object

In [None]:
df.gender  # attribute access also works when the column name is a valid Python identifier

0      Male
1      Male
2    Female
3    Female
Name: gender, dtype: object

In [None]:
df['Student_name']  # this is the preferred way

0    Sachhyam
1      Aditya
2        Dipa
3     Ashmita
Name: Student_name, dtype: object

In [None]:
df['Student_name'][2]

'Dipa'

In [None]:
# Get the value 'Dipa' by position from the underlying NumPy array
df['Student_name'].values[2]

'Dipa'
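A single cell can also be read in one step instead of first selecting a column and then indexing into it. The following is a minimal sketch using the `.loc`, `.iloc` and `.at` accessors on a standalone copy of the data, here called `students` (an illustrative name).

```python
import pandas as pd

students = pd.DataFrame({
    'Student_name': ['Sachhyam', 'Aditya', 'Dipa', 'Ashmita'],
    'gender': ['Male', 'Male', 'Female', 'Female'],
})

# .loc[row_label, column_label] selects by label in a single step
by_label = students.loc[2, 'Student_name']

# .iloc[row_position, column_position] selects by integer position
by_position = students.iloc[2, 0]

# .at is a label-based accessor intended for single scalar values
by_at = students.at[2, 'Student_name']

print(by_label, by_position, by_at)
```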

# How to add a new column to a DataFrame

In [None]:
df

  Student_name  gender
0     Sachhyam    Male
1       Aditya    Male
2         Dipa  Female
3      Ashmita  Female


In [None]:
# Add the address column by assigning a list with one value per row
df['address'] = ['Sitapaila','Kuleshwor','Kausaltar','Kausaltar']
df

  Student_name  gender    address
0     Sachhyam    Male  Sitapaila
1       Aditya    Male  Kuleshwor
2         Dipa  Female  Kausaltar
3      Ashmita  Female  Kausaltar


In [None]:
df['roll'] = [21,20,16,34] # add the roll column
df

  Student_name  gender    address  roll
0     Sachhyam    Male  Sitapaila    21
1       Aditya    Male  Kuleshwor    20
2         Dipa  Female  Kausaltar    16
3      Ashmita  Female  Kausaltar    34
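Two details about adding columns may be worth a quick sketch: `assign()` returns a new DataFrame rather than modifying the existing one, and a list assigned as a column must have one value per row. The frame `students` and the derived column `roll_is_even` are hypothetical and used only for illustration.

```python
import pandas as pd

students = pd.DataFrame({
    'Student_name': ['Sachhyam', 'Aditya', 'Dipa', 'Ashmita'],
    'roll': [21, 20, 16, 34],
})

# assign() returns a new DataFrame and leaves the original unchanged
with_flag = students.assign(roll_is_even=students['roll'] % 2 == 0)
print(list(with_flag.columns))
print(list(students.columns))

# A list assigned as a column must match the number of rows,
# otherwise pandas raises a ValueError
try:
    students['address'] = ['Sitapaila', 'Kuleshwor']  # only 2 values for 4 rows
except ValueError as err:
    print('ValueError:', err)
```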


In [None]:
df['address'].value_counts() # count how many students live at each address


address
Kausaltar    2
Sitapaila    1
Kuleshwor    1
Name: count, dtype: int64

In [None]:
#  To find the unique values
df['address'].unique()

array(['Sitapaila', 'Kuleshwor', 'Kausaltar'], dtype=object)

In [None]:
# To find the number of unique values in a column
df['address'].nunique()

3
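The three methods above are closely related, which the following sketch tries to show on a standalone Series named `address` (an illustrative copy of the column). It also uses the `normalize=True` option of `value_counts()`, which is not used elsewhere in this notebook, to return proportions instead of counts.

```python
import pandas as pd

address = pd.Series(['Sitapaila', 'Kuleshwor', 'Kausaltar', 'Kausaltar'],
                    name='address')

counts = address.value_counts()

# nunique() equals the number of entries in unique() and in value_counts()
assert address.nunique() == len(address.unique()) == len(counts)

# normalize=True returns proportions instead of raw counts
shares = address.value_counts(normalize=True)
print(counts['Kausaltar'], shares['Kausaltar'])
```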