# Pandas


* Pandas is a popular Python library for data manipulation and analysis, and it plays a central role in data science. It is built on top of NumPy and integrates well with other Python libraries, which makes it a fundamental tool in the data science ecosystem.
* It provides two primary data structures, Series and DataFrame, for working with one-dimensional and two-dimensional labeled data, respectively. These structures are designed to handle different kinds of data, such as time series and tabular data.




 ##### Some of the key ways Pandas is used in data science:
 * Data Cleaning and Preparation
 * Data Transformation and Reshaping
 * Data Exploration and Analysis
 * Time Series Analysis
 * Data Input and Output (Reading and Writing Files)
 * Integrating with Other Libraries (e.g., NumPy)
 * Data Modeling and Machine Learning
 * Efficient Handling of Large Datasets

#### First, import the Pandas library

In [1]:
import pandas as pd

### Series
* Series: A one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). Think of it as a single column in a table.

In [2]:
obj=pd.Series([4,7,-5,3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

In [3]:
obj.values

array([ 4,  7, -5,  3], dtype=int64)

In [4]:
obj.index

RangeIndex(start=0, stop=4, step=1)

In [5]:
obj2=pd.Series([4,5,-7,3],index=['a','b','c','d'])
obj2

a    4
b    5
c   -7
d    3
dtype: int64
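With a custom index, values can be looked up by label as well as filtered or scaled. The following is a minimal sketch that reuses the values of `obj2`; the names `labeled`, `single`, and `subset` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same values and labels as obj2 above
labeled = pd.Series([4, 5, -7, 3], index=["a", "b", "c", "d"])

# Label-based lookup of a single element
single = labeled["c"]

# Selecting several labels returns a new Series in the requested order
subset = labeled[["d", "a"]]

# Membership tests check the index labels, not the values
has_b = "b" in labeled
has_z = "z" in labeled
```

Because `in` looks at labels, `-7 in labeled` would be `False` here even though -7 is one of the values.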

In [6]:
obj2[obj2>0]

a    4
b    5
d    3
dtype: int64

In [7]:
obj2*2

a     8
b    10
c   -14
d     6
dtype: int64

In [8]:
sdata={'Ohio':35000,'Texas':71000,'Oregon':16000,'Utah':5000}

In [9]:
obj3=pd.Series(sdata)
obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

In [10]:
states=['California','Ohio','Oregon','Texas']

In [11]:
obj4=pd.Series(sdata,index=states)
obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

In [12]:
pd.isnull(obj4)

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [13]:
pd.notnull(obj4)

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

In [14]:
obj3+obj4

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
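The sum above is aligned on index labels, so any label that is missing on either side produces `NaN`. If a missing entry should count as zero instead, `Series.add` with `fill_value` is one option. The sketch below rebuilds `obj3` and `obj4` so that it runs on its own; `plain_sum` and `filled_sum` are illustrative names.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]

obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=states)

# The + operator aligns on index labels; a label missing on either side gives NaN
plain_sum = obj3 + obj4

# add() with fill_value treats a value missing on one side as 0.
# If the value is missing on both sides, the result is still NaN.
filled_sum = obj3.add(obj4, fill_value=0)
```

With `fill_value=0`, Utah keeps its value from `obj3`, while California stays `NaN` because neither Series has a value for it.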

### DataFrame
* DataFrame: A two-dimensional labeled data structure with columns of potentially different types. It can be thought of as a table or a spreadsheet in memory.
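Before loading a file, it may help to see a DataFrame built by hand. This is a small sketch with made-up values; the column names and numbers are hypothetical and serve only to show the structure.

```python
import pandas as pd

# Hypothetical sample values, used only to illustrate the structure
frame = pd.DataFrame(
    {
        "state": ["Ohio", "Texas", "Oregon"],
        "year": [2000, 2001, 2002],
        "population": [1.5, 2.4, 0.9],
    }
)

n_rows, n_cols = frame.shape           # (rows, columns)
column_types = frame.dtypes            # one dtype per column
population = frame["population"]       # a single column is a Series
```

Each column keeps its own dtype, which is what allows text, integers, and floats to sit side by side in one table.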


> Let us load a CSV file, the Titanic dataset, to see how Pandas works with a real dataset.

In [15]:
## Reading csv file
data=pd.read_csv(r"Titanic-Dataset.csv")
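To try `read_csv` without the dataset file at hand, the same call can read from an in-memory string. The sketch below assumes a few rows in the same layout as some of the Titanic columns; `csv_text` and `sample` are illustrative names, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# A few rows in the same layout as some of the Titanic columns;
# io.StringIO stands in for a file on disk.
csv_text = """PassengerId,Survived,Sex,Age,Cabin
1,0,male,22,
2,1,female,38,C85
3,1,female,,
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Empty fields are parsed as NaN, so Age is read as float64
age_dtype = sample["Age"].dtype
missing_per_column = sample.isnull().sum()
```

Empty fields in the text become `NaN` in the resulting DataFrame, which is where the missing values handled later in this notebook come from.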

In [16]:
data

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [17]:
## First five rows
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [18]:
## Last Five rows
data.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [19]:
## Count missing values in each column
data.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
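Counts are easier to compare when shown next to the share of rows they represent. A minimal sketch on a made-up frame (the values are hypothetical): since `isnull()` returns booleans, `sum()` counts the `True` entries and `mean()` gives their fraction.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with gaps in two columns
df = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 26.0, np.nan],
        "Cabin": [np.nan, "C85", np.nan, np.nan],
        "Sex": ["male", "female", "female", "male"],
    }
)

missing_count = df.isnull().sum()          # number of NaN values per column
missing_share = df.isnull().mean() * 100   # percentage of NaN values per column
```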

In [20]:
## Build a smaller DataFrame from a subset of the columns
x1=pd.DataFrame(data,columns=["PassengerId","Survived","Sex","Age","Cabin"])
x1

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
0,1,0,male,22.0,
1,2,1,female,38.0,C85
2,3,1,female,26.0,
3,4,1,female,35.0,C123
4,5,0,male,35.0,
...,...,...,...,...,...
886,887,0,male,27.0,
887,888,1,female,19.0,B42
888,889,0,female,,
889,890,1,male,26.0,C148


> Since the data contains missing values, let's handle them.

#### Dropping Missing Values


In [21]:
## Drop every row that contains a missing value and renumber the index from 0
x2=x1.dropna(ignore_index=True)
x2

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
0,2,1,female,38.0,C85
1,4,1,female,35.0,C123
2,7,0,male,54.0,E46
3,11,1,female,4.0,G6
4,12,1,female,58.0,C103
...,...,...,...,...,...
180,872,1,female,47.0,D35
181,873,0,male,33.0,B51 B53 B55
182,880,1,female,56.0,C50
183,888,1,female,19.0,B42


In [22]:
## Drop every column that contains a missing value
x3=x1.dropna(axis=1)
x3

Unnamed: 0,PassengerId,Survived,Sex
0,1,0,male
1,2,1,female
2,3,1,female
3,4,1,female
4,5,0,male
...,...,...,...
886,887,0,male
887,888,1,female
888,889,0,female
889,890,1,male


In [23]:
## Keep only the columns that have at least 100 non-missing values
x1.dropna(axis=1,thresh=100)

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
0,1,0,male,22.0,
1,2,1,female,38.0,C85
2,3,1,female,26.0,
3,4,1,female,35.0,C123
4,5,0,male,35.0,
...,...,...,...,...,...
886,887,0,male,27.0,
887,888,1,female,19.0,B42
888,889,0,female,,
889,890,1,male,26.0,C148
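In the output above no column is removed, because `thresh` sets the minimum number of non-missing values a column needs in order to be kept, and every column has at least 100. The effect is easier to see on a tiny frame; the sketch below uses made-up columns named `full`, `half`, and `sparse`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "full": [1, 2, 3, 4],
        "half": [1.0, np.nan, 3.0, np.nan],
        "sparse": [np.nan, np.nan, np.nan, 4.0],
    }
)

# thresh is the minimum number of non-null values a column needs to be kept
at_least_two = df.dropna(axis=1, thresh=2)
at_least_three = df.dropna(axis=1, thresh=3)
```

Here `half` has two non-null values and survives `thresh=2` but not `thresh=3`, while `sparse` is dropped in both cases.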


#### Filling Missing Values

In [24]:
## Fill every missing value with a placeholder string; Age becomes a mixed-type (object) column as a result
x1.fillna("Hello")

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
0,1,0,male,22.0,Hello
1,2,1,female,38.0,C85
2,3,1,female,26.0,Hello
3,4,1,female,35.0,C123
4,5,0,male,35.0,Hello
...,...,...,...,...,...
886,887,0,male,27.0,Hello
887,888,1,female,19.0,B42
888,889,0,female,Hello,Hello
889,890,1,male,26.0,C148


In [25]:
## Forward Fill
x4=x1.ffill()
x4

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
0,1,0,male,22.0,
1,2,1,female,38.0,C85
2,3,1,female,26.0,C85
3,4,1,female,35.0,C123
4,5,0,male,35.0,C123
...,...,...,...,...,...
886,887,0,male,27.0,C50
887,888,1,female,19.0,B42
888,889,0,female,19.0,B42
889,890,1,male,26.0,C148


In [26]:
## Backward Fill
x5=x1.bfill()
x5

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
0,1,0,male,22.0,C85
1,2,1,female,38.0,C85
2,3,1,female,26.0,C123
3,4,1,female,35.0,C123
4,5,0,male,35.0,E46
...,...,...,...,...,...
886,887,0,male,27.0,B42
887,888,1,female,19.0,B42
888,889,0,female,26.0,C148
889,890,1,male,26.0,C148
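Forward and backward fill each leave a gap where there is no neighbouring value to copy, which is why row 0 still has an empty `Cabin` after `ffill()`. A minimal sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward = s.ffill()    # copy the last seen value downward
backward = s.bfill()   # copy the next seen value upward
```

`forward` keeps the leading `NaN` because nothing precedes it, and `backward` keeps the trailing `NaN` because nothing follows it.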


In [27]:
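## Fill with the mean of Age. A scalar passed to fillna is applied to every column, so missing Cabin values receive the mean as well.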
x1.fillna(x1["Age"].mean())

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
0,1,0,male,22.000000,29.699118
1,2,1,female,38.000000,C85
2,3,1,female,26.000000,29.699118
3,4,1,female,35.000000,C123
4,5,0,male,35.000000,29.699118
...,...,...,...,...,...
886,887,0,male,27.000000,29.699118
887,888,1,female,19.000000,B42
888,889,0,female,29.699118,29.699118
889,890,1,male,26.000000,C148
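To limit a fill to specific columns, `fillna` also accepts a dict keyed by column name. The following is a sketch on a made-up frame, not the notebook's data; the `"Unknown"` placeholder is an arbitrary choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [20.0, np.nan, 40.0],
        "Cabin": [np.nan, "C85", np.nan],
    }
)

# A dict maps each column to its own fill value; columns not listed are left alone
age_only = df.fillna({"Age": df["Age"].mean()})

# Several columns can get different replacements ("Unknown" is an arbitrary placeholder)
both = df.fillna({"Age": df["Age"].mean(), "Cabin": "Unknown"})
```

In `age_only` the missing age becomes 30.0 while `Cabin` keeps its `NaN` entries, so the numeric mean never ends up in a text column.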


In [28]:
# Filter rows where Age is greater than 60
x1[x1['Age'] > 60]

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
33,34,0,male,66.0,
54,55,0,male,65.0,B30
96,97,0,male,71.0,A5
116,117,0,male,70.5,
170,171,0,male,61.0,B19
252,253,0,male,62.0,C87
275,276,1,female,63.0,D7
280,281,0,male,65.0,
326,327,0,male,61.0,
438,439,0,male,64.0,C23 C25 C27
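Several conditions can be combined in one filter. A minimal sketch on a made-up frame with similar column names: each condition goes in its own parentheses, and the element-wise operators `&`, `|`, and `~` are used rather than Python's `and`, `or`, and `not`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sex": ["male", "female", "male", "female"],
        "Age": [66.0, 63.0, 30.0, 25.0],
        "Survived": [0, 1, 1, 0],
    }
)

# Each condition needs its own parentheses; & is element-wise AND, | is OR, ~ is NOT
older_women = df[(df["Age"] > 60) & (df["Sex"] == "female")]
older_or_survived = df[(df["Age"] > 60) | (df["Survived"] == 1)]
not_older = df[~(df["Age"] > 60)]
```

The filtered frames keep the original index labels, as in the output above where the first matching row is labeled 33.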


In [29]:
## Select rows by integer position (the end of the slice is excluded)
data.iloc[0:2]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


In [30]:
## Select rows and columns by labels (the end of the slice is included)
data.loc[0:2, ['Name', 'Age']]

Unnamed: 0,Name,Age
0,"Braund, Mr. Owen Harris",22.0
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0
2,"Heikkinen, Miss. Laina",26.0
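With the default index, positions and labels are the same numbers, which hides the difference between `iloc` and `loc`. The sketch below uses a made-up frame with the labels 10, 20, 30, and 40 to separate the two.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["A", "B", "C", "D"], "Age": [22, 38, 26, 35]},
    index=[10, 20, 30, 40],  # non-default labels, to make the difference visible
)

by_position = df.iloc[0:2]            # positions 0 and 1; the end is excluded
by_label = df.loc[10:30, ["Name"]]    # labels 10 through 30; the end is included
```

`by_position` holds the rows labeled 10 and 20, while `by_label` holds the rows labeled 10, 20, and 30 with only the `Name` column.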


In [31]:
## Sort data by age in descending order
x4.sort_values(by='Age', ascending=False)

Unnamed: 0,PassengerId,Survived,Sex,Age,Cabin
630,631,1,male,80.00,A23
851,852,0,male,74.00,C92
493,494,0,male,71.00,C30
96,97,0,male,71.00,A5
116,117,0,male,70.50,C110
...,...,...,...,...,...
644,645,1,female,0.75,B35
470,471,0,male,0.75,E63
469,470,1,female,0.75,E63
755,756,1,male,0.67,E121
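`sort_values` can also take several keys, each with its own direction, and it places missing values last unless told otherwise. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Sex": ["male", "female", "male", "female"],
        "Age": [30.0, np.nan, 54.0, 19.0],
    }
)

# Single key, descending; missing ages go last by default
by_age = df.sort_values(by="Age", ascending=False)

# na_position="first" moves missing ages to the top instead
nan_first = df.sort_values(by="Age", na_position="first")

# Sort by Sex ascending, then by Age descending within each Sex
by_sex_then_age = df.sort_values(by=["Sex", "Age"], ascending=[True, False])
```

Sorting returns a new DataFrame and leaves `df` unchanged, which matches how `x4.sort_values(...)` is used above without reassignment.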


In [32]:
## Group the rows by the value in the Sex column
grp=data.groupby("Sex")

In [33]:
grp.get_group("male")

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
883,884,0,2,"Banfield, Mr. Frederick James",male,28.0,0,0,C.A./SOTON 34068,10.5000,,S
884,885,0,3,"Sutehall, Mr. Henry Jr",male,25.0,0,0,SOTON/OQ 392076,7.0500,,S
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C
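Beyond retrieving one group, a grouped object is typically used to compute a summary per group. A minimal sketch on a made-up frame with similar column names; the numbers are hypothetical and do not come from the Titanic data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sex": ["male", "female", "male", "female"],
        "Survived": [0, 1, 1, 1],
        "Age": [22.0, 38.0, 54.0, 26.0],
    }
)

grouped = df.groupby("Sex")

group_sizes = grouped.size()                  # rows per group
survival_rate = grouped["Survived"].mean()    # mean of a 0/1 column is a rate
age_summary = grouped["Age"].agg(["min", "max", "mean"])
```

Each result is indexed by the group key, so `survival_rate["female"]` looks up the value for one group.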


## Conclusion

This notebook gives an introduction to the Pandas library for data manipulation and analysis. It covers creating Series and DataFrames, loading data from a CSV file, detecting and handling missing values, selecting and filtering rows, sorting, and grouping. These techniques are fundamental for anyone working with data in Python and make Pandas a powerful tool for data science, analytics, and general data processing tasks.