# Pandas

* pandas is a widely used Python library for data science. It provides tools for data manipulation, cleaning, transformation, and analysis, along with basic plotting, and it is often the first tool data scientists reach for when exploring and preparing data.
* It provides data structures and functions that make it easy to work with structured data, such as data from spreadsheets, databases, or CSV files.

##### Pandas offers two primary data structures that are foundational for handling and analyzing data:
* Series
* DataFrame

#### Import the pandas library

In [4]:
import pandas as pd
import numpy as np

In [5]:
from pandas import Series, DataFrame

### Series:
> * A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, etc.).
> * Each element in a Series is associated with a label, and the labels together form the index. By default, the index is a range of integers starting from 0. Labels are usually unique, but pandas does not require them to be.
> * A Series can be created from a list, a NumPy array, or a dictionary.
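
The cells that follow build a Series from a list and from a dictionary. As a minimal sketch of the NumPy route, and of the fact that index labels may repeat, consider the example below. The names `arr`, `from_array`, and `dup` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Build a Series from a NumPy array with explicit labels.
arr = np.array([10.5, 20.0, 30.25])
from_array = pd.Series(arr, index=['x', 'y', 'z'])

# Index labels are allowed to repeat; selecting a repeated label
# returns every matching element as a Series.
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

print(from_array['y'])       # scalar lookup by label
print(dup['a'].tolist())     # both elements labelled 'a'
print(dup.index.is_unique)   # reports whether the labels are unique
```
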

In [6]:
# From a list
s=pd.Series([1,2,3,4,5])
s

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [7]:
# values in series
s.values

array([1, 2, 3, 4, 5], dtype=int64)

In [8]:
# index of series
s.index

RangeIndex(start=0, stop=5, step=1)

In [9]:
# Assigning index
s.index=['M','N','O','P','Q']
s

M    1
N    2
O    3
P    4
Q    5
dtype: int64

In [10]:
# From a dictionary
ser = pd.Series({'a': 1, 'b': 2, 'c': 3})
ser

a    1
b    2
c    3
dtype: int64

In [11]:
s*2

M     2
N     4
O     6
P     8
Q    10
dtype: int64
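
Multiplying by a scalar, as above, applies the operation to every element. When two Series are combined, pandas matches elements by label rather than by position. The sketch below uses two made-up Series (`left` and `right` are illustrative names) to show this alignment.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition matches elements by label, not by position.
total = left + right

# Labels present on only one side ('a' and 'd') produce NaN.
print(total.index.tolist())
print(total['b'], total['c'])
print(total.isna().sum())
```
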

In [12]:
sdata={'Andhra':35000,'Goa':71000,'Kashmir':16000,'Kerala':5000}

In [13]:
states=['Andhra','Telangana','Goa','UP']

In [14]:
obj=pd.Series(sdata,index=states)
obj

Andhra       35000.0
Telangana        NaN
Goa          71000.0
UP               NaN
dtype: float64

In [15]:
pd.isnull(obj)

Andhra       False
Telangana     True
Goa          False
UP            True
dtype: bool

In [16]:
pd.notnull(obj)

Andhra        True
Telangana    False
Goa           True
UP           False
dtype: bool
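
The boolean results of `isnull` and `notnull` are most useful as masks. A small sketch, using a made-up Series named `sample` rather than the state data above, of how the mask can count or filter out missing entries:

```python
import pandas as pd

# Index labels that are absent from the dict come out as NaN.
sample = pd.Series({'a': 1.0, 'c': 3.0}, index=['a', 'b', 'c', 'd'])

missing_count = sample.isnull().sum()      # number of NaN entries
present_only = sample[sample.notnull()]    # boolean-mask selection
dropped = sample.dropna()                  # shortcut with the same result

print(missing_count)
print(present_only.index.tolist())
```
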

### DataFrame:
> * A DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure with labeled axes (rows and columns). It's similar to a spreadsheet or SQL table.
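> * A DataFrame has both row and column indices. The row index can be customized, and columns can be accessed as dictionary-like keys (df['column_name']) or, when the column name is a valid Python identifier that does not clash with a DataFrame attribute or method, as attributes (df.column_name).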
> * A DataFrame can be created from dictionaries of lists/arrays, lists of dictionaries, Series, or even other DataFrames.
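
As an illustration of the points above, the following sketch builds a DataFrame from a dictionary of Series and compares the two column-access styles. The column names `score` and `max score` are invented for this example.

```python
import pandas as pd

# Build a DataFrame from a dict of Series; rows align on the Series index.
frame = pd.DataFrame({
    'score': pd.Series([90, 85], index=['r1', 'r2']),
    'max score': pd.Series([100, 100], index=['r1', 'r2']),
})

# Attribute access works for 'score' because it is a valid identifier.
by_attr = frame.score
by_key = frame['score']

# 'max score' contains a space, so only bracket access can reach it.
with_space = frame['max score']

print(by_attr.equals(by_key))
print(with_space.tolist())
```

Bracket access works for every column name, which makes it the safer default.
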

#### Creating DataFrame

In [17]:
# From a dictionary of lists
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df

,A,B
0,1,4
1,2,5
2,3,6


In [18]:
# From a list of dictionaries
df = pd.DataFrame([{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}])
df

,A,B
0,1,4
1,2,5
2,3,6


#### Indexing and Slicing

In [19]:
# Label-based indexing
df.loc[0, 'A']

1

In [20]:
# Integer-based indexing
df.iloc[0, 1]

4
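
The two cells above select single values. A hedged sketch of slicing with the same accessors follows; it uses a copy of the small frame with string row labels (`x`, `y`, `z`, chosen for illustration) so that the difference between labels and positions is visible.

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                     index=['x', 'y', 'z'])

# .loc slices by label and includes the end label.
label_slice = frame.loc['x':'y', 'A']

# .iloc slices by position and excludes the end position.
pos_slice = frame.iloc[0:1, 0]

# A boolean mask selects the rows that satisfy a condition.
filtered = frame[frame['B'] > 4]

print(label_slice.tolist())
print(pos_slice.tolist())
print(filtered.index.tolist())
```
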

#### Dropping Entries from an Axis

In [21]:
obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [22]:
new_obj = obj.drop('c')
new_obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

In [23]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data

,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [24]:
data.drop(['Colorado', 'Ohio'])

,one,two,three,four
Utah,8,9,10,11
New York,12,13,14,15


In [25]:
data.drop('two', axis=1)

,one,three,four
Ohio,0,2,3
Colorado,4,6,7
Utah,8,10,11
New York,12,14,15


In [26]:
obj.drop('c', inplace=True)
obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64
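
Apart from `inplace=True`, `drop` returns a new object and leaves the original untouched. The sketch below, on a small made-up frame named `table`, shows two keyword forms that may be easier to read than `axis=1`: `columns=` to name the axis directly and `errors='ignore'` to skip labels that are not present.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(6).reshape((2, 3)),
                     index=['r1', 'r2'],
                     columns=['one', 'two', 'three'])

# columns= is an alternative spelling of drop(..., axis=1).
no_two = table.drop(columns='two')

# errors='ignore' skips labels that do not exist instead of raising KeyError.
unchanged = table.drop(index='missing', errors='ignore')

# drop returned new objects, so the original keeps all of its columns.
print(no_two.columns.tolist())
print(unchanged.shape)
print(table.columns.tolist())
```
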

### Manipulating and Analysing Data

In [27]:
data = {
    'Roll No': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward', 'Frank', 'Grace', 'Hannah', 'Ivy', 'Jack'],
    'Math': [85, 76, 89, None, 91, 78, 88, 95, 70, 84],
    'Science': [92, 85, 94, 70, 85, None, 90, 98, 75, 88],
    'English': [78, 80, 92, 72, 87, 84, None, 99, 73, 85],
    'History': [90, 88, 95, 68, 93, 79, 92, 100, 78, 87],
    'Art': [89, 92, 91, 74, 90, 85, 87, 97, 72, 90]
}

In [29]:
df=pd.DataFrame(data)
df

,Roll No,Name,Math,Science,English,History,Art
0,101,Alice,85.0,92.0,78.0,90,89
1,102,Bob,76.0,85.0,80.0,88,92
2,103,Charlie,89.0,94.0,92.0,95,91
3,104,David,,70.0,72.0,68,74
4,105,Edward,91.0,85.0,87.0,93,90
5,106,Frank,78.0,,84.0,79,85
6,107,Grace,88.0,90.0,,92,87
7,108,Hannah,95.0,98.0,99.0,100,97
8,109,Ivy,70.0,75.0,73.0,78,72
9,110,Jack,84.0,88.0,85.0,87,90


In [30]:
df.isnull().sum()

Roll No    0
Name       0
Math       1
Science    1
English    1
History    0
Art        0
dtype: int64

In [33]:
## Drop every row that contains a missing value
## (ignore_index=True renumbers the remaining rows from 0; requires pandas 2.0+)
df.dropna(axis=0,ignore_index=True)

,Roll No,Name,Math,Science,English,History,Art
0,101,Alice,85.0,92.0,78.0,90,89
1,102,Bob,76.0,85.0,80.0,88,92
2,103,Charlie,89.0,94.0,92.0,95,91
3,105,Edward,91.0,85.0,87.0,93,90
4,108,Hannah,95.0,98.0,99.0,100,97
5,109,Ivy,70.0,75.0,73.0,78,72
6,110,Jack,84.0,88.0,85.0,87,90


In [34]:
## Drop every column that contains a missing value
df.dropna(axis=1)

,Roll No,Name,History,Art
0,101,Alice,90,89
1,102,Bob,88,92
2,103,Charlie,95,91
3,104,David,68,74
4,105,Edward,93,90
5,106,Frank,79,85
6,107,Grace,92,87
7,108,Hannah,100,97
8,109,Ivy,78,72
9,110,Jack,87,90


In [46]:
df.describe()

,Roll No,Math,Science,English,History,Art
count,10.0,9.0,9.0,9.0,10.0,10.0
mean,105.5,84.0,86.333333,83.333333,87.0,86.7
std,3.02765,7.968689,8.958236,8.746428,9.486833,7.888811
min,101.0,70.0,70.0,72.0,68.0,72.0
25%,103.25,78.0,85.0,78.0,81.0,85.5
50%,105.5,85.0,88.0,84.0,89.0,89.5
75%,107.75,89.0,92.0,87.0,92.75,90.75
max,110.0,95.0,98.0,99.0,100.0,97.0


In [40]:
## Filling Values (use the number 0, not the string "0", so the columns stay numeric)
df.fillna(0)

,Roll No,Name,Math,Science,English,History,Art
0,101,Alice,85.0,92.0,78.0,90,89
1,102,Bob,76.0,85.0,80.0,88,92
2,103,Charlie,89.0,94.0,92.0,95,91
3,104,David,0.0,70.0,72.0,68,74
4,105,Edward,91.0,85.0,87.0,93,90
5,106,Frank,78.0,0.0,84.0,79,85
6,107,Grace,88.0,90.0,0.0,92,87
7,108,Hannah,95.0,98.0,99.0,100,97
8,109,Ivy,70.0,75.0,73.0,78,72
9,110,Jack,84.0,88.0,85.0,87,90


In [43]:
## Forward Fill (DataFrame.ffill replaces the deprecated fillna(method='ffill'))
df.ffill()

,Roll No,Name,Math,Science,English,History,Art
0,101,Alice,85.0,92.0,78.0,90,89
1,102,Bob,76.0,85.0,80.0,88,92
2,103,Charlie,89.0,94.0,92.0,95,91
3,104,David,89.0,70.0,72.0,68,74
4,105,Edward,91.0,85.0,87.0,93,90
5,106,Frank,78.0,85.0,84.0,79,85
6,107,Grace,88.0,90.0,84.0,92,87
7,108,Hannah,95.0,98.0,99.0,100,97
8,109,Ivy,70.0,75.0,73.0,78,72
9,110,Jack,84.0,88.0,85.0,87,90


In [45]:
## Backward Fill
df.bfill()

,Roll No,Name,Math,Science,English,History,Art
0,101,Alice,85.0,92.0,78.0,90,89
1,102,Bob,76.0,85.0,80.0,88,92
2,103,Charlie,89.0,94.0,92.0,95,91
3,104,David,91.0,70.0,72.0,68,74
4,105,Edward,91.0,85.0,87.0,93,90
5,106,Frank,78.0,90.0,84.0,79,85
6,107,Grace,88.0,90.0,99.0,92,87
7,108,Hannah,95.0,98.0,99.0,100,97
8,109,Ivy,70.0,75.0,73.0,78,72
9,110,Jack,84.0,88.0,85.0,87,90
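
Constant, forward, and backward fills are not the only options. One common alternative, sketched here on a small made-up frame (`marks` and its values are illustrative, not the student data above), is to fill each column with its own mean.

```python
import pandas as pd

marks = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Math': [80.0, None, 90.0],
    'Science': [70.0, 90.0, None],
})

# Mean of each numeric column, ignoring missing entries.
col_means = marks.mean(numeric_only=True)

# Passing a Series to fillna fills each column with the value under its name.
filled = marks.fillna(col_means)

print(filled['Math'].tolist())
print(filled['Science'].tolist())
```

Whether a mean fill is appropriate depends on the data; it keeps the column mean unchanged but reduces the variance.
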


In [48]:
df['Total Marks'] = df[['Math', 'Science', 'English', 'History', 'Art']].sum(axis=1)

In [49]:
df

,Roll No,Name,Math,Science,English,History,Art,Total Marks
0,101,Alice,85.0,92.0,78.0,90,89,434.0
1,102,Bob,76.0,85.0,80.0,88,92,421.0
2,103,Charlie,89.0,94.0,92.0,95,91,461.0
3,104,David,,70.0,72.0,68,74,284.0
4,105,Edward,91.0,85.0,87.0,93,90,446.0
5,106,Frank,78.0,,84.0,79,85,326.0
6,107,Grace,88.0,90.0,,92,87,357.0
7,108,Hannah,95.0,98.0,99.0,100,97,489.0
8,109,Ivy,70.0,75.0,73.0,78,72,368.0
9,110,Jack,84.0,88.0,85.0,87,90,434.0
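
Note that `sum(axis=1)` skips missing values by default, so the totals for David, Frank, and Grace cover only four subjects. The sketch below, on a two-row made-up frame named `marks`, shows two ways this could be handled: propagating NaN with `skipna=False`, or comparing students by their row mean instead.

```python
import pandas as pd

marks = pd.DataFrame({
    'Math': [85.0, None],
    'Science': [92.0, 70.0],
})

# Default behaviour: NaN is skipped, so the second row sums to 70.0.
default_total = marks.sum(axis=1)

# skipna=False propagates NaN, flagging rows with an incomplete record.
strict_total = marks.sum(axis=1, skipna=False)

# The row mean divides by the number of subjects actually present.
row_mean = marks.mean(axis=1)

print(default_total.tolist())
print(strict_total.isna().tolist())
print(row_mean.tolist())
```
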


# Conclusion

This notebook introduces the two core pandas data structures, Series and DataFrame, and walks through creating them, label-based and integer-based indexing, dropping entries from an axis, detecting and handling missing values, summarizing data with describe(), and deriving a new column. These techniques are a foundation for further work with pandas, such as loading, merging, visualizing, and exporting data.