# Pandas

## Data Handling in Python: A Deep Dive into pandas Series and DataFrame

**This notebook covers**

- pandas data structures: Series and DataFrame
- handling a DataFrame: loading, inspecting, and modifying data
- different ways to read a CSV file

## Basic data structures in pandas
**pandas provides two classes for handling data:**

- Series: a one-dimensional labeled array holding data of any type, such as integers, strings, or Python objects.
- DataFrame: a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.

## 🔹 1. Pandas Data Structures  

| Structure  | Description | Example |
|------------|-------------|---------|
| **Series** | 1-D labeled array (like a column in Excel or table) | `pd.Series([10,20,30])` |
| **DataFrame** | 2-D labeled data structure with rows & columns (like Excel sheet / SQL table) | `pd.DataFrame({"Name":["A","B"], "Age":[20,25]})` |

---

## 🔹 2. Handling DataFrame  

| Operation | Code Example | Purpose |
|-----------|-------------|---------|
| **Load Data** | `df = pd.read_csv("data.csv")` | Load data from CSV file |
| **Inspect Data** | `df.head()`, `df.tail()`, `df.info()`, `df.describe()` | Quick look at data |
| **Shape of Data** | `df.shape` | Rows × Columns |
| **Column Names** | `df.columns` | List all column names |
| **Select Column** | `df["Name"]` | Access one column |
| **Select Row** | `df.loc[0]`, `df.iloc[0]` | Access a row by label (`loc`) or by integer position (`iloc`) |
| **Filter Rows** | `df[df["Age"] > 25]` | Apply condition |
| **Add Column** | `df["NewCol"] = df["Age"] * 2` | Create new column |
| **Drop Column** | `df.drop("ColName", axis=1, inplace=True)` | Remove a column |

---

## 🔹 3. Different Ways to Read CSV Files  

| Method | Code Example |
|--------|--------------|
| **Default Read** | `pd.read_csv("data.csv")` |
| **With Index Column** | `pd.read_csv("data.csv", index_col=0)` |
| **Select Specific Columns** | `pd.read_csv("data.csv", usecols=["Name","Age"])` |
| **With Separator** | `pd.read_csv("data.tsv", sep="\t")` |
| **Read Large Files in Chunks** | `pd.read_csv("data.csv", chunksize=1000)` |

---

⚡ With Pandas, you can handle **large datasets efficiently**, perform **data cleaning**, and prepare data for **analysis or ML models**.  
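
The CSV-reading options tabulated above can be tried without any file on disk. The following is a minimal sketch that assumes a small, made-up dataset held in an in-memory `io.StringIO` buffer in place of `data.csv`; the column names and values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for "data.csv" (made-up values).
csv_text = "Name,Age,City\nA,20,Pune\nB,25,Delhi\nC,30,Mumbai\n"

# Default read: the header row becomes the column names.
df = pd.read_csv(io.StringIO(csv_text))

# Use the first column ("Name") as the row index.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Load only the selected columns.
df_subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Age"])

# Tab-separated text needs an explicit separator.
tsv_text = csv_text.replace(",", "\t")
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# chunksize returns an iterator of DataFrames instead of a single DataFrame.
chunk_sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]

print(df.shape)
print(list(df_indexed.index))
print(list(df_subset.columns))
print(chunk_sizes)
```

Note that with `chunksize` the data is only read as the iterator is consumed, which is why the sketch loops over the chunks.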


**How to create a Series from a NumPy ndarray:**


In [2]:
import pandas as pd
import numpy as np 
arr=np.array([10,15,18,22])
s = pd.Series(arr)
print(s)

0    10
1    15
2    18
3    22
dtype: int64


**How to create a Series with a custom index:**

In [3]:
import pandas as pd
import numpy as np
arr=np.array(['a','b','c','d'])
s=pd.Series(arr, index=['first','second','third','fourth'])
print(s)

first     a
second    b
third     c
fourth    d
dtype: object


### Creating a Series from a scalar value
*To create a Series from a scalar value, an index must be provided. The scalar value is repeated to match the length of the index.*

In [6]:
import pandas as pd
s=pd.Series(50, index=[0,1,2,3,4])
print(s)            

0    50
1    50
2    50
3    50
4    50
dtype: int64


### Creating a series from a Dictionary

In [9]:
# importing the pandas library
import pandas as pd

# creating a dictionary
d={'name':'Aman', 'IPL_team': 'RCB', 'runs': '1729'}
s=pd.Series(d)
print(s)

name        Aman
IPL_team     RCB
runs        1729
dtype: object
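
When a Series is built from a dictionary, the keys become the index labels. As a small sketch of what happens when an explicit `index` is passed as well: only the listed keys are used, in the listed order, and a label with no matching key gets `NaN`. The label `'city'` below is a hypothetical one added purely for illustration.

```python
import pandas as pd

d = {'name': 'Aman', 'IPL_team': 'RCB', 'runs': '1729'}

# The index list selects and reorders the dictionary keys.
# 'city' is a hypothetical label that does not exist in the dictionary.
s = pd.Series(d, index=['runs', 'name', 'city'])

print(s)
print(s.isna())  # True only for the label that had no matching key
```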


In [10]:
import pandas as pd
s=pd.Series([1,2,3,4,5])
print(s)
print('To multiply all values in a series by 2 :- ')
print(s*2)

0    1
1    2
2    3
3    4
4    5
dtype: int64
To multiply all values in a series by 2 :- 
0     2
1     4
2     6
3     8
4    10
dtype: int64


In [11]:
import pandas as pd
s=pd.Series([1,2,3,4,5])
print('To find square of all values in a series:- ')
print(s*s)

To find square of all values in a series:- 
0     1
1     4
2     9
3    16
4    25
dtype: int64


In [14]:
import pandas as pd
s=pd.Series([1,2,3,4,5])
print('To find all values in a series that is greater than 2:- ')
# Boolean indexing
result = s[s > 2]
print(result)

To find all values in a series that is greater than 2:- 
2    3
3    4
4    5
dtype: int64
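
Boolean indexing also works with several conditions at once. The sketch below reuses the same Series and is only an illustration: each condition goes in its own parentheses, and the conditions are combined with `&` (and), `|` (or), or `~` (not) instead of the Python keywords.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Values greater than 2 AND less than 5.
between = s[(s > 2) & (s < 5)]

# Values less than 2 OR greater than 4.
outside = s[(s < 2) | (s > 4)]

# Values that are NOT even.
not_even = s[~(s % 2 == 0)]

print(between)
print(outside)
print(not_even)
```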


In [17]:
import pandas as pd
s1=pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
s2=pd.Series([10,20,30,40,50], index=['a','b','c','d','e'])
s3=pd.Series([5,14,23,32], index=['a','b','c','d'])
print("To add Series 1st and Series 2nd")
print(s1+s2)
print("To add Series 2nd and Series 3rd")
print(s2+s3)
print("To add Series 2nd and Series 3rd & filled non matching index with 0")
print(s2.add(s3,fill_value=0))

To add Series 1st and Series 2nd
a    11
b    22
c    33
d    44
e    55
dtype: int64
To add Series 2nd and Series 3rd
a    15.0
b    34.0
c    53.0
d    72.0
e     NaN
dtype: float64
To add Series 2nd and Series 3rd & filled non matching index with 0
a    15.0
b    34.0
c    53.0
d    72.0
e    50.0
dtype: float64
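
The results above come from index alignment: values are matched by label, not by position. As a further sketch with made-up values, the two Series below hold the same labels in a different order, and `sub()` is used with `fill_value` in the same way as `add()`.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['c', 'a', 'b'])  # same labels, different order

# Addition pairs up values that share a label.
total = s1 + s2

# sub() with fill_value treats labels missing from one side as 0.
only_a = pd.Series([5], index=['a'])
diff = s2.sub(only_a, fill_value=0)

print(total)
print(diff)
```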


**Head and Tail Functions in Series**
- `head()`: returns the first 5 rows of a Series by default.
- Note: to access the first 3 rows, call `series_name.head(3)`.
- Note: to access the first 7 rows, call `series_name.head(7)`.

In [22]:
import pandas as pd
import numpy as np
arr=np.array([10,15,18,22,55,77,48,97])
# create a Series from the array
s=pd.Series(arr)
# to print first 5 rows
print (s.head())
# to print first 3 rows
print (s.head(3))
# to print first 7 rows
print (s.head(7))

0    10
1    15
2    18
3    22
4    55
dtype: int64
0    10
1    15
2    18
dtype: int64
0    10
1    15
2    18
3    22
4    55
5    77
6    48
dtype: int64


**`tail()`: returns the last 5 rows of a Series by default.**
- Note: to access the last 4 rows, call `series_name.tail(4)`.

In [23]:
import pandas as pd
import numpy as np
arr=np.array([10,15,18,22,55,77,48,97])
# create a Series from the array
s=pd.Series(arr)
# to print last 5 rows
print (s.tail())
# to print last 3 rows
print (s.tail(3))
# to print last 7 rows
print (s.tail(7))

3    22
4    55
5    77
6    48
7    97
dtype: int64
5    77
6    48
7    97
dtype: int64
1    15
2    18
3    22
4    55
5    77
6    48
7    97
dtype: int64
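
Two edge cases of `head()` and `tail()` are worth a quick look. The sketch below uses the same array as above: a negative `n` means "everything except that many rows", and an `n` larger than the Series simply returns the whole Series.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 55, 77, 48, 97])

all_but_last_two = s.head(-2)    # everything except the last 2 rows
all_but_first_two = s.tail(-2)   # everything except the first 2 rows
whole = s.head(100)              # n larger than the length returns every row

print(all_but_last_two)
print(all_but_first_two)
print(len(whole))
```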


## Selection in Series
**A Series provides the `loc` and `iloc` indexers and the `[]` operator to access its elements.**

### 1. Selection Using loc (label-based)
**Syntax: series_name.loc[StartLabel : StopLabel]** (the stop label is included)
- Example

In [26]:
import numpy as np
import pandas as pd
arr=np.array([10,15,18,22,55,77])
s = pd.Series(arr)
print(s)
print(s.loc[:2]) #To Print Values from Index 0 to 2
print(s.loc[3:4]) #To Print Values from Index 3 to 4
print(s.loc[2:3])

0    10
1    15
2    18
3    22
4    55
5    77
dtype: int64
0    10
1    15
2    18
dtype: int64
3    22
4    55
dtype: int64
2    18
3    22
dtype: int64


### 2. Selection Using iloc (integer position)
**Syntax: series_name.iloc[StartPosition : StopPosition]** (the stop position is excluded)
- Example

In [30]:
import numpy as np
import pandas as pd 
arr=np.array([10,15,18,22,55,77])
s=pd.Series(arr)
print(s)
print(s.iloc[:4]) # prints the values at positions 0 to 3
print(s.iloc[3:4]) # prints the value at position 3
print(s.iloc[2:3]) # prints the value at position 2

0    10
1    15
2    18
3    22
4    55
5    77
dtype: int64
0    10
1    15
2    18
3    22
dtype: int64
3    22
dtype: int64
2    18
dtype: int64
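
With the default index 0, 1, 2, ... labels and positions coincide, so `loc` and `iloc` are easy to confuse. The sketch below uses hypothetical labels (100, 200, ...) so that the two differ: `loc` slices by label and includes the stop label, while `iloc` slices by position and excludes the stop position.

```python
import pandas as pd

# Hypothetical labels chosen so that labels and positions are different.
s = pd.Series([10, 15, 18, 22], index=[100, 200, 300, 400])

by_label = s.loc[100:300]    # labels 100, 200, 300 (stop label included)
by_position = s.iloc[0:2]    # positions 0 and 1 (stop position excluded)

print(by_label)
print(by_position)
print(s.loc[200], s.iloc[1])  # both refer to the same element
```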


### 3. Selection Using []
**Syntax: series_name[StartRange : StopRange] or series_name[index]**
- Example

In [33]:
import numpy as np
import pandas as pd 
arr=np.array([10,15,18,22,55,77])
s=pd.Series(arr)
print(s)
print(s[1])
print('\n')
print(s[3:4])
print (s[:4])

0    10
1    15
2    18
3    22
4    55
5    77
dtype: int64
15


3    22
dtype: int64
0    10
1    15
2    18
3    22
dtype: int64
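
The `[]` operator accepts more than a single index or a range. As a sketch on a Series with string labels (the labels and values are illustrative), it can also take a list of labels, a label slice, or a boolean mask.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])

one = s['b']               # single label returns a scalar
several = s[['a', 'd']]    # list of labels returns a Series
label_slice = s['b':'d']   # label slice includes the stop label
mask = s[s > 15]           # boolean mask keeps the matching values

print(one)
print(several)
print(label_slice)
print(mask)
```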


### Indexing in Series
**pandas provides the `index` attribute to get or set the index labels of a Series.**
- Example

In [34]:
import pandas as pd 
import numpy as np
arr=np.array(['a','b','c','d'])
s=pd.Series(arr,index=['first','second','third','fourth'])
print(s)
# To print only indexes in series 
print('\n indexes in Series are:::')
print(s.index)

first     a
second    b
third     c
fourth    d
dtype: object

 indexes in Series are:::
Index(['first', 'second', 'third', 'fourth'], dtype='object')
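
The example above only reads the index. A minimal sketch of setting it follows, with hypothetical replacement labels: assigning a list to `s.index` replaces all labels at once, and the list must have the same length as the Series.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'], index=['first', 'second', 'third', 'fourth'])

# Replace all labels at once (hypothetical new labels).
s.index = ['w', 'x', 'y', 'z']
print(s.index)

# A list of the wrong length is rejected with a ValueError.
try:
    s.index = ['only', 'three', 'labels']
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```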


## Slicing in Series
**Slicing is a way to retrieve subsets of data from a pandas object. The slice syntax is:**

**series_name[start : end : step]**

- `start` is the position of the first item, `end` is the position at which the slice stops (that item is not included), and `step` is the increment between items.
- Example

In [37]:
import numpy as np
import pandas as pd 
arr=np.array([10,15,18,22,55,77])
s=pd.Series(arr,index=['A','B','C','D','E','F'])
print(s)
print(s[1:5:2])
print('\n')
print(s[0:6:2])
print (s[:4])

A    10
B    15
C    18
D    22
E    55
F    77
dtype: int64
B    15
D    22
dtype: int64


A    10
C    18
E    55
dtype: int64
A    10
B    15
C    18
D    22
dtype: int64
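
Slices also accept negative values, as with Python lists. The sketch below reuses the Series from the example above: a negative `start` counts from the end, and a negative `step` walks backwards.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 55, 77], index=['A', 'B', 'C', 'D', 'E', 'F'])

last_two = s[-2:]       # the last two items
reversed_s = s[::-1]    # the whole Series in reverse order
every_third = s[::3]    # positions 0 and 3

print(last_two)
print(reversed_s)
print(every_third)
```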


# Pandas DataFrame

A **DataFrame** is a two-dimensional object that is useful in representing data in the form of **rows and columns**.  
It is similar to a **spreadsheet** or an **SQL table**.  
This is the most commonly used pandas object.  

Once we store the data into the DataFrame, we can perform various operations that are useful in analyzing and understanding the data.

---

## DataFrame Structure

| INDEX       | PLAYERNAME | IPLTEAM | BASEPRICEINCR |
|-------------|------------|---------|---------------|
| **0**       | ROHIT      | MI      | 13            |
| **1**       | VIRAT      | RCB     | 17            |
| **2**       | HARDIK     | MI      | 14            |

---

## Key Points about DataFrame

1. A DataFrame has **axes (indices)**:
   - Row index → `axis=0`
   - Column index → `axis=1`

2. It is similar to a **spreadsheet**:
   - Row index is called **Index**
   - Column index is called **Column Name**

3. A DataFrame contains **Heterogeneous data** (different data types).

4. A DataFrame **Size is Mutable** (you can add/remove rows/columns).

5. A DataFrame **Data is Mutable** (you can update/change values).
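
The key points above can be sketched on the small table from the DataFrame Structure section. The extra column name `DOUBLE` and the updated value 18 are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'PLAYERNAME': ['ROHIT', 'VIRAT', 'HARDIK'],
    'IPLTEAM': ['MI', 'RCB', 'MI'],
    'BASEPRICEINCR': [13, 17, 14],
})

# Heterogeneous data: text columns and a numeric column side by side.
print(df.dtypes)

# Size is mutable: add a column, then drop it again (axis=1 means columns).
df['DOUBLE'] = df['BASEPRICEINCR'] * 2
df = df.drop('DOUBLE', axis=1)

# axis=0 means rows: drop the row labeled 0 into a new DataFrame.
without_first_row = df.drop(0, axis=0)

# Data is mutable: update a single value by row label and column name.
df.loc[1, 'BASEPRICEINCR'] = 18

print(df)
print(without_first_row)
```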

---


**A DataFrame can be created from any of the following:**
1. Series
2. Lists
3. Dictionary
4. A numpy 2D array
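
Two of the sources listed above, lists and a NumPy 2D array, can be sketched as follows. The first DataFrame reuses the values from the structure table; the array values and the column names `x`, `y`, `z` are hypothetical.

```python
import numpy as np
import pandas as pd

# From a list of lists: each inner list is one row.
rows = [['ROHIT', 'MI', 13], ['VIRAT', 'RCB', 17], ['HARDIK', 'MI', 14]]
df_from_lists = pd.DataFrame(rows, columns=['PLAYERNAME', 'IPLTEAM', 'BASEPRICEINCR'])

# From a NumPy 2D array: each array row becomes a DataFrame row.
arr = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(arr, columns=['x', 'y', 'z'])  # hypothetical column names

print(df_from_lists)
print(df_from_array)
```

Without the `columns` argument, both DataFrames would get the default column names 0, 1, 2.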

In [38]:
# How to create an empty DataFrame

import pandas as pd
df=pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


In [40]:
# How to create a DataFrame from a Series
import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)
# the column gets the default name 0

   0
0  a
1  b
2  c
3  d


### DataFrame from Dictionary of Series
Example

In [42]:
import pandas as pd
name=pd.Series(['Hardik','Virat'])
team=pd.Series(['MI','RCB'])
dic={'Name':name,'Team':team}
df=pd.DataFrame(dic)
print(df)

     Name Team
0  Hardik   MI
1   Virat  RCB


## DataFrame from List of Dictionaries
Example-

In [43]:
import pandas as pd 
l=[{'Name':'Sachin','Sir_Name':'Tendulkar'},
  {'Name':'Aman','Sir_Name':'Patel'},
  {'Name':'Ananya','Sir_Name':'Verma'}]
df1=pd.DataFrame(l)
print(df1)

     Name   Sir_Name
0  Sachin  Tendulkar
1    Aman      Patel
2  Ananya      Verma


## Iteration on Rows and Columns
Iteration is used to access the data in a DataFrame row by row or column by column. pandas provides two methods for this:
1. `iterrows()`
2. `items()` (called `iteritems()` before pandas 2.0)

---
**iterrows()**
- It is used to access the data row wise.
- Example-

In [54]:
import pandas as pd 
l=[{'Name':'Sachin','Sir_Name':'Tendulkar'},
  {'Name':'Aman','Sir_Name':'Patel'},
  {'Name':'Ananya','Sir_Name':'Verma'}]
df1=pd.DataFrame(l)
print(df1)
for(row_index,row_value) in df1.iterrows():
    print('\n Row index is ::',row_index)
    print('Row value is::') 
    print(row_value)

     Name   Sir_Name
0  Sachin  Tendulkar
1    Aman      Patel
2  Ananya      Verma

 Row index is :: 0
Row value is::
Name           Sachin
Sir_Name    Tendulkar
Name: 0, dtype: object

 Row index is :: 1
Row value is::
Name         Aman
Sir_Name    Patel
Name: 1, dtype: object

 Row index is :: 2
Row value is::
Name        Ananya
Sir_Name     Verma
Name: 2, dtype: object
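
`iterrows()` builds a new Series for every row, which is convenient but can be slow on large DataFrames. As a sketch of one alternative that the text above does not cover, `itertuples()` yields each row as a namedtuple, with the row label in `Index` and the columns as attributes.

```python
import pandas as pd

df1 = pd.DataFrame([
    {'Name': 'Sachin', 'Sir_Name': 'Tendulkar'},
    {'Name': 'Aman', 'Sir_Name': 'Patel'},
    {'Name': 'Ananya', 'Sir_Name': 'Verma'},
])

full_names = []
for row in df1.itertuples():
    # row.Index is the row label; row.Name and row.Sir_Name are the column values.
    full_names.append(f"{row.Index}: {row.Name} {row.Sir_Name}")

print(full_names)
```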


### items() (formerly iteritems())
- It is used to access the data column by column.
- `.iteritems()` was deprecated in pandas 1.5 and removed in pandas 2.0; use `.items()` instead.
- Example

In [55]:
import pandas as pd 
l=[
  {'Name':'Aman','Sir_Name':'Patel'},
  {'Name':'Ananya','Sir_Name':'Verma'}
]
df2=pd.DataFrame(l)
print(df2)
for(col_name,col_value) in df2.items():
    print('\n')
    print('Column name is ::',col_name)
    print('Column values are::') 
    print(col_value)

     Name Sir_Name
0    Aman    Patel
1  Ananya    Verma


Column name is :: Name
Column values are::
0      Aman
1    Ananya
Name: Name, dtype: object


Column name is :: Sir_Name
Column values are::
0    Patel
1    Verma
Name: Sir_Name, dtype: object
