# **What Is Pandas and Why Use It?**
Pandas is a powerful Python library used for data manipulation and analysis. It provides easy-to-use data structures like Series and DataFrames, which help in handling tabular data (like spreadsheets or SQL tables). If you're working with structured data—whether it's from a CSV file, an Excel sheet, or a database—Pandas makes it easy to clean, explore, and process your data efficiently.

In [None]:
import pandas as pd

## Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.).

In [None]:
data1 = [5, 4]
index = ['a', 'b']
# data and index must have the same length; without an index,
# pandas assigns default integer labels (0, 1, 2, ...)
s = pd.Series(data1, index=index)

data2 = {'a': 1, 'b': 2}  # the keys 'a' and 'b' become the index
# If an index is also passed, values are looked up by label;
# labels missing from data2 ('c', 'd') get NaN
s = pd.Series(data2, index=['b', 'c', 'd', 'a'])
print(s.shape)       # (length,)
print(s.head(2))     # first 2 rows
print(s.tail(2))     # last 2 rows
s.info()             # prints a summary (dtype, non-null count) and returns None
print(s.describe())  # summary statistics of the values

(4,)
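
The label lookup that happens when a dict and an explicit index are combined can be checked directly. The snippet below is a minimal sketch that repeats the same construction under a new name (`s_aligned`, chosen here for illustration) and lists the labels that ended up without a value.

```python
import pandas as pd

data = {'a': 1, 'b': 2}
# 'c' and 'd' are not keys of data, so their values become NaN
s_aligned = pd.Series(data, index=['b', 'c', 'd', 'a'])

# Labels whose value is missing after alignment
missing_labels = s_aligned[s_aligned.isna()].index.tolist()

print(s_aligned)
print(missing_labels)
print(s_aligned.dtype)  # the integers are upcast to float so NaN can be stored
```

Because NaN is a floating-point value, the integer values from the dict are stored as floats in the resulting Series.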


## **DataFrames**
A DataFrame is a two-dimensional labeled data structure whose columns can have different types. You can think of it as a spreadsheet, a SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
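
To make the "dict of Series" picture concrete, here is a small sketch that builds a DataFrame from two Series. The names `prices`, `stock`, and `df_fruit` and their values are made up for illustration.

```python
import pandas as pd

# Two Series with overlapping, but not identical, labels
prices = pd.Series([1.5, 2.0], index=['apple', 'pear'])
stock = pd.Series([10, 0, 7], index=['apple', 'pear', 'plum'])

# Each dict key becomes a column; rows are aligned on the index labels
df_fruit = pd.DataFrame({'price': prices, 'stock': stock})

print(df_fruit)
print(df_fruit.dtypes)  # each column keeps its own dtype
```

The row for `'plum'` has no price, so that cell is NaN, while the `stock` column is complete.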

In [None]:
# Rows shorter than the column list are padded with NaN
df = pd.DataFrame([[1, 2, 3], [4, 5, 6, 10], [7, 9]],
                  columns=['a', 'b', 'c', 'z'],
                  index=['d', 'e', 'f'])
print(df.head(1))     # first row
print(df.tail(2))     # last 2 rows
print(df.shape)       # (rows, columns)
print(df.size)        # total number of elements (rows * columns)
df.info()             # prints a summary (dtypes, non-null counts) and returns None
print(df.describe())  # summary statistics of the numeric columns

### Handling Missing Values (NaN) in Pandas
Pandas provides several ways to deal with missing values (NaN).
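
Before filling or dropping anything, it often helps to see where the gaps are. The sketch below uses a small stand-in frame (`df_demo`, with made-up values) and the `isna()` method to count missing values per column and to select the rows that contain at least one NaN.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'a': [1, 4, 7],
                        'z': [np.nan, 10, np.nan]},
                       index=['d', 'e', 'f'])

# isna() gives a boolean mask; summing it counts the NaNs per column
nan_per_column = df_demo.isna().sum()

# any(axis=1) is True for rows that contain at least one NaN
rows_with_nan = df_demo[df_demo.isna().any(axis=1)]

print(nan_per_column)
print(rows_with_nan)
```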

In [None]:
# Replacing missing values with a specific value (fillna)
df_filled = df.fillna({'z': 5})  # replaces NaN with 5 in column 'z' only

# Filling missing values using forward fill (ffill)
# ffill propagates the last valid value forward into the NaNs that follow it
# axis=0: fill down each column (use axis=1 to fill across each row)
# limit=1: fill at most one consecutive NaN after each valid value
# fillna(method=...) is deprecated in recent pandas versions; use ffill()/bfill()
df_ffill = df.ffill(axis=0, limit=1)

# Filling missing values using backward fill (bfill)
# bfill fills each NaN with the next valid value below it;
# if there is no value below (e.g., in the last row), the NaN remains
df_bfill = df.bfill()

# Removing rows or columns with missing values (dropna)
# By default this removes every row that contains at least one NaN;
# to drop columns instead, use df.dropna(axis=1)
df_cleaned = df.dropna()
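
The differences between these strategies are easiest to see side by side on a single column. The following is a minimal sketch on a made-up frame (`df_gaps`) with two gaps: one of length two in the middle and one at the end.

```python
import numpy as np
import pandas as pd

df_gaps = pd.DataFrame({'z': [1.0, np.nan, np.nan, 4.0, np.nan]})

filled = df_gaps.fillna({'z': 5})['z'].tolist()   # every NaN becomes 5
forward = df_gaps.ffill(limit=1)['z'].tolist()    # only the first NaN of each gap is filled
backward = df_gaps.bfill()['z'].tolist()          # trailing NaN has nothing below it
dropped = df_gaps.dropna()['z'].tolist()          # rows with NaN are removed

print(filled)
print(forward)
print(backward)
print(dropped)
```

With `limit=1`, the second NaN of the middle gap stays missing after the forward fill, and the backward fill leaves the last row untouched because no valid value follows it.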
