# Introduction to Pandas
**`01-introduction.ipynb`**

Pandas is an **open-source Python library** for data manipulation and analysis.  
It is built on top of **NumPy** and provides **labeled data structures** and functions for working with structured, tabular data, such as data loaded from CSV files, Excel sheets, or SQL tables.

---



## Why Pandas?

- Handle **tabular data** with rows and columns efficiently.
- Easy **data cleaning, manipulation, and transformation**.
- Supports **time series data**, merging, joining, and aggregation.
- Well-integrated with other libraries like **NumPy, Matplotlib, and Scikit-learn**.
- Used widely in **Data Science, Machine Learning, and Analytics**.

---


## Key Features

- **Series:** 1D labeled array (like a single column in a spreadsheet or SQL table)
- **DataFrame:** 2D labeled data structure with rows and columns
- **Reading/Writing:** CSV, Excel, JSON, SQL, etc.
- **Data operations:** Filtering, grouping, merging, aggregation
- **Time Series support:** DateTime indexing, resampling, rolling windows
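
The data operations and time series features listed above can be sketched in a few lines. This is a minimal illustration, not a full tour: the `sales` and `managers` tables, their column names, and the dates are all invented for this example.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "units": [10, 5, 7, 2],
})

# Filtering: keep only the rows where more than 6 units were sold.
large_orders = sales[sales["units"] > 6]

# Grouping and aggregation: total units per region.
units_by_region = sales.groupby("region")["units"].sum()

# Merging: attach a manager to each row through the shared "region" column.
managers = pd.DataFrame({"region": ["East", "West"], "manager": ["Ana", "Raj"]})
sales_with_manager = sales.merge(managers, on="region", how="left")

# Time series: four daily readings resampled into two-day totals.
daily = pd.Series(
    [1, 2, 3, 4],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
two_day_totals = daily.resample("2D").sum()

print(large_orders)
print(units_by_region)
print(sales_with_manager)
print(two_day_totals)
```

Each operation returns a new object and leaves `sales` unchanged, which makes it easy to chain steps or inspect intermediate results.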

---


## Installation

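If Pandas is not installed, run the following in a notebook cell. The leading `!` passes the command to the shell; from a terminal, run `pip install pandas` without it.

```python
!pip install pandas
```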

---

## Importing Pandas

```python
import pandas as pd
import numpy as np  # NumPy is often used alongside Pandas
```


---


## Creating a Pandas Series

A **Series** is a 1-dimensional labeled array capable of holding data of any type.

```python
# Create a Series from a list
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)

# Series with custom index
series_custom_index = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series_custom_index)
```

```text
0    10
1    20
2    30
3    40
dtype: int64
a    10
b    20
c    30
d    40
dtype: int64
```
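
The labels on a Series are more than decoration: they drive both lookups and arithmetic. The sketch below rebuilds the custom-index Series so that it runs on its own; the second Series, `other`, is a made-up example used only to show label alignment.

```python
import pandas as pd

series_custom_index = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Label-based access with .loc, position-based access with .iloc.
by_label = series_custom_index.loc["b"]
by_position = series_custom_index.iloc[1]

# Arithmetic is vectorized: every element is doubled in one step.
doubled = series_custom_index * 2

# Operations between two Series align on labels, not on position.
# Labels "b" and "c" have no partner in `other`, so their result is NaN.
other = pd.Series([1, 2], index=["d", "a"])
aligned_sum = series_custom_index + other

print(by_label, by_position)
print(doubled)
print(aligned_sum)
```

Because unmatched labels produce `NaN`, the result of the addition is stored as floating point even though both inputs hold integers.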



---

## Creating a Pandas DataFrame

A **DataFrame** is a 2-dimensional labeled data structure with columns of potentially different types.


```python
# Create a DataFrame from a dictionary
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data_dict)
print(df)
```

```text
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
```
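
A dictionary of columns is only one way to build a DataFrame. As a small sketch, the same table can be built from a list of row dictionaries, and `dtypes` then shows that each column keeps its own type. The variable names `records` and `df_from_records` are chosen for this example.

```python
import pandas as pd

# The same data as above, written as one dictionary per row.
records = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 22, "City": "Chicago"},
]
df_from_records = pd.DataFrame(records)

# Each column carries its own dtype (integers for Age, text for the others).
print(df_from_records.dtypes)

# Selecting a single column returns a Series.
ages = df_from_records["Age"]
print(type(ages).__name__)
```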



---

## Inspecting DataFrames

```python
# View first few rows (up to 5 by default)
print(df.head())

# View last few rows (up to 5 by default)
print(df.tail())

# Basic information; info() prints its report itself and returns None,
# so it is called directly rather than wrapped in print()
df.info()

# Summary statistics (numeric columns)
print(df.describe())
```

```text
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   City    3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
             Age
count   3.000000
mean   25.666667
std     4.041452
min    22.000000
25%    23.500000
50%    25.000000
75%    27.500000
max    30.000000
```
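
With only three rows, `head()` and `tail()` return the whole table, so the difference between them is easy to miss. The following sketch passes an explicit row count and pulls out a few other quick facts about the frame. It rebuilds `df` so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# head(n) and tail(n) accept the number of rows to return.
first_two = df.head(2)
last_one = df.tail(1)

# shape is a (rows, columns) tuple; columns lists the column labels.
n_rows, n_cols = df.shape
column_names = df.columns.tolist()

# Individual statistics are available as methods on a column.
mean_age = df["Age"].mean()

print(first_two)
print(last_one)
print(n_rows, n_cols, column_names)
print(round(mean_age, 2))
```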


---



## Accessing Columns and Rows


```python
# Access a single column
print(df['Name'])

# Access multiple columns
print(df[['Name', 'City']])

# Access rows by index
print(df.iloc[0])  # First row
print(df.loc[1])   # Row with index 1
```

```text
0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
      Name         City
0    Alice     New York
1      Bob  Los Angeles
2  Charlie      Chicago
Name       Alice
Age           25
City    New York
Name: 0, dtype: object
Name            Bob
Age              30
City    Los Angeles
Name: 1, dtype: object
```
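
With the default integer index, `iloc` (position) and `loc` (label) can look interchangeable. The distinction becomes clearer once the row labels are not integers. The sketch below assumes, purely for illustration, that the names are used as the index, and it also shows selecting rows with a boolean condition.

```python
import pandas as pd

# Names are used as row labels here (an assumption made for this sketch).
df = pd.DataFrame(
    {"Age": [25, 30, 22], "City": ["New York", "Los Angeles", "Chicago"]},
    index=["Alice", "Bob", "Charlie"],
)

first_row = df.iloc[0]             # by position: the first row
bob_row = df.loc["Bob"]            # by label: the row labeled "Bob"
bob_city = df.loc["Bob", "City"]   # row label plus column label gives one value

# Boolean mask: keep only rows where Age is greater than 24.
over_24 = df[df["Age"] > 24]

print(first_row)
print(bob_row)
print(bob_city)
print(over_24)
```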



---


## Checking Missing Values

```python
# Introduce a missing value
df.loc[1, 'City'] = None
print(df)

# Check for missing values: element-wise flags, then a count per column
print(df.isnull())
print(df.isnull().sum())
```

```text
      Name  Age      City
0    Alice   25  New York
1      Bob   30      None
2  Charlie   22   Chicago
    Name    Age   City
0  False  False  False
1  False  False   True
2  False  False  False
Name    0
Age     0
City    1
dtype: int64
```
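
Once missing values are located, two common next steps are dropping the affected rows or filling in a placeholder. The sketch below rebuilds the frame with the missing city so that it runs on its own; the placeholder text `"Unknown"` is an arbitrary choice for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "City": ["New York", None, "Chicago"],
})

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill missing values in the City column with a placeholder.
filled = df.fillna({"City": "Unknown"})

print(dropped)
print(filled)
```

Both methods return a new DataFrame by default, so `df` still contains the missing value afterwards.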



---


## Summary

- Pandas is a core tool for **data analysis and manipulation** in Python.
- **Series** = 1D labeled data; **DataFrame** = 2D labeled data.
- Pandas provides **easy ways to inspect, access, and clean data**.
- Familiarity with Pandas is a common starting point for **Data Science and Machine Learning workflows**.

---