#### Introduction

**Who created Pandas and why?**
- *Wes McKinney*, a data scientist and software developer, created Pandas in 2008.

**Reason for Creating**
- While working at AQR Capital Management, Wes faced challenges analyzing large financial datasets.
- Existing tools like Excel were inefficient for large-scale data cleaning and analysis.

**What is Pandas?**

Pandas is a powerful, open-source Python library used for data manipulation, cleaning, and analysis. It provides two main data structures:

- **Series**: A one-dimensional labeled array.
- **DataFrame**: A two-dimensional labeled table (like an Excel sheet or SQL table).
---

##### Series - 1D Labeled Array
A Series is like a list with labels (indices).

In [1]:
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])
print(s)

0    10
1    20
2    30
3    40
4    50
dtype: int64


Notice the automatic index: 0, 1, 2, 3, 4. Pandas assigns this default integer index when none is given.

You can also define a custom index:

In [3]:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)

a    10
b    20
c    30
dtype: int64


A Pandas Series may look similar to a Python dictionary because both store data with labels, but a Series offers much more. Unlike a dictionary, a Series supports fast vectorized operations, aligns data on the index automatically during arithmetic, and represents missing data with *NaN*. It also allows both label-based and position-based access, and integrates seamlessly with the rest of Pandas, especially DataFrames. While a dictionary is great for simple key-value storage, a Series is better suited to data analysis and manipulation tasks where performance, flexibility, and built-in functionality matter.
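The differences from a dictionary can be sketched in a few lines. The variable names below (`prices`, `extra`, and so on) are illustrative only and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Two Series whose labels only partly overlap (illustrative data).
prices = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
extra = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Vectorized arithmetic: applied to every element without an explicit loop.
doubled = prices * 2

# Index alignment: values are matched by label, not by position.
# Labels found in only one Series ('a' and 'd') become NaN in the result.
total = prices + extra

# Label-based and position-based access to the same element.
by_label = prices.loc['b']
by_position = prices.iloc[1]

print(doubled)
print(total)
print(by_label, by_position)
```

Because the result of `prices + extra` contains NaN, its dtype becomes `float64` even though both inputs hold integers.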

---

##### DataFrame - 2D Labeled Table
A DataFrame is like a dictionary of Series: multiple labeled columns that share one row index.

In [4]:
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['Delhi', 'Mumbai', 'Bangalore']
}

df = pd.DataFrame(data)
print(df)

      name  age       city
0    Alice   25      Delhi
1      Bob   30     Mumbai
2  Charlie   35  Bangalore


Each column in a *DataFrame* is a *Series*.
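A quick way to check this is to select one column and inspect what comes back. This is a minimal sketch that rebuilds the same table so it runs on its own; `ages` and `mean_age` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['Delhi', 'Mumbai', 'Bangalore'],
})

# Selecting a single column with square brackets returns a Series.
ages = df['age']

print(isinstance(ages, pd.Series))   # the column is a Series object
print(ages.name)                     # the Series keeps the column label as its name
print(ages.index.equals(df.index))   # and shares the DataFrame's row index

# Series methods are therefore available on any column.
mean_age = ages.mean()
print(mean_age)
```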

---

##### Index and Labels
Every Series and DataFrame has an index, which helps with:
- Fast lookups
- Aligning data
- Merging and joining
- Time series operations

In [5]:
print(df.index)   # Row labels
print(df.columns) # Column labels

RangeIndex(start=0, stop=3, step=1)
Index(['name', 'age', 'city'], dtype='object')


You can replace both by assigning new labels. Each list must match the existing number of rows or columns:

In [6]:
df.index = ['a', 'b', 'c']
df.columns = ['Name', 'Age', 'City']

print(df)

      Name  Age       City
a    Alice   25      Delhi
b      Bob   30     Mumbai
c  Charlie   35  Bangalore
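Once rows carry meaningful labels, they can be looked up by label with `.loc`, while `.iloc` still works by position. The sketch below rebuilds the relabeled table so it is self-contained; `row_b`, `bob_city`, and `first_row` are illustrative names.

```python
import pandas as pd

# Same table as above, built directly with the custom row and column labels.
df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['Delhi', 'Mumbai', 'Bangalore'],
    },
    index=['a', 'b', 'c'],
)

# Label-based lookup: one full row, returned as a Series.
row_b = df.loc['b']

# Label-based lookup of a single cell: row label first, then column label.
bob_city = df.loc['b', 'City']

# Position-based lookup is unaffected by the custom labels.
first_row = df.iloc[0]

print(row_b)
print(bob_city)
print(first_row['Name'])
```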
