### ---------------------- CHAPTER 5 --------------------------

- Series -> single-column Excel sheet, but with labels
- DataFrames -> full-fledged Excel sheet, but supercharged
------
**SERIES (PYTHON LIST WITH SOME ATTITUDE)**
- Instead of asking where the second element is, you can directly ask what's under 'Ohio' 😉
- Just like NumPy arrays, you can slice, dice, and multiply a Series
- Think of it as a Python dictionary: the index plays the role of the keys and the data plays the role of the values
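
To make the dictionary analogy above concrete, here is a minimal sketch. The variable names (`s`, `by_label`, and so on) are made up for illustration, and the numbers are the same state figures used later in these notes.

```python
import pandas as pd

# Build a Series from a dict: keys become the index, values become the data
s = pd.Series({"Ohio": 35000, "Texas": 71000, "Oregon": 16000})

by_label = s["Texas"]      # label-based lookup, like a dictionary key
by_position = s.iloc[1]    # position-based lookup, like a list index
keys = list(s.index)       # the index plays the role of the dict keys
values = s.to_numpy()      # the data plays the role of the dict values
```

Both lookups land on the same element here; `.iloc` is used for the positional one so the intent stays explicit.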


In [1]:
import pandas as pd
import numpy as np

In [None]:
obj = pd.Series([1, 4, -6, 8])  # Default index: 0 to len-1
obj2 = pd.Series([1, 4, -5, 7], index=["a", "b", "c", "d"])  # Custom index
obj2.index = ["c", "d", "a", "b"]  # Rename index just because we can

obj2["b"]        # Access single value (7, because the index was relabeled above)
obj2[["a", "b", "d"]]  # Multiple row access like a boss

# NumPy-like operations (because pandas = numpy + swagger)
obj2[obj2 > 0]   # Filter positives
obj2 * 2         # Multiply everything by 2 (easy gains)
np.exp(obj2)     # Exponentials, no calculator needed

# Series works like a dictionary (but cooler)
"b" in obj2      # Membership test

# Create Series from a dictionary (upgrade complete)
sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)

obj3.to_dict()   # Back to dictionary, if you’re nostalgic

# Custom index (adds NaN where data's missing)
states = ["California", "Ohio", "Oregon", "Texas"]
obj4 = pd.Series(sdata, index=states)

pd.isna(obj4)    # True where data is missing
pd.notna(obj4)   # False where data is missing

# Automatic alignment on index when combining Series
obj3 + obj4

# Name your Series and index to feel important
obj3.index.name = "hello"
obj3.name = "world"
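
A small sketch of what the automatic alignment above actually produces. It rebuilds `obj3` and `obj4` so it runs on its own; the use of `Series.add` with `fill_value` is an extra beyond the original cell, shown only as one possible way to keep values that exist on just one side.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

# Addition matches values by label, not by position.
# The result index is the union of both indexes.
total = obj3 + obj4

# Labels with data on only one side (Utah) or on neither side (California)
# come out as NaN in a plain addition.
missing_labels = list(total[total.isna()].index)

# With fill_value=0, a value missing on one side is treated as 0,
# but a label missing on both sides still stays NaN.
filled = obj3.add(obj4, fill_value=0)
```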

**DATAFRAMES (DF)**

- A DF behaves like a dictionary of Series sharing the same index (it is 2D, but hierarchical indexing lets it represent higher-dimensional data).
- Assigning a Series to a column aligns it by index; missing matches → `NaN`.
- Transposing mixed-type columns results in `dtype=object`.
- Nested dictionary structure:
  - Outer keys → columns.
  - Inner keys → indexes (missing keys → `NaN`).
- Dot notation (`df.col`) **won’t work if**:
  - Column names have spaces or special characters.
  - Column names conflict with DataFrame methods.
- Assigning lists/arrays to columns requires matching length.
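
A quick sketch of the dot-notation and length rules from the list above. The tiny `df` and its `"net debt"` column are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "pop": [1.5, 1.7],
    "net debt": [0.1, 0.2],   # hypothetical column name containing a space
    "year": [2000, 2001],
})

# "pop" collides with the DataFrame.pop method, so df.pop is the method,
# not the column.
dot_gives_method = callable(df.pop)

# Bracket access always returns the column, whatever it is called.
pop_column = df["pop"]
debt_column = df["net debt"]   # df.net debt would be a syntax error

# Assigning a list whose length differs from the number of rows fails.
length_mismatch_raised = False
try:
    df["too_short"] = [1]      # 1 value for 2 rows
except ValueError:
    length_mismatch_raised = True
```

When in doubt, bracket access is the safer habit, since it works for every column name.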



In [None]:
# Creating a DataFrame from a dictionary
data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002, 2003],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]
}
frame = pd.DataFrame(data)  # keys become columns, values become data

# Viewing rows (head & tail)
frame.head()  # top 5 rows
frame.tail()  # bottom 5 rows

# Rearranging columns and adding new ones (with NaN)
pd.DataFrame(frame, columns=["pop", "year", "state"])
pd.DataFrame(frame, columns=["pop", "year", "state", "debt"])  # 'debt' doesn't exist yet → NaN

# Accessing columns
frame["pop"]  # Series access
frame.state   # dot notation (don't clash with method names!)

# Accessing rows
frame.loc[1]   # by label (index)
frame.iloc[2]  # by position (integer index)

# Assigning values to columns
frame["debt"] = 16.5  # broadcasts to the entire column
frame["pop"] = np.arange(6.)  # replaces 'pop' with array values (lengths must match)
val = pd.Series([12, 5, 6], index=["A", "B", "C"])  # Series with a custom index
frame["debt"] = val  # aligns by index; none of "A", "B", "C" match the labels 0-5, so every row → NaN

# Creating and deleting columns
frame["easter"] = frame["state"] == "Ohio"  # boolean condition as a new column
del frame["easter"]  # delete the column

# Nested dictionary → DataFrame
populations = {
    "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6},
    "Nevada": {2001: 2.4, 2002: 2.9}
}
pd.DataFrame(populations)  # outer keys → columns, inner keys → row index
pd.DataFrame(populations, index=[2001, 2002, 2003])  # custom index → NaN where missing

# Transposing the DataFrame
frame.T  # flip rows and columns

import pandas as pd

# Nested dictionary to DataFrame
populations = {
    "Gotham": {2020: 5.5, 2021: 6.0, 2022: 6.8},
    "Metropolis": {2021: 8.2, 2022: 8.5}
}
frame = pd.DataFrame(populations)  # Outer keys = columns, inner keys = index
# Missing data becomes NaN

# Dictionary of Series to DataFrame
pdata = {
    "Gotham": frame["Gotham"].iloc[:-1],        # Excludes last row (2022)
    "Metropolis": frame["Metropolis"].iloc[:2]  # First two rows (2020, 2021); 2020 is NaN for Metropolis
}
}
new_frame = pd.DataFrame(pdata)  # Indexes are aligned, missing values filled with NaN

# Naming index and columns
frame.index.name = "Year"        # Sets index name
frame.columns.name = "City"      # Sets columns name

# Convert DataFrame to NumPy array
raw_data = frame.to_numpy()      # Returns 2D ndarray without index/column labels

# DataFrame with mixed data types
crazy_data = pd.DataFrame({
    "Year": [2020, 2021, 2022],
    "City": ["Gotham", "Gotham", "Gotham"],
    "Population": [5.5, 6.0, 6.8],
    "Growth_Rate": [None, 0.1, 0.15]
})
crazy_numpy = crazy_data.to_numpy()  # dtype becomes object because of mixed types
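
Two of the DataFrame notes above (Series assignment aligns by index; transposing mixed-type columns gives `object`) are easier to see with labels that partly match. This is a sketch with made-up row labels (`"one"`, `"two"`, `"three"`, `"four"`), not data from the chapter.

```python
import pandas as pd

frame = pd.DataFrame(
    {"state": ["Ohio", "Ohio", "Nevada"], "pop": [1.5, 1.7, 2.4]},
    index=["one", "two", "three"],
)

# Only "two" exists in both indexes.
val = pd.Series([-1.2, -1.5], index=["two", "four"])
frame["debt"] = val
# "two" receives -1.2, "one" and "three" get NaN,
# and "four" is dropped because the frame has no such row.

# Transposing mixes strings and floats within each new column,
# so every column of the result falls back to dtype object.
flipped = frame.T
```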

**INDEX OBJECTS IN PANDAS**
- What is an Index? -> An Index holds the axis labels (row labels and column names) along with metadata such as the axis name.<br>
-----------INDEX METHODS & PROPERTIES SUMMARY-----------<br>
---------
**Set-like operations:**
- append(): Concatenate with additional Index objects
- difference(): Compute set difference as an Index
- intersection(): Compute set intersection
- union(): Compute set union
- isin(): Boolean array indicating whether each value is in another collection

**Element operations:**
- delete(): Compute a new Index with the element at a given position removed
- drop(): Compute a new Index by deleting the passed values
- insert(): Compute a new Index by inserting an element at a given position

**Properties:**
- is_monotonic_increasing: True if each element is greater than or equal to the previous one (the older `is_monotonic` alias was removed in pandas 2.0)
- is_unique: True if there are no duplicate values
- unique(): A method rather than a property; returns the unique values in the Index

**You may not directly interact with Index objects often BUT they show up in most pandas operations:**
- Merging
- Joining
- Aligning
- Reindexing
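
The methods summarized above can be tried on two tiny indexes. This is a sketch with throwaway labels; every call returns a new Index and leaves the original untouched.

```python
import pandas as pd

a = pd.Index(["a", "b", "c"])
b = pd.Index(["b", "c", "d"])

union = a.union(b)            # labels found in either index
common = a.intersection(b)    # labels found in both
only_a = a.difference(b)      # labels in a but not in b
appended = a.append(b)        # plain concatenation, duplicates kept
mask = a.isin(["a", "c"])     # boolean NumPy array, one entry per label

inserted = a.insert(1, "z")   # new Index with "z" at position 1
deleted = a.delete(0)         # new Index without position 0
dropped = a.drop(["b"])       # new Index without the label "b"

still_sorted = a.is_monotonic_increasing
appended_unique = appended.is_unique
```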



In [None]:
# Any array or sequence of labels passed as index= is converted into an Index internally.
obj = pd.Series(np.arange(3), index=["a", "b", "c"])
# Accessing Index
obj.index[1:]  # Index is accessed by position: pick one position, slice, or fancy-index (a label such as "a" does not work here)
# Immutable
# obj.index[1] = "d"  # left commented out so the cell runs; uncommenting it raises TypeError
# Example of sharing Index objects
labels = pd.Index(np.arange(3))  # typically Index([0, 1, 2], dtype='int64') in pandas 2.x; older versions display Int64Index
obj2 = pd.Series([1.5, -2.5, 0], index=labels)  # reuses labels as its index, so no conversion is needed
# INDEX OBJECT BEHAVIOR: ARRAY + SET HYBRID
# - Fixed-size set (supports membership checks)
# Unlike Python sets, Index objects CAN have duplicates!
dup_index = pd.Index(["foo", "foo", "bar", "bar"])
# Selections with duplicate labels return all occurrences.
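
A short sketch of the duplicate-label behaviour noted above, with invented values. Selecting a duplicated label returns a Series, while a unique label returns a single scalar.

```python
import pandas as pd

dup_index = pd.Index(["foo", "foo", "bar", "bar"])
dup = pd.Series([1, 2, 3, 4], index=dup_index)

foo_rows = dup["foo"]              # Series holding both "foo" entries
has_foo = "foo" in dup.index       # membership check still works
is_unique = dup.index.is_unique    # False because labels repeat

unique_series = pd.Series([10, 20], index=["x", "y"])
single_value = unique_series["x"]  # scalar, since "x" appears once
```

Because the return type depends on whether the label repeats, checking `is_unique` first can save some confusion.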