# Pandas DataFrame
**`03-dataframe.ipynb`**

A **DataFrame** is a 2-dimensional labeled data structure with **rows and columns**.  
It is the most commonly used Pandas object for working with structured data, similar to a table in Excel or SQL.

---

## Step 1: Import Libraries

```python
import pandas as pd
import numpy as np
```


---

## Step 2: Creating a DataFrame

### From a Python Dictionary


```python
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"]
}

df = pd.DataFrame(data)
print(df)
```
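Each dictionary key becomes a column label, and every list must have the same length because each list supplies one value per row. Here is a minimal sketch of both points. The mismatched dictionary `bad_data` is a hypothetical example built to trigger the error.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
}
df = pd.DataFrame(data)

# Dictionary keys become the column labels, in insertion order
print(list(df.columns))

# shape is (number of rows, number of columns)
print(df.shape)

# Hypothetical, deliberately mismatched lists: pandas cannot align them into rows
bad_data = {"Name": ["Alice", "Bob"], "Age": [25]}
try:
    pd.DataFrame(bad_data)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```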

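### From a List of Lists

The next two examples store their results in `df_list` and `df_np`. This keeps `df` pointing at the dictionary-based DataFrame that the remaining steps use.

```python
data = [
    [1, "Alice", 25],
    [2, "Bob", 30],
    [3, "Charlie", 22]
]

df_list = pd.DataFrame(data, columns=["ID", "Name", "Age"])
print(df_list)
```

### From a NumPy Array

```python
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_np = pd.DataFrame(arr, columns=["Column1", "Column2"])
print(df_np)
```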
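A NumPy array carries no row labels, so pandas assigns a default integer index (0, 1, 2, ...) unless you pass one. The sketch below supplies row labels through the `index=` parameter so that rows can be looked up by label. The labels "r1" to "r3" are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Without index=, rows are labelled 0, 1, 2
df_default = pd.DataFrame(arr, columns=["Column1", "Column2"])
print(df_default.index.tolist())

# With index=, rows get the labels we pass (illustrative labels)
df_labeled = pd.DataFrame(arr, columns=["Column1", "Column2"], index=["r1", "r2", "r3"])
print(df_labeled)

# Label-based lookup of a single value: row "r2", column "Column2"
print(df_labeled.loc["r2", "Column2"])
```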


---

## Step 3: Inspecting a DataFrame

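```python
# View the first rows (5 by default)
print(df.head())

# View the last rows (5 by default)
print(df.tail())

# Column data types and non-null counts
# (info() prints its own report and returns None, so it is not wrapped in print)
df.info()

# Summary statistics for numeric columns
print(df.describe())
```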
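`head()` and `tail()` accept a row count. By default, `describe()` summarizes only the numeric columns of a mixed DataFrame. Here is a short sketch on the dictionary data from Step 2.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# Ask for an explicit number of rows
first_two = df.head(2)
print(first_two)

# Data type of each column
print(df.dtypes)

# On mixed data, describe() reports only the numeric "Age" column
stats = df.describe()
print(stats)
print("Mean age:", stats.loc["mean", "Age"])
```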


---

## Step 4: Accessing Columns

```python
# Access a single column
print(df['Name'])

# Access multiple columns
print(df[['Name', 'Age']])
```
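The number of brackets decides the result type. A single label returns a one-dimensional `Series`, while a list of labels (even a one-item list) returns a `DataFrame`. Here is a quick check.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
})

name_series = df["Name"]   # single brackets -> Series
name_frame = df[["Name"]]  # double brackets -> DataFrame with one column

print(type(name_series).__name__)
print(type(name_frame).__name__)
print(name_frame.shape)
```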


---

## Step 5: Accessing Rows

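```python
# Access a row by integer position
print(df.iloc[0])  # first row

# Access a row by index label. With the default index the labels are 0, 1, 2, ...,
# so set a column as the index to look rows up by meaningful labels
df_custom = df.set_index('City')
print(df_custom.loc['Chicago'])  # row whose index label is 'Chicago'
```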
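The difference between `iloc` and `loc` only shows up once the index labels stop matching the positions. The sketch below uses hypothetical integer labels 10 to 40, so `loc[20]` and `iloc[0]` pick different rows.

```python
import pandas as pd

# Hypothetical non-default index labels
df_labels = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"], "Age": [25, 30, 22, 28]},
    index=[10, 20, 30, 40],
)

by_label = df_labels.loc[20]     # row whose index label is 20
by_position = df_labels.iloc[0]  # first row, whatever its label

print(by_label["Name"])
print(by_position["Name"])
```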


---

## Step 6: Accessing Rows and Columns Together

```python
# Single value: row label 0, column 'Name'
print(df.loc[0, 'Name'])

# Multiple rows and columns (loc slices include the end label, so rows 0, 1 and 2)
print(df.loc[0:2, ['Name', 'Age']])
```
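`loc` slices by label and includes the end label. `iloc` slices by position and excludes the end, just like Python lists. For a single cell, `.at` (by label) and `.iat` (by position) are the scalar-only counterparts. Here is a sketch on the Step 2 data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

label_slice = df.loc[0:2, ["Name", "Age"]]  # labels 0, 1 and 2 (end included)
position_slice = df.iloc[0:2, [0, 1]]       # positions 0 and 1 (end excluded)

print(len(label_slice), len(position_slice))

# Scalar access to one cell
print(df.at[1, "City"])  # by label
print(df.iat[1, 2])      # by position: row 1, column 2 ("City")
```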



---

## Step 7: Adding and Removing Columns

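```python
# Add a new column (the list needs exactly one value per row)
df['Score'] = [85, 90, 78, 92]
print(df)

# Remove a column (axis=1 means columns); drop returns a new DataFrame
df = df.drop('Score', axis=1)
print(df)
```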
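New columns are often computed from existing ones instead of being typed in by hand. Arithmetic on a column applies to every row at once. `assign()` does the same but returns a new DataFrame and leaves the original unchanged. The column names `AgeInMonths` and `IsSenior` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"], "Age": [25, 30, 22, 28]})

# Vectorized: every row's Age is multiplied in one operation (illustrative column)
df["AgeInMonths"] = df["Age"] * 12

# assign() returns a copy with the extra column; df itself is unchanged
df_flagged = df.assign(IsSenior=df["Age"] >= 28)

# drop(columns=...) is an alternative to drop(..., axis=1)
df = df.drop(columns="AgeInMonths")

print(df_flagged)
print(df.columns.tolist())
```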


---

## Step 8: Adding and Removing Rows


```python
# Add a new row with pd.concat (DataFrame.append was removed in pandas 2.0)
new_row = {"Name": "Eva", "Age": 26, "City": "Boston"}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)

# Remove a row by index label (axis=0 means rows)
df = df.drop(0, axis=0)
print(df)
```
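`pd.concat` can append several rows at once. After rows are dropped, the remaining index labels keep their old values, and `reset_index(drop=True)` renumbers them from 0. The rows for "Eva" and "Frank" are made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 22]})

# Append two made-up rows in one call
new_rows = pd.DataFrame([{"Name": "Eva", "Age": 26}, {"Name": "Frank", "Age": 33}])
df = pd.concat([df, new_rows], ignore_index=True)

# Drop rows by label; the gaps in the index remain
df = df.drop(index=[0, 2])
print(df.index.tolist())

# Renumber the index from 0 and discard the old labels
df = df.reset_index(drop=True)
print(df)
```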


---

## Step 9: Indexing and Selection

```python
# Set a column as index
df = df.set_index('Name')
print(df)

# Access a row using index label
print(df.loc['Bob'])

# Access multiple rows
print(df.loc[['Bob', 'Charlie']])
```
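`set_index()` returns a new DataFrame rather than changing the original. `reset_index()` moves the index back into an ordinary column. Here is a short round trip.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 22]})

df_by_name = df.set_index("Name")  # df itself still has the default index
print(df_by_name.loc["Charlie", "Age"])

# Move "Name" back from the index into a column
df_restored = df_by_name.reset_index()
print(df_restored.columns.tolist())
```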


---

## Step 10: Conditional Selection

```python
# Filter rows where Age > 25
print(df[df['Age'] > 25])

# Multiple conditions: wrap each one in parentheses and combine with & (and)
print(df[(df['Age'] > 25) & (df['City'] == 'Houston')])
```
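Besides `&` (and), boolean masks combine with `|` (or) and `~` (not), and `isin()` tests membership in a list of values. Python's `and`/`or` keywords do not work element-wise on a Series, which is why these operators are used instead. Here is a sketch on the Step 2 data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# OR: younger than 23 or living in Houston
young_or_houston = df[(df["Age"] < 23) | (df["City"] == "Houston")]

# Membership test against several values
ny_or_la = df[df["City"].isin(["New York", "Los Angeles"])]

# NOT: invert a mask with ~
not_ny_or_la = df[~df["City"].isin(["New York", "Los Angeles"])]

print(young_or_houston["Name"].tolist())
print(ny_or_la["Name"].tolist())
print(not_ny_or_la["Name"].tolist())
```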


---


## Step 11: Renaming Columns and Index

```python
# Rename columns
df = df.rename(columns={"Age": "Years"})
print(df)

# Rename index
df = df.rename(index={"Bob": "Bobby"})
print(df)
```
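`rename()` also accepts a function, which it applies to every label. Assigning a list to `df.columns` replaces all column names at once, and the list must contain one name per column. Here is a sketch. The new names `full_name` and `years` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Apply a function to every column label
df_lower = df.rename(columns=str.lower)
print(df_lower.columns.tolist())

# Replace all column names positionally (illustrative names)
df_lower.columns = ["full_name", "years"]
print(df_lower)
```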


---

## Step 12: Sorting Data

```python
# Sort by column
df_sorted = df.sort_values(by='Years', ascending=False)
print(df_sorted)

# Sort by index
df_sorted_index = df.sort_index()
print(df_sorted_index)
```
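`sort_values()` also takes a list of columns, with a matching list for `ascending`, so ties in the first key are broken by the second. The `Dept` column here is made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Dept": ["IT", "HR", "IT", "HR"],  # made-up department data
    "Years": [25, 30, 22, 28],
})

# Dept A-Z first, then highest Years first within each Dept
df_sorted = df.sort_values(by=["Dept", "Years"], ascending=[True, False])
print(df_sorted)
```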


---

## Step 13: Handling Missing Data

```python
# Introduce a missing value
df.loc['Charlie', 'City'] = None
print(df)

# Detect missing values (True marks a missing entry)
print(df.isnull())

# Option 1: drop rows that contain missing values (returns a new DataFrame)
df_clean = df.dropna()
print(df_clean)

# Option 2: fill missing values instead
df['City'] = df['City'].fillna("Unknown")
print(df)
```
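On larger tables it is more practical to count missing values per column. Both `fillna()` and `dropna()` can target specific columns: `fillna()` accepts a dictionary that maps each column to its fill value, and `dropna(subset=...)` only checks the listed columns. The gaps in this sketch are made-up.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in "Age" and "City"
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, np.nan, 22, 28],
    "City": ["New York", None, "Chicago", None],
})

# Number of missing values in each column
missing_counts = df.isnull().sum()
print(missing_counts)

# Drop rows only when "Age" is missing
df_age_known = df.dropna(subset=["Age"])

# Different fill value per column: mean age for "Age", a placeholder for "City"
df_filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})
print(df_filled)
```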



---


## Step 14: Real-World Example

```python
# Suppose we have employee data
employees = {
    "EmployeeID": [101, 102, 103, 104],
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "IT", "Finance", "IT"],
    "Salary": [50000, 60000, 55000, 65000]
}

df_emp = pd.DataFrame(employees)

# View first few rows
print(df_emp.head())

# Employees in IT department
print(df_emp[df_emp['Department'] == 'IT'])

# Average salary
print("Average Salary:", df_emp['Salary'].mean())
```
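The same filtering idea extends to per-group statistics. Selecting the `Salary` column with a boolean mask gives the IT average on its own. `groupby()` computes the mean for every department in one call, and `idxmax()` finds the row label of the highest salary. Here is a sketch on the employee data above.

```python
import pandas as pd

df_emp = pd.DataFrame({
    "EmployeeID": [101, 102, 103, 104],
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "IT", "Finance", "IT"],
    "Salary": [50000, 60000, 55000, 65000],
})

# Average salary of IT staff only: mask the rows, then pick the column
it_mean = df_emp.loc[df_emp["Department"] == "IT", "Salary"].mean()
print("IT average:", it_mean)

# Average salary per department (the result is indexed by department name)
dept_means = df_emp.groupby("Department")["Salary"].mean()
print(dept_means)

# Name of the highest-paid employee
top_earner = df_emp.loc[df_emp["Salary"].idxmax(), "Name"]
print("Top earner:", top_earner)
```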


---


## ✅ Summary

* A **DataFrame** is a 2D labeled data structure with rows and columns.
* Can be created from **lists, dictionaries, NumPy arrays, or other DataFrames**.
* Supports **indexing, selection, adding/removing rows/columns, conditional filtering**.
* Essential methods: `.head()`, `.tail()`, `.info()`, `.describe()`, `.isnull()`, `.fillna()`, `.drop()`.
* Indexing and conditional selection are the backbone of **data manipulation in Pandas**.

---