# Pandas DataFrame: The Core

In this notebook, we will explore the Pandas DataFrame, a versatile two-dimensional labeled data structure.

## Topics Covered:
- Understanding the DataFrame: a 2D labeled data structure
- Creating DataFrames (from lists, dictionaries, NumPy arrays)
- Indexing, selection, and slicing (`loc[]`, `iloc[]`, boolean indexing)
- Adding, renaming, and deleting columns
- Practical: Basic DataFrame manipulation

## Understanding DataFrame

A **DataFrame** in Pandas is a two-dimensional, size-mutable, and heterogeneous data structure with labeled axes (rows and columns).

### Key Features:
- Think of it as a spreadsheet or SQL table.
- Each column can hold a different data type.
- Provides built-in methods for filtering, sorting, grouping, merging, and summarizing data.
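The labeled axes and per-column types can be inspected directly. The sketch below uses a small made-up DataFrame named `people` (a hypothetical name, chosen so it does not overwrite the `df` used later) and prints its shape, row labels, column labels, and the dtype of each column.

```python
import pandas as pd

# Made-up example data: three columns, each with its own data type
people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35], "Height": [1.65, 1.80, 1.75]}
)

print(people.shape)    # (3, 3): (rows, columns)
print(people.index)    # row labels: a default RangeIndex 0..2
print(people.columns)  # column labels
print(people.dtypes)   # one dtype per column (integers for Age, floats for Height)
```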

### When to Use DataFrames:
- To handle structured data with rows and columns.
- To perform complex operations like grouping, filtering, and joining.
- To clean, transform, and analyze tabular datasets.
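As a brief preview of filtering, grouping, and joining, here is a minimal sketch on made-up data (`orders` and `regions` are hypothetical tables). It keeps orders above a threshold, totals them per city with `groupby`, and attaches a region to each city with `merge`.

```python
import pandas as pd

# Made-up example data, for illustration only
orders = pd.DataFrame(
    {"City": ["New York", "Chicago", "New York", "Chicago"], "Amount": [120, 80, 200, 50]}
)
regions = pd.DataFrame({"City": ["New York", "Chicago"], "Region": ["East", "Midwest"]})

# Filtering: keep orders with an amount above 60
large = orders[orders["Amount"] > 60]

# Grouping: total amount per city (as_index=False keeps City as a regular column)
totals = large.groupby("City", as_index=False)["Amount"].sum()

# Joining: attach each city's region, keeping every row of `totals`
report = totals.merge(regions, on="City", how="left")
print(report)
```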

```python
# Importing Pandas
import pandas as pd

# Example of a DataFrame built from a dictionary: keys become column labels,
# and each list holds that column's values
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
}
df = pd.DataFrame(data)
print(df)
```

## Creating DataFrames

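You can create a DataFrame from various sources, including lists, dictionaries, and NumPy arrays. The first example above used a dictionary of lists, where each key becomes a column label and each list holds that column's values.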
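A dictionary can also describe a single row. In the minimal sketch below (with made-up records), a list of dictionaries is passed to `pd.DataFrame`: the keys become column labels, and a key missing from one record becomes a missing value (`NaN`) in that row.

```python
import pandas as pd

# Each dictionary is one row; keys become column labels
records = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30, "City": "Chicago"},
]
df_from_records = pd.DataFrame(records)

# Alice has no "City" key, so her City value is NaN
print(df_from_records)
```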

### Creating a DataFrame from a List of Lists
Each sublist represents a row, and the column labels are passed through the `columns` argument.

```python
# Creating a DataFrame from a list of lists
data = [["John", 28, "HR"], ["Anna", 24, "Finance"], ["Peter", 29, "IT"]]
columns = ["Name", "Age", "Department"]
df_from_list = pd.DataFrame(data, columns=columns)
print(df_from_list)
```
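The `columns` argument is optional, and row labels can be set the same way. The sketch below shows the default integer column labels that pandas assigns when `columns` is omitted. It also uses the constructor's `index` argument to attach custom row labels. The IDs `"e1"`, `"e2"`, `"e3"` are made up for illustration.

```python
import pandas as pd

data = [["John", 28, "HR"], ["Anna", 24, "Finance"], ["Peter", 29, "IT"]]

# Without `columns`, pandas labels the columns 0, 1, 2
df_default = pd.DataFrame(data)
print(df_default.columns.tolist())  # [0, 1, 2]

# `index` sets custom row labels (hypothetical employee IDs)
df_labeled = pd.DataFrame(data, columns=["Name", "Age", "Department"], index=["e1", "e2", "e3"])
print(df_labeled.loc["e2"])  # the row labeled "e2" (Anna)
```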

### Creating a DataFrame from a NumPy Array

This approach is useful for numerical data already stored in arrays. Because a NumPy array has a single dtype, every column of the resulting DataFrame shares that dtype.

```python
# Creating a DataFrame from a NumPy array
import numpy as np

array_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
columns = ["A", "B", "C"]
df_from_array = pd.DataFrame(array_data, columns=columns)
print(df_from_array)
```
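Here is a short sketch of two practical details. The first is that all columns inherit the array's single dtype, and `to_numpy()` converts back to an array. The second is that the number of column labels must match the array's second dimension; otherwise pandas raises a `ValueError`.

```python
import numpy as np
import pandas as pd

array_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df_from_array = pd.DataFrame(array_data, columns=["A", "B", "C"])

# Every column has the same dtype as the source array
print(df_from_array.dtypes)

# Round trip back to a NumPy array
back = df_from_array.to_numpy()

# Two labels for three columns: pandas rejects the shape mismatch
try:
    pd.DataFrame(array_data, columns=["A", "B"])
except ValueError as err:
    print("Shape mismatch:", err)
```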

## Indexing, Selection, and Slicing

Pandas provides several methods to access rows and columns in a DataFrame.

### Methods for Indexing and Selection:
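1. **`.loc[]`**: Label-based indexing, using row and column labels. Slices include the end label.
2. **`.iloc[]`**: Position-based indexing, using integer positions. Slices exclude the end position, as in standard Python slicing.
3. **Boolean Indexing**: Filter rows with a True/False mask, usually built from a condition on a column.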
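With the default 0, 1, 2 index, labels and positions coincide, which can hide the difference between `.loc[]` and `.iloc[]`. The sketch below gives a made-up DataFrame (named `staff` so it does not overwrite `df`) the row labels 10, 20, 30, making the two accessors visibly diverge.

```python
import pandas as pd

# Row labels deliberately differ from positions
staff = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}, index=[10, 20, 30])

print(staff.loc[20, "Name"])  # label 20 -> Bob
print(staff.iloc[0, 0])       # position (0, 0) -> Alice

# .loc slices include the end label; .iloc slices exclude the end position
print(staff.loc[10:20])  # rows labeled 10 and 20
print(staff.iloc[0:1])   # only the row at position 0
```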

```python
# Using .loc[] for label-based indexing
print(df.loc[0])  # Row with index label 0 (the first row, since the default index starts at 0)
print(df.loc[:, "Name"])  # All rows of the 'Name' column

# Using .iloc[] for position-based indexing
print(df.iloc[1])  # Row at position 1 (the second row)
print(df.iloc[:, 1])  # Column at position 1 (the second column, 'Age')

# Boolean indexing
print(df[df["Age"] > 28])  # Rows where 'Age' is greater than 28
```
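Boolean masks can be combined, and `.loc[]` can pair a mask with column labels. The sketch below rebuilds the example data under the hypothetical name `people`. It combines conditions with `&` (use `|` for "or" and `~` for "not"), wraps each condition in parentheses because these operators bind more tightly than comparisons, and uses `isin()` for membership tests.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35], "City": ["New York", "Los Angeles", "Chicago"]}
)

# Both conditions must hold; each one is wrapped in parentheses
older_not_in_chicago = people[(people["Age"] > 28) & (people["City"] != "Chicago")]
print(older_not_in_chicago)  # only Bob

# isin() keeps rows whose value appears in the given list
coastal = people[people["City"].isin(["New York", "Los Angeles"])]

# .loc takes a boolean mask for rows and a label for the column
names_over_28 = people.loc[people["Age"] > 28, "Name"]
print(names_over_28)
```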

## Adding, Renaming, and Deleting Columns

DataFrames are mutable, allowing you to modify their structure by adding, renaming, or deleting columns.

### Adding a Column:
Assign values to a new column label, e.g. `df["Salary"] = [...]`. A list must contain exactly one value per row.
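Column assignment accepts more than a list. In the sketch below (made-up data under the hypothetical name `people`), a single scalar is broadcast to every row, and a new column is computed element-wise from an existing one. A list of the wrong length raises a `ValueError` and leaves the DataFrame unchanged.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# A scalar is broadcast to every row
people["Country"] = "USA"

# A column computed element-wise from an existing column
people["Age in Months"] = people["Age"] * 12

# Two values for three rows: pandas raises ValueError
try:
    people["Score"] = [1, 2]
except ValueError as err:
    print("Length mismatch:", err)

print(people)
```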

### Renaming Columns:
Use the `rename` method with a `columns` mapping of old names to new names.

### Deleting Columns:
Use the `drop` method with `columns=[...]` to remove one or more columns.

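```python
# Adding a new column (one value per row)
df["Salary"] = [60000, 70000, 80000]
print(df)

# Renaming columns; inplace=True modifies df directly instead of returning a new DataFrame
df.rename(columns={"Name": "Employee Name"}, inplace=True)
print(df)

# Deleting a column, again modifying df in place
df.drop(columns=["City"], inplace=True)
print(df)
```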
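As an alternative to `inplace=True`, `rename()` and `drop()` return a new DataFrame, and `assign()` does the same for added columns. This allows the three steps to be chained while the original stays untouched. The sketch below uses the hypothetical names `employees` and `cleaned`. It also shows `insert()`, which places a column at a specific position and, unlike the chained methods, modifies the DataFrame in place.

```python
import pandas as pd

employees = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35], "City": ["New York", "Los Angeles", "Chicago"]}
)

# Each method returns a new DataFrame, so the calls can be chained
cleaned = (
    employees
    .assign(Salary=[60000, 70000, 80000])       # add a column
    .rename(columns={"Name": "Employee Name"})  # rename a column
    .drop(columns=["City"])                     # delete a column
)
print(cleaned)
print(employees.columns.tolist())  # the original still has Name, Age, City

# insert() adds a column at position 1 and modifies employees in place
employees.insert(1, "ID", [101, 102, 103])
print(employees)
```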

## Practical: Basic DataFrame Manipulation

Let’s create a DataFrame representing sales data and perform some operations.

### Task:
1. Create a DataFrame for sales data.
2. Add a new column for total sales.
3. Filter rows where total sales exceed $3000.

```python
# Creating a sales DataFrame
sales_data = {"Product": ["Laptop", "Phone", "Tablet"], "Price": [1000, 500, 300], "Quantity": [5, 10, 7]}
sales_df = pd.DataFrame(sales_data)
print(sales_df)

# Adding a new column for total sales
sales_df["Total Sales"] = sales_df["Price"] * sales_df["Quantity"]
print(sales_df)

# Filtering products with total sales greater than $3000
print(sales_df[sales_df["Total Sales"] > 3000])
```
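A few natural follow-ups on the same sales data are sketched below. The first uses `.loc[]` to filter rows and select only the relevant columns in one step. The second ranks products with `sort_values`, and the third sums a column. Laptop and Phone both total 5000, so their relative order after sorting is not guaranteed.

```python
import pandas as pd

sales_df = pd.DataFrame({"Product": ["Laptop", "Phone", "Tablet"], "Price": [1000, 500, 300], "Quantity": [5, 10, 7]})
sales_df["Total Sales"] = sales_df["Price"] * sales_df["Quantity"]

# Rows above the threshold, keeping only the product name and total
top = sales_df.loc[sales_df["Total Sales"] > 3000, ["Product", "Total Sales"]]
print(top)

# All products ranked by total sales, highest first
ranked = sales_df.sort_values("Total Sales", ascending=False)
print(ranked)

# Grand total across all products
grand_total = sales_df["Total Sales"].sum()
print(grand_total)  # 12100
```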