# **Pandas DataFrame**

In [1]:
import pandas as pd

## **Table of Contents**
1. [What is a Pandas DataFrame?](#1-what-is-a-pandas-dataframe)
2. [Creating a DataFrame](#2-creating-a-dataframe)
   - [2.1: Creating a DataFrame from a Dictionary](#21-creating-a-dataframe-from-a-dictionary)
   - [2.2: Creating a DataFrame from a List of Dictionaries](#22-creating-a-dataframe-from-a-list-of-dictionaries)
   - [2.3: Creating a DataFrame from a CSV File](#23-creating-a-dataframe-from-a-csv-file)
3. [Accessing Data in a DataFrame](#3-accessing-data-in-a-dataframe)
   - [3.1: Accessing Columns](#31-accessing-columns)
   - [3.2: Accessing Rows](#32-accessing-rows)
   - [3.3: Slicing a DataFrame](#33-slicing-a-dataframe)
4. [DataFrame Operations](#4-dataframe-operations)
   - [4.1: Adding and Deleting Columns](#41-adding-and-deleting-columns)
   - [4.2: Filtering Data](#42-filtering-data)
   - [4.3: Arithmetic Operations](#43-arithmetic-operations)
5. [DataFrame Attributes and Methods](#5-dataframe-attributes-and-methods)
   - [5.1: Attributes](#51-attributes)
   - [5.2: Methods](#52-methods)
6. [Advanced Filtering Techniques](#6-advanced-filtering-techniques)
   - [6.1: `query()` Method](#61-query-method)
   - [6.2: `where()` Method](#62-where-method)
   - [6.3: `isin()` Method](#63-isin-method)

---


## **1. What is a Pandas DataFrame?**

**DataFrame**: A DataFrame is a two-dimensional labeled data structure, similar to a spreadsheet or SQL table. It consists of rows and columns; each column has a single dtype, and different columns can have different dtypes.

**Key Features:**
- **Rows and Columns**: Data is organized in rows and columns.
- **Index**: Each row has a label (index), and each column has a name.
- **Heterogeneous Data**: Columns can contain different data types (e.g., integers, strings, floats).
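
The features listed above can be inspected directly on a small frame. The following is a minimal sketch; the `people` frame and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
people = pd.DataFrame(
    {
        "Name": ["Alice", "Bob"],
        "Age": [25, 30],
        "Height": [1.65, 1.80],
    }
)

row_labels = list(people.index)      # default row labels: [0, 1]
column_names = list(people.columns)  # ['Name', 'Age', 'Height']
column_dtypes = people.dtypes        # one dtype per column

print(row_labels)
print(column_names)
print(column_dtypes)
```

Each column reports its own dtype, which is what "heterogeneous" means here: the types differ between columns, not within one.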

---


## **2. Creating a DataFrame**



### **2.1: Creating a DataFrame from a Dictionary**



You can create a DataFrame from a dictionary where keys are column names and values are lists (or arrays) representing the data.


In [2]:
# Create a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
}

df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,City
0,Alice,25,New York
1,Bob,30,Los Angeles
2,Charlie,35,Chicago
3,David,40,Houston



---



### **2.2: Creating a DataFrame from a List of Dictionaries**


In [3]:
# Create a DataFrame from a list of dictionaries
data = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"},
    {"Name": "David", "Age": 40, "City": "Houston"},
]

df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,City
0,Alice,25,New York
1,Bob,30,Los Angeles
2,Charlie,35,Chicago
3,David,40,Houston



---



### **2.3: Creating a DataFrame from a CSV File**


Load data from a CSV file into a DataFrame using `pd.read_csv()`.


In [4]:
# Load data from a CSV file
# df = pd.read_csv("data.csv")
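
Because the cell above depends on a file that may not exist on your machine, here is a self-contained sketch that reads the same kind of data from an in-memory string instead. The CSV text is made up for illustration; `io.StringIO` simply stands in for a file on disk.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file; the contents are made up for illustration
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,Los Angeles\n"

# read_csv accepts a path or any file-like object
csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)

# index_col uses the named column as the row labels instead of 0, 1, ...
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col="Name")
print(indexed_df.loc["Bob", "Age"])
```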


---



## **3. Accessing Data in a DataFrame**



### **3.1: Accessing Columns**


In [5]:
# Access the "Name" column
df["Name"]

0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object

In [6]:
# Access multiple columns
df[["Name", "City"]]

Unnamed: 0,Name,City
0,Alice,New York
1,Bob,Los Angeles
2,Charlie,Chicago
3,David,Houston



---


### **3.2: Accessing Rows**


You can access rows using `loc` (label-based) or `iloc` (position-based).


The general syntax for `loc` and `iloc` is:

```python
df.loc[row_indexer, column_indexer]
df.iloc[row_indexer, column_indexer]
```

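- `row_indexer`: Specifies the rows to select: a single value, a list, a slice, or a boolean array. `loc` interprets the values as row labels, while `iloc` interprets them as integer positions.

- `column_indexer`: Specifies the columns to select, with the same options (column labels for `loc`, integer positions for `iloc`). It is optional; omitting it selects all columns.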
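
One difference between the two indexers is easy to miss: a `loc` slice includes its end label, while an `iloc` slice excludes its end position. A minimal sketch, using a hypothetical `staff` frame:

```python
import pandas as pd

# Hypothetical frame with string row labels
staff = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["ID1", "ID2", "ID3"],
)

# Label slice: the end label "ID2" is included
by_label = staff.loc["ID1":"ID2", "Name"]

# Position slice: the end position 2 is excluded
by_position = staff.iloc[0:2, 0]

# Boolean mask for rows combined with a list of column labels
mask_rows = staff.loc[staff["Age"] > 25, ["Name"]]

print(by_label)
print(by_position)
print(mask_rows)
```

Both slices return the same two rows here, even though one is written `"ID1":"ID2"` and the other `0:2`.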


In [7]:
# Create a dictionary for the DataFrame
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}

# Create a DataFrame with custom index
df = pd.DataFrame(data, index=["ID1", "ID2", "ID3"])
df

Unnamed: 0,Name,Age
ID1,Alice,25
ID2,Bob,30
ID3,Charlie,35


In [8]:
# Access a row by label
df.loc["ID1"]

Name    Alice
Age        25
Name: ID1, dtype: object

In [9]:
# Access multiple rows by label
df.loc[["ID1", "ID3"]]

Unnamed: 0,Name,Age
ID1,Alice,25
ID3,Charlie,35


In [10]:
# Access rows based on a condition (all columns are returned)
df.loc[df["Age"] >= 30]

Unnamed: 0,Name,Age
ID2,Bob,30
ID3,Charlie,35


In [11]:
# Access a single value by row label and column name
print(df.loc["ID1", "Age"])

25


In [12]:
# Access a row by position
df.iloc[0]

Name    Alice
Age        25
Name: ID1, dtype: object

In [13]:
# Access the first two rows and all columns using iloc
df.iloc[:2]

Unnamed: 0,Name,Age
ID1,Alice,25
ID2,Bob,30


In [14]:
# Access specific rows and columns by position using iloc
df.iloc[[0, 2], [0, 1]]

Unnamed: 0,Name,Age
ID1,Alice,25
ID3,Charlie,35



---



### **3.3: Slicing a DataFrame**


In [15]:
# Create a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
}

df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,City
0,Alice,25,New York
1,Bob,30,Los Angeles
2,Charlie,35,Chicago
3,David,40,Houston


In [16]:
# Slice rows by position (the end position, 3, is excluded)
df.iloc[1:3]

Unnamed: 0,Name,Age,City
1,Bob,30,Los Angeles
2,Charlie,35,Chicago


In [17]:
# Select specific rows and a single column by label (lists, not slices)
df.loc[[0, 2], ["Name"]]

Unnamed: 0,Name
0,Alice
2,Charlie


In [18]:
# Select specific rows and multiple columns by label
df.loc[[1, 3], ["Name", "Age"]]

Unnamed: 0,Name,Age
1,Bob,30
3,David,40



---



## **4. DataFrame Operations**


### **4.1: Adding and Deleting Columns**


In [19]:
# Add a new column
df["Salary"] = [50000, 60000, 70000, 80000]
df

Unnamed: 0,Name,Age,City,Salary
0,Alice,25,New York,50000
1,Bob,30,Los Angeles,60000
2,Charlie,35,Chicago,70000
3,David,40,Houston,80000


In [20]:
# Delete a column (inplace=False returns a new DataFrame; df itself is unchanged)
df.drop("Salary", axis=1, inplace=False)

Unnamed: 0,Name,Age,City
0,Alice,25,New York
1,Bob,30,Los Angeles
2,Charlie,35,Chicago
3,David,40,Houston
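
Since `drop()` returns a new DataFrame by default, the result has to be assigned to a name if you want to keep it. A minimal sketch with a hypothetical `employees` frame:

```python
import pandas as pd

# Hypothetical frame for illustration
employees = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [50000, 60000]})

# drop() returns a new DataFrame; columns="Salary" is equivalent to ("Salary", axis=1)
without_salary = employees.drop(columns="Salary")

print(list(employees.columns))       # the original still has "Salary"
print(list(without_salary.columns))  # the returned copy does not
```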


---


### **4.2: Filtering Data**


In [21]:
# Filter rows where Age > 30
df[df["Age"] > 30]

Unnamed: 0,Name,Age,City,Salary
2,Charlie,35,Chicago,70000
3,David,40,Houston,80000
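
Several conditions can be combined in one filter. The sketch below rebuilds a small hypothetical frame so it runs on its own. Each condition needs its own parentheses, and the element-wise operators `&`, `|`, and `~` replace Python's `and`, `or`, and `not`.

```python
import pandas as pd

# Hypothetical frame for illustration
people = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Age": [25, 30, 35, 40],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"],
    }
)

# OR: rows where either condition holds
older_or_ny = people[(people["Age"] > 30) | (people["City"] == "New York")]

# AND: rows where both conditions hold
between = people[(people["Age"] >= 30) & (people["Age"] < 40)]

print(older_or_ny)
print(between)
```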


### **4.3: Arithmetic Operations**


In [22]:
# Add 5 to the "Age" column
df["Age"] = df["Age"] + 5
df

Unnamed: 0,Name,Age,City,Salary
0,Alice,30,New York,50000
1,Bob,35,Los Angeles,60000
2,Charlie,40,Chicago,70000
3,David,45,Houston,80000
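
Arithmetic also works between columns, element by element along matching row labels. A minimal sketch; the `pay` frame and its `Bonus` column are hypothetical:

```python
import pandas as pd

# Hypothetical frame for illustration
pay = pd.DataFrame({"Salary": [50000, 60000], "Bonus": [5000, 3000]})

# Column + column: values are added row by row
pay["Total"] = pay["Salary"] + pay["Bonus"]

# Column / scalar: the scalar is applied to every row
pay["Monthly"] = pay["Salary"] / 12

print(pay)
```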



---



## **5. DataFrame Attributes and Methods**


### **5.1: Attributes**


- `df.shape`: Returns the number of rows and columns.
- `df.columns`: Returns the column names.
- `df.index`: Returns the row labels.


In [23]:
df.shape

(4, 4)

In [24]:
df.columns

Index(['Name', 'Age', 'City', 'Salary'], dtype='object')

In [34]:
df.index

RangeIndex(start=0, stop=4, step=1)

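### **5.2: Methods**

- `df.info()`: Prints a concise summary: index range, column names, non-null counts, dtypes, and memory usage.
- `df.head(n)`: Returns the first `n` rows.
- `df.tail(n)`: Returns the last `n` rows.
- `df.describe()`: Provides summary statistics for numerical columns.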


In [25]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     4 non-null      int64 
 2   City    4 non-null      object
 3   Salary  4 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 260.0+ bytes


In [26]:
df.head(2)

Unnamed: 0,Name,Age,City,Salary
0,Alice,30,New York,50000
1,Bob,35,Los Angeles,60000


In [27]:
df.tail(2)

Unnamed: 0,Name,Age,City,Salary
2,Charlie,40,Chicago,70000
3,David,45,Houston,80000


In [28]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Age,4.0,37.5,6.454972,30.0,33.75,37.5,41.25,45.0
Salary,4.0,65000.0,12909.944487,50000.0,57500.0,65000.0,72500.0,80000.0


---


## **6. Advanced Filtering Techniques**


### **6.1: `query()` Method**


The [`query()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html) method filters rows using a boolean expression written as a string, in which column names can be used directly as variables.

In [29]:
# Create a DataFrame
data = {
    "Value1": [10, 20, 30, 40],
    "Value2": [5, 15, 25, 35],
}
df = pd.DataFrame(data)
df

Unnamed: 0,Value1,Value2
0,10,5
1,20,15
2,30,25
3,40,35


In [30]:
# Using query() to filter data
df.query("Value1 > 20 and Value2 == 35")

Unnamed: 0,Value1,Value2
3,40,35
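
Inside a query string, a Python variable from the surrounding scope can be referenced with an `@` prefix. The sketch below uses a hypothetical `threshold` variable and checks that the query matches the equivalent boolean-mask filter.

```python
import pandas as pd

values = pd.DataFrame({"Value1": [10, 20, 30, 40], "Value2": [5, 15, 25, 35]})

threshold = 20  # hypothetical local variable, referenced as @threshold below

# Query string: column names are used directly, local variables need "@"
via_query = values.query("Value1 > @threshold and Value2 == 35")

# The same filter written with a boolean mask
via_mask = values[(values["Value1"] > threshold) & (values["Value2"] == 35)]

print(via_query)
print(via_query.equals(via_mask))
```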


### **6.2: `where()` Method**



The `where()` method returns a DataFrame with the same shape as the original, keeping values where the condition is `True` and replacing the rest with `NaN` (or with the value passed as `other`).

In [31]:
# Using where() to filter data conditionally
df.where(df["Value1"] > 20, other=pd.NA)

Unnamed: 0,Value1,Value2
0,,
1,,
2,30.0,25.0
3,40.0,35.0
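
To make the contrast with ordinary filtering concrete, here is a minimal sketch that applies the same condition three ways. It rebuilds the frame under the hypothetical name `values` so it runs on its own.

```python
import pandas as pd

values = pd.DataFrame({"Value1": [10, 20, 30, 40], "Value2": [5, 15, 25, 35]})
cond = values["Value1"] > 20

# where(): same shape as the input, NaN in rows that fail the condition
masked = values.where(cond)

# Boolean indexing: rows that fail the condition are dropped
filtered = values[cond]

# where() with a replacement value instead of NaN
replaced = values.where(cond, other=0)

print(masked.shape, filtered.shape)
print(replaced)
```

With the default `other`, integer columns are upcast to float because `NaN` is a floating-point value.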


### **6.3: `isin()` Method**


The `isin()` method returns a boolean mask that is `True` where a value appears in the given list; the mask can then be used to filter rows.


In [32]:
# Create a DataFrame
data = {"Category": ["A", "B", "C", "A", "B", "C"]}
df = pd.DataFrame(data)
df

Unnamed: 0,Category
0,A
1,B
2,C
3,A
4,B
5,C


In [33]:
# Filtering using isin() method
df[df["Category"].isin(["A", "C"])]

Unnamed: 0,Category
0,A
2,C
3,A
5,C
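
The mask returned by `isin()` can be inverted with `~` to exclude the listed values instead. A minimal sketch, using the hypothetical names `items` and `wanted`:

```python
import pandas as pd

items = pd.DataFrame({"Category": ["A", "B", "C", "A", "B", "C"]})
wanted = ["A", "C"]  # hypothetical list of values to match

# Boolean Series: True where Category is in the list
mask = items["Category"].isin(wanted)

# ~ inverts the mask, keeping rows whose Category is NOT in the list
excluded = items[~mask]

print(mask)
print(excluded)
```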



---
