# Pandas Functions, Methods, and Attributes Reference

---

## 1. **`pd.DataFrame()`** - Function

Creates a DataFrame from various data structures like dictionaries, lists, or arrays.

**Usage**:
- `pd.DataFrame(dictionary)` - Creates DataFrame from dictionary where keys become column names
- `pd.DataFrame(list_of_lists)` - Creates DataFrame from nested lists
- Each dictionary key becomes a column name
- All lists in the dictionary must have the same length; otherwise pandas raises a `ValueError`
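
A minimal sketch of both construction styles, using made-up sample data (the names `data`, `rows`, and the column labels are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column name.
data = {
    "name": ["Ana", "Ben", "Cy"],
    "age": [31, 25, 40],
}
df_from_dict = pd.DataFrame(data)

# Nested lists are read row by row; column names are passed separately.
rows = [["Ana", 31], ["Ben", 25], ["Cy", 40]]
df_from_lists = pd.DataFrame(rows, columns=["name", "age"])

# Lists of different lengths are rejected.
try:
    pd.DataFrame({"name": ["Ana", "Ben"], "age": [31]})
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(df_from_dict)
print(df_from_lists)
print("Mismatched lengths:", length_error)
```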

---

## 2. **`.columns`** - Attribute

Returns the column labels of the DataFrame as an Index object.

**Usage**:
- `df.columns` - Shows all column names
- Wrap in `list()` to get a plain Python list: `list(df.columns)`
- No parentheses needed (it's an attribute, not a method)
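
A short illustration with a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 25]})

cols = df.columns            # attribute access: no parentheses
col_list = list(cols)        # plain Python list of the labels
has_age = "age" in df.columns

print(cols)
print(col_list, has_age)
```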

---

## 3. **`.info()`** - Method

Provides a concise summary of the DataFrame.

**Shows**:
- Number of entries (rows)
- Column names
- Non-null count for each column
- Data type (dtype) of each column
- Memory usage

**Usage**:
- `df.info()` - Displays complete information about DataFrame structure
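
As a rough sketch, the summary can be printed directly or captured as text through the `buf` parameter. The sample frame and the variable names are assumptions for illustration:

```python
import io

import pandas as pd

# Hypothetical frame with one missing value so the non-null counts differ.
df = pd.DataFrame({"name": ["Ana", "Ben", None], "age": [31, 25, 40]})

# Prints the summary to standard output and returns None.
df.info()

# Optionally write the same summary into a text buffer instead.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
```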

---

## 4. **`.head()`** - Method

Returns the first n rows of the DataFrame.

**Usage**:
- `df.head()` - Returns first 5 rows (default)
- `df.head(n)` - Returns first n rows
- If n is greater than total rows, returns all rows

---

## 5. **`.tail()`** - Method

Returns the last n rows of the DataFrame.

**Usage**:
- `df.tail()` - Returns last 5 rows (default)
- `df.tail(n)` - Returns last n rows
- If n is greater than total rows, returns all rows
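
The following sketch shows `.head()` and `.tail()` side by side on a hypothetical eight-row frame:

```python
import pandas as pd

# Hypothetical 8-row frame for illustration.
df = pd.DataFrame({"n": range(8)})

first_five = df.head()       # default n=5
first_two = df.head(2)
last_five = df.tail()        # default n=5
last_three = df.tail(3)
all_rows = df.tail(100)      # n larger than the frame: every row is returned

print(first_two)
print(last_three)
```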

---

## 6. **Column Selection with `[]`**

Selects one or more columns from a DataFrame.

**Usage**:
- `df['column_name']` - Select single column, returns Series
- `df[['col1', 'col2']]` - Select multiple columns, returns DataFrame
- Must use double brackets `[[]]` for multiple columns
- Single brackets with several names, as in `df['col1', 'col2']`, are read as one tuple key and normally raise KeyError
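
A small sketch of the difference between single and double brackets, on made-up data:

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame(
    {"name": ["Ana", "Ben"], "age": [31, 25], "city": ["Oslo", "Rome"]}
)

ages = df["age"]                  # single brackets, one name: Series
subset = df[["name", "city"]]     # double brackets, list of names: DataFrame

# Two names inside single brackets are treated as one tuple key.
try:
    df["name", "city"]
    got_key_error = False
except KeyError:
    got_key_error = True

print(type(ages).__name__, type(subset).__name__, got_key_error)
```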

---

## 7. **`.loc[]`** - Attribute/Indexer

Label-based indexing to access rows and columns.

**Format**: `df.loc[row_selection, column_selection]`

**Row Selection Options**:
- `df.loc[index]` - Single row by index label
- `df.loc[[index1, index2]]` - Multiple rows by list of indices
- `df.loc[start:end]` - Slice rows (end is **inclusive**)
- `df.loc[start::step]` - Slice with step
- `df.loc[::-1]` - Reverse order
- `df.loc[condition]` - Boolean/conditional selection

**Column Selection Options**:
- `df.loc[row, 'column']` - Single column
- `df.loc[row, ['col1', 'col2']]` - Multiple columns

**Conditional Selection**:
- `df.loc[df['column'] > value]` - Single condition
- `df.loc[(df['col1'] > value) & (df['col2'] == value)]` - Multiple conditions with AND
- `df.loc[(df['col1'] > value) | (df['col2'] == value)]` - Multiple conditions with OR
- Must wrap each condition in parentheses when using `&` or `|`

**Important Notes**:
- Slicing is **inclusive** on both ends
- Negative numbers are treated as labels, not positions: `df.loc[-1]` raises KeyError unless `-1` is an actual index label
- Use `&` for AND, `|` for OR (not `and`/`or` keywords)
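
The selection patterns above could look like this in practice. The frame and its values are invented for the example, and it relies on the default integer index:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..3.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy", "Dee"],
        "age": [31, 25, 40, 19],
        "city": ["Oslo", "Rome", "Oslo", "Lima"],
    }
)

one_row = df.loc[1]                       # single label: a Series
some_rows = df.loc[[0, 2]]                # list of labels: a DataFrame
sliced = df.loc[1:3]                      # labels 1, 2 and 3 (end included)
cell = df.loc[2, "name"]                  # one row label, one column
block = df.loc[0:1, ["name", "age"]]      # row slice plus column list

# Each condition sits in its own parentheses.
older_in_oslo = df.loc[(df["age"] > 30) & (df["city"] == "Oslo")]
young_or_lima = df.loc[(df["age"] < 20) | (df["city"] == "Lima")]

print(sliced)
print(older_in_oslo)
```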

---

## 8. **`.to_csv()`** - Method

Writes DataFrame to a CSV file.

**Usage**:
- `df.to_csv('filename.csv')` - Saves with the row index as the first column (default)
- `df.to_csv('filename.csv', index=False)` - Saves without row index
- Creates new file or overwrites existing file

**Parameters**:
- `index=False` - Excludes row index from CSV file
- `index=True` (default) - Includes row index as first column
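
To see what the `index` parameter changes, this sketch writes to a temporary directory (so no real file is left behind) and inspects the header line. The file name `people.csv` is a placeholder:

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 25]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")   # placeholder file name
    df.to_csv(path, index=False)
    with open(path) as handle:
        without_index = handle.read()

# Called without a path, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()

print(without_index)
print(with_index)
```

With the index included, the header line starts with an empty field for the unnamed index column.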

---

## 9. **`.to_excel()`** - Method

Writes DataFrame to an Excel file.

**Usage**:
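- `df.to_excel('filename.xlsx')` - Saves with the row index as the first column (default)
- `df.to_excel('filename.xlsx', index=False)` - Saves without row index
- Creates a file in .xlsx format; this requires an Excel writer package such as `openpyxl` to be installed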

**Parameters**:
- `index=False` - Excludes row index from Excel file
- `index=True` (default) - Includes row index as first column

---

## 10. **`pd.read_csv()`** - Function

Reads a CSV file into a DataFrame.

**Usage**:
- `df = pd.read_csv('filename.csv')` - Loads CSV file into DataFrame
- Uses the first row as column names by default
- Returns a DataFrame object

**Common Use**:
- Reading previously saved CSV files
- Loading external datasets
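
A self-contained sketch: instead of a file on disk, it feeds `pd.read_csv()` an in-memory text buffer, which the function also accepts. The CSV text is made up for the example:

```python
import io

import pandas as pd

# Hypothetical CSV content; the first line supplies the column names.
csv_text = "name,age\nAna,31\nBen,25\n"
df = pd.read_csv(io.StringIO(csv_text))

# A file that was saved with its row index has an unnamed first column,
# which pandas labels "Unnamed: 0" when reading it back.
csv_with_index = ",name,age\n0,Ana,31\n1,Ben,25\n"
df_with_extra = pd.read_csv(io.StringIO(csv_with_index))

print(df)
print(list(df_with_extra.columns))
```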

---

## 11. **`pd.concat()`** - Function

Concatenates (combines) multiple DataFrames along rows or columns.

**Usage**:
- `pd.concat([df1, df2, df3], axis=0)` - Concatenate rows (vertical stacking)
- `pd.concat([df1, df2], axis=1)` - Concatenate columns (horizontal stacking)

**Parameters**:
- `axis=0` - Joins DataFrames vertically (adds rows below)
- `axis=1` - Joins DataFrames horizontally (adds columns to the right)

**Important Notes**:
- Takes a list of DataFrames as first argument
- When axis=0: columns are matched by name; columns missing from one DataFrame are filled with NaN
- When axis=1: rows are aligned on their index labels
- Keeps the original row labels when concatenating rows, so the result may contain duplicate indices
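
The sketch below stacks small hypothetical frames both ways and shows the NaN filling and the repeated row labels:

```python
import pandas as pd

# Hypothetical frames for illustration.
df1 = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 25]})
df2 = pd.DataFrame({"name": ["Cy"], "age": [40]})
df3 = pd.DataFrame({"city": ["Oslo", "Rome"]})

stacked = pd.concat([df1, df2], axis=0)        # rows of df2 go below df1
side_by_side = pd.concat([df1, df3], axis=1)   # columns of df3 go to the right

# Different columns stacked vertically: the gaps are filled with NaN.
mixed = pd.concat([df1, df3], axis=0)

print(stacked)          # row labels repeat: 0, 1, 0
print(side_by_side)
print(mixed)
```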

---

## 12. **`.index`** - Attribute

Returns the index (row labels) of the DataFrame.

**Usage**:
- `df.index` - Shows all row indices
- Returns RangeIndex, Index, or other index type
- No parentheses needed (it's an attribute)

**Common Use Cases**:
- Check for duplicate indices after concatenation
- Verify index structure
- See if index needs resetting
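
One way to check for duplicate row labels after concatenation, sketched with hypothetical frames:

```python
import pandas as pd

# Hypothetical frames for illustration.
df1 = pd.DataFrame({"name": ["Ana", "Ben"]})
df2 = pd.DataFrame({"name": ["Cy"]})

print(df1.index)                 # a freshly built frame has a RangeIndex

stacked = pd.concat([df1, df2])
labels = list(stacked.index)
has_duplicates = not stacked.index.is_unique

print(labels, has_duplicates)
```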

---

## 13. **`.reset_index()`** - Method

Resets the index to default integer index (0, 1, 2, ...).

**Usage**:
- `df.reset_index()` - Resets index; the old index becomes a new column (named 'index' if the index has no name)
- `df.reset_index(drop=True)` - Resets index and removes old index completely

**Parameters**:
- `drop=False` (default) - Keeps old index as a new column
- `drop=True` - Discards old index completely

**Important Notes**:
- Returns a new DataFrame by default, so assign the result back: `df = df.reset_index(drop=True)`
- Useful after concatenation to fix duplicate indices
- Creates continuous integer index starting from 0
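
A brief sketch of both `drop` settings on a concatenated frame (sample data invented for the example):

```python
import pandas as pd

# Hypothetical frames; concatenating them leaves row labels 0, 1, 0.
df1 = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 25]})
df2 = pd.DataFrame({"name": ["Cy"], "age": [40]})
stacked = pd.concat([df1, df2])

kept = stacked.reset_index()               # old labels move into an "index" column
dropped = stacked.reset_index(drop=True)   # old labels are discarded

print(kept)
print(dropped)
print(list(stacked.index))                 # the original frame is unchanged
```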

---

## 14. **Column Assignment with `[]`**

Creates a new column or modifies an existing column in the DataFrame.

**Usage**:
- `df['new_column'] = value` - Single value assigned to all rows
- `df['new_column'] = [val1, val2, ...]` - List of values (must match row count)
- `df['new_column'] = ['value'] * n` - Repeat same value n times using multiplication
- `df['new_column'] = None` - Assigns None to create empty column

**Important Notes**:
- List length must exactly match the number of rows in the DataFrame; otherwise pandas raises a `ValueError`
- Using `[value] * n` is shorthand for creating list with repeated values
- Can use `None` or `0` to initialize columns
- Directly modifies the DataFrame (in-place operation)
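
The assignment forms above, sketched on a hypothetical three-row frame (all column names here are illustrative):

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"name": ["Ana", "Ben", "Cy"]})

df["country"] = "NO"                  # one value, repeated for every row
df["age"] = [31, 25, 40]              # list with one value per row
df["status"] = ["new"] * len(df)      # repeated value built with list multiplication
df["notes"] = None                    # empty placeholder column

# A list of the wrong length is rejected and no column is added.
try:
    df["bad"] = [1, 2]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(df)
print("Wrong length:", length_error)
```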

---

## Key Concepts

### Difference Between Method and Attribute
- **Method**: Requires parentheses `()` - e.g., `df.head()`, `df.info()`
- **Attribute**: No parentheses - e.g., `df.columns`, `df.index`

### Slicing Behavior in Pandas
- `.loc[]` slicing is **inclusive** on both ends
- Different from Python list slicing which excludes the end
- Example: with the default integer index, `df.loc[0:5]` includes rows 0, 1, 2, 3, 4, **and 5**
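
To make the contrast concrete, this sketch applies the same `0:5` slice to a plain list and to `.loc[]` on a frame with the default integer index (data invented for the example):

```python
import pandas as pd

# Hypothetical data for illustration.
letters = ["a", "b", "c", "d", "e", "f", "g"]
df = pd.DataFrame({"letter": letters})

list_slice = letters[0:5]                     # end excluded: 5 items
loc_slice = df.loc[0:5, "letter"].tolist()    # end included: 6 items

print(list_slice)
print(loc_slice)
```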

### Indexing Notes
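- `df[0]` looks for a column labeled `0`, not the first row, and raises KeyError if no such column exists
- `df[0:3]` with a slice does select rows by position, but `.loc[]` is the clearer choice for label-based row selection
- Negative numbers in `.loc[]` are treated as labels, so `df.loc[-1]` does not mean "last row"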
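
A quick check of how plain brackets and `.loc[]` treat integers, using a hypothetical frame with the default index. The helper `raises_key_error` is defined here only for the demonstration:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..3.
df = pd.DataFrame({"name": ["Ana", "Ben", "Cy", "Dee"]})

# A slice inside plain brackets selects rows by position.
first_three = df[0:3]


def raises_key_error(action):
    """Illustrative helper: report whether calling `action` raises KeyError."""
    try:
        action()
    except KeyError:
        return True
    return False


# A single integer in plain brackets is looked up as a column label.
column_lookup_fails = raises_key_error(lambda: df[0])

# .loc treats -1 as a label, and no row is labeled -1 here.
negative_label_fails = raises_key_error(lambda: df.loc[-1])

print(first_three)
print(column_lookup_fails, negative_label_fails)
```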

### Boolean Conditions
- Use `&` for AND operator (not `and`)
- Use `|` for OR operator (not `or`)
- Always wrap each condition in parentheses
- Example: `(df['col1'] > 5) & (df['col2'] < 10)`
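
A minimal sketch of why the keywords do not work: `and` asks for a single True/False value from a whole Series, which pandas refuses with a `ValueError`. The frame below is made up for the example:

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"col1": [3, 7, 9], "col2": [4, 8, 12]})

# Element-wise AND: one True/False per row.
mask = (df["col1"] > 5) & (df["col2"] < 10)
matches = df[mask]

# The `and` keyword tries to reduce each Series to one boolean and fails.
try:
    (df["col1"] > 5) and (df["col2"] < 10)
    keyword_error = None
except ValueError as exc:
    keyword_error = str(exc)

print(matches)
print("Using `and`:", keyword_error)
```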

### Working with 2D Data
- DataFrames are 2D (rows and columns)
- A selection can specify rows, columns, or both: `df.loc[rows, columns]`
- Different from 1D Python lists or Series, which need only one index