#### Pandas Tutorial - Part 67: DataFrame.to_dict()

This notebook covers the `to_dict()` method, which converts a DataFrame to a dictionary with various orientation options.

In [None]:
import pandas as pd
import numpy as np
from collections import defaultdict, OrderedDict

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

##### DataFrame.to_dict()

The `to_dict()` method converts a DataFrame to a dictionary. The structure of the resulting dictionary depends on the `orient` parameter, which determines how the DataFrame's data is organized in the dictionary.

In [None]:
# Create a sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2],
    'col2': [0.5, 0.75]
}, index=['row1', 'row2'])

print("Sample DataFrame:")
print(df)
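
Before looking at each orientation in turn, it helps to see which top-level container each one returns. The sketch below rebuilds the same two-row sample frame so it runs on its own and loops over the six orientations covered in this notebook. Only `'records'` returns a list; the others return a dictionary.

```python
import pandas as pd

# Same two-row sample frame as above, rebuilt so this cell is self-contained
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

# Record the top-level container type produced by each orientation
shapes = {}
for orient in ['dict', 'list', 'series', 'split', 'records', 'index']:
    result = df.to_dict(orient=orient)
    shapes[orient] = type(result).__name__
    print(f"{orient:>8}: {shapes[orient]}")
```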

### Default orientation: 'dict'

By default, `to_dict()` uses the 'dict' orientation, which creates a dictionary with column names as keys and nested dictionaries as values. The nested dictionaries have index values as keys and cell values as values.

In [None]:
# Default orientation: 'dict'
dict_default = df.to_dict()
print("Default orientation ('dict'):")
print(dict_default)
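
A single cell is reached with two lookups: the column name first, then the index label. The sketch below also rebuilds the same nested structure by hand, one `Series.to_dict()` call per column, which is a useful way to remember what `'dict'` produces. The sample frame is recreated so the cell runs independently.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
nested = df.to_dict()  # orient='dict' is the default

# Outer key is the column, inner key is the index label
print(nested['col2']['row1'])  # 0.5

# Equivalent structure built column by column
manual = {col: df[col].to_dict() for col in df.columns}
print(nested == manual)  # True
```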

### Orientation: 'list'

The 'list' orientation creates a dictionary with column names as keys and lists of values as values.

In [None]:
# Orientation: 'list'
dict_list = df.to_dict(orient='list')
print("Orientation 'list':")
print(dict_list)
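
Note that `'list'` drops the index labels: only the position within each list ties values from different columns together. The sketch below compares the result with a hand-written equivalent and shows one way (an illustrative choice, not a pandas feature) to keep the labels by storing them under an extra `'index'` key.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
as_lists = df.to_dict(orient='list')

# Hand-written equivalent: one plain list per column
manual = {col: df[col].tolist() for col in df.columns}
print(as_lists == manual)  # True

# Keep the row labels yourself if they matter ('index' is an arbitrary key name)
with_labels = {'index': df.index.tolist(), **as_lists}
print(with_labels)
```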

### Orientation: 'series'

The 'series' orientation creates a dictionary with column names as keys and Series objects as values.

In [None]:
# Orientation: 'series'
dict_series = df.to_dict(orient='series')
print("Orientation 'series':")
print(dict_series)

# Demonstrate that the values are Series objects
print("\nType of the first value:", type(dict_series['col1']))
print("\nAccessing the first Series:")
print(dict_series['col1'])
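
Unlike `'list'`, each Series value keeps the index labels, the column name (as the Series `name`), and the dtype, so Series methods work directly on the values. A short sketch with the sample frame rebuilt:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
by_series = df.to_dict(orient='series')

col1 = by_series['col1']
print(col1.name)            # col1
print(col1.index.tolist())  # ['row1', 'row2']
print(col1.dtype)           # int64

# Series methods remain available on each value
print(by_series['col2'].mean())  # 0.625
```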

### Orientation: 'split'

The 'split' orientation creates a dictionary with three keys: 'index' (a list of index labels), 'columns' (a list of column names), and 'data' (a list of rows, where each row is a list of cell values).

In [None]:
# Orientation: 'split'
dict_split = df.to_dict(orient='split')
print("Orientation 'split':")
print(dict_split)
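
Because 'data' is stored row by row, pairing each entry of 'index' with the matching row of 'data' recovers the rows. The loop below is a minimal sketch of that pairing, using the rebuilt sample frame.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
split = df.to_dict(orient='split')

# Rows are positional: zip each index label with its row of values
for label, row in zip(split['index'], split['data']):
    print(label, dict(zip(split['columns'], row)))
```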

### Orientation: 'records'

The 'records' orientation creates a list of dictionaries, where each dictionary represents a row in the DataFrame with column names as keys.

In [None]:
# Orientation: 'records'
dict_records = df.to_dict(orient='records')
print("Orientation 'records':")
print(dict_records)
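
The 'records' orientation discards the index labels. If they are needed, one option is to move the index into a regular column before converting, as sketched below; the column name `'label'` is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

# Plain records: the row labels are gone
plain = df.to_dict(orient='records')
print(plain)

# Name the index, turn it into a column, then convert
records_with_labels = df.rename_axis('label').reset_index().to_dict(orient='records')
print(records_with_labels)  # each record now starts with a 'label' key
```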

### Orientation: 'index'

The 'index' orientation creates a dictionary with index values as keys and dictionaries of column-value pairs as values.

In [None]:
# Orientation: 'index'
dict_index = df.to_dict(orient='index')
print("Orientation 'index':")
print(dict_index)
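
The 'index' orientation is the row-first mirror of 'dict', so it matches the default orientation applied to the transposed frame. The sketch below checks this with the rebuilt sample frame and shows one caveat: `df.T` upcasts the mixed int/float columns to a common float dtype, so `col1` comes back as `1.0` in the transposed version (the two dictionaries still compare equal because `1 == 1.0` in Python).

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
by_row = df.to_dict(orient='index')

# Outer key is the row label, inner key is the column name
print(by_row['row2']['col2'])  # 0.75

# Same mapping via the transpose, but dtypes are upcast to float there
via_transpose = df.T.to_dict()
print(by_row == via_transpose)        # True
print(via_transpose['row1']['col1'])  # 1.0
```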

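### Orientation abbreviations (removed in pandas 2.0)

Older pandas versions accepted abbreviated values for `orient`, for example 's' for 'series' and 'sp' for 'split'. This shorthand was deprecated during the 1.x series and removed in pandas 2.0, where an abbreviated value raises a `ValueError`. Always pass the full orientation name.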

In [None]:
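# Abbreviated orient values such as 's' (series) or 'sp' (split) were
# deprecated in pandas 1.x and removed in pandas 2.0: spell out the full name
dict_s = df.to_dict(orient='series')
print("Full name 'series':")
print(dict_s)

# On pandas >= 2.0 an abbreviation is rejected with a ValueError
try:
    df.to_dict(orient='sp')
except ValueError as err:
    print("\nAbbreviation 'sp' rejected:", err)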

### Using different dictionary types with the 'into' parameter

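The `into` parameter sets the mapping type used for the result. It accepts a subclass of `collections.abc.MutableMapping` (such as `OrderedDict`) or an empty instance of one. A `collections.defaultdict` must be passed as an initialized instance, because pandas needs to know its default factory.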

In [None]:
# Using OrderedDict
dict_ordered = df.to_dict(into=OrderedDict)
print("Using OrderedDict:")
print(dict_ordered)
print("Type:", type(dict_ordered))

In [None]:
# Using defaultdict
# Note: pass an initialized defaultdict instance (not a class or a factory function);
# pandas reuses its default_factory for the nested dictionaries as well
dict_default_dict = df.to_dict(into=defaultdict(int))
print("Using defaultdict:")
print(dict_default_dict)
print("Type:", type(dict_default_dict))

# Demonstrate defaultdict behavior
print("\nAccessing a non-existent key in the first nested dictionary:")
print(dict_default_dict['col1']['non_existent_key'])  # Returns 0 (default value for int)
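
The `into` mapping is also applied to nested mappings for orientations such as 'dict' and 'records'. The sketch below passes an initialized `defaultdict(list)` with 'records', so every record becomes a `defaultdict` whose missing keys default to an empty list (note that reading a missing key also inserts it). It then checks that the default 'dict' orientation with `into=OrderedDict` nests `OrderedDict`s.

```python
from collections import OrderedDict, defaultdict

import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

# 'records' always returns a list; each record uses the requested mapping type
records_dd = df.to_dict(orient='records', into=defaultdict(list))
print(type(records_dd))          # <class 'list'>
print(type(records_dd[0]))       # <class 'collections.defaultdict'>
print(records_dd[0]['missing'])  # [] (and 'missing' is now stored in the record)

# A class (not an instance) is fine for types that need no constructor arguments
nested_od = df.to_dict(into=OrderedDict)
print(type(nested_od['col1']))   # <class 'collections.OrderedDict'>
```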

### Working with a larger DataFrame

In [None]:
# Create a larger DataFrame
df_large = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales']
})

print("Larger DataFrame:")
print(df_large)

In [None]:
# Convert to dictionary with 'records' orientation
records = df_large.to_dict(orient='records')
print("Records orientation:")
for record in records:
    print(record)

In [None]:
# Convert to dictionary with 'index' orientation
index_dict = df_large.to_dict(orient='index')
print("Index orientation:")
for idx, row_dict in index_dict.items():
    print(f"Row {idx}: {row_dict}")
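
With a default `RangeIndex`, the 'index' orientation is keyed by row numbers. A common practical pattern is to promote a unique column to the index first, giving a lookup table keyed by that column; converting a single column (a Series) gives a flat key-to-value mapping. In recent pandas versions, 'index' raises a `ValueError` if the index contains duplicates, so the key column should be unique. A sketch with the same employee data:

```python
import pandas as pd

df_large = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales'],
})

# Lookup table keyed by name: {name -> {column -> value}}
by_name = df_large.set_index('name').to_dict(orient='index')
print(by_name['Bob'])

# Flat mapping {name -> salary} from a single column
salary_by_name = df_large.set_index('name')['salary'].to_dict()
print(salary_by_name['Eve'])  # 90000
```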

### Working with a DataFrame containing different data types

In [None]:
# Create a DataFrame with different data types
df_types = pd.DataFrame({
    'string': ['a', 'b', 'c'],
    'integer': [1, 2, 3],
    'float': [1.1, 2.2, 3.3],
    'boolean': [True, False, True],
    'datetime': pd.date_range('2020-01-01', periods=3),
    'category': pd.Categorical(['X', 'Y', 'Z']),
    'complex': [1+2j, 3+4j, 5+6j],
    'object': [{'a': 1}, {'b': 2}, {'c': 3}]
})

print("DataFrame with different data types:")
print(df_types)
print("\nData types:")
print(df_types.dtypes)

In [None]:
# Convert to dictionary with 'dict' orientation
types_dict = df_types.to_dict()
print("Dictionary with different data types:")
for col, values in types_dict.items():
    print(f"{col}: {values}")
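
The values in the dictionary are Python or pandas objects rather than strings: datetime values come back as `pd.Timestamp` objects, categorical values as the underlying category values, and object cells (here, dictionaries) unchanged. The sketch below checks a few of these on a reduced version of the frame and shows one way to turn datetimes into ISO date strings before converting, which can help when the result is handed to code that does not understand `Timestamp`.

```python
import pandas as pd

df_types = pd.DataFrame({
    'datetime': pd.date_range('2020-01-01', periods=3),
    'category': pd.Categorical(['X', 'Y', 'Z']),
    'object': [{'a': 1}, {'b': 2}, {'c': 3}],
})
types_dict = df_types.to_dict()

print(isinstance(types_dict['datetime'][0], pd.Timestamp))  # True
print(types_dict['category'][2])  # Z
print(types_dict['object'][1])    # {'b': 2}

# Format datetimes as strings first if plain text is needed downstream
iso = df_types.assign(datetime=df_types['datetime'].dt.strftime('%Y-%m-%d')).to_dict()
print(iso['datetime'][0])  # 2020-01-01
```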

### Practical example: Converting to and from dictionary

In [None]:
# Create a DataFrame
df_original = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

print("Original DataFrame:")
print(df_original)

In [None]:
# Convert to dictionary with 'split' orientation
dict_split = df_original.to_dict(orient='split')
print("Dictionary with 'split' orientation:")
print(dict_split)

In [None]:
# Convert back to DataFrame
df_reconstructed = pd.DataFrame(**dict_split)
print("Reconstructed DataFrame:")
print(df_reconstructed)

# Check if the reconstructed DataFrame is identical to the original
print("\nIs the reconstructed DataFrame identical to the original?")
print(df_original.equals(df_reconstructed))
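
The 'split' round trip works here, but it does not store index or column *names*. If those matter, the 'tight' orientation (available in pandas 1.4 and later) also records 'index_names' and 'column_names', and `pd.DataFrame.from_dict(..., orient='tight')` restores them. A sketch, using an illustrative index name `'key'`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]},
                  index=pd.Index(['x', 'y'], name='key'))

# 'split' keeps labels but not the index name
split = df.to_dict(orient='split')
rebuilt_split = pd.DataFrame(**split)
print(rebuilt_split.index.name)  # None

# 'tight' also stores the axis names
tight = df.to_dict(orient='tight')
rebuilt_tight = pd.DataFrame.from_dict(tight, orient='tight')
print(rebuilt_tight.index.name)  # key
print(rebuilt_tight.equals(df))  # True
```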

In [None]:
# Convert to dictionary with 'records' orientation
dict_records = df_original.to_dict(orient='records')
print("Dictionary with 'records' orientation:")
print(dict_records)

In [None]:
# Convert back to DataFrame
df_from_records = pd.DataFrame(dict_records)
print("DataFrame from records:")
print(df_from_records)

# Check if the reconstructed DataFrame is identical to the original
print("\nIs the reconstructed DataFrame identical to the original?")
print(df_original.equals(df_from_records))
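
This comparison succeeds only because `df_original` uses the default `RangeIndex`. With custom row labels, 'records' drops them and the rebuilt frame gets a fresh `0, 1, ...` index. The sketch below keeps the labels by storing them in a column (named `'label'` purely for illustration) and restores them with `set_index`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r1', 'r2'])

# A plain rebuild loses the custom labels
naive = pd.DataFrame(df.to_dict(orient='records'))
print(naive.index.tolist())  # [0, 1]

# Carry the labels in a column, then move them back into the index
records = df.rename_axis('label').reset_index().to_dict(orient='records')
restored = pd.DataFrame(records).set_index('label').rename_axis(None)
print(restored.equals(df))  # True
```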

##### Summary

In this notebook, we've explored the `to_dict()` method of a pandas DataFrame, which converts a DataFrame to a dictionary (or, for `'records'`, a list of dictionaries). We've covered:

1. **Different orientation options**:
   - `'dict'` (default): `{column -> {index -> value}}`
   - `'list'`: `{column -> [values]}`
   - `'series'`: `{column -> Series(values)}`
   - `'split'`: `{'index' -> [index], 'columns' -> [columns], 'data' -> [values]}`
   - `'records'`: `[{column -> value}, ..., {column -> value}]`
   - `'index'`: `{index -> {column -> value}}`

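2. **Orientation abbreviations** such as 's' for 'series' and 'sp' for 'split', which older pandas versions accepted but pandas 2.0 rejects; always use the full names.

3. **Using different dictionary types** with the `into` parameter, such as `OrderedDict` and an initialized `defaultdict` instance.

4. **Working with different data types** and the Python objects they become in the resulting dictionary (for example, datetime values come back as `pd.Timestamp` objects).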

5. **Converting back and forth** between DataFrames and dictionaries.

The `to_dict()` method is particularly useful for:
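- Preparing DataFrame data for JSON or other formats (values such as `pd.Timestamp` may need converting first, because Python's `json` module cannot encode them)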
- Interfacing with APIs that expect dictionary data
- Converting DataFrame data for use in other Python libraries
- Creating custom data structures based on DataFrame data
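
For the JSON use case, the output of `to_dict()` is not always directly serializable. The sketch below shows the standard `json` module failing on a `Timestamp`, then two workarounds: a `default=str` fallback in `json.dumps`, or letting pandas serialize end to end with `DataFrame.to_json()`. The column names are illustrative.

```python
import json

import pandas as pd

df = pd.DataFrame({'when': pd.date_range('2020-01-01', periods=2), 'n': [1, 2]})
records = df.to_dict(orient='records')

# The standard json module cannot encode pandas Timestamp objects
try:
    json.dumps(records)
    timestamp_failed = False
except TypeError as err:
    timestamp_failed = True
    print("json.dumps failed:", err)

# Option 1: fall back to str() for objects json does not understand
payload = json.dumps(records, default=str)
print(payload)

# Option 2: let pandas handle the serialization, with ISO 8601 dates
payload_pd = df.to_json(orient='records', date_format='iso')
print(payload_pd)
```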