#### Pandas Tutorial - Part 67: DataFrame.to_dict()

This notebook covers the `to_dict()` method, which converts a DataFrame to a dictionary with various orientation options.

In [1]:
import pandas as pd
import numpy as np
from collections import defaultdict, OrderedDict

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

##### DataFrame.to_dict()

The `to_dict()` method converts a DataFrame to a dictionary. The structure of the resulting dictionary depends on the `orient` parameter, which determines how the DataFrame's data is organized in the dictionary.
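
The `orient` value also decides the type of the top-level container. Every orientation covered in this notebook returns a dictionary except `'records'`, which returns a list. The short loop below prints the container type for each orientation. It is a sketch that rebuilds the sample DataFrame so it can run on its own.

```python
import pandas as pd

# Same two-row frame used throughout this notebook
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

container_types = {}
for orient in ['dict', 'list', 'series', 'split', 'records', 'index']:
    result = df.to_dict(orient=orient)
    container_types[orient] = type(result).__name__
    print(f"{orient:<8} -> {container_types[orient]}")
```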

In [2]:
# Create a sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2],
    'col2': [0.5, 0.75]
}, index=['row1', 'row2'])

print("Sample DataFrame:")
print(df)

Sample DataFrame:
      col1  col2
row1     1  0.50
row2     2  0.75


### Default orientation: 'dict'

By default, `to_dict()` uses the 'dict' orientation, which creates a dictionary with column names as keys and nested dictionaries as values. The nested dictionaries have index values as keys and cell values as values.

In [3]:
# Default orientation: 'dict'
dict_default = df.to_dict()
print("Default orientation ('dict'):")
print(dict_default)

Default orientation ('dict'):
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
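
The outer keys are column labels and the inner keys are index labels. As a result, `d[column][index]` addresses the same cell as `df.loc[index, column]`. The following quick check, a self-contained sketch, confirms this for every cell:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
d = df.to_dict()  # default orient='dict'

# Outer key: column label; inner key: index label
for col in df.columns:
    for row in df.index:
        assert d[col][row] == df.loc[row, col]

print(d['col2']['row1'])  # 0.5
```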


### Orientation: 'list'

The 'list' orientation creates a dictionary with column names as keys and lists of values as values.

In [4]:
# Orientation: 'list'
dict_list = df.to_dict(orient='list')
print("Orientation 'list':")
print(dict_list)

Orientation 'list':
{'col1': [1, 2], 'col2': [0.5, 0.75]}
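
The `'list'` orientation keeps only the column labels and the values. The index labels (`row1`, `row2`) are not stored, so rebuilding the frame requires passing the index back in explicitly. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
as_lists = df.to_dict(orient='list')

# Without the index, pandas assigns a default RangeIndex
no_index = pd.DataFrame(as_lists)
print(list(no_index.index))  # [0, 1]

# Supplying the original index restores the frame
restored = pd.DataFrame(as_lists, index=df.index)
print(restored.equals(df))  # True
```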


### Orientation: 'series'

The 'series' orientation creates a dictionary with column names as keys and Series objects as values.

In [5]:
# Orientation: 'series'
dict_series = df.to_dict(orient='series')
print("Orientation 'series':")
print(dict_series)

# Demonstrate that the values are Series objects
print("\nType of the first value:", type(dict_series['col1']))
print("\nAccessing the first Series:")
print(dict_series['col1'])

Orientation 'series':
{'col1': row1    1
row2    2
Name: col1, dtype: int64, 'col2': row1    0.50
row2    0.75
Name: col2, dtype: float64}

Type of the first value: <class 'pandas.core.series.Series'>

Accessing the first Series:
row1    1
row2    2
Name: col1, dtype: int64
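
Each value in the `'series'` result is a full Series that keeps the original index and uses the column label as its name. Because of this, passing the dictionary straight back to the `DataFrame` constructor aligns the columns on that shared index. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
as_series = df.to_dict(orient='series')

# Each Series carries the index labels and the column name
print(as_series['col2'].name)         # col2
print(list(as_series['col2'].index))  # ['row1', 'row2']

# The DataFrame constructor aligns a dict of Series on their index
rebuilt = pd.DataFrame(as_series)
print(rebuilt.equals(df))  # True
```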


### Orientation: 'split'

The 'split' orientation creates a dictionary with three keys. 'index' holds the list of index labels, 'columns' holds the list of column names, and 'data' holds a list of rows, where each row is a list of values in column order.

In [6]:
# Orientation: 'split'
dict_split = df.to_dict(orient='split')
print("Orientation 'split':")
print(dict_split)

Orientation 'split':
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'], 'data': [[1, 0.5], [2, 0.75]]}
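
In the `'split'` result, position *i* in `'index'` labels row *i* of `'data'`, and position *j* in `'columns'` labels entry *j* within each row. Zipping the lists back together gives a lookup of rows by label. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
split = df.to_dict(orient='split')

# Pair each index label with its row of values
rows_by_label = dict(zip(split['index'], split['data']))

# Column positions come from the 'columns' list
col2_pos = split['columns'].index('col2')
print(rows_by_label['row2'][col2_pos])  # 0.75
```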


### Orientation: 'records'

The 'records' orientation creates a list of dictionaries, where each dictionary represents a row in the DataFrame with column names as keys.

In [7]:
# Orientation: 'records'
dict_records = df.to_dict(orient='records')
print("Orientation 'records':")
print(dict_records)

Orientation 'records':
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
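
Conceptually, each record is the result of zipping the column labels with one row of values, with the index labels left out. The comprehension below rebuilds the `'records'` output that way. It illustrates the structure and does not necessarily reflect how pandas builds the list internally.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

# itertuples(index=False, name=None) yields one plain tuple of values per row
manual = [dict(zip(df.columns, row)) for row in df.itertuples(index=False, name=None)]

print(manual == df.to_dict(orient='records'))  # True
```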


### Orientation: 'index'

The 'index' orientation creates a dictionary with index values as keys and dictionaries of column-value pairs as values.

In [8]:
# Orientation: 'index'
dict_index = df.to_dict(orient='index')
print("Orientation 'index':")
print(dict_index)

Orientation 'index':
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
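
The `'index'` layout makes it easy to look up a whole row by its label. `DataFrame.from_dict` with `orient='index'` reads the same layout back, treating the outer keys as row labels. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
by_row = df.to_dict(orient='index')

# Row label first, then column label
print(by_row['row2']['col2'])  # 0.75

# from_dict(orient='index') treats the outer keys as row labels
restored = pd.DataFrame.from_dict(by_row, orient='index')
print(restored)
```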


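### Abbreviations for orientation names (removed in pandas 2.0)

Older versions of pandas accepted abbreviations for the `orient` parameter, for example 's' for 'series' and 'sp' for 'split'. This shorthand was deprecated during the 1.x series and removed in pandas 2.0, where an abbreviated value raises a `ValueError`. For that reason, the cell below uses the full names.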

In [10]:
# Using full name 'series' instead of abbreviation 's'
dict_series = df.to_dict(orient='series')
print("Using orient='series':")
print(dict_series)

# Using full name 'split' instead of abbreviation 'sp'
dict_split = df.to_dict(orient='split')
print("\nUsing orient='split':")
print(dict_split)

Using orient='series':
{'col1': row1    1
row2    2
Name: col1, dtype: int64, 'col2': row1    0.50
row2    0.75
Name: col2, dtype: float64}

Using orient='split':
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'], 'data': [[1, 0.5], [2, 0.75]]}
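
Code written for older pandas may still pass the short names. One low-effort migration path is a small lookup table that translates them before the call. `LEGACY_ORIENT_ALIASES` and `normalize_orient` below are hypothetical helpers, not part of pandas. They cover only the two abbreviations mentioned above.

```python
import pandas as pd

# Hypothetical helper: maps legacy short names to full orient names
LEGACY_ORIENT_ALIASES = {'s': 'series', 'sp': 'split'}

def normalize_orient(orient: str) -> str:
    """Return the full orient name, translating known legacy abbreviations."""
    return LEGACY_ORIENT_ALIASES.get(orient, orient)

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

# Full names pass through unchanged; short names are expanded
split = df.to_dict(orient=normalize_orient('sp'))
print(sorted(split))  # ['columns', 'data', 'index']
```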


### Using different dictionary types with the 'into' parameter

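The `into` parameter specifies the mapping type used for the result. You can pass either the mapping class itself, such as `OrderedDict`, or an empty instance of it. A `collections.defaultdict` is the exception: it must be passed as an initialized instance, such as `defaultdict(int)`, so that pandas knows which default factory to use.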

In [11]:
# Using OrderedDict
dict_ordered = df.to_dict(into=OrderedDict)
print("Using OrderedDict:")
print(dict_ordered)
print("Type:", type(dict_ordered))

Using OrderedDict:
OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})
Type: <class 'collections.OrderedDict'>
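
`into` accepts an empty instance as well as the class. With the default orientation, the chosen type is also used for the nested dictionaries. `LabelDict` below is a hypothetical `dict` subclass defined only for this sketch.

```python
from collections import OrderedDict
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

# An empty instance works as well as the class itself
from_instance = df.to_dict(into=OrderedDict())
print(type(from_instance).__name__)  # OrderedDict

class LabelDict(dict):
    """Hypothetical dict subclass used only to show that `into` is honored."""

custom = df.to_dict(into=LabelDict)
print(type(custom).__name__, type(custom['col1']).__name__)  # LabelDict LabelDict
```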


In [14]:
# defaultdict was imported in the first cell; `into` needs an
# initialized instance so pandas knows which default factory to use
initialized_defaultdict = defaultdict(int)

# Using initialized defaultdict
dict_default_dict = df.to_dict(into=initialized_defaultdict)
print("Using initialized defaultdict:")
print(dict_default_dict)
print("Type:", type(dict_default_dict))

# Demonstrate defaultdict behavior: indexing a missing key calls int(),
# which returns 0 (and inserts the key into the dictionary)
print("\nAccessing a non-existent key:")
print(dict_default_dict['non_existent_key'])

Using initialized defaultdict:
defaultdict(<class 'int'>, {'col1': defaultdict(<class 'int'>, {'row1': 1, 'row2': 2}), 'col2': defaultdict(<class 'int'>, {'row1': 0.5, 'row2': 0.75})})
Type: <class 'collections.defaultdict'>

Accessing a non-existent key:
0
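
pandas needs the default factory to build every nested `defaultdict`, so it rejects the bare `defaultdict` class with a `TypeError`. The sketch below confirms this. It also shows that the nested dictionaries share the factory, here `list`.

```python
from collections import defaultdict
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

# Passing the class instead of an initialized instance fails
try:
    df.to_dict(into=defaultdict)
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)

# Nested dicts reuse the factory: a missing row label in 'col1' yields list()
nested = df.to_dict(into=defaultdict(list))
print(nested['col1']['row3'])  # []
```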


### Working with a larger DataFrame

In [15]:
# Create a larger DataFrame
df_large = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales']
})

print("Larger DataFrame:")
print(df_large)

Larger DataFrame:
      name  age  salary department
0    Alice   25   50000         HR
1      Bob   30   60000         IT
2  Charlie   35   70000    Finance
3    David   40   80000  Marketing
4      Eve   45   90000      Sales


In [16]:
# Convert to dictionary with 'records' orientation
records = df_large.to_dict(orient='records')
print("Records orientation:")
for record in records:
    print(record)

Records orientation:
{'name': 'Alice', 'age': 25, 'salary': 50000, 'department': 'HR'}
{'name': 'Bob', 'age': 30, 'salary': 60000, 'department': 'IT'}
{'name': 'Charlie', 'age': 35, 'salary': 70000, 'department': 'Finance'}
{'name': 'David', 'age': 40, 'salary': 80000, 'department': 'Marketing'}
{'name': 'Eve', 'age': 45, 'salary': 90000, 'department': 'Sales'}
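
The records layout maps directly onto a JSON array of objects, which is the shape many web APIs expect. The sketch below uses the standard-library `json` module. It assumes a recent pandas version in which `to_dict(orient='records')` returns plain Python `int` and `str` values. NumPy integer scalars would not be JSON-serializable.

```python
import json
import pandas as pd

df_large = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales']
})

records = df_large.to_dict(orient='records')

# Each record becomes one JSON object in a JSON array
payload = json.dumps(records)
print(json.dumps(records[0]))  # {"name": "Alice", "age": 25, "salary": 50000, "department": "HR"}

# Parsing the JSON gives back the same list of dicts
print(json.loads(payload) == records)  # True
```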


In [17]:
# Convert to dictionary with 'index' orientation
index_dict = df_large.to_dict(orient='index')
print("Index orientation:")
for idx, row_dict in index_dict.items():
    print(f"Row {idx}: {row_dict}")

Index orientation:
Row 0: {'name': 'Alice', 'age': 25, 'salary': 50000, 'department': 'HR'}
Row 1: {'name': 'Bob', 'age': 30, 'salary': 60000, 'department': 'IT'}
Row 2: {'name': 'Charlie', 'age': 35, 'salary': 70000, 'department': 'Finance'}
Row 3: {'name': 'David', 'age': 40, 'salary': 80000, 'department': 'Marketing'}
Row 4: {'name': 'Eve', 'age': 45, 'salary': 90000, 'department': 'Sales'}
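
With the default integer index, the row keys are only positions. Promoting a column with unique values to the index first produces keys that are more useful for lookups. Note that `orient='index'` requires a unique index and raises a `ValueError` otherwise. A sketch:

```python
import pandas as pd

df_large = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales']
})

# Key each row by name instead of by position
by_name = df_large.set_index('name').to_dict(orient='index')
print(by_name['Charlie'])  # {'age': 35, 'salary': 70000, 'department': 'Finance'}

# Duplicate index labels cannot become unique dictionary keys
dup = pd.DataFrame({'value': [1, 2]}, index=['a', 'a'])
try:
    dup.to_dict(orient='index')
    duplicate_rejected = False
except ValueError as exc:
    duplicate_rejected = True
    print("ValueError:", exc)
```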


### Working with a DataFrame containing different data types

In [18]:
# Create a DataFrame with different data types
df_types = pd.DataFrame({
    'string': ['a', 'b', 'c'],
    'integer': [1, 2, 3],
    'float': [1.1, 2.2, 3.3],
    'boolean': [True, False, True],
    'datetime': pd.date_range('2020-01-01', periods=3),
    'category': pd.Categorical(['X', 'Y', 'Z']),
    'complex': [1+2j, 3+4j, 5+6j],
    'object': [{'a': 1}, {'b': 2}, {'c': 3}]
})

print("DataFrame with different data types:")
print(df_types)
print("\nData types:")
print(df_types.dtypes)

DataFrame with different data types:
  string  integer  float  boolean   datetime category   complex    object
0      a        1    1.1     True 2020-01-01        X  1.0+2.0j  {'a': 1}
1      b        2    2.2    False 2020-01-02        Y  3.0+4.0j  {'b': 2}
2      c        3    3.3     True 2020-01-03        Z  5.0+6.0j  {'c': 3}

Data types:
string              object
integer              int64
float              float64
boolean               bool
datetime    datetime64[ns]
category          category
complex         complex128
object              object
dtype: object


In [19]:
# Convert to dictionary with 'dict' orientation
types_dict = df_types.to_dict()
print("Dictionary with different data types:")
for col, values in types_dict.items():
    print(f"{col}: {values}")

Dictionary with different data types:
string: {0: 'a', 1: 'b', 2: 'c'}
integer: {0: 1, 1: 2, 2: 3}
float: {0: 1.1, 1: 2.2, 2: 3.3}
boolean: {0: True, 1: False, 2: True}
datetime: {0: Timestamp('2020-01-01 00:00:00'), 1: Timestamp('2020-01-02 00:00:00'), 2: Timestamp('2020-01-03 00:00:00')}
category: {0: 'X', 1: 'Y', 2: 'Z'}
complex: {0: (1+2j), 1: (3+4j), 2: (5+6j)}
object: {0: {'a': 1}, 1: {'b': 2}, 2: {'c': 3}}
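
The `datetime` column comes back as pandas `Timestamp` objects, which the standard `json` module cannot serialize. One option, sketched below on a small illustrative frame, is to format the dates as strings before calling `to_dict()`. The `'%Y-%m-%d'` format is an illustrative choice.

```python
import json
import pandas as pd

df = pd.DataFrame({'when': pd.date_range('2020-01-01', periods=2), 'value': [1, 2]})

raw = df.to_dict(orient='records')
try:
    json.dumps(raw)
    raw_ok = True
except TypeError:
    raw_ok = False  # Timestamp is not JSON serializable

# Convert the datetime column to strings first, then export
safe = df.assign(when=df['when'].dt.strftime('%Y-%m-%d')).to_dict(orient='records')
print(json.dumps(safe))  # [{"when": "2020-01-01", "value": 1}, {"when": "2020-01-02", "value": 2}]
```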


### Practical example: Converting to and from dictionary

In [20]:
# Create a DataFrame
df_original = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

print("Original DataFrame:")
print(df_original)

Original DataFrame:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9


In [21]:
# Convert to dictionary with 'split' orientation
dict_split = df_original.to_dict(orient='split')
print("Dictionary with 'split' orientation:")
print(dict_split)

Dictionary with 'split' orientation:
{'index': [0, 1, 2], 'columns': ['A', 'B', 'C'], 'data': [[1, 4, 7], [2, 5, 8], [3, 6, 9]]}


In [22]:
# Convert back to DataFrame
df_reconstructed = pd.DataFrame(**dict_split)
print("Reconstructed DataFrame:")
print(df_reconstructed)

# Check if the reconstructed DataFrame is identical to the original
print("\nIs the reconstructed DataFrame identical to the original?")
print(df_original.equals(df_reconstructed))

Reconstructed DataFrame:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

Is the reconstructed DataFrame identical to the original?
True
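
The `**` unpacking works because the three keys of the `'split'` dictionary, `'index'`, `'columns'` and `'data'`, match parameter names of the `DataFrame` constructor. Written out explicitly, the call is equivalent to:

```python
import pandas as pd

df_original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
split = df_original.to_dict(orient='split')

# Same as pd.DataFrame(**split): keys map onto constructor keywords
explicit = pd.DataFrame(data=split['data'], index=split['index'], columns=split['columns'])
print(explicit.equals(pd.DataFrame(**split)))  # True
```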


In [23]:
# Convert to dictionary with 'records' orientation
dict_records = df_original.to_dict(orient='records')
print("Dictionary with 'records' orientation:")
print(dict_records)

Dictionary with 'records' orientation:
[{'A': 1, 'B': 4, 'C': 7}, {'A': 2, 'B': 5, 'C': 8}, {'A': 3, 'B': 6, 'C': 9}]


In [24]:
# Convert back to DataFrame
df_from_records = pd.DataFrame(dict_records)
print("DataFrame from records:")
print(df_from_records)

# Check if the reconstructed DataFrame is identical to the original
print("\nIs the reconstructed DataFrame identical to the original?")
print(df_original.equals(df_from_records))

DataFrame from records:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

Is the reconstructed DataFrame identical to the original?
True
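
This round trip compares equal only because `df_original` uses the default `RangeIndex`. The `'records'` list carries no index labels, so a frame with a custom index needs its labels supplied again. The following sketch uses the two-row frame from the start of the notebook:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
records = df.to_dict(orient='records')

# Default reconstruction: labels are lost and a RangeIndex is created
print(pd.DataFrame(records).equals(df))  # False

# Passing the original index back restores equality
print(pd.DataFrame(records, index=df.index).equals(df))  # True
```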


##### Summary

In this notebook, we've explored the pandas `DataFrame.to_dict()` method, which converts a DataFrame to a dictionary. We've covered:

1. **Different orientation options**:
   - `'dict'` (default): `{column -> {index -> value}}`
   - `'list'`: `{column -> [values]}`
   - `'series'`: `{column -> Series(values)}`
   - `'split'`: `{'index' -> [index], 'columns' -> [columns], 'data' -> [values]}`
   - `'records'`: `[{column -> value}, ..., {column -> value}]`
   - `'index'`: `{index -> {column -> value}}`

2. **Abbreviations** for the orientation parameter, such as 's' for 'series' and 'sp' for 'split'. Older pandas versions accepted these, but pandas 2.0 no longer does, so use the full names.

3. **Using different dictionary types** with the `into` parameter, such as `OrderedDict` and `defaultdict`.

4. **Working with different data types** and how each appears in the resulting dictionary (for example, datetime values become pandas `Timestamp` objects).

5. **Converting back and forth** between DataFrames and dictionaries.

The `to_dict()` method is particularly useful for:
- Serializing DataFrames to JSON or other formats
- Interfacing with APIs that expect dictionary data
- Converting DataFrame data for use in other Python libraries
- Creating custom data structures based on DataFrame data