# Saving and Loading DataFrames

In this guide, you will learn how to save and load Woodwork DataFrames.

## Saving a Woodwork DataFrame

After defining a Woodwork DataFrame with the proper logical types and semantic tags, you can save the DataFrame and its typing information by using [`DataFrame.ww.to_disk`](https://woodwork.alteryx.com/en/stable/generated/woodwork.table_accessor.WoodworkTableAccessor.to_disk.html#woodwork.table_accessor.WoodworkTableAccessor.to_disk). This method creates a directory that contains a `data` folder and a `woodwork_typing_info.json` file. To illustrate, I will use the retail demo DataFrame, which comes preconfigured with Woodwork typing information.

In [None]:
from woodwork.demo import load_retail
df = load_retail(nrows=100)
df.ww.schema

In [None]:
df.head()

From the `ww` accessor, use [`to_disk`](https://woodwork.alteryx.com/en/stable/generated/woodwork.table_accessor.WoodworkTableAccessor.to_disk.html#woodwork.table_accessor.WoodworkTableAccessor.to_disk) to save the Woodwork DataFrame.

In [None]:
df.ww.to_disk('retail')

You should see a new directory that contains the data and typing information.

```
retail
├── data
│   └── demo_retail_data.csv
└── woodwork_typing_info.json
```

### Data Directory

The `data` directory contains the underlying data written in the specified format. The method derives the filename from `DataFrame.ww.name` and uses CSV as the default format. You can change the format by setting the method's `format` parameter to any of the following values:

- csv (default)
- pickle
- parquet

### Typing Information

In the `woodwork_typing_info.json` file, you can see all of the typing information and metadata associated with the DataFrame. This information includes:

- the version of the schema at the time of saving the DataFrame
- the DataFrame name specified by `DataFrame.ww.name`
- the column names for the index and time index
- the column typing information, which contains the logical types with their parameters and semantic tags for each column
- the loading information required for the DataFrame type and file format
- the table metadata provided by `DataFrame.ww.metadata` (must be JSON serializable)

```
{
    "schema_version": "10.0.2",
    "name": "demo_retail_data",
    "index": "order_product_id",
    "time_index": "order_date",
    "column_typing_info": [...],
    "loading_info": {
        "table_type": "pandas",
        "location": "data/demo_retail_data.csv",
        "type": "csv",
        "params": {
            "compression": null,
            "sep": ",",
            "encoding": "utf-8",
            "engine": "python",
            "index": false
        }
    },
    "table_metadata": {}
}
```
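Because the typing information is plain JSON, you can inspect it with the standard library before loading any data. The following is a minimal sketch, not part of Woodwork: `summarize_typing_info` is a hypothetical helper, and the script writes a trimmed stand-in copy of the sample above to a temporary directory so that it runs on its own. With a real saved table, you would pass the directory created by `to_disk` (for example, `retail`) instead.

```python
import json
import tempfile
from pathlib import Path


def summarize_typing_info(table_dir):
    """Return a few key fields from woodwork_typing_info.json (hypothetical helper)."""
    path = Path(table_dir) / "woodwork_typing_info.json"
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    loading = info["loading_info"]
    return {
        "name": info["name"],
        "index": info["index"],
        "time_index": info["time_index"],
        "format": loading["type"],
        # Path of the data file, relative to the table directory
        "location": loading["location"],
    }


# Stand-in for a saved table: a trimmed copy of the sample shown above.
sample = {
    "schema_version": "10.0.2",
    "name": "demo_retail_data",
    "index": "order_product_id",
    "time_index": "order_date",
    "column_typing_info": [],
    "loading_info": {
        "table_type": "pandas",
        "location": "data/demo_retail_data.csv",
        "type": "csv",
        "params": {"sep": ",", "encoding": "utf-8"},
    },
    "table_metadata": {},
}

with tempfile.TemporaryDirectory() as tmp:
    typing_file = Path(tmp) / "woodwork_typing_info.json"
    typing_file.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_typing_info(tmp)

print(summary)
```

The `location` and `format` entries show which file and reader the loading step relies on, which can help when a saved directory has been moved or partially copied.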

## Loading a Woodwork DataFrame

After saving a Woodwork DataFrame, you can load the DataFrame and typing information by using [`woodwork.deserialize.read_woodwork_table`](https://woodwork.alteryx.com/en/stable/generated/woodwork.deserialize.read_woodwork_table.html#woodwork.deserialize.read_woodwork_table). This function will use the stored typing information in the specified directory to recreate the Woodwork DataFrame.

In [None]:
from woodwork.deserialize import read_woodwork_table
df = read_woodwork_table('retail')
df.ww.schema

## Loading a DataFrame and Typing Information Separately

You can also load the Woodwork DataFrame and typing information separately by using [`woodwork.read_file`](https://woodwork.alteryx.com/en/stable/generated/woodwork.utils.read_file.html#woodwork.utils.read_file). This function uses the `content_type` parameter to determine the file format. If `content_type` is not specified, the function tries to infer the file format from the file extension. Typing information, such as the index, time index, logical types, and semantic tags, is passed through optional parameters. This approach is helpful if you want to save and load the typing information outside the specified directory or read a data file directly into a Woodwork DataFrame. To illustrate, I will write a pandas DataFrame to separate files in several formats, then load each one into a Woodwork DataFrame using this typing information.

In [None]:
typing_information = {
    'index': 'order_product_id',
    'time_index': 'order_date',
    'logical_types': {
        'order_product_id': 'Categorical',
        'order_id': 'Categorical',
        'product_id': 'Categorical',
        'description': 'NaturalLanguage',
        'quantity': 'Integer',
        'order_date': 'Datetime',
        'unit_price': 'Double',
        'customer_name': 'Categorical',
        'country': 'Categorical',
        'total': 'Double',
        'cancelled': 'Boolean',
    },
    'semantic_tags': {
        'order_id': {'category'},
        'product_id': {'category'},
        'quantity': {'numeric'},
        'unit_price': {'numeric'},
        'customer_name': {'category'},
        'country': {'category'},
        'total': {'numeric'},
    }
}

First, load the data file into a pandas DataFrame and save it to different formats.

In [None]:
import pandas as pd

pandas_df = pd.read_csv('retail/data/demo_retail_data.csv')
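# index=False keeps the pandas row index from being written as an extra unnamed column
pandas_df.to_csv('retail.csv', index=False)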
pandas_df.to_parquet('retail.parquet')
pandas_df.to_feather('retail.feather')

Now, you can use `read_file` to load the data directly into a Woodwork DataFrame.

In [None]:
from woodwork import read_file

woodwork_df = read_file(
    filepath='retail.csv',
    content_type='csv',
    **typing_information,
)

woodwork_df = read_file(
    filepath='retail.parquet',
    content_type='parquet',
    **typing_information,
)

woodwork_df = read_file(
    filepath='retail.feather',
    content_type='feather',
    **typing_information,
)

woodwork_df.ww
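
As noted earlier, when `content_type` is omitted, `read_file` falls back to the file extension. The idea can be sketched with the standard library alone. This is a hypothetical illustration, not Woodwork's actual implementation: the `guess_content_type` helper and the `EXTENSION_TO_FORMAT` table are assumptions made for this example, and the real lookup may cover more formats or behave differently.

```python
from pathlib import Path

# Hypothetical extension-to-format table, for illustration only.
# Woodwork's real lookup may differ.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".feather": "feather",
}


def guess_content_type(filepath, content_type=None):
    """Return the explicit content_type if given, else guess from the extension."""
    if content_type is not None:
        return content_type
    suffix = Path(filepath).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(
            f"Cannot infer a format from {filepath!r}; pass content_type explicitly"
        )
    return EXTENSION_TO_FORMAT[suffix]


for name in ["retail.csv", "retail.parquet", "retail.feather"]:
    print(name, "->", guess_content_type(name))
```

Passing `content_type` explicitly, as the cells above do, avoids any dependence on how a file happens to be named.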