In [None]:
import woodwork as ww

data = ww.demo.load_retail(nrows=100, return_dataframe=True)
data.head(5)

As you can see, this is a dataframe containing several different data types, including dates, categorical values, numeric values, and natural language descriptions. Next, use Woodwork to create a DataTable from this data.

## Creating a DataTable
To create a Woodwork DataTable, pass a dataframe containing the data of interest during initialization. Use the optional `name` parameter to label the DataTable.

In [None]:
dt = ww.DataTable(data, name="retail")
dt

With this single call, Woodwork inferred the logical types present in the data by analyzing the dataframe dtypes as well as the information contained in the columns. Woodwork also added semantic tags to some of the columns based on the logical types that were inferred.
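
To make the idea of inference more concrete, here is a minimal sketch of a type-guessing heuristic in plain Python. It is not Woodwork's inference code: the function name `infer_logical_type`, the `categorical_threshold` parameter, the rules, and the sample values are all assumptions made up for illustration. Only the logical type names are borrowed from Woodwork.

```python
from datetime import datetime


def infer_logical_type(values, categorical_threshold=0.5):
    """Guess a logical type name for a list of column values.

    Toy heuristic for illustration only; not Woodwork's inference logic.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # nothing to inspect
    # bool is a subclass of int in Python, so check for it first.
    if all(isinstance(v, bool) for v in non_null):
        return "Boolean"
    numbers = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if len(numbers) == len(non_null):
        return "Integer" if all(isinstance(v, int) for v in numbers) else "Double"
    if all(isinstance(v, datetime) for v in non_null):
        return "Datetime"
    if all(isinstance(v, str) for v in non_null):
        # Few distinct strings relative to the row count suggests categories.
        unique_ratio = len(set(non_null)) / len(non_null)
        if unique_ratio <= categorical_threshold:
            return "Categorical"
        return "NaturalLanguage"
    return None  # mixed content: the toy heuristic gives up


# Made-up sample values, not rows from the retail demo data.
sample_columns = {
    "order_date": [datetime(2020, 1, 1), datetime(2020, 1, 2)],
    "quantity": [6, 6, 8],
    "unit_price": [4.25, 5.5, 4.75],
    "cancelled": [False, False, True],
    "country": ["United Kingdom", "United Kingdom", "France", "United Kingdom"],
    "description": ["white lantern", "red mug", "glass star holder"],
}
inferred = {name: infer_logical_type(vals) for name, vals in sample_columns.items()}
print(inferred)
```

The sketch looks at both the Python type of the values and their content (the share of distinct strings), which mirrors the two sources of information described above.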

You can also view the typing information along with the first few rows of data.

In [None]:
dt.head()

## Updating Logical Types
If the initial inference is not what you want, you can change the logical type of a column to a more appropriate value. To illustrate this process, set the logical type for the `quantity`, `customer_name`, and `country` columns to `Categorical`.

In [None]:
dt = dt.set_types(logical_types={
    'quantity': 'Categorical',
    'customer_name': 'Categorical',
    'country': 'Categorical'
})
dt

Inspect the typing information in the output. The logical type for each of the three columns has been updated to `Categorical`, as you specified.

## Selecting Columns

Now that you've prepared logical types, you can select a subset of the columns based on their logical types. Select only the columns that have a logical type of `Integer` or `Double`.

In [None]:
numeric_dt = dt.select(['Integer', 'Double'])
numeric_dt

This selection process has returned a new `DataTable` containing only the columns that match the logical types you specified. After you have selected the columns you want, you can also access a dataframe containing just those columns if you need it for additional analysis.

In [None]:
numeric_dt.to_dataframe()

## Adding Semantic Tags

Next, let’s add semantic tags to some of the columns. Add the tag of `product_details` to the `description` column, and tag the `total` column with `currency`.

In [None]:
dt = dt.set_types(semantic_tags={'description':'product_details', 'total': 'currency'})
dt

Select columns based on a semantic tag. Only select the columns tagged with `category`.

In [None]:
category_dt = dt.select('category')
category_dt

Select columns using multiple semantic tags or a mixture of semantic tags and logical types.

In [None]:
category_numeric_dt = dt.select(['numeric', 'category'])
category_numeric_dt

In [None]:
mixed_dt = dt.select(['Boolean', 'product_details'])
mixed_dt
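
The matching rule behind these selections can be sketched without Woodwork: a column is selected when its logical type or any of its semantic tags appears in the requested list. The `select_columns` function and `toy_schema` dictionary below are hypothetical stand-ins for illustration, not part of Woodwork, and the types and tags assigned to each column are assumptions.

```python
def select_columns(schema, include):
    """Return names of columns whose logical type or any semantic tag is in include.

    Toy model of selection; not Woodwork's implementation.
    """
    wanted = {include} if isinstance(include, str) else set(include)
    return [
        name
        for name, info in schema.items()
        if info["logical_type"] in wanted or info["semantic_tags"] & wanted
    ]


# Hypothetical typing information, loosely modeled on the retail columns.
toy_schema = {
    "quantity": {"logical_type": "Integer", "semantic_tags": {"numeric"}},
    "country": {"logical_type": "Categorical", "semantic_tags": {"category"}},
    "cancelled": {"logical_type": "Boolean", "semantic_tags": set()},
    "description": {
        "logical_type": "NaturalLanguage",
        "semantic_tags": {"product_details"},
    },
    "total": {"logical_type": "Double", "semantic_tags": {"numeric", "currency"}},
}

print(select_columns(toy_schema, "category"))
print(select_columns(toy_schema, ["numeric", "category"]))
print(select_columns(toy_schema, ["Boolean", "product_details"]))
```

In this sketch a single string and a list are handled the same way, and a column only needs to match one entry in the list to be included.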

To select an individual column, specify the column name. This returns a DataColumn, and you can access its underlying data with the `to_series` method.

In [None]:
dc = dt['total']
dc

In [None]:
dc.to_series()

Access multiple columns by supplying a list of column names.

In [None]:
multiple_cols_dt = dt[['product_id', 'total', 'unit_price']]
multiple_cols_dt

## Removing Semantic Tags
Remove specific semantic tags from a column if they are no longer needed. In this example, remove the `product_details` tag from the `description` column.

In [None]:
dt = dt.remove_semantic_tags({'description':'product_details'})
dt

Notice how the `product_details` tag has been removed from the `description` column. If you want to remove all user-added semantic tags from all columns, you can do that, too.

In [None]:
dt = dt.reset_semantic_tags()
dt
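
The difference between removing specific tags and resetting them can be sketched with plain Python sets. This is a toy model, not Woodwork's code: the `ASSUMED_STANDARD_TAGS` mapping, the helper functions, and the `revenue` tag are hypothetical, and the sketch assumes that a reset leaves each column with only the tags derived from its logical type.

```python
# Assumed mapping from logical type to the tags added automatically.
ASSUMED_STANDARD_TAGS = {
    "Categorical": {"category"},
    "Integer": {"numeric"},
    "Double": {"numeric"},
}


def remove_tags(tags, to_remove):
    """Return a new set of tags without the named ones."""
    return set(tags) - set(to_remove)


def reset_tags(logical_type):
    """Return only the tags implied by the logical type, dropping user-added ones."""
    return set(ASSUMED_STANDARD_TAGS.get(logical_type, set()))


# A Double column with its type-derived tag plus two user-added tags.
total_tags = {"numeric", "currency", "revenue"}

after_remove = remove_tags(total_tags, {"currency"})
after_reset = reset_tags("Double")

print(sorted(after_remove))
print(sorted(after_reset))
```

Removing affects only the tags you name, while resetting discards every user-added tag in one step.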

## Set Index and Time Index
At any point, you can designate certain columns as the DataTable's `index` or `time_index` with the methods [set_index](generated/woodwork.datatable.DataTable.set_index.rst) and [set_time_index](generated/woodwork.datatable.DataTable.set_time_index.rst). These methods can be used to assign these columns for the first time or to change the column being used as the index or time index.

Index and time index columns contain `index` and `time_index` semantic tags, respectively.

In [None]:
dt = dt.set_index('order_product_id')
dt.index

In [None]:
dt = dt.set_time_index('order_date')
dt.time_index

In [None]:
dt
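
Because only one column at a time carries the `index` tag, changing the index amounts to moving that tag from one column to another. The sketch below models this with a dictionary of tag sets. The `set_index_tag` helper is hypothetical and is not how Woodwork implements `set_index` or `set_time_index`.

```python
def set_index_tag(tags_by_column, column, tag="index"):
    """Return a new mapping in which only `column` carries `tag`.

    Toy model for illustration; not Woodwork's implementation.
    """
    if column not in tags_by_column:
        raise KeyError(column)
    # Copy every tag set, dropping the tag from whichever column held it.
    updated = {name: set(tags) - {tag} for name, tags in tags_by_column.items()}
    updated[column].add(tag)
    return updated


# Hypothetical starting tags for three of the retail columns.
tags = {
    "order_product_id": set(),
    "product_id": {"category"},
    "order_date": set(),
}

tags = set_index_tag(tags, "product_id")        # first assignment
tags = set_index_tag(tags, "order_product_id")  # the index moves to a new column
tags = set_index_tag(tags, "order_date", tag="time_index")

index_columns = [name for name, t in tags.items() if "index" in t]
print(index_columns)
print(tags)
```

Like the methods above, the helper returns a new object instead of modifying its input, so the result must be assigned back to a variable.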

## List Logical Types
List all the logical types available in Woodwork. The listing is useful for understanding each logical type and how it is interpreted.

In [None]:
from woodwork.type_sys.utils import list_logical_types

list_logical_types()