# Gain Statistical Insights into Your DataTable

Woodwork provides methods on DataTable that use the typing information inherent in a DataTable to help you better understand your data.

Follow along to learn how to use `describe` and `mutual_information` on a retail DataTable so that you can see the full capabilities of the functions.

In [None]:
import pandas as pd
from woodwork import DataTable
from woodwork.demo import load_retail

dt = load_retail()
dt

## DataTable.describe

Use `dt.describe()` to calculate statistics for each DataColumn in a DataTable. The result is returned as a pandas DataFrame with one column of statistics per DataColumn.

In [None]:
dt.describe()

There are a few things to note in the above DataFrame:

- The DataTable's index, `order_product_id`, is not included.
- Each DataColumn's typing information is provided according to Woodwork's typing system.
- Any statistics that can't be calculated for a DataColumn, such as `num_false` on a `Datetime`, are filled with `NaN`.
- Null values are not counted in any of the calculations other than `nunique`.
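To make the `NaN` filling and null handling concrete, here is a minimal pure-Python sketch. It is not Woodwork's implementation: the `describe_column` helper, the set of statistics, and the string type names are all hypothetical and chosen only for illustration.

```python
import math

def describe_column(values, logical_type):
    """Toy per-column summary (hypothetical helper, NOT Woodwork's code)."""
    # Nulls (None here) are dropped before any statistic is computed.
    present = [v for v in values if v is not None]
    stats = {
        "logical_type": logical_type,
        "count": len(present),
        # Statistics that do not apply to the column's type stay NaN.
        "mean": math.nan,
        "num_true": math.nan,
        "num_false": math.nan,
    }
    if logical_type == "Boolean":
        stats["num_true"] = sum(1 for v in present if v is True)
        stats["num_false"] = sum(1 for v in present if v is False)
    elif logical_type == "Double" and present:
        stats["mean"] = sum(present) / len(present)
    return stats

# Made-up sample columns, each containing one null.
price_stats = describe_column([2.5, None, 3.5], "Double")
flag_stats = describe_column([True, False, None, False], "Boolean")
print(price_stats)
print(flag_stats)
```

In this sketch the numeric column gets a `mean` but `NaN` for `num_false`, while the boolean column gets the reverse, mirroring the pattern in the notes above.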

## DataTable.value_counts

Use `dt.value_counts()` to calculate the most frequent values for each DataColumn that has `category` as a standard tag. This returns a dictionary where each DataColumn is associated with a sorted list of dictionaries. Each dictionary contains `value` and `count`.

In [None]:
dt.value_counts()
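The return shape described above can be sketched with plain Python. This only illustrates the structure (a dictionary mapping column names to sorted lists of `value`/`count` dictionaries); the `top_values` helper, its `top_n` parameter, and the sample data are hypothetical and are not part of Woodwork.

```python
from collections import Counter

def top_values(values, top_n=10):
    """Return the most frequent values as a list of {'value', 'count'} dicts.

    Hypothetical helper for illustration only; `top_n` is an assumed name.
    """
    counts = Counter(values)
    # most_common() sorts from the highest count to the lowest.
    return [{"value": v, "count": c} for v, c in counts.most_common(top_n)]

# Made-up categorical column.
columns = {
    "country": [
        "United Kingdom", "France", "United Kingdom",
        "Germany", "United Kingdom", "France",
    ],
}

result = {name: top_values(vals) for name, vals in columns.items()}
print(result)
```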

## DataTable.mutual_information

`dt.mutual_information` calculates the mutual information between all pairs of relevant DataColumns. Certain types, like strings, can't have mutual information calculated.

The mutual information between columns `A` and `B` can be understood as how much you learn about column `A` from knowing the values of column `B`. The more mutual information there is between `A` and `B`, the less uncertainty remains about `A` once `B` is known, and vice versa.
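As an illustration of the concept, the sketch below computes the textbook mutual information between two categorical sequences using only the standard library. It is not Woodwork's implementation, and Woodwork's reported values may be scaled or normalized differently; the function and variable names are chosen for this example.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Textbook mutual information (in nats) between two equal-length sequences."""
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    count_ab = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_a[x] * count_b[y]).
        mi += p_xy * math.log(c * n / (count_a[x] * count_b[y]))
    return mi

col_a = ["x", "x", "y", "y"]
determined = ["u", "u", "v", "v"]   # knowing this column fixes col_a
unrelated = ["u", "v", "u", "v"]    # knowing this column says nothing about col_a

mi_dependent = mutual_information(col_a, determined)
mi_independent = mutual_information(col_a, unrelated)
print(mi_dependent, mi_independent)
```

When one column fully determines the other, the mutual information is as large as the column's own uncertainty allows; when the columns are independent, it is zero.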

In [None]:
dt.mutual_information()

#### Available Parameters
`dt.mutual_information` provides two parameters for tuning the mutual information calculation.

- `num_bins` - In order to calculate mutual information on continuous data, Woodwork bins numeric data into categories. This parameter allows you to choose the number of bins with which to categorize data.
    - Defaults to using 10 bins.
    - The more bins there are, the more variety a column will have. The number of bins used should accurately portray the spread of the data.
- `nrows` - If `nrows` is set to a value below the number of rows in the DataTable, that number of rows is randomly sampled from the underlying data.
    - Defaults to using all the available rows.
    - Decreasing the number of rows can speed up the mutual information calculation on a DataTable with many rows, but you should be careful that the number being sampled is large enough to accurately portray the data.
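The effect of the two parameters can be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses equal-width bins, which may not match Woodwork's binning strategy, and the helpers `bin_values` and `sample_rows`, along with the `seed` argument, are hypothetical names that are not part of Woodwork.

```python
import random

def bin_values(values, num_bins=10):
    """Assign each numeric value to one of `num_bins` equal-width bins (assumed strategy)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # The maximum value would land in bin `num_bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

def sample_rows(rows, nrows=None, seed=0):
    """Randomly sample `nrows` rows, or keep every row if `nrows` is not smaller."""
    if nrows is None or nrows >= len(rows):
        return list(rows)
    return random.Random(seed).sample(rows, nrows)

# Made-up numeric column.
prices = [0.5, 1.0, 2.5, 4.0, 7.5, 10.0]

coarse = bin_values(prices, num_bins=2)
fine = bin_values(prices, num_bins=10)
subset = sample_rows(prices, nrows=3)
print(coarse, fine, subset)
```

With only two bins most of the values collapse into the same category, while ten bins keep more of the spread, which is the trade-off the `num_bins` note describes.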

Now that you understand the parameters, you can explore changing the number of bins. Note that this only affects the numeric DataColumns `quantity` and `unit_price`. Increase the number of bins from 10 to 50, showing only the impacted columns.

In [None]:
mi = dt.mutual_information()
mi[mi['column_1'].isin(['unit_price', 'quantity']) | mi['column_2'].isin(['unit_price', 'quantity'])]

In [None]:
mi = dt.mutual_information(num_bins=50)
mi[mi['column_1'].isin(['unit_price', 'quantity']) | mi['column_2'].isin(['unit_price', 'quantity'])]