# Gain Statistical Insights into Your DataTable

Woodwork provides two DataTable methods to help users better understand their data: `describe` and `get_mutual_information`.

```python
import pandas as pd
from woodwork import DataTable

df = pd.DataFrame({
    'id': range(3),
    'first_name': ['John', 'Jane', 'James'],
    'full_name': ['Mr. John Doe', 'Doe, Mrs. Jane', 'James Brown'],
    'email': ['john.smith@example.com', None, 'team@featuretools.com'],
    'phone_number': ['5555555555', '555-555-5555', '1-(555)-555-5555'],
    'delta_no_nans': (pd.Series([pd.to_datetime('2020-09-01')] * 3) - pd.to_datetime('2020-07-01')),
    'age': [33, 25, 33],
    'signup_date': [pd.to_datetime('2020-09-01')] * 3,
    'is_registered': [True, False, True],
})

# Use 'id' as the index; Woodwork infers logical types for the remaining columns
dt = DataTable(df, index='id')
# Override inference for string columns that have a more specific meaning
dt = dt.set_logical_types({'email': 'EmailAddress', 'full_name': 'FullName', 'phone_number': 'PhoneNumber'})
```

Neither `describe` nor `get_mutual_information` performs calculations on a DataTable's index column.

## DataTable.describe

Calling `dt.describe()` calculates statistics for the columns in your DataTable. The specific statistics calculated for each DataColumn depend on its LogicalType. We split the available LogicalTypes into six categories:

- Categorical
    - `Categorical`, `CountryCode`, `Ordinal`, `SubRegionCode`, `ZIPCode`
- Numeric
    - `Double`, `Integer`, `WholeNumber`
- String
    - `EmailAddress`, `FilePath`, `FullName`, `IPAddress`, `LatLong`, `NaturalLanguage`, `PhoneNumber`, `URL`
- Boolean - just the `Boolean` LogicalType
- Datetime - just the `Datetime` LogicalType
- Timedelta - just the `Timedelta` LogicalType
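As a rough illustration, the grouping above can be expressed as a lookup table from LogicalType name to category. `CATEGORY_MEMBERS`, `LOGICAL_TYPE_TO_CATEGORY`, and `describe_category` are hypothetical names used only for this sketch; Woodwork's own code may organize this differently.

```python
# Hypothetical mapping that mirrors the category list above.
# This is an illustration, NOT Woodwork's internal implementation.
CATEGORY_MEMBERS = {
    "categorical": ["Categorical", "CountryCode", "Ordinal", "SubRegionCode", "ZIPCode"],
    "numeric": ["Double", "Integer", "WholeNumber"],
    "string": ["EmailAddress", "FilePath", "FullName", "IPAddress", "LatLong",
               "NaturalLanguage", "PhoneNumber", "URL"],
    "boolean": ["Boolean"],
    "datetime": ["Datetime"],
    "timedelta": ["Timedelta"],
}

# Invert the mapping into a LogicalType-name -> category lookup.
LOGICAL_TYPE_TO_CATEGORY = {
    ltype: category
    for category, members in CATEGORY_MEMBERS.items()
    for ltype in members
}


def describe_category(logical_type_name: str) -> str:
    """Return the statistics category for a LogicalType name (hypothetical helper)."""
    try:
        return LOGICAL_TYPE_TO_CATEGORY[logical_type_name]
    except KeyError:
        raise ValueError(f"unknown LogicalType: {logical_type_name!r}") from None
```

For example, `describe_category("PhoneNumber")` returns `"string"`, so only the general statistics would apply to the `phone_number` column.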

The statistics calculated can be broken down into a few types:

- General - can be applied to all columns
    - `nan_count` and `mode`
- Aggregate
    - `count` - Categorical, Numeric, Datetime
    - `nunique` - Categorical, Numeric, Datetime
    - `mean` - Numeric, Datetime
    - `std` - Numeric
    - `min` - Numeric, Datetime
    - `max` - Numeric, Datetime
- Boolean - only relevant for columns of Booleans
    - `num_false` and `num_true`
- Quartile - calculated on Numeric columns
    - `first_quartile`
    - `second_quartile`
    - `third_quartile`

In addition to the statistics, we provide type information for each column. Because the results form a table, any statistic that cannot be calculated for a column is filled with `NaN`.
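To make the rules above concrete, here is a rough pandas-only sketch of a describe-style summary. It is not Woodwork's implementation: `sketch_describe` and `AGGREGATES` are hypothetical names, the `categories` argument (column name to category) stands in for LogicalType inference, and details such as mode tie-breaking or quartile interpolation may differ from what `dt.describe()` actually returns. It skips the index column and leaves `NaN` wherever a statistic does not apply.

```python
from typing import Dict, Optional

import numpy as np
import pandas as pd

# Which aggregate statistics apply to which category, following the list above.
AGGREGATES = {
    "count": {"categorical", "numeric", "datetime"},
    "nunique": {"categorical", "numeric", "datetime"},
    "mean": {"numeric", "datetime"},
    "std": {"numeric"},
    "min": {"numeric", "datetime"},
    "max": {"numeric", "datetime"},
}


def sketch_describe(df: pd.DataFrame, categories: Dict[str, str],
                    index: Optional[str] = None) -> pd.DataFrame:
    """Approximate a describe()-style summary (hypothetical helper)."""
    results = {}
    for name, series in df.items():
        if name == index:
            continue  # the index column is not described
        category = categories[name]
        non_null = series.dropna()
        modes = non_null.mode()
        # General statistics apply to every column.
        stats = {
            "category": category,
            "nan_count": int(series.isna().sum()),
            "mode": modes.iloc[0] if not modes.empty else np.nan,
        }
        # Aggregates are computed only for the categories that support them.
        for stat, applicable in AGGREGATES.items():
            if category in applicable:
                stats[stat] = getattr(non_null, stat)()
        if category == "numeric":
            q1, q2, q3 = non_null.quantile([0.25, 0.5, 0.75])
            stats.update(first_quartile=q1, second_quartile=q2, third_quartile=q3)
        if category == "boolean":
            stats["num_true"] = int(non_null.sum())
            stats["num_false"] = int((~non_null.astype(bool)).sum())
        results[name] = stats
    # Building a DataFrame from a dict of dicts leaves NaN wherever a column
    # lacks a statistic.
    return pd.DataFrame(results)
```

Collecting per-column dictionaries and letting pandas align them is what produces the `NaN` cells: a statistic that was never computed for a column simply has no entry.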

```python
dt.describe()
```

## DataTable.get_mutual_information

`dt.get_mutual_information` will calculate the mutual information between all pairs of columns that 