alteryx · gsheni · Nov 16, 2020
diff --git a/docs/source/api_reference.rst b/docs/source/api_reference.rst
@@ -22,7 +22,7 @@ DataTable
     DataTable.set_time_index
     DataTable.to_dataframe
     DataTable.describe
-    DataTable.get_mutual_information
+    DataTable.mutual_information
     DataTable.value_counts
     DataTable.to_csv
     DataTable.to_pickle

diff --git a/docs/source/guides/statistical_insights.ipynb b/docs/source/guides/statistical_insights.ipynb
@@ -8,7 +8,7 @@
     "\n",
     "Woodwork provides methods on DataTable to allow users to utilize the typing information inherent in a DataTable to better understand their data.\n",
     "\n",
-    "Let's walk through how to use `describe` and `get_mutual_information` on a retail DataTable so that we can see the full capabilities of the functions."
+    "Let's walk through how to use `describe` and `mutual_information` on a retail DataTable so that we can see the full capabilities of the functions."
    ]
   },
   {
@@ -79,13 +79,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## DataTable.get_mutual_information()\n",
+    "## DataTable.mutual_information()\n",
     "\n",
-    "`dt.get_mutual_information` will calculate the mutual information between all pairs of relevant Data Columns. Certain types such as datetimes or strings cannot have mutual information calculated.\n",
+    "`dt.mutual_information` will calculate the mutual information between all pairs of relevant Data Columns. Certain types such as datetimes or strings cannot have mutual information calculated.\n",
     "\n",
     "The mutual information between columns `A` and `B` can be understood as the amount of knowlege we can have about column `A` if we have the values of column `B`. The more mutual information there is between `A` and `B`, the less uncertainty there is in `A` knowing `B` or vice versa. \n",
     "\n",
-    "If we call `dt.get_mutual_information()`, we'll see that `order_date` will be excluded from the resulting dataframe."
+    "If we call `dt.mutual_information()`, we'll see that `order_date` will be excluded from the resulting dataframe."
    ]
   },
   {
@@ -94,15 +94,15 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "dt.get_mutual_information()"
+    "dt.mutual_information()"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "#### Available Parameters\n",
-    "`dt.get_mutual_information` provides two parameters for tuning the mutual information calculation.\n",
+    "`dt.mutual_information` provides two parameters for tuning the mutual information calculation.\n",
     "\n",
     "- `num_bins` - In order to calculate mutual information on continuous data, we bin numeric data into categories. This parameter allows users to choose the number of bins with which to categorize data.\n",
     "    - Defaults to using 10 bins\n",
@@ -125,7 +125,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "mi = dt.get_mutual_information()\n",
+    "mi = dt.mutual_information()\n",
     "mi[mi['column_1'].isin(['unit_price', 'quantity']) | mi['column_2'].isin(['unit_price', 'quantity'])]"
    ]
   },
@@ -135,7 +135,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "mi = dt.get_mutual_information(num_bins = 50)\n",
+    "mi = dt.mutual_information(num_bins = 50)\n",
     "mi[mi['column_1'].isin(['unit_price', 'quantity']) | mi['column_2'].isin(['unit_price', 'quantity'])]"
    ]
   }

diff --git a/docs/source/guides/using_woodwork_with_dask_and_koalas.ipynb b/docs/source/guides/using_woodwork_with_dask_and_koalas.ipynb
@@ -101,7 +101,7 @@
    "metadata": {},
    "source": [
     "### Analyzing Underlying Data\n",
-    "There are three DataTable methods that also require bringing the underlying Dask DataFrame into memory: `describe`, `value_counts` and `get_mutual_information`. When called, these methods will call a `compute` operation on the DataFrame associated with the DataTable in order to calculate the desired information. This may be problematic for datasets that cannot fit in memory, so exercise caution when using these methods."
+    "There are three DataTable methods that also require bringing the underlying Dask DataFrame into memory: `describe`, `value_counts` and `mutual_information`. When called, these methods will call a `compute` operation on the DataFrame associated with the DataTable in order to calculate the desired information. This may be problematic for datasets that cannot fit in memory, so exercise caution when using these methods."
    ]
   },
   {
@@ -128,7 +128,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "dt.get_mutual_information().head()"
+    "dt.mutual_information().head()"
    ]
   },
   {
@@ -233,7 +233,7 @@
    "metadata": {},
    "source": [
     "### Analyzing Underlying Data\n",
-    "As with Dask, running `describe`, `value_counts` or `get_mutual_information` requires bringing the data into memory to perform the analysis. When called, these methods will call a `to_pandas` operation on the DataFrame associated with the DataTable in order to calculate the desired information. This may be problematic for very large datasets, so exercise caution when using these methods."
+    "As with Dask, running `describe`, `value_counts` or `mutual_information` requires bringing the data into memory to perform the analysis. When called, these methods will call a `to_pandas` operation on the DataFrame associated with the DataTable in order to calculate the desired information. This may be problematic for very large datasets, so exercise caution when using these methods."
    ]
   },
   {
@@ -260,7 +260,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "dt.get_mutual_information().head()"
+    "dt.mutual_information().head()"
    ]
   },
   {

diff --git a/docs/source/release_notes.rst b/docs/source/release_notes.rst
@@ -9,6 +9,7 @@ Release Notes
     * Fixes
         * Rename ``data_column.py`` to ``datacolumn.py`` (:pr:`386`)
         * Rename ``data_table.py`` to ``datatable.py`` (:pr:`387`)
+        * Rename ``get_mutual_information`` to ``mutual_information`` (:pr:`390`)
     * Changes
         * Lower moto test requirement for serialization/deserialization (:pr:`376`)
         * Make Koalas an optional dependency installable with woodwork[koalas] (:pr:`378`)

diff --git a/koalas-requirements.txt b/koalas-requirements.txt
@@ -1,2 +1,2 @@
 pyspark>=3.0.0
-koalas>=1.1.0
+koalas>=1.1.0,<=1.3.0
diff --git a/woodwork/datatable.py b/woodwork/datatable.py
@@ -764,11 +764,11 @@ def _make_categorical_for_mutual_info(self, data, num_bins):
             data[col_name] = new_col.cat.codes
         return data
 
-    def get_mutual_information(self, num_bins=10, nrows=None):
+    def mutual_information(self, num_bins=10, nrows=None):
         """
-        Calculates mutual information between all pairs of columns in the DataTable
-        that support mutual information. Logical Types that support mutual information are
-        as follows:  Boolean, Categorical, CountryCode, Double, Integer, Ordinal, SubRegionCode, and ZIPCode
+        Calculates mutual information between all pairs of columns in the DataTable that
+        support mutual information. Logical Types that support mutual information are as
+        follows:  Boolean, Categorical, CountryCode, Double, Integer, Ordinal, SubRegionCode, and ZIPCode
 
         Args:
             num_bins (int): Determines number of bins to use for converting

diff --git a/woodwork/tests/datatable/test_datatable.py b/woodwork/tests/datatable/test_datatable.py
@@ -2205,14 +2205,14 @@ def test_datatable_make_categorical_for_mutual_info():
     assert formatted_num_bins_df['categories'].equals(pd.Series([0, 1, 1, 0], dtype='int8'))
 
 
-def test_datatable_get_mutual_information(df_same_mi, df_mi):
+def test_datatable_mutual_information(df_same_mi, df_mi):
     # Only test if df_same_mi and df_mi are same type
     if type(df_same_mi) != type(df_mi):
         return
 
     dt_same_mi = DataTable(df_same_mi, logical_types={'date': Datetime(datetime_format='%Y-%m-%d')})
 
-    mi = dt_same_mi.get_mutual_information()
+    mi = dt_same_mi.mutual_information()
 
     cols_used = set(np.unique(mi[['column_1', 'column_2']].values))
     assert 'nans' not in cols_used
@@ -2223,20 +2223,20 @@ def test_datatable_get_mutual_information(df_same_mi, df_mi):
 
     dt = DataTable(df_mi)
     original_df = dt.to_dataframe().copy()
-    mi = dt.get_mutual_information()
+    mi = dt.mutual_information()
     assert mi.shape[0] == 6
     np.testing.assert_almost_equal(mi_between_cols('ints', 'bools', mi), 0.734, 3)
     np.testing.assert_almost_equal(mi_between_cols('ints', 'strs', mi), 0.0, 3)
     np.testing.assert_almost_equal(mi_between_cols('strs', 'bools', mi), 0, 3)
 
-    mi_many_rows = dt.get_mutual_information(nrows=100000)
+    mi_many_rows = dt.mutual_information(nrows=100000)
     pd.testing.assert_frame_equal(mi, mi_many_rows)
 
-    mi = dt.get_mutual_information(nrows=1)
+    mi = dt.mutual_information(nrows=1)
     assert mi.shape[0] == 6
     assert (mi['mutual_info'] == 1.0).all()
 
-    mi = dt.get_mutual_information(num_bins=2)
+    mi = dt.mutual_information(num_bins=2)
     assert mi.shape[0] == 6
     np.testing.assert_almost_equal(mi_between_cols('bools', 'ints', mi), .274, 3)
     np.testing.assert_almost_equal(mi_between_cols('strs', 'ints', mi), 0, 3)
@@ -2248,19 +2248,19 @@ def test_datatable_get_mutual_information(df_same_mi, df_mi):
 
 def test_mutual_info_does_not_include_index(sample_df):
     dt = DataTable(sample_df, index='id')
-    mi = dt.get_mutual_information()
+    mi = dt.mutual_information()
     assert 'id' not in mi['column_1'].values
 
 
 def test_mutual_info_returns_empty_df_properly(sample_df):
     dt = DataTable(sample_df.copy()[['id', 'age']], index='id')
-    mi = dt.get_mutual_information()
+    mi = dt.mutual_information()
     assert mi.empty
 
 
 def test_mutual_info_sort(df_mi):
     dt = DataTable(df_mi)
-    mi = dt.get_mutual_information()
+    mi = dt.mutual_information()
 
     for i in range(len(mi['mutual_info']) - 1):
         assert mi['mutual_info'].iloc[i] >= mi['mutual_info'].iloc[i + 1]
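
Note on `num_bins`: the guide text in this diff says numeric columns are binned into categories before mutual information is calculated. The sketch below illustrates that idea using only the standard library. It is not Woodwork's implementation: the helper names `bin_values` and `mutual_info` are hypothetical, equal-width binning is an assumption, and the score is plain mutual information in nats, so its scale may differ from what `mutual_information` returns.

```python
import math
from collections import Counter


def bin_values(values, num_bins=10):
    """Assign each numeric value to one of num_bins equal-width bins (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant column falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # The maximum value would land in bin num_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def mutual_info(a, b):
    """Mutual information (in nats) between two equal-length label sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


# Illustrative data only; these columns are not taken from the retail dataset.
quantity = [1, 2, 3, 4, 5, 6, 7, 8]
is_bulk = [False, False, False, False, True, True, True, True]

binned = bin_values(quantity, num_bins=2)
score = mutual_info(binned, is_bulk)
```

With two bins, `binned` splits `quantity` into a low half and a high half that line up exactly with `is_bulk`, so `score` equals `math.log(2)`. A larger `num_bins` gives a finer partition of the numeric column, which is why the guide shows the results changing between the default and `num_bins = 50`.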