From 618f3362e5cf12ad8fe0143c3c7f61cd897f2100 Mon Sep 17 00:00:00 2001
From: Harrison Wong
Date: Sat, 2 Jan 2021 13:49:40 -0500
Subject: [PATCH 1/2] Update data_deduplication.rst

---
 docs/guides/data_deduplication.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/guides/data_deduplication.rst b/docs/guides/data_deduplication.rst
index 29ccb79b..251270be 100644
--- a/docs/guides/data_deduplication.rst
+++ b/docs/guides/data_deduplication.rst
@@ -56,8 +56,7 @@ input argument.
 
 With the method ``index``, all possible (and unique) record pairs are
 made. The method returns a ``pandas.MultiIndex``. The number of pairs is
-equal to the number of records in ``dfA`` times the number of records in
-``dfB``.
+equal to the number of records in ``dfA`` choose ``2``.
 
 .. ipython::
 

From 974a45798b1ab4519697c523cfd7f0244986fdfc Mon Sep 17 00:00:00 2001
From: Harrison Wong
Date: Sat, 2 Jan 2021 16:34:59 -0500
Subject: [PATCH 2/2] Fix documentation.

---
 docs/guides/data_deduplication.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/data_deduplication.rst b/docs/guides/data_deduplication.rst
index 251270be..81b67a6c 100644
--- a/docs/guides/data_deduplication.rst
+++ b/docs/guides/data_deduplication.rst
@@ -82,7 +82,7 @@ method can be used in the ``recordlinkage`` module.
        ...: len(candidate_links)
 
 The argument "given\_name" is the blocking variable. This variable has
-to be the name of a column in ``dfA`` and ``dfB``. It is possible to
+to be the name of a column in ``dfA``. It is possible to
 parse a list of columns names to block on multiple variables. Blocking
 on multiple variables will reduce the number of record pairs even
 further.
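
Review note: the corrected pair count in the first patch can be sanity-checked without ``recordlinkage``. The sketch below is only an illustration under stated assumptions: the record identifiers and the helper ``count_unique_pairs`` are hypothetical and do not come from the library or the guide. It enumerates the unordered pairs of a single set of records and compares the total with "n choose 2".

```python
from itertools import combinations
from math import comb  # available in Python 3.8+


def count_unique_pairs(n_records: int) -> int:
    """Number of unordered pairs among n_records records (n choose 2)."""
    return comb(n_records, 2)


# Hypothetical record identifiers standing in for the index of ``dfA``.
record_ids = ["rec-0", "rec-1", "rec-2", "rec-3", "rec-4"]

# Deduplication compares a data set with itself, so each unordered pair
# is counted once and no record is paired with itself.
pairs = list(combinations(record_ids, 2))

# The enumerated pairs match the closed form n * (n - 1) / 2.
n = len(record_ids)
assert len(pairs) == count_unique_pairs(n) == n * (n - 1) // 2
```

With five records this gives 10 pairs, whereas the wording removed by the patch (records in ``dfA`` times records in ``dfB``) describes linking two separate data sets rather than deduplicating one.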