Update data_deduplication.rst #154

Merged
merged 2 commits into from Apr 7, 2022

5 changes: 2 additions & 3 deletions docs/guides/data_deduplication.rst
@@ -56,8 +56,7 @@ input argument.

With the method ``index``, all possible (and unique) record pairs are
made. The method returns a ``pandas.MultiIndex``. The number of pairs is
-equal to the number of records in ``dfA`` times the number of records in
-``dfB``.
+equal to the number of records in ``dfA`` choose ``2``.
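
The corrected count can be checked with a small standalone sketch. It uses only the Python standard library rather than the ``recordlinkage`` API, and the record identifiers are hypothetical stand-ins for the index of ``dfA``. For deduplication, each unordered pair of distinct records is counted once, which gives n choose 2 = n(n - 1)/2 pairs.

```python
from itertools import combinations
from math import comb

# Hypothetical record identifiers standing in for the index of dfA.
record_ids = ["rec-0", "rec-1", "rec-2", "rec-3", "rec-4"]

# Every unordered pair of distinct records, each listed exactly once.
pairs = list(combinations(record_ids, 2))

n = len(record_ids)
n_pairs = len(pairs)

# The enumerated count agrees with the closed form n choose 2.
print(n_pairs, comb(n, 2), n * (n - 1) // 2)
```

With a single table there is no second frame to multiply by, which is presumably why the earlier wording (records in ``dfA`` times records in ``dfB``) was replaced.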

.. ipython::

@@ -83,7 +82,7 @@ method can be used in the ``recordlinkage`` module.
...: len(candidate_links)

The argument "given\_name" is the blocking variable. This variable has
-to be the name of a column in ``dfA`` and ``dfB``. It is possible to
+to be the name of a column in ``dfA``. It is possible to
pass a list of column names to block on multiple variables. Blocking
on multiple variables will reduce the number of record pairs even
further.
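
The effect of blocking on one versus several columns can be illustrated with a minimal sketch. This is plain Python, not the ``recordlinkage`` implementation: the helper ``blocked_pairs``, the sample records, and the ``surname`` column are all hypothetical, chosen only to show that adding a blocking variable can only shrink the set of candidate pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (record id, given_name, surname).
records = [
    ("rec-0", "anna", "smith"),
    ("rec-1", "anna", "smith"),
    ("rec-2", "anna", "jones"),
    ("rec-3", "ben", "jones"),
    ("rec-4", "ben", "jones"),
]


def blocked_pairs(records, key_positions):
    """Pair only the records that agree on every blocking column."""
    blocks = defaultdict(list)
    for record in records:
        # The blocking key is the tuple of values in the chosen columns.
        key = tuple(record[i] for i in key_positions)
        blocks[key].append(record[0])
    pairs = []
    for ids in blocks.values():
        # Within a block, form each unordered pair once.
        pairs.extend(combinations(ids, 2))
    return pairs


all_pairs = list(combinations([r[0] for r in records], 2))
on_given_name = blocked_pairs(records, key_positions=[1])
on_both = blocked_pairs(records, key_positions=[1, 2])

print(len(all_pairs), len(on_given_name), len(on_both))
```

In this toy data the candidate set shrinks from all pairs, to pairs sharing a given name, to pairs sharing both given name and surname. The trade-off is that true duplicates with a typo in any blocking column are no longer compared.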