Name Duplication Problem

Variations in how a name is written make it difficult to identify a unique person, so deduplicating records remains a challenge. The problem becomes more complicated when data comes from multiple sources. For example, the following variations all refer to the same person, Vladimir Frometa:

Vladimir Antonio Frometa Garo
Vladimir A Frometa Garo
Vladimir Frometa
Vladimir Frometa G
Vladimir A Frometa
Vladimir A Frometa G
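To make the matching problem concrete, here is a minimal sketch of one way to test whether a shorter name is a variant of a longer reference name. It is not taken from analytics.py: the helper names `tokens`, `token_matches`, and `is_variant` are hypothetical, and treating a one-letter token as an initial is an assumption made for illustration.

```python
def tokens(name):
    """Lowercase a name, drop periods, and split it into tokens."""
    return name.lower().replace(".", " ").split()


def token_matches(short, full):
    """True if the tokens are equal, or `short` is a one-letter initial of `full`."""
    return short == full or (len(short) == 1 and full.startswith(short))


def is_variant(candidate, reference):
    """True if every candidate token matches a reference token, in order."""
    cand = tokens(candidate)
    ref = iter(tokens(reference))
    # `any(... for r in ref)` consumes the iterator, so token order is preserved.
    return bool(cand) and all(any(token_matches(t, r) for r in ref) for t in cand)


reference = "Vladimir Antonio Frometa Garo"
variants = [
    "Vladimir Antonio Frometa Garo",
    "Vladimir A Frometa Garo",
    "Vladimir Frometa",
    "Vladimir Frometa G",
    "Vladimir A Frometa",
    "Vladimir A Frometa G",
]

for v in variants:
    print(v, "->", is_variant(v, reference))
```

Each of the six variants is accepted against the full name. The check is one-directional: it compares a variant with the longest known form, so two short forms such as "Vladimir Frometa G" and "Vladimir A Frometa" are not matched against each other directly. It only illustrates the problem and is not the repository's model.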

The model in this repository is trained to reduce duplicate records when names appear in varying formats.

The model uses decision trees and classification to filter out unique users from the records by their first names, and then refines the decision using other features. The list of unique records is saved to a separate file called Records (records.csv).
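The two-stage idea described above can be sketched as follows. This is an assumption-laden illustration, not the code in analytics.py: the hand-written `same_person` rule stands in for a trained decision tree, the `email` field is a hypothetical "other feature", and the sample records are made up.

```python
from collections import defaultdict


def first_name(record):
    """Stage 1 key: the lowercased first token of the name."""
    return record["name"].split()[0].lower()


def same_person(a, b):
    """Stage 2: a hand-written stand-in for a learned decision tree (assumption)."""
    # Node 1: a shared, non-empty email (hypothetical feature) is decisive.
    if a["email"] and a["email"] == b["email"]:
        return True
    # Node 2: otherwise, one name's tokens must all appear in the other's.
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    return ta <= tb or tb <= ta


def dedupe(records):
    """Keep the first record seen for each person, in input order."""
    buckets = defaultdict(list)  # first name -> unique records kept so far
    unique = []
    for rec in records:
        bucket = buckets[first_name(rec)]
        if not any(same_person(rec, kept) for kept in bucket):
            bucket.append(rec)
            unique.append(rec)
    return unique


# Made-up sample data for illustration only.
records = [
    {"name": "Vladimir Antonio Frometa Garo", "email": "vf@example.com"},
    {"name": "Vladimir A Frometa", "email": "vf@example.com"},
    {"name": "Vladimir Frometa", "email": ""},
    {"name": "Vladimir Petrov", "email": "vp@example.com"},
    {"name": "Maria Lopez", "email": "ml@example.com"},
]

unique_records = dedupe(records)
for rec in unique_records:
    print(rec["name"])
```

Bucketing by first name keeps the pairwise comparisons small, since only records that share a first name are compared. The result depends on input order because the first record seen for a person is the one kept. Writing `unique_records` to a CSV file could be done with the standard `csv.DictWriter` or with pandas.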

Run-time: Python 3

Dependencies: pandas

PriyadharshanSaba/NameDuplication
