YungChunLu/Entity-Resolution-On-Text-Similarity

Entity-Resolution-On-Text-Similarity

This project identifies commercial products listed on Google that refer to the same entity as products listed on Amazon, by comparing the text similarity of their records. This problem is called Entity Resolution.

Project Highlights

  • Applied powerful and scalable text analysis techniques.
  • Performed entity resolution across two datasets of commercial products.
  • Discussed when to use a Broadcast Variable.
  • Implemented a scalable ER algorithm.

General Description

Entity Resolution (ER) refers to the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, databases). ER is necessary when joining datasets based on entities that may or may not share a common identifier (e.g., database key, URI, National identification number), as may be the case due to differences in record shape, storage location, and/or curator style or preference. A dataset that has undergone ER may be referred to as being cross-linked.
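
To make the idea concrete, here is a minimal sketch of similarity-based ER across two small product lists that share no common identifier. It is an illustration under stated assumptions, not the repository's implementation: the TF-IDF weighting, the cosine similarity measure, the function names, and the sample records are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(docs):
    """Weight each token by N / (number of documents containing it).

    This unlogged ratio is one simple choice (an assumption of this sketch).
    """
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: n / count for tok, count in df.items()}


def tfidf(tokens, idf):
    """Build a sparse TF-IDF vector (dict of token -> weight)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: (c / total) * idf.get(tok, 0.0) for tok, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b[tok] for tok, w in a.items() if tok in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(left, right):
    """For each record in `left`, find the most similar record in `right`.

    Returns a list of (left_id, right_id, score) tuples. Every pair is
    compared, so the cost is quadratic in the number of records.
    """
    left_tok = {k: tokenize(v) for k, v in left.items()}
    right_tok = {k: tokenize(v) for k, v in right.items()}
    idf = idf_weights(list(left_tok.values()) + list(right_tok.values()))
    left_vec = {k: tfidf(t, idf) for k, t in left_tok.items()}
    right_vec = {k: tfidf(t, idf) for k, t in right_tok.items()}

    results = []
    for lid, lvec in left_vec.items():
        rid, score = max(
            ((rid, cosine(lvec, rvec)) for rid, rvec in right_vec.items()),
            key=lambda pair: pair[1],
        )
        results.append((lid, rid, score))
    return results


# Made-up sample records (not taken from the project's datasets).
google = {
    "g1": "Adobe Photoshop Elements 6 for Windows",
    "g2": "Intuit QuickBooks Pro 2007 accounting software",
}
amazon = {
    "a1": "QuickBooks Pro 2007 small business accounting",
    "a2": "Adobe Photoshop Elements 6.0 Win photo editing",
}

for left_id, right_id, score in best_matches(google, amazon):
    print(left_id, "->", right_id, round(score, 3))
```

With these sample records, `g1` pairs with `a2` and `g2` pairs with `a1`, because each pair shares several tokens and has none in common with the other candidate. In practice a score threshold would be needed to decide whether the best candidate is a true match, and comparing every pair does not scale to large datasets.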
