Multiple scripts to aid in the data collection of stool images - includes Image duplication algorithm to remove similar images.

defidecap/image_analysis

Duplicate Image Detection

Author: Nick DeCapite

Algorithm

As part of my work with the Duke Smart Toilet, my role is to perform machine learning analysis on stool images against different detection criteria. The first step of this process is to obtain the data for that analysis. I created a web crawler that goes through websites and extracts downloadable stool image URLs into a CSV file. To ensure there are no duplicate images, I wrote a detection algorithm that computes a hash from the pixels of each image. It then compares the hashes of each pair of images and calculates a similarity statistic. If the statistic is under a threshold of 0.05, the images are considered duplicates and are deleted from the local folder where they are stored.
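
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the README does not name the hash or the statistic, so the sketch assumes an average hash (one bit per pixel, set when the pixel is brighter than the image mean) and the fraction of differing bits as the statistic. The function names `average_hash`, `hash_distance`, and `find_duplicates` are hypothetical, and images are represented as small grayscale grids (lists of rows) so the example runs without an imaging library. It also assumes the first image of each duplicate pair is kept.

```python
from itertools import combinations


def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean.

    `pixels` is a grayscale grid (list of rows); this hash is an assumption,
    since the README does not say which hash is used.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hash_distance(hash_a, hash_b):
    """Return the fraction of differing bits, from 0.0 (same) to 1.0."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    differing = sum(a != b for a, b in zip(hash_a, hash_b))
    return differing / len(hash_a)


def find_duplicates(images, threshold=0.05):
    """Return the names of images to delete.

    `images` maps a name to its grayscale grid. For each pair whose
    statistic is under `threshold`, the later image is marked and the
    earlier one is kept (an assumption made for this sketch).
    """
    hashes = {name: average_hash(pixels) for name, pixels in images.items()}
    to_remove = set()
    for name_a, name_b in combinations(hashes, 2):
        if name_a in to_remove or name_b in to_remove:
            continue  # already marked; do not compare against it again
        if hash_distance(hashes[name_a], hashes[name_b]) < threshold:
            to_remove.add(name_b)
    return to_remove


# Three tiny 8x8 "images": a gradient, a slightly brighter copy, and a negative.
gradient = [[row * 8 + col for col in range(8)] for row in range(8)]
brighter = [[value + 2 for value in row] for row in gradient]
negative = [[63 - value for value in row] for row in gradient]

duplicates = find_duplicates(
    {"a.png": gradient, "b.png": brighter, "c.png": negative}
)
```

In practice each image would first be loaded, converted to grayscale, and resized to a common small grid so that every hash has the same length. Deleting the files named in `duplicates` is left out of the sketch.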
