Instagram Image Reverse Search Engine

This is an experimental reverse image search engine for Instagram (IG).

Reasons for creating this Search Engine

  1. Instagram images are not searchable via Google Images Reverse Search
  2. There is no similar IG-targeted search engine on the market

Demo link for the web version of the Search Engine

Here is a demo website of this search engine: https://igsearch.yourappapp.com. In this demo, records of 28,000 photos from 30 IG users are available to search. The source code is also available in this repo (in the website/ directory).

You can download demo.jpg from the tools/ folder to play with the search engine.

The Mechanism

The mechanism is simple, though not very effective at the moment.

  1. Download some IG users' images via rarcega's Instagram Scraper
  2. Image Hashing / Perceptual Hashing: use the Python library ImageHash's difference hash (dhash()) function to hash every image and save the hash to a MySQL database.
  3. When a user uploads an image to perform a reverse search, the system compares its hash against the hashes in the database and finds the answer(s); more than one answer may be returned. Only the IG user name(s) are returned as the answer (see the sketch after this list).
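
As a minimal sketch of the hashing idea (not this repo's actual scripts; the file paths below are placeholders), ImageHash turns each photo into a 64-bit hash, and subtracting two hashes gives a Hamming distance that is small for visually similar images:

from PIL import Image
import imagehash

# Compute the difference hash (dhash) of two images
h1 = imagehash.dhash(Image.open("images/abc123/photo1.jpg"))  # indexed photo
h2 = imagehash.dhash(Image.open("demo.jpg"))                  # query photo

# Hashes can be stored as hex strings in MySQL and restored later
stored = str(h1)
restored = imagehash.hex_to_hash(stored)

# Subtracting two hashes gives the Hamming distance;
# 0 means (near-)identical, small values mean visually similar
print(h2 - restored)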

Requirements and Installation

This project uses Python and a MySQL database. The following libraries are used:

  • ImageHash 4.0
  • Pillow 5.3
  • imutils 0.5.2
  • OpenCV 3.4.5.20
  • MySQL server
  • mysqlclient (Python MySQL client library)

Note: We are using Ubuntu Server 16.04 LTS and assume Python 2.7 and a MySQL server are already installed. Before installing the Python libraries, the following command has to be executed in Ubuntu first (to install the required system libraries):

sudo apt-get install libmysqlclient-dev libsm6 libxrender1 libxext6

Then, you can install the Python libraries via pip:

pip install imagehash pillow imutils opencv-python mysqlclient

Command Usage

Preparation Phase - Images

  1. Download the images from IG users
  2. Save them to the images/ folder, using the IG user name as the subfolder name. Example: if the IG user name is abc123, the photos should be stored in images/abc123/.

Preparation Phase - Database

  1. Create a schema
  2. Run the database initialization script db_init.sql in MySQL
  3. Update the database connection settings in config.py (see the sketch after this list)
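
For orientation only: the values in config.py are the usual MySQL connection parameters, and the scripts talk to the database through the mysqlclient package. The variable names and credentials below are placeholders, not necessarily what this repo's config.py defines:

import MySQLdb  # provided by the mysqlclient package

# Placeholder settings; the real values belong in config.py
DB_HOST = "localhost"
DB_USER = "igsearch"
DB_PASSWORD = "change-me"
DB_NAME = "igsearch"

conn = MySQLdb.connect(host=DB_HOST, user=DB_USER, passwd=DB_PASSWORD, db=DB_NAME)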

Indexing Phase

  1. Run the indexing script: python index.py --dataset images
  2. The script will run for a few minutes, computing the hashes and storing them in the MySQL database (see the sketch after this list).
  3. (Optional) Remove the contents of the images/ folder to save space (only if you do not plan to show the images to your users)
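
Conceptually, the indexing step walks images/<ig_user>/, computes a dhash for every photo, and stores one row per photo in MySQL. The snippet below is an illustrative sketch, not the actual index.py; the photos table, its columns, and the connection credentials are assumptions:

import os
import MySQLdb
import imagehash
from PIL import Image

# Placeholder credentials; in this repo they come from config.py
conn = MySQLdb.connect(host="localhost", user="igsearch", passwd="change-me", db="igsearch")
cur = conn.cursor()

for ig_user in os.listdir("images"):
    user_dir = os.path.join("images", ig_user)
    if not os.path.isdir(user_dir):
        continue
    for filename in os.listdir(user_dir):
        h = imagehash.dhash(Image.open(os.path.join(user_dir, filename)))
        # The 'photos' table and its columns are assumed for illustration
        cur.execute(
            "INSERT INTO photos (ig_user, filename, dhash) VALUES (%s, %s, %s)",
            (ig_user, filename, str(h)),
        )

conn.commit()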

Usage Phase

  1. To search the images, run: python search.py --query path/to/image_to_search.jpg
  2. The matching IG user name(s) will be returned (see the sketch below).
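
Under the hood, the search amounts to hashing the query image and comparing it against every stored hash by Hamming distance. Again, this is an illustrative sketch rather than the actual search.py; the photos table, credentials, and distance threshold are assumptions:

import MySQLdb
import imagehash
from PIL import Image

query_hash = imagehash.dhash(Image.open("path/to/image_to_search.jpg"))

# Placeholder credentials; in this repo they come from config.py
conn = MySQLdb.connect(host="localhost", user="igsearch", passwd="change-me", db="igsearch")
cur = conn.cursor()
cur.execute("SELECT ig_user, dhash FROM photos")

# Small Hamming distances mean visually similar photos; the
# threshold of 5 is just an example value to tune
matches = set()
for ig_user, stored_hash in cur.fetchall():
    if query_hash - imagehash.hex_to_hash(stored_hash) <= 5:
        matches.add(ig_user)

print(matches)  # the IG user name(s), as in the Mechanism section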

Problems

  1. The major problem is that the Instagram Scraper Python script always returns 403 Forbidden after a few minutes of scraping, due to Instagram's increased security measures; Instagram blocks suspicious connections if the scraping is too fast.
  • To deal with this problem, we temporarily use a Chrome extension to fetch IG users' images. This approach has to be done manually, which is inefficient.
  2. The next problem is storage space: we fetched around 30 IG users' images (around 28,000 photos), which consumed 8 GB. Fun fact: an Instagram image can be as small as 19 KB and as large as 2.2 MB.
  • To deal with this problem, we archived the downloaded images (JPEG files archive efficiently) and took them offline, since we only need to return the IG user names. If an image later needs to be shown to the user, it will be extracted from the archive and displayed on the search result page.
  3. Updating IG images: the Instagram Scraper can download IG users' images incrementally based on the last update time, but due to the 403 Forbidden issue, this plan no longer works.
  • We are still looking for a way to tackle this problem. For a prototype, however, the current state is already sufficient.

Future Updates

  • Website version of this prototype (done on 2 Jan 2019)
  • Cron job / Scheduled job for indexing / fetching Instagram
  • Regular cleanup of rogue images