This is a small one-day project that crawls Wikipedia for company logos, builds a database from them, and lets you query that database with a photo of a logo to find out which company it belongs to.
The project is built with Java 8 (jre1.8.0_25) and uses the jsoup library (jsoup-1.7.3) as well as the Lire library (Lire-0.9.4 beta_2).
To run the software, first run the crawler (Crawler.main). By default
(Config.LOAD_ALL = false) it crawls the Wikipedia page
"List of companies of the United States". Alternatively, you can pass a
Wikipedia page title as a program argument, e.g.
java Crawler "List of companies of the United States".
If you set Config.LOAD_ALL = true, it crawls all pages listed in
Config.COMPANY_LISTS.
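
The page selection described above can be sketched roughly as follows. This is
only an illustration, not the project's actual Crawler code: the class
CrawlTargets, its methods, and the second list entry are hypothetical, the
precedence (program argument first, then LOAD_ALL, then the default page) is an
assumption, and so is the use of the English Wikipedia URL scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the project's Crawler class.
final class CrawlTargets {
    // Default page named in this README.
    static final String DEFAULT_PAGE = "List of companies of the United States";

    // Assumed precedence: program argument, then loadAll, then the default page.
    static List<String> select(String[] args, boolean loadAll, List<String> companyLists) {
        if (args != null && args.length > 0) {
            return Arrays.asList(args[0]);
        }
        if (loadAll) {
            return new ArrayList<>(companyLists);
        }
        return Arrays.asList(DEFAULT_PAGE);
    }

    // Wikipedia article URLs use underscores in place of spaces.
    // Assumption: the English Wikipedia is the crawl target.
    static String toUrl(String pageTitle) {
        return "https://en.wikipedia.org/wiki/" + pageTitle.trim().replace(' ', '_');
    }

    public static void main(String[] args) {
        // Stand-in for Config.COMPANY_LISTS; the real contents are not shown here.
        List<String> lists = Arrays.asList(
                "List of companies of the United States",
                "List of companies of Germany");
        for (String page : select(args, false, lists)) {
            System.out.println(toUrl(page));
        }
    }
}
```

Titles containing characters other than letters, digits and spaces would
additionally need percent-encoding, which this sketch leaves out.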
Note that for the first run you should set Config.RENEW_INDEX = true!
Please refrain from crawling all the pages again and again.
To query the database, simply run the identifier (Identifier.main).
It opens a FileChooser, tries to load the selected file as an image,
and searches the database for similar images.
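
A minimal sketch of the "try to load the selected file as an image" step,
using only the Java standard library. The class QueryImageLoader and the
method tryLoad are hypothetical names, and the use of Swing's JFileChooser is
an assumption; the real Identifier may be structured differently. The
similarity search itself is not shown.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.swing.JFileChooser;

// Hypothetical helper for illustration; not the project's Identifier class.
final class QueryImageLoader {
    // Returns the decoded image, or null if the file cannot be read as an image.
    static BufferedImage tryLoad(File file) {
        try {
            // ImageIO.read returns null when no registered reader understands the file.
            return ImageIO.read(file);
        } catch (IOException e) {
            // Missing or unreadable files end up here.
            return null;
        }
    }

    public static void main(String[] args) {
        JFileChooser chooser = new JFileChooser();
        if (chooser.showOpenDialog(null) != JFileChooser.APPROVE_OPTION) {
            return; // the user cancelled the dialog
        }
        BufferedImage image = tryLoad(chooser.getSelectedFile());
        if (image == null) {
            System.err.println("The selected file could not be loaded as an image.");
            return;
        }
        // The search for similar images in the database would start here.
        System.out.println("Loaded image: " + image.getWidth() + "x" + image.getHeight());
    }
}
```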
The program was built during the Hackathon of data://Münster.