fczuardi / abrcrawl
- Source
- Commits
- Network (1)
- Issues (0)
- Downloads (0)
- Wiki (1)
- Graphs
-
Branch:
master
tree aa98d957c39cb74cde022c448c9ef47ea8405393
parent 06aae67c87217757cd0a4712059ec50df28201c2
| name | age | message | |
|---|---|---|---|
| |
.gitignore | ||
| |
README.textile | Fri Dec 11 14:10:22 -0800 2009 | |
| |
abrcrawl.py | Sat Dec 12 10:17:26 -0800 2009 | |
| |
add_images_info.py | Sat Dec 12 10:16:37 -0800 2009 | |
| |
data/ |
ABrCrawl
ABrCrawl is a simple command line tool for crawling the web pages of Agência Brasil’s free Images Archive and extract metadata to a structured table (csv or json file).
Requirements
- Python >= 2.5
- simplejson
- jpeglib
- PIL
Mac
Installing jpeglib and PIL on Mac OSX 10.6 (Snow Leopard)
From http://proteus-tech.com/blog/cwt/install-pil-in-snow-leopard/
“Next, I download libjpeg latest version (http://www.ijg.org/files/jpegsrc.v7.tar.gz), untar it, and configure it:
$ tar zxvf jpegsrc.v7.tar.gz
$ cd jpeg-7
$ ./configure —enable-shared —enable-static
$ make
$ sudo make install
Now, you just installed libjpeg into /usr/local/lib. So, it’s time to install PIL. Download the PIL source code from http://effbot.org/downloads/Imaging-1.1.6.tar.gz then untar it:
$ tar zxvf Imaging-1.1.6.tar.gz
$ cd Imaging-1.1.6
This is the trick, you must open setup.py with your prefered text editor then looking for the line JPEG_ROOT = None and then it to JPEG_ROOT = libinclude(“/usr/local”) then save the file and continue:
$ python setup.py build
If everything fine, you can install PIL to your system library. If the PIL need something that didn’t exist, it will tell you on the error message. You may install the missing library via fink or download and compile it by yourself. Then install the PIL as root:
$ sudo python setup.py install —optimize=1
Done. :)”
Download
The latest version is available at the ABrCrawl Git repository if you have git installed, checkout the repository on your machine:
git clone git://github.com/fczuardi/abrcrawl.git
Or if you prefer, just download the latest version zip file
Usage
The main script is the abrcrawl.py, you can call it using the help argument to get a list of available options:
python abrcrawl.py --help
Contribute
ABrCrawl is a free and open source software, if you find a bug or have any suggestions and patches to send and make it better, please use the ABrCrawl Github page either to file an issue or to send a pull request. Alternatively, you can contact me directly.
Software License (BSD)
Copyright © 2009, Fabricio Zuardi
All rights reserved.Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of the author nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
“AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Content License (CC-Attribution-2.5)
The images and their descriptions produced by Agência Brasil are released under a Creative Commons Atribuição 2.5. Brasil license, here is the quote (in Portuguese) from the website:
COBERTURA GRATUITA
Diariamente, a equipe de repórteres fotográficos da Agência Brasil produz uma média de 100 imagens, diretamente de Brasília, e as distribui gratuitamente para todo o país. Todo esse conteúdo pode ser adquirido em várias definições, inclusive em alta resolução, e ser utilizado livremente, mediante citação do crédito.
Note
Although the majority of images of the image bank are produced by Agência Brasil some photos sometimes comes from a different source, so make sure you check the rights owner of an individual photo before using it on your projects, one way of identifying if the photo came from Agência Brasil is to look for an “/Abr” attached in the end of the photographer’s name.

