This repository has been archived by the owner on Aug 5, 2020. It is now read-only.

hudora/huImages


Functionality

huImages provides an infrastructure for storing, serving and scaling images. It might not work for Facebook- or Flickr-scale image pools, but for a few hundred thousand images it works very nicely. Currently it only supports JPEG files.

When you upload an image, you get back an alphanumeric ID for accessing the image later. You can get the URL of the image via imageurl(ID) and scaled_imageurl(ID). You can get a complete XHTML <img> tag via scaled_tag(ID).

This module uses the concept of "sizes". A size can be a numeric specification like "240x160". If the numeric specification ends with "!" (as in "75x75!"), the image is scaled and cropped to be EXACTLY that size. Otherwise the image keeps its aspect ratio.
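
As a rough illustration of the two kinds of numeric size, the following Python sketch parses a specification and computes the resulting dimensions. It is not taken from the huImages source: parse_size and target_dimensions are hypothetical names, and the rounding as well as the treatment of images smaller than the requested box are assumptions.

```python
import re


def parse_size(size):
    """Split '240x160' or '75x75!' into (width, height, crop)."""
    match = re.match(r'^(\d+)x(\d+)(!?)$', size)
    if match is None:
        # Only numeric specifications are handled in this sketch.
        raise ValueError('not a numeric size specification: %r' % size)
    width, height, bang = match.groups()
    return int(width), int(height), bang == '!'


def target_dimensions(original, size):
    """Return the (width, height) an image of `original` size would end up with."""
    orig_w, orig_h = original
    width, height, crop = parse_size(size)
    if crop:
        # "!" means scale and crop to exactly the requested box.
        return (width, height)
    # Without "!" the aspect ratio is kept, so the tighter side decides.
    factor = min(float(width) / orig_w, float(height) / orig_h)
    return (max(1, int(round(orig_w * factor))),
            max(1, int(round(orig_h * factor))))
```

Under these assumptions a square image requested as "320x240" comes out as 240 by 240 pixels, while "75x75!" always yields 75 by 75.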

You can use get_random_imageid(), get_next_imageid(ID) and get_previous_imageid(ID) to implement image browsing.
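
A minimal sketch of how these functions might be combined for a browsing page follows. The helper browsing_context is hypothetical, api stands for the imported huimages module, and the sketch assumes that a missing neighbour is reported as a false value such as None, which this README does not confirm.

```python
def browsing_context(api, imageid, size='150x150!'):
    """Return thumbnail URLs for an image and its neighbours.

    `api` stands for the imported huimages module (or a test double).
    """
    def thumb(other_id):
        # Assumption: a false value means "no such neighbour".
        if not other_id:
            return None
        return api.scaled_imageurl(other_id, size=size)

    return {
        'current': thumb(imageid),
        'previous': thumb(api.get_previous_imageid(imageid)),
        'next': thumb(api.get_next_imageid(imageid)),
    }
```

A start page could pick its first image with get_random_imageid() and pass the result to this helper.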

The image Server

server.py implements the image serving infrastructure. huImages assumes that images are served from a separate server. We strongly suggest serving them from a separate domain, one that has never been used for cookies, because the existence of cookies usually hurts caching behaviour badly. "yourdomain-img.net" would be a good choice for a domain name. We use "i.hdimg.net." for that purpose.

In the first few versions of huImages, metadata and images were stored in CouchDB. After the first few dozen gigabytes it turned out that the huge database files were something of a headache, so we moved to storing the original image data in Amazon S3. The server is still able to handle content stored in CouchDB and migrates it automatically to S3 when the need arises.

server.py works with any FastCGI-compliant web server. It needs the Flup toolkit installed to interface with a FastCGI-enabled server. We use lighttpd to connect to it, and server.py contains configuration instructions for lighttpd. Of course you can also use other HTTP servers instead.

server.py assumes that you have a filesystem which is able to handle very large cache directories with no substantial performance penalty. We have been running the system on UFS2/dirhash and XFS systems with success, but it should also work well on modern ext2/3 implementations with directory indexing.

When an image is requested and the original image is not in the cache, the original is pulled from CouchDB/S3 and put into the filesystem cache. Then the Python Imaging Library (PIL) is used to generate the scaled version of the image. The result is cached again in the filesystem and sent to the client.

If the image is requested again, it is served directly from the filesystem by lighttpd without ever hitting the Python based server.py.
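
The request flow described above can be sketched roughly as follows. This is not the code from server.py: cached_scaled_image is a hypothetical name, the cache layout (one directory per size, "o" for originals) is an assumption loosely modelled on the URL scheme, and fetching and scaling are passed in as callables so that the sketch needs neither CouchDB, S3 nor PIL.

```python
import os


def _write(path, data):
    """Write data to path via a temporary file so readers never see half a file."""
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    tmp = path + '.tmp'
    with open(tmp, 'wb') as handle:
        handle.write(data)
    os.rename(tmp, path)


def cached_scaled_image(cache_dir, size, imageid, fetch_original, scale):
    """Return the scaled image bytes, filling the filesystem cache on a miss."""
    scaled_path = os.path.join(cache_dir, size, imageid + '.jpeg')
    if os.path.exists(scaled_path):
        # Cache hit. In the real setup the web server answers this case itself.
        with open(scaled_path, 'rb') as handle:
            return handle.read()
    original_path = os.path.join(cache_dir, 'o', imageid + '.jpeg')
    if not os.path.exists(original_path):
        # Original not cached yet: pull it from the backing store.
        _write(original_path, fetch_original(imageid))
    with open(original_path, 'rb') as handle:
        original = handle.read()
    scaled = scale(original, size)
    _write(scaled_path, scaled)
    return scaled
```

Writing through a temporary file and renaming it is a design choice of this sketch; it keeps a concurrently serving web server from delivering a partially written file.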

If you are short on disk space, you can expire files from the cache directory by simply removing the oldest files until you have enough space again.
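
One possible way to automate this is sketched below. The function expire_cache is hypothetical and not part of huImages. It treats "oldest" as the oldest modification time and works against a size budget for the cache directory rather than against free disk space; both are assumptions.

```python
import os


def expire_cache(cache_dir, max_bytes):
    """Delete the oldest files until the cache uses at most max_bytes.

    Returns the list of removed paths, oldest first.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            entries.append((stat.st_mtime, stat.st_size, path))
    total = sum(size for _mtime, size, _path in entries)
    removed = []
    # Sorting the tuples orders the files by modification time, oldest first.
    for _mtime, size, path in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Because removed files are simply regenerated on the next request, a job like this could run periodically without coordinating with the server.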

Server Installation

We will show installation on an Ubuntu 9.10 based Amazon EC2 instance. huImages should work on every POSIX system but requires a recent CouchDB version. We assume you have an EC2 environment up and running and that your EC2 SSH key is named "ssh-ec2" and located in the current directory.

INSTANCE=`ec2-run-instances ami-a62a01d2 --key ssh-ec2 --region eu-west-1 | cut -f2 | tail -n1`
sleep 60
IP=`ec2-describe-instances $INSTANCE | cut -f 17 | tail -n1`
ssh -i ssh-ec2 ubuntu@$IP

You should now be logged into the new Amazon instance.

sudo apt-get update -y
sudo apt-get install -y couchdb lighttpd git-core python-pip python-boto python-imaging python-couchdb python-flup

sudo git clone https://github.com/hudora/huImages.git /usr/local/huImages
cd /usr/local/huImages
sudo mkdir /mnt/huimages-cache
sudo ln -s /mnt/huimages-cache /usr/local/huImages/cache
sudo cp examples/lighttpd.conf /etc/lighttpd/lighttpd.conf
sudo vi /etc/lighttpd/lighttpd.conf

Change %%AWS_ACCESS_KEY_ID%%, %%AWS_SECRET_ACCESS_KEY%% and %%S3BUCKET%% to the appropriate values.

sudo /etc/init.d/lighttpd restart
sudo chown www-data:www-data /mnt/huimages-cache /usr/local/huImages/cache
curl -X PUT http://127.0.0.1:5984/huimages
curl -X PUT http://127.0.0.1:5984/huimages_meta

Now you can start putting images into the database.

Client usage

If the client does not run on the same server, you must find a way to make CouchDB accessible to the client. Running a CouchDB cluster on Amazon EC2 might be a good starting point. Another (easier) approach is simply running the client on the same machine as the server. Under extreme circumstances image serving can happen without access to CouchDB, but you lose some of the features.

Now ensure the required environment variables are set. Here are some sample values:

AWS_ACCESS_KEY_ID=AAOWSMAKNATAM5
AWS_SECRET_ACCESS_KEY=aHo789V1H1Kzrs3yIaj7Uvxtskz6fUvgpa6n
IMAGESERVERURL=http://i.hdimg.net/
HUIMAGESCOUCHSERVER=http://admin:7o8V3yIjaU7xtv@127.0.0.1:5984/
HUIMAGES3BUCKET=originals.i.hdimg.net
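
Before importing the client it can help to check that these variables are actually present. The following sketch is not part of huImages: missing_settings is a hypothetical helper, and treating all five names as mandatory is an assumption based on the sample values above.

```python
import os

# Variable names taken from the sample values above.
REQUIRED_SETTINGS = (
    'AWS_ACCESS_KEY_ID',
    'AWS_SECRET_ACCESS_KEY',
    'IMAGESERVERURL',
    'HUIMAGESCOUCHSERVER',
    'HUIMAGES3BUCKET',
)


def missing_settings(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_SETTINGS if not environ.get(name)]
```

Calling missing_settings() at start-up and aborting when the list is not empty gives a clearer error than a failure deep inside a CouchDB or S3 call.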

Now you should be able to use it like this:

>>> import huimages
>>> imagedata=open('./test.jpeg').read()
>>> huimages.save_image(imagedata, filename='test.jpeg')
'23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01'

>>> huimages.imageurl('23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01')
'http://i.hdimg.net/o/23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01/test.jpeg'

>>> huimages.scaled_imageurl('23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01', size="150x150!")
'http://i.hdimg.net/150x150!/23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01/test.jpeg'

>>> huimages.get_length('23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01')
87761

>>> huimages.scaled_dimensions('23EQ53G6WZTGF5675CUJQFKBIS6UWWOL01', '320x240')
(240, 240)
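
The URLs in the session above appear to follow a simple pattern: the server URL, then "o" for the original or the size specification for a scaled version, then the image ID and the file name. The sketch below rebuilds that pattern. It is inferred from the sample output only; original_url and scaled_url are hypothetical names, not functions of the huimages module.

```python
def original_url(server_url, imageid, filename):
    """URL of the unscaled original, as suggested by the imageurl() example."""
    return '%s/o/%s/%s' % (server_url.rstrip('/'), imageid, filename)


def scaled_url(server_url, imageid, size, filename):
    """URL of a scaled version, as suggested by the scaled_imageurl() example."""
    return '%s/%s/%s/%s' % (server_url.rstrip('/'), size, imageid, filename)
```

With IMAGESERVERURL set to http://i.hdimg.net/ these helpers reproduce the two URLs shown in the session.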

Call pydoc huimages for further documentation. Most useful is scaled_tag(), which can create an image tag including dimensions for faster rendering and tries hard to generate a meaningful file name and alt text to make the image easier for search engines to find. You can use the environment variable HUIMAGESALTADDITION to add extra text to all alt attributes.
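
To give an idea of what such a tag involves, here is a small sketch that assembles an XHTML <img> element with explicit dimensions and an alt text. It does not reproduce the real output of scaled_tag(): img_tag is a hypothetical helper, and the attribute order as well as the way the extra alt text is appended are assumptions.

```python
from xml.sax.saxutils import quoteattr


def img_tag(url, dimensions, alt, alt_addition=''):
    """Build an XHTML <img> tag with width, height and alt attributes."""
    width, height = dimensions
    # The extra text plays the role of HUIMAGESALTADDITION in this sketch.
    alt_text = (alt + ' ' + alt_addition).strip()
    # quoteattr() escapes the value and adds the surrounding quotes.
    return '<img src=%s width="%d" height="%d" alt=%s />' % (
        quoteattr(url), width, height, quoteattr(alt_text))
```

The URL and dimensions could come from scaled_imageurl() and scaled_dimensions(), and the extra text from os.environ.get('HUIMAGESALTADDITION', '').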

Security

Malicious users who know the ID of an image can consume large amounts of CPU time, bandwidth and disk space, for example by requesting it in many different sizes, each of which is scaled and cached.

Users who know the ID of an image can pass it on to unauthorized users.

Nobody should be able to see images on the server unless they know the ID or have access to CouchDB or the S3 bucket. Be sure that your S3 bucket does not provide public read access!

The imagebrowser

This distribution includes huimages.imagebrowser, a Django application using huImages to produce a (very basic) Flickr-like experience. It allows uploading and tagging of images and browsing by tag. Upload comes with a multi-file upload implemented with SWFUpload.

Further Reading