This repository has been archived by the owner on Feb 14, 2018. It is now read-only.

Storing images && Content Addressable Storage #236

Open
harlantwood opened this issue May 30, 2012 · 6 comments

Comments

@harlantwood

Nicholas and I were chatting a bit about image storage in SFW today. Here's an initial stab at an image storage system:

In the filesystem (or other "stores" such as CouchDB), we add an images dir as a sibling of the pages dir. Note that this could be within a farm or non-farm instance:

|____pages
| |____welcome-visitors
| |____smallest-federated-wiki
|____images
| |____bike-shed.png
| |____logo.jpg
| |____logo-5.jpg
| |____logo-58.jpg
|____status
| |____favicon.png
| |____open_id.identity
| |____local-identity

We try to name images the same as the original image file, or some other meaningful name. If there is a name conflict (as there was with the 3 logo.jpg originals we tried to upload in the example above), we append a dash and then random digits, adding digits until we get a unique file name.
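A minimal sketch of that naming rule, assuming the conflict check is a simple lookup against the names already in the images dir. The method name `unique_image_name` and its parameters are hypothetical, not part of SFW:

```ruby
# Hypothetical helper: return a file name that is not already in `taken`.
# `taken` is any collection that responds to include?, for example the
# result of Dir.children("images").
def unique_image_name(original, taken, rng: Random.new)
  ext  = File.extname(original)        # e.g. ".jpg"
  base = File.basename(original, ext)  # e.g. "logo"
  candidate = "#{base}#{ext}"
  suffix = ""
  while taken.include?(candidate)
    suffix += rng.rand(10).to_s        # add one more random digit per attempt
    candidate = "#{base}-#{suffix}#{ext}"
  end
  candidate
end
```

With `logo.jpg` already taken, this yields names shaped like `logo-5.jpg` or `logo-58.jpg`, as in the listing above.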

The client renders image tags:

<img src="/images/bike-shed.png" />

The servers recognize this path pattern and serve up the given image for the current instance (farm or otherwise).
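As a rough sketch of the path matching, assuming each instance (farm or otherwise) has its own data directory. `IMAGE_ROUTE` and `image_file_for` are made-up names for illustration, and the real servers may route this differently:

```ruby
# Hypothetical: map a request path such as "/images/bike-shed.png" to a file
# inside the current instance's images dir. Returns nil for any other path.
# The pattern allows no slashes and no leading dot, so "../" cannot escape.
IMAGE_ROUTE = %r{\A/images/([A-Za-z0-9][A-Za-z0-9._-]*)\z}

def image_file_for(request_path, instance_dir)
  match = IMAGE_ROUTE.match(request_path)
  return nil unless match
  File.join(instance_dir, "images", match[1])
end
```

A server would call this with the data dir of whichever instance the request belongs to, then send the file if it exists.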

@harlantwood
Author

I was originally thinking of storing the images all in one dir, with the image filename being the MD5 of the image data. This would make the images content addressable, such that if there are 100 forks of the same page with an image, we only store the image once.
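For illustration, the MD5 idea could look something like this. `store_image` is a hypothetical name, and keeping the original extension is my assumption, not something decided in this thread:

```ruby
require "digest/md5"
require "fileutils"

# Hypothetical: write image bytes under a name derived from their MD5 digest.
# Identical bytes always map to the same name, so they are stored only once.
def store_image(data, original_name, images_dir)
  ext  = File.extname(original_name).downcase
  name = Digest::MD5.hexdigest(data) + ext
  path = File.join(images_dir, name)
  FileUtils.mkdir_p(images_dir)
  File.binwrite(path, data) unless File.exist?(path)  # skip if already stored
  name
end
```

The returned name is what a page would reference in its img tag.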

My latest thought is that content addressability should be a separate layer -- e.g. I would love to see us add a GitStore (or GithubStore) alongside the current FileStore and CouchStore. If you choose the git backend, you get this content addressable deduplication for free.

@hallahan
Contributor

This looks really good. How would it work with git as a backend? Would we have a bare repository and then check things out upon request? It is a beautiful system, and exploring using it in this way sounds compelling. Does it perform well enough to be treated like a database?

@harlantwood
Author

We could have a git repo on the local file system. Even more compelling from my point of view (largely because it would work with cloud-based hosts like Heroku) is just using the Github API. So you would push to the github repo backing the given site, using their HTTP API, from your SFW server.

Then the cool part: when you want to access the images, you can just link to the "raw" version of the image on github, eg:

<img src="https://github.com/harlantwood/open_your_project/raw/master/doc/images/collections-of-pages-circle-pack-viz.png" />

Note that we could do the same for JSON, eg:

https://raw.github.com/WardCunningham/Smallest-Federated-Wiki/master/default-data/pages/welcome-visitors

So github could serve a lot of our dynamic content over its highly optimized pipeline. We might want to check their TOS, and even check in with them directly before doing this, to make sure they're cool with it. If so, it could be awesome.
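The two URL shapes used in the examples above could be built by a small helper. This is only a sketch: `github_raw_urls` is a made-up name, and the URL patterns are copied from the two links in this comment rather than from any GitHub documentation:

```ruby
# Hypothetical helper: build both "raw" URL forms for a file in a repo.
def github_raw_urls(owner, repo, branch, path)
  {
    via_github:   "https://github.com/#{owner}/#{repo}/raw/#{branch}/#{path}",
    via_raw_host: "https://raw.github.com/#{owner}/#{repo}/#{branch}/#{path}"
  }
end
```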

@hallahan
Contributor

I just assumed that git has whatever http server it uses on top of a file system where the git repos live. If that is the case, does git really provide any functionality with serving content? Do you have any info on how this actually works? Git may be a useful tool for deployment, and github may be a useful service to use to serve content, but I am wondering if there is anything special we actually get from git in this usage model.

@harlantwood
Author

Again, if there are 100 forks of the same page with an image, in 100 different farm instances, even though we will make 100 "copies" of the image in the SFW backend, the git repo (or any other content addressable storage layer) will only make one copy of the image.
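To make the deduplication concrete: git names each file's content by the SHA-1 of a short header plus the bytes, so identical images get the same object id no matter how many paths point at them. A small sketch that computes that id without git itself (`git_blob_id` and `dedup_store` are names I made up, and the Hash only stands in for a real object store):

```ruby
require "digest/sha1"

# Compute the id git assigns to a blob: SHA-1 over "blob <size>\0" + content.
def git_blob_id(content)
  data = content.b
  Digest::SHA1.hexdigest("blob #{data.bytesize}\0".b + data)
end

# Hypothetical in-memory store keyed by blob id, standing in for a git repo.
def dedup_store(blobs)
  blobs.each_with_object({}) { |bytes, store| store[git_blob_id(bytes)] ||= bytes }
end

# 100 "copies" of the same image bytes collapse to a single stored object.
copies = Array.new(100) { "fake image bytes" }
puts dedup_store(copies).size
```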

@harlantwood
Author

I have begun work on a GithubStore in another project --

https://github.com/harlantwood/software_zero/blob/3137bf56106393627c20008417e9724ab86c677b/lib/stores/github_store.rb

-- so far the #get_text & #put_text methods are implemented.

This uses the excellent github_api gem, which closely mirrors the Github HTTP API.

Because the Github API is very low level, we need to create repos, branches, trees, etc.
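To give a feel for how low level this is, here is a sketch of the four requests needed to add one file to a branch through GitHub's Git Data API (blob, tree, commit, then ref update). It only builds the requests and sends nothing. It does not use the github_api gem or reflect the code in github_store.rb, and the method names are hypothetical:

```ruby
require "json"
require "base64"

# 1. Upload the file content as a blob.
def blob_request(owner, repo, data)
  { method: "POST", path: "/repos/#{owner}/#{repo}/git/blobs",
    body: JSON.generate({ content: Base64.strict_encode64(data), encoding: "base64" }) }
end

# 2. Create a tree that adds the blob at file_path on top of an existing tree.
def tree_request(owner, repo, base_tree_sha, file_path, blob_sha)
  entry = { path: file_path, mode: "100644", type: "blob", sha: blob_sha }
  { method: "POST", path: "/repos/#{owner}/#{repo}/git/trees",
    body: JSON.generate({ base_tree: base_tree_sha, tree: [entry] }) }
end

# 3. Create a commit pointing at the new tree.
def commit_request(owner, repo, message, tree_sha, parent_sha)
  { method: "POST", path: "/repos/#{owner}/#{repo}/git/commits",
    body: JSON.generate({ message: message, tree: tree_sha, parents: [parent_sha] }) }
end

# 4. Move the branch to the new commit.
def ref_update_request(owner, repo, branch, commit_sha)
  { method: "PATCH", path: "/repos/#{owner}/#{repo}/git/refs/heads/#{branch}",
    body: JSON.generate({ sha: commit_sha }) }
end
```

Each step needs the sha returned by the previous one, which is why a store built on this API ends up managing trees and refs itself.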


2 participants