deadline: 11:59pm AoE, December 14th, 2018
you will have two opportunities to test your solution before the deadline. We will test all repositories on November 30th, 2018 and December 12th, 2018 and publish the results on the course website.
- mind that the results of these tests are not graded. Only the final result matters.
- if your solution fails a test, you will not receive a detailed explanation of the reasons why it failed. Your task is to figure it out on your own.
git usage is mandatory (multiple commits with meaningful messages)
Go is mandatory
you have to work alone
don't share code
ask questions in the Auditorium
You are required to develop an online document library. Users are presented with an input form where they can submit documents (e.g., books, poems, recipes) along with metadata (e.g., author, MIME type, ISBN). For the sake of simplicity, they can view all stored documents on a single page.
[figures: input form | output sample]
Hence, create an application with the following architecture. Don't worry, in this repository you can find some Makefiles, Dockerfiles, configuration files and source code to get you started.
Nginx is a web server that delivers static content in our architecture.
Static content comprises the landing page (index.html), JavaScript, CSS and font files located in nginx/www.
- complete the nginx/Dockerfile
  - upgrade the system
  - install nginx
  - copy nginx/nginx.conf from host to container's /etc/nginx/nginx.conf
  - use port 80 in the container
  - run nginx on container startup
- in docker-compose
  - build the image
  - assign nginx to the se_backend network
  - mount the host directory nginx/www to /var/www/nginx in the container
- verify your setup (it should display the landing page)
We use HBase, the open source implementation of Bigtable, as database.
hbase/hbase_init.txt creates the se2 namespace and a library table with two column families: document and metadata.
- build the image for the container description located in hbase/
- in docker-compose
  - add hbase to the se_backend network
The Dockerfile exposes different ports for different APIs. We recommend the JSON REST API, but choose whatever API suits you best.
Note
- HBase REST documentation
- the client port for REST is 8080
- employ curl to explore the API
  - curl -vi -X PUT -H "Content-Type: application/json" -d '<json row description>' "localhost:8080/se2:library/fakerow"
  - yes, it's really fakerow
- gserve/src/gserve/HbaseJSON.go contains helpers to convert data from frontend JSON via Go types to base64-encoded HBase JSON and back
- you might want to use the (Un)marshal functions from the encoding/json package
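
The curl call above can also be issued from Go. The following is a minimal sketch, not the provided helper code: the types cell, row and rows and the functions b64, buildRowJSON and putRow are hypothetical names chosen for illustration, and the sketch assumes the REST API is reachable on localhost:8080 and answers a successful PUT with a 2xx status.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// cell, row and rows mirror the JSON layout used in the curl example.
// The type names are illustrative and are not the ones from HbaseJSON.go.
type cell struct {
	Column string `json:"column"`
	Value  string `json:"$"`
}

type row struct {
	Key  string `json:"key"`
	Cell []cell `json:"Cell"`
}

type rows struct {
	Row []row `json:"Row"`
}

// b64 base64-encodes a string with the standard alphabet.
func b64(s string) string {
	return base64.StdEncoding.EncodeToString([]byte(s))
}

// buildRowJSON encodes the key and every column/value pair and
// returns the JSON document for a single row.
func buildRowJSON(key string, columns [][2]string) ([]byte, error) {
	r := row{Key: b64(key)}
	for _, c := range columns {
		r.Cell = append(r.Cell, cell{Column: b64(c[0]), Value: b64(c[1])})
	}
	return json.Marshal(rows{Row: []row{r}})
}

// putRow sends the encoded row to the endpoint from the curl example.
func putRow(baseURL string, body []byte) error {
	req, err := http.NewRequest(http.MethodPut, baseURL+"/se2:library/fakerow", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 { // assumption: any 2xx means success
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	body, err := buildRowJSON("My first document", [][2]string{
		{"document:Chapter 1", "value:Once upon a time..."},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(body))
	// Assumes the HBase REST API is reachable on localhost:8080.
	if err := putRow("http://localhost:8080", body); err != nil {
		fmt.Println("put failed:", err)
	}
}
```

In your solution you would probably use the provided helpers for the encoding step and keep only something like putRow.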
Deviating from the architecture image, you don't need to create an extra ZooKeeper container. The HBase image above already contains a ZooKeeper installation.
- add an alias to the hbase section in docker-compose such that other containers can connect to it by referring to the name zookeeper
Note
- you are allowed to use the go-zookeeper library
This is the first service/server you have to write by yourself.
Implement a reverse proxy that forwards every request to nginx, except those with a "library" prefix in the path (e.g., http://host/library).
Discover running gserve instances with the help of ZooKeeper and forward library requests in circular order among those instances (Round Robin).
- complete grproxy/Dockerfile
- in docker-compose
  - build grproxy
  - add grproxy to both networks: se_frontend and se_backend
Note
- you are allowed to use httputil.ReverseProxy
- you don't need to handle the case where an instance registered to ZooKeeper doesn't reply
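
To illustrate the routing rule and the Round Robin order described above, here is a minimal sketch built on httputil.NewSingleHostReverseProxy. It leaves out the ZooKeeper discovery entirely: the backend list is set by hand, and the names balancer, proxy and isLibraryPath as well as the addresses gserve1:8080, gserve2:8080 and nginx:80 are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
	"sync"
)

// balancer hands out backend addresses in circular order.
type balancer struct {
	mu       sync.Mutex
	backends []string
	next     int
}

// setBackends replaces the backend list, e.g. after the set of
// registered gserve instances has changed.
func (b *balancer) setBackends(addrs []string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.backends = append([]string(nil), addrs...)
	b.next = 0
}

// pick returns the next backend, or "" if none is registered.
func (b *balancer) pick() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.backends) == 0 {
		return ""
	}
	addr := b.backends[b.next]
	b.next = (b.next + 1) % len(b.backends)
	return addr
}

// isLibraryPath reports whether a request path has the "library" prefix.
func isLibraryPath(path string) bool {
	return strings.HasPrefix(path, "/library")
}

// proxy forwards library requests to gserve and everything else to nginx.
type proxy struct {
	nginx  string
	gserve *balancer
}

func (p *proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	target := p.nginx
	if isLibraryPath(r.URL.Path) {
		target = p.gserve.pick()
		if target == "" {
			http.Error(w, "no gserve instance available", http.StatusServiceUnavailable)
			return
		}
	}
	u := &url.URL{Scheme: "http", Host: target}
	httputil.NewSingleHostReverseProxy(u).ServeHTTP(w, r)
}

func main() {
	lb := &balancer{}
	// Hypothetical addresses; in the assignment they come from ZooKeeper.
	lb.setBackends([]string{"gserve1:8080", "gserve2:8080"})
	for i := 0; i < 3; i++ {
		fmt.Println(lb.pick())
	}
	// A real grproxy would now serve &proxy{nginx: "nginx:80", gserve: lb}
	// with http.ListenAndServe.
}
```

Creating a new ReverseProxy per request keeps the sketch short; a single ReverseProxy with a custom Director would work as well.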
gserve is the second service you need to implement, and it serves two purposes.
First, it receives POST requests from the client (via grproxy) and adds or alters rows in HBase.
Second, it replies to GET requests with an HTML page displaying the contents of the whole document library.
It only receives requests from grproxy after it has subscribed to ZooKeeper, and it automatically unsubscribes from ZooKeeper if it shuts down or crashes.
- gserve shall return all versions of HBase cells (see output sample above)
- the returned HTML page must contain the string "proudly served by gserve1" (or gserve2, ...) without HTML tags in between
- complete gserve/Dockerfile
- in docker-compose
  - build gserve
  - start two instances gserve1 and gserve2
  - add both instances to se_backend
  - make sure that both instances start after hbase and grproxy
  - provide the names of the instances (gserve1, gserve2) via environment variables
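
The POST/GET split and the "proudly served by" requirement could look roughly like the sketch below. It is only an outline under several assumptions: the environment variable name GSERVE_NAME is made up, an in-memory slice stands in for HBase, and the HTML is deliberately bare. The names library, renderPage and serverName are hypothetical.

```go
package main

import (
	"fmt"
	"html"
	"io"
	"net/http"
	"os"
	"strings"
	"sync"
)

// serverName reads the instance name from the environment.
// GSERVE_NAME is an assumed variable name, not one given by the task.
func serverName() string {
	if n := os.Getenv("GSERVE_NAME"); n != "" {
		return n
	}
	return "gserve"
}

// renderPage builds the HTML reply. The required string is written
// without any HTML tags between its words.
func renderPage(name string, docs []string) string {
	var sb strings.Builder
	sb.WriteString("<html><body><ul>")
	for _, d := range docs {
		sb.WriteString("<li>" + html.EscapeString(d) + "</li>")
	}
	sb.WriteString("</ul><p>proudly served by " + html.EscapeString(name) + "</p></body></html>")
	return sb.String()
}

// library is an in-memory stand-in for the HBase table.
type library struct {
	mu   sync.Mutex
	docs []string
}

// handler stores the body of POST requests and answers GET requests
// with the rendered page.
func (l *library) handler(name string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodPost:
			body, err := io.ReadAll(r.Body)
			if err != nil {
				http.Error(w, "cannot read body", http.StatusBadRequest)
				return
			}
			l.mu.Lock()
			l.docs = append(l.docs, string(body)) // real code would write to HBase
			l.mu.Unlock()
			w.WriteHeader(http.StatusCreated)
		case http.MethodGet:
			l.mu.Lock()
			page := renderPage(name, l.docs)
			l.mu.Unlock()
			w.Header().Set("Content-Type", "text/html; charset=utf-8")
			io.WriteString(w, page)
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	}
}

func main() {
	lib := &library{docs: []string{"My first document"}}
	fmt.Println(renderPage(serverName(), lib.docs))
	// A real gserve would register lib.handler(serverName()) for the
	// library path and call http.ListenAndServe.
}
```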
- Start small, don't try to solve every problem at once.
- Test your components against single Docker containers (e.g., gserve with HBase container), and integrate them into docker-compose later on.
- The developer tools of your browser may help you to capture and analyse requests and responses.
- Docker Docs
- Docker Compose file reference
- Apache HBase Reference Guide
- ZooKeeper Documentation
- Go Documentation
- Pro Git
- push changes to your repo
- if you find bugs in provided files or the documentation, feel free to open a pull request on Bitbucket
How do I use the JSON/Base64-encoding/(Un)Marshaling code?
    package main

    import "encoding/json"

    func main() {
        // unencoded JSON bytes from landing page
        // note: quotation marks need to be escaped with backslashes within Go strings: " -> \"
        unencodedJSON := []byte("{\"Row\":[{\"key\":\"My first document\",\"Cell\":[{\"column\":\"document:Chapter 1\",\"$\":\"value:Once upon a time...\"},{\"column\":\"metadata:Author\",\"$\":\"value:The incredible me!\"}]}]}")

        // convert JSON to Go objects
        var unencodedRows RowsType
        json.Unmarshal(unencodedJSON, &unencodedRows)

        // encode fields in Go objects
        encodedRows := unencodedRows.encode()

        // convert encoded Go objects to JSON
        encodedJSON, _ := json.Marshal(encodedRows)

        println("unencoded:", string(unencodedJSON))
        println("encoded:", string(encodedJSON))
    }

    /*
    output:
    unencoded: {"Row":[{"key":"My first document","Cell":[{"column":"document:Chapter 1","$":"value:Once upon a time..."},{"column":"metadata:Author","$":"value:The incredible me!"}]}]}
    encoded: {"Row":[{"key":"TXkgZmlyc3QgZG9jdW1lbnQ=","Cell":[{"column":"ZG9jdW1lbnQ6Q2hhcHRlciAx","$":"dmFsdWU6T25jZSB1cG9uIGEgdGltZS4uLg=="},{"column":"bWV0YWRhdGE6QXV0aG9y","$":"dmFsdWU6VGhlIGluY3JlZGlibGUgbWUh"}]}]}
    */
Do I need a library to connect with HBase?
No, we recommend the REST interface. You might also consider using Thrift, but we haven't tested it.
Could you provide an example for an HBase scanner?
Yes, for the command line:
    #!/usr/bin/bash

    echo "get scanner"
    scanner=`curl -si -X PUT \
        -H "Accept: text/plain" \
        -H "Content-Type: text/xml" \
        -d '<Scanner batch="10"/>' \
        "http://127.0.0.1:8080/se2:library/scanner/" | grep Location | sed "s/Location: //" | sed "s/\r//"`
    echo $scanner

    curl -si -H "Accept: application/json" "${scanner}"

    echo "delete scanner"
    curl -si -X DELETE -H "Accept: text/plain" "${scanner}"
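
The same three steps (create the scanner, fetch one batch, delete the scanner) can be sketched in Go with net/http. This mirrors the shell script above and makes the same assumptions about host, port and table; the function names newScannerRequest and scanOnce are hypothetical. Like the script, it fetches only a single batch.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// newScannerRequest builds the PUT request that creates a scanner.
func newScannerRequest(baseURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, baseURL+"/se2:library/scanner/",
		strings.NewReader(`<Scanner batch="10"/>`))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "text/plain")
	req.Header.Set("Content-Type", "text/xml")
	return req, nil
}

// scanOnce creates a scanner, reads one batch as JSON and deletes the scanner.
func scanOnce(baseURL string) ([]byte, error) {
	client := &http.Client{Timeout: 10 * time.Second}

	req, err := newScannerRequest(baseURL)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	resp.Body.Close()

	// The scanner URL is returned in the Location header.
	scanner := resp.Header.Get("Location")
	if scanner == "" {
		return nil, errors.New("no Location header in scanner response")
	}

	// Delete the scanner when done, like the last curl call in the script.
	defer func() {
		del, err := http.NewRequest(http.MethodDelete, scanner, nil)
		if err != nil {
			return
		}
		del.Header.Set("Accept", "text/plain")
		if r, err := client.Do(del); err == nil {
			r.Body.Close()
		}
	}()

	get, err := http.NewRequest(http.MethodGet, scanner, nil)
	if err != nil {
		return nil, err
	}
	get.Header.Set("Accept", "application/json")
	resp, err = client.Do(get)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	// Assumes the HBase REST API is reachable on 127.0.0.1:8080.
	body, err := scanOnce("http://127.0.0.1:8080")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println(string(body))
}
```

The returned JSON is base64-encoded, so it still has to be decoded before it is rendered.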
What is meant by "build gserve"?
It means building the Docker image with docker-compose, not compiling the gserve binary.
You had a lot of fun and want more? No problem! Select a topic you're interested in, and enhance any of the components. For instance, query single documents or rows, replace nginx with a web server written by yourself, improve the error handling in grproxy, write test cases or, in the worst case, just beautify the HTML/CSS. But keep in mind: your application must still conform to the task description.