- Lead: Richard Rodger
- Sponsor: nearForm
A workshop for the nodezoo project. Nodezoo is a search engine for Node.js modules. The nodezoo search engine is an example of a real-world service built using Node.js microservices. Each microservice is published in its own github repository along with all of the necessary config to run the system locally or live . The codebase is intended to be used as an example, and as a starting point for your own projects.
Below we provide a complete workshop to work through. Our current live system has it's roots in this workshop. By working through the iterations below you can get a feel for how a microservice system is bootstrapped together and how the system evolves as needs change.
Note: This repo contains the nodezoo workshop, to explore and run the live version of nodezoo, please see nodezoo-system project.
The micro-services that make up the system are:
- nodezoo-web: the web server
- nodezoo-info: collect data on modules from multiple sources
- nodezoo-search: interface with an elasticsearch server
- nodezoo-npm: interface with the NPM registry
- nodezoo-github: interface with github.com
- nodezoo-npm-update: get live module updates
Each service should be downloaded and placed in the same folder including this repository.
The system is built in a set of iterations so that you can follow its development. This mirrors the way that real microservice projects are developed. Each iteration, more services are introduced.
When working with the individual microservices, it is easier to open a separate terminal for each one.
Not all microservices are available in all iterations, as some are only introduced later.
Each iterations contains a set of tasks to execute. You should try to get them all up and running to verify to yourself that you understand the mechanics of the system.
Each iteration also includes a set of experiments that you can attempt. Use these to develop your understanding of the system - there are no right answers!
The basic tools are:
Install these before getting started. Later iterations introduce additional tools, and these will be indicated.
This example only places microservices into containers. All other services (e.g. redis) are run as normal from the host machine. This does not prevent you from containerising them of course!
To use the docker command in a terminal, you need to set up the docker environment variables. From the initial docker terminal (as provided by the docker installation), you can run
$ docker-machine env default
to obtain these environment settings. Copy and paste them into new terminals as needed.
Docker runs containers in a host machine. You use the IP address of this host to access containers. The easiest way to get this IP address is to run the command:
$ docker-machine ip default
Finally, from inside docker containers, your microservices will need to talk to the outside world. To do this, they use a special IP address representing your host machine (Host IP). You can obtain this address in multiple ways:
- run
ifcongig -a
and look for the docker or virtualbox entries. - run
docker-machine inspect default | grep HostOnly
Docker networking can be tricky, and is fragile with respect to network changes, with DNS, for example, failing. When wierdness happens, your best bet is to restart:
$ docker-machine restart default
This will invalidate your environment, so you will need to launch a new docker terminal.
Each microservice repository has a branch for each iteration: i00, i01, etc. You can clone these branches directly - for example:
$ git clone -b i00 https://github.com/nodezoo/nodezoo-web.git nodezoo-web-i00
However you will not be able to save your changes to your own repositories.
To save your own work, it is better to first fork the repository on github.com, and then
$ git clone https://github.com/[YOUR_USER]/nodezoo-web.git
$ cd nodezoo-web
$ git remote add upstream https://github.com/nodezoo/nodezoo-web.git
$ git fetch upstream
$ git checkout upstream/i00
$ git checkout -b i00
This sequence of commands downloads the branch into your local clone of your fork. You can then push your changes back to your own fork.
One you have downloaded all the branches, you can switch between them,
across all microservice repositories using the iteration.sh
script:
$ ./iteration.sh i00 # moves all to iteration 00
$ ./iteration.sh i01 # moves all to iteration 01
... etc.
These commands must be used before using the above script, for each branch for the first time :
$ git checkout upstream/[BRANCH NAME]
$ git checkout -b [BRANCH NAME]
This script only works once the branch has been fully set-up for a first time.
In each branch, you always need to run the following command:
npm install
Then go into the folder nodezoo-workshop/system and run:
npm install
to get the dependent Node.js modules. This must be done each time a branch is changed for each micro-service.
In the folder nodezoo-web use the following command :
npm run build
IMPORTANT NOTE: the build command is not required on branch i00 - i05
This branch starts with a simple web server. Use this branch to validate your configuration.
- web (stub)
- Clone the microservice.
- Review code.
- Run in terminal with
node srv/app-dev.js --seneca.options.tag=web --seneca.log.all
- Verify functionality:
- Observe the seneca logs to follow the execution of action patterns
- Open http://localhost:8000 - all searches return "foo"
- Open http://localhost:8000/info/express - all info is for "foo"
- Use the HTTP API:
$ curl "http://localhost:44000/act?role=search&cmd=search&query=express"
- Use the repl:
$ telnet localhost 43000
> seneca.list('role:search')
> role:search,cmd:search,query:express
- Docker image and container: build and run
- Open the Dockerfile in a text editor and the commands to use that Dockerfile are in its comments
- The command
$ docker build -t TAG-NAME .
tells docker to build with the tag TAG-NAME using the Dockerfile in the current directory - Verify functionality as above, against docker host IP
- If Docker cannot connect to the Docker daemon during building use the following command before building:
$ eval "$(docker-machine env default)"
- If Docker cannot connect to the Docker daemon during building use the following command before building:
- Learn some hapi: add some more API end points
- how about /api/ping, and respond with the time?
- Learn some seneca: add some more actions, and expose them as API end points
- how about /api/ping triggers role:web,cmd:ping, and that responds with the time
- The front end is old-school jQuery - how about some react?
- Setup nginx as a load-balancer with multiple instances of web running
- update the configuration to handle port conflicts
This branch introduces two microservices that support the web service. Both are stubs that perform no actual work, instead returning hard-coded responses. The focus here is on understanding how simple microservice communication is configured using static addressing with fixed IP addresses and ports.
- web
- info (stub)
- search (stub)
- Clone the microservices.
- Review code for each one - in particular the message transport configuration.
- Run in separate terminals with
node srv/app-dev.js --seneca.options.tag=web --seneca.log.all
node srv/info-dev.js --seneca.options.tag=info --seneca.log.all
node srv/search-dev.js --seneca.options.tag=search --seneca.log.all
- Verify functionality:
- Observe the seneca logs to follow the execution of action patterns
- Open http://localhost:8000 - all searches return "bar"
- Open http://localhost:8000/info/express - all info is for "bar"
- Use the HTTP API:
$ curl "http://localhost:44000/act?role=search&cmd=search&query=express"
- Use the repl of each microservice, and test its action patterns
- Build and run the Docker containers, and verify the same functionality
- Add another microservice
- ... perhaps ping can live in its own service?
- How would you unit test this code?
- testing the inbound and outbound messsages for each action is a good place to start
- What happens when microservices are down?
- Perhaps an auto-restarter like forever might help
- Place the info and/or search microservices behind nginx
- and run multiple instances of them - scaling!
- and run multiple versions - fine-grained deployment!
- a simple change is to return 'zed' instead of 'bar'
- Seneca lets you merge microservices into one process
- just seneca.use each microservice inside web
This branch introduces infrastructure services that are used by the microservices to perform work. Elasticsearch is used as a search engine, and Redis is used for publish/subscribe messaging. The search can now index and search for Node.js modules, with some manual help.
- Install redis and run in default configuration
- Install elasticsearch and run in default configuration
- Clone the nodezoo repository, and build the nodezoo-level container
- See folder
docker/level
; runnpm install first
- This is necessary, as the seneca-level-store module must compile binaries
- See folder
- web
- info
- search
- npm
- redis
- elasticsearch
- Clone the microservices.
- Review code for each one - in particular the external intergrations.
- Run in separate terminals with
node srv/app-dev.js --seneca.options.tag=web --seneca.log.all
node srv/info-dev.js --seneca.options.tag=info --seneca.log.all
node srv/search-dev.js --seneca.options.tag=search --seneca.log.all
node srv/npm-dev.js --seneca.options.tag=npm --seneca.log.all
- Verify functionality:
- Observe the seneca logs to follow the execution of action patterns
- Open http://localhost:8000/info/request - adds request to the search engine
- Manually change the module name in the URL to index other modules.
- Open http://localhost:8000 - searches now work! Try "request".
- Use the HTTP API:
$ curl "http://localhost:44000/act?role=search&cmd=search&query=express"
- Use the repl of each microservice, and test its action patterns
- Build and run the Docker containers, and verify the same functionality
- Add another info microservice
- copy npm, and then modify
- perhaps it would be useful to know something about the author(s) of the module?
- What happens when microservices are down? and what about redis and elasticsearch?
- Can you run multiple copies of npm
- what happens?
- If you used nginx for scaling, does it still work?
- Elasticsearch might run slow - is there a way to deal with this?
- what about a separate caching microservice that sits in front of search?
This branch uses influxdb and grafana to chart message flow rates through the system. Influxdb is used due to it's ease of installation and because it is based on plotting time-series data. Grafana is used because it officially supports influx, and is relatively easy to use.
- Install influxdb and run in default configuration
- Start influxdb with
$ influxd run
- Set up your database by running the console
$ influx
- Start influxdb with
> CREATE DATABASE seneca_msgstats;
> CREATE USER msgstats WITH PASSWORD 'msgstats';
> GRANT ALL ON seneca_msgstats TO msgstats;
- Install grafana and run in default configuration
- In your clone of the main nodezoo repository, run the msgstats service:
- located in the
system
folder npm install
first as usual- run with
HOST=localhost|host-ip node msgstats.js
- use host-ip for docker scenario
- located in the
- You'll need a personal access token for the github.com API
- See the menu item under account settings on github.com
- web (stats)
- info (stats)
- search
- npm
- github
- redis
- elasticsearch
- influxdb
- grafana
- msgstats
- Clone the microservices.
- Review code for each one - in particular the message statistics collection.
- Run in separate terminals with
node srv/app-dev.js --seneca.options.tag=web --seneca.log.all
node srv/info-dev.js --seneca.options.tag=info --seneca.log.all
node srv/search-dev.js --seneca.options.tag=search --seneca.log.all
node srv/npm-dev.js --seneca.options.tag=npm --seneca.log.all
node srv/github-dev.js --seneca.options.tag=npm --seneca.log.all --seneca.options.plugin.github.token=YOUR_GITHUB_TOKEN
- Verify functionality:
- Observe the seneca logs to follow the execution of action patterns
- Use the website, API and repl as before
- Verify that message flow rate charts are generated in grafana
- Build and run the Docker containers, and verify the same functionality
- Write a test script to generate queries via HTTP and then observe the charts
- the message flow rates should remain relatively proportional to each other
- Write a seneca plugin that induces a failure rate on a given set of messages
- read the article on priors
- run this on npm and github - does running more instances of these services help?
- Can you implement a rate limiter?
- Use your test script to overload the system
- Use a plugin to implement the rate limiting
- It's ok to drop excess load on the floor (aka "load-shedding")
This branch shows the use of a message bus to avoid the high coupling and configuration costs of direct service-to-service communication. This is one way to avoid the need for service discovery solutions.
- Install beanstalkd and run in default configuration
- web (stats)
- info (stats)
- search
- npm (stats)
- github
- update (stats)
- redis
- elasticsearch
- influxdb
- grafana
- msgstats
- Clone the microservices.
- Review code for each one - in particular the npm update event emitter.
- Run in separate terminals with
node srv/app-dev.js --seneca.options.tag=web --seneca.log.all
node srv/info-dev.js --seneca.options.tag=info --seneca.log.all
node srv/search-dev.js --seneca.options.tag=search --seneca.log.all
node srv/npm-dev.js --seneca.options.tag=npm --seneca.log.all
node srv/github-dev.js --seneca.options.tag=npm --seneca.log.all --seneca.options.plugin.github.token=YOUR_GITHUB_TOKEN
node srv/update-dev.js --seneca.options.tag=update --seneca.log.all --seneca.options.plugin.npm_update.task=registry_subscribe
- Verify functionality:
- Observe the seneca logs to follow the execution of action patterns
- Use the website, API and repl as before
- Verify that live npm publishes are registered
- Verify that message flow rate charts are generated in grafana
- Build and run the Docker containers, and verify the same functionality
- The npm-update microservice also provides download and batch functionality
- experiment with these (look at the source to see the action patterns)
- use the repl to control and observe
- In production, how can you ensure that you have all the npm registry data?
- which configuration of npm-update instances do you run?
- A long time ago, in a galaxy far away, the original nodezoo could calculate "node rank", which is just like "page rank" only for node modules.
- https://github.com/nodezoo/nodezoo-workshop/tree/bdd18c030ef32f19e0b28e1f7ed30f80a9854b59/bin
- Perhaps this can be turned into a batch processing microservice?
This branch shows the use of mesh networking to completely remove the need for service discovery. The seneca-mesh plugin uses the SWIM gossip algorithm to enable microservices to automatically discover the appropriate destinations for messages dynamically.
- In your clone of the main nodezoo repository, run the base-node service:
- located in the
system
folder npm install
first as usual- run with
node base-node.js
- located in the
- web
- info
- search
- npm
- github
- update
- influxdb
- grafana
- msgstats
- base-node
- Clone the microservices.
- Review code for each one - in particular the updated service scripts in the
srv
folders. - Make sure to run the base-node service before starting the microservices.
- Run in separate terminals with
node srv/app-dev.js --seneca.options.tag=web --seneca.log=type:act --seneca.options.debug.short_logs=true
node srv/info-dev.js --seneca.options.tag=info --seneca.log=type:act --seneca.options.debug.short_logs=true
node srv/search-dev.js --seneca.options.tag=search --seneca.log=type:act --seneca.options.debug.short_logs=true
node srv/npm-dev.js --seneca.options.tag=npm --seneca.log=type:act --seneca.options.debug.short_logs=true
node srv/npm-github.js --seneca.options.tag=npm --seneca.log=type:act --seneca.options.debug.short_logs=true --seneca.options.plugin.github.token=YOUR_GITHUB_TOKEN
node srv/update-dev.js --seneca.options.tag=update --seneca.log=type:act --seneca.options.debug.short_logs=true --seneca.options.plugin.npm_update.task=registry_subscribe
- These logging options add a filter to show only actions, and also shorten the logs so they are easier to see for debuggin.
- Verify functionality:
- Observe the seneca logs to follow the execution of action patterns
- Use the website and API as before.
- Verify that live npm publishes are registered
- Verify that message flow rate charts are generated in grafana
- Build and run the Docker containers, and verify the same functionality
- Try stopping and starting services at random.
- Observe how the mesh network dynamically reconfigures the microservice message flows.
- Try running multiple instances of the search service.
- Observe that the web service automatically load balances between the current search services dynamically.
The NodeZoo org encourages open and safe participation.
If you feel you can help in any way, be it with documentation, examples, extra testing, or new features please get in touch.
Copyright (c) 2014-2016, Richard Rodger and other contributors. Licensed under MIT.