
Getting Started

In this section we’re going to build an example application to cover some of the most important concepts and features of HellHound.

To follow along with this section you need to have Leiningen set up on your machine.

The complete application is available in this git repository, under the getting-started namespace.

Before you start

The aim of this tutorial is to give you a rough overview of HellHound systems. To follow it, you should be familiar with Clojure and Leiningen.

A little bit of theory

To make this tutorial easier to follow, here is a brief introduction to some of the concepts we’re going to use. You can read about each of these concepts in more detail later in their own sections.

  • System: A hashmap describing our application system. Basically, each program maps to a system. There can be only one running system at any given time, but a program might define more than one system for different environments, for example a development system and a production system (see the sketch after this list). Learn More.

  • Component: A system is created from a set of components. You can think of a component as a unix command or a function (they are more than a function, though). Just like a unix command, each component has an input, an output and some optional dependencies. Learn More

  • Workflow: A workflow is a graph-like data structure which defines the data flow of the system. It simply describes how to pipe the output of one component into the input of another. Using the workflow we can define the flow of information in our system. Learn More

  • Execution: Execution describes the execution model of the system, for example whether it should be executed in a single-threaded or multi-threaded fashion (or, in the future, a distributed fashion). By default, systems are executed in single-threaded mode.
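
To make these concepts concrete, here is a minimal sketch of the shape of a system hashmap. The component names and the :execution key are illustrative assumptions only; the real system for this tutorial appears later in the System section.

A system hashmap (sketch)
{:components [component-a component-b]     ;; the component maps making up the system
 :workflow   [[::a/name ::b/name]]         ;; pipe ::a/name’s output into ::b/name’s input
 :execution  :single-threaded}             ;; hypothetical key/value; single-threaded is the default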

What do we want to build?

In order to gain an overview of HellHound systems, we’re going to build a real-world application together. In this example we’re going to create a simple web proxy application which takes a domain name and a port number, sets up a webserver, and serves the content of the given domain locally.

HellHound is a library for creating data pipelines. Data pipelines are mainly suited to stream processing and other use cases that involve data streams, but I chose the proxy use case intentionally to demonstrate HellHound on a problem that is not obviously a stream-processing one. This way, you don’t need to know much about stream processing to understand the example.

Tackling the problem

For this specific use case we need to spin up a webserver which listens on a port for HTTP requests, constructs new requests from the incoming ones with a different host (the remote host), sends them to the actual host, and responds to the user with the responses fetched from the remote host.

So we’re going to create a system hashmap with two components: a webserver component and a crawler component. As the name suggests, the webserver component is responsible for receiving user requests and serving responses back to the user. The crawler component fetches the URL content from the remote host. In this system, we want to connect the output of the webserver to the input of the crawler and the output of the crawler back to the input of the webserver, so we end up with a closed system.

Now let’s get our feet wet.

Installation

Add HellHound to your dependencies as follows:

[codamic/hellhound "1.0.0-alpha4"]

codamic/hellhound is a meta package which automatically installs several HellHound libraries, such as codamic/hellhound.core, codamic/hellhound.http and codamic/hellhound.utils. This means you can install just the specific libraries you need instead of using codamic/hellhound. For instance, if you don’t need the codamic/hellhound.i18n or codamic/hellhound.http libraries, you can install codamic/hellhound.core alone. Systems are part of the codamic/hellhound.core artifact.
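
If you’re using Leiningen, the dependency goes into your project.clj. Here is a minimal sketch; the project name and Clojure version are placeholders:

project.clj (sketch)
(defproject getting-started "0.1.0-SNAPSHOT"   ;; hypothetical project name
  :dependencies [[org.clojure/clojure "1.9.0"] ;; any recent Clojure should do
                 [codamic/hellhound "1.0.0-alpha4"]])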

Components

Let’s start with creating our components. HellHound’s components are just hashmaps; simple, good old hashmaps. Any component map should contain at least three keys: :hellhound.component/name, :hellhound.component/start-fn and :hellhound.component/stop-fn. The value of the :hellhound.component/name key should be a namespaced keyword, which is the component’s name. We use this name to refer to the component later in our system configuration.

The value of :hellhound.component/start-fn should be a function that takes two arguments. The first argument is the component hashmap itself and the second is another hashmap called the context map (to learn more about components in detail, please check out the components section of this documentation). This function should contain the initialization or even the main logic of the component (depending on the use case) and return a component map.

The :hellhound.component/stop-fn function is similar to start-fn but accepts only one argument, the component map itself. This function is responsible for tearing down the component and doing the cleanup, for example closing a connection which was opened in start-fn.

OK, it’s time to write some code. Let’s start with a really basic component skeleton for the webserver component. Check out the following code.

Basic skeleton for a component
(ns getting-started.proxy.components.web
  (:require [hellhound.component :as component])) (1)

(defn start-fn   (2)
  [this context]
  this)

(defn stop-fn    (3)
  [this]
  this)

(defn factory   (4)
  []
  (component/make-component ::server start-fn stop-fn)) (5)
  1. In order to use the make-component function we need to require the hellhound.component namespace.

  2. start-fn is the entry point to each component’s logic. Simply think of it as the place where you implement the logic behind your component. In this case our start-fn does nothing.

  3. stop-fn is responsible for the component’s termination logic, mostly cleanup. For instance, closing a socket, flushing data to disk and so on.

  4. The factory function is a simple function to create instances of our component with different configurations. In this case we just create a fixed component map. Nothing special.

  5. The make-component function is just a shortcut to create the component map easily. We can replace it with the actual hashmap definition of the component (see the sketch below). We passed ::server as the first argument, which is the name of our component. A component’s name should be a namespaced keyword.
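
For reference, since make-component is just a shortcut, a factory returning the raw hashmap would look roughly like this (a sketch based on the three required keys listed earlier):

Equivalent raw component map (sketch)
(defn factory
  []
  ;; The same three required keys make-component fills in for us.
  {:hellhound.component/name     ::server
   :hellhound.component/start-fn start-fn
   :hellhound.component/stop-fn  stop-fn})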

Many of the components that you encounter or create will have this same skeleton. Let’s go over the basics of a component again. Each component has a name, which must be a namespaced keyword; a start-fn, which holds the initialization or main logic of the component, takes two arguments (the component map itself and a context map) and should return a component map; and a stop-fn, which takes a component map and holds the termination logic. Every component also has an input stream and an output stream defined in its component map. There are more details about components which you can read in the components section.

HellHound is built around, and shares ideas with, the Unix philosophy. Components are isolated units of execution which read data from their input, process the data and write the result to their output. The system makes a data pipeline by piping the IO of different components to one another. You’ll learn more about this concept later in this documentation.

Now that we have set up a very basic component and have a better understanding of the component concept, let’s focus on the webserver logic and create a real-world webserver component.

Note
codamic/hellhound.http provides a webserver component under the hellhound.components.webserver namespace, which is a full-fledged webserver based on Pedestal and Aleph. You can read more about it in the HTTP library documentation.
Web server component.
link:https://raw.githubusercontent.com/Codamic/hellhound_examples/master/src/clj/getting_started/proxy/components/web.clj[role=include]
  1. Namespace definition. As you can see, we organized the web component under the getting-started.proxy.components namespace. It is good practice to store all of your components under the same parent namespace.

  2. We’re going to use the awesome Aleph (http://aleph.io) library to build our webserver. HellHound depends on this library by default (if we use the codamic/hellhound or codamic/hellhound.http artifact).

  3. The manifold.stream library provides a very robust stream abstraction. Component IO is built around this abstraction, and we need to use this library to work with a component’s IO. HellHound depends on the manifold library, so there’s no need to add it explicitly to your dependencies.

  4. The manifold.deferred library provides an abstraction around promises and futures; the Aleph webserver uses them for async responses.

  5. The return value of this function is in fact the ring handler which we want to use to handle incoming requests. The two parameters of the handler function are the input and output streams of the webserver component.

  6. Before we return the actual ring handler, we need to set up a consumer for incoming data from the input stream. The basic idea is to treat any value arriving on the webserver component’s input stream as a response and return it to the user. Aleph supports async responses using deferreds, so by returning a deferred from (8), Aleph sends back the response as soon as the deferred is realised. In our consumer we just extract that same deferred from the incoming map (every incoming value is a hashmap, and we placed the deferred into it in (9)) and resolve it with the response map extracted from the same hashmap.

  7. The actual ring handler that receives the HTTP request hashmap (req).

  8. Creates a deferred value to use as the response and passes it through the pipeline.

  9. Simply creates a hashmap containing :request and :response-deferred with the corresponding values and puts it onto the webserver’s output stream (sends it down the pipeline). Then it returns the created deferred, so Aleph can return the response to the user as soon as the response deferred is realised.

  10. Receives a port number and returns an anonymous function to be used as the start function of the webserver component. It passes the given port number to Aleph’s start-server function.

  11. The hellhound.component/io function is a helper which returns the IO streams of the given component as a vector in the form [input output].

  12. Starts the Aleph webserver. As you can see, we passed the input and output of the component to the handler fn. We assigned the return value of start-server to a key in the component map so we can close the server later.

  13. stop-fn of the webserver component.

  14. Stops the Aleph server by calling .close on the server object which we assigned to :server in the start-fn.

  15. Removes the :server key to get back to the initial state.

  16. Destructures the port number from the given config map.

  17. Creates and returns a component map with the name ::server.
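
In case the included file doesn’t render above, here is a rough sketch of the web component reconstructed from the callouts; the authoritative version lives in the linked hellhound_examples repository. It assumes Aleph’s aleph.http/start-server and Manifold’s stream and deferred APIs, and some names (e.g. handler) are approximations.

Web server component (sketch)
(ns getting-started.proxy.components.web
  (:require [aleph.http :as http]
            [manifold.stream :as stream]
            [manifold.deferred :as deferred]
            [hellhound.component :as component]))

(defn handler
  "Returns a ring handler. `input` and `output` are the component’s IO streams."
  [input output]
  ;; Treat any value arriving on the input stream as a response:
  ;; resolve the deferred we attached to the request on its way out.
  (stream/consume
   (fn [{:keys [response-deferred response]}]
     ;; `response` may itself be deferred (the crawler attaches one),
     ;; so chain instead of resolving directly.
     (deferred/chain response #(deferred/success! response-deferred %)))
   input)
  (fn [req]
    (let [response-deferred (deferred/deferred)]
      ;; Send the request down the pipeline...
      (stream/put! output {:request req :response-deferred response-deferred})
      ;; ...and return the deferred so Aleph responds once it is realised.
      response-deferred)))

(defn start-fn
  [port]
  (fn [this context]
    (let [[input output] (component/io this)]
      ;; Keep the server object around so stop-fn can close it later.
      (assoc this :server (http/start-server (handler input output)
                                             {:port port})))))

(defn stop-fn
  [this]
  (when-let [server (:server this)]
    (.close server))
  (dissoc this :server))

(defn factory
  [{:keys [port]}]
  (component/make-component ::server (start-fn port) stop-fn))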

So far we have created a webserver component whose output is a stream of hashmaps, each with a :request key containing the request map, and which conceptually sends any response arriving on its input stream back to the user.

Now we need another component to fetch the content of each requested URL from the remote host. Let’s jump in:

Crawler component.
link:https://raw.githubusercontent.com/Codamic/hellhound_examples/master/src/clj/getting_started/proxy/components/crawler.clj[role=include]
  1. Yup, Aleph again. The Aleph library provides an HTTP client as well.

  2. A simple and naive function that extracts data from the remote response and generates a ring response again. It isn’t strictly necessary in this case, but if you ever want to change the response a bit, this function is the best place for that logic.

  3. Fetches the content of the given URL and uses the manifold.deferred/chain function to pass the content to the response function as soon as it arrives. manifold.deferred/chain returns a deferred value.

  4. Returns an anonymous function which takes one parameter, the incoming value from the input stream. We extract the request from the incoming value and fetch the content for that request.

  5. Fetches the content of the URL using the fetch-url function, creates a new hashmap from the received input data with an additional key called :response holding a deferred value representing the content, and pushes it to the output of the crawler component.

  6. Retrieves the content of the given URL. The return value of this function is a deferred value.

  7. hellhound.component/io is a helper function that returns the input and output streams of the given component.

  8. Consumes any incoming value from the input and uses the function returned by the proxy function to process each value.

  9. A dummy stop function which does nothing, because we don’t have any termination or cleanup logic.

  10. Creates a new component with the name :getting-started.proxy.components.crawler/job.
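
Again, in case the include doesn’t render, here is a sketch of the crawler reconstructed from the callouts above. Function names such as proxy-fn are approximations; see the linked file for the real code.

Crawler component (sketch)
(ns getting-started.proxy.components.crawler
  (:require [aleph.http :as http]
            [manifold.stream :as stream]
            [manifold.deferred :as deferred]
            [hellhound.component :as component]))

(defn response
  "Rebuilds a ring response from the fetched remote response.
   A good place to tweak the response if you ever need to."
  [remote]
  (select-keys remote [:status :headers :body]))

(defn fetch-url
  "Fetches `url` from the remote `host`; returns a deferred response."
  [host url]
  (deferred/chain (http/get (str host url)) response))

(defn proxy-fn
  "Returns a function that processes each incoming value."
  [host output]
  (fn [event]
    ;; Attach the (deferred) remote content under :response and pass it on.
    (stream/put! output
                 (assoc event :response
                        (fetch-url host (get-in event [:request :uri]))))))

(defn start-fn
  [host]
  (fn [this context]
    (let [[input output] (component/io this)]
      (stream/consume (proxy-fn host output) input)
      this)))

(defn stop-fn [this] this)   ;; nothing to clean up

(defn factory
  [host]
  (component/make-component ::job (start-fn host) stop-fn))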

Alright, now we have two components from which we can create a system and build our proxy.

System

HellHound’s systems are the most important part of our application. We express our thoughts on how the application should behave using systems.

Let’s create the first version of our system and then talk about the details a little.

The proxy system.
(ns getting-started.proxy.system
  (:require
   [getting-started.proxy.components.web :as web] (1)
   [getting-started.proxy.components.crawler :as crawler])) (2)


;; System factory
(defn factory (3)
  [port host]
  {:components                       (4)
        [(web/factory {:port port})  (5)
         (crawler/factory host)]     (6)

   :workflow                             (7)
        [[::web/server ::crawler/job]    (8)
         [::crawler/job ::web/server]]}) (9)
  1. Our webserver component.

  2. Our crawler component.

  3. A system factory function which gets the configuration of the system and returns a system map based on those values.

  4. Each system must have a :components key with a collection of components as the value.

  5. We created an instance of the webserver component by calling the factory function of the web namespace and passing the webserver configuration to it. We can refer to this component later using its name, which we defined in the web ns.

  6. Just like the webserver component, we created an instance of the crawler.

  7. Every system must have a :workflow key as well. This key describes how HellHound should create the data pipeline by piping the IO of the components together. Its value should be a collection of collections, in which each element represents a connection.

  8. In this connection we plug the output stream of ::web/server (the name of our webserver component) into the input stream of ::crawler/job (the name of our crawler component). So any value that goes to the ::web/server output will be available to ::crawler/job through its input stream.

  9. In this connection we connect ::crawler/job back to ::web/server, so values from the crawler’s output will go to the webserver’s input stream.

As you can see, the system definition is fairly straightforward. HellHound starts all the components by calling their start functions first, and then sets up the data pipeline based on the :workflow description.

In our example, when we send a request to our webserver, the ::web/server component processes the request and sends it to its output, so the request ends up in the input of ::crawler/job. The crawler then fetches the content for any incoming request and puts it on its output, which, according to the :workflow, is connected to the input of ::web/server. The webserver component sends any value coming in from its input stream back to the user as the response. Pretty simple, isn’t it?

Systems decouple the logic of the program into smaller pieces (components). It’s easier to reason about each piece, and much easier to replace one piece with another. Thanks to the input/output abstraction, systems are highly composable and components are heavily reusable.

Now it’s time to use our system and run our program. Here is our core namespace and main function:

Core namespace.
link:https://raw.githubusercontent.com/Codamic/hellhound_examples/master/src/clj/getting_started/proxy/core.clj[role=include]
  1. Creates an instance of our system by passing the configuration to the factory function.

  2. Sets the default system of our program. Each program can have only one default system at any given time.

  3. Fires up the default system (starts the program).
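
In case the include doesn’t render, the core namespace looks roughly like this. Note that set-system! and start! are assumed names for the “set the default system” and “start” operations described above; check the systems documentation for the exact API.

Core namespace (sketch)
(ns getting-started.proxy.core
  (:require [hellhound.system :as system]
            [getting-started.proxy.system :as proxy]))

(defn -main
  [host port]
  ;; Create a system instance from the configuration, set it as the
  ;; default system, then fire it up (assumed API names below).
  (system/set-system! (proxy/factory (Integer/parseInt port) host))
  (system/start!))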

Now it’s time to run our program using lein. Issue the following command in the project root:

lein run -m getting-started.proxy.core/-main http://hellhound.io 3000

Or if you’re using your own project:

lein run http://hellhound.io 3000

Open up your browser and navigate to http://localhost:3000/. You should see HellHound’s website loading through your proxy.

Version 2.0

Let’s improve our code so we can serve the index of the website locally, from a file (I actually use this program in my daily development). But first, let’s think about what we want to do. Generally, in order to do such a thing, we should look at the :uri value inside the request map of a ring handler. If it is /, we need to load the content from a file instead of sending a request to the remote host. Other URIs we treat the same as before.

We can use the system’s workflow to achieve our goal, but we need more components. Let’s create a component which loads the index content from a file.

Index loader component
link:https://raw.githubusercontent.com/Codamic/hellhound_examples/master/src/clj/getting_started/proxy/components/index_loader.clj[role=include]
  1. Returns a function that processes any incoming value from the input and puts the result onto the output stream.

  2. event will hold each of the incoming values.

  3. Get the home directory of the user.

  4. Set the file path to $HOME/index.html

  5. Reads the content of the file, assigns it to a new key in the map, and sends the map to the output.

  6. Consumes any incoming value and processes it using the anonymous function returned by the load-index fn.

  7. Creates a component and call it ::job.
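
If the include doesn’t render, here is a sketch of the index loader reconstructed from the callouts above; the :index-content key name is a hypothetical placeholder.

Index loader component (sketch)
(ns getting-started.proxy.components.index-loader
  (:require [manifold.stream :as stream]
            [hellhound.component :as component]))

(defn load-index
  "Returns a function that processes each incoming value."
  [output]
  (fn [event]
    (let [home (System/getProperty "user.home")   ;; the user's home directory
          path (str home "/index.html")]          ;; $HOME/index.html
      ;; Read the file, attach its content under a new key
      ;; (:index-content is a hypothetical name), and send the map on.
      (stream/put! output (assoc event :index-content (slurp path))))))

(defn start-fn
  [this context]
  (let [[input output] (component/io this)]
    (stream/consume (load-index output) input)
    this))

(defn stop-fn [this] this)   ;; nothing to clean up

(defn factory
  []
  (component/make-component ::job start-fn stop-fn))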

Now, just for fun, let’s create another component which reads from the input stream, constructs a ring response from any incoming value, and puts the generated ring response onto the output stream. Totally simple and easy to build.

Response generator component
link:https://raw.githubusercontent.com/Codamic/hellhound_examples/master/src/clj/getting_started/proxy/components/response.clj[role=include]
  1. Let’s call this component ::→response.

As you can see, the response generator component is very simple.
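
In case the include doesn’t render, a sketch of this component follows. The :index-content key and the header map are assumptions carried over from the loader sketch above.

Response generator component (sketch)
(ns getting-started.proxy.components.response
  (:require [manifold.stream :as stream]
            [hellhound.component :as component]))

(defn start-fn
  [this context]
  (let [[input output] (component/io this)]
    ;; Build a ring response from each incoming value and pass it on
    ;; under :response, mirroring what the crawler branch produces.
    (stream/consume
     (fn [event]
       (stream/put! output
                    (assoc event :response
                           {:status  200
                            :headers {"Content-Type" "text/html"}
                            :body    (:index-content event)})))
     input)
    this))

(defn stop-fn [this] this)   ;; nothing to clean up

(defn factory
  []
  (component/make-component ::->response start-fn stop-fn))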

Now it is time to improve our system and workflow.

Improved system
link:https://raw.githubusercontent.com/Codamic/hellhound_examples/master/src/clj/getting_started/proxy/system.clj[role=include]
  1. A helper function that extracts the URI from the ring request map.

  2. In this connection we want all values whose :uri is not / to go from ::web/server to the ::crawler/job component. The second value in a connection collection (when the connection collection contains more than two values) is a predicate function. Only if that predicate returns true will values pass to the downstream component.

  3. The predicate of this connection lets only requests which have / as the :uri value pass down to the downstream component, which is ::loader/job.

  4. Same as before, we want the output of ::crawler/job connected to the input of ::web/server.

  5. Connects the output of ::loader/job to the input of ::response/→response. This step might be unnecessary (we could have done the work in the same component), but to demonstrate a more complex workflow we separated the components.

  6. And finally, the output of the ::response/→response component (which will be a stream of ring responses) should go to ::web/server. A sketch of the whole system follows this list.

In this example we branched our workflow off from ::web/server based on a couple of predicate functions. Each branch does its own job and has its own workflow; finally, we merge both pipes back into ::web/server again. So any request flowing from the webserver that is not a request for the root path goes to ::crawler/job, and its content is fetched from the actual host, while a request for the root path goes to ::loader/job and is loaded from a file on the filesystem. Pretty neat, right?

I hope this tutorial gave you an idea of what HellHound systems are about. HellHound is definitely more than this, but now you have enough knowledge to get started with it.