GSoC2015 Proposal: Riak destination (htrap)

Parth Oberoi edited this page Mar 22, 2015 · 1 revision

##Abstract

####Aim:- Adding Riak as a Destination to Syslog-ng

Syslog-ng is the trusted log management infrastructure for hundreds of thousands of users worldwide. Organizations use Syslog-ng to reliably and securely collect, process and store log messages from across their IT environments. Syslog-ng is rightly known as the "Swiss Army Knife of logging". Riak, on the other hand, is an open source, distributed database architected for:

  • Low-Latency: Stores data and serves requests predictably and quickly, even during peak times.
  • Availability: Replicates and retrieves data intelligently, making it available for read and write operations even in failure conditions.
  • Fault-Tolerance: Riak is fault-tolerant so you can lose access to nodes due to network partition or hardware failure and never lose data.
  • Operational Simplicity: Allows you to add machines to the cluster easily, without a large operational burden.
  • Scalability: Automatically distributes data around the cluster and yields a near-linear performance increase as capacity is added.

While Syslog-ng already has destinations such as MongoDB and Redis, to name a few, Riak offers various ways to query your data beyond the basic key/value operations, such as Full-Text Search, built-in MapReduce and Secondary Indexes, which make it a powerful add-on to the Swiss Army Knife (Syslog-ng). In addition, Riak Search 2.0 integrates with Solr, which is itself a popular open source search platform built on Apache Lucene.

Although Riak was originally built as a mostly data-agnostic key/value store, Riak's Data Types enable you to use Riak as a data-aware system and thus to perform a variety of operations on five CRDT-inspired data types: flags, registers, counters, sets, and maps. They enable applications to use CRDTs through a simple interface, without being exposed to the complex state-based logic underneath.

##Goals for the project

While a lot can be accomplished in adding Riak as a Destination to Syslog-ng, the tasks listed below are the ones I would like to complete within the GSoC project time-frame.

  • To begin with, adding a simple destination that places each log message under its own (unique) key, in a configurable (template-able) bucket. Bucket types enable you to create bucket configurations and assign those configurations to as many buckets as you wish.

  • Adding support for data types (most likely Sets) :- All of the values in a set are unique. For example, if you attempt to add the element shovel to a set that already contains shovel, the operation will be ignored by Riak. Sets can be used either on their own or embedded in a map. To make it easier to query a range of messages, one can time-box them into a single key using a Set, which makes by-key queries much more efficient.

  • Support for the Protocol Buffers API, also known as the PBC API in Riak :- The PBC API supports Bucket Operations, Object/Key Operations, Query Operations, Server Operations, Bucket Type Operations and Yokozuna (code name for Riak Search 2.0) operations.
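The time-boxing idea in the Sets goal can be illustrated with a small C sketch. It assumes, purely for illustration, that the key of a Set is the start of its time box in seconds since the UNIX epoch; `timebox_key` and `in_same_box` are hypothetical helpers, not part of Syslog-ng or Riak, and the real key layout is still to be designed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the key of the time box that `stamp`
 * (seconds since the UNIX epoch) falls into. The key is the start of the
 * box, so every message logged within one box maps to the same Set.
 * Returns 0 on success, -1 on bad input or if the buffer is too small. */
int timebox_key(long long stamp, long long box_seconds,
                char *out, size_t out_len)
{
    long long box_start;
    int written;

    assert(out != NULL);
    if (stamp < 0 || box_seconds <= 0 || out_len == 0)
        return -1;

    /* Round the timestamp down to a multiple of the box size. */
    box_start = stamp - (stamp % box_seconds);
    written = snprintf(out, out_len, "%lld", box_start);
    if (written < 0 || (size_t)written >= out_len)
        return -1;
    return 0;
}

/* Hypothetical helper: returns 1 if both timestamps would be stored under
 * the same key, 0 if not, -1 on error. */
int in_same_box(long long a, long long b, long long box_seconds)
{
    char key_a[32], key_b[32];

    if (timebox_key(a, box_seconds, key_a, sizeof key_a) != 0 ||
        timebox_key(b, box_seconds, key_b, sizeof key_b) != 0)
        return -1;
    return strcmp(key_a, key_b) == 0;
}
```

With a one-hour box (`box_seconds = 3600`), all messages from the same hour share one key, so a range query becomes a handful of by-key fetches rather than a scan over many unique keys.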

####Optional -

  • Riak Security (authentication & encryption) support :- As of version 2.0, Riak administrators can selectively apportion access to a wide variety of Riak's functionality, including accessing, modifying, and deleting objects, changing bucket properties, and running MapReduce jobs.

Since achieving this correctly would be hard in the GSoC time-frame, I would like to pursue this as an optional goal, only after the Goals above have been completed.

##Development Tools

For implementation of the project, the following tools, libraries and technologies will be used primarily:

  • The C language, because that's what Syslog-ng is written in.
  • Flex and Bison, to teach Syslog-ng's config parser about the new driver's options.
  • The Syslog-ng Incubator, the staging and experimental module collection for Syslog-ng. Most of the work would be done here, and not in Syslog-ng core.
  • Google Protocol Buffers - the serialization format Riak uses, and which the destination driver will use to talk to it. The protobuf-c library will be used to help with this.
  • Riak, because the entire purpose of the project is to create a pipeline between Riak and Syslog-ng.

I explored the Riak client options and tried compiling the existing riak-c-client, but it has no support for CRDTs and other Riak 2.0 features useful for Syslog-ng. I was also told in the #riak IRC channel by one of the ops that riak-c-client has been abandoned. Working with an abandoned client would be pointless, and it should take less effort to implement a new client, one that does only as much as Syslog-ng strictly needs.
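To give a feel for how small the wire-level part of such a client can be, here is a hedged C sketch of PBC message framing. As I understand the protocol, every message is sent as a 4-byte big-endian length (counting the message code plus the payload), a 1-byte message code, and then the Protocol Buffers payload. The function name `riak_frame` and the constants are hypothetical; in the real client the payload bytes would come from protobuf-c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Message codes for ping, as assigned by the PBC protocol. */
enum { RIAK_MSG_PING_REQ = 1, RIAK_MSG_PING_RESP = 2 };

/* 4-byte length prefix plus 1-byte message code. */
#define RIAK_HEADER_LEN ((size_t)5)

/* Hypothetical helper: frames an already-serialized payload into `out`.
 * Returns the number of bytes written, or 0 if the payload is too large
 * for the length field or `out` is too small. */
size_t riak_frame(uint8_t code, const uint8_t *payload, size_t payload_len,
                  uint8_t *out, size_t out_cap)
{
    uint32_t wire_len;

    assert(out != NULL);
    assert(payload != NULL || payload_len == 0);
    if (payload_len > UINT32_MAX - 1u)
        return 0;
    if (out_cap < RIAK_HEADER_LEN || out_cap - RIAK_HEADER_LEN < payload_len)
        return 0;

    /* The length on the wire covers the message code and the payload. */
    wire_len = (uint32_t)payload_len + 1u;
    out[0] = (uint8_t)(wire_len >> 24);
    out[1] = (uint8_t)(wire_len >> 16);
    out[2] = (uint8_t)(wire_len >> 8);
    out[3] = (uint8_t)wire_len;
    out[4] = code;
    if (payload_len > 0)
        memcpy(out + RIAK_HEADER_LEN, payload, payload_len);
    return RIAK_HEADER_LEN + payload_len;
}
```

A ping request has no payload, so `riak_frame(RIAK_MSG_PING_REQ, NULL, 0, buf, sizeof buf)` writes the five bytes `00 00 00 01 01`. That makes ping a convenient first message to implement when testing the connection.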

##Set of Tasks -

  • Getting to know the LogThreadedDestDriver architecture within Syslog-ng, whose classes are built for all destination drivers that use a worker thread for their (blocking) operations.

  • Design the configuration that the plugin would recognize, post it to the mailing list and gather feedback from users.

  • At the same time, start implementing a Syslog-ng-incubator plugin that accepts such configuration, but doesn't do anything else, just a dummy skeleton.

  • Start creating a Riak client library in C, by gradually adding functionality to Syslog-ng:

    • Connect to Riak
    • Store a value in a key
    • Store a value in a Set
  • Then, once we can store values in Riak, add template support to the Syslog-ng part of the project.

##Deliverables

  • A new destination driver for Syslog-ng-incubator.

    • Can connect to Riak using the PBC API. If time allows, using Riak Security too (for encryption and authentication).
    • The host and port of the Riak cluster are configurable. The driver does not need to support fallback hosts as it expects to be connected to a load balancer in front of Riak.
    • The bucket-type, bucket and key are configurable, all of them as templates.
    • The storage model is configurable: either unique keys, or using a Set. It is not yet decided how this would be configured, but it must be settable. This option is not a template-able option.
    • Optionally, various write parameters, such as w, dw, pw, timeout, sloppy_quorum and n_val, could be settable (none of them template-able), where
      • pr and pw set how many primary nodes must be available before a read or a write, respectively. Without them, Riak will read from or write to backup nodes if a primary node is unavailable
      • dw represents the minimum number of durable writes necessary for success
      • sloppy_quorum means that if any primary (expected) node is unavailable, the next available node in the ring will accept requests
      • n_val represents the number of nodes to replicate to
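The deliverables above could map onto an options structure roughly like the following C sketch. Every name here is hypothetical, since the configuration is still to be designed with the community. It assumes 8087 as the PBC port and an `n_val` of 3 (Riak's defaults as far as I know), treats a quorum value of 0 as "leave it to the bucket default", and checks that no quorum value exceeds `n_val`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Storage model: one key per message, or messages grouped in a Set. */
typedef enum { RIAK_STORE_UNIQUE_KEY, RIAK_STORE_SET } RiakStorageMode;

/* Hypothetical option block for the destination driver. */
typedef struct {
    const char *host;
    uint16_t port;
    const char *bucket_type;  /* template string */
    const char *bucket;       /* template string */
    const char *key;          /* template string; NULL = driver picks one */
    RiakStorageMode mode;     /* not template-able */
    uint32_t n_val;           /* number of replicas */
    uint32_t w, dw, pw;       /* 0 = use the bucket default (assumption) */
    uint32_t timeout_ms;      /* 0 = no explicit timeout (assumption) */
    bool sloppy_quorum;
} RiakDestOptions;

/* Defaults chosen for illustration only. */
RiakDestOptions riak_dest_options_defaults(void)
{
    RiakDestOptions o = {
        .host = "127.0.0.1", .port = 8087,
        .bucket_type = "default", .bucket = "syslog", .key = NULL,
        .mode = RIAK_STORE_UNIQUE_KEY,
        .n_val = 3, .w = 0, .dw = 0, .pw = 0,
        .timeout_ms = 0, .sloppy_quorum = true,
    };
    return o;
}

/* Returns true if the options are internally consistent. */
bool riak_dest_options_valid(const RiakDestOptions *o)
{
    assert(o != NULL);
    if (o->host == NULL || o->host[0] == '\0' || o->port == 0)
        return false;
    if (o->bucket == NULL || o->bucket[0] == '\0')
        return false;
    if (o->n_val == 0)
        return false;
    /* A quorum cannot ask for more nodes than there are replicas. */
    if (o->w > o->n_val || o->dw > o->n_val || o->pw > o->n_val)
        return false;
    /* A Set needs an explicit key so messages can be grouped under it. */
    if (o->mode == RIAK_STORE_SET && (o->key == NULL || o->key[0] == '\0'))
        return false;
    return true;
}
```

Validating once at configuration time keeps the worker thread simple: by the time it talks to Riak, it can assume the options are consistent.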

##Scheduled plan -

  • Between 27 April and 25 May: The Community Bonding Period

    • Get to know the LogThreadedDestDriver classes and understand their usage by looking at other destinations already set up in Syslog-ng
    • Understand the configurations of both Syslog-ng and Riak relevant to the project that would help me in designing the Plugin
    • Post my understanding to the mailing list and clear up ambiguities with the mentor and other members
    • Revise this wiki page as per the feedback.
  • Between 25 May and 21 August: The Coding Period

    • week 1: Design the configuration that the Plugin would understand, iterating on the community feedback
    • week 2 and week 3: Implement a Syslog-ng incubator plugin that understands the above configuration, the basic skeleton to be implemented here.
    • week 4: Documentation and testing of the above work, also a buffer for uncompleted work or extension.
    • week 5: Creating a Riak client library in C that connects to Riak and stores a value in a key
    • week 6: Add functionality to the above client to store values in a Set, either using unique keys for values or using a key for a Set (to be decided later with the help of the mentor)
    • week 7: Documentation and testing for the above implemented library
    • week 8: Buffer period for any unexpected behaviour of the code or to complete the above tasks if incomplete
    • week 9 to week 10: Add template support to the Syslog-ng part of the project
    • week 11: Documentation and testing of the above tasks
    • week 12: Buffer Period

I have not yet included the optional tasks like authentication and encryption in the above schedule, but there are a couple of buffer weeks, so if everything goes well and the above tasks are completed on time, I would like to implement support for Riak's security features in the last couple of weeks. The documentation and testing after each major implementation would also help me get community feedback, by providing a proof of concept along with documentation of the configurable settings and templates for users.

Throughout the time-line I would be taking feedback and suggestions from the mentor. Since these conversations would be held via the mailing list, the entire community could give me feedback and suggestions, which would be of great help in completing the project successfully. With experienced developers testing the project, the quality of the documentation can also be ensured.

##About Me:

  • Name - Parth Oberoi
  • Age - 21
  • IRC nick - htrap
  • email_id - htrapdev@gmail.com
  • phone - +918904505281
  • github_handle - htrap
  • blog - thetechtrap.com (under development)
  • College - PES Institute Of Technology South Campus (PESIT)
  • Branch - Information Science Engineering
  • level - Undergraduate

I am a technology addict and like to learn everything that interests me, mainly programming and other technical things. The main reason for choosing this particular project was the learning opportunities listed on the ideas page. I am an open source enthusiast and have been a Linux user for quite some time now. I also have working knowledge of Python, Ruby, C, git, lex and yacc. I have been using git and GitHub for almost all the projects I do, so I am accustomed to making regular and meaningful commits. I am hardworking and committed to the work I do.

####Other Commitments:

During the project time-line I would also be enrolled in a couple of online courses, which would require 10-15 hours of work per week. This can be managed alongside the project schedule and would not affect its development. Other than that, I have no commitments for the GSoC period.
