Brian Redd edited this page Jan 20, 2018 · 8 revisions

Like most web applications, Augeo draws on several technologies. It is a single-page application running on the MEAN stack: MongoDB, ExpressJs, AngularJs, and NodeJs. The client (front end) is written in AngularJs v1.5.3 and communicates with the server through APIs written with Node's Express framework. The APIs act as 'Controllers', if you're familiar with MVC (Model-View-Controller), and leverage service objects to perform business logic. Augeo's database layer is implemented in MongoDB, a NoSQL database that works well with NodeJs since both employ JavaScript and JSON.

A Simple Request

The easiest way to understand how the application was designed is to walk through some examples. Let's start with the process for a user to log in. First, the client sends a POST request to Augeo's server containing the user's credentials.

Once the NodeJs backend receives the login request, it is forwarded to the user-api. From there, the user-service is leveraged to verify the user's credentials and to obtain the user's information.

This is a simplified example of the login procedure, but it gives a high-level overview of how a request is processed. The application currently consists of the following APIs:

  • augeo-api - high level API that routes requests to a specific API
  • github-api - API responsible for Github specific requests
  • twitter-api - API responsible for Twitter specific requests
  • user-api - API responsible for Augeo user requests

All requests pertaining to these APIs will more or less follow the same flow.
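
The flow described above (top-level API, then a specific API, then a service) can be sketched in plain JavaScript. This is only an illustration, not Augeo's actual code: the path format `/user-api/login`, the in-memory user store, and every function name here are assumptions, and Express is left out so the sketch runs on its own.

```javascript
// Hypothetical in-memory user store; Augeo itself keeps users in MongoDB.
// A real service would compare password hashes, not plain text.
const users = new Map([
  ['brian@example.com', { password: 'secret', name: 'Brian' }],
]);

// user-service: business logic only, with no knowledge of requests or responses.
const userService = {
  login(email, password) {
    const user = users.get(email);
    if (!user || user.password !== password) return null;
    return { email, name: user.name };
  },
};

// user-api: the "controller". It unpacks the request and delegates to the service.
function userApi(action, body) {
  if (action !== 'login') return { status: 404, body: { error: 'Not found' } };
  const user = userService.login(body.email, body.password);
  return user
    ? { status: 200, body: user }
    : { status: 401, body: { error: 'Invalid credentials' } };
}

// augeo-api: high-level router that forwards each request to a specific API.
const apis = new Map([['user-api', userApi]]);

function augeoApi(request) {
  // A path such as "/user-api/login" splits into ['', 'user-api', 'login'].
  const [, apiName, action] = request.path.split('/');
  const api = apis.get(apiName);
  if (!api) return { status: 404, body: { error: 'Not found' } };
  return api(action, request.body);
}
```

Keeping the service free of request handling means the same login logic can be exercised directly in tests or reused by another API.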

Twitter Event Queues

When a person signs up with Augeo, the first task is to retrieve all of their past Twitter activity (i.e., tweets and mentions). Twitter rate-limits its API, so requests can only be made periodically, and a queueing system was implemented to pace the traffic sent to Twitter. Twitter has separate API endpoints for retrieving tweets and mentions, so two queues are used to retrieve a user's data in parallel. The two calls have different rate limits: 1 request every 3 seconds for a user's tweets, and 1 request every minute for a user's mentions.

Most APIs have some sort of rate limit for requests hitting their servers, so this system will be important when expanding Augeo to different interfaces.
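
A minimal sketch of such a rate-limited queue follows, using the two intervals given above. The class name `RateLimitedQueue`, the `tick` method, and the `start` driver are hypothetical and chosen for illustration; Augeo's own queue implementation may look quite different.

```javascript
// Releases at most one task per `intervalMs` milliseconds.
class RateLimitedQueue {
  constructor(intervalMs) {
    this.intervalMs = intervalMs;
    this.tasks = [];
    this.nextAllowedAt = 0; // earliest time (ms) at which the next task may run
  }

  add(task) {
    this.tasks.push(task);
  }

  // Runs the next task if the rate limit allows it at time `now` (ms).
  // Returns true if a task ran, false otherwise.
  tick(now) {
    if (this.tasks.length === 0 || now < this.nextAllowedAt) return false;
    const task = this.tasks.shift();
    this.nextAllowedAt = now + this.intervalMs;
    task();
    return true;
  }
}

// One queue per Twitter endpoint, each with its own limit.
const tweetQueue = new RateLimitedQueue(3 * 1000); // 1 request every 3 seconds
const mentionQueue = new RateLimitedQueue(60 * 1000); // 1 request every minute

// Hypothetical driver: polls every queue on a timer. Real tasks would call
// Twitter's API; none are added here.
function start(queues, pollMs = 500) {
  return setInterval(() => queues.forEach((queue) => queue.tick(Date.now())), pollMs);
}
```

Passing the current time into `tick` rather than reading the clock inside it keeps the queue easy to test, and supporting another rate-limited interface only requires a new instance with a different interval.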

Twitter Stream Queue

The Event Queue tackles the problem of retrieving past Twitter activity, but it is not a good fit for capturing real-time data. That is where the Stream Queue comes into play. Twitter's API lets third-party applications open a stream connection. The stream is configured in Augeo such that an event fires whenever a user tweets or is mentioned. Since events for multiple users can be triggered at the same time, the Stream Queue captures the events to ensure they are all accounted for.
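
The buffering idea can be sketched as a small first-in, first-out queue that the stream callback pushes into. The `StreamQueue` class and the shape of the event objects are assumptions made for this example; the Twitter stream connection itself is not shown.

```javascript
// Buffers stream events so that events arriving together are each handled
// exactly once, in arrival order.
class StreamQueue {
  constructor(handler) {
    this.handler = handler; // called once per event
    this.events = [];
    this.draining = false;
  }

  // Intended to be called from the stream callback for every tweet or mention.
  push(event) {
    this.events.push(event);
    this.drain();
  }

  drain() {
    // If a drain is already in progress (an event arrived while another was
    // being handled), the running loop will pick the new event up.
    if (this.draining) return;
    this.draining = true;
    try {
      while (this.events.length > 0) {
        this.handler(this.events.shift());
      }
    } finally {
      this.draining = false;
    }
  }
}
```

An event that arrives while another is still being handled waits in the buffer instead of being dropped or interleaved with the current one.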

Github Event Queue

The Github Event Queue is used to capture commits pushed by the authenticated Github user. This queue is very similar to the Twitter Event Queue, but instead of executing a queue-task and discarding it afterwards, the queue places the queue-task back onto the queue to be executed again. We refer to this as a revolving queue. Since Github does not have a Stream API, the revolving queue continuously polls Github for updates.
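
A revolving queue can be sketched in a few lines. The `RevolvingQueue` class and the `execute` method on tasks are hypothetical names; in Augeo, each task would poll Github for one user's new commits.

```javascript
// A queue whose tasks are never discarded: each task is executed and then
// placed back at the tail so that it runs again on a later iteration.
class RevolvingQueue {
  constructor() {
    this.tasks = [];
  }

  add(task) {
    this.tasks.push(task);
  }

  // Executes the task at the head of the queue and re-queues it.
  // Returns false if the queue is empty.
  runNext() {
    if (this.tasks.length === 0) return false;
    const task = this.tasks.shift();
    try {
      task.execute();
    } finally {
      // Re-queue even if the poll failed, so the user keeps being polled.
      this.tasks.push(task);
    }
    return true;
  }
}
```

Calling `runNext` on a timer polls every user in turn, indefinitely.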

Fitbit Event Queue

The Fitbit Event Queue is a revolving queue very similar to the Github Event Queue, but with one major difference. Since Github has a rate limit on an application basis, requests can only be made periodically, no matter which user it is. With the Fitbit API, the rate limit is dependent on the user, not the application. Therefore, a request per user can be made every queue iteration.
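
The difference between the two rate-limit scopes can be illustrated by which users get polled in a single queue iteration. The function `usersToPoll` and the scope labels `'application'` and `'user'` are invented for this sketch.

```javascript
// Returns the users that may be polled during one queue iteration.
//  - 'application' scope (Github): all users share one budget, so only one
//    user is polled per iteration, rotating through the list.
//  - 'user' scope (Fitbit): each user has their own budget, so every user
//    can be polled in the same iteration.
function usersToPoll(users, iteration, scope) {
  if (users.length === 0) return [];
  if (scope === 'user') return users.slice();
  return [users[iteration % users.length]];
}
```

With three users, an application-scoped queue needs three iterations to reach everyone, while a user-scoped queue reaches everyone in one.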

Schemas

MongoDB does not require schemas. However, schemas are a component of the Mongoose library, which Augeo uses to give structure to the documents that reside in its collections.

Currently, the application enforces the following schemas:

  • AUGEO_ACTIVITY - Contains data specific to an Augeo activity that is generated by an interface (e.g., Twitter and Github).
  • AUGEO_USER - Contains data specific to an Augeo user including signup information and skill data.
  • GITHUB_COMMIT - Contains all relevant information from a git commit that was pushed to Github.
  • GITHUB_USER - Contains user information from Github.
  • TWITTER_TWEET - Contains all relevant information from a tweet from Twitter.
  • TWITTER_USER - Contains user information from Twitter.

Natural Language Processing

Classifying users' online activity into skills is the heart of Augeo, and that is where Natural Language Processing (NLP) comes in. Specifically, a Naive Bayes classifier is trained and then applied to user data. Currently, the classification process is relatively simple. Words are first categorized into 3 tiers for each skill, where Tier 1 words are given more weight than Tier 2 words and Tier 2 words are given more weight than Tier 3 words. These words are then fed to the Naive Bayes training algorithm, which produces a JSON file that represents the classifier. As user activity comes into Augeo's servers, the activity is run through the classifier and classified accordingly. Since the algorithm isn't perfect, we intend to create a flagging system that allows users to request a classification change. Together, an improved classification algorithm and a flagging system should give users a seamless Augeo experience.
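
The training and classification steps can be sketched with a small hand-rolled multinomial Naive Bayes classifier using add-one smoothing. Several things here are assumptions rather than Augeo's actual approach: the tier weights (3, 2, 1), applying a weight by counting a word that many times, the uniform prior, and the example skills and words. Augeo may well rely on a library instead.

```javascript
const TIER_WEIGHTS = [3, 2, 1]; // hypothetical weights for Tier 1, Tier 2, Tier 3

// Builds a plain-object model from { skill: [tier1Words, tier2Words, tier3Words] }.
function train(skills) {
  const model = { counts: {}, totals: {}, vocabulary: [] };
  const vocabulary = new Set();
  for (const [skill, tiers] of Object.entries(skills)) {
    const counts = Object.create(null);
    let total = 0;
    tiers.forEach((words, tier) => {
      for (const word of words) {
        // A higher tier weight counts as more occurrences of the word.
        counts[word] = (counts[word] || 0) + TIER_WEIGHTS[tier];
        total += TIER_WEIGHTS[tier];
        vocabulary.add(word);
      }
    });
    model.counts[skill] = counts;
    model.totals[skill] = total;
  }
  model.vocabulary = [...vocabulary];
  return model;
}

// Safe lookup that ignores inherited properties such as "constructor".
function countOf(counts, word) {
  return Object.prototype.hasOwnProperty.call(counts, word) ? counts[word] : 0;
}

// Returns the skill with the highest log-probability for the given text.
function classify(model, text) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const skills = Object.keys(model.counts);
  const vocabSize = model.vocabulary.length;
  let best = null;
  let bestScore = -Infinity;
  for (const skill of skills) {
    let score = Math.log(1 / skills.length); // uniform prior (assumption)
    for (const word of words) {
      // Add-one smoothing keeps unseen words from zeroing out a skill.
      score += Math.log((countOf(model.counts[skill], word) + 1) / (model.totals[skill] + vocabSize));
    }
    if (score > bestScore) {
      best = skill;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical skills and tier words.
const skills = {
  'Software Engineering': [['code', 'commit'], ['bug', 'deploy'], ['computer']],
  Fitness: [['workout', 'gym'], ['run', 'steps'], ['healthy']],
};

// The trained model is plain data, so it can be stored as JSON and reloaded.
const classifierJson = JSON.stringify(train(skills));
const classifier = JSON.parse(classifierJson);
```

With this toy data, `classify(classifier, 'Pushed a commit to fix a bug')` returns `'Software Engineering'`. Text containing no known words scores the same for every skill and falls back to the first one, which is one reason a flagging system is a useful complement.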

Conclusion

A great design will keep Augeo maintainable and reduce the number of defects over the lifetime of the project. We therefore welcome feedback and other design considerations. If you have any ideas or suggestions, please leave a comment in our design forum in the issues section of this repository. If you are interested in contributing to Augeo, head over to our contributing page for details on app installation and our coding standards.