
Technical Approach

ericlucb edited this page Mar 3, 2017 · 15 revisions

We default to open source for all of our work. As such, we are happy to outline our engineering plan, our technology stack of open source tools and technologies, and our data ingestion techniques.

Schematic

(Architecture schematic image: an overview of the components described below.)

Application Architecture

The application uses the React framework to build an easily maintainable and interactive single-page application. We decided to use Google’s Firebase PaaS to get simple hosting, authentication, and a real-time database out of the box. After the static files of the application are built in a Docker container by our CI system, they are deployed to Firebase Hosting, where they are accessible to the client. The client also uses Mapbox for an interactive map and the Google Places API to look up geographic coordinates. Firebase has the advantage of automated real-time syncing of data, which allows us to show new users and hazards in the admin view as soon as they are added to the database. The second part of the application is a Node.js service running in a Docker container in Amazon’s Elastic Container Service (ECS). Here, too, the real-time nature of Firebase is advantageous, because it allowed us to easily implement a message queue for SMS notifications: as soon as the client creates a new entry in the notifications collection on Firebase, the backend is notified of the new entry and sends a notification through the Twilio API.
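The queue mechanism described above can be sketched as follows. This is an illustrative sketch, not the project’s actual code: the entry fields (`phone`, `message`) and variable names are assumptions, and the Firebase/Twilio wiring is shown only in a comment.

```javascript
// Hypothetical sketch of the SMS notification queue. Each entry in the
// Firebase `notifications` collection is assumed to look like
// { phone: '+1...', message: '...' } -- field names are illustrative.
function buildSms(entry, fromNumber) {
  if (!entry || !entry.phone || !entry.message) {
    throw new Error('invalid notification entry');
  }
  // Shape of the payload accepted by Twilio's messages.create() call.
  return { to: entry.phone, from: fromNumber, body: entry.message };
}

// In the real backend this would be wired into Firebase's child_added
// event and the Twilio REST client, roughly:
//
//   db.ref('notifications').on('child_added', snap => {
//     twilio.messages.create(buildSms(snap.val(), FROM_NUMBER));
//     snap.ref.remove(); // dequeue after sending
//   });
```

Because Firebase fires `child_added` for every new entry, the collection itself acts as the message queue: the client only writes an entry, and the backend picks it up with no polling.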

Technology Stack

The following is a description of the technologies we used and how we used them to create this prototype.

Technologies:

  • React: React is a frontend framework for single-page applications and is usually used in combination with a Flux implementation. We decided not to include any Flux implementation because of the simplicity of the project and the reactive nature of the real-time Firebase database. We simply built an ad-hoc layer around the Firebase API, which would be replaced by a more explicit and more maintainable Redux store should this project grow in complexity.
  • Webpack: A build system for web applications that we use to compile the code from JSX to JavaScript and to include assets such as CSS. It also comes with a convenient development server that features hot reloading of the application on file changes.
  • Eslint: We use extensive and strict linters on all of our projects to simplify adherence to coding standards and automatically check for best practices.
  • Jest: A test runner that can be set up with minimal configuration and features a great watch mode to continuously execute tests during development.
  • Targaryen: A library that lets us write tests against the security rules of Firebase.
  • Docker: We use Docker to easily deploy our application to any cloud service of choice. The biggest advantage is that we can guarantee that the application runs in exactly the same environment in which it was tested. We also do all local development within Docker containers, which reduces the setup of the development environment on a new computer to fewer than three terminal commands.
  • Python Data Science Stack (Jupyter, Pandas, …): Jupyter notebooks in combination with Pandas are the hammer and nail of every data scientist. We use them to investigate datasets and communicate findings. To make sure that our notebooks are of high quality, we strictly follow our in-house style guide and are currently developing a linter to automatically check compliance with it.
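The ad-hoc layer around the Firebase API mentioned under React can be pictured as a minimal subscribe/publish module. This is an illustrative sketch under assumed names, not the project’s actual code:

```javascript
// Minimal sketch of an ad-hoc data layer in place of a Flux store
// (illustrative only): components subscribe to a key and are notified
// whenever Firebase pushes a new value for it.
function createStore() {
  const listeners = {};
  return {
    // Register a callback for a given data key, e.g. 'hazards'.
    subscribe(key, fn) {
      (listeners[key] = listeners[key] || []).push(fn);
    },
    // Called from a Firebase 'value' event handler in the real app;
    // fans the new value out to every subscribed component.
    publish(key, value) {
      (listeners[key] || []).forEach(fn => fn(value));
    },
  };
}
```

A component would typically subscribe in `componentDidMount` and feed the values into `setState`; swapping this layer for a Redux store later would not change the components’ interface much, which is why the ad-hoc approach was acceptable for a prototype.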

Services and Platforms:

  • Firebase: Firebase is a real-time database that comes with a set of other services that are convenient for prototyping a project like this one, while also promising to scale to thousands of concurrent users. Firebase allowed us to host the app with a simple deployment of static assets. We also used the real-time database, which automatically pushes changes to all connected clients. This allows us to update in real time the users that an administrator sees on the map, and to notify our notification-handler backend that a new SMS should be sent out via Twilio.
  • AWS ECS: We already have a cluster of nodes running to which we deploy containers for our other projects, so we created a new service on this cluster that always keeps one instance of the Docker container with our notification handler running. To deploy a new version, we simply kill a task of this service and it recreates a task with the latest image of the backend container.
  • CircleCI: CircleCI is used for continuous testing and deployment. When a new pull request is created on GitHub, CircleCI automatically triggers a run of the whole pipeline. CircleCI ensures that all containers are built, the code is linted, the tests are run, and Reviewable, GitHub, and our Slack channel are notified of the result of the build. Moreover, a branch can only be merged if the build was successful.
  • GitHub: We used GitHub for code hosting, wiki, and pull requests.
  • AWS CloudWatch for ECS: We use AWS CloudWatch for ECS for continuous monitoring.
  • Reviewable: Reviewable was used for better reviews of pull requests than GitHub offers. We have a strict review process with custom rules for when a PR counts as accepted. Reviewable lets us implement such a process, where a PR can only be merged after all line comments are answered by the submitter and approval is given by at least one reviewer.

Data:

We investigated several of the APIs that were mentioned in the solicitation. We found one endpoint that contained warnings for different kinds of hazards, and we decided to use this single endpoint for our prototype. Details can be found in these links. Because the API endpoint regularly fails with an HTTP 500, we have not yet set up a scheduled importer task. We are in communication with the providers of the API and will deploy a task to pull in fresh data at regular intervals as soon as the problems on their side are fixed.

To import the data, we use a Python script that pulls the data from an ArcGIS server, simplifies the shapes of the hazard polygons to save bandwidth and space in our database, and imports them into our Firebase database.
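The simplification step can be illustrated with a classic line-simplification pass such as Ramer–Douglas–Peucker. The importer itself is a Python script and may well use a geometry library instead; this JavaScript sketch only shows the idea of dropping polygon vertices that deviate from the overall shape by less than a tolerance:

```javascript
// Perpendicular distance from point p to the line through a and b.
// Points are [x, y] pairs.
function perpDist(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) / len;
}

// Ramer-Douglas-Peucker: keep the point farthest from the chord if it
// deviates by more than epsilon, and recurse on both halves; otherwise
// collapse the whole run to its two endpoints.
function simplify(points, epsilon) {
  if (points.length < 3) return points;
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpDist(points[i], points[0], points[points.length - 1]);
    if (d > maxDist) { maxDist = d; index = i; }
  }
  if (maxDist <= epsilon) return [points[0], points[points.length - 1]];
  const left = simplify(points.slice(0, index + 1), epsilon);
  const right = simplify(points.slice(index), epsilon);
  return left.slice(0, -1).concat(right); // drop duplicated split point
}
```

A larger epsilon yields coarser polygons and smaller payloads; for hazard warning areas, a tolerance well below the warning’s own positional accuracy loses nothing visible on the map.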

Continuous Integration Flow

Our continuous delivery process is built around GitHub and CircleCI. As soon as a new pull request is created on GitHub, a testing and build process is triggered on CircleCI. Additionally, the reviewer assigned to the PR is notified via Reviewable.io. We decided to use Reviewable instead of GitHub’s built-in reviews because it gives us more control over the acceptance criteria of a pull request. Once the build has succeeded and the PR has been accepted by at least one reviewer, the feature branch can be merged into master. The merge into master triggers the deployment pipeline, which first rebuilds the containers and then runs the linters and the tests again to check that nothing went wrong during the merge. If that was successful, the container of the backend is pushed to the Docker Hub registry, and Amazon’s container service ECS is notified that a new version of the backend is ready for deployment. ECS automatically starts running this new task in our AWS cluster. In a second step, the static files for the client application are built in a Docker container and pushed directly to Firebase Hosting. The third step is to deploy the Firebase database rules to the Firebase real-time database to make sure that data is only accessible to authorized users. We wrote extensive tests to guarantee the correctness of our database access rules.
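Firebase Realtime Database rules are expressed as a JSON document that is deployed alongside the app. The fragment below is purely illustrative, not the project’s actual rules; the collection name and conditions are assumptions:

```json
{
  "rules": {
    "notifications": {
      ".read": "auth != null",
      ".write": "auth != null"
    }
  }
}
```

Targaryen can then assert, in a unit test, that an unauthenticated client is denied both reads and writes under rules like these.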

How to Run and Install On Another Machine

As we packaged all components of our application into Docker containers, it is simple to set up a local development environment once a developer has installed Docker on their machine.

Steps to run and install:

  1. Start the frontend development environment with hot reloading with a single command: `docker-compose up frontend`

  2. Start the backend (sending SMS through Twilio) with `docker-compose up backend`. This command expects some environment variables to be set. These variables are the API keys that we factored out in order to make all code easily sharable in an open repository.

  3. Import new data from the ArcGIS server into the database with `docker-compose run --rm data-preparation python watches_importer.py data/watches.json`

  4. Run the environment for data manipulation and preparation with Jupyter notebooks on `localhost:8889`: `docker-compose up data-preparation`
