Skip to content

Latest commit

 

History

History
303 lines (237 loc) · 14.2 KB

RepositoryOverview.asciidoc

File metadata and controls

303 lines (237 loc) · 14.2 KB

Neo4j: Repository Overview

This document provides a quick introduction to the organization of the development process of Neo4j. It is intended for new developers, to help them get acquainted with the system.

First, the section Module Structure will discuss how Neo4j is comprised of several components and modules, which work together to provide Neo4j’s functionalities. Then the organization of the source code is discussed. This includes an explanation of the directory structure of the source code repository, instructions on how to build and test the system and an overview of the configuration management of the source code.

Module Structure

This model will describe how the source code of Neo4j is split up into smaller pieces called components and modules. Note that some components or modules were skipped, because they are deprecated (e.g. cypher-plugin) or we regard them as not required for the initial understanding of the main system (e.g. testing-utils or the code examples used in the manual).

Main Components

Neo4j is comprised of several main components. Each such component is an Apache Maven component and may be comprised of one or more modules. The main components are:

community-build

The main neo4j database, which can be used under a GPL license. It corresponds to the community version described here. This component can be found in the community directory of the source repository.

If you are looking to fix a bug or add a feature to the community edition, this is the place to be.

advanced-build

This component adds extra functionality which is then included in the enterprise edition. The code for the extra functionality can be found in the advanced directory of the repository.

enterprise-build

This enterprise build adds extra features to the advanced build and corresponds to the enterprise edition described here. The source code for these extra features can be found in the enterprise directory of the repository.

packaging-build

This component is responsible for packaging all the other components. It consists of the neo4j-server-qa module, containing tests, and the neo4j-standalone module, which is responsible for creating standalone installers from the other components. If the standalone installers are not working (like discussed here) this component is the place to go.

neo4j-manual

This component pulls together the documentation of the other components and generates a single manual from it.

This component is only important if you want to change the main outline of the manual. For example, to add a new section or make a new top-level chapter. If you are looking to fix or expand existing documentation, the specific manual files can be found in the directory of the component that they document.

Main Modules

Module overview
Figure 1. Module overview

This figure gives an overview of the modules of which the main components are comprised. The arrows indicate dependencies between modules. Some modules and their dependencies are colored to help keep crossing arrows apart.

We will now briefly describe the modules and their roles.

Community Modules

server-api

This module provides classes which can be used to create plugins on the Neo4j server. More information about its usage can be found here.

neo4j-udc

This module is a Usage Data Collector which can gather usage data to help improve Neo4j. For more information on what data it gathers and how it can be disabled, see the manual.

neo4j-graphviz

The graphviz module is a library that allows visualization of Neo4j graph data using graphviz. For more information about its usage, see this blogpost by Peter Neubauer.

neo4j-jmx

The jmx module offers a JMX interface to different Neo4j modules. Its details and usage information can be found here.

neo4j-graph-algo

This module offers some graph algorithms which can be performed on your graphs. They can be called using any of the interfaces to the system, like the REST api as documented here, or the traversal API as documented here. Finally, the graph algorithms are integrated into the Cypher Query Language as is illustrated here.

neo4j-graph-matching

This module provides an API to perform pattern matching on graphs. It is mainly intended for internal use by the Cypher Execution Engine. Although its API can be used when using an embedded version of Neo4j, this is not recommended.

neo4j-lucene-index

This module provides indexing capabilities to allow users to lookup nodes based on their properties. As described in the manual, this way of indexing is no longer recommended if schema indexing would suffice.

neo4j-server

The community version of the Neo4j server. It contains the functionality of the REST API and the WebAdmin tool.

neo4j/neo4j-community

The neo4j module and the neo4j-community module are pretty much the same. The overview only shows the dependencies of neo4j-community, but these are the same as those of the neo4j module. Both modules are dependencies of other modules and therefore in use. They both still exist for historical reasons. It is important to note, however, that only the neo4j module contains documentation.

Both modules are simply a meta-package which allows you to include all its dependencies at once.

neo4j-shell

The shell module provides a simple shell to monitor a Neo4j database. It is documented here and its manpage can be found in the manpages section of the manual.

neo4j-cypher

This module provides the Cypher Execution Engine, which allows users to query using the Cypher Query Language. As it is partly written in Scala, working with this module requires some extra setup as discussed here.

neo4j-kernel

The kernel is the core of Neo4j. It contains the custom storage system, the embedded API of Neo4j and transaction support as listed here.

Advanced Modules

neo4j-server-advanced

This module contains the extra monitoring features of the advanced version of the Neo4j server.

neo4j-advanced

Just like the neo4j-community module, this is simply a meta package to allow easy inclusion of the other maven modules. This meta package includes modules from the advanced version and also includes neo4j-community.

neo4j-management

The management module extends the neo4j-jmx module with extra monitoring features. These features are documented here.

Enterprise Modules

neo4j-ha

The high availability (ha) module allows the Neo4j server to be clustered, to allow for fault-tolerance and read-scalability as discussed here.

neo4j-cluster

This module is a library to provide Heartbeat and Paxos implementations, which are used by the high availability cluster.

neo4j-backup

This modules provides the possibility of easily creating backups, even from remote machines. The features of this module and its usage are documented here and the manpage can be found in the manpages section of the manual.

neo4j-com

The communication module supports the communication between the nodes in the high availability cluster.

neo4j-consistency-check

This module contains a tool to check the consistency of a Neo4j data store. It is used by the backup module.

neo4j-server-enterprise

This version of the Neo4j server incorporates the high availability and clustering features into the Neo4j server. It also contains all the features of the advanced and community server.

neo4j-enterprise

This meta package can be used to easily include a lot of the other modules of Neo4j.

Codeline Model

This section will cover the codeline organization of Neo4j. The code is currently hosted on Github and is mainly located in the Neo4j repository.

First, an overview of the directory structure of the repository is given. Then, the build and test approach is discussed. Finally, the use of git and Github for source code configuration management is discussed.

Overview of the directory structure

The main repository reflects the structure of components and modules as discussed in the Module Structure section.

The top-level directories in the repository contain the main components:

community/

contains the community-build component

advanced/

contains the advanced-build component

enterprise/

contains the enterprise-build component

packaging/

contains the packaging-build component

manual/

contains the neo4j-manual component

Inside these component directories, you will find a subdirectory for each module. For example, the community/cypher directory contains the neo4j-cypher module contained in the community component. The directory names may differ a bit from the module names (cypher versus neo4j-cypher), but it should not be a problem to figure out where to find a specific module.

Each module is organized according to maven conventions. So source code can be found in src/main/java for Java code and src/main/scala for Scala code. Tests are located in the src/test/java and src/test/scala directories. More information about the maven conventions can be found here.

Finally, each module can have documentation. This documentation is located in the src/docs folder, which is organized as described here. These documentation files can be incorporated into the manual by including them in the neo4j-manual component.

Build, Integration, Test approach

The source code of Neo4j can be built and tested using Apache Maven, a build automation tool used primarily for Java projects.

To build from the sources and run the unit tests, a simple mvn clean install in the main repository should suffice. This will also run the unit tests. If you don’t want to run the unit tests, add -DskipTests to the maven call, which will skip the execution of the unit tests. If you don’t even want to compile the tests, use -Dmaven.test.skip=true instead. For more information about building Neo4j, please consult the main readme. For further instructions on building the manual, please refer to the manual component’s readme.

The test cases are named according to the configuration of the maven surefire plugin. At the time of writing, this configuration can be found in the grandparent pom file and the following names are allowed (using * as wildcard for any number of characters):

  • Test*.java

  • *Test.java

  • *Tests.java

  • *TestCase.java

For unit tests related to the documentation there is addition configuration in the main repository’s pom file, which allows the following names:

  • DocTest*.java

  • *DocTest.java

  • *DocTests.java

  • *DocTestCase.java

Integration tests are run using the failsafe plugin. So these should be named according to the configuration of the failsafe plugin. At the time of writing the default configuration is used. So please name your integration tests accordingly:

  • IT*.java

  • *IT.java

  • *ITCase.java

Again, there is extra configuration for the test cases related to documentation. This also allows the following names:

  • DocIT*.java

  • *DocIT.java

  • *DocITCase.java

Contributing Process

If you want to contribute to the system, please read this section of the manual, which describes some general guidelines for contributing. Note that test-driven development (write tests first, code later) is recommended. Your contribution should adhere to the structure as described in section Overview of the directory structure.

Configuration management

Git is used as the version control system for the source code. Work can be done on different releases at the same time, as they are located on their own branch.

Configuration files for the Eclipse and Intellij IDEA IDEs can be found here. These files will configure your IDE to use the Neo4j coding style. There is also a configuration file for PMD located here, which can help you find bugs or code smells in your contribution.