This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

Merge branch 'master' of https://github.com/CDECatapult/fornax
Dan-Staff committed Dec 3, 2018
2 parents cf85ea0 + 7cc41ac commit 063a381
Showing 24 changed files with 2,437 additions and 1,277 deletions.
158 changes: 29 additions & 129 deletions README.md
@@ -1,18 +1,24 @@
[![CircleCI](https://circleci.com/gh/CDECatapult/fornax.svg?style=svg&circle-token=2110b6bc1d713698d241fd08ae60cd925e60062f)](https://circleci.com/gh/CDECatapult/fornax)
[![Coverage Status](https://coveralls.io/repos/github/CDECatapult/fornax/badge.svg?branch=master)](https://coveralls.io/github/CDECatapult/fornax?branch=master)
[![Known Vulnerabilities](https://snyk.io/test/github/CDECatapult/fornax/badge.svg)](https://snyk.io/test/github/CDECatapult/fornax/badge.svg)


# Fornax

An implementation of [NeMa: Fast Graph Search with Label Similarity](http://www.vldb.org/pvldb/vol6/p181-khan.pdf) using python3 and sqlite or postgres.

![FORNAX](./fornax.png)
![FORNAX](./docs/img/fornax.png)

## Install (Dev)

From the root directory:

```bash
# install dev dependencies
pip install -r requirements/dev.txt

# install fornax
pip install -e .
```

## Test
@@ -30,146 +36,40 @@ The available options for installing SciPy packages are listed [here](https://sc

See the tutorials for a full working example.

### Tutorial Dependencies
* [Part 1](docs/tutorial/tutorial1.ipynb) - Download a small graph dataset
* [Part 2](docs/tutorial/tutorial2.ipynb) - Search the dataset using fornax

### Install Tutorial Dependencies (using conda)

The following tutorials use jupyter notebooks to create a worked example.
We reccomend you use the anaconda python distribution to run the notebooks.
We recommend you use the anaconda python distribution to run the notebooks.

```bash
conda env create -f environment.yml
pip install -r requirements.txt
```

* [Part 1](https://github.com/CDECatapult/fornax/blob/master/notebooks/tutorial/Tutorial%201%20-%20Creating%20a%20Dataset.ipynb)
* [Part 2](https://github.com/CDECatapult/fornax/blob/master/notebooks/tutorial/Tutorial%202%20-%20Making%20a%20Query.ipynb)

## Database Setup

By default fornax will use an in memory SQlite database.

Alternative databases can be used by setting the environment variable `FORNAX_DB_URL` using the [sqlalchemy database url format](https://docs.sqlalchemy.org/en/latest/core/engines.html).
SQLite and Postgresql are supported although other databases are untested.

All tables and indicies are initialised at import time if they do not exist already.

## Quick start

```python
# create a query graph
query_graph_handle = fornax.GraphHandle.create()
query_graph_handle.add_nodes(id_src=[0, 1, 2], label=['Hulk', 'Lady', 'Storm'])
query_graph_handle.add_edges([0, 1], [1, 2])
### Run the Tutorials

```bash
source activate fornax_tutorial
cd docs/tutorial
jupyter-notebook
```

# create a target graph
target_graph_handle = fornax.GraphHandle.create()
target_graph_handle.add_nodes(id_src=comic_book_nodes['id], label=comic_book_nodes['name'])
target_graph_handle.add_edges(comic_book_edges['start'], comic_book_edges['end'])
## Documentation

matches = [
(query_node_id, target_node_id, weight)
for query_node_id, target_node_id, weight
in string_similarities
]
### Build the Docs (requires dev dependencies)

match_starts, match_ends, weights = zip(*matches)
```bash
cd docs
make html
```

# stage a query
query = fornax.QueryHandle.create(query_graph_handle, target_graph_handle)
query.add_matches(match_starts, match_ends, weights)
### View the Docs Locally

# go!
query.execute()
```bash
cd _build/html
python3 -m http.server
```

```json
{
"graphs": [
{
"cost": 0.024416640711327393,
"nodes": [
{
"id": 9437002,
"type": "query",
"id_src": 0,
"label": "hulk"
},
{
"id": 13982314,
"type": "query",
"id_src": 1,
"label": "lady"
},
{
"id": 76350203,
"type": "query",
"id_src": 2,
"label": "storm"
},
{
"id": 75367743,
"type": "target",
"id_src": 37644418,
"label": " Susan Storm",
"type_": 2
},
{
"id": 5878004,
"type": "target",
"id_src": 995920086,
"label": "Lady Liberators",
"type_": 1
},
{
"id": 71379958,
"type": "target",
"id_src": 2142361735,
"label": "She-Hulk",
"type_": 0
}
],
"links": [
{
"start": 9437002,
"end": 71379958,
"type": "match",
"weight": 0.9869624795392156
},
{
"start": 13982314,
"end": 5878004,
"type": "match",
"weight": 0.9746778514236212
},
{
"start": 76350203,
"end": 75367743,
"type": "match",
"weight": 0.9651097469031811
},
{
"start": 9437002,
"end": 13982314,
"type": "query",
"weight": 1.0
},
{
"start": 13982314,
"end": 76350203,
"type": "query",
"weight": 1.0
},
{
"start": 5878004,
"end": 71379958,
"type": "target",
"weight": 1.0
}
]
}
],
"iters": 2,
"hopping_distance": 2,
"max_iters": 10
}
```
navigate to `0.0.0.0:8000` in your browser.
7 changes: 3 additions & 4 deletions docs/Makefile
@@ -3,10 +3,9 @@

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = python -msphinx
SPHINXPROJ = fornax
SOURCEDIR = source
BUILDDIR = build
SPHINXBUILD = sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
Empty file added docs/_static/.gitignore
Empty file.
80 changes: 80 additions & 0 deletions docs/api.rst
@@ -0,0 +1,80 @@
.. module:: fornax.api

API
===

.. _fornax-api-introduction:

Introduction
------------

This part of the documentation covers the interface for creating and searching graphs using the fornax package.
For the full documentation of the module API see :ref:`fornax-api-module`.


All of the functionality in :mod:`fornax` can be accessed via the following three classes.

* :class:`Connection`
* :class:`GraphHandle`
* :class:`QueryHandle`

:class:`Connection` is used to manage a connection to a SQL database.
:class:`GraphHandle` and :class:`QueryHandle` are used to create, insert,
update and delete graphs and queries.

Connection API
--------------------


Fornax stores and queries graphs using a database via a database connection.
:class:`Connection` manages the lifecycle of this database connection,
the creation of database schema (if required)
and any cleanup once the connection is closed.
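
The lifecycle described above (open a connection, create the schema only if
it is missing, clean up on close) can be sketched with the standard library
alone. This is a minimal illustration, not fornax's implementation:
``open_graph_store`` and the ``node`` table are hypothetical names invented
for this example, and an in-memory SQLite database is assumed.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def open_graph_store(path=":memory:"):
    """Hypothetical stand-in for a connection manager (not fornax code)."""
    conn = sqlite3.connect(path)
    try:
        # Create the schema only if it does not exist already.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS node ("
            "graph_id INTEGER, node_id INTEGER, label TEXT)"
        )
        yield conn
        conn.commit()
    finally:
        # Cleanup runs even if the body of the with-block raised.
        conn.close()


with open_graph_store() as conn:
    conn.execute("INSERT INTO node VALUES (0, 0, 'Hulk')")
    node_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

Wrapping the connection in a context manager keeps schema creation and
cleanup in one place, so callers cannot forget to close the connection.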


.. autoclass:: Connection
    :members:
    :noindex:

Graph API
--------------------------------

Since graphs are persisted in a database, they are not represented
directly by any object.
Rather, graphs are accessed via a graph handle, which permits the user
to manipulate graphs via a :class:`Connection` instance.

.. autoclass:: GraphHandle
    :members:
    :noindex:

Query API
------------------------------

Like graphs, queries exist in a database and are accessed via a handle.
Queries are executed using the :meth:`QueryHandle.execute` method.

A query brings together three important concepts.

A **target graph** is the graph which is going to be searched.

A **query graph** is the subgraph that is being searched for in the target graph.

**Matches** are label similarities between nodes in the query graph and target graph
with a weight where :math:`0 < weight \le 1`.
Users are free to calculate label similarity scores however they like.
Fornax only needs to know about matches with non-zero weights.
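
As an illustration of one way to produce such matches, the sketch below
scores every query label against every target label with
``difflib.SequenceMatcher`` from the standard library and keeps only the
pairs with a non-zero weight. The choice of ``difflib``, the example labels
and the node ids are assumptions made for this example; any similarity
function that yields weights in :math:`(0, 1]` would do.

```python
from difflib import SequenceMatcher

# Hypothetical node ids and labels, chosen for illustration only.
query_labels = {0: "Hulk", 1: "Lady", 2: "Storm"}
target_labels = {
    10: "She-Hulk",
    11: "Lady Liberators",
    12: "Susan Storm",
    13: "Q",
}


def similarity(a, b):
    """Case-insensitive similarity ratio between two labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Keep only (query id, target id, weight) triples with a non-zero weight.
matches = [
    (query_id, target_id, weight)
    for query_id, query_label in query_labels.items()
    for target_id, target_label in target_labels.items()
    for weight in [similarity(query_label, target_label)]
    if weight > 0
]

# Split the triples into three parallel sequences.
match_starts, match_ends, weights = zip(*matches)
```

The three parallel sequences have the shape that the README quick start
passes to ``add_matches``. Target node ``13`` shares no characters with any
query label, so it has no non-zero match and does not appear in
``match_ends``.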

Once a query has been created and executed it will return the *n* subgraphs in the
target graph which are most similar to the query graph based on the similarity score
between nodes and their surrounding neighbourhoods.

.. note::
    Nodes in the target graph will only be returned from a query if they have a
    non-zero similarity score to at least one node in the query graph.


.. autoclass:: QueryHandle
    :members:
    :noindex:
