An interactive graph database of openly available seismic datasets
Over the years numerous geophysical datasets have been released for public usage. However, with no central storage location or consistent description strategy, finding suitable openly available datasets still poses a large challenge to the geophysics community. GeoGraphI aims to tackle this problem by providing a single access point with the necessary structured information to search for suitable datasets.
GeoGraphI is a graph database, originally built using the Neo4J graph database management system, with the provided example queries wrote in the Cypher query language. With data subsets ranging from passive seismic to migrated volumes, and from core images to interpreted horizons, GeoGraphI can be queried either by key information matching, such as Return a field seismic dataset acquired over a salt body, or by computing similarity scores, such as Return similar field datasets to the SEAM dataset.
The database schema is an extension to that developed by the Open Subsurface Data Universe with additional, less-technical dataset descriptors which highlight the interesting features related to a dataset. For example, is the survey acquired in an area of CO_2 injection, or in an area of turbidites, or perhaps the data is plagued by simultaneous shooting, or surface related multiples.
Finally, being a graph database, GeoGraphI can naturally handle a large number of relationships between different data features as well as being able to easily adapt for future growth, either through further population of the database or by modification of the underlying schema.
A dump of the dataset is in the *.dump
file and can be loaded into a local instance of neo4j by running the following command:
neo4j-admin load --from=<archive-path> --database=<database> [--force]
Or by using the neo4J Desktop - see https://tbgraph.wordpress.com/2020/11/11/dump-and-load-a-database-in-neo4j-desktop/ for guidance.
The current schema is as follows:
Node | Properties | Description | Example |
---|---|---|---|
Dataset | id | Unique dataset identifier | 001 |
name | The common name given to the dataset | Northern Lights | |
link | web link to the general overview of the dataset | https://data.equinor.com/data... | |
Operating Environment | type | Marine, Land or Cross-environment | Marine |
Generation Procedure | type | Synthetic, Lab or Field | Field |
Geographic Region | name | Continent of survey | Europe |
Seismic Geometry | type | 2D, 3D, 4D, or Passive | 2D |
Energy Source | type | Type of seismic source used | Airgun |
Receiver Type | type | Recording Instrument | Geophone |
location | Surface or Borehole | Borehole | |
no_of_components | Number of components in each receiver | 3C | |
Seismic Subset | id | Unique seismic subset identifier | 002 |
processing_stage | The data processing stage | VSP | |
data_format | The file format in which the data is available | segy | |
link | Web link for downloading the data | https://data.equinor.com/data... | |
Supplementary Subset | id | Unique supplementary subset identifier | 003 |
data_type | The type of supplementary data available | Well Logs | |
link | Web link for downloading the supplementary data | https://data.equinor.com/data... | |
Interesting Features | type | Interesting features whether geological or artefacts | CO2 Storage |
With the relationships between the nodes as illustrated in the following plot:
The input data is stored in an excel spreadsheet of which the sheets are saved seperately prior to being used to populate the graph.
A simple use case is identifying a dataset which matches a few criteria, for example it has a post-stack seismic dataset as well as a horizon set.
MATCH p=(s:SeismicSubset)-[]-(d:Dataset)-[]-(n:SupplementarySubset)
WHERE n.data_type='Horizons' and s.processing_stage='Poststack Seismic'
RETURN p
A more advanced use case is to find a dataset that is most similar to another dataset. In this case we need to make a projection of the graph with only the relevant nodes for the similarity comparison (query 1) and then we can compute the Jaccard Similarity Score between the different datasets (query 2).
Query 1:
CALL gds.graph.create("dataset-graph",
['Dataset', 'OperatingEnvironment', 'SeismicGeometry', 'Feature', 'ReceiverType', 'EnergySource'],
['from_operating_environment','has_survey_geometry', 'contains_feature', 'recorded_by', 'uses_seismic_source'])
YIELD nodeCount, relationshipCount
Query 2:
CALL gds.nodeSimilarity.stream('dataset-graph')
YIELD node1, node2, similarity
WHERE gds.util.asNode(node2).name='SEAM P1 Elastic'
RETURN gds.util.asNode(node1).name as Dataset1, gds.util.asNode(node2).name as Dataset2, similarity
ORDER BY similarity DESCENDING, Dataset1, Dataset2
Whilst originally developed for identifying a dataset to work with, GeoGraphI can also be queried to see all available information on a specific dataset.
MATCH p=(d:Dataset)-[]-()
WHERE d.name='Northern Lights'
RETURN p
GeoGraphI was built to be shared with the geoscience community. On its initial release (April 2021), GeoGraphI has a strong seismic focus and is in at an MVP-stage. The hope is that via crowd-sourcing GeoGraphI will grow to have a more general geoscience focus, for example by including substantially more well datasets. As it grows the schema will undergo many revisions to remain flexible and providing the best possible information relevant to the datasets that populate the graph.
If you would like to add data to the GeoGraphI, please add the necessary information into the excel input file.
Additionally, if you have any feedback on the schema or notice any data inaccuracies then please open a GitHub Issue which we will respond to asap.
If GeoGraphI helps you find data then please let us know - we love to hear that it has been of use 😊