Hardware Configuration
General Overview
Blazegraph is a native graph database with an underlying B-tree-based implementation. The full graph does not need to fit in memory. The general guidance is to get a machine with the fastest disk that is cost-effective for your application. In many high-end settings, customers have used devices such as FusionIO to achieve very high load and query performance. If there is a tradeoff between additional RAM and faster disks, we recommend faster disks.
If you expect a workload with a large number of concurrent queries, it is recommended to get a fast multi-core CPU with sufficient RAM.
It is also highly recommended that you review the performance, IO, and query optimization sections of this wiki to properly tune your instance.
Data on Disk Sizing Guidance
As a rule of thumb, we use 90 bytes per triple as an estimate. Actual size will vary based on the data and the options used when configuring the namespace, e.g. quads, RDR, text indexing, etc.
| Triples | Est. Size on Disk (GB) |
|---------|------------------------|
| 10M     | 0.84                   |
| 100M    | 8.4                    |
| 1B      | 83.8                   |
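For example, 100M triples at 90 bytes each works out to roughly 8.4 GB. A minimal sketch of that arithmetic (the class and method names below are illustrative, not part of Blazegraph):

```java
// Illustrative helper (not part of Blazegraph) applying the 90 bytes/triple
// rule of thumb to estimate the journal size on disk.
public final class DiskSizeEstimate {

    private static final long BYTES_PER_TRIPLE = 90L; // rule of thumb from above

    /** Estimated size on disk in GB for the given number of triples. */
    public static double estimatedGB(final long triples) {
        return (triples * BYTES_PER_TRIPLE) / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(final String[] args) {
        // Prints approximately 8.4, matching the table above.
        System.out.printf("100M triples ~ %.1f GB on disk%n", estimatedGB(100_000_000L));
    }
}
```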
RAM Sizing Guidance
Because of the underlying B-tree implementation, the amount of RAM required is predicted to grow as n * log(n), where n is the data scale in GB. We estimate a floor value of 4 GB of RAM for a 10M edge graph. The table below is a sizing guide; actual performance will depend on your query workload and data needs.
| Edges (triples) | RAM (GB) |
|-----------------|----------|
| 10M             | 4        |
| 100M            | 4        |
| 200M            | 8        |
| 500M            | 16       |
| 750M            | 24       |
| 1B              | 32       |
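Read as a step function, the table can be encoded for quick lookups. The sketch below is illustrative only (not a Blazegraph API); beyond 1B edges, extrapolate using the n * log(n) guidance above.

```java
// Illustrative lookup (not part of Blazegraph) returning the recommended
// RAM from the sizing table above.
public final class RamSizingGuide {

    /** Recommended RAM in GB for a graph with the given number of edges (triples). */
    public static int recommendedRamGB(final long edges) {
        if (edges <= 100_000_000L) return 4;  // floor value: 4 GB up to ~100M edges
        if (edges <= 200_000_000L) return 8;
        if (edges <= 500_000_000L) return 16;
        if (edges <= 750_000_000L) return 24;
        return 32;                            // ~1B edges; extrapolate beyond this
    }

    public static void main(final String[] args) {
        System.out.println(recommendedRamGB(500_000_000L) + " GB"); // prints "16 GB"
    }
}
```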
Amazon EC2
We have a number of deployments running on Amazon EC2 instances. For the best performance, we recommend SSD storage for the journal files. We typically see deployments on C3, R3, or I2 instance types. M3 instances can be a good choice for development and testing, or for production at smaller data scales. Note that IO behavior on EC2 varies considerably across instance and storage types.
Benchmarking Configuration
The table below shows the machine configuration used for our benchmarking activities performed during release QA.
| Configuration | Value |
|---------------|-------|
| Server Info | (hosted CI benchmark server) |
| Processor | Intel® Xeon® E3-1270 V3 |
| Processor speed | 4 cores (HT) x 3.5 GHz |
| RAM | 16 GB DDR3 ECC |
| Hard Disk | 240 GB (2 x 240 GB SSD) Intel® S3500 |
| RAID | Software RAID 1 |
| Operating System | Ubuntu 14.04.1 LTS |
| Runtime configuration | 4 GB heap (-Xmx4g) allocated to the server JVM |
| Store Type | DiskRW |
| JVM args | `-ea -Xmx4g -server -XX:+UseParallelOldGC` |
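The JVM args row lists what was passed on the java command line when starting the server for the benchmark runs. If you want to confirm that a running JVM actually picked up the intended -Xmx4g limit, a small, self-contained check (illustrative, not part of Blazegraph) is:

```java
// Illustrative check (not part of Blazegraph): print the JVM's maximum heap so
// you can confirm the process was started with the intended -Xmx setting.
// Note: Runtime.maxMemory() typically reports slightly less than the -Xmx value.
public final class HeapCheck {
    public static void main(final String[] args) {
        final double gb = Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("Max heap: %.1f GB%n", gb);
    }
}
```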