I'm new to blazegraph, could you clarify? #203
Comments
The easiest is to run two instances (ideally on two machines). Load into
one in the background, cut over once loaded, then delete the journal on the
other instance and start your next load there.
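The rotation described above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the thread: the two instances are called "a" and "b", they listen on ports 9999 and 9998, and a small state file (here `./active`) records which one is currently live. All function names are hypothetical helpers. The endpoint path and the curl load command are the ones from the original question.

```shell
#!/bin/sh
# Sketch of a two-instance rotation (assumed layout, not an official procedure).
# Assumptions: instance "a" listens on 9999, instance "b" on 9998,
# and the file named by ACTIVE_FILE holds the name of the live instance.

ACTIVE_FILE="${ACTIVE_FILE:-./active}"

# Print the instance currently serving queries (defaults to "a").
active_instance() {
    if [ -f "$ACTIVE_FILE" ]; then cat "$ACTIVE_FILE"; else echo a; fi
}

# Print the other instance, which is free to receive the next load.
standby_instance() {
    if [ "$(active_instance)" = a ]; then echo b; else echo a; fi
}

# Map an instance name to its HTTP port.
port_of() {
    if [ "$1" = a ]; then echo 9999; else echo 9998; fi
}

# Build the SPARQL endpoint URL for an instance.
endpoint_of() {
    echo "http://localhost:$(port_of "$1")/blazegraph/namespace/kb/sparql"
}

# Load an RDF/XML file ($1) into the standby instance and, only if the
# load succeeds, record that instance as the live one.
# The standby is expected to have been started on an empty journal.
reload() {
    target="$(standby_instance)"
    curl --fail -X POST -H "Content-Type: application/rdf+xml" \
        --data-binary "@$1" "$(endpoint_of "$target")" || return 1
    echo "$target" > "$ACTIVE_FILE"
}
```

A daily job would call `reload flux.rdf`. How clients are pointed at the live instance (a reverse proxy, an application setting, or a restart on a fixed port) is left open here, as is deleting the old journal before the next load.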
On Thu, Jun 10, 2021 at 06:18 Olivier4477 wrote:
Hello,
I am new to Blazegraph. I want to use government data for an app.
However, the data file (.rdf) is very large (3.30 GB).
For example, if I run:
curl -X POST -H "Content-Type: application/rdf+xml" --data-binary @flux.rdf "http://localhost:9999/blazegraph/namespace/kb/sparql"
Blazegraph takes about 2 hours to load the data. As I write this, I have run:
curl -X POST http://localhost:9999/blazegraph/namespace/kb/sparql --data-urlencode 'update=DROP ALL'
The drop also takes a very long time.
Since the data (.rdf) is updated every day, how can I update Blazegraph? Is it possible to update it without deleting everything (DROP ALL)?
How can I speed up loading and updating the data?
Thank you,
Have a good day
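Thank you for your reply. But I already need a machine with at least 8 GB for Blazegraph to work... Is it really the only solution? Would it not be possible, for example, to specify a table name when loading (such as the current date), load the update at midnight under the new name, and delete the previous day's table? Or is there another possibility? I really want to use the data the government provides, but it is only available as RDF / SPARQL... Thank you so much.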
Run two instances on the same machine then.
There is no trivial way to identify all of the allocations in the storage
layer associated with one loaded triple or quad store such that they may be
trivially dropped.
It is possible to use lower-level APIs to drop indices, but doing so might
not free up the allocations immediately; this depends on how the RWStore
is set up.
On the other hand, as long as the machine can handle the two workloads
(load and query) you can just use two instances.
You can also use the DataLoader for loading into the second one. This way
you can always have the full database responding at the same URL and port
with a short downtime when you kill that process and restart it over the
other database.
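As a rough illustration of the DataLoader route, the sketch below builds the two commands involved: an offline bulk load into a fresh journal, then a server start over that journal on the usual port. Everything here is an assumption to check against your Blazegraph version: the jar location, the properties file (which would name the journal to use), the `com.bigdata.rdf.store.DataLoader` class with its `-namespace` option, and the `-Dbigdata.propertyFile` and `-Djetty.port` system properties, all as I recall them from the Blazegraph documentation. The `run`, `load_offline` and `start_server` helpers are hypothetical; `kb`, port 9999 and `-Xmx3g` are taken from this thread.

```shell
#!/bin/sh
# Sketch: bulk-load offline with DataLoader, then serve the new journal.
# Assumptions: blazegraph.jar is in the current directory, and each
# properties file points at its own journal file.

JAR="${JAR:-blazegraph.jar}"

# Run a command, or only print it when DRY_RUN is set (useful for review).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Load an RDF file ($2) into the journal described by a properties file ($1).
# No server needs to be running on that journal while this executes.
load_offline() {
    run java -cp "$JAR" com.bigdata.rdf.store.DataLoader -namespace kb "$1" "$2"
}

# Start the server over the journal described by a properties file ($1),
# on the same port as before so that the URL does not change.
start_server() {
    run java -server -Xmx3g -Dbigdata.propertyFile="$1" -Djetty.port=9999 -jar "$JAR"
}
```

With `DRY_RUN=1` set, `load_offline next.properties flux.rdf` only prints the command. In a real run, the short downtime mentioned above is the gap between stopping the old server process and calling `start_server` on the freshly loaded journal.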
On Thu, Jun 10, 2021 at 07:00 Olivier4477 wrote:
Thank you for your reply.
But I already need a machine with at least 8 GB for Blazegraph to work...
If I have to use a second one, it is not the same budget.
Is it really the only solution?
Would it not be possible, for example, with:
curl -X POST -H "Content-Type: application/rdf+xml" --data-binary @flux.rdf "http://localhost:9999/blazegraph/namespace/kb/sparql"
to specify a table name (for example, the current date),
load the update at midnight (under the new table name), and delete
the previous day's table?
Or is there another possibility?
I really want to use the data the government provides, but it is only
available as RDF / SPARQL...
Thank you so much
OK, I think I understand your logic, but I will need help putting it into practice. I use a docker-compose file with an image that is provided in the government documentation for using the data. For now, I start it with docker-compose up and load the data with curl. To reload, I remove the blazegraph container, run docker system prune, and relaunch Blazegraph. Before this setup, I used Apache Jena (Java) for SPARQL; loading the data took 5 hours (on my computer with 32 GB of RAM).
I'm not a Docker expert. You'll need to get someone else's advice on that.
On Thu, Jun 10, 2021 at 07:16 Olivier4477 wrote:
OK, I think I understand your logic, but I will need help putting it
into practice.
To explain, I use a docker-compose file like this:
version: '3.1'
services:
  blazegraph:
    image: conjecto/blazegraph:2.1.5
    restart: always
    ports:
      - 9999:9999
    environment:
      JAVA_OPTS: "-Xms2g -Xmx3g"
    volumes:
      - ./dataset:/docker-entrypoint-initdb.d
  datatourisme:
    build: docker
    ports:
      - "8080:80"
    restart: always
    depends_on:
      - blazegraph
This image is provided in the government documentation for using the data.
For now, I run:
docker-compose up
then I load the data like this:
curl -X POST -H "Content-Type: application/rdf+xml" --data-binary @flux.rdf "http://localhost:9999/blazegraph/namespace/kb/sparql"
(the data file must be stored in dataset/kb/data)
Then, if I want to reload, I must run:
docker-compose rm blazegraph
then docker system prune
then relaunch Blazegraph.
This is how I proceed now.
Before this solution, I used Apache Jena (Java) for SPARQL; it took 5 hours
to load the data (on my computer with 32 GB of RAM).
OK, but how would you have done it? In any case, thank you very much. I hope someone else can take over and help me. Thank you so much!