I'm new to blazegraph, could you clarify? #203

Open
Olivier4477 opened this issue Jun 10, 2021 · 6 comments
Comments

@Olivier4477

Hello,

I am just discovering Blazegraph. I want to use government data for an app.

However, the data file (.rdf) is very large (3.30 GB).
For example, if I run:
curl -X POST -H "Content-Type: application/rdf+xml" --data-binary @flux.rdf "http://localhost:9999/blazegraph/namespace/kb/sparql"

Blazegraph takes about 2 hours to load the data. While writing this message, I ran:
curl -X POST http://localhost:9999/blazegraph/namespace/kb/sparql --data-urlencode 'update=DROP ALL'

Unsurprisingly, the drop also takes a very long time.

Given that the data (.rdf) is updated every day, how can I update Blazegraph? Is it possible to update it without deleting everything (DROP ALL)?

How can I speed up loading and updating the data?

Thank you,

Have a good day

@thompsonbry
Contributor

thompsonbry commented Jun 10, 2021 via email

@Olivier4477
Author

Thank you for your reply.

But I already need a machine with at least 8 GB of RAM for Blazegraph to work...
If I have to use a second one, it is not the same budget.

Is that really the only solution?

Would it not be possible, for example, with:
curl -X POST -H "Content-Type: application/rdf+xml" --data-binary @flux.rdf "http://localhost:9999/blazegraph/namespace/kb/sparql"
to specify a table name (for example, the current date),
then at midnight load the update (under the new table name) and delete the table from the day before?
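
The "table named after the date" idea above corresponds roughly to a SPARQL named graph. Below is a minimal sketch of such a daily swap, assuming the `kb` namespace was created in quads mode (named graphs need it, as far as I know) and that the `context-uri` request parameter of the Blazegraph REST API selects the target graph for a POSTed file; both points should be checked against the Blazegraph documentation. The `http://example.org/flux/` graph prefix and the function names are made up for illustration.

```shell
#!/bin/sh
# Endpoint taken from the commands in this thread.
ENDPOINT="http://localhost:9999/blazegraph/namespace/kb/sparql"
# Hypothetical prefix for the per-day graph names (an assumption, pick your own).
GRAPH_BASE="http://example.org/flux/"

# Print the named-graph URI for a date such as 2021-06-10.
graph_uri() {
  printf '%s%s\n' "$GRAPH_BASE" "$1"
}

# Print the SPARQL Update that removes the graph for a given date.
drop_update() {
  printf 'DROP GRAPH <%s>\n' "$(graph_uri "$1")"
}

# Load an RDF/XML file ($2) into the graph for a given date ($1).
# Assumption: context-uri names the target graph (needs a quads-mode namespace).
load_graph() {
  curl -X POST -H "Content-Type: application/rdf+xml" \
       --data-binary "@$2" \
       "$ENDPOINT?context-uri=$(graph_uri "$1")"
}

# Drop the graph for a given date, leaving all other graphs untouched.
drop_graph() {
  curl -X POST "$ENDPOINT" --data-urlencode "update=$(drop_update "$1")"
}
```

At midnight, something like `load_graph 2021-06-11 flux.rdf && drop_graph 2021-06-10` would load the new file first and only then remove the previous day's graph. Queries would then have to target the current graph (for example with a GRAPH clause). This does not make the load itself faster, and I have not measured whether DROP GRAPH on a graph of this size is quicker than DROP ALL; it only avoids emptying the whole store while the new data is loading.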

Or is there another possibility?

I really want to use the data the government provides, but it's RDF/SPARQL...

Thank you so much

@thompsonbry
Contributor

thompsonbry commented Jun 10, 2021 via email

@Olivier4477
Author

Olivier4477 commented Jun 10, 2021

OK, I think I understand your logic, but I will need help putting it into practice.

To explain: I use a docker-compose file like this:

This image is provided in the government documentation for using the data.

So for the moment I run:
docker-compose up
then I load the data like this:
curl -X POST -H "Content-Type: application/rdf+xml" --data-binary @flux.rdf "http://localhost:9999/blazegraph/namespace/kb/sparql"
(the data file must be stored in dataset/kb/data)

Then, if I want to reload, I must run:
docker-compose rm blazegraph
then docker system prune
then relaunch Blazegraph.

This is how I proceed now.

Before this solution, I used Apache Jena for SPARQL; it took 5 hours to load the data (on my computer with 32 GB of RAM).

@thompsonbry
Contributor

thompsonbry commented Jun 10, 2021 via email

@Olivier4477
Author

OK, but... how would you have done it?
Use blazegraph.jar directly?

In any case, thank you very much. I hope someone else can take over and help me.

Thank you so much!
