Trim the README.md and add quickstart guide #202

Merged
merged 10 commits into from
Mar 19, 2019

niklas88
Member

@niklas88 niklas88 commented Mar 7, 2019

Pretty aggressive trimming and a very quick, no-details, no-fuss quickstart guide.

@niklas88 niklas88 requested a review from joka921 March 8, 2019 15:37
Member

@floriankramer floriankramer left a comment

I read through everything and marked all language, spelling and grammar errors and potential improvements I noticed. I also added one or two comments as to the context, but did not verify the correctness and completeness of the commands.


On top of the vanilla SPARQL functionality, QLever allows so-called SPARQL+Text
queries on a text corpus linked to a knowledge base via entity recognition. For
example, the following query find all mentions of astronauts next to the words
Member

the following query find all mentions -> the following query finds all mentions

[here](docs/sparql_plus_text.md).

QLever also supports efficient SPARQL autocompletion. For example, the
following query yields a list of all predicates associated with persons in the
Member

persons -> people


QLever also supports efficient SPARQL autocompletion. For example, the
following query yields a list of all predicates associated with persons in the
knowledge base, ordered by the number of persons which have that predicate.
Member

persons -> people

Member Author

Interestingly, it turns out "persons" is okay too and is in fact the more formal (older) form. Still, since we are using C++17, we might as well stick to the more modern form for English as well. So thanks for pointing this out to me!

GROUP BY ?predicate
ORDER BY DESC(?count)

Note that this query could also be processed by standard SPARQL simply by
Member

As SPARQL is the language, I would replace this with "Note that this query could also be processed by a standard SPARQL engine" or "Note that this query is equivalent to a standard SPARQL query".

ORDER BY DESC(?count)

Note that this query could also be processed by standard SPARQL simply by
replacing the second triple by ?x ?predicate ?object. However, that query is
Member

replacing by -> replacing with; that avoids having two repetitive "by"s. Also, "replace with" is normally used in the active voice (versus "replace by" in the passive voice).

docs/wikidata.md Outdated
## Build a QLever Index

Now we can build a QLever Index from the `latest-all.ttl` Wikidata Turtle file
using the `wikidata_settings.json` file for some useful default settings for
Member

"... .json file for some ..." This sentence is ginormous and should probably be split into two, e.g.:
".json file. The .json file contains some useful..."

docs/wikidata.md Outdated

Now we can build a QLever Index from the `latest-all.ttl` Wikidata Turtle file
using the `wikidata_settings.json` file for some useful default settings for
relations that can be safely stored on disk because their actual values are
Member

that can be stored on disk safely as their

docs/wikidata.md Outdated
-u`) is not 1000 you have to make the `./index` folder writable for QLever
inside the container e.g. by running `chmod -R o+rw ./index`

**Note (1):** This takes about half a day but should be much faster than with most
Member

"but should be" -> "which is still faster than [most] other triple stores". I would consider simply removing that part of the sentence, though, as this is a tutorial and not a comparison, and it feels somewhat out of place to me.

docs/wikidata.md Outdated
qlever

Then point your browser to [http://localhost:7001/](http://localhost:7001/) and
enter the query.
Member

Same as before, I prefer "open ... in your browser", and I would remove the "enter the query" part, as that should be obvious, and the implication that the user wants to enter exactly one query sounds strange to me.

docs/wikidata.md Outdated
Then point your browser to [http://localhost:7001/](http://localhost:7001/) and
enter the query.

For example the following query retrieves all mountains above 8000 m
Member

For example,

@niklas88
Member Author

@floriankramer thanks for the review. I think I addressed all of your comments. I also reworked the Wikidata Quickstart Guide a bit:

  • Moved the filesystem space part to the git clone so it's less likely one needs to move it later
  • Changed to using a wikidata-input folder so it's clearer how the input can be stored separately
  • Uncompressed Wikidata directly while downloading, so that it is one slow command and one can get started immediately on the next steps once that is finished

@niklas88 niklas88 requested review from floriankramer and removed request for joka921 March 13, 2019 13:26
Member

@floriankramer floriankramer left a comment

I read through everything again and added some more comments for both things I missed the first time around, and new changes. Overall I really like the new documentation though, and the comments focus mostly on language details.

README.md Outdated
on > 4 GB files or allocate enough RAM for larger KBs), docker version 18.05 or newer
(needs multi-stage builds without leaking files (for End-to-End Tests)) and `git`.
Then you can simply do the following:
If you use QLever in your work, please cite this paper.
Member

I believe this should be "that paper", since this is written inside the README and not the paper.

README.md Outdated
Alternatively to get started with a real (and really big) dataset we have prepared
a [Wikidata Quickstart Guide](docs/wikidata.md). This guide takes you through the entire
process of loading the full Wikidata Knowledge Base into QLever, but don't worry
it is pretty.
Member

Did you mean to write "pretty simple" / "easy", or do you want to express the beauty of Wikidata, QLever, the process or the guide?

Member Author

I guess I missed a flush in my brain's output path.

machine. If you have no input data yet obtain it from one of our [recommended
sources](docs/obtaining_data.md) or create your own knowledge base in standard
sources](docs/knowledge_bases.md) or create your own knowledge base in standard
Member

input data yet -> input data yet,
or ->, or

README.md Outdated
machine. If you have no input data yet obtain it from one of our [recommended
sources](docs/obtaining_data.md) or create your own knowledge base in standard
sources](docs/knowledge_bases.md) or create your own knowledge base in standard
*NTriple* or *Turtle* formats and (obtionally) add a [text
corpus](docs/sparql_plus_text.md).

Note that QLever only accepts UTF-8 encoded input files, then again [you should
Member

Not related to your changes, but I would replace the comma here with a full stop.

README.md Outdated
By default and when running `docker` **without user namespaces**, the container
will use the user ID 1000 which on Linux is almost always the first real user.
If the default user does not work add `-u "$(id -u):$(id -g)"` to `docker run`
to let QLever execute as the current user.
Member

to let -> to have / to make

docs/wikidata.md Outdated
[here](https://docs.docker.com/install/linux/docker-ce/ubuntu/).

To download QLever we will clone the `git` repository from GitHub. As we
create the QLever index in a subfolder of the repository in this tutorial, **make
Member

I think the "in this tutorial" needs to either be moved to after the "As we", or simply be omitted (the context should be clear).
make sure -> you should make sure that you
of available space -> of space available on the drive on which you execute...

docs/wikidata.md Outdated
To download QLever we will clone the `git` repository from GitHub. As we
create the QLever index in a subfolder of the repository in this tutorial, **make
sure you have about 2 TB of available space** where you execute the following
steps. Alternatively you can see the full
Member

Alternatively -> Alternatively,

docs/wikidata.md Outdated
## Download and uncompress Wikidata

If you already downloaded **and decrompressed** Wikidata to uncompressed Turtle
format you can skip this step, otherwise we download and uncompress it.
Member

step, otherwise -> step. Otherwise
You could also simply remove the second sentence, as it is redundant, given that we just said you could skip this step if you already downloaded and uncompressed Wikidata.

docs/wikidata.md Outdated
[README](https://github.com/ad-freiburg/QLever#building-the-index) for
instructions on using a different path for the index.

**The index plus unpacked Wikidata will use up to about 2 TB.**
Member

I still think that turning this into a more fully formed sentence would help the text flow.

docs/wikidata.md Outdated
Now we can build a QLever Index from the `latest-all.ttl` Wikidata Turtle file.
For the process of building an index we can tune some settings to the particular
Knowledge Base. The most important of these is a list of relations which can safely be
stored on disk as their actual values are rarely accessed. For Wikidata these
Member

on disk -> on disk, (the comma is not required, but I think it helps structure the sentence)

@niklas88
Member Author

@floriankramer thank you for the great (as always) review. I've addressed your comments. I'm really looking forward to the new README. I also used the commands in the Wikidata Quickstart Guide for building the xsd:double index yesterday, so these definitely work.

Member

@floriankramer floriankramer left a comment

I commented on some minor things, but overall I think that this is ready to be merged.

README.md Outdated
@@ -55,8 +55,9 @@ Further documentation is available on the following topics

# Building the QLever Docker Container

We recommend using QLever with `docker` if you absolutely want to run QLever
directly on your host see [here](docs/native_setup.md).
We recommend using QLever with [docker](https://www.docker.com) if you
Member

if -> . If

a correspondingly long query time. In contrast, the query above takes only
about 100 ms on a standard Linux machine (with 16 GB memory) and a dataset with 360
million triples and 530 million text records.
by replacing the second triple with `?x ?predicate ?object` and add `DISTINCT`
Member

add -> adding

docs/wikidata.md Outdated
@@ -27,12 +29,14 @@ build the index under a different path.
## Download and uncompress Wikidata

If you already downloaded **and decrompressed** Wikidata to uncompressed Turtle
format you can skip this step, otherwise we download and uncompress it.
format you can skip this step. Otherwise we download and uncompress it.
Member

to uncompressed -> to the uncompressed
I'm probably just nitpicking at this point, but the "Otherwise we download and uncompress it." still sounds slightly weird to me. What about "Otherwise we'll download and uncompress it in this step."?

@niklas88 niklas88 merged commit 647bb7e into ad-freiburg:master Mar 19, 2019
@niklas88 niklas88 deleted the improve_readme branch October 1, 2019 15:53