Range query support #1

Open
HouzuoGuo opened this issue Jun 28, 2013 · 29 comments

@HouzuoGuo
Owner

Currently tiedot only supports a hash table index; it would be very helpful to add another index type to support range queries.

@ifq

ifq commented Aug 8, 2013

+1
I was going to ask this question. I'm a key-value db newbie; I thought there was a way to do range queries, but it turns out they are not supported yet :)

@HouzuoGuo
Owner Author

Recently I have been practicing Scala; I will start working on more tiedot stuff when I have a bit more free time C:

@HouzuoGuo
Owner Author

btw tiedot isn't quite a key-value db, it is more like a conventional nosql db.

your hovering ability is cool.

@ifq

ifq commented Aug 8, 2013

well, you need enough lights and faster shutter:)

I'm not good at db stuff, and I'm looking for an embedded db for my small project. So what's your recommendation, tiedot or leveldb? (I'm probably not asking the right man :)

@HouzuoGuo
Owner Author

It probably depends on your use case and how big a fan of Go you are.
LevelDB has proven performance and reliability, while tiedot is a spare-time pet project (although it was made with utmost seriousness).

@ifq

ifq commented Aug 8, 2013

I need to store info on about 50000 image files, such as filename, size, location, tag, time, etc., and query by tag, time, etc.
it's a practice project for me.

@HouzuoGuo
Owner Author

which language is it in?

@ifq

ifq commented Aug 8, 2013

In Go. I chose Go because it can set up an HTTP server inside the program, has no dependency issues, and seems fast.
I want everything compact in my app, so that it is easy to deploy.
I just realized LevelDB is not a Go program; what I meant to ask about was the choice between leveldb-go and tiedot.

@HouzuoGuo
Owner Author

I am aware of leveldb's implementation in Go. It depends on your preference. leveldb is a key-value store, so your data may be stored in these maps:

(filename => image), (image => size), (tag => image), (image => time).

If you choose to use tiedot, you may store the entire image metadata in one document, similar to:

{"image": "~/png", "size": 1024, "tags": ["friend", "family"], "location": {"country": "CN"}}

And then put indexes on image and tags.

Two different paradigms, I think both of them should work for you.

@ifq

ifq commented Aug 8, 2013

tiedot's way looks good.
If the images are spread across different folders, and I want to list every folder together with the number of images inside it, how should I implement that?
Should I create another collection, or just insert another kind of document into the same collection that holds the image info? A document like:

 {"folder":"path/to/dir", "amount":99}

@HouzuoGuo
Owner Author

The easiest way is...

find /path/to/dir -name '*.jpg' | wc -l

But if you prefer to think in NoSQL, see if this works: each document in the collection "library" represents an image, and the document itself holds the file path information (let's make it absolute).

Now we want to count the number of images (documents) under a path. The problem is that a path is hierarchical, so we have to find a way to index every level of an absolute path. Therefore, let us index all the directories that lead to the image and put them into a vector. For example, given the image /home/howard/pix/1.jpg, the document will look like:

{"dirs": ["/", "/home", "/home/howard", "/home/howard/pix"], "abspath": "/home/howard/pix/1.jpg"}

Put an index on dirs, and the image will appear in the search results of dirs eq /home, dirs eq /home/howard, etc.
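A minimal Go sketch of building that dirs vector from an absolute path. The helper name ancestorDirs is hypothetical and not part of tiedot; only the standard library is used, and inserting the document into a collection is not shown.

```go
package main

import (
	"fmt"
	"path"
)

// ancestorDirs returns every directory that leads to the file at absPath,
// ordered from the root down to the file's immediate parent.
// Hypothetical helper for illustration; it is not a tiedot function.
func ancestorDirs(absPath string) []string {
	var dirs []string
	// Start at the parent directory and walk upwards until the root.
	for dir := path.Dir(path.Clean(absPath)); ; dir = path.Dir(dir) {
		dirs = append([]string{dir}, dirs...) // prepend so the root comes first
		if dir == "/" || dir == "." {
			break
		}
	}
	return dirs
}

func main() {
	abs := "/home/howard/pix/1.jpg"
	// The document shape follows the example in the comment above.
	doc := map[string]interface{}{
		"dirs":    ancestorDirs(abs),
		"abspath": abs,
	}
	fmt.Println(doc["dirs"])
}
```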

@ifq

ifq commented Aug 8, 2013

What I need is a list of all the folders that contain images, with the image count for each, not the image count of one given folder.

My app is an HTTP server for displaying images, so I want to show all the folders and subfolders on one page. I can walk through the folders in the program, but I want to save the result for later use.

@HouzuoGuo
Owner Author

How many concurrent users do you want to support?

If not many, and your metadata collection isn't too big, then a collection scan (the method above) may not be a bad idea.

But if you have hundreds of concurrent users and the metadata collection is not sharded, then latency could lead to bad UX.
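As a rough sketch of the scan approach for the per-folder listing, assuming the absolute paths have already been read out of the documents (how they are fetched from tiedot is not shown, and countPerFolder is a hypothetical name):

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// countPerFolder visits every absolute path once and tallies how many
// images sit directly inside each folder.
func countPerFolder(absPaths []string) map[string]int {
	counts := make(map[string]int)
	for _, p := range absPaths {
		counts[path.Dir(p)]++
	}
	return counts
}

func main() {
	// Hypothetical sample data standing in for the documents of a collection.
	paths := []string{
		"/home/howard/pix/1.jpg",
		"/home/howard/pix/2.jpg",
		"/home/howard/cats/3.jpg",
	}
	counts := countPerFolder(paths)

	// Sort the folder names so the listing is stable between runs.
	folders := make([]string, 0, len(counts))
	for f := range counts {
		folders = append(folders, f)
	}
	sort.Strings(folders)
	for _, f := range folders {
		fmt.Printf("%s: %d\n", f, counts[f])
	}
}
```

The cost is one pass over the whole collection per listing, which is why it only suits a small collection with few concurrent users.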

@ifq

ifq commented Aug 9, 2013

less than 10 users. I'll try it. thank you very much~

@ifq

ifq commented Aug 13, 2013

Hi, is there any chance you will add the range query feature soon?

It seems I need this feature very much :) Otherwise, I don't know how to select data by time range. Iterate over all documents and check the time field manually?

@HouzuoGuo
Owner Author

Hello!

Recently I shifted my attention to Scala, check out my project "Schale". It has made its way to first release and now I can do some more Golang...

Range query support will definitely be the next major feature, together with new query syntax (the current query syntax is very ugly).

How granular are your time range queries (by month/day/hour)?

@ifq

ifq commented Aug 14, 2013

that's good news.

I need to query by day, or maybe by integer range.

@HouzuoGuo
Owner Author

It may take a little while to add range index support. But speaking of range "queries": given that your queries work with discrete integer values over a small range, how about I make a feature that does hash lookups over a range of values?

For example, finding photos taken between February and May is merely a hash table lookup of month = 2, 3, 4 and 5.
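A minimal sketch of that idea, using an in-memory Go map as a stand-in for the hash table index. The hashIndex type and its methods are hypothetical and do not reflect tiedot's internals.

```go
package main

import "fmt"

// hashIndex maps an integer field value to the IDs of the documents holding it.
// Hypothetical stand-in for a hash table index.
type hashIndex map[int][]int

// put records that the document docID has the given field value.
func (idx hashIndex) put(value, docID int) {
	idx[value] = append(idx[value], docID)
}

// lookupRange performs one hash lookup per integer in [from, to]
// and concatenates the results.
func (idx hashIndex) lookupRange(from, to int) []int {
	var ids []int
	for v := from; v <= to; v++ {
		ids = append(ids, idx[v]...)
	}
	return ids
}

func main() {
	byMonth := hashIndex{}
	byMonth.put(1, 100) // January
	byMonth.put(2, 101) // February
	byMonth.put(4, 102) // April
	byMonth.put(6, 103) // June

	// Photos taken between February and May: lookups for 2, 3, 4 and 5.
	fmt.Println(byMonth.lookupRange(2, 5))
}
```

The number of lookups grows with the width of the range, so this suits months or days far better than raw timestamps.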

@ifq

ifq commented Aug 14, 2013

That might help too. In that case, my document format would be:

 {"year": "2013", "month": 3, "day":"2", "tags": ["friend", "family"],...}

right?

this could be a temporary solution.

@HouzuoGuo
Owner Author

Sounds good. Let's go ahead and support this simple range query first.

@ghost ghost assigned HouzuoGuo Aug 16, 2013
@HouzuoGuo
Owner Author

Hey buddy.

The new query processor supports the new range lookup feature, together with a totally redesigned syntax.

Please check out the latest master branch and give API v2 a try by running tiedot with -mode=v2.

I have not yet completed the new API documentation, but here's a glimpse:

  • Lookup {"eq": "the_value", "in": ["path_segment1", "segment2"]}
  • Value exists {"has": ["path_segment1", "segment2"]}
  • Get all docs "all"
  • Union [query1, query2, etc]
  • Intersect {"n": [query1, query2, etc]}
  • Complement {"c": [query1, query2, etc]}
  • Range lookup {"int-from": 1, "int-to": 12, "in": ["path_segment1", "path_segment2"]}

The new syntax should be a lot cleaner, and benchmarks show that the new query processor is consistently 5% faster than the old one.

How does this look?

@ifq

ifq commented Aug 18, 2013

Yes, it's better than the v1 syntax. I'll try it out.

Is this range lookup implemented the way you described before, or is it already a real range query?

@HouzuoGuo
Owner Author

Yes, it is implemented as described before.
"range lookup" uses the hash table and only supports integers.

HouzuoGuo pushed a commit that referenced this issue Nov 7, 2013
Support regular expression query. It's based on collection scan
@HouzuoGuo HouzuoGuo mentioned this issue Jan 24, 2014
@HouzuoGuo
Owner Author

Remember to add query result ordering options as well.

@kenkeiter

@HouzuoGuo How would I go about implementing reverse result ordering? I.e. get me the last 30 items inserted.

Also, I have a better ID generation method for you :) I'll submit a PR in the near future.

@HouzuoGuo
Owner Author

@kenkeiter Thank you very much, I look forward to it.

Result ordering has very limited support at the moment, and getting the latest 30 docs cannot be done easily. We will introduce a proper range index in the future; stay tuned.

omeid added a commit that referenced this issue Nov 8, 2014
@gibsonsyd

Any news on new range query types? I think ordering ASC/DESC by time.Time or timestamp will be a useful feature. int range queries seem a little inefficient?

@HouzuoGuo
Owner Author

tiedot uses a hash function to partition data, which makes range queries fairly difficult to implement. Integer-range lookup should be sufficient for some common usage scenarios.
So far I do not have a good idea for implementing range queries, sorry. They would certainly be a nice thing to have.
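For contrast, here is a sketch of why an ordered structure makes range queries cheap: with keys kept sorted, a range is found by two binary searches, however wide it is. This is a generic illustration over a sorted Go slice, not a proposal for tiedot's design, and rangeOf is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeOf returns the keys k with from <= k <= to from a slice sorted in
// ascending order, using two binary searches instead of one lookup per value.
func rangeOf(sortedKeys []int, from, to int) []int {
	lo := sort.SearchInts(sortedKeys, from) // first index with key >= from
	hi := sort.SearchInts(sortedKeys, to+1) // first index with key > to
	return sortedKeys[lo:hi]
}

func main() {
	// Hypothetical Unix timestamps, kept in ascending order.
	keys := []int{1375920000, 1376006400, 1376352000, 1376438400}
	fmt.Println(rangeOf(keys, 1376000000, 1376400000))
}
```

A hash-partitioned layout scatters neighbouring keys, so it cannot offer this kind of contiguous slice without an additional ordered index.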

@guileen

guileen commented Dec 19, 2015

I think the most important thing is the id: incremental ids, automatic ordering by id, and id range queries like {id: {gte: '1234567', lte: '2345678'}}
