Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements Geosearch #505

Merged
merged 29 commits into from
Oct 12, 2023
Merged

Implements Geosearch #505

merged 29 commits into from
Oct 12, 2023

Conversation

micheleriva
Copy link
Member

@micheleriva micheleriva commented Oct 1, 2023

This PR aims to implement geolocation-based search (as requested in #212).

The initial design for the APIs is the following:

import { search, create, insert } from '@orama/orama'

const db = await create({
  name: 'string',
  position: 'geopoint'
})

await insert(db, {...})
await insert(db, {...})
await insert(db, {...})

await search(db, {
  term: 'duomo',
  where: {
    position: { // the name of the property
      radius: {
        coordinates: { lon: 11.012345, lat: 42.012345 },
        unit: 'km',
        value: 100,
        inside: false // optional, default is true
      }
    }
  }
})

await search(db, {
  term: 'duomo',
  where: {
    position: { // the name of the property
      polygon: {
        coordinates: [ { lon: 11.019283, lat: 42.019283 }, { lon: 11.0192839, lat: 42.019283 }, ...],
        inside: false // optional, default is true
      }
    }
  }
})

@vercel
Copy link

vercel bot commented Oct 1, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
orama-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 12, 2023 6:10pm

@micheleriva micheleriva changed the title adds basic bkd tree Implements Geosearch Oct 2, 2023
packages/orama/src/trees/bkd.ts Outdated Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Outdated Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Outdated Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Outdated Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Outdated Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Outdated Show resolved Hide resolved
@micheleriva
Copy link
Member Author

@H4ad thanks a lot for the early review, I'm still writing/refactoring most of the code, so it's expected to see things that are not super optimized yet 🙏

I'll make sure to ask you for a review as soon as the PR is ready :)

@H4ad
Copy link
Contributor

H4ad commented Oct 2, 2023

Just a little comment about a micro-optimization, probably using a tuple will use less memory than having an object to store lat and lon

@micheleriva
Copy link
Member Author

Just a little comment about a micro-optimization, probably using a tuple will use less memory than having an object to store lat and lon

Yes I'm using an object during development as the property names helps me read the code easily!

@allevo
Copy link
Collaborator

allevo commented Oct 2, 2023

My idea for DX:

import { create, insert } from '@orama/orama'

const db = await create({
  schema: {
    name: 'string',
    city: 'string',
    center: 'geopoint[2]' // 2 dim points
  },
})
// https://datatracker.ietf.org/doc/html/rfc7946#section-3.1.2
await insert(db, {
  name: 'Duomo',
  city: 'Milan',
  // https://datatracker.ietf.org/doc/html/rfc7946#section-3.1.1
  center: [9.191383, 45.464211]
})

const resultsByRadius = await search(db, {
  term: 'Duomo',
  where {
    center: {
      byRadius: {
        center: [11.012345, 42.012345],
        radius: {
          unit: 'km',
          value: 100
        },
        keep: 'inner', // or 'outer'
      }
  }
})

const resultsByPolygon = await search(db, {
  term: 'Duomo',
  where {
    center: {
      byPolygon: {
        coordinates: [ [11.019283, 42.019283], [11.0192839, 42.019283 ], ...],
        keep: 'inner', // or 'outer'
      }
  }
})

I prefer to use:

  • geopoint[K] to indicate which dimension we are working in
  • think to use GeoJson over other formats.
  • use an object to represent radius to avoid string parse inside Orama
  • where where possible: we want to filter the results by other values as we did for numbers / booleans / enums etc...
  • avoid inclusive that seems like > vs >=

@micheleriva
Copy link
Member Author

@allevo I reviewed your comment, I'd propose a compromise between your proposal and mine.

  1. Every orama instance should have only one geo property. We can always increase this limit later.
  2. You can use whatever name you want for your geo property.
  3. We can use a tuple for storing longitude and latitude (as in RFC7946), even though objects give a better developer experience IMHO.
  4. We don't need to specify the number of geopoints, right now we only support 2, and we default to that. Eventually, we will be able to introduce a type parameter for that.
  5. Following the previous point, we might also want to support geo[point], geo[point:3], and geo[multiPoint] in the future. Maybe in a following major release.

The API might look like this:

import { search, create, insert } from '@orama/orama'

const db = await create({
  name: 'string',
  coordinates: 'geopoint'
})

await insert(db, {...})
await insert(db, {...})
await insert(db, {...})

await search(db, {
  term: 'duomo',
  where: {
    geo: {
      radius: {
        coordinates: [11.012345, 42.012345],
        unit: 'km', // defaults to 'm'
        value: 100,
        inside: false // optional, default is true
      }
    }
  }
})

await search(db, {
  term: 'duomo',
  where: {
    geo: {
      polygon: {
        coordinates: [ [11.019283, 42.019283], [11.0192839, 42.019283 ], ...],
        unit: 'km', // defaults to 'm'
        value: 100,
        inside: false // optional, default is true
      }
    }
  }
})

That way we could reuse most of the code for polygon and radius search. What do you think?

@allevo
Copy link
Collaborator

allevo commented Oct 3, 2023

Another challenge.

  • Return to the object annotation for coordinates: few people know it, and I would like to reduce the issue of coordinates ordering (the first is the longitude, the second is latitude, anti-intuitive IMHO). Also, typescript will support the DX, suggesting the correct type. Also, other RFCs (https://www.rfc-editor.org/rfc/rfc5870.html#section-6) use the inverse representation as a "list of numbers".
  • We can implement geopoint[2] not for the generic geopoint[K] but just with that dimension:
    • we are explicitly saying we support only 2 dimension
    • we can implement the generic later.
  • I don't like the geo[point] annotation because we can internally use different structures to index different geo types.
  • I don't know if the unit should have a default. Why m is better than km? We should consider that a sort radius of the search in the geospatial world is meaningless: you can approximate it as Euclidean space. I prefer to keep the mandatory values:
    • coordinates
    • unit
    • value

So something like

import { search, create, insert } from '@orama/orama'

const db = await create({
  name: 'string',
  position: 'geopoint[2]'
})

await insert(db, {...})
await insert(db, {...})
await insert(db, {...})

await search(db, {
  term: 'duomo',
  where: {
    position: { // the name of the property
      radius: {
        coordinates: { lon: 11.012345, lat: 42.012345 },
        unit: 'km',
        value: 100,
        inside: false // optional, default is true
      }
    }
  }
})

await search(db, {
  term: 'duomo',
  where: {
    position: { // the name of the property
      polygon: {
        coordinates: [ { lon: 11.019283, lat: 42.019283 }, { lon: 11.0192839, lat: 42.019283 }, ...],
        unit: 'km',
        value: 100,
        inside: false // optional, default is true
      }
    }
  }
})

@micheleriva
Copy link
Member Author

@allevo looks almost good to me. I just don't like the geopoint[2] type, I'd really prefer just using geopoint right now. We will eventually support 3D worlds but will see then how to deal with them at a type level. Right now I would only focus on 2D dimensions.

I agree with coordinates ordering, but internally, we will always store them as tuples to save some space (especially when serializing the db).

I think I'll start implementing the last API design you shared, then it will be easy to find a compromise on the geopoint type anyway 🙂

@micheleriva micheleriva marked this pull request as ready for review October 12, 2023 09:42
@micheleriva micheleriva requested review from H4ad and allevo and removed request for H4ad October 12, 2023 09:44
@micheleriva
Copy link
Member Author

I now only need to write the geopoint deletion function, but that's overall good for review. @H4ad , @allevo

packages/orama/src/components/defaults.ts Outdated Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Show resolved Hide resolved
packages/orama/src/utils.ts Outdated Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Show resolved Hide resolved
packages/orama/src/utils.ts Outdated Show resolved Hide resolved
packages/orama/tests/search.geo.test.ts Outdated Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Show resolved Hide resolved
packages/orama/src/trees/bkd.ts Outdated Show resolved Hide resolved
reqOperation = 'polygon'
} else {
throw new Error(`Invalid operation ${operation}`)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the search for both operations is not yet supported. We should throw an error.

radius: {
coordinates: Point,
value: number,
unit?: GeosearchDistanceUnit,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to use a default here.

Co-authored-by: Vinicius Lourenço <12551007+H4ad@users.noreply.github.com>
@micheleriva micheleriva merged commit 3bf3258 into main Oct 12, 2023
3 checks passed
@micheleriva micheleriva deleted the feat/geolocation branch October 12, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants