Use new fields in lucene index to calculate most relevant locations #2

Closed
smadha opened this issue Nov 1, 2015 · 3 comments

@smadha
Contributor

smadha commented Nov 1, 2015

One possible implementation:

  1. Define a sort order for "feature class" and sort on it. One possible order is A P S T L H R V U.
  2. Within each "feature class", define a sort order for "feature code" and sort on it.
  3. Then sort by population within the same "feature class" and "feature code".
  4. We need to modify our edit distance calculation because "China" is stored as "People's Republic of China", and so on. I propose the following, based on my observation of the "alternatenames" field.
    • "alternatenames" contains a comma-separated list of all pronunciations and synonyms for that location.
    • We split "alternatenames" and calculate the edit distance against each of the names, for each location in our sorted list:
      • If it's an exact match, add this location to the result list.
      • If not, store all the edit distances and assign a weight to each "feature code" to prioritize bigger land masses, then add those locations at the end of the results.

@chrismattmann
Can we take one extra parameter to define the number of results that lucene-geo-gazetteer should return?
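
Step 4 and the result-count parameter could look roughly like the sketch below. It is a Python illustration under stated assumptions, not the project's implementation: `resolve`, `count`, the dictionary layout, and the feature-code weights are all hypothetical, and ordering inexact matches by distance first and weight second is only one way to read the proposal.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical weights: a lower weight means a bigger land mass.
FEATURE_CODE_WEIGHT = {"PCLI": 0, "ADM1": 1, "PPLC": 2, "PPL": 3}
DEFAULT_WEIGHT = 10


def resolve(query, candidates, count=1):
    """Return up to `count` candidates, exact name matches first.

    Each candidate is a dict with "name", "alternatenames" (comma-separated
    string) and "feature_code", already in the preferred sort order.
    """
    q = query.strip().lower()
    exact, fuzzy = [], []
    for loc in candidates:
        names = [loc["name"]] + [n for n in loc["alternatenames"].split(",") if n]
        best = min(edit_distance(q, n.strip().lower()) for n in names)
        if best == 0:
            exact.append(loc)
        else:
            weight = FEATURE_CODE_WEIGHT.get(loc["feature_code"], DEFAULT_WEIGHT)
            fuzzy.append((best, weight, loc))
    # Assumption: closest name first, then the bigger land mass on ties.
    fuzzy.sort(key=lambda t: (t[0], t[1]))
    return (exact + [loc for _, _, loc in fuzzy])[:count]
```

With this shape, a query of "China" matches a record named "People's Republic of China" exactly, because "China" appears among its alternate names, and `count` caps how many locations come back.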

@chrismattmann
Owner

+1 @smadha

@chrismattmann
Owner

Note that we should then expose this in Tika, so we will need an associated JIRA issue there.

chrismattmann added this to the Release-0.2 milestone Nov 4, 2015
chrismattmann self-assigned this Nov 4, 2015
chrismattmann added a commit that referenced this issue Nov 9, 2015
#2 Using new fields in lucene index to return more known locations
@chrismattmann
Owner

done in #5
