sorting results #4

Open
eroux opened this issue Jun 5, 2017 · 24 comments

@eroux

eroux commented Jun 5, 2017

Results should be sorted (see google doc). A solution that seems reasonable is to use Intl.Collator with the bo locale, or maybe falling back to the dz locale for very old phones (the bo locale should work for 2 years or so).
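
A minimal sketch of the locale fallback described above, using only the standard `Intl.Collator` API. The helper name `makeTibetanCollator` is hypothetical, and whether `bo` or `dz` is actually available depends on the locale data bundled with the browser or phone.

```javascript
// Hypothetical helper: pick the first Tibetan-capable locale the runtime supports.
function makeTibetanCollator(preferred = ['bo', 'dz']) {
  // supportedLocalesOf returns the subset of `preferred` that the runtime
  // can collate without falling back to its default locale.
  const supported = Intl.Collator.supportedLocalesOf(preferred);
  const chosen = supported.length > 0 ? supported[0] : null;
  return {
    locale: chosen, // null means no Tibetan collation data was found
    collator: new Intl.Collator(chosen === null ? undefined : chosen),
  };
}

const { locale, collator } = makeTibetanCollator();
console.log('collating with:', locale === null ? 'runtime default (no bo/dz)' : locale);

// collator.compare is a bound function, so it can be passed straight to sort().
const sorted = ['ཅ', 'ང', 'ཀ'].sort(collator.compare);
console.log(sorted);
```

Even when a locale is reported as supported, the resulting order still has to be checked against a known-good list, since the underlying collation data may be incomplete.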

As a side note, there is a request to improve Tibetan collation in CLDR with the work I did in this repo.

I'm running some experiments on http://eroux.fr/locales.html. It doesn't seem to work, and I cannot understand why. I'll file some bugs, ask questions, etc., and update the issue. In the meantime, I'm interested in the results of the page in your browsers and on your phones!

@eroux
Author

eroux commented Jun 5, 2017

@eroux
Author

eroux commented Jun 5, 2017

@eroux
Author

eroux commented Jun 5, 2017

@subsystem7
Collaborator

Supported locales: dz, bo
Sorting Tibetan... error: ["ལྔ","ང","ཅ","རྔ","སྔ","བརྔ","བསྔ"] has been sorted as ["ང","ཅ","བརྔ","བསྔ","རྔ","ལྔ","སྔ"], should have been ["ང","རྔ","ལྔ","སྔ","བརྔ","བསྔ","ཅ"]
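
For reference, a report like the one above could be produced by a check along these lines. This is only a sketch: the function name `checkSort` is hypothetical and the actual code of the test page may differ. The input and expected lists are copied from the report.

```javascript
// Hypothetical check: sort `input` with `compare` and report whether the
// result matches the order we expect.
function checkSort(input, expected, compare) {
  const actual = [...input].sort(compare); // copy so the input is not mutated
  const ok = actual.length === expected.length &&
    actual.every((s, i) => s === expected[i]);
  return { ok, actual };
}

// Test data copied from the report above.
const input = ["ལྔ","ང","ཅ","རྔ","སྔ","བརྔ","བསྔ"];
const expected = ["ང","རྔ","ལྔ","སྔ","བརྔ","བསྔ","ཅ"];

const supported = Intl.Collator.supportedLocalesOf(['dz', 'bo']);
console.log('Supported locales: ' + supported.join(', '));

// If 'bo' is not available, Intl.Collator silently falls back to the default locale.
const result = checkSort(input, expected, new Intl.Collator('bo').compare);
console.log(result.ok
  ? 'Sorting Tibetan... ok'
  : 'Sorting Tibetan... error: sorted as ' + JSON.stringify(result.actual));
```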

@subsystem7
Collaborator

iPhone 7 is 10.3.3 beta 1

@eroux
Author

eroux commented Jun 5, 2017

OK, in addition to following these bug reports (plus this Debian one), I'll find a workaround so that we can sort Tibetan (we'll need it in other contexts too anyway).

@subsystem7
Collaborator

Why do we need to sort results for this app? Is this the equivalent of sort alpha? For large search results on weaker phones, this will slow performance. If the indexes were pre-sorted, the results might already come back sorted. Another way to do this is to assign a presort number across all indexes so that I can do a simple numeric sort at the database level. It would increase the index sizes slightly, but it would give good performance and let the sort code live on the server (a more controlled environment) rather than the client.
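
The presort-number idea can be sketched as follows. The function names and the `sortRank` field are hypothetical, and the comparator shown is a placeholder rather than real Tibetan collation; the point is only that the expensive comparison runs once when the indexes are built.

```javascript
// Build time (run once when the indexes are generated): give every distinct
// title an integer rank using whatever collation-aware comparator is available.
function assignSortRanks(titles, compare) {
  const ordered = [...new Set(titles)].sort(compare); // dedupe, then sort once
  const rank = new Map();
  ordered.forEach((title, i) => rank.set(title, i));
  return rank;
}

// Client side: results carry the precomputed rank, so ordering them is a
// cheap numeric sort that needs no collation code at all.
function sortByRank(results) {
  return [...results].sort((a, b) => a.sortRank - b.sortRank);
}

// Placeholder comparator (plain code-unit order, NOT Tibetan collation).
const placeholderCompare = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

const rank = assignSortRanks(['pear', 'apple', 'fig'], placeholderCompare);
const results = ['pear', 'fig', 'apple'].map((title) => ({ title, sortRank: rank.get(title) }));
console.log(sortByRank(results).map((r) => r.title)); // [ 'apple', 'fig', 'pear' ]
```

One trade-off of this design is that the ranks must be regenerated whenever titles are added, because a new title can fall between two existing ranks.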

@eroux
Author

eroux commented Jun 5, 2017

What does "sort alpha" mean? If it means alphabetical sorting, then yes, and that's necessary...

We can imagine doing that on the server but that's going to take some time (Tibetan sorting is not obvious at all), so let's first give it a try on the client.

lasca is a sort of reimplementation of the UCA (Unicode Collation Algorithm) in JS. It has some rules for Tibetan, so we don't even need to write them down...

I've forked it so that we can package it properly, but I admit I'm not completely sure how to do that, so I could use a little help... What would be needed to use it in a simple way in the app?

@subsystem7
Collaborator

To sort after the results have been returned from the database, the library would work best as a module that I can import (ES6 syntax). If you can get it working on a JavaScript page by including a script, I can take care of the rest. Packaging can be as easy as exporting the object, class, etc. that does the work. I have no experience with UCA or lasca, so I am glad you do!

@eroux
Author

eroux commented Jun 5, 2017

I've added a demo.html to the repo. I'll update the real_tibetan algorithm with the complete collation data, but the tests seem to run fine so far!

@eroux
Author

eroux commented Jun 5, 2017

(done)

@eroux
Author

eroux commented Jun 5, 2017

BTW, don't hesitate to modify the lasca.js file to make it export something nice and modern. I suspect it's doing something not very intelligent right now, but modules in JS are not really my comfort zone... (it looks like something that is always evolving in many different directions)

@subsystem7
Collaborator

Great, thanks, Élie!

@subsystem7 subsystem7 self-assigned this Jun 5, 2017
@subsystem7
Collaborator

There appears to be a problem with sort. Maybe you can help me, @eroux ?

Here is a list of 20 titles,

ཀུན་མཁྱེན་རིག་པ་འཛིན་པ་ཆེན་པོ་ཆོས་ཀྱི་གྲགས་པའི་གསུང་འབུམ
བཀའ་འགྱུར ༼སྡེ་དགེ་པར་ཕུད༽
གསུང་འབུམ། པདྨ་དཀར་པོ
རྒྱལ་དབང་ཀརྨ་པ་ཆེན་པོ་བཅོ་ལྔ་པའི་གསུང་འབུམ།
བཀའ་འགྱུར ༼སྟོག་ཕོ་བྲང་བྲིས་མ།)
རྒྱལ་དབང་མཁའ་ཀྱབ་རྡོ་རྗེའི་བཀའ་འབུམ།
གསུང་འབུམ། མཁའ་ཁྱབ་རྡོ་རྗེ། (དཔལ་སྤུངས་པར་མ།)
ཆོས་རྗེ་དཔལ་ལྡན་སྒམ་པོ་བ་ཆེན་པོའི་རྣམ་པར་ཐར་པ་ཡིད་བཞིན་གྱི་ནོར་བུ་རིན་པོ་ཆེ་ཀུན་ཁྱབ་སྙན་པའི་བ་དན་ཐར་པ་རིན་པོ་ཆེའི་རྒྱན་གྱི་མཆོག
འདུལ་ཊཱིཀ་རིན་ཆེན་འཕྲེང་བ
མཚོ་སྔོན་མི་རིགས་དཔེ་སྐྲུན་ཁང
འབྲུག་དཀར་པོ།
དེལྷི་ཀརྨཔཨེ་ཆོདྷེཡ་གྱལྭཨེ་སུངྲབ་པརྟུན་ཁང
ལེགས་པར་གསུངས་པའི་དམ་པའི་ཆོས་འདུལ་བ་མཐའ་དག་གི་སྙིང་པོའི་དོན་ལེགས་པར་བཤད་པ་རིན་པོ་ཆེའི་འཕྲེང་བ
འབྲུག་རྒྱལ་ཁབ་ཀྱི་ཆོས་སྲིད་གནས་སྟངས། འབྲུག་དཀར་པོ།
དམ་ཆོས་འདུལ་བ་མདོ་རྩ།
མཉམ་མེད་སྒམ་པོ་པའི་རྣམ་ཐར
དེལྷི
ཟི་ལིང
ལ་དྭགས་ཏོག་མུ་མཁར་བཞུགས་པའི་བོད་ཡིག་བཀའ་འགྱུར་བྲིས་མ།

As a list of objects:

        let unsorted = [{"nodeId":"W22082-W22082","title":"ཀུན་མཁྱེན་རིག་པ་འཛིན་པ་ཆེན་པོ་ཆོས་ཀྱི་གྲགས་པའི་གསུང་འབུམ","type":"Work"},{"nodeId":"W22084","title":"བཀའ་འགྱུར ༼སྡེ་དགེ་པར་ཕུད༽","type":"Work"},{"nodeId":"W22080","title":"གསུང་འབུམ། པདྨ་དཀར་པོ","type":"Work"},{"nodeId":"W22081-W22081","title":"རྒྱལ་དབང་ཀརྨ་པ་ཆེན་པོ་བཅོ་ལྔ་པའི་གསུང་འབུམ།","type":"Work"},{"nodeId":"W22081-W22081","title":"རྒྱལ་དབང་ཀརྨ་པ་ཆེན་པོ་བཅོ་ལྔ་པའི་གསུང་འབུམ།","type":"Work"},{"nodeId":"W22083","title":"བཀའ་འགྱུར ༼སྟོག་ཕོ་བྲང་བྲིས་མ།)","type":"Work"},{"nodeId":"W22081-W22081","title":"རྒྱལ་དབང་མཁའ་ཀྱབ་རྡོ་རྗེའི་བཀའ་འབུམ།","type":"Work"},{"nodeId":"W22081","title":"གསུང་འབུམ། མཁའ་ཁྱབ་རྡོ་རྗེ། (དཔལ་སྤུངས་པར་མ།)","type":"Work"},{"nodeId":"W22088-W22088","title":"ཆོས་རྗེ་དཔལ་ལྡན་སྒམ་པོ་བ་ཆེན་པོའི་རྣམ་པར་ཐར་པ་ཡིད་བཞིན་གྱི་ནོར་བུ་རིན་པོ་ཆེ་ཀུན་ཁྱབ་སྙན་པའི་བ་དན་ཐར་པ་རིན་པོ་ཆེའི་རྒྱན་གྱི་མཆོག","type":"Work"},{"nodeId":"W22086","title":"འདུལ་ཊཱིཀ་རིན་ཆེན་འཕྲེང་བ","type":"Work"},{"nodeId":"W22088-W22088","title":"མཚོ་སྔོན་མི་རིགས་དཔེ་སྐྲུན་ཁང","type":"Work"},{"nodeId":"W22085","title":"འབྲུག་དཀར་པོ།","type":"Work"},{"nodeId":"W22084-W22084","title":"དེལྷི་ཀརྨཔཨེ་ཆོདྷེཡ་གྱལྭཨེ་སུངྲབ་པརྟུན་ཁང","type":"Work"},{"nodeId":"W22086-W22086","title":"ལེགས་པར་གསུངས་པའི་དམ་པའི་ཆོས་འདུལ་བ་མཐའ་དག་གི་སྙིང་པོའི་དོན་ལེགས་པར་བཤད་པ་རིན་པོ་ཆེའི་འཕྲེང་བ","type":"Work"},{"nodeId":"W22085-W22085","title":"འབྲུག་རྒྱལ་ཁབ་ཀྱི་ཆོས་སྲིད་གནས་སྟངས། འབྲུག་དཀར་པོ།","type":"Work"},{"nodeId":"W22087","title":"དམ་ཆོས་འདུལ་བ་མདོ་རྩ།","type":"Work"},{"nodeId":"W22088","title":"མཉམ་མེད་སྒམ་པོ་པའི་རྣམ་ཐར","type":"Work"},{"nodeId":"W22084-W22084","title":"དེལྷི","type":"Work"},{"nodeId":"W22088-W22088","title":"ཟི་ལིང","type":"Work"},{"nodeId":"W22083-W22083","title":"ལ་དྭགས་ཏོག་མུ་མཁར་བཞུགས་པའི་བོད་ཡིག་བཀའ་འགྱུར་བྲིས་མ།","type":"Work"}];
        let sorted = lasca.sort(unsorted, 'title');

This throws an exception.

@subsystem7
Collaborator

and more basically:

        let unsortedStrings = ["ཀུན་མཁྱེན་རིག་པ་འཛིན་པ་ཆེན་པོ་ཆོས་ཀྱི་གྲགས་པའི་གསུང་འབུམ",
        "བཀའ་འགྱུར ༼སྡེ་དགེ་པར་ཕུད༽",
        "གསུང་འབུམ། པདྨ་དཀར་པོ",
        "རྒྱལ་དབང་ཀརྨ་པ་ཆེན་པོ་བཅོ་ལྔ་པའི་གསུང་འབུམ།",
        "བཀའ་འགྱུར ༼སྟོག་ཕོ་བྲང་བྲིས་མ།)",
        "རྒྱལ་དབང་མཁའ་ཀྱབ་རྡོ་རྗེའི་བཀའ་འབུམ།",
        "གསུང་འབུམ། མཁའ་ཁྱབ་རྡོ་རྗེ། (དཔལ་སྤུངས་པར་མ།)",
        "ཆོས་རྗེ་དཔལ་ལྡན་སྒམ་པོ་བ་ཆེན་པོའི་རྣམ་པར་ཐར་པ་ཡིད་བཞིན་གྱི་ནོར་བུ་རིན་པོ་ཆེ་ཀུན་ཁྱབ་སྙན་པའི་བ་དན་ཐར་པ་རིན་པོ་ཆེའི་རྒྱན་གྱི་མཆོག",
        "འདུལ་ཊཱིཀ་རིན་ཆེན་འཕྲེང་བ",
        "མཚོ་སྔོན་མི་རིགས་དཔེ་སྐྲུན་ཁང",
        "འབྲུག་དཀར་པོ།",
        "དེལྷི་ཀརྨཔཨེ་ཆོདྷེཡ་གྱལྭཨེ་སུངྲབ་པརྟུན་ཁང",
        "ལེགས་པར་གསུངས་པའི་དམ་པའི་ཆོས་འདུལ་བ་མཐའ་དག་གི་སྙིང་པོའི་དོན་ལེགས་པར་བཤད་པ་རིན་པོ་ཆེའི་འཕྲེང་བ",
        "འབྲུག་རྒྱལ་ཁབ་ཀྱི་ཆོས་སྲིད་གནས་སྟངས། འབྲུག་དཀར་པོ།",
        "དམ་ཆོས་འདུལ་བ་མདོ་རྩ།",
        "མཉམ་མེད་སྒམ་པོ་པའི་རྣམ་ཐར",
        "དེལྷི",
        "ཟི་ལིང",
        "ལ་དྭགས་ཏོག་མུ་མཁར་བཞུགས་པའི་བོད་ཡིག་བཀའ་འགྱུར་བྲིས་མ།"];

        unsortedStrings.sort(lasca.compare);
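
When a library exposes only a string comparator (as `lasca.compare` appears to in the snippet above), the list of objects can still be sorted by wrapping that comparator. The sketch below is an assumption about how this could be done, not lasca's own API: `byField` is a hypothetical helper, and `localeCompare` stands in for the Tibetan-aware comparator.

```javascript
// Hypothetical helper: adapt a string comparator so it sorts objects by one field.
function byField(field, compare) {
  return (a, b) => compare(a[field], b[field]);
}

// Placeholder comparator; in the app this would be the Tibetan-aware one.
const compareStrings = (a, b) => a.localeCompare(b);

// Two entries taken from the list above.
const sample = [
  { nodeId: 'W22088-W22088', title: 'ཟི་ལིང', type: 'Work' },
  { nodeId: 'W22084-W22084', title: 'དེལྷི', type: 'Work' },
];

// Array.prototype.sort sorts in place, so copy first if the original order matters.
const sortedSample = [...sample].sort(byField('title', compareStrings));
console.log(sortedSample.map((w) => w.nodeId));
```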

@eroux
Author

eroux commented Jun 21, 2017

Should be OK with the latest lasca commit.

@subsystem7 subsystem7 modified the milestone: Release 1 Jun 21, 2017
@subsystem7
Collaborator

It appears that the search now works!! But it is not fast enough, even on a laptop.

Search for 'W2': 5869 results, 15 seconds to sort on a MacBook Pro (2.9 GHz Intel Core i7) in Chrome
Search for 'W22': 623 results, 1.5 seconds to sort
Search for 'W220': 237 results, 1 second to sort
Search for 'W2208': 20 results, instant sort
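
As a rough back-of-the-envelope check on these timings, the cost per comparison can be estimated by assuming that a comparison sort performs about n * log2(n) comparisons. That figure is an approximation, not a measurement of what the browser's sort actually does.

```javascript
// Timings reported above: [number of results, seconds to sort].
const timings = [
  [5869, 15],
  [623, 1.5],
  [237, 1],
];

// Rough assumption: a comparison sort performs about n * log2(n) comparisons.
function estimateMsPerComparison(n, seconds) {
  const comparisons = n * Math.log2(n);
  return (seconds * 1000) / comparisons;
}

for (const [n, seconds] of timings) {
  const ms = estimateMsPerComparison(n, seconds);
  console.log(n + ' results: ~' + ms.toFixed(2) + ' ms per comparison');
}
```

Under that assumption each comparison costs roughly 0.2 to 0.5 ms, which suggests that the comparator itself, rather than the number of results, is where the time goes.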

I believe we need to think of a different method for sorting, perhaps one applied before the indices are provided to the app. We could assign an integer value that is valid across all types of index. This way, I could order results during the initial query.

An alternative option is to pre-sort in the app and assign this sort integer. Either way, this has now been relegated to the bottom of the Release 1 list.

@eroux
Author

eroux commented Jun 21, 2017

Wow, these numbers are clearly too big. I've spotted a few easy optimizations that can be made in lasca; I'll take some time tomorrow to implement them...

@eroux
Author

eroux commented Jun 22, 2017

I've made some optimizations and I'm trying to test them. I'll open a separate issue for that.

@eroux eroux mentioned this issue Jun 22, 2017
@subsystem7
Collaborator

Tried again with the latest lasca.js optimisations. Unfortunately, the exception reared its ugly head with the largest dataset. I did get numbers for the smaller sets, and they show no change in execution time. I am pushing sort to Release 2.

@subsystem7 subsystem7 modified the milestones: Release 2, Release 1 Jun 22, 2017
@eroux
Author

eroux commented Jun 22, 2017

Hmm, OK, I'll test it further tomorrow with a larger dataset (I just realized I could use the .json files from the repo, so I'll do that).

@eroux
Author

eroux commented Jun 22, 2017

(If you have a simple way to test it with the whole dataset, I'd be happy to fix the thing.)

@subsystem7
Collaborator

When I test, I use phonegap serve. I also work on a reduced dataset, where I include all of the index files but only a few selected detail JSON files.

@eroux
Author

eroux commented Jun 25, 2017

I got annoyed by lasca, so I wrote a small lib that performs much better: tibetan-sort-js. It is packaged in a fairly modern way, so it should be easy to use. Example:

        const tibetansort = require('tibetan-sort-js');

        // The keys of the index file are the strings to sort.
        const big = require('path/to/workIndex-0.json');
        const bigA = Object.keys(big);

        // Time the in-place sort with the library's comparator.
        const before = new Date();
        bigA.sort(tibetansort.compare);
        const after = new Date();

        console.log("sorted " + bigA.length + " strings in " + (after.getTime() - before.getTime()) + "ms");

output:

sorted 10001 strings in 418ms

which is reasonable I think... almost 100 times faster than lasca.

@subsystem7 subsystem7 removed their assignment May 4, 2021