Record Estimate in d/l pop-up is broken #349

Closed
dbloom opened this issue Sep 6, 2013 · 5 comments

Comments

@dbloom
Member

dbloom commented Sep 6, 2013

Aaron,

While checking on some search issues presented by the folks at UTEP, and then double checking things with Laura, I have discovered that the estimated number of records that VertNet says will be returned via a download is very inaccurate.

I've tested three institutions in Chrome and FF, and the results are the same in both browsers. I tested UTEP, PMNS, and GSU, using both full-text searches and the "Search this Publisher" button in the publishers list. The results were the same (although the downloaded files varied in size by about 1000k records depending upon whether I used the full-text or publisher search, which was expected).

For UTEP (Centennial Museum), VN estimated that I would be downloading ~217,743 records. This could be correct when using full-text search, but the file I d/l only includes ~54k records. When using the "Search this Publisher" option I get the same VN estimate, but I should only get 54,300, the number of records in VN via the UTEP IPT. The d/l file does in fact give me a data set with 54,300 records.
[image: utep_dlestimate]

For PMNS (Perot), VN estimates that I will be d/l'ing only 63 records using both full-text and publisher searches. There is no way this can be correct, since they have 7509 records in the IPT. The resulting d/l files from both search types provide only 63 records, which suggests that there is something else very wrong with the PMNS data set and/or the harvest of the PMNS IPT. Laura is documenting this one further in a separate issue; the two issues may be related.
[image: pmns_dlestimate] https://f.cloud.github.com/assets/942447/1097598/5b493c7a-1712-11e3-9841-6bd740210b8f.jpg

For GSU, VN estimates that my download will include ~12,957 records for both full-text and publisher searches. For both, I should have a minimum of 24,851 records per the GSU IPT. When I d/l the publisher search I get a record set with 24,851 records.
[image: gsu_dlestimate] https://f.cloud.github.com/assets/942447/1097597/5b47e848-1712-11e3-9d3b-80ec6d4788b1.jpg

Checking some other publishers at random, using the Search this Publisher button:

DMNS (Denver), Publisher search estimates ~71 records - should be at least 56k
KU, Publisher search estimates ~483,093 - should be at least 686k
NMMNH, publisher search estimates ~14,092, should be no more than 6211

I know the estimated record counts, particularly the "1-20 of thousands" type, are presented that way to improve performance, that there is going to be some play in those numbers, and I totally understand the cost/benefit trade-off. When a user is ready to d/l a record set, however, I think VN needs to be much more accurate in its estimate, even if it takes a few seconds more for the pop-up window to appear. My concern is that if users see that they are going to get ~217k records back from my UTEP search and they only get back 54k in the download, or they expect ~483k from KU and get back 686k in the download, users are likely to think one or more of the following: (1) that their computer/Excel/text reader/browser is not functioning properly, (2) that there is an error in the query that they can't fix, and/or (3) VertNet and the data returned via search are not reliable, so they'd better go to GBIF to get the data.
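
For anyone triaging, the size of the gap for the three tested institutions can be put on one scale with a quick script. This is only a rough sketch and not part of the VertNet code: the numbers are copied from the report above, and `REPORTED` and `estimate_ratio` are made-up names for illustration. DMNS, KU, and NMMNH are left out because only bounds were given for them.

```python
# Figures copied from the report above:
# publisher -> (pop-up estimate, record count in the publisher's IPT).
REPORTED = {
    "UTEP": (217_743, 54_300),
    "PMNS": (63, 7_509),
    "GSU": (12_957, 24_851),
}


def estimate_ratio(estimate: int, ipt_count: int) -> float:
    """Return the pop-up estimate divided by the IPT count (1.0 means they agree)."""
    if ipt_count <= 0:
        raise ValueError("ipt_count must be positive")
    return estimate / ipt_count


if __name__ == "__main__":
    for publisher, (estimate, ipt_count) in REPORTED.items():
        ratio = estimate_ratio(estimate, ipt_count)
        print(f"{publisher:6} estimate={estimate:>8,} ipt={ipt_count:>7,} ratio={ratio:.2f}")
```

A ratio well above 1 means the pop-up overstates the download (UTEP), and a ratio well below 1 means it understates it (GSU). For PMNS the estimate matches the 63 records actually downloaded, so the low ratio there points at the harvest rather than the estimate.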

@eightysteele
Member

Nice issue! Looking into this. Also looking into the download link issue
you submitted via email.

@laurarussell
Member

Actually, I'm just going to add my screencasts to this issue so you can see firsthand. My screencasts do document two other issues, though, so I'll add those as separate issues.

https://www.dropbox.com/s/jfn978az9rxn5sk/DownloadAndCountsNotCorrectPerot.mov

https://www.dropbox.com/s/05vcz5nzl2bjbtr/DownloadRecordCountsIssueSchmidtPart1.mov

https://www.dropbox.com/s/a3v3xjif1v6k3k3/DownloadRecordCountsIssueSchmidtPart2.mov

@eightysteele
Member

Great! I'm able to reproduce the issue, but this is definitely great to
have.

@dbloom
Member Author

dbloom commented Sep 6, 2013

I'm off and on today, but let me know if you need anything more on this.
@tucotuco
Member

The original issue as described is solved. The popup now always shows correct counts, matching the downloads, for <10k records and shows ">10k" for all others. Large file downloads are a separate issue, #376.
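
As a rough illustration of that display rule (a sketch only, not the code that went into the webapp), the label logic could look like the following. The function name `download_count_label` and the `limit` parameter are hypothetical, and showing ">10k" at exactly 10,000 records is an assumption, since the comment only covers "<10k" and "others".

```python
COUNT_LIMIT = 10_000  # threshold taken from the comment above


def download_count_label(exact_count: int, limit: int = COUNT_LIMIT) -> str:
    """Return the text shown in the download pop-up for a result set.

    Below the limit the exact count is shown; at or above it, only a
    ">Nk" marker is shown. Treating the limit itself as ">Nk" is an
    assumption made for this sketch.
    """
    if exact_count < 0:
        raise ValueError("exact_count must not be negative")
    if exact_count < limit:
        return f"{exact_count:,}"
    return f">{limit // 1000}k"
```

With this rule a 63-record result is labeled "63", while the 54,300-record UTEP result from the original report is labeled ">10k" instead of a misleading estimate.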
