Admin UI: Publisher Search #417

Open
robredpath opened this issue Dec 1, 2023 · 47 comments

@robredpath

robredpath commented Dec 1, 2023

As a member of IATI Support,
I want to find publishers using the information I have available†
so that I can quickly discover the Registry situation for someone that I'm helping.

† Organisation name (e.g. "Open Data Services"), publishing status, presence of errors in data, org-id, country

In conversation with @cormachallinanderilinx we refined this to:

  • A search box that can accept a name, org-id, or country, with the option of leaving it empty to see all results
  • Checkboxes for "is approved", "has published" and "has errors in data". Per @siwhitehouse below, this UI could be confusing, so we'd prefer to find a way of making a single control for this.

Acceptance criteria

  • Publisher search should have the search box and checkboxes described above
  • The free-text search should support some way of specifying both exact-string and fuzzy and/or wildcard matching
  • Results should be presented in a table†† that is sortable††† by any column
  • The results table should include all fields that are searchable as well as those currently included in the publisher list

††† I don't think that we should have both the table headings and the Order By: dropdown for sorting results. We should choose one. My preference would be for the table headings to sort the whole list.
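To make the exact/fuzzy/wildcard criterion concrete, here is a minimal Python sketch of one possible query syntax for a single searchable field. The syntax itself (quotes for an exact match, * and ? for wildcards, otherwise a substring match with a fuzzy fallback), the 0.8 similarity cutoff and the function name match_publisher are all assumptions for illustration; this is not how CKAN or the Registry search actually works.

```python
import difflib
import fnmatch


def match_publisher(query: str, value: str, fuzzy_cutoff: float = 0.8) -> bool:
    """Hypothetical query syntax for one searchable field (name, org-id, country...).

    - "quoted text"     -> exact, case-insensitive match on the whole value
    - text with * or ?  -> shell-style wildcard match on the whole value
    - anything else     -> substring match, falling back to a per-word fuzzy match
    An empty query matches everything, i.e. "show all results".
    """
    q = query.strip()
    v = value.lower()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return v == q[1:-1].lower()
    if "*" in q or "?" in q:
        return fnmatch.fnmatchcase(v, q.lower())
    q = q.lower()
    if q in v:
        return True
    # Fuzzy: tolerate typos by comparing the query with each word of the value
    return any(
        difflib.SequenceMatcher(None, q, word).ratio() >= fuzzy_cutoff
        for word in v.split()
    )
```

With this syntax, "world bank" in quotes only matches "World Bank" exactly, XM-DAC-* matches any XM-DAC identifier, and a typo such as developmnt still finds "African Development Bank".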

  1. Non-functional Criteria (Include availability, maintainability, performance, reliability, scalability, security, and usability criteria)

This search interface can be available to all users, apart from the ability to see unapproved publishers which should be restricted to logged-in Sysadmin users only.

EDIT: Update 2024-01-04 in line with discussions below

@siwhitehouse
Contributor

Thanks for raising this issue @robredpath. The main motivation for having this functionality is so that we can search across the whole set of publishers in one place, something that we don't have at the moment.

aiui the use of checkboxes as suggested makes it possible for a user to check "has published" without checking "is approved". I don't think a UI should allow this and it suggests to me that checkboxes are unsuitable here. Radio buttons with "all", "registered (unapproved)", "approved (unpublished)" and "published" or similar might be better.

In the publishers list, as it is currently implemented, the number of published datasets is shown. I think this is useful and would like to see it retained here, please.

What is the purpose of "presence of errors in data"? Is this a Boolean, numerical or some other type of field?

What is the logic for excluding unapproved publishers from non-Sysadmin users, please? At the moment sometimes people try to register the same organisation more than once. Not allowing people to check if their organisation is in the registration process is likely to increase the number of duplicate registrations that we see.

@robredpath
Author

Radio buttons with "all", "registered (unapproved)", "approved (unpublished)" and "published" or similar might be better.

I don't have strong feelings on this. I agree that the UI allowing the user to select (hopefully!) impossible combinations isn't ideal, but having a single control that selects based on a combination of fields doesn't feel great either. We could make some JavaScript to auto-select "is approved" if they select "is published"?

In the publishers list, as it is currently implemented, the number of published datasets is shown. I think this is useful and would like to see it retained here, please.

Agreed. Implicit - but should be explicit - is that there's no loss of display or other functionality as a result of this change.

What is the purpose of "presence of errors in data"? Is this a Boolean, numerical or some other type of field?

As I understand it, this is to make it easier to locate the correct publisher in a list where other search terms might lead to lots of results, and it stops someone having to click through a long list of publishers one-by-one to see if they have errors in their data. I would expect the control to be a Boolean, and the results to either be Boolean or numeric, depending on implementation considerations.

What is the logic for excluding unapproved publishers from non-Sysadmin users, please?

This is a security consideration: if someone creates a publisher for some non-IATI-related purpose (such as to advertise their gambling website, or some illegal pursuit) then we don't want any content that they create, even whatever they entered into the publisher name field, to be displayed on the website until it's been reviewed.

Not allowing people to check if their organisation is in the registration process is likely to increase the number of duplicate registrations that we see.

Perhaps a mitigation to this might be to indicate if the search term appears in unapproved publishers, without actually listing them? Some carefully-crafted help text would be required to explain what was going on, however.
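A minimal sketch of that mitigation, assuming a hypothetical Publisher record with a title and an approved flag: unapproved matches are counted but never returned, so nothing their creators entered is displayed, while the UI can still show a hint (with suitable help text) when pending_match is true.

```python
from dataclasses import dataclass


@dataclass
class Publisher:
    title: str
    approved: bool


def public_search(publishers, query):
    """Search for a non-sysadmin user.

    Returns (visible, pending_match): approved matches only, plus a flag saying
    whether any unapproved publisher also matched. Unapproved titles are never
    returned, so user-entered content stays hidden until it has been reviewed.
    """
    q = query.lower()
    matches = [p for p in publishers if q in p.title.lower()]
    visible = [p for p in matches if p.approved]
    pending_match = any(not p.approved for p in matches)
    return visible, pending_match
```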

@siwhitehouse
Contributor

Radio buttons with "all", "registered (unapproved)", "approved (unpublished)" and "published" or similar might be better.

I don't have strong feelings on this. I agree that the UI allowing the user to select (hopefully!) impossible combinations isn't ideal, but having a single control that selects based on a combination of fields doesn't feel great either. We could make some JavaScript to auto-select "is approved" if they select "is published"?

I think what I am suggesting is a single control that allows someone to either search across all categories of publisher, or just one category. Let's move on and see what @cormachallinanderilinx suggests when implementing this.

snip

What is the purpose of "presence of errors in data"? Is this a Boolean, numerical or some other type of field?

As I understand it, this is to make it easier to locate the correct publisher in a list where other search terms might lead to lots of results, and it stops someone having to click through a long list of publishers one-by-one to see if they have errors in their data. I would expect the control to be a Boolean, and the results to either be Boolean or numeric, depending on implementation considerations.

I'm still unclear about the use case for a member of IATI support to be using this. Should we be considering how this particular filter interacts, or doesn't, with http://dashboard.iatistandard.org/data_quality.html ?

What is the logic for excluding unapproved publishers from non-Sysadmin users, please?

This is a security consideration: if someone creates a publisher for some non-IATI-related purpose (such as to advertise their gambling website, or some illegal pursuit) then we don't want any content that they create, even whatever they entered into the publisher name field, to be displayed on the website until it's been reviewed.

Thanks. That makes sense.

Not allowing people to check if their organisation is in the registration process is likely to increase the number of duplicate registrations that we see.

Perhaps a mitigation to this might be to indicate if the search term appears in unapproved publishers, without actually listing them? Some carefully-crafted help text would be required to explain what was going on, however.

It feels tricky and something that will be difficult to implement. Let's see what options @cormachallinanderilinx can offer us.

@siwhitehouse
Contributor

siwhitehouse commented Dec 20, 2023

#310 paraphrased here:

Add a Date Created column to this list. This will help us determine which are the newly created publishers that are waiting for approval.

Can this be added to the acceptance criteria, please?

@robredpath
Author

Thanks @siwhitehouse. I've updated the initial comment in line with our conversation here.

Two unresolved points, though:

What is the purpose of "presence of errors in data"? Is this a Boolean, numerical or some other type of field?

As I understand it, this is to make it easier to locate the correct publisher in a list where other search terms might lead to lots of results

I'm still unclear about the use case for a member of IATI support to be using this. Should we be considering how this particular filter interacts, or doesn't, with http://dashboard.iatistandard.org/data_quality.html ?

I'm not sure where this requirement came from. Maybe @IsabelBirds might know? I have no attachment to it, it's just in the Miro board so it's made its way here!

Not allowing people to check if their organisation is in the registration process is likely to increase the number of duplicate registrations that we see.

To clarify: would this make things worse, or is this the current situation (and so this just doesn't make things better)?

@IsabelBirds

The error field was an idea to reduce the amount of digging we have to do to offer support.
E.g. an error count per activity from the Validator.

Then if I'm already engaged with an org and can easily notice that they have errors, I can bring this up and offer support. This is likely to increase uptake and changes to data quality compared to contacting orgs out of the blue.

@cormachallinanderilinx
Collaborator

Regarding the "is approved", "has published" and "has errors" filters:
After spending some time on this, I'm not sure this is the correct place. The publisher search page only returns approved publishers, so that their datasets can be viewed.
I think it's probably a good idea to keep it like this.

This view is available for viewing pending publishers: https://www.iatiregistry.org/dashboard/mypublishers-pending
I could add another tab (or similar) to show all publishers that haven't published, and possibly the publishers with errors (although this may not be as straightforward).
Then it is kept to the dashboard, which is only available to admins.
There isn't a button to access the dashboard on the Registry, so it probably makes sense to add a link to it in the UI.

Should we just use this ticket to improve the searching (fuzzy logic) and fix the sorting?

@robredpath
Author

I think we can return to the user story to help us here:

As a member of IATI Support,
I want to find publishers using the information I have available†
so that I can quickly discover the Registry situation for someone that I'm helping.

The end state that we're trying to get to here is a situation where, when someone contacts IATI Support, we can quickly understand which Registry publisher(s) correspond to the person and/or organisation who has contacted us, and what the current state of them is.

Ideally, I think that would be part of the existing publisher search, because then there's just one place that you go to look for information about publishers. However, I think we're open to it being a separate admin tool if that's more straightforward in terms of implementation and security.

If the information is split across multiple tabs or multiple searches it becomes harder to use: at best you need to do the search multiple times, and it becomes very easy for people to either not know about or forget to use the other tabs.

Is that feasible: a tab in the dashboard which supports all the functionality that we've discussed here in a single view that's admin-only?

@cormachallinanderilinx
Collaborator

yes, that sounds good to me

@robredpath
Author

Cool - I want to hear from @siwhitehouse before we proceed, though!

@cormachallinanderilinx
Collaborator

Estimate 3 days

@siwhitehouse
Contributor

The error field was an idea to reduce the amount of digging we have to do to offer support. E.g. an error count per activity from the Validator.

Then if I'm already engaged with an org and can easily notice that they have errors, I can bring this up and offer support. This is likely to increase uptake and changes to data quality compared to contacting orgs out of the blue.

I'm not clear still, my apologies. How could we show a per-activity error count when the search is at publisher level?

@siwhitehouse
Contributor

@robredpath I don't think we should have a single control for "is approved", "has published", "has errors in data".

I think we want to be able to filter by the three statuses that a publisher might be in: "registered (unapproved)", "approved (unpublished)" and "published". By default, a search should show all statuses. Either a single control, or a set of controls, should let us filter by status.
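One way to get a single status control without impossible combinations is to derive the status from the underlying fields, so the UI only ever offers the three statuses plus "all". This Python sketch assumes hypothetical is_approved and dataset_count fields; the rule that only approved publishers can count as published is also an assumption.

```python
from enum import Enum
from typing import Optional


class PublisherStatus(Enum):
    REGISTERED = "registered (unapproved)"
    APPROVED = "approved (unpublished)"
    PUBLISHED = "published"


def status_of(is_approved: bool, dataset_count: int) -> PublisherStatus:
    """Derive one status from the underlying fields (assumed rule)."""
    if not is_approved:
        return PublisherStatus.REGISTERED
    if dataset_count > 0:
        return PublisherStatus.PUBLISHED
    return PublisherStatus.APPROVED


def filter_by_status(publishers, wanted: Optional[PublisherStatus] = None):
    """wanted=None is the default "all" option."""
    if wanted is None:
        return list(publishers)
    return [
        p for p in publishers
        if status_of(p["is_approved"], p["dataset_count"]) is wanted
    ]
```

Because the status is computed rather than chosen field by field, "published but not approved" simply cannot be selected.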

Separately, we want to be able to filter by whether a publisher has errors in its set of published files. That should show the number of errors, which we should be able to order on. @IsabelBirds have I specified what you have in mind here?

Thanks to @cormachallinanderilinx for the estimate.

@siwhitehouse
Contributor

I had misinterpreted @IsabelBirds latest comment.

What we would like is the mean average of errors per activity for the publisher as a column in the search results. That figure should also contain a link to the publisher's page on the IATI Validator, please.

@cormachallinanderilinx
Collaborator

Is this the URL you would like included? https://validator.iatistandard.org/organisation/aiddata

Do you know if there is a Validator API that can be used to access this, which would allow us to get a count of errors, as we don't store the count?

To the best of my knowledge, the Validator only exposes two APIs: https://developer.iatistandard.org/api-details#api=iati-validator-v2&operation=get-pub-get-report

@robredpath
Author

Do you know if there is a Validator API that can be used to access this, which would allow us to get a count of errors, as we don't store the count?

The Validator API returns report.summary.critical which is a count of "critical" (i.e. structural validation) errors. We might also want to include report.summary.error (ruleset errors that contain "must", according to the validator docs) - any views @IsabelBirds @siwhitehouse ?

These are on a per-file basis; the way that we use CKAN in the Registry means that "file" and "dataset" are synonymous.
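Since file and dataset are synonymous, a publisher-level figure means summing over that publisher's datasets. A minimal sketch, assuming each Validator response has already been parsed from JSON and nests the counts as report, then summary, then critical/error/warning (the paths named above); publisher_totals is a hypothetical name, and missing counts are treated as zero.

```python
from collections import Counter


def publisher_totals(file_reports):
    """Sum report.summary counts across all of one publisher's files."""
    totals = Counter()
    for response in file_reports:
        summary = response.get("report", {}).get("summary", {})
        for key in ("critical", "error", "warning"):
            totals[key] += summary.get(key, 0)
    return dict(totals)
```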

The pipeline that feeds the Validator starts with the Registry, so any file that exists on the Registry should have an entry in the Validator. There will be a time lag, I'm not sure what it is precisely, but it won't be long! @simon-20 or @odscjames might be able to advise.

Likewise, the Registry should know about updates to files first out of any of our systems. I'm not sure if there's an edge case where a file at an unchanged URL has been updated; again I hope that @simon-20 or @odscjames can advise on that.

We discussed on the call that this could result in a lot of API calls if the results page has a lot of publishers on, each of whom have a lot of datasets. Given that the Registry knows about changes to files first, it should be fine to cache results and invalidate the cache based on Registry / archiver updates. The API isn't actually as fast as I thought (I'm seeing 300-400ms response times); we can look into improving that but caching will likely be important. The API should support a reasonable number of concurrent queries, which would hopefully speed up total time to compile the list.
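A sketch of the cache-plus-concurrency idea, under stated assumptions: fetch_report is a hypothetical callable that calls the Validator API for one dataset, and the version passed in for each dataset is whatever change marker the Registry already holds for that file (for example a last-modified time or hash). Only datasets whose marker has changed are refetched, and those fetches run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class ReportCache:
    """Per-dataset cache of Validator reports, invalidated when a version marker changes."""

    def __init__(self, fetch_report, max_workers=8):
        self._fetch = fetch_report      # hypothetical: dataset_id -> parsed report
        self._store = {}                # dataset_id -> (version, report)
        self._max_workers = max_workers

    def get_many(self, datasets):
        """datasets maps dataset_id -> current version; returns dataset_id -> report."""
        stale = [
            dataset_id
            for dataset_id, version in datasets.items()
            if dataset_id not in self._store or self._store[dataset_id][0] != version
        ]
        if stale:
            # Fetch only the stale reports, several at a time
            with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
                for dataset_id, report in zip(stale, pool.map(self._fetch, stale)):
                    self._store[dataset_id] = (datasets[dataset_id], report)
        return {dataset_id: self._store[dataset_id][1] for dataset_id in datasets}
```

With 300-400ms per call, a page of publishers with many datasets would only pay that cost for files that changed since the last view.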

@robredpath
Author

@cormachallinanderilinx do you already have an IATI API key? We can help you get signed up and increase your access level once you're up and running if not.

@robredpath
Author

This gist is an example response for a file with several ruleset errors, but that is valid IATI data. The summary elements are at the end of the response.

@robredpath
Author

robredpath commented Jan 12, 2024

I have opened several issues against the Validator API repos for us to investigate whether we can make the Validator API more suitable for this use. Depending on complexity and how well this sits alongside other work we're doing on the Validator API, we may be able to make these changes very quickly, or not for several months.

The issues are:
  • Allow querying of multiple files at once
  • Speed up response
  • Allow users to request just particular elements of the response

@simon-20
Collaborator

The pipeline that feeds the Validator starts with the Registry, so any file that exists on the Registry should have an entry in the Validator. There will be a time lag, I'm not sure what it is precisely, but it won't be long! @simon-20 or @odscjames might be able to advise.

I did a quick check, and there is a fair bit of variation. Over half of the datasets currently known to the Datastore were validated within 30 minutes, but there is a long tail: some can take a few hours, and if there is a problem (for instance, a publisher is flagged for too much invalid data too quickly) then full validation may take much longer.

@cormachallinanderilinx
Collaborator

Hi @robredpath, yes, I have an API key set up for some work we were looking into previously.

@robredpath
Author

@odscjames could you get in touch with @cormachallinanderilinx via email and make sure that we know which is Derilinx' API key and that it has appropriately high limits? I want to make sure we're ahead of any rate limiting complications.

@siwhitehouse
Contributor

Do you know if there is a Validator API that can be used to access this, which would allow us to get a count of errors, as we don't store the count?

The Validator API returns report.summary.critical which is a count of "critical" (i.e. structural validation) errors. We might also want to include report.summary.error (ruleset errors that contain "must", according to the validator docs) - any views @IsabelBirds @siwhitehouse ?

We discussed this and we prefer to have them both included, please.

What about Warnings @robredpath ? Are they queryable through the API too?

These are on a per-file basis; the way that we use CKAN in the Registry means that "file" and "dataset" are synonymous.

So, is it possible to get a mean average of errors per activity then?

@robredpath
Author

We discussed this and we prefer to have them both included, please.

Is it more useful for them to be provided separately, or added together into one aggregate figure? I'm conscious that fixing a structural issue might then allow validation to proceed to the point where many warnings are triggered, so this number might appear to get worse as the data is actually improving.

What about Warnings @robredpath ? Are they queryable through the API too?

Yes, report.summary.warning provides that figure.

So, is it possible to get a mean average of errors per activity then?

@cormachallinanderilinx this one's for you!
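If a per-file activity count can be obtained (the thread doesn't establish where it would come from, so the activities field here is hypothetical), a pooled mean is probably the fairer figure: total errors divided by total activities, rather than the average of each file's own ratio. The link uses the Validator URL pattern from the example earlier in the thread; both function names are hypothetical.

```python
def mean_errors_per_activity(files):
    """Pooled mean across a publisher's files: total errors / total activities.

    Each item is a hypothetical dict with 'errors' (e.g. critical + error from
    report.summary) and 'activities'. Returns None if there are no activities.
    """
    total_errors = sum(f["errors"] for f in files)
    total_activities = sum(f["activities"] for f in files)
    if total_activities == 0:
        return None
    return total_errors / total_activities


def validator_url(publisher_name):
    # URL pattern taken from the example link earlier in this thread
    return f"https://validator.iatistandard.org/organisation/{publisher_name}"
```

For one file with 10 errors in 100 activities and another with 0 errors in 900 activities, the pooled mean is 0.01, whereas averaging the two per-file ratios would give 0.05 and overweight the small file.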

@odscjames

@odscjames could you get in touch with @cormachallinanderilinx via email and make sure that we know which is Derilinx' API key and that it has appropriately high limits?

I've found the API key and it looks like it is already at high limits.

@cormachallinanderilinx
Collaborator

I have had to do quite a bit of refactoring here. Here are some examples of the searching I added:

Searching

Search by name or title:
Searches all publishers where the name is 'like' test: https://staging.iatiregistry.org/publisher/?q=test
This exactly matches a title from the DB: https://staging.iatiregistry.org/publisher/?q=pub_737839

Search by Country: https://staging.iatiregistry.org/publisher/?q=publisher_country%3DAS&sort=title+asc
The search works on the country code, but I'm thinking of adding a dropdown (maybe using select2) that displays country names and uses the code as the value to run the query.

Search by publisher id: https://staging.iatiregistry.org/publisher/?q=publisher_iati_id%3Dtest_publisher_id_date&sort=title+asc
Or get all publishers matching publisher_: https://staging.iatiregistry.org/publisher/?q=publisher_iati_id%3Dpublisher_&sort=title+asc
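The field=value queries above are just URL-encoded strings, so they could be generated programmatically, for example behind the proposed country dropdown, which would submit the code rather than the name. A minimal sketch; publisher_search_url is a hypothetical helper that only reproduces the URL shapes shown here.

```python
from urllib.parse import urlencode

BASE = "https://staging.iatiregistry.org/publisher/"


def publisher_search_url(field=None, value="", sort=None, page=None):
    """Build a publisher search URL in the q=field=value style used above."""
    q = f"{field}={value}" if field else value
    params = {"q": q}
    if sort:
        params["sort"] = sort      # e.g. "title asc", encoded as title+asc
    if page:
        params["page"] = page
    return f"{BASE}?{urlencode(params)}"
```

For example, publisher_search_url("publisher_country", "AS", sort="title asc") produces the country URL above, with = encoded as %3D and the space as +.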

It seems searching on both country and publisher_id at the same time is broken by my latest changes.
I will fix this and update here.

Sorting

Paging is not working in the UI, so just update the URL for now.
Also, when results are rendered on the page they are not in alphabetical order, but the paging is correct (see the 3rd publisher_country example, page 11: it goes from A to B, but Bangladesh is first with some A's after it, while page 12 is all B).
publisher_country
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_country+asc&page=1
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_country+asc&page=10
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_country+asc&page=11
publisher_iati_id - this example is descending
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_iati_id&page=1
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_iati_id&page=2
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_iati_id&page=3
publisher_organization_type
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_organization_type&page=1

@siwhitehouse and @robredpath, if ye would like to have a play around and give me any feedback on how you would like this implemented better in the UI, please let me know.
As you can see, when you go to one of the URLs it auto-populates the search box (this is CKAN default behaviour), but from trying it out ye might get some ideas.

I'm going to work on tests and fix the known issues mentioned above before working on the UI, so ye can have a feel and provide any feedback.

@siwhitehouse
Contributor

Searching
Search by name or title:
https://staging.iatiregistry.org/publisher/?q=bank returns two entries: World Bank and Caribbean Development Bank
https://iatiregistry.org/publisher/?q=bank returns nine entries, including African Development Bank
https://staging.iatiregistry.org/publisher/?q=afdb returns African Development Bank, suggesting that the new search is failing to find as many organisations as the current one.

@cormachallinanderilinx
Collaborator

@siwhitehouse
Sorting:
By default, results are sorted by most recently created publisher.
After searching, you can select a different sort from the dropdown:
[Screenshot: the sort dropdown]

Search
The publisher search will search by default on name, title and IATI Publisher Id.
Example: https://staging.iatiregistry.org/publisher/?q=bank&sort=created+desc
This will check the above 3 fields.

If you want to search for an exact IATI publisher ID, you can also add it here and it should match.
Example: https://staging.iatiregistry.org/publisher/?q=XI-IATI-WBTF%09&sort=created+desc
It will also return results where a name or title matches.
If you want a specific match on publisher id you can use

Country search is a bit different, as you cannot search on country name.
I just got an idea on this as I'm typing; I will look into something and update you when I've checked it out.

Paging
It should now work properly and keep the sort order selected in the dropdown.
Before, it could go from A on page 1 to D on page 2 and A again on page 3.
The paging controls at the bottom have disappeared; I'll work to get them back in,
but to go to the next page, add &page=2 to the end of the URL, or &page=3 for page 3.
For example, search for 'a', which will return lots of results:
https://staging.iatiregistry.org/publisher/?q=a&sort=created+desc starts at page 1, so add the page parameter to the end:
https://staging.iatiregistry.org/publisher/?q=a&sort=created+desc&page=2
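For reference, a consistent order across pages comes from sorting the whole result set before slicing out a page. One way a list can jump from A to D and back to A is if pages are sliced first and sorted within each page. A minimal sketch with 1-indexed pages and 20 per page, matching the UI; paginate is a hypothetical helper, not the Registry's code.

```python
def paginate(items, key, page=1, per_page=20, descending=False):
    """Sort the full result set, then return one 1-indexed page of it."""
    ordered = sorted(items, key=key, reverse=descending)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]
```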

State
State is displayed only if you are logged in as a sysadmin.
The two states are:

  1. active
  2. approval needed

This is the URL to get all publishers that need approval. Do you think we should have a checkbox, a button, or whatever makes sense to run this query through the UI?
See: https://staging.iatiregistry.org/publisher/?q=&state=approval_needed
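A sketch of the permission rule from the original acceptance criteria, applied to the ?state= parameter: only sysadmins can ever see approval_needed publishers, so for anyone else that request returns nothing. The function and the dict shape are hypothetical; the two state values are the ones listed above.

```python
def apply_state_filter(publishers, state=None, is_sysadmin=False):
    """Filter by state ('active' or 'approval_needed'), as in ?state=approval_needed."""
    # Non-sysadmins can only ever see active publishers
    visible = publishers if is_sysadmin else [p for p in publishers if p["state"] == "active"]
    if state is None:
        return list(visible)
    return [p for p in visible if p["state"] == state]
```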

@siwhitehouse
Contributor

Hello @cormachallinanderilinx - thank you for the update and my apologies for not responding sooner.

Sorting, no pagination

I searched for the word "foreign" and was returned nineteen organisations. Handy for testing the sorting without pagination.

Created date - looks good
Name order - looks good
IATI org identifier - XM-DAC-21-1 is placed before XM-DAC-3-1 - this is probably ok
Organisation type - 18 are government so not a great test, but asc and desc both looked good. I'm not sure we need this sort, to discuss
Country/Region - looks good

Sorting, with pagination

I searched for the word "the" and was returned eight pages of results.

Name

I sorted on "name ascending" and it looked fine until the last entry "National Association of Municipalities of Benin (ANCB), The" a bit of a jump from the previous result of "Doctors of the World / Medecins du Monde".

Clicking on the second page led to a page sorted by "Created Descending".

To see the second page of results sorted by name ascending I typed https://staging.iatiregistry.org/publisher/?q=the&sort=name+asc&page=2 directly into the address bar. The first entry was Doctors of the World UK

I think my description of how pagination currently behaves is different to yours, but this may be due to changes you've made since your update. Could you follow my steps, see if you can replicate and look at why "National Association of Municipalities of Benin (ANCB), The" appears out of order, please?

IATI Organisation Identifier

Using the dropdown menu all numerical codes (for e.g. '30001') start appearing after 'GB-CHC-1000566'.
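Two different orderings are in play here. A plain string sort puts digits before letters, so '30001' would already come before 'GB-CHC-1000566', but it also puts XM-DAC-21-1 before XM-DAC-3-1 (the case noted above). A "natural" sort key, sketched below in Python, compares runs of digits as numbers and handles both. This illustrates the technique only; it is not what tablesorter.js does internally, and natural_key is a hypothetical name.

```python
import re


def natural_key(identifier):
    """Split into digit / non-digit runs so that 3 sorts before 21.

    Each run becomes a tuple tagged 0 (number) or 1 (text), so Python never
    has to compare an int with a str.
    """
    return [
        (0, int(run), "") if run.isdigit() else (1, 0, run.casefold())
        for run in re.findall(r"\d+|\D+", identifier)
    ]
```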

Organisation type

Ascending starts from "government". Descending starts from "Academic, Training and Research". I suspect we are ordering by the code value rather than the name. See https://iatistandard.org/en/iati-standard/104/codelists/organisationtype/
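If that's the cause, the fix is to sort on the codelist name rather than the code. A minimal sketch using a small subset of the OrganisationType codelist (codes 10, 21, 40 and 80 only, for illustration); sort_by_type_name and the org_type field are hypothetical.

```python
# A small subset of the IATI OrganisationType codelist (code -> name), for illustration.
ORGANISATION_TYPE = {
    "10": "Government",
    "21": "International NGO",
    "40": "Multilateral",
    "80": "Academic, Training and Research",
}


def sort_by_type_name(publishers, descending=False):
    """Order publishers by the organisation type's name rather than its code."""
    return sorted(
        publishers,
        key=lambda p: ORGANISATION_TYPE.get(p["org_type"], "").casefold(),
        reverse=descending,
    )
```

Sorting on the code gives Government (10) first ascending and Academic, Training and Research (80) first descending, which matches what was observed; sorting on the name reverses that.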

Country/Region

South Africa appears between United States and Uganda.
United Kingdom appears between Netherlands and Nigeria

@cormachallinanderilinx
Collaborator

cormachallinanderilinx commented Jul 10, 2024

@siwhitehouse
Sorry, I was working on the pagination button yesterday evening. I got it displaying, but clicking it still isn't fully working.
I should have left a comment.

Name
This is fixed. The issue was that the ordering was correct, but the name in the database starts with "The", while the UI was 'normalising' it and putting "The" at the end.
I'm guessing we don't want this any more, as we are adding alphabetical sorting?

IATI Organisation Identifier
Fixed. I had to add in a new version of tablesorter.js
The previous version was 11 years old

Organisation type
Yes, you're correct; this should be fixed.

Country/Region
Fixed

@siwhitehouse
Contributor

@cormachallinanderilinx My apologies in turn for the delay in getting back to you.

I assume the pagination still isn't ready for testing

Name

I don't understand your question. At the moment it looks to me that you no longer 'normalise' names by placing 'the's at the end of a name. I think that is good for the display, but I suspect we would still want to sort on the 'normalised' version. At the moment all organisations whose names begin with "The" are ordered using it, meaning they are all bunched together.
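One way to get both is to keep the stored name for display and sort on a separate key that ignores a leading "The". A minimal Python sketch; name_sort_key is hypothetical, and it only strips "The" when followed by whitespace, so names like "Theatre..." are left alone.

```python
import re

_LEADING_THE = re.compile(r"^the\s+", re.IGNORECASE)


def name_sort_key(name):
    """Sort key that ignores a leading 'The '; the displayed name is unchanged."""
    return _LEADING_THE.sub("", name).casefold()
```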

@robredpath can you advise on best practice here, please?

IATI Organisation Identifier

Yes, that looks better now.

Organisation Type

and

Country/Region

Thanks these both look good now.

That leaves pagination to be fixed and Name ordering we should wait for Rob's opinion.

@siwhitehouse
Contributor

@cormachallinanderilinx

I have a couple of comments about styling/layout.

Table settings

The table looks like it has fixed-width columns. Here is a screenshot of the top of the table when I search on 'development'

[Screenshot: top of the results table when searching on 'development']

Could we configure the table display so that it avoids such text wrapping? Within a fixed-width layout, I think we could give the left-hand columns some of the width from the right-hand columns. Ideally, the table would adapt to the person's browser/display settings. I don't know the possibilities and limitations of an approach like this, though.

Ordering by table header

I can still do this, but the text in the table header doesn't afford clicking. The sorting is on-page only and not across the whole of the returned data.

I would like ordering by table header to be clear to the user and for it to perform the same sorting as the dropdown i.e. across all of the returned data.
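One way to make header clicks perform the same sort as the dropdown is for each header to be a link that re-runs the search with a sort parameter, instead of re-ordering the rows already on the page. A sketch; header_sort_url is hypothetical, and the asc/desc toggle plus resetting to page 1 are assumptions about the desired behaviour.

```python
from urllib.parse import urlencode


def header_sort_url(base, params, column):
    """URL for a column header that re-runs the search sorted on `column`.

    Clicking the column that is currently ascending switches it to descending.
    The page parameter is dropped so the new sort starts from page 1.
    """
    direction = "desc" if params.get("sort") == f"{column} asc" else "asc"
    new_params = {k: v for k, v in params.items() if k not in ("sort", "page")}
    new_params["sort"] = f"{column} {direction}"
    return f"{base}?{urlencode(new_params)}"
```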

@cormachallinanderilinx
Collaborator

@siwhitehouse
I don't understand your question. At the moment it looks to me that you no longer 'normalise' names by placing 'the's at the end of a name. I think that is good for the display, but I suspect we would still want to sort on the 'normalised' version. At the moment all organisations whose names begin with "The" are ordered using it, meaning they are all bunched together.
Yes, I have removed the normalisation, so shall we leave this until Rob gives his opinion?

I have fixed the pagination. I'm still doing a bit of testing myself, but it looks good.

On the table header clicks I will look at this now.

@robredpath
Author

@robredpath can you advise on best practice here, please?

I think that being clear about the normalisation and having it be consistent across the site is more important than whichever approach we choose - so, whatever we do elsewhere is what we should do here.

@siwhitehouse
Contributor

We discussed this on our call today. @cormachallinanderilinx will remove the sort from the column headers in the table, @siwhitehouse will check the pagination and then share this with the rest of IATI Support for feedback.

@siwhitehouse
Contributor

Pagination looks good now, thank you @cormachallinanderilinx

Is the API set up for the Staging instance? I'm asking because if I query

https://iatiregistry.org/api/action/organization_list?all_fields=true

then I get a list of organisations, but if I query

https://staging.iatiregistry.org/api/action/organization_list?all_fields=true

I get a 401 response. We'd like to be able to check the organisations in the "approval needed" state through the API and the UI.

@cormachallinanderilinx
Collaborator

@siwhitehouse
Since it's the staging URL, the 401 suggests to me that you need to go to the staging site and put in the basic auth credentials.
If you are using Python or a tool like Postman, you will also need to add the basic auth there.

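Python example:
import requests
from requests.auth import HTTPBasicAuth

# The staging site sits behind HTTP Basic Auth, so pass the staging basic auth credentials here
url = 'https://staging.iatiregistry.org/api/action/organization_list?all_fields=true'
res = requests.post(url, auth=HTTPBasicAuth('user', 'password'))
print(res)  # prints <Response [200]> on success, <Response [401]> if the credentials are rejected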

Postman:
You will need to set it there as well; there should be a Basic Auth option under Authorization.
[Screenshot: Postman Authorization settings]

@siwhitehouse
Contributor

@cormachallinanderilinx Thank you. Unfortunately, I am still receiving a

<Response [401]>

error when I follow your instructions.

I logged into my https://staging.iatiregistry.org/user/simonwhitehouse account and I created an API token. I then amended the code you posted above to include my username and API token. Running the code returns the 401.

I have just shared the code with you (via Deepnote) for you to troubleshoot.

I'd note that originally I was sending this as a get request without authentication, as per https://iatistandard.org/en/iati-tools-and-resources/iati-registry/iati-registry-api/publisher-endpoints/#ListPub

@cormachallinanderilinx
Collaborator

Hi @siwhitehouse
The issue wasn't adding the API token; it was the Basic Auth.

Thanks for sharing the Deepnote, I was able to fix it up there along with 1 or 2 small changes.

FYI, the API uses paging (offset and limit). By default the limit is 20, so the next page is:
https://staging.iatiregistry.org/api/action/organization_list?all_fields=true&offset=20&limit=20

You can also set a higher limit, but the response will be slower. For example, to fetch 100 at a time:
https://staging.iatiregistry.org/api/action/organization_list?all_fields=true&offset=0&limit=100
To get the next 100 we set the offset to 100:
https://staging.iatiregistry.org/api/action/organization_list?all_fields=true&offset=100&limit=100
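To page through everything without editing the URL by hand, a loop can keep increasing the offset until an empty page comes back. Below is a minimal sketch using only the Python standard library. It assumes the standard CKAN response envelope (`{"success": ..., "result": [...]}`); `user`/`password` are placeholders for the staging basic auth credentials, and `make_fetcher` and `iter_organizations` are hypothetical helper names. It advances the offset by the number of rows actually returned, in case the server caps `limit` below what was requested.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Staging endpoint from the thread; the site also needs its basic auth credentials.
API_URL = "https://staging.iatiregistry.org/api/action/organization_list"

Fetch = Callable[[int, int], list]


def make_fetcher(user: str, password: str, url: str = API_URL) -> Fetch:
    """Return a function that fetches one page of organisations (hypothetical helper)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(offset: int, limit: int) -> list:
        query = urllib.parse.urlencode(
            {"all_fields": "true", "offset": offset, "limit": limit}
        )
        request = urllib.request.Request(
            f"{url}?{query}", headers={"Authorization": f"Basic {token}"}
        )
        # urlopen raises HTTPError on a 401, so bad credentials fail loudly
        with urllib.request.urlopen(request, timeout=120) as response:
            body = json.load(response)
        # CKAN wraps action results as {"success": ..., "result": ...}
        if not body.get("success"):
            raise RuntimeError(f"API call failed: {body.get('error')}")
        return body["result"]

    return fetch


def iter_organizations(fetch: Fetch, limit: int = 100) -> Iterator[dict]:
    """Yield every organisation, one page at a time, until a page comes back empty."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        if not page:
            return
        yield from page
        # Advance by what was actually returned, in case the server caps `limit`.
        offset += len(page)
```

Usage would be something like `orgs = list(iter_organizations(make_fetcher("user", "password")))`, after which `len(orgs)` gives the total number of organisations the API returns. Keeping the fetch step separate from the paging loop also makes the loop easy to check against a fake page source.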

@siwhitehouse
Contributor

When logged in as https://staging.iatiregistry.org/user/simonwhitehouse I receive an internal server error when I click on the link to the last page (117). I also receive an internal server error when I select Order By "Created Ascending".

@cormachallinanderilinx I don't know why I am seeing these now when I didn't before. Can you investigate this, please? Let me know if you need any information from me.

@siwhitehouse
Contributor

When logged in as https://staging.iatiregistry.org/user/simonwhitehouse I see twenty publishers per page and (I assume) 116 pages return results. So, I would expect to see 2320-2340 publishers in the CSV download. I only see 1357.

This is my alternative check on the number of organisations appearing in the UI matching those in the database, as I don't have the coding skills to page through the API.

@cormachallinanderilinx I think the check here should be that the data in the CSV download matches that returned in the UI via the query in the URL. The API should also be consistent. This doesn't appear to be the case at the moment. Can you investigate this before we do any more testing, please? Happy to provide more information if you need it.
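As a sketch of that consistency check: given the organisations returned by the API (paged with offset and limit, as Cormac described above) and the text of the CSV download, the two sets of publisher identifiers can be compared directly. The function name `compare_with_csv` is hypothetical, and so are the default field names; `api_key` and `csv_column` should be set to whichever field and column actually identify a publisher in each source.

```python
import csv
import io
from typing import Iterable


def compare_with_csv(api_orgs: Iterable[dict], csv_text: str,
                     api_key: str = "name", csv_column: str = "name") -> dict:
    """Compare publisher identifiers from the API against a CSV download.

    `api_key` and `csv_column` are hypothetical defaults; adjust them to the
    real field/column names. Counts are of unique identifiers.
    """
    api_ids = {org[api_key] for org in api_orgs}
    reader = csv.DictReader(io.StringIO(csv_text))
    csv_ids = {row[csv_column] for row in reader}
    return {
        "api_count": len(api_ids),
        "csv_count": len(csv_ids),
        # publishers the API returns but the download leaves out
        "missing_from_csv": sorted(api_ids - csv_ids),
        # publishers in the download that the API does not return
        "only_in_csv": sorted(csv_ids - api_ids),
    }
```

Listing the missing identifiers, rather than only the two totals, should make it easier to see whether the gap follows a pattern (for example, publisher state).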

@cormachallinanderilinx
Collaborator

Hi @siwhitehouse
Good catch. It looks like this may be an overall issue, as the download is completely separate from the actual publisher code.
I will work on building the publisher search code into the download functionality as well; I think it completely makes sense to do this, to make sure they are the same.

@cormachallinanderilinx
Collaborator

Hi @siwhitehouse
I just wanted to note I have done some work on aligning the downloads.
However, to properly align them will probably take another 1-2 days work.
It is working but it is very very slow and will result in a lot of Timeout Errors.
The reason is fetching the GroupExtra details: for the page load we call organization_show for each publisher, which is fine for 20 items on a page but is a problem for a couple of hundred.
Just want to check whether ye are happy with me continuing this work as part of this ticket, or would ye rather it be done separately?
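One possible way to avoid one organization_show call per row is to get the extras in bulk. Stock CKAN's organization_list accepts `include_extras` alongside `all_fields`, assuming the Registry's CKAN version and plugins expose it. The sketch below only covers the second half: flattening organisation dicts that already carry CKAN-style extras (`[{"key": ..., "value": ...}, ...]`) into CSV rows in a single pass. `orgs_to_csv` and the column names are hypothetical.

```python
import csv
import io
from typing import Iterable, List


def orgs_to_csv(orgs: Iterable[dict], base_fields: List[str],
                extra_keys: List[str]) -> str:
    """Write one CSV row per organisation, taking extras from the same dict.

    Assumes each org dict carries CKAN-style extras, so no per-organisation
    organization_show lookup is needed while building the file.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=base_fields + extra_keys)
    writer.writeheader()
    for org in orgs:
        # turn [{"key": k, "value": v}, ...] into {k: v}
        extras = {e["key"]: e["value"] for e in org.get("extras", [])}
        row = {f: org.get(f, "") for f in base_fields}
        row.update({k: extras.get(k, "") for k in extra_keys})
        writer.writerow(row)
    return out.getvalue()
```

The design point is that the per-row cost becomes a dictionary lookup rather than a round trip, so the download time grows with the number of pages fetched, not the number of publishers.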

@siwhitehouse
Contributor

> Hi @siwhitehouse Good catch, It looks like this may be an overall issue as the download is completely separate from actual publisher code. I will work on building the publisher search code into the download functionality as well I think it complete makes sense to do this make sure they are the same.

What do you mean by the "actual publisher code" here, please Cormac?

@siwhitehouse
Contributor

> Hi @siwhitehouse I just wanted to note I have done some work on aligning the downloads. However, to properly align them will probably take another 1-2 days work. It is working but it is very very slow and will result in a lot of Timeout Errors. The reason is getting the GroupExtra details, the way we do it for the page load is a organization_show which is fine for 20 items on a page but for a couple of hundred its an issue. Just want to check ye are with me continuing this work as part of this ticket or would ye rather it be done separate?

Hi @cormachallinanderilinx

aiui we have two use cases for aligning the downloads:

  1. As a check that the new Publisher search UI is returning a full and correct set of results
  2. Because they should provide the same results to end users when this goes into live

I can't offer an opinion on the detail of how you propose to fix this, other than to say it looks like you are focusing on this end goal. It's fine to spend the time on this, so please do go ahead.

I have a couple of other observations:

  1. The "State" column no longer appears in the table in Staging
  2. The Downloads should also be updated to include the "State" column. I'm not sure if we have stated this explicitly before.

Finally, I'll leave it to you if you think it is better to set up a separate issue for aligning the Downloads. My preference would be for a new issue at this point.

@cormachallinanderilinx
Collaborator

> As a check that the new Publisher search UI is returning a full and correct set of results

and

> What do you mean by the "actual publisher code" here, please Cormac?

The download CSV (even before these changes) was never developed to match the search functionality; the two were developed as separate things.
The download button calls code that only downloads 'active' publishers who have published.
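If that filter is the whole story, applying the same filter to the full API list should give roughly the 1357 rows seen in the CSV rather than the ~2,320 shown in the UI. A minimal sketch, assuming the API's organisation dicts include CKAN's `state` and `package_count` fields, and treating "has published" as `package_count > 0`; that mapping is an assumption, and the download code may define it differently. `download_subset` is a hypothetical name.

```python
from typing import Iterable, List


def download_subset(orgs: Iterable[dict]) -> List[dict]:
    """Approximate the download's filter: active publishers that have published.

    Assumes 'has published' means package_count > 0; adjust if the
    Registry defines it differently.
    """
    return [
        org for org in orgs
        if org.get("state") == "active" and org.get("package_count", 0) > 0
    ]
```

Comparing `len(download_subset(orgs))` with the CSV row count would then show whether the filter accounts for the whole gap.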

> The "State" column no longer appears in the table in Staging

Can you check whether you are logged in? This column is hidden when you are logged out.
