This repository was archived by the owner on Feb 25, 2023. It is now read-only.

Expanding API#10

Open
kochbj wants to merge 5 commits into dpriskorn:master from kochbj:master

Conversation

@kochbj

@kochbj kochbj commented May 29, 2022

Hi Dennis,

Great API. I made some small changes to make the API more useful to me as a user:

  1. Generalized "get works" methods to function with any type of entity
  2. Added set_email convenience function because I wasn't sure how to do that
  3. Fleshed out author fields and modified Ids and Years a little bit so they work with both authors and works.

Dunno if this is how you planned to extend things so no worries if this doesn't work for you!

Best,
Bernie

kochbj added 2 commits May 29, 2022 12:36
-Changed works to a hidden method called get entities so that it works with authors, concepts, institutions, etc.
-Added global PAGE_LIMIT variable
-Added a set_email method since it wasn't clear to me initially how to do that.
-Fleshed out author objects
-Added fields to ids and years so they work with both authors and works
@dpriskorn
Owner

I took a peek at the code and really like the abstraction and the introduction of an OpenAlexBaseDataType. Thanks 🤩 I have a lot on my plate right now, so it will probably take me a week or two to take a closer look.

@kochbj
Author

kochbj commented May 31, 2022

Sounds good. Personally, I would've dropped the two works methods and just made one list of entities method, but wanted to make it backwards compatible with current methods.

Also, no worries. I will probably make a few more minor changes in the next week or so as I continue to use it, before you get around to looking at it.

@dpriskorn
Owner

I'm a fan of the KISS principle (https://en.wikipedia.org/wiki/KISS_principle?wprov=sfti1). Let's remove redundant methods and keep general ones.

@dpriskorn
Owner

Which ones do you suggest we remove?

@kochbj
Author

kochbj commented Jun 2, 2022

Well, if it were me, I would probably remove all entity-specific methods and keep a single public "get list of entities" method that takes the entity type as an argument. I didn't write it this way because I didn't want to break what you already had.
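
To make the suggestion concrete, here is a minimal sketch of a single public method with an entity-type argument. It is not the code in this PR: `ENDPOINTS`, `build_entity_url`, and the injected `fetch_json` callable are hypothetical names, and the endpoint paths are assumed to match the OpenAlex API as it stood at the time of this thread.

```python
from typing import Any, Callable, Dict, List

BASE_URL = "https://api.openalex.org/"

# Hypothetical mapping from an entity-type string to its API endpoint.
ENDPOINTS: Dict[str, str] = {
    "work": "works",
    "author": "authors",
    "venue": "venues",
    "institution": "institutions",
    "concept": "concepts",
}


def build_entity_url(entity_type: str, entity_id: str) -> str:
    """Return the URL of a single entity of the given type."""
    try:
        endpoint = ENDPOINTS[entity_type]
    except KeyError:
        raise ValueError(f"Unknown entity type: {entity_type!r}") from None
    return f"{BASE_URL}{endpoint}/{entity_id}"


def get_entities(
    entity_type: str,
    ids: List[str],
    fetch_json: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Fetch every id through one code path, whatever the entity type.

    fetch_json is an assumption: any callable that takes a URL and returns
    the decoded JSON body (for example a thin wrapper around requests.get).
    """
    return [fetch_json(build_entity_url(entity_type, i)) for i in ids]
```

Injecting `fetch_json` keeps the sketch testable without network access; in the library itself the HTTP call would live inside the class.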

I would keep the cited, reference, and related works methods; I think they are great. For myself I needed to add the below, which should work with any entity type that supports associated_works (authors, venues, institutions). I needed all the header stuff to make sure it would work for dehydrated objects as well.

I'm also going to fool around with a search method if you give me a day or two...

```python
from typing import List, Optional

import backoff
import requests
from pydantic import BaseModel, EmailStr

# Author, Concept, Institution, Venue, Work, OpenAlexBaseType and PAGE_LIMIT
# are defined elsewhere in the package.


class OpenAlex(BaseModel):
    """This models the OpenAlex HTTP API

    OpenAlex has 2 pools for clients.
    Supplying your email will get you into the polite pool.
    :parameter=email
    """
    email: Optional[EmailStr]
    _base_url: str = "https://api.openalex.org/"
    _headers: dict = {
        "Accept": "application/json",
        "User-Agent": "OpenAlexAPI https://github.com/dpriskorn/OpenAlexAPI"
    }
    # Convenience dict because dehydrated entities do not have works_api_urls
    # and the endpoint names are annoyingly inconsistent
    # (institution vs institutions, host_venue vs venue)
    _works_urls: dict = {
        Author: _base_url + "works?filter=author.id:",
        Concept: _base_url + "works?filter=concept.id:",
        Institution: _base_url + "works?filter=institutions.id:",
        Venue: _base_url + "works?filter=host_venue.id:"
    }

    class Config:
        underscore_attrs_are_private = True

    def set_email(self, email: EmailStr):
        # Rebuild the headers so the new address reaches the polite pool
        self.email = email
        self._headers = {
            "Accept": "application/json",
            "User-Agent": f"OpenAlexAPI https://github.com/dpriskorn/OpenAlexAPI mailto:{self.email}"
        }

    @backoff.on_exception(backoff.expo,
                          (requests.exceptions.Timeout,
                           requests.exceptions.ConnectionError),
                          max_time=60,
                          # Must be a callable: a bare print(...) here runs once
                          # at definition time and registers no handler.
                          on_backoff=lambda details: print("Backing off"))
    def get_associated_works(self, entity: OpenAlexBaseType, limit: Optional[int] = None) -> List[Work]:
        """Fetches all works associated with the entity, up to some limit.

        :parameter entity is an OpenAlex Author, Concept, Institution, or Venue
        :parameter limit is the maximum number of works to return
        """
        if self.email is None:
            print("OpenAlex has 2 pools for clients. "
                  "Please be nice and supply your email as the first argument "
                  "when calling this class to get into the polite pool. This way "
                  "OpenAlex can contact you if needed.")
        per_page = PAGE_LIMIT if limit is None else min(PAGE_LIMIT, limit)
        works = []
        cursor = '*'
        # Cursor paging: next_cursor is empty once the last page is reached
        while cursor:
            url = f"{self._works_urls[type(entity)]}{entity.id}&per_page={per_page}&cursor={cursor}"
            response = requests.get(url, headers=self._headers)
            if response.status_code == 200:
                data = response.json()
                works += [Work(**w) for w in data['results']]
                cursor = data['meta']['next_cursor']
            else:
                raise ValueError(f"Got {response.status_code} from OpenAlex")
            if limit and len(works) >= limit:
                break
        return works[:limit]
```

@kochbj
Author

kochbj commented Jun 2, 2022

Last couple of thoughts:

-Institution, Venue, and Author fields need to be expanded (but methods have to work with dehydrated values).
-There are a couple of supplementary objects (e.g., years, ids) that show up on multiple core types. Do you want to just expand these so that they work for every entity, or have entity-specific versions?
-The formatting of Year in OpenAlex is annoying; it would make more sense to have these attributes as lists rather than dicts and to fill in zeros for missing years rather than omitting them. Do you want to parse these into lists, or do you feel that's beyond the scope of the package?
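
As a rough illustration of the last point, the yearly counts could be reshaped like this. This is a sketch, not part of the PR: `counts_to_lists` is a hypothetical helper, and it assumes each row is a dict with a `year` key plus `works_count` and/or `cited_by_count`.

```python
from typing import Any, Dict, List


def counts_to_lists(
    counts_by_year: List[Dict[str, Any]],
    first_year: int,
    last_year: int,
) -> Dict[str, List[int]]:
    """Turn a sparse list of per-year dicts into dense, aligned lists.

    Years missing from the input, and counts missing from a row,
    are filled with 0.
    """
    by_year = {row["year"]: row for row in counts_by_year}
    years = list(range(first_year, last_year + 1))
    return {
        "year": years,
        "works_count": [by_year.get(y, {}).get("works_count", 0) for y in years],
        "cited_by_count": [by_year.get(y, {}).get("cited_by_count", 0) for y in years],
    }
```

With dense lists, index `i` refers to the same year in every list, which makes plotting or summing over a range straightforward.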

@dpriskorn
Owner

@hstct you implemented a dehydrated author type in #1 which has not yet been merged. Do you have any reactions to this pull request?

@dpriskorn
Owner

dpriskorn commented Jun 3, 2022

> I'm also going to fool around with a search method if you give me a day or two...

Have you seen the search function in R here? https://github.com/KTH-Library/openalex/blob/main/R/open_alex_restclient.R
BTW, I just created a project, OpenAlexAPI, where we can work on user stories that make sense to aid in the development of the library.

@kochbj
Author

kochbj commented Jun 3, 2022

Hi Dennis,

If you give me less than an hour, I'm going to implement another branch. This one breaks current works functionality by generalizing for entities. The full set of changes I am adding:

  1. Remove get_works methods and replace with get_entities. BREAKS TESTS but removes a lot of superfluous methods.
  2. Add all additional slots in the OpenAlex documentation to the interface. This included making a few new objects like Geo and HostVenue. The interface should now capture everything in the API.
  3. Added a standalone search method (the API's basic search, not the more sophisticated filter search, which I agree should be added).
  4. Added get_associated_works for all five core entities.

Responses to your points:

  1. I actually fooled around with dehydrated objects and I don't think they should be added as separate classes, because all of the additional slots are optional. You can simply hydrate these objects by calling get_entities. If you wanted to add a call to the get_entities method to BaseType you could, but I think it's cleaner to keep all the query methods in the init file.
  2. I know nothing about SWE, but don't understand the value of enums.py. Why can't these just be strings?
  3. My generalization of entities breaks the tests that I know someone else worked hard on. I unfortunately don't have more time to dedicate to this. :( However, I think it gives you or someone else an opportunity to go through my changes and see if you like them.
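
The first point can be sketched as follows: one class with optional slots, plus a hydrate step that re-fetches the full record. This is an illustration only. It uses a stdlib dataclass where the library uses pydantic models, the field list is abbreviated, and `hydrate` and `fetch_json` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, Optional


@dataclass
class Author:
    # Fields present even on a dehydrated author
    id: str
    display_name: Optional[str] = None
    orcid: Optional[str] = None
    # Fields assumed to be present only on the full record
    works_count: Optional[int] = None
    cited_by_count: Optional[int] = None

    @property
    def is_dehydrated(self) -> bool:
        """True when the full-record fields have not been loaded."""
        return self.works_count is None and self.cited_by_count is None


def hydrate(author: Author, fetch_json: Callable[[str], Dict[str, Any]]) -> Author:
    """Return a fully populated copy of a dehydrated author.

    fetch_json is an assumption: a callable that takes the author's
    OpenAlex ID and returns the full record as a dict.
    """
    data = fetch_json(author.id)
    known = {f.name for f in fields(Author)}
    # Ignore keys this abbreviated model does not declare
    return Author(**{k: v for k, v in data.items() if k in known})
```

The trade-off is that a missing value and a not-yet-loaded value both appear as `None`, which is the ambiguity that separate dehydrated classes would make explicit.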

Lots of changes:
1. Remove get_works and replace with universal get_entities
2. Exhaustively fleshed out ALL fields in OpenAlex API. This required making some new enums and classes (e.g., geo)
3. Added dehydrated objects for four core object types. Also added a convenience function hydrate. I had not planned to use dehydrated objects, but it does slow things down and I think it's good to clarify to users why fields are missing.
4. Added basic search functionality
5. Added get associated_works
6. Made explicit set_email method. I needed this to correct the headers using pydantic. Perhaps there is a better way, but I think it's good to allow users to change after construction.

TODO:
IMPORTANT:
1. More serious unit testing
2. Advanced search, filter, groupby functionality

MINOR:
1. If there's a way to not have to do the backoff decorator every time that would be cool.
2. I struggled with the openaccess enum and allowing typing to do None so gave up.
3. I really hate how years are returned. If it's possible to parse these as a dict of lists rather than a list of dicts I'd be all for it.
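
On the first MINOR item, one possible approach is to configure the retry policy once and apply it only to a single low-level request helper that every public method calls. The sketch below uses a small stdlib stand-in for the `backoff` decorator so it runs without extra dependencies; `retry_with_backoff`, `network_retry`, and `Client` are hypothetical names, not part of the library.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry_with_backoff(
    exceptions: Tuple[Type[BaseException], ...],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Build a reusable decorator that retries with exponential delays."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries - 1:
                        raise  # out of retries: surface the last error
                    sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


# Configure the policy once ...
network_retry = retry_with_backoff((ConnectionError, TimeoutError))


class Client:
    """Hypothetical client in which every public method goes through _get."""

    def __init__(self, fetch: Callable[[str], object]):
        self._fetch = fetch  # injected callable, e.g. a wrapper around requests.get

    @network_retry  # ... and decorate only the one low-level helper
    def _get(self, url: str):
        return self._fetch(url)

    def get_author(self, author_id: str):
        return self._get(f"https://api.openalex.org/authors/{author_id}")
```

The same pattern should carry over to the real library: the object returned by `backoff.on_exception(...)` is itself a decorator, so it can be stored in a variable and reused.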
@kochbj
Author

kochbj commented Jun 3, 2022

Hi Dennis,

I'm pretty much done with my rewrite in my last major push. This does break the existing get_works, but I think it makes the API quite robust. I also changed my tune on dehydrated objects--they're added.

e97491f

Best,
Bernie

kochbj changed the title from "Small QoL changes as a user" to "Expanding API" on Jun 3, 2022
@dpriskorn
Owner

Big thanks for your effort. I'll try to make time to review it next week. Since we are still in alpha I don't have a problem with removing get_works() at this point.
