diff --git a/organization-people/README.md b/organization-people/README.md
new file mode 100644
index 0000000..cb05695
--- /dev/null
+++ b/organization-people/README.md
@@ -0,0 +1,8 @@
+## organization-people
+
+A collection of Jupyter notebooks that take a persistent identifier for an organization (here a ROR ID) as input, query different APIs of PID providers or PID Graphs with it, and retrieve all people (identified by an ORCID iD) connected to that organization.
+
+Currently available PID Graphs:
+* [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) [![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/freya_get_people_by_organization.ipynb)
+* [OpenAlex](https://openalex.org/about) [![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/openalex_get_people_by_organization.ipynb)
+* [ORCID](https://orcid.org/) [![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/orcid_get_people_by_organization.ipynb)
diff --git a/organization-people/freya_get_people_by_organization.ipynb b/organization-people/freya_get_people_by_organization.ipynb
new file mode 100644
index 0000000..09a9f02
--- /dev/null
+++ b/organization-people/freya_get_people_by_organization.ipynb
@@ -0,0 +1 @@
+{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Kopie von Kopie von 
freya_get_people_by_organization.ipynb","provenance":[{"file_id":"https://github.com/Project-TAPIR/pidgraph-notebooks/blob/organization-people/organization-people/freya_get_people_by_organization.ipynb","timestamp":1643208926409}],"authorship_tag":"ABX9TyOPyixqZithrfY0TncA4o1K"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["### Query the FREYA PID Graph for all people affiliated with an organization\n","\n","This notebook queries the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) via [DataCite's GraphQL API](https://api.datacite.org/graphql) to retrieve all people affiliated with an organization. It takes a ROR URL as input and passes it to the GraphQL API, which returns the people connected to that organization. From the resulting list of people we output the ORCID iDs and names."],"metadata":{"id":"etxiXTW668ZD"}},{"cell_type":"code","source":["# needed dependency to make HTTP calls\n","import requests\n","# dependencies for dealing with json\n","!pip install python-benedict\n","from benedict import benedict"],"metadata":{"id":"8Mk7-aYc7x3A"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["The input for the query is a ROR URL."],"metadata":{"id":"J31_ejB6bWqd"}},{"cell_type":"code","source":["# input parameter for all further computations\n","example_ror=\"https://ror.org/021k10z87\""],"metadata":{"id":"UwYUsbnMbZnI","executionInfo":{"status":"ok","timestamp":1643208788232,"user_tz":-60,"elapsed":15,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":2,"outputs":[]},{"cell_type":"markdown","source":["We use it to query DataCite's GraphQL API for all people connected to the organization. 
Since the API uses pagination, we need to loop through all pages to get the complete result set.\n"],"metadata":{"id":"ba_A3Anpbl4P"}},{"cell_type":"code","source":["# DataCite's GraphQL endpoint for the FREYA PID Graph\n","DATACITE_GRAPHQL_API = \"https://api.datacite.org/graphql\"\n","\n","# Query to retrieve an organization and all its affiliated people\n","QUERY_ORGA2PEOPLE = \"\"\"query organization($ror :ID!, $after:String){\n","  organization(id: $ror) {\n","    people(first: 1000, after: $after) {\n","      totalCount\n","      pageInfo {\n","        endCursor\n","        hasNextPage\n","      }\n","\n","      nodes {\n","        id\n","        name\n","        givenName\n","      }\n","    }\n","  }\n","}\"\"\"\n","\n","# query all people that are connected to given ROR\n","def download_data(ror):\n","    continue_paginating = True\n","    cursor=\"\"\n","    while continue_paginating:\n","        # named 'variables' to avoid shadowing the built-in vars()\n","        variables = {'ror': ror, 'after': cursor}\n","        response = requests.post(url=DATACITE_GRAPHQL_API,\n","                                 json={'query': QUERY_ORGA2PEOPLE, 'variables': variables},\n","                                 headers={'Content-Type': 'application/json'})\n","        result=response.json()\n","\n","        # check if next page exists and set cursor to next page\n","        continue_paginating = has_next_page(result)\n","        cursor = next_cursor(result)\n","        yield result\n","\n","# check if there is another page with results to query\n","def has_next_page(response_data):\n","    resp_dict = benedict.from_json(response_data)\n","    has_next_page = resp_dict.get(\"data.organization.people.pageInfo.hasNextPage\")\n","    return has_next_page\n","\n","# set cursor to next value\n","def next_cursor(response_data):\n","    resp_dict = benedict.from_json(response_data)\n","    cursor = resp_dict.get(\"data.organization.people.pageInfo.endCursor\")\n","    return cursor\n","\n","\n","#--- example 
execution\n","list_of_pages=download_data(example_ror)"],"metadata":{"id":"7FAu2l388OeD","executionInfo":{"status":"ok","timestamp":1643208819281,"user_tz":-60,"elapsed":226,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":["From the returned pages we extract the list of people."],"metadata":{"id":"2lR-J8vUcI5-"}},{"cell_type":"code","source":["# from the result pages we get from the GraphQL API, extract the data about the people\n","def extract_people_from_pages(list_of_pages):\n","    for page in list_of_pages:\n","        page_dict=benedict.from_json(page)\n","        # fall back to an empty list if the organization or its people are missing\n","        for person in page_dict.get('data.organization.people.nodes') or []:\n","            yield person\n","\n","#--- example execution\n","people=extract_people_from_pages(list_of_pages)"],"metadata":{"id":"lQqnqydz2hUh","executionInfo":{"status":"ok","timestamp":1643208827139,"user_tz":-60,"elapsed":261,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":5,"outputs":[]},{"cell_type":"markdown","source":["From each person's metadata we extract and print out their name and ORCID iD."],"metadata":{"id":"FwJxfB_12wtY"}},{"cell_type":"code","source":["# extract ORCID iD and name from person\n","def extract_orcid(person):\n","    person_dict = benedict.from_json(person)\n","    orcid = person_dict.get('id').replace(\"https://orcid.org/\", \"\")\n","    name = person_dict.get('name')\n","    return orcid, name\n","\n","#--- example execution\n","for person in people:\n","    orcid, name = extract_orcid(person)\n","    print(f\"{orcid}, {name}\")"],"metadata":{"id":"aCYx1t4P3Bpu","executionInfo":{"status":"ok","timestamp":1643208836439,"user_tz":-60,"elapsed":2988,"user":{"displayName":"","photoUrl":"","userId":""}},"outputId":"1c350aa6-6659-4ff9-990d-e0309706941b","colab":{"base_uri":"https://localhost:8080/"}},"execution_count":6,"outputs":[{"output_type":"stream","name":"stdout","text":["0000-0002-3783-6130, Irene Weipert-Fenner\n","0000-0002-5452-0488, Hans-Joachim 
Spanger\n","0000-0001-6746-1248, Anton Peez\n","0000-0001-6731-5304, Julia Eckert\n","0000-0003-1575-9688, Hendrik Simon\n","0000-0002-1712-2624, Julian Junk\n","0000-0003-0035-5840, Raphael Oidtmann\n","0000-0002-8739-2486, Elvira Rosert\n","0000-0002-5925-043X, Ariadne Natal\n","0000-0002-7012-6739, Peter Kreuzer\n","0000-0001-7843-4480, Dirk Peters\n","0000-0003-0039-9827, Eldad Ben Aharon\n","0000-0001-6823-6819, Janna Lisa Chalmovsky\n","0000-0003-1940-8877, Mikhail Polianskii\n","0000-0002-4259-6071, Felix S. Bethke\n","0000-0001-7286-3575, Paul Chambers\n"]}]}]}
\ No newline at end of file
diff --git a/organization-people/openalex_get_people_by_organization.ipynb b/organization-people/openalex_get_people_by_organization.ipynb
new file mode 100644
index 0000000..3c2dd8b
--- /dev/null
+++ b/organization-people/openalex_get_people_by_organization.ipynb
@@ -0,0 +1 @@
+{"metadata":{"language_info":{"name":"python","version":"3.7.8","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"colab":{"name":"Kopie von Kopie von openalex_get_people_by_organization.ipynb","provenance":[{"file_id":"https://github.com/Project-TAPIR/pidgraph-notebooks/blob/organization-people/organization-people/openalex_get_people_by_organization.ipynb","timestamp":1643210429142}],"collapsed_sections":[]}},"nbformat_minor":5,"nbformat":4,"cells":[{"cell_type":"markdown","source":["### Query OpenAlex for all people affiliated with an organization\n","This notebook queries the [OpenAlex API](https://docs.openalex.org/api) via its '`/authors`' endpoint for all authors affiliated with an organization.\n","It takes a ROR URL as input and retrieves all authors whose metadata field '`last_known_institution.ror`' contains that ROR ID. 
From the resulting list of people we output their respective ORCID iDs."],"metadata":{"id":"ac7bedaf-05fb-4eb0-9bf5-e4d1d68a08c3"},"id":"ac7bedaf-05fb-4eb0-9bf5-e4d1d68a08c3"},{"cell_type":"code","source":["# needed dependency to make HTTP calls\n","import requests"],"metadata":{"id":"IUqshUWKwSk2","executionInfo":{"status":"ok","timestamp":1643210415322,"user_tz":-60,"elapsed":8,"user":{"displayName":"","photoUrl":"","userId":""}}},"id":"IUqshUWKwSk2","execution_count":1,"outputs":[]},{"cell_type":"markdown","source":["The input for the query is a ROR URL."],"metadata":{"id":"nSJjdkxGdWll"},"id":"nSJjdkxGdWll"},{"cell_type":"code","source":["# input parameter\n","example_ror=\"https://ror.org/021k10z87\""],"metadata":{"id":"7EryzPledIp6","executionInfo":{"status":"ok","timestamp":1643210415322,"user_tz":-60,"elapsed":6,"user":{"displayName":"","photoUrl":"","userId":""}}},"id":"7EryzPledIp6","execution_count":2,"outputs":[]},{"cell_type":"markdown","source":["We use it to query the OpenAlex API for authors that specified the organization's ROR ID in the field '`last_known_institution.ror`'. 
Since the OpenAlex API uses [pagination](https://docs.openalex.org/api/get-lists-of-entities#pagination), we need to loop through all pages to get the complete result set."],"metadata":{"id":"MiXVDKXid9tq"},"id":"MiXVDKXid9tq"},{"cell_type":"code","source":["# OpenAlex endpoint to query for authors\n","OPENALEX_API_AUTHORS = \"https://api.openalex.org/authors\"\n","\n","# query all people that are connected to given ROR\n","def download_data(ror):\n","    page = 1\n","    max_page = 1\n","    while page <= max_page:\n","        params = {'filter': 'last_known_institution.ror:'+ror, 'page': page}\n","\n","        response = requests.get(url=OPENALEX_API_AUTHORS,\n","                                params=params,\n","                                headers={'Content-Type': 'application/json'})\n","        result=response.json()\n","\n","        # calculate max page number in first loop\n","        if max_page == 1:\n","            max_page = determine_max_page(result)\n","        page = page + 1\n","        yield result\n","\n","# calculate max number of result pages\n","def determine_max_page(response_data):\n","    item_count = response_data['meta']['count']\n","    items_per_page = response_data['meta']['per_page']\n","    max_page_ceil = item_count // items_per_page + bool(item_count % items_per_page)\n","    return max_page_ceil\n","\n","\n","#-- example execution\n","list_of_pages=download_data(example_ror)"],"metadata":{"trusted":true,"id":"8b608640-96a8-47d1-9de7-b7d3f6fd5a47","executionInfo":{"status":"ok","timestamp":1643210415323,"user_tz":-60,"elapsed":5,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":3,"outputs":[],"id":"8b608640-96a8-47d1-9de7-b7d3f6fd5a47"},{"cell_type":"markdown","source":["From the resulting list of people we extract and print out each ORCID iD and name."],"metadata":{"id":"CwRzvAQweuoW"},"id":"CwRzvAQweuoW"},{"cell_type":"code","source":["# extract all ORCID iDs from the result\n","def extract_orcids(data):\n","    for author in data['results']:\n","        try:\n","            orcid=author['ids']['orcid'].replace(\"https://orcid.org/\", \"\")\n",
"            name=author['display_name']\n","            yield orcid, name\n","        except (KeyError,AttributeError):\n","            # authors without an ORCID iD are skipped\n","            pass\n","\n","#-- example execution\n","for page in list_of_pages:\n","    for orcid,name in extract_orcids(page):\n","        print(f\"{orcid}, {name}\")"],"metadata":{"trusted":true,"colab":{"base_uri":"https://localhost:8080/"},"id":"1c36737c-4dcf-42d5-80e2-802f0a7a8326","outputId":"5efd986b-0b92-4b0d-e5cc-a65aeae2785e","executionInfo":{"status":"ok","timestamp":1643210418504,"user_tz":-60,"elapsed":3186,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":4,"outputs":[{"output_type":"stream","name":"stdout","text":["0000-0002-3824-5375, Nicole Deitelhoff\n","0000-0002-7348-7206, Jonas Wolff\n","0000-0002-6891-770X, Francis O’Connor\n","0000-0002-3536-8898, Felix Anderl\n","0000-0002-4259-6071, Felix S. Bethke\n","0000-0002-3136-0901, Thorsten Gromes\n","0000-0001-9698-2616, Annika Elena Poppe\n","0000-0002-3783-6130, Irene Weipert-Fenner\n","0000-0002-4793-9010, Arvid Bell\n","0000-0002-7012-6739, Peter Kreuzer\n","0000-0002-0143-5183, Christina Kohler\n"]}],"id":"1c36737c-4dcf-42d5-80e2-802f0a7a8326"}]}
\ No newline at end of file
diff --git a/organization-people/orcid_get_people_by_organization.ipynb b/organization-people/orcid_get_people_by_organization.ipynb
new file mode 100644
index 0000000..63d82c0
--- /dev/null
+++ b/organization-people/orcid_get_people_by_organization.ipynb
@@ -0,0 +1 @@
+{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Kopie von orcid_get_people_by_organization.ipynb","provenance":[{"file_id":"https://github.com/Project-TAPIR/pidgraph-notebooks/blob/organization-people/organization-people/orcid_get_people_by_organization.ipynb","timestamp":1643211404649}],"collapsed_sections":[],"authorship_tag":"ABX9TyPDEh7qk1Vs70HvumKRqfGY"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["### Query ORCID for all people affiliated with an 
organization and filter for current employees only\n","\n","This notebook queries the [ORCID API](https://api.orcid.org/v3.0/) for all [people affiliated with an organization](https://info.orcid.org/faq/how-do-i-find-orcid-record-holders-at-my-institution/). Additionally, the result is narrowed down to people **currently employed** by the organization."],"metadata":{"id":"u4HQPvDxKyjs"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"5s5h9I5OKefn"},"outputs":[],"source":["# needed dependency to make HTTP calls\n","import requests\n","# dependencies for dealing with json\n","import pprint\n","!pip install python-benedict\n","from benedict import benedict"]},{"cell_type":"markdown","source":["### Organization metadata\n","The input value for all following queries is a ROR ID or ROR URL."],"metadata":{"id":"JqXxTHg026Tk"}},{"cell_type":"code","source":["example_ror=\"https://ror.org/04aj4c181\""],"metadata":{"id":"tAoAtVZP25JT","executionInfo":{"status":"ok","timestamp":1643211341240,"user_tz":-60,"elapsed":8,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":2,"outputs":[]},{"cell_type":"markdown","source":["The first step is to call the [ROR API](https://ror.readme.io/) for the organization's metadata."],"metadata":{"id":"mmV5ar17CiSO"}},{"cell_type":"code","source":["# URL for ROR API\n","ROR_API_ENDPOINT = \"https://api.ror.org/organizations\"\n","\n","# query ROR API for organization's metadata\n","def query_ror_api(ror):\n","    response = requests.get(url=requests.utils.requote_uri(ROR_API_ENDPOINT + \"/\" + ror),\n","                            headers={'Content-Type': 'application/json'})\n","    result=response.json()\n","\n","    return result\n","\n","#-- example execution\n","ror_data=query_ror_api(example_ror)\n","# if you want to see the retrieved metadata, uncomment next line\n","# 
pprint.pprint(ror_data)"],"metadata":{"id":"FKpMNpLLLYaZ","executionInfo":{"status":"ok","timestamp":1643211342054,"user_tz":-60,"elapsed":820,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":3,"outputs":[]},{"cell_type":"markdown","source":["In particular we are interested in the organization's grid ID and Wikidata ID."],"metadata":{"id":"mxOGfasxMAaA"}},{"cell_type":"code","source":["def extract_grid_from_ror_data(ror_data):\n","    orga_dict = benedict.from_json(ror_data)\n","    path_to_grid_id = \"external_ids.GRID.all\"\n","    grid_id = orga_dict.get(path_to_grid_id)\n","    return grid_id\n","\n","def extract_wikidata_from_ror_data(ror_data):\n","    orga_dict = benedict.from_json(ror_data)\n","    path_to_wikidata_id = \"external_ids.Wikidata.all[0]\"\n","    wikidata_id = orga_dict.get(path_to_wikidata_id)\n","    return wikidata_id\n","\n","\n","# example execution\n","organization_grid_id=extract_grid_from_ror_data(ror_data)\n","print(\"grid ID: \" + organization_grid_id)\n","organization_wikidata_id=extract_wikidata_from_ror_data(ror_data)\n","print(\"Wikidata ID: \" + organization_wikidata_id)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"wSwVZgSELik3","outputId":"510ec8f8-e8bf-4ead-db9e-8ed4a8c36d1e","executionInfo":{"status":"ok","timestamp":1643211342055,"user_tz":-60,"elapsed":11,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":4,"outputs":[{"output_type":"stream","name":"stdout","text":["grid ID: grid.461819.3\n","Wikidata ID: Q2399120\n"]}]},{"cell_type":"markdown","source":["We use the Wikidata ID of the organization to query Wikidata for the Ringgold ID of the organization."],"metadata":{"id":"oLZD9V1lzyQ4"}},{"cell_type":"code","source":["WIKIDATA_API = \"https://www.wikidata.org/w/api.php\"\n","\n","# query Wikidata with an organization's Wikidata ID\n","def query_wikidata_api(wikidata_id):\n",
"\n","    response = requests.get(url=WIKIDATA_API,\n","                            params={'action': 'wbgetentities', 'ids': wikidata_id, 'props':'claims', 'format':'json'},\n","                            headers={'Content-Type': 'application/json'})\n","    result=response.json()\n","    return result\n","\n","def extract_ringgold_from_wikidata_data(wikidata, wikidata_id):\n","    wikidata_dict = benedict.from_json(wikidata)\n","    path_to_ringgold_id = f\"entities.{wikidata_id}.claims.P3500[0].mainsnak.datavalue.value\"\n","    ringgold_id = wikidata_dict.get(path_to_ringgold_id)\n","    return ringgold_id\n","\n","\n","#-- example execution\n","wikidata_data = query_wikidata_api(organization_wikidata_id)\n","# if you want to see all metadata retrieved from Wikidata, uncomment next line\n","#pprint.pprint(wikidata_data)\n","organization_ringgold_id = extract_ringgold_from_wikidata_data(wikidata_data, organization_wikidata_id)\n","print(\"Ringgold ID: \" + str(organization_ringgold_id or ''))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"v8S4dwfM0A-1","outputId":"44ece1df-3a30-4aa8-8bd6-0199353f0585","executionInfo":{"status":"ok","timestamp":1643211342356,"user_tz":-60,"elapsed":307,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":5,"outputs":[{"output_type":"stream","name":"stdout","text":["Ringgold ID: 28359\n"]}]},{"cell_type":"markdown","source":["To sum up the process so far:\n","1. We used the organization's ROR ID to query the ROR API for the organization's grid ID and Wikidata ID.\n","2. We used Wikidata as an intermediary to retrieve the Ringgold ID of the organization."],"metadata":{"id":"Tx8YWNLZ7_fx"}},{"cell_type":"markdown","source":["### Connection organization -> people\n","The second part of the process is to query for the people affiliated with the organization. For this we use the ORCID API and search for people affiliated with an organization, as explained in the ORCID tutorial [\"How do I find ORCID record holders at my institution?\"](https://info.orcid.org/faq/how-do-i-find-orcid-record-holders-at-my-institution/). 
As parameters for the query we use the Grid ID and Ringgold ID for the organization.\n"],"metadata":{"id":"tQ0ZhMZk_Wcz"}},{"cell_type":"code","source":["ORCID_SEARCH_API = \"https://pub.orcid.org/v3.0/search/\"\n","\n","# query ORCID with an organization's Grid ID and Ringgold ID\n","def query_orcid_for_affiliations(grid_id, ringgold_id):\n","    query = f\"grid-org-id:{grid_id}\" if grid_id else \"\"\n","    query += \" OR \" if grid_id and ringgold_id else \"\"\n","    query += f\"ringgold-org-id:{ringgold_id}\" if ringgold_id else \"\"\n","\n","    response = requests.get(url=ORCID_SEARCH_API,\n","                            params={'q': query},\n","                            headers={'Content-Type': 'application/json', 'Accept': 'application/json'})\n","    result=response.json()\n","    return result\n","\n","def extract_orcids_from_affiliated_people(affiliated_people):\n","    people_dict = benedict.from_json(affiliated_people)\n","    # 'result' can be empty when nothing was found, so fall back to an empty list\n","    for person in people_dict.get('result') or []:\n","        orcid=benedict(person).get('orcid-identifier.path')\n","        yield orcid\n","\n","#-- example execution\n","affiliated_people = query_orcid_for_affiliations(organization_grid_id, organization_ringgold_id)\n","#pprint.pprint(affiliated_people)\n","print(f\"Number of affiliated people: {affiliated_people.get('num-found','')}\")\n","affiliated_orcids= extract_orcids_from_affiliated_people(affiliated_people)\n","for orcid in affiliated_orcids:\n","    print(orcid)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"LwhzE2Nc_J-x","outputId":"9a77ca26-0568-42fd-d28b-068469735274","executionInfo":{"status":"ok","timestamp":1643211343190,"user_tz":-60,"elapsed":840,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":6,"outputs":[{"output_type":"stream","name":"stdout","text":["Number of affiliated people: 
89\n","0000-0003-3922-8638\n","0000-0002-5610-9908\n","0000-0003-2749-7988\n","0000-0001-9758-904X\n","0000-0003-2718-0913\n","0000-0002-2614-1253\n","0000-0002-1266-4029\n","0000-0003-0929-7528\n","0000-0002-4311-5620\n","0000-0002-1595-3213\n","0000-0001-5135-5758\n","0000-0002-3680-2086\n","0000-0001-7408-0611\n","0000-0002-6347-5666\n","0000-0002-2874-4832\n","0000-0001-5375-3063\n","0000-0003-1574-4865\n","0000-0001-5322-0478\n","0000-0002-1407-7362\n","0000-0001-5492-3212\n","0000-0003-1668-3304\n","0000-0002-6802-1241\n","0000-0001-9248-5444\n","0000-0003-2257-0517\n","0000-0001-5693-4708\n","0000-0002-0938-0340\n","0000-0002-0698-2864\n","0000-0003-2510-0529\n","0000-0002-2342-0636\n","0000-0002-9362-4968\n","0000-0003-3320-5187\n","0000-0002-0021-9729\n","0000-0003-0524-1834\n","0000-0001-8824-8390\n","0000-0002-7760-5708\n","0000-0002-0719-5440\n","0000-0003-1537-2862\n","0000-0001-6260-7578\n","0000-0002-5320-0220\n","0000-0003-1132-7220\n","0000-0002-0474-2410\n","0000-0002-3278-0422\n","0000-0002-3447-0575\n","0000-0002-1851-0442\n","0000-0002-7917-3101\n","0000-0002-1442-335X\n","0000-0002-1019-3606\n","0000-0002-9649-7829\n","0000-0003-1702-8707\n","0000-0002-5124-0165\n","0000-0001-9133-4978\n","0000-0001-8080-5308\n","0000-0001-7086-6211\n","0000-0002-1452-9509\n","0000-0002-8579-9717\n","0000-0002-1019-9151\n","0000-0002-9767-3257\n","0000-0003-4040-9073\n","0000-0003-0226-3608\n","0000-0001-8920-7515\n","0000-0002-2593-8754\n","0000-0001-5712-1565\n","0000-0001-6836-1193\n","0000-0003-0232-7085\n","0000-0002-3557-9345\n","0000-0002-7325-5114\n","0000-0002-3075-7640\n","0000-0002-7992-5668\n","0000-0003-2499-7741\n","0000-0001-5839-0177\n","0000-0002-0310-5831\n","0000-0002-7839-3698\n","0000-0002-4450-349X\n","0000-0003-1800-0351\n","0000-0003-1043-4964\n","0000-0002-3060-7052\n","0000-0003-3709-5608\n","0000-0003-3184-5930\n","0000-0001-7460-7794\n","0000-0001-5336-6899\n","0000-0001-8258-2603\n","0000-0001-9924-9153\n","0000-0003-2237-7725\n","0
000-0002-8913-9011\n","0000-0002-2013-6920\n","0000-0001-5232-9236\n","0000-0003-3975-5374\n","0000-0002-0687-5460\n","0000-0001-8777-2780\n"]}]},{"cell_type":"markdown","source":["The connection between an organization and people via their affiliation, as defined by the ORCID API, is quite broad: \n","\n","* It contains each person who used the organization identifier in one of the sections [employment, education & qualifications, membership & service, invited positions & distinctions](https://info.orcid.org/documentation/integration-guide/working-with-organization-identifiers/) of their ORCID record.\n","* Furthermore, the connection is not limited to the current affiliation but also contains people who were affiliated with the organization years ago.\n","\n","That is why we take the ORCID iDs retrieved via the search API and query the ORCID API for each person's detailed record, narrowing the result set down to people who \n","* use one of the organization's IDs in the employment section\n","* and are currently employed (the end date of the employment is empty)"],"metadata":{"id":"DUMnQM62MXns"}},{"cell_type":"code","source":["ORCID_RECORD_API = \"https://pub.orcid.org/v3.0/\"\n","\n","# query ORCID for an ORCID record\n","def query_orcid_for_record(orcid_id):\n","\n","    response = requests.get(url=requests.utils.requote_uri(ORCID_RECORD_API + orcid_id),\n","                            headers={'Content-Type': 'application/json', 'Accept': 'application/json'})\n","    result=response.json()\n","    return result\n","\n","# check if affiliated person is a current employee\n","def is_current_employee(orcid_id, grid_id, ringgold_id):\n","    # get orcid record\n","    orcid_record = query_orcid_for_record(orcid_id)\n","\n","    # filter for current employees only\n","    record_dict = benedict.from_json(orcid_record)\n","    path_to_employments = \"activities-summary.employments.affiliation-group\"\n",
"    path_to_orga_id = \"summaries[0].employment-summary.organization.disambiguated-organization.disambiguated-organization-identifier\"\n","    path_to_end_date = \"summaries[0].employment-summary.end-date\"\n","\n","    # check every employment entry, not only the first one\n","    for employment in record_dict.get(path_to_employments) or []:\n","        employment_dict = benedict(employment)\n","        orga_id = employment_dict.get(path_to_orga_id)\n","        end_date = employment_dict.get(path_to_end_date)\n","\n","        if orga_id and not end_date and orga_id in (grid_id, ringgold_id):\n","            return True\n","    return False\n","\n","\n","#-- example execution\n","affiliated_orcids = extract_orcids_from_affiliated_people(affiliated_people)\n","employee_orcids = [orcid_id for orcid_id in affiliated_orcids if is_current_employee(orcid_id, organization_grid_id, organization_ringgold_id)]\n","print(f\"Number of current employees: {len(employee_orcids)}\")\n","for orcid_id in employee_orcids:\n","    print(orcid_id)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"6Ac2mk4vOz1A","outputId":"711330f3-415d-4404-acb9-a8645bb86fc7","executionInfo":{"status":"ok","timestamp":1643211372146,"user_tz":-60,"elapsed":28960,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":7,"outputs":[{"output_type":"stream","name":"stdout","text":["Number of current employees: 
59\n","0000-0003-3922-8638\n","0000-0002-5610-9908\n","0000-0003-2749-7988\n","0000-0001-9758-904X\n","0000-0003-2718-0913\n","0000-0002-2614-1253\n","0000-0003-0929-7528\n","0000-0002-1595-3213\n","0000-0001-5135-5758\n","0000-0002-3680-2086\n","0000-0002-6347-5666\n","0000-0002-2874-4832\n","0000-0001-5375-3063\n","0000-0003-1574-4865\n","0000-0001-5322-0478\n","0000-0002-1407-7362\n","0000-0001-5492-3212\n","0000-0002-6802-1241\n","0000-0001-9248-5444\n","0000-0002-0938-0340\n","0000-0002-0698-2864\n","0000-0003-2510-0529\n","0000-0002-2342-0636\n","0000-0002-9362-4968\n","0000-0001-8824-8390\n","0000-0002-0719-5440\n","0000-0001-6260-7578\n","0000-0003-1132-7220\n","0000-0002-0474-2410\n","0000-0002-3278-0422\n","0000-0002-3447-0575\n","0000-0002-1851-0442\n","0000-0002-7917-3101\n","0000-0002-1442-335X\n","0000-0003-1702-8707\n","0000-0002-5124-0165\n","0000-0001-8080-5308\n","0000-0002-1452-9509\n","0000-0002-8579-9717\n","0000-0002-1019-9151\n","0000-0003-4040-9073\n","0000-0003-0226-3608\n","0000-0001-8920-7515\n","0000-0002-2593-8754\n","0000-0001-6836-1193\n","0000-0003-0232-7085\n","0000-0002-7325-5114\n","0000-0002-7992-5668\n","0000-0003-2499-7741\n","0000-0001-5839-0177\n","0000-0002-0310-5831\n","0000-0003-1043-4964\n","0000-0002-3060-7052\n","0000-0003-3709-5608\n","0000-0003-3184-5930\n","0000-0001-5336-6899\n","0000-0002-8913-9011\n","0000-0002-2013-6920\n","0000-0003-3975-5374\n"]}]},{"cell_type":"markdown","source":["--> For this example we were able to narrow down the result from 89 affiliated people to 59 currently employed people."],"metadata":{"id":"3oj7LKIlZBgT"}}]} \ No newline at end of file