# GitHub API to Search for FHIR Profiles

The GitHub API is documented here: https://docs.github.com/en/rest/search and the specific code (e.g., file) search here: https://docs.github.com/en/search-github/searching-on-github/searching-code

This script uses the GitHub API to search for FHIR Profiles. It is based on this [tutorial](https://python.gotrained.com/search-github-api/#Search_Files), which goes over the setup and how to search for files in a repository using keywords. To search for FHIR Profiles (StructureDefinition files) on GitHub, the script hardcodes the organization to "HL7" and the typical Implementation Guide file paths, as shown below. The keywords are user input at runtime. For searching **xml and json** files, the keywords and path I used are:

- path: "input/resources"
- keywords: "differential" {Type} (e.g., "ServiceRequest")

Searching for **fsh** files is a separate search. The keywords and path I used are:

- path: "input/fsh"
- keywords: "Profile" {Type} (e.g., "ServiceRequest")
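
The two searches above differ only in path and keywords, so the query string can be built by one small helper. This is a minimal sketch, not part of the original script: `SEARCH_PATHS` and `build_query` are hypothetical names I made up for illustration. The script further down joins terms with `+`; this sketch joins them with spaces, which is how the GitHub search syntax docs write multi-term queries, so treat that choice as my assumption.

```python
# Hypothetical helper: names are illustrative, not from the original script.
SEARCH_PATHS = {
    "resources": "input/resources",  # xml and json files
    "fsh": "input/fsh",              # FHIR Shorthand source files
}


def build_query(keywords, kind, org="HL7"):
    """Return a GitHub code-search query string for the given keywords."""
    path = SEARCH_PATHS[kind]
    # Drop empty entries and surrounding whitespace from user input
    terms = " ".join(k.strip() for k in keywords if k.strip())
    return f"{terms} org:{org} path:{path}"


print(build_query(["differential", "ServiceRequest"], "resources"))
# prints: differential ServiceRequest org:HL7 path:input/resources
print(build_query(["Profile", "ServiceRequest"], "fsh"))
# prints: Profile ServiceRequest org:HL7 path:input/fsh
```

The returned string could then be passed to `g.search_code(...)` in place of the hardcoded query.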

The result of the search is a list of URLs. This is a crude search that casts a broad net and returns many more types of FHIR resources than just StructureDefinition files (Profiles). You can use the URLs to download the files and filter out what you want. I use the Requests library to download the files, create fhir_objects, and filter for the StructureDefinition files that I am interested in. This is demonstrated here.
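
As a rough illustration of that filtering step, here is a minimal sketch that keeps only the downloaded documents that look like StructureDefinitions. It is not the code I actually use: instead of building fhir_objects it only inspects the `resourceType` field (json) or the root element name (xml). The function names `resource_type`, `keep_profiles`, and `download` are hypothetical, and `download` assumes the Requests library is installed.

```python
import json
import xml.etree.ElementTree as ET

FHIR_NS = "{http://hl7.org/fhir}"


def resource_type(text):
    """Best-effort guess of the FHIR resource type of a json or xml document."""
    stripped = text.lstrip()
    if stripped.startswith("{"):
        try:
            data = json.loads(stripped)
        except json.JSONDecodeError:
            return None
        return data.get("resourceType") if isinstance(data, dict) else None
    if stripped.startswith("<"):
        try:
            root = ET.fromstring(stripped)
        except ET.ParseError:
            return None
        # FHIR xml uses the resource type as the root element name
        return root.tag.replace(FHIR_NS, "")
    return None  # e.g. fsh files, which are neither json nor xml


def keep_profiles(documents):
    """Return the URLs in {url: text} whose content is a StructureDefinition."""
    return [url for url, text in documents.items()
            if resource_type(text) == "StructureDefinition"]


def download(urls):
    """Fetch each URL with Requests (assumed installed) and return {url: text}."""
    import requests  # imported lazily so the filter works without it
    return {url: requests.get(url, timeout=30).text for url in urls}
```

With a list of URLs from the search, `keep_profiles(download(urls))` would return the subset that are profiles. Files from `input/fsh` need a different check, since they are not json or xml.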

If you try to search across all of GitHub, you may hit a "secondary rate limit" like I did.
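
One common way to cope with rate limits is to wait and retry with an increasing delay. The sketch below is a generic helper, not something the original script does; `call_with_backoff` and its parameters are hypothetical. I believe PyGithub raises `RateLimitExceededException` in this situation, but check the version you have installed before relying on that.

```python
import time


def call_with_backoff(fn, retries=4, base_delay=2.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception wait base_delay * 2**attempt and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

For example, `call_with_backoff(lambda: search_github(keywords), retry_on=(RateLimitExceededException,))` would retry the search after 2, 4, and 8 seconds before giving up, assuming that exception class has been imported from `github`.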


In [24]:
# Import required modules
import time
# import csv
# from json import dumps
from github import Github

In [25]:
# Paste your Access token here
# To create an access token - https://github.com/settings/tokens
ACCESS_TOKEN =  "--------------"  #### don't save this to GitHub or it will be compromised and unusable

g = Github(ACCESS_TOKEN)

In [26]:
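# Check that the client was created.
# Note: get_repos() returns a lazy PaginatedList, so no request is sent
# (and the token is not really tested) until the list is iterated.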
print(g.get_user().get_repos())

<github.PaginatedList.PaginatedList object at 0x106cb39a0>


In [27]:
def search_github(keywords):
    # For requests using Basic Authentication, OAuth, or client ID and secret,
    # you can make up to 30 search requests per minute.
    rate_limit = g.get_rate_limit()
    rate = rate_limit.search
    if rate.remaining == 0:
        print(f'You have 0/{rate.limit} API calls remaining. Reset time: {rate.reset}')
        return
    else:
        print(f'You have {rate.remaining}/{rate.limit} API calls remaining')

    #query = '+'.join(keywords) + '+ org:HL7 path:input/resources' # for xml and json
    query = '+'.join(keywords) + '+ org:HL7 path:input/fsh' # for fsh
    print(f'Searching for {query}')
    result = g.search_code(query, order='desc')

    max_size = 100
    print(f'Found {result.totalCount} file(s)')
    if result.totalCount > max_size:
        result = result[:max_size]
 
    for file in result:
        print(f'{file.download_url}')

keywords = input('Enter comma-separated keyword(s) [e.g., differential, ServiceRequest]: ')
keywords = [keyword.strip() for keyword in keywords.split(',')]
search_github(keywords)

You have 27/30 API calls remaining
Searching for Profile Coverage+ org:HL7 path:input/fsh
Found 75 file(s)
https://raw.githubusercontent.com/HL7/carin-bb/0759f4ee87cc738c312d9811d694fac6a3563b0d/input/fsh/CoverageProfile.fsh
https://raw.githubusercontent.com/HL7/carin-digital-insurance-card/06ed9d9b9b99fba17e6cedd36d3722cc924b4d9b/input/fsh/CoverageProfile.fsh
https://raw.githubusercontent.com/HL7/davinci-pct/6e5af9d97051035c417c683b78641add6c6d7f66/input/fsh/pct_coverage.fsh
https://raw.githubusercontent.com/HL7/carin-digital-insurance-card/09247529534e7bb9704e2c54408c909e451d7946/input/fsh/scripts/capabilitystatement-c4dic.json
https://raw.githubusercontent.com/HL7/carin-digital-insurance-card/09247529534e7bb9704e2c54408c909e451d7946/input/fsh/scripts/Narrative-capabilitystatement-c4dic.json
https://raw.githubusercontent.com/HL7/carin-bb/2191ed033bb0997762a9adc47496dfadffe94cdc/input/fsh/DEF_VersionInvariants.fsh
https://raw.githubusercontent.com/HL7/davinci-pdex-formulary/7b430f769c