This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Commit

Add basic check for toolbox documentation with denied access
See #2
sco1 committed Mar 20, 2018
1 parent 68d8964 commit b621878
Showing 2 changed files with 12 additions and 4 deletions.
MATLABfcnscrape.py: 8 changes (6 additions, 2 deletions)
@@ -40,7 +40,7 @@ def scrapedocpage(URL):
     Object methods (foo.bar) and comments (leading %) are excluded
-    Returns a list of function name strings
+    Returns a list of function name strings, or an empty list if none are found (e.g. no permission for toolbox)
     """
     r = requests.get(URL, timeout=2)
     soup = BeautifulSoup(r.content, 'html.parser')
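The updated docstring of `scrapedocpage` names two exclusions (object methods written as `foo.bar`, and comments with a leading `%`) and now allows an empty return value. The sketch below shows only that filtering step, assuming the page has already been parsed into a list of candidate strings. `filter_fcn_names` is a hypothetical helper for illustration, not the repository's code.

```python
def filter_fcn_names(candidates):
    """Return plain MATLAB function names from raw scraped strings.

    Hypothetical helper: assumes `candidates` is the text of each entry
    already extracted from a toolbox's function-list page.
    """
    names = []
    for text in candidates:
        text = text.strip()
        if not text or text.startswith("%"):
            # Skip blanks and MATLAB comments (leading %)
            continue
        if "." in text:
            # Skip object methods such as cdflib.close
            continue
        names.append(text)
    # An empty list is a legitimate result, e.g. when toolbox access is denied
    return names


result = filter_fcn_names(["plot", "cdflib.close", "% a comment", "  zeros  "])
print(result)  # ['plot', 'zeros']
```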
@@ -136,7 +136,11 @@ def helpURLbuilder(shortlink, prefix="https://www.mathworks.com/help/", suffix="
 for toolbox, URL in toolboxdict.items():
     try:
         fcnlist = scrapedocpage(URL)
-        writeToolboxJSON(fcnlist, toolbox, outpath)
+        if len(fcnlist) == 0:
+            # No functions found, most likely because permission for the toolbox documentation is denied
+            logging.info(f"Permission to view documentation for '{toolbox}' has been denied: {URL}")
+        else:
+            writeToolboxJSON(fcnlist, toolbox, outpath)
     except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
         # TODO: Add a retry pipeline, verbosity of exception
         logging.info(f"Unable to access online docs for '{toolbox}': '{URL}'")
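The new branch treats an empty function list as a sign that access to the toolbox documentation was denied, and skips writing a JSON file for it. A minimal sketch that isolates this decision so it can run without network access: `process_toolboxes` is a hypothetical name, and the `scrape` and `write` callables stand in for `scrapedocpage` and `writeToolboxJSON`. The timeout/connection-error handling from the commit is deliberately left out.

```python
import logging


def process_toolboxes(toolboxdict, scrape, write):
    """Write JSON only for toolboxes whose docs yielded functions.

    Hypothetical sketch: `scrape(URL)` returns a list of names and
    `write(fcnlist, toolbox)` persists them. Returns the skipped toolboxes.
    """
    skipped = []
    for toolbox, URL in toolboxdict.items():
        fcnlist = scrape(URL)
        if not fcnlist:
            # Empty result: most likely the documentation requires a license
            logging.info(f"Permission to view documentation for '{toolbox}' has been denied: {URL}")
            skipped.append(toolbox)
        else:
            write(fcnlist, toolbox)
    return skipped


# Usage with a fake scraper: toolbox "B" returns no functions
pages = {"https://example.com/a": ["plot"], "https://example.com/b": []}
written = {}
skipped = process_toolboxes(
    {"A": "https://example.com/a", "B": "https://example.com/b"},
    scrape=pages.get,
    write=lambda fcns, tb: written.update({tb: fcns}),
)
print(skipped, written)
```

Injecting the scraper and writer as callables keeps the permission check testable on its own; the real script calls its helpers directly.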
README.md: 8 changes (6 additions, 2 deletions)
@@ -1,7 +1,11 @@
 # MATLABfcnscrape
 Scrape MATLAB's documentation for all valid function names and output to JSON files for external use
 
-A JSON file is output per toolbox
+A JSON file is output per toolbox. All unique functions are also consolidated into a single JSON.
 
 ## Notes
-Object methods (e.g. `cdflib.close`) are ignored for the JSON output
+* Only those toolboxes under the 'MATLAB Family' are considered at this time: https://www.mathworks.com/help/index.html
+* Several toolboxes are inaccessible by some users due to licensing restrictions
+    * See [this issue](https://github.com/StackOverflowMATLABchat/MATLABfcnscrape/issues/2) for an up-to-date list
+    * Pull requests for these toolboxes are welcome
+* Object methods (e.g. `cdflib.close`) are ignored for the JSON output
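The README states that each toolbox gets its own JSON file and that all unique functions are also consolidated into a single JSON. A minimal sketch of that output step, assuming one `<toolbox>.json` file per toolbox holding a sorted array of names and a hypothetical combined file called `_unique.json`. Neither the file naming nor the JSON layout is taken from the repository.

```python
import json
import tempfile
from pathlib import Path


def write_outputs(toolbox_fcns, outpath, combined_name="_unique.json"):
    """Write one JSON file per toolbox plus one file of all unique names.

    Hypothetical layout: each file holds a sorted JSON array of strings.
    Toolboxes with no functions (e.g. access denied) get no file.
    """
    outdir = Path(outpath)
    outdir.mkdir(parents=True, exist_ok=True)
    unique = set()
    for toolbox, fcnlist in toolbox_fcns.items():
        if not fcnlist:
            continue
        (outdir / f"{toolbox}.json").write_text(json.dumps(sorted(fcnlist), indent=2))
        unique.update(fcnlist)
    combined = sorted(unique)  # de-duplicated across all toolboxes
    (outdir / combined_name).write_text(json.dumps(combined, indent=2))
    return combined


with tempfile.TemporaryDirectory() as tmp:
    combined = write_outputs(
        {"MATLAB": ["plot", "zeros"], "Stats": ["zeros", "normpdf"], "Denied": []},
        tmp,
    )
    files = sorted(p.name for p in Path(tmp).iterdir())

print(combined)  # ['normpdf', 'plot', 'zeros']
print(files)     # ['MATLAB.json', 'Stats.json', '_unique.json']
```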
