Commit

Refactor code based on Wiley input file changes
* Add smart_open library so input files can be streamed directly from S3 based on a URI
* Remove unnecessary dependencies
* Update test fixture from .xlsx to .csv file
ehanson8 committed Oct 13, 2021
1 parent 9e2c60d commit b4bb0d5
Showing 7 changed files with 128 additions and 223 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -132,4 +132,4 @@ dmypy.json
 *.pdf
 *.xlsx
 .DS_Store
-!/fixtures/test.xlsx
+!/fixtures/test.csv
3 changes: 1 addition & 2 deletions Pipfile
@@ -4,10 +4,9 @@ verify_ssl = true
 name = "pypi"

 [packages]
-pandas = "*"
-openpyxl = "*"
 click = "*"
 coveralls = "*"
+smart-open = "*"

 [dev-packages]
 black = "==21.7b0"
331 changes: 119 additions & 212 deletions Pipfile.lock

Large diffs are not rendered by default.

12 changes: 5 additions & 7 deletions awd/crossref.py
@@ -1,14 +1,12 @@
-import pandas
 import requests
+import smart_open


 def get_dois_from_spreadsheet(file):
-    """Retriev DOIs from the Wiley-provided spreadsheet."""
-    excel_data_df = pandas.read_excel(
-        file, sheet_name="MIT Article List", skiprows=range(0, 4)
-    )
-    for doi in excel_data_df["DOI"].tolist():
-        yield doi
+    """Retrieve DOIs from the Wiley-provided CSV file."""
+    with smart_open.open(file, encoding="utf-8-sig") as csvfile:
+        for doi in csvfile.read().splitlines():
+            yield doi


 def get_crossref_work_from_doi(api_url, doi):
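The new reader loads the whole file with `read()` before splitting it into lines. For larger inputs, a lazy variant may be worth considering. The sketch below is an illustration, not part of this commit: `iter_dois`, the `opener` parameter, and the second DOI in the demo are hypothetical. It assumes that `smart_open.open` can be passed as `opener` for S3 URIs, and it uses the built-in `open` so the example runs without extra dependencies. It also shows what `encoding="utf-8-sig"` does: it strips a leading byte order mark, which some spreadsheet exports write at the start of a CSV file.

```python
import os
import tempfile
from typing import Callable, Iterator


def iter_dois(path: str, opener: Callable = open) -> Iterator[str]:
    """Yield one DOI per non-empty line, reading the file lazily.

    `opener` is a hypothetical hook: the built-in `open` handles local
    files, and `smart_open.open` is assumed to work for S3 URIs.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    with opener(path, encoding="utf-8-sig") as csvfile:
        for line in csvfile:
            doi = line.strip()
            if doi:  # skip blank lines
                yield doi


def demo() -> list:
    """Write a BOM-prefixed CSV with a blank line, then read it back."""
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
        # The second DOI is a made-up placeholder for illustration only.
        tmp.write(b"\xef\xbb\xbf10.1002/term.3131\r\n\r\n10.1002/example.0001\n")
    try:
        return list(iter_dois(tmp.name))
    finally:
        os.remove(tmp.name)
```

Reading with plain `"utf-8"` instead would leave the byte order mark attached to the first DOI, so the first Crossref lookup would likely fail.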
1 change: 1 addition & 0 deletions fixtures/test.csv
@@ -0,0 +1 @@
+10.1002/term.3131
Binary file removed fixtures/test.xlsx
Binary file not shown.
2 changes: 1 addition & 1 deletion tests/test_crossref.py
@@ -22,7 +22,7 @@ def work_record():


 def test_get_dois_from_spreadsheet():
-    dois = crossref.get_dois_from_spreadsheet("fixtures/test.xlsx")
+    dois = crossref.get_dois_from_spreadsheet("fixtures/test.csv")
     for doi in dois:
         assert doi == "10.1002/term.3131"
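One possible concern with the loop-style assertion in this test: if the generator yields nothing, the loop body never runs and the test still passes. The sketch below illustrates the difference between checking inside a loop and comparing the full list. The helper names (`no_dois`, `loop_check`, `list_check`) are hypothetical and do not appear in the repository.

```python
from typing import Iterable, Iterator

EXPECTED = ["10.1002/term.3131"]


def no_dois() -> Iterator[str]:
    """Hypothetical stand-in for a reader that yields nothing."""
    return iter(())


def loop_check(dois: Iterable[str]) -> bool:
    """Mirror the loop-style assertion: True when no yielded DOI mismatches."""
    return all(doi == "10.1002/term.3131" for doi in dois)


def list_check(dois: Iterable[str]) -> bool:
    """Compare the full output, so an empty result counts as a failure."""
    return list(dois) == EXPECTED
```

Under this sketch, `loop_check(no_dois())` returns `True` while `list_check(no_dois())` returns `False`, so a test written as `assert list(dois) == ["10.1002/term.3131"]` would also catch an empty or unreadable fixture.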
