TypeError: 'NoneType' object is not iterable #15

Closed
Don-Yin opened this issue May 27, 2022 · 6 comments

@Don-Yin

Don-Yin commented May 27, 2022

A TypeError is raised in "finders.py", but only for certain PDF files. This one, for example:
paper12.2009_unknown_040916_440842.pdf

A minimal code snippet to reproduce this error:

from pathlib import Path
import pdf2doi

pdf2doi.config.set("verbose", False)
PDF_name = "paper12.2009_unknown_040916_440842.pdf"
results = pdf2doi.pdf2doi(str(Path("examples", PDF_name)))

The PDF is placed in the "examples" folder.

Here is the error message:

Traceback (most recent call last):
  File "/Users/donyin/Desktop/pdf2doi-master/main.py", line 15, in <module>
    results = pdf2doi.pdf2doi(str(Path("examples", i)))
  File "/Users/donyin/Desktop/pdf2doi-master/pdf2doi/main.py", line 90, in pdf2doi
    result = pdf2doi_singlefile(filename)
  File "/Users/donyin/Desktop/pdf2doi-master/pdf2doi/main.py", line 134, in pdf2doi_singlefile
    result = finders.find_identifier(file,method="document_infos",keysToCheckFirst=['/doi','/identfier'])
  File "/Users/donyin/Desktop/pdf2doi-master/pdf2doi/finders.py", line 548, in find_identifier
    identifier, desc, info = finder_methods[method](file,func_validate,**kwargs)
  File "/Users/donyin/Desktop/pdf2doi-master/pdf2doi/finders.py", line 586, in find_identifier_in_pdf_info
    identifier,desc,info = find_identifier_in_text(pdfinfo[key],func_validate)
  File "/Users/donyin/Desktop/pdf2doi-master/pdf2doi/finders.py", line 286, in find_identifier_in_text
    for identifier in identifiers:
TypeError: 'NoneType' object is not iterable

I thought I fixed this error by adding:

if identifiers is None:
    identifiers = []

at line 286 of your "finders.py", so that it becomes:

        #First we look for DOI
        for v in range(len(doi_regexp)):
            identifiers = extract_doi_from_text(text,version=v)
            if identifiers is None: # <- here
                identifiers = [] # <- here
            for identifier in identifiers:
                validation = func_validate(identifier,'doi')
                if validation: 
                    return identifier, 'DOI', validation

But this was a bit hacky and not the proper solution. You'd undoubtedly know more about what's going on, so I thought I'd let you know about this.
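
For reference, the same caller-side guard can be written more compactly with `or []`. The sketch below is only an illustration: `extract_ids_or_none` and `first_valid` are hypothetical stand-ins, not functions from pdf2doi, and the stand-in extractor is assumed to return None when nothing is found.

```python
def extract_ids_or_none(text):
    # Hypothetical stand-in for an extractor that returns None on no match.
    return [text] if "10." in text else None


def first_valid(text, validate):
    # `or []` turns a None result into an empty iterable,
    # so the loop body is simply skipped instead of raising TypeError.
    for identifier in extract_ids_or_none(text) or []:
        if validate(identifier):
            return identifier
    return None
```

This keeps the loop safe, but it still leaves every other caller of the extractor exposed to the same None result.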

By the way, there are some deprecated method calls that you might want to address:

UserWarning: Page.extractText is deprecated and will be removed in PyPDF2 2.0.0. Use Page.extract_text instead. [_page.py:1003]
UserWarning: addMetadata is deprecated and will be removed in PyPDF2 2.0.0. Use add_metadata instead. [_writer.py:793]
UserWarning: Page.extractText is deprecated and will be removed in PyPDF2 2.0.0. Use Page.extract_text instead. [_page.py:1003]
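
The warnings above name the replacements directly: Page.extract_text for Page.extractText, and add_metadata for addMetadata. As a rough sketch of how a caller could tolerate both spellings during a transition, the hypothetical helper below (not part of pdf2doi or PyPDF2) prefers the new method name and falls back to the old one. It assumes only that the page object exposes one of the two methods.

```python
def extract_page_text(page):
    """Return the page text, preferring the non-deprecated method name."""
    # Hypothetical helper: `page` is assumed to be a PyPDF2 page object
    # exposing either extract_text (new name) or extractText (old name).
    extract = getattr(page, "extract_text", None)
    if extract is None:
        extract = getattr(page, "extractText")
    return extract()
```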

cheers,
Don

@MicheleCotrufo
Owner

Thanks a lot for reporting this! Indeed, it is due to the bug that you pointed out. I am quite puzzled why this bug has "manifested" only now; I would have expected this situation to occur more often. Maybe I accidentally introduced it in the last version...

I fixed the bug by making sure that the functions extract_doi_from_text and extract_arxivID_from_text return an empty list instead of None when nothing is found. I also fixed the deprecated syntax that you pointed out. Thanks again!
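
The idea of returning an empty list rather than None can be sketched as follows. This is not the actual pdf2doi code: `extract_doi_candidates` is a hypothetical name, and `DOI_PATTERN` is a simplified pattern assumed here for illustration only.

```python
import re

# Simplified DOI-like pattern, assumed for illustration only;
# it is not the set of patterns that pdf2doi uses.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def extract_doi_candidates(text):
    """Return every DOI-like substring in `text`, or [] if there is none."""
    if not text:
        # Covers both None and the empty string.
        return []
    # findall returns a (possibly empty) list, never None.
    return DOI_PATTERN.findall(text)
```

With this contract, a loop such as `for identifier in extract_doi_candidates(text):` runs zero times when nothing is found, so callers need no None check.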

I released version 1.2rc1 on PyPI. Would you mind testing it on this PDF file again?

pip install pdf2doi==1.2rc1

@Don-Yin
Author

Don-Yin commented May 28, 2022

Thank you for responding! It is now operational! Well done!
Thank you for this fantastic project.

@alexmaehon

same problem

@MicheleCotrufo
Owner

> same problem

alexmaehon, do you still get this error after upgrading to version 1.2rc1? If yes, can you send me the PDF file and the full traceback?

@MicheleCotrufo
Owner

> Thank you for responding! It is now operational! Well done! Thank you for this fantastic project.

Excellent, thanks for your help! I will release the 1.2 version later today, then.

MicheleCotrufo added a commit that referenced this issue May 28, 2022
@MicheleCotrufo
Owner

This bug was fixed with version 1.2
