medcat.utils.preprocess_snomed.Snomed - FileNotFoundError #198

Closed
KeironO opened this issue Jan 25, 2022 · 5 comments

KeironO commented Jan 25, 2022

Hi there,

Whenever I attempt to use the Snomed preprocessing utility, I get a FileNotFoundError:

from medcat.utils.preprocess_snomed import Snomed
snomed = Snomed("C:/path/to/dir/uk_sct2cl_32.7.0_20211124000001Z/")
cdf = snomed.to_concept_df()

Returns

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-23-5eb639e435ed> in <module>
----> 1 cdf = snomed.to_concept_df()

~\Projects\nlp\env\lib\site-packages\medcat\utils\preprocess_snomed.py in to_concept_df(self)
     50                     snomed_v = m.group(1)
     51 
---> 52             int_terms = parse_file(f'{contents_path}/sct2_Concept_Snapshot_{snomed_v}_{snomed_release}.txt')
     53             active_terms = int_terms[int_terms.active == '1']
     54             del int_terms

~\Projects\nlp\env\lib\site-packages\medcat\utils\preprocess_snomed.py in parse_file(filename, first_row_header, columns)
      7 
      8 def parse_file(filename, first_row_header=True, columns=None):
----> 9     with open(filename, encoding='utf-8') as f:
     10         entities = [[n.strip() for n in line.split('\t')] for line in f]
     11         return pd.DataFrame(entities[1:], columns=entities[0] if first_row_header else columns)

FileNotFoundError: [Errno 2] No such file or directory: 'C:/path/to/dir/uk_sct2cl_32.7.0_20211124000001Z/SnomedCT_UKClinicalRefsetsRF2_PRODUCTION_20211124T000001Z\\Snapshot\\Terminology/sct2_Concept_Snapshot_INT_20211124.txt'

The file in that directory is actually named sct2_Concept_UKCRSnapshot_GB1000000_20211124.txt

Best wishes,

Keiron
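
A quick way to confirm which concept file a release actually ships is to search the extracted directory for anything that looks like a concept snapshot. The following is only a minimal sketch using the standard library: `find_concept_snapshots` is a hypothetical helper, not part of MedCAT, and the glob pattern is an assumption based on the two file names quoted in this report.

```python
from pathlib import Path


def find_concept_snapshots(release_dir):
    """Return the concept snapshot files found anywhere under release_dir.

    Hypothetical helper, not part of MedCAT. The pattern is an assumption
    drawn from the two names in this report: it matches both
    sct2_Concept_Snapshot_INT_<date>.txt and
    sct2_Concept_UKCRSnapshot_GB1000000_<date>.txt.
    """
    root = Path(release_dir)
    # rglob searches recursively, so the nested Snapshot/Terminology
    # folder does not need to be spelled out.
    return sorted(root.rglob("sct2_Concept_*Snapshot_*.txt"))
```

Calling it on the directory passed to `Snomed(...)` and printing the results shows the real file names, which can then be compared with the path in the traceback.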

w-is-h assigned w-is-h and antsh3k and unassigned w-is-h Jan 25, 2022
antsh3k (Collaborator) commented Jan 25, 2022

Dear Keiron,

Thank you for flagging to the team that the format of new UK extension releases has now changed.
I will make changes to enable the processing of the new release format and will let you know when it is done.

KeironO (Author) commented Jan 25, 2022

@antsh3k no worries, I can have a go at fixing it if you're snowed under?

antsh3k (Collaborator) commented Jan 25, 2022

dw, I can change it. Although, if you can test it and give feedback, that would be amazing!

The changes will be reviewed and integrated by tomorrow.

antsh3k (Collaborator) commented Jan 26, 2022

The changes have been made in PR #199. The following should work now:

from medcat.utils.preprocess_snomed import Snomed
snomed = Snomed("C:/path/to/dir/uk_sct2cl_32.7.0_20211124000001Z/")
snomed.uk_ext = True # Note: this will only work with UK extensions >2021 with the new release format. Prior UK extension releases should skip this step.
cdf = snomed.to_concept_df()

Let me know if there are any further issues.
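
To see how the two naming schemes in this thread differ, a file name can be split into its parts. This is a rough sketch only: `parse_concept_filename` and its regular expression are hypothetical, inferred from the two names quoted above, and are not taken from MedCAT's own parsing code.

```python
import re

# Assumed layout, inferred from the two names in this thread:
#   sct2_Concept_<prefix>Snapshot_<namespace>_<YYYYMMDD>.txt
# where <prefix> is empty for the international-style name.
_CONCEPT_FILE = re.compile(
    r"sct2_Concept_([A-Za-z]*?)Snapshot_([A-Za-z0-9]+)_(\d{8})\.txt"
)


def parse_concept_filename(name):
    """Split a concept snapshot file name into its parts, or return None."""
    m = _CONCEPT_FILE.fullmatch(name)
    if m is None:
        return None
    return {"prefix": m.group(1), "namespace": m.group(2), "release": m.group(3)}
```

Applied to the name the old code expected, the prefix is empty and the namespace is `INT`; applied to the file that is actually present, the prefix is `UKCR` and the namespace is `GB1000000`, which is why the hard-coded path did not resolve.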

KeironO (Author) commented Jan 26, 2022

Seems to work fine now. Thank you for your work :D

w-is-h closed this as completed Jan 26, 2022