PubTator file not found error #4

Closed
dimitrijenyc opened this issue Nov 5, 2019 · 7 comments

Comments

@dimitrijenyc

I have installed BERN on my Linux 18 machine under Python 3.6, and everything seems fine when I start the server. The output in the log file looks like this:

nohup: ignoring input
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
[05/Nov/2019 16:35:28.802904] Starting..
2019-11-05 16:35:28.835150: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
A GPU is NOT available
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fbd33f8c8c8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': './biobert_ner/pretrainedBERT/gene', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  allow_growth: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbd2a69f358>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fbd2a92d6a8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': './biobert_ner/pretrainedBERT/disease', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  allow_growth: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbd2a69f4e0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fbd2a69c7b8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': './biobert_ner/pretrainedBERT/drug', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  allow_growth: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbd2a69f668>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fbd2a69c8c8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': './biobert_ner/pretrainedBERT/species', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  allow_growth: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbd2a69f7f0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
BioBERT init_t 3.838 sec.
[05/Nov/2019 16:35:32.679049] Starting server at http://0.0.0.0:8888
gid2oid loaded 59849
gene meta #ids 42916, #ext_ids 42916
disease meta #ids 12122, #ext_ids 15040
chem meta #ids 178395, #ext_ids 178795

Then, when I test the server with the example script from the README file:

import requests
import json
body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}
response = requests.post('http://127.0.0.1:8888', data=body_data)
result_dict = response.json()
print(result_dict)

It complains about the missing PubTator file in the output folder:

127.0.0.1 - - [05/Nov/2019 16:41:05] "POST / HTTP/1.1" 200 -
[05/Nov/2019 16:41:05.504282] [Thread-1] text_hash: 3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7
[05/Nov/2019 16:41:06.330533] [Thread-1] GNormPlus 0.826 sec
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 51812)
Traceback (most recent call last):
  File "/usr/lib/python3.6/shutil.py", line 550, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator' -> '/home/ubuntu/bern/tmVarJava/input/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/socketserver.py", line 654, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.6/socketserver.py", line 364, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.6/socketserver.py", line 724, in __init__
    self.handle()
  File "/usr/lib/python3.6/http/server.py", line 418, in handle
    self.handle_one_request()
  File "/usr/lib/python3.6/http/server.py", line 406, in handle_one_request
    method()
  File "server.py", line 317, in do_POST
    text, cur_thread_name, is_raw_text=True, reuse=False)
  File "server.py", line 420, in tag_entities
    shutil.move(output_gnormplus, input_tmvar2)
  File "/usr/lib/python3.6/shutil.py", line 564, in move
    copy_function(src, real_dst)
  File "/usr/lib/python3.6/shutil.py", line 263, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'

Would you be able to let me know what the issue might be?
Thank you!
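The traceback shows server.py calling `shutil.move(output_gnormplus, input_tmvar2)` on a GNormPlus output file that was never written, so the move fails with a bare FileNotFoundError. A minimal sketch of a hypothetical helper (not BERN's code) that waits a bounded time for the upstream file and otherwise raises an error naming the missing file. Waiting only helps if GNormPlus is merely slow; if it crashed, the error is still raised, just with a clearer message.

```python
import os
import shutil
import time


def move_when_ready(src, dst, timeout=30.0, poll_interval=0.5):
    """Wait up to `timeout` seconds for `src` to exist, then move it to `dst`.

    If `src` never appears (e.g. the upstream tool crashed), raise a
    FileNotFoundError that names the missing file and the wait time,
    rather than letting shutil.move fail on its own.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(src):
        if time.monotonic() >= deadline:
            raise FileNotFoundError(
                f"no output after {timeout:.1f}s, upstream step may have failed: {src}"
            )
        time.sleep(poll_interval)
    # shutil.move returns the final destination path.
    return shutil.move(src, dst)
```

With the variable names from the traceback, the call would look like `move_when_ready(output_gnormplus, input_tmvar2, timeout=60)`.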

@donghyeonk
Collaborator

Hi @dimitrijenyc

  1. Did you install GNormPlusJava and tmVarJava?
  2. And did you run GNormPlusServer.jar and tmVar2Server.jar?
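One way to check the second point is to test whether the GNormPlusServer.jar and tmVar2Server.jar ports accept TCP connections. A minimal sketch, assuming the servers listen on localhost with the ports passed on their command lines in the `ps` output later in this thread (18895 and 18896); the function names are hypothetical.

```python
import socket

# Assumption: ports as passed on the command lines in the `ps` output later
# in this thread (GNormPlusServer.jar 18895, tmVar2Server.jar 18896).
DEFAULT_SERVICES = {"GNormPlusServer.jar": 18895, "tmVar2Server.jar": 18896}


def is_listening(port, host="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=DEFAULT_SERVICES, host="127.0.0.1"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(port, host) for name, port in services.items()}
```

An open port only shows that the process started; as this thread shows, both servers can be running while GNormPlus still fails on a request.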

@dimitrijenyc
Author

dimitrijenyc commented Nov 6, 2019

Hi @donghyeonk ,

Yes, I installed both GNormPlusJava and tmVarJava, and both are up and running, as the following command output shows:

Every 5.0s: ps auxww | egrep 'python|java|node' | grep -v grep    ip-172-31-30-130: Wed Nov  6 17:20:26 2019

root      1358  0.0  0.0 169096 17188 ?        Ssl  17:12   0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root      1481  0.0  0.0 185944 20088 ?        Ssl  17:12   0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
ubuntu    2850  7.1 16.9 20105432 5590012 pts/0 Sl  17:16   0:15 java -Xmx16G -Xms16G -jar GNormPlusServer.jar 18895
ubuntu    2870  1.0  0.9 11410160 317328 pts/0 Sl   17:16   0:02 java -Xmx8G -Xms8G -jar tmVar2Server.jar 18896
ubuntu    2899  0.0  0.0  29072  9728 pts/0    S    17:17   0:00 python3 normalizers/chemical_normalizer.py
ubuntu    2900  0.0  0.0  28780  9552 pts/0    S    17:17   0:00 python3 normalizers/species_normalizer.py
ubuntu    2901  0.0  0.0  28780  9592 pts/0    S    17:17   0:00 python3 normalizers/mutation_normalizer.py
ubuntu    2902  2.0  1.2 20100700 415292 pts/0 Sl   17:17   0:04 java -Xmx16G -jar resources/normalizers/disease/disease_normalizer_19.jar
ubuntu    2903  0.0  0.0 24450020 28812 pts/0  Sl   17:17   0:00 java -Xmx20G -jar GNormPlus_180921.jar
ubuntu    2945  2.5  0.8 1540584 273628 pts/0  Sl   17:17   0:04 python3 -u server.py --port 8888 --gnormplus_home /home/ubuntu/bern/GNormPlusJava --gnormplus_port 18895 --tmvar2_home /home/ubuntu/bern/tmVarJava --tmvar2_port 18896

When I run the Python test script above, it hangs while GNormPlusServer.jar processes the request, and GNormPlusServer.jar then shuts down with an exception. The input file is created and left in the ~/bern/GNormPlusJava/input directory, but nothing is written to the ~/bern/GNormPlusJava/output directory. Here is the content of the generated input PubTator file in ~/bern/GNormPlusJava/input:

3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7|t|
3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7|a|CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome.
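The file above is in PubTator format: a document is a `<id>|t|<title>` line followed by a `<id>|a|<abstract>` line (annotation lines, if any, come after). Here BERN leaves the title empty and puts the whole input text on the abstract line. A minimal sketch for writing and reading such blocks, for example to hand-craft a GNormPlus input when debugging outside BERN; the helper names are hypothetical and only the title/abstract lines are handled.

```python
def to_pubtator(doc_id, title, abstract):
    """Serialize one document as PubTator title and abstract lines."""
    # The format is line-based and pipe-delimited: the id must not contain
    # '|', and no field may contain a newline.
    if "|" in doc_id or any("\n" in s for s in (doc_id, title, abstract)):
        raise ValueError("doc_id must not contain '|'; fields must be single-line")
    return f"{doc_id}|t|{title}\n{doc_id}|a|{abstract}\n"


def parse_pubtator(block):
    """Return (doc_id, title, abstract) from a PubTator document block.

    Only the '|t|' and '|a|' lines are read; tab-separated annotation
    lines are skipped.
    """
    doc_id, fields = None, {}
    for line in block.splitlines():
        # Split on the first two pipes only, so '|' inside the text survives.
        parts = line.split("|", 2)
        if len(parts) == 3 and parts[1] in ("t", "a"):
            doc_id = parts[0]
            fields[parts[1]] = parts[2]
    return doc_id, fields.get("t", ""), fields.get("a", "")
```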

@donghyeonk
Collaborator

Thanks for the error report. We will fix the error as soon as we can reproduce it.

@donghyeonk
Collaborator

Hi @dimitrijenyc

I modified some code that seems to be related to the problem. Please try again as follows.

  1. git pull
  2. Restart BERN

Have a nice day.

@dimitrijenyc
Author

dimitrijenyc commented Nov 12, 2019

@donghyeonk
Thank you for responding to my question and updating the code. I am now seeing another problem during server startup, which seems to be caused by a missing resource file:

Traceback (most recent call last):
  File "server.py", line 676, in <module>
    Main(args)
  File "server.py", line 631, in __init__
    GetHandler.normalizer = Normalizer()
  File "/home/ubuntu/bern/normalize.py", line 116, in __init__
    with open(self.METADATA_PATH['drug'], 'r', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'normalization/resources/meta/chem_meta.tsv'

When I run:
sh download_norm.sh

I get the following message regarding the download of the data.zip file:

~/bern/scripts$ sh download_norm.sh 
Found data.zip.
Archive:  data.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of data.zip or
        data.zip.zip, and cannot find data.zip.ZIP, period.

Even if I work around the above script, manage to download resources.zip and data.zip, and replace meta/chem_meta.tsv with meta/chem_meta_190821.tsv in bern/normalize.py, I still get the same PubTator file not found error as reported initially.
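unzip's "End-of-central-directory signature not found" means data.zip is not a complete zip archive, and the "Found data.zip." line suggests the script reused a file already on disk instead of downloading it again. A sketch under that assumption: Python's `zipfile.is_zipfile` looks for the same end-of-central-directory record, so it can flag the file before unzipping, and a corrupt copy can be deleted so that a re-run fetches it fresh (assuming download_norm.sh downloads data.zip when it is absent). The helper names are hypothetical.

```python
import os
import zipfile


def check_archive(path):
    """Return 'missing', 'ok', or 'corrupt' for the archive at `path`."""
    if not os.path.isfile(path):
        return "missing"
    # is_zipfile() looks for the end-of-central-directory record, the same
    # structure unzip reported as missing for data.zip.
    return "ok" if zipfile.is_zipfile(path) else "corrupt"


def remove_if_corrupt(path):
    """Delete a corrupt archive so a re-run downloads it again.

    Returns True if a file was removed.
    """
    if check_archive(path) == "corrupt":
        os.remove(path)
        return True
    return False
```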

@dimitrijenyc
Author

@donghyeonk

In summary, I was able to get the Python test script to work by making one change in normalize.py: replacing meta/chem_meta.tsv with meta/chem_meta_190821.tsv.

Here's my output:

>>> import json
>>> import requests
>>> body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}
>>> 
>>> response = requests.post('http://127.0.0.1:8888', data=body_data)
>>> response
<Response [200]>
>>> response.content
b'{"project": "BERN", "sourcedb": "", "sourceid": "3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7", "text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome.", "denotations": [{"id": ["MESH:C567763", "BERN:262813101"], "span": {"begin": 0, "end": 14}, "obj": "disease"}, {"id": ["MIM:171834", "HGNC:8975", "Ensembl:ENSG00000121879", "BERN:324295302"], "span": {"begin": 53, "end": 59}, "obj": "gene"}, {"id": ["MESH:C567763", "BERN:262813101"], "span": {"begin": 133, "end": 147}, "obj": "disease"}, {"id": ["MESH:D014652", "BERN:256572101"], "span": {"begin": 158, "end": 175}, "obj": "disease"}, {"id": ["MESH:C567763", "BERN:262813101"], "span": {"begin": 193, "end": 232}, "obj": "disease"}, {"id": ["MESH:C567763", "BERN:262813101"], "span": {"begin": 234, "end": 289}, "obj": "disease"}, {"id": ["MESH:C567763", "BERN:262813101"], "span": {"begin": 589, "end": 594}, "obj": "disease"}, {"id": ["MIM:171834", "HGNC:8975", "Ensembl:ENSG00000121879", "BERN:324295302"], "span": {"begin": 748, "end": 759}, "obj": "gene"}, {"id": ["BERN:9309004"], "span": {"begin": 847, "end": 855}, "obj": "mutation", "mutationType": "DNAMutation", "normalizedName": "c|SUB|T|248|C"}, {"id": ["BERN:5820904"], "span": {"begin": 857, "end": 867}, "obj": "mutation", "mutationType": "ProteinMutation", "normalizedName": "p|SUB|F|83|S"}, {"id": ["BERN:257523801"], "span": {"begin": 906, "end": 929}, "obj": "disease"}, {"id": ["CUI-less"], "span": {"begin": 1009, "end": 1025}, "obj": "gene"}, {"id": ["MESH:C567763", "BERN:262813101"], "span": {"begin": 1043, "end": 1048}, "obj": "disease"}], "timestamp": "Tue Nov 12 20:35:51 +0000 2019"}'
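The README test script calls `requests.post` without a timeout, so a server-side hang like the one reported earlier in this thread also blocks the client. A minimal stdlib-only sketch that adds a timeout and reads entities out of a response shaped like the one above, assuming the server at 127.0.0.1:8888 accepts the same form-encoded `param` field as the README example. The function names are hypothetical, not part of BERN.

```python
import json
import urllib.parse
import urllib.request


def build_body(text):
    """Form fields in the shape the README example sends: one 'param'
    field holding a JSON object with a 'text' key."""
    return {"param": json.dumps({"text": text})}


def annotate(text, url="http://127.0.0.1:8888", timeout=60.0):
    """POST `text` to a local BERN server and return the decoded JSON.

    `timeout` (seconds) makes a hung request raise an exception instead
    of blocking forever.
    """
    # urlencode yields the same application/x-www-form-urlencoded body
    # that requests.post(url, data=dict) sends.
    data = urllib.parse.urlencode(build_body(text)).encode("utf-8")
    req = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_entities(result):
    """Return (type, mention, ids) for each denotation in a BERN result.

    span.begin/span.end are character offsets into result["text"], with
    `end` exclusive, so the mention is text[begin:end].
    """
    text = result["text"]
    entities = []
    for d in result.get("denotations", []):
        begin, end = d["span"]["begin"], d["span"]["end"]
        entities.append((d["obj"], text[begin:end], d["id"]))
    return entities
```

For the response above, the gene denotation with span begin 53 and end 59 yields the mention "PIK3CA", and the disease denotation with span 0 to 14 yields "CLAPO syndrome".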

@donghyeonk
Collaborator

Hi @dimitrijenyc

Sorry for the inconvenience.

You can download the "chem_meta.tsv" file at
https://drive.google.com/file/d/12fMnhml9YVg-Jfl_VyggJx37V17iOq9g/view?usp=sharing

I have also added "chem_meta.tsv" to "resources.zip".

Have a nice day.
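For a local installation that only has the dated metadata file, the path lookup can tolerate a date suffix. A minimal sketch of a hypothetical helper (not part of BERN), assuming the meta directory from the traceback (normalization/resources/meta) and the YYMMDD suffix seen in chem_meta_190821.tsv; the maintainer's actual fix was to ship chem_meta.tsv in resources.zip.

```python
import os


def resolve_meta_path(meta_dir, stem="chem_meta", ext=".tsv"):
    """Return the path of a metadata file, tolerating a date suffix.

    Prefers `<stem><ext>` (e.g. chem_meta.tsv). If that is missing, falls
    back to the name-sorted last `<stem>_*<ext>` (e.g. chem_meta_190821.tsv).
    Raises FileNotFoundError if neither exists.
    """
    exact = os.path.join(meta_dir, stem + ext)
    if os.path.isfile(exact):
        return exact
    prefix = stem + "_"
    # With a fixed-width YYMMDD suffix, name order matches date order.
    candidates = sorted(
        name for name in os.listdir(meta_dir)
        if name.startswith(prefix) and name.endswith(ext)
    )
    if candidates:
        return os.path.join(meta_dir, candidates[-1])
    raise FileNotFoundError(f"no {stem}{ext} or {prefix}*{ext} in {meta_dir}")
```

The result could feed `METADATA_PATH['drug']` in normalize.py; picking the last name only selects the newest file while all suffixes use the same YYMMDD width.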

This was referenced May 4, 2020