Merge 0e1baba into 0327f25

aschroed committed Jan 6, 2020
2 parents 0327f25 + 0e1baba commit 636bb5c

Showing 5 changed files with 148 additions and 131 deletions.
125 changes: 60 additions & 65 deletions README.md
@@ -6,136 +6,131 @@
[![Code Quality](https://api.codacy.com/project/badge/Grade/a4d521b4dd9c49058304606714528538)](https://www.codacy.com/app/jeremy_7/Submit4DN)
[![PyPI version](https://badge.fury.io/py/Submit4DN.svg)](https://badge.fury.io/py/Submit4DN)

The Submit4DN package is written by the [4DN Data Coordination and Integration Center](http://dcic.4dnucleome.org/) for data submitters from the 4DN Network. Please [contact us](mailto:support@4dnucleome.org) to get access to the system, or if you have any questions or suggestions. Detailed documentation on data submission can be found [at this link](https://data.4dnucleome.org/help/submitter-guide/getting-started-with-submissions).

## Installing the package

The Submit4DN package is registered with Pypi so installation is as simple as:

```
pip install submit4dn
```

To upgrade to the latest version

```
pip install submit4dn --upgrade
```

### Troubleshooting

This package may install and run on Python v2.7, but using this package with that version is no longer officially supported and your mileage may vary.

It is recommended to install this package in a virtual environment to avoid dependency clashes.
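
For example, a minimal virtual-environment setup using Python's built-in `venv` module (the environment name here is arbitrary):

```
python3 -m venv submit4dn-env
source submit4dn-env/bin/activate
pip install submit4dn
```
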
Problems have been reported on recent MacOS versions having to do with the inability to find `libmagic`,
a C library for checking file types that is used by the `python-magic` library.

e.g. `ImportError: failed to find libmagic. Check your installation`

The first thing to try is:

```
pip uninstall python-magic
pip install python-magic
```

If that doesn't work, one solution that has worked for some (suggested [here](https://github.com/Yelp/elastalert/issues/1927)) is:

```
pip uninstall python-magic
pip install python-magic-bin==0.4.14
```

Others have had success using Homebrew to install `libmagic`:

```
brew install libmagic
brew link libmagic  # if the link already exists this will fail; don't worry about that
```

## Connecting to the Data Portal
To be able to use the provided tools, you need to generate an AccessKey on the [data portal](https://data.4dnucleome.org/).
If you do not yet have access, please contact [4DN Data Wranglers](mailto:support@4dnucleome.org)
to get an account and [learn how to generate and save a key](https://data.4dnucleome.org/help/submitter-guide/getting-started-with-submissions#getting-connection-keys-for-the-4dn-dcic-servers).
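
By default the scripts look for your key in `~/keypairs.json`; a different file or key name can be selected with the `--keyfile` and `--key` options. As a reference, the keypairs file is a small JSON document along these lines (all values here are placeholders):

```
{
  "default": {
    "key": "TheConnectionKey",
    "secret": "very_secret_key",
    "server": "www.The4dnWebsite.com"
  }
}
```

For example, to use a non-default file and key name: `import_data --keyfile path/to/filename.json --key NotDefault filename.xls`.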

## Generating data submission forms
To create the data submission xls forms, you can use `get_field_info`.

It will accept the following parameters:
~~~~
--type use once for each sheet that you want to add to the excel workbook
--descriptions adds the descriptions in the second line (by default True)
--enums adds the enum options in the third line (by default True)
--comments adds the comments together with enums (by default False)
--writexls creates the xls file (by default True)
--outfile changes the default file name "fields.xls" to a specified one
--order creates an ordered and filtered version of the excel workbook (by default True)

~~~~

Examples generating a single sheet:
~~~~
get_field_info --type Biosample
get_field_info --type Biosample --comments
get_field_info --type Biosample --comments --outfile biosample.xls
~~~~

Example list of sheets:
~~~~
get_field_info --type Publication --type Document --type Vendor --type Protocol --type BiosampleCellCulture --type Biosource --type Enzyme --type Construct --type TreatmentAgent --type TreatmentRnai --type Modification --type Biosample --type FileFastq --type IndividualMouse --type ExperimentHiC --type ExperimentSetReplicate --type ExperimentCaptureC --type BioFeature --type GenomicRegion --type ExperimentSet --type Image --comments --outfile MetadataSheets.xls
~~~~

Example list of sheets: (using python scripts)
~~~~
python3 -m wranglertools.get_field_info --type Publication --type Document --type Vendor --type Protocol --type BiosampleCellCulture --type Biosource --type Enzyme --type Construct --type TreatmentAgent --type TreatmentRnai --type Modification --type Biosample --type FileFastq --type IndividualHuman --type ExperimentHiC --type ExperimentCaptureC --type BioFeature --type GenomicRegion --type ExperimentSet --type ExperimentSetReplicate --type Image --comments --outfile MetadataSheets.xls
~~~~

Example Workbook with all sheets:
~~~~
get_field_info --type all --outfile MetadataSheets.xls
~~~~

Examples for Workbooks using a preset option:
~~~~
get_field_info --type HiC --comments --outfile exp_hic_generic.xls
get_field_info --type ChIP-seq --comments --outfile exp_chipseq_generic.xls
get_field_info --type FISH --comments --outfile exp_fish_generic.xls
~~~~

Current presets include: `Hi-C, ChIP-seq, Repli-seq, ATAC-seq, DamID, ChIA-PET, Capture-C, FISH, SPT`

## Data submission

Please refer to the [submission guidelines](https://data.4dnucleome.org/help/submitter-guide) and become familiar with the metadata structure prior to submission.

After you fill out the data submission forms, you can use `import_data` to submit the metadata. The script can be used both to create new metadata items and to patch fields of existing items.

~~~~
import_data filename.xls
~~~~

#### Uploading vs Patching

Running `import_data` without one of the flags described below will perform a dry-run submission that includes several validation checks.
It is strongly recommended to do a dry run prior to the actual submission and, if necessary, work with a Data Wrangler to correct any errors.

If there are uuid, alias, @id, or accession fields in the xls form that match existing entries in the database, you will be asked if you want to PATCH each object.
You can use the `--patchall` flag if you want to patch ALL objects in your document without being prompted for each one.

If no object identifiers are found in the document, you need to use `--update` for POSTing to occur.
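
For example, a typical sequence might look like the following (`filename.xls` stands for your own workbook):

~~~~
# dry run with validation checks only (no flags)
import_data filename.xls

# patch existing items that match identifiers in the workbook, without prompting for each one
import_data filename.xls --patchall

# post new items
import_data filename.xls --update
~~~~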

**Other Helpful Advanced parameters**

Normally you are asked to verify the **Lab** and **Award** that you are submitting for. In some cases it may be desirable to skip this prompt so a submission
can be run by a scheduler or in the background:

`--remote` is an option that will skip any prompts before submission.

**However**, if you submit for more than one Lab or there is more than one Award associated with your lab, you will need to specify these values
as parameters using `--lab` and/or `--award` followed by the uuids of the appropriate items.
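
For instance, an unattended patch submission for a specific Lab and Award might look like this (a sketch; the uuids are placeholders for your own items):

~~~~
import_data filename.xls --patchall --remote --lab <lab-uuid> --award <award-uuid>
~~~~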

<img src="https://media.giphy.com/media/l0HlN5Y28D9MzzcRy/giphy.gif" width="200" height="200" />


# Development
Note: if you are attempting to run the scripts in the `wranglertools` directory without installing the package, then in order to get the correct `sys.path` you need to run the scripts from the parent directory using the following command format:

```
python -m wranglertools.get_field_info --type Biosource
python -m wranglertools.import_data filename.xls
```

The PyPI page is https://pypi.python.org/pypi/Submit4DN

@@ -163,7 +158,7 @@ To run the mark tests, or exclude them from the tests you can use the following
# Run only webtest
py.test -m webtest

# Run only tests with file_operation
py.test -m file_operation
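
# Run everything except tests marked webtest (standard pytest "-m" expression syntax; added here as an illustration)
py.test -m "not webtest"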

For a better testing experience that also checks to ensure sufficient coverage and runs linters, use invoke:
5 changes: 2 additions & 3 deletions requirements.txt
@@ -2,14 +2,13 @@
six

attrs==16.0.0
py==1.4.31
pytest==3.0.1
pytest-cov==2.3.1
pytest-mock==1.11.2
python-magic==0.4.12
xlrd==1.0.0
xlwt
dcicutils>=0.5.3

# Build tasks
5 changes: 3 additions & 2 deletions setup.py
@@ -10,7 +10,6 @@

requires = [
'attrs',
'py>=1.4.31',
'python-magic>=0.4.12',
'wheel>=0.24.0',
@@ -31,6 +30,8 @@
name='Submit4DN',
version=open("wranglertools/_version.py").readlines()[-1].split()[-1].strip("\"'"),
description='Tools for data wrangling and submission to data.4dnucleome.org',
long_description=README,
long_description_content_type="text/markdown",
packages=['wranglertools'],
zip_safe=False,
author='4DN Team at Harvard Medical School',
@@ -44,7 +45,7 @@
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.5',
],
install_requires=requires,
include_package_data=True,
tests_require=tests_require,
37 changes: 22 additions & 15 deletions tests/test_import_data.py
@@ -10,6 +10,13 @@ def test_attachment_from_ftp():
    assert attach


@pytest.mark.ftp
def test_attachment_ftp_to_nowhere():
    with pytest.raises(Exception) as e:
        imp.attachment("ftp://on/a/road/to/nowhere/blah.txt")
    assert "urlopen error" in str(e.value)


@pytest.mark.file_operation
def test_md5():
    md5_keypairs = imp.md5('./tests/data_files/keypairs.json')
Expand All @@ -19,8 +26,6 @@ def test_md5():
@pytest.mark.file_operation
def test_attachment_image():
    attach = imp.attachment("./tests/data_files/test.jpg")
    assert attach['download'] == 'test.jpg'
    assert attach['type'] == 'image/jpeg'
    assert attach['href'].startswith('data:image/jpeg;base64')
@@ -43,30 +48,31 @@ def test_attachment_image_wrong_extension():

@pytest.mark.file_operation
def test_attachment_wrong_path():
    with pytest.raises(Exception) as e:
        imp.attachment("./tests/data_files/dontexisit.txt")
    assert "ERROR : The 'attachment' field has INVALID FILE PATH or URL" in str(e.value)


@pytest.mark.webtest
def test_attachment_url():
    attach = imp.attachment("https://www.le.ac.uk/oerresources/bdra/html/page_09.htm")
    assert attach['download'] == 'page_09.htm'
    assert attach['type'] == 'text/html'
    assert attach['href'].startswith('data:text/html;base64')


@pytest.mark.webtest
def test_attachment_bad_url():
    with pytest.raises(Exception) as e:
        imp.attachment("https://some/unknown/url.html")
    assert "ERROR : The 'attachment' field has INVALID FILE PATH or URL" in str(e.value)


@pytest.mark.file_operation
def test_attachment_not_accepted():
    with pytest.raises(ValueError) as excinfo:
        imp.attachment("./tests/data_files/test.mp3")
    assert str(excinfo.value) == 'Unallowed file type for test.mp3'


@pytest.mark.file_operation
@@ -707,6 +713,7 @@ def test_cabin_cross_check_remote_w_single_lab_award(mocker, connection_mock, capsys):
    assert out.strip() == message.strip()


@pytest.mark.skip  # invalid mock use, needs refactor
def test_cabin_cross_check_not_remote_w_lab_award_options(mocker, connection_mock, capsys):
    with mocker.patch('wranglertools.import_data.os.path.isfile', return_value=True):
        with mocker.patch.object(connection_mock, 'prompt_for_lab_award', return_value='blah'):
