# SPSS data file (with annotated variables, i.e. variable labels and value labels) to HEAL data dictionary
This notebook takes an SPSS `.sav` file and uses healdata-utils to export a HEAL-formatted data dictionary.
The data dictionary titles are inferred from the file names. 

> Note: a few fields currently have no descriptions, so validation returns failure warnings for them.

This notebook demonstrates two ways to create a data dictionary with the healdata-utils `vlmd` tool:

1. Via Python
2. Via the command line

## Set up (as of 2/13/2023)
After activating your virtual environment (recommended):
```bash
pip install "healdata_utils @ git+https://github.com/norc-heal/healdata-utils"
```
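
To confirm the install worked before continuing, a quick check from Python can help. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and `healdata_utils` is the import name used in the cells below.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `healdata_utils` is the import name used later in this notebook.
    status = "found" if is_installed("healdata_utils") else "not found"
    print(f"healdata_utils: {status}")
```

If the module is reported as not found, check that the virtual environment used for `pip install` is the one running the notebook kernel.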

## Via Python

In [1]:
from pathlib import Path 
from healdata_utils.cli import to_json,to_csv_from_json
import os 

In [2]:
help(to_json)

Help on function to_json in module healdata_utils.cli:

to_json(filepath, outputdir, data_dictionary_props={}, inputtype=None, raise_valid_error=True)



In [3]:
help(to_csv_from_json)

Help on function to_csv_from_json in module healdata_utils.cli:

to_csv_from_json(filepath, outputdir)
    converts a json file to a csv
    
    Parameters
    ---------------
    filepath: path to input file 
    outputdir: path to a directory where output will go. 
        If a directory is specified, will use the input name for output name replaced with json suffix.
    
    Returns
    --------------
    None



In [4]:
inputpaths = list(Path().glob("input/*.sav"))
inputtype = "sav"
# NOTE: titles inferred from file names
# NOTE: could add descriptions for each data dictionary as well but 
# not required
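
As the notes above mention, titles default to the file names and a description is optional. The sketch below shows one way to build an explicit title and description per file. It assumes the file stem is an acceptable title (the exact rule healdata-utils applies is not shown here) and that the `data_dictionary_props` argument from the `to_json` signature accepts `title` and `description` keys; `build_props` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def build_props(path: Path, description: Optional[str] = None) -> dict:
    """Build a title/description dict for one input file.

    Assumption: the file stem is used as the title; healdata-utils may
    derive its default title differently.
    """
    props = {"title": Path(path).stem}
    if description:
        props["description"] = description
    return props


props = build_props(
    Path("input/JCOIN_NORC Omnibus_SURVEY1_Feb2020_072821.sav"),
    description="Demonstration data dictionary.",
)
print(props)

# Hypothetical usage, assuming `data_dictionary_props` accepts these keys:
# to_json(filepath=path, outputdir=healdir, data_dictionary_props=props,
#         inputtype="sav", raise_valid_error=False)
```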

In [5]:
print('\n'.join([str(path) for path in inputpaths]))

input\3645_JCOIN_HEAL Initiative 2021_Data for External Use.sav
input\JCOIN_NORC Omnibus_SURVEY1_Feb2020_072821.sav
input\JCOIN_NORC Omnibus_SURVEY2_April2020_072821.sav
input\JCOIN_NORC Omnibus_SURVEY3_June2020_072821.sav
input\JCOIN_NORC Omnibus_SURVEY4_Oct2020_072821.sav
input\JCOIN_NORC Omnibus_SURVEY5_Feb2021_072821.sav


In [6]:
for inputpath in inputpaths:

    # write outputs to an `output` folder next to the `input` folder
    healdir = inputpath.parent.with_name('output')
    healjson = healdir/inputpath.name.replace(f".{inputtype}",".json")
    healcsv = healdir/inputpath.name.replace(f".{inputtype}",".csv")

    to_json(
        filepath=inputpath,
        outputdir=healdir,
        inputtype=inputtype, # if not specified, looks for suffix
        raise_valid_error=False # do not raise on validation failure
    )
    to_csv_from_json(healjson,healcsv)

Validating output json file created from input\3645_JCOIN_HEAL Initiative 2021_Data for External Use.sav.....
'description' is a required property

Failed validating 'required' in schema['properties']['data_dictionary']['items']:
    {'$id': 'vlmd-fields',
     '$schema': 'http://json-schema.org/draft-04/schema#',
     'description': 'Variable level metadata individual fields integrated '
                    'into the variable level metadata object within the '
                    'HEAL platform metadata service.\n',
     'properties': {'cde_id': {'description': 'The source and id for the '
                                              'NIH Common Data Elements '
                                              'program.',
                               'items': {'properties': {'id': {'type': 'string'},
                                                        'source': {'type': 'string'}},
                                         'type': 'object'},
                               'title': '
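
The validation message above says a field is missing the required `description` property. A small check like the following could list the affected fields in an output file. It is a sketch under assumptions: fields are taken to sit in a list under a top-level `data_dictionary` key (as the schema path in the message suggests), and each field is assumed to carry a `name` key; both helper functions are hypothetical.

```python
import json
from pathlib import Path


def fields_missing_description(dd: dict) -> list:
    """Return names (or positions) of fields lacking a non-empty description.

    Assumptions: fields live in a list under "data_dictionary" and each
    field is a dict that usually has a "name" key.
    """
    missing = []
    for position, field in enumerate(dd.get("data_dictionary", [])):
        if not str(field.get("description") or "").strip():
            missing.append(field.get("name", f"<field {position}>"))
    return missing


def check_file(json_path) -> list:
    """Load one data dictionary JSON file and report fields without descriptions."""
    with Path(json_path).open(encoding="utf-8") as f:
        return fields_missing_description(json.load(f))


# Small in-memory example with made-up field names.
example = {
    "title": "demo",
    "data_dictionary": [
        {"name": "age", "description": "Age in years"},
        {"name": "weight"},
        {"name": "id", "description": "  "},
    ],
}
print(fields_missing_description(example))  # → ['weight', 'id']
```

Calling `check_file` on each generated JSON file would give a list of fields to fill in before re-running validation.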

## Via command line

We will demonstrate the `vlmd` command line utility using one of the data dictionaries.

In [6]:
!vlmd --help

Usage: vlmd [OPTIONS]

  write a data dictioanry (ie variable level metadata) to a HEAL metadata json
  file

Options:
  --filepath TEXT                 Path to the file you want to convert to a
                                  HEAL data dictionary  [required]
  --title TEXT                    The title of your data dictionary. If not
                                  specified, then the file name will be used
  --description TEXT              Description of data dictionary
  --inputtype [csv|sav|dta|por|sas7bdat|json|redcap.xml|redcap.csv]
                                  The type of your input file.
  --outputdir TEXT                The folder where you want to output your
                                  HEAL data dictionary
  --help                          Show this message and exit.


Use the following command on the command line to create the same data dictionary as above:


```bash
# --inputtype is optional here: the .sav extension gives the same result
vlmd --filepath "input\JCOIN_NORC Omnibus_SURVEY1_Feb2020_072821.sav" \
--inputtype sav \
--outputdir "output/stigma-survey-demo.json" \
--title "JCOIN NORC Omnibus SURVEY1" \
--description "This data dictionary is for demonstration purposes only. \
The data dictionary is generated from the JCOIN NORC stigma surveys and mapped to the HEAL \
variable level metadata."
```
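
The same call could also be assembled from Python, which avoids shell quoting and line-continuation pitfalls. The sketch below only builds the argument list from the options shown in the `vlmd --help` output; `build_vlmd_command` and `run_vlmd` are hypothetical helpers, and running the command assumes `vlmd` is installed and on the PATH.

```python
import subprocess
from typing import List, Optional


def build_vlmd_command(
    filepath: str,
    outputdir: Optional[str] = None,
    inputtype: Optional[str] = None,
    title: Optional[str] = None,
    description: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for the `vlmd` CLI, skipping unset options."""
    command = ["vlmd", "--filepath", filepath]
    optional = {
        "--inputtype": inputtype,
        "--outputdir": outputdir,
        "--title": title,
        "--description": description,
    }
    for flag, value in optional.items():
        if value is not None:
            command += [flag, value]
    return command


def run_vlmd(command: List[str]) -> subprocess.CompletedProcess:
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(command, check=True)


command = build_vlmd_command(
    filepath="input/JCOIN_NORC Omnibus_SURVEY1_Feb2020_072821.sav",
    outputdir="output/stigma-survey-demo.json",
    inputtype="sav",
    title="JCOIN NORC Omnibus SURVEY1",
    description="This data dictionary is for demonstration purposes only.",
)
print(command)
```

Passing the arguments as a list means values with spaces, such as the title, need no extra quoting.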

To run the command directly from this notebook, run the cell below:

In [12]:
# if --inputtype isn't specified, the result is the same because the extension is .sav
!vlmd --filepath "input\JCOIN_NORC Omnibus_SURVEY1_Feb2020_072821.sav" \
--inputtype sav \
--outputdir "output/stigma-survey-demo.json" \
--title "JCOIN NORC Omnibus SURVEY1" \
--description "This data dictionary is for demonstration purposes only. \
The data dictionary is generated from the JCOIN NORC stigma surveys and mapped to the HEAL \
variable level metadata."

Validating output json file.....


Traceback (most recent call last):
  File "c:\Users\kranz-michael\projects\healdata-utils\venv\Scripts\vlmd-script.py", line 33, in <module>
    sys.exit(load_entry_point('healdata-utils', 'console_scripts', 'vlmd')())
  File "C:\Users\kranz-michael\projects\healdata-utils\venv\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\kranz-michael\projects\healdata-utils\venv\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\kranz-michael\projects\healdata-utils\venv\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\kranz-michael\projects\healdata-utils\venv\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\kranz-michael\projects\healdata-utils\src\healdata_utils\cli.py", line 140, in main
    to_json(filepath,outputdir,{'title':title,'description':descrip