diff --git a/doc/sphinx-guides/source/api/dataaccess.rst b/doc/sphinx-guides/source/api/dataaccess.rst index 42e27ac981c..089fcc1f061 100755 --- a/doc/sphinx-guides/source/api/dataaccess.rst +++ b/doc/sphinx-guides/source/api/dataaccess.rst @@ -27,9 +27,11 @@ Value Description ============== =========== original "Saved Original", the proprietary (SPSS, Stata, R, etc.) file from which the tabular data was ingested; RData Tabular data as an R Data frame (generated; unless the "original" file was in R); -prep "Pre-processed data", in JSON. (TODO: *get a proper description of the feature from James/Vito*) +prep "Pre-processed data", in JSON. ============== =========== +--------------------------- + ``imageThumb`` the following parameter values are supported (for image and pdf files only): @@ -41,6 +43,8 @@ true Generates a thumbnail image, by rescaling to the default thumbna ``N`` Rescales the image to ``N`` pixels. ============== =========== +--------------------------- + ``vars`` For column-wise subsetting (available for tabular data files only). diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index 42f70c1bd8d..c92737f45fb 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -244,7 +244,7 @@ For example, after making your edits, your JSON file might look like :download:` curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/versions/:draft?persistentId=$PID --upload-file dataset-update-metadata.json -Note that in example JSON file above, there is a single JSON object with ``metadataBlocks`` as a key. When you download a representation of your dataset in JSON format, the ``metadataBlocks`` object you need is nested inside another object called ``json``. 
To extract just the ``metadataBlocks`` key when downloading a JSON representation, you can use a tool such as ``jq`` like this:: +Note that in the example JSON file above, there is a single JSON object with ``metadataBlocks`` as a key. When you download a representation of your dataset in JSON format, the ``metadataBlocks`` object you need is nested inside another object called ``json``. To extract just the ``metadataBlocks`` key when downloading a JSON representation, you can use a tool such as ``jq`` like this:: curl -H "X-Dataverse-key: $API_TOKEN" $SERVER_URL/api/datasets/:persistentId/versions/:latest?persistentId=$PID | jq '.data | {metadataBlocks: .metadataBlocks}' > dataset-update-metadata.json @@ -332,7 +332,7 @@ A more detailed "add" example using curl:: Example python code to add a file. This may be run by changing these parameters in the sample code: -* ``dataverse_server`` - e.g. https://dataverse.harvard.edu +* ``dataverse_server`` - e.g. https://demo.dataverse.org * ``api_key`` - See the top of this document for a description * ``persistentId`` - Example: ``doi:10.5072/FK2/6XACVA`` * ``dataset_id`` - Database id of the dataset @@ -466,11 +466,11 @@ Replace an existing file where ``id`` is the database id of the file to replace. A more detailed "replace" example using curl (note that ``forceReplace`` is for replacing one file type with another):: - curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@data.tsv' -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' "https://example.dataverse.edu/api/files/$FILE_ID/replace" + curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@data.tsv' -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' "https://demo.dataverse.org/api/files/$FILE_ID/replace" Example python code to replace a file. This may be run by changing these parameters in the sample code: -* ``dataverse_server`` - e.g. 
https://dataverse.harvard.edu +* ``dataverse_server`` - e.g. https://demo.dataverse.org * ``api_key`` - See the top of this document for a description * ``file_id`` - Database id of the file to replace (returned in the GET API for a Dataset) diff --git a/doc/sphinx-guides/source/api/sword.rst b/doc/sphinx-guides/source/api/sword.rst index d894b27c35c..fc33bfca068 100755 --- a/doc/sphinx-guides/source/api/sword.rst +++ b/doc/sphinx-guides/source/api/sword.rst @@ -45,13 +45,13 @@ Differences in Dataverse 4 from DVN 3.x lead to a few minor backward incompatibl New features as of v1.1 ----------------------- -- Dataverse 4 supports API tokens and they must be used rather that a username and password. In the ``curl`` examples below, you will see ``curl -u $API_TOKEN:`` showing that you should send your API token as the username and nothing as the password. For example, ``curl -u 54b143b5-d001-4254-afc0-a1c0f6a5b5a7:``. +- Dataverse 4 supports API tokens and requires them to be used for APIs instead of a username and password. In the ``curl`` examples below, you will see ``curl -u $API_TOKEN:`` showing that you should send your API token as the username and nothing as the password. For example, ``curl -u 54b143b5-d001-4254-afc0-a1c0f6a5b5a7:``. -- SWORD operations no longer require "admin" permission. In order to use any SWORD operation in DVN 3.x, you had to be "admin" on a dataverse (the container for your dataset) and similar rules were applied in Dataverse 4.4 and earlier (the ``EditDataverse`` permission was required). The SWORD API has now been fully integrated with the Dataverse 4 permission model such that any action you have permission to perform in the GUI or "native" API you are able to perform via SWORD. This means that even a user with a "Contributor" role can operate on datasets via SWORD. Note that users with the "Contributor" role do not have the ``PublishDataset`` permission and will not be able publish their datasets via any mechanism, GUI or API. 
+- SWORD operations no longer require "admin" permission. In order to use any SWORD operation in DVN 3.x, you had to be an "admin" on a dataverse (the container for your dataset) and similar rules were applied in Dataverse 4.4 and earlier (the ``EditDataverse`` permission was required). The SWORD API has now been fully integrated with the Dataverse 4 permission model such that any action you have permission to perform in the GUI or "native" API you are able to perform via SWORD. This means that even a user with a "Contributor" role can operate on datasets via SWORD. Note that users with the "Contributor" role do not have the ``PublishDataset`` permission and will not be able to publish their datasets via any mechanism, GUI or API. - Dataverses can be published via SWORD. -- Datasets versions will only be increased to the next minor version (i.e. 1.1) rather than a major version (2.0) if possible. This depends on the nature of the change. Adding or removing, a file, for example, requires a major version bump. +- Dataset versions will only be increased to the next minor version (i.e. 1.1) rather than a major version (2.0) if possible. This depends on the nature of the change. Adding or removing a file, for example, requires a major version bump. - "Author Affiliation" can now be populated with an XML attribute. For example: Stumptown, Jane @@ -59,7 +59,7 @@ New features as of v1.1 - "License" can now be set with dcterms:license and the possible values are "CC0" and "NONE". "License" interacts with "Terms of Use" (dcterms:rights) in that if you include dcterms:rights in the XML, the license will be set to "NONE". If you don't include dcterms:rights, the license will default to "CC0". It is invalid to specify "CC0" as a license and also include dcterms:rights; an error will be returned. For backwards compatibility, dcterms:rights is allowed to be blank (i.e. ) but blank values will not be persisted to the database and the license will be set to "NONE".
-- "Contact E-mail" is automatically populated from dataset owners email. +- "Contact E-mail" is automatically populated from the dataset owner's email. - "Subject" uses our controlled vocabulary list of subjects. This list is in the Citation Metadata of our User Guide > `Metadata References `_. Otherwise, if a term does not match our controlled vocabulary list, it will put any subject terms in "Keyword". If Subject is empty it is automatically populated with "N/A". diff --git a/doc/sphinx-guides/source/developers/index.rst b/doc/sphinx-guides/source/developers/index.rst index 11669efc369..2cc7e72c6d2 100755 --- a/doc/sphinx-guides/source/developers/index.rst +++ b/doc/sphinx-guides/source/developers/index.rst @@ -12,6 +12,7 @@ Developer Guide intro dev-environment + windows tips troubleshooting version-control diff --git a/doc/sphinx-guides/source/developers/tips.rst b/doc/sphinx-guides/source/developers/tips.rst index f7cc3b096e4..1ac40870374 100755 --- a/doc/sphinx-guides/source/developers/tips.rst +++ b/doc/sphinx-guides/source/developers/tips.rst @@ -76,6 +76,27 @@ Netbeans Connector Chrome Extension For faster iteration while working on JSF pages, it is highly recommended that you install the Netbeans Connector Chrome Extension listed in the :doc:`tools` section. When you save XHTML or CSS files, you will see the changes immediately. Hipsters call this "hot reloading". :) +Database Schema Exploration +--------------------------- + +With over 100 tables, the Dataverse PostgreSQL database ("dvndb") can be somewhat daunting for newcomers. Here are some tips for coming up to speed. + +pgAdmin3 +~~~~~~~~ + +Back in the :doc:`dev-environment` section, we had you install pgAdmin3, which can help you explore the tables and execute SQL commands. It's also listed in the :doc:`tools` section. + +SchemaSpy +~~~~~~~~~ + +SchemaSpy is a tool that creates a website of entity-relationship diagrams based on your database.
+ +As part of our build process for running integration tests against the latest code in the "develop" branch, we drop the database on the "phoenix" server, recreate the database by deploying the latest war file, and run SchemaSpy to create the following site: http://phoenix.dataverse.org/schemaspy/latest/relationships.html + +To run SchemaSpy on your laptop, download it and take a look at the syntax in ``scripts/deploy/phoenix.dataverse.org/post``. + +To read more about the phoenix server, see the :doc:`testing` section. + Deploying With ``asadmin`` -------------------------- diff --git a/doc/sphinx-guides/source/developers/tools.rst b/doc/sphinx-guides/source/developers/tools.rst index 030d8fcc904..bd51bb7eccc 100755 --- a/doc/sphinx-guides/source/developers/tools.rst +++ b/doc/sphinx-guides/source/developers/tools.rst @@ -13,6 +13,11 @@ Netbeans Connector Chrome Extension The `Netbeans Connector `_ extension for Chrome allows you to see changes you've made to HTML pages the moment you save the file without having to refresh your browser. See also http://wiki.netbeans.org/ChromeExtensionInstallation +pgAdmin3 +++++++++ + +You probably installed pgAdmin3 when following the steps in the :doc:`dev-environment` section, but if not, you can download it from https://www.pgadmin.org + Maven +++++ diff --git a/doc/sphinx-guides/source/developers/version-control.rst b/doc/sphinx-guides/source/developers/version-control.rst index d8158fb753c..7e0b11ecb2b 100644 --- a/doc/sphinx-guides/source/developers/version-control.rst +++ b/doc/sphinx-guides/source/developers/version-control.rst @@ -67,7 +67,7 @@ If you tell us your GitHub username we are happy to add you to the "read only" t Create a New Branch off the develop Branch ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Always create your feature branch from the latest code in develop, pulling the latest code if necessary.
As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing, and ends with a short, descriptive name. +Always create your feature branch from the latest code in develop, pulling the latest code if necessary. As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing, and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") that have special meaning in Unix shells. Commit Your Change to Your New Branch ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/sphinx-guides/source/developers/windows.rst b/doc/sphinx-guides/source/developers/windows.rst index cd8cb2ca636..9d308c91a2d 100755 --- a/doc/sphinx-guides/source/developers/windows.rst +++ b/doc/sphinx-guides/source/developers/windows.rst @@ -1,6 +1,6 @@ -======= -Windows -======= +=================== +Windows Development +=================== Development on Windows is not well supported, unfortunately. You will have a much easier time if you develop on Mac or Linux as described under :doc:`dev-environment` section. diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index ee391a2e85e..938039df9d1 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -68,7 +68,7 @@ Network Ports Remember how under "Decisions to Make" in the :doc:`prep` section we mentioned you'll need to make a decision about whether or not to introduce a proxy in front of Dataverse such as Apache or nginx? The time has come to make that decision.
-The need to redirect port HTTP (port 80) to HTTPS (port 443) for security has already been mentioned above and the fact that Glassfish puts these services on 8080 and 8181, respectively, was touched on in the :doc:`installation-main` section. In production, you don't want to tell your users to use Dataverse on ports 8080 and 8181. You should have them use the stardard HTTPS port, which is 443. +The need to redirect HTTP (port 80) to HTTPS (port 443) for security has already been mentioned above and the fact that Glassfish puts these services on 8080 and 8181, respectively, was touched on in the :doc:`installation-main` section. In production, you don't want to tell your users to use Dataverse on ports 8080 and 8181. You should have them use the standard HTTPS port, which is 443. Your decision to proxy or not should primarily be driven by which features of Dataverse you'd like to use. If you'd like to use Shibboleth, the decision is easy because proxying or "fronting" Glassfish with Apache is required. The details are covered in the :doc:`shibboleth` section. @@ -215,7 +215,7 @@ In order to **enable file access restrictions**, you must enable Swift to use te to your swift.properties file. -You also have the option to set a custom expiration length for a generated temporary URL. It is initalized to 60 seconds, but you can change it by running the create command: +You also have the option to set a custom expiration length for a generated temporary URL.
It is initialized to 60 seconds, but you can change it by running the create command: ``./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.temp_url_expire=3600"`` @@ -254,7 +254,7 @@ You can configure this redirect properly in your cloud environment to generate a Amazon S3 Storage +++++++++++++++++ -For institutions and organizations looking to use Amazon's S3 cloud storage for their installation, this can be set up manually through creation of the credentials and config files or automatically via the aws console commands. +For institutions and organizations looking to use Amazon's S3 cloud storage for their installation, this can be set up manually through creation of the credentials and config files or automatically via the AWS console commands. You'll need an AWS account with an associated S3 bucket for your installation to use. From the S3 management console (e.g. ``_), you can poke around and get familiar with your bucket. We recommend using IAM (Identity and Access Management) to create a user with full S3 access and nothing more, for security reasons. See ``_ for more info on this process. @@ -263,7 +263,7 @@ Make note of the bucket's name and the region its data is hosted in. Dataverse a Set Up credentials File Manually ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -To create the ``credentials`` file manually, you will need to generate a key/secret key. The first step is to log onto your aws web console (e.g. ``_). If you have created a user in AWS IAM, you can click on that user and generate the keys needed for Dataverse. +To create the ``credentials`` file manually, you will need to generate a key/secret key. The first step is to log onto your AWS web console (e.g. ``_). If you have created a user in AWS IAM, you can click on that user and generate the keys needed for Dataverse. Once you have acquired the keys, they need to be added to the ``credentials`` file. 
The format for credentials is as follows: @@ -271,7 +271,7 @@ Once you have acquired the keys, they need to be added to the ``credentials`` fi | ``aws_access_key_id = `` | ``aws_secret_access_key = `` -You must also specify the AWS region, in the ``config`` file, for example: +You must also specify the AWS region in the ``config`` file, for example: | ``[default]`` | ``region = us-east-1`` @@ -412,10 +412,10 @@ Out of the box, Dataverse attempts to block search engines from crawling your in Letting Search Engines Crawl Your Installation ++++++++++++++++++++++++++++++++++++++++++++++ -For a public production Dataverse installation, it is probably desired that search agents be able to index published pages (aka - pages that are visible to an unauthenticated user). +For a public production Dataverse installation, it is probably desired that search agents be able to index published pages (AKA - pages that are visible to an unauthenticated user). Polite crawlers usually respect the `Robots Exclusion Standard `_; we have provided an example of a production robots.txt :download:`here `). -You have a couple of options for putting an updated robots.txt file into production. If you are fronting Glassfish with Apache as recommended above, you can place robots.txt in the root of the directory specified in your ``VirtualHost`` and to your Apache config a ``ProxyPassMatch`` line like the one below to prevent Glassfish from serving the version of robots.txt that embedded in the Dataverse war file: +You have a couple of options for putting an updated robots.txt file into production. If you are fronting Glassfish with Apache as recommended above, you can place robots.txt in the root of the directory specified in your ``VirtualHost`` and add to your Apache config a ``ProxyPassMatch`` line like the one below to prevent Glassfish from serving the version of robots.txt that is embedded in the Dataverse war file: ..
code-block:: text @@ -429,7 +429,7 @@ If you are not fronting Glassfish with Apache you'll need to prevent Glassfish f Putting Your Dataverse Installation on the Map at dataverse.org +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -Congratulations! You've gone live! It's time to announce your new data respository to the world! You are also welcome to contact support@dataverse.org to have the Dataverse team add your installation to the map at http://dataverse.org . Thank you for installing Dataverse! +Congratulations! You've gone live! It's time to announce your new data repository to the world! You are also welcome to contact support@dataverse.org to have the Dataverse team add your installation to the map at http://dataverse.org . Thank you for installing Dataverse! Administration of Your Dataverse Installation +++++++++++++++++++++++++++++++++++++++++++++ @@ -454,7 +454,7 @@ It's also possible to change these values by stopping Glassfish, editing ``glass dataverse.fqdn ++++++++++++++ -If the Dataverse server has multiple DNS names, this option specifies the one to be used as the "official" host name. For example, you may want to have dataverse.foobar.edu, and not the less appealling server-123.socsci.foobar.edu to appear exclusively in all the registered global identifiers, Data Deposit API records, etc. +If the Dataverse server has multiple DNS names, this option specifies the one to be used as the "official" host name. For example, you may want to have dataverse.example.edu, and not the less appealing server-123.socsci.example.edu to appear exclusively in all the registered global identifiers, Data Deposit API records, etc. The password reset feature requires ``dataverse.fqdn`` to be configured. @@ -467,7 +467,7 @@ dataverse.siteUrl | and specify the protocol and port number you would prefer to be used to advertise the URL for your Dataverse. 
| For example, configured in domain.xml: -| ``-Ddataverse.fqdn=dataverse.foobar.edu`` +| ``-Ddataverse.fqdn=dataverse.example.edu`` | ``-Ddataverse.siteUrl=http://${dataverse.fqdn}:8080`` dataverse.files.directory @@ -593,6 +593,8 @@ dataverse.handlenet.admprivphrase +++++++++++++++++++++++++++++++++ This JVM setting is also part of **handles** configuration. The Handle.Net installer lets you choose whether to encrypt the admcredfile private key or not. If you do encrypt it, this is the pass phrase that it's encrypted with. +.. _dataverse.handlenet.index: + dataverse.handlenet.index +++++++++++++++++++++++++++++++++ If you want to use different index than the default 300 @@ -700,7 +702,7 @@ See :ref:`Branding Your Installation` above. By default the footer says "Copyright © [YYYY]" but you can add text after the year, as in the example below. -``curl -X PUT -d ", The President & Fellows of Harvard College" http://localhost:8080/api/admin/settings/:FooterCopyright`` +``curl -X PUT -d ", Your Institution" http://localhost:8080/api/admin/settings/:FooterCopyright`` .. _:DoiProvider: @@ -752,8 +754,8 @@ It is recommended that you keep this as a slash ("/"). ++++++++++++++++++++++++++ By default, Dataverse generates a random 6 character string to use as the identifier -for a Dataset. Set this to "``sequentialNumber``" to use sequential numeric values -instead. (the assumed default setting is "``randomString``"). +for a Dataset. Set this to ``sequentialNumber`` to use sequential numeric values +instead. (the assumed default setting is ``randomString``). In addition to this setting, a database sequence must be created in the database. We provide the script below (downloadable :download:`here `). 
You may need to make some changes to suit your system setup, see the comments for more information: @@ -807,7 +809,7 @@ Set ``:ExcludeEmailFromExport`` to prevent email addresses for dataset contacts :NavbarAboutUrl +++++++++++++++ -Set ``NavbarAboutUrl`` to a fully-qualified url which will be used for the "About" link in the navbar. +Set ``NavbarAboutUrl`` to a fully-qualified URL which will be used for the "About" link in the navbar. Note: The "About" link will not appear in the navbar until this option is set. @@ -830,7 +832,7 @@ Set ``:GuidesVersion`` to override the version number in the URL of guides. For :NavbarSupportUrl +++++++++++++++++ -Set ``:NavbarSupportUrl`` to a fully-qualified url which will be used for the "Support" link in the navbar. +Set ``:NavbarSupportUrl`` to a fully-qualified URL which will be used for the "Support" link in the navbar. Note that this will override the default behaviour for the "Support" menu option, which is to display the dataverse 'feedback' dialog. @@ -890,14 +892,14 @@ Threshold in bytes for limiting whether or not "ingest" it attempted for tabular (You can set this value to 0 to prevent files from being ingested at all.) -You can overide this global setting on a per-format basis for the following formats: +You can override this global setting on a per-format basis for the following formats: -- dta -- por -- sav +- DTA +- POR +- SAV - Rdata - CSV -- xlsx +- XLSX For example, if you want your installation of Dataverse to not attempt to ingest Rdata files larger that 1 MB, use this setting: @@ -994,7 +996,7 @@ The duration in minutes before "Confirm Email" URLs expire. The default is 1440 :DefaultAuthProvider ++++++++++++++++++++ -If you have enabled Shibboleth and/or one or more OAuth providers, you may wish to make one of these authentication providers the default when users visit the Log In page. 
If unset, this will default to ``builtin`` but thes valid options (depending if you've done the setup described in the :doc:`shibboleth` or :doc:`oauth2` sections) are: +If you have enabled Shibboleth and/or one or more OAuth providers, you may wish to make one of these authentication providers the default when users visit the Log In page. If unset, this will default to ``builtin`` but these valid options (depending if you've done the setup described in the :doc:`shibboleth` or :doc:`oauth2` sections) are: - ``builtin`` - ``shib`` @@ -1032,8 +1034,8 @@ or :PiwikAnalyticsTrackerFileName ++++++++++++++++++++++++++++++ -Filename for the 'php' and 'js' tracker files used in the piwik code (piwik.php and piwik.js). -Sometimes these files are renamed in order to prevent ad-blockers (in the browser) to block the piwik tracking code. +Filename for the 'php' and 'js' tracker files used in the Piwik code (piwik.php and piwik.js). +Sometimes these files are renamed in order to prevent ad-blockers (in the browser) from blocking the Piwik tracking code. This sets the base name (without dot and extension), if not set it defaults to 'piwik'. ``curl -X PUT -d domainstats http://localhost:8080/api/admin/settings/:PiwikAnalyticsTrackerFileName`` @@ -1161,7 +1163,7 @@ It is recommended that you configure additional error handling for your Service - *In your Service Provider 2.x shibboleth2.xml file, add redirectErrors="#THIS PAGE#" to the Errors element.* -You can set the value of "#THIS PAGE#" to the url of your Dataverse homepage, or any other page on your site that is accessible to anonymous users and will have the isPassive.js file loaded. +You can set the value of "#THIS PAGE#" to the URL of your Dataverse homepage, or any other page on your site that is accessible to anonymous users and will have the isPassive.js file loaded.
``curl -X PUT -d true http://localhost:8080/api/admin/settings/:ShibPassiveLoginEnabled`` diff --git a/doc/sphinx-guides/source/installation/prep.rst b/doc/sphinx-guides/source/installation/prep.rst index 6fb5157894d..b3958a82908 100644 --- a/doc/sphinx-guides/source/installation/prep.rst +++ b/doc/sphinx-guides/source/installation/prep.rst @@ -90,7 +90,9 @@ Hardware Requirements A basic installation of Dataverse runs fine on modest hardware. For example, as of this writing the test installation at http://phoenix.dataverse.org is backed by a single virtual machine with two 2.8 GHz processors, 8 GB of RAM and 50 GB of disk. -In contrast, the production installation at https://dataverse.harvard.edu is currently backed by six servers with two Intel Xeon 2.53 Ghz CPUs and either 48 or 64 GB of RAM. The three servers with 48 GB of RAM run are web frontends running Glassfish and Apache and are load balanced by a hardware device. The remaining three servers with 64 GB of RAM are the primary and backup database servers and a server dedicated to running Rserve. Multiple TB of storage are mounted from a SAN via NFS. The :ref:`advanced` section shows a diagram (a seventh server to host Geoconnect will probably be added). +In contrast, before we moved it to the Amazon Cloud, the production installation at https://dataverse.harvard.edu was backed by six servers with two Intel Xeon 2.53 GHz CPUs and either 48 or 64 GB of RAM. The three servers with 48 GB of RAM were web frontends running Glassfish and Apache and were load balanced by a hardware device. The remaining three servers with 64 GB of RAM were the primary and backup database servers and a server dedicated to running Rserve. Multiple TB of storage were mounted from a SAN via NFS.
+ +Currently, Harvard Dataverse is served by four AWS server nodes: two "m4.4xlarge" instances (64GB/16 vCPU) as web frontends, one 32GB/8 vCPU ("m4.2xlarge") instance for the Solr search engine, and one 16GB/4 vCPU ("m4.xlarge") instance for R and TwoRavens. The PostgreSQL database is served by Amazon RDS, and physical files are stored on Amazon S3. The Dataverse installation script will attempt to give Glassfish the right amount of RAM based on your system. diff --git a/doc/sphinx-guides/source/installation/r-rapache-tworavens.rst b/doc/sphinx-guides/source/installation/r-rapache-tworavens.rst index 2bdf8f933aa..b46082a4cb7 100644 --- a/doc/sphinx-guides/source/installation/r-rapache-tworavens.rst +++ b/doc/sphinx-guides/source/installation/r-rapache-tworavens.rst @@ -400,7 +400,7 @@ Correct this, if necessary, then try again. If you ARE getting JSON output, but the TwoRavens view is still broken: -- Look closely at the very beginning of the JSON fragment. Does it have the ``{"prvate":false}`` entry, as shown in the example above? If not, this likely an R code version mismatch, described in section ``3.d.``, above. Correct the problem as described there, then try again. +- Look closely at the very beginning of the JSON fragment. Does it have the ``{"private":false}`` entry, as shown in the example above? If not, this is likely an R code version mismatch, described in section ``3.d.``, above. Correct the problem as described there, then try again. - If the JSON looks *exactly* as the fragment above, yet still no data pebbles - enable the JavaScript error console in the TwoRavens window, and try again. Look for any error messages; and, specifically, for any URLs that TwoRavens is failing to access. Look for the debugging entry that shows TwoRavens attempting to download the ``format=prep`` fragment. Does the URL have the correct host name, port and/or the protocol (http vs. https)? If not, re-run the installer, specifying the correct Dataverse URL, and try again.
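The TwoRavens troubleshooting hunk above tells you to check whether the ``format=prep`` JSON fragment begins with the ``{"private":false}`` entry. A minimal Python sketch of that check (the sample fragment below is a hypothetical stand-in, not real TwoRavens output):

```python
import json

# Hypothetical stand-in for a downloaded format=prep fragment;
# real TwoRavens pre-processed output contains many more keys.
fragment = '{"private":false,"dataset":{"variableCount":5}}'

parsed = json.loads(fragment)

# Per the troubleshooting steps above: a missing "private" entry suggests an
# R code version mismatch (section 3.d.); if it is present, look instead at
# the JavaScript console and the host/port/protocol of any failing URLs.
if parsed.get("private") is False:
    print("prep fragment looks OK")
else:
    print("suspect an R code version mismatch")
```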
diff --git a/doc/sphinx-guides/source/user/account.rst b/doc/sphinx-guides/source/user/account.rst index c3459a1f527..6ab868c3752 100755 --- a/doc/sphinx-guides/source/user/account.rst +++ b/doc/sphinx-guides/source/user/account.rst @@ -69,7 +69,7 @@ Too many passwords? You can set up your Dataverse account to use log in credenti Institutional Log In ~~~~~~~~~~~~~~~~~~~~ -Institutional log in allows you to use your log in information for your university (e.g. HarvardKey at Harvard) to log into your Dataverse account. +Institutional log in allows you to use your log in information for your university (e.g. HarvardKey at Harvard) to log in to your Dataverse account. Create a Dataverse account using Institutional Log In ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/doc/sphinx-guides/source/user/dataset-management.rst b/doc/sphinx-guides/source/user/dataset-management.rst index a24106e3c59..cf543da51ee 100755 --- a/doc/sphinx-guides/source/user/dataset-management.rst +++ b/doc/sphinx-guides/source/user/dataset-management.rst @@ -60,9 +60,7 @@ The file types listed in the following sections are supported by additional func Tabular Data Files ------------------ -Files in certain formats - Stata, SPSS, R, Excel(xlsx) and CSV - may be ingested as tabular data (see "Tabular Data Ingest" section for details). Tabular data files can be further explored and manipulated with `TwoRavens <../user/data-exploration/tworavens.html>`_ - a statistical data exploration application integrated with Dataverse, as well as other :doc:`/installation/external-tools` if they have been enabled in the installation of Dataverse you are using. TwoRavens allows the user to run statistical models, view summary statistics, download subsets of variable vectors and more. To start, click on the "Explore" button, found next to each relevant tabular file (the application will be opened in a new window). 
To download subsets of variables click on the "Download" button found next to a relevant tabular file and select "Data Subset" in the dropdown menu. You will then be able to create your subset using the interface opened in a new window (this functionality is also provided by the `TwoRavens <../user/data-exploration/tworavens.html>`_ project). See the `TwoRavens documentation section <../user/data-exploration/tworavens.html>`_ for more information. - -For example, for the ingest functionality for tabular files in Harvard Dataverse, a file can only be up to 2GB in size. To use the ingest functionality for RData files, a file can only be up to 1MB in size. However, to upload a RData file without using ingest, a file can be up to 2GB in size. +Files in certain formats - Stata, SPSS, R, Excel(xlsx) and CSV - may be ingested as tabular data (see "Tabular Data Ingest" section of the User Guide for details). Tabular data files can be further explored and manipulated with `TwoRavens <../user/data-exploration/tworavens.html>`_ - a statistical data exploration application integrated with Dataverse, as well as other :doc:`/installation/external-tools` if they have been enabled in the installation of Dataverse you are using. TwoRavens allows the user to run statistical models, view summary statistics, download subsets of variable vectors and more. To start, click on the "Explore" button, found next to each relevant tabular file (the application will be opened in a new window). To download subsets of variables click on the "Download" button found next to a relevant tabular file and select "Data Subset" in the dropdown menu. You will then be able to create your subset using the interface opened in a new window (this functionality is also provided by the `TwoRavens <../user/data-exploration/tworavens.html>`_ project). See the `TwoRavens documentation section <../user/data-exploration/tworavens.html>`_ for more information. 
Additional download options available for tabular data (found in the same drop-down menu under the "Download" button): @@ -198,11 +196,11 @@ File tags are comprised of custom, category (i.e. Documentation, Data, Code) and Replace Files ============= -In cases where you would like to revise an existing file rather than add a new one, you can do so using our Replace File feature. This will allow you to track the history of this file across versions of your dataset, both before and after replacing it. This could be useful for updating your data or fixing mistakes in your data. +In cases where you would like to revise an existing file rather than add a new one, you can do so using our Replace File feature. This will allow you to track the history of this file across versions of your dataset, both before and after replacing it. This could be useful for updating your data or fixing mistakes in your data. Because replacing a file creates an explicit link between the previous dataset version and the current version, the file replace feature is not available for unpublished dataset drafts. Also note that replacing a file will not automatically carry over that file's metadata, but once the file is replaced, its original metadata can still be found by referencing the previous version of the file under the "Versions" tab of the file page. -Go to the file page for the file you would like to replace, click on the "Edit" button, and from the dropdown list select "Replace". This will bring you to the Replace File page, where you can see the metadata for the most recently published version of the file and you can upload your replacement file. Once you have uploaded the replacement file, you can edit its name, description, and tags. When you're finished, click the "Save Changes" button. +To replace a file, go to the file page for that file, click on the "Edit" button, and from the dropdown list select "Replace".
This will bring you to the Replace File page, where you can see the metadata for the most recently published version of the file and you can upload your replacement file. Once you have uploaded the replacement file, you can edit its name, description, and tags. When you're finished, click the "Save Changes" button. -After successfully replacing a file, a new dataset draft version will be created. A summary of your actions will be recorded in the dataset Version Details table in the "Versions" tab on that dataset's page, and in the file Version Details table on the file's page. Both tables allow you to access all previous versions of the file across all previous versions of your dataset, including the old version of the file before you replaced it. +After successfully replacing a file, a new dataset draft version will be created. A summary of your actions will be recorded in the "Versions" tab on both the dataset page and file page. The Versions tab allows you to access all previous versions of the file across all previous versions of your dataset, including the old version of the file before you replaced it. .. _license-terms: @@ -388,7 +386,7 @@ You must also include a reason as to why this dataset was deaccessioned. Select Add more information as to why this was deaccessioned in the free-text box. If the dataset has moved to a different repository or site you are encouraged to include a URL (preferably persistent) for users to continue to be able to access this dataset in the future. -If you deaccession the most recently published version of the dataset but not all versions of the dataset, you are able to go in and create a new draft for the dataset. For example, you have a version 1 and version 2 of a dataset, both published, and deaccession version 2. You are then able to edit version 1 of the dataset and a new draft version will be created.
+If you deaccession the most recently published version of the dataset but not all versions of the dataset, you may then revisit an earlier version and create a new non-deaccessioned draft for the dataset. For example, imagine you have a version 1 and version 2 of a dataset, both published, and you deaccession version 2. You may then edit version 1 of the dataset and a new draft version will be created. **Important Note**: A tombstone landing page with the basic citation metadata will always be accessible to the public if they use the persistent URL (Handle or DOI) provided in the citation for that dataset. Users will not be able to see any of the files or additional metadata that were previously available prior to deaccession. diff --git a/doc/sphinx-guides/source/user/dataverse-management.rst b/doc/sphinx-guides/source/user/dataverse-management.rst index 779590faf01..b78e2a3f198 100755 --- a/doc/sphinx-guides/source/user/dataverse-management.rst +++ b/doc/sphinx-guides/source/user/dataverse-management.rst @@ -126,7 +126,7 @@ Both of these settings can be changed at any time. Assigning Roles to Users and Groups ------------------------------------------ -Under the Users/Groups tab, you can add, edit, or remove the roles granted to users and groups on your dataverse. A role is a set of permissions granted to a user or group when they're using your dataverse. For example, giving your research assistant the "Contributor" role would give him the following self-explanatory permissions on your dataverse and all datasets within your dataverse: "ViewUnpublishedDataset", "DownloadFile", "EditDataset", and "DeleteDatasetDraft". He would, however, lack the "PublishDataset" permission, and thus would be unable to publish datasets on your dataverse. If you wanted to give him that permission, you would give him a role with that permission, like the Curator role. Users and groups can hold multiple roles at the same time if needed. Roles can be removed at any time. 
All roles and their associated permissions are listed under the "Roles" tab of the same page. +Under the Users/Groups tab, you can add, edit, or remove the roles granted to users and groups on your dataverse. A role is a set of permissions granted to a user or group when they're using your dataverse. For example, giving your research assistant the "Contributor" role would give her the following self-explanatory permissions on your dataverse and all datasets within your dataverse: "ViewUnpublishedDataset", "DownloadFile", "EditDataset", and "DeleteDatasetDraft". She would, however, lack the "PublishDataset" permission, and thus would be unable to publish datasets on your dataverse. If you wanted to give her that permission, you would give her a role with that permission, like the Curator role. Users and groups can hold multiple roles at the same time if needed. Roles can be removed at any time. All roles and their associated permissions are listed under the "Roles" tab of the same page. |image5| diff --git a/doc/sphinx-guides/source/user/find-use-data.rst b/doc/sphinx-guides/source/user/find-use-data.rst index 05f1ce49c6f..1c7470404cf 100755 --- a/doc/sphinx-guides/source/user/find-use-data.rst +++ b/doc/sphinx-guides/source/user/find-use-data.rst @@ -10,7 +10,7 @@ Finding Data Without logging in to Dataverse, users can browse Dataverse, search for dataverses, datasets, and files, view dataset descriptions and files for published datasets, and subset, analyze, and visualize data for published (restricted & not restricted) data files. To view an unpublished dataverse, dataset, or file, a user will need to be given permission from that dataverse's administrator to access it. -A user can search the dataverses, datasets, and files within a particular dataverse by using the search bar found on a dataverse page. 
For example, if you are on the Murray Research Archive Dataverse page, you can search that specific dataverse's contents by using the search bar and/or facets displayed on the page. +A user can search within a specific dataverse for the dataverses, datasets, and files it contains by using the search bar and facets displayed on that dataverse's page. Basic Search ------------ diff --git a/doc/sphinx-guides/source/user/tabulardataingest/ingestprocess.rst b/doc/sphinx-guides/source/user/tabulardataingest/ingestprocess.rst index 4f71dd9944f..cc7df319f67 100644 --- a/doc/sphinx-guides/source/user/tabulardataingest/ingestprocess.rst +++ b/doc/sphinx-guides/source/user/tabulardataingest/ingestprocess.rst @@ -20,7 +20,7 @@ files. Some companies publish the specifications of their formats (Thank you Stata - much appreciated!), some don't (SPSS - yes, we are still frowning at you here at the Dataverse Project). Either way, reading these specially-formatted files requires some extra knowledge -or special software. For these reasons they are not considered idea +or special software. For these reasons they are not considered ideal for the purposes of archival preservation. Dataverse stores the raw data content extracted from such files in plain text, TAB-delimited files. The metadata information that describes this content is stored diff --git a/doc/sphinx-guides/source/user/tabulardataingest/spss.rst b/doc/sphinx-guides/source/user/tabulardataingest/spss.rst index 71bc61265c3..eb0ed643fa2 100644 --- a/doc/sphinx-guides/source/user/tabulardataingest/spss.rst +++ b/doc/sphinx-guides/source/user/tabulardataingest/spss.rst @@ -31,7 +31,8 @@ Please contact us if you have any questions and/or strong feelings on this issue Support for Language Encodings in SPSS --------------------------------------- -Historically, there was no support for specifying a particular language/code page encoding for the data stored in an SPSS file. 
Meaning, text values in none-ASCII encodings, or non-Latin characters could be entered and stored, but there was no setting to unambiguously specify what language, or what character set it was. By default, Dataverse will try to interpret binary characters as UTF8. If that's not working - for example, if the descriptive labels and/or categorical values ingest as garbage - and if you know happen to know what encoding was used in the original file, you can now specify it in the Ingest Options. +Historically, there was no support for specifying a particular language/code page encoding for the data stored in an SPSS file. This means that text values in non-ASCII encodings or non-Latin characters could be entered and stored, but there was no setting to unambiguously specify which language or character set was used. By default, Dataverse will try to interpret binary characters as UTF-8. If that's not working - for example, if the descriptive labels and/or categorical values ingest as garbage - and if you happen to know what encoding was used in the original file, you can now specify it in the Ingest Options. + For example, if you know that the text in your SAV file is in Mandarin, and is encoded using the GB2312, specify it as follows: Upload your file, in the "Edit Files" tab of the Dataset page. Once the file is recognized as SPSS/save, and *before* you click Save, go into the "Advanced Ingest Options", and select "Simplified Chinese, GB2312" in the nested menu under "Language Encoding" -> "East Asian".
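If you are unsure which encoding the text in your source file uses, it can help to experiment locally before uploading. A minimal sketch using standard Unix tools, not part of Dataverse itself (the file name is hypothetical, and ``file -i`` prints the charset on Linux):

```shell
# Guess the encoding of a text export of the labels (file name is hypothetical):
file -i labels.txt

# Convert a GB2312-encoded file to UTF-8 to check that the text is readable:
iconv -f GB2312 -t UTF-8 labels.txt > labels-utf8.txt
```

If ``iconv`` reports conversion errors, the file is likely not in the encoding you guessed; try another candidate before selecting it in the Ingest Options.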