Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 27 additions & 1 deletion doc/sphinx-guides/source/admin/geoconnect-worldmap.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,33 @@ Geoconnect and WorldMap

.. contents:: :local:

One of the optional components listed under "Architecture and Components" in the :doc:`/installation/prep` section of the Installation Guide is `Geoconnect <https://github.com/IQSS/geoconnect>`_, piece of middleware that allows Dataverse users to create maps in `WorldMap <http://worldmap.harvard.edu>`_ based on geospatial data stored in Dataverse. For more details on the feature from the user perspective, see the :doc:`/user/data-exploration/worldmap` section of the User Guide.
One of the optional components listed under "Architecture and Components" in the :doc:`/installation/prep` section of the Installation Guide is `Geoconnect <https://github.com/IQSS/geoconnect>`_, a piece of middleware that allows Dataverse users to create maps in `WorldMap <http://worldmap.harvard.edu>`_ based on geospatial data stored in Dataverse. For more details on the feature from the user perspective, see the :doc:`/user/data-exploration/worldmap` section of the User Guide.

Update "mapitlink"
------------------

SQL commands to point a Dataverse installation at different Geoconnect servers:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Geoconnect README we say, "You should wait until the WorldMap rewrite is complete before considering installing Geoconnect." Should we explain on this page that people installing Dataverse are welcome to use the two servers below that are hosted by Harvard?



**Geoconnect Production** *geoconnect.datascience.iq.harvard.edu*

.. code-block:: sql

update worldmapauth_tokentype set mapitlink = 'https://geoconnect.datascience.iq.harvard.edu/shapefile/map-it', hostname='geoconnect.datascience.iq.harvard.edu' where name = 'GEOCONNECT';

**Heroku Test** *geoconnect-dev.herokuapp.com*

.. code-block:: sql

update worldmapauth_tokentype set mapitlink = 'https://geoconnect-dev.herokuapp.com/shapefile/map-it', hostname='geoconnect-dev.herokuapp.com' where name = 'GEOCONNECT';


**View Current Settings**

.. code-block:: sql

SELECT * from worldmapauth_tokentype;


Removing Dead Explore Links
---------------------------
Expand Down
39 changes: 25 additions & 14 deletions doc/sphinx-guides/source/developers/dev-environment.rst
Original file line number Diff line number Diff line change
Expand Up @@ -74,11 +74,11 @@ Additional Tools

Please see also the :doc:`/developers/tools` page, which lists additional tools that very useful but not essential.

Setting up your dev environment
Setting Up Your Dev Environment
-------------------------------

SSH keys
~~~~~~~~
Set Up SSH Keys
~~~~~~~~~~~~~~~

You can use git with passwords over HTTPS, but it's much nicer to set up SSH keys. https://github.com/settings/ssh is the place to manage the ssh keys GitHub knows about for you. That page also links to a nice howto: https://help.github.com/articles/generating-ssh-keys

Expand Down Expand Up @@ -135,7 +135,7 @@ Once Solr is up and running you should be able to see a "Solr Admin" dashboard a

Once some dataverses, datasets, and files have been created and indexed, you can experiment with searches directly from Solr at http://localhost:8983/solr/#/collection1/query and look at the JSON output of searches, such as this wildcard search: http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true . You can also get JSON output of static fields Solr knows about: http://localhost:8983/solr/schema/fields

Run installer
Run Installer
~~~~~~~~~~~~~

Once you install Glassfish and PostgreSQL, you need to configure the environment for the Dataverse app - configure the database connection, set some options, etc. We have a new installer script that should do it all for you. Again, assuming that the clone on the Dataverse repository was retrieved using NetBeans and that it is saved in the path ~/NetBeansProjects:
Expand All @@ -150,10 +150,26 @@ The script is a variation of the old installer from DVN 3.x that calls another s

All the future changes to the configuration that are Glassfish-specific and can be done through ``asadmin`` should now go into ``scripts/install/glassfish-setup.sh``.

Rebuilding Your Dev Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have an old copy of the database and old Solr data and want to start fresh, here are the recommended steps:

- drop your old database
- clear out your existing Solr index: ``scripts/search/clear``
- run the installer script above - it will create the db, deploy the app, populate the db with reference data and run all the scripts that create the domain metadata fields. You no longer need to perform these steps separately.
- confirm you are using the latest Dataverse-specific Solr schema.xml per the "Installing and Running Solr" section of this guide
- confirm http://localhost:8080 is up
- If you want to set some dataset-specific facets, go to the root dataverse (or any dataverse; the selections can be inherited) and click "General Information" and make choices under "Select Facets". There is a ticket to automate this: https://github.com/IQSS/dataverse/issues/619

You may also find https://github.com/IQSS/dataverse/blob/develop/scripts/deploy/phoenix.dataverse.org/deploy and related scripts interesting because they demonstrate how we have at least partially automated the process of tearing down a Dataverse installation and having it rise again, hence the name "phoenix." See also "Fresh Reinstall" in the :doc:`/installation/installation-main` section of the Installation Guide.

Shibboleth and OAuth
--------------------

If you are working on anything related to users, please keep in mind that your changes will likely affect Shibboleth and OAuth users. Rather than setting up Shibboleth on your laptop, developers are advised to simply add a value to their database to enable Shibboleth "dev mode" like this:
If you are working on anything related to users, please keep in mind that your changes will likely affect Shibboleth and OAuth users. For some background on user accounts in Dataverse, see "Auth Modes: Local vs. Remote vs. Both" in the :doc:`/installation/config` section of the Installation Guide.

Rather than setting up Shibboleth on your laptop, developers are advised to simply add a value to their database to enable Shibboleth "dev mode" like this:

``curl http://localhost:8080/api/admin/settings/:DebugShibAccountType -X PUT -d RANDOM``

Expand All @@ -171,14 +187,9 @@ For a list of possible values, please "find usages" on the settings key above an

Now when you go to http://localhost:8080/oauth2/firstLogin.xhtml you should be prompted to create a Shibboleth account.

Rebuilding your dev environment
-------------------------------
Geoconnect
----------

If you have an old copy of the database and old Solr data and want to start fresh, here are the recommended steps:
Geoconnect works as a middle layer, allowing geospatial data files in Dataverse to be visualized with Harvard WorldMap. To set up a Geoconnect development environment, you can follow the steps outlined in the `local_setup.md <https://github.com/IQSS/geoconnect/blob/master/local_setup.md>`_ guide. You will need Python and a few other prerequisites.

- drop your old database
- clear out your existing Solr index: ``scripts/search/clear``
- run the installer script above - it will create the db, deploy the app, populate the db with reference data and run all the scripts that create the domain metadata fields. You no longer need to perform these steps separately.
- confirm you are using the latest Dataverse-specific Solr schema.xml per the "Installing and Running Solr" section of this guide
- confirm http://localhost:8080 is up
- If you want to set some dataset-specific facets, go to the root dataverse (or any dataverse; the selections can be inherited) and click "General Information" and make choices under "Select Facets". There is a ticket to automate this: https://github.com/IQSS/dataverse/issues/619
As mentioned under "Architecture and Components" in the :doc:`/installation/prep` section of the Installation Guide, Geoconnect is an optional component of Dataverse, so this section is only necessary to follow it you are working on an issue related to this feature.
174 changes: 174 additions & 0 deletions doc/sphinx-guides/source/developers/geospatial.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
===============
Geospatial Data
===============

How Dataverse Ingests Shapefiles
--------------------------------

A shapefile is a set of files, often uploaded/transferred in ``.zip`` format. This set may contain up to fifteen files. A minimum of three specific files (``.shp``, ``.shx``, ``.dbf``) are needed to be a valid shapefile and a fourth file (``.prj``) is required for WorldMap -- or any type of meaningful visualization.

For ingest and connecting to WorldMap, four files are the minimum required:

- ``.shp`` - shape format; the feature geometry itself
- ``.shx`` - shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
- ``.dbf`` - attribute format; columnar attributes for each shape, in dBase IV format
- ``.prj`` - projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format

Ingest
~~~~~~

When uploaded to Dataverse, the ``.zip`` is unpacked (same as all ``.zip`` files). Shapefile sets are recognized by the same base name and specific extensions. These individual files constitute a shapefile set. The first four are the minimum required (``.shp``, ``.shx``, ``.dbf``, ``.prj``)

For example:

- bicycles.shp (required extension)
- bicycles.shx (required extension)
- bicycles.prj (required extension)
- bicycles.dbf (required extension)
- bicycles.sbx (NOT required extension)
- bicycles.sbn (NOT required extension)

Upon recognition of the four required files, Dataverse will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set:

- Required: ``.shp``, ``.shx``, ``.dbf``, ``.prj``
- Optional: ``.sbn``, ``.sbx``, ``.fbn``, ``.fbx``, ``.ain``, ``.aih``, ``.ixs``, ``.mxs``, ``.atx``, ``.cpg``, ``shp.xml``

Then Dataverse creates a new ``.zip`` with mimetype as a shapefile. The shapefile set will persist as this new ``.zip``.

Example
~~~~~~~

**1a.** Original ``.zip`` contents:

A file named ``bikes_and_subways.zip`` is uploaded to the Dataverse. This ``.zip`` contains the following files.

- ``bicycles.shp`` (shapefile set #1)
- ``bicycles.shx`` (shapefile set #1)
- ``bicycles.prj`` (shapefile set #1)
- ``bicycles.dbf`` (shapefile set #1)
- ``bicycles.sbx`` (shapefile set #1)
- ``bicycles.sbn`` (shapefile set #1)
- ``bicycles.txt``
- ``the_bikes.md``
- ``readme.txt``
- ``subway_line.shp`` (shapefile set #2)
- ``subway_line.shx`` (shapefile set #2)
- ``subway_line.prj`` (shapefile set #2)
- ``subway_line.dbf`` (shapefile set #2)

**1b.** Dataverse unzips and re-zips files:

Upon ingest, Dataverse unpacks the file ``bikes_and_subways.zip``. Upon recognizing the shapefile sets, it groups those files together into new ``.zip`` files:

- files making up the "bicycles" shapefile become a new ``.zip``
- files making up the "subway_line" shapefile become a new ``.zip``
- remaining files will stay as they are

To ensure that a shapefile set remains intact, individual files such as ``bicycles.sbn`` are kept in the set -- even though they are not used for mapping.

**1c.** Dataverse final file listing:

- ``bicycles.zip`` (contains shapefile set #1: ``bicycles.shp``, ``bicycles.shx``, ``bicycles.prj``, ``bicycles.dbf``, ``bicycles.sbx``, ``bicycles.sbn``)
- ``bicycles.txt`` (separate, not part of a shapefile set)
- ``the_bikes.md`` (separate, not part of a shapefile set)
- ``readme.txt`` (separate, not part of a shapefile set)
- ``subway_line.zip`` (contains shapefile set #2: ``subway_line.shp``, ``subway_line.shx``, ``subway_line.prj``, ``subway_line.dbf``)

For two "final" shapefile sets, ``bicycles.zip`` and ``subway_line.zip``, a new mimetype is used:

- Mimetype: ``application/zipped-shapefile``
- Mimetype Label: "Shapefile as ZIP Archive"

WorldMap JoinTargets + API Endpoint
-----------------------------------

WorldMap supplies target layers -- or JoinTargets -- that a tabular file may be mapped against. A JSON description of these `CGA <http://gis.harvard.edu>`_-curated JoinTargets may be retrieved via API at ``http://worldmap.harvard.edu/datatables/api/jointargets/``. Please note: login is required. You may use any WorldMap account credentials via HTTP Basic Auth.

Example of JoinTarget information returned via the API:

.. code-block:: json

{
"data":[
{
"layer":"geonode:census_tracts_2010_boston_6f6",
"name":"Census Tracts, Boston (GEOID10: State+County+Tract)",
"geocode_type_slug":"us-census-tract",
"geocode_type":"US Census Tract",
"attribute":{
"attribute":"CT_ID_10",
"type":"xsd:string"
},
"abstract":"As of the 2010 census, Boston, MA contains 7,288 city blocks [truncated for example]",
"title":"Census Tracts 2010, Boston (BARI)",
"expected_format":{
"expected_zero_padded_length":-1,
"is_zero_padded":false,
"description":"Concatenation of state, county and tract for 2010 Census Tracts. Reference: https://www.census.gov/geo/maps-data/data/tract_rel_layout.html\r\n\r\nNote: Across the US, this can be a zero-padded \"string\" but the original Boston layer has this column as \"numeric\" ",
"name":"2010 Census Boston GEOID10 (State+County+Tract)"
},
"year":2010,
"id":28
},
{
"layer":"geonode:addresses_2014_boston_1wr",
"name":"Addresses, Boston",
"geocode_type_slug":"boston-administrative-geography",
"geocode_type":"Boston, Administrative Geography",
"attribute":{
"attribute":"LocationID",
"type":"xsd:int"
},
"abstract":"Unique addresses present in the parcels data set, which itself is derived from [truncated for example]",
"title":"Addresses 2015, Boston (BARI)",
"expected_format":{
"expected_zero_padded_length":-1,
"is_zero_padded":false,
"description":"Boston, Administrative Geography, Boston Address Location ID. Example: 1, 2, 3...nearly 120000",
"name":"Boston Address Location ID (integer)"
},
"year":2015,
"id":18
},
{
"layer":"geonode:bra_neighborhood_statistical_areas_2012__ug9",
"name":"BRA Neighborhood Statistical Areas, Boston",
"geocode_type_slug":"boston-administrative-geography",
"geocode_type":"Boston, Administrative Geography",
"attribute":{
"attribute":"BOSNA_R_ID",
"type":"xsd:double"
},
"abstract":"BRA Neighborhood Statistical Areas 2015, Boston. Provided by [truncated for example]",
"title":"BRA Neighborhood Statistical Areas 2015, Boston (BARI)",
"expected_format":{
"expected_zero_padded_length":-1,
"is_zero_padded":false,
"description":"Boston, Administrative Geography, Boston BRA Neighborhood Statistical Area ID (integer). Examples: 1, 2, 3, ... 68, 69",
"name":"Boston BRA Neighborhood Statistical Area ID (integer)"
},
"year":2015,
"id":17
}
],
"success":true
}

How Geoconnect Uses Join Target Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a user attempts to map a tabular file, the application looks in the Geoconnect database for ``JoinTargetInformation``. If this information is more than 10 minutes* old, the application will retrieve fresh information and save it to the db.

(* Change the timing via the Django settings variable ``JOIN_TARGET_UPDATE_TIME``.)

This JoinTarget info is used to populate HTML forms used to match a tabular file column to a JoinTarget column. Once a JoinTarget is chosen, the JoinTarget ID is an essential piece of information used to make an API call to the WorldMap and attempt to map the file.

Retrieving Join Target Information from WorldMap API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``get_join_targets()`` function in ``dataverse_layer_services.py`` uses the WorldMap API, retrieves a list of available tabular file JointTargets. (See the `dataverse_layer_services code in GitHub <https://github.com/IQSS/geoconnect/blob/master/gc_apps/worldmap_connect/dataverse_layer_services.py#L275>`_.)

Saving Join Target Information to Geoconnect Database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``get_latest_jointarget_information()`` in ``utils.py`` retrieves recent JoinTarget Information from the database. (See the `utils code in GitHub <https://github.com/IQSS/geoconnect/blob/master/gc_apps/worldmap_connect/utils.py#L16>`_.)
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,4 +20,5 @@ Contents:
making-releases
tools
unf/index
geospatial
selinux
14 changes: 14 additions & 0 deletions doc/sphinx-guides/source/installation/geoconnect.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
Geoconnect
==========

Geoconnect works as a middle layer, allowing geospatial data files in Dataverse to be visualized with Harvard WorldMap.

To understand the feature from the user perspective, see the :doc:`/user/data-exploration/worldmap` section of the User Guide.

As of this writing, the README at https://github.com/IQSS/geoconnect recommends not installing Geoconnect at this time due to an ongoing rewrite of the WorldMap code. If you are not deterred by this, read on!

To set up a Geoconnect development environment, you can follow the steps outlined in the `local_setup.md <https://github.com/IQSS/geoconnect/blob/master/local_setup.md>`_ guide. Although those instructions are for a local development setup, they may assist in installing Geoconnect in your production environment. See also "Geoconnect" under the :doc:`/developers/dev-environment` section of the Developer Guide.

Harvard Dataverse runs Geoconnect on Heroku. To make use of Heroku, you will need a Heroku account, as well as a few other prerequisites. Follow the instructions outlined in the `heroku_setup.md <https://github.com/IQSS/geoconnect/blob/master/heroku_setup.md>`_ guide. The `heroku.py <https://github.com/IQSS/geoconnect/blob/master/geoconnect/settings/heroku.py>`_ settings file may also be adapted for other environments. Please note, for the production environment, remember to set ``DEBUG=False``.

See also the :doc:`/admin/geoconnect-worldmap` section of the Admin Guide.
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/installation/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,5 +18,6 @@ Contents:
administration
upgrading
r-rapache-tworavens
geoconnect
shibboleth
oauth2
Original file line number Diff line number Diff line change
Expand Up @@ -151,4 +151,4 @@ Rerun Installer

With all the data cleared out, you should be ready to rerun the installer per above.

Related to all this is a series of scripts at https://github.com/IQSS/dataverse/blob/develop/scripts/deploy/phoenix.dataverse.org/deploy that Dataverse developers use have the test server http://phoenix.dataverse.org rise from the ashes before integration tests are run against it. Your mileage may vary. :)
Related to all this is a series of scripts at https://github.com/IQSS/dataverse/blob/develop/scripts/deploy/phoenix.dataverse.org/deploy that Dataverse developers use have the test server http://phoenix.dataverse.org rise from the ashes before integration tests are run against it. Your mileage may vary. :) For more on this topic, see "Rebuilding Your Dev Environment" in the :doc:`/developers/dev-environment` section of the Developer Guide.
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/installation/prep.rst
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ There are a number of optional components you may choose to install or configure
- Apache: a web server that can "reverse proxy" Glassfish applications and rewrite HTTP traffic.
- Shibboleth: an authentication system described in :doc:`shibboleth`. Its use with Dataverse requires Apache.
- OAuth2: an authentication system described in :doc:`oauth2`.
- Geoconnect: :doc:`/user/data-exploration/worldmap` describes the feature and the code can be downloaded from https://github.com/IQSS/geoconnect
- Geoconnect: a system that allows users to create maps from geospatial files, described in :doc:`geoconnect`.

System Requirements
-------------------
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading