Merge pull request #65 from atorin/master
Docs review (WIP)
ruphy committed Mar 28, 2018
2 parents a5b6209 + 22f63e7 commit 1a6aa60
Showing 52 changed files with 211 additions and 605 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
_build/
docs-italia-theme/

.DS_Store
.ropeproject/
8 changes: 4 additions & 4 deletions architecture/index.rst
@@ -14,11 +14,11 @@ The DAF Big Data platform is an environment offering capabilities for:
- *managing of access rights for each dataset*: the adopted security approach allows
the platform administrators to set the proper access rights for each dataset.

The DAF Big Data platform also enable *redistributing datasets, developing data application, publishing insights*
by mean of the above mentioned tools provided by the `Dataportal <../dataportal>`_: by these tools, data scientists and analysts can perform analysis on data, run statistical and machine learning models, and produce data
visualizazions and reports.
The DAF Big Data platform also enables *redistributing datasets, developing data applications, publishing insights*
by means of the above mentioned tools provided by the `Dataportal <../dataportal>`_: by these tools, data scientists and analysts can perform analysis on data, run statistical and machine learning models, and produce data
visualizations and reports.

For more information, continue your tour looking at:
For more information, continue your tour with the following sections.

.. toctree::
:maxdepth: 1
20 changes: 10 additions & 10 deletions index.rst
@@ -9,22 +9,22 @@ Data & Analytics Framework (DAF) - Developer Documentation
.. NOTE::

This documentation refers to the Alpha version of the DAF (released in October 2017); it is updated and improved daily.
For comments and enhancement requests about the documentation please open an issue on `Github <https://github.com/italia/daf-docs>`_.
For comments and enhancement requests about the documentation please open an issue on `GitHub <https://github.com/italia/daf-docs>`_.

The `Data & Analytics Framework <https://pianotriennale-ict.readthedocs.io/en/latest/doc/09_data-analytics-framework.html>`_ (DAF, in short) is an open source project
developed in the context of the activities planned by the
Italian `Three-Year Plan for ICT in Public Administration 2017 - 2019 <https://pianotriennale-ict.readthedocs.io/en/latest/>`_,
approved by the Italian Government in the 2017.
approved by the Italian Government in 2017.

The DAF project is an attempt to establish a central Chief Data Officer (CDO) for the Government and Public Administration. Its main goal is to promote data exchange among Italian Public Administrations (PAs), to support the diffusion of open data, and to enable data-driven policies. The framework is composed of three building blocks:

* **A Big Data Platform** to store in a unique repository the data of the PAs, implementing ingestion procedures to promote standardization and therefore interoperability among them. It exposes functionalities common to the Hadoop ecosystem, a set of (micro) services designed to improve data governance and a number of end-user tools that have been integrated with them.
* **A Team of Data Experts:** (Data Scientists and Data Engineers) able to manage and evolve the platform and to provide support to PA on their analytics and data management activities in a consultancy fashion.
* **A Regulatory Framework** that institutionalizes this activity at government level, and gives the proper mandate to the PA that will manage the DAF, in compliance with privacy policy.
* **A Big Data Platform**, to store in a unique repository the data of the PAs, implementing ingestion procedures to promote standardization and therefore interoperability among them. It exposes functionalities common to the Hadoop ecosystem, a set of (micro)services designed to improve data governance, and a number of end-user tools that have been integrated with them.
* **A Team of Data Experts** (Data Scientists and Data Engineers), able to manage and evolve the platform and to provide support to PAs on their analytics and data management activities in a consultancy fashion.
* **A Regulatory Framework**, which institutionalizes this activity at government level and gives the proper mandate to the PA that will manage the DAF, in compliance with privacy policy.

This documentation is focused on the **Big Data Platform**, and we'll refer to it as DAF for the sake of simplicity.

The Italian instance of the DAF is developed and maintanied by the `DAF Team <https://teamdigitale.governo.it/it/projects/daf.htm>`_ (part of the Digital Transformation Team of the Italian Government) composed by data scientists and data engineers, which uses and evolves the framework: to analyze data, to create machine learning models and to build data applications/visualization products.
The Italian instance of the DAF is developed and maintained by the `DAF Team <https://teamdigitale.governo.it/it/projects/daf.htm>`_ (part of the Digital Transformation Team of the Italian Government), composed of data scientists and data engineers, who use and evolve the framework to analyze data, create machine learning models, and build data applications/visualization products.

The DAF is designed to be easily re-usable in other countries and other application domains. It exposes the following data management and analytics functionalities:

@@ -39,10 +39,10 @@ The DAF is designed to be easily re-usable in other countries and other applicat
- a catalog of all datasets the user can access;
- an ingestion form to govern (insert, edit, delete) dataset information and set up ingestion procedures;
- data visualization and dashboard tools;
- data science notebook;
- a data science notebook;

- a **Hadoop Cluster** with typical applications to centralize and store, manipulate and standardize and re-distribute data and insights;
- a **Multy tenant** architecture, based on Kerberos and LDAP.
- a **Multi-tenant** architecture, based on Kerberos and LDAP.

The DAF is under development. This is a snapshot of the roadmap:

@@ -52,7 +52,8 @@ The DAF is under development. This is a snapshot of the roadmap:

All contributions are welcome!

Contents:
.. rubric:: Contents

.. toctree::
:maxdepth: 2
@@ -62,4 +63,3 @@ Contents:
Content Management <content/index>
Architecture <architecture/index>
Installation <installation/index>
[ITA] Manuale utente <manutente/index>
27 changes: 14 additions & 13 deletions installation/dataportal/dataportal-private.rst
@@ -3,15 +3,15 @@ Dataportal-private
******************

===========
What it is?
What is it?
===========

The Dataportal-private [put link] is the webapp to access functionalities of DAF, like:
The `Dataportal-private <https://dataportal-private.daf.teamdigitale.it>`__ is the web app that allows access to the functionalities of DAF, like:

* **Ingestion** Form to add dataset with metadata
* **Ingestion** form to add datasets with metadata
* **Business Intelligence** with Superset (AirBnB)
* **Graphs** with Metabase
* **Data Science** with Jupyter + Sparkmagic
* **Data science** with Jupyter + Sparkmagic
* Ontologies and Controlled Vocabularies repository


@@ -23,23 +23,24 @@ Before proceeding with the installation steps, you need to install and run the f

Basic Dependencies
------------------
* daf-dataportal-backend [put link to documentation]
* FreeIPA [put link to documentation]
* CatalogManager [put link to documentation]
* SecurityManager [put link to documentation]
* `daf-dataportal-backend <../local/devVM.html#dataportal>`__
* `FreeIPA <../docker/freeipa.html>`__
* `CatalogManager <../microsrv/core/catalog-manager.html>`__
* `SecurityManager <../microsrv/core/security-manager.html>`__


Features Enabling Dependencies
------------------------------
* Superset [put link to docker documentation]
* Metabase [put link to docker documentation]
* JupyterHub [put link to docker documentation]
* CKAN [put link to docker documentation]
* `Superset <../docker/superset.html>`__
* `Metabase <../docker/metabase.html>`__
* `JupyterHub <../docker/jupyter.html>`__
* `CKAN <../docker/ckan.html>`__



Installation Steps
------------------
First of all, you need to clone the following github repository:
First of all, you need to clone the following GitHub repository:

.. code-block:: bash
8 changes: 4 additions & 4 deletions installation/dataportal/dataportal-public.rst
@@ -3,10 +3,10 @@ Dataportal-public
*****************

===========
What it is?
What is it?
===========

The Dataportal-public [put link] is the webapp to access the open data catalog and other content that can be exposed publicly
The `Dataportal-public <https://dataportal.daf.teamdigitale.it/>`__ is the web app that allows access to the open data catalog and other content that can be exposed publicly.


=======
@@ -24,12 +24,12 @@ Features Enabling Dependencies
------------------------------
Connection with DataStories:

* daf-dataportal-backend [put link to documentation]
* `daf-dataportal-backend <../local/devVM.html#dataportal>`__


Installation Steps
------------------
First of all, clone the following github repository:
First of all, clone the following GitHub repository:

.. code-block:: bash
28 changes: 14 additions & 14 deletions installation/docker/ckan.rst
@@ -1,27 +1,27 @@

Ckan
CKAN
============================================================

CKAN is an open-source DMS (data management system) for powering data hubs and data portals.
CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and data.gov.uk among many other sites.
CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and data.gov.uk, among many other sites.

This guide will show you how to use Docker Compose to set up and run a `Ckan <https://ckan.org/>`_ instance which uses ldap credentials to authenticate users. In particular, you can use an openLDAP docker container or a freeIpa instance.
This guide will show you how to use Docker Compose to set up and run a `CKAN <https://ckan.org/>`_ instance which uses LDAP credentials to authenticate users. In particular, you can use an OpenLDAP Docker container or a FreeIPA instance.

Account Management Dependency
------------------------------

This configuration of Ckan needs an account management system to work with. We provide three different options, you will find more info on their respective sections:
This configuration of CKAN needs an account management system to work with. We provide three different options; you will find more info in their respective sections:

* Local LDAP Docker
* Local FreeIPA Docker (works only with linux)
* Local FreeIPA Docker (works only with Linux)
* Remote FreeIPA Server

CKAN Docker Compose
-----------------------

Now that we have a ldap server up we can run the Ckan docker compose. It will run an instance of Solr, Postgresql, Redis and Mongo.
Now that we have an LDAP server up, we can run the CKAN Docker Compose. It will run an instance of Solr, PostgreSQL, Redis and Mongo.

First we have to build some custom image:
First we have to build a custom image:


.. code-block:: bash
@@ -45,14 +45,14 @@ Then edit the file *ckan.ini*:
ckanext.ldap.email = mail
ckanext.ldap.ckan_fallback = True
where LDAP_IP is the ip of the LDAP docker. To know the LDAP ip run:
where LDAP_IP is the IP of the LDAP container. To find the LDAP IP, run:

.. code-block:: bash
> docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' ldap
172.22.0.2
We know that this is not the best approach to connect container among them (maybe it is the worst), we are using a deprecated compose file version (i.e. version 1 rather than using version 3), and we are using very heavy images. We will improve Ckan docker compose as soon as possible.
We know that this is not the best approach to connect containers (it may well be the worst): we are using a deprecated Compose file version (version 1 rather than version 3), and the images are very heavy. We will improve the CKAN Docker Compose setup as soon as possible.
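A sketch of the substitution step: the sample file name and the ``ckanext.ldap.uri`` key are illustrative, and the IP is hard-coded with the example value from above (with a running Docker daemon you would capture it via the ``docker inspect`` call instead).

```shell
# Illustrative sketch: replace the LDAP_IP placeholder in a config fragment.
# With a running Docker daemon you would set the variable via:
#   LDAP_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' ldap)
LDAP_IP=172.22.0.2

printf 'ckanext.ldap.uri = ldap://LDAP_IP:389\n' > ckan.ini.sample
sed "s/LDAP_IP/${LDAP_IP}/g" ckan.ini.sample
# -> ckanext.ldap.uri = ldap://172.22.0.2:389
```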

- If you are using our FreeIPA server
.. code-block:: bash
@@ -68,14 +68,14 @@ We know that this is not the best approach to connect container among them (mayb
ckanext.ldap.ckan_fallback = True
Now that ckan container is up, type *http://localhost:5000* on your browser and login as user *bob* (password *password*).
Now that the CKAN container is up, type *http://localhost:5000* in your browser and log in as user *bob* (password *password*).

.. image:: imgs/ckan_login.png
:scale: 50 %
:alt: alternate text
:align: right
:alt: CKAN login page
:align: center

.. image:: imgs/bob_page.png
:scale: 50 %
:alt: alternate text
:align: right
:alt: CKAN dashboard
:align: center
18 changes: 10 additions & 8 deletions installation/docker/freeipa.rst
@@ -6,7 +6,7 @@ Docker
-----------------
FreeIPA server can be run in a Docker container for testing or demo purposes. It makes it possible to run all the processes comprising the server in an isolated way, leaving the host free to run other software without clashing with the FreeIPA server.

This install is done on Ubuntu 16.04. FreeIPA is focused on Linux (and other standards compliant) systems. FreeIPA is focused on Linux (and other standards compliant) systems. Therefore, in our knowledge, you cannot run a container of a FreeIPA server on **Mac** or **Windows**. However, any help in this direction is very welcomed!!
This installation is done on Ubuntu 16.04. FreeIPA is focused on Linux (and other standards-compliant) systems. Therefore, to our knowledge, you cannot run a container of a FreeIPA server on **Mac OS** or **Windows**. However, any help in this direction is very welcome!


Follow these steps to run our FreeIPA server docker:
@@ -17,7 +17,7 @@ Follow these steps to run our FreeIPA server docker:
> mkdir /var/lib/ipa-data
2. Edit */etc/hosts* and ensure that the IPA server address is listed. This is required for Apache to work properly. You have to change IPA_SERVER_IP with the ipa server ip:
2. Edit */etc/hosts* and ensure that the IPA server address is listed. This is required for Apache to work properly. You have to change IPA_SERVER_IP with the IPA server IP:
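A sketch of this step; the IP and hostname below are hypothetical placeholders, and the edit is made on a scratch copy (on the real system you would append to */etc/hosts* with sudo):

```shell
# Hypothetical values; substitute your IPA server's real IP and hostname.
IPA_SERVER_IP=192.0.2.10
IPA_HOSTNAME=ipa.example.test

# Work on a scratch copy; on the real system:
#   echo "${IPA_SERVER_IP} ${IPA_HOSTNAME}" | sudo tee -a /etc/hosts
cp /etc/hosts hosts.sample
echo "${IPA_SERVER_IP} ${IPA_HOSTNAME}" >> hosts.sample
grep "${IPA_HOSTNAME}" hosts.sample
# -> 192.0.2.10 ipa.example.test
```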

.. code-block:: bash
@@ -49,26 +49,28 @@ Answer to the question:

Continue to configure the system with these values? [no]: --> type "y" and press "Enter"

Wait some time until freeipa server is completely configured and started.
The server is ready when on the shell appear the following message:
Wait until the FreeIPA server is completely configured and started.
The server is ready when the following message appears on the shell:

.. code-block:: bash
> FreeIPA server configured.
NOTE: Only first time that build image and run docker you need to ask to previous questions.
- You can connect to Freeip Server with web interface:
.. note::
You need to answer the previous questions only the first time you build the image and run docker.

- You can connect to the FreeIPA Server via the web interface:

https://IPA_SERVER_IP:443

USER: admin

PW: adminpassword

- You can also connect with LDAP client with Server IP address IPA_SERVER_IP
- You can also connect with an LDAP client using the server IP address IPA_SERVER_IP

- The container can the be started and stopped with the following commands:
- The container can then be started and stopped with the following commands:

.. code-block:: bash
16 changes: 10 additions & 6 deletions installation/docker/jupyter.rst
@@ -3,25 +3,25 @@ Jupyter
============================================================

This guide will show you how to use Docker Compose to set up and run a `JupyterHub <https://jupyterhub.readthedocs.io/en/latest/>`_ instance
which uses ldap credentials to authenticate users.
which uses LDAP credentials to authenticate users.


Account Management Dependency
--------------------------
-------------------------------

This configuration of Ckan needs an account management system to work with. We provide three different options, you will find more info on their respective sections:
This configuration of JupyterHub needs an account management system to work with. We provide three different options; you will find more info in their respective sections:

* Local LDAP Docker
* Local FreeIPA Docker (works only with linux)
* Local FreeIPA Docker (works only with Linux)
* Remote FreeIPA Server


JupyterHub
-----------------

This docker container runs a JupyterHub instance which is connected with a postgresql database.
This Docker container runs a JupyterHub instance which is connected with a PostgreSQL database.

Run the docker container:
Run the Docker container:

.. code-block:: bash
@@ -41,4 +41,8 @@ Check whether dockers are running:
To open the interactive shell, type *http://localhost:8000* in your browser and log in as user *alice* (password *password*).

.. image:: imgs/jupyter.png
:scale: 50 %
:alt: JupyterHub login page
:align: center


14 changes: 7 additions & 7 deletions installation/docker/ldap.rst
@@ -2,9 +2,9 @@
LDAP Installation
*****************

This docker container allows us to start a simple LDAP server (`OpenLdap <http://www.openldap.org/>`_
This Docker container allows you to start a simple LDAP server (`OpenLDAP <http://www.openldap.org/>`_
) and a client (`phpLDAPadmin <http://phpldapadmin.sourceforge.net/>`_
). In particular, the docker compose downloads an initial database having domain *daf.test.it* and containing the user *bob* with password *password*.
). In particular, the Docker Compose downloads an initial database having domain *daf.test.it* and containing the user *bob* with password *password*.

Clone the git project:

@@ -29,24 +29,24 @@ Check whether dockers are running:
**Note**

The docker compose requires that ports 80, 636 and 389 are available. If not, change them.
The Docker Compose requires that ports 80, 636 and 389 are available. If not, change them.
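One way to check whether those ports are free before starting the stack (a sketch assuming a Linux host with ``ss`` available):

```shell
# Report whether ports 80, 389 and 636 are already taken by a listener.
for port in 80 389 636; do
    if ss -ltn "sport = :${port}" 2>/dev/null | grep -q LISTEN; then
        echo "port ${port}: in use"
    else
        echo "port ${port}: free"
    fi
done
```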

Now, open your favorite browser and type *http://localohost*.
Now, open your favorite browser and type *http://localhost*.

.. image:: imgs/ldap_login.png
:scale: 50 %
:alt: alternate text
:alt: LDAP login page
:align: center

Log in as *cn=admin,dc=example,dc=org* with password *admin* to navigate inside.

.. image:: imgs/ldap_tree.png
:scale: 50 %
:alt: alternate text
:alt: LDAP web app
:align: center



FreeIpa Instance
-----------------------
We installed a FreeIpa server which can be used for test purposes. It can be reached to the address *91.206.129.245*
We installed a FreeIPA server which can be used for test purposes. It can be reached at the address *91.206.129.245*.
8 changes: 4 additions & 4 deletions installation/docker/metabase.rst
@@ -3,7 +3,7 @@ Metabase
============================================================

.. Metabase + postgres + ldap configuration
This guide explains how to run a execute a Metabase server.
This guide explains how to run a Metabase server.

Follow these steps to run the Docker images.

@@ -13,14 +13,14 @@ Clone the git project:
> git clone git@github.com:italia/daf-recipes.git
Go to the metabase directory, the images needed by docker-compose and run it:
Go to the :code:`metabase` directory, build the images needed by docker-compose, and run it:

.. code-block:: bash
> cd metabase
> ./build_local.sh
> docker-compose up -d # it will run all the needed containers
Open the metabase home http://localhost:3000
Open the Metabase home at http://localhost:3000.

Go to `github <https://github.com/italia/daf-recipes/tree/master/metabase>`_ for seeing how to setup metabase.
Go to `GitHub <https://github.com/italia/daf-recipes/tree/master/metabase>`_ to check how to set up Metabase.
