# Setting Up Your Geospatial Python Environment

## Installing virtualenv and virtualenvwrapper

This recipe will enable you to manage different versions of different libraries for multiple projects. We use <em style="color:blue;">virtualenv</em> to create virtual Python environments that host collections of project-specific libraries in an isolated directory. For example, you may have an old legacy project using Django 1.4, whereas a new project requires you to use Django 1.8. With virtualenv, you can have both versions of Django installed on the same machine, and each project can access the appropriate version of Django without any conflicts or problems.

Without <em style="color:blue;">virtualenv</em>, you are forced either to upgrade the old project or to find a workaround that implements the new features of the other version, which limits or complicates the new project.

<em style="color:blue;">virtualenv</em> lets you easily switch between separate Python virtual environments for your individual projects. This has the added benefit that you can quickly set up a new machine for testing, or help a new developer get their machine up and running as fast as possible.

### Getting ready

Before anything else, we assume that you already have an Ubuntu Linux machine, or a VirtualBox instance running Ubuntu, with Python 3 installed, so that you can follow these instructions.

To install <em style="color:blue;">virtualenv</em>, you need a working installation of Python and pip. The pip package manager installs and manages Python packages, making our lives easier. Throughout this book, whenever we need to install a package, <em style="color:blue;">pip</em> will be our tool of choice. The official installation instructions for pip can be found at https://pip.pypa.io/en/latest/installing.html. On Ubuntu, you can install pip together with <em style="color:blue;">easy_install</em> (part of setuptools) from the system package manager. Let's try it out from the Terminal:

<code>$ sudo apt-get install python-setuptools python-pip</code>

With this one line, you have both <em style="color:blue;">pip</em> and <em style="color:blue;">easy_install</em> installed.

<pre>
<strong>NOTE:
What is sudo?</strong>
sudo is a program for Unix-like computer operating systems that allows users to run programs with the security privileges of another user (normally, the super user or root). Its name is a concatenation of su (substitute user) and do (take action). Take a look at http://en.wikipedia.org/wiki/Sudo for more information on this.
</pre>

The sudo prefix runs the command with superuser privileges. If the installation fails, you will need to get the <em style="color:blue;">ez_setup.py</em> file, which is available at https://bootstrap.pypa.io/ez_setup.py. After downloading the file, you can run it from the command line:

<code>$ python ez_setup.py</code>

Now <em style="color:blue;">pip</em> should be up and running, and you can use it to complete the installation of virtualenv and virtualenvwrapper. virtualenvwrapper provides shortcut commands that make creating, switching between, and deleting your virtual environments faster. You can check that pip works as follows:

<code>$ pip --version</code>

### How to do it...

The steps to install your Python virtualenv and virtualenvwrapper packages are as follows:

1. Install virtualenv using the pip installer:

    <code>$ sudo pip install virtualenv</code>

2. Install virtualenvwrapper using easy_install:

    <code>$ sudo easy_install virtualenvwrapper</code>

<pre>
<strong>NOTE:</strong>

We use easy_install instead of pip because on Ubuntu 14.04, installing with pip unfortunately does not place the virtualenvwrapper.sh file at /usr/local/bin/virtualenvwrapper.sh, where the online documentation says it should be.</pre>

3. Set the <em style="color:blue;">WORKON_HOME</em> variable to a folder named <em style="color:blue;">venvs</em> in your home directory, and create that folder. This single folder stores all of your different Python virtual environments; in my case, it is located at /home/calvin/venvs:

<code>
$ export WORKON_HOME=~/venvs
$ mkdir $WORKON_HOME
</code>

4. Use the source command to run the <em style="color:blue;">virtualenvwrapper.sh</em> bash script in the current shell:

<code> $ source /usr/local/bin/virtualenvwrapper.sh</code>

5. Next, we create a new virtual environment called <em style="color:blue;">spatial</em>; this is also the name of the new folder where the virtual environment is installed:

<code> $ mkvirtualenv spatial</code>

To use <em style="color:blue;">virtualenvwrapper</em> in future sessions, we need to set up bash so that it runs the <em style="color:blue;">virtualenvwrapper.sh</em> script every time a new terminal session starts.

6. First, add the <em style="color:blue;">WORKON_HOME</em> setting to your <em style="color:blue;">~/.bashrc</em> file:

<code> $ echo "export WORKON_HOME=$WORKON_HOME" >> ~/.bashrc</code>

7. Next, make bash load the virtualenvwrapper functions at startup:

<code>$ echo "source /usr/local/bin/virtualenvwrapper.sh" >> ~/.bashrc</code>

8. Now, reload ~/.bashrc in the current shell:

<code> $ source ~/.bashrc </code>
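Running the echo commands from steps six and seven more than once appends duplicate lines to ~/.bashrc. The following Python sketch appends each line only if an identical line is not already in the file. The helper name `ensure_lines` is hypothetical and not part of virtualenvwrapper. The example call is commented out because it edits your real ~/.bashrc, and it assumes the ~/venvs folder and the /usr/local/bin/virtualenvwrapper.sh path used above.

```python
from pathlib import Path


def ensure_lines(path, lines):
    """Append each line to the file at `path` unless an identical line exists.

    Hypothetical helper: returns the list of lines that were actually added.
    """
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    existing = text.splitlines()
    missing = [line for line in lines if line not in existing]
    if missing:
        # Start on a fresh line if the file does not already end with a newline.
        prefix = "\n" if text and not text.endswith("\n") else ""
        with path.open("a") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing


# Example (modifies your real ~/.bashrc; paths are assumptions from the steps above):
# ensure_lines(Path.home() / ".bashrc", [
#     f"export WORKON_HOME={Path.home() / 'venvs'}",
#     "source /usr/local/bin/virtualenvwrapper.sh",
# ])
```

Because the function only compares whole lines, a second call with the same arguments returns an empty list and leaves the file unchanged.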

### How it works...
Step one uses pip to install the <em style="color:blue;">virtualenv</em> package into your system-wide Python installation. Step two installs the <em style="color:blue;">virtualenvwrapper</em> helper package with <em style="color:blue;">easy_install</em>, because the pip installer does not create the <em style="color:blue;">virtualenvwrapper.sh</em> file at the expected location. virtualenvwrapper helps us create, enter, and switch between Python virtual environments with ease. Step three sets the <em style="color:blue;">WORKON_HOME</em> variable to the directory that will hold all of our virtual environments and then creates that directory. In step four, the source command runs the shell script that sets up the <em style="color:blue;">virtualenvwrapper</em> commands. In step five, we create a new <em style="color:blue;">virtualenv</em> called <em style="color:blue;">spatial</em> in our /home/calvin/venvs directory; this step also activates the new environment automatically. Steps six to eight add the same settings to ~/.bashrc, so that every new terminal session loads virtualenvwrapper, and then reload that file in the current shell.

Once the <em style="color:blue;">virtualenv</em> session starts, the name of the environment appears in parentheses at the start of your prompt, like this:
<code> (spatial)calvin@calvin-computer:~$ </code>

To leave the virtual environment, type:

<code> $ deactivate </code>

To reactivate <em style="color:blue;">virtualenv</em>, simply type:

<code> $ workon spatial </code>
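To check from Python whether you are running the environment's interpreter or the system one, you can compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch, assuming the behavior of Python 3's venv and virtualenv 20 or later, which make the two prefixes differ. Older virtualenv releases set `sys.real_prefix` instead, and the helper checks for that as well. The function name `in_virtualenv` is illustrative.

```python
import os
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtual environment."""
    # Older virtualenv releases (< 20) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and virtualenv 20+ make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
    # The activate script exports VIRTUAL_ENV; deactivate removes it again.
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
```

Run it with `python` once after `workon spatial` and once after `deactivate`. The first line of output should change from True to False.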

## Installing pyproj and NumPy

pyproj (https://pypi.python.org/pypi/pyproj/) is a Python wrapper around the PROJ.4 library. It works with projections and performs coordinate transformations. All your geographic information should be in one of the many coordinate systems defined by the European Petroleum Survey Group (EPSG). This information is necessary for systems to place data at the correct location on Earth. Geographic data can then be stacked as layers to create maps or perform analysis. The data must be correctly positioned, or we won't be able to add, combine, or spatially compare it with data from other sources.
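To illustrate what a projection does, the sketch below converts WGS 84 longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857) using the spherical Mercator formulas, with no dependencies. In practice you would let pyproj handle this. The optional comparison at the end assumes pyproj 2.1 or later, which provides `Transformer.from_crs`. The function name `lonlat_to_web_mercator` is illustrative.

```python
import math

# Sphere radius in metres used by Web Mercator (EPSG:3857).
R = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project WGS 84 lon/lat in degrees to EPSG:3857 x/y in metres.

    Web Mercator is only defined up to about +/-85.0511 degrees latitude.
    """
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = lonlat_to_web_mercator(16.37, 48.21)  # roughly Vienna
print(f"manual: {x:.2f}, {y:.2f}")

try:
    from pyproj import Transformer

    # always_xy=True keeps (lon, lat) input order instead of EPSG:4326's lat/lon.
    t = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
    px, py = t.transform(16.37, 48.21)
    print(f"pyproj: {px:.2f}, {py:.2f}")
except ImportError:
    print("pyproj is not installed in this environment")
```

When pyproj is installed, the two lines of output should agree closely, because EPSG:3857 is defined with these spherical formulas.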

Data comes from many sources, and the projection of one dataset often differs from that of another. Even worse, a data provider could describe the data as being in projection UTM31 when, in reality, it is in projection UTM34! This can lead to big problems later on when you try to get your data to work together: layers will not line up, or programs will throw ugly error messages.
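One quick sanity check for a mislabeled UTM dataset is to work out which zone the data should fall in, using a representative longitude for the area (for example, from the dataset's metadata or a known place). This sketch applies the standard 6-degree zone rule and the EPSG code pattern for WGS 84 / UTM (326xx for north, 327xx for south). It deliberately ignores the irregular zones around Norway and Svalbard. The function names are illustrative.

```python
def utm_zone(lon):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Zones are 6 degrees wide, starting at 180 W; lon == 180 belongs to zone 60.
    return min(int((lon + 180.0) // 6) + 1, 60)


def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing the point (no special zones)."""
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lon)


def check_declared_zone(lon, declared_zone):
    """Compare a provider's declared zone with the zone implied by the location."""
    actual = utm_zone(lon)
    if actual != declared_zone:
        print(f"warning: longitude {lon} lies in zone {actual}, "
              f"not the declared zone {declared_zone}")
    return actual == declared_zone
```

For example, `check_declared_zone(21.0, 31)` prints a warning and returns False, because 21° E lies in zone 34. `utm_epsg(21.0, 50.0)` gives 32634.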

NumPy is the scientific backbone for number crunching with arrays and complex numbers, and it powers several popular geospatial libraries, including GDAL (Geospatial Data Abstraction Library). The power of NumPy lies in its support for large matrices, arrays, and math functions. Installing NumPy is therefore necessary for the other libraries to function smoothly, although we seldom use it directly in our quest for spatial analysis.

### Getting ready

Fire up your virtual environment:
<pre><code>$ workon spatial </code></pre>

Next, install the Python development tools, which pip needs if it has to compile NumPy from source:

<pre><code>$ sudo apt-get install -y python-dev </code></pre>

You are now ready to move on and install pyproj and NumPy inside your running virtual environment.

<pre><code>
$ pip install numpy

$ pip install pyproj
</code></pre>
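After the pip commands finish, a short script that uses only the standard library can confirm which packages are importable in the active environment and report their versions. It relies on `importlib.metadata`, which assumes Python 3.8 or later. The (import name, PyPI distribution name) pairs are the usual ones but may need adjusting for your setup. The function name `check_packages` is illustrative.

```python
import importlib.util
from importlib import metadata

# (import name, distribution name on PyPI)
PACKAGES = [("numpy", "numpy"), ("pyproj", "pyproj")]


def check_packages(packages):
    """Return {import_name: version string, or None if not importable}."""
    report = {}
    for module_name, dist_name in packages:
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None  # not importable in this interpreter
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[module_name] = "unknown version"
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        print(f"{name:10} {version or 'MISSING'}")
```

You can extend `PACKAGES` with the libraries from the following recipes. Note that pyshp is imported as `shapefile`, so its entry would be `("shapefile", "pyshp")`.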

## Installing shapely, matplotlib, and descartes

A large part of geospatial analysis and visualization is made possible by Shapely, matplotlib, GDAL, OGR, and descartes; GDAL and OGR are installed in a later recipe. Most of the recipes here use these libraries extensively, so setting them up is necessary to complete our exercises.

<strong>Shapely</strong> (http://toblerity.org/shapely) performs spatial analysis of geometries in a flat, Cartesian coordinate system, like the one used by AutoCAD, for those of you familiar with CAD-like programs. The benefit of using a flat coordinate system is that all the rules of Euclidean geometry and analytic geometry apply. For a quick refresher on the coordinate systems that we all learned in school, here is a little image to jolt your memory.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/0e/Cartesian-coordinate-system.svg/250px-Cartesian-coordinate-system.svg.png" width=300 height=300>

Classic overlay analysis and other geometric computations are where Shapely shines, with the GEOS library working in the background.
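To make "flat Cartesian geometry" concrete, the sketch below computes a polygon's area with the shoelace formula of analytic geometry, with no dependencies. If Shapely is installed, the sketch also compares the result with Shapely's `Polygon.area` and runs a simple overlay (an intersection). The function name `shoelace_area` is illustrative.

```python
def shoelace_area(coords):
    """Planar (Cartesian) area of a simple polygon given as [(x, y), ...]."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(shoelace_area(square))  # 16.0

try:
    from shapely.geometry import Polygon

    a = Polygon(square)
    b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])
    print(a.area)                  # same planar area, computed by GEOS
    print(a.intersection(b).area)  # overlay: the shared 2 x 2 square, 4.0
except ImportError:
    print("Shapely is not installed in this environment")
```

The result is in whatever units the coordinates use. An area computed on raw longitude/latitude values is in "square degrees," which is one reason data is usually projected first.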

As for <strong>matplotlib</strong> (http://matplotlib.org/), it is the plotting engine that renders graphs and data to your screen or to a file, as an image or a <strong>scalable vector graphic (SVG)</strong>. The uses of matplotlib are limited only by your imagination. As the name partially implies, matplotlib enables you to plot your data on a graph or even on a map. For those of you familiar with MATLAB, you will find matplotlib quite similar in functionality.

The <strong>descartes</strong> library provides a nicer integration of Shapely geometry objects with matplotlib. descartes turns Shapely geometries into matplotlib patches that can be filled and drawn directly, which saves you from converting each geometry's coordinates by hand.
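The sketch below shows the kind of work descartes saves you. It draws a polygon, given as a list of (x, y) coordinates, as a filled matplotlib patch and saves it as SVG. It uses matplotlib's own `Polygon` patch rather than descartes' `PolygonPatch`, and it assumes matplotlib is installed; otherwise it returns None. It selects the non-interactive Agg backend so that it runs without a display. The function name and output file name are hypothetical.

```python
import os
import tempfile


def plot_polygon(coords, out_path):
    """Draw a polygon [(x, y), ...] as a filled patch; return out_path or None."""
    try:
        import matplotlib
        matplotlib.use("Agg")  # render to files; no display needed
        import matplotlib.pyplot as plt
        from matplotlib.patches import Polygon as MplPolygon
    except ImportError:
        print("matplotlib is not installed")
        return None

    fig, ax = plt.subplots()
    ax.add_patch(MplPolygon(coords, closed=True, facecolor="#6699cc",
                            edgecolor="#000000", alpha=0.7))
    ax.autoscale_view()     # fit the axes to the patch
    ax.set_aspect("equal")  # keep x and y units the same size
    fig.savefig(out_path)   # format inferred from the extension (.svg here)
    plt.close(fig)
    return out_path


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (4, 4), (0, 4)]
    print(plot_polygon(square, os.path.join(tempfile.gettempdir(), "polygon.svg")))
```

With a Shapely polygon, you would pass `list(polygon.exterior.coords)`. Unlike descartes, this sketch ignores interior rings (holes).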

### Getting ready

To prepare for installation, it is necessary to install some global packages, such as <strong>libgeos_c</strong>, as these are required by Shapely. NumPy is also a requirement that we have already met and is also used by Shapely.

Install the requirements of matplotlib from the command line like this:

<code>
$ sudo apt-get install libfreetype6-dev libpng-dev libjpeg8-dev
</code>

These are the build dependencies of matplotlib on an Ubuntu 14.04 machine. Then, install Shapely, matplotlib, and descartes with pip:

<code>
$ pip install shapely
    
$ pip install matplotlib

$ pip install descartes

</code>

The installation itself is simple with pip and should be quick and painless. The tricky part is <strong>libgeos_c</strong>: if it is not installed properly, the Shapely installation can fail, and you might need to install the <strong>libgeos-dev</strong> package first.

## Installing pyshp, geojson, and pandas

Each of these libraries targets a specific format, and for some projects they make life easier and simpler than using GDAL. pyshp works with shapefiles, geojson with GeoJSON, and pandas with all other structured textual data.

<strong>pyshp</strong> is pure Python and is used to import and export shapefiles; you can find its source code at https://github.com/GeospatialPython/pyshp. The pyshp library's sole purpose is to work with shapefiles. GDAL will handle most of our data input/output needs, but sometimes a pure Python library is simpler when working with shapefiles.

<strong>geojson</strong> is the name of both a Python library and a format, which can be a little confusing. The GeoJSON format (http://geojson.org) is becoming ever more popular, so we use the Python geojson library to create it. You will find it on the <strong>Python Package Index (PyPI)</strong> if you search for geojson. As you would expect, it helps us create all the different geometry types supported by the GeoJSON specification.

<strong>pandas</strong> (http://pandas.pydata.org) is a data analysis library that structures your data in a spreadsheet-like manner for further computations. Since our geospatial data comes from a broad set of sources and formats, such as CSV, pandas lets us work with it with minimal effort.
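A common workflow with these libraries is to read point records from a CSV file (the pandas side) and publish them as GeoJSON (the geojson side). To show the underlying structure without extra dependencies, this sketch uses the standard library's csv and json modules as stand-ins. The column names `name`, `lon`, and `lat` are assumptions about the input, and the function name is illustrative. GeoJSON (RFC 7946) lists each position as [longitude, latitude].

```python
import csv
import io
import json


def csv_to_feature_collection(csv_text):
    """Convert CSV rows with name, lon, lat columns to a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            # GeoJSON position order is longitude first, then latitude.
            "geometry": {"type": "Point",
                         "coordinates": [float(row["lon"]), float(row["lat"])]},
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


sample = "name,lon,lat\nVienna,16.37,48.21\nZurich,8.54,47.37\n"
print(json.dumps(csv_to_feature_collection(sample), indent=2))
```

With the libraries installed, `pandas.read_csv` could replace the csv loop. The geojson library's `Point`, `Feature`, and `FeatureCollection` classes build equivalent objects.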

<code>
$ pip install pyshp
$ pip install geojson
$ pip install pandas
</code>

## Installing SciPy, PySAL, and IPython

<strong>SciPy</strong> is the name of a collection of Python libraries, including the SciPy library itself, matplotlib, pandas, SymPy, and IPython; the <code>pip install scipy</code> command installs only the SciPy library. The SciPy library is used for many operations, but we are particularly interested in its <strong>spatial</strong> module, which can do many things, including running nearest-neighbor queries.
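A nearest-neighbor query asks which point in a set lies closest to a query point. The sketch below answers it by brute force in plain Python. If SciPy is installed, it also uses `scipy.spatial.cKDTree`, which builds a k-d tree so that repeated queries are fast. Both approaches measure Euclidean distance, so the sketch assumes planar (projected) coordinates. `math.dist` requires Python 3.8 or later, and the sample points are made up.

```python
import math

points = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0), (2.0, 8.0)]
query = (6.0, 4.0)


def nearest_brute_force(points, query):
    """Return (distance, index) of the point closest to query (Euclidean)."""
    return min((math.dist(p, query), i) for i, p in enumerate(points))


dist, idx = nearest_brute_force(points, query)
print(f"brute force: point {idx} at distance {dist:.3f}")

try:
    from scipy.spatial import cKDTree

    tree = cKDTree(points)                # build the tree once
    kd_dist, kd_idx = tree.query(query)   # nearest neighbor (k=1 by default)
    print(f"cKDTree:     point {kd_idx} at distance {kd_dist:.3f}")
except ImportError:
    print("SciPy is not installed in this environment")
```

The brute-force version checks every point on each query. The tree is the better choice when you query the same point set many times.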

<strong>PySAL</strong> is a geospatial computing library for spatial analysis. Creating models and running simulations directly from Python code are just some of the many functions that PySAL offers. Combined with our visualization tools, such as matplotlib, PySAL gives us a powerful toolkit.

<strong>IPython</strong> is an interactive Python shell that replaces the normal Python console you may be used to when running and testing Python code from your terminal. It is really just an advanced interpreter with some useful features, such as Tab completion, which lets beginners find commands quickly by typing a letter and pressing Tab. IPython notebooks help you share work in the form of a web page, including code, images, and more, without any installation.

Install the dependencies first:
<code> $ sudo apt-get install libblas-dev liblapack-dev gfortran </code>

<code>
$ pip install scipy
$ pip install pysal
$ pip install ipython
</code>

## Installing GDAL and OGR

Converting formats is boring and repetitive, and it is one of the many, many jobs that the GDAL library handles for us. However, GDAL also shines at other geospatial functions, such as getting the current projection of a shapefile or generating contours from elevation data. So, calling GDAL just a transformation library would be wrong; it really is so much more. The father of GDAL, Frank Warmerdam, deserves credit for starting it all off, and the GDAL project is now part of <strong>OSGeo</strong> (Open Source Geospatial Foundation; refer to www.osgeo.org).

Traditionally, GDAL covers working with raster data, and OGR covers working with vector data. Since GDAL 2.0, the two sides, raster and vector, have been merged under one hat. GDAL and OGR are the so-called Swiss Army knives of geospatial data transformations, covering over 200 different spatial data formats.

### Getting ready

GDAL isn't known to be the friendliest beast to install on Windows, Linux, or OSX. There are many dependencies and even more ways to install them. The descriptions are not all very straightforward. Keep in mind that this description is just one way of doing things and will not always work on all machines, so please refer to the online instructions for the latest and best ways to get your system up and running.

To start with, we will install some dependencies globally on our machine. After the dependencies have been installed, we will install GDAL for Python globally, into the system's site packages.

### How to do it...

To globally install GDAL into our Python site packages, we will proceed with the following steps:

1. Install the build and XML tools with the following command:

<code> $ sudo apt-get install -y build-essential libxml2-dev libxslt1-dev</code>


2. Install the GDAL development files using the following command:

<code> $ sudo apt-get install libgdal-dev # install is 125MB </code>

3. The following command installs the GDAL package into the main (system) Python installation, which means that GDAL is installed globally. A global installation of GDAL is usually not a bad thing since, as far as I am aware, there are no backward-incompatible versions, which is very rare these days. Installing GDAL directly and only in a virtualenv is painful, to say the least; if you are interested in attempting it, take a look at the link in the following note.

<code> $ sudo apt-get install python-gdal </code>

<strong>NOTE</strong>: If you would like to attempt the installation inside your virtual environment, please take a look at this GIS Stack Exchange question: http://gis.stackexchange.com/questions/28966/python-gdal-package-missing-header-file-when-installing-via-pip.

4. To make GDAL available in the Python virtual environment, we only need to run a simple virtualenvwrapper command, <code>toggleglobalsitepackages</code>. Make sure your virtual environment is activated before you run it.

5. Now, activate the global Python site packages in your current virtual environment:

<code>(spatial)mdiener@mdiener:~$ toggleglobalsitepackages
enable global site-packages </code>

6. The final check is to see whether GDAL is available. If the import runs without an error, GDAL is ready to use:
<code>
$ python
>>> from osgeo import gdal
>>>
</code>

Users of Windows 7 and later should use the OSGeo4W Windows installer (https://trac.osgeo.org/osgeo4w/). Download the 32-bit or 64-bit installer that matches your Windows version and follow the graphical installer instructions to complete the GDAL installation.

### How it works...

The GDAL installation encompasses both the raster (GDAL) and vector (OGR) tools in one. Within the GDAL install are five modules that can be separately imported into your project depending on your needs:

<code>
>>> from osgeo import gdal
>>> from osgeo import ogr
>>> from osgeo import osr
>>> from osgeo import gdal_array
>>> from osgeo import gdalconst
>>> import osgeo
>>> help(osgeo)
</code>
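As a small example of the osr module listed above, the sketch below transforms a WGS 84 longitude/latitude pair to Web Mercator (EPSG:3857). It returns None if GDAL's Python bindings are missing. With GDAL 3 or later, EPSG:4326 uses latitude/longitude axis order by default, so the sketch sets `OAMS_TRADITIONAL_GIS_ORDER` when that constant exists. This restores x = longitude, y = latitude, which was already the behavior in GDAL 2. The function name `to_web_mercator` is illustrative.

```python
def to_web_mercator(lon, lat):
    """Transform WGS 84 lon/lat to EPSG:3857 with osgeo.osr; None if GDAL is missing."""
    try:
        from osgeo import osr
    except ImportError:
        print("GDAL's Python bindings (osgeo) are not installed")
        return None

    src = osr.SpatialReference()
    src.ImportFromEPSG(4326)
    dst = osr.SpatialReference()
    dst.ImportFromEPSG(3857)
    # GDAL 3+: use x=lon, y=lat order instead of the EPSG-mandated lat/lon.
    if hasattr(osr, "OAMS_TRADITIONAL_GIS_ORDER"):
        src.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
        dst.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
    transform = osr.CoordinateTransformation(src, dst)
    x, y, _z = transform.TransformPoint(lon, lat)
    return x, y


print(to_web_mercator(16.37, 48.21))
```

If you leave out the axis-mapping lines on GDAL 3, the same call treats 16.37 as a latitude, which is a common source of misplaced data.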

## Installing GeoDjango and PostgreSQL with PostGIS

This is our final installation recipe, and if you have followed along so far, you are ready for a simple, straightforward start with Django. According to the Django homepage, Django is the web framework for perfectionists with deadlines. The spatial part of it is called <strong>GeoDjango</strong>. GeoDjango is a contrib module that comes with every Django installation; therefore, you only need to install Django to get GeoDjango running. Of course, GeoDjango has its own dependencies, which were met in the previous sections. For reference purposes, take a look at this great documentation on the Django homepage at

https://docs.djangoproject.com/en/dev/ref/contrib/gis/install/#ref-gis-install.

We will use PostgreSQL and PostGIS, since together they are the open source industry's go-to spatial database. These installations are not strictly necessary, but without them you limit what you can do, and they are definitely needed if you plan to store your spatial data in a spatial database. The combination of PostgreSQL and PostGIS is the most common spatial database setup for GeoDjango. This installation is definitely more involved and can lead to some hiccups, depending on your system.

### Getting ready

To use GeoDjango, we need a spatial database installed; in our case, we will use PostgreSQL with the PostGIS extension. GeoDjango also supports Oracle, SpatiaLite, and MySQL. The dependencies of PostGIS include GDAL, GEOS, PROJ.4, LibXML2, and JSON-C.

Start up your Python virtual environment:

<code>$ workon spatial </code>

Follow these steps. These are taken from the PostgreSQL homepage for Ubuntu Linux:

1. Create a new file called pgdg.list using the nano text editor. This file tells the Ubuntu package manager where to find the PostgreSQL package repository:

<code> $ sudo nano /etc/apt/sources.list.d/pgdg.list </code>

2. Add this line to the file, then save and close it. Replace precise with the codename of your Ubuntu release, which you can print with lsb_release -cs:

<code>deb http://apt.postgresql.org/pub/repos/apt/ precise-pgdg main</code>

3. Now, run the wget command to add the repository's signing key:

<code> $ wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - </code>

4. Run the update command to refresh your package lists:

<code> $ sudo apt-get update </code>

5. Run the install commands to install PostgreSQL 10, PostGIS 2.4, the PostGIS scripts and command-line tools, and pgRouting:

    <code>
    $ sudo apt-get install postgresql-10
    $ sudo apt-get install postgresql-10-postgis-2.4
    $ sudo apt-get install postgresql-10-postgis-scripts
    # to get the commandline tools shp2pgsql, raster2pgsql
    $ sudo apt-get install postgis
    # to install pgRouting
    $ sudo apt-get install postgresql-10-pgrouting
    </code>

6. Install the PostgreSQL header files:

    <code> $ sudo apt-get install libpq-dev</code>

7. Next, install the postgresql-contrib package, which provides additional supplied modules:

    <code> $ sudo apt-get install postgresql-contrib </code>

8. Install the Python database adapter, psycopg2, to connect to your PostgreSQL database from Python:

    <code> $ sudo apt-get install python-psycopg2 </code> 

   
9. Moving on, we can finally install Django in one line directly in our activated virtual environment:

    <code> $ pip install django </code>

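10. Test your installation of Django and GDAL by importing them in the Python interpreter. If the import succeeds and gdal_version() returns the GDAL version, Django can find GDAL:

<code>
$ python
>>> from django.contrib.gis.gdal import gdal_version
>>> gdal_version()
</code>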
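Once Django can find GDAL, a GeoDjango project also needs `django.contrib.gis` in `INSTALLED_APPS` and a database `ENGINE` that points to the PostGIS backend. The settings.py fragment below is a sketch. The database name, user, and password are hypothetical placeholders. The database itself must have the PostGIS extension enabled, for example by running the SQL statement CREATE EXTENSION postgis; as a database superuser.

```python
# settings.py fragment (sketch): NAME, USER, and PASSWORD are hypothetical placeholders.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "django.contrib.gis",  # enables GeoDjango
]

DATABASES = {
    "default": {
        # PostGIS backend that ships with GeoDjango.
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "geodjango_db",   # hypothetical database name
        "USER": "geo_user",       # hypothetical database role
        "PASSWORD": "change-me",  # hypothetical; keep real secrets out of code
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```

The rest of a project's settings can stay as Django generates them. Only the app entry and the database engine are specific to GeoDjango.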

Windows users should use the PostgreSQL Windows binaries (http://www.postgresql.org/download/windows/) provided by EnterpriseDB (http://www.enterprisedb.com/products-services-training/pgdownload#windows). Download the correct version and follow the installer instructions. PostGIS is also included in the list of extensions that you can install directly with the installer.

### There's more...

To summarize all the installed libraries, take a look at this table:
<table class='table table-striped'>
<tr>
<th>Library name</th>
<th>Description</th>
<th>Reason to install</th>
</tr>
<tr>
<td>NumPy</td>
<td>This adds support for large multidimensional arrays and matrices</td>
<td>It is a requirement for many other libraries</td>
</tr>
<tr>
<td>pyproj</td>
<td>This handles projections</td>
<td>It transforms projections</td>
</tr>
<tr>
<td>shapely</td>
<td>This handles geospatial operations</td>
<td>It performs fast geometry manipulations and operations</td>
</tr>
<tr>
<td>matplotlib</td>
<td>This is a plotting library</td>
<td>It provides a quick visualization of results</td>
</tr>
<tr>
<td>descartes</td>
<td>This uses Shapely or GeoJSON objects as matplotlib paths and patches</td>
<td>It speedily plots geo-data</td>
</tr>
<tr>
<td>pandas</td>
<td>This provides high-performance data structures and data analysis</td>
<td>It performs data manipulation and CSV creation</td>
</tr>
<tr>
<td>SciPy</td>
<td>This provides a collection of Python libraries for scientific computing</td>
<td>It provides the spatial module, including nearest-neighbor queries</td>
</tr>
<tr>
<td>PySAL</td>
<td>This contains a geospatial analysis library</td>
<td>It performs a plethora of spatial operations (optional)</td>
</tr>
<tr>
<td>IPython</td>
<td>This provides interactive Python computing</td>
<td>It provides notebooks to store and share your scripts (optional)</td>
</tr>
<tr>
<td>Django</td>
<td>This contains a web application framework</td>
<td>It is used for our demo web application in Chapter 11, Web Analysis with GeoDjango</td>
</tr>
<tr>
<td>pyshp</td>
<td>This provides pure Python shapefile manipulation and generation</td>
<td>It helps input and output shapefiles</td>
</tr>
<tr>
<td>geojson</td>
<td>This creates and handles GeoJSON, the JSON format for spatial data</td>
<td>It facilitates the exchange and publication of this format</td>
</tr>
<tr>
<td>PostgreSQL</td>
<td>This is a relational database</td>
<td>It helps store spatial data</td>
</tr>
<tr>
<td>PostGIS</td>
<td>This is the spatial extension to PostgreSQL</td>
<td>It stores and performs spatial operations on geographic data in PostgreSQL</td>
</tr>
</table>