An introduction to web-mapping using CartoDB
HTML
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
img
slide-show
sql
README.md

README.md

CartoDB Tutorial

Originally written on August 9, 2014
Last updated on February 15, 2016

About Me

I (Chris Henrick) have a professional background in Cartography and Geographic Information Systems. More recently I have been consulting in front-end web development, data visualization, and interactive web-mapping. I help co-organize the NYC chapter of Maptime, a group of volunteers that love to teach mapping technology to the general public for free.

What is This Tutorial About?

Mapping data interactively on the web using the free open-source software CartoDB.

Who is This Tutorial For?

Primarily people who are new to making interactive maps that live on the web. If you have GIS or Cartography experience some of this tutorial may be a review, so please bare with me.

Slide Deck

You may view the slides for the presentation I typically give with this tutorial. Use the ◀ ▶ keys to navigate between slides.

Examples / Use Cases:

Here are some examples of how people have used CartoDB to create interactive maps:

What is CartoDB?

CartoDB is a Software as a Service (SaaS) for visualizing and analzying geospatial data on the web. It is perhaps the most user friendly method of creating interactive maps on the web with your own data. CartoDB allows for high cartographic customization through an intuitive user interface as well as advanced geospatial data analysis using SQL (Structured Queried Language) and Post-GIS.

Prior to CartoDB and other opensource web-cartography software such as Tile Mill, creating webmaps involved having to run your very own web server and install both server-side and database software. This could be extremely difficult unless you were a experienced computer programmer / back-end web developer. The great thing about CartoDB is that it handles all of the server side stuff for you! For example, each time you import data into CartoDB that data is automatically stored inside a database that has geospatial capabilities.

Geospatial? Wait, What's That???

Geospatial data refers to data that has a location based, geometric component. Most geospatial data is in vector format and is stored as points, lines and polygons whose geometric attributes have references to physical locations in the real world, such as latitude and longitude coordinates. With a Geographic Information System, geospatial data can be used to represent both physical and cultural features. These data can then be cartographically rendered and spatially analyzed to solve problems and model the enviroment.

Examples of Geometry Types:

  • A list of street addresses, which can then be georeferenced (matched) to individual pairs of latitude longitude coordinates (points), for example the locations of all public schools in NYC:

  • Features such as rivers and streams or road networks can be stored and represented as lines:

  • New York City's borough boundaries (or other administrative governmental boundaries such as states, provinces, and countries) can be stored and represented as polygons:

Map Tiles

The above types of data are used to render map tiles like those you see on OpenStreetMap, Google Maps, Bing, Map Quest, etc. For example these are map tiles from different providers for the same area of San Francisco:

Map tiles are 256 x 256 pixel images that are aligned in a grid like fashion. They are broken up this way to make zooming and panning appear seamless and fluid by the web-map user. Only the images inside and just outside the map area the user is looking at are being rendered. The server is told to render neighboring tiles and to cache them so that when you pan to a new area the interaction appears seamless.

For the purposes of this tutorial map tiles form our basemap that we can overlay custom data on top of. Even more though, we can analyze our geospatial data with CartoDB. Both of these abilities are where the fun comes in :)

Analyzing Geospatial Data With PostGIS

PostGIS (along with PostgreSQL) is the open-source database technology that allows for performing geospatial analysis on data in CartoDB. Why would we want to use this over other types of GIS software?

  • Replicable: You can script your workflow, which is great for leaving a trail of your work.
  • It builds on SQL: If you already know SQL, this is an easy way to get into doing GIS analysis.
  • You can query data dynamically: If you have a server that can crunch a PostGIS query and return JSON, you can do dynamic spatial queries in your apps. e.g. "Find all points near me."

We will cover some basic PostGIS in this tutorial though if you are interested you may find another introductory tutorial about using PostGIS in CartoDB here. There are also plenty of great tutorials on CartoDB's Map Academy

The Tutorial

Intro to the CartoDB Dashboard

  1. Create a free account and log into CartoDB. Once logged in and viewing the dashboard, click on the Data Library option (upper right corner of the dashboard). Navigate to page 5, select the Populated Places dataset, and then click on connect dataset at the top of the dashboard. This will import the Populated Places dataset to your account.

Once the data has been imported take a look at the adm0cap field in the table view. This field stores 0's and 1's, the latter meaning the place is a country capital.

  1. A walk through the GUI:

    • There are two ways to inspect your data in CartoDB:

      • Table View: Shows column names & rows much like a spreadsheet. For this dataset each row represents a point. But rows may also represent other geometry types depending on your data.

        • Take a look at what's inside one of the cells in the_geom column, You should see lattitude and longitude coordinates.
      • Map View: Allows for inspecting our data visually, eg: zooming and panning on an interactive web map. From here we can change the style of the base map, use the Visualization Wizard in the side bar to style our data and add interaction such as pop-ups that display values from the columns in our table view.

    • In the Visualization Wizard try switching the data's style to category view, choose the adm0cap column, and try assigning different types of image markers based on the value for adm0cap. Remember, a 1 means a place is a country capital.

    • Note: You may also upload custom images to be used as markers.

  2. Publishing / Sharing a Map:

    • By clicking on the Visualize button in the upper right corner we can create a Visualization. Do this and give your visualization a name such as "My First Viz".

    • When we create a Visualization it will inherit the styles we set from our map view.

    • Visualizations work by linking to your data tables. Note that if we go back to inspecting our imported data table and change the styles from here the visualization we made with that data will not be updated with those styles. However if you make any changes to the values in those tables the corresponding visualization will be affected.

    • Note: any changes we make to our visualization (data or styling) will be updated in real time to anyone viewing our viz!

    • Notice the differnces between the tables and visualizations views in your Dashboard. The former lists the datasets you have imported to your account, the latter lists the maps you have created with your data and may choose to share / publish on the web.

    • Note: A single Visualization may link to multiple tables in the form of layers. This is a key concept in Cartographic Design, and layer order matters.

Making a Choropleth Map

  1. Delete the populated places dataset and visualization we previously made as we'll need the storage space to move forward with the next part of the tutorial using a free account.

  2. Import the U.S. Counties, 1979 - Present dataset (currently on page 5 of the data library)

  3. Let's take a look at this data. Click on one of the cells under the the_geom column. You should see something like:

{"type":"MultiPolygon","coordinates":[[[[-69.99693763,12.5775821],[-69.93639075,12.53172435],
[-69.924672,12.51923249],[-69.91576087,12.49701569],[-69.88019772,12.45355866],[-69.87682044,12.42739492],
[-69.8880916,12.41766999],[-69.90880286,12.41779206],[-69.93053138,12.42597077],[-69.94513913,12.44037507],
[-69.924672,12.44037507],[-69.924672,12.447211],[-69.95856686,12.46320222],[-70.02765866,12.52293529],
[-70.04808509,12.53115469],[-70.05809486,12.53717683],[-70.06240801,12.54682038],[-70.0603735,12.55695222],
[-70.05109616,12.57404206],[-70.04873613,12.5837263],[-70.05264238,12.60000235],[-70.05964108,12.61424388],
[-70.06110592,12.62539297],[-70.04873613,12.63214753],[-70.00715085,12.58551667],[-69.99693763,12.5775821]]]]}

This is how CartoDB stores geometry for a multi-polygon, or a group of polygons. Each of those coordinates refers to a node within a single polygon. Multi-polygons are useful for grouping lots of geographic features, such as islands, that belong to a single political entity like a state or province.

  1. Now switch to the Map View to see how the polygons are overlayed on our map.
    In the Visualization Wizard:

    • try changing the base map.
    • try changing the polygon fills and borders.
  2. Try clicking somewhere on the map. Notice a pop-up displays with the following message: "You haven’t selected any fields to be shown in the infowindow." Click on the select fields link and notice the sidebar on the right will navigate to the Info Window panel. This is where you may configure data to be shown in the pop-ups or what CartoDB calls Info Windows.
    Here you may:

    • turn any of your columns on or off for values to be displayed in the Info Window.
    • edit the name of the column to be displayed in the Info Window (note: this will not alter the column name in your actual data).
    • change the style of Info Windows.
    • customize them with HTML and CSS.
  3. Let's try switching our data's graphic style using the Visualization Wizard. Switch the style from "simple" to "choropleth". Notice how our polygon data is automatically color coded based on values in the data, in this case total population. However there's a problem here: mapping population by county gives a false impression to the viewer of our map. We need to normalize the data by dividing the number of people in a county by its geographic area.

    • Fortunately our data already has this value included in the pop_sqkm column. To show how you could compute it on your own we would do the following in the SQL Panel:

      SELECT pop_sqkm, 
      	round(
      		pop / 
      		(ST_Area(the_geom::geography)/1000000)) 
      	as psqkm 
      	FROM us_counties
      
    • This is an example of using the open-source technology PostGIS to spatially analyze our data. With PostGIS we can calculate values such as distance and area, where different spatial datasets interesect each other, as well as export our data to different data formats such as GeoJSON or Shapefile.

Making a Thematic Point Map

  1. Import the Tornado historic data 1950 - 2013 dataset (currently on page 3 of the data library).

  2. Inspect the data. Because this data has been stored in CSV format all of our data types are stored as strings (a string is a data type for storing text, like a sentence or word). In order to use the numeric and date values in this data we need to convert the following columns to their respective data types by clicking on the small carrot next to the column name, then clicking on "Change data type..."
    So now we:

    • convert the damage column's data type to number.
    • convert the date column's data type to date. Having correct datatypes is important for doing spatial analysis. If PostGIS thinks our datatype are strings when they're in fact numbers or dates our analysis won't work!
  3. Now in the map view use the Visualization Wizard to show our data's damage value in different methods such as Bubble Map, Intensity, Density Map, etc.

  4. Take a look at the Filters panel, show how filters are translated into SQL by viewing the SQL Panel after applying a filter.

  5. Back in the "Map View" try adding labels to our map. Notice how the CartoCSS panel is updated. If we'd like, we can customize our map styles using CartoCSS. This gives us more fine tuning of our map's style than we can get with the Visualization Wizard.

Animating Geospatial Data with Torque

  1. Use the same tornado data from above.

  2. Try out the Torque option in the Visualization Wizard by selecting the date column as the temporal value to animate.

Counting Points in Polygons

Let's combine both the Counties and Tornadoes datasets into a new visualization. Order the layers so that the tornado data is on top of the county data. This is interesting but what if we wanted to style our counties by the number of tornados each has within its borders?

  • We can use PostGIS to count the number of tornadoes per county. Create a new column called tornadoes_by_county in the us_counties table and give it a numeric data type.

  • Then in the SQL Panel run the following query (this assumes your tornado data table is named tornadoes)

      UPDATE us_counties 
        SET tornadoes_by_county = 
        (
            SELECT count(1) 
            FROM tornadoes 
            WHERE st_contains(us_counties.the_geom,tornadoes.the_geom) 
        )
    
  • In the visualization wizard for the us_counties layer try changing the category to choropleth and using the column tornadoes_by_county to style the map.

  • That's it folks, hope you had fun! See the Resources section below for further learning.

Resources

Learning

Reference

Support

Code Examples

CartoDB SQL

See the sql folder in this repositiory. Inside there are two files; the demo-queries.sql file contains the queries we used in this tutorial. The other-useful-queries.sql file contains examples of other basic SQL commands you can do with CartoDB's SQL Panel. Note that in most cases in CartoDB we don't need to include a semicolon at the end of our SQL query.

I also created a PostgreSQL & PostGIS cheatsheet which you may fork or download.

CartoDB's Javascript Library

CartoDB.JS allows for the programatic use of CartoDB's APIs so that you may integrate them with web applications. This is a more advanced subject that requires a basic understanding of Javascript and front-end web development.

CartoDB.JS Examples:

More CartoDB Map Examples

Geospatial data sources

Following the time I first wrote this tutorial CartoDB has done a terrific job of including more and more public datasets in their Data Library, which I encourage you to take a look at. That being said here are a few external resources you may want to grab geospatial data from and use with CartoDB:

  1. Natural Earth Data: Global cultural and physical geographic data available in 3 varying scale levels (1:10m, 1:50m, 1:100m).

  2. Metro Extracts: OpenStreetMap extracts of urban areas converted to shapefile and other formats, updated weekly.

  3. OpenStreetMapData.com: OSM Land, Water, Coastline data, typically not included with the above metro extracts.

  4. Open Data NYC All sorts of goodies like 311 data, public school locations, and city administrative boundaries like community boards. Much of the data here is already geocoded but some is not (see the note at the bottom).

  5. US National Weather Service (NOAA)

  6. U.S. Census: Demographic data for the United States that can be joined to various types of census geographies.

General Note / Tip:
Any dataset that has a spatial attribute, such as street addresses, county names, state / province names, country names, zipcodes, IP addresses, etc., but that doesn't have a geometry data type, generally can be geocoded, georeferenced, or joined to existing geospatial data. Typically the preferred format to work with this type of data is tabular such as CSV (comma separated value) but CartoDB also allows for importing Microsoft Excel tables. Geocoding, Georeferencing, and joining data are subjects for other tutorials but I encourage you to explore them as they are key concepts when mapping and analyzing geospatial data.

If you do want to work with your own table data, make sure your data's first row contains column names and that these names don't contain numbers or special characters. CartoDB will do a good job sanitizing column names on its own (eg removing spaces) but it's better to be on the safe side and do this ahead of time.

Happy Mapping!