<a href="https://colab.research.google.com/github/MiaVT/first_github/blob/master/1_MakeABasicMap_MrinaliniV.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting Started

First things first, if you haven't already done so: you need your own copy of this notebook. Once in Colab, go to "File" and choose "Save a copy in GitHub" (grant access if needed), then put the copy into the repository you made for this course.

Now you have your own copy of the notebook. Click 'open in colab' to get started working on the practical exercise. 

* **Reminders**:    
 * [GitHub](https://github.com/) is widely known as one of the most popular code hosting platforms for version control and collaboration. This is where these notebooks will be saved and accessed publicly.
 * [Jupyter notebooks](https://jupyter.org) are web applications that you will be using in the practical classes. They ease the creation and sharing of documents containing live code. You can explore a [gallery](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks) of interesting Jupyter notebooks and read why Jupyter is data scientists’ computational notebook of choice ([i](https://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261) and [ii](https://www.nature.com/articles/d41586-018-07196-1)).
 * [Colab](https://colab.research.google.com) is a platform provided by Google for working with large datasets and intensive computations. It was designed for testing machine learning models, that is, artificial intelligence (AI) systems able to "learn" and improve from experience without being explicitly programmed, and it can host Jupyter notebooks. Colab was chosen because it is a stable platform for our practical classes. However, it is not the only way to interact with your notebooks. For example, [Binder](https://mybinder.org/) is another live environment you can use to run the code in your notebooks.
 


---


# What's a notebook and how does it work?
The layout of these notebooks is simple. It is composed of two types of cells: 

* Text cells, which are explanatory and which I will be using - a lot - to describe things to you. These cells can be easily edited in a language called [Markdown](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html).

* Code cells, which we will use to do things. They contain executable code written in a programming [language](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Running%20Code.html#Code-cells-allow-you-to-enter-and-run-code). To make things happen in a code cell, click on the cell and then press 'ctrl + enter', or push the play button that appears on its left. Code cells commonly use a language called [Python](https://www.python.org/) or [R](https://www.r-project.org/). In this course we'll mostly be using Python. As you know, stereotypically archaeologists hate snakes. Python is the exception to this.

You can add text or code cells to your version of the notebook (how? place the cursor at the bottom of a cell, and 2 tabs appear - code cell or text cell, choose one and you are ready to go!). 



---

 



# What we are doing
In this practical exercise, you will make some basic maps. You're making them interactively in this notebook. To get started, we will get the tools we need. These tools are open source Python libraries that enable powerful data visualisation. Importing them is in fact the first bit of Python code you are using today!

We will always start a practical exercise in a notebook like this. In this lab, we are using the pandas, Folium and Branca libraries.

In [1]:
#codecell_makeabasicmap_importyourlibraries
import folium
import branca
import pandas as pd
print(folium.__file__)
print(folium.__version__)

/usr/local/lib/python3.6/dist-packages/folium/__init__.py
0.8.3


# To make a map we need to get some data. 
This data must be explicitly spatial - that is, it must have spatial coordinates that tell us about locations in a way that machines can understand. Spatial coordinates are pairs of numbers organised on an x (horizontal) and y (vertical) axis – this is the bit of geometry you will be using the most as an archaeologist!

Coordinates (x,y) can also be called eastings & northings or longitude & latitude, depending on the type of projection used to transform the map (refer to your First Course Meeting notes). The most common coordinate format for maps on the internet is [latitude and longitude coordinates](https://www.latlong.net/).
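A small sketch of what the x/y naming means in practice: longitude is the x value and latitude is the y value, but folium expects a location written as [latitude, longitude], that is (y, x). The numbers are the coordinates of Padiglione from the Palmisano et al. table used in this notebook; the variable names are made up for illustration.

```python
# Coordinates of Padiglione, copied from the Palmisano et al. table
longitude = 12.639324  # x: position east-west
latitude = 41.515316   # y: position north-south

xy = (longitude, latitude)   # the (x, y) order used in geometry
location = [xy[1], xy[0]]    # the [latitude, longitude] order folium expects

print(location)
```

If a map ever shows your Italian sites somewhere unexpected, swapped latitude and longitude is one of the first things worth checking.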



---



![Xs&Ys](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/x&y.png?raw=1) 

**Example of a cylindrical projection of the world**



---




In this notebook, we'll experiment with the spatial data from:

 Palmisano, A., Bevan, A. and Shennan, S., 2018. Regional Demographic Trends and Settlement Patterns in Central Italy: Archaeological Sites and Radiocarbon Dates. **Journal of Open Archaeology Data**, 6(1), p.2. DOI: [http://doi.org/10.5334/joad.43](http://doi.org/10.5334/joad.43)
 
(Which you should have read about before class!)
 

---


These nice people provided the data behind their analysis so we can re-use it. I converted their old school shapefile to a CSV for you to make things easier. If you ever have to do this yourself, there are lots of online converters like [this one](https://mygeodata.cloud/converter/shp-to-csv). 

Later in the course, you will also learn to read data from shapefiles directly into your notebook. This is a little more complicated, so we are skipping it for now.

* **Reminders**: 
 * A shapefile is a vector data storage file format that can be used in spatial software such as [QGIS](https://www.qgis.org/en/site/) (refer to your First Course Meeting notes). 
 * A CSV (comma-separated values) file is a simple tabular text format in which each line is a row and commas separate the values within it.
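To make the CSV idea concrete, here is a minimal sketch that reads CSV text typed directly into the cell instead of a file on the web. The three rows are copied from the Palmisano et al. table with most columns left out; `csv_text` and `sample_sites` are names invented for this illustration.

```python
import io
import pandas as pd

# Three sites written out as CSV text: one header line, then one line per row
csv_text = """Toponyms,Period,Longitude,Latitude,SizeHa
Padiglione,Iron Age,12.639324,41.515316,0.2
Astura,Iron Age,12.769313,41.41772,34.0
Torre Astura,Republican Period,12.765279,41.409668,4.0
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
sample_sites = pd.read_csv(io.StringIO(csv_text))

print(sample_sites.shape)   # (number of rows, number of columns)
print(sample_sites.head())
```

The commas in the text become the column boundaries of the table, and the first line becomes the column names.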


In [2]:
#codecell_makeabasicmap_GetUrdataReady

#Get the data by reading it into the notebook
palmisano_tuscany_sites = pd.read_csv("https://raw.githubusercontent.com/ropitz/spatialarchaeology/master/data/site_centriods_tuscany.txt")

palmisano_tuscany_sites.head()

Unnamed: 0,OBJECTID,Id,Toponyms,Type,Period,StartDate,EndDate,Longitude,Latitude,LocQual,SizeHa,SizeQual,Source,Source_id,ORIG_FID
0,1,1,Padiglione,settlement,Iron Age,-800,-700,12.639324,41.515316,A,0.2,E,"Attema et al. 2010, p. 229",15114,0
1,2,1,Padiglione,settlement,Republican Period,-100,-30,12.636817,41.516111,A,0.2,E,"Attema et al. 2010, p. 229",15114,1
2,3,1,Padiglione,settlement,Imperial Period,-30,400,12.636793,41.516111,A,0.2,E,"Attema et al. 2010, p. 229",15114,2
3,4,1,Padiglione,settlement,Late Antique/Early Medieval,400,500,12.636817,41.516111,A,0.2,E,"Attema et al. 2010, p. 229",15114,3
4,5,2,Astura,settlement,Iron Age,-700,-600,12.769313,41.41772,A,34.0,D,"Attema et al. 2010, p. 186; Picarreta 1977, p.21",11201,4


# Learning a new language – decomposing the code

In #codecell_makeabasicmap_GetUrdataReady, we have a simple piece of code which allows you to open the CSV file containing the archaeological sites information. The image below will help you to understand the code better and reuse it easily.

![makeabasicmap_GetUrdataReady](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/makeabasicmap_GetUrdataReady_2.jpg?raw=1)






# Maps should answer questions

We've discussed how to come up with a good, explicitly spatial question. We've also discussed how to design a good map. Here you're going to start putting all this into practice.

The data in the Palmisano et al. (2018) article provide information about settlement and population patterns in central Italy and how they change over time. Where people were living and working at various times in the past is a basic archaeological question.

Let's say your question is about how many Iron Age sites are present in the region, and what their distribution is like in space - that is, where they are located and how many, relatively speaking, are in each area. How would you go about answering this question with a map?

# Start by filtering your big dataset to get only the data you need

The original data file contains over 10970 entries covering archaeological settlements in Central Italy. So you need to filter or SUBSELECT your big data file to get just the sites from the period you are interested in.


In [7]:
#codecell_makeabasicmap_SubSelectData

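#tell the notebook you only want to see rows where the Period is "Republican Period".
#to answer the Iron Age question instead, change "Republican Period" to "Iron Age" in the line below.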
palmisano_tuscany_sites_rep_per = palmisano_tuscany_sites[(palmisano_tuscany_sites['Period']=="Republican Period")]

palmisano_tuscany_sites_rep_per.head()

Unnamed: 0,OBJECTID,Id,Toponyms,Type,Period,StartDate,EndDate,Longitude,Latitude,LocQual,SizeHa,SizeQual,Source,Source_id,ORIG_FID
1,2,1,Padiglione,settlement,Republican Period,-100,-30,12.636817,41.516111,A,0.2,E,"Attema et al. 2010, p. 229",15114,1
7,8,2,Astura,settlement,Republican Period,-250,-30,12.769313,41.41772,A,34.0,D,"Attema et al. 2010, p. 186; Picarreta 1977, p.21",11201,7
13,14,3,Torre Astura,settlement,Republican Period,-350,-30,12.765279,41.409668,A,4.0,D,"Attema et al. 2010, p. 186",11202,13
19,20,5,Pineta di Torre Astura,settlement,Republican Period,-250,-30,12.752829,41.418009,A,0.5,F,"Attema et al. 2010, p. 187",11204,19
21,22,6,La Banca,settlement,Republican Period,-100,-30,12.749879,41.418408,A,0.5,F,"Attema et al. 2010, p. 187",11205,21


# Learning a new language – decomposing the code

In #codecell_makeabasicmap_SubSelectData, we have a simple piece of code which filters the table so that only the rows for one period are kept. The image below decomposes this codeline, so you can understand how the Python language works and re-use it easily.

![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/makeabasicmap_SubSelectData.jpg?raw=1)
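The filtering line does two things at once, and it can help to see them separately. The sketch below uses a tiny stand-in table (four rows with values copied from the output above) rather than the full dataset; `sites`, `is_republican` and `republican_sites` are names invented for this illustration.

```python
import pandas as pd

# A tiny stand-in for palmisano_tuscany_sites
sites = pd.DataFrame({
    "Toponyms": ["Padiglione", "Padiglione", "Astura", "Torre Astura"],
    "Period": ["Iron Age", "Republican Period", "Iron Age", "Republican Period"],
})

# Step 1: the == comparison gives one True/False answer per row
is_republican = sites["Period"] == "Republican Period"
print(is_republican.tolist())

# Step 2: putting that True/False list inside [] keeps only the rows marked True
republican_sites = sites[is_republican]
print(republican_sites)
```

The single line in the notebook simply writes step 1 directly inside the square brackets of step 2.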

# Put the data onto a map
Now we want to add this data to a simple map. 

You need to start by getting the coordinates of all the points so you can center the map - that is focus on the area where your data is. Probably it will be a good idea to put the middle of your map roughly where the middle of your dataset is located. 

Think about it... if your data is in Italy, putting Antarctica as the center of your map is not going to be very effective.

![Antarctica is not where the map should be](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e0/Antarctica_6400px_from_Blue_Marble.jpg/310px-Antarctica_6400px_from_Blue_Marble.jpg)


Now, you want to add a marker for each site in your filtered dataset. A marker, recall, is a graphic icon that represents the coordinates where a site is centred.


In [4]:
#codecell_makeabasicmap_BringingUrData2theMap

#location is the mean of every lat and long point to centre the map.
location = palmisano_tuscany_sites_rep_per['Latitude'].mean(), palmisano_tuscany_sites_rep_per['Longitude'].mean()

#A basemap is then created using the location to centre on and the zoom level to start.
m = folium.Map(location=location,zoom_start=10)

#Each location in the DataFrame is then added to the basemap as a marker
for i in range(0,len(palmisano_tuscany_sites_rep_per)):
    folium.Marker([palmisano_tuscany_sites_rep_per['Latitude'].iloc[i],palmisano_tuscany_sites_rep_per['Longitude'].iloc[i]]).add_to(m)
        
#To display the map simply ask for the object (called 'm') representation
m

# Learning a new language – decomposing the code

The #codecell_makeabasicmap_BringingUrData2theMap contains four codelines, which:

* centre your map by calculating the mean of the latitude coordinates and the mean of the longitude coordinates;

* zoom your map to a suitable level - e.g. world, country, region or street level;

* place a marker for each archaeological site location;

* show the results interactively.


Below, the first three codelines of the script are decomposed for you, so you can adapt them easily in the future.

![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/makeabasicmap_BringingUrData2theMap_codeline1.jpg?raw=1)

**Reminder**: the mean, or average, of a set of numbers is the sum of all the numbers divided by how many numbers there are. So, for instance, the mean of the latitude coordinates = the sum of all latitude coordinates divided by the number of latitude coordinates in the dataframe.
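As a worked example of that reminder, the sketch below calculates the mean latitude of three Republican Period sites by hand. The three values are copied from the table above; with the full dataframe, `.mean()` does the same arithmetic over every row.

```python
# Latitudes of Padiglione, Astura and Torre Astura (Republican Period rows)
latitudes = [41.516111, 41.41772, 41.409668]

# mean = sum of the values / number of values
mean_latitude = sum(latitudes) / len(latitudes)

print(round(mean_latitude, 6))
```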


---



![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/makeabasicmap_BringingUrData2theMap_codeline2.jpg?raw=1)

This codeline shows the powerful capabilities of the Folium library you've imported at the start of this practical lab. You can find more detailed guidance about it [here](https://python-visualization.github.io/folium/).


---



![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/makeabasicmap_BringingUrData2theMap_codeline3.jpg?raw=1)

This codeline introduces you to two important concepts in coding: the [range](https://pynative.com/python-range-function/) and the loop, used here together with the iloc indexer. 
By combining range(), len() and iloc[], you can write a loop in Python that visits each row of your filtered pandas dataframe (palmisano_tuscany_sites_rep_per) in turn.
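To see what the loop is doing without drawing a map, the sketch below runs the same range(), len() and iloc[] pattern on a three-row stand-in table and prints each step. The table values are copied from the output above; `sites` and `visited` are names invented for this illustration.

```python
import pandas as pd

# A tiny stand-in for the filtered dataframe
sites = pd.DataFrame({
    "Toponyms": ["Padiglione", "Astura", "Torre Astura"],
    "Latitude": [41.516111, 41.41772, 41.409668],
    "Longitude": [12.636817, 12.769313, 12.765279],
})

visited = []
# len(sites) is 3, so range(0, 3) produces the row positions 0, 1 and 2
for i in range(0, len(sites)):
    # .iloc[i] picks the value at row position i
    lat = float(sites["Latitude"].iloc[i])
    lon = float(sites["Longitude"].iloc[i])
    visited.append([lat, lon])
    print(i, sites["Toponyms"].iloc[i], lat, lon)
```

In the notebook's map cell, the body of the loop creates a folium.Marker from the same two values instead of printing them.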

---




# Yay! 

You have a map. Now, a design question: what is a good starting zoom level?

Do you want your map zoomed further out or further in given the extent of your data? 

In the code cell above, play around with the '**zoom_start**' parameter and find a good zoom level that makes you happy.
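One way to think about a sensible zoom level is to work out the extent of your data first. The sketch below calculates the bounding box (the south-west and north-east corners) of three sites copied from the table above. This is an optional aside rather than part of the exercise, and `points` and `bounds` are names invented for the illustration.

```python
# Three sites as [latitude, longitude]
points = [
    [41.516111, 12.636817],
    [41.41772, 12.769313],
    [41.409668, 12.765279],
]

lats = [p[0] for p in points]
lons = [p[1] for p in points]

# South-west and north-east corners of the box that contains every point
bounds = [[min(lats), min(lons)], [max(lats), max(lons)]]
print(bounds)

# With folium imported and a map called m, m.fit_bounds(bounds) zooms the map to this box
```

A box a few kilometres across suggests a close-in zoom, while a box spanning a whole region suggests zooming further out.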


---

# You just modified the code to suit your own needs

Rather than just executing code by pushing play, you've edited the code by changing the value of a parameter. 

Easy, right?

This is where many people get started with scripting in archaeology. You find a bit of [open source code](https://opensource.com/resources/what-open-source) that seems to do what you need, and then you modify it. There's no sense in writing code from scratch when there's lots of useful open source code available that is meant to be shared and modified.



# Representing information and relative values or quantities with graphics

Now let's look at how to represent larger and smaller sites. The archaeologists in this project have a qualitative ranking of site size from large (A) to small (F). 

You can see the categories they've used, and that there are more small sites than larger ones. 

Say you want to make the two categories of sites that are the largest different colours. Why do this? Maybe larger sites have more people, so this provides insights into demographics.



In [8]:
#codecell_makeabasicmap_ManipulatingyourData_ValueCounting

# Size surely matters... group the sites by size category and get the number of sites in each size category.
# SizeQual is the name of the 'attribute', or column in the table, that has information on site size. 
# An attribute is anything attached to spatial data other than the coordinates, such as period, type, size in hectare,etc. 

palmisano_tuscany_sites_rep_per['SizeQual'].value_counts()

D    947
B    738
E    559
C    289
F    224
Name: SizeQual, dtype: int64




---


**Learning a new language – decomposing the code**:

**.value_counts()** is a pandas method that counts how many times each distinct value appears in a column, whether the values are numbers or categories (such as the labels in an attribute column). The results are returned as a Series of counts sorted in descending order, so that you can further evaluate the frequency and distribution of these values.
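A minimal sketch of the same method on made-up data, so the result is easy to check by eye. The seven size categories below are hypothetical and are not the real counts. The `normalize=True` option, which is not used in the notebook, returns shares of the total instead of raw counts.

```python
import pandas as pd

# Hypothetical size categories for seven sites
size_qual = pd.Series(["D", "B", "D", "E", "D", "B", "C"], name="SizeQual")

counts = size_qual.value_counts()                 # most frequent category first
shares = size_qual.value_counts(normalize=True)   # the same, as a fraction of all sites

print(counts)
print(shares.round(2))
```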

---





In the step above, #codecell_makeabasicmap_SubSelectData, you've set up your new dataset by filtering the original dataset down to the settlements from a single period. It's what we would call a 'subset' of your original data.

Now you are ready to make a new map.

In [11]:
#codecell_makeabasicmap_ManipulatingyourData_UsingSymbology

#now make a map just like you did before. Note that this time we're adding a scale bar with 'control_scale'
location = palmisano_tuscany_sites_rep_per['Latitude'].mean(), palmisano_tuscany_sites_rep_per['Longitude'].mean()
m = folium.Map(location=location,zoom_start=10,control_scale = True)

#Assign different colours to the two largest site categories present - B and C in this case
for i in range(0,len(palmisano_tuscany_sites_rep_per)):
    site_size = palmisano_tuscany_sites_rep_per['SizeQual'].iloc[i]
    if site_size == 'B':
        color = 'blue'
    elif site_size == 'C':
        color = 'green'
    else:
        color = 'red'

    # add the markers to the map, using the locations and colours
    folium.Marker([palmisano_tuscany_sites_rep_per['Latitude'].iloc[i],palmisano_tuscany_sites_rep_per['Longitude'].iloc[i]],icon=folium.Icon(color=color)).add_to(m)

#type 'm' for map (the variable you set above) to tell the notebook to display your map
m

# Learning a new language – decomposing the code

The code in #codecell_makeabasicmap_ManipulatingyourData_UsingSymbology can be decomposed like this:

![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/makeabasicmap_ManipulatingyourData_UsingSymbology_.jpg?raw=1) 





---


* Note: You can easily change the icon style; more icons are available in this [list](https://fontawesome.com/icons?d=gallery). 


# Designing your map - Symbology

Now, go back into #codecell_makeabasicmap_ManipulatingyourData_UsingSymbology and experiment a bit. Try out some different colours. Do certain combinations work well? Try adding different colours for each of the smaller categories of sites. Does that make the map clearer or more confusing?
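If you start adding a colour for every category, the if/elif/else block grows quickly. One alternative, sketched here as a suggestion rather than as the notebook's method, is to keep the scheme in a dictionary and look each category up. The colour choices and the names `size_colours` and `colour_for` are hypothetical.

```python
# Hypothetical colour scheme: one entry per size category you want to highlight
size_colours = {"B": "blue", "C": "green", "D": "orange"}

def colour_for(site_size, default="red"):
    """Return the colour for a size category, or the default if it is not listed."""
    return size_colours.get(site_size, default)

for category in ["B", "C", "D", "E", "F"]:
    print(category, colour_for(category))
```

Inside the marker loop, `color = colour_for(site_size)` would then replace the whole if/elif/else block, and changing the scheme only means editing the dictionary.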

## What symbol scheme is best for your map?

Maybe it makes more sense to show size by changing the size of the icon than the colour. Let's make another map that varies the size of the icon for each site based on its size in hectares. 

In [12]:
#codecell_makeabasicmap_ManipulatingyourData_UsingSymbologyExperimenting

#now make a map just like you did before. 
location = palmisano_tuscany_sites_rep_per['Latitude'].mean(), palmisano_tuscany_sites_rep_per['Longitude'].mean()
m = folium.Map(location=location,zoom_start=8,control_scale = True)


# Set the size of each circle by defining its 'radius' (folium.Circle measures the radius in metres on the ground)
# Here we are multiplying the size in hectares (the SizeHa attribute) by 15. Try different values here to get icons a size you like
for i in range(0,len(palmisano_tuscany_sites_rep_per)):
   folium.Circle(
      location=[palmisano_tuscany_sites_rep_per['Latitude'].iloc[i],palmisano_tuscany_sites_rep_per['Longitude'].iloc[i]],
      popup=palmisano_tuscany_sites_rep_per.iloc[i]['Toponyms'],
      radius=palmisano_tuscany_sites_rep_per.iloc[i]['SizeHa']*15, #this is where we set the value of 15 - change this number to get different size icons
      color='crimson',
      fill=True,
      fill_color='crimson'
   ).add_to(m)
    
#type 'm' for map (the variable you set above) to tell the notebook to display your map
m

# Learning a new language – decomposing the code

The code in #codecell_makeabasicmap_ManipulatingyourData_UsingSymbologyExperimenting can be decomposed like this:

![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/makeabasicmap_ManipulatingyourData_UsingSymbologyExperimenting.jpg?raw=1)
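Multiplying hectares by 15 is a quick way to get visible circles, but the circle's area then grows with the square of the site size. As an alternative to try, and not the method used in this notebook, the sketch below calculates the radius of a circle whose area on the ground equals the recorded site size. The function name `radius_from_hectares` is invented for this illustration.

```python
import math

def radius_from_hectares(size_ha):
    """Radius in metres of a circle whose area equals size_ha hectares."""
    area_m2 = size_ha * 10_000            # 1 hectare = 10,000 square metres
    return math.sqrt(area_m2 / math.pi)   # area = pi * r**2, solved for r

# The three site sizes that appear in the table above
for size_ha in [0.2, 4.0, 34.0]:
    print(size_ha, round(radius_from_hectares(size_ha), 1))
```

Circles drawn this way can be very small at regional zoom levels, so a scaling factor may still be needed for legibility. Which choice reads better is a design decision.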

# Thinking about spatial density, 'clustering' or concentration

**Research question:** Now what if you want to make a map that shows the concentration of sites in the Iron Age, that is, areas with more and fewer sites? 


This kind of map is often called a 'heatmap'. Essentially it shows areas with more sites as 'hotter' (generally red in colour) and areas with fewer sites as cooler. In this kind of map, larger sites should arguably count more - we call this 'weighting'. The general idea is that a site that is twice as large should count for twice as much in our heatmap.

Weighting can influence the results of spatial analyses of our data. For example, consider the distribution of points in the image below. If we allow all the sites to count equally and then calculate their mean coordinates (see the explanation above about calculating the mean), the center is 'unweighted' and will end up where the red circle in the top image is. On the other hand, if we set up weights so that the size of the site corresponds to its influence on our calculated mean coordinates, we end up with the center of our distribution where the red circle in the bottom image is. 


![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/SpatialAnalysis_Weighted.jpg?raw=1)

Adapted from the [ArcGIS Pro documentation on using weights](https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/using-weights.htm)

*When might this be important? What kinds of archaeological questions involve finding the spatial centre of a distribution of sites or finds or features?*
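The difference between the unweighted and weighted centre can be worked through with three sites from the table above, using size in hectares as the weight. This is a minimal sketch of the arithmetic only; the variable names are invented for the illustration.

```python
# Three sites: (latitude, longitude, size in hectares)
sites = [
    (41.516111, 12.636817, 0.2),   # Padiglione
    (41.41772, 12.769313, 34.0),   # Astura
    (41.409668, 12.765279, 4.0),   # Torre Astura
]

n = len(sites)
total_weight = sum(size for _, _, size in sites)

# Unweighted: every site counts once
unweighted = (sum(lat for lat, _, _ in sites) / n,
              sum(lon for _, lon, _ in sites) / n)

# Weighted: each coordinate is multiplied by the site's size before summing,
# and the total is divided by the sum of the sizes
weighted = (sum(lat * size for lat, _, size in sites) / total_weight,
            sum(lon * size for _, lon, size in sites) / total_weight)

print("unweighted centre:", unweighted)
print("weighted centre:  ", weighted)
```

Because Astura is much larger than the other two sites, the weighted centre is pulled towards it.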

In [0]:
#codecell_makeabasicmap_ManipulatingyourData_Heatmap

#first make a list of the coordinates of each site and its size in hectares, which we will use for the size-based weighting
data_heat = palmisano_tuscany_sites_rep_per[['Latitude','Longitude','SizeHa']].values.tolist()

In [15]:
#look at the first line of your list to check it seems to have worked
data_heat[0]

[41.51611051754, 12.63681665573, 0.2]

In [0]:
#to make the heatmap, we need to get an extra tool, so...
import folium.plugins as plugins



In [17]:
# now make a map a bit like you did before, set your zoom level and your centre location. Then use the plugin to make the heatmap. 

m = folium.Map(location=location, zoom_start=10)

#tiles='stamentoner'

plugins.HeatMap(data_heat).add_to(m)

#type 'm' for map (the variable you set above) to tell the notebook to display your map
m

# Learning a new language – decomposing the code

The code in #codecell_makeabasicmap_ManipulatingyourData_Heatmap can be decomposed like this:


![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/makeabasicmap_ManipulatingyourData_Heatmap.jpg?raw=1)


---



For further documentation on folium plugins, follow this [link](https://python-visualization.github.io/folium/plugins.html).
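The third number in each row of data_heat is the weight given to that site. Whether the real dataset has gaps in SizeHa is not checked in this notebook, so the sketch below is a precaution rather than a required step: it drops any row with a missing latitude, longitude or size before building the list. The table is hypothetical, and `data_heat_clean` is a name invented for this illustration.

```python
import pandas as pd

# Hypothetical rows: the second site has no recorded size
sites = pd.DataFrame({
    "Latitude": [41.516111, 41.41772, 41.409668],
    "Longitude": [12.636817, 12.769313, 12.765279],
    "SizeHa": [0.2, None, 4.0],
})

columns = ["Latitude", "Longitude", "SizeHa"]

# Keep only the rows where all three values are present, then build the list
data_heat_clean = sites.dropna(subset=columns)[columns].values.tolist()
print(data_heat_clean)
```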

# Success!

You should have a heatmap showing the concentrations of sites in the region for the period you selected.

You could repeat the exercise with sites from any period in which you are interested.

Hopefully you've learned to make some basic maps and are starting to understand how to put into practice some of the theory of map design and spatial visualisation we've been discussing.

That's it for today... Remember to save your notebook (under a new name) so you can come back to it and practice making basic static maps. 



# LexiCode
In this practical you have learned new commands that you can now reuse for your own datasets:
*	==  , () []
*	.read_csv()
*	.head()
*	mean()
*	folium.Map
*	range()
*	len()
*	iloc[]
*	.value_counts()
*	if ... == ...:
*	elif ... == ...:
*	else:
*	folium.Marker()
*	folium.Icon()
*	folium.Circle
*	popup=
*	radius=
*	.values.tolist()
*	plugins.HeatMap()
