## Notebook for plotting values and locations of interest on an interactive map

The queries that can be answered using an adapted version of the following code include but are not limited to:
* Plot as dots on a spatial map all *'Site'* locations which have *'Provenance'* == **Africa** and are dated between **150-200**;
* Show the summed *'Frequency'* for this period as the size/colour of the dot;
* Determine for each *'Year'* the *'Sites'* on which there is evidence of an *'Amphora type'* with a certain *'Provenance'*;
* Scale dot size/colour with the count of amphora types at a *'Site'*.

### 1. Import packages
**Note**: Remember to always import the functions from the `functions.py` file.

If any packages listed in `requirements.txt` are **absent**, they can be installed with the `!pip` command. This needs to be done only once. An example installation cell is given below; its lines are commented out with the `#` symbol.

Run the cell where the packages are imported. If an error such as `No module named ...` occurs:

1.  delete the `#` before the corresponding package name in the installation cell;
2.  run that cell, then run the import cell again.

In [None]:
# !pip install pandas
# !pip install seaborn
# !pip install matplotlib
# !pip install numpy
# !pip install regex
# !pip install geopandas
# !pip install kaleido
# !pip install plotly
# !pip install pyproj
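
As an alternative to waiting for an import error, the missing packages can be listed up front. The following is a minimal sketch, assuming the import names match the package names in the installation cell above; the helper `find_missing` is a hypothetical name, not part of `functions.py`.

```python
import importlib.util

# Import names used in this notebook (taken from the installation cell above).
required = ["pandas", "seaborn", "matplotlib", "numpy", "regex",
            "geopandas", "kaleido", "plotly", "pyproj"]

def find_missing(packages):
    """Return the packages that cannot be located in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(required)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

Only the packages reported as missing then need their `# !pip install ...` line uncommented.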

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
import regex as re
import geopandas
import kaleido
import plotly
import plotly.express as px
import plotly.io as pio
import pyproj
sns.set()
import sys
sys.path.append("../src")
from functions import preprocess, freq_per_year, propor_to_map_range   # module with all functions used for the task

### 2. Load data into pandas dataframe
The optional `usecols` argument specifies which columns of the CSV file to load

In [2]:
df = pd.read_csv('SonataDataNewNew.csv', usecols = [
                                             'Amphora_type', 
                                             'Amphora_type_upper_date', 
                                             'Amphora_type_lower_date', 
                                             'Site', 
                                             'Provenance', 
                                             'Frequency', 
                                             'Longitude', 
                                             'Latitude'
                                             ])
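
The effect of `usecols` can be tried out on a small in-memory CSV. This is an illustrative sketch only: the CSV text and the `Notes` column are made up and do not come from `SonataDataNewNew.csv`.

```python
import io
import pandas as pd

# A tiny made-up CSV standing in for the real file (values are illustrative only).
csv_text = """Site,Amphora_type,Provenance,Frequency,Notes
site a,Type 1,africa,3,ignored
site b,Type 2,baetica,7,ignored
"""

# The order of names in usecols does not matter; columns come back in file order.
wanted = ['Frequency', 'Site', 'Provenance']
sample = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

print(list(sample.columns))
```

This is why the column order reported by `df.info()` follows the CSV file rather than the order of the list passed to `usecols`.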

### 3. Prepare data

#### 3.1 Check in which columns numeric values are of an object type
#### 3.2 If found, convert objects into numeric values (float) 
This is essential for performing math operations with these variables

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1707 entries, 0 to 1706
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Site                     1707 non-null   object 
 1   Amphora_type             1707 non-null   object 
 2   Provenance               1703 non-null   object 
 3   Frequency                1705 non-null   float64
 4   Amphora_type_lower_date  1701 non-null   object 
 5   Amphora_type_upper_date  1696 non-null   object 
 6   Latitude                 1707 non-null   float64
 7   Longitude                1707 non-null   object 
dtypes: float64(2), object(6)
memory usage: 106.8+ KB


In [3]:
# Invalid parsing will be set as NaN
df['Amphora_type_upper_date'] = pd.to_numeric(df['Amphora_type_upper_date'], errors='coerce') 
df['Amphora_type_lower_date'] = pd.to_numeric(df['Amphora_type_lower_date'], errors='coerce') 
df['Longitude'] = pd.to_numeric(df['Longitude'], errors='coerce')
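
To see what `errors='coerce'` does, and to find out which entries were turned into `NaN`, a small sketch on made-up values may help (the series `raw` below is illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up column mixing numeric strings, a non-numeric string and a missing value
raw = pd.Series(['150', '200', 'unknown', None, '12.5'])
converted = pd.to_numeric(raw, errors='coerce')

print(converted.dtype)          # numeric dtype after conversion
print(converted.isna().sum())   # number of values that are NaN after conversion

# Entries that were present in the original but could not be parsed as numbers
failed = raw[converted.isna() & raw.notna()]
print(failed.tolist())
```

Inspecting the failed entries before moving on shows whether the coerced values were genuine gaps or typos that could be corrected in the source file.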

#### 3.3 Clean text data (remove punctuation and double spaces) and lowercase it
This avoids inconsistent spellings of the same name, which would otherwise cause errors when counting

In [4]:
df['Site'] = preprocess(df['Site'])
df['Provenance'] = preprocess(df['Provenance'])
df.head()

Unnamed: 0,Site,Amphora_type,Provenance,Frequency,Amphora_type_lower_date,Amphora_type_upper_date,Latitude,Longitude
0,acqui terme,Dr 6A,adriatic italy,3.0,-25.0,50.0,44.675532,8.470658
1,acqui terme,Dr 6B,adriatic italy,7.0,1.0,150.0,44.675532,8.470658
2,acqui terme,Dr 7-13,baetica,26.0,-30.0,150.0,44.675532,8.470658
3,acqui terme,Haltern 70,baetica,1.0,-80.0,192.0,44.675532,8.470658
4,acqui terme,Dr 2-4_5,tyrrhenian italy,4.0,-70.0,225.0,44.675532,8.470658
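
The implementation of `preprocess` lives in `functions.py` and is not shown in this notebook. The sketch below is a hypothetical stand-in (named `clean_text` to avoid confusion) that illustrates the kind of cleaning described above; the actual function may differ.

```python
import re
import pandas as pd

def clean_text(series: pd.Series) -> pd.Series:
    """Hypothetical stand-in for `preprocess`: lowercase, drop punctuation, collapse spaces."""
    def _clean(value):
        if pd.isna(value):
            return value                           # leave missing values untouched
        text = str(value).lower()
        text = re.sub(r'[^\w\s]', ' ', text)       # replace punctuation with a space
        text = re.sub(r'\s+', ' ', text).strip()   # collapse repeated whitespace
        return text
    return series.map(_clean)

# Three spellings of the same site plus a missing value (made-up example)
names = pd.Series(['Acqui Terme', 'acqui  terme.', ' ACQUI TERME ', None])
print(clean_text(names).nunique())
```

After cleaning, the three spellings collapse into a single name, so they are counted as one site.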


### 4. Create the dataframe which includes *'Sites'* only for a certain *'Provenance'*

In [5]:
Africa = df[df['Provenance'] == 'africa']
Africa.head()

Unnamed: 0,Site,Amphora_type,Provenance,Frequency,Amphora_type_lower_date,Amphora_type_upper_date,Latitude,Longitude
21,alba pompeia,AfricanaI,africa,12.0,150.0,400.0,44.700835,8.035244
68,altinum,Africana_3,africa,2.0,200.0,400.0,45.956108,18.683919
69,altinum,spatheia,africa,2.0,375.0,700.0,45.956108,18.683919
75,"aquae statiellae, corso cavour, albergo bue rosso",AfricanaIIIB,africa,1.0,275.0,400.0,44.675532,8.470658
99,aquileia,Dr 2-4_1,africa,2.0,-25.0,100.0,45.768945,13.367774


### 5. Calculate *'Frequency'* per *'Year'* per *'Amphora type'*. Add the resulting values to the dataframe

In [9]:
Africa = freq_per_year(data = Africa,
                       lower_date = 'Amphora_type_lower_date',
                       upper_date = 'Amphora_type_upper_date',
                       freq = 'Frequency')

Africa.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data[freq_per_year] = data[freq] / (data[upper_date] - data[lower_date])


Unnamed: 0,Site,Amphora_type,Provenance,Frequency,Amphora_type_lower_date,Amphora_type_upper_date,Latitude,Longitude,Freq_per_year
21,alba pompeia,AfricanaI,africa,12.0,150.0,400.0,44.700835,8.035244,0.048
68,altinum,Africana_3,africa,2.0,200.0,400.0,45.956108,18.683919,0.01
69,altinum,spatheia,africa,2.0,375.0,700.0,45.956108,18.683919,0.006154
75,aquae statiellae corso cavour albergo bue rosso,AfricanaIIIB,africa,1.0,275.0,400.0,44.675532,8.470658,0.008
99,aquileia,Dr 2-4_1,africa,2.0,-25.0,100.0,45.768945,13.367774,0.016
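
The warning text above shows the core calculation: `Frequency` divided by the length of the dating range. A minimal sketch of that calculation follows, assuming this is all the function does. The name `freq_per_year_sketch` is hypothetical, and two things are additions not confirmed by the source: working on a copy (which avoids the `SettingWithCopyWarning`) and the guard against non-positive date spans.

```python
import pandas as pd

def freq_per_year_sketch(data, lower_date, upper_date, freq):
    """Sketch only: spread a frequency evenly over the years of a dating range."""
    out = data.copy()                      # work on a copy, so no SettingWithCopyWarning
    span = out[upper_date] - out[lower_date]
    # Assumption: spans that are zero or negative give NaN instead of inf
    out['Freq_per_year'] = out[freq] / span.where(span > 0)
    return out

# Two rows with the same values as rows 21 and 68 in the output above
rows = pd.DataFrame({
    'Frequency': [12.0, 2.0],
    'Amphora_type_lower_date': [150.0, 200.0],
    'Amphora_type_upper_date': [400.0, 400.0],
})
result = freq_per_year_sketch(rows, 'Amphora_type_lower_date',
                              'Amphora_type_upper_date', 'Frequency')
print(result['Freq_per_year'].tolist())
```

For example, 12 finds dated to 150-400 cover 250 years, giving 12 / 250 = 0.048 finds per year, which matches the `Freq_per_year` value in the table.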


### 6. Calculate the proportion of '*Frequency*' to a given map period and add the results to the dataframe

In [10]:
Africa = propor_to_map_range(data = Africa, 
                             map_lower_date = 150, 
                             map_upper_date = 200,
                             object_lower_date = 'Amphora_type_lower_date',          
                             object_upper_date = 'Amphora_type_upper_date',
                             freq_per_year = 'Freq_per_year')

Africa.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data[proportion] = 0
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data[proportion].iloc[row] = data[freq_per_year].iloc[row] * date_range


Unnamed: 0,Site,Amphora_type,Provenance,Frequency,Amphora_type_lower_date,Amphora_type_upper_date,Latitude,Longitude,Freq_per_year,Proportion
21,alba pompeia,AfricanaI,africa,12.0,150.0,400.0,44.700835,8.035244,0.048,2.4
68,altinum,Africana_3,africa,2.0,200.0,400.0,45.956108,18.683919,0.01,0.0
69,altinum,spatheia,africa,2.0,375.0,700.0,45.956108,18.683919,0.006154,0.0
75,aquae statiellae corso cavour albergo bue rosso,AfricanaIIIB,africa,1.0,275.0,400.0,44.675532,8.470658,0.008,0.0
99,aquileia,Dr 2-4_1,africa,2.0,-25.0,100.0,45.768945,13.367774,0.016,0.0
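
The `Proportion` values above are consistent with multiplying `Freq_per_year` by the number of years in which the amphora's dating range overlaps the map range. The sketch below is a vectorised version under that assumption; `proportion_in_range` is a hypothetical name and is not the function from `functions.py`, which (judging by the warning text) works row by row.

```python
import pandas as pd

def proportion_in_range(data, map_lower, map_upper, lower_date, upper_date, freq_per_year):
    """Sketch only: frequency per year times the years of overlap with the map range."""
    out = data.copy()
    start = out[lower_date].clip(lower=map_lower)   # later of the two start dates
    end = out[upper_date].clip(upper=map_upper)     # earlier of the two end dates
    overlap = (end - start).clip(lower=0)           # no overlap gives 0 years
    out['Proportion'] = out[freq_per_year] * overlap
    return out

# Same values as rows 21, 68 and 99 in the output above
rows = pd.DataFrame({
    'Amphora_type_lower_date': [150.0, 200.0, -25.0],
    'Amphora_type_upper_date': [400.0, 400.0, 100.0],
    'Freq_per_year': [0.048, 0.01, 0.016],
})
result = proportion_in_range(rows, 150, 200,
                             'Amphora_type_lower_date',
                             'Amphora_type_upper_date',
                             'Freq_per_year')
print(result['Proportion'].tolist())
```

The first row overlaps the map range for 50 years (150 to 200), so its proportion is 0.048 × 50 = 2.4; the other two ranges share no years with 150-200 and get 0.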


### 7. Create the dataframe containing only *'Sites'* with *'Amphoras'* that fall within the map range (rows with proportion == 0 are excluded)

In [11]:
Africa = Africa[Africa['Proportion'] > 0]

Africa.head()

Unnamed: 0,Site,Amphora_type,Provenance,Frequency,Amphora_type_lower_date,Amphora_type_upper_date,Latitude,Longitude,Freq_per_year,Proportion
21,alba pompeia,AfricanaI,africa,12.0,150.0,400.0,44.700835,8.035244,0.048,2.4
106,aquileia,AfricanaI,africa,11.0,150.0,400.0,45.768945,13.367774,0.044,2.2
107,aquileia,AfricanaII,africa,2.0,150.0,400.0,45.768945,13.367774,0.008,0.4
108,aquileia,AfricanaIIA,africa,8.0,150.0,300.0,45.768945,13.367774,0.053333,2.666667
109,aquileia,AfricanaIIB,africa,1.0,150.0,300.0,45.768945,13.367774,0.006667,0.333333


### 8. Calculate summed amphora *'Frequency'* per *'Site'*
To that end, one needs to specify the variables on the basis of which the data will be grouped 

In the cell below, *'Proportion'*  values are grouped by *'Site'*, *'Latitude'* and *'Longitude'*

Then the grouped *'Proportion'* values are summed

In [12]:
summed_frequency = Africa.groupby(['Site', 'Latitude', 'Longitude'])['Proportion'].sum()        
summed_frequency = summed_frequency.reset_index()
summed_frequency = summed_frequency.rename(columns = {'Proportion':'Summed_freq'})
summed_frequency.head()

Unnamed: 0,Site,Latitude,Longitude,Summed_freq
0,alba pompeia,44.700835,8.035244,2.4
1,aquileia,45.768945,13.367774,6.575
2,basilica hilariana,41.88582,12.496624,8.266667
3,boccone del povero,41.891775,12.486137,1.794872
4,caerepyrgi,42.01518,11.963719,0.25641
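
The grouping and summing can be checked by hand on a toy table. The sketch below uses three made-up rows (coordinates and two of the proportions are borrowed from the outputs above) and chains the same steps into one expression:

```python
import pandas as pd

toy = pd.DataFrame({
    'Site': ['aquileia', 'aquileia', 'alba pompeia'],
    'Latitude': [45.768945, 45.768945, 44.700835],
    'Longitude': [13.367774, 13.367774, 8.035244],
    'Proportion': [2.2, 0.4, 2.4],
})

# as_index=False keeps the grouping keys as ordinary columns (no reset_index needed)
summed = (toy.groupby(['Site', 'Latitude', 'Longitude'], as_index=False)['Proportion']
             .sum()
             .rename(columns={'Proportion': 'Summed_freq'}))
print(summed)
```

Two caveats follow from grouping on the coordinates as well as the site name: by default `groupby` drops rows in which any grouping key is `NaN`, and a site recorded with two different coordinate pairs would appear as two separate rows.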


### 9. Count the number of unique *'Amphora types'* per *'Site'*
In the cell below, *'Amphora types'* are grouped by *'Site'* and then the number of unique *'Amphora types'* at each *'Site'* is calculated


In [13]:
amphora_type_count = Africa.groupby('Site')['Amphora_type'].nunique()
amphora_type_count = amphora_type_count.reset_index()
amphora_type_count = amphora_type_count.rename(columns={'Amphora_type': 'Amphora_type_count'})
amphora_type_count.head()

Unnamed: 0,Site,Amphora_type_count
0,alba pompeia,1
1,aquileia,7
2,basilica hilariana,5
3,boccone del povero,1
4,caerepyrgi,1
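
The difference between counting rows and counting unique types can be seen on a small made-up table, where one amphora type is listed twice for the same site:

```python
import pandas as pd

toy = pd.DataFrame({
    'Site': ['aquileia', 'aquileia', 'aquileia', 'alba pompeia'],
    'Amphora_type': ['AfricanaI', 'AfricanaI', 'AfricanaII', 'AfricanaI'],
})

# 'count' counts rows per site; 'nunique' counts distinct amphora types per site
per_site = toy.groupby('Site')['Amphora_type'].agg(rows='count', unique_types='nunique')
print(per_site)
```

Here the toy site 'aquileia' has three rows but only two distinct types, so `nunique` is the appropriate choice when duplicate entries should not inflate the count.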


### 10. Make a dataframe containing the data required for plotting, namely: 
 - *'Sites'* 
 - *'Sites'* coordinates
 - Summed frequencies
 - Unique amphora type count values

In [14]:
Africa_new = pd.merge(summed_frequency, amphora_type_count, on = 'Site')
Africa_new.head()

Unnamed: 0,Site,Latitude,Longitude,Summed_freq,Amphora_type_count
0,alba pompeia,44.700835,8.035244,2.4,1
1,aquileia,45.768945,13.367774,6.575,7
2,basilica hilariana,41.88582,12.496624,8.266667,5
3,boccone del povero,41.891775,12.486137,1.794872,1
4,caerepyrgi,42.01518,11.963719,0.25641,1
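
Since both tables are expected to hold exactly one row per site, the merge can be made to check this. The sketch below uses two rows copied from the outputs above and the `validate` argument of `pd.merge`; whether such a check is wanted in the notebook itself is a judgment call.

```python
import pandas as pd

summed = pd.DataFrame({'Site': ['alba pompeia', 'aquileia'],
                       'Summed_freq': [2.4, 6.575]})
counts = pd.DataFrame({'Site': ['alba pompeia', 'aquileia'],
                       'Amphora_type_count': [1, 7]})

# validate='one_to_one' raises a MergeError if 'Site' is duplicated in either table
merged = pd.merge(summed, counts, on='Site', how='inner', validate='one_to_one')

# An inner merge silently drops unmatched sites, so compare the row counts
assert len(merged) == len(summed)
print(merged)
```

A duplicated site in the summed table (for instance, one site stored under two coordinate pairs) would then stop the merge with an error instead of silently duplicating rows.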


### 11. Create the GeoDataFrame with geometry column (based on longitude and latitude coordinates) 

In [15]:
Africa_map = geopandas.GeoDataFrame(
    Africa_new, geometry = geopandas.points_from_xy(Africa_new.Longitude, Africa_new.Latitude))

Africa_map.head()

Unnamed: 0,Site,Latitude,Longitude,Summed_freq,Amphora_type_count,geometry
0,alba pompeia,44.700835,8.035244,2.4,1,POINT (8.03524 44.70083)
1,aquileia,45.768945,13.367774,6.575,7,POINT (13.36777 45.76895)
2,basilica hilariana,41.88582,12.496624,8.266667,5,POINT (12.49662 41.88582)
3,boccone del povero,41.891775,12.486137,1.794872,1,POINT (12.48614 41.89178)
4,caerepyrgi,42.01518,11.963719,0.25641,1,POINT (11.96372 42.01518)


### 12. Plot maps
**Note:** Remember to change the title of the plot

`color_continuous_scale = ['#1ed14b',  '#d63638']` gives a green-to-red colour scale; other CSS colour codes (e.g. hex values) can be used for different colours

#### 12.1 **All *'Site'* locations which have *'Provenance'* == Africa and are dated between 150-200** 
#### The **dot size** is scaled by the summed *'Frequency'* per group of unique *'Sites'*

In [14]:
fig = px.scatter_geo(Africa_map,
                     lat = Africa_map.geometry.y,          # param for latitude coordinates
                     lon = Africa_map.geometry.x,          # param for longitude
                     height = 1200, 
                     #text = Africa_map['Site'],           # param to add labels to dots (best left disabled)
                     size = Africa_map.Summed_freq,        # param for dot size scaling
                     scope = 'europe',                     # param for restricting a map to a specific continent
                     projection = 'mercator')              # param for geographic projection
                      

    
# centre map by lat and long of a country
# set up a 'projection_scale' to zoom into the country 

fig.update_geos(projection_scale = 6, center_lat = 41.8719, center_lon = 12.5674)

# the title of the map, its position (title_x) and font size can be set up
fig.update_layout(title_text = 'Summed Frequency per Site for Africa in Date Range 150–200', 
                  title_x = 0.5, 
                  title_font_size = 20)
   

fig.write_image('fig1.pdf') # to save plot (any format, .pdf, .png, etc )   
fig.show()

#### The **dot colour** is scaled by the summed *'Frequency'* per group of unique *'Sites'*

In [15]:
fig = px.scatter_geo(Africa_map,
                     lat = Africa_map.geometry.y,
                     lon = Africa_map.geometry.x,
                     height = 1200,
                     color = Africa_map.Summed_freq,
                   #  text = Africa_map['Site'],                        # param to add labels to dots  
                     scope = 'europe',  
                     color_continuous_scale = ['#1ed14b',  '#d63638'],  # param for colourbar palette
                     projection = 'mercator')         


fig.update_geos(projection_scale = 6, center_lat = 41.8719, center_lon = 12.5674)


fig.update_layout(title_text = 'Summed Frequency per Site for Africa in Date Range 150–200', 
                  title_x = 0.43, 
                  title_font_size = 20, 
                  coloraxis_colorbar = dict(len = 0.80, y = 0.60, xanchor = 'center', xpad = 192, title = ' '))

fig.write_image('fig2.pdf')
fig.show()


#### 12.2 **Show *'Sites'* at which there is evidence of *'Amphora types'* with a given *'Provenance'***


In [16]:
fig = px.scatter_geo(Africa_map,
                     lat = Africa_map.geometry.y,
                     lon = Africa_map.geometry.x,
                     height = 1200, 
                     #text = Africa_map['Site'],           
                     scope = 'europe',                    
                     projection = 'mercator')      
         
        
fig.update_geos(projection_scale = 6, center_lat = 41.8719, center_lon = 12.5674)

fig.update_layout(title_text='Sites for Africa in Date Range 150–200', 
                  title_x = 0.5, 
                  title_font_size = 20)    

fig.write_image('fig3.pdf')
fig.show()


#### 12.3 **Scale the dot size/colour by (unique) *'Amphora types'* per *'Site'***

In [18]:
fig = px.scatter_geo(Africa_map,
                     lat = Africa_map.geometry.y,
                     lon = Africa_map.geometry.x,
                     height = 1200, 
                     size = Africa_map.Amphora_type_count,      
                   #  text = Africa_map['Site'],              
                     scope = 'europe',                           
                     projection = 'mercator')
                                       

fig.update_geos(projection_scale = 6, center_lat = 41.8719, center_lon = 12.5674)

fig.update_layout(title_text='Amphora Type Count per Site for Africa in Date Range 150–200', 
                  title_x = 0.5,
                  title_font_size = 20)    

fig.write_image('fig4.pdf')
fig.show()

In [19]:
fig = px.scatter_geo(Africa_map,
                     lat = Africa_map.geometry.y,
                     lon = Africa_map.geometry.x,
                     height = 1200,
                     color = Africa_map.Amphora_type_count,
                    # text = Africa_map['Site'],               
                     scope = 'europe',  
                     color_continuous_scale = ['#1ed14b',  '#d63638'],  
                     projection = 'mercator')         


fig.update_geos(projection_scale = 6, center_lat = 41.8719, center_lon = 12.5674)

fig.update_layout(title_text = 'Amphora Type Count per Site for Africa in Date Range 150–200', 
                  title_x = 0.5, 
                  title_font_size = 20, 
                  coloraxis_colorbar=dict(len = 0.80, y = 0.60, xanchor = 'center', xpad = 192, title = ' '))

fig.write_image('fig5.pdf')
fig.show()