# Plotting outgroup f3-statistics on a map

You can use the Python code in this Jupyter notebook to plot the f3-statistics you generated onto a world map. To execute the contents of each cell, press Shift+Enter. Follow along with the instructions to create your map.

Make sure that you have saved this Jupyter notebook in the same directory as your `outgroup_f3.out` file; otherwise the code won't be able to find your results.

### Start by loading all of the Python libraries that you'll need to make your plot

In [None]:
import warnings

import pandas as pd
import geopandas as gpd

import matplotlib.pyplot as plt

#The following code will turn off unnecessary warning messages
warnings.simplefilter(action='ignore', category=FutureWarning)

### Next, extract your results from your outgroup_f3.out results file

The commands in the next cell are bash commands rather than Python, but you can run bash commands from within a Jupyter Notebook by starting the line with an exclamation point (!).

The first line writes the header row of the new file you'll be creating, and the second line extracts the relevant information from your `outgroup_f3.out` file and appends it to that file.

In [None]:
! echo "Source1,Source2,Target,f3-statistic,std-err,z-score,num-snps" > results_outgroup_f3.csv
! grep "result:" outgroup_f3.out | awk '{print $2 "," $3 "," $4 "," $5 "," $6 "," $7 "," $8}' >> results_outgroup_f3.csv
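If you would rather stay in Python, the same extraction can be sketched with the standard library. This is only an illustration, not part of the original workflow: the function name `extract_f3_results` is made up here, and the sketch assumes, as the `awk` command above does, that each `result:` line is whitespace-separated with the seven values in fields 2 to 8.

```python
import csv


def extract_f3_results(in_path, out_path):
    """Copy the 'result:' lines of in_path into a CSV file at out_path.

    Hypothetical helper mirroring the grep/awk cell above.
    Returns the number of result rows written.
    """
    header = ["Source1", "Source2", "Target", "f3-statistic",
              "std-err", "z-score", "num-snps"]
    rows = []
    with open(in_path) as handle:
        for line in handle:
            # Same filter as: grep "result:"
            if "result:" not in line:
                continue
            fields = line.split()
            # awk's $2..$8 are the seven fields after the first one
            rows.append(fields[1:8])

    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Calling `extract_f3_results("outgroup_f3.out", "results_outgroup_f3.csv")` should then produce a file with the same layout as the two bash lines.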

### Next, you'll load your results into a pandas dataframe - a special kind of table used in Python

In [None]:
results = pd.read_csv("results_outgroup_f3.csv")

### Take a look at your pandas dataframe

In [None]:
results
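Since a higher outgroup f3-statistic indicates more shared genetic drift, it can help to sort the table before reading it. The following is a minimal sketch; the helper name `top_f3` and the demo values are invented for illustration and are not real results.

```python
import pandas as pd


def top_f3(results, n=5):
    """Return the n rows with the highest f3-statistic (hypothetical helper)."""
    return results.sort_values("f3-statistic", ascending=False).head(n)


# Made-up demo table with the same column names as results_outgroup_f3.csv
demo = pd.DataFrame({
    "Source2": ["PopA", "PopB", "PopC"],
    "f3-statistic": [0.10, 0.30, 0.20],
})
print(top_f3(demo, n=2))
```

With your own data you would call `top_f3(results)` instead of using the demo table.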

### Add latitude and longitude information to your dataframe

You'll need to provide your own table with coordinate information for each population you included in your analysis. The following is an example file that you can use to get started. Once you've created your own file, replace the example path in the cell below with the path to your file.

In [None]:
coordinates = pd.read_csv("~/153784/data/reference_data/Practical6_example_lat_lon.csv") 

As you can see below, your coordinates file should have three columns, with the headers "pop", "lat" and "long". Be sure to save your file as comma-separated values (i.e. a CSV file), since that is what this script expects.

To make this file, you’ll need to look up the geographic coordinates associated with each population included in your analysis. They can be found in the AADR “anno” file (https://docs.google.com/spreadsheets/d/1NJEPY-JPSjj3ERmM1SXkz7vYVafIaJ0gjpRQ-XLxAmk/edit?usp=sharing). 

*Note 1 - Sometimes different individuals from the same population have different coordinates listed in the AADR anno file. If that’s the case, don’t worry: for the purpose of this assignment, you can pick one set of coordinates to represent each population's location.*

*Note 2 - Some populations don't have any coordinates associated with them. Based on the available location information, you are encouraged to choose an approximate location for them so that they can be included in your map. But it is also okay to exclude these populations from your plot.*
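If you start from a table with one row per individual, the two notes above can be handled in a few lines of pandas. This is a sketch under assumptions: the helper name `one_coordinate_per_pop` is made up, and the input is assumed to already use the column names "pop", "lat" and "long" (the anno file's own column names differ, so you would need to rename them first). Entries that are not numbers are treated as missing.

```python
import pandas as pd


def one_coordinate_per_pop(individuals):
    """Keep the first usable (lat, long) pair for each population.

    Hypothetical helper; expects columns "pop", "lat" and "long".
    Populations with no usable coordinates are dropped.
    """
    coords = individuals.copy()
    # Turn anything that is not a number into NaN
    coords["lat"] = pd.to_numeric(coords["lat"], errors="coerce")
    coords["long"] = pd.to_numeric(coords["long"], errors="coerce")
    coords = coords.dropna(subset=["lat", "long"])
    # Note 1: pick one set of coordinates per population
    coords = coords.drop_duplicates(subset="pop", keep="first")
    return coords[["pop", "lat", "long"]].reset_index(drop=True)
```

The result could be written out with `to_csv("my_lat_lon.csv", index=False)`, where the file name is only a placeholder. Populations that were dropped would still need an approximate location added by hand if you want them on the map.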

In [None]:
coordinates

### Merge the coordinate information into your results

In [None]:
results_coords = results.merge(coordinates, left_on="Source2", right_on="pop", how="left")

Take a moment to visually check that your results are still formatted correctly and that they contain coordinate information for each population.

In [None]:
results_coords
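For longer tables, a programmatic check can complement the visual one. Because the merge above is a left join, any population in "Source2" without a match in the coordinates table ends up with empty "lat" and "long" values. The sketch below lists those populations; the helper name `pops_missing_coords` is invented for this example.

```python
import pandas as pd


def pops_missing_coords(results, coordinates):
    """List the Source2 populations that have no usable coordinates.

    Hypothetical helper; repeats the left merge used in this notebook.
    """
    merged = results.merge(coordinates, left_on="Source2",
                           right_on="pop", how="left")
    missing = merged[merged["lat"].isna() | merged["long"].isna()]
    return sorted(missing["Source2"].unique())
```

If `pops_missing_coords(results, coordinates)` returns an empty list, every population has coordinates. Otherwise, either add the listed populations to your coordinates file or drop them before plotting, for example with `results_coords.dropna(subset=["lat", "long"])`.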

### It's time to plot your results using the python library geopandas

First, you'll need to reformat your results dataframe to add the information that geopandas requires. You'll also need to load some additional data that geopandas will use to draw the world map.

In [None]:
#Prepare your data to plot with geopandas
gdf = gpd.GeoDataFrame(results_coords, geometry=gpd.points_from_xy(results_coords.long,results_coords.lat))

# Load in map information - we're using 'naturalearth_lowres' data originally from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
shapefile_path = "~/153784/data/reference_data/map_data/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp"  
world = gpd.read_file(shapefile_path)

#set the color scale - we're using viridis (see more options here: https://matplotlib.org/stable/users/explain/colors/colormaps.html)
cm = plt.colormaps.get_cmap('viridis')

The following code will create your map. Try adjusting the title, marker size, and any other element that you'd like to change.

In [None]:
#Create an empty map of the world
fig, ax = plt.subplots(1, figsize=(15,7))
ax.set_aspect('equal')
world.plot(ax=ax, color='white', edgecolor='grey')

#Add your data, coloring each point by its f3-statistic (legend_kwds labels the color bar)
gdf.plot(ax=ax, marker='o', column="f3-statistic", cmap=cm, markersize=10, legend=True, legend_kwds={"label": "f3-statistic"})

#Add a title
ax.set_title("Your Title Goes Here", fontsize=15)

#Show your plot
plt.show()

#Save your plot
fig.savefig("f3-statistic-map.png")

Your map should now be saved in your practical_6 directory in a file called "f3-statistic-map.png".

### An extra challenge

If you'd like a challenge, try making a second panel for your figure that zooms in on the region of the world with which your individual shares the most matches. Be sure to adjust the color scale so that the range shown is relevant to the region you are displaying.

Check out the geopandas documentation if you aren't sure where to start: https://geopandas.org/en/stable/docs/user_guide.html
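One possible starting point for the challenge is to select the points inside a bounding box and compute the color range from those points only. The sketch below uses plain pandas; the helper name `zoom_subset` and the idea of describing the region as a longitude/latitude box are assumptions made for this example, not part of the original exercise.

```python
import pandas as pd


def zoom_subset(df, lon_range, lat_range, value_col="f3-statistic"):
    """Return the rows inside a lon/lat box plus their value range.

    Hypothetical helper: lon_range and lat_range are (min, max) tuples
    in degrees. Returns (subset, vmin, vmax) for the zoomed panel.
    """
    in_box = df["long"].between(*lon_range) & df["lat"].between(*lat_range)
    subset = df[in_box]
    return subset, subset[value_col].min(), subset[value_col].max()
```

For the second panel, you could create two axes with `plt.subplots(1, 2, figsize=(15, 7))`, pass the returned `vmin` and `vmax` to `gdf.plot` so the color scale spans only the zoomed region, and then restrict the view with `ax.set_xlim(*lon_range)` and `ax.set_ylim(*lat_range)`.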