# October 16

**Conflict**

![](https://media.giphy.com/media/3oz8xABuMZ7AsrcMc8/giphy.gif)

# Warm Up 
It is possible to have multiple owners of a GitHub repository. For our warm-up today, we're going to do an exercise. Get into pairs.

1. Partner 1 & 2: Navigate to the GitHub page for your Lastname_project
2. Partner 1: Go to "Settings." Click "Collaborators." Enter your partner's GitHub name.
3. Partner 2: You might get an email notification saying that you've been added.
4. Partner 2: Clone your partner's repo. Copy the repository location by clicking on the big, green "Clone or download" button. Open a new terminal, and type `git clone` followed by the address you copied.
5. Partner 2: Now, pick a file. Change something in it. It doesn't matter what, but it has to be on a line where there is already text.
6. Partner 2: Add, commit and push your code.
7. Partner 1: Pull in your partner's changes. Do you see a conflict?
8. Partner 1: Open the conflicted file in a text editor, and let me know once you have your conflict.

# Resolving Conflict

This is a little bit social, and a little bit technical. You made a change to something that was already there. And in a co-owned repo, no one needs to approve your changes. You just get to make them! 

So how can we resolve these issues? 

1. **Read** First, I normally look at the changes. What did this person do? Can I see why? 
2. **Ask** If I don't understand, I ask for help.
3. **Decide** Then I do the merge, and add and commit the code. 
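
For the **Read** step, it helps to know what Git writes into a conflicted file: your version sits between `<<<<<<<` and `=======`, and your partner's sits between `=======` and `>>>>>>>`. Here is a minimal sketch that pulls the two sides apart so you can compare them. The function name `split_conflict` and the example text are made up for illustration; in class you will just read the markers in your text editor.

```python
def split_conflict(text):
    """Return (ours, theirs) as lists of lines from the first conflict block."""
    ours, theirs = [], []
    side = None  # None = outside a conflict; "ours" or "theirs" inside one
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            break  # end of the first conflict block
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
    return ours, theirs


# Invented example of what a conflicted README might contain
conflicted = """# My project
<<<<<<< HEAD
This project studies salamanders.
=======
This project studies Plethodon salamanders.
>>>>>>> 1a2b3c4
"""

mine, partners = split_conflict(conflicted)
print("Mine:     ", mine)
print("Partner's:", partners)
```

To finish the merge, you edit the file so that only the text you want remains (with all three marker lines deleted), then add and commit.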


# GBIF

We're at a bit of a crossroads in the course: a lot of the end part of the class is about making websites, and other forms of communicating results. But I think it would help if we did a little more fun programming stuff first.

We're going to get some occurrence data off of GBIF, the website of the Global Biodiversity Information Facility. We're going to see if we can get some information on where these salamanders live.

First, we will get the names of the salamanders. I've put an empty cell below, and below it an answer. See if you can read in the `plethodon.phy` file with Dendropy, and get the names of our salamanders.


In [1]:
## Answer below
import dendropy

sal_dat = dendropy.DnaCharacterMatrix.get(path = "../data/plethodon.phy", schema="phylip")
sal_names = sal_dat.taxon_namespace.labels()


Next, we'll use a library called `pygbif`, which interfaces with GBIF to get locality information for our salamanders. We will search each salamander name against the GBIF database to get locations where that salamander is found. We are only going to do this for a few salamanders, to keep the exercise tractable for all of us.

In [2]:
import pandas as pd
from pygbif import occurrences

list_of_dictionaries = []

for name in sal_names[1:5]:
    sal_dict = occurrences.search(scientificName = name)
    list_of_dictionaries.append(sal_dict)


There are 53 records for the salamanders in our sample - all our salamanders were found. Each search downloads as a dictionary, and for ease of processing, we will change these into dataframes. If you view one of the dictionaries, there's a lot of padding - we really only want the "results" entry.

In [3]:
import pandas as pd

# Build one dataframe per search, keeping only the "results" entry
frames = [pd.DataFrame.from_dict(item['results']) for item in list_of_dictionaries]

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
sal_df = pd.concat(frames, sort=False)
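
If you want to see how the stacking works without hitting the network, here is a small sketch with hand-made dictionaries. Everything in `mock_responses` is invented for illustration; it only imitates the general shape of a search result, with the records under a `"results"` key.

```python
import pandas as pd

# Invented stand-ins for search results: one dictionary per species
mock_responses = [
    {"count": 2, "results": [
        {"species": "Plethodon amplus", "basisOfRecord": "HUMAN_OBSERVATION"},
        {"species": "Plethodon amplus", "basisOfRecord": "PRESERVED_SPECIMEN"},
    ]},
    {"count": 1, "results": [
        {"species": "Plethodon angusticlavius", "basisOfRecord": "HUMAN_OBSERVATION"},
    ]},
]

# One dataframe per response, then stack them row-wise
frames = [pd.DataFrame(response["results"]) for response in mock_responses]
combined = pd.concat(frames, sort=False)

# How many records did we get for each species?
records_per_species = combined["species"].value_counts().to_dict()
print(records_per_species)
print(combined.index.tolist())
```

Notice that the row labels restart at 0 for each species. That is why you will see repeated index values in the tables further down.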


If you look at the dataframe, you'll see that there's a lot of information here, and lots of different ways that we could subset the data. For example, perhaps we only want records where a person physically saw the salamander: 

In [4]:
sal_slim = sal_df[sal_df.basisOfRecord == "HUMAN_OBSERVATION"]

In [5]:
sal_slim

Unnamed: 0,acceptedScientificName,acceptedTaxonKey,accessRights,associatedSequences,basisOfRecord,bibliographicCitation,catalogNumber,class,classKey,collectionCode,...,type,typeStatus,typifiedName,verbatimCoordinateSystem,verbatimElevation,verbatimEventDate,verbatimLocality,verbatimSRS,vernacularName,year
0,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,15340283,Amphibia,131,Observations,...,,,,,,2018-08-11 9:42:04 PM EDT,"North Carolina, US",,,2018.0
1,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,6711596,Amphibia,131,Observations,...,,,,,,2017/06/18 10:17 AM EDT,"North Carolina, US",,,2017.0
2,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,4315418,Amphibia,131,Observations,...,,,,,,2016-10-08,"North Carolina, US",,,2016.0
3,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,52-35599-1897,Amphibia,131,NC-NHP,...,,,,,,,,,,2014.0
4,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,52-35598-1861,Amphibia,131,NC-NHP,...,,,,,,,,,,2014.0
5,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,52-35596-1861,Amphibia,131,NC-NHP,...,,,,,,,,,,2014.0
6,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,52-35598-1827,Amphibia,131,NC-NHP,...,,,,,,,,,,2014.0
7,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,52-35595-1861,Amphibia,131,NC-NHP,...,,,,,,,,,,2012.0
8,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,52-35601-1861,Amphibia,131,NC-NHP,...,,,,,,,,,,2009.0
9,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,52-27456-1827,Amphibia,131,NC-NHP,...,,,,,,,,,,2008.0
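
Before choosing a filter, it can help to count what values a column actually holds. The sketch below uses a tiny invented dataframe (`mock_df`), not our real download, to show counting the record types and then combining two conditions.

```python
import pandas as pd

# Invented records for illustration only
mock_df = pd.DataFrame({
    "species": ["Plethodon amplus"] * 3 + ["Plethodon angusticlavius"],
    "basisOfRecord": ["HUMAN_OBSERVATION", "PRESERVED_SPECIMEN",
                      "HUMAN_OBSERVATION", "PRESERVED_SPECIMEN"],
    "year": [2018, 1975, 2014, 1990],
})

# What kinds of records do we have, and how many of each?
basis_counts = mock_df["basisOfRecord"].value_counts()
print(basis_counts)

# Same filter as in the cell above
observed = mock_df[mock_df["basisOfRecord"] == "HUMAN_OBSERVATION"]

# Two conditions at once: each goes in parentheses, joined with &
recent_observed = mock_df[(mock_df["basisOfRecord"] == "HUMAN_OBSERVATION")
                          & (mock_df["year"] >= 2015)]
print(recent_observed)
```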


This is still quite a bit of data! However, many databases don't process null records, or give an error if the record is null. So, let's remove the NaN values, and also save our work to a file. That way if the internet dies, or something, we still have our searches. 

In [6]:
import numpy as np

locs = sal_slim.dropna(subset=['verbatimLocality'])
locs.to_csv("../data/locs.csv")

In [7]:
locs

Unnamed: 0,acceptedScientificName,acceptedTaxonKey,accessRights,associatedSequences,basisOfRecord,bibliographicCitation,catalogNumber,class,classKey,collectionCode,...,type,typeStatus,typifiedName,verbatimCoordinateSystem,verbatimElevation,verbatimEventDate,verbatimLocality,verbatimSRS,vernacularName,year
0,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,15340283,Amphibia,131,Observations,...,,,,,,2018-08-11 9:42:04 PM EDT,"North Carolina, US",,,2018.0
1,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,6711596,Amphibia,131,Observations,...,,,,,,2017/06/18 10:17 AM EDT,"North Carolina, US",,,2017.0
2,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,4315418,Amphibia,131,Observations,...,,,,,,2016-10-08,"North Carolina, US",,,2016.0
0,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,11412172,Amphibia,131,Observations,...,,,,,,2018/01/10 11:31 AM CST,"Arkansas, US",,,2018.0
1,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,11412325,Amphibia,131,Observations,...,,,,,,2018/01/11 2:28 PM CST,"Arkansas, US",,,2018.0
2,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,9703951,Amphibia,131,Observations,...,,,,,,2018/01/31 4:54 PM CST,"Arkansas, US",,,2018.0
3,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,11412327,Amphibia,131,Observations,...,,,,,,2018/01/11 2:58 PM CST,"Arkansas, US",,,2018.0
4,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,9703766,Amphibia,131,Observations,...,,,,,,2018/01/30 4:14 PM CST,"Arkansas, US",,,2018.0
5,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,11412174,Amphibia,131,Observations,...,,,,,,2018/01/10 11:39 AM CST,"Arkansas, US",,,2018.0
6,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,9703765,Amphibia,131,Observations,...,,,,,,2018/01/30 2:05 PM CST,"Arkansas, US",,,2018.0
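
Here is a small sketch of what `dropna` does, using an invented three-row dataframe (`mock_slim`). It also calls `.copy()` on the result, which is one way to get an independent dataframe so that adding columns later does not trigger pandas' `SettingWithCopyWarning`.

```python
import numpy as np
import pandas as pd

# Invented rows: the middle one has no locality
mock_slim = pd.DataFrame({
    "species": ["Plethodon amplus", "Plethodon amplus", "Plethodon angusticlavius"],
    "verbatimLocality": ["North Carolina, US", np.nan, "Arkansas, US"],
})

# Count the missing values before dropping anything
missing_before = int(mock_slim["verbatimLocality"].isna().sum())

# Drop rows where verbatimLocality is missing; .copy() makes the result independent
mock_locs = mock_slim.dropna(subset=["verbatimLocality"]).copy()

print(missing_before, "row(s) had no locality")
print(mock_locs)
```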


What do you notice about these localities? 

We'll need to convert them to coordinates before we can plot them. To do this, we will use an open-source package called geopy, which takes in a string of a location and searches that string against a global map to get latitude and longitude coordinates. We will use the values in the `verbatimLocality` column of `locs` to do this. 

## Exercise Two: Talk through the code below with a partner. Decide what it does, and then run it to confirm.

In [9]:
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="class")
locations = []
# Series.iteritems() was removed in pandas 2.0; items() gives the same (index, value) pairs
for row in locs.verbatimLocality.items():
    latlong = geolocator.geocode(row[1])
    if latlong is None:
        locations.append(0)
    else:
        locations.append(latlong[1])
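
Many of our rows share the same locality string, so we are asking the geocoder the same question over and over. The public Nominatim service asks for no more than one request per second, so a possible refinement is to look up each distinct string once and reuse the answer. This is a sketch, not the method we use in class: `geocode_unique` and `fake_geocode` are hypothetical names, and the fake geocoder returns made-up coordinates so the example runs without the internet.

```python
def geocode_unique(localities, geocode):
    """Look up each distinct locality once; return one result per input row."""
    cache = {}
    coords = []
    for place in localities:
        if place not in cache:
            hit = geocode(place)
            # As in the loop above, 0 marks a locality that could not be found
            cache[place] = 0 if hit is None else hit
        coords.append(cache[place])
    return coords


# Stand-in for a real geocoder: records each call and returns invented coordinates
calls = []

def fake_geocode(place):
    calls.append(place)
    table = {"North Carolina, US": (35.67, -79.04), "Arkansas, US": (35.20, -92.45)}
    return table.get(place)  # None when the place is unknown


rows = ["North Carolina, US", "North Carolina, US", "Arkansas, US", "Nowhere"]
coords = geocode_unique(rows, fake_geocode)
print(coords)
print(len(calls), "lookups for", len(rows), "rows")
```

With the real geocoder, you would pass in a small function that calls `geolocator.geocode` and returns the coordinate tuple. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that can space out requests.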


We now have a list. Use the cell below to confirm that the list is the same length as the locations dataframe.

In [12]:
len(locations) == len(locs)

True

This is an important step called `testing`. What would it mean if the list was not the same length? Would we want to keep using it? 
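
One way to make a check like this stick is to write it as an `assert`, which stops the program with a message when the check fails instead of quietly printing `False`. A minimal sketch, where `check_same_length` is a hypothetical helper name:

```python
def check_same_length(values, n_rows):
    """Stop with a clear message if we do not have one value per row."""
    assert len(values) == n_rows, f"expected {n_rows} values, got {len(values)}"
    return True


# Passes: three values for three rows
print(check_same_length([0, 0, 0], 3))

# Fails: two values for three rows raises an AssertionError
try:
    check_same_length([0, 0], 3)
except AssertionError as err:
    print("Check failed:", err)
```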

Next we will append the list as a new column. 

In [13]:
se = pd.Series(locations)
# assign() returns a new dataframe, which avoids pandas' SettingWithCopyWarning
locs = locs.assign(coords=se.values)


In [14]:
locs

Unnamed: 0,acceptedScientificName,acceptedTaxonKey,accessRights,associatedSequences,basisOfRecord,bibliographicCitation,catalogNumber,class,classKey,collectionCode,...,typeStatus,typifiedName,verbatimCoordinateSystem,verbatimElevation,verbatimEventDate,verbatimLocality,verbatimSRS,vernacularName,year,coords
0,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,15340283,Amphibia,131,Observations,...,,,,,2018-08-11 9:42:04 PM EDT,"North Carolina, US",,,2018.0,"(35.6729639, -79.0392919)"
1,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,6711596,Amphibia,131,Observations,...,,,,,2017/06/18 10:17 AM EDT,"North Carolina, US",,,2017.0,"(35.6729639, -79.0392919)"
2,"Plethodon amplus Highton & Peabody, 2000",2431511,,,HUMAN_OBSERVATION,,4315418,Amphibia,131,Observations,...,,,,,2016-10-08,"North Carolina, US",,,2016.0,"(35.6729639, -79.0392919)"
0,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,11412172,Amphibia,131,Observations,...,,,,,2018/01/10 11:31 AM CST,"Arkansas, US",,,2018.0,"(35.2048883, -92.4479108)"
1,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,11412325,Amphibia,131,Observations,...,,,,,2018/01/11 2:28 PM CST,"Arkansas, US",,,2018.0,"(35.2048883, -92.4479108)"
2,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,9703951,Amphibia,131,Observations,...,,,,,2018/01/31 4:54 PM CST,"Arkansas, US",,,2018.0,"(35.2048883, -92.4479108)"
3,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,11412327,Amphibia,131,Observations,...,,,,,2018/01/11 2:58 PM CST,"Arkansas, US",,,2018.0,"(35.2048883, -92.4479108)"
4,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,9703766,Amphibia,131,Observations,...,,,,,2018/01/30 4:14 PM CST,"Arkansas, US",,,2018.0,"(35.2048883, -92.4479108)"
5,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,11412174,Amphibia,131,Observations,...,,,,,2018/01/10 11:39 AM CST,"Arkansas, US",,,2018.0,"(35.2048883, -92.4479108)"
6,"Plethodon angusticlavius Grobman, 1944",2431498,,,HUMAN_OBSERVATION,,9703765,Amphibia,131,Observations,...,,,,,2018/01/30 2:05 PM CST,"Arkansas, US",,,2018.0,"(35.2048883, -92.4479108)"
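
Plotting tools usually want latitude and longitude as separate numeric columns rather than one tuple. Here is a sketch of splitting them apart, using an invented `mock_locs` dataframe whose last row has the 0 placeholder for a failed lookup. Keep in mind that a coordinate is only as precise as the locality string behind it.

```python
import pandas as pd

# Invented example rows; 0 marks a locality the geocoder could not find
mock_locs = pd.DataFrame({
    "verbatimLocality": ["North Carolina, US", "Arkansas, US", "Nowhere"],
    "coords": [(35.6729639, -79.0392919), (35.2048883, -92.4479108), 0],
})

# Keep only the rows that really hold a (latitude, longitude) tuple
is_tuple = mock_locs["coords"].apply(lambda c: isinstance(c, tuple))
found = mock_locs[is_tuple].copy()

# Split the tuple into two numeric columns
found["latitude"] = found["coords"].apply(lambda c: c[0])
found["longitude"] = found["coords"].apply(lambda c: c[1])
print(found[["verbatimLocality", "latitude", "longitude"]])
```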


We will now do three last activities: 
- Write this raw data into our data folder as a CSV file.
- Decide how we want to treat missing values (0-values), and apply this treatment.
- Write out the treated data as separate from the raw data. 

In [15]:
locs.to_csv("../data/locs.csv")

In [23]:
no_zeroes = locs[(locs.coords != 0)]

In [25]:
no_zeroes.to_csv("../data_output/dropped_zeroes.csv")
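
It is worth recording how many rows a treatment removes, and knowing what happens to the data when it comes back from a CSV. The sketch below uses an invented `mock_locs` dataframe and writes to an in-memory buffer instead of a real file, so it will not touch our data folders.

```python
import io

import pandas as pd

# Invented rows; the middle one has the 0 placeholder
mock_locs = pd.DataFrame({
    "verbatimLocality": ["North Carolina, US", "Nowhere", "Arkansas, US"],
    "coords": [(35.6729639, -79.0392919), 0, (35.2048883, -92.4479108)],
})

# Same treatment as above: drop the rows that were never located
no_zeroes = mock_locs[mock_locs["coords"] != 0]
n_dropped = len(mock_locs) - len(no_zeroes)
print(n_dropped, "row(s) dropped")

# Round-trip through CSV text to see what comes back
buffer = io.StringIO()
no_zeroes.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(type(reloaded["coords"].iloc[0]))
```

The coordinates come back as text rather than tuples, so if you reload the file later you will need to convert them again before plotting.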