This notebook is a data-wrangling exercise.  It is not pretty, and it may not be very useful in the future as plate IDs become more standardised across models.  Essentially, it uses a lookup table to update plate IDs and then rewrites the rotation file.  It uses pandas a lot.

This example converts some four digit plate IDs to three digit plate IDs to make Merdith et al. (2017) more compatible with models for the Palaeozoic, Mesozoic and Cenozoic.  It can be scaled up or down in scope depending on the purpose.

The other Plate ID notebook in this folder is a bit simpler (probably more dangerous) but could also be used.

References

Merdith, A.S., Collins, A.S., Williams, S.E., Pisarevsky, S., Foden, J.D., Archibald, D.B., Blades, M.L., Alessio, B.L., Armistead, S., Plavsa, D., Clark, C., and Müller, R.D. 2017. A full-plate global reconstruction of the Neoproterozoic. Gondwana Research, 50, pp.84-134.

In [1]:
import numpy as np
import pandas as pd

In [2]:
#set basedir for loading reconstruction files
basedir = '/Users/Andrew/Documents/PhD/Scripts/Python_Scripts/pyGPlates_examples/General_plate_reconstruction/Sample_data/'

#we aren't using pygplates, instead we just want to read the rotation file as a text file
#we first need to divide the rot file into two columns
#the first column is the rotation (moving plate id | time | lat | lon | angle | fixed plate id)
#the second column is the metadata/description
#this is because we then need to later subdivide the rotation into its individual components
headings = ['rotation','description'] #set column headings
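df_rot = pd.read_csv('%sMer17_1000-520Ma_rotations.rot' % basedir, 
                       on_bad_lines='skip', #skip malformed lines; replaces error_bad_lines=False, which was removed in pandas 2.0 (needs pandas >= 1.3)
                       sep='!', 
                       header=None,
                       names=headings) #use '!' as the separator, do not load headings, but pass our custom ones through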

#print out first five lines to check
df_rot[:5]

Unnamed: 0,rotation,description
0,1001 0.0 90.0 0.0 0.0 000,
1,1001 410.0 -25.23 -10.61 67.78 000,
2,1001 460.0 -30.68 42.38 101.23 000,
3,1001 515.0 35.51 -156.17 -122.94 000,
4,1001 520.0 35.01 -157.53 -123.49 000,
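
To see what the `'!'` separator does without needing the actual rotation file, here is a minimal sketch that reads two rotation-style lines from an in-memory string. The sample lines and the name `df_demo` are made up for illustration (the description text in particular is an assumption); only the `read_csv` arguments mirror the cell above.

```python
import io
import pandas as pd

# Two illustrative lines in the usual .rot layout:
# moving plate | time | lat | lon | angle | fixed plate ! description
sample = (
    "1001 0.0 90.0 0.0 0.0 000 !Example plate\n"
    "1001 410.0 -25.23 -10.61 67.78 000 !\n"
)

# Everything before '!' lands in 'rotation', everything after it in 'description'
df_demo = pd.read_csv(io.StringIO(sample), sep='!', header=None,
                      names=['rotation', 'description'])

print(df_demo.shape)
print(df_demo['rotation'][0].split())
```

Note that the rotation column is still a single string per row at this point, which is why it has to be split again later.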


In [3]:
#now we have to wrangle it, a lot

#pull the two columns out as separate series.  We only really want to adjust the rotation, but we need the other to remerge
#at the end
df_rot1 = df_rot['rotation'] #rotation series
df_rot1a = df_rot['description'] #metadata series

#split rotation series on whitespace
#\s+ means one or more whitespace characters (\s* also matches the empty string,
#so on Python 3.7+ it would split between every character)
df_rot2 = df_rot1.str.split(r'\s+')

#unfortunately the result is no longer a dataframe but a series of lists
#so copy it into a plain list of rows
df_rot3 = df_rot2.tolist()
    
#convert from list back to dataframe using new column headings
labels = ['moving plate', 'age', 'lat', 'lon', 'angle', 'fixed plate', 'break']
df_rot4 = pd.DataFrame.from_records(df_rot3, columns=labels)
#populate the 'break' column with ! so that when we save it back out to a .rot file we have the descriptions
df_rot4['break'] = '!'

#remerge the two back into a single dataframe using concatenate
dataframes = [df_rot4, df_rot1a]
df_final = pd.concat(dataframes, axis=1) #axis=1 joins side by side (i.e. adds columns rather than rows)

#so we should have the original rotation file but broken into columns with headings
#this now gives us a general framework for any rotation file that we can do stuff to

#however this puts out everything as a string, and we need to make sure that some of our numbers are ints or floats
df_final = df_final.astype({'moving plate': int, 'age': float, 'lat' : float,
                            'lon': float, 'angle': float, 'fixed plate' : int,
                            'break' : str, 'description' : str})

#print out first five rows again
df_final[:5]

Unnamed: 0,moving plate,age,lat,lon,angle,fixed plate,break,description
0,1001,0.0,90.0,0.0,0.0,0,!,
1,1001,410.0,-25.23,-10.61,67.78,0,!,
2,1001,460.0,-30.68,42.38,101.23,0,!,
3,1001,515.0,35.51,-156.17,-122.94,0,!,
4,1001,520.0,35.01,-157.53,-123.49,0,!,
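
As a more compact alternative to the split / list / `from_records` steps above, pandas can expand the split directly into columns. This is only a sketch on two made-up rotation strings (`rot`, `cols` and `df_split` are illustrative names, not part of the notebook), and it leaves out the `'break'` column, which would still need to be added before writing the file back out.

```python
import pandas as pd

# Two illustrative rotation strings, including the trailing space left before '!'
rot = pd.Series(["1001 0.0 90.0 0.0 0.0 000 ",
                 "1001 410.0 -25.23 -10.61 67.78 000 "])

cols = ['moving plate', 'age', 'lat', 'lon', 'angle', 'fixed plate']

# With no pattern, str.split() splits on runs of whitespace and ignores
# leading/trailing whitespace; expand=True returns one column per piece
df_split = rot.str.split(expand=True)
df_split.columns = cols

# Everything is still a string, so convert IDs to int and the rest to float
df_split = df_split.astype({'moving plate': int, 'age': float, 'lat': float,
                            'lon': float, 'angle': float, 'fixed plate': int})

print(df_split)
```

As in the cell above, converting to `int` turns a fixed plate ID written as `000` into `0`.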


In [4]:
#we will also load in our plate ID look up table
df_ID_table = pd.read_csv('%sPlate_IDs_for_topology_conversion.csv' % basedir)
#check out first five entries
df_ID_table[:5]

Unnamed: 0,Plate,Old,New
0,Laurentia,1001,101
1,Greenland,1002,102
2,Rockall,1003,318
3,Ganderia,171,184
4,Carolinia,172,185


In [5]:
#great!
#now make a lookup list of (Old, New) plate ID pairs
#we want to search our rotation file for 'Old' plate IDs, and then replace them with 'New' ones
lookup_id = list(zip(df_ID_table.Old,df_ID_table.New))
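
Before using the table it can be worth checking that no 'Old' ID is listed twice, since that would make the replacement ambiguous. The sketch below does that on a small stand-in table built from the first three rows shown above, and also shows the same pairs held as a dictionary. The names `table`, `duplicated_old` and `id_map` are illustrative, not part of the notebook.

```python
import pandas as pd

# Stand-in for the lookup table, using the first three rows shown above
table = pd.DataFrame({'Plate': ['Laurentia', 'Greenland', 'Rockall'],
                      'Old': [1001, 1002, 1003],
                      'New': [101, 102, 318]})

# Any 'Old' ID that appears more than once would be ambiguous
duplicated_old = table.loc[table['Old'].duplicated(keep=False), 'Old'].tolist()
print(duplicated_old)

# The same pairs as a dictionary: Old ID -> New ID
id_map = dict(zip(table['Old'], table['New']))

# .get() with a default leaves IDs that are not in the table unchanged
print(id_map.get(1003, 1003))
print(id_map.get(1005, 1005))
```

A dictionary lookup avoids scanning the whole list for every row, which starts to matter for long rotation files.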

In [6]:
#create a copy of our dataframe to do the look up on
#(.copy() is needed, otherwise df_new is just another name for df_final and both get changed)
df_new = df_final.copy()
#now loop through and change :\
for index,rotation in df_new.iterrows():
    #print rotation[5]    
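    moving_ID = rotation['moving plate'] #index by column label; positional rotation[0] is deprecated in recent pandas
    fixed_ID = rotation['fixed plate']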
    #we do this loop twice instead of just once so we can have a break statement, which makes it faster
    for i in lookup_id:
        if moving_ID == i[0]:
            moving_ID = i[1]
            break
    for i in lookup_id:
        if fixed_ID == i[0]:
            fixed_ID = i[1]
            break
    df_new.at[index, 'moving plate'] = moving_ID
    df_new.at[index, 'fixed plate'] = fixed_ID

#check
df_new[10:20]

Unnamed: 0,moving plate,age,lat,lon,angle,fixed plate,break,description
10,101,870.0,30.87,146.88,-189.99,0,!,
11,101,940.0,44.77,170.26,-205.37,0,!,
12,101,1050.0,43.54,179.56,-168.63,0,!,
13,101,1100.0,64.47,-167.3,-175.54,0,!,
14,102,0.0,90.0,0.0,0.0,101,!,
15,102,420.0,70.4,-94.1,-18.0,101,!,
16,102,1000.0,70.4,-94.1,-18.0,101,!,Greenland
17,318,0.0,75.32,159.61,-23.47,101,!,
18,318,1000.0,75.32,159.61,-23.47,101,!,
19,1005,0.0,90.0,0.0,0.0,101,!,
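
The row-by-row loop above can also be written as a column-wise operation. This is a minimal sketch under the assumption that the Old to New pairs are held in a dictionary (here a hypothetical `id_map`) and applied to a small made-up dataframe; it is not the method used in this notebook, just an alternative that gives the same kind of result.

```python
import pandas as pd

# Small stand-in for the rotation dataframe (plate ID columns only)
df_demo = pd.DataFrame({'moving plate': [1001, 1002, 1003, 1005],
                        'fixed plate': [0, 1001, 1001, 1001]})

# Hypothetical lookup dictionary: Old ID -> New ID
id_map = {1001: 101, 1002: 102, 1003: 318}

df_out = df_demo.copy()
for col in ['moving plate', 'fixed plate']:
    # map() gives NaN for IDs missing from id_map; fillna() puts the
    # original ID back, and astype(int) undoes the float conversion
    df_out[col] = df_out[col].map(id_map).fillna(df_out[col]).astype(int)

print(df_out)
```

As in the output above, an ID that is not in the lookup table (1005 here) passes through unchanged.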
