# Lab 5. Abstraction and reusability
#### Computational Methods for Geoscience - EPS 400/522
#### Instructor: Eric Lindsey

Due: Oct. 5, 2023

---------

Adrian Marziliano

In [1]:
# some useful imports and settings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import os

# better looking figures on high-resolution screens
%config InlineBackend.figure_format = 'retina'

# reload modules if they have changed - necessary when you are editing your own module
%load_ext autoreload
%autoreload 2

### 1. Using glob to find files

The folder 'timeseries' (you will have to unzip it first) contains a set of GNSS timeseries from the UNR MAGNET site. Let's explore how 'glob' can interact with these files.

1. Use glob to get a list of all the files, and print out each filename.

2. The sites starting with a letter 'P' were installed under a single project called the 'Plate Boundary Observatory'. Suppose we wanted to list only those files - can you use 'glob' with wildcards to return only the list of names starting with P?

In [2]:
# Check the current working directory before changing it
print("Current working directory:", os.getcwd())

Current working directory: /home/jovyan/CompMethods_EPS522/Labs/Lab 5


In [3]:
# Make sure directory is set to 'timeseries' folder
os.chdir('/home/jovyan/CompMethods_EPS522/Labs/Lab 5/timeseries/')

In [4]:
# Get the list of tenv3 files from the "timeseries" folder
tenv3_files = glob.glob('*.tenv3')
#print('All files: ',tenv3_files)

# Get only the tenv3 files whose site names start with 'P'
tenv3_Pfiles = glob.glob('P*.tenv3')
print(f'1. All files: {tenv3_files}\n2. Site files with letter P: {tenv3_Pfiles}')

1. All files: ['MC10.NA.tenv3', 'SC01.NA.tenv3', 'P034.NA.tenv3', 'P029.NA.tenv3', 'NMLG.NA.tenv3', 'RG01.NA.tenv3', 'P028.NA.tenv3', 'TC01.NA.tenv3', 'AZCN.NA.tenv3', 'CTI4.NA.tenv3']
2. Site files with letter P: ['P034.NA.tenv3', 'P029.NA.tenv3', 'P028.NA.tenv3']
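The listing above relies on `os.chdir` having moved into the data folder. A minimal sketch of an alternative, assuming the same `*.tenv3` naming, builds the pattern from the folder path so the working directory never changes. The helper name `list_sites` is hypothetical.

```python
import glob
import os

def list_sites(folder, pattern="*.tenv3"):
    """Return sorted (site, path) pairs for files in `folder` matching `pattern`."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    # The site code is the part of the filename before the first dot,
    # e.g. 'P034.NA.tenv3' -> 'P034'
    return [(os.path.basename(p).split(".")[0], p) for p in paths]

# Example: only the stations whose names start with 'P'
# for site, path in list_sites("timeseries", "P*.tenv3"):
#     print(site, path)
```

Sorting the matches is a design choice: `glob` returns files in an arbitrary order, so sorting keeps the output reproducible between runs.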


### 2. Write a module to interact with the GNSS timeseries

The module should have (at a minimum) the following four functions with their definitions:

fit_timeseries(tlist,ylist) - accepts two 1-D numpy arrays, t (decimal year) and y (displacement timeseries), and returns the least-squares velocity and uncertainty for that timeseries. If possible, try to re-use the line-fitting code you wrote for Lab 3 for this purpose.

fit_velocities(filename) - accepts a filename, reads in the data, and uses fit_timeseries() to estimate the E, N and U components of velocity for that site.

get_coordinates(filename) - accepts a filename and returns the average latitude, longitude, and elevation for that site over the time period.

fit_all_velocities(folder,pattern) - accepts a folder name and a 'glob' pattern and returns a pandas data frame with the site name, coordinates, velocities and uncertainties.

Finally, import your module and demonstrate each function below to show how it works and what it returns.

In [45]:
file_path ='../timeseries/AZCN.NA.tenv3'
tenv3_sample = pd.read_csv(file_path, sep=r'\s+')
tenv3_sample

Unnamed: 0,site,YYMMMDD,yyyy.yyyy,__MJD,week,d,reflon,_e0(m),__east(m),____n0(m),...,_ant(m),sig_e(m),sig_n(m),sig_u(m),__corr_en,__corr_eu,__corr_nu,_latitude(deg),_longitude(deg),__height(m)
0,AZCN,99MAY10,1999.3539,51308,1009,1,-107.9,-977,-0.744686,4078731,...,0.0,0.000894,0.001071,0.003738,0.001207,0.067395,-0.150295,36.839793,-107.910961,1862.93747
1,AZCN,99MAY11,1999.3566,51309,1009,2,-107.9,-977,-0.741628,4078731,...,0.0,0.000838,0.001050,0.003623,0.021724,-0.031813,-0.130306,36.839793,-107.910961,1862.93534
2,AZCN,99MAY12,1999.3593,51310,1009,3,-107.9,-977,-0.742445,4078731,...,0.0,0.000868,0.001055,0.003642,-0.001512,-0.034516,-0.086356,36.839793,-107.910961,1862.93970
3,AZCN,99MAY13,1999.3621,51311,1009,4,-107.9,-977,-0.744588,4078731,...,0.0,0.001016,0.001200,0.004131,0.036789,0.025548,-0.159457,36.839793,-107.910961,1862.93930
4,AZCN,99MAY14,1999.3648,51312,1009,5,-107.9,-977,-0.746577,4078731,...,0.0,0.001342,0.001580,0.005565,-0.067832,-0.069363,-0.131309,36.839793,-107.910961,1862.93748
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4105,AZCN,11JUN15,2011.4524,55727,1640,3,-107.9,-977,-0.757917,4078731,...,0.0,0.000828,0.001068,0.003440,0.024802,-0.068325,-0.079274,36.839793,-107.910963,1862.93348
4106,AZCN,11AUG10,2011.6057,55783,1648,3,-107.9,-977,-0.761025,4078731,...,0.0,0.001333,0.001489,0.005270,0.178436,-0.087887,-0.207460,36.839793,-107.910963,1862.93815
4107,AZCN,11AUG11,2011.6085,55784,1648,4,-107.9,-977,-0.757568,4078731,...,0.0,0.002455,0.002242,0.006725,0.080286,-0.065957,-0.107477,36.839793,-107.910963,1862.92224
4108,AZCN,11AUG12,2011.6112,55785,1648,5,-107.9,-977,-0.761642,4078731,...,0.0,0.008280,0.003808,0.008938,0.032072,-0.074917,-0.182729,36.839793,-107.910963,1862.92939


In [51]:
def fit_timeseries(tlist, ylist):
    """
    Fit a linear model to a displacement timeseries.
    
    Args:
    tlist (numpy.ndarray): 1-D numpy array of decimal years.
    ylist (numpy.ndarray): 1-D numpy array of displacement timeseries.
    
    Returns:
    tuple: A tuple containing the least-squares velocity and its uncertainty.
    """
    A = np.vstack([tlist, np.ones(len(tlist))]).T
    m, c = np.linalg.lstsq(A, ylist, rcond=None)[0]
    
    # Rough uncertainty estimate: residual scatter scaled by 1/sqrt(N)
    # (this is the standard error of a mean, not the formal slope error)
    residuals = ylist - (m * tlist + c)
    sigma = np.std(residuals)
    velocity_uncertainty = sigma / np.sqrt(len(tlist))
    
    return m, velocity_uncertainty


# Sample function usage with sample data:
tlist = tenv3_sample['yyyy.yyyy']
ylist = tenv3_sample['____up(m)']

m, velocity_uncertainty = fit_timeseries(tlist, ylist)

print(f'velocity: {m:0.6f} velocity uncertainty: {velocity_uncertainty:0.6f}')

velocity: -0.000861 velocity uncertainty: 0.000098
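Dividing the residual scatter by sqrt(N) gives the standard error of a mean. For a straight-line fit, the textbook standard error of the slope also depends on how widely the time samples are spread. The following is a minimal sketch under the usual ordinary-least-squares assumptions (independent, equal-variance noise). The helper name `slope_standard_error` is hypothetical.

```python
import numpy as np

def slope_standard_error(t, y):
    """Least-squares slope and its standard error for y = m*t + c."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.vstack([t, np.ones(len(t))]).T
    m, c = np.linalg.lstsq(A, y, rcond=None)[0]
    residuals = y - (m * t + c)
    # Residual variance with N - 2 degrees of freedom (two fitted parameters)
    sigma2 = np.sum(residuals**2) / (len(t) - 2)
    # Var(m) = sigma^2 / sum((t - mean(t))^2)
    m_err = np.sqrt(sigma2 / np.sum((t - t.mean())**2))
    return m, m_err
```

Daily GNSS positions are generally correlated in time, so even this formula is likely to be optimistic for real data.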


In [52]:
def fit_velocities(filename):
    """
    Estimate the E, N, and U components of velocity for a site.
    
    Args:
    filename (str): Name of the file containing timeseries data.
    
    Returns:
    dict: A dictionary containing site name, E, N, and U velocities with uncertainties.
    """
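    # tenv3 files are whitespace-delimited with a header row, so read them with pandas
    data = pd.read_csv(filename, sep=r'\s+')
    tlist = data['yyyy.yyyy'].to_numpy()
    east_displacement = data['__east(m)'].to_numpy()
    north_displacement = data['_north(m)'].to_numpy()
    vertical_displacement = data['____up(m)'].to_numpy()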
    
    east_velocity, east_uncertainty = fit_timeseries(tlist, east_displacement)
    north_velocity, north_uncertainty = fit_timeseries(tlist, north_displacement)
    vertical_velocity, vertical_uncertainty = fit_timeseries(tlist, vertical_displacement)
    
    return {
        'Site Name': filename,
        'E Velocity': east_velocity,
        'N Velocity': north_velocity,
        'U Velocity': vertical_velocity,
        'E Velocity Uncertainty': east_uncertainty,
        'N Velocity Uncertainty': north_uncertainty,
        'U Velocity Uncertainty': vertical_uncertainty
    }

# Inspect the displacement columns that fit_velocities fits:
east_displacement = tenv3_sample['__east(m)']
north_displacement = tenv3_sample['_north(m)']
vertical_displacement = tenv3_sample['____up(m)']

print(north_displacement)

0       0.845796
1       0.843869
2       0.845974
3       0.846115
4       0.845767
          ...   
4105    0.876575
4106    0.883958
4107    0.878386
4108    0.881139
4109    0.877338
Name: _north(m), Length: 4110, dtype: float64


In [53]:
def get_coordinates(filename):
    """
    Get the average latitude, longitude, and elevation for a site.
    
    Args:
    filename (str): Name of the file containing site information.
    
    Returns:
    tuple: A tuple containing latitude, longitude, and elevation.
    """
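    # Read the whitespace-delimited tenv3 file and select the coordinate columns by name
    data = pd.read_csv(filename, sep=r'\s+')
    latitudes = data['_latitude(deg)']
    longitudes = data['_longitude(deg)']
    elevations = data['__height(m)']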
    
    avg_latitude = np.mean(latitudes)
    avg_longitude = np.mean(longitudes)
    avg_elevation = np.mean(elevations)
    
    return avg_latitude, avg_longitude, avg_elevation

In [54]:
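# avg_latitude only exists inside get_coordinates, so call the function and unpack its return value
avg_latitude, avg_longitude, avg_elevation = get_coordinates(file_path)
print(avg_latitude)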

In [None]:
import numpy as np
import pandas as pd
import glob

def fit_timeseries(tlist, ylist):
    """
    Fit a linear model to a displacement timeseries.
    
    Args:
    tlist (numpy.ndarray): 1-D numpy array of decimal years.
    ylist (numpy.ndarray): 1-D numpy array of displacement timeseries.
    
    Returns:
    tuple: A tuple containing the least-squares velocity and its uncertainty.
    """
    A = np.vstack([tlist, np.ones(len(tlist))]).T
    m, c = np.linalg.lstsq(A, ylist, rcond=None)[0]
    
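    # Standard error of the slope: residual scatter (N - 2 degrees of freedom)
    # divided by the spread of the time samples about their mean
    residuals = ylist - (m * tlist + c)
    sigma = np.sqrt(np.sum(residuals**2) / (len(tlist) - 2))
    velocity_uncertainty = sigma / np.sqrt(np.sum((tlist - np.mean(tlist))**2))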
    
    return m, velocity_uncertainty

def fit_velocities(filename):
    """
    Estimate the E, N, and U components of velocity for a site.
    
    Args:
    filename (str): Name of the file containing timeseries data.
    
    Returns:
    dict: A dictionary containing site name, E, N, and U velocities with uncertainties.
    """
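    # tenv3 files are whitespace-delimited with a header row, so read them with pandas
    data = pd.read_csv(filename, sep=r'\s+')
    tlist = data['yyyy.yyyy'].to_numpy()
    east_displacement = data['__east(m)'].to_numpy()
    north_displacement = data['_north(m)'].to_numpy()
    vertical_displacement = data['____up(m)'].to_numpy()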
    
    east_velocity, east_uncertainty = fit_timeseries(tlist, east_displacement)
    north_velocity, north_uncertainty = fit_timeseries(tlist, north_displacement)
    vertical_velocity, vertical_uncertainty = fit_timeseries(tlist, vertical_displacement)
    
    return {
        'Site Name': filename,
        'E Velocity': east_velocity,
        'N Velocity': north_velocity,
        'U Velocity': vertical_velocity,
        'E Velocity Uncertainty': east_uncertainty,
        'N Velocity Uncertainty': north_uncertainty,
        'U Velocity Uncertainty': vertical_uncertainty
    }

def get_coordinates(filename):
    """
    Get the average latitude, longitude, and elevation for a site.
    
    Args:
    filename (str): Name of the file containing site information.
    
    Returns:
    tuple: A tuple containing latitude, longitude, and elevation.
    """
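    # Read the whitespace-delimited tenv3 file and select the coordinate columns by name
    data = pd.read_csv(filename, sep=r'\s+')
    latitudes = data['_latitude(deg)']
    longitudes = data['_longitude(deg)']
    elevations = data['__height(m)']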
    
    avg_latitude = np.mean(latitudes)
    avg_longitude = np.mean(longitudes)
    avg_elevation = np.mean(elevations)
    
    return avg_latitude, avg_longitude, avg_elevation

def fit_all_velocities(folder, pattern):
    """
    Fit velocities and collect site information for all files matching the pattern in a folder.
    
    Args:
    folder (str): Name of the folder containing the data files.
    pattern (str): Glob pattern to filter files.
    
    Returns:
    pandas.DataFrame: A DataFrame containing site name, coordinates, velocities, and uncertainties.
    """
    file_list = glob.glob(f"{folder}/{pattern}")
    
    data_list = []
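    for filename in file_list:
        # get_coordinates returns a tuple, so build a dict before merging in the velocities
        lat, lon, elev = get_coordinates(filename)
        site_info = {'Latitude': lat, 'Longitude': lon, 'Elevation': elev}
        site_info.update(fit_velocities(filename))
        data_list.append(site_info)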
    
    df = pd.DataFrame(data_list)
    return df

if __name__ == "__main__":
    # Example run on the lab's GNSS folder; adjust the path and pattern to your data
    folder_path = 'timeseries'
    file_pattern = '*.tenv3'
    result_df = fit_all_velocities(folder_path, file_pattern)
    print(result_df)

### 3. Upload the module to GitHub, along with a README.md file explaining briefly how to use it.

Enter a link to your GitHub repository here for me to check out: 

GitHub: [AdrianMarzil](https://github.com/AdrianMarzil)

### 4. Use the timeseries calculation module you created

Using at most 5 lines of code, import the module you created above and use it to estimate the velocities for all 10 of the sites, print them out, and save the results to a new file 'site_velocities.csv'. Feel free to download more sites as well and put them in the folder too!
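One way to stay within five lines is to let `fit_all_velocities` return the DataFrame and hand it straight to `DataFrame.to_csv`. The sketch below covers only the print-and-save step. The module name `gnss_module` in the comment and the helper `save_velocities` are hypothetical, and the demonstration table holds made-up placeholder values, not results.

```python
import os
import tempfile
import pandas as pd

def save_velocities(df, out_path="site_velocities.csv"):
    """Print the velocity table and write it to CSV without the row index."""
    print(df.to_string(index=False))
    df.to_csv(out_path, index=False)
    return out_path

# In the lab this would be roughly:
#   import gnss_module  (hypothetical module name)
#   df = gnss_module.fit_all_velocities("timeseries", "*.tenv3")
#   save_velocities(df)

# Placeholder table (made-up numbers) so the sketch runs on its own
demo = pd.DataFrame({"Site Name": ["SITE_A", "SITE_B"],
                     "E Velocity": [0.001, -0.002]})
saved = save_velocities(demo, os.path.join(tempfile.mkdtemp(), "site_velocities.csv"))
```

Passing `index=False` keeps the pandas row index out of the file, so the CSV contains only the site columns.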


### 5. Re-use your module to estimate sea level rise rates

Go to the following page and download at least 5 monthly sea level timeseries spanning at least 100 years: https://psmsl.org/products/gloss/glossmap.html. Place them in a new folder.

(To download the data: click a station icon on the map, then click the station number/name (first link in the pop-up, e.g. "155: Honolulu"). Then right-click the link next to the plot of monthly data ("Download monthly mean sea level data.") and save it as a file.)

Now, create a new function "fit_tide_gauge" in your module that re-uses your function "fit_timeseries" to return the relative sea level rate of change for a given station. 
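A minimal sketch of `fit_tide_gauge` follows. It assumes the downloaded monthly files are semicolon-separated, with decimal year in the first column, monthly mean sea level in millimetres in the second, two flag columns after that, and -99999 marking missing months; this layout should be confirmed against the files actually downloaded. The line fit is inlined as a stand-in `fit_timeseries` so the block runs on its own; in the module, the existing function would be re-used instead.

```python
import numpy as np
import pandas as pd

def fit_timeseries(tlist, ylist):
    """Stand-in for the module's fit_timeseries: slope and its standard error."""
    A = np.vstack([tlist, np.ones(len(tlist))]).T
    m, c = np.linalg.lstsq(A, ylist, rcond=None)[0]
    residuals = ylist - (m * tlist + c)
    sigma = np.sqrt(np.sum(residuals**2) / (len(tlist) - 2))
    return m, sigma / np.sqrt(np.sum((tlist - tlist.mean())**2))

def fit_tide_gauge(filename, missing=-99999):
    """Return the relative sea level rate (mm/yr) and its uncertainty."""
    # Assumed layout: decimal year; monthly mean (mm); two flag columns
    data = pd.read_csv(filename, sep=";", header=None,
                       names=["year", "msl_mm", "flag_a", "flag_b"])
    good = data["msl_mm"] != missing  # drop months marked as missing
    t = data.loc[good, "year"].to_numpy(dtype=float)
    y = data.loc[good, "msl_mm"].to_numpy(dtype=float)
    return fit_timeseries(t, y)
```

Filtering the missing-value sentinel before fitting matters: a single -99999 left in the series would dominate the least-squares fit.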

Next, modify your function "fit_all_velocities" to accept a "type" parameter (GNSS or tide), and re-use it to estimate the rates for all the tide gauges you downloaded. Print out the results below.
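One way to add the type parameter is a small lookup from type name to fitting function, so the glob loop stays identical for both data sets. In this sketch the parameter is called `data_type` to avoid shadowing Python's built-in `type`, the function is named `fit_all`, and the fitters are passed in as a dictionary. All three choices are assumptions, not part of the assignment.

```python
import glob
import os
import pandas as pd

def fit_all(folder, pattern, data_type, fitters):
    """Apply fitters[data_type] to every file matching folder/pattern.

    fitters maps a type name (e.g. 'GNSS', 'tide') to a function that takes
    a filename and returns a dict of results for that file.
    """
    if data_type not in fitters:
        raise ValueError(f"unknown data_type {data_type!r}; expected one of {sorted(fitters)}")
    rows = []
    for filename in sorted(glob.glob(os.path.join(folder, pattern))):
        row = {"File": os.path.basename(filename)}
        row.update(fitters[data_type](filename))
        rows.append(row)
    return pd.DataFrame(rows)

# Possible usage (folder name and file pattern are hypothetical):
# fitters = {"GNSS": fit_velocities,
#            "tide": lambda f: {"Rate": fit_tide_gauge(f)[0]}}
# print(fit_all("tide_gauges", "*.rlrdata", "tide", fitters))
```

Raising `ValueError` for an unrecognized type makes a misspelled argument fail immediately instead of silently returning an empty table.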

Finally, update your github repository with this new version of the module.
