## CLAUDE PYRATE BACKEND WORKFLOW

1. Priors:

Priors represent our initial beliefs about the parameters before observing the data. In PyRate, priors are used for both preservation and birth-death models.

For both Preservation and Birth-Death models:

```python
import scipy.stats
from numpy import log

def get_hyper_priorBD(timesL, timesM, L, M, T, hyperP):
    # Calculate prior probability for birth-death model parameters
    
    # Sum the log probabilities of speciation rates (L) under a gamma distribution
    priorBD = sum(prior_gamma(L, hyperP[0], hyperP[1]))
    
    # Add the sum of log probabilities of extinction rates (M) under the same gamma distribution
    priorBD += sum(prior_gamma(M, hyperP[0], hyperP[1]))
    
    # Add a uniform prior on rate shift times (negative log of T for each shift)
    priorBD += -log(T) * (len(L)-1 + len(M)-1)
    
    # Return the total log prior probability
    return priorBD

def prior_gamma(L, a, b):
    # Calculate the log probability density of L under a gamma distribution
    # a is the shape parameter, b is the rate parameter (1/scale)
    return scipy.stats.gamma.logpdf(L, a, scale=1./b)
```

Important notes about priors:
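- Priors are set before seeing the data. Their form and shape remain fixed during the analysis; only when a hyperprior is requested is a prior's rate parameter itself estimated.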
- They do not change based on the likelihood calculation.
- The prior stays as the user-inputted prior (or the programmed default prior) during the entire MCMC sampling process.
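As a minimal sketch of the gamma log-prior described above, the density can be written out with the standard library alone. The names `gamma_logpdf` and `log_prior_rates` and the example rates are illustrative assumptions, not PyRate code; the Gamma(1.1, 1.1) values are chosen only for demonstration.

```python
import math

def gamma_logpdf(x, shape, rate):
    """Log density of Gamma(shape, rate) at x > 0 (rate = 1/scale)."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def log_prior_rates(rates, shape, rate):
    """Sum of independent gamma log-priors over a list of rates."""
    return sum(gamma_logpdf(r, shape, rate) for r in rates)

# Hypothetical speciation and extinction rates (events per lineage per Myr)
speciation = [0.4, 0.2]
extinction = [0.1, 0.3]
log_prior = (log_prior_rates(speciation, 1.1, 1.1)
             + log_prior_rates(extinction, 1.1, 1.1))
print(round(log_prior, 3))
```

Because the prior enters on the log scale, the contributions of the individual rates are added rather than multiplied.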

2. Likelihood:

The likelihood represents the probability of observing the data given the model parameters. In PyRate, there are separate likelihood calculations for preservation models and birth-death models.

For Preservation Models:

```python
def preservation_likelihood(fossil_times, q, model='HPP'):
    if model == 'HPP':
        # Homogeneous Poisson Process
        return HPP_likelihood(fossil_times, q)
    elif model == 'NHPP':
        # Non-homogeneous Poisson Process
        return NHPP_likelihood(fossil_times, q)
    elif model == 'TPP':
        # Time-variable Poisson Process
        return TPP_likelihood(fossil_times, q)

def HPP_likelihood(fossil_times, q):
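    # Log-likelihood for a Homogeneous Poisson Process: k*log(q) - q*duration
    # Simplification: the duration is taken as the span between the first and last
    # fossil occurrence rather than the lineage's full lifespan
    return len(fossil_times) * log(q) - q * (max(fossil_times) - min(fossil_times))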

def NHPP_likelihood(fossil_times, q):
    # Calculate likelihood for Non-homogeneous Poisson Process
    # This is a simplified version; actual implementation is more complex
    return sum([log(q(t)) for t in fossil_times]) - integrate(q, min(fossil_times), max(fossil_times))

def TPP_likelihood(fossil_times, q):
    # Calculate likelihood for Time-variable Poisson Process
    # This is a simplified version; actual implementation is more complex
    return sum([log(q[time_bin(t)]) for t in fossil_times]) - sum([q[i] * bin_width for i, bin_width in enumerate(time_bins)])

def NHPP_lik(arg):
    # Unpack the arguments
    [m, M, shapeGamma, q_rate, i, cov_par, ex_rate] = arg
    
    # Get fossil occurrences for species i
    x = fossil[i]
    
    # Initialize log-likelihood
    lik = 0
    
    # Count number of fossil occurrences for this species
    k = len(x[x>0])
    
    # ... (calculation details omitted for brevity)
    
    # Calculate and return the log-likelihood
    return -q*(M-m) + sum(logPERT4_density(M,m,a,b,x)+log(q)) - log(1-exp(-q*(M-m)))
```

For Birth-Death Models:

```python
def birth_death_likelihood(speciation_times, extinction_times, L, M):
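    # Log-likelihood sketch for a birth-death process; L and M are functions of
    # time returning the speciation and extinction rates
    lik = sum([log(L(t)) for t in speciation_times])
    lik += sum([log(M(t)) for t in extinction_times])
    # `integrate` is a placeholder for a numerical integrator; the summed rate is
    # wrapped in a lambda because two functions cannot be added directly
    lik -= integrate(lambda t: L(t) + M(t), min(speciation_times), max(speciation_times))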
    return lik

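# Alternative sketch with a survival-probability term. Reusing the name overrides
# the definition above; the event times are passed in explicitly so the loops
# below do not rely on undefined globals.
def birth_death_likelihood(speciation_times, extinction_times, speciation_rate, extinction_rate, start_time, end_time):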
    # Calculate the probability of survival to the present
    survival_prob = calculate_survival_probability(speciation_rate, extinction_rate, start_time, end_time)
    
    # Calculate the likelihood of the speciation events
    speciation_likelihood = 0
    for t in speciation_times:
        speciation_likelihood += log(speciation_rate(t))
    
    # Calculate the likelihood of the extinction events
    extinction_likelihood = 0
    for t in extinction_times:
        extinction_likelihood += log(extinction_rate(t))
    
    # Combine all components
    log_likelihood = speciation_likelihood + extinction_likelihood + log(survival_prob)
    
    return log_likelihood
```

3. Poisson Process:

The Poisson process is primarily used in the preservation models to describe the fossil occurrence process. The likelihood functions for HPP, NHPP, and TPP are different implementations of Poisson processes.

```python
def NHPP_likelihood(fossil_times, preservation_rate, start_time, end_time):
    # Calculate the cumulative preservation rate
    cumulative_rate = integrate_preservation_rate(preservation_rate, start_time, end_time)
    
    # Calculate the log-likelihood
    log_likelihood = 0
    for t in fossil_times:
        # Add log of preservation rate at each fossil occurrence time
        log_likelihood += log(preservation_rate(t))
    
    # Subtract the cumulative preservation rate
    log_likelihood -= cumulative_rate
    
    return log_likelihood
```
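The NHPP sketch above leaves the integration step abstract. A self-contained version, assuming a trapezoidal rule for the integral, could look like the following; `nhpp_log_likelihood` and the example fossil ages are illustrative and not taken from PyRate.py.

```python
import math

def nhpp_log_likelihood(fossil_times, rate_fn, start, end, n_steps=1000):
    """Log-likelihood of event times under a Poisson process with rate rate_fn(t).

    log L = sum_i log(rate(t_i)) - integral of rate(t) from start to end,
    with the integral approximated by the trapezoidal rule.
    """
    step = (end - start) / n_steps
    values = [rate_fn(start + i * step) for i in range(n_steps + 1)]
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return sum(math.log(rate_fn(t)) for t in fossil_times) - integral

# Hypothetical fossil ages (Myr) for a lineage spanning 1 to 6 Myr
times = [2.0, 3.5, 5.0]
constant = nhpp_log_likelihood(times, lambda t: 0.8, 1.0, 6.0)

# With a constant rate the result reduces to the HPP form k*log(q) - q*duration
hpp = len(times) * math.log(0.8) - 0.8 * (6.0 - 1.0)
print(abs(constant - hpp) < 1e-9)
```

Checking the constant-rate case against the HPP expression is a quick sanity test before plugging in a time-varying rate function.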

4. Markov Process:

The birth-death process, which models speciation and extinction, is a continuous-time Markov process. The `birth_death_likelihood` function represents this process.
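For constant rates, one common way to write a birth-death log-likelihood over a set of lineages is sketched below: each lineage contributes one speciation event, each extinct lineage one extinction event, and every lineage a waiting-time term over its lifespan. This is an illustration under those assumptions; the text above does not confirm that PyRate.py uses exactly this expression, and `bd_log_likelihood` is a hypothetical name.

```python
import math

def bd_log_likelihood(origination, extinction, birth_rate, death_rate):
    """Constant-rate birth-death log-likelihood over a set of lineages.

    origination and extinction are ages in Myr before present
    (origination >= extinction); an extinction age of 0 marks an extant lineage.
    """
    n_births = len(origination)                       # one speciation event per lineage
    n_deaths = sum(1 for te in extinction if te > 0)  # extant lineages add no death event
    total_time = sum(ts - te for ts, te in zip(origination, extinction))
    return (n_births * math.log(birth_rate)
            + n_deaths * math.log(death_rate)
            - (birth_rate + death_rate) * total_time)

# Hypothetical lineages: one extinct at 6 Myr, two still extant
ts = [10.0, 8.0, 4.0]
te = [6.0, 0.0, 0.0]
print(round(bd_log_likelihood(ts, te, 0.2, 0.1), 4))
```

The Markov property shows up in the waiting-time term: the chance of an event depends only on the current rates and the time elapsed, not on the lineage's earlier history.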

5. Posterior:

The posterior combines the prior and likelihood, representing our updated beliefs about the parameters after observing the data.

```python
def calculate_posterior(fossil_data, preservation_params, birth_death_params): # Take fossil data and parameter values
    # Calculate preservation likelihood
    pres_lik = preservation_likelihood(fossil_data, preservation_params, model=chosen_preservation_model)
    
    # Calculate birth-death likelihood
    bd_lik = birth_death_likelihood(speciation_times, extinction_times, birth_death_params['L'], birth_death_params['M'])
    
    # Calculate priors
    pres_prior = prior_preservation(preservation_params)
    bd_prior = get_hyper_priorBD(timesL, timesM, birth_death_params['L'], birth_death_params['M'], T, hyperP)
    
    # Calculate posterior
    posterior = pres_lik + bd_lik + pres_prior + bd_prior
    
    return posterior

def calculate_posterior(fossil_data, speciation_extinction_params):
    # Calculate likelihood using Poisson process for fossils
    fossil_likelihood = NHPP_likelihood(fossil_data, preservation_rate)
    
    # Calculate likelihood using birth-death (Markov) process
    # (stored under a new name so the function itself is not shadowed)
    bd_likelihood = birth_death_likelihood(speciation_rate, extinction_rate)
    
    # Combine likelihoods
    total_likelihood = fossil_likelihood + bd_likelihood
    
    # Calculate prior probabilities
    prior = calculate_prior(speciation_extinction_params)
    
    # Posterior is proportional to likelihood times prior, so the log terms are added
    log_posterior = total_likelihood + prior
    
    return log_posterior
```

6. MCMC Sampling:

MCMC is used to sample from the posterior distribution of the parameters.

```python
def MCMC(fossil_data, initial_params, n_iterations, burn_in=0):
    samples = []  # posterior samples kept after burn-in
    current_params = initial_params
    current_posterior = calculate_posterior(fossil_data, current_params)
    
    for i in range(n_iterations):
        # Propose new parameters
        proposed_params = propose_new_params(current_params)
        
        # Calculate log posterior for proposed parameters
        proposed_posterior = calculate_posterior(fossil_data, proposed_params)
        
        # Accept or reject proposal (Metropolis rule on the log scale,
        # which assumes a symmetric proposal)
        if log(random()) < proposed_posterior - current_posterior:
            current_params = proposed_params
            current_posterior = proposed_posterior
        
        # Store samples (after burn-in)
        if i >= burn_in:
            samples.append(current_params)
    
    return samples

def MCMC(arg):
    # ... (initialization code omitted)
    
    for iteration in range(n_iterations):
        # Propose new parameter values
        proposed_params = propose_new_params(current_params)
        
        # Calculate likelihood and prior for proposed parameters
        proposed_likelihood = calculate_likelihood(proposed_params)
        proposed_prior = calculate_prior(proposed_params)
        proposed_posterior = proposed_likelihood + proposed_prior
        
        # Calculate acceptance probability
        acceptance_prob = min(1, exp(proposed_posterior - current_posterior))
        
        # Accept or reject the proposal
        if random.random() < acceptance_prob:
            current_params = proposed_params
            current_posterior = proposed_posterior
        
        # Store samples (after burn-in)
        if iteration > burn_in:
            store_sample(current_params)
    
    return samples
```
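The sketches above depend on helpers that are not defined here. A self-contained toy version, assuming a single preservation rate `q` with an HPP likelihood and a gamma prior, shows the same accept/reject loop end to end. The data values (10 fossils over 5 Myr), the Gamma(1.5, 1.5) prior, and all function names are illustrative assumptions rather than PyRate internals.

```python
import math
import random

def log_posterior(q, n_fossils, duration, shape, rate):
    """Unnormalized log posterior: HPP log-likelihood plus gamma log-prior."""
    if q <= 0:
        return -math.inf  # rates must be positive
    log_lik = n_fossils * math.log(q) - q * duration
    log_prior = (shape - 1.0) * math.log(q) - rate * q  # constants dropped
    return log_lik + log_prior

def metropolis(n_fossils, duration, shape, rate,
               n_iter=20000, burn_in=2000, step=0.5, seed=1):
    rng = random.Random(seed)
    q = 1.0
    current = log_posterior(q, n_fossils, duration, shape, rate)
    samples = []
    for i in range(n_iter):
        proposal = q + rng.gauss(0.0, step)  # symmetric random-walk proposal
        proposed = log_posterior(proposal, n_fossils, duration, shape, rate)
        # Accept with probability min(1, exp(proposed - current));
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < proposed - current:
            q, current = proposal, proposed
        if i >= burn_in:
            samples.append(q)
    return samples

samples = metropolis(n_fossils=10, duration=5.0, shape=1.5, rate=1.5)
posterior_mean = sum(samples) / len(samples)
print(round(posterior_mean, 2))
```

In this toy case the posterior is known in closed form, Gamma(shape + n_fossils, rate + duration), so the sampled mean can be compared against (1.5 + 10) / (1.5 + 5) as a correctness check.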

7. Maximum Likelihood Test for Preservation Models (Silvestro et al. 2019):

This test is used to choose the most appropriate preservation model before the main Bayesian analysis.

```python
def preservation_model_selection(fossil_data):
    models = ['HPP', 'NHPP', 'TPP']
    AICs = []
    
    for model in models:
        # Find maximum likelihood parameters
        q_ML = optimize(preservation_likelihood, fossil_data, model)
        
        # Calculate maximum likelihood
        ML = preservation_likelihood(fossil_data, q_ML, model)
        
        # Calculate AIC
        k = number_of_parameters(model)
        AIC = 2*k - 2*ML
        AICs.append(AIC)
    
    # Select model with lowest AIC
    best_model = models[np.argmin(AICs)]
    return best_model
```

This test affects the analysis by selecting the best-fitting preservation model (HPP, NHPP, or TPP) based on AIC scores. The selected model is then used in the main Bayesian analysis. It does not change the priors or alter the fundamental Bayesian approach of the analysis.
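The AIC comparison can be illustrated with a few lines of arithmetic. The log-likelihoods and parameter counts below are made-up numbers used only to show the calculation; they are not output from PyRate.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate better support."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximized log-likelihoods and parameter counts per model
fits = {"HPP": (-120.4, 1), "NHPP": (-115.2, 1), "TPP": (-113.9, 4)}

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, round(score, 1))
print("best:", best)
```

Here TPP has the highest likelihood but pays a penalty for its extra parameters, so the simpler NHPP ends up with the lowest AIC.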

8. Marginal Likelihood:

While not directly calculated in the main MCMC process, marginal likelihood is estimated for model comparison using thermodynamic integration:

```python
def marginal_likelihood(marginal_file, l, t):
    # Initialize marginal likelihood
    mL = 0
    
    # Integrate the likelihood over different temperatures
    for i in range(len(l)-1):
        # Use trapezoidal rule for integration
        mL += ((l[i]+l[i+1])/2.) * (t[i]-t[i+1])
    
    # Print and return the estimated marginal likelihood
    print("\n Marginal likelihood:", mL)
    return mL
```
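A standalone version of the same trapezoidal rule, with made-up inputs, makes the arithmetic easy to follow. The function name and the three temperature steps are illustrative; a real analysis would use more steps and likelihoods averaged from the corresponding MCMC chains.

```python
def thermodynamic_integration(mean_log_liks, temperatures):
    """Trapezoidal estimate of the log marginal likelihood.

    mean_log_liks[i] is the mean log-likelihood sampled at temperatures[i];
    temperatures run from 1 (posterior) down to 0 (prior).
    """
    total = 0.0
    for i in range(len(mean_log_liks) - 1):
        width = temperatures[i] - temperatures[i + 1]
        total += 0.5 * (mean_log_liks[i] + mean_log_liks[i + 1]) * width
    return total

# Hypothetical mean log-likelihoods at temperatures 1, 0.5 and 0
estimate = thermodynamic_integration([-10.0, -12.0, -16.0], [1.0, 0.5, 0.0])
print(estimate)  # -12.5
```

The two trapezoids contribute -11 * 0.5 and -14 * 0.5, which sum to -12.5.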

Conclusion:
PyRate uses a Bayesian framework to infer speciation and extinction rates from fossil data. It models preservation as a Poisson process and diversification as a birth-death Markov process. An optional maximum likelihood test can first select the best preservation model; MCMC sampling then estimates the posterior distributions of the parameters. Unless a hyperprior is used, the priors remain fixed throughout, and the posterior is proportional to the product of the prior and the likelihood. This approach estimates diversification rates while accounting for the incompleteness and uncertainty of the fossil record.

Here is the detailed step-by-step guide for using PyRate, including the code for each step, explanations, and notes on file creation and user input. The workflow differentiates between Preservation Models and Birth-Death Models, as well as NHPP, TPP, HPP, RJMCMC, and BDMCMC models.

## PERPLEXITY: Step-by-Step Guide for Using PyRate

### 1. Feeding Fossil Occurrence Data

**Example:**
We have fossil occurrence data for 20 different species of the genus *Canis*. Each species has a certain number of fossils and their range (date of occurrence in millions of years).

**Code (R):**
```r
# Load the PyRate utilities in R
source(".../PyRate-master/pyrate_utilities.r")

# Define extant species
extant_dogs = c("Canis rufus", "Canis lupus", "Canis aureus", "Canis latrans", "Canis mesomelas", "Canis anthus", "Pseudalopex gymnocercus", "Canis adustus", "Canis familiaris")

# Parse the raw data and generate PyRate input file
extract.ages.pbdb(file= ".../Canis_pbdb_data.csv", extant_species=extant_dogs)
```
**Explanation:**
- **`source(...)`:** Load the PyRate utility functions.
- **`extant_dogs`:** Define a vector of extant species.
- **`extract.ages.pbdb(...)`:** Parse the raw data from the Paleobiology Database (PBDB) and generate a PyRate-compatible input file.

**Files Created:**
- `Canis_pbdb_data_TaxonList.txt`: List of all species included in the dataset.
- `Canis_pbdb_data_PyRate.py`: Python file with all occurrences formatted for a PyRate analysis.

### 2. Model Parameters

**Preservation Model:**
- **HPP**: Only one set rate, so we only have one parameter: Preservation rate (q).

**Code (Terminal):**
```bash
# Define the preservation model as HPP
python PyRate.py .../Canis_pbdb_data_PyRate.py -mHPP
```
**Explanation:**
- **Command:** Run PyRate with the HPP model, which assumes a constant preservation rate over time.

**Files Created:**
- None directly from this command, but it will generate output files during the analysis.

**TPP:**
- Preservation rate (q) for each time window specified in the input data.

**Code (Terminal):**
```bash
# Define the preservation model as TPP with time windows
python PyRate.py .../Canis_pbdb_data_PyRate.py -qShift .../epochs_q.txt
```
**Explanation:**
- **Command:** Run PyRate with the TPP model, which allows preservation rates to vary across predefined time windows.

**Files Created:**
- None directly from this command, but it will generate output files during the analysis.

**Birth-Death Model:**
- Needs speciation (S) and extinction (E) parameters.

**Code (Terminal):**
```bash
# Run the birth-death analysis with the BDMCMC algorithm (-A 2)
python PyRate.py .../Canis_pbdb_data_PyRate.py -A 2
```
**Explanation:**
- **Command:** Run PyRate with the BDMCMC algorithm (`-A 2`), which estimates speciation and extinction rates under the birth-death model. Omitting the flag selects the default RJMCMC algorithm.

**Files Created:**
- None directly from this command, but it will generate output files during the analysis.

### 3. Define Prior Probability Distributions (Priors P(A))

**Example Preservation Model Prior:**
**Code (Terminal):**
```bash
# Set prior for preservation rate
python PyRate.py .../Canis_pbdb_data_PyRate.py -pP 1.5 0.3
```
**Explanation:**
- **Command:** Set the prior distribution for the preservation rate using a gamma distribution with shape 1.5 and rate 0.3.

**Example Birth-Death Model Prior:**
**Code (Terminal):**
```bash
# Set priors for speciation and extinction rates
# (-pL and -pM are the prior flags for these rates; confirm with `python PyRate.py -h`)
python PyRate.py .../Canis_pbdb_data_PyRate.py -pL 2 0.5 -pM 2 0.5
```
**Explanation:**
- **Command:** Set the prior distributions for the speciation and extinction rates using gamma distributions with shape 2 and rate 0.5.

### 4. Calculate Likelihood P(B|A)

**Preservation Model:**
**Code (Terminal):**
```bash
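# Print summary statistics of the input data (no likelihood is reported)
python PyRate.py .../Canis_pbdb_data_PyRate.py -mHPP -data_info
```
**Explanation:**
- **Command:** The `-data_info` flag prints summary statistics of the input data. There is no separate command for the preservation likelihood; PyRate computes the likelihood of the observed fossil data given the preservation parameters internally at each MCMC iteration.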

**Files Created:**
- None directly from this command, but it will generate output files during the analysis.

**Birth-Death Model:**
**Code (Terminal):**
```bash
# Print summary statistics of the input data (no likelihood is reported)
python PyRate.py .../Canis_pbdb_data_PyRate.py -A 2 -data_info
```
**Explanation:**
- **Command:** As above, `-data_info` only summarizes the input data. The birth-death likelihood given the speciation and extinction parameters is computed internally at each MCMC iteration.

**Files Created:**
- None directly from this command, but it will generate output files during the analysis.

### 5. Marginal Likelihood (Model Comparison)

**Example:**
**Code (Terminal):**
```bash
# Run model comparison across preservation models
python PyRate.py .../Canis_pbdb_data_PyRate.py -qShift .../epochs_q.txt -PPmodeltest
```
**Explanation:**
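- **Command:** Run a maximum likelihood test that compares the preservation models (NHPP, HPP, TPP) to find the best fit for the data. Despite this step's heading, the test does not estimate marginal likelihoods.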

**Files Created:**
- None directly from this command, but it will generate output files during the analysis.

### 6. Update Posterior

**Theoretical Step:**
Use Bayes' theorem to combine the prior and the likelihood to form the posterior distribution.
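
Using the notation of the step headings, with A standing for the model parameters and B for the fossil data, the update can be written out as follows. On the log scale, which is what an MCMC sampler typically works with, the product becomes a sum and the normalizing constant P(B) can be ignored when comparing parameter values.

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \;\propto\; P(B \mid A)\,P(A),
\qquad
\log P(A \mid B) = \log P(B \mid A) + \log P(A) + \text{const.}
$$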

### 7. Sample Posterior

**Computational Step:**
Use MCMC to generate samples from the posterior distribution and get specific estimates for the parameters.

**Code (Terminal):**
```bash
# Run MCMC to sample from the posterior distribution
python PyRate.py .../Canis_pbdb_data_PyRate.py -A 2 -mHPP -mG -n 20000000 -s 5000
```
**Explanation:**
- **Command:** Run MCMC with 20 million iterations, sampling every 5,000 iterations, to generate samples from the posterior distribution.

**Files Created:**
- `pyrate_mcmc_logs`: Folder containing log files with posterior samples.

### 8. Posterior of Preservation Model

**Example:**
**Code (Terminal):**
```bash
# Posterior distribution for preservation rate
# This will be part of the output files generated by the MCMC run
```
**Explanation:**
- **Command:** The posterior distribution for the preservation rate will be included in the output files generated by the MCMC run.

### 9. Posterior of Birth-Death Model

**Example:**
**Code (Terminal):**
```bash
# Posterior distribution for speciation and extinction rates
# This will be part of the output files generated by the MCMC run
```
**Explanation:**
- **Command:** The posterior distribution for the speciation and extinction rates will be included in the output files generated by the MCMC run.

### 10. Model Estimations

**Example:**
Based on the posterior distributions, we might estimate:
- **Net diversification rate**: 0.3 species per million years (speciation rate - extinction rate)
- **Expected species duration**: 285,714 years (1 / extinction rate)
- **Expected number of fossils per species**: 143 fossils (preservation rate * expected species duration)
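
The arithmetic behind these derived quantities can be reproduced directly. The three rates below are hypothetical values back-calculated so that the results match the figures quoted above; they are not estimates from a real analysis.

```python
# Hypothetical posterior mean rates, chosen to reproduce the figures above
speciation_rate = 3.8      # events per lineage per Myr
extinction_rate = 3.5      # events per lineage per Myr
preservation_rate = 500.0  # fossils per lineage per Myr

net_diversification = speciation_rate - extinction_rate
duration_myr = 1.0 / extinction_rate
expected_fossils = preservation_rate * duration_myr

print(round(net_diversification, 3))  # 0.3
print(round(duration_myr * 1e6))      # 285714 (years)
print(round(expected_fossils))        # 143
```

In practice these quantities would be computed for every posterior sample rather than from the means, so that each one carries its own credible interval.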

### 11. Hyperpriors

**Example:**
If used, define hyperpriors for the parameters of the prior distributions.

**Code (Terminal):**
```bash
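# Set hyperpriors for the parameters of the prior distributions
# (-pL and -pM are the prior flags for speciation and extinction; confirm with `python PyRate.py -h`)
python PyRate.py .../Canis_pbdb_data_PyRate.py -pP 1.5 0 -pL 2 0 -pM 2 0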
```
**Explanation:**
- **Command:** Set hyperpriors for the parameters of the prior distributions, allowing the rate parameter to be estimated from the data.

### Summary

This step-by-step guide provides a clear workflow for using PyRate, including the code for each step. The code snippets are based on the PyRate tutorial and are designed to help you accomplish each step in the analysis process. The guide also notes when new files are created, where they are saved, and which files feed into which steps. User input is needed for defining extant species, setting prior distributions, and specifying model parameters.


# CLAUDE: USER GUIDE
Here is the updated code to complete the PyRate Tutorial #1, with additional comments explaining how the PyRate.py code is working behind the scenes:

# Generate PyRate input file (option 1)

1. Download fossil occurrences for a clade from the Paleobiology Database.
   - Search for the genus Canis and save it as Canis_pbdb_data.csv
   - Check "Show accepted names only" and uncheck "Include metadata at the beginning of the output"

2. Launch R by opening a Terminal window and typing `R`, or use the R GUI app or RStudio.

3. Load the pyrate_utilities.r file in R:
   ```R
   # Load the pyrate_utilities.r file to access utility functions for PyRate
   # This file contains functions to parse and process the input data for PyRate
   source("../PyRate-master/pyrate_utilities.r")
   ```

4. Check which species are extant today and define a vector of extant species:
   ```R
   # Define a vector of extant Canis species
   # This information is used to correctly assign the status of each species (extinct or extant)
   extant_dogs = c("Canis rufus","Canis lupus","Canis aureus","Canis latrans","Canis mesomelas","Canis anthus","Pseudalopex gymnocercus","Canis adustus","Canis familiaris")
   ```

5. Parse the raw data and generate PyRate input file:
   ```R
   # Extract fossil occurrence data from PBDB raw table and save it in a PyRate-compatible input file
   # The extract.ages.pbdb function from pyrate_utilities.r is used to process the data
   # It extracts the fossil occurrences, assigns species status, and generates the input file
   extract.ages.pbdb(file= "Canis_pbdb_data.csv", extant_species=extant_dogs)
   ```
   This function does not check for synonyms or typos. The `replicates` option (default 1) resamples the ages of each fossil occurrence from the respective temporal range. The `cutoff` option removes occurrences with an age range greater than the specified value.

6. Check PyRate input files in the Terminal:
   ```bash
   # Browse to the PyRate folder
   cd "../PyRate-master"
   
   # Launch PyRate with the following arguments to get summary statistics
   # The -data_info flag is used to get a summary of the input data
   python PyRate.py 'Canis_pbdb_data_PyRate.py' -data_info
   ```
   This command checks the generated input files and provides summary statistics.

# Generate PyRate input file (option 2)

1. Prepare a fossil occurrence table with 4 columns: Taxon name, Status (extinct or extant), Minimum age, and Maximum age. An additional column can be included for a trait value (e.g., body mass).

2. Launch R as explained above.

3. Load the pyrate_utilities.r file as explained above.

4. Parse the raw data and generate PyRate input file:
   ```R
   # Parse the raw data and generate PyRate input file with 10 replicates
   # The extract.ages function from pyrate_utilities.r is used to process the data
   # It extracts the fossil occurrences, assigns species status, and generates the input file
   extract.ages(file="../PyRate/example_files/Ursidae.txt", replicates=10)
   ```
   This function includes `replicates` and `cutoff` options similar to `extract.ages.pbdb()`.

### Accounting for age dependence in multiple occurrences from the same site
If a fossil assemblage or site contains several occurrences, they should be considered as coeval when randomizing their age within the temporal range. This can be done by specifying a "Site" column in the input data with a number indicating the assemblage ID for each occurrence.

*pyrate_utilities.r* (`extract.ages()`) will automatically resample fossil ages by site when a "Site" column is included in the input table.
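
The idea of keeping occurrences from one site coeval can be sketched in Python by sharing a single random draw per site. This is an assumption about the general approach, not a port of `extract.ages()`; the function name, the tuple layout, and the age ranges are all hypothetical.

```python
import random

def resample_ages_by_site(occurrences, seed=None):
    """Draw one age per occurrence, reusing one uniform draw within each site.

    occurrences: list of (taxon, min_age, max_age, site_id) tuples.
    Returns a list of (taxon, age) tuples.
    """
    rng = random.Random(seed)
    draw_by_site = {}
    resampled = []
    for taxon, min_age, max_age, site in occurrences:
        if site not in draw_by_site:
            draw_by_site[site] = rng.random()  # one draw shared by the whole site
        u = draw_by_site[site]
        resampled.append((taxon, min_age + u * (max_age - min_age)))
    return resampled

# Hypothetical occurrences: the first two come from the same site
occurrences = [
    ("Canis_lupus", 0.8, 1.8, 1),
    ("Canis_latrans", 0.8, 1.8, 1),
    ("Canis_rufus", 2.6, 5.3, 2),
]
ages = resample_ages_by_site(occurrences, seed=42)
for taxon, age in ages:
    print(taxon, round(age, 3))
```

Occurrences that share a site and an age range end up with identical ages, whereas drawing independently per occurrence would scatter them across the range.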

# Check species names for typos and inconsistent spelling

PyRate implements an algorithm to check for inconsistent spelling in species names (format: *Genus\_species*) using the `-check_names` flag. It requires a text file with one species name per row, such as the \*\_TaxonList.txt file generated while preparing PyRate's input files (named \*\_SpeciesList.txt in some PyRate versions).

Run the following command in the Terminal:
```bash
# Check for inconsistent spelling in species names
# The -check_names flag is used to specify the input file with species names
python PyRate.py -check_names PBDB_dataset_TaxonList.txt
```

This returns a table saved in a text file with plausible typos. Ranks 0 and 1 indicate the most likely cases of misspellings, whereas ranks 2 and 3 are most likely truly different names. Note that this algorithm does not check for synonyms.

The researcher should double-check these possible inconsistencies and fix the names in their dataset accordingly.
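
For a quick independent check before running PyRate, near-duplicate names can also be flagged with Python's standard `difflib` module. This is a different technique from PyRate's own algorithm and does not reproduce its rank scores; the function name and the 0.9 threshold are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar_names(names, threshold=0.9):
    """Return pairs of distinct names whose similarity ratio meets the threshold."""
    flagged = []
    for a, b in combinations(sorted(set(names)), 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            flagged.append((a, b, round(score, 3)))
    return flagged

# Hypothetical name list containing one likely misspelling
names = ["Canis_lupus", "Canis_lupis", "Canis_latrans"]
flagged = similar_names(names)
for a, b, score in flagged:
    print(a, b, score)
```

As with PyRate's check, a high similarity score only points to a candidate typo; whether two names refer to the same species still needs a manual decision.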

# Estimation of speciation and extinction rates through time

## Defining the preservation model
PyRate supports different preservation models:

1. **Non-homogeneous Poisson process of preservation (NHPP)**: Default model, preservation rates change during the lifespan of each lineage following a bell-shaped distribution.
   - The `NHPP_lik` function in PyRate.py calculates the likelihood of the NHPP model.

2. **Homogeneous Poisson process (HPP)**: Preservation rate is constant through time.
   ```bash
   # Run PyRate with HPP model
   # The -mHPP flag is used to specify the HPP model
   python PyRate.py Canis_pbdb_data_PyRate.py -mHPP
   ```
   - The `HOMPP_lik` function in PyRate.py calculates the likelihood of the HPP model.

3. **Time-variable Poisson process (TPP)**: Preservation rates are constant within predefined time frames but can vary across time frames (e.g., geological epochs).
   ```bash
   # Run PyRate with TPP model and provide a file with times delimiting the epochs
   # The -qShift flag is used to specify the file with the times of rate shifts
   python PyRate.py Canis_pbdb_data_PyRate.py -qShift epochs_q.txt
   ```
   The default prior on the vector of preservation rates is a single gamma distribution with shape = 1.5 and rate = 1.5. These values can be changed using the `-pP` command, e.g., `-pP 2 0.1`.
   The rate parameter of the prior can be estimated from the data by setting it to 0:
   ```bash
   # Run PyRate with TPP model and estimate the rate parameter of the prior
   # The -pP flag is used to specify the prior parameters
   python PyRate.py Canis_pbdb_data_PyRate.py -qShift epochs_q.txt -pP 1.5 0
   ```
   This approach is recommended when multiple preservation rates are estimated.
   - The `TPP_model` variable in PyRate.py indicates whether the TPP model is being used.
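   - The TPP likelihood is evaluated piecewise over the predefined time frames. In the PyRate.py versions that define it, this is handled by a dedicated function (`HPP_vec_lik`) rather than by `NHPP_lik`; verify the name against your copy of the code.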

4. **Gamma model of rate heterogeneity**: Can be coupled with NHPP, HPP, and TPP models to account for heterogeneity in the preservation rate across lineages.
   ```bash
   # Run PyRate with NHPP model and Gamma rate heterogeneity
   # The -mG flag is used to specify the Gamma model
   python PyRate.py Canis_pbdb_data_PyRate.py -mG
   
   # Run PyRate with HPP model and Gamma rate heterogeneity
   python PyRate.py Canis_pbdb_data_PyRate.py -mHPP -mG
   
   # Run PyRate with TPP model and Gamma rate heterogeneity
   python PyRate.py Canis_pbdb_data_PyRate.py -qShift epochs_q.txt -mG
   ```

   When combining a TPP model with a Gamma model, you can log the estimated relative preservation rate for each lineage using the `-log_sp_q_rates` flag:
   ```bash
   # Run PyRate with TPP model, Gamma rate heterogeneity, and log per-lineage relative preservation rates
   # The -log_sp_q_rates flag is used to log the relative preservation rates
   python PyRate.py Canis_pbdb_data_PyRate.py -qShift epochs_q.txt -mG -log_sp_q_rates
   ```
   This will save an additional log file with the estimated relative preservation rate for each lineage.
   - The `argsG` variable in PyRate.py indicates whether the Gamma model is being used.
   - The `get_gamma_rates` function in PyRate.py calculates the Gamma-distributed rate heterogeneity across lineages.
   - The preservation likelihood functions (e.g., `NHPP_lik` and `HOMPP_lik`) incorporate the Gamma rate heterogeneity when calculating the likelihoods. Note that `TPP_model` is a variable indicating that the TPP model is active, not a function.

### Model testing across preservation models
A maximum likelihood test is available to assess which of NHPP, HPP, or TPP is best supported by the data. Run the test using the `-PPmodeltest` flag:
```bash
# Run the preservation model test
# The -PPmodeltest flag is used to run the model test
python PyRate.py Canis_pbdb_data_PyRate.py -qShift epochs_q.txt -PPmodeltest
```
The Gamma model is not tested here, but it is recommended to add it to the best model among NHPP, HPP, or TPP as selected by the model testing.
- The `PPmodeltest` function in the `pyrate_lib.PPmodeltest` module performs the model testing.

## Analysis setup
The main settings for a standard PyRate analysis are:

1. Estimate origination and extinction times of each lineage
2. Estimate preservation rate and its level of heterogeneity
3. Estimate speciation and extinction rates through time

Temporal rate variation is introduced by rate shifts. The number and temporal placement of shifts are estimated from the data using the RJMCMC algorithm (default) or the BDMCMC algorithm.
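
A rate-shift model of this kind amounts to a piecewise-constant rate function. The sketch below looks up the rate that applies at a given age; the function name, shift ages, and rates are hypothetical, and the oldest-to-youngest ordering is a convention chosen here, not necessarily the one used in PyRate's output.

```python
def rate_at(age, shift_ages, rates):
    """Rate at a given age under a piecewise-constant model.

    shift_ages: ages of the rate shifts (Myr before present), in any order.
    rates: one rate per interval, ordered from oldest to youngest,
           so len(rates) must equal len(shift_ages) + 1.
    """
    if len(rates) != len(shift_ages) + 1:
        raise ValueError("need exactly one more rate than shift ages")
    # Each shift older than `age` moves the lookup one interval toward the present
    index = sum(1 for s in shift_ages if s > age)
    return rates[index]

# Hypothetical model with two shifts and three speciation rates
shift_ages = [23.0, 5.3]
rates = [0.1, 0.4, 0.2]
print(rate_at(30.0, shift_ages, rates))  # 0.1
print(rate_at(10.0, shift_ages, rates))  # 0.4
print(rate_at(1.0, shift_ages, rates))   # 0.2
```

RJMCMC and BDMCMC differ in how they propose adding or removing entries in `shift_ages`, but the rate function they sample has this general shape.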

Run the analysis using the PyRate input file:
```bash
# Run PyRate analysis with BDMCMC algorithm
# The -A 2 flag sets the algorithm to BDMCMC
python PyRate.py Canis_pbdb_data_PyRate.py -A 2
```
The `-A 2` flag sets the algorithm to BDMCMC, while `-A 4` (or omitting the flag) sets it to RJMCMC.
- The `MCMC` function in PyRate.py runs the MCMC analysis based on the specified settings.
- The `BDMCMC` function implements the BDMCMC algorithm, while the `RJMCMC` function implements the RJMCMC algorithm.

To analyze a specific replicate from the input file, use the `-j` flag followed by the replicate number:
```bash
# Run PyRate analysis with HPP model, Gamma rate heterogeneity, and replicate 1
# The -j flag specifies the replicate number
python PyRate.py Canis_pbdb_data_PyRate.py -A 2 -mHPP -mG -j 1
```

To change the number of MCMC iterations and the sampling frequency, use the `-n` and `-s` flags:
```bash
# Run PyRate analysis with Gamma rate heterogeneity, 20 million iterations, and sampling every 5000 iterations
# The -n flag sets the number of iterations, and the -s flag sets the sampling frequency
python PyRate.py Canis_pbdb_data_PyRate.py -mG -n 20000000 -s 5000
```

## Output files
The PyRate analysis produces three output files in the *pyrate\_mcmc\_logs* folder:

1. **sum.txt**: Text file with the complete list of settings used in the analysis.
2. **mcmc.log**: Tab-separated table with the MCMC samples of the posterior, prior, likelihoods, preservation rate, shape parameter of gamma-distributed heterogeneity, number of sampled rate shifts, time of origin of the oldest lineage, total branch length, and times of speciation and extinction of all taxa.
3. **marginal_rates.log**: Tab-separated table with the posterior samples of the marginal rates of speciation, extinction, and net diversification, calculated within 1 time unit (typically Myr).

## Summarize the results
Open the log files in the program **Tracer** to check MCMC convergence and determine the proportion of burnin.

Calculate the sampling frequencies of birth-death models with different numbers of rate shifts using the `-mProb` command:
```bash
# Calculate sampling frequencies of birth-death models with different numbers of rate shifts
# The -mProb flag specifies the input mcmc.log file, and the -b flag specifies the number of initial samples to discard as burnin
python PyRate.py -mProb Canis_pbdb_data_mcmc.log -b 200
```
The `-b 200` flag indicates that the first 200 samples will be removed as burnin.

Generate rates-through-time plots using the `-plot` command:
```bash
# Generate rates-through-time plots
# The -plot flag specifies the input marginal_rates.log file, and the -b flag specifies the number of initial samples to discard as burnin
python PyRate.py -plot Canis_pbdb_data_marginal_rates.log -b 200
```
This will generate an R script and a PDF file with the RTT plots showing speciation, extinction, and net diversification through time. Use the `-plot2` flag for a slightly different flavor of the RTT plot.
- The `plot_RTT` function in PyRate.py generates the rates-through-time plots.

#### Combine log files across replicates
To combine log files from different replicates into one, use the `-combLog` command:
```bash
# Combine log files across replicates
# The -combLog flag specifies the directory where the log files are located, the -tag flag specifies a tag to identify the files to combine, and the -b flag specifies the number of initial samples to discard as burnin
python PyRate.py -combLog path_to_your_log_files -tag mcmc -b 100
```
`path_to_your_log_files` specifies the directory where the log files are, `-tag mcmc` combines all files containing _mcmc.log_ in the file name, and `-b 100` excludes the first 100 samples from each log file as burnin.
- The `comb_log_files` function in PyRate.py combines the log files across replicates.

To generate a rates-through-time plot that combines all replicates, use the `-plot` command:
```bash
# Generate rates-through-time plot combining all replicates
# The -plot flag specifies the directory where the log files are located, the -tag flag specifies a tag to identify the files to combine, and the -b flag specifies the number of initial samples to discard as burnin
python PyRate.py -plot path_to_your_log_files -tag Canis_pbdb -b 100
```
This will combine all the _marginal\_rates.log_ files that include `Canis_pbdb` in the file name and combine the results in a single plot.

#### Plot preservation rates through time
To summarize and plot preservation rates estimated using the TPP model, use the `-plotQ` command:
```bash
# Plot preservation rates through time
# The -plotQ flag specifies the input mcmc.log file, the -qShift flag specifies the file with the times of rate shifts, and the -b flag specifies the number of initial samples to discard as burnin
python PyRate.py -plotQ Canis_pbdb_data_mcmc.log -qShift epochs_q.txt -b 100
```
Provide the `mcmc.log` file and the file with the times of rate shift (e.g., `epochs_q.txt`).
- The `plot_RTT` function in PyRate.py generates the preservation rate plots when using the `-plotQ` flag.

## Speciation and extinction rates within fixed time bins
#### Analysis setup
PyRate can also fit birth-death models in which the number and temporal placement of rate shifts is fixed a priori, e.g., based on geological epochs. Provide a file with the predefined times of rate shifts using the `-fixShift` command:
```bash
# Run PyRate with fixed times of rate shifts
# The -fixShift flag specifies the file with the fixed times of rate shifts
python PyRate.py Canis_pbdb_data_PyRate.