# **Programming for Data Analysis**
---

**Author: Damien Farrell**

---

## **Project 2:**
## **An Analysis of Paleo-Present Climate Data**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import requests

import jupyter_black

jupyter_black.load()
sns.set_theme()

### Data Set Components

1. **Temperature Estimate:**
   - Temperature anomaly over the past 800,000 years, expressed as the difference from the average temperature of the last 1,000 years.
   - Temperature is estimated from the deuterium content of the ice cores, with various corrections applied.
     <br>
     <br>

1. **Composite Carbon Dioxide Record:**
   - Composite record of atmospheric carbon dioxide (CO2) levels over the past 800,000 years.
   - Antarctic ice-core records of carbon dioxide extend back 800,000 years at Dome C and over 400,000 years at the Vostok site. Additional shorter records come from Taylor Dome, another Antarctic location.
     <br>
     <br>

1. **Methane Record (EPICA Dome C Ice Core):**
   - Detailed methane record extending back 800,000 years before AD 1950.
     <br>
     <br>

1. **Dublin Airport Hourly Data:**
   - Hourly weather observations from Dublin Airport, covering January 1943 to November 2023.
     <br>
     <br>

In [2]:
present_year = 1950  # Reference year for "before present" (BP) ages
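
The constant above fixes the reference year for ages given in years "before present" (BP); the methane record, for example, counts years before AD 1950. A minimal sketch of how such ages could be mapped to calendar years follows. The helper name `bp_to_calendar_year` is hypothetical and not part of the original analysis; the sample ages are copied from the tables displayed in this notebook.

```python
import pandas as pd

present_year = 1950  # Reference year for "before present" (BP) ages


def bp_to_calendar_year(age_bp):
    """Convert an age in years BP to a calendar year (hypothetical helper)."""
    return present_year - age_bp


# Sample ages in years BP; a negative age lies after 1950
ages_bp = pd.Series([38.37379, 800589.0, -51.03])
calendar_years = bp_to_calendar_year(ages_bp)
print(calendar_years)
```

Negative calendar years here simply mean years before year 0 on this linear scale; they are kept as plain numbers rather than converted to timestamps.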

Extract Temperature Data from NOAA EDC3 2007 and Convert into a DataFrame

In [3]:
url = "https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/epica_domec/edc3deuttemp2007.txt"
response = requests.get(url, timeout=30)  # Fetch the raw data file
response.raise_for_status()  # Fail early on an HTTP error
data = response.text  # Plain-text file contents as a string

# The pattern to find the start of the table
pattern = re.compile(r"Bag\s+ztop\s+Age\s+Deuterium\s+Temperature", re.IGNORECASE)

# Extract the data starting from the match
match = pattern.search(data)
if match is None:
    raise ValueError("Could not find the table header in the downloaded file")
table_data = data[match.start() :]

# Convert the table_data to a list of lines
table_lines = table_data.strip().split("\n")

# Extract column names and data
columns = table_lines[0].split()
rows = [line.split() for line in table_lines[1:]]

# Create a Temperature DataFrame
temp_df = pd.DataFrame(rows, columns=columns)
numeric_columns = ["Bag", "ztop", "Age", "Deuterium", "Temperature"]
temp_df[numeric_columns] = temp_df[numeric_columns].apply(
    pd.to_numeric, errors="coerce"
)

temp_df = temp_df.dropna().reset_index(drop=True)
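
The cell above locates the table header with a regular expression, splits the remaining lines on whitespace, and coerces the columns to numbers. As an alternative sketch, the same idea could be expressed with `pd.read_csv` and a whitespace separator. The function name `parse_table` and the in-memory `sample_text` are assumptions made for illustration (the rows are copied from the output shown below), and the approach has not been checked against the full NOAA file.

```python
import io
import re

import pandas as pd

# Small stand-in for the downloaded file: free text followed by the table
sample_text = """Some header prose that precedes the table.
Bag         ztop          Age     Deuterium    Temperature
13          6.60       38.37379    -390.90      0.88
14          7.15       46.81203    -385.10      1.84
15          7.70       55.05624    -377.80      3.04
"""


def parse_table(text):
    """Parse the whitespace-delimited table that starts at the header row."""
    match = re.search(
        r"Bag\s+ztop\s+Age\s+Deuterium\s+Temperature", text, re.IGNORECASE
    )
    if match is None:
        raise ValueError("Table header not found")
    # Read everything from the header row onwards, splitting on whitespace
    table = pd.read_csv(io.StringIO(text[match.start() :]), sep=r"\s+")
    # Drop incomplete rows, as the original cell does
    return table.dropna().reset_index(drop=True)


sample_df = parse_table(sample_text)
print(sample_df)
```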

In [4]:
temp_df

Unnamed: 0,Bag,ztop,Age,Deuterium,Temperature
0,13,6.60,38.37379,-390.90,0.88
1,14,7.15,46.81203,-385.10,1.84
2,15,7.70,55.05624,-377.80,3.04
3,16,8.25,64.41511,-394.10,0.35
4,17,8.80,73.15077,-398.70,-0.42
...,...,...,...,...,...
5780,5796,3187.25,797408.00000,-440.20,-8.73
5781,5797,3187.80,798443.00000,-439.00,-8.54
5782,5798,3188.35,799501.00000,-441.10,-8.88
5783,5799,3188.90,800589.00000,-441.42,-8.92


In [5]:
co2_df = pd.read_excel(
    "./data/CO2/grl52461-sup-0003-supplementary.xls",
    sheet_name="CO2 Composite",
    skiprows=14,
)

In [6]:
co2_df

Unnamed: 0,Gasage (yr BP),CO2 (ppmv),sigma mean CO2 (ppmv)
0,-51.030000,368.022488,0.060442
1,-48.000000,361.780737,0.370000
2,-46.279272,359.647793,0.098000
3,-44.405642,357.106740,0.159923
4,-43.080000,353.946685,0.043007
...,...,...,...
1896,803925.284376,202.921723,2.064488
1897,804009.870607,207.498645,0.915083
1898,804522.674630,204.861938,1.642851
1899,805132.442334,202.226839,0.689587


In [7]:
ch4_df = pd.read_csv(
    "./data/CH4/Nehrbass-Ahles-etal_2020_CH4_comp.tab", skiprows=22, sep="\t"
)

In [8]:
ch4_df

Unnamed: 0,Depth ice/snow [m] (average depth),Age [ka BP],CH4 [ppbv],CH4 [ppbv] (corrected for gravitational s...),Uncertainty [±]
0,2479.40,300.136,520,523,10
1,2481.03,300.747,459,462,10
2,2481.60,300.981,487,490,10
3,2483.23,301.756,453,456,10
4,2483.80,301.960,453,456,10
...,...,...,...,...,...
413,2797.83,445.906,405,407,10
414,2798.42,446.656,400,402,10
415,2799.53,447.907,438,440,10
416,2800.62,449.062,439,441,10
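
The methane ages are given in thousands of years before present (`Age [ka BP]`), whereas the temperature and CO2 tables use years. A minimal sketch of putting the methane ages on the same scale is shown here. The new column name `Age [yr BP]` and the small `ch4_sample` frame (rows copied from the output above) are illustrative assumptions.

```python
import pandas as pd

# A few rows copied from the methane table above
ch4_sample = pd.DataFrame(
    {
        "Age [ka BP]": [300.136, 300.747, 449.062],
        "CH4 [ppbv]": [520, 459, 439],
    }
)

# Convert thousands of years BP to years BP
ch4_sample["Age [yr BP]"] = ch4_sample["Age [ka BP]"] * 1000
print(ch4_sample)
```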


In [33]:
irish_weather_df = pd.read_csv(
    "./data/irish/hly532.csv", sep=",", skiprows=23, low_memory=False
)

In [34]:
irish_weather_df

Unnamed: 0,date,ind,rain,ind.1,temp,ind.2,wetb,dewpt,vappr,rhum,...,ind.3,wdsp,ind.4,wddir,ww,w,sun,vis,clht,clamt
0,01-jan-1943 00:00,0,0.4,0,7.2,0,6.8,6.1,9.5,93,...,1,13,1,240,61,6,0.0,10000,9,8
1,01-jan-1943 01:00,0,0.7,0,7.8,0,7.6,7.2,10.2,96,...,1,19,1,240,61,6,0.0,10000,8,8
2,01-jan-1943 02:00,0,0.5,0,8.7,0,8.3,7.7,10.7,95,...,1,24,1,250,51,6,0.0,7000,7,8
3,01-jan-1943 03:00,2,0.0,0,9.1,0,8.7,8.3,11.0,95,...,1,24,1,270,50,6,0.0,10000,9,7
4,01-jan-1943 04:00,2,0.0,0,9.4,0,8.8,8.3,10.9,93,...,1,24,1,270,50,5,0.0,10000,8,8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
709292,30-nov-2023 20:00,2,0.0,0,1.6,0,0.6,-1.1,5.6,81,...,2,8,2,340,25,81,0.0,20000,60,7
709293,30-nov-2023 21:00,0,0.0,1,-0.3,1,-1.0,-2.5,5.1,86,...,2,8,2,340,1,81,0.0,18000,60,5
709294,30-nov-2023 22:00,3,0.0,1,-1.1,1,-1.5,-2.4,5.2,91,...,2,6,2,330,2,11,0.0,20000,999,1
709295,30-nov-2023 23:00,3,0.0,1,-1.3,1,-1.7,-2.5,5.1,91,...,2,7,2,300,2,11,0.0,25000,999,1
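
The `date` column is read as text such as `01-jan-1943 00:00`. A minimal sketch of converting it to timestamps and summarising temperature by year follows. The `sample` frame uses a few rows copied from the output above, the format string assumes English month abbreviations, and coercing `temp` to numeric is a precaution in case the full file contains blank entries; none of this is taken from the original analysis.

```python
import pandas as pd

# A few rows copied from the Dublin Airport table above
sample = pd.DataFrame(
    {
        "date": [
            "01-jan-1943 00:00",
            "01-jan-1943 01:00",
            "30-nov-2023 22:00",
            "30-nov-2023 23:00",
        ],
        "temp": [7.2, 7.8, -1.1, -1.3],
    }
)

# Parse day-month-year hour:minute strings into timestamps
sample["date"] = pd.to_datetime(sample["date"], format="%d-%b-%Y %H:%M")

# Coerce temperature to numeric; unparseable entries become NaN
sample["temp"] = pd.to_numeric(sample["temp"], errors="coerce")

# Mean temperature per calendar year
annual_mean = sample.groupby(sample["date"].dt.year)["temp"].mean()
print(annual_mean)
```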


Text Analysis

https://guides.library.upenn.edu/penntdm/python/import_files#:~:text=Importing%20Files%20(Web%20Scraping)&text=The%20get()%20function%20in,it%20in%20a%20Python%20object.

Parsing a text file into a pandas DataFrame

https://codereview.stackexchange.com/questions/257729/parsing-a-text-file-into-a-pandas-

Pandas Timestamp Limitations

https://calmcode.io/til/pandas-timerange.html#:~:text=Since%20pandas%20represents%20timestamps%20in,limited%20to%20approximately%20584%20years.

How To Resample and Interpolate Your Time Series Data With Python

https://machinelearningmastery.com/resample-interpolate-time-series-data-python/

pandas.read_excel

https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html


Compilation of improved CH4 data derived from the European Project for Ice Coring in Antarctica (EPICA) Dome C

https://doi.pangaea.de/10.1594/PANGAEA.914908


An optimized multi-proxy, multi-site Antarctic ice and gas orbital chronology (AICC2012): 120–800 ka

https://cp.copernicus.org/articles/9/1715/2013/

***
# End