# **Programming for Data Analysis**
---

**Author: Damien Farrell**

---

## **Project 2:**
## **An Analysis of Paleo-Present Climate Data**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import requests

import jupyter_black

jupyter_black.load()
sns.set_theme()

### Data Set Components

1. **Temperature Estimate:**
   - Temperature anomaly over the past 800,000 years, expressed as the difference from the average temperature of the last 1,000 years.
   - Temperature estimated from the analysis of deuterium in the ice cores, with various corrections.
     <br>
     <br>

1. **Composite Carbon Dioxide Record:**
   - Composite record of atmospheric carbon dioxide (CO2) levels over the past 800,000 years.
   - Antarctic ice-core records of carbon dioxide extend back 800,000 years at Dome C and over 400,000 years at the Vostok site. Additional shorter records come from Taylor Dome, another Antarctic location.
     <br>
     <br>

1. **Methane Record (EPICA Dome C Ice Core):**
   - Detailed methane record to 800,000 years before AD 1950.
     <br>
     <br>

1. **Dublin Airport Hourly Data:**
   - Detailed weather data from Dublin airport.
     <br>
     <br>



Irish data sources:

- Composite Rainfall Time Series from 1711-2016 for Ireland (`IOI_1711_SERIES`)
  https://www.met.ie/climate/available-data/long-term-data-sets/
- Reconstruction of a long-term historical daily maximum and minimum air temperature network dataset for Ireland (1831-1968)
  https://www.met.ie/climate/available-data/long-term-data-sets/
  - `Valentia-Observatory-telegraphic-reporting-station_1921-1943`
  - `Valentia-Observatory-telegraphic-reporting-station_1850-1920`
  - `Valentia-Observatory_second-order-station_1883-1909`
- Valentia Observatory Monthly Data (`mly2275`)
  https://data.gov.ie/dataset/valentia-observatory-monthly-data





In [2]:
present_year = 1950
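
The `present_year` constant presumably reflects the "before present" (BP) convention, in which ages are counted back from AD 1950 (the methane record above is described as years before AD 1950). A minimal sketch of the conversion, assuming ages are held as plain numbers; the name `ages_bp` and the sample values are illustrative only. Numeric years are used because pandas nanosecond timestamps only span roughly 1677 to 2262.

```python
import pandas as pd

present_year = 1950  # BP ages count back from AD 1950

# A few illustrative ages in years BP (negative values lie after 1950)
ages_bp = pd.Series([-51.03, 0.0, 38.37, 800589.0], name="age_bp")

# Calendar year = reference year minus age; very old samples become negative years
calendar_year = present_year - ages_bp
print(calendar_year.tolist())

# The representable range of pandas timestamps, far too short for ice-core ages
print(pd.Timestamp.min, pd.Timestamp.max)
```

Keeping the age axis numeric avoids out-of-bounds datetime errors for anything older than a few centuries.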

Extract Temperature Data from NOAA EDC3 2007 and Convert into a DataFrame

In [3]:
url = "https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/epica_domec/edc3deuttemp2007.txt"
response = requests.get(url)  # Download the raw data file
response.raise_for_status()  # Stop early if the download failed
text = response.text  # Body of the plain-text file as a string
data = text

# The pattern to find the start of the table
pattern = re.compile(r"Bag\s+ztop\s+Age\s+Deuterium\s+Temperature", re.IGNORECASE)

# Extract the data starting from the match
match = pattern.search(data)
if match is None:
    raise ValueError("Could not find the table header in the downloaded file")
start_index = match.start()
table_data = data[start_index:]

# Convert the table_data to a list of lines
table_lines = table_data.strip().split("\n")

# Extract column names and data
columns = table_lines[0].split()
rows = [line.split() for line in table_lines[1:]]

# Create a Temperature DataFrame
temp_df = pd.DataFrame(rows, columns=columns)
numeric_columns = ["Bag", "ztop", "Age", "Deuterium", "Temperature"]
temp_df[numeric_columns] = temp_df[numeric_columns].apply(
    pd.to_numeric, errors="coerce"
)

temp_df = temp_df.dropna().reset_index(drop=True)
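
The parsing steps above can be tried offline on a small string before running them against the downloaded file. The sketch below is only an illustration: `sample` is a hypothetical miniature of the file layout (free text, a header line, then whitespace-separated rows, one of them incomplete), not the real NOAA content.

```python
import re
import pandas as pd

# Hypothetical miniature of the layout: preamble text, header line, data rows
sample = """Some descriptive header text
Bag ztop Age Deuterium Temperature
1 0 -50.00000
13 6.6 38.37379 -390.9 0.88
14 7.15 46.81203 -385.1 1.84
"""

pattern = re.compile(r"Bag\s+ztop\s+Age\s+Deuterium\s+Temperature", re.IGNORECASE)
match = pattern.search(sample)

# Keep everything from the header line onwards and split it into lines
lines = sample[match.start():].strip().split("\n")
columns = lines[0].split()
rows = [line.split() for line in lines[1:]]

# Short rows are padded with missing values, then removed by dropna()
df = pd.DataFrame(rows, columns=columns).apply(pd.to_numeric, errors="coerce")
df = df.dropna().reset_index(drop=True)
print(df)
```

Rows that lack a deuterium or temperature value end up with missing entries and are dropped, which is the same effect `dropna()` has in the cell above.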

In [4]:
temp_df

| | Bag | ztop | Age | Deuterium | Temperature |
|---|---|---|---|---|---|
| 0 | 13 | 6.60 | 38.37379 | -390.90 | 0.88 |
| 1 | 14 | 7.15 | 46.81203 | -385.10 | 1.84 |
| 2 | 15 | 7.70 | 55.05624 | -377.80 | 3.04 |
| 3 | 16 | 8.25 | 64.41511 | -394.10 | 0.35 |
| 4 | 17 | 8.80 | 73.15077 | -398.70 | -0.42 |
| ... | ... | ... | ... | ... | ... |
| 5780 | 5796 | 3187.25 | 797408.00000 | -440.20 | -8.73 |
| 5781 | 5797 | 3187.80 | 798443.00000 | -439.00 | -8.54 |
| 5782 | 5798 | 3188.35 | 799501.00000 | -441.10 | -8.88 |
| 5783 | 5799 | 3188.90 | 800589.00000 | -441.42 | -8.92 |


In [5]:
co2_df = pd.read_excel(
    "./data/CO2/grl52461-sup-0003-supplementary.xls",
    sheet_name="CO2 Composite",
    skiprows=14,
)

In [6]:
co2_df

| | Gasage (yr BP) | CO2 (ppmv) | sigma mean CO2 (ppmv) |
|---|---|---|---|
| 0 | -51.030000 | 368.022488 | 0.060442 |
| 1 | -48.000000 | 361.780737 | 0.370000 |
| 2 | -46.279272 | 359.647793 | 0.098000 |
| 3 | -44.405642 | 357.106740 | 0.159923 |
| 4 | -43.080000 | 353.946685 | 0.043007 |
| ... | ... | ... | ... |
| 1896 | 803925.284376 | 202.921723 | 2.064488 |
| 1897 | 804009.870607 | 207.498645 | 0.915083 |
| 1898 | 804522.674630 | 204.861938 | 1.642851 |
| 1899 | 805132.442334 | 202.226839 | 0.689587 |


In [7]:
ch4_df = pd.read_csv(
    "./data/CH4/Nehrbass-Ahles-etal_2020_CH4_comp.tab", skiprows=22, sep="\t"
)

In [8]:
ch4_df

| | Depth ice/snow [m] (average depth) | Age [ka BP] | CH4 [ppbv] | CH4 [ppbv] (corrected for gravitational s...) | Uncertainty [±] |
|---|---|---|---|---|---|
| 0 | 2479.40 | 300.136 | 520 | 523 | 10 |
| 1 | 2481.03 | 300.747 | 459 | 462 | 10 |
| 2 | 2481.60 | 300.981 | 487 | 490 | 10 |
| 3 | 2483.23 | 301.756 | 453 | 456 | 10 |
| 4 | 2483.80 | 301.960 | 453 | 456 | 10 |
| ... | ... | ... | ... | ... | ... |
| 413 | 2797.83 | 445.906 | 405 | 407 | 10 |
| 414 | 2798.42 | 446.656 | 400 | 402 | 10 |
| 415 | 2799.53 | 447.907 | 438 | 440 | 10 |
| 416 | 2800.62 | 449.062 | 439 | 441 | 10 |
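
The methane ages are given in thousands of years (`Age [ka BP]`), whereas the CO2 record uses years (`Gasage (yr BP)`). If the two are to share an axis, the units need to match. A small sketch, assuming a toy frame with the same column names; `age_yr_bp` is a hypothetical column name introduced here for illustration.

```python
import pandas as pd

# Toy frame reusing the column names and first two rows shown above
ch4 = pd.DataFrame(
    {"Age [ka BP]": [300.136, 300.747], "CH4 [ppbv]": [520, 459]}
)

# 1 ka = 1,000 years, so multiply to express the age in years BP
ch4["age_yr_bp"] = ch4["Age [ka BP]"] * 1000
print(ch4)
```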


In [9]:
irish_weather_df = pd.read_csv(
    "./data/irish/mly2275.csv",
    sep=",",
    skiprows=19,
    usecols=["year", "month", "meant", "rain"],
)

irish_weather_df["year-month"] = irish_weather_df[["year", "month"]].apply(
    lambda row: "-".join(row.values.astype(str)), axis=1
)

In [10]:
irish_weather_df

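| | year | month | meant | rain | year-month |
|---|---|---|---|---|---|
| 0 | 1939 | 10 | 10.0 | 105.5 | 1939-10 |
| 1 | 1939 | 11 | 10.3 | 251.9 | 1939-11 |
| 2 | 1939 | 12 | 6.0 | 116.9 | 1939-12 |
| 3 | 1940 | 1 | 5.8 | 163.8 | 1940-1 |
| 4 | 1940 | 2 | 7.9 | 179.6 | 1940-2 |
| ... | ... | ... | ... | ... | ... |
| 1005 | 2023 | 7 | 15.3 | 170.3 | 2023-7 |
| 1006 | 2023 | 8 | 15.9 | 177.1 | 2023-8 |
| 1007 | 2023 | 9 | 15.6 | 189.3 | 2023-9 |
| 1008 | 2023 | 10 | 12.8 | 253.1 | 2023-10 |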
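
The `year-month` labels above are not zero-padded (for example `1940-1`), so as strings they would sort `1940-1`, `1940-10`, `1940-2`. One possible alternative, sketched on a toy frame, builds a padded label without a row-wise `apply` and also a real datetime column; the `date` column and the choice of day 1 are assumptions made for illustration.

```python
import pandas as pd

# Toy frame with the same "year" and "month" columns
df = pd.DataFrame({"year": [1939, 1940, 1940], "month": [10, 1, 2]})

# Zero-padded label built with vectorised string operations
df["year-month"] = (
    df["year"].astype(str) + "-" + df["month"].astype(str).str.zfill(2)
)

# A datetime for the first day of each month; these years fit the pandas range
df["date"] = pd.to_datetime(df[["year", "month"]].assign(day=1))
print(df)
```

A datetime column also makes later resampling or plotting against time more direct than a string label.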


Importing, cleaning, and transforming the Irish temperature weather data.

In [63]:
# Reading in the csv file
irish_weather_df1 = pd.read_csv(
    "./data/irish/Valentia-Observatory_second-order-station_1883-1909.csv",
    sep=",",
    encoding="ISO-8859-1",
)

# Transforming the dataset
irish_weather_df1.drop(["Max (°F)", "Min (°F)"], axis=1, inplace=True)
irish_weather_df1["meant"] = irish_weather_df1[["Max (°C)", "Min (°C)"]].mean(axis=1)
irish_weather_df1.drop(["Max (°C)", "Min (°C)"], axis=1, inplace=True)
irish_weather_df1.rename(
    columns={"Year": "year", "Month": "month", "Day ": "day"}, inplace=True
)

irish_weather_df1["date"] = pd.to_datetime(
    irish_weather_df1[["year", "month", "day"]], errors="coerce"
)
irish_weather_df1 = irish_weather_df1.drop(["year", "month", "day"], axis=1)
irish_weather_df1.set_index("date", inplace=True)

# Downsample to yearly
resample = irish_weather_df1.resample("Y")
irish_weather_df1 = resample.mean()

# Reading in the csv file
irish_weather_df2 = pd.read_csv(
    "./data/irish/Valentia-Observatory-telegraphic-reporting-station_1850-1920.csv",
    sep=",",
    encoding="ISO-8859-1",
)

# Transforming the dataset
irish_weather_df2.drop(["Max (°F)", "Min (°F)"], axis=1, inplace=True)
irish_weather_df2["meant"] = irish_weather_df2[["Max (°C)", "Min (°C)"]].mean(axis=1)
irish_weather_df2.drop(["Max (°C)", "Min (°C)"], axis=1, inplace=True)
irish_weather_df2.rename(
    columns={"Year": "year", "Month": "month", "Day": "day"}, inplace=True
)
irish_weather_df2["date"] = pd.to_datetime(
    irish_weather_df2[["year", "month", "day"]], errors="coerce"
)
irish_weather_df2 = irish_weather_df2.drop(["year", "month", "day"], axis=1)
irish_weather_df2.set_index("date", inplace=True)

# Downsample to yearly
resample = irish_weather_df2.resample("Y")
irish_weather_df2 = resample.mean()

# Reading in the csv file
irish_weather_df3 = pd.read_csv(
    "./data/irish/Valentia-Observatory-telegraphic-reporting-station_1921-1943.csv",
    sep=",",
    encoding="ISO-8859-1",
)

# Transforming the dataset
irish_weather_df3.drop(
    ["Max at 7h (°F)", "Min at 7h (°F)", "Max at 18h (°F)", "Min at 18h (°F)"],
    axis=1,
    inplace=True,
)
irish_weather_df3["meant"] = irish_weather_df3[
    ["Max at 7h (°C)", "Min at 7h (°C)", "Max at 18h (°C)", "Min at 18h (°C)"]
].mean(axis=1)
irish_weather_df3.drop(
    ["Max at 7h (°C)", "Min at 7h (°C)", "Max at 18h (°C)", "Min at 18h (°C)"],
    axis=1,
    inplace=True,
)
irish_weather_df3.rename(
    columns={"Year": "year", "Month": "month", "Daily": "day"}, inplace=True
)
irish_weather_df3["date"] = pd.to_datetime(
    irish_weather_df3[["year", "month", "day"]], errors="coerce"
)
irish_weather_df3 = irish_weather_df3.drop(["year", "month", "day"], axis=1)
irish_weather_df3.set_index("date", inplace=True)

# Downsample to yearly
resample = irish_weather_df3.resample("Y")
irish_weather_df3 = resample.mean()

# Combining the datasets
combined_irish_weather_df = pd.concat(
    [irish_weather_df1, irish_weather_df2, irish_weather_df3]
)

In [64]:
combined_irish_weather_df

| date | meant |
|---|---|
| 1883-12-31 | 10.238219 |
| 1884-12-31 | 10.658197 |
| 1885-12-31 | 10.090822 |
| 1886-12-31 | 10.127534 |
| 1887-12-31 | 10.343151 |
| ... | ... |
| 1939-12-31 | 10.717603 |
| 1940-12-31 | 10.874454 |
| 1941-12-31 | 10.360137 |
| 1942-12-31 | 10.547466 |
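
Judging by the file names, the 1883-1909 and 1850-1920 station records overlap, so the concatenated frame may contain the same year more than once and is not in chronological order (it starts at 1883). One way this could be handled, shown on toy data rather than the real files, is to average rows that share a date and then sort the index. Averaging is an assumption here; preferring one station over the other would be equally defensible.

```python
import pandas as pd

# Toy yearly frames with one overlapping year (1909), standing in for the station files
a = pd.DataFrame(
    {"meant": [10.0, 10.4]}, index=pd.to_datetime(["1908-12-31", "1909-12-31"])
)
b = pd.DataFrame(
    {"meant": [10.2, 10.6]}, index=pd.to_datetime(["1909-12-31", "1910-12-31"])
)
a.index.name = b.index.name = "date"

combined = pd.concat([a, b])
print(combined.index.is_unique)  # False: 1909 appears twice

# Average duplicate dates and put the result in chronological order
deduped = combined.groupby(level=0).mean().sort_index()
print(deduped)
```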


- Text Analysis: https://guides.library.upenn.edu/penntdm/python/import_files#:~:text=Importing%20Files%20(Web%20Scraping)&text=The%20get()%20function%20in,it%20in%20a%20Python%20object.
- Parsing a text file into a pandas DataFrame: https://codereview.stackexchange.com/questions/257729/parsing-a-text-file-into-a-pandas-
- Pandas Timestamp Limitations: https://calmcode.io/til/pandas-timerange.html#:~:text=Since%20pandas%20represents%20timestamps%20in,limited%20to%20approximately%20584%20years.
- How To Resample and Interpolate Your Time Series Data With Python: https://machinelearningmastery.com/resample-interpolate-time-series-data-python/
- pandas.read_excel: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
- Compilation of improved CH4 data derived from the European Project for Ice Coring in Antarctica (EPICA) Dome C: https://doi.pangaea.de/10.1594/PANGAEA.914908
- An optimized multi-proxy, multi-site Antarctic ice and gas orbital chronology (AICC2012): 120–800 ka: https://cp.copernicus.org/articles/9/1715/2013/
- How to concatenate multiple column values into a single column in Pandas dataframe: https://stackoverflow.com/questions/39291499/how-to-concatenate-multiple-column-values-into-a-single-column-in-pandas-datafra

***
# End