# Programming for Data Analytics Project 2

# Question

• Analyse CO2 vs Temperature Anomaly from 800kyrs – present.
• Examine one other (paleo/modern) feature (e.g. CH4 or polar ice coverage)
• Examine Irish context:
o Climate change signals: (see Maynooth study: The emergence of a climate change
signal in long-term Irish meteorological observations - ScienceDirect)
• Fuse and analyse data from various data sources and format fused data set as a pandas
dataframe and export to csv and json formats
• For all of the above variables, analyse the data, the trends and the relationships between
them (temporal leads/lags/frequency analysis).
• Predict global temperature anomaly over next few decades (synthesise data) and compare to
published climate models if atmospheric CO2 trends continue
• Comment on accelerated warming based on very latest features (e.g. temperature/polar ice coverage)
Use a Jupyter notebook for your analysis and track your progress using GitHub.
Use an academic referencing style

# Import Required Libraries:

In [915]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy import signal
from datetime import date


# Import Data and Read the File
This code loads data from an Excel file into two separate pandas DataFrames (CO2_LUTHI and CO2_LUTHI_new). Each DataFrame corresponds to one sheet in the Excel file, and the data can then be analysed and manipulated with pandas. The file path is written as a raw string literal (r'...') so that the backslashes in the Windows path are not interpreted as escape sequences; the sheet names are ordinary strings.
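
As a small illustration of why the raw string matters, the sketch below compares a plain and a raw string literal. The path is hypothetical and was chosen only because `\n` and `\t` are escape sequences; it is not a path used in this notebook.

```python
# Hypothetical path: "\n" and "\t" are escape sequences in a plain string literal.
plain_path = 'C:\new_data\table.xls'   # backslash-n becomes a newline, backslash-t a tab
raw_path = r'C:\new_data\table.xls'    # raw string: backslashes are kept as written

print(repr(plain_path))
print(repr(raw_path))

# The raw string keeps both characters of each backslash sequence,
# so it is two characters longer than the plain one.
print(len(plain_path), len(raw_path))
```

Forward slashes (or pathlib.Path) are another way to avoid the problem on Windows.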



# CO2 LUTHI


In [916]:
CO2_LUTHI = pd.read_excel(
    r'C:\Users\fifoa\OneDrive\Desktop\ATU\PFDA-PROJECT-2\CO2 LUTHI.xls', sheet_name='2.  Vostok-TD-Dome C')
CO2_LUTHI_new = pd.read_excel(
    r'C:\Users\fifoa\OneDrive\Desktop\ATU\PFDA-PROJECT-2\CO2 LUTHI.xls', sheet_name='1.  new CO2 data')


# Data Extraction and Slicing of DataFrames
Each study's CO2 record sits in its own block of rows and columns within the sheets, so each subset is extracted from CO2_LUTHI or CO2_LUTHI_new with iloc and assigned to a variable with a meaningful name. The same extraction can also be driven by slicing_params, a list of tuples in which each tuple holds the slicing parameters for one DataFrame: a loop iterates through these parameters, extracts the corresponding subset, and appends it to a resulting_dfs list.



This approach makes it easy to add or modify slicing parameters without duplicating code for each DataFrame extraction.

The approach is adapted from the following examples:
https://stackoverflow.com/questions/1335392/iteration-over-list-slices

https://medium.com/probably-programming/python-slicing-looping-and-copying-a-list-a2ad96a170ba#:~:text=You%20can%20use%20a%20slice%20in%20a%20for,of%20the%20office%20are%3A%20%3E%3E%3E%20Michael%20%3E%3E%3E%20Dwight
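
The parameter-driven slicing described in this section can be sketched as follows. This is a minimal illustration on a synthetic DataFrame, not the notebook's actual cell: the frame `raw`, the study names, and the row and column ranges are all made up, and a dictionary is used in place of a list so that each subset can be looked up by name.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a raw sheet: 10 rows, 6 columns (values 0..59).
raw = pd.DataFrame(np.arange(60).reshape(10, 6))

# One tuple per subset: (name, row_start, row_stop, col_start, col_stop).
# All names and ranges here are hypothetical.
slicing_params = [
    ("study_a", 2, 6, 0, 2),
    ("study_b", 4, 10, 3, 5),
]

resulting_dfs = {}
for name, r0, r1, c0, c1 in slicing_params:
    # .copy() makes the subset independent of the source frame,
    # so later in-place edits do not trigger SettingWithCopyWarning.
    resulting_dfs[name] = raw.iloc[r0:r1, c0:c1].copy()

study_a = resulting_dfs["study_a"]
study_b = resulting_dfs["study_b"]
print(study_a.shape, study_b.shape)
```

Adding another subset then only requires one more tuple in slicing_params.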








In [917]:
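# Each study occupies its own block of rows and columns in the sheet.
# .copy() makes each subset independent of the source DataFrame.
monnin_luthi = CO2_LUTHI.iloc[6:189, 1:3].copy()
pettit_luthi = CO2_LUTHI.iloc[19:353, 5:7].copy()
siegenthaler_1_luthi = CO2_LUTHI.iloc[6:26, 16:18].copy()
siegenthaler_2_luthi = CO2_LUTHI.iloc[6:328, 12:14].copy()
luthi_luthi = CO2_LUTHI_new.iloc[16:253, 1:3].copy()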

# Renaming Columns


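Each subset still carries the default Unnamed: N column labels from the spreadsheet. The cell below renames the two columns of every subset to yr_bp (age in years before present) and co2_ppmv (CO2 concentration in ppmv), so that all subsets share the same column names.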

In [918]:
monnin_luthi.rename(columns={'Unnamed: 1': 'yr_bp', 'Unnamed: 2': 'co2_ppmv'}, inplace=True)
pettit_luthi.rename(columns={'Unnamed: 5': 'yr_bp', 'Unnamed: 6': 'co2_ppmv'}, inplace=True)
siegenthaler_1_luthi.rename(columns={'Unnamed: 16': 'yr_bp', 'Unnamed: 17': 'co2_ppmv'}, inplace=True)
siegenthaler_2_luthi.rename(columns={'Unnamed: 12': 'yr_bp', 'Unnamed: 13': 'co2_ppmv'}, inplace=True)
luthi_luthi.rename(columns={'Unnamed: 1': 'yr_bp', 'Unnamed: 2': 'co2_ppmv'}, inplace=True)
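
Column labels can also be assigned by position rather than by their Unnamed: N names, using a list comprehension sized to the number of columns in the subset. The sketch below shows one possible way to do this; the DataFrame `subset`, its values, and the fallback `extra_N` names are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Illustrative subset with spreadsheet-style default labels (values are made up).
subset = pd.DataFrame({"Unnamed: 5": [100, 200], "Unnamed: 6": [280.0, 281.0]})

# Names to apply by position; any additional column gets a generic fallback name.
target_names = ["yr_bp", "co2_ppmv"]
subset.columns = [
    target_names[i] if i < len(target_names) else f"extra_{i}"
    for i in range(subset.shape[1])
]

print(list(subset.columns))
```

Because the names depend only on column position, the same lines work for every subset regardless of which Unnamed: N labels it started with.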

The next cell adds a years_before_2023 column (2023 minus the age in yr_bp), derives a calculated_year column from it, and removes rows containing null values.






In [919]:

def process_dataframe(df):
    """Rename columns, add the derived year columns, and drop null rows (in place)."""
    # Rename the default labels; this is a no-op for frames already renamed above
    df.rename(columns={'Unnamed: 1': 'yr_bp', 'Unnamed: 2': 'co2_ppmv'}, inplace=True)

    # 2023 minus the age of each sample in years before present
    df['years_before_2023'] = 2023 - df['yr_bp']

    # 2023 plus that offset (not a calendar year: an age of 137 gives 3909)
    df['calculated_year'] = 2023 + df['years_before_2023']

    # Drop rows with null values
    df.dropna(inplace=True)

# Apply the function to each DataFrame
process_dataframe(monnin_luthi)
process_dataframe(pettit_luthi)
process_dataframe(siegenthaler_1_luthi)
process_dataframe(siegenthaler_2_luthi)
process_dataframe(luthi_luthi)
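
If a calendar-year column is wanted, one possible sketch is shown below. It assumes that ages in yr_bp are counted back from 1950, the usual convention for "before present"; the notebook itself uses 2023 as its reference year, so the constant is an assumption that should be checked against the data source. The function name add_calendar_year and the column name calendar_year are hypothetical.

```python
import pandas as pd

BP_REFERENCE_YEAR = 1950  # assumption: "BP" ages are counted back from 1950


def add_calendar_year(df, age_col="yr_bp"):
    """Return a copy with a numeric age column and a calendar_year column."""
    out = df.copy()
    # Coerce spreadsheet values to numbers; anything unparseable becomes NaN.
    out[age_col] = pd.to_numeric(out[age_col], errors="coerce")
    out = out.dropna(subset=[age_col])
    # Negative results correspond to years before year 0.
    out["calendar_year"] = BP_REFERENCE_YEAR - out[age_col]
    return out


# The first two rows use ages and CO2 values shown in the notebook output;
# the missing age in the third row is added only to exercise dropna.
sample = pd.DataFrame({"yr_bp": [137, 268, None], "co2_ppmv": [280.4, 274.9, 277.9]})
result = add_calendar_year(sample)
print(result)
```

Returning a new DataFrame instead of modifying the argument in place keeps the original subsets unchanged, which makes the cell safe to re-run.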


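The next cell concatenates the list of DataFrames (luthi_frames) into a single DataFrame (luthi_full_co2_data). Passing ignore_index=True discards the row labels of the individual subsets and assigns a fresh index starting at 0, combining the CO2 data from the different studies into one dataset.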

In [920]:
luthi_frames = [monnin_luthi, pettit_luthi, siegenthaler_1_luthi, siegenthaler_2_luthi, luthi_luthi]

luthi_full_co2_data = pd.concat(luthi_frames, ignore_index=True)
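
The effect of ignore_index can be seen on two small stand-in frames. The values and the `study` label column below are illustrative assumptions; tagging each row with its source is optional and is shown only as one way to keep track of which study a row came from after combining.

```python
import pandas as pd

# Stand-in frames with non-contiguous row labels, as left behind by iloc slicing.
study_a = pd.DataFrame({"yr_bp": [100, 200], "co2_ppmv": [280.0, 281.0]}, index=[6, 7])
study_b = pd.DataFrame({"yr_bp": [300], "co2_ppmv": [282.0]}, index=[19])

# ignore_index=True replaces the labels 6, 7, 19 with 0, 1, 2.
combined = pd.concat([study_a, study_b], ignore_index=True)

# Optional: record the source of each row before concatenating.
labelled = pd.concat(
    [study_a.assign(study="study_a"), study_b.assign(study="study_b")],
    ignore_index=True,
)

print(combined.index.tolist())
print(labelled)
```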

luthi_full_co2_data.head() displays the first rows of the combined DataFrame. By default, the pandas head() method returns the first five rows.

In [921]:
luthi_full_co2_data.head()

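|   | yr_bp | co2_ppmv | years_before_2023 | calculated_year |
|---|-------|----------|-------------------|-----------------|
| 0 | 137 | 280.4 | 1886 | 3909 |
| 1 | 268 | 274.9 | 1755 | 3778 |
| 2 | 279 | 277.9 | 1744 | 3767 |
| 3 | 395 | 279.1 | 1628 | 3651 |
| 4 | 404 | 281.9 | 1619 | 3642 |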


The next cell drops any rows with missing values from luthi_full_co2_data and then prints the data type and non-null count of each column.

In [922]:
luthi_full_co2_data.dropna(inplace=True)

# Verify the data types and non-null counts
luthi_full_co2_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1096 entries, 0 to 1095
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   yr_bp              1096 non-null   object
 1   co2_ppmv           1096 non-null   object
 2   years_before_2023  1096 non-null   object
 3   calculated_year    1096 non-null   object
dtypes: object(4)
memory usage: 34.4+ KB
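
The info() output lists every column with dtype object, which usually means the values were read from the spreadsheet as generic Python objects rather than numbers. A possible follow-up step, sketched here on a two-row stand-in that reuses values from the head() output, is to convert the columns with pd.to_numeric; whether the full dataset converts cleanly is an assumption that would need checking.

```python
import pandas as pd

# Two-row stand-in stored with dtype=object, mimicking the info() output above.
df = pd.DataFrame({"yr_bp": [137, 268], "co2_ppmv": [280.4, 274.9]}, dtype=object)
print(df.dtypes)

# Convert every column; values that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)
```

Numeric dtypes are needed before the columns can be used reliably for plotting, regression, or signal analysis.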
