# Master Methodology

## Variables
1) Unit
    Big/Small

2) Method
    Mann-Kendall/Linear Regression

3) Aggregation
    Whether to standardise the data over the unit before or after aggregating to the aquifer

4) Data
    Good and bad quality data

5) Time frame
    What time frame to use.
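
To make the "Method" variable concrete, the following is a minimal sketch comparing the two trend approaches on a short, made-up series. It is not the notebook's own implementation: the `mann_kendall` and `ols_slope` helpers and the sample values are assumptions for illustration, and the Mann-Kendall variant shown ignores tie correction and autocorrelation, which a library such as `pymannkendall` handles more carefully.

```python
import math

def mann_kendall(values):
    """Basic Mann-Kendall test (no tie correction; assumes distinct values)."""
    n = len(values)
    # S = number of increasing pairs minus number of decreasing pairs (i < j)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected normal approximation of S
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return s, z, p

def ols_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(values)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

# Hypothetical water levels in metres (negative = depth below surface)
levels = [-5.0, -5.2, -5.1, -5.4, -5.6, -5.5, -5.9, -6.0]
times = list(range(len(levels)))

s, z, p = mann_kendall(levels)
slope = ols_slope(times, levels)
print(f"Mann-Kendall: S={s}, Z={z:.3f}, p={p:.4f}")
print(f"Linear regression slope: {slope:.4f} m per time step")
```

Mann-Kendall only uses the sign of each pairwise difference, so it is less sensitive to outliers than the regression slope. The regression gives a rate of change directly.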
    

## Time Frame Selection

In [3]:
%%time

import pandas as pd
import numpy as np
import pymannkendall as mk
from scipy.stats import linregress

# Read the CSV file into a pandas DataFrame
data = pd.read_csv('groundwater_timeseries_data_Negative.csv')

# Convert the 'date' column to datetime format
data['date'] = pd.to_datetime(data['date'])

# Function to validate date input in YYYY-MM-DD format
def validate_date_input(date_string):
    try:
        # An explicit format rejects other layouts that pandas would otherwise guess
        return pd.to_datetime(date_string, format='%Y-%m-%d')
    except ValueError:
        print("Invalid date format. Please use YYYY-MM-DD format.")
        return None

# Validate if a date is within the data range
def is_date_within_range(date, data_range):
    return data_range[0] <= date <= data_range[1]

data_range = (data['date'].min(), data['date'].max())

# Ask the user to input the end date
while True:
    end_date_input = input("Enter the end date (YYYY-MM-DD): ")
    end_date = validate_date_input(end_date_input)
    if end_date is not None and is_date_within_range(end_date, data_range):
        break
    elif end_date is not None:
        print("Entered end date is not within the data range.")

# Ask the user to input the time frame in years
while True:
    try:
        time_frame_years = int(input("Enter the time frame in years: "))
        if time_frame_years > 0:
            break
        else:
            print("Time frame must be a positive integer.")
    except ValueError:
        print("Invalid input. Please enter a valid integer.")

# Calculate the start date based on the end date and time frame
start_date = end_date - pd.DateOffset(years=time_frame_years)

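# Validate that the start date is within the data range; if not, ask for a new end date
while not is_date_within_range(start_date, data_range):
    print("The calculated start date is not within the data range.")
    # Re-prompt until the end date parses and is itself within the data range
    while True:
        candidate = validate_date_input(input("Enter the end date (YYYY-MM-DD): "))
        if candidate is None:
            continue
        if is_date_within_range(candidate, data_range):
            break
        print("Entered end date is not within the data range.")
    end_date = candidate
    start_date = end_date - pd.DateOffset(years=time_frame_years)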

print("Start date:", start_date)
print("End date:", end_date)


Enter the end date (YYYY-MM-DD):  1986-01-01
Enter the time frame in years:  5


The calculated start date is not within the data range.


Enter the end date (YYYY-MM-DD):  1986-01-01


The calculated start date is not within the data range.


Enter the end date (YYYY-MM-DD):  2022-01-01


Start date: 2017-01-01 00:00:00
End date: 2022-01-01 00:00:00
CPU times: total: 1.14 s
Wall time: 37.6 s
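
The interactive cell above resolves a start and end date but does not show the window being applied. As a rough sketch of how the same logic could run without prompts, the hypothetical `resolve_window` helper below raises an error instead of re-asking, and a small synthetic monthly series (an assumption, standing in for the groundwater CSV) shows the resulting filter.

```python
import pandas as pd

def resolve_window(end_date, years, data_range):
    """Return (start, end) Timestamps; raise ValueError if the window leaves data_range."""
    end = pd.to_datetime(end_date, format="%Y-%m-%d")
    start = end - pd.DateOffset(years=years)
    lo, hi = data_range
    if not (lo <= start and end <= hi):
        raise ValueError(f"Window {start.date()} to {end.date()} is outside the data range.")
    return start, end

# Synthetic monthly series (hypothetical stand-in for the real data)
demo = pd.DataFrame({"date": pd.date_range("1985-01-01", "2022-12-01", freq="MS")})
demo["level"] = -5.0
demo_range = (demo["date"].min(), demo["date"].max())

start, end = resolve_window("2022-01-01", 5, demo_range)
# Keep only the rows that fall inside the selected window (inclusive at both ends)
window = demo[(demo["date"] >= start) & (demo["date"] <= end)]
print(start.date(), end.date(), len(window))
```

With this series, an end date of 1986-01-01 and a five-year frame raises `ValueError`, mirroring the rejected input in the recorded output above.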


## CSV File Creation

In [7]:
import pandas as pd

# Load the CSV file
csv_filename = "groundwater_timeseries_data_Negative.csv"  # Replace with your CSV file path
data = pd.read_csv(csv_filename)

# Display all columns for user reference
print("Columns in the CSV file:")
for i, column in enumerate(data.columns, start=1):
    print(f"{i}. {column}")

# Get input from the user using column numbers
date_column_num = int(input("Enter the number of the column for dates: ")) - 1
level_column_num = int(input("Enter the number of the column for levels: ")) - 1
site_column_num = int(input("Enter the number of the column for sites: ")) - 1
latitude_column_num = int(input("Enter the number of the column for latitudes (enter 0 if not applicable): ")) - 1
longitude_column_num = int(input("Enter the number of the column for longitudes (enter 0 if not applicable): ")) - 1

# Extract relevant columns using the provided numbers
selected_columns = [
    data.columns[date_column_num],
    data.columns[level_column_num],
    data.columns[site_column_num]
]

if latitude_column_num >= 0 and longitude_column_num >= 0:
    selected_columns.extend([
        data.columns[latitude_column_num],
        data.columns[longitude_column_num]
    ])

# Create a new DataFrame with selected columns
# (.copy() so the in-place conversions below do not act on a view of `data`)
processed_data = data[selected_columns].copy()

# Ask user about level unit conversion
level_unit_conversion = None
while level_unit_conversion not in ["1", "2", "3"]:
    level_unit_conversion = input(
        "Are the level values in:\n1. Meters\n2. Feet\n3. Other (provide a multiplicative factor)\nEnter the corresponding number: "
    )

# Convert level values to meters
if level_unit_conversion == "2":
    processed_data[data.columns[level_column_num]] *= 0.3048
elif level_unit_conversion == "3":
    conversion_factor = float(input("Enter the multiplicative factor to convert to meters: "))
    processed_data[data.columns[level_column_num]] *= conversion_factor

# Display the processed level values
print("Processed level values:")
print(processed_data[data.columns[level_column_num]])

# Ask user about making level values negative if not already
if (processed_data[data.columns[level_column_num]] >= 0).all():
    make_negative = input("Do you want to make the level values negative to represent depth to water table? (yes/no): ")
    if make_negative.lower() == "yes":
        processed_data[data.columns[level_column_num]] *= -1

# Display the final processed data
print(processed_data)


Columns in the CSV file:
1. date
2. level
3. site


Enter the number of the column for dates:  1
Enter the number of the column for levels:  2
Enter the number of the column for sites:  3
Enter the number of the column for latitudes (enter 0 if not applicable):  0
Enter the number of the column for longitudes (enter 0 if not applicable):  0
Are the level values in:
1. Meters
2. Feet
3. Other (provide a multiplicative factor)
Enter the corresponding number:  2


Processed level values:
0          -5.157216
1          -4.669536
2          -4.334256
3          -4.690872
4          -5.215128
             ...    
1459195   -62.060328
1459196   -62.407800
1459197   -62.423040
1459198   -62.605920
1459199   -63.148464
Name: level, Length: 1459200, dtype: float64
               date      level       site
0        1985-01-01  -5.157216     Site_1
1        1985-02-01  -4.669536     Site_1
2        1985-03-01  -4.334256     Site_1
3        1985-04-01  -4.690872     Site_1
4        1985-05-01  -5.215128     Site_1
...             ...        ...        ...
1459195  2022-08-01 -62.060328  Site_3200
1459196  2022-09-01 -62.407800  Site_3200
1459197  2022-10-01 -62.423040  Site_3200
1459198  2022-11-01 -62.605920  Site_3200
1459199  2022-12-01 -63.148464  Site_3200

[1459200 rows x 3 columns]
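
The cell above stops at printing the processed frame. If the intent is to persist it as a new CSV, one possible non-interactive sketch is shown below. The `standardise_levels` helper, the sample rows, and the in-memory buffer are all assumptions for illustration. In practice the buffer would be replaced by an output file path.

```python
import io
import pandas as pd

FEET_TO_METRES = 0.3048

def standardise_levels(frame, level_col, factor=1.0, depth_negative=True):
    """Return a copy with levels scaled to metres and, optionally, made negative."""
    out = frame.copy()  # leave the caller's frame untouched
    out[level_col] = out[level_col] * factor
    # Flip the sign only when every value is non-negative, as in the cell above
    if depth_negative and (out[level_col] >= 0).all():
        out[level_col] = -out[level_col]
    return out

# Hypothetical input: depths to water table recorded in feet
raw = pd.DataFrame({
    "date": ["1985-01-01", "1985-02-01"],
    "level": [10.0, 20.0],
    "site": ["Site_1", "Site_1"],
})

clean = standardise_levels(raw, "level", factor=FEET_TO_METRES)

# Write to an in-memory buffer here; pass a file path to to_csv to save to disk
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
print(buffer.getvalue())
```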
