# Overview

The provided Python script is a user-interactive program designed to process CSV files. The main function of the script is `main()`, which serves as the entry point of the program. 

The `main()` function starts by calling `ask_to_merge_files()`, which asks the user whether they want to merge multiple CSV files into one. If the user responds with "yes", that function creates a `csv_merge` folder next to the code, asks the user to place the files to merge in it, and returns the folder's path. The path is then passed to `merge_csv_files()`, which merges the files into a single pandas DataFrame.

If the user does not want to merge multiple files, the function instead asks for the name of a single file to process. The file name is passed to `read_file()`, which reads the file into a DataFrame.

Once the DataFrame has been created (either by merging multiple files or reading a single file), the function performs a series of operations to modify the DataFrame based on user input. 

First, it calls `delete_columns(df)` to ask the user if they want to delete any columns from the DataFrame and perform the deletion if requested. 

Next, it calls `rename_columns(df)` to ask the user if they want to rename any columns and perform the renaming if requested.

The function then asks the user if they have location data and want to filter or clip the dataset using it. If the user responds with "yes", the function prompts the user to specify the type of location data they have (either latitude and longitude data or a column like state/nation/etc.) and passes this information to `filter_by_location_data(df, location_type)`, which filters the DataFrame accordingly.

The function then calls `modify_columns(df)` to ask the user if they want to modify the values in any columns and perform the modification if requested.

Finally, the function calls `save_dataframe(df)` to save the DataFrame to an Excel file and a CSV file in a directory named "results".

In summary, this script provides a comprehensive and interactive way to process CSV files. It allows the user to merge multiple files, delete and rename columns, filter data based on location, modify column values, and save the processed data to new files.

## Installing Packages

This Python script is designed to automate the process of checking for the presence of certain Python packages in the current environment and installing them if they are not already installed.

The cell relies on three modules from the standard library: importlib, subprocess, and sys. The importlib module is used to import Python modules dynamically at runtime. The subprocess module is used to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. The sys module supplies sys.executable, which is the path of the Python interpreter running the code.

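The install_package function installs a Python package using pip. It takes one argument, package_name, which is the name of the package to install. subprocess.check_call() runs the equivalent of `python -m pip install <package_name>` in a new process. It uses sys.executable in place of `python`, so the package is installed into the same interpreter that is running the script. If pip exits with a non-zero status, check_call() raises subprocess.CalledProcessError.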

The check_and_install_package function is defined to check if a Python package is installed and install it if it is not. It also takes one argument, package_name, which is the name of the package to check and install. The importlib.import_module function is used to attempt to import the package. If the import fails, an ImportError is raised, which is caught by the except block. In this case, the install_package function is called to install the package.

Finally, a list of packages to check and install if missing is defined. This list includes "pandas", "numpy", "geopandas", "shapely", and "fiona". A for loop is used to iterate over each package in the list and call the check_and_install_package function for each one. This ensures that all the necessary packages are installed before the rest of the script is run.

In [None]:
import importlib
import subprocess
import sys  # needed for sys.executable in install_package()

def install_package(package_name):
    """
    Installs a Python package using pip.

    Args:
        package_name (str): The name of the package to install.
    """
    subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])

def check_and_install_package(package_name):
    """
    Checks if a Python package is installed and installs it if missing.

    Args:
        package_name (str): The name of the package to check and install.
    """
    try:
        importlib.import_module(package_name)
    except ImportError:
        print(f"{package_name} is not installed. Installing it now...")
        install_package(package_name)

# List of packages to check and install if missing
packages_to_install = ["pandas", "numpy", "geopandas", "shapely", "fiona"]

# Check and install each package
for package in packages_to_install:
    check_and_install_package(package)
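The check above imports each package just to test whether it is present. A lighter variant uses importlib.util.find_spec(), which locates a top-level module without executing it. The sketch below is a hypothetical alternative, not part of the original script. The PIP_NAMES mapping is an assumed addition for packages whose pip name differs from their import name: for example, `sklearn` is installed as `scikit-learn`. When a package is installed into a running session, calling importlib.invalidate_caches() before importing it helps the import system notice the new module.

```python
import importlib.util

# Hypothetical mapping for packages whose pip name differs from the import name
PIP_NAMES = {"sklearn": "scikit-learn", "yaml": "PyYAML"}


def pip_name(import_name):
    """Return the name to pass to `pip install` for a given import name."""
    return PIP_NAMES.get(import_name, import_name)


def find_missing_packages(import_names):
    """Return the import names that cannot be located in this environment.

    find_spec() searches for a top-level module without executing it,
    so this check does not actually import anything.
    """
    return [name for name in import_names if importlib.util.find_spec(name) is None]


missing = find_missing_packages(["pandas", "numpy", "geopandas", "shapely", "fiona"])
print("Missing (pip names):", [pip_name(name) for name in missing])
```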


## Importing Libraries

This Python script begins by importing several libraries that are necessary for data analysis and visualization.

The os module provides a way of using operating-system-dependent functionality, such as reading from or writing to the file system.

The glob library is used to find all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.

The pandas library, imported as pd, is a powerful data manipulation library. It provides data structures and functions needed to manipulate structured data, including functions for reading and writing data in a variety of formats.

The geopandas library, imported as gpd, is a project to add support for geographic data to pandas objects. It extends the datatypes used by pandas to allow spatial operations on geometric types.

The matplotlib.pyplot library, imported as plt, is a collection of functions that provide a MATLAB-like interface for making plots and charts.

The logging library is included in the standard library and provides a flexible framework for emitting log messages from Python programs. It is used here to log events that occur while the program is running.

The sys module provides access to some variables used or maintained by the Python interpreter and to functions that interact strongly with the interpreter.

The shapely.geometry module, from which Point is imported, is used for manipulation and analysis of planar geometric objects. The Point class is used to represent a point in a 2-dimensional space.

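The triple-quoted string after the imports is a bare string literal used as a comment. It notes that the cell imports the libraries needed for data analysis and visualization. Python evaluates and discards it like any other expression, with one exception: in a notebook, a bare string on the last line of a cell is echoed as the cell's output. A `#` comment avoids that.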

In [None]:
import os
import glob
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import logging
import sys
from shapely.geometry import Point

"""
This script imports necessary libraries for data analysis and visualization.
"""

## Merging Files

This Python script defines a function, ask_to_merge_files(), that interacts with the user to determine whether they want to merge multiple CSV files into one. If the user wants to merge files, the function creates a directory to hold the CSV files to be merged and returns the path to this directory. If the user does not want to merge files, the function returns None.

The function begins by asking the user if they want to merge multiple CSV files into one. This is done using the input() function, which waits for the user to type something into the console and press enter. The user's input is then stored in the merge_files variable.

If the user types "yes" (case-insensitive), the function creates a directory for the files to merge. The directory is named "csv_merge", as stored in the merge_folder variable. The os.path.realpath() and os.path.dirname() functions are applied to `__file__` to get the directory where the script is located. os.path.join() then appends the directory name to that path to build the full path of the new directory. Note that `__file__` is only defined when the code runs as a script. Inside a Jupyter notebook, referencing it raises a NameError, so the current working directory (os.getcwd()) has to be used instead.

The os.path.exists() function is used to check if the directory already exists. If it does not exist, the os.makedirs() function is used to create the directory.

The function then prints instructions asking the user to place all the files to merge in the 'csv_merge' folder and to type 'yes' when done. If the user types 'yes', the function returns the path to the merge folder. If the user types anything else, the function prints an error message and returns None.

If the user initially types anything other than "yes" when asked if they want to merge files, the function immediately returns None.

In [None]:
import os

def ask_to_merge_files():
    """
    Asks the user if they want to merge multiple CSV files into one.
    If yes, creates a folder to hold the files to be merged and returns the path to the folder.
    If no, returns None.
    """
    merge_files = input("Do you want to merge multiple CSV files into one? (yes/no): ")

    if merge_files.lower() == "yes":
        merge_folder = "csv_merge"
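        # __file__ is undefined inside a Jupyter notebook, so fall back to the working directory
        try:
            current_folder = os.path.dirname(os.path.realpath(__file__))
        except NameError:
            current_folder = os.getcwd()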
        merge_folder_path = os.path.join(current_folder, merge_folder)

        if not os.path.exists(merge_folder_path):
            os.makedirs(merge_folder_path)

        print(f"Please place all the CSV files to merge in the '{merge_folder}' folder ({merge_folder_path}).")
        print("Type 'yes' when done.")

        user_done = input()
        if user_done.lower() == "yes":
            return merge_folder_path
        else:
            print("Invalid input. Exiting the script.")
            return None
    else:
        return None
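ask_to_merge_files() derives the merge folder from `__file__`. The sketch below isolates that step in a hypothetical resolve_merge_folder() helper. The helper falls back to the current working directory when `__file__` is undefined. It also calls os.makedirs() with exist_ok=True, which replaces the separate os.path.exists() check. The base_dir parameter is an assumption added so the helper can be tried in a temporary directory.

```python
import os
import tempfile


def resolve_merge_folder(folder_name="csv_merge", base_dir=None):
    """Hypothetical helper: return the merge folder's path, creating it if needed."""
    if base_dir is None:
        try:
            # Directory containing the script, when run as a .py file
            base_dir = os.path.dirname(os.path.realpath(__file__))
        except NameError:
            # Inside a notebook __file__ is undefined: use the working directory
            base_dir = os.getcwd()
    path = os.path.join(base_dir, folder_name)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path


# Example: resolve the folder inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    merge_path = resolve_merge_folder(base_dir=tmp)
    created = os.path.isdir(merge_path)
    same_path_again = resolve_merge_folder(base_dir=tmp) == merge_path
```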


The provided Python function, merge_csv_files(folder_path), is designed to merge multiple CSV files located in a specified directory. These files are merged based on a common column that the user specifies. The function takes one argument, folder_path, which is the path to the directory containing the CSV files.

The function starts by using the glob.glob() function to find all CSV files in the specified directory. If no CSV files are found, the function prints a message and returns None.

Next, the function prompts the user to enter the name of the reference file, which is the file that contains the column to be used for merging. The function checks if the reference file is in the list of CSV files in the directory. If it's not, the function prints a message and returns None.

The function then reads the reference file into a pandas DataFrame using pd.read_csv(). It displays the columns in the DataFrame as a numbered list and prompts the user to enter the number of the column to be used for merging. This is done in a while loop that continues until the user enters a valid column number.

The function then iterates over the list of CSV files. For each file other than the reference file, it reads the file into a temporary DataFrame and merges it with the main DataFrame using the pd.merge() function. The merge is done on the specified column and uses an outer join. The merged DataFrame therefore keeps every key value that appears in either DataFrame, and missing values are filled with NaN. Columns other than the merge column that appear in both DataFrames are kept side by side, with pandas' default `_x` and `_y` suffixes. If the specified column does not exist in the temporary DataFrame, the function prints an error message and skips the file.

Finally, the function prints the first few rows of the merged DataFrame using df.head() and returns the DataFrame. Errors are reported rather than raised:

- If the folder contains no CSV files, or the reference file is not among them, the function prints a message and returns None.
- If the user enters an invalid column number, the function asks again.
- If a file lacks the merge column, the function skips it.

In [None]:
def merge_csv_files(folder_path):
    """
    Merges multiple CSV files in a specified folder based on a common column.

    Args:
        folder_path (str): The path to the folder containing the CSV files.

    Returns:
        pandas.DataFrame: The merged DataFrame, or None if the folder contains
        no CSV files or the reference file is not among them.

    Notes:
        Errors are reported with print() rather than raised: an invalid column
        number re-prompts the user, and a file that lacks the merge column is
        skipped.

    Example:
        merge_csv_files("/path/to/folder")
    """
    
    # Get a list of all CSV files in the folder
    csv_files = glob.glob(os.path.join(folder_path, "*.csv"))

    if not csv_files:
        print("No CSV files found in the specified folder.")
        return None

    # Ask the user to type the name of the file which has the column to be used for merging
    reference_file_name = input("Enter the name of the file which has the column to be used for merging (including the extension): ")

    # Check if the reference file is in the folder
    if reference_file_name not in [os.path.basename(file) for file in csv_files]:
        print("The specified reference file is not found in the folder.")
        return None

    # Read the reference CSV file
    reference_file_path = os.path.join(folder_path, reference_file_name)
    df = pd.read_csv(reference_file_path)

    # Display the columns as a numbered list
    print("Columns in the reference CSV file:")
    for i, col in enumerate(df.columns, start=1):
        print(f"{i}. {col}")

    # Ask for the column to merge on
    while True:
        try:
            merge_column = int(input("Enter the column number to use for merging (enter the corresponding number): ")) - 1
            if merge_column >= 0 and merge_column < len(df.columns):
                merge_column_name = df.columns[merge_column]
                break
            else:
                print("Invalid column number. Please enter a valid column number.")
        except ValueError:
            print("Invalid input. Please enter a valid column number.")

    # Read and merge the remaining CSV files
    for file in csv_files:
        if os.path.basename(file) != reference_file_name:
            temp_df = pd.read_csv(file)
            try:
                df = pd.merge(df, temp_df, on=merge_column_name, how="outer")
            except KeyError:
                print(f"Error: The column '{merge_column_name}' does not exist in the file '{os.path.basename(file)}'. Skipping this file.")

    print("\nFinal merged DataFrame:")
    print(df.head())

    return df
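The merge loop in merge_csv_files() is a left-to-right chain of pd.merge() calls. The sketch below expresses the same chain with functools.reduce over DataFrames that are already in memory, so the outer-join behaviour can be tried without any files. merge_on_key() is a hypothetical name. Like the original, it skips frames that lack the key column, but it checks the columns up front instead of catching KeyError.

```python
from functools import reduce

import pandas as pd


def merge_on_key(frames, key):
    """Outer-merge DataFrames on a shared key column, skipping frames without it."""
    usable = [frame for frame in frames if key in frame.columns]
    if not usable:
        return None
    # reduce() merges the first two frames, then merges that result with the
    # third, and so on, in the same way as the loop in merge_csv_files()
    return reduce(lambda left, right: pd.merge(left, right, on=key, how="outer"), usable)


a = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
b = pd.DataFrame({"id": [2, 3], "y": [200, 300]})
c = pd.DataFrame({"other": [0]})  # no "id" column, so it is skipped
merged = merge_on_key([a, b, c], "id")
print(merged)
```

With an outer join, id 1 has no `y` value and id 3 has no `x` value, so both appear as NaN in the result.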

## Reading Files

The provided Python function, read_file(file_name), is designed to read a CSV file and return its contents as a pandas DataFrame. The function takes one argument, file_name, which is the name of the CSV file to be read.

The function begins by checking whether the file exists, using os.path.isfile(file_name). A relative file name is resolved against the current working directory. If the file exists, the function proceeds to the next step. If it does not, the function prints a message indicating that the file was not found and returns None.

If the file exists, the function attempts to read the file using the pd.read_csv(file_name) function from the pandas library. This function reads a CSV file and returns a DataFrame, which is a two-dimensional labeled data structure with columns of potentially different types. The DataFrame is stored in the variable df.

If the file is read successfully, the function returns the DataFrame df. If an error occurs during the reading of the file, the function catches the exception, prints an error message that includes the details of the exception, and returns None.

The function therefore covers two failure cases: a file that does not exist, and any exception raised while pandas parses the file (for example, a malformed file or one that is not valid UTF-8).


In [None]:
def read_file(file_name):
    """
    Reads a CSV file and returns a pandas DataFrame.

    Parameters:
    file_name (str): The name of the CSV file to be read.

    Returns:
    pandas.DataFrame: The DataFrame containing the data from the CSV file.
    None: If the file is not found or an error occurs during reading.
    """
    if os.path.isfile(file_name):
        try:
            df = pd.read_csv(file_name)
            return df
        except Exception as e:
            print(f"Something went wrong. Error: {str(e)}")
            return None
    else:
        print("File not found. Please make sure the file is in the same directory as the code.")
        return None
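read_file() catches every exception with one generic message. A variant that names the most common pandas failures gives the user a more useful hint. This is a sketch rather than the original function. read_csv_safely() is a hypothetical name, and the demo writes its sample files to a temporary directory.

```python
import os
import tempfile

import pandas as pd


def read_csv_safely(path):
    """Return a DataFrame, or None after printing a specific error message."""
    if not os.path.isfile(path):
        print(f"File not found: {path}")
        return None
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError:
        print(f"{path} is empty.")
    except pd.errors.ParserError as e:
        print(f"Could not parse {path}: {e}")
    except UnicodeDecodeError as e:
        print(f"{path} is not valid UTF-8; try passing encoding=...: {e}")
    return None


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.csv")
    empty_path = os.path.join(tmp, "empty.csv")
    with open(good_path, "w") as f:
        f.write("a,b\n1,2\n")
    open(empty_path, "w").close()  # zero-byte file

    good_df = read_csv_safely(good_path)
    empty_df = read_csv_safely(empty_path)
    missing_df = read_csv_safely(os.path.join(tmp, "missing.csv"))
```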

## Deleting Columns

The provided Python function, delete_columns(df), is designed to delete specified columns from a pandas DataFrame. The function takes one argument, df, which is the DataFrame from which columns will be deleted.

The function begins by asking the user if they want to delete any columns from the DataFrame. This is done using the input() function, which waits for the user to type something into the console and press enter. The user's input is then stored in the delete_columns variable.

If the user types "yes" (case-insensitive), the function proceeds to the next step. It displays the columns in the DataFrame as a numbered list. This is done using a for loop that iterates over the DataFrame's columns, which are accessed using df.columns. The enumerate() function is used to get the index of each column (starting from 1) along with the column name.

The function then asks the user to input the numbers of the columns they want to delete. The user can enter the numbers as a comma-separated list (e.g., "1,2,3") or as a range (e.g., "1-3"). The user's input is stored in the delete_input variable.

The function then splits the user's input by comma to get the individual ranges. It initializes an empty list, delete_indices, to store the indices of the columns to delete.

The function then iterates over the individual ranges. For each range, it checks if it contains a hyphen. If it does, the function splits the range by the hyphen to get the start and end indices. It then uses the range() function to get all the indices in the range and adds them to the delete_indices list. If the range does not contain a hyphen, the function simply adds the index to the delete_indices list. If an error occurs during this process (e.g., if the user entered an invalid number or range), the function prints an error message and continues to the next iteration.

Finally, the function deletes the selected columns from the DataFrame. This is done using the df.drop() function, which removes the specified labels from the rows or columns. The function specifies the labels to drop by indexing the DataFrame's columns with the delete_indices list and sets axis=1 to indicate that columns should be dropped. The function then returns the modified DataFrame.

If the user initially types anything other than "yes" when asked if they want to delete columns, the function immediately returns the original DataFrame without making any changes.

In [None]:
def delete_columns(df):
    """
    Deletes specified columns from a DataFrame.

    Args:
        df (pandas.DataFrame): The DataFrame from which columns will be deleted.

    Returns:
        pandas.DataFrame: The DataFrame with the specified columns deleted.
    """
    delete_columns = input("Do you want to delete any columns? (yes/no): ")
    if delete_columns.lower() == "yes":
        # Display the columns as a numbered list
        print("Columns:")
        for i, column in enumerate(df.columns, start=1):
            print(f"{i}. {column}")

        # Ask the user to input the numbers of the columns they want to delete
        delete_input = input("Enter the numbers of the columns you want to delete (e.g. 1,2,3 or 1-3): ")

        # Split the input by comma; each piece is a single number or a hyphenated range
        delete_ranges = delete_input.replace(" ", "").split(",")

        # Initialize a list to store the column indices to delete
        delete_indices = []

        # Iterate over the delete ranges
        for delete_range in delete_ranges:
            # Check if the range contains a hyphen
            if "-" in delete_range:
                try:
                    start, end = delete_range.split("-")
                    delete_indices.extend(range(int(start)-1, int(end)))
                except ValueError:
                    print(f"Invalid range '{delete_range}'. Skipping it.")
            else:
                try:
                    delete_indices.append(int(delete_range)-1)
                except ValueError:
                    print(f"Invalid column number '{delete_range}'. Skipping it.")

        # Keep only indices that point at an existing column (an entry of 0 would
        # otherwise become -1 and silently select the last column) and drop duplicates
        delete_indices = sorted({i for i in delete_indices if 0 <= i < len(df.columns)})

        # Delete the selected columns from the dataframe
        df = df.drop(df.columns[delete_indices], axis=1)
    return df
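The number-and-range parsing in delete_columns() can be moved into a small, testable helper. rename_columns() and the location filter accept the same "1,2,3" / "1-3" syntax. The sketch below is a hypothetical parse_selection() function, not part of the original script. It reports and skips invalid pieces, and it drops numbers that fall outside the list of items.

```python
def parse_selection(text, n_items):
    """Parse '1,2,3' or '1-3' (1-based, inclusive) into sorted 0-based indices.

    Invalid pieces are reported and skipped. Numbers outside 1..n_items are
    dropped, so an entry of 0 can never wrap around to the last column.
    """
    indices = set()
    for piece in text.replace(" ", "").split(","):
        if not piece:
            continue  # tolerate empty input or stray commas such as "1,,2"
        try:
            if "-" in piece:
                start, end = (int(part) for part in piece.split("-"))
                indices.update(range(start - 1, end))
            else:
                indices.add(int(piece) - 1)
        except ValueError:
            print(f"Skipping invalid entry: {piece!r}")
    return sorted(i for i in indices if 0 <= i < n_items)


print(parse_selection("1, 3-4, x", 5))
```

For a DataFrame `df`, the deletion step would then reduce to `df.drop(columns=df.columns[parse_selection(delete_input, len(df.columns))])`.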

## Renaming Columns

The provided Python function, rename_columns(df), is designed to rename specified columns in a pandas DataFrame based on user input. The function takes one argument, df, which is the DataFrame whose columns will be renamed.

The function begins by asking the user if they want to rename any columns in the DataFrame. This is done using the input() function, which waits for the user to type something into the console and press enter. The user's input is then stored in the rename_columns variable.

If the user types "yes" (case-insensitive), the function proceeds to the next step. It displays the columns in the DataFrame as a numbered list. This is done using a for loop that iterates over the DataFrame's columns, which are accessed using df.columns. The enumerate() function is used to get the index of each column (starting from 1) along with the column name.

The function then asks the user to input the numbers or range of columns they want to rename. The user can enter the numbers as a comma-separated list (e.g., "2,3,5") or as a range (e.g., "5-8"). The user's input is stored in the rename_input variable.

The function then splits the user's input by comma and processes each piece in turn.

If a piece contains a hyphen, the function splits it at the hyphen into start and end numbers. It then uses the range() function to build rename_indices, which holds every column number from start to end inclusive. Otherwise, rename_indices holds just the single number entered.

Finally, the function iterates over the rename indices. For each index, it gets the previous name of the column, asks the user for the new name of the column, and renames the column in the DataFrame using the df.rename() function. The function then returns the modified DataFrame.

If the user initially types anything other than "yes" when asked if they want to rename columns, the function immediately returns the original DataFrame without making any changes.

In [None]:
def rename_columns(df):
    """
    Renames columns in a DataFrame based on user input.

    Args:
        df (pandas.DataFrame): The DataFrame to rename columns in.

    Returns:
        pandas.DataFrame: The DataFrame with renamed columns.
    """
    rename_columns = input("Do you want to rename any columns? (yes/no): ")
    if rename_columns.lower() == "yes":
        # Display the columns as a numbered list
        print("Columns:")
        for i, column in enumerate(df.columns, start=1):
            print(f"{i}. {column}")

        # Ask the user to input the numbers or range of columns they want to rename
        rename_input = input("Enter the numbers or range of columns you want to rename (e.g. 2,3,5 or 5-8): ")

        # Split the input by comma; each piece is a single number or a hyphenated range
        rename_ranges = rename_input.replace(" ", "").split(",")

        # Iterate over the rename ranges
        for rename_range in rename_ranges:
            try:
                # Check if the range contains a hyphen
                if "-" in rename_range:
                    start, end = rename_range.split("-")
                    rename_indices = range(int(start), int(end)+1)
                else:
                    rename_indices = [int(rename_range)]
            except ValueError:
                print(f"Invalid entry '{rename_range}'. Skipping it.")
                continue

            # Iterate over the rename indices (1-based, as displayed above)
            for rename_index in rename_indices:
                if not 1 <= rename_index <= len(df.columns):
                    print(f"Column {rename_index} does not exist. Skipping it.")
                    continue

                # Get the previous name of the column
                previous_name = df.columns[rename_index-1]

                # Ask the user for the new name of the column
                new_name = input(f"Enter the new name for column {rename_index} (previous name: {previous_name}): ")

                # Rename the column in the dataframe
                df.rename(columns={previous_name: new_name}, inplace=True)
    return df
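rename_columns() renames through df.rename() with an {old_name: new_name} mapping. As a result, every column that shares the old name is renamed together. If a file contains duplicate column names, renaming by position avoids this. The sketch below uses a hypothetical rename_by_position() helper that returns a renamed copy instead of modifying the DataFrame in place.

```python
import pandas as pd


def rename_by_position(df, new_names):
    """Return a copy of df with columns renamed by 1-based position.

    new_names maps a column number (as displayed to the user) to its new name;
    numbers outside the column list are ignored.
    """
    columns = list(df.columns)
    for position, name in new_names.items():
        if 1 <= position <= len(columns):
            columns[position - 1] = name
    renamed = df.copy()
    renamed.columns = columns
    return renamed


df = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])
by_label = df.rename(columns={"a": "x"})        # renames both "a" columns
by_position = rename_by_position(df, {1: "x"})  # renames only the first one
print(list(by_label.columns), list(by_position.columns))
```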


## Filtering by Locations

The provided Python function, filter_by_location_data(df, location_type), is designed to filter a pandas DataFrame based on location data. The function takes two arguments: df, which is the DataFrame to filter, and location_type, which is the type of location data to filter by. Valid options for location_type are "1" for latitude and longitude columns, and "2" for a single column of place names (such as state or nation).

If location_type is "1", the function first displays the columns in the DataFrame as a numbered list. It then asks the user to specify which columns represent latitude and longitude. The user's input is used to rename these columns to "lat" and "longitude" respectively.

Next, the function asks the user if they have a shapefile or geopackage for clipping. If the user answers "yes", the function asks how they would like to clip the dataset: using latitude and longitude bounding box coordinates, or using a shapefile/geopackage. If the user chooses to use bounding box coordinates, they are prompted to enter the minimum and maximum latitude and longitude values. These values are used to create a bounding box, and the DataFrame is clipped accordingly.

If the user chooses to use a shapefile/geopackage, they are asked to enter the name of the file. The function checks if the file is a shapefile or a geopackage based on its extension, and reads the file into a GeoDataFrame. The DataFrame is then clipped using the GeoDataFrame.

If location_type is "2", the function again displays the columns in the DataFrame as a numbered list. It then asks the user to specify which column contains the location data. The unique values in this column are displayed as a numbered list, and the user is asked to specify which values they would like to keep. The DataFrame is then filtered to only include rows where the location column is one of the specified values.

If location_type is neither "1" nor "2", the function prints an error message and exits.

Whenever it does not exit, the function returns the filtered DataFrame. Its prompts are only partly validated, however. A non-numeric column number or coordinate raises a ValueError, and gpd.read_file() raises an error if the clip file cannot be found.

In [None]:
def filter_by_location_data(df, location_type):
    """
    Filters a DataFrame based on location data.

    Args:
        df (pandas.DataFrame): The DataFrame to filter.
        location_type (str): The type of location data to filter by. 
            Valid options are "1" for latitude and longitude, and "2" for column.

    Returns:
        pandas.DataFrame: The filtered DataFrame.
    """
    if location_type == "1":
        # Latitude and Longitude Portion
        # Display Columns
        print("Columns:")
        for i, col in enumerate(df.columns, start=1):
            print(f"{i}. {col}")

        # Ask for Latitude Column
        latitude_column = int(input("Which column represents latitude? (Enter the column number): ")) - 1
        df.rename(columns={df.columns[latitude_column]: "lat"}, inplace=True)

        # Ask for Longitude Column
        longitude_column = int(input("Which column represents longitude? (Enter the column number): ")) - 1
        df.rename(columns={df.columns[longitude_column]: "longitude"}, inplace=True)

        # Ask about Clipping
        use_clipping = input("Do you have a shapefile/geopackage for clipping? (yes/no): ")

        if use_clipping.lower() == "yes":
            # .cx and gpd.clip() need a GeoDataFrame, so build point geometries from the
            # coordinate columns first. Assumption: coordinates are WGS84 degrees (EPSG:4326).
            df = gpd.GeoDataFrame(
                df,
                geometry=gpd.points_from_xy(df["longitude"], df["lat"]),
                crs="EPSG:4326",
            )

            # Ask the user how they want to clip the dataset
            clip_option = input("How would you like to clip the dataset?\n1. Using latitude and longitude bounding box coordinates\n2. Using a shapefile/geopackage\nEnter the option number (1 or 2): ")

            if clip_option == "1":
                # Collect bounding box coordinates
                min_lat = float(input("Enter the minimum latitude: "))
                max_lat = float(input("Enter the maximum latitude: "))
                min_lon = float(input("Enter the minimum longitude: "))
                max_lon = float(input("Enter the maximum longitude: "))
                bbox = (min_lon, min_lat, max_lon, max_lat)  # (minx, miny, maxx, maxy)
                df = clip_dataframe(df, use_clipping, clip_option, bbox=bbox)

            elif clip_option == "2":
                # Ask the user for the name of the shapefile/geopackage
                # (print rather than logging.info, which is hidden unless logging is set to INFO)
                print("Please make sure the shapefile/geopackage is in the same folder as the code.")
                clip_file = input("Enter the file name for clipping (including the extension): ")

                # Check if the file is a shapefile or a geopackage
                if clip_file.endswith(".shp"):
                    clip_geodata = gpd.read_file(clip_file)
                elif clip_file.endswith((".gpkg", ".gpkg2")):
                    clip_geodata = gpd.read_file(clip_file, driver="GPKG")
                else:
                    print("Invalid file format. Please use a shapefile or a geopackage.")
                    sys.exit(1)

                # Reproject the clip layer to the points' CRS, then clip
                df = gpd.clip(df, clip_geodata.to_crs(df.crs))

            # Drop the helper geometry column; the lat/longitude columns are still present
            df = pd.DataFrame(df.drop(columns="geometry"))

    elif location_type == "2":
        # Column Portion
        # Display Columns
        print("Columns:")
        for i, col in enumerate(df.columns, start=1):
            print(f"{i}. {col}")

        # Ask for Location Column
        location_column = int(input("Which column has the location data? (Enter the column number): ")) - 1
        location_values = df.iloc[:, location_column].unique()

        # Display Unique Values
        print("Unique Values:")
        for i, value in enumerate(location_values, start=1):
            print(f"{i}. {value}")

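        # Ask for Values to Keep
        keep_values_input = input("Which values would you like to keep? (e.g. 2,5,7 or 3-9): ")

        # Expand the input into 0-based positions; pieces may be numbers or hyphenated ranges
        keep_indices = []
        for piece in keep_values_input.replace(" ", "").split(","):
            if "-" in piece:
                start, end = piece.split("-")
                keep_indices.extend(range(int(start) - 1, int(end)))
            else:
                keep_indices.append(int(piece) - 1)

        # Ignore positions outside the list of unique values
        keep_indices = [i for i in keep_indices if 0 <= i < len(location_values)]
        df = df[df.iloc[:, location_column].isin(location_values[keep_indices])]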

    else:
        print("Invalid input for location type!")
        sys.exit(1)

    return df
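The column-based branch maps the numbers the user types back to the unique values that were displayed. Series.unique() returns values in order of first appearance, which is the same order used for the printed list. Number n therefore always refers to the n-th displayed value. The sketch below shows that mapping with a hypothetical keep_location_values() helper. The helper takes a column name rather than a position, and numbers that have already been parsed rather than raw input.

```python
import pandas as pd


def keep_location_values(df, column, selected_numbers):
    """Keep rows whose `column` value is among the selected unique values.

    selected_numbers are 1-based positions in df[column].unique(), which lists
    values in order of first appearance; out-of-range numbers are ignored.
    """
    values = df[column].unique()
    chosen = [values[n - 1] for n in selected_numbers if 1 <= n <= len(values)]
    return df[df[column].isin(chosen)]


sites = pd.DataFrame({"state": ["TX", "CA", "TX", "NY"], "reading": [1, 2, 3, 4]})
# The unique values are listed as 1. TX, 2. CA, 3. NY; keep TX and NY
texas_and_ny = keep_location_values(sites, "state", [1, 3])
print(texas_and_ny)
```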

### Clipping the Dataframe using Spatial Data

The provided Python function, clip_dataframe(), is designed to clip a GeoDataFrame based on the specified clipping options. Clipping limits the extent of the data to a given area. This is useful with geospatial data when you want to focus on a specific geographical area. The function relies on GeoPandas (.cx and gpd.clip()), so its input must be a GeoDataFrame with a geometry column. A plain pandas DataFrame with latitude and longitude columns has to be converted first.

The function takes five arguments:

- df: The GeoDataFrame to be clipped.
- use_clipping: A string that indicates whether to perform clipping. Valid values are "yes" or "no".
- clip_option: A string that specifies the clipping option to use. Valid values are "1" or "2".
- clip_file: An optional argument that specifies the file path of the shapefile to be used for clipping. This is required if clip_option is "2".
- bbox: An optional argument that specifies the bounding box coordinates [minx, miny, maxx, maxy] to be used for clipping. This is required if clip_option is "1".

The function begins by checking whether use_clipping is "yes" (case-insensitive). If it is, the function checks the value of clip_option.

If clip_option is "1", the function clips the data to the latitude/longitude bounding box using the .cx indexer. .cx is a GeoPandas coordinate-based indexer, not a pandas one. `df.cx[xmin:xmax, ymin:ymax]` selects the rows whose geometries intersect that box, so the slices are given longitude first, then latitude.

If clip_option is "2", the function attempts to read the shapefile specified by clip_file into a GeoDataFrame using the gpd.read_file() function. It then converts the GeoDataFrame to the same coordinate reference system (CRS) as the DataFrame using the gdf.to_crs() function. The DataFrame is then clipped using the gpd.clip() function, which clips a GeoDataFrame to the polygon extent of another GeoDataFrame. If the specified clip_file is not found, or if an error occurs while clipping the dataset, the function logs an error message and exits.

If clip_option is neither "1" nor "2", the function prints an error message and exits.

Finally, the function returns the clipped GeoDataFrame; if `use_clipping` is anything other than "yes", `df` is returned unchanged. An invalid `clip_option` or a missing or unreadable clip file is reported and stops execution with `sys.exit(1)`.


```python
import logging
import sys

import geopandas as gpd


def clip_dataframe(df, use_clipping, clip_option, clip_file=None, bbox=None):
    """
    Clips the given GeoDataFrame based on the specified clipping options.

    Args:
        df (geopandas.GeoDataFrame): The GeoDataFrame to be clipped.
        use_clipping (str): Indicates whether to perform clipping or not. Valid values are "yes" or "no".
        clip_option (str): The clipping option to use. Valid values are "1" (bounding box) or "2" (shapefile).
        clip_file (str, optional): The file path of the shapefile to be used for clipping. Required if clip_option is "2".
        bbox (list, optional): The bounding box coordinates [minx, miny, maxx, maxy] to be used for clipping. Required if clip_option is "1".

    Returns:
        geopandas.GeoDataFrame: The clipped GeoDataFrame (unchanged if use_clipping is not "yes").

    Raises:
        SystemExit: If clip_option is invalid, bbox is missing for option "1",
            clip_file is not found, or an error occurs while clipping the dataset.
    """
    if use_clipping.lower() == "yes":
        if clip_option == "1":
            if bbox is None or len(bbox) != 4:
                logging.error("Clip option 1 requires a bounding box [minx, miny, maxx, maxy].")
                sys.exit(1)
            # Clip the dataset using bounding box coordinates (in the units of df.crs)
            minx, miny, maxx, maxy = bbox
            df = df.cx[minx:maxx, miny:maxy]
            logging.info("Clipped dataset using latitude and longitude.")
        elif clip_option == "2":
            try:
                gdf = gpd.read_file(clip_file)
                # Reproject the mask so both layers share the same CRS
                gdf = gdf.to_crs(df.crs)
                df = gpd.clip(df, gdf)  # Update the data with the clipped result
                logging.info(f"Clipped dataset using {clip_file}")

            except FileNotFoundError:
                logging.error(f"File {clip_file} not found. Please make sure the file is in the same folder as the code.")
                sys.exit(1)

            except Exception as e:
                # logging uses %-style formatting, so the exception needs a %s placeholder
                logging.error("Error clipping dataset: %s", e)
                sys.exit(1)
        else:
            print("Invalid option. Please choose either 1 or 2.")
            sys.exit(1)
    return df
```

## Arithmetic Modifications

The provided Python function, modify_columns(df), is designed to modify the values in the columns of a pandas DataFrame based on user input. The function takes one argument, df, which is the DataFrame to be modified.

The function begins by asking the user if they want to modify the values in any of the columns. This is done using the input() function, which waits for the user to type something into the console and press enter. The user's input is then stored in the modify_columns variable.

If the user types "yes" (case-insensitive), the function enters a while loop that continues until the user decides to stop modifying columns. Inside the loop, the function first displays the columns in the DataFrame as a numbered list. This is done using a for loop that iterates over the DataFrame's columns, which are accessed using df.columns. The enumerate() function is used to get the index of each column (starting from 1) along with the column name.

The function then asks the user to select a column to modify by entering its number, which is stored in the `column_number` variable. The function converts this input to an integer and uses it to look up the column name and values. If the input is not a valid integer, or if the value entered later in the loop cannot be converted to a float, the resulting `ValueError` is caught, an error message is printed, and the loop starts over.
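
Indexing `df.columns` with `column_number - 1` accepts an input of 0 (index -1 wraps around to the last column), and a number past the end raises `IndexError` rather than `ValueError`, so an explicit range check is worth having. A minimal sketch of the selection step, using a hypothetical helper `pick_column` and made-up column names:

```python
def pick_column(columns, raw_choice):
    """Map a 1-based menu choice to a column name.

    Raises ValueError for non-integer input and IndexError for out-of-range numbers.
    """
    number = int(raw_choice)  # ValueError if raw_choice is not an integer string
    # Plain indexing would accept 0 or negative numbers and wrap around the list
    if not 1 <= number <= len(columns):
        raise IndexError(f"column number must be between 1 and {len(columns)}")
    return columns[number - 1]


columns = ["station", "temp_c", "rain_mm"]
print(pick_column(columns, "2"))  # temp_c
```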

The function then checks if the selected column contains numerical values by checking the dtype.kind attribute of the column values. If the column does not contain numerical values, the function prints a message and asks the user if they want to modify another column. If the user types "yes", the function continues to the next iteration of the loop. If the user types anything else, the function breaks out of the loop.
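
The check relies on `dtype.kind`, a one-character code for the kind of data a NumPy/pandas dtype holds. The string `'biufc'` covers booleans (`b`), signed and unsigned integers (`i`, `u`), floats (`f`) and complex numbers (`c`); text columns fall outside it. Note that booleans pass the check, so arithmetic on a boolean column turns it into numbers. The sample Series below are made up for illustration:

```python
import pandas as pd

samples = {
    "ints": pd.Series([1, 2, 3]),
    "floats": pd.Series([1.5, 2.5]),
    "flags": pd.Series([True, False]),
    "names": pd.Series(["a", "b"]),
}

for name, series in samples.items():
    # Same membership test as in modify_columns()
    is_numeric = series.dtype.kind in "biufc"
    print(name, series.dtype.kind, is_numeric)
```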

If the column does contain numerical values, the function asks the user to select a mathematical operation to perform on the column. The user can choose from addition, subtraction, multiplication, and division by entering the corresponding operation number. The function also asks the user to enter the value to perform the operation with, and attempts to convert this value to a float.

The function then performs the selected operation on the selected column. This is done using a series of if and elif statements that check the operation number and perform the corresponding operation. If the operation number is not recognized, the function prints an error message.
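
The if/elif chain can also be written as a lookup table from menu number to a function from Python's `operator` module. In this sketch, `OPERATIONS` and `apply_operation` are hypothetical names, and the zero check is an added safeguard: dividing a pandas Series by 0 produces `inf`/`NaN` values rather than raising an error. The usage line chains two operations to perform a Celsius-to-Fahrenheit unit conversion.

```python
import operator

import pandas as pd

# Menu numbers mapped to the matching binary operator (mirrors the if/elif chain)
OPERATIONS = {
    "1": operator.add,
    "2": operator.sub,
    "3": operator.mul,
    "4": operator.truediv,
}


def apply_operation(series, operation, value):
    """Apply the chosen arithmetic operation element-wise to a pandas Series."""
    if operation not in OPERATIONS:
        raise ValueError(f"Invalid operation number: {operation!r}")
    if operation == "4" and value == 0:
        # pandas would not raise here; it would silently produce inf/NaN values
        raise ZeroDivisionError("Division by zero is not allowed")
    return OPERATIONS[operation](series, value)


temps_c = pd.Series([0.0, 10.0, 25.0])
# Multiply by 1.8, then add 32, to convert Celsius to Fahrenheit
temps_f = apply_operation(apply_operation(temps_c, "3", 1.8), "1", 32.0)
print(temps_f.tolist())  # [32.0, 50.0, 77.0]
```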

Finally, the function asks the user if they want to modify another column. If the user types "yes", the function continues to the next iteration of the loop. If the user types anything else, the function breaks out of the loop.

If the user initially types anything other than "yes" when asked whether to modify columns, the function prints "No modifications requested." and returns the DataFrame unchanged; otherwise it returns the modified DataFrame. Because the new values are assigned back into the passed-in DataFrame, the caller's DataFrame object is modified as well.

```python
def modify_columns(df):
    """
    Modifies the values in the columns of a DataFrame based on user input.

    Args:
        df (pandas.DataFrame): The DataFrame to modify (changed in place).

    Returns:
        pandas.DataFrame: The modified DataFrame.
    """
    modify_columns = input("Do you want to modify the values in any of the columns, perform unit conversions, etc.? (yes/no): ")

    if modify_columns.lower() == "yes":
        while True:
            # Display the columns as a numbered list
            print("Columns:")
            for i, col in enumerate(df.columns, start=1):
                print(f"{i}. {col}")

            # Ask the user to select a column
            column_number = input("Select the column number you want to modify: ")

            try:
                column_number = int(column_number)

                # Reject out-of-range numbers; 0 would otherwise wrap around to the last column
                if not 1 <= column_number <= len(df.columns):
                    print("Column number out of range. Please try again.")
                    continue

                column_name = df.columns[column_number - 1]
                column_values = df[column_name]

                # Check if the column contains numerical values
                # (dtype kinds: b=bool, i=int, u=unsigned int, f=float, c=complex)
                if column_values.dtype.kind not in 'biufc':
                    print("Selected column contains non-numerical values.")
                    modify_other_column = input("Do you want to modify any other column? (yes/no): ")
                    if modify_other_column.lower() == "yes":
                        continue
                    else:
                        break

                # Ask the user for the mathematical operation
                operation = input("Select the mathematical operation:\n1. Addition\n2. Subtraction\n3. Multiplication\n4. Division\nEnter the operation number: ")

                # Ask the user for the value to perform the operation with
                value = input("Enter the value to perform the operation with: ")
                value = float(value)

                # Perform the mathematical operation on the selected column
                if operation == "1":
                    df[column_name] = df[column_name] + value
                elif operation == "2":
                    df[column_name] = df[column_name] - value
                elif operation == "3":
                    df[column_name] = df[column_name] * value
                elif operation == "4":
                    if value == 0:
                        # pandas would silently fill the column with inf/NaN instead of raising
                        print("Division by zero is not allowed.")
                    else:
                        df[column_name] = df[column_name] / value
                else:
                    print("Invalid operation number.")

                modify_other_column = input("Do you want to modify any other column? (yes/no): ")
                if modify_other_column.lower() == "yes":
                    continue
                else:
                    break

            except ValueError:
                # Raised by int() for the column number or by float() for the operand
                print("Invalid input. Please enter a whole column number and a numeric value.")
    else:
        print("No modifications requested.")

    return df
```

## Saving the Dataframe

The provided Python function, save_dataframe(df), is designed to save a pandas DataFrame to both an Excel file and a CSV file. The function takes one argument, df, which is the DataFrame to be saved.

The function starts by defining the path to the directory where the files will be saved. This directory is named "results". The function checks if this directory already exists. If it doesn't, the function creates it. This is done using the os.path.exists() function to check for the existence of the directory and the os.makedirs() function to create it if necessary.

Next, the function builds the path to the Excel file by joining the path to the "results" directory with the filename "pre_processed_data.xlsx", then writes the DataFrame to it with the `df.to_excel()` method. Writing `.xlsx` files requires an Excel writer engine such as openpyxl to be installed. The `index=False` argument prevents the DataFrame's index from being written to the file as an extra column.

The function then repeats this process to save the DataFrame to a CSV file. The path to the CSV file is defined by joining the path to the "results" directory with the filename "pre_processed_data.csv". The function then saves the DataFrame to this CSV file using the df.to_csv() function, again with index=False to prevent the DataFrame's index from being saved to the file.
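
A minimal sketch of the CSV half of this step, assuming you are happy to replace the separate existence check with `os.makedirs(..., exist_ok=True)`, which creates the folder only when it is missing. `save_csv` is a hypothetical helper, and the example writes into a temporary directory so it leaves nothing behind; the Excel half is omitted because it depends on an installed writer engine. Reading the file back shows that `index=False` leaves no extra unnamed index column.

```python
import os
import tempfile

import pandas as pd


def save_csv(df, folder_path, file_name="pre_processed_data.csv"):
    """Write df to folder_path/file_name without the index, creating the folder if needed."""
    # exist_ok=True makes a separate os.path.exists() check unnecessary
    os.makedirs(folder_path, exist_ok=True)
    csv_file_path = os.path.join(folder_path, file_name)
    df.to_csv(csv_file_path, index=False)
    return csv_file_path


df = pd.DataFrame({"site": ["A", "B"], "value": [1.5, 2.0]})
with tempfile.TemporaryDirectory() as tmp:
    path = save_csv(df, os.path.join(tmp, "results"))
    round_trip = pd.read_csv(path)
    print(round_trip.columns.tolist())  # ['site', 'value']
```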

Finally, the function prints "Files saved successfully." to confirm that both files were written to the "results" directory. It does not return a value. Because the folder is created when it is missing, the function works on a first run; on later runs, both files are overwritten.

```python
import os


def save_dataframe(df):
    """
    Save the pre-processed data as an Excel file and a CSV file.

    Args:
        df (pandas.DataFrame): The pre-processed data to be saved.

    Returns:
        None
    """
    # Create the "results" folder if it doesn't exist
    folder_path = "results"
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

    # Save the pre-processed data as an Excel file (needs an engine such as openpyxl)
    excel_file_path = os.path.join(folder_path, "pre_processed_data.xlsx")
    df.to_excel(excel_file_path, index=False)

    # Save the pre-processed data as a CSV file
    csv_file_path = os.path.join(folder_path, "pre_processed_data.csv")
    df.to_csv(csv_file_path, index=False)

    print("Files saved successfully.")
```

## The Main Function

This `main()` function is the entry point of a Python program designed to process CSV files. It interacts with the user to determine the flow of execution and performs various operations on the data.

Here's a step-by-step explanation:

1. The function first asks the user if they want to merge multiple CSV files into one. If the user answers "yes", the function calls `ask_to_merge_files()` to get the folder path and `merge_csv_files(folder_path)` to merge the files.

2. If the user chooses to work with a single file, the function asks for the file name and reads the file using `read_file(file_name)`.

3. After reading the file(s), the function performs various operations on the DataFrame:
   - `delete_columns(df)`: Deletes specified columns.
   - `rename_columns(df)`: Renames specified columns.
   - If the user has location data and wants to filter/clip the dataset using it, the function asks for the type of location data (latitude and longitude data or a column like state/nation/etc.) and calls `filter_by_location_data(df, location_type)` to perform the filtering.
   - `modify_columns(df)`: Modifies the values in specified columns.
   - `save_dataframe(df)`: Saves the DataFrame as an Excel file and a CSV file.
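
Every branch in `main()` hinges on comparing `answer.lower()` with `"yes"`, so a stray space or a short `y` is treated as "no". A hypothetical helper such as `ask_yes_no` could centralize that check and re-ask on unrecognized input; accepting an `input_fn` parameter (an assumption, not part of the original code) lets it be exercised without a keyboard.

```python
def ask_yes_no(prompt, input_fn=input):
    """Repeat the prompt until the user answers yes/y or no/n; return True for yes."""
    while True:
        answer = input_fn(prompt).strip().lower()
        if answer in ("yes", "y"):
            return True
        if answer in ("no", "n"):
            return False
        print("Please answer 'yes' or 'no'.")


# Simulate a user who first mistypes, then answers " YES "
replies = iter(["maybe", " YES "])
print(ask_yes_no("Merge multiple CSV files? (yes/no): ", input_fn=lambda _prompt: next(replies)))  # True
```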

The behavior of `main()` depends on the helper functions `ask_to_merge_files()`, `merge_csv_files()`, `read_file()`, `delete_columns()`, `rename_columns()`, `filter_by_location_data()`, `modify_columns()`, and `save_dataframe()`, which are defined in the earlier cells of this notebook; run those cells before calling `main()`.

```python
def main():
    """
    This function is the main entry point of the program.
    It prompts the user for input to determine the flow of execution.
    If the user chooses to merge multiple CSV files into one, it asks for the folder path and merges the files.
    If the user chooses to work with a single file, it asks for the file name and reads the file.
    After reading the data, it performs various operations on the DataFrame such as deleting columns, renaming columns,
    filtering by location data, and modifying columns, and then saves the DataFrame once at the end.
    """
    has_multiple_files = input("Do you want to merge multiple CSV files into one? (yes/no): ")

    if has_multiple_files.lower() == "yes":
        folder_path = ask_to_merge_files()

        if folder_path is not None:
            df = merge_csv_files(folder_path)

            if df is not None:
                df = delete_columns(df)
                df = rename_columns(df)

                has_location_data = input("Do you have location data and want to filter/clip the dataset using it?\nYou can use a GeoPackage/shapefile or a column with state/nation names.\nIf your data is in any other format, answer 'no' to skip this step, as it may not be supported. (yes/no): ")

                if has_location_data.lower() == "yes":
                    location_type = input("Which type of location data do you have?\n1. Latitude and Longitude data\n2. A column like state/nation/etc.\nChoose an option (enter the corresponding number): ")
                    df = filter_by_location_data(df, location_type)

                df = modify_columns(df)
                save_dataframe(df)

    else:
        file_name = input("Please enter the name of the file (including the extension): ")
        df = read_file(file_name)

        if df is not None:
            df = delete_columns(df)
            df = rename_columns(df)

            has_location_data = input("Do you have location data and want to filter/clip the dataset using it?\nYou can use a GeoPackage/shapefile or a column with state/nation names.\nIf your data is in any other format, answer 'no' to skip this step, as it may not be supported. (yes/no): ")

            if has_location_data.lower() == "yes":
                location_type = input("Which type of location data do you have?\n1. Latitude and Longitude data\n2. A column like state/nation/etc.\nChoose an option (enter the corresponding number): ")
                df = filter_by_location_data(df, location_type)

            # Save once, after all processing steps have been applied
            df = modify_columns(df)
            save_dataframe(df)
```