<h1 style="color:blue;">Data Conversion</h1>

## Purpose
Simplify files so that Python can read and parse the data easily. Use scripts to convert data into different file types and number formats. Convert the time data to a date-time value so that a hydrograph input can be synchronized with a date-time input.

## View Video
The video for this section is in sharefile `04_01 ASCII Data Conversion.mp4`.

## Ask ChatGPT for a Python Script

- **Objective**: I need a Python script to parse a `hydrostruct.out` file into a more readable csv file.
- **Data**: The script should read structure name, time, inflow, and outflow from each section.
- **Skip Lines**: The first dataset starts with 'THE MAXIMUM', skip empty lines.
- **Output Format**: Save the output in a CSV file named `hydrostruct_simple.csv` with columns for the hydrograph name, time, inflow, and outflow.
- **Dependencies**: Use Pandas.
- **File Paths**: Include variables to set file paths for reading the input and writing the output.
- **Path**: Update this path. `C:\Users\User\Chat GPT Workshop\Data\Ascii`
- **Request**: Can you help me write this Python script using the details provided?

### 🔽 Code Block
*The following cell is an example of what ChatGPT can build. You can run this code with the dashboard controls or by pressing Shift+Enter.*



In [1]:
import datetime

# Create a list of dates
dates = [datetime.date(2025, 3, 1), datetime.date(2025, 3, 2), datetime.date(2025, 3, 3)]

# Print the original dates
print("Original dates:")
for date in dates:
    print(date)

# Convert dates to different formats
formatted_dates = []
for date in dates:
    # Format the date as "Month day, Year"
    formatted_date = date.strftime("%B %d, %Y")
    formatted_dates.append(formatted_date)

# Print the formatted dates
print("\nFormatted dates:")
for date in formatted_dates:
    print(date)


Original dates:
2025-03-01
2025-03-02
2025-03-03

Formatted dates:
March 01, 2025
March 02, 2025
March 03, 2025


## Example ChatGPT Query

- **Objective**: I need a Python script to parse a `hydrostruct.out` file into a more readable csv file.
- **Data**: The script should read structure name, time, inflow, and outflow from each section.
- **Skip Lines**: The first dataset starts with 'THE MAXIMUM', skip empty lines.
- **Output Format**: Save the output in a CSV file named `hydrostruct_simple.csv` with columns for the hydrograph name, time, inflow, and outflow.
- **Dependencies**: Use Pandas.
- **File Paths**: Include variables to set file paths for reading the input and writing the output.
- **Path**: `C:\Users\Karen\Chat GPT Workshop\Data\Ascii`
- **Request**: Can you write a Python script that writes the data to the output file using the details provided?

## 🔽 Code Block

The following code block is the result of this query. The bold text is not necessary in your queries.

In [28]:
import pandas as pd

def parse_hydrostruct(file_path, output_path):
    # Initialize data storage
    data = {
        'Structure_Name': [],
        'Time_hrs': [],
        'Inflow_cfs': [],
        'Outflow_cfs': []
    }

    # Open the file and read lines
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Clean and filter out unnecessary lines
    lines = [line.strip() for line in lines if line.strip() and not line.strip().startswith('INFLOW')]

    # Process lines
    current_structure = None
    for line in lines:
        if line.startswith('THE MAXIMUM'):
            current_structure = line.split(':')[1].split('STRUCTURE')[0].strip()
        elif line[0].isdigit():
            time, inflow, outflow = line.split()
            data['Structure_Name'].append(current_structure)
            data['Time_hrs'].append(float(time))
            data['Inflow_cfs'].append(float(inflow))
            data['Outflow_cfs'].append(float(outflow))

    # Create DataFrame
    df = pd.DataFrame(data)

    # Save to CSV
    df.to_csv(output_path, index=False)
    print(f"Data written to {output_path}")

# File paths
input_path = 'C:\\Users\\Karen\\Chat GPT Workshop\\Data\\Ascii\\HYDROSTRUCT.OUT'
output_path = 'C:\\Users\\Karen\\Chat GPT Workshop\\Data\\Ascii\\hydrostruct_simple.csv'

# Run the function
parse_hydrostruct(input_path, output_path)


Data written to C:\Users\Karen\Chat GPT Workshop\Data\Ascii\hydrostruct_simple.csv
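
Once the CSV exists, a quick summary is a simple way to confirm that every structure was parsed. The sketch below is an illustration and not part of the ChatGPT result. It reuses the column names from the script above, but the structure names and flow values are made-up sample data standing in for `hydrostruct_simple.csv`.

```python
import pandas as pd

# Made-up sample rows standing in for hydrostruct_simple.csv (values are illustrative only)
sample = pd.DataFrame({
    'Structure_Name': ['CULVERT_A', 'CULVERT_A', 'CULVERT_B', 'CULVERT_B'],
    'Time_hrs': [0.1, 0.2, 0.1, 0.2],
    'Inflow_cfs': [5.0, 12.5, 1.0, 3.0],
    'Outflow_cfs': [4.0, 11.0, 0.5, 2.5],
})

# Count the rows and find the peak inflow and outflow for each structure
summary = sample.groupby('Structure_Name').agg(
    rows=('Time_hrs', 'size'),
    peak_inflow_cfs=('Inflow_cfs', 'max'),
    peak_outflow_cfs=('Outflow_cfs', 'max'),
)
print(summary)
```

To check a real run, replace `sample` with `pd.read_csv(output_path)`.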


<h1 style="color:blue;">Date Time Conversion</h1>

## Purpose
Convert the hydrograph time data to a date-time format.

## View Video
The video for this section is in sharefile `04_02 Date Time Conversion.mp4`.

## Ask ChatGPT for a Python Script

- **Objective**: I need a Python script to convert the time (hrs) in the `hydrostruct_simple.csv` file to a date-time format.
- **Data**: Time is in column 2. New format: `mm/dd/yyyy hh:mm:ss`. Start time = `03/13/2025 00:00:00`.
- **Add Zero Time**: The datasets are missing the zero time. Can you add one row to each dataset with the struct_name, 0.0 time, 0.0 inflow discharge, and 0.0 outflow discharge?
- **Output Format**: Save the output in a CSV file named `hydrostruct_datetime.csv` with columns for the StructName, datetime, inflow_cfs, and outflow_cfs.
- **Dependencies**: Use Pandas.
- **File Paths**: Include variables to set file paths for reading the input and writing the output.
- **Path**: Update this path. `C:\Users\User\Chat GPT Workshop\Data\Ascii`
- **Request**: Can you help me write this Python script using the details provided?

### 🔽 Code Block
*The following cell is an example of what ChatGPT can build. You can run this code with the dashboard controls or by pressing Shift+Enter.*



In [33]:
import pandas as pd
from datetime import datetime, timedelta

# Define file paths
input_file_path = r'C:\Users\Karen\Chat GPT Workshop\Data\Ascii\hydrostruct_simple.csv'
output_file_path = r'C:\Users\Karen\Chat GPT Workshop\Data\Ascii\hydrostruct_datetime.csv'

# Load the data
df = pd.read_csv(input_file_path)

# Start time for the dataset
start_time = datetime.strptime('03/13/2025 00:00:00', '%m/%d/%Y %H:%M:%S')

# Calculate the datetime for each time entry
df['datetime'] = df['Time_hrs'].apply(lambda x: start_time + timedelta(hours=x))

# Find unique structure names
unique_structures = df['Structure_Name'].unique()

# Create a DataFrame for zero time entries
zero_time_rows = []
for struct in unique_structures:
    zero_time_data = {'Structure_Name': struct, 'Time_hrs': 0.0, 'Inflow_cfs': 0.0, 'Outflow_cfs': 0.0, 
                      'datetime': start_time}
    zero_time_rows.append(zero_time_data)

zero_time_df = pd.DataFrame(zero_time_rows)

# Append zero time rows to the original DataFrame
df = pd.concat([zero_time_df, df], ignore_index=True)

# Reorder DataFrame by Structure_Name and datetime
df = df.sort_values(by=['Structure_Name', 'datetime'])

# Select and reorder the final columns
final_df = df[['Structure_Name', 'datetime', 'Inflow_cfs', 'Outflow_cfs']]

# Save the processed data to a new CSV file
final_df.to_csv(output_file_path, index=False)

print(f'Data has been processed and saved to {output_file_path}')


Data has been processed and saved to C:\Users\Karen\Chat GPT Workshop\Data\Ascii\hydrostruct_datetime.csv
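
The query asks for `mm/dd/yyyy hh:mm:ss`, but pandas writes datetime columns in ISO style (`2025-03-13 00:00:00`) unless told otherwise. The following is a minimal sketch of one way to get the requested format, assuming the same `Time_hrs` column name. It uses a vectorized conversion and the `date_format` argument of `to_csv`. The hour values are made up, and the CSV is written to an in-memory buffer so the example runs without any files.

```python
import io
import pandas as pd

# Illustrative time values in hours (not taken from the real file)
df = pd.DataFrame({'Time_hrs': [0.0, 0.5, 1.25, 25.0]})

start_time = pd.Timestamp('2025-03-13 00:00:00')

# Vectorized conversion: start time plus the elapsed hours
df['datetime'] = start_time + pd.to_timedelta(df['Time_hrs'], unit='h')

# Write the datetimes as mm/dd/yyyy hh:mm:ss instead of the ISO default
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format='%m/%d/%Y %H:%M:%S')
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, pass the output path to `to_csv` in place of `buffer`.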


<h1 style="color:blue;">Numerical Precision and Float Variables</h1>

## Purpose
Precision is crucial in 2D modeling and large data operations because it affects the accuracy of calculations and the reliability of results. Inadequate precision can lead to errors that may impact analyses and simulations, making it vital to understand how to manage numerical data effectively.

## View Video
The video for this section is in sharefile `04_03 Numerical Precision.mp4`.

## Examples
Here are some common operations involving float variables:

- **Calculate the area of every grid element in a mesh**
- **Convert a float to an integer**
- **Round a float to an integer**

## Instructions
1. **Ask ChatGPT to define floats and integers.** Understand how a float differs from a real number in terms of representation and precision.

2. **Request ChatGPT to build a list of float values.** This will help you see the diversity of float representations.

3. **Ask ChatGPT to create a script to convert the float values to integers.** This practice will illustrate the conversion process and its implications.

4. **Request a script to sort the data from high to low.** Sorting allows you to analyze data trends and patterns effectively.

By mastering numerical precision and float variables, you can enhance the accuracy and reliability of your data analysis in various applications!
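
Before working with a DataFrame, it can help to see the behavior on single values. The sketch below uses only the Python standard library and shows three things: floats are binary approximations, `int()` truncates while `round()` goes to the nearest integer, and `round()` sends exact halves to the nearest even integer. The numbers are arbitrary examples.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off
total = 0.1 + 0.2
print(total == 0.3)              # exact comparison fails
print(math.isclose(total, 0.3))  # tolerance-based comparison succeeds

# int() truncates toward zero; round() goes to the nearest integer
print(int(58.99), round(58.99))
print(int(-2.7), round(-2.7))

# round() sends exact halves to the nearest even integer
print(round(0.5), round(1.5), round(2.5))
```

When comparing computed floats, a tolerance test such as `math.isclose` is usually safer than `==`.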

In [24]:
import pandas as pd
import numpy as np

# Set the random seed for reproducibility
np.random.seed(0)

# Create a list of float values
float_values = np.random.uniform(low=10.5, high=99.5, size=20)

# Create a DataFrame with the float values
df_floats = pd.DataFrame(float_values, columns=['Float Values'])

# Convert float values to integers directly (truncating the decimal part)
df_floats['Integer Values'] = df_floats['Float Values'].astype(int)

# Round float values before converting to integers
df_floats['Rounded Integers'] = df_floats['Float Values'].round().astype(int)

# Round float values to 4 decimal places
df_floats['Rounded to 4 Decimals'] = df_floats['Float Values'].round(4)

# Convert integer values back to floats
df_floats['Reconverted Floats'] = df_floats['Integer Values'].astype(float)

# Display the DataFrame with original float values, integer values, rounded integers, rounded to 4 decimals, and reconverted float values
df_floats



| | Float Values | Integer Values | Rounded Integers | Rounded to 4 Decimals | Reconverted Floats |
|---|---|---|---|---|---|
| 0 | 59.344402 | 59 | 59 | 59.3444 | 59.0 |
| 1 | 74.151854 | 74 | 74 | 74.1519 | 74.0 |
| 2 | 64.14594 | 64 | 64 | 64.1459 | 64.0 |
| 3 | 58.994603 | 58 | 59 | 58.9946 | 58.0 |
| 4 | 48.205277 | 48 | 48 | 48.2053 | 48.0 |
| 5 | 67.984576 | 67 | 68 | 67.9846 | 67.0 |
| 6 | 49.445262 | 49 | 49 | 49.4453 | 49.0 |
| 7 | 89.867797 | 89 | 90 | 89.8678 | 89.0 |
| 8 | 96.265986 | 96 | 96 | 96.266 | 96.0 |
| 9 | 44.626295 | 44 | 45 | 44.6263 | 44.0 |
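
Step 4 of the instructions asks for the data sorted from high to low, which the cell above does not do. A minimal sketch, rebuilding the same seeded float list so it runs on its own:

```python
import numpy as np
import pandas as pd

# Rebuild the same reproducible list of floats used in the cell above
np.random.seed(0)
float_values = np.random.uniform(low=10.5, high=99.5, size=20)
df_floats = pd.DataFrame(float_values, columns=['Float Values'])

# Sort from high to low and renumber the rows
df_sorted = df_floats.sort_values(by='Float Values', ascending=False).reset_index(drop=True)
print(df_sorted.head())
```

Dropping `reset_index(drop=True)` keeps the original row numbers, which is useful when you need to trace a sorted value back to its source row.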


<h1 style="color:blue;">Numerical Precision with Point Data</h1>

## Purpose
Precision is crucial in 2D modeling and large data operations because it directly influences the accuracy of spatial calculations and the reliability of results. For instance, calculating areas for grid elements in a mesh requires high precision to avoid small errors that can propagate through an analysis. This example demonstrates how numerical precision is handled when reading mesh data and calculating grid element areas.

## Examples
Here are some common operations involving float variables that you will explore in this notebook:

- **Calculate the area of each grid element in a mesh**: This operation demonstrates how spatial data (x, y coordinates) can be used to compute areas with precision.
  
## Instructions
1. **Ask ChatGPT to explain float precision.** Understand how float representation can influence the accuracy of spatial calculations.
2. **Ask ChatGPT to calculate the cell size of `topo.dat`.** This will help you automate data reading and area calculation for meshes.

By mastering precision in numerical data handling, you can enhance the accuracy of your floodplain modeling and grid analysis!
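
To make the precision point concrete, the sketch below stores one coordinate as a 64-bit and as a 32-bit float and measures how far each is from the original value. The coordinate is a hypothetical value chosen to resemble a large projected easting. It is not taken from `topo.dat`.

```python
import numpy as np

# Hypothetical projected coordinate, similar in magnitude to a large easting
x = 2345678.123

x64 = np.float64(x)
x32 = np.float32(x)

# float64 carries about 15 to 16 significant digits; float32 carries about 7
error64 = abs(float(x64) - x)
error32 = abs(float(x32) - x)
print(f'float64 error: {error64}')
print(f'float32 error: {error32}')
```

pandas reads decimal numbers as 64-bit floats by default, so this loss normally appears only when data is explicitly cast to a smaller type.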


## Example ChatGPT Query

- **Objective**: Calculate the grid cell size and area for a centroid point file with space-delimited x, y, z data.
- **Data**: Each row in the file represents a centroid with x, y, and z coordinates. No headers are present.
- **Skip Lines**: Not applicable as there are no headers or initial lines to skip.
- **Output Format**: Save the output in a CSV file named `topoarea.csv` with columns for x, y, z, grid size, and area.
- **Dependencies**: Use Pandas to handle data reading, calculations, and writing to CSV.
- **File Paths**: Include variables to set file paths for reading the input (`topo.dat`) and writing the output (`topoarea.csv`).
- **Path**: `C:\Users\Karen\Chat GPT Workshop\Data\Grids`
- **Request**: Can you create a Python script to calculate grid sizes based on consecutive y-values and compute areas assuming square cells, then write this data to a CSV file?

## 🔽 Code Block

The following code block is the result of this query.


In [41]:
import pandas as pd

# File paths
input_file_path = r'C:\Users\Karen\Chat GPT Workshop\Data\Grids\topo.dat'
output_file_path = r'C:\Users\Karen\Chat GPT Workshop\Data\Grids\topoarea.csv'

# Load data from space-delimited file without headers
data = pd.read_csv(input_file_path, header=None, sep=r'\s+', names=['x', 'y', 'z'])

# Calculate grid size assuming it's determined by the difference in consecutive y-values
data['grid_size'] = data['y'].diff().abs()

# Calculate the area of the square cell (grid size squared)
data['area'] = data['grid_size'] ** 2

# The first row has no previous y-value, so its grid_size and area are NaN.
# To copy the values from the second row instead, uncomment the next line:
# data[['grid_size', 'area']] = data[['grid_size', 'area']].bfill()

# Save to CSV
data.to_csv(output_file_path, index=False)

print(f'Data saved to {output_file_path}')


Data saved to C:\Users\Karen\Chat GPT Workshop\Data\Grids\topoarea.csv
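
Differencing consecutive y-values leaves the first row empty and gives a wrong value wherever the file jumps to a new column of cells. The sketch below shows one alternative, assuming a uniform square grid: take the smallest spacing between the sorted unique y-values and apply it to every row. The coordinates are made-up sample data, not the contents of `topo.dat`.

```python
import numpy as np
import pandas as pd

# Made-up centroids for a grid of 10-unit cells, listed column by column
data = pd.DataFrame({
    'x': [5.0, 5.0, 5.0, 15.0, 15.0, 15.0],
    'y': [5.0, 15.0, 25.0, 5.0, 15.0, 25.0],
    'z': [100.0, 100.2, 100.4, 99.8, 100.0, 100.1],
})

# Consecutive differences: NaN on the first row and a jump at the column change
print(data['y'].diff().abs().tolist())

# Alternative: spacing between sorted unique y-values, one value for the whole grid
unique_y = np.sort(data['y'].unique())
grid_size = float(np.diff(unique_y).min())

data['grid_size'] = grid_size
data['area'] = grid_size ** 2
print(data)
```

If the coordinates carry floating-point noise, rounding `unique_y` to a sensible number of decimals before differencing keeps near-duplicates from producing a tiny spacing.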


<h1 style="color:blue;">Numerical Precision with Polygon Grid Data</h1>

## Purpose
Precision plays a key role in geoprocessing tasks, particularly when calculating areas from polygon grids in GIS. In this example, we calculate the area of each cell in a polygon grid system. While this method is often used in geospatial analyses, it may not be as precise as methods based on structured grid systems. Understanding the limitations of precision in such operations is crucial for accurate modeling and analysis.

## Examples
Here are some common operations involving float variables and geospatial data:

- **Calculate the area of each polygon in a shapefile**: This shows how spatial polygons from a grid system can be used to compute areas.
- **Ensure valid geometries and correct CRS**: Ensuring that the geometries are valid and in the correct coordinate system is crucial for accuracy.
- **Convert areas to specific units**: The area can be converted to other units (e.g., hectares) as needed.

## Instructions
1. **Ask ChatGPT to define the role of CRS (Coordinate Reference System).** Understanding how CRS affects spatial calculations is essential for precision.
2. **Request ChatGPT to create a script to calculate the area of a grid shapefile.** This will help you understand how to automate geospatial data handling.
3. **Load the shapefile in QGIS.** Load the file and open the attribute table so you can use additional data editing tools.
4. **Practice calculating polygon areas**: Try converting areas from square meters to other units to gain insight into precision control in geoprocessing.  This can be completed using the field calculator in QGIS. Ask ChatGPT for instructions.
5. **Explore further operations, such as filtering or sorting the polygon data**: See how you can manage and analyze geospatial data more effectively by sorting an attribute table or filtering values within or outside of a specific range.  Ask ChatGPT how to perform this task using the Grid shapefile and the attribute table editor.

By mastering these geoprocessing techniques and understanding how precision influences polygon-based grid systems, you can improve the accuracy and reliability of spatial analyses in GIS workflows.
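
A planar area is computed directly from the vertex coordinates, so the result is in the square of whatever units the CRS uses. In a geographic CRS that means square degrees, which is why a projected CRS matters. The sketch below illustrates the idea in plain Python with the shoelace formula and a unit conversion. The cell coordinates are hypothetical, and `polygon_area` is a helper written for this example, not a GeoPandas or QGIS function.

```python
def polygon_area(vertices):
    """Return the area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical 30 m x 30 m grid cell in a projected CRS with meter units
cell = [(1000.0, 2000.0), (1030.0, 2000.0), (1030.0, 2030.0), (1000.0, 2030.0)]

area_m2 = polygon_area(cell)
area_ha = area_m2 / 10_000   # 1 hectare = 10,000 square meters
grid_size = area_m2 ** 0.5   # side length, valid only for square cells
print(area_m2, area_ha, grid_size)
```

For a CRS in feet, dividing the area by 43,560 converts square feet to acres.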


## Example Query
- **Objective**: Calculate the grid cell size and area for a grid shapefile that has a single polygon for each grid element.
- **Data**: `grid.shp`, a polygon shapefile of grid elements.
- **Output Format**: Save the output in a CSV file named `gridarea.csv` with columns for grid size and area.
- **Dependencies**: Use GeoPandas to handle data reading, calculations, and writing to CSV.
- **Path**: `C:\Users\Karen\Chat GPT Workshop\Data\Grids`
- **Request**: Can you create a Python script to calculate grid size and area based on the polygons in the shapefile, then write this data to a CSV file?

## 🔽 Code Block

The following code block is the result of this query.

In [42]:
import geopandas as gpd

# File paths
input_file_path = r'C:\Users\Karen\Chat GPT Workshop\Data\Grids\grid.shp'
output_file_path = r'C:\Users\Karen\Chat GPT Workshop\Data\Grids\gridarea.csv'

# Load the shapefile
gdf = gpd.read_file(input_file_path)

# Calculate the area of each polygon (assumed to be in CRS units that represent meters or feet)
gdf['area'] = gdf['geometry'].area

# Assuming a square grid, calculate the grid size as the square root of the area
gdf['grid_size'] = gdf['area'] ** 0.5

# Save to CSV
gdf[['grid_size', 'area']].to_csv(output_file_path, index=False)

print(f'Data saved to {output_file_path}')


Data saved to C:\Users\Karen\Chat GPT Workshop\Data\Grids\gridarea.csv


In [44]:
import geopandas as gpd

# File paths
input_file_path = r'C:\Users\Karen\Chat GPT Workshop\Data\Grids\grid.shp'
output_file_path = r'C:\Users\Karen\Chat GPT Workshop\Data\Grids\gridarea.csv'

# Load the shapefile
gdf = gpd.read_file(input_file_path)

# Calculate the area of each polygon (assumed to be in CRS units that represent meters or feet)
gdf['area'] = gdf['geometry'].area

# Assuming a square grid, calculate the grid size as the square root of the area
gdf['grid_size'] = gdf['area'] ** 0.5

# Round grid size and area to the nearest integer
gdf['grid_size'] = gdf['grid_size'].round().astype(int)
gdf['area'] = gdf['area'].round().astype(int)

# Save to CSV
gdf[['grid_size', 'area']].to_csv(output_file_path, index=False)

print(f'Data saved to {output_file_path}')



Data saved to C:\Users\Karen\Chat GPT Workshop\Data\Grids\gridarea.csv


<h1 style="color:blue;">Comparing Files Using Python</h1>

## Introduction
Comparing data is a common requirement in many data analysis and system administration tasks. This section introduces basic techniques for comparing files using Python.

## Objectives
- Understand how to use Python for file comparison.
- Learn to compare text files line by line.
- Explore methods to compare binary files.
- Implement file comparison in practical programming scenarios.

## Tools and Libraries
- **`filecmp` module**: A module that provides functions to compare files and directories in Python.
- **`difflib` library**: Useful for identifying differences between sequences, including lines in text files.
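
As a quick illustration of `filecmp`, the sketch below writes three small files to a temporary folder and compares them byte by byte. The file names and contents are made up for the example.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path_a = os.path.join(folder, 'a.out')
    path_b = os.path.join(folder, 'b.out')
    path_c = os.path.join(folder, 'c.out')

    # a and b hold identical bytes; c differs by one digit
    for path, content in [(path_a, b'1 0.25\n'), (path_b, b'1 0.25\n'), (path_c, b'1 0.26\n')]:
        with open(path, 'wb') as handle:
            handle.write(content)

    # shallow=False compares the file contents rather than only the os.stat() signatures
    same_ab = filecmp.cmp(path_a, path_b, shallow=False)
    same_ac = filecmp.cmp(path_a, path_c, shallow=False)

print(same_ab, same_ac)
```

Because `filecmp.cmp` reads the files in binary mode, the same call works for both text and binary files, but it only reports whether they match, not where they differ.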

## 1. Comparing Text Files
To compare text files, you can read the files line by line and identify differences using the `difflib` library. This allows you to see exactly what has changed between the two files.
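
A minimal sketch of that approach with `difflib.unified_diff`, using two short made-up lists of lines in place of real files:

```python
import difflib

# Made-up lines standing in for two versions of an output file
lines1 = ['1 0.25\n', '2 0.50\n', '3 0.75\n']
lines2 = ['1 0.25\n', '2 0.55\n', '3 0.75\n']

# Lines starting with '-' come from the first input, '+' from the second
diff = list(difflib.unified_diff(lines1, lines2, fromfile='Model1', tofile='Model2'))
print(''.join(diff))
```

For real files, build `lines1` and `lines2` with `readlines()` on each open file.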

## 2. Getting a List of Files
To build a list of files in a directory, follow these steps:

1. **Navigate to the Directory**: Open File Explorer and go to the desired directory.
2. **Select All Files**: Press `CTRL + A` to select all files in the directory.
3. **Copy as Path**: Hold `Shift`, right-click, and select "Copy as path."
4. This will be the list that you feed to ChatGPT.

If you encounter issues with this process, you can ask ChatGPT to help build a list, even if the data is not on a single line.
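
The list can also be built in Python instead of File Explorer. The sketch below is one possible approach using `pathlib`. The `list_files` helper is written for this example, and it is demonstrated on a temporary folder so it runs anywhere.

```python
import tempfile
from pathlib import Path

def list_files(folder):
    """Return the full paths of the files directly inside folder, sorted by name."""
    return sorted(str(p) for p in Path(folder).iterdir() if p.is_file())

# Demonstrate on a temporary folder with two placeholder files
with tempfile.TemporaryDirectory() as folder:
    for name in ['DEPFP.OUT', 'FINALDEP.OUT']:
        (Path(folder) / name).write_text('sample\n')
    files = list_files(folder)

for path in files:
    print(path)
```

To list a model folder, call `list_files` with that folder's path and paste the printed list into your ChatGPT query.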

## 3. Getting a Script from ChatGPT
Ask ChatGPT to write a Python script that performs the file comparison for you.



## Example Query
- **Objective**: Compare text files across two model folders for differences, ignoring variations in whitespace.
- **Data**: Text files named `DEPFP.OUT, FINALDEP.OUT, FPINFILTRATION.OUT`.
- **Output Format**: Output whether there are differences directly to the console and log specific differences to a file named `diff_log.txt`.
- **Dependencies**: Use Python standard libraries for file handling and text manipulation.
- **Path**: `C:\Users\Karen\Chat GPT Workshop\Data\File Diff`
- **Request**: Create a Python script that compares text files in 'Model1' and 'Model2' folders.
- **White Space**: Ignore whitespace differences using strip and split methods. 
- **Output**: Write results to the console and log specific differences to a file.

## 🔽 Code Block

The following code block is the result of this query.

In [52]:
import os
from itertools import zip_longest

def compare_files(path1, path2):
    with open(path1, 'r') as file1, open(path2, 'r') as file2:
        lines1 = file1.readlines()
        lines2 = file2.readlines()
    
    differences = []
    # zip_longest pads the shorter file with blank lines, so extra lines at the
    # end of the longer file are reported instead of being silently skipped
    for line1, line2 in zip_longest(lines1, lines2, fillvalue='\n'):
        # Split on whitespace so spacing differences within a line are ignored
        if line1.split() != line2.split():
            differences.append((line1, line2))
    return differences

def main():
    base_path = r'C:\Users\Karen\Chat GPT Workshop\Data\File Diff'
    models = ['Model1', 'Model2']
    files = ['DEPFP.OUT', 'FINALDEP.OUT', 'FPINFILTRATION.OUT']
    
    # Create or open the log file
    with open(os.path.join(base_path, 'diff_log.txt'), 'w') as log:
        for file in files:
            file_path1 = os.path.join(base_path, models[0], file)
            file_path2 = os.path.join(base_path, models[1], file)
            
            if os.path.exists(file_path1) and os.path.exists(file_path2):
                differences = compare_files(file_path1, file_path2)
                if differences:
                    print(f"Differences found in {file}:")
                    log.write(f"Differences in {file}:\n")
                    for line1, line2 in differences:
                        log.write(f"Model1: {line1}Model2: {line2}\n")
                else:
                    print(f"No differences in {file}.")
                    log.write(f"No differences in {file}.\n")
            else:
                print(f"One or both files are missing for {file}.")
                log.write(f"One or both files are missing for {file}.\n")

if __name__ == "__main__":
    main()


No differences in DEPFP.OUT.
Differences found in FINALDEP.OUT:
No differences in FPINFILTRATION.OUT.
