# 🌊 FLO-2D Data Processing Workshop

Welcome to the workshop. In this session, we will use ChatGPT along with other useful programs to process various forms of data.

## What You Will Learn
1. Get an overview of Google Colab and notebooks.
2. Integrate ChatGPT into your FLO-2D toolbox.
3. Develop helpful scripts for efficient data processing.

## Suggested Programs
- **QGIS** (Version 3.22 or later)
- **Notepad++**
- **DB Browser** (for SQLite)
- **HDFView**




# 💦 Getting Started in Google Colab

When the notebook opens in Colab, run the following setup cell to copy the workshop data into your own Google Drive.

Your Google Drive will then contain:

```
My Drive/
    FLO-2D-AI-Workshop-Data/
        Data/
        AI_and_Python_FLO_2D_Data_Processing.ipynb
```

This gives you a full, writable copy of all workshop materials stored safely in your own cloud space.

---

In [1]:
# Clone the workshop repository from GitHub
!git clone https://github.com/FLO-2DSoftware/FLO-2D-AI-Workshop-Data.git

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Copy the workshop folder into your Drive
!cp -r FLO-2D-AI-Workshop-Data "/content/drive/My Drive/FLO-2D-AI-Workshop-Data"

Cloning into 'FLO-2D-AI-Workshop-Data'...
remote: Enumerating objects: 63, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 63 (delta 13), reused 20 (delta 8), pack-reused 36 (from 3)
Receiving objects: 100% (63/63), 47.22 MiB | 25.90 MiB/s, done.
Resolving deltas: 100% (18/18), done.
Mounted at /content/drive
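
Running the setup cell a second time makes `cp -r` place a second copy of the repository inside the existing Drive folder, because the destination already exists. The sketch below shows one possible way to make the copy repeatable with `shutil.copytree`. The helper name `copy_workshop` is hypothetical, the paths are the ones used in the setup cell, and `dirs_exist_ok` assumes Python 3.8 or later.

```python
import shutil
from pathlib import Path


def copy_workshop(src, dst):
    """Copy the folder src to dst, merging into dst if it already exists."""
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"Source folder not found: {src}")
    # dirs_exist_ok=True overwrites files in place instead of nesting a new copy
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Example call in Colab, after the clone and the Drive mount:
# copy_workshop("FLO-2D-AI-Workshop-Data",
#               "/content/drive/My Drive/FLO-2D-AI-Workshop-Data")
```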


# 🧡 Google Colab Overview

Add a "!" before a command to run it as a system shell command.

Use the Play button or press Shift+Enter to run a code cell.

In [2]:
!python --version

Python 3.12.12


# 🐍 Python Modules

Colab supports Python libraries such as pandas, numpy, geopandas, matplotlib, and h5py, making it ideal for processing FLO-2D output files like HDF5, DAT, and OUT files.

Run `!pip list` to see the extensive list of Python modules included with Google Colab.

In [3]:
!pip list

Package                                  Version
---------------------------------------- --------------------
absl-py                                  1.4.0
accelerate                               1.12.0
access                                   1.1.9
affine                                   2.4.0
aiofiles                                 24.1.0
aiohappyeyeballs                         2.6.1
aiohttp                                  3.13.2
aiosignal                                1.4.0
aiosqlite                                0.21.0
alabaster                                1.0.0
albucore                                 0.0.24
albumentations                           2.0.8
ale-py                                   0.11.2
alembic                                  1.17.2
altair                                   5.5.0
annotated-types                          0.7.0
antlr4-python3-runtime                   4.9.3
anyio                                    4.11.0
anywidget                          

# 📦 Adding Python Modules

If a Python module is missing, install it using the pip installer.

Run `!pip install pyvista pyvistaqt netcdf4` to install the 3D visualization and NetCDF modules used in this workshop.

**⚠️ Important:** Python modules must be reinstalled each time you start a new Google Colab session, because the environment resets and does not keep installed packages.
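
Because installed packages disappear between sessions, it can help to check what is missing before you start. The sketch below is one possible check; the helper name `missing_modules` is hypothetical. Note that the check uses import names, so the `netcdf4` package is listed as `netCDF4`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used in this workshop
needed = ["pyvista", "pyvistaqt", "netCDF4"]
missing = missing_modules(needed)
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("All workshop modules are available.")
```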

In [4]:
# Install required packages for this workshop
!pip install pyvista pyvistaqt netcdf4

Collecting pyvista
  Downloading pyvista-0.46.4-py3-none-any.whl.metadata (15 kB)
Collecting pyvistaqt
  Downloading pyvistaqt-0.11.3-py3-none-any.whl.metadata (4.8 kB)
Collecting netcdf4
  Downloading netcdf4-1.7.3-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.9 kB)
Collecting vtk!=9.4.0 (from pyvista)
  Downloading vtk-9.5.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (5.6 kB)
Collecting QtPy>=1.9.0 (from pyvistaqt)
  Downloading QtPy-2.4.3-py3-none-any.whl.metadata (12 kB)
Collecting cftime (from netcdf4)
  Downloading cftime-1.6.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (8.7 kB)
Downloading pyvista-0.46.4-py3-none-any.whl (2.4 MB)
Downloading pyvistaqt-0.11.3-py3-none-any.whl (131 kB)

In [5]:
import pyvista as pv
import netCDF4


# 📁 Workshop Data Check

1. In Google Drive, open the folder named:  
   **FLO-2D-AI-Workshop-Data**

2. Make sure the `Data` folder is present. It contains the data we'll use in this
   workshop.

Your folder path should look like this:

```
My Drive
└── FLO-2D-AI-Workshop-Data/
      └── Data/
```



In [6]:
import os

base_path = "/content/drive/My Drive/FLO-2D-AI-Workshop-Data/Data"
os.listdir(base_path)


['ascii', 'file_diff', 'geopackage', 'grid', 'hdf5']
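
The same check can be automated. The sketch below compares the folder contents against the five subfolder names shown in the output above; the helper name `missing_subfolders` is hypothetical, and the expected list is an assumption based only on that output.

```python
from pathlib import Path

# Subfolder names taken from the listing above
EXPECTED = ["ascii", "file_diff", "geopackage", "grid", "hdf5"]


def missing_subfolders(base_path, expected=EXPECTED):
    """Return the expected subfolder names that are not present in base_path."""
    base = Path(base_path)
    return [name for name in expected if not (base / name).is_dir()]


base_path = "/content/drive/My Drive/FLO-2D-AI-Workshop-Data/Data"
missing = missing_subfolders(base_path)
print("Missing folders:", missing if missing else "none")
```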

# 🤖 Get Connected to AI

Open the chatbot you use most often:
- ChatGPT
- Claude
- Copilot and GitHub Copilot
- Gemini

These examples use a paid ChatGPT Plus account. You should be able to complete the lessons with a free account.

# Parse Data

**Purpose**: Data is often in an incompatible format. ChatGPT can quickly parse data into a pattern that is easier for Python to read and process.

**Example**:
- Convert `hydrostruct.out` into a modified file with limited headings.

## Instructions

1. **Open `hydrostruct.out`** in Notepad++:
   - Identify which lines should be removed.
   - Determine which lines can be ignored.
   - Identify what information is important.
   - Locate the part of the file that constitutes the dataset.

2. **Use ChatGPT to Reformat the File**:
   - Ask ChatGPT to simplify the data by:
     - Removing unnecessary lines.
     - Parsing the data into `time`, `hydroInlet`, and `hydroOutlet`.

3. **Provide Specific Details**:
   
   ChatGPT works best with specific instructions, which help it build a Python script that needs minimal modification.
     - Include file names and paths in your query.
     - Use Shift+Enter to add a new line in the chat message box.
     - Include distinct variable names in your query.
     - Build your query in Notepad++ so you can easily modify it.

4. **Modify and Iterate**:
   
   If you modify the script, paste the updated version back into ChatGPT so that its later suggestions are based on your current code.

5. **Request Documentation**:
   
   Ask ChatGPT to add documentation comments to the script.




# ChatGPT Code Example

```
import os

# If the data is not in the class folder, use the full path to the file.
input_file = 'hydrostruct.out'
output_file = 'hydrostructMOD.out'

def process_file_v4(input_path, output_path):
    """
    Strip the delimiters and unneeded lines from a hydrostruct.out file.
    """
    with open(input_path, 'r') as file:
        lines = file.readlines()

    # Remove lines 1 to 6 (0 to 5 in 0-based index)
    lines = lines[6:]

    processed_lines = []
    structure_name = None

    for line in lines:
        stripped_line = line.strip()

        # Skip lines with "INFLOW NODE:" and blank lines
        if "INFLOW NODE:" in stripped_line or not stripped_line:
            continue

        # Extract structure name from lines that start with "THE MAXIMUM DISCHARGE FOR:"
        if stripped_line.startswith("THE MAXIMUM DISCHARGE FOR:"):
            structure_name = stripped_line.split(":")[1].strip().split()[0]  # Extract only the first word
            continue

        # Process lines that begin with numbers
        if stripped_line and stripped_line[0].isdigit():
            split_line = stripped_line.split()
            if structure_name:
                processed_lines.append(structure_name)
                structure_name = None  # Reset after using
            processed_lines.append('\t'.join(split_line))

    # Write the processed content to the output file
    with open(output_path, 'w') as out_file:
        out_file.write('\n'.join(processed_lines))

# Execute the function
process_file_v4(input_file, output_file)

print(f"File processed and saved as {output_file}")


```



# **Basic Plotting**
## **Purpose**:

Introduce visual data analysis, which can be more intuitive and reveal trends and outliers quickly.

## **Examples**:
  
Create plots of the `hydrostructMOD.out` file.

## **Instruction**:

1. Open `hydrostructMOD.out` to check whether the parsing step was successful.
2. Ask ChatGPT to plot the data.
3. Ask ChatGPT to modify the code so the plots are saved to PNG files, with the structure name as the file name.


In [None]:
import os
import matplotlib.pyplot as plt

# Path to the uploaded file
file_path = 'hydrostructMOD.out'

def read_hydrostruct_data(file_path):
    """
    Reads the structured data from a given file and organizes it into a dictionary.
    Each structure's data is stored under its name, with separate lists for time,
    inlet discharge, and outlet discharge.

    Parameters:
        file_path (str): The path to the file containing the data.

    Returns:
        dict: A dictionary with structure names as keys and another dictionary
        containing lists of time, inlet, and outlet data as values.
    """
    data = {}
    with open(file_path, 'r') as file:
        current_structure = None
        for line in file:
            if line.strip() == '':
                continue
            parts = line.strip().split()
            if len(parts) == 1:  # Structure name
                current_structure = parts[0]
                data[current_structure] = {'time': [], 'inlet': [], 'outlet': []}
            else:  # Time, inlet, and outlet data
                time, inlet, outlet = map(float, parts)
                data[current_structure]['time'].append(time)
                data[current_structure]['inlet'].append(inlet)
                data[current_structure]['outlet'].append(outlet)
    return data

# Read the data from the file
hydrostruct_data = read_hydrostruct_data(file_path)

def plot_structure_data(structure, time, inlet, outlet):
    """
    Plots the time vs discharge data for a given structure.

    Parameters:
        structure (str): The name of the structure.
        time (list): List of time points.
        inlet (list): List of inlet discharge values.
        outlet (list): List of outlet discharge values.
    """
    plt.figure(figsize=(10, 6))
    plt.plot(time, inlet, label='Discharge Inlet', color='blue')
    plt.plot(time, outlet, label='Discharge Outlet', color='red')
    plt.title(f'Time vs Discharge for {structure}')
    plt.xlabel('Time')
    plt.ylabel('Discharge')
    plt.legend()
    plt.grid(True)
    plt.show()

# Plot the data for each structure
for structure, values in hydrostruct_data.items():
    plot_structure_data(structure, values['time'], values['inlet'], values['outlet'])
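
The last instruction asks for plots saved as PNG files named after each structure. The sketch below shows one way ChatGPT might answer, not the only one. The function name `save_structure_plot`, the `plots` output folder, and the sample values are all made up for illustration; in the workshop you would loop over `hydrostruct_data` instead.

```python
import os
import matplotlib.pyplot as plt


def save_structure_plot(structure, time, inlet, outlet, out_dir="plots"):
    """Save one time-vs-discharge plot as <out_dir>/<structure>.png and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(time, inlet, label="Discharge Inlet", color="blue")
    ax.plot(time, outlet, label="Discharge Outlet", color="red")
    ax.set_title(f"Time vs Discharge for {structure}")
    ax.set_xlabel("Time")
    ax.set_ylabel("Discharge")
    ax.legend()
    ax.grid(True)
    png_path = os.path.join(out_dir, f"{structure}.png")
    fig.savefig(png_path, dpi=150)
    plt.close(fig)  # release memory when saving many plots
    return png_path


# Made-up values for illustration only; replace with hydrostruct_data
sample = {
    "Culvert1": {"time": [0.0, 0.5, 1.0], "inlet": [0.0, 12.5, 8.0], "outlet": [0.0, 11.9, 7.6]},
}
for structure, values in sample.items():
    print(save_structure_plot(structure, values["time"], values["inlet"], values["outlet"]))
```

If a structure name contains characters that are not valid in a file name, clean the name before building the path.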


# Sample Markdown Document

This is a simple example of Markdown text. Markdown allows you to write using an easy-to-read, easy-to-write plain text format.

## Introduction

Markdown is widely used in blogging, instant messaging, online forums, and documentation for software projects.

### Features

- **Bold Text**
- *Italicized Text*
- `Code snippets`
- [Links](https://www.example.com)
- Lists
  - Bullet lists
  - Numbered lists

#### Code Example

```python
def hello_world():
    print("Hello, world!")

```








# Data Type Conversion
## Purpose:

Data types matter in pandas and Python because some operations require a specific type. For example, date arithmetic and date formatting require datetime objects rather than strings.

## Examples:

- Convert a string to a datetime object using pd.to_datetime().
- Change floats to integers using .astype(int).

## Instruction:

 - Ask ChatGPT to create some data so that you can easily manipulate it.  In this case you can get a quick list of dates.
 - Ask ChatGPT to modify the script to add some rows of different date formats.


In [None]:
import pandas as pd
from datetime import datetime, timedelta

# Generate a list of dates
start_date = datetime.now()
date_list = [start_date + timedelta(days=x) for x in range(20)]

# Convert to DataFrame and initially format dates as strings in 'YYYY-MM-DD'
df_dates = pd.DataFrame({'Original Date': [date.strftime('%Y-%m-%d') for date in date_list]})

# Convert the string dates to datetime objects
df_dates['Date Objects'] = pd.to_datetime(df_dates['Original Date'])

# Reformat the dates to 'MM/DD/YYYY'
df_dates['Formatted Date'] = df_dates['Date Objects'].dt.strftime('%m/%d/%Y')

# Display the DataFrame with original and reformatted dates
df_dates
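
When rows arrive in different date formats, one simple approach is to try a short list of known formats in order. The sketch below uses only the standard library; the helper name `parse_date`, the format list, and the sample strings are assumptions for illustration.

```python
from datetime import datetime

# Formats to try, in order; extend this list to match your own data
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]


def parse_date(text, formats=DATE_FORMATS):
    """Return a datetime for text using the first format that matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


mixed = ["2024-03-15", "03/16/2024", "17 Mar 2024"]
for text in mixed:
    print(text, "->", parse_date(text).strftime("%Y-%m-%d"))
```

The order of the list matters for ambiguous strings: with these formats, `03/04/2024` is read as March 4, not April 3.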


# Numerical Precision and Float Variables
## Purpose:

Why does precision matter in 2D modeling or large data operations?

## Examples:

- Calculate the area of every grid element in a mesh.
- Convert floats to integers using `.astype(int)`.

## **Instruction**:

- Ask ChatGPT to define floats and integers. How is a float different from a real number?
- Ask ChatGPT to build a list of Float values.
- Ask ChatGPT to build a script to convert the values to integers.
- Ask ChatGPT to add a script to sort the data from high to low.


In [None]:
import pandas as pd
import numpy as np

# Set the random seed for reproducibility
np.random.seed(0)

# Create a list of float values
float_values = np.random.uniform(low=10.5, high=99.5, size=20)

# Create a DataFrame with the float values
df_floats = pd.DataFrame(float_values, columns=['Float Values'])

# Convert float values to integers directly (truncating the decimal part)
df_floats['Integer Values'] = df_floats['Float Values'].astype(int)

# Round float values before converting to integers
df_floats['Rounded Integers'] = df_floats['Float Values'].round().astype(int)

# Convert integer values back to floats
df_floats['Reconverted Floats'] = df_floats['Integer Values'].astype(float)

# Display the DataFrame with original float values, integer values, rounded integers, and reconverted float values
df_floats
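
A few lines of plain Python show why precision needs attention. This is a minimal sketch with made-up values: it shows that floats store many decimal fractions only approximately, that `int()` truncates while `round()` goes to the nearest integer, and how to sort from high to low.

```python
import math

# Binary floats cannot store most decimal fractions exactly
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare within a tolerance instead

# int() truncates toward zero; round() goes to the nearest integer
values = [2.7, -2.7, 10.49, 99.5]
truncated = [int(v) for v in values]
rounded = [round(v) for v in values]
print(truncated)  # [2, -2, 10, 99]
print(rounded)    # [3, -3, 10, 100]

# Sort from high to low
print(sorted(values, reverse=True))  # [99.5, 10.49, 2.7, -2.7]
```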


# Comparing Files Using Python

## Introduction
Comparing data is a common requirement in many programming, data analysis, and system administration tasks. Python provides several methods to accomplish this, ranging from simple text file comparison to more complex binary data checks. This section will introduce you to basic and advanced techniques for comparing files using Python.

## Objectives
- Understand how to use Python for file comparison.
- Learn to compare text files line by line.
- Explore methods to compare binary files.
- Implement file comparison in practical programming scenarios.

## Tools and Libraries
- **`filecmp` module**: A module that provides functions to compare files and directories in Python.
- **`difflib` library**: Useful for identifying differences between sequences, including lines in text files.
- **`hashlib` library**: For generating hashes of files to compare contents at a binary level.

## 1. Comparing Text Files
### Line-by-Line Comparison with `difflib`
```python
import difflib

with open('file1.txt', 'r') as file1, open('file2.txt', 'r') as file2:
    file1_lines = file1.readlines()
    file2_lines = file2.readlines()

differ = difflib.Differ()
diff = list(differ.compare(file1_lines, file2_lines))
print(''.join(diff))  # each line already ends with a newline
```
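
For long files, a unified diff is often easier to read because it shows only the changed lines with a little context. The sketch below uses `difflib.unified_diff` on two short, made-up lists of lines held in memory; the file names are labels only.

```python
import difflib

# Two short, made-up versions of a file, held in memory for illustration
old_lines = ["alpha\n", "beta\n", "gamma\n"]
new_lines = ["alpha\n", "beta changed\n", "gamma\n"]

diff = list(difflib.unified_diff(old_lines, new_lines,
                                 fromfile="file1.txt", tofile="file2.txt"))
print("".join(diff), end="")
```

Lines that start with `-` exist only in the first file, and lines that start with `+` exist only in the second.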

## 2. Get a list of files
### Use Explorer and Notepad++

1. Navigate to the directory in Explorer and press Ctrl+A to select all files.
2. Shift+right-click the selection and press the hot key `a` to copy the files as paths.
3. Paste the paths into Notepad++.
4. Only the file names are needed, so Alt+drag to select the leading path as a column block and delete it.
5. Press Ctrl+H and replace the trailing `"` with nothing.

To turn the names into a comma-separated list, replace each line break with a comma:

1. Press Ctrl+Home to move to the top of the file.
2. Press Ctrl+H to open Find and Replace.
3. Set the Search Mode to Extended.
4. Find: `\n` (new line). If the file uses Windows line endings, find `\r\n` instead.
5. Replace with: `, ` (a comma and a space after each name).

If that doesn't work for you, ChatGPT can also build the list, even if the names are not on a single line.
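
Python can also build the list directly. The sketch below is one possible approach; the helper name `list_file_names` and the default extensions are assumptions, so adjust them to the files you want to compare.

```python
import os


def list_file_names(directory, extensions=(".DAT", ".INP")):
    """Return the sorted file names in directory that end with one of extensions."""
    wanted = tuple(ext.upper() for ext in extensions)
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and name.upper().endswith(wanted)
    )


# Example: list the matching files in the current working directory
print(list_file_names(os.getcwd()))
```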

## 3. Get a script from ChatGPT
Ask ChatGPT to write a Python script to perform the comparison.

## 4. Get a script from ChatGPT for hashlib
Ask ChatGPT to repeat the query using `hashlib`.




In [None]:
import filecmp
import os

# Define the paths to the two input directories
dir1 = r'C:\Projects\1 Tech Support\PythonClass\Import Export Test\Import'
dir2 = r'C:\Projects\1 Tech Support\PythonClass\Import Export Test\Export'

# List of files to compare
files_to_compare = [
    'OUTFLOW.DAT', 'RAIN.DAT', 'SWMM.INP', 'SWMMFLO.DAT', 'SWMMFLORT.DAT',
    'SWMMOUTF.DAT', 'TOLER.DAT', 'TOPO.DAT', 'XSEC.DAT', 'ARF.DAT',
    'CADPTS.DAT', 'CHAN.DAT', 'CHANBANK.DAT', 'CONT.DAT', 'FPLAIN.DAT',
    'HYSTRUC.DAT', 'INFIL.DAT', 'INFLOW.DAT', 'LEVEE.DAT', 'MANNINGS_N.DAT'
]

# Function to compare files
def compare_files(file1, file2):
    # Returns True if files are identical, False otherwise
    return filecmp.cmp(file1, file2, shallow=False)

# Iterate through the list of files and compare each one
results = {}
for file_name in files_to_compare:
    file1 = os.path.join(dir1, file_name)
    file2 = os.path.join(dir2, file_name)

    # Check if both files exist before comparing
    if os.path.exists(file1) and os.path.exists(file2):
        result = compare_files(file1, file2)
        results[file_name] = 'Identical' if result else 'Different'
    else:
        # Handle the case where one or both files are missing
        results[file_name] = 'Missing file(s)'

# Print the comparison results
for file_name, result in results.items():
    print(f'{file_name}: {result}')
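
The standard library can also compare a whole list of names in one call with `filecmp.cmpfiles`, which returns three lists: matches, mismatches, and errors. The sketch below builds two small throwaway folders so it runs anywhere; the folder contents are made up for illustration.

```python
import filecmp
import os
import tempfile

# Build two small throwaway folders so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, "Import")
    dir2 = os.path.join(tmp, "Export")
    os.makedirs(dir1)
    os.makedirs(dir2)
    contents = {"CONT.DAT": ("a\n", "a\n"), "TOLER.DAT": ("b\n", "c\n")}
    for name, (text1, text2) in contents.items():
        with open(os.path.join(dir1, name), "w") as f:
            f.write(text1)
        with open(os.path.join(dir2, name), "w") as f:
            f.write(text2)

    # FPLAIN.DAT exists in neither folder, so it is reported under errors
    names = ["CONT.DAT", "TOLER.DAT", "FPLAIN.DAT"]
    match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, names, shallow=False)

print("Identical:", match)
print("Different:", mismatch)
print("Missing or unreadable:", errors)
```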


In [None]:
import hashlib
import os

# Define the paths to the two input directories
dir1 = r'C:\Projects\1 Tech Support\PythonClass\Import Export Test\Import'
dir2 = r'C:\Projects\1 Tech Support\PythonClass\Import Export Test\Export'

# List of files to compare
files_to_compare = [
    'OUTFLOW.DAT', 'RAIN.DAT', 'SWMM.INP', 'SWMMFLO.DAT', 'SWMMFLORT.DAT',
    'SWMMOUTF.DAT', 'TOLER.DAT', 'TOPO.DAT', 'XSEC.DAT', 'ARF.DAT',
    'CADPTS.DAT', 'CHAN.DAT', 'CHANBANK.DAT', 'CONT.DAT', 'FPLAIN.DAT',
    'HYSTRUC.DAT', 'INFIL.DAT', 'INFLOW.DAT', 'LEVEE.DAT', 'MANNINGS_N.DAT'
]

# Function to generate hash of a file's contents
def hash_file(filepath):
    hash_alg = hashlib.sha256()  # other algorithms like sha512, md5, etc.
    with open(filepath, 'rb') as f:  # Open file in binary mode for reading
        for chunk in iter(lambda: f.read(4096), b""):  # chunks avoid memory overload
            hash_alg.update(chunk)
    return hash_alg.hexdigest()

# Compare files by their hash values
results = {}
for file_name in files_to_compare:
    file1 = os.path.join(dir1, file_name)
    file2 = os.path.join(dir2, file_name)

    # Check if both files exist before comparing
    if os.path.exists(file1) and os.path.exists(file2):
        hash1 = hash_file(file1)
        hash2 = hash_file(file2)
        results[file_name] = 'Identical' if hash1 == hash2 else 'Different'
    else:
        # Handle the case where one or both files are missing
        results[file_name] = 'Missing file(s)'

# Print the comparison results
for file_name, result in results.items():
    print(f'{file_name}: {result}')


# Notepad++ Regular Expression (Regex) Codes

### Introduction
These regular expression patterns can be used in Notepad++'s Find and Replace dialog to perform complex text manipulations efficiently. Always ensure "Regular expression" is selected under "Search Mode" when using these patterns.

### Objectives
- Explore regular expression codes.
- Learn to search by expressions.

Regular expressions are powerful tools for searching and manipulating text. Here's a list of commonly used regex codes in Notepad++:

### Basic Regex Codes

1. **`.` (Dot)**
   - Matches any single character except newline characters.
   - **Example**: `a.c` matches "abc", "adc", "a c", etc.

2. **`^` (Caret)**
   - Matches the start of a line.
   - **Example**: `^Hello` finds "Hello" at the beginning of lines.

3. **`$` (Dollar)**
   - Matches the end of a line.
   - **Example**: `end$` finds "end" at the end of lines.

4. **`*` (Asterisk)**
   - Matches zero or more of the preceding element.
   - **Example**: `lo*` matches "l", "lo", "loo", etc.

5. **`+` (Plus)**
   - Matches one or more of the preceding element.
   - **Example**: `lo+` matches "lo", "loo", "looo", etc.

6. **`?` (Question Mark)**
   - Makes the preceding character optional (matches zero or one occurrence).
   - **Example**: `colou?r` matches "color" and "colour".

7. **`[]` (Character Class)**
   - Matches any single character contained within the brackets.
   - **Example**: `[aeiou]` matches any vowel.

8. **`[^]` (Negated Character Class)**
   - Matches any single character not contained within the brackets.
   - **Example**: `[^aeiou]` matches any non-vowel.

9. **`{n}`**
   - Matches exactly `n` occurrences of the preceding character.
   - **Example**: `lo{2}` matches "loo".

10. **`{n,}`**
    - Matches `n` or more occurrences of the preceding element.
    - **Example**: `lo{2,}` matches "loo", "looo", "loooo", etc.

11. **`{n,m}`**
    - Matches from `n` to `m` occurrences of the preceding character.
    - **Example**: `lo{1,3}` matches "lo", "loo", "looo".

12. **`\` (Backslash)**
    - Escapes a special character.
    - **Example**: `\.` matches a literal dot.

13. **`|` (Pipe)**
    - Acts as a logical OR.
    - **Example**: `cat|dog` matches "cat" or "dog".

14. **`()` (Grouping)**
    - Groups multiple tokens together and creates a capture group for extracting a substring or using back-references.
    - **Example**: `(abc)+` matches "abc", "abcabc", "abcabcabc", etc.

15. **`\1, \2, ...` (Back-references)**
    - Matches the same text as previously matched by a capturing group.
    - **Example**: `(abc)\1` matches "abcabc".

16. **`\s`**
    - Matches any whitespace character (spaces, tabs, line breaks).
    - **Example**: `\s+` matches any sequence of whitespace.

17. **`\S`**
    - Matches any non-whitespace character.
    - **Example**: `\S+` matches any sequence of non-whitespace characters.

18. **`\d`**
    - Matches any digit (equivalent to `[0-9]`).
    - **Example**: `\d+` matches any sequence of digits.

19. **`\D`**
    - Matches any non-digit.
    - **Example**: `\D+` matches any sequence of non-digit characters.

20. **`\w`**
    - Matches any word character (letters, digits, or underscore).
    - **Example**: `\w+` matches any sequence of word characters.

21. **`\W`**
    - Matches any non-word character.
    - **Example**: `\W+` matches any sequence of non-word characters.

Use these patterns in Notepad++'s Find and Replace dialog by selecting "Regular expression" under "Search Mode".
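
The same codes work in Python's `re` module, which is handy for testing a pattern before using it in a script. This is a small sketch with made-up text. Notepad++ uses a different regex engine than Python, so more advanced features can behave differently, but the basic codes listed above are shared.

```python
import re

text = "color colour loo looo abcabc 2023-08-30"

print(re.findall(r"colou?r", text))          # ['color', 'colour']
print(re.findall(r"lo{2,}", text))           # ['loo', 'looo']
print(re.findall(r"\d+", text))              # ['2023', '08', '30']
print(re.search(r"(abc)\1", text).group())   # abcabc

# Capture groups can be reordered in a replacement
us_date = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "2023-08-30")
print(us_date)                               # 08/30/2023
```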



# Regular Expression Test

Copy this text into Notepad++ to try some regular expression searches.

---
---
Hello, welcome to the regex playground! Here are some lines to test:
    
1. The quick brown fox jumps over 13 lazy dogs.
2. 2023-08-30 is a significant date for project launch.
3. Email addresses like john.doe@example.com should be matched.
4. Look for special characters like `%`, `$`, and `&` within this line.
5. The cost of the item was $299.99 on 2020-12-01.
6. My phone number is 555-1234-567, call me maybe!
7. Find lines with only one word: Success
8. This line contains, commas, semicolons; and colons: should be interesting.
9. Match digits like 12345 and non-digits with characters together 123abc.
10. Identify lines that end with a period.
11. Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e., he paid a lot for it.
12. Hello? Who is there? It's me, wondering why you're not here!
13. Catch multi-line statements
that break over two lines.
14. Try matching line breaks and tabs 		here.
15. There should be lines that contain the word 'lines' multiple times in different lines.
16. What about matching words with apostrophes like it's, you're, and they're?
17. Look for patterns that start with a capital letter and end with a question mark?
18. This line is very simple.
19. End of the list.

---
---

## Try These Queries

1.   Find any digit: Use the regex `\d` to find all the digits in the document.
2.   Match email addresses: You might use something like `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`.
3.   Identify monetary values: Try `\$\d+(\.\d{2})?` to find patterns like $299.99.
4.   Search for multiline statements: A pattern like `(?s)multi-line.*? lines\.` matches across lines.
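
You can also try the queries from Python. The sketch below runs three of them against a few lines copied from the test text. Two details are adjustments for Python, not part of the Notepad++ queries: the email pattern uses `[A-Za-z]` for the final part, because a `|` inside square brackets matches a literal pipe, and the money pattern uses a non-capturing group `(?:...)` so that `findall` returns the whole match.

```python
import re

sample = (
    "3. Email addresses like john.doe@example.com should be matched.\n"
    "5. The cost of the item was $299.99 on 2020-12-01.\n"
    "13. Catch multi-line statements\n"
    "that break over two lines.\n"
)

email = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
money = re.compile(r"\$\d+(?:\.\d{2})?")
multiline = re.compile(r"(?s)multi-line.*? lines\.")  # (?s) lets . match line breaks

emails = email.findall(sample)
prices = money.findall(sample)
statement = multiline.search(sample).group()

print(emails)
print(prices)
print(statement)
```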


In [None]:
Ask ChatGPT to break this expression down: `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`