<a href="https://colab.research.google.com/github/HansHenseler/masdav2023/blob/main/Part_6_ChatGPT_copilot_DAV.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ChatGPT as a copilot for data analytics and visualization in digital forensics

Part 6

Master of Advanced Studies in Digital Forensics & Cyber Investigation

Data Analytics and Visualization for Digital Forensics

(c) Hans Henseler, 2023


# Part 6

The following Python code was written with the assistance of ChatGPT 4. The first part generates test data in the same format as the DirPrint Excel file that we used in Part 1, which helps us to test. The second part reorganizes the data and draws a treemap using Plotly. After the code, we have copied the ChatGPT sessions that illustrate how ChatGPT assisted us and how we prompted it to get the result that we wanted. We could refine the code and even paste Python runtime or syntax errors into the chat, based on which ChatGPT was able to make corrections successfully.


In [None]:
# Install and imports needed for the scripts in 1 and 2

!pip install plotly

import plotly.express as px
import pandas as pd
import random
from datetime import datetime

## 1 Script to generate test data

We want a CSV file with data that is similar to DirPrint_Filelist - clean.xlsx. This is what that file looked like:

```
Attributes	Created	LastMod	LastAcc	Size	Name	Path	Folder	Ext	Md5
---A----	05/01/2021 19:19	05/01/2021 19:19	12/08/2021 20:58	22.639	aksdrvsetup.log	C:\Windows\aksdrvsetup.log	C:\Windows\	log	50D08E904772E94EB4B16817EB5BE4F1
#NAAM?	22/04/2021 19:57	22/04/2021 19:57	12/08/2021 20:58	77.824	bfsvc.exe	C:\Windows\bfsvc.exe	C:\Windows\	exe	862251351EC170BB3BD59CB219491811
---A----	18/08/2020 10:52	18/08/2020 10:54	12/08/2021 20:58	3.145.782	BGInfo.bmp	C:\Windows\BGInfo.bmp	C:\Windows\	bmp	EC80E9F3112F716063C273FC1F75F552
```
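
The sample rows suggest European formatting: the dates are day-first (`22/04/2021` cannot be read month-first) and `Size` uses a dot as thousands separator (`3.145.782` contains two dots, so it cannot be a decimal number). A minimal sketch of parsing such values with the standard library follows; the helper names `parse_timestamp` and `parse_size` are hypothetical and not part of the notebook.

```python
from datetime import datetime

def parse_timestamp(text):
    """Parse a day-first timestamp such as '22/04/2021 19:57'."""
    return datetime.strptime(text, "%d/%m/%Y %H:%M")

def parse_size(text):
    """Parse a size such as '3.145.782', assuming '.' is a thousands separator."""
    return int(text.replace(".", ""))

created = parse_timestamp("05/01/2021 19:19")
print(created.month)            # 1: January, not May
print(parse_size("22.639"))     # 22639
print(parse_size("3.145.782"))  # 3145782
```

When the text form of the data is read with pandas, the corresponding options would be `dayfirst=True` for `pd.to_datetime` and `thousands='.'` together with `decimal=','` for `pd.read_csv`.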

In [None]:
# Function to create a file row
def create_file(folder, ext, size):
    name = f'file_{random.randint(1, 1000000)}.{ext}'
    path = f'{folder}\\{name}'
    created = datetime.now().strftime("%d/%m/%Y %H:%M")
    return ['---A----', created, created, created, size, name, path, folder, ext, '']

# File extensions for various types
file_types = ['docx', 'xlsx', 'pptx', 'jpg', 'png', 'exe', 'mp4', 'mp3']

# Create an empty DataFrame
columns = ['Attributes', 'Created', 'LastMod', 'LastAcc', 'Size', 'Name', 'Path', 'Folder', 'Ext', 'Md5']
df = pd.DataFrame(columns=columns)

# Create root folders
for i in range(3):
    root_folder = f'C:\\Folder_{i}'

    # Create subfolders
    for j in range(random.randint(5, 15)):
        subfolder = f'{root_folder}\\Subfolder_{j}'

        # Create files in subfolder
        for _ in range(random.randint(20, 50)):
            ext = random.choice(file_types)
            size = random.uniform(5, 500) # Size in KB
            file_row = create_file(subfolder, ext, size)
            df.loc[len(df)] = file_row

    # Create files in the root folder (once per root folder, not once per subfolder)
    for _ in range(random.randint(10, 20)):
        ext = random.choice(file_types)
        size = random.uniform(5, 500) # Size in KB
        file_row = create_file(root_folder, ext, size)
        df.loc[len(df)] = file_row

# Write to CSV
df.to_csv('folder_structure.csv', index=False)


## 2 Presenting the data as a treemap


In [None]:
# Read the CSV file created earlier
df = pd.read_csv('folder_structure.csv')

# This code is for processing the actual xlsx file
# file_path = "DirPrint_Filelist - clean.xlsx"
# df = pd.read_excel(file_path)


# Split the 'Path' column into a list
df['path_list'] = df['Path'].str.split('\\')

# Limit to a specific number of levels
max_display_levels = 4

# Create columns for the limited levels
for i in range(max_display_levels):
    df[f'level{i}'] = df['path_list'].apply(lambda x: x[i] if i < len(x) else None)

# Create a treemap, with size based on 'Size' and color based on 'Ext'
fig = px.treemap(df, path=[f'level{i}' for i in range(max_display_levels)], values='Size', color='Ext')

# Show the treemap
fig.show()
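
To see what the `level0` to `level3` columns contain, the split-and-pad step can be reproduced without pandas. This is a small sketch only; `path_levels` is a hypothetical helper and the example paths merely follow the naming pattern of the generated test data.

```python
def path_levels(path, max_levels):
    """Split a Windows path on backslashes and return exactly max_levels entries.

    Longer paths are truncated; shorter ones are padded with None,
    mirroring the lambda used for the level columns.
    """
    parts = path.split("\\")
    return [parts[i] if i < len(parts) else None for i in range(max_levels)]

print(path_levels("C:\\Folder_0\\file_1.docx", 4))
# ['C:', 'Folder_0', 'file_1.docx', None]
print(path_levels("C:\\Folder_0\\Subfolder_2\\file_7.mp3", 4))
# ['C:', 'Folder_0', 'Subfolder_2', 'file_7.mp3']
```

Files that sit deeper than `max_levels` end up with identical level values, so they should be merged into a single tile at the last displayed level.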




# ChatGPT helping with understanding our data and generating test data

## 1 The first prompt

I'm a python programmer and need some assistance with analysing and visualising data that is coded in a csv file. Here are the first rows of the file:
Attributes	Created	LastMod	LastAcc	Size	Name	Path	Folder	Ext	Md5
---A----	05/01/2021 19:19	05/01/2021 19:19	12/08/2021 20:58	22.639	aksdrvsetup.log	C:\Windows\aksdrvsetup.log	C:\Windows\	log	50D08E904772E94EB4B16817EB5BE4F1
#NAAM?	22/04/2021 19:57	22/04/2021 19:57	12/08/2021 20:58	77.824	bfsvc.exe	C:\Windows\bfsvc.exe	C:\Windows\	exe	862251351EC170BB3BD59CB219491811
---A----	18/08/2020 10:52	18/08/2020 10:54	12/08/2021 20:58	3.145.782	BGInfo.bmp	C:\Windows\BGInfo.bmp	C:\Windows\	bmp	EC80E9F3112F716063C273FC1F75F552

## Response from ChatGPT

Sure, I can assist you in analyzing and visualizing the data from the CSV file you provided. Based on the provided data, it seems like you're dealing with file attributes, creation dates, modification dates, access dates, sizes, names, paths, folders, extensions, and MD5 hashes.

Here's what you can do to start analyzing this data using Python:

### Step 1: Load the Data
You can use the pandas library to load the CSV data into a DataFrame. You might need to handle the `#NAAM?` value as well, as it appears to be an error or placeholder.

```python
import pandas as pd

# Read the CSV file
file_path = 'your_file.csv'
df = pd.read_csv(file_path, na_values=['#NAAM?'])

# Preview the first few rows
print(df.head())
```

### Step 2: Data Cleaning
If necessary, you may need to perform some cleaning on the data, such as converting date strings to datetime objects, handling missing values, etc.

```python
# Convert date columns to datetime
date_columns = ['Created', 'LastMod', 'LastAcc']
for col in date_columns:
    df[col] = pd.to_datetime(df[col])

# Handle any other cleaning as needed
```

### Step 3: Data Analysis
Now you can perform various analyses on the data. Here's an example that calculates basic statistics for the 'Size' column:

```python
size_stats = df['Size'].describe()
print(size_stats)
```

### Step 4: Visualization
You can create various visualizations using libraries like matplotlib or seaborn. For example, you might want to create a histogram of the file sizes.

```python
import matplotlib.pyplot as plt

# Create a histogram for the 'Size' column
plt.hist(df['Size'], bins=20, edgecolor='black')
plt.title('Distribution of File Sizes')
plt.xlabel('Size')
plt.ylabel('Frequency')
plt.show()
```

Please let me know what specific analyses or visualizations you'd like to create, and I'll provide more tailored guidance!

## 2 Prompt asking ChatGPT to guess the data types of columns

What are the columns in the example I gave and can you guess their data types?

## Response from ChatGPT

Sure! Based on the provided example, here are the columns and their likely data types:

1. **Attributes**: Describes file attributes, likely a string.
2. **Created**: The creation date and time of the file, likely a datetime object.
3. **LastMod**: The last modification date and time of the file, likely a datetime object.
4. **LastAcc**: The last accessed date and time of the file, likely a datetime object.
5. **Size**: The size of the file, likely a float (given the presence of a decimal point in the values).
6. **Name**: The name of the file, likely a string.
7. **Path**: The full path of the file, likely a string.
8. **Folder**: The folder in which the file resides, likely a string.
9. **Ext**: The file extension, likely a string.
10. **Md5**: The MD5 hash of the file, likely a string (since MD5 hashes are typically represented as hexadecimal strings).

When loading this data with pandas, you might want to explicitly set the data types for some columns, but pandas will often correctly infer these types automatically. If you want to be explicit, you can use the `dtype` parameter in `pd.read_csv()` to specify the types for each column. Here's an example that sets the data types as described above:

```python
date_columns = ['Created', 'LastMod', 'LastAcc']

# Read the CSV file with explicit data types
df = pd.read_csv(file_path, parse_dates=date_columns, na_values=['#NAAM?'], dtype={
    'Attributes': str,
    'Size': float,
    'Name': str,
    'Path': str,
    'Folder': str,
    'Ext': str,
    'Md5': str,
})
```

Note that the `parse_dates` parameter can be used to automatically parse the date columns as datetime objects.

## 3 Prompting ChatGPT for a script that generates similar test data

Can you generate a csv file based on this example resembling a folder structure of files and folders on a C:\ drive with 3 folders in the root and about 10 files, each folder has 5-15 subfolders and 10-20 files of various types including MsOffice, pictures, exe, video, audio etc. The subfolders have 20-50 files of various types. Use random sizes between 5Kb and 500Kb.

## Response from ChatGPT

Certainly! Below is a Python script to generate a CSV file that resembles a folder structure on the C:\ drive, with the specifications you mentioned. It will include 3 folders in the root, and each folder will have 5-15 subfolders and 10-20 files. The subfolders will have 20-50 files of various types, with random sizes between 5KB and 500KB.

The script creates a pandas DataFrame and fills it with random data that follows the structure you described. Finally, it writes the DataFrame to a CSV file.

```python
import pandas as pd
import random
from datetime import datetime

# Function to create a file row
def create_file(folder, ext, size):
    name = f'file_{random.randint(1, 1000000)}.{ext}'
    path = f'{folder}\\{name}'
    created = datetime.now().strftime("%d/%m/%Y %H:%M")
    return ['---A----', created, created, created, size, name, path, folder, ext, '']

# File extensions for various types
file_types = ['docx', 'xlsx', 'pptx', 'jpg', 'png', 'exe', 'mp4', 'mp3']

# Create an empty DataFrame
columns = ['Attributes', 'Created', 'LastMod', 'LastAcc', 'Size', 'Name', 'Path', 'Folder', 'Ext', 'Md5']
df = pd.DataFrame(columns=columns)

# Create root folders
for i in range(3):
    root_folder = f'C:\\Folder_{i}'
    
    # Create subfolders
    for j in range(random.randint(5, 15)):
        subfolder = f'{root_folder}\\Subfolder_{j}'
        
        # Create files in subfolder
        for _ in range(random.randint(20, 50)):
            ext = random.choice(file_types)
            size = random.uniform(5, 500) # Size in KB
            file_row = create_file(subfolder, ext, size)
            df.loc[len(df)] = file_row
        
        # Create files in the root folder
        for _ in range(random.randint(10, 20)):
            ext = random.choice(file_types)
            size = random.uniform(5, 500) # Size in KB
            file_row = create_file(root_folder, ext, size)
            df.loc[len(df)] = file_row

# Write to CSV
df.to_csv('folder_structure.csv', index=False)
```

This script will create a CSV file named `folder_structure.csv` in the current working directory. It will include the structure of folders, subfolders, and files as you described. Feel free to modify the script to fit any additional requirements you may have!

## Conclusion

We showed ChatGPT the first three rows of our data. It suggested what we could do with the data and made plausible guesses for the column types, with one exception: `Size` uses a dot as thousands separator (for example `3.145.782`), so it is an integer rather than a float. Finally, it wrote a script that generates a test CSV file with the same columns and random content.
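
Generated test data is only useful if it matches the requested structure, so it is worth checking the output against the prompt. Below is a stdlib-only sketch under the assumption that root-folder files are created once per root folder; `generate_rows` and `files_per_folder` are hypothetical names, and the seed is only there to make the run repeatable.

```python
import random
from collections import Counter

FILE_TYPES = ['docx', 'xlsx', 'pptx', 'jpg', 'png', 'exe', 'mp4', 'mp3']

def generate_rows(seed=None):
    """Return (folder, name, size_kb) tuples for 3 root folders with subfolders."""
    rng = random.Random(seed)
    rows = []

    def add_files(folder, low, high):
        # Add between low and high files (inclusive) to the given folder
        for _ in range(rng.randint(low, high)):
            ext = rng.choice(FILE_TYPES)
            name = f'file_{rng.randint(1, 1000000)}.{ext}'
            rows.append((folder, name, rng.uniform(5, 500)))

    for i in range(3):
        root = f'C:\\Folder_{i}'
        add_files(root, 10, 20)                # 10-20 files directly in each root folder
        for j in range(rng.randint(5, 15)):    # 5-15 subfolders per root folder
            add_files(f'{root}\\Subfolder_{j}', 20, 50)
    return rows

rows = generate_rows(seed=42)
files_per_folder = Counter(folder for folder, _, _ in rows)

# Check every folder against the range requested in the prompt
for folder, count in sorted(files_per_folder.items()):
    low, high = (20, 50) if 'Subfolder_' in folder else (10, 20)
    assert low <= count <= high, (folder, count)
print(f'{len(files_per_folder)} folders, {len(rows)} files, all counts within range')
```

Running the same count on `folder_structure.csv`, grouped by the `Folder` column, shows whether the generated file meets the requirement of 10-20 files per root folder.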

# ChatGPT helping with treemap visualisation

## 1 The first prompt

Can you recommend a python library to produce a hierarchical treemap and give a simple example how it works?

## Response from ChatGPT

A hierarchical treemap is a great way to represent a nested structure like a file system. In Python, you can use the Plotly library to create interactive treemaps.

Here's a simple example to create a treemap using Plotly, based on a hypothetical folder structure.

### Step 1: Install Plotly
First, you'll need to install the Plotly library if you haven't already:

```bash
pip install plotly
```

### Step 2: Create a Hierarchical Treemap
You can use the `plotly.express` module to create a treemap easily. Here's an example using a hypothetical DataFrame representing a folder structure:

```python
import plotly.express as px
import pandas as pd

# Create a DataFrame with hierarchical data
data = {
    'path': [
        'Root/Folder 1',
        'Root/Folder 1/Subfolder 1',
        'Root/Folder 1/Subfolder 1/File 1',
        'Root/Folder 1/Subfolder 2',
        'Root/Folder 1/Subfolder 2/File 2',
        'Root/Folder 2',
        'Root/Folder 2/Subfolder 3',
        'Root/Folder 2/Subfolder 3/File 3'
    ],
    'size': [0, 0, 500, 0, 300, 0, 0, 200] # Sizes are for the files only
}

df = pd.DataFrame(data)

# Create a treemap
fig = px.treemap(df, path=['path'], values='size')

# Show the treemap
fig.show()
```

In this example, the `path` column represents the folder structure and is split into levels by the `/` character. The `size` column gives the size of the files.

You'll need to adapt the code to match your specific data structure and needs, but this example should give you a good starting point for creating a hierarchical treemap with Plotly.

## 2 Next prompt: Leading step by step (Chain-of-thought)

This treemap expects a path that identifies column names in the dataframe df. In the example you provide path=['path']. Let's say I want to have 3 levels, so path=['level1','level2','level3']. Then in df I would need to add columns 'level1','level2' and 'level3' with values where the 'path' value is separated as level1 is the first segment in the path, level2 is the second segment and level3 is the third segment. Can you do that?

## Response from ChatGPT

Certainly! You'll need to split the 'path' column into separate levels and then create the treemap using those levels as the path. Here's how you can do it:

```python
import plotly.express as px
import pandas as pd

# Create a DataFrame with hierarchical data
data = {
    'path': [
        'Root/Folder 1/Subfolder 1',
        'Root/Folder 1/Subfolder 1/File 1',
        'Root/Folder 1/Subfolder 2/File 2',
        'Root/Folder 2/Subfolder 3/File 3'
    ],
    'size': [0, 500, 300, 200]
}

df = pd.DataFrame(data)

# Split the path column into separate levels
df[['level1', 'level2', 'level3']] = df['path'].str.split('/', expand=True)

# Create a treemap
fig = px.treemap(df, path=['level1', 'level2', 'level3'], values='size')

# Show the treemap
fig.show()
```

Here, we used the `str.split` method with the `expand=True` argument to split the 'path' column into separate columns for each level. Then, we used those levels as the path in the `px.treemap` function to create the treemap.

The resulting treemap will have three levels, corresponding to the 'level1', 'level2', and 'level3' columns in the DataFrame. You can adjust the code to match your specific data and folder structure.

## 3 Getting ChatGPT to do this with the real dataset

Looks good. Can you modify this so that the script will read the test csv file we created earlier? The size of a row is based on the file size. The color of the tile in the treemap is based on the file extension.

## Response from ChatGPT

Certainly! The modified script below reads the CSV file we created earlier, extracts the necessary levels for the treemap, and creates a treemap where the size of each tile is based on the file size, and the color of the tile is based on the file extension.

```python
import plotly.express as px
import pandas as pd

# Read the CSV file created earlier
df = pd.read_csv('folder_structure.csv')

# Create a 'path' column by combining 'Folder' and 'Name'
df['path'] = df['Folder'] + '\\' + df['Name']

# Split the 'path' column into separate levels
df[['level1', 'level2', 'level3', 'level4', 'level5']] = df['path'].str.split('\\', expand=True)

# Fill NaN values in 'level2', 'level3', 'level4', 'level5' columns with folder names
for col in ['level2', 'level3', 'level4', 'level5']:
    df[col] = df[col].fillna(df['Folder'])

# Create a treemap, with size based on 'Size' and color based on 'Ext'
fig = px.treemap(df, path=['level1', 'level2', 'level3', 'level4', 'level5'], values='Size', color='Ext')

# Show the treemap
fig.show()
```

Here's a brief explanation of the changes:
- We create a new 'path' column by combining the 'Folder' and 'Name' columns with a backslash.
- We then split this 'path' column into separate levels.
- We use the `fillna` method to fill any NaN values in the 'level2', 'level3', 'level4', 'level5' columns with the folder names, since these levels may be missing for some rows.
- In the `px.treemap` call, we use the 'Size' column for the `values` parameter and the 'Ext' column for the `color` parameter, as requested.

This script will create a treemap visualization based on the structure and file sizes in the CSV file, with colors corresponding to different file extensions.

## 4 Prompt ChatGPT to make a correction

The csv file already has a column called path that you can use I believe.

## Response from ChatGPT

You're right; my apologies for missing that detail. Since the CSV file already contains a 'Path' column, we can use it directly and avoid creating a new 'path' column.

Here's the modified script:

```python
import plotly.express as px
import pandas as pd

# Read the CSV file created earlier
df = pd.read_csv('folder_structure.csv')

# Split the 'Path' column into separate levels
df[['level1', 'level2', 'level3', 'level4', 'level5', 'level6']] = df['Path'].str.split('\\', expand=True)

# Fill NaN values in 'level3', 'level4', 'level5', 'level6' columns with folder names
for col in ['level3', 'level4', 'level5', 'level6']:
    df[col] = df[col].fillna(df['Folder'])

# Create a treemap, with size based on 'Size' and color based on 'Ext'
fig = px.treemap(df, path=['level1', 'level2', 'level3', 'level4', 'level5', 'level6'], values='Size', color='Ext')

# Show the treemap
fig.show()
```

In this version of the script, we directly split the 'Path' column from the CSV file into the separate levels for the treemap. The rest of the script remains the same, creating a treemap where the size of each tile is based on the file size, and the color is based on the file extension.

## 5 Prompting ChatGPT with a Python error message

There is a problem with line 8 when splitting the path column:
  File "d:/OneDrive/Documenten/Office/2023/Projecten/HS Leiden/Bern Data Analytics/ChatGPT copilot/data_visualisation.py", line 8, in <module>
    df[['level1', 'level2', 'level3', 'level4', 'level5', 'level6']] = df['Path'].str.split('\\', expand=True)
  File "C:\Users\hans\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\frame.py", line 3968, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\hans\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\frame.py", line 4010, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Users\hans\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\indexers\utils.py", line 401, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key



## Response from ChatGPT

My apologies for the confusion. The error is happening because I pre-defined the number of levels in the split operation. The number of levels might not be consistent across all rows, so it's better to first split the paths into a list and then expand it into columns.

Here's a revised code snippet:

```python
import plotly.express as px
import pandas as pd

# Read the CSV file created earlier
df = pd.read_csv('folder_structure.csv')

# Split the 'Path' column into a list
df['path_list'] = df['Path'].str.split('\\')

# Determine the maximum number of levels
max_levels = df['path_list'].apply(len).max()

# Expand the list into separate columns
for i in range(max_levels):
    df[f'level{i}'] = df['path_list'].apply(lambda x: x[i] if i < len(x) else None)

# Create a treemap, with size based on 'Size' and color based on 'Ext'
fig = px.treemap(df, path=[f'level{i}' for i in range(max_levels)], values='Size', color='Ext')

# Show the treemap
fig.show()
```

Here, we first split the 'Path' column into a list and store it in a new 'path_list' column. Then, we determine the maximum number of levels by finding the length of the longest list. Finally, we expand the list into separate columns, creating a column for each level.

This approach should work with any number of levels in the path, and it avoids the error you encountered earlier.

## 6 Prompt to refine the code

The csv file I have is very large and seems to contain many levels. Can I limit the number of levels and is there a call back function in plotly that I can configure so that it will upload additional levels when necessary?

## Response from ChatGPT

Plotly itself does not inherently support a callback function to upload additional levels dynamically when necessary. However, you can still create a visual representation that limits the number of levels displayed, and if you're working within a web framework like Dash (by Plotly), you can create interactive components to load more levels as needed.

Here's a two-part approach you can follow:

### 1. Limit the Number of Levels in the Treemap

You can adjust the code to limit the number of levels you include in the treemap. Here's an example:

```python
import plotly.express as px
import pandas as pd

# Read the CSV file
df = pd.read_csv('folder_structure.csv')

# Split the 'Path' column into a list
df['path_list'] = df['Path'].str.split('\\')

# Limit to a specific number of levels
max_display_levels = 3

# Create columns for the limited levels
for i in range(max_display_levels):
    df[f'level{i}'] = df['path_list'].apply(lambda x: x[i] if i < len(x) else None)

# Create a treemap, with size based on 'Size' and color based on 'Ext'
fig = px.treemap(df, path=[f'level{i}' for i in range(max_display_levels)], values='Size', color='Ext')

# Show the treemap
fig.show()
```

This code limits the treemap to display only the first `max_display_levels` levels.

### 2. Use Dash to Create Interactive Components (Optional)

If you need to create an interactive component to allow users to drill down into additional levels, you might consider using the Dash framework. Dash allows you to create web applications with Plotly visualizations, and it supports callbacks that can react to user input.

Here's a high-level idea of how you might create a button to load more levels:

1. **Create a Dash app with a Plotly treemap and a button to load more levels.**
2. **Define a callback that responds to the button press, loading additional levels into the treemap.**

You can refer to the [Dash documentation](https://dash.plotly.com/) for more information on creating interactive components with callbacks.

Keep in mind that loading additional levels in a very large dataset might have performance implications. You'll want to carefully consider the user experience and potentially implement additional optimizations, such as loading data incrementally, depending on the specific needs of your application.

## Conclusion

As you can see, we end up with a script that reads the CSV file and presents it as a treemap. Interestingly, we needed to lead ChatGPT step by step, by presenting an example in step 2, before it would split the path segments across separate columns in the dataframe.
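
For a very large file list, one option that was not discussed in the session is to aggregate file sizes per truncated path before plotting, so that the treemap receives one row per displayed tile instead of one row per file. The following is a stdlib-only sketch of that idea; `aggregate_sizes` is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict

def aggregate_sizes(rows, max_levels):
    """Sum sizes per path prefix of at most max_levels segments.

    rows is an iterable of (path, size) pairs with backslash-separated paths.
    """
    totals = defaultdict(float)
    for path, size in rows:
        prefix = tuple(path.split('\\')[:max_levels])
        totals[prefix] += size
    return dict(totals)

# Made-up sample rows in the style of the generated test data
sample = [
    ('C:\\Folder_0\\Subfolder_0\\file_1.docx', 100.0),
    ('C:\\Folder_0\\Subfolder_0\\file_2.jpg', 50.0),
    ('C:\\Folder_0\\Subfolder_1\\file_3.mp3', 25.0),
    ('C:\\Folder_0\\file_4.exe', 10.0),
]

for prefix, total in aggregate_sizes(sample, max_levels=3).items():
    print('\\'.join(prefix), total)
```

Note that this drops the per-file `Ext` value. To keep coloring by extension, the extension would have to become part of the aggregation key.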

# ChatGPT assisting with generating ascii art mazes

## 1 The first prompt

Write a python script that prints a maze using ascii characters in 10 x 10 cells. Each row is printed on a single line. You can enter the maze at the top and you have to find your way to the center. Use characters _ and | to make walls. Entrance at the top. Goal at the centre. You don't need to solve the maze. I will print and let my son solve it.

## Response from ChatGPT

Here's a simple Python script to generate a maze in 10x10 cells. I'll generate a simple spiral-like pattern which starts at the top and leads to the center, ensuring that the path is solvable for your son.

```python
def generate_maze(size):
    maze = [[' ' for _ in range(size)] for _ in range(size)]

    # Create outer boundary
    for i in range(size):
        maze[i][0] = '|'
        maze[i][size-1] = '|'
        maze[0][i] = '_'
        maze[size-1][i] = '_'

    # Create entrance
    maze[0][1] = ' '

    # Spiral pattern to reach the center
    x, y = 1, 1
    while x < size // 2 and y < size // 2:
        # Right movement
        for i in range(y, size - y):
            maze[x][i] = '_'
        
        # Downward movement
        for i in range(x, size - x):
            maze[i][size - y - 1] = '|'
        
        # Left movement
        for i in range(y + 1, size - y):
            maze[size - x - 1][i] = '_'
        
        # Upward movement
        for i in range(x + 1, size - x - 1):
            maze[i][y] = '|'
        
        x += 2
        y += 2

    # Print the maze
    for row in maze:
        print("".join(row))

size = 10
generate_maze(size)
```

This will generate a simple maze that spirals inward. Your son should enter from the top and make his way to the center. Have fun!

## 2 Showing an existing ASCII art maze

Nice but not exactly what I meant. Here is an example that uses other ASCII art. Can you generate something like this but bigger and random with challenging ways to find? My son is very smart!

```
___________________________________  
| _____ |   | ___ | ___ ___ | |   | |
| |   | |_| |__ | |_| __|____ | | | |
| | | |_________|__ |______ |___|_| |
| |_|   | _______ |______ |   | ____|
| ___ | |____ | |______ | |_| |____ |
|___|_|____ | |   ___ | |________ | |
|   ________| | |__ | |______ | | | |
| | | ________| | __|____ | | | __| |
|_| |__ |   | __|__ | ____| | |_| __|
|   ____| | |____ | |__ |   |__ |__ |
| |_______|_______|___|___|___|_____|
```



## Response from ChatGPT

Creating a random maze is a more complex task. The most common algorithm for generating random mazes is the recursive backtracking algorithm. Here's a basic description of the algorithm:

1. Start at a random cell.
2. Mark the current cell as visited, and get a list of its neighbors. For each neighbor, starting with a randomly selected neighbor:
   1. If that neighbor hasn't been visited, remove the wall between this cell and the neighbor, and then recurse with that neighbor as the current cell.

Now, let's implement the maze generation using this algorithm:

```python
import random

WIDTH, HEIGHT = 10, 10
VISITED = 2

# Directions for moving
N, S, E, W = 1, 2, 4, 8
DX = { E: 1, W: -1, N: 0, S: 0 }
DY = { E: 0, W: 0, N: -1, S: 1 }
OPPOSITE = { E: W, W: E, N: S, S: N }

def generate_maze():
    maze = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
    stack = [(random.randint(0, WIDTH-1), random.randint(0, HEIGHT-1))]

    while len(stack) > 0:
        x, y = stack[-1]
        maze[y][x] |= VISITED

        # Get unvisited neighbors
        neighbors = []
        for dir in [N, S, E, W]:
            nx, ny = x + DX[dir], y + DY[dir]
            if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
                if not maze[ny][nx] & VISITED:
                    neighbors.append(dir)

        if len(neighbors) == 0:
            # No neighbors, backtrack
            stack.pop()
        else:
            # Pick a random direction and move
            dir = random.choice(neighbors)
            nx, ny = x + DX[dir], y + DY[dir]
            maze[y][x] |= dir
            maze[ny][nx] |= OPPOSITE[dir]
            stack.append((nx, ny))

    display_maze(maze)


def display_maze(maze):
    # Top border
    print('_' * (2 * WIDTH))

    # Display maze rows
    for y in range(HEIGHT):
        row = ['|']
        for x in range(WIDTH):
            if maze[y][x] & S != 0:
                row.append(' ')
            else:
                row.append('_')
            
            if maze[y][x] & E != 0:
                if maze[y][x] | maze[y][x+1] & S != 0:
                    row.append(' ')
                else:
                    row.append('_')
            else:
                row.append('|')

        print(''.join(row))


generate_maze()
```

This will create a random maze of 10x10 cells using the recursive backtracking algorithm. The maze will be displayed using ASCII characters. The entrance will be at the top-left and the exit will be at the bottom-right, but you can move the entrance and exit to your preferred locations.
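
Whether the generation step itself is sound can be checked independently of the ASCII rendering: a maze carved by recursive backtracking should connect all cells using exactly `WIDTH * HEIGHT - 1` passages. The sketch below performs that check. It uses its own compact generator with a separate `visited` set instead of a flag bit, so it is not the code from the response; `carve` and `reachable_cells` are hypothetical names.

```python
import random
from collections import deque

N, S, E, W = 1, 2, 4, 8
DX = {E: 1, W: -1, N: 0, S: 0}
DY = {E: 0, W: 0, N: -1, S: 1}
OPPOSITE = {E: W, W: E, N: S, S: N}

def carve(width, height, rng):
    """Carve a maze with an iterative depth-first search (recursive backtracker)."""
    maze = [[0] * width for _ in range(height)]
    start = (rng.randrange(width), rng.randrange(height))
    visited = {start}
    stack = [start]
    while stack:
        x, y = stack[-1]
        options = [d for d in (N, S, E, W)
                   if 0 <= x + DX[d] < width and 0 <= y + DY[d] < height
                   and (x + DX[d], y + DY[d]) not in visited]
        if not options:
            stack.pop()              # dead end: backtrack
            continue
        d = rng.choice(options)
        nx, ny = x + DX[d], y + DY[d]
        maze[y][x] |= d              # open the wall on this side
        maze[ny][nx] |= OPPOSITE[d]  # and on the neighbour's side
        visited.add((nx, ny))
        stack.append((nx, ny))
    return maze

def reachable_cells(maze):
    """Count cells reachable from the top-left cell through open passages."""
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        for d in (N, S, E, W):
            if maze[y][x] & d:
                nxt = (x + DX[d], y + DY[d])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return len(seen)

maze = carve(10, 10, random.Random(1))
# Every passage sets one bit in each of the two cells it connects
passages = sum(bin(cell).count('1') for row in maze for cell in row) // 2
print(reachable_cells(maze) == 100, passages == 99)
```

If both values are `True`, every cell can be reached and there are no loops, so any remaining problem lies in how the maze is printed.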

## 3 Prompts resolving errors

3. I see a problem. Only top of the maze has a horizontal border. Otherwise only | occur. There are no _ characters. Did you make a mistake?

4. Still not showing. If you are doing a bitwise AND operator and compare to zero (eg maze[y][x] & S != 0), shouldn't you use brackets? I believe the != takes precedence over the & operator.

5. Still not working. I believe there may be a problem with the maze generation as well. I have set the parameters to generate a 10x10 maze. This gives the following:

```
 _ _ _ _ _ _ _ _ _ _
|   |       |       |
|   | | | | |     | |
| | |   | |       | |
|   |   |   |     | |
|   |   |     | | | |
| |   | |   |   | | |
| | | |     | | |   |
| | |       | | |   |
| | | | | |   |   | |
|       |           |
The maze that was generate looks like this if I print(maze):
[[6, 14, 14, 14, 10, 6, 14, 14, 14, 10], [3, 7, 10, 6, 15, 14, 11, 3, 6, 10], [7, 10, 3, 3, 6, 10, 3, 6, 11, 3], [6, 11, 3, 3, 3, 7, 10, 3, 6, 11], [3, 6, 11, 6, 11, 6, 11, 3, 6, 10], [3, 3, 3, 3, 2, 3, 3, 7, 10, 3], [6, 11, 3, 7, 11, 3, 7, 10, 7, 11], [3, 2, 3, 6, 11, 7, 14, 14, 10, 3], [3, 7, 11, 7, 10, 6, 10, 7, 14, 11], [7, 14, 14, 14, 15, 14, 15, 14, 14, 10]]

```
Can you see what is going wrong?

## Response from ChatGPT

(leaving out the intermediate responses)

Alright, let's break it down. First, let's understand what's happening with the maze generation.

The numbers in the generated maze are based on bitwise representation:

1. `N` is represented by 1 (`0b0001`)
2. `S` is represented by 2 (`0b0010`)
3. `E` is represented by 4 (`0b0100`)
4. `W` is represented by 8 (`0b1000`)
5. `VISITED` is represented by 2 (`0b0010`), but it's the same as `S`. This won't affect the maze generation, but it's a potential source of confusion.

The sum of these values for each cell denotes the paths available from that cell and whether the cell has been visited.

Let's address the issue:

1. The `VISITED` flag should not conflict with our direction flags. Let's change its value.
2. Let's make sure our maze carving (i.e., removing walls) is done correctly.
3. Lastly, we'll ensure that our maze display logic accurately reflects the maze data.

Here's the updated and corrected code:

```python
import random

WIDTH, HEIGHT = 10, 10
VISITED = 16  # Changed the VISITED flag

# Directions for moving
N, S, E, W = 1, 2, 4, 8
DX = { E: 1, W: -1, N: 0, S: 0 }
DY = { E: 0, W: 0, N: -1, S: 1 }
OPPOSITE = { E: W, W: E, N: S, S: N }

def generate_maze():
    maze = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
    stack = [(random.randint(0, WIDTH-1), random.randint(0, HEIGHT-1))]

    while len(stack) > 0:
        x, y = stack[-1]
        maze[y][x] |= VISITED

        # Get unvisited neighbors
        neighbors = []
        for dir in [N, S, E, W]:
            nx, ny = x + DX[dir], y + DY[dir]
            if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
                if not maze[ny][nx] & VISITED:
                    neighbors.append(dir)

        if len(neighbors) == 0:
            # No neighbors, backtrack
            stack.pop()
        else:
            # Pick a random direction and move
            dir = random.choice(neighbors)
            nx, ny = x + DX[dir], y + DY[dir]
            maze[y][x] |= dir
            maze[ny][nx] |= OPPOSITE[dir]
            stack.append((nx, ny))

    display_maze(maze)
    return maze

def display_maze(maze):
    # Top border
    print(' _' * WIDTH)

    # Display maze rows
    for y in range(HEIGHT):
        row = ['|']
        for x in range(WIDTH):
            if (maze[y][x] & S) != 0:
                row.append(' ')
            else:
                row.append('_')
            
            if (maze[y][x] & E) != 0:
                if ((x+1) < WIDTH and (maze[y][x+1] & S) != 0) or (maze[y][x] & S) != 0:
                    row.append(' ')
                else:
                    row.append('_')
            else:
                row.append('|')
        print(''.join(row))

maze = generate_maze()
print(maze)
```

With these corrections, the maze generation should work correctly, and you should be able to see a proper maze representation with both `|` and `_` characters.
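A quick way to check the carving independently of the display is to verify the cell values themselves. The sketch below is not part of the ChatGPT session; `passages_consistent` is a hypothetical helper that assumes the same bit layout as above. It checks that every passage is mirrored in the neighbouring cell and that the number of passages equals the number of cells minus one, which is a necessary (though not sufficient) property of a maze with exactly one route between any two cells.

```python
N, S, E, W = 1, 2, 4, 8
DX = {E: 1, W: -1, N: 0, S: 0}
DY = {E: 0, W: 0, N: -1, S: 1}
OPPOSITE = {E: W, W: E, N: S, S: N}

def passages_consistent(maze):
    """Return True if passages are mirrored and their count fits a perfect maze."""
    height, width = len(maze), len(maze[0])
    half_edges = 0  # each passage is seen once from each of its two cells
    for y in range(height):
        for x in range(width):
            for d in (N, S, E, W):
                if not maze[y][x] & d:
                    continue
                nx, ny = x + DX[d], y + DY[d]
                if not (0 <= nx < width and 0 <= ny < height):
                    return False  # passage leads outside the grid
                if not maze[ny][nx] & OPPOSITE[d]:
                    return False  # neighbour does not mirror the passage
                half_edges += 1
    return half_edges // 2 == width * height - 1

# Hand-built 2x2 examples (rows are indexed as maze[y][x]).
good = [[E | S, W | S], [N, N]]
bad = [[E, 0], [0, 0]]
print(passages_consistent(good), passages_consistent(bad))  # True False
```

Because only the four direction bits are inspected, the check can be run directly on the list returned by `generate_maze()`, with the `VISITED` bit still set.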

## 6 Prompts for fixing the entry and exit

6. Nice. Here is the output:

```
 _ _ _ _ _ _ _ _ _ _
|  _______|  _____  |
| |  _________|   | |
| | |  _  |   | |___|
| |___| | | | | |_  |
|  _  |  _|_|___|  _|
| |  _| |___    |   |
| |_____|  ___| |_| |
| |  _  | |  _|___  |
| | | |___|   |  ___|
|___|_______|_______|
[[22, 28, 28, 28, 24, 22, 28, 28, 28, 26], [19, 22, 28, 28, 28, 29, 24, 22, 26, 19], [19, 19, 22, 28, 26, 22, 26, 19, 21, 25], [19, 21, 25, 18, 19, 19, 19, 19, 20, 26], [23, 28, 26, 23, 25, 17, 21, 25, 22, 25], [19, 22, 25, 19, 20, 28, 30, 26, 23, 26], [19, 21, 28, 25, 22, 28, 25, 19, 17, 19], [19, 22, 28, 26, 19, 22, 24, 21, 28, 27], [19, 19, 18, 21, 25, 23, 26, 22, 28, 25], [21, 25, 21, 28, 28, 25, 21, 29, 28, 24]]

```
But there is no entrance in the top left and no exit on the bottom right.

7. I noticed that the northern wall isn't printed by checking N. So this fix doesn't work.

8. Hmmm. W isn't checked either.

## Response from ChatGPT

My apologies for the oversight. You're right; the western walls are implicitly formed by the initial construction of the `row` with `'|'` and don't rely on checking the `W` direction flag.

To ensure an entrance at the top-left and an exit at the bottom-right, we'll adjust the display logic.

```python
def display_maze(maze):
    # Top border
    print(' _' * WIDTH)

    # Display maze rows
    for y in range(HEIGHT):
        if y == 0:  # for the entrance at top-left
            row = [' ']
        else:
            row = ['|']

        for x in range(WIDTH):
            if (maze[y][x] & S) != 0:
                row.append(' ')
            else:
                row.append('_')
            
            if x == WIDTH-1 and y == HEIGHT-1:  # for the exit at bottom-right
                row.append(' ')
            elif (maze[y][x] & E) != 0:
                if ((x+1) < WIDTH and (maze[y][x+1] & S) != 0) or (maze[y][x] & S) != 0:
                    row.append(' ')
                else:
                    row.append('_')
            else:
                row.append('|')
        print(''.join(row))
```

With this change, the maze will have an entrance at the top-left (no western wall for the first cell) and an exit at the bottom-right (no eastern wall for the last cell).

## 7 Prompt on finding the best path from entry to exit

When you create the maze, can you keep track of the shortest path from the top left cell to the bottom right?

## Response from ChatGPT

Yes, certainly! To track the shortest path, we can use a simple breadth-first search (BFS) once the maze is generated. Here's how to do it:

1. From the start (top-left) position, queue it and begin the BFS. The queue will keep track of positions and their distance from the start.
2. For each dequeued position, visit its unvisited neighboring cells and enqueue them with an incremented distance.
3. Continue until the target position (bottom-right) is dequeued. This will give us the shortest path.

Let's incorporate this into the existing maze generation code:

```python
from collections import deque

# ... [rest of the previous code remains unchanged]

def bfs_shortest_path(maze):
    start = (0, 0)
    target = (WIDTH - 1, HEIGHT - 1)  # positions are (x, y) tuples

    visited = {start}  # mark the start so it is never re-queued
    queue = deque([(start, [start])])  # each item in the queue is a tuple (position, path_so_far)

    while queue:
        (x, y), path = queue.popleft()

        if (x, y) == target:
            return path

        for dir in [N, S, E, W]:
            nx, ny = x + DX[dir], y + DY[dir]

            if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
                if (nx, ny) not in visited and (maze[y][x] & dir):
                    visited.add((nx, ny))
                    new_path = list(path)
                    new_path.append((nx, ny))
                    queue.append(((nx, ny), new_path))

    return []

# After generating the maze
maze = generate_maze()
shortest_path = bfs_shortest_path(maze)
print("Shortest Path:", shortest_path)
```

This BFS search will give you the list of positions forming the shortest path from the top-left to the bottom-right of the maze.
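The version above copies the whole path into every queue entry. A common alternative, sketched here under the same bit layout, records only the predecessor of each cell and rebuilds the path once the target is reached. The function name `bfs_with_parents` and the tiny hand-built maze are illustrative assumptions, not part of the original session.

```python
from collections import deque

N, S, E, W = 1, 2, 4, 8
DX = {E: 1, W: -1, N: 0, S: 0}
DY = {E: 0, W: 0, N: -1, S: 1}

def bfs_with_parents(maze, start, target):
    """Shortest path between two (x, y) cells, following open passages only."""
    height, width = len(maze), len(maze[0])
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])

    while queue:
        x, y = queue.popleft()
        if (x, y) == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            cell = target
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]

        for d in (N, S, E, W):
            if not maze[y][x] & d:
                continue  # wall in this direction
            nx, ny = x + DX[d], y + DY[d]
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in parent:
                parent[(nx, ny)] = (x, y)
                queue.append((nx, ny))

    return []

# 2x2 example, indexed as maze[y][x]: the route goes east, then south.
tiny = [[E | S, W | S], [N, N]]
print(bfs_with_parents(tiny, (0, 0), (1, 1)))  # [(0, 0), (1, 0), (1, 1)]
```

For the 40x40 mazes used later the difference is small, but storing one predecessor per cell keeps memory proportional to the number of cells rather than to the sum of all partial path lengths.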

## 8 Prompt drawing shortest path

Nice. Can you create a function draw_maze_solution that takes the maze and the shortest path and draws the maze, changing the background color of the cells that are visited in the shortest path to blue?

## Response from ChatGPT

Summary: ChatGPT proposed using colorama. Unfortunately, that did not work in Google Colab, so we took a different approach (see the next prompt).

## 9 Prompt shortest path drawing with alternative ascii-art

I'm using Google Colab and colorama isn't working. Can you suggest an alternative ascii-art solution to visualise the shortest path, using different characters to replace | and _ that still give the impression the walls are there but with a special effect indicating the shortest path?

## Response from ChatGPT

Certainly! Google Colab doesn't support ANSI escape codes for colored output in its cells, so using ASCII art is a practical approach.

We can use a convention where characters along the shortest path can be represented with bold or standout characters.

Here's one way to do it:

1. Replace `|` with `!` and `_` with `=` if they are on the path.
2. For spaces on the path, use a character like `·` to denote the path's presence.

Let's adjust the `draw_maze_solution` function accordingly:

```python
def draw_maze_solution(maze, path):
    # Top border
    if (0, 0) in path:
        print(' =' + ' _' * (WIDTH - 1))
    else:
        print(' _' * WIDTH)

    # Display maze rows
    for y in range(HEIGHT):
        row = []

        if y == 0:  # for the entrance at top-left
            if (0, 0) in path:
                row.append('·')
            else:
                row.append(' ')
        else:
            if (0, y) in path:
                row.append('!')
            else:
                row.append('|')

        for x in range(WIDTH):
            on_path = (x, y) in path

            if (maze[y][x] & S) != 0:
                row.append('·' if on_path else ' ')
            else:
                row.append('=' if on_path else '_')
            
            if x == WIDTH - 1:  # Rightmost column
                if y == HEIGHT - 1:  # Exit
                    row.append('·' if on_path else ' ')
                else:
                    row.append('!' if on_path else '|')
            else:
                if (maze[y][x] & E) != 0:
                    if ((x + 1) < WIDTH and (maze[y][x + 1] & S) != 0) or (maze[y][x] & S) != 0:
                        row.append('·' if on_path else ' ')
                    else:
                        row.append('=' if on_path else '_')
                else:
                    row.append('!' if on_path else '|')

        print(''.join(row))

# ... Generate maze and get the shortest path...

draw_maze_solution(maze, shortest_path)
```

This will use the characters `!`, `=`, and `·` to visualize the shortest path in the maze.

## Complete code for generating and drawing ascii-art mazes

In [3]:
import random

WIDTH, HEIGHT = 40, 40
VISITED = 16  # Changed the VISITED flag

# Directions for moving
N, S, E, W = 1, 2, 4, 8
DX = { E: 1, W: -1, N: 0, S: 0 }
DY = { E: 0, W: 0, N: -1, S: 1 }
OPPOSITE = { E: W, W: E, N: S, S: N }

def generate_maze():
    maze = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
    stack = [(random.randint(0, WIDTH-1), random.randint(0, HEIGHT-1))]

    while len(stack) > 0:
        x, y = stack[-1]
        maze[y][x] |= VISITED

        # Get unvisited neighbors
        neighbors = []
        for dir in [N, S, E, W]:
            nx, ny = x + DX[dir], y + DY[dir]
            if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
                if not maze[ny][nx] & VISITED:
                    neighbors.append(dir)

        if len(neighbors) == 0:
            # No neighbors, backtrack
            stack.pop()
        else:
            # Pick a random direction and move
            dir = random.choice(neighbors)
            nx, ny = x + DX[dir], y + DY[dir]
            maze[y][x] |= dir
            maze[ny][nx] |= OPPOSITE[dir]
            stack.append((nx, ny))

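    # Entrance and exit: in this encoding a set bit means an open passage, and
    # the W bit of the top-left cell and the E bit of the bottom-right cell are
    # never set during carving, so the two lines below leave the maze unchanged.
    # The openings themselves are drawn by display_maze().
    maze[0][0] &= ~W
    maze[HEIGHT-1][WIDTH-1] &= ~E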

    display_maze(maze)
    return maze

def display_maze(maze):
    # Top border
    print(' _' * WIDTH)

    # Display maze rows
    for y in range(HEIGHT):
        if y == 0:  # for the entrance at top-left
            row = [' ']
        else:
            row = ['|']

        for x in range(WIDTH):
            if (maze[y][x] & S) != 0:
                row.append(' ')
            else:
                row.append('_')

            if x == WIDTH-1 and y == HEIGHT-1:  # for the exit at bottom-right
                row.append(' ')
            elif (maze[y][x] & E) != 0:
                if ((x+1) < WIDTH and (maze[y][x+1] & S) != 0) or (maze[y][x] & S) != 0:
                    row.append(' ')
                else:
                    row.append('_')
            else:
                row.append('|')
        print(''.join(row))


maze = generate_maze()


 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   ___  |      ___|  _________   ___|  ___     _____  |   |   |   |___  |  ___  |
| |  _  | | |_|  ___| |   |   |   |  _| |  _| |   |_____|___|___|___  |_  |  _| |
| |_  | | |_  | |_______| | | |_| |  ___| |  _| |_____________  |  _|_  |_  |   |
|_  | |_| | |___|  _  |   | |_  | | |  ___|_  | |_____    |_  | |  _____|___| | |
| | |_____|_    | |  _| |_|_  |___| |_  | |  _| |  _  | | |   | |_   _________|_|
|  _|  _______|___| |_      |_______| | | |  _|___| |___| | | | |___|  ___  |   |
| |  _|_____    | |_  | | |_|    _____| | | |  _____  |  _| | |_______|_  | | | |
|___|    _____| |  _| | |___  |_  |  ___|  _| | |  ___| |  _|___  |  _  | |___| |
|  ___|_  |_______|  _|_|   |___| | |_  | |  _| | |   |___|_    | | | |_  |  ___|
| |___  |_______   _|  ___|___  | |_  |___| |_  | | |_   _____| |_  |___  |___  |
|_   ___|  ___  |_____|  ___  |_  | | |   |___  |_____|_____  |_|  _|   |_|  ___|
| | |   | |  _| |

## Finding the shortest path using a breadth-first-search

In [4]:
from collections import deque

# Uses N, S, E, W, DX, DY, WIDTH and HEIGHT from the previous cell

def bfs_shortest_path(maze):
    start = (0, 0)
    target = (WIDTH - 1, HEIGHT - 1)  # positions are (x, y) tuples

    visited = {start}  # mark the start so it is never re-queued
    queue = deque([(start, [start])])  # each item in the queue is a tuple (position, path_so_far)

    while queue:
        (x, y), path = queue.popleft()

        if (x, y) == target:
            return path

        for dir in [N, S, E, W]:
            nx, ny = x + DX[dir], y + DY[dir]

            if 0 <= nx < WIDTH and 0 <= ny < HEIGHT:
                if (nx, ny) not in visited and (maze[y][x] & dir):
                    visited.add((nx, ny))
                    new_path = list(path)
                    new_path.append((nx, ny))
                    queue.append(((nx, ny), new_path))

    return []

shortest_path = bfs_shortest_path(maze)
print("Shortest Path:", shortest_path)


Shortest Path: [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (2, 1), (1, 1), (1, 2), (2, 2), (2, 3), (2, 4), (3, 4), (4, 4), (4, 3), (4, 2), (4, 1), (4, 0), (5, 0), (5, 1), (5, 2), (6, 2), (6, 3), (7, 3), (7, 2), (7, 1), (8, 1), (9, 1), (9, 0), (10, 0), (11, 0), (12, 0), (13, 0), (14, 0), (15, 0), (15, 1), (16, 1), (16, 2), (16, 3), (16, 4), (15, 4), (15, 3), (14, 3), (14, 2), (14, 1), (13, 1), (13, 2), (13, 3), (13, 4), (14, 4), (14, 5), (15, 5), (16, 5), (17, 5), (17, 4), (17, 3), (17, 2), (17, 1), (18, 1), (18, 0), (19, 0), (20, 0), (21, 0), (21, 1), (20, 1), (20, 2), (20, 3), (19, 3), (18, 3), (18, 4), (19, 4), (19, 5), (19, 6), (19, 7), (18, 7), (17, 7), (17, 8), (17, 9), (18, 9), (18, 10), (18, 11), (19, 11), (19, 10), (20, 10), (20, 11), (20, 12), (20, 13), (21, 13), (21, 12), (21, 11), (22, 11), (23, 11), (23, 10), (22, 10), (21, 10), (21, 9), (21, 8), (22, 8), (22, 7), (22, 6), (23, 6), (24, 6), (25, 6), (26, 6), (26, 7), (25, 7), (24, 7), (24, 8), (24, 9), (24, 10), (25, 10), (25,

## Drawing the maze with shortest path in alternative ascii-art

In [5]:
def draw_maze_solution(maze, path):
    # Top border
    if (0, 0) in path:
        print(' =' + ' _' * (WIDTH - 1))
    else:
        print(' _' * WIDTH)

    # Display maze rows
    for y in range(HEIGHT):
        row = []

        if y == 0:  # for the entrance at top-left
            if (0, 0) in path:
                row.append('·')
            else:
                row.append(' ')
        else:
            if (0, y) in path:
                row.append('!')
            else:
                row.append('|')

        for x in range(WIDTH):
            on_path = (x, y) in path

            if (maze[y][x] & S) != 0:
                row.append('·' if on_path else ' ')
            else:
                row.append('=' if on_path else '_')

            if x == WIDTH - 1:  # Rightmost column
                if y == HEIGHT - 1:  # Exit
                    row.append('·' if on_path else ' ')
                else:
                    row.append('!' if on_path else '|')
            else:
                if (maze[y][x] & E) != 0:
                    if ((x + 1) < WIDTH and (maze[y][x + 1] & S) != 0) or (maze[y][x] & S) != 0:
                        row.append('·' if on_path else ' ')
                    else:
                        row.append('=' if on_path else '_')
                else:
                    row.append('!' if on_path else '|')

        print(''.join(row))

# Uses maze and shortest_path from the previous cells

draw_maze_solution(maze, shortest_path)


 = _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
| |··=··!·!·!_|··===! |   |···!···!··=! |··=! |···!_____|___|___|___  |_  |  _| |
|_  |·!_|·! |===!  _  |   |·!=··!·!·!··===!_  |·!_____ ···!_  |·!  _____|___| | |
| | |=====!_    | |  _| |_|=··!===!·!=··! |  _|·!··=··!·!·!···!·!_   _________|_|
|___|    _____| |  _| | |___  |_  |··===!  _|·! |··===!·!··=!===··!  _  | |===!·!
|  ___|_  |_______|  _|_|   |___| |·!_  | |··=! |·!···!===!_    |·! | |_  |··===!
| |___  |_______   _|  ___|___  | |=··!___|·!_  |·!·!=···_____| |=··!___  |===··!
|_   ___|··===··!_____|  ___  |_  | |·!···!===··!====_|=====··!_|··=!···!_|··===!
| | |···!·!  _|·!    _| |  _|_  | | |===!·!··===!  _|··===···=!··=!··=!=··!=····!
| | |·!===!··===! |_____|_    |_____|···!·!·!_    |··=!···!_|··=! |=··!__=··!_|·!
| | |·!_ ··=!  ___|   |   | | |··=====!·!===· |_| |=====!=====!_    |===··!===··!
| | |=··!·!  _|  ___|___|___| |=··!  _|=··!  _______  |   |  _  | | |··===!···! |
|  _|··=!·! |____