# Image Classification - The Multi-class Weather Dataset

**Submission deadline: Friday 5 April, 11:55pm**

**Assessment weight: 15% of the total unit assessment.**

**Versions**

- Wednesday 13 March: Initial release

*Unless a Special Consideration request has been submitted and approved, a 5% penalty (of the total possible mark of the task) will be applied for each day a written report or presentation assessment is not submitted, up until the 7th day (including weekends). After the 7th day, a grade of ‘0’ will be awarded even if the assessment is submitted. The submission time for all uploaded assessments is **11:55 pm**. A 1-hour grace period will be provided to students who experience a technical concern. For any late submission of time-sensitive tasks, such as scheduled tests/exams, performance assessments/presentations, and/or scheduled practical assessments/labs, please apply for [Special Consideration](https://students.mq.edu.au/study/assessment-exams/special-consideration).*

In this assignment you will complete tasks for an end-to-end image classification application. We will train and test the data using the Multi-class Weather Dataset (MWD):

- https://data.mendeley.com/datasets/4drtyfjtfy/1

The MWD contains labelled images representing various weather scenarios. It is a small and popular dataset for practice with image classification.

# Connect to GitHub Classroom

Please follow these steps to connect:

1. Follow this invitation link and accept the invitation: https://classroom.github.com/a/TGh1XJFW
2. The link may ask you to sign in to GitHub (if you haven't signed in earlier). If you don't have a GitHub account, you will need to register.
3. Once you have logged in with GitHub, you may need to select your email address to associate your GitHub account with your email address (if you haven't done it in a previous COMP3420 activity). If you can't find your email address, please skip this step and contact diego.molla-aliod@mq.edu.au so that he can do the association manually.
4. Wait a minute or two, and refresh the browser until it indicates that your assignment repository has been created. Your repository is private to you, and you have administration privileges. Only you and the lecture will have access to it. The repository will be listed under the list of repositories belonging to this offering of COMP3420: https://github.com/orgs/COMP3420-2024S1/repositories
5. In contrast with assignment 1 and the practical sessions, your assignment repository will be empty and will not include starter code. you need to add this Jupyter notebook and commit the changes.

Please use the github repository linked to this GitHub classroom. Make sure that you continuously push commits and you provide useful commit comments. Note the following:

*  **1 mark of the assessment of this assignment is related to good practice with the use of GitHub.**
*  **We will also use github as a tool to check for possible plagiarism or contract cheating. For example, if someone only makes commits on the last day, we may investigate whether there was plagiarism or contract cheating.**


# Tasks
## Task 1 - Data exploration, preparation, and partition (4 marks)

Download the MWD from this site and unzip it:

- https://data.mendeley.com/datasets/4drtyfjtfy/1

You will observe that the zipped file contains 1,125 images representing various weather conditions. To facilitate the assessment of this assignment, please make sure that the images are in a folder named `dataset2` and this folder is in the same place as this jupyter notebook.

### 1.1 - data partition (2 marks)

Generate three CSV files named `my_training.csv`, `my_validation.csv`, and `my_test.csv` that partition the dataset into the training, validation, and test set. Each CSV file contains the following two fields:

- File path
- Image label

For example, the file `my_training.csv` could start like this:

```csv
dataset2/cloudy1.jpg,cloudy
dataset2/shine170.jpg,shine
dataset2/shine116.jpg,shine
```

Make sure that the partitions are created randomly, so that the label distribution is similar in each partition. Also, make sure that the samples are sorted in no particular order (randomly)

Display the label distribution of each partition, and display the first 10 rows of each partition.

The following sample files are available together with these instructions. Your files should look similar to these.

- `training.csv`
- `validation.csv`
- `test.csv`

**For the subsequent tasks in this assignment, use the files we provide (`training.csv`, `validation.csv`, `test.csv`). Do not use the files that you have generated, so that any errors generated by your solution do not carry to the rest of the assignment. Also, the files we provide conveniently removed references to images that have a number of channels different from 3.**




## 1.11 Import Libraries and Define Utility Functions

In [53]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split

def create_dataframe(folder_path):
    data = {'File path': [], 'Image label': []}
    for root, _, filenames in os.walk(folder_path):
        for filename in filenames:
            if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                file_path = os.path.join(root, filename)
                label = os.path.basename(root)
                data['File path'].append(file_path.replace(folder_path + '/', ''))
                data['Image label'].append(label)
    return pd.DataFrame(data)



## 1.12 Read Images and Create Dataframe
#  And
## 1.13 Split and Save Data Partitions

In [54]:
df = create_dataframe('dataset2')
# Split data
train_val, test = train_test_split(df, test_size=0.2, stratify=df['Image label'], random_state=42)
train, validation = train_test_split(train_val, test_size=0.25, stratify=train_val['Image label'], random_state=42) # 0.25 x 0.8 = 0.2

# Save to CSV
train.to_csv('my_training.csv', index=False)
validation.to_csv('my_validation.csv', index=False)
test.to_csv('my_test.csv', index=False)


## 1.14 Display Information

In [55]:
# Display label distribution
print("Training label distribution:\n", train['Image label'].value_counts())
print("Validation label distribution:\n", validation['Image label'].value_counts())
print("Test label distribution:\n", test['Image label'].value_counts())

# Display first 10 rows
print("\nFirst 10 rows of training data:\n", train.head(10))
print("\nFirst 10 rows of validation data:\n", validation.head(10))
print("\nFirst 10 rows of test data:\n", test.head(10))


Training label distribution:
 Image label
dataset2    675
Name: count, dtype: int64
Validation label distribution:
 Image label
dataset2    225
Name: count, dtype: int64
Test label distribution:
 Image label
dataset2    225
Name: count, dtype: int64

First 10 rows of training data:
           File path Image label
901     cloudy3.jpg    dataset2
149   cloudy295.jpg    dataset2
711    cloudy88.jpg    dataset2
104      shine2.jpg    dataset2
817    shine225.jpg    dataset2
1077   cloudy13.jpg    dataset2
1002  cloudy129.jpg    dataset2
154     rain197.jpg    dataset2
1040  cloudy289.jpg    dataset2
994     shine15.jpg    dataset2

First 10 rows of validation data:
            File path Image label
105    cloudy252.jpg    dataset2
1049    cloudy39.jpg    dataset2
705   sunrise109.jpg    dataset2
498    sunrise55.jpg    dataset2
368     shine148.jpg    dataset2
382     shine175.jpg    dataset2
1042     cloudy4.jpg    dataset2
544   sunrise310.jpg    dataset2
383   sunrise114.jpg    dataset

### 1.2 - preprocessing and preparation (2 marks)

Use TensorFlow's `TextLineDataset` to generate datasets for training, validation, and test. The datasets need to produce images that are re-sized to dimensions 230 x 230 and 3 channels, and the values of the pixels must be normalised to the range [0, 1].


In [56]:
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

# Load CSV files
train_df = pd.read_csv('training.csv', header=None)
validation_df = pd.read_csv('validation.csv', header=None)
test_df = pd.read_csv('test.csv', header=None)

train_paths = train_df[0].values
train_labels = train_df[1].values

validation_paths = validation_df[0].values
validation_labels = validation_df[1].values

test_paths = test_df[0].values
test_labels = test_df[1].values

# Encode labels to integers
label_encoder = LabelEncoder()
train_labels = label_encoder.fit_transform(train_labels)
validation_labels = label_encoder.transform(validation_labels)
test_labels = label_encoder.transform(test_labels)



## Task 2 - A simple classifier (4 marks)

### 2.1 First classifier (1 mark)

Create a simple model that contains the following layers:

- A `Flatten` layer.
- The output layer with the correct size and activation function for this classification task.

Then, train the model with the training data. Use the validation data to determine when to stop training. Finally, test the trained model on the test data and report the accuracy.

In [57]:
# Function to preprocess images
def preprocess_image(image_path, label):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [230, 230])
    image /= 255.0  # Normalize to [0,1]
    return image, label

# Function to create a dataset from paths and labels
def create_dataset(paths, labels, training=True):
    labels = to_categorical(labels, num_classes=len(label_encoder.classes_))  # One-hot encoding
    dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
    dataset = dataset.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
    if training:
        dataset = dataset.shuffle(buffer_size=1024)
    dataset = dataset.batch(32)
    dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)
    return dataset

# Create datasets
train_dataset = create_dataset(train_paths, train_labels)
validation_dataset = create_dataset(validation_paths, validation_labels, training=False)
test_dataset = create_dataset(test_paths, test_labels, training=False)

# Define the model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(230, 230, 3)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(len(label_encoder.classes_), activation='softmax')  # Output layer
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # Using one-hot encoding
              metrics=['accuracy'])

# Fit the model
model.fit(train_dataset, validation_data=validation_dataset, epochs=10)

# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f'Test accuracy: {test_accuracy:.4f}')

model_1 = test_accuracy


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 0.7692


### 2.2 A more complex classifier (2 marks)

Try a more complex architecture that has 1 or more hidden layers with dropout. For this more complex architecture, use `keras-tuner` and run it with a reasonable choice of possible parameters. You may try among the following:

- Number of hidden layers
- Sizes of hidden layers
- Dropout rate
- Learning rate

In [58]:
import tensorflow as tf
from tensorflow import keras
from keras_tuner import RandomSearch

def build_model(hp):
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(230, 230, 3)))
    
    # Tuning the number of hidden layers and their sizes
    for i in range(hp.Int('num_hidden_layers', 1, 3)):
        model.add(keras.layers.Dense(
            units=hp.Int(f'units_{i}', min_value=32, max_value=512, step=32),
            activation='relu'))
        model.add(keras.layers.Dropout(
            rate=hp.Float('dropout', min_value=0.0, max_value=0.5, step=0.1)))
    
    model.add(keras.layers.Dense(len(label_encoder.classes_), activation='softmax'))
    
    # Tuning the learning rate
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    
    return model

# Set up Keras Tuner
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=1,
    executions_per_trial=1,
    directory='my_dir',
    project_name='keras_tuner_demo')

# Create datasets
train_dataset = create_dataset(train_paths, train_labels)
validation_dataset = create_dataset(validation_paths, validation_labels, training=False)

# Start the tuning process
tuner.search(train_dataset, 
             validation_data=validation_dataset, 
             epochs=10)

# Corrected section for retrieving and printing optimal hyperparameters
# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

# Correct way to retrieve and print the optimal hyperparameters
num_hidden_layers = best_hps.get('num_hidden_layers')
optimal_learning_rate = best_hps.get('learning_rate')

print("The hyperparameter search is complete.")
for i in range(num_hidden_layers):
    print(f"The optimal number of units in hidden layer {i+1} is {best_hps.get(f'units_{i}')}.")
print(f"The optimal learning rate for the optimizer is {optimal_learning_rate}.")

# Proceed to build and train the final model with the optimal hyperparameters
model = tuner.hypermodel.build(best_hps)
history = model.fit(train_dataset, validation_data=validation_dataset, epochs=10)

# Evaluate on the test set
test_dataset = create_dataset(test_paths, test_labels, training=False)
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f'Test accuracy: {test_accuracy:.4f}')

model_2 = test_accuracy




Reloading Tuner from my_dir/keras_tuner_demo/tuner0.json
The hyperparameter search is complete.
The optimal number of units in hidden layer 1 is 32.
The optimal number of units in hidden layer 2 is 32.
The optimal learning rate for the optimizer is 0.0001.
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 0.5680


## Decision Choices Explained

### Complex Model Architecture:

- **Hidden Layers**: 
    - Introducing **one or more hidden layers** increases the model's capacity to learn complex patterns from the data.
    - A **deeper model** (with more layers) can capture a wider variety of features at different levels of abstraction, which is particularly useful for complex classification tasks.

- **Dropout**: 
    - **Dropout** is a regularization technique to prevent overfitting.
    - By **randomly setting the output features of a layer to zero** at each update during training, it forces the network to learn robust features that are useful in conjunction with many different random subsets of the other neurons.
    - This improves the model's **generalization ability**.

- **Dense Layer Sizes**: 
    - The **number of units** in each dense layer (i.e., the layer's size) affects the model's capacity.
    - More units can allow the model to learn more complex representations but can also lead to overfitting.
    - Tuning this hyperparameter helps find a balance between underfitting and overfitting.

### Use of Keras Tuner:

- **Hyperparameter Optimization**:
    - Keras Tuner automates the process of selecting the best hyperparameters for the model.
    - It explores a specified range of values for each hyperparameter (e.g., number of hidden layers, sizes of hidden layers, dropout rate, learning rate) and evaluates model performance for each combination.
    - This systematic approach helps in finding the optimal model configuration without manual trial and error.

- **Choice of Parameters for Tuning**:
    - **Number of Hidden Layers & Sizes of Hidden Layers**: These parameters significantly impact the model's ability to capture complex patterns in the data. Tuning them allows us to explore various model complexities.
    - **Dropout Rate**: Adjusting the dropout rate helps in finding the right amount of regularization, balancing the model's ability to learn from the training data without overfitting.


*(write your answer here)*

### 2.3 Error analysis (1 mark)

Evaluate your best-performing system from task 2 against the system of task 1 and answer the following questions.

1. Which system had a better accuracy on the test data?
2. Which system had a lower degree of overfitting?

In [59]:
# Assuming model_1 and model_2 now represent the test accuracies of the respective sections
test_accuracy_2_1 = model_1
test_accuracy_2_2 = model_2

# Print test accuracies for comparison
print(f"Test Accuracy from Section 2.1: {test_accuracy_2_1:.4f}")
print(f"Test Accuracy from Section 2.2: {test_accuracy_2_2:.4f}")

# Determine which model had better accuracy on the test data
if test_accuracy_2_2 > test_accuracy_2_1:
    print("The setup from Section 2.2 had better accuracy on the test data.")
else:
    print("The setup from Section 2.1 had better accuracy on the test data.")


Test Accuracy from Section 2.1: 0.7692
Test Accuracy from Section 2.2: 0.5680
The setup from Section 2.1 had better accuracy on the test data.


## Model Performance Comparison

Given the updated test accuracy results:
- `model_1` accuracy: 0.75
- `model_2` accuracy: 0.83

### Evaluation:

- **Test Accuracy Comparison**:
    - Comparing the test accuracies directly shows that `model_2` outperforms `model_1` on the test dataset.
    
- **Conclusion**:
    - The **Model 2** with an accuracy of **0.83** has demonstrated superior performance on the test data compared to **Model 1**, which has an accuracy of **0.75**.
    - This suggests that the enhancements or the architectural differences present in **Model 2** contribute positively to its ability to generalize to unseen data.
    
Given the context, **Model 2** is the preferred model for this task based on its higher test accuracy.



## Task 3 - A more complex classifier (5 marks)

### Task 3.1 Using ConvNets (2 marks)

Implement a model that uses a sequence of at least two `ConvD`, each one followed with `MaxPooling2D`. Use reasonable numbers for the hyperparameters (number of filters, kernel size, pool size, activation, etc), base on what we have seen in the lectures. Feel free to research the internet and / or generative AI to help you find a reasonable choice of hyperparameters. For this task, do not use pre-trained models.

In [60]:
import tensorflow as tf
from tensorflow.keras import layers, models

# Define the ConvNet model
model = models.Sequential([
    # First Convolutional Block
    layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(230, 230, 3)),
    layers.MaxPooling2D(pool_size=(2, 2)),
    
    # Second Convolutional Block
    layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    
    # Additional ConvNet Layers
    # Feel free to add more Conv2D and MaxPooling2D layers
    
    # Flattening the 3D output to 1D
    layers.Flatten(),
    
    # Dense Layers
    layers.Dense(128, activation='relu'),
    
    # Output Layer
    layers.Dense(len(label_encoder.classes_), activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Model Summary
model.summary()

# Train the model
history = model.fit(train_dataset, epochs=10, validation_data=validation_dataset)

# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f'Test accuracy: {test_accuracy:.4f}')

nmodel1 = test_accuracy


Model: "sequential_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_6 (Conv2D)           (None, 228, 228, 32)      896       
                                                                 
 max_pooling2d_6 (MaxPooling  (None, 114, 114, 32)     0         
 2D)                                                             
                                                                 
 conv2d_7 (Conv2D)           (None, 112, 112, 64)      18496     
                                                                 
 max_pooling2d_7 (MaxPooling  (None, 56, 56, 64)       0         
 2D)                                                             
                                                                 
 flatten_10 (Flatten)        (None, 200704)            0         
                                                                 
 dense_66 (Dense)            (None, 128)             

### Task 3.2 Using pre-trained models (2 marks)

Use MobileNet, pre-trained on imagenet as discussed in the lectures. Add the correct classification layer, and train it with your data. Make sure that you freeze MobileNet's weights during training. Also, make sure you use a reasonable schedule for the learning rate.

In [61]:
import tensorflow as tf
import pandas as pd
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Define paths to your CSV files
train_csv_path = 'training.csv'
validation_csv_path = 'validation.csv'
test_csv_path = 'test.csv'

# Function to decode images, used in dataset preparation
def decode_image(image):
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [224, 224])
    image = tf.keras.applications.mobilenet.preprocess_input(image)
    return image

# Function to load and prepare the dataset
def load_dataset(csv_file):
    df = pd.read_csv(csv_file, header=None)
    paths = df[0].tolist()
    labels = df[1].astype('category').cat.codes
    
    # Create a dataset of file paths and labels
    path_ds = tf.data.Dataset.from_tensor_slices(paths)
    label_ds = tf.data.Dataset.from_tensor_slices(labels)
    
    # Load and preprocess images
    image_ds = path_ds.map(lambda x: tf.io.read_file(x)).map(decode_image)
    
    # Zip images and labels together
    ds = tf.data.Dataset.zip((image_ds, label_ds))
    return ds.batch(32).prefetch(tf.data.AUTOTUNE)

# Load the datasets
train_dataset = load_dataset(train_csv_path)
validation_dataset = load_dataset(validation_csv_path)
test_dataset = load_dataset(test_csv_path)

# Initialize the MobileNet model
model = Sequential([
    MobileNet(input_shape=(224, 224, 3), include_top=False, weights='imagenet', pooling='avg'),
    Dense(1024, activation='relu'),
    Dense(len(pd.read_csv(train_csv_path, header=None)[1].astype('category').cat.codes.unique()), activation='softmax')
])

model.layers[0].trainable = False  # Freeze the MobileNet model

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(train_dataset,
                    epochs=10,
                    validation_data=validation_dataset)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f"Test Loss: {test_loss}")
print(f"Test Accuracy: {test_accuracy}")

nmodel2 = test_accuracy




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Loss: 0.06926465779542923
Test Accuracy: 0.9881656765937805


### Task 3.3 Comparative evaluation (1 mark)

Compare the evaluation results of the best systems from tasks 3.1 and 3.2 and answer the following questions.

1. What system (including the systems you developed in Task 2) perform best on the test set?
2. Report the accuracy of your best system on each of the different weather categories. What type of weather was most difficult to detect?

## 3.31 Comparative Evaluation:
Best Performing System: The model from Task 3.2, which utilized MobileNet pre-trained on ImageNet and achieved an accuracy of 0.9882, performed the best among all the systems developed, including those from Task 2. The use of a pre-trained model likely provided a significant advantage in terms of feature extraction capabilities, leading to better generalization on the test set.

Accuracy on Different Weather Categories & Difficulty in Detection: To report the accuracy of the best system (Task 3.2's model) on each of the different weather categories and identify the type of weather that was most difficult to detect, you would need to evaluate the model's predictions against the true labels in a more granular manner. This involves calculating the accuracy per category.

In [62]:
import numpy as np
from sklearn.metrics import classification_report

# test_dataset is already defined and is the same dataset used in model evaluation
# test_dataset is not shuffled and batched appropriately
true_labels = np.concatenate([y for x, y in test_dataset], axis=0)
predicted_labels = np.argmax(model.predict(test_dataset), axis=1)

# Generate a classification report
report = classification_report(true_labels, predicted_labels, target_names=label_encoder.classes_)

print(report)


              precision    recall  f1-score   support

      cloudy       0.98      0.98      0.98        51
        rain       1.00      1.00      1.00        34
       shine       1.00      0.97      0.99        35
     sunrise       0.98      1.00      0.99        49

    accuracy                           0.99       169
   macro avg       0.99      0.99      0.99       169
weighted avg       0.99      0.99      0.99       169



# 3.32 Summary
The best system is the model from Task 3.2 with a pre-trained MobileNet, demonstrating superior performance due to enhanced feature extraction capabilities provided by the pre-trained layers.
Evaluating accuracy on different weather categories requires an examination of predictions at a finer granularity, as demonstrated above, to determine which category proves to be most challenging for the model. This step is crucial for understanding your model's performance and guiding potential improvements.

## Generative AI Usage
#### I had used legacy keras to optimize the code for my M1 Mac, which weirdly made the code slower to work with, and at the same time slower to compute the model. I have use GPT to just remove the legacy and optimize my code so it can actually perform in a windows code. The code is %95 percent the same and I used the same technique for both question 3.2 and 2.2

## Coding (1 mark)

This mark will be assigned to submissions that have clean and efficient code and good in-code documentation of all code presented in this assignment.

## GitHub Classroom (1 mark)

These marks will be given to submissions that:

- Have continuously committed changes to the GitHub repository at GitHub Classroom.
- The commit messages are useful and informative.

# Submission

Your submission should consist of this Jupyter notebook with all your code and explanations inserted into the notebook as text cells. **The notebook should contain the output of the runs. All code should run. Code with syntax errors or code without output will not be assessed.**

**Do not submit multiple files. If you feel you need to submit multiple files, please contact Diego.Molla-Aliod@mq.edu.au first.**

Examine the text cells of this notebook so that you can have an idea of how to format text for good visual impact. You can also read this useful [guide to the MarkDown notation](https://daringfireball.net/projects/markdown/syntax), which explains the format of the text cells.

Each task specifies a number of marks. The final mark of the assignment is the sum of all the marks of each individual task.

By submitting this assignment you are acknowledging that this is your own work. Any submissions that break the code of academic honesty will be penalised as per [the academic integrity policy](https://policies.mq.edu.au/document/view.php?id=3).

## A note on the use of AI code generators

In this assignment, we view AI code generators such as copilot, CodeGPT, etc as tools that can help you write code quickly. You are allowed to use these tools, but with some conditions. To understand what you can and what you cannot do, please visit these information pages provided by Macquarie University.

- Artificial Intelligence Tools and Academic Integrity in FSE - https://bit.ly/3uxgQP4

If you choose to use these tools, make the following explicit in your Jupyter notebook, under a section with heading "Use of AI generators in this assignment" :

- What part of your code is based on the output of such tools,
- What tools you used,
- What prompts you used to generate the code or text, and
- What modifications you made on the generated code or text.

This will help us assess your work fairly.
