# Drone Detection Project Manual

The aim of this documentation is to let users replicate the entire process of setting up, training, and running the Dolatro project. We tried to make it as user-friendly as possible; please forgive any inconsistencies.

### The following link explains the initial steps to set up a Jetson Nano (4GB version):
https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit
After successfully installing JetPack, plug the microSD card into the Jetson Nano and power it on. Make sure to connect the two jumper pins (J48) on the Jetson before using an AC adapter. Read the guide above for more details.

## Ubuntu Setup

Right after a fresh Ubuntu flash on your Jetson Nano, refresh and upgrade all system packages:

#### UPDATE SYSTEM
sudo apt update
sudo apt upgrade -y
sudo reboot
#### RUN THIS

sudo apt install python3-pip -y

### INSTALL CORE DEPENDENCIES
Note that if you get errors using 'pip' use 'pip3' instead:

pip3 install --upgrade pip
pip3 install numpy pandas matplotlib
pip3 install opencv-python
pip3 install torch torchvision torchaudio
python3 -m pip install --upgrade pip setuptools wheel

#### INSTALL PYCUDA & CYTHON
pip3 install cython==0.29.36
export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
python3 -m pip install pycuda --user



#### CREATE VIRTUAL ENVIRONMENT WITH PRE_INSTALLED ROOT SITE PACKAGES
python3 -m venv --system-site-packages ~/envs/pycuda_env_sys
source ~/envs/pycuda_env_sys/bin/activate

### INSTALL ACOUSTIC DETECTION PACKAGES FIRST!

pip3 install sounddevice
pip3 install pydub
pip3 install pyusb

For adafruit, do the following:

sudo pip3 install -U \
adafruit-circuitpython-busdevice==5.1.2 \
adafruit-circuitpython-motor==3.3.5 \
adafruit-circuitpython-pca9685==3.4.1 \
adafruit-circuitpython-register==1.9.8 \
adafruit-circuitpython-servokit==1.3.8 \
Adafruit-Blinka==6.11.1 \
Adafruit-GPIO==1.0.3 \
Adafruit-MotorHAT==1.4.0 \
Adafruit-PlatformDetect==3.19.6 \
Adafruit-PureIO==1.1.9 \
Adafruit-SSD1306==1.6.2

For librosa, do the following:

Install Numba without dependencies:
pip3 install --no-deps numba==0.53

Install LLVM 9:

sudo apt update
sudo apt install -y wget gnupg software-properties-common
wget https://apt.llvm.org/llvm-snapshot.gpg.key
sudo apt-key add llvm-snapshot.gpg.key

sudo add-apt-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main"
sudo apt update

sudo apt install -y llvm-9 llvm-9-dev llvm-9-tools clang-9

Now install llvmlite from source:

sudo apt update
sudo apt install -y python3-dev python3-pip build-essential cmake git libffi-dev

git clone https://github.com/numba/llvmlite.git
cd llvmlite
git checkout v0.33.0


export LLVM_CONFIG=/usr/bin/llvm-config-9
python3 -m pip install .

Finally, install a compatible librosa version:

pip3 install librosa==0.8.1

## Python Setup:
JetPack comes with Python 3.6 preinstalled. However, the system may still be configured to install libraries for Python 2, which is not what we want and should be changed. To check whether the Python package path is set up correctly, run the following code and inspect the directories it lists:
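
A minimal sketch of such a check, assuming the goal is simply to list the interpreter's search path and see which package directory comes first. The helper name `first_packages_dir` is hypothetical, and `EXPECTED` is the directory named in this section:

```python
import sys

# Directory this manual expects to find first among the package directories
EXPECTED = "/usr/local/lib/python3.6/dist-packages"

def first_packages_dir(paths):
    """Return the first entry that is a dist-packages or site-packages directory."""
    for entry in paths:
        if entry.endswith("dist-packages") or entry.endswith("site-packages"):
            return entry
    return None

print("Interpreter version:", sys.version.split()[0])
for entry in sys.path:
    print(entry)
print("First package directory:", first_packages_dir(sys.path))
```

If the last line does not print the expected directory, the interpreter is resolving packages from somewhere else first.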


Note: to run code on Ubuntu, create a file in the text editor that comes preinstalled with JetPack, write the code, and save it as a .py file. Alternatively, if your code is short, such as the code shown in the following cell, simply run "python3 -c '(code)'"

If the first directory listed is not '/usr/local/lib/python3.6/dist-packages', run the following code to fix it:

# Visual Detection: Setting Up

###### Install Anaconda Navigator (Distribution Installer) for your operating system.
###### https://www.anaconda.com/download/success


###### Create a folder to work in.
###### Make sure it is at the root of a drive to prevent future path issues (example: "C:/Project_Folder")

###### Search for "Anaconda Prompt" in the Start menu. It opens a cmd-like window.

#Enter the following commands (one by one and in order)

#This will create the conda environment named visualDetection
conda create -n visualDetection 

#Activate the environment after creation
conda activate visualDetection

#Set your directory to Project_Folder, so all your work will be saved in that folder
cd "path/to/your/folder"   #REPLACE YOUR FOLDER PATH HERE

#Then clone the yolov5 github repository
git clone https://github.com/ultralytics/yolov5.git

#change directory to the cloned yolov5 folder
cd yolov5

#Then install everything listed in requirements.txt; this installs all the required dependencies and libraries
pip install -r requirements.txt



##### Datasets Preparation

###### Now we will get the datasets for our project
###### Download the first dataset from https://drive.google.com/file/d/1NPYaop35ocVTYWHOYQQHn8YHsM9jmLGr/view
###### Extract it into a new folder called "Datasets" inside the yolov5 folder
###### We will first run a script to remove the infrared video samples, then use the remaining RGB video samples to create images and labels in the YOLO training format

The remaining RGB video files then need to be converted to the YOLO training format.
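
The YOLO training format stores one text line per object: a class index followed by the box center and size, all normalized by the image dimensions. A minimal sketch of that conversion, assuming the source annotations give pixel boxes as `(x_min, y_min, width, height)`; that input layout and the helper name `to_yolo_line` are assumptions, since the dataset's annotation format is not described here:

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, width, height) to a YOLO label line."""
    x_min, y_min, w, h = box
    # YOLO uses the box center, normalized to the range [0, 1]
    cx = (x_min + w / 2) / img_w
    cy = (y_min + h / 2) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A 200x100 px box with its top-left corner at (100, 50) in a 640x480 frame
print(to_yolo_line(0, (100, 50, 200, 100), 640, 480))
```

Each image gets a `.txt` file with the same base name, containing one such line per drone in the frame.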


# Acoustic Detection Setup:

To perform Acoustic Detection, we need to perform the following:

- Find a suitable dataset
- Preprocess the dataset
- Extract MFCCs
- Train and Save model

## Step 1: Find a Suitable Dataset

The dataset used for Acoustic Detection can be found at the following link: https://huggingface.co/datasets/geronimobasso/drone-audio-detection-samples. Its owner claims it is the largest drone audio dataset on the internet, at nearly 7 GB of files. We selected five parquet files (~23,000 audio files), split equally between 'No drone' and 'Drone' data.

## Step 2: Preprocess the Dataset

After downloading the dataset, the audio files will be grouped in parquet files and will need a Python script to be extracted and labeled correctly.

import os
import pandas as pd
import json
import base64
from glob import glob

def decode_audio_field(field):
    if isinstance(field, (bytes, bytearray)):
        return bytes(field)
    if isinstance(field, str):
        try:
            return base64.b64decode(field)
        except Exception:
            arr = json.loads(field)
            return bytes(arr)
    if isinstance(field, list):
        return bytes(field)
    raise ValueError(f"Unsupported audio field type: {type(field)}")

def process_parquet_file(parquet_path, yes_dir, no_dir, audio_column="bytes", label_column='path'):
    df = pd.read_parquet(parquet_path)
    for idx, row in df.iterrows():
        audio_bytes = decode_audio_field(row[audio_column])
        
        # Lower-case the label so the 'no-drone' substring test below can match
        label = str(row[label_column]).strip().lower()
        output_dir = no_dir if 'no-drone' in label else yes_dir
        os.makedirs(output_dir, exist_ok=True)
        
        base_name = os.path.splitext(os.path.basename(parquet_path))[0]
        filename = f"{base_name}_{idx}.wav"
        
        out_path = os.path.join(output_dir, filename)
        with open(out_path, "wb") as f:
            f.write(audio_bytes)

def main(parquet_dir, yes_dir, no_dir, audio_column="bytes", label_column="path"):
    parquet_files = glob(os.path.join(parquet_dir, "*.parquet"))
    for pq in parquet_files:
        process_parquet_file(pq, yes_dir, no_dir, audio_column, label_column)

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Decode audio from Parquet files and save as WAV by label")
    parser.add_argument("parquet_dir", help="Directory containing .parquet files")
    parser.add_argument("yes_dir", help="Output directory for YES DRONE WAVs")
    parser.add_argument("no_dir", help="Output directory for NO DRONE WAVs")
    parser.add_argument("--audio_column", default="bytes", help="Name of the audio bytes column in the Parquet files")
    parser.add_argument("--label_column", default="path", help="Name of the label column (YES_DRONE / NO_DRONE)")
    args = parser.parse_args()
    
    # Ensure output directories exist
    os.makedirs(args.yes_dir, exist_ok=True)
    os.makedirs(args.no_dir, exist_ok=True)
    
    main(args.parquet_dir, args.yes_dir, args.no_dir, args.audio_column, args.label_column)

# Usage:
# Save this script as `decode_wav.py`, then run:
# python decode_wav.py /path/to/parquets /path/to/YES_DRONE /path/to/NO_DRONE
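
After extraction it is worth confirming that the two output folders are roughly balanced, since the selected data is meant to be split equally between the two classes. A minimal sketch, assuming the WAV files sit directly inside the two output directories; the helper names `count_wavs` and `class_balance` are hypothetical:

```python
from pathlib import Path

def count_wavs(directory):
    """Count .wav files directly inside `directory` (non-recursive)."""
    return sum(1 for p in Path(directory).glob("*.wav") if p.is_file())

def class_balance(yes_dir, no_dir):
    """Return per-class file counts and the fraction of drone samples."""
    yes, no = count_wavs(yes_dir), count_wavs(no_dir)
    total = yes + no
    fraction = yes / total if total else 0.0
    return {"drone": yes, "no_drone": no, "drone_fraction": fraction}

# Example (replace with your own folders):
# print(class_balance("/path/to/YES_DRONE", "/path/to/NO_DRONE"))
```

A `drone_fraction` far from 0.5 suggests that the label test in the extraction script did not match the label text as expected.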

## Step 3: Extract MFCCs and Set Up Data

To extract MFCCs, follow these steps:

# Install Dependencies

!pip install tensorflow==2.4.1 tensorflow-gpu==2.4.1 tensorflow-io matplotlib

# Imports used by the code in this step
import os
import tensorflow as tf

# Build Dataloading Function:
def load_wav_16k_mono(filename):
    # Load encoded wav file
    file_contents = tf.io.read_file(filename)
    # Decode wav (tensors by channels) 
    wav, sample_rate = tf.audio.decode_wav(file_contents, desired_channels=1)
    # Remove the trailing channel axis
    wav = tf.squeeze(wav, axis=-1)
    # Note: despite the function name, no resampling is done here; the audio
    # is returned at its native sample rate (sample_rate is not used further)
    sample_rate = tf.cast(sample_rate, dtype=tf.int64)
    return wav


# Concatenate Dataset:
POS = os.path.join('path-to-pos-dir')
NEG = os.path.join('path-to-neg-dir')
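# Build the glob patterns with os.path.join so they work on both Linux and Windows
pos = tf.data.Dataset.list_files(os.path.join(POS, '*.wav'))
neg = tf.data.Dataset.list_files(os.path.join(NEG, '*.wav'))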
positives = tf.data.Dataset.zip((pos, tf.data.Dataset.from_tensor_slices(tf.ones(len(pos)))))
negatives = tf.data.Dataset.zip((neg, tf.data.Dataset.from_tensor_slices(tf.zeros(len(neg)))))
data = positives.concatenate(negatives)

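# Build preprocessing Function
# Number of samples kept per clip. Assumption: 8000 samples, which gives the
# (241, 257, 1) spectrogram used as input_shape in Step 4 (16000 samples would
# give (491, 257, 1)). Replace it with the clip length of your dataset.
NUM_SAMPLES = 8000

def preprocess(file_path, label): 
    wav = load_wav_16k_mono(file_path)
    # Truncate long clips and zero-pad short ones to exactly NUM_SAMPLES
    wav = wav[:NUM_SAMPLES]
    zero_padding = tf.zeros([NUM_SAMPLES] - tf.shape(wav), dtype=tf.float32)
    wav = tf.concat([zero_padding, wav],0)
    # Magnitude of the short-time Fourier transform, with a channel axis added
    spectrogram = tf.signal.stft(wav, frame_length=320, frame_step=32)
    spectrogram = tf.abs(spectrogram)
    spectrogram = tf.expand_dims(spectrogram, axis=2)
    print(spectrogram)
    return spectrogram, label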

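# Create Testing and Training Partitions
DATASET_SIZE = len(pos) + len(neg)   # total number of audio files
data = data.map(preprocess)
data = data.cache()
# Keep the shuffled order fixed so the train and test partitions never overlap
data = data.shuffle(DATASET_SIZE, reshuffle_each_iteration=False)
# (241, 257, 1) matches input_shape in Step 4; adjust it if your spectrograms differ
data = data.batch(16).map(lambda x, y: (tf.ensure_shape(x, (None, 241, 257, 1)), y))  
data = data.prefetch(8)

# Assumption: a 70/30 split by batches; Step 4 expects the names train and test
num_batches = len(data)
train = data.take(int(num_batches * 0.7))
test = data.skip(int(num_batches * 0.7))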
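
The spectrogram shape that the model must accept follows directly from the clip length and the STFT settings. The sketch below reproduces that arithmetic in plain Python, assuming the default `tf.signal.stft` behavior (no end padding, and an FFT length equal to the smallest power of two that covers `frame_length`); the helper name `stft_output_shape` is hypothetical:

```python
def stft_output_shape(num_samples, frame_length=320, frame_step=32):
    """Return (num_frames, num_bins) for an STFT without end padding."""
    # The FFT length defaults to the smallest power of two >= frame_length
    fft_length = 1
    while fft_length < frame_length:
        fft_length *= 2
    # One frame per full window that fits into the clip
    num_frames = max(0, 1 + (num_samples - frame_length) // frame_step)
    # A real-valued FFT keeps only the non-negative frequencies
    num_bins = fft_length // 2 + 1
    return num_frames, num_bins

print(stft_output_shape(8000))    # clip length matching input_shape in Step 4
print(stft_output_shape(16000))   # clip length matching the ONNX spec in Step 5
```

With `frame_length=320` and `frame_step=32`, a clip of 8000 samples gives 241 frames and a clip of 16000 samples gives 491 frames, each with 257 frequency bins. Whichever clip length you use, the same shape has to appear in `ensure_shape`, in the model's `input_shape`, and in the ONNX input signature.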




## Step 4: Training the Model

The right model depends on hardware and dataset size, so expect some trial and error before settling on an architecture. The model we used for our dataset is defined as follows:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

input_shape = (241, 257, 1)

model = Sequential([
    # Conv Block 1
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=input_shape),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),

    # Conv Block 2
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),

    # Conv Block 3
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),

    # Global Pooling instead of Flatten
    GlobalAveragePooling2D(),

    # Dense Layers
    Dense(128, activation='relu'),
    Dropout(0.5),  # Helps prevent overfitting

    # Output Layer
    Dense(1, activation='sigmoid')  # Binary classification
])

model.compile('Adam', loss='binary_crossentropy', metrics=[tf.keras.metrics.Recall(),tf.keras.metrics.Precision()])
model.summary()

### Now, just fit and train the model

from tensorflow.keras.callbacks import EarlyStopping
earlystop_cb = EarlyStopping(
    monitor='val_loss',
    min_delta=0.001,
    patience=1,
    restore_best_weights=True,
    verbose=1
)

hist = model.fit(
    train,
    epochs=50,
    validation_data=test,
    callbacks=[earlystop_cb]
)

model.save('model_name.keras')

# Our model is ready and saved!
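
The `Recall` and `Precision` metrics passed to `model.compile` report, respectively, how many real drone clips were caught and how many drone predictions were correct. A plain-Python sketch of what they measure, assuming sigmoid scores are turned into predictions with a 0.5 threshold (the Keras default for these metrics); the helper name `precision_recall` is hypothetical:

```python
def precision_recall(labels, scores, threshold=0.5):
    """Compute precision and recall for binary labels and sigmoid scores."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Three drone clips (label 1) and two background clips (label 0)
print(precision_recall([1, 1, 0, 0, 1], [0.9, 0.4, 0.6, 0.1, 0.8]))
```

For a detector, low recall means missed drones and low precision means false alarms, so both values are worth watching in the training history.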

## Step 5: Convert from .keras to ONNX

After finishing up with our model, we need to convert it to ONNX:
(Do this step in a Google Colab notebook!)

## Install Dependencies:

pip3 install keras tf2onnx tensorflow

## RUN THIS

import keras
import tf2onnx
import tensorflow as tf

model = keras.models.load_model("model_name.keras")
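# The shape must match the input shape the model was trained with
# (Step 4 defines input_shape = (241, 257, 1)); adjust 491 if yours differs
spec = [tf.TensorSpec(shape=(None, 491, 257, 1), dtype=tf.float32, name='input')]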
model.output_names=['output']
model_proto, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=spec,
    opset=11,
    output_path="the_final_model3.onnx"
)


## Step 6: Convert from ONNX to TRT

On the Jetson Nano, run the following, replacing model.onnx with the path to the ONNX file exported in Step 5:

trtexec --onnx=model.onnx --saveEngine=model_fp32.engine --workspace=2048
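
A wrong path is an easy mistake at this step, so it can help to check the files before starting the conversion. The sketch below only assembles and validates the command shown above. The flags `--onnx`, `--saveEngine`, and `--workspace` come from that command, while the helper name `build_trtexec_command` is hypothetical:

```python
import os

def build_trtexec_command(onnx_path, engine_path, workspace_mb=2048):
    """Validate the paths and return the trtexec command as an argument list."""
    if not os.path.isfile(onnx_path):
        raise FileNotFoundError(f"ONNX model not found: {onnx_path}")
    engine_dir = os.path.dirname(os.path.abspath(engine_path))
    if not os.path.isdir(engine_dir):
        raise FileNotFoundError(f"Output directory not found: {engine_dir}")
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--workspace={workspace_mb}",
    ]

# On the Jetson, the returned list can be passed to subprocess.run:
# import subprocess
# subprocess.run(build_trtexec_command("model.onnx", "model_fp32.engine"), check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.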

### Please make sure all files are accessed through their correct respective directories!
### Congratulations, you should have the final model!

## Now let's set up the ReSpeaker:

Connect the ReSpeaker to the Jetson, then run:

sudo apt-get update
sudo pip install pyusb click
git clone https://github.com/respeaker/usb_4_mic_array.git
cd usb_4_mic_array
sudo python3 dfu.py --download 6_channels_firmware.bin
