## Data notebook (T2 2024)
This Python notebook downloads the data from the Google Cloud Storage buckets and summarizes the training data currently available.

**Goal**: Download the bird dataset from the Google Cloud Storage buckets to a local machine, and summarize the state of the data as of T2 2024.

### Notebook Overview:
- **Download**: Downloads data from a Google Cloud Storage bucket to a specified local folder
- **Data Summary**: Shows a summary of all the available datasets
- **Upload**: Uploads the weather sounds data to a Google Cloud Storage bucket


### Download

Change the index into `list_of_buckets` to choose which bucket to download.<br>
Buckets available to download:
- project_echo_bucket_3
- project_echo_bucket_2
- project_echo_bucket_1

In [None]:
%pip install --upgrade google-cloud-storage

In [2]:
from google.cloud import storage
import os

list_of_buckets = ["project_echo_bucket_3", "project_echo_bucket_2", "project_echo_bucket_1"]
os.environ["GCLOUD_PROJECT"] = "sit-23t1-project-echo-25288b9"

# Enter your desired download folder here
dl_dir = r"D:\Deakin\Project Echo\Data2"

storage_client = storage.Client()
# Change the index to download a different bucket
bucket = storage_client.get_bucket(list_of_buckets[1])
for blob in bucket.list_blobs():
    # Blob names look like "<class folder>/<file name>"
    if blob.name.endswith("/"):
        continue  # skip folder placeholder objects, which have no file to download
    # Mirror the blob's folder structure under dl_dir, whatever its depth
    local_path = os.path.join(dl_dir, *blob.name.split("/"))
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    blob.download_to_filename(local_path)

KeyboardInterrupt: 
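
The `KeyboardInterrupt` above shows that a full-bucket download can be cut short before it finishes. One way to make a rerun cheaper, sketched here under assumptions, is to skip any file that already exists locally with the same size as its blob. The helper name `needs_download` is hypothetical and not part of the original notebook; it relies only on `blob.size`, an attribute the summary code in this notebook also reads.

```python
import os


def needs_download(local_path, remote_size):
    """Return True if the file is missing locally or its size differs from the blob's."""
    if not os.path.isfile(local_path):
        return True
    return os.path.getsize(local_path) != remote_size


# Hypothetical use inside the download loop (not executed here),
# where `target` is the local path the loop builds for the current blob:
# if needs_download(target, blob.size):
#     blob.download_to_filename(target)
```

Comparing sizes is only a cheap heuristic: a file left truncated by an interrupted run would have a different size and be fetched again, but a same-size file with different content would be skipped.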

### Data summary
Below is a summary of the data in each bucket.

In [None]:
storage_client = storage.Client()
# Get the buckets
bucket1 = storage_client.get_bucket(list_of_buckets[0])
bucket2 = storage_client.get_bucket(list_of_buckets[1])
bucket3 = storage_client.get_bucket(list_of_buckets[2])
# bucket4 = storage_client.get_bucket(list_of_buckets[3])
# List all blobs in each of the three buckets
blobs1 = bucket1.list_blobs()
blobs2 = bucket2.list_blobs()
blobs3 = bucket3.list_blobs()
# blobs4 = bucket4.list_blobs()
# Get the file names
files1 = {blob.name for blob in blobs1}
files2 = {blob.name for blob in blobs2}
files3 = {blob.name for blob in blobs3}
# files4 = {blob.name for blob in blobs4}

In [3]:
from collections import Counter

def summarize_files(bucket_name):
    # Initialize a storage client
    storage_client = storage.Client()

    # Get the bucket
    bucket = storage_client.get_bucket(bucket_name)

    # List all blobs in the bucket
    blobs = bucket.list_blobs()

    # Initialize summary variables
    total_files = 0
    total_size = 0

    print("Files in bucket:")
    for blob in blobs:
        total_files += 1
        total_size += blob.size
        print(f"Name: {blob.name}, Size: {blob.size} bytes")

    print("\nSummary:")
    print(f"Total files: {total_files}")
    print(f"Total size: {total_size / (1024 * 1024):.2f} Megabytes")


def compare_buckets(files1, files2):
    """Print how many file names two sets share, as a percentage of their union."""
    # Find matching file names
    matching_files = files1.intersection(files2)
    total_files = len(files1.union(files2))

    # Percentage of matching names relative to all distinct names in either set
    if total_files > 0:
        match_percentage = (len(matching_files) / total_files) * 100
    else:
        match_percentage = 0

    print(f"Total files in bucket 1: {len(files1)} ")
    print(f"Total files in bucket 2: {len(files2)} ")
    print(f"Matching files: {len(matching_files)}")
    print(f"Percentage of matching file names: {match_percentage:.2f}%")


def class_count(files):
    class_counter = Counter()

    for file_name in files:
        class_name = os.path.dirname(file_name)
        class_counter[class_name] += 1

    # Print the summary
    print("Class summary:")
    for class_name, count in class_counter.items():
        print(f"{class_name}: {count}")

    print(f"\nTotal number of unique classes: {len(class_counter)}")
    print(f"Total number of files: {sum(class_counter.values())}")

In [54]:
summarize_files(list_of_buckets[0])


Files in bucket:
Name: Acanthiza chrysorrhoa/region_11.250-13.250.mp3, Size: 33260 bytes
Name: Acanthiza chrysorrhoa/region_11.750-13.750.mp3, Size: 16944 bytes
Name: Acanthiza chrysorrhoa/region_12.800-14.800.mp3, Size: 33260 bytes
Name: Acanthiza chrysorrhoa/region_13.250-15.250.mp3, Size: 33260 bytes
Name: Acanthiza chrysorrhoa/region_14.750-16.050.mp3, Size: 11301 bytes
Name: Acanthiza chrysorrhoa/region_16.750-18.750.mp3, Size: 33260 bytes
Name: Acanthiza chrysorrhoa/region_17.600-19.600.mp3, Size: 16944 bytes
Name: Acanthiza chrysorrhoa/region_17.750-19.750.mp3, Size: 16944 bytes
Name: Acanthiza chrysorrhoa/region_18.500-20.500.mp3, Size: 16748 bytes
Name: Acanthiza chrysorrhoa/region_22.750-24.750.mp3, Size: 33260 bytes
Name: Acanthiza chrysorrhoa/region_24.750-26.750.mp3, Size: 33260 bytes
Name: Acanthiza chrysorrhoa/region_28.750-30.750.mp3, Size: 33260 bytes
Name: Acanthiza chrysorrhoa/region_3.650-4.900.mp3, Size: 21356 bytes
Name: Acanthiza chrysorrhoa/region_4.650-6.650.mp

In [4]:
summarize_files(list_of_buckets[1])




Files in bucket:
Name: Acanthiza chrysorrhoa/region_11.250-13.250.mp3, Size: 33261 bytes
Name: Acanthiza chrysorrhoa/region_11.750-13.750.mp3, Size: 16945 bytes
Name: Acanthiza chrysorrhoa/region_12.800-14.800.mp3, Size: 33261 bytes
Name: Acanthiza chrysorrhoa/region_13.250-15.250.mp3, Size: 33261 bytes
Name: Acanthiza chrysorrhoa/region_14.750-16.050.mp3, Size: 11302 bytes
Name: Acanthiza chrysorrhoa/region_16.750-18.750.mp3, Size: 33261 bytes
Name: Acanthiza chrysorrhoa/region_17.600-19.600.mp3, Size: 16945 bytes
Name: Acanthiza chrysorrhoa/region_18.500-20.500.mp3, Size: 16749 bytes
Name: Acanthiza chrysorrhoa/region_22.750-24.750.mp3, Size: 33261 bytes
Name: Acanthiza chrysorrhoa/region_24.750-26.750.mp3, Size: 33261 bytes
Name: Acanthiza chrysorrhoa/region_28.750-30.750.mp3, Size: 33261 bytes
Name: Acanthiza chrysorrhoa/region_3.650-4.900.mp3, Size: 21357 bytes
Name: Acanthiza chrysorrhoa/region_4.650-6.650.mp3, Size: 33261 bytes
Name: Acanthiza chrysorrhoa/region_41.750-43.750.mp

In [5]:
summarize_files(list_of_buckets[2])


Files in bucket:
Name: Acanthiza chrysorrhoa/region_17.750-19.750.wav, Size: 181036 bytes
Name: Acanthiza chrysorrhoa/region_72.650-74.650.wav, Size: 181036 bytes
Name: Acanthiza chrysorrhoa/region_75.750-77.750.wav, Size: 194860 bytes
Name: Acanthiza lineata/region_2.700-4.700.wav, Size: 194860 bytes
Name: Acanthiza lineata/region_34.000-36.000.wav, Size: 389642 bytes
Name: Acanthiza lineata/region_42.450-44.450.wav, Size: 194860 bytes
Name: Acanthiza nana/region_11.550-13.550.wav, Size: 361994 bytes
Name: Acanthiza nana/region_14.500-16.500.wav, Size: 181036 bytes
Name: Acanthiza nana/region_87.900-89.900.wav, Size: 194860 bytes
Name: Acanthiza pusilla/region_21.150-22.800.wav, Size: 162604 bytes
Name: Acanthiza pusilla/region_25.800-27.000.wav, Size: 118828 bytes
Name: Acanthiza pusilla/region_90.000-92.000.wav, Size: 194860 bytes
Name: Acanthiza reguloides/region_12.950-13.550.wav, Size: 61228 bytes
Name: Acanthiza reguloides/region_4.800-5.600.wav, Size: 159242 bytes
Name: Acanthi

In [56]:
print("bucket 1 --- bucket 2")
compare_buckets(files1, files2)
print("bucket 2 --- bucket 3")
compare_buckets(files2, files3)
print("bucket 1 --- bucket 3")
compare_buckets(files1, files3)


bucket 1 --- bucket 2
Total files in bucket 1: 7536 
Total files in bucket 2: 7161 
Matching files: 6888
Percentage of matching file names: 88.21%
bucket 2 --- bucket 3
Total files in bucket 1: 7161 
Total files in bucket 2: 353 
Matching files: 0
Percentage of matching file names: 0.00%
bucket 1 --- bucket 3
Total files in bucket 1: 7536 
Total files in bucket 2: 353 
Matching files: 5
Percentage of matching file names: 0.06%
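
The percentages above are computed against the union of both file-name sets, not against either bucket alone, which is why 6888 matches out of 7161 files reports as 88.21% rather than about 96%. The sketch below reproduces that arithmetic from the printed counts; `match_percentage` is a hypothetical helper written for illustration, not a function from the notebook.

```python
def match_percentage(n_first, n_second, n_matching):
    """Shared names as a percentage of all distinct names in either set."""
    # Size of the union = size of first + size of second - size of intersection
    n_union = n_first + n_second - n_matching
    return 100 * n_matching / n_union if n_union else 0.0


# Counts taken from the output above
print(f"{match_percentage(7536, 7161, 6888):.2f}%")  # bucket 1 vs bucket 2 -> 88.21%
print(f"{match_percentage(7161, 353, 0):.2f}%")      # bucket 2 vs bucket 3 -> 0.00%
print(f"{match_percentage(7536, 353, 5):.2f}%")      # bucket 1 vs bucket 3 -> 0.06%
```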


In [57]:
class_count(files1)

Class summary:
Barnardius zonarius: 202
Cisticola exilis: 212
Sus Scrofa: 40
Phylidonyris niger: 540
Vulpes vulpes: 103
Cincloramphus mathewsi: 148
Acanthiza nana: 144
Falco peregrinus: 43
Parvipsitta pusilla: 63
Acanthiza pusilla: 246
Rhipidura albiscapa: 442
Colluricincla harmonica: 608
Rhipidura leucophrys: 646
Plectorhyncha lanceolata: 47
Cophixalus infacetus: 172
Carterornis leucotis: 77
Cophixalus exiguus: 30
Philemon corniculatus: 109
Anthochaera phrygia: 33
Litoria inermis: 211
Dasyurus maculatus: 261
Ranoidea caerulea: 12
Acanthorhynchus tenuirostris: 100
Falco berigora: 48
Haliastur sphenurus: 116
Acanthiza reguloides: 157
Artamus cyanopterus: 23
Acanthiza lineata: 29
Philemon citreogularis: 168
Acanthiza uropygialis: 62
Pachycephala simplex: 3
Capra Hircus: 54
Scythrops novaehollandiae: 7
Petrochelidon nigricans: 47
Stizoptera bichenovii: 56
Pycnoptilus floccosus: 8
Manorina melanophrys: 61
Eurostopodus mystacalis: 22
Strepera versicolor: 18
Daphoenositta chrysoptera: 43
Rat

In [39]:
class_count(files2)


Class summary:
Dasyurus maculatus: 257
Barnardius zonarius: 199
Cisticola exilis: 208
Sus Scrofa: 37
Phylidonyris niger: 537
Vulpes vulpes: 100
Cincloramphus mathewsi: 144
Acanthiza nana: 141
Falco peregrinus: 40
Parvipsitta pusilla: 60
Acanthiza pusilla: 243
Rhipidura albiscapa: 439
Colluricincla harmonica: 604
Rhipidura leucophrys: 643
Plectorhyncha lanceolata: 44
Cophixalus infacetus: 168
Carterornis leucotis: 73
Cophixalus exiguus: 27
Philemon corniculatus: 106
Anthochaera phrygia: 30
Litoria inermis: 208
Ranoidea caerulea: 8
Acanthorhynchus tenuirostris: 97
Falco berigora: 45
Haliastur sphenurus: 113
Acanthiza reguloides: 154
Artamus cyanopterus: 19
Acanthiza lineata: 26
Philemon citreogularis: 165
Acanthiza uropygialis: 59
Capra Hircus: 51
Petrochelidon nigricans: 43
Stizoptera bichenovii: 52
Manorina melanophrys: 58
Strepera versicolor: 15
Daphoenositta chrysoptera: 40
Rattus Norvegicus: 99
Melithreptus brevirostris: 18
Rhipidura rufifrons: 87
Petroica goodenovii: 120
Melithrept

In [40]:
class_count(files3)


Class summary:
Spilopelia chinensis: 3
Callocephalon fimbriatum: 3
Daphoenositta chrysoptera: 3
Egretta novaehollandiae: 3
Melithreptus gularis: 3
Acanthiza chrysorrhoa: 3
Chenonetta jubata: 3
Dama Dama: 3
Climacteris picumnus: 3
Scythrops novaehollandiae: 3
Philemon corniculatus: 3
Petrochelidon ariel: 3
Cormobates leucophaea: 3
Vulpes vulpes: 3
Capra Hircus: 3
Acanthiza reguloides: 3
Rhipidura albiscapa: 3
Petroica rosea: 3
Cisticola exilis: 3
Neophema pulchella: 3
Gallinula tenebrosa: 3
Rattus Norvegicus: 3
Petroica phoenicea: 3
Symposiachrus trivirgatus: 3
Philemon citreogularis: 3
Uperoleia mimula: 3
Geopelia cuneata: 2
Ceyx azureus: 3
Artamus superciliosus: 3
Phaps elegans: 3
Haliastur sphenurus: 3
Coracina papuensis: 3
Dicaeum hirundinaceum: 3
Eurostopodus mystacalis: 3
Cervus Unicolour: 3
Colluricincla harmonica: 3
Sus Scrofa: 3
Nesoptilotis leucotis: 3
Acanthorhynchus tenuirostris: 3
Trichosurus vulpecula: 3
Alauda arvensis: 3
Melithreptus lunatus: 3
Petrochelidon nigricans: 3
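
The class summaries above print in arbitrary set order, which makes the smallest classes hard to spot. A minimal sketch, assuming the same `<class>/<file>` blob naming shown in the bucket listings, returns the counts instead of printing them so they can be sorted. `count_classes` is a hypothetical name, and the three sample names are copied from the listing output only to illustrate.

```python
import posixpath
from collections import Counter


def count_classes(file_names):
    """Count files per class, taking the class as the blob's folder path."""
    # Blob names always use "/" as the separator, so posixpath works on any OS
    return Counter(posixpath.dirname(name) for name in file_names)


sample = {
    "Acanthiza chrysorrhoa/region_17.750-19.750.wav",
    "Acanthiza chrysorrhoa/region_72.650-74.650.wav",
    "Acanthiza lineata/region_2.700-4.700.wav",
}
counts = count_classes(sample)

# Print the smallest classes first
for class_name, count in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{class_name}: {count}")
```

With the real data, `count_classes(files1)` would take the place of `count_classes(sample)`.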

### Upload
Upload the weather sounds dataset to a Google Cloud Storage bucket.

In [60]:
# Create a new bucket
weather_sounds_bucket = "weather_sounds"
bucket = storage_client.create_bucket(weather_sounds_bucket)

print(f"Bucket {bucket.name} created.")

Forbidden: 403 POST https://storage.googleapis.com/storage/v1/b?project=sit-23t1-project-echo-25288b9&prettyPrint=false: s222521515@deakin.edu.au does not have storage.buckets.create access to the Google Cloud project. Permission 'storage.buckets.create' denied on resource (or it may not exist).

In [None]:
# Enter bucket name, check connection

bucket_name = 'project_echo_bucket_3'

my_bucket = storage_client.get_bucket(bucket_name)

vars(my_bucket)
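
The cells above only attempt to create a bucket and check the connection; the upload itself is not shown. A minimal sketch of that step, assuming the weather sounds sit in a local folder and that the account has write access to the target bucket, could walk the folder and call `blob.upload_from_filename` for each file. The helper names, the `weather_sounds/` prefix and the local path are assumptions for illustration, not part of the original notebook.

```python
import os


def plan_uploads(local_dir, prefix=""):
    """Return sorted (local_path, blob_name) pairs for every file under local_dir."""
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for file_name in files:
            local_path = os.path.join(root, file_name)
            # Blob names use "/" regardless of the local OS separator
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            pairs.append((local_path, prefix + relative))
    return sorted(pairs)


def upload_folder(bucket, local_dir, prefix=""):
    """Upload each planned file; `bucket` is expected to be a google.cloud.storage Bucket."""
    for local_path, blob_name in plan_uploads(local_dir, prefix):
        bucket.blob(blob_name).upload_from_filename(local_path)


# Hypothetical call, using the bucket object from the cell above (not executed here):
# upload_folder(my_bucket, r"D:\path\to\weather_sounds", prefix="weather_sounds/")
```

Separating the planning step from the upload makes it possible to inspect the blob names before anything is written to the bucket.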