### Database

- this notebook covers:

    - the structure of a document within the database
    
    - how to load files into MongoDB
    
    - how to merge collections within MongoDB

##### This is what a dataset should look like:

In [17]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# load dataset
path = './dataset.csv'
dataset = pd.read_csv(path)
dataset.head(2)

Unnamed: 0,file_name,suffix,duration_s,pleasant,vibrant,eventful,chaotic,annoying,monotonous,uneventful,calm,ISO_Pleasantness,ISO_Eventfulness,SC_Nature,SC_Human,SC_Household,SC_Installation,SC_Signals,SC_Traffic,SC_Speech,SC_Music,Activity,Location8,FGsource,LAeq_default,N5_default,FavgArith_default,RAavgArith,SavgArith_default,R_default,T_default
0,1132730_16,.wav,15.0,1,3,3,3,3,3,2,0,1.146447,2.646447,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,1_Cooking_housework_workout,3_Bathroom,Wäschetrockner,56.0,11.3,0.0188,7.39,1.06,0.196,0.465
1,1132730_32,.wav,15.0,1,1,2,3,3,2,0,1,1.146447,2.56066,0.0,0.0,3.6,0.0,0.0,0.0,0.0,0.0,4_Sleeping_waking_up_relaxing,1_LivingRoom,Heizlüfter 2 Stück,40.12,3.51,0.0084,2.62,1.14,0.0624,0.0401


- if some of these items are not in your dataset, you might want to create them and set them to NaN, 0 or "undefined"

- if items are missing, some features of the web app will not work

- all soundscape items are scaled from 0 to 4. The sliders in the web app are also scaled from 0 to 4 (except those for the acoustic features)
- --> the range of each acoustic feature slider is the min and max value of that feature in the dataset. You have to set these slider ranges manually in the web app according to your dataset
- --> set the min and max values in frontend/src/components/SliderRanges/sliderSoundscapeComponent.js

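As a minimal sketch of the first point above, missing items could be filled with defaults before a document is stored. The helper name `fill_missing_fields` and the chosen defaults are assumptions for illustration only; they are not part of the project.

```python
# Hypothetical defaults: adjust the keys and values to your own dataset.
DEFAULTS = {
    "ISO_Pleasantness": float("nan"),  # numeric item without a value
    "SC_Music": 0,                     # sound category rating
    "FGsource": "undefined",           # free-text sound source
}

def fill_missing_fields(document, defaults=DEFAULTS):
    """Return a copy of `document` with every missing key set to its default."""
    filled = dict(document)            # keep existing values and key order
    for key, value in defaults.items():
        filled.setdefault(key, value)  # only added when the key is absent
    return filled

doc = {"file_name": "1132730_16", "SC_Music": 4.0}
filled_doc = fill_missing_fields(doc)
print(filled_doc)
```

The same helper could be applied to `soundscape.to_dict()` before each `insert_one` call in the loading cells further down.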

In [None]:
# get min max ranges of acoustic features
# --> set min max values in frontend/src/components/SliderRanges/sliderSoundscapeComponent.js

col = ['LAeq_default', 'N5_default', 'FavgArith_default', 'RAavgArith', 'SavgArith_default', 'R_default', 'T_default']

# pass a list (not a set) so the row order of the result is always min, max
min_max_values = dataset[col].agg(['min', 'max'])
min_max_values

### MongoDB

- you need to have MongoDB installed on your machine, or use MongoDB Atlas

- to start MongoDB on macOS (installed via Homebrew): brew services start mongodb-community@6.0

- you might want to check out MongoDB Compass for a GUI overview of your database

##### Create Connection

In [19]:
import pandas as pd
from pymongo import MongoClient

# Connect to MongoDB (the connection string will be different if you use MongoDB Atlas)
client = MongoClient('mongodb://localhost:27017/')

# name of your database (set the name later on in the backend/src/config.py file)
db = client['soundscape_search']

# create a collection
collection = db['dataset_demo']

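Because the connection string differs between a local instance and Atlas (and an Atlas string usually contains credentials), one option is to read it from an environment variable instead of hard-coding it. This is only a sketch: the variable name `MONGODB_URI` and the helper `get_mongo_uri` are assumptions, not something the project defines.

```python
import os

def get_mongo_uri(env=os.environ, default="mongodb://localhost:27017/"):
    """Return the connection string from MONGODB_URI, or the local default."""
    # MONGODB_URI is an assumed variable name; pick any name you like.
    return env.get("MONGODB_URI", default)

# With an empty environment the local instance is used.
print(get_mongo_uri({}))

# In the cell above this would replace the literal string:
# client = MongoClient(get_mongo_uri())
```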
##### Load to DB

In [20]:
# run this to store all soundscapes to database
for index, soundscape in dataset.iterrows():
    collection.insert_one(soundscape.to_dict())

In [6]:
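# run this to store only a subset (rows 1000 to 1999)

collection = db['dataset_2']

# slicing with iloc does not fail if the dataset has fewer than 2000 rows
for index, soundscape in dataset.iloc[1000:2000].iterrows():
    collection.insert_one(soundscape.to_dict())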

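The loops above send one `insert_one` call per row, which can be slow for larger datasets. A possible alternative, sketched here under the assumption that the rows are available as a list of dicts, is to group them into batches and pass each batch to PyMongo's `insert_many`. The helper name `batched` and the batch size are illustrative choices.

```python
def batched(records, batch_size=500):
    """Yield successive lists of at most `batch_size` records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Stand-in data so the sketch runs without a database.
records = [{"file_name": f"clip_{i}"} for i in range(5)]
batches = list(batched(records, batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]

# With the notebook's objects, the upload could then look like:
# for batch in batched(dataset.to_dict("records")):
#     collection.insert_many(batch)
```

An empty dataset yields no batches, so `insert_many` is never called with an empty list.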
##### Merge different Collections

In [7]:
collection1 = db['dataset_1']
collection2 = db['dataset_2']
merged_collection = db['merged_collection']

# Retrieve documents from both collections and merge them
documents_collection1 = collection1.find()
documents_collection2 = collection2.find()

for document in documents_collection1:
    merged_collection.insert_one(document)

for document in documents_collection2:
    merged_collection.insert_one(document)

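Note that the loop above copies each document together with its `_id`, so running the cell a second time raises a duplicate key error. As a hedged alternative, the copy can be done on the server with the `$merge` aggregation stage, which can be told to skip documents that already exist. The helper name `merge_stage` is an assumption for illustration, and the commented usage needs a running MongoDB 4.2 or newer.

```python
def merge_stage(target_name):
    """Build a $merge stage that copies documents into `target_name`."""
    return {
        "$merge": {
            "into": target_name,
            "on": "_id",
            "whenMatched": "keepExisting",  # skip documents that were already copied
            "whenNotMatched": "insert",     # add documents that are new
        }
    }

pipeline = [merge_stage("merged_collection")]
print(pipeline)

# With the notebook's objects:
# for name in ["dataset_1", "dataset_2"]:
#     db[name].aggregate(pipeline)
```

With `keepExisting`, re-running the merge leaves already copied documents untouched instead of failing.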
##### Store to MongoDB Atlas (cloud database)

--> Atlas Search may be a better fit for text search

--> you first need to create a search index in Atlas (see the online documentation); the query below expects it to be named text_search

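For orientation, a search index definition for the `FGsource` field might look roughly like the following. This is an assumed definition, not the one used by the project: the static mapping and the German analyzer are illustrative choices, and the index still has to be created in Atlas under the name `text_search`. The printed JSON is meant for the JSON editor in the Atlas UI.

```python
import json

# Assumed definition for a search index on the FGsource field.
index_definition = {
    "mappings": {
        "dynamic": False,  # index only the fields listed below
        "fields": {
            "FGsource": {
                "type": "string",
                # Assumption: the labels are mostly German.
                # Remove this line to fall back to the default analyzer.
                "analyzer": "lucene.german",
            }
        },
    }
}

# JSON form of the definition
index_json = json.dumps(index_definition, indent=2)
print(index_json)
```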
In [2]:
import pandas as pd
from pymongo import MongoClient

client = MongoClient("your connection string to mongoDB atlas")
db = client['soundscape_search']
collection = db['soundscape_search']

# run this to store all soundscapes to database
for index, soundscape in dataset.iterrows():
    collection.insert_one(soundscape.to_dict())

In [None]:
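# run a search query using MongoDB Atlas Search
def text_search(query="mann im haus", path="FGsource", index="text_search"):
    # "index" must match the name of the search index created in Atlas
    result = collection.aggregate([
        {
            "$search": {
                "index": index,
                "text": {
                    "query": query,
                    "path": path,
                    # an empty dict enables fuzzy matching with default settings
                    "fuzzy": {}
                }
            }
        }
    ])
    return list(result)

text_search()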