#### Description:
Imports the project-specific raw data from the project folder into MongoDB (the database).

#### Workflow:
Raw data is read from the project folder data/projects/*&lt;project name&gt;*/machine/. To compare these notebooks, the normalized cross-correlation from data_compare/Normalized_Cross_Correlation is used.
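The implementation in data_compare/Normalized_Cross_Correlation is not shown here, so the following is only a minimal sketch of the common zero-lag definition of normalized cross-correlation, assuming two equal-length numeric signals. The function name `normalized_cross_correlation` and the sample signals are hypothetical and may differ from what the project actually uses.

```python
import numpy as np

def normalized_cross_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length 1-D signals.

    Hypothetical helper for illustration; not the project's implementation.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("signals must have the same length")
    # Remove the mean so that constant offsets do not affect the score.
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denom = np.sqrt(np.sum(a_centered ** 2) * np.sum(b_centered ** 2))
    if denom == 0:
        # At least one signal is constant; return 0.0 by convention here.
        return 0.0
    return float(np.sum(a_centered * b_centered) / denom)

signal = np.array([1.0, 2.0, 3.0, 4.0])
ncc_same = normalized_cross_correlation(signal, 2 * signal + 1)  # scaled and shifted copy
ncc_flipped = normalized_cross_correlation(signal, -signal)      # inverted copy
```

Under this definition the score lies between -1 and 1, and a scaled or shifted copy of a signal still scores 1.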

# Import data into MongoDB

In [None]:
%run ../Setup.ipynb

from isac.conversion import xml_to_json as converter
from isac.database.connection import database_connector as connector
from isac.database.structure import column_meta_helper as structureMeta
from isac import configuration
import pandas as pd

if __name__ == "__main__":
    print("Please start import by running notebook notebooks/import_projects/Import_All_Projects")
else:
    importFile = configuration.machine_importFile
    connection = connector.DatabaseConnector(name = "ISAC_" + configuration.project_name).connect()

## Convert / normalize data

#### Read entries

In [None]:
order = configuration.order

records = pd.read_csv(importFile, encoding='latin1', header=None, names = order)
print("Number of entries found: " + str(len(records)))
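As a small illustration of the read step, the sketch below parses an in-memory CSV without a header row and assigns the column names explicitly, which is what `header=None` together with `names=` does. The column list `example_order` and the CSV content are made up for this example; the real column order comes from `configuration.order`.

```python
import io
import pandas as pd

# Hypothetical column order and file content, for illustration only.
example_order = ["time", "sensor", "value"]
example_csv = io.StringIO("2,a,0.5\n1,b,\n3,a,0.7\n")

# header=None: the first line is treated as data, not as column names.
# names=...:   supplies the column names in file order.
example_records = pd.read_csv(example_csv, header=None, names=example_order)

print("Number of entries found: " + str(len(example_records)))
```

The empty field in the second row is read as `NaN`, which is the kind of gap the cleaning steps further down have to handle.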

## Cleaning
#### The individual documents are cleaned before import

In [None]:
df = pd.DataFrame(records)

#### Rename, Sort And Reorder

In [None]:
# sort_values returns a new DataFrame, so the result has to be assigned back
df = df.sort_values(by=['time'])
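One detail worth keeping in mind for the sort step: `sort_values` returns a new DataFrame and leaves the original untouched unless the result is assigned. The sketch below shows this on a made-up frame; the column names and values are assumptions for illustration.

```python
import pandas as pd

example = pd.DataFrame({"time": [3, 1, 2], "value": [0.7, 0.1, 0.4]})

# The result is discarded here, so `example` keeps its original order.
example.sort_values(by=["time"])
unchanged_times = example["time"].tolist()

# Assigning the result keeps the sorted frame; reset_index renumbers the rows.
example_sorted = example.sort_values(by=["time"]).reset_index(drop=True)
sorted_times = example_sorted["time"].tolist()
```

Sorting by time before the forward fill matters, because the fill copies values along the row order.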

#### Cleaning Data
#### Fill NaN Values With Value Of Last Row
Pandas supports a `fillna` method with different options. Here we use `ffill` (forward fill), which copies the last valid value into the following rows where the value is missing.

In [None]:
# Forward fill; equivalent to fillna(method='ffill'), which newer pandas versions deprecate
df = df.ffill()
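To make the effect of forward fill concrete, here is a minimal sketch on a made-up frame (the column names and values are assumptions). Note that a missing value in the very first row has no predecessor and therefore stays missing.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({
    "time": [1, 2, 3, 4],
    "value": [np.nan, 0.5, np.nan, np.nan],
})

# Each NaN is replaced by the last valid value above it in the same column.
filled = example.ffill()

# Count what forward fill could not repair (leading NaNs).
remaining_missing = int(filled["value"].isna().sum())
```

If leading gaps can occur in the real data, they would need separate handling, for example dropping those rows.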

#### Clean From Errors
Based on our examination of the data and the experts' assessment, the data contains no errors or outliers.

#### Clean Duplicates
No duplicate rows were found either, so the dataset stays the same after dropping duplicates.

In [None]:
df = df.drop_duplicates()
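A claim such as "no duplicates were found" can be checked by counting duplicated rows before dropping them. The sketch below does this on a made-up frame that deliberately contains one duplicate; the data is an assumption for illustration only.

```python
import pandas as pd

example = pd.DataFrame({"time": [1, 1, 2], "value": [0.5, 0.5, 0.5]})

# duplicated() marks every row that repeats an earlier row in all columns.
n_duplicates = int(example.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduplicated = example.drop_duplicates()
```

On a dataset without duplicates, `n_duplicates` is 0 and the frame has the same length before and after.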

In [None]:
records = df[order].to_dict('records')

## Save to MongoDB

In [None]:
# Drop the old collection
collection = connection.getCollection(configuration.collections.cleaned)
collection.drop()

# Insert the new documents
result = collection.insert_many(records)

print("Number of cleaned documents saved: " + str(len(result.inserted_ids)))