# 01 Create Data for Path Animation Decision Model

In this notebook, we create a training dataset for a model that decides which paths of an SVG file should be animated. For this, the following steps are required:
* Create a subset of the most critical paths of an SVG file and extract them into separate SVG files
* Upload the separate SVG files to our label website and then label which paths should be animated
* Download the labelled dataset from the website and preprocess it so that it can be used to train a model

## I. Create a subset of the most critical paths of an SVG file and extract them into separate SVG files

Before we can extract the most important paths of an SVG (importance is measured by the MSE that results when a path is removed), we have to preprocess the logos and then decompose them into their separate paths.
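To illustrate the idea of ranking paths by MSE, here is a minimal sketch. It assumes that the full logo and each "logo with one path removed" have already been rendered to equally sized grayscale rasters; the names `mse` and `rank_paths_by_mse` and the toy data are hypothetical and are not taken from `sort_and_extract_paths`.

```python
from typing import Dict, List

Image = List[List[float]]  # grayscale raster: rows of pixel intensities


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized grayscale images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    return total / count


def rank_paths_by_mse(full: Image, without_path: Dict[str, Image]) -> List[str]:
    """Return path ids sorted from most to least important.

    A path counts as more important the more the rendering changes
    (higher MSE against the full logo) once that path is removed.
    """
    scores = {path_id: mse(full, image) for path_id, image in without_path.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2x2 "renderings" (made-up data): the full logo and two ablated versions.
full_logo = [[1.0, 1.0], [0.0, 0.0]]
renders = {
    "path_0": [[0.0, 0.0], [0.0, 0.0]],  # removing path_0 changes two pixels
    "path_1": [[1.0, 0.0], [0.0, 0.0]],  # removing path_1 changes one pixel
}
ranking = rank_paths_by_mse(full_logo, renders)
```

Keeping only the first few entries of `ranking` would then give the subset of most critical paths.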

In [1]:
import pickle  # needed later to load the stored path embeddings

from src.preprocessing.insert_paths_ids import insert_ids_in_folder
from src.preprocessing.decompose_logo import decompose_logos_in_folder
from src.preprocessing.sort_and_extract_paths import *

Add a unique identifier to every path of an SVG file. Then decompose the logos into their separate paths.
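As a rough illustration of what adding identifiers to paths can look like, the following sketch numbers every `<path>` element of a single SVG string using only the standard library. The attribute name `path_id` and the function `add_path_ids` are assumptions made for this example; `insert_ids_in_folder` may use a different attribute and may also cover other shape elements.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the default namespace when serializing


def add_path_ids(svg_text: str) -> str:
    """Give every <path> element a running index in document order."""
    root = ET.fromstring(svg_text)
    for index, path in enumerate(root.iter(f"{{{SVG_NS}}}path")):
        # "path_id" is a hypothetical attribute name chosen for this sketch.
        path.set("path_id", str(index))
    return ET.tostring(root, encoding="unicode")


example = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path d="M0 0 L10 10"/><g><path d="M5 5 L20 5"/></g>'
    "</svg>"
)
result = add_path_ids(example)

# Read the ids back to check that nested paths were numbered as well.
ids = [
    path.get("path_id")
    for path in ET.fromstring(result).iter(f"{{{SVG_NS}}}path")
]
```

With stable identifiers in place, each path can later be written to its own SVG file and matched back to the logo it came from.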

In [None]:
insert_ids_in_folder("../data/external/logos_dataset", new_folder="../data/interim/logos_preprocessed")

In [None]:
expand_viewbox_in_folder("../data/interim/logos_preprocessed", 50, "../data/interim/logos_preprocessed")

In [None]:
decompose_logos_in_folder("../data/interim/logos_preprocessed", "../data/interim/logos_decomposed")

In [None]:
apply_embedding_model_to_svgs(data_folder="../data/interim/logos_decomposed", split_paths=True, save=True)

In [None]:
with open('data/embeddings/hierarchical_ordered_decomposed_svgs_embedding.pkl', 'rb') as f:
    path_embedding = pickle.load(f)

@Becci: what are the next steps here with the `sort_and_extract_paths` module?

## II. Upload the separate SVG files to our label website and then label which paths should be animated

The selected SVG files, which contain the paths stored in "../data/interim/logos_paths_selected", are uploaded to the [label website](https://animate-logos.web.app/label-paths.html). Once enough people have labelled them, the data can be downloaded in the next step.

## III. Download the labelled dataset from the website and preprocess it to be used to train a model

We have to download the data from the website's database and then prepare it for modelling.
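Because each path is labelled by several people, the preparation step has to collapse multiple decisions into one label per path. A minimal sketch of one possible approach, a simple majority vote, is shown here. This is an assumption for illustration only: the function `aggregate_by_majority`, the record layout, and the tie-breaking rule are hypothetical and do not describe how `aggregate_path_animation_decisions_label` actually works.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (logo filename, path id, 1 = animate / 0 = do not)
Label = Tuple[str, str, int]


def aggregate_by_majority(labels: Iterable[Label]) -> Dict[Tuple[str, str], int]:
    """Collapse several labellers' decisions into one label per path."""
    votes: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for filename, path_id, decision in labels:
        votes[(filename, path_id)].append(decision)
    # A path is labelled 1 only with a strict majority; ties resolve to 0.
    return {
        key: int(sum(decisions) * 2 > len(decisions))
        for key, decisions in votes.items()
    }


# Made-up example data: three votes for path "0", two for path "1".
raw_labels = [
    ("logo_0.svg", "0", 1),
    ("logo_0.svg", "0", 1),
    ("logo_0.svg", "0", 0),
    ("logo_0.svg", "1", 0),
    ("logo_0.svg", "1", 1),
]
aggregated = aggregate_by_majority(raw_labels)
```

Resolving ties to 0 is a conservative choice that animates a path only when most labellers agree; a different rule may fit the labelled data better.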

In [None]:
from src.data.interact_with_website_database import connect_to_firestore, retrieve_documents_from_collection
from src.data.aggregate_path_label import aggregate_path_animation_decisions_label

In [None]:
firestore_client = connect_to_firestore()

In [None]:
logos_dataset_paths_animation_decisions = retrieve_documents_from_collection(firestore_client,
                                                                                 collection="labelpath")
logos_dataset_paths_animation_decisions.to_csv(
        "../../data/interim/logos_paths_animation_decision_label/logos_dataset_paths_animation_decisions.csv")

In [None]:
# Aggregate the individual label decisions into one label per path.
aggregate_path_animation_decisions_label(
    path_decision_labels_dataset="../../data/interim/logos_paths_animation_decision_label"
                                 "/logos_dataset_paths_animation_decisions.csv",
    matching_filenames_canva_dataset="../../data/interim/logos_paths_information"
                                     "/matching_filenames_canva.csv",
    matching_filenames_designer_dataset="../../data/interim/logos_paths_information"
                                        "/matching_filenames_designer.csv",
)