Dataset Source and Preprocessing Steps

Preprocessing

  • preprocessing.py

    1. Preprocesses the dataset
    2. Derives its characteristics
    3. (Optional) Partitions the dataset into train/validation/test
  • E.g. python3 preprocessing.py -i "ML-100K.txt" -p 1

  • You can preprocess additional datasets and/or apply your own preprocessing
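As a rough illustration of steps 2 and 3 above, the sketch below derives a few characteristics from a list of interactions and splits it into train/validation/test. It is not the code in preprocessing.py. The (user, item, rating) tuple layout, the chosen characteristics (counts and density), the 80/10/10 ratios and the function names `derive_characteristics` and `partition` are all assumptions made for this example.

```python
import random


def derive_characteristics(interactions):
    """Summarise a list of (user, item, rating) tuples (assumed layout)."""
    users = {user for user, _, _ in interactions}
    items = {item for _, item, _ in interactions}
    n = len(interactions)
    # Density: share of the user-item matrix that is actually observed.
    density = n / (len(users) * len(items)) if users and items else 0.0
    return {
        "n_users": len(users),
        "n_items": len(items),
        "n_interactions": n,
        "density": density,
    }


def partition(interactions, ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly split interactions into train/validation/test lists.

    The ratios are an assumption for this sketch; the remainder after the
    train and validation slices always goes to the test split.
    """
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * ratios[0])
    n_valid = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Toy data, invented purely for illustration.
toy = [(u, i, 1.0) for u in range(2) for i in range(5)]
stats = derive_characteristics(toy)
train, valid, test = partition(toy)
```

A purely random split is the simplest choice. A per-user or time-based split may suit recommender datasets better, depending on what you want to evaluate.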

Characteristics

Euclidean Distance

  • euclidean_distance.py
    1. Computes the Euclidean distance between every pair of datasets based on their characteristics
    2. Generates a simple visualisation
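The distance step can be sketched as follows, assuming each dataset is described by a numeric vector of characteristics in a fixed order. This is a minimal illustration rather than the code in euclidean_distance.py. The function name `pairwise_euclidean`, the dataset names and the example values are hypothetical.

```python
import math
from itertools import combinations


def pairwise_euclidean(characteristics):
    """Return {(name_a, name_b): distance} for every pair of datasets.

    `characteristics` maps a dataset name to a list of numbers; all lists
    must have the same length and the same feature order.
    """
    distances = {}
    for (name_a, vec_a), (name_b, vec_b) in combinations(characteristics.items(), 2):
        # math.dist (Python 3.8+) computes sqrt(sum((a - b) ** 2)).
        distances[(name_a, name_b)] = math.dist(vec_a, vec_b)
    return distances


# Hypothetical characteristic vectors, invented for illustration.
example = {
    "dataset_a": [0.0, 0.0],
    "dataset_b": [3.0, 4.0],
    "dataset_c": [6.0, 8.0],
}
result = pairwise_euclidean(example)
```

If the characteristics are on very different scales (for example, interaction counts next to density), the largest-valued feature dominates the distance, so rescaling the features first may be worthwhile. Whether the script does this is not stated here.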

Dataset Similarities (Euclidean Distance)

Clustering

Datasets & Clustering Visualisation with t-SNE (5 Clusters)
