Skip to content
/ Fred Public

A fast, scalable and light-weight C++ Fréchet and DTW distance library, exposed to python and focused on clustering of polygonal curves.

License

Notifications You must be signed in to change notification settings

derohde/Fred

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

97 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fred alt text

A fast, scalable and light-weight C++ Fréchet and DTW distance library, exposed to python and focused on clustering of polygonal curves.

DISTANCE FUNCTION FOR CLUSTERING CAN BE CHOSEN (heuristic in case of DTW)

CURVE AND CURVES CAN NOW BE PICKLED

Ingredients

import Fred as fred

  • for verbosity, set fred.config.verbosity, default is 0, possible values 0,1,2,3

Number of Threads

By default, Fred will automatically determine the number of threads to use. If you want to set an upper limit, set fred.config.number_threads. Set to -1 to enable dynamic mode again.

Curve

  • signature: fred.Curve(np.ndarray), fred.Curve(np.ndarray, str name)
  • properties: fred.Curve.values: curves as np.ndarray, fred.Curve.name: get name of curve, fred.Curve.dimensions: dimension of curve, fred.Curve.complexity: number of points of curve

Curves

  • signature: fred.Curves()
  • methods: fred.Curves.add(curve): add curve, fred.Curves[i]: get ith curve, len(fred.Curves): number curves, fred.Curves + fred.Curves: add two sets of curves, fred.Curves.simplify(l): return set of simplified curves
  • properties: fred.Curves.m: maximum complexity of the contained curves, fred.Curves.values: curves as np.ndarray

continous Fréchet distance

  • signature: fred.continuous_frechet(curve1, curve2)
  • returns: fred.Continuous_Frechet_Result with members value, time_bounds: running-time for upper and lower bound, number_searches: number of free space diagrams built, time_searches: running-time for free spaces
continuous Fréchet distance config
  • approximation error in percent of distance: fred.config.continuous_frechet_error, which defaults to 1

discrete Fréchet distance

  • signature: fred.discrete_frechet(curve1, curve2)
  • returns: fred.Discrete_Frechet_Result with members value and time

discrete dynamic time warping distance

  • signature: fred.discrete_dynamic_time_warping(curve1, curve2)
  • returns: fred.Discrete_Dynamic_Time_Warping_Distance with members value and time

Curve Simplification

All simplifications are vertex-restricted!

Frechet distance

minimum error simplification
approximate minimum link simplification
approximate minimum error simplification
  • binary search on fred.frechet_approximate_minimum_link_simplification
  • signature: fred.frechet_approximate_minimum_error_simplification(fred.Curve, int complexity)
  • returns: fred.Curvethat uses input curves vertices, with complexity number of vertices and that has small distance to input curve

Dynamic Time Warping

approximate minimum error simplification

Clustering

Distance storing

  • Set fred.config.use_distance_matrix to False if already computed distances should not be stored. This makes sense for massive data sets, especially when so much memory is consumed that the OS kills the process.

Underlying distance function

The parameter distance_func controls which distance function to use. Possible values:

  • 0: continuous Fréchet distance
  • 1: discrete Fréchet distance
  • 2: discrete dynamic time warping distance (algorithms are then of heuristic nature)

Consecutive call option

The parameter consecutive_call controls whether distances and simplifications already computed in a previous clustering call are resused (set to true then); defaults to false. Has no effect when fred.config.use_distance_matrix == False.

discrete (k,l)-center clustering (continuous Fréchet)

  • from Approximating (k,l)-center clustering for curves
  • signature: fred.discrete_klcenter(int k, int l, fred.Curves, bool local_search, bool consecutive_call, bool random_first_center, bool fast_simplification, int distance_func) with parameters
    • k: number of centers
    • l: maximum complexity of the centers
    • local_search: number of iterations of local search to improve solution, defaults to 0
    • random_first_center: determines if first center is chosen uniformly at random or first curve is used as first center, optional, defaults to true
    • fast_simplification: determines whether to use the minimum error simplification or the faster approximate minimum error simplification, defaults to false
  • returns: fred.Clustering_Result with mebers
    • value: objective value
    • time: running-time
    • assignment: empty if compute_assignment has not been called

discrete (k,l)-median clustering (continuous Fréchet)

  • Algorithm from section 4.3 in Geometric Approximation Algorithms + simplification
  • signature: fred.discrete_klmedian(int k, int l, fred.Curves, bool consecutive_call, bool fast_simplification, int distance_func) with parameters
    • k: number of centers
    • l: maximum complexity of the centers
    • fast_simplification: determines whether to use the minimum error simplification or the faster approximate minimum error simplification, defaults to false
  • returns: fred.Clustering_Result with mebers
    • value: objective value
    • time: running-time
    • assignment: empty if compute_assignment has not been called

Clustering Result

  • signature: fred.Clustering_Result
  • methods:
    • len(fred.Clustering_Result): number of centers
    • fred.Clustering_Result[i]: get ith center
    • fred.Clustering_Result.compute_assignment(fred.Curves, bool consecutive_call): assigns every curve to its nearest center with parameter consecutive_call, which defaults to false; set to true, if you want to assign the curves used for clustering
    • fred.Clustering_Result.optimize(fred.Curves, bool consecutive_call): (heuristically) optimizes cluster centers using a stabbing algorithm
  • members:
    • value: objective value
    • time: running-time
    • assignment: empty if compute_assignment was not called

Cluster Assignment

  • signature: fred.Cluster_Assignment
  • methods:
    • len(fred.Cluster_Assignment): number of centers
    • fred.Cluster_Assignment.count(i): number of curves assigned to center i
    • fred.Cluster_Assignment.get(i,j): get index of jth curve assigned to center i
    • fred.Cluster_Assignment.distance(i,j): get distance of jth curve assigned to center i to center i

Dimension Reduction via Gaussian Random Projection

Installation

Requirements

You have to have installed:

  • C++ compiler
  • CMake
  • git
  • optional: openmp available (should be a part of your compiler)

That's it!

Installing on Windows

It's best to use the Ubuntu subsystem, which is easy to install through powershell. Once you have this installed, follow this procedure:

sudo apt update 
sudo apt upgrade
sudo apt install python3-pip cmake git gcc
pip3 install --user fred-frechet

Installing on Mac

Apple's clang does not really support openmp, which is now kind of an integral part of Fred. You can try to install libomp via homebrew, but the best way would be to get a virtual ubuntu running. Fred does install out of the box on Mac though, but without omp support.

Linux/Unix Installation Procedure

  • Variant 1: simply run pip install Fred-Frechet
  • Variant 2: clone repository and run make for installation into userdir

Test

Just run python py/test.py.

Mini Example

import Fred as fred
import numpy as np
import pandas as pd

curve1d = fred.Curve([1., 2.]) # Curve stores a polygonal curve with 
                                         # at least two points of at least one 
                                         # and equal number of dimensions

curve2d1 = fred.Curve([[1., 0.], [2., 1.], [3., 0.]]) # any number of dimensions and points works
curve2d2 = fred.Curve([[1., -1.], [2., -2.], [3., -1.]], "optional name, e.g. displayed in plot") 

print(curve2d1)

fred.plot_curve(curve2d1, curve2d2)
fred.plot_curve(curve2d2, fred.frechet_minimum_error_simplification(curve2d2, 2))

print("distance is {}".format(fred.continuous_frechet(curve2d1, curve2d2).value))

print("download HUGE curves") 

import requests, zipfile, io             # you can use all libraries 
                                         # that work with numpy to read data into fred

re = requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/00447/data.zip", stream=True)
zf = zipfile.ZipFile(io.BytesIO(re.content))

ps1 = fred.Curve(pd.read_csv(zf.open('PS1.txt'), delimiter="\t", header=None).values[:50], "PS1")
ps2 = fred.Curve(pd.read_csv(zf.open('PS2.txt'), delimiter="\t", header=None).values[:50], "PS2")
ps3 = fred.Curve(pd.read_csv(zf.open('PS3.txt'), delimiter="\t", header=None).values[:50], "PS3")
ps4 = fred.Curve(pd.read_csv(zf.open('PS4.txt'), delimiter="\t", header=None).values[:50], "PS4")
ps5 = fred.Curve(pd.read_csv(zf.open('PS5.txt'), delimiter="\t", header=None).values[:50], "PS5")
ps6 = fred.Curve(pd.read_csv(zf.open('PS6.txt'), delimiter="\t", header=None).values[:50], "PS6")

curves = fred.Curves() # for clustering or if you want to apply dimension reduction
                       # you need to encapsulate your curves in a Curves object
              
curves.add(ps1)
curves.add(ps2)
curves.add(ps3)
curves.add(ps4)
curves.add(ps5)
curves.add(ps6)

fred.plot_curve(curves)

curves = fred.dimension_reduction(curves, 0.95) # fred is pretty fast but with high dimensional data
                                                # a dimension reduction massively improves running-time
                                                # even for smaller values of epsilon
                                                
fred.plot_curve(curves)

# Oneshot clustering - if you already know the value of k

clustering = fred.discrete_klcenter(2, 10, curves) # fast but coarse

clustering = fred.discrete_klmedian(2, 10, curves) # slow but better results

print("clustering cost is {}".format(clustering.value))

for i, center in enumerate(clustering):
    print(f"center {i} is {center}")

fred.plot_curve(clustering)

# Multiple clustering calls - if you need to find a suitable value for k

for k in range(2, 6):
    
    clustering = fred.discrete_klcenter(k, 10, curves, consecutive_call=True)
    print(f"clustering cost is {clustering.value}")
    
    clustering = fred.discrete_klmedian(k, 10, curves, consecutive_call=True)
    print(f"clustering cost is {clustering.value}")

clustering.compute_assignment(curves, consecutive_call=True) # use consecutive_call = False when computing assignment for curves other
                                                             # than those used for computing the clustering

for i in range(len(clustering)):
    for j in range(clustering.assignment.count(i)):
        print(f"{curves[clustering.assignment.get(i,j)].name} was assigned to center {clustering[i].name}")

clustering.optimize_centers(curves, consecutive_call = True) # uses stabbing to get better centers

fred.plot_clustering(clustering, curves)

Visual Example

distance_func = 0 # try 2 for DTW

import Fred as fred

curves = fred.Curves()
curves.add(fred.Curve([[-10.93943942, -37.06242838],[-10.93944232, -37.06256992],[-10.93944232, -37.06256992],[-10.93944232, -37.06256992],[-10.93899402, -37.06280406],[-10.93860314, -37.06296434],[-10.93835807, -37.06278978],[-10.93829299, -37.06237239],[-10.93854427, -37.06176238],[-10.93889765, -37.06105771],[-10.9392863 , -37.06034013],[-10.93958482, -37.0597757 ],[-10.93981259, -37.05935465],[-10.94003912, -37.0589944 ],[-10.94029375, -37.05890855],[-10.94044568, -37.05899324],[-10.94071019, -37.05920933],[-10.94091498, -37.05938079],[-10.94092187, -37.05938262],[-10.94111489, -37.05955812],[-10.94143233, -37.05984643],[-10.94189287, -37.06029526],[-10.94234691, -37.06074624],[-10.9425804 , -37.06096802],[-10.94263953, -37.06101946],[-10.94263953, -37.06101946],[-10.94263953, -37.06101946],[-10.94266097, -37.0610417 ],[-10.94286097, -37.0612328 ],[-10.94315305, -37.06155233],[-10.94350765, -37.06192153],[-10.94410138, -37.0621534 ],[-10.94468481, -37.06230469],[-10.94527759, -37.06251654],[-10.94594673, -37.06281271],[-10.94671859, -37.06308869],[-10.94730157, -37.06335312],[-10.94778644, -37.06381369],[-10.94844587, -37.06445785],[-10.94904832, -37.06500803],[-10.94979403, -37.06571115],[-10.9504376 , -37.06632879],[-10.95101859, -37.06687168],[-10.95164801, -37.06724984],[-10.95194617, -37.06738994],[-10.95245685, -37.06766077],[-10.95311692, -37.06801289],[-10.95373597, -37.06834647],[-10.95425643, -37.06861906],[-10.95456037, -37.06876904],[-10.9551056 , -37.06906634],[-10.95571837, -37.06937444],[-10.95645998, -37.06951516],[-10.95722549, -37.06945968],[-10.9582645 , -37.06937347],[-10.959267  , -37.06930038],[-10.96033403, -37.06924749],[-10.9616459 , -37.069179  ],[-10.96270941, -37.06910438],[-10.96397319, -37.06903497],[-10.96497853, -37.06897429],[-10.9656844 , -37.06850669],[-10.96592044, -37.06768898],[-10.96610809, -37.06688631],[-10.96630181, -37.06600845],[-10.96646721, -37.06534033],[-10.96698213, -37.06499147],[-10.96764198, -37.06528425],[-10.96814327, -37.06557494],[-10.9687186 , -37.06599832],[-10.96927369, -37.0664603 ],[-10.96945204, -37.06660118],[-10.96959754, -37.06671691],[-10.96985275, -37.0669367 ],[-10.97016947, -37.0671957 ],[-10.97041451, -37.06739896],[-10.97048234, -37.06735379],[-10.97032484, -37.06715337],[-10.97020309, -37.06704726],[-10.97024076, -37.066873  ],[-10.97046391, -37.0665994 ],[-10.97074272, -37.06632107],[-10.97105451, -37.06647656],[-10.97140669, -37.06677885],[-10.97186827, -37.06717748],[-10.97201951, -37.06732553],[-10.97215358, -37.06744312],[-10.9723854 , -37.06761853],[-10.97258704, -37.06777718],[-10.97286671, -37.067999  ],[-10.97318196, -37.06824471],[-10.97329008, -37.06835596],[-10.97331396, -37.0683913 ],[-10.97331396, -37.0683913 ],[-10.97340625, -37.06848176],[-10.97355518, -37.06857843],[-10.97388026, -37.06881238],[-10.97409123, -37.06894244],[-10.97418474, -37.06886148],[-10.97428593, -37.0687222 ],[-10.97436295, -37.06863122],[-10.97442223, -37.06855982],[-10.97447997, -37.06848969],[-10.97454607, -37.06841423],[-10.97467334, -37.06824215],[-10.97479694, -37.06806764],[-10.97487819, -37.06798559],[-10.97494334, -37.06790972],[-10.97494334, -37.06790972],[-10.97496155, -37.06788944],[-10.97496155, -37.06788944],[-10.97496155, -37.06788944],[-10.97496155, -37.06788944]]))
curves.add(fred.Curve([[-10.90889296, -37.05237154],[-10.90904785, -37.05260632],[-10.90973536, -37.05266121],[-10.91010518, -37.05261827],[-10.91000414, -37.05256681],[-10.91048646, -37.05260013],[-10.91135943, -37.05269498],[-10.91229152, -37.05277322],[-10.91342033, -37.05270886],[-10.91431659, -37.05278532],[-10.91506897, -37.05266128],[-10.91544909, -37.05242861],[-10.91593664, -37.05267365],[-10.91667623, -37.05265982],[-10.91748816, -37.05281022],[-10.91801454, -37.05269153],[-10.91787201, -37.05279041],[-10.91785134, -37.05282591],[-10.91792023, -37.052902  ],[-10.91837675, -37.05296211],[-10.91850017, -37.0533763 ],[-10.91849212, -37.05358395],[-10.91849799, -37.05357595],[-10.9184368 , -37.05413224],[-10.91850316, -37.05443409],[-10.91850858, -37.05446375],[-10.9185052 , -37.05446315],[-10.91839066, -37.05517003],[-10.91841456, -37.05612296],[-10.9185024 , -37.05687158],[-10.9181955 , -37.05773603],[-10.91861413, -37.05873733],[-10.91855651, -37.05895014],[-10.91835643, -37.05889317],[-10.91854341, -37.05965251],[-10.91848843, -37.06081932],[-10.91843979, -37.06219825],[-10.91841746, -37.06266105],[-10.91837862, -37.06279615],[-10.91831496, -37.06300964],[-10.9183473 , -37.06299321],[-10.91837899, -37.06298774],[-10.91837791, -37.06298758],[-10.918378  , -37.06298698],[-10.91837822, -37.06298652],[-10.91841453, -37.06321011],[-10.91847828, -37.06394916],[-10.91861829, -37.06397188],[-10.91861297, -37.06398652],[-10.91861178, -37.06398627],[-10.91861355, -37.06398464],[-10.91861037, -37.06398528],[-10.91861033, -37.06398535],[-10.91861101, -37.06398616],[-10.91861229, -37.06398567],[-10.91861452, -37.06398519],[-10.91852107, -37.06433786],[-10.91854463, -37.06535452],[-10.91862404, -37.06630359],[-10.91879595, -37.06735107],[-10.91863274, -37.06821768],[-10.91867692, -37.06827064],[-10.91864985, -37.06846654],[-10.91861474, -37.06844821],[-10.91861158, -37.06845233],[-10.91861335, -37.06844958],[-10.91863853, -37.06863415],[-10.91866434, -37.06936379],[-10.91857642, -37.06997517],[-10.91855849, -37.07080263],[-10.91839213, -37.07178178],[-10.91835998, -37.07254682],[-10.91817896, -37.07257863],[-10.91805938, -37.07302097],[-10.91824085, -37.07327488],[-10.91825536, -37.07351718],[-10.9182579 , -37.07365122],[-10.91824237, -37.07365167],[-10.91824648, -37.07364797],[-10.9182473 , -37.07364716],[-10.91824587, -37.07364717],[-10.9182463 , -37.07364646],[-10.91824667, -37.07364626],[-10.91824711, -37.07364598],[-10.91829794, -37.07417675],[-10.91832389, -37.07471123],[-10.91834816, -37.07553953],[-10.91838258, -37.07596903],[-10.91830836, -37.076009  ],[-10.91831556, -37.07602092],[-10.91835704, -37.07625065],[-10.91840705, -37.07718778],[-10.91836099, -37.07770235],[-10.91843053, -37.07811014],[-10.91859487, -37.07820678],[-10.91832849, -37.07835976],[-10.91841839, -37.07871173],[-10.91858561, -37.07912875],[-10.91853693, -37.08029196],[-10.91847022, -37.08132272],[-10.91864122, -37.08232053],[-10.91878421, -37.08390708],[-10.91902161, -37.08549171],[-10.91957646, -37.08678353],[-10.91990603, -37.08762652],[-10.92011327, -37.08811781],[-10.92026249, -37.08852046],[-10.92027593, -37.08896067],[-10.92028812, -37.08971096],[-10.92086275, -37.09022295],[-10.92119806, -37.09125864],[-10.92158668, -37.09245244],[-10.9219998 , -37.09341251],[-10.92227813, -37.09389165],[-10.92251669, -37.09491899],[-10.9227356 , -37.09633969],[-10.92225014, -37.09771002],[-10.92184819, -37.09906076],[-10.92144139, -37.09987985],[-10.92144481, -37.09995651],[-10.92130557, -37.10039003],[-10.92115916, -37.1016734 ],[-10.92137987, -37.10308411],[-10.92125626, -37.10455606],[-10.92161594, -37.10520467],[-10.92292039, -37.10514096],[-10.92374638, -37.10509667],[-10.92379817, -37.10503713],[-10.92420939, -37.10468552]]))
curves.add(fred.Curve([[-10.90885896, -37.05223934],[-10.90887542, -37.05237009],[-10.90888581, -37.05236599],[-10.9088792 , -37.05236119],[-10.908879  , -37.0523613 ],[-10.90887897, -37.05236133],[-10.90887908, -37.05236184],[-10.90887925, -37.05236193],[-10.90887916, -37.05236232],[-10.90888031, -37.05238124],[-10.90901423, -37.05261118],[-10.90985044, -37.05264657],[-10.9107936 , -37.05260385],[-10.91128793, -37.05269182],[-10.91130952, -37.05271182],[-10.91131643, -37.05271003],[-10.91131638, -37.05270946],[-10.91131644, -37.05270941],[-10.91143227, -37.05271666],[-10.91178154, -37.05270394],[-10.91236017, -37.05273264],[-10.91239461, -37.05271299],[-10.91241785, -37.05269624],[-10.91241363, -37.0526976 ],[-10.91241079, -37.05269947],[-10.91287302, -37.05276819],[-10.91292392, -37.05277988],[-10.91327659, -37.05276669],[-10.91415117, -37.05275218],[-10.91495188, -37.05273417],[-10.91496974, -37.05215649],[-10.9149677 , -37.05136187],[-10.91497338, -37.05058764],[-10.91496995, -37.05030794],[-10.91496585, -37.04945366],[-10.91495289, -37.04837212],[-10.91492666, -37.04793266],[-10.9154707 , -37.04776878],[-10.9164315 , -37.04755438],[-10.91754675, -37.04716886],[-10.91769218, -37.04709118],[-10.91810692, -37.0469302 ],[-10.91830395, -37.0470265 ],[-10.91830991, -37.0471823 ],[-10.9183291 , -37.04789938],[-10.91833195, -37.04793058],[-10.91834546, -37.04793107],[-10.91834466, -37.04793116],[-10.91835246, -37.04833939],[-10.91834766, -37.04877383],[-10.91838263, -37.04880066],[-10.91838436, -37.04879317],[-10.91838504, -37.04883652],[-10.9183568