forked from Rudd-O/python-audioprocessing
TODO
Things to do:
* assert() everywhere that the file's sample rate is 44100 Hz, because we are not prepared
  to deal with anything else
* improve the audio-onset detection algorithm so that two byte-for-byte identical songs
  with different offsets always report the same onset. Verify this in Audacity by
  overlapping the two songs with the corrected offset: if the algorithm works,
  these songs with different offsets should correlate even more highly (in theory,
  1.0) than the current correlations
* if only one spectrum is analyzed in plot(), use a 2D plot instead of a 3D one
* if using a 3D plot, rotate it so that the detail of the spectrogram is visible
  instead of having the time series obscure subsequent spectra
* Amarok plugin to plot spectra of selected songs
* "replace this song in the playlist with a better-quality version?"
* "show duplicates, add duplicates to playlist"
* move plotting code to a separate module to save RAM by not loading GUI
toolkits and matplotlib
* document and clean up the library
* there is a bug in the wave module -- calling .getnchannels() on mono files returns 2
* write an algorithm that combines the phase beat detector and a time-domain beat detector
  for best results, and write a program that uses it and writes the TBPM tag on MP3 files
* now that we know the FFT code is solid, we can perhaps apply the Bark scale to get the
  appropriate buckets from the FFT, so we can compare songs
* for a song comparison plugin for Amarok, think about the database structure that will
contain the signatures and also cached correlations between them
* make a set of command-line tools that allow users to analyze and cache those results,
  then perform correlations, right from the command line, without the need for Amarok
  or programming expertise
* re-enable support for correlating less than two minutes of audio (disparity in
vector sizes for the input of the correlate() method)
* practice fadvise, or find a way to decode an MP3 through pipes; SIGPIPE or stop the
  decoding as soon as it has produced a certain amount of data; read the data in chunks;
  and correctly determine the sample rate (the mpg321 pipe fails to report the correct
  sample rate on several input songs)
* use streams to process data as fast as possible and provide a streams abstraction
to reduce the amount of RAM required to process and to avoid temporary files
altogether
* investigate the use of float32 to reduce RAM usage during analysis; investigate the
  precision impact on correlation of using float32/float16 instead of float64
* investigate appropriate methods to serialize values: pit binary pickle against
  Google protocol buffers and others. Goals: minimal data size without appreciable
  precision loss, and the fastest correlation computation
* investigate whether a central server provides a good performance advantage, to
  offload correlations and take advantage of a huge database of correlations. Naturally,
  each feature extraction sent to the server must be keyed against a unique ID
  generated by the music player itself. Investigate the architecture of said server
* reduce RAM usage by loading chunk-by-chunk instead of the whole 2 minutes in RAM
* improve performance by vectorizing processing as much as possible
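The sample-rate assertion item above could look like the following sketch, using only the standard-library wave module; `assert_cd_sample_rate` is a hypothetical helper name, not the library's actual API:

```python
import os
import tempfile
import wave

CD_SAMPLE_RATE = 44100  # the only rate the library currently handles

def assert_cd_sample_rate(path):
    """Raise AssertionError unless the WAV file at `path` is 44100 Hz.
    Hypothetical guard to be called before any analysis."""
    w = wave.open(path, "rb")
    try:
        rate = w.getframerate()
    finally:
        w.close()
    assert rate == CD_SAMPLE_RATE, (
        "unsupported sample rate %d in %s (only %d Hz is handled)"
        % (rate, path, CD_SAMPLE_RATE))

# demo: write a tiny silent 44100 Hz file and check that it passes
fd, tmp = tempfile.mkstemp(suffix=".wav")
os.close(fd)
out = wave.open(tmp, "wb")
out.setnchannels(1)
out.setsampwidth(2)
out.setframerate(44100)
out.writeframes(b"\x00\x00" * 100)
out.close()
assert_cd_sample_rate(tmp)  # passes silently
os.remove(tmp)
```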
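The onset-alignment item could start from a plain cross-correlation peak: the lag that maximizes the correlation between two recordings of the same audio is the offset between them. `relative_offset` below is a hypothetical numpy sketch, not the library's current algorithm:

```python
import numpy as np

def relative_offset(a, b):
    """Return the lag (in samples) by which `a` is delayed relative to `b`,
    estimated from the peak of the full cross-correlation."""
    c = np.correlate(a, b, mode="full")
    return int(np.argmax(c)) - (len(b) - 1)

# demo: the same noise burst, one copy delayed by 5 samples
rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
x = np.concatenate([np.zeros(5), y])
print(relative_offset(x, y))  # 5
```

Trimming each signal by its reported offset before correlating is what should, in theory, push the correlation of identical songs to 1.0.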
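One way to read the Bark-scale item: map each rfft bin to a critical band using Zwicker's approximation and sum the magnitudes per band, yielding a compact per-song vector to compare. The function name and band handling below are assumptions, not existing code:

```python
import numpy as np

def bark_band_energies(spectrum, sample_rate=44100):
    """Group rfft magnitude bins into 25 critical bands on the Bark scale
    (Zwicker's approximation z = 13*atan(0.00076 f) + 3.5*atan((f/7500)^2))."""
    n_bins = len(spectrum)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bark = 13.0 * np.arctan(0.00076 * freqs) + 3.5 * np.arctan((freqs / 7500.0) ** 2)
    bands = np.clip(bark.astype(int), 0, 24)
    energies = np.zeros(25)
    np.add.at(energies, bands, spectrum)  # unbuffered per-band accumulation
    return energies

mag = np.abs(np.fft.rfft(np.random.default_rng(1).standard_normal(2048)))
e = bark_band_energies(mag)
```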
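The float32 item can be checked empirically. This sketch compares the Pearson correlation computed in both precisions on a shortened stand-in signal (real inputs would be two minutes at 44100 Hz); the `pearson` helper is written here for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
a64 = rng.standard_normal(52920)            # stand-in for decoded audio
b64 = a64 + 0.1 * rng.standard_normal(a64.size)  # noisy copy of the same song

def pearson(x, y):
    """Plain Pearson correlation coefficient of two 1-D signals."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

r64 = pearson(a64, b64)
r32 = pearson(a64.astype(np.float32), b64.astype(np.float32))
# the difference is tiny, so float32 halves RAM at negligible precision cost
print(abs(r64 - r32))
```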
andufo:
but what good is the DB to you if you don't have the music on the server?
Rudd-O 22:51:56
plenty
with the DB you can offer song-autotagging services to organizations
such as radio stations, etc.
you can recommend, right on the web page, that people buy albums from Amazon through affiliate systems
you can jump to PageRank 2, haha, if it is a good service
in ads and affiliate stuff alone there is a huge amount of money
and since it would work without much user intervention, you can do wonders with no human effort
you could receive 20 submissions of the same song from several users, and the system would make an automatic choice to decide which meta tags are correct, and tell those users "hey, your song is mistagged; fix it?"
you could suggest to people "this song is of poor quali
standalone
* find duplicates in collection
* discern between complete and incomplete songs
* normalize statistics
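The "normalize statistics" idea above might work as follows once duplicates have been grouped (e.g. by correlation > 0.9); the data shapes and the merge rules (sum play counts, keep the best rating) are assumptions for illustration:

```python
def merge_duplicate_stats(stats, duplicate_groups):
    """Return normalized per-file statistics: within each duplicate group,
    play counts are summed and the rating is taken as the maximum."""
    merged = dict(stats)
    for group in duplicate_groups:
        playcount = sum(stats[f]["playcount"] for f in group)
        rating = max(stats[f]["rating"] for f in group)
        for f in group:
            merged[f] = {"playcount": playcount, "rating": rating}
    return merged

stats = {
    "a.mp3": {"playcount": 3, "rating": 4},
    "a-copy.mp3": {"playcount": 7, "rating": 2},
    "b.mp3": {"playcount": 1, "rating": 5},
}
norm = merge_duplicate_stats(stats, [["a.mp3", "a-copy.mp3"]])
print(norm["a.mp3"])  # {'playcount': 10, 'rating': 4}
```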
with central server
* batch-download fingerprints from the central server, keyed on an md5 hash such as Amarok's -- if the fingerprint does not exist, the computer computes it
* the computer also submits correlations to the central server
* tag songs based on fingerprint (if there is an exact match, there you go; if not, use correlation to find out which one, ranked on correlation strength)
* find which other albums "out there in the cloud" contain the song in question
we need to find a way to keep the data on the server clean
* caching of the fingerprints? inside an mp3 tag? inside a local database?
* caching of the correlations? keyed by md5sum? keyed by file path?
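One possible answer to the caching questions above is a local sqlite database keyed by content md5, which survives file moves (unlike a path key) and avoids touching the mp3 tag. The table layout, class name, and pickled fingerprint format are assumptions, not an existing interface:

```python
import hashlib
import os
import pickle
import sqlite3
import tempfile

def file_md5(path):
    """md5 of the file's bytes, read in chunks to bound RAM usage."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

class FingerprintCache:
    """Local fingerprint cache keyed by content md5 (hypothetical design)."""
    def __init__(self, dbpath=":memory:"):
        self.db = sqlite3.connect(dbpath)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS fingerprints (md5 TEXT PRIMARY KEY, data BLOB)")

    def get_or_compute(self, path, compute):
        """Return the cached fingerprint for `path`, computing it only on a miss."""
        key = file_md5(path)
        row = self.db.execute(
            "SELECT data FROM fingerprints WHERE md5 = ?", (key,)).fetchone()
        if row is not None:
            return pickle.loads(row[0])
        fp = compute(path)
        self.db.execute("INSERT INTO fingerprints VALUES (?, ?)",
                        (key, pickle.dumps(fp)))
        self.db.commit()
        return fp

# demo with a stand-in fingerprint function that counts its calls
calls = []
def fake_fingerprint(path):
    calls.append(path)
    return [1.0, 2.0, 3.0]

fd, tmp = tempfile.mkstemp()
os.close(fd)
with open(tmp, "wb") as f:
    f.write(b"pretend this is audio")
cache = FingerprintCache()
fp1 = cache.get_or_compute(tmp, fake_fingerprint)
fp2 = cache.get_or_compute(tmp, fake_fingerprint)  # served from the cache
os.remove(tmp)
```

A correlation cache could reuse the same scheme with a two-md5 primary key.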
allies
* matplotlib experts to beautify displays
* we need to write a function that will correlate a bunch of songs and plot the correlations workbench.py-style
* someone with audio experience to improve the audio onset algorithm
* someone with vectorized computing experience to improve the correlation speed
Several interesting possibilities follow from that (stage 1):
- Amarok can auto-identify duplicates and normalize ratings, play counts and scores for the duplicates. This is useful for those of us who prefer full albums. Also, no duplicates and normalized statistics mean that the true favorites really bubble up this time.
- Amarok can avoid putting duplicates in your portable devices, where space is a concern.
- You can now build a UI to weed out duplicates easily. This is useful for those who prefer single tracks and no duplicates.
- Amarok could tell you "this track is already in your collection under a different album" upon UI actions, such as "copy to collection".
Now, how is this better than using MusicBrainz and the like? Simple: you don't need a central server or unique IDs for tracks (MusicDNS, MusicBrainz's partner, works that way, and it has a series of technical problems that make it less reliable than Butterscotch). All you need is computation power, which your computer has aplenty. MusicBrainz also has problems with false positives and the like. And you can decode any format, not just the formats that a closed-source library supports.
But if you throw a central server into the mix (stage 2), this gets much better:
- Amarok can auto-submit tags+fingerprints+amarokuniqueid to the server.
- Amarok can avoid computing the fingerprint by asking the server for the fingerprint corresponding to a unique ID.
- Amarok can auto-tag songs based on exact matches of the fingerprints, AND it can also show the user alternatives in case of non-exact matches (correlations > 0.9 under the current definition of the algorithm), which should be very few. Imagine being able to complete incomplete tags from a database vetted by majority "vote" (submission, see below).
- We can build the definitive music encyclopedia all by ourselves. We can use that data to provide very accurate tagging since our service would know, mathematically, which tags correspond to which fingerprints. We can also make it very user-participative, allowing user submissions to grow the database and letting the user say "no, this info is wrong" so we get a self-correcting encyclopedia with minimal user or admin intervention.
- We can make the service *completely anonymous*.
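The exact-match-else-suggest behavior described for stage 2 could be sketched as a small client-side policy; `pick_tags`, the thresholds as parameters, and the candidate format (correlation, tags) are all hypothetical:

```python
def pick_tags(candidates, exact_threshold=1.0, suggest_threshold=0.9):
    """Given (correlation, tags) pairs from a hypothetical server, auto-apply
    an exact match; otherwise return ranked suggestions above the cutoff."""
    exact = [tags for corr, tags in candidates if corr >= exact_threshold]
    if exact:
        return exact[0], []
    suggestions = sorted(
        [p for p in candidates if p[0] >= suggest_threshold],
        key=lambda p: p[0], reverse=True)
    return None, [tags for _, tags in suggestions]

# demo: no exact match, so the user is shown ranked alternatives
auto, alts = pick_tags([(0.95, {"artist": "A"}),
                        (0.92, {"artist": "B"}),
                        (0.50, {"artist": "C"})])
# auto is None; alts holds A then B, and C falls below the 0.9 cutoff
```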