This file provides documentation for modules in Biopython that have been moved
or deprecated in favor of other modules. It offers quick, easy-to-find notes on
how to update your code so that it works again.

Bio.Graphics.GenomeDiagram and colour/color
===========================================
GenomeDiagram originally used colour (UK spelling of color) for argument names.
For its integration into Biopython 1.50, this will support both colour and color,
to help people port existing scripts written for the standalone version of
GenomeDiagram. However, we do intend to deprecate and then eventually remove
support for colour in later releases of Biopython.

Bio.AlignAce and Bio.MEME
=========================
As of Biopython 1.50, these modules are considered to be obsolete with the
introduction of Bio.Motif, and will be deprecated in a future release.

Numeric support
===============
Following the release of 1.48, Numeric support in Biopython is discontinued.
Please move to NumPy for Biopython 1.49 or later.

Bio.Seq
=======
Direct use of the Seq object (and MutableSeq object) .data property is discouraged.
As of release 1.49, writing to the Seq object's .data property triggers a warning,
and this property is likely to be made read only in the next release.
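
The warn-on-write behaviour described above can be sketched in plain Python.
This is only an illustration of the general pattern: SeqLike is a hypothetical
stand-in class, not Biopython's Seq, and the warning category and message are
assumptions.

```python
import warnings


class SeqLike:
    """Hypothetical stand-in for a sequence object; NOT Biopython's Seq."""

    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        # Reading still works; str(obj) is the route this sketch prefers.
        return self._data

    @data.setter
    def data(self, value):
        # Writing still works for now, but the caller is told it is discouraged.
        warnings.warn("Writing to .data is discouraged",
                      DeprecationWarning, stacklevel=2)
        self._data = value

    def __str__(self):
        return self._data


seq = SeqLike("ACGT")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    seq.data = "TTGA"  # the write succeeds but records a warning
```

Code that only reads the sequence can avoid the property altogether by going
through str(), which this sketch also supports.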

Bio.Transcribe and Bio.Translate
================================
Declared obsolete in Release 1.49.
Please use the methods or functions in Bio.Seq instead.

Bio.mathfns, Bio.stringfns and Bio.listfns (and their C code variants)
======================================================================
Declared obsolete in Release 1.49.
Bio.mathfns and Bio.stringfns were deprecated in Release 1.50
(the deprecation of Bio.listfns is still pending).

Bio.distance (and Bio.cdistance)
================================
Bio.distance was deprecated in Release 1.49, at which point its C code
implementation Bio.cdistance was removed (this was not intended as a public
API).

Bio.Ndb
=======
Deprecated in Release 1.49, as the website this parsed has been redesigned.

Martel
======
Declared obsolete in Release 1.48, deprecated in Release 1.49.

Bio.Mindy
=========
Declared obsolete in Release 1.48, deprecated in Release 1.49.

Bio.builders, Bio.Std, Bio.StdHandler, Bio.Decode and Bio.DBXRef
================================================================
Part of the Martel/Mindy parsing infrastructure, these were deprecated in
Release 1.49.

Bio.Writer and Bio.writers
==========================
Deprecated in Release 1.48.

Bio.Emboss.Primer
=================
Deprecated in Release 1.48, this parser was replaced by Bio.Emboss.Primer3 and
Bio.Emboss.PrimerSearch.

Bio.MetaTool
============
Deprecated in Release 1.48, this was a parser for the output of MetaTool 3.5
which is now obsolete.

Bio.GenBank
===========
The online functionality (search_for, download_many, and NCBIDictionary) was
declared obsolete in Release 1.48, and deprecated in Release 1.50.
Please use Bio.Entrez instead.

Bio.PubMed
==========
Declared obsolete in Release 1.48, deprecated in Release 1.49.
Please use Bio.Entrez instead.

Bio.EUtils
==========
Deprecated in favor of Bio.Entrez in Release 1.48.

Bio.Sequencing
==============
A revised API was added and the old one deprecated in Biopython 1.48:
Bio.Sequencing.Ace.RecordParser --> Bio.Sequencing.Ace.read(handle)
Bio.Sequencing.Ace.Iterator --> Bio.Sequencing.Ace.parse(handle)
Bio.Sequencing.Phd.RecordParser --> Bio.Sequencing.Phd.read(handle)
Bio.Sequencing.Phd.Iterator --> Bio.Sequencing.Phd.parse(handle)


Bio.Blast.NCBIWWW
=================
The HTML BLAST parser was deprecated as of Release 1.48.
The deprecated functions blast and blasturl were removed in Release 1.44.

Bio.Saf
=======
Deprecated as of Release 1.48, as it appears to have no users, and relies
on Martel which doesn't work properly with mxTextTools 3.0.

Bio.NBRF
========
Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO.

Bio.IntelliGenetics
===================
Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO.

Bio.SeqIO submodules PhylipIO, ClustalIO, NexusIO and StockholmIO
=================================================================
You can still use the "phylip", "clustal", "nexus" and "stockholm" formats
in Bio.SeqIO, however these are now supported via Bio.AlignIO, with the
old code deprecated in Releases 1.46 or 1.47, and removed in Release 1.49.

Bio.ECell
=========
Deprecated as of Release 1.47, as it appears to have no users, and the code
does not seem relevant for ECell 3. Removed in Release 1.49.

Bio.Ais
=======
Deprecated as of Release 1.45, removed in Release 1.49.

Bio.LocusLink
=============
Deprecated as of Release 1.45, removed in Release 1.49.
The NCBI's LocusLink was superseded by Entrez Gene.

Bio.SGMLExtractor
=================
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.Rebase
==========
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.Gobase
==========
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.CDD
=======
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.biblio
==========
Deprecated as of Release 1.45, removed in Release 1.48.

Bio.WWW
=======
The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
The remaining stub Bio.WWW was deprecated in Release 1.48.

The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.

Bio.SeqIO
=========
The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favour of
the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47.

Bio.Medline.NLMMedlineXML
=========================
Deprecated in Release 1.44, removed in 1.46.

Bio.MultiProc
=============
Deprecated in Release 1.44, removed in 1.46.

Bio.MarkupEditor
================
Deprecated in Release 1.44, removed in 1.46.

Bio.lcc
=======
Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46.

Bio.crc
=======
Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46.

Bio.FormatIO
============
This was removed in Release 1.44 (a deprecation was not possible).

Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
======================================================================
These were deprecated in Release 1.44, and removed in Release 1.49.

Bio.Kabat
=========
This was deprecated in Release 1.43 and removed in Release 1.44.

Bio.SeqUtils
============
The functions 'complement' and 'antiparallel' in Bio.SeqUtils were
deprecated as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

Bio.GFF
=======
The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy were
deprecated as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

Bio.sequtils
============
Deprecated as of Release 1.30, removed in Release 1.42.
Use Bio.SeqUtils instead.

Bio.SVM
=======
Deprecated as of Release 1.30, removed in Release 1.42.
The Support Vector Machine code in Biopython has been superseded by a
more robust (and maintained) SVM library, which includes a Python
interface. We recommend using LIBSVM:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Bio.RecordFile
==============
Deprecated as of Release 1.30, removed in Release 1.42.
RecordFile wasn't completely implemented and duplicates the work
of most standard parsers.

Bio.kMeans and Bio.xkMeans
==========================
Deprecated as of Release 1.30, removed in Release 1.42.

The k-means algorithm is an algorithm for unsupervised clustering of data.
Biopython includes an implementation of the k-means clustering algorithm
in kMeans.py. Recently, a larger set of clustering algorithms entered
Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
the k-means clustering algorithm, the kMeans.py module has been deprecated.
Below you will find a description of how to switch from kMeans.py to
Bio.Cluster's kcluster.

The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
The corresponding function in kMeans.py is called cluster. This function takes
the following arguments:

o data
o k
o distance_fn
o init_centroids_fn
o calc_centroid_fn
o max_iterations
o update_fn

The function kcluster in Bio.Cluster takes the following arguments:

o data
o nclusters
o mask
o weight
o transpose
o npass
o method
o dist
o initialid


Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
-----------------------------------------------------------------------


o data:

In kMeans.py, data is a list of vectors, each containing the same number of
data points. Within the context of clustering genes based on their gene
expression values, each vector would correspond to the gene expression data of
one particular gene, and the values in the vector would correspond to the
gene expression values measured by the different microarrays. The cluster
routine in kMeans.py always performs a row-wise clustering by grouping vectors.

The argument data to Bio.Cluster's kcluster has the same structure as in
kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
the transpose argument. If transpose==0 (the default value), kcluster performs
row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
performs column-wise clustering. The same behavior can be obtained, of course,
by transposing the data array before calling kcluster.
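
As a small illustration of the last point, a list-of-vectors data set can be
transposed in plain Python before clustering. The values are made up for the
example; nothing here calls Bio.Cluster.

```python
# Two genes (rows) measured on three microarrays (columns); made-up values.
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# zip(*data) walks the columns, so each column becomes a row of the result.
transposed = [list(column) for column in zip(*data)]
```

Row-wise clustering of transposed then groups microarrays rather than genes.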


o k:

The desired number of clusters is specified by the input argument k in
kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.

o distance_fn:

In kMeans.py, the argument distance_fn represents the distance function to
calculate the distances between items and cluster centroids. This argument
corresponds to a true Python function. The default value is the Euclidean
distance, implemented as distance.euclidean in distance.py. User-defined
distance functions can also be used.

The k-means routine in Bio.Cluster does not allow user-specified distance
functions. Instead, it provides the following nine built-in distance functions,
depending on the argument dist:

dist=='e': Euclidean distance
dist=='h': Harmonically summed Euclidean distance
dist=='b': City-block distance
dist=='c': Pearson correlation
dist=='a': absolute value of the Pearson correlation
dist=='u': uncentered correlation
dist=='x': absolute uncentered correlation
dist=='s': Spearman's rank correlation
dist=='k': Kendall's tau

User-defined distance functions are possible only by modifying the C code in
cluster.c (which may not be as hard as it sounds). The default distance function
is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
Euclidean distance is defined as the sum of squared differences, whereas in
kMeans.py the square root of this quantity is taken. This does not affect the
clustering result.
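
To see why the square root makes no difference when each item is assigned to
its nearest centroid, consider the following sketch. It is plain Python with
made-up data; the function names are hypothetical and belong to neither
module.

```python
import math


def squared_euclidean(u, v):
    # Sum of squared differences (the Bio.Cluster convention described above).
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    # Square root of the same quantity (the kMeans.py convention).
    return math.sqrt(squared_euclidean(u, v))


def nearest(item, centroids, distance):
    # Index of the centroid closest to item under the given distance.
    return min(range(len(centroids)), key=lambda i: distance(item, centroids[i]))


centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
items = [[1.0, 1.0], [6.0, 4.0], [9.0, 1.0], [4.0, 4.5]]

by_squared = [nearest(x, centroids, squared_euclidean) for x in items]
by_root = [nearest(x, centroids, euclidean) for x in items]
```

Because the square root is monotonically increasing, the ordering of the
distances, and hence the nearest centroid, is the same in both cases.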

o init_centroids_fn:

This function specifies the initial choice for the cluster centroids. By
default, cluster in kMeans.py uses a random initial choice of cluster centroids
by randomly choosing k data vectors from the input vectors in the data input
argument. Alternatively, the user can specify a user-defined function to choose
the initial cluster centroids.

In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
assignment instead of an initial choice of cluster centroids. As far as I know,
these two initialization methods are equivalent in practice. Similar to the
cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
assignment of items to clusters. Alternatively, users can specify a
(deterministic) initial clustering via the initialid argument. This argument is
None by default. If not None, it should be a 1D array (or list) containing the
number (between 0 and nclusters-1) of the cluster to which each item is
assigned initially.

Note that the k-means routine in Bio.Cluster performs automatic repeats of the
algorithm, each time starting from a different random initial clustering. See
the comment for the npass argument below.
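
If existing code chooses initial centroids, one possible way to obtain an
initial assignment in the form described for initialid is to assign every
item to its nearest initial centroid. The sketch below is plain Python with
made-up data; centroids_to_initialid is a hypothetical helper, not part of
Bio.Cluster.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroids_to_initialid(data, centroids):
    # For each item, the index (0 .. nclusters-1) of its nearest centroid.
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(row, centroids[j]))
        for row in data
    ]


data = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.5, 8.5]]
initial_centroids = [data[0], data[2]]  # e.g. picked by an old init function
initialid = centroids_to_initialid(data, initial_centroids)
```

The resulting list has one entry per item, which is the shape described above
for the initialid argument.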

o calc_centroid_fn:

This argument specifies how to calculate the cluster centroids, given the data
vectors of the items that belong to each cluster. By default, the mean over the
vectors is calculated. A user-defined function can also be used.

Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
method to calculate the cluster centroid is determined by the argument method,
which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
calculate the mean ('a').
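
The two choices can be illustrated with a few lines of plain Python. The
centroid function below is a hypothetical sketch that borrows the 'a' and 'm'
codes from the text; it is not the Bio.Cluster implementation.

```python
from statistics import mean, median


def centroid(vectors, method="a"):
    # 'a' = arithmetic mean, 'm' = median, computed column by column.
    if method == "a":
        combine = mean
    elif method == "m":
        combine = median
    else:
        raise ValueError("method must be 'a' or 'm'")
    return [combine(column) for column in zip(*vectors)]


members = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]
mean_centroid = centroid(members, "a")
median_centroid = centroid(members, "m")
```

The median is less sensitive to the outlying value 9.0 in the first column,
which is the usual reason for preferring k-medians.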

o max_iterations:

The cluster routine in kMeans.py has an argument max_iterations, which is used
to stop the iteration if the routine does not converge after the given number of
iterations.

The kcluster routine in Bio.Cluster does not have such an argument. The failure
of a k-means algorithm to converge is due to the occurrence of periodic
clustering solutions during the course of the k-means algorithm. The kcluster
routine in Bio.Cluster automatically checks for the occurrence of such a
periodicity in the solutions. If a periodic behavior is detected, the algorithm
is interrupted and the last clustering solution is returned. Accordingly, the
kcluster routine is guaranteed to return a clustering solution. Also see the
discussion of the npass argument below.
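
One simple way to detect a periodic solution is to remember every assignment
seen so far and stop as soon as one recurs. The sketch below takes that
approach in plain Python. It is an assumption about how such a check could
look, not a description of the code in cluster.c, and the function names are
hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_centroid(rows):
    return [sum(column) / len(column) for column in zip(*rows)]


def kmeans_until_repeat(data, nclusters, initialid):
    """Iterate k-means until an assignment recurs, then return it."""
    clusterid = list(initialid)
    seen = {tuple(clusterid)}
    while True:
        centroids = []
        for k in range(nclusters):
            members = [row for row, c in zip(data, clusterid) if c == k]
            if not members:
                # This sketch simply rejects empty clusters.
                raise ValueError("cluster %d is empty" % k)
            centroids.append(mean_centroid(members))
        new_ids = [
            min(range(nclusters), key=lambda k: squared_distance(row, centroids[k]))
            for row in data
        ]
        # A recurring assignment means convergence (period 1) or a longer
        # cycle; either way we stop and return the last solution.
        if tuple(new_ids) in seen:
            return new_ids
        seen.add(tuple(new_ids))
        clusterid = new_ids


data = [[0.0], [1.0], [10.0], [11.0]]
result = kmeans_until_repeat(data, 2, [0, 1, 0, 1])
```

Since there are only finitely many possible assignments, a loop of this kind
always terminates without needing a max_iterations argument.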

o update_fn:

The argument update_fn to cluster in kMeans.py is a hook function that is
called at the beginning of every iteration and passed the iteration number,
cluster centroids, and current cluster assignments. It is used by xkMeans.py,
which provides a visualization of k-means clustering. Currently there is no
equivalent in Bio.Cluster.


Other arguments for Bio.Cluster's kcluster
------------------------------------------

Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
kMeans.py's cluster.

o mask:

Microarray experiments tend to suffer from a large amount of missing data. The
argument mask to Bio.Cluster's kcluster lets the user specify which data are
missing. This argument is an array with the same shape as data, and contains
a 1 for each data point that is present, and a 0 for a missing data point:

mask[i,j]==1: data[i,j] is valid
mask[i,j]==0: data[i,j] is a missing data point

Missing data points are ignored by the clustering algorithm. By default, mask
is an array containing 1's everywhere.
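
The effect of a mask on a distance calculation can be sketched as follows, in
plain Python with made-up values. The function is hypothetical; in
particular, whether and how the real implementation normalizes for the number
of valid points is not covered here.

```python
def masked_squared_distance(u, v, mask_u, mask_v):
    # Only positions that are present (mask 1) in BOTH vectors contribute.
    total = 0.0
    for a, b, present_a, present_b in zip(u, v, mask_u, mask_v):
        if present_a and present_b:
            total += (a - b) ** 2
    return total


gene_a = [1.0, None, 3.0]  # second measurement is missing
mask_a = [1, 0, 1]
gene_b = [2.0, 5.0, 5.0]
mask_b = [1, 1, 1]

distance = masked_squared_distance(gene_a, gene_b, mask_a, mask_b)
```

The placeholder stored at a masked position is never read, so its value does
not matter.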

o weight:

The weight argument is used to put different weights on different data points.
For example, when clustering genes based on their gene expression profile, we
may want to attach a bigger weight to some microarrays compared to others. By
default, the weight argument contains equal weights of 1.0 for all data points.
Note that for row-wise clustering, the weight argument is a 1D vector whose
length is equal to the number of columns. For column-wise clustering, the length
of this argument is equal to the number of rows.
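
A minimal sketch of a weighted distance for row-wise clustering, assuming
that each weight simply scales the corresponding squared difference. The
function is hypothetical, and any normalization applied by the real
implementation is left out.

```python
def weighted_squared_distance(u, v, weight):
    # Row-wise clustering: one weight per column (microarray).
    if not (len(u) == len(v) == len(weight)):
        raise ValueError("u, v and weight must have the same length")
    return sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weight))


gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 2.0, 5.0]

equal = weighted_squared_distance(gene_a, gene_b, [1.0, 1.0, 1.0])
emphasized = weighted_squared_distance(gene_a, gene_b, [1.0, 1.0, 2.0])
```

Doubling the weight of the third microarray doubles its contribution to the
distance between the two genes.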

o npass:

Typical implementations of the k-means clustering algorithm rely on a random
initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
a clearly defined goal, which is to minimize the within-cluster sum of
distances. Different k-means clustering solutions (based on different initial
clusterings) can therefore be compared to each other directly. In order to
increase the chance of finding the optimal k-means clustering solution, the
k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
each time starting from a different initial random clustering. The best
clustering solution is returned to the user, together with the number of the
npass attempts in which it was found. For more information, see the output
variable nfound below.
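
The repeat-and-compare idea can be sketched in plain Python as follows. This
is a simplified illustration under stated assumptions (squared Euclidean
distance, mean centroids, a bounded number of iterations per run); all
function names are hypothetical and the code is not the Bio.Cluster
implementation.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroids_of(data, ids, nclusters):
    groups = [[row for row, c in zip(data, ids) if c == k] for k in range(nclusters)]
    return [[sum(col) / len(col) for col in zip(*rows)] for rows in groups]


def run_once(data, nclusters, rng):
    # Random initial assignment that leaves no cluster empty.
    ids = [i % nclusters for i in range(len(data))]
    rng.shuffle(ids)
    for _ in range(100):  # safety bound for this sketch
        centroids = centroids_of(data, ids, nclusters)
        new_ids = [
            min(range(nclusters), key=lambda k: squared_distance(row, centroids[k]))
            for row in data
        ]
        # Stop on convergence, or if a cluster would become empty.
        if new_ids == ids or len(set(new_ids)) < nclusters:
            break
        ids = new_ids
    centroids = centroids_of(data, ids, nclusters)
    error = sum(squared_distance(row, centroids[c]) for row, c in zip(data, ids))
    return ids, error


def canonical(ids):
    # Relabel clusters by first appearance so that label swaps compare equal.
    relabel = {}
    return tuple(relabel.setdefault(c, len(relabel)) for c in ids)


def best_of(data, nclusters, npass, seed=0):
    rng = random.Random(seed)
    best_ids, best_error, nfound = None, None, 0
    for _ in range(npass):
        ids, error = run_once(data, nclusters, rng)
        if best_error is None or error < best_error - 1e-12:
            best_ids, best_error, nfound = ids, error, 1
        elif abs(error - best_error) <= 1e-12 and canonical(ids) == canonical(best_ids):
            nfound += 1
    return best_ids, best_error, nfound


data = [[0.0], [1.0], [10.0], [12.0]]
clusterid, error, nfound = best_of(data, nclusters=2, npass=5)
```

For this small, well-separated data set every starting point leads to the
same partition, so nfound equals npass; on harder data it is usually smaller.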


Return values
-------------

The cluster routine in kMeans.py returns two values:

o centroids
o clusters

The kcluster routine in Bio.Cluster returns four values:

o clusterid
o centroids
o error
o nfound


o centroids:

The centroids return value contains the centroids of the k clusters that were
found, and corresponds to the centroids return value from Bio.Cluster's
kcluster routine.

o clusters:

The clusters return value contains the number of the cluster to which each
vector was assigned. The corresponding return value in Bio.Cluster's kcluster
is clusterid.

o error:

The error return value from Bio.Cluster's kcluster is the within-cluster sum of
distances for the optimal clustering solution that was found. This value can be
used to compare different clustering solutions to each other.
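
For reference, a within-cluster sum of distances can be computed by hand as
in the sketch below. It assumes the squared Euclidean distance and uses
made-up data; within_cluster_error is a hypothetical helper.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def within_cluster_error(data, clusterid, centroids):
    # Sum over all items of the distance to the centroid of their own cluster.
    return sum(squared_distance(row, centroids[c]) for row, c in zip(data, clusterid))


data = [[0.0], [1.0], [10.0], [12.0]]

good = within_cluster_error(data, [0, 0, 1, 1], [[0.5], [11.0]])
poor = within_cluster_error(data, [0, 1, 0, 1], [[5.0], [6.5]])
```

The solution with the smaller value groups the items more tightly, which is
what makes this number useful for comparing clustering solutions.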

o nfound:

The nfound return value from Bio.Cluster's kcluster shows in how many of the
npass runs the optimal clustering solution was found. Accordingly, nfound is at
least 1 and at most equal to npass. A large value for nfound is an indication
that the clustering solution that was found is optimal. On the other hand, if
nfound is equal to 1, it is quite possible that a better clustering solution
exists than the one found by kcluster.