This file documents modules in Biopython that have been moved or deprecated in
favor of other modules. It is meant as a quick, easy-to-find reference on how to
update your code so that it works again.

Numeric support
===============
Following the release of 1.48, Numeric support in Biopython is discontinued.
Please move to NumPy for Biopython 1.49 or later.

Bio.Seq
=======
Direct use of the Seq object (and MutableSeq object) .data property is discouraged.
As of release 1.49, writing to the Seq object's .data property triggers a warning,
and this property is likely to be made read only in the next release.

Bio.Transcribe and Bio.Translate
================================
Declared obsolete in Release 1.49.
Please use the methods or functions in Bio.Seq instead.

Bio.mathfns, Bio.stringfns and Bio.listfns (and their C code variants)
======================================================================
Declared obsolete in Release 1.49.
Bio.mathfns and Bio.stringfns were deprecated in Release 1.50
(the deprecation of Bio.listfns is still pending).

Bio.Ndb
=======
Deprecated in Release 1.49, as the website this parsed has been redesigned.

Martel
======
Declared obsolete in Release 1.48, deprecated in Release 1.49.

Bio.Mindy
=========
Declared obsolete in Release 1.48, deprecated in Release 1.49.

Bio.builders, Bio.Std, Bio.StdHandler, Bio.Decode and Bio.DBXRef
================================================================
Part of the Martel/Mindy parsing infrastructure, these were deprecated in
Release 1.49.

Bio.Writer and Bio.writers
==========================
Deprecated in Release 1.48.

Bio.Emboss.Primer
=================
Deprecated in Release 1.48, this parser was replaced by Bio.Emboss.Primer3 and
Bio.Emboss.PrimerSearch.

Bio.MetaTool
============
Deprecated in Release 1.48, this was a parser for the output of MetaTool 3.5,
which is now obsolete.

Bio.GenBank
===========
The online functionality (search_for, download_many, and NCBIDictionary) was
declared obsolete in Release 1.48, with the intention of an official deprecation
in the following release. Please use Bio.Entrez instead.

Bio.PubMed
==========
Declared obsolete in Release 1.48, deprecated in Release 1.49.
Please use Bio.Entrez instead.

Bio.EUtils
==========
Deprecated in favor of Bio.Entrez in Release 1.48.

Bio.Sequencing
==============
A revised API was added and the old one deprecated in Biopython 1.48:
Bio.Sequencing.Ace.RecordParser --> Bio.Sequencing.Ace.read(handle)
Bio.Sequencing.Ace.Iterator --> Bio.Sequencing.Ace.parse(handle)
Bio.Sequencing.Phd.RecordParser --> Bio.Sequencing.Phd.read(handle)
Bio.Sequencing.Phd.Iterator --> Bio.Sequencing.Phd.parse(handle)

Bio.Blast.NCBIWWW
=================
The HTML BLAST parser was deprecated as of Release 1.48.
The deprecated functions blast and blasturl were removed in Release 1.44.

Bio.Saf
=======
Deprecated as of Release 1.48, as it appears to have no users, and relies
on Martel which doesn't work properly with mxTextTools 3.0.

Bio.NBRF
========
Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO.

Bio.IntelliGenetics
===================
Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO.

Bio.SeqIO submodules PhylipIO, ClustalIO, NexusIO and StockholmIO
=================================================================
You can still use the "phylip", "clustal", "nexus" and "stockholm" formats
in Bio.SeqIO; however, these are now supported via Bio.AlignIO, with the
old code deprecated in Releases 1.46 or 1.47, and removed in Release 1.49.

Bio.ECell
=========
Deprecated as of Release 1.47, as it appears to have no users, and the code
does not seem relevant for ECell 3. Removed in Release 1.49.

Bio.Ais
=======
Deprecated as of Release 1.45, removed in Release 1.49.

Bio.LocusLink
=============
Deprecated as of Release 1.45, removed in Release 1.49.
The NCBI's LocusLink was superseded by Entrez Gene.

Bio.SGMLExtractor
=================
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.Rebase
==========
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.Gobase
==========
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.CDD
=======
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.biblio
==========
Deprecated as of Release 1.45, removed in Release 1.48.

Bio.WWW
=======
The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
The remaining stub Bio.WWW was deprecated in Release 1.48.

The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.

Bio.SeqIO
=========
The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favor of
the new Bio.SeqIO module as of Release 1.44, and removed in Release 1.47.

Bio.Medline.NLMMedlineXML
=========================
Deprecated in Release 1.44, removed in 1.46.

Bio.MultiProc
=============
Deprecated in Release 1.44, removed in 1.46.

Bio.MarkupEditor
================
Deprecated in Release 1.44, removed in 1.46.

Bio.lcc
=======
Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46.

Bio.crc
=======
Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46.

Bio.FormatIO
============
This was removed in Release 1.44 (a deprecation was not possible).

Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
======================================================================
These were deprecated in Release 1.44, and removed in Release 1.49.

Bio.Kabat
=========
This was deprecated in Release 1.43 and removed in Release 1.44.

Bio.SeqUtils
============
The functions 'complement' and 'antiparallel' in Bio.SeqUtils have been
deprecated as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

Bio.GFF
=======
The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy have been
deprecated as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

Bio.sequtils
============
Deprecated as of Release 1.30, removed in Release 1.42.
Use Bio.SeqUtils instead.

Bio.SVM
=======
Deprecated as of Release 1.30, removed in Release 1.42.
The Support Vector Machine code in Biopython has been superseded by a
more robust (and maintained) SVM library, which includes a Python
interface. We recommend using LIBSVM:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Bio.RecordFile
==============
Deprecated as of Release 1.30, removed in Release 1.42.
RecordFile wasn't completely implemented and duplicates the work
of most standard parsers.

Bio.kMeans and Bio.xkMeans
==========================
Deprecated as of Release 1.30, removed in Release 1.42.

The k-means algorithm is an algorithm for unsupervised clustering of data.
Biopython includes an implementation of the k-means clustering algorithm
in kMeans.py. Recently, a larger set of clustering algorithms entered
Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
the k-means clustering algorithm, the kMeans.py module has been deprecated.
Below you will find a description of how to switch from kMeans.py to
Bio.Cluster's kcluster.

The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
The corresponding function in kMeans.py is called cluster. This function takes
the following arguments:

o data
o k
o distance_fn
o init_centroids_fn
o calc_centroid_fn
o max_iterations
o update_fn

The function kcluster in Bio.Cluster takes the following arguments:

o data
o nclusters
o mask
o weight
o transpose
o npass
o method
o dist
o initialid


Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
-----------------------------------------------------------------------


o data:

In kMeans.py, data is a list of vectors, each containing the same number of
data points. Within the context of clustering genes based on their gene
expression values, each vector would correspond to the gene expression data of
one particular gene, and the values in the vector would correspond to the
gene expression values measured by the different microarrays. The cluster
routine in kMeans.py always performs a row-wise clustering by grouping vectors.

The argument data to Bio.Cluster's kcluster has the same structure as in
kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
the transpose argument. If transpose==0 (the default value), kcluster performs
row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
performs column-wise clustering. The same behavior can be obtained, of course,
by transposing the data array before calling kcluster.
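
As a rough illustration of that last point, the sketch below transposes a
list-of-lists data set in plain Python. The data values and variable names are
made up for this example; clustering the rows of the transposed list groups the
columns of the original.

```python
# Hypothetical expression data: 3 genes (rows) x 2 microarrays (columns).
data = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

# Swap rows and columns. Row-wise clustering of `transposed` then groups
# the columns (microarrays) of the original `data`.
transposed = [list(column) for column in zip(*data)]

print(transposed)
```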


o k:

The desired number of clusters is specified by the input argument k in
kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.

o distance_fn:

In kMeans.py, the argument distance_fn represents the distance function to
calculate the distances between items and cluster centroids. This argument
corresponds to a true Python function. The default value is the Euclidean
distance, implemented as distance.euclidean in distance.py. User-defined
distance functions can also be used.

The k-means routine in Bio.Cluster does not allow user-specified distance
functions. Instead, it provides the following nine built-in distance functions,
depending on the argument dist:

dist=='e': Euclidean distance
dist=='h': Harmonically summed Euclidean distance
dist=='b': City-block distance
dist=='c': Pearson correlation
dist=='a': absolute value of the Pearson correlation
dist=='u': uncentered correlation
dist=='x': absolute uncentered correlation
dist=='s': Spearman's rank correlation
dist=='k': Kendall's tau

User-defined distance functions are possible only by modifying the C code in
cluster.c (which may not be as hard as it sounds). The default distance function
is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
Euclidean distance is defined as the sum of squared differences, whereas in
kMeans.py the square root of this quantity is taken. This does not affect the
clustering result.
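
The reason the clustering result is unaffected is that the square root is a
monotonic function, so the nearest centroid is the same under both definitions.
The sketch below checks this on made-up numbers; the helper names are
hypothetical and are not part of kMeans.py or Bio.Cluster.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared differences (the Bio.Cluster convention described above)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the sum of squared differences (the kMeans.py convention)."""
    return math.sqrt(squared_euclidean(a, b))

def nearest(item, centroids, distance):
    """Index of the centroid closest to item under the given distance."""
    return min(range(len(centroids)), key=lambda i: distance(item, centroids[i]))

centroids = [[0.0, 0.0], [4.0, 4.0]]               # made-up centroids
items = [[1.0, 0.5], [3.0, 3.5], [2.5, 1.0]]       # made-up data vectors

by_squared = [nearest(item, centroids, squared_euclidean) for item in items]
by_root = [nearest(item, centroids, euclidean) for item in items]
print(by_squared, by_root)
```

Both lists come out identical, since each item picks the same centroid either
way. Only the reported distance values differ between the two conventions.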

o init_centroids_fn:

This function specifies the initial choice for the cluster centroids. By
default, cluster in kMeans.py uses a random initial choice of cluster centroids
by randomly choosing k data vectors from the input vectors in the data input
argument. Alternatively, the user can specify a user-defined function to choose
the initial cluster centroids.

In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
assignment instead of an initial choice of cluster centroids. As far as I know,
these two initialization methods are equivalent in practice. Similar to the
cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
assignment of items to clusters. Alternatively, users can specify a
(deterministic) initial clustering via the initialid argument. This argument is
None by default. If not None, it should be a 1D array (or list) containing the
number (between 0 and nclusters-1) of the cluster to which each item is
assigned initially.

Note that the k-means routine in Bio.Cluster performs automatic repeats of the
algorithm, each time starting from a different random initial clustering. See
the comment for the npass argument below.
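
If existing code relies on a custom init_centroids_fn, one possible way to carry
the same starting point over is to turn the chosen centroids into an initial
assignment list of the kind described above. The helper below is a hypothetical
sketch in plain Python, not a Biopython function, and it assumes squared
Euclidean distance.

```python
def initial_assignment(data, centroids):
    """Assign each data vector to its nearest centroid (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        min(range(len(centroids)), key=lambda c: sqdist(vector, centroids[c]))
        for vector in data
    ]

data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]   # made-up vectors
start_centroids = [[0.0, 0.0], [5.0, 5.0]]                # made-up centroids

# One cluster number per item, each between 0 and nclusters-1.
initialid = initial_assignment(data, start_centroids)
print(initialid)
```

A centroid that attracts no items would leave its cluster empty in the
resulting list; how kcluster treats such input is not covered in this file, so
it is worth checking the list before passing it on.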

o calc_centroid_fn:

This argument specifies how to calculate the cluster centroids, given the data
vectors of the items that belong to each cluster. By default, the mean over the
vectors is calculated. A user-defined function can also be used.

Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
method to calculate the cluster centroid is determined by the argument method,
which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
calculate the mean ('a').
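
To make the difference between the two settings concrete, here is a small
plain-Python sketch of a component-wise centroid. The function name and data
are invented for illustration; only the 'a' and 'm' codes come from the text
above.

```python
from statistics import mean, median

def centroid(vectors, method="a"):
    """Component-wise centre of vectors: 'a' = arithmetic mean, 'm' = median."""
    combine = {"a": mean, "m": median}[method]
    return [combine(column) for column in zip(*vectors)]

cluster_members = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]  # made-up vectors

print(centroid(cluster_members, "a"))
print(centroid(cluster_members, "m"))
```

The first component shows the practical difference: the outlying value 9.0
pulls the mean to 4.0, while the median stays at 2.0.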

o max_iterations:

The cluster routine in kMeans.py has an argument max_iterations, which is used
to stop the iteration if the routine does not converge after the given number of
iterations.

The kcluster routine in Bio.Cluster does not have such an argument. The failure
of a k-means algorithm to converge is due to the occurrence of periodic
clustering solutions during the course of the k-means algorithm. The kcluster
routine in Bio.Cluster automatically checks for the occurrence of such a
periodicity in the solutions. If a periodic behavior is detected, the algorithm
is interrupted and the last clustering solution is returned. Accordingly, the
kcluster routine is guaranteed to return a clustering solution. Also see the
discussion of the npass argument below.
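
The idea of stopping on a repeated solution can be sketched as follows. This is
an illustration only: it remembers every earlier assignment, which is an
assumption of the sketch and not necessarily how cluster.c detects periodicity.
The function names are hypothetical.

```python
def iterate_until_stable_or_periodic(step, assignment):
    """Apply step repeatedly; stop on convergence or on a repeated assignment.

    Returns (last_assignment, reason), where reason is "converged" or "periodic".
    """
    seen = {tuple(assignment)}
    while True:
        updated = step(assignment)
        if updated == assignment:
            return updated, "converged"
        if tuple(updated) in seen:
            # The sequence of assignments has entered a cycle.
            return updated, "periodic"
        seen.add(tuple(updated))
        assignment = updated

def flip(assignment):
    """Toy update rule that alternates between two assignments forever."""
    return [1 - label for label in assignment]

result, reason = iterate_until_stable_or_periodic(flip, [0, 1, 0])
print(result, reason)
```

Because a finite data set has only finitely many possible assignments, a loop
of this kind always terminates, which is why no iteration limit is needed.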

o update_fn:

The argument update_fn to cluster in kMeans.py is a hook function that is
called at the beginning of every iteration and passed the iteration number,
cluster centroids, and current cluster assignments. It is used by xkMeans.py,
which provides a visualization of k-means clustering. Currently there is no
equivalent in Bio.Cluster.
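
The correspondences described in this section can be collected in a small
lookup table, for example to flag old keyword arguments that have no
counterpart when porting a call. The table and helper below are a hypothetical
convenience written for this file; they are not part of Biopython.

```python
# Old kMeans.py keyword -> (Bio.Cluster keyword or None, short note).
ARGUMENT_MAP = {
    "data": ("data", "same structure; see also transpose"),
    "k": ("nclusters", "number of clusters"),
    "distance_fn": ("dist", "one-letter code instead of a function"),
    "init_centroids_fn": ("initialid", "initial assignment instead of centroids"),
    "calc_centroid_fn": ("method", "'a' (mean) or 'm' (median)"),
    "max_iterations": (None, "no equivalent; periodicity is detected instead"),
    "update_fn": (None, "no equivalent"),
}

def unsupported(old_kwargs):
    """Return the old keyword names that have no Bio.Cluster counterpart."""
    return sorted(name for name in old_kwargs if ARGUMENT_MAP[name][0] is None)

print(unsupported({"k": 3, "max_iterations": 100, "update_fn": print}))
```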


Other arguments for Bio.Cluster's kcluster
------------------------------------------

Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
kMeans.py's cluster.

o mask:

Microarray experiments tend to suffer from a large number of missing data
points. The argument mask to Bio.Cluster's kcluster lets the user specify which
data are missing. This argument is an array with the same shape as data, and
contains a 1 for each data point that is present, and a 0 for a missing data
point:

mask[i,j]==1: data[i,j] is valid
mask[i,j]==0: data[i,j] is a missing data point

Missing data points are ignored by the clustering algorithm. By default, mask
is an array containing 1's everywhere.

o weight:

The weight argument is used to put different weights on different data points.
For example, when clustering genes based on their gene expression profile, we
may want to attach a bigger weight to some microarrays compared to others. By
default, the weight argument contains equal weights of 1.0 for all data points.
Note that for row-wise clustering, the weight argument is a 1D vector whose
length is equal to the number of columns. For column-wise clustering, the length
of this argument is equal to the number of rows.
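
How mask and weight could enter a distance calculation is sketched below for
the row-wise case. This is an illustration under stated assumptions, not the
code in cluster.c: the function name is hypothetical, and any normalization
that Bio.Cluster may apply (for instance by the total weight of the entries
that are present) is deliberately left out.

```python
def masked_weighted_sqdist(a, b, mask_a, mask_b, weight):
    """Weighted sum of squared differences over entries present in both vectors."""
    total = 0.0
    for x, y, present_x, present_y, w in zip(a, b, mask_a, mask_b, weight):
        if present_x and present_y:   # skip missing data points
            total += w * (x - y) ** 2
    return total

gene_a = [1.0, 2.0, 0.0]   # third value is missing; 0.0 is only a placeholder
gene_b = [2.0, 4.0, 7.0]
mask_a = [1, 1, 0]
mask_b = [1, 1, 1]
weight = [1.0, 0.5, 1.0]   # one weight per column for row-wise clustering

print(masked_weighted_sqdist(gene_a, gene_b, mask_a, mask_b, weight))
```

The placeholder in the third column never contributes, so the result depends
only on the two columns that are present in both vectors.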

o npass:

Typical implementations of the k-means clustering algorithm rely on a random
initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
a clearly defined goal, which is to minimize the within-cluster sum of
distances. Different k-means clustering solutions (based on different initial
clusterings) can therefore be compared to each other directly. In order to
increase the chance of finding the optimal k-means clustering solution, the
k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
each time starting from a different initial random clustering. The best
clustering solution, as well as in how many of the npass attempts it was found,
is returned to the user. For more information, see the output variable nfound
below.
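
The repetition scheme can be sketched in plain Python as follows. This is not
the Bio.Cluster implementation. All function names are hypothetical, and
several details are assumptions of the sketch: the random start, the reseeding
of an empty cluster, the relabeling used to compare solutions, and the use of
squared Euclidean distance.

```python
import random

def sqdist(a, b):
    """Sum of squared differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]

def canonical(ids):
    """Relabel clusters by first appearance so equal partitions compare equal."""
    relabel = {}
    return tuple(relabel.setdefault(i, len(relabel)) for i in ids)

def single_pass(data, nclusters, rng):
    """One k-means run from a random initial assignment: (ids, error)."""
    # Random start in which every cluster receives at least one item.
    ids = list(range(nclusters))
    ids += [rng.randrange(nclusters) for _ in range(len(data) - nclusters)]
    rng.shuffle(ids)
    seen = set()
    while tuple(ids) not in seen:   # stop on convergence or on a cycle
        seen.add(tuple(ids))
        centroids = []
        for c in range(nclusters):
            members = [v for v, i in zip(data, ids) if i == c]
            # Assumption: an empty cluster is reseeded with a random item.
            centroids.append(mean_vector(members) if members else rng.choice(data))
        ids = [min(range(nclusters), key=lambda c: sqdist(v, centroids[c]))
               for v in data]
    error = sum(sqdist(v, centroids[i]) for v, i in zip(data, ids))
    return ids, error

def repeated_kmeans(data, nclusters, npass, seed=0):
    """Run single_pass npass times; return (clusterid, error, nfound)."""
    rng = random.Random(seed)
    best_ids, best_error, nfound = None, float("inf"), 0
    for _ in range(npass):
        ids, error = single_pass(data, nclusters, rng)
        ids = canonical(ids)
        if error < best_error - 1e-12:
            best_ids, best_error, nfound = ids, error, 1   # new best solution
        elif ids == best_ids:
            nfound += 1                                    # best solution found again
    return list(best_ids), best_error, nfound

data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]   # made-up vectors
clusterid, error, nfound = repeated_kmeans(data, nclusters=2, npass=5)
print(clusterid, error, nfound)
```

On this tiny, well-separated data set every pass ends in the same two clusters,
so the count of passes that found the best solution equals npass. On harder
data the count would typically be lower.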


Return values
-------------

The cluster routine in kMeans.py returns two values:

o centroids
o clusters

The kcluster routine in Bio.Cluster returns four values:

o clusterid
o centroids
o error
o nfound


o centroids:

The centroids return value contains the centroids of the k clusters that were
found, and corresponds to the centroids return value from Bio.Cluster's
kcluster routine.

o clusters:

The clusters return value contains the number of the cluster to which each
vector was assigned. The corresponding return value in Bio.Cluster's kcluster
is clusterid.
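
Since both return values hold one cluster number per item, code that consumed
clusters can usually consume clusterid in the same way. As a small example,
the hypothetical helper below (not part of Biopython) groups item indices by
cluster number, using a made-up assignment.

```python
from collections import defaultdict

def group_by_cluster(clusterid):
    """Map each cluster number to the list of item indices assigned to it."""
    members = defaultdict(list)
    for item, cluster in enumerate(clusterid):
        members[cluster].append(item)
    return dict(members)

clusterid = [0, 1, 0, 2, 1]   # made-up assignment for five items
print(group_by_cluster(clusterid))
```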

o error:

The error return value from Bio.Cluster's kcluster is the within-cluster sum of
distances for the optimal clustering solution that was found. This value can be
used to compare different clustering solutions to each other.

o nfound:

The nfound return value from Bio.Cluster's kcluster shows in how many of the
npass runs the optimal clustering solution was found. Accordingly, nfound is at
least 1 and at most equal to npass. A large value for nfound is an indication
that the clustering solution that was found is optimal. On the other hand, if
nfound is equal to 1, it is quite possible that a better clustering solution
exists than the one found by kcluster.