This file provides documentation for modules in Biopython that have been moved
or deprecated in favor of other modules. It gives quick, easy-to-find notes on
how to update your code so that it works again.

Numeric support
===============
Following the release of 1.48, Numeric support in Biopython is discontinued.
Please move to NumPy.

Bio.Seq
=======
Direct use of the Seq object (and MutableSeq object) .data property is discouraged.
As of Release 1.49, writing to the Seq object's .data property triggers a warning,
and this property is likely to be made read-only in the next release.

Bio.Transcribe and Bio.Translate
================================
Declared obsolete in Release 1.49.
Please use the methods or functions in Bio.Seq instead.

Bio.mathfns, Bio.stringfns and Bio.listfns (and their C code variants)
======================================================================
Declared obsolete in Release 1.49.

Bio.Ndb
=======
Deprecated in Release 1.49, as the website this parsed has been redesigned.

Martel
======
Declared obsolete in Release 1.48, deprecated in Release 1.49.

Bio.Mindy
=========
Declared obsolete in Release 1.48, deprecated in Release 1.49.

Bio.builders, Bio.Std, Bio.StdHandler, Bio.Decode and Bio.DBXRef
================================================================
Part of the Martel/Mindy parsing infrastructure, these were deprecated in
Release 1.49.

Bio.Writer and Bio.writers
==========================
Deprecated in Release 1.48.

Bio.Emboss.Primer
=================
Deprecated in Release 1.48, this parser was replaced by Bio.Emboss.Primer3 and
Bio.Emboss.PrimerSearch.

Bio.MetaTool
============
Deprecated in Release 1.48, this was a parser for the output of MetaTool 3.5
which is now obsolete.

Bio.GenBank
===========
The online functionality (search_for, download_many, and NCBIDictionary) was
declared obsolete in Release 1.48, with the intention of an official deprecation
in the following release. Please use Bio.Entrez instead.

Bio.PubMed
==========
Declared obsolete in Release 1.48, deprecated in Release 1.49.
Please use Bio.Entrez instead.

Bio.EUtils
==========
Deprecated in favor of Bio.Entrez in Release 1.48.

Bio.Sequencing
==============
A revised API was added and the old one deprecated in Biopython 1.48:
Bio.Sequencing.Ace.RecordParser --> Bio.Sequencing.Ace.read(handle)
Bio.Sequencing.Ace.Iterator     --> Bio.Sequencing.Ace.parse(handle)
Bio.Sequencing.Phd.RecordParser --> Bio.Sequencing.Phd.read(handle)
Bio.Sequencing.Phd.Iterator     --> Bio.Sequencing.Phd.parse(handle)

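As a rough sketch of the revised Bio.Sequencing style, assuming Biopython 1.48
or later is installed: the helper names and the file paths passed to them are
hypothetical, and the imports sit inside the functions only so that the snippet
can be loaded on a machine without Biopython.

```python
def count_ace_contigs(path):
    """Count the contigs in an ACE file using the revised parse() API."""
    from Bio.Sequencing import Ace  # requires Biopython

    with open(path) as handle:
        # Ace.parse(handle) takes the place of the old Ace.Iterator.
        return sum(1 for _contig in Ace.parse(handle))


def read_phd_record(path):
    """Read a single-record PHD file using the revised read() API."""
    from Bio.Sequencing import Phd  # requires Biopython

    with open(path) as handle:
        # Phd.read(handle) takes the place of the old Phd.RecordParser.
        return Phd.read(handle)
```
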
Bio.Blast.NCBIWWW
=================
The HTML BLAST parser was deprecated as of Release 1.48.
The deprecated functions blast and blasturl were removed in Release 1.44.

Bio.Saf
=======
Deprecated as of Release 1.48, as it appears to have no users, and relies
on Martel which doesn't work properly with mxTextTools 3.0.

Bio.NBRF
========
Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO.

Bio.IntelliGenetics
===================
Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO.

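A minimal sketch of reading such files through Bio.SeqIO, assuming Biopython is
installed. The helper name load_records and the paths passed to it are
hypothetical; only SeqIO.parse and the "pir" and "ig" format names come from
Biopython itself.

```python
def load_records(path, file_format):
    """Return all records from a file in the named Bio.SeqIO format."""
    from Bio import SeqIO  # requires Biopython

    with open(path) as handle:
        # "pir" covers files formerly read with Bio.NBRF,
        # "ig" covers files formerly read with Bio.IntelliGenetics.
        return list(SeqIO.parse(handle, file_format))
```

For example, load_records("proteins.pir", "pir") would return a list of
SeqRecord objects, where proteins.pir is a placeholder file name.
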
Bio.SeqIO submodules PhylipIO, ClustalIO, NexusIO and StockholmIO
=================================================================
You can still use the "phylip", "clustal", "nexus" and "stockholm" formats
in Bio.SeqIO, however these are now supported via Bio.AlignIO, with the
old code deprecated in Releases 1.46 or 1.47, and removed in Release 1.49.

Bio.ECell
=========
Deprecated as of Release 1.47, as it appears to have no users, and the code
does not seem relevant for ECell 3. Removed in Release 1.49.

Bio.LocusLink
=============
Deprecated as of Release 1.45, removed in Release 1.49.
The NCBI's LocusLink was superseded by Entrez Gene.

Bio.SGMLExtractor
=================
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.Rebase
==========
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.Gobase
==========
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.CDD
=======
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.biblio
==========
Deprecated as of Release 1.45, removed in Release 1.48.

Bio.WWW
=======
The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
The remaining stub Bio.WWW was deprecated in Release 1.48.

The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.

Bio.SeqIO
=========
The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favor of
the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47.

Bio.Medline.NLMMedlineXML
=========================
Deprecated in Release 1.44, removed in 1.46.

Bio.MultiProc
=============
Deprecated in Release 1.44, removed in 1.46.

Bio.MarkupEditor
================
Deprecated in Release 1.44, removed in 1.46.

Bio.lcc
=======
Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46.

Bio.crc
=======
Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46.

Bio.FormatIO
============
This was removed in Release 1.44 (a deprecation was not possible).

Bio.expressions (and therefore Bio.config, Bio.dbdefs and Bio.formatdefs)
=========================================================================
These were deprecated in Release 1.44, and removed in Release 1.49.

Bio.Kabat
=========
This was deprecated in Release 1.43 and removed in Release 1.44.

Bio.SeqUtils
============
The functions 'complement' and 'antiparallel' in Bio.SeqUtils were deprecated
in Release 1.31 and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

Bio.GFF
=======
The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy were
deprecated in Release 1.31 and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

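A small sketch of the Bio.Seq replacements, assuming Biopython is installed.
The helper name both_strands is hypothetical. To avoid depending on which
module-level functions the installed release provides, the sketch calls the Seq
object's complement() method for the complement and the reverse_complement
function for the reverse complement.

```python
def both_strands(dna):
    """Return (complement, reverse complement) of a DNA string."""
    from Bio.Seq import Seq, reverse_complement  # requires Biopython

    complemented = str(Seq(dna).complement())
    reverse_complemented = str(reverse_complement(dna))
    return complemented, reverse_complemented
```
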
Bio.sequtils
============
Deprecated as of Release 1.30, removed in Release 1.42.
Use Bio.SeqUtils instead.

Bio.SVM
=======
Deprecated as of Release 1.30, removed in Release 1.42.
The Support Vector Machine code in Biopython has been superseded by a
more robust (and maintained) SVM library, which includes a Python
interface. We recommend using LIBSVM:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Bio.RecordFile
==============
Deprecated as of Release 1.30, removed in Release 1.42.
RecordFile wasn't completely implemented and duplicates the work
of most standard parsers.

Bio.kMeans and Bio.xkMeans
==========================
Deprecated as of Release 1.30, removed in Release 1.42.

The k-means algorithm is an algorithm for unsupervised clustering of data.
Biopython includes an implementation of the k-means clustering algorithm
in kMeans.py. Recently, a larger set of clustering algorithms entered
Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
the k-means clustering algorithm, the kMeans.py module has been deprecated.
Below you will find a description of how to switch from kMeans.py to
Bio.Cluster's kcluster.

The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
The corresponding function in kMeans.py is called cluster. This function takes
the following arguments:

o data
o k
o distance_fn
o init_centroids_fn
o calc_centroid_fn
o max_iterations
o update_fn

The function kcluster in Bio.Cluster takes the following arguments:

o data
o nclusters
o mask
o weight
o transpose
o npass
o method
o dist
o initialid


Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
-----------------------------------------------------------------------


o data:

In kMeans.py, data is a list of vectors, each containing the same number of
data points. Within the context of clustering genes based on their gene
expression values, each vector would correspond to the gene expression data of
one particular gene, and the values in the vector would correspond to the
gene expression values measured by the different microarrays. The cluster
routine in kMeans.py always performs a row-wise clustering by grouping vectors.

The argument data to Bio.Cluster's kcluster has the same structure as in
kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
the transpose argument. If transpose==0 (the default value), kcluster performs
row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
performs column-wise clustering. The same behavior can be obtained, of course,
by transposing the data array before calling kcluster.

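To illustrate the last point, here is a pure-Python sketch of transposing a
data set stored as a list of rows. The helper name transpose_rows and the
sample values are made up for the illustration; clustering the rows of the
transposed data groups the original columns.

```python
def transpose_rows(data):
    """Swap the rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*data)]


# Three genes (rows) measured on two microarrays (columns).
expression = [[1.0, 2.0],
              [1.1, 2.1],
              [9.0, 8.0]]

# Two microarrays (rows), each with three gene measurements (columns).
by_microarray = transpose_rows(expression)
```
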
o k:

The desired number of clusters is specified by the input argument k in
kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.

o distance_fn:

In kMeans.py, the argument distance_fn represents the distance function to
calculate the distances between items and cluster centroids. This argument
corresponds to a true Python function. The default value is the Euclidean
distance, implemented as distance.euclidean in distance.py. User-defined
distance functions can also be used.

The k-means routine in Bio.Cluster does not allow user-specified distance
functions. Instead, it provides the following nine built-in distance functions,
depending on the argument dist:

dist=='e': Euclidean distance
dist=='h': Harmonically summed Euclidean distance
dist=='b': City-block distance
dist=='c': Pearson correlation
dist=='a': absolute value of the Pearson correlation
dist=='u': uncentered correlation
dist=='x': absolute uncentered correlation
dist=='s': Spearman's rank correlation
dist=='k': Kendall's tau

User-defined distance functions are possible only by modifying the C code in
cluster.c (which may not be as hard as it sounds). The default distance function
is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
Euclidean distance is defined as the sum of squared differences, whereas in
kMeans.py the square root of this quantity is taken. Because the square root
is monotonic, the nearest centroid is the same under both definitions, so this
does not affect the clustering result.

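The following pure-Python sketch illustrates why the two Euclidean definitions
assign items to the same centroids. The function names and the sample points
are made up for the illustration and are not taken from kMeans.py or cluster.c.

```python
import math


def squared_distance(a, b):
    """Sum of squared differences (the Bio.Cluster style definition)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rooted_distance(a, b):
    """Square root of the sum of squared differences (the kMeans.py style)."""
    return math.sqrt(squared_distance(a, b))


def nearest(item, centroids, distance):
    """Index of the centroid closest to item under the given distance."""
    return min(range(len(centroids)),
               key=lambda i: distance(item, centroids[i]))


centroids = [[0.0, 0.0], [4.0, 4.0]]
items = [[1.0, 0.5], [3.0, 3.5], [2.1, 2.0]]

by_squared = [nearest(item, centroids, squared_distance) for item in items]
by_rooted = [nearest(item, centroids, rooted_distance) for item in items]
# Both lists are identical: taking the square root never changes the ordering.
```

The distance values themselves do differ between the two definitions, so
numbers computed with one should not be compared against the other.
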
o init_centroids_fn:

This function specifies the initial choice for the cluster centroids. By
default, cluster in kMeans.py uses a random initial choice of cluster centroids
by randomly choosing k data vectors from the input vectors in the data input
argument. Alternatively, the user can specify a user-defined function to choose
the initial cluster centroids.

In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
assignment instead of an initial choice of cluster centroids. As far as I know,
these two initialization methods are equivalent in practice. Similar to the
cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
assignment of items to clusters. Alternatively, users can specify a
(deterministic) initial clustering via the initialid argument. This argument is
None by default. If not None, it should be a 1D array (or list) containing the
number (between 0 and nclusters-1) of the cluster to which each item is
assigned initially.

Note that the k-means routine in Bio.Cluster performs automatic repeats of the
algorithm, each time starting from a different random initial clustering. See
the comment for the npass argument below.

o calc_centroid_fn:

This argument specifies how to calculate the cluster centroids, given the data
vectors of the items that belong to each cluster. By default, the mean over the
vectors is calculated. A user-defined function can also be used.

Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
method to calculate the cluster centroid is determined by the argument method,
which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
calculate the mean ('a').

o max_iterations:

The cluster routine in kMeans.py has an argument max_iterations, which is used
to stop the iteration if the routine does not converge after the given number of
iterations.

The kcluster routine in Bio.Cluster does not have such an argument. The failure
of a k-means algorithm to converge is due to the occurrence of periodic
clustering solutions during the course of the k-means algorithm. The kcluster
routine in Bio.Cluster automatically checks for the occurrence of such a
periodicity in the solutions. If a periodic behavior is detected, the algorithm
is interrupted and the last clustering solution is returned. Accordingly, the
kcluster routine is guaranteed to return a clustering solution. Also see the
discussion of the npass argument below.

o update_fn:

The argument update_fn to cluster in kMeans.py is a hook function that is
called at the beginning of every iteration and passed the iteration number,
cluster centroids, and current cluster assignments. It is used by xkMeans.py,
which provides a visualization of k-means clustering. Currently there is no
equivalent in Bio.Cluster.

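The correspondence described above can be summarized in a short pure-Python
sketch. ARGUMENT_MAP and kcluster_keywords are hypothetical helpers written for
this note, not part of Biopython; the keyword names and the one-letter codes
are the ones listed in this file.

```python
# kMeans.py argument -> closest kcluster argument (None: no equivalent).
ARGUMENT_MAP = {
    "data": "data",
    "k": "nclusters",
    "distance_fn": "dist",
    "init_centroids_fn": "initialid",
    "calc_centroid_fn": "method",
    "max_iterations": None,
    "update_fn": None,
}

VALID_DIST = set("ehbcauxsk")  # the nine built-in distance codes
VALID_METHOD = {"a", "m"}      # arithmetic mean or median


def kcluster_keywords(k, dist="e", method="a", initialid=None):
    """Build kcluster keyword arguments from kMeans-style settings."""
    if dist not in VALID_DIST:
        raise ValueError("unknown dist code: %r" % (dist,))
    if method not in VALID_METHOD:
        raise ValueError("unknown method code: %r" % (method,))
    return {"nclusters": k, "dist": dist, "method": method,
            "initialid": initialid}


# Three clusters, median centroids, default Euclidean distance.
keywords = kcluster_keywords(k=3, method="m")
```

With Biopython installed, the result could then be passed along as
kcluster(data, **keywords).
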
Other arguments for Bio.Cluster's kcluster
------------------------------------------

Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
kMeans.py's cluster.

o mask:

Microarray experiments tend to suffer from a large number of missing data
points. The argument mask to Bio.Cluster's kcluster lets the user specify which
data are missing. This argument is an array with the same shape as data, and
contains a 1 for each data point that is present, and a 0 for a missing data
point:

mask[i,j]==1: data[i,j] is valid
mask[i,j]==0: data[i,j] is a missing data point

Missing data points are ignored by the clustering algorithm. By default, mask
is an array containing 1's everywhere.

o weight:

The weight argument is used to put different weights on different data points.
For example, when clustering genes based on their gene expression profile, we
may want to attach a bigger weight to some microarrays compared to others. By
default, the weight argument contains equal weights of 1.0 for all data points.
Note that for row-wise clustering, the weight argument is a 1D vector whose
length is equal to the number of columns. For column-wise clustering, the length
of this argument is equal to the number of rows.

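As a simplified illustration of how mask and weight enter a distance
calculation, consider the pure-Python sketch below. The function name and the
sample numbers are made up, and the C code in cluster.c may normalize the
result differently; the sketch only shows that masked columns are skipped and
that the remaining columns are scaled by their weights.

```python
def masked_weighted_distance(row_a, row_b, mask_a, mask_b, weight):
    """Weighted sum of squared differences over columns present in both rows."""
    total = 0.0
    for a, b, present_a, present_b, w in zip(row_a, row_b,
                                             mask_a, mask_b, weight):
        if present_a and present_b:  # skip columns with a missing value
            total += w * (a - b) ** 2
    return total


row_a = [1.0, 0.0, 3.0]
row_b = [2.0, 5.0, 1.0]
mask_a = [1, 0, 1]          # the second value of row_a is missing
mask_b = [1, 1, 1]
weight = [1.0, 1.0, 2.0]    # the third microarray counts double

distance = masked_weighted_distance(row_a, row_b, mask_a, mask_b, weight)
```
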
o npass:

Typical implementations of the k-means clustering algorithm rely on a random
initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
a clearly defined goal, which is to minimize the within-cluster sum of
distances. Different k-means clustering solutions (based on different initial
clusterings) can therefore be compared to each other directly. In order to
increase the chance of finding the optimal k-means clustering solution, the
k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
each time starting from a different initial random clustering. The best
clustering solution is returned to the user, together with the number of the
npass attempts in which it was found. For more information, see the output
variable nfound below.

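The idea of repeating the algorithm and counting how often the best solution
is found can be sketched in pure Python as follows. This is an illustration
under simplifying assumptions (squared Euclidean distance, mean centroids, a
fixed cap on the number of steps), not the C implementation behind kcluster;
all function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_of(data, clusterid, nclusters):
    """Mean vector of the members of each cluster."""
    result = []
    for c in range(nclusters):
        members = [row for row, cid in zip(data, clusterid) if cid == c]
        result.append([sum(col) / len(members) for col in zip(*members)])
    return result


def single_pass(data, nclusters, rng, max_steps=100):
    """One k-means run from a random initial assignment."""
    # Random initial assignment that leaves no cluster empty.
    clusterid = list(range(nclusters))
    clusterid += [rng.randrange(nclusters) for _ in data[nclusters:]]
    rng.shuffle(clusterid)
    for _ in range(max_steps):
        centroids = centroids_of(data, clusterid, nclusters)
        updated = [min(range(nclusters),
                       key=lambda c: squared_distance(row, centroids[c]))
                   for row in data]
        # Stop on convergence, or if an update would empty a cluster.
        if updated == clusterid or len(set(updated)) < nclusters:
            break
        clusterid = updated
    centroids = centroids_of(data, clusterid, nclusters)
    error = sum(squared_distance(row, centroids[cid])
                for row, cid in zip(data, clusterid))
    return clusterid, error


def canonical(clusterid):
    """Relabel clusters by first appearance so equal partitions compare equal."""
    relabel = {}
    return tuple(relabel.setdefault(cid, len(relabel)) for cid in clusterid)


def repeated_kmeans(data, nclusters, npass, seed=0):
    """Keep the lowest-error solution and count the passes that found it."""
    rng = random.Random(seed)
    best, best_error, nfound = None, None, 0
    for _ in range(npass):
        clusterid, error = single_pass(data, nclusters, rng)
        labels = canonical(clusterid)
        if best is None or error < best_error - 1e-12:
            best, best_error, nfound = labels, error, 1
        elif labels == best:
            nfound += 1
    return list(best), best_error, nfound


data = [[0.0], [0.1], [0.3], [10.0], [10.2], [10.7]]
clusterid, error, nfound = repeated_kmeans(data, nclusters=2, npass=5)
```
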
Return values
-------------

The cluster routine in kMeans.py returns two values:

o centroids
o clusters

The kcluster routine in Bio.Cluster returns four values:

o clusterid
o centroids
o error
o nfound


o centroids:

The centroids return value contains the centroids of the k clusters that were
found, and corresponds to the centroids return value from Bio.Cluster's
kcluster routine.

o clusters:

The clusters return value contains the number of the cluster to which each
vector was assigned. The corresponding return value in Bio.Cluster's kcluster
is clusterid.

o error:

The error return value from Bio.Cluster's kcluster is the within-cluster sum of
distances for the optimal clustering solution that was found. This value can be
used to compare different clustering solutions to each other.

o nfound:

The nfound return value from Bio.Cluster's kcluster shows in how many of the
npass runs the optimal clustering solution was found. Accordingly, nfound is at
least 1 and at most equal to npass. A large value for nfound is an indication
that the clustering solution that was found is optimal. On the other hand, if
nfound is equal to 1, it is quite possible that a better clustering solution
exists than the one found by kcluster.