1 This file provides documentation for modules in Biopython that have been moved
2 or deprecated in favor of other modules. It offers quick, easy-to-find guidance
3 on how to update your code so that it works again.
4
5 Numeric support
6 ===============
7 Following the release of 1.48, Numeric support in Biopython is discontinued.
8 Limited support is still available for Python modules via backward-compatible
9 imports, but C modules will not work. Please move to NumPy.
10
11 Martel
12 ======
13 Declared obsolete in Release 1.48, deprecated in Release 1.49
14
15 Bio.Mindy
16 =========
17 Declared obsolete in Release 1.48, deprecated in Release 1.49
18
19 Bio.builders, Bio.Std, Bio.StdHandler, Bio.Decode
20 =================================================
21 Part of the Martel/Mindy infrastructure, these were deprecated in Release 1.49
22
23 Bio.Writer and Bio.writers
24 ==========================
25 Deprecated in Release 1.48
26
27 Bio.Emboss.Primer
28 =================
29 Deprecated in Release 1.48, this parser was replaced by Bio.Emboss.Primer3 and
30 Bio.Emboss.PrimerSearch.
31
32 Bio.MetaTool
33 ============
34 Deprecated in Release 1.48, this was a parser for the output of MetaTool 3.5
35 which is now obsolete.
36
37 Bio.GenBank
38 ===========
39 The online functionality (search_for, download_many, and NCBIDictionary) was
40 declared obsolete in Release 1.48, with the intention of an official deprecation
41 in the following release. Please use Bio.Entrez instead.
42
43 Bio.PubMed
44 ==========
45 Declared obsolete in Release 1.48, with the intention of an official deprecation
46 in the following release. Please use Bio.Entrez instead.
47
48 Bio.EUtils
49 ==========
50 Deprecated in favor of Bio.Entrez in Release 1.48
51
52 Bio.Blast.NCBIWWW
53 =================
54 The HTML BLAST parser was deprecated as of Release 1.48
55 The deprecated functions blast and blasturl were removed in Release 1.44
56
57 Bio.Saf
58 =======
59 Deprecated as of Release 1.48, as it appears to have no users, and relies
60 on Martel which doesn't work properly with mxTextTools 3.0
61
62 Bio.NBRF
63 ========
64 Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO
65
66 Bio.IntelliGenetics
67 ===================
68 Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO
69
70 Bio.ECell
71 =========
72 Deprecated as of Release 1.47, as it appears to have no users, and the code
73 does not seem relevant for ECell 3.
74
75 Bio.Rebase
76 ==========
77 Deprecated as of Release 1.46.
78
79 Bio.Gobase
80 ==========
81 Deprecated as of Release 1.46.
82
83 Bio.CDD
84 =======
85 Deprecated as of Release 1.46.
86
87 Bio.biblio
88 ==========
89 Deprecated as of Release 1.45, removed in Release 1.48
90
91 Bio.WWW
92 =======
93 The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
94 The remaining stub Bio.WWW was deprecated in Release 1.48.
95
96 The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
97 is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.
98
99 Bio.SeqIO
100 =========
101 The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favour of
102 the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47
103
104 Bio.Medline.NLMMedlineXML
105 =========================
106 Deprecated in Release 1.44, removed in 1.46
107
108 Bio.MultiProc
109 =============
110 Deprecated in Release 1.44, removed in 1.46
111
112 Bio.MarkupEditor
113 ================
114 Deprecated in Release 1.44, removed in 1.46
115
116 Bio.lcc
117 =======
118 Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46
119
120 Bio.crc
121 =======
122 Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46
123
124 Bio.FormatIO
125 ============
126 This was removed in Release 1.44
127
128 Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
129 ======================================================================
130 These were deprecated in Release 1.44, and removed in Release 1.49
131
132 Bio.Kabat
133 =========
134 This was deprecated in Release 1.43 and removed in Release 1.44
135
136 Bio.SeqUtils
137 ============
138 The functions 'complement' and 'antiparallel' in Bio.SeqUtils have been
139 deprecated as of Release 1.31, and removed in Release 1.43.
140 Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.
141
142 Bio.GFF
143 =======
144 The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy have been
145 deprecated as of Release 1.31, and removed in Release 1.43.
146 Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.
147
148 Bio.sequtils
149 ============
150 Deprecated as of Release 1.30, removed in Release 1.42
151 Use Bio.SeqUtils instead.
152
153 Bio.SVM
154 =======
155 Deprecated as of Release 1.30, removed in Release 1.42
156 The Support Vector Machine code in Biopython has been superseded by a
157 more robust (and maintained) SVM library, which includes a Python
158 interface. We recommend using LIBSVM:
159
160 http://www.csie.ntu.edu.tw/~cjlin/libsvm/
161
162 Bio.RecordFile
163 ==============
164 Deprecated as of Release 1.30, removed in Release 1.42
165 RecordFile wasn't completely implemented and duplicates the work
166 of most standard parsers. We recommend using a specific iterator
167 (Bio.Fasta.Iterator for example) without a parser to get back
168 text records.
169
170 Bio.kMeans and Bio.xkMeans
171 ==========================
172 Deprecated as of Release 1.30, removed in Release 1.42
173
174 The k-Means algorithm is an algorithm for unsupervised clustering of data.
175 Biopython includes an implementation of the k-means clustering algorithm
176 in kMeans.py. Recently, a larger set of clustering algorithms entered
177 Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
178 the k-means clustering algorithm, the kMeans.py module has been deprecated.
179 Below you will find a description of how to switch from kMeans.py to
180 Bio.Cluster's kcluster.
181
182 The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
183 The corresponding function in kMeans.py is called cluster. This function takes
184 the following arguments:
185
186 o data
187 o k
188 o distance_fn
189 o init_centroids_fn
190 o calc_centroid_fn
191 o max_iterations
192 o update_fn
193
194 The function kcluster in Bio.Cluster takes the following arguments:
195
196 o data
197 o nclusters
198 o mask
199 o weight
200 o transpose
201 o npass
202 o method
203 o dist
204 o initialid
205
206
207 Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
208 -----------------------------------------------------------------------
209
210
211 o data:
212
213 In kMeans.py, data is a list of vectors, each containing the same number of
214 data points. Within the context of clustering genes based on their gene
215 expression values, each vector would correspond to the gene expression data of
216 one particular gene, and the values in the vector would correspond to the
217 gene expression values measured by the different microarrays. The cluster
218 routine in kMeans.py always performs a row-wise clustering by grouping vectors.
219
220 The argument data to Bio.Cluster's kcluster has the same structure as in
221 kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
222 the transpose argument. If transpose==0 (the default value), kcluster performs
223 row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
224 performs column-wise clustering. The same behavior can be obtained, of course,
225 by transposing the data array before calling kcluster.
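
As a rough illustration of that last point, a list of row vectors can be
transposed in plain Python before clustering. This is only a sketch on toy
data: the variable names are made up for the example, and it does not call
Bio.Cluster itself.

```python
# Toy expression matrix: 2 genes (rows) x 3 microarrays (columns).
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# zip(*data) groups the i-th element of every row, i.e. it yields the columns.
transposed = [list(column) for column in zip(*data)]

print(transposed)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

Row-wise clustering of transposed groups the microarrays, which is what
column-wise clustering of data would do.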
226
227
228 o k:
229
230 The desired number of clusters is specified by the input argument k in
231 kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.
232
233 o distance_fn:
234
235 In kMeans.py, the argument distance_fn represents the distance function to
236 calculate the distances between items and cluster centroids. This argument
237 corresponds to a true Python function. The default value is the Euclidean
238 distance, implemented as distance.euclidean in distance.py. User-defined
239 distance functions can also be used.
240
241 The k-means routine in Bio.Cluster does not allow user-specified distance
242 functions. Instead, it provides the following nine built-in distance functions,
243 depending on the argument dist:
244
245 dist=='e': Euclidean distance
246 dist=='h': Harmonically summed Euclidean distance
247 dist=='b': City-block distance
248 dist=='c': Pearson correlation
249 dist=='a': absolute value of the Pearson correlation
250 dist=='u': uncentered correlation
251 dist=='x': absolute uncentered correlation
252 dist=='s': Spearman's rank correlation
253 dist=='k': Kendall's tau
254
255 User-defined distance functions are possible only by modifying the C code in
256 cluster.c (which may not be as hard as it sounds). The default distance function
257 is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
258 Euclidean distance is defined as the sum of squared differences, whereas in
259 kMeans.py the square root of this quantity is taken. This does not affect the
260 clustering result.
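
The reason the square root does not matter can be checked with a small sketch.
The helper names below are hypothetical and the code is plain Python, not the
kMeans.py or Bio.Cluster implementation: because the square root is a monotonic
function, the nearest centroid of each item is the same under both definitions.

```python
import math


def squared_euclidean(u, v):
    """Sum of squared differences (the Bio.Cluster convention described above)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared differences (the kMeans.py convention)."""
    return math.sqrt(squared_euclidean(u, v))


def nearest(item, centroids, distance):
    """Index of the centroid closest to item under the given distance."""
    return min(range(len(centroids)), key=lambda i: distance(item, centroids[i]))


data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.3]]
centroids = [[0.0, 0.0], [5.0, 5.0]]

by_squared = [nearest(x, centroids, squared_euclidean) for x in data]
by_root = [nearest(x, centroids, euclidean) for x in data]

print(by_squared == by_root)  # True
```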
261
262 o init_centroids_fn:
263
264 This function specifies the initial choice for the cluster centroids. By
265 default, cluster in kMeans.py uses a random initial choice of cluster centroids
266 by randomly choosing k data vectors from the input vectors in the data input
267 argument. Alternatively, the user can specify a user-defined function to choose
268 the initial cluster centroids.
269
270 In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
271 assignment instead of an initial choice of cluster centroids. As far as I know,
272 these two initialization methods are equivalent in practice. Similar to the
273 cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
274 assignment of items to clusters. Alternatively, users can specify a
275 (deterministic) initial clustering via the initialid argument. This argument is
276 None by default. If not None, it should be a 1D array (or list) containing the
277 number (between 0 and nclusters-1) of the cluster to which each item is
278 assigned initially.
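
To see how an initial assignment relates to an initial choice of centroids,
the sketch below derives the centroids implied by an initialid-style list. The
function name is hypothetical and the code is plain Python for illustration; it
is not taken from Bio.Cluster.

```python
def centroids_from_assignment(data, initialid, nclusters):
    """Return the mean vector of each cluster implied by initialid."""
    ncolumns = len(data[0])
    sums = [[0.0] * ncolumns for _ in range(nclusters)]
    counts = [0] * nclusters
    for vector, cluster in zip(data, initialid):
        if not 0 <= cluster < nclusters:
            raise ValueError("cluster numbers must lie between 0 and nclusters-1")
        counts[cluster] += 1
        for j, value in enumerate(vector):
            sums[cluster][j] += value
    if 0 in counts:
        raise ValueError("every cluster needs at least one item")
    return [[s / counts[c] for s in sums[c]] for c in range(nclusters)]


data = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 14.0]]
initialid = [0, 0, 1, 1]  # one cluster number per item

initial_centroids = centroids_from_assignment(data, initialid, nclusters=2)
print(initial_centroids)  # [[1.0, 1.0], [11.0, 12.0]]
```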
279
280 Note that the k-means routine in Bio.Cluster performs automatic repeats of the
281 algorithm, each time starting from a different random initial clustering. See
282 the comment for the npass argument below.
283
284 o calc_centroid_fn:
285
286 This argument specifies how to calculate the cluster centroids, given the data
287 vectors of the items that belong to each cluster. By default, the mean over the
288 vectors is calculated. A user-defined function can also be used.
289
290 Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
291 method to calculate the cluster centroid is determined by the argument method,
292 which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
293 calculate the mean ('a').
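
The difference between the two settings can be pictured with the standard
library alone. This is a sketch with a hypothetical centroid helper that mimics
the 'a' and 'm' choices described above; it is not the Bio.Cluster code.

```python
from statistics import mean, median


def centroid(vectors, method="a"):
    """Column-wise centroid: arithmetic mean ('a') or median ('m')."""
    if method == "a":
        combine = mean
    elif method == "m":
        combine = median
    else:
        raise ValueError("method must be 'a' or 'm'")
    # zip(*vectors) iterates over the columns of the cluster's data vectors.
    return [combine(column) for column in zip(*vectors)]


cluster = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]

mean_centroid = centroid(cluster, "a")
median_centroid = centroid(cluster, "m")

print(mean_centroid)    # [4.0, 20.0]
print(median_centroid)  # [2.0, 20.0]
```

The median is less sensitive to the outlying value 9.0 in the first column,
which is the usual motivation for k-medians.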
294
295 o max_iterations:
296
297 The cluster routine in kMeans.py has an argument max_iterations, which is used
298 to stop the iteration if the routine does not converge after the given number of
299 iterations.
300
301 The kcluster routine in Bio.Cluster does not have such an argument. The failure
302 of a k-means algorithm to converge is due to the occurrence of periodic
303 clustering solutions during the course of the k-means algorithm. The kcluster
304 routine in Bio.Cluster automatically checks for the occurrence of such a
305 periodicity in the solutions. If a periodic behavior is detected, the algorithm
306 is interrupted and the last clustering solution is returned. Accordingly, the
307 kcluster routine is guaranteed to return a clustering solution. Also see the
308 discussion of the npass argument below.
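
One simple way to picture such a periodicity check is to remember every
assignment seen so far and stop as soon as one recurs. The sketch below does
this with a toy step function standing in for one k-means iteration. All names
are hypothetical, and the C code in Bio.Cluster may detect periodicity
differently.

```python
def iterate_until_repeat(step, assignment):
    """Apply step repeatedly; stop when an assignment is seen a second time.

    Returns (assignment, iterations, converged). converged is True when the
    assignment stopped changing, and False when a longer cycle was detected.
    """
    seen = {tuple(assignment)}
    iterations = 0
    while True:
        new_assignment = step(assignment)
        iterations += 1
        if new_assignment == assignment:
            return assignment, iterations, True
        if tuple(new_assignment) in seen:
            # Periodic behavior: return the last solution instead of looping.
            return new_assignment, iterations, False
        seen.add(tuple(new_assignment))
        assignment = new_assignment


# Toy step that flips every label, so the assignments cycle with period 2.
cycling = iterate_until_repeat(lambda a: [1 - x for x in a], [0, 1])

# Toy step that moves every item to cluster 0, so the process settles.
settling = iterate_until_repeat(lambda a: [0] * len(a), [0, 1])

print(cycling)   # ([0, 1], 2, False)
print(settling)  # ([0, 0], 2, True)
```

Because the number of possible assignments is finite, a loop of this kind
always terminates, which is why no max_iterations argument is needed.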
309
310 o update_fn:
311
312 The argument update_fn to cluster in kMeans.py is a hook function that is
313 called at the beginning of every iteration and passed the iteration number,
314 cluster centroids, and current cluster assignments. It is used by xkMeans.py,
315 which provides a visualization of k-means clustering. Currently there is no
316 equivalent in Bio.Cluster.
317
318
319 Other arguments for Bio.Cluster's kcluster.
320 -------------------------------------------
321
322 Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
323 kMeans.py's cluster.
324
325 o mask:
326
327 Microarray experiments tend to suffer from a large amount of missing data. The
328 argument mask to Bio.Cluster's kcluster lets the user specify which data are
329 missing. This argument is an array with the same shape as data, and contains
330 a 1 for each data point that is present, and a 0 for a missing data point:
331
332 mask[i,j]==1: data[i,j] is valid
333 mask[i,j]==0: data[i,j] is a missing data point
334
335 Missing data points are ignored by the clustering algorithm. By default, mask
336 is an array containing 1's everywhere.
337
338 o weight:
339
340 The weight argument is used to put different weights on different data points.
341 For example, when clustering genes based on their gene expression profile, we
342 may want to attach a bigger weight to some microarrays compared to others. By
343 default, the weight argument contains equal weights of 1.0 for all data points.
344 Note that for row-wise clustering, the weight argument is a 1D vector whose
345 length is equal to the number of columns. For column-wise clustering, the length
346 of this argument is equal to the number of rows.
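
The combined effect of mask and weight on a distance calculation can be
sketched as follows. This is an illustration under stated assumptions: the
function name is hypothetical, it uses the sum of squared differences, and the
actual C code in Bio.Cluster may normalize the result differently.

```python
def masked_weighted_distance(row1, row2, mask1, mask2, weight):
    """Weighted sum of squared differences over columns present in both rows."""
    total = 0.0
    for x, y, m1, m2, w in zip(row1, row2, mask1, mask2, weight):
        if m1 and m2:  # skip columns where either value is missing
            total += w * (x - y) ** 2
    return total


gene_a = [1.0, 2.0, 0.0]  # the third value is a placeholder for missing data
gene_b = [2.0, 4.0, 9.0]
mask_a = [1, 1, 0]        # 1 = present, 0 = missing
mask_b = [1, 1, 1]
weight = [1.0, 0.5, 1.0]  # one weight per column (row-wise clustering)

distance = masked_weighted_distance(gene_a, gene_b, mask_a, mask_b, weight)
print(distance)  # 3.0
```

The third column contributes nothing because gene_a has no measurement there,
and the second column counts only half as much as the first.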
347
348 o npass:
349
350 Typical implementations of the k-means clustering algorithm rely on a random
351 initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
352 a clearly defined goal, which is to minimize the within-cluster sum of
353 distances. Different k-means clustering solutions (based on different initial
354 clusterings) can therefore be compared to each other directly. In order to
355 increase the chance of finding the optimal k-means clustering solution, the
356 k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
357 each time starting from a different initial random clustering. The best
358 clustering solution is returned to the user, together with the number of npass
359 attempts in which it was found. For more information, see the output variable
360 nfound below.
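
The idea of repeating the algorithm and counting how often the best solution
is found can be sketched in plain Python. Everything below (function names,
the seed argument, the iteration cap, the toy data) is an assumption made for
illustration; it is a minimal k-means, not the Bio.Cluster implementation.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def single_pass(data, nclusters, rng):
    """One k-means run from a random initial assignment: (clusterid, error)."""
    # Seed every cluster with at least one item, then shuffle the labels.
    clusterid = list(range(nclusters))
    clusterid += [rng.randrange(nclusters) for _ in range(len(data) - nclusters)]
    rng.shuffle(clusterid)
    centroids = [None] * nclusters
    for _ in range(100):  # safety cap for this sketch
        for c in range(nclusters):
            members = [v for v, cid in zip(data, clusterid) if cid == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        new_clusterid = [
            min(range(nclusters), key=lambda c: squared_distance(v, centroids[c]))
            for v in data
        ]
        if new_clusterid == clusterid:
            break
        clusterid = new_clusterid
    error = sum(squared_distance(v, centroids[c]) for v, c in zip(data, clusterid))
    return clusterid, error


def canonical(clusterid):
    """Relabel clusters by first appearance so equal partitions compare equal."""
    relabel = {}
    return tuple(relabel.setdefault(c, len(relabel)) for c in clusterid)


def repeated_kmeans(data, nclusters, npass, seed=0):
    """Run single_pass npass times; return (clusterid, error, nfound)."""
    rng = random.Random(seed)
    best_id, best_error, nfound = None, None, 0
    for _ in range(npass):
        clusterid, error = single_pass(data, nclusters, rng)
        if best_error is None or error < best_error - 1e-12:
            best_id, best_error, nfound = clusterid, error, 1
        elif canonical(clusterid) == canonical(best_id):
            nfound += 1
    return best_id, best_error, nfound


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
clusterid, error, nfound = repeated_kmeans(data, nclusters=2, npass=10)
print(clusterid, error, nfound)
```

Solutions are compared after relabeling because two runs can find the same
partition while numbering the clusters differently.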
361
362
363 Return values
364 -------------
365
366 The cluster routine in kMeans.py returns two values:
367
368 o centroids
369 o clusters
370
371 The kcluster routine in Bio.Cluster returns four values:
372
373 o clusterid
374 o centroids
375 o error
376 o nfound
377
378
379 o centroids:
380
381 The centroids return value contains the centroids of the k clusters that were
382 found, and corresponds to the centroids return value from Bio.Cluster's
383 kcluster routine.
384
385 o clusters:
386
387 The clusters return value contains the number of the cluster to which each
388 vector was assigned. The corresponding return value in Bio.Cluster's kcluster
389 is clusterid.
390
391 o error:
392
393 The error return value from Bio.Cluster's kcluster is the within-cluster sum of
394 distances for the optimal clustering solution that was found. This value can be
395 used to compare different clustering solutions to each other.
396
397 o nfound:
398
399 The nfound return value from Bio.Cluster's kcluster shows in how many of the
400 npass runs the optimal clustering solution was found. Accordingly, nfound is at
401 least 1 and at most equal to npass. A large value for nfound is an indication
402 that the clustering solution that was found is optimal. On the other hand, if
403 nfound is equal to 1, it is quite possible that a better clustering solution
404 exists than the one found by kcluster.