1 This file provides documentation for modules in Biopython that have been moved
2 or deprecated in favor of other modules. It is meant as a quick, easy-to-find
3 reference on how to update your code so that it works again.
4
5 Numeric support
6 ===============
7 Following the release of 1.48, Numeric support in Biopython is discontinued.
8 Limited support is still available for Python modules via backward-compatible
9 imports, but C modules will not work. Please move to NumPy.
10
11 Martel
12 ======
13 Declared obsolete in Release 1.48, deprecated in Release 1.49
14
15 Bio.Mindy
16 =========
17 Declared obsolete in Release 1.48, deprecated in Release 1.49
18
19 Bio.builders
20 ============
21 Part of the Martel/Mindy infrastructure, this was deprecated in Release 1.49
22
23 Bio.Writer and Bio.writers
24 ==========================
25 Deprecated in Release 1.48
26
27 Bio.MetaTool
28 ============
29 Deprecated in Release 1.48, this was a parser for the output of MetaTool 3.5
30 which is now obsolete.
31
32 Bio.GenBank
33 ===========
34 The online functionality (search_for, download_many, and NCBIDictionary) was
35 declared obsolete in Release 1.48, with the intention of an official deprecation
36 in the following release. Please use Bio.Entrez instead.
37
38 Bio.PubMed
39 ==========
40 Declared obsolete in Release 1.48, with the intention of an official deprecation
41 in the following release. Please use Bio.Entrez instead.
42
43 Bio.EUtils
44 ==========
45 Deprecated in favor of Bio.Entrez in Release 1.48
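
As a rough illustration of the move to Bio.Entrez, the sketch below downloads
one GenBank record as text. The function name fetch_genbank_text, the database
"nucleotide" and the return type "gb" are choices made for this example only;
check the Bio.Entrez documentation for the options that suit your data.

```python
def fetch_genbank_text(accession, email):
    """Download one GenBank record as plain text via NCBI Entrez.

    Hypothetical helper for illustration; needs Biopython and network access.
    """
    # Imported lazily so that the sketch can be loaded without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.efetch(db="nucleotide", id=accession,
                           rettype="gb", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```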
46
47 Bio.Blast.NCBIWWW
48 =================
49 The HTML BLAST parser was deprecated as of Release 1.48
50 The deprecated functions blast and blasturl were removed in Release 1.44
51
52 Bio.Saf
53 =======
54 Deprecated as of Release 1.48, as it appears to have no users, and relies
55 on Martel which doesn't work properly with mxTextTools 3.0
56
57 Bio.NBRF
58 ========
59 Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO
60
61 Bio.IntelliGenetics
62 ===================
63 Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO
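
Code that used Bio.NBRF or Bio.IntelliGenetics can usually be replaced by a
call to Bio.SeqIO.parse with the format name "pir" or "ig". The helper below is
a minimal sketch; the name read_records is invented for this example.

```python
def read_records(filename, fmt):
    """Return all records of a "pir" or "ig" file as a list of SeqRecord objects."""
    if fmt not in ("pir", "ig"):
        raise ValueError("expected 'pir' or 'ig', got %r" % (fmt,))
    # Imported lazily so that the sketch can be loaded without Biopython.
    from Bio import SeqIO

    with open(filename) as handle:
        return list(SeqIO.parse(handle, fmt))
```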
64
65 Bio.ECell
66 =========
67 Deprecated as of Release 1.47, as it appears to have no users, and the code
68 does not seem relevant for ECell 3.
69
70 Bio.Rebase
71 ==========
72 Deprecated as of Release 1.46.
73
74 Bio.Gobase
75 ==========
76 Deprecated as of Release 1.46.
77
78 Bio.CDD
79 =======
80 Deprecated as of Release 1.46.
81
82 Bio.biblio
83 ==========
84 Deprecated as of Release 1.45, removed in Release 1.48
85
86 Bio.WWW
87 =======
88 The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
89 The remaining stub Bio.WWW was deprecated in Release 1.48.
90
91 The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
92 is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.
93
94 Bio.SeqIO
95 =========
96 The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favour of
97 the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47
98
99 Bio.lcc
100 =======
101 Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46
102
103 Bio.crc
104 =======
105 Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46
106
107 Bio.FormatIO
108 ============
109 This was removed in Release 1.44
110
111 Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
112 ======================================================================
113 These were deprecated in Release 1.44, and removed in Release 1.49
114
115 Bio.Kabat
116 =========
117 This was deprecated in Release 1.43 and removed in Release 1.44
118
119 Bio.SeqUtils
120 ============
121 The functions 'complement' and 'antiparallel' in Bio.SeqUtils have been
122 deprecated as of Release 1.31, and removed in Release 1.43.
123 Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.
124
125 Bio.GFF
126 =======
127 The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy have been
128 deprecated as of Release 1.31, and removed in Release 1.43.
129 Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.
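
One way to use the Bio.Seq replacements is through the complement and
reverse_complement methods of a Seq object, as sketched below. The helper name
complement_pair is made up for this example.

```python
def complement_pair(dna):
    """Return (complement, reverse complement) of a DNA string as strings."""
    # Imported lazily so that the sketch can be loaded without Biopython.
    from Bio.Seq import Seq

    seq = Seq(dna)
    return str(seq.complement()), str(seq.reverse_complement())
```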
130
131 Bio.sequtils
132 ============
133 Deprecated as of Release 1.30, removed in Release 1.42
134 Use Bio.SeqUtils instead.
135
136 Bio.SVM
137 =======
138 Deprecated as of Release 1.30, removed in Release 1.42
139 The Support Vector Machine code in Biopython has been superseded by a
140 more robust (and maintained) SVM library, which includes a Python
141 interface. We recommend using LIBSVM:
142
143 http://www.csie.ntu.edu.tw/~cjlin/libsvm/
144
145 Bio.RecordFile
146 ==============
147 Deprecated as of Release 1.30, removed in Release 1.42
148 RecordFile wasn't completely implemented and duplicates the work
149 of most standard parsers. We recommend using a specific iterator
150 (Bio.Fasta.Iterator for example) without a parser to get back
151 text records.
152
153 Bio.kMeans and Bio.xkMeans
154 ==========================
155 Deprecated as of Release 1.30, removed in Release 1.42
156
157 The k-Means algorithm is an algorithm for unsupervised clustering of data.
158 Biopython includes an implementation of the k-means clustering algorithm
159 in kMeans.py. Recently, a larger set of clustering algorithms entered
160 Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
161 the k-means clustering algorithm, the kMeans.py module has been deprecated.
162 Below you will find a description of how to switch from kMeans.py to
163 Bio.Cluster's kcluster.
164
165 The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
166 The corresponding function in kMeans.py is called cluster. This function takes
167 the following arguments:
168
169 o data
170 o k
171 o distance_fn
172 o init_centroids_fn
173 o calc_centroid_fn
174 o max_iterations
175 o update_fn
176
177 The function kcluster in Bio.Cluster takes the following arguments:
178
179 o data
180 o nclusters
181 o mask
182 o weight
183 o transpose
184 o npass
185 o method
186 o dist
187 o initialid
188
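
The two argument lists can be related by a small translation layer. The sketch
below is only an illustration: kcluster_kwargs and cluster_with_kcluster are
hypothetical helper names, and only keyword arguments named in this file are
passed to kcluster. The result of kcluster is returned unchanged, so that no
assumption is made here about the layout of the returned tuple.

```python
def kcluster_kwargs(k, calc_centroid="mean", npass=1):
    """Map kMeans.py-style choices onto keyword arguments for kcluster."""
    methods = {"mean": "a", "median": "m"}
    if calc_centroid not in methods:
        raise ValueError("kcluster supports only 'mean' or 'median' centroids")
    return {
        "nclusters": k,                    # was: k
        "method": methods[calc_centroid],  # was: calc_centroid_fn
        "dist": "e",                       # was: distance_fn (Euclidean)
        "npass": npass,                    # no equivalent in kMeans.py
    }


def cluster_with_kcluster(data, k, npass=10):
    """Call Bio.Cluster's kcluster with translated arguments."""
    # Imported lazily so that the sketch can be loaded without Biopython.
    from Bio.Cluster import kcluster

    return kcluster(data, **kcluster_kwargs(k, npass=npass))
```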
189
190 Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
191 -----------------------------------------------------------------------
192
193
194 o data:
195
196 In kMeans.py, data is a list of vectors, each containing the same number of
197 data points. Within the context of clustering genes based on their gene
198 expression values, each vector would correspond to the gene expression data of
199 one particular gene, and the values in the vector would correspond to the
200 measured gene expression value by the different microarrays. The cluster
201 routine in kMeans.py always performs a row-wise clustering by grouping vectors.
202
203 The argument data to Bio.Cluster's kcluster has the same structure as in
204 kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
205 the transpose argument. If transpose==0 (the default value), kcluster performs
206 row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
207 performs column-wise clustering. The same behavior can be obtained, of course,
208 by transposing the data array before calling kcluster.
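
For data stored as a list of lists, transposing by hand can be sketched in
plain Python as follows (transpose_rows is a name chosen for this example):

```python
def transpose_rows(data):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*data)]


genes_by_arrays = [[1, 2, 3],
                   [4, 5, 6]]
arrays_by_genes = transpose_rows(genes_by_arrays)
```

Passing arrays_by_genes with the default transpose value should then give the
same grouping as passing genes_by_arrays with transpose==1.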
209
210
211 o k:
212
213 The desired number of clusters is specified by the input argument k in
214 kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.
215
216 o distance_fn:
217
218 In kMeans.py, the argument distance_fn represents the distance function to
219 calculate the distances between items and cluster centroids. This argument
220 corresponds to a true Python function. The default value is the Euclidean
221 distance, implemented as distance.euclidean in distance.py. User-defined
222 distance functions can also be used.
223
224 The k-means routine in Bio.Cluster does not allow user-specified distance
225 functions. Instead, it provides the following nine built-in distance functions,
226 depending on the argument dist:
227
228 dist=='e': Euclidean distance
229 dist=='h': Harmonically summed Euclidean distance
230 dist=='b': City-block distance
231 dist=='c': Pearson correlation
232 dist=='a': absolute value of the Pearson correlation
233 dist=='u': uncentered correlation
234 dist=='x': absolute uncentered correlation
235 dist=='s': Spearman's rank correlation
236 dist=='k': Kendall's tau
237
238 User-defined distance functions are possible only by modifying the C code in
239 cluster.c (which may not be as hard as it sounds). The default distance function
240 is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
241 Euclidean distance is defined as the sum of squared differences, whereas in
242 kMeans.py the square root of this quantity is taken. This does not affect the
243 clustering result.
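
The reason the square root does not matter is that it preserves the ordering
of distances, so each item is still assigned to the same nearest centroid. The
plain-Python sketch below illustrates this, using the sum of squared
differences as described above; the function names are invented for the
example and are not part of either module.

```python
import math


def squared_euclidean(u, v):
    """Sum of squared differences (the Bio.Cluster convention described above)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared differences (the kMeans.py convention)."""
    return math.sqrt(squared_euclidean(u, v))


def nearest(item, centroids, distance):
    """Index of the centroid closest to item under the given distance."""
    return min(range(len(centroids)),
               key=lambda i: distance(item, centroids[i]))


centroids = [[0.0, 0.0], [3.0, 3.0], [1.0, 2.0]]
item = [1.0, 1.0]
same_choice = (nearest(item, centroids, squared_euclidean)
               == nearest(item, centroids, euclidean))
```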
244
245 o init_centroids_fn:
246
247 This function specifies the initial choice for the cluster centroids. By
248 default, cluster in kMeans.py uses a random initial choice of cluster centroids
249 by randomly choosing k data vectors from the input vectors in the data input
250 argument. Alternatively, the user can specify a user-defined function to choose
251 the initial cluster centroids.
252
253 In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
254 assignment instead of an initial choice of cluster centroids. As far as I know,
255 these two initialization methods are equivalent in practice. Similar to the
256 cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
257 assignment of items to clusters. Alternatively, users can specify a
258 (deterministic) initial clustering via the initialid argument. This argument is
259 None by default. If not None, it should be a 1D array (or list) containing the
260 number (between 0 and nclusters-1) of the cluster to which each item is
261 assigned initially.
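
If existing code relies on a custom init_centroids_fn, one possible bridge is
to turn the chosen centroids into an initial assignment by giving each item the
number of its nearest centroid. This is a sketch under that assumption, not
something either module provides; initialid_from_centroids is a hypothetical
name, and it is probably wise to check that every cluster number occurs at
least once before passing the result as initialid.

```python
def initialid_from_centroids(data, centroids):
    """Assign each row of data to the nearest of the given centroids."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [min(range(len(centroids)),
                key=lambda c: sqdist(row, centroids[c]))
            for row in data]
```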
262
263 Note that the k-means routine in Bio.Cluster performs automatic repeats of the
264 algorithm, each time starting from a different random initial clustering. See
265 the comment for the npass argument below.
266
267 o calc_centroid_fn:
268
269 This argument specifies how to calculate the cluster centroids, given the data
270 vectors of the items that belong to each cluster. By default, the mean over the
271 vectors is calculated. A user-defined function can also be used.
272
273 Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
274 method to calculate the cluster centroid is determined by the argument method,
275 which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
276 calculate the mean ('a').
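
The difference between the two settings can be illustrated in plain Python.
The centroid function below is written for this example only; it computes a
column-wise mean for 'a' and a column-wise median for 'm'.

```python
from statistics import mean, median


def centroid(rows, method="a"):
    """Column-wise mean ('a') or median ('m') of the rows in one cluster."""
    reducer = {"a": mean, "m": median}[method]
    return [reducer(column) for column in zip(*rows)]


cluster_rows = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]
```

With these rows, the outlying value 9.0 pulls the mean of the first column
further than the median.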
277
278 o max_iterations:
279
280 The cluster routine in kMeans.py has an argument max_iterations, which is used
281 to stop the iteration if the routine does not converge after the given number of
282 iterations.
283
284 The kcluster routine in Bio.Cluster does not have such an argument. The failure
285 of a k-means algorithm to converge is due to the occurrence of periodic
286 clustering solutions during the course of the k-means algorithm. The kcluster
287 routine in Bio.Cluster automatically checks for the occurrence of such a
288 periodicity in the solutions. If a periodic behavior is detected, the algorithm
289 is interrupted and the last clustering solution is returned. Accordingly, the
290 kcluster routine is guaranteed to return a clustering solution. Also see the
291 discussion of the npass argument below.
292
293 o update_fn:
294
295 The argument update_fn to cluster in kMeans.py is a hook function that is
296 called at the beginning of every iteration and passed the iteration number,
297 cluster centroids, and current cluster assignments. It is used by xkMeans.py,
298 which provides a visualization of k-means clustering. Currently there is no
299 equivalent in Bio.Cluster.
300
301
302 Other arguments for Bio.Cluster's kcluster.
303 -------------------------------------------
304
305 Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
306 kMeans.py's cluster.
307
308 o mask:
309
310 Microarray experiments tend to suffer from large amounts of missing data. The
311 argument mask to Bio.Cluster's kcluster lets the user specify which data are
312 missing. This argument is an array with the same shape as data, and contains
313 a 1 for each data point that is present, and a 0 for a missing data point:
314
315 mask[i,j]==1: data[i,j] is valid
316 mask[i,j]==0: data[i,j] is a missing data point
317
318 Missing data points are ignored by the clustering algorithm. By default, mask
319 is an array containing 1's everywhere.
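
A mask of this form can be derived from the data themselves. The sketch below
assumes that missing values are marked as None or NaN in a list of lists, which
is a convention chosen for this example; build_mask is a hypothetical helper.

```python
import math


def build_mask(data):
    """Return (filled_data, mask); mask holds 0 where a value is None or NaN."""
    def missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    mask = [[0 if missing(x) else 1 for x in row] for row in data]
    # Masked entries are ignored, so any placeholder value will do here.
    filled = [[0.0 if missing(x) else float(x) for x in row] for row in data]
    return filled, mask
```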
320
321 o weight:
322
323 The weight argument is used to put different weights on different data points.
324 For example, when clustering genes based on their gene expression profile, we
325 may want to attach a bigger weight to some microarrays compared to others. By
326 default, the weight argument contains equal weights of 1.0 for all data points.
327 Note that for row-wise clustering, the weight argument is a 1D vector whose
328 length is equal to the number of columns. For column-wise clustering, the length
329 of this argument is equal to the number of rows.
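
The length rule can be checked before calling kcluster. The helper below is a
sketch written for this file; check_weight is not part of Bio.Cluster.

```python
def check_weight(data, weight, transpose=0):
    """Validate the length of weight against the shape of data."""
    nrows, ncols = len(data), len(data[0])
    # Row-wise clustering weights the columns; column-wise weights the rows.
    expected = nrows if transpose else ncols
    if len(weight) != expected:
        raise ValueError("weight has length %d, expected %d"
                         % (len(weight), expected))
    return [float(w) for w in weight]
```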
330
331 o npass:
332
333 Typical implementations of the k-means clustering algorithm rely on a random
334 initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
335 a clearly defined goal, which is to minimize the within-cluster sum of
336 distances. Different k-means clustering solutions (based on different initial
337 clusterings) can therefore be compared to each other directly. In order to
338 increase the chance of finding the optimal k-means clustering solution, the
339 k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
340 each time starting from a different initial random clustering. The best
341 clustering solution, as well as in how many of the npass attempts it was found,
342 is returned to the user. For more information, see the output variable nfound
343 below.
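
The bookkeeping behind repeated runs can be sketched in plain Python. The code
below is not the Bio.Cluster implementation; it only illustrates the idea of
keeping the solution with the smallest error and counting how often that same
partition was found. Two solutions are treated as equal when they differ only
in how the clusters are numbered, which is an assumption of this sketch.

```python
def canonical(clusterid):
    """Renumber clusters in order of first appearance."""
    relabel = {}
    return [relabel.setdefault(c, len(relabel)) for c in clusterid]


def best_of(runs):
    """Pick the best of several (clusterid, error) pairs and count repeats."""
    best_ids, best_error = min(runs, key=lambda run: run[1])
    target = canonical(best_ids)
    nfound = sum(1 for ids, _ in runs if canonical(ids) == target)
    return best_ids, best_error, nfound


runs = [([0, 0, 1], 2.0), ([1, 1, 0], 2.0), ([0, 1, 1], 3.5)]
best_ids, best_error, nfound = best_of(runs)
```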
344
345
346 Return values
347 -------------
348
349 The cluster routine in kMeans.py returns two values:
350
351 o centroids
352 o clusters
353
354 The kcluster routine in Bio.Cluster returns four values:
355
356 o clusterid
357 o centroids
358 o error
359 o nfound
360
361
362 o centroids:
363
364 The centroids return value contains the centroids of the k clusters that were
365 found, and corresponds to the centroids return value from Bio.Cluster's
366 kcluster routine.
367
368 o clusters:
369
370 The clusters return value contains the number of the cluster to which each
371 vector was assigned. The corresponding return value in Bio.Cluster's kcluster
372 is clusterid.
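
Because the assignment is one cluster number per item, collecting the members
of each cluster takes only a few lines. The helper name members_by_cluster is
invented for this example.

```python
def members_by_cluster(clusterid):
    """Map each cluster number to the list of item indices assigned to it."""
    groups = {}
    for index, cluster in enumerate(clusterid):
        groups.setdefault(cluster, []).append(index)
    return groups
```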
373
374 o error:
375
376 The error return value from Bio.Cluster's kcluster is the within-cluster sum of
377 distances for the optimal clustering solution that was found. This value can be
378 used to compare different clustering solutions to each other.
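
As a rough illustration of what this quantity measures, the sketch below adds
up the squared differences between each item and the centroid of its cluster.
It assumes the Euclidean setting without mask or weights; the value reported
by kcluster depends on the dist, mask and weight arguments and may be scaled
differently.

```python
def within_cluster_sum(data, clusterid, centroids):
    """Sum of squared differences between items and their cluster centroids."""
    total = 0.0
    for row, cluster in zip(data, clusterid):
        total += sum((a - b) ** 2 for a, b in zip(row, centroids[cluster]))
    return total
```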
379
380 o nfound:
381
382 The nfound return value from Bio.Cluster's kcluster shows in how many of the
383 npass runs the optimal clustering solution was found. Accordingly, nfound is at
384 least 1 and at most equal to npass. A large value for nfound is an indication
385 that the clustering solution that was found is optimal. On the other hand, if
386 nfound is equal to 1, it is very well possible that a better clustering solution
387 exists than the one found by kcluster.