This file provides documentation for modules in Biopython that have been moved
or deprecated in favor of other modules. It is meant as a quick, easy-to-find
reference on how to update your code so that it works again.

Numeric support
===============
Following the release of 1.48, Numeric support in Biopython is discontinued.
Limited support is still available for Python modules via backward-compatible
imports, but C modules will not work. Please move to NumPy.

Martel
======
Declared obsolete in Release 1.48, deprecated in Release 1.49.

Bio.Mindy
=========
Declared obsolete in Release 1.48, deprecated in Release 1.49.

Bio.builders
============
Part of the Martel/Mindy infrastructure, this was deprecated in Release 1.49.

Bio.MetaTool
============
Deprecated in Release 1.48, this was a parser for the output of MetaTool 3.5,
which is now obsolete.

Bio.GenBank
===========
The online functionality (search_for, download_many, and NCBIDictionary) was
declared obsolete in Release 1.48, with the intention of an official deprecation
in the following release. Please use Bio.Entrez instead.

Bio.PubMed
==========
Declared obsolete in Release 1.48, with the intention of an official deprecation
in the following release. Please use Bio.Entrez instead.

Bio.EUtils
==========
Deprecated in favor of Bio.Entrez in Release 1.48.

Bio.Blast.NCBIWWW
=================
The HTML BLAST parser was deprecated as of Release 1.48.
The deprecated functions blast and blasturl were removed in Release 1.44.

Bio.Saf
=======
Deprecated as of Release 1.48, as it appears to have no users, and relies
on Martel, which doesn't work properly with mxTextTools 3.0.

Bio.NBRF
========
Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO.

Bio.IntelliGenetics
===================
Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO.

Bio.ECell
=========
Deprecated as of Release 1.47, as it appears to have no users, and the code
does not seem relevant for ECell 3.

Bio.Rebase
==========
Deprecated as of Release 1.46.

Bio.Gobase
==========
Deprecated as of Release 1.46.

Bio.CDD
=======
Deprecated as of Release 1.46.

Bio.biblio
==========
Deprecated as of Release 1.45, removed in Release 1.48.

Bio.WWW
=======
The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
The remaining stub Bio.WWW was deprecated in Release 1.48.

The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.

Bio.SeqIO
=========
The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favor of
the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47.

Bio.lcc
=======
Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46.

Bio.crc
=======
Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46.

Bio.FormatIO
============
This was removed in Release 1.44.

Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
======================================================================
These were deprecated in Release 1.44, and removed in Release 1.49.

Bio.Kabat
=========
This was deprecated in Release 1.43 and removed in Release 1.44.

Bio.SeqUtils
============
The functions 'complement' and 'antiparallel' in Bio.SeqUtils were
deprecated as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

Bio.GFF
=======
The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy were
deprecated as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

Bio.sequtils
============
Deprecated as of Release 1.30, removed in Release 1.42.
Use Bio.SeqUtils instead.

Bio.SVM
=======
Deprecated as of Release 1.30, removed in Release 1.42.
The Support Vector Machine code in Biopython has been superseded by a
more robust (and maintained) SVM library, which includes a Python
interface. We recommend using LIBSVM:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Bio.RecordFile
==============
Deprecated as of Release 1.30, removed in Release 1.42.
RecordFile wasn't completely implemented and duplicates the work
of most standard parsers. We recommend using a specific iterator
(Bio.Fasta.Iterator for example) without a parser to get back
text records.

Bio.kMeans and Bio.xkMeans
==========================
Deprecated as of Release 1.30, removed in Release 1.42.

The k-means algorithm is an algorithm for unsupervised clustering of data.
Biopython included an implementation of the k-means clustering algorithm
in kMeans.py. A larger set of clustering algorithms later entered
Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
the k-means clustering algorithm, the kMeans.py module was deprecated.
Below you will find a description of how to switch from kMeans.py to
Bio.Cluster's kcluster.

The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
The corresponding function in kMeans.py is called cluster. This function takes
the following arguments:

o data
o k
o distance_fn
o init_centroids_fn
o calc_centroid_fn
o max_iterations
o update_fn

The function kcluster in Bio.Cluster takes the following arguments:

o data
o nclusters
o mask
o weight
o transpose
o npass
o method
o dist
o initialid


Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
-----------------------------------------------------------------------


o data:

In kMeans.py, data is a list of vectors, each containing the same number of
data points. Within the context of clustering genes based on their gene
expression values, each vector would correspond to the gene expression data of
one particular gene, and the values in the vector would correspond to the
gene expression values measured by the different microarrays. The cluster
routine in kMeans.py always performs a row-wise clustering by grouping vectors.

The argument data to Bio.Cluster's kcluster has the same structure as in
kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
the transpose argument. If transpose==0 (the default value), kcluster performs
row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
performs column-wise clustering. The same behavior can be obtained, of course,
by transposing the data array before calling kcluster.
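As a minimal illustration of that last point, a list of row vectors can be
transposed in plain Python before it is handed to a row-wise clustering
routine. The helper name transpose_rows is hypothetical and is not part of
Bio.Cluster or kMeans.py.

```python
def transpose_rows(data):
    """Return the transpose of a rectangular list of row vectors."""
    return [list(column) for column in zip(*data)]


# Three genes (rows) measured on two microarrays (columns).
data = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

# After transposing, each row holds one microarray, so a row-wise
# clustering of `transposed` groups microarrays instead of genes.
transposed = transpose_rows(data)
```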


o k:

The desired number of clusters is specified by the input argument k in
kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.

o distance_fn:

In kMeans.py, the argument distance_fn represents the distance function used to
calculate the distances between items and cluster centroids. This argument
is an actual Python function. The default value is the Euclidean
distance, implemented as distance.euclidean in distance.py. User-defined
distance functions can also be used.

The k-means routine in Bio.Cluster does not allow user-specified distance
functions. Instead, it provides the following nine built-in distance functions,
depending on the argument dist:

dist=='e': Euclidean distance
dist=='h': Harmonically summed Euclidean distance
dist=='b': City-block distance
dist=='c': Pearson correlation
dist=='a': absolute value of the Pearson correlation
dist=='u': uncentered correlation
dist=='x': absolute uncentered correlation
dist=='s': Spearman's rank correlation
dist=='k': Kendall's tau

User-defined distance functions are possible only by modifying the C code in
cluster.c (which may not be as hard as it sounds). The default distance function
is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
Euclidean distance is defined as the sum of squared differences, whereas in
kMeans.py the square root of this quantity is taken. This does not affect the
clustering result.
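The reason the square root makes no difference when items are assigned to
centroids is that it is monotonic: whichever centroid is nearest under one
definition is also nearest under the other. The following sketch illustrates
this in plain Python. The function names are hypothetical, and the code is not
taken from either module.

```python
import math


def squared_euclidean(u, v):
    """Sum of squared differences (the definition described for Bio.Cluster)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared differences (as in kMeans.py)."""
    return math.sqrt(squared_euclidean(u, v))


def nearest(item, centroids, distance):
    """Index of the centroid closest to item under the given distance."""
    return min(range(len(centroids)), key=lambda i: distance(item, centroids[i]))


centroids = [[0.0, 0.0], [4.0, 4.0]]
items = [[1.0, 0.5], [3.0, 3.5], [2.5, 2.0]]

# The square root is monotonic, so both definitions rank centroids identically.
by_squared = [nearest(x, centroids, squared_euclidean) for x in items]
by_root = [nearest(x, centroids, euclidean) for x in items]
```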

o init_centroids_fn:

This function specifies the initial choice for the cluster centroids. By
default, cluster in kMeans.py makes a random initial choice of cluster centroids
by randomly choosing k data vectors from the input vectors in the data input
argument. Alternatively, the user can specify a user-defined function to choose
the initial cluster centroids.

In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
assignment instead of an initial choice of cluster centroids. As far as I know,
these two initialization methods are equivalent in practice. Similar to the
cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
assignment of items to clusters. Alternatively, users can specify a
(deterministic) initial clustering via the initialid argument. This argument is
None by default. If not None, it should be a 1D array (or list) containing the
number (between 0 and nclusters-1) of the cluster to which each item is
assigned initially.

Note that the k-means routine in Bio.Cluster performs automatic repeats of the
algorithm, each time starting from a different random initial clustering. See
the comment for the npass argument below.
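Code that relied on a custom init_centroids_fn can presumably keep its choice
of starting centroids by converting them into an initial assignment of the
kind initialid expects. The sketch below assigns each item to its nearest
starting centroid. The helper name assignment_from_centroids and the sample
data are hypothetical.

```python
def assignment_from_centroids(data, centroids):
    """Assign each item to its nearest centroid (squared Euclidean distance)."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(centroids)), key=lambda i: sqdist(item, centroids[i]))
        for item in data
    ]


data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
# Hypothetical starting centroids, e.g. as an init_centroids_fn might pick them.
initial_centroids = [data[0], data[2]]

# One cluster number per item, each between 0 and nclusters-1.
initialid = assignment_from_centroids(data, initial_centroids)
```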

o calc_centroid_fn:

This argument specifies how to calculate the cluster centroids, given the data
vectors of the items that belong to each cluster. By default, the mean over the
vectors is calculated. A user-defined function can also be used.

Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
method to calculate the cluster centroid is determined by the argument method,
which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
calculate the mean ('a').
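The difference between the two choices can be seen on a small cluster that
contains an outlier. This is only an illustrative sketch using Python's
statistics module; the function centroid is hypothetical and merely reuses the
'a' and 'm' codes described above.

```python
from statistics import mean, median


def centroid(vectors, method="a"):
    """Column-wise centroid: arithmetic mean ('a') or median ('m')."""
    if method not in ("a", "m"):
        raise ValueError("method must be 'a' or 'm'")
    combine = mean if method == "a" else median
    return [combine(column) for column in zip(*vectors)]


# Three items in one cluster; the third is an outlier in the first coordinate.
members = [[1.0, 2.0], [2.0, 4.0], [9.0, 3.0]]

mean_centroid = centroid(members, "a")
median_centroid = centroid(members, "m")
```

The median centroid stays close to the bulk of the items, which is the usual
motivation for k-medians clustering.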

o max_iterations:

The cluster routine in kMeans.py has an argument max_iterations, which is used
to stop the iteration if the routine does not converge after the given number of
iterations.

The kcluster routine in Bio.Cluster does not have such an argument. The failure
of a k-means algorithm to converge is due to the occurrence of periodic
clustering solutions during the course of the k-means algorithm. The kcluster
routine in Bio.Cluster automatically checks for the occurrence of such a
periodicity in the solutions. If a periodic behavior is detected, the algorithm
is interrupted and the last clustering solution is returned. Accordingly, the
kcluster routine is guaranteed to return a clustering solution. Also see the
discussion of the npass argument below.
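The idea of stopping on a repeated solution, rather than on an iteration count,
can be sketched as follows. This is not the check implemented in Bio.Cluster's
C code, whose details are not described here; it is a simplified illustration
that remembers every assignment seen so far. The names iterate_until_repeat and
flip are hypothetical.

```python
def iterate_until_repeat(step, assignment):
    """Apply step repeatedly and stop as soon as an assignment recurs.

    Returns the last assignment and the number of steps taken. A recurrence
    after a single step is ordinary convergence; a longer cycle is a periodic
    solution, and the last assignment is returned in that case as well.
    """
    seen = {tuple(assignment)}
    steps = 0
    while True:
        assignment = step(assignment)
        steps += 1
        key = tuple(assignment)
        if key in seen:
            return assignment, steps
        seen.add(key)


def flip(assignment):
    """Toy update rule that alternates between two assignments forever."""
    return [1 - c for c in assignment]


# A fixed iteration cap is unnecessary: the cycle is detected after two steps.
result, steps = iterate_until_repeat(flip, [0, 1, 1])
```

Because there are only finitely many ways to assign items to clusters, a loop
of this kind always terminates when the update rule is deterministic.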

o update_fn:

The argument update_fn to cluster in kMeans.py is a hook function that is
called at the beginning of every iteration and passed the iteration number,
cluster centroids, and current cluster assignments. It is used by xkMeans.py,
which provides a visualization of k-means clustering. Currently there is no
equivalent in Bio.Cluster.
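The correspondences described in this section can be collected in a small
helper that turns kMeans.py-style choices into keyword arguments for kcluster.
This is only a sketch: the helper name kcluster_keywords and the string labels
it accepts are hypothetical conveniences, while the keyword names and
one-letter codes are the ones listed above.

```python
def kcluster_keywords(k, distance="euclidean", centroid="mean", initial=None):
    """Translate kMeans.py-style choices into keyword arguments for kcluster.

    Only built-in options can be expressed; anything else raises ValueError,
    since user-defined functions have no equivalent.
    """
    dist_codes = {"euclidean": "e", "city-block": "b", "pearson": "c"}
    method_codes = {"mean": "a", "median": "m"}
    if distance not in dist_codes:
        raise ValueError("no built-in equivalent for distance %r" % distance)
    if centroid not in method_codes:
        raise ValueError("no built-in equivalent for centroid %r" % centroid)
    return {
        "nclusters": k,                      # replaces k
        "dist": dist_codes[distance],        # replaces distance_fn
        "method": method_codes[centroid],    # replaces calc_centroid_fn
        "initialid": initial,                # replaces init_centroids_fn
    }


keywords = kcluster_keywords(3, distance="euclidean", centroid="median")
```

The resulting dictionary could then be passed on as kcluster(data, **keywords).
The arguments max_iterations and update_fn have no counterpart and are simply
dropped.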


Other arguments for Bio.Cluster's kcluster
------------------------------------------

Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
kMeans.py's cluster.

o mask:

Microarray experiments tend to suffer from a large amount of missing data. The
argument mask to Bio.Cluster's kcluster lets the user specify which data are
missing. This argument is an array with the same shape as data, and contains
a 1 for each data point that is present, and a 0 for a missing data point:

mask[i,j]==1: data[i,j] is valid
mask[i,j]==0: data[i,j] is a missing data point

Missing data points are ignored by the clustering algorithm. By default, mask
is an array containing 1's everywhere.
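What ignoring a missing data point means for a distance calculation can be
sketched in plain Python. The function below is a hypothetical illustration
and not Bio.Cluster's implementation, which may for instance normalize by the
number of valid points.

```python
def masked_squared_distance(u, v, mask_u, mask_v):
    """Sum of squared differences over positions present in both vectors."""
    return sum(
        (a - b) ** 2
        for a, b, present_a, present_b in zip(u, v, mask_u, mask_v)
        if present_a and present_b
    )


row0 = [1.0, 2.0, 0.0]  # third value is missing; 0.0 is only a placeholder
row1 = [2.0, 4.0, 7.0]
mask0 = [1, 1, 0]       # 1 = present, 0 = missing
mask1 = [1, 1, 1]

# Only the first two positions contribute; the placeholder is never used.
d = masked_squared_distance(row0, row1, mask0, mask1)
```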

o weight:

The weight argument is used to put different weights on different data points.
For example, when clustering genes based on their gene expression profile, we
may want to attach a bigger weight to some microarrays compared to others. By
default, the weight argument contains equal weights of 1.0 for all data points.
Note that for row-wise clustering, the weight argument is a 1D vector whose
length is equal to the number of columns. For column-wise clustering, the length
of this argument is equal to the number of rows.
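The effect of the weights on a distance between two rows can be illustrated
with a short sketch. The function name is hypothetical, and the exact weighting
formula used inside Bio.Cluster is an assumption here; the point is only that
there is one weight per column when rows are compared.

```python
def weighted_squared_distance(u, v, weight):
    """Squared differences, each scaled by the weight of its column."""
    if not (len(u) == len(v) == len(weight)):
        raise ValueError("vectors and weight must have the same length")
    return sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weight))


gene_a = [1.0, 5.0, 2.0]
gene_b = [2.0, 3.0, 2.0]

equal = weighted_squared_distance(gene_a, gene_b, [1.0, 1.0, 1.0])
# Trust the second microarray half as much as the others.
downweighted = weighted_squared_distance(gene_a, gene_b, [1.0, 0.5, 1.0])
```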

o npass:

Typical implementations of the k-means clustering algorithm rely on a random
initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
a clearly defined goal, which is to minimize the within-cluster sum of
distances. Different k-means clustering solutions (based on different initial
clusterings) can therefore be compared to each other directly. In order to
increase the chance of finding the optimal k-means clustering solution, the
k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
each time starting from a different initial random clustering. The best
clustering solution, as well as the number of the npass attempts in which it
was found, is returned to the user. For more information, see the output
variable nfound below.
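The repeat-and-compare strategy can be sketched in pure Python as follows.
This is a simplified illustration under stated assumptions, not the Bio.Cluster
implementation: the function names, the seed parameter, the max_steps cap and
the handling of empty clusters are all choices made for this example.

```python
import random


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_once(data, k, rng, max_steps=100):
    """One k-means run from a random initial assignment.

    Returns (clusterid, error), with error the within-cluster sum of
    squared distances to the final centroids.
    """
    # Random initial assignment in which every cluster has at least one item.
    clusterid = [i % k for i in range(len(data))]
    rng.shuffle(clusterid)
    for _ in range(max_steps):
        centroids = []
        for c in range(k):
            members = [x for x, cid in zip(data, clusterid) if cid == c]
            if not members:  # re-seed an empty cluster with a random item
                members = [rng.choice(data)]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        updated = [min(range(k), key=lambda c: sqdist(x, centroids[c])) for x in data]
        if updated == clusterid:
            break
        clusterid = updated
    error = sum(sqdist(x, centroids[cid]) for x, cid in zip(data, clusterid))
    return clusterid, error


def canonical(clusterid):
    """Relabel clusters in order of first appearance, so that solutions
    differing only in cluster numbering compare equal."""
    labels = {}
    return tuple(labels.setdefault(c, len(labels)) for c in clusterid)


def kmeans_npass(data, k, npass, seed=0):
    """Repeat k-means npass times; return the best solution found, its
    error, and the number of passes in which that solution was found."""
    rng = random.Random(seed)
    best, best_error, nfound = None, None, 0
    for _ in range(npass):
        clusterid, error = kmeans_once(data, k, rng)
        solution = canonical(clusterid)
        if best is None or error < best_error:
            best, best_error, nfound = solution, error, 1
        elif solution == best:
            nfound += 1
    return list(best), best_error, nfound


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
clusterid, error, nfound = kmeans_npass(data, k=2, npass=5)
```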


Return values
-------------

The cluster routine in kMeans.py returns two values:

o centroids
o clusters

The kcluster routine in Bio.Cluster returns four values:

o clusterid
o centroids
o error
o nfound


o centroids:

The centroids return value contains the centroids of the k clusters that were
found, and corresponds to the centroids return value from Bio.Cluster's
kcluster routine.

o clusters:

The clusters return value contains the number of the cluster to which each
vector was assigned. The corresponding return value in Bio.Cluster's kcluster
is clusterid.
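Code that needs the members of each cluster, rather than one cluster number per
item, can regroup such a list with a few lines of Python. The helper name
group_by_cluster and the sample values are hypothetical.

```python
def group_by_cluster(clusterid, nclusters):
    """Return, for each cluster, the indices of the items assigned to it."""
    groups = [[] for _ in range(nclusters)]
    for index, cluster in enumerate(clusterid):
        groups[cluster].append(index)
    return groups


# One cluster number per item, as described for clusters and clusterid.
clusterid = [0, 1, 1, 0, 2]
groups = group_by_cluster(clusterid, 3)
```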

o error:

The error return value from Bio.Cluster's kcluster is the within-cluster sum of
distances for the optimal clustering solution that was found. This value can be
used to compare different clustering solutions to each other.

o nfound:

The nfound return value from Bio.Cluster's kcluster shows in how many of the
npass runs the optimal clustering solution was found. Accordingly, nfound is at
least 1 and at most equal to npass. A large value for nfound is an indication
that the clustering solution that was found is optimal. On the other hand, if
nfound is equal to 1, it is quite possible that a better clustering solution
exists than the one found by kcluster.