This file documents Biopython modules that have been moved or deprecated in
favor of other modules. It provides quick, easy-to-find notes on how to update
your code so that it works again.

Numeric support
===============
Following the release of 1.48, Numeric support in Biopython is discontinued.
Limited support is still available for Python modules via backward-compatible
imports, but C modules will not work. Please move to NumPy.

Martel
======
Declared obsolete in Release 1.48, deprecated in Release 1.49

Bio.Mindy
=========
Declared obsolete in Release 1.48, deprecated in Release 1.49

Bio.builders, Bio.Std, Bio.StdHandler, Bio.Decode
=================================================
Part of the Martel/Mindy infrastructure, these were deprecated in Release 1.49

Bio.Writer and Bio.writers
==========================
Deprecated in Release 1.48

Bio.Emboss.Primer
=================
Deprecated in Release 1.48, this parser was replaced by Bio.Emboss.Primer3 and
Bio.Emboss.PrimerSearch.

Bio.MetaTool
============
Deprecated in Release 1.48, this was a parser for the output of MetaTool 3.5
which is now obsolete.

Bio.GenBank
===========
The online functionality (search_for, download_many, and NCBIDictionary) was
declared obsolete in Release 1.48, with the intention of an official deprecation
in the following release. Please use Bio.Entrez instead.

Bio.PubMed
==========
Declared obsolete in Release 1.48, with the intention of an official deprecation
in the following release. Please use Bio.Entrez instead.

Bio.EUtils
==========
Deprecated in favor of Bio.Entrez in Release 1.48

Bio.Blast.NCBIWWW
=================
The HTML BLAST parser was deprecated as of Release 1.48
The deprecated functions blast and blasturl were removed in Release 1.44

Bio.Saf
=======
Deprecated as of Release 1.48, as it appears to have no users, and relies
on Martel which doesn't work properly with mxTextTools 3.0

Bio.NBRF
========
Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO

Bio.IntelliGenetics
===================
Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO

Bio.ECell
=========
Deprecated as of Release 1.47, as it appears to have no users, and the code
does not seem relevant for ECell 3.

Bio.SGMLExtractor
=================
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.Rebase
==========
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.Gobase
==========
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.CDD
=======
Deprecated as of Release 1.46, removed in Release 1.49.

Bio.biblio
==========
Deprecated as of Release 1.45, removed in Release 1.48

Bio.WWW
=======
The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
The remaining stub Bio.WWW was deprecated in Release 1.48.

The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.

Bio.SeqIO
=========
The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favor of
the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47

Bio.Medline.NLMMedlineXML
=========================
Deprecated in Release 1.44, removed in 1.46

Bio.MultiProc
=============
Deprecated in Release 1.44, removed in 1.46

Bio.MarkupEditor
================
Deprecated in Release 1.44, removed in 1.46

Bio.lcc
=======
Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46

Bio.crc
=======
Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46

Bio.FormatIO
============
This was removed in Release 1.44

Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
======================================================================
These were deprecated in Release 1.44, and removed in Release 1.49

Bio.Kabat
=========
This was deprecated in Release 1.43 and removed in Release 1.44

Bio.SeqUtils
============
The functions 'complement' and 'antiparallel' in Bio.SeqUtils were deprecated
as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.

Bio.GFF
=======
The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy were
deprecated as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.
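
A minimal sketch of the replacement calls, assuming Biopython is installed and
that Bio.Seq exposes module-level complement and reverse_complement functions
that accept plain strings. The fallback branch is only an illustrative stand-in
(not Biopython code) so that the snippet also runs without Biopython.

```python
try:
    # Replacements named above for the removed Bio.SeqUtils / Bio.GFF.easy helpers.
    from Bio.Seq import complement, reverse_complement
except ImportError:
    # Illustrative stand-in for unambiguous DNA only (an assumption).
    _DNA = str.maketrans("ACGTacgt", "TGCAtgca")

    def complement(sequence):
        return sequence.translate(_DNA)

    def reverse_complement(sequence):
        return complement(sequence)[::-1]

dna = "ACGTTG"  # made-up sequence
forward = complement(dna)               # base-by-base complement, same direction
antiparallel = reverse_complement(dna)  # complement read in the opposite direction
```

Code that called the old 'antiparallel' helpers would call reverse_complement
here.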

Bio.sequtils
============
Deprecated as of Release 1.30, removed in Release 1.42
Use Bio.SeqUtils instead.

Bio.SVM
=======
Deprecated as of Release 1.30, removed in Release 1.42
The Support Vector Machine code in Biopython has been superseded by a
more robust (and maintained) SVM library, which includes a Python
interface. We recommend using LIBSVM:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Bio.RecordFile
==============
Deprecated as of Release 1.30, removed in Release 1.42
RecordFile wasn't completely implemented and duplicates the work
of most standard parsers. We recommend using a specific iterator
(Bio.Fasta.Iterator for example) without a parser to get back
text records.

Bio.kMeans and Bio.xkMeans
==========================
Deprecated as of Release 1.30, removed in Release 1.42

The k-means algorithm performs unsupervised clustering of data.
Biopython included an implementation of the k-means clustering algorithm
in kMeans.py. Later, a larger set of clustering algorithms entered
Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
the k-means clustering algorithm, the kMeans.py module was deprecated.
Below you will find a description of how to switch from kMeans.py to
Bio.Cluster's kcluster.

The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
The corresponding function in kMeans.py is called cluster. The cluster function
takes the following arguments:

o data
o k
o distance_fn
o init_centroids_fn
o calc_centroid_fn
o max_iterations
o update_fn

The function kcluster in Bio.Cluster takes the following arguments:

o data
o nclusters
o mask
o weight
o transpose
o npass
o method
o dist
o initialid


Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
-----------------------------------------------------------------------


o data:

In kMeans.py, data is a list of vectors, each containing the same number of
data points. Within the context of clustering genes based on their gene
expression values, each vector would correspond to the gene expression data of
one particular gene, and the values in the vector would correspond to the
gene expression values measured by the different microarrays. The cluster
routine in kMeans.py always performs a row-wise clustering by grouping vectors.

The argument data to Bio.Cluster's kcluster has the same structure as in
kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
the transpose argument. If transpose==0 (the default value), kcluster performs
row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
performs column-wise clustering. The same behavior can be obtained, of course,
by transposing the data array before calling kcluster.
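
As a small illustration of the last remark, here is one way to transpose a
list of row vectors in plain Python. The helper name transpose_rows and the
numbers are made up for this sketch.

```python
def transpose_rows(data):
    """Return the columns of a rectangular list of row vectors as new rows."""
    return [list(column) for column in zip(*data)]

# Rows are genes, columns are microarrays (made-up numbers).
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
by_microarray = transpose_rows(data)  # one vector per microarray
# Row-wise clustering of by_microarray groups the columns of data, which is
# what transpose==1 requests for the original array.
```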


o k:

The desired number of clusters is specified by the input argument k in
kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.

o distance_fn:

In kMeans.py, the argument distance_fn represents the distance function to
calculate the distances between items and cluster centroids. This argument
corresponds to a true Python function. The default value is the Euclidean
distance, implemented as distance.euclidean in distance.py. User-defined
distance functions can also be used.

The k-means routine in Bio.Cluster does not allow user-specified distance
functions. Instead, it provides the following nine built-in distance functions,
depending on the argument dist:

    dist=='e': Euclidean distance
    dist=='h': Harmonically summed Euclidean distance
    dist=='b': City-block distance
    dist=='c': Pearson correlation
    dist=='a': absolute value of the Pearson correlation
    dist=='u': uncentered correlation
    dist=='x': absolute uncentered correlation
    dist=='s': Spearman's rank correlation
    dist=='k': Kendall's tau

User-defined distance functions are possible only by modifying the C code in
cluster.c (which may not be as hard as it sounds). The default distance function
is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
Euclidean distance is defined as the sum of squared differences, whereas in
kMeans.py the square root of this quantity is taken. Because the square root is
monotonic, the nearest centroid of an item is the same either way, so this does
not affect the clustering result.
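
The following plain-Python sketch (not code from either module; all names and
numbers are made up) checks on a toy example that the nearest centroid is the
same with and without the square root.

```python
import math

def squared_euclidean(x, y):
    """Sum of squared differences, as described above for Bio.Cluster."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the same sum, as described above for kMeans.py."""
    return math.sqrt(squared_euclidean(x, y))

def nearest(item, centroids, distance):
    """Index of the centroid closest to item under the given distance."""
    return min(range(len(centroids)), key=lambda i: distance(item, centroids[i]))

centroids = [[0.0, 0.0], [5.0, 5.0]]
items = [[1.0, 2.0], [4.0, 3.5], [2.4, 2.4]]
same_assignment = all(
    nearest(v, centroids, euclidean) == nearest(v, centroids, squared_euclidean)
    for v in items
)
```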

o init_centroids_fn:

This function specifies the initial choice for the cluster centroids. By
default, cluster in kMeans.py picks the initial cluster centroids at random by
choosing k of the input vectors in the data argument. Alternatively, the user
can specify a user-defined function to choose the initial cluster centroids.

In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
assignment instead of an initial choice of cluster centroids. As far as I know,
these two initialization methods are equivalent in practice. Similar to the
cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
assignment of items to clusters. Alternatively, users can specify a
(deterministic) initial clustering via the initialid argument. This argument is
None by default. If not None, it should be a 1D array (or list) containing the
number (between 0 and nclusters-1) of the cluster to which each item is
assigned initially.

Note that the k-means routine in Bio.Cluster performs automatic repeats of the
algorithm, each time starting from a different random initial clustering. See
the comment for the npass argument below.
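
If existing code already chooses initial centroids, one possible way to obtain
an initialid list is to assign every item to its nearest chosen centroid. This
is only a sketch: the helper name initialid_from_centroids and the data are
hypothetical, and squared Euclidean distance is assumed.

```python
def initialid_from_centroids(data, centroids):
    """Assign each item to its nearest centroid (squared Euclidean distance)."""
    def sqdist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))

    return [
        min(range(len(centroids)), key=lambda c: sqdist(item, centroids[c]))
        for item in data
    ]

data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]  # made-up items
centroids = [data[0], data[2]]  # e.g. k data vectors picked as initial centroids
initialid = initialid_from_centroids(data, centroids)
# initialid holds one cluster number (0 .. nclusters-1) per item, the form
# described above for kcluster's initialid argument.
```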

o calc_centroid_fn:

This argument specifies how to calculate the cluster centroids, given the data
vectors of the items that belong to each cluster. By default, the mean over the
vectors is calculated. A user-defined function can also be used.

Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
method to calculate the cluster centroid is determined by the argument method,
which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
calculate the mean ('a').
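
To make the two method codes concrete, here is a plain-Python sketch of a
column-wise centroid. It illustrates the idea only and is not Bio.Cluster's C
implementation; the function name centroid is made up.

```python
from statistics import mean, median

def centroid(vectors, method="a"):
    """Column-wise centroid: 'a' = arithmetic mean, 'm' = median."""
    reducers = {"a": mean, "m": median}
    if method not in reducers:
        raise ValueError("method must be 'a' or 'm'")
    reduce_column = reducers[method]
    return [reduce_column(column) for column in zip(*vectors)]

members = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]  # made-up cluster members
mean_centroid = centroid(members, "a")
median_centroid = centroid(members, "m")
```

The median is less sensitive to the outlying value 9.0 in the first column
than the mean is.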

o max_iterations:

The cluster routine in kMeans.py has an argument max_iterations, which is used
to stop the iteration if the routine does not converge after the given number of
iterations.

The kcluster routine in Bio.Cluster does not have such an argument. The failure
of a k-means algorithm to converge is due to the occurrence of periodic
clustering solutions during the course of the k-means algorithm. The kcluster
routine in Bio.Cluster automatically checks for the occurrence of such a
periodicity in the solutions. If a periodic behavior is detected, the algorithm
is interrupted and the last clustering solution is returned. Accordingly, the
kcluster routine is guaranteed to return a clustering solution. Also see the
discussion of the npass argument below.
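
One simple way to detect such periodic behavior is to remember every
assignment seen so far and stop when one recurs. The sketch below shows that
idea in plain Python; it is an assumption for illustration, and the actual
check in Bio.Cluster's C code may work differently. The names
iterate_until_repeat and flip are hypothetical.

```python
def iterate_until_repeat(step, assignment):
    """Apply step repeatedly; stop at a fixed point or a revisited assignment.

    Returns (last_assignment, converged).
    """
    seen = {tuple(assignment)}
    while True:
        updated = step(assignment)
        if updated == assignment:
            return updated, True    # converged: nothing changed
        if tuple(updated) in seen:
            return updated, False   # periodic: this solution occurred before
        seen.add(tuple(updated))
        assignment = updated

def flip(labels):
    """Toy update step that flips every label, so it never settles."""
    return [1 - label for label in labels]

result, converged = iterate_until_repeat(flip, [0, 1, 1])
```

Because there are only finitely many assignments, the loop always terminates.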

o update_fn:

The argument update_fn to cluster in kMeans.py is a hook function that is
called at the beginning of every iteration and passed the iteration number,
cluster centroids, and current cluster assignments. It is used by xkMeans.py,
which provides a visualization of k-means clustering. Currently there is no
equivalent in Bio.Cluster.
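
The argument mapping described in this section can be summarized in a small
helper. This is a sketch: the helper name kcluster_kwargs and the descriptive
dictionary keys are made up here, while the one-letter codes and the keyword
names are the ones listed above.

```python
# One-letter codes from the lists above; the descriptive keys are hypothetical.
DIST_CODES = {
    "euclidean": "e",
    "harmonic-euclidean": "h",
    "city-block": "b",
    "pearson": "c",
    "abs-pearson": "a",
    "uncentered": "u",
    "abs-uncentered": "x",
    "spearman": "s",
    "kendall": "k",
}
METHOD_CODES = {"mean": "a", "median": "m"}

def kcluster_kwargs(k, distance="euclidean", centroid="mean", initialid=None):
    """Translate kMeans.py-style choices into keyword arguments for kcluster."""
    kwargs = {
        "nclusters": k,                    # k -> nclusters
        "dist": DIST_CODES[distance],      # distance_fn -> dist code
        "method": METHOD_CODES[centroid],  # calc_centroid_fn -> method code
    }
    if initialid is not None:
        kwargs["initialid"] = initialid    # init_centroids_fn -> initialid
    # max_iterations and update_fn have no counterpart and are dropped.
    return kwargs

kwargs = kcluster_kwargs(3, distance="pearson", centroid="median")
# With Biopython installed, these could be passed on as
# Bio.Cluster.kcluster(data, **kwargs).
```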


Other arguments for Bio.Cluster's kcluster.
-------------------------------------------

Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
kMeans.py's cluster.

o mask:

Microarray experiments tend to suffer from a large amount of missing data. The
argument mask to Bio.Cluster's kcluster lets the user specify which data are
missing. This argument is an array with the same shape as data, and contains
a 1 for each data point that is present, and a 0 for a missing data point:

    mask[i,j]==1: data[i,j] is valid
    mask[i,j]==0: data[i,j] is a missing data point

Missing data points are ignored by the clustering algorithm. By default, mask
is an array containing 1's everywhere.
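
A sketch of how such a mask might be built from data in which missing points
are marked with None. The helper name build_mask, the None marker and the 0.0
placeholder are assumptions made for this example.

```python
def build_mask(data, missing=None):
    """Return (filled_data, mask): mask is 1 where a value is present, else 0."""
    mask = [[0 if value is missing else 1 for value in row] for row in data]
    # Missing entries get a placeholder; the 0 in mask marks them as ignored.
    filled = [[0.0 if value is missing else value for value in row] for row in data]
    return filled, mask

raw = [
    [1.2, None, 0.7],  # made-up expression values; None marks a missing point
    [0.9, 1.1, None],
]
data, mask = build_mask(raw)
```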

o weight:

The weight argument is used to put different weights on different data points.
For example, when clustering genes based on their gene expression profile, we
may want to attach a bigger weight to some microarrays compared to others. By
default, the weight argument contains equal weights of 1.0 for all data points.
Note that for row-wise clustering, the weight argument is a 1D vector whose
length is equal to the number of columns. For column-wise clustering, the length
of this argument is equal to the number of rows.
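
The length rule can be sketched as follows; default_weight is a hypothetical
helper and the numbers are made up.

```python
def default_weight(data, transpose=0):
    """Equal weights of 1.0 with the length described above."""
    nrows, ncolumns = len(data), len(data[0])
    # Row-wise clustering weights the columns; column-wise weights the rows.
    return [1.0] * (nrows if transpose else ncolumns)

data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 genes x 3 microarrays
weight = default_weight(data)
weight[2] = 2.0  # count the third microarray twice as heavily
```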

o npass:

Typical implementations of the k-means clustering algorithm rely on a random
initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
a clearly defined goal, which is to minimize the within-cluster sum of
distances. Different k-means clustering solutions (based on different initial
clusterings) can therefore be compared to each other directly. In order to
increase the chance of finding the optimal k-means clustering solution, the
k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
each time starting from a different initial random clustering. The best
clustering solution, as well as the number of npass attempts in which it was
found, is returned to the user. For more information, see the output variable
nfound below.


Return values
-------------

The cluster routine in kMeans.py returns two values:

o centroids
o clusters

The kcluster routine in Bio.Cluster returns four values:

o clusterid
o centroids
o error
o nfound


o centroids:

The centroids return value contains the centroids of the k clusters that were
found, and corresponds to the centroids return value from Bio.Cluster's
kcluster routine.

o clusters:

The clusters return value contains the number of the cluster to which each
vector was assigned. The corresponding return value in Bio.Cluster's kcluster
is clusterid.

o error:

The error return value from Bio.Cluster's kcluster is the within-cluster sum of
distances for the optimal clustering solution that was found. This value can be
used to compare different clustering solutions to each other.

o nfound:

The nfound return value from Bio.Cluster's kcluster shows in how many of the
npass runs the optimal clustering solution was found. Accordingly, nfound is at
least 1 and at most equal to npass. A large value for nfound is an indication
that the clustering solution that was found is optimal. On the other hand, if
nfound is equal to 1, it may well be that a better clustering solution exists
than the one found by kcluster.
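
The interplay of npass, error and nfound can be illustrated with a small
plain-Python k-means. This is a sketch under simplifying assumptions (squared
Euclidean distance, mean centroids, no mask or weight), not Bio.Cluster's
implementation, and all function names are made up.

```python
import random

def sqdist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def canonical(clusterid):
    """Relabel clusters by first appearance so equal partitions compare equal."""
    labels = {}
    return tuple(labels.setdefault(c, len(labels)) for c in clusterid)

def single_pass(data, nclusters, rng):
    """One k-means run from a random initial clustering: (clusterid, error)."""
    # Random start in which every cluster owns at least one item.
    clusterid = [i % nclusters for i in range(len(data))]
    rng.shuffle(clusterid)
    centroids = [None] * nclusters
    while True:
        for c in range(nclusters):
            members = [v for v, cid in zip(data, clusterid) if cid == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        updated = [min(range(nclusters), key=lambda c: sqdist(v, centroids[c]))
                   for v in data]
        if updated == clusterid:
            error = sum(sqdist(v, centroids[c]) for v, c in zip(data, clusterid))
            return clusterid, error
        clusterid = updated

def repeated_kmeans(data, nclusters, npass, seed=0):
    """Keep the best of npass runs and count how often it was found."""
    rng = random.Random(seed)
    best, best_error, nfound = None, float("inf"), 0
    for _ in range(npass):
        clusterid, error = single_pass(data, nclusters, rng)
        if best is None or error < best_error - 1e-12:
            best, best_error, nfound = clusterid, error, 1
        elif canonical(clusterid) == canonical(best):
            nfound += 1
    return best, best_error, nfound

# Two well-separated groups of made-up points.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
clusterid, error, nfound = repeated_kmeans(data, nclusters=2, npass=10)
```

A run that reaches a lower error resets the count, so nfound always refers to
the best solution seen so far.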