1 This file provides documentation for modules in Biopython that have been moved
2 or deprecated in favor of other modules. It provides quick, easy-to-find
3 documentation on how to update your code so that it works again.
4
5 Martel
6 ======
7 Declared obsolete in Release 1.48, with the intention of an official deprecation
8 or removal in a future release.
9
10 Bio.Mindy
11 =========
12 Declared obsolete in Release 1.48, with the intention of an official deprecation
13 in the following release.
14
15 Bio.MetaTool
16 ============
17 Deprecated in Release 1.48, this was a parser from the output of MetaTool 3.5
18 which is now obsolete.
19
20 Bio.GenBank
21 ===========
22 The online functionality (search_for, download_many, and NCBIDictionary) was
23 declared obsolete in Release 1.48, with the intention of an official deprecation
24 in the following release. Please use Bio.Entrez instead.
25
26 Bio.PubMed
27 ==========
28 Declared obsolete in Release 1.48, with the intention of an official deprecation
29 in the following release. Please use Bio.Entrez instead.
30
31 Bio.EUtils
32 ==========
33 Deprecated in favor of Bio.Entrez in Release 1.48
34
35 Bio.Blast.NCBIWWW
36 =================
37 The HTML BLAST parser was deprecated as of Release 1.48
38 The deprecated functions blast and blasturl were removed in Release 1.44
39
40 Bio.Saf
41 =======
42 Deprecated as of Release 1.48, as it appears to have no users, and relies
43 on Martel which doesn't work properly with mxTextTools 3.0
44
45 Bio.NBRF
46 ========
47 Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO
48
49 Bio.IntelliGenetics
50 ===================
51 Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO
52
53 Bio.ECell
54 =========
55 Deprecated as of Release 1.47, as it appears to have no users, and the code
56 does not seem relevant for ECell 3.
57
58 Bio.Rebase
59 ==========
60 Deprecated as of Release 1.46.
61
62 Bio.Gobase
63 ==========
64 Deprecated as of Release 1.46.
65
66 Bio.CDD
67 =======
68 Deprecated as of Release 1.46.
69
70 Bio.biblio
71 ==========
72 Deprecated as of Release 1.45, removed in Release 1.48
73
74 Bio.WWW
75 =======
76 The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
77 The remaining stub Bio.WWW was deprecated in Release 1.48.
78
79 The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
80 is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.
81
82 Bio.SeqIO
83 =========
84 The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favour of
85 the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47
86
87 Bio.lcc
88 =======
89 Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46
90
91 Bio.crc
92 =======
93 Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46
94
95 Bio.FormatIO
96 ============
97 This was removed in Release 1.44
98
99 Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
100 ===============
101 This has been deprecated as of Release 1.44
102
103 Bio.Kabat
104 =========
105 This was deprecated in Release 1.43 and removed in Release 1.44
106
107 Bio.SeqUtils
108 ============
109 The functions 'complement' and 'antiparallel' in Bio.SeqUtils have been
110 deprecated as of Release 1.31. Use the functions 'complement' and
111 'reverse_complement' in Bio.Seq instead.
112
113 Bio.GFF
114 =======
115 The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy have been
116 deprecated as of Release 1.31. Use the functions 'complement' and
117 'reverse_complement' in Bio.Seq instead.
118
119 Bio.sequtils
120 ============
121 Deprecated as of Release 1.30, removed in Release 1.42
122 Use Bio.SeqUtils instead.
123
124 Bio.SVM
125 =======
126 Deprecated as of Release 1.30, removed in Release 1.42
127 The Support Vector Machine code in Biopython has been superseded by a
128 more robust (and maintained) SVM library, which includes a Python
129 interface. We recommend using LIBSVM:
130
131 http://www.csie.ntu.edu.tw/~cjlin/libsvm/
132
133 Bio.RecordFile
134 ==============
135 Deprecated as of Release 1.30, removed in Release 1.42
136 RecordFile wasn't completely implemented and duplicates the work
137 of most standard parsers. We recommend using a specific iterator
138 (Bio.Fasta.Iterator for example) without a parser to get back
139 text records.
140
141 Bio.kMeans and Bio.xkMeans
142 ==========================
143 Deprecated as of Release 1.30, removed in Release 1.42
144
145 The k-Means algorithm is an algorithm for unsupervised clustering of data.
146 Biopython includes an implementation of the k-means clustering algorithm
147 in kMeans.py. Recently, a larger set of clustering algorithms entered
148 Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
149 the k-means clustering algorithm, the kMeans.py module has been deprecated.
150 Below you will find a description of how to switch from kMeans.py to
151 Bio.Cluster's kcluster.
152
153 The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
154 The corresponding function in kMeans.py is called cluster. This function takes
155 the following arguments:
156
157 o data
158 o k
159 o distance_fn
160 o init_centroids_fn
161 o calc_centroid_fn
162 o max_iterations
163 o update_fn
164
165 The function kcluster in Bio.Cluster takes the following arguments:
166
167 o data
168 o nclusters
169 o mask
170 o weight
171 o transpose
172 o npass
173 o method
174 o dist
175 o initialid
176
177
178 Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
179 -----------------------------------------------------------------------
180
181
182 o data:
183
184 In kMeans.py, data is a list of vectors, each containing the same number of
185 data points. Within the context of clustering genes based on their gene
186 expression values, each vector would correspond to the gene expression data of
187 one particular gene, and the values in the vector would correspond to the
188 gene expression values measured by the different microarrays. The cluster
189 routine in kMeans.py always performs a row-wise clustering by grouping vectors.
190
191 The argument data to Bio.Cluster's kcluster has the same structure as in
192 kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
193 the transpose argument. If transpose==0 (the default value), kcluster performs
194 row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
195 performs column-wise clustering. The same behavior can be obtained, of course,
196 by transposing the data array before calling kcluster.
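
As a minimal sketch of the last point, a list of row vectors can be transposed
in plain Python before clustering. The variable names and the toy values below
are illustrative assumptions and do not come from kMeans.py or Bio.Cluster.

```python
# Row-wise data: each inner list is one gene, each column one microarray.
# (Toy values, made up for illustration.)
data = [
    [1.0, 2.0, 3.0],  # gene 0
    [4.0, 5.0, 6.0],  # gene 1
]

# Transpose so that each inner list holds one microarray instead.
# Clustering this list row-wise groups the microarrays rather than the genes.
transposed = [list(column) for column in zip(*data)]

print(transposed)
```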
197
198
199 o k:
200
201 The desired number of clusters is specified by the input argument k in
202 kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.
203
204 o distance_fn:
205
206 In kMeans.py, the argument distance_fn represents the distance function to
207 calculate the distances between items and cluster centroids. This argument
208 corresponds to a true Python function. The default value is the Euclidean
209 distance, implemented as distance.euclidean in distance.py. User-defined
210 distance functions can also be used.
211
212 The k-means routine in Bio.Cluster does not allow user-specified distance
213 functions. Instead, it provides the following nine built-in distance functions,
214 depending on the argument dist:
215
216 dist=='e': Euclidean distance
217 dist=='h': Harmonically summed Euclidean distance
218 dist=='b': City-block distance
219 dist=='c': Pearson correlation
220 dist=='a': absolute value of the Pearson correlation
221 dist=='u': uncentered correlation
222 dist=='x': absolute uncentered correlation
223 dist=='s': Spearman's rank correlation
224 dist=='k': Kendall's tau
225
226 User-defined distance functions are possible only by modifying the C code in
227 cluster.c (which may not be as hard as it sounds). The default distance function
228 is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
229 Euclidean distance is defined as the sum of squared differences, whereas in
230 kMeans.py the square root of this quantity is taken. This does not affect the
231 clustering result.
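
The reason the square root does not matter is that it is monotonic, so the
nearest centroid is the same under both definitions. The sketch below checks
this on toy data. The helper names are hypothetical and are not part of
kMeans.py or Bio.Cluster.

```python
import math


def squared_euclidean(u, v):
    # Sum of squared differences, as the text describes for Bio.Cluster.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    # Square root of the same quantity, as the text describes for kMeans.py.
    return math.sqrt(squared_euclidean(u, v))


def nearest(item, centroids, distance):
    # Index of the centroid closest to item under the given distance.
    return min(range(len(centroids)), key=lambda i: distance(item, centroids[i]))


# Toy centroids and items, made up for illustration.
centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
items = [[1.0, 1.0], [6.0, 4.0], [9.0, 1.0], [4.0, 2.0]]

by_squared = [nearest(x, centroids, squared_euclidean) for x in items]
by_root = [nearest(x, centroids, euclidean) for x in items]

print(by_squared == by_root)
```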
232
233 o init_centroids_fn:
234
235 This function specifies the initial choice for the cluster centroids. By
236 default, cluster in kMeans.py uses a random initial choice of cluster centroids
237 by randomly choosing k data vectors from the input vectors in the data input
238 argument. Alternatively, the user can specify a user-defined function to choose
239 the initial cluster centroids.
240
241 In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
242 assignment instead of an initial choice of cluster centroids. As far as I know,
243 these two initialization methods are equivalent in practice. Similar to the
244 cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
245 assignment of items to clusters. Alternatively, users can specify a
246 (deterministic) initial clustering via the initialid argument. This argument is
247 None by default. If not None, it should be a 1D array (or list) containing the
248 number (between 0 and nclusters-1) of the cluster to which each item is
249 assigned initially.
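
To illustrate how an initial assignment implies initial centroids, the sketch
below derives the centroids from an initialid-style list by averaging the
members of each cluster. The function name and the validation rules are
assumptions made for this example. This is not the Bio.Cluster implementation.

```python
def centroids_from_assignment(data, initialid, nclusters):
    """Return the mean vector of each cluster implied by initialid."""
    ncolumns = len(data[0])
    sums = [[0.0] * ncolumns for _ in range(nclusters)]
    counts = [0] * nclusters
    for row, cluster in zip(data, initialid):
        # Cluster numbers must lie between 0 and nclusters-1, as in the text.
        if not 0 <= cluster < nclusters:
            raise ValueError("cluster number out of range: %r" % cluster)
        counts[cluster] += 1
        for j, value in enumerate(row):
            sums[cluster][j] += value
    if 0 in counts:
        # Assumption of this sketch: empty clusters are rejected.
        raise ValueError("every cluster needs at least one item")
    return [[s / counts[c] for s in sums[c]] for c in range(nclusters)]


# Toy data: four items, assigned to two clusters.
data = [[1.0, 1.0], [3.0, 1.0], [10.0, 10.0], [12.0, 14.0]]
initialid = [0, 0, 1, 1]

centroids = centroids_from_assignment(data, initialid, 2)
print(centroids)
```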
250
251 Note that the k-means routine in Bio.Cluster performs automatic repeats of the
252 algorithm, each time starting from a different random initial clustering. See
253 the comment for the npass argument below.
254
255 o calc_centroid_fn:
256
257 This argument specifies how to calculate the cluster centroids, given the data
258 vectors of the items that belong to each cluster. By default, the mean over the
259 vectors is calculated. A user-defined function can also be used.
260
261 Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
262 method to calculate the cluster centroid is determined by the argument method,
263 which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
264 calculate the mean ('a').
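
The difference between the two methods can be sketched with the standard
library. The function below is a hypothetical stand-in that mirrors the 'a'
and 'm' codes from the text. It is not the Bio.Cluster code.

```python
from statistics import mean, median


def centroid(vectors, method="a"):
    """Column-wise centroid: 'a' for arithmetic mean, 'm' for median."""
    if method == "a":
        combine = mean
    elif method == "m":
        combine = median
    else:
        raise ValueError("method should be 'a' or 'm'")
    return [combine(column) for column in zip(*vectors)]


# Toy cluster with one outlying value (9.0) in the first column.
members = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]

mean_centroid = centroid(members, "a")
median_centroid = centroid(members, "m")
print(mean_centroid, median_centroid)
```

In this toy cluster the outlier pulls the mean of the first column upwards,
while the median stays at the middle value.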
265
266 o max_iterations:
267
268 The cluster routine in kMeans.py has an argument max_iterations, which is used
269 to stop the iteration if the routine does not converge after the given number of
270 iterations.
271
272 The kcluster routine in Bio.Cluster does not have such an argument. The failure
273 of a k-means algorithm to converge is due to the occurrence of periodic
274 clustering solutions during the course of the k-means algorithm. The kcluster
275 routine in Bio.Cluster automatically checks for the occurrence of such a
276 periodicity in the solutions. If a periodic behavior is detected, the algorithm
277 is interrupted and the last clustering solution is returned. Accordingly, the
278 kcluster routine is guaranteed to return a clustering solution. Also see the
279 discussion of the npass argument below.
280
281 o update_fn:
282
283 The argument update_fn to cluster in kMeans.py is a hook function that is
284 called at the beginning of every iteration and passed the iteration number,
285 cluster centroids, and current cluster assignments. It is used by xkMeans.py,
286 which provides a visualization of k-means clustering. Currently there is no
287 equivalent in Bio.Cluster.
288
289
290 Other arguments for Bio.Cluster's kcluster.
291 -------------------------------------------
292
293 Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
294 kMeans.py's cluster.
295
296 o mask:
297
298 Microarray experiments tend to suffer from a large amount of missing data. The
299 argument mask to Bio.Cluster's kcluster lets the user specify which data are
300 missing. This argument is an array with the same shape as data, and contains
301 a 1 for each data point that is present, and a 0 for a missing data point:
302
303 mask[i,j]==1: data[i,j] is valid
304 mask[i,j]==0: data[i,j] is a missing data point
305
306 Missing data points are ignored by the clustering algorithm. By default, mask
307 is an array containing 1's everywhere.
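
As a rough illustration of what ignoring missing data points means, the
sketch below computes a cluster centroid from only those values whose mask
entry is 1. The function name is hypothetical, and returning None for a column
with no valid values is an assumption of this sketch. The text does not say
how Bio.Cluster handles that case.

```python
def masked_centroid(vectors, masks):
    """Column-wise mean over the data points whose mask entry is 1."""
    centroid = []
    for values, flags in zip(zip(*vectors), zip(*masks)):
        present = [v for v, f in zip(values, flags) if f == 1]
        # Assumption: a column without any valid value yields None.
        centroid.append(sum(present) / len(present) if present else None)
    return centroid


# Toy data: data[0][1] is a placeholder for a missing measurement.
data = [[1.0, 0.0, 3.0], [3.0, 8.0, 5.0]]
mask = [[1, 0, 1], [1, 1, 1]]

result = masked_centroid(data, mask)
print(result)
```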
308
309 o weight:
310
311 The weight argument is used to put different weights on different data points.
312 For example, when clustering genes based on their gene expression profile, we
313 may want to attach a bigger weight to some microarrays compared to others. By
314 default, the weight argument contains equal weights of 1.0 for all data points.
315 Note that for row-wise clustering, the weight argument is a 1D vector whose
316 length is equal to the number of columns. For column-wise clustering, the length
317 of this argument is equal to the number of rows.
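
One way to picture the effect of the weights for row-wise clustering is a
weighted sum of squared differences, with one weight per column. This is only
a sketch under that assumption. The exact formula and any normalization used
in Bio.Cluster's C code are not reproduced here, and the function name is
hypothetical.

```python
def weighted_squared_distance(u, v, weight):
    """Sum of squared differences, each column scaled by its weight."""
    if not (len(u) == len(v) == len(weight)):
        # For row-wise clustering there is one weight per column.
        raise ValueError("u, v and weight must have the same length")
    return sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weight))


# Toy expression profiles of two genes over three microarrays.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 2.0, 7.0]

equal = [1.0, 1.0, 1.0]          # the default described in the text
downweighted = [1.0, 1.0, 0.25]  # trust the third microarray less

d_equal = weighted_squared_distance(gene_a, gene_b, equal)
d_down = weighted_squared_distance(gene_a, gene_b, downweighted)
print(d_equal, d_down)
```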
318
319 o npass:
320
321 Typical implementations of the k-means clustering algorithm rely on a random
322 initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
323 a clearly defined goal, which is to minimize the within-cluster sum of
324 distances. Different k-means clustering solutions (based on different initial
325 clusterings) can therefore be compared to each other directly. In order to
326 increase the chance of finding the optimal k-means clustering solution, the
327 k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
328 each time starting from a different initial random clustering. The best
329 clustering solution, as well as in how many of the npass attempts it was found,
330 is returned to the user. For more information, see the output variable nfound
331 below.
332
333
334 Return values
335 -------------
336
337 The cluster routine in kMeans.py returns two values:
338
339 o centroids
340 o clusters
341
342 The kcluster routine in Bio.Cluster returns four values:
343
344 o clusterid
345 o centroids
346 o error
347 o nfound
348
349
350 o centroids:
351
352 The centroids return value contains the centroids of the k clusters that were
353 found, and corresponds to the centroids return value from Bio.Cluster's
354 kcluster routine.
355
356 o clusters:
357
358 The clusters return value contains the number of the cluster to which each
359 vector was assigned. The corresponding return value in Bio.Cluster's kcluster
360 is clusterid.
361
362 o error:
363
364 The error return value from Bio.Cluster's kcluster is the within-cluster sum of
365 distances for the optimal clustering solution that was found. This value can be
366 used to compare different clustering solutions to each other.
367
368 o nfound:
369
370 The nfound return value from Bio.Cluster's kcluster shows in how many of the
371 npass runs the optimal clustering solution was found. Accordingly, nfound is at
372 least 1 and at most equal to npass. A large value for nfound is an indication
373 that the clustering solution that was found is optimal. On the other hand, if
374 nfound is equal to 1, it is very well possible that a better clustering solution
375 exists than the one found by kcluster.
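
The interplay of npass, error and nfound can be sketched with a small
pure-Python k-means. This is an illustrative reimplementation under stated
assumptions (squared Euclidean distance, mean centroids, stopping when an
assignment repeats). It is not the Bio.Cluster code, and all function names
are hypothetical.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def update_centroids(data, clusterid, previous):
    centroids = []
    for c, old in enumerate(previous):
        members = [row for row, cid in zip(data, clusterid) if cid == c]
        if members:
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centroids.append(old)  # assumption: an emptied cluster keeps its centroid
    return centroids


def single_pass(data, nclusters, rng):
    # Random initial assignment in which no cluster is empty.
    clusterid = list(range(nclusters))
    clusterid += [rng.randrange(nclusters) for _ in range(len(data) - nclusters)]
    rng.shuffle(clusterid)
    centroids = update_centroids(data, clusterid, [None] * nclusters)
    seen = set()
    # Stop as soon as an assignment repeats (converged or periodic).
    while tuple(clusterid) not in seen:
        seen.add(tuple(clusterid))
        clusterid = [
            min(range(nclusters), key=lambda c: squared_distance(row, centroids[c]))
            for row in data
        ]
        centroids = update_centroids(data, clusterid, centroids)
    error = sum(squared_distance(row, centroids[cid])
                for row, cid in zip(data, clusterid))
    return clusterid, centroids, error


def canonical(clusterid):
    # Relabel clusters by first appearance, so label permutations compare equal.
    labels = {}
    return tuple(labels.setdefault(cid, len(labels)) for cid in clusterid)


def kcluster_sketch(data, nclusters, npass, seed=0):
    rng = random.Random(seed)
    best, nfound = None, 0
    for _ in range(npass):
        clusterid, centroids, error = single_pass(data, nclusters, rng)
        if best is None or error < best[2] - 1e-12:
            best, nfound = (clusterid, centroids, error), 1
        elif canonical(clusterid) == canonical(best[0]):
            nfound += 1
    return best + (nfound,)


# Toy data: two well-separated groups of three points each.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
clusterid, centroids, error, nfound = kcluster_sketch(data, 2, npass=5)
print(clusterid, error, nfound)
```

On clearly separated data such as this, most or all passes should end in the
same solution, so nfound is expected to be close to npass. On harder data a
small nfound suggests that npass should be increased.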