This file documents Biopython modules that have been moved or deprecated in
favor of other modules. It is meant as a quick, easy-to-find reference on how
to update your code so that it works again.

Bio.GenBank
===========
The online functionality (search_for, download_many, and NCBIDictionary) was
declared obsolete in Release 1.48, with the intention of an official deprecation
in the following release. Please use Bio.Entrez instead.

Bio.PubMed
==========
Declared obsolete in Release 1.48, with the intention of an official deprecation
in the following release. Please use Bio.Entrez instead.

Bio.EUtils
==========
Deprecated in favor of Bio.Entrez in Release 1.48.

Bio.Blast.NCBIWWW
=================
The HTML BLAST parser was deprecated as of Release 1.48.
The deprecated functions blast and blasturl were removed in Release 1.44.

Bio.Saf
=======
Deprecated as of Release 1.48, as it appears to have no users, and relies
on Martel which doesn't work properly with mxTextTools 3.0.

Bio.NBRF
========
Deprecated as of Release 1.48 in favor of the "pir" format in Bio.SeqIO.

Bio.IntelliGenetics
===================
Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO.

Bio.ECell
=========
Deprecated as of Release 1.47, as it appears to have no users, and the code
does not seem relevant for ECell 3.

Bio.Rebase
==========
Deprecated as of Release 1.46.

Bio.Gobase
==========
Deprecated as of Release 1.46.

Bio.CDD
=======
Deprecated as of Release 1.46.

Bio.biblio
==========
Deprecated as of Release 1.45, removed in Release 1.48.

Bio.WWW
=======
The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
The remaining stub Bio.WWW was deprecated in Release 1.48.

The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.

Bio.SeqIO
=========
The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favour of
the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47.

Bio.lcc
=======
Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46.

Bio.crc
=======
Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46.

Bio.FormatIO
============
This was removed in Release 1.44.

Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
===============
This has been deprecated as of Release 1.44.

Bio.Kabat
=========
This was deprecated in Release 1.43 and removed in Release 1.44.

Bio.SeqUtils
============
The functions 'complement' and 'antiparallel' in Bio.SeqUtils have been
deprecated as of Release 1.31. Use the functions 'complement' and
'reverse_complement' in Bio.Seq instead.

Bio.GFF
=======
The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy have been
deprecated as of Release 1.31. Use the functions 'complement' and
'reverse_complement' in Bio.Seq instead.
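
As a rough illustration of the replacement calls named in the two sections
above, the sketch below uses the 'complement' and 'reverse_complement'
functions in Bio.Seq. It assumes a reasonably recent Biopython installation in
which these functions accept plain strings; the sequence is made up, and
treating 'reverse_complement' as the stand-in for the old 'antiparallel' is an
inference from the text above rather than something it spells out.

```python
# Assumes Biopython is installed; the sequence is invented for illustration.
from Bio.Seq import Seq, complement, reverse_complement

dna = "ATGGCC"

# Module-level functions operate directly on plain strings.
comp = complement(dna)              # base-by-base complement, same orientation
rev_comp = reverse_complement(dna)  # complement read in the opposite direction

# Seq objects offer the same operation as a method.
seq_rev_comp = str(Seq(dna).reverse_complement())

print(comp, rev_comp, seq_rev_comp)
```

Code that only needs strings can stay with the module-level functions; code
that already works with Seq objects can call the methods instead.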

Bio.sequtils
============
Deprecated as of Release 1.30, removed in Release 1.42.
Use Bio.SeqUtils instead.

Bio.SVM
=======
Deprecated as of Release 1.30, removed in Release 1.42.
The Support Vector Machine code in Biopython has been superseded by a
more robust (and maintained) SVM library, which includes a Python
interface. We recommend using LIBSVM:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Bio.RecordFile
==============
Deprecated as of Release 1.30, removed in Release 1.42.
RecordFile wasn't completely implemented and duplicates the work
of most standard parsers. We recommend using a specific iterator
(Bio.Fasta.Iterator for example) without a parser to get back
text records.

Bio.kMeans and Bio.xkMeans
==========================
Deprecated as of Release 1.30, removed in Release 1.42.

The k-means algorithm is an algorithm for unsupervised clustering of data.
Biopython includes an implementation of the k-means clustering algorithm
in kMeans.py. Recently, a larger set of clustering algorithms entered
Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
the k-means clustering algorithm, the kMeans.py module has been deprecated.
Below you will find a description of how to switch from kMeans.py to
Bio.Cluster's kcluster.

The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
The corresponding function in kMeans.py is called cluster. The cluster function
takes the following arguments:

o data
o k
o distance_fn
o init_centroids_fn
o calc_centroid_fn
o max_iterations
o update_fn

The function kcluster in Bio.Cluster takes the following arguments:

o data
o nclusters
o mask
o weight
o transpose
o npass
o method
o dist
o initialid


Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
-----------------------------------------------------------------------


o data:

In kMeans.py, data is a list of vectors, each containing the same number of
data points. Within the context of clustering genes based on their gene
expression values, each vector would correspond to the gene expression data of
one particular gene, and the values in the vector would correspond to the
gene expression values measured by the different microarrays. The cluster
routine in kMeans.py always performs a row-wise clustering by grouping vectors.

The argument data to Bio.Cluster's kcluster has the same structure as in
kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
the transpose argument. If transpose==0 (the default value), kcluster performs
row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
performs column-wise clustering. The same behavior can be obtained, of course,
by transposing the data array before calling kcluster.
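
The last remark, transposing the data by hand, can be sketched in plain Python
without calling Bio.Cluster at all. The matrix and the helper name transpose
are hypothetical; the point is only that the rows of the transposed matrix are
the columns of the original, so row-wise clustering of one is column-wise
clustering of the other.

```python
# Hypothetical expression matrix: rows are genes, columns are microarrays.
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]


def transpose(matrix):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*matrix)]


# Each row of the result holds one microarray's values across all genes.
transposed = transpose(data)
print(transposed)
```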

o k:

The desired number of clusters is specified by the input argument k in
kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.

o distance_fn:

In kMeans.py, the argument distance_fn represents the distance function used to
calculate the distances between items and cluster centroids. This argument
corresponds to a true Python function. The default value is the Euclidean
distance, implemented as distance.euclidean in distance.py. User-defined
distance functions can also be used.

The k-means routine in Bio.Cluster does not allow user-specified distance
functions. Instead, it provides the following nine built-in distance functions,
depending on the argument dist:

dist=='e': Euclidean distance
dist=='h': Harmonically summed Euclidean distance
dist=='b': City-block distance
dist=='c': Pearson correlation
dist=='a': absolute value of the Pearson correlation
dist=='u': uncentered correlation
dist=='x': absolute uncentered correlation
dist=='s': Spearman's rank correlation
dist=='k': Kendall's tau

User-defined distance functions are possible only by modifying the C code in
cluster.c (which may not be as hard as it sounds). The default distance function
is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
Euclidean distance is defined as the sum of squared differences, whereas in
kMeans.py the square root of this quantity is taken. This does not affect the
clustering result, because taking the square root preserves the ordering of the
distances.
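
To see why the two Euclidean conventions lead to the same clustering, here is a
small plain-Python sketch. It does not call kMeans.py or Bio.Cluster; the
function names and data are made up, and the two distance functions simply
follow the definitions given above.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared differences (the Bio.Cluster convention described above)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of that sum (the kMeans.py convention described above)."""
    return math.sqrt(squared_euclidean(a, b))


def nearest(item, centroids, distance):
    """Index of the centroid closest to item under the given distance."""
    return min(range(len(centroids)), key=lambda i: distance(item, centroids[i]))


centroids = [[0.0, 0.0], [5.0, 5.0]]
items = [[1.0, 0.5], [4.0, 6.0], [2.0, 2.0]]

# Both conventions pick the same centroid for every item.
by_squared = [nearest(item, centroids, squared_euclidean) for item in items]
by_root = [nearest(item, centroids, euclidean) for item in items]
print(by_squared, by_root)
```

Only the assignment of items to centroids is compared here; reported distance
values will of course differ between the two conventions.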

o init_centroids_fn:

This function specifies the initial choice for the cluster centroids. By
default, cluster in kMeans.py picks the initial cluster centroids by randomly
choosing k of the vectors in the data argument. Alternatively, the user can
supply a function that chooses the initial cluster centroids.

In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
assignment instead of an initial choice of cluster centroids. As far as I know,
these two initialization methods are equivalent in practice. Similar to the
cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
assignment of items to clusters. Alternatively, users can specify a
(deterministic) initial clustering via the initialid argument. This argument is
None by default. If not None, it should be a 1D array (or list) containing the
number (between 0 and nclusters-1) of the cluster to which each item is
assigned initially.

Note that the k-means routine in Bio.Cluster performs automatic repeats of the
algorithm, each time starting from a different random initial clustering. See
the comment for the npass argument below.
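
If existing code relies on a custom init_centroids_fn, one possible way to
carry it over is to turn the chosen centroids into an initial assignment. The
helper below is a hypothetical, plain-Python sketch (initialid_from_centroids
is not part of Biopython): it assigns each item to its nearest initial centroid
and returns one cluster number per item, in the 0 to nclusters-1 range that
initialid expects.

```python
def initialid_from_centroids(data, centroids):
    """Assign each item to its nearest centroid; one cluster number per item."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda c: squared_distance(item, centroids[c]))
        for item in data
    ]


data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
# Stand-in for whatever a custom init_centroids_fn would have returned.
initial_centroids = [data[0], data[2]]

initialid = initialid_from_centroids(data, initial_centroids)
print(initialid)
```

The resulting list could then be passed to kcluster as initialid=initialid.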

o calc_centroid_fn:

This argument specifies how to calculate the cluster centroids, given the data
vectors of the items that belong to each cluster. By default, the mean over the
vectors is calculated. A user-defined function can also be used.

Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
method to calculate the cluster centroid is determined by the argument method,
which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
calculate the mean ('a').
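
The difference between the two method values can be illustrated with the
standard library alone. This is a sketch of the idea, not Bio.Cluster's C
implementation; the centroid helper and the data are hypothetical, and only the
'a' and 'm' codes are taken from the text above.

```python
from statistics import mean, median


def centroid(vectors, method="a"):
    """Column-wise centroid: 'a' for arithmetic mean, 'm' for median."""
    reducers = {"a": mean, "m": median}
    reduce_column = reducers[method]
    return [reduce_column(column) for column in zip(*vectors)]


# The first column contains an outlier (9.0) that pulls the mean upwards.
cluster_members = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]

mean_centroid = centroid(cluster_members, "a")
median_centroid = centroid(cluster_members, "m")
print(mean_centroid, median_centroid)
```

The median is less sensitive to the outlier in the first column, which is the
usual reason for choosing k-medians over k-means.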

o max_iterations:

The cluster routine in kMeans.py has an argument max_iterations, which is used
to stop the iteration if the routine does not converge after the given number of
iterations.

The kcluster routine in Bio.Cluster does not have such an argument. The failure
of a k-means algorithm to converge is due to the occurrence of periodic
clustering solutions during the course of the k-means algorithm. The kcluster
routine in Bio.Cluster automatically checks for the occurrence of such a
periodicity in the solutions. If a periodic behavior is detected, the algorithm
is interrupted and the last clustering solution is returned. Accordingly, the
kcluster routine is guaranteed to return a clustering solution. Also see the
discussion of the npass argument below.
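
The idea of stopping on periodicity instead of on an iteration count can be
sketched as follows. How the C code in Bio.Cluster actually detects a periodic
solution is not described here, so this is only one simple way to do it:
remember every assignment seen so far and stop when one recurs. All names are
hypothetical, and max_rounds is merely a safety net for the sketch.

```python
def iterate_until_stable(step, assignment, max_rounds=1000):
    """Repeat step() until the assignment stops changing or starts to cycle.

    Returns (assignment, reason), where reason is 'converged', 'periodic'
    or 'max_rounds'.
    """
    seen = {tuple(assignment)}
    for _ in range(max_rounds):
        updated = step(assignment)
        if updated == assignment:
            return updated, "converged"
        if tuple(updated) in seen:
            return updated, "periodic"
        seen.add(tuple(updated))
        assignment = updated
    return assignment, "max_rounds"


# Toy update rules standing in for one k-means reassignment step.
def flip(assignment):
    return [1 - c for c in assignment]  # oscillates between two states


def settle(assignment):
    return [0 for _ in assignment]      # reaches a fixed point


periodic_result = iterate_until_stable(flip, [0, 1, 0])
converged_result = iterate_until_stable(settle, [0, 1, 0])
print(periodic_result, converged_result)
```

Either way a clustering is returned, which mirrors the guarantee described
above.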

o update_fn:

The argument update_fn to cluster in kMeans.py is a hook function that is
called at the beginning of every iteration and passed the iteration number,
cluster centroids, and current cluster assignments. It is used by xkMeans.py,
which provides a visualization of k-means clustering. Currently there is no
equivalent in Bio.Cluster.


Other arguments for Bio.Cluster's kcluster
------------------------------------------

Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
kMeans.py's cluster.

o mask:

Microarray experiments tend to suffer from a large amount of missing data. The
argument mask to Bio.Cluster's kcluster lets the user specify which data are
missing. This argument is an array with the same shape as data, and contains
a 1 for each data point that is present, and a 0 for a missing data point:

mask[i,j]==1: data[i,j] is valid
mask[i,j]==0: data[i,j] is a missing data point

Missing data points are ignored by the clustering algorithm. By default, mask
is an array containing 1's everywhere.
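
What "ignored" means for a distance calculation can be sketched in plain
Python. This is an illustration of the mask convention above, not Bio.Cluster's
own code (which may also normalize by the number of valid points); the function
name and the data are made up.

```python
def masked_squared_distance(a, b, mask_a, mask_b):
    """Sum of squared differences over positions present in both vectors."""
    return sum(
        (x - y) ** 2
        for x, y, present_a, present_b in zip(a, b, mask_a, mask_b)
        if present_a and present_b
    )


data = [
    [1.0, 2.0, 0.0],  # the third value is a placeholder for a missing measurement
    [1.5, 2.5, 9.0],
]
mask = [
    [1, 1, 0],        # mask[i][j] == 0 marks data[i][j] as missing
    [1, 1, 1],
]

# The third column is skipped because it is missing in the first row.
distance = masked_squared_distance(data[0], data[1], mask[0], mask[1])
print(distance)
```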

o weight:

The weight argument is used to put different weights on different data points.
For example, when clustering genes based on their gene expression profile, we
may want to attach a bigger weight to some microarrays compared to others. By
default, the weight argument contains equal weights of 1.0 for all data points.
Note that for row-wise clustering, the weight argument is a 1D vector whose
length is equal to the number of columns. For column-wise clustering, the length
of this argument is equal to the number of rows.
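
The effect of the weights on a row-wise distance can be sketched as below. The
helper is hypothetical and deliberately simple: it scales each squared
difference by the weight of its column and does not attempt to reproduce any
normalization Bio.Cluster may apply.

```python
def weighted_squared_distance(a, b, weight):
    """Squared differences, each scaled by the weight of its column."""
    if not (len(a) == len(b) == len(weight)):
        raise ValueError("weight needs one entry per column")
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weight))


# Two genes measured on three microarrays (made-up values).
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 2.0, 5.0]

equal = weighted_squared_distance(gene_a, gene_b, [1.0, 1.0, 1.0])
# Doubling the weight of the third microarray doubles its contribution.
emphasised = weighted_squared_distance(gene_a, gene_b, [1.0, 1.0, 2.0])
print(equal, emphasised)
```

The length check reflects the rule above: for row-wise clustering there is one
weight per column.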

o npass:

Typical implementations of the k-means clustering algorithm rely on a random
initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
a clearly defined goal, which is to minimize the within-cluster sum of
distances. Different k-means clustering solutions (based on different initial
clusterings) can therefore be compared to each other directly. In order to
increase the chance of finding the optimal k-means clustering solution, the
k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
each time starting from a different initial random clustering. The best
clustering solution is returned to the user, together with the number of npass
attempts in which it was found. For more information, see the output variable
nfound below.
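
The repeat-and-compare scheme can be sketched in plain Python. This is a toy
k-means written for illustration, not the Bio.Cluster implementation: the
function names, the seed argument, the max_rounds safety net and the way two
solutions are judged equal (same partition after relabelling) are all
assumptions of the sketch.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_of(data, clusterid, k):
    """Mean vector of the members of each cluster."""
    centroids = []
    for cluster in range(k):
        members = [v for v, c in zip(data, clusterid) if c == cluster]
        centroids.append([sum(col) / len(col) for col in zip(*members)])
    return centroids


def kmeans_once(data, k, rng, max_rounds=100):
    """One run from a random initial assignment; returns (clusterid, error)."""
    # Deal shuffled items round-robin so every cluster starts non-empty.
    order = list(range(len(data)))
    rng.shuffle(order)
    clusterid = [0] * len(data)
    for position, item in enumerate(order):
        clusterid[item] = position % k
    for _ in range(max_rounds):
        centroids = centroids_of(data, clusterid, k)
        updated = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in data
        ]
        # Stop when nothing moves, or when a move would empty a cluster.
        if updated == clusterid or len(set(updated)) < k:
            break
        clusterid = updated
    centroids = centroids_of(data, clusterid, k)
    error = sum(squared_distance(v, centroids[c]) for v, c in zip(data, clusterid))
    return clusterid, error


def canonical(clusterid):
    """Relabel clusters by first appearance so equal partitions compare equal."""
    relabel = {}
    return tuple(relabel.setdefault(c, len(relabel)) for c in clusterid)


def kmeans_repeated(data, k, npass, seed=0):
    """Keep the lowest-error solution and count how often it was found."""
    rng = random.Random(seed)
    best_id, best_error, nfound = None, None, 0
    for _ in range(npass):
        clusterid, error = kmeans_once(data, k, rng)
        if best_error is None or error < best_error - 1e-12:
            best_id, best_error, nfound = clusterid, error, 1
        elif canonical(clusterid) == canonical(best_id):
            nfound += 1
    return best_id, best_error, nfound


data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
best_id, best_error, nfound = kmeans_repeated(data, k=2, npass=5)
print(best_id, best_error, nfound)
```

With Bio.Cluster itself none of this is needed; passing npass to kcluster has
the library do the repeats and report nfound.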


Return values
-------------

The cluster routine in kMeans.py returns two values:

o centroids
o clusters

The kcluster routine in Bio.Cluster returns four values:

o clusterid
o centroids
o error
o nfound


o centroids:

The centroids return value contains the centroids of the k clusters that were
found, and corresponds to the centroids return value from Bio.Cluster's
kcluster routine.

o clusters:

The clusters return value contains the number of the cluster to which each
vector was assigned. The corresponding return value in Bio.Cluster's kcluster
is clusterid.

o error:

The error return value from Bio.Cluster's kcluster is the within-cluster sum of
distances for the optimal clustering solution that was found. This value can be
used to compare different clustering solutions to each other.

o nfound:

The nfound return value from Bio.Cluster's kcluster shows in how many of the
npass runs the optimal clustering solution was found. Accordingly, nfound is at
least 1 and at most equal to npass. A large value for nfound is an indication
that the clustering solution that was found is optimal. On the other hand, if
nfound is equal to 1, it is quite possible that a better clustering solution
exists than the one found by kcluster.
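
Putting the pieces together, a call might look like the sketch below. It
assumes a recent Biopython installation together with NumPy, and the data are
made up. One caveat: the list of four return values above reflects the release
this note was written for. To the best of our knowledge, in recent releases
kcluster returns three values (clusterid, error, nfound) and the centroids are
obtained separately from clustercentroids, so check the documentation of your
installed version before relying on either form.

```python
# Assumes Biopython and NumPy are installed; the data are invented.
import numpy as np
from Bio.Cluster import kcluster, clustercentroids

data = np.array([
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],  # one tight group
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],  # another tight group
])

# kMeans.py's k becomes nclusters; method and dist replace the function hooks.
clusterid, error, nfound = kcluster(
    data, nclusters=2, npass=10, method="a", dist="e"
)

# Assumption: in recent releases the centroids come from a separate call.
centroids, centroid_mask = clustercentroids(data, clusterid=clusterid, method="a")

print(clusterid, error, nfound)
print(centroids)
```

Here clusterid plays the role of the old clusters return value, and nfound can
be read against npass as described above.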