<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>About — gensim</title>
<link rel="stylesheet" href="_static/default.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '',
VERSION: '0.8.5',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="author" title="About these documents" href="#" />
<link rel="top" title="gensim" href="index.html" />
<!-- twitter search widget
<script type="text/javascript" src="_static/widget.js"></script>
-->
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li><a href="index.html">Gensim home</a>| </li>
<li><a href="tutorial.html">Tutorials</a>| </li>
<li><a href="http://groups.google.com/group/gensim">Support</a>| </li>
<li><a href="https://github.com/piskvorky/gensim/wiki">Contribute</a>| </li>
<li><a href="apiref.html">API reference</a>»</li>
</ul>
</div>
<div class="sphinxsidebar">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table Of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">About</a><ul>
<li><a class="reference internal" href="#history">History</a></li>
<li><a class="reference internal" href="#licensing">Licensing</a></li>
<li><a class="reference internal" href="#contributors">Contributors</a></li>
<li><a class="reference internal" href="#academic-citing">Academic citing</a></li>
</ul>
</li>
</ul>
<div id="searchbox" style="display: none">
<h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" size="24" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
<p class="searchtip" style="font-size: 90%">
Enter search terms or a module, class or function name.
</p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="about">
<span id="id1"></span><h1>About<a class="headerlink" href="#about" title="Permalink to this headline">¶</a></h1>
<div class="section" id="history">
<h2>History<a class="headerlink" href="#history" title="Permalink to this headline">¶</a></h2>
<p>Gensim started off in 2008 as a collection of various Python scripts for the Czech Digital Mathematics Library <a class="reference external" href="http://dml.cz/">dml.cz</a>,
where it served to generate a short list of the articles most similar to a given article (gensim = “generate similar”).
I also wanted to try these fancy “Latent Semantic Methods”, but the libraries that
implemented the necessary computations were <a class="reference external" href="http://soi.stanford.edu/~rmunk/PROPACK/">not much fun to work with</a>.</p>
<p>Naturally, I set out to reinvent the wheel. Our <a class="reference external" href="http://radimrehurek.com/gensim/lrec2010_final.pdf">2010 LREC publication</a>
describes the initial design decisions behind gensim (clarity, efficiency and scalability)
and is fairly representative of how gensim works even today.</p>
<p>Later versions of gensim improved this efficiency and scalability tremendously (in fact,
I made algorithmic scalability of distributional semantics the topic of my <a class="reference external" href="http://radimrehurek.com/phd_rehurek.pdf">PhD thesis</a>).</p>
<p>By now, gensim is, to my knowledge, the most robust, efficient and hassle-free piece
of software for unsupervised semantic modelling from plain text. It stands
in contrast to brittle homework-assignment implementations that do not scale on the one hand,
and to robust Java-esque projects that do scale, but only if you’re willing to sacrifice
several weeks of your own, your technician’s and your local scientist’s time just to run “hello world” on the other.</p>
<p>In 2011, I started using <a class="reference external" href="https://github.com/piskvorky/gensim">GitHub</a> for source code hosting,
and the gensim website moved from my university hosting to its present domain.</p>
</div>
<div class="section" id="licensing">
<h2>Licensing<a class="headerlink" href="#licensing" title="Permalink to this headline">¶</a></h2>
<p>Gensim is licensed under the OSI-approved <a class="reference external" href="http://www.gnu.org/licenses/lgpl.html">GNU LGPL license</a>.
This means that it’s free for both personal and commercial use, but if you make any
modification to gensim that you distribute to other people, you have to disclose
the source code of these modifications.</p>
<p>Apart from that, you are free to redistribute gensim in any way you like, though you’re
not allowed to modify its license (doh!).</p>
<p>My intent here is, of course, to get more help and community involvement in the development of gensim.
The legalese is therefore less important to me than your input and contributions.
Contact me if the LGPL doesn’t fit the bill but you’d still like to use gensim – we’ll work something out.</p>
<div class="admonition-see-also admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">I also host a document similarity package <cite>gensim.simserver</cite>. This is a high-level
interface to <cite>gensim</cite> functionality, and offers transactional remote (web-based)
document similarity queries and indexing. It uses gensim to do the heavy lifting:
you don’t need the <cite>simserver</cite> to use gensim, but you do need gensim to use the <cite>simserver</cite>.
Note that unlike gensim, <cite>gensim.simserver</cite> is licensed under <a class="reference external" href="http://www.gnu.org/licenses/agpl-3.0.html">Affero GPL</a>,
which is much more restrictive for inclusion in commercial projects.</p>
</div>
</div>
<div class="section" id="contributors">
<h2>Contributors<a class="headerlink" href="#contributors" title="Permalink to this headline">¶</a></h2>
<p>Credit goes to all the people who contributed to gensim, be it in <a class="reference external" href="http://groups.google.com/group/gensim">discussions</a>,
ideas, <a class="reference external" href="https://github.com/piskvorky/gensim/pulls">code contributions</a> or bug reports.
It’s really useful and motivating to get feedback, in any shape or form, so big thanks to you all!</p>
<p>Some honorable mentions are included in the <a class="reference external" href="https://github.com/piskvorky/gensim/blob/develop/CHANGELOG.txt">CHANGELOG.txt</a>.</p>
</div>
<div class="section" id="academic-citing">
<h2>Academic citing<a class="headerlink" href="#academic-citing" title="Permalink to this headline">¶</a></h2>
<p>Gensim has been used in many students’ final theses as well as research papers. When citing gensim,
please use <a class="reference external" href="bibtex_gensim.bib">this BibTeX entry</a>:</p>
<div class="highlight-python"><pre>@inproceedings{rehurek_lrec,
title = {{Software Framework for Topic Modelling with Large Corpora}},
author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
booktitle = {{Proceedings of the LREC 2010 Workshop on New
Challenges for NLP Frameworks}},
pages = {45--50},
year = 2010,
month = May,
day = 22,
publisher = {ELRA},
address = {Valletta, Malta},
note={\url{http://is.muni.cz/publication/884893/en}},
language={English}
}</pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li><a href="index.html">Gensim home</a>| </li>
<li><a href="tutorial.html">Tutorials</a>| </li>
<li><a href="http://groups.google.com/group/gensim">Support</a>| </li>
<li><a href="https://github.com/piskvorky/gensim/wiki">Contribute</a>| </li>
<li><a href="apiref.html">API reference</a>»</li>
</ul>
</div>
<div class="footer">
© Copyright 2009-2012, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jul 22, 2012.
</div>
</body>
</html>