Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
A github-based maven repositories for many artifacts that are useful, but are to date (03 April 2013) not yet available in public repositories (to the best of my knowledge). To depend on artifacts in this repo, please use: <repository> <id>antoine-tran-github-releases</id> <name>Tuan Tran's Personal Repository for Maven</name> <url> https://raw.github.com/antoine-tran/maven-repo/master/releases </url> </repository> The artifacts included are: 1. Javatools: A lightweight library that has many convenience Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They were developed by Fabian M. Suchanek for the YAGO-NAGA project at Max-Planck Institut for Informatics. The Javatools are licensed under a Creative Commons Attribution 3.0 License by the YAGO-NAGA team. More details can be found at: http://www.mpi-inf.mpg.de/yago-naga/javatools/ This maven artifact redistributes the Javatools under the Creative Commons Attribution 3.0 License. It also extends the library by providing extra primitive data type supports for database handling. 2. Stanford NLP small maven artifacts: STanford NLP (http://nlp.stanford.edu/software/index.shtml) is a famous Java suite for NLP tasks such as POS tagging, named entity recognition, CRF word segmenter,… There are many libraries developed within the Stanford NLP frameworks which are very well written and highly reusable in several non NLP-related tasks as well. For example, the StringUtils class contains over 80 different utility methods for handling strings, or the optimization package provides many implementations of numerical minimizer: Stochastic Gradient Descent Minimizer, Online Limited-Memory Quasi-Newton BFGS, etc. Unfortunately, these libraries are not well known in Java community, and from time to time, I have seen many duplicated implementations in several frameworks, meaning hundreds hours have been wasted to re-invent one small thing. In a hope to support Java developers to build efficient algorithms quicker, I decided to re-distribute those universally useful tools in a public maven repository. I tried my best to keep track of the development of the original libraries, as well as to resolve the different versions of the library in different frameworks (for example, in Berkeley parser), however it is surely not always at its latest updates. Any contribution are welcome here. 3. Tuan4j: Working with Java for years, I have been collecting a number of small programs useful in many tasks, including those developed by me as well as by the Java community and are freely published elsewhere. I bundled them into one lightweight library, called tuan4j-core, tested them and optimized the components, making them highly reusable for any standard Java applications. As compared with Javatools, tuan4j emphasizes on small footprint, efficient primitive data type supports and standard JSRs compliance. Some libraries bundled in Tuan4j: - Apache Ant's Bzip2 files handling (the tool that is extremely helpful, but not visible to the Java community, simply because it was bundled within a misleading library name). - Collections with primitive data-type support: List, Pair, (Hash / Tree)Map (backed on Red-black tree implementation) - IO utilities: Serializable Null, convert streamed text file to an Iterator using non-block operators, lightweight command argument handler (in lieu of Apache Commons-util) - Memory-efficient generic data structures for machine learning algorithms: Feature set, data point in (very) high dimensional spaces - XML: JAXB marshalling / de-marshalling utilities - Math: Lightweight implementations of popular distributions (categorical, geometric, binomial, hypergeometric…), common algebraic operators (logarithmic operators,…), chi-square, sigmoid functions,… 4. Likelike: Likelike (http://code.google.com/p/likelike/) is an implementation of LSH (locality sensitive hashing) on Hadoop. It is public under the Apache License 2.0, and here you can find the refactored maven artifacts of the tool. Dependencies have been updated with newer versions of third-party libraries with several bug fixes and optimization, and it was engineered to consume less memory and better tailored to Hadoop stable versions 1.x 5. Hark-tweet-NLP: The public maven artifact of Hark-tweet-NLP (https://github.com/antoine-tran/hark-tweet-nlp), the Hadoop extension of CMU's ArkTweet (http://www.ark.cs.cmu.edu/TweetNLP/) is also put here. 6. Burstdetecion: A modified version of the Java implementation of Kleinberg's burst detection algorithm in text streams (J. Kleinberg. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002), as originally developed by Aretha Alencar et al. at at CNS group, University of Indiana (http://wiki.cns.iu.edu/display/CISHELL/Burst+Detection), and later on by my colleague Gerhard Gossen at L3S Research Center (https://bitbucket.org/GerhardGossen). Some new features added: - Conversion of a time series from continuous values to binary values based on burst period checks, - A simplified implementation of the Bron–Kerbosch algorithm (http://en.wikipedia.org/wiki/Bron%E2%80%93Kerbosch_algorithm) to support the detection of disconnected bursts in arbitrary settings (original Kleinberg's algorithm implementation can result in overlapped bursts depending on the configuration of input states / gamma / density scaling parameters). 7. Cloud9-contrib: Cloud9 (http://lintool.github.io/Cloud9/) is an excellent Hadoop library for working on various big datasets (Wikipedia, ClueWeb,...). Cloud9-contrib is my (personal) attempts and habits of improving / extending the Cloud9 to have more features / programs in an updated Hadoop version. DISCLAIMER: I used cloud9-contrib for my daily tasks at L3S Research Center, and made the best efforts to fix all known bugs and deficiencies. However, as this is my personal attempts, please take it with a grail of salt if you intend to re-use the code. I'm not responsible for providing patches upon requests (although if you have some, you can try sending them to me. I might respond :) )