Commits on Nov 21, 2011
Commits on Nov 17, 2011
  1. Merge pull request #1 from matpalm/emr

    slight generalisation so we can build on elastic mapreduce
    commoncrawl committed Nov 17, 2011
Commits on Nov 16, 2011
  1. Added introductory README

    committed Nov 16, 2011
Commits on Nov 14, 2011
  1. Fix directory tree.

    Ahad Rana committed Nov 14, 2011
  2. Minor modification to GoogleURL interface.

    Ahad Rana committed Nov 9, 2011
  3. Fixed bug in InputStream implementation.

    Ahad Rana committed Nov 9, 2011
  4. Includes, among other things, (1) added mergeutils project into commoncrawl source tree (2) added query project into commoncrawl source tree (3) major refactoring of query project (4) bulk scan implementation (5) integration of parallel query functionality (6) bulk query support in cacheFE server (7) fix improper flush bug in Indexer code

    Ahad Rana committed with Ahad Rana Aug 10, 2011
  5. Emergency commit to fix Indexer Bug.

    Ahad Rana committed with Ahad Rana Jul 7, 2011
  6. Resolve project dependencies and make it build via build.xml file.

    Ahad Rana committed with Ahad Rana Jun 16, 2011
  7. Removed redundant src directory under cc_src

    Ahad Rana committed with Ahad Rana Jun 8, 2011
  8. added BinaryComparableWithOffset to deal with comparables that need offset information, added HBase BoundedRangeFileInputStream to utils, modified FlexBuffer to derive from BinaryComparableWithOffset, modified SimHash code to produce simhash from byte stream instead of char stream, extended TFileReader to have a ValueReader object to allow for partial deserialization of thrift objects, modified TFileThriftObjectWriter to take replication factor as a parameter in constructor, added TFileUtils to allow for introspection of TFile metadata, modified TextBytes to derive from BinaryComparableWithOffset, modified URLUtils to strip www prefix by default during canonicalization

    Ahad Rana committed with Ahad Rana Jun 7, 2011
  9. Added a new way to retrieve Value data from a TFileThriftObjectReader.

    Ahad Rana committed with Ahad Rana May 18, 2011
  10. Added support for reading/writing Thrift objects via TFile.

    Ahad Rana committed with Ahad Rana May 18, 2011
  11. Added basic Tuple support.

    Ahad Rana committed with Ahad Rana May 13, 2011
  12. Modifications necessary to support proper UTF-8 compliant Http Header parsing in NIOHttpConnection.

    Ahad Rana committed with Ahad Rana May 13, 2011
  13. Made RPCFrame thread safe to handle Multi-Threaded Actors reading/writing from a single channel.

    Ahad Rana committed with Ahad Rana May 13, 2011
  14. Added byte offset compatible API to TextBytes.

    Ahad Rana committed with Ahad Rana May 13, 2011
  15. Added support to extract substream from BufferListInputStream.

    Ahad Rana committed with Ahad Rana May 13, 2011
  16. Added more synchronization to public API calls.

    Ahad Rana committed with Ahad Rana May 13, 2011
  17. 1. Added server config file support to CommonCrawlServer. 2. Fixed missing offset bug in TextBytes. 3. Add RawComparator, RawComparable support to FlexBuffer.

    Ahad Rana committed with Ahad Rana Apr 22, 2011
  18. Some modifications to MMapUtils and corresponding changes to RiceCoder. Also disable native compilation in build.xml by default.

    Ahad Rana committed with Ahad Rana Apr 11, 2011
  19. 1. Modified compiler to have generated setXXX methods return this so that data structure construction calls can be chained in a builder-like pattern.

    2. Made ByteBufferInputStream derive from FSInputStream so that it
    becomes seekable (inherently supported by ByteBuffer anyhow).
    
    3. Made MMapFileInputStream derive from FSInputStream and made
    MMapFile.newInputStream return an FSDataInputStream, so that the
    mmap'd file stream can be interchanged with a HDFS stream in
    certain parts of the codebase.
    
    4. Reverted some changes in the RiceCoder for now, and also
    added a constructor for RiceCodeReader that takes an
    FSDataInputStream, thus enabling a MMapFileStream to be used
    to initialize a Reader object. This eliminated a buffer copy
    from an FSDataInputStream to ByteBuffer (the other type allowed
    in the Reader's constructor).
    
    5. Added a hasMoreData method to TFileReader to enable a check
    after 'next' for whether the EOF condition has been hit. This
    is possible because the last call to TFile.Scanner's next sets
    it in an EOF state which can be checked via its atEnd method.
    Ahad Rana committed with Ahad Rana Apr 6, 2011
  20. More modifications to the TextBytes and FlexBuffer API. Plus, a bug fix of FlexBuffer's clone method. The original version was cloning data members (via Object.clone) and then copying src contents into the new object. Unfortunately, the data member clone ended up reusing the source's storage buffer within the context of the new object (BAD!). Clone is a deep clone, so the new object needs to allocate its own storage!

    
    TODO: Integrate unit tests for FlexBuffer and TextBytes from CC private
    codebase to catch nasty bugs like these!
    Ahad Rana committed with Ahad Rana Mar 31, 2011
  21. A combined checkin that includes:

    1. Integration of RiceCoder from CC private src.
    2. Some Memory Mapped IO helper code (MMapUtils)
    3. Better shared / copy on write semantics for TextBytes and FlexBuffer
    4. Changes to various classes to reflect changes in TextBytes and FlexBuffer
       APIs.
    5. RPC Compiler / Code Generator modifications to accommodate new TextBytes
       /FlexBuffer API.
    6. TFile related helper utilities.
    7. Added Type Parameter to RPCStruct base class.
    Ahad Rana committed with Ahad Rana Mar 31, 2011
  22. More NodeJS related headaches.

    Ahad Rana committed with Ahad Rana Mar 3, 2011
  23. Remove NodeJS dependency yet again :-(

    Ahad Rana committed with Ahad Rana Mar 3, 2011
  24. More bug fixes.

    Ahad Rana committed with Ahad Rana Mar 3, 2011
  25. WebServer related fixes.

    Ahad Rana committed with Ahad Rana Feb 23, 2011
  26. Some formatting changes.

    Ahad Rana committed with Ahad Rana Feb 23, 2011
  27. Merged in Server and URLUtils components

    Ahad Rana committed with Ahad Rana Feb 23, 2011
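
The FlexBuffer clone fix described in commit 20 (Mar 31, 2011) turns on the difference between a shallow and a deep clone. The sketch below illustrates that difference in Java with a hypothetical Buffer class. It is not the actual FlexBuffer code, and the names used here (Buffer, data, count, shallowClone) are assumptions chosen for illustration only.

```java
import java.util.Arrays;

public class CloneSketch {
    /** Hypothetical stand-in for a byte buffer; NOT the real FlexBuffer. */
    static class Buffer implements Cloneable {
        byte[] data;
        int count;

        Buffer(byte[] src) {
            this.data = Arrays.copyOf(src, src.length);
            this.count = src.length;
        }

        /** Buggy variant: Object.clone copies the array reference,
         *  so the clone and the source share one storage array. */
        Buffer shallowClone() {
            try {
                return (Buffer) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: we implement Cloneable
            }
        }

        /** Fixed variant: the clone allocates its own storage. */
        @Override
        public Buffer clone() {
            Buffer copy = shallowClone();
            copy.data = Arrays.copyOf(data, data.length);
            return copy;
        }
    }

    public static void main(String[] args) {
        Buffer original = new Buffer(new byte[] {1, 2, 3});
        Buffer shallow = original.shallowClone();
        Buffer deep = original.clone();

        original.data[0] = 42; // mutate the source after cloning

        System.out.println("shallow sees mutation: " + (shallow.data[0] == 42)); // true
        System.out.println("deep sees mutation: " + (deep.data[0] == 42));       // false
    }
}
```

With the shallow variant, a later write to the source buffer shows up in the clone, which is the kind of aliasing the commit message calls out. Copying the backing array at clone time removes the shared storage.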