Permalink
Commits on Oct 29, 2009
Commits on Oct 26, 2009
  1. Switching to 10.6 SDK

    Machx committed Oct 26, 2009
    cleaning up deprecated methods and using their replacements
  2. Make GITRef.h public otherwise when using cocoagit

    Machx committed Oct 26, 2009
    in apps it complains that it is missing
    
    Build 32/64 bit Universal Binaries
    
    For Including in Cocoa Apps make Release-GC the default
Commits on Jul 20, 2009
  1. [DEV] Make GITRepo+Enumerators.h a public header.

    rentzsch committed with geoffgarside Jul 17, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  2. [NEW] Add -[GITDateTime compare:] for commit-time sortablity.

    rentzsch committed with geoffgarside Jul 16, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
Commits on Jun 17, 2009
  1. Refactor packSHA1 routine to use a lookup table instead of strchr()

    chapados committed with geoffgarside Apr 30, 2009
    Also, since this method now operates directly on bytes, create a new
    method 'packSHA1FromBytes' and call it from 'packSHA1FromString' passing
    a pointer to the bytes in the string.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  2. Added testcase for packSHA1FromBytes

    chapados committed with geoffgarside Apr 30, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  3. Refactor GITPackReverseIndex to be simpler and faster.

    chapados committed with geoffgarside Apr 23, 2009
    Remove the slow, convoluted reverse index creation scheme and replace it with
    a CFArray of struct pointers that store the offset and the index of the entry
    in the pack index. This is similar to the git implementation, except we use a
    CFArray instead of a straight C solution. Note that we need to use a CFArray
    instead of NSArray because we want to store pointers to structs, not objects.
    NSArray can only store objects.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  4. optimization[2/2]: only parse essential commit info upon initialization

    chapados committed with geoffgarside Apr 30, 2009
    Minimal information needed to traverse the graph:
    + commit-id (sha1) to uniquely identify the commit.
    + tree-id (sha1) to access the tree and object-ids associated with the commit.
    + list of parent-id (sha1)s. + raw committer date, ignoring the timezone.
    
    For better efficiency, we extract only what we need, and cache the rest
    in an NSData object for later extraction.
    Lazily fetched properties are:
    author + authored + committer + committed + message
    
    Currently lazy-fetching is as simple as possible, a call to any of the above
    methods will cause all of the cached data to be parsed, filling in the rest of
    the data.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  5. Initialize a GITActor from partially processed object header data

    chapados committed with geoffgarside Apr 30, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  6. Add GITDateTime initializaer that accepts a BSD date and (int)timezon…

    chapados committed with geoffgarside Apr 30, 2009
    …e offset
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  7. optimization[1/2]: faster routines for extracting data from core objects

    chapados committed with geoffgarside Apr 30, 2009
    When walking the commit graph, parsing essential information from commits can
    be a potential bottleneck. This is not a big deal for small repositories, or
    in cases where the entire graph not traversed. However, for large repos such
    as git or the linux kernel, traversing the graph starting from HEAD takes
    several seconds. Shark profiling shows that the NSScanner-based
    GITCommit#parseRawData routine is slow, taking the same or more time than
    (zlib) inflate. In constrast, zlib inflate is the major limiting step in
    git-rev-list.
    
    This change-set introduces some custom parsing code in an attempt to make this
    go as fast as possible. The general approach is outlined below, with step 1
    implemented in this change-set.
    
    1. Remove NSScanner and drop straight down to raw C.
    The slow steps seemed to be loading the scanner and allocating NSStrings. We
    now access raw bytes directly and do not create an NSString unless we need it.
    To drive the 'parser', define an objectRecord struct for each line or
    substring that needs to be extracted. In some cases (i.e. sha1 strings) we
    know the length, so specifying it explicitly saves some scanning. This code is
    mostly used through a single-method Objective-C interface defined in a
    "Parsing" category on GITObject.
    
    2. Do _not_ parse the entire commit when the object is created.
    Minimal nformation needed to traverse the graph:
    + commit-id (sha1) to uniquely identify the commit.
    + tree-id (sha1) to access the tree and object-ids associated with the commit.
    + list of parent-id (sha1)s.
    + raw committer date, ignoring the timezone.
    
    Thus, for better efficiency, we extract only what we need, and cache the rest
    in an NSData object for later extraction. This means that properties are now
    fetched lazily:
    + author
    + authored
    + committer
    + committed
    + message
    
    Currently lazy-fetching is as simple as possible, a call to any of the above
    methods will cause all of the cached data to be parsed, filling in the rest of
    the properties and releasing the cached data. This is simple to implement, and
    the rationale is that long lists of commits/objects are likely to be needed
    quickly during network transfer or pack file creation. If the information
    associated with a commit is required, it is likely being displayed on demand
    for a user who will not need to look at 18000 commits at once.
    
    To summarize, it boils down to:
    Extract structural information required for building a graph as fast as
    possible. Fill in the rest when it is needed.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  8. update xcode project

    chapados committed with geoffgarside May 1, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  9. Update test repo with new branches for testing revision walking and s…

    chapados committed with geoffgarside May 1, 2009
    …orting.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  10. Add fixture generator and .plist file for GITGraph sort order tests

    chapados committed with geoffgarside May 1, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  11. Update count of branches in GITRepo test

    chapados committed with geoffgarside May 1, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  12. Add tests for GITGraph sort methods.

    chapados committed with geoffgarside May 1, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  13. git-rev-list CLI tool: uses API to reproduce output from 'git rev-list'

    chapados committed with geoffgarside May 1, 2009
    The tool currently supports output identical[1] to the following commands:
    
    git rev-list <commit-id>
    git rev-list --date-order <commit-id>
    git rev-list --topo-order <commit-id>
    
    Note that listing multiple commits, ranges or excluding commits is not
    supported, but branch heads can be specified in place of <commit-id>. See use
    for more info.
    
    This tool is exceptionally useful for profiling and comparison with the real
    git-rev-list. It plays well with Shark and Instruments as long as you setup
    the command arguments properly (hint: see the "Executables" grouping in
    Xcode). The '-r' flag allows you to specify any repo, so you can clone out the
    linux kernel and use that for testing, ect..
    
    [1] This is based on my testing with the git repository, comparing output
    using HEAD as the commit with the options specified above. This was actually a
    PITA to sort out, especially since --date-order is almost always identical to
    the default ordering, except in some rare cases. The test repo now contains a
    small version of a branch with enough strange date-madness edge-cases that
    will produce a unique sort ordering in each case. This should be useful for
    further development of these routines.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  14. Original Nu prototype for graph structure and topological sort

    chapados committed with geoffgarside May 1, 2009
    rev-list.nu
    This is superseded by the 'git-rev-list' cli tool, but spared from death for
    educational value + links to some useful references.
    
    set-diff.nu
    Simple brute force tool to test whether sets of lines from 2 files identical.
    I was a bit lost when I first started trying to sort out graph traversal and
    topological ordering. This helped compare output from the real git-rev-list to
    my incorrect output to at least make sure that I wasn't doing something really
    stupid. Since we're comparing sets, we don't care about order, only unique content.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  15. Enumerator subclasses for BFS/DFS traversal of GITCommit objects in a…

    chapados committed with geoffgarside May 1, 2009
    … repo.
    
    This also includes a GITRepo category that defines the enumerator factor methods.
    
    These are currently not used by the graph class, but they will likely be used eventually for some a few revision-walking tasks. Stay tuned.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  16. Add GITNode class to represent a GITObject in a graph.

    chapados committed with geoffgarside May 1, 2009
    A GITNode is a simple wrapper class that store a pointer to a GITObject and
    also caches/stores information related to the graph structure. A GITObject
    (currently a GITCommit) is required to create a node, since the nodes key is
    derived from the object's sha1. This key is used to uniquely identify the node
    in a graph.
    
    The idea behind separating the actual objects from the graph representation
    is:
    
    1. convenience of not added extra properties to GITObject/GITCommit
    2. encapsulation of graph storage/traversal algorithms
    3. possibility to create thread-safe graph structure for multi-threaded
    processing.
    
    Once constructed, we can still do traversals and sorting operations without
    the objects. They could be released to save space. In addition, this also
    makes it possible to create a minimal copy of the graph for multi-threaded
    operations, should the need arise. On the other hand, it may turn out that the
    overhead of creating extra objects is a bad idea.
    
    I'm also aware that this is a fairly naive way to handle a graph. There are
    almost certainly better/more efficient ways of storing and accessing the same
    information. This seems to work for now, but is subject to change should the
    need arise.
    
    Rough estimate of memory overhead for a graph of the git repository starting
    from HEAD, containing ~18400 commits:
    
    graph + nodes (with objects) = 23 MB
    graph + nodes (objects released) = 8 MB
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  17. Add helper methods to GITRepo for retrieving commits associated with …

    chapados committed with geoffgarside May 1, 2009
    …branch heads
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  18. Simple GITGraph class to represent a directed acyclic graph (DAG)

    chapados committed with geoffgarside May 1, 2009
    The graph class is a wrapper around an NSMutableDictionary of node objects.
    Each dictionary entry consists of a sha1 key (40-char NSString) and a
    GITObjectNode value that wraps the corresponding GITObject (GITCommit for
    now). Because of this arrangement, the graph acts like a standard Foundation
    collection in the sense that it retains objects that are added, and released
    objects upon removal.
    
    In addition to methods for adding/removing and querying nodes, the graph also
    supports specialized sort methods that return NSArrays of nodes sorted by date
    or topology. These lists are identical to the lists of objects obtained from
    git-rev-list with the default, --topo-order or --date-order flags.
    
    The interface currently lacks much documentation, but is relatively
    straightforward. For use examples, see the unit tests or the source
    'git-rev-list' tool.
    
    NOTES
    -----
    Each of the 'sort' utilities requires the entire graph to be built and
    held in memory prior to sorting. I think this is required for the topological
    sort, but not for the date-based sort. I think it should be possible to
    abstract out some of the functionality into enumerators or generators that can
    spit out objects on demand, without requiring as much overhead.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  19. Fix leaking GITCommit#parentShas.

    chapados committed with geoffgarside Apr 25, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  20. Fix refStore memleak in GITRepo.

    chapados committed with geoffgarside Apr 23, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  21. Fix memleak in symbolic-ref handling.

    chapados committed with geoffgarside Apr 23, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  22. testing ways to optimize GITCommit#parseRawData:

    chapados committed with geoffgarside Apr 18, 2009
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  23. Use NSArray for GITCommit#parents to preverse the order of parents of…

    chapados committed with geoffgarside Apr 17, 2009
    … a commit object.
    
    In order to consistantly produce the same topological ordering of a
    graph of commit objects, the order of the parent objects must be preserved.
    If the parent order is not preservedr, then commits with muliple parents will
    result in an arbitrarily chosen topological sort.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
  24. GITObject#hash calls [[self sha1] hash] to match GITObject#isEqual:

    chapados committed with geoffgarside Apr 17, 2009
    The previous implementation was creating an NSScanner to scan [self sha1].
    This turned out to be a _massive_ bottleneck during preliminary testing of
    revision walking routines that iterate over a large set of commit objects.
    Calling [[self sha1] hash] here is faster by at least an order of magnitude.
    
    Signed-off-by: Geoff Garside <geoff@geoffgarside.co.uk>
Commits on Apr 15, 2009
  1. Refactor index lookup methods to use a binary search algorithm for sh…

    chapados committed Apr 15, 2009
    …a1 lookups.
    
    Start cleaning up the interface by adding -indexOfSha1: method that
    finds the index of a sha1 in the sha1(v2) or offset+sha1(v1) table
    and either returns the index or NSNotFound.  This is the same concept
    as NSArray#indexOfObject: The main difference is that right now
    indexOfSha1: is intended to be a private method, since other classes
    should not really care about the 'index' in the table.
    
    I think index lookups should be handled as follows:
    
    Internal methods that require an index as a parameter should throw
    an exception if the index is out of bounds, since it should never
    be passed an illegal index unless there is a programming error.
    
    The public methods for accessing properties associated with a sha1
    (i.e. offset, CRC) just check for an NSNotFound and bail (optionally with an error).
    Otherwise, the lookup succeeds.  If somehow -indexOfSha1: returns
    a value outside of the index range, something bad has happened, and having
    the exception will help track down the bug.
Commits on Apr 14, 2009
  1. Add test cases and assertions for version 1 pack index files.

    chapados committed Apr 14, 2009
    NOTE: The actual pack file is the same.  The only thing that has changed
    is that the .v1.idx file is a version 1 index for the pack file.  The tests
    make sure that we get the same objects/sha1/offsets from both versions.
  2. Add sha1/offset lookup & GITPackReverseIndex support to GITPackIndexV…

    chapados committed Apr 14, 2009
    …ersion1
    
    This changeset provides Pack Index V1 implementations of:
    - (packIndexEntry *) packEntryWithIndex:(NSUInteger)i;
    - (off_t) packOffsetWithIndex:(NSUInteger)i;
    - (NSData *)packedSha1WithIndex:(NSUInteger)i;
    
    packOffsetWithIndex is used to build the reverse index, which
    provides:
    - (off_t)baseOffsetWithOffset:(off_t)offset;
    - (off_t)nextOffsetWithOffset:(off_t)offset;
    - (NSString *)sha1WithOffset:(off_t)offset;
  3. Change Git.framework install_name to @rpath, specify Run paths in exe…

    chapados committed Apr 14, 2009
    …cutables.
    
    
    @rpath is the most flexible option for a framework, which allows it to be
    easily embedded or not.
    
    Apple Documentation is here: http://tinyurl.com/ccxz9z
    
    For the framework, we set the install_name (Installation Path) to be: @rpath
    
    The UnitTests.app and the cli tools each link to the framework and specify
    an rpath (Run path), which will be searched at runtime.  Libraries, ect..
    found in the run path will be loaded, and dyld will substitute the
    path for the @rpath variable allowing the lib to be loaded.
    
    UnitTests.app embeds the framework in its private 'Frameworks' dir. Thus, we
    add a '@executable_path/../Frameworks' run path to the target.
    
    The cli apps current set a run path of $(CONFIGURATION_BUILD_DIR), which is
    a special xcode variable that will resolve to 'build/<Config name>' ect.
    This works assuming that the executables are installed in the same top-level
    build dir as Git.framework.  This should work file for development purposes.