# Loading From Git
This will familiarize you with the different ways to access a GitRepo (or MultiGitRepo) object and how to use its data.


* Single Repo:
    * remote `get_repo("https://github.com/sbenthall/bigbang.git", in_type = "remote" )`
    * local `get_repo("~/urap/bigbang/archives/sample_git_repos/bigbang",  in_type = "local" )`
    * name `get_repo("bigbang", in_type = "name")`
* Multiple Repos:
    * With repo names: `get_multi_repo(repo_names=["bigbang","django"])`
    * With repo objects: `get_multi_repo(repos=[{list of existing GitRepo objects}])`
    * With GitHub Organization names: `get_org_multirepo("glass-bead-labs")`

# Repo Locations
As of now, repos are cloned into `archives/sample_git_repos/{repo_name}`. Their caches are stored at `archives/sample_git_repos/{repo_name}_backup.csv`.
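
The layout rule above can be written down as two small path helpers. This is only a minimal sketch: the names `repo_path`, `cache_path`, and `SAMPLE_REPOS_DIR` are hypothetical and are not part of `repo_loader`.

```python
import os

# Assumed base directory, taken from the rule described above.
SAMPLE_REPOS_DIR = os.path.join("archives", "sample_git_repos")


def repo_path(repo_name, base_dir=SAMPLE_REPOS_DIR):
    """Return the directory where a repo named `repo_name` would be cloned."""
    return os.path.join(base_dir, repo_name)


def cache_path(repo_name, base_dir=SAMPLE_REPOS_DIR):
    """Return the CSV cache file that sits next to the clone."""
    return os.path.join(base_dir, repo_name + "_backup.csv")


print(repo_path("bigbang"))
print(cache_path("bigbang"))
```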

# Caches
Caches are stored at `archives/sample_git_repos/{repo_name}_backup.csv`. They are the dumped `.csv` files of a GitRepo object's `commit_data` attribute, which is a pandas dataframe of all commit information. We can initialize a GitRepo object by feeding the cache's Pandas dataframe into the GitRepo init function. However, the init function will need to do some processing before it can use the cache as its commit data. It needs to convert the `"Touched File"` attribute of the cache dataframe from unicode `"[file1, file2, file3]"` to an actual list `["file1", "file2", "file3"]`. It will also need to convert the time index of the cache from string to datetime.
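
The two conversions described above might look roughly like the following. This is a sketch using only the standard library on single values, whereas the actual init function works on a pandas dataframe. The helper names `parse_touched_files` and `parse_commit_time` are hypothetical, and the timestamp format is assumed from the sample output shown in this document.

```python
from datetime import datetime


def parse_touched_files(raw):
    """Turn the cached string "[a, b, c]" into the list ["a", "b", "c"]."""
    inner = raw.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # An empty list is cached as "[]", which must not become [""].
    return [part.strip() for part in inner.split(",") if part.strip()]


def parse_commit_time(raw):
    """Turn a cached timestamp string into a datetime object."""
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")


files = parse_touched_files("[bigbang/git_repo.py, bigbang/repo_loader.py]")
when = parse_commit_time("2015-04-10 18:18:13")
print(files)
print(when)
```

Splitting on commas is a simplification: a file name that itself contains a comma would be split incorrectly.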


# Single Repos
Here, we can load in three ways: with a GitHub URL, a local path to a repo, or the name of a repo. All of these return a `GitRepo` object.

## Remote
A remote call to `get_repo` extracts the repo's name from its git URL. Thus, `https://github.com/sbenthall/bigbang.git` yields `bigbang` as its name. It then checks whether the repo already exists. If it doesn't, it sends a shell command to clone the remote repository to `archives/sample_git_repos/{repo_name}`. It then returns `get_repo({name}, in_type="name")`. Before returning, however, it caches the GitRepo object at `archives/sample_git_repos/{repo_name}_backup.csv` to make loading faster the next time.
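
As an illustration of the name extraction and clone step, here is a minimal sketch. The functions `repo_name_from_url` and `clone_command` are hypothetical, and the use of a plain `git clone <url> <target>` command is an assumption about what the shell command looks like. The command is only built here, not executed.

```python
import os


def repo_name_from_url(url):
    """Extract the repo name from a git URL, e.g. ".../bigbang.git" -> "bigbang"."""
    last_segment = url.rstrip("/").split("/")[-1]
    if last_segment.endswith(".git"):
        last_segment = last_segment[: -len(".git")]
    return last_segment


def clone_command(url, base_dir=os.path.join("archives", "sample_git_repos")):
    """Build (but do not run) an assumed `git clone` command for `url`."""
    target = os.path.join(base_dir, repo_name_from_url(url))
    return ["git", "clone", url, target]


url = "https://github.com/sbenthall/bigbang.git"
print(repo_name_from_url(url))  # bigbang
print(clone_command(url))
```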

## Local
A local call is the simplest. It first extracts the repo name from the filepath. Thus, `~/urap/bigbang/archives/sample_git_repos/bigbang` yields `bigbang`. It then checks whether a git repo exists at the given address. If it does, it initializes a GitRepo object, which only needs a name and a filepath to a Git repo. Note that this option does not check or create a cache.
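
The name extraction and existence check for a local path could be sketched as below. Both helper names are hypothetical. Testing for a `.git` entry is an assumed way of detecting a repository, and it would not recognize a bare repository.

```python
import os


def repo_name_from_path(path):
    """Extract the repo name from a filepath (its last path component)."""
    expanded = os.path.expanduser(path)
    return os.path.basename(os.path.normpath(expanded))


def looks_like_git_repo(path):
    """Return True if `path` contains a `.git` entry (directory or file)."""
    return os.path.exists(os.path.join(os.path.expanduser(path), ".git"))


print(repo_name_from_path("~/urap/bigbang/archives/sample_git_repos/bigbang"))  # bigbang
```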

## Name
This is the preferred and easiest way to load a git repository. It relies on the assumptions above about where a git repo and its cache are stored. It first checks whether a cache exists. If it does, it loads a GitRepo object from that cache.

If a cache is not found, then the function constructs a filepath from the name, using the rule above about repo locations. It then delegates to `get_repo(filepath, in_type="local")`. Before returning the result, it caches it.
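
The cache-first control flow described above can be sketched in isolation. Everything here is a stand-in. `load_by_name` is a hypothetical function, a plain dict plays the role of the CSV caches on disk, and `load_local` plays the role of `get_repo(filepath, in_type="local")`.

```python
def load_by_name(name, cache, load_local):
    """Return (repo, source) for `name`, preferring the cache.

    `cache` is any dict-like store standing in for the CSV files on disk;
    `load_local` is a callable standing in for the local loading path.
    """
    if name in cache:
        return cache[name], "cache"
    path = "archives/sample_git_repos/" + name
    repo = load_local(path)
    cache[name] = repo  # cache the result before returning
    return repo, "local"


cache = {}
fake_loader = lambda path: {"path": path}  # placeholder for a real GitRepo
first = load_by_name("bigbang", cache, fake_loader)
second = load_by_name("bigbang", cache, fake_loader)
print(first[1], second[1])  # local cache
```

The first call falls through to the local loader and fills the cache, and the second call is served from the cache.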

In [1]:
from bigbang import repo_loader  # The file that handles most loading

repo = repo_loader.get_repo(
    "https://github.com/sbenthall/bigbang.git", in_type="remote"
)
# repo = repo_loader.get_repo("../", in_type="local")  # commented out because it may take too long
repo = repo_loader.get_repo("bigbang", in_type="name")
repo.commit_data

Unnamed: 0.1,Unnamed: 0,Commit Message,Committer Email,Committer Name,HEXSHA,Parent Commit,Time,Touched File,Person-ID
0.0,2015-04-13 22:49:33,Merge pull request #195 from jesscxu/master\n\...,sbenthall@gmail.com,Sebastian Benthall,e6f985d15ff4736a08e2112b6c7ff0c0d0836a75,"[02d30c7ba4b02e899c4f098531812ca390983c0b, 5b5...",2015-04-13 22:49:33,"[examples/viz/git/glass.json, examples/viz/git...",1
1.0,2015-04-13 22:44:21,Adding d3 visualization of GitDiff.ipynb graph\n,jcxu@berkeley.edu,Jessica Xu,5b54cc96d652a07b12b5c31d4f5ad5269e1aec37,[02d30c7ba4b02e899c4f098531812ca390983c0b],2015-04-13 22:44:21,"[examples/viz/git/glass.json, examples/viz/git...",2
2.0,2015-04-10 21:59:33,Merge pull request #194 from vsporeddy/master\...,sbenthall@gmail.com,Sebastian Benthall,02d30c7ba4b02e899c4f098531812ca390983c0b,"[3723718c356155a8c2c2104e813d61263a1f23c7, 2ec...",2015-04-10 21:59:33,[examples/File Dependency Network.ipynb],1
3.0,2015-04-10 18:19:22,Changed to directed graph,vs.poreddy@gmail.com,Venkata Poreddy,2ec31ee60878a08e5738dfa40245740e79dde97c,[f5316bf07da3d4d51ac3bc1875b24d10693daa02],2015-04-10 18:19:22,[examples/File Dependency Network.ipynb],3
4.0,2015-04-10 18:18:13,Merge pull request #3 from sbenthall/master\n\...,vs.poreddy@gmail.com,Venkata Poreddy,f5316bf07da3d4d51ac3bc1875b24d10693daa02,"[9aacab2a8eb5e7eabcb227caea5a82d99e5f8835, 372...",2015-04-10 18:18:13,"[bigbang/git_repo.py, bigbang/repo_loader.py]",3
5.0,2015-04-10 17:54:34,Merge pull request #192 from Aryan-Barbarian/m...,sbenthall@gmail.com,Sebastian Benthall,3723718c356155a8c2c2104e813d61263a1f23c7,"[a22c55ea0887bdff8f62e50d2abdca02f6fdbce6, ed6...",2015-04-10 17:54:34,"[bigbang/git_repo.py, bigbang/repo_loader.py]",1
6.0,2015-04-10 17:53:13,Merge pull request #193 from vsporeddy/master\...,sbenthall@gmail.com,Sebastian Benthall,a22c55ea0887bdff8f62e50d2abdca02f6fdbce6,"[2b1f678c8ad75458b6a6b7484bed0ca72baee298, 9aa...",2015-04-10 17:53:13,"[bigbang/get_dependencies.py, examples/File De...",1
7.0,2015-04-10 17:30:29,Fixed an issue where git repos with hyphens in...,aryan.falahatpisheh@berkeley.edu,Aryan Falahatpisheh,ed60740e26981e216542a258c0c5aa0afa50af95,[8dac7fc397738b057d7fbdcd2bea1552e6f88339],2015-04-10 17:30:29,[bigbang/repo_loader.py],4
8.0,2015-04-10 16:55:36,Update File Dependency Network.ipynb,vs.poreddy@gmail.com,Venkata Poreddy,9aacab2a8eb5e7eabcb227caea5a82d99e5f8835,[465c3a275bc341e2dab9d43c0363c2a7fff59b15],2015-04-10 16:55:36,[examples/File Dependency Network.ipynb],3
9.0,2015-04-10 16:54:44,Create get_dependencies.py,vs.poreddy@gmail.com,Venkata Poreddy,465c3a275bc341e2dab9d43c0363c2a7fff59b15,[95e074b3e32017adf92e74a8fb19e471bf95f1ee],2015-04-10 16:54:44,[bigbang/get_dependencies.py],3


# MultiRepos
These are the ways we can get MultiGitRepo objects. MultiGitRepo objects are GitRepos that were created from a list of GitRepos. Basically, a MultiGitRepo's `commit_data` contains the `commit_data` from all of its GitRepos. The only difference is that each entry has an extra attribute, `Repo Name`, that tells us which repo that commit originally came from.
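
To make the merged structure concrete, here is a small sketch of the idea using plain dicts in place of dataframe rows. The function `merge_commit_rows` is hypothetical, and the HEXSHA values are placeholders rather than real commits.

```python
def merge_commit_rows(repos):
    """Combine commit rows from several repos, tagging each with its origin.

    `repos` maps a repo name to a list of commit rows (plain dicts here,
    standing in for the rows of each repo's commit_data dataframe).
    """
    merged = []
    for repo_name, rows in repos.items():
        for row in rows:
            tagged = dict(row)  # copy so the input rows stay untouched
            tagged["Repo Name"] = repo_name
            merged.append(tagged)
    return merged


repos = {
    "bigbang": [{"HEXSHA": "placeholder-1"}, {"HEXSHA": "placeholder-2"}],
    "bead.glass": [{"HEXSHA": "placeholder-3"}],
}
merged = merge_commit_rows(repos)
for row in merged:
    print(row["Repo Name"], row["HEXSHA"])
```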

## List of Repos / List of Repo Names (`get_multi_repo`)
This is rather simple. We can call the `get_multi_repo` method with either a list of repo names `["bigbang", "django", "scipy"]` or a list of actual GitRepo objects. This returns the merged MultiGitRepo. Please note that this will not work unless a local clone or cache exists for every repo name (e.g. if you ask for `["bigbang", "django", "scipy"]`, you must already have a local copy of those in your `sample_git_repos` directory).

## Github Organization's Repos (`get_org_multirepo`)
This is more useful to us. We can use this method to get a MultiGitRepo that contains the information from every repo in a GitHub Organization. This requires that we input the organization's name *exactly* as it appears on GitHub (edX, glass-bead-labs, codeforamerica, etc.).

It will look for `examples/{org_name}_urls.txt`, which should be a file that contains all of the git URLs of the projects that belong to that organization. If this file doesn't yet exist, it will make a call to the GitHub API. This requires a stable internet connection, and it may randomly stall on requests that do not time out.

The function will then use the list of git urls and the `get_repo` method to get each repo. It will use this list of repos to create a MultiGitRepo object, using `get_multi_repo`.
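
The "file first, API second" lookup could be sketched as follows. This is not the library's implementation. `load_org_urls` is a hypothetical helper, `fetch_urls` is a placeholder callable standing in for the GitHub API request, and the one-URL-per-line file format is an assumption. The demo runs against a temporary directory and `example.com` URLs so that it needs no network access.

```python
import os
import tempfile


def load_org_urls(org_name, fetch_urls, examples_dir="examples"):
    """Return the git URLs for an organization, reading the URL file if present.

    `fetch_urls` is only invoked when `{examples_dir}/{org_name}_urls.txt`
    does not exist.
    """
    urls_file = os.path.join(examples_dir, org_name + "_urls.txt")
    if os.path.exists(urls_file):
        with open(urls_file) as handle:
            # Assumed format: one URL per line, blank lines ignored.
            return [line.strip() for line in handle if line.strip()]
    return list(fetch_urls(org_name))


with tempfile.TemporaryDirectory() as examples_dir:
    fake_api = lambda org: ["https://example.com/" + org + "/project.git"]
    # No URL file yet, so the placeholder API callable is used.
    from_api = load_org_urls("some-org", fake_api, examples_dir)
    with open(os.path.join(examples_dir, "some-org_urls.txt"), "w") as handle:
        handle.write("https://example.com/some-org/a.git\n")
        handle.write("https://example.com/some-org/b.git\n")
    # The file now exists, so it takes precedence over the callable.
    from_file = load_org_urls("some-org", fake_api, examples_dir)

print(from_api)
print(from_file)
```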


Note that the examples below will not work if you don't have an internet connection, and may take some time to process. The first call may also fail if you do not have all of the repositories.

In [2]:
# Using GitHub API
multirepo = repo_loader.get_org_multirepo("glass-bead-labs")

# List of repo names
multirepo = repo_loader.get_multi_repo(repo_names=["bigbang", "bead.glass"])

# List of actual repos
repo1 = repo_loader.get_repo("bigbang", in_type="name")
repo2 = repo_loader.get_repo("bead.glass", in_type="name")
multirepo = repo_loader.get_multi_repo(repos=[repo1, repo2])

multirepo.commit_data

Unnamed: 0.1,Unnamed: 0,Commit Message,Committer Email,Committer Name,HEXSHA,Parent Commit,Time,Touched File,Person-ID,Repo Name
0.0,2015-04-13 22:49:33,Merge pull request #195 from jesscxu/master\n\...,sbenthall@gmail.com,Sebastian Benthall,e6f985d15ff4736a08e2112b6c7ff0c0d0836a75,"[02d30c7ba4b02e899c4f098531812ca390983c0b, 5b5...",2015-04-13 22:49:33,"[examples/viz/git/glass.json, examples/viz/git...",1,bigbang
1.0,2015-04-13 22:44:21,Adding d3 visualization of GitDiff.ipynb graph\n,jcxu@berkeley.edu,Jessica Xu,5b54cc96d652a07b12b5c31d4f5ad5269e1aec37,[02d30c7ba4b02e899c4f098531812ca390983c0b],2015-04-13 22:44:21,"[examples/viz/git/glass.json, examples/viz/git...",2,bigbang
2.0,2015-04-10 21:59:33,Merge pull request #194 from vsporeddy/master\...,sbenthall@gmail.com,Sebastian Benthall,02d30c7ba4b02e899c4f098531812ca390983c0b,"[3723718c356155a8c2c2104e813d61263a1f23c7, 2ec...",2015-04-10 21:59:33,[examples/File Dependency Network.ipynb],1,bigbang
3.0,2015-04-10 18:19:22,Changed to directed graph,vs.poreddy@gmail.com,Venkata Poreddy,2ec31ee60878a08e5738dfa40245740e79dde97c,[f5316bf07da3d4d51ac3bc1875b24d10693daa02],2015-04-10 18:19:22,[examples/File Dependency Network.ipynb],3,bigbang
4.0,2015-04-10 18:18:13,Merge pull request #3 from sbenthall/master\n\...,vs.poreddy@gmail.com,Venkata Poreddy,f5316bf07da3d4d51ac3bc1875b24d10693daa02,"[9aacab2a8eb5e7eabcb227caea5a82d99e5f8835, 372...",2015-04-10 18:18:13,"[bigbang/git_repo.py, bigbang/repo_loader.py]",3,bigbang
5.0,2015-04-10 17:54:34,Merge pull request #192 from Aryan-Barbarian/m...,sbenthall@gmail.com,Sebastian Benthall,3723718c356155a8c2c2104e813d61263a1f23c7,"[a22c55ea0887bdff8f62e50d2abdca02f6fdbce6, ed6...",2015-04-10 17:54:34,"[bigbang/git_repo.py, bigbang/repo_loader.py]",1,bigbang
6.0,2015-04-10 17:53:13,Merge pull request #193 from vsporeddy/master\...,sbenthall@gmail.com,Sebastian Benthall,a22c55ea0887bdff8f62e50d2abdca02f6fdbce6,"[2b1f678c8ad75458b6a6b7484bed0ca72baee298, 9aa...",2015-04-10 17:53:13,"[bigbang/get_dependencies.py, examples/File De...",1,bigbang
7.0,2015-04-10 17:30:29,Fixed an issue where git repos with hyphens in...,aryan.falahatpisheh@berkeley.edu,Aryan Falahatpisheh,ed60740e26981e216542a258c0c5aa0afa50af95,[8dac7fc397738b057d7fbdcd2bea1552e6f88339],2015-04-10 17:30:29,[bigbang/repo_loader.py],4,bigbang
8.0,2015-04-10 16:55:36,Update File Dependency Network.ipynb,vs.poreddy@gmail.com,Venkata Poreddy,9aacab2a8eb5e7eabcb227caea5a82d99e5f8835,[465c3a275bc341e2dab9d43c0363c2a7fff59b15],2015-04-10 16:55:36,[examples/File Dependency Network.ipynb],3,bigbang
9.0,2015-04-10 16:54:44,Create get_dependencies.py,vs.poreddy@gmail.com,Venkata Poreddy,465c3a275bc341e2dab9d43c0363c2a7fff59b15,[95e074b3e32017adf92e74a8fb19e471bf95f1ee],2015-04-10 16:54:44,[bigbang/get_dependencies.py],3,bigbang
