<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# GitHub - Clone open branches from repository on my local
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/GitHub/GitHub_Clone_open_branches_from_repository_on_my_local.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/Open_in_Naas_Lab.svg"/></a><br><br><a href="https://bit.ly/3JyWIk6">Give Feedback</a> | <a href="https://github.com/jupyter-naas/awesome-notebooks/issues/new?assignees=&labels=bug&template=bug_report.md&title=GitHub+-+Clone+open+branches+from+repository+on+my+local:+Error+short+description">Bug report</a>

**Tags:** #github #snippet #operations #repository #efficiency

**Author:** [Antonio Georgiev](https://www.linkedin.com/in/antonio-georgiev-b672a325b)

**Last update:** 2023-07-24 (Created: 2023-07-24)

**Description:** This notebook clones every branch that has an open pull request on a GitHub repository to your local machine. Each branch is cloned into its own folder named after the branch, and that branch is then checked out in the folder. Keeping one working copy per branch lets you work on several branches in parallel without switching branches inside a single clone, which avoids conflicts between unfinished changes. Before running this on Naas, make sure SSH access to GitHub is configured (the Naas_Configure_SSH.ipynb template covers this).

**References:**
- [GitHub Documentation - Cloning a repository](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository)

## Input

### Import libraries

In [None]:
import os
import naas
import pandas as pd
import requests

### Setup Variables
- `token`: [Generate a personal access token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token)
- `repo_url`: URL of the repository to clone
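- `cron`: cron expression for the Naas scheduler (`"0 * * * *"` runs at minute 0 of every hour)
- `output_dir`: Directory in which the branches are cloned. If `None`, each branch is cloned into a folder named after the branch in the current working directory, and existing clones are looked up in `/home/ftp/`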

In [9]:
# Inputs
repo_url = "https://github.com/jupyter-naas/awesome-notebooks"
token = naas.secret.get(name="GITHUB_TOKEN") or "YOUR_GITHUB_TOKEN"
cron = "0 * * * *"

# Outputs
output_dir = None
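
The cells below derive the owner and repository name from `repo_url` with string splits, which assumes a URL with no trailing slash and no `.git` suffix. As a minimal sketch, a more tolerant parser could look like the following. The helper name `parse_repo_url` is hypothetical and is not part of the notebook.

```python
from urllib.parse import urlparse


def parse_repo_url(repo_url):
    """Return (owner, repository) from a GitHub repository URL.

    Hypothetical helper: tolerates a trailing slash and a '.git' suffix.
    """
    # Keep only the path part, e.g. "jupyter-naas/awesome-notebooks"
    path = urlparse(repo_url).path.strip("/")
    parts = path.split("/")
    if len(parts) < 2:
        raise ValueError(f"Cannot find owner and repository in: {repo_url}")
    owner, repository = parts[0], parts[1]
    # Drop the optional ".git" suffix used by clone URLs
    if repository.endswith(".git"):
        repository = repository[: -len(".git")]
    return owner, repository


print(parse_repo_url("https://github.com/jupyter-naas/awesome-notebooks"))
```

The same function accepts `https://github.com/jupyter-naas/awesome-notebooks.git/` and returns the same pair.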

## Model

### Get branches with open PRs

In [None]:
def get_branches_with_open_prs(
    token,
    repo_url
):
    # Init
    data = []
    owner = repo_url.split("https://github.com/")[-1].split("/")[0]
    repository = repo_url.split(f"https://github.com/{owner}/")[-1].split("/")[0]
    
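    # Requests: list open pull requests (the API returns at most 30 per page by default)
    url = f"https://api.github.com/repos/{owner}/{repository}/pulls"
    headers = {"Authorization": f"token {token}"}
    response = requests.get(url, headers=headers, params={"state": "open"})
    response.raise_for_status()  # Fail early on a bad token or an unknown repository
    pulls = response.json()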
    
    # Data
    for pull in pulls:
        branch = pull['head']['ref']
        creator = pull['user']['login']
        creation_date = pull['created_at']
        
        data.append({
            'branch': branch,
            'creator': creator,
            'creation_date': creation_date
        })
    
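    # Declare the columns explicitly so an empty result still has a 'branch' column
    df = pd.DataFrame(data, columns=['branch', 'creator', 'creation_date'])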
    
    # Sort values
    if len(df) > 0:
        df = df.sort_values(by="creation_date", ascending=False)
    return df.reset_index(drop=True)

branches_with_open_prs = get_branches_with_open_prs(token, repo_url)
print("Branches with open PRs:", len(branches_with_open_prs))
branches_with_open_prs.head(1)
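
The GitHub pulls endpoint is paginated, so a single request only returns the first page of open pull requests. The sketch below shows one way to walk all pages with the `page` and `per_page` query parameters. The function name `list_open_pulls` and its `http` argument are assumptions made for illustration: `http` is any object with a `requests`-style `get` method, for example the `requests` module itself.

```python
def list_open_pulls(http, owner, repository, token, per_page=100):
    """Collect open pull requests across all result pages.

    Hypothetical helper: `http` must expose get(url, headers=..., params=...)
    and return an object with raise_for_status() and json().
    """
    url = f"https://api.github.com/repos/{owner}/{repository}/pulls"
    headers = {"Authorization": f"token {token}"}
    pulls = []
    page = 1
    while True:
        params = {"state": "open", "per_page": per_page, "page": page}
        response = http.get(url, headers=headers, params=params)
        response.raise_for_status()
        batch = response.json()
        pulls.extend(batch)
        # A page shorter than per_page is the last one
        if len(batch) < per_page:
            break
        page += 1
    return pulls
```

In the notebook this could be called as `list_open_pulls(requests, owner, repository, token)`, and the resulting list can be fed to the same loop that builds the DataFrame.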

### Get branches already cloned on my local

In [None]:
def get_all_folders(directory):
    if not directory:
        directory = '/home/ftp/'
    folders = []
    for item in os.listdir(directory):
        item_path = os.path.join(directory, item)
        if os.path.isdir(item_path):
            folders.append(item)
    return sorted(folders)

folders = get_all_folders(output_dir)
print('Branches in local:', len(folders))
folders[-1] if folders else None

### Identify missing branches on local

In [None]:
missing_branches = [branch for branch in branches_with_open_prs['branch'] if branch not in folders]
print("Missing branches not cloned on my local machine:", len(missing_branches))
missing_branches
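
One caveat with comparing branch names to folder names: a branch such as `feature/login` contains a slash, so cloning into a folder of that name creates a nested path and the top-level folder listing never matches it. The sketch below illustrates one possible workaround, assuming a flat naming convention where `/` is replaced by `__`. Both helper names and the `__` separator are hypothetical choices, not something the notebook defines.

```python
def branch_to_folder(branch_name, separator="__"):
    """Map a branch name to a single-level folder name (hypothetical convention)."""
    return branch_name.replace("/", separator)


def find_missing_branches(branches, folders):
    """Return the branches whose mapped folder name is not in `folders`."""
    existing = set(folders)  # set lookup keeps the check fast for many folders
    return [branch for branch in branches if branch_to_folder(branch) not in existing]


print(find_missing_branches(["main", "feature/login"], ["main"]))
```

For this to be consistent, the clone step would need to use `branch_to_folder(branch_name)` as the folder name as well.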

## Output

### Clone repository & Switch branch
Clone the repository over SSH into one folder per missing branch, then check out that branch inside the folder.

In [None]:
def clone_branch(repo_url, output_dir, branch_name):
    # Get GitHub owner and repo name
    owner = repo_url.split("https://github.com/")[-1].split("/")[0]
    repo_name = repo_url.rstrip("/").split("/")[-1]  # Ignore a trailing slash in the URL
    
    # Add repo name with .git extension
    if not repo_name.endswith(".git"):
        repo_name = f"{repo_name}.git"
    repo = f"{owner}/{repo_name}"
        
    # Init output dir
    if not output_dir:
        output_dir = branch_name
    else:
        output_dir = os.path.join(output_dir, branch_name)
    
    # Create output directory
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        
    # Git command: clone over SSH into the output directory
    # (no separate `!cd` is needed: each `!` line runs in its own subshell)
    !git clone git@github.com:'{repo}' '{output_dir}'
    print(f"✅ GitHub repo cloned: {output_dir}")
    return output_dir

def switch_branch(output_dir, branch_name):
    # Git command: check out the branch inside the cloned folder
    !cd '{output_dir}' && git checkout '{branch_name}'
    print(f"✅ Switched to branch '{branch_name}'")
    
for branch_name in missing_branches:
    # Clone repo
    output_dir_repo = clone_branch(repo_url, output_dir, branch_name)
    
    # Switch branch
    switch_branch(output_dir_repo, branch_name)
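
The clone and checkout steps above rely on IPython `!` shell commands, which only work inside a notebook and do not stop on failure. As an alternative sketch, the same result can be approached with `subprocess` and `git clone --branch`, which checks out the requested branch as part of the clone. The function names below are hypothetical, and `--single-branch` is a deliberate difference from the notebook: it fetches only the requested branch instead of the full repository history.

```python
import os
import subprocess


def build_clone_command(repo, branch_name, target_dir):
    """Build the git command that clones one branch over SSH (hypothetical helper)."""
    return [
        "git", "clone",
        "--branch", branch_name,   # check out this branch after cloning
        "--single-branch",         # fetch only this branch
        f"git@github.com:{repo}",
        target_dir,
    ]


def clone_single_branch(repo, branch_name, output_dir=None):
    """Clone `branch_name` of `repo` into a folder named after the branch."""
    target_dir = os.path.join(output_dir, branch_name) if output_dir else branch_name
    # check=True raises CalledProcessError if git exits with an error
    subprocess.run(build_clone_command(repo, branch_name, target_dir), check=True)
    return target_dir
```

A call such as `clone_single_branch("jupyter-naas/awesome-notebooks.git", branch_name, output_dir)` would replace both `clone_branch` and `switch_branch` for one branch, and a failed clone raises an exception instead of printing a success message.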

### Schedule the notebook

In [None]:
naas.scheduler.add(cron=cron)
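
The `cron` value passed to the scheduler is assumed here to be a standard five-field cron expression. As a small illustration, the sketch below splits an expression into its named fields so the schedule is easier to read. The helper `describe_cron` is hypothetical and does not validate the field values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(cron):
    """Map each field of a five-field cron expression to its name."""
    values = cron.split()
    if len(values) != len(CRON_FIELDS):
        raise ValueError(f"Expected {len(CRON_FIELDS)} fields, got {len(values)}: {cron!r}")
    return dict(zip(CRON_FIELDS, values))


print(describe_cron("0 * * * *"))
```

With `"0 * * * *"`, only the minute field is fixed (to `0`), so the notebook runs once at the start of every hour.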