## Client Requirements
1. **Search Repositories**: Find repositories related to "machine learning" with at least 100 stars and written in Python.
2. **Fetch Commits**: Retrieve the latest 10 commits from the top repository found.
3. **Extract Repository Contents**: Inspect the contents of the top-level directory of the repository.
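
The first requirement maps onto GitHub search qualifiers (`language:` and `stars:`). As a minimal sketch, the helper below composes such a query string; `build_search_query` and its parameter names are hypothetical and not part of the original notebook.

```python
def build_search_query(keywords, language=None, min_stars=None):
    """Compose a GitHub search query string from keywords and optional qualifiers."""
    parts = [keywords.strip()]
    if language:
        # Restrict results to repositories whose primary language matches.
        parts.append(f"language:{language}")
    if min_stars is not None:
        # Keep only repositories with at least `min_stars` stars.
        parts.append(f"stars:>={min_stars}")
    return " ".join(parts)


query = build_search_query("machine learning", language="Python", min_stars=100)
print(query)  # machine learning language:Python stars:>=100
```

The resulting string can be passed as the `q` parameter of the search request, so the star threshold is applied by the API rather than filtered afterwards.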

In [3]:
import requests
import json
import time
import pandas as pd

In [5]:
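import os

# Read the token from the environment instead of hard-coding it in the notebook.
# The variable name GITHUB_TOKEN is an assumption; adjust it to your setup.
GITHUB_TOKEN = os.environ.get("GITHUB_TOKEN", "")

BASE_URL = "https://api.github.com"

HEADERS = {"Accept": "application/vnd.github+json"}
# Send the Authorization header only when a token is set, so an empty
# credential is never transmitted.
if GITHUB_TOKEN:
    HEADERS["Authorization"] = f"Bearer {GITHUB_TOKEN}"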


In [8]:
def search_repositories(query, per_page=10):
    """
    Searches for repositories based on a query.
    """
    url = f"{BASE_URL}/search/repositories"
    params = {"q": query, "per_page": per_page}
    response = requests.get(url, headers=HEADERS, params=params)

    if response.status_code == 200:
        return response.json()["items"]
    else:
        print(f"Error {response.status_code}: {response.json()['message']}")
        return []

In [10]:
def get_commits(owner, repo, per_page=10):
    """
    Fetches the latest commits for a given repository.
    """
    url = f"{BASE_URL}/repos/{owner}/{repo}/commits"
    params = {"per_page": per_page}
    response = requests.get(url, headers=HEADERS, params=params)

    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error {response.status_code}: {response.json()['message']}")
        return []


In [12]:
def get_contents(owner, repo, path=""):
    """
    Fetches the contents of a repository.
    """
    url = f"{BASE_URL}/repos/{owner}/{repo}/contents/{path}"
    response = requests.get(url, headers=HEADERS)

    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error {response.status_code}: {response.json()['message']}")
        return []


In [14]:
# Qualifiers cover requirement 1: Python repositories with at least 100 stars.
query = "machine learning language:Python stars:>=100"
repositories = search_repositories(query, per_page=5)

if repositories:
    for repo in repositories:
        print(f"Repository: {repo['full_name']}, Stars: {repo['stargazers_count']}")
else:
    print("No repositories found.")

Repository: josephmisiti/awesome-machine-learning, Stars: 66168
Repository: wepe/MachineLearning, Stars: 5256
Repository: Jack-Cherish/Machine-Learning, Stars: 9202
Repository: lawlite19/MachineLearning_Python, Stars: 7326
Repository: lazyprogrammer/machine_learning_examples, Stars: 8403


In [16]:
if repositories:
    owner = repositories[0]["owner"]["login"]
    repo_name = repositories[0]["name"]

    commits = get_commits(owner, repo_name, per_page=5)
    print("\nRecent Commits:")
    for commit in commits:
        print(f"SHA: {commit['sha']}, Message: {commit['commit']['message']}")
else:
    print("No repositories to fetch commits from.")


Recent Commits:
SHA: a9cfd245f6acb6a03407c370b8d520999285afa1, Message: Merge pull request #1001 from debashishc/master

docs(readme): added EspNet tool for speech processing tasks in Python
SHA: 3e760ed5c057280cb0b0a044dda9be44fc3618f1, Message: Merge pull request #1000 from anmorgan24/add-opik

Add Opik
SHA: 5f7293e5c249e217947a55ab5b1bc95d250678e1, Message: docs(readme): made EspNet description more concise
SHA: d5de41ccbec7725f69c2e54c8d18a1e564aab223, Message: docs(readme): add EspNet for Speech Processing Tasks
SHA: 50ecf61e10e2a368ccf153657226251952b1bded, Message: Add Opik

Comet recently sunset CometLLM and in its place launched Opik, a tool with most of the same capabilities as the old CometLLM but with a whole host of additional features and capabilities. This PR updates the readme accordingly.

Signed-off-by: Abby Morgan abigailm@comet.com
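
The commit messages above span several lines, which makes the listing hard to scan. As a minimal sketch, assuming each commit dict has the `sha` and `commit.message` fields used in the loop above, the hypothetical helper `summarize_commits` keeps only a shortened SHA and the first line of each message. The sample input is copied from two entries of the output above.

```python
def summarize_commits(commits, sha_length=7):
    """Return (short_sha, subject) pairs; subject is the first line of the message."""
    summary = []
    for commit in commits:
        lines = commit["commit"]["message"].splitlines()
        subject = lines[0] if lines else ""  # guard against an empty message
        summary.append((commit["sha"][:sha_length], subject))
    return summary


# Illustrative input shaped like two entries from the output above.
sample = [
    {
        "sha": "a9cfd245f6acb6a03407c370b8d520999285afa1",
        "commit": {
            "message": "Merge pull request #1001 from debashishc/master\n\n"
                       "docs(readme): added EspNet tool for speech processing tasks in Python"
        },
    },
    {
        "sha": "5f7293e5c249e217947a55ab5b1bc95d250678e1",
        "commit": {"message": "docs(readme): made EspNet description more concise"},
    },
]

for short_sha, subject in summarize_commits(sample):
    print(short_sha, subject)
```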


In [18]:
if repositories:
    contents = get_contents(owner, repo_name)
    print("\nRepository Contents:")
    for item in contents:
        print(f"Type: {item['type']}, Name: {item['name']}")
else:
    print("No repositories to fetch contents from.")


Repository Contents:
Type: file, Name: LICENSE
Type: file, Name: README.md
Type: file, Name: blogs.md
Type: file, Name: books.md
Type: file, Name: courses.md
Type: file, Name: events.md
Type: file, Name: meetups.md
Type: file, Name: ml-curriculum.md
Type: dir, Name: scripts


In [20]:
def check_rate_limit():
    """
    Checks the GitHub API rate limit.
    """
    url = f"{BASE_URL}/rate_limit"
    response = requests.get(url, headers=HEADERS)

    if response.status_code == 200:
        rate_limit = response.json()["rate"]
        print(f"Remaining requests: {rate_limit['remaining']}")
        return rate_limit
    else:
        print(f"Error {response.status_code}: {response.json()['message']}")
        return None

In [22]:
rate_limit = check_rate_limit()
if rate_limit and rate_limit["remaining"] == 0:
    # Clamp at zero: the reset time may already have passed, and time.sleep rejects negatives.
    wait_time = max(0, rate_limit["reset"] - int(time.time()))
    print(f"Rate limit exceeded. Waiting for {wait_time} seconds.")
    time.sleep(wait_time)

Remaining requests: 4974
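
The check above costs an extra request to `/rate_limit`. GitHub also reports the quota on each API response through the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers, so the wait can be derived from a response that is already in hand. The sketch below assumes a header mapping such as `response.headers`; the helper name `seconds_until_reset` is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next request (0 if quota remains)."""
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0  # headers absent: nothing to base a wait on
    if int(remaining) > 0:
        return 0  # quota left, no need to wait
    now = time.time() if now is None else now
    # Clamp at zero in case the reset time has already passed.
    return max(0, int(reset) - int(now))


# Illustrative header values, not taken from a real response.
example_headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
print(seconds_until_reset(example_headers, now=1700000000))  # 60
```

The `now` parameter exists only to make the function easy to test with fixed timestamps; in the notebook it would be called as `seconds_until_reset(response.headers)`.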


In [24]:
def save_to_file(data, filename):
    """
    Saves data to a JSON file.
    """
    with open(filename, "w") as file:
        json.dump(data, file, indent=4)
    print(f"Data saved to {filename}")

save_to_file(repositories, "../results/repositories.json")

save_to_file(commits, "../results/commits.json")

Data saved to ../results/repositories.json
Data saved to ../results/commits.json
