A custom crawler written for AWS CodeCommit based InnerSource repositories to be used as the input for the SAP InnerSource Portal https://github.com/SAP/project-portal-for-innersource.

dchucks/innersource-codecommit-crawler

InnerSource AWS CodeCommit Crawler

Organizations setting up an InnerSource ecosystem in their intranet should be able to use any source code control system. This project provides a crawler for AWS CodeCommit based InnerSource code repositories whose output can be consumed by the SAP InnerSource Portal. The crawler can be scheduled to fetch repository details automatically at regular intervals, for example with a cron job.

The project creates a repos.json file that the SAP InnerSource Portal consumes to display the available InnerSource projects. The solution assumes that the CodeCommit repositories are already set up and that the crawler can connect to them using AWS credentials (namely, aws_access_key_id and aws_secret_access_key).

The crawler implements custom logic for assigning the activity score and omits fields that are not available or not relevant for CodeCommit (e.g. forks or stars).

Installation

pip install -r requirements.txt

Usage

  1. (Optional) Add a tag to each of your InnerSource repos with the key type and the value innersource.
  2. (Optional) Add an innersource.json file to each repo (a sample file is included in this repo) with the details about the project.
  3. Run python3 ./crawler.py, which creates a repos.json file containing the relevant metadata for the AWS CodeCommit repos.
  4. Copy repos.json to your instance of the SAP InnerSource Portal and launch the portal as outlined in its installation instructions.
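
As an illustration of step 3, the following minimal sketch shows how a list of repository entries could be serialized to repos.json. It is not taken from crawler.py: the helper name write_repos_json is hypothetical, and it assumes the portal expects a top-level JSON array of repository objects.

```python
import json


def write_repos_json(entries, path="repos.json"):
    """Write repository entries to a JSON file as a top-level array.

    `entries` is any iterable of dicts; the exact fields are up to the
    crawler (hypothetical helper, not part of crawler.py).
    """
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(list(entries), handle, indent=2)
```

Calling write_repos_json([{"name": "demo"}]) would produce a repos.json file in the current directory containing a single entry.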

Customization

While the entire code can be customized to fit your use case, one particular customization may be needed if your AWS CodeCommit account contains repositories other than the InnerSource repos. In that case you may want to select only the InnerSource ones using tags, such as type = innersource. Example code for such a filter, intended to run inside the loop over repositories:

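# Fetch the repository's tags; CodeCommit returns them as a dict under "tags".
tag_data = cc_client.list_tags_for_resource(
    resourceArn=repo_metadata["Arn"]
)
repo_tags = tag_data.get("tags", {})
# Untagged repositories have no "type" key, so use .get() to avoid a KeyError.
repo_type = repo_tags.get("type")
if repo_type != "innersource":
    # Skip this repository and move on to the next one in the loop.
    continue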
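
To show the tag filter in context, here is a minimal, self-contained sketch of a loop that pages through all repositories and keeps only the tagged ones. The function name filter_innersource_repos and its parameters are hypothetical and not part of crawler.py. The client is passed in as an argument and is assumed to behave like a boto3 CodeCommit client (list_repositories, get_repository, list_tags_for_resource).

```python
def filter_innersource_repos(cc_client, tag_key="type", tag_value="innersource"):
    """Return the metadata of all repositories tagged tag_key = tag_value.

    Hypothetical helper: `cc_client` is assumed to be a boto3 CodeCommit
    client, e.g. boto3.client("codecommit").
    """
    matches = []
    kwargs = {}
    while True:
        # list_repositories is paginated via nextToken.
        page = cc_client.list_repositories(**kwargs)
        for repo in page.get("repositories", []):
            repo_metadata = cc_client.get_repository(
                repositoryName=repo["repositoryName"]
            )["repositoryMetadata"]
            tag_data = cc_client.list_tags_for_resource(
                resourceArn=repo_metadata["Arn"]
            )
            # Skip repositories that lack the tag or carry another value.
            if tag_data.get("tags", {}).get(tag_key) != tag_value:
                continue
            matches.append(repo_metadata)
        token = page.get("nextToken")
        if not token:
            break
        kwargs = {"nextToken": token}
    return matches
```

Passing the client in keeps the function easy to exercise with a stub object, so the filter can be tried out without AWS credentials.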

Similarly, you may choose to add an innersource.json file to each of your InnerSource repos (a sample file is included in this repo) with the details about the project. This helps populate the portal fields whose information cannot be fetched from CodeCommit.
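
A minimal sketch of how the contents of innersource.json might be attached to a repository entry is shown below. The helper merge_innersource_metadata is hypothetical, and storing the data under the _InnerSourceMetadata key is an assumption based on the convention used by the SAP InnerSource Portal; crawler.py may handle this differently.

```python
import json


def merge_innersource_metadata(repo_entry, raw_json):
    """Return a copy of repo_entry with innersource.json data attached.

    `raw_json` is the file content as bytes or str, or None if the
    repository has no innersource.json (hypothetical helper).
    """
    merged = dict(repo_entry)
    if not raw_json:
        return merged
    try:
        metadata = json.loads(raw_json)
    except ValueError:
        # Malformed JSON or undecodable bytes: leave the entry unchanged.
        return merged
    if isinstance(metadata, dict):
        # Assumed key name, following the portal's convention.
        merged["_InnerSourceMetadata"] = metadata
    return merged
```

With boto3, the raw bytes could be read with cc_client.get_file(repositoryName=..., filePath="innersource.json")["fileContent"], handling the case where the file does not exist.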
