Commit

Merge 2c51f2b into 54423cf
mcdonnnj committed Feb 16, 2021
2 parents 54423cf + 2c51f2b commit 51f8eec
Showing 25 changed files with 1,118 additions and 281 deletions.
2 changes: 1 addition & 1 deletion .coveragerc
@@ -2,7 +2,7 @@
# https://coverage.readthedocs.io/en/latest/config.html

[run]
source = src/example
source = src/hash_http_content
omit =
branch = true

2 changes: 1 addition & 1 deletion .github/lineage.yml
@@ -3,4 +3,4 @@ version: "1"

lineage:
skeleton:
remote-url: https://github.com/cisagov/skeleton-generic.git
remote-url: https://github.com/cisagov/skeleton-python-library.git
2 changes: 2 additions & 0 deletions .github/workflows/build.yml
@@ -64,6 +64,8 @@ jobs:
${{ hashFiles('**/requirements.txt') }}"
restore-keys: |
${{ env.BASE_CACHE_KEY }}
- name: Download and extract a serverless-chrome binary
run: ./get_serverless_chrome_binary.sh
- name: Install dependencies
run: |
python -m pip install --upgrade pip
10 changes: 5 additions & 5 deletions CONTRIBUTING.md
@@ -15,7 +15,7 @@ all of which should be in this repository.

If you want to report a bug or request a new feature, the most direct
method is to [create an
issue](https://github.com/cisagov/skeleton-python-library/issues) in
issue](https://github.com/cisagov/hash-http-content/issues) in
this repository. We recommend that you first search through existing
issues (both open and closed) to check if your particular issue has
already been reported. If it has then you might want to add a comment
@@ -25,7 +25,7 @@ one.
## Pull requests ##

If you choose to [submit a pull
request](https://github.com/cisagov/skeleton-python-library/pulls),
request](https://github.com/cisagov/hash-http-content/pulls),
you will notice that our continuous integration (CI) system runs a
fairly extensive set of linters, syntax checkers, system, and unit tests.
Your pull request may fail these checks, and that's OK. If you want
@@ -111,9 +111,9 @@ can create and configure the Python virtual environment with these
commands:

```console
cd skeleton-python-library
pyenv virtualenv <python_version_to_use> skeleton-python-library
pyenv local skeleton-python-library
cd hash-http-content
pyenv virtualenv <python_version_to_use> hash-http-content
pyenv local hash-http-content
pip install --requirement requirements-dev.txt
```

58 changes: 36 additions & 22 deletions README.md
@@ -1,25 +1,39 @@
# skeleton-python-library #

[![GitHub Build Status](https://github.com/cisagov/skeleton-python-library/workflows/build/badge.svg)](https://github.com/cisagov/skeleton-python-library/actions)
[![Coverage Status](https://coveralls.io/repos/github/cisagov/skeleton-python-library/badge.svg?branch=develop)](https://coveralls.io/github/cisagov/skeleton-python-library?branch=develop)
[![Total alerts](https://img.shields.io/lgtm/alerts/g/cisagov/skeleton-python-library.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/cisagov/skeleton-python-library/alerts/)
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/cisagov/skeleton-python-library.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/cisagov/skeleton-python-library/context:python)
[![Known Vulnerabilities](https://snyk.io/test/github/cisagov/skeleton-python-library/develop/badge.svg)](https://snyk.io/test/github/cisagov/skeleton-python-library)

This is a generic skeleton project that can be used to quickly get a
new [cisagov](https://github.com/cisagov) Python library GitHub
project started. This skeleton project contains [licensing
information](LICENSE), as well as
[pre-commit hooks](https://pre-commit.com) and
[GitHub Actions](https://github.com/features/actions) configurations
appropriate for a Python library project.

## New Repositories from a Skeleton ##

Please see our [Project Setup guide](https://github.com/cisagov/development-guide/tree/develop/project_setup)
for step-by-step instructions on how to start a new repository from
a skeleton. This will save you time and effort when configuring a
new repository!
# hash-http-content #

[![GitHub Build Status](https://github.com/cisagov/hash-http-content/workflows/build/badge.svg)](https://github.com/cisagov/hash-http-content/actions)
[![Coverage Status](https://coveralls.io/repos/github/cisagov/hash-http-content/badge.svg?branch=develop)](https://coveralls.io/github/cisagov/hash-http-content?branch=develop)
[![Total alerts](https://img.shields.io/lgtm/alerts/g/cisagov/hash-http-content.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/cisagov/hash-http-content/alerts/)
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/cisagov/hash-http-content.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/cisagov/hash-http-content/context:python)
[![Known Vulnerabilities](https://snyk.io/test/github/cisagov/hash-http-content/develop/badge.svg)](https://snyk.io/test/github/cisagov/hash-http-content)

This is a Python library to retrieve the contents of a given URL via HTTP (or
HTTPS) and hash the processed contents.

## Content processing ##

If an encoding is detected, this package will convert content into the UTF-8
encoding before proceeding.
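
As a minimal sketch of the conversion described above (not the package's actual code), the step can be pictured as a decode followed by a UTF-8 encode. The function name `to_utf8` and its `detected_encoding` parameter are hypothetical, and how the encoding is detected is outside this sketch.

```python
from typing import Optional


def to_utf8(content: bytes, detected_encoding: Optional[str]) -> bytes:
    """Re-encode content as UTF-8 when an encoding was detected.

    Hypothetical helper for illustration; not part of hash-http-content.
    """
    if detected_encoding is None:
        # No encoding detected: leave the raw bytes untouched.
        return content
    # Decode with the detected encoding, then encode as UTF-8.
    return content.decode(detected_encoding).encode("utf-8")


latin1_bytes = "café".encode("latin-1")
utf8_bytes = to_utf8(latin1_bytes, "latin-1")
```

Normalizing to one encoding first means that the same text served in two different encodings yields the same bytes for the later steps.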

Additional content processing is currently implemented for the following types
of content:

* HTML
* JSON

### HTML ###

HTML content is processed by leveraging the
[pyppeteer](https://github.com/pyppeteer/pyppeteer) package to execute any
JavaScript on a retrieved page. The result is then parsed by
[Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) to reduce the
content to the human-visible portions of a page.
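
The idea of reducing a page to its visible text can be sketched with the standard library alone. This example deliberately swaps in `html.parser` for Beautiful Soup and skips the pyppeteer rendering step, so it only illustrates the reduction and is not the package's implementation. The class `VisibleTextParser`, the function `visible_text`, and the set of hidden tags are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not inside non-visible elements."""

    # Assumed set of elements whose text a reader never sees.
    HIDDEN_TAGS = {"script", "style", "head", "title", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._hidden_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text found outside hidden elements.
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the visible text of an HTML document, joined by spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><title>T</title><style>p{}</style></head>"
    "<body><p>Hello</p><script>var x = 1;</script><p>world</p></body></html>"
)
text = visible_text(page)
```

Hashing only the visible text means that changes to scripts or styling alone should not change the resulting hash.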

### JSON ###

JSON content is processed by using the
[`json` library](https://docs.python.org/3/library/json.html) that is part of
the Python standard library. It is read in and then output in a deterministic
manner to adjust for any styling differences between content.
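
One way to picture deterministic output is to parse the JSON and serialize it again with sorted keys and fixed separators, so that formatting differences disappear before hashing. This is a sketch under assumptions: the `json.dumps` options and the SHA-256 digest are illustrative choices, and the helper names `canonical_json` and `digest` are hypothetical rather than taken from the package.

```python
import hashlib
import json


def canonical_json(raw: bytes) -> bytes:
    """Parse JSON and serialize it again in a fixed, repeatable form."""
    parsed = json.loads(raw)
    # Sorted keys and fixed separators remove ordering and whitespace noise.
    return json.dumps(parsed, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(raw: bytes) -> str:
    """Hash the canonical form (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(canonical_json(raw)).hexdigest()


compact = b'{"b":1,"a":[1,2]}'
pretty = b'{\n  "a": [1, 2],\n  "b": 1\n}'
same_hash = digest(compact) == digest(pretty)
```

Both inputs describe the same object, so they produce the same canonical bytes and therefore the same digest.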

## Contributing ##

2 changes: 1 addition & 1 deletion bump_version.sh
@@ -6,7 +6,7 @@ set -o nounset
set -o errexit
set -o pipefail

VERSION_FILE=src/example/_version.py
VERSION_FILE=src/hash_http_content/_version.py

HELP_INFORMATION="bump_version.sh (show|major|minor|patch|prerelease|build|finalize)"

52 changes: 52 additions & 0 deletions get_serverless_chrome_binary.sh
@@ -0,0 +1,52 @@
#!/usr/bin/env bash

set -o nounset
set -o errexit
set -o pipefail

function usage {
echo "Usage:"
echo " ${0##*/} [options]"
echo
echo "Options:"
echo " -h, --help Show the help message."
echo " -l, --latest Pull down the latest release on GitHub."
exit "$1"
}

# Defaults to a specific version for use in GitHub Actions
DOWNLOAD_URL="https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-57/stable-headless-chromium-amazonlinux-2.zip"
LOCAL_FILE="serverless-chrome.zip"
LOCAL_DIR="tests/files/"


# Get the URL of the latest stable release available
function get_latest_stable_url {
releases_url="https://api.github.com/repos/adieuadieu/serverless-chrome/releases"
# Get the URL for the latest release's assets
latest_assets=$(curl -s "$releases_url" | jq -r '.[0].assets_url')
# Download the zip for the stable branch
DOWNLOAD_URL=$(curl -s "$latest_assets" | jq -r '.[] | select(.browser_download_url | contains("stable")) | .browser_download_url')
}

while (( "$#" ))
do
case "$1" in
-h|--help)
usage 0
;;
-l|--latest)
get_latest_stable_url
shift 1
;;
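-*)
usage 1
;;
*)
# Reject positional arguments as well; an unmatched argument is never
# shifted, so without this branch the loop would not terminate.
usage 1
;;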
esac
done

# Follow redirects and output as the specified file name
curl -L --output "$LOCAL_FILE" "$DOWNLOAD_URL"
# Extract the specified file to the specified directory and overwrite without
# prompting
unzip -o "$LOCAL_FILE" -d "$LOCAL_DIR"
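
For readers less familiar with `jq`, the selection performed in `get_latest_stable_url` can be sketched in Python. The function name `stable_download_urls` and the sample asset URLs are hypothetical; only the `browser_download_url` field and the `"stable"` substring test come from the script above.

```python
from typing import Dict, List


def stable_download_urls(assets: List[Dict[str, str]]) -> List[str]:
    """Return the download URLs whose text contains "stable".

    Mirrors the jq filter: select(.browser_download_url | contains("stable")).
    """
    return [
        asset["browser_download_url"]
        for asset in assets
        if "stable" in asset["browser_download_url"]
    ]


# Hypothetical asset list shaped like the relevant part of the API response.
sample_assets = [
    {"browser_download_url": "https://example.invalid/dev-headless-chromium.zip"},
    {"browser_download_url": "https://example.invalid/stable-headless-chromium.zip"},
]
urls = stable_download_urls(sample_assets)
```

The result is a list because the filter can match more than one asset. In the shell version each match would be printed on its own line, so the script appears to assume that exactly one asset in a release matches.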
27 changes: 17 additions & 10 deletions setup.py
@@ -1,5 +1,5 @@
"""
This is the setup module for the example project.
This is the setup module for the hash-http-content project.
Based on:
@@ -42,16 +42,16 @@ def get_version(version_file):


setup(
name="example",
name="hash-http-content",
# Versions should comply with PEP440
version=get_version("src/example/_version.py"),
description="Example python library",
version=get_version("src/hash_http_content/_version.py"),
description="HTTP content hasher",
long_description=readme(),
long_description_content_type="text/markdown",
# NCATS "homepage"
url="https://www.us-cert.gov/resources/ncats",
# The project's main homepage
download_url="https://github.com/cisagov/skeleton-python-library",
download_url="https://github.com/cisagov/hash-http-content",
# Author details
author="Cybersecurity and Infrastructure Security Agency",
author_email="ncats@hq.dhs.gov",
@@ -77,13 +77,20 @@ def get_version(version_file):
],
python_requires=">=3.6",
# What does your project relate to?
keywords="skeleton",
keywords="hash http requests",
packages=find_packages(where="src"),
package_dir={"": "src"},
package_data={"example": ["data/*.txt"]},
py_modules=[splitext(basename(path))[0] for path in glob("src/*.py")],
include_package_data=True,
install_requires=["docopt", "schema", "setuptools >= 24.2.0"],
install_requires=[
"beautifulsoup4",
"docopt",
"lxml",
"pyppeteer",
"requests",
"schema",
"setuptools >= 24.2.0",
],
extras_require={
"test": [
"coverage",
@@ -99,6 +106,6 @@ def get_version(version_file):
"pytest",
]
},
# Conveniently allows one to run the CLI tool as `example`
entry_points={"console_scripts": ["example = example.example:main"]},
# Conveniently allows one to run the CLI tool as `hash-url`
entry_points={"console_scripts": ["hash-url = hash_http_content.cli:main"]},
)
1 change: 0 additions & 1 deletion src/example/data/secret.txt

This file was deleted.

108 changes: 0 additions & 108 deletions src/example/example.py

This file was deleted.

9 changes: 6 additions & 3 deletions src/example/__init__.py → src/hash_http_content/__init__.py
@@ -1,9 +1,12 @@
"""The example library."""
"""The hash-http-content library."""
# Standard Python Libraries
from typing import List

# We disable a Flake8 check for "Module imported but unused (F401)" here because
# although this import is not directly used, it populates the value
# package_name.__version__, which is used to get version information about this
# Python package.
from ._version import __version__ # noqa: F401
from .example import example_div
from .hasher import UrlHasher, UrlResult

__all__ = ["example_div"]
__all__: List[str] = ["UrlHasher", "UrlResult"]
@@ -1,5 +1,5 @@
"""Code to run if this package is used as a Python module."""

from .example import main
from .cli import main

main()
File renamed without changes.
