Updated readme to include example server responses. (#24)
* Updated readme to include example server responses.

* Added redis server and explicit tests.

* Fixed error in redis test.

* Updated redis hostname to redis.

* Added port forwarding to github actions.
tjacovich committed May 2, 2023
1 parent c731dd0 commit 066c2d8
Showing 8 changed files with 74 additions and 15 deletions.
16 changes: 16 additions & 0 deletions .github/workflows/python_actions.yml
@@ -43,6 +43,22 @@ jobs:
--health-timeout 5s
--health-retries 5
redis:
image: redis
env:
REDIS_HOST: localhost
REDIS_PORT: 6379
ports:
- 6379:6379
# Set health checks to wait until redis has started
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
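
The `redis` service added above publishes port 6379 on the runner, so the test suite can reach it at `localhost:6379`. A minimal connectivity check from Python, mirroring the `redis-cli ping` health check (host and port taken from the workflow `env` values above):
```python
import redis

# Connect to the CI redis service forwarded to localhost:6379 (values from the workflow env above).
client = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)

# PING returns True once the service is up, matching the --health-cmd "redis-cli ping" check.
assert client.ping()
```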
38 changes: 27 additions & 11 deletions README.md
@@ -1,8 +1,8 @@
[![Python CI actions](https://github.com/tjacovich/ADSHarvesterPipeline/actions/workflows/python_actions.yml/badge.svg)](https://github.com/tjacovich/ADSHarvesterPipeline/actions/workflows/python_actions.yml) [![Coverage Status](https://coveralls.io/repos/github/tjacovich/SciXHarvesterPipeline/badge.svg)](https://coveralls.io/github/tjacovich/SciXHarvesterPipeline)
![Harvester Pipeline Flowchart](README_assets/Harvester_implementation.png?raw=true "Harvester Pipeline Flowchart")

## Setting Up a Development Environment.
### Installing dependencies and hooks
# Setting Up a Development Environment
## Installing dependencies and hooks
This project uses `pyproject.toml` to install necessary dependencies and otherwise set up a working development environment. To set up a local working environment, simply run the following:
```bash
virtualenv .venv
@@ -20,15 +20,16 @@ export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
```

### Testing with pytest
Tests can be run from the `SciXHarvester` directory using pytest:
## Testing with pytest
Tests can be run from the main directory using `pytest`:
```bash
cd SciXHarvester/
py.test
pytest
```

### Testing Against Kafka
#### The Kafka Environment
The `pytest` command line arguments are already specified in `pyproject.toml`.

## Testing Against Kafka
### The Kafka Environment
In order to set up a full development environment, a kafka instance must be created that contains at least:
- kafka broker
- kafka zookeeper
@@ -46,7 +47,7 @@ For `postgres`, we will need a database to store harvester data. We will also n
We will also need to create Harvester input and output topics which can be done either through python or by using the `kafka-ui`.
The relevant `AVRO` schemas from `SciXHarvester/AVRO_schemas/` must also be added to the schema registry using either python or the UI.
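
A minimal sketch of the Python route using `confluent-kafka`, with the broker address from `config.py`; the topic names, schema file name, and schema-registry URL below are placeholders, not values taken from the repository:
```python
from confluent_kafka.admin import AdminClient, NewTopic
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

# Create the Harvester input/output topics (topic names here are illustrative).
admin = AdminClient({"bootstrap.servers": "kafka:9092"})
futures = admin.create_topics(
    [
        NewTopic("harvester-input", num_partitions=1, replication_factor=1),
        NewTopic("harvester-output", num_partitions=1, replication_factor=1),
    ]
)
for topic, future in futures.items():
    future.result()  # raises if the topic could not be created

# Register an AVRO schema from SciXHarvester/AVRO_schemas/ (file name and registry URL are assumptions).
registry = SchemaRegistryClient({"url": "http://localhost:8081"})
with open("SciXHarvester/AVRO_schemas/example_schema.avsc") as f:
    registry.register_schema("harvester-input-value", Schema(f.read(), schema_type="AVRO"))
```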

#### Launching The Harvester
### Launching The Harvester
The harvester requires `librdkafka` be installed. The source can be found [here](https://github.com/edenhill/librdkafka).
Installation on most *nix systems can be done by running the following:
```bash
@@ -61,7 +62,7 @@ python3 run.py HARVESTER_API
python3 run.py HARVESTER_APP
```

## Sending commands to the gRPC API
# Sending commands to the gRPC API

Currently, there are two methods that have been defined in the API for interacting with the Harvester Pipeline.
- `HARVESTER_INIT`: Initialize a job with given `job_args` passed into the script as a JSON.
@@ -76,7 +77,22 @@ python3 API/harvester_client.py HARVESTER_INIT --task "ARXIV" --task_args '{"har
python3 API/harvester_client.py HARVESTER_MONITOR --job_id '<job_id>'
```
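
These calls can also be scripted. A small sketch that shells out to the client CLI documented above, assuming it is run from the `SciXHarvester` directory; the job arguments are illustrative:
```python
import json
import subprocess

# Start an ARXIV harvest through the documented client CLI and print the server response.
task_args = {"ingest": "True", "ingest_type": "metadata", "daterange": "2023-05-02"}
result = subprocess.run(
    [
        "python3", "API/harvester_client.py", "HARVESTER_INIT",
        "--task", "ARXIV",
        "--task_args", json.dumps(task_args),
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```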


The response will be a `json` decoding of the `AVRO` message returned from the server:
```bash
#HARVESTER_MONITOR
$ python3 SciXHarvester/API/harvester_client.py HARVESTER_MONITOR --job_id 'ad27dd32db6e6985e77f61efaf42d9657c7ef763f54044f955026ff4cccdfe9e'
---
{'hash': 'ad27dd32db6e6985e77f61efaf42d9657c7ef763f54044f955026ff4cccdfe9e', 'id': None, 'task': 'MONITOR', 'status': 'Success', 'task_args': {'ingest': None, 'ingest_type': None, 'daterange': None, 'resumptionToken': None, 'persistence': False}}
```
```bash
#HARVESTER_INIT with persistent connection.
$ python3 SciXHarvester/API/harvester_client.py HARVESTER_INIT --task "ARXIV" --task_args '{"ingest": "True", "ingest_type": "metadata", "daterange":"2023-05-02"}' --persistence
---
{'hash': 'ad27dd32db6e6985e77f61efaf42d9657c7ef763f54044f955026ff4cccdfe9e', 'id': None, 'task': 'ARXIV', 'status': 'Pending', 'task_args': {'ingest': True, 'ingest_type': 'metadata', 'daterange': '2023-05-02', 'resumptionToken': None, 'persistence': None}}
{'hash': 'ad27dd32db6e6985e77f61efaf42d9657c7ef763f54044f955026ff4cccdfe9e', 'id': None, 'task': 'ARXIV', 'status': 'Pending', 'task_args': {'ingest': True, 'ingest_type': 'metadata', 'daterange': '2023-05-02', 'resumptionToken': None, 'persistence': None}}
{'hash': 'ad27dd32db6e6985e77f61efaf42d9657c7ef763f54044f955026ff4cccdfe9e', 'id': None, 'task': 'ARXIV', 'status': 'Processing', 'task_args': {'ingest': True, 'ingest_type': 'metadata', 'daterange': '2023-05-02', 'resumptionToken': None, 'persistence': None}}
{'hash': 'ad27dd32db6e6985e77f61efaf42d9657c7ef763f54044f955026ff4cccdfe9e', 'id': None, 'task': 'ARXIV', 'status': 'Success', 'task_args': {'ingest': True, 'ingest_type': 'metadata', 'daterange': '2023-05-02', 'resumptionToken': None, 'persistence': None}}
```
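
Each streamed line is a Python-style dict (single quotes, as shown above), so a caller waiting for completion can parse the output with `ast.literal_eval`. A sketch, assuming the output format shown in the examples:
```python
import ast

def final_status(output_lines):
    """Return the last reported job status from streamed client output."""
    status = None
    for line in output_lines:
        line = line.strip()
        if not line or line == "---":
            continue  # skip blanks and the '---' separator
        message = ast.literal_eval(line)  # each line is a dict repr, per the examples above
        status = message["status"]
    return status

# For the HARVESTER_INIT output above, final_status(...) would return 'Success'.
```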
## Maintainers

Taylor Jacovich
Binary file modified SciXHarvester/.coverage
2 changes: 1 addition & 1 deletion SciXHarvester/API/harvester_server.py
@@ -43,7 +43,7 @@ def __init__(self, logger):
class Listener(Thread):
def __init__(self):
self.redis = redis.StrictRedis(
config.get("REDIS_HOST", "locahost"),
config.get("REDIS_HOST", "localhost"),
config.get("REDIS_PORT", 6379),
charset="utf-8",
decode_responses=True,
2 changes: 1 addition & 1 deletion SciXHarvester/config.py
@@ -6,7 +6,7 @@
SQLALCHEMY_URL = "postgresql://harvester:harvester@localhost:5432/harvester_pipeline_test"
SQLALCHEMY_ECHO = False
# REDIS Configuration
REDIS_HOST = "redis"
REDIS_HOST = "localhost"
REDIS_PORT = 6379
# Kafka Configuration
KAFKA_BROKER = "kafka:9092"
27 changes: 27 additions & 0 deletions SciXHarvester/tests/API/test_redis.py
@@ -0,0 +1,27 @@
import json
import logging
from unittest import TestCase

import redis

import SciXHarvester.API.harvester_server as hs
from harvester.db import write_status_redis


class TestRedisReadWrite(TestCase):
    def test_redis_read_write(self):
        # Subscribe the server-side listener to redis status updates.
        listener = hs.Listener()
        listener.subscribe()
        job_id = "1234234215"
        status = "Success"
        logger = hs.Logging(logging)
        redis_status = json.dumps({"job_id": job_id, "status": status})
        redis_instance = redis.StrictRedis(
            "localhost",
            6379,
            decode_responses=True,
        )
        # Write the status message, then read it back through the listener
        # and check that the round trip returns the status that was written.
        write_status_redis(redis_instance, redis_status)
        received_status = next(listener.get_status_redis(job_id, logger.logger))
        self.assertEqual(received_status, status)
2 changes: 1 addition & 1 deletion SciXHarvester/tests/harvester/test_harvester.py
@@ -33,7 +33,7 @@ def test_harvester_task(self):

def test_writing_harvester_output(self):
mock_app = Harvester_APP(proj_home="SciXHarvester/tests/stubdata/")
record_id = uuid.UUID("00052bae-8bdd-4dd1-b0d4-d4893189b71c")
record_id = uuid.uuid4()
date = "2023-04-28 17:48:29.354791"
s3_key = "/20230428/7eceaca5-9b62-4e10-a153-a882b209df9f"
checksum = "947e77d2c4b4ec4ffb55a089e92bc538"
2 changes: 1 addition & 1 deletion SciXHarvester/tests/stubdata/config.py
@@ -6,7 +6,7 @@
SQLALCHEMY_URL = "postgresql://harvester:harvester@localhost:5432/harvester_pipeline_test"
SQLALCHEMY_ECHO = False
# REDIS Configuration
REDIS_HOST = "redis"
REDIS_HOST = "localhost"
REDIS_PORT = 6379
# Kafka Configuration
KAFKA_BROKER = "kafka:9092"
