[bb tests][xl]: testing entire flow by running the pipelines (#59)
* Fixtures for simple, excel, multi-resource and processing datasets
* inputs and expected results
* new setup for requirements, tox and Travis
* using moto_server to allow S3 operations
fixes #48
zelima committed Oct 26, 2017
1 parent dc4192c commit 5ce6f46
Showing 27 changed files with 882 additions and 3 deletions.
9 changes: 9 additions & 0 deletions .travis.yml
@@ -1,6 +1,9 @@
sudo:
required

services:
- elasticsearch

dist:
trusty

@@ -21,5 +24,11 @@ install:
script:
- make test

before_script:
- moto_server &
- sleep 30
- curl localhost:9200
- curl localhost:5000

after_success:
- coveralls
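The `before_script` step starts `moto_server` in the background, waits a fixed 30 seconds, and then curls Elasticsearch on port 9200 and moto_server on port 5000. A fixed sleep is simple, but it either wastes time or can still be too short on a slow worker. As a hedged alternative that is not part of this commit, a small polling helper could wait until each port accepts TCP connections; `wait_for_port` is a hypothetical name, and the ports are simply the ones the curl checks use.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or timeout expires.

    Hypothetical helper: returns True once the port accepts a connection,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port('localhost', 5000, timeout=30)` would return as soon as moto_server is listening rather than always sleeping the full 30 seconds.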
4 changes: 3 additions & 1 deletion datapackage_pipelines_assembler/generator.py
@@ -56,7 +56,9 @@ def s3_path(*parts):
     else:
         path = '/'.join(str(p) for p in parts)
     bucket = os.environ['PKGSTORE_BUCKET']
-    return 'https://{}/{}'.format(bucket, path)
+    # Handle other s3 compatible server as well (for testing)
+    protocol = os.environ.get('S3_ENDPOINT_URL') or 'https://'
+    return '{}{}/{}'.format(protocol, bucket, path)


 class Generator(GeneratorBase):
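The new `return` in `s3_path` concatenates the endpoint (or `'https://'`) directly with the bucket name. The sketch below isolates that expression in a hypothetical standalone function, `s3_url`, so its behaviour can be checked without the rest of the generator; the bucket and endpoint values used with it are illustrative, not taken from the repository.

```python
import os


def s3_url(bucket, path, endpoint=None):
    """Hypothetical standalone copy of the URL expression in s3_path().

    Mirrors '{}{}/{}'.format(protocol, bucket, path), where protocol falls
    back to 'https://' when no endpoint is given (or it is empty).
    """
    protocol = endpoint or 'https://'
    return '{}{}/{}'.format(protocol, bucket, path)


def s3_url_from_env(path):
    # Reads the same environment variables the committed code uses.
    return s3_url(os.environ['PKGSTORE_BUCKET'], path,
                  os.environ.get('S3_ENDPOINT_URL'))
```

Because nothing inserts a separator, `S3_ENDPOINT_URL` has to end with a slash (for example `http://localhost:5000/`); without it, an endpoint of `http://localhost:5000` and a bucket named `pkgstore` would produce `http://localhost:5000pkgstore/...`.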
5 changes: 4 additions & 1 deletion setup.py
@@ -20,7 +20,7 @@ def read(*paths):
 INSTALL_REQUIRES = [
     'datapackage-pipelines',
     'datapackage-pipelines-elasticsearch>=0.0.3',
-    'datapackage-pipelines-aws>=0.0.8',
+    'datapackage-pipelines-aws>=0.0.9',
     'psycopg2',
     'tweepy',
     'facebook-sdk',
@@ -29,6 +29,9 @@ def read(*paths):
 TESTS_REQUIRE = [
     'pylama',
     'tox',
+    'moto',
+    'boto3',
+    'google-compute-engine'
 ]
 README = read('README.md')
 VERSION = read(PACKAGE, 'VERSION')
21 changes: 21 additions & 0 deletions tests/data/sample_birthdays.csv
@@ -0,0 +1,21 @@
date,first_name,last_name
2016-10-15,Shaylynn,Eallis
2017-01-18,Patricia,Eefting
2017-03-01,Karrah,Couser
2017-03-17,Rhetta,Price
2016-12-23,Alexandros,Farrand
2017-05-01,Ado,Matejic
2016-10-28,Keene,Tonna
2017-01-31,Helena,Aiskovitch
2017-02-11,Leigh,Butner
2017-01-19,Perle,Work
2016-11-16,Delora,Pavolillo
2017-09-21,Marshall,Leall
2017-04-28,Olwen,Mullin
2016-12-27,Nerta,Enrique
2016-12-07,Ashlie,Bracey
2017-05-18,Dode,Ritmeier
2016-10-16,Agace,Kew
2017-04-08,Beckie,Dove
2017-01-20,Filippa,McPolin
2017-03-19,Madison,Sheekey
Binary file added tests/data/sample_birthdays.xlsx
24 changes: 24 additions & 0 deletions tests/data/sample_birthdays_invalid.csv
@@ -0,0 +1,24 @@
First three rows need to be removed
headers need to be reset
and dates need to be normalized
DATE,FIRST NAME ,LAST NAME
2016-10-15,Shaylynn,Eallis
2017-1-18,Patricia,Eefting
2017-3-1,Karrah,Couser
2017-3-17,Rhetta,Price
2016-12-23,Alexandros,Farrand
2017-5-1,Ado,Matejic
2016-10-28,Keene,Tonna
2017-1-31,Helena,Aiskovitch
2017-2-11,Leigh,Butner
2017-1-19,Perle,Work
2016-11-16,Delora,Pavolillo
2017-9-21,Marshall,Leall
2017-4-28,Olwen,Mullin
2016-12-27,Nerta,Enrique
2016-12-7,Ashlie,Bracey
2017-5-18,Dode,Ritmeier
2016-10-16,Agace,Kew
2017-4-8,Beckie,Dove
2017-1-20,Filippa,McPolin
2017-3-19,Madison,Sheekey
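The first lines of this fixture spell out the clean-up it needs, and the needs_processing source spec in this commit pairs it with `skip_rows: 4` (presumably the three note lines plus the original `DATE,FIRST NAME ,LAST NAME` header) and the replacement headers `date`, `first_name`, `last_name`. The sketch below reproduces that effect in plain Python so the intended result is easy to see; it is not how tabulator or the assembler implement it, and `clean_birthdays` is a hypothetical name.

```python
import csv
import io
from datetime import datetime

# Replacement headers, as listed under `headers` in the needs_processing spec.
HEADERS = ['date', 'first_name', 'last_name']
# Assumed meaning of skip_rows: 4 -> three note lines plus the original header.
SKIP_ROWS = 4


def clean_birthdays(text):
    """Skip the preamble, reset the headers and zero-pad the dates."""
    rows = list(csv.reader(io.StringIO(text)))[SKIP_ROWS:]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(HEADERS)
    for row in rows:
        if not row:  # tolerate blank lines
            continue
        date_str, first_name, last_name = row
        # strptime accepts non-padded months/days such as 2017-1-8;
        # isoformat() writes them back zero-padded (2017-01-08).
        day = datetime.strptime(date_str, '%Y-%m-%d').date()
        writer.writerow([day.isoformat(), first_name, last_name])
    return out.getvalue()
```

Applied to the whole file, the rows this produces match those in `tests/data/sample_birthdays.csv`.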
21 changes: 21 additions & 0 deletions tests/data/sample_emails.csv
@@ -0,0 +1,21 @@
id,email
1,cjozsika0@github.io
2,smegainey1@twitter.com
3,eyesson2@mail.ru
4,aigoe3@usa.gov
5,tkalinowsky4@tamu.edu
6,eprime5@paypal.com
7,fwinchcum6@drupal.org
8,rrivilis7@nationalgeographic.com
9,mbrisley8@creativecommons.org
10,dmacavddy9@stumbleupon.com
11,lbromwicha@hostgator.com
12,kvargab@fotki.com
13,hlintsc@ning.com
14,hravenscraftd@nhs.uk
15,dtrencharde@chicagotribune.com
16,zkurtisf@ucla.edu
17,aindeg@wordpress.com
18,nadaneth@eepurl.com
19,kwerneri@msn.com
20,nmeardonj@springer.com
16 changes: 16 additions & 0 deletions tests/inputs/excel/assembler.source-spec.yaml
@@ -0,0 +1,16 @@
meta:
dataset: excel
findability: published
owner: datahub
ownerid: datahub
version: 1
inputs:
- kind: datapackage
parameters:
resource-mapping:
birthdays: ../../data/sample_birthdays.xlsx
url: datapackage.json
outputs:
- kind: zip
parameters:
out-file: 'excel.zip'
27 changes: 27 additions & 0 deletions tests/inputs/excel/datapackage.json
@@ -0,0 +1,27 @@
{
  "name": "excel",
  "resources": [
    {
      "name": "birthdays",
      "path": "data/birthdays.xlsx",
      "format": "xlsx",
      "schema": {
        "fields": [
          {
            "name": "date",
            "type": "date"
          },
          {
            "name": "first_name",
            "type": "string"
          },
          {
            "name": "last_name",
            "type": "string"
          }
        ],
        "primaryKey": "date"
      }
    }
  ]
}
17 changes: 17 additions & 0 deletions tests/inputs/multiple_files/assembler.source-spec.yaml
@@ -0,0 +1,17 @@
meta:
dataset: multiple-files
findability: published
owner: datahub
ownerid: datahub
version: 1
inputs:
- kind: datapackage
parameters:
resource-mapping:
birthdays: ../../data/sample_birthdays.csv
emails: ../../data/sample_emails.csv
url: datapackage.json
outputs:
- kind: zip
parameters:
out-file: 'multiple-files.zip'
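Each `resource-mapping` entry pairs a resource name from the neighbouring datapackage.json (`birthdays`, `emails`) with a relative path to one of the shared fixtures in `tests/data`. Assuming those paths are resolved against the directory that holds the spec file (an assumption; the assembler's own resolution logic is not part of this diff), the targets work out as follows. `resolve_mapping` is a hypothetical helper.

```python
import posixpath

# Assumption: mapping paths are relative to the spec file's directory.
SPEC_DIR = 'tests/inputs/multiple_files'

# Copied from the multiple_files assembler.source-spec.yaml.
RESOURCE_MAPPING = {
    'birthdays': '../../data/sample_birthdays.csv',
    'emails': '../../data/sample_emails.csv',
}


def resolve_mapping(spec_dir, mapping):
    """Return {resource_name: normalized path} for each mapped resource."""
    return {
        name: posixpath.normpath(posixpath.join(spec_dir, rel_path))
        for name, rel_path in mapping.items()
    }
```

Under that assumption, `resolve_mapping(SPEC_DIR, RESOURCE_MAPPING)` maps `birthdays` to `tests/data/sample_birthdays.csv` and `emails` to `tests/data/sample_emails.csv`.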
44 changes: 44 additions & 0 deletions tests/inputs/multiple_files/datapackage.json
@@ -0,0 +1,44 @@
{
  "name": "multiple-files",
  "resources": [
    {
      "name": "birthdays",
      "path": "data/birthdays.csv",
      "format": "csv",
      "schema": {
        "fields": [
          {
            "name": "date",
            "type": "date"
          },
          {
            "name": "first_name",
            "type": "string"
          },
          {
            "name": "last_name",
            "type": "string"
          }
        ],
        "primaryKey": "date"
      }
    },
    {
      "name": "emails",
      "path": "data/emails.csv",
      "format": "csv",
      "schema": {
        "fields": [
          {
            "name": "id",
            "type": "number"
          },
          {
            "name": "email",
            "type": "string"
          }
        ]
      }
    }
  ]
}
26 changes: 26 additions & 0 deletions tests/inputs/needs_processing/assembler.source-spec.yaml
@@ -0,0 +1,26 @@
meta:
dataset: single-file-processed
findability: published
owner: datahub
ownerid: datahub
version: 1
inputs:
- kind: datapackage
parameters:
resource-mapping:
birthdays: ../../data/sample_birthdays_invalid.csv
url: datapackage.json
processing:
-
input: birthdays
tabulator:
skip_rows: 4
headers:
- date
- first_name
- last_name
output: birthdays
outputs:
- kind: zip
parameters:
out-file: 'single-file-processed.zip'
27 changes: 27 additions & 0 deletions tests/inputs/needs_processing/datapackage.json
@@ -0,0 +1,27 @@
{
  "name": "single-file-processed",
  "resources": [
    {
      "name": "birthdays",
      "path": "data/birthdays.csv",
      "format": "csv",
      "schema": {
        "fields": [
          {
            "name": "date",
            "type": "date"
          },
          {
            "name": "first_name",
            "type": "string"
          },
          {
            "name": "last_name",
            "type": "string"
          }
        ],
        "primaryKey": "date"
      }
    }
  ]
}
16 changes: 16 additions & 0 deletions tests/inputs/single_file/assembler.source-spec.yaml
@@ -0,0 +1,16 @@
meta:
dataset: single-file
findability: published
owner: datahub
ownerid: datahub
version: 1
inputs:
- kind: datapackage
parameters:
resource-mapping:
birthdays: ../../data/sample_birthdays.csv
url: datapackage.json
outputs:
- kind: zip
parameters:
out-file: 'single-file.zip'
27 changes: 27 additions & 0 deletions tests/inputs/single_file/datapackage.json
@@ -0,0 +1,27 @@
{
  "name": "single-file",
  "resources": [
    {
      "name": "birthdays",
      "path": "data/birthdays.csv",
      "format": "csv",
      "schema": {
        "fields": [
          {
            "name": "date",
            "type": "date"
          },
          {
            "name": "first_name",
            "type": "string"
          },
          {
            "name": "last_name",
            "type": "string"
          }
        ],
        "primaryKey": "date"
      }
    }
  ]
}
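This schema declares two checkable constraints: `date` must hold a date, and because it is the `primaryKey`, its values must be unique across rows. The plain-Python sketch below checks exactly those two things against CSV text. It is a hypothetical stand-in (`check_rows` and `is_iso_date` are illustrative names), not the Table Schema validation that datapackage-pipelines performs, and it only accepts zero-padded ISO `YYYY-MM-DD` strings as dates.

```python
import csv
import io
import re
from datetime import date

# Fields and primary key as declared in the single-file datapackage.json.
SCHEMA = {
    'fields': [
        {'name': 'date', 'type': 'date'},
        {'name': 'first_name', 'type': 'string'},
        {'name': 'last_name', 'type': 'string'},
    ],
    'primaryKey': 'date',
}

ISO_DATE = re.compile(r'(\d{4})-(\d{2})-(\d{2})')


def is_iso_date(value):
    """True for a real calendar date written as zero-padded YYYY-MM-DD."""
    match = ISO_DATE.fullmatch(value)
    if not match:
        return False
    try:
        date(*(int(part) for part in match.groups()))
    except ValueError:  # e.g. 2017-02-30
        return False
    return True


def check_rows(text, schema):
    """Return a list of error strings; an empty list means the CSV conforms."""
    reader = csv.DictReader(io.StringIO(text))
    expected = [field['name'] for field in schema['fields']]
    if reader.fieldnames != expected:
        return ['header mismatch: {}'.format(reader.fieldnames)]
    # Table Schema allows primaryKey as a single name or a list of names.
    pk = schema.get('primaryKey') or []
    pk_fields = [pk] if isinstance(pk, str) else list(pk)
    errors, seen = [], set()
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        for field in schema['fields']:
            value = row[field['name']]
            if field['type'] == 'date' and not is_iso_date(value):
                errors.append('line {}: bad date {!r}'.format(lineno, value))
        if pk_fields:
            key = tuple(row[name] for name in pk_fields)
            if key in seen:
                errors.append('line {}: duplicate key {!r}'.format(lineno, key))
            seen.add(key)
    return errors
```

The 20 dates in `tests/data/sample_birthdays.csv` are all distinct and zero-padded, so that fixture satisfies both checks, while the raw `sample_birthdays_invalid.csv` rows would fail the date check until normalized.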
21 changes: 21 additions & 0 deletions tests/outputs/csv/sample_birthdays.csv
@@ -0,0 +1,21 @@
date,first_name,last_name
2016-10-15,Shaylynn,Eallis
2017-01-18,Patricia,Eefting
2017-03-01,Karrah,Couser
2017-03-17,Rhetta,Price
2016-12-23,Alexandros,Farrand
2017-05-01,Ado,Matejic
2016-10-28,Keene,Tonna
2017-01-31,Helena,Aiskovitch
2017-02-11,Leigh,Butner
2017-01-19,Perle,Work
2016-11-16,Delora,Pavolillo
2017-09-21,Marshall,Leall
2017-04-28,Olwen,Mullin
2016-12-27,Nerta,Enrique
2016-12-07,Ashlie,Bracey
2017-05-18,Dode,Ritmeier
2016-10-16,Agace,Kew
2017-04-08,Beckie,Dove
2017-01-20,Filippa,McPolin
2017-03-19,Madison,Sheekey
24 changes: 24 additions & 0 deletions tests/outputs/csv/sample_birthdays_invalid.csv
@@ -0,0 +1,24 @@
First three rows need to be removed
headers need to be reset
and dates need to be normalized
DATE,FIRST NAME ,LAST NAME
2016-10-15,Shaylynn,Eallis
2017-1-18,Patricia,Eefting
2017-3-1,Karrah,Couser
2017-3-17,Rhetta,Price
2016-12-23,Alexandros,Farrand
2017-5-1,Ado,Matejic
2016-10-28,Keene,Tonna
2017-1-31,Helena,Aiskovitch
2017-2-11,Leigh,Butner
2017-1-19,Perle,Work
2016-11-16,Delora,Pavolillo
2017-9-21,Marshall,Leall
2017-4-28,Olwen,Mullin
2016-12-27,Nerta,Enrique
2016-12-7,Ashlie,Bracey
2017-5-18,Dode,Ritmeier
2016-10-16,Agace,Kew
2017-4-8,Beckie,Dove
2017-1-20,Filippa,McPolin
2017-3-19,Madison,Sheekey
21 changes: 21 additions & 0 deletions tests/outputs/csv/sample_emails.csv
@@ -0,0 +1,21 @@
id,email
1,cjozsika0@github.io
2,smegainey1@twitter.com
3,eyesson2@mail.ru
4,aigoe3@usa.gov
5,tkalinowsky4@tamu.edu
6,eprime5@paypal.com
7,fwinchcum6@drupal.org
8,rrivilis7@nationalgeographic.com
9,mbrisley8@creativecommons.org
10,dmacavddy9@stumbleupon.com
11,lbromwicha@hostgator.com
12,kvargab@fotki.com
13,hlintsc@ning.com
14,hravenscraftd@nhs.uk
15,dtrencharde@chicagotribune.com
16,zkurtisf@ucla.edu
17,aindeg@wordpress.com
18,nadaneth@eepurl.com
19,kwerneri@msn.com
20,nmeardonj@springer.com
Binary file added tests/outputs/excel/sample_birthdays.xlsx
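The expected files under `tests/outputs` are presumably compared against what the pipelines produce; the test module that does so is not among the files shown here. A hedged sketch of a row-by-row CSV comparison, under the hypothetical name `csv_diff`, follows. It does not cover the `.xlsx` expectation, which would need a spreadsheet reader.

```python
import csv


def csv_diff(actual_lines, expected_lines):
    """Compare two CSV sources row by row.

    Accepts any iterables of text lines (e.g. open file objects or lists of
    strings). Returns (row_number, actual_row, expected_row) for each
    mismatch; a row present on only one side is reported as None on the other.
    """
    actual = list(csv.reader(actual_lines))
    expected = list(csv.reader(expected_lines))
    mismatches = []
    for i in range(max(len(actual), len(expected))):
        a = actual[i] if i < len(actual) else None
        e = expected[i] if i < len(expected) else None
        if a != e:
            mismatches.append((i + 1, a, e))
    return mismatches
```

In use, one might pass open file handles (opened with `newline=''`) for a pipeline's output and for, say, `tests/outputs/csv/sample_emails.csv`; an empty result means the two files hold identical rows.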
