Add sample data for tests #199

Merged
merged 5 commits into from Mar 6, 2015

ewr commented Mar 6, 2015

This is intended to address #37, and to make work on the clean and load functions easier.

Calling `manage.py downloadcalaccessrawdata --use-test-data` runs the clean and load steps using sampled versions of the TSVs, which are included in `example/test-data`.

You can generate new sampled test data (assuming you have already downloaded the full data set) by running `manage.py createsampledrawdata`. Optionally, you can specify the number of lines to include from each file, for example `manage.py createsampledrawdata --sample-rows=3000`. Sampled rows are taken from across the files.

Eventually, a sampling mechanism will probably need to account for relationships inside the data. This version does not do that.
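For illustration, here is a minimal sketch of what per-file row sampling could look like. It is not the code from this pull request: the function name `sample_tsv_lines`, the choice of a random sample, and the assumption that the first line of each TSV is a header are all hypothetical. Like the version described above, it ignores relationships between files.

```python
import random


def sample_tsv_lines(lines, sample_rows, seed=None):
    """Return the header line plus up to `sample_rows` data rows.

    Hypothetical helper: assumes the first line is a header and that
    rows can be sampled independently of any other file.
    """
    lines = list(lines)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    # Small files are kept whole.
    if len(rows) <= sample_rows:
        return [header] + rows
    rng = random.Random(seed)
    # Pick row positions from across the whole file, then sort them so
    # the sample keeps the original row order.
    picked = sorted(rng.sample(range(len(rows)), sample_rows))
    return [header] + [rows[i] for i in picked]
```

A caller would read each file in the raw TSV directory, pass its lines through a function like this, and write the result to the test-data directory.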

ewr added some commits Mar 5, 2015

First pass at management command to create sampled raw data
To create sampled data, you need to have the raw data file already
downloaded and unzipped (with the data files sitting in `data/tsv`).

Run `manage.py createsampledrawdata` to create sampled TSV data files in
`data/sampled`.

FIXME: There's currently no way to use these files once they are generated.

Use test-data dir for sampled data. Add `--use-test-data` to download
Sampled data now lives in `example/test-data/` instead of just a
subdirectory on `example/data` (or wherever your data dir is).

To clean and load using this sampled data, run:

    manage.py downloadcalaccessrawdata --use-test-data

This automatically skips the download, unzip, prep and clear steps.
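The step-skipping behaviour can be pictured with a small sketch. This is an assumption-laden illustration, not the command's actual implementation: the step names and their order are inferred from the descriptions in this pull request, `plan_steps` and `build_parser` are hypothetical helpers, and `argparse` stands in for Django's own option handling so the example runs on its own.

```python
import argparse

# Assumed pipeline order, inferred from the steps named in this PR.
ALL_STEPS = ["download", "unzip", "prep", "clear", "clean", "load"]
# Steps the PR says are skipped when sampled test data is used.
SKIPPED_WITH_TEST_DATA = {"download", "unzip", "prep", "clear"}


def plan_steps(use_test_data=False):
    """Return the list of steps to run, in order."""
    if use_test_data:
        return [s for s in ALL_STEPS if s not in SKIPPED_WITH_TEST_DATA]
    return list(ALL_STEPS)


def build_parser():
    """Build a stand-alone parser exposing the --use-test-data flag."""
    parser = argparse.ArgumentParser(prog="downloadcalaccessrawdata")
    parser.add_argument(
        "--use-test-data",
        action="store_true",
        help="Clean and load the sampled TSVs instead of downloading.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(plan_steps(args.use_test_data))
```

Keeping the step selection in one pure function makes it easy to unit test without touching the network or the database.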

Add sampled data in `example/test-data`
This adds 7.2MB of TSV files, sampled from the original Cal Access data. It
is intended to address #37, which asks for sample data for unit tests.

palewire added a commit that referenced this pull request Mar 6, 2015

palewire merged commit 2ac26e4 into california-civic-data-coalition:master Mar 6, 2015

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed

palewire commented Mar 6, 2015

Your incentive to return to #NICAR15: There's a t-shirt with your name on it. Thanks!
