Continued working on the tutorial.
Alex Meadows committed Jul 29, 2014
1 parent 0539011 commit c001de7
Showing 3 changed files with 65 additions and 2 deletions.
1 change: 0 additions & 1 deletion .travis.yml
@@ -9,7 +9,6 @@ script:
- "coverage run --source=etltest setup.py test"
- 'tox'
before_script:
- mysql -e 'create database etlUnitTest;'
- mysql -e 'source scripts/etlUnitTest_build.sql'
- wget http://sourceforge.net/projects/pentaho/files/Data%20Integration/5.0.1-stable/pdi-ce-5.0.1.A-stable.zip/download
- unzip download
63 changes: 62 additions & 1 deletion docs/source/tutorial/creating_sample_data_set.rst
@@ -1,2 +1,63 @@
Creating A Sample Data Set
==========================

Now that we have written our three tests, it's time to create a data set so that we can accurately test them.
Remember, we have three tests that will require data:

* Does first name get lower cased?
* Does an upper case first name not return as upper case in the target table?
* Does the birthday field get impacted by the data integration code?

First, let's create a new folder in our data directory (the default is ``${ETL_TEST_ROOT}/Documents/etlTest/data``)::

    cd ${ETL_TEST_ROOT}/Documents/etlTest/data
    mkdir etlUnitTest

We named the directory ``etlUnitTest`` because that is the source the data set we're about to create belongs to.
Since the ``users`` table is the source for our data integration, we should create a new YAML file called ``users.yml``::

    touch etlUnitTest/users.yml
    vi etlUnitTest/users.yml

.. include:: yaml_details_stub.rst

Now let's actually build our data set. Remember, we need a data set that will meet the requirements for our tests.
For our first record, let's include a standard, run-of-the-mill ``users`` table record::

    1:
      # Generic record from the users table.
      user_id: 1
      first_name: Bob
      last_name: Richards
      birthday: 2000-01-04
      zipcode: 55555

Notice that the record is uniquely identified by ``1`` and that all the fields for record one are indented two spaces
to indicate that they belong together. To give a field a value, we write the column name, a colon, a space, and then
the value we need, i.e. ``column_name: column_value``.
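
The same rules, annotated on the first record as a sketch. The quoted ``zipcode`` at the end is an illustrative
assumption, not part of the tutorial's data set: a YAML parser may read an unquoted value such as ``01234`` as a
number, which loses the leading zero, so quoting is the usual way to keep such a value as text. Whether etlTest
needs this is not stated here::

    1:                   # unique record identifier, not indented
      user_id: 1         # fields are indented two spaces under the identifier
      first_name: Bob    # column_name: column_value
      zipcode: "01234"   # hypothetical value, quoted so it stays a string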

The record we just created will work fine for our first test case, but what do we do for the next one? We could copy
the record and change the ``first_name`` field to ``BOB``, but that risks test collisions as our test
suites and data sets grow. Let's build a new record specific to this test::

    1:
      # Generic record from the users table.
      user_id: 1
      first_name: Bob
      last_name: Richards
      birthday: 2000-01-04
      zipcode: 55555
    2:
      # Record for first_name all upper case.
      user_id: 2
      first_name: SARAH
      last_name: Jenkins
      birthday: 2000-02-02
      zipcode: 12345

We indicate a new record in the YAML file by starting the line after record one's ``zipcode`` field with no
indentation and giving the record another unique identifier (this time ``2``). We use the same column names as before,
but we now have a record that has an entirely upper-cased ``first_name`` field.

For the third test case, we could create a new record or we can utilize one of the existing records to test if the
birthday field is manipulated. For the birthday test, we will use record one. Now we can work on building our tests.
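
If a dedicated record were preferred for the birthday test instead of reusing record one, it could look like the
sketch below. The identifier ``3`` and every value in it are hypothetical and are not part of the tutorial's data set::

    3:
      # Hypothetical record dedicated to the birthday test.
      user_id: 3
      first_name: Maria
      last_name: Lopez
      birthday: 1999-12-31
      zipcode: 54321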
3 changes: 3 additions & 0 deletions docs/source/tutorial/yaml_details_stub.rst
@@ -0,0 +1,3 @@
YAML (which stands for YAML Ain't Markup Language) was designed to provide some of the same capabilities as XML
without the verbosity. To find out more about YAML, head over to `The Official YAML Website <http://www.yaml.org/>`_.
