In [1]:
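%run ./bonus_assignment.ipynb

# Explicit import so this cell does not rely on names leaked into the namespace by %run
from IPython.core.magic import register_cell_magic, register_line_magic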

@register_cell_magic
def fakedata_testing(line, cell):
    # wrapping everything inside a giant try-except block
    try:
        fakedata(line,cell)
        print("No exceptions encountered while generating DataFrames 🐼🎉🎈🎊")
    except Exception as e:
        template = "🚫🚫🚫🚫🚫 An exception of type {0} occurred. Arguments:\n{1!r}"
        message = template.format(type(e).__name__, e.args)
        print(message)

@register_line_magic
def test(line):
    try:
        exec(line)
        print("PASSED")
    except Exception as e:
        print("FAILED {}".format(e))

## How to test
* Use `%%fakedata_testing` instead of `%%fakedata` for the cell magic.
* Use the `%test` line magic for simple assertion tests.
* Every time you change `bonus_assignment.ipynb`, save that file and then choose "Restart & Run All" in this notebook.
* Exceptions are caught and printed rather than raised, so every test runs even if an earlier one fails.
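
The effect of catching exceptions can be sketched outside the notebook in plain Python. The helper `run_check` and the `namespace` dictionary are hypothetical names used only for this illustration; the notebook itself uses the `%test` magic defined above.

```python
def run_check(source, namespace):
    """Execute one statement and report the outcome instead of raising."""
    try:
        exec(source, namespace)
        return "PASSED"
    except Exception as e:  # AssertionError is a subclass of Exception
        return "FAILED {}".format(e)


# Hypothetical data standing in for a generated DataFrame.
namespace = {"rows": list(range(99))}

results = [
    run_check("assert len(rows) == 99, 'rows should have 99 entries'", namespace),
    run_check("assert len(rows) == 100, 'rows should have 100 entries'", namespace),
    run_check("assert rows[0] == 0, 'first row should be 0'", namespace),
]
print(results)
```

The second check fails, but the third one still runs, which is the behaviour the last bullet describes.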

Sample for defining the domain-specific language:
```
%%fakedata_testing
persons
-------
first_name
last_name*
phone_number
random_number(5) as customer_number [1]*

purchases
---------
isbn10
credit_card_full
random_number(3) as price
random_number(5) as customer_number [1]
```

Sample for testing the generated DataFrame:
```
%test assert len(persons) == 99,'"persons" should have 99 rows'
```

## An attempt at a more formal grammar

```
function_to_call  ::= <wordcharacters>
parameters        ::= "" | "(" ( <wordcharacters> | <number> ) ")"
as_name           ::= "" | "as" <whitespace> <wordcharacters>
column_name       ::= as_name | function_to_call
reference         ::= "" | "[" <number> "]"
unique_mark       ::= "" | "*"
column_definition ::= <function_to_call> <parameters> <whitespace> \
                      <as_name> <whitespace> <reference> <unique_mark>
df_sep            ::= "--" ("-"*)
df_definition     ::= <wordcharacters> <newline> <df_sep> <newline> \
                      (<column_definition>*) <newline> <newline>
language_spec     ::= <df_definition>*
```
* *Note 1: The `<whitespace>` between `<parameters>` and `<as_name>` should likely be part of `<as_name>`.*
* *Note 2: The `<whitespace>` after `<as_name>` should probably be optional.*
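
As a rough illustration of the `column_definition` rule, the sketch below parses a single column line with a regular expression. It is not the parser in `bonus_assignment.ipynb`; the names `COLUMN_RE` and `parse_column` are hypothetical, and all whitespace is treated as optional, in the spirit of the two notes above.

```python
import re

# One column line: function, optional "(parameter)", optional "as alias",
# optional "[reference]", optional "*" unique mark.
COLUMN_RE = re.compile(
    r"""^\s*
    (?P<function>\w+)               # function_to_call
    (?:\((?P<parameter>\w+)\))?     # parameters
    (?:\s+as\s+(?P<alias>\w+))?     # as_name
    \s*
    (?:\[(?P<reference>\d+)\])?     # reference
    \s*
    (?P<unique>\*)?                 # unique_mark
    \s*$""",
    re.VERBOSE,
)


def parse_column(line):
    """Return the parts of one column definition as a dict."""
    match = COLUMN_RE.match(line)
    if match is None:
        raise ValueError("not a valid column definition: {!r}".format(line))
    parts = match.groupdict()
    return {
        # column_name ::= as_name | function_to_call
        "name": parts["alias"] or parts["function"],
        "function": parts["function"],
        "parameter": parts["parameter"],
        "reference": int(parts["reference"]) if parts["reference"] else None,
        "unique": parts["unique"] is not None,
    }


print(parse_column("random_number(5) as customer_number [1]*"))
print(parse_column("last_name *"))
```

A verbose regular expression keeps each grammar rule on its own commented line, which makes it easy to compare against the grammar above.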

In [11]:
%%fakedata_testing
persons
-------
first_name 
last_name *
phone_number
random_number(5) as customer_number [1]*

purchases
---------
isbn10
credit_card_full
random_number(3) as price
random_number(5) as customer_number [1]



No exceptions encountered while generating DataFrames 🐼🎉🎈🎊


In [16]:
%test assert len(persons) == 99,'"persons" should have 99 rows'
%test assert len(purchases) == 99,'"purchases" should have 99 rows'
%test assert len(persons.last_name.value_counts()) == 99,'"persons.last_name" should have 99 unique values'
%test assert len(persons.customer_number.value_counts()) == 99,'"persons.customer_number" should have 99 unique values'
%test assert len(purchases.customer_number.value_counts()) < 99,'"purchases.customer_number" should have fewer than 99 unique values'
%test assert purchases.customer_number.isin(persons.customer_number).all(), '"purchases.customer_number" should be a subset of "persons.customer_number"'
%test assert (purchases.columns == ['isbn10', 'credit_card_full', 'price', 'customer_number']).all(), '''"purchases.columns" should be ['isbn10', 'credit_card_full', 'price', 'customer_number']'''

0       Pollard
1     Blackburn
2           Ray
3        Glover
4        Greene
        ...    
94       Cooper
95     Alvarado
96       Cannon
97    Zimmerman
98       Melton
Name: first_name, Length: 99, dtype: object
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
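
The tests above check two properties of a reference such as `[1]`: the column marked with `*` holds 99 distinct values, and the column that refers to it only contains values drawn from it. One way to obtain both properties, sketched here with the standard library only, is to draw the unique column without replacement and the referencing column with replacement. This is an assumption for illustration, not necessarily how `bonus_assignment.ipynb` generates its data; the value range and the variable names are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_ROWS = 99

# Unique column: draw distinct values (the number range is an illustrative assumption).
person_numbers = rng.sample(range(10_000, 100_000), N_ROWS)

# Referencing column: sample with replacement from the unique column,
# so every value is guaranteed to exist in the referenced column.
purchase_numbers = [rng.choice(person_numbers) for _ in range(N_ROWS)]

print(len(set(person_numbers)))                     # number of distinct values
print(set(purchase_numbers) <= set(person_numbers)) # subset check
print(len(set(purchase_numbers)))                   # repeats make this smaller in practice
```

Sampling with replacement makes repeated values very likely, which is why the test expects fewer than 99 distinct customer numbers in `purchases`.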


---
* test some other faker providers

In [4]:
%%fakedata_testing
companies
-------
company *
catch_phrase
country
date_time_this_century as established



No exceptions encountered while generating DataFrames 🐼🎉🎈🎊


In [5]:
%test assert len(companies) == 99,'"companies" should have 99 rows'
%test assert len(companies.company.value_counts()) == 99,'"companies.company" should have 99 unique values'
%test assert (companies.columns == ['company', 'catch_phrase', 'country', 'established']).all(), '''"companies.columns" should be ['company', 'catch_phrase', 'country', 'established']'''

PASSED
PASSED
PASSED


---
* test for multiple references

In [6]:
%%fakedata_testing
familypersons
-------
first_name
last_name [2]*
random_number(5) as person_number [1]*

families
-----
last_name [2]
address



No exceptions encountered while generating DataFrames 🐼🎉🎈🎊


In [7]:
%test assert len(families) == 99,'"families" should have 99 rows'
%test assert len(families.last_name.value_counts()) < 99,'"families.last_name" should have fewer than 99 unique values'
%test assert families.last_name.isin(familypersons.last_name).all(), '"families.last_name" should be a subset of "familypersons.last_name"'

PASSED
PASSED
PASSED


---
* test for the creation of an empty DataFrame

In [8]:
%%fakedata_testing
empty_test_df
---



No exceptions encountered while generating DataFrames 🐼🎉🎈🎊


In [9]:
%test assert isinstance(empty_test_df, pd.DataFrame),'"empty_test_df" should be a DataFrame'
%test assert len(empty_test_df.columns) == 0,'"empty_test_df" should have 0 columns'

PASSED
PASSED
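
The empty-table case follows naturally from the `df_definition` rule: a name and a separator with zero column lines. The sketch below splits a cell body into table definitions using only the standard library. The function name `split_definitions` is hypothetical, and the sketch makes no claim about how `fakedata` itself is implemented.

```python
import re


def split_definitions(cell):
    """Split a cell body into {table_name: [column lines]}."""
    tables = {}
    # Blank lines separate table definitions.
    for block in re.split(r"\n\s*\n", cell.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # df_sep ::= "--" ("-"*), i.e. two or more dashes on the second line.
        if len(lines) < 2 or not re.fullmatch(r"-{2,}", lines[1]):
            raise ValueError("expected a name and a dashed separator: {!r}".format(block))
        tables[lines[0]] = lines[2:]
    return tables


cell = """empty_test_df
---

persons
-------
first_name
last_name *
"""
print(split_definitions(cell))
```

A table without column lines maps to an empty list, which corresponds to a DataFrame with zero columns in the test above.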
