## Testing

* Problems that can occur in data science aren’t always easy to detect: values might be encoded incorrectly, features might be used inappropriately, or unexpected data might break your assumptions.

* **TEST DRIVEN DEVELOPMENT:** a development process where you write tests for tasks before you even write the code to implement those tasks.

* **UNIT TEST:** a type of test that covers a “unit” of code, usually a single function, independently from the rest of the program.

**Resources**
* [Four Ways Data Science Goes Wrong and How Test Driven Data Analysis Can Help](https://www.predictiveanalyticsworld.com/machinelearningtimes/four-ways-data-science-goes-wrong-and-how-test-driven-data-analysis-can-help/6947/)
* [Ned Batchelder: Getting Started Testing - Slides](https://speakerdeck.com/pycon2014/getting-started-testing-by-ned-batchelder)
* [Ned Batchelder: Getting Started Testing - Videos](https://www.youtube.com/watch?v=FxSsnHeWQBY)

**Unit Test Advantages and Disadvantages:**

The advantage of unit tests is that they are isolated from the rest of your program, so they have no external dependencies: they don't need access to databases, APIs, or other external sources of information. However, passing unit tests isn’t always enough to show that our program works correctly. To show that all the parts of our program work together properly, communicating and transferring data between them correctly, we use integration tests. In this lesson, we'll focus on unit tests; however, when you start building larger programs, you will want to use integration tests as well.

[Integration Testing](https://www.fullstackpython.com/integration-testing.html)
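To make the isolation point concrete, here is a minimal sketch that keeps database access and pure logic in separate functions. The names `load_prices` and `average_price`, and the `products` table with a `price` column, are hypothetical. Only `average_price` is unit tested; `load_prices` depends on an external database, so it would be covered by an integration test instead.

```python
import sqlite3
from contextlib import closing


def load_prices(db_path):
    """Read all prices from a hypothetical products table.

    This depends on an external database, so it is a job for an integration test.
    """
    with closing(sqlite3.connect(db_path)) as connection:
        rows = connection.execute("SELECT price FROM products").fetchall()
    return [row[0] for row in rows]


def average_price(prices):
    """Return the mean of a list of prices. Pure logic with no external dependencies."""
    if not prices:
        raise ValueError("prices must not be empty")
    return sum(prices) / len(prices)


def test_average_price():
    # The unit test passes data in directly, so no database is needed.
    assert average_price([10.0, 20.0, 30.0]) == 20.0
```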

## Unit Testing Tools:

### 1. pytest

To install `pytest`, run `pip install -U pytest` in your terminal. 

* Create a test file whose name starts with `test_`
* Inside the test file, define unit test functions whose names start with `test_`
* Run `pytest` in your terminal from the directory containing your test file, and it will detect these tests for you!

`test_` is the default prefix; if you want to change it, see the pytest [configuration](https://docs.pytest.org/en/latest/customize.html) documentation.
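A minimal sketch of a pytest test file that follows these naming rules. The file name `test_encoding.py` and the `encode_answer` function are hypothetical; in a real project the function would live in its own module and be imported into the test file. Each test function holds a single assert.

```python
# test_encoding.py (hypothetical name; pytest collects files matching test_*.py)


def encode_answer(answer):
    """Map a yes/no survey answer to 1/0, ignoring case and surrounding whitespace."""
    normalized = answer.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    raise ValueError(f"Unexpected answer: {answer!r}")


# One assert per test function, so a failure points to exactly one case.
def test_encode_yes():
    assert encode_answer("yes") == 1


def test_encode_no():
    assert encode_answer("no") == 0


def test_encode_ignores_case_and_whitespace():
    assert encode_answer("  YES ") == 1
```

Running `pytest` in the directory containing this file should collect and run the three `test_` functions.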


In the test output, periods represent passing unit tests and `F`s represent failing ones. A test function stops at its first failed assert, so any later asserts in that function never run. It's therefore wise to have only one assert statement per test; otherwise, you won't know exactly how many checks failed, or which ones.

A failed assert won't stop the test run (pytest records the failure and moves on to the next test function), but a syntax error in a test file will interrupt it.

### 2. unittest

See the Ned Batchelder tutorial folder.
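The standard-library `unittest` module takes a class-based approach: tests are methods whose names start with `test` on a subclass of `unittest.TestCase`, and checks use assertion methods such as `assertAlmostEqual` and `assertRaises`. A minimal sketch, with a hypothetical `percent_change` function as the code under test:

```python
import unittest


def percent_change(old, new):
    """Return the percentage change from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


class TestPercentChange(unittest.TestCase):
    def test_increase(self):
        self.assertAlmostEqual(percent_change(50, 75), 50.0)

    def test_decrease(self):
        self.assertAlmostEqual(percent_change(200, 100), -50.0)

    def test_zero_old_value_raises(self):
        # Edge case: dividing by a zero starting value should be rejected.
        with self.assertRaises(ValueError):
            percent_change(0, 10)
```

Saved in a file whose name starts with `test`, these tests can be run with `python -m unittest` from that directory; pytest can also collect and run `unittest.TestCase` classes.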

## Test Driven Development

* **TEST DRIVEN DEVELOPMENT:** writing tests before you write the code that’s being tested. The test will fail at first, and you’ll know you’ve finished implementing a task when it passes.
* Tests can check for all the different scenarios and edge cases you can think of, before even starting to write your function. This way, when you do start implementing your function, you can run this test to get immediate feedback on whether it works or not in all the ways you can think of, as you tweak your function.
* When refactoring or adding to your code, tests reassure you that the rest of your code didn't break while you were making those changes. Tests also help ensure that your function's behavior is repeatable, regardless of external factors such as hardware and time.
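A minimal sketch of the TDD cycle, using a hypothetical `fill_missing` function that replaces `None` values with the median of the remaining values. The tests, including edge cases, are written first and fail until the implementation below them exists.

```python
import statistics


# Step 1: write the tests first. Running them before fill_missing is defined
# fails with a NameError, which is the expected starting point.
def test_replaces_none_with_median():
    assert fill_missing([1, None, 3, 5]) == [1, 3, 3, 5]


def test_leaves_complete_data_unchanged():
    assert fill_missing([2, 4, 6]) == [2, 4, 6]


def test_all_missing_is_left_as_is():
    # Edge case decided up front: with nothing to take a median of, return the input unchanged.
    assert fill_missing([None, None]) == [None, None]


# Step 2: implement just enough code to make the tests pass.
def fill_missing(values):
    """Replace None entries with the median of the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    median = statistics.median(present)
    return [median if v is None else v for v in values]
```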

**Resources**

[Data Science and Test Driven Development](https://www.linkedin.com/pulse/data-science-test-driven-development-sam-savage/)

[Test-Driven Development for Data Science](https://engineering.pivotal.io/post/test-driven-development-for-data-science/)

[Test Driven Development is essential for good data science. Here’s why.](https://medium.com/uk-hydrographic-office/test-driven-development-is-essential-for-good-data-science-heres-why-db7975a03a44)

[Testing Your Code - general python TDD](https://docs.python-guide.org/writing/tests/)

## Logging

Logging is valuable for understanding the events that occur while running your program. For example, if you run your model overnight and see that it's producing ridiculous results the next day, log messages can really help you understand the context in which this occurred.

* Be professional and clear

`Bad: Hmmm... this isn't working???`

`Bad: idk.... :(`

`Good: Couldn't parse file.`

* Be concise and use normal capitalization

`Bad: Start Product Recommendation Process`

`Bad: We have completed the steps necessary and will now proceed with the recommendation process for the records in our product database.`

`Good: Generating product recommendations.`

* Choose the appropriate level for logging

`DEBUG - level you would use for anything that happens in the program.`

`ERROR - level to record any error that occurs.`

`INFO - level to record all actions that are user-driven or system-specific, such as regularly scheduled operations.`

* Provide any useful information

`Bad: Failed to read location data`

`Good: Failed to read location data: store_id 8324971`
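A minimal sketch using Python's standard `logging` module that applies the guidelines above. The function names and the dictionary of store locations are hypothetical stand-ins for real data loading. With the level set to `INFO`, the `DEBUG` message is filtered out; lowering the level to `logging.DEBUG` would show it.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)


def read_location_data(store_id, locations):
    """Look up a store's location, logging at an appropriate level."""
    logger.debug("Looking up location data: store_id %s", store_id)
    try:
        return locations[store_id]
    except KeyError:
        # Concise, professional, and includes the identifier needed to investigate.
        logger.error("Failed to read location data: store_id %s", store_id)
        return None


def generate_recommendations(store_ids, locations):
    """Hypothetical scheduled job: logs one INFO message when it starts."""
    logger.info("Generating product recommendations.")
    return {store_id: read_location_data(store_id, locations) for store_id in store_ids}
```

For example, `generate_recommendations([8324971], {})` logs an INFO line and then an ERROR line ending in `Failed to read location data: store_id 8324971`. Passing arguments to the logger instead of pre-formatting the string defers formatting until a message is actually emitted.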

## Code Reviews

[Code Review](https://github.com/lyst/MakingLyst/tree/master/code-reviews)

[Code Review Best Practices](https://www.kevinlondon.com/2015/05/05/code-review-best-practices.html)

#### Questions to Ask Yourself When Conducting a Code Review

Is the code clean and modular?
* Can I understand the code easily?
* Does it use meaningful names and whitespace?
* Is there duplicated code?
* Can you provide another layer of abstraction?
* Is each function and module necessary?
* Is each function or module too long?

Is the code efficient?
* Are there loops or other steps we can vectorize?
* Can we use better data structures to optimize any steps?
* Can we shorten the number of calculations needed for any steps?
* Can we use generators or multiprocessing to optimize any steps?

Is documentation effective?
* Are in-line comments concise and meaningful?
* Is there complex code that's missing documentation?
* Do functions use effective docstrings?
* Is the necessary project documentation provided?

Is the code well tested?
* Does the code have high test coverage?
* Do tests check for interesting cases?
* Are the tests readable?
* Can the tests be made more efficient?

Is the logging effective?
* Are log messages clear, concise, and professional?
* Do they include all relevant and useful information?
* Do they use the appropriate logging level?

#### Tips for Conducting a Code Review

* Use a code linter

[pylint can automatically check for coding standards and PEP 8 guidelines](https://www.pylint.org/)

* Explain issues and make suggestions

`BAD: Make model evaluation code its own module - too repetitive.`

`BETTER: Make the model evaluation code its own module. This will simplify models.py to be less repetitive and focus primarily on building models.`

`GOOD: How about we consider making the model evaluation code its own module? This would simplify models.py to only include code for building models. Organizing these evaluation methods into separate functions would also allow us to reuse them with different models without repeating code.`

* Keep your comments objective

Try to avoid using the words "I" and "you" in your comments. Comments that sound personal shift the attention of the review onto people; keeping them objective keeps the focus on the code.

`BAD: I wouldn't groupby genre twice like you did here... Just compute it once and use that for your aggregations.`

`BAD: You create this groupby dataframe twice here. Just compute it once, save it as groupby_genre and then use that to get your average prices and views.`

`GOOD: Can we group by genre at the beginning of the function and then save that as a groupby object? We could then reference that object to get the average prices and views without computing groupby twice.`

* Provide code examples
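A reviewer suggesting a simpler approach can include a short snippet showing the change rather than only describing it. A hypothetical sketch: the first function stands in for the code under review, and the second is the replacement a reviewer might paste into the comment. Both assume each name has the form "First Last", with exactly one space.

```python
# Code under review (hypothetical): builds two lists with an explicit loop.
def get_first_last_names_loop(names):
    first_names = []
    last_names = []
    for name in names:
        first, last = name.split(" ")
        first_names.append(first.lower())
        last_names.append(last.lower())
    return first_names, last_names


# Suggested in the review comment: list comprehensions express the same result more directly.
def get_first_last_names(names):
    first_names = [name.split(" ")[0].lower() for name in names]
    last_names = [name.split(" ")[1].lower() for name in names]
    return first_names, last_names
```

Pairing a snippet like this with a question keeps the suggestion objective and easy for the author to try.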
