In [1]:
# run this to shorten the data import from the files
import os
cwd = os.path.dirname(os.getcwd())+'/'
path_data = os.path.join(os.path.dirname(os.getcwd()), 'datasets/')


# How frequently is a function tested?

Many data scientists do not think much about testing and simply test manually when necessary. But once you see the big picture, i.e. the life cycle of a function over the entire project, you appreciate how important testing really is and how frequently you need to test things.

Which of the following is true about testing?

### Possible Answers


    A function is tested just once during its life cycle, which is right after the first implementation.
    
    
    A function is tested after the first implementation and then any time the function is modified, and this happens mainly when new bugs are found.
    
    
    A function is tested after the first implementation and then any time the function is modified, which happens mainly when new bugs are found, new features are implemented or the code is refactored. {Answer}
    
    
    A function is tested every time the function is executed.

**Exactly! If the project goes on for a few years, you may end up testing the same function over a hundred times because of new bugs, new feature requests and refactoring!**

In [13]:
def row_to_list(row):
    row = row.rstrip()
    separated_entries = row.split("\t")
    if len(separated_entries) == 2:
        return separated_entries
    return None

In [14]:
def row_to_list_bugfix(row):
    row = row.rstrip()
    separated_entries = row.split("\t")
    if len(separated_entries) == 2 and "" not in separated_entries:
        return separated_entries    
    return None

In [17]:
# exercise 01

"""
Manual testing

The function row_to_list(), which you met in the video lesson, has the following expected return values for the arguments listed below.

Argument             | Expected return value | Explanation
-------------------- | --------------------- | ---------------------
"2,081\t314,942\n"   | ["2,081", "314,942"]  | Correct row format
"\t293,410\n"        | None                  | Missing area
"1,463238,765\n"     | None                  | Missing tab separator

row_to_list() has been defined and imported for you. Your job is to test the function manually in the IPython console.

While testing manually, notice how many times you have to repeat the same steps! The point is to experience the inefficiency of manual testing.
"""

# Instructions

"""
Question

Call row_to_list() in the IPython console on the three arguments listed in the table. Do the actual return values match the expected return values listed in the table?
Possible answers :
    
    Yes, the implementation returns the expected value in each case.
    
    No, the function returns None for the argument "2,081\t314,942\n" instead of the expected return value ["2,081", "314,942"].
    
    No, the function returns ["", "293,410"] for the argument "\t293,410\n" instead of the expected return value None. {Answer}
    
    No, the function returns False for the argument "1,463238,765\n" instead of the expected return value None.
---
Question

In the last step, you discovered a bug in our implementation of row_to_list(). Good job!

We have implemented a corresponding bug fix in a new function row_to_list_bugfix(). Call row_to_list_bugfix() in the IPython console on the three arguments listed in the table. Do the actual return values now match the expected return values listed in the table?

Possible answers:

    Yes, the implementation returns the expected value in each case. {Answer}
    
    No, the function returns None for the argument "2,081\t314,942\n" instead of the expected return value ["2,081", "314,942"].
    
    No, the function returns ["", "293,410"] for the argument "\t293,410\n" instead of the expected return value None.
    
    No, the function returns False for the argument "1,463238,765\n" instead of the expected return value None.
"""

# solution

values = ["2,081\t314,942\n", "\t293,410\n", "1,463238,765\n"]

print("row_to_list")
for i in values:
    print(row_to_list(i))

#----------------------------------#

print("\nrow_to_list_bugfix")
for i in values:
    print(row_to_list_bugfix(i))

#----------------------------------#

# Conclusion

"""
Well done! Did you notice how manual testing involves repeating the same steps over and over in the IPython console? In this exercise, you just went through a single bug discovery and fixing phase. Just imagine doing this a hundred times over the entire life cycle of row_to_list(), including new feature implementation and refactoring phases! Unit testing can automate these repetitive steps, so that testing becomes easier, and you will learn it in the next lesson ;-)
"""

row_to_list
['2,081', '314,942']
['', '293,410']
None

row_to_list_bugfix
['2,081', '314,942']
None
None


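
The three manual checks from the table can be written once as unit tests and rerun after every change. The following is a minimal sketch, assuming pytest-style test functions; the test names are hypothetical and chosen only for illustration, and `row_to_list_bugfix()` is repeated from the cell above so that the block is self-contained.

```python
def row_to_list_bugfix(row):
    """Split a tab-separated row into two entries; return None if the row is malformed."""
    row = row.rstrip()
    separated_entries = row.split("\t")
    if len(separated_entries) == 2 and "" not in separated_entries:
        return separated_entries
    return None


# Hypothetical test names: one test per row of the table in the exercise.
def test_on_correct_row():
    assert row_to_list_bugfix("2,081\t314,942\n") == ["2,081", "314,942"]


def test_on_missing_area():
    assert row_to_list_bugfix("\t293,410\n") is None


def test_on_missing_tab_separator():
    assert row_to_list_bugfix("1,463238,765\n") is None
```

Saved in a test module, these functions would be collected by pytest through their `test_` prefix, so the repeated console calls shrink to a single command.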

In [19]:
# exercise 02

"""
Your first unit test using pytest

The data file containing housing area and prices uses commas as thousands separators, e.g. "2,081" or "314,942", as you can see in the IPython Shell.

The convert_to_int() function takes a comma separated integer string as argument, and returns the integer. Therefore, the expected return value of convert_to_int("2,081") is the integer 2081.

This function is defined in the module preprocessing_helpers.py. But it is not known if the function is working properly.
"""

# Instructions

"""

    Import the pytest package.
---

    Import the function convert_to_int().
---

    Complete the name of the unit test by adding the prefix which pytest uses to distinguish unit tests from ordinary functions.
---

    Complete the assert statement to check if convert_to_int() returns the expected value for the argument "2,081".

"""

# solution

# Import the pytest package
import pytest

# Import the function convert_to_int()
from preprocessing_helpers import convert_to_int

# Complete the unit test name by adding a prefix
def test_on_string_with_one_comma():
  # Complete the assert statement
  assert convert_to_int("2,081") == 2081

#----------------------------------#

# Conclusion

"""
You just wrote your first unit test using pytest. Congratulations! A unit test takes 5 minutes to write, but saves you many hours in the future.
"""


# What causes a unit test to fail?

In the test result report, the character ., as shown below, stands for a passing test. A passing test is good news as it means that your function works as expected. The character F stands for a failing test. A failing test is bad news as this means that something is broken.

test_row_to_list.py .F.                                                  [100%]

Which of the following best describes why a unit test fails?

### Possible Answers


    The assert statement passes.
    
    
    The assert statement cannot be run because an exception is raised while running the unit test code.
    
    
    The assert statement raises an AssertionError.
    
    
    An exception is raised when running the unit test. This could be an AssertionError raised by the assert statement or another exception, e.g. NameError, which is raised before the assert statement can run. {Answer}

**Exactly! If you get an AssertionError, this means the function has a bug and you should fix it. If you get another exception, e.g. NameError, this means that something else is wrong with the unit test code and you should fix it so that the assert statement can actually run.**

In [21]:
# exercise 03

"""
Spotting and fixing bugs

To find bugs in functions, you need to follow a four-step procedure.

    1. Write unit tests.
    2. Run them.
    3. Read the test result report and spot the bugs.
    4. Fix the bugs.

In a previous exercise, you wrote a unit test for the function convert_to_int(), which is supposed to convert a comma separated integer string like "2,081" to the integer 2081. You also ran the unit test and discovered that it is failing.

In this exercise, you will read the test result report from that exercise in detail, and then spot and fix the bug. This will equip you with all the basic skills you need to start using unit tests in your projects.

The convert_to_int() function is defined in the file preprocessing_helpers.py. The unit test is available in the test module test_convert_to_int.py.
"""

# Instructions

"""
Question

Run the unit test in the test module test_convert_to_int.py in the IPython console. Read the test result report and spot the bug.

Which of the following describes the bug in the function convert_to_int(), if any?
Possible answers:

    convert_to_int("2,081") is expected to return the string "2081", but it is actually returning the integer 2081.

    convert_to_int("2,081") is expected to return the integer 2081, but it is actually returning the string "2081". {Answer}
    
    convert_to_int("2,081") is expected to return the integer 2081, but it is actually returning the string "2,081".
    
    The function convert_to_int() does not have a bug.
---

    Fix the convert_to_int() function so that it returns the integer 2081 instead of the string "2081" for the argument "2,081".

"""

# solution

!pytest test_convert_to_int.py

#----------------------------------#

def convert_to_int(string_with_comma):
    # Fix this line so that it returns an int, not a str
    return int(string_with_comma.replace(",", ""))

#----------------------------------#

# Conclusion

"""
Good work! Your boss and colleagues are going to really appreciate your new skill of reading test result reports and then spotting and fixing the bugs, because it means fewer bugs in the code base in the long term.
"""

platform linux -- Python 3.11.6, pytest-7.4.4, pluggy-1.3.0
rootdir: /home/nero/Documents/Estudos/DataCamp/Python/courses/unit-testing-for-data-science-in-python/scripts
collected 1 item

test_convert_to_int.py F                                                 [100%]

________________________ test_on_string_with_one_comma _________________________

    def test_on_string_with_one_comma():
>     assert convert_to_int("2,081") == 2081
E     AssertionError: assert '2081' == 2081
E      +  where '2081' = convert_to_int('2,081')

test_convert_to_int.py:6: AssertionError
FAILED test_convert_to_int.py::test_on_string_with_one_comma - AssertionError: assert '2081' == 2081


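
The report says that `convert_to_int('2,081')` returned the string `'2081'`. The original buggy source is not shown in these notes, so the first function below is an assumption: a plausible version that removes the comma but skips the `int()` conversion. Comparing it with the fixed function shows why the assertion failed.

```python
def convert_to_int_buggy(string_with_comma):
    # Assumed buggy version (not taken from the course files):
    # removes the comma but returns a str.
    return string_with_comma.replace(",", "")


def convert_to_int(string_with_comma):
    # Fixed version from the solution: remove the comma, then convert to int.
    return int(string_with_comma.replace(",", ""))


print(repr(convert_to_int_buggy("2,081")))  # '2081'
print(repr(convert_to_int("2,081")))        # 2081

# A str never compares equal to an int in Python, which is why the test failed.
print(convert_to_int_buggy("2,081") == 2081)  # False
print(convert_to_int("2,081") == 2081)        # True
```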

# Benefits of unit testing

You have been invited to a meeting where company executives are discussing whether developers should write unit tests. The CEO is unsure, and asks you about the benefits that unit testing might bring. In your response, which of the following benefits should you include?

    1. Time savings, leading to faster development of new features.
    2. Better user experience due to faster code execution.
    3. Improved documentation, which will help new colleagues understand the code base better.
    4. More user trust in the software product.
    5. Better user experience due to improved visualizations.
    6. Better user experience due to reduced downtime.

### Possible Answers


    1, 2, 4 and 6.
    
    
    1, 2, 3, 4 and 5.
    
    
    1, 3, 4 and 6. {Answer}
    
    
    All of them i.e. 1-6.

**You steered the CEO in the right direction! Time savings and reduced downtime are the major benefits of unit testing, while improved documentation and more user trust are great side effects.**

# Unit tests as documentation

Assume that you are a new collaborator of our linear regression project on housing area and prices.

While inspecting the project, you come across a function mystery_function() in the feature module. You want to figure out what this function does. As you know, reading the unit tests might give you the answer quickly!

The unit tests for the function are available in the test module test_mystery_function.py. You can read it, and any other file that you encounter, by using the !cat command in the IPython shell.

Having read the unit tests, can you guess what mystery_function() does?

### Possible Answers


    It converts data in a data file into a NumPy array. {Answer}
    
    
    It slices a NumPy array and returns the first two rows.
    
    
    It checks if data in a data file is clean. If clean, it returns True. If dirty, it returns False.

**You guessed it right, and you didn't even look at the function definition! This is why, when onboarding new colleagues, it is a good idea to tell them to look at the unit tests if they are not sure about a function's purpose. In Chapter 2, you will see more functions from the feature and models modules, and write more advanced unit tests using new pytest features.**
