# Python Data Science Toolbox (Part 1)

These are my notes for DataCamp's course [_Python Data Science Toolbox (Part 1)_](https://www.datacamp.com/courses/python-data-science-toolbox-part-1).

This course is presented by Hugo Bowne-Anderson, formerly Data Scientist at DataCamp. The collaborator is Francisco Castro.

Prerequisite:

- [_Intermediate Python_](../Intermediate%20Python/Intermediate%20Python.ipynb)

This course is part of these tracks:

- Data Scientist with Python
- Data Scientist Professional with Python
- Python Fundamentals
- Python Programmer

## Data Set

| Name | File |
| :--- | :--- |
| Tweets | tweets.csv |

## Imports
Imports are collected here for convenience and clarity.

In [None]:
import builtins
from functools import reduce

import pandas as pd

## Writing Functions
The exercises were very easy, so I did not include them here. The exercises included:
- defining a function
- defining 0, 1, or more parameters
- passing 0, 1, or more arguments
- docstrings
- multiple arguments and return values
- creating and unpacking tuples

In [None]:
# Read the data into a pandas DataFrame from the CSV file and have IPython
# show the first few rows.
tweets_df = pd.read_csv("tweets.csv")
tweets_df.head()

In [None]:
# Show information about the DataFrame.
tweets_df.info()

In [None]:
# Count the distinct values in the "lang" column.
# The language codes are:
#   "en": English
#   "et": Estonian
#   "und": undetermined (Twitter could not identify the language)

def count_entries(df, col_name):
    """
    Given a DataFrame and the name of a column, return a dictionary that
    maps each distinct value in the column to its number of occurrences.
    """
    counts = {}
    col = df[col_name]
    for entry in col:
        if entry in counts:
            counts[entry] += 1
        else:
            counts[entry] = 1
    return counts

lang_counts = count_entries(tweets_df, "lang")
print(lang_counts)
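For comparison, the standard library's `collections.Counter` performs the same tally as `count_entries` in a single call (pandas also offers `Series.value_counts()` for a column). A minimal sketch, using a small made-up list of language codes in place of the tweets data:

```python
from collections import Counter

# Hypothetical sample data standing in for tweets_df["lang"].
langs = ["en", "en", "et", "und", "en"]

# Counter maps each distinct item to its number of occurrences,
# like the dictionary built by count_entries.
lang_counter = Counter(langs)
print(dict(lang_counter))  # {'en': 3, 'et': 1, 'und': 1}

# most_common(n) returns the n most frequent items with their counts.
print(lang_counter.most_common(1))  # [('en', 3)]
```

Because a DataFrame column is iterable, `Counter(tweets_df["lang"])` should produce the same counts as `count_entries(tweets_df, "lang")`.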

## Scope, Default Arguments, and Variable-Length Arguments

### Scope

Python searches four scopes, in this order, when looking up a name:
- local scope (names defined within the current function)
- enclosing scope (names defined in any enclosing function, also called nonlocal scope)
- global scope (names defined at the top level of a module or script)
- built-in scope (names defined in the builtins module)

This is the LEGB (local, enclosing, global, built-in) rule.

Assigning to a name inside a function creates a local variable unless the name is declared `global` or `nonlocal` in that function.

In [None]:
# Global and local objects with the same name.
# Here, a local variable is created within the function.
new_val = 10

def square(value):
    """
    Return the square of a number.
    """
    # Create a new_val variable with function scope. This variable
    # is different from the new_val variable outside the function.
    new_val = value ** 2
    return new_val

print(square(3))
print(new_val)

In [None]:
# Here the function uses the global variable.
# I try not to use global variables.
new_val = 20

def square2(value):
    """
    Return the square of a number.
    """
    # new_val is not assigned within the function, so Python looks it up
    # in the outer scopes and finds the global new_val.
    return new_val ** 2

print(new_val)
print(square2(0))

In [None]:
# Use a global variable within the function.
new_val = 10

def square3(value):
    """
    Square the global new_val, store the result in new_val, and return it.
    The value parameter is not used.
    """
    # Assign to the global variable named new_val.
    # Without the global declaration, a new local variable would
    # be created.
    global new_val
    new_val = new_val ** 2
    return new_val

print(new_val)
print(square3(3))
print(new_val)

In [None]:
# These are the objects with built-in scope.
# They can be referred to, for example, as builtins.open to
# differentiate them from another object named open.
print(dir(builtins))
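Because the global scope is searched before the built-in scope, a global name can hide a built-in one. A sketch that deliberately shadows `max` (for demonstration only) and then reaches the original through the `builtins` module:

```python
import builtins

# A global function that hides the built-in max.
def max(*args):
    return "shadowed!"

print(max(1, 5, 3))           # shadowed! (global scope is searched first)
print(builtins.max(1, 5, 3))  # 5 (explicitly use the built-in scope)

# Deleting the global name makes the built-in visible again.
del max
print(max(1, 5, 3))  # 5
```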

### Nested Functions (Enclosing Functions)

These are the reasons for creating nested functions:
- To remove repetition from within a function.
- To return a function (object).

Nested functions are also covered in the [_Writing Functions in Python_](../Writing%20Functions%20in%20Python/Writing%20Functions%20in%20Python.ipynb) course.

In [None]:
# This is an example of using an inner function to perform
# a calculation needed by the outer function.

def mod2plus5(x1, x2, x3):
    """
    Return the remainder (after dividing by 2) plus 5 of three arguments.
    """
    def inner(x):
        """
        Return the remainder (after dividing by 2) plus 5 of the argument.
        """
        return x % 2 + 5
    
    return inner(x1), inner(x2), inner(x3)

print(mod2plus5(1, 2, 3))

In [None]:
# This is an example of a function that returns a function.
# This uses closures.

def raise_val(n):
    """
    Return the inner function.
    """

    def inner(x):
        """
        Raise x to the power of n.
        """
        raised = x ** n
        return raised

    return inner

square = raise_val(2)  # square is a function object
cube = raise_val(3)  # cube is a function object
print(square(2), cube(4))  # 4, 64

In [None]:
# Using a nonlocal variable: nonlocal refers to a variable that is not
# global but belongs to the scope of an enclosing function.
# nonlocal binds to the nearest enclosing function that defines n, however
# many levels up that is, but it never reaches the global scope.
n = 10

def outer():
    """
    Print the value of n.
    """
    n = 1

    def inner():
        nonlocal n
        n = 2
        print(n)

    print(n)
    inner()
    print(n)

outer()
print(n)
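To check how far `nonlocal` reaches, here is a three-level sketch with made-up function names. `nonlocal n` in the innermost function binds to the nearest enclosing function that defines `n`, skipping a middle function that does not. A `nonlocal` declaration with no enclosing function binding is rejected at compile time, even when a global of that name exists.

```python
def outermost():
    n = 1  # the only binding of n, in the outermost function

    def middle():
        # middle does not assign n, so it is skipped during the lookup.
        def innermost():
            nonlocal n  # refers to outermost's n, two levels up
            n = 3

        innermost()

    middle()
    return n

print(outermost())  # 3

# nonlocal never reaches the global scope: with only a global x available,
# compiling this source raises SyntaxError.
source = "x = 1\ndef f():\n    nonlocal x\n    x = 2\n"
try:
    compile(source, "<demo>", "exec")
except SyntaxError as ex:
    print("SyntaxError:", ex.msg)
```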

In [None]:
# Another trivial example of using a nested function.
def three_shouts(word1, word2, word3):
    """
    Return a tuple of strings
    concatenated with '!!!'.
    """

    def inner(word):
        """
        Returns a string concatenated with '!!!'.
        """
        return word + '!!!'

    # Return a tuple of strings.
    return inner(word1), inner(word2), inner(word3)

print(three_shouts('a', 'b', 'c'))

### Closures

A closure is a nested (inner) function that remembers the variables of its enclosing scope. Anything defined locally in the enclosing scope remains available to the inner function even after the outer function has finished executing.

Closures are also covered in the [_Writing Functions in Python_](../Writing%20Functions%20in%20Python/Writing%20Functions%20in%20Python.ipynb) course.
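A closure can be inspected directly: the captured variables are stored in cell objects on the inner function's `__closure__` attribute, and their names are listed in `__code__.co_freevars`. A minimal sketch with a hypothetical `make_multiplier` factory:

```python
def make_multiplier(factor):
    """Return a function that multiplies its argument by factor."""
    def multiply(x):
        return x * factor  # factor is a free variable captured by the closure
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

# make_multiplier has returned, yet each inner function still holds factor.
print(double(5), triple(5))  # 10 15

# The captured variable's name and its current value.
print(double.__code__.co_freevars)          # ('factor',)
print(double.__closure__[0].cell_contents)  # 2
```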

In [None]:
# Example of a closure:
# Note how the value n = 2 is preserved in the twice function,
# and n = 3 is preserved in the thrice function.
def echo(n):
    """
    Return the inner_echo function.
    """

    def inner_echo(word1):
        """
        Concatenate n copies of word1, where
        n is in the scope of the enclosing function.
        """
        echo_word = word1 * n
        return echo_word

    return inner_echo

twice = echo(2)  # twice is a function
thrice = echo(3)  # thrice is a function
print(twice("hello"), thrice("hello"))

In [None]:
# Using the keyword nonlocal and nested functions.
def echo_shout(word):
    """
    Change the value of a nonlocal variable.
    """
    echo_word = word * 2
    print(echo_word)

    def shout():
        """
        Alter a variable in the enclosing scope.
        """
        nonlocal echo_word
        echo_word += "!!!"

    shout()
    print(echo_word)

echo_shout("hello")
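Combining a closure with `nonlocal` gives an inner function private state that persists between calls. A sketch with a hypothetical `make_counter` factory; each call to the factory creates an independent count:

```python
def make_counter():
    """Return a function that counts how many times it has been called."""
    count = 0

    def increment():
        nonlocal count  # rebind the enclosing function's count
        count += 1
        return count

    return increment

counter_a = make_counter()
counter_b = make_counter()
print(counter_a(), counter_a(), counter_a())  # 1 2 3
print(counter_b())  # 1 (counter_b has its own count)
```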

### Default Arguments

In [None]:
def power(number, pow=1):
    """
    Raise number to the power of pow.
    """
    return number ** pow

print(power(4, 1))
print(power(4))
print(power(4, 2))

# Keyword arguments can also be used, in any order.
print(power(number=23))
print(power(number=3, pow=3))
print(power(pow=4, number=7))
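Default values are evaluated once, when the `def` statement runs, not on every call. That is harmless for an immutable default such as `pow=1`, but surprising for a mutable default such as a list. A sketch of the pitfall and the common `None`-sentinel fix (both function names are made up):

```python
def append_bad(item, items=[]):
    # The default list is created once and shared by every call.
    items.append(item)
    return items

def append_good(item, items=None):
    # Use None as a sentinel and build a fresh list on each call.
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad("a"))   # ['a']
print(append_bad("b"))   # ['a', 'b'] (the same list as before)
print(append_good("a"))  # ['a']
print(append_good("b"))  # ['b']
```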

### Flexible Arguments

Flexible arguments let a function accept any number of arguments. A `*args` parameter collects any extra positional (non-keyword) arguments into a tuple, which is useful, for example, when computing the sum of an arbitrary number of values. A `**kwargs` parameter collects any extra keyword arguments into a dict.

In [None]:
def add_all(*args):
    """
    Return the sum of args, where args is a tuple.
    """
    print(type(args))
    sum_all = 0
    for num in args:
        sum_all += num
    return sum_all

print(add_all(1))
print(add_all(1, 2, 3))
print(add_all(1, 2, 3, 4, 5, 6))

In [None]:
# Flexible keyword arguments.

def print_all(**kwargs):
    """
    Print out the key-value pairs in kwargs, where kwargs is a dict.
    """
    print(type(kwargs))
    for key, value in kwargs.items():
        print(f"{key}: {value}")

print_all(name="dumbledore", job="headmaster")
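The two forms can be combined in one signature, and the same `*` and `**` syntax also works at the call site, unpacking a list into positional arguments and a dict into keyword arguments. A small sketch with a hypothetical `describe` function:

```python
def describe(title, *args, **kwargs):
    """Return a string showing how extra arguments are collected."""
    return f"{title}: args={args}, kwargs={kwargs}"

print(describe("direct", 1, 2, color="red"))
# direct: args=(1, 2), kwargs={'color': 'red'}

# At the call site, * unpacks an iterable into positional arguments
# and ** unpacks a dict into keyword arguments.
values = [1, 2]
options = {"color": "red"}
print(describe("unpacked", *values, **options))
# unpacked: args=(1, 2), kwargs={'color': 'red'}
```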

#### Exercises

In [None]:
# A function with one default argument.

def shout_echo(word1, echo=1):
    """
    Concatenate echo copies of word1 and three
    exclamation marks at the end of the string.
    """
    echo_word = word1 * echo
    shout_word = echo_word + "!!!"
    return shout_word

no_echo = shout_echo("Hey")
with_echo = shout_echo("Hey", echo=5)
with_echo = shout_echo("Hey", 5)  # This works as well.
with_echo = shout_echo(echo=5, word1="Hey")  # And this works, too.

print(no_echo)
print(with_echo)

In [None]:
# A function with multiple default arguments.

def shout_echo2(word1, echo=1, intense=False):
    """
    Concatenate echo copies of word1 and three
    exclamation marks at the end of the string.
    """
    echo_word = word1 * echo
    if intense:
        echo_word_new = echo_word.upper() + "!!!"
    else:
        echo_word_new = echo_word + "!!!"
    return echo_word_new

with_big_echo = shout_echo2("Hey", echo=5, intense=True)
big_no_echo = shout_echo2("Hey", intense=True)
with_small_echo = shout_echo2("Hey", echo=5)
small_no_echo = shout_echo2("Hey")

print(with_big_echo)
print(big_no_echo)
print(with_small_echo)
print(small_no_echo)

In [None]:
# A function with variable-length positional arguments (*args).

def gibberish(*args):
    """
    Concatenate strings in *args together.
    """
    hodgepodge = ""
    for word in args:
        hodgepodge += word
    return hodgepodge

one_word = gibberish("luke")
many_words = gibberish("luke", "leia", "han", "obi", "darth")

print(one_word)
print(many_words)

In [None]:
# A function with variable-length keyword arguments (**kwargs).

def report_status(**kwargs):
    """
    Print out the status of a movie character.
    """

    print("\nBEGIN: REPORT\n")
    for key, value in kwargs.items():
        print(key + ": " + value)
    print("\nEND REPORT")

report_status(name="luke", affiliation="jedi", status="missing")
report_status(name="anakin", affiliation="sith lord", status="deceased")

In [None]:
# Given a DataFrame, count the items in any column.

def count_entries2(df, col_name="lang"):
    """
    Return a dictionary with counts of
    occurrences as value for each key.
    """
    cols_count = {}
    # Extract column from DataFrame: col
    col = df[col_name]

    # Iterate over the column in DataFrame
    for entry in col:
        if entry in cols_count.keys():
            cols_count[entry] += 1
        else:
            cols_count[entry] = 1
    return cols_count

# Load the DataFrame from the CSV file.
tweets_df = pd.read_csv("tweets.csv")

# Count entries in the default column 'lang'.
result1 = count_entries2(tweets_df)
print(result1)

# Count entries in column 'source'.
print()
result2 = count_entries2(tweets_df, "source")
print(result2)

In [None]:
# Accept one or more column names.

def count_entries3(df, *args):
    """
    Return a dictionary with counts of
    occurrences as value for each key.
    """
    cols_count = {}
    for col_name in args:
        col = df[col_name]
        for entry in col:
            if entry in cols_count.keys():
                cols_count[entry] += 1
            else:
                cols_count[entry] = 1
    return cols_count

# Load the DataFrame from the CSV file.
tweets_df = pd.read_csv("tweets.csv")

# Count entries in column "lang".
result1 = count_entries3(tweets_df, "lang")
print(result1)
print()

# Count entries in columns "lang" and "source".
result2 = count_entries3(tweets_df, "lang", "source")
print(result2)

### Special Parameters / and *

For more information about using `/` and `*` in a function definition, see https://docs.python.org/3/tutorial/controlflow.html#function-examples. For the history of `/`, see https://peps.python.org/pep-0570/; positional-only parameters were introduced in Python 3.8.

```
def f(pos1, pos2, /, pos_or_kwd, *, kwd1, kwd2):
      -----------    ----------     ----------
        |             |                  |
        |        Positional or keyword   |
        |                                - Keyword only
         -- Positional only
```

In my code, especially in constructors, I require keyword-only arguments, e.g., `construct(*, ...)`.
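A sketch of all three parameter kinds in one signature (requires Python 3.8 or later; `make_point` is a made-up function). Calls that pass a positional-only parameter by keyword, or a keyword-only parameter by position, raise `TypeError`:

```python
# Hypothetical function illustrating the three parameter kinds.
def make_point(x, y, /, label="P", *, color, size=1):
    """Return a tuple describing a point.

    x and y are positional-only, label is positional-or-keyword,
    and color and size are keyword-only.
    """
    return (x, y, label, color, size)

print(make_point(1, 2, color="red"))  # (1, 2, 'P', 'red', 1)
print(make_point(1, 2, label="Q", color="blue", size=3))  # (1, 2, 'Q', 'blue', 3)

# Passing a positional-only parameter by keyword raises TypeError.
try:
    make_point(x=1, y=2, color="red")
except TypeError as ex:
    print("TypeError:", ex)

# Passing a keyword-only parameter positionally also raises TypeError.
try:
    make_point(1, 2, "Q", "green")
except TypeError as ex:
    print("TypeError:", ex)
```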

## Lambda Functions and Error-Handling
### Lambda Functions

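A lambda function is an anonymous function defined with the `lambda` keyword; its body is a single expression whose value is returned. Use lambda functions for short and simple functions.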

In [None]:
# Convert regular functions to lambda functions:
def raise_to_power1(x, y):
    return x ** y

raise_to_power2 = lambda x, y: x ** y

print(raise_to_power1(2, 3))
print(raise_to_power2(2, 3))

In [None]:
def add_bangs1(a):
    return a + "!!!"

add_bangs2 = lambda a: a + "!!!"

print(add_bangs1("hello"))
print(add_bangs2("hello"))

#### Exercise

In [None]:
def echo_word1(word1, echo):
    return word1 * echo

echo_word2 = lambda word1, echo: word1 * echo

print(echo_word1("hey", 5))
print(echo_word2("hey", 5))

### Anonymous Functions

Lambda functions are anonymous functions, and they are often passed to `map`, `filter`, and `functools.reduce`. `map(func, iterable)` applies `func` to each element of `iterable` and returns a lazy map object; wrap it in `list()` to see the results.

In [None]:
# Use map to apply a function to each element in a sequence.
nums = [48, 6, 9, 21, 1]
# square_all is a map object, which is an iterable.
square_all = map(lambda num: num ** 2, nums)
print(square_all)
# Convert the map object to a list and print the list.
print(list(square_all))
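Two properties of `map` are easy to miss: the map object is a one-shot iterator, so it is empty once it has been consumed, and `map` accepts several iterables, passing one element from each to the function and stopping at the shortest. A short sketch:

```python
nums = [48, 6, 9, 21, 1]
squares = map(lambda num: num ** 2, nums)

print(list(squares))  # [2304, 36, 81, 441, 1]
print(list(squares))  # [] (the iterator is already exhausted)

# With two iterables, the lambda takes two parameters; the shorter list
# determines how many results are produced.
sums = map(lambda a, b: a + b, [1, 2, 3], [10, 20])
print(list(sums))  # [11, 22]
```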

#### Exercises

In [None]:
# Example of using map.
# Create a sequence.
spells = ["protego", "accio", "expecto patronum", "legilimens"]
# Use map with an anonymous lambda to add "!!!" to each element of the sequence.
shout_spells = map(lambda item: item + "!!!", spells)
# Convert the map object to a list.
shout_spells_list = list(shout_spells)
# Print the list.
print(shout_spells_list)

In [None]:
# filter() and lambda functions.
# Use filter to keep only the elements of a list for which the given
# function returns True; elements that don't satisfy it are dropped.
fellowship = [
    "frodo",
    "samwise",
    "merry",
    "aragorn",
    "legolas",
    "boromir",
    "gimli",
    "gandalf",
    "pippin",
]
result = filter(lambda member: len(member) > 6, fellowship)
result_list = list(result)
print(result_list)

In [None]:
# reduce() and lambda functions.
# Perform a computation on a list and return a single value as a result.
# reduce must be imported from the functools module.
stark = ["robb", "sansa", "arya", "eddard", "jon"]
result = reduce(lambda item1, item2: item1 + item2, stark)
print(result)
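`reduce` folds a sequence from left to right, so the call above computes `((("robb" + "sansa") + "arya") + "eddard") + "jon"`. It also takes an optional third argument, an initial value for the accumulator, which matters for empty sequences. A sketch:

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Computes ((1 * 2) * 3) * 4.
print(reduce(lambda acc, n: acc * n, numbers))  # 24

# With an initializer, reduce starts from that value, so an empty
# sequence simply returns the initializer.
print(reduce(lambda acc, n: acc * n, [], 1))  # 1

# Without an initializer, reducing an empty sequence raises TypeError.
try:
    reduce(lambda acc, n: acc * n, [])
except TypeError as ex:
    print("TypeError:", ex)
```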

### Error Handling

In [None]:
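# An example of handling an exception.
# In this example, the exception is handled inside the function.
# This is not usually what you want to do: sqrt("hello") prints a message
# and returns None rather than raising an exception. Note also that
# sqrt(-9) does not fail; in Python 3, a negative number raised to a
# fractional power returns a complex number.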
def sqrt(x):
    """Return the square root of a number."""
    try:
        return x ** 0.5
    except TypeError:
        print("x must be an int or float")


print(sqrt(4))
print(sqrt(10))
print(sqrt("hello"))
print(sqrt(-9))

In [None]:
# Raise an error when calculating the square root of a negative number.
def sqrt2(x):
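    """Return the square root of a non-negative number."""
    # Note: for a str argument, the comparison x < 0 itself raises
    # TypeError before the try block runs, so the except clause below
    # never handles that case.
    if x < 0:
        raise ValueError("x must be non-negative")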
    try:
        return x ** 0.5
    except TypeError:
        print("x must be an int or float")


# Catch the exception here so that the rest of the notebook still runs.
try:
    print(sqrt2(-9))
except Exception as ex:
    print("Caught exception:", ex)
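In `sqrt2`, the `if x < 0` check runs before the `try` block, so a str argument makes the comparison itself raise `TypeError`, and the `except TypeError` clause never handles it. One way to report both kinds of bad input explicitly is to check the type first and raise instead of printing. A sketch; `sqrt3` is a made-up name, and using `math.sqrt` is a choice rather than part of the course code:

```python
import math

def sqrt3(x):
    """Return the square root of x, checking the type before the value."""
    # Validate the type first, so a str never reaches the x < 0 comparison.
    # bool passes this check because bool is a subclass of int.
    if not isinstance(x, (int, float)):
        raise TypeError("x must be an int or float")
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.sqrt(x)

print(sqrt3(16))  # 4.0

for bad_value in ("hello", -9):
    try:
        sqrt3(bad_value)
    except (TypeError, ValueError) as ex:
        print(type(ex).__name__ + ":", ex)
```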

#### Exercises

In [None]:
# Handle an exception using try ... except:
def shout_echo3(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""

    echo_word = ""
    shout_words = ""
    try:
        echo_word = word1 * echo
        shout_words = echo_word + "!!!"
    except TypeError:
        print("word1 must be a string and echo must be an integer.")
    return shout_words

print(shout_echo3("particle", echo=3))
print(shout_echo3("particle", echo="accelerator"))

In [None]:
# Raise a ValueError exception when echo < 0.
def shout_echo4(word1, echo=1):
    """
    Concatenate echo copies of word1 and three
    exclamation marks at the end of the string.
    """

    # Raise an error with raise.
    if echo < 0:
        raise ValueError("echo must be greater than or equal to 0")
    echo_word = word1 * echo
    shout_word = echo_word + "!!!"
    return shout_word

print(shout_echo4("particle", echo=5))
try:
    print(shout_echo4("particle", echo=-1))
except Exception as ex:
    print("An error occurred:", ex)

In [None]:
# The exercises below modify the DataFrame analyzer from earlier in the
# course to add error messages and exceptions using try...except and raise.

# First, filter the tweets for retweets: the text of a retweet starts
# with "RT".
tweets_df = pd.read_csv("tweets.csv")
result = filter(lambda x: x.startswith("RT"), tweets_df["text"])
res_list = list(result)
for tweet in res_list:
    print('"' + tweet + '"')

In [None]:
# Print an error when the requested column does not exist.
def count_entries4(df, col_name="lang"):
    """
    Return a dictionary with counts of occurrences as value for each key.
    """
    cols_count = {}
    try:
        col = df[col_name]
        for entry in col:
            if entry in cols_count.keys():
                cols_count[entry] += 1
            else:
                cols_count[entry] = 1
        return cols_count
    except KeyError:
        print("The DataFrame does not have a " + col_name + " column.")

tweets_df = pd.read_csv("tweets.csv")
result1 = count_entries4(tweets_df, "lang")
print(result1)

# This prints an error message and returns None.
result2 = count_entries4(tweets_df, "lang1")
print(result2)

#try:
#    result2 = count_entries4(tweets_df, "lang1")
#except Exception as ex:
#    print("Caught exception:", ex)

In [None]:
# Raise an exception when the requested column does not exist.
def count_entries5(df, col_name="lang"):
    """
    Return a dictionary with counts of occurrences as value for each key.
    """
    if col_name not in df.columns:
        raise ValueError("The DataFrame does not have a " + col_name + " column.")
    cols_count = {}
    col = df[col_name]
    for entry in col:
        if entry in cols_count.keys():
            cols_count[entry] += 1
        else:
            cols_count[entry] = 1
    return cols_count

tweets_df = pd.read_csv("tweets.csv")
result1 = count_entries5(tweets_df, "lang")
print(result1)

# This should fail with an exception.
try:
    result2 = count_entries5(tweets_df, "lang1")
except ValueError as ex:
    print("Caught exception:", ex)

## Summary of Learnings

We have learned how to:
- Write functions that accept single and multiple arguments.
- Write functions that return one or many values.
- Use default, flexible, and keyword arguments.
- Make use of global, local, and nonlocal scope in functions.
- Write lambda functions.
- Handle errors using exceptions (try...except and raise).

We haven't yet learned how to:
- Create lists with list comprehensions.
- Use iterators (which we've seen before).
- Apply our new skills to a case study.
