In [1]:
import numpy as np

In this course, the first of the [Python Data Science Toolbox courses](https://github.com/Nhan121/Lectures_notes-teaching-in-VN-/tree/master/Python%20Data%20Science%20Toolbox), you'll learn to write your very own functions and you'll have the opportunity to apply these newfound skills to questions that commonly arise in Data Science contexts.
## 1. User-defined functions
### 1.1. Theory
#### Built-in functions
Let's check out Python's built-in function `str`, which accepts an object such as a number and returns a string object. 

You can assign a call to `str` to a `variable` to store its return value.

In [2]:
x = str(5)
print(x)

5


While built-in Python functions are cool, as a Data Scientist, you'll need functions that have functionality specific to your needs.

In [3]:
print(type(x))

<class 'str'>


Fortunately, you can define your own functions in `Python`!

#### Defining a function
We'll now see how to define functions via an example, a function that squares a number. 
>- The function name `square` will be perfect for this. 
>- To **define** the function, we **begin** with the keyword **`def`**, followed by the function name `square`; this is then followed by a set of parentheses and a colon. 
>- This piece of code is called a function header. 
>- To complete the function definition, let's write the **function body** by squaring a value, say 4, and printing the output.

In [4]:
def square():          # function header
    new_value = 4**2   # function body
    print(new_value)

Right now, our square function does not have any parameters within the `parentheses`: `()`. We will add them later. Now, whenever this function is called, the code in the function body is run. 

In [5]:
square()

16


In this case, `new_value` is assigned the value of `4 ** 2` and then printed out. You can call the function as you do with built-in functions: `square()`. This should yield the value 16.

#### Function parameters
What if you wanted to square any other number besides 4, though? To add that functionality, you add a parameter to the function definition in between the parentheses. 

- Here we've added a parameter `value`, and in the new function body the variable `new_value` takes the square of `value`, which is then printed out.
- We can now square any number that we pass to the function `square` as an argument.
- A quick word on parameters and arguments: when you define a function, you write parameters in the function header; when you call a function, you pass arguments into it.

In [6]:
def square(value):
    new_value = value ** 2
    print(new_value)
    
square(4)

16


#### Return values from functions
The function square now accepts a single parameter and prints out its squared value. 
- But what if we don't want to print that value directly and instead we want to return the squared value and assign it to some variable? 
- You can have your function return the new value by adding the return keyword, followed by the value to return. 

Now we can assign the result of the function call to a variable `num`, as you see here.

In [7]:
def square(value):
    new_value = value ** 2
    return new_value
    
num = square(4)
print(num)

16

#### Docstrings
There's another essential aspect of writing functions in Python: `docstrings`. 

**`Docstrings`** are used to **describe what your function does**, such as the computations it performs or its return values. 

- These descriptions serve as documentation for your function so that anyone who reads your function's docstring understands what your function does, without having to trace through all the code in the function definition. 
- Function docstrings are placed on the line immediately after the function header and are enclosed in triple quotation marks. 
- An appropriate docstring for our function `square` is `'Returns the square of a value'`.

In [8]:
def square(value):
    """ Return a square of a value """
    new_value = value ** 2
    return new_value
    
square(4)

16
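
A docstring is stored on the function object itself, so it can be read back without opening the source. The following is a minimal sketch, assuming the docstring text suggested above; it reads the text through the `__doc__` attribute. The built-in `help(square)` displays the same text together with the function signature.

```python
def square(value):
    """Returns the square of a value"""
    new_value = value ** 2
    return new_value

# The docstring is available as an attribute of the function object
print(square.__doc__)
```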

In [9]:
from IPython.display import Image
## Illustration: Image(fig1, height, width)
#Image(r'../input/thinkstatsfigure/Part1_chap4_fig1.jpg', height = 100, width = 483)

### 1.2. PRACTICES
#### Exercise 1.2.1. Strings in Python
In the video, you learned of another standard Python datatype, strings. Recall that these represent textual data. To assign the string `'DataCamp and Kaggle'` to a variable company, you execute:

In [10]:
company = 'DataCamp and Kaggle'

You've also learned to use the operations `+` and `*` with strings. Unlike with numeric types such as ints and floats, 
- the `+` operator **concatenates strings together**, while 
- the `*` **concatenates multiple copies** of a string together. 

In this exercise, you will use the + and * operations on strings to answer the question below. Execute the following code in the shell:

In [11]:
object1 = "data" + "analysis" + "visualization"
object2 = 1 * 3
object3 = "1" * 3

What are the values in object1, object2, and object3, respectively?
>- A. `object1` contains `"data + analysis + visualization"`, `object2` contains `"1*3"`, `object3` contains `13`.
>- B. `object1` contains `"data+analysis+visualization"`, `object2` contains `3`, `object3` contains `"13"`.
>- C. `object1` contains `"dataanalysisvisualization"`, `object2` contains `3`, `object3` contains `"111"`.

#### Answer and explanation.
- The `+` operator concatenates strings, so `object1` is `"dataanalysisvisualization"`.
- The `*` operator concatenates multiple copies of a string, so `object3` is `"111"`.
- Finally, `*` is ordinary multiplication for numeric types, hence `object2` is `3`.

So the correct option is **`C`**.
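
The answer can be checked directly. Here is a minimal sketch that re-runs the three assignments from the exercise and prints each value together with its type name; `repr()` is used so that strings keep their quotes and are easy to tell apart from integers.

```python
object1 = "data" + "analysis" + "visualization"
object2 = 1 * 3
object3 = "1" * 3

# repr() keeps the quotes around strings, so '111' is not confused with 111
for obj in (object1, object2, object3):
    print(repr(obj), type(obj).__name__)
```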

#### Exercise 1.2.2. Recapping built-in functions
Examine the return values of the functions `str` and `print`. A `float` variable `x` has been preloaded for this exercise. Run the code below in the console. Pay close attention to the results to answer the question that follows.

- Assign `str(x)` to a variable `y1`: `y1 = str(x)`
- Assign `print(x)` to a variable `y2`: `y2 = print(x)`
- Check the types of the variables `x`, `y1`, and `y2`.

What are the types of `x`, `y1`, and `y2`?

#### Answer and explanation.
- First, `x` is a float, so `type(x)` is `float`.
- Then, `y1 = str(x)`, so `type(y1)` is `str`.
- Finally, `print()` displays a value but does not return one, so assigning the result of `print(x)` to `y2` leaves `y2` holding `None`, whose type is `NoneType`.
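
The check can be sketched as follows. In the exercise `x` is preloaded; the value `4.89` used here is only an assumed stand-in, and any float would behave the same way.

```python
x = 4.89  # assumed stand-in for the preloaded float

y1 = str(x)     # str() returns a new string object
y2 = print(x)   # print() writes to the screen and returns None

print(type(x))
print(type(y1))
print(type(y2))
```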

#### Exercise 1.2.3. Write a simple function
You will now write your own function!

Define a function, `shout()`, which simply prints out a string with three exclamation marks `'!!!'` at the end. The code for the `square()` function that we wrote earlier is found below. You can use it as a pattern to define shout().

            def square():
                new_value = 4 ** 2
                return new_value

Note that the function body is already indented 4 spaces for you. Function bodies need to be indented by a consistent number of spaces, and the choice of 4 is common.

#### SOLUTION.

In [12]:
# Define the function shout
def shout():
    """Print a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = 'congratulations' + '!!!'

    # Print shout_word
    print(shout_word)

# Call shout
shout()

congratulations!!!


#### Exercise 1.2.4. Single-parameter functions
Congratulations! You have successfully defined and called your own function! That's pretty cool.

In the previous exercise, you defined and called the function `shout()`, which printed out a string concatenated with `'!!!'`. You will now update `shout()` by adding a parameter so that it can accept and process any string argument passed to it. 

Also note that `shout(word)`, the part of the header that specifies the function name and parameter(s), is known as the *signature* of the function. You may encounter this term in the wild!

In [13]:
# Define shout with the parameter, word
def shout(word):
    """Print a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = word + '!!!'

    # Print shout_word
    print(shout_word)

# Call shout with the string 'congratulations'
shout('congratulations')

congratulations!!!


#### Exercise 1.2.5. Functions that return single values
You're getting very good at this! Try your hand at another modification to the `shout()` function so that it now returns a single value instead of printing within the function. 

Recall that the return keyword lets you return values from functions. Parts of the function `shout()`, which you wrote earlier, are shown. 

Returning values is generally more desirable than printing them out because, as you saw earlier, a `print()` call assigned to a variable has type `NoneType`.

#### SOLUTION.

In [14]:
# Define shout with the parameter, word
def shout(word):
    """Return a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = word + '!!!'

    # Replace print with return
    return shout_word

# Pass 'congratulations' to shout: yell
yell = shout('congratulations')

# Print yell
yell

'congratulations!!!'

## 2. Multiple parameters and return values
### 2.1. Theory.
#### Multiple function parameters
Let's tweak the square function we've been working on a little bit more. Suppose that, instead of simply squaring a value, we'd like to raise a value to the power of another value that's also passed to the function. We can do this by having our function accept two parameters instead of just one. 

You should also change your function name AND docstrings to reflect this new behavior. 
>- `raise_to_power` is an appropriate function name. Notice that there are now two parameters in the function header instead of one, `value1` and `value2`. 
>- In the lines after that, the behavior of the overall function was also changed by raising `value1` to the power of `value2`.
>- You can call the function by passing in two arguments because the function has two parameters, as declared in the function header. 
>- The order in which the arguments are passed corresponds to the order of the parameters in the function header. 

In [15]:
def raise_to_power(value1, value2):
    """Raise value1 to power of value2"""
    new_value = value1 ** value2
    return new_value

raise_to_power(2, 3)

8

This means that when we call `raise_to_power(2, 3)`, 2 is assigned to `value1` and 3 to `value2`. 

Looking at the function body, the computation `value1 ** value2` therefore becomes 2 to the power of 3, and the function call returns the value 8.
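
Because arguments are matched to parameters by position, swapping them changes the result. A minimal sketch, reusing the `raise_to_power` function from this section in a shortened form:

```python
def raise_to_power(value1, value2):
    """Raise value1 to power of value2"""
    return value1 ** value2

print(raise_to_power(2, 3))  # 2 ** 3 -> 8
print(raise_to_power(3, 2))  # 3 ** 2 -> 9
```
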
#### A quick jump into tuples
You can also make your function return multiple values. You can do that by constructing objects known as `tuples` in your functions. 

- A tuple is like a list, in that it can contain multiple values. 
- There are some differences, however. 
>- Firstly, unlike a list, a tuple is immutable, that is, you cannot modify the values in a tuple once it has been constructed. 
>- Secondly, while lists are defined using square brackets, tuples are constructed using a set of parentheses.
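
The difference in mutability can be seen in a short sketch. The names `nums_list` and `nums_tuple` are made up for this illustration.

```python
nums_list = [1, 3, 5]
nums_tuple = (1, 3, 5)

# A list can be modified in place
nums_list[0] = 7
print(nums_list)

# The same assignment on a tuple raises a TypeError
try:
    nums_tuple[0] = 7
except TypeError as err:
    print("TypeError:", err)

# The tuple is unchanged
print(nums_tuple)
```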

In [16]:
odd_nums = (1, 3, 5)
print(type(odd_nums))

<class 'tuple'>


####  Unpacking tuples
Here we construct a tuple containing 3 elements. You can also unpack a tuple into several variables in one line. Doing so means that you assign to the variables `a`, `b`, and `c` the tuple values, in the order that they appear in the tuple.

In [17]:
a, b, c = odd_nums
print(a)
print(b)
print(c)

1
3
5
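
Unpacking only works when the number of names on the left matches the number of elements in the tuple. A small sketch of what happens otherwise; the names `first` and `second` are hypothetical.

```python
odd_nums = (1, 3, 5)

# Three names for three elements: works
a, b, c = odd_nums

# Two names for three elements: Python raises a ValueError
try:
    first, second = odd_nums
except ValueError as err:
    print("ValueError:", err)
```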


#### Accessing tuple elements
Additionally, you can also access individual tuple elements like you do with `lists`. Doing this here accesses the `second element` of the `tuple`. Why is that? 

Recall that with `lists`, you can use `zero-indexing` to access list elements. You can do the same thing with `tuples`!

In [18]:
second_num_tup = odd_nums[1]
print(odd_nums[1])
print(second_num_tup)

3
3


#### Returning multiple values
Let's now modify the behavior of the `raise_to_power` function. Instead of returning just the value of `value1` raised to the power of `value2`, let's also return the value of `value2` raised to the power of `value1`. 

You thus need to make the function return two values instead of one. We can use what we now know of tuples to do this! 
>- We first change the name of our function and the docstring to reflect the new behavior of our function. 
>- We then, in the function body, construct a tuple consisting of the values we want the function to return and, also in the function body, we return the tuple! 

In [19]:
def raise_both(value1, value2):
    """Raise value1 to power of value2 and vice versa"""
    
    new_value1 = value1**value2
    new_value2 = value2**value1
    
    new_tuple = (new_value1, new_value2)
    
    return new_tuple

Calling the function constructed demonstrates that it does exactly what we want!

In [20]:
raise_both(2, 3)

(8, 9)

### 2.2. PRACTICES
#### Exercise 2.2.1. Functions with multiple parameters
Hugo discussed the use of multiple parameters in defining functions in the last lecture. You are now going to use what you've learned to modify the `shout()` function further. Here, you will modify `shout()` to accept two arguments. Parts of the function `shout()`, which you wrote earlier, are shown.

In [21]:
# Define shout with parameters word1 and word2
def shout(word1, word2):
    """Concatenate strings with three exclamation marks"""
    # Concatenate word1 with '!!!': shout1
    shout1 = word1 + '!!!'
    
    # Concatenate word2 with '!!!': shout2
    shout2 = word2 + '!!!'
    
    # Concatenate shout1 with shout2: new_shout
    new_shout = shout1 + shout2

    # Return new_shout
    return new_shout

# Pass 'congratulations' and 'you' to shout(): yell
yell = shout('congratulations', 'you')

# Print yell
print(yell)

congratulations!!!you!!!


#### Exercise 2.2.2. A brief introduction to tuples
Alongside learning about functions, you've also learned about tuples! Here, you will practice what you've learned about tuples: how to construct, unpack, and access tuple elements. Recall how Hugo unpacked the tuple even_nums in the video:

            a, b, c = even_nums

A three-element tuple named nums has been preloaded for this exercise. Before completing the script, perform the following:
- Print out the value of nums in the IPython shell. Note the elements in the tuple.
- In the `IPython shell`, try to change the first element of nums to the value 2 by doing an assignment: `nums[0] = 2`.

#### SOLUTION

In [22]:
nums = (3, 4, 6)

# Unpack nums into num1, num2, and num3
num1, num2, num3 = nums

# Construct even_nums
even_nums = (2, num2, num3)

#### Exercise 2.2.3. Functions that return multiple values
In the previous exercise, you constructed tuples, assigned tuples to variables, and unpacked tuples. 

Here you will return multiple values from a function using tuples. Let's now update our `shout()` function to return multiple values. Instead of returning just one string, we will return two strings with the string `'!!!'` concatenated to each.

Note that the return statement `return x, y` has the same result as `return (x, y)`: the former actually packs `x` and `y` into a tuple under the hood!
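
This equivalence is easy to confirm. In the sketch below, the two function names are hypothetical and exist only to compare the two spellings of the return statement.

```python
def with_parentheses(x, y):
    return (x, y)

def without_parentheses(x, y):
    return x, y

result = without_parentheses('a', 'b')

# Both spellings produce a tuple with the same contents
print(type(result))
print(with_parentheses('a', 'b') == result)
```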

#### SOLUTION.

In [23]:
# Define shout_all with parameters word1 and word2
def shout_all(word1, word2):
    """Return a tuple of strings concatenated with '!!!'."""
    # Concatenate word1 with '!!!': shout1
    shout1 = word1 + '!!!'
    
    # Concatenate word2 with '!!!': shout2
    shout2 = word2 + '!!!'
    
    # Construct a tuple with shout1 and shout2: shout_words
    shout_words = (shout1, shout2)

    # Return shout_words
    return shout_words

# Pass 'congratulations' and 'you' to shout_all(): yell1, yell2
yell1, yell2 = shout_all('congratulations', 'you')

# Print yell1 and yell2
print(yell1)
print(yell2)

congratulations!!!
you!!!


## 3. Bringing it all together
### 3.1. Reminder
#### Basic ingredients of a function
- We have a **function header**, which begins with the keyword `def`. This is followed by the function name, the parameters in parentheses, and a colon. 
- We then have the **function body**. It opens with a docstring enclosed in triple quotation marks, which describes what the function does. The rest of the function body performs the computation, and it closes with the keyword `return`, followed by the value or values returned by the function.
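
These ingredients can be labelled on a small function. The sketch below reuses the `shout` function from the earlier exercises, with comments marking each part.

```python
def shout(word):                     # function header: def, name, parameter, colon
    """Return word followed by three exclamation marks"""  # docstring
    shout_word = word + '!!!'        # computation
    return shout_word                # return statement with the returned value

print(shout('congratulations'))
```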

### 3.2. PRACTICES

#### Exercise 3.2.1. Bringing it all together (1)
You've got your first taste of writing your own functions in the previous exercises. You've learned how to add parameters to your own function definitions, return a value or multiple values with tuples, and how to call the functions you've defined.

In this and the following exercise, you will bring together all these concepts and apply them to a simple data science problem. You will load a dataset and develop functionalities to extract simple insights from the data.

For this exercise, your goal is to recall how to load a dataset into a DataFrame. The dataset contains Twitter data and you will iterate over entries in a column to build a dictionary in which the keys are the names of languages and the values are the number of tweets in the given language. The file `tweets.csv` is available in your current directory.

In [24]:
# Import pandas
import pandas as pd

# Import Twitter data as DataFrame: df
df = pd.read_csv('tweets.csv')

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:

    # If the language is in langs_count, add 1 
    if entry in langs_count.keys():
        langs_count[entry] += 1  
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)

{'eng': 96, 'und': 3, 'et': 1}


#### Exercise 3.2.2. Bringing it all together (2)
Great job! You've now defined the functionality for iterating over entries in a column and building a dictionary whose keys are the names of languages and whose values are the number of tweets in each language.

In this exercise, you will define a function with the functionality you developed in the previous exercise, return the resulting dictionary from within the function, and call the function with the appropriate arguments.

For your convenience, the `pandas` package has been imported as `pd` and the `'tweets.csv'` file has been loaded into the `tweets_df` variable.


In [25]:
# Define count_entries()
def count_entries(df, col_name):
    """Return a dictionary with counts of 
    occurrences as value for each key."""

    # Initialize an empty dictionary: langs_count
    langs_count = {}
    
    # Extract column from DataFrame: col
    col = df[col_name]
    
    # Iterate over lang column in DataFrame
    for entry in col:

        # If the language is in langs_count, add 1
        if entry in langs_count.keys():
            langs_count[entry] += 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    # Return the langs_count dictionary
    return langs_count

# Call count_entries(): result
result = count_entries(tweets_df, 'lang')

# Print the result
print(result)

{'eng': 96, 'und': 3, 'et': 1}
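
One way to gain confidence in a counting loop like this is to compare it with `collections.Counter` from the standard library. The sketch below is only an illustration under stated assumptions: it works on a plain list instead of a DataFrame column, the helper name `count_entries_in` is hypothetical, and the sample values are made up rather than taken from `tweets.csv`.

```python
from collections import Counter

def count_entries_in(values):
    """Return a dictionary mapping each distinct value to its count."""
    counts = {}
    for entry in values:
        if entry in counts:
            counts[entry] += 1
        else:
            counts[entry] = 1
    return counts

# Made-up sample standing in for a column of language codes
sample_langs = ['en', 'en', 'et', 'und', 'en']

manual = count_entries_in(sample_langs)
print(manual)

# Counter performs the same tally, so the two results should agree
print(manual == dict(Counter(sample_langs)))
```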
