# Python Data Science Toolbox Part I
Note Taker: [Paris Zhang](https://www.linkedin.com/in/parisyunyuezhang/) on Jul 28, 2020  
Instructor: Hugo Bowne-Anderson  
Course [Link](https://campus.datacamp.com/courses/python-data-science-toolbox-part-1/)

**Course overview**:
1. Chapter 1 - Writing functions
  + tuples and returning multiple values
2. Chapter 2 - Scope and nested functions
  + Keyword `global` and scopes (local, enclosing, global, built-in)
  + Nested functions
  + Flexible arguments
  + Twitter example and generalization
3. Chapter 3 - Lambda functions and error-handling
  + `lambda` functions and `map()`, `filter()`, and `reduce()`
  + Error handling

## Chapter 1 - Writing functions
### 1a. tuples and returning multiple values
`(x, y, z)` creates a `tuple`, which is immutable (its elements can't be reassigned), but we can unpack each element into its own variable with `a, b, c = (x, y, z)`. A function can return multiple values by returning a tuple.
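
As a small illustration of the point above, here is a minimal sketch of a function that returns two values as a tuple, which the caller then unpacks. The function name `min_max` and the sample list are made up for this example and are not from the course.

```python
def min_max(values):
    """Return the smallest and largest value as a tuple (hypothetical helper)."""
    # The comma builds a tuple, so two values are returned at once
    return min(values), max(values)

pair = min_max([3, 1, 4, 1, 5])

# Unpack the tuple into two separate variables
low, high = pair

print(pair)
print(low, high)

# Tuples are immutable: assigning to an element raises a TypeError
try:
    pair[0] = 0
except TypeError as err:
    print("tuples are immutable:", err)
```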

In [4]:
import pandas as pd
tweets_df = pd.read_csv('tweets.csv')

def count_entries(df, col_name):
    """Return a dictionary with counts of 
    occurrences as value for each key."""

    # Initialize an empty dictionary: langs_count
    langs_count = {}
    
    col = df[col_name]
    
    for entry in col:

        # If the language is in langs_count, add 1
        if entry in langs_count:
            langs_count[entry] += 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    return langs_count

result = count_entries(tweets_df, 'lang')

print(result)

{'en': 97, 'et': 1, 'und': 2}


## Chapter 2 - Scope and nested functions
### 2a. Keyword global

* **Scope lookup order**: Python resolves a name by searching the scopes in the following order - LEGB
  + L - Local scope
  + E - Enclosing functions
  + G - Global
  + B - Built-in
* `global` - lets a function rebind a name in the global scope
* `nonlocal` - lets a nested function rebind a name defined in the enclosing function
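
The LEGB order above can be sketched with one name bound at several levels. This is only an illustrative example: the variable `label` and all function names are invented here, not taken from the course.

```python
label = "global"  # G: module-level (global) binding

def outer():
    label = "enclosing"  # E: binding in the enclosing function

    def inner():
        label = "local"  # L: binding in the innermost function
        return label     # the local binding is found first

    def inner_no_local():
        # No local binding here, so the enclosing scope is searched next
        return label

    return inner(), inner_no_local()

def read_global():
    # No local or enclosing binding, so the global one is used
    return label

def use_builtin():
    # len is not defined locally, in an enclosing scope, or globally,
    # so Python falls back to the built-in scope
    return len("abc")

print(outer())
print(read_global())
print(use_builtin())
```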

In [2]:
team = "teen titans"

def change_team():
    """Change the value of the global variable team."""

    # Use team in global scope
    global team
    # Change the value of team in global: team
    team = 'justice league'

print(team)

change_team()
print(team)

teen titans
justice league


### 2b. Nested functions

In [1]:
def echo(n):
    """Return the inner_echo function."""
    
    def inner_echo(word1):
        """Concatenate n copies of word1."""
        echo_word = word1 * n
        return echo_word

    return inner_echo

twice = echo(2)
thrice = echo(3)
print(twice('hello'), thrice('hello'))

hellohello hellohellohello


In [5]:
def echo_shout(word):
    """Change the value of a nonlocal variable"""
    
    # Concatenate word with itself: echo_word
    echo_word = word + word
    print(echo_word)
    
    def shout():
        """Alter a variable in the enclosing scope"""    
        # Use echo_word in nonlocal scope
        nonlocal echo_word 
        # Change echo_word to echo_word concatenated with '!!!'
        echo_word = echo_word + '!!!'
    
    shout()
    
    print(echo_word)

echo_shout('hello')

hellohello
hellohello!!!


### 2c. Flexible arguments
* `*args` - variable-length positional arguments, collected into a `tuple`
* `**kwargs` - variable-length keyword arguments, collected into a `dict`
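
To make the two forms concrete, here is a minimal sketch of a function that accepts both and simply hands back what it received. The function `describe` and its arguments are hypothetical and only serve the illustration.

```python
def describe(*args, **kwargs):
    """Return the positional arguments and keyword arguments as received."""
    # args is a tuple of positional arguments,
    # kwargs is a dict mapping keyword names to values
    return args, kwargs

positional, keyword = describe("luke", "leia", side="rebel", status="active")

print(type(positional).__name__, positional)
print(type(keyword).__name__, keyword)
```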

In [7]:
def gibberish(*args):
    """Concatenate strings in *args together."""

    # Initialize an empty string: hodgepodge
    hodgepodge = ""

    # Concatenate the strings in args
    for word in args:
        hodgepodge += word

    return hodgepodge

one_word = gibberish("luke")
many_words = gibberish("luke", "leia", "han", "obi", "darth")

print(one_word)
print(many_words)

luke
lukeleiahanobidarth


In [8]:
def report_status(**kwargs):
    """Print out the status of a movie character."""

    print("\nBEGIN: REPORT\n")

    # Iterate over the key-value pairs of kwargs
    for key, value in kwargs.items():
        print(key + ": " + value)

    print("\nEND REPORT")

report_status(name="luke", affiliation="jedi", status="missing")
report_status(name="anakin", affiliation="sith lord", status="deceased")


BEGIN: REPORT

name: luke
affiliation: jedi
status: missing

END REPORT

BEGIN: REPORT

name: anakin
affiliation: sith lord
status: deceased

END REPORT


### 2d. Twitter example and generalization

In [9]:
def count_entries(df, *args):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    cols_count = {}
    
    for col_name in args:
    
        col = df[col_name]
    
        for entry in col:
    
            if entry in cols_count:
                cols_count[entry] += 1
            else:
                cols_count[entry] = 1

    return cols_count

result1 = count_entries(tweets_df, 'lang')
result2 = count_entries(tweets_df, 'lang', 'source')

print(result1)
print(result2)

{'en': 97, 'et': 1, 'und': 2}
{'en': 97, 'et': 1, 'und': 2, '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>': 24, '<a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>': 1, '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>': 26, '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>': 33, '<a href="http://www.twitter.com" rel="nofollow">Twitter for BlackBerry</a>': 2, '<a href="http://www.google.com/" rel="nofollow">Google</a>': 2, '<a href="http://twitter.com/#!/download/ipad" rel="nofollow">Twitter for iPad</a>': 6, '<a href="http://linkis.com" rel="nofollow">Linkis.com</a>': 2, '<a href="http://rutracker.org/forum/viewforum.php?f=93" rel="nofollow">newzlasz</a>': 2, '<a href="http://ifttt.com" rel="nofollow">IFTTT</a>': 1, '<a href="http://www.myplume.com/" rel="nofollow">Plume\xa0for\xa0Android</a>': 1}


## Chapter 3 - Lambda functions and error-handling
### 3a. Lambda functions
1. `map()` applies a function, often a `lambda`, to every element of an iterable
2. `filter()` keeps only the elements of an iterable that satisfy a condition
3. `reduce()` (from `functools`) combines the elements of an iterable step by step and, unlike `map()` and `filter()`, returns a single value as a result
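
Before the course examples, a minimal numeric sketch may help show what each of the three functions returns. The names `raise_to_power` and `nums` are assumptions chosen for this illustration.

```python
from functools import reduce

# A lambda is an anonymous function; here it is bound to a name for reuse
raise_to_power = lambda x, y: x ** y
print(raise_to_power(2, 3))

nums = [1, 2, 3, 4, 5]

# map() and filter() return lazy iterators in Python 3, so wrap them in list()
squares = list(map(lambda n: n ** 2, nums))
evens = list(filter(lambda n: n % 2 == 0, nums))

# reduce() folds the list into a single value (here, the sum)
total = reduce(lambda acc, n: acc + n, nums)

print(squares)
print(evens)
print(total)
```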

In [10]:
spells = ["protego", "accio", "expecto patronum", "legilimens"]
# Use map() to apply a lambda function over spells: shout_spells
shout_spells = map(lambda item: item + "!!!", spells)
# Convert shout_spells to a list: shout_spells_list
shout_spells_list = list(shout_spells)

print(shout_spells_list)

['protego!!!', 'accio!!!', 'expecto patronum!!!', 'legilimens!!!']


In [11]:
fellowship = ['frodo', 'samwise', 'merry', 'pippin', 'aragorn', 'boromir', 'legolas', 'gimli', 'gandalf']

# Use filter() to apply a lambda function over fellowship: result
result = filter(lambda member: len(member) > 6, fellowship)
result_list = list(result)
print(result_list)

['samwise', 'aragorn', 'boromir', 'legolas', 'gandalf']


In [12]:
# Import reduce from functools
from functools import reduce

stark = ['robb', 'sansa', 'arya', 'brandon', 'rickon']
# Use reduce() to apply a lambda function over stark: result
result = reduce(lambda item1, item2: item1 + item2, stark)
print(result)

robbsansaaryabrandonrickon


### 3b. Error handling
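1. `try`-`except` - catch an error and handle it instead of letting the program crash
2. `raise` - raise an error explicitly when an input is invalid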

In [16]:
def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""

    echo_word = ""
    shout_words = ""

    # Add exception handling with try-except
    try:
        echo_word = word1 * echo

        shout_words = echo_word + "!!!"
    except TypeError:
        print("word1 must be a string and echo must be an integer.")

    return shout_words

shout_echo("particle", echo="accelerator")

word1 must be a string and echo must be an integer.


''

In [19]:
def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""

    # Raise an error with raise
    if echo < 0:
        raise ValueError('echo must be greater than or equal to 0')

    echo_word = word1 * echo
    shout_word = echo_word + '!!!'

    return shout_word

shout_echo("particle", echo=5)

'particleparticleparticleparticleparticle!!!'
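
The two techniques can also be combined: one function raises the error and its caller catches it. The sketch below restates `shout_echo` in condensed form so that it runs on its own. The wrapper `safe_shout` is a hypothetical helper added for illustration, not part of the course.

```python
def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three exclamation marks."""
    # Reject negative repeat counts explicitly
    if echo < 0:
        raise ValueError('echo must be greater than or equal to 0')
    return word1 * echo + '!!!'

def safe_shout(word1, echo=1):
    """Call shout_echo and turn a ValueError into a message (hypothetical)."""
    try:
        return shout_echo(word1, echo)
    except ValueError as err:
        # The message passed to ValueError is available via str(err)
        return "error: " + str(err)

print(safe_shout("particle", echo=2))
print(safe_shout("particle", echo=-1))
```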

In [20]:
def count_entries(df, col_name='lang'):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    cols_count = {}

    try:
        col = df[col_name]
        
        for entry in col:
    
            if entry in cols_count:
                cols_count[entry] += 1
            else:
                cols_count[entry] = 1
    
        return cols_count

    # Catch only the missing-column error instead of every exception
    except KeyError:
        print('The DataFrame does not have a ' + col_name + ' column.')

result1 = count_entries(tweets_df, 'lang')
print(result1)

{'en': 97, 'et': 1, 'und': 2}


In [21]:
def count_entries(df, col_name='lang'):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    if col_name not in df.columns:
        raise ValueError('The DataFrame does not have a ' + col_name + ' column.')

    cols_count = {}
    col = df[col_name]
    
    for entry in col:

        if entry in cols_count:
            cols_count[entry] += 1
        else:
            cols_count[entry] = 1
        
    return cols_count

result1 = count_entries(tweets_df, 'lang')
print(result1)

{'en': 97, 'et': 1, 'und': 2}
