# Warmup 🔥

**Task:** Take the lyrics we collected and write a loop so that all lines of text are in lowercase and consist only of alphanumerical characters (without punctuation).

In [4]:
lyrics = ['all along the western front people line up to receive',
          "They told him don't you ever come around here",
          "do what you feel now electric feel now do what you feel now electric feel",
          "Don't want to see your face, you better disappear",
          "Beat it, beat it, beat it!"]

In [None]:
lyrics_second = [....]

In [8]:
import re

In [9]:
lyrics_clean = []
for line in lyrics: 
    line_lower = line.lower()
    line_clean = re.sub(r"[^\w\s]", "", line_lower)  # raw string avoids invalid-escape warnings
    lyrics_clean.append(line_clean) 

In [10]:
lyrics_clean

['all along the western front people line up to receive',
 'they told him dont you ever come around here',
 'do what you feel now electric feel now do what you feel now electric feel',
 'dont want to see your face you better disappear',
 'beat it beat it beat it']

In [None]:
# list comprehensions

In [12]:
lyrics_clean2 = [re.sub(r"[^\w\s]", "", line).lower() for line in lyrics]

In [13]:
lyrics_clean2

['all along the western front people line up to receive',
 'they told him dont you ever come around here',
 'do what you feel now electric feel now do what you feel now electric feel',
 'dont want to see your face you better disappear',
 'beat it beat it beat it']

# Python Functions

0. Why should we use them?
1. How to write them
2. How to make them interact
3. Parameters
4. return-values and scope
5. Checklist

## 0. Why should we use them? 🧐
- improve readability
- reusability and abstraction: DRY
- divide complex problems into smaller ones
- testability and easier bugfixes

## 1. How to write functions 📝

- A function should have a single purpose
- Start with the `def` keyword
- Follow it with the function name in lowercase with underscores (e.g. `clean_text`)
- Then add parentheses `()` and a colon `:`
- Parameters of the function are defined inside the `()`
- After the `:` the code of the function must be indented
- The indented part, the function body, should always start with a docstring describing what the function does
- The body can contain one or more `return` statements, but they are not mandatory

In [None]:
# Skeleton function
def clean_text(text):
    """This function transforms the input data into lowercase and strips it of punctuation.
    
    Parameters:
    -------------
    text: list of strings
    
    Returns:
    ------------
    A list with the transformed/cleaned texts.
    """
    # instructions go here
    cleaned_text = ...  # `...` is a placeholder (like `pass` as a statement): nothing happens yet
    return cleaned_text  # optional

In [24]:
def clean_text(text):
    """This function transforms the input data into lowercase and strips it of punctuation.
    
    Parameters:
    -------------
    text: list of strings
    
    Returns:
    ------------
    A list with the transformed/cleaned texts.
    """
    text_clean = []
    for line in text: 
        line_lower = line.lower()
        # line_clean = re.sub(r"[^\w\s]", "", line_lower)
        line_clean = only_alphanumeric(line_lower)  # helper function defined in section 2
        text_clean.append(line_clean) 
    
    return text_clean

In [25]:
cleaned = clean_text(lyrics)

In [26]:
cleaned

['all along the western front people line up to receive',
 'they told him dont you ever come around here',
 'do what you feel now electric feel now do what you feel now electric feel',
 'dont want to see your face you better disappear',
 'beat it beat it beat it']

In [21]:
cleaned2 = clean_text(['What a nice Afternoon!!#?'])

In [22]:
cleaned2

['what a nice afternoon']

In [None]:
clean_text()  # raises a TypeError: the required argument `text` is missing

In [18]:
help(clean_text)

Help on function clean_text in module __main__:

clean_text(text)
    This function transforms the input data into lowercase and strips it of punctuation.
    
    Parameters:
    -------------
    text: list of strings
    
    Returns:
    ------------
    A list with the transformed/cleaned texts.



## 2. How to make functions interact with each other 🔄

In [23]:
def only_alphanumeric(text):
    """Remove every character that is neither a word character nor whitespace."""
    line_clean = re.sub(r"[^\w\s]", "", text)
    return line_clean

In [27]:
# Recursive functions
def recursive(text):
    """Print text without its last character, then repeat until nothing is left."""
    new_text = text[:-1]
    if len(text) > 0: 
        print(new_text)
        recursive(new_text)

In [28]:
recursive('Hello there')

Hello ther
Hello the
Hello th
Hello t
Hello 
Hello
Hell
Hel
He
H



In [29]:
recursive()

TypeError: recursive() missing 1 required positional argument: 'text'
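
A recursive function needs a base case that stops the chain of calls; in `recursive` above that role is played by the `len(text) > 0` check. The following is a minimal sketch of the same idea with an explicit base case and a return value. The name `count_characters` is hypothetical and only used for illustration.

```python
def count_characters(text):
    """Count the characters in text recursively (for illustration only)."""
    if text == "":  # base case: nothing left, stop recursing
        return 0
    # recursive case: one character plus the count of the shorter string
    return 1 + count_characters(text[:-1])


print(count_characters("Hello there"))  # 11
```

In practice `len(text)` does the same job; the point here is only the shape of base case plus recursive call.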

## 3. How to use different Parameters 🖐

- Different kinds of parameters:


    - Required parameters
    - Optional parameters 

    - Positional parameters 
    - Keyword parameters
    
    - `*args`: A tuple of variable length
    - `**kwargs`: A dictionary of variable length

### 3.1. REQUIRED Parameters

In [36]:
def create_recipe(title, description):
    """Print the recipe title in upper case followed by its description."""
    
    print(f"Recipe for: {title.upper()}")
    print(description)

In [38]:
soup_desc = 'A wonderful dish to eat when you are hungry from all the scarping.'

In [42]:
create_recipe('Beautiful Soup', soup_desc)   # order is important
# positional argument & required

Recipe for: BEAUTIFUL SOUP
A wonderful dish to eat when you are hungry from all the scarping.


In [43]:
create_recipe(description=soup_desc, title='Beautiful Soup')
# keyword arguments & required

Recipe for: BEAUTIFUL SOUP
A wonderful dish to eat when you are hungry from all the scarping.


In [46]:
create_recipe('Beautiful Soup', description=soup_desc)  # all positional arguments come first

Recipe for: BEAUTIFUL SOUP
A wonderful dish to eat when you are hungry from all the scarping.


### 3.2. OPTIONAL Parameters

In [47]:
def create_recipe(title, description, no_people=4):
    """Print the recipe title, its description and the number of people it serves."""
    
    print(f"Recipe for: {title.upper()}")
    print(description)
    print(f"This dish serves {no_people} hungry programmers.")

In [49]:
create_recipe('Beautiful Soup', soup_desc, 8)

Recipe for: BEAUTIFUL SOUP
A wonderful dish to eat when you are hungry from all the scarping.
This dish serves 8 hungry programmers.


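Don't use mutable objects as default values: the default is created once, when the function is defined, and is then shared between all calls.  
Also be aware that if a mutable argument is changed inside the function, the change is visible in the variable outside the function as well.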

In [None]:
# mutable datatypes: list, dict, set, pd.DataFrame
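
The warning about mutable defaults is easiest to see in a small experiment. This is a sketch under assumed names (`add_ingredient_buggy` and `add_ingredient` are hypothetical helpers, not part of the recipe code): the first version shares one default list across calls, the second uses `None` as a sentinel and creates a fresh list each time.

```python
def add_ingredient_buggy(item, ingredients=[]):
    """Append item to ingredients; the default list is shared between calls."""
    ingredients.append(item)
    return ingredients


def add_ingredient(item, ingredients=None):
    """Append item to a fresh list unless the caller passes one in."""
    if ingredients is None:
        ingredients = []  # new list on every call
    ingredients.append(item)
    return ingredients


first = add_ingredient_buggy("onion")
second = add_ingredient_buggy("broth")  # same list object as `first`
print(second)                           # ['onion', 'broth']

print(add_ingredient("onion"))          # ['onion']
print(add_ingredient("broth"))          # ['broth']

pantry = ["salt"]
add_ingredient("pepper", pantry)        # mutates the caller's list in place
print(pantry)                           # ['salt', 'pepper']
```

The last call also illustrates the second point: the function changed `pantry` even though nothing was assigned to it outside the function.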

### 3.3. ARGS and KWARGS

In [50]:
# (un-) packing operator
items = [1, 2, 3]

In [51]:
print(items)

[1, 2, 3]


In [52]:
print(*items)

1 2 3


In [54]:
# same as:
print(items[0], items[1], items[2])

1 2 3


In [70]:
def create_recipe(title, description, *args, no_people=4):
    """Print the recipe; extra positional arguments are collected in args as ingredients."""
    
    print(f"Recipe for: {title.upper()}")
    print(description)
    print(f"This dish serves {no_people} hungry programmers.")
    print(f"And you will need these ingredients: {args}")

In [71]:
create_recipe('Beautiful Soup', soup_desc, "onion", "broth")

Recipe for: BEAUTIFUL SOUP
A wonderful dish to eat when you are hungry from all the scarping.
This dish serves 4 hungry programmers.
And you will need these ingredients: ('onion', 'broth')


In [72]:
def create_recipe(title, description, no_people=4, **kwargs):
    """Print the recipe; extra keyword arguments are collected in kwargs as ingredients."""
    
    print(f"Recipe for: {title.upper()}")
    print(description)
    print(f"This dish serves {no_people} hungry programmers.")
    print(f"And you will need these ingredients: {kwargs}")

In [73]:
create_recipe('Beautiful Soup', soup_desc, veggie="onion", base="broth")

Recipe for: BEAUTIFUL SOUP
A wonderful dish to eat when you are hungry from all the scarping.
This dish serves 4 hungry programmers.
And you will need these ingredients: {'veggie': 'onion', 'base': 'broth'}
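
`*args` and `**kwargs` can be combined in one signature, and the same `*` and `**` operators unpack a list or a dictionary at the call site. A minimal sketch, assuming a hypothetical function `describe_dish` that returns a string instead of printing:

```python
def describe_dish(title, *ingredients, no_people=4, **extras):
    """Return a one-line summary built from positional and keyword extras."""
    parts = [f"{title.upper()} for {no_people}"]
    if ingredients:  # tuple of all extra positional arguments
        parts.append("ingredients: " + ", ".join(ingredients))
    for key, value in extras.items():  # dict of all extra keyword arguments
        parts.append(f"{key}={value}")
    return " | ".join(parts)


shopping_list = ["onion", "broth"]
options = {"no_people": 2, "spicy": True}

# * unpacks the list into positional arguments, ** the dict into keyword arguments
summary = describe_dish("Beautiful Soup", *shopping_list, **options)
print(summary)  # BEAUTIFUL SOUP for 2 | ingredients: onion, broth | spicy=True
```

Because `no_people` comes after `*ingredients`, it can only be passed by keyword; `spicy` matches no named parameter and therefore ends up in `extras`.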


## 4. Return values and what is scope 🌎

- A function in Python always returns something: if you do not tell it what to return, it returns `None`
- Usually a function has at least one `return` statement
- Names assigned inside a function are local to it; names defined outside (like `var` below) are global
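
The difference between printing and returning is a common source of confusion, so here is a small sketch. Both function names (`greet`, `make_greeting`) are made up for this example.

```python
def greet(name):
    """Print a greeting; there is no return statement."""
    print(f"Hello {name}")


def make_greeting(name):
    """Return the greeting instead of printing it."""
    return f"Hello {name}"


printed = greet("Ada")            # prints "Hello Ada", but returns None
returned = make_greeting("Ada")   # prints nothing, returns the string

print(printed)    # None
print(returned)   # Hello Ada
```

Only the returned value can be stored and reused by other code.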

In [88]:
var = "I live outside the function   -> global"

In [91]:
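def add_up(range_number, variab):
    """Add 0.673859 to a total range_number times and return (total, range_number).

    variab is accepted but never used inside the function.
    """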
    
    result = 0  # make sure that each time the function is called you start at 0.
    for _ in range(range_number):
        result += 0.673859
    return result, range_number

In [93]:
print(add_up(5, var))

(3.369295, 5)


In [82]:
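result2 = add_up(5)  # executed (cell 82) before add_up was redefined in cell 91, when it took one argument and returned only the sum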

In [83]:
result2

3.369295

In [95]:
res3 = add_up(5, 3)
res3

(3.369295, 5)
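
When a function returns several values, Python packs them into a tuple that can be unpacked into separate names. The names used inside the function stay local. A minimal sketch with a hypothetical helper `min_and_max`:

```python
def min_and_max(numbers):
    """Return the smallest and the largest value as a tuple."""
    lowest = min(numbers)    # local name: exists only inside the function
    highest = max(numbers)
    return lowest, highest   # packed into one tuple


low, high = min_and_max([3, 1, 4])  # unpack the tuple into two names
print(low, high)                    # 1 4

try:
    print(lowest)                   # the local name is not visible out here
except NameError as error:
    print(f"NameError: {error}")    # NameError: name 'lowest' is not defined
```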

In [96]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

In [97]:
songs = [
    'Under my umbrella ella ella ella' ,
    'Shine bright like a diamond!',
    'Love the way you lie',
    'There is a house in New Orleans, they called the rising sun',
    'Fire on the sun',
    'Please dont let be misunderstood']
labels = ['rihanna']*3 + ['the animals']*3

In [98]:
def train_model(text, artists, stop_words='english', **kwargs):
    """
    This function preprocesses the input-data with a CountVectorizer and 
    a tfidf-transformer and trains a Naive Bayes model.
    
    Parameters
    ----------
    text : array_like
        List of documents/songs as strings.
        
    artists : array_like
        List of labels/artists as strings.
    
    stop_words : str
        String that specifies the language of the stopwords for
        the CountVectorizer.

    **kwargs
        Arbitrary keyword arguments passed as hyperparameters to MultinomialNB.

    Returns
    -------
    A fitted sklearn pipeline (CountVectorizer, TfidfTransformer, MultinomialNB).
    """
    
    cv = CountVectorizer(stop_words=stop_words)
    tf = TfidfTransformer()
    m = MultinomialNB(**kwargs)
    pipeline = make_pipeline(cv, tf, m)
    pipeline.fit(text, artists)
    return pipeline

In [99]:
m = train_model(songs, labels, alpha=3.0)

In [None]:
m.predict....

# What you can do with this knowledge: 💡
- Check out the Recap-Exercise in the Course Material if you'd like to practise functions in general
- When you have some working code for the project, try putting it into functions:

### Checklist for functions: ✅

- Does it do one thing only?
- Does it have a docstring?
- Does it have a good name? (action or verb, lowercase)
- Does it have good variable names?
- Are no global variables referenced and all values passed as parameters? -> Abstract away all hardcoded values
- Have you tested it with different inputs?


You can combine multiple functions in one function to rule them all 🧝‍♂️
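
As a sketch of that idea, small single-purpose functions can be chained inside one top-level function. `only_alphanumeric` restates the helper from section 2; `to_lowercase`, `clean_line` and `clean_lyrics` are hypothetical names chosen for this example.

```python
import re


def to_lowercase(line):
    """Return the line in lowercase."""
    return line.lower()


def only_alphanumeric(line):
    """Remove every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", line)


def clean_line(line):
    """Apply both cleaning steps to a single line."""
    return only_alphanumeric(to_lowercase(line))


def clean_lyrics(lines):
    """One function to rule them all: clean every line in a list."""
    return [clean_line(line) for line in lines]


print(clean_lyrics(["Beat it, beat it, beat it!"]))  # ['beat it beat it beat it']
```

Each piece can be tested on its own, and a step can be swapped out without touching the others.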