------
# Special Functions
------

In our first lecture, we saw how functions work in Python. Today, we'll go over some special built-in functions and modules provided by the Python developers that are very useful to any data scientist.

However, before we venture into these special functions, I wanted to take a detour and talk about **Object Oriented Programming (OOP)**.

**What is OOP?**
Simply put, everything in Python is an "object" and "classes" are a blueprint of objects. So when we write:

In [47]:
a = 2
b = "hello!"
c = [1,2,3]

Here we are creating an object `a` of class `int` holding the value `2`, an object `b` of class `str` holding the value `"hello!"`, and an object `c` of class `list` holding the values `1`, `2` and `3`. Every class of objects comes with a set of predefined "procedures" (better known as functions, or methods when they belong to a class) that we can apply to the objects.
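To make this concrete, here is a small sketch (not one of the original lecture cells) that uses the built-in `type()` function to inspect the class of each object and then calls one method on each. The methods `bit_length()`, `upper()` and `append()` are only illustrative picks from what each class offers.

```python
a = 2
b = "hello!"
c = [1, 2, 3]

# type() reports the class that each object was created from
print(type(a), type(b), type(c))

# each class brings its own methods, called with the dot notation
print(a.bit_length())   # number of bits needed to represent the int
print(b.upper())        # a new, upper-cased copy of the string
c.append(4)             # lists can be modified in place
print(c)
```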

Some objects are built into Python, while others need to be "imported" (along with their functions) from existing Python modules or libraries before we can access and use them.

With that knowledge, let me introduce you to our first imported modules.

## Dates and Times

A lot of analysis you do might relate to dates and times. For instance, finding the average number of sales over a given period, selecting a list of products to data mine if they were purchased in a given period, or trying to find the period with the most activity in online discussion forum systems.

While we won't delve too deeply into time series analysis during this course, you still should be aware of the different ways in which Python stores dates and times. One of the most common legacy methods for storing the date and time in online transaction systems is based on the offset from the epoch, which is January 1, 1970. There's a lot of historical cruft around this, but it's not uncommon to see systems storing the date of a transaction in seconds or milliseconds since this date. So if you see large numbers where you expect to see a date and time, you'll need to convert them to make sense of the data.

In Python, you can get the current time since the epoch using the time module. The `.time()` function in it returns *the current time in seconds* since the epoch (January 1st, 1970).

In [48]:
# we import a library using the `import` keyword followed by an `as` to create an abbreviation
import time as tm

tm.time()

1693909319.626261

This isn't very readable. We need to convert this floating-point number of seconds into something easier to comprehend, like a **`datetime`** object.

The `datetime` module supplies classes to work with date and time. These classes provide a number of functions to deal with dates, times and time intervals.

Using this module, we can convert a "**timestamp**" (a number of seconds since the epoch) with the `.fromtimestamp()` function, which returns a `datetime` object allowing us to distinguish year, month, day, and so forth.

In [49]:
import datetime as dt

dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow

datetime.datetime(2023, 9, 5, 14, 21, 59, 629640)

In [50]:
#you can also use the now() function
dt.datetime.now()

datetime.datetime(2023, 9, 5, 14, 21, 59, 632831)
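The same conversion applies to stored values, not just the current time. The sketch below assumes two hypothetical values, one in seconds and one in milliseconds, as they might appear in a transactions table. It passes `tz=dt.timezone.utc` (an addition not used in the cells above) so the result does not depend on the time zone of the machine running it.

```python
import datetime as dt

# hypothetical values as they might appear in a transactions table
seconds_since_epoch = 1693909319
millis_since_epoch = 1693909319626

# seconds can be converted directly; passing tz avoids depending on the local time zone
from_seconds = dt.datetime.fromtimestamp(seconds_since_epoch, tz=dt.timezone.utc)

# milliseconds are first split into whole seconds and the leftover milliseconds
whole_seconds, leftover_ms = divmod(millis_since_epoch, 1000)
from_millis = dt.datetime.fromtimestamp(whole_seconds, tz=dt.timezone.utc).replace(
    microsecond=leftover_ms * 1000
)

print(from_seconds)   # 2023-09-05 10:21:59+00:00
print(from_millis)    # 2023-09-05 10:21:59.626000+00:00
```

A quick rule of thumb: a present-day timestamp in seconds has about 10 digits, while one in milliseconds has about 13.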

The `datetime` object also has handy attributes to get the corresponding year, month, day, hour, etc.

In [51]:
# get year, month, day, etc. from a datetime
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second, dtnow.microsecond

(2023, 9, 5, 14, 21, 59, 629640)

Note that `.now()` is a function, while `.year` is an attribute. Can you tell me what the difference is?
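As a hint, the sketch below contrasts the two on a fixed, made-up `datetime` value. `weekday()` and `isoformat()` are `datetime` methods that were not used above; they are included only for illustration.

```python
import datetime as dt

moment = dt.datetime(2023, 9, 5, 14, 21, 59)

# attribute: a stored value, read without parentheses
print(moment.year)         # 2023

# method: a function attached to the object, called with parentheses
print(moment.weekday())    # 1 (Monday is 0, so this is a Tuesday)
print(moment.isoformat())  # 2023-09-05T14:21:59

# without parentheses, a method is only referenced, not executed
print(callable(moment.weekday))  # True
```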

`datetime` objects also allow for simple math, using "time deltas". A time delta is a duration expressing the difference between two date or time values.

For instance, here, we can create a `timedelta` of 90 days...

In [52]:
delta = dt.timedelta(days = 90)
delta

datetime.timedelta(days=90)

... then do subtraction and comparisons with a `date` object, say today.

In [53]:
# datetime.date.today() returns the current local date
today = dt.date.today()
print(today)

print(today-delta)

2023-09-05
2023-06-07
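The arithmetic also works in the other direction: subtracting one date from another yields a `timedelta`. A minimal sketch, using the two dates printed above as fixed values so that the result is reproducible:

```python
import datetime as dt

start = dt.date(2023, 6, 7)
end = dt.date(2023, 9, 5)

# date - date gives a timedelta
gap = end - start
print(gap)                 # 90 days, 0:00:00
print(gap.days)            # 90

# date + timedelta gives a date again
print(start + gap == end)  # True
```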


`date` and `datetime` objects can also be compared directly, to know which of two dates is more recent.

In [54]:
today > today-delta #compare dates

True

Being able to manipulate dates easily is very useful in data science, and the `datetime` module is one of the main perks offered by Python to data scientists.

For example, date arithmetic is commonly used for creating sliding windows: you might want to look for any five-day span of time where sales were highest, and flag that for follow-ups.
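A sliding window of this kind could be sketched as follows. The sales figures and the start date are made up for illustration, and the loop is a deliberately simple approach rather than an optimized one.

```python
import datetime as dt

# hypothetical daily sales figures, one value per consecutive day
start = dt.date(2023, 9, 1)
sales = [120, 95, 130, 160, 90, 210, 180, 75, 60, 140]

window = dt.timedelta(days=5)
best_start, best_total = None, -1

# slide the five-day window forward one day at a time
for offset in range(len(sales) - window.days + 1):
    total = sum(sales[offset:offset + window.days])
    if total > best_total:
        best_start = start + dt.timedelta(days=offset)
        best_total = total

# the window covers five days, so the last day is start + 4 days
best_end = best_start + window - dt.timedelta(days=1)
print(best_start, best_end, best_total)  # 2023-09-03 2023-09-07 770
```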

This is just a glimpse at dates and times in Python. Later in the course, we're going to investigate dates and times a bit more using the date and time tools in the pandas library.

## map()

The `map` built-in function in Python has a special feature: unlike most functions, it takes a *function* as its first parameter, followed by as many "iterable" parameters as needed.

The map function signature looks like this:

`map(function, <iterable_parameter1>, <iterable_parameter2>, ... )`

The first parameter is the function that you want executed, and the second parameter, and every following parameter, is something which can be iterated upon. At each step, `map` takes one item from every iterable and passes them together as arguments to the given function.

I know this sounds a little cryptic, so let's take a second and look at an example. Imagine we have two lists of numbers indicating prices from two different stores on exactly the same items. Say we wanted to find the minimum that we would have to pay if we bought the cheaper item between the two stores.

To do this, we could iterate through each list, comparing items and choosing the cheapest. However, with `map()`, we can do this comparison in a single statement!

In [55]:
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)
cheapest

<map at 0x103e55b40>

Now, when we go to display the map, we get an odd reference value instead of the list of items we were expecting. This is called *lazy evaluation*, which means that a value is computed when it is needed and not when the object is created.

In Python, the `map()` function returns a "*map object*". In other words, it doesn't actually run the function `min()` on any pair of items until you ask it for a value. This is an interesting design pattern of the language, and it's commonly used when dealing with big data. It keeps memory use low, because results are produced one at a time instead of being stored all at once, even though it might add a slight computational overhead.

So, how do we see the values inside the `map` object? Well, maps are iterable, just like lists and tuples, so we can use a for-loop to look at all of the values in the map.

In [56]:
for item in cheapest:
    print(item)

9.0
11.0
12.34
2.01
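The lazy behaviour can be observed directly. In the sketch below, `logged_min` is a hypothetical helper (not part of the lecture) that behaves like `min()` but records each call. It also shows that a map object can only be walked through once.

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

calls = []  # records every pair that actually gets compared

def logged_min(x, y):
    # hypothetical helper: behaves like min() but notes that it was called
    calls.append((x, y))
    return min(x, y)

cheapest = map(logged_min, store1, store2)
print(len(calls))        # 0: nothing has been computed yet

first_pass = list(cheapest)
print(first_pass)        # [9.0, 11.0, 12.34, 2.01]
print(len(calls))        # 4: one call per pair of prices

second_pass = list(cheapest)
print(second_pass)       # []: the map object is already used up
```

If the values are needed more than once, converting the map to a list with `list()` right away is the simplest option.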


## Lambda

`lambda` is a Python keyword (not a built-in function) that you'll see more and more as you spend time with Python and data science. Lambdas are Python's way of creating "anonymous" functions. These are the same as any other functions, but they have no name. The intent is that they're simple or short lived, and it's easier just to write out the function in one line instead of going to the trouble of creating a named function.

The `lambda` syntax is fairly simple:

`lambda <parameter1>, <parameter2>, ... : <single_expression>`

It might, however, take a bit of time to get used to. You declare a lambda function with the word `lambda` followed by a list of arguments, followed by a colon and then a single expression. Note that there's only *one* expression to be evaluated in a lambda.

Note also that a `lambda` expression *returns a function reference*.

Let's look at an example. Here's a `lambda` that takes in three parameters and adds the first two. We assign the resulting function to a variable, `my_function`.

In [57]:
my_function = lambda a, b, c : a + b

my_function

<function __main__.<lambda>(a, b, c)>

In order to see the result of the function, we call `my_function` as we would any function we create, and pass it three parameters.

In [58]:
my_function(1, 2, 3)

3
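For comparison, here is a sketch of the same logic written as a regular named function. The name `my_named_function` is made up for this illustration.

```python
my_function = lambda a, b, c: a + b

def my_named_function(a, b, c):
    # same parameters and same single expression as the lambda
    return a + b

print(my_function(1, 2, 3))        # 3
print(my_named_function(1, 2, 3))  # 3

# the only visible difference is the name the function carries
print(my_function.__name__)        # <lambda>
print(my_named_function.__name__)  # my_named_function
```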

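Note that you can't have complex logic inside of the lambda itself, because you're limited to a single expression: statements such as assignments or loops are not allowed. (Lambda parameters can, however, have default values, just like the parameters of a regular function.)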

This makes lambdas much more limited than full function definitions. However, they're still very useful for simple little data cleaning tasks, and you'll see lots of examples of them on the web (and in our course). As such, it's extremely important that you be able to read and write lambdas.
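Lambdas are most often written inline, as an argument to another function. The sketch below passes one to `map()` and one to the built-in `sorted()` through its `key` parameter. `sorted()` has not been covered in this lecture and is used here only as a typical example.

```python
names = ['Bedoor', 'Susan', 'Ahmed', 'Bayan', 'Sarah', 'Barbara']

# a throwaway function passed straight to map()
tripled = list(map(lambda x: x * 3, range(4)))
print(tripled)    # [0, 3, 6, 9]

# a throwaway function telling sorted() what to sort by (here: name length)
by_length = sorted(names, key=lambda name: len(name))
print(by_length)  # ['Susan', 'Ahmed', 'Bayan', 'Sarah', 'Bedoor', 'Barbara']
```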

## List Comprehensions

Finally, I want to take a step back and revisit the collections we've seen in Python. That is to say, tuples, lists, dictionaries and so forth. These are structures that we can iterate over, and often we create them through loops or by reading data from a file.

Python has built-in support for creating these collections using a more abbreviated syntax called "**list comprehensions**".

Like lists, list comprehensions are surrounded by brackets `[ ]`, but instead of having a sequence of data inside, you enter an expression followed by a `for` clause and, optionally, `if` clauses.

To better understand it, let's look at an example. Let's write up a little for-loop iterating over the numbers from 0 to 9, and appending to a list the value of each number multiplied by 3.

In [59]:
S = []

for x in range(10):
    S.append(x * 3)

S

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]

Python allows us to rewrite this as a "list comprehension" by putting the iteration on one line. We start the list comprehension with the value we want in the list. In this case, it's the number multiplied by 3. Then we follow it with the for-loop, and finally any condition clauses.

In [60]:
S = [x*3 for x in range(10)]

S

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]

As you can see, this is much more *compact* and it tends to be *faster* as well.
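The speed claim can be checked with the standard `timeit` module. This is a rough sketch: the list size and repeat count are arbitrary choices, and the exact timings will vary from machine to machine, so no expected output is shown.

```python
import timeit

def with_loop():
    S = []
    for x in range(1000):
        S.append(x * 3)
    return S

def with_comprehension():
    return [x * 3 for x in range(1000)]

# run each version 2000 times and report the total time in seconds
loop_seconds = timeit.timeit(with_loop, number=2000)
comprehension_seconds = timeit.timeit(with_comprehension, number=2000)

print(f"for-loop:      {loop_seconds:.3f} s")
print(f"comprehension: {comprehension_seconds:.3f} s")
```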

Let's try another example, with a condition clause. What if we want only the even numbers in `S`?

Using a for-loop, we can iterate through the list we just generated and use the modulus operator (`%`) to check whether the remainder of dividing the number by two (i.e. `x % 2`) is `0`. If it is, then we know `x` is even and should be added to our list.

In [61]:
M = []

for x in S:
    if x % 2 == 0:
        M.append(x)

M

[0, 6, 12, 18, 24]

We can rewrite this as a list comprehension by putting the iteration on one line. Again, we start the list comprehension with the value we want in the list. In this case, it's the number itself. Then we follow it with the for-loop, and finally we add any condition clauses.

In [62]:
M = [x for x in S if x % 2 == 0]

M

[0, 6, 12, 18, 24]
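A few variations on the same syntax are sketched below. They go slightly beyond the cells above: an `if ... else` placed before the `for` chooses a value for every item instead of filtering, and curly braces build a dictionary or a set instead of a list.

```python
S = [x * 3 for x in range(10)]

# `if` after the for-loop filters items out
evens = [x for x in S if x % 2 == 0]

# `if ... else` before the for-loop picks a value for every item
labels = ["even" if x % 2 == 0 else "odd" for x in S]

# braces build a dictionary (key: value) or a set instead of a list
squares = {x: x ** 2 for x in range(5)}
remainders = {x % 4 for x in S}

print(evens)       # [0, 6, 12, 18, 24]
print(labels)
print(squares)     # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
print(remainders)  # {0, 1, 2, 3}
```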

Just like lambdas, list comprehensions are a condensed format which may offer readability benefits, and you'll often find both being used in data science tutorials or on Stack Overflow. Regular for-loops and named functions can do the same things; a lambda is no faster than an equivalent named function, but list comprehensions do tend to run somewhat faster than the equivalent for-loop with `append()`.

# Exercises

**Question 1:** Here is a list of faculty. Can you write a function and apply it using `map()` to get a list of all faculty titles and last names (e.g. ['Dr. AlShebli', 'Dr. O'Brian', …]) ?

In [63]:
#Note how John's name has an apostrophe, and Hannah's name has a special character
people = ["Dr. Bedoor Khalifa AlShebli", "Dr. John O'Brian", "Dr. Hannah Brückner"]

def split_title_and_name(person):
    names = person.split()
    return names[0] + ' ' + names[-1]

#list() converts the returned map object into a list
list(map(split_title_and_name, people))

['Dr. AlShebli', "Dr. O'Brian", 'Dr. Brückner']

**Question 2:** How many days has it been since you were born?

In [64]:
#answer here
my_birthdate = dt.date(2001, 12, 24)

(dt.date.today() - my_birthdate).days

7925

**Question 3:**
1. Given the list of numbers below, return a list that contains the square of each odd number.
2. Given the list of names below, return a list of the names that start with B.

Use list comprehensions.

In [65]:
numbers = [8, 50, 62, 30, 90, 48, 21, 77, 28, 85, 86, 2, 87, 96, 45, 67, 60, 59, 41, 34]
names = ['Bedoor','Susan','Ahmed','Bayan','Sarah','Barbara']

# answer here
print([x**2 for x in numbers if x % 2 != 0])
print([x for x in names if x[0] == 'B'])

[441, 5929, 7225, 7569, 2025, 4489, 3481, 1681]
['Bedoor', 'Bayan', 'Barbara']


**Question 4:** Convert this function into a list comprehension.

In [66]:
def times_tables():
    lst = []
    for i in range(10):
        for j in range(10):
            lst.append(i*j)
    return lst

print(times_tables())

#answer here
print([i * j for i in range(10) for j in range(10)])

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 0, 9, 18, 27, 36, 45, 54, 63, 72, 81]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 0, 9, 18, 27, 36, 45, 54, 63, 72, 81]


**Question 5:** Write a `lambda` function that will take a number and return its squared value if it is an integer (hint: look up `isinstance()` function), otherwise return "Can't calculate non-integers".



In [67]:
# answer here
# A lambda that squares its argument if it is an integer, otherwise returns a message
square_if_int = lambda x: x ** 2 if isinstance(x, int) else "Can't calculate non-integers"
print(square_if_int(5))
print(square_if_int('hi'))

25
Can't calculate non-integers


**Question 6:** Starting with the equal-sized lists `listA` and `listB` below:

1. Use a list comprehension to round any float number in `listA`. (Hint: look up `round()`)

2. Write a function named `equal` that returns `True` if two values are the same and `False` otherwise.

3. Apply the function using `map()` to get a list of which values in `listA` and `listB` are equal. For example, for lists `[1,2,3]` and `[4,2,3]`, the output should be `[False,True,True]`.

In [68]:
listA = [20, 23.8, 89, 59.9, 64, 74, 17.154, 31, 11, 36, 27, 85, 40, 47.3, 69]

listB = [48, 24, 31, 60, 40, 35, 17, 27, 36, 15, 23, 83, 67, 47, 89]

# answer here
print([round(x) for x in listA])


def equal(a, b):
    return a == b


print(equal(1, 1))
print(equal(1, 2))

# compare the raw values first, then the rounded values
print(list(map(equal, listA, listB)))
print(list(map(equal, [round(x) for x in listA], listB)))


[20, 24, 89, 60, 64, 74, 17, 31, 11, 36, 27, 85, 40, 47, 69]
True
False
[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False]
[False, True, False, True, False, False, True, False, False, False, False, False, False, True, False]