# Worksheet 00

Name:  Mao Mao
UID: U02043894

### Topics

- course overview
- python review

### Course Overview

a) Why are you taking this course?

I'm taking CS506 Data Science Tools and Apps at BU to gain hands-on experience with key data science tools and applications. This course will enhance my understanding of data analytics, fostering a data-driven mindset crucial for effective communication with data teams. It aligns perfectly with my goal to integrate data science into my professional skillset, enabling me to turn data into actionable business insights.

b) What are your academic and professional goals for this semester?

This semester, my academic goal is to excel in my data science and artificial intelligence coursework, deepening my understanding of analytics tools and techniques. Professionally, I aim to apply these skills in a real-world project, enhancing my data-driven decision-making abilities. 

c) Do you have previous Data Science experience? If so, please expand.

Yes, I have previous experience in data science, specifically in the realm of machine learning. I've worked on several projects where I handled diverse datasets, applying machine learning algorithms to extract insights and solve complex problems. This experience has given me a practical understanding of how to manipulate and analyze data effectively, tailoring approaches to the unique characteristics of each dataset.

d) Data Science is a combination of programming, math (linear algebra and calculus), and statistics. Which of these three do you struggle with the most (you may pick more than one)?

As someone deeply immersed in data science, I find myself quite comfortable with its core pillars: programming, math (including linear algebra and calculus), and statistics. Each of these elements plays a crucial role in the field, and my familiarity with these tools and concepts has been an essential part of my journey in data science. I don't particularly struggle with any of these areas, as they are all integral parts of my skill set.

### Python review

#### Lambda functions

Python supports the creation of anonymous functions (i.e. functions that are not bound to a name) at runtime, using a construct called `lambda`. Instead of writing a named function like this:

In [17]:
def f(x):
    return x**2
f(8)

64

One can write an anonymous function as such:

In [18]:
(lambda x: x**2)(8)

64

A `lambda` function can take multiple arguments:

In [19]:
(lambda x, y : x + y)(2, 3)

5

The arguments can be `lambda` functions themselves:

In [20]:
(lambda x : x(3))(lambda y: 2 + y)

5

a) write a `lambda` function that takes three arguments `x, y, z` and returns `True` only if `x < y < z`.

In [21]:
(lambda x, y, z: x < y < z)
(lambda x, y, z: x < y < z)(1, 2, 3)

True

b) write a `lambda` function that takes a parameter `n` and returns a lambda function that will multiply any input it receives by `n`. For example, if we called this function `g`, then `g(n)(2) = 2n`

In [22]:
g = (lambda n: lambda x: x * n)
g(3)(2)

6
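As a rough sketch of what the nested `lambda` in b) is doing, the same behavior can be written with named functions. The names `make_multiplier`, `multiply`, `double`, and `triple` are hypothetical and only used for illustration; the inner function remembers `n` from the enclosing call (a closure).

```python
def make_multiplier(n):
    """Return a function that multiplies its input by n."""
    def multiply(x):
        # n is looked up in the enclosing call to make_multiplier
        return x * n
    return multiply

# The worksheet's lambda version, for comparison
g = lambda n: lambda x: x * n

double = make_multiplier(2)
triple = g(3)

print(double(5))  # 10
print(triple(5))  # 15
```

Both forms return a new function each time they are called, so `double` and `triple` keep their own value of `n`.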

#### Map

`map(func, s)`

`func` is a function and `s` is a sequence (e.g., a list). 

`map()` returns an iterator that applies function `func` to each of the elements of `s` as they are requested.

For example if you want to multiply every element in a list by 2 you can write the following:

In [23]:
mylist = [1, 2, 3, 4, 5]
mylist_mul_by_2 = map(lambda x : 2 * x, mylist)
print(list(mylist_mul_by_2))

[2, 4, 6, 8, 10]
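The example wraps the result in `list(...)` because, in Python 3, `map` produces a lazy iterator rather than a list. A minimal sketch of the consequence (the names `doubled`, `first_pass`, and `second_pass` are made up for this illustration): once the iterator has been consumed, a second pass yields nothing.

```python
mylist = [1, 2, 3, 4, 5]
doubled = map(lambda x: 2 * x, mylist)

first_pass = list(doubled)   # consumes the iterator
second_pass = list(doubled)  # nothing is left to yield

print(first_pass)   # [2, 4, 6, 8, 10]
print(second_pass)  # []
```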


`map` can also be applied to more than one list. The lists are normally the same size; if they differ, `map` stops as soon as the shortest one is exhausted:

In [24]:
a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]

a_plus_b = map(lambda x, y: x + y, a, b)
list(a_plus_b)

[6, 6, 6, 6, 6]
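To illustrate what happens when the inputs differ in length, here is a small sketch (the list `short_b` is a hypothetical input, not part of the worksheet). `map` does not raise an error; it simply stops at the end of the shortest input.

```python
a = [1, 2, 3, 4, 5]
short_b = [10, 20, 30]  # deliberately shorter than a

truncated = list(map(lambda x, y: x + y, a, short_b))
print(truncated)  # [11, 22, 33]
```

The last two elements of `a` are silently ignored, which is worth keeping in mind when the inputs come from different sources.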

c) write a map that checks if elements are greater than zero

In [26]:
c = [-2, -1, 0, 1, 2]
gt_zero = map(lambda x: x > 0, c)
list(gt_zero)

[False, False, False, True, True]

d) write a map that checks if elements are multiples of 3

In [28]:
d = [1, 3, 6, 11, 2]
mul_of3 = map(lambda x: x % 3 == 0, d)
list(mul_of3)

[False, True, True, False, False]

#### Filter

`filter(function, list)` returns an iterator over the elements of `list` for which `function()` evaluates to `True`. Wrap the result in `list()` to get a new list.

e) write a filter that will only return even numbers in the list

In [29]:
e = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = filter(lambda x: x % 2 == 0, e)
list(evens)

[2, 4, 6, 8, 10]
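For comparison, the same selection can be sketched as a list comprehension, which many Python programmers find easier to read. The variable names below are illustrative only.

```python
e = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

evens_filter = filter(lambda x: x % 2 == 0, e)
print(type(evens_filter).__name__)  # filter (an iterator, not a list)

evens_list = list(evens_filter)
evens_comprehension = [x for x in e if x % 2 == 0]

print(evens_list)                         # [2, 4, 6, 8, 10]
print(evens_list == evens_comprehension)  # True
```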

#### Reduce

`reduce(function, sequence[, initial])` returns the result of sequentially applying the function to the sequence (starting at an initial state). You can think of reduce as consuming the sequence via the function.

For example, let's say we want to add all elements in a list. We could write the following:

In [30]:
from functools import reduce

nums = [1, 2, 3, 4, 5]
sum_nums = reduce(lambda acc, x : acc + x, nums, 0)
print(sum_nums)

15


Let's walk through the steps of `reduce` above:

1) the value of `acc` is set to 0 (our initial value)
2) Apply the lambda function on `acc` and the first element of the list: `acc` = `acc` + 1 = 1
3) `acc` = `acc` + 2 = 3
4) `acc` = `acc` + 3 = 6
5) `acc` = `acc` + 4 = 10
6) `acc` = `acc` + 5 = 15
7) return `acc`

`acc` is short for `accumulator`.
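The intermediate values in the walk-through can be made visible with `itertools.accumulate`, which applies the same function but keeps every running result. This is only a sketch for inspection, and it assumes Python 3.8 or later for the `initial` keyword.

```python
from functools import reduce
from itertools import accumulate

nums = [1, 2, 3, 4, 5]
add = lambda acc, x: acc + x

# Every value acc takes, starting from the initial value 0
steps = list(accumulate(nums, add, initial=0))
print(steps)  # [0, 1, 3, 6, 10, 15]

# reduce returns only the final accumulator
total = reduce(add, nums, 0)
print(total)  # 15
```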

f) `*challenging` Using `reduce` write a function that returns the factorial of a number. (recall: N! (N factorial) = N * (N - 1) * (N - 2) * ... * 2 * 1)

In [42]:
factorial = lambda x: (reduce(lambda acc, y: acc * y, range(1, x + 1), 1))
factorial(10)

3628800
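One way to gain confidence in the `reduce`-based answer is to compare it against `math.factorial` from the standard library. The loop bound of 10 below is an arbitrary choice for this sketch. Note that the initial value `1` also makes `factorial(0)` come out as 1, because `reduce` returns the initial value when the range is empty.

```python
import math
from functools import reduce

factorial = lambda n: reduce(lambda acc, y: acc * y, range(1, n + 1), 1)

# Cross-check against the standard library for small inputs
matches = all(factorial(n) == math.factorial(n) for n in range(0, 11))

print(matches)        # True
print(factorial(0))   # 1
print(factorial(10))  # 3628800
```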

g) `*challenging` Using `reduce` and `filter`, write a function that returns all the primes below a certain number

In [54]:
sieve = lambda n: reduce(lambda primes, d: list(filter(lambda p: p == d or p % d != 0, primes)), range(2, int(n**0.5) + 1), list(range(2, n)))
print(sieve(100))

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
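The one-line answer packs several steps together. A more readable sketch of the same idea follows; the names `primes_below`, `drop_multiples`, `candidates`, and `divisors` are hypothetical. The accumulator is the list of surviving candidates, and each step of `reduce` filters out the multiples of one divisor. It assumes a non-negative `limit`.

```python
from functools import reduce

def primes_below(limit):
    """Return all primes strictly below limit (limit >= 0 assumed)."""
    def drop_multiples(candidates, d):
        # Keep d itself, drop every other multiple of d
        return list(filter(lambda p: p == d or p % d != 0, candidates))

    # Divisors above sqrt(limit) cannot remove anything new
    divisors = range(2, int(limit ** 0.5) + 1)
    return reduce(drop_multiples, divisors, list(range(2, limit)))

print(primes_below(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Filtering by every divisor, including composite ones such as 4, is redundant but harmless: their multiples were already removed by an earlier prime.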



### What is going on?

For each of the following code snippets, explain why the result may be unexpected and why the output is what it is:

1. The result is unexpected because `myBank.is_overdrawn` in `if myBank.is_overdrawn :` is never called: without parentheses, the expression evaluates to a bound method object instead of a boolean value.
2. The correct way to call a method is `myBank.is_overdrawn()`, which returns a boolean value.
3. Since a method object is always truthy, the `if myBank.is_overdrawn :` branch is always entered and "OVERDRAWN" is printed regardless of the balance.
4. Note that the method itself compares `self.balance > 0`, so even `myBank.is_overdrawn()` would return `True` for a balance of 100.

In [49]:
class Bank:
  def __init__(self, balance):
    self.balance = balance
  
  def is_overdrawn(self):
    return self.balance > 0

myBank = Bank(100)
if myBank.is_overdrawn :
  print("OVERDRAWN")
else:
  print("ALL GOOD")

OVERDRAWN
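The difference between referencing the method and calling it can be checked directly. This sketch reuses the `Bank` class from the snippet unchanged (including its `> 0` comparison); the instances `my_bank` and `negative_bank` are made up for illustration.

```python
class Bank:
    def __init__(self, balance):
        self.balance = balance

    def is_overdrawn(self):
        return self.balance > 0  # same comparison as the worksheet snippet

my_bank = Bank(100)
negative_bank = Bank(-50)

# Without parentheses we get a bound method object, which is always truthy
print(bool(my_bank.is_overdrawn))        # True
print(bool(negative_bank.is_overdrawn))  # True

# With parentheses the method actually runs and returns a boolean
print(my_bank.is_overdrawn())        # True, because 100 > 0
print(negative_bank.is_overdrawn())  # False, because -50 > 0 is False
```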


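1. `range(4)` starts at 0 and ends at 3: (0, 1, 2, 3).
2. At the end of each iteration, `i` is set to 10.
3. However, at the beginning of each iteration, the `for` loop rebinds `i` to the next value from `range(4)`, discarding the 10.
4. Therefore, the assignment `i = 10` never affects what is printed inside the loop. It only survives after the last iteration, when no further value is assigned, so `i` is 10 once the loop ends.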

In [None]:
for i in range(4):
    print(i)
    i = 10

0
1
2
3
10
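A `for` loop can be thought of as repeatedly asking an iterator for its next value. The sketch below spells that out with `iter` and `next`; it approximates what the loop does rather than showing the interpreter's actual implementation, and the list `seen` is a hypothetical stand-in for the `print` call.

```python
iterator = iter(range(4))
seen = []

while True:
    try:
        i = next(iterator)  # i is rebound here at the start of every pass
    except StopIteration:
        break               # no more values: i keeps whatever it held last
    seen.append(i)
    i = 10                  # overwritten by the next call to next()

print(seen)  # [0, 1, 2, 3]
print(i)     # 10
```

Because the iterator keeps its own position, assigning to `i` inside the body cannot change which value comes next.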


1. `row = [""] * 3` creates a list `row` with three empty strings. 

2. `board = [row] * 3` doesn't create three separate copies of `row`. Instead, it creates a list with three references to the same `row` object. 
When we print `board` before modifying it, it shows as expected: `[['', '', ''], ['', '', ''], ['', '', '']]`. However, all three inner lists are actually the same single list in memory.

3. `board[0][0] = "X"` modifies the first element of the first list in `board`. But since all three lists in `board` are references to the same list, this change is reflected in all three sublists.

4. When we print `board` after this modification, it shows `[['X', '', ''], ['X', '', ''], ['X', '', '']]`. This might be unexpected if we assumed that `board` consisted of three independent lists. But since they are all references to the same list, changing one changes them all.


In [None]:
row = [""] * 3 # row is ['', '', '']
board = [row] * 3
print(board) # [['', '', ''], ['', '', ''], ['', '', '']]
board[0][0] = "X"
print(board)

[['', '', ''], ['', '', ''], ['', '', '']]
[['X', '', ''], ['X', '', ''], ['X', '', '']]
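One common way to get three independent rows is to build each row inside a list comprehension, so that a new inner list is created on every pass. The names `shared` and `independent` below are illustrative.

```python
shared = [[""] * 3] * 3                     # three references to one row
independent = [[""] * 3 for _ in range(3)]  # three separate rows

print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False

independent[0][0] = "X"
print(independent)  # [['X', '', ''], ['', '', ''], ['', '', '']]
```

`[""] * 3` is still safe for the inner row because strings are immutable; the aliasing only matters when the repeated element is itself mutable.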


1. `funcs.append(some_func)` builds a list of function objects. They do not capture the value of `x` at the time of definition; each one looks up the variable `x` when it is called (late binding).
2. At the end of the last iteration of the `for` loop, `x = 2`. The functions in `funcs` are called after the loop, when `x` has reached its final value of `2`. Therefore, `funcs_results = [2, 2, 2]`.
3. Inside the loop, the immediate call to `some_func()` returns the current value of `x`, so `results = [0, 1, 2]`, as expected.

In [None]:
funcs = []
results = []
for x in range(3):
    def some_func():
        return x
    funcs.append(some_func)
    results.append(some_func())  # note the function call here

funcs_results = [func() for func in funcs]
print(results) # [0,1,2]
print(funcs_results)

[0, 1, 2]
[2, 2, 2]
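If the intent is for each function to remember the value `x` had when it was defined, one widely used workaround is to bind it as a default argument, since defaults are evaluated at definition time. This is a sketch of that idiom, not the only possible fix.

```python
funcs = []
for x in range(3):
    def some_func(x=x):  # the default captures the current value of x
        return x
    funcs.append(some_func)

funcs_results = [func() for func in funcs]
print(funcs_results)  # [0, 1, 2]
```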


1. `f = open("./data.txt", "w+")` opens data.txt for writing and reading. If data.txt exists, it truncates the file to zero length (effectively clearing its contents), which is why it's initially empty. 
2. When we open data.txt with `f = open("./data.txt", "w+")` for the second time, it again truncates the file to zero length, erasing the previously written "1,2,3,4,5".
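3. Therefore, `f.readlines()` returns `[]`, the loop body never runs, `nums` stays empty, and `sum([]) = 0`. Opening the file with mode `"r"` the second time would leave its contents intact.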

In [53]:
f = open("./data.txt", "w+")
f.write("1,2,3,4,5")
f.close()

nums = []
with open("./data.txt", "w+") as f:
  lines = f.readlines()
  for line in lines:
    nums += [int(x) for x in line.split(",")]

print(sum(nums))

[]
0
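A possible corrected version opens the file with mode `"r"` for reading, which does not truncate it. To keep the sketch self-contained it writes to a temporary directory instead of `./data.txt`; the `tmp_dir` and `path` names are assumptions made for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.txt")

    # "w" creates (or truncates) the file for writing
    with open(path, "w") as f:
        f.write("1,2,3,4,5")

    nums = []
    # "r" opens the existing file for reading without clearing it
    with open(path, "r") as f:
        for line in f:
            nums += [int(x) for x in line.split(",")]

print(nums)       # [1, 2, 3, 4, 5]
print(sum(nums))  # 15
```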
