### More Exercises on Essentials, NumPy, Matplotlib and SciPy

### Exercise 1
Generate an array of 10,000,000 random numbers, then set its negative elements to 0.  
You are required to provide **three** implementations: one *using a loop* and two *without a loop*.  
Compare their performance by using the $\%time$ magic command on each implementation.

In [1]:
# Exercise 1

import numpy as np
import matplotlib.pyplot as plt

z = np.random.randn(10000000)

In [2]:
%%timeit
for i in range(z.size):
    if z[i] < 0:
        z[i] = 0
    else:
        z[i] = z[i]

1 loop, best of 3: 7.65 s per loop


In [3]:
z = np.random.randn(10000000)
%timeit z[z < 0] = 0

The slowest run took 9.42 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 16 ms per loop


In [4]:
z = np.random.randn(10000000)
%timeit z[np.where(z < 0)] = 0

The slowest run took 9.73 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 15.4 ms per loop
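
The three timed versions should produce the same array. As a quick sanity check, the sketch below runs each approach on a small input and compares the results. The helper names are hypothetical, and `np.maximum` is shown as one more loop-free option that is not part of the original solution. Note also that `%timeit` re-runs the statement on the same `z`, so after the first run there are no negative entries left to overwrite; this may contribute to the "slowest run" warnings above.

```python
import numpy as np

def clamp_loop(a):
    # Explicit Python loop over every element (slow for large arrays).
    out = a.copy()
    for i in range(out.size):
        if out[i] < 0:
            out[i] = 0
    return out

def clamp_mask(a):
    # Boolean-mask assignment.
    out = a.copy()
    out[out < 0] = 0
    return out

def clamp_where(a):
    # Index-array assignment via np.where.
    out = a.copy()
    out[np.where(out < 0)] = 0
    return out

def clamp_maximum(a):
    # Element-wise maximum with 0; returns a new array.
    return np.maximum(a, 0)

sample = np.array([-1.5, 0.0, 2.0, -0.1, 3.5])
results = [f(sample) for f in (clamp_loop, clamp_mask, clamp_where, clamp_maximum)]
all_equal = all(np.array_equal(results[0], r) for r in results[1:])
print(results[0], all_equal)
```

Each helper works on a copy, so the input array is left unchanged and every approach starts from the same data.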


### Exercise 2
Generate a $4 \times k$ matrix, where $k$ is an input given by the user. The matrix should look like this:  
[  
 [0, 4, 8, ..., 4(k-1)],  
 [1, 5, 9, ..., 4(k-1)+1],  
 [2, 6,10, ..., 4(k-1)+2],  
 [3, 7,11, ..., 4(k-1)+3]  
]  
**Note:** Try to avoid loops, and implement it in fewer than 4 lines of code.

In [5]:
k = 16
# 4*k consecutive integers, laid out as k rows of 4, then transposed to 4 x k
l = np.reshape(np.arange(4 * k), (k, 4)).T
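
As a cross-check of the target layout, here is a minimal sketch that builds the $4 \times k$ matrix by filling it column by column (Fortran order) and compares it with the reshape-and-transpose approach. The function name `column_counting_matrix` is a hypothetical helper; in practice `k` could be read with `int(input())`.

```python
import numpy as np

def column_counting_matrix(k):
    # Fill a 4 x k matrix column by column with 0, 1, ..., 4k-1.
    return np.arange(4 * k).reshape(4, k, order='F')

m = column_counting_matrix(5)
print(m)

# The same matrix via reshape to k x 4 followed by a transpose.
same = np.array_equal(m, np.arange(4 * 5).reshape(5, 4).T)
print(same)
```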

### Exercise 3
Write a function $gcd(a, b)$ to compute the *Greatest Common Divisor* of two natural numbers $a$ and $b$, which are provided by user input. Then write a function $lcm(a, b)$ that uses your $gcd$ function to compute the *Least Common Multiple* of $a$ and $b$. Your $gcd$ function should print each step of the computation as well as the final result, as shown below:  

```python
gcd(123456, 7890)
0: gcd(123456, 7890)
1: gcd(7890, 5106)
2: gcd(5106, 2784)
3: gcd(2784, 2322)
4: gcd(2322, 462)
5: gcd(462, 12)
6: gcd(12, 6)
Out[595]:
6
```


In [6]:
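def gcd(a, b, step=0):
    # Euclid's algorithm: gcd(a, b) == gcd(b, a % b).
    # Print every step so the process is visible, as the exercise requires.
    print('{}: gcd({}, {})'.format(step, a, b))
    if a % b == 0:
        return b
    return gcd(b, a % b, step + 1)

def lcm(a, b):
    # For naturals a and b: lcm(a, b) * gcd(a, b) == a * b.
    # Integer division keeps the result an int.
    return a * b // gcd(a, b)

In [7]:
z = gcd(462, 12)
z

0: gcd(462, 12)
1: gcd(12, 6)
6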

### Exercise 4
Write a function to remove redundant whitespace from a line of text.  
e.g. '$This\ is \ \ \ \ \ \ \ \ \ \ \ for\ removing \ \ \ \ \ \ \ redundant\ spaces$'   
=>  '$This\ is\ for\ removing\ redundant\ spaces$'   

**Note**: Try to avoid loops; check out Python's built-in functions.

In [8]:
l = 'This is      for removing       redundant spaces'
" ".join(l.split())

'This is for removing redundant spaces'
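
Since the exercise asks for a function, the one-liner above can be wrapped in one. The sketch below also shows a regular-expression alternative; both function names are hypothetical.

```python
import re

def squeeze_spaces(text):
    # split() with no argument splits on runs of whitespace and drops
    # leading/trailing whitespace; join() puts single spaces back.
    return ' '.join(text.split())

def squeeze_spaces_re(text):
    # Replace every run of whitespace with one space, then trim the ends.
    return re.sub(r'\s+', ' ', text).strip()

line = 'This is      for removing       redundant spaces'
print(squeeze_spaces(line))
print(squeeze_spaces_re(line))
```

Both versions treat tabs and newlines as whitespace too, so they also flatten multi-line input into a single line.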

### Exercise 5
Given an array, e.g., [2, 18, 9, 22, 17, 24, 8, 12, 27],  
generate a subarray of the elements that are multiples of 3, e.g., [18, 9, 24, 12, 27].  

**Note**: Try to avoid loops; check out Python's built-in functions.

In [9]:
z = np.array([2, 18, 9, 22, 17, 24, 8, 12, 27])

d = z[z%3 == 0]
d

array([18,  9, 24, 12, 27])
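
The note points at Python's built-in functions, so here is a minimal sketch of the same selection without NumPy, using the built-in `filter`. A list comprehension is shown for comparison, although it is arguably a loop in disguise.

```python
values = [2, 18, 9, 22, 17, 24, 8, 12, 27]

# filter() keeps the elements for which the predicate returns True.
by_filter = list(filter(lambda x: x % 3 == 0, values))

# Equivalent list comprehension.
by_comprehension = [x for x in values if x % 3 == 0]

print(by_filter)
print(by_comprehension)
```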

### Exercise 6
Read a text file, find its 20 most frequent words, and output the words together with their counts.  
**Note:** 
1. Exclude punctuation such as ", . ! # $ ...". In other words, count only English words.
2. Provide your code with a dictionary of words to exclude from the count, e.g., the dictionary may look like this: [to, a, as, this, for, in, on, but].

**Hint:** You may need some knowledge of regular expressions and their usage in Python.

In [10]:
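import pandas as pd

# Read the whole file; the with-block closes it automatically.
with open('Walden.txt', 'r') as f:
    out = f.read()

# Split on whitespace and count each token.
# Note: punctuation and excluded words are not filtered out here.
word_list = out.split()
counter = {}
for word in word_list:
    counter[word] = counter.get(word, 0) + 1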

In [11]:
series = pd.Series(counter)
series.sort_values(ascending=False)[:20]

the      6928
and      4474
of       3465
to       3038
a        2958
I        1989
in       1933
that     1261
is       1252
as       1132
it       1102
not      1008
for       921
was       865
or        854
with      848
which     835
my        743
be        717
his       702
dtype: int64
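
The counts above still include punctuation attached to words and the words that the exercise asks to exclude. A minimal sketch of one way to handle both requirements follows, using a regular expression and `collections.Counter`. The function name `top_words`, the pattern `[a-z]+` (which lowercases everything and splits contractions such as "don't"), and the sample sentence are assumptions for illustration, not part of the original solution.

```python
import re
from collections import Counter

# Exclusion list taken from the exercise statement.
STOP_WORDS = {'to', 'a', 'as', 'this', 'for', 'in', 'on', 'but'}

def top_words(text, n=20, stop_words=STOP_WORDS):
    # Keep only runs of letters, so punctuation and digits are dropped.
    words = re.findall(r'[a-z]+', text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

sample = "To be, or not to be: that is the question! The question, as posed, is this."
for word, count in top_words(sample, n=4):
    print(word, count)
```

For the real task, `sample` would be replaced by the contents of the text file, for example the string read from `Walden.txt`.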

### Exercise 7
Given a large integer N, find all the prime numbers less than N. You may optimize your program in multiple passes and compare their performance to evaluate your improvement.  

**Note:** Suppose initially the only prime number that your program knows is 2.

In [12]:
def is_prime(n):
    # First pass: n is prime if it has exactly two divisors, 1 and n.
    divisors = []
    for i in range(1, n+1):
        if n % i == 0:
            divisors.append(i)
    return len(divisors)==2        

def find_prime(N):
    # Collect the primes strictly less than N.
    primes = []
    for i in range(2, N):
        if is_prime(i):
            primes.append(i)
    return primes
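
The exercise asks for several optimization passes and a performance comparison. The sketch below shows one possible progression, with hypothetical function names: plain trial division, trial division by the primes found so far (starting from the known prime 2) up to $\sqrt{n}$, and a sieve of Eratosthenes. Timings come from the standard `timeit` module and will vary by machine.

```python
import timeit

def primes_naive(N):
    # Pass 1: test every n against every smaller integer d >= 2.
    return [n for n in range(2, N) if all(n % d for d in range(2, n))]

def primes_by_known_primes(N):
    # Pass 2: start from the known prime 2 and divide each odd candidate
    # only by previously found primes no larger than sqrt(n).
    primes = [2] if N > 2 else []
    for n in range(3, N, 2):
        is_prime = True
        for p in primes:
            if p * p > n:
                break
            if n % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(n)
    return primes

def primes_sieve(N):
    # Pass 3: sieve of Eratosthenes over the range [0, N).
    if N < 3:
        return []
    flags = bytearray([1]) * N
    flags[0] = flags[1] = 0
    i = 2
    while i * i < N:
        if flags[i]:
            # Cross out the multiples of i, starting at i*i.
            flags[i * i::i] = bytearray(len(range(i * i, N, i)))
        i += 1
    return [n for n, flag in enumerate(flags) if flag]

for fn in (primes_naive, primes_by_known_primes, primes_sieve):
    seconds = timeit.timeit(lambda: fn(5000), number=1)
    print(fn.__name__, len(fn(5000)), 'primes,', round(seconds, 4), 's')
```

All three functions return the primes strictly less than `N`, so their outputs can be compared directly to confirm that each optimization preserves the result.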