# Python Advanced - Assignment 22

### Q1. What are the benefits of the built-in array package, if any?

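1) Memory efficiency: elements are stored as raw values in one contiguous block of memory, so there is little per-element overhead and access is fast.
2) Type consistency: every element has the same data type, fixed by the type code given when the array is created.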
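
As a rough illustration of the memory point, the sketch below compares the payload of an `array` of 1000 integers with an estimate for the equivalent list. The names `nums` and `as_list` are made up for this example, and the list figure is only an estimate (small integers are shared objects in CPython, so it over-counts somewhat).

```python
import sys
from array import array

# 1000 C ints stored back to back in one buffer
nums = array('i', range(1000))
as_list = list(range(1000))

# Payload of the array: itemsize bytes per element
array_bytes = nums.itemsize * len(nums)

# Rough estimate for the list: the pointer table plus one int object per element
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

print('array payload bytes:', array_bytes)
print('list estimate bytes:', list_bytes)
```

The exact numbers depend on the platform and Python version, so none are quoted here.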

### Q2. What are some of the array package's limitations?

1) It provides no numerical or vectorized functions (such as element-wise arithmetic, statistics, or linear algebra) for the data inside the array.
2) The element type must be stated explicitly with a type code when the array is created. Adding a value of any other type raises an error.
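
The type restriction can be seen in a minimal sketch like the one below. The variable names are illustrative only; the type code `'i'` stands for a signed C int.

```python
from array import array

ints = array('i', [1, 2, 3])
ints.append(4)            # fine: 4 fits the 'i' (signed int) type code

try:
    ints.append(2.5)      # a float does not match the 'i' type code
    rejected = False
except TypeError as exc:
    rejected = True
    print('rejected:', exc)

print(ints)
```

The float is refused with a `TypeError` and the array keeps only the four integers.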

### Q3. Describe the main differences between the array and numpy packages.

1) Type handling: both are homogeneous containers, but they differ in how the element type is chosen. With the built-in array, we must state the type code up front, and values of any other type are rejected. NumPy infers the type from the data and upcasts: if we mix types, NumPy (not Python itself) implicitly converts every element to the most general type among them.
2) NumPy is a large library with tools for handling and transforming data, such as element-wise array multiplication, dot/outer products, `arange`, random number generation, and much more. The built-in array module lacks most of these utilities, which is why it is used far less often.
3) The built-in array module creates 1-D arrays only, whereas NumPy can create arrays with any number of dimensions (matrices and beyond).
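
A short sketch of the upcasting and multi-dimensional points, using made-up variable names:

```python
import numpy as np

mixed_numbers = np.array([1, 2.5])      # int and float
mixed_text = np.array([1, 'a'])         # int and str
matrix = np.array([[1, 2], [3, 4]])     # nested lists give a 2-D array

print(mixed_numbers.dtype)    # a floating-point dtype
print(mixed_text.dtype.kind)  # 'U' means unicode string
print(matrix.ndim, matrix.shape)
```

In both mixed cases NumPy picks one dtype that can hold every element, rather than raising an error as the built-in array would.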

### Q4. Explain the distinctions between the empty, ones, and zeros functions.

1) numpy.empty - creates an array of the given shape (for example m x n) without initializing its values, so it holds whatever garbage values happen to be in memory.
2) numpy.ones  - creates an array of the given shape with all values initialized to 1.
3) numpy.zeros - creates an array of the given shape with all values initialized to 0.

In [1]:
import numpy as np
np_empty = np.empty((2,2), dtype='int')
print(np_empty)

[[-2070750273  -253414701]
 [  383365606   182795542]]


In [2]:
np_ones = np.ones((2,2), dtype='int')
print(np_ones)

[[1 1]
 [1 1]]


In [3]:
np_zeros = np.zeros((2,2), dtype='int')
print(np_zeros)

[[0 0]
 [0 0]]


### Q5. In the fromfunction function, which is used to construct new arrays, what is the role of the callable argument?

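The callable argument of `fromfunction` determines what the elements of the new array will be. It is a function that receives the coordinates of the array elements, one argument per dimension, and returns the values computed from those coordinates. NumPy calls it once with whole arrays of indices rather than once per element, so it should be written with vectorized operations.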

In [4]:
def func1(x,y):
    return x*2+y*2
np_fromfunc = np.fromfunction(func1, (3,3), dtype='int')
print(np_fromfunc)

[[0 2 4]
 [2 4 6]
 [4 6 8]]
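
To see what the callable actually receives, here is a small sketch. The function `show_coords` and the list `calls` are hypothetical names introduced only for this illustration.

```python
import numpy as np

calls = []  # records the shapes of the arguments on each call

def show_coords(i, j):
    # i holds the row index of every cell, j holds the column index
    calls.append((i.shape, j.shape))
    return i * 10 + j

grid = np.fromfunction(show_coords, (2, 3), dtype=int)

print(grid)
print('number of calls:', len(calls))
print('argument shapes:', calls[0])
```

The callable runs a single time and both arguments already have the full (2, 3) shape.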


### Q6. What happens when a numpy array is combined with a single-value operand (a scalar, such as an int or a floating-point value) through addition, as in the expression A + n?

The value `n` is added to every element of the array (the scalar is broadcast across it). For example, if the array is [1,2,3] and the scalar is 1, adding them gives the new array [2,3,4]. The original array is left unchanged.

In [5]:
a = np.array([1,2,3])
print(a+1)

[2 3 4]


### Q7. Can array-to-scalar operations use combined operation-assign operators (such as += or *=)? What is the outcome?
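
Yes. As a minimal sketch (variable names are illustrative), operators such as `+=` and `*=` apply the scalar to every element and modify the existing array in place instead of creating a new one. Because the array keeps its dtype, an update that would need a different kind of type, such as adding a float to an integer array, is refused.

```python
import numpy as np

a = np.array([1, 2, 3])
same_buffer = a          # second name for the same array object

a += 10                  # every element gets 10 added, in place
a *= 2                   # every element is doubled, in place
print(a)
print(same_buffer is a)  # still the same object

cast_error = None
try:
    a += 0.5             # float result cannot be stored in an int array
except TypeError as exc:
    cast_error = exc
    print('refused:', type(exc).__name__)
```

If a float result is wanted, one option is to convert first with `a.astype(float)` or to use `a = a + 0.5`, which builds a new array.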

### Q8. Does a numpy array contain fixed-length strings? What happens if you allocate a longer string to one of these arrays?

Yes, a numpy array of strings has a fixed-length string dtype. The length is set when the array is created, to that of the longest string it contains (here `<U5`). If a longer string is assigned later, it is silently truncated to that length.

In [6]:
a = np.array(['Hello','hi']) # max length at initialization is 5 so in future if length of string exceeds 5, it is truncated to 5
print(a)
a[1]='universe'
print(a)

['Hello' 'hi']
['Hello' 'unive']
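
If truncation is not wanted, two possible workarounds are sketched below: reserving a wider string dtype up front, or storing Python objects. The names and the width of 8 are assumptions chosen for this example.

```python
import numpy as np

short = np.array(['Hello', 'hi'])
print(short.dtype)                 # unicode strings of length 5

# Reserve room for 8 characters at creation time
wide = np.array(['Hello', 'hi'], dtype='U8')
wide[1] = 'universe'
print(wide)

# An object array stores references to ordinary Python strings
as_objects = np.array(['Hello', 'hi'], dtype=object)
as_objects[1] = 'universe'
print(as_objects)
```

The object array has no length limit, but it gives up the compact fixed-width storage.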


### Q9. What happens when you combine two numpy arrays using an operation like addition (+) or multiplication (*)? What are the conditions for combining two numpy arrays?

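Adding or multiplying two arrays with `+` or `*` works element-wise (note that `*` is not matrix multiplication). The condition is that the shapes of the two arrays are either the same or compatible under NumPy's broadcasting rules: compared from the last dimension backwards, each pair of dimensions must be equal or one of them must be 1.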

In [7]:
a = np.array([1,2,3]) # shape = (3,)
b = np.array([1,2]) # shape = (2,)
c = np.array([1,2,3]) # shape = (3,)

print('a =',a)
print('b =',b)
print('c =',c)

a = [1 2 3]
b = [1 2]
c = [1 2 3]


In [8]:
# ADDING
try:
    print(a+b)
except ValueError:
    print('a + b not possible as shape is not same')
finally:
    print('a + c = ',a+c)

a + b not possible as shape is not same
a + c =  [2 4 6]


In [9]:
# MULTIPLICATION
try:
    print(a*b)
except ValueError:
    print('a x b not possible as shape is not same')
finally:
    print('a x c = ',a*c)

a x b not possible as shape is not same
a x c =  [1 4 9]
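
Shapes do not have to be identical as long as they can be broadcast together. A brief sketch with illustrative names:

```python
import numpy as np

row = np.array([1, 2, 3])            # shape (3,)
col = np.array([[10], [20], [30]])   # shape (3, 1)
one = np.array([5])                  # shape (1,)

grid = col + row     # (3, 1) with (3,) broadcasts to (3, 3)
scaled = row * one   # (3,) with (1,) broadcasts to (3,)

print(grid)
print(scaled)
```

Shapes (3,) and (2,), as in the cells above, fail because the last dimensions are neither equal nor 1.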


### Q10. What is the best way to use a Boolean array to mask another array?

The most direct way is Boolean indexing: pass the Boolean array, which must match the shape of the array being masked, as the index. See the example below:

In [10]:
a1 = np.array([1,2,3,4,5])
bool_mask = np.array([True, True, False, False, True])

a2 = a1[bool_mask] # this is boolean indexing
print('original array =',a1)
print('boolean mask =',bool_mask)
print('boolean masked array =',a2)

original array = [1 2 3 4 5]
boolean mask = [ True  True False False  True]
boolean masked array = [1 2 5]
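
In practice the mask is usually built from a comparison rather than typed by hand. A small sketch, with names chosen only for this example:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

greater_than_two = values > 2                    # Boolean array from a comparison
selected = values[greater_than_two]

# Conditions are combined with & and |, each wrapped in parentheses
in_range = values[(values > 1) & (values < 5)]

# A mask can also be used on the left-hand side to assign
capped = values.copy()
capped[capped > 3] = 3

print(selected)
print(in_range)
print(capped)
```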


### Q11. What are three different ways to get the standard deviation of a wide collection of data using both standard Python and its packages? Sort the three of them by how quickly they execute.

In [11]:
import numpy as np
import scipy.stats as sci
import statistics as stat
import time

data = np.random.randint(1,10000,10000)
print(data)

[1141 7909 1842 ...  399 3881 5758]


In [12]:
%%time
a = time.time()
std1 = np.std(data)
print('Numpy Standard Deviation =',std1)
b = time.time()
print('Time :',b-a)
print()

Numpy Standard Deviation = 2902.0194654220895
Time : 0.0

CPU times: total: 0 ns
Wall time: 999 µs


In [13]:
%%time
a = time.time()
std2 = sci.tstd(data)
print('Scipy Standard Deviation =',std2)
b = time.time()
print('Time :',b-a)
print()

Scipy Standard Deviation = 2902.164577278841
Time : 0.001394510269165039

CPU times: total: 0 ns
Wall time: 1.39 ms


In [14]:
# The statistics module is designed for built-in Python sequences such as lists, so a list is used here (note: this is a different dataset from the array above)
data = list(range(10000))

In [15]:
%%time
a = time.time()
std3 = stat.stdev(data)
print('Statistics Standard Deviation =',std3)
b = time.time()
print('Time :',b-a)
print()

Statistics Standard Deviation = 2886.8956799071675
Time : 0.003995418548583984

CPU times: total: 0 ns
Wall time: 5 ms


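From these single-run timings, `Numpy is the fastest`, scipy.stats.tstd is second, and `statistics.stdev is the slowest`. The measurements are rough: each cell was run only once, and the statistics cell used a different dataset. The Numpy and Scipy results also differ slightly because np.std computes the population standard deviation (ddof=0) by default, while scipy.stats.tstd and statistics.stdev compute the sample standard deviation (ddof=1).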
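
For a fairer comparison, one option is to run every method on the same data, use the same `ddof`, and average over repeated runs with `timeit`. The sketch below does this for NumPy and the statistics module only; the seed and the number of runs are arbitrary assumptions, and `scipy.stats.tstd` could be timed in the same way.

```python
import math
import statistics
import timeit

import numpy as np

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
data = rng.integers(1, 10000, 10000)
data_list = data.tolist()           # plain Python ints for the statistics module

np_sample = float(np.std(data, ddof=1))   # sample standard deviation
py_sample = statistics.stdev(data_list)   # also the sample standard deviation
print('same result:', math.isclose(np_sample, py_sample))

# Average over repeated runs instead of trusting a single measurement
runs = 20
np_time = timeit.timeit(lambda: np.std(data, ddof=1), number=runs) / runs
py_time = timeit.timeit(lambda: statistics.stdev(data_list), number=runs) / runs

print('numpy     :', f'{np_time:.6f} s per call')
print('statistics:', f'{py_time:.6f} s per call')
```

Absolute timings vary by machine, so no expected values are given.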

### Q12. What is the dimensionality of a Boolean mask-generated array?

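A Boolean mask that has the same shape as the array it indexes always produces a 1-D array, whatever the dimensionality of the original. The number of `True` values in the mask determines the length of that array, not its number of dimensions.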
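
A small sketch of how the result's shape follows from the mask, using illustrative names. It also shows one related case: a 1-D mask applied to the first axis of a 2-D array selects whole rows, so the remaining dimension is kept.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)        # 2-D array with values 0..11

full_mask = grid % 2 == 0                 # same shape as grid
evens = grid[full_mask]                   # flattened selection

row_mask = np.array([True, False, True])  # 1-D mask over the first axis
rows = grid[row_mask]                     # keeps the column dimension

print(evens.ndim, evens.shape)
print(rows.ndim, rows.shape)
```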