# Creating and Reshaping Arrays

The exercises in this notebook will teach you to use a variety of common functions for creating and reshaping arrays.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

## `np.array`

The most common way to create an array is to construct one from a Python list.

**Example:** Passing a list of scalars produces a 1-dimensional array.

In [2]:
'''Use np.array to create our new numpy array'''
np.array([1, 2, 3])

array([1, 2, 3])

**Example:** Passing a list of lists produces a 2-dimensional array.

In [5]:
'''Passing a list of lists to np.array produces a 2-D array, here 2x2. The same works for any n x m shape.'''
np.array([[1, 2], 
          [2, 3]])

array([[1, 2],
       [2, 3]])

**Exercise:** Create a 1-dimensional array using `np.array` containing `[1, 2, 3, 4]`.

In [3]:
'''There are three ways to do this: np.arange (similar to range in Python), np.asarray on a plain
Python list, or np.array on a plain Python list. Only the last expression in the cell is displayed.'''

np.arange(1,5)
np.asarray([1,2,3,4])
np.array([1,2,3,4])

array([1, 2, 3, 4])

**Exercise:** Create a 2-dimensional array using `np.array` with two rows containing  `[1, 2, 3]` and `[4, 5, 6]`.

In [12]:
'''Passing a list of two 3-element lists to np.array produces a 2x3 array.'''
np.array([[1,2,3],[4,5,6]])

array([[1, 2, 3],
       [4, 5, 6]])
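As a quick check on the examples above, the sketch below inspects the `ndim` and `shape` of arrays built from flat and nested lists. It also shows one difference between `np.array` and `np.asarray`. The variable names are illustrative only.

```python
import numpy as np

flat = np.array([1, 2, 3, 4])             # list of scalars -> 1-D array
table = np.array([[1, 2, 3], [4, 5, 6]])  # list of equal-length lists -> 2-D array

print(flat.ndim, flat.shape)    # 1 (4,)
print(table.ndim, table.shape)  # 2 (2, 3)

# np.asarray hands back its argument when it is already an ndarray
# (and no dtype change is requested), while np.array copies by default.
print(np.asarray(flat) is flat)  # True
print(np.array(flat) is flat)    # False
```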

## `np.arange`

**Exercise:** Construct an array containing values from 0 to 9, inclusive, in ascending order.

In [13]:
'''np.arange is similar to the built-in range function in Python: it produces the same values,
but returns them as a numpy array'''

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

**Exercise:** Construct an array containing all the integers between 1 and 3 inclusive, in ascending order.

In [17]:
'''np.linspace returns evenly spaced floats between two endpoints, both included. Its third argument (num) sets how
many values to return and defaults to 50, so num=3 is needed here to get 1., 2., 3.'''

np.linspace(1,3,num=3)
np.arange(1,4)

array([1, 2, 3])

**Exercise:** Construct an array containing all the integers between 5 and 10 inclusive, in ascending order.

In [19]:
np.linspace(5,10,num=6)  # num defaults to 50, so ask for 6 values explicitly
np.arange(5,11)

array([ 5,  6,  7,  8,  9, 10])

**Exercise:** Construct an array containing all the integers between 1 and 10 inclusive, in descending order.

In [105]:
'''arange with a negative step parameter counts down; the stop value (0) is excluded'''
np.arange(10,0,-1)

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

**Exercise:** Construct an array containing all the **even** integers between 2 and 10 inclusive, in ascending order.

In [21]:
'''Positive step parameter'''
np.arange(2,12,2)

array([ 2,  4,  6,  8, 10])
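The exercises above all rely on `np.arange` excluding its stop value, just as `range` does. The following sketch collects those cases in one place. The non-integer step at the end is an extra illustration that the exercises do not cover.

```python
import numpy as np

# With one argument, arange counts from 0 up to (not including) the stop value.
assert np.arange(10).tolist() == list(range(10))

# The stop value is excluded, so "1 to 3 inclusive" needs stop=4.
one_to_three = np.arange(1, 4)

# A negative step counts down; stop=0 is excluded, so the last value is 1.
countdown = np.arange(10, 0, -1)

# Unlike range, arange also accepts non-integer steps.
halves = np.arange(0, 2, 0.5)

print(one_to_three.tolist())  # [1, 2, 3]
print(countdown.tolist())     # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(halves.tolist())        # [0.0, 0.5, 1.0, 1.5]
```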

## `linspace`

**Exercise:** Construct an array containing 50 evenly-spaced values between -1 and 1.

In [4]:
# np.linspace?
np.linspace(-1,1)  # num defaults to 50

array([-1.        , -0.95918367, -0.91836735, -0.87755102, -0.83673469,
       -0.79591837, -0.75510204, -0.71428571, -0.67346939, -0.63265306,
       -0.59183673, -0.55102041, -0.51020408, -0.46938776, -0.42857143,
       -0.3877551 , -0.34693878, -0.30612245, -0.26530612, -0.2244898 ,
       -0.18367347, -0.14285714, -0.10204082, -0.06122449, -0.02040816,
        0.02040816,  0.06122449,  0.10204082,  0.14285714,  0.18367347,
        0.2244898 ,  0.26530612,  0.30612245,  0.34693878,  0.3877551 ,
        0.42857143,  0.46938776,  0.51020408,  0.55102041,  0.59183673,
        0.63265306,  0.67346939,  0.71428571,  0.75510204,  0.79591837,
        0.83673469,  0.87755102,  0.91836735,  0.95918367,  1.        ])
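Here is a small sketch of how `np.linspace` differs from `np.arange`: you give it the number of values rather than a step, and it includes the right endpoint by default. The `endpoint=False` call is an extra illustration that the exercise does not use.

```python
import numpy as np

default = np.linspace(-1, 1)      # num defaults to 50
three = np.linspace(1, 3, num=3)  # both endpoints included
open_end = np.linspace(0, 1, num=4, endpoint=False)  # right endpoint left out

print(default.size)       # 50
print(three.tolist())     # [1.0, 2.0, 3.0]
print(open_end.tolist())  # [0.0, 0.25, 0.5, 0.75]
print(three.dtype)        # float64
```

Because the result is floating point even for integer endpoints, `np.arange` is usually the simpler choice when integer values are wanted.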

## Exercise: `zeros`, `ones`, and `full`

**Exercise:** Construct arrays with the following shapes and values:
- 1-dimensional array with 10 entries, all containing the value 0.
- 2-dimensional array with 3 rows and 5 columns, all containing the value 1.
- 2-dimensional array with 5 rows and 3 columns, all containing the value 2.
- 3-dimensional array of shape (2, 5, 10), all containing the value 0.
- 5-dimensional array of shape (1, 2, 3, 4, 5), all containing the value 42.

In [7]:
'''This function is used to create an array of zeroes with a given shape'''
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [8]:
'''This function is used to create an array of ones with a given shape'''
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [5]:
'''This function is used to create an array of a given shape with a single value filled in it'''
np.full((5,3),2)

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

In [6]:
np.zeros((2,5,10))

array([[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]])

In [9]:
# np.full fills an array of the given shape with a single value. np.zeros and np.ones give the same values as np.full with a fill of 0.0 or 1.0.
np.full((1,2,3,4,5),42)

array([[[[[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]],

         [[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]],

         [[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]]],


        [[[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]],

         [[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]],

         [[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]]]]])
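The outputs above show floats (`0.`, `1.`) from `np.zeros` and `np.ones` but integers from `np.full`. The sketch below looks at why, under the assumption that no `dtype` argument is passed unless shown. The small `(2, 3)` shape is chosen only to keep the example short.

```python
import numpy as np

zeros = np.zeros((2, 3))                 # float64 unless dtype is given
int_zeros = np.zeros((2, 3), dtype=int)  # dtype can be overridden
twos = np.full((2, 3), 2)                # dtype inferred from the fill value
float_twos = np.full((2, 3), 2.0)

print(zeros.dtype)       # float64
print(twos.dtype.kind)   # i  (a signed integer type; the exact width is platform dependent)
print(float_twos.dtype)  # float64

# np.full with a fill of 0.0 produces the same values as np.zeros.
print(np.array_equal(np.full((2, 3), 0.0), zeros))  # True
```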

## `identity`

**Exercise:** Construct a 5 x 5 array with 1s along the diagonal and zeros everywhere else.

In [37]:
np.identity?

In [36]:
'''np.identity(n) creates an n x n array with ones on the main diagonal
and zeros everywhere else'''
np.identity(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
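The exercise uses `np.identity`, but `np.eye` is a related function that is also worth knowing. The following sketch compares the two and checks the defining property of an identity matrix. The matrix `m` is made-up example data.

```python
import numpy as np

ident = np.identity(3)

# For square arrays np.eye gives the same result as np.identity.
print(np.array_equal(ident, np.eye(3)))  # True

# np.eye can also build non-square arrays and shift the diagonal with k.
print(np.eye(2, 3, k=1).tolist())  # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Multiplying by the identity matrix leaves a matrix unchanged.
m = np.arange(9).reshape(3, 3)
print(np.array_equal(m @ ident, m))  # True
```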

## Exercise: `random`

In [41]:
rng = np.random.RandomState(seed=42)

Construct an array containing 10 values drawn uniformly at random from the interval `[-1, 1]`.

In [54]:
# rng.uniform?
'''rng.uniform draws an array of the given size from the uniform distribution
between two endpoints (the upper endpoint is excluded)'''
rng.uniform(-1,1,size=10)

array([-0.60065244,  0.02846888,  0.18482914, -0.90709917,  0.2150897 ,
       -0.65895175, -0.86989681,  0.89777107,  0.93126407,  0.6167947 ])

Construct a 3 x 3 array with values drawn from a normal distribution centered at 0 with a standard deviation of 2.5.

In [57]:
# rng.normal?
'''rng.normal draws an array of the given shape from the normal distribution with the given
mean (first argument) and standard deviation (second argument)'''
rng.normal(0,2.5,size=(3,3))

array([[-4.89917531, -3.32046512,  0.49215309],
       [ 1.84616645,  0.4284207 , -0.28912071],
       [-0.75275924, -3.69630498, -1.79961052]])
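Below is a minimal sketch of what the seed and the distribution parameters do. It assumes the same legacy `np.random.RandomState` interface used above. The second generator and the large sample size are illustrative choices, and the sampled mean and standard deviation are only approximately 0 and 2.5.

```python
import numpy as np

rng_a = np.random.RandomState(seed=42)
rng_b = np.random.RandomState(seed=42)

# Two generators created with the same seed produce the same sequence of draws.
u_a = rng_a.uniform(-1, 1, size=10)
u_b = rng_b.uniform(-1, 1, size=10)
print(np.array_equal(u_a, u_b))  # True

# Uniform draws fall in the half-open interval [low, high).
print(bool(((u_a >= -1) & (u_a < 1)).all()))  # True

# For normal, loc is the mean and scale is the standard deviation;
# a large sample should have statistics close to those parameters.
big = rng_a.normal(loc=0, scale=2.5, size=100_000)
print(round(big.mean(), 2), round(big.std(), 2))
```

Re-running a cell that draws from an existing generator advances its state, so the numbers change from run to run unless the generator is re-created with the same seed.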

## `pandas.read_csv`

Many people use the `pandas` module to read numerical data from external sources. The `.csv` (comma-separated value) format is often used for small and medium-sized datasets.

In [58]:
import pandas as pd

We can read a CSV into a DataFrame using `pandas.read_csv`.

In [59]:
'''pd.read_csv reads a CSV file (extremely common for small and medium-sized data sets). It can also
select the index column, parse dates, etc.'''

prices = pd.read_csv('prices.csv', index_col='dt', parse_dates=['dt'])
prices.head()

| dt | AAPL | MSFT | TSLA | MCD | BK |
|---|---|---|---|---|---|
| 2017-10-02 13:31:00 | 154.34 | 74.88 | 342.33 | 156.38 | 52.736 |
| 2017-10-02 13:32:00 | 154.07 | 74.832 | 341.48 | 156.66 | 52.686 |
| 2017-10-02 13:33:00 | 153.72 | 74.835 | 341.83 | 156.324 | 52.756 |
| 2017-10-02 13:34:00 | 153.69 | 74.89 | 341.24 | 156.66 | 52.726 |
| 2017-10-02 13:35:00 | 153.45 | 74.81 | 341.873 | 156.67 | 52.706 |


DataFrames are composed of three parts:

- `index`, an array of row-labels
- `columns`, an array of column-labels
- `values`, an array of table values.

We can get a numpy array for each of these attributes by using the `.values` attribute:

In [60]:
'''The .values attribute of the index, the columns, or the DataFrame itself returns the underlying data as a numpy array'''
prices.index.values

array(['2017-10-02T13:31:00.000000000', '2017-10-02T13:32:00.000000000',
       '2017-10-02T13:33:00.000000000', ...,
       '2017-10-31T19:58:00.000000000', '2017-10-31T19:59:00.000000000',
       '2017-10-31T20:00:00.000000000'], dtype='datetime64[ns]')

In [61]:
prices.columns.values

array(['AAPL', 'MSFT', 'TSLA', 'MCD', 'BK'], dtype=object)

In [62]:
prices.values

array([[154.34 ,  74.88 , 342.33 , 156.38 ,  52.736],
       [154.07 ,  74.832, 341.48 , 156.66 ,  52.686],
       [153.72 ,  74.835, 341.83 , 156.324,  52.756],
       ...,
       [169.1  ,  83.18 , 331.44 , 166.82 ,  51.415],
       [169.13 ,  83.17 , 331.8  , 166.87 ,  51.42 ],
       [169.05 ,  83.18 , 331.52 , 166.9  ,  51.45 ]])

**Exercise:** Use `pd.read_csv` to load the file "volumes.csv".

In [63]:
dummynew=pd.read_csv("volumes.csv")

**Exercise:** Get a numpy array of datetimes representing the row-labels of the DataFrame.

In [70]:
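# 'dt' was not parsed on load, so these are strings (dtype=object); passing parse_dates=['dt'] to read_csv would give datetime64 values
dummynew['dt'].values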

array(['2017-10-02 13:31:00+00:00', '2017-10-02 13:32:00+00:00',
       '2017-10-02 13:33:00+00:00', ..., '2017-10-31 19:58:00+00:00',
       '2017-10-31 19:59:00+00:00', '2017-10-31 20:00:00+00:00'],
      dtype=object)

**Exercise:** Get a numpy array of strings representing the column-labels of the DataFrame.

In [107]:
newarraystring=np.array(dummynew.columns)
# print(newarraystring+newarraystring)
dummynew.columns.values

dummynew.values

array([['2017-10-02 13:31:00+00:00', 420042.0, 409211.0, 49907.0,
        85774.0, 30276.0],
       ['2017-10-02 13:32:00+00:00', 161960.0, 49207.0, 18480.0, 6866.0,
        4511.0],
       ['2017-10-02 13:33:00+00:00', 118283.0, 24043.0, 47039.0, 3000.0,
        3001.0],
       ...,
       ['2017-10-31 19:58:00+00:00', 308468.0, 191973.0, 24702.0, 8959.0,
        32288.0],
       ['2017-10-31 19:59:00+00:00', 343843.0, 198143.0, 35814.0,
        13696.0, 56411.0],
       ['2017-10-31 20:00:00+00:00', 661452.0, 610933.0, 59772.0,
        58947.0, 177300.0]], dtype=object)

**Exercise:** Get a numpy array of floats representing the table values of the DataFrame.
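One possible approach to the three exercises above is sketched here on made-up data. The CSV text, the column names `AAA` and `BBB`, and the variable names are hypothetical stand-ins for the real file. The key assumption is that the file is loaded with `index_col` and `parse_dates`, as was done for `prices.csv`. That way the timestamps become the index and the remaining columns are all numeric.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a file such as "volumes.csv".
csv_text = """dt,AAA,BBB
2017-10-02 13:31:00,10.0,1.5
2017-10-02 13:32:00,20.0,2.5
2017-10-02 13:33:00,30.0,3.5
"""

frame = pd.read_csv(io.StringIO(csv_text), index_col="dt", parse_dates=["dt"])

# .to_numpy() is the method form of the .values attribute.
row_labels = frame.index.to_numpy()       # datetimes
column_labels = frame.columns.to_numpy()  # strings
table = frame.to_numpy()                  # floats

print(row_labels.dtype)          # a datetime64 dtype
print(list(column_labels))       # ['AAA', 'BBB']
print(table.dtype, table.shape)  # float64 (3, 2)
```

Without `index_col`, the timestamp strings stay in the table as an ordinary column, and the values array falls back to `dtype=object`.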

## Reshaping Arrays

Once we've created or loaded an array, a common next step is to reshape the array.

The most general way to reshape an array is to use the `.reshape` method of `ndarray`. `.reshape` accepts a tuple of new dimensions and returns an array containing the same elements arranged in that shape.

In [77]:
data = np.arange(12)

**Exercise:** Reshape `data` into an array with three rows and four columns.

In [81]:
# data.reshape?
'''reshape changes the shape of an array without changing its elements or their order. It only works if the new
shape holds exactly as many elements as the array has. E.g. a shape of (3,5) is invalid here, because 12 elements
cannot fill an array that requires 15 elements.
'''

data.reshape((3,4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

**Exercise:** Reshape `data` into an array with four rows and three columns:

In [79]:
data.reshape(4,3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

**Exercise:** Reshape `data` into an array of shape `(2, 2, 3)`.

In [82]:
data.reshape(2,2,3)

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
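The following sketch covers three reshape behaviours that the exercises above touch on only indirectly: inferring a dimension with `-1`, the error raised for an incompatible shape, and the result sharing memory with the original array. It assumes a contiguous array such as the `np.arange(12)` used above.

```python
import numpy as np

data = np.arange(12)

# -1 asks numpy to infer that dimension from the total number of elements.
inferred = data.reshape(3, -1)
print(inferred.shape)  # (3, 4)

# A shape whose dimensions do not multiply to 12 is rejected.
try:
    data.reshape(3, 5)
except ValueError as err:
    print("reshape failed:", err)

# For a contiguous array, reshape returns a view rather than a copy,
# so writing through the reshaped array also changes the original.
grid = data.reshape(3, 4)
grid[0, 0] = 99
print(data[0])  # 99
```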

## Transpose

A common pattern, especially when doing linear algebra with 2D arrays, is to need to swap the rows and columns of an array, which flips it across its main diagonal. This operation is commonly known as "transposing" the array.

**Exercise:** Use the `.transpose()` method to convert data from a `2 x 4` array into a `4 x 2` array.

In [83]:
data = np.arange(8).reshape(2, 4)
data.transpose()

array([[0, 4],
       [1, 5],
       [2, 6],
       [3, 7]])

In [None]:
data.transpose?

Transposing arrays is so common in linear algebra that numpy provides a shorthand for it. The `.T` property provides a transposed view of an array.

**Exercise:** Transpose `data` using the `.T` property.

In [86]:
'''Transposing swaps the rows and columns of the array; .T is shorthand for .transpose()'''
data.T

array([[0, 4],
       [1, 5],
       [2, 6],
       [3, 7]])
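As a small sketch of what transposing does to shapes and indices, the example below uses the same `2 x 4` array as above. The name `flipped` is illustrative.

```python
import numpy as np

data = np.arange(8).reshape(2, 4)
flipped = data.T

print(flipped.shape)  # (4, 2)

# The element at [i, j] in the original sits at [j, i] in the transpose.
print(bool(flipped[3, 1] == data[1, 3]))  # True

# .T and .transpose() give the same result for a 2-D array.
print(np.array_equal(data.T, data.transpose()))  # True

# The transpose is a view: it shares memory with the original array.
print(np.shares_memory(flipped, data))  # True
```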

## Measuring Performance of Numpy vs. Pure Python

We've seen that numpy allows us to run simple numerical computations much faster than pure Python. To show that, we used a few different functions and tools:

- We used a `dot_product` function implemented in pure Python:

```python
def python_dot_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))
```
- We used two numpy implementations of `dot_product`:

```python
def manual_numpy_dot(xs, ys):
    return (xs * ys).sum()

def native_numpy_dot(xs, ys):
    return xs.dot(ys)
```

- We used IPython's `%%timeit` magic as a simple way to measure how long a cell takes to run on average.

Unfortunately, nothing in programming comes for free. Numpy allows us to speed up computations on large arrays by performing one complex dispatch **per array** instead of a cheap dispatch **per array element**. **This only gives us a speedup if we have many array elements.**
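Before comparing speed, it is worth confirming that the three implementations agree. The sketch below does this on small 1-dimensional inputs. The input values are made up for illustration, and the function bodies are repeated from above so the block runs on its own.

```python
import numpy as np

def python_dot_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def manual_numpy_dot(xs, ys):
    return (xs * ys).sum()

def native_numpy_dot(xs, ys):
    return xs.dot(ys)

xs_list, ys_list = list(range(5)), list(range(5, 10))
xs_arr, ys_arr = np.array(xs_list), np.array(ys_list)

# 0*5 + 1*6 + 2*7 + 3*8 + 4*9 = 80
print(python_dot_product(xs_list, ys_list))  # 80
print(manual_numpy_dot(xs_arr, ys_arr))      # 80
print(native_numpy_dot(xs_arr, ys_arr))      # 80
```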

### Exercise:
Using the `%%timeit` magic, figure out how many data points you need to have for a numpy dot product to be faster than a pure-python implementation.

You can use the `make_list` function below to create Python lists of a given size. Use any of the functions from the exercises above to make numpy arrays. Be sure not to include the list/array creation in your timings (that probably means you want to use separate cells for constructing arrays and testing timings).

In [87]:
def make_list(size):
    return list(range(size))

def python_dot_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def manual_numpy_dot(xs, ys):
    return (xs * ys).sum()

def native_numpy_dot(xs, ys):
    return xs.dot(ys)

In [99]:
dummy1=np.arange(10000000).reshape(500,20000)

In [100]:
%%timeit
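# Note: dummy1 is 2-D, so zip pairs up its 500 rows and the result is a length-20000 array, not a scalar dot product
python_dot_product(dummy1,dummy1)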

19.5 ms ± 687 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [101]:
%%timeit
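# Note: this is a full (500, 20000) x (20000, 500) matrix product, so its timing is not comparable with the cell above
dummy1.dot(dummy1.T)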

4.24 s ± 39.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
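The two timed cells above run different computations, so their timings cannot answer the exercise. A like-for-like comparison needs 1-dimensional inputs of the same size for both implementations. The sketch below is one way to set that up outside a notebook, using the standard-library `timeit` module in place of the `%%timeit` magic. The helper `time_both`, the chosen sizes, and the repeat count are all illustrative assumptions. No timings are quoted here because they depend on the machine.

```python
import timeit

import numpy as np

def python_dot_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def native_numpy_dot(xs, ys):
    return xs.dot(ys)

def time_both(size, repeats=20):
    """Return (pure-Python, numpy) seconds per call for 1-D inputs of the given size."""
    # Inputs are built before timing starts, so construction cost is excluded.
    xs_list = list(range(size))
    xs_arr = np.arange(size)
    py = timeit.timeit(lambda: python_dot_product(xs_list, xs_list), number=repeats) / repeats
    npy = timeit.timeit(lambda: native_numpy_dot(xs_arr, xs_arr), number=repeats) / repeats
    return py, npy

for size in (1, 10, 100, 1_000, 10_000):
    py, npy = time_both(size)
    print(f"{size:>6}: python {py:.2e} s, numpy {npy:.2e} s")
```

Scanning the printed rows for the first size at which the numpy column is smaller gives a rough estimate of the crossover point on your machine.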
