# NumPy Basics

Numerical Python (NumPy) is a successor to the Numeric package. It was originally written by Travis Oliphant and had its first stable release in 2006.

In [12]:
my_list = [1, 2, 3]

In [16]:
for i in range(len(my_list)):
    my_list[i] +=1

In [15]:
my_list

[2, 3, 4]

NumPy provides a simple yet powerful data structure: the **n-dimensional array**. 
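As a minimal sketch of what that data structure buys you, an element-by-element loop like the one at the top of this page collapses to a single expression once the values live in an array. The name `my_array` and its values are chosen only for illustration.

```python
import numpy as np

# Illustrative values stored in an ndarray instead of a list.
my_array = np.array([1, 2, 3])

# One expression increments every element; no index bookkeeping is needed.
my_array = my_array + 1

print(my_array)  # [2 3 4]
```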

**In this tutorial you’ll learn:**

- What **core concepts** in data science are made possible by NumPy
- How to create **NumPy arrays** using various methods
- How to manipulate NumPy arrays to perform **useful calculations**
- How to apply these new skills to **real-world problems**

## Choosing NumPy: The Benefits

Here are the top three benefits that NumPy can bring to your code:

1. **More speed:** NumPy uses algorithms written in C, so array operations often run many times faster than the equivalent pure-Python loops.
2. **Fewer loops:** NumPy helps you to reduce loops and keep from getting tangled up in iteration indices.
3. **Clearer code:** Without loops, your code will look more like the equations you’re trying to calculate.

Because of these benefits, NumPy is the de facto standard for multidimensional arrays in Python data science, and many of the most popular libraries are built on top of it.

## Installing NumPy

It’s time to get everything set up so you can start learning how to work with NumPy. There are a few different ways to do this, and you can’t go wrong by following the instructions on the NumPy website. But there are some extra details to be aware of that are outlined below.

### Installing NumPy With Anaconda


```bash
$ conda install numpy
```

This will install what you need for this NumPy tutorial, and you’ll be all set to go.

### Installing NumPy With `pip`

Although the NumPy project recommends using `conda` if you’re starting fresh, there’s nothing wrong with managing your environment yourself and just using good old `pip`, Pipenv, Poetry, or whatever other alternative to `pip` is your favorite.

Here are the commands to get set up with `pip`:

```bash
$ pip install numpy
```

Run this command inside an activated virtual environment, and all your code should run as expected.

## The Need for NumPy Arrays

A fundamental question that beginners ask is: why are arrays necessary for scientific computing at all? Surely, one can perform complex mathematical operations on any other data type, such as a list. The answer lies in the numerous properties of arrays that make them significantly more useful. In this section, let's go over a few of these properties to emphasize why something such as the NumPy `ndarray` object exists at all.

### Representation of matrices and vectors
In scientific literature, an expression such as $A_{ij}$ is typically used to denote the element in the $i^{th}$ row and $j^{th}$ column of array `A`. The corresponding expression in NumPy would simply be `A[i,j]`. For matrix operations, NumPy arrays also support **vectorization** (covered later, in the Vectorized Operations section), which speeds up execution greatly.
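The following minimal sketch illustrates the `A[i,j]` notation on a small, made-up matrix. One assumption worth stating: NumPy indices start at 0, whereas mathematical notation usually starts at 1.

```python
import numpy as np

# A hypothetical 2 x 3 matrix used only for illustration.
A = np.array([[1, 2, 3],
              [4, 5, 6]])

i, j = 1, 2
print(A[i, j])       # 6: second row, third column (indices start at 0)

# The nested-list equivalent needs two separate indexing steps.
A_list = A.tolist()
print(A_list[i][j])  # 6
```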

### Efficiency

NumPy arrays outperform most other data structures for numerical work (with a few exceptions, such as pandas DataFrames or SciPy's sparse matrices, which we shall deal with later). Since NumPy arrays are statically typed and homogeneous, fast mathematical operations can be implemented in compiled languages (the default implementation uses C and Fortran). This efficiency, meaning the availability of fast algorithms working on homogeneous arrays, is what makes NumPy popular and important.
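To make "statically typed and homogeneous" concrete, here is a small sketch that inspects how an array stores its elements. The variable names are illustrative only.

```python
import numpy as np

values = np.arange(5, dtype=np.int64)

print(values.dtype)     # int64: one type for the whole array
print(values.itemsize)  # 8: bytes used by each element
print(values.nbytes)    # 40: total bytes for the 5 elements

# A list can hold unrelated objects, so Python has to check each type at run time.
mixed = [1, "two", 3.0]
print([type(item).__name__ for item in mixed])  # ['int', 'str', 'float']
```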

## Array Objects

NumPy provides an N-dimensional array type, the `ndarray`, which describes a collection of “items” of the same type. The items can be indexed using, for example, N integers.

**All ndarrays are homogeneous.**

An item extracted from an array, e.g., by indexing, is represented by a Python object whose type is one of the array scalar types built into NumPy. The array scalars also make it easy to manipulate more complicated arrangements of data.

## The N-dimensional Array (`ndarray`)

An `ndarray` is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its `shape`, which is a `tuple` of N non-negative integers that specify the sizes of each dimension. The type of items in the array is specified by a separate data-type object (`dtype`), one of which is associated with each ndarray.

As with other container objects in Python, the contents of an ndarray can be accessed and modified by indexing or slicing the array (using, for example, N integers), and via the methods and attributes of the ndarray.
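The attributes and access patterns described above can be sketched on a small example array. The values and the `int32` data type are chosen only for illustration.

```python
import numpy as np

# Illustrative 2 x 3 array of 4-byte integers.
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(a.shape)  # (2, 3): two rows, three columns
print(a.ndim)   # 2: number of dimensions
print(a.size)   # 6: total number of items
print(a.dtype)  # int32

print(a[1, 2])  # 6: indexing with N = 2 integers
print(a[:, 1])  # [2 5]: slicing out the second column

a[0, 0] = 10    # contents can be modified in place
print(a[0])     # [10  2  3]
```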

### Creating Arrays

#### Creating Arrays from List

The simplest way to create an array is using the `array` function:


A 2-dimensional array of size 2 x 3, composed of 8-byte floating-point elements (the `1.0` in the input makes every element a float):

In [3]:
import numpy as np

In [9]:
x = np.array([[1.0, 2, 3], [4, 5, 6]])

In [10]:
type(x)

numpy.ndarray

In [11]:
x.shape

(2, 3)

In [12]:
x.dtype

dtype('float64')

In [12]:
x = np.array(['hello', 'world', 3])

In [13]:
type(x[2])

numpy.str_

Python lists and tuples can always be passed to `array`. When creating an array from lists or tuples, the input may consist of different (heterogeneous) data types. The `array` function, however, will normally cast all input elements to a single data type that can represent all of them. For example, if a list contains both floats and integers, the resulting array will be of type float. If it contains an integer and a boolean, the resulting array will consist of integers. If it mixes strings and numbers, as in the example above, every element becomes a string.
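A short sketch of these casting rules follows. It checks `dtype.kind` rather than an exact type name, on the assumption that the precise integer and string widths can vary by platform and NumPy version.

```python
import numpy as np

floats = np.array([1, 2.5])    # integer + float   -> float
ints = np.array([1, True])     # integer + boolean -> integer
strings = np.array(["a", 1])   # string + integer  -> string

print(floats.dtype)            # float64
print(ints.dtype.kind)         # i (a signed integer type)
print(ints)                    # [1 1]: True was cast to 1
print(strings.dtype.kind)      # U (a Unicode string type)
```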

One of the handiest ways of creating sequences, and therefore arrays, of integers is the `range` function, or NumPy's own `arange`, which returns an array directly:

In [14]:
x = range(5)

In [15]:
y = np.array(x)

In [20]:
x = np.arange(10)

In [21]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

#### Creating random arrays

The `random` module in NumPy provides various functions for working with random numbers. You can use it to:
- Create random arrays
- Create random permutations of arrays
- Generate arrays with specific probability distributions

For the purposes of this section, we will be focusing on two important functions in the random module: `rand` and `random`.

In [17]:
x = np.random.rand(2, 2, 2, 4)

In [19]:
x.shape

(2, 2, 2, 4)

In [88]:
y = np.random.random((2, 3, 4))

In [89]:
y.shape

(2, 3, 4)

In [90]:
y

array([[[0.33603821, 0.87942418, 0.8595046 , 0.59967352],
        [0.39591039, 0.62590545, 0.48937191, 0.93119028],
        [0.98885106, 0.48437713, 0.50686744, 0.03488253]],

       [[0.43570412, 0.58526035, 0.00675684, 0.93169393],
        [0.51591668, 0.29233314, 0.83111965, 0.27099091],
        [0.58357156, 0.51186415, 0.17128149, 0.76128687]]])

**What is the difference between the two statements above?**
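One way to explore that question is to call both functions with the same target shape and compare the results. This is only a sketch; the names `a` and `b` are illustrative.

```python
import numpy as np

# rand takes each dimension as a separate positional argument.
a = np.random.rand(2, 3, 4)

# random takes the whole shape as a single tuple.
b = np.random.random((2, 3, 4))

print(a.shape == b.shape)  # True

# Both draw floats uniformly from the half-open interval [0, 1).
print(bool(a.min() >= 0.0 and a.max() < 1.0))  # True
print(bool(b.min() >= 0.0 and b.max() < 1.0))  # True
```

In other words, the two calls differ in how the shape is passed, not in the kind of values they produce.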

To create random arrays of integers, you can use the `randint` function.

In [3]:
LOW, HIGH = 1, 11 
SIZE = 10
np.random.randint(LOW, HIGH, size=SIZE) 

array([2, 4, 4, 8, 7, 1, 1, 1, 3, 4])

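The `randint` function is usually called with up to three arguments, of which two are optional. The first argument, `low`, is the (inclusive) lower limit of the output values, and the optional second argument, `high`, is the (exclusive) upper limit. If `high` is omitted, values are drawn from 0 up to, but not including, `low`, as in the next example. The optional `size` argument is an integer or a tuple that determines the shape of the output array; without it, a single integer is returned.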

In [45]:
np.random.randint(10) 

4
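The argument handling described above can be checked with a small sketch. The values are random, so only the shape and the bounds are verified; the names `rolls` and `single` are illustrative.

```python
import numpy as np

# Two bounds: integers from 1 up to, but not including, 11.
rolls = np.random.randint(1, 11, size=(2, 5))
print(rolls.shape)                                 # (2, 5)
print(bool(rolls.min() >= 1 and rolls.max() <= 10))  # True

# One bound: it acts as the exclusive upper limit, with 0 as the lower limit.
single = np.random.randint(10)
print(0 <= single < 10)                            # True
```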

#### Other arrays

There are a few other array creation functions, such as `zeros()`, `ones()`, `eye()`, and others (similar to the ones in MATLAB) that can be used to create NumPy arrays. Their use is fairly straightforward.

In [94]:
np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [95]:
np.ones((3, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [98]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [107]:
np.eye(3, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

In [108]:
np.eye(3, 4, 2)

array([[0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
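The three-argument calls above follow the pattern `np.eye(N, M, k)`: `N` rows, `M` columns, and ones on the diagonal with offset `k`. The sketch below shows a positive and a negative offset and uses `np.diag` to read the chosen diagonal back; the variable names are illustrative.

```python
import numpy as np

above = np.eye(3, 4, k=1)    # ones on the first diagonal above the main one
below = np.eye(3, 4, k=-1)   # ones on the first diagonal below the main one

print(above)
print(below)

# np.diag extracts a diagonal, here the one that holds the ones.
print(np.diag(above, k=1))   # [1. 1. 1.]
print(np.diag(below, k=-1))  # [1. 1.]
```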

## Array Data Types

Data types are another important intrinsic aspect of a NumPy array, alongside its memory layout and indexing (discussed later). The data type of a NumPy array can be found by simply checking the `dtype` attribute of the array. Try out the following examples to check the data types of different arrays:

In [109]:
x = np.random.random((10, 10)) 

In [111]:
x.dtype

dtype('float64')

In [21]:
x = np.array(range(10)) 

In [22]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [113]:
x.dtype

dtype('int64')

In [114]:
x = np.array(['hello', 'world']) 

In [116]:
x.dtype

dtype('<U5')

Many array creation functions provide a default array data type. For example, the `np.zeros` and `np.ones` functions create arrays that are full of floats by default. But it is possible to make them create arrays of other data types too. Consider the following examples that demonstrate how to use the dtype argument to create arrays of arbitrary data types.

In [121]:
x = np.ones((3, 3), dtype=int) 

In [122]:
x.dtype

dtype('int64')

In [28]:
x = np.zeros((3, 3))

In [29]:
x.dtype

dtype('float64')

In [124]:
x = np.zeros((3, 3), dtype=np.int32)

In [125]:
x.dtype

dtype('int32')

For a complete list of data types supported by NumPy, refer to https://numpy.org/doc/stable/user/basics.types.html.
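As a further sketch of the `dtype` argument, the example below requests a few different data types at creation time and also converts an existing array with `astype`. The variable names and values are illustrative only.

```python
import numpy as np

small = np.zeros((3, 3), dtype=np.float32)
print(small.dtype, small.itemsize)  # float32 4

flags = np.ones(4, dtype=bool)
print(flags)                        # [ True  True  True  True]

# astype returns a converted copy; converting floats to integers
# drops the fractional part.
as_int = np.array([1.7, 2.2, -0.5]).astype(np.int64)
print(as_int)                       # [1 2 0]
```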

## Hello NumPy: Curving Test Grades Tutorial

This first example introduces a few core concepts in NumPy that you’ll use throughout the rest of the tutorial:

- Creating arrays using `numpy.array()`
- Treating complete arrays like individual values to make vectorized calculations more readable
- Using built-in NumPy functions to modify and aggregate the data


These concepts are the core of using NumPy effectively.

The scenario is this: You’re a teacher who has just graded your students on a recent test. Unfortunately, you may have made the test too challenging, and most of the students did worse than expected. To help everybody out, you’re going to **curve** everyone’s grades.

In [126]:
import numpy as np

CURVE_CENTER = 80
grades = np.array([72, 35, 64, 88, 51, 90, 74, 12])

def curve(grades):
    average = grades.mean()
    change = CURVE_CENTER - average
    new_grades = grades + change
    
    return np.clip(new_grades, grades, 100)

curve(grades)

array([ 91.25,  54.25,  83.25, 100.  ,  70.25, 100.  ,  93.25,  31.25])

The original scores have been increased based on where they were in the pack, but none of them were pushed over 100%.

Here are the important highlights:

- **Line 1** imports NumPy using the `np` alias, which is a common convention that saves you a few keystrokes.
- **Line 4** (`grades = np.array(...)`) creates your first NumPy array, which is one-dimensional and has a shape of `(8,)` and a data type of `int64`. Don’t worry too much about these details yet. You’ll explore them in more detail later.
- **Line 7** (`average = grades.mean()`) takes the average of all the scores using [`.mean()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.mean.html#numpy.ndarray.mean). Arrays have a lot of [methods](https://numpy.org/doc/stable/reference/arrays.ndarray.html#array-methods).

On line 9 (`new_grades = grades + change`), you take advantage of two important concepts at once:

- Vectorization
- Broadcasting

**Vectorization** is the process of performing the same operation in the same way for each element in an array. This removes `for` loops from your code but achieves the same result.

**Broadcasting** is the process of extending two arrays of different shapes and figuring out how to perform a vectorized calculation between them. Remember, `grades` is an array of numbers of shape `(8,)` and `change` is a **scalar**, or single number, which broadcasting treats as if it were repeated to fill shape `(8,)`. In this case, NumPy adds the scalar to each item in the array and returns a new array with the results.
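A minimal sketch of broadcasting follows, first with a scalar as in the grades example and then between two arrays of different shapes. The shortened grade list and the `column` and `row` arrays are made up for illustration.

```python
import numpy as np

grades = np.array([72, 35, 64, 88])
change = 19.25  # a plain Python float, that is, a scalar

# The scalar is broadcast across all four elements.
curved = grades + change
print(curved)       # [ 91.25  54.25  83.25 107.25]

# Broadcasting also works between arrays:
# shapes (3, 1) and (4,) combine into shape (3, 4).
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
table = column + row
print(table.shape)  # (3, 4)
print(table[2])     # [21 22 23 24]
```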

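Finally, on line 11 (the `return` statement), you limit, or **clip**, the values to a set of minimums and maximums: no curved grade can fall below the student’s original grade or rise above 100.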
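To see how `np.clip` treats its bounds, consider this small sketch. It uses a hypothetical downward shift so that the array-valued lower bound actually takes effect.

```python
import numpy as np

original = np.array([72, 88, 12])
shifted = original - 30.0  # a hypothetical curve that lowers every grade

# The lower bound is an array (each student's own grade); the upper bound is a scalar.
protected = np.clip(shifted, original, 100)
print(protected)           # [72. 88. 12.]

# With scalar bounds, every element is limited to the same range.
limited = np.clip(np.array([-5, 50, 120]), 0, 100)
print(limited)             # [  0  50 100]
```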

## Vectorized Operations

In [36]:
x = np.array([1, 2, 3, 4])

In [5]:
x + 2

array([3, 4, 5, 6])

As shown, 2 is added to every element of the array in a single operation. This is very different from plain Python lists, where you would write an explicit loop. The elements in a NumPy array all have the same `dtype`; in the preceding example, this is an integer type (`int64` on most platforms, `int32` on some); therefore, NumPy can skip the per-element type checks that Python ordinarily performs at runtime. Other arithmetic operators work element-wise in the same way:

In [35]:
y = np.array([-1, 2, 3, 0]) 

In [138]:
x * y

array([-1,  4,  9,  0])

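The result has the same shape as the input arrays, because `*` multiplies element by element. For matrix multiplication, use `numpy.dot()` or the `@` operator; for two one-dimensional arrays, both return the inner (dot) product, a single number. Take a look at this example: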

In [146]:
np.dot(x, y)

12

In [148]:
x @ y

12
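The difference between `*` and `@` is easier to see on two-dimensional arrays. The matrices below are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B  # multiplies matching positions
product = A @ B      # rows of A times columns of B

print(elementwise)   # [[0 2]
                     #  [3 0]]
print(product)       # [[2 1]
                     #  [4 3]]
```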

NumPy also supports logical comparison between two arrays, and the comparison is vectorized as well. The result is a Boolean array that indicates, position by position, which elements of the two arrays are equal. You can use that Boolean array as a mask to select the matching elements. If the two arrays have shapes that cannot be broadcast together, no element-wise comparison takes place: older NumPy versions returned a single `False`, while recent versions raise an error instead.

In [41]:
x

array([1, 2, 3, 4])

In [42]:
y

array([-1,  2,  3,  0])

In [38]:
my_bol = x == y

In [39]:
my_bol

array([False,  True,  True, False])

In [43]:
x[my_bol]

array([2, 3])
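A Boolean result like this can also be summarized. The sketch below uses `any`, `all`, and `sum` on the mask, and `np.array_equal` for a whole-array check that returns `False` instead of failing when the shapes differ. The name `mask` is illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([-1, 2, 3, 0])

mask = x == y
print(bool(mask.any()))  # True: at least one pair matches
print(bool(mask.all()))  # False: not every pair matches
print(int(mask.sum()))   # 2: number of matching positions

# Whole-array equality, safe to call on arrays of different shapes.
print(np.array_equal(x, y))                    # False
print(np.array_equal(x, np.array([1, 2, 3])))  # False
```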

From the preceding examples, we get an insight into NumPy's element-wise operations, but what's the benefit of using them? How can we know that an optimization has been made through these NumPy operations? We will use the `%timeit` magic command to show you the difference between a NumPy operation and a Python `for` loop:

In [46]:
x = np.arange(1000)

In [47]:
%timeit x + 1

985 ns ± 16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [48]:
y = range(1000)

In [49]:
%timeit [i+1 for i in y]

48.7 µs ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
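`%timeit` is only available in IPython and Jupyter. As a rough sketch of the same comparison in a plain Python script, you could use the standard library's `timeit` module. The repetition count of 1,000 is an arbitrary choice, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np

x = np.arange(1000)
y = range(1000)
repeats = 1_000  # arbitrary number of repetitions for the measurement

# Total time in seconds for `repeats` calls of each version.
numpy_time = timeit.timeit(lambda: x + 1, number=repeats)
loop_time = timeit.timeit(lambda: [i + 1 for i in y], number=repeats)

print(f"NumPy:       {numpy_time / repeats * 1e6:.2f} µs per call")
print(f"Python loop: {loop_time / repeats * 1e6:.2f} µs per call")

# Both versions compute the same values.
same_result = np.array_equal(x + 1, np.array([i + 1 for i in y]))
print(same_result)  # True
```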
