# Introduction

The `NumPy` (Numerical Python) library first appeared in 2006 and is the preferred
Python array implementation. It offers a high-performance, richly functional
n-dimensional array type called `ndarray`, which from this point forward we’ll refer to by
its synonym, array. NumPy is one of the many open-source libraries that the
Anaconda Python distribution installs. Operations on arrays are up to two orders of magnitude faster than those on lists.

# Creating `array`s from Existing Data

The NumPy documentation recommends importing the numpy module as np so that
you can access its members with "np."

In [1]:
import numpy as np

The numpy module provides various functions for creating arrays. Here we use the
array function, which receives as an argument an array or other collection of
elements and returns a new array containing the argument’s elements.

In [None]:
numbers = np.array([2, 3, 5, 7, 11])

In [None]:
type(numbers)

In [None]:
numbers

### Multidimensional Arguments

The array function copies its argument’s dimensions.

In [None]:
np.array([[1, 2, 3], [4, 5, 6]])

#  `array` Attributes 

An array object provides attributes that enable you to discover information about its
structure and contents.

In [None]:
import numpy as np

In [None]:
integers = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
integers

In [None]:
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

In [None]:
floats

### Determining an `array`’s Element Type

The array function determines an array’s element type from its argument’s elements.
You can check the element type with an array’s `dtype` attribute.

In [None]:
integers.dtype

In [None]:
floats.dtype

### Determining an `array`’s Dimensions

The attribute `ndim` contains an array’s number of dimensions and the attribute
`shape` contains a tuple specifying an array’s dimensions.

In [None]:
integers.ndim

In [None]:
floats.ndim

In [None]:
integers.shape

In [None]:
floats.shape

### Determining an `array`’s Number of Elements and Element Size

You can view an array’s total number of elements with the attribute `size` and the
number of bytes required to store each element with `itemsize`.

In [None]:
integers.size

In [None]:
integers.itemsize

In [None]:
floats.size

In [None]:
floats.itemsize

### Iterating through a Multidimensional `array`’s Elements

In [None]:
for row in integers:
    for column in row:
        print(column, end='  ')
    print() 

In [None]:
for i in integers.flat:
    print(i, end='  ')

#  Filling `array`s with Specific Values

NumPy provides functions `zeros`, `ones` and `full` for creating arrays containing 0s,
1s or a specified value, respectively. By default, zeros and ones create arrays
containing float64 values. We’ll show how to customize the element type
momentarily. The first argument to these functions must be an integer or a tuple of
integers specifying the desired dimensions. For an integer, each function returns a one-dimensional
array with the specified number of elements.

In [None]:
import numpy as np

In [None]:
np.zeros(5)

In [None]:
np.ones((2, 4), dtype=int)

In [None]:
np.full((3, 5), 13)

#  Creating `array`s from Ranges 

NumPy provides optimized functions for creating arrays from ranges.

### Creating Integer Ranges with `arange`

Let’s use NumPy’s `arange` function to create integer ranges, similar to using the
built-in function `range`. In each case, arange first determines the resulting array’s number
of elements, allocates the memory, then stores the specified range of values in the
array.

In [None]:
import numpy as np

In [None]:
np.arange(5)

In [None]:
np.arange(5, 10)

In [None]:
np.arange(10, 1, -2)

Though you can create arrays by passing ranges as arguments, always use arange
as it’s optimized for arrays.
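
Both approaches produce the same values, as the following small check suggests. The variable names here are illustrative only.

```python
import numpy as np

from_range = np.array(range(5))   # builds the array from a Python range object
from_arange = np.arange(5)        # lets NumPy generate the values directly

print(np.array_equal(from_range, from_arange))  # True
```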

### Creating Floating-Point Ranges with `linspace` 

You can produce evenly spaced floating-point
ranges with NumPy’s `linspace`
function. The function’s first two arguments specify the starting and ending values in
the range, and the ending value is included in the array. The optional keyword
argument num specifies the number of evenly spaced values to produce—this
argument’s default value is 50.

In [None]:
np.linspace(0.0, 1.0, num=5)

### Reshaping an `array` 

You also can create an array from a range of elements, then use array method
`reshape` to transform the one-dimensional
array into a multi-dimensional array. 

In [None]:
np.arange(1, 21).reshape(4, 5)

### Displaying Large `array`s 

When displaying an array with more than 1000 items, NumPy drops the middle
rows, columns or both from the output.

In [None]:
np.arange(1, 100001).reshape(4, 25000)

In [None]:
np.arange(1, 100001).reshape(100, 1000)

#  List vs. `array` Performance: Introducing `%timeit` 

Most array operations execute significantly faster than corresponding list operations.

### Timing the Creation of a List Containing Results of 6,000,000 Die Rolls 

In [None]:
import random

In [None]:
%timeit rolls_list = \
   [random.randrange(1, 7) for i in range(0, 6_000_000)]

### Timing the Creation of an `array` Containing Results of 6,000,000 Die Rolls  

In [None]:
import numpy as np

In [None]:
%timeit rolls_array = np.random.randint(1, 7, 6_000_000)

### 60,000,000 and 600,000,000 Die Rolls  

In [None]:
%timeit rolls_array = np.random.randint(1, 7, 60_000_000)

In [None]:
%timeit rolls_array = np.random.randint(1, 7, 600_000_000)

### Customizing the %timeit Iterations  

In [None]:
%timeit -n3 -r2 rolls_array = np.random.randint(1, 7, 6_000_000)
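
`%timeit` is an IPython magic, so it is not available in a plain Python script. As a rough sketch, the standard library's `timeit` module can make a similar comparison. The roll count (`ROLLS`), the run count (`RUNS`) and the helper function names below are arbitrary choices for illustration, kept small so the script finishes quickly. Timings will vary by machine.

```python
import random
import timeit

import numpy as np

ROLLS = 100_000  # illustrative size, far smaller than the 6,000,000 used above
RUNS = 3         # illustrative number of timed runs per approach


def roll_list():
    """Build a list of die rolls with a list comprehension."""
    return [random.randrange(1, 7) for _ in range(ROLLS)]


def roll_array():
    """Build an array of die rolls with a single NumPy call."""
    return np.random.randint(1, 7, ROLLS)


# timeit.timeit calls each function RUNS times and returns the total seconds
list_seconds = timeit.timeit(roll_list, number=RUNS)
array_seconds = timeit.timeit(roll_array, number=RUNS)

print(f'list:  {list_seconds / RUNS:.4f} s per run')
print(f'array: {array_seconds / RUNS:.4f} s per run')
```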

#  `array` Operators

### Arithmetic Operations with `array`s and Individual Numeric Values

In [None]:
import numpy as np

In [None]:
numbers = np.arange(1, 6)

In [None]:
numbers

In [None]:
numbers * 2

In [None]:
numbers ** 3

In [None]:
numbers  # numbers is unchanged by the arithmetic operators

In [None]:
numbers += 10

In [None]:
numbers

### Broadcasting 

Normally, the arithmetic operations require as operands two arrays of the same size
and shape. When one operand is a single value, called a scalar, NumPy performs the
element-wise calculations as if the scalar were an array of the same shape as the other
operand, but with the scalar value in all its elements. This is called **broadcasting**.
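
The following sketch illustrates that description by comparing a scalar multiplication with an explicit array of 2s built with `full`. The name `values` is illustrative and is chosen to avoid replacing the notebook's `numbers` array. The explicit array only models the idea; NumPy does not need to build it.

```python
import numpy as np

values = np.arange(1, 6)

# Broadcasting: the scalar 2 is treated as if it filled an array shaped like values
broadcast_result = values * 2

# The same calculation with the "array of 2s" written out explicitly
explicit_twos = np.full(values.shape, 2)
explicit_result = values * explicit_twos

print(broadcast_result.tolist())                          # [2, 4, 6, 8, 10]
print(np.array_equal(broadcast_result, explicit_result))  # True
```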


### Arithmetic Operations Between `array`s 

In [None]:
numbers2 = np.linspace(1.1, 5.5, 5)

In [None]:
numbers2

In [None]:
numbers * numbers2

### Comparing arrays

In [None]:
numbers

In [None]:
numbers >= 13

In [None]:
numbers2

In [None]:
numbers2 < numbers

In [None]:
numbers == numbers2

In [None]:
numbers == numbers

#  NumPy Calculation Methods

In [None]:
import numpy as np

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

In [None]:
grades

In [None]:
grades.sum()

In [None]:
grades.min()

In [None]:
grades.max()

In [None]:
grades.mean()

In [None]:
grades.std()

In [None]:
grades.var()

### Calculations by Row or Column

In [None]:
grades.mean(axis=0)

In [None]:
grades.mean(axis=1)
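
In the two cells above, `axis=0` averages down each column and `axis=1` averages across each row. The sketch below repeats the calculation on a separately named copy of the same data (`scores` is an illustrative name, not part of the notebook) and shows the shapes of the results.

```python
import numpy as np

scores = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])  # 4 rows, 3 columns

column_means = scores.mean(axis=0)  # collapses the rows: one mean per column
row_means = scores.mean(axis=1)     # collapses the columns: one mean per row

print(column_means.shape)     # (3,)
print(row_means.shape)        # (4,)
print(column_means.tolist())  # [95.25, 85.25, 83.0]
```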

###  Universal Functions

NumPy offers dozens of standalone **universal functions** (or **ufuncs**) that perform
various element-wise operations. Each performs its task using one or two array or
array-like (such as lists) arguments. Some of these functions are called when you use
operators like + and * on arrays. Each returns a new array containing the results.

In [None]:
import numpy as np

In [None]:
numbers = np.array([1, 4, 9, 16, 25, 36])

In [None]:
np.sqrt(numbers)

In [None]:
numbers2 = np.arange(1, 7) * 10

In [None]:
numbers2

In [None]:
np.add(numbers, numbers2)

### Broadcasting with Universal Functions

In [None]:
np.multiply(numbers2, 5)

In [None]:
numbers3 = numbers2.reshape(2, 3)

In [None]:
numbers3

In [None]:
numbers4 = np.array([2, 4, 6])

In [None]:
np.multiply(numbers3, numbers4)

### Other Universal Functions

Refer to https://numpy.org/doc/stable/reference/ufuncs.html

#  Indexing and Slicing 

### Indexing with Two-Dimensional `array`s

In [None]:
import numpy as np

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

In [None]:
grades

In [None]:
grades[0, 1]  # row 0, column 1

### Selecting a Subset of a Two-Dimensional `array`’s Rows

In [None]:
grades[1]

In [None]:
grades[0:2]

In [None]:
grades[[1, 3]]

### Selecting a Subset of a Two-Dimensional `array`’s Columns

In [None]:
grades[:, 0]

In [None]:
grades[:, 1:3]

In [None]:
grades[:, [0, 2]]

#  Views: Shallow Copies

In [None]:
import numpy as np

In [None]:
numbers = np.arange(1, 6)

In [None]:
numbers

In [None]:
numbers2 = numbers.view()

In [None]:
numbers2

In [None]:
id(numbers)

In [None]:
id(numbers2)

In [None]:
numbers[1] *= 10

In [None]:
numbers2

In [None]:
numbers

In [None]:
numbers2[1] /= 10

In [None]:
numbers

In [None]:
numbers2
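
The cells above show that a view is a separate object that sees the original array's data. As a compact sketch of the same idea, the `base` attribute and NumPy's `shares_memory` function can confirm the sharing directly. The names `original` and `view` are illustrative.

```python
import numpy as np

original = np.arange(1, 6)
view = original.view()

print(view is original)                  # False: two distinct array objects
print(view.base is original)             # True: the view uses original's data
print(np.shares_memory(original, view))  # True

original[1] *= 10      # modify an element through the original
print(view.tolist())   # [1, 20, 3, 4, 5]: the change is visible in the view
```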

### Slice Views

In [None]:
numbers2 = numbers[0:3]

In [None]:
numbers2

In [None]:
id(numbers)

In [None]:
id(numbers2)

In [None]:
numbers2[3]  # raises IndexError: the slice view has only three elements (indices 0-2)

In [None]:
numbers[1] *= 20

In [None]:
numbers

In [None]:
numbers2

#  Deep Copies

In [None]:
import numpy as np

In [None]:
numbers = np.arange(1, 6)

In [None]:
numbers

In [None]:
numbers2 = numbers.copy()

In [None]:
numbers2

In [None]:
numbers[1] *= 10

In [None]:
numbers

In [None]:
numbers2

### Module `copy`—Shallow vs. Deep Copies for Other Types of Python Objects

If you need deep copies of other
types of Python objects, pass them to the `copy` module’s `deepcopy` function.
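
As a minimal sketch of the difference, the following compares the `copy` module's `copy` (shallow) and `deepcopy` functions on a nested list. The data and variable names are illustrative.

```python
import copy

nested = [[1, 2, 3], [4, 5, 6]]

shallow = copy.copy(nested)    # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0][0] = 100             # mutate an inner list through the original

print(shallow[0][0])  # 100: the shallow copy shares the inner list
print(deep[0][0])     # 1: the deep copy is unaffected
```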

#  Reshaping and Transposing 

NumPy provides various other ways to reshape arrays.

### `reshape` vs. `resize` 

In [None]:
import numpy as np

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90]])

In [None]:
grades

The array methods reshape and resize both enable you to change an array’s
dimensions. Method reshape returns a view (shallow copy) of the original array with
the new dimensions. It does not modify the original array.

In [None]:
grades.reshape(1, 6)

In [None]:
grades

Method resize modifies the original array’s shape.

In [None]:
grades.resize(1, 6)

In [None]:
grades

### `flatten` vs. `ravel` 

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90]])

In [None]:
grades

You can take a multidimensional array and flatten it into a single dimension with the
methods flatten and ravel. Method flatten deep copies the original array’s data.

In [None]:
flattened = grades.flatten()

In [None]:
flattened

In [None]:
grades

In [None]:
flattened[0] = 100

In [None]:
flattened

In [None]:
grades

Method ravel produces a view of the original array, which shares the grades
array’s data.

In [None]:
raveled = grades.ravel()

In [None]:
raveled

In [None]:
grades

In [None]:
raveled[0] = 100

In [None]:
raveled

In [None]:
grades

### Transposing Rows and Columns

You can quickly transpose an array’s rows and columns—that is “flip” the array, so
the rows become the columns and the columns become the rows. The T attribute
returns a transposed view (shallow copy) of the array.

In [None]:
grades.T

In [None]:
grades

### Horizontal and Vertical Stacking

In [None]:
grades2 = np.array([[94, 77, 90], [100, 81, 82]])

In [None]:
np.hstack((grades, grades2))

In [None]:
np.vstack((grades, grades2))

#  Pandas `Series` and `DataFrame`s

NumPy’s array is optimized for homogeneous numeric data that’s accessed via integer
indices. Data science presents unique demands for which more customized data
structures are required. Big data applications must support mixed data types,
customized indexing, missing data, data that’s not structured consistently and data that needs to be manipulated into forms appropriate for the databases and data analysis
packages you use.

Pandas is the most popular library for dealing with such data. It provides two key
collections: `Series` for one-dimensional collections and `DataFrame`s for
two-dimensional collections. You can use pandas’ `MultiIndex` to manipulate
multi-dimensional data in the context of Series and DataFrames.

## pandas `Series` 

A **Series** is an enhanced one-dimensional array. Whereas arrays use only zero-based
integer indices, Series support custom indexing, including even non-integer
indices like strings. Series also offer additional capabilities that make them more
convenient for many data-science-oriented tasks. For example, Series may have
missing data, and many Series operations ignore missing data by default.
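
A short sketch of the missing-data behavior, using `np.nan` to mark a missing value. The name `scores` and its values are illustrative.

```python
import numpy as np
import pandas as pd

scores = pd.Series([87, np.nan, 94])  # np.nan marks the missing value

print(scores.count())             # 2: count skips the missing value
print(scores.mean())              # 90.5: the mean of 87 and 94 only
print(scores.mean(skipna=False))  # nan: missing data is no longer ignored
```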

### Creating a `Series` with Default Indices

By default, a Series has integer indices numbered sequentially from 0.

In [2]:
import pandas as pd

In [3]:
grades = pd.Series([87, 100, 94])

### Displaying a `Series`

Pandas displays a Series in two-column format with the indices left aligned in the left
column and the values right aligned in the right column. After listing the Series
elements, pandas shows the data type (dtype) of the underlying array’s elements.

In [4]:
grades

0     87
1    100
2     94
dtype: int64

### Creating a `Series` with All Elements Having the Same Value

In [5]:
pd.Series(98.6, range(3))

0    98.6
1    98.6
2    98.6
dtype: float64

### Accessing a `Series`’ Elements

In [6]:
grades[0]

87

### Producing Descriptive Statistics for a `Series`

In [7]:
grades.count()

3

In [8]:
grades.mean()

93.66666666666667

In [9]:
grades.min()

87

In [10]:
grades.max()

100

In [11]:
grades.std()

6.506407098647712

In [12]:
grades.describe()

count      3.000000
mean      93.666667
std        6.506407
min       87.000000
25%       90.500000
50%       94.000000
75%       97.000000
max      100.000000
dtype: float64

### Creating a `Series` with Custom Indices

You can specify custom indices with the `index` keyword argument.

In [13]:
grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])

In [14]:
grades

Wally     87
Eva      100
Sam       94
dtype: int64

### Dictionary Initializers

If you initialize a Series with a dictionary, its keys become the Series’ indices, and
its values become the Series’ element values.

In [15]:
grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94})

In [16]:
grades

Wally     87
Eva      100
Sam       94
dtype: int64

### Accessing a `Series`’ Elements Via Custom Indices

In [17]:
grades['Eva']

100

If the custom indices are strings that could represent valid Python identifiers, pandas
automatically adds them to the Series as attributes that you can access via a dot (.).

In [18]:
grades.Wally

87

In [19]:
grades.dtype

dtype('int64')

In [20]:
grades.values

array([ 87, 100,  94], dtype=int64)

### Creating a Series of Strings 

If a Series contains strings, you can use its `str` attribute to call string methods on
the elements.

In [21]:
hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

In [22]:
hardware

0    Hammer
1       Saw
2    Wrench
dtype: object

In [23]:
hardware.str.contains('a')

0     True
1     True
2    False
dtype: bool

In [24]:
hardware.str.upper()

0    HAMMER
1       SAW
2    WRENCH
dtype: object

#  `DataFrame`s 

A `DataFrame` is an enhanced two-dimensional array. Like Series, DataFrames can
have custom row and column indices, and offer additional operations and capabilities
that make them more convenient for many data-science-oriented tasks. DataFrames
also support missing data. Each column in a DataFrame is a Series. The Series
representing each column may contain different element types.
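
To illustrate the last two points, the sketch below builds a small DataFrame from hypothetical data (the `inventory` name, column names and values are invented for this example) and inspects its columns.

```python
import pandas as pd

# Hypothetical data: one column of strings, one of integers, one of floats
inventory = pd.DataFrame({'item': ['Hammer', 'Saw', 'Wrench'],
                          'quantity': [12, 7, 30],
                          'price': [9.99, 14.50, 6.25]})

print(type(inventory['quantity']))  # each column is a pandas Series
print(inventory.dtypes)             # one element type per column
```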

### Creating a `DataFrame` from a Dictionary

In [25]:
import pandas as pd

In [26]:
grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

In [27]:
grades = pd.DataFrame(grades_dict)

In [28]:
grades

   Wally  Eva  Sam  Katie  Bob
0     87  100   94    100   83
1     96   87   77     81   65
2     70   90   90     82   85


### Customizing a `DataFrame`’s Indices with the `index` Attribute 

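You can specify custom indices with the `index` keyword argument when you create the `DataFrame`:

```python
pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])
```

Alternatively, assign the labels to an existing `DataFrame` through its `index` attribute: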

In [29]:
grades.index = ['Test1', 'Test2', 'Test3']

In [30]:
grades

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85


### Accessing a `DataFrame`’s Columns 

In [31]:
grades['Eva']

Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64

In [32]:
grades.Sam

Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64

### Selecting Rows via the `loc` and `iloc` Attributes

In [33]:
grades.loc['Test1']

Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

In [34]:
grades.iloc[1]

Wally    96
Eva      87
Sam      77
Katie    81
Bob      65
Name: Test2, dtype: int64

### Selecting Rows via Slices and Lists with the `loc` and `iloc` Attributes

In [35]:
grades.loc['Test1':'Test3']

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85


In [36]:
grades.iloc[0:2]

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65


In [37]:
grades.loc[['Test1', 'Test3']]

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85


In [38]:
grades.iloc[[0, 2]]

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85
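
The slice cells above rely on a difference worth spelling out: a `loc` slice includes its end label, whereas an `iloc` slice excludes its end position, as Python slices normally do. The sketch below shows this on a small stand-in DataFrame. The `scores` name and its values are hypothetical.

```python
import pandas as pd

# Small stand-in DataFrame (hypothetical values) with string row labels
scores = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                      index=['Test1', 'Test2', 'Test3'])

by_label = scores.loc['Test1':'Test2']  # label slice: end label included
by_position = scores.iloc[0:2]          # position slice: end position excluded

print(by_label.index.tolist())     # ['Test1', 'Test2']
print(by_position.index.tolist())  # ['Test1', 'Test2']
```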


### Selecting Subsets of the Rows and Columns 

In [39]:
grades.loc['Test1':'Test2', ['Eva', 'Katie']]

       Eva  Katie
Test1  100    100
Test2   87     81


In [None]:
grades.iloc[[0, 2], 0:3]

### Boolean Indexing

In [40]:
grades[grades >= 90]

       Wally    Eva   Sam  Katie  Bob
Test1    NaN  100.0  94.0  100.0  NaN
Test2   96.0    NaN   NaN    NaN  NaN
Test3    NaN   90.0  90.0    NaN  NaN


In [41]:
grades[(grades >= 80) & (grades < 90)]

       Wally   Eva  Sam  Katie   Bob
Test1   87.0   NaN  NaN    NaN  83.0
Test2    NaN  87.0  NaN   81.0   NaN
Test3    NaN   NaN  NaN   82.0  85.0


### Accessing a Specific `DataFrame` Cell by Row and Column

In [42]:
grades.at['Test2', 'Eva']

87

In [43]:
grades.iat[2, 0]

70

In [44]:
grades.at['Test2', 'Eva'] = 100

In [45]:
grades.at['Test2', 'Eva']

100

In [46]:
grades.iat[1, 1] = 87

In [47]:
grades.iat[1, 1]

87

### Descriptive Statistics

In [48]:
grades.describe()

           Wally         Eva        Sam       Katie        Bob
count   3.000000    3.000000   3.000000    3.000000   3.000000
mean   84.333333   92.333333  87.000000   87.666667  77.666667
std    13.203535    6.806859   8.888194   10.692677  11.015141
min    70.000000   87.000000  77.000000   81.000000  65.000000
25%    78.500000   88.500000  83.500000   81.500000  74.000000
50%    87.000000   90.000000  90.000000   82.000000  83.000000
75%    91.500000   95.000000  92.000000   91.000000  84.000000
max    96.000000  100.000000  94.000000  100.000000  85.000000


In [50]:
pd.set_option('display.precision', 2)

In [51]:
grades.describe()

       Wally     Eva    Sam   Katie    Bob
count   3.00    3.00   3.00    3.00   3.00
mean   84.33   92.33  87.00   87.67  77.67
std    13.20    6.81   8.89   10.69  11.02
min    70.00   87.00  77.00   81.00  65.00
25%    78.50   88.50  83.50   81.50  74.00
50%    87.00   90.00  90.00   82.00  83.00
75%    91.50   95.00  92.00   91.00  84.00
max    96.00  100.00  94.00  100.00  85.00


In [52]:
grades.mean()

Wally    84.33
Eva      92.33
Sam      87.00
Katie    87.67
Bob      77.67
dtype: float64

### Transposing the `DataFrame` with the `T` Attribute

In [None]:
grades.T

In [53]:
grades.T.describe()

        Test1  Test2  Test3
count    5.00   5.00   5.00
mean    92.80  81.20  83.40
std      7.66  11.54   8.23
min     83.00  65.00  70.00
25%     87.00  77.00  82.00
50%     94.00  81.00  85.00
75%    100.00  87.00  90.00
max    100.00  96.00  90.00


In [54]:
grades.T.mean()

Test1    92.8
Test2    81.2
Test3    83.4
dtype: float64

### Sorting Rows by Their Indices

In [55]:
grades.sort_index(ascending=False)

       Wally  Eva  Sam  Katie  Bob
Test3     70   90   90     82   85
Test2     96   87   77     81   65
Test1     87  100   94    100   83


### Sorting By Column Indices

In [56]:
grades.sort_index(axis=1)

       Bob  Eva  Katie  Sam  Wally
Test1   83  100    100   94     87
Test2   65   87     81   77     96
Test3   85   90     82   90     70


### Sorting By Column Values

In [None]:
grades.sort_values(by='Test1', axis=1, ascending=False)

In [None]:
grades.T.sort_values(by='Test1', ascending=False)

In [None]:
grades.loc['Test1'].sort_values(ascending=False)

### Copy vs. In-Place Sorting

By default, methods `sort_index` and `sort_values` return a copy of the original
DataFrame, which could require substantial memory in a big data application. You can
sort the DataFrame in place, rather than copying the data. To do so, pass the keyword
argument `inplace=True` to either `sort_index` or `sort_values`.
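
A minimal sketch of the two behaviors, using a small stand-in DataFrame. The `scores` name and its values are hypothetical.

```python
import pandas as pd

scores = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                      index=['Test1', 'Test2', 'Test3'])

# Default: a new, sorted DataFrame is returned and scores is left unchanged
descending = scores.sort_index(ascending=False)
print(descending.index.tolist())  # ['Test3', 'Test2', 'Test1']
print(scores.index.tolist())      # ['Test1', 'Test2', 'Test3']

# inplace=True: scores itself is reordered
scores.sort_index(ascending=False, inplace=True)
print(scores.index.tolist())      # ['Test3', 'Test2', 'Test1']
```
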

We’ve shown many pandas Series and DataFrame features. In the next chapter’s
Intro to Data Science section, we’ll use Series and DataFrames for data munging:
cleaning and preparing data for use in your database or analytics software.