# 3.2 Important NumPy Functions
NumPy is a Python library that is fundamental to many data analysis projects. Many data analysts rely on it so heavily that they import it at the top of almost every notebook, even when it ends up unused. In this reading, you will learn about some of the most important NumPy functions and when you might use them.

## Importing NumPy
NumPy comes preinstalled in Google Colab, so it can be imported with `import numpy`. By convention, it is imported under the alias `np`.

In [1]:
import numpy as np

## Functions
### `array()`
The most basic function in the NumPy library is the `array()` function, which turns a list into an array.

In [2]:
numbers_list = [42, 99, 12, 6, 15, 9]
numbers_array = np.array(numbers_list)

In [3]:
numbers_array

array([42, 99, 12,  6, 15,  9])

As you can see in the code above, the `array()` function converted a list of numbers to an array. NumPy arrays differ from Python lists in two main ways: every item in an array shares a single data type, and arrays are *vectorized*, meaning that a single mathematical operation can be applied to each item of the array at once. Arrays can still be modified after they are created (individual items can be reassigned), although the size of an array is fixed.

In [4]:
# This code gives an error because lists are not vectorized
numbers_list / 3

TypeError: unsupported operand type(s) for /: 'list' and 'int'

In [5]:
# This code returns a new array where each item is divided by 3
numbers_array / 3

array([14., 33.,  4.,  2.,  5.,  3.])
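To make the idea of vectorization concrete, the sketch below performs the same division twice: once on a list, where each item has to be handled individually, and once on an array, where a single expression covers every item. The variable names `list_result`, `array_result`, and `doubled_plus_one` are illustrative only.

```python
import numpy as np

numbers_list = [42, 99, 12, 6, 15, 9]
numbers_array = np.array(numbers_list)

# With a plain list, each item has to be divided one at a time
list_result = [n / 3 for n in numbers_list]

# With an array, one expression applies the division to every item
array_result = numbers_array / 3

print(list_result)
print(array_result)

# Vectorized operations can also be chained in a single expression
doubled_plus_one = numbers_array * 2 + 1
print(doubled_plus_one)
```

Both approaches produce the same numbers; the array version is simply shorter to write and is usually faster on large data.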

NumPy arrays are also the foundation for the pandas dataframe. Each column of a dataframe is typically backed by a NumPy array, so dataframe columns support the same vectorized operations.

### `.reshape()`

The `.reshape()` method can be used with a NumPy array to change its shape. This is extremely useful for machine learning problems, where models like linear regression expect data to be in a specific format (like a column).

For example, the array created above is a single row of data.

In [6]:
numbers_array

array([42, 99, 12,  6, 15,  9])

However, the `.reshape()` method can be used to turn this row of data into a column. The arguments in `.reshape(-1, 1)` are extremely common in analysis projects: the `1` asks for a single column, and the `-1` tells NumPy to work out the number of rows from the length of the array. The result is a column that a machine learning model can use.

In [7]:
# The `.reshape()` method is called on an array and returns a reshaped version of it
numbers_array.reshape(-1, 1)

array([[42],
       [99],
       [12],
       [ 6],
       [15],
       [ 9]])
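The short sketch below checks the `.shape` attribute before and after reshaping, which is a quick way to confirm what `.reshape()` did. The names `column` and `grid` are illustrative, and the 2-by-3 layout is just one more shape that happens to fit six items.

```python
import numpy as np

numbers_array = np.array([42, 99, 12, 6, 15, 9])

# A one-dimensional array with six items
print(numbers_array.shape)

# -1 lets NumPy infer the number of rows; 1 asks for a single column
column = numbers_array.reshape(-1, 1)
print(column.shape)

# Any shape works as long as it holds the same number of items (2 * 3 = 6)
grid = numbers_array.reshape(2, 3)
print(grid)

# The original array keeps its shape; reshape does not change it in place
print(numbers_array.shape)
```

If the requested shape cannot hold exactly the same number of items, for example `.reshape(4, 2)` on six items, NumPy raises a `ValueError`.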

### `linspace()`
The `linspace()` function returns an array of evenly spaced numbers given a start number, a stop number, and the quantity of numbers to return. Both the start and the stop number are included in the result. For example, to return 101 evenly spaced numbers between 0 and 1, you could use the function call below:

In [8]:
np.linspace(0, 1, 101)

array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
       0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
       0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
       0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
       0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
       0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
       0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
       0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
       0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
       0.99, 1.  ])

This function is especially useful for machine learning and graphing. In many cases, graphs are created by running a series of numbers (like the one above) through a model and then plotting the associated output to see how the output changes as the input changes.
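As a rough illustration of running a series of numbers through a model, the sketch below feeds five evenly spaced inputs into `toy_model`, a hypothetical stand-in for a real trained model. The resulting input and output pairs are what a plotting library would then draw.

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for a real model: a simple straight line."""
    return 2 * x + 1

# Five evenly spaced inputs from 0 to 1, with both endpoints included
x = np.linspace(0, 1, 5)

# Because arrays are vectorized, the whole array passes through in one call
y = toy_model(x)

for x_value, y_value in zip(x, y):
    print(x_value, y_value)
```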

### `arange()`
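The `arange()` function is similar to the `linspace()` function, but instead of the quantity of numbers to return, the user specifies a start number, a stop number, and how big each step should be. The stop number itself is not included in the result, which is why the example below stops at `1.01` rather than `1`: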

In [9]:
np.arange(0, 1.01, 0.01)

array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
       0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
       0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
       0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
       0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
       0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
       0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
       0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
       0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
       0.99, 1.  ])
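The sketch below uses whole numbers to show how `arange()` treats its stop value, and how the same sequence can be produced with `linspace()` by counting the items instead of giving the step. The variable names are illustrative.

```python
import numpy as np

# Start at 0, step by 2, and stop before reaching 10 (10 is not included)
evens = np.arange(0, 10, 2)
print(evens)

# With a single argument, arange counts up from 0 in steps of 1
first_five = np.arange(5)
print(first_five)

# The same values as `evens`, described by how many numbers to return
evens_linspace = np.linspace(0, 8, 5)
print(evens_linspace)
```

A reasonable rule of thumb is to use `arange()` when the step size matters and `linspace()` when the number of points matters.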

### `np.random.random()`
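The NumPy function `random.random()` generates random decimal numbers that are at least 0 and less than 1. The number passed to the function sets how many values are returned, so the call below returns an array of 100 random numbers. This can be useful when randomness is needed in an analysis project, especially when testing models to see example output.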


In [10]:
np.random.random(100)


array([0.21174322, 0.25370783, 0.02895659, 0.23012285, 0.77649394,
       0.33891929, 0.81440502, 0.88732636, 0.65713479, 0.21799954,
       0.12283967, 0.7612041 , 0.89954428, 0.78359084, 0.1131987 ,
       0.71999101, 0.80731666, 0.49296768, 0.77645471, 0.29323564,
       0.52169537, 0.27171892, 0.35604545, 0.32225843, 0.81019391,
       0.85542592, 0.01706743, 0.66479847, 0.81900848, 0.51915315,
       0.58940671, 0.9380446 , 0.19801117, 0.16668528, 0.56783879,
       0.7541158 , 0.05806208, 0.04669321, 0.89230233, 0.07845533,
       0.2104014 , 0.34242188, 0.58285091, 0.72394758, 0.23049393,
       0.47247268, 0.31646711, 0.13780058, 0.41856421, 0.24144521,
       0.32068005, 0.04582772, 0.69371476, 0.05255505, 0.36474729,
       0.17424254, 0.97818352, 0.92553902, 0.77615127, 0.04317521,
       0.99897577, 0.73265163, 0.33452705, 0.73955145, 0.78149492,
       0.36122528, 0.4291919 , 0.2950051 , 0.69624983, 0.12501256,
       0.93216667, 0.87134762, 0.66177044, 0.40543286, 0.58772
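Because the numbers are random, the output above changes every time the cell runs. When repeatable results are needed, one option is a seeded generator created with `np.random.default_rng()`, which is a different entry point from the `np.random.random()` call shown above. In the sketch below, the seed value `42` is arbitrary.

```python
import numpy as np

# A generator created with a fixed seed produces the same numbers on every run
rng = np.random.default_rng(42)
sample = rng.random(5)
print(sample)

# A second generator with the same seed repeats the same sequence
rng_again = np.random.default_rng(42)
repeat = rng_again.random(5)
print(np.array_equal(sample, repeat))
```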

### `argmax()` and `argmin()`
The `argmax()` and `argmin()` functions return the position (index) of the largest and smallest values in a NumPy array. For example, using the `argmax()` function on the array previously created returns `1`, indicating that the value at index 1 is the greatest number. Likewise, `argmin()` returns `3`, the index of the smallest value.

In [14]:
# The greatest number in the array is in index 1
numbers_array

array([42, 99, 12,  6, 15,  9])

In [15]:
np.argmin(numbers_array)

np.int64(3)

In [16]:
np.argmax(numbers_array)

np.int64(1)
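The index returned by `argmax()` is most useful when it is used to look something up. The sketch below uses it to fetch the largest value itself and the matching entry in a second array; the `labels` array is hypothetical and exists only for illustration.

```python
import numpy as np

numbers_array = np.array([42, 99, 12, 6, 15, 9])
# Hypothetical labels, one for each number in the array above
labels = np.array(['a', 'b', 'c', 'd', 'e', 'f'])

largest_index = np.argmax(numbers_array)

# Use the index to retrieve the largest value itself
largest_value = numbers_array[largest_index]
print(largest_value)

# Use the same index to find the label that goes with that value
largest_label = labels[largest_index]
print(largest_label)
```

If only the value is needed and not its position, `numbers_array.max()` returns it directly.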

### `where()`
The `where()` function is similar to the `=IF()` formula in Microsoft Excel in that it takes a condition, a value to return if `True`, and a value to return if `False`. This is useful for turning an array of numbers into categories that can be more easily visualized.

For example, the array of numbers from above could be categorized into "big" and "small" numbers with the `where()` function.

In [17]:
np.where(numbers_array > 30, 'big', 'small')

array(['big', 'big', 'small', 'small', 'small', 'small'], dtype='<U5')
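The two return values passed to `where()` do not have to be text. In the sketch below, numbers above 30 are kept and everything else is replaced with 0. The categories from the example above are then counted with `np.unique()`, which is one way to summarize them before visualizing. The variable names are illustrative.

```python
import numpy as np

numbers_array = np.array([42, 99, 12, 6, 15, 9])

# Keep values above 30 and replace all other values with 0
capped = np.where(numbers_array > 30, numbers_array, 0)
print(capped)

# Count how many items fall into each category
sizes = np.where(numbers_array > 30, 'big', 'small')
categories, counts = np.unique(sizes, return_counts=True)
print(categories, counts)
```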

## Using NumPy functions with pandas dataframes
The NumPy functions above are used in a variety of contexts, but some of them are especially useful for manipulating dataframes.

For example, let's use the `where()` function to separate the passengers of the Titanic into young, middle-age, and old groups based on their age.

⚠ Warning: Remember to upload the Titanic data set to Google Colab before running the cells below.

In [18]:
import pandas as pd
df = pd.read_csv('titanic.csv')
df.head()

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


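We can use the `where()` function from NumPy to create a new column on the dataframe that labels each passenger as young (25 or under), middle age (over 25 and up to 50), or old (over 50). Note that passengers with a missing age are also labeled young by the code below, because a comparison against a missing value is always `False`.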

In [19]:
# Like the Excel =IF() formula, two `where()` functions are nested inside of each other to produce a new array
np.where(df['Age'] > 50, 'old', np.where(df['Age'] > 25, 'middle age', 'young'))

array(['young', 'middle age', 'middle age', 'middle age', 'middle age',
       'young', 'old', 'young', 'middle age', 'young', 'young', 'old',
       'young', 'middle age', 'young', 'old', 'young', 'young',
       'middle age', 'young', 'middle age', 'middle age', 'young',
       'middle age', 'young', 'middle age', 'young', 'young', 'young',
       'young', 'middle age', 'young', 'young', 'old', 'middle age',
       'middle age', 'young', 'young', 'young', 'young', 'middle age',
       'middle age', 'young', 'young', 'young', 'young', 'young', 'young',
       'young', 'young', 'young', 'young', 'middle age', 'middle age',
       'old', 'young', 'young', 'middle age', 'young', 'young', 'young',
       'middle age', 'middle age', 'young', 'young', 'young',
       'middle age', 'young', 'young', 'middle age', 'middle age',
       'young', 'young', 'middle age', 'middle age', 'young', 'young',
       'young', 'young', 'middle age', 'young', 'middle age', 'young',
       'middle age', 'you

In [20]:
# This new array can be added as a new column to the dataframe
df['Age Group'] = np.where(df['Age'] > 50, 'old', np.where(df['Age'] > 25, 'middle age', 'young'))
df.head()

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Age Group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S | young |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | middle age |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S | middle age |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S | middle age |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S | middle age |
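Nested `where()` calls become hard to read as the number of groups grows, and they quietly place missing ages in the last group. As one possible alternative, the sketch below uses `np.select()`, which takes a list of conditions and a matching list of labels. The `ages` array is made-up data standing in for `df['Age']`, and the `'unknown'` label is an assumption about how missing ages might be handled.

```python
import numpy as np

# Made-up ages standing in for df['Age']; np.nan marks a missing age
ages = np.array([22.0, 38.0, np.nan, 54.0, 25.5])

# The nested approach labels the missing age as 'young'
nested = np.where(ages > 50, 'old', np.where(ages > 25, 'middle age', 'young'))
print(nested)

# Conditions are checked in order, and the first one that matches wins
conditions = [np.isnan(ages), ages > 50, ages > 25]
labels = ['unknown', 'old', 'middle age']
age_group = np.select(conditions, labels, default='young')
print(age_group)
```

Listing the missing-value check first ensures that those passengers get their own label instead of falling through to the default.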


This has been a short reading about important NumPy functions. Many other functions are available through NumPy, but each one has a specific use case and can be learned as needed. Refer to the [NumPy user guide](https://numpy.org/doc/stable/user/index.html) or ask ChatGPT for help using other functions in the library as data analytics problems arise.

In [21]:
df.shape[1]

13