# Introduction to NumPy for Data Science

<img src="Image/numpy.png" width="300" height="500">

## What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It also provides powerful functions for linear algebra and Fourier analysis, for those who are interested.

Remember from Section 2 that most data can be stored as lists in Python. For example, we can store students' test scores as a list:

```Python
scores = [90, 81, 85, 100, 76]
```

Also, remember that a column in a dataframe can be treated like a list:

```Python
df['Scores']
```
This retrieves all the values in the `'Scores'` column of a dataframe called `df`.

Therefore, NumPy is frequently used with dataframes to obtain the mean, variance, and standard deviation of a specific column.
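As a minimal sketch of that workflow, the example below builds a small dataframe and passes one of its columns to NumPy. The dataframe contents and the column name `'Scores'` are assumptions made up for illustration, and converting the column with `.tolist()` is just one reasonable way to hand the values over.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with a single 'Scores' column (illustrative data)
df = pd.DataFrame({"Scores": [90, 81, 85, 100, 76]})

# Convert the column to a plain Python list, then pass it to NumPy
scores = df["Scores"].tolist()

mean = np.mean(scores)
variance = np.var(scores)
sd = np.std(scores)

print("Mean:", mean)
print("Variance:", variance)
print("Standard deviation:", sd)
```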

We will be using NumPy to compute the following statistics from lists:

- Mean
- Variance
- Standard Deviation
- Maximum
- Minimum
- Median
- Interquartile Range

While this course covers only the basics of NumPy, it is one of the most commonly used packages in Python. Students who want a better understanding of NumPy can refer to the following resources:

- https://www.w3schools.com/python/numpy_intro.asp
- https://numpy.org/devdocs/user/quickstart.html
- https://www.tutorialspoint.com/numpy/index.htm

## 0. Import package

As with matplotlib and pandas, you should ALWAYS start by importing the NumPy package:

In [2]:
import numpy as np

## 1. Mean

You can compute the mean by using:

```Python
np.mean(a)
```
where `a` represents the list

In [3]:
scores = [90, 81, 85, 100, 76]
mean = np.mean(scores)

print("The average score of students is:", mean)

The average score of students is: 86.4


## 2. Variance

The variance can be computed by using:

```Python
np.var(a)
```
where `a` represents the list

In [5]:
scores = [90, 81, 85, 100, 76]
variance = np.var(scores)

print("The variance is", variance)

The variance is 67.44
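To see where this number comes from, here is a short sketch that computes the variance by hand and compares it with `np.var`. By default `np.var` divides by the number of values `n` (the population variance). Passing `ddof=1` divides by `n - 1` instead, which gives the sample variance. The variable names are chosen only for illustration.

```python
import numpy as np

scores = [90, 81, 85, 100, 76]

# Step 1: the mean of the scores
mean = sum(scores) / len(scores)

# Step 2: the squared distance of each score from the mean
squared_deviations = [(s - mean) ** 2 for s in scores]

# Step 3: the average of those squared distances (divide by n)
manual_variance = sum(squared_deviations) / len(scores)

print("Variance by hand:", manual_variance)
print("Variance from np.var:", np.var(scores))

# Sample variance divides by n - 1 instead of n
print("Sample variance (ddof=1):", np.var(scores, ddof=1))
```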


## 3. Standard Deviation

The standard deviation can be computed by using:

```Python
np.std(a)
```
where `a` represents the list

In [6]:
scores = [90, 81, 85, 100, 76]
sd = np.std(scores)

print("The standard deviation is", sd)

The standard deviation is 8.212186067059124


Notice that the standard deviation can be calculated as:

$$
\text{standard deviation} = \sqrt{\text{variance}}
$$

Let's check this:

```Python
np.sqrt(a)
```
can be used to compute the square root of `a`

In [9]:
scores = [90, 81, 85, 100, 76]
variance = np.var(scores)
std1 = np.sqrt(variance)

std2 = np.std(scores)

print("The standard deviation from variance is:", std1)
print("The standard deviation from numpy is:", std2)

The standard deviation from variance is: 8.212186067059124
The standard deviation from numpy is: 8.212186067059124


Notice that the two values are the same. This is one of the most important formulas to know in statistics.

## 4. Maximum

The maximum value of a list can be computed by:

```Python
np.max(a)
```
where `a` represents a list

In [10]:
scores = [90, 81, 85, 100, 76]
maximum = np.max(scores)
print("The maximum is", maximum)

The maximum is 100


This gives the same result as Python's built-in function
```Python
max()
```
that we used before.

In [11]:
scores = [90, 81, 85, 100, 76]
maximum = max(scores)
print("The maximum is", maximum)

The maximum is 100


## 5. Minimum

The minimum value of a list can be computed by:
```Python
np.min(a)
```
where `a` represents a list

In [12]:
scores = [90, 81, 85, 100, 76]
minimum = np.min(scores)
print("The minimum is", minimum)

The minimum is 76


This gives the same result as Python's built-in function:

```Python
min()
```
that we used before.

In [13]:
scores = [90, 81, 85, 100, 76]
minimum = min(scores)
print("The minimum is", minimum)

The minimum is 76


## 6. Median



The median can be computed by using:
```Python
np.median(a)
```
where `a` represents the list

In [3]:
numbers = [1, 2, 3, 4, 5]
median = np.median(numbers)
print("The median is", median)

The median is 3.0
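The mean and the median happen to be equal for `[1, 2, 3, 4, 5]`, so it is worth checking lists where they differ. The sketch below uses two made-up lists: one with a very large value, and one with an even number of values, where the median is the average of the two middle values.

```python
import numpy as np

# Illustrative lists (not from the course data)
with_outlier = [1, 2, 3, 4, 100]
even_length = [1, 2, 3, 4]

# The median is the middle value, so the large value 100 barely matters
print("Median with outlier:", np.median(with_outlier))

# The mean is pulled upward by the large value
print("Mean with outlier:", np.mean(with_outlier))

# With an even number of values, the two middle values are averaged
print("Median of even-length list:", np.median(even_length))
```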


## 7. Interquartile Range

```Python
np.percentile(x, q, interpolation='midpoint')
```
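Returns the `q`th percentile of the data in list `x`. In NumPy 1.22 and later, the `interpolation` keyword is deprecated in favor of `method` (for example, `method='midpoint'`).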

Remember that:

interquartile range = the third quartile - the first quartile

In [9]:
numbers = [1, 2, 3, 4]
q1 = np.percentile(numbers, 25, interpolation='midpoint')
q3 = np.percentile(numbers, 75, interpolation='midpoint')

print("The first quartile is", q1)
print("The third quartile is", q3)

print("The interquartile range is", q3-q1)

The first quartile is 1.5
The third quartile is 3.5
The interquartile range is 2.0
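The two percentile calls can be wrapped in a small helper. The function name `interquartile_range` is hypothetical and not part of NumPy. This sketch assumes the midpoint rule used above. It first tries the `method` keyword (NumPy 1.22 and later) and falls back to the older `interpolation` keyword if `method` is not accepted.

```python
import numpy as np

def interquartile_range(values):
    """Return Q3 - Q1 using the midpoint rule (hypothetical helper)."""
    try:
        # Newer NumPy versions use the 'method' keyword
        q1, q3 = np.percentile(values, [25, 75], method="midpoint")
    except TypeError:
        # Older NumPy versions only know the 'interpolation' keyword
        q1, q3 = np.percentile(values, [25, 75], interpolation="midpoint")
    return q3 - q1

numbers = [1, 2, 3, 4]
scores = [90, 81, 85, 100, 76]

print("IQR of numbers:", interquartile_range(numbers))
print("IQR of scores:", interquartile_range(scores))
```

Passing a list of percentiles (`[25, 75]`) lets NumPy return both quartiles in a single call.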


There are many more useful things in NumPy, but this lab course focuses only on the seven statistics covered above.