# Section 1: Introduction to NumPy
* NumPy lets us work with large sets of numbers and calculate their statistical properties
* Uses for NumPy include efficiently working with many numbers, generating random numbers, and performing numerical operations
* A NumPy array is similar to a Python list, but it is designed for efficient computation on many values at once



In [None]:
import numpy as np
my_array = np.array([1,2,3])
# Here's how to convert a regular list to a NumPy array:
my_list = [1,2,3]
my_array = np.array(my_list)

* Typically, you won't enter data directly; instead, you'll import it from a file
* NumPy can import data from a CSV file using the `np.genfromtxt()` function

In [None]:
# Consider a file named sample.txt containing comma-separated data: 34, 9, 12
# Here is how we'd import it into NumPy:
csv_array = np.genfromtxt("sample.txt", delimiter=",")
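* As a minimal sketch that runs without a file on disk, the example below feeds the same data to `np.genfromtxt()` through an in-memory `io.StringIO` object. The name `fake_file` is a stand-in introduced here for illustration; it is not part of the original example.

```python
import io
import numpy as np

# Stand-in for sample.txt: an in-memory "file" holding the same data
fake_file = io.StringIO("34, 9, 12")

# genfromtxt accepts file-like objects as well as file names
csv_array = np.genfromtxt(fake_file, delimiter=",")

print(csv_array)        # the values are parsed as floats by default
print(csv_array.dtype)
```

* Note that the imported values are floating-point numbers, even though the file contains whole numbers.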

* With a normal Python list, you'd have to iterate through it to operate on each element
* With NumPy, you can apply an operation to an entire array at once:

In [None]:
a = np.array([1,2,3])
a+3

array([4, 5, 6])
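* For comparison, here is a short sketch of the same addition done both ways: element by element on a plain list, and in one step on a NumPy array. The variable names are chosen here for illustration.

```python
import numpy as np

my_list = [1, 2, 3]

# Plain list: build a new list one element at a time
list_result = [x + 3 for x in my_list]

# NumPy array: the addition is applied to every element at once
array_result = np.array(my_list) + 3

print(list_result)
print(array_result)
```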

* If two arrays have the same shape, you can also add or subtract them element by element

In [None]:
a = np.array([1,2,3])
b = np.array([4,5,6])
a+b

array([5, 7, 9])
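* The sketch below shows subtraction working the same way, and what happens when the lengths differ. The array `c` and the flag `shapes_matched` are names made up for this example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Subtraction is element-wise, just like addition
difference = b - a
print(difference)

# An array with a different number of elements cannot be combined with a
c = np.array([1, 2])
try:
    a + c
    shapes_matched = True
except ValueError as err:
    shapes_matched = False
    print("Cannot combine arrays:", err)
```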

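* Just as Python lists can be nested to form two-dimensional lists, you can create two-dimensional arrays
* To select elements from a 1-D array, index it the same way as a Python list, for example `a[0]`
* 2-D arrays can be indexed like nested lists (`a[1][2]`), and NumPy also accepts the row and column together as `a[1, 2]`
* To select an entire row or column, use `:` in place of the other index, for example `a[0, :]` or `a[:, 2]`
* A new concept in NumPy is performing logical operations on arrays: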


In [None]:
a = np.array([1,2,3,4])
print(a[(a>1) & (a<4)])

[2 3]
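* The indexing rules listed earlier in this section can be sketched on a small 2-D array. The array `grid` and its values are made up for illustration.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

element = grid[1, 2]     # row 1, column 2 (counting from 0)
first_row = grid[0, :]   # every column of row 0
last_col = grid[:, 2]    # every row of column 2

# Logical selection also works on 2-D arrays and returns a 1-D result
selected = grid[grid > 2]

print(element)
print(first_row)
print(last_col)
print(selected)
```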


# Section 2: Statistics in NumPy
* Most statistical functions in NumPy are self-explanatory: `np.mean`, `np.median`, etc.
* You can use `np.mean` on a logical condition to calculate the fraction of array elements that have a certain property
* The following code finds the fraction of the array that's above 8 (multiply by 100 to get a percentage):

In [14]:
a = np.array([1,2,3,4,5,6,7,8,9,10])
np.mean(a>8)

0.2

* If we have a 2-D array, `np.mean` can calculate the mean of the full array, or the mean of each column or each row using the `axis` argument

In [20]:
a = np.array([[1, 0, 0],
              [0, 0, 1],
              [1, 0, 1]])
print(np.mean(a))
print(np.mean(a, axis=0)) # This will find the mean of each column
print(np.mean(a, axis=1)) # This will find the mean of each row

0.4444444444444444
[0.66666667 0.         0.66666667]
[0.33333333 0.33333333 0.66666667]


* `np.sort` returns a sorted copy of the data, from smallest to largest
* Percentiles and quantiles are an important concept when summarizing data
* The nth percentile is the value below which n% of the samples fall
* You can calculate these with `np.percentile`
* NumPy has several options for generating random numbers: `np.random.normal` is one of them
* It takes the following arguments: `loc`: the mean, `scale`: the standard deviation, `size`: the number of values to generate
* `np.random.binomial` draws samples from a binomial distribution, which models the number of successes in a fixed number of independent trials
* It takes the following arguments: `n`: number of trials per experiment, `p`: probability of success on each trial, `size`: number of experiments to simulate
* By pairing this with the `np.mean` technique from earlier, you can estimate probabilities
* Let's take the example of a basketball player who normally shoots 80% on free throws. Tonight he shot 10 and made 9. What's the probability of making exactly 9 out of 10?

In [21]:
sample = np.random.binomial(10,0.8,10000) # He shot 10, normally shoots 80%, and we can run 10,000 tests
np.mean(sample == 9)

0.269

* The simulation estimates about a 26.9% chance of him making exactly 9 out of 10; because the samples are random, the value varies slightly from run to run
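* As a sanity check on the simulated estimate, the probability can also be computed directly from the binomial formula, C(10, 9) × 0.8^9 × 0.2^1. The sketch below is not part of the original notebook; the seed and the larger sample size are choices made here so the comparison is repeatable.

```python
from math import comb

import numpy as np

n, k, p = 10, 9, 0.8

# Exact probability of exactly k successes in n trials
exact = comb(n, k) * p**k * (1 - p)**(n - k)

# Simulated estimate, seeded so the run is repeatable
np.random.seed(0)
sample = np.random.binomial(n, p, 100000)
estimate = np.mean(sample == k)

print(round(exact, 4))
print(estimate)
```

* The exact value is about 0.2684, so a simulated result near 0.27 is what we'd expect.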

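* The sorting, percentile, and normal-distribution functions mentioned in this section have no example of their own, so here is a minimal sketch. The data values, the `loc=170` and `scale=10` settings, and the variable names are all made up for illustration.

```python
import numpy as np

d = np.array([5, 1, 9, 3, 7])

# np.sort returns a new sorted array and leaves d unchanged
sorted_d = np.sort(d)
print(sorted_d)

# Several percentiles can be requested at once; the 50th is the median
quartiles = np.percentile(d, [25, 50, 75])
print(quartiles)

# Draw 10,000 values from a normal distribution (seeded for repeatability)
np.random.seed(0)
heights = np.random.normal(loc=170, scale=10, size=10000)

# The sample mean and standard deviation should land close to loc and scale
print(np.mean(heights), np.std(heights))
```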