# Concept Set 6
## Topic: NumPy and Pandas

by Joe Ilagan

This concept set is meant for people who have no experience with programming at all. If you have experience with programming, you may ignore the concept sets that tackle topics you've mastered.

### 6.1: NumPy and Pandas

NumPy (henceforth `numpy`) and Pandas (henceforth `pandas`) are two of the most important libraries in Python for working with data.  

They are not part of Python's standard library. This means that you have to install them separately, for example through the Anaconda Navigator or with `pip install numpy pandas`.  

They come pre-installed in Google Colab environments.  

#### numpy

Source: https://numpy.org/doc/stable/user/whatisnumpy.html

`numpy` is "the fundamental package for scientific computing in Python." Many other scientific computing packages have `numpy` as a dependency because of its `array` object.  

The `numpy` array is similar to a vanilla Python list: it is an ordered collection of values, and it can have more than one dimension (much like a list of lists). However, it differs from lists in some critical ways:  

1. Arrays in `numpy` are of a fixed size. Lists in Python are dynamically sized.  
2. Arrays can only contain a single data type. Lists in Python can contain many different data types.  
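
The two differences above can be seen in a short sketch. The variable names here are made up for illustration, and the exact integer type `numpy` picks may vary by platform.

```python
import numpy as np

# A list can mix types freely; an array converts everything to one common type.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)  # float64: the integers were converted to floats

# Arrays have a fixed size. np.append does not grow the array in place;
# it builds and returns a brand-new array.
original = np.array([1, 2, 3])
extended = np.append(original, 4)
print(original.size, extended.size)  # 3 4
```

Because every "resize" copies the data, growing an array one element at a time in a loop is usually a poor fit. Lists are the better tool for that.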

Given these restrictions, why use arrays over lists? When they can be used, arrays are _significantly_ faster than lists for numerical work. The authors of the `numpy` package made use of several optimizations (such as precompiled C code and vectorized operations that work on whole arrays at once) to achieve this performance. The performance difference between arrays and lists is large enough that Python would likely not be competitive as a data science language without `numpy`.  
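
If you want to see the speed difference for yourself, here is a rough sketch that runs the same computation with a list and with an array. The function names and the problem size are arbitrary choices for illustration, and the timings you get will depend on your machine, so no particular numbers are promised.

```python
import timeit

import numpy as np

n = 100_000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)  # int64 so the squares do not overflow


def sum_squares_list():
    # Pure Python: loops over the list one element at a time.
    return sum(v * v for v in values_list)


def sum_squares_array():
    # numpy: squares and sums the whole array in compiled code.
    return int(np.sum(values_array * values_array))


# Both approaches must agree before comparing their speed.
assert sum_squares_list() == sum_squares_array()

list_time = timeit.timeit(sum_squares_list, number=20)
array_time = timeit.timeit(sum_squares_array, number=20)
print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```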

#### pandas

Source: https://pandas.pydata.org/about/index.html  

`pandas` is a popular package for data analysis in Python. The focal point of `pandas` is the `DataFrame` object, which you may think of as an object that manages tabular data.  

The `DataFrame` object is a flexible, fast, and powerful way to transform tabular data. Though it has advanced features that are difficult to master, its basic interface is easy to use. For example, if we were to read a CSV file in `pandas`, we would simply need to load it into a `DataFrame` as such:  

`import pandas as pd`  
`df = pd.read_csv(file_path)`  
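
To try this without a file on disk, here is a minimal sketch that reads CSV text from memory instead. The column names and values are invented for illustration. `pd.read_csv` accepts a file path or a file-like object, so `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content, used here in place of a real file on disk.
csv_text = """name,score
Ana,90
Ben,85
Cara,77
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)            # (3, 2): three rows, two columns
print(df["score"].mean())  # 84.0
```

With a real file, you would pass its path as a string instead of the `io.StringIO` object.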

`pandas` does have some limitations:  
1. It is not suitable for "big data", which is any data that you need more than one computer to process properly.  
2. It is a large library (in terms of file size). If you only need to analyze simple and small data, you are probably better off using Python's built-in file-reading tools, such as the `csv` module.

Regardless, `pandas` remains a favorite of data scientists today. It is an excellent library for non-trivial data analysis.  

### Checkpoint

Install `numpy` and `pandas`. If you're using Anaconda, you can do this from the navigator. Make sure the cell below runs with no errors.

In [None]:
import numpy as np
import pandas as pd

### 6.2: NumPy

Source: https://numpy.org/doc/stable/user/absolute_beginners.html  

You will likely not use `numpy` itself very often. However, it is still important to know how it works, because almost all other scientific computing packages in Python are based on `numpy`.  

#### Array basics

Arrays are `numpy`'s basic data structure. As a beginner, it may be helpful not to think of them as "more rigid lists." Instead, think of them as matrices of numbers.  

You can create a simple array from a Python list or range like so:  

In [14]:
list_1d = [1, 2, 3, 4, 5]

array_1d = np.array(list_1d)

array_1d

array([1, 2, 3, 4, 5])

In [15]:
list_2d = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

array_2d = np.array(list_2d)

array_2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
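
To make the "matrices of numbers" idea concrete, here is a small sketch of a few things an array can do that a list of lists cannot. It rebuilds the same 3x3 array so that it runs on its own.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(array_2d.shape)  # (3, 3): three rows, three columns
print(array_2d.ndim)   # 2: the number of dimensions

# Arithmetic applies to every element at once.
# (With a plain list, `* 2` would repeat the list instead.)
doubled = array_2d * 2
print(doubled)

# The @ operator performs matrix multiplication.
product = array_2d @ array_2d
print(product)
```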

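You can access elements in multi-dimensional arrays through index chaining. Indices start at 0, so `array_2d[1][0]` selects the second row and then the first element of that row.  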

In [16]:
array_2d[1][0]

4
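
Besides chaining, `numpy` arrays also accept several indices separated by commas inside one pair of brackets. The sketch below shows that both styles pick the same element, and that the comma style also works with slices. The names `first_column` and `top_right` are made up for illustration.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Chained indexing and comma indexing select the same element.
assert array_2d[1][0] == array_2d[1, 0]

# Comma indexing also supports slices along each dimension.
first_column = array_2d[:, 0]  # every row, column 0
top_right = array_2d[:2, 1:]   # rows 0 and 1, columns 1 and 2

print(first_column)  # [1 4 7]
print(top_right)
```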

#### `numpy` methods to generate arrays

##### `np.array`

`np.array` creates an array from a Python list (or another sequence-like object, such as a range or an existing array) that you pass to it.  

##### `np.zeros`

`np.zeros` generates an array of a given shape filled with zeros.  
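
A minimal sketch of `np.zeros` in use. The shape can be a single integer or a tuple, and the variable names are only for illustration.

```python
import numpy as np

zeros_1d = np.zeros(5)                   # one dimension, five elements
zeros_2d = np.zeros((2, 3))              # two rows, three columns
zeros_int = np.zeros((2, 3), dtype=int)  # same shape, but integers

print(zeros_1d)
print(zeros_2d.shape)  # (2, 3)
print(zeros_2d.dtype)  # float64: zeros are floats unless you pass dtype
```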

### 6.3: Pandas

In [2]:
import numpy as np

a = np.array(range(10))
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])