# Pandas
Pandas is a widely used Python library for working with data. Its stated goal is to be the most powerful and flexible open-source data analysis tool, and in our opinion it has reached that goal. The DataFrame is at the center of pandas. A DataFrame is structured like a table or spreadsheet: both its rows and its columns are indexed, and you can perform operations on rows or columns separately.

A pandas DataFrame is easy to change and manipulate. Pandas has helpful functions for handling missing data, performing operations on columns and rows, and transforming data. In addition, many SQL operations have counterparts in pandas, such as join, merge, filter, and group by. With all of these tools, it should come as no surprise that pandas is very popular among data scientists.
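
As a minimal sketch of the features just listed, the example below fills a missing value, groups rows, joins two tables, and filters the result. The orders and customers tables and their column names are hypothetical, made up only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical order data; one amount is missing (NaN).
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [20.0, np.nan, 15.0, 30.0],
})
# Hypothetical customer lookup table.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Emma", "Marco", "Sarah"],
})

# Handle missing data: replace NaN with 0.
orders["amount"] = orders["amount"].fillna(0.0)

# Counterpart of SQL GROUP BY: total amount per customer.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Counterpart of SQL JOIN: attach customer names to the totals.
report = totals.merge(customers, on="customer_id", how="left")

# Counterpart of SQL WHERE: keep rows with a positive total.
print(report[report["amount"] > 0])
```

Whether filling missing values with 0 is appropriate depends on the data; it is used here only to keep the example short.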

## Installation 
```bash
pip install pandas
```

After installation, we can import it as follows:
```python
import pandas as pd
```

## Pandas Series and DataFrames
Just as the ndarray is the foundation of the NumPy library, the Series is the core object of the pandas library. A pandas Series is very similar to a one-dimensional NumPy array, but its values can also be indexed using labels, which a NumPy array does not support. This labeling is useful when you are storing pieces of data that have other data associated with them.

Say you want to store the ages of students in an online course to eventually figure out the average student age. In a NumPy array, you could only access these ages with the internal ndarray indices 0, 1, 2, and so on. In a Series, the indices are also 0, 1, 2, ... by default, but you can customize them to be other values, such as student names, so that an age can be accessed using a name. Custom indices are set by passing values to the Series constructor, as you will see below.

A Series holds items of any one data type and can be created by passing a scalar value, Python list, dictionary, or ndarray to the pandas Series constructor. If a dictionary is passed, its keys become the index labels.
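
The constructor inputs listed above can be sketched as follows. The variable names and the student data are illustrative assumptions.

```python
import pandas as pd

# From a Python list: default integer index 0, 1, 2.
from_list = pd.Series([13, 24, 18])

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"Emma": 13, "Marco": 24, "Sarah": 18})

# From a scalar: the value is repeated for every index label.
from_scalar = pd.Series(0, index=["Emma", "Marco", "Sarah"])

print(from_list)
print(from_dict)
print(from_scalar)
```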


```python
import numpy as np
import pandas as pd

ages = np.array([13, 24, 18])
series1 = pd.Series(ages)
print(series1)
```

```text
0    13
1    24
2    18
dtype: int64
```


We can customize the index of the Series by specifying it in the constructor:

```python
series2 = pd.Series(ages, index=["Emma", "Marco", "Sarah"])
print(series2)
```

```text
Emma     13
Marco    24
Sarah    18
dtype: int64
```
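
With labels in place, a value can be looked up by name, and the average age mentioned earlier is a single method call. A minimal sketch that rebuilds the same Series so it runs on its own:

```python
import numpy as np
import pandas as pd

ages = np.array([13, 24, 18])
series2 = pd.Series(ages, index=["Emma", "Marco", "Sarah"])

# Look up one value by label, or by position with .iloc.
marco_age = series2["Marco"]
first_age = series2.iloc[0]

# Aggregate over all values, e.g. the average student age.
average_age = series2.mean()

print(marco_age, first_age, average_age)
```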


## DataFrames
A DataFrame is similar in form to a matrix: it consists of rows and columns. Both rows and columns can be indexed with integers or string labels. One DataFrame can hold many different data types, but each column has a single data type. A column of a DataFrame is essentially a Series, and all columns must have the same number of elements (rows).

There are different ways to fill a DataFrame, such as from a CSV file, a SQL query, a Python list, or a dictionary. Here we create a DataFrame from a Python list of lists. Each nested list holds the data for one row of the DataFrame. We use the columns keyword to pass in the list of our custom column names.

```python
dataf = pd.DataFrame([
    ['John Smith', '123 Main St', 34],
    ['Jane Doe', '456 Maple Ave', 28],
    ['Joe Schmo', '789 Broadway', 51]
    ],
    columns=['name', 'address', 'age'])

print(dataf)
```

```text
         name        address  age
0  John Smith    123 Main St   34
1    Jane Doe  456 Maple Ave   28
2   Joe Schmo   789 Broadway   51
```
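
The same table can also be built from a dictionary that maps each column name to its values. The sketch below does that, then shows that a single column is a Series and that each column carries its own data type. The name dataf_from_dict is introduced here only for illustration.

```python
import pandas as pd

# Same data as above, built from a dictionary of column name -> values.
dataf_from_dict = pd.DataFrame({
    "name": ["John Smith", "Jane Doe", "Joe Schmo"],
    "address": ["123 Main St", "456 Maple Ave", "789 Broadway"],
    "age": [34, 28, 51],
})

# Selecting one column returns a Series.
age_column = dataf_from_dict["age"]
print(type(age_column))

# Each column has its own data type.
print(dataf_from_dict.dtypes)
```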


The default row indices are 0, 1, 2, ..., but these can be changed. For example, they can be set to the elements of one of the DataFrame's columns. To use the name column as the index instead of the default numerical values, we call set_index on our DataFrame. It returns a new DataFrame rather than modifying the original, so we assign the result back to dataf:

```python
dataf = dataf.set_index('name')
print(dataf)
```

```text
                  address  age
name
John Smith    123 Main St   34
Jane Doe    456 Maple Ave   28
Joe Schmo    789 Broadway   51
```
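
Once the name column serves as the index, rows can be looked up by label with .loc. A minimal sketch, rebuilding the same DataFrame so the block runs on its own. The variable indexed is a name introduced here; the result of set_index is stored in it because set_index returns a new DataFrame by default.

```python
import pandas as pd

dataf = pd.DataFrame(
    [
        ["John Smith", "123 Main St", 34],
        ["Jane Doe", "456 Maple Ave", 28],
        ["Joe Schmo", "789 Broadway", 51],
    ],
    columns=["name", "address", "age"],
)

# set_index returns a new DataFrame, so keep the result.
indexed = dataf.set_index("name")

# Look up a whole row, or a single cell, by its index label.
jane_row = indexed.loc["Jane Doe"]
jane_age = indexed.loc["Jane Doe", "age"]

print(jane_row)
print(jane_age)
```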


# NumPy
NumPy is an open-source Python library for efficient numerical operations on large quantities of data. A few NumPy functions are also useful on pandas DataFrames. For us, the most important point about NumPy is that pandas is built on top of it, so NumPy is a dependency of pandas.
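
As one hedged illustration of a NumPy function applied to pandas data, the sketch below uses np.where to derive a new column and to_numpy to get the underlying array. The people table and the group column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
people = pd.DataFrame({"name": ["Emma", "Marco", "Sarah"], "age": [13, 24, 18]})

# np.where builds a new column from a condition on an existing one.
people["group"] = np.where(people["age"] >= 18, "adult", "minor")

# A column converts to an ndarray with to_numpy().
ages_array = people["age"].to_numpy()

print(people)
print(type(ages_array))
```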

## Installation 
```bash
pip install numpy
```

After installation, we can import it as follows:
```python 
import numpy as np
```

## NumPy Arrays 
NumPy arrays differ from normal Python lists in that they hold a collection of items of a single data type, which allows fast element access and efficient data manipulation. They are called ndarrays because they can have any number (n) of dimensions (d): an array can be a vector (one-dimensional), a matrix (two-dimensional), or have more dimensions.

```python
import numpy as np

list1 = [1, 2, 3, 4, 5]

array1 = np.array(list1)
print(array1)
```

```text
[1 2 3 4 5]
```


To get a two-dimensional ndarray from a list, we must start with a Python list of lists:

```python
list2 = [list1, [6, 7, 8, 9, 10]]
array2 = np.array(list2)
print(array2)
```

```text
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
```
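
The number of dimensions and the size along each one can be inspected with the ndim and shape attributes. A small sketch, recreating the two arrays so it runs on its own:

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# ndim is the number of dimensions; shape is the size along each one.
print(array1.ndim, array1.shape)  # 1 (5,)
print(array2.ndim, array2.shape)  # 2 (2, 5)

# All elements of an array share a single data type.
print(array2.dtype)
```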


The main operations when working with NumPy arrays are:
- Selecting array elements
- Slicing arrays
- Reshaping arrays
- Splitting arrays
- Combining arrays
- Numerical operations (min, max, mean, etc.)
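
Each operation in the list above can be sketched in a line or two. The variable names below are illustrative, and the functions shown are only one common way to perform each operation.

```python
import numpy as np

a = np.arange(1, 11)  # integers 1 through 10

third = a[2]                             # selecting: element at position 2
first_three = a[:3]                      # slicing: first three elements
grid = a.reshape(2, 5)                   # reshaping: 2 rows, 5 columns
left, right = np.split(a, 2)             # splitting: two equal halves
joined = np.concatenate([left, right])   # combining: back into one array

print(third, first_three, grid.shape)
print(a.min(), a.max(), a.mean())        # numerical operations
```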

Mathematical operations can be performed on all values in an ndarray at one time rather than by looping through the values, as is necessary with a Python list. This is very helpful in many scenarios. Say you own a toy store and decide to decrease the price of all toys by $10 for a weekend sale. With the toy prices stored in an ndarray, you can apply the discount in a single step.


```python
# Illustrative prices in dollars, all above the $10 discount
toy_prices = np.array([15, 25, 35, 45, 55])
print(toy_prices - 10)
```

```text
[ 5 15 25 35 45]
```
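
For comparison, the sketch below applies a $10 discount first with a plain Python list, where each element is handled explicitly, and then with an ndarray. The prices are hypothetical values chosen so that they stay positive after the discount.

```python
import numpy as np

# Hypothetical prices, chosen to stay positive after the discount.
price_list = [15, 25, 35, 45, 55]

# Python list: each element is updated explicitly.
discounted_list = [price - 10 for price in price_list]

# ndarray: the subtraction is applied to every element at once.
discounted_array = np.array(price_list) - 10

print(discounted_list)
print(discounted_array)
```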
