# Pandas - Series
## TOPICS:
1. What is Pandas?
2. Installation
3. Series
   * 3.1 CREATE
   * 3.2 READ
   * 3.3 UPDATE
   * 3.4 DELETE
4. Operations with Series
5. A first look at DataFrame (a dedicated notebook covers the DataFrame in more detail.)





## 1. What is Pandas?

As you may have guessed, Pandas is very important.

What is the name of the course main textbook? 

Answer: **Pandas for Everyone**, Daniel Y. Chen

* It is a library that helps us work with data.
* It is built on top of NumPy.
* Some key parts of Pandas were inspired by R features (such as the data frame).
* We can use Pandas to import data from various file formats.
* We can use it for visualization.
* It is an open-source library.

## 2. Installation
The process is exactly the same as installing NumPy, except that this time we search for **pandas**.
![Install Pandas](assets/InstallPandas1.png)

Now, click **Apply**.

_Note: You may see a longer list than this._

![Install Pandas](assets/InstallPandas2.png)



**Import**

This is very similar to what we did with NumPy. Let's import both of them.

In [2]:
import pandas as pd
import numpy as np 

## 3. Series
A Series is a 1-d array that holds data together with an associated label for each value. We may think of a series as a mixture of **a 1-d array** and **a dictionary**. For the most part, a series works like a 1-d array. What makes it work like a dictionary is that we can access the data by label, much like using a key to access a value in a dictionary.


In [6]:
name_list = ["Tom", "Tim", "Ted"]       # a list of strings
gpa_list = [2.3,2.4,2.5]               # a list of floats
np_arr = np.array(gpa_list)             # a numpy array
dct = {"Tom":2.3, "Tim":2.4, "Ted":2.5} # a dictionary 
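To make the "array plus dictionary" idea concrete, here is a minimal sketch. The variable names (`grades`, `curved`, and so on) are made up for this illustration, and the `index=` argument used to attach labels is explained in section 3.1.

```python
import pandas as pd

# A small series with labels, reusing the names and GPAs from the cell above.
grades = pd.Series([2.3, 2.4, 2.5], index=["Tom", "Tim", "Ted"])

# Array-like behaviour: element-wise arithmetic and aggregate methods.
curved = grades + 0.5
highest = grades.max()

# Dictionary-like behaviour: look up by label, test membership, list the labels.
tim_gpa = grades["Tim"]
has_ted = "Ted" in grades
labels = list(grades.keys())

print(curved)
print(highest, tim_gpa, has_ted, labels)
```

Note that `in` checks the labels, not the values, just as it checks the keys of a dictionary.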

### 3.1 CREATE
We can create a series from a list.

In [15]:
pd_s1 = pd.Series( data = gpa_list)
pd_s1

0    2.3
1    2.4
2    2.5
dtype: float64

Each data point can have **a label**. Think of it as an index.

In [8]:
pd_s2 = pd.Series( data = gpa_list, index = name_list)
pd_s2

Tom    2.3
Tim    2.4
Ted    2.5
dtype: float64

The position of each argument tells pandas what it is supposed to be: the first one is the data, the second one is the labels (index). Hence we can write it in a shorter way:

In [None]:
pd_s3 = pd.Series(gpa_list, name_list)
pd_s3
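As a quick check, the sketch below compares the keyword form with the positional form, and also shows what happens when the number of labels does not match the number of data points. The variable names `by_keyword`, `by_position`, and `mismatch_error` are only for illustration.

```python
import pandas as pd

gpa_list = [2.3, 2.4, 2.5]
name_list = ["Tom", "Tim", "Ted"]

# Keyword arguments and positional arguments build the same series.
by_keyword = pd.Series(data=gpa_list, index=name_list)
by_position = pd.Series(gpa_list, name_list)
same = by_keyword.equals(by_position)
print(same)

# The index needs exactly one label per data point; otherwise pandas raises an error.
try:
    pd.Series(gpa_list, ["Tom", "Tim"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)
```

The keyword form is longer, but it can be easier to read when the arguments are not obvious from their names.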

We can also create a series from a NumPy array.

In [None]:
pd_s4 = pd.Series(np_arr, name_list)
pd_s4

If we already have a dictionary, we can create a series from it. The keys become the labels.

In [13]:
pd_s5 = pd.Series(dct)
pd_s5

Tom    2.3
Tim    2.4
Ted    2.5
dtype: float64
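When a dictionary is combined with an explicit index, pandas looks up each label in the dictionary. The sketch below assumes a label `"Bob"` that is not in the dictionary, purely to show the effect; `picked` is a made-up name.

```python
import pandas as pd

dct = {"Tom": 2.3, "Tim": 2.4, "Ted": 2.5}

# Only the requested labels are kept, in the requested order.
# "Bob" is not a key in the dictionary, so his value is NaN.
picked = pd.Series(dct, index=["Ted", "Tom", "Bob"])
print(picked)
```

This is a handy way to select and reorder entries while creating the series.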

### 3.2 READ
We can access an element by using its label.
If we don't specify the labels manually, the row number (index) can be used to access the element.

In [9]:
s6 = pd.Series( [777.7, 888.8, 999.9] )
print(s6[0])
print(s6[1])
print(s6[2])

777.7
888.8
999.9


If we set the labels manually, we can use the labels.

In [16]:
s7 = pd.Series( [22,25,27] , ['Cold','OK','Warm'] )
print(s7['Cold'])
print(s7['OK'])
print(s7['Warm'])

22
25
27
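Plain square brackets can be ambiguous when the labels are themselves integers. A minimal sketch of the two explicit accessors, `.loc` (by label) and `.iloc` (by position), is shown below; the variable names are made up for this example.

```python
import pandas as pd

temps = pd.Series([22, 25, 27], ["Cold", "OK", "Warm"])

by_label = temps.loc["OK"]             # look up by label
by_position = temps.iloc[1]            # look up by row number (0-based)
last = temps.iloc[-1]                  # negative positions count from the end
subset = temps.loc[["Cold", "Warm"]]   # a list of labels returns a smaller series

print(by_label, by_position, last)
print(subset)
```

Using `.loc` and `.iloc` makes it clear to the reader whether a label or a position is meant.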


### 3.3 UPDATE
* Change an existing value
* Append new data to a series

We can update an existing value by assigning a new value to its label.


In [17]:
s8 = pd.Series( [22,25,27] , ['Cold','OK','Warm'] )
s8['Cold']=19
s8

Cold    19
OK      25
Warm    27
dtype: int64

We can also add/append a new value to the series.

In [18]:
s8['Hot'] = 35
s8

Cold    19
OK      25
Warm    27
Hot     35
dtype: int64

In [19]:
s8['Omg'] = -5
s8

Cold    19
OK      25
Warm    27
Hot     35
Omg     -5
dtype: int64
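Assigning to a new label adds one value at a time. To add several values at once, one option is `pd.concat`, which joins series end to end and returns a new series. This is a sketch with made-up names (`temps`, `extra`, `combined`), not the only way to do it.

```python
import pandas as pd

temps = pd.Series([19, 25, 27], ["Cold", "OK", "Warm"])
extra = pd.Series([35, -5], ["Hot", "Omg"])

# concat returns a new series; temps and extra are left unchanged.
combined = pd.concat([temps, extra])
print(combined)
```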

### 3.4 DELETE
We can call the `drop` function to remove items. This function returns **a new series**. **The original one still holds all of its elements.**

In [20]:
s9 = pd.Series( [22,25,27] , ['Cold','OK','Warm'] )
s10 = s9.drop(labels=['Cold','Warm'])
print(s9)
print("---")
print(s10)

Cold    22
OK      25
Warm    27
dtype: int64
---
OK    25
dtype: int64
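If we do want to change the series itself rather than get a new one, two options are `pop` and the `del` statement, which work much like they do on a dictionary. The sketch below uses made-up names.

```python
import pandas as pd

temps = pd.Series([22, 25, 27], ["Cold", "OK", "Warm"])

# pop removes the item from temps and returns its value.
removed = temps.pop("Cold")

# del removes the item from temps without returning anything.
del temps["Warm"]

print(removed)
print(temps)
```

Unlike `drop`, both of these modify `temps` in place, so the removed items are gone from the original series.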


## 4. Operations with Series
When we add two series together, pandas will use labels to identify values to be added. 
Let's start with a simple case when all labels are just the row numbers.

In [None]:
s1 = pd.Series([10,20,30])
s2 = pd.Series([5,6,7])
s1+s2

Now, let's set the labels manually. Notice that pandas does not add by row order; it adds by matching labels. If a label appears in only one of the two series, the result for that label is **NaN (not a number)**.

In [None]:
s3 = pd.Series( [10,20,30], ['Coffee','Tea','Coke'])
s4 = pd.Series( [5,6,7], ['Coffee','Milo','Tea'])
s3+s4
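If NaN is not what we want for unmatched labels, the `add` method accepts a `fill_value` that stands in for the missing side. The sketch below compares the two results; `plain_sum` and `filled_sum` are made-up names.

```python
import pandas as pd

s3 = pd.Series([10, 20, 30], ["Coffee", "Tea", "Coke"])
s4 = pd.Series([5, 6, 7], ["Coffee", "Milo", "Tea"])

# Matching labels are added; "Coke" and "Milo" appear in only one series, so they become NaN.
plain_sum = s3 + s4

# With fill_value=0, a label missing from one series is treated as 0 there.
filled_sum = s3.add(s4, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Because NaN is a floating-point value, both results have dtype `float64` even though the inputs were integers.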

## 5. A first look at DataFrame
A DataFrame is the main data structure we will use to hold data. It holds data in rows and columns, very similar to how Excel holds data.

In [None]:
from numpy.random import rand
np.random.seed(123)

# Note: rand(4, 3) returns a numpy array of random values (4 rows, 3 columns).
# The row labels go in index, the column labels go in columns.
df = pd.DataFrame(rand(4, 3), index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o'])
df

Looking at the result above, the first column on its own looks like a series. This gives us a hint that a dataframe is like a collection of series that share the same labels.
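We can check this hint directly. The sketch below uses fixed numbers instead of random ones so the result is easy to follow; the names `first_col`, `is_series`, `shares_index`, and `rebuilt` are made up for this example.

```python
import numpy as np
import pandas as pd

# A 4x3 table of the numbers 0..11, with the same labels as the example above.
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['A', 'B', 'C', 'D'],
                  columns=['m', 'n', 'o'])

# Selecting one column gives back a series with the row labels as its index.
first_col = df['m']
is_series = isinstance(first_col, pd.Series)
shares_index = first_col.index.equals(df.index)

# Going the other way: a dictionary of series can be assembled into a dataframe.
rebuilt = pd.DataFrame({name: df[name] for name in df.columns})

print(first_col)
print(is_series, shares_index, rebuilt.equals(df))
```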