
# <font color=#14F278>Unit 1 - Pandas Series</font>
---

## <font color=#14F278> 1. What is Pandas?</font>


<font color=#14F278>**Pandas**</font> is a <font color=#14F278>**Python library**</font>, specifically designed for data manipulation and analysis. So far we have explored core Python and learnt about objects such as lists, strings and dictionaries. However, a vast amount of real-world data is stored and displayed in tables: almost every company has 'indulged' in building a table in Excel at some point!

#### <font color=#14F278> Why are Tables so Important?</font>

A <font color=#14F278> **Table**</font> is a data structure that organizes information into rows and columns, used to store and display data in a structured format. Tables are used everywhere: in Maths, a rectangular table of numbers is known as a 'matrix', and a whole field called Linear Algebra revolves around it; tables are also a building block of databases. Chances are that every one of us has already 'consumed' some information today in the form of a table, purely because of how easily a table conveys the 'meaning' behind a dataset.

Here are some of the reasons why we use tables:
- effortless data entry
- automatic nomenclature: each cell is uniquely and easily identifiable
- collection of vast quantitative data in an organised way
- easy formatting, sorting and filtering

#### <font color=#14F278> What does Pandas have to do with Tables?</font>

Pandas is Python's main 'weapon' for handling and exploring tables. As a Python library, it provides high-performance data structures and data analysis tools. The two main data structures in Pandas are:
- <font color=#14F278>**Series**</font>
- <font color=#14F278>**DataFrame**</font>
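
As a rough sketch of how the two structures relate, the snippet below builds a tiny table and pulls out one column. The column names and values (`item`, `quantity`) are made up purely for illustration.

```python
import pandas as pd

# Hypothetical two-column table, invented for this example
table = pd.DataFrame({"item": ["apple", "pear"], "quantity": [3, 5]})

# Selecting a single column of a DataFrame gives back a Series
column = table["quantity"]

print(type(table).__name__)   # DataFrame
print(type(column).__name__)  # Series
```

In other words, a DataFrame can be thought of as a table whose columns are Series.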


---

## <font color=#14F278> 2. Imports: </font>

To use the <font color=#14F278> **Pandas** </font> library, we need to import it at the beginning of our code:
- `import pandas as pd`
- `pd` is the **alias** used for Pandas

In [2]:
# Imports
import pandas as pd

---
## <font color=#14F278> 3. Pandas Series - Definition: </font>
A <font color=#14F278>**Series**</font> is a <font color=#14F278>**1-dimensional array**</font> of indexed data. When we translate array into 'arrangement' and 1-dimensional into 'linear', a Series is simply a linear arrangement of indexed data. Series objects are capable of holding data of any type, as long as all elements of the Series are of the same type. In that sense:
- Each Series has a <font color=#14F278>**single data type**</font> !
- We can think of each element's index as its <font color=#14F278>**key**</font> 

<center>
    <div>
        <img src="..\images\series_001.png"/>
    </div>
</center>
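
To make the 'index as key' idea concrete, here is a minimal sketch. The labels `ann` and `bob` and their values are hypothetical, chosen only for this example.

```python
import pandas as pd

# Hypothetical labels and values, for illustration only
ages = pd.Series([31, 45], index=["ann", "bob"])

# Look up a value by its index label, much like a dictionary key
print(ages["bob"])    # 45

# The `in` operator checks the index labels, not the values
print("ann" in ages)  # True
print(31 in ages)     # False
```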

---
## <font color=#14F278> 4. Constructing a Pandas Series Object: </font>

Two common ways to create a Series are from a list and from a dictionary.

- From a <font color=#14F278>**List**</font> :
    - `pd.Series([item1, item2, ...])`
    - `pd.Series([item1, item2, ...], index = [index1, index2, ...])`
    
    
- From a <font color=#14F278>**Dictionary**</font> :
    - `pd.Series({index1: item1, index2:item2, ...})`

In [3]:
# Creating a Series from a list
s = pd.Series([1,2,3])
display(s)

0    1
1    2
2    3
dtype: int64

In [4]:
# Creating a Series from a list while specifying the index
s = pd.Series([1,2,3], index=['a','b','c'])
display(s)

a    1
b    2
c    3
dtype: int64

In [5]:
# Creating a Series from a dictionary
# This method is equivalent to using a list and specifying the indices, i.e.
# s = pd.Series([111,222], index=['a','b'])
s = pd.Series({'a':111, 'b':222})
display(s)

a    111
b    222
dtype: int64
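
One way to check that the list-plus-index form and the dictionary form produce the same Series is the `.equals()` method. This is a small sketch; the variable names are chosen just for this example.

```python
import pandas as pd

from_dict = pd.Series({"a": 111, "b": 222})
from_list = pd.Series([111, 222], index=["a", "b"])

# .equals() checks that both the values and the index labels match
print(from_dict.equals(from_list))  # True

# Without an explicit index, pandas falls back to 0, 1, 2, ...
default_index = pd.Series([111, 222])
print(list(default_index.index))    # [0, 1]
```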

---
## <font color=#14F278> 5. Retrieving Series Values and Index: </font>


Now that we know how to construct a simple __Series__ (which essentially looks like an indexed column), let's learn how to obtain values and indices from it:

In [6]:
# Obtain the values of a Series
s.values

array([111, 222], dtype=int64)

In [7]:
# Obtain the index of a Series
s.index

Index(['a', 'b'], dtype='object')

In [8]:
# Obtain a specific value of a Series by specifying its index
s['a']

111
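
Besides plain square brackets, pandas offers explicit accessors: `.loc` for label-based lookup and `.iloc` for position-based lookup. The sketch below rebuilds the same small Series so that it runs on its own.

```python
import pandas as pd

s = pd.Series({"a": 111, "b": 222})

print(s.loc["a"])        # 111 (lookup by index label)
print(s.iloc[1])         # 222 (lookup by integer position)
print(s.index.tolist())  # ['a', 'b']
print(s.to_numpy())      # [111 222]
```

Using `.loc` and `.iloc` makes it explicit whether a label or a position is meant, which matters once the index itself consists of integers.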

---
## <font color=#14F278> 6. Series Data Types: </font>

- We already mentioned that each Series has a <font color=#14F278>**single data type** </font> - all elements in the Series share the same type.
- A Series can take one of several data types, such as integer, float, string or generic Python object.
- Let's explore how to obtain a Series' data type and change it (where possible):

In [9]:
# Check Series data type
s.dtype

dtype('int64')

In [10]:
# Changing the data type of a Series - so-called 'typecasting'
# (astype returns a new Series; s itself is left unchanged)
s.astype(float)

a    111.0
b    222.0
dtype: float64
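
Note that `.astype()` does not modify the Series in place. A minimal sketch of what that means in practice, with variable names chosen for this example:

```python
import pandas as pd

s = pd.Series({"a": 111, "b": 222})

# astype returns a new Series and leaves the original untouched
as_float = s.astype(float)
print(s.dtype)         # int64
print(as_float.dtype)  # float64

# To keep the converted version under the same name, reassign it
s = s.astype(float)
print(s.dtype)         # float64
```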

In [14]:
# A Series whose elements are Python lists falls back to the generic 'object' data type
s = pd.Series([[1],[2],[3]])

s

0    [1]
1    [2]
2    [3]
dtype: object

In [11]:
# Underlying data structure is a numpy array
type(s.values)

numpy.ndarray
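
The `object` data type also appears when the elements do not share a common numeric type. The following sketch contrasts a purely numeric Series with a mixed one; the example values are arbitrary.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3])
mixed = pd.Series([1, "two", 3.0])  # arbitrary mix of int, str and float

print(numbers.dtype)  # int64
print(mixed.dtype)    # object

# In both cases the values can be exposed as a NumPy array
print(isinstance(mixed.to_numpy(), np.ndarray))  # True
```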

---
## <font color=#14F278>7. Series Shape: </font>

- The concept of  <font color=#14F278> **shape** </font> allows us to obtain 'metadata' about Series (and later on about DataFrames).
- It essentially returns the object's dimensions in the form of a tuple. 
- Series are <font color=#14F278> **1-dimensional** </font>, so we only expect to obtain information on 1 dimension:

In [15]:
s = pd.Series({'a':1, 'b':2, 'c':3})
display(s)

a    1
b    2
c    3
dtype: int64

In [16]:
# Series object has 1 dimension with 3 elements
s.shape

(3,)

In [19]:
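# Casting with the built-in int uses the platform's default integer width,
# so the resulting dtype can be int32 (as in this run) or int64 depending on the setup
s.astype(int)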

a    1
b    2
c    3
dtype: int32
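
Alongside `.shape`, a few related attributes describe the size of a Series. A short sketch on the same three-element Series:

```python
import pandas as pd

s = pd.Series({"a": 1, "b": 2, "c": 3})

print(s.shape)  # (3,)  a tuple with one entry, because there is one dimension
print(s.ndim)   # 1     the number of dimensions
print(s.size)   # 3     the total number of elements
print(len(s))   # 3     the built-in len() agrees with size for a Series
```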

---
## <font color=#14F278> 8. Summary: </font>


- Pandas is a Python library providing high-performance data structures and analysis tools
- The two main Pandas objects are __Series__ and __DataFrame__
- A Pandas Series is a 1-dimensional array of indexed data, which has a single data type
- Series can be constructed from __lists__ and __dictionaries__ using the `pd.Series()` constructor
- We can obtain Series values via the `.values` property
- We can obtain Series index via the `.index` property
- We can obtain Series data type via the `.dtype` property and we can type cast a Series via the `.astype()` method
- We can obtain the dimensions of a Series via the `.shape` property

---
## <font color=#FF8181> 9. Concept Check: </font>

1. What is a pandas series object? What attributes does it have?
2. How many dtypes (datatypes) can you have for the values of a single Series instance?
3. Construct a series object from a tuple containing `(10, 10.5, 11)`. What is the dtype for your series? Is this what you expect?
4. Construct a series object from a list containing `['10', '10.5', '11']`. What is the dtype for your series? Is this what you expect? If appropriate, how would you convert it to a numeric data type?
5. Construct a series object from a list containing `['10', '10.5', '11']`, with index set to `[3,4,5]`. How would you get the second element in your series?

In [None]:
# #1
# A pandas series object is a 1 dimensional array of indexed data.
# Can hold data of any type, but each series must be comprised of only 1 data type

In [None]:
# #2
# Only one data type for the values in a single Series instance

In [34]:

# #3
# Mixing ints and a float in the tuple: the whole Series is upcast to float64
my_tuple = (10, 10.5, 11)
s1 = pd.Series(my_tuple)
display(s1)
print(s1.dtype)

0    10.0
1    10.5
2    11.0
dtype: float64

float64


In [35]:

# Creating a Series from a list of numbers (ints and floats are again upcast to float64)
s2 = pd.Series([10,10.5,11])
print(s2.dtype)
display(s2)

float64


0    10.0
1    10.5
2    11.0
dtype: float64
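
For the string version of the list, `['10', '10.5', '11']`, a possible approach is sketched below. The exact dtype reported for the string Series depends on the pandas version (commonly `object`), so the sketch only asserts the dtype after conversion. The variable names are chosen for this example.

```python
import pandas as pd

text = pd.Series(["10", "10.5", "11"])

# The elements are Python strings, not numbers
print(type(text.iloc[0]).__name__)  # str

# Two ways to convert to a numeric data type
as_float = text.astype(float)
also_float = pd.to_numeric(text)

print(as_float.dtype)     # float64
print(as_float.tolist())  # [10.0, 10.5, 11.0]
```

`pd.to_numeric` is the more forgiving of the two, since it accepts an `errors` argument for values that cannot be parsed.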

In [33]:
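# #5
# With index [3, 4, 5], the second element carries the label 4
s = pd.Series([10,10.5,11], index=[3,4,5])
display(s)
s[4]  # label-based lookup; s.iloc[1] selects the same element by position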

3    10.0
4    10.5
5    11.0
dtype: float64

10.5
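
With an integer index, 'second element' can be read either as a label or as a position. The sketch below uses the string list from the question and shows both explicit forms.

```python
import pandas as pd

s = pd.Series(["10", "10.5", "11"], index=[3, 4, 5])

# By position: the second element sits at position 1 (positions start at 0)
print(s.iloc[1])  # 10.5

# By label: the same element carries the index label 4
print(s.loc[4])   # 10.5
```

Here `s.iloc[1]` answers the question regardless of what the index labels are, which makes it the safer choice when the index is numeric.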