## CMPINF 2100 Week 04

### Introduction to Pandas - Series

Pandas is built on top of NumPy, so whenever we work with Pandas we usually import NumPy as well.

## Import Modules

In [1]:
import numpy as np
import pandas as pd

## Pandas Series

A Pandas Series is similar to, yet different from, Python lists and 1D NumPy arrays. It also borrows ideas from DICTIONARIES!!

Let's create a Pandas Series FROM a Python list.

In [2]:
grocery_list = ['milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos']

In [4]:
len(grocery_list)

6

In [5]:
type(grocery_list)

list

In [7]:
grocery_list[0]

'milk'

Let's convert `grocery_list` to a Pandas Series.

In [8]:
grocery_series = pd.Series(grocery_list)

In [9]:
%whos

Variable         Type      Data/Info
------------------------------------
grocery_list     list      n=6
grocery_series   Series    0          milk\n1       <...>     oreos\ndtype: object
np               module    <module 'numpy' from '/Ap<...>kages/numpy/__init__.py'>
pd               module    <module 'pandas' from '/A<...>ages/pandas/__init__.py'>


In [10]:
grocery_series

0          milk
1       bananas
2        apples
3    lunch meat
4          soup
5         oreos
dtype: object
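Many of the list operations from earlier carry over to the Series. A quick sketch, rebuilding `grocery_series` so the snippet runs on its own:

```python
import pandas as pd

grocery_list = ['milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos']
grocery_series = pd.Series(grocery_list)

# len() counts the values, exactly as it did for the list
n_items = len(grocery_series)                      # 6

# The container type changes from list to Series
is_series = isinstance(grocery_series, pd.Series)  # True

# With the default RangeIndex, label 0 is also the first position,
# so this lookup gives the same item as grocery_list[0]
first_item = grocery_series[0]                     # 'milk'

print(n_items, is_series, first_item)
```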

In [12]:
grocery_series.unique

<bound method Series.unique of 0          milk
1       bananas
2        apples
3    lunch meat
4          soup
5         oreos
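dtype: object>

Notice that without parentheses, `grocery_series.unique` only displays the bound method itself. Calling `grocery_series.unique()` is what returns the distinct values.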
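The effect of `.unique()` only becomes visible when some values repeat. A small sketch using a hypothetical `cart` Series with duplicate items (the name and contents are made up for illustration):

```python
import pandas as pd

# Hypothetical shopping cart with repeated items
cart = pd.Series(['milk', 'oreos', 'milk', 'soup', 'oreos'])

# .unique() keeps each distinct value once, in order of first appearance
distinct = list(cart.unique())   # ['milk', 'oreos', 'soup']

# .nunique() just counts the distinct values
n_distinct = cart.nunique()      # 3

print(distinct, n_distinct)
```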

Let's focus on the `.index` attribute of the Pandas Series object.

In [13]:
grocery_series.index

RangeIndex(start=0, stop=6, step=1)

In [14]:
grocery_series[0]

'milk'

In [15]:
grocery_series[1]

'bananas'

In [16]:
grocery_series[-1]

KeyError: -1
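Because `[]` is matching the label `-1` here, there is no "count from the end" behavior like a list has. To ask for a POSITION explicitly, pandas provides the `.iloc` indexer. A minimal sketch:

```python
import pandas as pd

grocery_series = pd.Series(['milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos'])

# .iloc is purely position-based, so negative positions count from the end
last_item = grocery_series.iloc[-1]    # 'oreos'

# Position-based slicing works like a list slice, but the labels come along
last_two = grocery_series.iloc[-2:]    # labels 4 and 5: 'soup', 'oreos'

print(last_item)
print(last_two)
```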

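With a `RangeIndex`, the square brackets look up the index LABEL `-1`, which does not exist, so we get a `KeyError` instead of the last item. And the `.index` attribute does NOT need to be a range of integers!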

In [17]:
grocery_series.index

RangeIndex(start=0, stop=6, step=1)

In [18]:
grocery_series.index = ["zero", "one", "two", "three", "four", "five"]

In [19]:
grocery_series.index

Index(['zero', 'one', 'two', 'three', 'four', 'five'], dtype='object')

In [20]:
grocery_series

zero           milk
one         bananas
two          apples
three    lunch meat
four           soup
five          oreos
dtype: object
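When we assign to `.index` like this, the new labels must match the number of values. A short sketch of what happens with too few labels (the exact error message may vary between pandas versions):

```python
import pandas as pd

grocery_series = pd.Series(['milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos'])

error_raised = False
try:
    # Only 3 labels for 6 values: pandas refuses the assignment
    grocery_series.index = ["zero", "one", "two"]
except ValueError as err:
    error_raised = True
    print("ValueError:", err)

# The failed assignment leaves the original RangeIndex in place
print(grocery_series.index)
```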

In [21]:
grocery_list

['milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos']

The Pandas Series has both an INDEX and VALUES!!

Kind of like a DICTIONARY, where a KEY/VALUE pair defines each ITEM!
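The dictionary analogy goes further: a Series can be built directly from a Python dict, with the keys becoming the index. A sketch with hypothetical prices:

```python
import pandas as pd

# Hypothetical prices: dict keys become index labels, dict values become values
prices = pd.Series({"milk": 3.49, "bananas": 0.59, "oreos": 4.29})

print(prices.index)         # the former dict keys
print(prices.values)        # the former dict values

# Lookup by label, just like a dict lookup by key
print(prices["bananas"])    # 0.59

# `in` checks the index labels, just like `in` checks dict keys
print("milk" in prices)     # True
```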

In [22]:
grocery_series.values

array(['milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos'],
      dtype=object)

In [23]:
grocery_series.index

Index(['zero', 'one', 'two', 'three', 'four', 'five'], dtype='object')

In [24]:
grocery_series["one"]

'bananas'

In [26]:
grocery_series[1]

'bananas'
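Notice that `grocery_series[1]` still works even though `1` is no longer a label: `[]` falls back to the integer position. Newer pandas releases deprecate this fallback (it triggers a `FutureWarning`), so it is safer to state which kind of lookup we mean. A minimal sketch using the `.loc` (label) and `.iloc` (position) indexers:

```python
import pandas as pd

grocery_series = pd.Series(
    ['milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos'],
    index=["zero", "one", "two", "three", "four", "five"],
)

# .loc selects strictly by index LABEL
by_label = grocery_series.loc["one"]    # 'bananas'

# .iloc selects strictly by integer POSITION
by_position = grocery_series.iloc[1]    # 'bananas'

print(by_label, by_position)
```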

Let's continue practicing by defining a new Series, this time setting the `.index` attribute when the object is created.

In [27]:
more_groceries = pd.Series(["apple juice", "poptarts", "butter", "yogurt"],
                           index=["item 1", "item 2", "item 3", "top gun"])

In [28]:
more_groceries

item 1     apple juice
item 2        poptarts
item 3          butter
top gun         yogurt
dtype: object

In [31]:
more_groceries[0]

'apple juice'

In [30]:
more_groceries["item 1"]

'apple juice'

IMPORTANT, PLEASE BE CAREFUL: if you define the `.index` attribute manually when creating the Series, the number of entries in `index` MUST match the number of values!!

In [32]:
pd.Series(["1", "2", "3", "4"],
          index=["a", "b", "c"])

ValueError: Length of values (4) does not match length of index (3)

In [34]:
pd.Series(["1", "2", "3", "4"],
          index=["a", "b", "c", "d"])

a    1
b    2
c    3
d    4
dtype: object

I think the `.index` attribute can be confusing, because its labels do NOT need to be UNIQUE!!!

In [37]:
another_series = pd.Series(["a", "b", "c"],
                           index=["1", "2", "2"])

In [38]:
another_series.values

array(['a', 'b', 'c'], dtype=object)

In [39]:
another_series.index

Index(['1', '2', '2'], dtype='object')
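Since duplicate labels are allowed, it can be worth checking for them before relying on label lookups. A short sketch using the `Index.is_unique` property and the `Index.duplicated()` method:

```python
import pandas as pd

another_series = pd.Series(["a", "b", "c"], index=["1", "2", "2"])

# .is_unique reports whether every index label appears only once
labels_unique = another_series.index.is_unique               # False

# .duplicated() flags each repeat after the first occurrence of a label
dup_flags = another_series.index.duplicated().tolist()       # [False, False, True]

print(labels_unique, dup_flags)
```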

SELECTING from the Series by an index label will return MULTIPLE values if that label is NOT unique!!!

In [40]:
another_series['1']

'a'

In [41]:
another_series['2']

2    b
2    c
dtype: object

In [42]:
another_series['2'].values

array(['b', 'c'], dtype=object)

In [43]:
another_series['2'].index

Index(['2', '2'], dtype='object')
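One consequence to watch for: the same `[]` expression returns a single value when the label occurs once, but a smaller Series when the label repeats. A sketch that checks the result types:

```python
import pandas as pd

another_series = pd.Series(["a", "b", "c"], index=["1", "2", "2"])

single = another_series["1"]     # label occurs once  -> one value
multiple = another_series["2"]   # label occurs twice -> a Series

print(isinstance(single, str))            # True
print(isinstance(multiple, pd.Series))    # True
print(len(multiple))                      # 2
```

Code that expects a single value can therefore break silently once labels start repeating.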

## Combining Series

We can APPEND, EXTEND, or COMBINE multiple Series together using the `pd.concat()` function.

In [44]:
pd.concat([grocery_series, more_groceries, another_series])

zero              milk
one            bananas
two             apples
three       lunch meat
four              soup
five             oreos
item 1     apple juice
item 2        poptarts
item 3          butter
top gun         yogurt
1                    a
2                    b
2                    c
dtype: object
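Notice that `pd.concat()` happily keeps the duplicate label `2`. If duplicates would be a mistake, the `verify_integrity=True` argument makes `pd.concat()` raise a `ValueError` instead. A sketch with shortened, hypothetical versions of the Series above:

```python
import pandas as pd

grocery_series = pd.Series(["milk", "bananas"], index=["zero", "one"])
another_series = pd.Series(["a", "b", "c"], index=["1", "2", "2"])

# By default pd.concat() keeps every label, duplicates included
combined = pd.concat([grocery_series, another_series])
print(combined.index.is_unique)    # False

# verify_integrity=True refuses to build an index with duplicate labels
integrity_error = False
try:
    pd.concat([grocery_series, another_series], verify_integrity=True)
except ValueError as err:
    integrity_error = True
    print("ValueError:", err)
```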

When combining separate Series together, sometimes we want a BRAND NEW `.index` attribute. To get one, we can pass the `ignore_index=True` argument to `pd.concat()`.

In [45]:
pd.concat([grocery_series, more_groceries, another_series], ignore_index=True)

0            milk
1         bananas
2          apples
3      lunch meat
4            soup
5           oreos
6     apple juice
7        poptarts
8          butter
9          yogurt
10              a
11              b
12              c
dtype: object
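An alternative that gives the same result is to concatenate first and then discard the old labels with `Series.reset_index(drop=True)`. A sketch comparing the two, using shortened hypothetical Series:

```python
import pandas as pd

more_groceries = pd.Series(["apple juice", "poptarts"], index=["item 1", "item 2"])
another_series = pd.Series(["a", "b", "c"], index=["1", "2", "2"])

# Option 1: let pd.concat() build a fresh 0..n-1 index
fresh_1 = pd.concat([more_groceries, another_series], ignore_index=True)

# Option 2: concatenate, then throw the old labels away
# (drop=True discards them; without it we would get a DataFrame
# with the old labels kept as a column)
fresh_2 = pd.concat([more_groceries, another_series]).reset_index(drop=True)

print(fresh_1.equals(fresh_2))   # True
```

`ignore_index=True` is the more direct choice when the labels are never needed; `reset_index` is handy when the combined Series already exists.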

In [51]:
a_bigger_series = pd.concat([grocery_series, more_groceries, another_series], ignore_index=True).copy()
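The trailing `.copy()` makes it explicit that `a_bigger_series` is its own independent object. A small sketch of what a copy guarantees:

```python
import pandas as pd

original = pd.Series(["milk", "bananas", "apples"])
duplicate = original.copy()    # deep copy by default (deep=True)

# Changing the copy leaves the original untouched
duplicate[0] = "oat milk"

print(original[0])     # milk
print(duplicate[0])    # oat milk
```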

In [52]:
a_bigger_series

0            milk
1         bananas
2          apples
3      lunch meat
4            soup
5           oreos
6     apple juice
7        poptarts
8          butter
9          yogurt
10              a
11              b
12              c
dtype: object

In [53]:
a_bigger_series.index

RangeIndex(start=0, stop=13, step=1)

In [54]:
a_bigger_series[0]

'milk'

In [56]:
pd.concat([grocery_series, more_groceries, another_series, a_bigger_series])

zero              milk
one            bananas
two             apples
three       lunch meat
four              soup
five             oreos
item 1     apple juice
item 2        poptarts
item 3          butter
top gun         yogurt
1                    a
2                    b
2                    c
0                 milk
1              bananas
2               apples
3           lunch meat
4                 soup
5                oreos
6          apple juice
7             poptarts
8               butter
9               yogurt
10                   a
11                   b
12                   c
dtype: object

In [57]:
pd.concat([grocery_series, more_groceries, another_series, a_bigger_series]).values

array(['milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos',
       'apple juice', 'poptarts', 'butter', 'yogurt', 'a', 'b', 'c',
       'milk', 'bananas', 'apples', 'lunch meat', 'soup', 'oreos',
       'apple juice', 'poptarts', 'butter', 'yogurt', 'a', 'b', 'c'],
      dtype=object)

In [61]:
pd.concat([grocery_series, more_groceries, another_series, a_bigger_series]).index

Index([   'zero',     'one',     'two',   'three',    'four',    'five',
        'item 1',  'item 2',  'item 3', 'top gun',       '1',       '2',
             '2',         0,         1,         2,         3,         4,
               5,         6,         7,         8,         9,        10,
              11,        12],
      dtype='object')
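This combined index mixes STRING labels such as `'2'` with INTEGER labels such as `2`, and pandas treats them as different labels. A sketch with a hypothetical, smaller mix, using `.loc` so the lookup is unambiguously by label:

```python
import pandas as pd

# Hypothetical mix: string label "2" (twice) plus integer label 2 (once)
mixed = pd.concat([
    pd.Series(["b", "c"], index=["2", "2"]),
    pd.Series(["apples"], index=[2]),
])

# The string label matches only the two string entries
print(mixed.loc["2"].tolist())   # ['b', 'c']

# The integer label matches only the integer entry
print(mixed.loc[2])              # apples
```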

In [63]:
pd.concat([grocery_series, more_groceries, another_series, a_bigger_series], ignore_index=True)

0            milk
1         bananas
2          apples
3      lunch meat
4            soup
5           oreos
6     apple juice
7        poptarts
8          butter
9          yogurt
10              a
11              b
12              c
13           milk
14        bananas
15         apples
16     lunch meat
17           soup
18          oreos
19    apple juice
20       poptarts
21         butter
22         yogurt
23              a
24              b
25              c
dtype: object

## Summary

The Pandas Series looks kind of like a mix of lists, 1D NumPy arrays, and dictionaries.

Each value is stored alongside a label in the `.index` attribute, and we can select values from the Series using those `.index` labels.