#### Notes on 
# Querying Series Data Structure

In [1]:
# A pandas Series can be queried either by index position or by index label.
# To query by index label, use the loc attribute.
# To query by numeric position, starting at zero, use the iloc attribute.
# If you don't give the series an index when creating it, the position and the label are effectively the same values.

import pandas as pd
students_classes = {'Alice': 'Physics',
                   'Jack': 'Chemistry',
                   'Molly': 'English',
                   'Sam': 'History'}
s = pd.Series(students_classes)
s

Alice      Physics
Jack     Chemistry
Molly      English
Sam        History
dtype: object
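When a Series is created from a plain list instead of a dict, pandas assigns a default index of 0, 1, 2, and so on, so each entry's label equals its position. A minimal sketch of that case (the list of classes is illustrative):

```python
import pandas as pd

# No index is given, so pandas uses the default RangeIndex 0, 1, 2, ...
classes = pd.Series(['Physics', 'Chemistry', 'English', 'History'])

# With a default index, label 2 and position 2 refer to the same entry
print(classes.loc[2])   # label-based lookup    -> English
print(classes.iloc[2])  # position-based lookup -> English
```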

In [2]:
# To see the fourth entry, use the iloc attribute with position 3 (positions start at zero).
s.iloc[3]

'History'

In [3]:
# To see which class Molly has, use the loc attribute with the label 'Molly'.
s.loc['Molly']

'English'

In [4]:
# Keep in mind that iloc and loc are not methods; they are attributes.
# So you don't query them with parentheses but with square brackets, which is called the indexing operator.
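Because loc and iloc are used with square brackets, they also accept slices and lists, not just single keys. One detail worth knowing: a label slice with loc includes its end label, while a position slice with iloc excludes its end position, as ordinary Python slicing does. A small sketch using the same student data:

```python
import pandas as pd

s = pd.Series({'Alice': 'Physics',
               'Jack': 'Chemistry',
               'Molly': 'English',
               'Sam': 'History'})

# Position slice: positions 1 and 2 (the end position 3 is excluded)
by_position = s.iloc[1:3]

# Label slice: 'Jack' through 'Molly', with the end label included
by_label = s.loc['Jack':'Molly']

# A list of labels selects several entries at once
picked = s.loc[['Alice', 'Sam']]

print(by_position)
print(by_label)
print(picked)
```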

In [7]:
# What if your index is a list of integers?
# Then pandas can't tell automatically whether s[...] is meant to query by index position or by index label.
# The safer option is to be explicit and use the iloc or loc attributes directly.

class_code = {99: 'Physics',
              100: 'Chemistry',
              101: 'English',
              102: 'History'}
s = pd.Series(class_code)

In [8]:
# If we try to call s[0], we get a KeyError because there's no item in the series with
# an index label of zero; instead, we have to call iloc explicitly if we want the first item.

s[0] 

KeyError: 0
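The explicit fix is to say which kind of lookup you mean: iloc for the first position, or loc for the label 99. A minimal sketch with the same class codes:

```python
import pandas as pd

class_code = {99: 'Physics',
              100: 'Chemistry',
              101: 'English',
              102: 'History'}
s = pd.Series(class_code)

first_by_position = s.iloc[0]  # position 0 -> 'Physics'
first_by_label = s.loc[99]     # label 99   -> 'Physics'

print(first_by_position, first_by_label)
```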

#### How to get data out of the series? 

In [4]:
# A typical programmatic approach, e.g. to average the grades, would be to iterate over all the items in the series

grades = pd.Series([90, 80, 70, 60])

total = 0
for grade in grades:
    total += grade
print(total/len(grades))

75.0


In [5]:
# Pandas and the underlying numpy library support a method of computation called vectorization:
# an operation is applied to a whole array at once instead of through an explicit Python loop.
# Vectorization works with most of the functions in the numpy library, including the sum function.

In [6]:
import numpy as np

total = np.sum(grades)
print(total/len(grades))

75.0
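To see why vectorization matters, here is a rough timing sketch that sums a larger Series both ways. The series size, the random data, and the use of time.perf_counter are illustrative choices, and the timings will vary by machine; the point is that both approaches give the same total while the vectorized one avoids a Python-level loop.

```python
import time

import numpy as np
import pandas as pd

# One million random grades between 0 and 100 (illustrative data)
rng = np.random.default_rng(0)
grades = pd.Series(rng.integers(0, 101, size=1_000_000))

# Loop version: Python visits every item one at a time
start = time.perf_counter()
loop_total = 0
for grade in grades:
    loop_total += grade
loop_seconds = time.perf_counter() - start

# Vectorized version: the sum runs inside numpy's compiled code
start = time.perf_counter()
vector_total = np.sum(grades)
vector_seconds = time.perf_counter() - start

print(f"loop:       {loop_total} in {loop_seconds:.4f}s")
print(f"vectorized: {vector_total} in {vector_seconds:.4f}s")
```

For the average itself, pandas also provides `grades.mean()`, which is vectorized in the same way.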


##### Adding new data using .loc attribute 
If the label you pass to .loc doesn't exist in the index, a new entry is added. Index labels can have mixed types.

In [14]:
# Here's an example using a Series of a few numbers. 
s = pd.Series([1, 2, 3])

# We could add a new value under a new label, e.g. a university course and its number
s.loc['History'] = 102

s

0            1
1            2
2            3
History    102
dtype: int64
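Assigning to a missing label with loc enlarges the Series, but iloc only works on positions that already exist, so it cannot be used to add entries. A small sketch of both, which also shows that mixing integer and string labels leaves the index with the generic object dtype:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# loc with a new label appends an entry
s.loc['History'] = 102

# iloc cannot create new positions; assigning past the end raises IndexError
try:
    s.iloc[10] = 0
except IndexError as err:
    print("iloc refused:", err)

print(s)
print(s.index.dtype)  # mixed int and str labels -> object
```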

##### Index values don't have to be unique
This makes a pandas Series conceptually a little different from a table in a relational database (RDBMS), where a primary key must be unique.

In [15]:
students_classes = pd.Series({'Alice': 'Physics',
                              'Jack': 'Chemistry',
                              'Molly': 'English',
                              'Sam': 'History'})
students_classes

Alice      Physics
Jack     Chemistry
Molly      English
Sam        History
dtype: object

In [16]:
# Create a Series just for Kelly, listing all of the courses she has taken.
kelly_classes = pd.Series(['Philosophy', 'Arts', 'Math'], index=['Kelly', 'Kelly', 'Kelly'])
kelly_classes

Kelly    Philosophy
Kelly          Arts
Kelly          Math
dtype: object

In [17]:
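# Series.append was removed in pandas 2.0; pd.concat is the replacement and gives the same result
all_students_classes = pd.concat([students_classes, kelly_classes])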

all_students_classes

Alice       Physics
Jack      Chemistry
Molly       English
Sam         History
Kelly    Philosophy
Kelly          Arts
Kelly          Math
dtype: object
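Pandas keeps every copy of a repeated label by default. If a workflow does need unique labels, as with a primary key, one option is to check `index.is_unique`, or to pass `verify_integrity=True` to `pd.concat` (the current replacement for `Series.append`) so that duplicated labels raise a `ValueError` instead of being kept silently. A sketch with the same data:

```python
import pandas as pd

students_classes = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry',
                              'Molly': 'English', 'Sam': 'History'})
kelly_classes = pd.Series(['Philosophy', 'Arts', 'Math'],
                          index=['Kelly', 'Kelly', 'Kelly'])

combined = pd.concat([students_classes, kelly_classes])
print(combined.index.is_unique)  # False: 'Kelly' appears three times

# Ask pandas to reject duplicated labels instead of keeping them
try:
    pd.concat([students_classes, kelly_classes], verify_integrity=True)
except ValueError as err:
    print("rejected:", err)
```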

In [18]:
# There are a couple of important considerations when appending (with the older append method or pd.concat).
# First, pandas will take the series and try to infer the best data types to use. In this example,
# everything is a string, so there are no problems here. Second, appending doesn't actually change the
# underlying Series objects; instead, it returns a new series made up of the two joined together. This is
# a common pattern in pandas: by default, return a new object instead of modifying in place.
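Both points can be checked directly. This sketch uses `pd.concat`, the replacement for `Series.append` in pandas 2.0 and later: comparing lengths before and after shows the inputs are untouched, and combining integer and float series shows pandas inferring a single dtype that can hold every value.

```python
import pandas as pd

students_classes = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry'})
kelly_classes = pd.Series(['Philosophy', 'Arts'], index=['Kelly', 'Kelly'])

combined = pd.concat([students_classes, kelly_classes])

# The inputs are untouched; only the new object holds all four entries
print(len(students_classes), len(kelly_classes), len(combined))  # 2 2 4

# dtype inference: int64 and float64 inputs are combined as float64
mixed = pd.concat([pd.Series([1, 2]), pd.Series([2.5])])
print(mixed.dtype)  # float64
```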

In [19]:
# Finally, we see that when we query the appended series for Kelly, we don't get a single value
# but a Series containing every entry labeled 'Kelly'.
all_students_classes.loc['Kelly']

Kelly    Philosophy
Kelly          Arts
Kelly          Math
dtype: object
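So whether loc returns a single value or a Series depends on the data, not just on the query. If downstream code always expects a Series, one option is to pass a list of labels, which always returns a Series. A small sketch (the shortened data is illustrative):

```python
import pandas as pd

all_students_classes = pd.Series(
    ['Physics', 'Chemistry', 'Philosophy', 'Arts', 'Math'],
    index=['Alice', 'Jack', 'Kelly', 'Kelly', 'Kelly'])

unique_hit = all_students_classes.loc['Alice']       # a single value
duplicate_hit = all_students_classes.loc['Kelly']    # a Series of 3 entries
always_series = all_students_classes.loc[['Alice']]  # list -> always a Series

print(unique_hit)
print(duplicate_hit)
print(always_series)
```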