## Create Jupyter Notebook for the `Series` Module

In [2]:
import pandas as pd

## Create A `Series` Object from a Python List
- A one-dimensional labelled array that combines the best features of a Python list and a dictionary
- Data consistency is the most important value
- Step into the lobby of the library
- Tab completion for pandas top-level attributes / press Enter
- dtype: object for strings
- Index on the left (index labels do not have to be numeric)
- Index is a unique identifier
- Pandas default is numeric index starting from 0

In [4]:
ice_cream = ["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"]
pd.Series(ice_cream)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

In [5]:
lottery = [4, 8, 15, 16, 23, 42]
pd.Series(lottery)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

In [6]:
registrations = [True, False, False, False, True]
pd.Series(registrations)

0     True
1    False
2    False
3    False
4     True
dtype: bool

## Create A `Series` Object from A Python Dictionary
- The index does **not** have to be numeric and it does not have to start at 0.
- The `dtype` refers to the Series values, not Series index
- Index labels in a Series can have duplicates

In [7]:
webster = {
    "Aardvark": "An animal", 
    "Banana": "A fruit", 
    "Cyan": "A color"
}

pd.Series(webster)

Aardvark    An animal
Banana        A fruit
Cyan          A color
dtype: object
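
As a small sketch of the second bullet above, the example below uses a hypothetical `calories` dictionary (not part of the course data) to show that the dictionary keys become the index labels while `dtype` describes only the values.

```python
import pandas as pd

# Hypothetical data: keys become index labels, values become Series values
calories = {"Apple": 95, "Banana": 105, "Cherry": 4}
s = pd.Series(calories)

print(s.dtype)        # int64, the dtype of the values
print(list(s.index))  # ['Apple', 'Banana', 'Cherry']
print(s["Banana"])    # 105, looked up by label
```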

## Intro to Attributes
- Objects in Python have attributes and methods.
- Attributes return details or characteristics of the object.
- Attributes are written after the object name separated by a period.
- Jupyter displays only the result of the last expression in a cell; we need a variable to reuse the Series in a later cell.

In [4]:
about_me = ["Smart", "Handsome", "Charming", "Brilliant", "Humble"]
s = pd.Series(about_me)
s

0        Smart
1     Handsome
2     Charming
3    Brilliant
4       Humble
dtype: object

In [8]:
# ndarray of the values in the Series
s.values

array(['Smart', 'Handsome', 'Charming', 'Brilliant', 'Humble'],
      dtype=object)

In [6]:
# The object holding the Series index
s.index

RangeIndex(start=0, stop=5, step=1)

In [7]:
s.dtype

dtype('O')

## Intro to Methods
- Methods require parentheses, attributes do not.
- A method is a behavior or command we can give to the object.
- A method can potentially manipulate the object's internal data.

In [10]:
prices = [2.99, 4.45, 1.36]
s = pd.Series(prices)
s

0    2.99
1    4.45
2    1.36
dtype: float64

In [11]:
s.sum()

8.8

In [12]:
s.product()

18.095480000000006

In [13]:
s.mean()

2.9333333333333336

In [16]:
home_runs = pd.Series([3, 4, 8, 2])
home_runs.sum()
home_runs.mean()

4.25
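
In the cell above only the mean is displayed, because Jupyter shows just the last expression of a cell. One way to see both results, sketched here with the same `home_runs` values, is to store them in variables (the names `total` and `average` are illustrative) and print each one.

```python
import pandas as pd

home_runs = pd.Series([3, 4, 8, 2])

total = home_runs.sum()     # method call: parentheses required
average = home_runs.mean()

print(total)    # 17
print(average)  # 4.25
```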

## Parameters and Arguments
- The `Series` constructor is called like a method on the `pandas` library.
- Constructors and methods have parameters.
- Parameters are like options. They determine how the method will execute.
- The user's inputs to those parameters are called the **arguments**.
- View parameters by using Shift + Tab inside parentheses.
- The first parameter to the `Series` constructor is `data`.

In [8]:
fruits   = ["Apple", "Orange", "Plum", "Grape", "Blueberry"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

pd.Series(data = fruits, index = weekdays) # Use keywords to match parameters and arguments
pd.Series(fruits, weekdays)         # Arguments will be passed to parameters sequentially
pd.Series(fruits, index = weekdays) # Most common way

# Index values do NOT have to be unique.
# If duplicates exist in the index, some data operations will not be possible.
fruits   = ["Apple", "Orange", "Plum", "Grape", "Blueberry", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Wednesday", "Monday"]
pd.Series(data = fruits, index = weekdays)

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Wednesday     Blueberry
Monday       Watermelon
dtype: object
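
To illustrate how duplicate index labels change behaviour, this sketch reuses the lists above. A label that appears once returns a single value, while a duplicated label returns a `Series` with every matching row.

```python
import pandas as pd

fruits = ["Apple", "Orange", "Plum", "Grape", "Blueberry", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Wednesday", "Monday"]
s = pd.Series(fruits, index=weekdays)

print(s.index.is_unique)  # False, because Monday and Wednesday repeat
print(s["Tuesday"])       # Orange (a single value)
print(s["Monday"])        # a Series with two rows: Apple and Watermelon
```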

## Import `Series` with the `read_csv` function
- The `read_csv` function is available at the top-level of `pandas`.
- The function's first argument is the location and filename of the CSV.
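- The `squeeze` parameter affects how the file will be read.
- `pandas` defaults to a `DataFrame`, even if the CSV is a single column.
- Passing `True` to the `squeeze` parameter results in a `Series` when the CSV has a single column.
- Note: the `squeeze` parameter was deprecated in pandas 1.4 and removed in 2.0. On newer versions, call `.squeeze("columns")` on the result of `read_csv` instead.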

In [5]:
pd.read_csv("pokemon.csv")
pd.read_csv("pokemon.csv", usecols = ["Pokemon"])
pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon

0       Bulbasaur
1         Ivysaur
2        Venusaur
3      Charmander
4      Charmeleon
          ...    
716       Yveltal
717       Zygarde
718       Diancie
719         Hoopa
720     Volcanion
Name: Pokemon, Length: 721, dtype: object

In [10]:
pd.read_csv("google_stock_price.csv")
pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"])
pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"], squeeze = True)
google

0        50.12
1        54.10
2        54.65
3        52.38
4        52.95
         ...  
3007    772.88
3008    771.07
3009    773.18
3010    771.61
3011    782.22
Name: Stock Price, Length: 3012, dtype: float64
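
The cells above rely on the `squeeze` parameter, which is not available in pandas 2.0 and later. A minimal sketch of the equivalent on newer versions follows. It assumes an in-memory stand-in (`csv_text`) for a single-column file so that it runs without `pokemon.csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for a single-column CSV file
csv_text = "Pokemon\nBulbasaur\nIvysaur\nVenusaur\n"

df = pd.read_csv(io.StringIO(csv_text), usecols=["Pokemon"])
pokemon = df.squeeze("columns")  # one-column DataFrame -> Series

print(type(df).__name__)       # DataFrame
print(type(pokemon).__name__)  # Series
```

With a real file, the same pattern would be `pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")`.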

## The `.head()` and `.tail()` Methods
- The `head` method returns a specified number of rows from the top of the `Series`. 
- The `tail` method returns a specified number of rows from the end of the `Series`. 

In [11]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"], squeeze = True)

In [15]:
pokemon.head()
pokemon.head(5)
pokemon.head(1)

0    Bulbasaur
Name: Pokemon, dtype: object

In [18]:
google.tail()
google.tail(10)
google.tail(6)
google.tail(1)

3011    782.22
Name: Stock Price, dtype: float64
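
As a quick check of the defaults, the sketch below uses a small made-up `Series`. Calling `head()` or `tail()` with no argument returns five rows, and the original index labels are kept.

```python
import pandas as pd

s = pd.Series(range(10, 20))  # hypothetical values 10..19, index 0..9

first_five = s.head()  # default n is 5
last_two = s.tail(2)

print(list(first_five))      # [10, 11, 12, 13, 14]
print(list(last_two.index))  # [8, 9]
```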

## Python Built-In Functions

In [11]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
len(pokemon)
type(pokemon)
dir(pokemon)
sorted(pokemon)
type(sorted(pokemon))
list(pokemon)
dict(pokemon)
max(pokemon)
min(pokemon)

google = pd.read_csv("google_stock_price.csv", squeeze = True)
max(google)
min(google)

49.950000000000003
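
Because only the last expression of the cell above is displayed, here is a small sketch of what several of these built-ins return, using a made-up three-item `Series`.

```python
import pandas as pd

s = pd.Series(["Pikachu", "Bulbasaur", "Eevee"])

print(len(s))     # 3
print(sorted(s))  # ['Bulbasaur', 'Eevee', 'Pikachu'] (a plain Python list)
print(dict(s))    # {0: 'Pikachu', 1: 'Bulbasaur', 2: 'Eevee'} (index labels become keys)
print(max(s))     # Pikachu (strings compare alphabetically)
print(min(s))     # Bulbasaur
```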

## More `Series` Attributes

In [45]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

# Previous attributes
pokemon.values
google.values

pokemon.index
google.index

pokemon.dtype # "O" is short for object, which is what pandas uses for strings
google.dtype  # float64

# Returns True if every Series value is unique (no duplicates)
pokemon.is_unique
google.is_unique

pokemon.ndim
pokemon.shape # One-element tuple, (721,), because a Series is 1-dimensional
pokemon.size  # Number of elements (cells) in the Series, including NaN

# The name attribute can be assigned to
pokemon.name
pokemon.name = "Pocket Monsters"
pokemon.name
pokemon.tail(3) # To check name

pokemon.to_frame().memory_usage()

Index                80
Pocket Monsters    5768
dtype: int64
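
The following sketch shows these attributes on a tiny hypothetical `Series` that contains a duplicate and a missing value, which makes the difference between `size` and `count()` visible.

```python
import pandas as pd

s = pd.Series(["Grass", "Fire", "Grass", None], name="Type")

print(s.is_unique)  # False, "Grass" appears twice
print(s.ndim)       # 1
print(s.shape)      # (4,)
print(s.size)       # 4, missing values are included
print(s.count())    # 3, missing values are excluded
```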

## The `.sort_values()` Method
- If the data type is a string, the sorting will be alphabetical.
- If the data type is numeric, the sorting will be ascending, smallest to largest.
- The `ascending` parameter can be set to `False` to sort in descending order.

In [13]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

In [14]:
pokemon.sort_values().head(5)
pokemon.sort_values(ascending = True).head(5)
pokemon.sort_values(ascending = False).head(5)

717     Zygarde
633    Zweilous
40        Zubat
569       Zorua
570     Zoroark
Name: Pokemon, dtype: object

In [15]:
google.sort_values().head(5)
google.sort_values(ascending = True).head(5)
google.sort_values(ascending = False).head(5)

3011    782.22
2859    776.60
3009    773.18
3007    772.88
3010    771.61
Name: Stock Price, dtype: float64
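
One detail worth seeing in isolation is that `sort_values` returns a new `Series` and each index label travels with its value. The sketch below uses made-up prices and letter labels.

```python
import pandas as pd

prices = pd.Series([52.38, 50.12, 54.65], index=["c", "a", "b"])

descending = prices.sort_values(ascending=False)

print(list(descending.index))  # ['b', 'c', 'a']
print(list(prices.index))      # ['c', 'a', 'b'] (original is unchanged)
```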

## The `inplace` Parameter
- When `inplace = True`, the method modifies the original object and returns `None` instead of a new `Series`.

In [16]:
google = pd.read_csv("google_stock_price.csv", squeeze = True)

In [17]:
google = google.sort_values(ascending = False)

In [18]:
google.sort_values(ascending = False, inplace = True)
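
The two cells above lead to the same sorted `Series`. A minimal sketch with made-up numbers shows why the result of an `inplace` call should not be assigned back to the variable.

```python
import pandas as pd

s = pd.Series([3, 1, 2])

result = s.sort_values(inplace=True)

print(result)   # None, nothing is returned
print(list(s))  # [1, 2, 3], the original was sorted
```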

## The `.sort_index()` Method

In [19]:
google = pd.read_csv("google_stock_price.csv", squeeze = True)

In [20]:
google.sort_values(inplace = True)
google.head(3)

11    49.95
9     50.07
0     50.12
Name: Stock Price, dtype: float64

In [21]:
google.sort_index()
google.sort_index(inplace = True)
google.head(3)

0    50.12
1    54.10
2    54.65
Name: Stock Price, dtype: float64

## Python's `in` Keyword
- The `in` keyword is used in Python to check if a value exists in a list.
- It works similarly for `Series`.
- However, `pandas` defaults to checking in the index, not the values.

In [61]:
google.tail(3)

3009    773.18
3010    771.61
3011    782.22
Name: Stock Price, dtype: float64

In [22]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon.head(3)

0    Bulbasaur
1      Ivysaur
2     Venusaur
Name: Pokemon, dtype: object

In [23]:
# Both lines check the index labels, so both are False
"Bulbasaur" in pokemon
"Bulbasaur" in pokemon.index

# Check the values instead
"Bulbasaur" in pokemon.values

True

In [24]:
5 in pokemon
5 in pokemon.index

True
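
The same checks can be reproduced without the CSV file. This sketch uses a three-item `Series` with the default numeric index.

```python
import pandas as pd

s = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])

print("Bulbasaur" in s)         # False, `in` looks at the index labels
print("Bulbasaur" in s.values)  # True, the values are checked explicitly
print(1 in s)                   # True, 1 is an index label
print(5 in s)                   # False, no such label
```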

## Extract Values by Index Position

In [25]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon.head(3)

0    Bulbasaur
1      Ivysaur
2     Venusaur
Name: Pokemon, dtype: object

In [26]:
pokemon[500]
pokemon[[100, 200, 300]]
pokemon[0:50] # Inclusive : Exclusive
pokemon[25:68]
pokemon[:50]
pokemon[-30:-10]
pokemon[-30:] # Same as pokemon.tail(30)

Output = None
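
With the default numeric index, position and label coincide, so square brackets work for both. When the intent is strictly positional, the `.iloc` accessor makes that explicit. The sketch below uses a small made-up `Series`.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "d", "e"])

print(s.iloc[0])             # a
print(list(s.iloc[[1, 3]]))  # ['b', 'd']
print(list(s.iloc[1:3]))     # ['b', 'c'] (start inclusive, stop exclusive)
print(list(s.iloc[-2:]))     # ['d', 'e'] (same rows as s.tail(2))
```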

## Extract Values by Index Label

In [27]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True)
pokemon.head(3)

Pokemon
Bulbasaur    Grass
Ivysaur      Grass
Venusaur     Grass
Name: Type, dtype: object

In [28]:
pokemon[0]
pokemon[[100, 134]]
pokemon["Mewtwo"]
pokemon["Ditto"]
pokemon[["Charizard", "Jolteon"]]
pokemon[["Blastoise", "Venusaur", "Meowth"]]
# pokemon["Digimon"]  - Does not exist, raises a KeyError
pokemon[["Pikachu", "Digimon"]] # Older pandas returns NaN for the missing label; pandas 1.0+ raises a KeyError

Pokemon
Pikachu    Electric
Digimon         NaN
Name: Type, dtype: object
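
On recent pandas versions, a list that contains a missing label raises a `KeyError` instead of producing the `NaN` row shown above. A hedged sketch of the explicit alternatives, using a small hypothetical `types` Series in place of the CSV data: `.loc` for labels, `.iloc` for positions, and `reindex` when missing labels should become `NaN`.

```python
import pandas as pd

# Hypothetical stand-in for the Pokemon -> Type Series
types = pd.Series(
    ["Electric", "Grass", "Fire"],
    index=["Pikachu", "Bulbasaur", "Charmander"],
    name="Type",
)

print(types.loc["Pikachu"])  # Electric (lookup by label)
print(types.iloc[0])         # Electric (lookup by position)

# reindex tolerates labels that do not exist and fills them with NaN
result = types.reindex(["Pikachu", "Digimon"])
print(result)
```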

## The `.get` Method on a `Series`
- First argument is the index label to attempt to locate.
- Second argument is the value to return if the label is NOT found (default is `None`)

In [29]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True)
pokemon.head(2)

Pokemon
Bulbasaur    Grass
Ivysaur      Grass
Name: Type, dtype: object

In [30]:
pokemon.get("Moltres")
pokemon.get(["Moltres", "Meowth"])
pokemon.get(["Moltres", "Digimon"]) # Same as regular

pokemon.get("Digimon")
pokemon.get("Digimon", "Nonexistent")

'Nonexistent'
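
A self-contained sketch of the three cases, again with a hypothetical `types` Series standing in for the CSV data:

```python
import pandas as pd

types = pd.Series(
    ["Electric", "Grass"],
    index=["Pikachu", "Bulbasaur"],
    name="Type",
)

found = types.get("Pikachu")                   # label exists
missing = types.get("Digimon")                 # label missing, no default given
fallback = types.get("Digimon", "Nonexistent") # label missing, default supplied

print(found)     # Electric
print(missing)   # None
print(fallback)  # Nonexistent
```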

## Math Methods on `Series` Objects

In [31]:
google = pd.read_csv("google_stock_price.csv", squeeze = True)
google.head(3)

google.count()    # Ignores null values
google.sum()      # Sum of all items
google.mean()     # Average of all items
google.std()      # Standard deviation
google.min()      # Smallest number
google.max()      # Greatest number
google.median()   # Item in the middle of sorted pack
google.mode()     # Most frequent item
google.describe() # Deliver a bunch of statistics

count    3012.000000
mean      334.310093
std       173.187205
min        49.950000
25%       218.045000
50%       283.315000
75%       443.000000
max       782.220000
Name: Stock Price, dtype: float64
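
To make the "ignores null values" comment concrete, this sketch runs a few of the same methods on a made-up `Series` containing one missing value. Note that `mode()` returns a `Series`, since several values can tie for most frequent.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, None, 20.0])

print(s.size)          # 4, every cell
print(s.count())       # 3, non-null cells only
print(s.sum())         # 50.0, the missing value is skipped
print(s.median())      # 20.0
print(list(s.mode()))  # [20.0]
```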

## The `idxmax` and `idxmin` Methods
- Return the index labels with the greatest and smallest values

In [32]:
google = pd.read_csv("google_stock_price.csv", squeeze = True)
google.head(1)

0    50.12
Name: Stock Price, dtype: float64

In [33]:
print(google.max())
print(google.min())

print(google.idxmax())
print(google.idxmin())

print(google[google.idxmax()])
print(google[google.idxmin()])

782.22
49.95
3011
11
782.22
49.95
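
The returned value is an index label, not a position, which is easier to see with a non-numeric index. The labels in this sketch are invented for illustration.

```python
import pandas as pd

prices = pd.Series([50.12, 782.22, 49.95], index=["start", "peak", "low"])

print(prices.idxmax())          # peak
print(prices.idxmin())          # low
print(prices[prices.idxmax()])  # 782.22
```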


## The `value_counts()` Method

In [34]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True)
pokemon.head(3)

pokemon.value_counts()

Water       105
Normal       93
Grass        66
Bug          63
Psychic      47
Fire         47
Rock         41
Electric     36
Ground       30
Dark         28
Poison       28
Fighting     25
Dragon       24
Ghost        23
Ice          23
Steel        22
Fairy        17
Flying        3
Name: Type, dtype: int64
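
By default `value_counts` returns absolute counts sorted from most to least frequent. As a small sketch on made-up data, the `normalize` parameter switches the result to relative frequencies.

```python
import pandas as pd

s = pd.Series(["Water", "Grass", "Water", "Fire", "Water"])

counts = s.value_counts()                # absolute counts
shares = s.value_counts(normalize=True)  # fractions that sum to 1

print(counts["Water"])  # 3
print(shares["Water"])  # 0.6
```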

## The `.apply` Method
- Invoke function on every value in a `Series`

In [35]:
google = pd.read_csv("google_stock_price.csv", squeeze = True)
google.head(3)

def classify_performance(number):
    if number < 300:
        return "OK"
    elif number < 650:
        return "Satisfactory"
    else:
        return "Incredible!"
    
google.apply(classify_performance).head(5)

0    OK
1    OK
2    OK
3    OK
4    OK
Name: Stock Price, dtype: object
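
If the function needs more than one input, `apply` can forward extra positional arguments through its `args` parameter. The function `is_above` and its `threshold` parameter below are hypothetical names used only for this sketch.

```python
import pandas as pd

def is_above(number, threshold):
    # Called once per value; `threshold` comes from `args`
    return number > threshold

prices = pd.Series([50.12, 310.0, 782.22])

flags = prices.apply(is_above, args=(300,))

print(list(flags))  # [False, True, True]
```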

## The `.map` Method
- Maps the values of a `Series` to another collection.
- If dictionary is passed, maps `Series` values to dictionary keys
- If `Series` is passed, maps `Series` values to other `Series` index

In [36]:
# Pass Series
pokemon_names = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon_types = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True)
pokemon_names.map(pokemon_types)

# Pass dictionary
pokemon_names = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon_types = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True).to_dict()
pokemon_types # See dictionary
pokemon_names.map(pokemon_types)

# Pass function
def long_name(pokemon):
    return len(pokemon) > 8

pokemon_names.map(long_name)

0       True
1      False
2      False
3       True
4       True
5       True
6      False
7       True
8       True
9      False
10     False
11      True
12     False
13     False
14     False
15     False
16      True
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26      True
27      True
28     False
29     False
       ...  
691     True
692     True
693     True
694     True
695    False
696     True
697    False
698    False
699    False
700    False
701    False
702    False
703    False
704    False
705    False
706    False
707    False
708     True
709     True
710     True
711    False
712    False
713    False
714    False
715    False
716    False
717    False
718    False
719    False
720     True
Name: Pokemon, dtype: bool
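
When a dictionary is passed, any value without a matching key is mapped to `NaN`. A minimal sketch with a made-up lookup table (`type_lookup`) shows this:

```python
import pandas as pd

names = pd.Series(["Pikachu", "Bulbasaur", "Digimon"])
type_lookup = {"Pikachu": "Electric", "Bulbasaur": "Grass"}  # no entry for "Digimon"

mapped = names.map(type_lookup)

print(mapped.iloc[0])           # Electric
print(pd.isna(mapped.iloc[2]))  # True, no matching key
```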