## Create Jupyter Notebook for the `Series` Module

In [1]:
import pandas as pd
pd.__version__

'1.4.2'

In [58]:
df = pd.read_csv("Students_List_Export_2022-04-30_17-59-46.csv", parse_dates = ["Started Date", "Last Visited"])
del df["Questions Asked"]
del df["Questions Answered"]
regular_student = ~df["Udemy Business"]
started_class = df["Started Date"].notnull()
last_visited_class = df["Last Visited"].notnull()
decent_percent = df["Progress"] > 60
target = df[regular_student & started_class & last_visited_class & decent_percent]

In [62]:
target["Days Since Start"] = target["Last Visited"] - target["Started Date"]
goal = target.sort_values("Days Since Start", ascending = False)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  target["Days Since Start"] = target["Last Visited"] - target["Started Date"]
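
The warning above appears because `target` is a filtered slice of `df`, so pandas cannot tell whether the new column is also meant to reach `df`. One common way to avoid it is to take an explicit copy of the filtered rows before adding the column. The sketch below assumes a small made-up DataFrame; the `students` and `engaged` names and all values are hypothetical and not taken from the exported CSV.

```python
import pandas as pd

# Hypothetical stand-in for the exported student list
students = pd.DataFrame({
    "Started Date": pd.to_datetime(["2022-01-01", "2022-02-15", "2022-03-10"]),
    "Last Visited": pd.to_datetime(["2022-04-01", "2022-02-20", "2022-04-25"]),
    "Progress": [80, 40, 95],
})

# .copy() makes the filtered rows an independent DataFrame,
# so adding a column is unambiguous and does not trigger the warning
engaged = students[students["Progress"] > 60].copy()
engaged["Days Since Start"] = engaged["Last Visited"] - engaged["Started Date"]

engaged.sort_values("Days Since Start", ascending = False)
```

The original `students` DataFrame is left untouched; only `engaged` gains the new column.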


## Create A `Series` Object from a Python List
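- A `Series` is a one-dimensional labelled array that combines the best features of a Python list (ordered values) and a dictionary (labels mapped to values).
- Consistent data types matter: a `Series` works best when all of its values share one type (strings, numbers, Booleans).
- The top level of the `pandas` library (`pd.`) is like the lobby of the library: classes such as `Series` are reached from there.
- In Jupyter, type `pd.` and press Tab to list the top-level attributes, then press Enter to select one.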
- dtype: object for strings
- Index on the left (index labels do not have to be numeric)
- Pandas default is numeric index starting from 0

In [3]:
ice_cream = ["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"]
pd.Series(ice_cream)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

In [4]:
lottery = [4, 8, 15, 16, 23, 42]
pd.Series(lottery)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

In [5]:
registrations = [True, False, False, False, True]
pd.Series(registrations)

0     True
1    False
2    False
3    False
4     True
dtype: bool

## Create A `Series` Object from A Python Dictionary
- The index does **not** have to be numeric and it does not have to start at 0.
- The `dtype` refers to the Series values, not Series index
- Index labels in a Series can have duplicates

In [6]:
sushi = {
    "Salmon": "Orange",
    "Tuna": "Red",
    "Eel": "Brown"
}

pd.Series(sushi)

Salmon    Orange
Tuna         Red
Eel        Brown
dtype: object
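
To illustrate that `dtype` describes the values and not the index, here is a minimal sketch. The `calories` Series and its numbers are hypothetical; the index labels are strings while the values are integers.

```python
import pandas as pd

# Hypothetical data: string labels, integer values
calories = pd.Series({"Salmon": 180, "Tuna": 130, "Eel": 230})

values_dtype = calories.dtype        # describes the values (an integer dtype)
index_dtype = calories.index.dtype   # describes the labels (a string/object dtype)

print(values_dtype)
print(index_dtype)
```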

## Intro to Methods
- Methods require parentheses, attributes do not.
- A method is a behavior or command we can give to the object.
- A method can potentially manipulate the object's internal data.

In [7]:
"hello".upper()

values = [1, 2, 3]
values.append(4)
values

[1, 2, 3, 4]

In [8]:
prices = pd.Series([2.99, 4.45, 1.36])
prices

0    2.99
1    4.45
2    1.36
dtype: float64

In [11]:
prices.sum()

8.8

In [12]:
prices.product()

18.095480000000006

In [13]:
prices.mean()

2.9333333333333336

## Intro to Attributes
- Objects in Python have attributes and methods.
- Attributes return details or characteristics of the object.
- Attributes are written after the object name separated by a period.
- Jupyter displays only the result of the last expression in a cell; assign the `Series` to a variable to reuse it in a later cell.

In [18]:
adjectives = pd.Series(["Smart", "Handsome", "Charming", "Brilliant", "Humble", "Smart"])
adjectives

0        Smart
1     Handsome
2     Charming
3    Brilliant
4       Humble
5        Smart
dtype: object

In [19]:
adjectives.size

6

In [21]:
adjectives.is_unique

False

In [22]:
adjectives.dtype

dtype('O')

In [10]:
# ndarray of the values in the Series
adjectives.values
type(adjectives.values)

numpy.ndarray

In [11]:
# The object holding the Series index
adjectives.index
type(adjectives.index)

pandas.core.indexes.range.RangeIndex

## Parameters and Arguments
- Calling `pd.Series(...)` invokes the `Series` class constructor, which is reached from the top level of the `pandas` library and called like a method.
- Constructors and methods have parameters.
- Parameters are like options. They determine how the method will execute.
- The user's inputs to those parameters are called the **arguments**.
- View parameters by using Shift + Tab inside parentheses.
- The first parameter to the `Series` constructor is `data`.

In [8]:
# Parameter - The name we give to an expected input
# Argument - The concrete value we provide to a parameter

# Difficulty - Easy, Medium, Hard
# Volume - 1 through 10
# Subtitles - True, False

fruits   = ["Apple", "Orange", "Plum", "Grape", "Blueberry"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

pd.Series(data = fruits, index = weekdays) # Use keywords to match parameters and arguments
pd.Series(fruits, weekdays)         # Arguments will be passed to parameters sequentially
pd.Series(fruits, index = weekdays) # Most common way

# Index values do NOT have to be unique.
# If duplicates exist in the index, some data operations will not be possible.
fruits   = ["Apple", "Orange", "Plum", "Grape", "Blueberry", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Wednesday", "Monday"]
pd.Series(data = fruits, index = weekdays)

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Wednesday     Blueberry
Monday       Watermelon
dtype: object
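
As a small sketch of what duplicate index labels mean in practice, the example below (with a shortened, hypothetical `menu` Series) shows that selecting a unique label returns a single value, while selecting a duplicated label returns a `Series` with every matching row.

```python
import pandas as pd

fruits   = ["Apple", "Orange", "Plum", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Monday"]
menu = pd.Series(fruits, index = weekdays)

has_unique_labels = menu.index.is_unique  # False, "Monday" appears twice
tuesday = menu["Tuesday"]                 # a single string
monday = menu["Monday"]                   # a Series holding both "Monday" rows
```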

## Import `Series` with the `read_csv` function
- The `read_csv` function is available at the top-level of `pandas`.
- The function's first argument is the location and filename of the CSV.
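- The `squeeze` parameter of `read_csv` is deprecated as of pandas 1.4; call the `squeeze("columns")` method on the result instead.
- `pandas` returns a `DataFrame` by default, even if the CSV has a single column. Calling `squeeze("columns")` converts a one-column `DataFrame` to a `Series`.
- Method chaining: call `squeeze` directly on the value returned by `read_csv`.
- Long outputs are truncated: Jupyter shows the first and last rows with `...` in between.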

In [6]:
pd.read_csv("pokemon.csv")
pd.read_csv("pokemon.csv", usecols = ["Pokemon"])
pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")
pokemon

0       Bulbasaur
1         Ivysaur
2        Venusaur
3      Charmander
4      Charmeleon
          ...    
716       Yveltal
717       Zygarde
718       Diancie
719         Hoopa
720     Volcanion
Name: Pokemon, Length: 721, dtype: object

In [7]:
pd.read_csv("google_stock_price.csv")
pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"])
pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns")
google

0        50.12
1        54.10
2        54.65
3        52.38
4        52.95
         ...  
3007    772.88
3008    771.07
3009    773.18
3010    771.61
3011    782.22
Name: Stock Price, Length: 3012, dtype: float64
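
The difference between the `DataFrame` returned by `read_csv` and the squeezed `Series` can be checked without any file on disk. This is a minimal sketch that assumes a tiny in-memory CSV (`csv_text` is hypothetical) read through `io.StringIO`.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk
csv_text = "Pokemon,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

frame = pd.read_csv(io.StringIO(csv_text), usecols = ["Pokemon"])
names = frame.squeeze("columns")  # one column, so the result is a Series

print(type(frame).__name__)  # DataFrame
print(type(names).__name__)  # Series
```

The column header becomes the `name` of the resulting `Series`.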

## The `head` and `tail` Methods on a `Series`
- The `head` method returns a specified number of rows from the top of the `Series`. 
- The `tail` method returns a specified number of rows from the end of the `Series`. 

In [8]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns")

In [9]:
pokemon.head()
pokemon.head(5)
pokemon.head(n = 5)

pokemon.head(1)

0    Bulbasaur
Name: Pokemon, dtype: object

In [10]:
google.tail()
google.tail(5)

google.tail(10)
google.tail(6)
google.tail(1)

3011    782.22
Name: Stock Price, dtype: float64

## Passing Series to Python's Built-In Functions

In [11]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns")

pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Pokemon, dtype: object

In [None]:
len(pokemon)
type(pokemon)
dir(pokemon)
sorted(pokemon)
type(sorted(pokemon)) # list
list(pokemon)
dict(pokemon)
max(pokemon)
min(pokemon)

In [12]:
min(google)
max(google)

782.22

## The `sort_values` Method

In [68]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns")

In [70]:
pokemon.sort_values()
pokemon.sort_values().head(5)
pokemon.sort_values(ascending = True)
pokemon.sort_values(ascending = False).head(5)

717     Zygarde
633    Zweilous
40        Zubat
569       Zorua
570     Zoroark
Name: Pokemon, dtype: object

In [15]:
google.sort_values()
google.sort_values(ascending = True).head(5)
google.sort_values(ascending = False).head(5)

3011    782.22
2859    776.60
3009    773.18
3007    772.88
3010    771.61
Name: Stock Price, dtype: float64

## The `sort_index` Method

In [100]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon").squeeze("columns")
pokemon.sort_index()

Pokemon
Abomasnow      Grass
Abra         Psychic
Absol           Dark
Accelgor         Bug
Aegislash      Steel
              ...   
Zoroark         Dark
Zorua           Dark
Zubat         Poison
Zweilous        Dark
Zygarde       Dragon
Name: Type, Length: 721, dtype: object

In [101]:
pokemon.sort_index(ascending = False)

Pokemon
Zygarde       Dragon
Zweilous        Dark
Zubat         Poison
Zorua           Dark
Zoroark         Dark
              ...   
Aegislash      Steel
Accelgor         Bug
Absol           Dark
Abra         Psychic
Abomasnow      Grass
Name: Type, Length: 721, dtype: object

## Python's `in` Keyword
- The `in` keyword is used in Python to check if a value exists in a list.
- It works similarly for `Series`.
- However, `pandas` defaults to checking in the index, not the values.

In [160]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")
pokemon.head(3)

0    Bulbasaur
1      Ivysaur
2     Venusaur
Name: Pokemon, dtype: object

In [161]:
"car" in "racecar"

True

In [162]:
2 in [1, 2, 3]

True
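
The cells above only use `in` with a string and a list. A minimal sketch of the same keyword applied to a `Series` follows; the `starters` Series is a hypothetical three-item stand-in for the full dataset.

```python
import pandas as pd

starters = pd.Series(["Bulbasaur", "Charmander", "Squirtle"])

in_index = 0 in starters                      # True: 0 is an index label
as_value = "Bulbasaur" in starters            # False: only the index is searched
in_values = "Bulbasaur" in starters.values    # True: searches the values
in_labels = 2 in starters.index               # True: explicit index check
```

Checking against `.values` or `.index` explicitly makes the intent clear to the reader.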

## Extract Series Value by Index Position

In [102]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")
pokemon.head(3)

0    Bulbasaur
1      Ivysaur
2     Venusaur
Name: Pokemon, dtype: object

In [107]:
pokemon[0]
pokemon[500]

# Error if index position does not exist
# pokemon[1500]

pokemon[[100, 200, 300]]

pokemon[0:50] # Inclusive : Exclusive
pokemon[:50]  # From the beginning to index 50 (exclusive)

pokemon[25:68]
pokemon[700:] # From 700 to the end of the Series

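# Negative index positions do not work with square brackets on the default
# numeric index: -1 is treated as a label, which raises a KeyError
# pokemon[-1]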

pokemon[-20:-10]
pokemon[-20:] # Same as pokemon.tail(20)

701      Dedenne
702      Carbink
703        Goomy
704      Sliggoo
705       Goodra
706       Klefki
707     Phantump
708    Trevenant
709    Pumpkaboo
710    Gourgeist
711     Bergmite
712      Avalugg
713       Noibat
714      Noivern
715      Xerneas
716      Yveltal
717      Zygarde
718      Diancie
719        Hoopa
720    Volcanion
Name: Pokemon, dtype: object
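
Square brackets on a numeric index look values up by label, which is why a single negative number fails. The `iloc` accessor always works by position, so it is one way to get the last item. A minimal sketch with a hypothetical four-item `starters` Series:

```python
import pandas as pd

starters = pd.Series(["Bulbasaur", "Charmander", "Squirtle", "Pikachu"])

first = starters.iloc[0]      # first value
last = starters.iloc[-1]      # last value, counted from the end
middle = starters.iloc[1:3]   # positions 1 and 2 (end is exclusive)
```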

## Extract Series Value by Index Label

In [108]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon").squeeze("columns")
pokemon.head(3)

Pokemon
Bulbasaur    Grass
Ivysaur      Grass
Venusaur     Grass
Name: Type, dtype: object

In [None]:
pokemon[0]
pokemon[[100, 134]]

In [113]:
pokemon["Bulbasaur"]
pokemon["Mewtwo"]

pokemon[["Blastoise", "Venusaur", "Meowth"]]

# KeyError if the label does not exist
# pokemon["Digimon"]

# KeyError if any label in the list does not exist
# pokemon[["Pikachu", "Digimon"]]

Pokemon
Blastoise     Water
Venusaur      Grass
Meowth       Normal
Name: Type, dtype: object
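
Label lookups can also be written with the `loc` accessor, which only ever uses labels. The sketch below assumes a hypothetical three-row `types` Series and shows how a missing label can be caught instead of stopping the cell.

```python
import pandas as pd

types = pd.Series(
    ["Grass", "Fire", "Water"],
    index = ["Bulbasaur", "Charmander", "Squirtle"],
)

fire = types.loc["Charmander"]               # a single value
pair = types.loc[["Squirtle", "Bulbasaur"]]  # order follows the list

try:
    types.loc["Digimon"]
    missing = False
except KeyError:
    missing = True  # a label that is not in the index raises KeyError
```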

## The `get` Method on a `Series`
- First argument is the index label to attempt to locate.
- Second argument is the value to return if the label is NOT FOUND (default is `None`).

In [2]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon").squeeze("columns")
pokemon.head()

Pokemon
Bulbasaur     Grass
Ivysaur       Grass
Venusaur      Grass
Charmander     Fire
Charmeleon     Fire
Name: Type, dtype: object

In [5]:
pokemon.get(0)
pokemon.get("Bulbasaur")
pokemon.get([5, 10])
pokemon.get(["Moltres", "Meowth"])

# pokemon["Digimon"]
pokemon.get("Digimon")
# print(pokemon.get("Digimon"))

pokemon.get("Digimon", "Nonexistent")
pokemon.get("Moltres", "Nonexistent")

pokemon.get(["Moltres", "Digimon"], "Nonexistent")
pokemon.get([0, 500, 10000], "Nonexistent")

None


'Nonexistent'
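
A compact sketch of the three outcomes of `get`, using a hypothetical two-row `types` Series: a label that exists, a missing label with no default, and a missing label with a custom default.

```python
import pandas as pd

types = pd.Series(["Grass", "Fire"], index = ["Bulbasaur", "Charmander"])

found = types.get("Charmander")                 # "Fire"
not_found = types.get("Digimon")                # None, no error is raised
fallback = types.get("Digimon", "Nonexistent")  # the custom default
```

Because `None` produces no visible output in Jupyter, wrapping the call in `print` is a simple way to confirm the result.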

## Overwrite a Series Value

In [2]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"]).squeeze("columns")
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Pokemon, dtype: object

In [3]:
pokemon[0] = "Borisaur"
pokemon.head()

# Will create new index position at end
# Will not populate values in between
pokemon[1500] = "Hello"
pokemon

pokemon[[1, 2, 4]] = ["Firemon", "Flamemon", "Blazemon"]
pokemon.head()

0      Borisaur
1       Firemon
2      Flamemon
3    Charmander
4      Blazemon
Name: Pokemon, dtype: object

In [24]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon").squeeze("columns")
pokemon.head()

pokemon["Bulbasaur"] = "Awesomeness"
pokemon.head()

Pokemon
Bulbasaur     Awesomeness
Ivysaur             Grass
Venusaur            Grass
Charmander           Fire
Charmeleon           Fire
Name: Type, dtype: object

## The `copy` Method

In [6]:
pokemon_df = pd.read_csv("pokemon.csv", usecols = ["Pokemon"])
pokemon_series = pokemon_df.squeeze("columns")

In [8]:
pokemon_series[0] = "Whatever"
pokemon_series.head(1)
pokemon_df.head(1) # Can also be called on a DataFrame

    Pokemon
0  Whatever


In [None]:
# House (DataFrame)
#   Door (Series)

In [9]:
pokemon_df = pd.read_csv("pokemon.csv", usecols = ["Pokemon"])
pokemon_series = pokemon_df.squeeze("columns").copy()

In [10]:
pokemon_series[0] = "Whatever"
pokemon_series.head(1)
pokemon_df.head(1)

     Pokemon
0  Bulbasaur
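
The same guarantee can be checked on a plain `Series` without any CSV. In this minimal sketch (the `original` and `duplicate` names are hypothetical), changing the copy leaves the source untouched.

```python
import pandas as pd

original = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])
duplicate = original.copy()  # independent data, not a view

duplicate[0] = "Whatever"    # only the copy changes

print(original[0])   # Bulbasaur
print(duplicate[0])  # Whatever
```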


## The `inplace` Parameter

In [13]:
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns").copy()

google = (
    pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"])
    .squeeze("columns")
    .copy()
)

In [12]:
# First option - without inplace
google = google.sort_values()
google.head()

11    49.95
9     50.07
0     50.12
10    50.70
12    50.74
Name: Stock Price, dtype: float64

In [14]:
# Second option - with inplace
google.sort_values(inplace = True)
google.head()

11    49.95
9     50.07
0     50.12
10    50.70
12    50.74
Name: Stock Price, dtype: float64

In [15]:
# Error, our Series is a 'view' of the original DataFrame
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns")
google.sort_values(inplace = True)

ValueError: This Series is a view of some other array, to sort in-place you must create a copy

In [None]:
# We can copy on the original import or make a copy and reassign
google = google.copy()
google.sort_values(inplace = True)
google.head()
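
The two options differ in what the method call returns. The sketch below uses a hypothetical three-value `prices` Series built from a list (so it is not a view of a DataFrame) to compare them side by side.

```python
import pandas as pd

prices = pd.Series([54.10, 49.95, 50.12])

# Without inplace: a new sorted Series is returned and prices keeps its order
ordered = prices.sort_values()
first_unsorted = prices.iloc[0]  # 54.10
first_sorted = ordered.iloc[0]   # 49.95

# With inplace: prices itself is reordered and the call returns None
result = prices.sort_values(inplace = True)
```

Because the in-place call returns `None`, writing `prices = prices.sort_values(inplace = True)` would lose the data; reassigning the non-inplace result is usually the safer habit.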

## Math Methods on `Series` Objects

In [116]:
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns")
google.head(3)

0    50.12
1    54.10
2    54.65
Name: Stock Price, dtype: float64

In [None]:
google.count()    # Count values in Series, ignores null values (missing)
google.sum()      # Sum of all items - covered earlier
google.mean()     # Average of all items - covered earlier
google.product()  # Product of all values - covered earlier
google.std()      # Standard deviation
google.min()      # Smallest number; skips missing values, unlike the built-in min()
google.max()      # Greatest number; skips missing values, unlike the built-in max()
google.median()   # Middle value of the sorted data
google.mode()     # Most frequent value(s), returned as a Series
google.describe() # Deliver a bunch of statistics
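
A few of these methods are easier to reason about on a tiny example. The sketch below assumes a hypothetical `scores` Series with one missing value to show how nulls are handled.

```python
import pandas as pd

scores = pd.Series([10, 20, 20, float("nan")])

n_values = scores.count()    # 3: the missing value is not counted
n_rows = scores.size         # 4: size includes the missing value
total = scores.sum()         # 50.0: missing values are skipped by default
middle = scores.median()     # 20.0
most_common = scores.mode()  # a Series, because ties are possible
```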

## Broadcasting

In [26]:
google = pd.read_csv("google_stock_price.csv", usecols = ["Stock Price"]).squeeze("columns")
google.head()

0    50.12
1    54.10
2    54.65
3    52.38
4    52.95
Name: Stock Price, dtype: float64

In [27]:
google + 10

0        60.12
1        64.10
2        64.65
3        62.38
4        62.95
         ...  
3007    782.88
3008    781.07
3009    783.18
3010    781.61
3011    792.22
Name: Stock Price, Length: 3012, dtype: float64

In [28]:
google - 30

0        20.12
1        24.10
2        24.65
3        22.38
4        22.95
         ...  
3007    742.88
3008    741.07
3009    743.18
3010    741.61
3011    752.22
Name: Stock Price, Length: 3012, dtype: float64

In [29]:
google * 2

0        100.24
1        108.20
2        109.30
3        104.76
4        105.90
         ...   
3007    1545.76
3008    1542.14
3009    1546.36
3010    1543.22
3011    1564.44
Name: Stock Price, Length: 3012, dtype: float64
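
Broadcasting applies one operation to every value and returns a new `Series`; the original is not modified. A minimal sketch with a hypothetical three-value `prices` Series, including the method form of the same arithmetic:

```python
import pandas as pd

prices = pd.Series([10, 20, 30])

raised = prices + 5         # 15, 25, 35
doubled = prices * 2        # 20, 40, 60
equivalent = prices.add(5)  # method form of prices + 5
```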

## The `value_counts` Method

In [18]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon").squeeze("columns")
pokemon.head(3)

Pokemon
Bulbasaur    Grass
Ivysaur      Grass
Venusaur     Grass
Name: Type, dtype: object

In [22]:
pokemon.value_counts()
pokemon.value_counts(ascending = True)

pokemon.value_counts(sort = False)

pokemon.value_counts(normalize = True) * 100 # As percentages

pokemon.value_counts(normalize = True) # As fractions of the total

Water       0.145631
Normal      0.128988
Grass       0.091540
Bug         0.087379
Fire        0.065187
Psychic     0.065187
Rock        0.056865
Electric    0.049931
Ground      0.041609
Poison      0.038835
Dark        0.038835
Fighting    0.034674
Dragon      0.033287
Ghost       0.031900
Ice         0.031900
Steel       0.030513
Fairy       0.023578
Flying      0.004161
Name: Type, dtype: float64
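
The relationship between counts, fractions, and percentages is easy to verify on a short example. This sketch assumes a hypothetical six-item `types` Series.

```python
import pandas as pd

types = pd.Series(["Water", "Fire", "Water", "Grass", "Water", "Fire"])

counts = types.value_counts()                  # Water 3, Fire 2, Grass 1
shares = types.value_counts(normalize = True)  # fractions that sum to 1
percentages = (shares * 100).round(1)          # Water 50.0, Fire 33.3, Grass 16.7
```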

## The `apply` Method
- Invokes a function on every value in a `Series` and returns a new `Series` of the results.

In [30]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon").squeeze("columns")
pokemon.head(3)

Pokemon
Bulbasaur    Grass
Ivysaur      Grass
Venusaur     Grass
Name: Type, dtype: object

In [32]:
len("Grass")
pokemon.apply(len)

Pokemon
Bulbasaur     5
Ivysaur       5
Venusaur      5
Charmander    4
Charmeleon    4
             ..
Yveltal       4
Zygarde       6
Diancie       4
Hoopa         7
Volcanion     4
Name: Type, Length: 721, dtype: int64

In [31]:
def rank_pokemon(pokemon_type): # Value gets sent in
    if pokemon_type in ["Grass", "Fire", "Water"]:
        return "Classic"
    elif pokemon_type == "Normal":
        return "Boring"
    else:
        return "TBD"
    
pokemon.apply(rank_pokemon)

Pokemon
Bulbasaur     Classic
Ivysaur       Classic
Venusaur      Classic
Charmander    Classic
Charmeleon    Classic
               ...   
Yveltal           TBD
Zygarde           TBD
Diancie           TBD
Hoopa             TBD
Volcanion     Classic
Name: Type, Length: 721, dtype: object
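
For short one-off transformations, an anonymous `lambda` can be passed to `apply` instead of a named function. The sketch below uses a hypothetical three-row `types` Series and also shows the `.str` accessor as an alternative for simple string operations.

```python
import pandas as pd

types = pd.Series(
    ["Grass", "Fire", "Water"],
    index = ["Bulbasaur", "Charmander", "Squirtle"],
)

# The lambda receives one value at a time, just like rank_pokemon
shouted = types.apply(lambda pokemon_type: pokemon_type.upper() + "!")

# Same values, computed through the .str accessor
vectorized = types.str.upper() + "!"
```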

## The `map` Method
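- Replaces each value of a `Series` with the matching entry from another collection.
- If a dictionary is passed, each `Series` value is looked up as a dictionary key and replaced with that key's value.
- If a `Series` is passed, each value is looked up in the other `Series` index and replaced with the value stored there.
- Values with no match become `NaN`.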

In [34]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon").squeeze("columns")
pokemon.head()

Pokemon
Bulbasaur     Grass
Ivysaur       Grass
Venusaur      Grass
Charmander     Fire
Charmeleon     Fire
Name: Type, dtype: object

In [35]:
mappings = { 
    "Grass": "Classic", 
    "Fire": "Classic", 
    "Water": "Classic", 
    "Normal": "Boring" 
}

pokemon.map(mappings)

Pokemon
Bulbasaur     Classic
Ivysaur       Classic
Venusaur      Classic
Charmander    Classic
Charmeleon    Classic
               ...   
Yveltal           NaN
Zygarde           NaN
Diancie           NaN
Hoopa             NaN
Volcanion     Classic
Name: Type, Length: 721, dtype: object

In [36]:
mappings_series = pd.Series(mappings)
mappings_series

Grass     Classic
Fire      Classic
Water     Classic
Normal     Boring
dtype: object

In [37]:
pokemon.map(mappings_series)

Pokemon
Bulbasaur     Classic
Ivysaur       Classic
Venusaur      Classic
Charmander    Classic
Charmeleon    Classic
               ...   
Yveltal           NaN
Zygarde           NaN
Diancie           NaN
Hoopa             NaN
Volcanion     Classic
Name: Type, Length: 721, dtype: object
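
Unlike the `else` branch of `rank_pokemon`, `map` leaves unmatched values as `NaN`. One way to supply a fallback is to chain `fillna` after the mapping. A minimal sketch with a hypothetical three-row `types` Series:

```python
import pandas as pd

types = pd.Series(
    ["Grass", "Normal", "Dragon"],
    index = ["Bulbasaur", "Meowth", "Dratini"],
)
mappings = {"Grass": "Classic", "Normal": "Boring"}

ranked = types.map(mappings)   # "Dragon" has no key, so it becomes NaN
filled = ranked.fillna("TBD")  # replace the gaps with a fallback label
```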