In [None]:
import jupyter_black

jupyter_black.load()

# Series

In [None]:
import pandas as pd
import polars as pl

## Create a `pandas` Series Object from a List
- A pandas **Series** is a one-dimensional labelled array.
- A Series combines the best features of a list and a dictionary.
- A Series maintains a single collection of ordered values (i.e. a single column of data).
- We can assign each value an identifier, which does not have to *be* unique.

In [None]:
# A Series of strings
ice_cream = ["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"]
ice_cream_pd = pd.Series(ice_cream)

# A series of ints
lottery_numbers = [4, 8, 15, 16, 23, 42]
lottery_numbers_pd = pd.Series(lottery_numbers)

# A series of booleans
registrations = [True, False, False, False, True]
registrations_pd = pd.Series(registrations)

print(f"Strings:\n{ice_cream_pd}\n")
print(f"Ints:\n{lottery_numbers_pd}\n")
print(f"Booleans:\n{registrations_pd}\n")
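
The point that index labels do not have to be unique can be illustrated with a small sketch. The `sales` Series and its weekday labels below are invented for illustration and are not part of the course data.

```python
import pandas as pd

# Index labels may repeat; "Mon" appears twice here.
sales = pd.Series([120, 95, 130], index=["Mon", "Tue", "Mon"])

# A repeated label returns every matching value as a Series.
monday_sales = sales.loc["Mon"]

# A label that occurs once returns a single scalar value.
tuesday_sales = sales.loc["Tue"]

print(monday_sales)
print(tuesday_sales)
```

Looking up a repeated label returns a Series rather than a single value, which is worth keeping in mind when code expects a scalar.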

## Create a `polars` Series Object from a List
- A polars **Series** represents a single column in a polars DataFrame.
- `name`: Name of the Series. Will be used as a column name when used in a DataFrame. When not specified, name is set to an empty string.
- `values`: One-dimensional data in various forms. Supported are: Sequence, Series, pyarrow Array, and numpy ndarray.
- `dtype`: Data type of the resulting Series. If set to None (default), the data type is inferred from the values input. 

In [None]:
# A Series of strings
ice_cream_pl = pl.Series(
    name="ice_cream",
    values=ice_cream,
    dtype=pl.String,
)

# A series of ints
lottery_numbers_pl = pl.Series(
    name="lottery_numbers",
    values=lottery_numbers,
    dtype=pl.Int32,
)

# A series of booleans
registrations_pl = pl.Series(
    name="registrations",
    values=registrations,
    dtype=pl.Boolean,
)

print(f"Strings:\n{ice_cream_pl}\n")
print(f"Ints:\n{lottery_numbers_pl}\n")
print(f"Booleans:\n{registrations_pl}\n")

## Create a `pandas` Series Object from a Dictionary

In [None]:
sushi = {"Salmon": "Orange", "Tuna": "Red", "Eel": "Brown"}

pd.Series(sushi)

## Create a `polars` Series Object from a Dictionary

In the `pandas` dictionary-to-Series conversion, each `key` became an index label and each `value` became the row value. Because `polars` doesn't have the `pandas` concept of an index, there are two options:

1. Create two `polars` Series.
2. Create a `polars` DataFrame with the dictionary `key` as one column and the dictionary `value` as a second column.

In [None]:
# Two series
fish = pl.Series(
    name="fish_name",
    values=sushi.keys(),
    dtype=pl.String,
)

colour = pl.Series(
    name="fish_colour",
    values=sushi.values(),
    dtype=pl.String,
)

print(fish, colour)

# Dataframe
fish_colour = pl.DataFrame(
    {"fish_name": sushi.keys(), "fish_colour": sushi.values()},
    schema={"fish_name": pl.String, "fish_colour": pl.String},
)

print(fish_colour)

## Intro to Series Methods

`pandas` and `polars` have a lot of the same series methods.

- The syntax to invoke a method on any object is `object.method()`.
- The `sum` method adds together the **Series'** values.
- The `product` method multiplies the **Series'** values.
- The `mean` method finds the average of the **Series'** values.
- The `std` method finds the standard deviation of the **Series'** values.

In [None]:
prices_pd = pd.Series([2.99, 4.45, 1.36])
prices_pl = pl.Series(
    name="prices",
    values=[2.99, 4.45, 1.36],
    dtype=pl.Float64,
)

prices_pd, prices_pl

In [None]:
# Sum
prices_pd.sum(), prices_pl.sum()

In [None]:
# Product
prices_pd.product(), prices_pl.product()

In [None]:
# Mean
prices_pd.mean(), prices_pl.mean()

In [None]:
# Standard Deviation
prices_pd.std(), prices_pl.std()

## Intro to Attributes
- An **attribute** is a piece of data that lives on an object.
- An **attribute** is a fact, a detail, a characteristic of the object.
- Access an attribute with `object.attribute` syntax.

### `pandas` series have the following attributes
- The `size` attribute returns a count of the number of values in the **Series**.
- The `is_unique` attribute returns True if the **Series** has no duplicate values.
- The `values` and `index` attributes return the underlying objects that hold the **Series'** values and index labels.

### `polars` series don't have these attributes and use methods instead
- `len`: Returns the number of elements in the Series.
- There is no `is_unique` attribute in `polars`. An `is_unique` method exists, but it returns a Boolean mask with one entry per element rather than a single True/False. Two ways to get a single answer:
  - Use `equals` to compare the original series with the output of the `unique` method, which gets the unique elements in the series. Pass `maintain_order=True` to `unique`, because it does not preserve the original order by default.
  - Compare if the `len` of the series is the same as the output of `n_unique`.
- There is also no direct replica of the `pandas` `values` attribute, but there are multiple ways to get the values. Two examples are:
  - `to_list`: Convert the Series to a Python list.
  - `to_numpy`: Convert the Series to a NumPy ndarray.
- `polars` series don't have indexes. See [Understand Polars’ Lack of Indexes](https://towardsdatascience.com/understand-polars-lack-of-indexes-526ea75e413)


In [None]:
adjectives = ["Smart", "Handsome", "Charming", "Brilliant", "Humble", "Smart"]

adjectives_pd = pd.Series(adjectives)
adjectives_pl = pl.Series(
    name="adjectives",
    values=adjectives,
    dtype=pl.String,
)

adjectives_pd, adjectives_pl

In [None]:
# Size
adjectives_pd.size, adjectives_pl.len()

In [None]:
# Is Unique
(
    adjectives_pd.is_unique,
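    # maintain_order=True keeps first-seen order so `equals` compares like with like
    adjectives_pl.equals(adjectives_pl.unique(maintain_order=True)),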
    adjectives_pl.len() == adjectives_pl.n_unique(),
)

In [None]:
(
    adjectives_pd.values,
    adjectives_pl.to_list(),
    adjectives_pl.to_numpy(),
)

In [None]:
# Index
# Polars does not have the same concept of an index
adjectives_pd.index

In [None]:
# Values type
(
    type(adjectives_pd.values),
    type(adjectives_pl.to_list()),
    type(adjectives_pl.to_numpy()),
)

In [None]:
# Pandas Index Type
type(adjectives_pd.index)

## `pandas` Parameters and Arguments
- A **parameter** is the name for an expected input to a function/method/class instantiation.
- An **argument** is the concrete value we provide for a parameter during invocation.
- We can pass arguments either sequentially (based on parameter order) or with explicit parameter names written out.
- The first two parameters for the **Series** constructor are `data` and `index`, which represent the values and the index labels.

## `polars` Parameters and Arguments
The `polars` **Series** constructor has 5 parameters:
- `name: str, default None`
  - Name of the Series. Will be used as a column name when used in a DataFrame. When not specified, name is set to an empty string.
- `values: ArrayLike, default None`
  - One-dimensional data in various forms. Supported are: Sequence, Series, pyarrow Array, and numpy ndarray.
- `dtype: DataType, default None`
  - Data type of the resulting Series. If set to None (default), the data type is inferred from the values input. The strategy for data type inference depends on the `strict` parameter.
- `strict: bool, default True`
  - Throw an error if any value does not exactly match the given or inferred data type. If set to False, values that do not match the data type are cast to that data type or, if casting is not possible, set to null instead.
- `nan_to_null: bool, default False`
  - In case a numpy array is used to create this Series, indicate how to deal with np.nan values. (This parameter is a no-op on non-numpy data).

Because `polars` doesn't have the concept of an index, just create a single Series. Combining two lists, one as values and one as labels, calls for a DataFrame instead.

In [None]:
fruits = ["Apple", "Orange", "Plum", "Grape", "Blueberry", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday"]

In [None]:
pd.Series(fruits)
pd.Series(weekdays)
pd.Series(fruits, weekdays)
pd.Series(weekdays, fruits)

In [None]:
pd.Series(data=fruits, index=weekdays)
pd.Series(index=weekdays, data=fruits)

In [None]:
pd.Series(fruits, index=weekdays)

In [None]:
pl.Series()

In [None]:
pl.Series(
    name="fruits",
    values=fruits,
    dtype=pl.String,
)

In [None]:
pl.Series(
    values=fruits,
)

In [None]:
pl.Series(fruits)

## Import Series with the pd.read_csv Function
- A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
- Pandas ships with many different `read_` functions for different types of files.
- The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
- The `read_csv` function will import the dataset as a **DataFrame**, a 2-dimensional table.
- The `usecols` parameter accepts a list of the column(s) to import.
- The `squeeze` method converts a single-column **DataFrame** to a **Series**.

In [None]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
pokemon

In [None]:
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze("columns")
google
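
To try the same pattern without the course CSV files, the sketch below reads from an in-memory string instead. The three rows and the `Type` column are invented stand-ins, not the contents of `pokemon.csv`.

```python
from io import StringIO

import pandas as pd

# Hypothetical three-row stand-in for a CSV file on disk.
csv_text = "Name,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

# Without squeeze: a DataFrame with one column.
names_df = pd.read_csv(StringIO(csv_text), usecols=["Name"])

# With squeeze("columns"): the single column collapses to a Series.
names = names_df.squeeze("columns")

print(type(names_df).__name__, type(names).__name__)
```

Passing `"columns"` tells `squeeze` to collapse only the column axis, so the result stays a Series even if the file has a single row.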

## The head and tail Methods
- The `head` method returns a number of rows from the top/beginning of the `Series`.
- The `tail` method returns a number of rows from the bottom/end of the `Series`.

In [None]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze("columns")

In [None]:
pokemon.head()
pokemon.head(5)
pokemon.head(n=5)

pokemon.head(8)
pokemon.head(1)

In [None]:
google.tail()
google.tail(5)
google.tail(n=5)

google.tail(7)
google.tail(n=2)

## Passing Series to Python's Built-In Functions
- The `len` function returns the length of the **Series**.
- The `type` function returns the type of an object.
- The `list` function converts the **Series** to a list.
- The `dict` function converts the **Series** to a dictionary.
- The `sorted` function converts the **Series** to a sorted list.
- The `max` function returns the largest value in the **Series**.
- The `min` function returns the smallest value in the **Series**.

In [None]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze("columns")

In [None]:
pokemon.head()

In [None]:
len(pokemon)
type(pokemon)
list(pokemon)
sorted(pokemon)
type(sorted(pokemon))
sorted(google)
dict(pokemon)

max(google)
min(google)

max(pokemon)
min(pokemon)

## Check for Inclusion with Python's in Keyword
- The `in` keyword checks if a value exists within an object.
- The `in` keyword will look for a value in the **Series's** index.
- Use the `index` and `values` attributes to access "nested" objects within the **Series**.
- Combine the `in` keyword with `values` to search within the **Series's** values.

In [None]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze("columns")

pokemon.head()

In [None]:
"car" in "racecar"
2 in [3, 2, 1]

"Bulbasaur" in pokemon
0 in pokemon
5 in pokemon.index

"Bulbasaur" in pokemon.values
"Pikachu" in pokemon.values
"Nonsense" in pokemon.values

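The difference between checking the index and checking the values can be seen on a tiny Series that does not depend on the CSV file. The `starters` Series below is an invented example with the default numeric index.

```python
import pandas as pd

starters = pd.Series(["Bulbasaur", "Charmander", "Squirtle"])

# `in` on the Series itself looks at the index labels (0, 1, 2).
name_in_series = "Bulbasaur" in starters
label_in_series = 0 in starters

# `in` on the values looks at the stored data.
name_in_values = "Bulbasaur" in starters.values

print(name_in_series, label_in_series, name_in_values)
```

The name is not found in the Series itself because it is a value, not an index label.
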
## The sort_values Method
- The `sort_values` method sorts a **Series's** values in order.
- By default, pandas applies an ascending sort (smallest to largest).
- Customize the sort order with the `ascending` parameter.

In [None]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze("columns")

google.head()

In [None]:
google.sort_values()
google.sort_values(ascending=True)
google.sort_values(ascending=False)
google.sort_values(ascending=False).head()

In [None]:
pokemon.sort_values()
pokemon.sort_values(ascending=True)
pokemon.sort_values(ascending=False)
pokemon.sort_values(ascending=False).tail()

## The sort_index Method
- The `sort_index` method sorts a **Series** by its index.
- The `sort_index` method also accepts an `ascending` parameter to set sort order.

In [None]:
pokemon = pd.read_csv("pokemon.csv", index_col="Name").squeeze("columns")
pokemon.head()

In [None]:
pokemon.sort_index()
pokemon.sort_index(ascending=True)
pokemon.sort_index(ascending=False)

## Extract Series Value by Index Position
- Use the `iloc` accessor to extract a **Series** value by its index position.
- `iloc` is short for "integer location".
- Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.

In [None]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
pokemon.head()

In [None]:
pokemon.iloc[0]
pokemon.iloc[500]

# pokemon.iloc[1500]

pokemon.iloc[[100, 200, 300]]

# pokemon.iloc[[100, 200, 300, 1500]]

pokemon.iloc[27:36]
pokemon.iloc[0:7]
pokemon.iloc[:7]

pokemon.iloc[700:1010]
pokemon.iloc[700:]
pokemon.iloc[700:5000]

pokemon.iloc[-1]
pokemon.iloc[-10]

pokemon.iloc[-10:-5]
pokemon.iloc[-8:]
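
The commented-out lines above raise errors because the positions do not exist. A small sketch on an invented three-element Series shows which out-of-range accesses fail and which are silently clipped.

```python
import pandas as pd

letters = pd.Series(["a", "b", "c"])

# A slice that runs past the end is clipped, just like a Python list slice.
clipped = letters.iloc[1:100]

# A single out-of-range position raises IndexError.
try:
    letters.iloc[10]
    single_error = None
except IndexError as error:
    single_error = type(error).__name__

# So does a list that contains an out-of-range position.
try:
    letters.iloc[[0, 10]]
    list_error = None
except IndexError as error:
    list_error = type(error).__name__

print(clipped.tolist(), single_error, list_error)
```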

## Extract Series Value by Index Label
- Use the `loc` accessor to extract a **Series** value by its index label.
- Pass a list to extract multiple values by index label.
- If one index label/position in the list does not exist, Pandas will raise an error.

In [None]:
pokemon = pd.read_csv("pokemon.csv", index_col="Name").squeeze("columns")
pokemon.head()

In [None]:
pokemon.loc["Bulbasaur"]

In [None]:
pokemon.iloc[0]

In [None]:
pokemon.loc["Mewtwo"]
pokemon.loc[["Charizard", "Jolteon", "Meowth"]]

# pokemon.loc["Digimon"]

# pokemon.loc[["Pikachu", "Digimon"]]
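
The commented-out lookups above fail with a `KeyError`. One possible way to handle that is sketched below; `safe_loc` is a hypothetical helper written for this example, not a pandas function, and the `types` Series is invented.

```python
import pandas as pd

types = pd.Series({"Bulbasaur": "Grass", "Charmander": "Fire"})


def safe_loc(series, labels):
    """Hypothetical helper: return series.loc[labels], or None on a missing label."""
    try:
        return series.loc[labels]
    except KeyError:
        return None


print(safe_loc(types, "Bulbasaur"))
print(safe_loc(types, "Digimon"))
print(safe_loc(types, ["Bulbasaur", "Digimon"]))
```

A list with even one missing label fails as a whole, so the helper returns None for the last call as well.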

## The get Method on a Series
- The `get` method extracts a **Series** value by index label. It is an alternative option to square brackets.
- The `get` method's second argument sets the fallback value to return if the label/position does not exist.

In [None]:
pokemon = pd.read_csv("pokemon.csv", index_col="Name").squeeze("columns")
pokemon.head()

In [None]:
pokemon.get("Moltres")
pokemon.loc["Moltres"]

# pokemon.loc["Digimon"]
pokemon.get("Digimon")
pokemon.get("Digimon", "Nonexistent")
pokemon.get("Moltres", "Nonexistent")

pokemon.get(["Moltres", "Digimon"], "One of the values in the list was not found")
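
The return values of `get` are easier to compare side by side on a small Series. The `types` Series below is an invented example that stands in for the CSV data.

```python
import pandas as pd

types = pd.Series({"Moltres": "Fire, Flying", "Pikachu": "Electric"})

# Existing label: the fallback is ignored.
found = types.get("Moltres", "Nonexistent")

# Missing label: None by default, or the fallback if one is given.
missing_default = types.get("Digimon")
missing_custom = types.get("Digimon", "Nonexistent")

# A list with any missing label returns the fallback for the whole call.
partial_list = types.get(["Moltres", "Digimon"], "Not all labels found")

print(found, missing_default, missing_custom, partial_list)
```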

## Overwrite a Series Value
- Use the `loc/iloc` accessor to target an index label/position, then use an equal sign to provide a new value.

In [None]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
pokemon.head()

In [None]:
pokemon.iloc[0] = "Borisaur"

In [None]:
pokemon.head()

In [None]:
pokemon.iloc[[1, 2, 4]] = ["Firemon", "Flamemon", "Blazemon"]

In [None]:
pokemon.head()

In [None]:
pokemon = pd.read_csv("pokemon.csv", index_col="Name").squeeze("columns")
pokemon.head()

In [None]:
pokemon.loc["Bulbasaur"] = "Awesomeness"

In [None]:
pokemon.head()

In [None]:
pokemon.iloc[1] = "Silly"

In [None]:
pokemon.head()

## The copy Method
- A **copy** is a duplicate/replica of an object.
- Changes to a copy do not modify the original object.
- A **view** is a different way of looking at the *same* data.
- Changes to a view *do* modify the original object.
- The `copy` method creates a copy of a pandas object.

In [None]:
pokemon_df = pd.read_csv("pokemon.csv", usecols=["Name"])
pokemon_series = pokemon_df.squeeze("columns").copy()

In [None]:
pokemon_df

In [None]:
pokemon_series[0] = "Whatever"

In [None]:
pokemon_series.head()

In [None]:
pokemon_df
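
The independence of a copy can also be checked without the CSV file. This is a minimal sketch on an invented three-element Series.

```python
import pandas as pd

original = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])

# copy() creates an independent duplicate of the data.
duplicate = original.copy()
duplicate.iloc[0] = "Whatever"

# The original is unchanged by the edit to the copy.
print(original.iloc[0], duplicate.iloc[0])
```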

## Math Methods on Series Objects
- The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
- The `sum` method adds together the **Series's** values.
- The `product` method multiplies together the **Series's** values.
- The `mean` method calculates the average of the **Series's** values.
- The `std` method calculates the standard deviation of the **Series's** values.
- The `max` method returns the largest value in the **Series**.
- The `min` method returns the smallest value in the **Series**.
- The `median` method returns the median of the **Series** (the value in the middle).
- The `mode` method returns the mode of the **Series** (the most frequent value).
- The `describe` method returns a summary with various mathematical calculations.

In [None]:
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze("columns")
google.head()

In [None]:
google.count()
google.sum()
google.product()
pd.Series([1, 2, 3, 4]).product()
google.mean()
google.std()
google.max()
google.min()
google.median()
google.mode()
pd.Series([1, 2, 2, 2, 3]).mode()

google.describe()
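
The difference between `count` and `size` only shows up when a value is missing. The sketch below uses an invented `prices` Series with one missing entry.

```python
import pandas as pd

prices = pd.Series([10.0, None, 30.0])

# count() skips the missing value; size includes it.
non_missing = prices.count()
total = prices.size

# mean() also skips missing values by default.
average = prices.mean()

print(non_missing, total, average)
```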

## Broadcasting
- **Broadcasting** describes the process of applying an arithmetic operation to an array (i.e., a **Series**).
- We can combine mathematical operators with a **Series** to apply the mathematical operation to every value.
- There are also methods to accomplish the same results (`add`, `sub`, `mul`, `div`, etc.)

In [None]:
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze("columns")
google.head()

In [None]:
google.add(10)
google + 10

google.sub(30)
google - 30

google.mul(1.25)
google * 1.25
1.25 * google

google.div(2)
google / 2

## The value_counts Method
- The `value_counts` method returns the number of times each unique value occurs in the **Series**.
- The `normalize` parameter returns the relative frequencies/percentages of the values instead of the counts.

In [None]:
pokemon = pd.read_csv("pokemon.csv", index_col="Name").squeeze("columns")
pokemon.head()

In [None]:
pokemon.value_counts()
pokemon.value_counts(ascending=True)
pokemon.value_counts(normalize=True)
pokemon.value_counts(normalize=True) * 100

## The apply Method
- The `apply` method accepts a function. It invokes that function on every `Series` value.

In [None]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
pokemon.head()

In [None]:
pokemon.apply(len)

In [None]:
def count_of_a(name):
    # Count the lowercase "a" characters in a single name
    return name.count("a")

pokemon.apply(count_of_a)

## The map Method
- The `map` method "maps" or connects each **Series** value to another value.
- We can pass the method a dictionary or a **Series**. Both types connect keys to values.
- The `map` method uses our argument to connect or bridge together the values.

In [None]:
pokemon = pd.read_csv("pokemon.csv", index_col="Name").squeeze("columns")
pokemon

In [None]:
attack_powers = pd.Series({
    "Grass": 10,
    "Fire": 15,
    "Water": 15,
    "Fairy, Fighting": 20,
    "Grass, Psychic": 50
})

attack_powers

In [None]:
pokemon.map(attack_powers)
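
Values that have no match in the mapping become missing values. The sketch below shows this on an invented three-row Series with a plain dictionary as the mapping.

```python
import pandas as pd

types = pd.Series(
    ["Grass", "Fire", "Rock"],
    index=["Bulbasaur", "Charmander", "Geodude"],
)

# "Rock" has no entry in the mapping.
attack_powers = {"Grass": 10, "Fire": 15}

# The index is preserved; unmatched values become NaN.
powers = types.map(attack_powers)

print(powers)
```

Because NaN is a float, the mapped integers are returned as floats.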