# Series

In [2]:
import pandas as pd

## Create a Series Object from a List
- A pandas **Series** is a one-dimensional labelled array.
- A Series combines the best features of a list and a dictionary.
- A Series maintains a single collection of ordered values (i.e. a single column of data).
- We can assign each value an identifier, which does not have to *be* unique.

In [3]:
#creating a pandas Series from a list; the index defaults to the integers 0 through len - 1
list_icecream: list[str] = ["chocolate", "vanilla", "cookies"]
list_series: pd.Series = pd.Series(list_icecream)
#pandas reports string data with the object dtype here
list_series

0    chocolate
1      vanilla
2      cookies
dtype: object

In [4]:
registrations = [True, False, True, False]

bool_series = pd.Series(registrations)
bool_series

0     True
1    False
2     True
3    False
dtype: bool

## Create a Series Object from a Dictionary

In [5]:
#creating a Series from a dict; the keys become the index labels
name_num_dict: dict[str, int] = {"john1": 69, "robert": 4, "rebbeca": 6, "john2": 69}

dict_series = pd.Series(name_num_dict)

dict_series

john1      69
robert      4
rebbeca     6
john2      69
dtype: int64

## Intro to Series Methods
- The syntax to invoke a method on any object is `object.method()`.
- The `sum` method adds together the **Series'** values.
- The `product` method multiplies the **Series'** values.
- The `mean` method finds the average of the **Series'** values.
- The `std` method finds the standard deviation of the **Series'** values.

In [6]:
num_series = pd.Series([3.11, 4.9, 5.0, 8.9])
num_series.std()

2.4413162433408746

In [7]:
dict_series.value_counts()

69    2
4     1
6     1
Name: count, dtype: int64
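
The cells above only run `std` and `value_counts`. As a minimal sketch of the other methods in the list, the example below uses a small made-up Series (`sample` is a hypothetical name, not part of the notebook's data) whose results are easy to check by hand.

```python
import pandas as pd

# Hypothetical sample values, chosen so the arithmetic is easy to verify.
sample = pd.Series([1, 2, 3, 4])

total = sample.sum()        # 1 + 2 + 3 + 4
product = sample.product()  # 1 * 2 * 3 * 4
average = sample.mean()     # total divided by the number of values

print(total, product, average)
```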

## Intro to Attributes
- An **attribute** is a piece of data that lives on an object.
- An **attribute** is a fact, a detail, a characteristic of the object.
- Access an attribute with `object.attribute` syntax.
- The `size` attribute returns a count of the number of values in the **Series**.
- The `is_unique` attribute returns True if the **Series** has no duplicate values.
- The `values` and `index` attributes return the underlying objects that hold the **Series'** values and index labels.

In [8]:
adjectives = pd.Series(["smart", "beautiful", "humble"])

adjectives.size

3

In [9]:
vals = adjectives.values
print(type(vals))

for i in vals:
    print(i)

<class 'numpy.ndarray'>
smart
beautiful
humble
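
The `is_unique` and `index` attributes from the list above are not exercised in the cells. The sketch below assumes a small made-up Series (`colors` is a hypothetical name) that deliberately contains a duplicate.

```python
import pandas as pd

# Hypothetical sample data with one repeated value.
colors = pd.Series(["red", "blue", "red"])

size = colors.size              # counts every value, duplicates included
unique = colors.is_unique       # False here because "red" appears twice
labels = list(colors.index)     # default integer labels

print(size, unique, labels)
```

Note that attributes are read without parentheses, unlike methods.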


## Parameters and Arguments
- A **parameter** is the name for an expected input to a function/method/class instantiation.
- An **argument** is the concrete value we provide for a parameter during invocation.
- We can pass arguments either sequentially (based on parameter order) or with explicit parameter names written out.
- The first two parameters for the **Series** constructor are `data` and `index`, which represent the values and the index labels.
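
To illustrate the two ways of passing arguments, here is a small sketch with made-up lists. It builds the same Series once by position and once with explicit parameter names; the variable names `by_position` and `by_keyword` are only for illustration.

```python
import pandas as pd

fruits = ["orange", "pear"]
days = ["sunday", "monday"]

# Positional: the first argument fills `data`, the second fills `index`.
by_position = pd.Series(fruits, days)

# Keyword: with names written out, the order no longer matters.
by_keyword = pd.Series(index=days, data=fruits)

print(by_position.equals(by_keyword))
```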

In [10]:
fruits: list[str] = ["orange", "pear", "grape", "plum"]
days: list[str] = ["sunday", "monday", "tuesday", "wednesday"]
series = pd.Series(fruits, days)

series.index

Index(['sunday', 'monday', 'tuesday', 'wednesday'], dtype='object')

## Import Series with the pd.read_csv Function
- A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
- Pandas ships with many different `read_` functions for different types of files.
- The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
- The `read_csv` function will import the dataset as a **DataFrame**, a 2-dimensional table.
- The `usecols` parameter accepts a list of the column(s) to import.
- The `squeeze` method converts a single-column **DataFrame** to a **Series**.

In [11]:
import os

#read_csv returns a DataFrame; squeeze("columns") turns its single remaining column into a Series
pokemon: pd.Series = pd.read_csv(os.path.join("pokemon.csv"), usecols=["Name", "Type"], index_col="Name").squeeze("columns")
pokemon

Name
Bulbasaur          Grass, Poison
Ivysaur            Grass, Poison
Venusaur           Grass, Poison
Charmander                  Fire
Charmeleon                  Fire
                      ...       
Iron Valiant     Fairy, Fighting
Koraidon        Fighting, Dragon
Miraidon        Electric, Dragon
Walking Wake       Water, Dragon
Iron Leaves       Grass, Psychic
Name: Type, Length: 1010, dtype: object

In [12]:
#squeezing a series into a scalar
series1 = pd.Series([1])

scalar = series1.squeeze()
type(scalar)

#google stock series with date index
google_stock_date_ind = pd.read_csv(os.path.join("google_stock_price.csv"), index_col="Date")
google_stock_date_ind: pd.Series = google_stock_date_ind.squeeze()

#google stock series with num index
google_stock_num_ind = pd.read_csv(os.path.join("google_stock_price.csv"), usecols=["Price"])
google_stock_num_ind: pd.Series = google_stock_num_ind.squeeze()


## The head and tail Methods
- The `head` method returns a number of rows from the top/beginning of the `Series`.
- The `tail` method returns a number of rows from the bottom/end of the `Series`.

In [13]:
#a negative n returns every row except the last |n|, so this drops the final 9 rows
google_stock_num_ind.head(-9)

0         2.490664
1         2.515820
2         2.758411
3         2.770615
4         2.614201
           ...    
4779    131.589996
4780    129.279999
4781    130.449997
4782    129.059998
4783    127.849998
Name: Price, Length: 4784, dtype: float64
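
As a quick sketch of the more common calls, the example below uses a short made-up Series (`numbers` is a hypothetical stand-in for the stock data) so every result is visible at a glance.

```python
import pandas as pd

# Hypothetical stand-in data: the integers 0 through 9.
numbers = pd.Series(range(10))

first_five = numbers.head()             # n defaults to 5
last_two = numbers.tail(2)              # explicit row count
all_but_last_three = numbers.head(-3)   # negative n trims from the end

print(first_five.tolist(), last_two.tolist(), all_but_last_three.tolist())
```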

## Passing Series to Python's Built-In Functions
- The `len` function returns the length of the **Series**.
- The `type` function returns the type of an object.
- The `list` function converts the **Series** to a list.
- The `dict` function converts the **Series** to a dictionary.
- The `sorted` function converts the **Series** to a sorted list.
- The `max` function returns the largest value in the **Series**.
- The `min` function returns the smallest value in the **Series**.

In [14]:
dict1 = dict(google_stock_num_ind.head(5))
dict1

{0: 2.490664, 1: 2.51582, 2: 2.758411, 3: 2.770615, 4: 2.614201}
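
The remaining built-ins from the list can be tried the same way. This is a minimal sketch on made-up data (`scores` and its labels are hypothetical), using string labels to show that `dict` uses the index as keys while `list` keeps only the values.

```python
import pandas as pd

# Hypothetical sample data with string index labels.
scores = pd.Series([30, 10, 20], index=["a", "b", "c"])

length = len(scores)
as_list = list(scores)      # values only, in their current order
as_dict = dict(scores)      # index labels become the keys
in_order = sorted(scores)   # a new sorted list; the Series is untouched
largest, smallest = max(scores), min(scores)

print(length, as_list, as_dict, in_order, largest, smallest)
```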

## Check for Inclusion with Python's in Keyword
- The `in` keyword checks if a value exists within an object.
- The `in` keyword will look for a value in the **Series's** index.
- Use the `index` and `values` attributes to access "nested" objects within the **Series**.
- Combine the `in` keyword with `values` to search within the **Series's** values.

In [15]:
#"Grass, Poison" in pokemon.values

pokemon["Bulbasaur"]

'Grass, Poison'
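
The difference between searching the index and searching the values can be sketched as follows. The two-row Series is made up for illustration and only mimics the shape of the pokemon data.

```python
import pandas as pd

# Hypothetical miniature of the pokemon Series: names as labels, types as values.
types = pd.Series(["Grass", "Fire"], index=["Bulbasaur", "Charmander"])

label_found = "Bulbasaur" in types       # `in` checks the index labels
value_as_label = "Grass" in types        # a value is not a label
value_found = "Grass" in types.values    # search the values explicitly

print(label_found, value_as_label, value_found)
```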

## The sort_values Method
- The `sort_values` method sorts a **Series'** values in order.
- By default, pandas applies an ascending sort (smallest to largest).
- Customize the sort order with the `ascending` parameter.

In [16]:

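sorted_google: pd.Series = google_stock_num_ind.sort_values(ignore_index=True)

sorted_index_google: pd.Series = google_stock_num_ind.sort_index(ignore_index=True)

#both sorts return new Series; the original is unchanged, and the cell displays only its last expression
sorted_google
sorted_index_google
google_stock_num_ind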

0         2.490664
1         2.515820
2         2.758411
3         2.770615
4         2.614201
           ...    
4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, Length: 4793, dtype: float64
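
The `ascending` parameter is not used in the cell above. A minimal sketch on three made-up prices (hypothetical data) shows both sort orders and that the original Series keeps its order.

```python
import pandas as pd

# Hypothetical sample data.
prices = pd.Series([3.5, 1.2, 2.8])

low_to_high = prices.sort_values()                  # default: ascending
high_to_low = prices.sort_values(ascending=False)   # largest first

print(low_to_high.tolist(), high_to_low.tolist())
```

Each value keeps its original index label after sorting, which is why the cell above passes `ignore_index=True` to renumber the result.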

## The sort_index Method
- The `sort_index` method sorts a **Series** by its index.
- The `sort_index` method also accepts an `ascending` parameter to set sort order.

In [17]:
pokemon_type_index: pd.Series = pd.Series(data=pokemon.index, index=pokemon.values)
#print(pokemon)
list_ = pokemon.iloc[:5]
list_

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [18]:
pokemon_sorted = pokemon.sort_index(ascending=False)
pokemon_sorted["Bulbasaur"]

'Grass, Poison'

In [19]:
google_stock_num_ind.iloc[1:5] = 2.81
google_stock_num_ind

0         2.490664
1         2.810000
2         2.810000
3         2.810000
4         2.810000
           ...    
4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, Length: 4793, dtype: float64

## Extract Series Value by Index Position
- Use the `iloc` accessor to extract a **Series** value by its index position.
- `iloc` is short for "index location".
- Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.

In [20]:
pokemon

Name
Bulbasaur          Grass, Poison
Ivysaur            Grass, Poison
Venusaur           Grass, Poison
Charmander                  Fire
Charmeleon                  Fire
                      ...       
Iron Valiant     Fairy, Fighting
Koraidon        Fighting, Dragon
Miraidon        Electric, Dragon
Walking Wake       Water, Dragon
Iron Leaves       Grass, Psychic
Name: Type, Length: 1010, dtype: object
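
The cell above only displays the Series. As a sketch of positional access, the example below applies `iloc` to a four-row made-up Series (hypothetical data) with a single position, a negative position, a slice, and a list of positions.

```python
import pandas as pd

# Hypothetical sample data; the string labels show that iloc ignores them.
items = pd.Series(["a", "b", "c", "d"], index=["w", "x", "y", "z"])

first = items.iloc[0]          # single position
last = items.iloc[-1]          # negative positions count from the end
middle = items.iloc[1:3]       # slice: positions 1 and 2, end excluded
picked = items.iloc[[0, 2]]    # list of positions

print(first, last, middle.tolist(), picked.tolist())
```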

## Extract Series Value by Index Label
- Use the `loc` accessor to extract a **Series** value by its index label.
- Pass a list to extract multiple values by index label.
- If any label in the list does not exist, `loc` raises a `KeyError`; an out-of-range position passed to `iloc` raises an `IndexError`.

In [21]:
pokemon.loc[["Bulbasaur", "Ivysaur","Koraidon" ]]

Name
Bulbasaur       Grass, Poison
Ivysaur         Grass, Poison
Koraidon     Fighting, Dragon
Name: Type, dtype: object
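
The error case from the last bullet can be sketched like this, using a made-up two-row Series and a label ("Missing") that is deliberately absent.

```python
import pandas as pd

# Hypothetical miniature of the pokemon Series.
types = pd.Series(["Grass", "Fire"], index=["Bulbasaur", "Charmander"])

found = types.loc[["Bulbasaur", "Charmander"]]   # every label exists

try:
    types.loc[["Bulbasaur", "Missing"]]          # one label does not exist
    raised = False
except KeyError:
    raised = True

print(found.tolist(), raised)
```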

## The get Method on a Series
- The `get` method extracts a **Series** value by index label. It is an alternative option to square brackets.
- The `get` method's second argument sets the fallback value to return if the label/position does not exist.
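
A minimal sketch of `get`, assuming a made-up two-row Series and an illustrative fallback string ("Unknown"):

```python
import pandas as pd

# Hypothetical miniature of the pokemon Series.
types = pd.Series(["Grass", "Fire"], index=["Bulbasaur", "Charmander"])

hit = types.get("Bulbasaur")               # label exists
miss = types.get("Missing")                # no fallback given: returns None
fallback = types.get("Missing", "Unknown") # second argument is the fallback

print(hit, miss, fallback)
```

Unlike square brackets or `loc`, a missing label here does not raise an error.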

## Overwrite a Series Value
- Use the `loc/iloc` accessor to target an index label/position, then use an equal sign to provide a new value.

In [22]:
pokemon.iloc[0] = "op pokemon"
pokemon.iloc[0]

'op pokemon'

## The copy Method
- A **copy** is a duplicate/replica of an object.
- Changes to a copy do not modify the original object.
- A **view** is a different way of looking at the *same* data.
- Changes to a view *do* modify the original object.
- The `copy` method creates a copy of a pandas object.

In [23]:
pokemon_df: pd.DataFrame = pd.read_csv(os.path.join("pokemon.csv"), usecols=["Name","Type"], index_col="Name")
pokemon_series = pokemon_df.squeeze().copy() #pokemon_series is an independent copy of the column

In [24]:
pokemon_series: pd.Series = pd.read_csv(os.path.join("pokemon.csv"), usecols=["Name","Type"], index_col="Name").squeeze()
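
The independence of a copy can be checked directly. The sketch below uses made-up numbers (`original` and `duplicate` are hypothetical names) and only demonstrates the copy side; whether a given pandas operation returns a view depends on the operation and the pandas version, so that case is not shown.

```python
import pandas as pd

# Hypothetical sample data.
original = pd.Series([1, 2, 3])
duplicate = original.copy()

duplicate.iloc[0] = 99   # change the copy only

print(original.tolist(), duplicate.tolist())
```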


## Math Methods on Series Objects
- The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
- The `sum` method adds together the **Series's** values.
- The `product` method multiplies together the **Series's** values.
- The `mean` method calculates the average of the **Series's** values.
- The `std` method calculates the standard deviation of the **Series's** values.
- The `max` method returns the largest value in the **Series**.
- The `min` method returns the smallest value in the **Series**.
- The `median` method returns the median of the **Series** (the value in the middle).
- The `mode` method returns the mode of the **Series** (the most frequent value).
- The `describe` method returns a summary with various mathematical calculations.

In [39]:
series2: pd.Series = pd.Series([1,2,3,4,1,2,1,1])
type(series2.describe().loc["count"])

numpy.float64
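
The difference between `count` and `size` only shows up when a value is missing. This sketch assumes two small made-up Series: `with_gap` contains one missing value, and `repeated` has a clear mode.

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN in a numeric Series.
with_gap = pd.Series([1.0, None, 3.0])

size = with_gap.size       # counts all slots, missing included
count = with_gap.count()   # counts present values only
mean = with_gap.mean()     # missing values are skipped by default

repeated = pd.Series([1, 2, 2, 3])
median = repeated.median()
modes = repeated.mode().tolist()   # mode returns a Series, since ties are possible

print(size, count, mean, median, modes)
```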

## Broadcasting
- **Broadcasting** describes the process of applying an arithmetic operation to an array (i.e., a **Series**).
- We can combine mathematical operators with a **Series** to apply the mathematical operation to every value.
- There are also methods to accomplish the same results (`add`, `sub`, `mul`, `div`, etc.)
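
As a sketch of the operator and method forms side by side, the example below uses three made-up prices (hypothetical data). The last line applies a comparison, which is broadcast the same way and yields a Series of booleans.

```python
import pandas as pd

# Hypothetical sample data.
prices = pd.Series([10.0, 20.0, 30.0])

plus_five = prices + 5         # operator form
plus_five_method = prices.add(5)   # equivalent method form
doubled = prices.mul(2)        # same as prices * 2
above_15 = prices > 15         # comparison applied to every value

print(plus_five.tolist(), doubled.tolist(), above_15.tolist())
```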

In [53]:
pokemon_series.value_counts(normalize=True)

Date
2004-08-19    False
2004-08-20    False
2004-08-23    False
2004-08-24    False
2004-08-25    False
              ...  
2023-08-28     True
2023-08-29     True
2023-08-30     True
2023-08-31     True
2023-09-01     True
Name: Price, Length: 4793, dtype: bool

## The value_counts Method
- The `value_counts` method returns the number of times each unique value occurs in the **Series**.
- The `normalize` parameter returns the relative frequencies/percentages of the values instead of the counts.
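
A minimal sketch of both forms, assuming a made-up Series of four type names so the proportions are easy to verify:

```python
import pandas as pd

# Hypothetical sample data: "Fire" appears three times, "Water" once.
types = pd.Series(["Fire", "Water", "Fire", "Fire"])

counts = types.value_counts()                 # absolute counts
shares = types.value_counts(normalize=True)   # fractions that sum to 1

print(counts["Fire"], counts["Water"])
print(shares["Fire"], shares["Water"])
```

The result is itself a Series, indexed by the unique values and sorted from most to least frequent.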

## The apply Method
- The `apply` method accepts a function. It invokes that function on every `Series` value.

In [67]:
pokemon_series.apply(len)

def count_lowercase_a(str_: str) -> int:
    #counts lowercase "a" characters only; an uppercase "A" is not matched
    return str_.count("a")

google_stock_date_ind.apply(int)
pokemon_type_index.apply(count_lowercase_a)

op pokemon          2
Grass, Poison       1
Grass, Poison       1
Fire                2
Fire                1
                   ..
Fairy, Fighting     2
Fighting, Dragon    1
Electric, Dragon    1
Water, Dragon       2
Grass, Psychic      1
Name: Name, Length: 1010, dtype: int64

## The map Method
- The `map` method "maps" or connects each **Series** value to another value.
- We can pass the method a dictionary or a **Series**. Both types connect keys to values.
- The `map` method uses our argument to connect or bridge together the values.
- A value with no matching key maps to `NaN`, which is why the result can contain missing values.

In [92]:
attack_powers = {"Grass": 10,"Fire": 15,"Water": 15,"Fairy, Fighting": 20,"Grass, Psychic": 50}
new_series = pokemon.map(attack_powers)
new_series.info()

<class 'pandas.core.series.Series'>
Index: 1010 entries, Bulbasaur to Iron Leaves
Series name: Type
Non-Null Count  Dtype  
--------------  -----  
160 non-null    float64
dtypes: float64(1)
memory usage: 48.1+ KB
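
The low non-null count above comes from values that have no exact key in the dictionary. The sketch below reproduces the effect on a made-up three-row Series (`types` and `powers` are hypothetical names) and shows that a Series works as the lookup too.

```python
import pandas as pd

# Hypothetical sample data; "Ghost" has no entry in the lookup.
types = pd.Series(["Fire", "Water", "Ghost"])
powers = {"Fire": 15, "Water": 15}

mapped = types.map(powers)                  # dictionary lookup
mapped_by_series = types.map(pd.Series(powers))   # Series lookup: index acts as the keys

print(mapped.isna().tolist())
print(mapped_by_series.isna().tolist())
```

Matching is exact, so a key such as "Grass" would not match a value such as "Grass, Poison".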
