# Series

In [1]:
import pandas as pd

## Create a Series Object from a List
- A pandas **Series** is a one-dimensional labelled array.
- A Series combines the best features of a list and a dictionary.
- A Series maintains a single collection of ordered values (i.e. a single column of data).
- We can assign each value an identifier, which does not have to *be* unique.

In [4]:
cookies = ["choco", "soft bake", "lemon", "caramel"]
pd.Series(cookies)

0        choco
1    soft bake
2        lemon
3      caramel
dtype: object

In [6]:
pookiewookie = [1, 2, 8, 5, 7, 4]
pd.Series(pookiewookie)

0    1
1    2
2    8
3    5
4    7
5    4
dtype: int64

In [8]:
two_options = [True, True, False, True, False]
pd.Series(two_options)

0     True
1     True
2    False
3     True
4    False
dtype: bool

## Create a Series Object from a Dictionary

In [11]:
slavic_bodega = {
    "caviar": "green",
    "paste": "pink",
    "chicory": "brown",
    "cabbage": "white"
}
pd.Series(slavic_bodega)

caviar     green
paste       pink
chicory    brown
cabbage    white
dtype: object

## Intro to Series Methods
- The syntax to invoke a method on any object is `object.method()`.
- The `sum` method adds together the **Series'** values.
- The `product` method multiplies the **Series'** values.
- The `mean` method finds the average of the **Series'** values.
- The `std` method finds the standard deviation of the **Series'** values.

In [14]:
prices = pd.Series([1.3, 2, 4, 5.8])
prices

0    1.3
1    2.0
2    4.0
3    5.8
dtype: float64

In [16]:
prices.std()

2.0353132437047616

In [18]:
prices.sum()
prices.product()

60.32

## Intro to Attributes
- An **attribute** is a piece of data that lives on an object.
- An **attribute** is a fact, a detail, a characteristic of the object.
- Access an attribute with `object.attribute` syntax.
- The `size` attribute returns a count of the number of values in the **Series**.
- The `is_unique` attribute returns True if the **Series** has no duplicate values.
- The `values` and `index` attributes return the underlying objects that hold the **Series'** values and index labels.

In [21]:
# attributes help expose the underlying pieces that make up an object

adjective = pd.Series(["ugly", "hot", "sweaty", "mouth-breather", "hot"])
adjective

0              ugly
1               hot
2            sweaty
3    mouth-breather
4               hot
dtype: object

In [23]:
adjective.size

5

In [25]:
adjective.is_unique

False

In [27]:
# object made from objects (car constructed from many parts ex.)

In [29]:
adjective.values

array(['ugly', 'hot', 'sweaty', 'mouth-breather', 'hot'], dtype=object)

In [31]:
adjective.index

RangeIndex(start=0, stop=5, step=1)

In [33]:
type(adjective.values)

numpy.ndarray

In [35]:
type(adjective.index)

pandas.core.indexes.range.RangeIndex

## Parameters and Arguments
- A **parameter** is the name for an expected input to a function/method/class instantiation.
- An **argument** is the concrete value we provide for a parameter during invocation.
- We can pass arguments either sequentially (based on parameter order) or with explicit parameter names written out.
- The first two parameters for the **Series** constructor are `data` and `index`, which represent the values and the index labels.

In [38]:
fruits = ["apples", "oranges", "figs", "melon"]
weekdays = ["mon", "tues", "wed", "thurs"]

In [40]:
pd.Series(fruits)
pd.Series(weekdays)
pd.Series(fruits, weekdays)

mon       apples
tues     oranges
wed         figs
thurs      melon
dtype: object

In [42]:
pd.Series(data=fruits, index=weekdays)
pd.Series(index=weekdays, data=fruits)
pd.Series(data=weekdays, index=fruits)

apples       mon
oranges     tues
figs         wed
melon      thurs
dtype: object

In [44]:
# you can match by name, rather than relying on automatic positioning from parameter list (shift+tab)
pd.Series(fruits,index=weekdays)

mon       apples
tues     oranges
wed         figs
thurs      melon
dtype: object

## Import Series with the pd.read_csv Function
- A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
- Pandas ships with many different `read_` functions for different types of files.
- The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
- The `read_csv` function will import the dataset as a **DataFrame**, a 2-dimensional table.
- The `usecols` parameter accepts a list of the column(s) to import.
- The `squeeze` method converts a single-column **DataFrame** to a **Series**.

In [46]:
pd.read_csv("pokemon.csv")

|      | Name         | Type             |
|------|--------------|------------------|
| 0    | Bulbasaur    | Grass, Poison    |
| 1    | Ivysaur      | Grass, Poison    |
| 2    | Venusaur     | Grass, Poison    |
| 3    | Charmander   | Fire             |
| 4    | Charmeleon   | Fire             |
| ...  | ...          | ...              |
| 1005 | Iron Valiant | Fairy, Fighting  |
| 1006 | Koraidon     | Fighting, Dragon |
| 1007 | Miraidon     | Electric, Dragon |
| 1008 | Walking Wake | Water, Dragon    |


In [56]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
# squeeze("columns") returns a Series only when the DataFrame has a single column
pokemon

0          Bulbasaur
1            Ivysaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object

In [70]:
google = pd.read_csv("google_stock_price.csv", usecols=['Price']).squeeze("columns")
google

0         2.490664
1         2.515820
2         2.758411
3         2.770615
4         2.614201
           ...    
4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, Length: 4793, dtype: float64

## The head and tail Methods
- The `head` method returns the first `n` rows from the top/beginning of the **Series** (5 by default).
- The `tail` method returns the last `n` rows from the bottom/end of the **Series** (5 by default).

In [76]:
pokemon.head()
pokemon.head(5)
pokemon.head(n=5)
#these all essentially mean the same thing

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [94]:
google.tail()
google.tail(n=9)
google.tail(3)

4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, dtype: float64

In [101]:
pokemon.tail(4)

1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, dtype: object

## Passing Series to Python's Built-In Functions
- The `len` function returns the length of the **Series**.
- The `type` function returns the type of an object.
- The `list` function converts the **Series** to a list.
- The `dict` function converts the **Series** to a dictionary.
- The `sorted` function converts the **Series** to a sorted list.
- The `max` function returns the largest value in the **Series**.
- The `min` function returns the smallest value in the **Series**.

In [138]:
list(pokemon) # lists every single one lol 
sorted(pokemon) # a list in sorted order
len(pokemon)
type(pokemon)
type(sorted(pokemon)) # sorted returns a list, so the type here is list
dict(pokemon)

max(google)
min(google)

max(pokemon) #closest to end of alphabet
min(pokemon) #closest to beginning of alphabet

'Abomasnow'

In [122]:
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [124]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols=['Price']).squeeze("columns")

## Check for Inclusion with Python's in Keyword
- The `in` keyword checks if a value exists within an object.
- The `in` keyword will look for a value in the **Series's** index.
- Use the `index` and `values` attributes to access "nested" objects within the **Series**.
- Combine the `in` keyword with `values` to search within the **Series's** values.

In [142]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols=['Price']).squeeze("columns")
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [152]:
"Venusaur" in pokemon # we get FALSE here
# why? the 'in' keyword looks within the index by default, not the values

2 in pokemon
5 in pokemon.index

True

In [156]:
# you have to access the values attribute to search the underlying ndarray object
"Charmeleon" in pokemon.values

True

## The sort_values Method
- The `sort_values` method sorts a **Series's** values in order.
- By default, pandas applies an ascending sort (smallest to largest).
- Customize the sort order with the `ascending` parameter.

In [160]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols=['Price']).squeeze("columns")
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [164]:
google.sort_values() #sorted from smallest to largest values 
google.sort_values(ascending=True) #DEFAULT
google.sort_values(ascending=False)

4395    151.863495
4345    151.000000
4346    150.141754
4341    150.000000
4336    150.000000
           ...    
12        2.515820
11        2.514326
13        2.509095
0         2.490664
10        2.470490
Name: Price, Length: 4793, dtype: float64

In [196]:
#we can call on the head method here if 
#we only want to see a certain number of values

google.sort_values(ascending=False).head(8)
pokemon.sort_values().tail()

570     Zoroark
569       Zorua
40        Zubat
633    Zweilous
717     Zygarde
Name: Name, dtype: object

## The sort_index Method
- The `sort_index` method sorts a **Series** by its index.
- The `sort_index` method also accepts an `ascending` parameter to set sort order.
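
A minimal sketch of `sort_index`, assuming a small made-up **Series** (the `stock` name and its labels are illustrative only and not part of the datasets above):

```python
import pandas as pd

# Hypothetical sample data: the index labels are deliberately out of order
stock = pd.Series([30, 10, 20], index=["cherry", "apple", "banana"])

# Sort by index label, ascending by default (a to z)
by_label = stock.sort_index()

# Reverse the order with the ascending parameter
by_label_desc = stock.sort_index(ascending=False)

print(by_label)
print(by_label_desc)
```

The values travel with their labels, so `apple` stays paired with `10` in both results.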

## Extract Series Value by Index Position
- Use the `iloc` accessor to extract a **Series** value by its index position.
- `iloc` is short for "index location".
- Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.
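
A short sketch of `iloc`, assuming a five-value **Series** built inline from the first few Pokemon names (so it runs without the CSV file):

```python
import pandas as pd

# Inline sample so the example does not depend on pokemon.csv
names = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmeleon"])

first = names.iloc[0]         # single position
last = names.iloc[-1]         # negative positions count from the end
middle = names.iloc[1:3]      # slice: positions 1 and 2 (the stop is excluded)
from_start = names.iloc[:2]   # slice from the start
to_end = names.iloc[3:]       # slice to the end
picked = names.iloc[[0, 4]]   # a list of positions

print(first, last)
print(middle)
```

A single position returns the value itself, while a slice or a list returns a new **Series**.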

## Extract Series Value by Index Label
- Use the `loc` accessor to extract a **Series** value by its index label.
- Pass a list to extract multiple values by index label.
- If one index label/position in the list does not exist, Pandas will raise an error.
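
A small sketch of `loc`, assuming a **Series** whose index labels are Pokemon names (the `types` variable is made up for illustration; its values are taken from the table shown earlier):

```python
import pandas as pd

# Inline sample: names as index labels, types as values
types = pd.Series(
    ["Grass, Poison", "Fire", "Fighting, Dragon"],
    index=["Bulbasaur", "Charmander", "Koraidon"],
)

one = types.loc["Charmander"]                 # a single label returns the value
several = types.loc[["Koraidon", "Bulbasaur"]]  # a list of labels returns a Series

# A list containing a label that does not exist raises a KeyError
try:
    types.loc[["Bulbasaur", "Iron Leaves"]]
    missing_raised = False
except KeyError:
    missing_raised = True

print(one)
print(several)
print(missing_raised)
```

The result for a list follows the order of the list, not the order of the original index.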

## The get Method on a Series
- The `get` method extracts a **Series** value by index label. It is an alternative option to square brackets.
- The `get` method's second argument sets the fallback value to return if the label does not exist (`None` when omitted).
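
A brief sketch of `get`, assuming the same kind of name-labelled sample **Series** (the `"Unknown"` fallback string is an arbitrary choice for the example):

```python
import pandas as pd

# Inline sample: names as index labels, types as values
types = pd.Series(
    ["Grass, Poison", "Fire", "Fighting, Dragon"],
    index=["Bulbasaur", "Charmander", "Koraidon"],
)

found = types.get("Charmander")               # label exists, so we get its value
not_found = types.get("Iron Leaves")          # label is missing, no error is raised
fallback = types.get("Iron Leaves", "Unknown")  # second argument is the fallback

print(found)
print(not_found)
print(fallback)
```

Unlike square brackets or `loc`, `get` does not raise an error for a missing label, which makes it handy when a label may or may not be present.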

## Overwrite a Series Value
- Use the `loc` or `iloc` accessor to target an index label or position, then use an equal sign to assign a new value.
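
A minimal sketch of overwriting values, assuming a small price **Series** with made-up letter labels:

```python
import pandas as pd

# Hypothetical sample with string labels so loc and iloc are easy to tell apart
prices = pd.Series([1.3, 2.0, 4.0, 5.8], index=["a", "b", "c", "d"])

prices.loc["b"] = 2.5              # overwrite by index label
prices.iloc[0] = 1.5               # overwrite by index position
prices.iloc[[2, 3]] = [4.5, 6.0]   # overwrite several positions at once

print(prices)
```

The assignments change `prices` in place; nothing is returned.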

## The copy Method
- A **copy** is a duplicate/replica of an object.
- Changes to a copy do not modify the original object.
- A **view** is a different way of looking at the *same* data.
- Changes to a view *do* modify the original object.
- The `copy` method creates a copy of a pandas object.
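
A small sketch of the `copy` method, assuming a made-up three-value **Series**. It only demonstrates the copy side, because whether a given operation hands back a view or a copy can depend on the pandas version in use:

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=["a", "b", "c"])

# copy() creates an independent duplicate of the Series
duplicate = original.copy()

# Changing the duplicate leaves the original untouched
duplicate.loc["a"] = 99

print(original)
print(duplicate)
```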

## Math Methods on Series Objects
- The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
- The `sum` method adds together the **Series's** values.
- The `product` method multiplies together the **Series's** values.
- The `mean` method calculates the average of the **Series's** values.
- The `std` method calculates the standard deviation of the **Series's** values.
- The `max` method returns the largest value in the **Series**.
- The `min` method returns the smallest value in the **Series**.
- The `median` method returns the median of the **Series** (the value in the middle).
- The `mode` method returns the mode of the **Series** (the most frequent value).
- The `describe` method returns a summary with various mathematical calculations.
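
A quick sketch of these methods, assuming the earlier `prices` values plus one missing value added to show how `count` and `size` differ:

```python
import pandas as pd

# Same prices as earlier, with one missing value (NaN) appended
prices = pd.Series([1.3, 2.0, 4.0, 5.8, float("nan")])

non_missing = prices.count()   # 4, missing values are excluded
total_slots = prices.size      # 5, missing values are included

total = prices.sum()           # missing values are skipped by default
average = prices.mean()
middle = prices.median()       # 3.0, the average of the two middle values
largest = prices.max()
smallest = prices.min()

# mode returns a Series, because several values can tie for most frequent
most_common = pd.Series([1, 2, 2, 3]).mode()

# describe bundles count, mean, std, min, quartiles and max into one Series
summary = prices.describe()

print(non_missing, total_slots)
print(summary)
```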

## Broadcasting
- **Broadcasting** describes the process of applying an arithmetic operation to every value in an array (i.e., a **Series**).
- We can combine mathematical operators with a **Series** to apply the mathematical operation to every value.
- There are also methods to accomplish the same results (`add`, `sub`, `mul`, `div`, etc.)
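
A short sketch of broadcasting, assuming a made-up three-value **Series**. Each operator is paired with its method equivalent:

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 4.0])

# The operator applies to every value in the Series
plus_ten = prices + 10
doubled = prices * 2

# The method versions produce the same results
plus_ten_method = prices.add(10)
doubled_method = prices.mul(2)
minus_one = prices.sub(1)
halved = prices.div(2)

print(plus_ten)
print(doubled)
```

The original `prices` is left unchanged; each operation returns a new **Series**.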

## The value_counts Method
- The `value_counts` method returns the number of times each unique value occurs in the **Series**.
- Setting the `normalize` parameter to `True` returns the relative frequencies of the values (proportions between 0 and 1) instead of the counts.
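
A minimal sketch of `value_counts`, assuming a made-up **Series** of colour names with repeats:

```python
import pandas as pd

# Hypothetical sample: "red" appears 3 times, "blue" twice, "green" once
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

# Counts per unique value, most frequent first
counts = colors.value_counts()

# Proportions instead of counts (they add up to 1)
shares = colors.value_counts(normalize=True)

print(counts)
print(shares)
```

Multiplying `shares` by 100 would turn the proportions into percentages.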

## The apply Method
- The `apply` method accepts a function. It invokes that function on every `Series` value.
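
A small sketch of `apply`, assuming an inline **Series** of names; the `shout` helper is a hypothetical function written just for this example:

```python
import pandas as pd

names = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])

# Pass a built-in function: len runs once per value
lengths = names.apply(len)


def shout(name):
    """Hypothetical helper: uppercase a name and add an exclamation mark."""
    return name.upper() + "!"


# Pass a custom function without parentheses; pandas invokes it for us
shouted = names.apply(shout)

print(lengths)
print(shouted)
```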

## The map Method
- The `map` method "maps" or connects each **Series** value to another value.
- We can pass the method a dictionary or a **Series**. Both types connect keys to values.
- The `map` method uses our argument to connect or bridge together the values.
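
A brief sketch of `map`, assuming an inline **Series** of names and a made-up lookup dictionary (`type_by_name`) built from the types shown earlier:

```python
import pandas as pd

names = pd.Series(["Bulbasaur", "Charmander", "Koraidon", "Iron Leaves"])

# Hypothetical lookup: keys match the Series values, values are what we map to
type_by_name = {
    "Bulbasaur": "Grass, Poison",
    "Charmander": "Fire",
    "Koraidon": "Fighting, Dragon",
}

# Map with a dictionary: each name is replaced by its dictionary value
mapped_from_dict = names.map(type_by_name)

# Map with a Series: its index labels play the role of the dictionary keys
mapped_from_series = names.map(pd.Series(type_by_name))

print(mapped_from_dict)
```

A value with no matching key ("Iron Leaves" here) becomes a missing value in the result.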