# Series

In [1]:
import pandas as pd

## Create a Series Object from a List
- A pandas **Series** is a one-dimensional labelled array.
- A Series combines the best features of a list and a dictionary.
- A Series maintains a single collection of ordered values (i.e. a single column of data).
- We can assign each value an identifier, which does not have to *be* unique.

In [2]:
ice_cream = ["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"]
pd.Series(ice_cream)


0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

In [3]:
#dtype is the data type of the values held in the Series.

In [4]:
lottery_numbers = [4, 8, 15, 23, 42, 3]
pd.Series(lottery_numbers)

0     4
1     8
2    15
3    23
4    42
5     3
dtype: int64

In [5]:
registrations = [True, False, False, False, True]
pd.Series(registrations)


0     True
1    False
2    False
3    False
4     True
dtype: bool

In [6]:
#When no index is supplied, pandas generates a unique integer index starting at 0.

## Create a Series Object from a Dictionary

In [7]:
sushi = {
    "Salmon": "Orange",
    "Tuna": "Red",
    "Eel": "Brown"
}

#A Series preserves the order of the dictionary's items. The dictionary keys become the index labels, whatever their type.

pd.Series(sushi) 

Salmon    Orange
Tuna         Red
Eel        Brown
dtype: object

## Intro to Series Methods
- The syntax to invoke a method on any object is `object.method()`.
- The `sum` method adds together the **Series'** values.
- The `product` method multiplies the **Series'** values.
- The `mean` method finds the average of the **Series'** values.
- The `std` method finds the standard deviation of the **Series'** values.

In [8]:
prices = pd.Series([2.99, 4.33, 1.24])
prices

0    2.99
1    4.33
2    1.24
dtype: float64

In [9]:
prices.sum()

np.float64(8.56)

In [10]:
prices.product()

np.float64(16.053908000000003)

In [11]:
prices.mean()

np.float64(2.8533333333333335)

In [12]:
prices.std()

1.5495268094916375

## Intro to Attributes
- An **attribute** is a piece of data that lives on an object.
- An **attribute** is a fact, a detail, a characteristic of the object.
- Access an attribute with `object.attribute` syntax.
- The `size` attribute returns a count of the number of values in the **Series**.
- The `is_unique` attribute returns True if the **Series** has no duplicate values.
- The `values` and `index` attributes return the underlying objects that hold the **Series'** values and index labels.

In [13]:
adjectives = pd.Series(["Smart","Handsome", "Brilliant", "Humble", "Smart"])
adjectives


0        Smart
1     Handsome
2    Brilliant
3       Humble
4        Smart
dtype: object

In [14]:
adjectives.size

5

In [15]:
adjectives.is_unique

False

In [16]:
adjectives.values

#This gives us the underlying object that is holding the strings in the series. 

array(['Smart', 'Handsome', 'Brilliant', 'Humble', 'Smart'], dtype=object)

In [17]:
adjectives.index

#This gives us the object that is holding the index labels.

RangeIndex(start=0, stop=5, step=1)

In [18]:
type(adjectives.values)

numpy.ndarray

## Parameters and Arguments
- A **parameter** is the name for an expected input to a function/method/class instantiation.
- An **argument** is the concrete value we provide for a parameter during invocation.
- We can pass arguments either sequentially (based on parameter order) or with explicit parameter names written out.
- The first two parameters for the **Series** constructor are `data` and `index`, which represent the values and the index labels.

In [19]:
fruits = ["Apple", "Orange", "Plum", "Grape", "Blueberry", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday"]

In [20]:
pd.Series(fruits)

0         Apple
1        Orange
2          Plum
3         Grape
4     Blueberry
5    Watermelon
dtype: object

In [21]:
pd.Series(weekdays)
pd.Series(fruits, weekdays)
pd.Series(weekdays, fruits)

#Pressing Shift+Tab after the opening parenthesis brings up the documentation for the class, including its list of parameters.
#Arguments passed without names are matched to parameters by position, so their order is crucial: the first goes to data, the second to index.
#Arguments passed with parameter names can appear in any order.
#Only the last expression in a cell is displayed, which is why the output below has fruits as the index and weekdays as the values.

Apple            Monday
Orange          Tuesday
Plum          Wednesday
Grape          Thursday
Blueberry        Friday
Watermelon       Monday
dtype: object

In [22]:
pd.Series(data=fruits, index=weekdays)

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Friday        Blueberry
Monday       Watermelon
dtype: object

In [23]:
pd.Series(index=weekdays, data=fruits)

#Case in point

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Friday        Blueberry
Monday       Watermelon
dtype: object

In [24]:
pd.Series(fruits, index=weekdays)

#Mixing and matching allows you to skip needless parameters and go straight to a named parameter that you want to specifically include. 

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Friday        Blueberry
Monday       Watermelon
dtype: object

## Import Series with the pd.read_csv Function
- A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
- Pandas ships with many different `read_` functions for different types of files.
- The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
- The `read_csv` function will import the dataset as a **DataFrame**, a 2-dimensional table.
- The `usecols` parameter accepts a list of the column(s) to import.
- The `squeeze` method converts a single-column **DataFrame** to a **Series**.
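
The steps in this list can be tried without any file on disk. The following is a minimal sketch that reads an in-memory CSV through `io.StringIO`; the two-column text is made-up sample data and only stands in for a real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk (made-up rows).
csv_text = "Name,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

# read_csv returns a DataFrame, even when only one column is requested.
frame = pd.read_csv(io.StringIO(csv_text), usecols=["Name"])
print(type(frame).__name__)   # DataFrame

# squeeze("columns") collapses the single-column DataFrame into a Series.
names = frame.squeeze("columns")
print(type(names).__name__)   # Series
print(names.tolist())
```

Passing a file path instead of the `StringIO` object works the same way.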

In [25]:
pokemon = pd.read_csv("pokemon.csv", usecols=['Name']).squeeze("columns")
pokemon
#Keep your dataset in the same directory as your notebook.
#usecols specifies what columns you want to include.

0          Bulbasaur
1            Ivysaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object

In [26]:
#A dataframe is a 2 dimensional table with rows and columns.

In [27]:
pd.Series([1,2,3])
#This is a 1 dimensional series. 

0    1
1    2
2    3
dtype: int64

In [28]:
google = pd.read_csv('google_stock_price.csv', usecols=["Price"]).squeeze("columns")
google

0         2.490664
1         2.515820
2         2.758411
3         2.770615
4         2.614201
           ...    
4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, Length: 4793, dtype: float64

## The head and tail Methods
- The `head` method returns a number of rows from the top/beginning of the `Series`.
- The `tail` method returns a number of rows from the bottom/end of the `Series`.

In [29]:
pokemon.head()
pokemon.head(5)
pokemon.head(n=5)

pokemon.head(8)
pokemon.head(1)

0    Bulbasaur
Name: Name, dtype: object

In [30]:
google.tail()
google.tail(7)
google.tail(n=2)

4791    137.050003
4792    138.429993
Name: Price, dtype: float64

## Passing Series to Python's Built-In Functions
- The `len` function returns the length of the **Series**.
- The `type` function returns the type of an object.
- The `list` function converts the **Series** to a list.
- The `dict` function converts the **Series** to a dictionary.
- The `sorted` function converts the **Series** to a sorted list.
- The `max` function returns the largest value in the **Series**.
- The `min` function returns the smallest value in the **Series**.

In [31]:
pokemon = pd.read_csv("pokemon.csv", usecols=['Name']).squeeze("columns")
google = pd.read_csv('google_stock_price.csv', usecols=["Price"]).squeeze("columns")

In [32]:
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [33]:
len(pokemon)
type(pokemon)
list(pokemon)
sorted(pokemon)
type(sorted(pokemon))

sorted(google)

dict(pokemon)

max(google)
min(google)

max(pokemon)
min(pokemon)
#sorted returns a new list in ascending order. list converts the Series' values into a list.
#dict converts the Series into a dictionary; the index labels become the keys.
#On a Series of strings, max returns the value that sorts last alphabetically and min returns the value that sorts first.
#Only the last expression, min(pokemon), is displayed below.

'Abomasnow'
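
Because a cell displays only its last expression, here is a small sketch that prints what each built-in function returns. The Series is made up for illustration and is small enough to check by eye.

```python
import pandas as pd

# Small illustrative Series; the labels and values are made up.
scores = pd.Series([30, 10, 20], index=["a", "b", "c"])

print(len(scores))       # 3
print(list(scores))      # values only, in their original order
print(sorted(scores))    # values only, as a new ascending list
print(dict(scores))      # index labels become the dictionary keys
print(max(scores), min(scores))
```

Note that `list` and `sorted` discard the index labels, while `dict` keeps them as keys.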

## Check for Inclusion with Python's in Keyword
- The `in` keyword checks if a value exists within an object.
- The `in` keyword will look for a value in the **Series's** index.
- Use the `index` and `values` attributes to access "nested" objects within the **Series**.
- Combine the `in` keyword with `values` to search within the **Series's** values.

In [34]:
pokemon = pd.read_csv("pokemon.csv", usecols=['Name']).squeeze("columns")
google = pd.read_csv('google_stock_price.csv', usecols=["Price"]).squeeze("columns")

pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [35]:
'car' in 'racecar'

#Be aware that "in" checks the Series' index labels by default, not its values.

"Bulbasaur" in pokemon

False

In [36]:
0 in pokemon

True

In [37]:
5 in pokemon.index

True

In [38]:
"Bulbasaur" in pokemon.values
#This is how you search values, using the values attribute. 

True

## The sort_values Method
- The `sort_values` method sorts a **Series'** values in order.
- By default, pandas applies an ascending sort (smallest to largest).
- Customize the sort order with the `ascending` parameter.

In [39]:
pokemon = pd.read_csv("pokemon.csv", usecols=['Name']).squeeze("columns")
google = pd.read_csv('google_stock_price.csv', usecols=["Price"]).squeeze("columns")

In [40]:
google.head()

0    2.490664
1    2.515820
2    2.758411
3    2.770615
4    2.614201
Name: Price, dtype: float64

In [41]:
google.sort_values()
google.sort_values(ascending=True)
google.sort_values(ascending=False)
google.sort_values(ascending=False).head()

#sort_values doesn't mutate the original Series; it returns a new, sorted Series. Because the result is itself a Series, we can chain further Series methods such as head onto it.

4395    151.863495
4345    151.000000
4346    150.141754
4341    150.000000
4336    150.000000
Name: Price, dtype: float64

In [42]:
pokemon.sort_values()

459    Abomasnow
62          Abra
358        Absol
616     Accelgor
680    Aegislash
         ...    
570      Zoroark
569        Zorua
40         Zubat
633     Zweilous
717      Zygarde
Name: Name, Length: 1010, dtype: object

In [43]:
pokemon.sort_values(ascending=True)

459    Abomasnow
62          Abra
358        Absol
616     Accelgor
680    Aegislash
         ...    
570      Zoroark
569        Zorua
40         Zubat
633     Zweilous
717      Zygarde
Name: Name, Length: 1010, dtype: object

In [44]:
pokemon.sort_values(ascending=False)

717      Zygarde
633     Zweilous
40         Zubat
569        Zorua
570      Zoroark
         ...    
680    Aegislash
616     Accelgor
358        Absol
62          Abra
459    Abomasnow
Name: Name, Length: 1010, dtype: object

In [45]:
pokemon.sort_values(ascending=False).tail()

680    Aegislash
616     Accelgor
358        Absol
62          Abra
459    Abomasnow
Name: Name, dtype: object
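
To see that `sort_values` leaves the original untouched, here is a minimal sketch on a made-up three-value Series. It also shows that each value keeps its original index label after sorting.

```python
import pandas as pd

# Made-up prices for illustration.
prices = pd.Series([3.5, 1.25, 2.0])

# sort_values returns a new Series; the original keeps its order.
descending = prices.sort_values(ascending=False)

print(descending.tolist())        # [3.5, 2.0, 1.25]
print(prices.tolist())            # [3.5, 1.25, 2.0]

# The index labels travel with their values.
print(descending.index.tolist())  # [0, 2, 1]
```

To keep the sorted result, assign it to a variable as done here with `descending`.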

## The sort_index Method
- The `sort_index` method sorts a **Series** by its index.
- The `sort_index` method also accepts an `ascending` parameter to set sort order.

In [46]:
pokemon = pd.read_csv("pokemon.csv", index_col="Name").squeeze("columns")
pokemon.head()


#Both the names and the types show up here because the Name column serves as the index and the remaining column, Type, supplies the values.

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [47]:
pokemon.sort_index()



#sort_index sorts ascending by default. With strings that means alphabetical ascending order. 

Name
Abomasnow        Grass, Ice
Abra                Psychic
Absol                  Dark
Accelgor                Bug
Aegislash      Steel, Ghost
                  ...      
Zoroark                Dark
Zorua                  Dark
Zubat        Poison, Flying
Zweilous       Dark, Dragon
Zygarde      Dragon, Ground
Name: Type, Length: 1010, dtype: object

In [48]:
pokemon.sort_index(ascending=False)

Name
Zygarde      Dragon, Ground
Zweilous       Dark, Dragon
Zubat        Poison, Flying
Zorua                  Dark
Zoroark                Dark
                  ...      
Aegislash      Steel, Ghost
Accelgor                Bug
Absol                  Dark
Abra                Psychic
Abomasnow        Grass, Ice
Name: Type, Length: 1010, dtype: object

## Extract Series Value by Index Position
- Use the `iloc` accessor to extract a **Series** value by its index position.
- `iloc` is short for "integer location".
- Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.

In [52]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
pokemon.head()


0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [53]:
pokemon.iloc[0]

'Bulbasaur'

In [54]:
pokemon.iloc[500]

'Oshawott'

In [55]:
pokemon.iloc[1500]
#doesn't exist, causes error.

IndexError: single positional indexer is out-of-bounds

In [56]:
pokemon.iloc[[100, 200, 300]]
#The inner brackets hold a list of index positions. iloc returns a new Series with those positions' index labels and corresponding values.

100    Electrode
200        Unown
300     Delcatty
Name: Name, dtype: object

In [57]:
pokemon.iloc[[100, 200, 300, 1500]]

#Every index position in the list must exist.

IndexError: positional indexers are out-of-bounds

In [58]:
pokemon.iloc[27:36]

#All of Python's slicing syntax works with iloc. Remember that the slice stops just before the number on the right, so position 36 is not included.

27    Sandslash
28     Nidoran♀
29     Nidorina
30    Nidoqueen
31     Nidoran♂
32     Nidorino
33     Nidoking
34     Clefairy
35     Clefable
Name: Name, dtype: object

In [59]:
pokemon.iloc[0:7]

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
5     Charizard
6      Squirtle
Name: Name, dtype: object

In [60]:
pokemon.iloc[:7]
pokemon.iloc[0:7]

#these are the same.

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
5     Charizard
6      Squirtle
Name: Name, dtype: object

In [63]:
pokemon.iloc[700:1010]
pokemon.iloc[700:]
pokemon.iloc[700:5000]
#Each of these runs to the end of the Series. iloc raises an error for a single position past the end, but a slice that runs past the end simply returns the values up to the end of the Series.


700         Hawlucha
701          Dedenne
702          Carbink
703            Goomy
704          Sliggoo
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 310, dtype: object

In [None]:
pokemon.iloc[-8:]

#this will give you the series values starting 8 from the end, to the end of the series. 


In [64]:
pokemon.iloc[-10:-5]

#This will go from 10 before the end to 5 before the end. 


1000        Wo-Chien
1001       Chien-Pao
1002         Ting-Lu
1003          Chi-Yu
1004    Roaring Moon
Name: Name, dtype: object

## Extract Series Value by Index Label
- Use the `loc` accessor to extract a **Series** value by its index label.
- Pass a list to extract multiple values by index label.
- If one index label/position in the list does not exist, Pandas will raise an error.

In [66]:
pokemon = pd.read_csv("pokemon.csv", index_col=["Name"]).squeeze("columns")
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [67]:
pokemon.loc["Bulbasaur"]

#Remember that accessors use brackets instead of parentheses.

'Grass, Poison'

In [68]:
pokemon.iloc[0]
#pandas still keeps track of each value's numeric position under the hood, so iloc works even when the index holds string labels.

'Grass, Poison'

In [70]:
pokemon["Bulbasaur"]
pokemon[0]

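#Don't use plain square brackets with an index position. When the index holds labels, pandas treats pokemon[0] as a position only as a deprecated fallback and emits a FutureWarning. Use iloc for positions.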


'Grass, Poison'

In [None]:
#iloc is for index positions, loc is for labels. 

In [72]:
pokemon.loc["Mewtwo"]

'Psychic'

In [73]:
pokemon.loc[["Mewtwo", "Jolteon", "Meowth"]]

#We're asking for three values with three index labels so pandas is going to package up these results into a new series. 

Name
Mewtwo      Psychic
Jolteon    Electric
Meowth       Normal
Name: Type, dtype: object
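
The cells above only look up labels that exist. As a small sketch of the failure case, the snippet below uses a made-up two-row Series and shows that a single missing label in the list makes `loc` raise a `KeyError`. The list comprehension at the end is one possible way to filter labels first, not the only one.

```python
import pandas as pd

# Made-up labels and values for illustration.
types = pd.Series(["Psychic", "Electric"], index=["Mewtwo", "Jolteon"])
wanted = ["Mewtwo", "Digimon"]

try:
    types.loc[wanted]
    outcome = "no error"
except KeyError:
    # One missing label is enough for loc to raise.
    outcome = "KeyError"

print(outcome)  # KeyError

# Keep only the labels that are present in the index, then look them up.
present = [label for label in wanted if label in types.index]
print(types.loc[present].tolist())  # ['Psychic']
```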

## The get Method on a Series
- The `get` method extracts a **Series** value by index label. It is an alternative option to square brackets.
- The `get` method's second argument sets the fallback value to return if the label/position does not exist.

In [75]:
pokemon = pd.read_csv("pokemon.csv", index_col=["Name"]).squeeze("columns")
pokemon.head()


Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [76]:
pokemon.get("Moltres")

'Fire, Flying'

In [None]:
pokemon.get("Moltres")
#get is like loc.

In [78]:
pokemon.loc["Moltres"]

'Fire, Flying'

In [None]:
#get's second parameter is the fallback value, and it defaults to None. None is Python's representation of nothing; when a cell's result is None, the notebook displays no output.

In [80]:
None

In [81]:
pokemon.get("Digimon", "Nonexistent") #The second argument is the fallback value that get returns when the label doesn't exist.

'Nonexistent'

In [85]:
pokemon.get(["Moltres", "Digimon"], "One of the values in the list was not found")

#You can pass a list of labels; if any one of them doesn't exist in the Series, get returns the fallback value.

'One of the values in the list was not found'

In [86]:
pokemon.get(0) #Passing an index position to get relies on the same deprecated fallback as pokemon[0] and emits a FutureWarning; don't use index positions with get either.


'Grass, Poison'

## Overwrite a Series Value
- Use the `loc/iloc` accessor to target an index label/position, then use an equal sign to provide a new value.

In [87]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
pokemon.head()


0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [89]:
pokemon.iloc[0] = "Borisaur"

#This is how you change a value in a series based on an index position. 

In [91]:
pokemon.head()

0      Borisaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [93]:
pokemon.iloc[[1,2,4]] = ["Firemon", "Flameon", "Blazemon"]

#This is how you change the values at several index positions at once.

In [94]:
pokemon.head()

0      Borisaur
1       Firemon
2       Flameon
3    Charmander
4      Blazemon
Name: Name, dtype: object

In [95]:
pokemon = pd.read_csv("pokemon.csv", index_col=["Name"]).squeeze("columns")
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [96]:
pokemon.loc["Bulbasaur"] = "Awesomeness"

In [97]:
pokemon.head()

Name
Bulbasaur       Awesomeness
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [98]:
pokemon.iloc[1] = "Silly"

In [99]:
pokemon.head()

Name
Bulbasaur       Awesomeness
Ivysaur               Silly
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [100]:
pokemon[2] = "Test"

#This still works for now, but assigning by position with plain square brackets is deprecated and emits a FutureWarning. Use loc or iloc instead.


In [101]:
pokemon.head()

Name
Bulbasaur     Awesomeness
Ivysaur             Silly
Venusaur             Test
Charmander           Fire
Charmeleon           Fire
Name: Type, dtype: object

## The copy Method
- A **copy** is a duplicate/replica of an object.
- Changes to a copy do not modify the original object.
- A **view** is a different way of looking at the *same* data.
- Changes to a view *do* modify the original object.
- The `copy` method creates a copy of a pandas object.
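
As a minimal sketch of the copy half of this list, the snippet below duplicates a small made-up Series and changes only the duplicate. View behaviour is left out on purpose, since it depends on the pandas version and its copy-on-write settings.

```python
import pandas as pd

original = pd.Series(["Chocolate", "Vanilla", "Strawberry"])

# copy() returns an independent duplicate of the Series.
duplicate = original.copy()
duplicate.iloc[0] = "Mint"

print(original.iloc[0])   # Chocolate
print(duplicate.iloc[0])  # Mint
```

Calling `copy` before an overwrite is a simple way to keep the original data intact.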

## Math Methods on Series Objects
- The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
- The `sum` method adds together the **Series's** values.
- The `product` method multiplies together the **Series's** values.
- The `mean` method calculates the average of the **Series's** values.
- The `std` method calculates the standard deviation of the **Series's** values.
- The `max` method returns the largest value in the **Series**.
- The `min` method returns the smallest value in the **Series**.
- The `median` method returns the median of the **Series** (the value in the middle).
- The `mode` method returns the mode of the **Series** (the most frequent value).
- The `describe` method returns a summary with various mathematical calculations.
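
The methods above can be checked by hand on a tiny Series. This sketch uses made-up numbers with one missing value, so the difference between `size` and `count` is visible.

```python
import pandas as pd

# Made-up values; None becomes a missing value (NaN).
numbers = pd.Series([2, 4, 4, None, 10])

print(numbers.size)      # 5, missing value included
print(numbers.count())   # 4, missing value excluded
print(numbers.sum())     # 20.0
print(numbers.product()) # 320.0
print(numbers.mean())    # 5.0
print(numbers.median())  # 4.0
print(numbers.max(), numbers.min())  # 10.0 2.0
print(numbers.mode().tolist())       # [4.0]

# describe bundles several of these statistics into one Series.
print(numbers.describe())
```

By default these methods skip the missing value, which is why the mean is 20 / 4 rather than 20 / 5.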

## Broadcasting
- **Broadcasting** describes the process of applying an arithmetic operation to an array (i.e., a **Series**).
- We can combine mathematical operators with a **Series** to apply the mathematical operation to every value.
- There are also methods to accomplish the same results (`add`, `sub`, `mul`, `div`, etc.)
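
A short sketch of broadcasting on a made-up Series of prices follows. It applies an operator to every value, then checks that the method forms give the same results.

```python
import pandas as pd

prices = pd.Series([2.0, 4.5, 1.25])

# An operator applies to every value and returns a new Series.
print((prices + 1).tolist())   # [3.0, 5.5, 2.25]
print((prices * 2).tolist())   # [4.0, 9.0, 2.5]

# The method forms give the same results as the operators.
print(prices.add(1).equals(prices + 1))  # True
print(prices.mul(2).equals(prices * 2))  # True
print(prices.sub(1).tolist())  # [1.0, 3.5, 0.25]
print(prices.div(2).tolist())  # [1.0, 2.25, 0.625]
```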

## The value_counts Method
- The `value_counts` method returns the number of times each unique value occurs in the **Series**.
- The `normalize` parameter returns the relative frequencies/percentages of the values instead of the counts.
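
Here is a minimal sketch of both forms on a made-up Series of six types. With `normalize=True` the counts become fractions that add up to 1.

```python
import pandas as pd

types = pd.Series(["Fire", "Water", "Fire", "Grass", "Fire", "Water"])

# Counts per unique value, most frequent first.
counts = types.value_counts()
print(counts["Fire"], counts["Water"], counts["Grass"])  # 3 2 1

# Relative frequencies instead of counts.
shares = types.value_counts(normalize=True)
print(shares["Fire"])  # 0.5
```

Multiplying `shares` by 100 turns the fractions into percentages.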

## The apply Method
- The `apply` method accepts a function. It invokes that function on every `Series` value.
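
As a small sketch, `apply` can take a built-in function such as `len` or one we write ourselves. The `shout` helper below is hypothetical and exists only for this example.

```python
import pandas as pd

names = pd.Series(["Bulbasaur", "Ivysaur", "Mew"])

# A built-in function works as the argument.
lengths = names.apply(len)
print(lengths.tolist())  # [9, 7, 3]


def shout(name):
    """Hypothetical helper: upper-case a name and add an exclamation mark."""
    return name.upper() + "!"


# So does a function we define ourselves. Pass the function itself, without parentheses.
shouted = names.apply(shout)
print(shouted.tolist())  # ['BULBASAUR!', 'IVYSAUR!', 'MEW!']
```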

## The map Method
- The `map` method "maps" or connects each **Series** value to another value.
- We can pass the method a dictionary or a **Series**. Both types connect keys to values.
- The `map` method uses our argument to connect or bridge together the values.
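
The sketch below maps a made-up Series of types to colours, first through a dictionary and then through a Series built from that dictionary. All names and data are illustrative.

```python
import pandas as pd

# Made-up data: each Pokémon's type, and a colour for each type.
pokemon_types = pd.Series(
    ["Fire", "Water", "Fire"],
    index=["Charmander", "Squirtle", "Vulpix"],
)
type_colours = {"Fire": "Red", "Water": "Blue"}

# Each value ("Fire", "Water") is looked up as a key in the dictionary.
by_dict = pokemon_types.map(type_colours)
print(by_dict.tolist())  # ['Red', 'Blue', 'Red']

# A Series works the same way: its index plays the role of the keys.
colour_series = pd.Series(type_colours)
by_series = pokemon_types.map(colour_series)
print(by_series.tolist())  # ['Red', 'Blue', 'Red']
```

The result keeps the index labels of the Series that `map` was called on.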