# Series

In [89]:
import pandas as pd

## Create a Series Object from a List
- A pandas **Series** is a one-dimensional labelled array.
- A Series combines the best features of a list and a dictionary.
- A Series maintains a single collection of ordered values (i.e. a single column of data).
- We can assign each value an identifier, which does not have to *be* unique.

In [90]:
ice_cream = ['chocolate', 'vanilla', 'strawberry', 'rum raisin']
pd.Series(ice_cream)

0     chocolate
1       vanilla
2    strawberry
3    rum raisin
dtype: object

In [91]:
lottery_numbers = [4, 8, 15, 16, 23, 44]
pd.Series(lottery_numbers)

0     4
1     8
2    15
3    16
4    23
5    44
dtype: int64

In [92]:
registration = [True, False, False, True]
pd.Series(registration)

0     True
1    False
2    False
3     True
dtype: bool

## Create a Series Object from a Dictionary

In [93]:
sushi = {
    'Salmon':'Orange',
    'Tuna':'Red',
    'Eel':'Brown'
}
pd.Series(sushi)

Salmon    Orange
Tuna         Red
Eel        Brown
dtype: object

In [94]:
# exercise:
# Create a list with 4 countries - United States, France, Germany, Italy
# Create a new Series by passing in the list of countries
# Assign the Series to a "countries" variable
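# Avoid naming the variable "list": that would shadow Python's built-in list() function
countries_list = ['United States', 'France', 'Germany', 'Italy']
countries = pd.Series(countries_list)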
# Create a list with 3 colors - red, green, blue
# Create a new Series by passing in the list of colors
# Assign the Series to a "colors" variable
colors_list = ['red', 'green', 'blue']
colors = pd.Series(colors_list)

# Given the "recipe" dictionary below,
# create a new Series by passing in the dictionary as the data source
# Assign the resulting Series to a "series_dict" variable
recipe = {
  "Flour": True,
  "Sugar": True,
  "Salt": False
}
series_dict = pd.Series(recipe)

## Intro to Series Methods
- The syntax to invoke a method on any object is `object.method()`.
- The `sum` method adds together the **Series'** values.
- The `product` method multiplies the **Series'** values.
- The `mean` method finds the average of the **Series'** values.
- The `std` method finds the standard deviation of the **Series'** values.

In [95]:
prices = pd.Series([2.99, 4.45, 1.36])
prices

0    2.99
1    4.45
2    1.36
dtype: float64

In [96]:
prices.sum()

8.8

In [97]:
prices.product() 
# multiplying all numbers together

18.095480000000006

In [98]:
prices.mean()

2.9333333333333336

In [99]:
prices.std()
# sample standard deviation: how spread out the values are around the mean

1.5457791994115246

## Intro to Attributes
- An **attribute** is a piece of data that lives on an object.
- An **attribute** is a fact, a detail, a characteristic of the object.
- Access an attribute with `object.attribute` syntax.
- The `size` attribute returns a count of the number of values in the **Series**.
- The `is_unique` attribute returns True if the **Series** has no duplicate values.
- The `values` and `index` attributes return the underlying objects that hold the **Series'** values and index labels.

In [100]:
adjectives = pd.Series(['smart', 'handsome', 'brilliant', 'humble', 'smart'])
adjectives

0        smart
1     handsome
2    brilliant
3       humble
4        smart
dtype: object

In [101]:
adjectives.size

5

In [102]:
adjectives.is_unique

False

In [103]:
adjectives.values

array(['smart', 'handsome', 'brilliant', 'humble', 'smart'], dtype=object)

In [104]:
adjectives.index

RangeIndex(start=0, stop=5, step=1)

In [105]:
type(adjectives.values)

numpy.ndarray

In [106]:
# exercise:
# The Series below stores the number of home runs
# that a baseball player hit per game
home_runs = pd.Series([3, 4, 8, 2])
home_runs
# Find the total number of home runs (i.e. the sum) and assign it
# to the total_home_runs variable below
total_home_runs = home_runs.sum()
total_home_runs
# Find the average number of home runs and assign it
# to the average_home_runs variable below
average_home_runs = home_runs.mean()
average_home_runs

4.25

## Parameters and Arguments
- A **parameter** is the name for an expected input to a function/method/class instantiation.
- An **argument** is the concrete value we provide for a parameter during invocation.
- We can pass arguments either sequentially (based on parameter order) or with explicit parameter names written out.
- The first two parameters for the **Series** constructor are `data` and `index`, which represent the values and the index labels.

In [107]:
fruits = ['Apple','Orange', 'Plum', 'Grape', 'Blueberry', 'Watermelon']
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Monday']

In [108]:
pd.Series(fruits)
pd.Series(fruits, weekdays)
pd.Series(weekdays, fruits)

Apple            Monday
Orange          Tuesday
Plum          Wednesday
Grape          Thursday
Blueberry        Friday
Watermelon       Monday
dtype: object

In [109]:
pd.Series(data=fruits, index=weekdays)
# this way provides context and is more explicit. We're no longer dependent on the exact order defined by pandas
pd.Series(index=weekdays, data=fruits)

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Friday        Blueberry
Monday       Watermelon
dtype: object

In [110]:
# Positional and keyword arguments can also be mixed, as long as the positional ones come first.
pd.Series(fruits, index=weekdays)

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Friday        Blueberry
Monday       Watermelon
dtype: object

In [111]:
# exercise:
# The code below defines a list of delicious foods
# and some dipping sauces to dip them in

foods = ["French Fries", "Chicken Nuggets", "Celery", "Carrots"]
dipping_sauces = ["BBQ", "Honey Mustard", "Ranch", "Sriracha"]

# Create a Series and assign it to the s1 variable below. 
# Assign the foods list as the data source
# and the dipping_sauces list as the Series index 
# For this solution, use positional arguments (i.e. feed in the arguments sequentially)
s1 = pd.Series(foods, dipping_sauces)


# Create a Series and assign it to the s2 variable below. 
# Assign the dipping_sauces list as the data source
# and the foods list as the Series index 
# For this solution, use keyword arguments (i.e. provide the parameter names
# alongside the arguments)
s2 = pd.Series(data=dipping_sauces, index=foods)

## Import Series with the pd.read_csv Function
- A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
- Pandas ships with many different `read_` functions for different types of files.
- The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
- The `read_csv` function will import the dataset as a **DataFrame**, a 2-dimensional table.
- The `usecols` parameter accepts a list of the column(s) to import.
- The `squeeze` method converts a single-column **DataFrame** to a **Series**.

In [112]:
pd.read_csv('pokemon.csv')
# read_csv always returns a DataFrame, a two-dimensional (rows and columns) data structure.
# Sometimes a one-dimensional data structure (a Series) is more useful.
# This is how we can get one:

Unnamed: 0,Name,Type
0,Bulbasaur,"Grass, Poison"
1,Ivysaur,"Grass, Poison"
2,Venusaur,"Grass, Poison"
3,Charmander,Fire
4,Charmeleon,Fire
...,...,...
1005,Iron Valiant,"Fairy, Fighting"
1006,Koraidon,"Fighting, Dragon"
1007,Miraidon,"Electric, Dragon"
1008,Walking Wake,"Water, Dragon"


In [113]:
pokemon = pd.read_csv('pokemon.csv', usecols=['Name']).squeeze('columns')
pokemon

0          Bulbasaur
1            Ivysaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object

In [114]:
google = pd.read_csv('google_stock_price.csv', usecols=['Price']).squeeze('columns')
google

0         2.490664
1         2.515820
2         2.758411
3         2.770615
4         2.614201
           ...    
4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, Length: 4793, dtype: float64
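
The import pattern used in this section can also be tried without any file on disk. The sketch below is an illustration under assumptions: the CSV text is a made-up two-row sample, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file such as pokemon.csv
csv_text = "Name,Type\nBulbasaur,Grass\nCharmander,Fire\n"

# Without usecols, read_csv returns a DataFrame with every column
df = pd.read_csv(io.StringIO(csv_text))

# usecols keeps one column; squeeze('columns') turns that
# single-column DataFrame into a Series
names = pd.read_csv(io.StringIO(csv_text), usecols=['Name']).squeeze('columns')

print(type(df).__name__)
print(type(names).__name__)
print(names)
```

The resulting Series keeps the column header as its `name` attribute.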

In [115]:
# exercise:
# We have a foods.csv CSV file with 3 columns: Item Number, Menu Item, Price
# You can explore the data by clicking into the foods.csv file on the left
# Import the CSV file into a pandas Series object
# The Series should have the standard pandas numeric index
# The Series values should be the string values from the "Menu Item" column
# Assign the Series object to a "foods" variable
foods = pd.read_csv('foods.csv', usecols=['Menu Item']).squeeze('columns')

ValueError: Usecols do not match columns, columns expected but not found: ['Menu Item']

## The head and tail Methods
- The `head` method returns a number of rows from the top/beginning of the `Series`.
- The `tail` method returns a number of rows from the bottom/end of the `Series`.

In [None]:
pokemon = pd.read_csv('pokemon.csv', usecols=['Name']).squeeze('columns')
google = pd.read_csv('google_stock_price.csv', usecols=['Price']).squeeze('columns')


In [None]:
pokemon.head()
pokemon.head(5)
pokemon.head(n=5)
# all three calls return the same result

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [None]:
google.tail()
google.tail(5)
google.tail(n=5)

In [None]:
# exercise:
# We have a roller_coasters.csv CSV file with 4 columns: Name, Park, Country, and Height.
# You can explore the data by clicking into the CSV file on the left
# Import the CSV file into a pandas Series object
# The Series should have the standard pandas numeric index
# The Series values should be the string values from the "Name" column
# Assign the Series object to a "coasters" variable
coasters = pd.read_csv('roller_coasters.csv', usecols=['Name']).squeeze('columns')

# I only want to ride the top 3 roller coasters on the list.
# Starting with the "coasters" Series, extract the first 3 rows in a new Series.
# Assign the new Series to a "top_three" variable.
top_three = coasters.head(3)

# I'm now curious about some of the last entries on the coaster list.
# Starting with the "coasters" Series, extract the last 4 rows in a new Series.
# Assign the new Series to a "bottom_four" variable.
bottom_four = coasters.tail(4)


## Passing Series to Python's Built-In Functions
- The `len` function returns the length of the **Series**.
- The `type` function returns the type of an object.
- The `list` function converts the **Series** to a list.
- The `dict` function converts the **Series** to a dictionary.
- The `sorted` function converts the **Series** to a sorted list.
- The `max` function returns the largest value in the **Series**.
- The `min` function returns the smallest value in the **Series**.

In [None]:
pokemon = pd.read_csv('pokemon.csv', usecols=['Name']).squeeze('columns')
google = pd.read_csv('google_stock_price.csv', usecols=['Price']).squeeze('columns')


In [None]:
len(pokemon)
# returns the number of values in the Series
type(pokemon)
list(pokemon)
sorted(pokemon)
type(sorted(pokemon))
dict(pokemon)

list

In [None]:
max(google)
min(google)

2.47049

In [None]:
max(pokemon)
min(pokemon)

'Abomasnow'

## Check for Inclusion with Python's in Keyword
- The `in` keyword checks if a value exists within an object.
- The `in` keyword will look for a value in the **Series's** index.
- Use the `index` and `values` attributes to access "nested" objects within the **Series**.
- Combine the `in` keyword with `values` to search within the **Series's** values.

In [None]:
pokemon = pd.read_csv('pokemon.csv', usecols=['Name']).squeeze('columns')
google = pd.read_csv('google_stock_price.csv', usecols=['Price']).squeeze('columns')
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [None]:
'car' in 'racecar'

True

In [None]:
'Bulbasaur' in pokemon
# returns False because the `in` keyword checks only the index labels of the Series

'Bulbasaur' in pokemon.values
# this will search the actual values

True

In [None]:
# exercise:
# This challenge includes a coffee.csv with 2 columns: 
# Coffee and Calories. Import the CSV. Assign the Coffee
# column to be the index and the Calories column to be the
# Series' values. Assign the Series to a 'coffee' variable.
coffee = pd.read_csv('coffee.csv', index_col='Coffee').squeeze('columns')

# Check whether the coffee 'Flat White' is present in the data.
# Assign the result to a `flat_white` variable
flat_white = 'Flat White' in coffee.index

# Check whether the coffee 'Cortado' is present in the data.
# Assign the result to a `cortado` variable
cortado = 'Cortado' in coffee.index

# Check whether the coffee 'Blackberry Mocha' is present in the data.
# Assign the result to a `blackberry_mocha` variable
blackberry_mocha = 'Blackberry Mocha' in coffee.index

# Check whether the value 221 is present in the data.
# Assign the result to a 'high_calorie' variable.
high_calorie = 221 in coffee.values

# Check whether the value 400 is present in the data.
# Assign the result to a 'super_high_calorie' variable.
super_high_calorie = 400 in coffee.values


## The sort_values Method
- The `sort_values` method sorts the values of a **Series** in order.
- By default, pandas applies an ascending sort (smallest to largest).
- Customize the sort order with the `ascending` parameter.

In [None]:
pokemon = pd.read_csv('pokemon.csv', usecols=['Name']).squeeze('columns')
google = pd.read_csv('google_stock_price.csv', usecols=['Price']).squeeze('columns')
google.head()

0    2.490664
1    2.515820
2    2.758411
3    2.770615
4    2.614201
Name: Price, dtype: float64

In [None]:
google.sort_values()
# ascending order is default

10        2.470490
0         2.490664
13        2.509095
11        2.514326
1         2.515820
           ...    
4341    150.000000
4336    150.000000
4346    150.141754
4345    151.000000
4395    151.863495
Name: Price, Length: 4793, dtype: float64

In [None]:
google.sort_values(ascending=False)
google.sort_values(ascending=False).head()

4395    151.863495
4345    151.000000
4346    150.141754
4341    150.000000
4336    150.000000
Name: Price, dtype: float64

In [None]:
pokemon.sort_values()
# ascending = alphabetical order by default
pokemon.sort_values(ascending=False).tail()

680    Aegislash
616     Accelgor
358        Absol
62          Abra
459    Abomasnow
Name: Name, dtype: object

In [None]:
# exercise:
# Below, we have a list of delicious tortilla chip flavors
flavors = ["Spicy Sweet Chili", "Cool Ranch", "Nacho Cheese", "Salsa Verde"]

# Create a new Series object, passing in the flavors list defined above
# Assign it to a 'doritos' variable. The resulting Series should look like this:
doritos = pd.Series(flavors)
#
#   0    Spicy Sweet Chili
#   1           Cool Ranch
#   2         Nacho Cheese
#   3          Salsa Verde
#   dtype: object



# Below, sort the doritos Series in descending order.
# Assign the sorted Series to a 'sorted_doritos' variable.
# The sorted Series should look like this:
#
#   0    Spicy Sweet Chili
#   3          Salsa Verde
#   2         Nacho Cheese
#   1           Cool Ranch
#   dtype: object
sorted_doritos = doritos.sort_values(ascending=False)

## The sort_index Method
- The `sort_index` method sorts a **Series** by its index.
- The `sort_index` method also accepts an `ascending` parameter to set sort order.

In [116]:
pokemon = pd.read_csv('pokemon.csv', index_col='Name').squeeze('columns')
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [117]:
pokemon.sort_index()

Name
Abomasnow        Grass, Ice
Abra                Psychic
Absol                  Dark
Accelgor                Bug
Aegislash      Steel, Ghost
                  ...      
Zoroark                Dark
Zorua                  Dark
Zubat        Poison, Flying
Zweilous       Dark, Dragon
Zygarde      Dragon, Ground
Name: Type, Length: 1010, dtype: object
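
The `ascending` parameter of `sort_index` can be tried on a small Series. This is a minimal sketch; the `calories` data is hypothetical and only serves to make the ordering easy to check by eye.

```python
import pandas as pd

# Hypothetical data: drink names as index labels, calories as values
calories = pd.Series({'Latte': 190, 'Americano': 15, 'Mocha': 290})

a_to_z = calories.sort_index()                 # default: ascending (A to Z)
z_to_a = calories.sort_index(ascending=False)  # descending (Z to A)

print(a_to_z)
print(z_to_a)
```

Both calls return a new Series; `calories` itself keeps its original order.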

In [None]:
# exercise:
# Below, we have a list of delicious drink flavors
# We create a sorted Series of strings and assign it to a 'gatorade' variable
flavors = ["Red", "Blue", "Green", "Orange"]
gatorade = pd.Series(flavors).sort_values()

# I'd like to return the Series to its original order 
# (sorted by the numeric index in ascending order). 
# Sort the gatorade Series by index.
# Assign the result to an 'original' variable.
original = gatorade.sort_index()

## Extract Series Value by Index Position
- Use the `iloc` accessor to extract a **Series** value by its index position.
- `iloc` is short for "index location".
- Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.

In [118]:
pokemon = pd.read_csv('pokemon.csv', usecols=['Name']).squeeze('columns')
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [119]:
pokemon.iloc[0]
# returns the value at the first index position

'Bulbasaur'

In [122]:
pokemon.iloc[[200,300,400]]
# returns the results in a new Series

pokemon.iloc[28:36]
# slicing works as usual

pokemon.iloc[-1]

'Iron Leaves'

## Extract Series Value by Index Label
- Use the `loc` accessor to extract a **Series** value by its index label.
- Pass a list to extract multiple values by index label.
- If one index label/position in the list does not exist, Pandas will raise an error.

In [123]:
pokemon = pd.read_csv('pokemon.csv', index_col='Name').squeeze('columns')
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [126]:
pokemon.loc['Bulbasaur']
# returns the same as
pokemon.iloc[0]

'Grass, Poison'
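
Passing a list to `loc`, and the error raised for a missing label, can be sketched on a small Series. The three-entry `types` Series below is an illustrative stand-in for the full dataset, and `missing_raised` is a name introduced only for this example.

```python
import pandas as pd

# Small stand-in for the Pokemon data: names as index labels, types as values
types = pd.Series({
    'Bulbasaur': 'Grass, Poison',
    'Charmander': 'Fire',
    'Squirtle': 'Water',
})

# A list of labels returns a new Series, in the order requested
pair = types.loc[['Charmander', 'Bulbasaur']]
print(pair)

# If any label in the list is missing, loc raises a KeyError
try:
    types.loc[['Charmander', 'Digimon']]
    missing_raised = False
except KeyError:
    missing_raised = True

print(missing_raised)
```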

In [None]:
# exercise:
# I have a dictionary that maps guitar types to their colors
guitars_dict = {
    "Fender Telecaster": "Baby Blue",
    "Gibson Les Paul": "Sunburst",
    "ESP Eclipse": "Dark Green"
}

# Create a new Series object, passing in the guitars_dict dictionary as the data source.
# Assign the resulting Series to a "guitars" variable.
guitars = pd.Series(guitars_dict)

# Access the value for the index position of 0 within the "guitars" Series.
# Assign the value to a "fender_color" variable.
fender_color = guitars.iloc[0]

# Access the value for the index label of "Gibson Les Paul" in the "guitars" Series.
# Assign the value to a "gibson_color" variable.
gibson_color = guitars.loc['Gibson Les Paul']

# Access the value for the index label of "ESP Eclipse" in the "guitars" Series.
# Assign the value to a "esp_color" variable.
esp_color = guitars.loc['ESP Eclipse']

## The get Method on a Series
- The `get` method extracts a **Series** value by index label. It is an alternative option to square brackets.
- The `get` method's second argument sets the fallback value to return if the label/position does not exist.

In [None]:
pokemon = pd.read_csv('pokemon.csv', index_col='Name').squeeze('columns')
pokemon.head()


In [129]:
pokemon.get('Moltres')
# when not sure if the index exists
pokemon.get('Digimon')
# will not give me an error and I can provide a fallback value:
pokemon.get('Digimon', 'Nonexistent')

'Nonexistent'

In [130]:
pokemon.get(['Moltres', 'Digimon'], 'One of the values in the list was not found.')

# The get method is a more resilient approach than .loc because it can handle an index label that does not exist.
# This is very useful for user input.
# Use it with index labels only, not with index positions.


'One of the values in the list was not found.'

## Overwrite a Series Value
- Use the `loc/iloc` accessor to target an index label/position, then use an equal sign to provide a new value.

In [135]:
pokemon = pd.read_csv('pokemon.csv', usecols=['Name']).squeeze('columns')


In [136]:
pokemon.iloc[0] = 'Borisaur'
pokemon.head()

0      Borisaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [137]:
pokemon.iloc[[1,2,4]] = ['Firemon', 'Flamemon', 'Blazemon']
pokemon.head()

0      Borisaur
1       Firemon
2      Flamemon
3    Charmander
4      Blazemon
Name: Name, dtype: object

In [139]:
pokemon = pd.read_csv('pokemon.csv', index_col='Name').squeeze('columns')
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [140]:
pokemon.loc['Bulbasaur'] = 'Awesomeness'
pokemon.head()

Name
Bulbasaur       Awesomeness
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

## The copy Method
- A **copy** is a duplicate/replica of an object.
- Changes to a copy do not modify the original object.
- A **view** is a different way of looking at the *same* data.
- Changes to a view *do* modify the original object.
- The `copy` method creates a copy of a pandas object.

In [142]:
pokemon_df = pd.read_csv('pokemon.csv', usecols=['Name'])
pokemon_series = pokemon_df.squeeze('columns')

In [143]:
pokemon_series.iloc[0] = 'Whatever'
pokemon_series.head()

0      Whatever
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [144]:
pokemon_df.head()
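# pokemon_series is a view of pokemon_df, so the change shows up here too
# (this depends on the pandas version; with Copy-on-Write enabled, the DataFrame would stay unchanged)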

Unnamed: 0,Name
0,Whatever
1,Ivysaur
2,Venusaur
3,Charmander
4,Charmeleon


In [145]:
# when we don't want to change the data source, we can create a copy:
pokemon_df = pd.read_csv('pokemon.csv', usecols=['Name'])
pokemon_series = pokemon_df.squeeze('columns').copy()

In [147]:
pokemon_series.iloc[0] = 'Whatever'
pokemon_df.head()


Unnamed: 0,Name
0,Bulbasaur
1,Ivysaur
2,Venusaur
3,Charmander
4,Charmeleon


## Math Methods on Series Objects
- The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
- The `sum` method adds together the **Series's** values.
- The `product` method multiplies together the **Series's** values.
- The `mean` method calculates the average of the **Series's** values.
- The `std` method calculates the standard deviation of the **Series's** values.
- The `max` method returns the largest value in the **Series**.
- The `min` method returns the smallest value in the **Series**.
- The `median` method returns the median of the **Series** (the value in the middle).
- The `mode` method returns the mode of the **Series** (the most frequent value).
- The `describe` method returns a summary with various mathematical calculations.
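
The methods listed above can be tried on a small Series. This is a minimal sketch with made-up numbers; the `readings` Series and the variable names are assumptions chosen for illustration, and one value is deliberately missing to show how `count` and `size` differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five slots, one of them missing (NaN)
readings = pd.Series([10.0, 20.0, 20.0, np.nan, 50.0])

n_values = readings.count()    # missing values are excluded
n_slots = readings.size        # missing values are included
total = readings.sum()         # NaN is skipped by default
middle = readings.median()     # middle of the non-missing values
most_common = readings.mode()  # returned as a Series, since ties are possible
summary = readings.describe()  # count, mean, std, min, quartiles, max

print(n_values, n_slots, total, middle)
print(most_common)
print(summary)
```

Note that `mode` returns a Series rather than a single value, because several values can share the highest frequency.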

## Broadcasting
- **Broadcasting** describes the process of applying an arithmetic operation to an array (i.e., a **Series**).
- We can combine mathematical operators with a **Series** to apply the mathematical operation to every value.
- There are also methods to accomplish the same results (`add`, `sub`, `mul`, `div`, etc.)
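
As a small illustration of broadcasting, the sketch below applies the same operation to every value, once with an operator and once with the equivalent method. The `prices` values are hypothetical.

```python
import pandas as pd

prices = pd.Series([2.0, 4.5, 1.5])  # hypothetical prices

plus_one = prices + 1             # operator form: adds 1 to every value
plus_one_method = prices.add(1)   # method form, same result
doubled = prices * 2
halved = prices.div(2)            # same as prices / 2

print(plus_one)
print(doubled)
print(halved)
```

Each operation returns a new Series; `prices` is left unchanged.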

## The value_counts Method
- The `value_counts` method returns the number of times each unique value occurs in the **Series**.
- The `normalize` parameter returns the relative frequencies/percentages of the values instead of the counts.
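
A short sketch of `value_counts` with and without `normalize`, using a made-up Series of type names:

```python
import pandas as pd

# Hypothetical data with repeated values
types = pd.Series(['Fire', 'Water', 'Fire', 'Grass', 'Fire', 'Water'])

counts = types.value_counts()                # absolute counts, most frequent first
shares = types.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts)
print(shares)
```

Multiplying `shares` by 100 gives percentages.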

## The apply Method
- The `apply` method accepts a function. It invokes that function on every `Series` value.
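
The sketch below passes both a built-in function and a custom one to `apply`. The `shout` function is a hypothetical helper written only for this example.

```python
import pandas as pd

names = pd.Series(['Bulbasaur', 'Ivysaur', 'Venusaur'])

# Pass the function itself (no parentheses); pandas calls it once per value
lengths = names.apply(len)


def shout(name):
    """Hypothetical helper: upper-case a name and add an exclamation mark."""
    return name.upper() + '!'


shouted = names.apply(shout)

print(lengths)
print(shouted)
```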

## The map Method
- The `map` method "maps" or connects each **Series** value to another value.
- We can pass the method a dictionary or a **Series**. Both types connect keys to values.
- The `map` method uses our argument to connect or bridge together the values.
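
A minimal sketch of `map` with a dictionary and with a Series as the lookup. The `attack_powers` mapping is invented for illustration; values without a matching key come back as missing (NaN).

```python
import pandas as pd

pokemon_types = pd.Series(['Fire', 'Water', 'Fire', 'Ghost'])

# Hypothetical lookup table: type -> attack power
attack_powers = {'Fire': 'High', 'Water': 'Medium'}

# Each value is looked up as a key in the dictionary
mapped_from_dict = pokemon_types.map(attack_powers)

# A Series works the same way: its index labels act as the keys
lookup = pd.Series(attack_powers)
mapped_from_series = pokemon_types.map(lookup)

print(mapped_from_dict)
print(mapped_from_series)
```

'Ghost' has no entry in the lookup, so its mapped value is missing in both results.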