# Selecting Subsets of Data from a Series

## Using Dot Notation to Select a Column as a Series
Previously, we learned how to use *just the brackets* to select a single column as a Series. Another common way to do this uses dot notation: write the name of your DataFrame, then a dot, then the column name. Let's begin by reading in the movie dataset and setting the index as the title.

In [None]:
import pandas as pd
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head(3)

Instead of using *just the brackets* to select a single column, you can use dot notation. Let's select the `year` column in such a manner.

In [None]:
movie.year.head(3)

### I don't recommend doing this
Although this is valid pandas syntax, I don't recommend this notation for the following reasons:

* You cannot select columns whose names contain spaces
* You cannot select columns that have the same name as a pandas method, such as `count`
* You cannot select a column through a variable that holds the column's name
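
The three limitations above can be seen in a small sketch. The DataFrame here is made-up toy data (not the movie dataset), and the column names `imdb score` and `count` are chosen only to trigger each problem.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not the movie dataset)
df = pd.DataFrame(
    {'imdb score': [7.9, 8.8], 'count': [10, 20], 'year': [2009, 1994]},
    index=['Avatar', 'Forrest Gump'],
)

# 1. A column name with a space: `df.imdb score` is a SyntaxError,
#    but the brackets work
score = df['imdb score']

# 2. A column named like a method: `df.count` is the DataFrame.count
#    method, not the column
dot_gives_method = callable(df.count)
count_col = df['count']

# 3. A variable holding the column name: `df.col` looks for a column
#    literally named 'col' and fails
col = 'year'
try:
    df.col
    dot_found_col = True
except AttributeError:
    dot_found_col = False
year = df[col]
```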

Using *just the brackets* **always** works, so I recommend doing the following instead:

In [None]:
movie['year'].head(3)

### Why even know about this?
Different people write pandas code in different ways, and you will definitely see this syntax around, so it's important to be aware of it.

## Selecting Subsets of Data from a Series
Selecting subsets of data from a Series is very similar to selecting them from a DataFrame. Since there are no columns in a Series, there isn't a need to use *just the brackets*. Instead, you can do all of your subset selection with `loc` and `iloc`. Let's select the `imdb_score` column as a Series and output the head.

In [None]:
imdb = movie['imdb_score']
imdb.head(3)

### Selection with a scalar, a list, and a slice
Just like with a DataFrame, both `loc` and `iloc` accept a single scalar, a list, or a slice. The `loc` indexer also accepts a boolean Series/array, which will be covered in a later chapter. Let's select the IMDB score for 'Forrest Gump'. Since we are selecting a single label, only the value is returned.

In [None]:
imdb.loc['Forrest Gump']

Select the scores for both 'Forrest Gump' and 'Avatar' with a list. Notice that a Series is returned.

In [None]:
locs = ['Forrest Gump', 'Avatar']
imdb.loc[locs]

Select every 100th movie from 'Avatar' through 'Forrest Gump' with slice notation. Unlike an integer slice, a label slice with `loc` treats the stop label as part of the range:

In [None]:
imdb.loc['Avatar':'Forrest Gump':100]
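
Since the movie file may not be at hand, here is a minimal, self-contained sketch of the same three `loc` selections. The five-row Series is made-up toy data, and the step of 2 simply stands in for the step of 100 used above.

```python
import pandas as pd

# Hypothetical toy Series standing in for the imdb_score column
scores = pd.Series(
    [7.9, 7.1, 6.8, 8.5, 8.8],
    index=['Avatar', 'Spectre', 'Tangled', 'Inception', 'Forrest Gump'],
    name='imdb_score',
)

# Scalar label: a single value is returned
one = scores.loc['Forrest Gump']

# List of labels: a Series is returned, in the order of the list
two = scores.loc[['Forrest Gump', 'Avatar']]

# Label slice with a step: both endpoints are part of the range
every_other = scores.loc['Avatar':'Forrest Gump':2]
```

Here `every_other` holds 'Avatar', 'Tangled', and 'Forrest Gump', showing that the stop label is kept when the step lands on it.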

### Repeat with `iloc`
The `iloc` indexer works analogously to `loc` on a Series but only uses integer location. Let's make selections with a single integer, a list of integers, and a slice of integers. We'll begin by selecting the score of the 21st movie (integer location 20).

In [None]:
imdb.iloc[20]

In this example, we select three scores with a list.

In [None]:
ilocs = [10, 20, 30]
imdb.iloc[ilocs]

Here is an example that uses slice notation.

In [None]:
imdb.iloc[3000:3050:10]
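
The sketch below repeats the three `iloc` selections on a small made-up Series (toy data, not the movie dataset) and contrasts an integer slice with a label slice: `iloc` excludes the stop position, whereas `loc` includes the stop label.

```python
import pandas as pd

# Hypothetical toy Series standing in for the imdb_score column
scores = pd.Series(
    [7.9, 7.1, 6.8, 8.5, 8.8],
    index=['Avatar', 'Spectre', 'Tangled', 'Inception', 'Forrest Gump'],
    name='imdb_score',
)

# Single integer: the value at position 1
second = scores.iloc[1]

# List of integers: a Series with the rows at those positions
picked = scores.iloc[[0, 2, 4]]

# Integer slice: the stop position 4 is excluded, so positions 0 and 2
by_position = scores.iloc[0:4:2]

# Label slice over the same span: the stop label is included
by_label = scores.loc['Avatar':'Forrest Gump':2]
```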

### Trouble with *just the brackets*
It is possible to use *just the brackets* to make the same selections as above. See the following examples:

In [None]:
imdb['Forrest Gump']

In [None]:
imdb['Avatar':'Forrest Gump':100]

In [None]:
ilocs = [10, 20, 30]
imdb[ilocs]

In [None]:
imdb[3000:3050:10]

### Can you spot the problem?
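The major issue is that using *just the brackets* is **ambiguous** and **not explicit**. We don't know if we are selecting by label or by integer location. Recent pandas versions also warn that treating integer keys as positions with *just the brackets* is deprecated, so a selection like `imdb[ilocs]` may stop working in the future. With `loc` and `iloc`, it is clear what our intentions are. I suggest using `loc` and `iloc` for clarity.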
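
A minimal sketch of the ambiguity, using a made-up Series whose integer labels deliberately differ from their positions. With an integer index, *just the brackets* looks up the label, which is easy to misread as a position.

```python
import pandas as pd

# Hypothetical Series: the label 0 sits at integer location 1
s = pd.Series(['a', 'b', 'c'], index=[2, 0, 1])

bracket = s[0]           # label lookup here, not "the first element"
by_label = s.loc[0]      # explicit: the row labeled 0
by_position = s.iloc[0]  # explicit: the first row
```

Here `bracket` and `by_label` are both `'b'`, while `by_position` is `'a'`.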

## Comparison to Python Lists and Dictionaries
It may be helpful to compare pandas' ability to make selections by label and integer location to that of Python lists and dictionaries. Python lists allow for selection of data only through **integer location**. You can use a single integer or slice notation to make the selection but NOT a list of integers. Let's see examples of subset selection of lists using integers:

In [None]:
a_list = [10, 5, 3, 89, 20, 44, 37]

In [None]:
a_list[4]

In [None]:
a_list[-3:]
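
To see the restriction on lists of integers in action, the sketch below attempts such a selection and then shows one common workaround, a list comprehension. The variable names are illustrative only.

```python
a_list = [10, 5, 3, 89, 20, 44, 37]
positions = [1, 3]

# Indexing a list with a list of integers raises a TypeError
try:
    a_list[positions]
    list_index_allowed = True
except TypeError:
    list_index_allowed = False

# Workaround: select one position at a time
picked = [a_list[i] for i in positions]
```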

### Selection by label with Python dictionaries
Every value in a dictionary is labeled by a key. We use this key to make single selections. Dictionaries only allow selection with a single label. Slices and lists of labels are not allowed.

In [None]:
d = {'a':1, 'b':2, 't':20, 'z':26, 'A':27}
d['a']

In [None]:
d['A']
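
The following sketch tries a slice and a list of labels on the same dictionary. Both fail, although the exact exception for the slice depends on the Python version, so the code catches either `TypeError` or `KeyError`. The dictionary comprehension at the end is one possible workaround, not the only one.

```python
d = {'a': 1, 'b': 2, 't': 20, 'z': 26, 'A': 27}

# A slice of labels is not a valid dictionary lookup
try:
    d['a':'t']
    slice_allowed = True
except (TypeError, KeyError):
    slice_allowed = False

# A list of labels fails because lists are unhashable
try:
    d[['a', 'b']]
    list_allowed = True
except TypeError:
    list_allowed = False

# Workaround: look up each label individually
subset = {key: d[key] for key in ['a', 'b']}
```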

### pandas has the power of lists and dictionaries
DataFrames and Series are able to make selections with integers like a list and with labels like a dictionary.

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">Read in the bikes dataset. We will be using it for the next few questions. Select the wind speed column as a Series, assign it to a variable, and output the head. What kind of index does this Series have?</span>

### Exercise 2
<span  style="color:green; font-size:16px">From the wind speed Series, select the integer locations 4 through, but not including, 10.</span>

### Exercise 3
<span  style="color:green; font-size:16px">Copy and paste your answer to Exercise 2 below but use `loc` instead. Do you get the same result? Why not?</span>

### Exercise 4
<span  style="color:green; font-size:16px">Read in the movie dataset and set the index to be the title. Select `actor1` as a Series. Who is the `actor1` for 'My Big Fat Greek Wedding'?</span>

### Exercise 5
<span  style="color:green; font-size:16px">Find `actor1` for your two favorite movies.</span>

### Exercise 6
<span  style="color:green; font-size:16px">Select the last 3 values from `actor1` in two different ways.</span>