In [1]:
import pandas as pd

## How to select a pandas Series from a DataFrame?

- DataFrames are two-dimensional: they have rows and columns.
- Series are one-dimensional: a single column (or a single row) of a DataFrame is a Series.
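
As a quick illustration of the difference, the sketch below builds a tiny, made-up DataFrame (hypothetical data, not the UFO dataset used in the rest of this notebook) and checks the number of dimensions with the `ndim` attribute.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate dimensions
df = pd.DataFrame({"City": ["Ithaca", "Holyoke"], "State": ["NY", "CO"]})

column = df["City"]  # a single column is a Series
row = df.iloc[0]     # a single row, selected by position, is also a Series

# ndim reports the number of dimensions of each object
print(df.ndim, column.ndim, row.ndim)
```

Both the column and the row come back as `pandas.core.series.Series`, which is why a Series can represent either one.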

In [2]:
# read a dataset of UFO reports into a DataFrame
# 'read_table()' assumes tab-separated data by default, so we pass sep=',' to read a CSV file

ufo = pd.read_table('http://bit.ly/uforeports', sep=',')

In [3]:
# 'read_csv()' is equivalent to read_table, except it assumes a comma separator

ufo = pd.read_csv('http://bit.ly/uforeports')
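
To check this equivalence without a network connection, here is a minimal sketch that parses a small in-memory CSV string both ways. The `csv_text` sample is invented for illustration and is not part of the UFO dataset.

```python
import io
import pandas as pd

# Hypothetical two-row CSV held in memory instead of downloaded from a URL
csv_text = "City,State\nIthaca,NY\nHolyoke,CO\n"

via_table = pd.read_table(io.StringIO(csv_text), sep=",")  # separator given explicitly
via_csv = pd.read_csv(io.StringIO(csv_text))               # comma is the default

# DataFrame.equals compares shape, labels and values
print(via_table.equals(via_csv))
```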

In [4]:
# examine the first 5 rows

ufo.head()

,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [5]:
# check the type of the object

type(ufo)

pandas.core.frame.DataFrame

In [6]:
# select the 'City' Series using bracket notation
# this selects the 'City' column of the DataFrame and returns it as a Series
# remember that bracket notation is case sensitive

ufo['City']

0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
                 ...         
18236              Grant Park
18237             Spirit Lake
18238             Eagle River
18239             Eagle River
18240                    Ybor
Name: City, Length: 18241, dtype: object

In [7]:
type(ufo['City'])

pandas.core.series.Series

In [8]:
# alternatively, use dot notation

ufo.City

0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
                 ...         
18236              Grant Park
18237             Spirit Lake
18238             Eagle River
18239             Eagle River
18240                    Ybor
Name: City, Length: 18241, dtype: object

- Every time a Series is added to a DataFrame, its name automatically becomes an attribute of that DataFrame.

**Bracket notation will always work, whereas dot notation has limitations:**

- Dot notation doesn't work if there are spaces in the Series name like "Colors Reported".
- Dot notation doesn't work if the Series has the same name as a DataFrame method or attribute (like 'head' or 'shape').
- Dot notation can't be used to define the name of a new Series (see below).
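
The three limitations can be reproduced on a small, made-up DataFrame. This is only a sketch: the data, the `shape` column and the `Place` attribute name are hypothetical and chosen to trigger each case.

```python
import warnings
import pandas as pd

# Hypothetical toy data; 'shape' deliberately clashes with DataFrame.shape
df = pd.DataFrame({
    "City": ["Ithaca", "Holyoke"],
    "State": ["NY", "CO"],
    "Colors Reported": ["RED", "GREEN"],
    "shape": ["TRIANGLE", "OVAL"],
})

# 1. A name containing a space can only be selected with brackets
colors = df["Colors Reported"]

# 2. df.shape is the DataFrame attribute (rows, columns), not the column
frame_shape = df.shape
shape_column = df["shape"]

# 3. Dot assignment sets a plain Python attribute and does not add a column
#    (pandas may emit a UserWarning here, silenced for this demonstration)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.Place = df.City + ", " + df.State
created_by_dot = "Place" in df.columns

# Bracket assignment does add the column
df["Location"] = df.City + ", " + df.State
created_by_brackets = "Location" in df.columns

print(frame_shape, created_by_dot, created_by_brackets)
```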

In [10]:
# create a new 'Location' Series (must use bracket notation to define the Series name)

ufo['Location'] = ufo.City + ', ' + ufo.State
ufo.head()

,City,Colors Reported,Shape Reported,State,Time,Location
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00,"Ithaca, NY"
1,Willingboro,,OTHER,NJ,6/30/1930 20:00,"Willingboro, NJ"
2,Holyoke,,OVAL,CO,2/15/1931 14:00,"Holyoke, CO"
3,Abilene,,DISK,KS,6/1/1931 13:00,"Abilene, KS"
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00,"New York Worlds Fair, NY"


In [11]:
# view the column names of the DataFrame

ufo.columns

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time',
       'Location'],
      dtype='object')