In [21]:
import numpy as np
import pandas as pd

The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data.

In [22]:
data = pd.Series([1, 2, 3, 5, 8, 13])
data

0     1
1     2
2     3
3     5
4     8
5    13
dtype: int64

The Series combines a sequence of values with an explicit sequence of indices, which we can access with the values and index attributes.

In [23]:
# NumPy array
data.values

array([ 1,  2,  3,  5,  8, 13])

In [24]:
# pd.Index array-like object
data.index

RangeIndex(start=0, stop=6, step=1)

Series as a generalized NumPy array

In [25]:
# non-integer index
data = pd.Series([1, 2, 3, 5, 8, 13],
                 index=['first', 'second', 'third', 'fourth', 'fifth', 'sixth'])
data

first      1
second     2
third      3
fourth     5
fifth      8
sixth     13
dtype: int64

In [26]:
data["third"]

np.int64(3)

Even non-sequential or non-contiguous values can be used as an index.
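
As a small illustration of this point, the sketch below builds a Series whose integer index is neither sorted nor contiguous. The values, the labels, and the name sparse are made up for this example.

```python
import pandas as pd

# Illustrative values; the index labels are arbitrary, unsorted integers
sparse = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

# Label-based lookup: 5 is an index label here, not a position
value_at_5 = sparse.loc[5]
print(value_at_5)
```

Using .loc makes it explicit that the lookup is by label, which avoids ambiguity when the labels themselves are integers.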

Series as a specialized Dictionary

In [27]:
population_dict = {'California': 39538223,
                   'Texas': 29145505,
                   'Florida': 21538187,
                   'New York': 20201249,
                   'Pennsylvania': 13002700}
population = pd.Series(population_dict)
population

California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64
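
The dictionary analogy can be sketched as follows: a Series maps keys to values like a dict, but it also supports array-style operations such as slicing. The block rebuilds the same population Series so that it runs on its own; the name subset is only illustrative.

```python
import pandas as pd

population = pd.Series({'California': 39538223,
                        'Texas': 29145505,
                        'Florida': 21538187,
                        'New York': 20201249,
                        'Pennsylvania': 13002700})

# Dictionary-style lookup by key
print(population['California'])

# Membership tests check the index labels, like dict keys
print('Texas' in population)

# Unlike a dict, a Series can be sliced by label;
# with label slices the stop label is included
subset = population['California':'Florida']
print(subset)
```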

The Pandas DataFrame Object

DataFrame as a generalized NumPy array

In [28]:
area_dict = {'California': 423967,
             'Texas': 695662,
             'Florida': 170312,
             'New York': 141297,
             'Pennsylvania': 119280}
area = pd.Series(area_dict)
area

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
dtype: int64

In [29]:
states = pd.DataFrame({'population': population,
                       'area': area})
states

              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280


In [30]:
states.index

Index(['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'], dtype='object')

In [31]:
states.columns

Index(['population', 'area'], dtype='object')

DataFrame as a specialized dictionary

In [32]:
states['area']

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64

Note a potential point of confusion here.
In a two-dimensional NumPy array, data[0] returns the first row of the array.
In a DataFrame, data['col0'] returns the column labeled 'col0', which is a Series object.
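
A minimal sketch of this difference, using a small made-up array and the hypothetical column names 'col0' and 'col1':

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2],
                [3, 4],
                [5, 6]])
df = pd.DataFrame(arr, columns=['col0', 'col1'])

first_row_np = arr[0]       # NumPy: an integer index selects a row
first_col_df = df['col0']   # DataFrame: a key selects a column (a Series)
first_row_df = df.iloc[0]   # DataFrame: positional row access goes through iloc

print(first_row_np)
print(first_col_df)
print(first_row_df)
```

For this reason it is often more helpful to think of a DataFrame as a dictionary of columns than as a two-dimensional array.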

Constructing DataFrame objects

From a single Series object

In [33]:
pd.DataFrame(population, columns=['population'])

              population
California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700


From a list of dictionaries

In [34]:
data = [{'a': i, 'b': 2 * i} for i in range(3)]
pd.DataFrame(data)

   a  b
0  0  0
1  1  2
2  2  4
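
The dictionaries in the list do not need to share the same keys. As a sketch with made-up records, Pandas fills any missing entries with NaN:

```python
import pandas as pd

# Hypothetical records with partially overlapping keys
records = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
filled = pd.DataFrame(records)

# Columns are the union of all keys; absent entries become NaN
print(filled)
```

Columns that contain NaN are stored as floating point, since NaN is a float value.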


From a dictionary of Series objects

In [35]:
pd.DataFrame({'population': population, 'area': area})

              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280


From a two-dimensional NumPy array

In [36]:
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])

        foo       bar
a  0.096183  0.677245
b  0.498775  0.389856
c  0.145771  0.192085


The Pandas Index object

In [37]:
ind = pd.Index([2, 3, 5, 7, 11])

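The Pandas Index object can be thought of as an immutable array: it supports array-style indexing and slicing, but its values cannot be modified in place.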
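
A short sketch of both behaviors, array-like access and immutability. The names second, every_other, and modified are only illustrative.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

# Array-like access: scalar indexing and slicing
second = ind[1]
every_other = ind[::2]
print(second)
print(every_other)

# Familiar NumPy-style attributes
print(ind.size, ind.shape, ind.ndim, ind.dtype)

# In-place modification is not allowed and raises a TypeError
try:
    ind[1] = 0
    modified = True
except TypeError as exc:
    modified = False
    print(exc)
```

Immutability makes it safer to share an index between several Series and DataFrame objects, since none of them can change it by accident.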

Index as an ordered set

In [38]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
indA.intersection(indB)

Index([3, 5, 7], dtype='int64')

In [39]:
indA.union(indB)

Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [40]:
indA.symmetric_difference(indB)

Index([1, 2, 9, 11], dtype='int64')
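
To round out the set operations, here is a brief sketch of the one-sided difference method on the same two indices. The names only_in_a and only_in_b are introduced here for illustration.

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

# Labels present in indA but not in indB, and vice versa
only_in_a = indA.difference(indB)
only_in_b = indB.difference(indA)
print(only_in_a)
print(only_in_b)
```

The symmetric difference is the union of these two one-sided differences.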