# Chapter 13. Introducing Pandas Objects

In [2]:
import numpy as np
import pandas as pd

## The Pandas Series Object

#### A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or an array:

In [4]:
data = pd.Series([0.25,0.5,0.75,1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [5]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

In [6]:
data.index

RangeIndex(start=0, stop=4, step=1)

In [7]:
data[1]

np.float64(0.5)

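#### The essential difference between a Series and a one-dimensional NumPy array is the index: the NumPy array has an implicitly defined integer index used to access the values, while the Pandas Series has an explicitly defined index associated with the values. That index does not have to consist of integers: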

In [8]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])

In [9]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [11]:
data['c']

np.float64(0.75)
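
To make the contrast with an implicit index concrete, here is a minimal sketch in which the index labels are integers that are neither sorted nor contiguous. The variable names (`noncontiguous`, `by_label`, `by_position`) are illustrative and not part of the original notebook. Label-based access through `.loc` and position-based access through `.iloc` then refer to different elements:

```python
import pandas as pd

# Illustrative Series with explicit, non-sequential integer labels
noncontiguous = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

by_label = noncontiguous.loc[5]      # value stored under the label 5
by_position = noncontiguous.iloc[0]  # first value, whatever its label is

print(by_label)
print(by_position)
```

Using `.loc` and `.iloc` explicitly avoids any ambiguity about whether an integer is meant as a label or as a position.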

#### You can think of a Pandas Series as a specialization of a Python dictionary. A dictionary maps arbitrary keys to arbitrary values, while a Series maps typed keys to typed values. This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas Series makes it more efficient than a Python dictionary for certain operations.

In [13]:
# Series as dictionary
population_dict = {'California': 39538223, 'Texas': 2914505,
                   'Florida': 21539197, 'New York': 20201249,
                   'Pennsylvania': 13002700}
population = pd.Series(population_dict)
population

California      39538223
Texas            2914505
Florida         21539197
New York        20201249
Pennsylvania    13002700
dtype: int64
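
The dictionary analogy can be pushed a little further. The sketch below uses a shortened three-state version of the population data and illustrative variable names (`pop`, `has_florida`, `labels`, `subset`) to show two dictionary-style operations plus one thing a plain dictionary cannot do, namely slicing by label:

```python
import pandas as pd

# Three of the population figures used above
pop = pd.Series({'California': 39538223,
                 'Florida': 21539197,
                 'New York': 20201249})

has_florida = 'Florida' in pop        # membership is tested against the index
labels = list(pop.keys())             # keys() returns the index labels
subset = pop['California':'Florida']  # a label slice includes its end label

print(has_florida)
print(labels)
print(subset)
```

Unlike positional slices, the label slice keeps the `'Florida'` entry, so `subset` holds two rows.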

In [14]:
pd.Series(5,index=[100,200,300])

100    5
200    5
300    5
dtype: int64

In [15]:
pd.Series({1:'a',2:'b',3:'c'})

1    a
2    b
3    c
dtype: object

In [16]:
pd.Series({1:'a',2:'b',3:'c'},index = [1,2])

1    a
2    b
dtype: object
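
The explicit `index` argument selects and orders keys from the dictionary. As a small sketch of the opposite case (the name `letters` and the extra label `4` are illustrative), a label that is requested in `index` but missing from the dictionary yields a missing value rather than an error:

```python
import pandas as pd

# Label 4 is not a key of the dictionary, so its value is missing (NaN)
letters = pd.Series({1: 'a', 2: 'b', 3: 'c'}, index=[3, 2, 4])
print(letters)

# isna() flags the entries that had no matching key
print(letters.isna())
```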

## The Pandas DataFrame Object

#### A DataFrame is an analog of a two-dimensional array with explicit row and column indices. Just as you might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you can think of a DataFrame as a sequence of aligned Series objects. Here, by “aligned” we mean that they share the same index.

In [17]:
area_dict = {'California': 423967, 'Texas': 695662,
             'Florida': 170312,
             'New York': 141297, 'Pennsylvania': 119280}

In [19]:
area = pd.Series(area_dict)
area

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
dtype: int64

In [20]:
population

California      39538223
Texas            2914505
Florida         21539197
New York        20201249
Pennsylvania    13002700
dtype: int64

In [21]:
states = pd.DataFrame({'population':population,'area':area})
states

              population    area
California      39538223  423967
Texas            2914505  695662
Florida         21539197  170312
New York        20201249  141297
Pennsylvania    13002700  119280
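
The example above works cleanly because `population` and `area` share the same index. As a hedged sketch of what alignment means when they do not, the two illustrative Series below (`pop_part` and `area_part`, not part of the original notebook) overlap only on Florida. The DataFrame constructor matches values by label and fills the gaps with NaN:

```python
import pandas as pd

# Two Series whose indices only partly overlap
pop_part = pd.Series({'Florida': 21539197, 'New York': 20201249})
area_part = pd.Series({'California': 423967, 'Florida': 170312})

# Values are matched by index label, not by position
combined = pd.DataFrame({'population': pop_part, 'area': area_part})
print(combined)

# Only Florida has both a population and an area
complete_rows = combined.dropna()
print(complete_rows)
```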


In [26]:
states['area']

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64

In [27]:
values = {
    'name':['Alice','Kelvin','John'],
    'age':[12,32,21],
    'gender':['female','male','male']
}
details = pd.DataFrame(values)

In [28]:
details

     name  age  gender
0   Alice   12  female
1  Kelvin   32    male
2    John   21    male


In [29]:
details.index

RangeIndex(start=0, stop=3, step=1)

In [30]:
details.columns

Index(['name', 'age', 'gender'], dtype='object')

In [34]:
details['name']

0     Alice
1    Kelvin
2      John
Name: name, dtype: object
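
Selecting a column by name, as above, is the DataFrame counterpart of a dictionary lookup. The following sketch rebuilds a two-column version of `details` and checks, with illustrative variable names, that each column is itself a Series sharing the DataFrame's index:

```python
import pandas as pd

details = pd.DataFrame({'name': ['Alice', 'Kelvin', 'John'],
                        'age': [12, 32, 21]})

age = details['age']                             # dictionary-style column lookup
is_series = isinstance(age, pd.Series)           # a single column is a Series
shares_index = age.index.equals(details.index)   # aligned with the parent frame
has_age_column = 'age' in details                # membership checks column names

print(is_series, shares_index, has_age_column)
```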

### Constructing DataFrame Objects
A Pandas DataFrame can be constructed in a variety of ways.

#### From a single Series object

In [35]:
pd.DataFrame(population, columns=['population'])

              population
California      39538223
Texas            2914505
Florida         21539197
New York        20201249
Pennsylvania    13002700

#### From a list of dicts

In [36]:
data = [{'a': i, 'b': 2 * i} for i in range(3)]
pd.DataFrame(data)

   a  b
0  0  0
1  1  2
2  2  4
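
In the list above every dictionary has the same keys. As a minimal sketch of the less tidy case (the names `records` and `frame` are illustrative), dictionaries with different keys still produce a DataFrame, with NaN wherever a key is absent:

```python
import pandas as pd

# The first record has no 'c', the second has no 'a'
records = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
frame = pd.DataFrame(records)
print(frame)

# Count the missing entries per column
print(frame.isna().sum())
```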


#### From a two-dimensional NumPy array

In [37]:
pd.DataFrame(np.random.rand(3,2),
             columns=['foo','bar'],
             index=['a','b','c'])

        foo       bar
a  0.267385  0.616157
b  0.737944  0.032408
c  0.127843  0.964488


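## The Pandas Index Object

#### Both Series and DataFrame carry an explicit Index. An Index supports set-style operations such as intersection, union, and symmetric difference.

In [38]: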
indA = pd.Index([1,3,5,7,9])
indB = pd.Index([2,3,5,7,11])

In [39]:
indA.intersection(indB)

Index([3, 5, 7], dtype='int64')

In [40]:
indA.union(indB)

Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [42]:
indA.symmetric_difference(indB)

Index([1, 2, 9, 11], dtype='int64')
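
Each of the set-style methods above returns a new Index and leaves its inputs untouched. The sketch below repeats the union and then attempts an item assignment to show that an Index cannot be modified in place. The variable names `combined` and `assignment_error` are illustrative:

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

combined = indA.union(indB)  # a new Index; indA and indB are unchanged
print(combined)
print(indA)

assignment_error = None
try:
    indA[0] = 100            # Index objects do not support item assignment
except TypeError as err:
    assignment_error = str(err)

print(assignment_error)
```

This immutability is what makes it safe for several Series and DataFrame objects to share one Index.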