# Pandas Objects
* Pandas objects can be thought of as enhanced versions of
NumPy structured arrays in which the rows and columns are identified with labels
rather than simple integer indices.

# A. Pandas Series Object:

* A Pandas Series is a one-dimensional array of indexed data.
* The Series combines a sequence of values with an explicit sequence of indices,
which we can access with the `values` and `index` attributes.

In [1]:
import pandas as pd

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [3]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

In [4]:
data.index

RangeIndex(start=0, stop=4, step=1)

In [5]:
data[1]

0.5

### A.1. `Series` as Generalized NumPy Array:

* The Series object may appear to be basically interchangeable
with a one-dimensional NumPy array. 
* The essential difference is that
while the NumPy array has an implicitly defined integer index used to access the values,
the Pandas Series has an explicitly defined index associated with the values.

In [6]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

In [7]:
data.iloc[0]  # positional access; plain data[0] on a non-integer index is deprecated in recent pandas

0.25

In [8]:
data['a']

0.25

In [9]:
data.a

0.25

In [10]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

In [11]:
data[2]

0.25

In [12]:
data[5]

0.5
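
With an integer index such as the one above, `data[2]` is resolved as a label, not as a position. As a minimal sketch, the `.loc` (label-based) and `.iloc` (position-based) accessors make the intent explicit. The variable names `by_label` and `by_position` are only illustrative.

```python
import pandas as pd

# Integer labels that do not coincide with the positions 0..3
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

by_label = data.loc[2]      # the element whose label is 2 (first element)
by_position = data.iloc[2]  # the element at position 2 (third element)

print(by_label)     # 0.25
print(by_position)  # 0.75
```

Using the accessors avoids any ambiguity when the labels are themselves integers.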

### A.2. `Series` as Specialized Dictionary:
* A Series is a structure that maps typed keys to a set of typed values.
* This type information makes a Pandas Series more efficient than
a Python dictionary for certain operations.
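
One way to see the practical difference is an element-wise operation. The sketch below uses a small, hypothetical two-state dictionary (`small_dict`): the dictionary needs an explicit loop, while the Series applies the operation to the whole typed array at once.

```python
import pandas as pd

small_dict = {'California': 39538223, 'Texas': 29145505}

# Plain dict: element-wise arithmetic requires an explicit loop
millions_dict = {state: count / 1e6 for state, count in small_dict.items()}

# Series: the same arithmetic is one vectorized expression
small_series = pd.Series(small_dict)
millions_series = small_series / 1e6

print(millions_dict['Texas'])
print(millions_series['Texas'])
```

Both approaches give the same numbers; the Series version keeps the keys attached to the results without any bookkeeping.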


In [13]:
population_dict = {'California': 39538223, 'Texas': 29145505,
                    'Florida': 21538187, 'New York': 20201249,
                    'Pennsylvania': 13002700}
population = pd.Series(population_dict)

In [14]:
population

California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64

In [15]:
population['California']

39538223

In [16]:
population.California

39538223

In [17]:
population['California':'Florida']

California    39538223
Texas         29145505
Florida       21538187
dtype: int64
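
Note that the label-based slice above includes its endpoint (`'Florida'`), unlike ordinary Python slicing. A short sketch of the contrast with position-based slicing through `.iloc`, with `label_slice` and `position_slice` as illustrative names:

```python
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187, 'New York': 20201249,
                        'Pennsylvania': 13002700})

label_slice = population['California':'Florida']  # endpoint label is included
position_slice = population.iloc[0:2]             # stop position is excluded

print(len(label_slice))     # 3
print(len(position_slice))  # 2
```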

### A.3. Constructing Series Objects

* `pd.Series(data, index=index)`

In [18]:
# From a list or NumPy array
pd.Series([2, 4, 6])

0    2
1    4
2    6
dtype: int64

In [19]:
# From a scalar, which is repeated to fill the specified index
pd.Series(5, index=[100, 200, 300])

100    5
200    5
300    5
dtype: int64

In [20]:
# From a dictionary, in which case the index defaults to the dictionary keys
a = pd.Series({2:'a', 1:'b', 3:'c'})
a

2    a
1    b
3    c
dtype: object

In [21]:
print(a.values)

['a' 'b' 'c']


In [22]:
print(a.index)

Int64Index([2, 1, 3], dtype='int64')


In [23]:
# In each case, an explicit index controls the order and the subset of keys used
pd.Series({2:'a', 1:'b', 3:'c'}, index=[1, 2])

1    b
2    a
dtype: object
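
The explicit index may also contain labels that are not keys of the dictionary. A small sketch of that case, where the label `4` is a hypothetical extra entry, shows that the missing key is filled with `NaN`:

```python
import pandas as pd

letters = {2: 'a', 1: 'b', 3: 'c'}

# 4 is not a key of the dictionary, so its value is marked as missing
s = pd.Series(letters, index=[1, 2, 4])

print(s)
print(s.isna().tolist())  # [False, False, True]
```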

# B. The Pandas DataFrame Object

### B.1. DataFrame as Generalized NumPy Array:

* If a `Series` is an analog of a one-dimensional array with explicit indices, a `DataFrame` is an analog of a two-dimensional array with explicit row and column indices.
* Think of a `DataFrame` as a sequence of aligned `Series` objects. Here, by “aligned” we mean that they share the same index.
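
To illustrate what alignment means, here is a sketch with two hypothetical Series (`pop_part` and `area_part`) whose indices overlap only partially. The values are matched by label, and labels missing from one Series are filled with `NaN`:

```python
import pandas as pd

pop_part = pd.Series({'California': 39538223, 'Texas': 29145505})
area_part = pd.Series({'Texas': 695662, 'Florida': 170312})

# Rows are matched by label, not by position
partial = pd.DataFrame({'population': pop_part, 'area': area_part})

print(partial)
print(partial.loc['Texas', 'area'])                # matched by the 'Texas' label
print(pd.isna(partial.loc['California', 'area']))  # True: no area for California
```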

In [25]:
area_dict = {'California': 423967, 'Texas': 695662, 'Florida': 170312, 'New York': 141297, 'Pennsylvania': 119280}
area = pd.Series(area_dict)
area

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
dtype: int64

In [26]:
population

California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64

In [27]:
states = pd.DataFrame({'population': population, 'area': area})
states

              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280


In [28]:
states.index

Index(['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'], dtype='object')

In [29]:
states.columns

Index(['population', 'area'], dtype='object')

### B.2. DataFrame as Specialized Dictionary

* Where a dictionary maps a key to a value, a DataFrame maps a column name to a Series of column data.

In [30]:
states['area']

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64
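
The dictionary analogy extends to membership tests and keys, which refer to column names, not row labels. A brief sketch on a reduced, two-state version of the table (the name `small_states` is illustrative):

```python
import pandas as pd

small_states = pd.DataFrame({
    'population': {'California': 39538223, 'Texas': 29145505},
    'area': {'California': 423967, 'Texas': 695662},
})

print('area' in small_states)        # True: membership tests column names
print('California' in small_states)  # False: row labels are not keys
print(list(small_states.keys()))     # ['population', 'area']

# Rows are reached through the .loc accessor instead
texas_row = small_states.loc['Texas']
print(texas_row['area'])             # 695662
```

This differs from a two-dimensional NumPy array, where `arr[0]` returns the first row.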

### B.3. Constructing DataFrame Objects

##### B.3.a. From a single Series object:

* A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series:

In [31]:
type(population)

pandas.core.series.Series

In [32]:
pd.DataFrame(population, columns=['population'])

              population
California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700


##### B.3.b. From a list of dicts:

In [33]:
data = [{'a':i, 'b':2*i} for i in range(3)]
data

[{'a': 0, 'b': 0}, {'a': 1, 'b': 2}, {'a': 2, 'b': 4}]

In [34]:
pd.DataFrame(data)

   a  b
0  0  0
1  1  2
2  2  4


In [35]:
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0


##### B.3.c. From a dictionary of Series objects:

In [36]:
pd.DataFrame({'population': population, 'area': area})

              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280


##### B.3.d. From a two-dimensional NumPy array:

In [37]:
import numpy as np
pd.DataFrame(np.random.rand(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])

        foo       bar
a  0.932996  0.157505
b  0.761165  0.175581
c  0.048158  0.612452


##### B.3.e. From a NumPy structured array:


In [38]:
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A

array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])

In [39]:
pd.DataFrame(A)

   A    B
0  0  0.0
1  0  0.0
2  0  0.0


# C. The Pandas Index Object:

* The `Index` object can be thought of either as an immutable array or as an ordered set (technically a multiset, as `Index` objects may contain repeated values).

In [40]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

Int64Index([2, 3, 5, 7, 11], dtype='int64')

### C.1. Index as Immutable Array:

In [41]:
ind[1]

3

In [42]:
ind[::2]

Int64Index([2, 5, 11], dtype='int64')

In [43]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)

5 (5,) 1 int64


In [44]:
ind[1] = 0

TypeError: Index does not support mutable operations
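
Because an `Index` cannot be modified in place, a changed version has to be built as a new object. The following is one possible sketch, using the `Index.delete` and `Index.insert` methods, both of which return new `Index` objects; the name `new_ind` is illustrative.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

# In-place assignment is rejected
try:
    ind[1] = 0
except TypeError as exc:
    print('TypeError:', exc)

# Build a modified copy instead; the original is left untouched
new_ind = ind.delete(1).insert(1, 0)

print(list(new_ind))  # [2, 0, 5, 7, 11]
print(list(ind))      # [2, 3, 5, 7, 11]
```

Immutability makes it safe to share one index between several Series and DataFrames without side effects.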

### C.2. Index as Ordered Set:
* Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic.
* The `Index` object follows many of the conventions used by Python’s built-in set data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way:

In [45]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [46]:
indA.intersection(indB)

Int64Index([3, 5, 7], dtype='int64')

In [47]:
indA.union(indB)

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [48]:
indA.symmetric_difference(indB)

Int64Index([1, 2, 9, 11], dtype='int64')
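
Two further points can be sketched briefly: the one-sided `difference` method, and the fact that an `Index`, unlike a Python set, may hold repeated values. The names `only_in_A` and `dup` are illustrative.

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

# Labels present in indA but not in indB
only_in_A = indA.difference(indB)
print(list(only_in_A))  # [1, 9]

# An Index may contain duplicates, which a Python set cannot
dup = pd.Index([1, 1, 2])
print(len(dup))       # 3
print(dup.is_unique)  # False
```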