## Data Indexing and Selection 
Eber David Gaytan Medina

### Introducing Pandas Objects
At the most basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices. As we will see over the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top of these basic data structures, but nearly everything that follows requires an understanding of what the structures are. Before going any further, let's introduce the three fundamental Pandas data structures: the Series, DataFrame, and Index.

import numpy as np
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])
data
0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

data.values
array([ 0.25,  0.5 ,  0.75,  1.  ])

data.index
RangeIndex(start=0, stop=4, step=1)

data[1]
0.5
data[1:3]
1    0.50
2    0.75
dtype: float64

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data
a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

data['b']
0.5

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 5, 3, 7])
data
2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64
data[5]
0.5
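
With integer labels that are not simply 0, 1, 2, ..., an expression like `data[5]` can be read either as "the label 5" or as "the sixth position". A minimal sketch of how to make the intent explicit follows; it assumes the `loc` and `iloc` accessors, which the text above has not introduced, and rebuilds the same Series.

```python
import pandas as pd

# Same Series as above: integer labels that are not 0..n-1
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

by_label = data.loc[5]      # label-based: the entry whose index label is 5
by_position = data.iloc[1]  # position-based: the second entry in storage order

print(by_label, by_position)
```

Both lookups reach the same entry here, but only because label 5 happens to sit at position 1.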

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population
California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
dtype: int64

population['California']
38332521
population['California':'Illinois']
California    38332521
Florida       19552860
Illinois      12882135
dtype: int64
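
The alphabetical ordering shown above reflects older pandas releases, which sorted dictionary keys when building a Series. As an assumption about newer environments, the dictionary's insertion order is kept instead, in which case the slice `'California':'Illinois'` would span all five entries. A sketch that should reproduce the three-row result in either case sorts the index explicitly first:

```python
import pandas as pd

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

# sort_index() makes the label order explicit rather than version-dependent
population = pd.Series(population_dict).sort_index()

# Label slices include both endpoints
subset = population['California':'Illinois']
print(subset)
```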


>>> pd.Series(data, index=index)


pd.Series([2, 4, 6])
0    2
1    4
2    6
dtype: int64

pd.Series(5, index=[100, 200, 300])
100    5
200    5
300    5
dtype: int64

pd.Series({2:'a', 1:'b', 3:'c'})
1    b
2    a
3    c
dtype: object

pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])
3    c
2    a
dtype: object
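
When an explicit index accompanies a dictionary, only the listed keys are kept. One case the examples above do not cover is a label that is absent from the dictionary. As a rough sketch (the extra label `4` is purely illustrative), pandas fills such an entry with NaN:

```python
import pandas as pd

# Label 4 is not a key of the dictionary, so its value is missing
s = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2, 4])
print(s)

# isna() flags which entries were filled in as missing
missing = s.isna()
print(missing)
```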

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area
California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
dtype: int64

states = pd.DataFrame({'population': population,
                       'area': area})
states
area	population
California	423967	38332521
Florida	170312	19552860
Illinois	149995	12882135
New York	141297	19651127
Texas	695662	26448193

states.index
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')

states.columns
Index(['area', 'population'], dtype='object')

states['area']
California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
Name: area, dtype: int64
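
Dictionary-style access on a DataFrame returns a column, not a row, which differs from indexing a two-dimensional NumPy array. The sketch below rebuilds `states` and contrasts the two directions; the use of `loc` for row access is an assumption, since the text above does not introduce it.

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662, 'New York': 141297,
                  'Florida': 170312, 'Illinois': 149995})
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127, 'Florida': 19552860,
                        'Illinois': 12882135})
states = pd.DataFrame({'population': population, 'area': area})

column = states['area']    # column name -> Series of that column
row = states.loc['Texas']  # row label -> Series holding that row's values

print(column)
print(row)
```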

pd.DataFrame(population, columns=['population'])
population
California	38332521
Florida	19552860
Illinois	12882135
New York	19651127
Texas	26448193
From a list of dicts

data = [{'a': i, 'b': 2 * i}
        for i in range(3)]
pd.DataFrame(data)
a	b
0	0	0
1	1	2
2	2	4

pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
a	b	c
0	1.0	2	NaN
1	NaN	3	4.0
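
Columns `a` and `c` print as floats in the output above because NaN is a floating-point value, so an integer column with a gap is stored as `float64`. A small sketch that checks this, with `fillna(0)` as one illustrative (not prescribed) way to get integers back:

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# 'a' and 'c' each have a missing entry; 'b' is complete
print(df.dtypes)

# Replacing the gaps with 0 allows a cast back to integers
filled = df.fillna(0).astype(int)
print(filled)
```

Whether 0 is a sensible stand-in for a missing value depends entirely on the data.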

pd.DataFrame({'population': population,
              'area': area})
area	population
California	423967	38332521
Florida	170312	19552860
Illinois	149995	12882135
New York	141297	19651127
Texas	695662	26448193

pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])

foo	bar
a	0.865257	0.213169
b	0.442759	0.108267
c	0.047110	0.905718

A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A
array([(0, 0.0), (0, 0.0), (0, 0.0)], 
      dtype=[('A', '<i8'), ('B', '<f8')])
pd.DataFrame(A)
A	B
0	0	0.0
1	0	0.0
2	0	0.0

ind = pd.Index([2, 3, 5, 7, 11])
ind
Int64Index([2, 3, 5, 7, 11], dtype='int64')

ind[1]
3
ind[::2]
Int64Index([2, 5, 11], dtype='int64')

print(ind.size, ind.shape, ind.ndim, ind.dtype)
5 (5,) 1 int64
ind[1] = 0

TypeError                                 Traceback (most recent call last)
<ipython-input-34-40e631c82e8a> in <module>()
----> 1 ind[1] = 0

/Users/jakevdp/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py in __setitem__(self, key, value)
   1243 
   1244     def __setitem__(self, key, value):
-> 1245         raise TypeError("Index does not support mutable operations")
   1246 
   1247     def __getitem__(self, key):
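
The traceback above shows that an Index rejects item assignment. A minimal sketch of catching that error and building a modified copy instead follows; going through a plain Python list is just one possible approach, and the names `values` and `new_ind` are hypothetical.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

raised = False
try:
    ind[1] = 0  # Index objects do not support in-place modification
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Build a new Index from a modified copy of the values
values = ind.tolist()
values[1] = 0
new_ind = pd.Index(values)

print(ind)      # the original is unchanged
print(new_ind)
```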


indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
indA.intersection(indB)  # intersection
Int64Index([3, 5, 7], dtype='int64')
indA.union(indB)  # union
Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')
indA.symmetric_difference(indB)  # symmetric difference
Int64Index([1, 2, 9, 11], dtype='int64')
