### Data Selection in Series
A `Series` behaves like a Python dictionary in many ways: it maps a set of keys (the index) to a set of values.

In [1]:
# Series as a dictionary
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [2]:
# Print a single value
print(data['b'])

# Check for key in series
print('a' in data)

# Print the keys
print(data.keys())

# Print the Series as (key, value) pairs
print(list(data.items()))

0.5
True
Index(['a', 'b', 'c', 'd'], dtype='object')
[('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]


In [3]:
# Extending the Series with a new key-value pair
data['e'] = 1.2
data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.20
dtype: float64
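
The dictionary analogy can be pushed a little further. The sketch below rebuilds the same `data` Series and uses `Series.get` and `Series.to_dict`, neither of which appears in the cells above; the key `'z'` and the fallback value `-1.0` are made up for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])

# Like dict.get, Series.get returns a default instead of raising KeyError
# ('z' is a hypothetical key that is not in the index)
missing = data.get('z', -1.0)

# Convert the Series into a plain Python dictionary
as_dict = data.to_dict()

print(missing)
print(as_dict)
```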

### Series as a one-dimensional array
A `Series` also supports *slicing, masking, and fancy indexing*, much like a NumPy array.

In [4]:
# Slicing by explicit index
print(data['a':'c'])

# Slicing by implicit index
print(data[0:2])

# masking
print(data[(data>0.3) & (data<0.8)])

# Fancy indexing
print(data[['a', 'e']])

a    0.25
b    0.50
c    0.75
dtype: float64
a    0.25
b    0.50
dtype: float64
b    0.50
c    0.75
dtype: float64
a    0.25
e    1.20
dtype: float64
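
One detail in the output above is easy to miss: a slice by explicit index includes its end label, while a slice by implicit index excludes its end position. A minimal sketch of the difference, using `iloc` for the positional slice (an assumption made here to keep the intent unambiguous):

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0, 1.2],
                 index=['a', 'b', 'c', 'd', 'e'])

# Label-based slice: the end label 'c' is included
by_label = data['a':'c']

# Position-based slice: the end position 2 is excluded
by_position = data.iloc[0:2]

print(list(by_label.index))     # ['a', 'b', 'c']
print(list(by_position.index))  # ['a', 'b']
```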


### Indexing: loc, iloc

In [5]:
# Indexing and slicing behave differently when the index holds integers
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Indexing uses the explicit index (the labels)
print(data[1])

# Slicing uses the implicit index (the positions)
print(data[1:3])

a
3    b
5    c
dtype: object


Because indexing and slicing disagree in this way, pandas provides indexer attributes that each expose one particular indexing scheme. These are not methods but attributes, and each exposes a specific slicing interface to the data in the `Series`.

**.loc\[ \]** attribute:
<br> It always references the **explicit** index (the labels).

In [6]:
print(data.loc[1])

print(data.loc[1:3])

a
1    a
3    b
dtype: object


**.iloc\[ \]** attribute:
<br> It always references the **implicit** index (the positions).

In [7]:
print(data.iloc[1])

print(data.iloc[1:3])

b
3    b
5    c
dtype: object


One guiding principle is that **explicit** is better than **implicit**. Because `loc` and `iloc` state which index they use, they help keep code clean and readable, especially when the index consists of integers.
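
To see why this matters for integer indices, here is a small sketch on the same Series. The number `2` is a valid position but not a valid label, so the two indexers give different outcomes; the variable names are illustrative only.

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Position 2 exists (the third element), so iloc succeeds
third = data.iloc[2]

# Label 2 is not in the index [1, 3, 5], so loc raises KeyError
try:
    data.loc[2]
    label_found = True
except KeyError:
    label_found = False

print(third)        # c
print(label_found)  # False
```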

### Data selection in DataFrame
- **DataFrame as a dictionary**
The first analogy is to consider a `DataFrame` as a dictionary of related `Series` objects.

In [8]:
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
                 'New York': 19651127, 'Florida': 19552860,
                 'Illinois': 12882135})
data = pd.DataFrame({'area':area, 'pop':pop})
data

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135
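
A short sketch of what "dictionary of `Series`" means in practice, using a two-state subset of the data above: the keys of a `DataFrame` are its column names, not its row labels. The variable names below are illustrative.

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662})
pop = pd.Series({'California': 38332521, 'Texas': 26448193})
data = pd.DataFrame({'area': area, 'pop': pop})

# Membership tests look at column names, not row labels
has_area_column = 'area' in data
has_california_column = 'California' in data

# items() yields (column name, Series) pairs, like dict.items()
column_types = {name: type(col).__name__ for name, col in data.items()}

print(has_area_column, has_california_column)
print(list(data.keys()))
print(column_types)
```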


In [9]:
# Access an individual Series by its key (the column name)
print(data['area'])

# Attribute-style access to the same column (equivalent here)
print(data.area)

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64
California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64


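Attribute-style access does not work for every column: it fails when the column name is not a string that is a valid Python identifier, or when the name collides with an existing `DataFrame` method or attribute. For example, `data.pop` refers to the `DataFrame.pop()` method, not to the `'pop'` column. For column assignment, **avoid** attribute-style access and use `data['column'] = value` instead.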
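
The two pitfalls of attribute-style access can be sketched as follows, on a small stand-in frame (`df` and its two rows are assumptions for illustration). The `UserWarning` that pandas emits on attribute assignment is silenced here only to keep the output clean.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'area': [423967, 695662],
                   'pop': [38332521, 26448193]})

# 'pop' collides with the DataFrame.pop method, so df.pop is the method,
# while df['pop'] is the column
pop_is_method = callable(df.pop)
pop_is_series = isinstance(df['pop'], pd.Series)

# Assigning through an attribute does not create a column
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df.density = df['pop'] / df['area']
created_by_attribute = 'density' in df.columns

# Item-style assignment does create the column
df['density'] = df['pop'] / df['area']
created_by_item = 'density' in df.columns

print(pop_is_method, pop_is_series)
print(created_by_attribute, created_by_item)
```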

In [10]:
data['density'] = data['pop']/data['area']
data

              area       pop     density
California  423967  38332521   90.413926
Texas       695662  26448193   38.018740
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
Illinois    149995  12882135   85.883763


- **DataFrame as a 2D array**
We can also view a `DataFrame` as an enhanced two-dimensional array. The raw underlying data array is available through the `values` attribute.

In [11]:
# Printing the values in the array using values attribute
print(data.values)

# Transpose of dataframe
print(data.T)

[[4.23967000e+05 3.83325210e+07 9.04139261e+01]
 [6.95662000e+05 2.64481930e+07 3.80187404e+01]
 [1.41297000e+05 1.96511270e+07 1.39076746e+02]
 [1.70312000e+05 1.95528600e+07 1.14806121e+02]
 [1.49995000e+05 1.28821350e+07 8.58837628e+01]]
           California         Texas      New York       Florida      Illinois
area     4.239670e+05  6.956620e+05  1.412970e+05  1.703120e+05  1.499950e+05
pop      3.833252e+07  2.644819e+07  1.965113e+07  1.955286e+07  1.288214e+07
density  9.041393e+01  3.801874e+01  1.390767e+02  1.148061e+02  8.588376e+01


In [12]:
# Access a row using a single index
print(data.values[0])

# Access a column using a single "index"
print(data['area'])

[4.23967000e+05 3.83325210e+07 9.04139261e+01]
California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64
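
The raw array above prints the integer `area` values in floating-point notation, while the `area` column on its own stays integer. A minimal sketch of why, on a two-row stand-in frame: a single NumPy array holds one dtype, so mixed integer and float columns are upcast. It uses `to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import pandas as pd

df = pd.DataFrame({'area': [423967, 695662],
                   'density': [90.413926, 38.018740]})

# Inside the DataFrame, each column keeps its own dtype
area_kind = df['area'].dtype.kind        # 'i' for integer
density_kind = df['density'].dtype.kind  # 'f' for float

# Converting to one array forces a common dtype: integers become floats
arr = df.to_numpy()

print(area_kind, density_kind)
print(arr.dtype)  # float64
```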


### Using .loc[ ], .iloc[ ] Indexer attributes

In [13]:
# .iloc[] uses implicit (positional) indexing; the end position is excluded
print(data.iloc[:3, :2])

# .loc[] uses explicit (label) indexing; the end label is included
print(data.loc[:'Illinois', :'pop'])

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


In [14]:
# Masking and fancy indexing in DataFrames
print(data.loc[data.density > 100, ['pop', 'density']])

               pop     density
New York  19651127  139.076746
Florida   19552860  114.806121


In [15]:
# Indexing refers to columns, slicing refers to rows
print(data['Florida':'Illinois'])

            area       pop     density
Florida   170312  19552860  114.806121
Illinois  149995  12882135   85.883763


In [16]:
# Slicing can refer to rows by number too
print(data[1:3]) # Prints the second and third row

            area       pop     density
Texas     695662  26448193   38.018740
New York  141297  19651127  139.076746


In [17]:
# Direct masking is interpreted row-wise rather than column-wise
print(data[data.density > 100])

            area       pop     density
New York  141297  19651127  139.076746
Florida   170312  19552860  114.806121
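
Each of the three shorthand forms above has an explicit equivalent. The sketch below rebuilds the same table and checks, with `DataFrame.equals`, that the shorthand and the `loc`/`iloc` spellings select the same rows; the `same_*` variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {'area': [423967, 695662, 141297, 170312, 149995],
     'pop': [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])
data['density'] = data['pop'] / data['area']

# Label slice on rows: same result as .loc
same_label_slice = data['Florida':'Illinois'].equals(
    data.loc['Florida':'Illinois'])

# Integer slice on rows: same result as .iloc
same_position_slice = data[1:3].equals(data.iloc[1:3])

# Boolean mask on rows: same result as .loc with the mask
mask = data['density'] > 100
same_mask = data[mask].equals(data.loc[mask])

print(same_label_slice, same_position_slice, same_mask)
```

Writing the `loc`/`iloc` form makes it clear to a reader whether labels or positions are meant, at the cost of a few extra characters.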


---