# 04 - pandas gotchas

* [02-data-indexing-and-selection.ipynb](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.02-Data-Indexing-and-Selection.ipynb) -- VanderPlas


In [None]:
import pandas as pd
import numpy as np

## Indexing with pandas

"...slicing and indexing conventions can be a source of confusion. For example, if your Series has an explicit integer index, an indexing operation such as data[1] will use the explicit indices, while a slicing operation like data[1:3] will use the implicit Python-style index."
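The situation described in the quote can be reproduced with a tiny Series. This is a minimal sketch; the Series `s` and its values are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Series with an explicit integer index that does not start at 0.
s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

print(s[1])      # plain indexing: looks up the index label 1
print(s[1:3])    # plain slicing: uses positions, Python-style

# .loc and .iloc make the intent explicit and avoid the ambiguity.
print(s.loc[1])      # always label-based
print(s.iloc[1:3])   # always position-based
```

Using `.loc` and `.iloc` is the usual way to sidestep this confusion, since neither depends on what kind of index the Series happens to have.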

In [None]:
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
                 'New York': 19651127, 'Florida': 19552860,
                 'Illinois': 12882135})
data = pd.DataFrame({'area':area, 'pop':pop})
data

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


In [None]:
data['pop']

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: pop, dtype: int64

In [None]:
data.area

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64

In [None]:
data.area is data['area']

True

In [None]:
data.pop is data['pop']

False
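The `False` here comes from a name collision: `DataFrame` already defines a `pop()` method, so attribute access returns that method rather than the column. A minimal sketch on a small stand-in frame (the name `df` and its values are made up for illustration):

```python
import pandas as pd

# Small stand-in frame; the values are illustrative only.
df = pd.DataFrame({'area': [1, 2], 'pop': [10, 20]})

# Attribute access finds the DataFrame.pop method, not the 'pop' column.
print(callable(df.pop))

# Bracket access always reaches the column.
print(isinstance(df['pop'], pd.Series))
```

Bracket indexing is the safer habit whenever a column name could shadow a DataFrame attribute or method.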

## pandas indexing can be confusing

From Cell 10 of the VanderPlas notebook: "Among these, slicing may be the source of the most confusion. Notice that when slicing with an explicit index (i.e., data['a':'c']), the final index is included in the slice, while when slicing with an implicit index (i.e., data[0:2]), the final index is excluded from the slice."
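The inclusive versus exclusive endpoint is easy to check on a small Series. A minimal sketch; the Series `s` and its values are made up for illustration.

```python
import pandas as pd

s = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Slice by explicit labels: the endpoint 'c' is included.
by_label = s['a':'c']

# Slice by implicit positions: the endpoint position 2 is excluded.
by_position = s[0:2]

print(list(by_label.index))
print(list(by_position.index))
```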

In [None]:
# This returns a...?  A: pandas series containing a column
data['pop']

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: pop, dtype: int64

In [None]:
# This returns a...?  A: DataFrame with the first row
data[:1]

              area       pop
California  423967  38332521


In [None]:
# This returns a...?   A: empty DataFrame with column names
data[:0]

Empty DataFrame
Columns: [area, pop]
Index: []


In [None]:
# This returns a...?   A: KeyError
#data[0]

In [None]:
# This returns the entire DataFrame -- copy or slice?
data.loc[:, :'pop']

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


In [None]:
# Given the previous cell, is the following line True or False?  A: False
data.loc[:, :'pop'] is data

False

## Chained indexing

We've used method chaining extensively.  What about chained indexing? 

In general, be careful!

In [None]:
data[:1]

              area       pop
California  423967  38332521


In [None]:
data[:1]['pop']

California    38332521
Name: pop, dtype: int64

In [None]:
data[:1]['pop'] = 3

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


The previous cell generates a SettingWithCopyWarning.  

Can you predict the result from the next cell?

In [None]:
data

              area       pop
California  423967         3
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135
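The warning text itself points at the fix: select the row and the column in a single `.loc` call so the assignment targets the original frame. A minimal sketch on a two-row version of the table, trimmed for brevity:

```python
import pandas as pd

data = pd.DataFrame(
    {'area': [423967, 695662], 'pop': [38332521, 26448193]},
    index=['California', 'Texas'],
)

# One indexing call, one assignment: no intermediate object is involved.
data.loc['California', 'pop'] = 3

# Positional equivalent of the chained form data[:1]['pop'] = 3.
data.iloc[:1, data.columns.get_loc('pop')] = 3

print(data)
```

Whether the chained form modifies `data` can depend on the pandas version and its copy-on-write setting, which is why the single-call form is the one to rely on.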


## Returning a view or copy

* [returning a view versus a copy](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy) -- pandas.pydata.org
* [why does assignment fail when using chained indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#why-does-assignment-fail-when-using-chained-indexing) -- pandas.pydata.org
* [how to deal with SettingWithCopyWarning](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) -- Stack Overflow
  * the good: this Stack Overflow exchange points to the documentation
  * the bad: the Stack Overflow answer tells you how to suppress the warning!!!
* moral:
  * don't suppress warnings
  * make sure you understand what you're doing with pandas indexing (and in general)
  * don't trust Stack Overflow!!



In [None]:
data = pd.DataFrame({'area':area, 'pop':pop})

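# Binding the chained selection to a name and then rebinding that name
# to 3 never assigns into `data`, so `data` is unchanged below.
df = data[:1]['pop']
df = 3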

data

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


In [None]:
data[:1]['pop'] = 3

data

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


              area       pop
California  423967         3
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


## Indexers: loc, iloc, and ix

See Cell 10 in [03.02-Data-Indexing-and-Selection](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.02-Data-Indexing-and-Selection.ipynb)

* [What's new in v1.0.0 (Jan 29, 2020)](https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html)
* [v1.0.0 release notes](https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html#removal-of-prior-version-deprecations-changes)

In [None]:
# Can you predict the result of the next line?
# A: AttributeError on pandas >= 1.0.0, where the .ix indexer was removed
# data.ix[:1, 'pop']
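
The `.ix` indexer was removed in pandas 1.0.0 (see the release notes linked above), so the same selection has to be written with `.iloc` or `.loc`. A sketch, assuming the intent of the removed line was "first row by position, `pop` column by label", shown on a two-row version of the table:

```python
import pandas as pd

data = pd.DataFrame(
    {'area': [423967, 695662], 'pop': [38332521, 26448193]},
    index=['California', 'Texas'],
)

# Positional rows, label column: translate the label to a position first.
by_position = data.iloc[:1, data.columns.get_loc('pop')]

# Label-based: translate the positional slice to row labels first.
by_label = data.loc[data.index[:1], 'pop']

print(by_position.equals(by_label))
```

Either spelling keeps the selection in a single indexing call, which also avoids the chained-indexing problem discussed earlier.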