# Pandas

## Sections
1. <a href="#intro">Introduction</a>
2. <a href="#df">DataFrame</a>
3. <a href="#index">Data Indexing and Selection</a>
4. <a href="#indexing">Array indexing</a>

<a id='intro'/>

## 1. Introduction

In [None]:
import pandas as pd

pd.__version__

A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:

In [None]:
## Pandas Series Object
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

As we see in the preceding output, the Series wraps both a sequence of values and a
sequence of indices, which we can access with the values and index attributes.

In [None]:
data.values

In [None]:
data.index

In [None]:
""" As with a NumPy array, data can be accessed by its associated index using the familiar
Python square-bracket notation """
data[1]

In [None]:
#Slicing 
data[1:3]

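The essential difference between a Series and a one-dimensional NumPy array is the
presence of the index: while the NumPy array has an implicitly defined integer index used
to access the values, the Pandas Series has an explicitly defined index associated with
the values. This explicit index need not be an integer; it can hold values of any type, such as strings.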

In [None]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data['b']

In [None]:
# We can even use noncontiguous or nonsequential indices.
# Here data[5] looks up the label 5, not the position 5.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])
data[5]

In [None]:
""" We can make the Series-as-dictionary analogy even clearer by constructing a
Series object directly from a Python dictionary """

population_dict = {'California': 38332521,
'Texas': 26448193,
'New York': 19651127,
'Florida': 19552860,
'Illinois': 12882135}
population = pd.Series(population_dict)
population

In [None]:
population['California': 'Illinois']
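
The dictionary analogy and the array-style slicing shown above can be compared side by side. The following is a minimal sketch that assumes a pandas version that preserves dictionary insertion order; the variable names `texas`, `by_label`, and `by_position` are illustrative only.

```python
import pandas as pd

# Same state populations as in the cells above.
population = pd.Series({'California': 38332521,
                        'Texas': 26448193,
                        'New York': 19651127,
                        'Florida': 19552860,
                        'Illinois': 12882135})

# Dictionary-style access: look up a value by its key (label).
texas = population['Texas']

# Label-based slice: the stop label 'New York' is included.
by_label = population['California':'New York']

# Position-based slice via iloc: the stop position 2 is excluded.
by_position = population.iloc[0:2]

print(texas)
print(by_label)
print(by_position)
```

Note that slicing by label includes the final label, unlike ordinary Python slicing by position.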

<a id = 'df' />

## 2. DataFrame 
The next fundamental structure in Pandas is the DataFrame. Like the Series object
discussed in the previous section, the DataFrame can be thought of either as a generalization
of a NumPy array, or as a specialization of a Python dictionary. We’ll now
take a look at each of these perspectives.

In [None]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
'Florida': 170312, 'Illinois': 149995}

In [None]:
area = pd.Series(area_dict)

In [None]:
states = pd.DataFrame({'population': population,
'area': area})

In [None]:
states

In [None]:
"""Like the Series object, the DataFrame has an index attribute that gives access to the
index labels"""
states.index

In [None]:
"""Additionally, the DataFrame has a columns attribute, which is an Index object holding
the column labels"""

states.columns

### 2.1 DataFrame as a specialized dictionary

In [None]:
states['area']

Notice the potential point of confusion here: in a two-dimensional NumPy array,
data[0] returns the first row, whereas for a DataFrame, data['col0'] returns the column
labeled 'col0'. Because of this, it is probably better to think of DataFrames as generalized
dictionaries rather than generalized arrays.
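
The difference can be seen in a small sketch. The array contents and the column names `col0` and `col1` are made up for illustration.

```python
import numpy as np
import pandas as pd

# A small 2x2 array and a DataFrame wrapping the same values.
arr = np.array([[1, 2],
                [3, 4]])
df = pd.DataFrame(arr, columns=['col0', 'col1'])

# NumPy: a single integer index selects a row.
first_row = arr[0]

# DataFrame: a single key selects a column, like a dictionary lookup.
first_col = df['col0']

print(first_row)
print(first_col)
```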

### 2.2 Pandas Index Object
We have seen here that both the Series and DataFrame objects contain an explicit
index that lets you reference and modify data. This Index object is an interesting
structure in itself, and it can be thought of either as an immutable array or as an
ordered set (technically a multiset, as Index objects may contain repeated values).
Those views have some interesting consequences in the operations available on Index
objects. As a simple example, let’s construct an Index from a list of integers:

In [None]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

In [None]:
# Index is immutable: this assignment is expected to raise a TypeError
ind[1] = 0
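
To observe the immutability without stopping the notebook, the assignment can be wrapped in a `try`/`except`. This is a sketch; the flag name `mutated` is illustrative.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

try:
    ind[1] = 0          # item assignment is not supported on an Index
    mutated = True
except TypeError as err:
    mutated = False
    print("Assignment rejected:", err)

# Reading by position still works as usual.
second = ind[1]
print(second)
```

Immutability makes it safer to share one Index between several Series and DataFrames, since no holder can change it in place.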

### 2.3 Index as ordered set
Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic. The Index object follows many of the conventions used by Python’s built-in set data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way.

In [None]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
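# Use the set methods: in recent pandas versions the operators &, | and ^
# act element-wise on an Index rather than as set operations.
indA.intersection(indB)  # intersection

In [None]:
indA.union(indB)  # union

In [None]:
indA.symmetric_difference(indB)  # symmetric difference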
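
One place where this set arithmetic shows up is index alignment. As a hedged illustration (the Series `s1` and `s2` are made up for this sketch), adding two Series with partly overlapping labels yields a result indexed by the union of the labels, with values only where the labels intersect:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0], index=['a', 'b'])
s2 = pd.Series([10.0, 20.0], index=['b', 'c'])

# Arithmetic aligns on labels: the result covers the union of both indices,
# and labels missing from either operand produce NaN.
total = s1 + s2

# Only labels in the intersection have a value in both operands.
shared = s1.index.intersection(s2.index)

print(total)
print(shared)
```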

<a id ='index' />

## 3. Data Indexing and Selection

In [None]:
# Data selection in DF
area = pd.Series({'California': 423967, 'Texas': 695662,
'New York': 141297, 'Florida': 170312,
'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
'New York': 19651127, 'Florida': 19552860,
'Illinois': 12882135})
data = pd.DataFrame({'area':area, 'pop':pop})
data

In [None]:
data['area']

In [None]:
# Membership operator
'area' in data

In [None]:
# Keys (column labels) of the DataFrame
data.keys()

DataFrame objects can also be modified with a dictionary-like syntax. Just as you can
extend a dictionary by assigning to a new key, or a Series by assigning to a new index
value, you can extend a DataFrame by assigning to a new column name:

In [None]:
data['density'] = data['pop'] / data['area']
data
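
The same dictionary-style assignment applies to a Series. A minimal sketch, using a small made-up Series `s`:

```python
import pandas as pd

s = pd.Series([0.25, 0.5, 0.75], index=['a', 'b', 'c'])

# Assigning to a label that does not exist yet extends the Series.
s['d'] = 1.0

# Assigning to an existing label overwrites the stored value.
s['a'] = 0.0

print(s)
```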

In [None]:
# We can examine the raw underlying data array using the values attribute
data.values

In [None]:
# we can transpose the full DataFrame to swap rows and columns
data.T

<a id = 'indexing'/>

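## 4. Array indexing

For array-style indexing, we need another convention. Pandas provides the loc and iloc indexers for this: iloc selects by implicit integer position, and loc selects by explicit label.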

In [None]:
data.iloc[:3, :2]  # implicit integer positions; the stop position is excluded

In [None]:
data.loc[:'Illinois', :'pop']  # explicit labels; the stop label is included
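
The two indexers can select the same block of data. The sketch below rebuilds a two-column version of `data` so that it runs on its own; the names `by_position` and `by_label` are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {'area': [423967, 695662, 141297, 170312, 149995],
     'pop': [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])

# iloc: integer positions, stop excluded (rows 0-2, columns 0-1).
by_position = data.iloc[:3, :2]

# loc: labels, stop included (rows up to 'New York', columns up to 'pop').
by_label = data.loc[:'New York', :'pop']

print(by_position)
print(by_position.equals(by_label))
```

Because loc includes its stop label, `data.loc[:'Illinois', :'pop']` returns all five rows here.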