# Python Pandas
## Exploring, viewing and selecting data

* 1 - Select, Dice and Slice
  * 1.1 - Data overview
    * 1.1.1 - Top rows `df.head()`
    * 1.1.2 - Bottom rows `df.tail()`
    * 1.1.3 - Change column order
  * 1.2 - Select columns
    * 1.2.1 - Select single columns
    * 1.2.2 - Select multiple columns
  * 1.3 - Select rows
    * 1.3.1 - Indexing
    * 1.3.2 - Select single row
    * 1.3.3 - Select multiple rows
  * 1.4 - Row and column selection
    * 1.4.1 - Select single column and row
    * 1.4.2 - Select multiple rows and columns

[Official Documentation](https://pandas.pydata.org/pandas-docs/stable/index.html)

# 1 - Select, Dice and Slice
Methods to view and select specific rows and/or columns from a given `DataFrame` (or `Series`).

In [2]:
import pandas as pd

## 1.1 - Data Overview
### 1.1.1 - df.head()
By default, `df.head()` shows the top five rows of a `DataFrame` along with the header of each column. The `head` function gives a quick view of the data set, which is useful when the data set is particularly large. Passing a number inside the brackets tells the function to show that number of rows from the top of the data set.
 * `df.head(3)` - will show the top three rows
 * `df.head(6)` - will show the top six rows
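
As a rough illustration of how the row-count argument behaves, here is a minimal sketch on a small hypothetical frame. The name `sample` and its values are assumptions made for this example; they are not the `Market.csv` data.

```python
import pandas as pd

# Hypothetical six-row frame used only for illustration
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C", "Fund D", "Fund E", "Fund F"],
    "NPV": [11.85, 7.29, 2.25, 5.15, 2352.82, 40.08],
})

default_head = sample.head()  # no argument: the first five rows
top_three = sample.head(3)    # explicit argument: the first three rows

print(default_head.shape)  # (5, 2)
print(top_three.shape)     # (3, 2)
```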
 
Start by loading the `Market.csv` dataset.

In [3]:
Market = pd.read_csv("Market.csv")
print(Market.shape)

(21, 4)


The `DataFrame` above has 21 rows, and the goal is to see the top five. Five is the default number of rows.

In [4]:
Market.head()

,Fund,NPV,Market Value,Fund NAV
0,Fund A,11.85,11.85,10.0
1,Fund B,7.29,7.29,7.1
2,Fund C,2.25,2.25,2.19
3,Fund D,5.15,5.15,5.07
4,Fund E,2352.82,2319.32,2333.72


Show the top 3 observations.

In [5]:
Market.head(3)

,Fund,NPV,Market Value,Fund NAV
0,Fund A,11.85,11.85,10.0
1,Fund B,7.29,7.29,7.1
2,Fund C,2.25,2.25,2.19


### 1.1.2 - df.tail()
Similar to `df.head()`, except that it shows the bottom five rows by default. The number inside the brackets works the same way as for `head`, except that the rows are taken from the bottom of the data set instead of the top.

In [65]:
Market.tail()

,Fund,NPV,Market Value,Fund NAV
16,Fund R,4.67,4.67,4.7
17,Fund S,610.68,610.69,613.99
18,Fund T,909.71,909.7,915.04
19,Fund U,21.03,20.86,21.66
20,Fund V,21.92,21.84,47.23
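
To show the row-count argument for `tail`, here is a minimal sketch on a hypothetical six-row frame. The name `sample` is an assumption for this example, and the values are illustrative.

```python
import pandas as pd

# Hypothetical six-row frame used only for illustration
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C", "Fund D", "Fund E", "Fund F"],
    "NPV": [11.85, 7.29, 2.25, 5.15, 2352.82, 40.08],
})

bottom_two = sample.tail(2)  # the last two rows, original index labels kept

print(bottom_two.index.tolist())    # [4, 5]
print(bottom_two["Fund"].tolist())  # ['Fund E', 'Fund F']
```

Note that the rows keep their original index labels; `tail` does not renumber them.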


### 1.1.3 - Change column order

In [67]:
Market[['Fund NAV', 'Market Value', 'NPV', 'Fund']].head()

,Fund NAV,Market Value,NPV,Fund
0,10.0,11.85,11.85,Fund A
1,7.1,7.29,7.29,Fund B
2,2.19,2.25,2.25,Fund C
3,5.07,5.15,5.15,Fund D
4,2333.72,2319.32,2352.82,Fund E


## 1.2 - Select Columns
### 1.2.1 - Select a single column
There are a couple of ways to select a single column: by name or by position.
 * Selection by attribute (`Market.Fund`) or by column name in brackets

In [7]:
# Market.Fund
Market[['Fund']].head()

,Fund
0,Fund A
1,Fund B
2,Fund C
3,Fund D
4,Fund E


 * Selection by label (`.loc`)

In [8]:
Market.loc[:, ['Fund']].head()

,Fund
0,Fund A
1,Fund B
2,Fund C
3,Fund D
4,Fund E


 * Selection by position (`.iloc`)
 
 With the `.iloc` indexer, positions are zero-based: the first column corresponds to position `0`, and the second column corresponds to position `1`.

In [9]:
Market.iloc[: , [0]].head()

,Fund
0,Fund A
1,Fund B
2,Fund C
3,Fund D
4,Fund E
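
The three approaches above return the same one-column `DataFrame`. The sketch below compares them on a small hypothetical frame (the name `sample` is an assumption for this example) and also shows that single brackets or attribute access return a `Series` instead.

```python
import pandas as pd

# Hypothetical frame used only for illustration
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C"],
    "NPV": [11.85, 7.29, 2.25],
    "Market Value": [11.85, 7.29, 2.25],
})

as_series = sample["Fund"]          # single brackets -> Series
as_attribute = sample.Fund          # attribute access -> same Series
as_frame = sample[["Fund"]]         # list of names -> one-column DataFrame
by_label = sample.loc[:, ["Fund"]]  # all rows, column selected by label
by_position = sample.iloc[:, [0]]   # all rows, column selected by position

print(type(as_series).__name__)      # Series
print(type(as_frame).__name__)       # DataFrame
print(as_frame.equals(by_label))     # True
print(as_frame.equals(by_position))  # True
```

Attribute access only works when the column name is a valid Python identifier, so a column such as `Market Value` has to be selected with brackets.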


### 1.2.2 - Select multiple columns
Several columns can be selected at once, either by listing them or by giving a range.
 * Selection by name
     * Specify which columns to select

Check the available columns with `df.columns`.

In [13]:
Market.columns

Index(['Fund', 'NPV', 'Market Value', 'Fund NAV'], dtype='object')

Select `Fund` and `Market Value` columns

In [17]:
Market[['Fund', 'Market Value']].head()

,Fund,Market Value
0,Fund A,11.85
1,Fund B,7.29
2,Fund C,2.25
3,Fund D,5.15
4,Fund E,2319.32


Select the same columns by label with the `.loc` indexer.
Note that inside the outer square brackets, the `:` on the left-hand side of the comma selects all rows, and the list on the right-hand side of the comma specifies the `Fund` and `Market Value` columns.

In [19]:
Market.loc[:, ['Fund', 'Market Value']].head()

,Fund,Market Value
0,Fund A,11.85
1,Fund B,7.29
2,Fund C,2.25
3,Fund D,5.15
4,Fund E,2319.32


* Selection by label
     * Specify a range of columns to select, from `Fund` to `Market Value` (both endpoints are included)

In [21]:
Market.loc[:, 'Fund': 'Market Value'].head()

,Fund,NPV,Market Value
0,Fund A,11.85,11.85
1,Fund B,7.29,7.29
2,Fund C,2.25,2.25
3,Fund D,5.15,5.15
4,Fund E,2352.82,2319.32


* Selection by position
   * Specify a range of column positions to select

The end position of the slice is excluded, so `0:3` selects the columns at positions 0, 1 and 2.

In [26]:
Market.iloc[:, 0:3].head()

,Fund,NPV,Market Value
0,Fund A,11.85,11.85
1,Fund B,7.29,7.29
2,Fund C,2.25,2.25
3,Fund D,5.15,5.15
4,Fund E,2352.82,2319.32
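
The two range selections above give the same result because `.loc` slices include the end label while `.iloc` slices exclude the end position. A minimal sketch on a hypothetical frame (the name `sample` is an assumption for this example):

```python
import pandas as pd

# Hypothetical frame with the same column layout as the Market data
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C"],
    "NPV": [11.85, 7.29, 2.25],
    "Market Value": [11.85, 7.29, 2.25],
    "Fund NAV": [10.0, 7.1, 2.19],
})

by_label = sample.loc[:, "Fund":"Market Value"]  # end label is included
by_position = sample.iloc[:, 0:3]                # end position 3 is excluded

print(by_label.columns.tolist())     # ['Fund', 'NPV', 'Market Value']
print(by_position.columns.tolist())  # ['Fund', 'NPV', 'Market Value']
```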


## 1.3 - Select Rows
### 1.3.1 - Indexing
 * The `Fund` column is set as the index so that rows can be selected by label.

In [28]:
iMarket = Market.set_index('Fund')
iMarket.head()

Fund,NPV,Market Value,Fund NAV
Fund A,11.85,11.85,10.0
Fund B,7.29,7.29,7.1
Fund C,2.25,2.25,2.19
Fund D,5.15,5.15,5.07
Fund E,2352.82,2319.32,2333.72


In [29]:
iMarket.index

Index(['Fund A', 'Fund B', 'Fund C', 'Fund D', 'Fund E', 'Fund F', 'Fund H',
       'Fund I', 'Fund J', 'Fund K', 'Fund L', 'Fund M', 'Fund N', 'Fund O',
       'Fund P', 'Fund Q', 'Fund R', 'Fund S', 'Fund T', 'Fund U', 'Fund V'],
      dtype='object', name='Fund')
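
As a small sketch of what `set_index` does, the example below uses a hypothetical frame (the names `sample` and `indexed` are assumptions for this example). It returns a new `DataFrame` and leaves the original unchanged, which is why the result is assigned to a new variable above.

```python
import pandas as pd

# Hypothetical frame used only for illustration
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C"],
    "NPV": [11.85, 7.29, 2.25],
})

indexed = sample.set_index("Fund")  # "Fund" moves from the columns to the index

print(sample.columns.tolist())   # ['Fund', 'NPV']  (original is unchanged)
print(indexed.columns.tolist())  # ['NPV']
print(indexed.index.name)        # Fund
```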

### 1.3.2 - Select single row
 * Selection by label

In [32]:
# iMarket.loc[['Fund B']]
iMarket.loc[['Fund B'], :]

Fund,NPV,Market Value,Fund NAV
Fund B,7.29,7.29,7.1


In [36]:
Market.loc[[1], :]

,Fund,NPV,Market Value,Fund NAV
1,Fund B,7.29,7.29,7.1


In [38]:
# Market.iloc[[1]]
Market.iloc[[1], :]

,Fund,NPV,Market Value,Fund NAV
1,Fund B,7.29,7.29,7.1
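
The cells above pass the row label or position inside a list, which keeps the result as a one-row `DataFrame`. The sketch below, on a hypothetical indexed frame (the names `sample` and `indexed` are assumptions for this example), contrasts that with passing a bare label, which returns a `Series`.

```python
import pandas as pd

# Hypothetical frame used only for illustration
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C"],
    "NPV": [11.85, 7.29, 2.25],
    "Market Value": [11.85, 7.29, 2.25],
})
indexed = sample.set_index("Fund")

row_frame = indexed.loc[["Fund B"]]  # list of labels -> one-row DataFrame
row_series = indexed.loc["Fund B"]   # bare label -> Series of that row's values
row_by_pos = indexed.iloc[[1]]       # position 1 is the second row

print(type(row_frame).__name__)      # DataFrame
print(type(row_series).__name__)     # Series
print(row_frame.equals(row_by_pos))  # True
```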


### 1.3.3 - Select multiple rows
 * Selection by label
     * Specify each row to be selected

In [40]:
# Market.loc[[1, 5, 6]]
Market.loc[[1, 5, 6], :]

,Fund,NPV,Market Value,Fund NAV
1,Fund B,7.29,7.29,7.1
5,Fund F,40.08,40.07,39.83
6,Fund H,307.36,303.47,305.55


In [42]:
# iMarket.loc[['Fund B', 'Fund F', 'Fund H']]
iMarket.loc[['Fund B', 'Fund F', 'Fund H'], :]

Fund,NPV,Market Value,Fund NAV
Fund B,7.29,7.29,7.1
Fund F,40.08,40.07,39.83
Fund H,307.36,303.47,305.55


Select a range of rows, from `Fund A` to `Fund F`

In [46]:
# iMarket.loc['Fund A': 'Fund F']
# iMarket.loc[: 'Fund F', :]
iMarket.loc['Fund A': 'Fund F', :]

Fund,NPV,Market Value,Fund NAV
Fund A,11.85,11.85,10.0
Fund B,7.29,7.29,7.1
Fund C,2.25,2.25,2.19
Fund D,5.15,5.15,5.07
Fund E,2352.82,2319.32,2333.72
Fund F,40.08,40.07,39.83


Range from `Fund R` to `Fund V`

In [49]:
# iMarket.loc['Fund R': 'Fund V']
# iMarket.loc['Fund R': , :]
iMarket.loc['Fund R': 'Fund V', :]

Fund,NPV,Market Value,Fund NAV
Fund R,4.67,4.67,4.7
Fund S,610.68,610.69,613.99
Fund T,909.71,909.7,915.04
Fund U,21.03,20.86,21.66
Fund V,21.92,21.84,47.23
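
The commented-out alternatives in the two cells above rely on open-ended slices. As a minimal sketch on a hypothetical indexed frame (the names `sample` and `indexed` are assumptions for this example), omitting the start or the end of a label slice means "from the first row" or "to the last row":

```python
import pandas as pd

# Hypothetical frame used only for illustration
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C", "Fund D", "Fund E", "Fund F"],
    "NPV": [11.85, 7.29, 2.25, 5.15, 2352.82, 40.08],
})
indexed = sample.set_index("Fund")

first_three = indexed.loc["Fund A":"Fund C"]  # explicit start and end labels
same_first_three = indexed.loc[:"Fund C"]     # start omitted: begins at the first row
last_two = indexed.loc["Fund E":]             # end omitted: runs to the last row

print(first_three.equals(same_first_three))  # True
print(last_two.index.tolist())               # ['Fund E', 'Fund F']
```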


## 1.4 - Row and Column Selection
Use the indexed dataset `iMarket`.

### 1.4.1 - Select single column and row
Example: what is the Market Value of Fund C?

In [51]:
iMarket.loc[['Fund C'], ['Market Value']]

Fund,Market Value
Fund C,2.25


In [54]:
iMarket.iloc[[2], [1]]

Fund,Market Value
Fund C,2.25
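
Both cells above return a 1x1 `DataFrame` because the row and the column are passed as lists. If only the value itself is needed, passing a bare label for each returns a scalar. A minimal sketch on a hypothetical indexed frame (the names `sample` and `indexed` are assumptions for this example):

```python
import pandas as pd

# Hypothetical frame used only for illustration
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C"],
    "NPV": [11.85, 7.29, 2.25],
    "Market Value": [11.85, 7.29, 2.25],
})
indexed = sample.set_index("Fund")

cell_frame = indexed.loc[["Fund C"], ["Market Value"]]  # lists -> 1x1 DataFrame
cell_value = indexed.loc["Fund C", "Market Value"]      # bare labels -> scalar

print(cell_frame.shape)  # (1, 1)
print(cell_value)        # 2.25
```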


### 1.4.2 - Select multiple rows and columns

What are the NPV and the NAV for funds B and F?

In [60]:
iMarket.loc[['Fund B', 'Fund F'], ['NPV', 'Fund NAV']]

Fund,NPV,Fund NAV
Fund B,7.29,7.1
Fund F,40.08,39.83


What are the NPV, Market Value and NAV for funds A to F?

In [63]:
iMarket.loc['Fund A': 'Fund F', 'NPV': 'Fund NAV']

Fund,NPV,Market Value,Fund NAV
Fund A,11.85,11.85,10.0
Fund B,7.29,7.29,7.1
Fund C,2.25,2.25,2.19
Fund D,5.15,5.15,5.07
Fund E,2352.82,2319.32,2333.72
Fund F,40.08,40.07,39.83
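
The same kind of row and column selection can also be written with positions. The sketch below uses a hypothetical indexed frame (the names `sample` and `indexed` are assumptions for this example) to compare a label-based range with its position-based equivalent, and to show a list-based selection.

```python
import pandas as pd

# Hypothetical frame with the same column layout as the Market data
sample = pd.DataFrame({
    "Fund": ["Fund A", "Fund B", "Fund C", "Fund D", "Fund E", "Fund F"],
    "NPV": [11.85, 7.29, 2.25, 5.15, 2352.82, 40.08],
    "Market Value": [11.85, 7.29, 2.25, 5.15, 2319.32, 40.07],
    "Fund NAV": [10.0, 7.1, 2.19, 5.07, 2333.72, 39.83],
})
indexed = sample.set_index("Fund")

# Label ranges include both endpoints; position ranges exclude the end
by_label = indexed.loc["Fund A":"Fund C", "NPV":"Fund NAV"]
by_position = indexed.iloc[0:3, 0:3]

# Lists pick out specific rows and columns
picked = indexed.loc[["Fund B", "Fund F"], ["NPV", "Fund NAV"]]

print(by_label.equals(by_position))  # True
print(picked.shape)                  # (2, 2)
```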
