<center><img src="../day03_numpy_advanced/images/python_day3.png" style="width: 500px;"/></center>


# NumPy: Advanced
***
### Selection, Indexing and Filters

# Program so far 
***
- Python Basics
- Python Programming Constructs
- Data Structures
- Functions
- Object Oriented Programming in Python
- NumPy
- Data Manipulation

# What are we going to learn today?
***
- Indexing and Selection
- Slicing
- Filters
- Pandas

# Refresher: Weather Data
***
```
Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa)
2012-01-01 00:00:00,-1.8,-3.9,86,4,8.0,101.24
2012-01-01 01:00:00,-1.8,-3.7,87,4,8.0,101.24
2012-01-01 02:00:00,-1.8,-3.4,89,7,4.0,101.26
2012-01-01 03:00:00,-1.5,-3.2,88,6,4.0,101.27
```

In [1]:
import numpy as np

weather = np.genfromtxt("data/weather_small_2012.csv", dtype='|S20', skip_header=1, delimiter=",")
print(type(weather))

<class 'numpy.ndarray'>


<img src="../day03_numpy_advanced/images/icon/Technical-Stuff.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
# Indexing and Selection
***

In [2]:
# indexing one dimensional array
import numpy as np

arr = np.arange(10)
print("Array:", arr)

# get the element at index 5
print("Element:", arr[5])

#Get values in a range
print("Slice:", arr[1:9:2])

Array: [0 1 2 3 4 5 6 7 8 9]
Element: 5
Slice: [1 3 5 7]


In [4]:
# indexing two dimensional array
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])

print (arr)
print (arr[1])    # select a row
print (arr[2][2]) #[row], [column]
print (arr[2,2])  # [row, column] 

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[4 5 6]
9
9


### What was the wind speed recorded at 1 AM on 1st January 2012 ?

In [5]:
# the time for which information is desired is at the second row in the data set
print(weather[1])

# wind speed is recorded in the fifth column (index 4)
print(weather[1][4])

# Alternatively, index the row and the column in a single step
print(weather[1, 4])  # preferred: one indexing operation instead of two

[b'2012-01-01 01:00:00' b'-1.8' b'-3.7' b'87' b'4' b'8.0' b'101.24']
b'4'
b'4'


### Multiple Indices along each dimension
***
In the previous example, select both the wind speed and visibility for the same time.

In [6]:
# the time for which information is desired is at the second row in the data set
print(weather[1])

# in this case, we want columns with indices 4 and 5, for the same row
print(weather[1, [4, 5]])

[b'2012-01-01 01:00:00' b'-1.8' b'-3.7' b'87' b'4' b'8.0' b'101.24']
[b'4' b'8.0']
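Lists of indices work along rows as well as columns. The sketch below uses a small made-up array (not the weather data) to show how index lists behave when they are combined; `np.ix_` is the NumPy helper for selecting a rectangular block of rows and columns.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# one row, several columns: elements (1, 0) and (1, 2)
row_pick = arr[1, [0, 2]]

# several rows, all columns: rows 0 and 2
rows_pick = arr[[0, 2]]

# two index lists are paired element by element: (0, 0) and (2, 2)
paired = arr[[0, 2], [0, 2]]

# np.ix_ builds the cross product instead: rows {0, 2} x columns {0, 2}
block = arr[np.ix_([0, 2], [0, 2])]

print(row_pick)
print(rows_pick)
print(paired)
print(block)
```

Note the difference between the last two selections: passing two plain lists pairs the indices up, while `np.ix_` selects every combination of them.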


<img src="../day03_numpy_advanced/images/icon/Technical-Stuff.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
# Slicing
***
Slicing works much like it does for Python lists. Syntax: **`arr[start:end:step]`**, where `start` is included and `end` is excluded.

In [7]:
# Slicing one dimensional array
arr = np.arange(10)
print (arr)

print (arr[0:3])

# start at index 1 and take every 3rd element
print (arr[1::3])

[0 1 2 3 4 5 6 7 8 9]
[0 1 2]
[1 4 7]


In [8]:
# Slicing two-dimensional array

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10],
                [11,12,13,14,15]])


# rows 1 to 4 in steps of 2 (only row 1 exists here), columns 1 to 3
print(arr[1:5:2, 1:4])

# notice that the output is also a 2d array

[[7 8 9]]


In [9]:
# Slicing matrix
arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10],
                [11,12,13,14,15]])

# 0th row to 2nd row , 1st column to last column
print (arr[:3, 1:])

[[ 2  3  4  5]
 [ 7  8  9 10]
 [12 13 14 15]]
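One difference from Python lists is worth knowing: a basic NumPy slice is a view onto the original data, not a copy. The following is a minimal sketch on a made-up array; `np.shares_memory` is used only to make the relationship visible.

```python
import numpy as np

arr = np.arange(10)

view = arr[2:5]                  # basic slice: a view onto arr's data
view[0] = 99                     # writes through to arr

independent = arr[5:8].copy()    # explicit copy: separate data
independent[0] = -1              # arr is not affected

print(arr)
print(np.shares_memory(arr, view))
print(np.shares_memory(arr, independent))
```

If a slice needs to be modified without touching the source array, calling `.copy()` on it first is the usual approach.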


### What were the temperatures, relative humidity and pressure recorded on Jan 6 ?
***
Hint: Recordings for Jan 6 are from index 120 to 143 inclusive

In [11]:
weather[120:144, [1, 3, 6]]

array([[b'-9.6', b'56', b'100.81'],
       [b'-10.0', b'55', b'100.81'],
       [b'-10.5', b'61', b'100.84'],
       [b'-10.6', b'64', b'100.76'],
       [b'-11.3', b'68', b'100.7'],
       [b'-11.8', b'71', b'100.61'],
       [b'-12.0', b'71', b'100.58'],
       [b'-14.4', b'85', b'100.52'],
       [b'-12.3', b'73', b'100.51'],
       [b'-12.5', b'71', b'100.53'],
       [b'-12.3', b'72', b'100.47'],
       [b'-12.0', b'72', b'100.36'],
       [b'-11.7', b'74', b'100.23'],
       [b'-11.9', b'74', b'100.13'],
       [b'-11.2', b'75', b'100.07'],
       [b'-11.5', b'79', b'100.06'],
       [b'-11.6', b'78', b'100.1'],
       [b'-11.2', b'78', b'100.15'],
       [b'-10.5', b'78', b'100.12'],
       [b'-10.5', b'79', b'100.13'],
       [b'-10.2', b'80', b'100.15'],
       [b'-9.5', b'79', b'100.13'],
       [b'-9.3', b'79', b'100.16'],
       [b'-9.0', b'79', b'100.15']], 
      dtype='|S20')

<img src="../day03_numpy_advanced/images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
# Filters
***
![Filter](images/filters1.jpg)
***
A filter is anything that takes in data, processes it, and provides an output. In NumPy, a filter is a boolean array that marks which elements to keep.

Input Data ⟶ Filter ⟶ Output Data

## Creating a Filter

In [12]:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])

below5_filter = (arr < 5)
print(below5_filter)

[[ True  True  True]
 [ True False False]
 [False False False]]


## Using a Filter
***
A filter can simply be used in an indexing operation (instead of numbers or slices)

In [13]:
arr[below5_filter]

array([1, 2, 3, 4])

### What were the date/times when the temperature was above 32 degrees and visibility over 30 km ?


In [14]:
# select the date time column
times = weather[:,0]

# create filter: temperature over 32 degrees
temperature_over_32 = weather[:,1].astype(np.float16) > 32

# create filter: visibility above 30 km
visibility_over_30 = weather[:,-2].astype(np.float16) > 30

# get the times
times[temperature_over_32 & visibility_over_30]

array([b'2012-07-13 14:00:00', b'2012-07-14 14:00:00',
       b'2012-07-14 15:00:00', b'2012-07-14 16:00:00',
       b'2012-07-14 17:00:00'], 
      dtype='|S20')
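Filters are combined with the element-wise operators `&` (and), `|` (or) and `~` (not); Python's `and` and `or` keywords raise an error on arrays with more than one element. The sketch below uses made-up readings, stored as bytes in the same way as the weather array, to show each operator.

```python
import numpy as np

# made-up readings stored as bytes, as genfromtxt returns them with dtype='|S20'
temps = np.array([b'31.5', b'33.0', b'34.2', b'29.9'], dtype='S20').astype(float)
visibility = np.array([b'48.3', b'25.0', b'32.2', b'48.3'], dtype='S20').astype(float)

hot = temps > 32
clear = visibility > 30

both = hot & clear       # True where both conditions hold
either = hot | clear     # True where at least one condition holds
not_hot = ~hot           # inverts the filter

print(both)
print(either)
print(not_hot)
print(both.sum())        # number of readings that satisfy both conditions
```

When the comparisons are written inline, each one needs its own parentheses, for example `(temps > 32) & (visibility > 30)`, because `&` binds more tightly than `>`.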

<img src="../day03_numpy_advanced/images/icon/Technical-Stuff.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
# Replacing values
***
We can also use comparisons to replace values in an array, based on certain conditions.

In [15]:
# Replacing Values
import numpy as np

vector = np.array([5, 10, 15, 20])
print (vector)

equal_to_ten_or_five = (vector == 10) | (vector == 5)
vector[equal_to_ten_or_five] = 50

print (vector)

[ 5 10 15 20]
[50 50 15 20]
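Assigning through a filter changes the array in place. When the original values should be kept, `np.where` is one alternative: it returns a new array and leaves the input untouched. A minimal sketch, reusing the same vector:

```python
import numpy as np

vector = np.array([5, 10, 15, 20])

# np.isin builds the same filter as (vector == 10) | (vector == 5)
mask = np.isin(vector, [5, 10])

# np.where(condition, value_if_true, value_if_false) returns a new array
replaced = np.where(mask, 50, vector)

print(vector)
print(replaced)
```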


# Introduction to Pandas
***
<img src="images/pandas.png"/>
***
One of the most widely used Python libraries for data analysis

It has nothing to do with cute bears
***
![cute_pandas](./images/cute_pandas.jpg)

<img src="../day03_numpy_advanced/images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
## Why Pandas ?
***
<img src="images/why-pandas.jpg" width="70%"/>

<img src="../day03_numpy_advanced/images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
## Features of Pandas
***
<img src="images/pandas-features.jpg" width="70%"/>

<img src="../day03_numpy_advanced/images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
# Pandas Data Structures
***
<img src="images/pandas-datastructures.jpg" width="70%"/>

<img src="../day03_numpy_advanced/images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
# Pandas Series
***
* Very similar to a NumPy array.

* What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position.
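To make the difference concrete, here is a small sketch with made-up values and labels. It uses `.loc` for label-based access and `.iloc` for position-based access, the two explicit indexers that pandas provides.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = s.loc['b']            # label-based lookup
by_position = s.iloc[1]          # position-based lookup, like a NumPy array

label_slice = s.loc['a':'b']     # label slices include the end label
position_slice = s.iloc[0:1]     # position slices exclude the end, as in NumPy

print(by_label, by_position)
print(label_slice)
print(position_slice)
```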

<img src="../day03_numpy_advanced/images/icon/Technical-Stuff.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
## Creating a Series
***
You can convert a list, numpy array, or dictionary to a Series.

<div class="alert alert-block alert-success">**From a List / NumPy array (without specifying an index)**</div>


In [16]:
import pandas as pd

my_list = [10, 20, 30]
series = pd.Series(my_list)

print(series)
print(series.index)
print(series.values)

0    10
1    20
2    30
dtype: int64
RangeIndex(start=0, stop=3, step=1)
[10 20 30]


<div class="alert alert-block alert-success">**From a List / NumPy array (with specifying an index)**</div>


In [17]:
# creating a series from numPy Array
import numpy as np
import pandas as pd

index = ['a','b','c']
arr = np.array([10,20,30])

pd.Series(data=arr,index=index)

a    10
b    20
c    30
dtype: int64

<div class="alert alert-block alert-success">**From a dictionary**</div>


In [18]:
# creating a series from dictionary
import pandas as pd

d = {'a':10, 'b':20, 'c':30}
pd.Series(d)

a    10
b    20
c    30
dtype: int64

<img src="../day03_numpy_advanced/images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br /> 
## Using Index in a Series
***
* The key to using a Series is understanding its index.

* Pandas makes use of these index names or numbers by allowing for **fast lookups** of information (works like a hash table or dictionary).


In [19]:
# Custom index
import pandas as pd
ser1 = pd.Series([1,2,3,4], index=['USA', 'Germany','USSR', 'Japan']) 
ser2 = pd.Series([1,2,5,4], index=['USA', 'Germany','Italy', 'Japan'])   

# get the value of 'USA'
print(ser1['USA'])

1
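The dictionary analogy extends to membership tests and safe lookups. The sketch below rebuilds the same `ser1` and looks up a label that is not in the index (`'France'` is just an example of a missing label).

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

# `in` checks the index labels, not the values
has_usa = 'USA' in ser1
has_france = 'France' in ser1

# .get returns a default instead of raising KeyError for a missing label
france_value = ser1.get('France', default=0)

print(has_usa, has_france, france_value)
```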


In [20]:
print(ser1 + ser2)

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64
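The `NaN` entries appear because pandas aligns the two Series by label before adding, and `'Italy'` and `'USSR'` each exist in only one of them. If a missing label should count as zero instead, `Series.add` with `fill_value` is one option, sketched here on the same data:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# a label missing from one Series is treated as 0 before adding
total = ser1.add(ser2, fill_value=0)

print(total)
```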


### Question: What were the pressure values recorded on Jan 6 ?
***
Last time we did this, we knew that the recordings for Jan 6 were from index 120 to 143 inclusive.
The solution was:

    weather[120:144, 6]

Can we do it without that knowledge (of indices) this time?

<div class="alert alert-block alert-success">**Solution 1: What were the first 5 pressure values recorded on Jan 6 ?**</div>


In [32]:
# genfromtxt loaded every field as bytes (dtype '|S20'), so decode the
# timestamps to str; otherwise str labels would not match the bytes index
index = weather[:, 0].astype(str)
# float64 keeps the recorded precision (float16 would round 101.24 to 101.25)
pressure_values = weather[:, -1].astype(np.float64)
weather_series = pd.Series(pressure_values, index)

indices = ['2012-01-06 {:02d}:00:00'.format(i) for i in range(5)]
print(indices)

print('-' * 40)

jan6_pressures = [weather_series[ind] for ind in indices]
print(jan6_pressures)

['2012-01-06 00:00:00', '2012-01-06 01:00:00', '2012-01-06 02:00:00', '2012-01-06 03:00:00', '2012-01-06 04:00:00']
----------------------------------------

<div class="alert alert-block alert-success">**Solution 2: What were the first 5 pressure values recorded on Jan 6 ?**</div>


***
Surely there must be a better way than using loops and falling back to Python lists?

In [33]:
# decode the bytes timestamps so the .str accessor can be used on the index
index = weather[:, 0].astype(str)
pressure_values = weather[:, -1].astype(np.float64)
weather_series = pd.Series(pressure_values, index)

# boolean filter: True for every label that starts with the Jan 6 date
jan6_filter = weather_series.index.str.startswith('2012-01-06')
print(jan6_filter)

print('-' * 40)

jan6_pressures = weather_series[jan6_filter]
print(jan6_pressures[:5])

<div class="alert alert-block alert-success">**Solution 3: What were the first 5 pressure values recorded on Jan 6 ?**</div>

***
That was definitely better. Can we do even better?

In [34]:
# NOTE this step! Decode the bytes timestamps, then build a DatetimeIndex
index = pd.DatetimeIndex(weather[:, 0].astype(str))

pressure_values = weather[:, -1].astype(np.float64)
weather_series = pd.Series(pressure_values, index)

# partial string indexing: select every reading whose date is Jan 6
weather_series.loc['2012-01-06'][:5]
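The date-based lookup can be tried without the weather file. The sketch below is self-contained: the timestamps and pressure values are a small made-up sample, stored as bytes to mimic what `np.genfromtxt` returns with `dtype='|S20'`. It shows the two steps that matter: decoding the bytes to `str`, and selecting a whole day from a `DatetimeIndex` with a date string.

```python
import numpy as np
import pandas as pd

# made-up sample in the same shape as the weather columns (bytes, dtype S20)
raw_times = np.array([b'2012-01-05 23:00:00',
                      b'2012-01-06 00:00:00',
                      b'2012-01-06 01:00:00',
                      b'2012-01-07 00:00:00'], dtype='S20')
raw_pressure = np.array([b'100.90', b'100.81', b'100.81', b'100.20'], dtype='S20')

# decode bytes to str before parsing; bytes cannot be converted to datetimes
index = pd.DatetimeIndex(raw_times.astype(str))
pressure = pd.Series(raw_pressure.astype(np.float64), index=index)

# a date string selects every timestamp that falls on that day
jan6 = pressure.loc['2012-01-06']

print(jan6)
```

Using a `DatetimeIndex` means the filter no longer depends on how the timestamp happens to be formatted as text.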

# Further Reading
***
- NumPy documentation: http://www.numpy.org/
- Pandas documentation: http://pandas.pydata.org/


<img src="./images/icon/Recap.png" alt="Recap" style="width: 100px;float:left; margin-right:15px"/>
<br />
# In-session Recap Time
***
- Indexing and Selection
- Slicing
- Filters
- Replacing Values
- Pandas
    - Data Structures in Pandas
    - Features of Pandas
    - Series

<img src="./images/icon/quiz.png" alt="Quiz" style="width: 100px;float:left; margin-right:15px"/>
<br />
# That time of the day again - Quiz Time
***
### [Click Here to get started](http://www.google.com)



<img src="./images/icon/Projects.png" alt="Recap" style="width: 100px;float:left; margin-right:15px"/>
<br />

# Let's get hands-on 
***
Great! We have now covered NumPy indexing, slicing and filters, along with the basics of Pandas Series. Let's solve some assignments to get the hang of them!

### Why should I do these assignments?
***
* Spend some more time understanding these methods in depth, as they are among the most frequently used techniques in data science.


### [Click Here to get started](http://www.google.com)


# Glossary/ Cheat-sheet 
***
* Terms
* Formula
* Important Python Syntax
* Definition

# Coming up next...
***
- Pandas Advanced: DataFrame
- Visualizing data

# Thank You
***
### Next Session: Data Frames in Pandas & Data Visualization
For further queries, reach out to academics@greyatom.com