## PBQ-02 Tutorial Session

### Author: Jay Parmar

#### Created on: 22/08/2020

### Today's Agenda

- Loading stock data in pandas
    - `pd.read_csv()`
    - `df.head()`
    - `df.tail()`
    - `df.set_index()`
    - `pd.to_datetime()`
    - `df.info()`
    - `df.index`
- Extracting rows and columns from a dataframe
    - `[]`
    - `df.loc[]`
    - `df.iloc[]`
- Modifying data in a dataframe
    - `df.pct_change()`
    - `df.value_counts()`
    - `np.where()`
    - `del`
    - `pd.merge()`
- Plotting Candlestick chart using Plotly

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import date
import warnings

%matplotlib inline
warnings.filterwarnings('ignore')

# Directing pandas not to use scientific notation while displaying float values
# pd.set_option('display.float_format', lambda x: '%.5f' % x)

# To reset the pandas option
# pd.reset_option('display.float_format')

## Accessing Stock Data

#### Method 1

```python
data = pd.read_csv('AAPL_Daily_Data.csv') # Read local data

data.index # Check index

data = data.set_index(data['Date']) # Set index - Method 1 (returns a new dataframe)

data.set_index(data['Date'], inplace=True) # OR set index in place - Method 2

data.index = pd.to_datetime(data.index) # Convert index to datetime

data.info() # Print dataframe info
```

In [None]:
# Example 1
# Reading data from a local source


In [None]:
# Verify data head


In [None]:
# Verify data tail


In [None]:
# Check info


In [None]:
# Check index


In [None]:
# Set Index - Method 1


In [None]:
# Check index


In [None]:
# OR
# Set Index using inplace - Method 2


In [None]:
# Check the index and its type


In [None]:
# Convert the index to datetime index


In [None]:
# Check the index and its type


In [None]:
# Verify the dataframe info


#### List the steps
1. Read data
2. Set the index
3. Convert index to datetime
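
The three steps above can be sketched as follows. This is a minimal illustration, not the session's actual data: the CSV text is a made-up stand-in for `AAPL_Daily_Data.csv`, and `csv_text` is a hypothetical name. It passes the column name to `set_index()`, which also removes `Date` from the columns, whereas passing `data['Date']` keeps it.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'AAPL_Daily_Data.csv' (values are made up)
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2019-12-19,279.5,281.2,278.9,280.0,279.0,24000000
2019-12-20,282.2,282.7,278.6,279.4,278.4,69000000
2019-12-23,280.5,284.3,280.4,284.0,283.0,25000000
"""

# Step 1: read data (the index starts out as a default integer index)
data = pd.read_csv(io.StringIO(csv_text))

# Step 2: set the 'Date' column as the index
data = data.set_index('Date')

# Step 3: convert the string index to a DatetimeIndex
data.index = pd.to_datetime(data.index)

print(type(data.index))
```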

#### Method 2

```python
data = pd.read_csv('AAPL_Daily_Data.csv', index_col=0, parse_dates=True) # Read local data
```
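
As a rough check that the one-line form gives the same result, the sketch below reads a small made-up CSV (a hypothetical stand-in for the real file) and inspects the index. `index_col=0` uses the first column as the index and `parse_dates=True` asks pandas to parse that index as dates.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'AAPL_Daily_Data.csv' (values are made up)
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2019-12-19,279.5,281.2,278.9,280.0,279.0,24000000
2019-12-20,282.2,282.7,278.6,279.4,278.4,69000000
"""

# Read, set the index, and parse the dates in a single call
data = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(data.index)
```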

In [None]:
# Example 2 - Recommended Method
# Read the stock data and set the index


In [None]:
# Verify the data head


In [None]:
# Check the info


##### We answered the following questions:

- How to read CSV file?
- How to verify whether the data is loaded correctly or not?
- How to set date column as an index?
- How to parse dates while reading a CSV file?

We learnt/revised the following methods and attributes while answering the above questions:

- `pd.read_csv()`
- `df.head()`
- `df.tail()`
- `df.info()`
- `pd.to_datetime()`
- `df.set_index()`
- `df.index`

## Extracting the data

##### Extracting Columns using `[]`

```python
data['Close'] # Extract a single column

data[['Close', 'Adj Close']].head() # Extract multiple columns

data[['Close', 'Adj Close']].plot(figsize=(10, 6)) # Plot extracted columns
plt.show()
```
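
A small sketch of what the two forms return, using made-up prices rather than the session's data: a single label in `[]` gives a `Series`, while a list of labels gives a `DataFrame`.

```python
import pandas as pd

# Hypothetical sample data (values are made up)
data = pd.DataFrame(
    {'Close': [280.0, 279.4, 284.0], 'Adj Close': [279.0, 278.4, 283.0]},
    index=pd.to_datetime(['2019-12-19', '2019-12-20', '2019-12-23']),
)

single = data['Close']                    # one label -> Series
multiple = data[['Close', 'Adj Close']]   # list of labels -> DataFrame

print(type(single).__name__, type(multiple).__name__)
```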

In [None]:
# Example 3
# Extracting a single column


In [None]:
# Checking its type


In [None]:
# Example 4
# Extracting multiple columns


In [None]:
# Checking its type


In [None]:
# Plotting columns directly from a dataframe


##### Extracting columns using `.loc[]` operator

The syntax for the `.loc[]` operator is `data.loc[row label, column label]`.

```python
data.loc[:, 'Close'] # Extract single column

data.loc[:, ['Close', 'Adj Close']] # Extract multiple columns
```
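
To illustrate, the sketch below (on made-up data) selects columns with `.loc[]` and checks that the result matches plain `[]` selection. The `:` in the row position means "all rows".

```python
import pandas as pd

# Hypothetical sample data (values are made up)
data = pd.DataFrame(
    {'Close': [280.0, 279.4, 284.0], 'Adj Close': [279.0, 278.4, 283.0]},
    index=pd.to_datetime(['2019-12-19', '2019-12-20', '2019-12-23']),
)

close_loc = data.loc[:, 'Close']                # all rows, one column
both_loc = data.loc[:, ['Close', 'Adj Close']]  # all rows, two columns

# Both forms should give the same result as [] selection
print(close_loc.equals(data['Close']))
print(both_loc.equals(data[['Close', 'Adj Close']]))
```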

In [None]:
# Example 5
# Extracting a single column


In [None]:
# Example 6
# Extracting multiple columns


##### Extracting rows using `.loc[]`

```python
data.loc['2019-12-20'] # Extract a single row

data.loc['2019-12-20':'2019-12-31'] # Extract multiple rows

data.loc['2019-12'] # Extract rows for a month

data.loc['2012'] # Extract rows for a year
```
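
The row selections above rely on the index being a `DatetimeIndex`, which lets `.loc[]` accept partial date strings. A minimal sketch on a made-up daily series (calendar days rather than trading days, purely for easy counting):

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering Dec 2019 and Jan 2020 (62 calendar days)
idx = pd.date_range('2019-12-01', '2020-01-31', freq='D')
data = pd.DataFrame({'Close': np.arange(len(idx), dtype=float)}, index=idx)

one_day = data.loc['2019-12-20']                    # exact date: a single row
date_slice = data.loc['2019-12-20':'2019-12-31']    # both endpoints are included
one_month = data.loc['2019-12']                     # every row in December 2019
one_year = data.loc['2020']                         # every row in 2020

print(len(date_slice), len(one_month), len(one_year))
```

Unlike ordinary Python slices, a label slice in `.loc[]` includes the end label.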

In [None]:
# Example 7
# Extracting a single row


In [None]:
# Example 8
# Extracting multiple rows


In [None]:
# Example 9
# Extracting a single month data from the dataframe


In [None]:
# Example 10
# Extracting a single year data from the dataframe


##### Extracting rows and columns together using `.loc[]`

```python
data.loc['2019-08':'2020-04', 'Open':'Close'] # Extract OHLC data from Aug 2019 to April 2020 
```
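
A short sketch of combined row and column slicing, assuming the usual `Open, High, Low, Close, Adj Close, Volume` column order (the values are made up). Both the date slice and the column slice include their end labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 5 calendar days, 6 columns (values are made up)
idx = pd.date_range('2019-12-30', '2020-01-03', freq='D')
columns = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
data = pd.DataFrame(np.arange(30, dtype=float).reshape(5, 6),
                    index=idx, columns=columns)

# Rows from 31 Dec to 2 Jan, columns from 'Open' through 'Close'
ohlc = data.loc['2019-12-31':'2020-01-02', 'Open':'Close']

print(ohlc.shape)
print(list(ohlc.columns))
```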

In [None]:
# Example 11
# Extracting a subset of rows and columns


##### Extracting columns using `.iloc[]` operator

The syntax for the `.iloc[]` operator is `data.iloc[row position(s), column position(s)]`. Positions are zero-based integers.

```python
data.iloc[:, 4] # Extract a single column

data.iloc[:, [4, 5]] # Extract multiple columns

data.iloc[0] # Extract the first row

data.iloc[-1] # Extract the last row

data.iloc[10] # Extract a single row

data.iloc[10:20] # Extract multiple rows
```
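
The sketch below runs the same selections on made-up data, assuming the column order `Open, High, Low, Close, Adj Close, Volume`, so that position `4` is `Adj Close`. Note that positional slices exclude the end position, unlike label slices in `.loc[]`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 30 rows, 6 columns (values are made up)
idx = pd.date_range('2020-01-01', periods=30, freq='D')
columns = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
data = pd.DataFrame(np.arange(180, dtype=float).reshape(30, 6),
                    index=idx, columns=columns)

fifth_col = data.iloc[:, 4]       # fifth column by position
two_cols = data.iloc[:, [4, 5]]   # fifth and sixth columns
first_row = data.iloc[0]          # first row
last_row = data.iloc[-1]          # last row
ten_rows = data.iloc[10:20]       # positions 10 to 19; position 20 is excluded

print(fifth_col.name, len(ten_rows))
```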

In [None]:
# Example 12
# Extracting a single column


In [None]:
# Example 13
# Extracting multiple columns


##### Extracting rows using `.iloc[]`

In [None]:
# Example 14
# Extracting the first row of a dataframe


In [None]:
# Extracting the last row of a dataframe


In [None]:
# Extracting a single row


In [None]:
# Example 15
# Extracting multiple rows


##### Extracting rows and columns together using `.iloc[]`

```python
data.iloc[100:200, 0:4] # Extract rows and columns from the dataframe
```
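
To show how positional and label-based selection relate, the sketch below (made-up data, smaller ranges than in the snippet above) extracts the same block both ways. The positional slice `2:5` stops before position 5, so the matching label slice ends at the label in position 4.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 rows, 6 columns (values are made up)
idx = pd.date_range('2020-01-01', periods=10, freq='D')
columns = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
data = pd.DataFrame(np.arange(60, dtype=float).reshape(10, 6),
                    index=idx, columns=columns)

by_position = data.iloc[2:5, 0:4]                       # end positions excluded
by_label = data.loc[idx[2]:idx[4], 'Open':'Close']      # end labels included

print(by_position.equals(by_label))
```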

In [None]:
# Example 16
# Extracting a subset of rows and columns


##### We answered the following questions:

- How to extract a column from a dataframe? - *Using `[]`, `.loc[]`, and `.iloc[]`*
- How to extract multiple columns from a dataframe? - *Using `[]`, `.loc[]`, and `.iloc[]`*
- How to extract a row from a dataframe? - *Using `.loc[]`, and `.iloc[]`*
- How to extract multiple rows from a dataframe? - *Using `.loc[]`, and `.iloc[]`*
- How to extract multiple rows and columns together? - *Using `.loc[]`, and `.iloc[]`*

We learnt/revised the following methods and attributes while answering the above questions:

- `df.iloc[]`
- `df.loc[]`

## Modifying a dataframe

##### Adding new columns

In [None]:
# Example 17
# Adding a new column to the dataframe and assigning a static value


In [None]:
# Counting the values in a column - Using value counts


In [None]:
# Example 18
# Adding a new column to the dataframe and assigning dynamic value

# E-18A - Calculate percentage returns
# E-18B - Convert prices to log prices
# E-18C - Generate log returns
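
One possible way to fill in E-18A to E-18C, shown on made-up prices. The column name `percentage_change` matches the one used in Example 19; `log_price` and `log_returns` are hypothetical names chosen for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices (values are made up)
data = pd.DataFrame(
    {'Close': [100.0, 110.0, 99.0]},
    index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03']),
)

# E-18A: percentage returns, (P_t / P_{t-1}) - 1; the first row is NaN
data['percentage_change'] = data['Close'].pct_change()

# E-18B: log prices
data['log_price'] = np.log(data['Close'])

# E-18C: log returns as the difference of consecutive log prices
data['log_returns'] = data['log_price'].diff()

print(data)
```

Log returns computed this way equal `np.log(1 + percentage_change)` row by row.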


In [None]:
# Example 19
# Adding a new column to the dataframe and assigning values based on a logical condition
data['movement'] = np.where(data['percentage_change'] > 0, 1, 
                            np.where(data['percentage_change'] < 0, -1, 0))

In [None]:
# Counting the values in a column
data['movement'].value_counts()

In [None]:
# Plotting it
data['movement'][:100].plot(figsize=(12, 4))

##### Removing columns

```python
del(data['returns'])

data.drop(columns=['percentage_change', 'movement'], inplace=True)
```
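
A minimal sketch of both removal styles on a made-up dataframe. `del` is a Python statement that removes one column in place; `.drop(columns=...)` can remove several at once and returns a new dataframe unless `inplace=True` is passed.

```python
import pandas as pd

# Hypothetical dataframe with extra columns to remove (values are made up)
data = pd.DataFrame({
    'Close': [100.0, 110.0, 99.0],
    'returns': [0.0, 0.1, -0.1],
    'percentage_change': [0.0, 0.1, -0.1],
    'movement': [0, 1, -1],
})

# Remove a single column in place with the del statement
del data['returns']

# Remove several columns in place with drop()
data.drop(columns=['percentage_change', 'movement'], inplace=True)

print(list(data.columns))
```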

In [None]:
# Example 20
# Dropping a column using the built-in del statement


In [None]:
# Check the updated dataframes


In [None]:
# Example 21
# Dropping columns using the drop() method


In [None]:
# Check the updated dataframes


##### Filtering a dataframe using `.loc[]`

```python
data.loc[data['Close'] > 250] # Filter a dataframe

data.loc[(data['Close'] > 200) & (data['Close'] < 250)] # Filter a dataframe using multiple conditions
```
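
The sketch below applies the same two filters to made-up prices. Each condition produces a boolean `Series`, and `.loc[]` keeps the rows where it is `True`. The parentheses around each condition are required because `&` binds more tightly than `>` and `<`.

```python
import pandas as pd

# Hypothetical closing prices (values are made up)
data = pd.DataFrame(
    {'Close': [190.0, 210.0, 240.0, 250.0, 260.0]},
    index=pd.date_range('2020-01-01', periods=5, freq='D'),
)

# Rows where the close is strictly greater than 250
above_250 = data.loc[data['Close'] > 250]

# Rows where the close is strictly between 200 and 250
between = data.loc[(data['Close'] > 200) & (data['Close'] < 250)]

print(above_250)
print(between)
```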

In [None]:
# Example 22
# Filtering the dataframe for all close prices greater than 250


In [None]:
# Example 23
# Filtering the dataframe using multiple conditions


##### Filtering the dataframe using `.iloc[]` and `np.where()`

```python
data.iloc[np.where(data['Close'] > 250)] # Filter all rows where close price is greater than 250
```
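
Called with only a condition, `np.where()` returns a tuple holding an array of the positions where the condition is `True`. The sketch below, on made-up prices, takes element `[0]` of that tuple to make the positional array explicit before handing it to `.iloc[]`, and checks that the result matches the `.loc[]` filter.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices (values are made up)
data = pd.DataFrame(
    {'Close': [190.0, 210.0, 240.0, 250.0, 260.0]},
    index=pd.date_range('2020-01-01', periods=5, freq='D'),
)

# Integer positions of the rows where the close is greater than 250
positions = np.where(data['Close'] > 250)[0]

filtered = data.iloc[positions]

print(positions)
print(filtered.equals(data.loc[data['Close'] > 250]))
```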

In [None]:
# Example 24
# Filtering the dataframe for all close prices greater than 250


##### Merging two dataframes

```python
tcs_data = pd.read_csv('TCS_Data.csv', index_col=0, parse_dates=True) # Read TCS data
infy_data = pd.read_csv('INFY_Data.csv', index_col=0, parse_dates=True) # Read INFY data

merged_data = pd.merge(tcs_data['Close'], infy_data['Close'], left_index=True, right_index=True, suffixes=('_TCS', '_INFY')) # Merge on the date index
```
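
A small sketch of a date-based merge, using made-up prices in place of the TCS and INFY files. It assumes both dataframes are indexed by date and joins on that index with `left_index=True, right_index=True`. With the default inner join, only dates present in both dataframes are kept.

```python
import pandas as pd

# Hypothetical stand-ins for the TCS and INFY data (values are made up)
tcs_data = pd.DataFrame(
    {'Close': [2200.0, 2210.0, 2195.0]},
    index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03']),
)
infy_data = pd.DataFrame(
    {'Close': [740.0, 745.0, 750.0]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
)

# Join on the date index; suffixes disambiguate the two 'Close' columns
merged_data = pd.merge(
    tcs_data[['Close']], infy_data[['Close']],
    left_index=True, right_index=True,
    suffixes=('_TCS', '_INFY'),
)

print(merged_data)
```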

In [None]:
# Example 25
# Reading data
tcs_data = pd.read_csv('TCS_Data.csv', index_col=0, parse_dates=True)
infy_data = pd.read_csv('INFY_Data.csv', index_col=0, parse_dates=True)

In [None]:
# Check TCS dataframe


In [None]:
# Check INFY dataframe


In [None]:
# Merging both dataframes based on dates
merged_data = pd.merge(tcs_data['Close'], infy_data['Close'], left_index=True, right_index=True, suffixes=('_TCS', '_INFY'))

In [None]:
# Check merged dataframe


##### We answered the following questions:

- How to add new columns? - Both with *static* and *dynamic* values.
- How to drop columns? - Using the `del` statement and the `.drop()` method.
- How to filter a dataframe? - Using `.loc[]` and `.iloc[]` operators.
- How to count values in a given column?
- How to merge two dataframes? - Using `pd.merge()` method.

We learnt/revised the following methods and attributes while answering the above questions:

- `df.pct_change()`
- `df.value_counts()`
- `np.where()`
- `del`
- `pd.merge()`

##### Home exercises:

- Sorting a dataframe using `df.sort_values()`.
- Renaming dataframe columns using `df.rename()`.
- Appending a new row using `pd.concat()` (`df.append()` was removed in pandas 2.0).
- Dropping dataframe rows using `df.drop()`.

### Plotting Candlestick Charts using Plotly

In [None]:
# To install plotly library
# !pip install plotly

In [None]:
# Example 26
# Import plotly library
import plotly.graph_objects as go

In [None]:
# Create chart object
chart_data = [go.Candlestick(x=data.index[:100], 
                             open=data['Open'][:100], 
                             high=data['High'][:100], 
                             low=data['Low'][:100],
                             close=data['Close'][:100])]

# Load chart data
fig = go.Figure(data=chart_data)

# Update chart layout
fig.update_layout(xaxis_rangeslider_visible=False, xaxis_showticklabels=True, yaxis_showticklabels=True)

# Plot chart
fig.show()