# Data Retrieval
## Understanding JSON Format

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write. It is also easy for machines to parse and generate. JSON is widely used for representing structured data and exchanging information between a server and a web application, as well as between different programming languages.

### JSON Syntax

JSON data is built from six value types: objects, arrays, strings, numbers, booleans, and null. Objects hold key-value pairs, much like Python dictionaries, and their keys must be double-quoted strings.
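As a minimal sketch of how these value types map onto Python, the standard-library `json` module can parse a JSON document and serialize it back. The sample document is illustrative only.

```python
import json

# Illustrative JSON document that uses every value type.
raw = '{"name": "Alice", "age": 25, "isStudent": true, "score": null, "fruits": ["apple", "banana"]}'

# json.loads parses JSON text into Python objects.
data = json.loads(raw)

print(type(data).__name__)               # JSON object -> dict
print(type(data["fruits"]).__name__)     # JSON array  -> list
print(data["isStudent"], data["score"])  # true/null   -> True None

# json.dumps serializes Python objects back into JSON text.
print(json.dumps(data, indent=2))
```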

#### Objects

An object is an unordered collection of key-value pairs, enclosed in curly braces `{}`.


``` json
{
  "name": "John Doe",
  "age": 30,
  "city": "New York"
}
```

#### Arrays

An array is an ordered list of values, enclosed in square brackets `[]`.


``` json
{
  "fruits": ["apple", "banana", "orange"]
}
```

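#### Strings, Numbers, Booleans, and Null

Strings are written in double quotes, numbers are unquoted, booleans are the literals `true` and `false`, and `null` marks a missing value.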


``` json
{
  "name": "Alice",
  "age": 25,
  "isStudent": true,
  "score": null
}
```


## HTTP Requests and Methods

HTTP (Hypertext Transfer Protocol) is the protocol used for communication on the World Wide Web. It defines a set of rules for how messages are formatted and transmitted, and it involves requests and responses between clients (such as web browsers) and servers.

### HTTP Methods

HTTP supports various methods, also known as verbs, that define the action to be performed on a resource. The common HTTP methods include:

#### 1. GET

The GET method is primarily used for retrieving data from a specified resource. It is like asking the server to "get" or "give" you information. When you enter a URL in your browser or click on a link, the browser sends a GET request to the server, asking for the content of that page or resource.

#### 2. POST

The POST method submits data to a specified resource for processing. It is often used for actions that create new resources on the server, such as submitting a form or uploading a file.

#### 3. PUT

The PUT method is used to update a resource or create a new resource if it does not exist. It typically requires sending the entire representation of the resource in the request body.

#### 4. DELETE

The DELETE method requests the removal of the resource identified by the URI.

#### 5. PATCH

The PATCH method is used to apply partial modifications to a resource, updating only specific fields without affecting others.
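The five methods can be sketched with the standard library's `urllib.request.Request`, which builds a request object without sending it. The host `api.example.com`, the `/items` paths, and the payloads are hypothetical and chosen only for illustration.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.example.com"  # hypothetical host, for illustration only


def build_request(method, path, payload=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE_URL + path, data=body, headers=headers, method=method)


prepared = [
    build_request("GET", "/items/1"),                                # read
    build_request("POST", "/items", {"name": "widget"}),             # create
    build_request("PUT", "/items/1", {"name": "widget", "qty": 3}),  # full replacement
    build_request("PATCH", "/items/1", {"qty": 4}),                  # partial update
    build_request("DELETE", "/items/1"),                             # remove
]

for req in prepared:
    print(req.get_method(), req.full_url, req.data)
```

Note how PUT carries the whole representation while PATCH carries only the field that changes.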


## API Calls and Interactions in Finance

API (Application Programming Interface) allows different software applications to communicate with each other. API calls are requests made by one software component to another, often to retrieve or manipulate financial data. In the context of financial systems, APIs play a crucial role in accessing stock market information, financial analytics, and other relevant data.

### Making API Calls for Finance Data

When making API calls for finance data, the client sends a request to a specific endpoint on the financial server, specifying the desired financial operation and any necessary parameters. The structure of an API call typically includes the following components:

#### 1. Endpoint

The endpoint is the specific URL or URI that represents the target resource or operation on the server. In finance APIs, endpoints may represent actions like retrieving stock quotes, updating portfolios, or executing trades. Here are examples of endpoints:

- **Retrieve Stock Quotes**:
    ```http
    GET /api/stock/quotes?symbol=<symbol>
    ```

- **Update Portfolio**:
    ```http
    POST /api/portfolio/update
    ```

- **Execute Trade**:
    ```http
    PUT /api/trade/execute?symbol=<symbol>&quantity=<quantity>&action=<action>
    ```

- **Cancel Order**:
    ```http
    DELETE /api/order?order_id=<order_id>
    ```

- **Retrieve Order Data**:
    ```http
    GET /api/order?order_id=<order_id>
    ```

#### 2. HTTP Method

The HTTP method defines the type of operation the client wants to perform on the server. Common HTTP methods include:

- **GET**: Retrieve data
- **POST**: Submit data for processing
- **PUT**: Update a resource or create a new one
- **DELETE**: Remove a resource
- **PATCH**: Apply partial modifications to a resource

#### 3. Parameters

Parameters are key-value pairs included in the API call to provide additional information or context. In finance APIs, parameters might include:

- **Symbol**: The stock symbol for retrieving specific stock information.
- **Quantity**: The number of shares to be traded.
- **Action**: The type of trade action, such as "buy" or "sell."
- **Order ID**: The unique identifier of a trade order for cancellation.

#### Examples of API Calls for Finance Data

1. **GET Request for Stock Prices**:
    ```http
    GET /api/stock/quotes?symbol=AAPL
    ```

2. **POST Request for Portfolio Updates**:
    ```http
    POST /api/portfolio/update
    ```

3. **PUT Request for Trade Execution**:
    ```http
    PUT /api/trade/execute?symbol=GOOGL&quantity=10&action=buy
    ```

4. **DELETE Request for Order Cancellation**:
    ```http
    DELETE /api/order?order_id=12345
    ```

5. **GET Request for Order data**:
    ```http
    GET /api/order?order_id=12345
    ```
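The query strings in these examples can be assembled programmatically rather than by hand. The sketch below uses `urllib.parse.urlencode`; the helper name `build_path` is an assumption introduced here, and the endpoints are the illustrative ones listed above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_path(endpoint, **params):
    """Append URL-encoded query parameters to an endpoint path."""
    query = urlencode(params)
    return f"{endpoint}?{query}" if query else endpoint


quote_path = build_path("/api/stock/quotes", symbol="AAPL")
trade_path = build_path("/api/trade/execute", symbol="GOOGL", quantity=10, action="buy")
print(quote_path)  # /api/stock/quotes?symbol=AAPL
print(trade_path)  # /api/trade/execute?symbol=GOOGL&quantity=10&action=buy

# parse_qs recovers the parameters; every value comes back as a list of strings.
print(parse_qs(urlsplit(trade_path).query))
```

Letting `urlencode` do the escaping avoids malformed URLs when a parameter value contains spaces or reserved characters.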

### Response codes
HTTP response codes indicate whether a request succeeded, failed, or requires further action. Regardless of the contents of the response body, a client acts according to the response status code.

- 1xx: Informational
- 2xx: Success
- 3xx: Redirection
- 4xx: Client error
- 5xx: Server error

For the full list of status codes, see https://http.dev/status

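A small sketch of how a client might map a status code to one of the five classes, using the standard library's `http.HTTPStatus` for the reason phrase. The helper name `describe_status` is an assumption introduced for this example.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe_status(code):
    """Return the class of a status code and its standard reason phrase."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # the code is not a registered status
        phrase = "Unregistered"
    return status_class, phrase


for code in (200, 301, 404, 429, 503):
    print(code, *describe_status(code))
```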
### Authentication in Finance APIs

Financial APIs often require authentication to ensure secure access to sensitive information. Common authentication methods include API keys, OAuth tokens, or other secure credentials.
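Credentials are usually sent as request headers. The sketch below builds such headers; `Authorization: Bearer ...` is the standard scheme for OAuth 2.0 bearer tokens, while the `X-API-Key` header name and the `FINANCE_API_KEY` environment variable are assumptions made for illustration, since each API documents its own.

```python
import os


def build_auth_headers(api_key=None, bearer_token=None):
    """Return request headers carrying whichever credential is supplied."""
    headers = {"Accept": "application/json"}
    if bearer_token:
        # Standard scheme for OAuth 2.0 bearer tokens.
        headers["Authorization"] = f"Bearer {bearer_token}"
    if api_key:
        # Hypothetical header name; check the API documentation for the real one.
        headers["X-API-Key"] = api_key
    return headers


# Read the secret from the environment instead of hard-coding it.
# FINANCE_API_KEY is an assumed variable name; "demo-key" is a placeholder.
api_key = os.environ.get("FINANCE_API_KEY", "demo-key")
print(build_auth_headers(api_key=api_key))
```

The resulting dictionary can be passed to an HTTP client as its `headers` argument.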

### Handling API Responses in Finance

API responses in finance typically include a status code indicating the success or failure of the request. The response body may contain financial data, such as stock prices, portfolio details, or analytics, often formatted in JSON or another standardized format.

When working with finance APIs, it's essential to refer to the API documentation for specific details on available endpoints, required parameters, and expected response formats.

#### Making API calls

In [None]:
import requests

def make_api_call(base_url, endpoint="", method="GET", **kwargs):
    # Construct the full URL
    full_url = f'{base_url}{endpoint}'

    # Make the API call
    response = requests.request(method=method, url=full_url, **kwargs)
    
    # Check if the request was successful (any status code below 400, e.g. 200, 201 or 204)
    if response.ok:
        return response
    else:
        # If the request was not successful, raise an exception with the error message
        raise Exception(f'API request failed with status code {response.status_code}: {response.text}')


## Cryptocurrency
### Binance
https://binance-docs.github.io/apidocs/futures/en/#change-log

In [None]:
# first get the base url of the API
base_url = 'https://fapi.binance.com'

Check Server Time

In [None]:
# method
method = 'GET'

# endpoint
endpoint = '/fapi/v1/time'

# make the call
response = make_api_call(base_url, endpoint, method).json()
response

#### What is epoch time or Unix Timestamp?
A timestamp is a representation of a specific point in time, typically expressed as the number of seconds, milliseconds, or microseconds that have elapsed since a predefined reference point known as the "epoch".

Unix time (also called POSIX time, epoch time, or a Unix timestamp) is the number of seconds that have elapsed since the Unix epoch, 00:00:00 UTC on January 1, 1970. The Binance `serverTime` value is expressed in milliseconds.

Key Characteristics:
- Universal Reference: Timestamps provide a standardized way of expressing time across different systems and locations, because every value is measured from the same fixed reference point (the epoch).

- Granularity: Timestamps can have varying levels of granularity, ranging from seconds to milliseconds or even microseconds, depending on the precision required for a specific application.

- Precision and Accuracy: The precision of a timestamp determines how finely it can represent time intervals, while accuracy reflects how closely the timestamp aligns with the actual time it represents.

Common Usage:
- Data Logging: Timestamps are widely used in data logging to record when events occur, helping analyze patterns, durations, and chronological sequences.

- Synchronization: Timestamps are essential for synchronizing activities across distributed systems, ensuring that events are coordinated based on a common temporal reference.

https://www.epochconverter.com/
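A minimal, standard-library sketch of converting between millisecond timestamps and timezone-aware datetimes. The helper names `ms_to_utc` and `utc_to_ms` are assumptions introduced here, and the sample value corresponds to midnight UTC on January 1, 2024.

```python
from datetime import datetime, timedelta, timezone


def ms_to_utc(ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def utc_to_ms(dt):
    """Convert a timezone-aware datetime to a millisecond Unix timestamp."""
    return round(dt.timestamp() * 1000)


start = ms_to_utc(1_704_067_200_000)
print(start.isoformat())  # 2024-01-01T00:00:00+00:00

one_week_earlier = start - timedelta(weeks=1)
print(utc_to_ms(one_week_earlier))  # 1703462400000
```

Keeping the datetime timezone-aware matters: calling `.timestamp()` on a naive datetime interprets it in the machine's local timezone, which silently shifts the result.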

In [None]:
import datetime
import pytz

# extract server timestamp
server_time = response['serverTime']

# convert timestamp (milliseconds) to a timezone-aware UTC datetime
date = datetime.datetime.fromtimestamp(int(server_time) / 1000.0, tz=datetime.timezone.utc)
date_string = date.strftime("%Y-%m-%d %H:%M:%S %Z")
print(f'server_time as timestamp: {int(server_time) / 1000.0}, date: {date_string}')

# convert timestamp to Asia/Jerusalem
jer_tz = pytz.timezone('Asia/Jerusalem')
date_in_jer = date.astimezone(jer_tz)
date_string_jer = date_in_jer.strftime("%Y-%m-%d %H:%M:%S %Z")
print(f'server_time as timestamp: {int(server_time) / 1000.0}, date in Asia/Jerusalem: {date_string_jer}')

# move one week earlier (the datetime is timezone-aware, so .timestamp() is unambiguous)
week_before = date - datetime.timedelta(weeks=1)
week_before_date_string = week_before.strftime("%Y-%m-%d %H:%M:%S %Z")
print(f'One week before as timestamp {week_before.timestamp()}, date: {week_before_date_string}')

Get historical data

https://binance-docs.github.io/apidocs/futures/en/#kline-candlestick-data


In [None]:
# specific endpoint and method
method = 'GET'
endpoint = '/fapi/v1/klines'

# parameters
params = {
    'symbol': 'BTCUSDT',# Mandatory
    'interval': '1h',# Mandatory
    # 'startTime': ,
    # 'endTime': ,
    'limit': 5 # default 500
}

# make call
response = make_api_call(base_url, endpoint, method, params=params)
response.json()

Wrapping the response as a pandas df

In [None]:
import pandas as pd

# Define the columns
columns = ['open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time', 'quote_asset_volume',
               'number_of_trades', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume', 'ignore']

# Define each column dtype to prevent auto inference
dtype={
    'open_time': 'datetime64[ms, Asia/Jerusalem]',
    'open': 'float64',
    'high': 'float64',
    'low': 'float64',
    'close': 'float64',
    'volume': 'float64',
    'close_time': 'datetime64[ms, Asia/Jerusalem]',
    'quote_asset_volume': 'float64',
    'number_of_trades': 'int64',
    'taker_buy_base_asset_volume': 'float64',
    'taker_buy_quote_asset_volume': 'float64',
    'ignore': 'float64'
}

btcusdt_df = pd.DataFrame(response.json(), columns=columns)
btcusdt_df = btcusdt_df.astype(dtype)
btcusdt_df

Aggregate calls to overcome limitations

In [None]:
def get_binance_historical_data(symbol, interval, start_date):
    
    # define basic parameters for call
    base_url = 'https://fapi.binance.com'
    endpoint = '/fapi/v1/klines'
    method = 'GET'
    
    # Set the start time parameter in the params dictionary
    params = {
        'symbol': symbol,
        'interval': interval,
        'limit': 1500,
        'startTime': start_date # Start time in milliseconds
    }


    # Make initial API call to get candles
    response = make_api_call(base_url, endpoint=endpoint, method=method, params=params)

    candles_data = []

    while len(response.json()) > 0:
        # Append the received candles to the list
        candles_data.extend(response.json())

        # Update the start time for the next API call
        params['startTime'] = candles_data[-1][0] + 1 # last candle open_time + 1ms

        # Make the next API call
        response = make_api_call(base_url, endpoint=endpoint, method=method, params=params)

    
    # Wrap the candles data as a pandas DataFrame
    columns = ['open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time', 'quote_asset_volume',
               'number_of_trades', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume', 'ignore']
    dtype={
    'open_time': 'datetime64[ms, Asia/Jerusalem]',
    'open': 'float64',
    'high': 'float64',
    'low': 'float64',
    'close': 'float64',
    'volume': 'float64',
    'close_time': 'datetime64[ms, Asia/Jerusalem]',
    'quote_asset_volume': 'float64',
    'number_of_trades': 'int64',
    'taker_buy_base_asset_volume': 'float64',
    'taker_buy_quote_asset_volume': 'float64',
    'ignore': 'float64'
    }
    
    df = pd.DataFrame(candles_data, columns=columns)
    df = df.astype(dtype)

    return df
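The aggregation loop above follows a general cursor-style pagination pattern. The sketch below isolates that pattern and runs it against an in-memory stand-in instead of the real API; `fetch_all`, `fake_fetch_page`, and the fake candles are assumptions introduced only for illustration.

```python
def fetch_all(fetch_page, start, limit):
    """Collect rows page by page until the source returns an empty page.

    fetch_page(start, limit) must return rows sorted by their first field.
    """
    rows = []
    while True:
        page = fetch_page(start, limit)
        if not page:
            break
        rows.extend(page)
        # Resume just after the last row received, so no row is fetched twice.
        start = page[-1][0] + 1
    return rows


# Stand-in for the remote API: ten hourly "candles" keyed by open time in ms.
HOUR_MS = 60 * 60 * 1000
FAKE_CANDLES = [[i * HOUR_MS, f"candle-{i}"] for i in range(10)]


def fake_fetch_page(start, limit):
    """Return up to `limit` fake candles whose open time is >= start."""
    return [c for c in FAKE_CANDLES if c[0] >= start][:limit]


candles = fetch_all(fake_fetch_page, start=0, limit=4)
print(len(candles))  # 10
```

With a page size of 4, the ten rows arrive in three pages, and a fourth empty page ends the loop.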

Example usage

In [None]:
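from datetime import datetime, timezone

# Example usage:
symbol = 'BTCUSDT'
interval = '1h'
# start of 2024 in UTC, in milliseconds (a naive datetime would be read in the local timezone)
start_date = int(datetime(year=2024, month=1, day=1, tzinfo=timezone.utc).timestamp() * 1000)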

btcusdt_df = get_binance_historical_data(symbol, interval, start_date)
btcusdt_df

## Stocks
### yfinance
yfinance is a Python library that provides a convenient interface to interact with financial data from Yahoo Finance. Yahoo Finance is a popular financial news and data platform that offers a wide range of financial information, including stock quotes, historical prices, financial statements, and more.

https://github.com/ranaroussi/yfinance

Get historical data

In [None]:
# https://github.com/ranaroussi/yfinance/wiki/Ticker
import yfinance as yf

aapl_ticker = yf.Ticker('AAPL')
aapl_df = aapl_ticker.history(interval='1h', repair=True, period='3mo')
aapl_df

## What to do when there is no API documentation?

### Postman - API platform for developers

https://www.postman.com/

<div align="center">
  <blockquote style="border-left: 2px solid #6b7c93; padding: 10px; background-color: #2d2d2d; color: #ffffff; margin: 0; font-style: italic;">
    “I choose a lazy person to do a hard job. Because a lazy person will find an easy way to do it.”
    ― Bill Gates
  </blockquote>
</div>


### TASE - Tel Aviv Stock Exchange

https://market.tase.co.il/en/market_data

Get TA-35 historical data

In [None]:
import requests

url = "https://api.tase.co.il/api/index/historyeod"

payload = '{"pType":"8","dFrom":"2023-01-01","dTo":"2024-01-18","TotalRec":1,"pageNum":1,"oId":"142","lang":"1"}'
headers = {
  'authority': 'api.tase.co.il',
  'accept': 'application/json, text/plain, */*',
  'accept-language': 'en-US',
  'content-type': 'application/json;charset=UTF-8',
  'origin': 'https://market.tase.co.il',
  'referer': 'https://market.tase.co.il/',
  'sec-ch-ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
  'sec-ch-ua-mobile': '?0',
  'sec-ch-ua-platform': '"Windows"',
  'sec-fetch-dest': 'empty',
  'sec-fetch-mode': 'cors',
  'sec-fetch-site': 'same-site',
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  # Session cookies copied from the browser are omitted here: they expire quickly and should not be shared
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)


Get as json

In [None]:
response.json()

Understand data structure and extract relevant data

In [None]:
historical_data = response.json()['Items']
historical_data

Wrap the data as a pandas df

In [None]:
columns = ['TradeDate', 'OpenRate', 'CloseRate', 'HighRate', 'LowRate', 'MarketValue']
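dtype={
'TradeDate': 'datetime64[ns]',
'OpenRate': 'float64',
'CloseRate': 'float64',
'HighRate': 'float64',
'LowRate': 'float64',
'MarketValue': 'float64',
}

df = pd.DataFrame(historical_data, columns=columns)
# TradeDate arrives as day/month/year text, so parse it with an explicit format
df['TradeDate'] = pd.to_datetime(df['TradeDate'], format='%d/%m/%Y')
df = df.astype(dtype)
df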

Wrap as a function

In [None]:
def get_tase_historical_idx_data(from_date: str, to_date: str, oId: str = '142') -> pd.DataFrame:
    """
    Retrieves historical index data from the Tel Aviv Stock Exchange (TASE) API.

    Parameters:
    - from_date (str): Start date in the format 'yyyy-mm-dd'.
    - to_date (str): End date in the format 'yyyy-mm-dd'.
    - oId (str): Index id. Defaults to '142' (TA-35).

    Returns:
    pd.DataFrame: One row per trading day with the open, close, high and low rates and the market value.
    """
    
    # define basic parameters for call
    url = "https://api.tase.co.il/api/index/historyeod"
    method = 'POST'
    headers = {
        'authority': 'api.tase.co.il',
        'accept': 'application/json, text/plain, */*',
        'accept-language': 'en-US',
        'content-type': 'application/json;charset=UTF-8',
        'origin': 'https://market.tase.co.il',
        'referer': 'https://market.tase.co.il/',
        'sec-ch-ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    }
    
    # Set the first page payload
    page_num = 1
    payload = f'{{"pType":"8","dFrom":"{from_date}","dTo":"{to_date}","TotalRec":1,"pageNum":{page_num},"oId":"{oId}","lang":"1"}}'

    # Make initial API call to get candles
    response = make_api_call(base_url=url, method=method, headers=headers, data=payload)
    items = response.json()['Items']
    candles_data = []

    while len(items) > 0:
        # Append the received candles to the list
        candles_data.extend(items)

        # Advance to the next page for the next API call
        page_num += 1
        payload = f'{{"pType":"8","dFrom":"{from_date}","dTo":"{to_date}","TotalRec":1,"pageNum":{page_num},"oId":"{oId}","lang":"1"}}'

        # Make the next API call
        response = make_api_call(base_url=url, method=method, headers=headers, data=payload)
        
        items = response.json()['Items']

    
    # Convert the candles data list to a pandas DataFrame
    columns = ['TradeDate', 'OpenRate', 'CloseRate', 'HighRate', 'LowRate', 'MarketValue']
    dtype={
    'TradeDate': 'datetime64[ns]',
    'OpenRate': 'float64',
    'CloseRate': 'float64',
    'HighRate': 'float64',
    'LowRate': 'float64',
    'MarketValue': 'float64',
    }
    
    df = pd.DataFrame(candles_data, columns=columns)
    # pandas assumes month-first dates by default; TASE dates are day-first, so the format is specified explicitly
    df['TradeDate'] = pd.to_datetime(df['TradeDate'], format='%d/%m/%Y')
    df = df.astype(dtype)

    return df

ta35_df = get_tase_historical_idx_data('2023-01-01', '2024-01-01')
ta35_df
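Hand-written JSON strings are easy to mistype. As a hedged alternative, the request body can be built as a dictionary and serialized with `json.dumps`. The field names are the ones used in the payload above; the helper name `build_tase_payload` is an assumption introduced here, and the sketch assumes the server accepts any equivalent JSON encoding of the same fields.

```python
import json


def build_tase_payload(from_date, to_date, page_num=1, o_id="142"):
    """Serialize the request body from a dictionary instead of an f-string."""
    body = {
        "pType": "8",
        "dFrom": from_date,
        "dTo": to_date,
        "TotalRec": 1,
        "pageNum": page_num,
        "oId": o_id,
        "lang": "1",
    }
    return json.dumps(body)


payload = build_tase_payload("2023-01-01", "2024-01-01", page_num=2)
print(payload)
```

Because every field is a named dictionary entry, a parameter such as the index id cannot be left hard-coded by accident.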

Get a specific stock historical data

In [None]:
def get_tase_historical_stock_data(from_date: str, to_date: str, oId: str) -> pd.DataFrame:
    """
    Retrieves historical stock data from the Tel Aviv Stock Exchange (TASE) API.

    Parameters:
    - from_date (str): Start date in the format 'yyyy-mm-dd'.
    - to_date (str): End date in the format 'yyyy-mm-dd'.
    - oId (str): Stock id (required).

    Returns:
    pd.DataFrame: One row per trading day with prices, market value, turnover and number of deals.
    """
    
    # define basic parameters for call
    url = "https://api.tase.co.il/api/security/historyeod"
    method = 'POST'
    headers = {
        'authority': 'api.tase.co.il',
        'accept': 'application/json, text/plain, */*',
        'accept-language': 'en-US',
        'content-type': 'application/json;charset=UTF-8',
        'origin': 'https://market.tase.co.il',
        'referer': 'https://market.tase.co.il/',
        'sec-ch-ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    }
    
    # Set the first page payload
    page_num = 1
    payload = f'{{"pType":"8","dFrom":"{from_date}","dTo":"{to_date}","TotalRec":1,"pageNum":{page_num},"oId":"{oId}","lang":"1"}}'

    # Make initial API call to get candles
    response = make_api_call(base_url=url, method=method, headers=headers, data=payload)
    items = response.json()['Items']
    candles_data = []

    while len(items) > 0:
        # Append the received candles to the list
        candles_data.extend(items)

        # Advance to the next page for the next API call
        page_num += 1
        payload = f'{{"pType":"8","dFrom":"{from_date}","dTo":"{to_date}","TotalRec":1,"pageNum":{page_num},"oId":"{oId}","lang":"1"}}'

        # Make the next API call
        response = make_api_call(base_url=url, method=method, headers=headers, data=payload)
        
        items = response.json()['Items']

    
    # Convert the candles data list to a pandas DataFrame
    columns = ['TradeDate', 'OpenRate', 'CloseRate', 'HighRate', 'LowRate', 'MarketValue', 'TurnOverValueShekel', 'OverallTurnOverUnits', 'DealsNo']
    dtype={
    'TradeDate': 'datetime64[ns]',
    'OpenRate': 'float64',
    'CloseRate': 'float64',
    'HighRate': 'float64',
    'LowRate': 'float64',
    'MarketValue': 'float64',
    'TurnOverValueShekel': 'float64',
    'OverallTurnOverUnits': 'float64',
    'DealsNo': 'int64',
    }
    
    candles_data = [{k: v for k,v in candle.items() if k in columns} for candle in candles_data]
    
    df = pd.DataFrame(candles_data, columns=columns)
    # pandas assumes month-first dates by default; TASE dates are day-first, so the format is specified explicitly
    df['TradeDate'] = pd.to_datetime(df['TradeDate'], format='%d/%m/%Y')
    df = df.astype(dtype)

    return df

poli_oId = '00662577'
poli_df = get_tase_historical_stock_data('2023-01-01', '2024-01-01', oId=poli_oId)
poli_df

## Options

https://www.optionsdx.com/product-category/option-chains/

Get historical data

In [None]:
qqq_q1_options_df = pd.read_csv('options_data/QQQ/qqq_eod_202301.csv')
qqq_q1_options_df.dtypes

Clean up the column names and cast the date columns

In [None]:
# fix column names by removing [, ], " " characters
qqq_q1_options_df.columns = qqq_q1_options_df.columns.str.strip('[] ')

# cast date columns to datetime
qqq_q1_options_df['EXPIRE_DATE'] = pd.to_datetime(qqq_q1_options_df['EXPIRE_DATE'])
qqq_q1_options_df['QUOTE_DATE'] = pd.to_datetime(qqq_q1_options_df['QUOTE_DATE'])
qqq_q1_options_df['QUOTE_READTIME'] = pd.to_datetime(qqq_q1_options_df['QUOTE_READTIME'])
qqq_q1_options_df.dtypes

Have a look at the data

In [None]:
qqq_q1_options_df[['QUOTE_DATE', 'EXPIRE_DATE','STRIKE']].sample(10)

In [None]:
# group by strike and expiration date
qqq_q1_options_df.groupby(['EXPIRE_DATE','STRIKE'])['C_LAST'].describe()

Aggregate over multiple files

In [None]:
import os

folder_path = os.path.join('options_data', 'QQQ')  # portable path; avoids the invalid '\Q' escape

options_dfs = []

for filename in os.listdir(folder_path):
    if ('2022' not in filename) and ('2023' not in filename):
        continue
    file_path = os.path.join(folder_path, filename)
     
    options_df = pd.read_csv(file_path, low_memory=False)
    
    options_dfs.append(options_df)

options_df = pd.concat(options_dfs, ignore_index=True)  
  
# fix column names by removing [, ], " " characters
options_df.columns = options_df.columns.str.strip('[] ')

# Coerce call (C_*) and put (P_*) columns to numeric; values that cannot be parsed become NaN
for col in options_df.columns:
    if col.startswith(('C_', 'P_')):
        options_df[col] = pd.to_numeric(options_df[col], errors='coerce')

# Define data types for all columns
dtypes = {
    'QUOTE_UNIXTIME': 'int64',
    'QUOTE_READTIME': 'datetime64[ns]',
    'QUOTE_DATE': 'datetime64[ns]',
    'QUOTE_TIME_HOURS': 'float64',
    'UNDERLYING_LAST': 'float64',
    'EXPIRE_DATE': 'datetime64[ns]',
    'EXPIRE_UNIX': 'int64',
    'DTE': 'float64',
    'C_DELTA': 'float64',
    'C_GAMMA': 'float64',
    'C_VEGA': 'float64',
    'C_THETA': 'float64',
    'C_RHO': 'float64',
    'C_IV': 'float64',
    'C_VOLUME': 'float64',
    'C_LAST': 'float64',
    'C_SIZE': 'float64',
    'C_BID': 'float64',
    'C_ASK': 'float64',
    'STRIKE': 'float64',
    'P_BID': 'float64',
    'P_ASK': 'float64',
    'P_SIZE': 'float64',
    'P_LAST': 'float64',
    'P_DELTA': 'float64',
    'P_VOLUME': 'float64',
    'STRIKE_DISTANCE': 'float64',
    'STRIKE_DISTANCE_PCT': 'float64'
}

# Convert columns to the specified data types
options_df = options_df.astype(dtypes)

options_df.sample(10)

In [None]:
# group by strike and expiration date
options_df.groupby(['EXPIRE_DATE','STRIKE']).count().sort_values(by='P_LAST', ascending=False)

In [None]:
s_385_19012024_df = options_df.groupby(['EXPIRE_DATE','STRIKE']).get_group((pd.Timestamp('2024-01-19 00:00:00'), 385.0))
s_385_19012024_df

In [None]:
s_385_19012024_df['P_LAST'].describe()

Set index to be quote date

In [None]:
s_385_19012024_df.index = s_385_19012024_df['QUOTE_DATE']
s_385_19012024_df['C_LAST'].plot()

In [None]:
s_385_19012024_df['P_LAST'].plot()

In [None]:
s_385_19012024_df[(s_385_19012024_df['QUOTE_DATE'] > pd.Timestamp('2022-06-01 00:00:00')) & (s_385_19012024_df['QUOTE_DATE'] < pd.Timestamp('2022-07-25 00:00:00'))][['UNDERLYING_LAST', 'P_LAST',  'P_ASK',  'P_BID', 'P_VOLUME']]

### Basic methods with time series data
Getting to know the data

In [None]:
aapl_ticker = yf.Ticker('AAPL')
aapl_df = aapl_ticker.history(interval='1h', repair=True, period='3mo')

In [None]:
aapl_df.head()

In [None]:
aapl_df.describe()

In [None]:
aapl_df.index

<font size="4">Truncate</font>

<font size="3">truncate a Series or DataFrame before and after some index value</font>

In [None]:
print("first candle date:", aapl_df.index.min())
print("last candle date:", aapl_df.index.max())

In [None]:
new_min_date = pd.Timestamp('2024-01-05 14:30:00', tz='America/New_York')
new_max_date = pd.Timestamp('2024-01-22 14:30:00', tz='America/New_York')

truncated_df = aapl_df.truncate(before=new_min_date, after=new_max_date)
truncated_df

<font size="4">Resample</font>

<font size="3">resample time-series data</font>

In [None]:
daily_aapl_df = aapl_df.resample('1D').mean()
daily_aapl_df
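Averaging every column is one way to resample, but for candle data it is often more meaningful to aggregate each column with its own rule. The sketch below is one possible approach on synthetic, made-up candles; the column names follow the yfinance frame above.

```python
import pandas as pd

# Synthetic hourly candles over two days (values are made up for illustration).
index = pd.to_datetime([
    "2024-01-02 10:00", "2024-01-02 11:00",
    "2024-01-03 10:00", "2024-01-03 11:00",
])
candles = pd.DataFrame(
    {
        "Open": [10, 11, 12, 13],
        "High": [12, 13, 14, 15],
        "Low": [9, 10, 11, 12],
        "Close": [11, 12, 13, 14],
        "Volume": [100, 200, 300, 400],
    },
    index=index,
)

# Daily candle: first open, highest high, lowest low, last close, total volume.
daily = candles.resample("1D").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(daily)
```

On real market data, days without trading produce empty rows after resampling, which can be removed with `dropna`.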

<font size="4">Shift</font>

<font size="3">shift index by desired number of periods with an optional time freq.</font>


In [None]:
aapl_df['Volume'].shift()

<font size="3">Calculate volume diff using shift.</font>

In [None]:
aapl_df['hourly_vol_change'] = aapl_df['Volume'] - aapl_df['Volume'].shift()
aapl_df

<font size="4">Diff</font>

<font size="3">calculates the difference of a DataFrame element compared with another element in the DataFrame (default is element in previous row)</font>


In [None]:
aapl_df['hourly_vol_change_with_diff'] = aapl_df['Volume'].diff()
aapl_df
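The two columns computed with `shift` and `diff` should be identical. A small check on a synthetic series (made-up values) illustrates the equivalence:

```python
import pandas as pd

volume = pd.Series([100, 130, 90, 90], name="Volume")

by_shift = volume - volume.shift()  # subtract the previous row explicitly
by_diff = volume.diff()             # same computation in one call

print(by_diff.tolist())          # [nan, 30.0, -40.0, 0.0]
print(by_shift.equals(by_diff))  # True
```

The first element is `NaN` in both cases because the first row has no previous row to compare with.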

<font size="4">pct_change</font>

<font size="3">Computes the fractional change from the immediately previous row by default. This is useful in comparing the fraction of change in a time series of elements.</font>


In [None]:
aapl_df['hourly_returns'] = aapl_df['Close'].pct_change()
aapl_df

<font size="4">Expanding</font>

<font size="3">Provide expanding window calculations.</font>


In [None]:
aapl_df['cumulative_returns'] = aapl_df['hourly_returns'].expanding().sum()
aapl_df

In [None]:
aapl_df['hourly_returns'].sum()
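Summing simple returns only approximates the total return, because it ignores compounding. The sketch below, on a made-up price series, compares the expanding sum with a compounded cumulative return computed via `cumprod`, which is one common alternative.

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0, 108.9])  # made-up prices
returns = close.pct_change()                    # NaN, +10%, -10%, +10%

summed = returns.expanding().sum()                  # arithmetic sum of returns
compounded = (1 + returns.fillna(0)).cumprod() - 1  # growth of one unit invested

print(round(float(summed.iloc[-1]), 4))      # 0.1
print(round(float(compounded.iloc[-1]), 4))  # 0.089
print(round(float(close.iloc[-1] / close.iloc[0] - 1), 4))  # 0.089
```

The compounded figure matches the actual price change from the first to the last observation, while the summed figure overstates it.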

<font size="4">Rolling</font>

<font size="3">Provide rolling window calculations.</font>


In [None]:
aapl_df['last_3h_returns_std'] = aapl_df['hourly_returns'].rolling('3h').std()
aapl_df['last_3_candles_std'] = aapl_df['hourly_returns'].rolling(3).std()
aapl_df.head(20)
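A time-based window (`'3h'`) and a count-based window (`3`) behave differently when the index has gaps, such as overnight breaks in hourly stock data. The sketch below uses a small synthetic series (made-up values) and `sum` instead of `std` so the results are easy to verify by hand.

```python
import pandas as pd

# Synthetic series with a four-hour gap between the second and third rows.
index = pd.to_datetime([
    "2024-01-02 10:00", "2024-01-02 11:00",
    "2024-01-02 15:00", "2024-01-02 16:00",
])
values = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

by_time = values.rolling("3h").sum()  # only rows within the trailing 3 hours
by_count = values.rolling(3).sum()    # always the last 3 rows, whatever their timestamps

print(by_time.tolist())   # [1.0, 3.0, 3.0, 7.0]
print(by_count.tolist())  # [nan, nan, 6.0, 9.0]
```

The time-based window drops the rows before the gap, whereas the count-based window spans it and needs three rows before it produces a value.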