# Analyze and Transform Financial Market Data with Pandas

In this chapter we'll cover the following recipes:
1. Diving into index types
2. Building pandas Series and DataFrames
3. Manipulating and transforming DataFrames
4. Examining and Selecting Data from DataFrames
5. Calculating asset returns 
6. Measuring the volatility of a return series 
7. Resampling data from different time frames
8. Addressing missing data issues
9. Applying custom functions to analyse time series data

In [1]:
import pandas as pd

In [2]:
idx_1 = pd.Index([0,1,2,3,4,5,6,7,8,9])

In [3]:
idx_1 # the result is a plain Index with dtype int64, i.e. 64-bit integer labels (pandas 2.x folded the older Int64Index type into Index)

Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

Pandas has several Index types to support many use cases, including those related to time series analysis. We'll cover examples of the most commonly used index types.

### DatetimeIndex

In [4]:
# extremely useful when dealing with time series data
days = pd.date_range("2016-01-01", periods=6, freq="D")
days # this creates an index with six incremental datetime objects

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06'],
              dtype='datetime64[ns]', freq='D')

In [29]:
# we can use different frequencies, including seconds
seconds = pd.date_range("2016-01-01", periods=6, freq="s")
seconds

DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:01',
               '2016-01-01 00:00:02', '2016-01-01 00:00:03',
               '2016-01-01 00:00:04', '2016-01-01 00:00:05'],
              dtype='datetime64[ns]', freq='S')

In [6]:
# by default DatetimeIndexes are "timezone naive". To localize:
seconds_utc = seconds.tz_localize("UTC")
seconds_utc # as we can see, localizing simply appends time zone information to the object

DatetimeIndex(['2016-01-01 00:00:00+00:00', '2016-01-01 00:00:01+00:00',
               '2016-01-01 00:00:02+00:00', '2016-01-01 00:00:03+00:00',
               '2016-01-01 00:00:04+00:00', '2016-01-01 00:00:05+00:00'],
              dtype='datetime64[ns, UTC]', freq='S')
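
Localizing is often confused with converting, so here is a minimal sketch of the difference. It rebuilds the same six-second index so it runs on its own; the target zone `America/New_York` is just an assumption chosen for illustration.

```python
import pandas as pd

# Same six-second index as above, rebuilt so the snippet is self-contained
seconds = pd.date_range("2016-01-01", periods=6, freq="s")

# tz_localize attaches a time zone to naive timestamps; the wall-clock values stay the same
seconds_utc = seconds.tz_localize("UTC")

# tz_convert re-expresses aware timestamps in another zone; the instants stay the same
seconds_ny = seconds_utc.tz_convert("America/New_York")

print(seconds_utc[0])  # 2016-01-01 00:00:00+00:00
print(seconds_ny[0])   # 2015-12-31 19:00:00-05:00
```

Both indexes describe the same moments in time, so comparing them element by element returns `True` everywhere.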

### PeriodIndex

In [7]:
# it's possible to create ranges of periods, such as quarters, using the period_range function
prng = pd.period_range("1990Q1", "2000Q4", freq="Q-NOV")
prng

PeriodIndex(['1990Q1', '1990Q2', '1990Q3', '1990Q4', '1991Q1', '1991Q2',
             '1991Q3', '1991Q4', '1992Q1', '1992Q2', '1992Q3', '1992Q4',
             '1993Q1', '1993Q2', '1993Q3', '1993Q4', '1994Q1', '1994Q2',
             '1994Q3', '1994Q4', '1995Q1', '1995Q2', '1995Q3', '1995Q4',
             '1996Q1', '1996Q2', '1996Q3', '1996Q4', '1997Q1', '1997Q2',
             '1997Q3', '1997Q4', '1998Q1', '1998Q2', '1998Q3', '1998Q4',
             '1999Q1', '1999Q2', '1999Q3', '1999Q4', '2000Q1', '2000Q2',
             '2000Q3', '2000Q4'],
            dtype='period[Q-NOV]')
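
The `Q-NOV` frequency means quarters of a fiscal year that ends in November, which is easy to misread. The sketch below inspects the first period of the range to show which calendar months it covers.

```python
import pandas as pd

# Q-NOV: quarterly periods for a fiscal year that ends in November
first = pd.Period("1990Q1", freq="Q-NOV")

# Fiscal 1990 runs from Dec 1989 through Nov 1990, so Q1 covers Dec 1989 to Feb 1990
print(first.start_time)              # 1989-12-01 00:00:00
print(first.asfreq("M", how="end"))  # 1990-02

prng = pd.period_range("1990Q1", "2000Q4", freq="Q-NOV")
print(len(prng))  # 44 quarters (11 fiscal years x 4)
```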

### MultiIndex

In [53]:
# often referred to as a "hierarchical index", a MultiIndex allows for complex data organization within pandas DataFrames and Series.
# to create a MultiIndex object, pass a list of tuples to the from_tuples method
tuples = [
    (pd.Timestamp("2023-07-10"), "WMT"),
    (pd.Timestamp("2023-07-10"), "JPM"),
    (pd.Timestamp("2023-07-10"), "TGT"),
    (pd.Timestamp("2023-07-11"), "WMT"),
    (pd.Timestamp("2023-07-11"), "JPM"),
    (pd.Timestamp("2023-07-11"), "TGT")
]

midx = pd.MultiIndex.from_tuples(tuples, names=("date","symbol"))
midx

MultiIndex([('2023-07-10', 'WMT'),
            ('2023-07-10', 'JPM'),
            ('2023-07-10', 'TGT'),
            ('2023-07-11', 'WMT'),
            ('2023-07-11', 'JPM'),
            ('2023-07-11', 'TGT')],
           names=['date', 'symbol'])

In [54]:
# create DataFrame with MultiIndex
data = {"Price": [150, 100, 90, 155, 102, 92]}
df = pd.DataFrame(data, index=midx)
df

date,symbol,Price
2023-07-10,WMT,150
2023-07-10,JPM,100
2023-07-10,TGT,90
2023-07-11,WMT,155
2023-07-11,JPM,102
2023-07-11,TGT,92


In [44]:
# more on multiindexes
'''
We have data for two stores (Store A and Store B) for two days (Day 1 and Day 2) and we want to record their sales. 
Using a MultiIndex, we can store this data in a Series:
'''

# Define the index (tuples with two levels: day and store)
index = [('Day 1', 'Store A'), ('Day 1', 'Store B'),
         ('Day 2', 'Store A'), ('Day 2', 'Store B')]

# Create a MultiIndex
multi_index = pd.MultiIndex.from_tuples(index, names=['Day', 'Store'])

# Define the sales data
sales_data = [100, 150, 200, 250]

# Create the Series with MultiIndex
sales = pd.Series(sales_data, index=multi_index)
print(sales)

Day    Store  
Day 1  Store A    100
       Store B    150
Day 2  Store A    200
       Store B    250
dtype: int64


In [51]:
# MultiIndex in a DataFrame
index = pd.MultiIndex.from_tuples(
    [('Day 1', 'Store A'), ('Day 1', 'Store B'),
     ('Day 2', 'Store A'), ('Day 2', 'Store B')],
    names=['Day', 'Store']
)

columns = ['Product 1', 'Product 2']
sales_data = [[10, 20], [15, 25], [20, 30], [25, 35]]

df = pd.DataFrame(sales_data, index=index, columns=columns)
df

Day,Store,Product 1,Product 2
Day 1,Store A,10,20
Day 1,Store B,15,25
Day 2,Store A,20,30
Day 2,Store B,25,35


In [49]:
# Accessing Data with MultiIndex
print(sales.loc['Day 1'])

# Accessing Data on Store B Day 2
print(sales.loc[('Day 2', 'Store B')])

Store
Store A    100
Store B    150
dtype: int64
250


In [52]:
# Accessing elements in a MultiIndex Dataframe
# Sales of all products for Day 1, Store A
print(df.loc[('Day 1', 'Store A')])

Product 1    10
Product 2    20
Name: (Day 1, Store A), dtype: int64
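
The `.loc` calls above select on the outer level first. To select on the inner level instead, one option is `.xs` with the `level` argument. This is a small sketch that rebuilds the same `sales` Series so it runs on its own.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Day 1", "Store A"), ("Day 1", "Store B"),
     ("Day 2", "Store A"), ("Day 2", "Store B")],
    names=["Day", "Store"],
)
sales = pd.Series([100, 150, 200, 250], index=index)

# .loc with a single key selects on the outer level (Day);
# .xs with level= selects on any level, here the inner one (Store)
store_b = sales.xs("Store B", level="Store")
print(store_b)

# unstack pivots the inner level into columns, giving a Day x Store table
table = sales.unstack("Store")
print(table)
```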


# Building pandas Series and DataFrames

In [9]:
# A Series is a one-dimensional labeled array that can hold any data type, including integers, floats, strings and objects
# The axis labels of a Series are collectively referred to as its index, which allows for easy access and manipulation

A key feature of a pandas Series is its ability to handle missing data, represented as NumPy nan (not a number).
 -> Unlike other values, nan doesn't equal anything, not even itself, which is why we use functions such as numpy.isnan() to check for nan.
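
A short sketch of what that means in practice, using a hand-made Series with one missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# nan compares unequal to everything, including itself
print(np.nan == np.nan)  # False

# so an equality test cannot find missing values...
print((s == np.nan).any())  # False

# ...while isna() (or numpy.isnan for float data) can
mask = s.isna()
print(mask.tolist())  # [False, True, False]

# most pandas reductions skip nan by default
print(s.sum())  # 4.0
```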

In [32]:
# how to construct a DataFrame out of several Series
import numpy as np
def rnd():
    return np.random.randn(6)

In [33]:
# creating three pandas series that we'll use to create df
s_1 = pd.Series(rnd(), index=seconds)
s_2 = pd.Series(rnd(), index=seconds)
s_3 = pd.Series(rnd(), index=seconds)

In [37]:
# creating a DataFrame using a Dictionary 
df = pd.DataFrame({"a": s_1, "b": s_2, "c": s_3})
df
# result is a DataFrame with a DatetimeIndex object of second resolution and three columns all with samples from a normal distribution

,a,b,c
2016-01-01 00:00:00,-0.702609,0.224508,-0.622859
2016-01-01 00:00:01,0.400119,-1.710678,0.166156
2016-01-01 00:00:02,-1.157357,-0.682956,-0.439438
2016-01-01 00:00:03,1.710172,-1.021789,-1.167827
2016-01-01 00:00:04,1.341819,0.628829,-0.066414
2016-01-01 00:00:05,1.001978,1.401021,-1.882693


In [17]:
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25,30,35]
}

index = ['ID_1', 'ID_2', 'ID_3']

s = pd.DataFrame(data, index=index)
s

,Name,Age
ID_1,Alice,25
ID_2,Bob,30
ID_3,Charlie,35


In [28]:
# Basic index operations [Sorting, Union, Intersection]

# sorting an index 
idx = pd.Index([3,4,1,2, 5])
sorted_idx = idx.sort_values()
print(sorted_idx)

# ex: Union and Intersection
idx_a = pd.Index([1,2,3])
idx_b = pd.Index([2,3,4])

union_idx = idx_a.union(idx_b)
print(f"Union: {union_idx}")

intersect_idx = idx_a.intersection(idx_b)
print(f"Intersection: {intersect_idx}")

Index([1, 2, 3, 4, 5], dtype='int64')
Union: Index([1, 2, 3, 4], dtype='int64')
Intersection: Index([2, 3], dtype='int64')


## Building a MultiIndex DataFrame from scratch

In [56]:
# we will use the same object that we've previously created
tuples = [
    (pd.Timestamp("2023-07-10"), "WMT"),
    (pd.Timestamp("2023-07-10"), "JPM"),
    (pd.Timestamp("2023-07-10"), "TGT"),
    (pd.Timestamp("2023-07-11"), "WMT"),
    (pd.Timestamp("2023-07-11"), "JPM"),
    (pd.Timestamp("2023-07-11"), "TGT"),
]

midx = pd.MultiIndex.from_tuples(tuples, names=("date","symbol"))
midx

MultiIndex([('2023-07-10', 'WMT'),
            ('2023-07-10', 'JPM'),
            ('2023-07-10', 'TGT'),
            ('2023-07-11', 'WMT'),
            ('2023-07-11', 'JPM'),
            ('2023-07-11', 'TGT')],
           names=['date', 'symbol'])

In [59]:
# now that we have the index we can create dataframe
df_2 = pd.DataFrame(
    {
        "close": [158.11,144.64,132.55,158.20,146.61,134.86],
        "factor_1": [0.31, 0.24, 0.67, 0.29, 0.23, 0.71]
    }, index=midx
)

df_2

date,symbol,close,factor_1
2023-07-10,WMT,158.11,0.31
2023-07-10,JPM,144.64,0.24
2023-07-10,TGT,132.55,0.67
2023-07-11,WMT,158.2,0.29
2023-07-11,JPM,146.61,0.23
2023-07-11,TGT,134.86,0.71


### Reindexing an existing DataFrame with a MultiIndex object

In [60]:
'''
It’s common to add a MultiIndex object to a DataFrame. 
Let’s consider an example of reindexing options data with a MultiIndex object
'''

# import openbb platform 
from openbb import obb
obb.user.preferences.output_type = "dataframe"

In [61]:
chains = obb.derivatives.options.chains("SPY")

In [62]:
# DataFrame with SPY options data
chains

,underlying_symbol,underlying_price,contract_symbol,expiration,dte,strike,option_type,open_interest,volume,theoretical_price,...,low,prev_close,change,change_percent,implied_volatility,delta,gamma,theta,vega,rho
0,SPY,579.1,SPY241024C00300000,2024-10-24,0,300.0,call,8,27,279.0101,...,278.79,277.940002,1.670,0.006008,0.0000,1.0000,0.0000,-0.0001,0.0000,0.0000
1,SPY,579.1,SPY241024P00300000,2024-10-24,0,300.0,put,0,0,0.0001,...,0.00,0.005000,0.000,0.000000,7.2127,0.0000,0.0000,-0.0001,0.0000,0.0000
2,SPY,579.1,SPY241024C00305000,2024-10-24,0,305.0,call,0,0,274.0102,...,0.00,273.460007,0.000,0.000000,0.0000,1.0000,0.0000,-0.0002,0.0000,0.0000
3,SPY,579.1,SPY241024P00305000,2024-10-24,0,305.0,put,0,0,0.0002,...,0.00,0.005000,0.000,0.000000,7.0394,0.0000,0.0000,-0.0002,0.0000,0.0000
4,SPY,579.1,SPY241024C00310000,2024-10-24,0,310.0,call,0,0,269.0102,...,0.00,268.020004,0.000,0.000000,8.3494,1.0000,0.0000,-0.0002,0.0000,0.0000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8985,SPY,579.1,SPY270115P00890000,2027-01-15,813,890.0,put,0,0,310.9900,...,0.00,312.024994,0.000,0.000000,0.0000,-1.0000,0.0000,-0.0780,0.0000,0.0001
8986,SPY,579.1,SPY270115C00895000,2027-01-15,813,895.0,call,6,0,1.0360,...,0.00,0.980000,0.000,0.000000,0.1204,0.0292,0.0006,-0.0039,0.6395,0.3452
8987,SPY,579.1,SPY270115P00895000,2027-01-15,813,895.0,put,0,0,315.9900,...,0.00,317.024994,0.000,0.000000,0.0000,-1.0000,0.0000,-0.0780,0.0000,0.0001
8988,SPY,579.1,SPY270115C00900000,2027-01-15,813,900.0,call,939,1,0.9728,...,1.00,1.045000,-0.045,-0.043062,0.1234,0.0276,0.0006,-0.0037,0.6116,0.3257


Options are derivatives that are often grouped by expiration date, strike price, option type, or any combination of those three.

-> The set_index method takes the column names in the list and uses those columns as index levels, replacing the default RangeIndex with a MultiIndex object. In this example, we use the expiration date, strike price, and option type as the three levels.

In [64]:
df_3 = chains.set_index(["expiration", "strike", "option_type"])
df_3

expiration,strike,option_type,underlying_symbol,underlying_price,contract_symbol,dte,open_interest,volume,theoretical_price,last_trade_price,last_trade_time,tick,...,low,prev_close,change,change_percent,implied_volatility,delta,gamma,theta,vega,rho
2024-10-24,300.0,call,SPY,579.1,SPY241024C00300000,0,8,27,279.0101,279.61,2024-10-24 09:30:15,up,...,278.79,277.940002,1.670,0.006008,0.0000,1.0000,0.0000,-0.0001,0.0000,0.0000
2024-10-24,300.0,put,SPY,579.1,SPY241024P00300000,0,0,0,0.0001,0.00,NaT,no_change,...,0.00,0.005000,0.000,0.000000,7.2127,0.0000,0.0000,-0.0001,0.0000,0.0000
2024-10-24,305.0,call,SPY,579.1,SPY241024C00305000,0,0,0,274.0102,0.00,NaT,no_change,...,0.00,273.460007,0.000,0.000000,0.0000,1.0000,0.0000,-0.0002,0.0000,0.0000
2024-10-24,305.0,put,SPY,579.1,SPY241024P00305000,0,0,0,0.0002,0.00,NaT,no_change,...,0.00,0.005000,0.000,0.000000,7.0394,0.0000,0.0000,-0.0002,0.0000,0.0000
2024-10-24,310.0,call,SPY,579.1,SPY241024C00310000,0,0,0,269.0102,0.00,NaT,no_change,...,0.00,268.020004,0.000,0.000000,8.3494,1.0000,0.0000,-0.0002,0.0000,0.0000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2027-01-15,890.0,put,SPY,579.1,SPY270115P00890000,813,0,0,310.9900,0.00,NaT,no_change,...,0.00,312.024994,0.000,0.000000,0.0000,-1.0000,0.0000,-0.0780,0.0000,0.0001
2027-01-15,895.0,call,SPY,579.1,SPY270115C00895000,813,6,0,1.0360,1.28,2024-10-18 14:17:38,down,...,0.00,0.980000,0.000,0.000000,0.1204,0.0292,0.0006,-0.0039,0.6395,0.3452
2027-01-15,895.0,put,SPY,579.1,SPY270115P00895000,813,0,0,315.9900,0.00,NaT,no_change,...,0.00,317.024994,0.000,0.000000,0.0000,-1.0000,0.0000,-0.0780,0.0000,0.0001
2027-01-15,900.0,call,SPY,579.1,SPY270115C00900000,813,939,1,0.9728,1.00,2024-10-24 09:30:09,up,...,1.00,1.045000,-0.045,-0.043062,0.1234,0.0276,0.0006,-0.0037,0.6116,0.3257


In [65]:
df_3.index

MultiIndex([(2024-10-24, 300.0, 'call'),
            (2024-10-24, 300.0,  'put'),
            (2024-10-24, 305.0, 'call'),
            (2024-10-24, 305.0,  'put'),
            (2024-10-24, 310.0, 'call'),
            (2024-10-24, 310.0,  'put'),
            (2024-10-24, 315.0, 'call'),
            (2024-10-24, 315.0,  'put'),
            (2024-10-24, 320.0, 'call'),
            (2024-10-24, 320.0,  'put'),
            ...
            (2027-01-15, 880.0, 'call'),
            (2027-01-15, 880.0,  'put'),
            (2027-01-15, 885.0, 'call'),
            (2027-01-15, 885.0,  'put'),
            (2027-01-15, 890.0, 'call'),
            (2027-01-15, 890.0,  'put'),
            (2027-01-15, 895.0, 'call'),
            (2027-01-15, 895.0,  'put'),
            (2027-01-15, 900.0, 'call'),
            (2027-01-15, 900.0,  'put')],
           names=['expiration', 'strike', 'option_type'], length=8990)
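
To see what the three-level index makes possible, here is a sketch on a small hand-made table. The `toy_chain` name and all of its values are invented for illustration; they are not real SPY data and only mimic a few columns of the chain above.

```python
import pandas as pd

# Hand-made stand-in for an options chain; the numbers are illustrative, not market data
toy_chain = pd.DataFrame({
    "expiration": ["2024-10-24"] * 4 + ["2024-10-25"] * 4,
    "strike": [300.0, 300.0, 305.0, 305.0] * 2,
    "option_type": ["call", "put"] * 4,
    "open_interest": [8, 0, 0, 0, 12, 5, 7, 3],
})

# Sorting the index keeps lookups fast and avoids performance warnings on partial selection
toy = toy_chain.set_index(["expiration", "strike", "option_type"]).sort_index()

# One contract: pass the full key as a tuple
print(toy.loc[("2024-10-24", 300.0, "call"), "open_interest"])  # 8

# Every call, across expirations and strikes
calls = toy.xs("call", level="option_type")

# Everything for one expiration (partial indexing on the first level)
first_expiry = toy.loc["2024-10-24"]
```

The same pattern should carry over to `df_3`, provided the key types match the index levels (for example, date objects rather than strings for `expiration`).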

# Manipulating and Transforming DataFrames

Topics:
- Creating new columns using aggregates, Booleans and strings
- Concatenating two DataFrames together
- Pivoting a DataFrame as in Excel
- Grouping data on a key or index and applying an aggregate
- Joining options data together to create straddle prices

In [3]:
import numpy as np 
import pandas as pd 
from openbb import obb
obb.user.preferences.output_type = "dataframe"

In [4]:
asset = obb.equity.price.historical("AAPL")
benchmark = obb.equity.price.historical("SPY")

In [5]:
print(asset.head())
print(benchmark.head())

            open  high   low  close      volume
date                                           
2004-01-02  0.39  0.39  0.38   0.38  2024993600
2004-01-05  0.38  0.40  0.38   0.40  5530257600
2004-01-06  0.40  0.40  0.39   0.40  7130872000
2004-01-07  0.40  0.41  0.39   0.41  8216241600
2004-01-08  0.41  0.42  0.41   0.42  6444244800
              open    high     low   close    volume
date                                                
2004-01-02  111.74  112.19  110.73  111.23  38072300
2004-01-05  111.69  112.52  111.59  112.44  27959800
2004-01-06  112.16  112.73  112.00  112.55  20472800
2004-01-07  112.39  113.06  111.89  112.93  30170400
2004-01-08  113.25  113.41  112.77  113.38  36438400


In [6]:
columns = [
    "open",
    "high",
    "low",
    "close",
    "volume"
]

asset.columns = columns
benchmark.columns = columns

In [7]:
# Add new columns with value from aggregate
asset["price_diff"] = asset.close.diff()
benchmark["price_diff"] = benchmark.close.diff()

In [8]:
# Adding new columns with a boolean
asset["gain"] = asset.price_diff > 0
benchmark["gain"] = benchmark.price_diff > 0

In [9]:
# Adding new column with a string value
asset["ticker"] = "AAPL"
benchmark["ticker"] = "SPY"

In [10]:
# Results of adding new columns to both dataframes
asset.head()

date,open,high,low,close,volume,price_diff,gain,ticker
2004-01-02,0.39,0.39,0.38,0.38,2024993600,,False,AAPL
2004-01-05,0.38,0.4,0.38,0.4,5530257600,0.02,True,AAPL
2004-01-06,0.4,0.4,0.39,0.4,7130872000,0.0,False,AAPL
2004-01-07,0.4,0.41,0.39,0.41,8216241600,0.01,True,AAPL
2004-01-08,0.41,0.42,0.41,0.42,6444244800,0.01,True,AAPL


In [11]:
benchmark.head()

date,open,high,low,close,volume,price_diff,gain,ticker
2004-01-02,111.74,112.19,110.73,111.23,38072300,,False,SPY
2004-01-05,111.69,112.52,111.59,112.44,27959800,1.21,True,SPY
2004-01-06,112.16,112.73,112.0,112.55,20472800,0.11,True,SPY
2004-01-07,112.39,113.06,111.89,112.93,30170400,0.38,True,SPY
2004-01-08,113.25,113.41,112.77,113.38,36438400,0.45,True,SPY


In [12]:
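# Set a single value based on an aggregate of values
asset_2 = asset.copy()
# volume is an integer column; cast it to float so the mean can be stored without a dtype warning
asset_2["volume"] = asset_2["volume"].astype(float)
asset_2.at[
    asset_2.index[10],
    "volume"
] = asset_2.volume.mean()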


In [13]:
asset_2.iat[10,4]

962016497.4093857

## Concatenating two DataFrames together

In [14]:
pd.concat([asset, asset_2]).drop_duplicates()

date,open,high,low,close,volume,price_diff,gain,ticker
2004-01-02,0.39,0.39,0.38,0.38,2.024994e+09,,False,AAPL
2004-01-05,0.38,0.40,0.38,0.40,5.530258e+09,0.02,True,AAPL
2004-01-06,0.40,0.40,0.39,0.40,7.130872e+09,0.00,False,AAPL
2004-01-07,0.40,0.41,0.39,0.41,8.216242e+09,0.01,True,AAPL
2004-01-08,0.41,0.42,0.41,0.42,6.444245e+09,0.01,True,AAPL
...,...,...,...,...,...,...,...,...
2024-10-21,234.52,236.85,234.45,236.48,3.625447e+07,1.48,True,AAPL
2024-10-22,233.95,236.22,232.60,235.86,3.884658e+07,-0.62,False,AAPL
2024-10-23,234.10,235.14,227.76,230.76,5.228698e+07,-5.10,False,AAPL
2024-10-24,229.98,230.82,228.41,230.57,3.110950e+07,-0.19,False,AAPL
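
One detail worth spelling out: `drop_duplicates` compares column values only and ignores the index. The sketch below mirrors the `asset` / `asset_2` setup with a tiny invented frame, so the names `original` and `modified` and all numbers are assumptions for illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
original = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0], "volume": [100.0, 200.0, 600.0]},
    index=dates,
)

# Copy with one changed cell, mirroring asset_2 above
modified = original.copy()
modified.at[dates[1], "volume"] = modified["volume"].mean()  # 300.0

stacked = pd.concat([original, modified])
deduped = stacked.drop_duplicates()

print(len(stacked))  # 6
print(len(deduped))  # 4: three original rows plus the one modified row
```

After deduplication the date 2024-01-03 appears twice, once with the original volume and once with the replaced one, so the index is no longer unique.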


## Pivoting a DataFrame as in Excel

In [15]:
pd.pivot_table(
    data=asset,
    values="price_diff",
    columns="gain",
    aggfunc=["sum","mean", "std"]
) # the result is a pivoted dataframe with MultiIndex column labels

,sum,sum,mean,mean,std,std
gain,False,True,False,True,False,True
price_diff,-1542.21,1772.4,-0.603133,0.660358,1.182803,1.196536


## Grouping data on a key or index and applying an aggregate

In [16]:
concated = pd.concat([asset, benchmark])

In [17]:
concated.head()

date,open,high,low,close,volume,price_diff,gain,ticker
2004-01-02,0.39,0.39,0.38,0.38,2024993600,,False,AAPL
2004-01-05,0.38,0.4,0.38,0.4,5530257600,0.02,True,AAPL
2004-01-06,0.4,0.4,0.39,0.4,7130872000,0.0,False,AAPL
2004-01-07,0.4,0.41,0.39,0.41,8216241600,0.01,True,AAPL
2004-01-08,0.41,0.42,0.41,0.42,6444244800,0.01,True,AAPL


In [18]:
# group the resulting df by ticker 
concated.groupby("ticker").close.ohlc()

ticker,open,high,low,close
AAPL,0.38,236.48,0.38,230.57
SPY,111.23,584.59,68.11,579.24
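
Note that `ohlc()` here summarizes the `close` column only: for each group it reports the first, highest, lowest and last close. A sketch with invented tickers and prices shows the equivalent named aggregation.

```python
import pandas as pd

# Invented tickers and prices, for illustration only
toy = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "close": [10.0, 12.0, 11.0, 50.0, 48.0, 49.0],
})

grouped = toy.groupby("ticker")["close"]

# open = first value, high = max, low = min, close = last value in each group
summary = grouped.ohlc()

# the same result spelled out with named aggregations
manual = grouped.agg(open="first", high="max", low="min", close="last")

print(summary)
```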


## Joining options data together to create straddle prices

In [19]:
# DataFrame joins are similar to SQL joins: they combine two DataFrames on a matching key
chains = obb.derivatives.options.chains(
    "AAPL", provider="cboe"
)
chains.head()

,underlying_symbol,underlying_price,contract_symbol,expiration,dte,strike,option_type,open_interest,volume,theoretical_price,...,low,prev_close,change,change_percent,implied_volatility,delta,gamma,theta,vega,rho
0,AAPL,230.86,AAPL241101C00100000,2024-11-01,5,100.0,call,3,0,131.7228,...,0.0,130.650002,0.0,0.0,0.0,1.0,0.0,-0.0008,0.0001,0.0219
1,AAPL,230.86,AAPL241101P00100000,2024-11-01,5,100.0,put,3,0,0.001,...,0.0,0.005,0.0,0.0,1.8532,-0.0001,0.0,-0.0008,0.0001,0.0
2,AAPL,230.86,AAPL241101C00105000,2024-11-01,5,105.0,call,5,0,126.7286,...,0.0,125.600002,0.0,0.0,0.0,1.0,0.0,-0.0008,0.0001,0.023
3,AAPL,230.86,AAPL241101P00105000,2024-11-01,5,105.0,put,0,0,0.0011,...,0.0,0.02,0.0,0.0,2.1338,-0.0001,0.0,-0.0008,0.0001,0.0
4,AAPL,230.86,AAPL241101C00110000,2024-11-01,5,110.0,call,0,0,121.7345,...,0.0,120.600002,0.0,0.0,0.0,1.0,0.0,-0.0009,0.0001,0.0241


In [20]:
# to construct a straddle we need to filter the calls and puts for a specific expiration
expirations = chains.expiration.unique()
calls = chains[
    (chains.option_type == "call") & (chains.expiration == expirations[5])
]

puts = chains[
    (chains.option_type == "put") & (chains.expiration == expirations[5])
]

In [26]:
calls_strike = calls.set_index("strike")
puts_strike = puts.set_index("strike")

In [27]:
joined = calls_strike.join(
    puts_strike,
    how="left",
    lsuffix="_call",
    rsuffix="_put"
)

In [28]:
# we need only the price columns from the joined DataFrame;
# .copy() gives us an independent DataFrame, so adding a column later doesn't trigger SettingWithCopyWarning
prices = joined[["last_trade_price_call",
                "last_trade_price_put"]].copy()

In [31]:
# sum the call and put prices row-wise by passing axis=1 to the sum method
prices["straddle_price"] = prices.sum(axis=1)
prices.head()


strike,last_trade_price_call,last_trade_price_put,straddle_price
100.0,0.0,0.0,0.0
105.0,0.0,0.0,0.0
110.0,0.0,0.0,0.0
115.0,0.0,0.0,0.0
120.0,0.0,0.0,0.0
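
The same join can be sketched end to end on invented data, which also shows what a left join does when a strike has a call but no put. The strikes and prices below are assumptions for illustration, and `min_count=2` is one possible way to avoid treating a missing leg as zero.

```python
import pandas as pd

# Invented quotes: the 110 strike has a call but no matching put
calls = pd.DataFrame(
    {"strike": [100.0, 105.0, 110.0], "last_trade_price": [6.0, 3.0, 1.0]}
).set_index("strike")
puts = pd.DataFrame(
    {"strike": [100.0, 105.0], "last_trade_price": [1.5, 3.5]}
).set_index("strike")

# how="left" keeps every call strike; unmatched puts become NaN
joined = calls.join(puts, how="left", lsuffix="_call", rsuffix="_put")

prices = joined[["last_trade_price_call", "last_trade_price_put"]].copy()

# min_count=2 leaves the straddle as NaN when either leg is missing,
# rather than silently treating the missing leg as 0
prices["straddle_price"] = prices.sum(axis=1, min_count=2)
print(prices)
```

With `how="inner"` the 110 strike would be dropped instead of kept with a missing put price.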
