# Word vectors from SEC filings using Gensim: Preprocessing

In this section, we will learn word and phrase vectors from annual SEC filings using gensim to illustrate the potential value of word embeddings for algorithmic trading. In the following sections, we will combine these vectors as features with price returns to train neural networks to predict equity prices from the content of security filings.

In particular, we use a dataset containing over 22,000 10-K annual reports from the period 2013-2016 that are filed by listed companies and contain both financial information and management commentary (see chapter 3 on Alternative Data). For about half of the filings (roughly 11,000), we have stock prices for the filing company and can use them to label the data for predictive modeling.

## Imports & Settings

In [33]:
using Pkg
using PyCall
using Conda

In [34]:
#Pkg.add("Glob")
#Pkg.add("TextAnalysis")
#Pkg.add("DataFrames")
#Pkg.add("Plots")
#Pkg.add("CSV")
#Pkg.add("JSON")
#Pkg.add("PlotlyJS")
#Pkg.add("StatsBase")

In [35]:
using Glob
using TextAnalysis
using DataFrames
using StatsBase
using StringEncodings
using PlotlyJS
using Printf
using CSV

In [36]:
#Conda.add("gensim")
@pyimport gensim

In [37]:
Word2Vec = gensim.models.Word2Vec
LineSentence = gensim.models.word2vec.LineSentence
Phrases = gensim.models.phrases.Phrases
Phraser = gensim.models.phrases.Phraser

PyObject <class 'gensim.models.phrases.FrozenPhrases'>

In [39]:
#Conda.pip_interop(true)

In [40]:
#Conda.pip("install", "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl")
#Conda.pip("install", "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0.tar.gz")

In [41]:
#Conda.add("spacy")
@pyimport spacy

In [42]:
# Format a duration given in whole seconds as HH:MM:SS.
function format_time(t)
    m, s = divrem(t, 60)
    h, m = divrem(m, 60)
    return @sprintf("%02d:%02d:%02d", h, m, s)
end

format_time (generic function with 1 method)

## Data Download

<div style='direction:rtl; font-family: "B Nazanin"; font-size: 20px;'> 
Download and unzip the data according to the instructions below.
Then place the `filings` folder that holds the data inside a folder named `sec-filings` (the code in this notebook reads from `../data/sec-filings/filings`).
Because the dataset is large, only part of the data is used here.
Running the operations below on the full dataset requires a machine with substantial processing power and a GPU instead of a CPU.

The data can be downloaded from [here](https://drive.google.com/uc?id=0B4NK0q0tDtLFendmeHNsYzNVZ2M&export=download). Unzip the archive, move it into the `data` folder in the repository's root directory, and rename it to `filings`.

### Paths

Each filing is a separate text file and a master index contains filing metadata. We extract the most informative sections, namely
- Item 1 and 1A: Business and Risk Factors
- Item 7 and 7A: Management's Discussion and Disclosures about Market Risks

This preprocessing notebook shows how to parse and tokenize the text using spaCy, similar to the approach in chapter 14. We do not lemmatize the tokens, in order to preserve nuances of word usage.

We use gensim to detect phrases. The Phrases module scores the tokens and the Phraser class transforms the text data accordingly. The notebook shows how to repeat the process to create longer phrases.
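To make the scoring step concrete, here is a minimal plain-Julia sketch of a Mikolov-style bigram score of the kind gensim's default scorer applies: `(count(a, b) - min_count) * vocab_size / (count(a) * count(b))`. The function names, the toy sentences, and `min_count=1` are illustrative assumptions, not part of the notebook or of the gensim API.

```julia
# Sketch only: count unigrams and adjacent-token bigrams over tokenized sentences.
function count_tokens(sentences)
    unigrams = Dict{String,Int}()
    bigrams = Dict{Tuple{String,String},Int}()
    for sentence in sentences
        for (i, token) in enumerate(sentence)
            unigrams[token] = get(unigrams, token, 0) + 1
            if i < length(sentence)
                pair = (token, sentence[i + 1])
                bigrams[pair] = get(bigrams, pair, 0) + 1
            end
        end
    end
    return unigrams, bigrams
end

# score(a, b) = (count(a, b) - min_count) * vocab_size / (count(a) * count(b))
# min_count=1 is an assumption chosen for the toy data below.
function phrase_score(unigrams, bigrams, a, b; min_count=1)
    n_ab = get(bigrams, (a, b), 0)
    return (n_ab - min_count) * length(unigrams) / (unigrams[a] * unigrams[b])
end

# Hypothetical toy corpus in the style of the cleaned filing sentences.
sentences = [
    ["forward", "looking", "statements", "subject", "risks"],
    ["forward", "looking", "statements", "future", "results"],
    ["actual", "results", "differ", "forward", "looking", "statements"],
]
unigrams, bigrams = count_tokens(sentences)
score_fl = phrase_score(unigrams, bigrams, "forward", "looking")
score_ar = phrase_score(unigrams, bigrams, "actual", "results")
```

A pair such as `forward looking`, whose words nearly always occur together, scores higher than an incidental pair such as `actual results`; pairs whose score exceeds a chosen threshold would be merged into a single token.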

In [43]:
sec_path = joinpath("..", "data", "sec-filings")
filing_path = joinpath(sec_path, "filings")
sections_path = joinpath(sec_path, "sections")

"..\\data\\sec-filings\\sections"

In [44]:
if !(isdir(sections_path))
    mkpath(sections_path)
end

## Identify Sections

In [45]:
files = Glob.glob("../data/sec-filings/filings/*.txt")

22631-element Vector{String}:
 "..\\data\\sec-filings\\filings\\1.txt"
 "..\\data\\sec-filings\\filings\\10.txt"
 "..\\data\\sec-filings\\filings\\100.txt"
 "..\\data\\sec-filings\\filings\\1000.txt"
 "..\\data\\sec-filings\\filings\\10000.txt"
 "..\\data\\sec-filings\\filings\\10001.txt"
 "..\\data\\sec-filings\\filings\\10002.txt"
 "..\\data\\sec-filings\\filings\\10003.txt"
 "..\\data\\sec-filings\\filings\\10004.txt"
 "..\\data\\sec-filings\\filings\\10005.txt"
 "..\\data\\sec-filings\\filings\\10006.txt"
 "..\\data\\sec-filings\\filings\\10007.txt"
 "..\\data\\sec-filings\\filings\\10008.txt"
 ⋮
 "..\\data\\sec-filings\\filings\\9989.txt"
 "..\\data\\sec-filings\\filings\\999.txt"
 "..\\data\\sec-filings\\filings\\9990.txt"
 "..\\data\\sec-filings\\filings\\9991.txt"
 "..\\data\\sec-filings\\filings\\9992.txt"
 "..\\data\\sec-filings\\filings\\9993.txt"
 "..\\data\\sec-filings\\filings\\9994.txt"
 "..\\data\\sec-filings\\filings\\9995.txt"
 "..\\data\\sec-filings\\filings\\9996.tx

In [46]:
# Read a Latin-1 encoded filing and return its lines joined into one string.
function read_text(file)
    open(file, "r") do f
        s = StringDecoder(f, "LATIN1", "UTF-8")
        return join(readlines(s), " ")
    end
end

read_text (generic function with 1 method)

In [47]:
for filing in files[1:5000]  # only a subset of the filings is processed
    filing_id = splitext(basename(filing))[1]
    items = Dict()
    # Sections are delimited by "°"; keep those that start with "item <id>".
    for section in split(lowercase(read_text(filing)), "°")
        words = split(section)
        if startswith(section, "item ") && length(words) > 1
            item = replace(words[2], r"[.:,]" => "")  # e.g. "1a." becomes "1a"
            text = join(words[3:end], " ")
            # An item can occur more than once; keep the longest text.
            if !haskey(items, item) || length(items[item]) < length(text)
                items[item] = text
            end
        end
    end

    # Write one CSV per filing with columns `item` and `text`.
    txt = DataFrame(item=[item[1] for item in items], text=[item[2] for item in items])
    CSV.write(joinpath(sections_path, "$filing_id" * ".csv"), txt)
end
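
The extraction loop above can be tried on a small string before running it on thousands of filings. The following sketch assumes, as the loop does, that sections are separated by `°`; the sample text and the helper name `extract_items` are hypothetical.

```julia
# Hypothetical toy filing; "°" separates sections as in the loop above.
raw = "Table of Contents°Item 1. Business We make widgets.°Item 1A. Risk Factors Demand may fall. Costs may rise.°Item 1. Business"

function extract_items(raw)
    items = Dict{String,String}()
    for section in split(lowercase(raw), "°")
        words = split(section)
        # Skip chunks that do not look like "item <id> ...".
        (startswith(section, "item ") && length(words) > 1) || continue
        item = replace(words[2], r"[.:,]" => "")
        text = join(words[3:end], " ")
        # Items can repeat (for example in a table of contents); keep the longest text.
        if !haskey(items, item) || length(items[item]) < length(text)
            items[item] = text
        end
    end
    return items
end

items = extract_items(raw)
```

Here the short second mention of Item 1 does not overwrite the longer first one, which is the purpose of the length comparison.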

## Parse Sections

Select the following sections:

In [48]:
sections = ["1", "1a", "7", "7a"]

4-element Vector{String}:
 "1"
 "1a"
 "7"
 "7a"

In [49]:
clean_path = joinpath(sec_path, "selected_sections")
if !(isdir(clean_path))
    mkpath(clean_path)
end

In [50]:
nlp = spacy.load("en_core_web_sm", disable=["ner"])
nlp.max_length = 6000000

6000000

In [51]:
function add_clean_token(clean_sentence, token)
    condition = token.is_stop || token.is_digit || !(token.is_alpha) || token.is_punct || token.is_space
    condition = condition || (token.lemma_ == "-PRON-") || (token.pos_ ∈ ["PUNCT", "SYM", "X"])
    if !(condition)
        push!(clean_sentence, lowercase(token.text))
    end
    return clean_sentence
end

add_clean_token (generic function with 1 method)

In [52]:
total_tokens = 0

sections_files = Glob.glob("../data/sec-filings/sections/*.csv")
clean_files = Glob.glob("../data/sec-filings/selected_sections/*.csv")
to_do = length(sections_files)
done = length(clean_files) + 1

for text_file in sections_files[1:100]  # only a subset is parsed here
    file_id = splitext(basename(text_file))[1]
    clean_file = joinpath(clean_path, "$(file_id).csv")

    # Keep only the selected items of this filing.
    items = dropmissing(DataFrame(CSV.File(text_file)))
    items = filter(row -> row.item ∈ sections, items)

    clean_doc = []
    for row in eachrow(items)
        item = row["item"]
        text = row["text"]
        doc = nlp(text)
        for (s, sentence) ∈ enumerate(doc.sents)
            clean_sentence = []
            if sentence !== nothing
                for token ∈ sentence
                    clean_sentence = add_clean_token(clean_sentence, token)
                end
                # A loop variable is not visible after its loop, so count the tokens directly.
                total_tokens += length(sentence)
                if length(clean_sentence) > 0
                    push!(clean_doc, [item, s, join(clean_sentence, " ")])
                end
            end
        end
    end
    # One row per cleaned sentence: item, sentence index, space-joined tokens.
    clean_doc_df = dropmissing(DataFrame(item = [doc[1] for doc ∈ clean_doc],
                            sentence = [doc[2] for doc ∈ clean_doc],
                            text = [doc[3] for doc ∈ clean_doc]))
    CSV.write(clean_file, clean_doc_df)
    done += 1
end

## Create ngrams

In [53]:
ngram_path = joinpath(sec_path, "ngrams")
stats_path = joinpath(sec_path, "corpus_stats")

for path in [ngram_path, stats_path]
    if !(isdir(path))
        mkpath(path)
    end
end

In [54]:
unigrams = joinpath(ngram_path, "ngrams_1.txt")

"..\\data\\sec-filings\\ngrams\\ngrams_1.txt"

In [55]:
value_counts(df, col) = combine(groupby(df, col), nrow)

value_counts (generic function with 1 method)

In [56]:
clean_files = Glob.glob("../data/sec-filings/selected_sections/*.csv")
df = DataFrame(CSV.File(clean_files[1]))
df.text[1]

"business annual report form k contains forward looking statements future events future results subject safe harbors created securities act amended securities exchange act amended"

In [57]:
function create_unigrams(; min_length=3)
    texts = []
    sentence_counter = Dict()
    vocab = Dict()
    clean_files = Glob.glob("../data/sec-filings/selected_sections/*.csv")
    for (i, f) ∈ enumerate(clean_files)
        df = DataFrame(CSV.File(f))
        df = filter(row -> row.item ∈ sections, df)
        
        addcounts!(sentence_counter, Dict(eachrow(value_counts(df, :item))))
        for entry ∈ (dropmissing(df, :text)).text
            sentence = split(entry)
            if length(sentence) >= min_length
                addcounts!(vocab, sentence)
                push!(texts, join(sentence, " "))
            end
        end
    end
    sentence_counter = sort(collect(sentence_counter), by=x->x[2], rev=true)
    sentence_counter_df = DataFrame(item = [pair[1] for pair ∈ sentence_counter], 
                                    sentences = [pair[2] for pair ∈ sentence_counter])
    CSV.write(joinpath(stats_path, "selected_sentences.csv"), sentence_counter_df)

    vocab = sort(collect(vocab), by=x->x[2], rev=true)
    vocab_df = DataFrame(token = [pair[1] for pair ∈ vocab], 
                            n = [pair[2] for pair ∈ vocab])
    CSV.write(joinpath(stats_path, "sections_vocab.csv"), vocab_df)
    
    # Write one sentence per line through the opened file handle.
    open(unigrams, "w") do file
        write(file, join(texts, "\n"))
    end
    return [split(l) for l ∈ texts]
end

create_unigrams (generic function with 1 method)

In [58]:
start = time()

texts = create_unigrams()

println("Reading: $(format_time(floor(Int, time() - start)))")

Reading: 00:00:04


In [59]:
texts

133218-element Vector{Vector{SubString{String}}}:
 ["business", "annual", "report", "form", "k", "contains", "forward", "looking", "statements", "future"  …  "safe", "harbors", "created", "securities", "act", "amended", "securities", "exchange", "act", "amended"]
 ["statements", "statements", "historical", "fact", "statements", "deemed", "forward", "looking", "statements"]
 ["statements", "contain", "words", "expects", "anticipates", "intends", "plans", "believes", "seeks", "estimates", "wording", "indicating", "future", "results", "expectations"]
 ["forward", "looking", "statements", "subject", "risks", "uncertainties"]
 ["actual", "results", "differ", "materially", "results", "discussed", "forward", "looking", "statements"]
 ["factors", "cause", "actual", "results", "differ", "materially", "include", "limited", "discussed", "risk", "factors", "item", "report"]
 ["business", "financial", "condition", "results", "operations", "materially", "harmed", "factors"]
 ["undertake", "obligatio

In [60]:
# Read the sentences (one per line) from an n-gram text file.
function get_articles(article_path)
    open(article_path, "r") do file
        s = StringDecoder(file, "LATIN1", "UTF-8")
        return readlines(s)
    end
end

get_articles (generic function with 1 method)

In [61]:
function get_ngram_articles(articles, n)
    sentences = Any[]
    ngram_counter = Dict()
    for article ∈ articles
        doc = StringDocument(article)
        doc_ngram = ngrams(doc, n)
        for (k, v) ∈ doc_ngram
            if k ∈ keys(ngram_counter)
                ngram_counter[k] += v
            else
                ngram_counter[k] = v
            end
        end
        sentence = join(keys(doc_ngram), " ") * "\n"
        push!(sentences, sentence)
    end
    return ngram_counter, sentences
end

get_ngram_articles (generic function with 1 method)
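
For intuition about what n-gram counting produces, here is a plain-Julia sliding-window sketch of the same idea. It does not use the TextAnalysis API; the helper name `count_ngrams` and the `_` separator are illustrative assumptions.

```julia
# Sketch only: count n-grams by sliding a window of n tokens over a sentence.
function count_ngrams(tokens, n; sep="_")
    counts = Dict{String,Int}()
    for i in 1:(length(tokens) - n + 1)
        gram = join(tokens[i:i + n - 1], sep)
        counts[gram] = get(counts, gram, 0) + 1
    end
    return counts
end

# Hypothetical toy sentence of already-cleaned tokens.
tokens = split("forward looking statements subject forward looking statements")
bigrams = count_ngrams(tokens, 2)
trigrams = count_ngrams(tokens, 3)
```

Joining the tokens with an underscore keeps each n-gram a single whitespace-free token, so it survives a later `split` on whitespace.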

In [62]:
"""Use TextAnalysis to create n-grams of up to `max_length` tokens."""
function create_ngrams(max_length=3)

    n_grams_df = DataFrame()
    start = time()
    for n ∈ 2:max_length
        articles = get_articles(joinpath(ngram_path, "ngrams_$(n-1).txt"))
        ngram_counter, sentences = get_ngram_articles(articles, n)

        s = DataFrame(length = [n for i ∈ 1:length(keys(ngram_counter))], 
                                phrase = [ngram_word for ngram_word ∈ keys(ngram_counter)], 
                                count = [ngram_count for ngram_count ∈ values(ngram_counter)])
                          
        n_grams_df = vcat(n_grams_df, s)

        open(joinpath(ngram_path, "ngrams_$(n).txt"), "w") do file
            for sentence ∈ sentences
                write(file, sentence)
            end
        end
    end

    n_grams_df = sort!(n_grams_df, :count, rev=true)
    n_grams_df[!, "ngram"] = replace.(n_grams_df[:, "phrase"], " "=>"_")
        
    CSV.write(joinpath(sec_path, "ngrams.csv"), n_grams_df)

    println("Duration: $(format_time(floor(Int, (time() - start))))\n")
    println("ngrams: $(size(n_grams_df)[1])\n")
    println(value_counts(n_grams_df, :length))
end

create_ngrams (generic function with 2 methods)

In [63]:
create_ngrams()

Duration: 00:01:35

ngrams: 3616211

2×2 DataFrame
│ Row │ length │ nrow    │
│     │ Int64  │ Int64   │
├─────┼────────┼─────────┤
│ 1   │ 2      │ 688350  │
│ 2   │ 3      │ 2927861 │


## Inspect Corpus

In [64]:
nsents, ntokens = Dict(), Dict()

clean_files = Glob.glob("../data/sec-filings/selected_sections/*.csv")
for f in clean_files
    df = DataFrame(CSV.File(f))

    for (k, v) ∈ Dict(eachrow(value_counts(df, :item)))
        # Use the string key consistently for both lookup and update.
        nsents["$k"] = get(nsents, "$k", 0) + v
    end

    df[!, "ntokens"] = length.(split.(df.text))

    for (k, v) ∈ Dict(eachrow(combine(groupby(df, :item), :ntokens => sum)))
        ntokens["$k"] = get(ntokens, "$k", 0) + v
    end
end

ntokens = sort(collect(ntokens), by=x->x[2], rev=true)
nsents = sort(collect(nsents), by=x->x[2], rev=true)
println("Number of Sentences:")
println(nsents)
println("\nNumber of Tokens:")
println(ntokens)

Number of Sentences:
Pair{Any, Any}["1a" => 36403, "7a" => 3481, "7" => 2619, "1" => 1949]

Number of Tokens:
Pair{Any, Any}["1a" => 587418, "7a" => 56941, "7" => 41712, "1" => 30666]


In [65]:
ntokens_df = DataFrame(Item = [pair[1] for pair ∈ ntokens])
ntokens_df[!, "# Tokens"] = [pair[2] for pair ∈ ntokens]

nsents_df = DataFrame(Item = [pair[1] for pair ∈ nsents])
nsents_df[!, "# Sentences"] = [pair[2] for pair ∈ nsents]

println("Number of Sentences:")
println(nsents_df)
println("\nNumber of Tokens:")
println(ntokens_df)

Number of Sentences:
4×2 DataFrame
│ Row │ Item   │ # Sentences │
│     │ String │ Int64       │
├─────┼────────┼─────────────┤
│ 1   │ 1a     │ 36403       │
│ 2   │ 7a     │ 3481        │
│ 3   │ 7      │ 2619        │
│ 4   │ 1      │ 1949        │

Number of Tokens:
4×2 DataFrame
│ Row │ Item   │ # Tokens │
│     │ String │ Int64    │
├─────┼────────┼──────────┤
│ 1   │ 1a     │ 587418   │
│ 2   │ 7a     │ 56941    │
│ 3   │ 7      │ 41712    │
│ 4   │ 1      │ 30666    │


In [109]:
PlotlyJS.plot(nsents_df[!, "# Sentences"], x=nsents_df.Item, kind="bar")

In [111]:
PlotlyJS.plot(ntokens_df[!, "# Tokens"], x=ntokens_df.Item, kind="bar")

In [67]:
ngrams_df = DataFrame(CSV.File(joinpath(sec_path, "ngrams.csv")))

| Row | length (Int64) | phrase (String) | count (Int64) | ngram (String) |
|---|---|---|---|---|
| 1 | 2 | company s | 4302 | company_s |
| 2 | 2 | year ended | 4172 | year_ended |
| 3 | 2 | ended december | 3905 | ended_december |
| 4 | 2 | results operations | 3756 | results_operations |
| 5 | 2 | common stock | 3546 | common_stock |
| 6 | 2 | table contents | 3490 | table_contents |
| 7 | 2 | financial condition | 3128 | financial_condition |
| 8 | 2 | fair value | 2824 | fair_value |
| 9 | 2 | financial statements | 2729 | financial_statements |
| 10 | 2 | million million | 2605 | million_million |


In [68]:
first(ngrams_df, 5)

| Row | length (Int64) | phrase (String) | count (Int64) | ngram (String) |
|---|---|---|---|---|
| 1 | 2 | company s | 4302 | company_s |
| 2 | 2 | year ended | 4172 | year_ended |
| 3 | 2 | ended december | 3905 | ended_december |
| 4 | 2 | results operations | 3756 | results_operations |
| 5 | 2 | common stock | 3546 | common_stock |


In [69]:
describe(ngrams_df.count)

Summary Stats:
Length:         3616211
Missing Count:  0
Mean:           1.583679
Minimum:        1.000000
1st Quartile:   1.000000
Median:         1.000000
3rd Quartile:   1.000000
Maximum:        4302.000000
Type:           Int64


In [70]:
threshold = 0.7 * maximum(ngrams_df.count)
first(sort(filter(row -> row.count > threshold, ngrams_df), ["length", "count"], rev=true), 10)

| Row | length (Int64) | phrase (String) | count (Int64) | ngram (String) |
|---|---|---|---|---|
| 1 | 2 | company s | 4302 | company_s |
| 2 | 2 | year ended | 4172 | year_ended |
| 3 | 2 | ended december | 3905 | ended_december |
| 4 | 2 | results operations | 3756 | results_operations |
| 5 | 2 | common stock | 3546 | common_stock |
| 6 | 2 | table contents | 3490 | table_contents |
| 7 | 2 | financial condition | 3128 | financial_condition |


In [71]:
vocab = dropmissing(DataFrame(CSV.File(joinpath(stats_path, "sections_vocab.csv"))))

| Row | token (String31) | n (Int64) |
|---|---|---|
| 1 | million | 20454 |
| 2 | company | 17253 |
| 3 | financial | 12897 |
| 4 | business | 12891 |
| 5 | products | 12514 |
| 6 | s | 10890 |
| 7 | operations | 10596 |
| 8 | year | 10304 |
| 9 | december | 10222 |
| 10 | net | 10131 |


In [72]:
describe(vocab.n)

Summary Stats:
Length:         26044
Missing Count:  0
Mean:           83.708647
Minimum:        1.000000
1st Quartile:   1.000000
Median:         4.000000
3rd Quartile:   21.000000
Maximum:        20454.000000
Type:           Int64


In [79]:
tokens = Dict()
open(joinpath(ngram_path, "ngrams_2.txt"), "r") do file
    for l in readlines(file)
        addcounts!(tokens, split(l))
    end
end
tokens = sort(collect(tokens), by=x->x[2], rev=true)

26045-element Vector{Pair{Any, Any}}:
             "million" => 34435
             "company" => 28594
           "financial" => 24424
            "business" => 22824
            "products" => 22240
                   "s" => 20488
          "operations" => 17778
                 "net" => 16563
               "sales" => 16511
              "market" => 16322
                "year" => 16086
            "interest" => 15951
                "cash" => 15713
                       ⋮
              "fibrin" => 1
 "undercapitalization" => 1
         "witnessview" => 1
                "told" => 1
        "constriction" => 1
          "tactically" => 1
     "tachyarrhythmia" => 1
            "kalbitor" => 1
           "jelectric" => 1
             "tapered" => 1
               "crowe" => 1
     "differentiators" => 1

In [82]:
tokens = DataFrame(token = [pair[1] for pair ∈ tokens], count = [pair[2] for pair ∈ tokens])

| Row | token (SubStri…) | count (Int64) |
|---|---|---|
| 1 | million | 34435 |
| 2 | company | 28594 |
| 3 | financial | 24424 |
| 4 | business | 22824 |
| 5 | products | 22240 |
| 6 | s | 20488 |
| 7 | operations | 17778 |
| 8 | net | 16563 |
| 9 | sales | 16511 |
| 10 | market | 16322 |


In [83]:
first(tokens, 5)

| Row | token (SubStri…) | count (Int64) |
|---|---|---|
| 1 | million | 34435 |
| 2 | company | 28594 |
| 3 | financial | 24424 |
| 4 | business | 22824 |
| 5 | products | 22240 |


In [84]:
describe(select(filter(row -> contains(row.token, "_"), tokens), :count))

| Row | variable (Symbol) | mean (Float64) | min (Nothing) | median (Nothing) | max (Nothing) | nunique (Nothing) | nmissing (Nothing) | eltype (DataType) |
|---|---|---|---|---|---|---|---|---|
| 1 | count | | | | | | | Int64 |


In [85]:
CSV.write(joinpath(sec_path, "ngram_examples.csv"), first(filter(row -> contains(row.token, "_"), tokens), 20))

"..\\data\\sec-filings\\ngram_examples.csv"

In [86]:
first(filter(row -> contains(row.token, "_"), tokens), 20)

| Row | token (SubStri…) | count (Int64) |
|---|---|---|


<div style='direction:rtl; font-family: "B Nazanin"; font-size: 20px;'> 
The file that Jansen uses in the following steps, `assets.h5`, was not available in his GitHub repository either.
The code in the next section has therefore been commented out.

## Get returns

In [None]:
#DATA_FOLDER = joinpath("..", "data")

In [None]:
"""
with pd.HDFStore(DATA_FOLDER / "assets.h5") as store:
    prices = store["quandl/wiki/prices"].adj_close
"""

In [None]:
"""
sec = pd.read_csv(joinpath(sec_path, "filing_index.csv")).rename(columns=str.lower)
sec.date_filed = pd.to_datetime(sec.date_filed)
"""

In [None]:
#sec.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22631 entries, 0 to 22630
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   cik           22631 non-null  int64         
 1   company_name  22631 non-null  object        
 2   form_type     22631 non-null  object        
 3   date_filed    22631 non-null  datetime64[ns]
 4   edgar_link    22631 non-null  object        
 5   quarter       22631 non-null  int64         
 6   ticker        22631 non-null  object        
 7   sic           22461 non-null  object        
 8   exchange      20619 non-null  object        
 9   hits          22555 non-null  object        
 10  year          22631 non-null  int64         
dtypes: datetime64[ns](1), int64(3), object(7)
memory usage: 1.9+ MB


In [None]:
#idx = pd.IndexSlice

In [None]:
"""
first = sec.date_filed.min() + relativedelta(months=-1)
last = sec.date_filed.max() + relativedelta(months=1)
prices = (prices
          .loc[idx[first:last, :]]
          .unstack().resample("D")
          .ffill()
          .dropna(how="all", axis=1)
          .filter(sec.ticker.unique()))
"""

In [None]:
"""
sec = sec.loc[sec.ticker.isin(prices.columns), ["ticker", "date_filed"]]

price_data = []
for ticker, date in sec.values.tolist():
    target = date + relativedelta(months=1)
    s = prices.loc[date: target, ticker]
    price_data.append(s.iloc[-1] / s.iloc[0] - 1)

df = pd.DataFrame(price_data,
                  columns=["returns"],
                  index=sec.index)
"""

In [None]:
#df.returns.describe()       

count    11101.000000
mean         0.022839
std          0.126137
min         -0.555556
25%         -0.032213
50%          0.017349
75%          0.067330
max          1.928826
Name: returns, dtype: float64
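
The commented-out pandas code above labels each filing with the return from the filing date to one month later, `s.iloc[-1] / s.iloc[0] - 1`. A minimal Julia sketch of that calculation follows. It assumes daily, forward-filled prices for a single ticker; the toy price series and the helper name `forward_return` are hypothetical.

```julia
using Dates

# Hypothetical toy data: one price per calendar day, rising by 0.5 per day.
dates = collect(Date(2015, 1, 1):Day(1):Date(2015, 3, 31))
prices = [100.0 + 0.5 * (i - 1) for i in eachindex(dates)]

# Return from the filing date to `horizon` later, with both endpoints included.
function forward_return(dates, prices, filed; horizon=Month(1))
    target = filed + horizon
    window = findall(d -> filed <= d <= target, dates)
    isempty(window) && return missing  # no prices available in the window
    return prices[last(window)] / prices[first(window)] - 1
end

r = forward_return(dates, prices, Date(2015, 1, 15))
```

Returning `missing` when no prices fall in the window corresponds to the filings that end up without a return label and are dropped before saving.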

In [None]:
#sec["returns"] = price_data
#sec.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11375 entries, 0 to 22629
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   ticker      11375 non-null  object        
 1   date_filed  11375 non-null  datetime64[ns]
 2   returns     11101 non-null  float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 355.5+ KB


In [None]:
#sec.dropna().to_csv(sec_path / "sec_returns.csv", index=False)