# Data Cleaning ALL 1 Minute Files From FirstRate Data
Previously, we had a cleaning process where Hayes started with the entire set of FRD csv files, converted them into parquet files, uploaded them onto Google Drive, and then filtered them down based on a certain set of criteria. I then continued filtering that down even further on Google Drive (through Google Colab), and I ended up with only the parquet day files that had 1) data starting from 8:00am at the latest, and 2) a PM volume of at least 100,000 shares traded. This limited the number of stocks significantly, and we were planning on testing that with the BHOD python backtester I developed.

However, we switched gears and started focusing on 1) forward testing through a TWS API bot, and 2) developing ML-based strategies. As of April 2025, Hayes has continued working on the bot, and while I spent the last month working on my own bot as well, I'm dedicating these notebooks to testing ML strategies that do not take in our pre-conceived notions of a "Stock in Play" or what might be considered a good entry or exit. Therefore, I'm going to re-download and re-clean all of the files from FirstRate Data, just so we have as much data as possible, and then I'll likely limit the amount of data that we use to a certain subset. More specifically, the plan is to only use data after the market returned to normal after the COVID pandemic (i.e., starting ~October 2020) and build our models based off of future time periods.

## Imports

In [171]:
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import date, datetime
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.plotting import heatmap
import dask
import dask.dataframe as dd
from dask import delayed
from pyarrow.parquet import ParquetFile
import pyarrow as pa
from tqdm import tqdm

import tulipy as ti

import sklearn
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.decomposition import KernelPCA
from sklearn.decomposition import IncrementalPCA

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras import initializers
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import register_keras_serializable
from tensorflow.keras.optimizers import SGD
import keras_tuner as kt
from keras_tuner import HyperParameters

import os
import sys
import warnings
import shutil

## Downloading process for csv files
One problem that I'm running into is deciding which tickers to actually include in my dataset. More specifically, in the current shared Google Drive, we have the raw 1-min csv files from FRD, and we also have a filtered folder after filtering for an average volume of at least 500,000, a 14-day ATR of at least \$0.50, an Open price of at least \$10, and a market cap of at least 50M (as of 1/20/25).

Here's the thing: I'm conflicted on whether to use that dataset (since some of the day data will likely be missing and I need ALL the sequential day data), or try re-creating it on my own (which will be quite cumbersome). Furthermore, I'm not sure which criteria I want to include in the first place. For example, while I believe I'm only going to trade stocks that are at least \$10, that's a criterion I set for myself when I was day trading. Theoretically, with an ML model, there's no reason why I can't trade stocks below \$10 if they can potentially be profitable. On the other hand, I really do think I should keep the 50M market cap condition, since I don't want to be trading penny stocks.

After some consideration, I believe I'll recreate the data on my own from the raw 1-min FRD csv files. However, I maintain that I don't want to be trading penny stocks. Therefore, I'm going to do a few checks:
1. I'm going to take each ticker in the filtered-parquet files and see if they're included in my 475 S&P500 tickers from my DATASCI 207 ML class. If most of them are there, I can rest assured that I have at least most of the tickers I want.
   - The reason I'm checking this is because even though I only want the 50M market cap condition, the filtered-parquet files are also filtered based on the other 3 criteria I mentioned above. Therefore, even though all the tickers in the filtered-parquet folder meet the 50M market cap condition, there might be some that are missing based on the other criteria.
2. If I'm comfortable with the tickers that are in the folder, I will take that ticker list and only create the separated csv (and then parquet) files that are on that list.
   - So the difference now is that I'm not checking for every day's volume, ATR, or price, and I will include **all** the available data from the stocks on my list. Yes, some of those tickers may have been filtered out when Hayes was creating the filtered-parquet folders initially, but the hope is that there's at least one day that exists for the >50M market cap stocks that meet the other criteria as well, which then means I can include that ticker's entire history.

## Checking which tickers from the 475 S&P500 tickers from DATASCI 207 exist in the filtered-parquet folder
I've downloaded the list of the 3148 tickers that passed Hayes's earlier filters (from the filtered-parquet folder on the Google Drive). We can now compare that to the tickers in the S&P500 list that we downloaded and see which ones match (the more the better).

In [3]:
# Getting the paths for the filtered tickers and the S&P500 data
filtered_parquet_tickers_path = '/Users/danielwang/Desktop/Work stuff/Coding Stuff/Day Trading Stuff/Trading-Strategies-With-ML/Archive/all_tickers_in_filtered_parquet.csv'
sp500_path = '/Users/danielwang/Desktop/Berkeley MIDS Stuff/Berkeley MIDS Spring 2025 Stuff/Berkeley MIDS DATASCI 207/datasci-207-final-project/SP500_updated_data_2025_03_26.csv'

In [20]:
# Getting tickers from the filtered_parquet_tickers_path
filtered_parquet_tickers = pd.read_csv(filtered_parquet_tickers_path, index_col='Unnamed: 0')
filtered_parquet_tickers = list(filtered_parquet_tickers.iloc[:, 0].unique())

# Getting just the tickers from the sp500 df
sp500_df = pd.read_csv(sp500_path)
sp500_tickers = list(sp500_df['Ticker'].unique())

In [43]:
# Initializing list to add missing tickers to
missing_tickers_in_sp500 = []

# Checking which S&P500 tickers are missing from the filtered-parquet folder
print("Tickers in the 475 S&P500 list that are missing from the filtered-parquet folder:")
for sp500_ticker in sp500_tickers:
    if sp500_ticker not in filtered_parquet_tickers:
        print(" -", sp500_ticker)
        missing_tickers_in_sp500.append(sp500_ticker)

Tickers in the 475 S&P500 list that are missing from the filtered-parquet folder:
 - AMCR
 - BDX
 - CB
 - CHD
 - CME
 - CSX
 - CVS
 - DPZ
 - ETN
 - FIS
 - GPN
 - HD
 - HSIC
 - IP
 - IPG
 - JNJ
 - K
 - MRK
 - NVR
 - POOL
 - PPG
 - SBUX
 - SO
 - TAP
 - WY


While we have most of the stocks from the S&P500, there are definitely some notable ones missing, such as CVS, JNJ, K, MRK, or SBUX. Therefore, we'll add all the S&P500 stocks that are missing from the filtered-parquet folder to our final list.
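As an aside, the same comparison can be written with set operations, which avoids scanning the full list once per ticker. This is only a minimal sketch: `tickers_missing_from` is a hypothetical helper, and the two short lists are toy stand-ins for `sp500_tickers` and `filtered_parquet_tickers`.

```python
def tickers_missing_from(reference, candidates):
    """Return the sorted tickers in `reference` that are absent from `candidates`."""
    candidate_set = {str(t) for t in candidates}
    return sorted({str(t) for t in reference} - candidate_set)


# Toy lists standing in for sp500_tickers and filtered_parquet_tickers
sp500_sample = ["AAPL", "JNJ", "MRK", "MSFT"]
filtered_sample = ["AAPL", "MSFT", "ZWS"]

missing = tickers_missing_from(sp500_sample, filtered_sample)
print(missing)
```

Sorting the result keeps the printed list stable between runs, since set ordering is not guaranteed.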

## Obtaining final list of tickers that we want in our dataset
For the most part, these are stocks that have a market cap of at least 50M, and we've included all stocks in the S&P500 as well (as of April 6, 2025). There is also some slight cleaning up that we need to do, and we'll do that here as well.

In [125]:
# Adding the missing tickers in the S&P500 to the full list of tickers
full_ticker_list = filtered_parquet_tickers + missing_tickers_in_sp500

# Converting list back into df and sorting
full_ticker_df = pd.DataFrame(full_ticker_list)
full_ticker_df = full_ticker_df.sort_values(by=0).reset_index(drop=True)
full_ticker_df = full_ticker_df.iloc[:-2]  # Dropping 'ls' and 'NaN'
full_ticker_df = full_ticker_df.rename(columns={0: 'Ticker'})
full_ticker_df

Unnamed: 0,Ticker
0,A
1,AA
2,AADI
3,AAL
4,AAOI
...,...
3166,ZURA
3167,ZVRA
3168,ZWS
3169,ZYME


In the end, we see that we get 3171 tickers in our final list. While this will likely decrease some more (after we remove tickers that don't have either sequential data or enough data spanning from October 1, 2020 to the last date of the dataset), this is a good start to limiting the number of tickers we need to analyze.
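Dropping the last two rows with `iloc[:-2]` relies on 'ls' and NaN sorting to the end of the list. A slightly more defensive cleanup might look like the sketch below. The helper name `clean_ticker_list` and the ticker pattern (1 to 5 uppercase letters, optionally followed by a dot and a share-class letter) are my own assumptions rather than anything FRD specifies, so the pattern would need adjusting if it drops valid tickers.

```python
import pandas as pd

# Assumed ticker shape: 1-5 uppercase letters, optionally ".X" for a share class
TICKER_REGEX = r"[A-Z]{1,5}(?:\.[A-Z])?"


def clean_ticker_list(raw_tickers):
    """Drop NaN, stray entries such as 'ls', and duplicates; return a sorted DataFrame."""
    tickers = pd.Series(list(raw_tickers), dtype="object").dropna().astype(str).str.strip()
    tickers = tickers[tickers.str.fullmatch(TICKER_REGEX)]
    tickers = tickers.drop_duplicates().sort_values().reset_index(drop=True)
    return tickers.to_frame(name="Ticker")


# Toy input containing a NaN, a stray 'ls' entry, and a duplicate
raw = ["ZWS", "A", float("nan"), "ls", "AAL", "A"]
cleaned = clean_ticker_list(raw)
print(cleaned["Ticker"].tolist())
```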

# Getting Fully Cleaned Data from FRD
The previous steps have all been to figure out which stocks I wanted to keep in my final dataset, and now I'm ready to do the cleaning process. In fact, if I purchase more FRD data in the future, or even if I get data from other sources that come in this format (i.e., large csv files for every ticker), I'll be able to go through this process the exact same way.

Here is everything I will be doing to obtain the final cleaned dataset:
1. Get a final list of tickers that I want in my dataset and keep only the data from those stocks. Some criteria include the following:
   - Market cap of at least 50M
   - All stocks in the S&P500
2. Separate out each stock into individual csv files with one day of 1-minute data in each.
3. Choose a start date that I want data for, and filter the data based on that start date.
   - This date will likely be sometime around October 1, 2020, as that's when the market recovered from COVID.
4. Convert the remaining csv files into parquet files.
5. Ensure that each stock in our dataset has continuous data for every single trading day available since our chosen start date.
   - Assuming that there aren't too many stocks that don't have every single trading day (e.g., BIIB), filter out those stocks and keep only the ones with all the data.
6. Clean the data by doing the following*:
   - Start by forward-filling all missing data so that we have a continuous stream of data. All OHLC prices will be equal to the Close price of the most recent candlestick, and their volumes will be set to 0.
   - Backfill any data to 4:00am (if data doesn't exist) based on the earliest PM 1-min candlestick. All OHLC prices will be equal to the first Open price in the PM, and their volumes will be set to 0.
   - Forward-fill any data to 7:59pm (if data doesn't exist) based on the last RTH (regular trading hours) or EHT (extended-hours trading) candlestick. All OHLC prices will be equal to the Close price of the last candlestick, and their volumes will be set to 0.
7. Data is ready for use, and we can now take what we want and feed it into our models.
   - For example, we may only want data from 8:00am to 3:59pm, or we may only want stocks in certain sectors. With this cleaned dataset, we can easily accommodate both requests, and we'll only need to do some slight additional cleaning to get what we want.

**This technically introduces price data that never existed, which may cause some problems with the modeling. However, I believe the benefits outweigh this drawback for the following reasons: 1) We'll have consistent data for our model, which allows us to generate same-length sequences, making our modeling (and QA) tasks much easier, and 2) The 0-volume candlesticks might give our model more information to work from, and I'm hoping it can actually learn something from that.*
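To make the filling rules in step 6 concrete, here is a minimal sketch for a single day file. It assumes the frame has a sorted `Datetime` index covering one day and the columns Open, High, Low, Close, and Volume (plus an optional Ticker column). The function name `fill_full_session` and the 04:00 to 19:59 defaults are my own choices for illustration, not the final implementation.

```python
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]


def fill_full_session(day_df, start="04:00", end="19:59"):
    """Return one day of 1-minute bars on a complete start..end grid."""
    day = day_df.index[0].strftime("%Y-%m-%d")
    grid = pd.date_range(f"{day} {start}", f"{day} {end}", freq="min", name="Datetime")

    out = day_df.reindex(grid)
    missing = out["Close"].isna()  # minutes with no original candlestick

    # Most recent Close for gaps after the first bar; first Open for the leading gap
    fill_price = out["Close"].ffill().fillna(day_df["Open"].iloc[0])

    for col in PRICE_COLS:
        out.loc[missing, col] = fill_price[missing]
    out["Volume"] = out["Volume"].fillna(0).astype("int64")

    if "Ticker" in out.columns:
        out["Ticker"] = out["Ticker"].ffill().bfill()
    return out


# Two real-looking bars with gaps before, between, and after them
idx = pd.to_datetime(["2020-10-01 04:02", "2020-10-01 04:05"])
sample = pd.DataFrame(
    {"Open": [12.36, 12.40], "High": [12.38, 12.41], "Low": [12.30, 12.39],
     "Close": [12.30, 12.41], "Volume": [4476, 879], "Ticker": ["AAL", "AAL"]},
    index=idx,
)
filled = fill_full_session(sample)
print(len(filled))
```

Note that reindexing onto the grid silently drops any bars outside the start and end times, so this sketch would discard a stray 8:00pm candlestick rather than keep it.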

## 1. Filtering for only the tickers that are in the final list
I've downloaded all the FRD 1-min data onto the SSD, and now I'll begin the process of creating a new folder and only transferring over the tickers in my final list to that folder.

In [126]:
# Getting the path to the full FRD tickers and initializing its list of tickers
full_FRD_tickers_path = '/Volumes/T7/Original_FRD_Data_Till_Dec_2024/unzipped-FRD-csv-full'
full_FRD_tickers_list = []

# Going through all the files in the FRD folder and collecting each ticker name
for ticker_path in os.listdir(full_FRD_tickers_path):
    ticker = ticker_path.split('_')[0]
    full_FRD_tickers_list.append(ticker)

len(full_FRD_tickers_list)

7606

In [128]:
# Checking to see if all the tickers in the final list are also in the FRD data
print("Tickers from final list that don't appear in the FRD data:")
for ticker in full_ticker_df['Ticker']:
    if ticker not in full_FRD_tickers_list:
        print(" -", ticker)

Tickers from final list that don't appear in the FRD data:


Therefore, it does appear that every single ticker from our final list appears in the FRD data. Now that we've confirmed this, I can move on to creating a folder with just those tickers.

In [139]:
# Copying only the tickers in my final list to a new folder
source_dir = "/Volumes/T7/Original_FRD_Data_Till_Dec_2024/unzipped-FRD-csv-full"
target_dir = "/Volumes/T7/FRD-High-Market-Cap-Tickers"

# Making sure the target directory exists
os.makedirs(target_dir, exist_ok=True)

# Getting the list of tickers from the DataFrame
tickers_set = set(full_ticker_df['Ticker'].astype(str))

# Looping through files in the source directory
for file in tqdm(os.listdir(source_dir)):
    ticker = file.split('_')[0]  # Getting the ticker name
    
    if ticker in tickers_set:
        src_path = os.path.join(source_dir, file)
        dst_path = os.path.join(target_dir, file)
        shutil.copy(src_path, dst_path)

100%|███████████████████████████████████████| 7606/7606 [07:57<00:00, 15.94it/s]


In [140]:
# Confirming that we have the same number of tickers in our new folder as we did in our final list
len(os.listdir(target_dir))

3171

We do indeed have the same number of tickers, which means that we can move on to the next step.

## 2. Splitting the full csv files for every ticker into individual csv files for each day
Now that I have my folder with only the csv files that match the tickers in my final list, I can begin the process of creating new folders for every ticker and splitting the combined csv file into individual csv files for each day.

In [168]:
# Setting the input and output directory
base_dir = "/Volumes/T7/FRD-High-Market-Cap-Tickers"

# Getting a list of all _full_1min_UNADJUSTED.csv files
csv_files = [f for f in os.listdir(base_dir) if f.endswith("_full_1min_UNADJUSTED.csv")]

for filename in tqdm(csv_files):
    ticker = filename.split('_')[0]  # Getting just the ticker name
    
    # Setting the paths
    csv_path = os.path.join(base_dir, filename)
    output_folder = os.path.join(base_dir, f"{ticker}_1min_csv")
    os.makedirs(output_folder, exist_ok=True)
    
    # Loading the CSV and parsing the datetime
    df = pd.read_csv(csv_path)
    df['Datetime'] = pd.to_datetime(df['Datetime'])
    
    # Grouping by date
    df['Date'] = df['Datetime'].dt.date
    grouped = df.groupby('Date')
    
    # Writing one file per date (using `day` to avoid shadowing the imported `date`)
    for day, group in grouped:
        date_str = day.strftime("%Y-%m-%d")
        output_filename = f"{date_str}_{ticker}_1min.csv"
        output_path = os.path.join(output_folder, output_filename)
        group.drop(columns=['Date']).to_csv(output_path, index=False)

100%|████████████████████████████████████| 3171/3171 [12:47:02<00:00, 14.51s/it]


## 3. Filtering the stocks for only the dates that are on or after 10/1/20
In hindsight, I definitely should have done this with the previous step after I grouped by date time, but alas, it's too late now.
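For future runs, the split and the date filter could be combined into one pass. The sketch below only shows the in-memory part and leaves the file writing out. `split_by_day` is a hypothetical helper, and it assumes the raw frame has a `Datetime` column like the FRD files do.

```python
import pandas as pd


def split_by_day(df, cutoff="2020-10-01"):
    """Return {"YYYY-MM-DD": day_frame} for every day on or after `cutoff`."""
    df = df.copy()
    df["Datetime"] = pd.to_datetime(df["Datetime"])

    # Dropping old rows first means they are never grouped or written
    df = df[df["Datetime"] >= pd.Timestamp(cutoff)]

    return {
        day.strftime("%Y-%m-%d"): group
        for day, group in df.groupby(df["Datetime"].dt.date)
    }


# One bar before the cutoff and two bars on the cutoff date
sample = pd.DataFrame({
    "Datetime": ["2020-09-30 15:59:00", "2020-10-01 04:00:00", "2020-10-01 04:01:00"],
    "Close": [12.10, 12.37, 12.38],
})
days = split_by_day(sample)
print(list(days))
```

Each value in the returned dictionary could then be written straight to a per-day file, which would skip creating (and later deleting) the pre-cutoff files.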

In [175]:
# Defining base directory
base_dir = "/Volumes/T7/FRD-High-Market-Cap-Tickers"
cutoff_date = datetime.strptime("2020-10-01", "%Y-%m-%d")

# Looping through all ticker folders ending with "_1min_csv"
for folder_name in tqdm(os.listdir(base_dir)):
    folder_path = os.path.join(base_dir, folder_name)
    
    if os.path.isdir(folder_path) and folder_name.endswith("_1min_csv"):
        for file_name in os.listdir(folder_path):
            if file_name.endswith("_1min.csv"):
                try:
                    # Extracting the date part from the filename
                    date_str = file_name.split('_')[0]
                    file_date = datetime.strptime(date_str, "%Y-%m-%d")
                    
                    # Deleting the file if it's before the cutoff date
                    if file_date < cutoff_date:
                        os.remove(os.path.join(folder_path, file_name))
                except Exception as e:
                    print(f"Skipping {file_name} due to error: {e}")

100%|█████████████████████████████████████| 3171/3171 [1:06:53<00:00,  1.27s/it]


In [181]:
# Confirming that we have no dates before 2020-10-01
test_list = os.listdir(base_dir)
test_list.sort()
cutoff_date = datetime.strptime("2020-10-01", "%Y-%m-%d")

for folder in tqdm(test_list):
    folder_path = os.path.join(base_dir, folder)

    for file_name in os.listdir(folder_path):
        date_str = file_name.split('_')[0]
        file_date = datetime.strptime(date_str, "%Y-%m-%d")

        if file_date < cutoff_date:
            print("Previous date still exists:", folder, file_name, file_date)

100%|██████████████████████████████████████| 3171/3171 [00:28<00:00, 111.93it/s]


Now that we have filtered our data, we can convert all the csv files into parquet files.
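A reusable version of this check, which returns the offending files instead of only printing them, might look like the sketch below. The function name `files_before_cutoff` is hypothetical, and the demo uses a temporary directory with made-up empty files rather than the real SSD path.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def files_before_cutoff(base_dir, cutoff="2020-10-01"):
    """Return every per-day csv file whose filename date is before `cutoff`."""
    cutoff_date = datetime.strptime(cutoff, "%Y-%m-%d")
    offenders = []
    for folder in sorted(Path(base_dir).iterdir()):
        # Skipping anything that is not a per-ticker folder
        if not (folder.is_dir() and folder.name.endswith("_1min_csv")):
            continue
        for file in sorted(folder.glob("*_1min.csv")):
            file_date = datetime.strptime(file.name.split("_")[0], "%Y-%m-%d")
            if file_date < cutoff_date:
                offenders.append(file)
    return offenders


# Small self-contained demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    ticker_dir = Path(tmp) / "AAL_1min_csv"
    ticker_dir.mkdir()
    for name in ["2020-09-30_AAL_1min.csv", "2020-10-01_AAL_1min.csv"]:
        (ticker_dir / name).touch()
    leftover_names = [p.name for p in files_before_cutoff(tmp)]

print(leftover_names)
```

An empty list from the real folder would confirm that no pre-cutoff files remain.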

## 4. Converting all filtered FRD csv files to parquet files
With the csv files now filtered, I can convert all those files into parquet format. Again, in hindsight, I probably should've just converted the files directly to parquet, but this will work too.

In [200]:
AAL_path = "/Volumes/T7/FRD-High-Market-Cap-Tickers/AAL_1min_csv/2020-10-01_AAL_1min.csv"
AAL_df = pd.read_csv(AAL_path)
AAL_df['Datetime'] = pd.to_datetime(AAL_df['Datetime'])
AAL_df = AAL_df.set_index('Datetime')

AAL_df

Datetime,Open,High,Low,Close,Volume,Ticker
2020-10-01 04:00:00,12.44,12.44,12.37,12.37,4940,AAL
2020-10-01 04:01:00,12.38,12.38,12.38,12.38,780,AAL
2020-10-01 04:02:00,12.36,12.36,12.30,12.30,4476,AAL
2020-10-01 04:07:00,12.38,12.40,12.38,12.40,879,AAL
2020-10-01 04:08:00,12.44,12.44,12.44,12.44,2800,AAL
...,...,...,...,...,...,...
2020-10-01 19:55:00,12.48,12.49,12.48,12.49,3220,AAL
2020-10-01 19:56:00,12.49,12.50,12.49,12.50,10750,AAL
2020-10-01 19:57:00,12.50,12.50,12.49,12.49,13350,AAL
2020-10-01 19:58:00,12.49,12.50,12.47,12.47,3699,AAL


In [201]:
# Source and target root directories
csv_root = "/Volumes/T7/FRD-High-Market-Cap-Tickers"
parquet_root = "/Volumes/T7/Filtered-Cleaned-Parquet-ML"

# Ensuring the destination root folder exists
os.makedirs(parquet_root, exist_ok=True)

# Looping through all folders ending with _1min_csv
for folder_name in tqdm(os.listdir(csv_root)):
    if folder_name.endswith("_1min_csv"):
        ticker = folder_name.split('_')[0]
        source_folder = os.path.join(csv_root, folder_name)
        dest_folder = os.path.join(parquet_root, f"{ticker}_1min_parquet")
        
        # Creating destination folder
        os.makedirs(dest_folder, exist_ok=True)
        
        # Looping through CSV files
        for file_name in os.listdir(source_folder):
            if file_name.endswith(".csv"):
                csv_path = os.path.join(source_folder, file_name)
                
                # Reading CSV
                try:
                    df = pd.read_csv(csv_path)
                    
                    # Converting 'Datetime' column to datetime and setting as index
                    df['Datetime'] = pd.to_datetime(df['Datetime'])
                    df = df.set_index('Datetime')
                    
                    # Saving to Parquet
                    parquet_file_name = file_name.replace(".csv", ".parquet")
                    parquet_path = os.path.join(dest_folder, parquet_file_name)
                    df.to_parquet(parquet_path, index=True)
                except Exception as e:
                    print(f"Error processing {csv_path}: {e}")

100%|█████████████████████████████████████| 3171/3171 [4:29:30<00:00,  5.10s/it]


## 5. Filtering only for stocks that have continuous data for all trading days since our chosen start date
With the list that we have, there may be some stocks that have disappeared for one reason or another, and there may be others that popped up only recently. In either case, we'll remove those stocks since they don't have data for our entire time period.

In [212]:
# Analyzing BIIB on 6/9/23 because we know from previous experiments that it's missing data:
BIIB_parquet_root = "/Volumes/T7/Filtered-Cleaned-Parquet-ML/BIIB_1min_parquet/2023-06-09_BIIB_1min.parquet"
pd.read_parquet(BIIB_parquet_root)

Datetime,Open,High,Low,Close,Volume,Ticker
2023-06-09 19:05:00,325.0,330.0,325.0,330.0,716,BIIB
2023-06-09 19:14:00,334.0,334.0,334.0,334.0,100,BIIB
2023-06-09 19:22:00,334.58,334.58,334.58,334.58,130,BIIB
2023-06-09 19:25:00,330.0,330.0,330.0,330.0,100,BIIB
2023-06-09 19:31:00,332.8,332.8,332.8,332.8,100,BIIB
2023-06-09 19:42:00,330.0,330.0,330.0,330.0,100,BIIB
2023-06-09 19:46:00,329.0,329.0,329.0,329.0,500,BIIB
2023-06-09 19:49:00,328.94,329.0,328.94,329.0,296,BIIB
2023-06-09 19:50:00,329.0,329.0,329.0,329.0,600,BIIB
2023-06-09 19:51:00,328.99,329.0,328.99,329.0,500,BIIB


In actuality, we see that no trading occurred for BIIB during RTH, although there was some trading during extended hours. While I personally still want to remove this stock, as I do not believe it will add much value, there's something to be said for including BIIB in our data and having our model learn from this particular day (0 volume during RTH). Also, there may be other stocks that have this problem, and unless we want to filter all of those out as well (which might dramatically reduce our data size), we might as well keep this day in our data and fill in the missing candlesticks while keeping their volume at 0.
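To find out how many other day files look like this one, a quick check could count the candlesticks that fall inside regular trading hours. This is a sketch under my own assumptions: `rth_bar_count` is a hypothetical helper, RTH is taken to be 9:30am through the 3:59pm bar, and the sample frame only copies two of the BIIB rows shown above.

```python
import pandas as pd


def rth_bar_count(day_df, rth_start="09:30", rth_end="15:59"):
    """Count candlesticks whose timestamp falls inside regular trading hours."""
    # between_time includes both endpoints by default
    return len(day_df.between_time(rth_start, rth_end))


# Two of the extended-hours BIIB bars from 2023-06-09
biib_like = pd.DataFrame(
    {"Close": [330.0, 334.0], "Volume": [716, 100]},
    index=pd.to_datetime(["2023-06-09 19:05:00", "2023-06-09 19:14:00"]),
)
print(rth_bar_count(biib_like))
```

A count of 0 flags a day with no RTH trading at all, which is the BIIB situation.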

In [214]:
# Checking how many days of data there are in total
parquet_root = "/Volumes/T7/Filtered-Cleaned-Parquet-ML"
days_of_data = {}

for folder_name in tqdm(os.listdir(parquet_root)):
    folder_path = os.path.join(parquet_root, folder_name)
    ticker_name = folder_name.split('_')[0]
    days_of_data[ticker_name] = len(os.listdir(folder_path))

100%|██████████████████████████████████████| 3171/3171 [00:28<00:00, 110.59it/s]


In [220]:
# Finding the max number of days in the data
num_days_total = max(days_of_data.values())

num_days_total

1055

In [249]:
# Getting the tickers that have fewer than 1055 days
missing_days_data = pd.Series(days_of_data)
missing_days_data = missing_days_data[missing_days_data < num_days_total]
missing_days_data = missing_days_data.sort_values()

for ticker, num in zip(missing_days_data.index, missing_days_data):
    print(ticker, num)

SARO 25
SNRE 26
ZJK 27
AMTM 27
CURB 29
ECG 31
KLC 44
SOBO 45
CON 97
LTM 97
LINE 97
OS 98
ARDT 102
TLN 108
SDST 109
LB 115
WBTN 116
TEM 124
SILA 125
GRAL 126
WAY 129
LIF 130
OKLO 148
ZK 148
SVCO 149
NNE 150
VIK 155
RBRK 159
LOAR 159
MRX 159
SERV 164
IBTA 164
CTRI 164
ULS 168
PACS 169
GEV 179
SOLV 180
DJTWW 180
DXYZ 180
RDDT 183
ALAB 184
SOC 207
UMAC 208
TBBB 211
KYTX 212
AHR 213
ANRO 216
AS 217
FLUT 220
AVBP 221
BTSG 221
CGON 222
KSPI 226
LLYVK 238
AISP 243
DEC 246
WS 259
HG 272
CRGX 272
NLOP 276
LXEO 277
ATGL 278
AIRE 286
NATL 294
BIRK 294
KLG 304
VLTO 304
VSTS 304
KVYO 309
CART 310
NMRA 312
ARM 313
TKO 315
VFS 334
DBD 335
FTEL 339
SN 345
ODD 353
APGE 356
SGMT 356
SVV 366
KGS 366
FIHL 366
PHIN 367
FTRE 372
CAVA 375
INDV 378
NPWR 379
KNF 385
ATMU 388
SLRN 403
KVUE 404
MSGE 413
BTDR 418
PKST 418
CHSN 418
CXT 425
ZURA 435
AESI 443
BMR 450
LUNR 459
HSAI 462
NXT 462
GPCR 466
GNLX 472
VTS 479
SKWD 480
ATLX 483
GEHC 499
MBC 503
FG 509
DRS 511
NAMS 514
ACRV 520
NWTN 521
ATAT 522
RXO 533
MBLY 5

After scanning through every single ticker, there do not seem to be any notable ones that we're missing. Some tickers, like ARM, SERV, or OKLO, may be popular now, but they are too new to be included in the full list. Others, like RDDT, CAVA, CART, BIRD, HOOD, DNUT, WOOF, or AMLX, might have been around for a while, but only went public recently, so we don't have all the data for those tickers either. There are, however, a few notable stocks in this list that have been around and publicly traded since before the cutoff date (e.g., ATAT, CLSK), but I'll assume that something happened with those stocks between October 1, 2020 and the end date, and I'll leave them out of the final dataset.
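Counting files works as long as at least one ticker has every trading day. A date-based check is a bit stricter, because it also reports which days are missing for each ticker. In the sketch below, `tickers_with_gaps` is a hypothetical helper, and the reference calendar is simply the union of all days seen across tickers, which is my assumption. An actual exchange calendar would be more rigorous.

```python
from datetime import date


def tickers_with_gaps(days_by_ticker):
    """Map each incomplete ticker to the sorted list of days it is missing."""
    # Reference calendar: every day that appears for at least one ticker
    all_days = set().union(*days_by_ticker.values())
    return {
        ticker: sorted(all_days - set(days))
        for ticker, days in days_by_ticker.items()
        if set(days) != all_days
    }


# Toy example: BIIB is missing one of the three days
sample = {
    "AAL": [date(2020, 10, 1), date(2020, 10, 2), date(2020, 10, 5)],
    "BIIB": [date(2020, 10, 1), date(2020, 10, 5)],
}
gaps = tickers_with_gaps(sample)
print(gaps)
```

The day lists could be built from the parquet filenames, since each one starts with its date.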

In [253]:
# Removing all tickers with fewer than the max number of files (i.e., tickers that don't have continuous day data)
parquet_root = "/Volumes/T7/Filtered-Cleaned-Parquet-ML"
tickers_removed = []

# Looping through all folders in the parquet root
for folder_name in tqdm(os.listdir(parquet_root)):
    folder_path = os.path.join(parquet_root, folder_name)
    
    if os.path.isdir(folder_path) and folder_name.endswith("_1min_parquet"):
        num_files = len([f for f in os.listdir(folder_path) if f.endswith(".parquet")])
        
        if num_files < num_days_total:
            tickers_removed.append(folder_name.split('_')[0])
            print(f"Deleting {folder_name} — only {num_files} files found.")
            shutil.rmtree(folder_path)

Deleting ATGL_1min_parquet — only 278 files found.
Deleting ATLX_1min_parquet — only 483 files found.
Deleting AS_1min_parquet — only 217 files found.
Deleting ATAT_1min_parquet — only 522 files found.
Deleting ATAI_1min_parquet — only 876 files found.
Deleting ASTS_1min_parquet — only 1050 files found.
Deleting ASTL_1min_parquet — only 790 files found.
Deleting ASO_1min_parquet — only 1054 files found.
Deleting ARRY_1min_parquet — only 1045 files found.
Deleting ARQQ_1min_parquet — only 821 files found.
Deleting ARL_1min_parquet — only 1039 files found.
Deleting ARM_1min_parquet — only 313 files found.
Deleting ARIS_1min_parquet — only 788 files found.
Deleting ARDT_1min_parquet — only 102 files found.
Deleting ARHS_1min_parquet — only 779 files found.
Deleting ARBE_1min_parquet — only 798 files found.
Deleting AMTM_1min_parquet — only 27 files found.
Deleting AMBI_1min_parquet — only 975 files found.
Deleting ALVO_1min_parquet — only 625 files found.
Deleting ACT_1min_parquet — only 749 files found.
Deleting ACDC_1min_parquet — only 648 files found.
Deleting CMPO_1min_parquet — only 995 files found.
Deleting CIFR_1min_parquet — only 997 files found.
Deleting BYRN_1min_parquet — only 907 files found.
Deleting BZ_1min_parquet — only 881 files found.
Deleting EBC_1min_parquet — only 1045 files found.
Deleting EAF_1min_parquet — only 990 files found.
Deleting APGE_1min_parquet — only 356 files found.
Deleting APLD_1min_parquet — only 1015 files found.
Deleting APP_1min_parquet — only 921 files found.
Deleting ANRO_1min_parquet — only 216 files found.
Deleting APG_1min_parquet — only 990 files found.
Deleting AMPX_1min_parquet — only 563 files found.
Deleting AESI_1min_parquet — only 443 files found.
Deleting ACRV_1min_parquet — only 520 files found.
Deleting ADCT_1min_parquet — only 990 files found.
Deleting ALHC_1min_parquet — only 934 files found.
Deleting ALGM_1min_parquet — only 1033 files found.
Deleting AISP_1min_parquet — only 243 files found.
Deleting AI_1min_parquet — only 1007 files found.
Deleting AIRT_1min_parquet — only 1021 files found.
Deleting AIRE_1min_parquet — only 286 files found.
Deleting AMLX_1min_parquet — only 734 files found.
Deleting AMPL_1min_parquet — only 806 files found.
Deleting AMPS_1min_parquet — only 970 files found.
Deleting AMBP_1min_parquet — only 1051 files found.
Deleting ALTS_1min_parquet — only 1054 files found.
Deleting ALT_1min_parquet — only 990 files found.
Deleting ALKT_1min_parquet — only 922 files found.
Deleting ALGS_1min_parquet — only 1044 files found.
Deleting ALAB_1min_parquet — only 184 files found.
Deleting AHR_1min_parquet — only 213 files found.
Deleting ACLX_1min_parquet — only 716 files found.
Deleting ABNB_1min_parquet — only 1006 files found.
Deleting BWIN_1min_parquet — only 1031 files found.
Deleting BTSG_1min_parquet — only 221 files found.
Deleting BTDR_1min_parquet — only 418 files found.
Deleting DUOL_1min_parquet — only 849 files found.
Deleting DTM_1min_parquet — only 876 files found.
Deleting AVIR_1min_parquet — only 1033 files found.
Deleting AVBP_1min_parquet — only 221 files found.
Deleting AVAH_1min_parquet — only 911 files found.
Deleting DRTS_1min_parquet — only 695 files found.
Deleting BRDG_1min_parquet — only 857 files found.
Deleting AFRM_1min_parquet — only 984 files found.
Deleting AGFY_1min_parquet — only 974 files found.
Deleting AGL_1min_parquet — only 921 files found.
Deleting ACNT_1min_parquet — only 1052 files found.
Deleting ACHR_1min_parquet — only 1000 files found.
Deleting ACVA_1min_parquet — only 936 files found.
Deleting CMTG_1min_parquet — only 780 files found.
Deleting ABCL_1min_parquet — only 1005 files found.
Deleting CGEM_1min_parquet — only 987 files found.
Deleting CLOV_1min_parquet — only 987 files found.
Deleting CLSK_1min_parquet — only 1054 files found.
Deleting CLBT_1min_parquet — only 825 files found.
Deleting CIVI_1min_parquet — only 781 files found.
Deleting CHSN_1min_parquet — only 418 files found.
Deleting CGON_1min_parquet — only 222 files found.
Deleting CHRD_1min_parquet — only 1019 files found.

Deleting CGNT_1min_parquet — only 971 files found.
Deleting CFLT_1min_parquet — only 872 files found.


 12%|████▉                                   | 394/3171 [00:34<03:27, 13.36it/s]

Deleting CDRE_1min_parquet — only 779 files found.
Deleting CERT_1min_parquet — only 1005 files found.
Deleting CCCS_1min_parquet — only 846 files found.


 13%|█████                                   | 399/3171 [00:35<06:14,  7.39it/s]

Deleting CELC_1min_parquet — only 1054 files found.


 13%|█████                                   | 406/3171 [00:36<05:34,  8.27it/s]

Deleting CEG_1min_parquet — only 698 files found.


 13%|█████▏                                  | 409/3171 [00:36<05:40,  8.12it/s]

Deleting CCSI_1min_parquet — only 804 files found.


 13%|█████▎                                  | 421/3171 [00:37<03:55, 11.66it/s]

Deleting CCCC_1min_parquet — only 1054 files found.


 14%|█████▍                                  | 433/3171 [00:38<03:22, 13.54it/s]

Deleting CART_1min_parquet — only 310 files found.
Deleting CAVA_1min_parquet — only 375 files found.


 14%|█████▍                                  | 435/3171 [00:38<03:32, 12.86it/s]

Deleting BXSL_1min_parquet — only 784 files found.


 14%|█████▌                                  | 441/3171 [00:38<03:31, 12.90it/s]

Deleting BVS_1min_parquet — only 964 files found.


 14%|█████▌                                  | 443/3171 [00:39<04:36,  9.87it/s]

Deleting AXR_1min_parquet — only 1049 files found.


 15%|█████▊                                  | 461/3171 [00:39<02:31, 17.87it/s]

Deleting ECG_1min_parquet — only 31 files found.
Deleting DV_1min_parquet — only 917 files found.
Deleting DTC_1min_parquet — only 784 files found.


 15%|█████▊                                  | 465/3171 [00:40<04:04, 11.06it/s]

Deleting DRVN_1min_parquet — only 982 files found.
Deleting DOLE_1min_parquet — only 847 files found.
Deleting DNA_1min_parquet — only 919 files found.


 15%|█████▉                                  | 468/3171 [00:42<07:10,  6.28it/s]

Deleting FHTX_1min_parquet — only 1033 files found.
Deleting COIN_1min_parquet — only 922 files found.


 15%|█████▉                                  | 470/3171 [00:43<09:16,  4.85it/s]

Deleting COCO_1min_parquet — only 789 files found.


 15%|█████▉                                  | 472/3171 [00:43<09:11,  4.89it/s]

Deleting IONQ_1min_parquet — only 991 files found.


 15%|██████                                  | 476/3171 [00:44<08:15,  5.44it/s]

Deleting DEC_1min_parquet — only 246 files found.
Deleting BLND_1min_parquet — only 857 files found.


 15%|██████                                  | 480/3171 [00:44<07:05,  6.33it/s]

Deleting FBRT_1min_parquet — only 791 files found.


 15%|██████▏                                 | 487/3171 [00:45<05:05,  8.79it/s]

Deleting FATBB_1min_parquet — only 659 files found.


 16%|██████▏                                 | 494/3171 [00:45<03:54, 11.43it/s]

Deleting FA_1min_parquet — only 873 files found.


 16%|██████▎                                 | 496/3171 [00:46<04:37,  9.64it/s]

Deleting IMNM_1min_parquet — only 1053 files found.
Deleting BLCO_1min_parquet — only 653 files found.


 16%|██████▍                                 | 510/3171 [00:46<03:00, 14.72it/s]

Deleting CADL_1min_parquet — only 850 files found.


 16%|██████▌                                 | 518/3171 [00:47<02:50, 15.58it/s]

Deleting CAMP_1min_parquet — only 971 files found.


 17%|██████▋                                 | 535/3171 [00:47<01:53, 23.31it/s]

Deleting EDR_1min_parquet — only 911 files found.


 17%|██████▉                                 | 549/3171 [00:48<01:46, 24.58it/s]

Deleting DXYZ_1min_parquet — only 180 files found.
Deleting ECAT_1min_parquet — only 806 files found.


 17%|██████▉                                 | 554/3171 [00:49<02:09, 20.20it/s]

Deleting AUR_1min_parquet — only 904 files found.


 18%|███████▏                                | 569/3171 [00:49<01:40, 25.83it/s]

Deleting BRCC_1min_parquet — only 750 files found.


 18%|███████▎                                | 581/3171 [00:50<01:52, 22.98it/s]

Deleting FIHL_1min_parquet — only 366 files found.
Deleting DOCS_1min_parquet — only 872 files found.


 19%|███████▍                                | 587/3171 [00:50<02:16, 18.95it/s]

Deleting ATMU_1min_parquet — only 388 files found.
Deleting FIGS_1min_parquet — only 891 files found.


 19%|███████▍                                | 590/3171 [00:51<03:20, 12.85it/s]

Deleting DJTWW_1min_parquet — only 180 files found.
Deleting DJT_1min_parquet — only 804 files found.
Deleting BNR_1min_parquet — only 1045 files found.


 19%|███████▍                                | 593/3171 [00:52<05:12,  8.25it/s]

Deleting FERG_1min_parquet — only 948 files found.


 19%|███████▋                                | 607/3171 [00:52<02:43, 15.65it/s]

Deleting AVDX_1min_parquet — only 795 files found.


 19%|███████▋                                | 613/3171 [00:53<02:55, 14.57it/s]

Deleting DSP_1min_parquet — only 965 files found.


 19%|███████▊                                | 617/3171 [00:53<03:23, 12.55it/s]

Deleting DRUG_1min_parquet — only 776 files found.


 20%|███████▊                                | 620/3171 [00:54<03:47, 11.22it/s]

Deleting BRZE_1min_parquet — only 770 files found.
Deleting DSGN_1min_parquet — only 934 files found.


 20%|███████▊                                | 623/3171 [00:54<05:26,  7.81it/s]

Deleting AUID_1min_parquet — only 855 files found.


 20%|███████▉                                | 625/3171 [00:55<05:58,  7.10it/s]

Deleting DRS_1min_parquet — only 511 files found.


 20%|███████▉                                | 627/3171 [00:55<05:47,  7.32it/s]

Deleting BROS_1min_parquet — only 815 files found.


 20%|███████▉                                | 630/3171 [00:55<05:42,  7.43it/s]

Deleting CON_1min_parquet — only 97 files found.
Deleting BMR_1min_parquet — only 450 files found.


 20%|████████▏                               | 648/3171 [00:56<01:59, 21.18it/s]

Deleting DBD_1min_parquet — only 335 files found.


 21%|████████▏                               | 653/3171 [00:56<02:02, 20.51it/s]

Deleting BIRK_1min_parquet — only 294 files found.
Deleting BIGZ_1min_parquet — only 934 files found.


 21%|████████▎                               | 657/3171 [00:57<03:09, 13.27it/s]

Deleting DNUT_1min_parquet — only 867 files found.


 21%|████████▎                               | 660/3171 [00:57<03:44, 11.19it/s]

Deleting DOCN_1min_parquet — only 936 files found.


 21%|████████▍                               | 667/3171 [00:58<03:33, 11.74it/s]

Deleting FG_1min_parquet — only 509 files found.


 21%|████████▍                               | 672/3171 [00:58<03:15, 12.78it/s]

Deleting DLO_1min_parquet — only 887 files found.


 22%|████████▋                               | 693/3171 [00:59<01:49, 22.55it/s]

Deleting CORZ_1min_parquet — only 649 files found.


 22%|████████▊                               | 697/3171 [00:59<02:14, 18.43it/s]

Deleting DH_1min_parquet — only 815 files found.


 22%|████████▊                               | 701/3171 [01:00<02:38, 15.59it/s]

Deleting CPNG_1min_parquet — only 945 files found.


 22%|████████▉                               | 704/3171 [01:00<03:16, 12.57it/s]

Deleting COUR_1min_parquet — only 931 files found.


 22%|████████▉                               | 710/3171 [01:00<03:18, 12.39it/s]

Deleting COOK_1min_parquet — only 848 files found.


 23%|█████████                               | 715/3171 [01:01<03:23, 12.08it/s]

Deleting COMP_1min_parquet — only 930 files found.


 23%|█████████                               | 722/3171 [01:01<03:13, 12.68it/s]

Deleting INNV_1min_parquet — only 950 files found.


 23%|█████████▎                              | 735/3171 [01:02<02:12, 18.43it/s]

Deleting CNTA_1min_parquet — only 890 files found.
Deleting CNXC_1min_parquet — only 1014 files found.


 23%|█████████▎                              | 740/3171 [01:03<03:36, 11.23it/s]

Deleting CNM_1min_parquet — only 852 files found.


 23%|█████████▍                              | 744/3171 [01:03<03:45, 10.77it/s]

Deleting IPWR_1min_parquet — only 1054 files found.


 24%|█████████▍                              | 747/3171 [01:04<04:26,  9.08it/s]

Deleting CNH_1min_parquet — only 1031 files found.


 24%|█████████▍                              | 749/3171 [01:04<05:16,  7.66it/s]

Deleting IOT_1min_parquet — only 751 files found.


 24%|█████████▍                              | 753/3171 [01:05<04:47,  8.41it/s]

Deleting INVZ_1min_parquet — only 928 files found.


 24%|█████████▌                              | 756/3171 [01:05<05:02,  7.98it/s]

Deleting BMTX_1min_parquet — only 1054 files found.


 24%|█████████▌                              | 758/3171 [01:06<06:01,  6.67it/s]

Deleting BMEA_1min_parquet — only 920 files found.


 24%|█████████▌                              | 759/3171 [01:06<07:21,  5.47it/s]

Deleting BLDE_1min_parquet — only 1045 files found.


 24%|█████████▌                              | 762/3171 [01:07<07:08,  5.63it/s]

Deleting DCGO_1min_parquet — only 946 files found.


 24%|█████████▋                              | 765/3171 [01:07<06:49,  5.88it/s]

Deleting IMCR_1min_parquet — only 968 files found.


 24%|█████████▋                              | 769/3171 [01:08<06:02,  6.63it/s]

Deleting BKSY_1min_parquet — only 1054 files found.


 25%|█████████▊                              | 778/3171 [01:08<04:10,  9.54it/s]

Deleting IH_1min_parquet — only 1043 files found.


 25%|█████████▊                              | 780/3171 [01:09<05:17,  7.54it/s]

Deleting CNO_1min_parquet — only 1031 files found.


 25%|█████████▉                              | 783/3171 [01:09<05:40,  7.01it/s]

Deleting INTA_1min_parquet — only 868 files found.


 25%|██████████                              | 801/3171 [01:10<02:24, 16.38it/s]

Deleting INDV_1min_parquet — only 378 files found.


 26%|██████████▏                             | 810/3171 [01:10<01:59, 19.72it/s]

Deleting DFH_1min_parquet — only 979 files found.
Deleting BLTE_1min_parquet — only 658 files found.


 26%|██████████▎                             | 814/3171 [01:11<03:14, 12.09it/s]

Deleting FDMT_1min_parquet — only 1005 files found.


 26%|██████████▎                             | 817/3171 [01:12<03:50, 10.20it/s]

Deleting INFA_1min_parquet — only 785 files found.


 26%|██████████▎                             | 822/3171 [01:12<03:47, 10.32it/s]

Deleting BLZE_1min_parquet — only 774 files found.


 26%|██████████▍                             | 826/3171 [01:13<03:50, 10.15it/s]

Deleting INBX_1min_parquet — only 1053 files found.


 26%|██████████▌                             | 833/3171 [01:13<03:42, 10.51it/s]

Deleting INDI_1min_parquet — only 881 files found.


 26%|██████████▌                             | 835/3171 [01:14<04:39,  8.36it/s]

Deleting BMBL_1min_parquet — only 964 files found.


 27%|██████████▋                             | 848/3171 [01:14<02:44, 14.13it/s]

Deleting DAVE_1min_parquet — only 736 files found.
Deleting DAWN_1min_parquet — only 891 files found.


 27%|██████████▋                             | 852/3171 [01:15<03:59,  9.68it/s]

Deleting DASH_1min_parquet — only 1007 files found.


 27%|██████████▊                             | 857/3171 [01:16<04:16,  9.01it/s]

Deleting BKKT_1min_parquet — only 792 files found.
Deleting BIRD_1min_parquet — only 780 files found.


 27%|██████████▊                             | 860/3171 [01:17<05:41,  6.77it/s]

Deleting CYBN_1min_parquet — only 843 files found.


 27%|██████████▉                             | 863/3171 [01:17<05:32,  6.93it/s]

Deleting GTLB_1min_parquet — only 794 files found.


 27%|██████████▉                             | 872/3171 [01:18<03:29, 10.99it/s]

Deleting GRAL_1min_parquet — only 126 files found.


 28%|███████████▏                            | 884/3171 [01:18<01:47, 21.19it/s]

Deleting EXFY_1min_parquet — only 775 files found.


 28%|███████████▏                            | 889/3171 [01:18<02:07, 17.90it/s]

Deleting EWTX_1min_parquet — only 934 files found.


 28%|███████████▎                            | 898/3171 [01:19<02:12, 17.19it/s]

Deleting EWCZ_1min_parquet — only 843 files found.


 28%|███████████▍                            | 902/3171 [01:19<02:31, 14.95it/s]

Deleting EXE_1min_parquet — only 941 files found.
Deleting EVTL_1min_parquet — only 1030 files found.


 29%|███████████▍                            | 905/3171 [01:20<04:10,  9.03it/s]

Deleting HCP_1min_parquet — only 755 files found.


 29%|███████████▍                            | 908/3171 [01:21<04:18,  8.75it/s]

Deleting GB_1min_parquet — only 1050 files found.


 29%|███████████▍                            | 910/3171 [01:21<05:04,  7.42it/s]

Deleting GENI_1min_parquet — only 1052 files found.


 29%|███████████▌                            | 915/3171 [01:22<04:46,  7.89it/s]

Deleting GFS_1min_parquet — only 784 files found.
Deleting GETY_1min_parquet — only 600 files found.


 29%|███████████▌                            | 917/3171 [01:22<06:04,  6.18it/s]

Deleting MRVI_1min_parquet — only 1019 files found.


 29%|███████████▋                            | 924/3171 [01:23<04:31,  8.27it/s]

Deleting MNPR_1min_parquet — only 1054 files found.


 29%|███████████▋                            | 928/3171 [01:23<04:34,  8.17it/s]

Deleting MNTK_1min_parquet — only 978 files found.


 30%|███████████▉                            | 946/3171 [01:24<01:54, 19.37it/s]

Deleting IE_1min_parquet — only 618 files found.
Deleting CXT_1min_parquet — only 425 files found.


 30%|███████████▉                            | 951/3171 [01:25<02:32, 14.59it/s]

Deleting CXM_1min_parquet — only 873 files found.


 30%|████████████                            | 960/3171 [01:25<02:25, 15.24it/s]

Deleting HG_1min_parquet — only 272 files found.
Deleting HGTY_1min_parquet — only 882 files found.
Deleting HEPS_1min_parquet — only 867 files found.


 30%|████████████▏                           | 963/3171 [01:26<04:01,  9.15it/s]

Deleting HAYW_1min_parquet — only 944 files found.


 31%|████████████▎                           | 979/3171 [01:27<02:03, 17.82it/s]

Deleting HASI_1min_parquet — only 990 files found.


 31%|████████████▌                           | 992/3171 [01:27<01:52, 19.45it/s]

Deleting GWH_1min_parquet — only 969 files found.


 31%|████████████▌                           | 998/3171 [01:28<02:08, 16.90it/s]

Deleting GLBE_1min_parquet — only 902 files found.


 32%|████████████▎                          | 1002/3171 [01:28<02:29, 14.49it/s]

Deleting GEHC_1min_parquet — only 499 files found.


 32%|████████████▎                          | 1005/3171 [01:28<02:35, 13.93it/s]

Deleting FYBR_1min_parquet — only 908 files found.


 32%|████████████▍                          | 1012/3171 [01:29<02:36, 13.82it/s]

Deleting GXO_1min_parquet — only 852 files found.


 32%|████████████▍                          | 1015/3171 [01:29<03:00, 11.91it/s]

Deleting GRND_1min_parquet — only 977 files found.


 32%|████████████▌                          | 1021/3171 [01:30<03:04, 11.63it/s]

Deleting GPCR_1min_parquet — only 466 files found.


 32%|████████████▌                          | 1025/3171 [01:30<02:52, 12.46it/s]

Deleting GNLX_1min_parquet — only 472 files found.


 33%|████████████▋                          | 1032/3171 [01:30<02:19, 15.35it/s]

Deleting GSK_1min_parquet — only 1052 files found.


 33%|████████████▊                          | 1043/3171 [01:31<02:12, 16.10it/s]

Deleting GLUE_1min_parquet — only 872 files found.


 34%|█████████████                          | 1063/3171 [01:32<01:25, 24.55it/s]

Deleting GEV_1min_parquet — only 179 files found.
Deleting FRGE_1min_parquet — only 971 files found.
Deleting FNA_1min_parquet — only 793 files found.


 34%|█████████████▏                         | 1068/3171 [01:33<02:44, 12.79it/s]

Deleting FLUT_1min_parquet — only 220 files found.
Deleting FLNC_1min_parquet — only 784 files found.
Deleting MTAL_1min_parquet — only 730 files found.


 34%|█████████████▏                         | 1071/3171 [01:34<03:58,  8.80it/s]

Deleting MNDY_1min_parquet — only 882 files found.


 34%|█████████████▏                         | 1074/3171 [01:34<04:35,  7.62it/s]

Deleting GAMB_1min_parquet — only 852 files found.


 34%|█████████████▎                         | 1079/3171 [01:35<04:22,  7.98it/s]

Deleting CVRX_1min_parquet — only 868 files found.


 34%|█████████████▎                         | 1086/3171 [01:35<03:27, 10.05it/s]

Deleting IBTA_1min_parquet — only 164 files found.
Deleting LESL_1min_parquet — only 1033 files found.


 34%|█████████████▍                         | 1091/3171 [01:36<03:34,  9.70it/s]

Deleting LENZ_1min_parquet — only 871 files found.


 34%|█████████████▍                         | 1093/3171 [01:36<04:19,  8.01it/s]

Deleting LB_1min_parquet — only 115 files found.
Deleting KVYO_1min_parquet — only 309 files found.


 35%|█████████████▋                         | 1108/3171 [01:37<01:41, 20.37it/s]

Deleting FREY_1min_parquet — only 1051 files found.
Deleting FRSH_1min_parquet — only 810 files found.


 35%|█████████████▋                         | 1113/3171 [01:38<02:52, 11.94it/s]

Deleting FLYW_1min_parquet — only 892 files found.


 36%|█████████████▊                         | 1128/3171 [01:38<02:14, 15.16it/s]

Deleting MSGE_1min_parquet — only 413 files found.
Deleting MRX_1min_parquet — only 159 files found.


 36%|█████████████▉                         | 1133/3171 [01:39<01:51, 18.26it/s]

Deleting GCT_1min_parquet — only 582 files found.


 36%|█████████████▉                         | 1137/3171 [01:39<02:03, 16.53it/s]

Deleting MNMD_1min_parquet — only 913 files found.


 36%|██████████████                         | 1140/3171 [01:39<02:38, 12.84it/s]

Deleting CWH_1min_parquet — only 990 files found.


 36%|██████████████                         | 1143/3171 [01:40<03:17, 10.26it/s]

Deleting EVCM_1min_parquet — only 867 files found.
Deleting CWAN_1min_parquet — only 808 files found.


 36%|██████████████                         | 1145/3171 [01:41<05:02,  6.69it/s]

Deleting ICLK_1min_parquet — only 1054 files found.


 36%|██████████████▏                        | 1157/3171 [01:41<02:38, 12.74it/s]

Deleting LTM_1min_parquet — only 97 files found.


 37%|██████████████▌                        | 1179/3171 [01:41<01:00, 33.17it/s]

Deleting MNSO_1min_parquet — only 1045 files found.


 37%|██████████████▌                        | 1187/3171 [01:42<01:21, 24.38it/s]

Deleting LTH_1min_parquet — only 799 files found.
Deleting LPA_1min_parquet — only 870 files found.


 38%|██████████████▋                        | 1193/3171 [01:43<02:27, 13.38it/s]

Deleting LOAR_1min_parquet — only 159 files found.
Deleting LINE_1min_parquet — only 97 files found.
Deleting LIF_1min_parquet — only 130 files found.


 38%|██████████████▋                        | 1198/3171 [01:43<02:08, 15.38it/s]

Deleting KOSS_1min_parquet — only 1053 files found.


 38%|██████████████▊                        | 1202/3171 [01:44<02:43, 12.04it/s]

Deleting MQ_1min_parquet — only 883 files found.


 38%|██████████████▉                        | 1218/3171 [01:44<01:43, 18.86it/s]

Deleting GBDC_1min_parquet — only 990 files found.


 39%|███████████████                        | 1227/3171 [01:45<01:46, 18.30it/s]

Deleting EVGO_1min_parquet — only 1019 files found.


 39%|███████████████▏                       | 1231/3171 [01:46<02:10, 14.86it/s]

Deleting MLNK_1min_parquet — only 849 files found.
Deleting GATO_1min_parquet — only 1033 files found.


 39%|███████████████▏                       | 1234/3171 [01:46<03:26,  9.39it/s]

Deleting MLTX_1min_parquet — only 679 files found.


 39%|███████████████▎                       | 1247/3171 [01:47<01:59, 16.14it/s]

Deleting MKFG_1min_parquet — only 1049 files found.


 40%|███████████████▌                       | 1264/3171 [01:48<01:24, 22.54it/s]

Deleting LOGC_1min_parquet — only 1002 files found.


 40%|███████████████▌                       | 1270/3171 [01:48<01:49, 17.38it/s]

Deleting LLYVK_1min_parquet — only 238 files found.
Deleting KSPI_1min_parquet — only 226 files found.


 40%|███████████████▋                       | 1275/3171 [01:48<01:46, 17.76it/s]

Deleting CURB_1min_parquet — only 29 files found.
Deleting BETR_1min_parquet — only 818 files found.


 40%|███████████████▋                       | 1279/3171 [01:49<02:18, 13.67it/s]

Deleting KNF_1min_parquet — only 385 files found.
Deleting PUBM_1min_parquet — only 1007 files found.


 41%|███████████████▊                       | 1285/3171 [01:49<02:24, 13.03it/s]

Deleting IRON_1min_parquet — only 1054 files found.


 41%|███████████████▊                       | 1290/3171 [01:50<02:38, 11.85it/s]

Deleting PROK_1min_parquet — only 864 files found.


 41%|███████████████▉                       | 1292/3171 [01:50<03:07, 10.03it/s]

Deleting PRME_1min_parquet — only 538 files found.
Deleting PPTA_1min_parquet — only 960 files found.


 41%|███████████████▉                       | 1294/3171 [01:51<04:17,  7.28it/s]

Deleting KIND_1min_parquet — only 891 files found.


 41%|███████████████▉                       | 1297/3171 [01:52<04:22,  7.13it/s]

Deleting OTLY_1min_parquet — only 896 files found.


 41%|███████████████▉                       | 1299/3171 [01:52<04:47,  6.52it/s]

Deleting OLMA_1min_parquet — only 1020 files found.


 41%|████████████████                       | 1303/3171 [01:53<04:50,  6.42it/s]

Deleting ODD_1min_parquet — only 353 files found.
Deleting OB_1min_parquet — only 852 files found.


 41%|████████████████                       | 1304/3171 [01:53<05:54,  5.26it/s]

Deleting FWRG_1min_parquet — only 803 files found.


 42%|████████████████▎                      | 1325/3171 [01:54<01:36, 19.11it/s]

Deleting LXEO_1min_parquet — only 277 files found.
Deleting LU_1min_parquet — only 1033 files found.


 42%|████████████████▎                      | 1329/3171 [01:54<02:09, 14.23it/s]

Deleting LUNR_1min_parquet — only 459 files found.


 42%|████████████████▍                      | 1332/3171 [01:54<02:12, 13.86it/s]

Deleting LXRX_1min_parquet — only 1054 files found.


 43%|████████████████▋                      | 1355/3171 [01:55<01:00, 30.15it/s]

Deleting LFMD_1min_parquet — only 958 files found.


 43%|████████████████▊                      | 1366/3171 [01:56<01:15, 23.90it/s]

Deleting LFST_1min_parquet — only 882 files found.


 43%|████████████████▊                      | 1371/3171 [01:56<01:32, 19.52it/s]

Deleting LDI_1min_parquet — only 964 files found.
Deleting LASE_1min_parquet — only 552 files found.


 43%|████████████████▉                      | 1375/3171 [01:57<02:17, 13.08it/s]

Deleting KYTX_1min_parquet — only 212 files found.
Deleting BFLY_1min_parquet — only 1054 files found.


 44%|████████████████▉                      | 1381/3171 [01:58<02:40, 11.15it/s]

Deleting FTRE_1min_parquet — only 372 files found.


 44%|█████████████████                      | 1386/3171 [01:58<02:09, 13.81it/s]

Deleting KLG_1min_parquet — only 304 files found.


 44%|█████████████████▏                     | 1397/3171 [01:58<01:12, 24.53it/s]

Deleting LAW_1min_parquet — only 854 files found.


 44%|█████████████████▎                     | 1411/3171 [01:58<01:04, 27.42it/s]

Deleting KVUE_1min_parquet — only 404 files found.


 45%|█████████████████▍                     | 1417/3171 [01:59<01:10, 24.82it/s]

Deleting CURV_1min_parquet — only 867 files found.


 45%|█████████████████▌                     | 1423/3171 [01:59<01:30, 19.38it/s]

Deleting JANX_1min_parquet — only 881 files found.


 45%|█████████████████▌                     | 1432/3171 [02:00<01:33, 18.69it/s]

Deleting IVT_1min_parquet — only 796 files found.
Deleting ISPO_1min_parquet — only 954 files found.


 45%|█████████████████▋                     | 1435/3171 [02:01<02:34, 11.24it/s]

Deleting IREN_1min_parquet — only 770 files found.


 45%|█████████████████▋                     | 1438/3171 [02:01<02:48, 10.28it/s]

Deleting PSQH_1min_parquet — only 585 files found.


 45%|█████████████████▋                     | 1440/3171 [02:01<02:57,  9.74it/s]

Deleting PL_1min_parquet — only 912 files found.


 45%|█████████████████▋                     | 1442/3171 [02:02<03:29,  8.26it/s]

Deleting KRON_1min_parquet — only 1049 files found.


 46%|██████████████████                     | 1471/3171 [02:02<01:00, 28.01it/s]

Deleting FUBO_1min_parquet — only 1050 files found.


 47%|██████████████████▏                    | 1478/3171 [02:03<01:24, 20.11it/s]

Deleting IAS_1min_parquet — only 868 files found.


 47%|██████████████████▍                    | 1504/3171 [02:04<00:47, 35.03it/s]

Deleting KLTR_1min_parquet — only 854 files found.


 48%|██████████████████▌                    | 1511/3171 [02:04<01:05, 25.15it/s]

Deleting PSEC_1min_parquet — only 1031 files found.


 48%|██████████████████▋                    | 1522/3171 [02:05<01:13, 22.58it/s]

Deleting PSFE_1min_parquet — only 1049 files found.


 48%|██████████████████▊                    | 1527/3171 [02:05<01:31, 18.02it/s]

Deleting PRVA_1min_parquet — only 911 files found.
Deleting PRCT_1min_parquet — only 815 files found.
Deleting PLTK_1min_parquet — only 982 files found.


 48%|██████████████████▉                    | 1537/3171 [02:07<02:07, 12.77it/s]

Deleting KLC_1min_parquet — only 44 files found.
Deleting OS_1min_parquet — only 98 files found.
Deleting OCFT_1min_parquet — only 1054 files found.


 49%|██████████████████▉                    | 1541/3171 [02:07<02:24, 11.31it/s]

Deleting NBXG_1min_parquet — only 892 files found.


 49%|███████████████████                    | 1553/3171 [02:08<01:47, 15.03it/s]

Deleting NAUT_1min_parquet — only 1052 files found.


 50%|███████████████████▍                   | 1584/3171 [02:08<00:46, 33.93it/s]

Deleting PTLO_1min_parquet — only 789 files found.
Deleting PRM_1min_parquet — only 776 files found.
Deleting PRAX_1min_parquet — only 1044 files found.


 50%|███████████████████▌                   | 1591/3171 [02:10<01:50, 14.35it/s]

Deleting PKST_1min_parquet — only 418 files found.
Deleting FTEL_1min_parquet — only 339 files found.


 51%|███████████████████▊                   | 1607/3171 [02:10<01:14, 21.08it/s]

Deleting CTRI_1min_parquet — only 164 files found.


 51%|███████████████████▉                   | 1616/3171 [02:10<00:55, 27.83it/s]

Deleting PMTS_1min_parquet — only 1053 files found.


 51%|███████████████████▉                   | 1623/3171 [02:11<01:14, 20.86it/s]

Deleting ERAS_1min_parquet — only 857 files found.


 51%|████████████████████                   | 1632/3171 [02:11<01:21, 18.98it/s]

Deleting ONTF_1min_parquet — only 970 files found.


 52%|████████████████████                   | 1636/3171 [02:12<01:38, 15.60it/s]

Deleting PHVS_1min_parquet — only 955 files found.


 52%|████████████████████▎                  | 1648/3171 [02:13<01:33, 16.34it/s]

Deleting PHIN_1min_parquet — only 367 files found.
Deleting ESAB_1min_parquet — only 675 files found.


 52%|████████████████████▎                  | 1654/3171 [02:13<01:32, 16.35it/s]

Deleting ERO_1min_parquet — only 814 files found.


 52%|████████████████████▍                  | 1657/3171 [02:13<01:50, 13.73it/s]

Deleting HYLN_1min_parquet — only 1054 files found.


 52%|████████████████████▍                  | 1663/3171 [02:14<01:58, 12.74it/s]

Deleting BEDU_1min_parquet — only 1030 files found.


 53%|████████████████████▍                  | 1665/3171 [02:14<02:31,  9.94it/s]

Deleting FTCI_1min_parquet — only 912 files found.


 53%|████████████████████▌                  | 1667/3171 [02:15<02:59,  8.36it/s]

Deleting PGY_1min_parquet — only 621 files found.


 53%|████████████████████▌                  | 1669/3171 [02:15<03:07,  8.03it/s]

Deleting KINS_1min_parquet — only 1054 files found.


 53%|████████████████████▌                  | 1674/3171 [02:16<02:53,  8.62it/s]

Deleting CTKB_1min_parquet — only 852 files found.


 53%|████████████████████▋                  | 1687/3171 [02:16<01:55, 12.86it/s]

Deleting KGS_1min_parquet — only 366 files found.
Deleting OWL_1min_parquet — only 1004 files found.


 54%|████████████████████▉                  | 1699/3171 [02:17<01:30, 16.19it/s]

Deleting OUST_1min_parquet — only 1042 files found.


 54%|████████████████████▉                  | 1706/3171 [02:17<01:33, 15.63it/s]

Deleting OKLO_1min_parquet — only 148 files found.
Deleting NWTN_1min_parquet — only 521 files found.


 54%|█████████████████████                  | 1709/3171 [02:18<01:39, 14.76it/s]

Deleting NRGV_1min_parquet — only 928 files found.


 54%|█████████████████████                  | 1716/3171 [02:18<01:53, 12.78it/s]

Deleting NPWR_1min_parquet — only 379 files found.


 54%|█████████████████████▏                 | 1726/3171 [02:18<01:01, 23.47it/s]

Deleting OSCR_1min_parquet — only 951 files found.


 55%|█████████████████████▎                 | 1731/3171 [02:19<01:30, 15.99it/s]

Deleting ONL_1min_parquet — only 777 files found.


 55%|█████████████████████▍                 | 1740/3171 [02:19<01:19, 17.89it/s]

Deleting ONDS_1min_parquet — only 1010 files found.


 55%|█████████████████████▍                 | 1743/3171 [02:20<01:43, 13.76it/s]

Deleting OPAD_1min_parquet — only 1005 files found.


 55%|█████████████████████▍                 | 1746/3171 [02:20<02:06, 11.23it/s]

Deleting ONON_1min_parquet — only 815 files found.


 55%|█████████████████████▋                 | 1759/3171 [02:21<01:14, 19.07it/s]

Deleting OLO_1min_parquet — only 941 files found.
Deleting OLPX_1min_parquet — only 804 files found.


 56%|█████████████████████▋                 | 1764/3171 [02:22<01:59, 11.79it/s]

Deleting OGN_1min_parquet — only 900 files found.


 56%|█████████████████████▊                 | 1769/3171 [02:22<02:01, 11.58it/s]

Deleting OBE_1min_parquet — only 720 files found.


 56%|█████████████████████▊                 | 1776/3171 [02:23<01:45, 13.20it/s]

Deleting NXT_1min_parquet — only 462 files found.


 56%|█████████████████████▉                 | 1786/3171 [02:23<01:19, 17.48it/s]

Deleting NWG_1min_parquet — only 1053 files found.


 57%|██████████████████████                 | 1792/3171 [02:24<01:30, 15.16it/s]

Deleting NVTS_1min_parquet — only 790 files found.
Deleting NRDS_1min_parquet — only 779 files found.


 57%|██████████████████████▏                | 1800/3171 [02:24<01:40, 13.62it/s]

Deleting NNE_1min_parquet — only 150 files found.
Deleting ME_1min_parquet — only 877 files found.


 57%|██████████████████████▎                | 1812/3171 [02:25<01:11, 19.07it/s]

Deleting NUVB_1min_parquet — only 1053 files found.
Deleting NUVL_1min_parquet — only 848 files found.


 58%|██████████████████████▍                | 1827/3171 [02:26<01:13, 18.34it/s]

Deleting NRDY_1min_parquet — only 1013 files found.


 58%|██████████████████████▌                | 1834/3171 [02:26<01:25, 15.56it/s]

Deleting FSK_1min_parquet — only 990 files found.


 59%|██████████████████████▊                | 1856/3171 [02:27<00:57, 22.95it/s]

Deleting NATL_1min_parquet — only 294 files found.
Deleting SRAD_1min_parquet — only 816 files found.
Deleting SPRY_1min_parquet — only 1010 files found.


 59%|██████████████████████▉                | 1861/3171 [02:28<01:35, 13.71it/s]

Deleting HTZ_1min_parquet — only 776 files found.
Deleting ENVX_1min_parquet — only 985 files found.


 59%|██████████████████████▉                | 1865/3171 [02:29<02:12,  9.84it/s]

Deleting SMRT_1min_parquet — only 829 files found.
Deleting SIEB_1min_parquet — only 1053 files found.


 59%|██████████████████████▉                | 1868/3171 [02:30<02:56,  7.39it/s]

Deleting SDIG_1min_parquet — only 790 files found.


 59%|███████████████████████                | 1872/3171 [02:30<02:43,  7.96it/s]

Deleting HUT_1min_parquet — only 879 files found.


 59%|███████████████████████                | 1880/3171 [02:31<02:05, 10.25it/s]

Deleting NN_1min_parquet — only 783 files found.


 59%|███████████████████████▏               | 1884/3171 [02:31<02:05, 10.26it/s]

Deleting HUMA_1min_parquet — only 984 files found.


 60%|███████████████████████▎               | 1898/3171 [02:32<01:15, 16.89it/s]

Deleting NAMS_1min_parquet — only 514 files found.


 60%|███████████████████████▍               | 1906/3171 [02:32<01:09, 18.27it/s]

Deleting MYTE_1min_parquet — only 979 files found.


 61%|███████████████████████▌               | 1919/3171 [02:33<00:56, 22.01it/s]

Deleting SOBO_1min_parquet — only 45 files found.
Deleting NABL_1min_parquet — only 855 files found.


 61%|███████████████████████▋               | 1924/3171 [02:33<01:08, 18.16it/s]

Deleting MYPS_1min_parquet — only 999 files found.


 61%|███████████████████████▋               | 1928/3171 [02:34<01:31, 13.63it/s]

Deleting MXCT_1min_parquet — only 847 files found.


 61%|███████████████████████▊               | 1939/3171 [02:34<01:15, 16.27it/s]

Deleting MTTR_1min_parquet — only 971 files found.


 61%|███████████████████████▉               | 1948/3171 [02:35<01:18, 15.52it/s]

Deleting SVV_1min_parquet — only 366 files found.


 62%|████████████████████████               | 1958/3171 [02:35<00:50, 23.95it/s]

Deleting SYM_1min_parquet — only 944 files found.
Deleting S_1min_parquet — only 868 files found.


 62%|████████████████████████▏              | 1970/3171 [02:36<01:07, 17.81it/s]

Deleting SVCO_1min_parquet — only 149 files found.
Deleting SWIM_1min_parquet — only 915 files found.


 62%|████████████████████████▎              | 1974/3171 [02:36<01:20, 14.96it/s]

Deleting STVN_1min_parquet — only 857 files found.


 63%|████████████████████████▍              | 1986/3171 [02:37<01:04, 18.29it/s]

Deleting SOLV_1min_parquet — only 180 files found.
Deleting SNRE_1min_parquet — only 26 files found.
Deleting SNCY_1min_parquet — only 941 files found.


 63%|████████████████████████▍              | 1990/3171 [02:37<01:19, 14.80it/s]

Deleting SPIR_1min_parquet — only 1031 files found.


 63%|████████████████████████▌              | 2002/3171 [02:38<01:08, 16.97it/s]

Deleting SOC_1min_parquet — only 207 files found.
Deleting SOUN_1min_parquet — only 659 files found.


 63%|████████████████████████▋              | 2005/3171 [02:38<01:19, 14.70it/s]

Deleting SN_1min_parquet — only 345 files found.
Deleting SOFI_1min_parquet — only 1012 files found.


 63%|████████████████████████▋              | 2008/3171 [02:39<01:53, 10.25it/s]

Deleting SNDA_1min_parquet — only 1037 files found.


 64%|█████████████████████████              | 2036/3171 [02:40<00:39, 28.43it/s]

Deleting STEM_1min_parquet — only 1050 files found.
Deleting SST_1min_parquet — only 1053 files found.


 65%|█████████████████████████▏             | 2053/3171 [02:41<00:51, 21.78it/s]

Deleting SGMT_1min_parquet — only 356 files found.


 65%|█████████████████████████▎             | 2062/3171 [02:41<00:47, 23.58it/s]

Deleting SERV_1min_parquet — only 164 files found.
Deleting SES_1min_parquet — only 952 files found.
Deleting SDST_1min_parquet — only 109 files found.
Deleting SATX_1min_parquet — only 684 files found.


 65%|█████████████████████████▍             | 2068/3171 [02:42<01:13, 15.07it/s]

Deleting SARO_1min_parquet — only 25 files found.
Deleting SANA_1min_parquet — only 969 files found.
Deleting RXST_1min_parquet — only 847 files found.


 65%|█████████████████████████▍             | 2072/3171 [02:43<01:41, 10.82it/s]

Deleting RLX_1min_parquet — only 978 files found.


 66%|█████████████████████████▌             | 2079/3171 [02:44<01:34, 11.58it/s]

Deleting RGC_1min_parquet — only 703 files found.


 66%|█████████████████████████▌             | 2082/3171 [02:44<01:38, 11.05it/s]

Deleting QRTEB_1min_parquet — only 994 files found.


 66%|█████████████████████████▊             | 2097/3171 [02:44<01:00, 17.90it/s]

Deleting CSQ_1min_parquet — only 1031 files found.


 66%|█████████████████████████▊             | 2103/3171 [02:45<01:11, 15.03it/s]

Deleting PEPG_1min_parquet — only 653 files found.


 66%|█████████████████████████▉             | 2107/3171 [02:45<01:20, 13.15it/s]

Deleting NMRA_1min_parquet — only 312 files found.
Deleting SMWB_1min_parquet — only 902 files found.


 67%|█████████████████████████▉             | 2110/3171 [02:46<01:33, 11.35it/s]

Deleting MESO_1min_parquet — only 1043 files found.
Deleting KD_1min_parquet — only 788 files found.


 67%|██████████████████████████             | 2123/3171 [02:47<01:13, 14.31it/s]

Deleting CSBR_1min_parquet — only 1037 files found.


 67%|██████████████████████████▏            | 2127/3171 [02:47<01:27, 11.90it/s]

Deleting SEMR_1min_parquet — only 935 files found.


 67%|██████████████████████████▏            | 2130/3171 [02:48<01:39, 10.44it/s]

Deleting SDA_1min_parquet — only 697 files found.


 67%|██████████████████████████▏            | 2133/3171 [02:48<01:42, 10.11it/s]

Deleting SCLX_1min_parquet — only 888 files found.
Deleting PFX_1min_parquet — only 899 files found.


 67%|██████████████████████████▎            | 2135/3171 [02:49<02:36,  6.61it/s]

Deleting RSKD_1min_parquet — only 848 files found.


 67%|██████████████████████████▎            | 2137/3171 [02:49<02:45,  6.25it/s]

Deleting QXO_1min_parquet — only 1054 files found.


 68%|██████████████████████████▎            | 2144/3171 [02:50<02:02,  8.36it/s]

Deleting SMR_1min_parquet — only 656 files found.


 68%|██████████████████████████▍            | 2151/3171 [02:50<01:33, 10.94it/s]

Deleting SHCO_1min_parquet — only 858 files found.


 68%|██████████████████████████▌            | 2157/3171 [02:51<01:26, 11.69it/s]

Deleting SHLS_1min_parquet — only 975 files found.


 68%|██████████████████████████▌            | 2164/3171 [02:51<01:21, 12.31it/s]

Deleting SHC_1min_parquet — only 1019 files found.


 68%|██████████████████████████▋            | 2166/3171 [02:52<01:41,  9.92it/s]

Deleting SG_1min_parquet — only 769 files found.


 68%|██████████████████████████▋            | 2168/3171 [02:52<01:53,  8.84it/s]

Deleting SGML_1min_parquet — only 817 files found.


 68%|██████████████████████████▋            | 2170/3171 [02:52<02:06,  7.89it/s]

Deleting SEER_1min_parquet — only 1010 files found.


 69%|██████████████████████████▊            | 2175/3171 [02:53<01:55,  8.63it/s]

Deleting SDRL_1min_parquet — only 563 files found.


 69%|██████████████████████████▊            | 2178/3171 [02:53<01:48,  9.12it/s]

Deleting SHO_1min_parquet — only 1031 files found.


 69%|██████████████████████████▊            | 2179/3171 [02:54<02:27,  6.75it/s]

Deleting RXRX_1min_parquet — only 920 files found.


 69%|██████████████████████████▉            | 2186/3171 [02:54<01:44,  9.42it/s]

Deleting ROIV_1min_parquet — only 803 files found.


 69%|██████████████████████████▉            | 2188/3171 [02:55<01:58,  8.31it/s]

Deleting SEAT_1min_parquet — only 791 files found.


 70%|███████████████████████████▏           | 2213/3171 [02:55<00:33, 29.01it/s]

Deleting RXO_1min_parquet — only 533 files found.


 70%|███████████████████████████▎           | 2219/3171 [02:55<00:39, 24.24it/s]

Deleting RYAN_1min_parquet — only 853 files found.


 70%|███████████████████████████▎           | 2224/3171 [02:56<00:48, 19.47it/s]

Deleting RNW_1min_parquet — only 830 files found.


 70%|███████████████████████████▍           | 2234/3171 [02:56<00:46, 20.15it/s]

Deleting ZK_1min_parquet — only 148 files found.
Deleting WULF_1min_parquet — only 1052 files found.


 71%|███████████████████████████▌           | 2238/3171 [02:57<01:02, 14.88it/s]

Deleting CSAN_1min_parquet — only 948 files found.


 71%|███████████████████████████▋           | 2247/3171 [02:58<00:59, 15.44it/s]

Deleting VLTO_1min_parquet — only 304 files found.
Deleting TYRA_1min_parquet — only 815 files found.


 71%|███████████████████████████▊           | 2261/3171 [02:58<00:46, 19.58it/s]

Deleting NLOP_1min_parquet — only 276 files found.
Deleting TOP_1min_parquet — only 626 files found.


 72%|███████████████████████████▉           | 2275/3171 [02:59<00:32, 27.80it/s]

Deleting PLL_1min_parquet — only 1053 files found.


 72%|████████████████████████████           | 2280/3171 [02:59<00:50, 17.70it/s]

Deleting RUM_1min_parquet — only 873 files found.


 72%|████████████████████████████▏          | 2296/3171 [03:00<00:35, 24.78it/s]

Deleting ROOT_1min_parquet — only 1033 files found.


 73%|████████████████████████████▎          | 2301/3171 [03:00<00:47, 18.15it/s]

Deleting RKLB_1min_parquet — only 962 files found.
Deleting RIVN_1min_parquet — only 775 files found.


 73%|████████████████████████████▎          | 2305/3171 [03:01<01:13, 11.82it/s]

Deleting RHP_1min_parquet — only 990 files found.


 73%|████████████████████████████▍          | 2308/3171 [03:02<01:24, 10.21it/s]

Deleting RELY_1min_parquet — only 809 files found.


 73%|████████████████████████████▍          | 2314/3171 [03:02<01:16, 11.22it/s]

Deleting QSI_1min_parquet — only 1013 files found.


 73%|████████████████████████████▌          | 2327/3171 [03:03<00:47, 17.58it/s]

Deleting WFG_1min_parquet — only 972 files found.


 74%|████████████████████████████▋          | 2332/3171 [03:03<00:56, 14.76it/s]

Deleting WFRD_1min_parquet — only 888 files found.


 74%|████████████████████████████▋          | 2336/3171 [03:03<01:04, 12.96it/s]

Deleting WGS_1min_parquet — only 1023 files found.
Deleting ZVRA_1min_parquet — only 987 files found.


 74%|████████████████████████████▊          | 2339/3171 [03:04<01:41,  8.21it/s]

Deleting ZIM_1min_parquet — only 974 files found.


 74%|████████████████████████████▊          | 2347/3171 [03:05<01:20, 10.22it/s]

Deleting ZGN_1min_parquet — only 748 files found.


 74%|████████████████████████████▉          | 2353/3171 [03:05<01:12, 11.32it/s]

Deleting XYF_1min_parquet — only 1054 files found.


 74%|████████████████████████████▉          | 2355/3171 [03:06<01:28,  9.26it/s]

Deleting XPOF_1min_parquet — only 852 files found.


 74%|█████████████████████████████          | 2359/3171 [03:06<01:30,  8.98it/s]

Deleting WS_1min_parquet — only 259 files found.


 75%|█████████████████████████████▏         | 2378/3171 [03:07<00:28, 27.38it/s]

Deleting RBLX_1min_parquet — only 946 files found.


 75%|█████████████████████████████▎         | 2384/3171 [03:07<00:41, 19.04it/s]

Deleting PYCR_1min_parquet — only 854 files found.


 75%|█████████████████████████████▍         | 2391/3171 [03:08<00:43, 17.76it/s]

Deleting PWP_1min_parquet — only 1017 files found.
Deleting ZURA_1min_parquet — only 435 files found.
Deleting ZJK_1min_parquet — only 27 files found.
Deleting ZIP_1min_parquet — only 892 files found.


 76%|█████████████████████████████▍         | 2395/3171 [03:09<01:17,  9.99it/s]

Deleting ZIMV_1min_parquet — only 701 files found.


 76%|█████████████████████████████▍         | 2398/3171 [03:09<01:18,  9.82it/s]

Deleting XMTR_1min_parquet — only 868 files found.


 76%|█████████████████████████████▌         | 2400/3171 [03:09<01:30,  8.53it/s]

Deleting VIRC_1min_parquet — only 1048 files found.


 76%|█████████████████████████████▌         | 2404/3171 [03:10<01:31,  8.36it/s]

Deleting VAL_1min_parquet — only 909 files found.


 76%|█████████████████████████████▋         | 2410/3171 [03:10<01:15, 10.08it/s]

Deleting ULS_1min_parquet — only 168 files found.
Deleting TRML_1min_parquet — only 905 files found.


 76%|█████████████████████████████▋         | 2414/3171 [03:11<01:17,  9.73it/s]

Deleting TRIN_1min_parquet — only 973 files found.


 76%|█████████████████████████████▊         | 2421/3171 [03:11<01:02, 11.95it/s]

Deleting WAY_1min_parquet — only 129 files found.
Deleting ENFN_1min_parquet — only 789 files found.


 76%|█████████████████████████████▊         | 2424/3171 [03:12<01:10, 10.55it/s]

Deleting JXN_1min_parquet — only 824 files found.


 77%|█████████████████████████████▉         | 2437/3171 [03:12<00:43, 17.03it/s]

Deleting CRGX_1min_parquet — only 272 files found.


 77%|██████████████████████████████▏        | 2457/3171 [03:13<00:19, 36.47it/s]

Deleting WOOF_1min_parquet — only 983 files found.


 78%|██████████████████████████████▎        | 2463/3171 [03:13<00:32, 21.67it/s]

Deleting ZH_1min_parquet — only 934 files found.


 78%|██████████████████████████████▎        | 2468/3171 [03:14<00:40, 17.44it/s]

Deleting YOU_1min_parquet — only 868 files found.


 78%|██████████████████████████████▍        | 2475/3171 [03:14<00:41, 16.68it/s]

Deleting YSG_1min_parquet — only 1020 files found.
Deleting YMM_1min_parquet — only 874 files found.


 79%|██████████████████████████████▌        | 2490/3171 [03:15<00:37, 18.21it/s]

Deleting XBIT_1min_parquet — only 1054 files found.


 79%|██████████████████████████████▋        | 2498/3171 [03:16<00:41, 16.22it/s]

Deleting WRBY_1min_parquet — only 805 files found.


 80%|███████████████████████████████        | 2522/3171 [03:16<00:21, 30.17it/s]

Deleting WEAV_1min_parquet — only 774 files found.
Deleting WDS_1min_parquet — only 635 files found.


 80%|███████████████████████████████        | 2529/3171 [03:17<00:33, 19.22it/s]

Deleting PECO_1min_parquet — only 858 files found.


 80%|███████████████████████████████▏       | 2537/3171 [03:18<00:34, 18.49it/s]

Deleting WBX_1min_parquet — only 919 files found.


 81%|███████████████████████████████▍       | 2556/3171 [03:18<00:25, 24.53it/s]

Deleting UMAC_1min_parquet — only 208 files found.
Deleting UDMY_1min_parquet — only 783 files found.
Deleting TUYA_1min_parquet — only 940 files found.


 81%|███████████████████████████████▋       | 2581/3171 [03:19<00:19, 30.06it/s]

Deleting VIK_1min_parquet — only 155 files found.
Deleting VMEO_1min_parquet — only 896 files found.
Deleting VIV_1min_parquet — only 1054 files found.


 82%|███████████████████████████████▉       | 2593/3171 [03:21<00:34, 16.99it/s]

Deleting VFS_1min_parquet — only 334 files found.
Deleting VEEA_1min_parquet — only 663 files found.


 82%|███████████████████████████████▉       | 2598/3171 [03:21<00:34, 16.49it/s]

Deleting VERA_1min_parquet — only 900 files found.
Deleting VERV_1min_parquet — only 877 files found.


 82%|████████████████████████████████       | 2612/3171 [03:22<00:32, 17.36it/s]

Deleting UPST_1min_parquet — only 1002 files found.


 83%|████████████████████████████████▏      | 2621/3171 [03:22<00:34, 16.13it/s]

Deleting ULCC_1min_parquet — only 930 files found.


 83%|████████████████████████████████▎      | 2627/3171 [03:23<00:37, 14.45it/s]

Deleting TSVT_1min_parquet — only 778 files found.


 84%|████████████████████████████████▋      | 2656/3171 [03:24<00:17, 29.60it/s]

Deleting TSAT_1min_parquet — only 768 files found.


 84%|████████████████████████████████▊      | 2666/3171 [03:24<00:19, 25.46it/s]

Deleting TOST_1min_parquet — only 810 files found.


 85%|████████████████████████████████▉      | 2683/3171 [03:25<00:16, 30.15it/s]

Deleting CRGY_1min_parquet — only 756 files found.
Deleting PCOR_1min_parquet — only 896 files found.


 86%|█████████████████████████████████▎     | 2712/3171 [03:26<00:13, 34.69it/s]

Deleting WBTN_1min_parquet — only 116 files found.
Deleting WBD_1min_parquet — only 676 files found.
Deleting TEM_1min_parquet — only 124 files found.
Deleting TERN_1min_parquet — only 968 files found.


 86%|█████████████████████████████████▍     | 2720/3171 [03:27<00:23, 19.45it/s]

Deleting BBAI_1min_parquet — only 756 files found.


 86%|█████████████████████████████████▌     | 2726/3171 [03:27<00:24, 18.33it/s]

Deleting PCT_1min_parquet — only 940 files found.


 86%|█████████████████████████████████▌     | 2732/3171 [03:28<00:26, 16.37it/s]

Deleting BARK_1min_parquet — only 1000 files found.


 86%|█████████████████████████████████▋     | 2739/3171 [03:28<00:28, 15.19it/s]

Deleting RBRK_1min_parquet — only 159 files found.
Deleting SKYH_1min_parquet — only 723 files found.


 86%|█████████████████████████████████▋     | 2742/3171 [03:29<00:32, 13.32it/s]

Deleting TIL_1min_parquet — only 939 files found.


 87%|█████████████████████████████████▊     | 2745/3171 [03:29<00:39, 10.89it/s]

Deleting PATH_1min_parquet — only 917 files found.


 87%|█████████████████████████████████▊     | 2749/3171 [03:29<00:41, 10.18it/s]

Deleting EHAB_1min_parquet — only 620 files found.


 87%|█████████████████████████████████▊     | 2751/3171 [03:30<00:44,  9.42it/s]

Deleting HLLY_1min_parquet — only 1015 files found.
Deleting HKD_1min_parquet — only 606 files found.


 87%|█████████████████████████████████▊     | 2753/3171 [03:30<01:05,  6.34it/s]

Deleting LYEL_1min_parquet — only 877 files found.


 87%|█████████████████████████████████▉     | 2756/3171 [03:31<01:03,  6.57it/s]

Deleting TPG_1min_parquet — only 731 files found.


 88%|██████████████████████████████████▎    | 2786/3171 [03:32<00:14, 27.03it/s]

Deleting MDIA_1min_parquet — only 1045 files found.


 88%|██████████████████████████████████▍    | 2803/3171 [03:32<00:13, 28.06it/s]

Deleting SLVM_1min_parquet — only 803 files found.


 89%|██████████████████████████████████▌    | 2809/3171 [03:33<00:16, 21.66it/s]

Deleting RDW_1min_parquet — only 822 files found.


 89%|██████████████████████████████████▌    | 2814/3171 [03:33<00:19, 18.40it/s]

Deleting TNGX_1min_parquet — only 1008 files found.
Deleting HSAI_1min_parquet — only 462 files found.
Deleting TFPM_1min_parquet — only 574 files found.


 89%|██████████████████████████████████▋    | 2818/3171 [03:34<00:33, 10.53it/s]

Deleting TDUP_1min_parquet — only 934 files found.


 89%|██████████████████████████████████▊    | 2827/3171 [03:35<00:27, 12.44it/s]

Deleting MCW_1min_parquet — only 871 files found.


 89%|██████████████████████████████████▊    | 2834/3171 [03:35<00:26, 12.81it/s]

Deleting RDDT_1min_parquet — only 183 files found.
Deleting JSPR_1min_parquet — only 986 files found.


 90%|██████████████████████████████████▉    | 2839/3171 [03:36<00:27, 11.89it/s]

Deleting TMCI_1min_parquet — only 915 files found.


 90%|██████████████████████████████████▉    | 2841/3171 [03:36<00:33,  9.72it/s]

Deleting TLN_1min_parquet — only 108 files found.
Deleting CRC_1min_parquet — only 1033 files found.


 90%|██████████████████████████████████▉    | 2845/3171 [03:37<00:37,  8.75it/s]

Deleting SKWD_1min_parquet — only 480 files found.
Deleting TIXT_1min_parquet — only 970 files found.


 90%|███████████████████████████████████    | 2847/3171 [03:37<00:48,  6.63it/s]

Deleting MBLY_1min_parquet — only 534 files found.


 90%|███████████████████████████████████▏   | 2859/3171 [03:38<00:20, 15.36it/s]

Deleting TBBB_1min_parquet — only 211 files found.


 90%|███████████████████████████████████▏   | 2863/3171 [03:38<00:19, 15.44it/s]

Deleting TARS_1min_parquet — only 1044 files found.


 90%|███████████████████████████████████▎   | 2867/3171 [03:39<00:24, 12.28it/s]

Deleting TASK_1min_parquet — only 881 files found.


 91%|███████████████████████████████████▎   | 2870/3171 [03:39<00:32,  9.40it/s]

Deleting SLRN_1min_parquet — only 403 files found.
Deleting SLI_1min_parquet — only 860 files found.


 91%|███████████████████████████████████▋   | 2901/3171 [03:40<00:08, 33.08it/s]

Deleting TMDX_1min_parquet — only 1054 files found.
Deleting SLDP_1min_parquet — only 893 files found.


 92%|███████████████████████████████████▊   | 2908/3171 [03:41<00:16, 15.97it/s]

Deleting HPH_1min_parquet — only 931 files found.
Deleting VTS_1min_parquet — only 479 files found.


 92%|███████████████████████████████████▉   | 2924/3171 [03:42<00:12, 20.08it/s]

Deleting TLS_1min_parquet — only 1020 files found.


 92%|████████████████████████████████████   | 2932/3171 [03:42<00:13, 17.39it/s]

Deleting CRDO_1min_parquet — only 722 files found.


 93%|████████████████████████████████████   | 2937/3171 [03:43<00:14, 16.21it/s]

Deleting VTEX_1min_parquet — only 854 files found.


 93%|████████████████████████████████████▏  | 2944/3171 [03:44<00:23,  9.80it/s]

Deleting NE_1min_parquet — only 883 files found.
Deleting VSTS_1min_parquet — only 304 files found.


 93%|████████████████████████████████████▏  | 2947/3171 [03:46<00:40,  5.58it/s]

Deleting MAX_1min_parquet — only 1033 files found.


 93%|████████████████████████████████████▎  | 2953/3171 [03:47<00:45,  4.76it/s]

Deleting HLMN_1min_parquet — only 1010 files found.


 93%|████████████████████████████████████▍  | 2959/3171 [03:49<00:48,  4.41it/s]

Deleting BAM_1min_parquet — only 1054 files found.


 93%|████████████████████████████████████▍  | 2961/3171 [03:50<00:49,  4.26it/s]

Deleting EMBC_1min_parquet — only 677 files found.


 94%|████████████████████████████████████▍  | 2965/3171 [03:50<00:41,  5.00it/s]

Deleting VTYX_1min_parquet — only 789 files found.


 94%|████████████████████████████████████▍  | 2967/3171 [03:50<00:40,  5.09it/s]

Deleting RCAT_1min_parquet — only 1049 files found.


 94%|████████████████████████████████████▌  | 2971/3171 [03:51<00:34,  5.74it/s]

Deleting CRCT_1min_parquet — only 935 files found.


 94%|████████████████████████████████████▌  | 2973/3171 [03:51<00:35,  5.53it/s]

Deleting VTRS_1min_parquet — only 1025 files found.


 94%|████████████████████████████████████▌  | 2975/3171 [03:52<00:37,  5.19it/s]

Deleting CRBU_1min_parquet — only 852 files found.


 94%|████████████████████████████████████▋  | 2978/3171 [03:52<00:34,  5.65it/s]

Deleting CRBG_1min_parquet — only 563 files found.


 94%|████████████████████████████████████▋  | 2983/3171 [03:52<00:23,  7.88it/s]

Deleting SKYT_1min_parquet — only 917 files found.


 94%|████████████████████████████████████▊  | 2995/3171 [03:53<00:13, 12.88it/s]

Deleting TKO_1min_parquet — only 315 files found.


 95%|████████████████████████████████████▉  | 3006/3171 [03:53<00:07, 22.69it/s]

Deleting JOBY_1min_parquet — only 839 files found.


 95%|█████████████████████████████████████  | 3011/3171 [03:54<00:08, 18.44it/s]

Deleting HOOD_1min_parquet — only 848 files found.


 95%|█████████████████████████████████████  | 3015/3171 [03:54<00:10, 15.44it/s]

Deleting MBC_1min_parquet — only 503 files found.


 96%|█████████████████████████████████████▎ | 3029/3171 [03:54<00:06, 22.99it/s]

Deleting SILA_1min_parquet — only 125 files found.
Deleting VSTA_1min_parquet — only 1049 files found.


 96%|█████████████████████████████████████▎ | 3033/3171 [03:55<00:08, 16.36it/s]

Deleting SKIN_1min_parquet — only 1016 files found.
Deleting ELDN_1min_parquet — only 1054 files found.


 96%|█████████████████████████████████████▎ | 3036/3171 [03:56<00:14,  9.21it/s]

Deleting PAY_1min_parquet — only 892 files found.


 96%|█████████████████████████████████████▍ | 3041/3171 [03:56<00:13,  9.68it/s]

Deleting NEUE_1min_parquet — only 872 files found.


 96%|█████████████████████████████████████▌ | 3057/3171 [03:57<00:06, 18.14it/s]

Deleting PAX_1min_parquet — only 978 files found.
Deleting VSCO_1min_parquet — only 854 files found.


 97%|█████████████████████████████████████▋ | 3062/3171 [03:58<00:09, 11.25it/s]

Deleting HNST_1min_parquet — only 907 files found.


 97%|█████████████████████████████████████▋ | 3067/3171 [03:58<00:09, 11.26it/s]

Deleting NESR_1min_parquet — only 1037 files found.


 97%|█████████████████████████████████████▉ | 3084/3171 [03:59<00:04, 20.07it/s]

Deleting EH_1min_parquet — only 1052 files found.


 97%|██████████████████████████████████████ | 3090/3171 [03:59<00:04, 17.81it/s]

Deleting MASS_1min_parquet — only 1000 files found.


 98%|██████████████████████████████████████▎| 3113/3171 [04:00<00:02, 26.96it/s]

Deleting PACS_1min_parquet — only 169 files found.
Deleting HIPO_1min_parquet — only 985 files found.
Deleting EE_1min_parquet — only 669 files found.


 98%|██████████████████████████████████████▎| 3118/3171 [04:01<00:03, 17.58it/s]

Deleting ACAD_1min_parquet — only 1054 files found.


 99%|██████████████████████████████████████▌| 3139/3171 [04:01<00:01, 26.65it/s]

Deleting LZ_1min_parquet — only 868 files found.


100%|███████████████████████████████████████| 3171/3171 [04:02<00:00, 13.08it/s]


In [254]:
# Confirming that the number of tickers removed matches what we had previously
print("Supposed to remove", len(missing_days_data), "tickers")
print("Actually removed", len(tickers_removed), "tickers")

Supposed to remove 572 tickers
Actually removed 572 tickers


In [257]:
# Confirming length of folder after removing files
len(os.listdir(parquet_root))

2599

## 6. Cleaning the data
We'll now begin our 3-step data cleaning process, which ultimately produces cleaned 1-day parquet files that each span 4:00am to 7:59pm ET.

### 6.1 Forward-filling all missing data
This is the simplest step: we forward-fill all missing candlesticks so that we have a continuous stream of data. Each filled candlestick has all four OHLC prices set to the Close price of the most recent existing candlestick, and its volume set to 0.

1. Start by forward-filling all missing data so that we have a continuous stream of data. All OHLC prices will be equal to the Close price of the most recent candlestick, and their volumes will be set to 0.
2. Backfill any data to 4:00am (if data doesn't exist) based on the earliest PM 1-min candlestick. All OHLC prices will be equal to the first Open price in the PM, and their volumes will be set to 0.
3. Forward-fill any data to 7:59pm (if data doesn't exist) based on the last RTH (regular trading hours) or EHT (extended-hours trading) candlestick. All OHLC prices will be equal to the Close price of the last candlestick, and their volumes will be set to 0.
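The three fill rules above can be sketched on a single day of data as follows. This is only a minimal illustration, not the notebook's actual cleaning code. The function name `fill_day` and the column names `open`, `high`, `low`, `close`, and `volume` are assumptions, as is the requirement that the frame has a sorted `DatetimeIndex` with at least one candlestick between 4:00am and 7:59pm.

```python
import pandas as pd

PRICE_COLS = ["open", "high", "low", "close"]  # assumed column names


def fill_day(day_df: pd.DataFrame) -> pd.DataFrame:
    """Return one day of 1-min candles covering 04:00 through 19:59 with no gaps.

    Hypothetical helper: assumes a sorted DatetimeIndex for a single day and
    at least one candlestick inside the 04:00-19:59 window.
    """
    day = day_df.index[0].normalize()
    full_index = pd.date_range(
        day + pd.Timedelta(hours=4),
        day + pd.Timedelta(hours=19, minutes=59),
        freq="1min",
    )

    # Reindexing inserts NaN rows for every missing minute in the window.
    out = day_df.reindex(full_index)
    missing = out["close"].isna()

    # Steps 1 and 3: gaps after a real candle take the most recent Close.
    fill_price = out["close"].ffill()

    # Step 2: gaps before the first real candle take that candle's Open.
    first_open = out["open"].dropna().iloc[0]
    fill_price = fill_price.fillna(first_open)

    # Write the fill price into all four OHLC columns of the missing rows only.
    for col in PRICE_COLS:
        out[col] = out[col].where(~missing, fill_price)

    # Filled candles carry no traded volume.
    out["volume"] = out["volume"].fillna(0)
    return out


# Tiny example: two real candles at 08:00 and 08:02, everything else missing.
example = pd.DataFrame(
    {
        "open": [10.0, 10.6],
        "high": [11.0, 10.8],
        "low": [9.0, 10.4],
        "close": [10.5, 10.7],
        "volume": [100, 50],
    },
    index=pd.to_datetime(["2021-01-04 08:00", "2021-01-04 08:02"]),
)
filled = fill_day(example)
```

Handling all three rules in one reindex keeps the real candlesticks untouched and makes every output day the same length (960 rows), which should simplify stacking days into model inputs later.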