# 1. Data Wrangling <a id="data_wrangling"></a>

<a id="contents"></a>
# Table of Contents  
1. [Data Wrangling](#data_wrangling)
    - [1.1 Introduction](#introduction)
    - [1.2 Imports](#imports)
    - [1.3 Load and Concatenate Individual Stock Datasets](#load)
    - [1.4 Dataset Cleaning](#cleaning)

## 1.1 Introduction<a id="introduction"></a>

### Problem:

The goal of this data science project is to develop a machine learning model that predicts stock prices for a selected set of publicly traded companies. Using historical stock market data together with relevant features such as financial indicators, market sentiment, and news headlines, the model aims to forecast future stock prices as accurately as possible. Stock price prediction matters to investors, traders, and financial institutions deciding whether to buy, sell, or hold a stock. The resulting model is intended to provide insights and predictions that may support better investment strategies and risk management in a dynamic and volatile market.

### Clients:

The findings of this study will be of interest to a broad range of stakeholders, specifically investors, traders, and financial institutions who use stock predictor models to make informed decisions about buying, selling, or holding beauty and wellness stocks.  

### Data:

The dataset for this project was downloaded from Kaggle and has been filtered and cleaned to include only the data for 15 beauty and wellness stocks. The primary goal is to develop a predictive model that analyzes large amounts of beauty and wellness stock data and reacts to market changes much faster than humans, potentially improving trading performance. However, predicting stock prices is inherently challenging because financial markets are complex and partly random. While stock predictor models can provide valuable insights, they cannot guarantee accurate forecasts of future prices. Investors should also consider market conditions, economic indicators, and company fundamentals when making investment decisions.

Kaggle Dataset link: https://www.kaggle.com/datasets/footballjoe789/us-stock-dataset/data?select=Data

## 1.2 Imports<a id="imports"></a>

In [1]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
import os
import csv
from tqdm.notebook import tqdm
from datetime import datetime, timezone

## 1.3 Load and Concatenate Individual Stock Datasets<a id="load"></a>

To begin, we identified the 15 beauty and wellness stocks listed below for analysis and imported their CSV files from the Kaggle dataset. The files differ in number of rows but share the same columns.

| # | Stock Name (Ticker) |
|---|---|
| 1 | Cutera, Inc. (CUTR) |
| 2 | The Beauty Health Company (SKIN) |
| 3 | Helen of Troy Limited (HELE) |
| 4 | Coty Inc. (COTY) |
| 5 | Inter Parfums, Inc. (IPAR) |
| 6 | Hims & Hers Health, Inc. (HIMS) |
| 7 | Spectrum Brands Holdings, Inc. (SPB) |
| 8 | Unilever PLC (UL) |
| 9 | e.l.f. Beauty, Inc. (ELF) |
| 10 | International Flavors & Fragrances Inc. (IFF) |
| 11 | The Estée Lauder Companies Inc. (EL) |
| 12 | Ulta Beauty, Inc. (ULTA) |
| 13 | Colgate-Palmolive Company (CL) |
| 14 | Kenvue Inc. (KVUE) |
| 15 | The Procter & Gamble Company (PG) |

In [2]:
# Set the directory where your CSV files are located
folder_path = '/Users/heatheradler/Documents/GitHub/Springboard/Springboard_Projects/Capstone/archive/Data/StockHistory'

# Ticker symbols of the 15 beauty and wellness stocks to keep
stock_list = ['CUTR', 'SKIN', 'HELE', 'COTY', 'IPAR', 'HIMS', 'SPB', 'UL', 'ELF', 'IFF', 'EL', 'ULTA', 'CL', 'KVUE', 'PG']

# Initialize an empty list to store dataframes
dfs = []

# Iterate through each file in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith('.csv'):
        # Extract the stock symbol from the file name (e.g. 'PG.csv' -> 'PG')
        stock_symbol = os.path.splitext(file_name)[0]

        # Keep only the 15 selected stocks
        if stock_symbol in stock_list:
            # Load the CSV file into a dataframe
            file_path = os.path.join(folder_path, file_name)
            df = pd.read_csv(file_path)

            # Add a new column 'stock_symbol' with the stock symbol
            df['stock_symbol'] = stock_symbol

            # Append the dataframe to the list
            dfs.append(df)

# Concatenate all dataframes together
df = pd.concat(dfs, ignore_index=True)

# Save the concatenated dataframe
df.to_csv('/Users/heatheradler/Documents/GitHub/Springboard/Springboard_Projects/Capstone/archive/Concated_Dataframe.csv')
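The same loading step can also be wrapped in a reusable function that reports which requested tickers had no CSV file, which is a cheap way to confirm that all 15 stocks were actually found. This is a minimal sketch: `load_stock_csvs` is a hypothetical helper name, and it assumes, as the loop above does, that each file is named `<SYMBOL>.csv`. Sorting the symbols also makes the row order independent of the order in which `os.listdir` happens to return files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_stock_csvs(folder, symbols):
    """Read <SYMBOL>.csv for each requested symbol and stack them into one frame.

    Returns the combined dataframe and a list of symbols that had no CSV file.
    """
    folder = Path(folder)
    frames, missing = [], []
    for symbol in sorted(symbols):          # deterministic order
        path = folder / f'{symbol}.csv'
        if not path.is_file():
            missing.append(symbol)          # flag absent or differently named files
            continue
        frame = pd.read_csv(path)
        frame['stock_symbol'] = symbol      # tag each row with its ticker
        frames.append(frame)
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, missing


# Usage with a throwaway folder holding two tiny, made-up files
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({'Close': [1.0, 2.0]}).to_csv(Path(tmp) / 'PG.csv', index=False)
    pd.DataFrame({'Close': [3.0]}).to_csv(Path(tmp) / 'CL.csv', index=False)
    combined, missing = load_stock_csvs(tmp, ['PG', 'CL', 'KVUE'])

print(missing)
print(combined)
```

In the notebook this would be called as `load_stock_csvs(folder_path, stock_list)`; a non-empty `missing` list would flag a ticker whose file is absent or named differently.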


In [3]:
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,STOCHk_14_3_3,STOCHd_14_3_3,...,FWMA_10,WILLR_14,ISA_9,ISB_26,ITS_9,IKS_26,ICS_26,OBV,AD,stock_symbol
0,1988-02-04 00:00:00-05:00,0.0,0.442755,0.398479,0.398479,39083,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-39083.0,IPAR
1,1988-02-05 00:00:00-05:00,0.0,0.442755,0.398479,0.398479,606488,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-645571.0,IPAR
2,1988-02-08 00:00:00-05:00,0.0,0.442755,0.398479,0.398479,19440,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-665011.0,IPAR
3,1988-02-09 00:00:00-05:00,0.0,0.442755,0.398479,0.398479,23288,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-688299.0,IPAR
4,1988-02-10 00:00:00-05:00,0.0,0.442755,0.398479,0.398479,10530,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-698829.0,IPAR


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107701 entries, 0 to 107700
Data columns (total 42 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   Date             107701 non-null  object 
 1   Open             107701 non-null  float64
 2   High             107701 non-null  float64
 3   Low              107701 non-null  float64
 4   Close            107701 non-null  float64
 5   Volume           107701 non-null  int64  
 6   Dividends        107701 non-null  float64
 7   Stock Splits     107701 non-null  float64
 8   STOCHk_14_3_3    107476 non-null  float64
 9   STOCHd_14_3_3    107446 non-null  float64
 10  RSI_14           107491 non-null  float64
 11  CMO_14           107491 non-null  float64
 12  CCI_14_0.015     107301 non-null  float64
 13  MACD_12_26_9     107326 non-null  float64
 14  MACDh_12_26_9    107206 non-null  float64
 15  MACDs_12_26_9    107206 non-null  float64
 16  PPO_12_26_9      107326 non-null  floa

## 1.4 Dataset Cleaning<a id="cleaning"></a>

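The columns are consistent across all 15 files, so the dataset needs little cleaning. The main fix is the Date column: its strings include a UTC offset (for example `1988-02-04 00:00:00-05:00`), so they are parsed as UTC timestamps and reduced to calendar dates. Note that `.dt.date` returns Python `date` objects, so the column's dtype is still `object` afterwards, as the `df.info()` output below shows.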

In [5]:
df['Date'] = pd.to_datetime(df['Date'])

In [6]:
df['Date'] = pd.to_datetime(df['Date'],utc=True).dt.date

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107701 entries, 0 to 107700
Data columns (total 42 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   Date             107701 non-null  object 
 1   Open             107701 non-null  float64
 2   High             107701 non-null  float64
 3   Low              107701 non-null  float64
 4   Close            107701 non-null  float64
 5   Volume           107701 non-null  int64  
 6   Dividends        107701 non-null  float64
 7   Stock Splits     107701 non-null  float64
 8   STOCHk_14_3_3    107476 non-null  float64
 9   STOCHd_14_3_3    107446 non-null  float64
 10  RSI_14           107491 non-null  float64
 11  CMO_14           107491 non-null  float64
 12  CCI_14_0.015     107301 non-null  float64
 13  MACD_12_26_9     107326 non-null  float64
 14  MACDh_12_26_9    107206 non-null  float64
 15  MACDs_12_26_9    107206 non-null  float64
 16  PPO_12_26_9      107326 non-null  floa
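The cells above parse the Date strings as UTC and then call `.dt.date`, which is why `df.info()` still reports `object`. A minimal sketch of both behaviors on two sample strings in the same format (the second, with a `-04:00` daylight-saving offset, is illustrative rather than copied from the data) shows an alternative that keeps a proper `datetime64` column by dropping the timezone and normalizing to midnight.

```python
import pandas as pd

# Two timestamps in the same format as the raw Date column (values are illustrative)
raw = pd.Series(['1988-02-04 00:00:00-05:00', '2024-03-28 00:00:00-04:00'])

# Parse with utc=True so differing offsets end up in a single timezone
parsed = pd.to_datetime(raw, utc=True)

# What the notebook does: Python date objects, so the dtype stays `object`
as_date_objects = parsed.dt.date

# Alternative: drop the timezone and keep a datetime64 column at midnight
as_datetime64 = parsed.dt.tz_localize(None).dt.normalize()

print(as_date_objects.dtype)
print(as_datetime64.dtype)
```

Converting to UTC keeps the calendar date here only because the timestamps are local midnight with negative (US) offsets. With positive offsets the UTC date would fall on the previous day, and converting to the exchange's own timezone before taking the date would be safer.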

In [8]:
df['Date']

0         1988-02-04
1         1988-02-05
2         1988-02-08
3         1988-02-09
4         1988-02-10
             ...    
107696    2024-03-22
107697    2024-03-25
107698    2024-03-26
107699    2024-03-27
107700    2024-03-28
Name: Date, Length: 107701, dtype: object

In [9]:
df

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,STOCHk_14_3_3,STOCHd_14_3_3,...,FWMA_10,WILLR_14,ISA_9,ISB_26,ITS_9,IKS_26,ICS_26,OBV,AD,stock_symbol
0,1988-02-04,0.000000,0.442755,0.398479,0.398479,39083,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-3.908300e+04,IPAR
1,1988-02-05,0.000000,0.442755,0.398479,0.398479,606488,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-6.455710e+05,IPAR
2,1988-02-08,0.000000,0.442755,0.398479,0.398479,19440,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-6.650110e+05,IPAR
3,1988-02-09,0.000000,0.442755,0.398479,0.398479,23288,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-6.882990e+05,IPAR
4,1988-02-10,0.000000,0.442755,0.398479,0.398479,10530,0.0,0.0,,,...,,,,,,,0.309928,39083.0,-6.988290e+05,IPAR
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
107696,2024-03-22,83.510002,83.959999,82.459999,83.080002,2382600,0.0,0.0,88.959958,89.329128,...,83.075496,-13.988746,80.240704,78.400043,82.377481,78.476168,,18785300.0,2.142535e+08,IFF
107697,2024-03-25,81.500000,82.959999,81.459999,82.010002,2060200,0.0,0.0,82.242403,86.597159,...,82.670228,-27.930518,80.240704,78.508441,82.382458,78.476168,,16725100.0,2.137042e+08,IFF
107698,2024-03-26,82.279999,82.279999,80.919998,81.440002,1956100,0.0,0.0,72.871266,81.357875,...,82.205385,-39.466939,80.240704,78.508441,82.382458,78.476168,,14769000.0,2.132439e+08,IFF
107699,2024-03-27,82.260002,85.680000,81.650002,85.639999,3319200,0.0,0.0,77.356739,77.490136,...,83.524614,-0.532327,80.240704,78.508441,83.042458,79.136168,,18088200.0,2.164972e+08,IFF


In [10]:
df['stock_symbol'].value_counts()

stock_symbol
PG      15667
CL      12839
IFF     12427
HELE    12034
SPB     11407
UL      11096
IPAR     9108
EL       7138
CUTR     5033
ULTA     4134
COTY     2717
ELF      1891
HIMS     1143
SKIN      840
KVUE      227
Name: count, dtype: int64

In [11]:
df.isnull().sum()

Date                   0
Open                   0
High                   0
Low                    0
Close                  0
Volume                 0
Dividends              0
Stock Splits           0
STOCHk_14_3_3        225
STOCHd_14_3_3        255
RSI_14               210
CMO_14               210
CCI_14_0.015         400
MACD_12_26_9         375
MACDh_12_26_9        495
MACDs_12_26_9        495
PPO_12_26_9          375
PPOh_12_26_9         375
PPOs_12_26_9         375
EMA_10               135
PSARl_0.02_0.2     50853
PSARs_0.02_0.2     56863
PSARaf_0.02_0.2        0
PSARr_0.02_0.2         0
ADX_14               405
DMP_14               210
DMN_14               210
BBL_5_2.0             60
BBM_5_2.0             60
BBU_5_2.0             60
BBB_5_2.0             60
BBP_5_2.0             60
FWMA_10              135
WILLR_14             389
ISA_9                765
ISB_26              1155
ITS_9                120
IKS_26               375
ICS_26               390
OBV                    0


In [12]:
rows_with_null = df[df['PSARs_0.02_0.2'].isnull()]
print(rows_with_null)

              Date       Open       High        Low      Close    Volume  \
0       1988-02-04   0.000000   0.442755   0.398479   0.398479     39083   
2       1988-02-08   0.000000   0.442755   0.398479   0.398479     19440   
4       1988-02-10   0.000000   0.442755   0.398479   0.398479     10530   
6       1988-02-12   0.000000   0.442755   0.398479   0.398479       405   
9       1988-02-18   0.000000   0.442755   0.354204   0.354204     21263   
...            ...        ...        ...        ...        ...       ...   
107695  2024-03-21  84.089996  84.360001  82.830002  83.320000   1559600   
107696  2024-03-22  83.510002  83.959999  82.459999  83.080002   2382600   
107697  2024-03-25  81.500000  82.959999  81.459999  82.010002   2060200   
107699  2024-03-27  82.260002  85.680000  81.650002  85.639999   3319200   
107700  2024-03-28  86.000000  86.410004  84.959999  85.989998  10232900   

        Dividends  Stock Splits  STOCHk_14_3_3  STOCHd_14_3_3  ...    FWMA_10  \
0     

### 1.4.1 Determining Columns to Keep <a id="columns_to_keep"></a>

Based on researching the meaning and significance of the indicator (formula-derived) columns, we decided to keep those listed below because they provide meaningful insight into our dataset. All of the raw, non-formula columns are kept as well.

**EMA_10**: Exponential Moving Average. Unlike a simple moving average (SMA), which gives equal weight to every data point in the window, the EMA gives more weight to recent data points, with weights that decrease exponentially for older observations. The '10' is the period (span): the smoothing factor is α = 2/(10 + 1) ≈ 0.18, so each new EMA value is about 18% of the latest close plus 82% of the previous EMA. The most recent 10 closes therefore dominate the average, although older closes still contribute with ever smaller weights.
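A minimal sketch of the recursion behind the EMA, checked against pandas' built-in `ewm(span=10, adjust=False)`, which uses the same smoothing factor α = 2/(span + 1). The helper name `ema_recursive` and the five closing prices are made up for illustration. One assumption to flag: this sketch seeds the average with the first close, while the 135 nulls in `EMA_10` (15 stocks × 9 rows) suggest the source left the first 9 rows of each stock undefined, so the earliest values may differ slightly from the dataset's.

```python
import pandas as pd


def ema_recursive(close: pd.Series, span: int = 10) -> pd.Series:
    """EMA via ema_t = alpha * close_t + (1 - alpha) * ema_{t-1}, seeded with the first close."""
    alpha = 2 / (span + 1)
    values, prev = [], None
    for price in close:
        prev = price if prev is None else alpha * price + (1 - alpha) * prev
        values.append(prev)
    return pd.Series(values, index=close.index)


close = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0])   # made-up closing prices
manual = ema_recursive(close, span=10)
builtin = close.ewm(span=10, adjust=False).mean()   # pandas' equivalent recursion
print(pd.DataFrame({'manual': manual, 'pandas': builtin}))
```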

**PSARl_0.02_0.2**: Parabolic Stop and Reverse (PSAR) for long positions, i.e. the PSAR value while the price is in an uptrend. The '0.02' is the acceleration factor (step) used in the PSAR calculation; it determines how quickly the PSAR moves toward the price. The '0.2' is the maximum acceleration factor: once the acceleration factor reaches 0.2, it stops increasing and stays at that value.

**PSARs_0.02_0.2**: Parabolic Stop and Reverse for short positions. It is calculated like PSAR Long but applies to downtrends. The parameters (0.02, 0.2) are the acceleration factor and the maximum acceleration factor: the acceleration factor (AF) starts at 0.02 and increases by 0.02 each time a new extreme point (EP) is reached, and the maximum caps the AF at 0.2.
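To make the roles of the acceleration factor and extreme point concrete, here is a minimal sketch of the SAR update during a sustained uptrend (the PSARl case). It deliberately omits the reversal logic (switching to the short side when the low crosses the SAR), so it is not a full PSAR implementation; the function name `psar_uptrend` and the price series are hypothetical.

```python
def psar_uptrend(highs, lows, af_start=0.02, af_step=0.02, af_max=0.2):
    """Parabolic SAR during a sustained uptrend (reversals are NOT handled).

    Returns the SAR for each bar and the acceleration factor after each bar.
    """
    sar = lows[0]        # assumption: the uptrend starts at the first bar's low
    ep = highs[0]        # extreme point = highest high of the trend so far
    af = af_start
    sars, afs = [sar], [af]
    for i in range(1, len(highs)):
        # Move the SAR a fraction AF of the way toward the extreme point
        sar = sar + af * (ep - sar)
        # Long-side rule: the SAR may not sit above either of the two prior lows
        sar = min([sar] + list(lows[max(0, i - 2):i]))
        # A new high becomes the extreme point and speeds the SAR up (capped at af_max)
        if highs[i] > ep:
            ep = highs[i]
            af = min(af + af_step, af_max)
        sars.append(sar)
        afs.append(af)
    return sars, afs


# Toy, steadily rising prices: every bar makes a new high
highs = [10.0 + i for i in range(30)]
lows = [9.0 + i for i in range(30)]
sars, afs = psar_uptrend(highs, lows)
print([round(s, 2) for s in sars[:6]])
print([round(a, 2) for a in afs[:12]])
```

On this toy series the AF climbs by 0.02 with every new high until it is capped at 0.2, while the SAR trails below the lows, held down by the two-prior-lows rule in the first few bars.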

**BBL_5_2.0, BBM_5_2.0, BBU_5_2.0**: Bollinger Bands with parameters (5, 2.0). They consist of a middle band (BBM), upper band (BBU), and lower band (BBL). They are used to identify potential overbought or oversold conditions and measure volatility.
> <br>- The parameters (5, 2.0) typically represent:
>> <br>- The period (5): This is the number of periods used in calculating the moving average. In this case, it's 5 periods.
>> <br>- The standard deviation multiplier (2.0): This is the number of standard deviations used to calculate the width of the bands. In this case, it's 2.0, meaning the bands will be placed 2 standard deviations above and below the moving average.

> <br>- **BBL_5_2.0** specifically refers to the lower Bollinger Band with a period of 5 and a standard deviation multiplier of 2.0. It helps traders identify potential support levels based on volatility around the moving average.
> <br>- **BBM_5_2.0** denotes the Middle Bollinger Band with a period of 5 and a standard deviation multiplier of 2.0. This band serves as a reference level for the price movement and can help traders identify potential trends.
> <br>- **BBU_5_2.0** denotes the Upper Bollinger Band with a period of 5 and a standard deviation multiplier of 2.0. The Upper Band serves as an upper boundary or resistance level, helping traders identify potential overbought conditions or areas where prices may revert to the mean.
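A minimal sketch of how the three bands can be computed with pandas rolling windows. It assumes a simple moving average for the middle band and a population standard deviation (`ddof=0`); the library that produced the Kaggle columns may use a different convention, so small differences from the dataset's values are possible. `bollinger_bands` is a hypothetical helper name.

```python
import pandas as pd


def bollinger_bands(close: pd.Series, length: int = 5, num_std: float = 2.0) -> pd.DataFrame:
    """Lower, middle, and upper Bollinger Bands over a trailing window."""
    mid = close.rolling(length).mean()          # middle band: simple moving average
    std = close.rolling(length).std(ddof=0)     # assumption: population std
    return pd.DataFrame({
        f'BBL_{length}_{num_std}': mid - num_std * std,
        f'BBM_{length}_{num_std}': mid,
        f'BBU_{length}_{num_std}': mid + num_std * std,
    })


close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up closing prices
bands = bollinger_bands(close)
print(bands)
```

With a 5-period window the first 4 rows of each stock are undefined, which matches the 60 nulls (15 stocks × 4) reported for each `BB*_5_2.0` column above.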

**ISA_9, ISB_26, ITS_9, IKS_26, ICS_26**: Ichimoku Cloud components (Leading Span A, Leading Span B, Conversion Line, Base Line, and Lagging Span, respectively). They are used in the Ichimoku Cloud indicator to identify support, resistance, and trend direction.
> <br> The Ichimoku Cloud indicator is a comprehensive trend-following indicator used in technical analysis. It consists of several components, including the Senkou Span A and Senkou Span B lines, which together form the "cloud" or "kumo." These lines are calculated based on specific periods and plotted ahead of the current price to indicate potential future support and resistance levels.
> <br>- **ISA_9**- The Senkou Span A line, also known as "Leading Span A," is the average of the Tenkan-sen (Conversion Line) and Kijun-sen (Base Line), plotted 26 periods ahead in the standard setup. The '9' in "ISA_9" is the Conversion Line period that feeds this average, not a separate window of its own. It is used to assess potential future support and resistance levels and to identify trend direction in the market.
> <br>- **ISB_26**- The Senkou Span B line, also known as "Leading Span B," is the average of the highest high and lowest low over the past 52 periods in the standard setup, plotted 26 periods ahead; the '26' in "ISB_26" refers to the Base Line period, which sets that forward shift. It is used alongside Senkou Span A to provide additional insight into potential future support and resistance levels and to assess the overall trend direction in the market.
> <br>- **ITS_9**- The Tenkan-sen, or Conversion Line, is one of the components of the Ichimoku Cloud indicator. It's calculated as the average of the highest high and lowest low over the past 9 periods. The result is then plotted on the chart. Therefore, "ITS_9" represents the Tenkan-sen with a period of 9 in the Ichimoku Cloud indicator. It's used to assess short-term momentum and potential trend reversals in the market.
> <br>- **IKS_26**- The Kijun-sen, or Base Line, is another key component of the Ichimoku Cloud indicator. It's calculated as the average of the highest high and lowest low over the past 26 periods. The result is then plotted on the chart. Therefore, "IKS_26" represents the Kijun-sen with a period of 26 in the Ichimoku Cloud indicator. It's used to assess medium-term momentum and potential trend reversals in the market.
> <br>- **ICS_26**- The Chikou Span, or Lagging Span, is the closing price plotted 26 periods behind the current bar, and it is used to assess momentum and confirm trend direction. When the Chikou Span is above the price plot, it suggests bullish momentum, indicating potential upward movement in prices. Conversely, when it is below the price plot, it suggests bearish momentum, indicating potential downward movement. In this dataset the value on each row appears to be the close from 26 trading days later (its 390 nulls, 26 per stock, sit at the end of each stock's history), so the column contains future prices and would leak information if used unshifted as a predictive feature.
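A minimal sketch of the five components under the standard Ichimoku definitions (9-period Conversion Line, 26-period Base Line, 52-period window for Span B, 26-period shifts), applied to a single stock's rows in date order. The helper name and toy prices are hypothetical. The per-stock null counts it produces (ITS 8, IKS 25, ISA 51, ISB 77, ICS 26), multiplied by 15 stocks, give exactly the counts reported by `isnull()` above (120, 375, 765, 1,155, 390), which suggests the dataset follows these conventions, including storing the Lagging Span as the close from 26 rows later.

```python
import numpy as np
import pandas as pd


def ichimoku(high: pd.Series, low: pd.Series, close: pd.Series,
             tenkan: int = 9, kijun: int = 26, senkou: int = 52) -> pd.DataFrame:
    """Standard Ichimoku components for one stock (rows must be in date order)."""

    def midpoint(window: int) -> pd.Series:
        # Average of the highest high and lowest low over the trailing window
        return (high.rolling(window).max() + low.rolling(window).min()) / 2

    its = midpoint(tenkan)                  # Conversion Line (Tenkan-sen)
    iks = midpoint(kijun)                   # Base Line (Kijun-sen)
    isa = ((its + iks) / 2).shift(kijun)    # Leading Span A, shifted kijun rows ahead
    isb = midpoint(senkou).shift(kijun)     # Leading Span B, shifted kijun rows ahead
    ics = close.shift(-kijun)               # Lagging Span: close from kijun rows later
    return pd.DataFrame({
        f'ISA_{tenkan}': isa, f'ISB_{kijun}': isb,
        f'ITS_{tenkan}': its, f'IKS_{kijun}': iks, f'ICS_{kijun}': ics,
    })


# Toy single-stock history: 100 steadily rising bars
high = pd.Series(np.arange(100, dtype=float) + 1.0)
low = high - 1.0
close = high - 0.5
components = ichimoku(high, low, close)
print(components.isna().sum())
```

Because `ICS_26` is built with a negative shift, it is the one component here that looks into the future relative to its row.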


In [13]:
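# Columns to keep: identifiers, raw price/volume columns, and the selected indicators

formulas_to_keep = ['stock_symbol','Date', 'Open', 'High', 'Low', 'Close','Volume', 'Dividends', 'Stock Splits', 'EMA_10', 'PSARl_0.02_0.2', 'PSARs_0.02_0.2', 'BBL_5_2.0', 'BBM_5_2.0', 'BBU_5_2.0', 'ISA_9', 'ISB_26', 'ITS_9', 'IKS_26', 'ICS_26']
# Take an explicit copy so later assignments modify df_1 itself
# (this avoids pandas' SettingWithCopyWarning)
df_1 = df[formulas_to_keep].copy()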
df_1

Unnamed: 0,stock_symbol,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,EMA_10,PSARl_0.02_0.2,PSARs_0.02_0.2,BBL_5_2.0,BBM_5_2.0,BBU_5_2.0,ISA_9,ISB_26,ITS_9,IKS_26,ICS_26
0,IPAR,1988-02-04,0.000000,0.442755,0.398479,0.398479,39083,0.0,0.0,,,,,,,,,,,0.309928
1,IPAR,1988-02-05,0.000000,0.442755,0.398479,0.398479,606488,0.0,0.0,,,0.442755,,,,,,,,0.309928
2,IPAR,1988-02-08,0.000000,0.442755,0.398479,0.398479,19440,0.0,0.0,,0.398479,,,,,,,,,0.309928
3,IPAR,1988-02-09,0.000000,0.442755,0.398479,0.398479,23288,0.0,0.0,,,0.442755,,,,,,,,0.309928
4,IPAR,1988-02-10,0.000000,0.442755,0.398479,0.398479,10530,0.0,0.0,,0.398479,,0.398479,0.398479,0.398479,,,,,0.309928
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
107696,IFF,2024-03-22,83.510002,83.959999,82.459999,83.080002,2382600,0.0,0.0,82.018345,80.431883,,82.581519,83.196935,83.812350,80.240704,78.400043,82.377481,78.476168,
107697,IFF,2024-03-25,81.500000,82.959999,81.459999,82.010002,2060200,0.0,0.0,82.016828,81.138944,,81.841910,82.916830,83.991751,80.240704,78.508441,82.382458,78.476168,
107698,IFF,2024-03-26,82.279999,82.279999,80.919998,81.440002,1956100,0.0,0.0,81.911951,,84.360001,81.058697,82.674001,84.289304,80.240704,78.508441,82.382458,78.476168,
107699,IFF,2024-03-27,82.260002,85.680000,81.650002,85.639999,3319200,0.0,0.0,82.589778,80.919998,,80.207456,83.098001,85.988546,80.240704,78.508441,83.042458,79.136168,


In [14]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107701 entries, 0 to 107700
Data columns (total 20 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   stock_symbol    107701 non-null  object 
 1   Date            107701 non-null  object 
 2   Open            107701 non-null  float64
 3   High            107701 non-null  float64
 4   Low             107701 non-null  float64
 5   Close           107701 non-null  float64
 6   Volume          107701 non-null  int64  
 7   Dividends       107701 non-null  float64
 8   Stock Splits    107701 non-null  float64
 9   EMA_10          107566 non-null  float64
 10  PSARl_0.02_0.2  56848 non-null   float64
 11  PSARs_0.02_0.2  50838 non-null   float64
 12  BBL_5_2.0       107641 non-null  float64
 13  BBM_5_2.0       107641 non-null  float64
 14  BBU_5_2.0       107641 non-null  float64
 15  ISA_9           106936 non-null  float64
 16  ISB_26          106546 non-null  float64
 17  ITS_9     

Next, we identify the null values in the kept columns to decide how best to remove or update them. Based on the null counts below, we fill the null values in each affected column with an exponentially weighted mean of the closing price, calculated separately for each stock symbol so that only that stock's own values are used.

In [15]:
df_1.isnull().sum()

stock_symbol          0
Date                  0
Open                  0
High                  0
Low                   0
Close                 0
Volume                0
Dividends             0
Stock Splits          0
EMA_10              135
PSARl_0.02_0.2    50853
PSARs_0.02_0.2    56863
BBL_5_2.0            60
BBM_5_2.0            60
BBU_5_2.0            60
ISA_9               765
ISB_26             1155
ITS_9               120
IKS_26              375
ICS_26              390
dtype: int64
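The two PSAR columns account for most of the nulls, and their counts hint that these nulls are structural rather than missing data: 50,853 + 56,863 = 107,716, which is exactly the 107,701 rows plus 15, one per stock. That is what you would expect if every row carries a value in exactly one of the long/short columns, except the first row of each stock, where both are empty (as the first IPAR rows above show). A minimal check, using the hypothetical helper `psar_coverage` on those first five IPAR rows:

```python
import numpy as np
import pandas as pd


def psar_coverage(df, long_col='PSARl_0.02_0.2', short_col='PSARs_0.02_0.2'):
    """Count rows where exactly one, neither, or both PSAR columns are populated."""
    long_present = df[long_col].notna()
    short_present = df[short_col].notna()
    return {
        'exactly_one': int((long_present ^ short_present).sum()),
        'both_null': int((~long_present & ~short_present).sum()),
        'both_present': int((long_present & short_present).sum()),
    }


# First five IPAR rows, copied from the df_1 preview
toy = pd.DataFrame({
    'PSARl_0.02_0.2': [np.nan, np.nan, 0.398479, np.nan, 0.398479],
    'PSARs_0.02_0.2': [np.nan, 0.442755, np.nan, 0.442755, np.nan],
})
coverage = psar_coverage(toy)
print(coverage)
```

Running `psar_coverage(df_1)` before imputation would confirm whether the pattern holds for the full dataset; if it does, a single PSAR column plus a long/short flag may represent the indicator more faithfully than filling each side separately.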

### Updating Null Values
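The code below fills only the null cells of each indicator column, leaving existing values unchanged. For each column, it selects the rows where that column is null and, within each stock symbol, computes a 10-period exponentially weighted mean (`ewm(span=10, adjust=False)`) of the `Close` values on those rows. The same rule is applied to every column, including the PSAR, Bollinger Band, and Ichimoku columns, and because only the null rows enter the calculation, consecutive values in the fill are not necessarily consecutive trading days. After this step the data frame has no null values.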

In [17]:
# Indicator columns that contain null values
cols_with_nulls = ['EMA_10', 'PSARl_0.02_0.2', 'PSARs_0.02_0.2', 'BBL_5_2.0', 'BBM_5_2.0',
                   'BBU_5_2.0', 'ISA_9', 'ISB_26', 'ITS_9', 'IKS_26', 'ICS_26']

for col in cols_with_nulls:
    # Mask of the rows where this column is null
    mask = df_1[col].isnull()
    # For those rows only, compute a span-10 exponentially weighted mean of Close
    # within each stock symbol and write it into the null cells; non-null values
    # are left unchanged
    df_1.loc[mask, col] = (
        df_1[mask]
        .groupby('stock_symbol')['Close']
        .transform(lambda x: x.ewm(span=10, adjust=False).mean())
    )
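If a plain per-stock mean of each column itself is preferred over the exponentially weighted close used above, a minimal sketch with `groupby().transform('mean')` looks like this. `fill_with_group_mean` is a hypothetical helper; like the cell above, it fills only null cells and uses only the same stock's rows.

```python
import numpy as np
import pandas as pd


def fill_with_group_mean(df, columns, group_col='stock_symbol'):
    """Return a copy of df with nulls in `columns` replaced by that group's column mean."""
    out = df.copy()
    for col in columns:
        # Mean of the non-null values of this column within each stock
        group_mean = out.groupby(group_col)[col].transform('mean')
        # Only null cells are replaced; existing values stay as they are
        out[col] = out[col].fillna(group_mean)
    return out


toy = pd.DataFrame({
    'stock_symbol': ['A', 'A', 'A', 'B', 'B'],
    'EMA_10': [1.0, np.nan, 3.0, np.nan, 10.0],
})
filled = fill_with_group_mean(toy, ['EMA_10'])
print(filled)
```

Applied to `df_1`, it would take the list of indicator columns. Note that a whole-history mean draws on later dates as well as earlier ones.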

In [18]:
# Checking to confirm there are no longer any null values
df_1.isnull().sum()

stock_symbol      0
Date              0
Open              0
High              0
Low               0
Close             0
Volume            0
Dividends         0
Stock Splits      0
EMA_10            0
PSARl_0.02_0.2    0
PSARs_0.02_0.2    0
BBL_5_2.0         0
BBM_5_2.0         0
BBU_5_2.0         0
ISA_9             0
ISB_26            0
ITS_9             0
IKS_26            0
ICS_26            0
dtype: int64

### Date Type Update
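Re-running `info()` on the dataframe confirmed that the Date field was still an `object` column, because `.dt.date` stores Python `date` objects. This also explains why the filter below for 2024-01-03 returns no rows: Python `date` objects never compare equal to `datetime` objects. The code below converts the Date field to the `datetime64` type for easier analysis.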

In [21]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107701 entries, 0 to 107700
Data columns (total 20 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   stock_symbol    107701 non-null  object 
 1   Date            107701 non-null  object 
 2   Open            107701 non-null  float64
 3   High            107701 non-null  float64
 4   Low             107701 non-null  float64
 5   Close           107701 non-null  float64
 6   Volume          107701 non-null  int64  
 7   Dividends       107701 non-null  float64
 8   Stock Splits    107701 non-null  float64
 9   EMA_10          107701 non-null  float64
 10  PSARl_0.02_0.2  107701 non-null  float64
 11  PSARs_0.02_0.2  107701 non-null  float64
 12  BBL_5_2.0       107701 non-null  float64
 13  BBM_5_2.0       107701 non-null  float64
 14  BBU_5_2.0       107701 non-null  float64
 15  ISA_9           107701 non-null  float64
 16  ISB_26          107701 non-null  float64
 17  ITS_9     

In [22]:
import datetime as dt
dt.datetime.strptime('2024-01-03', "%Y-%m-%d")

datetime.datetime(2024, 1, 3, 0, 0)

In [23]:
date_to_check = dt.datetime.strptime('2024-01-03', "%Y-%m-%d")

df_1[df_1['Date'] == date_to_check]

Unnamed: 0,stock_symbol,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,EMA_10,PSARl_0.02_0.2,PSARs_0.02_0.2,BBL_5_2.0,BBM_5_2.0,BBU_5_2.0,ISA_9,ISB_26,ITS_9,IKS_26,ICS_26
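The empty result above comes from comparing an `object` column of Python `date` objects with a `datetime`: the two types never compare equal, even for the same day. A small sketch with made-up values shows the mismatch and two ways around it: converting the column to `datetime64` (as the next cell does) or comparing against `target.date()`.

```python
import datetime as dt

import pandas as pd

target = dt.datetime.strptime('2024-01-03', '%Y-%m-%d')

# Object column of Python date objects, like Date after `.dt.date`
dates = pd.Series([dt.date(2024, 1, 2), dt.date(2024, 1, 3)])
matches_before = int((dates == target).sum())          # date vs datetime: never equal

# Option 1: convert the column to datetime64, then compare with the datetime
converted = pd.to_datetime(dates)
matches_after = int((converted == target).sum())

# Option 2: keep the object column and compare with a date instead
matches_date = int((dates == target.date()).sum())

print(matches_before, matches_after, matches_date)
```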


In [24]:
# Assign on an explicit copy via column indexing; this avoids the
# SettingWithCopyWarning pandas raises when assigning to a slice of df
df_1 = df_1.copy()
df_1['Date'] = pd.to_datetime(df_1['Date'])


In [28]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107701 entries, 0 to 107700
Data columns (total 20 columns):
 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   stock_symbol    107701 non-null  object        
 1   Date            107701 non-null  datetime64[ns]
 2   Open            107701 non-null  float64       
 3   High            107701 non-null  float64       
 4   Low             107701 non-null  float64       
 5   Close           107701 non-null  float64       
 6   Volume          107701 non-null  int64         
 7   Dividends       107701 non-null  float64       
 8   Stock Splits    107701 non-null  float64       
 9   EMA_10          107701 non-null  float64       
 10  PSARl_0.02_0.2  107701 non-null  float64       
 11  PSARs_0.02_0.2  107701 non-null  float64       
 12  BBL_5_2.0       107701 non-null  float64       
 13  BBM_5_2.0       107701 non-null  float64       
 14  BBU_5_2.0       107701 non-null  flo

In [27]:
# Save the dataframe to CSV
# NOTE: this writes the raw concatenated `df` (the same file as In [2]), not the cleaned `df_1`
df.to_csv('/Users/heatheradler/Documents/GitHub/Springboard/Springboard_Projects/Capstone/archive/Concated_Dataframe.csv')
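By default `to_csv` also writes the row index as an unnamed first column, which comes back as an extra `Unnamed: 0` column when the file is read with `read_csv`. A minimal sketch with a made-up two-row frame shows the difference, and how `index=False` plus `parse_dates` restores the original columns and the `datetime64` Date type on reload:

```python
import io

import pandas as pd

frame = pd.DataFrame({
    'stock_symbol': ['IPAR', 'IPAR'],
    'Date': pd.to_datetime(['1988-02-04', '1988-02-05']),
    'Close': [0.398479, 0.398479],
})

# Default: the index is written as an unnamed first column
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False on write, parse_dates on read: same columns and dtypes as before
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
reloaded = pd.read_csv(without_index, parse_dates=['Date'])

print(list(reloaded_with_index.columns))
print(reloaded.dtypes)
```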
