# 1. Data Wrangling <a id="data_wrangling"></a>

<a id="contents"></a>
# Table of Contents  
1. [Data Wrangling](#data_wrangling)
    - [1.1 Introduction](#introduction)
    - [1.2 Imports](#imports)
    - [1.3 Load the Data](#load)
    - [1.4 Dataset Cleaning](#cleaning)

## 1.1 Introduction<a id="introduction"></a>

### Problem:

The goal of this data science project is to develop a machine learning model capable of predicting stock prices for a selected set of publicly traded companies. By leveraging historical stock market data, along with relevant features such as financial indicators, market sentiment, and news headlines, the model aims to forecast future stock prices with a high degree of accuracy. The prediction of stock prices is of paramount importance to investors, traders, and financial institutions seeking to make informed decisions about buying, selling, or holding stocks. The developed model will provide valuable insights and actionable predictions that can potentially lead to improved investment strategies and better risk management in the dynamic and volatile stock market environment.

### Clients:

The findings of this study will be of interest to a broad range of stakeholders, specifically investors, traders, and financial institutions who use stock predictor models to make informed decisions about buying, selling, or holding beauty and wellness stocks.  

### Data:

The dataset for this project was downloaded from Kaggle and has been filtered and cleaned to include data for only 15 beauty and wellness stocks. The primary goal is to develop a predictive model that analyzes large volumes of beauty and wellness stock data and reacts to market changes faster than a human analyst could, potentially leading to improved trading performance. However, predicting stock prices is inherently challenging because of the complexity and randomness of financial markets. Stock predictor models can provide valuable insights, but they are not guaranteed to forecast future prices accurately. Investors should also consider market conditions, economic indicators, and company fundamentals when making investment decisions.

Kaggle Dataset link: https://www.kaggle.com/datasets/footballjoe789/us-stock-dataset/data?select=Data

## 1.2 Imports<a id="imports"></a>

In [1]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
import os
import csv
from tqdm.notebook import tqdm
from datetime import datetime, timezone

## 1.3 Load the Data<a id="load"></a>

To begin, we identified the 4 beauty and wellness stocks (listed below) that we will be analyzing. We then imported the CSV files from the Kaggle dataset for those stocks. Each dataset has a different number of rows but the same columns.

| # | Stock Name/Ref |
|---|----------------|
| 1 | The Estée Lauder Companies Inc. (EL) |
| 2 | Ulta Beauty, Inc. (ULTA) |
| 3 | COTY (COTY) |
| 4 | e.l.f. Beauty, Inc. (ELF) |

In [2]:
# Set the directory where your CSV files are located
folder_path = '/Users/heatheradler/Documents/GitHub/Springboard/Springboard_Projects/Capstone/archive/Data/StockHistory'

# Initialize an empty list to store dataframes
dfs = []

# Stock symbols selected for analysis
stock_list = ['EL', 'ULTA', 'COTY', 'ELF']

# Iterate through each file in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith('.csv'):
        # Extract the stock symbol from the file name
        stock_symbol = os.path.splitext(file_name)[0]

        # Keep only the selected stocks
        if stock_symbol in stock_list:
            # Load the CSV file into a dataframe
            file_path = os.path.join(folder_path, file_name)
            df = pd.read_csv(file_path)

            # Add a new column 'stock_symbol' with the stock symbol
            df['stock_symbol'] = stock_symbol

            # Append the dataframe to the list
            dfs.append(df)

# Concatenate all dataframes together
df = pd.concat(dfs, ignore_index=True)

# save concatenated dataframe
df.to_csv('/Users/heatheradler/Documents/GitHub/Springboard/Springboard_Projects/Capstone/archive/Concated_Dataframe.csv')


In [3]:
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,STOCHk_14_3_3,STOCHd_14_3_3,...,FWMA_10,WILLR_14,ISA_9,ISB_26,ITS_9,IKS_26,ICS_26,OBV,AD,stock_symbol
0,2007-10-25 00:00:00-04:00,28.728543,34.612703,28.570313,29.490023,7487306,0.0,0.0,,,...,,,,,,,26.008974,7487306.0,-5208027.0,ULTA
1,2007-10-26 00:00:00-04:00,30.211945,32.615056,28.926331,31.645901,1625582,0.0,0.0,,,...,,,,,,,25.563955,9112888.0,-4436638.0,ULTA
2,2007-10-29 00:00:00-04:00,32.130481,34.612704,32.130481,34.316025,668513,0.0,0.0,,,...,,,,,,,26.246321,9781401.0,-3927929.0,ULTA
3,2007-10-30 00:00:00-04:00,34.830269,35.206062,32.634834,35.037945,455543,0.0,0.0,,,...,,,,,,,27.185806,10236944.0,-3531956.0,ULTA
4,2007-10-31 00:00:00-04:00,35.987321,35.987321,32.585388,33.821556,393584,0.0,0.0,,,...,,,,,,,27.670383,9843360.0,-3639505.0,ULTA


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27914 entries, 0 to 27913
Data columns (total 42 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Date             27914 non-null  object 
 1   Open             27914 non-null  float64
 2   High             27914 non-null  float64
 3   Low              27914 non-null  float64
 4   Close            27914 non-null  float64
 5   Volume           27914 non-null  int64  
 6   Dividends        27914 non-null  float64
 7   Stock Splits     27914 non-null  float64
 8   STOCHk_14_3_3    27839 non-null  float64
 9   STOCHd_14_3_3    27829 non-null  float64
 10  RSI_14           27844 non-null  float64
 11  CMO_14           27844 non-null  float64
 12  CCI_14_0.015     27791 non-null  float64
 13  MACD_12_26_9     27789 non-null  float64
 14  MACDh_12_26_9    27749 non-null  float64
 15  MACDs_12_26_9    27749 non-null  float64
 16  PPO_12_26_9      27789 non-null  float64
 17  PPOh_12_26_9

## 1.4 Dataset Cleaning<a id="cleaning"></a>

The dataset needs little cleaning because the columns are consistent across files. The Date column, which is read in as strings that carry a UTC offset, is converted to calendar dates in every row.

In [5]:
df['Date'] = pd.to_datetime(df['Date'])

In [6]:
df['Date'] = pd.to_datetime(df['Date'],utc=True).dt.date
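
The effect of this conversion can be sketched on two sample timestamps written in the same format as the raw Date column. This is a minimal illustration, not part of the original pipeline: the variable names are made up for the example, and the last line shows one possible alternative that keeps a datetime dtype instead of Python `date` objects.

```python
import pandas as pd

# Two sample strings in the raw format, with different UTC offsets
raw = pd.Series([
    "2007-10-25 00:00:00-04:00",
    "2007-11-05 00:00:00-05:00",
])

# utc=True parses mixed offsets into a single timezone-aware column
as_utc = pd.to_datetime(raw, utc=True)

# .dt.date yields Python date objects, so the column is not a datetime dtype
dates_obj = as_utc.dt.date

# Alternative (an assumption, not the notebook's method): drop the timezone
# and truncate to midnight, which keeps a datetime64 dtype
dates_dt64 = as_utc.dt.tz_localize(None).dt.normalize()

print(dates_obj.tolist())
print(dates_dt64.dtype)
```

This explains why a later `info()` call still reports the Date column as `object` after the `.dt.date` step.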

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27914 entries, 0 to 27913
Data columns (total 42 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Date             27914 non-null  object 
 1   Open             27914 non-null  float64
 2   High             27914 non-null  float64
 3   Low              27914 non-null  float64
 4   Close            27914 non-null  float64
 5   Volume           27914 non-null  int64  
 6   Dividends        27914 non-null  float64
 7   Stock Splits     27914 non-null  float64
 8   STOCHk_14_3_3    27839 non-null  float64
 9   STOCHd_14_3_3    27829 non-null  float64
 10  RSI_14           27844 non-null  float64
 11  CMO_14           27844 non-null  float64
 12  CCI_14_0.015     27791 non-null  float64
 13  MACD_12_26_9     27789 non-null  float64
 14  MACDh_12_26_9    27749 non-null  float64
 15  MACDs_12_26_9    27749 non-null  float64
 16  PPO_12_26_9      27789 non-null  float64
 17  PPOh_12_26_9

In [8]:
df['Date']

0        2007-10-25
1        2007-10-26
2        2007-10-29
3        2007-10-30
4        2007-10-31
            ...    
27909    2024-03-22
27910    2024-03-25
27911    2024-03-26
27912    2024-03-27
27913    2024-03-28
Name: Date, Length: 27914, dtype: object

In [9]:
df

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,STOCHk_14_3_3,STOCHd_14_3_3,...,FWMA_10,WILLR_14,ISA_9,ISB_26,ITS_9,IKS_26,ICS_26,OBV,AD,stock_symbol
0,2007-10-25,28.728543,34.612703,28.570313,29.490023,7487306,0.0,0.0,,,...,,,,,,,26.008974,7487306.0,-5.208027e+06,ULTA
1,2007-10-26,30.211945,32.615056,28.926331,31.645901,1625582,0.0,0.0,,,...,,,,,,,25.563955,9112888.0,-4.436638e+06,ULTA
2,2007-10-29,32.130481,34.612704,32.130481,34.316025,668513,0.0,0.0,,,...,,,,,,,26.246321,9781401.0,-3.927929e+06,ULTA
3,2007-10-30,34.830269,35.206062,32.634834,35.037945,455543,0.0,0.0,,,...,,,,,,,27.185806,10236944.0,-3.531956e+06,ULTA
4,2007-10-31,35.987321,35.987321,32.585388,33.821556,393584,0.0,0.0,,,...,,,,,,,27.670383,9843360.0,-3.639505e+06,ULTA
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27909,2024-03-22,11.880000,11.970000,11.710000,11.780000,2998800,0.0,0.0,9.648984,9.401609,...,11.933986,-94.531273,12.095,12.010,12.255,12.500,,371033900.0,-2.073299e+08,COTY
27910,2024-03-25,11.760000,11.880000,11.570000,11.580000,1945500,0.0,0.0,4.086634,7.960177,...,11.795524,-99.295759,12.095,12.090,12.185,12.435,,369088400.0,-2.091499e+08,COTY
27911,2024-03-26,11.720000,11.820000,11.620000,11.650000,2689900,0.0,0.0,3.935589,5.890402,...,11.737692,-94.366203,12.095,12.095,12.095,12.435,,371778300.0,-2.110328e+08,COTY
27912,2024-03-27,11.730000,11.850000,11.560000,11.820000,4220500,0.0,0.0,8.173271,5.398498,...,11.766503,-81.818224,12.095,12.095,12.045,12.430,,375998800.0,-2.076855e+08,COTY


In [10]:
df['stock_symbol'].value_counts()

stock_symbol
HELE    12034
EL       7138
ULTA     4134
COTY     2717
ELF      1891
Name: count, dtype: int64

In [11]:
df.isnull().sum()

Date                   0
Open                   0
High                   0
Low                    0
Close                  0
Volume                 0
Dividends              0
Stock Splits           0
STOCHk_14_3_3         75
STOCHd_14_3_3         85
RSI_14                70
CMO_14                70
CCI_14_0.015         123
MACD_12_26_9         125
MACDh_12_26_9        165
MACDs_12_26_9        165
PPO_12_26_9          125
PPOh_12_26_9         125
PPOs_12_26_9         125
EMA_10                45
PSARl_0.02_0.2     13255
PSARs_0.02_0.2     14664
PSARaf_0.02_0.2        0
PSARr_0.02_0.2         0
ADX_14               135
DMP_14                70
DMN_14                70
BBL_5_2.0             20
BBM_5_2.0             20
BBU_5_2.0             20
BBB_5_2.0             20
BBP_5_2.0             20
FWMA_10               45
WILLR_14             123
ISA_9                255
ISB_26               385
ITS_9                 40
IKS_26               125
ICS_26               130
OBV                    0


In [12]:
rows_with_null = df[df['PSARs_0.02_0.2'].isnull()]
print(rows_with_null)

             Date       Open       High        Low      Close   Volume  \
0      2007-10-25  28.728543  34.612703  28.570313  29.490023  7487306   
1      2007-10-26  30.211945  32.615056  28.926331  31.645901  1625582   
2      2007-10-29  32.130481  34.612704  32.130481  34.316025   668513   
3      2007-10-30  34.830269  35.206062  32.634834  35.037945   455543   
4      2007-10-31  35.987321  35.987321  32.585388  33.821556   393584   
...           ...        ...        ...        ...        ...      ...   
27891  2024-02-27  13.200000  13.300000  13.050000  13.260000  3863000   
27892  2024-02-28  13.130000  13.210000  12.840000  12.890000  3815300   
27893  2024-02-29  12.920000  12.980000  12.510000  12.560000  4598500   
27894  2024-03-01  12.530000  12.690000  12.290000  12.670000  3748600   
27895  2024-03-04  12.670000  12.810000  12.440000  12.470000  2509700   

       Dividends  Stock Splits  STOCHk_14_3_3  STOCHd_14_3_3  ...    FWMA_10  \
0            0.0           0.0 

### 1.4.1 Determining Columns to Keep <a id="columns_to_keep"></a>

Based on research into the meaning and significance of the formula-based columns, we decided to keep the columns listed below because they provide meaningful insight into our dataset. All non-formulaic columns are kept as well:

**EMA_10**: This stands for Exponential Moving Average. Unlike a simple moving average (SMA), which gives equal weight to all data points in the period, the EMA gives more weight to recent data points. This weighting is exponentially decreasing, with the most recent data points having the greatest impact on the average. The '10' represents the period used for calculating the EMA. In this case, it's 10 periods, meaning the EMA is calculated based on the closing prices of the last 10 periods.
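
As a rough illustration of the weighting described above, the sketch below computes an EMA with pandas and with the explicit recursion, using a smoothing factor of `2 / (span + 1)`. The prices and the short span of 3 are made up to keep the numbers easy to follow, and how the Kaggle dataset seeds its own EMA_10 values is not documented here, so this is not necessarily its exact method.

```python
import pandas as pd

# Made-up closing prices for illustration only
close = pd.Series([10.0, 11.0, 12.0, 11.5, 12.5])

span = 3                   # EMA_10 would use span=10
alpha = 2 / (span + 1)     # weight given to the newest price

# pandas implementation of the recursive EMA
ema_pandas = close.ewm(span=span, adjust=False).mean()

# The same recursion written out: ema_t = alpha * price_t + (1 - alpha) * ema_(t-1)
ema_manual = [close.iloc[0]]
for price in close.iloc[1:]:
    ema_manual.append(alpha * price + (1 - alpha) * ema_manual[-1])

print(ema_pandas.tolist())
print(ema_manual)
```

Both approaches give the same series, which shows that each new value is a blend of the latest price and the previous EMA.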

**PSARl_0.02_0.2**: This stands for Parabolic Stop and Reverse (PSAR) for long positions. The '0.02' is the acceleration factor, or step, used in the PSAR calculation; it determines how quickly the PSAR moves toward the price. The '0.2' is the maximum acceleration factor: once the acceleration factor reaches this value, it stops increasing and stays at 0.2.

**PSARs_0.02_0.2**: Parabolic Stop and Reverse Short. It is similar to PSAR Long but is used for short positions. The parameters (0.02, 0.2) are the acceleration factor and the maximum acceleration factor used in the PSAR calculation: the acceleration factor (AF) starts at 0.02 and increases by 0.02 each time a new extreme point (EP) is reached, and the maximum acceleration factor caps the AF at 0.2.
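
The acceleration-factor schedule described above can be sketched as follows. This covers only the AF bookkeeping during an uptrend, not a full PSAR implementation, and the function name `acceleration_factors` and the sample highs are hypothetical.

```python
def acceleration_factors(highs, step=0.02, max_af=0.2):
    """Track the PSAR acceleration factor over an uptrend (illustrative only)."""
    af = step
    extreme_point = highs[0]
    history = [af]
    for high in highs[1:]:
        if high > extreme_point:
            # A new extreme point raises the AF by one step, up to the cap
            extreme_point = high
            af = round(min(af + step, max_af), 10)
        history.append(af)
    return history


# Made-up highs: one pause (10.5) followed by a steady run of new highs
highs = [10, 11, 10.5, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
afs = acceleration_factors(highs)
print(afs)
```

The AF stays flat on the bar that sets no new high and stops growing once it reaches 0.2.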

**BBL_5_2.0, BBM_5_2.0, BBU_5_2.0**: Bollinger Bands with parameters (5, 2.0). They consist of a middle band (BBM), upper band (BBU), and lower band (BBL). They are used to identify potential overbought or oversold conditions and measure volatility.
> <br>- The parameters (5, 2.0) typically represent:
>> <br>- The period (5): This is the number of periods used in calculating the moving average. In this case, it's 5 periods.
>> <br>- The standard deviation multiplier (2.0): This is the number of standard deviations used to calculate the width of the bands. In this case, it's 2.0, meaning the bands will be placed 2 standard deviations above and below the moving average.

> <br>- **BBL_5_2.0** specifically refers to the lower Bollinger Band with a period of 5 and a standard deviation multiplier of 2.0. It helps traders identify potential support levels based on volatility around the moving average.
> <br>- **BBM_5_2.0** denotes the Middle Bollinger Band with a period of 5 and a standard deviation multiplier of 2.0. This band serves as a reference level for the price movement and can help traders identify potential trends.
> <br>- **BBU_5_2.0** denotes the Upper Bollinger Band with a period of 5 and a standard deviation multiplier of 2.0. The Upper Band serves as an upper boundary or resistance level, helping traders identify potential overbought conditions or areas where prices may revert to the mean.
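
The band construction can be checked against the notebook's own output. The sketch below uses the first five ULTA closing prices displayed in this notebook and a 5-period window with a multiplier of 2.0. Using the population standard deviation (`ddof=0`) is an assumption; it is the choice that reproduces the band values displayed for the fifth ULTA row (about 28.800881, 32.86229, and 36.923698).

```python
import pandas as pd

# First five ULTA closing prices as displayed in this notebook
close = pd.Series([29.490023, 31.645901, 34.316025, 35.037945, 33.821556])

period, multiplier = 5, 2.0

# Middle band: simple moving average over the window
bbm = close.rolling(period).mean()

# Band width: population standard deviation (ddof=0 is an assumption)
std = close.rolling(period).std(ddof=0)

bbu = bbm + multiplier * std   # upper band
bbl = bbm - multiplier * std   # lower band

print(bbl.iloc[-1], bbm.iloc[-1], bbu.iloc[-1])
```

The first four rows are null because a 5-period window is not yet full, which matches the 20 nulls per Bollinger column reported for the dataset.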

**ISA_9, ISB_26, ITS_9, IKS_26, ICS_26**: Ichimoku Cloud components (Leading Span A, Leading Span B, Conversion Line, Base Line, and Lagging Span, in that order). They are used in the Ichimoku Cloud indicator to identify support, resistance, and trend direction.
> <br> The Ichimoku Cloud indicator is a comprehensive trend-following indicator used in technical analysis. It consists of several components, including the Senkou Span A and Senkou Span B lines, which together form the "cloud" or "kumo." These lines are calculated based on specific periods and plotted ahead of the current price to indicate potential future support and resistance levels.
> <br>- **ISA_9**- The Senkou Span A line, also known as "Leading Span A," is calculated as the average of the Tenkan-sen (Conversion Line) and Kijun-sen (Base Line) plotted forward by a certain number of periods. In the case of "ISA_9," it's calculated with a period of 9. Therefore, "ISA_9" represents the Ichimoku Senkou Span A with a period of 9 in the Ichimoku Cloud indicator. It's used to assess potential future support and resistance levels and to identify trend direction in the market.
> <br>- **ISB_26**- The Senkou Span B line, also known as "Leading Span B," is a midpoint line with a longer look-back window than the other components. In the conventional Ichimoku settings it is the average of the highest high and lowest low over the past 52 periods, plotted forward by 26 periods, so the "26" in "ISB_26" most likely refers to the 26-period base setting and forward shift rather than the look-back window. It's used alongside Senkou Span A to provide additional insight into potential future support and resistance levels and to assess the overall trend direction in the market.
> <br>- **ITS_9**- The Tenkan-sen, or Conversion Line, is one of the components of the Ichimoku Cloud indicator. It's calculated as the average of the highest high and lowest low over the past 9 periods. The result is then plotted on the chart. Therefore, "ITS_9" represents the Tenkan-sen with a period of 9 in the Ichimoku Cloud indicator. It's used to assess short-term momentum and potential trend reversals in the market.
> <br>- **IKS_26**- The Kijun-sen, or Base Line, is another key component of the Ichimoku Cloud indicator. It's calculated as the average of the highest high and lowest low over the past 26 periods. The result is then plotted on the chart. Therefore, "IKS_26" represents the Kijun-sen with a period of 26 in the Ichimoku Cloud indicator. It's used to assess medium-term momentum and potential trend reversals in the market.
> <br>- **ICS_26**- The Chikou Span, or Lagging Span, is the closing price plotted 26 periods back. It is used to assess momentum and confirm trend direction. When the Chikou Span is above the price plot, it suggests bullish momentum, indicating potential upward movement in prices. Conversely, when the Chikou Span is below the price plot, it suggests bearish momentum, indicating potential downward movement in prices. Therefore, "ICS_26" represents the Chikou Span with a period of 26 in the Ichimoku Cloud indicator, and it's used to confirm the strength and direction of the prevailing trend.
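
The five components can be sketched in pandas as shown below. The function name `ichimoku_sketch`, the synthetic prices, and the conventional 9/26/52 window lengths are all assumptions made for illustration; the exact procedure used to build the Kaggle columns is not documented in this notebook.

```python
import numpy as np
import pandas as pd


def ichimoku_sketch(high, low, close, tenkan=9, kijun=26, senkou=52):
    """Compute Ichimoku components with conventional settings (illustrative)."""

    def midpoint(window):
        # Average of the highest high and lowest low over the window
        return (high.rolling(window).max() + low.rolling(window).min()) / 2

    its = midpoint(tenkan)                  # Conversion Line (Tenkan-sen)
    iks = midpoint(kijun)                   # Base Line (Kijun-sen)
    isa = ((its + iks) / 2).shift(kijun)    # Leading Span A, plotted forward
    isb = midpoint(senkou).shift(kijun)     # Leading Span B, plotted forward
    ics = close.shift(-kijun)               # Lagging Span, plotted back
    return pd.DataFrame({"ISA": isa, "ISB": isb, "ITS": its, "IKS": iks, "ICS": ics})


# Synthetic random-walk prices, for demonstration only
rng = np.random.default_rng(0)
close = pd.Series(50 + rng.normal(size=100).cumsum())
high, low = close + 1, close - 1

cloud = ichimoku_sketch(high, low, close)
print(cloud.isnull().sum())
```

Because the leading spans are shifted forward and the lagging span is shifted back, each component has nulls at the start or end of a series, which is one reason these columns contain missing values.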


In [13]:
# Columns to keep (formulaic and non-formulaic):

formulas_to_keep = ['stock_symbol','Date', 'Open', 'High', 'Low', 'Close','Volume', 'Dividends', 'Stock Splits', 'EMA_10', 'PSARl_0.02_0.2', 'PSARs_0.02_0.2', 'BBL_5_2.0', 'BBM_5_2.0', 'BBU_5_2.0', 'ISA_9', 'ISB_26', 'ITS_9', 'IKS_26', 'ICS_26']
# .copy() makes df_1 an independent DataFrame so later assignments do not warn
df_1 = df[formulas_to_keep].copy()
df_1

Unnamed: 0,stock_symbol,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,EMA_10,PSARl_0.02_0.2,PSARs_0.02_0.2,BBL_5_2.0,BBM_5_2.0,BBU_5_2.0,ISA_9,ISB_26,ITS_9,IKS_26,ICS_26
0,ULTA,2007-10-25,28.728543,34.612703,28.570313,29.490023,7487306,0.0,0.0,,,,,,,,,,,26.008974
1,ULTA,2007-10-26,30.211945,32.615056,28.926331,31.645901,1625582,0.0,0.0,,28.570313,,,,,,,,,25.563955
2,ULTA,2007-10-29,32.130481,34.612704,32.130481,34.316025,668513,0.0,0.0,,28.570313,,,,,,,,,26.246321
3,ULTA,2007-10-30,34.830269,35.206062,32.634834,35.037945,455543,0.0,0.0,,28.812009,,,,,,,,,27.185806
4,ULTA,2007-10-31,35.987321,35.987321,32.585388,33.821556,393584,0.0,0.0,,29.195652,,28.800881,32.86229,36.923698,,,,,27.670383
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27909,COTY,2024-03-22,11.880000,11.970000,11.710000,11.780000,2998800,0.0,0.0,12.150920,,12.683971,11.730597,11.94800,12.165403,12.095,12.010,12.255,12.500,
27910,COTY,2024-03-25,11.760000,11.880000,11.570000,11.580000,1945500,0.0,0.0,12.047116,,12.567094,11.533180,11.84600,12.158819,12.095,12.090,12.185,12.435,
27911,COTY,2024-03-26,11.720000,11.820000,11.620000,11.650000,2689900,0.0,0.0,11.974913,,12.427501,11.456524,11.79200,12.127476,12.095,12.095,12.095,12.435,
27912,COTY,2024-03-27,11.730000,11.850000,11.560000,11.820000,4220500,0.0,0.0,11.946747,,12.307451,11.510884,11.74800,11.985116,12.095,12.095,12.045,12.430,


In [14]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27914 entries, 0 to 27913
Data columns (total 20 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   stock_symbol    27914 non-null  object 
 1   Date            27914 non-null  object 
 2   Open            27914 non-null  float64
 3   High            27914 non-null  float64
 4   Low             27914 non-null  float64
 5   Close           27914 non-null  float64
 6   Volume          27914 non-null  int64  
 7   Dividends       27914 non-null  float64
 8   Stock Splits    27914 non-null  float64
 9   EMA_10          27869 non-null  float64
 10  PSARl_0.02_0.2  14659 non-null  float64
 11  PSARs_0.02_0.2  13250 non-null  float64
 12  BBL_5_2.0       27894 non-null  float64
 13  BBM_5_2.0       27894 non-null  float64
 14  BBU_5_2.0       27894 non-null  float64
 15  ISA_9           27659 non-null  float64
 16  ISB_26          27529 non-null  float64
 17  ITS_9           27874 non-null 

Next, we count the null values in the kept columns to decide how best to remove or update them. Based on the counts below, we will replace the null values in each affected column with a mean computed separately for each stock symbol, so only rows from the same stock symbol contribute to a fill value.

In [15]:
df_1.isnull().sum()

stock_symbol          0
Date                  0
Open                  0
High                  0
Low                   0
Close                 0
Volume                0
Dividends             0
Stock Splits          0
EMA_10               45
PSARl_0.02_0.2    13255
PSARs_0.02_0.2    14664
BBL_5_2.0            20
BBM_5_2.0            20
BBU_5_2.0            20
ISA_9               255
ISB_26              385
ITS_9                40
IKS_26              125
ICS_26              130
dtype: int64

### Updating Null Values
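The code below fills only the null values, leaving existing values unchanged. For each affected column, the null rows are grouped by stock symbol and filled with a 10-period exponentially weighted mean of Close computed over those rows. After this step, the data frame contains no null values.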

In [16]:
# Columns that contain null values
cols_with_nulls = ['EMA_10', 'PSARl_0.02_0.2', 'PSARs_0.02_0.2',
                   'BBL_5_2.0', 'BBM_5_2.0', 'BBU_5_2.0',
                   'ISA_9', 'ISB_26', 'ITS_9', 'IKS_26', 'ICS_26']

# For each column, fill only its null rows with a 10-period exponentially
# weighted mean of Close, computed separately for each stock symbol
for col in cols_with_nulls:
    mask = df_1[col].isnull()
    df_1.loc[mask, col] = (
        df_1[mask]
        .groupby('stock_symbol')['Close']
        .transform(lambda x: x.ewm(span=10, adjust=False).mean())
    )
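
The effect of this fill is easier to see on a small frame. The sketch below applies the same mask-and-transform pattern to one column; the symbols `AAA` and `BBB` and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: two symbols, with nulls at the start of each EMA_10 series
toy = pd.DataFrame({
    "stock_symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Close": [10.0, 12.0, 14.0, 100.0, 110.0, 120.0],
    "EMA_10": [np.nan, np.nan, 13.0, np.nan, 105.0, 115.0],
})

# Select only the rows where the indicator is missing
mask = toy["EMA_10"].isnull()

# Exponentially weighted mean of Close over the null rows, per symbol
fill = (
    toy[mask]
    .groupby("stock_symbol")["Close"]
    .transform(lambda x: x.ewm(span=10, adjust=False).mean())
)

# Index alignment writes each fill value back to its own row
toy.loc[mask, "EMA_10"] = fill
print(toy)
```

Existing values (13.0, 105.0, 115.0) are untouched, and each filled value depends only on Close prices from the same symbol's null rows.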

In [17]:
# Checking to confirm there are no longer any null values
df_1.isnull().sum()

stock_symbol      0
Date              0
Open              0
High              0
Low               0
Close             0
Volume            0
Dividends         0
Stock Splits      0
EMA_10            0
PSARl_0.02_0.2    0
PSARs_0.02_0.2    0
BBL_5_2.0         0
BBM_5_2.0         0
BBU_5_2.0         0
ISA_9             0
ISB_26            0
ITS_9             0
IKS_26            0
ICS_26            0
dtype: int64

### Date Type Update
Re-running the info method on the DataFrame confirmed that the Date field was still an object type. The code below transforms the Date field to a datetime type for easier analysis.

In [18]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27914 entries, 0 to 27913
Data columns (total 20 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   stock_symbol    27914 non-null  object 
 1   Date            27914 non-null  object 
 2   Open            27914 non-null  float64
 3   High            27914 non-null  float64
 4   Low             27914 non-null  float64
 5   Close           27914 non-null  float64
 6   Volume          27914 non-null  int64  
 7   Dividends       27914 non-null  float64
 8   Stock Splits    27914 non-null  float64
 9   EMA_10          27914 non-null  float64
 10  PSARl_0.02_0.2  27914 non-null  float64
 11  PSARs_0.02_0.2  27914 non-null  float64
 12  BBL_5_2.0       27914 non-null  float64
 13  BBM_5_2.0       27914 non-null  float64
 14  BBU_5_2.0       27914 non-null  float64
 15  ISA_9           27914 non-null  float64
 16  ISB_26          27914 non-null  float64
 17  ITS_9           27914 non-null 

In [19]:
import datetime as dt
dt.datetime.strptime('2024-01-03', "%Y-%m-%d")

datetime.datetime(2024, 1, 3, 0, 0)

In [20]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
df_1 = df_1.assign(Date=pd.to_datetime(df_1['Date']))


In [21]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27914 entries, 0 to 27913
Data columns (total 20 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   stock_symbol    27914 non-null  object        
 1   Date            27914 non-null  datetime64[ns]
 2   Open            27914 non-null  float64       
 3   High            27914 non-null  float64       
 4   Low             27914 non-null  float64       
 5   Close           27914 non-null  float64       
 6   Volume          27914 non-null  int64         
 7   Dividends       27914 non-null  float64       
 8   Stock Splits    27914 non-null  float64       
 9   EMA_10          27914 non-null  float64       
 10  PSARl_0.02_0.2  27914 non-null  float64       
 11  PSARs_0.02_0.2  27914 non-null  float64       
 12  BBL_5_2.0       27914 non-null  float64       
 13  BBM_5_2.0       27914 non-null  float64       
 14  BBU_5_2.0       27914 non-null  float64       
 15  IS

In [22]:
# Save the cleaned dataframe
df_1.to_csv('/Users/heatheradler/Documents/GitHub/Springboard/Springboard_Projects/Stock_Predictor_Capstone/Concated_Dataframe.csv')
