This program uses Python's pandas data analysis library to generate a CSV file of historical stock data for a range of years specified by the user. It was designed to support thousands of pairwise comparisons for analyzing the correlation between stock price and the percent increase the stock saw. The report is attached below.

SShivang/Bulk-Historical-Stock-Data-Retriever

Bulk-Historical-Stock-Data-Retriever

I. INTRODUCTION:

Students don’t have much money to set aside for expensive stocks. Additionally, purchasing fractional shares can come with high fees, often offsetting the expected returns. In the stock challenge, many students saw large rates of return from cheaper stocks, and the inspiration for this project came from that observation. Furthermore, research shows that share price can affect the psychological behavior of investors (Pinsent). Research has also shown that investors are more comfortable buying stocks that are lower in price. Stock splits, for example, are often done to keep shares accessible to the public and to create herd-like psychological effects among everyday investors. Lower stock prices also increase the liquidity of stocks for the everyday investor, increasing the amount of investment in the stock and, therefore, the rate of return on investment (Tonner).

However, a host of other factors also affect the worth of a stock, such as the P/E and P/S ratios. For example, an investor can park money in a penny stock and never see any return, while the same investor can invest in a growth tech stock like Google and see large net returns. In this analysis, these factors were controlled for by choosing stocks from the S&P 500, which tend to be better established than stocks on the general market (Pei). Furthermore, stocks on the S&P 500 tend to have P/E and P/S ratios within more acceptable bounds.

Cheap stocks can also offer room for growth, since a low share price is often characteristic of a smaller company. They can also provide an outlet for low-volume investors. The theory is that if one buys 5,000 shares of a $1 stock and the price goes up by 1 cent, one makes $50. However, if the same person takes a 500-share position in a $10 stock and the price goes up by 1 cent, one makes only $5.

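
The share-count arithmetic in this theory can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `position_gain` and `percent_change` are hypothetical and do not come from the project's script.

```python
def position_gain(shares: int, price_change: float) -> float:
    """Dollar gain of a position when the share price moves by price_change dollars."""
    return shares * price_change


def percent_change(price: float, price_change: float) -> float:
    """The same price move expressed as a percentage of the share price."""
    return 100.0 * price_change / price


# Two positions with the same $5,000 outlay, each seeing a 1-cent rise.
cheap_gain = position_gain(5000, 0.01)    # 5,000 shares of a $1 stock
pricey_gain = position_gain(500, 0.01)    # 500 shares of a $10 stock

print(f"cheap:  ${cheap_gain:.2f} ({percent_change(1.00, 0.01):.1f}% move)")
print(f"pricey: ${pricey_gain:.2f} ({percent_change(10.00, 0.01):.1f}% move)")
```

Note that the same 1-cent move is a larger percentage move for the $1 stock than for the $10 stock, which is why the rest of the report compares rates of return rather than dollar gains.
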
The goal of the project is to find out whether cheaper stocks can grow at this rate and outgrow more expensive stocks over both the long term and the short term. To describe this correlation, tools from this course were used. The financial statistics unit, particularly the correlation and distribution subsections, was used to understand the relationship. Excel was used to calculate rates of return, plot them, assess the volatility and general applicability of the trends (distribution), and measure the strength of the relationship (correlation).

II. RESEARCH PIPELINE

Python’s pandas package gives financial analysts tools to quickly retrieve and analyze stock data. It provides a method to retrieve historical stock prices for many hundreds of stocks at once from Yahoo Finance. The code for this is included in the appendix section. Running a Python script generated a CSV file with the historical prices of all the stocks on the S&P 500 between 2012 and 2017. The entire S&P 500 was used to make the relationship and the analysis as robust as possible. Using more data points also makes the trend found applicable to more stocks. The S&P 500 index also screens for stocks that are well established, which means that penny stocks and fraudulent companies will not influence the analysis.

Before conducting the analysis, stocks with too much daily volatility were excluded. Including highly volatile stocks would make the analysis less applicable, as most stocks don’t have such volatility. It would also make the rate of return less predictable, and predicting the rate of return is the entire point of this investigation. To do this, a measure of volatility was calculated over the timeframe of the project: a localized beta value was found for each stock by comparing its performance against the S&P 500 during the 5-year period. Stocks whose localized beta was more than two standard deviations away from the mean were treated as outliers and excluded, which effectively controls for daily volatility.

To analyze how the price of a stock correlated with its rate of return, stocks were split into three blocks of about 33 percentiles each (cheap, middle, and expensive) based on their closing prices 5 years ago. The rate of return was then calculated for each stock for each closing day and averaged within each of the three blocks for further analysis. Additionally, the rate of return for each stock at the end of the five years was used to observe the strength and the variance of the relationship between price and rate of return.
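
The filtering and grouping steps described above could be sketched in pandas roughly as follows. This is not the project's appendix code. It assumes that "localized beta" means the covariance of a stock's daily returns with the market's daily returns divided by the market variance, and that the three blocks are price terciles. The function names, tickers, and data are all hypothetical, and the demo uses synthetic returns instead of a Yahoo Finance download.

```python
import numpy as np
import pandas as pd


def localized_beta(stock_returns: pd.DataFrame, market_returns: pd.Series) -> pd.Series:
    """Beta of each stock (column) versus the market: cov(stock, market) / var(market)."""
    market_var = market_returns.var()
    return stock_returns.apply(lambda col: col.cov(market_returns)) / market_var


def drop_beta_outliers(betas: pd.Series, n_std: float = 2.0) -> pd.Series:
    """Keep only stocks whose beta lies within n_std standard deviations of the mean."""
    z = (betas - betas.mean()) / betas.std()
    return betas[z.abs() <= n_std]


def price_blocks(start_prices: pd.Series) -> pd.Series:
    """Label each stock cheap / middle / expensive by tercile of its starting price."""
    return pd.qcut(start_prices, q=3, labels=["cheap", "middle", "expensive"])


# Synthetic demo data (hypothetical tickers, not real prices).
rng = np.random.default_rng(0)
market = pd.Series(rng.normal(0.0, 0.01, 250))
stocks = pd.DataFrame(
    {f"S{i}": (0.8 + 0.05 * i) * market + rng.normal(0.0, 0.002, 250) for i in range(9)}
)
betas = drop_beta_outliers(localized_beta(stocks, market))
start_prices = pd.Series(
    [5, 12, 20, 35, 50, 75, 110, 240, 600], index=stocks.columns, dtype=float
)
blocks = price_blocks(start_prices.loc[betas.index])
print(blocks.value_counts())
```

`pd.qcut` is used here because it splits on quantiles and so gives blocks of roughly equal size. A fixed price cutoff would be a different design choice.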

III. RESULTS

By looking at the standard deviation of the localized beta, stocks that were too volatile for the investigation were excluded; this removed 23 stocks from the S&P 500. The preliminary results were then determined by looking at the returns of each of the three percentile categories between 2012 and 2017. The purpose of the graph below is to qualitatively show how each 33-percentile block did compared to the others as well as to the market.

This bar graph shows the average return of the three categories that the stocks were divided into. To make this graph, the rates of return of all stocks within each of the three blocks (assigned by the starting price of the stock in May 2012) were averaged. The lowest-cost block garnered an average return of about 153.78% at the end of the 5-year period. The middle-cost block garnered a return of about 100% during the five-year period, and the high-cost block made about 81%. The market beat the high- and middle-cost stocks with a return of about 114%. From this it can be qualitatively deduced that the lowest-cost stocks performed better on average over the long term than the more expensive stocks.

However, time is also a factor that can greatly affect how a market does. In a bull market, for example, some stocks perform better than others: more speculative stocks can see larger jumps than traditional bear stocks, which have lower beta values. Meanwhile, in a bear market, bear stocks usually don’t go down as much as speculative stocks and therefore outperform them. Additionally, long-term and short-term rates of return can differ. Long-term rates of return of smaller companies can be lower, since such companies are less stable in the long term, while their short-term rates of return can be higher, since they are more volatile than the general market. However, this is not the conclusion that this report draws about the cheaper stocks on the S&P 500 from the graph below.

To see how time applies to this inquiry, a plot of the average daily returns of the three blocks was generated. Each line represents the average daily rate of return for about 160 stocks. The graph illustrates that, even after accounting for time, the trend observed in the bar graph above continues to hold: at each point in time, the lowest-cost stocks seem to generate higher rates of return. More importantly, the trend holds for both short-term and long-term time frames, meaning that the analysis performed on the long-term rates of return will generally hold true for the short term as well. However, looking only at average rates of return is not a quantitative measure. It doesn’t reveal how strong the relationship between price and rate of return is. Furthermore, it doesn’t reveal how generally the relationship can be applied, as this qualitative approach loses the sense of volatility.
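
A minimal pandas sketch of how such per-block curves might be computed is shown here. It assumes that the "daily rate of return" is each day's percent change relative to the stock's first closing price, which the report does not spell out. The function names, tickers, and prices are hypothetical.

```python
import pandas as pd


def cumulative_return_pct(prices: pd.DataFrame) -> pd.DataFrame:
    """Percent return of each stock on each day, relative to its first closing price."""
    return (prices / prices.iloc[0] - 1.0) * 100.0


def block_average_returns(prices: pd.DataFrame, blocks: pd.Series) -> pd.DataFrame:
    """Average the per-stock return curves within each price block (one column per block)."""
    returns = cumulative_return_pct(prices)
    # Transpose so tickers are rows, group tickers by block label, then transpose back.
    return returns.T.groupby(blocks, observed=True).mean().T


# Toy closing prices for three hypothetical tickers over three days.
prices = pd.DataFrame(
    {"AAA": [1.0, 1.5, 2.0], "BBB": [2.0, 2.0, 3.0], "CCC": [100.0, 105.0, 110.0]}
)
blocks = pd.Series({"AAA": "cheap", "BBB": "cheap", "CCC": "expensive"})
curves = block_average_returns(prices, blocks)
print(curves)
```

Each column of `curves` corresponds to one line on a plot of this kind, and the last row gives the end-of-period averages of the sort shown in a bar graph.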

To get a sense of the deviation from the trend, the Z-score of the average S&P 500 rate of return within the distribution of each of the cheap, middle, and expensive blocks was found. Additionally, the probability that each of the three blocks can beat the market was calculated. This approach can reduce the influence of outliers, which the correlation analysis sometimes cannot account for. Furthermore, this method of analysis helps determine how applicable the results of this inquiry are throughout the sample set. For example, if one of the cheap stocks performs well above all other cheap stocks, this analysis will illustrate how much the stocks in the cheap category differ from one another. Comparing these probabilities can indicate whether the hypothesis holds true for the cost of stocks more generally.

First, the standard deviation was determined alongside the average rate of return for each block; the values are listed in the table below. The average rate of return was determined by averaging the rates of return of every stock in the category at the end of the 5-year period. To calculate the Z-score, the average rate of return of each category was subtracted from the return of the S&P 500 over the same 5 years, and the difference was divided by the standard deviation of that block. The standard deviation was calculated in Excel.

$$Z = \frac{\text{S\&P 500 rate of return} - \text{average rate of return of the block}}{\text{standard deviation of the block}}$$

| | Cheap | Middle | High |
| --- | --- | --- | --- |
| Standard Deviation (%) | 162.8645573 | 86.54916399 | 80.9431181 |
| Average Rate of Return (%) | 153.7876 | 102.2268 | 87.23279 |
| Z-score of S&P 500 Returns | 1.244298702 | 0.8639709564 | 0.6693083905 |
| Percent Likely to outperform the market | 40% | 30.50% | 24% |
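
The Z-score formula and the step from a Z-score to a probability can be sketched with Python's standard library. This assumes that the returns within a block are approximately normally distributed, which the report does not state explicitly. The function names and input numbers are hypothetical, and the sketch is not meant to reproduce the table's reported figures, which may follow a different convention.

```python
from statistics import NormalDist


def market_z_score(market_return: float, block_mean: float, block_std: float) -> float:
    """Z-score of the market's return within a block's return distribution."""
    return (market_return - block_mean) / block_std


def prob_beating_market(market_return: float, block_mean: float, block_std: float) -> float:
    """P(stock return > market return), assuming block returns are roughly normal."""
    z = market_z_score(market_return, block_mean, block_std)
    return 1.0 - NormalDist().cdf(z)


# Hypothetical inputs, in percent: market return 20, block mean 30, block std 40.
z = market_z_score(20.0, 30.0, 40.0)
p = prob_beating_market(20.0, 30.0, 40.0)
print(f"z = {z:.3f}, P(beat market) = {p:.1%}")
```

Under this convention, a block whose mean is above the market return has a negative Z-score and a probability above 50% of beating the market.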

The table shows the percent likelihood that each of the three blocks outperforms the S&P 500, which indicates how well these stocks do in terms of rate of return. Cheap stocks have a 40% likelihood of outperforming the market, the highest of the three blocks, meaning that after accounting for volatility, cheaper stocks are more likely to outperform the market than their more expensive counterparts. This suggests that the result applies across a broad range of instances rather than to a few outliers.

Next, a correlation analysis was conducted to determine the strength of the relationship discovered in the qualitative average-rate-of-return analysis. This analysis controls for the effect of outliers to an extent, but its main purpose is to describe the strength and significance of the relationship between cost and rate of return that this inquiry is trying to establish.


This scatterplot illustrates the downward trend noticed in the qualitative section. Despite some outliers, the trend is largely linear: because the bulk of the data points lie near the bottom of the plot, changing the best-fit line to an exponential has little effect on its shape. Therefore, a linear regression can be used to describe this correlation. The slope of the trendline is quite small, visually indicating that the relationship might not be that strong. But this strength is impossible to determine without finding the r and r² values, which are both crucial in determining the strength and importance of the relationship. The r value indicates the strength and direction of the linear relationship. The r² value, on the other hand, indicates how much of the variation in the rate of return over the 5 years is accounted for by the variation in price. These two values were determined using the same analysis conducted several times throughout the course. The Excel tables used to calculate them are found in the appendix. The r value (the correlation coefficient) that was found is -0.22, which is a weak negative correlation. This shows that there is a correlation between price and the rate of return of the stock, but it is a weak, almost moderate, negative relationship. Next, the r² value was computed to be 0.052. This indicates that 5.2 percent of the variation in the rate of return over 5 years can be accounted for by the variation in price, which is a weak to moderate level of explanatory power.
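
For reference, the correlation coefficient and r² described above can be computed outside Excel with a short NumPy sketch. The function name and the five data points are hypothetical and are not taken from the report's data set. The quantity computed is the Pearson correlation coefficient, which is also what Excel's CORREL function returns.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical starting prices and 5-year returns (percent) for five stocks.
start_price = [10.0, 25.0, 40.0, 80.0, 150.0]
five_year_return = [160.0, 90.0, 120.0, 70.0, 85.0]

r = pearson_r(start_price, five_year_return)
r_squared = r ** 2  # share of the variance in returns accounted for by price
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

A negative `r` corresponds to a downward-sloping trendline, and `r_squared` is the fraction that is read as a percentage in the text.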

IV. CONCLUSION:

There were three major sets of analyses. The first was a qualitative approach, in which conclusions about the correlation were drawn from observing that the rate of return of the cheaper stocks was generally higher than that of the other stocks. The plot of average rate of return over time showed that the qualitatively observed trend applies to the short term as well as the long term, since the difference between the trend lines was constant throughout the time period. Next, an analysis of the volatility and general applicability was conducted. It was determined that cheap stocks were more likely than the other stocks to return a higher rate of return than the market. Lastly, it was determined that the correlation noticed in the first part was weak and negative, in line with the other observations. Thus, the trend is a weak negative one that stays constant over time and can be applied generally, since it is not the result of a few volatile outliers. This confirms the qualitative observations made in the preliminary graphs.

This also fits the intuition about how the price of a stock affects the psychology of the buyer, who buys more shares of the cheaper stock, driving up the price as well as the rate of return. It also aligns with the reason that companies split stocks: to encourage more buying. The sample set also helps explain why the correlation is the way it is. If this analysis had been conducted with all stocks on the New York Stock Exchange (NYSE), the correlation could have been weaker (as many penny stocks would drag down the rates of return). So, assuming that the buyer buys cheaper stocks of a well-established company, this analysis shows that the buyer has a higher chance of making money.
