# Stock Market analysis with Yahoo Finance
---
## Goal
This project illustrates how to analyse the behaviour of stock market tickers using data from a trusted source such as Yahoo Finance, so that we can make better-informed investment decisions.
## Importing the necessary libraries
For this exercise we will import 'pandas' for working with the dataset, 'yfinance' for downloading market data from Yahoo Finance, and 'datetime' for its date and time classes, which we will use to establish the time span of the data we request. Later on, we will also import 'plotly.express' for our visualizations.

In [4]:
```python
import pandas as pd
import yfinance as yf
from datetime import datetime
```

## Establishing the time span
In the following lines we create two variables. 'start_date' is the past date our analysis starts from: the current date, _datetime.now()_, minus three months, _pd.DateOffset(months=3)_. 'end_date' is simply today's date, _datetime.now()_.

In [5]:
```python
start_date = datetime.now() - pd.DateOffset(months=3)
end_date = datetime.now()
```
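To see what this arithmetic produces, the sketch below replaces `datetime.now()` with a fixed, hypothetical "today" so the result is reproducible. It also shows that `pd.DateOffset(months=3)` moves by calendar months (not a fixed number of days) and clips to the last valid day when the target month is shorter.

```python
import pandas as pd
from datetime import datetime

# Hypothetical fixed "today", used instead of datetime.now() so the output is reproducible
today = datetime(2023, 9, 26, 10, 30)

start = today - pd.DateOffset(months=3)
print(start)  # 2023-06-26 10:30:00 (a pandas Timestamp; the time of day is kept)

# Calendar-month arithmetic clips to the end of a shorter month
print(pd.Timestamp("2023-05-31") - pd.DateOffset(months=3))  # 2023-02-28 00:00:00
```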

## Deciding our tickers
For this example we will analyse the following tickers: Amazon (AMZN), Gold futures (GC=F), Tesla (TSLA), and Meta Platforms (META).

In [6]:
```python
tickers = ['AMZN', 'GC=F', 'TSLA', 'META']
```

In [7]:
```python
df_list = []
```

## Data Retrieval Loop

The 'df_list = []' line initializes an empty list called _df_list_ where we'll store the historical data for each ticker symbol.
Then the loop 'for ticker in tickers:' iterates through the symbols in the tickers list one at a time.
Inside the loop, 'data = yf.download(ticker, start=start_date, end=end_date)' uses the yf.download() function to fetch the historical data for the current ticker. The start_date and end_date variables (three months ago and today, relative to whenever the code is run) define the date range to retrieve; the yfinance documentation describes 'end' as exclusive, so today's session is not part of the result.
Finally, 'df_list.append(data)' adds the current ticker's DataFrame to 'df_list', and the process repeats for each ticker in the list.

In [8]:
```python
for ticker in tickers:
    # Daily price history (Open, High, Low, Close, Volume, ...) for one ticker
    data = yf.download(ticker, start=start_date, end=end_date)
    df_list.append(data)
```

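One fragility of the loop above is that `df_list` and `tickers` are matched up only by position. If a download came back empty (for example, for a mistyped symbol) and you chose to skip it, `keys=tickers` in the next step would no longer line up with the data. A minimal sketch, assuming a hypothetical `fetch` function that stands in for `yf.download` so it runs offline, collects the frames in a dictionary keyed by ticker instead:

```python
import pandas as pd

def fetch(ticker: str) -> pd.DataFrame:
    """Hypothetical stand-in for yf.download(ticker, start=..., end=...).

    Returns a tiny made-up price table, or an empty frame for the fake symbol "BAD".
    """
    if ticker == "BAD":
        return pd.DataFrame()
    dates = pd.DatetimeIndex(["2023-06-26", "2023-06-27"], name="Date")
    return pd.DataFrame({"Close": [1.0, 2.0]}, index=dates)

tickers = ["AMZN", "BAD", "TSLA"]

frames = {}
for ticker in tickers:
    data = fetch(ticker)
    if data.empty:
        # Skip symbols that returned no rows instead of storing an empty frame
        print(f"No data for {ticker}, skipping")
        continue
    frames[ticker] = data

print(list(frames))  # ['AMZN', 'TSLA']
```

Because `pd.concat` also accepts a mapping, the dictionary can be passed as `pd.concat(frames, names=['Ticker', 'Date'])`; its keys become the outer index level, so the labels always match the data.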


Next, the lines below combine the historical data for our assets (each stored in a separate DataFrame within df_list) into a single DataFrame, df, with a multi-level index. The 'keys=tickers' argument labels each block of rows with its ticker, and 'names' sets the two index levels to 'Ticker' and 'Date', which makes it easy to access and analyse each asset's history over time. The print(df.head()) statement displays the first five rows of the combined DataFrame as an initial overview.

In [9]:
```python
df = pd.concat(df_list, keys=tickers, names=['Ticker', 'Date'])
print(df.head())
```

```
                         Open        High         Low       Close   Adj Close  \
Ticker Date                                                                     
AMZN   2023-06-26  129.330002  131.490005  127.099998  127.330002  127.330002   
       2023-06-27  128.630005  130.089996  127.550003  129.179993  129.179993   
       2023-06-28  128.940002  131.479996  128.440002  129.039993  129.039993   
       2023-06-29  128.770004  129.259995  127.260002  127.900002  127.900002   
       2023-06-30  129.470001  131.250000  128.949997  130.360001  130.360001   

                     Volume  
Ticker Date                  
AMZN   2023-06-26  59989300  
       2023-06-27  46801000  
       2023-06-28  52149500  
       2023-06-29  40761000  
       2023-06-30  54310500  
```
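The effect of `keys` and `names` is easier to see on a toy example. The sketch below uses two made-up frames shaped like the yfinance output (a `Date` index and a price column); the values are illustrative only. Once the frames are concatenated, `.loc` on the outer level returns one ticker's history, and `xs` selects a single date across all tickers.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2023-06-26", "2023-06-27"], name="Date")
# Made-up closing prices, used only to illustrate the index structure
amzn = pd.DataFrame({"Close": [100.0, 101.0]}, index=dates)
tsla = pd.DataFrame({"Close": [200.0, 198.0]}, index=dates)

df = pd.concat([amzn, tsla], keys=["AMZN", "TSLA"], names=["Ticker", "Date"])
print(list(df.index.names))  # ['Ticker', 'Date']

# All rows for one ticker (the 'Ticker' level is dropped in the result)
print(df.loc["TSLA"])

# One date across every ticker (the 'Date' level is dropped in the result)
print(df.xs(pd.Timestamp("2023-06-27"), level="Date"))
```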


Then we add the line 'df = df.reset_index()', which turns the two index levels created above ('Ticker' and 'Date') back into regular columns and gives the DataFrame a default integer index (0, 1, 2, ..., n-1). Having 'Ticker' and 'Date' as columns is what the plotting code later expects. The printed output from print(df.head()) shows this new structure.

In [10]:
```python
df = df.reset_index()
print(df.head())
```

```
  Ticker       Date        Open        High         Low       Close  \
0   AMZN 2023-06-26  129.330002  131.490005  127.099998  127.330002   
1   AMZN 2023-06-27  128.630005  130.089996  127.550003  129.179993   
2   AMZN 2023-06-28  128.940002  131.479996  128.440002  129.039993   
3   AMZN 2023-06-29  128.770004  129.259995  127.260002  127.900002   
4   AMZN 2023-06-30  129.470001  131.250000  128.949997  130.360001   

    Adj Close    Volume  
0  127.330002  59989300  
1  129.179993  46801000  
2  129.039993  52149500  
3  127.900002  40761000  
4  130.360001  54310500  
```
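A compact, made-up example shows what `reset_index()` does to a two-level index: both levels become ordinary columns, in their original order and ahead of the data columns, and a fresh 0 to n-1 integer index replaces them.

```python
import pandas as pd

# Made-up two-ticker frame with the same ('Ticker', 'Date') MultiIndex as above
index = pd.MultiIndex.from_product(
    [["AMZN", "TSLA"], pd.to_datetime(["2023-06-26", "2023-06-27"])],
    names=["Ticker", "Date"],
)
df = pd.DataFrame({"Close": [100.0, 101.0, 200.0, 198.0]}, index=index)

flat = df.reset_index()
print(flat.columns.tolist())  # ['Ticker', 'Date', 'Close']
print(flat.index.tolist())    # [0, 1, 2, 3]
```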


## Importing plotly.express
Plotly Express (the plotly.express module) is the high-level interface of Plotly, a popular Python graphing library, for creating interactive data visualizations. It aims to simplify the creation of common chart types, including scatter plots, line charts, bar charts, pie charts, histograms, box plots, violin plots, polar plots, 3D scatter plots, and geographic maps (choropleth maps, scattergeo, etc.).  
Plotly Express is known for its simplicity and conciseness. With just a few lines of code, you can create interactive visualizations that can be easily customized and shared. It's particularly useful for data exploration and quick prototyping of charts.  
Now, let's type in the line to import our library.

In [11]:
```python
import plotly.express as px
```

## Creating a line chart for comparison
The next lines create and display a line chart of the stock market data. Let's break it down:
1. 'fig' is a variable that will hold our line chart.
1. 'px.line' is the Plotly Express function that creates the line chart.
1. 'df' is the DataFrame holding the stock market data, including the 'Date', 'Close', and 'Ticker' columns.
1. "x='Date'" uses the 'Date' column for the horizontal (x-axis) values.
1. "y='Close'" uses the 'Close' column for the vertical (y-axis) values, i.e. each day's closing price.
1. "color='Ticker'" color-codes the chart by the 'Ticker' column: each unique ticker gets its own line in a different color.
1. 'title="Stock Market Performance for the Last 3 Months"' sets the chart title.
1. Finally, 'fig.show()' displays the resulting chart.
  
In summary, this code creates a line chart showing how our four tickers performed over the last 3 months. Each ticker is represented by a line, so you can see how its closing price changed over time.

In [12]:
```python
fig = px.line(df, x='Date', 
              y='Close', 
              color='Ticker', 
              title="Stock Market Performance for the Last 3 Months")
fig.show()
```

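Because the four tickers trade at very different price levels (in this period gold futures were quoted far above the three stocks), raw closing prices on one shared y-axis make small relative moves hard to compare. One common option, not part of the original analysis, is to rebase each ticker to its first close so every line starts at 1.0. A minimal sketch on made-up data, using `groupby(...).transform` so the result stays aligned row by row with the long-format DataFrame:

```python
import pandas as pd

# Made-up long-format data with the same 'Ticker', 'Date', 'Close' columns as df
df = pd.DataFrame({
    "Ticker": ["AMZN", "AMZN", "AMZN", "GC=F", "GC=F", "GC=F"],
    "Date": pd.to_datetime(["2023-06-26", "2023-06-27", "2023-06-28"] * 2),
    "Close": [100.0, 110.0, 105.0, 2000.0, 2020.0, 1980.0],
})

# Sort so that each ticker's "first" close is its earliest date
df = df.sort_values(["Ticker", "Date"])

# Divide each close by that ticker's first close: every series starts at 1.0
df["Normalized"] = df.groupby("Ticker")["Close"].transform(lambda s: s / s.iloc[0])
print(df["Normalized"].round(3).tolist())  # [1.0, 1.1, 1.05, 1.0, 1.01, 0.99]
```

Passing `y='Normalized'` instead of `y='Close'` to `px.line` would then plot relative performance, where 1.10 means a 10% gain since the start date.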


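## Comparing the tickers in separate panels
The next cell draws an area chart with 'px.area' and uses "facet_col='Ticker'" to give each ticker its own panel. The 'labels' dictionary renames the axis and legend titles, so 'Close' is shown as 'Closing Price' and 'Ticker' as 'Company'. Note that Plotly Express facet panels share the y-axis by default, so the much higher price level of gold futures (GC=F) flattens the other three panels; calling 'fig.update_yaxes(matches=None, showticklabels=True)' before 'fig.show()' gives each panel its own scale.

In [14]: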
```python
fig = px.area(df, x='Date', y='Close', color='Ticker',
              facet_col='Ticker',
              labels={'Date':'Date', 'Close':'Closing Price', 'Ticker':'Company'},
              title='Stock Prices for Amazon, Gold, Tesla, and Meta')
fig.show()
```


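Running the plotting cells may print a FutureWarning from pandas: "The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray." The source line reported with it, 'v = v.dt.to_pydatetime()', appears to come from Plotly's internal handling of date columns rather than from this notebook, and the warning does not affect the charts.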

