
# Financial Data Collection using Yahoo Finance

This notebook implements a script for collecting financial data using Yahoo Finance. It follows these steps:

1. Importing necessary libraries.
2. Collecting S&P 500 symbols.
3. Defining the period for historical data.
4. Filtering valid symbols (symbols on the Wikipedia list for which Yahoo Finance returns data in the given period).
5. Collecting data for valid symbols.

## 1. Importing Necessary Libraries

```python
import yfinance as yf # Open-source library for downloading market data from Yahoo Finance
import pandas as pd
import numpy as np
```

## 2. Collecting S&P 500 Symbols

The script reads symbols of S&P 500 companies from a Wikipedia page and stores them in a list.

```python
sp500_symbols = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]['Symbol'].tolist()
```
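Wikipedia writes share-class tickers with a dot (for example `BRK.B`), whereas Yahoo Finance uses a hyphen (`BRK-B`), so such symbols may come back empty unless they are converted first. The helper below is a minimal sketch of that conversion; the name `to_yahoo_symbols` and the sample list are illustrative assumptions, not part of the original script.

```python
def to_yahoo_symbols(symbols):
    """Replace the dot in share-class tickers with a hyphen (BRK.B -> BRK-B)."""
    return [str(symbol).strip().replace(".", "-") for symbol in symbols]


# Hypothetical sample standing in for the scraped list
sample = ["AAPL", "BRK.B", "BF.B"]
print(to_yahoo_symbols(sample))  # ['AAPL', 'BRK-B', 'BF-B']
```

In the notebook, this could be applied as `sp500_symbols = to_yahoo_symbols(sp500_symbols)` before any data is requested.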

## 3. Defining the Period for Historical Data

Here we define the time period for which we want to collect historical data.

```python
start_date = "2018-01-01"
end_date = "2019-01-01"
```
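Note that yfinance treats the `end` date as exclusive, so the range above covers 2018-01-01 through 2018-12-31. As a small sanity check, the sketch below parses both strings and confirms that the range is not empty; the function name `period_length_days` is a hypothetical helper introduced here for illustration.

```python
from datetime import date


def period_length_days(start, end):
    """Return the number of calendar days in [start, end); raise if the range is empty."""
    start_day = date.fromisoformat(start)
    end_day = date.fromisoformat(end)
    if end_day <= start_day:
        raise ValueError(f"end ({end}) must be later than start ({start})")
    return (end_day - start_day).days


print(period_length_days("2018-01-01", "2019-01-01"))  # 365
```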

## 4. Filtering Symbols

The script separates valid symbols from invalid ones by requesting historical data for each symbol and checking whether any rows come back, while trying not to overload the API.

```python
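import time

valid_symbols = []
invalid_symbols = []

for symbol in sp500_symbols:
    try:
        # One possible way to fetch the data: a single-symbol download for the period
        data = yf.download(symbol, start=start_date, end=end_date, progress=False)
        if not data.empty:
            valid_symbols.append(symbol)
        else:
            # Nothing returned for this period: treat the symbol as invalid
            invalid_symbols.append(symbol)
    except Exception:
        # Handling invalid symbols (download errors)
        invalid_symbols.append(symbol)
    time.sleep(0.5)  # assumed pause between requests to avoid overloading the API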
```
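One way to put less load on the API is to send fewer, larger requests by grouping the symbols into batches. The generator below is a sketch of that idea only; `chunked`, the batch size, and the sample list are assumptions made for illustration and are not taken from the original script.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical sample; in the notebook this would be sp500_symbols
sample = ["AAPL", "MSFT", "GOOG", "AMZN", "META"]
for batch in chunked(sample, 2):
    print(batch)
```

Each batch could then be passed to `yf.download` as a list, with a short pause between batches.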

## 5. Additional Steps After Filtering Symbols

### Saving Data for Valid Symbols

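We fetch the data for all symbols in `valid_symbols` in a single call, at a daily (`1d`) interval that you can change, and write the adjusted closing prices to a CSV file named 'data_stock.csv', with the date as the first column. An existing file with that name is overwritten.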

```python
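# interval="1d" requests daily bars; with auto_adjust=True, 'Close' holds adjusted prices
data = yf.download(valid_symbols, start=start_date, end=end_date, interval="1d", auto_adjust=True)['Close']
data = data.reset_index()  # turn the date index into a regular column
data.to_csv("data/raw_data/data_stock.csv", index=False)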
```
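Neither `to_csv` nor `open(..., "w")` creates missing directories, so writing to `data/raw_data/` fails if that folder does not exist yet. A minimal sketch of a guard follows; `ensure_parent_dir` is a hypothetical helper name introduced for this example.

```python
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the parent directory of `file_path` if needed and return the path."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

It could be used as `data.to_csv(ensure_parent_dir("data/raw_data/data_stock.csv"), index=False)`, and in the same way for the text file of invalid symbols.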

### Storing Invalid Symbols

The list of `invalid_symbols` is written into a text file named 'invalid_symbols.txt'.

```python
with open("data/raw_data/invalid_symbols.txt", "w") as f:
    f.write("\n".join(invalid_symbols))
```

## Conclusion

At the end of this process, you will have a CSV file `data_stock.csv` with the daily closing prices for all valid symbols and a text file `invalid_symbols.txt` listing the symbols for which data was not available.
