# References

[SMB Capital video: combining a credit spread with a good one-day directional signal](https://www.youtube.com/watch?v=qabKcPmwjEA)

[How to use BeautifulSoup (`bs`) to scrape technical analysis signals for different time frames from investing.com](https://stackoverflow.com/a/57490089/15373476)

[Why not to use this strategy on AM-settled (monthly) index options](https://support.tastyworks.com/support/solutions/articles/43000435308-settlement-and-expiration-of-cash-settled-index-options)

[Cash-settled index options specifications (monthly/weekly)](https://support.tastyworks.com/support/solutions/articles/43000435289?_sp=aec5afae-5774-4ba0-a71c-801743f9e2cb.1643836338440)

# Directional Signal from Investing.com

I will use the daily and weekly technical analysis (TA) signals provided by www.investing.com. The website updates these signals around the market close (`03:59PM GMT`).

The code in this section scrapes the signal data from the website and appends it to a CSV file.

### How to scrape

The website generates the signals for each period with JavaScript, so they are not in the page's initial HTML. I scraped them as follows:

- Use the browser's dev tools to inspect the network activity of SPX's TA webpage, e.g. the requests triggered by clicking the "5 hour", "daily" or "weekly" links:
    - Open the `Network` tab in the dev tools
    - Press the `Clear` button to clear the existing activity logs, so the new entries triggered by the click are easier to spot
    - `headers` and `payload` show the information needed to build the `POST` request later
    - `response` shows the elements that contain the signal data
- Use the data found in `headers` and `payload` to build a `POST` request and send it to the website's server. The server responds with a new HTML string; parse this string to extract the signal data.
- Repeat the previous step for every period of interest.
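The request-building step can be sketched offline with the standard library alone. This is a minimal illustration, not the script used here: it builds one `POST` request per period for SPX without sending anything. The endpoint, header values, and codes are copied from the script later in this section; the helper name `build_request` and the constant names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Values copied from the script later in this section
ENDPOINT = 'https://www.investing.com/technical/Service/GetStudiesContent'
PERIOD_CODES = {'5hr': 18000, 'daily': 86400, 'weekly': 'week'}
SPX_PAIR_ID = 166
SPX_PAGE = 'https://www.investing.com/technical/us-spx-500-technical-analysis'


def build_request(pair_id: int, period_code, referer: str) -> Request:
    """Build (but do not send) the POST request for one pair and period."""
    form = {'pairID': pair_id, 'period': period_code, 'viewType': 'normal'}
    headers = {
        'User-Agent': 'Mozilla/5.0',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': referer,
        'X-Requested-With': 'XMLHttpRequest',
    }
    # Passing `data` makes urllib issue a POST instead of a GET
    return Request(ENDPOINT, data=urlencode(form).encode('ascii'), headers=headers)


# One request per period, as in the last step of the list
requests_by_period = {
    label: build_request(SPX_PAIR_ID, code, SPX_PAGE)
    for label, code in PERIOD_CODES.items()
}

for label, req in requests_by_period.items():
    print(label, req.get_method(), req.data.decode('ascii'))
```

Comparing the printed form body with the `payload` tab in the dev tools is a quick way to check that the request matches what the browser sends.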


### Run Script Automatically on Schedule using `cron`

**The following script is saved under `/data_pipelines/ta_signals.py`** and runs in the `quantra` conda virtual environment. I use `cron` to schedule the script to run at:

- 6:30 am Mon-Fri (open of the market)
- 1:00 pm Mon-Fri (close of the market)

So that `cron` runs the script with the Python interpreter of the virtual environment, I added the shebang line `#!/Users/catelinn/miniconda3/envs/quantra/bin/python` at the very top of the file.
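As a rough sketch, the two run times translate into crontab entries as shown here. The helper `cron_line` is hypothetical and only formats the five cron time fields plus the command. The script path is the one given above, and `1-5` is assumed to stand for Monday to Friday.

```python
# Hypothetical helper: format crontab entries for the two run times above
SCRIPT = '/data_pipelines/ta_signals.py'  # path as given in the text


def cron_line(hour: int, minute: int, command: str, days: str = '1-5') -> str:
    """Return one crontab entry: minute hour day-of-month month day-of-week command."""
    return f'{minute} {hour} * * {days} {command}'


entries = [
    cron_line(6, 30, SCRIPT),   # 6:30 am Mon-Fri
    cron_line(13, 0, SCRIPT),   # 1:00 pm Mon-Fri
]
print('\n'.join(entries))
```

The printed lines can be pasted into the editor opened by `crontab -e`. `cron` interprets the times in the machine's local time zone, and the script file needs execute permission (`chmod +x`) for the shebang line to take effect.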

```python
#!/Users/catelinn/miniconda3/envs/quantra/bin/python
'''
Scrape the technical analysis data from www.investing.com
for indices such as SPX and DJA.
'''

import csv
import re

import requests
from bs4 import BeautifulSoup as bs


# Each asset's pairID and the period codes
# can be found in the `payload` tab of the network log details
periodLabels = {'5hr': 18000, 'daily': 86400, 'weekly': 'week'}
pairIDs = {'SPX': 166, 'DJA': 169}
urls = {'SPX': 'https://www.investing.com/technical/us-spx-500-technical-analysis',
        'DJA': 'https://www.investing.com/technical/dj-30-indices-technical-analysis'}


# Simulate the POST request that makes the web server generate the TA signal
# page for the specified pair and period; the page comes back as HTML
# in the response
def fetch(pair: str = 'SPX', period: str = 'weekly') -> list:

    # header information can be found in Chrome dev tools -> `Network` tab
    headers = {'User-Agent': 'Mozilla/5.0',
               'Content-Type': 'application/x-www-form-urlencoded',
               'Referer': urls[pair],
               'X-Requested-With': 'XMLHttpRequest'}

    body = {'pairID': pairIDs[pair], 'period': periodLabels[period], 'viewType': 'normal'}

    with requests.Session() as s:
        # send the POST request; the timeout keeps a scheduled run from hanging
        r = s.post('https://www.investing.com/technical/Service/GetStudiesContent',
                   data=body, headers=headers, timeout=30)
        r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        # parse the response
        soup = bs(r.content, 'lxml')
        # the summary signal is the last <span> of the last `.summary` element
        signal = soup.select('#techStudiesInnerWrap .summary')[-1].select('span')[-1].text
        # when the website last updated the signal: drop everything after AM/PM.
        # Anchoring on [AP]M avoids cutting at the 'M' of a month name (Mar, May)
        date = re.sub(r'([AP]M).*$', r'\1', soup.find('div', id='updateTime').text)

    return [pair, period, signal, date]

#print(fetch(period='daily'))

def save(data: list) -> None:

    f_path = '/Volumes/ExtremeSSD/github_repos/01_Trading_app_projects/data_pipelines/outputs/signals.csv'

    # Append mode creates the file if it does not exist yet.
    # csv.writer quotes any field that contains a comma, so each row keeps four columns
    with open(f_path, 'a', newline='') as f:
        csv.writer(f, lineterminator='\n').writerow(data)


periods = ['daily', 'weekly']
pairs = ['SPX', 'DJA']


if __name__ == '__main__':
    for pair in pairs:
        for period in periods:
            data = fetch(pair, period)
            save(data)
            print(f'{data[0]} for {data[1]} updated on {data[3]}')
```
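Since the script appends one row per pair and period on every run, the most recent signal is the last matching row in the file. The sketch below shows one way to read it back. The function name `latest_signals` and the sample rows are hypothetical (the signal labels and dates are made up for illustration); only the column order pair, period, signal, date comes from `save()`.

```python
import csv
import io

# Hypothetical sample rows in the pair, period, signal, date layout written by save()
SAMPLE = '''SPX,daily,Strong Buy,"Feb 01, 2022 09:00PM"
SPX,weekly,Buy,Feb 02, 2022 09:00PM
SPX,daily,Sell,"Feb 02, 2022 09:00PM"
'''


def latest_signals(lines) -> dict:
    """Map (pair, period) to the most recently appended (signal, date)."""
    latest = {}
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip blank or incomplete rows
        # A date written without quotes splits on its own comma,
        # so rejoin everything after the third column
        pair, period, signal, *date_parts = row
        latest[(pair, period)] = (signal, ','.join(date_parts))
    return latest


latest = latest_signals(io.StringIO(SAMPLE))
print(latest[('SPX', 'daily')])
```

To read the real file, pass an open file object instead of the `io.StringIO` sample, e.g. `open(path, newline='')` with the path used in `save()`.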
