
# ForexFactory: Webscraping of Announcements


To download announcement records from forexfactory.com for a given time window:
1. Enter the year, month, and day of the start date.
2. Enter the year, month, and day of the end date.
3. Choose the time zone. ForexFactory reports announcements in New York time (UTC-5).
  * Set timezone_UStoITA = 0 to keep New York time (UTC-5).
  * Set timezone_UStoITA = 1 to convert times to Rome time (UTC+1).
---
Note that events on the start date that occur before 06:00 Italian time are not reported: ForexFactory lists events in New York time (UTC-5), which is six hours behind Rome (UTC+1), so those events are listed under start_date-1.
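The date shift described in the note can be checked with a minimal sketch that uses only the standard library. It assumes the fixed offsets quoted above (UTC-5 for New York, UTC+1 for Rome) and ignores daylight saving time; the helper name ny_to_rome is hypothetical and not part of the notebook.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the text above; daylight saving time is ignored (assumption).
NEW_YORK = timezone(timedelta(hours=-5))
ROME = timezone(timedelta(hours=1))

def ny_to_rome(year, month, day, hour, minute):
    """Convert a New York wall-clock time to Rome time (hypothetical helper)."""
    ny_time = datetime(year, month, day, hour, minute, tzinfo=NEW_YORK)
    return ny_time.astimezone(ROME)

# An event listed at 20:30 on 31 Dec in New York falls on 1 Jan in Rome.
shifted = ny_to_rome(2023, 12, 31, 20, 30)
print(shifted.strftime("%d/%m/%Y %H:%M"))  # 01/01/2024 02:30
```

An early-morning Rome event on the start date therefore sits on the previous day's ForexFactory page, which the scraper does not request.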




In [None]:
start_year = 2024
start_month = 1
start_day = 1
end_year = 2024
end_month = 2
end_day = 8
timezone_UStoITA = 1

Load packages and modules

The beautifulsoup4 package is needed to parse the HTML of the webpage.

The cloudscraper package is needed to bypass Cloudflare protection.

In [None]:
!pip install beautifulsoup4
!pip install cloudscraper
import cloudscraper
from bs4 import BeautifulSoup
import datetime as dt
from datetime import datetime, timedelta
import pandas as pd
import pytz

Function **url_forexfactory** builds the last part of the URL needed to access the announcement records for a given day. For example, the URL for 05-Feb-2024 is
https://www.forexfactory.com/calendar?day=Feb05.2024, so the function returns the string "calendar?day=Feb05.2024".

In [None]:
def url_forexfactory(data):
    month = data.strftime("%b")
    day = data.strftime("%d")
    year = data.year
    url_ff = f"calendar?day={month}{day}.{year}"
    return url_ff
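As a quick check of the URL format, the sketch below repeats the function in compact form and joins its result with the site root. It assumes an English (or default C) locale, because the %b month abbreviation is locale-dependent; the names base_url and full_url are illustrative only.

```python
import datetime as dt

def url_forexfactory(data):
    # Same logic as the cell above: abbreviated month, zero-padded day, four-digit year.
    return f"calendar?day={data.strftime('%b')}{data.strftime('%d')}.{data.year}"

base_url = "https://www.forexfactory.com/"
full_url = base_url + url_forexfactory(dt.date(2024, 2, 5))
print(full_url)  # https://www.forexfactory.com/calendar?day=Feb05.2024
```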

Function **date_range** builds the list of dates from start_date up to, but not including, end_date (the loop stops as soon as the current date reaches end_date).

In [None]:
def date_range(start_date, end_date):
    date_list = []
    current_date = start_date
    while current_date < end_date:
        date_list.append(current_date)
        current_date += dt.timedelta(days=1)
    return date_list
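A short usage sketch makes the boundary behavior concrete. It repeats the function so that the block runs on its own; the sample dates are arbitrary.

```python
import datetime as dt

def date_range(start_date, end_date):
    # Same loop as the cell above; the strict "<" leaves end_date out.
    date_list = []
    current_date = start_date
    while current_date < end_date:
        date_list.append(current_date)
        current_date += dt.timedelta(days=1)
    return date_list

days = date_range(dt.date(2024, 1, 30), dt.date(2024, 2, 2))
print([d.isoformat() for d in days])  # ['2024-01-30', '2024-01-31', '2024-02-01']
```

If the end date should be scraped as well, one option is to pass end_date + dt.timedelta(days=1) as the second argument.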

The code loops over each date in the time window and retrieves that day's announcement records. Each attribute of an announcement (e.g. currency, time) is stored in its own list.

In [None]:
event_date = []
event_time = []
currency = []
forecast = []
previous = []
actual = []
impact = []
event = []
start_date = dt.date(start_year, start_month, start_day)
end_date = dt.date(end_year, end_month, end_day)
date_list = date_range(start_date, end_date)
scraper = cloudscraper.create_scraper()  # one scraper is reused for every request
for j in date_list:
  url = 'https://www.forexfactory.com/' + url_forexfactory(j)
  page = scraper.get(url).text
  soup = BeautifulSoup(page, 'html.parser')
  table = soup.find('table', class_='calendar__table')
  # Look up each column once per page rather than once per event
  event_cells = table.find_all('td', class_='calendar__event')
  currency_cells = table.find_all('td', class_='calendar__currency')
  forecast_cells = table.find_all('td', class_='calendar__forecast')
  previous_cells = table.find_all('td', class_='calendar__previous')
  actual_cells = table.find_all('td', class_='calendar__actual')
  impact_cells = table.find_all('td', class_='calendar__impact')
  time_cells = table.find_all('td', class_='calendar__time')

  for i in range(len(event_cells)):
    event_date.append(j.strftime("%d/%m/%Y"))
    currency.append(currency_cells[i].text.strip())
    forecast.append(forecast_cells[i].text.strip())
    previous.append(previous_cells[i].text.strip())
    actual.append(actual_cells[i].text.strip())
    event.append(event_cells[i].text.strip())
    # The second CSS class of the impact icon encodes the color
    impact.append(impact_cells[i].find_next('span')['class'][1])
    event_time.append(time_cells[i].text.strip())
  print(j)  # progress indicator

The "Impact" attribute is refined to display the color as Red, Orange, Yellow, or Gray.

In [None]:
for i in range(len(impact)):
  color = impact[i][16:]  # text after the first 16 characters of the class name
  if color == 'red':
    impact[i] = 'Red'
  elif color == 'ora':
    impact[i] = 'Orange'
  elif color == 'yel':
    impact[i] = 'Yellow'
  elif color == 'gra':
    impact[i] = 'Gray'
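The same mapping can be written as a dictionary lookup. The sketch below is one possible alternative, not the notebook's method; it assumes the scraped class names look like "icon--ff-impact-red" (a 16-character prefix followed by the color suffix), and the helper impact_label is hypothetical.

```python
# Assumed class-name format: a 16-character prefix followed by a color suffix.
PREFIX_LENGTH = 16
IMPACT_COLORS = {"red": "Red", "ora": "Orange", "yel": "Yellow", "gra": "Gray"}

def impact_label(css_class):
    """Return the color name for an impact class (hypothetical helper)."""
    suffix = css_class[PREFIX_LENGTH:]
    # Unknown suffixes are returned unchanged, as in the if/elif chain above.
    return IMPACT_COLORS.get(suffix, css_class)

print(impact_label("icon--ff-impact-red"))  # Red
print(impact_label("icon--ff-impact-ora"))  # Orange
```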

The "Time" attribute is converted to 24-hour HH:MM format.

If timezone_UStoITA = 1, the code also converts each time from New York time to Rome time (UTC+1).

In [None]:
for i in range(len(event_time)):
  # Convert 12-hour times such as "8:30am" to 24-hour "08:30"
  if event_time[i].endswith(('am', 'pm')):
    event_time[i] = datetime.strptime(event_time[i], "%I:%M%p").strftime("%H:%M")
  # An event with an empty time cell takes the time of the previous event
  if event_time[i] == '':
    event_time[i] = event_time[i-1]
  # An "All Day" event that already has an actual value also takes the previous time
  if event_time[i] == 'All Day' and actual[i] != '':
    event_time[i] = event_time[i-1]

if timezone_UStoITA == 1:
  ny_tz = pytz.timezone('America/New_York')
  rome_tz = pytz.timezone('Europe/Rome')
  for i in range(len(event_time)):
    # Only entries in HH:MM form are converted; labels such as "All Day" are left as-is
    if event_time[i][2:3] == ':':
      date_obj = datetime.strptime(f"{event_date[i]} {event_time[i]}", '%d/%m/%Y %H:%M')
      # localize() attaches the correct New York offset; passing a pytz zone
      # as tzinfo would use local mean time, which is about 4 minutes off
      timestamp_ny = ny_tz.localize(date_obj)
      timestamp_rome = timestamp_ny.astimezone(rome_tz)
      event_date[i] = timestamp_rome.strftime("%d/%m/%Y")
      event_time[i] = timestamp_rome.strftime("%H:%M")
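The 12-hour to 24-hour conversion and the fill-forward of empty time cells can be tried in isolation. The sketch below is a simplified illustration, not the notebook's exact logic: the helper normalize_times is hypothetical, it leaves a leading blank untouched, and it omits the "All Day" rule that depends on the actual column.

```python
from datetime import datetime

def normalize_times(raw_times):
    """Convert '8:30am'-style strings to 'HH:MM' and fill blanks (hypothetical helper)."""
    cleaned = []
    for raw in raw_times:
        if raw.endswith(("am", "pm")):
            # %I is the 12-hour clock hour, %p the am/pm marker
            value = datetime.strptime(raw, "%I:%M%p").strftime("%H:%M")
        elif raw == "" and cleaned:
            # A blank cell inherits the time of the previous event
            value = cleaned[-1]
        else:
            value = raw
        cleaned.append(value)
    return cleaned

print(normalize_times(["8:30am", "", "2:15pm", "All Day"]))
# ['08:30', '08:30', '14:15', 'All Day']
```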

Event records are stored in a dataframe and saved to the Excel spreadsheet "FF_startdate_enddate.xlsx"

In [None]:
month_file_start = start_date.strftime("%b")
day_file_start = start_date.strftime("%d")
year_file_start = start_date.year
month_file_end = end_date.strftime("%b")
day_file_end = end_date.strftime("%d")
year_file_end = end_date.year
xlsx_file = f"FF_{day_file_start}{month_file_start}{year_file_start}_{day_file_end}{month_file_end}{year_file_end}.xlsx"
ff_data = {'Date': event_date, 'Time': event_time, 'Event': event, 'Currency': currency, 'Impact': impact, 'Actual': actual, 'Forecast': forecast, 'Previous': previous}
ff_dataframe = pd.DataFrame(data=ff_data)
ff_dataframe.to_excel(xlsx_file,index=False)
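For reference, the file name can also be assembled in a single expression, since a date object accepts strftime codes directly inside an f-string. The sketch below is an equivalent alternative using the default dates from the first cell, and it assumes an English (or default C) locale for the %b month abbreviation.

```python
import datetime as dt

start_date = dt.date(2024, 1, 1)
end_date = dt.date(2024, 2, 8)

# A format spec after the colon is passed to date.strftime.
xlsx_file = f"FF_{start_date:%d%b%Y}_{end_date:%d%b%Y}.xlsx"
print(xlsx_file)  # FF_01Jan2024_08Feb2024.xlsx
```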