# Scraping CME Group's Economic Release Calendar

The CME Group economic release calendar is available on [this HTML page](https://www.cmegroup.com/education/events/economic-releases-calendar.html).

However, the calendar itself is loaded via AJAX. Looking through the JavaScript in the page's source, we find the following function:


```
function loadCalendar(monthParam,yearParam) {
	
	var month = (monthParam != undefined) ? monthParam : '-';
	var year = (yearParam != undefined)? yearParam : '-';
	
	var setEconomicCalendar = function(content) {
		$j('#ajaxEconomicCalendar').html(content);
	};
	$j.ajax({
		type: 'get',
		url: '/content/cmegroup/en/education/events/economic-releases-calendar/jcr:content/full-par/cmelayoutfull/full-par/cmeeconomycalendar.ajax.'+month+'.'+year+'.html',
		beforeSend: function() {
			var spinner = '<div class="cmeProgressPanel">Processing...</div>';
			setEconomicCalendar(spinner);
		},
		success: function(data) {
	        setEconomicCalendar(data);
		},
		error: function(xhr, status, error) {
			setEconomicCalendar('An error occurred: ' + error);
		}
	});
}
```
That function led me to the following URL:
https://www.cmegroup.com/content/cmegroup/en/education/events/economic-releases-calendar/jcr:content/full-par/cmelayoutfull/full-par/cmeeconomycalendar.ajax.10.2022.html

Months are numbered 0-11 (so the `10` in that URL is November), which took me a couple more minutes to figure out than I would like to admit lol.
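
To make the month offset concrete, here is a minimal sketch of building that URL from a `datetime.date`. The names `CALENDAR_PATH` and `build_calendar_url` are made up for illustration; the path itself is copied from `loadCalendar()`, and the only real logic is subtracting 1 from the month.

```python
from datetime import date

# Path copied from the loadCalendar() function in the page source
CALENDAR_PATH = (
    "/content/cmegroup/en/education/events/economic-releases-calendar/"
    "jcr:content/full-par/cmelayoutfull/full-par/cmeeconomycalendar.ajax.{month}.{year}.html"
)

def build_calendar_url(target_date, base_url="https://www.cmegroup.com"):
    """Build the AJAX calendar URL for the month containing target_date (hypothetical helper)."""
    # The endpoint numbers months 0-11, while datetime.date numbers them 1-12
    return base_url + CALENDAR_PATH.format(month=target_date.month - 1, year=target_date.year)

# November 2022 should reproduce the ".ajax.10.2022.html" URL
print(build_calendar_url(date(2022, 11, 15)))
```

No request is made here, so this only checks the URL shape, not whether the endpoint still responds.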



A simple HTML page is returned, the body of which contains a number of `<div>` elements. One `<div class="DateLabel">` exists per date of the month, with the date as its text. Between consecutive DateLabel divs are a number of `<div>` elements with `id="Event_1"`, one per event. They all seem to share that id value, but the class attribute distinguishes the respective country, e.g. `class="Event US"`, `class="Event JP"`, `class="Event NZ"`, etc.
Each event div contains at least two, possibly three, child elements:
- The first is an `<a href...>` link to the page on CME's website for that event, whose text again includes the respective country along with a text description of the event.
- The second always-included child is a `<span>` element with `class="Time"` whose text contains the time of release.
- The third, optional child indicates whether the release's page on CME's website includes a table of data (a "report"). If so, an `<img>` tag [pointing](https://www.cmegroup.com/etc/clientlibs/cmegroup/cmegroupClientLibs/images/byreport_butt_new.gif) to a static banner image of the word "Report" is present. We will record whether or not this tag is present.
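
As a rough illustration of that layout, the sketch below walks a hand-written fragment shaped like the description above. The HTML, the hrefs, and the `CalendarSketchParser` class are all made up for this example, not real CME output. It uses the standard library's `html.parser` only so that it runs without a network connection or extra packages; the actual scraper in this notebook uses BeautifulSoup.

```python
from html.parser import HTMLParser

# Hand-written fragment mimicking the described structure (not real CME output)
SAMPLE_HTML = """
<div class="DateLabel">1</div>
<div id="Event_1" class="Event US">
  <a href="/example/us-event.html">US: Example Event</a>
  <span class="Time">10:00 AM ET</span>
  <img src="/example/report.gif">
</div>
<div id="Event_1" class="Event JP">
  <a href="/example/jp-event.html">JP: Another Event</a>
  <span class="Time">7:50 PM ET</span>
</div>
"""

class CalendarSketchParser(HTMLParser):
    """Collects one dict per event div (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.current_day = None  # text of the most recent DateLabel div
        self._event = None       # event dict currently being filled, if any
        self._field = None       # which text field the next data chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "DateLabel" in classes:
            self._field = "day"
        elif tag == "div" and "Event" in classes:
            self._event = {"day": self.current_day, "has_report": False}
            self.events.append(self._event)
        elif self._event is not None:
            if tag == "a":
                self._event["link"] = attrs.get("href")
                self._field = "title"
            elif tag == "span" and "Time" in classes:
                self._field = "time"
            elif tag == "img":
                # The optional third child: its presence marks a "report" release
                self._event["has_report"] = True

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "day":
            self.current_day = text
        else:
            self._event[self._field] = text
        self._field = None

    def handle_endtag(self, tag):
        if tag == "div":
            self._event = None
            self._field = None

parser = CalendarSketchParser()
parser.feed(SAMPLE_HTML)
for event in parser.events:
    print(event)
```

Both sample events share `id="Event_1"`, so the sketch keys off the `Event` class rather than the id.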

In [None]:
import requests
from bs4 import BeautifulSoup
from datetime import date

This first function takes a `datetime.date` object and retrieves the matching monthly economic release calendar. A dictionary is returned, with one key/value pair per date of the month, keyed by the zero-padded day. Each value is a list of events, with each event represented as its own dictionary. Example structure:

```
{
  "00" : [], # Should be empty unless problem reading dates
  "01" : [
    {
      "country" : "US",
      "name" : "Example Event",
      "time" : "12:00 PM ET",
      "link" : "https://....",
      "has_report" : True
    },
    ...
  ],
  "02" : [
    ...
  ],
  ...
}
```
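
For a sense of how a dictionary in this shape might be consumed, here is a small sketch. The sample data and the helper names `events_for_day` and `filter_by_country` are invented for illustration. The point worth noticing is that keys are zero-padded strings, so the day number has to be padded before the lookup.

```python
from datetime import date

# Made-up sample in the shape described above (not real calendar data)
sample_calendar = {
    "00": [],
    "01": [
        {"country": "US", "name": "Example Event", "time": "12:00 PM ET",
         "link": "https://example.com/us", "has_report": True},
        {"country": "JP", "name": "Another Event", "time": "7:50 PM ET",
         "link": "https://example.com/jp", "has_report": False},
    ],
}

def events_for_day(calendar, target_date):
    """Return the event list for target_date's day, or [] if the day is missing."""
    # Keys are zero-padded day strings, so day 1 is stored under "01"
    return calendar.get(f"{target_date.day:02d}", [])

def filter_by_country(events, country):
    """Keep only the events whose country code matches."""
    return [event for event in events if event["country"] == country]

us_events = filter_by_country(events_for_day(sample_calendar, date(2022, 11, 1)), "US")
print([event["name"] for event in us_events])
```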



In [None]:
def get_monthly_econ_cal(target_date):
  master_dict = {}

  # Check input
  if not isinstance(target_date, date): # isinstance also accepts datetime.datetime, which subclasses date
    print("Enter target month by passing a full datetime.date object")
    return master_dict

  # URL and UA
  cme_base_url = "https://www.cmegroup.com"
  calendar_url = "{}/content/cmegroup/en/education/events/economic-releases-calendar/jcr:content/full-par/cmelayoutfull/full-par/cmeeconomycalendar.ajax.{}.{}.html".format(cme_base_url, target_date.month - 1, target_date.year)
  request_headers = { "User-Agent" : "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36" }

  # Get contents
  response = requests.get(url = calendar_url, headers = request_headers)
  response.raise_for_status()
  calendar_soup = BeautifulSoup(response.content, "lxml")

  # Grab all of the div elements, loop through them
  current_date = "00" # If we somehow encounter events before a DateLabel or otherwise fail to read the dates, default to here
  master_dict[current_date] = []
  all_divs = calendar_soup.find_all("div")
  for cur_div in all_divs:

    # When we hit a DateLabel, update the current date we should be appending events to, and initialize its list in the master dict
    if "DateLabel" in cur_div.get("class", []): # class is a multi-valued attribute, so bs4 returns a list; default to [] for divs with no class
      current_date = cur_div.text.strip().zfill(2)
      master_dict[current_date] = []

    # Event
    elif (cur_div.has_attr("id") and "Event_" in cur_div.get("id")):

      # Create a dict for it
      event_dict = {}

      # Grab the link and event name
      event_link = cur_div.find("a", attrs = { "href" : True })
      if not event_link:
        print("Date: {}".format(current_date))
        print("Found event with no link, check format: {}".format(cur_div))
        continue # Skip this event; calling .get() on a missing link below would raise AttributeError
      event_dict["link"] = cme_base_url + event_link.get("href")

      # Split event name into country and event
      # Example event name: "IN: PMI Manufacturing". Split on the first colon only, so names containing ":" stay intact
      country, _, name = event_link.text.partition(":")
      event_dict["country"] = country.strip()
      event_dict["name"] = name.strip()

      # Grab time
      time_tag = cur_div.find("span", attrs = { "class" : "Time" })
      if not time_tag:
        print("Date: {}".format(current_date))
        print("Found event with no time, check format: {}".format(cur_div))
        continue # Skip this event; reading .text from a missing time tag below would raise AttributeError
      event_dict["time"] = time_tag.text.strip()

      # Look for the image indicating a "report" style release
      report_img = cur_div.find("img", attrs = { "src" : True })
      event_dict["has_report"] = report_img is not None

      # Append the dictionary to the current date. The key should be initialized when we hit the DateLabel before the corresponding events
      master_dict[current_date].append(event_dict)

    else:
      pass

  return master_dict

In [None]:
# Target a specific date's calendar
def get_daily_econ_cal(target_date):
  day_list = []

  # Check input
  if not isinstance(target_date, date): # isinstance also accepts datetime.datetime, which subclasses date
    print("Enter target date by passing a full datetime.date object")
    return day_list

  # Get monthly calendar
  monthly_cal = get_monthly_econ_cal(target_date)
  if not monthly_cal:
    print("Failed to get monthly calendar for {}/{}".format(target_date.month, target_date.year))
    return day_list

  # Find the matching date's key
  day_list = monthly_cal.get(str(target_date.day).zfill(2), []) # Keys are zero-padded ("01", "02", ...), so pad the day before the lookup
  return day_list


In [None]:
test_date = date.today()

#print(get_monthly_econ_cal(test_date))
print(get_daily_econ_cal(test_date))