# CUNY Fall 2021 Academic Calendar Scraper

This notebook scrapes the CUNY Fall 2021 academic calendar from the CCNY registrar website and builds a pandas DataFrame from the calendar table.

**Objective**: Create a DataFrame with:
- Index: Python date objects
- Column 'dow': Day of the week
- Column 'text': Event description
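
As a rough illustration of the target shape, the sketch below builds a two-row DataFrame by hand. The variable names `sample` and `target` are hypothetical. The year 2021 is an assumption taken from the calendar's title, since the page lists only month and day. The event text of the first row is kept in the truncated form in which it appears in the preview output of this notebook.

```python
from datetime import date

import pandas as pd

# Hand-typed sample rows (assumed year: 2021); not scraped, for illustration only.
sample = [
    (date(2021, 8, 1), "Sunday", "Application for degree for January and February 20..."),
    (date(2021, 8, 18), "Wednesday", "Last day to apply for Study Abroad"),
]

# Index holds Python date objects; 'dow' and 'text' are ordinary string columns.
target = pd.DataFrame(
    [{"dow": dow, "text": text} for _, dow, text in sample],
    index=[day for day, _, _ in sample],
)
print(target)
```

Because the index holds plain `datetime.date` objects rather than pandas timestamps, pandas stores it with the generic `object` dtype.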


## Import Required Libraries


In [3]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
import re


## Scrape the Calendar Data


In [4]:
# URL of the CUNY Fall 2021 academic calendar
url = "https://www.ccny.cuny.edu/registrar/fall"

# Send a GET request; the timeout keeps the cell from hanging if the server is slow
response = requests.get(url, timeout=30)
print(f"Status Code: {response.status_code}")

# Check if the request was successful
if response.status_code == 200:
    print("Successfully retrieved the webpage!")
else:
    print("Failed to retrieve the webpage.")


Status Code: 200
Successfully retrieved the webpage!


In [5]:
# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Find the calendar table: soup.find('table') returns the first <table> on the page,
# which is assumed here to be the calendar
table = soup.find('table')

if table:
    print("Found calendar table!")
    # Examine the table structure
    rows = table.find_all('tr')
    print(f"Number of rows found: {len(rows)}")
    
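    # Print the first few rows to understand the structure.
    # Note: "..." is appended to every preview, even when the cell text is under 50 characters.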
    for i, row in enumerate(rows[:3]):
        cells = row.find_all(['td', 'th'])
        print(f"Row {i}: {len(cells)} cells")
        for j, cell in enumerate(cells):
            print(f"  Cell {j}: {cell.get_text().strip()[:50]}...")
else:
    print("No table found - let's examine the page structure")
    print("Page title:", soup.title.get_text() if soup.title else "No title found")

Found calendar table!
Number of rows found: 37
Row 0: 3 cells
  Cell 0: DATES...
  Cell 1: DAYS...
  Cell 2: ...
Row 1: 3 cells
  Cell 0: August 01...
  Cell 1: Sunday...
  Cell 2: Application for degree for January and February 20...
Row 2: 3 cells
  Cell 0: August 18...
  Cell 1: Wednesday...
  Cell 2: Last day to apply for Study Abroad...
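
The preview suggests each data row holds a month and day, a weekday name, and an event description, with a header row on top. A minimal sketch of turning such rows into `(date, dow, text)` records follows. It works on plain lists of cell text rather than on the `soup` object so that it runs on its own. The function name `parse_calendar_rows`, the constant `CALENDAR_YEAR`, and the assumption that every date falls in 2021 are hypothetical and not taken from the page. Rows whose first cell does not parse as `Month DD` (the header row, or any other date format) are skipped rather than handled.

```python
from datetime import date, datetime

# Assumption: the page lists month and day only, so the year is supplied here.
CALENDAR_YEAR = 2021
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def parse_calendar_rows(rows, year=CALENDAR_YEAR):
    """Convert [date_text, dow_text, event_text] lists into (date, dow, text) tuples."""
    records = []
    for cells in rows:
        if len(cells) != 3:
            continue  # skip rows that do not match the three-column layout
        date_text, dow, text = (cell.strip() for cell in cells)
        try:
            day = datetime.strptime(f"{date_text} {year}", "%B %d %Y").date()
        except ValueError:
            continue  # header row ("DATES") or a date format this sketch does not handle
        # Cross-check the parsed date against the weekday printed on the page.
        if DAY_NAMES[day.weekday()] != dow:
            print(f"Warning: {day} is a {DAY_NAMES[day.weekday()]}, page says {dow!r}")
        records.append((day, dow, text))
    return records


# Cell text copied from the preview above (first event text is truncated there).
sample_rows = [
    ["DATES", "DAYS", ""],
    ["August 01", "Sunday", "Application for degree for January and February 20..."],
    ["August 18", "Wednesday", "Last day to apply for Study Abroad"],
]
records = parse_calendar_rows(sample_rows)
for record in records:
    print(record)
```

With the real table, the input lists could be produced from `rows` by collecting `cell.get_text()` for each `td` or `th` in a row. The weekday cross-check is a cheap way to catch a wrong year assumption.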
