## Service alerts along CTA
Use the Chicago Transit Authority's [RSS feed](https://lapi.transitchicago.com/rss/) to read service alerts for train lines and organize them in a pandas DataFrame.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [154]:
# https://www.geeksforgeeks.org/python-requests-tutorial/
# https://beautiful-soup-4.readthedocs.io/en/latest/#installing-beautiful-soup

def read_rss_status(url):
    """Read dates and descriptions of Chicago Transit Authority service alerts
    using requests, BeautifulSoup, and pandas.

    Inputs:
        url (str): RSS feed link, tested with .aspx file type

    Outputs:
        Prints the HTTP status code of the request (should be 200 if the url was read successfully)
        df (DataFrame): one row per service alert, with columns for the indicated
            start time ("date_from"), end time ("date_to"), and description ("description")
            (note: date_from can be converted to a datetime object, but date_to can be "TBD";
            it might be useful to replace every "TBD" with today's date and then convert
            date_to to datetime as well)
    """
    # retrieve current version of webpage
    r = requests.get(url)
    print(r.status_code) # check that retrieval was successful
    
    # read in webpage content; note that 'lxml' selects lxml's HTML parser (not its XML parser)
    soup = BeautifulSoup(r.content, 'lxml')
    # keep the description text of every <item> element in the feed
    alerts = [i.description.string for i in soup.find_all("item")]

    rows = []  # collect one dict per alert, then build the DataFrame once
    for alert in alerts:
        # split text into the date range and the alert description
        dates, desc = alert.split(') ', 1)
        # parse from and to dates out of the trailing "(<from> to <to>" part
        d_from, d_to = dates.split('(')[-1].split(' to ')
        rows.append({'date_from': d_from, 'date_to': d_to, 'description': desc})

    return pd.DataFrame(rows, columns=['date_from', 'date_to', 'description'])
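
The parsing step can be tried without a network request. The sketch below is a regex-based variant of the split logic, not the function's own code, and it assumes each alert description looks like `(<start> to <end>) <text>`, a layout inferred from the split calls above. The names `ALERT_PATTERN` and `parse_alert` and the sample string are hypothetical.

```python
import re

# Assumed layout of an alert description, inferred from the split logic above:
#   "<optional prefix>(<start> to <end>) <description>"
ALERT_PATTERN = re.compile(
    r"\((?P<date_from>[^()]*?) to (?P<date_to>[^()]*?)\) (?P<description>.*)",
    re.DOTALL,
)

def parse_alert(text):
    """Return a dict with date_from, date_to, and description, or None if the text does not match."""
    match = ALERT_PATTERN.search(text)
    return match.groupdict() if match else None

# Made-up sample text, not copied from the live feed
sample = "(Mon, Jul 25 2022 10:00 PM to Tue, Jul 26 2022 4:00 AM) Blue Line trains will share a track."
parsed = parse_alert(sample)
print(parsed)
```

Returning `None` for text that does not match lets a caller skip malformed entries instead of failing on tuple unpacking.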

In [155]:
url = 'https://lapi.transitchicago.com/rss/railalertsrss.aspx'
df = read_rss_status(url)

200


In [153]:
df

|   | date_from | date_to | description |
|---|-----------|---------|-------------|
| 0 | Sun, Jul 24 2022 7:40 PM | TBD | Red Line trains are running w/residual delays ... |
| 1 | Sun, May 16 2021 12:01 AM | TBD | Lawrence station is temporarily closed. Please... |
| 2 | Sun, May 16 2021 12:01 AM | TBD | Berwyn station is temporarily closed. Please u... |
| 3 | Mon, Jul 25 2022 10:00 PM | Tue, Jul 26 2022 4:00 AM | Blue Line trains in both directions will opera... |
| 4 | Mon, Jul 25 2022 11:00 PM | Tue, Jul 26 2022 4:00 AM | Red Line trains will operate on the same track... |
| 5 | Tue, Jul 26 2022 10:00 PM | Wed, Jul 27 2022 4:00 AM | Purple Line trains in both directions will ope... |
| 6 | Tue, Jul 26 2022 10:00 PM | Wed, Jul 27 2022 4:00 AM | Blue Line trains in both directions will opera... |
| 7 | Wed, Jul 27 2022 10:00 PM | Thu, Jul 28 2022 4:00 AM | Blue Line trains in both directions will opera... |
| 8 | Wed, Jul 27 2022 11:00 PM | Thu, Jul 28 2022 4:00 AM | Red Line trains will operate on the same track... |
| 9 | Thu, Jul 28 2022 10:00 PM | Fri, Jul 29 2022 4:00 AM | Purple Line trains in both directions will ope... |
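
The date columns come back as strings. As a minimal sketch of the conversion mentioned in the function's docstring, the snippet below parses them with an explicit format and lets "TBD" become `NaT` rather than replacing it with today's date, which is a different choice from the one the docstring suggests. The two rows are copied from the output above; the names `alerts`, `DATE_FORMAT`, and `open_ended` are illustrative.

```python
import pandas as pd

# Two rows copied from the output above, used here as a stand-in for the full DataFrame
alerts = pd.DataFrame({
    "date_from": ["Sun, Jul 24 2022 7:40 PM", "Mon, Jul 25 2022 10:00 PM"],
    "date_to": ["TBD", "Tue, Jul 26 2022 4:00 AM"],
})

DATE_FORMAT = "%a, %b %d %Y %I:%M %p"  # e.g. "Sun, Jul 24 2022 7:40 PM"

for col in ["date_from", "date_to"]:
    # errors="coerce" turns values that do not match the format, such as "TBD", into NaT
    alerts[col] = pd.to_datetime(alerts[col], format=DATE_FORMAT, errors="coerce")

# Flag alerts that have no announced end time
alerts["open_ended"] = alerts["date_to"].isna()
print(alerts)
```

Keeping `NaT` preserves the fact that no end time was announced, whereas filling in today's date would make open-ended alerts look as if they end now.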


In [174]:
## new alerts feed from https://www.transitchicago.com/developers/alerts/ , requires different parsing
# r = requests.get('http://lapi.transitchicago.com/api/1.0/alerts.aspx')
# print(r.status_code)

# soup = BeautifulSoup(r.content, 'lxml')
# alerts = [i.headline.string for i in soup.find_all("alert")]
# alerts

## Alternative: Google Transit Feed Specification (GTFS) data
Use [GTFS scheduled service data provided by the CTA](https://www.transitchicago.com/developers/gtfs/), which is intended for use with Google Maps but might be limited to dates from now to a few months in the future.

The feed contains a set of CSV files saved with a `.txt` extension, as described in the [Google Transit API info](https://developers.google.com/transit/gtfs/reference?csw=1).

Of particular interest is **calendar_dates.txt**:

| Field Name     | Description |
|----------------|-------------|
| service_id     | Identifies a set of dates when a service exception occurs for one or more routes. |
| date           | Date when service exception occurs. |
| exception_type | Indicates whether service is available on the date specified in the date field. Valid options are: 1 - Service has been added for the specified date. 2 - Service has been removed for the specified date. |
|                |  
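
A minimal sketch of loading this file with pandas follows. It assumes the three columns listed above and the GTFS `YYYYMMDD` date format. The sample rows and `service_id` values are made up, and for real data `sample_txt` would be replaced by the path to `calendar_dates.txt` in the downloaded feed.

```python
import io
import pandas as pd

# Made-up sample in the calendar_dates.txt layout; real service_id values will differ
sample_txt = io.StringIO(
    "service_id,date,exception_type\n"
    "svc_a,20220704,2\n"
    "svc_b,20220704,1\n"
)

# Read service_id as text so that numeric-looking ids are not converted to integers
calendar_dates = pd.read_csv(sample_txt, dtype={"service_id": str})

# GTFS dates are written as YYYYMMDD
calendar_dates["date"] = pd.to_datetime(calendar_dates["date"].astype(str), format="%Y%m%d")

# Translate the exception_type codes from the table above into labels
calendar_dates["exception"] = calendar_dates["exception_type"].map({1: "added", 2: "removed"})
print(calendar_dates)
```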