# Annual Ridership by RTPA Report Outline #

See [GitHub issue #1112 - Research Request - Annual NTD by RTPA report](https://github.com/cal-itp/data-analyses/issues/1112)


This notebook will outline the methodology of developing the new `Annual Ridership by RTPA Report` Portfolio site (Annual Report).
Similar to the `monthly ridership by RTPA` portfolio site, the Annual Report will:
    
- use data from the FTA NTD
- clean and prepare the data, and derive new columns
- export to GCS
- create charts
- deploy online

The Annual Report will take inspiration from the Monthly Ridership Report: the intended output is a portfolio site where each chapter is a separate RTPA, generated from a parameterized notebook.



## Goals 

### be able to read in and clean up annual ridership data
- test reading in data from warehouse, use siuba?
- filter data by agencies within the CA RTPAs
- apply RTPA crosswalk to agencies
<br>
<br>
### create a parameterized portfolio
- create template notebook >> inside data-analyses/ntd_annual_ridership
- populate cells according to cal-itp docs and example notebook
- reference the monthly ridership notebook as needed
- create `README.md` file >> inside data-analyses/ntd_annual_ridership
<br>
<br>
### create script to clean and separate ridership data by rtpa
- ingest data (from either ntd or warehouse)
- filter data by CA
- add new columns for % change, TOS, Mode
- split data by RTPA
- save each RTPA as a separate csv
- zip all the CSVs into a single file
- export to GCS
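
The splitting and zipping steps above could look roughly like the sketch below. It is a minimal illustration, not the final script: the `RTPA` column name, the `zip_csv_per_rtpa` helper, and the toy data are all assumptions, and the upload to GCS is left out.

```python
import io
import zipfile

import pandas as pd


def zip_csv_per_rtpa(df: pd.DataFrame, rtpa_col: str = "RTPA") -> bytes:
    """Write one CSV per RTPA into a single in-memory zip archive (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rtpa, group in df.groupby(rtpa_col):
            # Build a file-system-friendly name from the RTPA name.
            filename = f"{str(rtpa).lower().replace(' ', '_')}.csv"
            zf.writestr(filename, group.to_csv(index=False))
    return buffer.getvalue()


# Toy data standing in for the cleaned annual ridership table (assumed columns).
sample = pd.DataFrame(
    {
        "RTPA": [
            "Metropolitan Transportation Commission",
            "Metropolitan Transportation Commission",
            "Butte County Association of Governments",
        ],
        "agency_name": ["Agency A", "Agency B", "Agency C"],
        "upt": [100, 250, 40],
    }
)

zip_bytes = zip_csv_per_rtpa(sample)
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    print(sorted(zf.namelist()))
```

Building the archive in memory keeps the sketch free of temporary files; the resulting bytes could then be written to whichever GCS path the report ends up using.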
<br>
<br>
### create annual report MAKEFILE
- create `MAKEFILE` in `data-analyses/ntd_annual_ridership`
- this MAKEFILE will work similarly to the Monthly Ridership Report MAKEFILE
- execute script to prepare data for report
<br>
<br>
### prepare files for portfolio folder
- create `{site}.yml` for portfolio/sites folder >> `ntd_annual_ridership.yml`
- create `_config.yml` inside portfolio/ntd_annual_ridership folder
- create `_toc.yml` inside portfolio/ntd_annual_ridership folder
- create ntd_annual_ridership folder in portfolio directory
<br>
<br>
### add to root MAKEFILE
- add section for `build_ntd_annual_report`

    

## Outline

- Review data sources, see what information can be reported
    - VOMS, VRM, VRH, UPT
    - how to filter this data to only show agencies in CA?
    - how to assign the CA agencies to an RTPA?
    - confirm with Tiffany if `dim_annual_ntd_agency_information` has the ridership data I need.
- start notebook to test which imports and functions are needed >> see monthly ridership report files
- review of portfolio process (which make files to hit, which folders in `data-analyses/portfolio` to make, etc)
- start draft of all required files
    - **STARTED** notebook for parameterization, taking pieces from the `monthly ridership` NB
    - **STARTED** various .yml files
        - {site}.yml
        - _config.yml
        - _toc.yml
    - scripts, similar to the monthly ridership report, for
        - cleaning data
        - splitting into separate RTPA files, exporting to GCS
        - producing .yml for portfolio/site?
- place files in correct locations
    - files going into `data-analyses/portfolio`
    - files going into `portfolio/site`
- test notebook parameterization
    - ensure a notebook is created for each RTPA
- test deploying portfolio with netlify
    - see what a draft site looks like


## Overview of Cal-ITP Portfolio Docs

[Link to Docs here](https://docs.calitp.org/data-infra/publishing/sections/4_analytics_portfolio_site.html)
- COMPLETE ~~set up Netlify key~~
    - but need to log in to see if my keys need refreshing
- COMPLETE ~~create README.md~~
    - see `data-analyses/ntd_annual_ridership/README.md`
- create a `{site}.yml` file in `data-analyses/portfolio/sites`
- create `_config.yml` file
- create `_toc.yml` file

- prepare notebook
    - [see sample parameterized NB here](https://github.com/cal-itp/data-analyses/blob/main/starter_kit/parameterized_notebook.ipynb)
- build report with `python portfolio/portfolio.py build my_report`
- deploy report with `python portfolio/portfolio.py build my_report --deploy`


## notes about preparing notebook
- the notebook must contain all the elements required for parameterization (see the sample parameterized notebook linked above).
- `RTPA` will be the parameter for this notebook, i.e. one notebook will be created for each RTPA


In [None]:
rtpa = 'Metropolitan Transportation Commission'

In [None]:
%%capture_parameters
rtpa

## Questions: 

### ~~Where should this file live?~~
- should the annual ridership report live in the same `ntd` folder as the `monthly_ridership_report` files?
- ~~i believe it can, but will have to add a new make function to the `data-analyses/makefile` for this annual ridership report~~
- ~~ensure everything points to the annual ridership files~~
- starting a new folder called `ntd_annual_ridership`




### Which libraries/packages should be imported?
- start with what's in the `monthly ridership report`
- commenting out the imports specific to the monthly report for now (`update_vars`, `monthly_ridership_by_rtpa`, `shared_utils.rt_dates`)

In [2]:
%%capture
import sys
sys.path.append("../bus_service_increase")

import warnings
warnings.filterwarnings('ignore')

import altair as alt
import calitp_data_analysis.magics
import pandas as pd

from IPython.display import display, HTML

from bus_service_utils import chart_utils
from calitp_data_analysis import calitp_color_palette as cp
#from update_vars import GCS_FILE_PATH, PUBLIC_FILENAME, YEAR, MONTH
#from monthly_ridership_by_rtpa import get_percent_change
#from shared_utils.rt_dates import MONTH_DICT

#alt.renderers.enable("html")
alt.data_transformers.enable('default', max_rows=None)

### Any functions I should import?
- considering `from monthly_ridership_by_rtpa import get_percent_change`
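
If the monthly function turns out not to fit annual data, a year-over-year percent change is simple to compute directly. The sketch below is only an illustration: `add_percent_change` is a hypothetical helper (it does not reproduce the signature or behavior of `get_percent_change`), and the `year` and `upt` column names are assumptions.

```python
import pandas as pd


def add_percent_change(df, group_cols, value_col, year_col="year"):
    """Return a sorted copy of df with a year-over-year percent change column."""
    out = df.sort_values(group_cols + [year_col]).reset_index(drop=True)
    # Previous year's value within each group; NaN for a group's first year.
    previous = out.groupby(group_cols)[value_col].shift(1)
    out[f"{value_col}_pct_change"] = (out[value_col] - previous) / previous
    return out


# Toy data (assumed column names).
sample = pd.DataFrame(
    {
        "agency_name": ["Agency A", "Agency A", "Agency B", "Agency B"],
        "year": [2021, 2022, 2021, 2022],
        "upt": [100, 150, 200, 150],
    }
)

result = add_percent_change(sample, ["agency_name"], "upt")
print(result)
```

The first year of each group has no prior value, so its percent change is left as missing rather than filled with zero.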


### What does the data look like?
- need to test out what's in `dim_annual_ntd_agency_information` & `dim_annual_ntd_agency_service` for 2022.
- see notebook `annual_report_test.ipynb`

## how to filter the 2022 data to only show the RTPAs & Agencies in California?
- per `monthly_ridership_by_rtpa.py`, function `produce_ntd_monthly_ridership_by_rtpa`
    - takes in NTD data and filters `UZA name` by `, CA`. 
    - then left merges in the NTD ID crosswalk. The NTD ID crosswalk contains all the operators within the RTPAs in CA.
    - only matching rows between NTD data and crosswalk (`_merge == "both"`) make it into the report.
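
The same filter-then-merge pattern could be applied to the annual data. The sketch below is a minimal version under assumptions: the column names `uza_name`, `ntd_id`, and `rtpa`, the helper name `filter_to_ca_rtpa`, and the toy rows are all made up for illustration and may not match the warehouse tables or the crosswalk file.

```python
import pandas as pd


def filter_to_ca_rtpa(ntd: pd.DataFrame, crosswalk: pd.DataFrame) -> pd.DataFrame:
    """Keep CA rows that also appear in the NTD ID to RTPA crosswalk."""
    # Step 1: keep rows whose urbanized area name contains ", CA".
    ca_only = ntd[ntd["uza_name"].str.contains(", CA", na=False, regex=False)]
    # Step 2: left merge the crosswalk and record where each row matched.
    merged = ca_only.merge(crosswalk, on="ntd_id", how="left", indicator=True)
    # Step 3: keep only rows found in both tables.
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    return matched.reset_index(drop=True)


# Toy inputs (IDs and names are placeholders).
ntd = pd.DataFrame(
    {
        "ntd_id": ["A1", "B2", "C3"],
        "uza_name": [
            "San Francisco--Oakland, CA",
            "Los Angeles--Long Beach--Anaheim, CA",
            "Seattle, WA",
        ],
        "upt": [1000, 2000, 3000],
    }
)
crosswalk = pd.DataFrame(
    {"ntd_id": ["A1"], "rtpa": ["Metropolitan Transportation Commission"]}
)

ca_rtpa = filter_to_ca_rtpa(ntd, crosswalk)
print(ca_rtpa)
```

Keeping the `indicator=True` column until the last step also makes it easy to inspect the `left_only` rows, i.e. CA agencies that are missing from the crosswalk.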

## What metrics to include in the notebook?
- after looking at the data, include the following at the very least
    - `vehicles_passenger_cars_operated_in_maximum_service`
    - `actual_vehicles_passenger_car_revenue_miles`
    - `actual_vehicle_passenger_car_revenue_hours`
    - `unlinked_passenger_trips__upt_`
- aggregate the above by 
    - `agency_name`
    - `mode`
    - `tos`
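
A possible shape for this aggregation is sketched below. The metric and grouping column names come from the lists above; the `aggregate_metrics` helper, the toy values, and the choice of a plain sum for every metric are assumptions that still need to be checked against the real data.

```python
import pandas as pd

METRIC_COLS = [
    "vehicles_passenger_cars_operated_in_maximum_service",
    "actual_vehicles_passenger_car_revenue_miles",
    "actual_vehicle_passenger_car_revenue_hours",
    "unlinked_passenger_trips__upt_",
]
GROUP_COLS = ["agency_name", "mode", "tos"]


def aggregate_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Sum each metric within every agency / mode / type-of-service group."""
    return df.groupby(GROUP_COLS, as_index=False)[METRIC_COLS].sum()


# Toy rows: two records for the same agency/mode/tos and one for another mode.
sample = pd.DataFrame(
    {
        "agency_name": ["Agency A", "Agency A", "Agency A"],
        "mode": ["MB", "MB", "DR"],
        "tos": ["DO", "DO", "PT"],
        "vehicles_passenger_cars_operated_in_maximum_service": [10, 5, 2],
        "actual_vehicles_passenger_car_revenue_miles": [1000, 500, 100],
        "actual_vehicle_passenger_car_revenue_hours": [100, 50, 10],
        "unlinked_passenger_trips__upt_": [5000, 2000, 300],
    }
)

summary = aggregate_metrics(sample)
print(summary)
```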

## Draft `ntd_annual_ridership.yml` for `data-analyses/portfolio/sites`
---
title: NTD Annual Ridership by RTPA
directory: ./ntd_annual_ridership
readme: ./ntd_annual_ridership/README.md
parts:
- caption: Introduction
- chapters:
  - params:
      rtpa:
  - params:
      rtpa:
  - params:
      rtpa:
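
Rather than filling in each `rtpa:` entry by hand, the file could be generated from the list of RTPAs. The sketch below mirrors the draft structure above using only the standard library; the `build_site_yml` helper and the example RTPA names are assumptions, not part of the existing portfolio tooling.

```python
import json


def build_site_yml(rtpas):
    """Return the site YAML text with one `params` chapter per RTPA."""
    lines = [
        "title: NTD Annual Ridership by RTPA",
        "directory: ./ntd_annual_ridership",
        "readme: ./ntd_annual_ridership/README.md",
        "parts:",
        "- caption: Introduction",
        "- chapters:",
    ]
    for rtpa in rtpas:
        lines.append("  - params:")
        # json.dumps produces a double-quoted string, which YAML also accepts.
        lines.append(f"      rtpa: {json.dumps(rtpa)}")
    return "\n".join(lines) + "\n"


site_yml = build_site_yml(
    [
        "Metropolitan Transportation Commission",
        "Butte County Association of Governments",
    ]
)
print(site_yml)
```

Quoting every name avoids surprises if an RTPA name ever contains a colon or other character that YAML treats specially.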


## Draft `_config.yml` for `portfolio/ntd_annual_ridership`
copied from `portfolio/ntd_monthly_ridership`

---
# Book settings
# Learn more at https://jupyterbook.org/customize/config.html

title: NTD Annual Ridership by RTPA
author: Cal-ITP
copyright: "2024"
#logo: calitp_logo_MAIN.png

# Force re-execution of notebooks on each build.
# See https://jupyterbook.org/content/execute.html
execute:
  execute_notebooks: 'off'
  allow_errors: false
  timeout: -1

# Define the name of the latex output file for PDF builds
latex:
  latex_documents:
    targetname: book.tex

launch_buttons:
  binderhub_url: "https://mybinder.org"
  jupyterhub_url: "https://hubtest.k8s.calitp.jarv.us"
  thebe: true

repository:
  url: https://github.com/cal-itp/data-analyses/  # Online location of your book
#  path_to_book: docs  # Optional path to your book, relative to the repository root
  path_to_book: ntd_annual_ridership
  branch: main  # Which branch of the repository should be used when creating links (optional)

# Add GitHub buttons to your book
# See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository
html:
  use_issues_button: true
  use_repository_button: true
  use_edit_page_button: true
  google_analytics_id: 'G-JCX3Z8JZJC'

sphinx:
  config:
    html_js_files:
    - https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js


## Draft `_toc.yml` for `portfolio/ntd_annual_ridership`
copied from `portfolio/ntd_monthly_ridership`

---
format: jb-book
parts:
-   caption: null
    chapters:
    -   file: 
    -   file: 
    -   file:
root: README

## Next steps

- test how to read in and clean the annual data
    - how to filter the data to only show the agencies within the CA RTPAs
    - any functions I can take from the monthly report?
- test how to aggregate data by different categories
    - see monthly report functions
- test creating charts for the aggregated data
    - also see monthly reports
    