# Transparency in Coverage Data with Python: Part 2

Hello there! If you are starting with this notebook and have not yet checked out the [introductory notebook in this series](./tic_start.ipynb), please do that first. It won't take long, and you'll be better prepared for the content in this notebook.

## In-Network Rate Files

At the end of the previous notebook, we defined a parser function to handle TOC files. I've saved that function in a separate Python file so we can import it and pick up where Part 1 left off. Let's quickly confirm that it works, because we'll be using it to fetch in-network rates.

In [25]:
from tic_toc import parse_tic_toc

rps, infs, aafs = parse_tic_toc('./toc_file.json')
display(rps, infs, aafs)

|  | plan_name | plan_id_type | plan_id | plan_market_type | reporting_structure_number | reporting_entity_name | reporting_entity_type |
|---|---|---|---|---|---|---|---|
| 0 | BLUECHOICE ADVANTAGE_POS | EIN | 52-6002033 | group | 0 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 1 | BLUECHOICE HMO HDHP INTEG DED_HMO | HIOS | 10207VA038 | individual | 1 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 2 | BLUECHOICE HMO HDHP INTEG DED_HMO | HIOS | 28137MD037 | individual | 1 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 3 | BLUECHOICE HMO HDHP INTEG DED_HMO | HIOS | 86052DC040 | individual | 1 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 4 | BLUECHOICE HMO HDHP NON INTDED_HMO | HIOS | 10207VA038 | individual | 2 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 134 | DHMO - BlueDental HMO_HMO | EIN | 83-4713006 | group | 4 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 135 | DHMO - BlueDental HMO_HMO | EIN | 87-0787360 | group | 4 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 136 | DHMO - Dental HMO_HMO | EIN | 52-0348850 | group | 5 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 137 | DHMO - Dental HMO_HMO | EIN | 52-2064235 | group | 5 | CareFirst Inc | HEALTH INSURANCE ISSUER |


|  | description | location | reporting_structure_number | reporting_entity_name | reporting_entity_type |
|---|---|---|---|---|---|
| 0 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 0 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 1 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 0 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 2 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 0 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 3 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 0 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 4 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 0 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| ... | ... | ... | ... | ... | ... |
| 835 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 5 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 836 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 5 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 837 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 5 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 838 | Carefirst in-network HMO file | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... | 5 | CareFirst Inc | HEALTH INSURANCE ISSUER |


|  | description | location | reporting_structure_number | reporting_entity_name | reporting_entity_type |
|---|---|---|---|---|---|
| 0 | Carefirst allowed amount HMO file | https://mrf.carefirst.com/mrf-files/allowed-am... | 0 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 1 | Carefirst allowed amount HMO file | https://mrf.carefirst.com/mrf-files/allowed-am... | 1 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 2 | Carefirst allowed amount HMO file | https://mrf.carefirst.com/mrf-files/allowed-am... | 2 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 3 | Carefirst allowed amount HMO file | https://mrf.carefirst.com/mrf-files/allowed-am... | 3 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 4 | Carefirst allowed amount HMO file | https://mrf.carefirst.com/mrf-files/allowed-am... | 4 | CareFirst Inc | HEALTH INSURANCE ISSUER |
| 5 | Carefirst allowed amount HMO file | https://mrf.carefirst.com/mrf-files/allowed-am... | 5 | CareFirst Inc | HEALTH INSURANCE ISSUER |


## Unique In-Network Rates Only

In the last notebook, we discovered that the TOC file contains 140 x 6 = 840 File Location objects that refer to URLs containing In-Network Rates, but only 140 of those URLs are distinct: each one is repeated once for each of the 6 `reporting_structure` objects in the TOC file.
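
A quick way to check a duplication pattern like this is to count how often each URL occurs. The sketch below uses a small made-up DataFrame as a stand-in for `infs` (the `example.com` URLs and the 3 x 2 shape are illustrative assumptions, not values from the real TOC file):

```python
import pandas as pd

# Toy stand-in for `infs`: 3 hypothetical URLs, each listed under 2 reporting structures.
toy_infs = pd.DataFrame({
    "location": [f"https://example.com/rates_{i}.json.gz" for i in range(3)] * 2,
    "reporting_structure_number": [0, 0, 0, 1, 1, 1],
})

# Number of rows that share each URL.
counts = toy_infs["location"].value_counts()

# Number of distinct URLs.
n_unique = toy_infs["location"].nunique()

print(counts)
print(n_unique)
```

Run against the real `infs`, the same two calls should show whether every URL really appears the same number of times.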

So, we only need the unique URLs. Let's get them now.

In [31]:
import pandas as pd

unique_infs = infs['location'].drop_duplicates()
display(pd.DataFrame(unique_infs))
unique_infs[0]

|  | location |
|---|---|
| 0 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |
| 1 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |
| 2 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |
| 3 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |
| 4 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |
| ... | ... |
| 135 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |
| 136 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |
| 137 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |
| 138 | https://carefirstbcbs.mrf.bcbs.com/2022-08_690... |


'https://carefirstbcbs.mrf.bcbs.com/2022-08_690_08C0_in-network-rates_10_of_35.json.gz?&Expires=1663613365&Signature=VRHNewRTnPMNCIwpz9Npy5b2gkcBC6p78Ga8bM-iqKMT0ZgZOYCIzCwnjvgQbPPOTqTPluw3VPuBTwM5zpjG0ZeDcTyV8DXJBSzZgCgh1qXWX7MfpdgbhhiAONfTh2mkW9ZW5PdaZebc1bau0u2zEJivhio6K1RS-NHoZj41OHFLjR6ecASFq4WHFcOdlpGsnh3TEFLIy0342FX9-Q5wuGt6NsdOQedqtbHq15MZlihqtcjPNMOiDWXreGPcKXY5XGAesU83VdtzK7DayIppwXSSlRmVdFIN5wBwM-z-QqsJJi0rgP~GfFVDyrm2wHpPue123x1tSPR3tFSrKBUOKg__&Key-Pair-Id=K27TQMT39R1C8A'
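
The URL above carries a long query string, which includes an `Expires` value that looks like a Unix timestamp. Assuming that reading is right, here is a minimal sketch for pulling the file name and the expiry time out of such a URL using only the standard library. The helper name `describe_signed_url` and the shortened `example.com` URL are made up for illustration:

```python
import posixpath
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def describe_signed_url(url):
    """Return the file name and (if present) the expiry time of a URL.

    Assumes `Expires` is a Unix timestamp in seconds; that is an
    interpretation of the URL format, not something the TOC file states.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    expires = query.get("Expires", [None])[0]
    return {
        "file_name": posixpath.basename(parts.path),
        "expires_utc": (
            datetime.fromtimestamp(int(expires), tz=timezone.utc)
            if expires is not None
            else None
        ),
    }


# Shortened, made-up URL in the same general shape as the one above.
example = (
    "https://example.com/2022-08_in-network-rates_10_of_35.json.gz"
    "?&Expires=1663613365&Key-Pair-Id=ABC"
)
info = describe_signed_url(example)
print(info["file_name"])
print(info["expires_utc"])
```

If the expiry time has passed, the links in a saved TOC file may no longer work, and the TOC file would need to be fetched again.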

## HEAD Requests over HTTPS

You may have heard that these in-network rate files can be quite large. Before we start downloading them all willy-nilly, let's send a HEAD request over HTTPS for each URL. A HEAD request returns only the response headers, which tell us a number of things about the file, including its size in bytes.

Let's *also* remember that payers have to pay egress charges for these downloads, so when we download these large files, we need to be mindful to not spam them with download requests - either intentionally or unintentionally. (An issue has already been raised on the CMS GitHub site concerning this very issue.)
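
One way to stay polite is to pause between requests and never ask for the same URL twice. The following is a minimal sketch of that idea, not the approach used in this notebook. The function name `fetch_content_lengths`, the `pause_s` parameter, and the `cache` dictionary are all hypothetical, and `head_fn` is passed in so the function can be tried without touching the network:

```python
import time


def fetch_content_lengths(urls, head_fn, pause_s=1.0, cache=None):
    """Return {url: size in bytes}, requesting each URL at most once.

    `head_fn` is any callable that takes a URL and returns an object with
    a `headers` mapping, for example `requests.head`.
    """
    cache = {} if cache is None else cache
    for url in urls:
        if url in cache:
            continue  # size already known, so skip the request
        response = head_fn(url)
        # Raises KeyError if the server does not report a Content-Length.
        cache[url] = int(response.headers["Content-Length"])
        time.sleep(pause_s)  # short pause before the next request
    return cache


# Possible usage with the requests library (not executed here):
# import requests as rq
# sizes = fetch_content_lengths(unique_infs, rq.head, pause_s=0.5)
```

Passing the same `cache` dictionary into a later call means a re-run only contacts the server for URLs it has not seen yet.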

We're going to convert bytes to Megabytes (MB) by dividing the header value returned for the key `Content-length` by 1000^2. Then, we'll combine this list with the URLs and sort our URLs in order of smallest to largest.
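
As a rough sketch of that conversion and sort, here is the same arithmetic on three made-up URLs. The `example.com` URLs and the column name `size_mb` are illustrative assumptions. Header values arrive as strings, so they are cast to `int` first:

```python
import pandas as pd

# Hypothetical URLs paired with Content-Length header values (bytes, as strings).
urls = [
    "https://example.com/a.json.gz",
    "https://example.com/b.json.gz",
    "https://example.com/c.json.gz",
]
content_lengths = ["10833255", "211853", "21501631"]

sizes = pd.DataFrame({
    "location": urls,
    # 1 MB = 1000**2 bytes (decimal megabytes, not 1024**2-byte mebibytes).
    "size_mb": [int(n) / 1000**2 for n in content_lengths],
}).sort_values("size_mb", ignore_index=True)  # smallest file first

print(sizes)
```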

(The cell that follows takes a moment to run, because it sends a HEAD request over HTTPS for each of the 140 URLs.)

In [32]:
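import requests as rq

# One HEAD request per URL; the response headers include the file size in bytes.
header_info = [rq.head(url).headers['Content-length'] for url in unique_infs]
# Convert bytes to megabytes (despite the `_gb` suffix, these values are in MB).
header_info_gb = [int(hi) / (1000**2) for hi in header_info]
display(header_info_gb)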

[10.833255,
 13.788947,
 21.501631,
 10.222329,
 6.423174,
 7.511928,
 11.825912,
 12.441736,
 5.234537,
 7.098336,
 11.08275,
 7.642125,
 5.50449,
 4.204823,
 4.532067,
 4.424713,
 4.208398,
 0.490344,
 0.539974,
 0.289662,
 0.211853,
 12.674242,
 0.257835,
 0.275912,
 0.315486,
 0.309174,
 1.359759,
 1.158882,
 12.946733,
 13.037408,
 12.612977,
 9.840172,
 12.918737,
 11.911627,
 12.909938,
 10.833255,
 13.788947,
 21.501631,
 10.222329,
 6.423174,
 7.511928,
 11.825912,
 12.441736,
 5.234537,
 7.098336,
 11.08275,
 7.642125,
 5.50449,
 4.204823,
 4.532067,
 4.424713,
 4.208398,
 0.490344,
 0.539974,
 0.289662,
 0.211853,
 12.674242,
 0.257835,
 0.275912,
 0.315486,
 0.309174,
 1.359759,
 1.158882,
 12.946733,
 13.037408,
 12.612977,
 9.840172,
 12.918737,
 11.911627,
 12.909938,
 10.833255,
 13.788947,
 21.501631,
 10.222329,
 6.423174,
 7.511928,
 11.825912,
 12.441736,
 5.234537,
 7.098336,
 11.08275,
 7.642125,
 5.50449,
 4.204823,
 4.532067,
 4.424713,
 4.208398,
 0.490344,
 0.