# Collecting [Data.Gov Archives](https://source.coop/repositories/harvard-lil/gov-data/description)

## Retrieve metadata
```bash
wget https://data.source.coop/harvard-lil/gov-data/metadata.csv.zip
```


In [2]:
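import pandas as pd

# pandas infers zip compression from the file extension; the archive must contain a single CSV file
df = pd.read_csv('metadata.csv.zip')
df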

Unnamed: 0,name,organization,title,date,metadata_path,collection_path
0,afsc-race-sap-armistead-1975-2016-eastern-beri...,National Oceanic and Atmospheric Administratio...,AFSC/RACE/SAP/Armistead: 1975 - 2016 eastern B...,2024-04-01T15:41:44.087648,metadata/data_gov/afsc-race-sap-armistead-1975...,collections/data_gov/afsc-race-sap-armistead-1...
1,tiger-line-shapefile-2023-county-moca-municipi...,"U.S. Census Bureau, Department of Commerce","TIGER/Line Shapefile, 2023, County, Moca Munic...",2023-12-14T19:59:29.508512,metadata/data_gov/tiger-line-shapefile-2023-co...,collections/data_gov/tiger-line-shapefile-2023...
2,fbsab-recruit-reef-fish-belt-transect-survey-a...,National Oceanic and Atmospheric Administratio...,FBSAB RECRUIT Reef Fish Belt Transect Survey a...,2024-10-19T04:23:29.895676,metadata/data_gov/fbsab-recruit-reef-fish-belt...,collections/data_gov/fbsab-recruit-reef-fish-b...
3,yellowknife-n-w-t-nt-cyzf4,National Oceanic and Atmospheric Administratio...,"Yellowknife, N. W. T., NT (CYZF)",2024-12-26T19:19:35.911143,metadata/data_gov/yellowknife-n-w-t-nt-cyzf4/v...,collections/data_gov/yellowknife-n-w-t-nt-cyzf...
4,l02194-nos-hydrographic-survey,National Oceanic and Atmospheric Administratio...,L02194: NOS Hydrographic Survey,2020-11-12T05:37:42.757838,metadata/data_gov/l02194-nos-hydrographic-surv...,collections/data_gov/l02194-nos-hydrographic-s...
...,...,...,...,...,...,...
311815,ow-noaa-avhrr-gac-sea-surface-temperature1,National Oceanic and Atmospheric Administratio...,OW NOAA AVHRR-GAC Sea-Surface Temperature,2024-10-19T01:59:09.505283,metadata/data_gov/ow-noaa-avhrr-gac-sea-surfac...,collections/data_gov/ow-noaa-avhrr-gac-sea-sur...
311816,mora-county-blocks-age-by-5-year-age-groups-fo...,"Earth Data Analysis Center, University of New ...","Mora County Blocks, Age by 5-Year Age Groups f...",2020-12-02T16:58:45.622945,metadata/data_gov/mora-county-blocks-age-by-5-...,collections/data_gov/mora-county-blocks-age-by...
311817,tiger-line-shapefile-2022-county-brunswick-cou...,"U.S. Census Bureau, Department of Commerce","TIGER/Line Shapefile, 2022, County, Brunswick ...",2024-01-27T18:37:06.493199,metadata/data_gov/tiger-line-shapefile-2022-co...,collections/data_gov/tiger-line-shapefile-2022...
311818,sbuv2-noaa-16-level-2-daily-ozone-profile-and-...,National Aeronautics and Space Administration,SBUV2/NOAA-16 Level 2 Daily Ozone Profile and ...,2023-12-06T23:46:43.937530,metadata/data_gov/sbuv2-noaa-16-level-2-daily-...,collections/data_gov/sbuv2-noaa-16-level-2-dai...


In [3]:
# Look at the unique organizations in the dataset
df['organization'].unique()

array(['National Oceanic and Atmospheric Administration, Department of Commerce',
       'U.S. Census Bureau, Department of Commerce',
       'Department of Agriculture', 'Department of the Interior',
       'Board of Governors of the Federal Reserve System',
       'General Services Administration', 'Department of Energy',
       'National Aeronautics and Space Administration',
       'City of Baton Rouge', 'U.S. Environmental Protection Agency',
       'US Agency for International Development',
       'Department of Homeland Security', 'Lake County, Illinois',
       'Department of Commerce',
       'Earth Data Analysis Center, University of New Mexico',
       'City of Providence', 'AmeriCorps', 'City of Chicago',
       'City of New York', 'State of Connecticut', 'State of Alaska',
       'Social Security Administration', 'Department of Education',
       'State of California', 'National Science Foundation',
       'Department of Transportation', 'State of Oregon',
       'Alleghen
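
Since the list of organizations is long, a count of datasets per organization can be easier to scan than `unique()`. The following is a minimal sketch on a toy table; the rows are made up for illustration and only the `organization` column name comes from the metadata file.

```python
import pandas as pd

# Toy stand-in for the real metadata table (rows are invented for illustration).
toy = pd.DataFrame({
    "organization": [
        "Department of Energy",
        "Department of Energy",
        "City of Chicago",
    ]
})

# value_counts() returns one entry per organization, sorted by dataset count.
org_counts = toy["organization"].value_counts()
print(org_counts.head(10))
```

On the real table, the same call would be `df['organization'].value_counts()`.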

In [4]:
# Keep only the rows whose organization contains 'U.S. Department of Health & Human Services', then reset the index
# regex=False matches the text literally; otherwise each '.' in 'U.S.' would act as a regex wildcard
df_temp = df[df['organization'].str.contains('U.S. Department of Health & Human Services', regex=False)].reset_index(drop=True)
df_temp

Unnamed: 0,name,organization,title,date,metadata_path,collection_path
0,the-panel-study-of-income-dynamics-psid,U.S. Department of Health & Human Services,The Panel Study of Income Dynamics (PSID),2023-07-26T16:12:50.135150,metadata/data_gov/the-panel-study-of-income-dy...,collections/data_gov/the-panel-study-of-income...
1,covid-19-booster-dose-eligibility-in-the-unite...,U.S. Department of Health & Human Services,COVID-19 Booster Dose Eligibility in the Unite...,2023-05-14T02:36:50.535277,metadata/data_gov/covid-19-booster-dose-eligib...,collections/data_gov/covid-19-booster-dose-eli...
2,medicare-advantage-geographic-variation-nation...,U.S. Department of Health & Human Services,Medicare Advantage Geographic Variation - Nati...,2024-06-27T14:44:35.567652,metadata/data_gov/medicare-advantage-geographi...,collections/data_gov/medicare-advantage-geogra...
3,list-of-serials-indexed-for-online-lsiou,U.S. Department of Health & Human Services,List of Serials Indexed for Online Users (LSIOU),2022-10-01T06:29:31.213924,metadata/data_gov/list-of-serials-indexed-for-...,collections/data_gov/list-of-serials-indexed-f...
4,nadac-national-average-drug-acquisition-cost-2013,U.S. Department of Health & Human Services,NADAC (National Average Drug Acquisition Cost)...,2023-01-20T11:32:58.252646,metadata/data_gov/nadac-national-average-drug-...,collections/data_gov/nadac-national-average-dr...
...,...,...,...,...,...,...
2307,nndss-table-1g-carbapenemase-producing-carbape...,U.S. Department of Health & Human Services,NNDSS - Table 1G. Carbapenemase-producing carb...,2022-01-25T02:04:48.987317,metadata/data_gov/nndss-table-1g-carbapenemase...,collections/data_gov/nndss-table-1g-carbapenem...
2308,2019-child-and-adult-health-care-quality-measu...,U.S. Department of Health & Human Services,2019 Child and Adult Health Care Quality Measu...,2023-09-13T15:49:44.448182,metadata/data_gov/2019-child-and-adult-health-...,collections/data_gov/2019-child-and-adult-heal...
2309,nndss-table-ii-hepatitis-viral-acute-by-type-c,U.S. Department of Health & Human Services,"NNDSS - Table II. Hepatitis (viral, acute, by ...",2021-04-30T02:03:07.295151,metadata/data_gov/nndss-table-ii-hepatitis-vir...,collections/data_gov/nndss-table-ii-hepatitis-...
2310,product-data-for-newly-reported-drugs-in-the-m...,U.S. Department of Health & Human Services,Product Data for Newly Reported Drugs in the M...,2024-04-29T12:23:27.709309,metadata/data_gov/product-data-for-newly-repor...,collections/data_gov/product-data-for-newly-re...
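
The same filter can be wrapped in a small helper so it is easy to reuse for other organizations. This is a sketch, not part of the original notebook: `filter_by_organization` is a hypothetical name, and the toy rows are invented. It matches the text literally and treats missing organization values as non-matches.

```python
import pandas as pd


def filter_by_organization(frame, text):
    """Return rows whose 'organization' contains `text` literally."""
    # regex=False disables regex interpretation; na=False maps missing values to False.
    mask = frame["organization"].str.contains(text, regex=False, na=False)
    return frame[mask].reset_index(drop=True)


# Toy data for illustration only.
toy = pd.DataFrame({
    "name": ["psid-toy", "energy-toy", "unknown-toy"],
    "organization": [
        "U.S. Department of Health & Human Services",
        "Department of Energy",
        None,
    ],
})

hhs = filter_by_organization(toy, "U.S. Department of Health & Human Services")
print(hhs)
```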


In [5]:
# Locate the name of the PSID dataset
df_temp.loc[0, 'name']

'the-panel-study-of-income-dynamics-psid'

In [6]:
# Locate the collection path of the PSID dataset
df_temp.loc[0, 'collection_path']

'collections/data_gov/the-panel-study-of-income-dynamics-psid/v1.json'

In [7]:
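base_path = 'https://data.source.coop/harvard-lil/gov-data/'

# Swap only the trailing '.json' for '.zip'; .replace('json', 'zip') would also rewrite any 'json' inside a dataset name
collection_path = df_temp.loc[0, 'collection_path']
data_path = base_path + collection_path.rsplit('.json', 1)[0] + '.zip'
data_path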

'https://data.source.coop/harvard-lil/gov-data/collections/data_gov/the-panel-study-of-income-dynamics-psid/v1.zip'

In [8]:
metadata_path = base_path + df_temp.loc[0, 'metadata_path']
metadata_path

'https://data.source.coop/harvard-lil/gov-data/metadata/data_gov/the-panel-study-of-income-dynamics-psid/v1.json'
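
The two URL steps above can be combined into one function. This is a sketch under the assumption, based only on the PSID example shown here, that the data archive sits next to the collection JSON with a `.zip` extension; `build_urls` is a hypothetical helper name.

```python
BASE_URL = "https://data.source.coop/harvard-lil/gov-data/"


def build_urls(collection_path, metadata_path, base_url=BASE_URL):
    """Return (data_url, metadata_url) for one row of the metadata table."""
    if not collection_path.endswith(".json"):
        raise ValueError(f"expected a .json collection path, got {collection_path!r}")
    # Replace only the trailing extension, leaving the dataset name untouched.
    data_url = base_url + collection_path[: -len(".json")] + ".zip"
    metadata_url = base_url + metadata_path
    return data_url, metadata_url


data_url, metadata_url = build_urls(
    "collections/data_gov/the-panel-study-of-income-dynamics-psid/v1.json",
    "metadata/data_gov/the-panel-study-of-income-dynamics-psid/v1.json",
)
print(data_url)
print(metadata_url)
```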

## Download files to Koa or a local machine

```bash
wget https://data.source.coop/harvard-lil/gov-data/collections/data_gov/the-panel-study-of-income-dynamics-psid/v1.zip
wget https://data.source.coop/harvard-lil/gov-data/metadata/data_gov/the-panel-study-of-income-dynamics-psid/v1.json
```

Or, with `curl`, saving the files under descriptive names:

```bash
curl -o psid_data.zip https://data.source.coop/harvard-lil/gov-data/collections/data_gov/the-panel-study-of-income-dynamics-psid/v1.zip
curl -o psid_metadata.json https://data.source.coop/harvard-lil/gov-data/metadata/data_gov/the-panel-study-of-income-dynamics-psid/v1.json
```
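
If staying inside the notebook is more convenient, the download can also be done from Python with the standard library. This is a minimal sketch; `download` is a hypothetical helper, and it streams the response to disk rather than loading the whole archive into memory.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest):
    """Stream the contents of `url` into the file `dest` and return its path."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # copyfileobj copies in chunks, so large archives are not held in memory.
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download(data_path, 'psid_data.zip')` and `download(metadata_path, 'psid_metadata.json')` would mirror the `curl` commands above.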


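## Transfer files directly to Google Drive

Here `uh_gdrive` is the name of an rclone remote that has already been configured (for example with `rclone config`).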

```bash
rclone copyurl https://data.source.coop/harvard-lil/gov-data/metadata/data_gov/the-panel-study-of-income-dynamics-psid/v1.json uh_gdrive:Workshop/psid_metadata.json --progress
rclone copyurl https://data.source.coop/harvard-lil/gov-data/collections/data_gov/the-panel-study-of-income-dynamics-psid/v1.zip uh_gdrive:Workshop/psid_data.zip --progress
```
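
When transferring more than a couple of datasets, the same `rclone copyurl` call can be driven from Python. This is a sketch only: the helper names are hypothetical, and `transfer` assumes `rclone` is on the `PATH` and that the remote in the destination is already configured.

```python
import subprocess


def rclone_copyurl_command(url, destination, progress=True):
    """Build the argument list for `rclone copyurl URL DEST [--progress]`."""
    cmd = ["rclone", "copyurl", url, destination]
    if progress:
        cmd.append("--progress")
    return cmd


def transfer(url, destination):
    """Run rclone; raises CalledProcessError if rclone exits with an error."""
    subprocess.run(rclone_copyurl_command(url, destination), check=True)


# Building the command does not touch the network, so it is safe to inspect first.
print(rclone_copyurl_command("https://example.org/v1.json", "uh_gdrive:Workshop/example.json"))
```

Passing the arguments as a list avoids shell quoting problems with unusual dataset names.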