# Forest Inventory and Analysis `10 points`

Source:

* Dataset: https://apps.fs.usda.gov/fia/datamart/datamart.html
* Documentation: https://www.fia.fs.fed.us/library/database-documentation/index.php
        
Description from [Data Is Plural](https://www.data-is-plural.com/archive/2019-08-21-edition/):

> The U.S. Forest Service’s Forest Inventory and Analysis program tracks “trends in forest area and location; in the species, size, and health of trees; in total tree growth, mortality, and removals by harvest; in wood production and utilization rates by various products; and in forest land ownership.” It also “serves as perhaps the largest publicly available” dataset of “downed and dead wood.” The inventory is available to download and comes with user guides.

**Topics:**

* Downloading files
* Opening Excel files
* Using parameters when opening Excel files
* When to do things manually vs doing things with code

## Automatic downloading `2 points`

If you want to download files for Excel, you need to go to [this page](https://apps.fs.usda.gov/fia/datamart/datamart_excel.html) and click on the map. It leads you to a file like `https://apps.fs.usda.gov/fia/datamart/Workbooks/IL.xlsm`. Awful user interface!

Instead, I want you to use `requests` and a `for` loop to download all of the states automatically. You might find [this SO answer](https://stackoverflow.com/questions/44699682/how-to-save-a-file-to-a-specific-directory-in-python) useful.

*Note that the page says they don't have information for every state.*

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import glob


In [2]:
baseurl = 'https://apps.fs.usda.gov/fia/datamart/'
response = requests.get(baseurl + 'datamart_excel.html')
doc = BeautifulSoup(response.text, 'html.parser')

In [19]:
for state in doc.find_all('area'):
    # Each <area> href uses backslashes (e.g. Workbooks\IL.xlsm), so convert them to a URL path
    link = baseurl + state['href'].replace('\\','/')
    # Easier way with wget:
    # !wget {link} --directory-prefix=files
    # Using requests (the files/ directory must already exist)
    filename = state['href'].split('\\')[1]
    r = requests.get(link)
    with open('files/' + filename, 'wb') as f:
        f.write(r.content)
    print(f'Downloaded {link}')

Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/AS.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/FM.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/GU.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/MP.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/PW.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/AL.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/AK.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/AZ.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/AR.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/CA.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/CO.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/CT.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/DE.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/DC.xlsm
Downloaded https://apps.fs.usda.gov/fia/datamart/Workbooks/FL.
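
The loop above saves whatever the server returns, including error pages. As a minimal sketch of one way to guard against that, the helpers below build the URL and local path from a backslash-style `href` and only write the file when the status code is 200. The names `workbook_target` and `save_response` are hypothetical and not part of the original solution. In a real run, the status code and content would come from `r.status_code` and `r.content`.

```python
import os
from urllib.parse import urljoin


def workbook_target(base_url, href, out_dir="files"):
    """Return (download URL, local path) for one <area> href (hypothetical helper)."""
    clean = href.replace("\\", "/")  # Workbooks\IL.xlsm -> Workbooks/IL.xlsm
    url = urljoin(base_url, clean)
    local_path = os.path.join(out_dir, os.path.basename(clean))
    return url, local_path


def save_response(status_code, content, local_path):
    """Write content to local_path only for HTTP 200; return True if saved."""
    if status_code != 200:
        return False
    # Create the target directory if it is missing
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    with open(local_path, "wb") as f:
        f.write(content)
    return True


url, path = workbook_target("https://apps.fs.usda.gov/fia/datamart/", "Workbooks\\IL.xlsm")
print(url)
print(path)
```

With something like this in place, a state whose URL returns a 404 would simply be skipped instead of being saved as a broken workbook.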

## Reading in the data `3 points`

### Read in the data for Virginia

**We're interested in sheet `SR004`**, which shows how many acres fall under each type of ownership.

Read the file in so that the dataset looks like this:

|Forest type group|Total|National Forest|Other federal|State and local|Private|
|---|---|---|---|---|---|
|Total|16025876|1688425.0|518217.0|657963.0|13161271|
|...|...|...|...|...|...|
|Nonstocked|81574|0.0|1590.0|0.0|79984|

and your index goes up to `15`.

In [28]:
va = pd.read_excel('files/VA.xlsm',sheet_name='SR004',skiprows=11,skipfooter=117)
va

| |Forest type group|Total|National Forest|Other federal|State and local|Private|
|---|---|---|---|---|---|---|
|0|Total|16025876|1688425|518217|657963|13161271|
|1|White / red / jack pine group|171292|33764|2534|-|134995|
|2|Spruce / fir group|7735|-|-|6188|1547|
|3|Longleaf / slash pine group|10293|-|-|-|10293|
|4|Loblolly / shortleaf pine group|3038306|63540|79536|89038|2806193|
|5|Other eastern softwoods group|75076|-|-|5876|69201|
|6|Exotic softwoods group|4157|-|-|-|4157|
|7|Oak / pine group|1649711|140950|58413|53515|1396832|
|8|Oak / hickory group|9755134|1375367|314345|405917|7659506|
|9|Oak / gum / cypress group|373717|2939|40461|21746|308570|
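
To see what `skiprows` and `skipfooter` do without opening a workbook, here is a small sketch on made-up data. It uses `pd.read_csv` on an in-memory string as a stand-in for `pd.read_excel`, which accepts the same `skiprows`, `skipfooter` and `na_values` parameters. The numbers and labels are invented for illustration, and `na_values='-'` is an optional extra that the cell above does not use.

```python
import io

import pandas as pd

# Made-up stand-in for a sheet: two title lines, a table, then two footnote lines
lines = [
    "Report title line",
    "Another preamble line",
    "Forest type group,Total,National Forest,Private",
    "Total,100,30,70",
    "Oak group,70,-,70",
    "Pine group,30,30,-",
    "Footnote line one",
    "Footnote line two",
]
text = "\n".join(lines)

demo = pd.read_csv(
    io.StringIO(text),
    skiprows=2,       # drop the two title lines above the header row
    skipfooter=2,     # drop the two footnote lines below the table
    na_values="-",    # treat the '-' placeholder as a missing value
    engine="python",  # read_csv needs the python engine for skipfooter
)
print(demo)
```

Without `na_values`, the `-` cells stay as strings and the affected columns are read as text rather than numbers.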


### Read in the data for South Dakota

You'll have fewer rows in this dataset than for Virginia.

In [27]:
sd = pd.read_excel('files/SD.xlsm',sheet_name='SR004',skiprows=11,skipfooter=113)

sd

| |Forest type group|Total|National Forest|Other federal|State and local|Private|
|---|---|---|---|---|---|---|
|0|Total|1897358|993588|60164|96499|747106|
|1|White / red / jack pine group|6098|-|-|-|6098|
|2|Spruce / fir group|85957|60844|17348|-|7765|
|3|Other eastern softwoods group|52741|-|6331|-|46411|
|4|Pinyon / juniper group|71273|17366|5920|-|47987|
|5|Ponderosa pine group|1023191|699839|4748|63442|255163|
|6|Oak / pine group|9775|-|-|4340|5435|
|7|Oak / hickory group|156796|24422|5978|7412|118985|
|8|Elm / ash / cottonwood group|140167|-|-|12700|127468|
|9|Maple / beech / birch group|8449|-|-|-|8449|


# Calculations `1 point`

## What percent of forested land is a "National Forest" in South Dakota vs Virginia?

You can do this calculation manually. Pay special attention to column names.

In [38]:
sd_percent = round((sd['National Forest'][0]/sd['Total'][0])*100)
print(f'{sd_percent}% of forest land in South Dakota is "National Forest"')

52% of forest land in South Dakota is "National Forest"


In [40]:
va_percent = round((va['National Forest'][0]/va['Total'][0])*100)
print(f'{va_percent}% of forest land in Virginia is "National Forest"')

11% of forest land in Virginia is "National Forest"


## What percent of forested land is privately owned in SD vs VA?

In [42]:
sd_priv_percent = round((sd['Private'][0]/sd['Total'][0])*100)

print(f'{sd_priv_percent}% of forest land in South Dakota is privately owned')


39% of forest land in South Dakota is privately owned


In [43]:
va_priv_percent = round((va['Private'][0]/va['Total'][0])*100)

print(f'{va_priv_percent}% of forest land in Virginia is privately owned')


82% of forest land in Virginia is privately owned


## Do the calculation for private ownership of all forests in South Dakota using only one line, and without typing the actual numbers `1 point`

Tip: `df.loc[0]` will be your friend

In [46]:
round((sd['Private'][0]/sd['Total'][0])*100)

39
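
For comparison, here is a sketch of the same calculation using the `df.loc[0]` tip. The small DataFrame is a stand-in that copies two South Dakota rows from the output above, and `sd_demo` is a hypothetical name used only for this example.

```python
import pandas as pd

# Stand-in for the South Dakota sheet, using two rows from the output above
sd_demo = pd.DataFrame({
    "Forest type group": ["Total", "Ponderosa pine group"],
    "Total": [1897358, 1023191],
    "Private": [747106, 255163],
})

# .loc[0] returns the first row as a Series, so columns can be read by name
first_row = sd_demo.loc[0]
private_pct = round(first_row["Private"] / first_row["Total"] * 100)
print(private_pct)
```

Both approaches assume the "Total" row is at index `0`, which holds for the sheets shown here.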

## Using the files you downloaded, calculate the private ownership rate for all forested land in each state `3 points`

> Tip: Use a for loop

In [58]:
all_files = glob.glob("files/*.xlsm")

# https://apps.fs.usda.gov/fia/datamart/Workbooks/DC.xlsm
# This URL gives a 404 error so I will remove it from my list
all_files.remove('files/DC.xlsm')

In [59]:
tables = []

for filename in all_files:
    df = pd.read_excel(filename,sheet_name='SR004',skiprows=11,index_col=None)
    # Keep rows up to the first one containing the sampling-error footer text
    table = df.loc[:(df == 'Sampling error percent (Confidence level 68%):').any(axis=1).idxmax()]
    # files/MT.xlsm -> MT
    state = filename.split('/')[1].split('.')[0]
    table.insert(0,"region",state)
    tables.append(table)

final_df = pd.concat(tables, axis=0, ignore_index=True)

In [66]:
len(final_df.region.unique()) == len(all_files)

True
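
The two string tricks in the loop can be sketched in isolation: cutting a table off at a marker row, and pulling the state code out of a file path. The helper names `trim_at_marker` and `state_code` are hypothetical, and the demo data is made up. Unlike the label-based slice in the loop, this version excludes the marker row itself and returns the whole frame when the marker is missing.

```python
from pathlib import Path

import pandas as pd

MARKER = "Sampling error percent (Confidence level 68%):"


def trim_at_marker(df, marker=MARKER):
    """Return the rows above the first row containing marker (all rows if absent)."""
    hits = (df == marker).any(axis=1)
    if not hits.any():
        return df
    first_hit = hits.to_numpy().argmax()  # position of the first True
    return df.iloc[:first_hit]


def state_code(filename):
    """files/MT.xlsm -> MT, regardless of the path separator style of the OS."""
    return Path(filename).stem


# Made-up rows: a table, the marker, then a second table that should be dropped
demo = pd.DataFrame({
    "Forest type group": ["Total", "Oak / hickory group", MARKER, "Total"],
    "Total": [100, 100, None, 5],
})
trimmed = trim_at_marker(demo)
print(trimmed)
print(state_code("files/MT.xlsm"))
```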

In [80]:
total = final_df[final_df['Forest type group'] == 'Total']

In [84]:
total.insert(7,"Private Percent",total.Private/total.Total*100)

total

| |region|Forest type group|Total|National Forest|Other federal|State and local|Private|Private Percent|
|---|---|---|---|---|---|---|---|---|
|0|MT|Total|25779495|15608675.0|2438728.0|1089587|6642505|25.766622|
|18|FM|Total|148924| | |76237|72687|48.808117|
|22|AS|Total|39156| |6237.0|27312|5607|14.319644|
|26|IN|Total|4774495|212683.0|186256.0|401995|3973562|83.22476|
|41|OK|Total|11839462|358383.0|505135.0|574754|10401190|87.85188|
|56|MI|Total|20167228|2826227.0|289714.0|4654013|12397274|61.472375|
|74|GA|Total|24418249|865923.0|990698.0|855022|21706606|88.895015|
|89|NC|Total|18724888|1222920.0|857977.0|1146140|15497850|82.766049|
|106|VI|Total|46967| |9738.0|2056|35174|74.890881|
|110|NY|Total|18622212|15645.0|148696.0|4774083|13683788|73.481002|
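
As an alternative to `insert`, which depends on a fixed column position, here is a sketch that adds the percentage with `assign` and sorts the states by it. The DataFrame is a stand-in built from the MT and IN rows shown above, and the names `totals` and `ranked` are hypothetical.

```python
import pandas as pd

# Stand-in for the per-state "Total" rows, using two rows from the output above
totals = pd.DataFrame({
    "region": ["MT", "IN"],
    "Total": [25779495, 4774495],
    "Private": [6642505, 3973562],
})

# assign returns a new DataFrame, leaving the original untouched
ranked = (
    totals.assign(**{"Private Percent": totals["Private"] / totals["Total"] * 100})
          .sort_values("Private Percent", ascending=False)
          .reset_index(drop=True)
)
print(ranked)
```

Sorting makes it easy to see which states have the highest and lowest private ownership rates at a glance.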
