# Enhancing the Data with Census FIPS and Ecosystem Data
### Purpose
In this notebook I add columns to the working data set that contain 1) the census block FIPS and county FIPS codes, 2) the USGS ecosystem for each CBC circle location, and 3) the USGS ecosystem for each NOAA station location.

The census block FIPS and county FIPS codes are the unique identifiers the Census Bureau uses to identify an area. To learn more, visit: https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html
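A 15-digit census block FIPS code nests the larger geographies inside it: 2 digits for the state, 3 for the county, 6 for the tract and 4 for the block. The county FIPS code is therefore the first 5 digits of the block code. A minimal sketch of splitting a block code, assuming it is available as a string or integer; the helper name `split_block_fips` is hypothetical and the example code is illustrative, not taken from the data:

```python
def split_block_fips(block_fips) -> dict:
    """Split a 15-digit census block FIPS code into its nested parts."""
    code = str(block_fips).strip()
    # Codes read back from CSV as integers lose leading zeros (e.g. state 06),
    # so pad them back to 15 characters before slicing.
    code = code.zfill(15)
    if len(code) != 15 or not code.isdigit():
        raise ValueError(f"expected a 15-digit block FIPS code, got {block_fips!r}")
    return {
        "state": code[:2],
        "county": code[:5],   # state + county digits, i.e. the county FIPS code
        "tract": code[5:11],
        "block": code[11:],
    }


parts = split_block_fips("060750201001000")
print(parts["county"])
```

Padding with `zfill` matters if the saved FIPS columns are read back as numbers, since a state code such as `06` would otherwise shift every later slice.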


### Author: 
Ren C'deBaca
### Date: 
2020-04-21
### Update Date: 
2020-04-26

### Inputs 
1.0-rec-initial-data-cleaning.txt - Tab-separated file of cleaned Christmas Bird Count events. Each row represents a single count in a given year. The data dictionary can be found here: http://www.audubon.org/sites/default/files/documents/cbc_report_field_definitions_2013.pdf

np-circles-to-ecosys_data.csv - Comma-separated file from Nathan Pavlovic (nathan.pavlovic@gmail.com). This file was produced by first passing Nathan a file of the approximately 4000 unique lat/lon pairs present in the clean data file.

Nathan then used the 2008 USGS raster ecosystem dataset. More information: https://rmgsc.cr.usgs.gov/outgoing/ecosystems/USdata/

He used the Extract Values to Points tool in ArcGIS to find the raster value at each point. 
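Extract Values to Points reads the raster cell that lies under each point. For a north-up raster with no rotation, that lookup reduces to converting each coordinate into a row and column index. The sketch below is a NumPy illustration of the idea, not ArcGIS's implementation; it assumes the points are already in the raster's coordinate system, and the names `origin_x`, `origin_y` (upper-left corner) and `cell_size` are hypothetical parameters:

```python
import numpy as np


def sample_raster(grid, origin_x, origin_y, cell_size, xs, ys, nodata=np.nan):
    """Return the value of the cell containing each (x, y) point.

    Assumes a north-up grid: rows increase southward from origin_y and
    columns increase eastward from origin_x. Points outside the grid get nodata.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    cols = np.floor((xs - origin_x) / cell_size).astype(int)
    rows = np.floor((origin_y - ys) / cell_size).astype(int)
    inside = (rows >= 0) & (rows < grid.shape[0]) & (cols >= 0) & (cols < grid.shape[1])
    out = np.full(xs.shape, nodata, dtype=float)
    out[inside] = grid[rows[inside], cols[inside]]
    return out


# Toy 2x3 "ecosystem" raster with 10-unit cells and its upper-left corner at (0, 20)
grid = np.array([[274, 300, 487],
                 [254, 324, 600]])
print(sample_raster(grid, 0.0, 20.0, 10.0, xs=[5, 25, 99], ys=[15, 5, 5]))
```

The third point falls outside the grid, so it comes back as `nodata`; points over gaps in real rasters surface the same way, which is one reason some rows in the returned files have blank ecosystem values.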

unique_stations_latlong_ecosys.csv - Comma-separated file from Nathan Pavlovic (nathan.pavlovic@gmail.com). This file was produced by first passing Nathan a file of the unique NOAA station lat/lons present in the file 1.1-circles_to_many_stations_usa_weather_data_20200424213015.csv. See the notes above on his process.

1.2-ijd-fetch-circle-elevations_20200502155633.csv - CSV file of CBC circles matched with NOAA stations and elevation data. Each row is a CBC circle matched to a NOAA station. A CBC location can appear on multiple rows if it is matched to multiple stations.




### Output Files
1.3-rec-connecting-fips-data.csv -- CSV file of the unique lat/lons present in the CBC data. Each lat/lon is matched to a block FIPS and county FIPS code. (This is the file that was shared with Nathan.)

1.3-rec-connecting-fips-ecosystem-data -- CSV file of the station-matched CBC data with added columns for the ecosystem of the CBC circles and NOAA stations, and for the census FIPS data


## Steps or Procedures in the Notebook
1. Load in the cleaned data
2. Identify the unique lat/lon pairs present in the CBC circle locations
3. Get the census FIPS data using one of two options:
    - OPTION 1: Load in the saved census FIPS data
    - OPTION 2: Run the data through the census API (Note: this takes a few hours)
4. Load in the ecosystem data from Nathan
5. Create a key based on the lat/lon of the CBC circles to merge the station-matched data with the ecosystem data
6. Merge in the census FIPS data, the CBC ecosystem data, and the NOAA station ecosystem data


## Where the Data will Be Saved 
The raw ecosystem data and the output data will be saved in the Google Drive Folder
https://drive.google.com/drive/folders/1Nlj9Nq-_dPFTDbrSDf94XMritWYG6E2I

The path should look like this: 
audubon-cbc/data/Cloud_Data/<DATA FILE>

## Reference
    https://geo.fcc.gov/api/census/#!/block/get_block_find


In [1]:
# Imports
import os
from datetime import datetime
# Version .24.0
from google.cloud import bigquery
import pandas as pd
import requests
import time
import numpy as np

pd.set_option('display.max_columns', 500)
pd.options.display.max_rows = 999

In [2]:
# ALL File Paths should be declared at the TOP of the notebook
PATH_TO_CLEAN_CBC_DATA = "../data/Cloud_Data/1.0-rec-initial-data-cleaning.txt"
PATH_TO_WORKING_DATA = "../data/Cloud_Data/1.2-ijd-fetch-circle-elevations_20200623011321.txt"


PATH_TO_CBC_ECO_DATA = "../data/np-circles-to-ecosys_data.csv" 
PATH_TO_NOAA_ECO_DATA = "../data/unique_stations_latlong_ecosys.csv"

USE_CENSUS_BACKUP_FILE = True

## Load in the Clean Data

In [3]:
clean_data = pd.read_csv(PATH_TO_CLEAN_CBC_DATA, encoding = "ISO-8859-1", sep="\t")



In [4]:
clean_data.shape

(90411, 48)

In [5]:
clean_data.head()

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,am_rain,pm_rain,am_snow,pm_snow,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui
0,Pacific Grove,US-CA,36.6167,-121.9167,1901,12/25/00,1.0,,,,,,,,,Miles,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,36.6167-121.9167_1901
1,Pueblo,US-CO,38.175251,-104.519575,1901,12/25/00,1.0,,,,,,,,,Miles,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,38.175251-104.519575_1901
2,Bristol,US-CT,41.6718,-72.9495,1901,12/25/00,2.0,,,,,,,,,Miles,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41.6718-72.9495_1901
3,Norwalk,US-CT,41.1167,-73.4,1901,12/25/00,1.0,,,,,,,,,Miles,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41.1167-73.4_1901
4,Glen Ellyn,US-IL,41.8833,-88.0667,1901,12/25/00,1.0,,,,,,,,,Miles,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41.8833-88.0667_1901


### Create a string key to represent a unique lat/lon combination

In [6]:
clean_data['temp_key_str'] = clean_data['lat'].astype(str) + clean_data['lon'].astype(str)
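This key converts both coordinates to strings and concatenates them. Later cells round to 3 decimal places before building the key, so that coordinates stored at different precisions collapse to the same key (the working data, for instance, stores 19.517 where its `ui` identifier carries 19.516651). A minimal sketch of a reusable helper; the name `make_latlon_key` and its `decimals` and `sep` parameters are hypothetical, and with `sep=""` it reproduces the keys used in this notebook. A separator would avoid ambiguity when a longitude is positive (as for the test point near 179°E), since no minus sign then marks where the latitude ends.

```python
import pandas as pd


def make_latlon_key(lat: pd.Series, lon: pd.Series, decimals=3, sep=""):
    """Build a string key from lat/lon columns.

    decimals=None keeps the coordinates as stored; an integer rounds them first.
    sep="" reproduces the concatenated keys used in this notebook.
    """
    if decimals is not None:
        lat = lat.round(decimals)
        lon = lon.round(decimals)
    return lat.astype(str) + sep + lon.astype(str)


df = pd.DataFrame({"lat": [36.6167, 41.7, 41.700378],
                   "lon": [-121.9167, -70.3, -70.300172]})
print(make_latlon_key(df["lat"], df["lon"]).tolist())
```

The last two toy rows map to the same key, which is the same kind of collision that shows up later as a duplicate key in the ecosystem data.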

In [7]:
clean_data['temp_key_str'].nunique()

4585

In [8]:
clean_data['country_state'].value_counts()

US-CA    5866
US-TX    4997
US-NY    4835
US-PA    4120
US-OH    4018
US-WI    3866
US-FL    3280
US-IL    3078
US-MI    2925
US-VA    2418
US-MN    2378
US-NJ    2170
US-MA    2157
US-NC    2089
US-IN    2085
US-CO    2026
US-OR    1882
US-IA    1709
US-WA    1629
US-AZ    1494
US-MO    1481
US-MD    1477
US-KS    1457
US-TN    1438
US-ME    1391
US-AK    1374
US-MT    1342
US-NM    1327
US-GA    1254
US-CT    1236
US-OK    1168
US-AR    1115
us-wi    1042
US-LA    1030
US-ND    1006
US-ID    1000
US-SC     990
US-SD     927
US-VT     923
US-NH     904
US-WV     903
US-KY     886
US-UT     845
US-WY     823
US-MS     792
US-NE     681
US-AL     635
US-NV     509
US-HI     434
US-DE     382
US-RI     370
US-DC      93
us-ma      84
us-mn      64
us-fl       6
Name: country_state, dtype: int64
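The counts above list some states twice, once in upper case (`US-WI`) and once in lower case (`us-wi`). This does not affect the coordinate-based keys built in this notebook, but any grouping by `country_state` would split those states. A minimal sketch of normalizing the codes, shown on a toy frame rather than on `clean_data`:

```python
import pandas as pd

states = pd.DataFrame({"country_state": ["US-WI", "us-wi", "US-MA", "us-ma"]})
# Strip stray whitespace and upper-case so "us-wi" and "US-WI" count as one state
states["country_state"] = states["country_state"].str.strip().str.upper()
print(states["country_state"].value_counts())
```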

## Census Data 
There are two options here:
OPTION 1: Load in the saved census FIPS data
OPTION 2: Send the unique lat lons through a census API to find the block and county FIPS codes

### Option 1: Load in the saved census FIPS data 

In [9]:
## Option: Set USE_CENSUS_BACKUP_FILE to True to use the file from backup
if USE_CENSUS_BACKUP_FILE:
    census_prep_df = pd.read_csv("1.3-rec-connecting-fips-data.csv")
    census_prep_df = census_prep_df[["lat", "lon", "block_fips", "county_fips"]]
    census_prep_df['temp_key_str'] = census_prep_df['lat'].astype(str) + census_prep_df['lon'].astype(str)
    print(clean_data.shape)
    print(census_prep_df.head())

(90411, 49)


### Option 2: Run the data through the census API (Note: Takes a few hours) 

In [10]:
if not USE_CENSUS_BACKUP_FILE:
    # Create a small dataframe of unique lat lon locations to use with census data
    # (.copy() avoids pandas' SettingWithCopyWarning when the FIPS columns are added later)
    census_prep_df = clean_data[['temp_key_str', 'lat', 'lon']].copy()

In [11]:
if not USE_CENSUS_BACKUP_FILE:
    print(census_prep_df.shape)

In [12]:
if not USE_CENSUS_BACKUP_FILE:
    # Drop duplicate rows 
    census_prep_df = census_prep_df.drop_duplicates(subset=['lat', 'lon'], keep= 'first') 
    print(census_prep_df.shape)

### Create a test call to the API to see how the data comes back 

In [13]:
if not USE_CENSUS_BACKUP_FILE:
    # Test Lat and Lon
    lat = 51.409713
    lon = 179.284881

    BASE_URL = "https://geo.fcc.gov/api/census/block/find?format=json&latitude=%s&longitude=%s"
    url = BASE_URL % (lat, lon)

    payload = {}
    headers= {}

    response = requests.request("GET", url, headers=headers, data = payload)

    print(response.text.encode('utf8'))
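The loop below reads the codes from `['Block']['FIPS']` and `['County']['FIPS']` in the JSON response. Parsing can be separated from the network request so it can be checked offline. A minimal sketch; the helper name `extract_fips` and the sample payloads are hypothetical, shaped only after those two keys:

```python
def extract_fips(payload):
    """Return (block_fips, county_fips) from a block/find JSON response.

    Missing or null fields come back as empty strings, the same placeholder
    the loop below uses when a lookup fails.
    """
    payload = payload or {}
    block = payload.get("Block") or {}
    county = payload.get("County") or {}
    return block.get("FIPS") or "", county.get("FIPS") or ""


sample = {"Block": {"FIPS": "060750201001000"}, "County": {"FIPS": "06075"}}
print(extract_fips(sample))
print(extract_fips({"Block": {"FIPS": None}, "County": None}))
```

With a live response, the call would be `extract_fips(response.json())`.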

### Loop over the unique locations to build a list of block FIPS and county FIPS codes from the census API

In [14]:
if not USE_CENSUS_BACKUP_FILE:
    result_list = []
    county_result_list = []

    BASE_URL = "https://geo.fcc.gov/api/census/block/find?format=json&latitude=%s&longitude=%s"

    TIME_DELAY = 2

    for index, row in census_prep_df.iterrows():
        block_fips = ''
        county_fips = ''

        lat = row['lat']
        lon = row['lon']

        url = BASE_URL % (lat, lon)
        payload = {}
        headers= {}
        response = requests.request("GET", url, headers=headers, data = payload)

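        try:
            data = response.json()
            block_fips = data['Block']['FIPS']
            county_fips = data['County']['FIPS']
        except (ValueError, KeyError, TypeError):
            # Leave the codes blank if the response is not JSON or lacks the expected fields
            print("Could not get FIPS for %s, %s" % (lat, lon))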

        result_list.append(block_fips)
        county_result_list.append(county_fips)

        time.sleep(TIME_DELAY)
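Because this loop runs for hours and keeps its results only in memory, an interruption means starting over. One option is to cache results by key so that a rerun skips coordinates already looked up. A minimal sketch, assuming a caller-supplied `fetch(lat, lon)` function that returns `(block_fips, county_fips)`; `lookup_all` and `fake_fetch` are hypothetical names, and saving the cache to disk between runs is left out:

```python
import time


def lookup_all(points, fetch, cache=None, delay=2.0):
    """Look up FIPS codes for (key, lat, lon) tuples, skipping keys already cached.

    fetch(lat, lon) must return (block_fips, county_fips). Failed lookups are
    reported and left out of the cache, so a later run retries only those.
    """
    cache = {} if cache is None else cache
    for key, lat, lon in points:
        if key in cache:
            continue  # already looked up in an earlier (possibly interrupted) run
        try:
            cache[key] = fetch(lat, lon)
        except Exception as exc:  # e.g. network errors or unexpected JSON
            print(f"Could not get FIPS for {key}: {exc}")
        time.sleep(delay)  # pause between requests to the API
    return cache


def fake_fetch(lat, lon):
    """Offline stand-in for a real API call."""
    return (f"block@{lat}", f"county@{lon}")


done = lookup_all([("a", 1.0, 2.0), ("b", 3.0, 4.0)], fake_fetch,
                  cache={"a": ("x", "y")}, delay=0)
print(done)
```

In the notebook, `points` could come from `census_prep_df[['temp_key_str', 'lat', 'lon']].itertuples(index=False)`, and `fetch` could wrap the `requests` call used above.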


In [15]:
if not USE_CENSUS_BACKUP_FILE:
    print(len(result_list))
    print(len(county_result_list))


In [16]:
if not USE_CENSUS_BACKUP_FILE:
    # Turn the result lists into pandas Series
    result_arry = pd.Series(result_list)
    county_array = pd.Series(county_result_list)

In [17]:
if not USE_CENSUS_BACKUP_FILE:
    # Add the series into the data frame 
    census_prep_df['block_fips'] = result_arry.values
    census_prep_df['county_fips'] = county_array.values

In [18]:
if not USE_CENSUS_BACKUP_FILE:
    print(census_prep_df.head())

### Choose to save the data to a file

In [19]:
## Save the data to a file 
# if not USE_CENSUS_BACKUP_FILE:
#    census_prep_df.to_csv('1.3-rec-connecting-fips-data.csv')

# Add Ecosystem Data to the Working Dataset

### Notes
The file 1.3-rec-connecting-fips-data.csv is the file I passed to Nathan for ecosystem processing. He then returned a dataset with the ecosystem data added as columns. The next section adds in the ecosystem data.

## Load in Ecosystem data for the CBC Circles 

In [20]:
eco_data = pd.read_csv(PATH_TO_CBC_ECO_DATA)

In [21]:
eco_data.shape

(4531, 15)

### Notes On Definitions
Ecosys - The numeric code for an ecosystem provided by USGS https://www.arcgis.com/home/item.html?id=8e8015c1e60b431fb191b5ed0de97b33. Translates into the human-readable Usgsid_sys value.

Usgsid_sys - Human-readable ecosystem label

Nlcd_code - The numeric National Land Cover Database (NLCD) code provided by USGS https://www.arcgis.com/home/item.html?id=8e8015c1e60b431fb191b5ed0de97b33. Translates into the human-readable Nlcd value.

Nlcd - Human-readable label of the National Land Cover code
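In the rows printed in this section, each `Usgsid_sys` label starts with its `Ecosys` code followed by an underscore (for example `274_Western Great Plains Shortgrass Prairie`). A quick consistency check can split the label and compare the prefix with `Ecosys`. A minimal sketch on toy rows, assuming every label follows that `<code>_<name>` pattern:

```python
import pandas as pd

eco = pd.DataFrame({
    "Ecosys": [274.0, 602.0],
    "Usgsid_sys": ["274_Western Great Plains Shortgrass Prairie",
                   "602_Unknown - pixel count <= 20,000"],
})
# Split only on the first underscore, leaving the rest of the name intact
parts = eco["Usgsid_sys"].str.split("_", n=1, expand=True)
eco["label_code"] = parts[0].astype(float)
eco["label_name"] = parts[1]
print((eco["label_code"] == eco["Ecosys"]).all())  # True when every prefix agrees
print(eco["label_name"].tolist())
```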

In [22]:
# Take the Columns we Need
eco_data = eco_data[["lat","lon","Ecosys", "Usgsid_sys", "Nlcd_code", "Nlcd"]]

In [23]:
print(eco_data.shape)
eco_data.head(20)

(4531, 6)


Unnamed: 0,lat,lon,Ecosys,Usgsid_sys,Nlcd_code,Nlcd
0,36.6167,-121.9167,66.0,66_California Coastal Live Oak Woodland and Sa...,3.0,Steppe/Savanna
1,38.175251,-104.519575,274.0,274_Western Great Plains Shortgrass Prairie,4.0,Herbaceous
2,41.6718,-72.9495,300.0,300_Appalachian (Hemlock)-Northern Hardwood Fo...,1.0,Forest and Woodland
3,41.1167,-73.4,487.0,487_Northern Atlantic Coastal Plain Pitch Pine...,1.0,Forest and Woodland
4,41.8833,-88.0667,254.0,254_North-Central Interior Beech-Maple Forest,1.0,Forest and Woodland
5,29.8333,-91.55,337.0,337_West Gulf Coastal Plain Nonriverine Wet Ha...,5.0,Woody Wetland
6,42.3833,-71.1667,301.0,301_Northeastern Interior Dry-Mesic Oak Forest,1.0,Forest and Woodland
7,42.397309,-71.095501,324.0,324_Laurentian-Acadian Northern Hardwoods Forest,1.0,Forest and Woodland
8,42.45,-71.1333,324.0,324_Laurentian-Acadian Northern Hardwoods Forest,1.0,Forest and Woodland
9,40.05,-91.5,250.0,250_North-Central Interior Oak Savanna,3.0,Steppe/Savanna


In [24]:
# Create a temporary key to merge on
eco_data['temp_key_str'] = eco_data['lat'].astype(str) + eco_data['lon'].astype(str)

eco_data['temp_key_str'] = round(eco_data['lat'],3).astype(str) + round(eco_data['lon'],3).astype(str)


In [25]:
# Some of the ecosystem data is blank or coded 602 (Unknown); put the lat lons that did NOT return usable ecosystem data into a new df
na_eco_data = eco_data.loc[(eco_data["Ecosys"].isna() | (eco_data["Ecosys"] == 602.0))]

print(na_eco_data.shape)
na_eco_data

(181, 7)


Unnamed: 0,lat,lon,Ecosys,Usgsid_sys,Nlcd_code,Nlcd,temp_key_str
30,40.8667,-73.4333,602.0,"602_Unknown - pixel count <= 20,000",0.0,,40.867-73.433
54,43.4333,-82.55,602.0,"602_Unknown - pixel count <= 20,000",0.0,,43.433-82.55
173,43.05,-74.35,602.0,"602_Unknown - pixel count <= 20,000",0.0,,43.05-74.35
306,40.6,-74.1,602.0,"602_Unknown - pixel count <= 20,000",0.0,,40.6-74.1
440,41.9833,-76.5167,602.0,"602_Unknown - pixel count <= 20,000",0.0,,41.983-76.517
652,37.75,-122.4333,602.0,"602_Unknown - pixel count <= 20,000",0.0,,37.75-122.433
732,40.7667,-73.75,602.0,"602_Unknown - pixel count <= 20,000",0.0,,40.767-73.75
959,40.8,-73.7333,602.0,"602_Unknown - pixel count <= 20,000",0.0,,40.8-73.733
967,41.1667,-71.5667,602.0,"602_Unknown - pixel count <= 20,000",0.0,,41.167-71.567
1012,41.7667,-70.1,602.0,"602_Unknown - pixel count <= 20,000",0.0,,41.767-70.1


In [26]:
# Limit eco data to only rows that contain data
eco_data = eco_data.loc[(eco_data["Ecosys"].notnull() & (eco_data["Ecosys"] != 602.0))]

print(eco_data.shape)
eco_data

(4350, 7)


Unnamed: 0,lat,lon,Ecosys,Usgsid_sys,Nlcd_code,Nlcd,temp_key_str
0,36.616700,-121.916700,66.0,66_California Coastal Live Oak Woodland and Sa...,3.0,Steppe/Savanna,36.617-121.917
1,38.175251,-104.519575,274.0,274_Western Great Plains Shortgrass Prairie,4.0,Herbaceous,38.175-104.52
2,41.671800,-72.949500,300.0,300_Appalachian (Hemlock)-Northern Hardwood Fo...,1.0,Forest and Woodland,41.672-72.95
3,41.116700,-73.400000,487.0,487_Northern Atlantic Coastal Plain Pitch Pine...,1.0,Forest and Woodland,41.117-73.4
4,41.883300,-88.066700,254.0,254_North-Central Interior Beech-Maple Forest,1.0,Forest and Woodland,41.883-88.067
...,...,...,...,...,...,...,...
4526,41.088856,-96.234197,253.0,253_North-Central Interior Floodplain,7.0,Mixed Upland and Wetland,41.089-96.234
4527,41.879000,-119.058500,190.0,190_Inter-Mountain Basins Big Sagebrush Steppe,3.0,Steppe/Savanna,41.879-119.058
4528,42.133971,-75.081353,322.0,322_Acadian-Appalachian Montane Spruce-Fir Forest,1.0,Forest and Woodland,42.134-75.081
4529,38.684400,-78.127700,301.0,301_Northeastern Interior Dry-Mesic Oak Forest,1.0,Forest and Woodland,38.684-78.128


In [27]:
# Investigate the duplicates 
eco_data['temp_key_dup'] = eco_data.duplicated(subset=['temp_key_str'], keep=False)
print("Are there any duplicate keys? : " + str(any(eco_data['temp_key_dup'])))

eco_data.sort_values(by=['temp_key_dup', 'temp_key_str'], inplace=True, ascending = False)

eco_data.head(100)



Are there any duplicate keys? : True


Unnamed: 0,lat,lon,Ecosys,Usgsid_sys,Nlcd_code,Nlcd,temp_key_str,temp_key_dup
346,41.7,-70.3,487.0,487_Northern Atlantic Coastal Plain Pitch Pine...,1.0,Forest and Woodland,41.7-70.3,True
3349,41.700378,-70.300172,487.0,487_Northern Atlantic Coastal Plain Pitch Pine...,1.0,Forest and Woodland,41.7-70.3,True
145,49.0,-122.75,130.0,130_North Pacific Oak Woodland,1.0,Forest and Woodland,49.0-122.75,False
1890,48.9833,-97.2667,251.0,251_North-Central Interior Maple-Basswood Forest,1.0,Forest and Woodland,48.983-97.267,False
1462,48.9167,-122.3333,528.0,528_North Pacific Maritime Mesic-Wet Douglas-f...,1.0,Forest and Woodland,48.917-122.333,False
2882,48.8833,-100.0833,260.0,"260_Eastern Great Plains Wet Meadow, Prairie, ...",6.0,Herbaceous Wetland,48.883-100.083,False
3897,48.876853,-115.051908,168.0,168_Northern Rocky Mountain Mesic Montane Mixe...,1.0,Forest and Woodland,48.877-115.052,False
3730,48.869758,-95.768551,261.0,261_Northern Tallgrass Prairie,4.0,Herbaceous,48.87-95.769,False
1224,48.8667,-101.5167,261.0,261_Northern Tallgrass Prairie,4.0,Herbaceous,48.867-101.517,False
3797,48.833238,-97.883471,259.0,259_Eastern Great Plains Tallgrass Aspen Parkland,1.0,Forest and Woodland,48.833-97.883,False


In [28]:
# Drop the duplicates 
eco_data.drop_duplicates(subset= ['temp_key_str'], inplace = True)
print(eco_data.shape)

eco_data['temp_key_dup'] = eco_data.duplicated(subset=['temp_key_str'], keep=False)
print("Are there any duplicate keys? : " + str(any(eco_data['temp_key_dup'])))

(4349, 8)
Are there any duplicate keys? : False


In [29]:
eco_data.sort_values(by=['temp_key_str']).head()


Unnamed: 0,lat,lon,Ecosys,Usgsid_sys,Nlcd_code,Nlcd,temp_key_str,temp_key_dup
1371,24.580846,-81.704636,423.0,423_South Florida Pine Rockland,1.0,Forest and Woodland,24.581-81.705,False
2032,24.6279,-82.8722,601.0,"601_Unknown - pixel count > 20,000",0.0,,24.628-82.872,False
1878,24.6667,-81.35,423.0,423_South Florida Pine Rockland,1.0,Forest and Woodland,24.667-81.35,False
2164,24.7333,-81.0,423.0,423_South Florida Pine Rockland,1.0,Forest and Woodland,24.733-81.0,False
2320,24.8667,-80.7667,600.0,600_Water,0.0,,24.867-80.767,False


## Now Load and Merge in the Station Eco Data
We won't need a temporary key for this file because the station IDs are unique.

In [30]:
station_eco_data = pd.read_csv(PATH_TO_NOAA_ECO_DATA)

In [31]:
station_eco_data.head()

Unnamed: 0,X,id,latitude,longitude,RASTERVALU,Red,Green,Blue,Opacity,Ecosys,Usgsid_sys,Nlcd_code,Nlcd
0,0,USC00500252,51.3833,179.2833,,,,,,,,,
1,2,USW00014607,46.8706,-68.0172,324.0,0.504556,0.623333,0.369333,1.0,324.0,324_Laurentian-Acadian Northern Hardwoods Forest,1.0,Forest and Woodland
2,9,USC00176937,46.6539,-68.0089,325.0,0.469988,0.594037,0.333426,1.0,325.0,325_Laurentian-Acadian Pine-Hemlock-Hardwood F...,1.0,Forest and Woodland
3,49,US1MEAR0015,46.6796,-68.0127,324.0,0.504556,0.623333,0.369333,1.0,324.0,324_Laurentian-Acadian Northern Hardwoods Forest,1.0,Forest and Woodland
4,56,USC00171833,45.6611,-67.8614,324.0,0.504556,0.623333,0.369333,1.0,324.0,324_Laurentian-Acadian Northern Hardwoods Forest,1.0,Forest and Woodland


In [32]:
station_eco_data.shape

(11652, 13)

## Merge the FIPS census data, the CBC circle ecosystem data, and the NOAA station ecosystem data into the station-matched data

In [33]:
# Load in the file of noaa matched cbc circles
full_working_df = pd.read_csv(PATH_TO_WORKING_DATA, compression = "gzip", sep = "\t")



In [34]:
full_working_df.head()

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui,geohash_circle,circle_id,id,latitude,longitude,elevation,state,name,gsn_flag,hcn_crn_flag,wmoid,geohash_station,temp_min_value,temp_max_value,precipitation_value,temp_avg,snow,snwd,am_rain,pm_rain,am_snow,pm_snow,circle_elev
0,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00516552,19.5486,-155.11,466.3,HI,MTN VIEW 91,,,,8e3x,144.0,244.0,18.0,,0.0,0.0,2,2,3,3,1551.44
1,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00519025,19.6581,-155.1325,320.0,HI,WAIAKEA SCD 88.2,,,,8e3x,,,86.0,,0.0,0.0,2,2,3,3,1551.44
2,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00511487,19.6833,-155.1667,487.7,HI,HILO COUNTRY CLUB 86,,,,8e3x,,,3.0,,0.0,0.0,2,2,3,3,1551.44
3,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00515021,19.5833,-155.3333,1748.3,HI,KULANI SCHOOL SITE 78,,,,8e3x,,,,,0.0,0.0,2,2,3,3,1551.44
4,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00515011,19.5494,-155.3011,1575.8,HI,KULANI CAMP 79,,,,8e3x,83.0,167.0,10.0,,0.0,0.0,2,2,3,3,1551.44


In [35]:
full_working_df.shape

(110051, 67)

In [36]:
# Round to 3 decimals so the key matches the one built for the ecosystem data
full_working_df['temp_key_str'] = round(full_working_df['lat'],3).astype(str) + round(full_working_df['lon'],3).astype(str)



In [37]:
full_working_df['temp_key_str'].head()

0    19.517-155.3
1    19.517-155.3
2    19.517-155.3
3    19.517-155.3
4    19.517-155.3
Name: temp_key_str, dtype: object

In [38]:
full_working_df['temp_key_str'].nunique()

2965

In [39]:
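# Merge in the FIPS data with the full station data.
# census_prep_df stores unrounded coordinates, so rebuild its key with the same 3-decimal
# rounding used for full_working_df above, keeping one row per key so the left merge cannot add rows.
census_keys = census_prep_df[["lat", "lon", "block_fips", "county_fips"]].copy()
census_keys['temp_key_str'] = round(census_keys['lat'],3).astype(str) + round(census_keys['lon'],3).astype(str)
census_keys = census_keys.drop_duplicates(subset=['temp_key_str'])
full_working_df = pd.merge(full_working_df, census_keys[["temp_key_str", "block_fips", "county_fips"]], how="left", on="temp_key_str")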




In [40]:
full_working_df.shape

(110051, 70)
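The row count is unchanged at 110051 after the merge, which is what a many-to-one left join should give, but the count alone does not show how many rows actually found a FIPS match. `pd.merge` can check both: `validate="many_to_one"` raises a `MergeError` if the right-hand keys are not unique, and `indicator=True` adds a `_merge` column recording whether each row matched. A minimal sketch on toy frames:

```python
import pandas as pd

left = pd.DataFrame({"temp_key_str": ["19.517-155.3", "36.617-121.917", "36.617-121.917"]})
right = pd.DataFrame({"temp_key_str": ["36.617-121.917"], "county_fips": ["06053"]})

merged = pd.merge(left, right, how="left", on="temp_key_str",
                  validate="many_to_one", indicator=True)
print(merged["_merge"].value_counts())
# Share of rows whose key was found in the lookup table
print((merged["_merge"] == "both").mean())
```

The same two arguments could be added to each merge in this section; the `_merge` column would then need to be dropped (or renamed) before the next merge that uses `indicator=True`.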

In [41]:
# Merge in the CBC Circle eco data 
full_working_df = pd.merge(full_working_df, eco_data[["temp_key_str","Ecosys", "Usgsid_sys", "Nlcd_code", "Nlcd"]], how="left", left_on= "temp_key_str", right_on = "temp_key_str")


In [42]:
full_working_df.shape

(110051, 74)

In [43]:
print(full_working_df['Ecosys'].value_counts())
print("The number of NAs:" + str(full_working_df['Ecosys'].isna().sum()))
print("The number of rows with Ecosys data: " + str(full_working_df['Ecosys'].notna().sum()))



600.0    7612
254.0    7129
300.0    4585
301.0    2734
324.0    2708
416.0    2362
264.0    2204
274.0    2038
439.0    1667
250.0    1652
325.0    1634
112.0    1585
320.0    1575
487.0    1564
251.0    1477
265.0    1399
191.0    1267
272.0    1263
508.0    1259
287.0    1126
68.0     1092
188.0    1044
261.0    1042
104.0    1037
478.0    1013
428.0     981
79.0      974
409.0     920
130.0     918
276.0     896
408.0     882
61.0      873
154.0     872
288.0     868
148.0     862
221.0     822
412.0     803
480.0     792
10.0      762
262.0     757
76.0      746
601.0     673
147.0     639
499.0     630
67.0      629
69.0      621
302.0     594
438.0     588
180.0     580
49.0      578
224.0     548
292.0     543
589.0     532
128.0     530
574.0     513
469.0     511
165.0     507
582.0     507
538.0     507
239.0     483
278.0     483
172.0     480
207.0     473
11.0      466
227.0     464
71.0      459
185.0     444
527.0     437
560.0     432
72.0      432
417.0     430
322.0 

In [44]:
full_working_df.head(50)

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui,geohash_circle,circle_id,id,latitude,longitude,elevation,state,name,gsn_flag,hcn_crn_flag,wmoid,geohash_station,temp_min_value,temp_max_value,precipitation_value,temp_avg,snow,snwd,am_rain,pm_rain,am_snow,pm_snow,circle_elev,temp_key_str,block_fips,county_fips,Ecosys,Usgsid_sys,Nlcd_code,Nlcd
0,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00516552,19.5486,-155.11,466.3,HI,MTN VIEW 91,,,,8e3x,144.0,244.0,18.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
1,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00519025,19.6581,-155.1325,320.0,HI,WAIAKEA SCD 88.2,,,,8e3x,,,86.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
2,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00511487,19.6833,-155.1667,487.7,HI,HILO COUNTRY CLUB 86,,,,8e3x,,,3.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
3,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00515021,19.5833,-155.3333,1748.3,HI,KULANI SCHOOL SITE 78,,,,8e3x,,,,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
4,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00515011,19.5494,-155.3011,1575.8,HI,KULANI CAMP 79,,,,8e3x,83.0,167.0,10.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
5,Hawai'i: Volcano,US-HI,19.517,-155.3,1974,1973-12-30,18.0,0.0,5.0,5.0,32.0,0.0,0.0,59.0,0.0,Miles,55.0,70.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,3.0,2.0,59.0,94.946894,0.0,0.0,0.0,0.0,0.0,0.0,55.0,70.0,12.777778,21.111111,0.0,24.139041,0.0,15.0,19.516651-155.299965_1974,8e3x,8e3x40f,USC00515021,19.5833,-155.3333,1748.3,HI,KULANI SCHOOL SITE 78,,,,8e3x,,,,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
6,Hawai'i: Volcano,US-HI,19.517,-155.3,1974,1973-12-30,18.0,0.0,5.0,5.0,32.0,0.0,0.0,59.0,0.0,Miles,55.0,70.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,3.0,2.0,59.0,94.946894,0.0,0.0,0.0,0.0,0.0,0.0,55.0,70.0,12.777778,21.111111,0.0,24.139041,0.0,15.0,19.516651-155.299965_1974,8e3x,8e3x40f,USC00511487,19.6833,-155.1667,487.7,HI,HILO COUNTRY CLUB 86,,,,8e3x,,,450.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
7,Hawai'i: Volcano,US-HI,19.517,-155.3,1974,1973-12-30,18.0,0.0,5.0,5.0,32.0,0.0,0.0,59.0,0.0,Miles,55.0,70.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,3.0,2.0,59.0,94.946894,0.0,0.0,0.0,0.0,0.0,0.0,55.0,70.0,12.777778,21.111111,0.0,24.139041,0.0,15.0,19.516651-155.299965_1974,8e3x,8e3x40f,USC00515011,19.5494,-155.3011,1575.8,HI,KULANI CAMP 79,,,,8e3x,106.0,150.0,64.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
8,Hawai'i: Volcano,US-HI,19.517,-155.3,1974,1973-12-30,18.0,0.0,5.0,5.0,32.0,0.0,0.0,59.0,0.0,Miles,55.0,70.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,3.0,2.0,59.0,94.946894,0.0,0.0,0.0,0.0,0.0,0.0,55.0,70.0,12.777778,21.111111,0.0,24.139041,0.0,15.0,19.516651-155.299965_1974,8e3x,8e3x40f,USC00516552,19.5486,-155.11,466.3,HI,MTN VIEW 91,,,,8e3x,156.0,289.0,526.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,
9,Hawai'i: Volcano,US-HI,19.517,-155.3,1974,1973-12-30,18.0,0.0,5.0,5.0,32.0,0.0,0.0,59.0,0.0,Miles,55.0,70.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,3.0,2.0,59.0,94.946894,0.0,0.0,0.0,0.0,0.0,0.0,55.0,70.0,12.777778,21.111111,0.0,24.139041,0.0,15.0,19.516651-155.299965_1974,8e3x,8e3x40f,USC00519025,19.6581,-155.1325,320.0,HI,WAIAKEA SCD 88.2,,,,8e3x,,,544.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,


In [45]:
# A good number of rows did not get ecosystem data; let's see if their lat lons are in the na_eco_data set

# Get the list of temp keys with no Ecosys data from the merge
fw_nas_keys = full_working_df.loc[full_working_df["Ecosys"].isna(), "temp_key_str"]
print("The number of unmatched keys before dropping duplicates: " + str(fw_nas_keys.shape[0]))
fw_nas_keys = fw_nas_keys.drop_duplicates()
print("The number of unmatched keys after dropping duplicates: " + str(fw_nas_keys.shape[0]))
print("These are the keys that could not be merged")

print("The number of eco data points with no ecosystem data: " + str(na_eco_data.shape[0]))

The number of unmatched keys before dropping duplicates: 6987
The number of unmatched keys after dropping duplicates: 160
These are the keys that could not be merged
The number of eco data points with no ecosystem data: 181


In [46]:
# Now let's see how many of these are in the NA eco dataset

# Get the set intersection of the unmatched keys in the full working set and
# the ecosystem keys that have no data to offer
temp_keys_intersection = pd.Series(list(set(fw_nas_keys) & set(na_eco_data['temp_key_str'])))

print("Out of %s keys in the full data set that could not be matched, %s also appear among the temporary keys in the eco data set that had no ecosystem data available (n = %s)" % (str(fw_nas_keys.shape[0]), str(len(temp_keys_intersection)), str(na_eco_data.shape[0])))


Out of 160 keys in the full data set that could not be matched, 115 also appear among the temporary keys in the eco data set that had no ecosystem data available (n = 181)
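The intersection accounts for 115 of the 160 unmatched keys. The other 45 keys appear in neither the usable ecosystem data nor the no-data subset, so they seem to be absent from Nathan's file under the rounded key altogether. `pd.Index` set operations can list both groups directly. A minimal sketch on toy keys; the variable names `unmatched` and `no_data` are hypothetical stand-ins for `fw_nas_keys` and `na_eco_data['temp_key_str']`, and the values are made up:

```python
import pandas as pd

unmatched = pd.Index(["40.867-73.433", "43.05-74.35", "19.517-155.3"])
no_data = pd.Index(["40.867-73.433", "43.05-74.35", "40.6-74.1"])

explained = unmatched.intersection(no_data)  # looked up, but no usable ecosystem value
missing = unmatched.difference(no_data)      # not present under this key at all
print(sorted(explained), list(missing))
```

Listing the `missing` keys next to their raw `lat`/`lon` values would show whether they are rounding mismatches or locations that were never sent for processing.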


In [47]:
# Merge in the NOAA Station Eco data 
full_working_df = pd.merge(full_working_df, station_eco_data[["id","Ecosys", "Usgsid_sys", "Nlcd_code", "Nlcd"]], how="left", left_on= "id", right_on = "id", suffixes = ("_circle", "_station"))


In [48]:
full_working_df.shape

(110051, 78)
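The shape grows by four columns because only the overlapping column names (`Ecosys`, `Usgsid_sys`, `Nlcd_code`, `Nlcd`) receive suffixes: the copies already in `full_working_df` become `*_circle`, the incoming station copies become `*_station`, and the join key `id` stays a single column. A small sketch on toy frames (made-up ids and codes) showing that behaviour:

```python
import pandas as pd

circles = pd.DataFrame({"id": ["S1", "S2"], "Ecosys": [600.0, 254.0]})
stations = pd.DataFrame({"id": ["S1"], "Ecosys": [324.0]})

out = pd.merge(circles, stations, how="left", on="id", suffixes=("_circle", "_station"))
print(out.columns.tolist())
print(out)
```

Station `S2` has no match in the toy station table, so its `Ecosys_station` is NaN, mirroring stations that did not receive a raster value.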

In [49]:
full_working_df['ui'].nunique()

53516

In [50]:
full_working_df.head()

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui,geohash_circle,circle_id,id,latitude,longitude,elevation,state,name,gsn_flag,hcn_crn_flag,wmoid,geohash_station,temp_min_value,temp_max_value,precipitation_value,temp_avg,snow,snwd,am_rain,pm_rain,am_snow,pm_snow,circle_elev,temp_key_str,block_fips,county_fips,Ecosys_circle,Usgsid_sys_circle,Nlcd_code_circle,Nlcd_circle,Ecosys_station,Usgsid_sys_station,Nlcd_code_station,Nlcd_station
0,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00516552,19.5486,-155.11,466.3,HI,MTN VIEW 91,,,,8e3x,144.0,244.0,18.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,,,,,
1,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00519025,19.6581,-155.1325,320.0,HI,WAIAKEA SCD 88.2,,,,8e3x,,,86.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,,,,,
2,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00511487,19.6833,-155.1667,487.7,HI,HILO COUNTRY CLUB 86,,,,8e3x,,,3.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,,,,,
3,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00515021,19.5833,-155.3333,1748.3,HI,KULANI SCHOOL SITE 78,,,,8e3x,,,,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,,,,,
4,Hawai'i: Volcano,US-HI,19.517,-155.3,1973,1972-12-30,14.0,0.0,5.0,5.0,57.0,0.0,0.0,169.0,0.0,Miles,37.0,75.0,2.0,0.0,5.0,1.0,0.0,0.0,2.0,1.0,2.0,169.0,271.966527,0.0,0.0,0.0,0.0,0.0,0.0,37.0,75.0,2.777778,23.888889,0.0,8.046347,0.0,5.0,19.516651-155.299965_1973,8e3x,8e3x40f,USC00515011,19.5494,-155.3011,1575.8,HI,KULANI CAMP 79,,,,8e3x,83.0,167.0,10.0,,0.0,0.0,2,2,3,3,1551.44,19.517-155.3,,,,,,,,,,


## Now Let's Look at Cleaning Up the Ecosystem Data

In [51]:
# check that the merge went through
print("The number of Nans is: %s" % full_working_df['Ecosys_circle'].isna().sum())
full_working_df['Ecosys_circle'].value_counts(dropna = False)

The number of Nans is: 6987


600.0    7612
254.0    7129
NaN      6987
300.0    4585
301.0    2734
324.0    2708
416.0    2362
264.0    2204
274.0    2038
439.0    1667
250.0    1652
325.0    1634
112.0    1585
320.0    1575
487.0    1564
251.0    1477
265.0    1399
191.0    1267
272.0    1263
508.0    1259
287.0    1126
68.0     1092
188.0    1044
261.0    1042
104.0    1037
478.0    1013
428.0     981
79.0      974
409.0     920
130.0     918
276.0     896
408.0     882
61.0      873
154.0     872
288.0     868
148.0     862
221.0     822
412.0     803
480.0     792
10.0      762
262.0     757
76.0      746
601.0     673
147.0     639
499.0     630
67.0      629
69.0      621
302.0     594
438.0     588
180.0     580
49.0      578
224.0     548
292.0     543
589.0     532
128.0     530
574.0     513
469.0     511
538.0     507
165.0     507
582.0     507
278.0     483
239.0     483
172.0     480
207.0     473
11.0      466
227.0     464
71.0      459
185.0     444
527.0     437
560.0     432
72.0      432
417.0 
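The checks in this section only cover the circle-side columns. A per-column summary can check missingness on both the circle and station sides at once. The sketch below uses a small made-up frame with the same column names as the merged data.

```python
import numpy as np
import pandas as pd

eco_cols = ["Ecosys_circle", "Nlcd_circle", "Ecosys_station", "Nlcd_station"]

# Toy stand-in for full_working_df
df = pd.DataFrame({
    "Ecosys_circle": [600.0, np.nan, 254.0],
    "Nlcd_circle": [None, None, "Forest and Woodland"],
    "Ecosys_station": [np.nan, np.nan, 300.0],
    "Nlcd_station": [None, None, "Herbaceous"],
})

# Count and percentage of missing values per ecosystem column
summary = pd.DataFrame({
    "n_missing": df[eco_cols].isna().sum(),
    "pct_missing": df[eco_cols].isna().mean().mul(100).round(1),
})
print(summary)
```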

In [52]:
# check that the merge went through
full_working_df['Usgsid_sys_circle'].value_counts(dropna = False)

600_Water                                                                          7612
254_North-Central Interior Beech-Maple Forest                                      7129
NaN                                                                                6987
300_Appalachian (Hemlock)-Northern Hardwood Forest                                 4585
301_Northeastern Interior Dry-Mesic Oak Forest                                     2734
324_Laurentian-Acadian Northern Hardwoods Forest                                   2708
416_Northern Atlantic Coastal Plain Pitch Pine Lowland                             2362
264_Central Tallgrass Prairie                                                      2204
274_Western Great Plains Shortgrass Prairie                                        2038
439_Southern Piedmont Dry Oak-(Pine) Forest                                        1667
250_North-Central Interior Oak Savanna                                             1652
325_Laurentian-Acadian Pine-Heml

In [53]:
# check that the merge went through
full_working_df['Nlcd_code_circle'].value_counts(dropna = False)

1.0    50374
4.0    14288
0.0     8303
2.0     7737
3.0     7687
NaN     6987
7.0     5402
5.0     4446
8.0     3634
6.0     1193
Name: Nlcd_code_circle, dtype: int64

In [54]:
# check that the merge went through
full_working_df['Nlcd_circle'].value_counts(dropna = False)

Forest and Woodland         50374
NaN                         15290
Herbaceous                  14288
Shrubland                    7737
Steppe/Savanna               7687
Mixed Upland and Wetland     5402
Woody Wetland                4446
Barren                       3634
Herbaceous Wetland           1193
Name: Nlcd_circle, dtype: int64
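`Nlcd_circle` has 15,290 NaNs while `Nlcd_code_circle` has 6,987. The difference matches the 8,303 rows with code `0.0` (6,987 + 8,303 = 15,290), which suggests that code 0 has no land-cover label. A cross-tabulation of code against label is one way to confirm a mapping like this. The sketch below uses toy values, with `-1` and `"<NaN>"` as arbitrary placeholders so that missing values appear as their own row and column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the two NLCD columns
df = pd.DataFrame({
    "Nlcd_code_circle": [1.0, 1.0, 4.0, 0.0, np.nan],
    "Nlcd_circle": ["Forest and Woodland", "Forest and Woodland", "Herbaceous", np.nan, np.nan],
})

# Replace NaN with placeholders so crosstab does not drop the missing cases
table = pd.crosstab(
    df["Nlcd_code_circle"].fillna(-1),
    df["Nlcd_circle"].fillna("<NaN>"),
)
print(table)
```

Each code should map to exactly one label. A code with counts in more than one column would indicate an inconsistent mapping.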

In [55]:
# Ecosystem code 601 ("601_Unknown - pixel count > 20,000") is effectively missing, so convert it to NaN.
# "0_n/a" is also converted to NaN, but only in the Usgsid_sys_circle label column.
print("NaNs in Ecosys_circle before cleaning: %s" % full_working_df['Ecosys_circle'].isna().sum())
print("NaNs in Usgsid_sys_circle before cleaning: %s" % full_working_df['Usgsid_sys_circle'].isna().sum())

# Assign the result back rather than calling replace(..., inplace=True) on a column selection,
# which may not update the parent DataFrame in newer pandas versions
full_working_df['Ecosys_circle'] = full_working_df['Ecosys_circle'].replace(601.0, np.nan)
full_working_df['Usgsid_sys_circle'] = full_working_df['Usgsid_sys_circle'].replace(['601_Unknown - pixel count > 20,000', '0_n/a'], np.nan)

print("NaNs in Ecosys_circle after cleaning: %s" % full_working_df['Ecosys_circle'].isna().sum())
print("NaNs in Usgsid_sys_circle after cleaning: %s" % full_working_df['Usgsid_sys_circle'].isna().sum())


NaNs in Ecosys_circle before cleaning: 6987
NaNs in Usgsid_sys_circle before cleaning: 6987
NaNs in Ecosys_circle after cleaning: 7660
NaNs in Usgsid_sys_circle after cleaning: 7678
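The cleaning above nulls `601` in `Ecosys_circle` but nulls both `601_Unknown - pixel count > 20,000` and `0_n/a` in `Usgsid_sys_circle`, so the two columns end up with different NaN counts (7,660 vs 7,678). The following hedged sketch keeps the numeric and label columns in sync by listing the sentinels for each column in one place. The helper name `null_sentinels` is hypothetical. Treating code `0.0` as missing in `Ecosys_circle` is an assumption, not what the notebook does.

```python
import numpy as np
import pandas as pd

# Sentinels per column. Including 0.0 for Ecosys_circle is an ASSUMPTION made so
# that the numeric column matches the label column after cleaning.
SENTINELS = {
    "Ecosys_circle": [601.0, 0.0],
    "Usgsid_sys_circle": ["601_Unknown - pixel count > 20,000", "0_n/a"],
}

def null_sentinels(df, sentinels):
    """Return a copy of df with the listed sentinel values replaced by NaN (hypothetical helper)."""
    out = df.copy()
    for col, values in sentinels.items():
        out[col] = out[col].replace(values, np.nan)
    return out

# Toy usage
demo = pd.DataFrame({
    "Ecosys_circle": [601.0, 254.0, 0.0, np.nan],
    "Usgsid_sys_circle": ["601_Unknown - pixel count > 20,000",
                          "254_North-Central Interior Beech-Maple Forest", "0_n/a", np.nan],
})
cleaned = null_sentinels(demo, SENTINELS)
print(cleaned.isna().sum())
```

Returning a copy instead of mutating in place leaves the uncleaned frame available for comparison.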


## Prepare the Data for Saving

In [56]:
# Drop the temporary key column
full_working_df = full_working_df.drop(columns="temp_key_str")

In [57]:
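# Note: the output is gzip-compressed despite the .txt extension, so read it back with compression="gzip"
full_working_df.to_csv('../data/Cloud_data/1.3-rec-connecting-fips-ecosystem-data.txt', compression="gzip", sep="\t", index=False)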
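The saved file is gzip-compressed but ends in `.txt`, so pandas' default `compression="infer"` will not detect the compression from the extension when the file is read back. The following round-trip check uses a toy frame written to a temporary directory instead of the real `../data/Cloud_data/` path. The file name is hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"ui": ["a_1973", "b_1973"], "Ecosys_circle": [254.0, None]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    df.to_csv(path, compression="gzip", sep="\t", index=False)
    # The extension does not reveal the compression, so name it explicitly on read
    back = pd.read_csv(path, compression="gzip", sep="\t")
    with open(path, "rb") as fh:
        magic = fh.read(2)  # gzip files start with bytes 0x1f 0x8b

print(back.shape, magic)
```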

In [58]:
n_missing = full_working_df['Ecosys_circle'].isna().sum()
print("The number of NAs: " + str(n_missing))
# Rows, not distinct circles: each circle-year appears once per matched station
print("The number of rows with Ecosys_circle: " + str(full_working_df.shape[0] - n_missing))



The number of NAs: 7660
The number of rows with Ecosys_circle: 102391
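The 102,391 figure counts rows. Because each circle-year is repeated once per matched station, it is not the number of distinct circles. The following sketch counts distinct units instead. It assumes that `ui` identifies a circle-year (as values such as `19.516651-155.299965_1973` suggest) and that `circle_id` identifies a circle location. The frame is made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in: c1_1973 is matched to two stations, so it appears twice
df = pd.DataFrame({
    "ui": ["c1_1973", "c1_1973", "c2_1973", "c3_1974"],
    "circle_id": ["c1", "c1", "c2", "c3"],
    "Ecosys_circle": [254.0, 254.0, np.nan, 600.0],
})

has_eco = df["Ecosys_circle"].notna()
n_rows = int(has_eco.sum())                          # circle-station rows
n_ui = df.loc[has_eco, "ui"].nunique()               # distinct circle-years
n_circles = df.loc[has_eco, "circle_id"].nunique()   # distinct circles (assumed key)

print("rows:", n_rows, "circle-years:", n_ui, "circles:", n_circles)
```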
