## Assigning circles to weather stations
### Purpose
Using a custom table created by uploading the CSV to BigQuery (this table is called `cleaned_bird_counts_gstorage`), a join is done with the view that contains the flattened data.

### Author: 
Francisco Vannini
### Date: 
2020-04-02
### Update Date: 
2020-04-02

### Inputs
<ol>
<li> Google credential auth JSON </li>
<li> noaa_from_1900_to_present view in BQ</li>
<li> flatten_noaa_from_1900_to_present in BQ</li>
<li> cleaned_bird_count data</li>
</ol>

### Output Files
This notebook produces <strong>1.1-circles-to-many-noaa-stations-usa-weather-data-[date_this_process_was_run].csv.gzip</strong>. This file contains non-empty weather measurements for the NOAA stations that are in close proximity (matched by geohash) to our CDC bird count circles.

## Steps or Procedures in the notebook
This notebook creates a query that interlaces the CDC bird count data, matches each count circle with NOAA stations in close proximity, and then extracts the NOAA station weather measurements pertinent to the count dates. After the data is extracted, rows in which the minimum temperature, maximum temperature, average temperature, and snow values are all NULL are pruned, and only USA weather measurements are included.

To prepare for the query, it loads the cleaned data and uploads it to BigQuery so the query has access to it.

## Where the Data will Be Saved 
This notebook writes its output to the same directory in which the notebook is located.
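As a minimal sketch of how the output file named above could be written, the example below builds the timestamped file name and saves a small dataframe with gzip compression. `example_df` and the temporary output directory are stand-ins introduced for illustration; the notebook's own save step is not shown in this section, so this is an assumption about its shape rather than a copy of it.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

# Timestamp in the same format the notebook uses for time_now.
time_now = datetime.today().strftime('%Y%m%d%H%M%S')

# Hypothetical stand-in for the final dataframe produced by the notebook.
example_df = pd.DataFrame({
    "ui": ["44.8-90.2333_2018", "22.0333-159.6667_1975"],
    "am_rain": ["3", "3,2,1"],
})

# Assumption: a temporary directory is used here so the sketch has no side
# effects; the notebook itself saves next to the notebook file.
output_dir = tempfile.mkdtemp()
output_path = os.path.join(
    output_dir,
    f"1.1-circles-to-many-noaa-stations-usa-weather-data-{time_now}.csv.gzip",
)

# The ".gzip" suffix is not one pandas uses to infer compression,
# so the compression is passed explicitly on both write and read.
example_df.to_csv(output_path, index=False, compression="gzip")

# Read the file back, keeping every column as text so that
# comma-separated indicators such as "3,2,1" survive unchanged.
round_trip = pd.read_csv(output_path, compression="gzip", dtype=str)
```

Reading with `dtype=str` is a deliberate choice in this sketch: it keeps indicator columns from being reinterpreted as numbers when the file is loaded again.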

## NOTES on Running This Notebook
If you are getting errors from the bigquery module that seem weird, try completely stopping your notebook kernel and restarting it. There are some weird errors that can happen when running BigQuery from a notebook.

In [34]:
# Imports
import os
from datetime import datetime
# Version .24.0
from google.cloud import bigquery
import pandas as pd

pd.set_option('display.max_columns', 500)

In [35]:
# Set Up the Environment

# The path to your json credentials file. Replace with your corresponding file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "your_path_to_google_auth_keys.json"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "../apicred/BirdProject-2020-074473a55b86.json"

# Used to classify the name 
time_now = datetime.today().strftime('%Y%m%d%H%M%S')

client = bigquery.Client()
project = 'birdproject-2020'
source_dataset_id = 'audubon_cdc'
# source_table_id = 'us_states'
shared_dataset_ref = client.dataset(source_dataset_id)

In [36]:
client

<google.cloud.bigquery.client.Client at 0x15068e320>

## Load in the Most Recent Data File 
This is not required, but it is good practice to confirm the file is there and can be read correctly.
The next section will load the data as part of the upload to BigQuery.

In [37]:
# ALL File Paths should be declared at the TOP of the notebook
PATH_TO_CLEAN_CBC_DATA = "../data/Cloud_Data/1.0-rec-initial-data-cleaning.txt"

In [38]:
clean_data = pd.read_csv(PATH_TO_CLEAN_CBC_DATA, encoding = "ISO-8859-1", sep="\t")



In [39]:
clean_data.tail(50)

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,am_rain,pm_rain,am_snow,pm_snow,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui
90361,Spencer,US-WI,44.8,-90.2333,2018,12/17/17,10.0,2.0,6.0,6.0,50.25,5.0,3.0,536.75,25.5,Miles,23.0,28.0,2.0,5.0,5.0,1.0,4.0,10.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,536.75,863.775346,25.5,41.036369,4.0,10.16,25.4,10.0,23.0,28.0,-5.0,-2.222222,8.046347,8.046347,5.0,5.0,44.8-90.2333_2018
90362,Stevens Point,US-WI,44.524307,-89.568834,2018,12/16/17,30.0,,8.0,8.0,63.25,,2.5,427.75,,Miles,16.0,23.0,2.0,12.0,20.0,1.0,3.0,3.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,427.75,688.364982,,,3.0,7.62,7.62,3.0,16.0,23.0,-8.888889,-5.0,19.311233,32.185388,12.0,20.0,44.524307-89.568834_2018
90363,Summit Lake,US-WI,45.373087,-89.113219,2018,12/27/17,7.0,,4.0,4.0,15.5,,,218.5,,Miles,-27.0,-2.0,2.0,0.0,10.0,1.0,5.0,10.0,2.0,1.0,1.0,3.0,3.0,3.0,3.0,218.5,351.625362,,,5.0,12.7,25.4,10.0,-27.0,-2.0,-32.777778,-18.888889,0.0,16.092694,0.0,10.0,45.373087-89.113219_2018
90364,Superior,US-WI,46.658055,-92.066182,2018,12/30/17,9.0,1.0,5.0,5.0,23.0,9.0,,100.0,,Miles,-17.0,-4.0,2.0,8.0,9.0,1.0,6.0,9.0,2.0,1.0,6.0,3.0,3.0,3.0,3.0,100.0,160.926939,,,6.0,15.24,22.86,9.0,-17.0,-4.0,-27.222222,-20.0,12.874155,14.483425,8.0,9.0,46.658055-92.066182_2018
90365,Washington Island,US-WI,45.383338,-86.883312,2018,12/16/17,11.0,31.0,3.0,3.0,18.0,49.0,1.0,138.0,3.0,Miles,11.4,26.6,2.0,3.4,24.9,1.0,4.0,6.0,2.0,5.0,2.0,3.0,3.0,2.0,2.0,138.0,222.079176,3.0,4.827808,4.0,10.16,15.24,6.0,11.4,26.6,-11.444444,-3.0,5.471516,40.070808,3.4,24.9,45.383338-86.883312_2018
90366,Waterloo,US-WI,43.081715,-89.011403,2018,12/17/17,22.0,11.0,11.0,11.0,74.25,15.0,8.0,573.5,17.5,Miles,32.0,35.0,2.0,5.0,5.0,1.0,0.0,0.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,573.5,922.915996,17.5,28.162214,0.0,0.0,0.0,0.0,32.0,35.0,0.0,1.666667,8.046347,8.046347,5.0,5.0,43.081715-89.011403_2018
90367,Waukesha,US-WI,42.973992,-88.355334,2018,12/16/17,32.0,5.0,8.0,8.0,67.0,4.0,4.0,223.5,25.0,Miles,22.0,38.0,2.0,0.0,15.0,1.0,0.0,2.0,2.0,2.0,5.0,3.0,3.0,3.0,3.0,223.5,359.671709,25.0,40.231735,0.0,0.0,5.08,2.0,22.0,38.0,-5.555556,3.333333,0.0,24.139041,0.0,15.0,42.973991999999996-88.355334_2018
90368,Wausau,US-WI,44.951311,-89.622626,2018,12/16/17,17.0,5.0,1.0,9.0,50.25,5.0,0.5,545.25,,Miles,17.0,25.0,2.0,5.0,15.0,1.0,1.0,7.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,545.25,877.454136,,,1.0,2.54,17.78,7.0,17.0,25.0,-8.333333,-3.888889,8.046347,24.139041,5.0,15.0,44.951311-89.622626_2018
90369,Wautoma,US-WI,44.114356,-89.190996,2018,1/1/18,4.0,1.0,4.0,4.0,39.5,4.0,0.5,212.1,,Miles,-14.0,3.0,2.0,0.0,5.0,1.0,1.0,2.0,2.0,1.0,1.0,3.0,3.0,3.0,3.0,212.1,341.326038,,,1.0,2.54,5.08,2.0,-14.0,3.0,-25.555556,-16.111111,0.0,8.046347,0.0,5.0,44.114356-89.190996_2018
90370,Willard,US-WI,44.712745,-90.699654,2018,12/31/17,9.0,3.0,5.0,5.0,36.75,8.5,0.75,393.0,6.0,Miles,-19.0,-9.0,2.0,10.0,15.0,1.0,4.0,5.0,2.0,1.0,1.0,3.0,3.0,3.0,3.0,393.0,632.442871,6.0,9.655616,4.0,10.16,12.7,5.0,-19.0,-9.0,-28.333333,-22.777778,16.092694,24.139041,10.0,15.0,44.712745-90.69965400000001_2018


## Push this data up to BigQuery

In [40]:
# Set up Data name 
table_id = 'rec_initial_data_cleaning'

table_ref = shared_dataset_ref.table(table_id)

table_full = project + "." + source_dataset_id + "." + table_id

In [41]:
# Delete the existing table if it exists so we can replace it with new data
client.delete_table(table_full, not_found_ok=True)  # Make an API request.
print("Deleted table '{}'.".format(table_full))

Deleted table 'birdproject-2020.audubon_cdc.rec_initial_data_cleaning'.


In [42]:
# Push our file up to BigQuery
filename = PATH_TO_CLEAN_CBC_DATA

# Build the Job Config
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
job_config.autodetect = True


with open(filename, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_ref, job_config=job_config)
job.result()  # Waits for table load to complete.
print("Loaded {} rows into {}:{}.".format(job.output_rows, source_dataset_id, table_id))

Loaded 90411 rows into audubon_cdc:rec_initial_data_cleaning.


## Build the Query and Submit it 
This is the query that interlaces the CDC bird count data, matches each count circle with NOAA stations in close proximity, and then extracts the NOAA station weather measurements pertinent to the count dates. After the data is extracted, rows in which the minimum temperature, maximum temperature, average temperature, and snow values are all NULL are pruned, and only USA weather measurements are included.
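The query buckets circles and stations with `ST_GEOHASH(..., 4)` and joins rows whose hashes are equal. To illustrate what that bucketing does, the sketch below is a pure-Python version of the standard geohash encoding. `geohash_encode` is a hypothetical helper written for this note; it is not part of the notebook or of BigQuery, and it is only meant to show why a circle and a nearby station end up with the same 4-character key.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Coordinates of the Kaua'i: Waimea circle and the HUKIPO 945 station,
# taken from the query output shown later in this notebook.
circle_hash = geohash_encode(22.0333, -159.6667, 4)
station_hash = geohash_encode(21.9828, -159.6831, 4)
same_cell = circle_hash == station_hash
```

A 4-character geohash covers a cell of roughly 39 km by 20 km at the equator. One consequence of joining on equal hashes is that a circle and a station that sit close together but on opposite sides of a cell boundary will not be matched.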

In [43]:
query = f"""
WITH circles_hash as (SELECT x.*, ST_GEOHASH(ST_GEOGPOINT(x.lon,x.lat), 4) as geohash_circle, ST_GEOHASH(ST_GEOGPOINT(x.lon,x.lat), 7) as circle_id

FROM `{project}.audubon_cdc.rec_initial_data_cleaning` x),

stations_hash as (SELECT y.*, ST_GEOHASH(ST_GEOGPOINT(y.longitude,y.latitude),4) as geohash_station FROM `bigquery-public-data`.ghcn_d.ghcnd_stations y),

circle_with_matched_stations as (SELECT * FROM circles_hash x INNER JOIN stations_hash y ON x.geohash_circle = y.geohash_station)

SELECT x.*, y.temp_min_value,y.temp_max_value,y.precipitation_value,y.temp_avg,y.snow,y.snwd

FROM circle_with_matched_stations x
LEFT JOIN `{project}.audubon_cdc.flatten_noaa_from_1900_to_present` y ON x.id = y.id AND x.count_date = y.date

ORDER BY circle_id DESC,count_date ASC """

# Submit the query to BigQuery; this returns a query job, which is converted to a dataframe in the next cell
df_circles_to_stations_weather_data = client.query(query)


In [44]:
df_circles_to_stations_weather_data = df_circles_to_stations_weather_data.to_dataframe()

In [45]:
df_circles_to_stations_weather_data.shape

(1045018, 66)

## Last records
Showing the last records of the data extracted by the query above

In [46]:
df_circles_to_stations_weather_data.tail(50)

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,am_rain,pm_rain,am_snow,pm_snow,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui,geohash_circle,circle_id,id,latitude,longitude,elevation,state,name,gsn_flag,hcn_crn_flag,wmoid,geohash_station,temp_min_value,temp_max_value,precipitation_value,temp_avg,snow,snwd
1044968,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1975,1974-12-27,4.0,0.0,2.0,2.0,16.0,0.0,0.0,79.0,0.0,Miles,65.0,80.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,2.0,1.0,2.0,2.0,3.0,3.0,79.0,127.132282,0.0,0.0,0.0,0.0,0.0,0.0,65.0,80.0,18.333333,26.666667,0.0,24.139041,0.0,15.0,22.0333-159.6667_1975,87ym,87ymqen,USC00512161,21.9828,-159.6831,243.8,HI,HUKIPO 945,,,,87ym,,,38.0,,0.0,0.0
1044969,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1975,1974-12-27,4.0,0.0,2.0,2.0,16.0,0.0,0.0,79.0,0.0,Miles,65.0,80.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,2.0,1.0,2.0,2.0,3.0,3.0,79.0,127.132282,0.0,0.0,0.0,0.0,0.0,0.0,65.0,80.0,18.333333,26.666667,0.0,24.139041,0.0,15.0,22.0333-159.6667_1975,87ym,87ymqen,USR0000HMAH,22.1306,-159.7153,545.0,HI,MAKAHA RIDGE HAWAII,,,,87ym,,,,,,
1044970,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1976,1975-12-20,7.0,0.0,2.0,2.0,13.0,0.0,0.0,86.0,0.0,Miles,55.0,80.0,2.0,9.0,14.0,1.0,0.0,0.0,2.0,1.0,1.0,4.0,4.0,4.0,4.0,86.0,138.397168,0.0,0.0,0.0,0.0,0.0,0.0,55.0,80.0,12.777778,26.666667,14.483425,22.529771,9.0,14.0,22.0333-159.6667_1976,87ym,87ymqen,USC00512161,21.9828,-159.6831,243.8,HI,HUKIPO 945,,,,87ym,,,0.0,,0.0,0.0
1044971,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1976,1975-12-20,7.0,0.0,2.0,2.0,13.0,0.0,0.0,86.0,0.0,Miles,55.0,80.0,2.0,9.0,14.0,1.0,0.0,0.0,2.0,1.0,1.0,4.0,4.0,4.0,4.0,86.0,138.397168,0.0,0.0,0.0,0.0,0.0,0.0,55.0,80.0,12.777778,26.666667,14.483425,22.529771,9.0,14.0,22.0333-159.6667_1976,87ym,87ymqen,USR0000HMAH,22.1306,-159.7153,545.0,HI,MAKAHA RIDGE HAWAII,,,,87ym,,,,,,
1044972,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1976,1975-12-20,7.0,0.0,2.0,2.0,13.0,0.0,0.0,86.0,0.0,Miles,55.0,80.0,2.0,9.0,14.0,1.0,0.0,0.0,2.0,1.0,1.0,4.0,4.0,4.0,4.0,86.0,138.397168,0.0,0.0,0.0,0.0,0.0,0.0,55.0,80.0,12.777778,26.666667,14.483425,22.529771,9.0,14.0,22.0333-159.6667_1976,87ym,87ymqen,USC00519130,22.1167,-159.6167,1051.9,HI,WAIAKOALI CAMP 1082,,,,87ym,,,,,,
1044973,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1976,1975-12-20,7.0,0.0,2.0,2.0,13.0,0.0,0.0,86.0,0.0,Miles,55.0,80.0,2.0,9.0,14.0,1.0,0.0,0.0,2.0,1.0,1.0,4.0,4.0,4.0,4.0,86.0,138.397168,0.0,0.0,0.0,0.0,0.0,0.0,55.0,80.0,12.777778,26.666667,14.483425,22.529771,9.0,14.0,22.0333-159.6667_1976,87ym,87ymqen,USC00517790,22.1333,-159.6333,1135.1,HI,PAUKAHANA 1080,,,,87ym,,,,,,
1044974,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1976,1975-12-20,7.0,0.0,2.0,2.0,13.0,0.0,0.0,86.0,0.0,Miles,55.0,80.0,2.0,9.0,14.0,1.0,0.0,0.0,2.0,1.0,1.0,4.0,4.0,4.0,4.0,86.0,138.397168,0.0,0.0,0.0,0.0,0.0,0.0,55.0,80.0,12.777778,26.666667,14.483425,22.529771,9.0,14.0,22.0333-159.6667_1976,87ym,87ymqen,USC00514272,22.0025,-159.7547,3.0,HI,KEKAHA 944,,,,87ym,,,0.0,,0.0,0.0
1044975,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1976,1975-12-20,7.0,0.0,2.0,2.0,13.0,0.0,0.0,86.0,0.0,Miles,55.0,80.0,2.0,9.0,14.0,1.0,0.0,0.0,2.0,1.0,1.0,4.0,4.0,4.0,4.0,86.0,138.397168,0.0,0.0,0.0,0.0,0.0,0.0,55.0,80.0,12.777778,26.666667,14.483425,22.529771,9.0,14.0,22.0333-159.6667_1976,87ym,87ymqen,USC00516850,22.0331,-159.7406,381.0,HI,NIU RIDGE 1035,,,,87ym,150.0,261.0,0.0,,0.0,0.0
1044976,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1976,1975-12-20,7.0,0.0,2.0,2.0,13.0,0.0,0.0,86.0,0.0,Miles,55.0,80.0,2.0,9.0,14.0,1.0,0.0,0.0,2.0,1.0,1.0,4.0,4.0,4.0,4.0,86.0,138.397168,0.0,0.0,0.0,0.0,0.0,0.0,55.0,80.0,12.777778,26.666667,14.483425,22.529771,9.0,14.0,22.0333-159.6667_1976,87ym,87ymqen,USC00518205,22.0322,-159.6928,487.7,HI,PUEHU RIDGE 1040,,,,87ym,,,0.0,,0.0,0.0
1044977,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1976,1975-12-20,7.0,0.0,2.0,2.0,13.0,0.0,0.0,86.0,0.0,Miles,55.0,80.0,2.0,9.0,14.0,1.0,0.0,0.0,2.0,1.0,1.0,4.0,4.0,4.0,4.0,86.0,138.397168,0.0,0.0,0.0,0.0,0.0,0.0,55.0,80.0,12.777778,26.666667,14.483425,22.529771,9.0,14.0,22.0333-159.6667_1976,87ym,87ymqen,USC00516082,22.03,-159.7628,6.1,HI,MANA 1026,,,,87ym,156.0,283.0,0.0,,0.0,0.0


## Statistics on dataset
How many records are missing each of the temperature and snow measurements

In [47]:
import numpy as np

record_count = len(df_circles_to_stations_weather_data.index)
print('How many rows in dataset with missing vals: ', record_count)

temp_min_nas = df_circles_to_stations_weather_data.temp_min_value.isna().sum()
print("Missing min temperature: " + str(temp_min_nas))

temp_max_nas = df_circles_to_stations_weather_data.temp_max_value.isna().sum()
print("Missing max temperature: " + str(temp_max_nas))

temp_avg_nas = df_circles_to_stations_weather_data.temp_avg.isna().sum()
print("Missing avg temperature: " + str(temp_avg_nas))

snow = df_circles_to_stations_weather_data.snow.isna().sum()
print("Missing snow temperature: " + str(snow))

How many rows in dataset with missing vals:  1045018
Missing min temperature: 978763
Missing max temperature: 978756
Missing avg temperature: 1036454
Missing snow temperature: 952601
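The same per-column counts can be computed in one call. The sketch below uses a small made-up frame (`toy`) with the notebook's weather column names; the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_circles_to_stations_weather_data (values are made up).
toy = pd.DataFrame({
    "temp_min_value": [-17.0, np.nan, np.nan],
    "temp_max_value": [17.0, np.nan, np.nan],
    "temp_avg": [np.nan, np.nan, np.nan],
    "snow": [3.0, 0.0, np.nan],
})

weather_cols = ["temp_min_value", "temp_max_value", "temp_avg", "snow"]

# Number of missing values per column, as a Series indexed by column name.
missing_counts = toy[weather_cols].isna().sum()

# Fraction of rows that are missing, per column.
missing_share = toy[weather_cols].isna().mean()

print(missing_counts)
print(missing_share)
```

Reporting the share alongside the count makes it easier to see that most station rows carry no temperature reading for the count date.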


## Remove rows with empty weather data
Create a new dataframe that keeps only the rows with at least one of the weather values present.

In [48]:
# Inspect shape before dropping data
df_circles_to_stations_weather_data.shape

(1045018, 66)

In [49]:
# Drop the rows where min temp value, max temp value, average temp value, and snow are all NULL
paired_data_cleaned = df_circles_to_stations_weather_data.dropna(subset=['temp_min_value', 'temp_max_value', 'temp_avg', 'snow'], how='all')



In [50]:
paired_data_cleaned.shape

(110051, 66)
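The `how='all'` argument in the `dropna` call matters: a row is removed only when every column in `subset` is missing. The sketch below contrasts it with `how='any'` on a small made-up frame (`toy`), which is a stand-in and not the notebook's data.

```python
import numpy as np
import pandas as pd

weather_cols = ["temp_min_value", "temp_max_value", "temp_avg", "snow"]

# Toy rows: partially filled, snow only, fully empty, fully filled.
toy = pd.DataFrame(
    [
        [-17.0, 17.0, np.nan, 3.0],
        [np.nan, np.nan, np.nan, 0.0],
        [np.nan, np.nan, np.nan, np.nan],
        [-243.0, -99.0, -202.0, 89.0],
    ],
    columns=weather_cols,
)

# how='all': drop a row only if all four weather columns are missing.
kept_all = toy.dropna(subset=weather_cols, how="all")

# how='any': drop a row if at least one of the four columns is missing.
kept_any = toy.dropna(subset=weather_cols, how="any")

print(len(kept_all), len(kept_any))
```

With `how='all'`, rows that have a snow value but no temperature reading are kept, which is why the cleaned data can still contain rows with an empty `temp_min_value`.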

In [51]:
paired_data_cleaned.head()

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,am_rain,pm_rain,am_snow,pm_snow,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui,geohash_circle,circle_id,id,latitude,longitude,elevation,state,name,gsn_flag,hcn_crn_flag,wmoid,geohash_station,temp_min_value,temp_max_value,precipitation_value,temp_avg,snow,snwd
6,Amchitka Island,US-AK,51.409713,179.284881,1980,1979-12-18,4.0,,,,8.0,,,43.0,,Miles,33.0,35.0,,40.0,48.0,,0.0,0.0,,6.0,6.0,3.0,3.0,2.0,2.0,43.0,69.198584,,,0.0,0.0,0.0,0.0,117.0,120.6,0.555556,1.666667,64.370776,77.244931,24.856,29.8272,51.409713179.284881_1980,zcpk,zcpkrwz,USC00500252,51.3833,179.2833,68.0,AK,AMCHITKA,,,,zcpk,-17.0,17.0,5.0,,3.0,0.0
9,Amchitka Island,US-AK,51.409713,179.284881,1993,1992-12-20,2.0,0.0,1.0,1.0,7.0,0.0,0.0,46.0,0.0,Miles,35.0,40.0,2.0,10.0,10.0,1.0,0.0,0.0,2.0,2.0,2.0,4.0,4.0,4.0,4.0,46.0,74.026392,0.0,0.0,0.0,0.0,0.0,0.0,35.0,40.0,1.666667,4.444444,16.092694,16.092694,10.0,10.0,51.409713179.284881_1993,zcpk,zcpkrwz,USC00500252,51.3833,179.2833,68.0,AK,AMCHITKA,,,,zcpk,,,,,0.0,0.0
28,Caribou,US-ME,46.912573,-67.947428,2012,2011-12-28,10.0,3.0,1.0,4.0,19.5,14.0,,222.25,,Miles,17.0,46.0,2.0,15.0,22.0,1.0,4.0,1.0,2.0,2.0,2.0,321.0,1.0,2.0,3.0,222.25,357.660122,,,4.0,10.16,2.54,1.0,17.0,46.0,-8.333333,7.777778,24.139041,35.403927,15.0,22.0,46.912572999999995-67.947428_2012,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-83.0,78.0,71.0,,8.0,25.0
29,Caribou,US-ME,46.912573,-67.947428,2013,2012-12-29,10.0,4.0,2.0,5.0,35.0,7.0,0.0,366.6,0.0,Miles,7.0,21.0,2.0,0.0,12.0,1.0,4.0,12.0,2.0,6.0,6.0,3.0,3.0,3.0,3.0,366.6,589.958159,0.0,0.0,4.0,10.16,30.48,12.0,7.0,21.0,-13.888889,-6.111111,0.0,19.311233,0.0,12.0,46.912572999999995-67.947428_2013,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-139.0,-61.0,0.0,,0.0,229.0
31,Caribou,US-ME,46.912573,-67.947428,2014,2014-01-01,7.0,5.0,1.0,4.0,20.0,18.0,,208.85,,Miles,-27.0,7.0,2.0,0.0,5.0,1.0,15.0,18.0,2.0,6.0,6.0,3.0,3.0,3.0,3.0,208.85,336.095912,,,15.0,38.1,45.72,18.0,-27.0,7.0,-32.777778,-13.888889,0.0,8.046347,0.0,5.0,46.912572999999995-67.947428_2014,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-282.0,-155.0,0.0,,3.0,460.0
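The station temperatures returned by the query appear to be in tenths of a degree Celsius, which is the unit GHCN-Daily uses for its minimum and maximum temperature elements: in the Caribou rows above, a `temp_min_value` of -83.0 sits next to a reported `min_temp_metric` of -8.33. Assuming that unit holds for the whole view, a sketch of putting the two on the same scale could look like this. The column names `temp_min_celsius` and `min_temp_gap` are introduced here for illustration.

```python
import pandas as pd

# Two Caribou rows copied from the output above.
station = pd.DataFrame({
    "temp_min_value": [-83.0, -139.0],         # station value, assumed tenths of deg C
    "min_temp_metric": [-8.333333, -13.888889],  # value reported by the count circle, deg C
})

# Assumption: dividing by 10 converts the station value to degrees Celsius.
station["temp_min_celsius"] = station["temp_min_value"] / 10.0

# Absolute difference between station and circle minimum temperature.
station["min_temp_gap"] = (
    station["temp_min_celsius"] - station["min_temp_metric"]
).abs()

print(station)
```

If the flattened view already rescales some columns, this conversion would not apply to them, so it is worth checking a few rows by hand before relying on it.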


## Repair the Rain and Snow Data
When we get our data back from BigQuery, our rain and snow data has been changed from comma-separated indicators (for example, 1,3) into continuous variables (13.0).

To fix this, we will merge the correct values back in.
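The repair pattern can be sketched on toy data as follows: merge the original text values back in on the `ui` key, then keep the recovered column under the original name. The frames `paired` and `clean` are small stand-ins, and `validate="many_to_one"` is an extra safeguard added in this sketch (it is not in the notebook's merge) that raises an error if `ui` is duplicated in the lookup frame, which would otherwise silently multiply rows.

```python
import pandas as pd

# Stand-in for the data returned by BigQuery: indicators collapsed to floats.
paired = pd.DataFrame({
    "ui": [
        "46.912572999999995-67.947428_2012",
        "46.912572999999995-67.947428_2012",
        "46.912572999999995-67.947428_2013",
    ],
    "id": ["USW00014607", "USC00176937", "USW00014607"],
    "am_rain": [321.0, 321.0, 3.0],
})

# Stand-in for the locally loaded clean data: indicators still stored as text.
clean = pd.DataFrame({
    "ui": [
        "46.912572999999995-67.947428_2012",
        "46.912572999999995-67.947428_2013",
    ],
    "am_rain": ["3,2,1", "3"],
})

merged = paired.merge(
    clean,
    how="left",
    on="ui",
    suffixes=("_old", "_recovered"),
    validate="many_to_one",  # each ui must appear at most once in `clean`
)

# Drop the collapsed column and restore the original column name.
merged = merged.drop(columns=["am_rain_old"]).rename(
    columns={"am_rain_recovered": "am_rain"}
)

print(merged)
```

Because the merge is a left join on a key that is unique in the lookup frame, the row count stays the same before and after the repair.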

In [52]:
paired_data_cleaned['am_rain'].value_counts()

3.0      60623
4.0      21752
2.0       9350
1.0       1352
32.0       717
21.0       638
321.0      242
31.0        19
12.0         6
123.0        1
23.0         1
34.0         1
Name: am_rain, dtype: int64

In [53]:
paired_data_cleaned['pm_rain'].value_counts()

3.0      61295
4.0      21812
2.0       8383
1.0       1535
32.0       697
21.0       622
321.0      187
31.0        32
23.0         8
24.0         5
34.0         1
Name: pm_rain, dtype: int64

In [54]:
paired_data_cleaned['am_snow'].value_counts()

3.0      62963
4.0      21620
2.0       7854
1.0       1144
32.0       403
21.0       289
321.0       75
31.0        21
12.0         5
34.0         4
23.0         3
24.0         2
Name: am_snow, dtype: int64

In [55]:
paired_data_cleaned['pm_snow'].value_counts()

3.0      63783
4.0      21696
2.0       6809
1.0       1231
21.0       371
32.0       334
321.0      103
23.0        10
31.0        10
24.0         1
34.0         1
Name: pm_snow, dtype: int64

In [56]:
clean_data_smol = clean_data[['ui','am_rain', 'pm_rain', 'am_snow', 'pm_snow']]

In [57]:
paired_data_cleaned = pd.merge(paired_data_cleaned, clean_data_smol, how = "left", on = "ui", suffixes = ("_old","_recovered"))



In [58]:
paired_data_cleaned.shape

(110051, 70)

In [59]:
paired_data_cleaned.head(50)

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,am_rain_old,pm_rain_old,am_snow_old,pm_snow_old,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui,geohash_circle,circle_id,id,latitude,longitude,elevation,state,name,gsn_flag,hcn_crn_flag,wmoid,geohash_station,temp_min_value,temp_max_value,precipitation_value,temp_avg,snow,snwd,am_rain_recovered,pm_rain_recovered,am_snow_recovered,pm_snow_recovered
0,Amchitka Island,US-AK,51.409713,179.284881,1980,1979-12-18,4.0,,,,8.0,,,43.0,,Miles,33.0,35.0,,40.0,48.0,,0.0,0.0,,6.0,6.0,3.0,3.0,2.0,2.0,43.0,69.198584,,,0.0,0.0,0.0,0.0,117.0,120.6,0.555556,1.666667,64.370776,77.244931,24.856,29.8272,51.409713179.284881_1980,zcpk,zcpkrwz,USC00500252,51.3833,179.2833,68.0,AK,AMCHITKA,,,,zcpk,-17.0,17.0,5.0,,3.0,0.0,3.0,3.0,2.0,2.0
1,Amchitka Island,US-AK,51.409713,179.284881,1993,1992-12-20,2.0,0.0,1.0,1.0,7.0,0.0,0.0,46.0,0.0,Miles,35.0,40.0,2.0,10.0,10.0,1.0,0.0,0.0,2.0,2.0,2.0,4.0,4.0,4.0,4.0,46.0,74.026392,0.0,0.0,0.0,0.0,0.0,0.0,35.0,40.0,1.666667,4.444444,16.092694,16.092694,10.0,10.0,51.409713179.284881_1993,zcpk,zcpkrwz,USC00500252,51.3833,179.2833,68.0,AK,AMCHITKA,,,,zcpk,,,,,0.0,0.0,4.0,4.0,4.0,4.0
2,Caribou,US-ME,46.912573,-67.947428,2012,2011-12-28,10.0,3.0,1.0,4.0,19.5,14.0,,222.25,,Miles,17.0,46.0,2.0,15.0,22.0,1.0,4.0,1.0,2.0,2.0,2.0,321.0,1.0,2.0,3.0,222.25,357.660122,,,4.0,10.16,2.54,1.0,17.0,46.0,-8.333333,7.777778,24.139041,35.403927,15.0,22.0,46.912572999999995-67.947428_2012,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-83.0,78.0,71.0,,8.0,25.0,321.0,1.0,2.0,3.0
3,Caribou,US-ME,46.912573,-67.947428,2013,2012-12-29,10.0,4.0,2.0,5.0,35.0,7.0,0.0,366.6,0.0,Miles,7.0,21.0,2.0,0.0,12.0,1.0,4.0,12.0,2.0,6.0,6.0,3.0,3.0,3.0,3.0,366.6,589.958159,0.0,0.0,4.0,10.16,30.48,12.0,7.0,21.0,-13.888889,-6.111111,0.0,19.311233,0.0,12.0,46.912572999999995-67.947428_2013,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-139.0,-61.0,0.0,,0.0,229.0,3.0,3.0,3.0,3.0
4,Caribou,US-ME,46.912573,-67.947428,2014,2014-01-01,7.0,5.0,1.0,4.0,20.0,18.0,,208.85,,Miles,-27.0,7.0,2.0,0.0,5.0,1.0,15.0,18.0,2.0,6.0,6.0,3.0,3.0,3.0,3.0,208.85,336.095912,,,15.0,38.1,45.72,18.0,-27.0,7.0,-32.777778,-13.888889,0.0,8.046347,0.0,5.0,46.912572999999995-67.947428_2014,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-282.0,-155.0,0.0,,3.0,460.0,3.0,3.0,3.0,3.0
5,Caribou,US-ME,46.912573,-67.947428,2015,2014-12-14,10.0,6.0,1.0,6.0,21.3,8.0,,300.6,,Miles,32.0,36.0,2.0,4.0,8.0,1.0,2.0,12.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,300.6,483.746379,,,2.0,5.08,30.48,12.0,32.0,36.0,0.0,2.222222,6.437078,12.874155,4.0,8.0,46.912572999999995-67.947428_2015,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,0.0,22.0,0.0,,0.0,150.0,3.0,3.0,3.0,3.0
6,Caribou,US-ME,46.912573,-67.947428,2016,2015-12-19,9.0,2.0,1.0,6.0,22.6,0.0,0.5,275.95,,Miles,18.0,34.0,2.0,0.0,15.0,1.0,2.0,3.0,2.0,6.0,6.0,3.0,3.0,3.0,3.0,275.95,444.077889,,,2.0,5.08,7.62,3.0,18.0,34.0,-7.777778,1.111111,0.0,24.139041,0.0,15.0,46.912572999999995-67.947428_2016,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-77.0,11.0,8.0,,10.0,50.0,3.0,3.0,3.0,3.0
7,Caribou,US-ME,46.912573,-67.947428,2017,2016-12-17,11.0,3.0,1.0,4.0,20.5,6.0,,235.75,,Miles,-14.0,7.0,2.0,,,0.0,20.0,25.0,2.0,2.0,2.0,,,2.0,2.0,235.75,379.385259,,,20.0,50.8,63.5,25.0,-14.0,7.0,-25.555556,-13.888889,,,,,46.912572999999995-67.947428_2017,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-243.0,-99.0,64.0,-202.0,89.0,360.0,,,2.0,2.0
8,Caribou,US-ME,46.912573,-67.947428,2018,2017-12-16,11.0,2.0,1.0,5.0,18.0,3.0,,234.75,,Miles,-5.0,11.0,2.0,5.0,15.0,1.0,5.0,14.0,2.0,6.0,6.0,3.0,3.0,3.0,3.0,234.75,377.77599,,,5.0,12.7,35.56,14.0,-5.0,11.0,-20.555556,-11.666667,8.046347,24.139041,5.0,15.0,46.912572999999995-67.947428_2018,f2rd,f2rdvu4,USW00014607,46.8706,-68.0172,190.2,ME,CARIBOU MUNI AP,GSN,,72712.0,f2rd,-160.0,-99.0,3.0,-132.0,5.0,230.0,3.0,3.0,3.0,3.0
9,Presque Isle,US-ME,46.681224,-68.015489,1970,1969-12-29,9.0,,,,24.0,,,186.0,,Miles,20.0,27.0,,15.0,20.0,,10.0,10.0,,2.0,1.0,3.0,4.0,2.0,4.0,186.0,299.324107,,,3.937008,25.4,25.4,3.937008,93.6,106.2,-6.666667,-2.777778,24.139041,32.185388,9.321,12.428,46.681224-68.015489_1970,f2r9,f2r9s60,USC00176937,46.6539,-68.0089,182.6,ME,PRESQUE ISLE,,HCN,,f2r9,-61.0,22.0,0.0,,0.0,152.0,3.0,4.0,2.0,4.0


In [60]:
# Let's clean up some names
paired_data_cleaned['am_rain'] = paired_data_cleaned['am_rain_recovered']
paired_data_cleaned['pm_rain'] = paired_data_cleaned['pm_rain_recovered']
paired_data_cleaned['am_snow'] = paired_data_cleaned['am_snow_recovered']
paired_data_cleaned['pm_snow'] = paired_data_cleaned['pm_snow_recovered']

In [61]:
paired_data_cleaned = paired_data_cleaned.drop(columns=['am_rain_old', 
                 'am_rain_recovered', 
                 'pm_rain_old', 
                 'pm_rain_recovered', 
                 'am_snow_old', 
                 'am_snow_recovered', 
                 'pm_snow_old', 
                 'pm_snow_recovered'])




In [62]:
paired_data_cleaned['am_rain'].value_counts()

3        60623
4        21752
2         9350
1         1352
3,2        717
2,1        638
3,2,1      242
3,1         19
1,2          6
2,3          1
1,2,3        1
3,4          1
Name: am_rain, dtype: int64

In [63]:
paired_data_cleaned['pm_rain'].value_counts()

3        61295
4        21812
2         8383
1         1535
3,2        697
2,1        622
3,2,1      187
3,1         32
2,3          8
2,4          5
3,4          1
Name: pm_rain, dtype: int64

In [64]:
paired_data_cleaned['am_snow'].value_counts()

3        62963
4        21620
2         7854
1         1144
3,2        403
2,1        289
3,2,1       75
3,1         21
1,2          5
3,4          4
2,3          3
2,4          2
Name: am_snow, dtype: int64

In [65]:
paired_data_cleaned['pm_snow'].value_counts()

3        63783
4        21696
2         6809
1         1231
2,1        371
3,2        334
3,2,1      103
3,1         10
2,3         10
2,4          1
3,4          1
Name: pm_snow, dtype: int64

## Size of dataframe

In [66]:
print("The total number of records in this data set is: ", paired_data_cleaned.shape[0])

The total number of records in this data set is:  110051


In [67]:
print("The total number of unique circle records in this data set is: ",paired_data_cleaned['ui'].nunique())
# This is a reduction from 90411 circle records because we drop locations that did not have weather data



The total number of unique circle records in this data set is:  53516


In [68]:
print("The number of unique circle  matched to multiple stations is: ")
paired_data_cleaned['ui'].value_counts()

The number of unique circle  matched to multiple stations is: 


32.301632-110.97348899999999_2014    49
32.301632-110.97348899999999_2011    45
32.301632-110.97348899999999_2017    45
39.6167-105.0167_1908                42
32.300018-106.71670400000001_2015    41
                                     ..
26.127751-81.764507_1998              1
42.116754-76.350001_1985              1
37.240251-107.025704_2015             1
45.291131-91.08997099999999_1980      1
42.288835-89.637199_1989              1
Name: ui, Length: 53516, dtype: int64
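The `value_counts()` output above lists how many station rows each circle record has. To get the actual number of circle records matched to more than one station, the counts can be filtered, as in this sketch on made-up `ui` values (`toy` is a stand-in frame).

```python
import pandas as pd

# Toy stand-in: one row per (circle record, station) pair.
toy = pd.DataFrame({
    "ui": ["a_2014", "a_2014", "a_2014", "b_1998", "c_2015", "c_2015"],
})

# Rows per circle record, i.e. number of matched stations with weather data.
stations_per_circle = toy["ui"].value_counts()

# Circle records that were matched to more than one station.
multi_matched = stations_per_circle[stations_per_circle > 1]
n_multi_matched = len(multi_matched)

# Largest number of stations matched to a single circle record.
max_matches = int(stations_per_circle.max())

print(n_multi_matched, max_matches)
```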

In [69]:
# To get an idea of what a circle record matched to 49 stations looks like:
paired_data_cleaned.loc[paired_data_cleaned['ui'] == "32.301632-110.97348899999999_2014"]

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui,geohash_circle,circle_id,id,latitude,longitude,elevation,state,name,gsn_flag,hcn_crn_flag,wmoid,geohash_station,temp_min_value,temp_max_value,precipitation_value,temp_avg,snow,snwd,am_rain,pm_rain,am_snow,pm_snow
93488,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0132,32.2918,-110.8027,796.4,AZ,TUCSON 8.5 NE,,,,9t9p,,,0.0,,0.0,,3,3,3,3
93489,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0046,32.318,-111.031,682.1,AZ,FLOWING WELLS 2.1 NW,,,,9t9p,,,0.0,,0.0,,3,3,3,3
93490,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,USR0000ASAG,32.3167,-110.8133,944.9,AZ,SAGUARO ARIZONA,,,,9t9p,28.0,211.0,,108.0,,,3,3,3,3
93491,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0014,32.2168,-110.8825,783.9,AZ,TUCSON 1.5 NNE,,,,9t9p,,,0.0,,0.0,,3,3,3,3
93492,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0082,32.2261,-110.7984,805.9,AZ,TUCSON 5.8 ENE,,,,9t9p,,,0.0,,0.0,,3,3,3,3
93493,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0135,32.2709,-111.0296,703.2,AZ,TUCSON 7.3 WNW,,,,9t9p,,,0.0,,0.0,,3,3,3,3
93494,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0198,32.3339,-110.9119,913.8,AZ,TUCSON 7.9 N,,,,9t9p,,,0.0,,0.0,,3,3,3,3
93495,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0193,32.2988,-111.077,714.8,AZ,TUCSON 10.7 WNW,,,,9t9p,,,0.0,,0.0,,3,3,3,3
93496,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0248,32.2258,-110.9046,769.6,AZ,TUCSON 1.0 ENE,,,,9t9p,,,0.0,,0.0,,3,3,3,3
93497,Tucson Valley,US-AZ,32.301632,-110.973489,2014,2013-12-15,97.0,5.0,28.0,41.0,264.25,9.0,2.0,565.25,,Miles,31.0,68.0,2.0,0.0,14.0,1.0,,,,1.0,1.0,565.25,909.639524,,,,,,,31.0,68.0,-0.555556,20.0,0.0,22.529771,0.0,14.0,32.301632-110.97348899999999_2014,9t9p,9t9pf8r,US1AZPM0108,32.2785,-110.9303,719.9,AZ,TUCSON 4.1 N,,,,9t9p,,,0.0,,0.0,,3,3,3,3


In [70]:
# Finally, review the tail of the paired data before saving it
paired_data_cleaned.tail(50)

Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,max_field_parties,field_hours,feeder_hours,nocturnal_hours,field_distance,nocturnal_distance,distance_units,min_temp,max_temp,temp_unit,min_wind,max_wind,wind_unit,min_snow,max_snow,snow_unit,am_cloud,pm_cloud,field_distance_imperial,field_distance_metric,nocturnal_distance_imperial,nocturnal_distance_metric,min_snow_imperial,min_snow_metric,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,ui,geohash_circle,circle_id,id,latitude,longitude,elevation,state,name,gsn_flag,hcn_crn_flag,wmoid,geohash_station,temp_min_value,temp_max_value,precipitation_value,temp_avg,snow,snwd,am_rain,pm_rain,am_snow,pm_snow
110001,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1974,1973-12-15,6.0,0.0,3.0,3.0,20.0,0.0,0.0,155.0,0.0,Miles,63.0,75.0,2.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,1.0,155.0,249.436756,0.0,0.0,0.0,0.0,0.0,0.0,63.0,75.0,17.222222,23.888889,0.0,0.0,0.0,0.0,22.0333-159.6667_1974,87ym,87ymqen,USC00513099,22.1297,-159.6586,1097.3,HI,KANALOHULUHULU 1075,,,,87ym,44.0,194.0,97.0,,0.0,0.0,4,4,4,4
110002,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1974,1973-12-15,6.0,0.0,3.0,3.0,20.0,0.0,0.0,155.0,0.0,Miles,63.0,75.0,2.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,1.0,155.0,249.436756,0.0,0.0,0.0,0.0,0.0,0.0,63.0,75.0,17.222222,23.888889,0.0,0.0,0.0,0.0,22.0333-159.6667_1974,87ym,87ymqen,USC00518205,22.0322,-159.6928,487.7,HI,PUEHU RIDGE 1040,,,,87ym,,,0.0,,0.0,0.0,4,4,4,4
110003,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1974,1973-12-15,6.0,0.0,3.0,3.0,20.0,0.0,0.0,155.0,0.0,Miles,63.0,75.0,2.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,1.0,155.0,249.436756,0.0,0.0,0.0,0.0,0.0,0.0,63.0,75.0,17.222222,23.888889,0.0,0.0,0.0,0.0,22.0333-159.6667_1974,87ym,87ymqen,USC00514272,22.0025,-159.7547,3.0,HI,KEKAHA 944,,,,87ym,,,0.0,,0.0,0.0,4,4,4,4
110004,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1974,1973-12-15,6.0,0.0,3.0,3.0,20.0,0.0,0.0,155.0,0.0,Miles,63.0,75.0,2.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,1.0,155.0,249.436756,0.0,0.0,0.0,0.0,0.0,0.0,63.0,75.0,17.222222,23.888889,0.0,0.0,0.0,0.0,22.0333-159.6667_1974,87ym,87ymqen,USC00516850,22.0331,-159.7406,381.0,HI,NIU RIDGE 1035,,,,87ym,156.0,261.0,0.0,,0.0,0.0,4,4,4,4
110005,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1974,1973-12-15,6.0,0.0,3.0,3.0,20.0,0.0,0.0,155.0,0.0,Miles,63.0,75.0,2.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,1.0,155.0,249.436756,0.0,0.0,0.0,0.0,0.0,0.0,63.0,75.0,17.222222,23.888889,0.0,0.0,0.0,0.0,22.0333-159.6667_1974,87ym,87ymqen,USC00519253,21.9944,-159.7314,3.0,HI,WAIAWA 943,,,,87ym,,,0.0,,0.0,0.0,4,4,4,4
110006,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1974,1973-12-15,6.0,0.0,3.0,3.0,20.0,0.0,0.0,155.0,0.0,Miles,63.0,75.0,2.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,1.0,155.0,249.436756,0.0,0.0,0.0,0.0,0.0,0.0,63.0,75.0,17.222222,23.888889,0.0,0.0,0.0,0.0,22.0333-159.6667_1974,87ym,87ymqen,USC00514735,22.0758,-159.7589,11.0,HI,KOLO 1033,,,,87ym,,,0.0,,0.0,0.0,4,4,4,4
110007,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1974,1973-12-15,6.0,0.0,3.0,3.0,20.0,0.0,0.0,155.0,0.0,Miles,63.0,75.0,2.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,1.0,155.0,249.436756,0.0,0.0,0.0,0.0,0.0,0.0,63.0,75.0,17.222222,23.888889,0.0,0.0,0.0,0.0,22.0333-159.6667_1974,87ym,87ymqen,USC00512161,21.9828,-159.6831,243.8,HI,HUKIPO 945,,,,87ym,,,0.0,,0.0,0.0,4,4,4,4
110008,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1975,1974-12-27,4.0,0.0,2.0,2.0,16.0,0.0,0.0,79.0,0.0,Miles,65.0,80.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,2.0,1.0,79.0,127.132282,0.0,0.0,0.0,0.0,0.0,0.0,65.0,80.0,18.333333,26.666667,0.0,24.139041,0.0,15.0,22.0333-159.6667_1975,87ym,87ymqen,USW00022501,22.0333,-159.7833,4.0,HI,BARKING SANDS,,,91162.0,87ym,178.0,267.0,0.0,,0.0,0.0,2,2,3,3
110009,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1975,1974-12-27,4.0,0.0,2.0,2.0,16.0,0.0,0.0,79.0,0.0,Miles,65.0,80.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,2.0,1.0,79.0,127.132282,0.0,0.0,0.0,0.0,0.0,0.0,65.0,80.0,18.333333,26.666667,0.0,24.139041,0.0,15.0,22.0333-159.6667_1975,87ym,87ymqen,USC00513099,22.1297,-159.6586,1097.3,HI,KANALOHULUHULU 1075,,,,87ym,122.0,183.0,15.0,,0.0,0.0,2,2,3,3
110010,Kaua'i: Waimea,US-HI,22.0333,-159.6667,1975,1974-12-27,4.0,0.0,2.0,2.0,16.0,0.0,0.0,79.0,0.0,Miles,65.0,80.0,2.0,0.0,15.0,1.0,0.0,0.0,2.0,2.0,1.0,79.0,127.132282,0.0,0.0,0.0,0.0,0.0,0.0,65.0,80.0,18.333333,26.666667,0.0,24.139041,0.0,15.0,22.0333-159.6667_1975,87ym,87ymqen,USC00514272,22.0025,-159.7547,3.0,HI,KEKAHA 944,,,,87ym,,,20.0,,0.0,0.0,2,2,3,3
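
Several rows in the preview above have an empty `temp_min_value` and `temp_max_value`, so it can be useful to quantify the missing station readings before saving. The following is a minimal sketch, not part of the original notebook: `summarize_missing` is a hypothetical helper, and the small `example` frame is a stand-in built from three of the station rows shown above rather than the real `paired_data_cleaned`.

```python
import pandas as pd


def summarize_missing(df, columns):
    """Return the count and share of missing values for each requested column.

    Hypothetical helper for illustration; columns absent from df are skipped.
    """
    present = [c for c in columns if c in df.columns]
    counts = df[present].isna().sum()
    return pd.DataFrame({"n_missing": counts, "share_missing": counts / len(df)})


# Stand-in frame using three station rows from the preview above
# (assumption: the real frame, paired_data_cleaned, has these same columns).
example = pd.DataFrame({
    "id": ["USC00513099", "USC00518205", "USC00516850"],
    "temp_min_value": [44.0, None, 156.0],
    "temp_max_value": [194.0, None, 261.0],
    "precipitation_value": [97.0, 0.0, 0.0],
})

missing_summary = summarize_missing(
    example, ["temp_min_value", "temp_max_value", "precipitation_value"]
)
print(missing_summary)
```

In the notebook itself, the same call could be made with `paired_data_cleaned` in place of `example`.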


In [71]:
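# Save the paired data as a tab-separated file, compressed with gzip.
# Note: the file keeps a .txt extension even though its contents are gzipped.
output_path = r'../data/Cloud_Data/1.1-circles_to_many_stations_usa_weather_data_' + str(time_now) + '.txt'
paired_data_cleaned.to_csv(output_path, sep="\t", index=False, compression="gzip")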
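
Because the saved file is gzip-compressed but ends in `.txt`, pandas cannot infer the compression from the file name when reading it back, so both the tab separator and the compression have to be passed explicitly. The sketch below illustrates this with a round trip through a temporary file; `read_paired_data` and the `example` frame are hypothetical and only stand in for the real output.

```python
import os
import tempfile

import pandas as pd


def read_paired_data(path):
    """Read a tab-separated, gzip-compressed file whose name lacks a .gz suffix.

    Hypothetical helper: compression is given explicitly because pandas
    only infers gzip from extensions such as .gz.
    """
    return pd.read_csv(path, sep="\t", compression="gzip")


# Stand-in frame with a few of the columns seen in the preview above.
example = pd.DataFrame({
    "circle_name": ["Kaua'i: Waimea"],
    "id": ["USW00022501"],
    "temp_min_value": [178.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Mirror the notebook's save call: tab separator, gzip, .txt extension.
    example_path = os.path.join(tmp_dir, "example_paired_data.txt")
    example.to_csv(example_path, sep="\t", index=False, compression="gzip")
    round_trip = read_paired_data(example_path)

print(round_trip)
```

Naming the file with a `.txt.gz` suffix instead would let `pd.read_csv` detect the compression automatically, at the cost of changing the file name that other steps may expect.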

