## Weather Raw Data

Created the weather_daily collection. This collection contains a daily weather summary for Chicago for each day of 2022.

#### Steps Taken

1. Used the OpenWeather API for the daily weather summary
    - https://openweathermap.org/api/one-call-3#history_daily_aggregation
2. Defined the location and date range of the weather data needed
    - Location = Chicago
    - Date Range = 2022
3. Used a for loop to retrieve OpenWeather data for every date in the date list
4. Created a dictionary named weather_dict with the weather data
5. Created a dataframe from weather_dict
    - Converted precipitation from mm to in
    - Created a new boolean column flagging significant precipitation
6. Exported the weather dataframe to a csv file
7. Used mongoimport to create the weather_daily collection

In [73]:
import pandas as pd
import json
import requests
from datetime import date, timedelta, datetime
from config import api_key
from pymongo import MongoClient

In [4]:
# OpenWeather API for daily aggregation
# https://openweathermap.org/api/one-call-3#history_daily_aggregation

url = "https://api.openweathermap.org/data/3.0/onecall/day_summary?"

# Coordinates for Chicago
lat = 41.881832
lon = -87.623177

# Temperature will be in degrees Fahrenheit (°F)
units = "imperial"

In [5]:
# Define start and end dates
# Create a list of dates, starting from the start date and ending at or before the end_date

start_date = date(2022, 1, 1)
end_date = date(2022, 12, 31)

date_list = []

while start_date <= end_date:
    date_list.append(start_date.strftime('%Y-%m-%d'))
    start_date+=timedelta(days=1)

# Print some values from the list
preview_size = 5
preview = date_list[:preview_size]
print(preview)

['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05']
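
The same list can be built without mutating start_date, which keeps the original start date available to later cells. A minimal sketch using only the standard library; the helper name build_date_list is hypothetical and not part of the notebook:

```python
from datetime import date, timedelta


def build_date_list(start: date, end: date) -> list[str]:
    """Return every date from start to end (inclusive) as 'YYYY-MM-DD' strings."""
    n_days = (end - start).days + 1  # +1 so the end date is included
    return [(start + timedelta(days=i)).isoformat() for i in range(n_days)]


date_list = build_date_list(date(2022, 1, 1), date(2022, 12, 31))
print(len(date_list), date_list[0], date_list[-1])
```

For 2022 this yields 365 entries, from 2022-01-01 through 2022-12-31.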


In [8]:
# Create a for loop to retrieve OpenWeather data for all dates in the date list

# Set up response info lists
cloud_cover = []
precipitation = []
min_temp = []
max_temp = []
temp_mrng = []
temp_aft = []
temp_eve = []
temp_night = []
wind_max = []

# Loop through all the dates in our list to fetch weather data
for day in date_list:

    # Create the endpoint URL for each day (url already ends with "?")
    query_url = f"{url}lat={lat}&lon={lon}&units={units}&date={day}&appid={api_key}"

    response = requests.get(query_url).json()
    cloud_cover.append(response['cloud_cover']['afternoon'])
    precipitation.append(response['precipitation']['total'])
    min_temp.append(response['temperature']['min'])
    max_temp.append(response['temperature']['max'])
    temp_mrng.append(response['temperature']['morning'])
    temp_aft.append(response['temperature']['afternoon'])
    temp_eve.append(response['temperature']['evening'])
    temp_night.append(response['temperature']['night'])
    wind_max.append(response['wind']['max']['speed'])
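
The loop above assumes every response contains all nine fields, so a single response lacking any of them would raise a KeyError partway through the year. Below is a minimal sketch of a more defensive extraction step, assuming the response layout implied by the keys used above. The helper name parse_day_summary and the choice to store None for missing values are illustrative assumptions, not part of the original notebook:

```python
def parse_day_summary(day: str, response: dict) -> dict:
    """Flatten one day_summary response into a single record.

    Missing sections or keys become None instead of raising KeyError.
    """
    temperature = response.get("temperature") or {}
    wind_max = (response.get("wind") or {}).get("max") or {}
    return {
        "date": day,
        "cloud_cover": (response.get("cloud_cover") or {}).get("afternoon"),
        "precipitation": (response.get("precipitation") or {}).get("total"),
        "min_temp": temperature.get("min"),
        "max_temp": temperature.get("max"),
        "morning_temp": temperature.get("morning"),
        "afternoon_temp": temperature.get("afternoon"),
        "evening_temp": temperature.get("evening"),
        "night_temp": temperature.get("night"),
        "max_windspeed": wind_max.get("speed"),
    }


# Sample shaped like the first day's values printed in this notebook
sample = {
    "cloud_cover": {"afternoon": 90.0},
    "precipitation": {"total": 0.18},
    "temperature": {"min": 33.22, "max": 42.1, "morning": 42.1,
                    "afternoon": 38.43, "evening": 35.24, "night": 38.44},
    "wind": {"max": {"speed": 15.01}},
}
record = parse_day_summary("2022-01-01", sample)
print(record["max_temp"], record["max_windspeed"])

# A response without weather fields (hypothetical body) yields None values
empty = parse_day_summary("2022-01-02", {"message": "hypothetical error body"})
print(empty["min_temp"])
```

A list of such records can be passed directly to pd.DataFrame, which replaces the nine parallel lists with one list of rows.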

In [9]:
weather_dict = {
    "date": date_list,
    "cloud_cover": cloud_cover,
    "precipitation": precipitation,
    "min_temp": min_temp,
    "max_temp": max_temp,
    "morning_temp": temp_mrng,
    "afternoon_temp": temp_aft,
    "evening_temp": temp_eve,
    "night_temp": temp_night,
    "max_windspeed": wind_max
}

# Iterate through the dictionary items and print the key and first value
for key, value in weather_dict.items():
    if len(value) > 0:
        print(f"{key}: {value[0]}")
    else:
        print(f"{key}: No data available")

date: 2022-01-01
cloud_cover: 90.0
precipitation: 0.18
min_temp: 33.22
max_temp: 42.1
morning_temp: 42.1
afternoon_temp: 38.43
evening_temp: 35.24
night_temp: 38.44
max_windspeed: 15.01


In [58]:
# Create a dataframe from the weather dictionary
weather_df = pd.DataFrame(weather_dict)

# Define the conversion factor
mm_to_in = 0.0393701

# Convert the "precipitation" column from mm to in
weather_df["precipitation"] = weather_df["precipitation"] * mm_to_in

# Define the threshold (in inches) above which precipitation counts as significant
# Source: WeatherShack Rain Measurement
# https://www.weathershack.com/static/ed-rain-measurement.html
threshold = 0.1

# Create the new boolean precipitation column
weather_df['significant_precipitation'] = weather_df['precipitation'] > threshold
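
As a worked check of the conversion and threshold, the sketch below applies the same two steps to a three-row sample. The 0.18 mm value is the first day's total printed earlier; 13.81 mm is back-calculated from the 0.543701 in shown for 2022-01-02 in the dataframe preview and is therefore an assumption. The column names precipitation_mm and precipitation_in are illustrative only:

```python
import pandas as pd

MM_TO_IN = 0.0393701  # inches per millimetre
THRESHOLD_IN = 0.1    # same threshold as in the notebook, in inches

# 0.18 mm comes from the notebook output; 13.81 mm is an assumed,
# back-calculated value; 0.0 mm represents a dry day.
sample = pd.DataFrame({"precipitation_mm": [0.18, 13.81, 0.0]})

# Convert to inches, then flag days above the threshold
sample["precipitation_in"] = (sample["precipitation_mm"] * MM_TO_IN).round(6)
sample["significant_precipitation"] = sample["precipitation_in"] > THRESHOLD_IN
print(sample)
```

Keeping the millimetre values under a separate column name, rather than overwriting them in place, makes the unit of each column explicit and lets the conversion be re-run safely.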

In [62]:
# Export our weather dataframe to a csv file
weather_df.to_csv("csv_data/weather/weather_daily.csv", index=False, header=True)

In [63]:
csv_file_path = 'csv_data/weather/weather_daily.csv'
weather_df = pd.read_csv(csv_file_path)
weather_df.head()

| | date | cloud_cover | precipitation | min_temp | max_temp | morning_temp | afternoon_temp | evening_temp | night_temp | max_windspeed | significant_precipitation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-01-01 | 90.0 | 0.007087 | 33.22 | 42.1 | 42.1 | 38.43 | 35.24 | 38.44 | 15.01 | False |
| 1 | 2022-01-02 | 90.0 | 0.543701 | 18.81 | 32.77 | 28.24 | 26.64 | 19.63 | 32.77 | 18.41 | True |
| 2 | 2022-01-03 | 16.0 | 0.0 | 6.58 | 26.37 | 21.76 | 12.25 | 7.21 | 24.21 | 8.99 | False |
| 3 | 2022-01-04 | 4.0 | 0.0 | 18.03 | 30.36 | 18.03 | 20.97 | 21.29 | 19.26 | 10.0 | False |
| 4 | 2022-01-05 | 64.0 | 0.009843 | 13.57 | 33.39 | 31.53 | 32.74 | 22.19 | 31.64 | 17.0 | False |


#### Create Weather Data Collection:
mongoimport --type csv -d divvy_db -c weather_daily --headerline csv_data/weather/weather_daily.csv

In [82]:
# Create an instance of MongoClient and specify the database name
mongo = MongoClient(port=27017)
db = mongo.divvy_db

# Specify the name of the collection you want to work with
collection_name = "weather_daily"

# Access the collection directly
collection = db[collection_name]

# Count the number of documents in the collection
document_count = collection.count_documents({})

# Print the collection name and the document count
print(f"Collection Name: {collection_name}")
print(f"Count of Documents: {document_count}")

Collection Name: weather_daily
Count of Documents: 365
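
The count of 365 is what one document per day of 2022 should produce. A minimal sketch of that sanity check, assuming the same inclusive date range defined earlier; the variable names are illustrative, and document_count is hard-coded here to the value reported above rather than queried from MongoDB:

```python
from datetime import date

start, end = date(2022, 1, 1), date(2022, 12, 31)
expected_documents = (end - start).days + 1  # inclusive of both endpoints

document_count = 365  # value reported by count_documents({}) above
assert document_count == expected_documents, (
    f"expected {expected_documents} daily documents, found {document_count}"
)
print(f"Expected: {expected_documents}, found: {document_count}")
```

In the notebook itself, document_count would come from collection.count_documents({}) instead of a literal.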
