# WeatherPy
----

#### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
%matplotlib notebook

In [2]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import time
from scipy.stats import linregress

# Import API key
from api_keys import weather_api_key

# Incorporated citipy to determine city based on latitude and longitude
from citipy import citipy

# Output File (CSV)
output_data_file = "output_data/cities.csv"

# Range of latitudes and longitudes
lat_range = (-90, 90)
lng_range = (-180, 180)

## Generate Cities List

In [3]:
# List for holding lat_lngs and cities
lat_lngs = []
cities = []

# Create a set of random lat and lng combinations
lats = np.random.uniform(low=-90.000, high=90.000, size=1500)
lngs = np.random.uniform(low=-180.000, high=180.000, size=1500)
lat_lngs = zip(lats, lngs)

# Identify nearest city for each lat, lng combination
for lat_lng in lat_lngs:
    city = citipy.nearest_city(lat_lng[0], lat_lng[1]).city_name
    
    # If the city is unique, then add it to our cities list
    if city not in cities:
        cities.append(city)

# Print the city count to confirm sufficient count
len(cities)

620
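
The `city not in cities` check above scans the whole list on every iteration. As a minimal sketch of an alternative, assuming all that is needed is de-duplication that keeps first-seen order, `dict.fromkeys` does the same job in one pass. The helper name `unique_in_order` and the sample names are hypothetical, not part of the notebook.

```python
def unique_in_order(names):
    """Return the names with duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(names))


# Hypothetical sample standing in for the citipy lookups
sample_names = ["albany", "hobart", "albany", "rikitea", "hobart"]
print(unique_in_order(sample_names))
```

In the notebook this would be applied to the list of names returned by `citipy.nearest_city(...)`; the result should match what the loop produces.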

### Perform API Calls
* Perform a weather check on each city using a series of successive API calls.
* Include a print log of each city as it's being processed (with the city number and city name).


In [4]:
# Save config information.
url = "http://api.openweathermap.org/data/2.5/weather?"
units = "imperial"

# Build partial query URL
query_url = f"{url}appid={weather_api_key}&units={units}&q="
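
Several city names in this dataset contain spaces (for example "arraial do cabo"), so it can help to build the query string from a dictionary and let the encoder handle escaping. The sketch below uses only the standard library's `urllib.parse.urlencode`; the helper name `build_query_url` and the placeholder key are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather?"


def build_query_url(city, api_key, units="imperial"):
    """Build the full request URL with every parameter percent-encoded."""
    params = {"appid": api_key, "units": units, "q": city}
    return BASE_URL + urlencode(params)


# "PLACEHOLDER_KEY" is a stand-in, not a real API key
print(build_query_url("arraial do cabo", "PLACEHOLDER_KEY"))
```

Passing the same dictionary through the `params` argument of `requests.get` should give an equivalent request without any manual string concatenation.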


In [5]:
# Set up lists to hold response info
City = []
Cloudiness = []
Country = []
Date = []
Humidity = []
Lat = []
Lng = []
Max_Temp = []
Wind_Speed = []


In [6]:
# Loop through the cities to append the information into the above lists
for city in cities:
    try:
        response = requests.get(query_url + city).json()
        # Read every field before appending, so a missing key cannot
        # leave the lists with different lengths
        name = response["name"]
        cloudiness = response["clouds"]["all"]
        country = response["sys"]["country"]
        timestamp = response["dt"]
        humidity = response["main"]["humidity"]
        lat = response["coord"]["lat"]
        lng = response["coord"]["lon"]
        max_temp = response["main"]["temp_max"]
        wind_speed = response["wind"]["speed"]
    except KeyError:
        print("Missing field/result... skipping.")
        print("--------------------")
        continue

    City.append(name)
    Cloudiness.append(cloudiness)
    Country.append(country)
    Date.append(timestamp)
    Humidity.append(humidity)
    Lat.append(lat)
    Lng.append(lng)
    Max_Temp.append(max_temp)
    Wind_Speed.append(wind_speed)
    print(f"City {len(City)} added, {city}")
    print("--------------------")
   

City 1 added, rock springs
--------------------
City 2 added, rikitea
--------------------
City 3 added, kruisfontein
--------------------
City 4 added, albany
--------------------
City 5 added, dawlatabad
--------------------
City 6 added, vao
--------------------
City 7 added, qaanaaq
--------------------
City 8 added, hobart
--------------------
City 9 added, arraial do cabo
--------------------
City 10 added, pauini
--------------------
City 11 added, mataura
--------------------
City 12 added, besancon
--------------------
City 13 added, quzhou
--------------------
City 14 added, orumiyeh
--------------------
City 15 added, namibe
--------------------
City 16 added, shimoda
--------------------
City 17 added, tandlianwala
--------------------
City 18 added, qasigiannguit
--------------------
City 19 added, asyut
--------------------
Missing field/result... skipping.
--------------------
Missing field/result... skipping.
--------------------
City 20 added, ushuaia
-----------------

City 156 added, mahebourg
--------------------
City 157 added, hithadhoo
--------------------
City 158 added, johnstown
--------------------
City 159 added, bandarbeyla
--------------------
City 160 added, san policarpo
--------------------
City 161 added, kieta
--------------------
City 162 added, sulangan
--------------------
City 163 added, saldanha
--------------------
City 164 added, moyale
--------------------
City 165 added, airai
--------------------
City 166 added, tuktoyaktuk
--------------------
City 167 added, lilburn
--------------------
Missing field/result... skipping.
--------------------
City 168 added, ostrovnoy
--------------------
City 169 added, balabac
--------------------
City 170 added, ust-kan
--------------------
City 171 added, cervo
--------------------
Missing field/result... skipping.
--------------------
City 172 added, jalu
--------------------
City 173 added, clarksburg
--------------------
City 174 added, nioro
--------------------
City 175 added, bamb

City 312 added, staryy nadym
--------------------
City 313 added, panacan
--------------------
City 314 added, camara de lobos
--------------------
Missing field/result... skipping.
--------------------
City 315 added, mandal
--------------------
City 316 added, verkhnyaya inta
--------------------
City 317 added, kazachinskoye
--------------------
City 318 added, mayo
--------------------
Missing field/result... skipping.
--------------------
City 319 added, dunda
--------------------
City 320 added, polunochnoye
--------------------
City 321 added, cedar city
--------------------
City 322 added, geraldton
--------------------
City 323 added, samarai
--------------------
City 324 added, sandwick
--------------------
City 325 added, teya
--------------------
City 326 added, altagracia de orituco
--------------------
City 327 added, silver city
--------------------
City 328 added, severodvinsk
--------------------
City 329 added, ous
--------------------
City 330 added, sitka
----------

City 473 added, la baule-escoublac
--------------------
City 474 added, nautla
--------------------
City 475 added, inhambane
--------------------
City 476 added, solnechnyy
--------------------
City 477 added, kahului
--------------------
Missing field/result... skipping.
--------------------
City 478 added, teguldet
--------------------
City 479 added, tuatapere
--------------------
City 480 added, svetlaya
--------------------
City 481 added, shitanjing
--------------------
City 482 added, viedma
--------------------
City 483 added, leningradskiy
--------------------
City 484 added, mandali
--------------------
City 485 added, weligama
--------------------
City 486 added, birkenfeld
--------------------
City 487 added, luanda
--------------------
City 488 added, jacmel
--------------------
City 489 added, pitimbu
--------------------
City 490 added, requena
--------------------
City 491 added, kuressaare
--------------------
City 492 added, pandan
--------------------
City 493 added
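
The log above shows that some lookups come back without the expected fields. One way to keep the parsing testable without calling the API is to isolate it in a small function that returns either a complete row or `None`. This is a sketch only: the function name `extract_record` is hypothetical, the sample values are borrowed from the Rock Springs row of the notebook's DataFrame, and the shape of the "not found" payload is an assumption.

```python
def extract_record(response):
    """Return one row of weather fields, or None if any field is missing."""
    try:
        return {
            "City": response["name"],
            "Cloudiness": response["clouds"]["all"],
            "Country": response["sys"]["country"],
            "Date": response["dt"],
            "Humidity": response["main"]["humidity"],
            "Lat": response["coord"]["lat"],
            "Lng": response["coord"]["lon"],
            "Max Temp": response["main"]["temp_max"],
            "Wind Speed": response["wind"]["speed"],
        }
    except KeyError:
        return None


# Payload containing only the fields the loop reads
sample_response = {
    "name": "Rock Springs",
    "clouds": {"all": 1},
    "sys": {"country": "US"},
    "dt": 1587154171,
    "main": {"humidity": 55, "temp_max": 33.8},
    "coord": {"lat": 41.59, "lon": -109.2},
    "wind": {"speed": 13.87},
}
# Assumed shape of an error payload; the real one may differ
not_found = {"cod": "404", "message": "city not found"}

print(extract_record(sample_response))
print(extract_record(not_found))
```

A list of such dictionaries can be passed directly to `pd.DataFrame`, which avoids maintaining nine parallel lists.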

### Convert Raw Data to DataFrame
* Export the city data into a .csv.
* Display the DataFrame

In [7]:
# Convert the raw data into a DataFrame
weather_dict = {
    "City": City,
    "Cloudiness": Cloudiness,
    "Country": Country,
    "Date": Date,
    "Humidity": Humidity,
    "Lat": Lat,
    "Lng": Lng,
    "Max Temp": Max_Temp, 
    "Wind Speed": Wind_Speed
}
weather_data = pd.DataFrame(weather_dict)
weather_data.head()

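|   | City | Cloudiness | Country | Date | Humidity | Lat | Lng | Max Temp | Wind Speed |
|---|------|------------|---------|------|----------|-----|-----|----------|------------|
| 0 | Rock Springs | 1 | US | 1587154171 | 55 | 41.59 | -109.2 | 33.8 | 13.87 |
| 1 | Rikitea | 1 | PF | 1587154171 | 65 | -23.12 | -134.97 | 77.18 | 2.51 |
| 2 | Kruisfontein | 90 | ZA | 1587154172 | 83 | -34.0 | 24.73 | 59.94 | 3.96 |
| 3 | Albany | 100 | US | 1587154172 | 21 | 42.6 | -73.97 | 50.0 | 4.0 |
| 4 | Dawlatabad | 0 | AF | 1587154172 | 36 | 36.41 | 64.91 | 59.52 | 2.48 |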


In [8]:
# Check that there are over 500 rows
weather_data.count()

City          567
Cloudiness    567
Country       567
Date          567
Humidity      567
Lat           567
Lng           567
Max Temp      567
Wind Speed    567
dtype: int64

In [9]:
# Save the DataFrame
weather_data.to_csv(r"../output_data/weather.csv", index = False)
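
Writing to `../output_data/` assumes that directory already exists; `to_csv` does not create missing folders. The sketch below shows the directory-creation step with `pathlib`. To stay self-contained it writes with the standard library's `csv` module rather than pandas, and it targets a temporary folder; the helper name `write_rows` is hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_rows(path, header, rows):
    """Write rows to a CSV file, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Write into a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    target = write_rows(
        Path(tmp) / "output_data" / "weather.csv",
        ["City", "Lat"],
        [["Albany", 42.6]],
    )
    print(target.read_text(encoding="utf-8"))
```

In the notebook itself, the same `mkdir(parents=True, exist_ok=True)` call on the parent folder before `weather_data.to_csv(...)` would be enough.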

### Plotting the Data
* Use proper labeling of the plots using plot titles (including date of analysis) and axes labels.
* Save the plotted figures as .pngs.

In [10]:
# Create variables to clean up loop
plot_lat = weather_data["Lat"]
plot_temp = weather_data["Max Temp"]
plot_humidity = weather_data["Humidity"]
plot_cloud = weather_data["Cloudiness"]
plot_wind = weather_data["Wind Speed"]
date = "04/17/2020"
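
The hard-coded date has to be edited every time the notebook is rerun. A possible alternative, assuming the `Date` column holds Unix timestamps in seconds (as the values in the DataFrame suggest), is to derive the label from the data. The function name `analysis_date` is hypothetical.

```python
from datetime import datetime, timezone


def analysis_date(timestamp):
    """Format a Unix timestamp (seconds) as MM/DD/YYYY in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%m/%d/%Y")


# 1587154171 is the "Date" value of the first row in the DataFrame
print(analysis_date(1587154171))  # 04/17/2020
```

With the DataFrame at hand, `analysis_date(weather_data["Date"].max())` would label the plots with the date of the most recent observation.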

In [11]:
# Make lists for the for loop to iterate through
y_plots = [plot_temp, plot_humidity, plot_cloud, plot_wind]
y_labels = ["Temperature (F)", "Humidity (%)", "Cloudiness (%)", "Wind Speed (MPH)"]

In [12]:
# Create a loop to make scatter plots
for x in range(4):
    plt.figure()
    plt.scatter(plot_lat, y_plots[x])

    # Set title, x labels, and y labels for the chart
    plt.title(f"Latitude vs. {y_labels[x]} ({date})")
    plt.xlabel("Latitude")
    plt.ylabel(f"{y_labels[x]}")
    plt.grid()

    # Apply a tight layout before saving and displaying the chart
    plt.tight_layout()

    # Save the figure
    plt.savefig(f"../output_data/Fig{x + 1}.png")

    # Display the chart
    plt.show()
    
    # Print a short description of each graph
    if y_labels[x] == "Temperature (F)":
        print("This scatter plot compares latitude with maximum temperature, in Fahrenheit, for all cities in the DataFrame.\nIn general, the closer a city is to the equator, the higher its maximum temperature. This is data for April 17, 2020.")
    elif y_labels[x] == "Humidity (%)":
        print("This scatter plot compares latitude with humidity percentage for all cities in the DataFrame.\nThere is a correlation between higher latitude and higher humidity percentages. This is data for April 17, 2020.")
    elif y_labels[x] == "Cloudiness (%)":
        print("This scatter plot compares latitude with cloud coverage percentage for all cities in the DataFrame.\nThere is no clear correlation between latitude and cloud coverage percentage. This is data for April 17, 2020.")
    elif y_labels[x] == "Wind Speed (MPH)":
        print("This scatter plot compares latitude with wind speed, in miles per hour, for all cities in the DataFrame.\nThere is no clear correlation between latitude and wind speed. This is data for April 17, 2020.")

[Figure: Latitude vs. Temperature (F) (04/17/2020)]

This scatter plot compares latitude with maximum temperature, in Fahrenheit, for all cities in the DataFrame.
In general, the closer a city is to the equator, the higher its maximum temperature. This is data for April 17, 2020.


[Figure: Latitude vs. Humidity (%) (04/17/2020)]

This scatter plot compares latitude with humidity percentage for all cities in the DataFrame.
There is a correlation between higher latitude and higher humidity percentages. This is data for April 17, 2020.


[Figure: Latitude vs. Cloudiness (%) (04/17/2020)]

This scatter plot compares latitude with cloud coverage percentage for all cities in the DataFrame.
There is no clear correlation between latitude and cloud coverage percentage. This is data for April 17, 2020.


[Figure: Latitude vs. Wind Speed (MPH) (04/17/2020)]

This scatter plot compares latitude with wind speed, in miles per hour, for all cities in the DataFrame.
There is no clear correlation between latitude and wind speed. This is data for April 17, 2020.


## Linear Regression

In [13]:
# Split the DataFrame by Hemisphere
nh_lat = weather_data[weather_data["Lat"] > 0]
sh_lat = weather_data[weather_data["Lat"] < 0]

# Create variables to clean up loop
plot_nh_lat = nh_lat["Lat"]
plot_nh_temp = nh_lat["Max Temp"]
plot_nh_humidity = nh_lat["Humidity"]
plot_nh_cloud = nh_lat["Cloudiness"]
plot_nh_wind = nh_lat["Wind Speed"]

plot_sh_lat = sh_lat["Lat"]
plot_sh_temp = sh_lat["Max Temp"]
plot_sh_humidity = sh_lat["Humidity"]
plot_sh_cloud = sh_lat["Cloudiness"]
plot_sh_wind = sh_lat["Wind Speed"]

In [14]:
# Make lists for the for loop to iterate through
x_reg_plots = [plot_nh_lat, plot_sh_lat, plot_nh_lat, plot_sh_lat, plot_nh_lat, plot_sh_lat, plot_nh_lat, plot_sh_lat]
y_reg_plots = [plot_nh_temp, plot_sh_temp, plot_nh_humidity, plot_sh_humidity, plot_nh_cloud, plot_sh_cloud, plot_nh_wind, plot_sh_wind]
x_reg_labels = ["Northern Hemisphere Latitude", "Southern Hemisphere Latitude", "Northern Hemisphere Latitude", "Southern Hemisphere Latitude", "Northern Hemisphere Latitude", "Southern Hemisphere Latitude", "Northern Hemisphere Latitude", "Southern Hemisphere Latitude"]
y_reg_labels = ["Temperature (F)", "Temperature (F)", "Humidity (%)", "Humidity (%)", "Cloudiness (%)", "Cloudiness (%)", "Wind Speed (MPH)", "Wind Speed (MPH)"]
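
The four lists above repeat the same hemisphere and measure names by hand. As a sketch of one way to generate the same pairing, `itertools.product` can enumerate every (measure, hemisphere) combination in the order the loop expects. The names `hemispheres`, `measures`, and `combos` are hypothetical.

```python
from itertools import product

hemispheres = ["Northern Hemisphere", "Southern Hemisphere"]
measures = ["Temperature (F)", "Humidity (%)", "Cloudiness (%)", "Wind Speed (MPH)"]

# product varies its last argument fastest, so each measure is paired
# with the Northern Hemisphere first and the Southern Hemisphere second
combos = [
    (f"{hemisphere} Latitude", measure)
    for measure, hemisphere in product(measures, hemispheres)
]

# Figure numbers start at 5, matching the file names used for these plots
for number, (x_label, y_label) in enumerate(combos, start=5):
    print(f"Fig{number}: {x_label} vs. {y_label}")
```

The same idea extends to the data columns: a dictionary keyed by hemisphere that holds the filtered DataFrames would remove the need for the ten `plot_nh_*` and `plot_sh_*` variables.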

In [15]:
# Create a loop to make scatter plots
for x in range(8):
    plt.figure()
    plt.scatter(x_reg_plots[x], y_reg_plots[x])

    # Set title, x labels, and y labels for the chart
    plt.title(f"{x_reg_labels[x]} vs. {y_reg_labels[x]} ({date})")
    plt.xlabel("Latitude")
    plt.ylabel(f"{y_reg_labels[x]}")
    plt.grid()
    
    # Create Linear Regression
    (slope, intercept, rvalue, pvalue, stderr) = linregress(x_reg_plots[x], y_reg_plots[x])
    r_squared = rvalue**2
    print(f"The r-squared is: {r_squared}")
    regress_values = x_reg_plots[x] * slope + intercept
    line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))
    plt.plot(x_reg_plots[x],regress_values,"r-")
    plt.annotate(line_eq,(x_reg_plots[x].min(),y_reg_plots[x].max()),fontsize=15,color="red")
    
    # Apply a tight layout before saving and displaying the chart
    plt.tight_layout()

    # Save the figure
    plt.savefig(f"../output_data/Fig{x + 5}.png")

    # Display the chart
    plt.show()
    
    # Print a description after the second (Southern Hemisphere) graph of each pair
    if (x_reg_labels[x] == "Southern Hemisphere Latitude" and y_reg_labels[x] == "Temperature (F)"):
        print("These two scatter plots compare latitude with maximum temperature, in Fahrenheit, for all cities in the DataFrame.\nThe first plot covers the Northern Hemisphere: the farther north a city is, away from the equator, the colder the temperature tends to be. The r-squared value is strong, so this trend is reliable.\nThe second plot covers the Southern Hemisphere: the farther north a city is, toward the equator, the warmer the temperature tends to be. The r-squared value is moderate, so this trend is less reliable.\nThis is data for April 17, 2020.")
    elif (x_reg_labels[x] == "Southern Hemisphere Latitude" and y_reg_labels[x] == "Humidity (%)"):
        print("These two scatter plots compare latitude with humidity percentage for all cities in the DataFrame.\nThe first plot covers the Northern Hemisphere: the farther north a city is, away from the equator, the more humid the weather tends to be. The r-squared value is very weak, so this trend is not reliable.\nThe second plot covers the Southern Hemisphere: the farther north a city is, toward the equator, the more humid the weather tends to be. The r-squared value is very weak, so this trend is not reliable.\nThis is data for April 17, 2020.")
    elif (x_reg_labels[x] == "Southern Hemisphere Latitude" and y_reg_labels[x] == "Cloudiness (%)"):
        print("These two scatter plots compare latitude with cloud coverage percentage for all cities in the DataFrame.\nThe first plot covers the Northern Hemisphere: the farther north a city is, away from the equator, the more cloud coverage there tends to be. The r-squared value is very weak, so this trend is not reliable.\nThe second plot covers the Southern Hemisphere: the farther north a city is, toward the equator, the more cloud coverage there tends to be. The r-squared value is very weak, so this trend is not reliable.\nThis is data for April 17, 2020.")
    elif (x_reg_labels[x] == "Southern Hemisphere Latitude" and y_reg_labels[x] == "Wind Speed (MPH)"):
        print("These two scatter plots compare latitude with wind speed, in miles per hour, for all cities in the DataFrame.\nThe first plot covers the Northern Hemisphere: the farther north a city is, away from the equator, the less wind there tends to be. The r-squared value is very weak, so this trend is not reliable.\nThe second plot covers the Southern Hemisphere: the farther north a city is, toward the equator, the windier the weather tends to be. The r-squared value is very weak, so this trend is not reliable.\nThis is data for April 17, 2020.")

[Figure: Northern Hemisphere Latitude vs. Temperature (F) (04/17/2020)]

The r-squared is: 0.7867600267171967


[Figure: Southern Hemisphere Latitude vs. Temperature (F) (04/17/2020)]

The r-squared is: 0.48287552476511786
These two scatter plots compare latitude with maximum temperature, in Fahrenheit, for all cities in the DataFrame.
The first plot covers the Northern Hemisphere: the farther north a city is, away from the equator, the colder the temperature tends to be. The r-squared value is strong, so this trend is reliable.
The second plot covers the Southern Hemisphere: the farther north a city is, toward the equator, the warmer the temperature tends to be. The r-squared value is moderate, so this trend is less reliable.
This is data for April 17, 2020.


[Figure: Northern Hemisphere Latitude vs. Humidity (%) (04/17/2020)]

The r-squared is: 0.07614101301652464


[Figure: Southern Hemisphere Latitude vs. Humidity (%) (04/17/2020)]

The r-squared is: 0.009680391050772237
These two scatter plots compare latitude with humidity percentage for all cities in the DataFrame.
The first plot covers the Northern Hemisphere: the farther north a city is, away from the equator, the more humid the weather tends to be. The r-squared value is very weak, so this trend is not reliable.
The second plot covers the Southern Hemisphere: the farther north a city is, toward the equator, the more humid the weather tends to be. The r-squared value is very weak, so this trend is not reliable.
This is data for April 17, 2020.


[Figure: Northern Hemisphere Latitude vs. Cloudiness (%) (04/17/2020)]

The r-squared is: 0.011012732052669881


[Figure: Southern Hemisphere Latitude vs. Cloudiness (%) (04/17/2020)]

The r-squared is: 0.0008682907909482733
These two scatter plots compare latitude with cloud coverage percentage for all cities in the DataFrame.
The first plot covers the Northern Hemisphere: the farther north a city is, away from the equator, the more cloud coverage there tends to be. The r-squared value is very weak, so this trend is not reliable.
The second plot covers the Southern Hemisphere: the farther north a city is, toward the equator, the more cloud coverage there tends to be. The r-squared value is very weak, so this trend is not reliable.
This is data for April 17, 2020.


[Figure: Northern Hemisphere Latitude vs. Wind Speed (MPH) (04/17/2020)]

The r-squared is: 0.04603632734435227


[Figure: Southern Hemisphere Latitude vs. Wind Speed (MPH) (04/17/2020)]

The r-squared is: 0.0749264439603203
These two scatter plots compare latitude with wind speed, in miles per hour, for all cities in the DataFrame.
The first plot covers the Northern Hemisphere: the farther north a city is, away from the equator, the less wind there tends to be. The r-squared value is very weak, so this trend is not reliable.
The second plot covers the Southern Hemisphere: the farther north a city is, toward the equator, the windier the weather tends to be. The r-squared value is very weak, so this trend is not reliable.
This is data for April 17, 2020.
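
For reference, the slope, intercept, and r-squared reported above can be reproduced by hand for a simple least-squares line: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, the intercept is `mean(y) - slope * mean(x)`, and r is the cross-deviation sum divided by the square root of the product of the two squared-deviation sums. The sketch below is plain Python, not the SciPy implementation; the function name `simple_linregress` and the sample numbers are made up for illustration.

```python
from math import sqrt


def simple_linregress(xs, ys):
    """Return (slope, intercept, r_squared) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    r_value = ss_xy / sqrt(ss_xx * ss_yy)
    return slope, intercept, r_value ** 2


# Made-up latitudes and temperatures, purely for illustration
sample_lats = [0, 10, 20, 30, 40]
sample_temps = [90, 82, 75, 61, 52]
slope, intercept, r_squared = simple_linregress(sample_lats, sample_temps)
print(f"y = {slope:.2f}x + {intercept:.2f}, r-squared = {r_squared:.3f}")
```

An r-squared near 1 means the line explains most of the variation in y; values near 0, like those for humidity, cloudiness, and wind speed above, mean latitude explains almost none of it.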


In [16]:
# Specify Output Location 
output = "../output_data/observable_trends.txt"

In [17]:
# Write Observable Trends to Text File
with open(output, "w", encoding="utf-8") as txtfile:
    txtfile.write("Observable Trends:\n")
    txtfile.write("1. The strongest r-squared value I found was for the Northern Hemisphere’s latitude compared with the temperatures of its cities. I am most confident saying that as you get farther north from the equator, the average temperature will fall.\n")
    txtfile.write("2. The r-squared value for the Northern Hemisphere’s latitude compared with temperature is markedly higher than the same comparison for the Southern Hemisphere. I would want to investigate this further to see why the Northern Hemisphere appears to be more consistent.\n")
    txtfile.write("3. The graphs for latitude vs. cloudiness, humidity, and wind speed all have very low r-squared values. The correlations between latitude and these measures are too weak to draw meaningful conclusions.\n")