Whether financial, political, or social -- data's true power lies in its ability to answer questions definitively. In this project, we try to answer a fundamental question: "What's the weather like as we approach the equator?"
Now, we know what you may be thinking: "Duh. It gets hotter..."
But, if pressed, how would you prove it?
Please refer to WeatherPy.ipynb and VacationPy.ipynb for the detailed implementation.
Latitude values are measured relative to the equator and range from -90° at the South Pole to +90° at the North Pole. Longitude values are measured relative to the prime meridian and range from -180° when traveling west to +180° when traveling east. Please check out the geographic coordinate system for further details.
Generate a set of random, representative latitude and longitude values
```python
import numpy as np

# Range of latitudes and longitudes
lat_range = (-90, 90)
lng_range = (-180, 180)

# Seed the generator so the sample is reproducible
np.random.seed(1000)

# Create a set of random lat and lng combinations
lats = np.random.uniform(lat_range[0], lat_range[1], size=1600)
lngs = np.random.uniform(lng_range[0], lng_range[1], size=1600)
```
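The sampling above draws latitude uniformly between -90° and 90°. Because the Earth's surface area per degree of latitude shrinks toward the poles (it is proportional to cos(latitude)), this over-represents high latitudes relative to their area. If area-representative points are wanted, one alternative (not what the notebooks do) is to draw sin(latitude) uniformly from [-1, 1]. A minimal standard-library sketch, using the hypothetical helper `sample_area_uniform`:

```python
import math
import random


def sample_area_uniform(n, seed=1000):
    """Return n (lat, lng) pairs spread uniformly over the sphere's surface.

    Latitude: sin(lat) is drawn uniformly from [-1, 1], because the surface
    area between the equator and latitude phi is proportional to sin(phi).
    Longitude: drawn uniformly from [-180, 180].
    """
    rng = random.Random(seed)  # local generator, independent of numpy's seed
    pairs = []
    for _ in range(n):
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        lng = rng.uniform(-180.0, 180.0)
        pairs.append((lat, lng))
    return pairs


pairs = sample_area_uniform(1600)

# Share of points poleward of +/-60 degrees
polar = sum(abs(lat) > 60 for lat, _ in pairs) / len(pairs)
print(f"fraction with |lat| > 60: {polar:.3f}")
```

With latitude drawn uniformly, about a third of the points fall poleward of ±60°; with area-uniform sampling roughly 13% do (1 - sin 60°), which matches the share of the Earth's surface in those regions.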
Find the nearest city for each of the representative latitude and longitude values using the Python citipy library
```python
# Incorporate citipy to determine the city based on latitude and longitude
from citipy import citipy

cities = []
lat_lngs = zip(lats, lngs)

# Identify the nearest city for each lat, lng combination
for lat_lng in lat_lngs:
    city = citipy.nearest_city(lat_lng[0], lat_lng[1]).city_name

    # If the city is unique, add it to our cities list
    if city not in cities:
        cities.append(city)
```
Note: many latitude/longitude combinations (e.g. points in the middle of the ocean) share the same nearest city, and duplicates are discarded. Hence, a larger set of 1,600 coordinates was generated initially to end up with more than 500 unique cities.
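To illustrate why the unique-city count ends up smaller than the number of sampled points, here is a self-contained sketch that replaces citipy with a brute-force great-circle (haversine) search over a tiny, made-up city table. The helper `toy_nearest_city` and the `TOY_CITIES` entries are hypothetical; citipy uses its own world-cities dataset and lookup.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def toy_nearest_city(lat, lng, cities):
    """Name of the city in `cities` (name -> (lat, lng)) closest to (lat, lng)."""
    return min(cities, key=lambda name: haversine_km(lat, lng, *cities[name]))


# Hypothetical city table, for illustration only
TOY_CITIES = {"city_a": (10.0, 20.0), "city_b": (-30.0, 150.0), "city_c": (60.0, -40.0)}

# Several sampled points, some far from any city
points = [(12.0, 22.0), (0.0, 0.0), (-35.0, 140.0), (-10.0, 170.0), (58.0, -45.0)]
nearest = [toy_nearest_city(lat, lng, TOY_CITIES) for lat, lng in points]
unique = list(dict.fromkeys(nearest))  # de-duplicate, keeping first-seen order
print(nearest)
print(unique)
```

Five points collapse onto three cities here; `dict.fromkeys` is an order-preserving alternative to the `if city not in cities` check.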
Next, we perform a weather check on each city in the list using a series of successive calls to the OpenWeatherMap API, and extract ['City', 'Lat', 'Lng', 'Max Temp', 'Humidity', 'Cloudiness', 'Wind Speed', 'Country', 'Date']. The extracted data is stored in a DataFrame.
```python
import time

import openweathermapy.core as owm
import pandas as pd

# weather_api_key is assumed to be defined earlier (e.g. imported from a local config file)

# Create a placeholder DF for the data extracted from the API calls
weather_DF = pd.DataFrame(columns=['City', 'Lat', 'Lng', 'Max Temp', 'Humidity',
                                   'Cloudiness', 'Wind Speed', 'Country', 'Date'])

# Fields to extract from each response (dotted paths into the JSON)
summary = ['name', 'coord.lat', 'coord.lon', 'main.temp_max', 'main.humidity',
           'clouds.all', 'wind.speed', 'sys.country', 'dt']

# Params to pass to the API call
params = {'units': 'imperial', 'appid': weather_api_key}

# Iteratively call the OpenWeatherMap API using the Python wrapper
print("Beginning Data Retrieval\n-----------------------------")
count = 0  # number of successful queries
for index, city in enumerate(cities):
    try:
        result = owm.get_current(city, **params)
        weather_DF.loc[count] = result(*summary)
        print(f"Processed Record {index} | {city}")
        count += 1
    except Exception:
        print(f"Record {index}: City {city} not found. Skipping...")
    time.sleep(1)  # 1-second delay between API calls
print("-----------------------------\nData Retrieval Complete\n-----------------------------")
```
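In the loop above, the wrapper's response object is called with the dotted paths in `summary` to pull out one value per field. If you call the API without the wrapper (for example with `requests`) and get a plain nested dict back, the same extraction can be sketched in plain Python. The helper `extract` is hypothetical, and the sample record below is made up; it only mimics the shape of an OpenWeatherMap current-weather response.

```python
from functools import reduce


def extract(record, *paths):
    """Return a tuple with one value per dotted path (e.g. 'main.temp_max') in a nested dict.

    Raises KeyError if any path is missing, so the caller can skip that city.
    """
    return tuple(reduce(lambda node, key: node[key], path.split("."), record) for path in paths)


summary = ['name', 'coord.lat', 'coord.lon', 'main.temp_max', 'main.humidity',
           'clouds.all', 'wind.speed', 'sys.country', 'dt']

# Made-up record shaped like a current-weather response
sample = {
    "name": "Example Town",
    "coord": {"lat": 12.5, "lon": -45.25},
    "main": {"temp_max": 78.8, "humidity": 65},
    "clouds": {"all": 0},
    "wind": {"speed": 6.93},
    "sys": {"country": "XX"},
    "dt": 1600000000,
}

row = extract(sample, *summary)
print(row)
```

The resulting tuple lines up with the placeholder DataFrame's columns, so it can be assigned with `weather_DF.loc[count] = row`.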
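Create a series of scatter plots to showcase the following relationships:

- Temperature (F) vs. Latitude
- Humidity (%) vs. Latitude
- Cloudiness (%) vs. Latitude
- Wind Speed (mph) vs. Latitude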
Write a function that creates the linear regression plots
```python
import matplotlib.pyplot as plt
from scipy.stats import linregress


def linregress_plots(DF, xl, yl, xlabel='Latitude', ylabel='', title='', figname='plot.png'):
    """Scatter-plot DF[yl] against DF[xl], overlay the least-squares line,
    save the figure to ../Images/<figname>, and return (r, p)."""
    m, c, r, p, _ = linregress(DF[xl], DF[yl])
    print(f"The r-squared is: {r**2}")

    # Scatter plot on a fresh figure/axes
    fig, ax = plt.subplots()
    DF.plot(x=xl, y=yl, kind='scatter', s=30, title=title, ax=ax,
            ylim=(DF[yl].min() - 5, DF[yl].max() + 15))
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

    # Regression line
    y = m * DF[xl] + c
    ax.plot(DF[xl], y, 'r-')

    # Annotation position (figure fraction), chosen from the sign/size of the slope
    pos = (0.15, 0.2) if m <= -0.4 else ((0.15, 0.75) if m > 0.4 else (0.5, 0.80))

    # Dynamically increase the number of decimals for very small slopes
    # (e.g. 0.000000067) so they are not printed as 0.00
    val = m * 100
    digits = 2
    while m != 0 and int(val) == 0:  # guard: an exactly-zero slope would loop forever
        val *= 10
        digits += 1
    linear_eqn = f"y = {m:.{digits}f}x + {c:.2f}"
    ax.annotate(linear_eqn, xy=pos, xycoords='figure fraction', fontsize=15, color='r')

    fig.savefig(f"../Images/{figname}")
    plt.show()

    # r: Pearson correlation coefficient
    # p: two-sided p-value for the null hypothesis that the slope is zero.
    #    If p is below the chosen significance level (e.g. 0.05), we reject that
    #    hypothesis: latitude has a statistically significant linear relationship
    #    with the dependent variable.
    return r, p
```
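To make the first three values returned by `scipy.stats.linregress` (slope, intercept, Pearson r) less of a black box, here is a minimal pure-Python least-squares sketch. The helper `simple_linregress` is hypothetical and for illustration only; the notebooks use SciPy, which also returns the p-value and the standard error.

```python
import math


def simple_linregress(xs, ys):
    """Least-squares fit y = m*x + c; returns (m, c, r)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)          # spread of x
    syy = sum((y - my) ** 2 for y in ys)          # spread of y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    m = sxy / sxx                 # slope
    c = my - m * mx               # intercept: the line passes through the means
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return m, c, r


m, c, r = simple_linregress([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c, r)  # 2.0 1.0 1.0
```

The value printed as "r-squared" by `linregress_plots` is simply `r**2` from this fit.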
Run linear regression on each relationship, only this time separating them into Northern Hemisphere (greater than or equal to 0 degrees latitude) and Southern Hemisphere (less than 0 degrees latitude):
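A minimal sketch of that split, written over plain dicts so it runs without pandas (the helper `split_hemispheres` and the sample rows are hypothetical):

```python
def split_hemispheres(rows, lat_key="Lat"):
    """Split rows (dicts) into Northern (lat >= 0) and Southern (lat < 0) lists."""
    north = [row for row in rows if row[lat_key] >= 0]
    south = [row for row in rows if row[lat_key] < 0]
    return north, south


rows = [{"City": "a", "Lat": 0.0}, {"City": "b", "Lat": -12.3}, {"City": "c", "Lat": 45.1}]
north, south = split_hemispheres(rows)
print([r["City"] for r in north], [r["City"] for r in south])  # ['a', 'c'] ['b']
```

With the project's DataFrame, the equivalent boolean indexing is `DF[DF['Lat'] >= 0]` and `DF[DF['Lat'] < 0]`; each half can then be passed to `linregress_plots`. Note that cities exactly on the equator go to the Northern Hemisphere, as the definition above specifies.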
Northern Hemisphere - Temperature (F) vs. Latitude
Southern Hemisphere - Temperature (F) vs. Latitude
+ Temperature depends on the distance from the equator.
    * The p-value of the linear regression is far below the significance level (effectively 0). This means that we can reject the hypothesis that the slope is zero.
    * In both hemispheres, there is a strong correlation between latitude and temperature.
    * The pattern is also visible in the scatter plots.
+ As we move towards the equator, temperature increases in both hemispheres.
+ From the data, temperatures at cities equidistant from the equator in the two hemispheres are not necessarily the same.
    * For instance, at latitude +30 the fitted line gives -0.57 × 30 + 90.47 = 73.37 °F, while at latitude -30 it gives 0.65 × (-30) + 78.31 = 58.81 °F.
    * One possible explanation is that the Northern Hemisphere has much more land than the Southern Hemisphere, which is mostly ocean, and land and ocean heat up differently. Since the data is a snapshot from a single date, the season (opposite in the two hemispheres) may also play a role.
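The ±30° comparison is just the two fitted lines evaluated at mirrored latitudes. A small sketch using the slopes and intercepts quoted above (the helper `fitted_temp` is hypothetical):

```python
def fitted_temp(m, c, lat):
    """Evaluate a fitted regression line: temp (°F) = m * lat + c."""
    return m * lat + c


north_30 = fitted_temp(-0.57, 90.47, 30)   # Northern Hemisphere fit
south_30 = fitted_temp(0.65, 78.31, -30)   # Southern Hemisphere fit
print(round(north_30, 2), round(south_30, 2), round(north_30 - south_30, 2))
```

At the same distance from the equator, the two fits differ by roughly 14.6 °F.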
Northern Hemisphere - Humidity (%) vs. Latitude
Southern Hemisphere - Humidity (%) vs. Latitude
- Humidity (%) doesn't correlate with the distance from the equator.
    * The p-value of the linear regression is well above the significance level (typically 0.05). This means that we CANNOT reject the hypothesis that the slope is zero.
    * In both hemispheres, the correlation between latitude and humidity is close to ZERO.
    * There is no pattern in the scatter plots.
- Humidity is centered around different values in the two hemispheres.
    * In the Northern Hemisphere, most cities have humidity around 67%.
    * In the Southern Hemisphere, most cities have humidity around 73%.
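The p-value reasoning used in these observations is the standard two-sided t-test on the slope of a simple linear regression (by default, `scipy.stats.linregress` reports a two-sided p-value for the null hypothesis that the slope is zero). With r the Pearson correlation of the fit and n the number of cities in the hemisphere:

$$
H_0:\ \beta = 0, \qquad
t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad
p = 2\,\Pr\!\left(T_{n-2} \ge |t|\right)
$$

where $T_{n-2}$ follows a Student t distribution with $n-2$ degrees of freedom, and $H_0$ is rejected when $p < \alpha$ (e.g. $\alpha = 0.05$). Because $t$ grows with $\sqrt{n-2}$, a weak correlation can still be significant when many cities are sampled, so $|r|$ and $p$ should be read together.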
Northern Hemisphere - Cloudiness (%) vs. Latitude
Southern Hemisphere - Cloudiness (%) vs. Latitude
- Cloudiness (%) doesn't correlate with the distance from the equator.
    * The p-value of the linear regression is above the significance level (typically 0.05). This means that we CANNOT reject the hypothesis that the slope is zero.
    * In both hemispheres, the correlation between latitude and cloudiness is weak.
    * There is no pattern in the scatter plots.
- Cloudiness is centered around different values in the two hemispheres.
    * The Northern Hemisphere has an average cloudiness of around 53%.
    * The Southern Hemisphere has an average cloudiness of around 46%.
Northern Hemisphere - Wind Speed (mph) vs. Latitude
Southern Hemisphere - Wind Speed (mph) vs. Latitude
- Wind speed doesn't correlate with the distance from the equator.
    * The p-value of the linear regression is above the significance level (typically 0.05). This means that we CANNOT reject the hypothesis that the slope is zero.
    * In both hemispheres, the correlation between latitude and wind speed is weak.
    * There is no pattern in the scatter plots.
- Wind speed is centered around different but close values in the two hemispheres.
    * The Northern Hemisphere has an average wind speed of around 8.6 mph.
    * The Southern Hemisphere has an average wind speed of around 7.9 mph.
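The "centered around" comparisons for humidity, cloudiness, and wind speed come down to a per-hemisphere summary statistic. A minimal sketch with the standard library's `statistics` module, reporting both the mean and the median (the median is less sensitive to outliers). The helper `center_by_hemisphere` and the rows are hypothetical, made-up values:

```python
from statistics import mean, median


def center_by_hemisphere(rows, column, lat_key="Lat"):
    """Mean and median of `column` for Northern (lat >= 0) and Southern (lat < 0) rows."""
    out = {}
    for name, keep in (("north", lambda lat: lat >= 0), ("south", lambda lat: lat < 0)):
        values = [row[column] for row in rows if keep(row[lat_key])]
        out[name] = (mean(values), median(values))
    return out


# Made-up rows for illustration only
rows = [
    {"Lat": 10.0, "Humidity": 60},
    {"Lat": 40.0, "Humidity": 70},
    {"Lat": 55.0, "Humidity": 80},
    {"Lat": -5.0, "Humidity": 75},
    {"Lat": -33.0, "Humidity": 71},
]
print(center_by_hemisphere(rows, "Humidity"))
```

With the project's DataFrame, a pandas equivalent is `DF.groupby(DF['Lat'] >= 0)['Humidity'].agg(['mean', 'median'])`, where the `True` group is the Northern Hemisphere.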
Narrow down the DataFrame to find your ideal weather conditions. For example:

- A max temperature lower than 80 degrees but higher than 70.
- Wind speed less than 10 mph.
- Zero cloudiness.

Drop any rows that don't satisfy all three conditions. You want to be sure the weather is ideal.
```python
# Drop every row that fails at least one of the three conditions
DF_IDEAL = DF.drop(DF[~((DF['Max Temp'] < 80.0) & (DF['Max Temp'] > 70.0) &
                       (DF['Wind Speed'] < 10.0) & (DF['Cloudiness'] == 0))].index)
DF_IDEAL.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9 entries, 37 to 536
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   City        9 non-null      object
 1   Country     9 non-null      object
 2   Lat         9 non-null      float64
 3   Lng         9 non-null      float64
 4   Max Temp    9 non-null      float64
 5   Humidity    9 non-null      float64
 6   Cloudiness  9 non-null      float64
 7   Wind Speed  9 non-null      float64
dtypes: float64(6), object(2)
memory usage: 648.0+ bytes
```
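The three conditions can also be written as a single predicate, which makes the thresholds easy to tweak. A minimal plain-Python sketch; the helper `is_ideal`, its default thresholds (taken from the example above), and the rows are hypothetical:

```python
def is_ideal(row, t_low=70.0, t_high=80.0, max_wind=10.0):
    """True when t_low < Max Temp < t_high, Wind Speed < max_wind and Cloudiness == 0."""
    return (t_low < row["Max Temp"] < t_high
            and row["Wind Speed"] < max_wind
            and row["Cloudiness"] == 0)


# Made-up rows for illustration only
rows = [
    {"City": "a", "Max Temp": 75.0, "Wind Speed": 4.0, "Cloudiness": 0},
    {"City": "b", "Max Temp": 75.0, "Wind Speed": 12.0, "Cloudiness": 0},
    {"City": "c", "Max Temp": 68.0, "Wind Speed": 3.0, "Cloudiness": 0},
    {"City": "d", "Max Temp": 79.9, "Wind Speed": 9.9, "Cloudiness": 20},
]
print([row["City"] for row in rows if is_ideal(row)])  # ['a']
```

In pandas, keeping the rows where the combined mask is `True` (`DF[mask]`) gives the same result as dropping the index of the `~mask` rows, as long as the index is unique, and avoids the second lookup by index label.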
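Use the Google Places API (Text Search) to find the first hotel within 5,000 meters of each city's coordinates (the results are sorted based on popularity). For Text Search, `location` and `radius` bias the results toward that circle rather than strictly restricting them, so a returned hotel may occasionally lie farther away.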
```python
import requests

# g_key (Google API key) is assumed to be defined earlier in the notebook
hotel_df = DF_IDEAL.iloc[:, :4].copy()   # City, Country, Lat, Lng
hotel_df['Hotel Name'] = ""

base_url = 'https://maps.googleapis.com/maps/api/place/textsearch/json'

for index, row in hotel_df.iterrows():
    params = {
        "location": f"{row['Lat']},{row['Lng']}",
        "query": 'hotel',
        "radius": 5000,
        "key": g_key,
    }
    try:
        result = requests.get(base_url, params=params).json()
        hotel_df.loc[index, "Hotel Name"] = result['results'][0]['name']
    except (KeyError, IndexError, ValueError, requests.RequestException):
        print(f"Couldn't retrieve hotel for {row['City']} at index {index}... Skipping")
```
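Here is a network-free sketch of the two pieces of that loop, so they can be checked without an API key: building the request URL and safely reading the first result's name. The helpers `build_hotel_query` and `first_result_name` are hypothetical. The payload below is made up, but the Places API does return a `results` list and a `status` field (e.g. `"OK"` or `"ZERO_RESULTS"`).

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def build_hotel_query(lat, lng, key, radius=5000):
    """Text Search URL biased toward a circle of `radius` meters around (lat, lng)."""
    params = {"location": f"{lat},{lng}", "query": "hotel", "radius": radius, "key": key}
    return f"{BASE_URL}?{urlencode(params)}"


def first_result_name(payload):
    """Name of the first place in a Places JSON payload, or None if there are no results."""
    results = payload.get("results") or []
    return results[0].get("name") if results else None


print(build_hotel_query(12.5, -45.25, "YOUR_KEY"))
print(first_result_name({"results": [{"name": "Hotel Example"}], "status": "OK"}))
print(first_result_name({"results": [], "status": "ZERO_RESULTS"}))
```

Returning `None` instead of raising lets the caller leave "Hotel Name" empty for cities with no match, without relying on a broad `except`.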