<a href="https://colab.research.google.com/github/ecomunick/omdena/blob/main/EDA_project/notebook/GlobalWeather.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>




## **Global Weather Information Dataset**
This dataset provides daily weather information for capital cities around the world. It covers temperature, wind speed, pressure, precipitation, air quality measurements, and moon phases, so it supports both weather and air-quality analyses. 🌦️☁️🌤️🌧️🌈




### Key Features

*   country: Country of the weather data
*   location_name: Name of the location (city)
*   latitude: Latitude coordinate of the location
*   longitude: Longitude coordinate of the location
*   timezone: Timezone of the location
*   last_updated_epoch: Unix timestamp of the last data update
*   last_updated: Local time of the last data update
*   temperature_celsius: Temperature in degrees Celsius
*   temperature_fahrenheit: Temperature in degrees Fahrenheit
*   condition_text: Weather condition description
*   wind_mph: Wind speed in miles per hour
*   wind_kph: Wind speed in kilometers per hour
*   wind_degree: Wind direction in degrees
*   wind_direction: Wind direction as a 16-point compass
*   pressure_mb: Pressure in millibars
*   pressure_in: Pressure in inches of mercury
*   precip_mm: Precipitation amount in millimeters
*   precip_in: Precipitation amount in inches
*   humidity: Humidity as a percentage
*   cloud: Cloud cover as a percentage
*   feels_like_celsius: Feels-like temperature in Celsius
*   feels_like_fahrenheit: Feels-like temperature in Fahrenheit
*   visibility_km: Visibility in kilometers
*   visibility_miles: Visibility in miles
*   uv_index: UV Index
*   gust_mph: Wind gust in miles per hour
*   gust_kph: Wind gust in kilometers per hour
*   air_quality_Carbon_Monoxide: Air quality measurement: Carbon Monoxide
*   air_quality_Ozone: Air quality measurement: Ozone
*   air_quality_Nitrogen_dioxide: Air quality measurement: Nitrogen Dioxide
*   air_quality_Sulphur_dioxide: Air quality measurement: Sulphur Dioxide
*   air_quality_PM2.5: Air quality measurement: PM2.5
*   air_quality_PM10: Air quality measurement: PM10
*   air_quality_us-epa-index: Air quality measurement: US EPA Index
*   air_quality_gb-defra-index: Air quality measurement: GB DEFRA Index
*   sunrise: Local time of sunrise
*   sunset: Local time of sunset
*   moonrise: Local time of moonrise
*   moonset: Local time of moonset
*   moon_phase: Current moon phase
*   moon_illumination: Moon illumination percentage
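
Several columns in the list above store the same quantity in two units (for example `temperature_celsius` and `temperature_fahrenheit`). A minimal sketch of how that redundancy could be checked is shown here; the five value pairs are copied from the first rows of the dataset as displayed by `df.head()` in this notebook, while the `sample` frame and the 0.05 tolerance are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Value pairs copied from the first five rows of the notebook's df.head() output
sample = pd.DataFrame({
    "temperature_celsius": [28.8, 27.0, 28.0, 10.2, 25.0],
    "temperature_fahrenheit": [83.8, 80.6, 82.4, 50.4, 77.0],
})

# Standard Celsius-to-Fahrenheit conversion
converted = sample["temperature_celsius"] * 9 / 5 + 32

# The 0.05 tolerance (an assumption) allows for one-decimal rounding in the stored values
is_redundant = np.allclose(converted, sample["temperature_fahrenheit"], atol=0.05)
print(is_redundant)
```

If the same check holds on the full DataFrame, one column of each unit pair could be dropped before analysis without losing information.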


In [16]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

In [2]:
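# Raw CSV file hosted in the project's GitHub repository
URL = 'https://raw.githubusercontent.com/ecomunick/omdena/main/EDA_project/data/GlobalWeatherRepository.csv'

# Alternative: load a local copy of the file instead
# df = pd.read_csv("/GlobalWeatherRepository.csv", sep=",")
df = pd.read_csv(URL, sep=",")
df.head()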

| | country | location_name | latitude | longitude | timezone | last_updated_epoch | last_updated | temperature_celsius | temperature_fahrenheit | condition_text | ... | air_quality_PM2.5 | air_quality_PM10 | air_quality_us-epa-index | air_quality_gb-defra-index | sunrise | sunset | moonrise | moonset | moon_phase | moon_illumination |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Afghanistan | Kabul | 34.52 | 69.18 | Asia/Kabul | 1693301400 | 2023-08-29 14:00 | 28.8 | 83.8 | Sunny | ... | 7.9 | 11.1 | 1 | 1 | 05:24 AM | 06:24 PM | 05:39 PM | 02:48 AM | Waxing Gibbous | 93 |
| 1 | Albania | Tirana | 41.33 | 19.82 | Europe/Tirane | 1693301400 | 2023-08-29 11:30 | 27.0 | 80.6 | Partly cloudy | ... | 28.2 | 29.6 | 2 | 3 | 06:04 AM | 07:19 PM | 06:50 PM | 03:25 AM | Waxing Gibbous | 93 |
| 2 | Algeria | Algiers | 36.76 | 3.05 | Africa/Algiers | 1693301400 | 2023-08-29 10:30 | 28.0 | 82.4 | Partly cloudy | ... | 6.4 | 7.9 | 1 | 1 | 06:16 AM | 07:21 PM | 06:46 PM | 03:50 AM | Waxing Gibbous | 93 |
| 3 | Andorra | Andorra La Vella | 42.5 | 1.52 | Europe/Andorra | 1693301400 | 2023-08-29 11:30 | 10.2 | 50.4 | Sunny | ... | 0.5 | 0.8 | 1 | 1 | 07:16 AM | 08:34 PM | 08:08 PM | 04:38 AM | Waxing Gibbous | 93 |
| 4 | Angola | Luanda | -8.84 | 13.23 | Africa/Luanda | 1693301400 | 2023-08-29 10:30 | 25.0 | 77.0 | Partly cloudy | ... | 139.6 | 203.3 | 4 | 10 | 06:11 AM | 06:06 PM | 04:43 PM | 04:41 AM | Waxing Gibbous | 93 |
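
The rows above carry the update time twice: as a Unix timestamp (`last_updated_epoch`) and as a local-time string (`last_updated`). The sketch below shows one way the two could be reconciled by converting the epoch to each location's `timezone`. The three rows are copied from the output above; the `sample` frame and the `local_from_epoch` column name are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Three rows copied from the df.head() output above
sample = pd.DataFrame({
    "location_name": ["Kabul", "Tirana", "Algiers"],
    "timezone": ["Asia/Kabul", "Europe/Tirane", "Africa/Algiers"],
    "last_updated_epoch": [1693301400, 1693301400, 1693301400],
    "last_updated": ["2023-08-29 14:00", "2023-08-29 11:30", "2023-08-29 10:30"],
})

# Interpret the epoch as seconds since 1970-01-01 UTC
utc = pd.to_datetime(sample["last_updated_epoch"], unit="s", utc=True)

# Convert each timestamp to its own location's timezone and format it like last_updated
sample["local_from_epoch"] = [
    ts.tz_convert(tz).strftime("%Y-%m-%d %H:%M")
    for ts, tz in zip(utc, sample["timezone"])
]

print(sample[["location_name", "last_updated", "local_from_epoch"]])
```

For these three rows the converted values agree with `last_updated`, which suggests the string column is simply the epoch expressed in local time.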


In [6]:
# Check how many rows and columns (variables) the dataset has
df.shape

(2339, 41)

In [5]:
# Looks like there are no missing values: the non-null counts match the row count
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2339 entries, 0 to 2338
Data columns (total 41 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   country                       2339 non-null   object 
 1   location_name                 2339 non-null   object 
 2   latitude                      2339 non-null   float64
 3   longitude                     2339 non-null   float64
 4   timezone                      2339 non-null   object 
 5   last_updated_epoch            2339 non-null   int64  
 6   last_updated                  2339 non-null   object 
 7   temperature_celsius           2339 non-null   float64
 8   temperature_fahrenheit        2339 non-null   float64
 9   condition_text                2339 non-null   object 
 10  wind_mph                      2339 non-null   float64
 11  wind_kph                      2339 non-null   float64
 12  wind_degree                   2339 non-null   int64  
 13  win

In [8]:
# Optional: uncomment to confirm that no column has missing values
# df.isnull().sum()

In [9]:
df.describe()

| | latitude | longitude | last_updated_epoch | temperature_celsius | temperature_fahrenheit | wind_mph | wind_kph | wind_degree | pressure_mb | pressure_in | ... | gust_kph | air_quality_Carbon_Monoxide | air_quality_Ozone | air_quality_Nitrogen_dioxide | air_quality_Sulphur_dioxide | air_quality_PM2.5 | air_quality_PM10 | air_quality_us-epa-index | air_quality_gb-defra-index | moon_illumination |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | ... | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 | 2339.0 |
| mean | 19.30926 | 21.770073 | 1693744000.0 | 22.47422 | 72.453784 | 6.59115 | 10.605857 | 164.875588 | 1013.566481 | 29.929436 | ... | 17.255537 | 492.114536 | 40.87717 | 10.424498 | 6.328132 | 20.251689 | 32.671868 | 1.460026 | 2.050876 | 80.57546 |
| std | 24.579353 | 65.626359 | 291436.8 | 6.492561 | 11.686864 | 4.590691 | 7.390326 | 103.338255 | 5.89966 | 0.174017 | ... | 11.206073 | 962.135212 | 33.323934 | 19.184467 | 14.292353 | 54.475459 | 72.733313 | 0.916303 | 2.196792 | 20.370388 |
| min | -41.3 | -175.2 | 1693301000.0 | 2.9 | 37.2 | 2.2 | 3.6 | 1.0 | 992.0 | 29.3 | ... | 0.0 | 123.5 | 0.0 | 0.0 | 0.0 | 0.5 | 0.5 | 1.0 | 1.0 | 39.0 |
| 25% | 3.75 | -6.84 | 1693482000.0 | 17.7 | 63.9 | 3.1 | 5.0 | 80.0 | 1010.0 | 29.83 | ... | 9.0 | 220.3 | 17.7 | 1.05 | 0.4 | 3.1 | 5.7 | 1.0 | 1.0 | 65.0 |
| 50% | 17.25 | 23.24 | 1693784000.0 | 23.8 | 74.8 | 5.6 | 9.0 | 160.0 | 1013.0 | 29.91 | ... | 15.1 | 270.4 | 35.8 | 3.7 | 1.4 | 7.5 | 12.3 | 1.0 | 1.0 | 88.0 |
| 75% | 41.33 | 50.58 | 1693998000.0 | 27.55 | 81.6 | 9.4 | 15.1 | 246.0 | 1017.0 | 30.03 | ... | 22.7 | 437.25 | 55.8 | 11.5 | 5.4 | 17.5 | 27.5 | 2.0 | 2.0 | 98.0 |
| max | 63.83 | 179.22 | 1694214000.0 | 45.0 | 113.0 | 43.8 | 70.6 | 360.0 | 1036.0 | 30.59 | ... | 110.5 | 18158.0 | 320.4 | 241.3 | 169.8 | 895.1 | 1079.1 | 6.0 | 10.0 | 100.0 |


# EDA - Exploratory Data Analysis

### Business Problem Solving
#### Analysis by region/country

*   Temperature in degrees Celsius
*   Feels-like temperature in Celsius
*   Wind speed in kilometers per hour
*   Precipitation amount in millimeters
*   Humidity as a percentage
*   Cloud cover as a percentage
*   Air quality measurement: Carbon Monoxide
*   Air quality measurement: Ozone
*   Air quality measurement: Nitrogen Dioxide
*   Air quality measurement: Sulphur Dioxide
*   Moon illumination percentage
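
Because the dataset holds daily observations (2339 rows in total), each country appears several times, so a per-country analysis of the variables listed above needs an aggregation step first. The following is a minimal sketch of that step, not the notebook's actual method: the `toy` frame uses made-up values and placeholder country labels, and only the column names come from the dataset.

```python
import pandas as pd

# Toy rows with made-up values; only the column names match the dataset
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "temperature_celsius": [20.0, 24.0, 10.0, 14.0],
    "humidity": [50, 70, 80, 90],
    "wind_kph": [5.0, 15.0, 10.0, 30.0],
})

# Subset of the variables listed above, chosen for illustration
metrics = ["temperature_celsius", "humidity", "wind_kph"]

# One row per country: the mean of each metric
by_country = toy.groupby("country")[metrics].mean()

# Number of observations behind each mean, aligned on the country index
by_country["n_obs"] = toy.groupby("country").size()

print(by_country)
```

Keeping the observation count next to each mean makes it easier to spot countries whose averages rest on only a few rows.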









In [17]:
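# Average temperature in degrees Celsius by country
# px.bar stacks (sums) repeated rows per country, so average the values first
avg_temp = df.groupby('country', as_index=False)['temperature_celsius'].mean()

bar_chart = px.bar(avg_temp, x='country', y='temperature_celsius', title='Average Temperature by Country')
bar_chart.show()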