### A Brief Justification for the Suitability of the Data Set

**Reason:**

The Global Weather Dataset has weather information from many places around the world. It talks about important weather factors like temperature, humidity, and rainfall. The file is perfect for weather research and forecasting on a world scale because it covers a lot of ground and has a lot of information. This information can be used to look at the effects of climate change, area weather trends, and extreme weather events.

### Data Processing (20 Marks):

In [13]:
# Importing the necessary libraries
import pandas as pd # For data manipulation
import numpy as np # For numerical computation

### 2.2 Load the dataset

In [15]:
# Load the dataset
data = pd.read_csv('GlobalWeatherRepository.csv')

### 2.3 Data visualization

In [16]:
# Display the first 5 rows of the dataset
data.head()

Unnamed: 0,country,location_name,latitude,longitude,timezone,last_updated_epoch,last_updated,temperature_celsius,temperature_fahrenheit,condition_text,...,air_quality_PM2.5,air_quality_PM10,air_quality_us-epa-index,air_quality_gb-defra-index,sunrise,sunset,moonrise,moonset,moon_phase,moon_illumination
0,Afghanistan,Kabul,34.52,69.18,Asia/Kabul,1715849100,2024-05-16 13:15,26.6,79.8,Partly Cloudy,...,8.4,26.6,1,1,04:50 AM,06:50 PM,12:12 PM,01:11 AM,Waxing Gibbous,55
1,Albania,Tirana,41.33,19.82,Europe/Tirane,1715849100,2024-05-16 10:45,19.0,66.2,Partly cloudy,...,1.1,2.0,1,1,05:21 AM,07:54 PM,12:58 PM,02:14 AM,Waxing Gibbous,55
2,Algeria,Algiers,36.76,3.05,Africa/Algiers,1715849100,2024-05-16 09:45,23.0,73.4,Sunny,...,10.4,18.4,1,1,05:40 AM,07:50 PM,01:15 PM,02:14 AM,Waxing Gibbous,55
3,Andorra,Andorra La Vella,42.5,1.52,Europe/Andorra,1715849100,2024-05-16 10:45,6.3,43.3,Light drizzle,...,0.7,0.9,1,1,06:31 AM,09:11 PM,02:12 PM,03:31 AM,Waxing Gibbous,55
4,Angola,Luanda,-8.84,13.23,Africa/Luanda,1715849100,2024-05-16 09:45,26.0,78.8,Partly cloudy,...,183.4,262.3,5,10,06:12 AM,05:55 PM,01:17 PM,12:38 AM,Waxing Gibbous,55


### 2.4  Check for Missing Values

In [17]:
# Check for missing values
data.isnull().sum()

country                         0
location_name                   0
latitude                        0
longitude                       0
timezone                        0
last_updated_epoch              0
last_updated                    0
temperature_celsius             0
temperature_fahrenheit          0
condition_text                  0
wind_mph                        0
wind_kph                        0
wind_degree                     0
wind_direction                  0
pressure_mb                     0
pressure_in                     0
precip_mm                       0
precip_in                       0
humidity                        0
cloud                           0
feels_like_celsius              0
feels_like_fahrenheit           0
visibility_km                   0
visibility_miles                0
uv_index                        0
gust_mph                        0
gust_kph                        0
air_quality_Carbon_Monoxide     0
air_quality_Ozone               0
air_quality_Ni

### 2.5 Check Data Types
* This helps us understand which columns might need data type conversions.

In [18]:
# Check data types
df.dtypes

country                                 object
location_name                           object
latitude                               float64
longitude                              float64
timezone                                object
last_updated_epoch                       int64
last_updated                    datetime64[ns]
temperature_celsius                    float64
temperature_fahrenheit                 float64
condition_text                          object
wind_mph                               float64
wind_kph                               float64
wind_degree                              int64
wind_direction                          object
pressure_mb                            float64
pressure_in                            float64
precip_mm                              float64
precip_in                              float64
humidity                                 int64
cloud                                    int64
feels_like_celsius                     float64
feels_like_fa

### 2.6 Cleaning Up Data
#### 2.6.1 What To Do With Missing Values
*Take the right steps to deal with lost numbers.*

Choose:
Get rid of empty values: Get rid of any rows or sections that are blank.
Fill in empty values: Replace empty spaces with a good number, like the mean or median.


In [23]:
# first of all I will check the percentage of missing values in each column
missing_values = data.isnull().mean()*100
print(missing_values) # as we see data are clean and there are no missing values
# Also I want to check which values are they like str,int,float,etc.
data.info()

country                         0.0
location_name                   0.0
latitude                        0.0
longitude                       0.0
timezone                        0.0
last_updated_epoch              0.0
last_updated                    0.0
temperature_celsius             0.0
temperature_fahrenheit          0.0
condition_text                  0.0
wind_mph                        0.0
wind_kph                        0.0
wind_degree                     0.0
wind_direction                  0.0
pressure_mb                     0.0
pressure_in                     0.0
precip_mm                       0.0
precip_in                       0.0
humidity                        0.0
cloud                           0.0
feels_like_celsius              0.0
feels_like_fahrenheit           0.0
visibility_km                   0.0
visibility_miles                0.0
uv_index                        0.0
gust_mph                        0.0
gust_kph                        0.0
air_quality_Carbon_Monoxide 

### 2.6.2 Data Type Conversion

* a. Convert Date and Time Columns

Identify columns that should be in datetime format and convert them.

In [24]:
# Convert 'last_updated' to datetime
df['last_updated'] = pd.to_datetime(df['last_updated'])
# Converting other time columns to datetime
time_columns = ['sunrise', 'sunset', 'moonrise', 'moonset']
for col in time_columns:
    df[col] = pd.to_datetime(df[col], format='%I:%M %p')