### A Brief Justification for the Suitability of the Data Set

**Reason:**

The Global Weather Dataset contains weather observations from many locations around the world. It records key weather variables such as temperature, humidity, and rainfall. Its broad geographic coverage and range of variables make it well suited to weather research and forecasting on a global scale. The data can be used to study the effects of climate change, regional weather trends, and extreme weather events.

### Data Processing (20 Marks):

In [None]:
# Importing the necessary libraries
import pandas as pd  # For data manipulation
import numpy as np  # For numerical computation

### 2.2 Load the dataset

In [None]:
# Load the dataset
data = pd.read_csv('app/GlobalWeatherRepository.csv')

### 2.3 Data Preview

In [None]:
# Display the first 5 rows of the dataset
data.head()

### 2.4 Check for Missing Values

In [None]:
# Check for missing values
data.isnull().sum()

### 2.5 Check Data Types
* This helps us understand which columns might need data type conversions.

In [None]:
# Check data types
data.dtypes

### 2.6 Cleaning Up Data



In [None]:
# Check the percentage of missing values in each column
missing_values = data.isnull().mean() * 100
print(missing_values)  # every column shows 0%, so the data has no missing values
# Check each column's data type (str, int, float, etc.) and non-null count
data.info()

### 2.6.2 Data Type Conversion

### a. Convert the Date and Time columns

Determine which columns should be datetime formatted and convert them.

Looking at the data types, we can observe the following:

**Geographic Location and Basic Details:** The 'country', 'location_name', 'latitude', 'longitude', and 'timezone' columns provide the location and time zone of each observation. These columns already have suitable data types ('object' and 'float64').

**Update time:** The columns 'last_updated_epoch' (an epoch timestamp) and 'last_updated' (which appears to be a readable date string) record when each observation was last updated. To analyze the data by date, the 'last_updated' column must be converted to 'datetime'.

**Temperature and Condition Details:** The 'temperature_celsius' and 'temperature_fahrenheit' columns hold the temperature data, and 'condition_text' gives a textual description of the current weather conditions.

In this case, the only modification required is converting the 'last_updated' column to 'datetime' format. After that, we can save the cleaned data and review it.
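
As a minimal sketch of both timestamp conversions, the example below uses a small made-up frame (`sample` is a hypothetical stand-in, not the real dataset). It assumes that 'last_updated_epoch' stores seconds since the Unix epoch, which should be checked against the actual values before relying on it.

```python
import pandas as pd

# Toy frame standing in for the real dataset; the values are invented for illustration.
sample = pd.DataFrame({
    "last_updated_epoch": [1693301400, 1693387800],
    "last_updated": ["2023-08-29 09:30", "2023-08-30 09:30"],
})

# Parse the readable timestamp string into a datetime column.
sample["last_updated"] = pd.to_datetime(sample["last_updated"])

# Assumption: the epoch column is in seconds, so unit="s"; utc=True marks the result as UTC.
sample["last_updated_utc"] = pd.to_datetime(
    sample["last_updated_epoch"], unit="s", utc=True
)

print(sample.dtypes)
```

Keeping the epoch-derived column separate from 'last_updated' avoids mixing UTC times with what may be local times.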


In [None]:
# Convert 'last_updated' to datetime
data['last_updated'] = pd.to_datetime(data['last_updated'])

# Save the cleaned data
data.to_csv('GlobalWeatherRepository.csv', index=False)

# Describe the data
data.describe()

### Step 3: Data Analysis and Visualization.

We will now run some simple analyses and build visualizations to better understand the main trends in the data. Here is what we will do:

**Global Temperature Summary:** Determine the five hottest and five coldest locations.

**Group Analysis:** Calculate the average temperature per country.

**Visualizations:** Create at least two visualizations:
- A histogram of global temperatures.
- A line plot of temperature variations over time in a specified location.


### 3.1 Global Temperature Summary
* To find the top 5 hottest and coldest locations globally, we’ll sort the dataset by the temperature_celsius column and then select the top and bottom entries.

In [None]:
# Top 5 hottest locations
hottest_locations = data.sort_values(by='temperature_celsius', ascending=False).head(5)
print("Top 5 Hottest Locations:\n", hottest_locations[['location_name', 'country', 'temperature_celsius']])

# Top 5 coldest locations
coldest_locations = data.sort_values(by='temperature_celsius', ascending=True).head(5)
print("Top 5 Coldest Locations:\n", coldest_locations[['location_name', 'country', 'temperature_celsius']])
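
If the dataset holds several observations per location, sorting the raw rows can return the same location more than once. One possible way to rank distinct locations is sketched below on a made-up frame (`toy`, `peak_per_location`, and `hottest` are illustrative names, not part of the original notebook).

```python
import pandas as pd

# Toy data: location "A" appears twice, as it might with repeated updates.
toy = pd.DataFrame({
    "location_name": ["A", "A", "B", "C"],
    "country": ["X", "X", "Y", "Z"],
    "temperature_celsius": [45.0, 44.0, 30.0, -5.0],
})

# Keep one row per location: the row with its highest recorded temperature.
peak_per_location = (
    toy.sort_values("temperature_celsius", ascending=False)
       .drop_duplicates(subset=["country", "location_name"])
)

# Rank the distinct locations; the real analysis would use 5 instead of 2.
hottest = peak_per_location.nlargest(2, "temperature_celsius")
print(hottest[["location_name", "country", "temperature_celsius"]])
```

The same pattern with `ascending=True` and `nsmallest` would give the coldest distinct locations.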


### 3.2 Group Analysis
* Next, I'll run a group analysis to get the average temperature by country. This can help identify general temperature patterns across countries.

In [None]:
# Average temperature by country
avg_temp_by_country = data.groupby('country')['temperature_celsius'].mean().sort_values(ascending=False)
print("Average Temperature by Country:\n", avg_temp_by_country)
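
A country's average is easier to interpret alongside the number of observations behind it. The sketch below, on a made-up frame, uses named aggregation to report both; the names `toy`, `summary`, `mean_temp`, and `n_obs` are chosen here for illustration only.

```python
import pandas as pd

# Toy data with an uneven number of observations per country.
toy = pd.DataFrame({
    "country": ["X", "X", "Y"],
    "temperature_celsius": [10.0, 20.0, 5.0],
})

# Named aggregation: each keyword becomes a column in the result.
summary = (
    toy.groupby("country")["temperature_celsius"]
       .agg(mean_temp="mean", n_obs="count")
       .sort_values("mean_temp", ascending=False)
)
print(summary)
```

A mean based on very few rows can then be spotted directly from the `n_obs` column.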


### 3.3 Visualization
* Next, I will develop some visualizations:

* Histogram of Global Temperatures: This depicts the distribution of temperatures across all locations.
* Line Plot of Temperature Over Time: If we have time data (such as last_updated), we can make a line plot to examine how temperatures vary over time in a given place.


In [None]:
import matplotlib.pyplot as plt

# Histogram of Global Temperatures
plt.figure(figsize=(10, 6))
plt.hist(data['temperature_celsius'], bins=30, edgecolor='red')
plt.title('Global Temperature Distribution')
plt.xlabel('Temperature (°C)')
plt.ylabel('Frequency')
plt.show()

# Line Plot of Temperature Over Time for Turkey
# Filter for data from Turkey and sort by time so the line runs chronologically
turkey_data = data[data['country'] == 'Turkey'].sort_values('last_updated')
plt.figure(figsize=(10, 6))
plt.plot(turkey_data['last_updated'], turkey_data['temperature_celsius'], marker='o')
plt.title('Temperature Over Time - Turkey')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.xticks(rotation=45)
plt.show()
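
When a country has several readings per day, the raw line can look jagged. A possible refinement, sketched here on made-up readings, is to average the temperatures per calendar day before plotting (`toy` and `daily_mean` are hypothetical names, and the values are invented).

```python
import pandas as pd

# Toy readings: two per day for a single country.
toy = pd.DataFrame({
    "country": ["Turkey"] * 4,
    "last_updated": pd.to_datetime([
        "2024-05-01 09:00", "2024-05-01 15:00",
        "2024-05-02 09:00", "2024-05-02 15:00",
    ]),
    "temperature_celsius": [18.0, 24.0, 17.0, 23.0],
})

# Index by time, then average the readings within each calendar day.
daily_mean = (
    toy[toy["country"] == "Turkey"]
       .set_index("last_updated")["temperature_celsius"]
       .resample("D")
       .mean()
)
print(daily_mean)
```

The resulting series can be passed to `plt.plot(daily_mean.index, daily_mean.values)` in place of the raw columns.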