# Climate Analysis in Galicia, Spain  
### NOTE: I did not include screenshots or code for data preparation and processing to keep this notebook from getting too long. To see that part in more detail, check the GitHub repository.

Introduction

Data collection

Cleaning and transformation

Exploratory analysis

Modeling (if applicable)

Conclusions

Future ideas

## Motivation
I lived in Galicia for a little over two years, and people always told me that it rains a lot there (and rained even more in the past), and that "the cold in Galicia is different; it gets into your bones."  
This always made me curious: How many days a year does it rain? How cold does it get? Is this "chill in your bones" caused by humidity? Which city is the coldest? And many other questions I didn't know how to answer.  
So I decided to start this project to answer several of the questions that sparked my curiosity.  

## ASK 
This project uses data from the largest cities in Galicia (Coruña, Lugo, Ourense, Vigo, Pontevedra, and Santiago de Compostela) and analyzes the period from January 1, 2023, to March 31, 2025. To answer the questions below, I also carry out comparisons between cities and predictions.

### Questions  
- Which city has the most stable climate (least variability in temperature)?
- How do the cities rank by precipitation?
- How do the cities rank by temperature?
- How do the cities rank by humidity?
- Which is the most extreme city (maximum and minimum temperatures furthest from the Galician average)?
- What climate trends are observed between 2023 and 2024?
- What relationships exist between temperature, humidity, and precipitation?
- On what percentage of days per year does it rain in Galicia?

## Prepare  
### Data
All data was obtained from MeteoGalicia: historical observations from its website and forecasts from its MeteoSIX API.
The historical data covers the period from January 1, 2023, to March 31, 2025, and includes three variables of interest: precipitation, temperature, and humidity.

### Tools
The project is written mostly in Python.
The main libraries used are pandas, os, Streamlit, Plotly, Seaborn, and Folium, among others.


### Data type
The historical data from MeteoGalicia is provided in CSV format and can be downloaded through a simple graphical interface on its website. Up to 10 years of data can be requested, but only for one location at a time. Forecast data, on the other hand, comes from the MeteoSIX API in JSON format (see the Streamlit app).  
### Data organization  
There are 6 tables (one per city), each with 4 columns (date, humidity, precipitation, and temperature) and 821 rows (one row per day). That makes 4,926 rows in total, almost 5,000 data points.
For the forecast, there are also 6 tables (one per city), each with 5 columns (the same ones plus `sky_state`). This represents a total of 144 data points across all tables.
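The row counts above can be checked with a little date arithmetic. A minimal sketch, assuming the study period runs from January 1, 2023, to March 31, 2025, with both days included (2024 is a leap year):

```python
import pandas as pd

# One entry per calendar day in the study period (both endpoints included)
days = pd.date_range("2023-01-01", "2025-03-31", freq="D")
n_cities = 6  # Coruña, Lugo, Ourense, Vigo, Pontevedra, Santiago de Compostela

rows_per_city = len(days)              # 365 (2023) + 366 (2024) + 90 (Jan-Mar 2025)
total_rows = rows_per_city * n_cities  # one row per day and city

print(rows_per_city, total_rows)  # 821 4926
```

The total of 4,926 matches the number of entries reported by `df.info()` in the analysis section.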


## Process
Because the raw DataFrame stored each value under two levels (date and variable), we reshaped it with `pivot_table` so that each variable became its own column. The pivot code was as follows:
df_pivot = df.pivot_table(index="Date", columns="Variable", values="Value", aggfunc="first")
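A small illustration of that reshape. The `Date`/`Variable`/`Value` column names come from the line above; the variable labels `hum`, `prep`, `temp` are assumptions, and the values mirror the first two Coruña rows shown in `df.head()` in the analysis section:

```python
import pandas as pd

# Long format: one row per (date, variable) pair (illustrative values)
df = pd.DataFrame({
    "Date": ["2023-01-01"] * 3 + ["2023-01-02"] * 3,
    "Variable": ["hum", "prep", "temp"] * 2,
    "Value": [98.0, 22.6, 12.01, 90.0, 1.1, 10.98],
})

# Wide format: one row per date, one column per variable.
# aggfunc="first" keeps the first value if a (date, variable) pair appears twice.
df_pivot = df.pivot_table(index="Date", columns="Variable", values="Value", aggfunc="first")
print(df_pivot)
```

Unlike `DataFrame.pivot`, which raises an error when a (date, variable) pair is duplicated, `pivot_table` aggregates the duplicates, here by keeping the first one.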

In addition, the six city tables were combined with `pd.concat` into a single table for Galicia, with a "city" column identifying the city where each day's values were recorded. In other words, each row is identified by date + city.
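A minimal sketch of that step, assuming each city's table has `date` as a regular column and that the tables are held in a dict keyed by city name. The dict, the toy values, and the helper name `build_galicia` are hypothetical:

```python
import pandas as pd

def build_galicia(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack the per-city tables into one, tagging every row with its city."""
    frames = [table.assign(city=city) for city, table in tables.items()]
    galicia = pd.concat(frames)
    # date + city must identify each row exactly once
    if galicia.duplicated(subset=["date", "city"]).any():
        raise ValueError("duplicate (date, city) pairs found")
    return galicia

# Toy tables with two days each (illustrative values only)
dates = pd.to_datetime(["2023-01-01", "2023-01-02"])
tables = {
    "Coruña": pd.DataFrame({"date": dates, "prep": [22.6, 1.1]}),
    "Lugo": pd.DataFrame({"date": dates, "prep": [5.0, 0.0]}),
}
galicia = build_galicia(tables)
print(galicia)
```

Leaving `ignore_index` at its default keeps each city's own 0-based index, so index labels repeat across cities; this is consistent with the `0 to 820` index shown by `df.info()` below. Passing `ignore_index=True` to `pd.concat` would give a fresh, unique index instead.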

## Analyze  
To keep the analysis organized, we split it by variable of interest: precipitation, temperature, and humidity, in that order. But first, we need to load the libraries and the data.  
### Libraries

In [1]:
import numpy as np
import pandas as pd
import os  # building file paths
import seaborn as sns
import plotly.express as px  # interactive charts
import plotly.graph_objects as go
import folium  # maps
import tempfile

### Data

In [70]:
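# Project root = parent of the notebook's working directory
project = os.path.dirname(os.getcwd())
folder = os.path.join(project, 'data', 'processed', 'galicia')
file = "galicia.csv"
path_file = os.path.join(folder, file)
# Column 0 of the CSV is used as the index; "fecha" (Spanish for "date") is parsed as datetime
df = pd.read_csv(path_file, index_col=0, parse_dates=["fecha"])
# Rename the columns (by position) to short English names
df.columns = ['date', 'hum', 'prep', 'temp', 'city']
df['month'] = df['date'].dt.month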

In [71]:
df.head()

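|   | date       | hum  | prep | temp  | city   | month |
|---|------------|------|------|-------|--------|-------|
| 0 | 2023-01-01 | 98.0 | 22.6 | 12.01 | Coruña | 1     |
| 1 | 2023-01-02 | 90.0 | 1.1  | 10.98 | Coruña | 1     |
| 2 | 2023-01-03 | 86.0 | 0.0  | 12.01 | Coruña | 1     |
| 3 | 2023-01-04 | 91.0 | 0.0  | 14.55 | Coruña | 1     |
| 4 | 2023-01-05 | 95.0 | 0.0  | 12.99 | Coruña | 1     |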


In [72]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4926 entries, 0 to 820
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    4926 non-null   datetime64[ns]
 1   hum     4926 non-null   float64       
 2   prep    4926 non-null   float64       
 3   temp    4926 non-null   float64       
 4   city    4926 non-null   object        
 5   month   4926 non-null   int32         
dtypes: datetime64[ns](1), float64(3), int32(1), object(1)
memory usage: 250.1+ KB
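`df.info()` confirms that there are no missing values, but not that every city has every day. A sketch of a per-city gap check, assuming the `date` and `city` columns of `df` (the helper `missing_days` is hypothetical):

```python
import pandas as pd

def missing_days(df: pd.DataFrame) -> dict[str, int]:
    """Count calendar days absent from each city's series between the overall first and last date."""
    full = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
    return {
        city: len(full.difference(pd.DatetimeIndex(group["date"])))
        for city, group in df.groupby("city")
    }

# Toy example: Lugo is missing 2023-01-02
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03",
                            "2023-01-01", "2023-01-03"]),
    "city": ["Coruña"] * 3 + ["Lugo"] * 2,
})
print(missing_days(toy))  # {'Coruña': 0, 'Lugo': 1}
```

Run on the real `df`, every city should report 0 if each one has all 821 days.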


### Precipitation  
#### Precipitation by city:
In this first part, we answer questions about the cities, such as: How do the cities rank by precipitation?

In [73]:
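df_kpi = df.groupby("city")
# Rainy day = any day with measured precipitation above 0 L/m²
rain_days = df[df["prep"] > 0].groupby("city").size()
# Mean daily precipitation per city (L/m², equivalent to mm)
prom_rain = df.groupby("city")["prep"].mean()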

In [74]:
# Sort cities by number of rainy days and turn the Series into a two-column table
rain_list = rain_days.sort_values(ascending=False).reset_index(name="prep days count")
print(rain_list)
fig = px.bar(rain_list, x="city", y="prep days count",
             title="Number of rainy days per city")
fig.update_layout(
    plot_bgcolor='rgba(0, 0, 0, 0)',   # transparent plot and page background
    paper_bgcolor='rgba(0, 0, 0, 0)',
    font=dict(color='white'),
    title_font=dict(color='white'),
    legend=dict(font=dict(color='white')),
    xaxis=dict(title='cities', color='white'),
    yaxis=dict(title='rainy days', color='white', gridcolor='rgba(255, 255, 255, 0.4)'),
    autosize=True,
    margin=dict(l=20, r=20, t=40, b=40)
)
fig.show()

                     city  prep days count
0  Santiago de Compostela              382
1                  Coruña              363
2                    Lugo              351
3              Pontevedra              346
4                    Vigo              324
5                 Ourense              293


In [75]:
# prom_rain is already named "prep", so reset_index() yields a "prep" column directly
prom_rain_list = prom_rain.sort_values(ascending=False).reset_index()
print(prom_rain_list)
fig = px.bar(prom_rain_list, x="city", y="prep",
             title="Average daily precipitation per city")
fig.update_layout(
    plot_bgcolor='rgba(0, 0, 0, 0)',   # transparent plot and page background
    paper_bgcolor='rgba(0, 0, 0, 0)',
    font=dict(color='white'),
    title_font=dict(color='white'),
    legend=dict(font=dict(color='white')),
    xaxis=dict(title='cities', color='white'),
    yaxis=dict(title='average daily precipitation (L/m²)', color='white', gridcolor='rgba(255, 255, 255, 0.4)'),
    autosize=True,
    margin=dict(l=20, r=20, t=40, b=40)
)
fig.show()

                     city      prep
0  Santiago de Compostela  5.783313
1              Pontevedra  5.211206
2                    Vigo  4.042996
3                    Lugo  3.358465
4                  Coruña  3.340073
5                 Ourense  2.622412


The ranking by **number of rainy days**:  
**Santiago de Compostela > Coruña > Lugo > Pontevedra > Vigo > Ourense**  
  
The ranking by **average daily precipitation**:  
**Santiago de Compostela > Pontevedra > Vigo > Lugo > Coruña > Ourense**  
  
**Santiago de Compostela** has the most rainy days and the highest average precipitation.  
**Ourense** has the fewest rainy days and the lowest average precipitation.  
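The two rankings differ because a city can have many rainy days with little rain on each one (Coruña is second in rainy days but fifth in average precipitation). A sketch that puts both metrics side by side and adds the share of days with rain, assuming the `city` and `prep` columns of `df`. The helper `rain_summary` is hypothetical, and the `threshold` parameter is an assumption: the notebook counts any day with `prep > 0`, while some analyses only count days above a minimum amount such as 1 L/m².

```python
import pandas as pd

def rain_summary(df: pd.DataFrame, threshold: float = 0.0) -> pd.DataFrame:
    """Per city: rainy-day count, share of rainy days, mean daily rain, mean rain per rainy day."""
    return (
        df.assign(
            rainy=lambda d: d["prep"] > threshold,
            # Keep precipitation only on rainy days (NaN otherwise, ignored by mean)
            prep_rainy=lambda d: d["prep"].where(d["rainy"]),
        )
        .groupby("city")
        .agg(
            rain_days=("rainy", "sum"),
            rain_share=("rainy", "mean"),
            mean_prep=("prep", "mean"),
            prep_per_rainy_day=("prep_rainy", "mean"),
        )
        .sort_values("rain_days", ascending=False)
    )

# Toy data: B rains more often, A rains harder when it rains
toy = pd.DataFrame({
    "city": ["A"] * 4 + ["B"] * 4,
    "prep": [10.0, 0.0, 0.0, 2.0, 1.0, 1.0, 1.0, 0.0],
})
print(rain_summary(toy))
```

Applied to the counts above, and assuming each city has all 821 days, Santiago de Compostela's 382 rainy days are about 46.5% of days and Ourense's 293 about 35.7%.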


#### Precipitation by date:
In this second part, we answer questions about dates, such as: In which month does it rain the most?
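One way to approach the month question, sketched on toy data and assuming the `prep` and `month` columns built earlier (the helper `wettest_month` is hypothetical). It averages daily values per month rather than summing them, because January to March appear in three years of the study period (2023, 2024, and 2025) while the other months appear in only two, so monthly totals would favor the first quarter.

```python
import pandas as pd

def wettest_month(df: pd.DataFrame) -> int:
    """Return the month number (1-12) with the highest mean daily precipitation."""
    monthly = df.groupby("month")["prep"].mean()
    return int(monthly.idxmax())

# Toy data: more rain per day in February than in January or July
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-10", "2024-01-10", "2023-02-10", "2023-07-10"]),
    "prep": [4.0, 2.0, 9.0, 0.0],
})
toy["month"] = toy["date"].dt.month
print(wettest_month(toy))  # 2
```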