Installing libraries

In [2]:
!pip install sqlalchemy psycopg2-binary



Connecting to PostgreSQL from Jupyter

In [3]:
import pandas as pd
from sqlalchemy import create_engine

# connection from Jupyter (running inside Docker) to Postgres (in another container)
engine = create_engine("postgresql://postgres:postgres@postgres:5432/clima")
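The connection string above hard-codes the credentials. A minimal sketch of one alternative, assuming hypothetical environment variable names (`PG_USER`, `PG_PASSWORD`, `PG_HOST`, `PG_PORT`, `PG_DB`, none of which appear in the original notebook) with fallbacks that mirror the hard-coded values, builds the URL with SQLAlchemy's `URL.create`:

```python
import os

from sqlalchemy.engine import URL

# Hypothetical environment variable names; the fallbacks mirror the
# values hard-coded in the notebook's connection string.
db_url = URL.create(
    drivername="postgresql",
    username=os.getenv("PG_USER", "postgres"),
    password=os.getenv("PG_PASSWORD", "postgres"),
    host=os.getenv("PG_HOST", "postgres"),
    port=int(os.getenv("PG_PORT", "5432")),
    database=os.getenv("PG_DB", "clima"),
)

# create_engine(db_url) accepts this URL object in place of the string.
```

Passing the parts separately also avoids having to escape special characters in the password by hand.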

Testing the connection

In [4]:
pd.read_sql("SELECT NOW()", engine)

Unnamed: 0,now
0,2025-12-04 05:58:00.080379+00:00


Reading the inmet_raw table into pandas

In [5]:
df = pd.read_sql("SELECT * FROM inmet_raw", engine)
df.head()
df.shape

(28854, 9)

In [6]:
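# Note: this reassigns df to a 20-row sample, so the cells below run on 20 rows, not the full table
df = pd.read_sql("SELECT * FROM inmet_raw LIMIT 20;", engine)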
df

Unnamed: 0,id,device_name,ts,temp_ar,umidade,radiacao,vento_vel,precipitacao,pressao
0,1,INMET_Garanhuns,2020-01-01 00:00:00,19.9,94.0,90.6,1.1,0.8,922.6
1,2,INMET_Garanhuns,2020-01-01 01:00:00,19.9,94.0,90.6,0.0,1.6,922.8
2,3,INMET_Garanhuns,2020-01-01 04:00:00,20.2,95.0,90.6,2.3,0.0,920.6
3,4,INMET_Garanhuns,2020-01-01 02:00:00,20.0,95.0,90.6,1.6,0.8,922.4
4,5,INMET_Garanhuns,2020-01-01 03:00:00,20.0,96.0,90.6,2.9,0.6,921.4
5,6,INMET_Garanhuns,2020-01-01 05:00:00,20.1,95.0,90.6,2.8,0.0,920.3
6,7,INMET_Garanhuns,2020-01-01 08:00:00,20.3,95.0,90.6,2.0,0.0,921.0
7,8,INMET_Garanhuns,2020-01-01 06:00:00,20.2,95.0,90.6,2.9,0.0,920.3
8,9,INMET_Garanhuns,2020-01-01 09:00:00,20.4,95.0,90.6,1.0,0.0,921.8
9,10,INMET_Garanhuns,2020-01-01 12:00:00,23.0,81.0,1379.2,2.3,0.0,922.5


Exploratory Analysis

In [8]:
# Select the climate variables for analysis
variaveis_climaticas = ['temp_ar', 'umidade', 'vento_vel', 'precipitacao', 'pressao']

# Add radiation if the column exists
if 'radiacao' in df.columns:
    variaveis_climaticas.append('radiacao')

# Keep only the columns that exist in the DataFrame
variaveis_climaticas = [v for v in variaveis_climaticas if v in df.columns]

print("📊 Selected climate variables:")
for var in variaveis_climaticas:
    print(f"   ➤ {var}")

# Descriptive statistics
print("\n📈 Descriptive statistics:")
print(df[variaveis_climaticas].describe())

📊 Selected climate variables:
   ➤ temp_ar
   ➤ umidade
   ➤ vento_vel
   ➤ precipitacao
   ➤ pressao
   ➤ radiacao

📈 Descriptive statistics:
         temp_ar    umidade  vento_vel  precipitacao     pressao     radiacao
count  20.000000  20.000000  20.000000     20.000000   20.000000    20.000000
mean   22.660000  82.350000   2.020000      0.190000  921.210000   720.740000
std     2.810394  14.582885   0.733126      0.427847    1.196882   805.032525
min    19.900000  58.000000   0.000000      0.000000  919.200000    90.600000
25%    20.175000  69.000000   1.675000      0.000000  920.300000    90.600000
50%    21.450000  92.000000   2.150000      0.000000  921.300000   152.500000
75%    25.200000  95.000000   2.450000      0.000000  922.400000  1446.675000
max    27.500000  96.000000   2.900000      1.600000  922.800000  2223.200000


In [11]:
# Outlier detection using the IQR (interquartile range)
def remover_outliers_iqr(df, colunas, factor=1.5):
    """
    Remove outliers using the IQR method.

    Columns are processed in order and rows are dropped after each one, so the
    quartiles and percentages for later columns refer to the rows still remaining.
    """
    df_clean = df.copy()
    outliers_removidos = 0

    for col in colunas:
        Q1 = df_clean[col].quantile(0.25)
        Q3 = df_clean[col].quantile(0.75)
        IQR = Q3 - Q1

        lower_bound = Q1 - factor * IQR
        upper_bound = Q3 + factor * IQR

        mask_outlier = (df_clean[col] < lower_bound) | (df_clean[col] > upper_bound)
        n_outliers = mask_outlier.sum()

        if n_outliers > 0:
            print(f"   ⚠️  {col}: {n_outliers} outliers detected ({n_outliers/len(df_clean)*100:.2f}%)")
            outliers_removidos += n_outliers
            df_clean = df_clean[~mask_outlier]

    print(f"\n✅ Total outliers removed: {outliers_removidos}")
    return df_clean

print("🔍 Detecting and removing outliers...")
df_sem_outliers = remover_outliers_iqr(df, variaveis_climaticas)
print(f"   ➤ Records after removal: {len(df_sem_outliers):,} ({len(df_sem_outliers)/len(df)*100:.1f}% of the original data)")

🔍 Detecting and removing outliers...
   ⚠️  vento_vel: 1 outliers detected (5.00%)
   ⚠️  precipitacao: 3 outliers detected (15.79%)

✅ Total outliers removed: 4
   ➤ Records after removal: 16 (80.0% of the original data)
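Two details of the output above are worth noting. The function recomputes the quartiles on the already-filtered frame for each column, which is why the precipitacao percentage is 3 of 19 (15.79%) rather than 3 of 20. In addition, the descriptive statistics show both quartiles of precipitacao at 0, so the IQR is 0 and any nonzero rainfall reading falls outside the bounds. The sketch below shows one possible variant that computes all bounds on the unfiltered frame and drops rows once. `outlier_mask_iqr` is a hypothetical helper, not part of the original notebook, and the sample values are made up for illustration.

```python
import pandas as pd


def outlier_mask_iqr(df, columns, factor=1.5):
    """Return a boolean Series that is True for rows flagged in any column.

    All quartiles are computed on the unfiltered frame, so the result does
    not depend on the order of `columns`.
    """
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    # Compare each column against its own bounds (aligned by column name).
    below = df[columns].lt(lower, axis="columns")
    above = df[columns].gt(upper, axis="columns")
    return (below | above).any(axis=1)


# Made-up values for illustration only (not taken from inmet_raw).
sample = pd.DataFrame(
    {
        "temp_ar": [20.0, 20.5, 21.0, 21.5, 22.0, 40.0],
        "vento_vel": [1.0, 1.5, 2.0, 2.5, 3.0, 2.0],
    }
)
mask = outlier_mask_iqr(sample, ["temp_ar", "vento_vel"])
sample_clean = sample[~mask]
print(f"Flagged {mask.sum()} of {len(sample)} rows")
```

Whether zero-inflated variables such as rainfall should go through an IQR filter at all is a separate modelling decision; leaving them out of the column list is one option.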


Model - Objective: group the key climate periods for the grape growing cycle in the São Francisco Valley, attempting to estimate patterns in Garanhuns from Petrolina.

The K-means model trained on Petrolina data produces hourly climate patterns that can be applied to Garanhuns to identify climatic similarities and differences. What we 'predict' is therefore not a variable but the climate pattern (cluster) to which each Garanhuns observation belongs. The Adjusted Rand Index (ARI) was used to compare the actual clusters of Garanhuns with the predictions made by the Petrolina-based model.
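The procedure described here can be sketched with scikit-learn as follows. This is a minimal illustration under several assumptions that the notebook does not confirm: the data are synthetic stand-ins rather than INMET records, the number of clusters is set to 2, features are standardized with a scaler fitted on Petrolina, and the "actual" Garanhuns clusters are taken to be those of a separate K-means fitted on Garanhuns itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two stations (NOT real INMET data);
# the two columns play the role of temp_ar and umidade.
petrolina = np.vstack([
    rng.normal([30.0, 40.0], 1.5, size=(200, 2)),
    rng.normal([24.0, 75.0], 1.5, size=(200, 2)),
])
garanhuns = np.vstack([
    rng.normal([26.0, 60.0], 1.5, size=(200, 2)),
    rng.normal([20.0, 92.0], 1.5, size=(200, 2)),
])

n_clusters = 2  # assumption: the notebook does not state the value of k

# Fit the scaler and K-means on Petrolina only.
scaler = StandardScaler().fit(petrolina)
km_petrolina = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
km_petrolina.fit(scaler.transform(petrolina))

# "Predicted" pattern: nearest Petrolina centroid for each Garanhuns row.
labels_transfer = km_petrolina.predict(scaler.transform(garanhuns))

# Reference labels: a separate K-means fitted on Garanhuns itself (assumption).
km_garanhuns = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels_own = km_garanhuns.fit_predict(StandardScaler().fit_transform(garanhuns))

ari = adjusted_rand_score(labels_own, labels_transfer)
print(f"ARI between own and transferred clusters: {ari:.3f}")
```

ARI ignores how the cluster labels are numbered, equals 1 when the two partitions are identical, and stays close to 0 when the agreement is no better than chance, which makes it suitable for comparing two independent clusterings of the same observations.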