## Stage: 01. Exploratory analysis

#### Code written by: Eduardo Arthur Bitencourt  | Date: 31/07/2023
##### Github: https://github.com/bitencourt-eduardo  | Linkedin: https://www.linkedin.com/in/bitencourt-eduardo/

In [74]:
# !pip install pandas

In [75]:
# Import libraries
import pandas as pd
import numpy as np


#### Loading the FieldPRO files

Sensor_FieldPRO: data transmitted by the sensor

In [76]:
# Load the file with the data transmitted by the sensor
name_file = 'Sensor_FieldPRO.csv'
caminho_base = '../datasets/{name_file}'

df_sensor = pd.read_csv(caminho_base.format(name_file=name_file), sep=',', decimal='.' )
df_sensor.head(10)

,Datetime – utc,air_humidity_100,air_temperature_100,atm_pressure_main,num_of_resets,piezo_charge,piezo_temperature
0,2020-09-30T23:00:00Z,38.0,31.366,9412,0,45123,30
1,2020-10-01T00:00:00Z,,,9412,0,45025,31
2,2020-10-01T01:00:00Z,39.0,31.366,9419,0,44923,31
3,2020-10-01T02:00:00Z,39.0,31.322,9419,0,44825,31
4,2020-10-01T03:00:00Z,38.0,31.24,9416,0,44728,31
5,2020-10-01T04:00:00Z,39.0,30.828,9411,0,44632,30
6,2020-10-01T05:00:00Z,,,9411,0,44537,30
7,2020-10-01T06:00:00Z,39.0,30.686,9413,0,44441,30
8,2020-10-01T07:00:00Z,40.0,30.63,9415,0,44347,30
9,2020-10-01T08:00:00Z,41.0,30.588,9419,0,44251,30


In [77]:
df_sensor.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1711 entries, 0 to 1710
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Datetime – utc       1711 non-null   object 
 1   air_humidity_100     1705 non-null   float64
 2   air_temperature_100  1705 non-null   float64
 3   atm_pressure_main    1711 non-null   int64  
 4   num_of_resets        1711 non-null   int64  
 5   piezo_charge         1711 non-null   int64  
 6   piezo_temperature    1711 non-null   int64  
dtypes: float64(2), int64(4), object(1)
memory usage: 93.7+ KB


In [78]:
df_sensor.isnull().sum() 

Datetime – utc         0
air_humidity_100       6
air_temperature_100    6
atm_pressure_main      0
num_of_resets          0
piezo_charge           0
piezo_temperature      0
dtype: int64

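* Date and time are stored in a single ISO-8601 timestamp field, with time zone UTC (UTC+00:00).
    * pandas still reads the column with dtype object (text): read_csv only parses dates when asked to (e.g. with parse_dates), so this alone does not indicate incorrect data. The pd.to_datetime conversion below parses all 1711 rows.
* Null values observed for air humidity and air temperature (6 each)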
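Both observations can be checked directly. A minimal sketch on a three-row excerpt of Sensor_FieldPRO.csv (the first rows of the head() output above, not the full file): it shows that read_csv leaves the timestamp as text, counts strings that fail to parse using errors='coerce', and checks whether humidity and temperature go missing on the same rows. The variable names are illustrative.

```python
import io
import pandas as pd

# Excerpt of Sensor_FieldPRO.csv (first rows from the head() output), for illustration only.
csv_text = """Datetime – utc,air_humidity_100,air_temperature_100
2020-09-30T23:00:00Z,38.0,31.366
2020-10-01T00:00:00Z,,
2020-10-01T01:00:00Z,39.0,31.366
"""

df_raw = pd.read_csv(io.StringIO(csv_text))

# Without parse_dates, read_csv keeps the timestamp column as text.
print(df_raw['Datetime – utc'].dtype)

# Strings that cannot be parsed become NaT with errors='coerce';
# counting them shows whether the column really holds bad values.
parsed = pd.to_datetime(df_raw['Datetime – utc'], utc=True, errors='coerce')
n_unparseable = int((parsed.isna() & df_raw['Datetime – utc'].notna()).sum())
print('unparseable timestamps:', n_unparseable)

# Do humidity and temperature go missing on the same rows?
hum_null = df_raw['air_humidity_100'].isna()
temp_null = df_raw['air_temperature_100'].isna()
print('rows with both null:', int((hum_null & temp_null).sum()))
print('rows with only one null:', int((hum_null ^ temp_null).sum()))
```

Running the same expressions on the full df_sensor would show whether the 6 humidity nulls and the 6 temperature nulls fall on the same rows.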

Converting the date field to a timestamp type

In [79]:
df_sensor['timestamp-utc'] = pd.to_datetime(df_sensor['Datetime – utc'], format='ISO8601')
df_sensor.head()

,Datetime – utc,air_humidity_100,air_temperature_100,atm_pressure_main,num_of_resets,piezo_charge,piezo_temperature,timestamp-utc
0,2020-09-30T23:00:00Z,38.0,31.366,9412,0,45123,30,2020-09-30 23:00:00+00:00
1,2020-10-01T00:00:00Z,,,9412,0,45025,31,2020-10-01 00:00:00+00:00
2,2020-10-01T01:00:00Z,39.0,31.366,9419,0,44923,31,2020-10-01 01:00:00+00:00
3,2020-10-01T02:00:00Z,39.0,31.322,9419,0,44825,31,2020-10-01 02:00:00+00:00
4,2020-10-01T03:00:00Z,38.0,31.24,9416,0,44728,31,2020-10-01 03:00:00+00:00


In [80]:
df_sensor.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1711 entries, 0 to 1710
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype              
---  ------               --------------  -----              
 0   Datetime – utc       1711 non-null   object             
 1   air_humidity_100     1705 non-null   float64            
 2   air_temperature_100  1705 non-null   float64            
 3   atm_pressure_main    1711 non-null   int64              
 4   num_of_resets        1711 non-null   int64              
 5   piezo_charge         1711 non-null   int64              
 6   piezo_temperature    1711 non-null   int64              
 7   timestamp-utc        1711 non-null   datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), float64(2), int64(4), object(1)
memory usage: 107.1+ KB


_____________________________________________________________________________________________________________________________

#### Loading the weather-station files

Estacao_Convencional: data transmitted by the nearby weather station

In [81]:
name_file = 'Estacao_convencional.csv'
caminho_base = '../datasets/{name_file}'

df_estacao = pd.read_csv(caminho_base.format(name_file=name_file), sep=',', decimal='.' )
df_estacao.head()

,data,Hora (Brasília),chuva
0,2020-09-01,00:00:00,0.0
1,2020-09-01,01:00:00,0.0
2,2020-09-01,02:00:00,0.0
3,2020-09-01,03:00:00,0.0
4,2020-09-01,04:00:00,0.0


In [82]:
df_estacao.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2256 entries, 0 to 2255
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   data             2256 non-null   object 
 1   Hora (Brasília)  2256 non-null   object 
 2   chuva            2256 non-null   float64
dtypes: float64(1), object(2)
memory usage: 53.0+ KB


* No null values observed
* Record counts:
    * 2256 records transmitted by the weather station
    * 1711 records transmitted by the FieldPRO board
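The two counts differ, but a count alone does not show whether either series has missing hours. A minimal sketch, assuming both series are meant to be hourly: build the complete hourly range between the first and last timestamp and list the hours that are absent. The helper hourly_gaps is hypothetical, and the three-row sample is made up for illustration.

```python
import pandas as pd

def hourly_gaps(ts: pd.Series) -> pd.DatetimeIndex:
    """Return the hourly timestamps between ts.min() and ts.max() that are missing from ts."""
    ts = pd.to_datetime(ts)
    expected = pd.date_range(ts.min(), ts.max(), freq=pd.Timedelta(hours=1))
    return expected.difference(pd.DatetimeIndex(ts))

# Made-up hourly series with one missing hour (02:00 UTC).
sample = pd.Series(pd.to_datetime(
    ['2020-10-01 00:00', '2020-10-01 01:00', '2020-10-01 03:00'], utc=True))
missing = hourly_gaps(sample)
print(missing)
```

Applied to df_sensor['timestamp-utc'], the length of the result gives the number of missing sensor hours inside its own time span.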


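* Date and time are in separate columns, and the time zone is UTC-03:00 (Brasília time).
    * Check whether daylight saving time applies to the period (Brazil abolished daylight saving time in 2019, so none is expected in 2020)
    * The two time bases (date/time) must be made compatible before the datasets can be compared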
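A minimal sketch of combining the two columns and moving them onto the sensor's UTC time base, using two rows in the station file's layout (values from its head() output). It assumes the station clock follows official Brasília time, so the IANA zone name America/Sao_Paulo applies; with a named zone, pandas applies any daylight-saving rule itself instead of relying on a fixed offset string.

```python
import pandas as pd

# Two rows in the layout of Estacao_convencional.csv.
df = pd.DataFrame({
    'data': ['2020-09-01', '2020-09-01'],
    'Hora (Brasília)': ['00:00:00', '01:00:00'],
    'chuva': [0.0, 0.0],
})

# 1) Combine the separate date and time columns into naive (zone-less) datetimes.
local_naive = pd.to_datetime(df['data'] + ' ' + df['Hora (Brasília)'],
                             format='%Y-%m-%d %H:%M:%S')

# 2) Attach the Brasília time zone (assumption: the station uses official Brasília time).
local = local_naive.dt.tz_localize('America/Sao_Paulo')

# 3) Convert to UTC so the column shares a time base with df_sensor['timestamp-utc'].
df['timestamp-utc'] = local.dt.tz_convert('UTC')
print(df[['data', 'Hora (Brasília)', 'timestamp-utc']])
```

If the period did cross a daylight-saving change, tz_localize would need its ambiguous and nonexistent arguments to handle the repeated or skipped hour.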

In [83]:
# Brasília time is UTC-03:00, so the ISO-8601 offset suffix must be '-03:00'
df_estacao['timestamp'] = pd.to_datetime(df_estacao['data'] + ' ' + df_estacao['Hora (Brasília)'] + '-03:00', format='ISO8601')
df_estacao.head()

,data,Hora (Brasília),chuva,timestamp
0,2020-09-01,00:00:00,0.0,2020-09-01 00:00:00-03:00
1,2020-09-01,01:00:00,0.0,2020-09-01 01:00:00-03:00
2,2020-09-01,02:00:00,0.0,2020-09-01 02:00:00-03:00
3,2020-09-01,03:00:00,0.0,2020-09-01 03:00:00-03:00
4,2020-09-01,04:00:00,0.0,2020-09-01 04:00:00-03:00


In [88]:
# Convert the column's values (not the index) to UTC through the .dt accessor
df_estacao['timestamp-utc'] = df_estacao['timestamp'].dt.tz_convert('UTC')

In [84]:
df_estacao.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2256 entries, 0 to 2255
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype                    
---  ------           --------------  -----                    
 0   data             2256 non-null   object                   
 1   Hora (Brasília)  2256 non-null   object                   
 2   chuva            2256 non-null   float64                  
 3   timestamp        2256 non-null   datetime64[ns, UTC-03:00]
dtypes: datetime64[ns, UTC-03:00](1), float64(1), object(2)
memory usage: 70.6+ KB


In [85]:
df_estacao['timestamp'].min()

Timestamp('2020-09-01 00:00:00-0300', tz='UTC-03:00')
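Series.tz_convert operates on the Series' index, not on its values, which is why calling it on a column with a default RangeIndex raises TypeError: index is not a valid DatetimeIndex or PeriodIndex. A small sketch contrasting the two calls on a made-up two-row column with the same fixed offset:

```python
import pandas as pd

# A tz-aware datetime column on a default RangeIndex.
s = pd.Series(pd.to_datetime(['2020-09-01 00:00:00-03:00', '2020-09-01 01:00:00-03:00']))

# Series.tz_convert targets the index (a RangeIndex here), so it raises TypeError.
try:
    s.tz_convert('UTC')
    index_call_failed = False
except TypeError as exc:
    index_call_failed = True
    print('TypeError:', exc)

# The .dt accessor converts the datetime values themselves.
s_utc = s.dt.tz_convert('UTC')
print(s_utc)
```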

In [None]:
df_sensor['timestamp-utc'].min()

In [None]:
df_estacao['timestamp'].max()

In [None]:
df_sensor['timestamp-utc'].max()
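The min/max cells above aim at the period covered by both sources. A minimal sketch of that comparison and of a first join, assuming both tables carry a UTC column named timestamp-utc (for the station, the column the In [88] cell sets out to create) and that readings fall exactly on the hour. The two tables below hold made-up values for illustration only.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Made-up hourly data in the shape of the two sources, both already in UTC.
sensor = pd.DataFrame({
    'timestamp-utc': pd.date_range('2020-10-01 00:00', periods=4, freq=hour, tz='UTC'),
    'air_humidity_100': [38.0, 39.0, 39.0, 38.0],
})
station = pd.DataFrame({
    'timestamp-utc': pd.date_range('2020-10-01 02:00', periods=4, freq=hour, tz='UTC'),
    'chuva': [0.0, 1.2, 0.0, 0.4],
})

# Overlap window: the later of the two starts and the earlier of the two ends.
start = max(sensor['timestamp-utc'].min(), station['timestamp-utc'].min())
end = min(sensor['timestamp-utc'].max(), station['timestamp-utc'].max())
print('overlap:', start, '->', end)

# An inner join on the shared UTC timestamp keeps only hours present in both tables.
merged = sensor.merge(station, on='timestamp-utc', how='inner')
print(merged)
```

If the sensor timestamps turn out not to sit exactly on the hour, pd.merge_asof with a tolerance is a common alternative to an exact-key merge.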