# Henter API

Her henter vi inn API-et og lagrer det i en csv fil
Vi henter data for lufttemperatur, nedmørsmengde og vindhastighet over 45 år i Oslo på værstasjonen Oslo-Blindern

In [36]:
# Libraries needed (pandas is not standard and must be installed in Python)
import requests
import pandas as pd

# Insert your own client ID here
client_id = '61f26ba4-3f68-4e56-a3b8-09733ceb82ed'

# Define endpoint and parameters
endpoint = 'https://frost.met.no/observations/v0.jsonld'
parameters = {
    'sources': 'SN18700',
    'elements': 'mean(air_temperature P1D),sum(precipitation_amount P1D),mean(wind_speed P1D)',
    'referencetime': '1975-01-01/2020-12-31',
}
# Issue an HTTP GET request
r = requests.get(endpoint, parameters, auth=(client_id,''))
# Extract JSON data
json = r.json()

# Check if the request worked, print out any errors
if r.status_code == 200:
    data = json['data']
    print('Data retrieved from frost.met.no!')
else:
    print('Error! Returned status code %s' % r.status_code)
    print('Message: %s' % json['error']['message'])
    print('Reason: %s' % json['error']['reason'])


df.to_csv('../data/data_45years.csv')

Data retrieved from frost.met.no!


# Vis data

Her viser vi at dataen eksisterer 

In [37]:
# This will return a Dataframe with all of the observations in a table format
df = pd.DataFrame()
for i in range(len(data)):
    row = pd.DataFrame(data[i]['observations'])
    row['referenceTime'] = data[i]['referenceTime']
    row['sourceId'] = data[i]['sourceId']
    df = pd.concat([df, row])

df = df.reset_index()

print(df)




       index                      elementId  value  unit  \
0          0      mean(air_temperature P1D)   -2.3  degC   
1          1      mean(air_temperature P1D)   -0.4  degC   
2          2  sum(precipitation_amount P1D)    0.0    mm   
3          3  sum(precipitation_amount P1D)    7.4    mm   
4          4           mean(wind_speed P1D)    1.0   m/s   
...      ...                            ...    ...   ...   
83989      0      mean(air_temperature P1D)    1.7  degC   
83990      1      mean(air_temperature P1D)    2.3  degC   
83991      2  sum(precipitation_amount P1D)    5.9    mm   
83992      3  sum(precipitation_amount P1D)    5.6    mm   
83993      4           mean(wind_speed P1D)    5.1   m/s   

                                                   level timeOffset  \
0      {'levelType': 'height_above_ground', 'unit': '...       PT0H   
1      {'levelType': 'height_above_ground', 'unit': '...       PT6H   
2                                                    NaN      PT18

# Analyserer data

Vi ser at vi har mange unødvendige kolonner, som gjør dataen uoversiktlig. Derfor fjerner vi kolonnene: 
level, timeResolution, timeSeriesId, performanceCategory, exposureCategory, qualityCode, sourceId

Deretter lagrer vi dette i en ny csv-fil som vi kaller data_45years_update1

In [38]:
df = df.drop(['level','timeResolution','timeSeriesId','performanceCategory','exposureCategory','qualityCode','sourceId'], axis=1)
print(df)


df.to_csv('../data/data_45years_update1.csv')



       index                      elementId  value  unit timeOffset  \
0          0      mean(air_temperature P1D)   -2.3  degC       PT0H   
1          1      mean(air_temperature P1D)   -0.4  degC       PT6H   
2          2  sum(precipitation_amount P1D)    0.0    mm      PT18H   
3          3  sum(precipitation_amount P1D)    7.4    mm       PT6H   
4          4           mean(wind_speed P1D)    1.0   m/s       PT0H   
...      ...                            ...    ...   ...        ...   
83989      0      mean(air_temperature P1D)    1.7  degC       PT0H   
83990      1      mean(air_temperature P1D)    2.3  degC       PT6H   
83991      2  sum(precipitation_amount P1D)    5.9    mm      PT18H   
83992      3  sum(precipitation_amount P1D)    5.6    mm       PT6H   
83993      4           mean(wind_speed P1D)    5.1   m/s       PT0H   

                  referenceTime  
0      1975-01-01T00:00:00.000Z  
1      1975-01-01T00:00:00.000Z  
2      1975-01-01T00:00:00.000Z  
3      1975

# Legger til en kolonne 'dato'

Vi legger til en kolonne som viser dato, uten klokkeslett

In [39]:
#oppretter en ny kolone med dato uten tiden

df['date'] = pd.to_datetime(df['referenceTime']).dt.date

cols = ['date'] + [col for col in df.columns if col != 'date']
df = df[cols]

df.to_csv('../data/test_date.csv')

# Finner gjennomsnittsverdier per dag
Vi ønsker å hente gjennomsnittet av målinger av de ulike elementene for hver dag, men ser at i filen er det for air_temperature og presipitation_amount to verdier per dag. Derfor vil vi finne gjennomsnittet av de elementene der vi har to verdier per dag

In [40]:
# Gruppér og ta gjennomsnitt per dag og elementId
aggregert = df.groupby(['date', 'elementId']).agg({
    'value': 'mean',
    'unit': 'first'  # beholder enheten
}).reset_index()

aggregert.to_csv('../data/gjsnitt_data.csv')

print(aggregert)




             date                      elementId  value  unit
0      1975-01-01      mean(air_temperature P1D)  -1.35  degC
1      1975-01-01           mean(wind_speed P1D)   1.00   m/s
2      1975-01-01  sum(precipitation_amount P1D)   3.70    mm
3      1975-01-02      mean(air_temperature P1D)  -0.75  degC
4      1975-01-02           mean(wind_speed P1D)   1.00   m/s
...           ...                            ...    ...   ...
50392  2020-12-29           mean(wind_speed P1D)   5.30   m/s
50393  2020-12-29  sum(precipitation_amount P1D)   5.80    mm
50394  2020-12-30      mean(air_temperature P1D)   2.00  degC
50395  2020-12-30           mean(wind_speed P1D)   5.10   m/s
50396  2020-12-30  sum(precipitation_amount P1D)   5.75    mm

[50397 rows x 4 columns]
