# Rainy days on the stock market

Dataproject by Josefine Pedersen, Viktor Texel and Pernille Svendsen

> **Table of contents** 
> - Import and set magics
> - Introduction
- Insert an image and explain the data
> - Read and clean data from DMI and Yahoo Finance
> - Explore each dataset
- Two plots showing precipitation and OMXC25, respectively, by month --> one figure with two panels, one per plot. In addition, an interactive plot where specific months can be selected for both plots at the same time.
> - Merge datasets
> - Analysis
- We need to move change_stock to the analysis, show the 10 days with the largest declines/gains on the stock market, and compare them with the precipitation on those days.
> - Conclusion

- Consider streamlining the data handling by moving it into the py file.

*Imports and set magics:*

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
import requests # library for making HTTP requests
import datetime as dt # library for handling date and time objects


# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject


# Introduction

In this data project we explore whether weather and developments on the stock market are correlated. Through APIs we import datasets from DMI and Yahoo Finance to examine whether price fluctuations in the Danish OMX C25 index are correlated with the amount of precipitation that falls in Denmark.

# Read and clean data from DMI and Yahoo Finance

Import your data, either through an API or manually, and load it. 

**We import data from DMI**:

In [None]:
# We install a package to inspect data from DMI (Danish Meteorological Institute):

#%pip install dmi-open-data

In [None]:
# We use the API key issued to us by DMI
api_key = 'bd463c7d-f6f8-431d-a5a7-c466766a8363'

DMI_URL = 'https://dmigw.govcloud.dk/v2/metObs/collections/observation/items'
r = requests.get(DMI_URL, params={'api-key': api_key}) # Issues an HTTP GET request
print(r)

In [None]:
json = r.json()  # Extract JSON data
print(json.keys())  # Print the keys of the JSON dictionary

df = pd.json_normalize(json['features'])  # Convert JSON object to a Pandas DataFrame


In [None]:
df['time'] = pd.to_datetime(df['properties.observed'])


In [None]:
parameter_ids = df['properties.parameterId'].unique()  # Generate a list of unique parameter ids
print(parameter_ids)  # Print all unique parameter ids

In [None]:
# Specify the desired start and end time
start_time = pd.Timestamp(2022, 1, 1)
end_time = pd.Timestamp(2023, 1, 1)

# Specify one or more station IDs or all_stations
all_stationsDK = [
    '05005', '05009', '05015', '05031', '05035', '05042', '05065', 
    '05070', '05075', '05081', '05085', '05089', '05095', '05105', 
    '05109', '05135', '05140', '05150', '05160', '05165', '05169', 
    '05185', '05199', '05202', '05205', '05220', '05225', '05269', 
    '05272', '05276', '05277', '05290', '05296', '05300', '05305', 
    '05320', '05329', '05343', '05345', '05350', '05355', '05365', 
    '05375', '05381', '05395', '05400', '05406', '05408', '05435', 
    '05440', '05450', '05455', '05469', '05499', '05505', '05510', 
    '05529', '05537', '05545', '05575', '05735', '05880', '05889', 
    '05935', '05945', '05970', '05986', '05994'
]

# Specify one or more parameter IDs or all_parameters
parameterId = ['precip_past1h']

# Derive datetime specifier string
datetime_str = start_time.tz_localize('UTC').isoformat() + '/' + end_time.tz_localize('UTC').isoformat()

dfs = []
for station in all_stationsDK:
    for parameter in parameterId:
        # Specify query parameters
        params = {
            'api-key' : api_key,
            'datetime' : datetime_str,
            'stationId' : station,
            'parameterId' : parameter,
            'limit' : '300000',  # max limit
        }

        # Submit GET request with url and parameters
        r = requests.get(DMI_URL, params=params)
        # Extract JSON object
        json = r.json()
        # Convert JSON object to a DataFrame, reshape it to one column per station, and add it to the list
        dfi = pd.json_normalize(json['features'])
        if not dfi.empty:
            dfi['Time'] = pd.to_datetime(dfi['properties.observed'])
            dfi[['station', 'parameter']] = station, parameter
            #dfi = dfi.set_index(['parameter', 'station', 'Time'])
            #dfi = dfi['properties.value'].unstack(['station','parameter'])
            dfi = dfi.set_index(['station', 'Time'])
            dfi = dfi['properties.value'].unstack(['station'])
            dfs.append(dfi)

df = pd.concat(dfs, axis='columns').sort_index()
df.head()
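
The reshaping inside the loop can be hard to follow on real data. The sketch below replays it on a small made-up `features` list for two stations. The helper name `to_wide` and all values are hypothetical and only mimic the fields used above (`properties.observed`, `properties.value`).

```python
import pandas as pd

# Made-up stand-in for the 'features' list returned by the API (values are not real data)
features = [
    {'properties': {'observed': '2022-01-01T00:00:00Z', 'value': 0.0}},
    {'properties': {'observed': '2022-01-01T01:00:00Z', 'value': 0.4}},
]

def to_wide(features, station):
    """Hypothetical helper: turn one station's features into a Time-indexed column."""
    dfi = pd.json_normalize(features)                 # flat columns such as 'properties.value'
    dfi['Time'] = pd.to_datetime(dfi['properties.observed'])
    dfi['station'] = station
    dfi = dfi.set_index(['station', 'Time'])
    return dfi['properties.value'].unstack('station')  # one column per station

# Same made-up observations reused for two station IDs, purely for illustration
wide = pd.concat(
    [to_wide(features, '05005'), to_wide(features, '05009')],
    axis='columns',
).sort_index()

row_mean = wide.mean(axis=1)  # average across stations for each timestamp
print(wide)
print(row_mean)
```

Each station ends up as one column indexed by `Time`, which is what makes a row-average across stations possible afterwards.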



In [None]:
df.reset_index(inplace=True) 
list(df.columns)
df


In [None]:
# We create a row-average of the observations across weather stations
df['Precip'] = df.drop(columns='Time').mean(axis=1)  # exclude the datetime column from the average
df['Date'] = df['Time'].dt.date
df2 = df[['Time', 'Date', 'Precip']].iloc[:-1].copy()  # drop the last row; copy to avoid modifying a slice of df
df2


In [None]:
# We check to see which days have missing hours
tjek = df2.groupby(['Date'])['Time'].count()
tjek = pd.DataFrame(tjek)
tjek

tjek2 = tjek.loc[tjek['Time']!=24]
tjek2
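
To illustrate what this check catches, here is a minimal sketch on made-up hourly data in which one hour has been removed from the second day. The frame `obs` and its values are hypothetical; only the column names follow the cells above.

```python
import pandas as pd

# Made-up hourly timestamps covering two full days (48 hours)
hours = pd.to_timedelta(list(range(48)), unit='h')
times = pd.Timestamp('2022-01-01') + hours
obs = pd.DataFrame({'Time': times, 'Precip': 0.1})

# Remove one observation (2022-01-02 06:00) to simulate a missing hour
obs = obs.drop(index=30).reset_index(drop=True)
obs['Date'] = obs['Time'].dt.date

# Count observations per day and keep the days that do not have 24 of them
counts = obs.groupby('Date')['Time'].count()
incomplete = counts[counts != 24]
print(incomplete)
```

Only the day with the removed hour is listed, so an empty result on the real data would mean that every day has a complete set of hourly observations.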

In [None]:
df3 = df2.groupby('Date').mean()
df3.reset_index(inplace=True) 
df3['Date'] = pd.to_datetime(df3['Date'])
df3

In [None]:
# a. create the figure
fig = plt.figure()

# b. plot
ax = fig.add_subplot(1,1,1)

ax.bar(df3['Date'],df3['Precip'])

ax.set_title('Average precipitation in 2022')
ax.set_xlabel('Date')
ax.set_ylabel('Precipitation');

**We import data from Yahoo Finance**

In [None]:
# We install the necessary packages for the import

#%pip install yfinance
#%pip install yahoofinancials

In [None]:
import yfinance as yf

from yahoofinancials import YahooFinancials

OMXC25 = yf.download('^OMXC25', start='2022-01-01', end='2023-01-01', progress=False)
OMXC25.reset_index(inplace=True) 
OMXC25['Date'] =  pd.to_datetime(OMXC25['Date'])
OMXC25


# Explore each data set

In order to be able to **explore the raw data**, you may provide **static** and **interactive plots** to show important developments.

**Interactive plot** :

In [None]:
def plot_func():
    # Function that operates on data set
    pass

widgets.interact(plot_func, 
    # Let the widget interact with data through plot_func()    
); 


Explain what you see when moving elements of the interactive plot around. 

# Merge data sets

We create combinations of our loaded data sets from DMI and Yahoo Finance.

In [None]:
precip_stock = pd.merge(OMXC25, df3, on='Date', how='left')
precip_stock2 = precip_stock[['Date', 'Close', 'Precip']].copy()  # copy so new columns can be added safely
precip_stock2


In [None]:
# We calculate the day-to-day percentage change in the closing price of OMXC25
precip_stock2['Change_in_stock'] = (precip_stock2['Close'] / precip_stock2['Close'].shift(1) - 1) * 100
precip_stock2
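
As a cross-check of the formula, the sketch below compares the manual calculation with pandas' built-in `Series.pct_change()` on a few made-up closing prices. The prices are hypothetical and chosen only to keep the arithmetic simple.

```python
import pandas as pd

# Made-up closing prices: +2 % on the second day, -2 % on the third
close = pd.Series([100.0, 102.0, 99.96])

manual = (close / close.shift(1) - 1) * 100   # same formula as in the cell above
builtin = close.pct_change() * 100            # pandas' built-in equivalent

print(pd.DataFrame({'manual': manual, 'builtin': builtin}))
```

The first value is `NaN` in both versions because the first day has no previous close to compare with.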

We aggregate the data to monthly averages to get a view of trends.

In [None]:
precip_stock3 = precip_stock2.copy()  # copy so that precip_stock2 is left unchanged
precip_stock3['Month'] = precip_stock3['Date'].dt.month
precip_stock3 = precip_stock3.groupby('Month').mean()
precip_stock3.reset_index(inplace=True) 
precip_stock3

#trend = precip_stock2.groupby(['Date']).count()
#trend

#tjek2 = tjek.loc[tjek['Time']!=24]
#tjek2

We use a left join on `Date`, which keeps all observations in the OMXC25 data (the trading days) intact and adds precipitation only for those dates. Days without trading are therefore dropped from the DMI data.

Make sure that your resulting data sets have the correct number of rows and columns. That is, be clear about which observations are thrown away. 
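
One way to make this check explicit is the `indicator` and `validate` options of `pd.merge`. The sketch below uses two small made-up frames; `stock`, `rain` and their values are hypothetical stand-ins for the OMXC25 and DMI data.

```python
import pandas as pd

# Made-up stand-ins for the two data sets (values are not real data)
stock = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-03', '2022-01-04', '2022-01-05']),
    'Close': [1900.0, 1910.0, 1895.0],
})
rain = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-02', '2022-01-03', '2022-01-04']),
    'Precip': [0.3, 0.0, 0.1],
})

# indicator=True adds a '_merge' column showing where each row came from;
# validate='one_to_one' raises an error if a date is duplicated on either side
merged = pd.merge(stock, rain, on='Date', how='left',
                  indicator=True, validate='one_to_one')

n_unmatched = (merged['_merge'] == 'left_only').sum()
print(merged)
print('Rows before/after merge:', len(stock), len(merged))
print('Trading days without precipitation data:', n_unmatched)
```

With a left join the row count should equal that of the left data set, and `left_only` rows show the trading days for which no precipitation value was found.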


# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.
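
One possible starting point, following the plan in the table of contents, is to compute the correlation between the daily stock change and precipitation and to list the days with the largest gains and declines. The sketch below runs on a made-up frame `data`; the column names match the merged data set, but every number is hypothetical, so the output says nothing about the real data.

```python
import pandas as pd

# Made-up stand-in for the merged data set (values are not real data)
data = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-03', '2022-01-04', '2022-01-05',
                            '2022-01-06', '2022-01-07']),
    'Change_in_stock': [0.5, -1.2, 0.3, 2.1, -0.4],
    'Precip': [0.0, 0.4, 0.1, 0.0, 0.2],
})

# Pearson correlation between daily stock change and precipitation
corr = data['Change_in_stock'].corr(data['Precip'])

# Days with the largest gains and declines (use n=10 on the full data set)
n = 2
largest_gains = data.nlargest(n, 'Change_in_stock')
largest_drops = data.nsmallest(n, 'Change_in_stock')

print('Correlation:', round(corr, 3))
print(largest_gains[['Date', 'Change_in_stock', 'Precip']])
print(largest_drops[['Date', 'Change_in_stock', 'Precip']])
```

On the real data, `data` would be replaced by `precip_stock2` with missing values dropped, and `n` set to 10.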

# Conclusion

ADD CONCISE CONCLUSION.