This exercise will require you to pull some data from https://data.nasdaq.com/ (formerly Quandl API).

As a first step, you will need to register a free account on the https://data.nasdaq.com/ website.

After you register, you will be provided with a unique API key that you should store:

*Note*: Put your key in a `.env` file and use `python-dotenv` to load it in this notebook.

The code below was developed with a key that has since been deleted. Never commit your keys to source control. There is a `.env-example` file in this repository to illustrate what you need. Copy it to a file called `.env` and put your own API key in that file. Make sure you also have a `.gitignore` file that contains a line for `.env`.

The standard Python gitignore is [here](https://github.com/github/gitignore/blob/master/Python.gitignore); you can just copy that.

In [1]:
# !pip install python-dotenv

In [2]:
# !pip install requests

In [3]:
# get api key from your .env file
import os
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv('NASDAQ_API_KEY')

# print(API_KEY)
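
If the `.env` file is missing or the variable name is misspelled, `os.getenv` returns `None` instead of raising, and the problem only shows up later when the URL is built. A minimal sketch of a guard, assuming a hypothetical helper called `require_key` (it is not part of `python-dotenv`) and a stand-in mapping in place of the real environment:

```python
import os

def require_key(name, env=os.environ):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value

# Demonstration with a made-up mapping so the example runs without a real key.
fake_env = {"NASDAQ_API_KEY": "example-key"}
print(require_key("NASDAQ_API_KEY", env=fake_env))
```

In the notebook, calling `require_key("NASDAQ_API_KEY")` after `load_dotenv()` would take the place of the plain `os.getenv` call.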

Nasdaq Data has a large number of data sources, but, unfortunately, most of them require a Premium subscription. Still, there are also a good number of free datasets.

For this mini project, we will focus on equities data from the Frankfurt Stock Exchange (FSE), which is available for free. We'll try to analyze the stock prices of a company called Carl Zeiss Meditec, which manufactures tools for eye examinations, as well as medical lasers for laser eye surgery: https://www.zeiss.com/meditec/int/home.html. The company is listed under the stock ticker AFX_X.

You can find the detailed Nasdaq Data API instructions here: https://docs.data.nasdaq.com/docs/in-depth-usage

While there is a dedicated Python package for connecting to the Nasdaq API, we would prefer that you use the *requests* package, which can be easily downloaded using *pip* or *conda*. You can find the documentation for the package here: http://docs.python-requests.org/en/master/ 

Finally, apart from the *requests* package, you are encouraged to not use any third party Python packages, such as *pandas*, and instead focus on what's available in the Python Standard Library (the *collections* module might come in handy: https://pymotw.com/3/collections/).
Also, since you won't have access to DataFrames, you are encouraged to use Python's native data structures, preferably dictionaries, though some questions can also be answered using lists.
You can read more on these data structures here: https://docs.python.org/3/tutorial/datastructures.html

Keep in mind that the JSON responses you will be getting from the API map almost one-to-one to Python's dictionaries. Unfortunately, they can be very nested, so make sure you read up on indexing dictionaries in the documentation provided above.
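
As an illustration of that mapping, here is a small sketch that parses a made-up payload shaped like the response used later in this notebook (a `dataset_data` object holding `column_names` and `data`). The values are illustrative only, not real quotes:

```python
import json

# A made-up payload with the same nesting as the API response (values are illustrative).
raw = '''
{"dataset_data": {
    "column_names": ["Date", "Open", "Close"],
    "data": [["2017-12-29", 51.76, 51.76],
             ["2017-12-28", null, 51.6]]
}}
'''

payload = json.loads(raw)         # JSON object -> dict, array -> list, null -> None
inner = payload["dataset_data"]   # first level of nesting
first_row = inner["data"][0]      # one list of values per trading day
col = inner["column_names"].index("Close")  # position of the 'Close' column

print(first_row[col])               # closing price in the first row
print(inner["data"][1][1] is None)  # a JSON null arrives as Python None
```

Each extra pair of square brackets steps one level deeper, so reading the chain left to right mirrors the nesting of the JSON.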

In [4]:
# First, import the relevant modules
import requests
import json
import numpy as np

Note: APIs can change a bit with each version. For this exercise it is recommended to use the Nasdaq API at `https://data.nasdaq.com/api/v3/`. This is the same API that used to be Quandl's, so `https://www.quandl.com/api/v3/` should work too.

Hint: We are looking for the `AFX_X` data on the `datasets/FSE/` dataset.

## 1. Collect data from the Frankfurt Stock Exchange, for the ticker AFX_X, for the whole year 2017 (keep in mind that the date format is YYYY-MM-DD).


In [5]:
# create the url using variables 

database = 'FSE'
dataset = 'AFX_X'
start_date = '2017-01-01'
end_date = '2017-12-31'

# in the previous submission, I accidentally set the year for start_date and end_date to 2007, instead o

url = "https://data.nasdaq.com/api/v3/datasets" + "/"+database+'/'+dataset+'/data.json?start_date='+start_date+'&end_date='+end_date+'&api_key='+API_KEY
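
String concatenation works here because none of the values contain characters that need escaping. A slightly more defensive sketch, assuming a hypothetical helper named `build_url` and a placeholder key, lets the standard library encode the query string:

```python
from urllib.parse import urlencode

def build_url(database, dataset, api_key, **params):
    """Assemble a data.json URL; `params` holds optional filters such as start_date."""
    base = f"https://data.nasdaq.com/api/v3/datasets/{database}/{dataset}/data.json"
    query = urlencode({**params, "api_key": api_key})  # escapes values where needed
    return f"{base}?{query}"

# "example-key" is a placeholder, not a real API key.
url_example = build_url("FSE", "AFX_X", "example-key",
                        start_date="2017-01-01", end_date="2017-12-31")
print(url_example)
```

`requests.get` also accepts a `params=` dictionary and performs the same encoding, which avoids building the query string by hand.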

## 2. Convert the returned JSON object into a Python dictionary.

In [6]:
# get the data object and store to variable 'afx'

r = requests.get(url)
afx = json.loads(r.text)
# afx

The extracted JSON data is structured as a **nested** dictionary. The information we are interested in sits inside the main dictionary ('afx') under the key "dataset_data". Within this nested dictionary, we need to extract the values associated with the keys **'column_names'** and **'data'**.
<br/>Let's explore and retrieve these values.

In [7]:
np.shape(afx['dataset_data']['data']), np.shape(afx['dataset_data']['column_names'])

((255, 11), (11,))

The 'data' value is a 2D list with 255 sublists (one per trading day), each containing 11 elements, which matches the length of the 'column_names' list. We will proceed by creating a new Python dictionary called **'afx_data'**, using the column names as keys and assigning the list of that column's values from every record to each key.

In [8]:
# create an empty dictionary
afx_data = {}

# Iterate through the column names
for col in range(0,11):
    # Get the column name
    col_name = afx['dataset_data']['column_names'][col]
    # Create an empty list to store column values
    afx_data[col_name] = []
    
    # iterate through the rows
    for val in range(0,255):
        # append the column value to the corresponding list
        afx_data[col_name].append(afx['dataset_data']['data'][val][col])
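
The nested loop above turns a list of rows into a dictionary of columns. The same transposition can be sketched with `zip`, which also avoids hard-coding the number of rows and columns. The sample rows below are made up for illustration:

```python
# Made-up sample in the same layout as the API's 'column_names' and 'data'.
column_names = ["Date", "Open", "Close"]
rows = [
    ["2017-12-29", 51.76, 51.76],
    ["2017-12-28", None, 51.6],
    ["2017-12-27", 51.45, 51.82],
]

# zip(*rows) transposes rows into columns; each column arrives as a tuple,
# so it is converted to a list before being stored under its column name.
columns = {name: list(values) for name, values in zip(column_names, zip(*rows))}

print(columns["Close"])
```

Because `zip` stops at the shortest input, this version keeps working if the API returns a different number of rows.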

In [9]:
afx_data.keys()

dict_keys(['Date', 'Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume', 'Turnover', 'Last Price of the Day', 'Daily Traded Units', 'Daily Turnover'])

## 3. Calculate what the highest and lowest opening prices were for the stock in this period.


First let's check if the column 'Open' has the correct datatype:

In [10]:
# convert it to a np array and check the datatypes:
open_array = np.array(afx_data['Open'])
open_array.dtype

dtype('O')

There are different data types in the list / array. 
<br>We'll create a function called `get_dtype_counts()` that takes a list as input and returns a dictionary with all the data types present in the list along with their counts.

In [11]:
def get_dtype_counts(li):
    dt_types = {}
    for value in li:
        if type(value) not in dt_types:
            dt_types[type(value)] = 1
        else:
            dt_types[type(value)] += 1
        
    return dt_types

In [12]:
get_dtype_counts(afx_data['Open'])

{float: 252, NoneType: 3}
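
Since the `collections` module was suggested earlier, the same tally can be sketched with `collections.Counter`. The helper name `count_types` and the sample list are made up for illustration:

```python
from collections import Counter

def count_types(values):
    """Count how many items of each type appear in `values`, keyed by type name."""
    return Counter(type(v).__name__ for v in values)

# Made-up sample mixing floats and missing values.
sample = [34.0, None, 35.5, None, 36.1]
print(count_types(sample))
```

`Counter` handles the "first time seen" case internally, so the explicit `if`/`else` branch is no longer needed.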

The 'Open' column has the correct datatype (float), but it has ***3 missing values***.
<p>
Now we find the min and max values in the list and their indexes, skipping the missing values:

In [13]:
max_value = None
max_index = None
min_value = None
min_index = None

for index, value in enumerate(afx_data['Open']):
    if value is not None:
        if max_value is None or value > max_value:
            max_value = value
            max_index = index
        if min_value is None or value < min_value:
            min_value = value
            min_index = index

print("Max Value:", max_value)
print("Max Index:", max_index)
print("Min Value:", min_value)
print("Min Index:", min_index)

Max Value: 53.11
Max Index: 9
Min Value: 34.0
Min Index: 238


In [14]:
print(f"The highest opening price occurred on {afx_data['Date'][max_index]} and has the value {max_value}")
print(f"The lowest opening price occurred on {afx_data['Date'][min_index]} and has the value {min_value}")

The highest opening price occurred on 2017-12-14 and has the value 53.11
The lowest opening price occurred on 2017-01-24 and has the value 34.0
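
The explicit loop can also be expressed with the built-in `max` and `min` once the missing values are filtered out. A minimal sketch on made-up data (the dates and prices below are illustrative, not taken from the API):

```python
# Made-up sample: one opening price is missing.
dates = ["2017-01-05", "2017-01-04", "2017-01-03", "2017-01-02"]
opens = [35.2, None, 36.0, 34.9]

# Pair each price with its date and drop rows where the price is missing.
valid = [(price, date) for price, date in zip(opens, dates) if price is not None]

# Compare on the price only, so ties are not broken by the date string.
highest_open, highest_date = max(valid, key=lambda pair: pair[0])
lowest_open, lowest_date = min(valid, key=lambda pair: pair[0])

print(highest_date, highest_open)
print(lowest_date, lowest_open)
```

Keeping the date next to the price removes the need to track an index separately.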


## 4. What was the largest change in any one day (based on High and Low price)?


In [15]:
# Use the `get_dtype_counts()` to check the datatypes:
get_dtype_counts(afx_data['High']), get_dtype_counts(afx_data['Low']) 

({float: 255}, {float: 255})

Both columns are stored correctly: every value is a float and there are no missing values.

In [16]:
max_change = 0
for i in range(len(afx_data['High'])):
    if afx_data['High'][i] is not None and afx_data['Low'][i] is not None:
        delta = abs(afx_data['High'][i] - afx_data['Low'][i])
        if max_change < delta:
            max_change = delta
            index = i
            
print(f"The largest change occurred on {afx_data['Date'][index]} where the Highest value hit {afx_data['High'][index]} and the Lowest {afx_data['Low'][index]}.\nThe difference was {max_change}")  

The largest change occurred on 2017-05-11 where the Highest value hit 46.06 and the Lowest 43.25.
The difference was 2.8100000000000023
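
The trailing digits in `2.8100000000000023` come from binary floating-point arithmetic, not from the data. A short sketch of the same search that rounds the result for display follows. The 46.06 and 43.25 pair matches the output above, while the other rows are made up:

```python
# The middle row reuses the High/Low pair reported above; the others are made up.
dates = ["2017-05-12", "2017-05-11", "2017-05-10"]
highs = [45.0, 46.06, 44.5]
lows = [44.2, 43.25, 43.9]

# Build (range, date) pairs, skipping rows with a missing High or Low.
ranges = [
    (high - low, date)
    for high, low, date in zip(highs, lows, dates)
    if high is not None and low is not None
]

largest_range, range_date = max(ranges)  # tuples compare on the range first
print(range_date, round(largest_range, 2))
```

Rounding only at print time keeps the full precision available for any later calculation.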


## 5. What was the largest change between any two days (based on Closing Price)?


In [17]:
# check for datatypes and missing values

get_dtype_counts(afx_data['Close'])

{float: 255}

All values are floats, and there are no missing values in the 'Close' column.

In [18]:
closing_prices = afx_data['Close']

# np.diff() gives the differences between consecutive rows, and np.abs() takes their absolute values
diffs = np.abs(np.diff(closing_prices))

# np.argmax() gives the index of the largest change
max_change_index = np.argmax(diffs)
max_change = diffs[max_change_index]

# the rows are ordered newest first, so row max_change_index+1 is the earlier of the two days
print(f"The largest change between any two days, based on the Closing Price, occurred when the price \nchanged from ${afx_data['Close'][max_change_index+1]} on {afx_data['Date'][max_change_index+1]} to ${afx_data['Close'][max_change_index]} on {afx_data['Date'][max_change_index]} \nwith an absolute difference of ${max_change}")

The largest change between any two days, based on the Closing Price, occurred when the price 
changed from $44.37 on 2017-08-08 to $41.81 on 2017-08-09 
with an absolute difference of $2.559999999999995
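
The same day-over-day comparison can be sketched without NumPy, in keeping with the standard-library suggestion earlier. It relies on the rows being ordered newest first, as the dates in the outputs above indicate. The 41.81 and 44.37 closes match the result above, while the surrounding rows are made up:

```python
# Newest first, like the API response. The outer two rows are made up.
dates = ["2017-08-10", "2017-08-09", "2017-08-08", "2017-08-07"]
closes = [41.9, 41.81, 44.37, 44.3]

# Pair every day with the previous trading day, which is the NEXT element
# in a newest-first list, and record the absolute change between them.
changes = [
    (abs(later - earlier), earlier_date, later_date)
    for later, earlier, later_date, earlier_date
    in zip(closes, closes[1:], dates, dates[1:])
]

largest, from_date, to_date = max(changes)
print(f"{from_date} -> {to_date}: {round(largest, 2)}")
```

Naming the two sides `earlier` and `later` makes the chronological direction explicit, which is easy to get backwards with index arithmetic.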


## 6. What was the average daily trading volume during this year?


In [19]:
get_dtype_counts(afx_data['Traded Volume'])

{float: 255}

The `Traded Volume` column has a **float** data type and ***no missing values***

In [20]:
# copy the trading volume column into a list, using a list comprehension to drop any None values
trading_volume = [x for x in afx_data['Traded Volume'] if x is not None]

# calculate the mean
avg_daily_vol = sum(trading_volume)/len(trading_volume)

print(f"The calculated average daily trading volume for AFX_X in the year 2017 is approximately {avg_daily_vol}")

The calculated average daily trading volume for AFX_X in the year 2017 is approximately 89124.33725490196
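
The standard library's `statistics` module can serve as a cross-check on the manual calculation. A minimal sketch on made-up volumes (not real trading data):

```python
from statistics import fmean

# Made-up volumes with one missing value.
volumes = [100.0, 250.0, None, 400.0]

# Drop missing values first: neither sum() nor fmean() accepts None.
clean = [v for v in volumes if v is not None]

manual_mean = sum(clean) / len(clean)
library_mean = fmean(clean)

print(manual_mean, library_mean)
```

Note that filtering changes the denominator: the mean is taken over days with a recorded volume, not over all calendar rows.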


## 7. (Optional) What was the median trading volume during this year? (Note: you may need to implement your own function for calculating the median.)

In [21]:
# create a function to find the median
def median(li):
    # sort a copy of the list
    li = sorted(li)
    
    # if the list length is odd, the element in the middle is the median
    if len(li) % 2 == 1:
        median_index = len(li) // 2
        return li[median_index]
    
    # if the length is even, the median is the average of the two elements in the center
    else:
        median_index = len(li) // 2
        return (li[median_index - 1] + li[median_index]) / 2

In [22]:
print(f"The median trading volume during 2017 was {median(trading_volume)}")

The median trading volume during 2017 was 76286.0
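
A hand-written median is easy to get wrong by one index, so it can be worth checking it against `statistics.median` on small inputs. The sketch below uses a hypothetical `my_median` with the same logic as the function above, plus made-up lists covering both the odd and the even case:

```python
from statistics import median as stdlib_median

def my_median(values):
    """Median of a non-empty list: middle element, or mean of the two middle elements."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Made-up inputs: one with an odd length, one with an even length.
odd = [5.0, 1.0, 3.0]
even = [4.0, 1.0, 3.0, 2.0]

print(my_median(odd), stdlib_median(odd))
print(my_median(even), stdlib_median(even))
```

If both columns of the printout agree for the odd and even cases, the indexing in the custom function is likely correct.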
