### Resubmission

While reviewing my work I found some probable misinterpretations, which I have detailed below.
If an answer is still incorrect because I performed the ***incorrect*** calculation, rather than performing the calculation ***incorrectly***, could you please provide the intended interpretation of the question on the grading form?

Thank you.

# [Answers](#Exercises)

*The data returned from the API call is saved as `data.json` and can be loaded to avoid making another call.*

 - Resubmission Review
   - #4, I believe this was a simple misread.
     - > What was the largest change in any one day (based on High and Low price)?
     - correction: high - low for each day, then max change
   - #5, possible misinterpretation. I'm still unsure of the correct way to read the question, so I tried two new approaches.
     - > What was the largest change between any two days (based on Closing Price)?
     - ~~correction 1: largest overall change in closing price for the year (range)~~
     - correction 2: largest change for consecutive closing prices (day+1 - day)
       - consecutive days and consecutive data points yielded the same answer
   - #7, again I'm unsure of my error. I tried treating the non-trading days as having `0` trading volume to change the median calculation.
     - > (Optional) What was the median trading volume during this year. (Note: you may need to implement your own function for calculating the median.)
     - previous try: sort daily trading volume data, take middle value for median
     - ~~correction: prepend sorted data with 0's such that data contains 365 trading volume data points, take middle value for median~~
     - correction 2: remove non-trading days (days without an opening price), reducing the data from 255 to 252 days.
       - there is still a discrepancy with the 251 trading days found from external sources for 2017



In [2]:
import json

with open('data.json', 'r') as file:
    struct = json.load(file)

# keys were saved as strings, not datetime objects; convert them back with
# datetime.strptime(<key>, '%Y-%m-%d')
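A minimal sketch of the save/load round trip, assuming `data.json` maps `'YYYY-MM-DD'` strings to the per-day dictionaries built in `struct` later in this notebook. `save_struct` and `load_struct` are hypothetical helper names. `json.dump` cannot serialize `datetime` keys, so they are converted to strings when saving and parsed back when loading.

```python
import json
from datetime import datetime

DATE_FORMAT = '%Y-%m-%d'

def save_struct(struct, path):
    """Write {datetime: {...}} to JSON, converting keys to 'YYYY-MM-DD' strings."""
    serializable = {day.strftime(DATE_FORMAT): values for day, values in struct.items()}
    with open(path, 'w') as file:
        json.dump(serializable, file)

def load_struct(path):
    """Read the JSON file back and restore the datetime keys."""
    with open(path, 'r') as file:
        raw = json.load(file)
    return {datetime.strptime(key, DATE_FORMAT): values for key, values in raw.items()}
```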

---

This exercise will require you to pull some data from https://data.nasdaq.com/ (formerly Quandl API).

As a first step, you will need to register a free account on the https://data.nasdaq.com/ website.

After you register, you will be provided with a unique API key that you should store:

*Note*: Use a `.env` file and put your key in there and `python-dotenv` to access it in this notebook. 

The code below uses a key that was used when generating this project but has since been deleted. Never submit your keys to source control. There is a `.env-example` file in this repository to illustrate what you need. Copy that to a file called `.env` and use your own API key in that `.env` file. Make sure you also have a `.gitignore` file with a line for `.env` added to it.

The standard Python gitignore is [here](https://github.com/github/gitignore/blob/master/Python.gitignore); you can simply copy it.
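A minimal sketch of reading the key with `python-dotenv`, as the note above suggests. The variable name `NASDAQ_API_KEY` is hypothetical (use whatever name your `.env-example` defines), and `load_api_key` is a hypothetical helper. If `python-dotenv` is not installed, the sketch falls back to the plain process environment.

```python
import os

def load_api_key(var_name='NASDAQ_API_KEY', dotenv_path='.env'):
    """Return the API key from a .env file or the environment.

    NASDAQ_API_KEY is a hypothetical variable name; match your .env-example.
    """
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package
        # Copies KEY=value pairs from .env into os.environ; variables that are
        # already set are not overwritten by default.
        load_dotenv(dotenv_path)
    except ImportError:
        pass  # python-dotenv not installed: rely on the environment only
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f'{var_name} is not set; add it to {dotenv_path}')
    return key

# API_KEY = load_api_key()
```

Raising an error when the key is missing keeps a failed lookup from turning into a confusing HTTP error later.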

In [168]:
print(API_KEY)

...


*I removed my key prior to submission. It is not included in a `.env` file in the repo. Please use your own key when evaluating my work. Thanks.*

In [79]:
API_KEY = '...' # KEY REMOVED BEFORE SUBMISSION

In [1]:
# First, import the relevant modules
import requests
import json

Note: APIs can change a bit with each version. For this exercise it is recommended to use the Nasdaq API at `https://data.nasdaq.com/api/v3/`. This is the same API that used to be Quandl, so `https://www.quandl.com/api/v3/` should work too.

Hint: We are looking for the `AFX_X` data on the `datasets/FSE/` dataset.

In [3]:
# Now, call the Nasdaq API and pull out a small sample of the data (only one day) to get a glimpse
# into the JSON structure that will be returned

In [3]:
URL = f'https://data.nasdaq.com/api/v3/datasets/FSE/AFX_X.json?start_date=2020-11-30&end_date=2020-12-01&api_key={API_KEY}'
r = requests.get(URL)
r.raise_for_status()  # fail loudly instead of leaving `data` undefined
data = r.json()
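The cell above pastes the key directly into an f-string URL. A minimal sketch of building the same URL with `urllib.parse.urlencode` from the standard library, so every parameter is encoded consistently. `build_dataset_url` and `BASE_URL` are hypothetical names, and the endpoint pattern is taken from the URL used above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://data.nasdaq.com/api/v3/datasets'

def build_dataset_url(database, dataset, api_key, **params):
    """Build a dataset URL with the query parameters encoded; api_key goes last."""
    query = urlencode({**params, 'api_key': api_key})
    return f'{BASE_URL}/{database}/{dataset}.json?{query}'

url = build_dataset_url('FSE', 'AFX_X', 'YOUR_KEY',
                        start_date='2020-11-30', end_date='2020-12-01')
```

Passing `params={...}` to `requests.get` performs the same encoding if you prefer to let `requests` build the query string.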

In [4]:
# Inspect the JSON structure of the object you created, and take note of how nested it is,
# as well as the overall structure
data

{'dataset': {'id': 10095370,
  'dataset_code': 'AFX_X',
  'database_code': 'FSE',
  'name': 'Carl Zeiss Meditec (AFX_X)',
  'description': 'Stock Prices for Carl Zeiss Meditec (2020-11-02) from the Frankfurt Stock Exchange.<br><br>Trading System: Xetra<br><br>ISIN: DE0005313704',
  'refreshed_at': '2020-12-01T14:48:09.907Z',
  'newest_available_date': '2020-12-01',
  'oldest_available_date': '2000-06-07',
  'column_names': ['Date',
   'Open',
   'High',
   'Low',
   'Close',
   'Change',
   'Traded Volume',
   'Turnover',
   'Last Price of the Day',
   'Daily Traded Units',
   'Daily Turnover'],
  'frequency': 'daily',
  'type': 'Time Series',
  'premium': False,
  'limit': None,
  'transform': None,
  'column_index': None,
  'start_date': '2020-11-30',
  'end_date': '2020-12-01',
  'data': [['2020-12-01',
    112.2,
    112.2,
    111.5,
    112.0,
    None,
    51.0,
    5703.0,
    None,
    None,
    None],
   ['2020-11-30',
    111.0,
    113.6,
    111.0,
    112.1,
    None,
   

These are your tasks for this mini project:

1. Collect data from the Frankfurt Stock Exchange, for the ticker AFX_X, for the whole year 2017 (keep in mind that the date format is YYYY-MM-DD).
2. Convert the returned JSON object into a Python dictionary.
3. Calculate what the highest and lowest opening prices were for the stock in this period.
4. What was the largest change in any one day (based on High and Low price)?
5. What was the largest change between any two days (based on Closing Price)?
6. What was the average daily trading volume during this year?
7. (Optional) What was the median trading volume during this year. (Note: you may need to implement your own function for calculating the median.)

----

The data comes as a list for each day under the `"data"` key. Each list contains, in order (see the `"column_names"` key):
 - date "YYYY-MM-DD"
 - Opening price
 - High
 - Low
 - Close
 - Change
 - Traded Volume
 - Turnover
 - Last Price of the Day
 - Daily Traded Units
 - Daily Turnover
```
  ['2020-11-30',
    111.0,
    113.6,
    111.0,
    112.1,
    None,
    315.0,
    35111.5,
    None,
    None,
    None]
```
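Rather than tracking column positions by hand, each row can be paired with `"column_names"`. A short sketch using the sample row above, hard-coded here so the snippet runs on its own:

```python
column_names = ['Date', 'Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume',
                'Turnover', 'Last Price of the Day', 'Daily Traded Units', 'Daily Turnover']
row = ['2020-11-30', 111.0, 113.6, 111.0, 112.1, None, 315.0, 35111.5, None, None, None]

# Pair each value with its column name so fields can be read by name
record = dict(zip(column_names, row))
print(record['Close'], record['Traded Volume'])
```

This prints `112.1 315.0`. With the live response the same line would be `dict(zip(r['dataset']['column_names'], row))`.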

----
# Exercises

1. Collect data from the Frankfurt Stock Exchange, for the ticker AFX_X, for the whole year 2017 (keep in mind that the date format is YYYY-MM-DD).
2. Convert the returned JSON object into a Python dictionary.

*dataset collected in cell below, individual column data extracted into lists later*

*`json` and `requests` packages imported above*

In [6]:
call_params = 'start_date=2017-01-01&end_date=2017-12-31&order=asc' # collapse to daily has no effect on returned data
key = f'&api_key={API_KEY}'

URL = f'https://data.nasdaq.com/api/v3/datasets/FSE/AFX_X.json?{call_params}{key}'
r = requests.get(URL)
r.raise_for_status()  # stop here if the request failed
r = r.json()

In [7]:
r['dataset'].keys()

dict_keys(['id', 'dataset_code', 'database_code', 'name', 'description', 'refreshed_at', 'newest_available_date', 'oldest_available_date', 'column_names', 'frequency', 'type', 'premium', 'limit', 'transform', 'column_index', 'start_date', 'end_date', 'data', 'collapse', 'order', 'database_id'])

In [22]:
# data
# github notebook display does not allow for scrollable outputs, I will comment out long outputs prior to submission

In [9]:
for i,val in enumerate(r['dataset']['column_names']):
    print(i,val,'\t~\t', r['dataset']['data'][0][i])

0 Date 	~	 2017-01-02
1 Open 	~	 34.99
2 High 	~	 35.94
3 Low 	~	 34.99
4 Close 	~	 35.8
5 Change 	~	 None
6 Traded Volume 	~	 44700.0
7 Turnover 	~	 1590561.0
8 Last Price of the Day 	~	 None
9 Daily Traded Units 	~	 None
10 Daily Turnover 	~	 None


In [10]:
data = r['dataset']['data']

In [11]:
len(data)

255

In [12]:
# the 'Change' column is mostly None, so skip it
# 3 opening prices are None; skip those for now so max()/min() work on the list

opens = []
closes = []
highs = []
lows = []
vols = []
# changes = []

for datum in data:
    if datum[1] is not None:
        opens.append(datum[1])
    highs.append(datum[2])
    lows.append(datum[3]) 
    closes.append(datum[4])
    # changes.append(datum[5])
    vols.append(int(datum[6]))

In [13]:
# opens was filtered above and vols was cast to int, so those two counts are 0 by construction
print('Missing opens:', opens.count(None), 'out of', len(opens))
print('Missing highs:', highs.count(None), 'out of', len(highs))
print('Missing lows:', lows.count(None), 'out of', len(lows))
print('Missing closes:', closes.count(None), 'out of', len(closes))
print('Missing vols:', vols.count(None), 'out of', len(vols))

Missing opens: 0 out of 252
Missing highs: 0 out of 255
Missing lows: 0 out of 255
Missing closes: 0 out of 255
Missing vols: 0 out of 255


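According to [Wikipedia](https://en.wikipedia.org/wiki/Trading_day#:~:text=after%20Thanksgiving%20Day), there were 251 trading days in 2017, although that count appears to refer to US exchanges (the linked passage mentions Thanksgiving), so it may not match the Frankfurt calendar.
I am not sure how to interpret the missing or extra dates; my guess is that only the days with an opening price were real trading days.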
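One way to make the comparison concrete is to list the weekdays in 2017 that have no row. 2017 has 260 weekdays (it starts and ends on a Sunday). A minimal sketch using only `datetime`; the helper names are hypothetical, and the sketch ignores exchange holidays, so each reported date still has to be checked against the Frankfurt holiday calendar.

```python
from datetime import date, timedelta

def weekdays_in_year(year):
    """Return every Monday-Friday date in the given year."""
    day = date(year, 1, 1)
    days = []
    while day.year == year:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(day)
        day += timedelta(days=1)
    return days

def missing_weekdays(row_dates, year):
    """Weekdays in `year` with no row in `row_dates` ('YYYY-MM-DD' strings)."""
    present = set(row_dates)
    return [d for d in weekdays_in_year(year) if d.isoformat() not in present]

# missing_weekdays([datum[0] for datum in data], 2017)
```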

In [14]:
# Structured Data
from datetime import datetime, date, timedelta

struct = {}

for datum in data:
    stamp = datetime.strptime(datum[0],'%Y-%m-%d')
    struct[stamp] = {
        'open':datum[1],
        'high':datum[2],
        'low':datum[3],
        'close':datum[4],
        'volume':int(datum[6])
    }
    if not datum[1]:
        print(stamp.strftime('%x'), 'null opening value. Was this a trading day?')
        print(struct[stamp])

04/14/17 null opening value. Was this a trading day?
{'open': None, 'high': 42.48, 'low': 41.985, 'close': 42.2, 'volume': 88416}
04/17/17 null opening value. Was this a trading day?
{'open': None, 'high': 42.48, 'low': 41.985, 'close': 42.2, 'volume': 88416}
05/01/17 null opening value. Was this a trading day?
{'open': None, 'high': 42.245, 'low': 41.655, 'close': 41.72, 'volume': 86348}


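*These three dates fall on Good Friday, Easter Monday, and Labour Day in 2017, which are normally holidays on the Frankfurt exchange, and the 04/14 and 04/17 rows are identical. These days will be removed prior to the median calculation for Q7.*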
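In the output above, the 04/14 and 04/17 rows have identical high, low, close, and volume values, which looks like a row carried forward from an earlier day rather than real trading. A minimal sketch for flagging such rows in the raw API data; `price_fields` and `find_carried_forward` are hypothetical helpers, and an exact repeat is only a hint, not proof, that no trading took place.

```python
def price_fields(row):
    """High, Low, Close and Traded Volume, by API column position."""
    return (row[2], row[3], row[4], row[6])

def find_carried_forward(rows):
    """Dates whose price/volume fields exactly repeat the previous row's.

    `rows` are raw API rows in ascending date order:
    [date, open, high, low, close, change, traded_volume, ...]
    """
    return [
        curr[0]
        for prev, curr in zip(rows, rows[1:])
        if price_fields(prev) == price_fields(curr)
    ]

# find_carried_forward(data)  # data = r['dataset']['data'] from above
```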

---
3. Calculate what the highest and lowest opening prices were for the stock in this period.

**The maximum opening price for 2017 was \\$53.11 and the minimum was \\$34.0.**

In [152]:
print(f'The maximum opening price for 2017 was ${max(opens)} and the minimum was ${min(opens)}.')

The maximum opening price for 2017 was $53.11 and the minimum was $34.0.


---
4. What was the largest change in any one day (based on High and Low price)?

**The largest change in 2017 based on high and low price for a day was \\$2.81**

In [153]:
day_change = [high - low for high, low in zip(highs, lows)]

In [154]:
day_change_bounds = (min(day_change),max(day_change))
day_change_bounds

(0.18999999999999773, 2.8100000000000023)

*The flat lists are fine here, since each high/low pair comes from the same date.*
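The flat lists answer "how large" but not "when". A small sketch that keeps the date alongside the range by working directly on the raw API rows; `widest_day` is a hypothetical helper and assumes the column layout listed earlier.

```python
def widest_day(rows):
    """Return (date, high - low) for the row with the largest intraday range.

    Assumes the API row layout: [date, open, high, low, ...].
    """
    best = max(rows, key=lambda row: row[2] - row[3])
    return best[0], best[2] - best[3]

# widest_day(data)  # data = r['dataset']['data'] from above
```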

---
5. What was the largest change between any two days (based on Closing Price)?

**The largest change between any two consecutive days was a drop of -\\$2.56, and the largest increase was \\$1.72.**

In [156]:

# two days? consecutive calendar days only, so Friday -> Monday and holiday gaps are skipped
by_date = []
for day in struct.keys():
    future = day+timedelta(days=1)
    if future in struct.keys():
        by_date.append(struct[future]['close'] - struct[day]['close'])
    

In [158]:
len(by_date), min(by_date), max(by_date)

(201, -2.559999999999995, 1.7199999999999989)
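The calendar-day loop above only pairs rows exactly one calendar day apart, so Friday-to-Monday moves and moves across holidays are skipped (hence 201 pairs rather than 254). A sketch that pairs consecutive rows instead and also reports the dates involved; `largest_close_moves` is a hypothetical helper that assumes `struct` maps `datetime` keys to dictionaries with a `'close'` entry, as built above.

```python
def largest_close_moves(struct):
    """Largest drop and largest rise between consecutive trading days.

    Pairs are formed from consecutive rows, so a Friday -> Monday move counts.
    Each result is a tuple (change, from_date, to_date).
    """
    days = sorted(struct)
    moves = [
        (struct[b]['close'] - struct[a]['close'], a, b)
        for a, b in zip(days, days[1:])
    ]
    return min(moves), max(moves)
```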

*Some other ideas below. . .*

In [None]:
# two points?
by_point = [closes[i+1]-closes[i] for i in range(len(closes)-1)]


In [157]:
len(by_point), min(by_point), max(by_point)

(254, -2.559999999999995, 1.7199999999999989)

In [155]:
# largest overall change
# **The largest change between any two days' Closing Prices was $19.03 (range of the closing prices over the year).**
close_change = max(closes)-min(closes)
close_change

19.03
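The range gives the size of the largest move between any two days, but not its direction. Under the reading "any earlier day to any later day", a single time-ordered scan reports the largest rise and the largest drop separately, and one of the two always equals the range. A minimal sketch of that one possible interpretation; `largest_ordered_moves` is a hypothetical helper.

```python
def largest_ordered_moves(closes):
    """Largest rise and largest drop from any earlier day i to any later day j.

    Tracks the running low and high, so it runs in O(n) instead of
    comparing every pair of days.
    """
    lowest = highest = closes[0]
    best_rise = best_drop = 0.0
    for price in closes[1:]:
        best_rise = max(best_rise, price - lowest)    # sell today after buying at the low so far
        best_drop = min(best_drop, price - highest)   # fall from the high so far
        lowest = min(lowest, price)
        highest = max(highest, price)
    return best_rise, best_drop

# largest_ordered_moves(closes)
```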

---
6. What was the average daily trading volume during this year?

The average trading volume during 2017 was **62,265** when dividing the year's total by all 365 calendar days. The average over only the 255 days with daily trading volume data was **89,124**.

In [18]:
avg_vol = sum(vols)/365
avg_vol2 = sum(vols)/len(vols)

print(avg_vol, avg_vol2, 'volume data for', len(vols), 'days')

62264.94794520548 89124.33725490196 volume data for 255 days
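Q7 drops the three rows without an opening price before taking the median, while the averages above use all rows (or all 365 days). A sketch that makes the same choice selectable for the mean, so Q6 and Q7 can be reported on the same basis; `average_volume` and its `trading_days_only` flag are hypothetical names operating on the raw API rows.

```python
def average_volume(rows, trading_days_only=True):
    """Mean 'Traded Volume' (column 6) over the API rows.

    With trading_days_only=True, rows without an opening price (column 1)
    are skipped, matching the Q7 interpretation of non-trading days.
    """
    volumes = [row[6] for row in rows if row[1] is not None or not trading_days_only]
    return sum(volumes) / len(volumes)

# average_volume(data), average_volume(data, trading_days_only=False)
```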


---
7. (Optional) What was the median trading volume during this year. (Note: you may need to implement your own function for calculating the median.)

The median trading volume during 2017 was **74,723.5** when considering only the 252 days with non-empty "Opening" values, which I interpreted as days with actual trading.

*Interpretation Two* - remove days without opening values, as they might not have been trading days. This leaves 252 values. 
Recall, according to [wikipedia](https://en.wikipedia.org/wiki/Trading_day#:~:text=after%20Thanksgiving%20Day), 
there were 251 trading days in 2017.

 - null opening values
   - 04/14/17
   - 04/17/17
   - 05/01/17
 - no duplicate dates were found in the data
 - could any other date be removed?
   - did not find any reason to do so

In [None]:
# duplicate date check

In [45]:
dates = struct.copy()
vols2 = vols.copy()

In [46]:
# no duplicate dates: dict keys are unique by construction, so compare the raw rows instead
len({datum[0] for datum in data}) == len(data)

True

In [47]:
# certain days can have the same trading volume
print(len(vols2), len(set(vols2)))

255 250


In [49]:
# find dates with "null" opening value, see above, and their indices
# remove from trading volume list and structured data

inds = {}
for null_day in ['2017-04-14','2017-04-17','2017-05-01']:
    dt = datetime.strptime(null_day,'%Y-%m-%d')
    inds[list(dates.keys()).index(dt)] = dt


In [53]:
list(inds.keys())[::-1]

[85, 75, 74]

In [54]:
print(len(vols2), 'days with volume data before clean')
print(len(dates.keys()), 'days with structured volume data before clean')

# perform in reverse order such that indices do not lose correspondence to dates
for ind in list(inds.keys())[::-1]:
    vols2.pop(ind)
    dates.pop(inds[ind])
    
print(len(vols2), 'days with volume data after clean')
print(len(dates.keys()), 'days with structured volume data after clean')

255 days with volume data before clean
255 days with structured volume data before clean
252 days with volume data after clean
252 days with structured volume data after clean


In [62]:
# consecutive repeats in volume data?

for i, val in enumerate(vols2[:-1]):
    if vols2[i+1] == val:
        print('repeated volume data for')
        print(list(dates.keys())[i], list(dates.keys())[i+1])

In [77]:
# sort and find median
# average the middle two values, since there is now an even number of days (252)
vols2.sort()
n = len(vols2)
print('median daily trading volume:', sum(vols2[n//2 - 1:n//2 + 1])/2)

median daily trading volume: 74723.5


---

*Interpretation One* - simple median of the dataset (255 values)

The median trading volume during 2017 was **76,600** when considering only the 255 days with trading volume data. 

In [159]:
sorted_vols = vols.copy()

In [160]:
sorted_vols.sort()

In [161]:
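# round(255/2) == round(127.5) == 128 (Python rounds halves to even),
# while the middle of 255 sorted values is index 127 (255 // 2)
print('median daily trading volume:', sorted_vols[round(len(sorted_vols)/2)])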

median daily trading volume: 76600
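The question asks for a hand-written median, so here is a general sketch that handles both odd and even counts. For 255 values the middle element is at index `255 // 2 == 127`; `round(255 / 2)` evaluates to 128 in Python 3 because `round` sends halves to the nearest even integer, so integer division is the safer way to locate the middle. The standard library's `statistics.median` can serve as a cross-check.

```python
def median(values):
    """Median of a list of numbers, without using the statistics module."""
    if not values:
        raise ValueError('median of empty data')
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:  # odd count: the single middle element
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle elements

# median(vols) for all 255 rows; median(vols2) for the 252 rows with an opening price
```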


## Incorrect
**4**, **5**, and **7** were marked incorrect on the first submission. I believe I simply misinterpreted **4** and **5**. The original submissions, with added notes, are below:

***I thought this meant the day-to-day change for highs as one group and for lows as another, not high vs. low for a given day.***

---
4. What was the largest change in any one day (based on High and Low price)?

**The largest change in 2017 based on high price was -\\$2.81 and was -\\$3.44 based on low price.**

*I used the API to return price differences instead of calculating them from the original dataset. The largest change considered both negative and positive changes, and in this case they were both negative values.*

In [12]:
# column_index=2 is High and column_index=3 is Low (see the column listing above)
def fetch_daily_diff(column_index):
    call_params = f'start_date=2017-01-01&end_date=2017-12-31&order=asc&collapse=daily&transform=diff&column_index={column_index}'
    key = f'&api_key={API_KEY}'
    URL = f'https://data.nasdaq.com/api/v3/datasets/FSE/AFX_X.json?{call_params}{key}'
    resp = requests.get(URL)
    resp.raise_for_status()  # stop here if the request failed
    return resp.json()['dataset']['data']

data4a = fetch_daily_diff(2)  # High
data4b = fetch_daily_diff(3)  # Low

In [None]:
high_changes = [val[1] for val in data4a]
high_change_bounds = (min(high_changes),max(high_changes))

low_changes = [val[1] for val in data4b]
low_change_bounds = (min(low_changes),max(low_changes))

In [None]:
print(high_change_bounds) # (-2.81, 2.46)
print(low_change_bounds) # (-3.44, 1.61)
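The same day-to-day differences can be computed locally from the `highs` and `lows` lists, which avoids two extra API calls. A sketch, assuming `transform=diff` returns each value minus the previous row's value (so the result has one fewer element than the input); `diffs` is a hypothetical helper.

```python
def diffs(values):
    """Row-to-row differences values[i] - values[i-1]; one shorter than the input."""
    return [curr - prev for prev, curr in zip(values, values[1:])]

# high_diffs = diffs(highs)
# low_diffs = diffs(lows)
# (min(high_diffs), max(high_diffs)), (min(low_diffs), max(low_diffs))
```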

---

*any two days should mean the biggest overall change, not specified as any two-day* ***period***

---
5. What was the largest change between any two days (based on Closing Price)?

**The largest two-day change in 2017 based on closing price was -\\$3.15**



In [165]:
two_day_change_close = [closes[i+2]-closes[i] for i in range(len(closes)-2)]

In [166]:
two_day_change_bounds = (min(two_day_change_close),max(two_day_change_close))
print(two_day_change_bounds)

(-3.1499999999999986, 2.280000000000001)


---

***I think sorting and taking the middle value was the correct approach. I tried treating days without data as having `0` trading volume to get a different answer.***

---
7. (Optional) What was the median trading volume during this year. (Note: you may need to implement your own function for calculating the median.)

The median trading volume during 2017 was **76,600**

In [None]:
sorted_vols = vols.copy()
sorted_vols.sort()
print('median daily trading volume:', sorted_vols[round(len(sorted_vols)/2)])