# Running reproducible code
This notebook guides you in creating code that reproduces the same results over and over.
Additionally, we are going to connect to our RestDB data.

In [10]:
import requests
import pandas as pd

* Enter your URL for the data set from RestDB below.
* Additionally, enter your API key.

In [15]:
restUrl = 'https://danielappdatabase-4448.restdb.io/rest/danielappcolletion'
api_key = '0baa4770b809c0273f37e5e3bc30399df8bea'

If your credentials above are correct and you execute the following cell, it should print:<br />
**<Response [200]>**

In [16]:
response = requests.get(restUrl, headers = { 'x-apikey': api_key })
print(response)

<Response [429]>


Next, we parse the response body as JSON and print only the first two records.

In [17]:
jsonValues = response.json()
print(jsonValues[:2])

TypeError: unhashable type: 'slice'

Now let's load them into a pandas DataFrame so that we have a more structured data set.

In [18]:
df = pd.DataFrame.from_dict(jsonValues)
df.head()

ValueError: If using all scalar values, you must pass an index
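
The outputs above are what a rate-limited request tends to look like: HTTP 429 means "Too Many Requests", so the body is presumably an error object rather than a list of records, which would explain the slicing and DataFrame errors. A minimal sketch of checking the response before parsing it is shown below. `records_from_response` is a hypothetical helper name, and the assumption that a successful call returns a JSON list is ours, not something RestDB guarantees here.

```python
def records_from_response(response):
    """Return the list of records from a response, or raise a clear error.

    `response` is any object with a `status_code` attribute and a `json()`
    method, such as the object returned by `requests.get`.
    """
    if response.status_code == 429:
        # Too Many Requests: the API is rate limiting us.
        raise RuntimeError("Rate limited (HTTP 429): wait before retrying.")
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with HTTP {response.status_code}")

    payload = response.json()
    # Assumption: a successful call returns a JSON list of records.
    if not isinstance(payload, list):
        raise ValueError("Expected a JSON list of records")
    return payload


# Usage in this notebook (needs the `response` from the cell above):
# jsonValues = records_from_response(response)
```

With a check like this, the notebook stops at the request itself instead of failing later with a less obvious error.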

## Getting the same results over and over again
Execute the following two cells. Each one draws 2 random records from the set. Are the results the same? Is this reproducible?

In [19]:
df.sample(n=2)

NameError: name 'df' is not defined

In [20]:
df.sample(n=2)

NameError: name 'df' is not defined

We want these results to be identical so that machine learning code behaves the same on every computer during development, and so that we can compare performance across computers and across models.
To achieve this, we set a random seed. It can be any number, as long as everyone uses the same one.

In [21]:
seed = 20200923

Now let us sample twice again, but with the random_state set to a fixed number.

In [22]:
df.sample(n=2, random_state=seed)

NameError: name 'df' is not defined

In [23]:
df.sample(n=2, random_state=seed)

NameError: name 'df' is not defined
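
Because the cells above could not run without `df`, here is a small self-contained sketch of the same idea. The toy DataFrame and its column names are made up for illustration and merely stand in for the RestDB records.

```python
import pandas as pd

# Toy data standing in for the RestDB records (values are made up).
toy = pd.DataFrame({"sensor": list("abcdefgh"), "value": range(8)})

seed = 20200923

# Two samples drawn with the same random_state select the same rows.
first = toy.sample(n=2, random_state=seed)
second = toy.sample(n=2, random_state=seed)

print(first.equals(second))  # True
```

Without `random_state`, each call draws fresh random rows, so two samples will usually differ.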

## Reproducible data set for training/development purposes
By connecting to our RestDB data set, we have created a data source that is not reproducible: the data set grows over time, and old records may be cleared.
When developing in a (data science) team, we want all developers to have the same data at all times. Therefore, we are going to include our data set in our project.

* Go back to your Google Sheet containing the data and choose 'File'->'Download'->'Comma separated values (.csv, current sheet)'
* Once the CSV file is downloaded, add it to your repository by placing it in the same folder as this notebook.
* Change the filename below to match your file.
* Run the cells to check whether it works.

In [None]:
csv_filename = 'ATDS - Architecture Technologies for data science - TheThingsNetwork data'

In [None]:
df = pd.read_csv(csv_filename)

In [None]:
df.head(2)
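
To confirm that every team member really has the same file, one option is to compare a checksum of the CSV. The sketch below uses only the standard library. `file_sha256` is a hypothetical helper name, not part of pandas.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage in this notebook (needs the file named in `csv_filename`):
# print(file_sha256(csv_filename))
```

If two developers see the same digest, they are working from byte-identical copies of the data.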

Now we all have the same data and use the same seed.
Every random sample drawn from the data set will be the same for all developers.

**Important: This random seed should not be used in production, as models will not perform as well with a fixed seed.**

In [None]:
df.sample(n = 2, random_state=seed)

In [None]:
df.sample(n = 2, random_state=seed)
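
One possible way to follow the note about production is to pick the seed based on the environment the code runs in. This is only a sketch: the environment variable name `APP_ENV` and the helper `resolve_seed` are hypothetical and are not defined anywhere in this project.

```python
import os

DEV_SEED = 20200923  # the fixed seed used during development


def resolve_seed(environ=os.environ):
    """Return the fixed seed, or None when APP_ENV is 'production'.

    Passing random_state=None to pandas lets it draw a fresh random
    sample on every call.
    """
    # Assumption: APP_ENV is how this project marks production runs.
    if environ.get("APP_ENV", "development") == "production":
        return None
    return DEV_SEED


# Usage in this notebook (needs `df` from the cells above):
# df.sample(n=2, random_state=resolve_seed())
```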