# Warm-Up

Start by running the usual Library Import cell:

In [1]:
import matplotlib
%matplotlib inline
import numpy as np
import pandas as pd

## Load URLs from CSV

If you paid attention to the files in your repo, you might have noticed a `urls.csv` in the project. Open it in VS Code, review the URLs and maybe add a few of your choice.

Then load this CSV into a `urls_df` dataframe using Pandas:

In [2]:
urls_df = pd.read_csv('urls.csv')

urls_df

Unnamed: 0,url
0,https://www.lewagon.com
1,https://stackoverflow.com/questions/tagged/python
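
If a dataframe loaded from a CSV shows an `Unnamed: 0` column, it usually means the file's first column is a saved index with no header. Here is a minimal sketch of how `index_col=0` changes the result. It assumes the file has that shape, and the inline `csv_text` is only a stand-in for `urls.csv`, not its real content:

```python
import io

import pandas as pd

# Stand-in for urls.csv (assumption: the first column is an unnamed, saved index)
csv_text = """,url
0,https://www.lewagon.com
1,https://stackoverflow.com/questions/tagged/python
"""

# Default behavior: the unnamed first column is kept as a regular column
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the dataframe index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```

Either form works for the rest of this challenge, since only the `url` column is used.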


## Enrich Dataset with an API

Let's use the `fetch_metadata` function that we just implemented in the `opengraph.py` file.

First let's import it and make sure that it works in the Notebook. 

1. Write the relevant `from ... import ...` line
1. Call `fetch_metadata` on a URL of your choice. You can type `fetch_` then press `<TAB>` to autocomplete, then `<SHIFT> + <TAB>` to view the docstring from your Python file!

In [3]:
from opengraph import fetch_metadata

fetch_metadata(urls_df['url'][0])

{'title': 'Coding Bootcamp | Le Wagon', 'desc': 'Coding Bootcamp | Le Wagon'}

Iterate over the `urls_df` dataframe to add `title` and `description` columns for each URL.

<details>
  <summary>🆘 Hint</summary>

  <p>Have a look at today's Lecture: you can start by copy/pasting what we did for <code>tracks_df</code> and adapting the code.</p>

</details>

In [4]:
urls_df['title'] = ''
urls_df['description'] = ''
for idx, row in urls_df.iterrows():
    metadata = fetch_metadata(row['url'])
    # Write with .loc: chained indexing (urls_df['title'][idx] = ...) may assign to a copy
    urls_df.loc[idx, 'title'] = metadata['title']
    urls_df.loc[idx, 'description'] = metadata['desc']
    
urls_df

Unnamed: 0,url,title,description
0,https://www.lewagon.com,Coding Bootcamp | Le Wagon,Coding Bootcamp | Le Wagon
1,https://stackoverflow.com/questions/tagged/python,Newest 'python' Questions - Stack Overflow,Newest 'python' Questions - Stack Overflow
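
The same enrichment can also be done without an explicit row loop. The sketch below is an alternative, not the Lecture's approach: it collects all metadata first, then assigns both columns at once. `fake_fetch_metadata` is a hypothetical stand-in that makes no network call, and the sketch assumes every call returns a dict with `title` and `desc` keys:

```python
import pandas as pd


def fake_fetch_metadata(url):
    """Hypothetical stand-in for fetch_metadata (no HTTP request is made)."""
    return {"title": f"Title of {url}", "desc": f"Description of {url}"}


urls_df = pd.DataFrame({
    "url": [
        "https://www.lewagon.com",
        "https://stackoverflow.com/questions/tagged/python",
    ]
})

# One call per URL, collected as a list of dicts
metadata = [fake_fetch_metadata(url) for url in urls_df["url"]]

# Build a dataframe aligned on the same index, then copy the columns over
metadata_df = pd.DataFrame(metadata, index=urls_df.index)
urls_df["title"] = metadata_df["title"]
urls_df["description"] = metadata_df["desc"]

print(urls_df)
```

If the real function can return `None` for some URLs, those rows would need to be handled before building `metadata_df`.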


## Check your code!

Run the cell below to check your code:

In [5]:
from nbresult import ChallengeResult

result = ChallengeResult('warmup',
    df_columns=urls_df.columns,
)
result.write()
print(result.check())


platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 -- /bin/python3
cachedir: .pytest_cache
rootdir: /home/quantium/labs/lewagon/data-challenges/02-Data-Toolkit/02-Data-Sourcing/00-Warmup/tests
plugins: dash-1.19.0
collecting ... collected 1 item

test_warmup.py::TestWarmup::test_dataframe_has_new_columns PASSED        [100%]



💯 You can commit your code:

git add tests/warmup.pickle

git commit -m 'Completed warmup step'

git push origin master



## (Optional) Autoreload

Today's Lecture introduced [`autoreload`](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html) in the notebook. Let's experiment with it!

Run the following cell. It should return `True` if your function returns `None` when a website is not found.

In [16]:
fetch_metadata("https://www.a.com") is None

%load_ext autoreload
%autoreload 2

Open VS Code and change the behavior of the function so that it returns an empty string `""` rather than `None` when the HTTP response status is anything other than `200`. Save the file, then re-run the cell above.

Do you see anything change? No? That's normal! The kernel imported `opengraph.py` once, and it keeps that first version of the `fetch_metadata` code in memory.
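
To see this caching outside of a notebook, here is a minimal sketch in plain Python. It writes a throwaway module named `demo_opengraph` (a hypothetical name, not your real `opengraph.py`) to a temporary folder, edits it on disk, and shows that the change only becomes visible after an explicit `importlib.reload`:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo free of cached .pyc files

# Create a throwaway module on disk (hypothetical name: demo_opengraph)
workdir = pathlib.Path(tempfile.mkdtemp())
module_file = workdir / "demo_opengraph.py"
module_file.write_text("def fetch():\n    return None\n")

sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
import demo_opengraph

first = demo_opengraph.fetch()  # result from the first version of the file

# Edit the file on disk, as you would in VS Code
module_file.write_text("def fetch():\n    return ''\n")
stale = demo_opengraph.fetch()  # the already-imported version still runs

# Reloading re-executes the file and updates the module in memory
importlib.reload(demo_opengraph)
fresh = demo_opengraph.fetch()

print(repr(first), repr(stale), repr(fresh))
```

The `autoreload` extension automates this kind of reload for you before each cell runs.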

---

OK, let's change the `fetch_metadata` code in VS Code back to returning `None`.

Then, add the following two lines to your first Notebook code cell:

```python
%load_ext autoreload
%autoreload 2
```

Then in the menu bar, go to `Kernel` > `Restart & Run all`.

---

Now that autoreload is enabled, go to VS Code, and once again change the behavior so that it returns an empty string. Re-run the code cell above. Do you get `False`? Good! That means that the Notebook is now monitoring changes to the files imported, like `opengraph.py`, and will reload them if the code within them changes!

### Conclusion

You might find jumping between the Notebook and VS Code confusing. Don't worry, you will get used to it. The Notebook is a perfect tool to experiment, to keep notes, to get graphical output of the data, etc. Still, the end goal of a Data Team is to **ship** something (a product, an API, a model, etc.), so productizing the code and refactoring it _out_ of the Notebook into proper Python modules is a critical skill that you will learn!