# How to download datasets from the IPython Shell in DataCamp
> This blog post shows how to download datasets from the DataCamp IPython Shell/Console in a web browser.

- toc: true
- branch: master
- badges: true
- comments: true
- author: Datacamp
- categories: [DataCamp, Dataset, Download, IPython Shell]
- image: images/datacamp.png
- hide: false
- search_exclude: true



> **TL;DR:** Use the Python [script](#full-code) to print a download link for the targeted dataset from the IPython Shell.
 

**Motivation**: While studying DataCamp courses, I often could not reproduce the course results in my local environment with the provided datasets, which was quite frustrating. I reached out to the support team, but the answers were not satisfactory. After hours of searching Stack Overflow ([here](https://stackoverflow.com/questions/31893930/download-csv-from-an-ipython-notebook) and [there](https://stackoverflow.com/questions/26497912/trigger-file-download-within-ipython-notebook)) and trying and failing with several scripts, I finally managed to download the dataset that the course was using.

I decided to share my workaround here. Hopefully, it helps anyone facing the same problem. <br>

I have summarized the process in the following steps:
- [**1.**](#quick-eda) Check the general info of the dataset.
- [**2.**](#save-the-dataset-to-the-cloud-storage) Put the download script in `script.py` to save the dataset.
- [**3.**](#encode-the-file-data-and-generate-an-html-link) Extract the `href` link and download it.

## Quick EDA

Run the following command to do a quick check on the data (here the exercise's DataFrame is called `df`):

```python
print(df.head())
```

```console
                   y
2013-01-01  1.624345
2013-01-02 -0.936625
2013-01-03  0.081483
2013-01-04 -0.663558
2013-01-05  0.738023
```

Then take a quick look with `df.info()`:

```python
df.info()
```

```console
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2013-01-01 to 2015-09-27
Freq: D
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   y       1000 non-null   float64
dtypes: float64(1)
memory usage: 55.6 KB
```

Okay, the dataset has 1000 rows of daily data (`Freq: D`) in a single `float64` column, `y`, indexed by date from 2013-01-01 to 2015-09-27. Note the first five values shown above so we can check them later, once the dataset is downloaded to our local environment.
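Rather than memorizing the values, you could record a small fingerprint of the DataFrame in the Shell and compare it after the download. The `fingerprint` helper below is a hypothetical name of my own, not a DataCamp or pandas function; it is a minimal sketch assuming a DataFrame with numeric columns, and the demo uses a synthetic stand-in rather than the course data.

```python
import pandas as pd


def fingerprint(df: pd.DataFrame) -> dict:
    """Hypothetical helper: summarize a DataFrame so two copies can be compared."""
    numeric = df.select_dtypes("number")
    return {
        "shape": df.shape,
        "first_index": str(df.index[0]),
        "last_index": str(df.index[-1]),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        # Column sums rounded to 6 decimals so tiny float differences are ignored
        "sums": {col: round(float(numeric[col].sum()), 6) for col in numeric.columns},
    }


# Demo on a synthetic stand-in for the course data (not the real dataset)
demo = pd.DataFrame(
    {"y": [0.0, 1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2013-01-01", periods=5, freq="D"),
)
print(fingerprint(demo))
```

In the Shell, `print(fingerprint(df))` records the summary; running the same call on the locally loaded copy should print the same dictionary. Note that this sketch does not record the index frequency.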

## Save the dataset to the cloud storage

Next, we will save the DataFrame as a CSV file in the working directory of the DataCamp session (its cloud storage). Paste the following code into `script.py` and run it.

![](./images/tempsnip.png)

```python
from pathlib import Path

# Full path of the CSV file in the current working directory
filename = Path.cwd() / "data.csv"
# Save the exercise's DataFrame `df` to that file
df.to_csv(filename)
```
<br>

Check the result by typing the following commands into the IPython Shell: <br>

```console
!pwd
!ls
```
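If shell escapes such as `!ls` are not available in your console, a pure-Python check with `pathlib` gives the same information. This is a minimal sketch; `check_saved_file` is a hypothetical helper name, and the demo writes a small throwaway CSV to a temporary directory instead of touching the course file.

```python
import tempfile
from pathlib import Path


def check_saved_file(path: Path) -> int:
    """Hypothetical helper: confirm the CSV exists and return its size in bytes."""
    if not path.is_file():
        raise FileNotFoundError(f"{path} was not written")
    size = path.stat().st_size
    print(f"{path.name}: {size} bytes in {path.parent}")
    return size


# Demo with a throwaway file; in the Shell you would pass `filename` instead
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "data.csv"
    demo_path.write_bytes(b",y\n2013-01-01,1.5\n")
    demo_size = check_saved_file(demo_path)
```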

![](./images/tempsnip2.png)


## Encode the file data and generate an HTML link

Next, run the following code to read the file back, encode it in Base64, and build the download link.

```python
import base64

# Read the CSV file saved in the previous step as raw bytes
with open(filename, "rb") as in_file:
    csv = in_file.read()
# print(csv)  # Uncomment this if you want to check the CSV content

# Base64-encode the bytes so they can be embedded in a data: URI
b64 = base64.b64encode(csv)
payload = b64.decode()

# HTML version of the link (the full script below returns it as an HTML object)
title = "Download CSV file"
html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
html = html.format(payload=payload, title=title, filename=filename.name)

# Print the link so it can be copied from the Shell output
print("data:text/csv;base64,{}".format(payload))
```
<br>

**Result:** <br>

![](./images/tempsnip3.png)

Copy the `data:text/csv;base64,...` link from the Shell output and paste it into the address bar of a new browser tab.
<br>


![](./images/tempsnip4.PNG)
<br>

**Voila**! That is the dataset that we want. <br>

![](./images/tempsnip5.PNG)
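The trick works because a `data:` URI (RFC 2397) carries the whole file inside the link: everything after `base64,` is just the CSV bytes in Base64. The sketch below, with hypothetical helper names `to_data_uri` and `from_data_uri`, shows the round trip using only the standard library; it assumes a reasonably small file, since very long links may be truncated or rejected by some browsers.

```python
import base64


def to_data_uri(data: bytes, mime: str = "text/csv") -> str:
    """Hypothetical helper: wrap raw bytes in a base64 data: URI."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def from_data_uri(uri: str) -> bytes:
    """Hypothetical helper: recover the raw bytes from a base64 data: URI."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URI")
    return base64.b64decode(payload)


csv_bytes = b",y\n2013-01-01,1.624345\n"
uri = to_data_uri(csv_bytes)
print(uri)
assert from_data_uri(uri) == csv_bytes  # the link holds the complete file
```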

## Check the downloaded dataset in your local Jupyter notebook

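Load the downloaded file with the following code (adjust the path to wherever you saved it): <br>

```python
import pandas as pd

# Parse the first column as a DatetimeIndex
df = pd.read_csv('./datasets/download.csv', parse_dates=True, index_col=0)
df = df.asfreq('D')  # Restore the daily frequency, which the CSV file does not store
df.info()
```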

Then preview the first rows: <br>

```python
df.head()
```

Results: <br>

![](./images/tempsnip6.PNG)

<br>

As you can see, the downloaded dataset matches what we saw in the [quick EDA](#quick-eda) section. You can now run your experiments locally without worrying about preprocessing the raw datasets.
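Writing a DataFrame to CSV keeps the dates and values but not the index frequency, which is why the loading code calls `asfreq`. The sketch below checks this on a small synthetic frame (three values copied from the head shown earlier, not the full course dataset), writing to an in-memory buffer instead of a file; the tolerance used for the float comparison is an arbitrary choice of mine.

```python
import io

import pandas as pd

# Synthetic stand-in with a daily DatetimeIndex, like the course data
original = pd.DataFrame(
    {"y": [1.624345, -0.936625, 0.081483]},
    index=pd.date_range("2013-01-01", periods=3, freq="D"),
)

# Round-trip through CSV text, as happens with the downloaded file
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=True, index_col=0)

print(restored.index.freq)  # the CSV text does not store the frequency
restored = restored.asfreq("D")  # restore the daily frequency

# Same dates and (within float tolerance) the same values
same_dates = list(restored.index.strftime("%Y-%m-%d")) == list(
    original.index.strftime("%Y-%m-%d")
)
same_values = len(restored) == len(original) and all(
    abs(a - b) < 1e-12 for a, b in zip(restored["y"], original["y"])
)
print(same_dates, same_values, restored.index.freqstr)
```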

## Full Code
Here is the full code of the post. Paste it into `script.py` and run it to print the download link for the targeted dataset.


```python
import base64
from pathlib import Path

from IPython.display import HTML


def create_download_link(df, title="Download CSV file", filename="data.csv"):
    # Save the DataFrame as a CSV file in the current working directory
    filepath = Path.cwd() / filename
    df.to_csv(filepath)

    # Read the file back as raw bytes
    with open(filepath, "rb") as in_file:
        csv = in_file.read()
    # print(csv)  # Uncomment this if you want to check the CSV content

    # Base64-encode the bytes so they fit in a data: URI
    payload = base64.b64encode(csv).decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filepath.name)

    # Print the link so it can be copied from the Shell output
    print("data:text/csv;base64,{}".format(payload))
    return HTML(html)


create_download_link(df)
df.info()
```
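If you only need the link, you could skip the intermediate file: `DataFrame.to_csv()` called without a path returns the CSV as a string. The `csv_data_uri` function below is a hypothetical variant of `create_download_link`, a minimal sketch that returns the data URI instead of an `HTML` object, so it also runs in a plain Python session.

```python
import base64

import pandas as pd


def csv_data_uri(df: pd.DataFrame) -> str:
    """Hypothetical variant: build the CSV data URI in memory, without a file."""
    csv_text = df.to_csv()  # no path argument, so the CSV is returned as a str
    payload = base64.b64encode(csv_text.encode("utf-8")).decode("ascii")
    return "data:text/csv;base64," + payload


# Demo on a tiny synthetic frame; in the Shell you would call csv_data_uri(df)
demo = pd.DataFrame({"y": [1.5, -0.5]}, index=["a", "b"])
link = csv_data_uri(demo)
print(link)
```

Unlike the script above, this variant does not build the `<a download=...>` tag; wrap the returned URI in the same template if you need a clickable link.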

That's it. Thank you for reading. If you find the blog post useful, please give the [GitHub blog repo](https://github.com/anhhaibkhn/Data-Science-selfstudy-notes-Blog) a star to show your support and share it with others. Also, please let me know in the comments section of the post if you have any questions.