# Getting the free Springer eBooks 
Springer is offering some eBooks free of charge because of the COVID-19 pandemic. Instead of downloading and sorting them manually, you can use this notebook to do it for you.

Credit for spreading the word goes to [Volker Weber](https://vowe.net), who published a [blog post](https://vowe.net/archives/018485.html) about the offer.

## Import the needed stuff
We're going to work with web requests as well as Excel files, so we need to import a few libraries.

In [None]:
import os
import pandas as pd
import requests

## Small download function
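As we're about to download a couple hundred files from the web, we define a small function for that based on the `requests` library. It skips files that already exist locally and only saves a response if the server answers with status `200`.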

In [None]:
def get_stuff_from_web(url, filename):
    if not os.path.exists(filename):
        # get the content from the URL
        r = requests.get(url)
        # check if the status indicates OK
        if r.status_code == 200:
            # save under specified file name
            with open(filename,'wb') as f: 
                f.write(r.content)
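
For larger eBooks, holding the whole response in memory via `r.content` may be wasteful. The following is a minimal sketch of a streaming variant, not a replacement for the function above. The name `stream_to_file`, the 30 second timeout, the chunk size and the `.part` suffix are assumptions chosen for illustration.

```python
import os

import requests


def stream_to_file(url, filename, timeout=30, chunk_size=1 << 16):
    """Download `url` to `filename` in chunks; return True if a file was written.

    Hypothetical helper: name, timeout and chunk size are illustrative choices.
    """
    if os.path.exists(filename):
        # same behaviour as get_stuff_from_web: never overwrite an existing file
        return False
    part_name = filename + ".part"
    with requests.get(url, stream=True, timeout=timeout) as r:
        if r.status_code != 200:
            return False
        with open(part_name, "wb") as f:
            # write the body piece by piece instead of loading it all at once
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    # rename only after the download finished, so an interrupted run
    # does not leave a truncated file under the final name
    os.replace(part_name, filename)
    return True
```

The return value makes it possible to count how many files were actually fetched in a run.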

## Get the source files for our operation
The information about the freely available Springer eBooks is stored in two different `.xlsx` files on the Springer servers: one for the English eBooks and one for the German ones. We obtained the URLs from the blog post mentioned at the beginning, and we use our small download function to fetch both files.

In [None]:
ebooks_en_url = "https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v4"
ebooks_de_url = "https://resource-cms.springernature.com/springer-cms/rest/v1/content/17863240/data/v2"
  
get_stuff_from_web(url=ebooks_en_url, filename="ebooks_en.xlsx")
get_stuff_from_web(url=ebooks_de_url, filename="ebooks_de.xlsx")

## Check the downloaded files
Once the download is finished we should be able to see the files named `ebooks_en.xlsx` and `ebooks_de.xlsx`.

In [None]:
!ls
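
`!ls` only shows that the files exist. Because every `.xlsx` file is a ZIP archive, a quick check can also catch the case where something other than a spreadsheet was saved. This is a small sketch; `looks_like_xlsx` is a hypothetical helper name.

```python
import os
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` exists and is a ZIP container, as every .xlsx file is."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


for name in ("ebooks_en.xlsx", "ebooks_de.xlsx"):
    print(name, "ok" if looks_like_xlsx(name) else "missing or not an .xlsx file")
```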

## Getting the English eBooks
First we're going to fetch, name and sort the English eBooks. For that we need the `ebooks_en.xlsx` file, which we read with the `pandas` library. The initial step is to load the data.

In [None]:
df_en = pd.read_excel("ebooks_en.xlsx")

After loading the data we can have a first look at it via the `head()` function.

In [None]:
df_en.head()

It shows us the content of the loaded data, or more precisely its first `5` rows. We can see the rows and columns of the Excel file. For our purpose we're interested in the following columns:
* `Book Title` -> for defining the file name
* `DOI URL` -> for defining the correct download URL
* `Subject Classification` -> for sorting

We can also use the `describe()` function to see some stats about our data.

In [None]:
df_en.describe()

The most important part is the `count` value, which shows that `407` books are available for download.
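
By default, `describe()` summarizes only the numeric columns of a frame with mixed dtypes, so its `count` is an indirect way to count books. A more direct check can be sketched on a tiny stand-in frame. The rows below are invented placeholders, not entries from the Springer list, and `df_demo` is a hypothetical name.

```python
import pandas as pd

# stand-in data with the three columns used in this notebook (values are made up)
df_demo = pd.DataFrame(
    {
        "Book Title": ["Example Book A", "Example Book B", "Example Book C"],
        "DOI URL": ["http://doi.org/10.1007/000-a", None, "http://doi.org/10.1007/000-c"],
        "Subject Classification": ["Physics; Optics", "Biology", "Physics"],
    }
)

columns = ["Book Title", "DOI URL", "Subject Classification"]
# number of listed books
total_rows = len(df_demo)
# rows that have all three values needed for downloading and sorting
complete_rows = len(df_demo.dropna(subset=columns))
print(total_rows, complete_rows)
```

On the real data the same two expressions can be applied to `df_en`.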

## Download the actual eBooks
Now we're ready to download all the books referenced in our data.

**CAUTION:** You're about to download `~20GB` of data (including the German eBooks). Depending on your network connection this can take quite some time.

In [None]:
# we're iterating over each row
for index, row in df_en.iterrows():
    # we want to store the eBooks in a structured way
    # therefore we're going to use the first element in
    # the `Subject Classification` column which will 
    # define the folder in which we're going to store 
    # the eBook
    dir_name = row['Subject Classification'].split(";")[0]
    # create the folder; `exist_ok=True` makes this a no-op if it already exists
    os.makedirs(dir_name, exist_ok=True)
    # we need to put together our download URLs
    # for this we need two parts of the `DOI URL`
    # so we're splitting the URL by `/`
    doi_url = row['DOI URL'].split("/")
    # then we're generating the `.epub` URL
    url_epub = f"https://link.springer.com/download/epub/{doi_url[3]}%2F{doi_url[4]}.epub"
    # and the `.pdf` URL
    url_pdf = f"https://link.springer.com/content/pdf/{doi_url[3]}%2F{doi_url[4]}.pdf"
    # finally we're trying to download the eBook as `.epub` and `.pdf`
    file_name = row['Book Title'].replace(r'/', '-')
    get_stuff_from_web(url=url_epub, filename=dir_name+"/"+file_name+".epub")
    get_stuff_from_web(url=url_pdf, filename=dir_name+"/"+file_name+".pdf")
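
The loop above relies on the DOI prefix and suffix sitting at positions 3 and 4 after splitting the URL on `/`. The same construction can be sketched by parsing the URL explicitly. `build_download_urls` is a hypothetical helper, and it differs from the loop in one respect: a DOI suffix that itself contains a slash would be encoded in full rather than cut off.

```python
from urllib.parse import quote, urlparse


def build_download_urls(doi_url):
    """Return (epub_url, pdf_url) for a DOI URL such as http://doi.org/<prefix>/<suffix>."""
    # the path of the DOI URL is "/<prefix>/<suffix>"; strip the leading slash
    doi = urlparse(doi_url).path.lstrip("/")
    # percent-encode the whole DOI so the slash becomes %2F, as in the loop above
    encoded = quote(doi, safe="")
    epub = f"https://link.springer.com/download/epub/{encoded}.epub"
    pdf = f"https://link.springer.com/content/pdf/{encoded}.pdf"
    return epub, pdf
```

Inside the loop it could be called as `url_epub, url_pdf = build_download_urls(row['DOI URL'])`.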

## Getting the German eBooks
**NOTE:** If you're not interested in the German eBooks, you can stop here and start reading the books you've downloaded so far.

Now we're going to repeat the same steps for the German eBooks. The only difference is that the file names get a `_de` suffix.

In [None]:
df_de = pd.read_excel("ebooks_de.xlsx")

In [None]:
df_de.head()

In [None]:
df_de.describe()

In [None]:
for index, row in df_de.iterrows():
    dir_name = row['Subject Classification'].split(";")[0]
    os.makedirs(dir_name, exist_ok=True)
    doi_url = row['DOI URL'].split("/")
    url_epub = f"https://link.springer.com/download/epub/{doi_url[3]}%2F{doi_url[4]}.epub"
    url_pdf = f"https://link.springer.com/content/pdf/{doi_url[3]}%2F{doi_url[4]}.pdf"
    file_name = row['Book Title'].replace(r'/', '-')
    get_stuff_from_web(url=url_epub, filename=dir_name+"/"+file_name+"_de.epub")
    get_stuff_from_web(url=url_pdf, filename=dir_name+"/"+file_name+"_de.pdf")
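
The English and German loops differ only in the file name suffix. One way to avoid the duplication is to compute the target paths in a single helper. This is a sketch under assumptions: `target_paths` and the `Unsorted` fallback folder are hypothetical, rows are plain dicts (for example from `df_de.to_dict('records')`), and unlike the loops above it strips surrounding whitespace from the folder name.

```python
import os


def target_paths(row, suffix=""):
    """Return (epub_path, pdf_path) for one row of the eBook list."""
    classification = row.get("Subject Classification")
    # a missing cell shows up as NaN (a float) in pandas; fall back to a default folder
    if isinstance(classification, str) and classification.strip():
        dir_name = classification.split(";")[0].strip()
    else:
        dir_name = "Unsorted"
    # slashes would be read as path separators, so replace them as the loops above do
    file_name = row["Book Title"].replace("/", "-")
    base = os.path.join(dir_name, file_name + suffix)
    return base + ".epub", base + ".pdf"
```

Calling it with `suffix="_de"` reproduces the German naming scheme, and with the default empty suffix the English one.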