## Reading Data from a URL

## Objectives


<font size=4>

    In this section, we will:

        1. download a dataset (.zip) from a public website (https://archive.ics.uci.edu/ml/machine-learning-databases/00553/),

        2. extract the contents of the zip file,

        3. save the extracted csv file as an Excel file.

</font>

### First Part - Download the data

To do this, a Python module called **requests** is required. It is a third-party package, so it may or may not come with your installation. If it is missing, install it by running **!pip install requests** in a new cell.

In [None]:
import requests

In [None]:
# Provide the web link to the data
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00553/e-shop data and description.zip'

In [None]:
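# download the data into the computer's memory
r = requests.get(url)
# stop with an error if the download failed (for example, a 404 response)
r.raise_for_status()

# save it as jamiu_url.zip; the with block closes the file when done
with open('jamiu_url.zip', 'wb') as f:
    f.write(r.content)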

### Second Part - Unzip the downloaded file

To do this, a Python module called **zipfile** is required. It is part of the Python standard library, so it comes with every Python installation and does not need to be installed with pip.

In [None]:
import zipfile

In [None]:
# open the zip file as contents; the with block closes it when done
with zipfile.ZipFile('jamiu_url.zip') as contents:
    # then extract all the files into the working directory
    contents.extractall()
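
Before extracting, it can be useful to check what an archive contains. The sketch below builds a small throwaway archive in memory (the member names `demo.csv` and `notes.txt` are made up for illustration) and lists its members with `namelist()`. The same call should work on a `ZipFile` opened from `jamiu_url.zip`.

```python
import io
import zipfile

# Build a tiny archive in memory so the example runs without a download.
# The member names below are hypothetical, not the real dataset files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('demo.csv', 'a;b\n1;2\n')
    archive.writestr('notes.txt', 'description\n')

# Re-open the archive for reading and list its members without extracting
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()

print(names)
```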

Before we proceed to the last part of this exercise, let us look at the files in our working directory. To do that, we need a module called **os**. It is also part of the Python standard library, so it is always available and does not need to be installed with pip.

In [None]:
import os

In [None]:
# Now, use the listdir() function in os to list the contents of the working directory
os.listdir()

Also, let's have a look at the contents of the .txt file. This will show us how to open a text file and read its contents line by line.

In [None]:
file = open('e-shop clothing 2008 data description.txt')

In [None]:
lines = file.readlines()
# close the file now that its contents are in memory
file.close()

In [None]:
lines

### Note
<font size=3>The '\n' at the end of each entry is a newline character that we need to get rid of. To do that, we will use the replace method of strings.</font>

In [None]:
# let's replace the \n with an empty string '' and overwrite the original lines
lines = [item.replace('\n', '') for item in lines] #this is called list comprehension
#Now, let's see what lines looks like
lines

### Note
<font size=3>The '' entries that we see in the list are empty strings. They come from blank lines in the file, which contained nothing but the newline character we removed in the previous step. We can use the same idea as before to overwrite lines, but this time leaving out the '' entries.</font>

In [None]:
# use an if clause to conditionally remove '' from lines
lines = [item for item in lines if item != '']
#Finally, let's now see what lines look like
lines
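
The two clean-up steps can also be combined. As a sketch, the list comprehension below strips the trailing newline from each line and skips lines that end up empty, all in one pass. It uses a made-up list, `sample_lines`, in place of the real file contents.

```python
# Hypothetical stand-in for the list returned by readlines()
sample_lines = ['Title\n', '\n', 'first row\n', '\n', 'last row']

# rstrip('\n') removes the trailing newline; the if clause drops empty strings
cleaned = [line.rstrip('\n') for line in sample_lines if line.rstrip('\n') != '']

print(cleaned)
```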

### Last Part - Save the .csv file as an Excel file

To do this, a Python module called **pandas** is required. It is a third-party package, so it may or may not come with your installation. If it is missing, install it by running **!pip install pandas** in a new cell.

In [None]:
import pandas as pd

In [None]:
# use the read_csv function in pandas to load the csv file 
# and save it to a variable named data
data = pd.read_csv('e-shop clothing 2008.csv')
#Now take a look at the data
data

### Notes
This is not what we want: every row has been read into a single column. read_csv splits on commas by default, but the values in this file are separated by ";". Let's repeat the data loading with read_csv, this time using the **sep** keyword to specify the separator.

In [None]:
# same as before but now including sep =";"
data = pd.read_csv('e-shop clothing 2008.csv', sep=";")
#Now take a look at the data
data
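
To see the effect of **sep** in isolation, here is a minimal sketch that reads a small made-up piece of semicolon-separated text (the column names and values in `text` are illustrative only) once with the default separator and once with `sep=';'`, then compares the shapes of the two results.

```python
import io
import pandas as pd

# Made-up semicolon-separated text standing in for the real csv file
text = 'year;month;country\n2008;4;29\n2008;5;21\n'

# Default separator (comma): each row is read as a single column
one_column = pd.read_csv(io.StringIO(text))

# sep=';' splits each row at the semicolons
three_columns = pd.read_csv(io.StringIO(text), sep=';')

print(one_column.shape)
print(three_columns.shape)
```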

In [None]:
# rows are numbered from 0, so to check the entry in the 501st row under country, use
data.country[500]  # or data['country'][500]

In [None]:
# Finally, export the data as a .xlsx file without the index and starting from cell D3
# (pandas needs an Excel writer package such as openpyxl to be installed for this)
data.to_excel('jamiu_url.xlsx', index=False, startrow=2, startcol=3)