# Importing data from various sources

<br>
As a data scientist, you will work with data from many sources: databases, Excel sheets, CSV files, JSON documents and more. Some of it is unstructured media data such as spoken text, audio files and images. In this lesson we will see how to import data from the following sources:

* Text files
* Relational data - CSV and Excel sheets
* Objects - JSON
* Images - TIFF
* Audio - MP3/WAV
* XML & HTML
* Data Sources from the web

<img src="../../../images/data_handling-data_formats.PNG" width="50vw">

<br>
## 1. Text files:

Text files are very easy for a computer to read. They contain plain characters with only basic formatting, no structured relationships between records, and typically a small, fixed number of fields. When opening a text file, you specify the access mode through the *mode* argument. This might be ** *r* **, ** *w* ** or ** *a* ** for **read**, **write** and  **append** respectively. Let's say we have a text file *sample.txt* which has the following data: <br><br>
*
Welcome to Colaberry!<br>
You are currently in "DS in 100 days" course<br>
Hope you have a good learning experience.<br>
*

If we want to input this data from the text file:

``` python
# "with" closes the file automatically when the block ends
with open("sample.txt", "r") as text_file:
    lines = text_file.read()  # read the whole file into one string
print(lines)

# Output
>>> Welcome to Colaberry!
>>> You are currently in "DS in 100 days" course
>>> Hope you have a good learning experience.
```
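The three access modes described above can be tried in a short, self-contained sketch. The file name `demo.txt` and the temporary directory are assumptions made for illustration only; they are not part of the course files.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w" creates the file (or empties an existing one) before writing
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")

# "a" adds to the end of the file without touching existing content
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

# "r" reads; iterating over the file object yields one line at a time
with open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```

Reading line by line like this is useful for large files, because the whole file never has to be held in memory at once.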

### Exercise

Now open a new text file and type the following on separate lines as given: <br><br>
*
I have enrolled in "DS in 100 days" course <br>
I am liking the course contents! <br>
It's helping me learn Data Science in a easy way... <br>
*<br>
Save your text file as **my_sample.txt**. Now open this file, read your data and print it.

In [21]:
file = open("textfile.txt","w") 
 
file.write('I have enrolled in "DS in 100 days" course \n') 
file.write("I am liking the course contents! \n") 
file.write("It's helping me learn Data Science in a easy way...") 
 
file.close() 


In [22]:
f = open("textfile.txt","r")
text = f.readlines()
f.close()
for txt in text:
    print(txt)

I have enrolled in "DS in 100 days" course 

I am liking the course contents! 

It's helping me learn Data Science in a easy way...


### Solution code

```python
text_file = open("my_sample.txt", "r")
lines = text_file.read()
print(lines)
```

## 2. Excel sheets:

Microsoft Excel is a quick way for business analysts to provide data for engineers to enrich reporting. Excel files are a huge part of any business operation, so it is important to learn how to import them into Python for data analysis. Python's ** *pandas* ** library has a class called **ExcelFile()** which opens an Excel workbook; its **parse()** method then reads a sheet into a pandas *dataframe*.

``` python
import pandas as pd

excel_file = pd.ExcelFile("myfile.xls") 
print(excel_file.sheet_names) # printing out all the sheet names in the excel file
df = excel_file.parse('sheet1') # extracting data from the first sheet as a dataframe
df.head()
```

### Exercise

Now open an excel file in the following location **"../../../data/sample_excel.xls"**. Print the sheet names as well as the sheet contents. 

In [29]:
import pandas as pd

excel_file = pd.ExcelFile("https://github.com/colaberry/DSin100days/blob/master/data/sample_excel.xls?raw=true") 
print(excel_file.sheet_names) # printing out all the sheet names in the excel file
df = excel_file.parse('sheet1') # extracting data from the first sheet as a dataframe
df.head()

['sheet1']


|   | 1 | Eldon Base for stackable storage shelf, platinum | Muhammed MacIntyre | 3 | -213.25 | 38.94 | 35 | Nunavut | Storage & Organization | 0.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1.7 Cubic Foot Compact "Cube" Office Refrigera... | Barry French | 293 | 457.81 | 208.16 | 68.02 | Nunavut | Appliances | 0.58 |
| 1 | 3 | Cardinal Slant-D® Ring Binder, Heavy Gauge Vinyl | Barry French | 293 | 46.7075 | 8.69 | 2.99 | Nunavut | Binders and Binder Accessories | 0.39 |
| 2 | 4 | R380 | Clay Rozendal | 483 | 1198.971 | 195.99 | 3.99 | Nunavut | Telephones and Communication | 0.58 |
| 3 | 5 | Holmes HEPA Air Purifier | Carlos Soltero | 515 | 30.94 | 21.78 | 5.94 | Nunavut | Appliances | 0.5 |
| 4 | 6 | G.E. Longer-Life Indoor Recessed Floodlight Bulbs | Carlos Soltero | 515 | 4.43 | 6.64 | 4.95 | Nunavut | Office Furnishings | 0.37 |


### Solution code

```python
import pandas as pd

excel_file = pd.ExcelFile("../../../data/sample_excel.xls") 
print(excel_file.sheet_names) # printing out all the sheet names in the excel file
df = excel_file.parse('sheet1') # extracting data from the first sheet as a dataframe
df.head()
```

## 3. CSV

CSV (Comma Separated Values) files store tabular data as plain text, one record per line. They can contain mixed data types and are commonly used to transfer large amounts of data between programs and databases. There are multiple ways to import CSV into Python. The first method we'll look at uses the **csv** module, a versatile module that ships with the core Python install. Its **reader()** function reads the data row by row, giving each row as a list of strings that we can print. The second method, usually the more convenient one for analysis, is to import the file as a dataframe using the pandas **read_csv()** function.

* **Method 1:**

Using the CSV module

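```python
import csv

# newline="" lets the csv module handle line endings itself
with open("myfile.csv", newline="") as f:
    csv_file = csv.reader(f)
    for row in csv_file:
        print(row)  # each row is a list of strings
```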

* **Method 2:**

Using the pandas module

```python
import pandas as pd

df = pd.read_csv("myfile.csv")
df.head()
```
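One practical difference between the two methods is that the csv module returns every field as a string, while pandas tries to infer column types. The sketch below illustrates the csv module side using `csv.DictReader`, which maps each row to the header names. The sample data is made up, and `io.StringIO` stands in for an open file so the example runs without any file on disk.

```python
import csv
import io

# Made-up sample data standing in for the contents of a CSV file
raw = "name,team,points\nAda,Red,12\nLin,Blue,9\n"

# io.StringIO lets the csv module read from a string as if it were an open file
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)  # one dict per data row, keyed by the header names

# Every value is a string, so numeric columns must be converted explicitly
total_points = sum(int(row["points"]) for row in rows)

print(rows[0]["name"], total_points)
```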

### Exercise

Now open a csv file in the following location **"../../../data/NBA.csv"** and use pandas to import the data into Python. After importing, use *head()* function to print the rows in your dataframe.

### Solution code

```python
import pandas as pd

df = pd.read_csv("../../../data/NBA.csv")
df.head()
```

## 4. XML & HTML

### 4.1 XML

Another common format for exchanging data is XML. XML structures data so that it can be stored and transported. Older versions of pandas have no built-in function to import data from XML (pandas 1.3 and later add **read_xml()**), so here we parse the data with Python's standard XML package and then convert it to a pandas DataFrame. Let's see how to do this.

* First let's import data from an XML source

``` python
import requests

plant_catalog_url = "https://www.w3schools.com/xml/plant_catalog.xml"
xml_data = requests.get(plant_catalog_url).content
```

* Second let's see a function that converts XML to a dataframe

``` python
import xml.etree.ElementTree as ET
import pandas as pd

class convert_XML2DataFrame:

    def __init__(self, xml_data):
        self.root = ET.XML(xml_data)

    def parse_root(self, root):
        return [self.parse_element(child) for child in iter(root)]

    def parse_element(self, element, parsed=None):
        if parsed is None:
            parsed = dict()
        for key in element.keys():
            parsed[key] = element.attrib.get(key)
        if element.text and element.text.strip():  # skip whitespace-only text of container elements
            parsed[element.tag] = element.text
        for child in list(element):
            self.parse_element(child, parsed)
        return parsed

    def process(self):
        structured_data = self.parse_root(self.root)
        return pd.DataFrame(structured_data)

conversion = convert_XML2DataFrame(xml_data)
xml_df = conversion.process()
xml_df.head()
```
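For XML that is flat, with one child element per record and one sub-element per field, the same idea can be sketched in a few lines. The XML text below is made up for illustration and only mimics the shape of a simple catalog.

```python
import xml.etree.ElementTree as ET

# Made-up XML in the shape of a simple catalog: one child element per record
xml_text = """
<CATALOG>
  <PLANT><COMMON>Fern</COMMON><ZONE>4</ZONE></PLANT>
  <PLANT><COMMON>Moss</COMMON><ZONE>3</ZONE></PLANT>
</CATALOG>
""".strip()

root = ET.fromstring(xml_text)

# One dict per record: tag name -> text of each field element
records = [{field.tag: field.text for field in plant} for plant in root]

print(records)
```

A list of dicts like `records` can be passed directly to `pd.DataFrame(records)` to obtain one row per record. Note that all values are still strings at this point.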
<br>
### 4.2 HTML

As a data scientist, you will sometimes find that the raw data you need for a project only exists as a table inside a web page, and you don't want to spend time writing a custom script to scrape it. Luckily there's a simple way to do this in Python. The pandas library has a built-in function called **read_html()** that extracts tabular data from HTML pages and returns a list of dataframes, one per table found. It relies on an HTML parsing library such as lxml being installed.

``` python
import pandas as pd

tables = pd.read_html("myfile.html") # Scraping tabular data out of html file
df = tables[0] # Transforming tables to dataframe, here I am using the first table scraped
df.head()
```
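To get a feel for what extracting a table from HTML involves, here is a rough sketch that uses only the standard library. It is not how pandas implements **read_html()**; the `TableParser` class and the sample HTML are hypothetical and handle only a simple table without nested tags.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the cell text of every <tr> row in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # row currently being read
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Made-up HTML with a single two-column table
html = "<table><tr><th>city</th><th>pop</th></tr><tr><td>Oslo</td><td>700000</td></tr></table>"
parser = TableParser()
parser.feed(html)
print(parser.rows)
```

The first row holds the header cells and the remaining rows hold the data, which is the structure a dataframe needs.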

### Exercise

Now open a html file in the following location **"../../../data/sample_html.html"** and use pandas to import the data into Python. After importing, use *head()* function to print the rows in your dataframe.

### Solution code

```python
import pandas as pd

tables = pd.read_html("../../../data/sample_html.html")
df = tables[0]
df.head()
```

## 5. JSON

JavaScript Object Notation (JSON), like XML, is another common format for exchanging structured information over a network and sharing it across platforms. It is basically text with a *dict*-like structure, i.e. it stores data as ** *key:value pairs* **. The structure can range from simple to deeply nested. There are multiple ways to import JSON into Python.

* **Method 1:**

Python pandas can easily read JSON files using the **read_json()** function. 

``` python
import pandas as pd

df = pd.read_json("myfile.json")
df.head()
```

* **Method 2:**

There is also an alternate way of doing this. You can first read the data using **json** module, and then transform it into a dataframe.

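``` python
import json
import pandas as pd

with open('myfile.json', 'r') as f:
    data = json.load(f)  # parsed into Python dicts and lists

# Wrapping the parsed data in a single 'value' column
df = pd.DataFrame({'value': data})
df.head()
```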
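It helps to look at what the **json** module actually returns before building a dataframe from it. The following sketch uses a made-up JSON string (a list of objects) instead of a file, so `json.loads` is used in place of `json.load`.

```python
import json

# Made-up JSON text: a list of objects, one per customer
raw = '[{"id": 1, "name": "Ada", "city": "Pune"}, {"id": 2, "name": "Lin", "city": "Oslo"}]'

data = json.loads(raw)  # JSON array -> Python list, JSON object -> Python dict
names = [record["name"] for record in data]

print(type(data).__name__, type(data[0]).__name__, names)
```

When the parsed data is a list of flat dicts like this, each dict corresponds naturally to one row and each key to one column.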

### Exercise

Now open a json file in the following location **"../../../data/Customer.json"** and use pandas to import the data into Python. After importing, use *head()* function to print the rows in your dataframe.

### Solution code

```python
import pandas as pd

df = pd.read_json("../../../data/Customer.json")
df.head()
```

## 6. Images - TIFFs and PNGs

### TIFF

TIFF (Tagged Image File Format) is an image format like JPEG, PNG, Bitmap and GIF. Its main advantage over other formats is image quality: it can store images uncompressed or with lossless compression, which makes it a common choice for scanning and archiving. Python has an imaging library called **Pillow** (imported as **PIL**) that can process images. 

* First, let's see how to use this library to import images.

``` python
from PIL import Image

im = Image.open('myfile.tif')
im.show()
```
**Note:** The *.show()* function opens the image in your system's default image viewer.

* Next, let's convert this image to a numpy array for processing. It's as simple as:

``` python
import numpy as np

imarray = np.array(im)
imarray
```
This converts your image into numpy array values, which can be used for manipulating the image.

#### Exercise

Now open the TIFF file at **"../../../images/data_handling-sample_tiff.tiff"** and use Pillow to import the data into Python. After importing, convert the image into numpy array values and print the array.

### Solution code

```python
from PIL import Image

im = Image.open('../../../images/data_handling-sample_tiff.tif')
im.show()


import numpy as np

imarray = np.array(im)
imarray
```

### PNG

Portable Network Graphics (PNG) is an open image format that was created to replace the GIF (Graphics Interchange Format). It is the most widely used lossless image compression file format on the internet today. (Source: https://en.wikipedia.org/wiki/Portable_Network_Graphics)

### Viewing an image file using matplotlib

There are many libraries for reading, loading and analyzing images in Python. 'Pillow' is probably one of the most widely used. As seen in the section above, the 'open' function from the 'Image' module reads the image file, and the 'show()' method launches an image viewer in your local operating system to display it. If we would like to display the image within the Jupyter Notebook instead, we can do so by plotting the image data on a coordinate system. We can do this easily using the matplotlib library.

The 'image' sub-module within matplotlib contains a function called 'imread()', which reads an image file into an array. That array can then be passed to the 'imshow()' function from the 'pyplot' sub-module to display it as a plot. See below for a simple example:

An Example:
```python
# importing pyplot
import matplotlib.pyplot as plt
# importing image module from matplotlib
import matplotlib.image as mpimg
# Rendering all plots inline within the notebook
%matplotlib inline

# Reading the image file data into a variable
py_img=mpimg.imread('../../../data/python-logo.png')

# Showing/Displaying the image data as a plot
plt.imshow(py_img)

# Output
>>>
```

<img src="../../../images/py-logo.PNG" style="width:65vw">

#### Exercise

Display the image **"../../../data/tiny.png"** as a matplotlib plot. Refer to the example code above.

### Solution code

```python
# importing pyplot
import matplotlib.pyplot as plt
# importing image module from matplotlib
import matplotlib.image as mpimg
# Rendering all plots inline within the notebook
%matplotlib inline

# Reading the image file data into a variable
py_img=mpimg.imread('../../../data/tiny.png')

# Showing/Displaying the image data as a plot
plt.imshow(py_img)
```

### Constituents of an image

When you load an image into a numpy array and look at the values and structure that make up the image file, you will notice that the resulting array is (generally) a 3D array. The 3 dimensions of the image are height, width and depth.

* Height - height of the image
* Width - width of the image
* Depth - The color values of each pixel (3 for RGB; 4 if the image also has an alpha/transparency channel)

The 'shape' attribute of an ndarray object tells us the size of each dimension of an image. In a color image, each 2D array inside the 3D array is one line (row) of pixels, and each pixel is a set of 'RGB' values. The number of 'RGB' sets in that 2D array is the number of pixels in the line, i.e. the width of the image in pixels. The number of lines that make up the 3D array is then the height of the image in pixels.
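The relationship between the shape and the image dimensions can be checked on a tiny synthetic array. The 2 x 3 "image" below is made up for illustration; it is not one of the course images.

```python
import numpy as np

# A synthetic RGB image: 2 rows (height), 3 columns (width), 3 color values per pixel
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[0, 0] = [255, 0, 255]  # set the top-left pixel to magenta

height, width, depth = img.shape
print(height, width, depth)

# The first line of pixels: 3 pixels, each holding 3 color values
print(img[0])
```

Indexing with a single number (`img[0]`) gives one line of pixels, and indexing with two numbers (`img[0, 0]`) gives the color values of one pixel.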

#### Exercise

'tiny.png' is a 160 x 160 png image file. The image is that of a black circle on a magenta colored background.

* Load the image using the link '../../../data/tiny.png'
* Convert the image into an ndarray (Refer to above lesson) and print it out
* Print the shape of the ndarray and observe the dimensions

### Solution code

```python
from PIL import Image

im = Image.open('../../../data/tiny.png')
im.show()


import numpy as np

imarray = np.array(im)
print(imarray, imarray.shape)
```

### Manipulating image data - Recoloring

There are many ways in which an image can be manipulated and analyzed - mostly by playing with one or more of its dimensions. Resizing, rescaling, distorting (by changing aspect ratio - i.e. ratio of height to width), recoloring are all some of the more common manipulations of images. Once the image is loaded as a ndarray, manipulation is fairly easy.

For example, for the image 'tiny.png' which is an image of a black circle on a magenta background, we can easily manipulate it to show black circle on a white background. This can be simply achieved by converting all magenta colored pixels into white pixels.

Example:
```python
# ORIGINAL IMAGE

# Loading Image module from Pillow library
from PIL import Image

# Loading numpy for array manipulation of image data
import numpy as np

# importing pyplot
import matplotlib.pyplot as plt

# importing image module from matplotlib
import matplotlib.image as mpimg

# Rendering all plots inline within the notebook
%matplotlib inline

im = Image.open('../../../data/tiny.png') # Opening the image
imarray = np.array(im) # Converting image data into n-dimensional array

# Showing/Displaying the image data as a plot
plt.imshow(imarray)

# Output
>>>
```
<img src="../../../images/before-tiny.PNG" style="width:10vw">
<br>

```python
# IMAGE AFTER MANIPULATION (Re-coloring)

# Looping the 3D array to scan for RGB values of each pixel 
for i in range(0,len(imarray)):
    for j in range(0,len(imarray[i])):
        if (imarray[i][j] == np.array([255,0,255])).all(): # Verify if pixel has magenta color
            imarray[i][j] = np.array([255,255,255]) # Changing magenta pixel to white color
            
# Showing/Displaying the image data as a plot         
plt.imshow(imarray)

# Output
>>>
```
<img src="../../../images/after-tiny.PNG" style="width:10vw">
<br>
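The nested loops above visit every pixel one at a time. As an alternative sketch, the same recoloring can be expressed with a boolean mask, which is the more common numpy idiom and is much faster on large images. The 2 x 2 image below is synthetic, and the sketch assumes a 3-channel (RGB) array.

```python
import numpy as np

# Synthetic 2 x 2 RGB image: three magenta pixels and one black pixel
img = np.array([[[255, 0, 255], [0, 0, 0]],
                [[255, 0, 255], [255, 0, 255]]], dtype=np.uint8)

# True where all three channel values of a pixel match magenta
is_magenta = (img == [255, 0, 255]).all(axis=-1)

# Assign white to every matching pixel in one step
img[is_magenta] = [255, 255, 255]

print(is_magenta)
print(img[0, 0], img[0, 1])
```

If the image has an alpha channel (4 values per pixel), the comparison values would need a fourth entry as well.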

Likewise, many other manipulations can be done on the image. The exercise below lets you practice re-coloring.

#### Exercise

* Load the image "tiny.png" using the path "../../../data/tiny.png"
* Re-color the background as white (i.e. execute above sample code example to create white background)
* Mark a red dot in the center of the image by coloring exactly 100 pixels (a 10x10 square) at the center of the picture red. Use the RGB value (255,0,0) for red

### Solution code

```python
# Loading Image module from Pillow library
from PIL import Image

# Loading numpy for array manipulation of image data
import numpy as np

# importing pyplot
import matplotlib.pyplot as plt

# importing image module from matplotlib
import matplotlib.image as mpimg

# Rendering all plots inline within the notebook
%matplotlib inline

im = Image.open('../../../data/tiny.png') # Opening the image
imarray = np.array(im) # Converting image data into n-dimensional array

# Looping the 3D array to scan for RGB values of each pixel 
for i in range(0,len(imarray)):
    for j in range(0,len(imarray[i])):
        if (imarray[i][j] == np.array([255,0,255])).all(): # Verify if pixel has magenta color
            imarray[i][j] = np.array([255,255,255]) # Changing magenta pixel to white color

# Printing the shape of the array
print(imarray.shape)

# As the number of pixels on row and column is even, there would be two points which constitute the center

# Coloring the red dot
midpt = int((imarray.shape[0]/2)) # Calculating the index value around mid point of image using shape of image. Here it is 80

# Recoloring 100 pixels
for i in range(-5,5): # 5 pixels before and after the mid point
    for j in range(-5,5): # 2nd iterator used as there are 2 dimensions on the coordinate scale
        imarray[midpt+i][midpt+j] = np.array([255,0,0])

# Showing/Displaying the image data as a plot         
plt.imshow(imarray)
```

### Manipulating image data - Vertical and Horizontal flip

Another manipulation that we will learn here is to flip the image in order to obtain the mirror reflection of a given image. 

Note: A horizontal flip is when you flip the image on an imaginary vertical axis. A vertical flip happens when you flip the image on an imaginary horizontal axis.

You may have flipped images on axes in tools such as paint brush, photoshop or any other photo editing tools. Here we will do the same using a pythonic, programmatic approach.

As we know that the image data is nothing but a numpy array, we can use numpy's flip function to flip the values of the array over one of the dimensions. Here is an example:

```python
# ORIGINAL IMAGE

# Loading Image module from Pillow library
from PIL import Image

# Loading numpy for array manipulation of image data
import numpy as np

# importing pyplot
import matplotlib.pyplot as plt

# importing image module from matplotlib
import matplotlib.image as mpimg

# Rendering all plots inline within the notebook
%matplotlib inline

# Reading the image file data into a variable
py_img=mpimg.imread('../../../data/python-logo.png')

# Converting image data into n-dimensional array
imarray = np.array(py_img)

# Showing/Displaying the image data as a plot
plt.imshow(imarray)

# Output
>>>
```
<img src="../../../images/py-logo.PNG" style="width:25vw">
<br>

```python
# IMAGE AFTER MANIPULATION (Horizontal flip)

# Using numpy's flip function to flip the image array. Flip on axis=1 (columns) for a Horizontal flip
flipped_h = np.flip(imarray,axis=1)

# Showing/Displaying the image data as a plot
plt.imshow(flipped_h)

# Output
>>>
```
<img src="../../../images/py-logo-hf.PNG" style="width:25vw">
<br>
```python
# IMAGE AFTER MANIPULATION (Vertical flip)

# Using numpy's flip function to flip the image array. Flip on axis=0 (rows) for a Vertical flip
flipped_v = np.flip(imarray,axis=0)

# Showing/Displaying the image data as a plot
plt.imshow(flipped_v)

# Output
>>>
```
<img src="../../../images/py-logo-vf.PNG" style="width:25vw">
<br>
```python
# IMAGE AFTER MANIPULATION (Color flip)

# Using numpy's flip function to flip the image array. Flip on axis=2 (color channels) for a Color flip
flipped_c = np.flip(imarray,axis=2)

# Showing/Displaying the image data as a plot
plt.imshow(flipped_c)

# Output
>>>
```
<img src="../../../images/py-logo-cf.PNG" style="width:25vw">
<br>
<b>Note:</b> The color flip is not a color inversion. Flipping on axis=2 reverses the order of the channel values of each pixel, so for a 3-channel image the Red and Blue values are swapped (RGB becomes BGR), giving a new combination of RGB values. This new combination may or may not be the exact complement of the original color.
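Which axis produces which flip is easy to mix up, so here is a small sketch on a synthetic 2 x 2 RGB image with a different color in each corner. The array is made up for illustration and assumes exactly 3 channels.

```python
import numpy as np

# Synthetic 2 x 2 RGB image with a different color in each corner
img = np.array([[[255, 0, 0], [0, 255, 0]],       # top row:    red,  green
                [[0, 0, 255], [255, 255, 255]]],  # bottom row: blue, white
               dtype=np.uint8)

horizontal = np.flip(img, axis=1)  # reverse column order: left and right swap
vertical = np.flip(img, axis=0)    # reverse row order: top and bottom swap
channels = np.flip(img, axis=2)    # reverse channel order: RGB becomes BGR

print(horizontal[0, 0])  # top-left is now green
print(vertical[0, 0])    # top-left is now blue
print(channels[0, 0])    # red (255, 0, 0) becomes (0, 0, 255)
```

In short: axis 0 runs over rows (height), axis 1 over columns (width) and axis 2 over the color values of a pixel.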


### Exercise

Perform flips in order to achieve the given output image from the given input image.
The link to input image is "../../../data/python-logo.png"

<b>Input Image:</b>
<img src="../../../images/py-logo.PNG" style="width:25vw">
<br>
<b>Expected Output Image:</b>
<img src="../../../images/py-logo-op.PNG" style="width:25vw">

### Solution code

```python
import numpy as np
# importing pyplot
import matplotlib.pyplot as plt
# importing image module from matplotlib
import matplotlib.image as mpimg
# Rendering all plots inline within the notebook
%matplotlib inline

# Reading the image file data into a variable
py_img=mpimg.imread('../../../data/python-logo.png')

# Showing/Displaying the image data as a plot
# plt.imshow(py_img)

imarray = np.array(py_img) # Converting image data into n-dimensional array

# Flipping the image horizontally (axis=1) and then vertically (axis=0)
imarray = np.flip(imarray,axis=1)
imarray = np.flip(imarray,axis=0)

# Showing/Displaying the image data as a plot         
plt.imshow(imarray)
```

## 7. MP3/WAV

Data can also come in audio format. Audio files are generally a large collection of signals (waves) and signal analysis/processing is commonly used to manipulate, mine and analyze audio data.

There are many Python libraries that help in loading, playing, visualizing and analyzing audio in various formats. The built-in "os" library can open an audio file in the system's default player through its "startfile" method. Note that "startfile" is only available on Windows.

An example:
```python
import os
os.startfile("<Detailed path of the file to be played>")
```

#### Exercise

Try and execute the below code and play the file on your local player. 
* This code does not give any output on this Jupyter Notebook.
* The code uses operating system settings to launch an audio player. Since such settings are absent in this server environment, this code is best tried on a local installation of Jupyter Notebook (or any Python terminal/IDE).
* Upon successful execution of this code, a default music player opens in your system and will start playing the audio file.

In [None]:
import os
os.startfile("../../../data/Matteo-Amandoi_(Official_Music_HD).mp3")

### Solution code

```python
# Just run the above code on your local installation
```

### Opening a WAV file and analyzing it

WAV files are another format of audio encoding. WAV files can be opened with 2 common libraries:
1. scipy.io.wavfile
2. wave

<b>Opening a WAV file:</b>
Examples:
```python
# Using scipy.io.wavfile
from scipy.io.wavfile import read
a = read("../../../data/Matteo-Amandoi__Official_Music_HD_.wav")

# Using wave
import wave
a = wave.open("../../../data/Matteo-Amandoi__Official_Music_HD_.wav",'r') # 2nd parameter 'r' specifies 'read'mode
```

Note that the 'read' function from the scipy module reads the entire audio file and returns the sample rate together with the wave data as a numpy array, which can then be analyzed directly. When the 'open' function from the wave library is used, the file is opened with a cursor: the returned object 'a' (which is initialized with wave.open("Path to the file")) exposes the file's metadata and lets us read the audio frames as raw bytes, a chunk at a time or all at once, but 'a' itself is a file-like reader and not an array of samples.
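To make the behaviour of the wave library concrete, the sketch below first writes a short synthetic tone to a temporary WAV file and then reads it back. The file name, the 8000 Hz sample rate and the 440 Hz tone are arbitrary choices for the demo, not properties of the course audio file.

```python
import math
import os
import struct
import tempfile
import wave

# Write a one-second synthetic mono WAV file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
rate = 8000  # samples per second (arbitrary choice for the demo)
samples = [int(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]

with wave.open(path, "wb") as w:
    w.setnchannels(1)    # mono
    w.setsampwidth(2)    # 2 bytes per sample = 16-bit audio
    w.setframerate(rate)
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read it back: metadata comes from getter methods, audio from readframes()
with wave.open(path, "rb") as r:
    n_frames = r.getnframes()
    frame_rate = r.getframerate()
    raw = r.readframes(n_frames)  # a bytes object, not an array

# Decode the raw bytes into 16-bit little-endian integers
decoded = struct.unpack("<%dh" % n_frames, raw)

print(frame_rate, n_frames, n_frames / frame_rate)
```

The extra decoding step at the end is what `scipy.io.wavfile.read` does for you when it returns the wave data as an array.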

#### Exercise

Open the given audio file using the scipy.io.wavfile library. Refer to the above example code.

In [None]:
from scipy.io.wavfile import read
a = read("../../../data/Matteo-Amandoi__Official_Music_HD_.wav")

### Solution code

```python
# Just run the above cell
```

### Wave to Numpy Array and Signal Plotting

When the audio file is read in its entirety using the 'scipy.io.wavfile' module, we can convert the file into a numpy object and understand the underlying programmable data that constitutes an audio file. 

<b>Parts of a wave file:</b> When a wave file is read using the read() method of 'scipy.io.wavfile' module, it reads the audio file in two parts:
1. Audio sample rate - The first part of the data that is read is an integer that gives the sample rate of the audio data. The higher the sample rate, the better the audio quality is generally said to be. A common sample rate is 44.1 kHz (44100) or higher. A sample rate of 44100 means that every second of audio is made up of 44,100 points of wave data.
2. Audio wave data - The second part of the data is an array of audio wave amplitudes. Audio can have multiple channels. Most music files have 2 channels, left and right, similar to left and right speakers. In that case the array is two dimensional, and each element is itself an array with 2 values: the wave amplitudes of the left and right channels. A single-channel (mono) file gives a one dimensional array.


We can convert the amplitudes/wave data of the wave file into a numpy array, simply by using the numpy.array() method and specifying the data type as 'float' or any longer integer data types.

Example:
```python
from scipy.io.wavfile import read
a = read("../../../data/Matteo-Amandoi__Official_Music_HD_.wav")

import numpy as np
np.array(a[1],dtype=float)

# Output
>>> array([[ 0.,  0.],
>>>        [ 0.,  0.],
>>>        [ 0.,  0.],
>>>        ..., 
>>>        [ 0.,  0.],
>>>        [ 0.,  0.],
>>>        [ 0.,  0.]])
```

Note that an audio file's amplitudes may largely consist of zeroes, or sometimes of a constant number repeated across many time stamps. Zeroes indicate 'silence' (zero amplitude). A repeated constant value is also inaudible, because sound requires the amplitude to change over time.

#### Exercise

As mentioned above, many of the wave amplitudes are 'silent', i.e. zero. A non-zero amplitude means that some sound is being produced at that point in time.

<b>Task:</b>
* Use the link to the audio file provided.
* Load the file using scipy.io.wavfile library and convert wave data into a numpy object. (Refer to above example code)
* Print out the wave sample rate and first five wave amplitudes.
* Filter out the first five non-zero amplitudes from the list of all amplitudes in the audio to verify that the audio file does have some sound in it. Use simple conditional decision statements and make sure that you limit the extraction process to first five non-zero data.

In [None]:
audio_file_link = "../../../data/Matteo-Amandoi__Official_Music_HD_.wav"  

### Solution code

```python
from scipy.io.wavfile import read
import numpy as np

a = read(audio_file_link)

print("Sample rate is {:d} and wave amplitudes are {:}".format(a[0],np.array(a[1],dtype=float)[:5]))

i = 0

for x in a[1]:
    if x[0]!=0 or x[1]!=0:
        print(x)
        i+=1
        if i==5:
            break      
```

### Visualizing an audio wave

We can visualize audio waves using plots. The wave data is a sequence of amplitudes, so plotting each amplitude against its index shows how the sound wave changes over time. The amplitude data has a value for each channel. In the above scenario we see 2 numbers per sample: the left channel value and the right channel value. Each channel is plotted as a distinct line plot, with the amplitude values on the y-axis and the index of the amplitude data on the x-axis.

Now, as mentioned above, the sample rate tells us how many amplitude values make up one second of the audio wave. If we place one x-axis tick at every multiple of the sample rate and label the ticks 1, 2, 3 and so on, the sound wave can be read on a more interpretable scale: the time scale in seconds. We do so by manipulating the x-ticks.
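The arithmetic behind this conversion is simple enough to sketch directly. The helper `index_to_seconds` is a hypothetical name introduced here for illustration, and the sample rate of 44100 is the value discussed in this lesson.

```python
sample_rate = 44100  # samples per second, as discussed in the lesson


def index_to_seconds(index, rate=sample_rate):
    """Time in seconds that corresponds to a given sample index."""
    return index / rate


# Number of samples that make up the first 10 seconds of audio
n_samples = 10 * sample_rate

print(n_samples)
print(index_to_seconds(44100))
print(index_to_seconds(n_samples))
```

This is why an x-tick placed at index 44100 can be labelled as second 1, a tick at 88200 as second 2, and so on.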

In the below example, we will visualize the left and right channels wave data of the music file "Matteo-Amandoi\_\_Official_Music\_HD\_.wav". Note that as the file is too large, we will only try to plot the first 10 seconds of wave data (10 * 44100 wave data points). 

```python
# Importing libraries
from scipy.io.wavfile import read
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Reading the wave file
a = read(audio_file_link)

# Clipping the first 10 seconds of wave data from the audio (10 s x 44100 samples per second)
audio_clip = a[1][:441000]

# Printing out sample rate and wave data
print("Sample rate is {:d} and wave amplitudes are {:}".format(a[0],np.array(audio_clip,dtype=float)[:5]))

# Separating left and right channel data and making the time axis
x = []
y_left = []
y_right = []

for index,number in enumerate(np.array(audio_clip,dtype=float)):
    x.append(index+1)
    y_left.append(number[0])
    y_right.append(number[1])

# Creating the plots    
plt.figure(figsize=(16, 4))
plt.subplot(2, 1, 1)
plt.plot(x, y_right, color='blue', alpha=0.5)

# labels
plt.ylabel('Wave amplitude')
plt.xlabel('Time in seconds')

# Modifying x-ticks to show wave data per second
tick_loc = range(44100,(len(audio_clip))+1,44100)
tick_lab = range(1,len(tick_loc)+1)

plt.xticks(tick_loc,tick_lab)
plt.title('Audio Wave of Amandoi - Right Channel')

# Creating the plots
plt.figure(figsize=(16, 4))
plt.subplot(2, 1, 2)
plt.plot(x, y_left, color='green', alpha=0.5)

# labels
plt.ylabel('Wave amplitude')
plt.xlabel('Time in seconds')

# Modifying x-ticks to show wave data per second
tick_loc = range(44100,(len(audio_clip))+1,44100)
tick_lab = range(1,len(tick_loc)+1)

plt.xticks(tick_loc,tick_lab)
plt.title('Audio Wave of Amandoi - Left Channel')

# Output
```

<img src="../../../images/amandoi-audio.PNG" style="width: 65vw;"> <br>

Note that the first 5 seconds of the audio show as flat lines at 0 amplitude, because the recording is silent and there is no music in these 5 seconds.

#### Exercise

* Replot the above audio file where you can overlap the left channel and right channel wave data.
* Do not forget to clip only the first 10 seconds of the audio. (Hint: You may reuse most of the above example code).
* Compare the two channel waves and visually identify some differences/similarities. Write down the conclusions.

### Solution code

```python
# Importing libraries
from scipy.io.wavfile import read
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Reading the wave file
a = read(audio_file_link)

# Clipping the first 10 seconds of wave data from the audio (10 s x 44100 samples per second)
audio_clip = a[1][:441000]

# Printing out sample rate and wave data
print("Sample rate is {:d} and wave amplitudes are {:}".format(a[0],np.array(audio_clip,dtype=float)[:5]))

# Separating left and right channel data and making the time axis
x = []
y_left = []
y_right = []

for index,number in enumerate(np.array(audio_clip,dtype=float)):
    x.append(index+1)
    y_left.append(number[0])
    y_right.append(number[1])

# Creating the plots
plt.subplots(figsize=(16, 4))
plt.plot(x, y_right, color='blue', alpha=0.5)
plt.plot(x, y_left, color='green', alpha=0.5)

# labels
plt.ylabel('Wave amplitude')
plt.xlabel('Time in seconds')

# Modifying x-ticks to show wave data per second
tick_loc = range(44100,(len(audio_clip))+1,44100)
tick_lab = range(1,len(tick_loc)+1)

plt.xticks(tick_loc,tick_lab)
plt.title('Audio Wave of Amandoi')


# Conclusion
# From the plot you will observe that for most of the audio the left and right channels
# have the same amplitudes and wave patterns. However, from the 6th second to 7.5 seconds the left channel's
# amplitudes are slightly higher than the right channel's amplitudes. Similarly, from 7.5 seconds onwards
# until the 9th second, the right channel has slightly higher amplitudes than the left channel.
```

## 8. Data Sources from the web

Sometimes the data we want to use is not available locally and we have to get it from web sources. These can be **url links** to csv, html, json or other files. Conveniently, the reader functions in pandas accept url links as well as file paths. You need to choose the function that matches the format of the data source. Let's see some examples:

* **Reading csv file from web**
``` python
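import pandas as pd

# iris.data has no header row, so header=None stops pandas from using the first record as column names
pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)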
```

* **Reading html file from web**
``` python
tables = pd.read_html("https://www.fantasypros.com/nfl/reports/leaders/qb.php?year=2015")
df = tables[0]
df.head()
```

* **Reading json file from web**
``` python
df = pd.read_json('http://maps.googleapis.com/maps/api/geocode/json?address=google')
df.head()
```

In short, web sources are treated the same way as actual files.

### Exercise

Now analyze and import data from the following webpage using Python. <br>
https://people.sc.fsu.edu/~jburkardt/data/csv/snakes_count_1000.csv <br>
After importing, use *head()* function to print the rows in your dataframe.

### Solution code

```python
import pandas as pd

df = pd.read_csv('https://people.sc.fsu.edu/~jburkardt/data/csv/snakes_count_1000.csv')
df.head()
```